So I wrote these for a guy who wanted help cataloging a small collection: I'd be farther along on the real work if I had just done it, but it was a nice distraction.

The idea is that you run a proxy server on your local machine, and when you find the MARC data that you want, it parses the data out of the HTML for you and puts it into your database. There are two versions here:

MARCproxy02.pl
6/29/04: a couple of changes. First, I've put it up as HTML: just copy the code from your browser window. It's now explicit that this is GPL. There are also a couple of new features: the log file can be supplied as a command line argument, or the program will prompt for it. Also, you no longer need to be on the MARC data page in order to grab a record: the program will look for a link to the MARC data and follow it to get the data. As always, let me know if this works/doesn't work for you.
LibraryCatalog.tar.gz
This is the script I'm actually using (FMproxy.pl, once you untar and unzip it), and the FileMaker databases it works with (I'm running version 4.1 under Mac OS 8.6). They're a compromise (probably not a satisfactory one) between representing the whole MARC structure and something simple enough that a lay person could enter data by hand if need be. This is also a good example of using Mac::Glue to interact with FileMaker. [Though it's also really slow: it might be better to redo it so that FileMaker is told to import the records, rather than trying to create them individually.]
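
For anyone curious what "look for a link to the MARC data and follow it" might amount to, here is a rough sketch. It is in Python rather than the Perl of the actual scripts, and it is not the code from MARCproxy02.pl. It assumes the catalog page has an ordinary link whose visible text contains "MARC". The names MarcLinkFinder and find_marc_link are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarcLinkFinder(HTMLParser):
    """Collect the hrefs of links whose visible text mentions 'MARC'."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> element we are currently inside
        self._text = []     # text fragments seen inside that element
        self.matches = []   # hrefs whose link text mentioned MARC

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumption: the link text itself says "MARC" somewhere.
            if "marc" in "".join(self._text).lower():
                self.matches.append(self._href)
            self._href = None


def find_marc_link(page_html, page_url):
    """Return the absolute URL of the first MARC link, or None if there is none."""
    finder = MarcLinkFinder()
    finder.feed(page_html)
    finder.close()
    if not finder.matches:
        return None
    # Links in catalog pages are often relative, so resolve against the page URL.
    return urljoin(page_url, finder.matches[0])
```

The proxy would then fetch the returned URL itself and parse that page instead of the one the browser asked for.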

Some caveats: This has been working well for the guy I wrote it for, but I haven't gotten feedback from anyone else. I'm on an old Mac, so this is a non-forking proxy server: it handles one request at a time, so you probably want to turn off graphics in your browser so that it's not any more bogged down than necessary. Finally, I've tuned the regexps for the Library of Congress catalog: if the catalog's pages change, or you want to use another catalog to get your records, you'll need to change the expressions.
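
To give a feel for the kind of expressions involved, here is a small sketch in Python (the scripts themselves are Perl, and these are not the regexps they use). It assumes a tagged display where each line is a three-digit tag, two indicator characters, and then subfields introduced by "|a", "|b" and so on. That layout is an assumption, and another catalog may well format its records differently, which is exactly why the expressions need changing. parse_field is a hypothetical name.

```python
import re

# Assumed line shape: 3-digit tag, two indicator characters, then |x subfields.
FIELD_RE = re.compile(r"^(\d{3})\s+([0-9_ ]{2})\s+(\|.*)$")
# A subfield is a bar, a one-character code, and everything up to the next bar.
SUBFIELD_RE = re.compile(r"\|(\w)\s*([^|]*)")


def parse_field(line):
    """Split one tagged-display line into (tag, indicators, subfields).

    Returns None for lines that do not fit the assumed shape, which
    includes control fields such as 001 that have no subfields.
    """
    m = FIELD_RE.match(line.strip())
    if m is None:
        return None
    tag, indicators, rest = m.groups()
    subfields = [(code, value.strip()) for code, value in SUBFIELD_RE.findall(rest)]
    return tag, indicators, subfields


print(parse_field("245 10 |a Moby Dick / |c Herman Melville."))
# prints ('245', '10', [('a', 'Moby Dick /'), ('c', 'Herman Melville.')])
```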

The user interaction of both scripts is the same. Once either one is started, you should see this message:

In your browser please set proxy server to: http://[your IP address]:2512/
and then go to http://0.0.0.0/

With $debug set, the results of all the HTTP transactions will be sent to STDERR. If there are errors, they should show up in the browser window for http://0.0.0.0/. At the very least, there will be an error the first time you run it unless you've already set the correct path for your output file by changing the value that $outputpath is initialized to at the start of the program (or $dbpath if you're using FMproxy.pl).
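
The two settings above boil down to something like the following sketch, again in Python rather than Perl and not taken from either script. DEBUG and OUTPUT_PATH stand in for $debug and $outputpath, the file name is a placeholder, and debug_log and check_output_path are hypothetical helpers. The point is only that a bad output path can be caught and reported as a readable message instead of failing midway through a grab.

```python
import sys

DEBUG = True                      # stands in for the script's $debug
OUTPUT_PATH = "marc_records.txt"  # placeholder; stands in for $outputpath


def debug_log(message, stream=None):
    """Write a line to STDERR (or the given stream) when DEBUG is on."""
    if DEBUG:
        print(message, file=sys.stderr if stream is None else stream)


def check_output_path(path):
    """Return None if path can be opened for appending, else an error string."""
    try:
        with open(path, "a"):
            pass
    except OSError as err:
        return f"cannot write to {path}: {err}"
    return None
```

A proxy built this way could call check_output_path(OUTPUT_PATH) before saving a record and show the returned string in the browser if it is not None.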

Let me know if this is useful, or if there are problems or changes that should be made, or if there is a better place to put these than here. Thanks.

Chuck McCallum, mccallucATyahooDOTcom.