NeoArch

February 27, 2007

OCR, Open Source, and Archival Materials

Filed under: Access, OCR, digitization, open source — Jason @ 2:49 pm

While in the workshop today, I started wondering whether there were open source OCR options that could be used in conjunction with the GIMP. I found one piece of software that does. I also found this article, and I can’t wait to try it out sometime.

Digital Imaging Workshop

Filed under: Access, digitization — Jason @ 2:00 pm

I am spending today at a Solinet workshop in Lexington, KY. The workshop deals with digital imaging. So far, it’s been fairly informative. Probably the most helpful aspect of the workshop thus far has been the discussion of the scan-once methodology for digitizing images. We also received some pretty helpful handouts with nice tables that describe the file formats, resolution, and bit depth that you should use for master, intermediate, and access files. I may try to reproduce the tables on here later, as it is pretty common information, but the way it was arranged was helpful. I am sure I will be using these tables in the archives.

January 16, 2007

New Tool for Archivists

Filed under: Arrangement, Description, PHP, Systems, digitization — Jason @ 8:39 pm

I mentioned earlier that the Archivist’s Toolkit had been released. Tomorrow, the Archon Project is scheduled to release version 1.10 of Archon. I played with version 1. It was easy to install and a very powerful tool. The digital library function was particularly promising. I noticed a couple of bugs in version 1 while testing it, so I didn’t put it into production. I am fairly certain that these bugs will be remedied in this latest release, and I cannot wait until I get the time to install and test it. All the folks at UIUC who were involved in this project need to be congratulated for their vision and vigor because they brought such a project into being. What an interesting and exciting time it is to be an archivist!

November 22, 2006

Digitization and Emulation

Filed under: digitization — Jason @ 8:21 pm

Continuing the theme from yesterday, I read this post today that extols the virtues of emulation as an answer for some problems with digital formats.

All are good points, but I seriously doubt people will be able to write emulators for every application, file format, and operating system in the future. Besides, most of the emulators I have seen seem somewhat clunky. Perhaps as the technology develops, clunkiness will decline.

Emulation seems like the best answer in many ways, but in many more it just seems untenable. We’ll see.

November 21, 2006

Fear and Loathing in the Digital Ice Age

Filed under: Preservation, digitization — Jason @ 5:40 pm

And before you ask, digital ice age does not refer to my lack of posting (Peggy, I promise I’ve not forgotten you). The title refers to two interesting pieces that I enjoyed reading today and wanted to mention. The first is a post by a “Dangerous” LIS student who wants to be an archivist.  She’s asking some of the questions that should keep us up at night, especially if we believe Ham was correct about selecting materials that document human experience.

The second is a piece from Popular Mechanics on the instability of digital information (HT: Russ).  I really enjoyed this article. It basically repeats some of the same things I have read and heard archivists say about digital preservation, except for the final line:

And remember, a printed copy is sometimes the best form of backup.

It’s funny. I often think that too. Almost every digital preservation work I have read and workshop I have attended says this is not the case. After all, archivists rightly contend that metadata is important to ensure the authenticity of documents. And authenticity is extremely important. Still, I often think that it’s better to have a printed, stable copy of a work with little or no metadata than to have no document at all.

April 5, 2006

Digitization Gone Awry

Filed under: Access, Library Science, digitization — Jason @ 8:53 pm

I like digitization. I like for people to be able to access books and other resources remotely. I understand that digitization costs money. I know that the various vendors who digitize usually need to get some sort of return on their investment. But this is ridiculous. Logos Bible Software company is willing to sell the “J. A. Broadus Preaching Collection,” a digital collection of three John A. Broadus books, for the low, low price of $59.95. That’s the price with $15.00 off, folks. They assure you it’s a bargain, too. They could “only locate a single copy of Sermons and Addresses anywhere on the web–available used for $100!” They should have looked harder. Alibris has five copies right now, the most expensive of which is $34.95. They are also willing to sell you A Treatise on the Preparation and Delivery of Sermons as part of the package. All well and good. The only problem is, there are already two free, standards-compliant, online editions of the work here and here (you can also get this free). Both of these editions are older than the Dargan-edited edition that Logos is offering.

Look, I know Logos probably has major $$$ invested in equipment, workers, and the like. Still, I think the price on this software is a little exorbitant. I am willing to bet that

  • Logos paid nothing for the books, because they used copies from a theological library.
  • Logos paid nothing for the copyright, because they are in the public domain.
  • Logos could probably sell three of the collections at that price and more than make up for any amount of money it cost them to have an employee scan the books.

I know there are attendant costs with digitization, but it seems crazy to me to charge that much for something that libraries are trying to provide for free. If you are going to charge a good bit, provide a good bit of content. For example, Baptist Standard Bearer’s Baptist History Collection costs $59.95, but you get 43,298 pages with it.

Of course, the whole discussion brings up the concept of the invisible web, because the library versions of Broadus’ work are buried or non-existent in a good Google search, while Logos’s product is the second entry. Libraries need to do a better job of bringing their digital resources to the fore so that these types of digitization ventures do not occur.
