- Physical Care and Workflow
- Text Conversion
- Interoperation between the Library of Congress and the University of Iowa Libraries
The University of Iowa Libraries received an award in the 1997/98 round of the Library of Congress/Ameritech National Digital Library Competition to support the digitization of the talent brochures from the Redpath Chautauqua Collection. Bibliographic records were delivered to the Library of Congress for indexing as part of American Memory; these records link to the digital reproductions mounted at the University of Iowa Libraries. Details of the different aspects of preparing the digital collection follow.
This digital collection is drawn from the office files of the Redpath Lyceum Bureau, a booking agency for the Midwest Chautauqua circuits. Held by the University of Iowa Libraries, the physical collection consists of some 648 linear feet of materials dating between 1890 and 1940. It represents the most extensive holding of circuit Chautauqua materials in existence. For the Traveling Culture project, all 7,949 brochures in the talent portion of the collection were digitized. The brochures varied widely in shape and size, from a single page up to 50 or more pages. The total number of page images captured was 28,153.
The talent portion of the collection was separated into correspondence and brochures, with the intent of digitizing only the brochures. Since the correspondence was physically deteriorating, a photocopy was made on archival bond and the original correspondence was discarded. The brochures were also photocopied. The originals remained on site, and the copies were sent to an outside vendor who keyed in the text in order to provide full-text search and retrieval. The originals were cataloged by staff in the Special Collections Department and sent to the production staff for scanning before being refiled with the photocopied correspondence.
Each page of each brochure was scanned on large-format, high-quality scanners in 32-bit color at 600 dots per inch. (Although most documents were printed in black ink, all were scanned in color to capture the tone of the paper.) Each page image was stored at full resolution as an uncompressed TIFF file and archived on CD-R. At the end of each day, a locally created batch process downsampled the original images to create derivatives at a variety of lower resolutions. The core of the batch process was a locally created script for Adobe Photoshop that used Main Event's PhotoScripter. The original image was first reduced from 600 to 300 dots per inch to create a working document. From the working document, the following were created:
- A single-page PDF document in black/white at 300 dots per inch
- A JPEG file for high-quality web display in 32-bit color at 150 dots per inch
- A GIF file for low-resolution web display in 8-bit color at 75 dots per inch
- A JPEG thumbnail for bibliographic display at 15% of full size
- A tiny GIF thumbnail for the navigation bar at 5% of full size
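The resolutions above translate into concrete pixel dimensions for each derivative. The Python sketch below works them out for a page of a given physical size and also estimates the raw size of one uncompressed 32-bit master at 600 dots per inch (ignoring TIFF header overhead). It is illustrative only: the derivative labels are hypothetical, the 8.5 by 11 inch page is just an example (the brochures varied widely in size), and the thumbnail percentages are assumed to apply to the 300 dpi working document, since "full size" is not defined further.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

SCAN_DPI = 600     # archival master resolution (uncompressed TIFF)
WORKING_DPI = 300  # working document from which all derivatives are made


@dataclass(frozen=True)
class Derivative:
    name: str                  # hypothetical label for the output file
    fmt: str
    dpi: Optional[int] = None  # either a fixed target resolution ...
    pct: Optional[int] = None  # ... or a percentage of the working document (assumed)


DERIVATIVES = [
    Derivative("print", "PDF, black/white", dpi=300),
    Derivative("display", "JPEG", dpi=150),
    Derivative("lowres", "GIF, 8-bit", dpi=75),
    Derivative("thumb", "JPEG", pct=15),
    Derivative("navthumb", "GIF", pct=5),
]


def to_pixels(inches: float, dpi: int) -> int:
    """Pixels spanned by a physical length at a resolution, rounded half up."""
    return int(inches * dpi + 0.5)


def plan_derivatives(width_in: float, height_in: float) -> Dict[str, Tuple[int, int]]:
    """Pixel width and height of every derivative for one page."""
    work = (to_pixels(width_in, WORKING_DPI), to_pixels(height_in, WORKING_DPI))
    sizes = {}
    for d in DERIVATIVES:
        if d.dpi is not None:
            sizes[d.name] = (to_pixels(width_in, d.dpi), to_pixels(height_in, d.dpi))
        else:
            # Integer arithmetic: pct% of each working-document side, rounded half up.
            sizes[d.name] = ((work[0] * d.pct + 50) // 100, (work[1] * d.pct + 50) // 100)
    return sizes


def master_tiff_bytes(width_in: float, height_in: float, bits_per_pixel: int = 32) -> int:
    """Raw pixel data in one uncompressed master, ignoring TIFF header overhead."""
    w, h = to_pixels(width_in, SCAN_DPI), to_pixels(height_in, SCAN_DPI)
    return w * h * bits_per_pixel // 8


if __name__ == "__main__":
    for name, (w, h) in plan_derivatives(8.5, 11).items():
        print(f"{name:9} {w:5} x {h:5} px")
    print(f"master: {master_tiff_bytes(8.5, 11):,} bytes")
```

For an 8.5 by 11 inch page, the 75 dpi GIF comes out at 638 by 825 pixels, and the master holds 5100 by 6600 pixels, or 134,640,000 bytes of raw pixel data (about 135 MB) before any compression.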
Images were stored unedited; no stray marks or other blemishes were removed. The single-page PDF files for each brochure were concatenated in a second batch process using AppendPDF from Digital Applications (now known as Appligent). The resulting PDF documents let end users print an entire brochure quickly at a lower resolution.
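Before pages can be concatenated, a batch step has to collect the single-page PDFs that belong to each brochure and put them in page order. The sketch below assumes a hypothetical file-naming convention (performer key, brochure number, page number) that the report does not describe. It stops at producing the ordered file lists and checks for gaps; the merge itself was done with AppendPDF, whose invocation is not shown here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical naming convention: <performer>_<brochure>_p<page>.pdf,
# e.g. rossn_1_p002.pdf for page 2 of brochure /rossn/1.
PAGE_FILE = re.compile(r"^(?P<performer>[a-z]+)_(?P<num>\d+)_p(?P<page>\d+)\.pdf$")


def group_pages(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each brochure key to its single-page PDFs in page order."""
    pages: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        m = PAGE_FILE.match(name)
        if m is None:
            continue  # not a single-page PDF derivative
        key = f"{m['performer']}/{int(m['num'])}"
        pages[key].append((int(m["page"]), name))

    ordered: Dict[str, List[str]] = {}
    for key, items in pages.items():
        items.sort()  # numeric sort, so page 10 follows page 9
        numbers = [n for n, _ in items]
        if numbers != list(range(1, len(numbers) + 1)):
            raise ValueError(f"{key}: missing or duplicate pages {numbers}")
        ordered[key] = [name for _, name in items]
    return ordered
```

Sorting on the parsed page number rather than the file name keeps brochures of ten or more pages in the right order even without zero-padding.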
Brochures were cataloged individually, using a locally created FileMaker Pro database and a form optimized for quick data entry. Subject access points include both controlled and uncontrolled subject headings. The controlled headings are drawn from the Library of Congress Subject Headings, the Library of Congress Thesaurus for Graphic Materials (parts I and II), and the Art and Architecture Thesaurus. From the raw cataloging data, values equivalent to selected MARC fields and subfields were derived and assembled into bibliographic records that support search and retrieval.
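Deriving MARC-equivalent values from a raw cataloging record can be sketched as follows. The input field names and the tag choices are assumptions for illustration, not the project's actual mapping: 650 with second indicator 0 for LCSH headings, 650 or 655 with second indicator 7 and a $2 source code (lctgm, gmgpc, aat) for thesaurus terms, 653 for uncontrolled terms, and 856 for the link to the digital reproduction.

```python
from typing import Dict, List, Optional, Tuple

Subfield = Tuple[str, str]
Field = Tuple[str, str, List[Subfield]]  # (tag, two indicators, subfields)

# Vocabulary key as a hypothetical cataloging form might record it
# -> (MARC tag, second indicator, $2 source code or None)
VOCABULARIES: Dict[str, Tuple[str, str, Optional[str]]] = {
    "lcsh": ("650", "0", None),       # Library of Congress Subject Headings
    "lctgm": ("650", "7", "lctgm"),   # Thesaurus for Graphic Materials I
    "gmgpc": ("655", "7", "gmgpc"),   # Thesaurus for Graphic Materials II
    "aat": ("650", "7", "aat"),       # Art and Architecture Thesaurus
}


def to_marc_fields(raw: Dict) -> List[Field]:
    """Derive MARC-like fields from one raw cataloging record (hypothetical keys)."""
    fields: List[Field] = [("245", "00", [("a", raw["title"])])]
    if raw.get("date"):
        fields.append(("260", "  ", [("c", raw["date"])]))
    for vocab, term in raw.get("subjects", []):
        if vocab in VOCABULARIES:
            tag, ind2, source = VOCABULARIES[vocab]
            subfields = [("a", term)]
            if source is not None:
                subfields.append(("2", source))
            fields.append((tag, " " + ind2, subfields))
        else:
            # Terms outside the controlled vocabularies become uncontrolled index terms.
            fields.append(("653", "  ", [("a", term)]))
    fields.append(("856", "40", [("u", raw["url"])]))  # link to the digital item
    return fields


def show(field: Field) -> str:
    """Render a field in a conventional display form, with _ for blank indicators."""
    tag, indicators, subfields = field
    return f"{tag} {indicators.replace(' ', '_')} " + " ".join(f"${c} {v}" for c, v in subfields)
```

A record with an LCSH heading and a TGM II term would then display lines such as `650 _0 $a Chautauquas` and `655 _7 $a Portrait photographs $2 gmgpc`.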
To support full-text search and retrieval, the text of the brochures was rekeyed and marked up in SGML. The University of Iowa Libraries participate in the Committee on Institutional Cooperation (CIC), a consortium of midwestern universities. For consistency with other full-text initiatives within the CIC, the TEI-Lite document type definition (DTD) was selected, and only a small subset of its tags was used. An outside vendor was selected to double-key and encode the text, with a required accuracy of 99.5%. For each brochure, an SGML header following TEI-Lite guidelines was derived from the raw cataloging data and prepended to the full-text file to create a complete SGML document.
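A minimal sketch of deriving a TEI Lite header and prepending it to the keyed text follows. It assumes the vendor file holds a single `text` element, and it fills only the three parts of `fileDesc` that TEI requires (`titleStmt`, `publicationStmt`, `sourceDesc`); the headers actually produced for the project presumably carried more of the cataloging data. The `TEI.2` root element follows the SGML-era TEI Lite DTD; the DOCTYPE declaration is omitted here.

```python
from xml.sax.saxutils import escape


def tei_header(title: str, publisher: str, source_note: str) -> str:
    """Minimal TEI Lite header built from catalog values (&, <, > escaped)."""
    return (
        "<teiHeader>\n"
        "<fileDesc>\n"
        f"<titleStmt><title>{escape(title)}</title></titleStmt>\n"
        f"<publicationStmt><publisher>{escape(publisher)}</publisher></publicationStmt>\n"
        f"<sourceDesc><p>{escape(source_note)}</p></sourceDesc>\n"
        "</fileDesc>\n"
        "</teiHeader>"
    )


def complete_sgml(header: str, keyed_text: str) -> str:
    """Prepend the header to the vendor's <text> element inside a TEI.2 root."""
    if not keyed_text.lstrip().lower().startswith("<text"):
        raise ValueError("expected the keyed file to begin with a <text> element")
    return f"<TEI.2>\n{header}\n{keyed_text.strip()}\n</TEI.2>\n"
```

Escaping the catalog values matters because titles and notes may contain ampersands, which would otherwise be read as the start of an entity reference.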
At the University of Iowa, the various image formats and the bibliographic information are tied together through a dynamically generated navigational interface. Each document begins with a "cover page" of bibliographic information, followed by each page image as a low-resolution GIF (each linking to the corresponding high-resolution JPEG file), followed by the complete brochure as a printable PDF document.
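The page sequence described above can be sketched as an ordered list of navigation stops. Only the order (cover page, one GIF per page linking to its JPEG, then the PDF) comes from the text; the paths are hypothetical, since the report does not describe how files are laid out on the server. In the real interface, a list like this would be rendered as HTML by the presentation script.

```python
from typing import List, NamedTuple


class Stop(NamedTuple):
    label: str
    href: str
    links_to: str = ""  # larger image reached from this stop, if any


def navigation(item_id: str, page_count: int) -> List[Stop]:
    """Ordered stops for one brochure: cover page, page images, printable PDF."""
    base = f"/images/{item_id}"  # hypothetical location of the derivatives
    stops = [Stop("Cover page", f"/cover/{item_id}")]
    for n in range(1, page_count + 1):
        page = f"{base}/p{n:03d}"
        # Low-resolution GIF shown in sequence, linking to the high-resolution JPEG.
        stops.append(Stop(f"Page {n}", f"{page}.gif", f"{page}.jpg"))
    stops.append(Stop("Printable brochure (PDF)", f"{base}.pdf"))
    return stops
```

Generating the sequence from an identifier and a page count, rather than storing finished pages, is what allows the presentation to be redesigned without touching the underlying files.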
Bibliographic records delivered to the Library of Congress support links to the dynamically assembled digital reproductions. Words from the text transcription were included as one field in the exported records. At the Library of Congress, the records were transformed automatically to a tagged form to facilitate indexing by the InQuery search engine and merging into American Memory. The text words support retrieval but are not displayed, because the markup does not attempt to represent the graphic layout, which is a significant aspect of these publicity flyers.
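Producing the text-words field amounts to discarding the markup and keeping the words. The sketch below uses regular expressions to drop SGML tags and entity references, assuming the transcriptions contain no comments or marked sections that need special handling. The field name `textwords` is hypothetical, and the later transformation to the tagged form used for InQuery indexing, which happened at the Library of Congress, is not modeled.

```python
import re
from typing import Dict

TAG = re.compile(r"<[^>]*>")             # SGML start and end tags
ENTITY = re.compile(r"&[A-Za-z0-9.]+;")  # entity references such as &amp;
WORD = re.compile(r"[A-Za-z0-9']+")      # deliberately simple notion of a word


def text_words(sgml: str) -> str:
    """Words of a transcription, in reading order, with all markup removed."""
    plain = ENTITY.sub(" ", TAG.sub(" ", sgml))
    return " ".join(WORD.findall(plain))


def export_record(fields: Dict[str, str], sgml: str) -> Dict[str, str]:
    """Copy the bibliographic fields and add the words as one extra field."""
    record = dict(fields)
    record["textwords"] = text_words(sgml)  # hypothetical field name
    return record
```

Replacing tags with a space rather than deleting them keeps words on either side of a tag boundary from running together.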
Each brochure has been assigned a unique identifier derived from the name of the performer (or performing group) and a number. The dynamic presentation of the item is generated by a program (script) at the University of Iowa Libraries. For example, http://sdrcdata.lib.uiowa.edu/libsdrc/details.jsp?id=/rossn/1 runs the details.jsp program to present the first brochure for Nellie Tayloe Ross. A Uniform Resource Locator (URL) of this form was embedded in each descriptive record. Although this approach to implementing links does not guarantee persistence or independence of location, it provides a practical level of both. The University of Iowa Libraries can reorganize file structures and change the design for presentation of items without requiring changes to the records stored at the Library of Congress. All that is required is that each identifier be treated as a permanent identifier for the corresponding item and that a script of the same name continue to generate a web-accessible presentation. Behind the scenes, the data and the images could be migrated to new hardware and software.
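The identifier scheme can be sketched in a few lines. The URL form is taken from the example above; the rule for forming the performer key (lowercase surname plus first initial) is only inferred from the single example `rossn` and is an assumption, and it does not cover performing groups.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

DETAILS_SCRIPT = "http://sdrcdata.lib.uiowa.edu/libsdrc/details.jsp"
IDENTIFIER = re.compile(r"^/[a-z]+/\d+$")  # e.g. /rossn/1


def make_identifier(surname: str, given_name: str, number: int) -> str:
    """Hypothetical rule inferred from /rossn/1: surname + first initial, then a number."""
    key = re.sub(r"[^a-z]", "", surname.lower()) + given_name[:1].lower()
    return f"/{key}/{number}"


def details_url(identifier: str) -> str:
    """URL to embed in a descriptive record; only the identifier is item-specific."""
    if not IDENTIFIER.match(identifier):
        raise ValueError(f"not a brochure identifier: {identifier!r}")
    # safe="/" keeps the slashes unescaped, matching the published example.
    return f"{DETAILS_SCRIPT}?{urlencode({'id': identifier}, safe='/')}"


def identifier_from_url(url: str) -> str:
    """Recover the permanent identifier from an embedded URL."""
    return parse_qs(urlsplit(url).query)["id"][0]
```

For example, `details_url(make_identifier("Ross", "Nellie", 1))` yields the URL quoted above. Because the records carry nothing but this URL, a migration only has to change how details.jsp resolves an identifier to files; the URLs already embedded at the Library of Congress remain valid.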