- Item Selection and Description
- Unique Identifiers and File Naming
- Digitizing the Collection
- Resource Description and Access
Ruthanne Vogel, Project Coordinator, University of Miami Library;
Gail Clement, Technical Advisor, Florida International University Libraries
Item Selection and Description
Reclaiming the Everglades: South Florida's Natural History, 1884-1934, is a consortial digitization project that includes historical resources from three institutions: the University of Miami Library, Florida International University Libraries, and the Historical Museum of Southern Florida. The collections of the three participating institutions contain an immense amount of material documenting the history of the Everglades, a unique subtropical ecosystem that has a rich but troubled past. Items selected for the Reclaiming the Everglades project draw from fourteen separate but interrelated manuscript/ephemera collections and twenty-one rare publications. Source materials contained in the selected assemblage comprise personal papers, manuscripts and typescripts, rare books and periodicals, personal diaries, scientific or engineering reports, black and white photographs, telegrams, pamphlets, maps, rare color postcards, and other documents. Project staff at the participating libraries selected items for inclusion based on their subject matter, time frame, format, and relevance to other collections represented in the project. Most of the collections represented in Reclaiming the Everglades are massive, and only a fraction of each collection was digitally reproduced for this project.
The selection process involved examination of materials at the item level. For the purposes of project management, an 'item' was defined as a complete bibliographic unit: a manuscript folder, a scrapbook of photographs, or a monograph. Folders organized by topic were most often included in their entirety, although some folder contents were excluded if they fell outside the project date limits of 1884-1934. Folders organized as dated correspondence files required more detailed analysis, and only a subset of each folder's contents was selected on subject merit. As items were chosen, they were assigned a unique identifier (described in the following section) and entered into a project tracking database developed in Filemaker Pro. The tracking database contained a brief physical description of each item, along with descriptive information such as dates, place names, personal names, and subject terms that would be useful to the project cataloger. The project tracking system was updated as items moved through the production process, from selection to capture, cataloging, and image conversion and loading.
Unique Identifiers and File Naming
To maintain item integrity and ensure proper order for multi-page documents, a file naming standard was developed for the project. Unique identifiers assigned to each item in the project formed the basis for all file naming. These identifiers were devised as ten-character alphanumeric strings comprising two parts: (1) a two-character collection code; and (2) an eight-digit number assigned sequentially to each item digitized from a given collection. An example of an identifier is "md00480125", assigned to the item "Articles about the Everglades". Each part of the identifier is discussed in greater detail below.
The two-character collection codes used in Reclaiming the Everglades are as follows:
- AM - Mary McDougal Axelson Papers (University of Miami)
- CM - Claude Matlock Photographs (Historical Museum of Southern Florida)
- EP - Ephemera collection, Historical Museum of Southern Florida
- JC - James M. Carson Papers (University of Miami)
- JG - Dr. John Gifford
- JJ - James Franklin Jaudon Papers (Historical Museum)
- JS - John Kunkel Small
- MD - Marjory Stoneman Douglas Papers (University of Miami)
- ML - Model Land Company Records (University of Miami)
- MW - Minnie Moore-Willson Papers (University of Miami)
- PF - Publications (Florida International University)
- PU - Publications (University of Miami)
- RM - Ralph Munroe Photographs
- UM - University of Miami Presidential Papers (University of Miami)
- VM - Visual Materials, Historical Museum of Southern Florida
Based on these codes, the item mentioned above, "Articles about the Everglades", was assigned an identifier beginning with "md" because it was selected from the Marjory Stoneman Douglas Papers at the University of Miami.
The second part of the identifier, the eight-digit number, reflects the position of the item in a given collection. In the case of manuscript materials, these eight digits represent a four-digit numeric box number and a four-digit numeric folder number. For example, "Articles about the Everglades" from the Marjory Stoneman Douglas Papers bears the identifier "md00480125", indicating it represents Box 48, Folder 125 in this collection. In the case of monographs, the eight digits are simply sequential numbers. For example, the book "The Knockabout Club in the Everglades" was assigned the identifier "pf00010003", indicating it was the third book digitized from Florida International University.
The digital images generated for each item were assigned file names derived from the item identifier. Specifically, an underscore and a four-character code representing a single leaf, page or image were appended to the identifier. For example, the file named "md00480125_002a.jpg" represents the first page of the second article in the folder entitled "Articles about the Everglades".
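The naming scheme described above can be sketched in a few lines of Python. This is only an illustration: the function and field names (`build_item_id`, `parse_name`, `first_number`, `second_number`) are hypothetical, and the pattern assumes lower-case collection codes and a four-character page code made of letters and digits, as in the examples given here.

```python
import re
from typing import NamedTuple, Optional

# Two letters, eight digits, then optionally "_" + four-character page code + extension.
NAME_PATTERN = re.compile(r"^([a-z]{2})(\d{4})(\d{4})(?:_(\w{4})\.(\w+))?$")


class ParsedName(NamedTuple):
    collection: str            # two-character collection code, e.g. "md"
    first_number: int          # box number for manuscripts
    second_number: int         # folder number for manuscripts
    page: Optional[str]        # four-character leaf/page code, if present
    extension: Optional[str]   # file extension, if present


def build_item_id(collection: str, first_number: int, second_number: int) -> str:
    """Build a ten-character item identifier such as 'md00480125'."""
    return f"{collection.lower()}{first_number:04d}{second_number:04d}"


def parse_name(name: str) -> ParsedName:
    """Split an item identifier or an image file name into its parts."""
    match = NAME_PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"not a recognized identifier or file name: {name!r}")
    code, first, second, page, ext = match.groups()
    return ParsedName(code, int(first), int(second), page, ext)


if __name__ == "__main__":
    print(build_item_id("md", 48, 125))
    print(parse_name("md00480125_002a.jpg"))
```

For monographs the two four-digit groups are not box and folder numbers but parts of a sequential number, so a caller would interpret `first_number` and `second_number` according to the collection code.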
Digitizing the Collection
Digital capture of all items included in the project was performed by a contracted vendor, Thomson Photo Imaging of Coral Gables, Florida. The image capture took place on site at the University of Miami and the Historical Museum of Southern Florida. Depending on the size and format of the originals, the images were captured using one of two methods: direct flatbed scanning; or photographic imaging and subsequent scanning of the film intermediaries. As a general rule, unbound materials with dimensions of no more than nine inches by twelve inches were scanned on a flatbed scanner (hardware descriptions below). Larger items and bound materials were captured on 35mm film with a still camera. Maps, fold-outs and other large format items were captured both in their entirety (at a small map scale) and in overlapping segments (at a larger map scale) to provide as much detail as possible.
Specifications for equipment are as follows:
|Flatbed scanning|Epson 836 XL scanner at 600 dpi.|
|Photographic imaging|Kodak GA 100 film with Leitz equipment.|
|Film scanning|Kodak Photo CD closed loop system; burned to a Kodak Photo CD disc in PCD format at 220 to 400 dpi.|
All flatbed scans were scaled to 100 percent of source document dimensions and captured at 600 dpi. Originals without significant color detail were captured as grayscale at 8 bits per pixel. Those items with color detail were captured as color at 24 bits per pixel. Images from Kodak Photo CDs were converted by opening them at the highest resolution (2048 x 3072) and sizing them back to the dimensions of their originals. Resolution for these items varied with the size of the original and the required distance of the camera during the capture process. In most cases, a minimum of 300 dpi was achieved (except for the overview captures of the full-size maps). All image files were stored as uncompressed TIFF (Tagged Image File Format) files and burned to CDs.
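The dependence of resolution on the size of the original follows from simple arithmetic: a fixed 2048 x 3072 pixel frame spread over a larger original yields fewer pixels per inch. The sketch below illustrates this under the simplifying assumption that the original fills the frame; the function name and the sample dimensions are hypothetical and are not taken from the project.

```python
# Highest Photo CD resolution named in the text, in pixels.
PHOTO_CD_PIXELS = (2048, 3072)


def effective_dpi(original_inches, pixels=PHOTO_CD_PIXELS):
    """Approximate dpi when a frame is sized back to the original's dimensions.

    Assumes the original fills the frame as fully as its shape allows.
    """
    short_in, long_in = sorted(original_inches)
    short_px, long_px = sorted(pixels)
    # The whole original must fit in the frame, so the tighter axis sets the limit.
    return min(short_px / short_in, long_px / long_in)


if __name__ == "__main__":
    # A hypothetical 6 x 9 inch page and a hypothetical 24 x 36 inch map overview.
    for size in [(6, 9), (24, 36)]:
        print(size, round(effective_dpi(size), 1))
```

Under these assumptions a page of about 6 by 9 inches comes out above 300 dpi, while a single overview frame of a large map falls well below it, which is consistent with the exception noted for full-size map overviews.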
The contractor included a resolution target (IEEE STD 167A.3-1997) as the first image for each bibliographic item scanned; they also imaged the target for every new roll of film used in photographic capture. All target images were inspected to ensure that each digital reproduction met requirements for resolution, tonality, and color management. Images failing this inspection were re-scanned or re-photographed as necessary.
Project staff at the participating libraries performed quality control by inspecting every image in thumbnail view and approximately 10% of the images in full-image view. As images from the vendor were accepted, project staff updated the corresponding records in the project tracking system.
Generation of Images for Online Use
All TIFF images produced through the image capture process were transferred to the Florida Center for Library Automation (FCLA) for storage, conversion to JPEG and PDF derivatives, and access. All uncompressed TIFFs were saved as archival masters. From the TIFF images, FCLA produced browse-quality JPEG derivatives using Adobe ImageReady Version 2.0 in a batch process. Each TIFF image was resized to a width of 600 pixels, with the height scaled proportionally. The process then saved the image as an optimized progressive JPEG, so that viewers see a low-resolution version in a Web browser before the image downloads completely.
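The resizing rule can be illustrated with a short sketch. The project used Adobe ImageReady; the code below swaps in the third-party Pillow library purely as an assumption for illustration, and the function names are hypothetical. `browse_size` is plain arithmetic, and `make_browse_jpeg` is only usable where Pillow is installed.

```python
def browse_size(width_px, height_px, target_width=600):
    """Return (width, height) with the width fixed and the height scaled proportionally."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / width_px
    return target_width, max(1, round(height_px * scale))


def make_browse_jpeg(tiff_path, jpeg_path, target_width=600):
    """Write a progressive JPEG derivative of a TIFF master (requires Pillow)."""
    from PIL import Image  # third-party; not the tool the project actually used

    with Image.open(tiff_path) as img:
        size = browse_size(img.width, img.height, target_width)
        # JPEG supports grayscale ("L") and RGB; convert anything else to RGB.
        if img.mode not in ("L", "RGB"):
            img = img.convert("RGB")
        resized = img.resize(size, Image.LANCZOS)
        resized.save(jpeg_path, "JPEG", progressive=True, optimize=True)


if __name__ == "__main__":
    # An 8.5 x 11 inch page scanned at 600 dpi is 5100 x 6600 pixels.
    print(browse_size(5100, 6600))
```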
Creation of PDF files from the TIFF images was performed automatically as a function of locally written loader software developed by FCLA for the Florida Heritage project. The loader calls a LeadTools custom ActiveX control to open sets of TIFF images, and then uses Thomas Merz's PDFlib software to build the PDF.
Resource Description and Access
The project cataloger at the Historical Museum of Southern Florida created full MARC catalog records for each bibliographic item in Reclaiming the Everglades following two sets of guidelines:
- Caroline R. Arms, Access Aids and Interoperability. (Washington D.C.: National Digital Library Program, Library of Congress), August 18, 1997. Online, URL: http://memory.loc.gov/ammem/award/docs/interop.html
- Cataloging and Access Guidelines for Electronic Resources (CAGER) Committee, Cataloging Guidelines for Electronic Resources, Part 1: Digitized Collections - general guidelines. (Gainesville, FL: Florida Center for Library Automation), August 2000. Online, URL (July 2003 version): http://www.lib.usf.edu/techservices/CAGER/CAGERGuidelines-Pt1rev.html
All MARC records created for Reclaiming the Everglades were loaded in the WebLUIS database maintained by the Florida Center for Library Automation and made searchable through the Reclaiming the Everglades interface maintained by Florida International University. Copies of all project MARC records were also sent to the Library of Congress for integration into American Memory.
For every bibliographic item in Reclaiming the Everglades, a file of structural metadata was created to indicate the relationship between the physical units of digitization (TIFF, JPEG and other images) and the logical units of publication (pages, chapters, and other parts). The metadata format used is a modified version of the Elsevier EFFECT format called DataSet.Toc. For each electronic resource (book volume, journal issue, manuscript, etc.), the DataSet.Toc file:
- identifies and names the image files comprising the resource,
- defines the order of images,
- identifies and names the subsections (such as chapters),
- says which images belong to particular subsections, and
- establishes the order and hierarchy of subsections.
More detailed information about the structural metadata format used in the project is available from the Florida Center for Library Automation web site.
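The relationships that the structural metadata records (ordered images, named subsections, and a hierarchy among them) can be pictured with a small in-memory model. The sketch below is not the DataSet.Toc syntax, which is not reproduced here; the `Section` class, the section titles, and the sample file names other than "md00480125_002a.jpg" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """A logical unit (folder, chapter, article) and the images that belong to it."""
    title: str
    images: List[str] = field(default_factory=list)           # in reading order
    children: List["Section"] = field(default_factory=list)   # ordered subsections


def image_order(section: Section) -> List[str]:
    """Flatten the hierarchy: a section's own images first, then each child in order."""
    ordered = list(section.images)
    for child in section.children:
        ordered.extend(image_order(child))
    return ordered


# Hypothetical structure for the folder "Articles about the Everglades".
folder = Section(
    title="Articles about the Everglades",
    children=[
        Section("First article", images=["md00480125_001a.jpg"]),
        Section("Second article", images=["md00480125_002a.jpg", "md00480125_002b.jpg"]),
    ],
)

if __name__ == "__main__":
    for name in image_order(folder):
        print(name)
```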
Linking from MARC Records to Images
FCLA assigned PURLs (Persistent URLs) to each item display and added these PURLs to the 856 field of each MARC record, so that the WebLUIS catalog records can link to both the JPEG images and the PDF files. Copies of the records were sent to the Library of Congress for integration into American Memory. Minor changes were made to the records for full compatibility with American Memory, for example, to ensure that the address of the owning institution is displayed. Also delivered to the Library of Congress were thumbnail images to represent the item or group of items described in each catalog record. American Memory bibliographic displays link through the same PURLs to the digital images mounted at FCLA.
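As a rough illustration of what such a link looks like in a record, the sketch below formats an 856 field in a common human-readable MARC display style. The indicator values "4" and "0" (access by HTTP, link to the resource itself) and the subfields $u (URI) and $y (link text) are standard MARC 21; the function name, the sample PURL, and the link text are hypothetical, and the actual fields FCLA wrote may have differed.

```python
from typing import Optional


def field_856(purl: str, link_text: Optional[str] = None) -> str:
    """Format an 856 (Electronic Location and Access) field as display text."""
    # Indicators "4" and "0": accessed by HTTP; the URL points to the resource itself.
    parts = ["856 40", f"$u {purl}"]
    if link_text:
        parts.append(f"$y {link_text}")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder PURL for illustration only; not a real project address.
    print(field_856("http://purl.example.org/item/md00480125", "Page images"))
```

Because the catalog record stores only the persistent URL, the image files can be moved on the server without any change to the MARC record, as long as the PURL resolver is updated.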