Building Digital Collections: A Technical Overview
The American Memory historical collections at the Library of Congress are the product of a permanent commitment to explore and establish best practices for the digitization, online presentation, access, and digital preservation of historical materials. This page documents both current solutions to technical challenges and solutions devised and implemented in the past. It is updated and expanded periodically.
Information about copyright, privacy and publicity rights regarding the Library of Congress collections can be found on the Library's Legal Notices page.
Consult the Library's Learning Page for information about copyright and fair use.
Metadata
The Library of Congress' commitment to digitizing historical materials and making them broadly accessible led to an early and persistent concern with establishing versatile and flexible metadata protocols. The documents below explore some of our recent experiences.
Available and Useful: OAI at the Library of Congress. Describes the Library of Congress's experience as an early adopter of the OAI Protocol for Metadata Harvesting. Published in: Library Hi Tech 21(2), 2003, pp. 129-139 [DOI:10.1108/07378830310491899]
Library of Congress collections for which records are available for harvesting through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Item-level metadata for some Library of Congress collections of historical materials is available for harvesting using OAI-PMH version 2.0. Included are selected collections from American Memory and the Prints & Photographs Division Online Catalog.
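To illustrate how OAI-PMH harvesting works, the following Python sketch builds ListRecords request URLs and extracts record identifiers, Dublin Core titles, and the resumptionToken that a harvester uses to page through a large result set. It assumes the oai_dc metadata format. The base URL https://example.org/oai is a placeholder, not a Library of Congress endpoint, and the helper names are hypothetical. The sketch uses only the standard library and performs no network I/O.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if token is not None:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return ([(identifier, [titles]), ...], resumptionToken or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would fetch each URL, pass the response body to parse_list_records, and repeat with the returned token until it comes back as None.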
Basic Genre Terms for Cultural Heritage Materials. This basic list of genre terms has been compiled to facilitate the American Memory descriptive record normalization project.
Digital Preservation
American Memory raises preservation challenges on two fronts: preserving original Library items fully and accurately in digital form, and designing this vast treasury of digital objects so that their utility and accessibility survive and flourish beyond the inevitably limited lifespan of any single technological platform. The links below offer examples of how the Library is meeting these challenges.
Digital Audio-Visual Preservation Prototyping Project. The audio-visual prototyping project is developing new approaches for reformatting recorded sound and moving image collections and experimenting with new ways to present them to researchers. (July 2003)
Scanning and Conversion
Because digitizing priceless historical materials poses unique challenges, American Memory demands continuous and resourceful attention to evolving standards and practices in scanning and other methods of digital conversion. The documents below reflect some of the most authoritative and pertinent of these efforts; other information is linked under Background Papers below.
The Library of Congress is participating in the Federal Agencies Digitization Guidelines Initiative (FADGI), a collaborative effort formed by federal agencies in 2007 to define common guidelines, methods, and practices for digitizing historical content in a sustainable manner. FADGI has two working groups, Still Image and Audio-Visual, each with its own documentation.
The Technical Guidelines for Digitizing Cultural Heritage Materials, released by FADGI in 2010, is the master document defining the group's recommended guidelines for still images. These guidelines will be updated on a regular basis.
Library of Congress Technical Standards for Digital Conversion of Text and Graphic Materials (PDF) (280 Kb)
The Library is currently revising a series of standards and best practices to guide the Library's digital conversion efforts. These documents detail the current digitization standards followed by the Library. (December 2006)
Tables for Quick Reference
- LC Baseline Tags for TIFF Images (PDF) (32 Kb)
- Summary of LC Image Quality Standards by Document Type and Expected Outcome (PDF) (44 Kb)
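To show what checking an image against a set of baseline tags can involve, here is a minimal Python sketch that reads the first image file directory (IFD) of a classic TIFF file and reports which required fields are missing. The tag set used is the one the TIFF 6.0 specification lists for RGB full-color images; it is an assumption standing in for the LC Baseline Tags table, which may require different or additional tags. The function names are hypothetical.

```python
import struct

# ASSUMPTION: required fields for RGB images per TIFF 6.0 (tag number -> name),
# not the LC Baseline Tags list itself.
REQUIRED_TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    273: "StripOffsets",
    277: "SamplesPerPixel",
    278: "RowsPerStrip",
    279: "StripByteCounts",
    282: "XResolution",
    283: "YResolution",
    296: "ResolutionUnit",
}

def first_ifd_tags(data: bytes) -> set[int]:
    """Return the tag numbers present in the first IFD of a classic TIFF."""
    order = data[:2]
    if order == b"II":      # little-endian ("Intel") byte order
        endian = "<"
    elif order == b"MM":    # big-endian ("Motorola") byte order
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file: bad magic number")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = set()
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, field type, count, value/offset.
        start = ifd_offset + 2 + 12 * i
        (tag,) = struct.unpack(endian + "H", data[start:start + 2])
        tags.add(tag)
    return tags

def missing_required_tags(data: bytes) -> list[str]:
    """List the names of required tags absent from the first IFD."""
    present = first_ifd_tags(data)
    return [name for tag, name in sorted(REQUIRED_TAGS.items()) if tag not in present]
```

For a file on disk, the bytes can be read with open(path, "rb").read() and passed to missing_required_tags. Reading only tag numbers keeps the sketch short; a fuller check would also validate each field's type and value.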
Conversion Specifications for Contracted Scanning Services. NDLP Requests for Proposals for scanning and text conversion of original paper documents, microfilm, and pictorial materials. [3 documents] (1996)
Illustrated Book Study: Digital Conversion Requirements of Printed Illustrations. (Available in HTML and PDF versions.) By Anne R. Kenney and Louis H. Sharpe II with Barbara Berger, Rick Crowhurst, D. Michael Ott, and Allen Quirk. Report prepared for the Library of Congress by The Cornell University Library Department of Preservation and Conservation and Picture Elements, Incorporated, to determine the best means for digitizing the vast array of illustrations used in 19th and early 20th century commercial publications. (July 1999)
Final Report of the Library of Congress Manuscript Digitization Demonstration Project. (Available in HTML and PDF versions.) Includes copies of sample images created during a two-phase investigation of best practices for digitizing manuscript documents. Sponsored by the Library of Congress Preservation Office in cooperation with the National Digital Library Program. (October 1998)
Recommendations for the Evaluation of Digital Images Produced from Photographic, Micrographic, and Various Paper Formats. A report from the Image Permanence Institute (IPI) in Rochester, New York, providing recommendations for methods to evaluate the performance and products of scanning service providers. Principal investigators were James Reilly and Dr. Franziska Frey. (May 1996)
American Memory DTD for Historical Documents. The text of digital reproductions in American Memory is almost always marked up in SGML using a TEI-conformant DTD. The same DTD is used whether the text is generated by human transcription or optical character recognition (OCR). [3 documents] (June 1998)
EAD Finding Aids. Links to archival finding aids available online at the Library of Congress. Several of the American Memory collections have associated archival finding aids. These finding aids have usually been developed to guide users of the physical collection, organized into boxes and folders. The finding aids have been marked up in XML following the Encoded Archival Description (EAD) standard. They may be viewed as framed and unframed HTML and printed as PDF; the HTML and PDF versions are generated from the EAD XML. (January 2006)
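Because the HTML and PDF versions are generated from the EAD XML, a simplified Python sketch of the HTML side may help. It walks the components (<c>, or <c01> through <c12>) inside <dsc> of an un-namespaced EAD document and renders each component's title and box/folder containers as nested HTML lists. This is an illustration under stated assumptions, not the Library's actual transformation; the function names are hypothetical and element handling is deliberately minimal.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Matches EAD component tags: unnumbered <c> or numbered <c01> ... <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

def component_html(comp) -> str:
    """Render one component and its nested components as an <li>."""
    did = comp.find("did")
    title = did.findtext("unittitle", default="").strip() if did is not None else ""
    containers = []
    if did is not None:
        # e.g. <container type="box">1</container> becomes "Box 1"
        for cont in did.findall("container"):
            containers.append(f"{cont.get('type', '').capitalize()} {cont.text or ''}".strip())
    label = escape(title)
    if containers:
        label += " (" + escape(", ".join(containers)) + ")"
    children = [component_html(c) for c in comp if COMPONENT.match(c.tag)]
    nested = "<ul>" + "".join(children) + "</ul>" if children else ""
    return f"<li>{label}{nested}</li>"

def dsc_to_html(ead_xml: str) -> str:
    """Render the container list (<dsc>) of a finding aid as nested <ul> lists."""
    root = ET.fromstring(ead_xml)
    dsc = root.find(".//dsc")
    if dsc is None:
        return "<ul></ul>"
    items = [component_html(c) for c in dsc if COMPONENT.match(c.tag)]
    return "<ul>" + "".join(items) + "</ul>"
```

Keeping the XML as the single source and deriving every display format from it means a correction to the finding aid needs to be made only once.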
Background Papers
The following documents include some of the most significant materials from the early technical history of the American Memory Web site.
Getting the Picture: Observations from the Library of Congress on Providing Online Access to Pictorial Images. Describes selected aspects of the Library's practical experience and current practices from digital capture through interactions with users, with an emphasis on the integration of access to pictorial images online with other services and activities at the Library. Published in: Library Trends. (Fall 1999)
Technical Information for Applicants to the LC/Ameritech National Digital Library Competition. This three-year competition accepted applications from 1996 to 1998. The information given to applicants in 1998 offers a snapshot of many technical issues relating to the creation and identification of digital reproductions and to interoperability across the Internet. (1996-1999)
Historical Collections for the National Digital Library: Lessons and Challenges at the Library of Congress. A two-part article in the April and May 1996 issues of D-Lib Magazine. Touches briefly on a wide range of issues associated with building a library of digitized historical collections. (1996)
Turning Pages within a Digital Reproduction. One example of a solution to a problem found in many digitization projects: how to present a sequence of images that bibliographically constitutes a single item. (May 1998)
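One way to model the page-turning problem is to treat the bibliographic item, rather than each image, as the unit of description and to derive navigation from the page order. The Python sketch below does this. The DigitalItem class, its fields, and the file names are hypothetical and are not taken from the Library's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DigitalItem:
    """One bibliographic item reproduced as an ordered sequence of page images."""
    item_id: str                   # hypothetical identifier for the whole item
    page_images: tuple[str, ...]   # image files in reading order

    def page_count(self) -> int:
        return len(self.page_images)

    def navigation(self, page: int) -> dict:
        """Return the image for a 1-based page plus previous/next page numbers."""
        if not 1 <= page <= len(self.page_images):
            raise IndexError(f"page {page} out of range 1..{len(self.page_images)}")
        return {
            "image": self.page_images[page - 1],
            "prev": page - 1 if page > 1 else None,
            "next": page + 1 if page < len(self.page_images) else None,
        }
```

For example, DigitalItem("item-001", ("0001.jpg", "0002.jpg", "0003.jpg")).navigation(1) yields the first image, no previous page, and page 2 as the next page. Because every image belongs to one record, a catalog entry can point to the item as a whole while the viewer still moves page by page.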