The Library of Congress | American Memory
»Lawmaking Home » Digitizing the Collection

Building the Digital Collection

The decision to digitize a large selection of materials from the Law Library of Congress was first broached in early 1995 when the National Digital Library Program (NDLP) began planning to digitize new collections. After nearly two years of document selection and preparation, digital conversion of Law Library materials began early in 1997. The original goal of achieving a critical mass of images from 440 volumes and 668,000 kilocharacters of associated machine-searchable, SGML-encoded text has been reached and work is ongoing.

The conversion of a collection of traditional bound volumes to a collection of digital images and text presented the Law Library conversion team with serious challenges and posed far-reaching implications.

Selecting the Materials

Several factors affected the choice of materials presented in the digital collection. In keeping with the focus of the NDLP, the Law Library selected primary source government documents that reflect the creation and evolution of the United States as documented by Congress. The Law Library also selected materials which complement other collections digitized by the NDLP, such as the papers and diaries of George Washington from the Manuscript Division.

Handling the Volumes

After selection, the next challenge faced by the joint Law Library/NDLP team was how to manage the physical material. The team decided that source documents would not be disbound as part of the digitization process. Some of the selected volumes are still in their original bindings, which have been in use for more than 150 years, while other volumes have been rebound over the years. Many volumes are fragile. The Library's conservators feared that even with great care and gentle handling, some of the volumes might not be robust enough to withstand the physical process of digitization.

With this in mind, the scanning equipment that was chosen places minimal stress on the books by using overhead scanners, digital cameras, and custom-built book cradles. The team worked closely with the conservators to ensure that the scanning methods conformed to Library of Congress standards for handling and care of source material. Every volume was examined to determine its condition, including how far it could safely be opened without damaging the text block or spine and without cracking or damaging pages and foldouts. In cases where the potential for damage was deemed too great, the team obtained permission to locate and use more robust volumes from the collections of other libraries, such as that of the U.S. Senate.

Creating Digital Images

For most text, as well as page-size illustrations and maps, images were captured as 300 dpi bitonal TIFF (Tagged Image File Format) images. These images are offered in two sizes: a quick-reference GIF (Graphics Interchange Format) and a bitonal TIFF image. Because of their age, source materials vary in tonality even on a single page, and some "noise" is inevitable in these images. Since the Law materials are published documents, the intent in scanning was to create the most legible images for online users. The exceptions are:

Creating Digitized Text

For the most part, a combination of two types of text was created for the collection: full-text transcriptions and partial keying of indexes. (See Using the Collection for a breakdown of the keying of content for individual titles.) The materials with full-text transcriptions were encoded with Standard Generalized Markup Language (SGML) according to the American Memory DTD. The text was translated with an OmniMark program to HTML 3.2 for indexing and viewing on the World Wide Web. For most materials appearing as page images only, an ASCII database was produced to provide information for each page image. The database makes possible the page-specific information for certain titles shown at the top of the page-turning feature, the links to date-related documents from the full text to the page-image-only titles, and the indexing of words and phrases for retrieval (e.g., page headings).
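As a rough illustration of how a per-page ASCII database can support the three uses described above (page-specific display information, date-based links, and word indexing), the following Python sketch may help. The record layout, field names, file names, headings, and dates are all hypothetical and invented for this example; the actual format of the American Memory database is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout, one line per page image:
# volume | page number | image file | page heading | date
# All values below are invented sample data.
SAMPLE_DB = """\
vol001|1|vol001/0001.tif|EXAMPLE HEADING ONE|1800-01-01
vol001|2|vol001/0002.tif|EXAMPLE HEADING ONE|1800-01-02
vol002|1|vol002/0001.tif|EXAMPLE HEADING TWO|1800-01-02
"""

FIELDS = ["volume", "page", "image", "heading", "date"]


def load_records(text):
    """Parse the pipe-delimited ASCII text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="|")
    return list(reader)


def index_by_date(records):
    """Group page records by date, so related pages can be linked."""
    by_date = defaultdict(list)
    for rec in records:
        by_date[rec["date"]].append(rec)
    return by_date


def index_by_word(records):
    """Map each lowercased heading word to the (volume, page) pairs using it."""
    by_word = defaultdict(set)
    for rec in records:
        for word in rec["heading"].lower().split():
            by_word[word].add((rec["volume"], int(rec["page"])))
    return by_word


records = load_records(SAMPLE_DB)

# Page-specific information for display above a page image.
first = records[0]
print(f"{first['heading']} (volume {first['volume']}, page {first['page']})")

# Pages that share a date, regardless of volume.
for rec in index_by_date(records)["1800-01-02"]:
    print(rec["image"])

# Pages whose heading contains a given word.
print(sorted(index_by_word(records)["two"]))
```

A flat, delimited text file of this kind is easy to produce during keying and easy to load into whatever indexing system is in use, which is one plausible reason for choosing plain ASCII.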

Assuring Quality

Ensuring the accuracy of the digitized product is a significant challenge with such a large quantity of material. At least 99.5 percent of the images must be acceptable, and the final text product must meet an accuracy requirement of 99.95 percent. The staff uses a combination of automated diagnostic tools and time-honored proofreading techniques to ensure that the finished product meets these standards.
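To make these thresholds concrete, the short Python sketch below converts an accuracy requirement into a maximum allowable number of faulty items. It rests on two assumptions not stated in the text: that text accuracy is measured per character, and that one kilocharacter equals 1,000 characters. The 1,000-image batch size is purely illustrative.

```python
from decimal import Decimal


def max_allowed_errors(total_units: int, accuracy_percent: str) -> int:
    """Return the largest number of faulty units that still meets the target.

    The percentage is passed as a string so Decimal can represent it exactly.
    """
    error_percent = Decimal(100) - Decimal(accuracy_percent)
    return int(Decimal(total_units) * error_percent / Decimal(100))


# Assumption: 668,000 kilocharacters = 668,000,000 characters.
TEXT_CHARACTERS = 668_000 * 1_000

# Text at 99.95 percent: errors allowed per 10,000 characters.
print(max_allowed_errors(10_000, "99.95"))

# Text at 99.95 percent: errors allowed across the whole text goal.
print(max_allowed_errors(TEXT_CHARACTERS, "99.95"))

# Images at 99.5 percent: unacceptable images allowed per 1,000 (illustrative batch).
print(max_allowed_errors(1_000, "99.5"))
```

Under these assumptions, the text requirement allows roughly one error in every 2,000 characters, and the image requirement allows one unacceptable image in every 200.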
