American Memory
The National Digital Library Program:
Archived Documentation

The Library of Congress / Ameritech National Digital Library Competition (1996-1999)


Digital Formats for Content Reproductions

NOTE: Links to resources outside the Library of Congress are to URLs that were active when this set of archived documentation was actively maintained. Some links may no longer be active because resources have been removed. If a link is active, the resource may have changed substantially since the documentation was created. No attempt will be made to trace the linked resources or to suppress bad links. The URLs are being retained for their value as historical evidence.

Carl Fleischhauer
Technical Coordinator, National Digital Library Program, Library of Congress
July 13, 1998

This document replaces the August 1996 version.


I. Introduction
II. Pictorial Materials
III. Textual Materials Reproduced as Searchable Text and Images
IV. Textual Materials Reproduced as Images
V. Maps
VI. Sound Recordings
VII. Moving-image Materials
VIII. Headers for Computer Files

I. Introduction

This document and two others have been prepared, in part, to offer guidance to applicants in the Library of Congress/Ameritech competition. The other two documents are Digital Historical Collections: Types, Elements, and Construction and Access Aids and Interoperability.

The trio of documents represents the evolution of the Library of Congress digital conversion activity during the years 1996 through 1998. The ideas and approaches outlined represent the collection-digitization effort of the American Memory pilot program (1990-1994) and the operational National Digital Library Program (1995-1998) that has followed the pilot. The Library of Congress recognizes that many avenues remain unexplored and that the process of trying and adopting new technology will continue.

The Library's selections of formats for archiving and for World Wide Web access to American Memory collections represent an attempt to balance reproduction quality, convenience of access, likely longevity of the format itself, production cost, and a preference for true standard formats or for industry formats that have been widely adopted. At this writing, the formats that offer the greatest stability and promise to endure for several years are the many image formats that reproduce documents and pictorial materials, and those used for searchable texts, including formats that employ Standard Generalized Markup Language (SGML) and Hypertext Markup Language (HTML). Formats that reproduce the time-based content of sound recordings and moving-image collections have less-certain longevity.

II. Pictorial Materials

For pictorial collections, the Library produces three or four images for each original item. These fall into three categories:

Preview image or thumbnail image. A small image, typically presented with the bibliographic record, to allow users to judge whether they wish to take the time to retrieve a higher quality image.

Tonal depth: 8 bits per pixel
Format: GIF
Compression: Native to GIF
Spatial resolution: From about 150x100 to about 200x200 pixels
Example: Thumbnail (20 kb)

Service (or reference) image(s). "Fetchable" higher quality image(s), compressed for speed of access with a lower spatial resolution than the archival image. In current American Memory pictorial collections, only one service image is provided; future collections will offer two (or more) at varying levels of resolution. Service images are sometimes called reference images in the library sense of "available to researchers," and not in the scientific sense, where a reference image is typically the highest quality version, used as a yardstick for measurements.

Tonal depth: Grayscale: 8 bits per pixel; color: 24 bits per pixel
Format: JFIF (JPEG File Interchange Format)
Compression: JPEG (generally about 10:1 compression for grayscale, 20:1 for color)
Spatial resolution: Moderate class ranges from about 500x400 to about 1000x700 pixels, with 640x480 the specification in current production contracts; higher resolution class (possible future mode) will range from about 1000x700 to 4000x3000; both moderate and higher resolution will be offered to users
Example: Service image (81 kb)

Archival image. An uncompressed image free of the artifacts resulting from lossy compression, provided to users for reproduction or held for future reprocessing as compression or other image-processing standards change. In some libraries, images compressed with the lossless LZW algorithm fill this niche. The Library of Congress plans to provide its archival pictorial images to WWW users in the near future. Such images are especially useful when making halftones for printing and there have been a number of instances in which publishers have used the Library's uncompressed images to illustrate newspaper articles.

Tonal depth: Grayscale: 8 bits per pixel; color: 24 bits per pixel
Format: TIFF (Tagged Image File Format)
Compression: Uncompressed
Spatial resolution: Moderate type (past practice) ranges from about 500x400 to about 1200x1000 pixels; higher resolution type (coming soon) will range from 3000x2000 to 5000x4000; only the highest resolution will be archived.
Example: Archival image (1.3 mb)
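
The relationship between the pixel dimensions, tonal depth, and compression ratios quoted above can be checked with simple arithmetic. The following is a minimal sketch, not a Library tool; the function names are hypothetical, file headers are ignored, and the compression ratios are the approximate figures given in the specifications.

```python
def uncompressed_bytes(width, height, bits_per_pixel):
    """Size of the raw pixel data in bytes (file headers ignored)."""
    return width * height * bits_per_pixel // 8


def estimated_compressed_bytes(raw_bytes, ratio):
    """Approximate size after lossy compression at ratio:1."""
    return raw_bytes / ratio


# 640x480 service image in 24-bit color, JPEG at roughly 20:1
raw_service = uncompressed_bytes(640, 480, 24)
print(raw_service)                                         # 921600
print(round(estimated_compressed_bytes(raw_service, 20)))  # 46080

# 1200x1000 archival image in 8-bit grayscale, stored uncompressed
print(uncompressed_bytes(1200, 1000, 8))                   # 1200000
```

Actual JPEG file sizes depend on image content, so the compressed figure is only an estimate.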

Alternative formats. Several organizations have used the Kodak PhotoCD (Image Pac) format in their imaging projects. Originally associated only with CD-ROM disks, this multi-resolution format may now be written to other storage media. The Library has not had extensive experience with PhotoCD/Image Pac. Archives wishing to produce collections that are interoperable with those at the Library of Congress and who plan to use PhotoCD technology should either determine how direct access to those images may be provided to WWW users or plan to reprocess the Image Pac images to produce GIF and JFIF/JPEG images for WWW access in association with the American Memory site. Other emerging image options include the FlashPix format and the use of PNG-compression; the Library of Congress has not used these options but is interested to hear from other libraries or archives that have tried them. See the section on maps for discussion of wavelet compression.

III. Textual Materials Reproduced as Searchable Text and Images

Searchable transcriptions of books, pamphlets, or manuscripts can be a tremendous aid to a researcher seeking instances of particular words or phrases in a textual work. Transcribed text, especially when encoded with markup language, can also facilitate the researcher's navigation of a document. The cost of providing perfect or near-perfect transcriptions is very high, however, and, for many researchers, proper understanding of a document may depend upon seeing not only the transcription but also a facsimile image (and in some cases, the original). For these reasons, the Library has experimented with the presentation of manuscript and printed matter items as a coordinated set of page-images and searchable text. In some older American Memory collections, however, separate images of illustrations and tables were provided in lieu of full page-image sets.

Thus far, the Library has employed a "text-in-front, images-behind" presentation, appropriate when the accuracy of the converted text is reasonably high and when the text is formatted sufficiently to permit a comprehensible display. In the text-in-front mode, once a researcher has identified an item he or she wishes to view, e.g., by searching the bibliographic records for a collection, the item is presented as a text. In most American Memory online collections, an HTML version of the text is displayed together with the option of displaying the SGML version. As the researcher scrolls through the text, hyperlinks permit the researcher to view the facsimile page images. For example, see the book Modern Dancing by Mr. and Mrs. Vernon Castle.

An alternate approach is "image-in-front, text-hidden-behind," suitable when text accuracy is not as high and/or when the text is not formatted for display. Handsome presentations in the image-in-front mode may be seen in the JSTOR and Making of America projects.

The Library encodes its documents using Standard Generalized Markup Language (SGML; ISO 8879), as described below. The Library's American Memory document type definition (DTD) conforms to the international guidelines for humanities texts developed by the Text Encoding Initiative (TEI). The SGML-encoded version of the text serves as an archival file and is also made available online. In addition, online access is provided to HTML texts derived from the SGML archival files. The Library's transcription requirement for contractors is 99.95 percent accuracy compared to the original.

Since the Library always places SGML texts online together with bibliographic records or a finding aid, the headers within the SGML documents contain minimal bibliographic information. (At this time, there are no headers associated with the HTML versions of the texts.) For a more detailed description of the Library's approach to reproducing texts using SGML, see American Memory DTD for Historical Documents.

The page and illustration images associated with the searchable texts employ the formats for tonal and bitonal document images described below. The Library continues to tinker with its approach but at this writing, the most frequent structure offers access to the archival or master version of the page image (see below; typically 300 dpi bitonal TIFF/ITU Group IV images). Also, some recent additions to American Memory have presented inline versions of illustrations or pages that contain illustrations (see below; typically 200x400-pixel tonal GIF images).

The American Memory DTD employs entity references to link to page-image files. These entity references are embedded in the tags that mark the location of a link in the SGML-encoded document; the entity values typically consist of the filename, without extension, for a page image, illustration, or table. Each SGML-encoded document has an associated entity file, a text file of entity declarations that cite the correspondence between the entity value in the SGML document and the digital filename to which it refers.

Searchable text with markup: ASCII text, with TEI-conformant SGML markup
Document Type Definition (DTD): American Memory DTD (ammem.dtd)
Samples: SGML text (153 kb) and entity file (9 kb)
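
The correspondence between entity names and image filenames described above can be illustrated with a short sketch. It assumes that the entity file uses generic SGML external entity declarations of the form shown in SAMPLE; the entity names, filenames, and notation names are hypothetical, and the exact layout of the Library's entity files may differ.

```python
import re

# Generic SGML external entity declaration:
#   <!ENTITY name SYSTEM "filename" NDATA notation>
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]+)"[^>]*>')


def parse_entity_file(text):
    """Map each entity name to the filename it declares."""
    return {name: filename for name, filename in ENTITY_DECL.findall(text)}


# Hypothetical entity file content, for illustration only
SAMPLE = """
<!ENTITY p0001 SYSTEM "p0001.tif" NDATA tiff>
<!ENTITY i0007 SYSTEM "i0007.gif" NDATA gif>
"""

table = parse_entity_file(SAMPLE)
print(table["p0001"])  # p0001.tif
```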

Derivative texts presented in HTML format. In order to serve World Wide Web users in the most convenient way, the Library presents HTML versions of texts that have been derived from the SGML archival files. In the Library's most recent online offerings, longer--generally book-length--texts are presented in chapters or sections. (Longer texts in older online collections will be retrofitted as time permits.) The presentation of segments of longer texts creates smaller files that are easier for users to access.

The need to present derivative text files reflects the shortage of WWW browser plug-ins or viewers capable of displaying SGML files. The Library anticipates that the emerging Extensible Markup Language (XML) specification will improve the WWW environment for handling complex marked up texts.

The HTML versions of the Library's texts are the result of a two-step process. First, the SGML texts are transformed to a format that is suitable for the indexing routines used by the InQuery search engine. Then, when a display presentation is called for, these texts are formatted as HTML on the fly. For example, see the document Journal of the House of Representatives of the United States, 1789-1873.

IV. Textual Materials Reproduced as Images

The following discussion of text page-images applies to images associated with searchable texts (see preceding section) and image-only presentations of manuscripts and printed documents.

Archival or master images. At first, projects like American Memory equated document images with "pure" black and white bitonal images. Indeed, late-nineteenth- and twentieth-century typography and line art are often successfully reproduced in a bitonal image. With documents of these types, the Library typically creates images with a spatial resolution of 300 dpi. In contrast, the Cornell University book preservation project has demonstrated the value of resolutions of 600 dpi or higher, especially when high quality printed output is desired. Bitonal images are attractive because they compress very efficiently and provide better output with many laser or ink-jet printers.

Bitonal document image: One bit per pixel
Format: TIFF
Compression: ITU Group IV
Spatial resolution: 300 dpi
Example: Bitonal page image (31 kb)
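
To see why bitonal images are attractive, it helps to work out the size of a page before Group IV compression. The sketch below is illustrative only: the 6x9-inch page is an assumed size, not one taken from a Library collection, and the 31,000-byte compressed size used for the ratio is a hypothetical figure of the same order as the example above.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def bitonal_raw_bytes(width_px, height_px):
    """Raw size at one bit per pixel, with each row padded to whole bytes."""
    row_bytes = (width_px + 7) // 8
    return row_bytes * height_px


# Assumed page size: 6 x 9 inches, scanned at 300 dpi
width, height = scan_dimensions(6, 9, 300)
print(width, height)              # 1800 2700

raw = bitonal_raw_bytes(width, height)
print(raw)                        # 607500

# Ratio against a hypothetical 31,000-byte Group IV file
print(round(raw / 31000, 1))      # 19.6
```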

Bitonal images, however, often fail to adequately reproduce pages that exhibit variation in reflectance values (brightness, darkness, and color), e.g., manuscripts or older printed matter. With such documents, the shade and condition of the paper may vary (sometimes on the same sheet) and the marks or strokes inscribed upon the sheet may vary in width and density. To reproduce such documents in bitonal form risks the loss of textual information and significant loss in the look and feel of the original.

Since 1994, the Library has been experimenting with tonal (color and grayscale) reproduction of manuscript and older printed documents. In these experiments, the Library has wrestled with the provisional determination of some materials as rare or "treasured" and others as ordinary or "routine." For treasures, the stakes are high and the Library archives uncompressed digital images. For example, 300 dpi uncompressed grayscale images were archived of the pages in four Walt Whitman notebooks and 300 dpi uncompressed color images of a Mahler music manuscript (not available at this time). For comparison, in a project at the University of Virginia Electronic Text Center, images of pages in treasured first editions are being produced at 500 dpi in 24-bit-per-pixel color.

Uncompressed tonal archival image: Grayscale: 8 bits per pixel; color: 24 bits per pixel
Format: TIFF
Compression: uncompressed
Spatial resolution: 300 dpi
Example: Grayscale uncompressed archival image (2.4 mb)

A contrasting approach is offered by the Library's Manuscript Digitization Demonstration Project (report forthcoming), which scanned some twentieth century typescripts deemed "routine" (i.e., not treasures) by a steering committee. The color and grayscale tonal images produced received modest (about 5:1) compression with the JPEG algorithm.

Compressed tonal archival image: Grayscale: 8 bits per pixel; color: 24 bits per pixel
Compression: JPEG, about 5:1
Spatial resolution: 300 dpi
Example: Grayscale compressed archival image (514 kb)

Images for convenient access. None of the archival or master image types described above provide easy access in WWW browsers. Without added software, browsers will not display TIFF images. Large JPEG/JFIF grayscale and color images require long transmission times and, once displayed in most browsers' native viewing mode, require the user to scroll from side to side and top to bottom in order to view all parts of the document page. Tonal archival images present a second problem beyond browser inconvenience: they are awkward to print. Their large file size slows the printing process and, for laser and ink-jet printers, the computer must "halftone" them (reduce them to a pattern of dots) which reduces resolution and requires even more processing time.

There is no perfect answer to this family of problems and most solutions require the production of multiple images. To solve the problem of browser display, the Library has taken to producing reduced scale, tonal images--called inline paging images below. These images can be electronically bound into a page-turning set using scripts that are interpreted by the browser.

When the archival source image is bitonal, the Library produces paging images by (1) adding shades of gray--increasing the bit depth--and blurring the image, (2) reducing the spatial dimensions, and (3) sharpening. Sometimes a dithering algorithm is applied to eliminate dot patterns that result from the process. The number of colors or shades varies according to the item at hand. Most of the Library's GIFs have been batch produced using Image Alchemy software with a setting of "six colors."
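
The idea of deriving a small tonal image from a bitonal source can be sketched in a few lines. This is not the Library's Image Alchemy procedure: block averaging is used here as a stand-in for the separate blur and reduce steps, the sharpening step is omitted, and the function name and sample data are hypothetical. The default of six shades echoes the "six colors" setting mentioned above.

```python
def bitonal_to_paging(bits, factor, levels=6):
    """Reduce a bitonal image and return rows of gray values (0-255).

    bits: rows of 0 (white paper) or 1 (black ink).
    factor: each factor x factor block becomes one output pixel.
    levels: number of evenly spaced gray shades in the result.
    """
    step = 255 / (levels - 1)
    out = []
    for by in range(len(bits) // factor):
        row = []
        for bx in range(len(bits[0]) // factor):
            block = [bits[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            ink = sum(block) / len(block)      # fraction of black pixels
            gray = 255 * (1 - ink)             # more ink means darker
            row.append(round(round(gray / step) * step))
        out.append(row)
    return out


sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(bitonal_to_paging(sample, 2))  # [[0, 255], [204, 255]]
```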

When the source image is tonal, a similar process is employed, although here the bit depth is often decreased. The Library uses the GIF format in this context (although one could use JPEG/JFIF), and GIF images are limited to 8 bits per pixel. Thus the 24-bit-deep color images must be reduced in depth and it is sometimes desirable--to keep file sizes low--to reduce 8-bit-deep grayscale as well. The production of inline paging images from grayscale or color source images may also employ "contrast stretching," an increase in contrast or brightness to ensure that "paper looks light and strokes look dark."
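
Contrast stretching can take several forms. The sketch below shows one common variant, a linear stretch that maps the darkest value present to 0 and the lightest to 255. It is an assumption that this resembles the treatment used in production; the function name and sample values are hypothetical.

```python
def contrast_stretch(pixels):
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    flat = [value for row in pixels for value in row]
    darkest, lightest = min(flat), max(flat)
    if lightest == darkest:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in pixels]
    scale = 255 / (lightest - darkest)
    return [[round((value - darkest) * scale) for value in row]
            for row in pixels]


# A dull scan: "paper" at 180 and "strokes" at 60
dull = [[60, 90], [150, 180]]
print(contrast_stretch(dull))  # [[0, 64], [191, 255]]
```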

The Library's most elaborate presentation to date is part of the Manuscript Digitization Demonstration Project (report forthcoming), involving the twentieth century "routine" typescripts referenced earlier. In this case, three images were produced for each page: (1) an archival or master image (color or grayscale, modestly compressed; see example above), (2) a browser inline paging image (color or grayscale, reduced in scale, compressed), and (3) a printing service image (bitonal, derived from the archival master).

Inline paging image: Grayscale: 4 bits per pixel; color: 8 bits per pixel
Format: GIF
Compression: Native to GIF
Spatial resolution: 400-500 pixels horizontal
Example: Grayscale inline image (61 kb)

Printing service image: Bitonal (one bit per pixel)
Format: TIFF
Compression: ITU Group IV
Spatial resolution: 300 dpi
Example: Bitonal printing service image (42 kb)

The trio of images produced by the Manuscript Digitization Demonstration Project is presented using the Library's page-turning script. The inline paging images are displayed "up front," with the printing service and archival master images available to be fetched. For example, see the playscript for The Comedy of Errors.

Alternate formats. PDF (Portable Document Format from Adobe Corporation). The Library has not had extensive experience using this format to reproduce historical collections; archives wishing to produce collections that are interoperable with those at the Library of Congress and who plan to use PDF must be capable of helping to guide their implementation.

The special problem of printed halftone illustrations. Printed halftones present special problems in reproduction because of interference between the spatial frequency of the halftone dot pattern and the spatial frequency applied by scanning and/or output devices. Interference "waves" are produced when the two frequencies combine and these waves manifest themselves as moiré patterns that degrade the image. There are a number of treatments that can mitigate or correct this degradation but not all are practical in a production-line environment. In order to offer an overview, four approaches to addressing this problem are listed below; the Library of Congress has only tried two of them to date.
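
The interference described above can be approximated with the standard sampling (aliasing) relation: a periodic pattern of frequency f, sampled at frequency fs, appears at the frequency |f - k*fs| for the nearest whole multiple k. The sketch below applies that relation to a halftone screen. It is a simplification, since real halftone dots are not pure sinusoids, and the 133 lines-per-inch screen ruling is an assumed example value.

```python
def moire_frequency(screen_lpi, sample_dpi):
    """Apparent frequency (cycles per inch) of a screen after sampling."""
    nearest_multiple = round(screen_lpi / sample_dpi) * sample_dpi
    return abs(screen_lpi - nearest_multiple)


# Assumed 133 lpi halftone screen
print(moire_frequency(133, 300))  # 133: dots resolved, no coarse pattern
print(moire_frequency(133, 150))  # 17: coarse, visible interference
print(moire_frequency(133, 100))  # 33
```

A low result relative to the screen ruling indicates a coarse pattern of the kind that shows up as moiré when an image is rescaled for display.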

1. Descreening and rescreening. This approach removes the halftone dots and converts the image to grayscale, then rescreens it to produce a new halftone. Xerox has incorporated this approach in some of its advanced scanning devices and it has also been employed in the Cornell University Library's book-reformatting projects. In the implementations known to the Library, the process seems to depend upon "four-square" capture of the source items. This requires the placement of flat sheets of paper face down upon a scanner's glass surface. Books that receive this treatment must be disbound. Furthermore, if a page containing both text and illustration is captured, the system (or operator) must zone the page and capture text and illustration separately. Thus far, the American Memory/National Digital Library Program at the Library of Congress has been digitizing books for access and not for preservation. Since the volumes have not been disbound, the Library has not had the opportunity to employ this technique.

2. Capture at high enough resolution to reproduce the halftone dots. This approach requires capture resolutions at one or more multiples of the original halftone screen. Thus, for books with high-quality illustrations, the capture might be at 600 dpi or higher. In order to reproduce the scanned image without loss, the screen display or printer must also offer high resolution. In order to produce reduced-resolution (smaller) images for access, a post-process consisting of descreening/rescreening, converting to grayscale, or dot-randomization would have to be applied. The Library has not availed itself of this technique.

3. Grayscale reproduction. For many illustrations, this approach offers a reasonable onscreen rendering for the user, although if the image is rescaled at display time (for example, reduced in size to fit the screen) some moiré patterns may result. Since printed output from a typical laser printer requires that grayscale images be halftoned by the computer (a similar but not identical process to the halftoning used in letterpress and offset printing), paper copies produced from these grayscale images may also suffer from moiré patterns. If a page containing both text block and illustration is captured in grayscale at moderate levels of resolution (e.g., 200-400 dpi), the grayscale treatment that benefits the illustration may injure the clarity of the typography. Thus, one may wish to zone the page and capture illustration and text separately or capture two versions of the page image with settings that favor, first, the text and, second, the illustration. Experiments carried out by a Library contractor have indicated that processing the grayscale images with a combination of high- and low-pass filters and blurring and sharpening may mitigate some unwanted effects. The Library began using this technique for printed matter in 1997.

Grayscale image of printed halftone:
Format: JPEG
Compression: Approx 20:1
Spatial resolution: 300 dpi
Example: Grayscale image of halftone (157 kb)

4. Randomization of scanner "dot pattern." This process reproduces printed halftones as bitonal images to which a special diffuse dithering treatment is applied at scan time (or in post-processing a grayscale image). This reduces but does not eliminate moiré patterns. The effect on typography is not as severe as the effect produced by grayscale capture, although it adds speckles to white areas surrounding the type.

The Library used this approach for images captured during the pilot project (1990-1994). The contractor used a Xerox K5200 scanner to capture random-dot-pattern images of printed halftones at 300 dpi. When the diffuse dithering treatment is applied, this scanner's software creates files in the PCX format (a format associated with ZSoft's PC PaintBrush software).

bitonal image:
Format: PCX
Compression: Native to PCX
Spatial resolution: 300 dpi
Example: Random-dot printed halftone (404 kb)
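
The scanner's diffuse dithering treatment is not documented here, so the following sketch substitutes a well-known technique of the same family, Floyd-Steinberg error diffusion, to show how a grayscale image can be reduced to a bitonal one without a regular dot pattern. It should not be read as the algorithm used by the scanner software; the function name is hypothetical.

```python
def floyd_steinberg(gray):
    """Dither rows of 0-255 gray values to rows of 0 (black) or 1 (white)."""
    height, width = len(gray), len(gray[0])
    work = [[float(value) for value in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            out[y][x] = 1 if new else 0
            error = old - new
            # Push the rounding error onto neighbors not yet visited.
            if x + 1 < width:
                work[y][x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += error * 3 / 16
                work[y + 1][x] += error * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += error * 1 / 16
    return out


# A uniform dark gray patch becomes a scatter of white dots on black
patch = [[64] * 16 for _ in range(16)]
dithered = floyd_steinberg(patch)
white = sum(map(sum, dithered))
print(white, "white pixels of", 16 * 16)
```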

The Library's random-dot-pattern images can be printed on a laser printer with good results but do not rescale well for screen display. In order to provide a screen display of an illustration captured using this approach, the Library creates a tonal thumbnail version at a reduced scale.

Browser thumbnail image from random-dot: Grayscale: 4 bits per pixel
Format: GIF
Compression: Native to GIF
Spatial resolution: 250 pixels horizontal
Example: Grayscale inline image from dithered printed halftone (8 kb)

V. Maps

The Library's Geography and Map Division digitizes its maps with a large-platen flatbed scanner (32x24 inches). The archival or master image has a spatial resolution of 300 dpi and is captured as a color image with a tonal resolution of 24 bits per pixel. (Higher spatial resolution may be called for in future projects in which maps are more detailed than the current round of work.) The resulting uncompressed archival files are large, typically ranging from 100-300 MB.

Through a generous gift from LizardTech, Inc., of Seattle, Washington, the Library has been able to use a newly developed approach for the presentation of these images in the WWW. The format is called "MrSID" (for multiresolution seamless image database). This approach for the compression, storage, and retrieval of large digital images was derived from the research efforts of the Los Alamos National Laboratory, New Mexico. The process entails the use of compression software that creates the proprietary .sid files, typically resulting in compression of 20:1 or greater. In contrast to formats that rely on tiling, MrSID employs a single compressed image and does not require any special hardware. The files are compressed with a proprietary "wavelet" algorithm that also provides the capability of segmented display ("zoom in") in browser software. A second program on the Library's server delivers segments of these files to users, painting a GIF-format image of the segment in a window in the user's browser display.
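
The file sizes quoted for maps follow from the scanner dimensions and capture settings. As a rough check, the sketch below computes the raw size of a scan that fills the whole 32x24-inch platen and the size after 20:1 compression. Function names are hypothetical and file headers are ignored.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scan covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


full_platen = raw_scan_bytes(32, 24, 300, 24)
print(full_platen)               # 207360000 (about 207 MB)
print(round(full_platen / 20))   # 10368000 (about 10 MB at 20:1)
```

The result falls within the 100-300 MB range given above for uncompressed archival files.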

Map archival image: Color: 24 bits per pixel
Format: TIFF
Compression: Uncompressed
Spatial resolution: 300 dpi
Example: Archival Image (60 mb)

Map service image: Color: 24 bits per pixel
Format: SID
Viewer: Available from LizardTech web site.
Compression: MrSID wavelet
Spatial resolution: 300 dpi
Example: Map service image (2mb), image as presented at LC

In the final presentation of the map images, the Library adds a preview or thumbnail version that displays with the bibliographic record. The user clicks this image to move to the MrSID zoom-capable display. At this time, no direct link to the archival or master image is provided.

VI. Sound Recordings

The large files required to reproduce audio have forced the computer multimedia industry to continually search for new and better compression and playback schemes. For this reason, the digital audio formats suitable for the WWW are less stable than those for text and pictorial images; computer-digital audio files produced today are likely to become obsolete with dismaying speed. The Library does not see a practical way at this time to make an archival or master computer-file version of an audio selection. This circumstance represents a contrast with the circumstance regarding images or texts. With images, when a new distribution version is needed, the archival or master image file can be used as a source. In contrast, when a new audio file for Internet distribution is desired, the producer must return to the original item itself or to an intermediate version (e.g., a DAT digital audio tape) as a source.

American Memory offers two types of digital audio files at this time. The first is a "downloadable" file, meaning that the file must be copied to the user's local computer before it can be played. Since downloadable files are large (the four-minute recordings in the Nation's Forum collection run about 2 megabytes each), this approach consumes time (waiting for the file transfer) and storage space.

The second type is a "streaming" file that begins to play as it is being transmitted through the network. Although more convenient, these files are of slightly lower fidelity than the downloadable examples, especially since the Library has decided to cater to 14.4 kbps modems. (In the future, files that require users to have 28.8 kbps or higher modems may be produced. These will have a greater frequency range and contain less distortion due to compression.)

Audio file types:

Downloadable files:
Attributes: 22.05 kHz sample rate, 16 bit word, mono
Format (file type): WAVE (Microsoft format)
Example: WAVE file (2.4 mb)

Streaming files:
Encoding: For 14.4 modems
Format (file type): RA (RealAudio format from Progressive Networks)
Example: Streaming file
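
The attributes listed for downloadable files determine the data rate of the audio, and the modem speed determines how long a download takes. The sketch below works through both using only the Python standard library; the function names are hypothetical, the modem is assumed to deliver its full nominal 14,400 bits per second, and the one-second silent file exists only to show the attributes being written into a WAVE header.

```python
import io
import wave


def pcm_bytes_per_second(sample_rate, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio."""
    return sample_rate * (bits_per_sample // 8) * channels


def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


def silent_wave(seconds, sample_rate=22050, bits_per_sample=16, channels=1):
    """Build an in-memory WAVE file of silence with the given attributes."""
    rate = pcm_bytes_per_second(sample_rate, bits_per_sample, channels)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(bits_per_sample // 8)
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00" * (rate * int(seconds)))
    return buffer.getvalue()


print(pcm_bytes_per_second(22050, 16, 1))          # 44100
print(round(transfer_seconds(2_400_000, 14_400)))  # 1333 seconds, about 22 minutes
print(len(silent_wave(1)) > 44100)                 # True: audio data plus header
```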

VII. Moving-image Materials

The large files required to reproduce motion pictures and video have led to an outcome like that described above for audio. The digital moving-image formats suitable for the WWW are less stable than those for text and pictorial images and are destined for obsolescence. The future production of a higher resolution moving-image file for Internet distribution will require the Library to return to the original or to an intermediate version, e.g., a videotape, instead of using a computer master file as a means for "re-production."

American Memory offers two types of digital moving-image files at this time. Both are "downloadable." The higher resolution file is in the MPEG-1 format and the lower resolution is in the QuickTime format. The MPEG files will only play with good effect (relative absence of skipped-over frames) on moderately powerful desktop computers, while the Library's QuickTime files are produced at a low enough level of resolution to permit playing on less powerful end-user computers.

The Library is planning to employ streaming video files in the near future, as soon as a server can be assigned to this function and fitted with appropriate software. Preliminary plans call for the use of RealVideo, the counterpart to the RealAudio files that the Library uses for recorded sound.

Moving-image file types:

Moderate resolution
Image size: 320x240 pixels
Frame rate: 30 fps
Data rate: ca. 1.2 megabits/second (ca. 150 kilobytes/second)
Compression: MPEG-1
Format: mpg
Example: MPEG file (4.3 mb)

Low resolution files
Image size: 160x120 pixels
Color depth: 24 bits/pixel
Data rate: ca. 100 kilobytes/second
Format: QuickTime (Apple Computer format)
File extension: mov
Example: QuickTime file (2.1 mb)
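
The data rates above can be related to file size, playing time, and the degree of compression involved. The following sketch is illustrative arithmetic only. It assumes decimal units (1 megabit = 1,000,000 bits) and, for the uncompressed comparison, assumes 24 bits per pixel for the 320x240 frames, a depth that the table states only for the QuickTime files.

```python
def megabits_to_kilobytes_per_second(megabits):
    """Convert a data rate in megabits/second to kilobytes/second."""
    return megabits * 1_000_000 / 8 / 1000


def playing_time_seconds(file_bytes, bytes_per_second):
    """How long a file lasts when played at a constant data rate."""
    return file_bytes / bytes_per_second


def raw_video_bytes_per_second(width, height, bits_per_pixel, fps):
    """Data rate of the same frames with no compression."""
    return width * height * bits_per_pixel // 8 * fps


rate_kb = megabits_to_kilobytes_per_second(1.2)
print(round(rate_kb))                                       # 150

# A 4.3 MB MPEG file at about 150 kilobytes/second
print(round(playing_time_seconds(4_300_000, 150_000), 1))   # 28.7

raw = raw_video_bytes_per_second(320, 240, 24, 30)
print(raw)                                                  # 6912000
print(round(raw / 150_000))                                 # 46, i.e. about 46:1
```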

VIII. Headers for Computer Files

The Library plans to add data to the file headers for all of its reproductions over time. For now, a preliminary implementation exists for the four types listed below. Header content will almost certainly be a part of, or interplay with, the administrative and structural metadata associated with the repository described in Digital Historical Collections: Types, Elements, and Construction. The development and implementation of headers will keep pace with the Library's overall design process for metadata.

TIFF image files. The Library's most fully realized headers for images are found in the archival version of pictorial image files. The Library has been using TIFF version 5.0 but expects actions under version 6.0 to be unchanged. It is worth noting that the Library's use of TIFF formats and headers has not always gone smoothly, perhaps the inevitable result of using a "multi-flavor" set of industry conventions rather than a true standard; this fact accounts for some of the uncertainties in the description that follows. The Library has used the TIFF tags listed below. Contractors have been asked to provide typical or expected data for most tags; exceptions to the norm are noted in the comments column.

In some TIFF tags, dimensions can be rendered in several ways including inches or pixels. The Library has tended to favor pixels over inches, especially for the pictorial collections that have been chosen for digitization. Most of these are scans of negatives, copy negatives, or copy prints, whose physical sizes bear no relationship to, say, an artist's final print.

Tag, Description, and Comment
254 NewSubfileType
256 ImageWidth (LC uses actual pixel count)
257 ImageLength (LC uses actual pixel count)
258 BitsPerSample
259 Compression
262 PhotometricInterpretation
269 DocumentName (LC places identifier here, e.g., pathname or URN or "handle")
273 StripOffsets
277 SamplesPerPixel
278 RowsPerStrip
279 StripByteCounts
282 XResolution (LC uses dots per inch)
283 YResolution (LC uses dots per inch)
296 ResolutionUnit (LC uses "2" [inch])
306 DateTime (LC uses date and time scanned)
315 Artist (LC default data "Library of Congress")
282 XResolution (LC uses actual pixel count or dpi)
283 YResolution (LC uses actual pixel count or dpi)
296 ResolutionUnit (LC uses "1" [no unit specified] or "2" [inch])
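
To show how tags such as these sit in a file, the sketch below builds a minimal TIFF header and image file directory (IFD) in memory and reads the tag values back. It follows the general TIFF layout (byte-order mark, the number 42, and 12-byte directory entries) but is not a complete TIFF reader: only single SHORT and LONG values stored inline are handled, the sample contains no image data, and the function names and sample values are hypothetical.

```python
import struct

SHORT, LONG = 3, 4  # TIFF field type codes


def build_sample_header():
    """Build a little-endian TIFF header plus one IFD (no pixel data)."""
    entries = [
        (256, LONG, 640),   # ImageWidth
        (257, LONG, 480),   # ImageLength
        (258, SHORT, 8),    # BitsPerSample
        (259, SHORT, 1),    # Compression (1 = uncompressed)
        (296, SHORT, 2),    # ResolutionUnit (2 = inch)
    ]
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, value in entries:
        if field_type == SHORT:
            # SHORT values are left-justified in the 4-byte value field.
            ifd += struct.pack("<HHIHH", tag, field_type, 1, value, 0)
        else:
            ifd += struct.pack("<HHII", tag, field_type, 1, value)
    ifd += struct.pack("<I", 0)  # no further IFDs
    return b"II" + struct.pack("<HI", 42, 8) + ifd


def read_inline_tags(data):
    """Return {tag: value} for single SHORT/LONG entries in the first IFD."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF header")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for index in range(count):
        entry = ifd_offset + 2 + 12 * index
        tag, field_type, n = struct.unpack_from(order + "HHI", data, entry)
        if n == 1 and field_type == SHORT:
            (tags[tag],) = struct.unpack_from(order + "H", data, entry + 8)
        elif n == 1 and field_type == LONG:
            (tags[tag],) = struct.unpack_from(order + "I", data, entry + 8)
        # Other types (ASCII, RATIONAL) are stored by offset and skipped here.
    return tags


tags = read_inline_tags(build_sample_header())
print(tags[256], tags[257], tags[296])  # 640 480 2
```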

SGML text files. The American Memory DTD for historical texts is a TEI-conformant DTD and, therefore, requires the inclusion of a TEI header as part of every document instance. Since the Library's marked-up texts are described by bibliographic records or finding aids, the required TEI header has been modified to include a somewhat simplified element set. American Memory document headers include a small number of MARC field equivalents: title and statement of responsibility (MARC field 245), copyright registration number (MARC field 017), and the Library of Congress Control Number (LCCN; MARC field 010), as well as aggregate- and item-level identifiers.

WAVE audio files. The Library uses the following Resource Interchange File Format (RIFF) INFO list chunk data with its WAVE files:

INAM (name/title) Identifier for the item
ICRD (creation date) Date digitized by vendor as YYMMDD
IARL (archival location) Library of Congress, identifier for collection or project
ICOP (copyright) Default data: "See collection restriction statement"
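
The INFO list is an ordinary RIFF LIST chunk: the four characters LIST, a little-endian length, the list type INFO, and then one sub-chunk per field, each holding a null-terminated string padded to an even length. The sketch below builds and parses such a chunk. The field values are hypothetical placeholders that follow the conventions listed above, and the code handles only the INFO list, not a complete WAVE file.

```python
import struct


def info_subchunk(chunk_id, text):
    """One INFO field: 4-byte id, length, null-terminated ASCII text."""
    data = text.encode("ascii") + b"\x00"
    chunk = chunk_id.encode("ascii") + struct.pack("<I", len(data)) + data
    if len(data) % 2:
        chunk += b"\x00"  # RIFF chunks are padded to an even length
    return chunk


def build_info_list(fields):
    """Wrap the fields in a LIST chunk of type INFO."""
    body = b"INFO" + b"".join(info_subchunk(k, v) for k, v in fields)
    return b"LIST" + struct.pack("<I", len(body)) + body


def parse_info_list(chunk):
    """Read the fields of a LIST/INFO chunk back into a dictionary."""
    if chunk[:4] != b"LIST" or chunk[8:12] != b"INFO":
        raise ValueError("not a LIST/INFO chunk")
    (size,) = struct.unpack_from("<I", chunk, 4)
    position, end, fields = 12, 8 + size, {}
    while position < end:
        chunk_id = chunk[position:position + 4].decode("ascii")
        (length,) = struct.unpack_from("<I", chunk, position + 4)
        raw = chunk[position + 8:position + 8 + length]
        fields[chunk_id] = raw.rstrip(b"\x00").decode("ascii")
        position += 8 + length + (length % 2)
    return fields


# Hypothetical values following the conventions listed above
example = [
    ("INAM", "example-item-0001"),
    ("ICRD", "980713"),
    ("IARL", "Library of Congress, example-collection"),
    ("ICOP", "See collection restriction statement"),
]
print(parse_info_list(build_info_list(example))["ICRD"])  # 980713
```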

RealAudio files. The Library uses the RealAudio header:

Title Identifier for the item; date digitized by vendor as YYMMDD
Author Library of Congress, identifier for collection or project
Copyright Default data: "See collection restriction statement"