Available and Useful: OAI at the Library of Congress
Caroline R. Arms
Office of Strategic Initiatives
Library of Congress
It is mounted on the Library of Congress web site with the agreement of MCB University Press Ltd. (trading as Emerald), publisher of Library Hi Tech.
The Library of Congress (LC) was an early adopter of the OAI Protocol for Metadata Harvesting to make records for some of its digitized historical collections available for integration into other services. The American Memory team was attracted by a protocol for sharing its resources that promised to be straightforward to implement with no deleterious effect on the primary users. The promise was borne out in practice; the mechanics of harvesting are working well and records for additional collections are being made available every few months. Now that service builders can integrate records for cultural heritage resources from many sources, can they offer guidance as to how the scarce resources available to produce metadata should be deployed to most advantage, in particular to support discovery in different contexts? What makes simple Dublin Core records most useful? Is there value in a schema that is less complex than a full MARC catalog record, but considerably richer than the simple Dublin Core? How might metadata harvesting, and the marketplace in metadata it makes possible, be exploited to support new interfaces and enhanced navigation among related resources? The experience of the Library of Congress described here may help start conversations that can address these questions.
Soon, requests were coming in from organizations wanting to integrate an entire collection, or large selections, into their own online resources. In some cases, these were consortial organizations of libraries and museums building cultural heritage resources. In other cases, they were state or regional organizations building online resources for teachers. To non-librarians, the MARC communications format, used by the Library of Congress for its catalog records, was unfamiliar and often seen as a significant hurdle. The American Memory interface was designed for users to locate individual items, not to download batches. On several occasions, LC made an entire collection (records and images) available on an "as-is" basis. Assembling batches for download and explaining what was being delivered was time-consuming, but for an entire collection and a partner with the ability to handle the metadata as it came, it was feasible. Often, what was requested was a substantial subset of a very large collection. LC did not have the resources to do the selection or build a custom batch for download and the requesting institution did not have the capacity to receive the full collection and sift through it. Neither party had the tools or resources at hand to transform the metadata from one form to another. Many worthwhile requests for material to repurpose led to nothing because the effort involved in responding to such requests on an ad hoc basis was simply too great.
During 1996, two task forces of the newly-formed Digital Library Federation (DLF), a group of 15 research libraries engaged in projects involving digital content, met several times to consider recommendations for descriptive metadata and for an architecture to provide common access to the growing pool of resources digitized by member libraries. Both task forces found a variety of practices and local imperatives that would have to be accommodated in any shared approach. For example, three institutions were taking different "standard" approaches to the description of photographs. At LC, the Prints and Photographs Division uses MARC records and follows the Anglo-American Cataloguing Rules (AACR2) supported by additional rules for describing older and unpublished visual materials (Parker, 1982). At Harvard, there was strong support, for example from the Graduate School of Design, for the tiered approach represented by the Visual Resources Association's proposal for a core descriptive framework that distinguishes between records for underlying works and records for images of those works (VRA, 2002). At the University of California, Berkeley, a commitment had been made to archival practice rather than item-level cataloging for reasons of economy; collections of photographs would be described in archival finding aids, to be marked up in SGML (now XML) using the Encoded Archival Description Document Type Definition (DTD).
Practices and technical capabilities also varied widely across university campuses; libraries could not dictate practice to academic departments doing their own digitization or creating their own digital content unless they could provide resources or services to offset the costs of compliance. Rather than attempt to establish a single architectural design, DLF chose to start by promoting exploratory discussions, prototyping, and standards development for components that would be required in any architecture, such as authentication of users, and linking from references to cited articles. Several DLF organizations joined in a proposal to the Mellon Foundation for a project (known as Making of America II) based on a pooled collection of finding aids and a common structure for complex hierarchical objects representing digitized manuscripts, photographs, and other archival objects. Out of the experience with this XML-based metadata structure has come the more general METS standard, which is now being explored by many institutions as a mechanism for encoding the structure of complex digital objects. However, by 1999, the goal of shared access to the high-quality digital resources in the collections managed by DLF member institutions seemed no closer. Ironically, the convenience and effectiveness of the Internet search engines, such as Lycos and Google, meant that random Web pages were more conveniently available to the students and researchers these libraries served than the resources they carefully selected and cataloged. Rather than being in static HTML pages that could be indexed by the search engines, the managed resources were hidden behind a database query interface, part of the "hidden" Web.
In October 1999, a small number of DLF representatives were invited to a meeting called to mobilize work towards a universal service for scholarly literature archived by authors. That first meeting of the Open Archives Initiative (OAI) and its early history are well documented elsewhere (Lagoze and Van de Sompel, 2001; Van de Sompel and Lagoze, 2000). Halfway through the second day, participants picked up box lunches and moved outside to eat in the Santa Fe sunshine. The DLF representatives checked with each other and found a common sense of optimism that the outcome of that morning's discussion might provide a path to dealing more generally with sharing heterogeneous descriptive metadata and building access services for pooled resources.
Services could be specialized, for example, by discipline or by intended audience, or comprehensive and cross-domain. The content described could be digitized historical materials as easily as e-prints. The protocol to support collaboration among organizations building collections of papers and reports deposited by authors could be applied by libraries and museums wishing to build virtual collections of cultural heritage materials. Technology for collaboration can be independent of any particular incentive to collaborate.
The enthusiasm from that first meeting led the Library of Congress to play an active role in the Open Archives Initiative. The metadata harvesting approach proposed had a great deal in common with the initially ad hoc approach LC had taken to integrating into American Memory the materials digitized by institutions that received awards through the LC/Ameritech competition. Based on LC's experience with the competition, many characteristics of the initiative and the harvesting protocol were immediately appealing. These included:
the emphasis on keeping technical barriers to entry low for those with valuable resources to share through metadata;
the recognition that requirements for interoperability should not place unacceptable loads on the primary services of institutions choosing to provide access to their content;
a planned period of practical experimentation followed by review;
the combination of a very generic common denominator metadata schema that would be mandatory (for purposes of basic interoperability across domains) with the ability to exchange records in more specific schemas (to satisfy specialist audiences and content owners and to support access services with richer functionality);
the potential to integrate, for purposes of discovery, resources from libraries, museums, historical societies, and other cultural heritage institutions with a variety of descriptive traditions;
suitability for use in any future program similar to the LC/Ameritech competition or for any organization wishing to repurpose records or content from American Memory.
During 2000, DLF members gathered several times to discuss how, based on their own attempts at interoperation, research institutions might use the metadata harvesting approach outlined in the Santa Fe Convention to provide better access to their resources. Enthusiasm was high. One outcome was a short list of suggestions for enhancements to the protocol, including the adoption of simple Dublin Core as a mandatory metadata schema. These were submitted for consideration at a September 2000 technical meeting held to review and adapt the protocol on the basis of the initial months of experimentation. The adjustments agreed to at this meeting both made the protocol more applicable to domains beyond articles and technical reports and lowered the barrier to entry for many potential data providers.
By November 2000, with strong encouragement from Winston Tabb, the Associate Librarian for Library Services, the Library of Congress had committed itself to early implementation of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) as a data provider using selected collections from American Memory as a test. The Library of Congress has been distributing catalog records since 1901, first on cards and since around 1968 in machine-readable form; making records for digitized content available for harvesting continues the tradition of adapting to new opportunities provided by new technology. Tabb also saw participation in the initiative, and other activities supported by DLF, as important for the Library of Congress, particularly in the light of criticisms of isolation in the recent report, LC21: A Digital Strategy for the Library of Congress (National Research Council, 2000).
The commitment was possible because the barrier to entry appeared low enough not to draw significant resources away from American Memory itself. This assessment proved true in practice. The first implementation took about 40 hours of programmer time, including analysis and testing. Mappings and conversion tools that were already used for American Memory or managed in LC's Network Development and MARC Standards Office (NDMSO) were reviewed, updated, and used to advantage. These included Perl language routines for parsing MARC records, a one-way mapping from MARC to simple Dublin Core, and a table to translate characters from the ALA character set used in LC's MARC records to Unicode. The first records to be made available would be those already upgraded to serve both in LC's main catalog and for indexing in American Memory. These already supported links to the content based on persistent identifiers resolved using a Handle Server, developed by the Corporation for National Research Initiatives (CNRI) and deployed at LC. Procedures for updates to American Memory could be easily extended to update the OAI data-provider service. The computer processing required to support harvesting requests was not expected to have a perceptible effect on response time for American Memory users.
Also in late 2000, the DLF put out a call to members asking for lists of digital content for which they were willing to share metadata through the harvesting protocol. At LC, this call was passed on to the custodial divisions responsible for content in American Memory described by records in the MARC format. Another mechanism to broaden access to their content was greeted enthusiastically, particularly because the links from the metadata records would bring users to the LC site, where more context for the collections would be available and users could follow leads to other related materials. At the January 2001 public release of version 1.0 of the protocol, David Woodward was able to announce that LC's data-provider service, for which he was the programmer, was up and running with records for two American Memory collections of digitized historical materials available: roughly 3,000 records for maps, and 200 records for ballroom dance instruction manuals. The records were harvestable in two formats: the full MARC records in a simple XML schema (oai_marc) established for OAI-PMH, and unqualified Dublin Core in the oai_dc schema. Three weeks later, records for 47,000 pieces of nineteenth-century sheet music were added. By the end of 2002, over 120,000 records were available, for maps, early movies, broadsides and other printed ephemera, sheet music, the dance manuals, and the first four collections of photographs. Maps are added every month, substantial additions were made to the sheet music and broadside collections in Fall 2002, and more photograph collections are in the pipeline.
Version 1.0 of the protocol was released in January 2001 with the expectation that there would be a period of experimentation for at least a year. Starting in September 2001, a technical committee was established with membership drawn from institutions actively using OAI-PMH to make metadata available or to harvest metadata and build a service. LC was invited to participate, not only because of its experience as an OAI data provider, but also because of the importance in the library community of MARC records as a carrier for metadata. Issues and suggestions that had been brought up over the previous nine months were organized into topics, with each topic addressed through development of a discussion paper, e-mail exchanges, and a conference call.
The outcome of this process was version 2.0 of the harvesting protocol; the differences are outlined elsewhere (Van de Sompel and Lagoze, 2002). There are many differences in detail, but the underlying philosophy of a low barrier to entry remained. Proposals were accommodated if there was sufficient support, several through optional extension features. LC is taking advantage of the option to provide descriptive records for "sets" of records that can be harvested as a group. Among the additions proposed but not adopted was support for distributed querying. Proponents were encouraged to participate in the activity now known as ZING (Z39.50-International Next Generation). This activity includes a number of initiatives by Z39.50 implementers to make the accumulated wisdom and functionality embodied in 20 years of experience more broadly available and lower the barriers to implementation. Simple Dublin Core was retained as a mandatory metadata format, because no other obvious candidate exists. This author and others argued strongly that without a mandatory metadata format to ensure a basic level of interoperability at low cost to service providers, the path to richer interoperability might peter out immediately. Some proposals were deferred because they were based on technology that was not considered mature enough. For example, implementers were encouraged to form groups interested in experimenting with a protocol based on SOAP (Simple Object Access Protocol) or schema languages other than the XML schema definition language. The focus of the revision process on keeping the barrier to entry low, through continued emphasis on a simple core protocol that could be widely implemented today, was applauded by the OAI steering committee.
Version 2.0 of the OAI-PMH protocol proved as straightforward to implement at LC as the initial version. Based on experience over the first year, a revised architecture was developed to support more convenient and robust updates and additions. LC's OAI service divides records into OAI sets on the basis of American Memory collections or groups of collections. As of late 2002, there were ten sets, with one set being the aggregation of four photograph collections also available as separate sets. Updates to American Memory (and hence to the OAI data-provider service) are managed by complete replacements of record sets, not by correction or addition of individual records.
The implementation of OAI-PMH is programmed in Perl, taking advantage of the popular CGI.pm library to process and respond to HTTP requests. SGML processing routines from James Clark were used to handle the character conversion. For the first version, responses to the Identify and ListSets verbs were both in static files. ListIdentifiers requests were answered from an index, stored as a file, which mapped record identifiers to sets. The OAI verbs that return records access the source files of MARC records in the standard (ISO 2709) communication format, with dynamic conversion (using Perl) to XML.
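The mechanics are simple enough to sketch compactly. The fragment below is in Python rather than the Perl used at LC, and the base URL, set name, and record are entirely hypothetical; only the OAI-PMH and Dublin Core namespace URIs are real. It shows how a harvesting request is composed as an ordinary HTTP GET and how identifiers and titles can be pulled from a ListRecords response in the oai_dc format.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Real namespaces defined by OAI-PMH and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def build_request(base_url, verb, **args):
    # OAI-PMH layers its six verbs over ordinary HTTP GET requests.
    return base_url + "?" + urlencode({"verb": verb, **args})

def parse_list_records(xml_text):
    # Pull (OAI identifier, dc:title) pairs out of a ListRecords response.
    root = ET.fromstring(xml_text)
    return [(rec.findtext(".//" + OAI + "identifier"),
             rec.findtext(".//" + DC + "title"))
            for rec in root.iter(OAI + "record")]

# A request against a hypothetical repository and set:
url = build_request("http://example.org/oai", "ListRecords",
                    metadataPrefix="oai_dc", set="maps")

# A skeletal response of the kind such a request returns:
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:map001</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Map of the Chesapeake Bay</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""
records = parse_list_records(sample)
```

In production a harvester would also follow the protocol's resumptionToken elements to page through large result sets, which this sketch omits.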
The principal change when implementing version 2.0 was to use three tables in a relational database (using MySQL) in place of the simple set-identifier index. The tables represent OAI sets, OAI items, and the set-item links that indicate set membership. Sets can be added and related information edited through a Web form. Perl is still used for the HTTP processing, querying the MySQL database, and dynamic conversion from the source MARC records. Implementation of the new version at LC took roughly 60 hours of programmer effort, including analysis and testing. It was completed during the beta-test period and in operation before the official release of the protocol in June 2002.
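The three-table design can be sketched as follows; an in-memory SQLite database stands in here for the MySQL database actually used, and every table, column, and value name is an illustrative assumption rather than LC's real schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE oai_set (
    set_spec  TEXT PRIMARY KEY,  -- short token used in harvesting requests
    set_name  TEXT NOT NULL      -- human-readable collection name
);
CREATE TABLE oai_item (
    identifier TEXT PRIMARY KEY, -- OAI record identifier
    datestamp  TEXT NOT NULL     -- supports selective (from/until) harvesting
);
CREATE TABLE set_item (          -- links that express set membership
    set_spec   TEXT REFERENCES oai_set(set_spec),
    identifier TEXT REFERENCES oai_item(identifier),
    PRIMARY KEY (set_spec, identifier)
);
""")

db.executemany("INSERT INTO oai_set VALUES (?, ?)",
               [("maps", "Map Collections"),
                ("photos", "Photograph Collections")])
db.executemany("INSERT INTO oai_item VALUES (?, ?)",
               [("oai:example.org:map001", "2002-10-01"),
                ("oai:example.org:photo001", "2002-11-15")])
db.executemany("INSERT INTO set_item VALUES (?, ?)",
               [("maps", "oai:example.org:map001"),
                ("photos", "oai:example.org:photo001")])

def list_identifiers(set_spec):
    # Answers a ListIdentifiers request restricted to one set.
    rows = db.execute("SELECT identifier FROM set_item WHERE set_spec = ?",
                      (set_spec,))
    return [r[0] for r in rows]
```

Because set membership lives in the link table rather than on the item, one record can belong to several sets, which is how a photograph collection can be harvestable both separately and as part of an aggregate set.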
At NDMSO, the year leading up to the release of version 2.0 of OAI-PMH saw related developments in response to industry trends and requests from the library community. The trends included the growing adoption of XML following the stabilization of the XML schema definition language in May 2001 and the emergence of compliant software tools, the desire in libraries for a "lite" version of MARC encoded in XML, and the interest in metadata harvesting as part of the interoperability toolkit for libraries. NDMSO has developed two MARC-related XML schemata.
One of these schemata is recommended in the new OAI release as a replacement for the original oai_marc schema. This MARCXML schema can be used to encode any MARC 21 record. Like oai_marc, the tagging of this "slim" schema indicates the structure of a MARC record in fields and subfields, allows round-trip conversion without data loss, and will not need modification for the minor changes to the MARC standard that occur routinely. In February 2003, LC's cataloging distribution service made test files available in MARCXML for evaluation and prototype development.
The second schema, known as MODS (Metadata Object Description Schema), is still at an early stage of development. Introduced in June 2002 and revised in early 2003, it is intended as a multi-purpose bibliographic element set for library applications (Guenther, 2003). Unlike MARCXML, MODS includes only a subset of MARC elements and sometimes regroups them. MODS uses familiar words as element names rather than relying on MARC's numeric tags to convey semantics. In Fall 2002, LC made its OAI-harvestable records available in the MODS format in addition to MARC21 and simple Dublin Core. These records provide richer semantics than Dublin Core and will support experimentation with the new format. The implementation uses an XSLT stylesheet to convert MARCXML records to MODS. NDMSO intends to maintain a suite of such tools for manipulating and transforming records between MARC and MODS.
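A drastically simplified stand-in for that stylesheet suggests the flavor of the conversion: numeric MARC tags become MODS elements with familiar names. Only the two namespace URIs below are real; the three field mappings shown are a tiny, illustrative subset of what the actual XSLT covers, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"  # real MARCXML namespace
MODS_NS = "http://www.loc.gov/mods/v3"     # real MODS namespace

def marcxml_to_mods(marcxml):
    # Toy conversion handling only three mappings:
    #   245$a -> titleInfo/title, 100$a -> name/namePart, 520$a -> abstract
    record = ET.fromstring(marcxml)
    mods = ET.Element("mods", {"xmlns": MODS_NS})
    for field in record.iter(MARC + "datafield"):
        text = field.findtext(MARC + "subfield[@code='a']")
        if text is None:
            continue
        tag = field.get("tag")
        if tag == "245":
            ET.SubElement(ET.SubElement(mods, "titleInfo"), "title").text = text
        elif tag == "100":
            ET.SubElement(ET.SubElement(mods, "name"), "namePart").text = text
        elif tag == "520":
            ET.SubElement(mods, "abstract").text = text
    return mods
```

The point of the renaming is visible even at this scale: a service builder reading the output sees "titleInfo" and "namePart" rather than having to know what tags 245 and 100 mean.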
Implementing OAI-PMH as a data provider has had several beneficial side-effects for LC. Program code for converting records in the MARC Communications Format (ISO 2709) to an XML format can be used in many circumstances. Experience gained with special characters has been applied in other contexts through the development of mappings among MARC 21 characters, Unicode, and character entities for HTML and SGML. Unusual characters in LC's records have also helped harvesters test their handling of Unicode. Having harvesters formally validate records and subject them to automated transformations picks up errors that may be hidden by local interfaces or simply not noticed. This finding is consistent with experience with the LC/Ameritech competition. When records were brought into American Memory and subjected to different indexing and display routines, errors exposed were often found to cause undetected anomalous behavior in the original system too.
The Library of Congress has registered its OAI server for American Memory collections in the central OAI registry, which means that others can harvest and re-purpose records without LC being aware of it. This is nothing new for LC. Its records are available for copying from bibliographic utilities, such as OCLC and RLG; users of Z39.50 have been able to download records directly from LC's main catalog for well over a decade. Indeed, one of the key benefits to LC from the harvesting protocol is that no special arrangements have to be made when others want to get the records and use them in another service. Since the records are updated infrequently, there is little re-harvesting and the processing load on the server has been unnoticeable.
LC has a particular interest in the use of its records in services with a focus on cultural heritage or humanities scholarship, and services that integrate materials from libraries, museums, historical societies and other archival institutions. The Research Libraries Group (RLG) has harvested records (in the MARC format) for several collections for integration into its cultural materials service. Dublin Core records for all collections are harvested by two services described elsewhere in this issue: OAIster at the University of Michigan and the University of Illinois at Urbana-Champaign (UIUC) Cultural Heritage Repository. Records have also been harvested by the Perseus Digital Library. The mechanics of harvesting appear to be working well.
Now that records for a substantial body of cultural heritage content are easily harvestable, organizations and communities can begin to build on this base to support learning and scholarship through richer interactions with the resources accumulated in archival institutions. To achieve this will likely require communities to adopt more detailed agreements about metadata or make commitments to federated organization. The form of these agreements can usefully emerge from a better understanding of the costs and benefits through experience with services built on the heterogeneous metadata now available.
Costs associated with metadata content can be daunting. Identifying and characterizing a resource and placing it in an intellectual context is expensive. Local needs take priority and upgrading existing records on an individual basis is seldom an option. Few doubt that "better" metadata lets people find what they want more easily, but a compromise between quantity and quality is often necessary. For pictures, music, and other cultural resources that do not include full text or spoken words that can be searched, human expertise is essential to provide summaries, assign topical and genre terms, indicate where and when a work was created, or relate it to other works.
Because of the expense and investment, change in practices is inevitably slow. Most standards and formal consortial agreements allow a great deal of flexibility so that legacy records can be handled. Motivation to modify records or practices, or to develop and test more complex conversion tools, will exist only if there is obvious benefit to the unit bearing the brunt of the workload. How do we allocate scarce human resources most effectively? Which characteristics of descriptive metadata contribute most to its usefulness in a variety of contexts? How can we best apply automated tools to assist in the generation of useful metadata? When are authoritative forms of names worth establishing and using? Are there normalizations or transformations in element content values or encoding that can most cost-effectively be made before records are exposed for harvesting, or will each service provider normalize in a service-specific way before incorporating metadata into its service? Or will techniques such as computational linguistics, natural language analysis, topic maps, and semantic networks eliminate the need for normalization? Can annotations and pathfinder documents be used in conjunction with more formal descriptive practices? What strategies will users develop to combine use of cross-domain discovery services with more specialized services? Answers to these questions may not emerge quickly, but the availability of records for a large body of valuable content should allow observation of real users engaged in real tasks and offer real choices.
The Library of Congress is aware that it could improve the records it makes available for harvesting in a number of ways, with more or less effort, but needs evidence or guidance from service builders to indicate which modifications would be most valuable and which irrelevant. Indications are beginning to surface. Some inferences can be drawn from an analysis by the team building the UIUC Cultural Heritage Repository and the approach to metadata normalization for that service (Cole et al., 2002). The section on data content in RLG's description guidelines for participants in the Cultural Materials Alliance indicates the aspects that are most significant in building their service.
The least expensive way to improve LC's records in the short term would be through modifications to the transformation from the source record into formats for harvesting. LC's current transformation from MARC to simple Dublin Core is extremely basic (NDMSO, 2001). Element values are copied directly without conversion to another vocabulary or encoding scheme. Where codes are converted to terms, MARC terminology is usually used, because it matches the semantics of the coding scheme. In some cases, this results in terms that do not correspond to common usage or map cleanly to categories in other metadata schemes. With the understanding that simple Dublin Core has greatest value for cross-domain discovery, could LC modify its transformation rules to produce records that are more valuable in that context? Issues and questions that have come up in relation to simple (unqualified) Dublin Core records include:
Title. The mapping to simple Dublin Core omits all title information except from the MARC 245 field. This is on the assumption that multiple titles will be confusing for service builders (and their users) if the main title cannot be distinguished. However, significant information will be missing. For example, songs are often known by words from their first lines or choruses rather than the title on the published score. Records for works in other languages may include translated titles. Should some alternative titles be included as Description elements? Or would service builders be happy to treat the first Title element as the one to use in hits lists?
Date. The value for Date is taken from the statement of publication, distribution, etc. This field is intended for human reading, and the cataloging rules provide methods to indicate uncertainty and approximation, to express date ranges when the resource being described is not a single item, and even to concatenate several dates for different events. For cultural heritage materials, uncertainty about the date of creation or distribution is common. The result is that the Date element is neither easily machine-processable nor useful for sorting. Should LC consider normalizing the information it makes available in the Date element or rely on service builders to do that?
Description. For American Memory materials, note fields can be very significant for discovery: they can include summary descriptions, information about provenance, or biographical details. LC has chosen to include almost all categories of notes as Description elements. Some notes may be of little value for discovery, but because the range of information included in general notes (MARC 500 field) is very broad, it is not easy to distinguish notes that are logistical or administrative from those that provide intellectual context for the resource. How valuable would it be to service builders if LC attempted to be more selective about the notes it includes in Description elements?
Creator/Contributor. The current mapping uses only Creator and includes all values for MARC fields 100-111 (Main Entry Name fields) and 700-720 (Added Entry Name fields). The semantics of the distinction between main entries and added entries in MARC are not the same as the distinction between Creator and Contributor. In particular, since only one main entry name is permitted, authors beyond the first are added entries. The American Memory experience leads us to believe that a distinction based on role in relation to the resource being described (e.g. lyricist, photographer, engraver, architect) is more significant for cultural heritage materials than either the MARC distinction or the Dublin Core distinction. Unfortunately, neither the unqualified version of Dublin Core nor the approved qualifiers can represent this distinction. Questions to service builders at UIUC and OAIster about their preferences received somewhat uncertain responses.
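On the Date question in particular, even a crude normalization shows what a pre-harvest transformation might look like. The sketch below is an illustrative assumption, not LC practice: it reduces an AACR2-style date statement to a sortable year range and returns None when no year can be recovered, leaving the hard cases to human judgment.

```python
import re

# Matches a four-digit year between 1500 and 2099 that is not embedded
# in a longer run of digits (so 'c1901' yields 1901).
YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")

def normalize_date(statement):
    """Best-effort reduction of a date statement such as 'c1901' or
    '[between 1904 and 1908?]' to a (start, end) year pair for sorting."""
    years = [int(y) for y in YEAR.findall(statement)]
    if not years:
        return None  # e.g. 'n.d.' -- leave the decision to a human
    return (min(years), max(years))
```

A service builder applying the same rule after harvesting would get identical results; the open question raised above is which side of the protocol should bear that cost.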
Some of these issues will not be a problem when transforming MARC records to the MODS schema, since MODS inherits the relevant semantics from MARC. This is not a coincidence, since experience with American Memory and mappings to metadata schemes used in other communities were among the inputs to the development of MODS. Others, in particular those related to dates and genres, will continue to be a concern when transforming records from sources based on MARC records and AACR2/ISBD practices. LC hopes there will be constructive interactions with service builders and others making metadata for cultural materials available, so that resources can be applied where the benefit will be greatest, including the development of more global support for using controlled vocabularies and matching names for people and places to authorized forms where appropriate.
One of the hopes was that a pool of metadata would support experimentation with interfaces and navigation. The interface for RLG's cultural materials service allows users to switch easily between different structural views of the current result set enabling different browsing strategies and different approaches to successive refinement of a search. Every item is represented by a thumbnail for visual browsing; explicit modeling of parent-child relationships facilitates navigation from collection records to item records and vice versa. How will users respond to this type of interface in comparison to the more traditional interface of OAIster, the UIUC Cultural Heritage Repository, and American Memory itself? Will thumbnails become a standard component of metadata records? The ONIX schema used in the book publishing industry already provides for a small image of a book's cover. Does one style of interface lend itself to certain tasks or appeal to users with certain cognitive styles? What capabilities for manipulating and refining result sets will work with heterogeneous metadata? Perseus and the Electronic Cultural Atlas Initiative are exploring the use of information about time and place to provide new ways to navigate through bodies of information. In what circumstances will users find value in map-based selection of geographical areas or features and sliding bars for timelines? How can visual navigation through time and place be effectively combined with word-based searching? New and effective interfaces may motivate changes in descriptive practice or upgrading of existing records.
A rather different issue has been raised by the functionality of the Perseus Digital Library. The special features of Perseus are based primarily on full text, including the detection and highlighting of names of people and places in unstructured text so that they can function as searches for related information. Harvesting records via OAI, the Perseus team found that records linking back to an HTML representation of the item described seldom provide access to the source SGML or XML, even when it exists (Smith et al., 2002). LC has downplayed access to the SGML versions of its converted texts through American Memory because regular users were confused. Perhaps the dynamic addition of a link to the SGML version should be considered for OAI records in the MODS format (which has a flexible structure for identifying, labeling, and linking to related items - more powerful than either MARC's linking fields or qualified Dublin Core).
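As a hedged illustration of what such a dynamically added link could look like, the sketch below builds a MODS `relatedItem` of type "otherFormat" (the MODS value for the same content in a different form) pointing at an SGML serialization. The element names and attributes are genuine MODS; the URL and the helper function are invented for the example:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def sgml_related_item(sgml_url: str) -> ET.Element:
    """Build a MODS relatedItem pointing at an alternate (SGML) serialization
    of the described text. Illustrative sketch, not LC's actual practice."""
    rel = ET.Element(
        f"{{{MODS_NS}}}relatedItem",
        {"type": "otherFormat", "displayLabel": "SGML source"},
    )
    phys = ET.SubElement(rel, f"{{{MODS_NS}}}physicalDescription")
    ET.SubElement(phys, f"{{{MODS_NS}}}internetMediaType").text = "text/sgml"
    loc = ET.SubElement(rel, f"{{{MODS_NS}}}location")
    ET.SubElement(loc, f"{{{MODS_NS}}}url").text = sgml_url
    return rel

# The URL is illustrative, not a real American Memory address.
print(ET.tostring(sgml_related_item("http://example.org/item123.sgm"),
                  encoding="unicode"))
```

Because `relatedItem` carries both a `displayLabel` and a media type, an interface could suppress the link for casual readers while still exposing the SGML source to services, like Perseus, that can exploit it.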
The Perseus interface nevertheless provides an interesting capability based on its harvested records. Using the Lookup Tool, a search on "Chesapeake Bay" in the Perseus Catalogue retrieves records for maps at LC, harvested from LC's Open Archives Initiative repository. Clicking on "View with Perseus links" provides a view of the bibliographic display from American Memory with certain words and phrases highlighted in red and functioning as links to Perseus searches. A link from the name "Hassler" leads to a biographical article on the surveyor in the 1902 Harper's Encyclopedia of United States History, one of the Perseus reference texts. This is a small indication of what the future offers. The Library of Congress hopes that if it makes American Memory content more available, others will make the content yet more useful.
In addition to making records for more digital reproductions accessible for harvesting, OAI-PMH will play a role in other activities in which the Library of Congress is involved. Plans are developing for two specialist gateways, one for sheet music and another for video and motion pictures. Both expect to harvest records for particular collections as they move from the planning stage to implementation. In a very different project, OCLC, Die Deutsche Bibliothek, and LC are taking initial steps to build a Virtual International Authority File (VIAF). In the first phase, VIAF will include records for personal names from selected national libraries and serve the cataloging community. OAI-PMH will be used to update the central service.
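Keeping a central service such as VIAF current over OAI-PMH relies on the protocol's selective harvesting: the harvester passes the datestamp of its last harvest as the `from` argument to `ListRecords` and follows `resumptionToken`s for long result sets. A minimal sketch of that request-and-parse cycle follows; the base URL and identifiers are hypothetical, but the verbs, arguments, and response structure are those of OAI-PMH 2.0:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, prefix, from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL. An incremental harvester
    sends 'from' on the first request; once a response carries a
    resumptionToken, the next request sends only the verb and that token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (record identifiers, resumptionToken or None)
    from a ListRecords response document."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(f"{OAI_NS}identifier")
           for h in root.iter(f"{OAI_NS}header")]
    token = root.findtext(f".//{OAI_NS}resumptionToken") or None
    return ids, token

# Hypothetical endpoint: harvest everything changed since the last run.
url = list_records_url("http://example.org/oai", "oai_dc",
                       from_date="2003-01-01")
```

A harvester loops, re-requesting with each returned token until the response arrives without one, at which point the central service is up to date through the harvest datestamp.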
Metadata harvesting will be an important part of the interoperability toolkit for libraries. But it is just a first step. Communities that have close affinities with libraries, such as museums and other archival institutions, have good reason to use different descriptive practices (Arms, 2001). Those practices, and the terminology used, "necessarily depend on conscious or sub-conscious assumptions about who will use it and how" (Shabajee, 2002). A resource that is of potential use in many contexts may need to be described in several ways and using several vocabularies. The Library of Congress hopes that by making the intellectual assets in its catalog records more available, a valuable objective in itself, it can also learn how to make them more useful to other communities and in other contexts.
Arms, C.R., 2001, "Some observations on metadata and digital libraries", Proceedings of the Bicentennial Conference on Bibliographic Control for the New Millennium, 15-17 November 2000, Library of Congress, Cataloging Distribution Service, Washington, DC.
Arms, W.Y., Hillmann, D., Lagoze, C., Krafft, D., Marisa, D., Saylor, J., Terrizzi, C., Van de Sompel, H., 2002, "A spectrum of interoperability: the site for science prototype for the NSDL", D-Lib Magazine, 8, 1.
Cole, T.W., Kaczmarek, J., Marty, P.F., Prom, C.J., Sandore, B., Shreeves, S., 2002, "Now that we've found the hidden Web, what can we do with it? The Illinois Open Archives Initiative metadata harvesting experience", paper presented at Museums and the Web 2002.
Guenther, R., 2003, "MODS: the metadata object description schema", Portal: Libraries and the Academy, 3, 1, 137-50.
Lagoze, C., Van de Sompel, H., 2001, "The Open Archives Initiative: building a low-barrier interoperability framework", Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Roanoke, VA, 24-28 June 2001, 54-62.
National Research Council, 2000, LC21: A Digital Strategy for the Library of Congress, National Academy Press, Washington, DC.
Network Development and MARC Standards Office (NDMSO), 2001, MARC to Dublin Core Crosswalk.
Parker, E.B., 1982, Graphic Materials: Rules for Describing Original Items and Historical Collections, Library of Congress, Washington, DC.
Shabajee, P., 2002, "Primary multimedia objects and `educational metadata': a fundamental dilemma for developers of multimedia archives", D-Lib Magazine, 8, 6.
Smith, D.A., Mahoney, A., Crane, G., 2002, "Integrating harvesting into digital library content", Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Portland, OR, 14-18 July 2002, 183-4.
Svenonius, E., 2000, The Intellectual Foundation for Information Organization, MIT Press, Cambridge, MA.
Van de Sompel, H., Lagoze, C., 2000, "The Santa Fe Convention of the Open Archives Initiative", D-Lib Magazine, 6, 2.
Van de Sompel, H., Lagoze, C., 2002, "Notes from the interoperability front: a progress report on the Open Archives Initiative", European Conference on Digital Libraries 2002, Rome, 16-18 September 2002.
VRA, 2002, VRA Core Categories, Version 3, Visual Resources Association Data Standards Committee.
Available and Useful: OAI at the Library of Congress, Library Hi Tech, Volume 21, No. 2, 2003, pp. 129-139