Librarians, archivists, and museum curators are all faced with an amazing and confusing array of standards designed to make data easier to manage, share, deliver, and discover. These standards address the proper use of "metadata"—data about data—that can help computers (and their users) more effectively manage large quantities of information. However, there are literally dozens of emerging metadata standards competing for attention—and many of these standards are undergoing seemingly continuous revision by their parent organizations.
In an effort to reduce the level of confusion, RLG and the CIMI Consortium co-sponsored a 1-½ day forum on May 12-13, 2003, in New York City that introduced some of the most significant metadata standards in use today. "Ready to Wear: Metadata Standards to Suit Your Project" drew a capacity crowd of nearly 100 attendees to the American Museum of Natural History's Linder Theater, where a succession of metadata experts provided insight about the utility and viability of 19 current standards and resources.
In his opening remarks, John Perkins of CIMI wryly noted the often-repeated saying, "the wonderful thing about standards is there are so many to choose from." This forum attempted, with a lighthearted nod towards haute couture, to make the process of choosing a metadata "wardrobe" a little simpler.
Metadata on your sleeve
The first day, "Metadata on Your Sleeve," was dedicated to metadata that can help users find the resources they want: Metadata standards for the description and discovery of information.
Tony Gill of ARTstor kicked off the day with a discussion of two standards for describing items in museum and library collections: the CIDOC Conceptual Reference Model (CRM) and Functional Requirements for Bibliographic Records (FRBR).
CIDOC CRM is aimed at formalizing the description of cultural objects in museums, said Gill. Museum staff or the designers of museum applications can use the CRM to describe objects in a collection and their implicit relationships (such as their creators, places of creation, previous owners, and so forth). The advantages of CRM, said Gill, are that it is an "elegant," concise model, that it allows institutions to integrate information at varying levels of detail, and that it is extensible. FRBR, created by the International Federation of Library Associations and Institutions (IFLA), is a comprehensive review of the requirements for bibliographic description and control. It distinguishes between four levels of increasing specificity: a work, its expressions, its manifestations, and particular items exemplifying those manifestations.
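The four FRBR levels nest naturally, and a short sketch may make the hierarchy easier to picture. The Python below is only an illustration: the class and field names and the sample data are assumptions made for this example, not part of FRBR itself, which is a conceptual model rather than a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single concrete copy, such as one library's exemplar."""
    identifier: str


@dataclass
class Manifestation:
    """A particular publication or release that embodies an expression."""
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A specific realization of a work, such as a translation."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Walk down all four levels and collect every item."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Hypothetical sample data, for illustration only.
hamlet = Work(
    title="Hamlet",
    expressions=[
        Expression(
            label="English text",
            manifestations=[
                Manifestation(
                    label="Example Press paperback",
                    items=[Item("copy-001"), Item("copy-002")],
                )
            ],
        ),
        Expression(
            label="German translation",
            manifestations=[
                Manifestation(
                    label="Example Verlag hardcover",
                    items=[Item("copy-003")],
                )
            ],
        ),
    ],
)

print([item.identifier for item in hamlet.all_items()])
```

Grouping records this way would let a catalog show one entry for the work and let users drill down to a specific edition or copy.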
Both the CIDOC CRM and FRBR are "reference models," meaning that they're not binding standards. They're most useful, said Gill, in disambiguating dialog: for instance, when clarifying terms in discussions between technology experts and domain experts. They can both be expressed in XML, and can be useful in designing systems—for instance, CRM was referred to in building RLG Cultural Materials, and FRBR was used as a reference point during construction of ARTstor's "Illustrated Bartsch" project as well as RLG's RedLightGreen.
CIDOC CRM and FRBR address how information is organized into conceptual structures, but what about the data contained within those structures? Sherman Clarke of New York University addressed several resources for controlling data values for names, places, and subject headings.
The Anglo-American Cataloging Rules (AACR), which was first implemented around 1970 and is now in its 2nd edition, provides rules for naming things in catalogs, and also provides an authoritative list of personal names, institutional names, place names, and subjects for use in catalogs.
Clarke discussed a variety of authority files from the Getty that are useful to art librarians and museums. The Getty Vocabularies include the Art & Architecture Thesaurus (AAT), Union List of Artist Names (ULAN), and the Getty Thesaurus of Geographic Names (TGN). All of these help standardize names and terminology used in cataloging, to help ensure that users find all available resources on a given topic.
Finally, Clarke provided an overview of several Library of Congress authority files indispensable in modern cataloging: the Name Authority File (NAF), Subject Authority File (SAF), and Thesaurus for Graphic Materials (TGM). Clarke concluded his presentation by explaining how these various rules and authority files could be useful in creating more standardized catalogs.
RLG's Merrilee Proffitt next examined three XML-based standards aimed at increasing interoperability and data sharing among institutions. Encoded Archival Description (EAD), created by the Society of American Archivists, is winning wide acceptance among archives as a tool for marking up descriptions of archival collections and, to a lesser extent, the individual items within these collections. EAD is also used by museums, Proffitt reported, although there is more of a challenge there in defining exactly what is meant by a "collection" (a more clear-cut issue in the world of archives).
Proffitt described the Metadata Object Description Schema (MODS) as "very MARC-like, if not 'MARC Lite.'" It's intended to be a simpler, easier-to-use standard for computer-readable library catalog descriptions than the complex, industry-standard MARC, while providing more richness than the highly simplified Dublin Core (a standard addressed later in the day). MODS originated within the Library of Congress and is finding some use as a format available to automated information harvesters.
Finally, Proffitt turned to the Research Support Libraries Programme's (RSLP's) Collection Description schema. With this standard, which originated in the UK, collections may contain physical objects, digital resources, or even metadata such as catalog records. The Collection Description schema does not support description of specific items within those collections, however.
CIMI's John Perkins tackled three further standards for sharing data among institutions. The Dublin Core Metadata Initiative emerged out of a workshop in Dublin, Ohio, in 1995 and since then has found fairly wide usage beyond its roots in the library world. Dublin Core was developed to enable discovery of information objects across a wide range of subject domains, and as such is very simple compared to other cataloging standards: There are just 15 data fields. Said Perkins: "It's a machete and not a scalpel when it comes to being able to dissect and categorize information." Dublin Core was designed to simplify access to metadata, but its simplicity can also streamline the production of metadata, said Perkins.
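To give a sense of how small the Dublin Core element set is, here is a minimal Python sketch that serializes a record using the 15 elements and the standard Dublin Core namespace. The wrapping `record` element, the function name, and the sample values are assumptions for illustration; real projects typically embed Dublin Core in a container format of their own choosing.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the simple Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dc_xml(record: dict) -> str:
    """Serialize {element: [values]} as XML; every element may repeat."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record, for illustration only.
sample = {
    "title": ["View of a Harbor"],
    "creator": ["Unknown artist"],
    "subject": ["harbors", "ships"],
    "type": ["Image"],
}
print(to_dc_xml(sample))
```

The check against `DC_ELEMENTS` reflects the "machete" quality Perkins described: anything that does not fit one of the 15 broad fields has nowhere to go.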
The Visual Resources Association's VRA Core mimics the organization of Dublin Core (it has just 17 data elements), but is aimed at describing visual works instead of books or textual material. Like Dublin Core, it's best at broad, simple categorizations, said Perkins.
For more fine-grained control of visual works, there is SPECTRUM, a widely used museum standard, particularly in the UK. Unlike VRA Core and Dublin Core, said Perkins, SPECTRUM is a "scalpel, not a machete," offering more than 400 data elements for describing any museum object and the museum processes that surround it. Unlike the other two standards, which are publicly described on the Web, the full details of SPECTRUM are available only by purchasing a printed book or CD-ROM from the standard's parent organization, mda. An XML schema version can, however, be downloaded for free from the CIMI Web site.
After a lively Q&A session with all of the day's panelists, RLG's Günter Waibel offered a guide to "mixing and matching" multiple standards in actual projects. Waibel discussed the process of creating an "application profile" documenting a project, specifying data structure standards to be used as well as the sources for data values (which controlled vocabularies will be considered authoritative). Waibel touted the value of expressing this profile as an XML schema, which allows computer applications to enforce the chosen standards and data values.
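The idea of a machine-enforceable application profile can be sketched in a few lines. Waibel recommended expressing the profile as an XML schema; the Python below swaps in a plain dictionary of rules purely to keep the example short. The field names, vocabularies, and function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """One line of the profile: is the field required, and which values are allowed?"""
    required: bool = False
    vocabulary: Optional[FrozenSet[str]] = None  # None means free text


# Hypothetical profile: which fields a project uses and where values come from.
PROFILE: Dict[str, FieldRule] = {
    "title": FieldRule(required=True),
    "creator": FieldRule(required=True),
    "object_type": FieldRule(
        required=True,
        vocabulary=frozenset({"painting", "print", "sculpture"}),
    ),
    "place": FieldRule(vocabulary=frozenset({"New York", "London"})),
}


def validate(record: Dict[str, str],
             profile: Dict[str, FieldRule] = PROFILE) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, rule in profile.items():
        value = record.get(name)
        if value is None:
            if rule.required:
                problems.append(f"missing required field: {name}")
            continue
        if rule.vocabulary is not None and value not in rule.vocabulary:
            problems.append(f"{name}: {value!r} is not in the controlled vocabulary")
    for name in record:
        if name not in profile:
            problems.append(f"unknown field: {name}")
    return problems


good = {"title": "Harbor", "creator": "Unknown", "object_type": "print"}
bad = {"title": "Harbor", "object_type": "poster", "donor": "A. Collector"}
print(validate(good))
print(validate(bad))
```

An XML schema plays the same role for XML records: a validating parser rejects any record that breaks the profile, so the rules do not depend on each cataloger remembering them.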
Metadata in your pocket
The second day of the "Ready to Wear" forum focused on metadata standards for managing and delivering information within an organization, rather than exchanging data with other organizations or making it available to users (hence the "in your pocket" moniker for the day's topics).
Ronald J. Murray of the Library of Congress kicked off the day with an in-depth review of three standards for storing and managing digital resources. The Open Archival Information System (OAIS) reference model defines the terms and processes that might be utilized by a digital archive. According to Murray, there is currently a lot of research and development work around OAIS, which suggests that some degree of widespread adoption is likely. However, as a reference model, OAIS is deliberately vague as to specific processes and technologies, which means many varying interpretations of the standard are likely to appear, Murray said.
Murray then turned to a pair of standards from the Moving Picture Experts Group (MPEG), which includes many movie, music, and technology industry companies. He described the organization's method for establishing standards, which involves many steps, is competitive, and includes empirical testing of proposed standards through experimental testbeds and software that members must supply along with their proposals. The process, which can involve hundreds of participants, produces robust standards that are then firmly endorsed by MPEG members once the standards have been finalized. MPEG-7 provides a standard for the description of multimedia content, including streaming audio and video. MPEG-21 establishes standards for the exchange of multimedia assets and includes support for transaction processing, payment, and digital rights control.
Steven Abrams of Harvard University discussed an emerging effort to create a global repository of information about digital file formats, so that future archives could easily retrieve the information needed to open, say, a WordPerfect 4.2 document. "It's important that we have a very clear and unambiguous understanding of what these formats mean," said Abrams. "If we did not, our repositories would contain nothing but opaque data streams that nobody could understand or use."
This project started in summer 2002 as a joint effort between Harvard University's Library Digital Initiative (LDI) and MIT's DSpace; the Digital Library Federation sponsored the group's early meetings. Its mission is to identify file formats and to maintain a long-term store of information detailing how data is stored in those formats. The information contained in the registry will be informative and factual, said Abrams. Currently the group is seeking a planning grant in order to continue development of the registry.
Capping the day, Robin Dale of RLG spoke on three data structure standards that offer support for management and delivery metadata. The Joint Photographic Experts Group's JPEG 2000 standard is the latest iteration of the familiar JPEG image file format. The core part of the standard governs how image files, which have a .JP2 extension, are stored and displayed. Five adjunct parts to the core standard will be used in files with a .JPX extension. Among these additional parts is support for embedded metadata: data contained within the image file itself that specifies details such as the photographer's name, camera settings when the picture was taken, intellectual property rights associated with the image, and so forth. The metadata contained within .JPX files is based on the DIG35 standard developed by the International Imaging Industry Association (I3A), and can easily be mapped to NISO Z39.87, the next standard discussed by Dale.
The National Information Standards Organization is working on another metadata standard for still images, NISO Z39.87, which was initiated by the library, museum, and archive communities. This draft standard aims to provide a comprehensive list of data elements needed to preserve and render digital still images, most of which can be extracted from an image file's existing file header (for instance, color depth, image dimensions, format, file size, and so forth).
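As a rough illustration of how such technical metadata can be read straight from a file header, the Python sketch below parses the first bytes of a PNG file. PNG is used here only because its header layout is simple; the dictionary keys are hypothetical labels, not Z39.87 element names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes defined by the PNG format.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "indexed",
    4: "grayscale with alpha",
    6: "truecolor with alpha",
}


def read_png_header(data: bytes) -> dict:
    """Extract basic technical metadata from the first 26 bytes of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if len(data) < 26:
        raise ValueError("file too short to contain an IHDR chunk")
    # The first chunk must be IHDR, with a 13-byte payload.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "format": "PNG",
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, "unknown"),
    }


# A synthetic header for a 640x480, 8-bit truecolor image (no pixel data).
sample_header = PNG_SIGNATURE + struct.pack(
    ">I4sIIBBBBB", 13, b"IHDR", 640, 480, 8, 2, 0, 0, 0
)
print(read_png_header(sample_header))
```

To try it on a real file, one could pass in the bytes read with `open(path, "rb").read(26)`.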
Finally, Dale discussed the Metadata Encoding and Transmission Standard (METS), which functions as a "wrapper" for data about complex digital objects. METS includes required elements specifying the structure of an object (for instance, the sequence of pages within a book, with separate image files for each page) as well as optional elements which contain or point to additional metadata about the object. METS is open-ended in the sense that it does not specify particular standards for these optional extensions; it is left up to institutions using METS to decide what particular administrative or descriptive metadata standards best suit their needs.
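The "wrapper" idea can be sketched for the book example: a file section lists the page images, and a structural map records their order. The Python below is a simplified illustration that has not been validated against the METS schema; the function name, file IDs, and URLs are hypothetical, and the optional descriptive and administrative sections are omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(page_urls):
    """Wrap an ordered list of page-image URLs in a minimal METS-style document."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    mets = ET.Element(q("mets"))

    # File section: an inventory of the files that make up the object.
    file_sec = ET.SubElement(mets, q("fileSec"))
    file_grp = ET.SubElement(file_sec, q("fileGrp"))

    # Structural map: how those files fit together (here, pages in a book).
    struct_map = ET.SubElement(mets, q("structMap"))
    book = ET.SubElement(struct_map, q("div"), {"TYPE": "book"})

    for order, url in enumerate(page_urls, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(file_grp, q("file"), {"ID": file_id})
        ET.SubElement(
            file_el, q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
        )
        page = ET.SubElement(book, q("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return mets


# Hypothetical page images, for illustration only.
doc = build_mets([
    "http://example.org/book/page1.tif",
    "http://example.org/book/page2.tif",
])
print(ET.tostring(doc, encoding="unicode"))
```

Descriptive or administrative metadata in whatever standard an institution prefers would sit alongside these sections, either embedded or referenced by pointer.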
The "Ready to Wear" forum concluded with another lively Q&A period among the day's panelists and members of the audience. Discussion was wide-ranging, covering strategic planning issues as well as the fine details of implementing specific standards and specifications. Many audience members expressed a desire for more guidance and leadership regarding which metadata standards most deserved their time and resources.
After the forum concluded, many attendees were still deep in conversation as they left the Museum.