June 15, 2003, Volume 7, Number 3 | ISSN 1093-5371
Like Russian Dolls: Nesting Standards for Digital Preservation
This article introduces three standards for digital preservation, at least two of which feature prominently in the appendix of the plan Congress just approved.[1] Understanding what these standards are, and what they can and cannot do, provides a solid foothold in present and future discussions surrounding long-term retention of digital materials, as well as a leg up on implementation.
Although all this probably sounds confusing in bulleted shorthand, it actually makes a lot of sense when properly laid out. This article walks through the standards one by one and elaborates on their functionality and interaction. As it works its way through the standards from the most general to the very specific, it will also home in on digital images as the files to be preserved. The expansive Open Archival Information System (OAIS) reference model applies to any type of media, even nondigital materials, whereas the Metadata Encoding and Transmission Standard (METS) applies exclusively to the digital realm of images, audio, and video. The NISO Data Dictionary focuses on technical metadata for digital still images.
From a business perspective, digital preservation is a mechanism to ensure return on investment. Enormous amounts of money have been and are being spent on reformatting original materials or creating digital resources natively. If the cultural heritage community cannot sustain access to those resources or preserve them, the investment will not bear the envisioned returns. Although a basic understanding of the general problems surrounding preservation in an ever-changing technical environment has started to permeate memory institutions, practical solutions to the challenge are slow to emerge. The three standards, OAIS, METS, and Z39.87, converge as a sustainable system architecture for digital image preservation.
The space data community represents another group with enormous stakes in the long-term viability of its data. Capturing digital imagery of art or manuscripts may seem expensive, but the cost pales in comparison to that of gathering digital imagery from outer space. Under those circumstances, losing access to data is not an option. To foster a framework for preserving data gathered in space, the Consultative Committee for Space Data Systems (CCSDS) began work on an international standard in 1990. A good ten years later, the OAIS was approved by the International Organization for Standardization (ISO).
In the standard’s own words, “[a]n OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.”[4] The standard formulates a framework for understanding and applying concepts in long-term preservation of digital information. It provides a common language for talking about these concepts and the organizational principles surrounding an archive. Though the OAIS pertains to both the digital and the analog realm, it has received the most attention for its applicability to digital data. As a reference model, the OAIS in and of itself does not specify an implementation—it does not tell you which computers to buy, which software to load, or which storage medium to use. The standard does tell you, however, how an archive should be organized. In its so-called functional model, it defines the entities (or departments, if you will) in an archive, their responsibilities, and interactions. The data flows between those entities and the outside world are specified in the information model, which delineates how information gets into the archive, how it lives in the archive, and how it gets served to the public. The OAIS leaves it up to every distinct community to flesh out an implementation of the high-level guidelines. For the cultural heritage community a number of OAIS-related documents exploring the framework’s application to libraries, museums, and archives have come out of the joint OCLC-RLG Preservation Metadata Working Group.[5]
As figure 1 illustrates, the OAIS stipulates that an archive (everything within the square box) interacts with a producer as well as a consumer. It takes in data from the producer through its ingest entity, and it serves out data to the consumer through its access entity. Within the archive itself, the data content submitted for preservation gets stored and maintained in the archival storage unit; data management maintains the descriptive metadata identifying the archive’s holdings. The OAIS dubs the data flowing between the different players information packages, or IPs. The data flows sketched out in figure 1 contain three information packages: the submission information package (SIP), which the producer sends to the archive; the archival information package (AIP), which the archive stores and maintains; and the dissemination information package (DIP), which the archive delivers to the consumer.
The data represented by the information packages may vary according to the specific needs at each station: an archival information package, for example, probably contains more data aimed at managing the object than its more lightweight counterpart on the access side, the dissemination information package. Furthermore, the OAIS details several categories of information comprising a complete information package, but in keeping with its role as a reference model, it stops short of suggesting specific data elements or a specific encoding for the entire bundle of information.[6] Any community interested in implementing the OAIS has to identify or create a file-exchange format to function as an information package. For the cultural heritage community, METS shows great potential for filling that slot.
METS wraps digital surrogates with descriptive and administrative metadata into one XML document. Digital surrogates in this context could be digital image files as well as digital audio or video. At the heart of each METS object sits the structural map, which becomes a table of contents for public access. The hierarchy of the structural map allows the navigation of media files embedded in, or referenced by, the METS object. It enables browsing through the individual pages of an artist’s book as well as jumping to specific segments in a time-based program, for example, a particular section of a video clip. These so-called digital objects encoded in METS have three main applications that conveniently align with their potential as OAIS information packages.
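As a rough illustration of how information packages move through an OAIS-style archive, the following Python sketch models a producer's submission being ingested, stored, and then served to a consumer. The OAIS does not prescribe any implementation, so every class, method, and field name here (`InformationPackage`, `Archive`, `ingest_note`, and so on) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """A bundle of content plus metadata; `kind` is "SIP", "AIP", or "DIP"."""
    kind: str
    identifier: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class Archive:
    """Toy archive with ingest, archival storage, data management, and access."""

    def __init__(self):
        self.archival_storage = {}  # identifier -> archival information package
        self.data_management = {}   # identifier -> descriptive metadata

    def ingest(self, sip):
        """Accept a SIP from a producer and store it as an AIP."""
        if sip.kind != "SIP":
            raise ValueError("ingest expects a submission information package")
        # Hypothetical management field added at ingest time.
        aip_metadata = {**sip.metadata, "ingest_note": "received from producer"}
        aip = InformationPackage("AIP", sip.identifier, sip.content, aip_metadata)
        self.archival_storage[aip.identifier] = aip
        # Data management keeps the descriptive metadata identifying the holdings.
        self.data_management[aip.identifier] = {"title": sip.metadata.get("title")}
        return aip.identifier

    def access(self, identifier):
        """Derive a lighter-weight DIP for a consumer from the stored AIP."""
        aip = self.archival_storage[identifier]
        description = self.data_management[identifier]
        return InformationPackage("DIP", identifier, aip.content, dict(description))
```

In this sketch the dissemination package deliberately carries less metadata than the archival package, mirroring the lighter-weight role the access side plays in the description above.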
Fig. 2. A METS object represented in the context of RLG Cultural Materials: a Chinese album from the Chinese Paintings Collection, contributed by the UC Berkeley Art Museum and Pacific Film Archive.
The METS XML schema divides the standard into a core component and several extension components. The METS core supports navigation and browsing of a digital object. It consists of a header, content files, and a structural map. The METS extension components support discovery and management of the digital object. They consist of descriptive metadata and administrative metadata, which in turn split into technical, source, digital provenance, and rights metadata.
Figure 3 details the components of a METS object and one possible set of relationships among them.
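To make the core components more tangible, here is a minimal Python sketch that assembles a header, a file section, and a structural map for a paged object. The element and attribute names follow the METS schema; the URLs, identifiers, date value, and the `build_mets_core` helper are assumptions invented for illustration, and the output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_core(page_urls):
    """Build a header, a file section, and a page-by-page structural map."""
    root = ET.Element(m("mets"))
    # Illustrative creation date, not taken from any real object.
    ET.SubElement(root, m("metsHdr"), {"CREATEDATE": "2003-06-15T00:00:00"})

    file_sec = ET.SubElement(root, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"))
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    album = ET.SubElement(struct_map, m("div"), {"TYPE": "album"})

    for number, url in enumerate(page_urls, start=1):
        file_id = f"FILE{number:03d}"
        # Content file, referenced by URL rather than embedded.
        file_el = ET.SubElement(file_grp, m("file"), {"ID": file_id})
        ET.SubElement(
            file_el, m("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
        )
        # Structural map entry pointing back at the file by its ID.
        page = ET.SubElement(
            album, m("div"),
            {"TYPE": "page", "ORDER": str(number), "LABEL": f"Page {number}"},
        )
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return root
```

Calling `ET.tostring(build_mets_core([...]), encoding="unicode")` with a list of page URLs yields a document whose structural map can drive page-by-page browsing, because each `div` points to its file through the `FILEID` attribute.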
The METS designers leveraged the combined power of the W3C specifications for XML schema and Namespaces in XML to create a flexible standard.
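A small sketch may help show what namespaces buy the standard. Assuming Dublin Core as the descriptive vocabulary, the first function below embeds `dc:` elements inside a METS descriptive metadata section, while the second merely points to a record held elsewhere. The function names and sample values are hypothetical; the element names (`dmdSec`, `mdWrap`, `xmlData`, `mdRef`) come from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def embedded_dmd(section_id, title, creator):
    """Wrap Dublin Core elements inside a METS descriptive metadata section."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    # Elements from a different namespace sit side by side with METS markup.
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    ET.SubElement(xml_data, f"{{{DC_NS}}}creator").text = creator
    return dmd


def referenced_dmd(section_id, record_url):
    """Point to a descriptive record that lives outside the METS document."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    ET.SubElement(
        dmd, f"{{{METS_NS}}}mdRef",
        {"LOCTYPE": "URL", "MDTYPE": "MODS", f"{{{XLINK_NS}}}href": record_url},
    )
    return dmd
```

Swapping Dublin Core for another vocabulary would mean changing only the namespace URI, the `MDTYPE` value, and the child elements; the surrounding METS wrapper stays the same.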
In this way, each community can plug in its own preferred descriptive elements as long as they have been formalized into a schema.[9] The visual resources community, for example, may choose to extend METS using the VRA Core, while libraries might be more inclined to stick with the Metadata Object Description Schema (MODS) from the Library of Congress. Others may decide the Dublin Core (DC) satisfies their access needs. The flexibility achieved through namespaces gives METS the potential for implementation across a wide range of communities.
The same logic applies to all components of administrative metadata. Each community has the opportunity to specify what data it deems most important for the management of its information, formalize those requirements into an XML schema, and use that schema as an extension to the hub-standard METS. For an example of a project that has identified or created a comprehensive suite of METS extensions, consult the Library of Congress AV Prototyping project.
An alternative to embedding metadata for the extension components through XML Namespaces and external schemas is simply to reference the data from within the object. Descriptive or administrative metadata may live outside the XML markup in a database, to which the METS object can point. Even down to the level of media files, METS provides the dual option of referencing or embedding. The METS specification makes provisions for wrapping the actual bit stream of a digital file in the XML. In most cases, however, files live at online locations pointed to from within the object.
In the realm of technical metadata, a fledgling NISO standard takes center stage for describing the different parameters of digital image files. Known as the NISO Data Dictionary - Technical Metadata for Digital Still Images, or Z39.87, the standard specifies a list of metadata elements.
The Library of Congress, motivated by its AV Prototyping project, created an XML schema encoding for Z39.87, called NISO Metadata for Images in XML Standard (MIX). The XML schema constitutes the smallest Russian doll in our series of nesting standards, as it may be plugged into the METS framework as an extension schema for technical metadata. The standard also proposes fields for the source and digital provenance sections of METS.
The NISO effort draws heavily on the Tagged Image File Format specifications, better known by their acronym, TIFF. As the name implies, this format uses tags to define the characteristics of a digital file.[10] Image creation applications write the necessary parameters to the tags within the TIFF file, which means that the majority of the data Z39.87 covers already exists in file headers. To complete the metadata cycle, harvester utilities have to extract the information from the image file headers and import it into digital-asset-management systems for long-term preservation. By using the image file format specification as an integral part of the Data Dictionary, the standard leverages existing metadata to achieve cost savings. On the other hand, in going beyond the TIFF specifications for some elements, the NISO standard acknowledges information outside the TIFF scope that plays an important role in digital preservation.
From this vantage point, the Data Dictionary becomes an important tool for educating vendors about the metadata our community sees as invaluable for preserving our investment. RLG is investigating the formation of a group advocating among digital camera-back vendors for the cultural heritage community’s metadata needs.[11] An industry standard for consumer digital cameras called DIG35 already has broad support among vendors. DIG35 allows transfer of information from the camera to the software utility that consumers use to manage their holiday snapshots.
Building on that model, NISO Z39.87 in its XML instantiation MIX could become the file-exchange format to go between high-end scanners or camera backs and sophisticated asset-management databases. The Data Dictionary divides the technical metadata elements into four groups.
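The harvesting step mentioned above, in which a utility reads parameters that imaging software has already written to TIFF tags, can be sketched in a few lines of Python. This is a simplified reader under stated assumptions: it looks only at the first image file directory, handles only single SHORT and LONG values, and names just a handful of baseline tags. `read_tiff_tags` and `build_sample_tiff` are hypothetical helpers, and the sample file is a synthetic header with no pixel data.

```python
import struct

# Small subset of baseline tag numbers from the TIFF specification.
TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
}


def read_tiff_tags(data):
    """Return single SHORT/LONG tag values from the first image file directory."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(entry_count):
        entry_offset = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack_from(order + "HHI", data, entry_offset)
        if count != 1 or field_type not in (3, 4):
            continue  # this sketch skips arrays, strings, and rationals
        fmt = "H" if field_type == 3 else "I"  # 3 = SHORT, 4 = LONG
        (value,) = struct.unpack_from(order + fmt, data, entry_offset + 8)
        tags[TAG_NAMES.get(tag, f"Tag{tag}")] = value
    return tags


def build_sample_tiff():
    """Create a tiny little-endian TIFF header and directory for demonstration."""
    entries = [(256, 3, 1, 640), (257, 3, 1, 480), (259, 3, 1, 1)]
    header = struct.pack("<2sHI", b"II", 42, 8)  # first directory starts at byte 8
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, count, value in entries:
        ifd += struct.pack("<HHII", tag, field_type, count, value)
    ifd += struct.pack("<I", 0)  # offset 0 means there is no further directory
    return header + ifd
```

A real harvester would pass the resulting dictionary on to an asset-management database; calling `read_tiff_tags(build_sample_tiff())` is enough to see the tag names and values recovered from the header.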
For any institution just starting out on the path of digital preservation, managing technical metadata through the NISO Data Dictionary is a great first step. The term data dictionary itself comes from the database community; it refers to a file defining the basic organization of a database down to its individual fields and field types. NISO Z39.87 represents a blueprint for a database or a database module that can be implemented fairly quickly, since all the intellectual legwork has already been done by the standards committee. To expand the database to include structural metadata relating files to each other, plus a descriptive record and rights metadata, an institution could look to METS and its extension schemas. Again, the Library of Congress AV Prototyping project offers a model implementation of a database using the METS approach. Scaling up to the bigger picture, this database could find its home in an archival environment specified by the OAIS.
To summarize: as illustrated by figure 4, the OAIS stipulates information packages, which find instantiation in METS; METS stipulates an extension schema for technical metadata, which finds an instantiation in Z39.87’s XML schema, MIX. Now, after the detailed review, the first bulleted list in this article should make a lot more sense.
In broad strokes, digital preservation with the nesting standards OAIS, METS, and Z39.87 looks like a puzzle with all the pieces neatly falling into place. In the details, however, some harmonization issues between the standards remain. For example, the OAIS model breaks an information package into different subcomponents than the METS schema; the NISO Data Dictionary and its XML encoding MIX cover not only the technical metadata extension of METS, but also some elements that the digital object standard relegates to sections on source and digital provenance. Nevertheless, the convergence of three standards developed independently illustrates that a holistic view of digital preservation is emerging. Only widespread implementation will tell whether the theory as outlined by the standards can hold up in practice.
Footnotes
[2] National Information Standards Organization.
[10] For the full TIFF tag library, see Appendix A of the format specifications.
[11] For more information about this fledgling initiative, please contact the author.
Publishing Information
RLG DigiNews (ISSN 1093-5371) is a Web-based newsletter conceived by the RLG preservation community and developed to serve a broad readership around the world. It is produced by staff in the Department of Research, Cornell University Library, in consultation with RLG and is published six times a year at www.rlg.org.
Materials in RLG DigiNews are subject to copyright and other proprietary rights. Permission is hereby given to use material found here for research purposes or private study. When citing RLG DigiNews, include the article title and author referenced plus "RLG DigiNews,
Please send comments and questions about this or other issues to the RLG DigiNews editors.
Co-Editors: Anne R. Kenney and Nancy Y. McGovern; Associate Editor: Robin Dale (RLG); Technical Researcher: Richard Entlich; Contributor: Erica Olsen; Copy Editor: Martha Crowe; Production Coordinator: Carla DeMello; Assistant: Valerie Jacoski.
All links in this issue were confirmed accurate as of June 13, 2003.