Cultural heritage institutions have been actively digitizing their collections for more than ten years, and in that time countless standards – both de facto and de jure – have been created to facilitate enhanced, long-term access to these emerging digital collections. And while the development of technical specifications for digitization has been key, some of the most influential developments have been in the area of metadata.
Early on, the need for better methods of resource discovery sparked the formation of what is now the Dublin Core Metadata Initiative and standardization for resource discovery metadata began. It wasn’t until 1999 that questions about technical metadata began to emerge from cultural heritage institutions. Though most cultural heritage institutions had begun to digitize their collections at a rapid pace, few were consistently collecting metadata that would enable them to maintain the functionality and quality intrinsic to the images despite whatever preservation strategies might be applied over the long-term. Institutions attributed the problem to a lack of knowledge or standards specifying technical metadata for digital images.
In April 1999, the National Information Standards Organization (NISO), the Council on Library and Information Resources (CLIR), and RLG sponsored a workshop to examine the technical information needed to manage and use digital still images that reproduce a variety of pictures, documents, and artifacts. An outgrowth of that workshop was the development of NISO Z39.87 Technical Metadata for Digital Still Images, a data dictionary defining a standard set of technical metadata elements that would allow users to develop, interpret, and manage digital images for the long term.
Technical Metadata for Images and NISO Z39.87
Why is technical metadata so important? Although technical metadata is only a subset of the complete suite of preservation metadata necessary to achieve the long-term viability of a digital asset, it has often been called the first line of defense against losing access. Technical metadata assures that the information content of a digital file can be resurrected even if traditional viewing applications associated with the file have vanished. Furthermore, it provides metrics that allow machines, as well as humans, to evaluate the accuracy of output from a digital file. In its entirety, technical metadata supports the management and preservation of digital images throughout the different stages of their life-cycles.
Technical metadata is necessary to support two fundamental functions: documentation of image provenance and history (production metadata); and assurance that image data would be rendered accurately on output (to screen, print, or film). Ongoing management, or “preservation,” of these core functions would require the development of applications to validate, process, refresh, and migrate image data against criteria encoded as technical metadata.
The NISO Z39.87 data dictionary covers four distinct categories of functions:
Basic image parameters record information crucial to displaying a viewable image.
Image creation metadata records information crucial to understanding the technical environment in which a digital image file was captured.
Imaging performance assessment metadata records information that allows evaluation of the digital image’s quality, or output accuracy.
Change history metadata records information about the processes applied to an image over its life cycle.
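The four categories above can be pictured as a simple grouping of element values. The following is a minimal sketch, not part of Z39.87 itself: the class names, the method names, and the element names used in the example (such as image_width and scanner_model) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """The four functional categories named in the data dictionary."""
    BASIC_IMAGE_PARAMETERS = "basic image parameters"
    IMAGE_CREATION = "image creation"
    IMAGING_PERFORMANCE_ASSESSMENT = "imaging performance assessment"
    CHANGE_HISTORY = "change history"


@dataclass
class TechnicalMetadata:
    """Element values for one image, grouped by category (hypothetical layout)."""
    elements: dict = field(default_factory=lambda: {c: {} for c in Category})

    def record(self, category: Category, name: str, value) -> None:
        # Store one element value under its category.
        self.elements[category][name] = value

    def missing_categories(self) -> list:
        # Categories for which nothing has been recorded yet.
        return [c for c in Category if not self.elements[c]]


# Example with hypothetical element names and values.
image_md = TechnicalMetadata()
image_md.record(Category.BASIC_IMAGE_PARAMETERS, "image_width", 3000)
image_md.record(Category.IMAGE_CREATION, "scanner_model", "example scanner")
gaps = image_md.missing_categories()
```

In this example, gaps lists the two categories (imaging performance assessment and change history) that have no values yet, which is the kind of gap an institution would want to see before ingest.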
NISO Z39.87 has been available for use since 2002 under the status of Draft Standard for Trial Use (DSFTU). The data elements in the DSFTU version relied heavily upon information found in TIFF files, and while useful, this convention has become somewhat limiting with the advent of file formats such as JPEG2000.
Currently, Z39.87 is undergoing significant revision so that it can more accurately document the range of file formats that institutions are collecting and managing. Data elements in the revised version build and expand on the DSFTU version, including technical metadata available in TIFF, TIFF/EP, JPEG, and JPEG2000 file formats, as well as metadata elements from the Digital Imaging Group’s1 DIG35 metadata element set and the EXIF specification.
Although Z39.87 itself was designed to be agnostic in terms of implementation, the NISO Metadata for Images in XML Schema (MIX), commissioned by NISO and created by the Library of Congress, has been the dominant form of use for the data dictionary. Because MIX is a Metadata Encoding and Transmission Standard (METS) extension schema, implementation and use of the data dictionary on a local level has been fairly easy to manage. Surveys have also shown that Z39.87 has informed the creation of local metadata element sets, including contributing to the formation of broader preservation metadata element sets.2
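To illustrate how MIX can sit inside METS, the sketch below wraps two basic image parameters in a METS technical metadata section. It is an illustration under stated assumptions, not a validated record: the MIX namespace URI and the inner element names (BasicImageInformation, BasicImageCharacteristics, imageWidth, imageHeight) are assumed from a later published release of the schema and should be checked against the MIX version actually in use. The function name build_tech_md and the ID value are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Assumption: namespace of a later MIX release; verify against your schema.
MIX_NS = "http://www.loc.gov/mix/v20"


def build_tech_md(width: int, height: int, md_id: str = "TMD1") -> ET.Element:
    """Build a METS techMD element wrapping a small MIX fragment."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("mix", MIX_NS)

    tech_md = ET.Element(f"{{{METS_NS}}}techMD", {"ID": md_id})
    # MDTYPE tells METS which kind of metadata the wrapper holds.
    md_wrap = ET.SubElement(tech_md, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "NISOIMG"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")

    # Assumed MIX element names; see the lead-in above.
    mix = ET.SubElement(xml_data, f"{{{MIX_NS}}}mix")
    basic = ET.SubElement(mix, f"{{{MIX_NS}}}BasicImageInformation")
    chars = ET.SubElement(basic, f"{{{MIX_NS}}}BasicImageCharacteristics")
    ET.SubElement(chars, f"{{{MIX_NS}}}imageWidth").text = str(width)
    ET.SubElement(chars, f"{{{MIX_NS}}}imageHeight").text = str(height)
    return tech_md


tech_md = build_tech_md(3000, 2000)
xml_text = ET.tostring(tech_md, encoding="unicode")
```

The resulting techMD element would normally live inside the administrative metadata section of a full METS document; only the wrapper is shown here.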
The OCLC-RLG Preservation Metadata: Implementation Strategies (PREMIS) working group is in the process of creating a comprehensive data dictionary for preservation metadata independent of file format. Since technical metadata is file-format specific, the revised version of Z39.87 will complement the all-encompassing PREMIS effort for data sets comprised of digital images. However, putting digital preservation into practice still requires economic ways of data capture.
Collecting Technical Metadata to Support Preservation
Despite the importance of technical metadata, community action to gather the necessary metadata has been slow to come. There are two related reasons for this: the inability of many capture devices to record some of the technical metadata desired; and the largely manual process that most institutions have been relying upon to gather and document metadata.
Most of the information available about capture devices and metadata recording is related to digital cameras and comes from product reviews or recent surveys conducted by Kodak.3 From these observations and reviews, it is clear that the full range of metadata from any of the related standards is underutilized. At best, most cameras, at both consumer and professional levels, capture some core TIFF elements, very few of the available EXIF “camera capture” elements, and a few additional elements categorized as “GPS tags” and “Thumbnail tags.” Surprisingly, however, current cameras labeled as “consumer cameras” were likely to record more information than the “professional cameras” offered by the same company. Recent conversations with members of the I3A/IT10 (Electronic Still Picture Imaging) standards committee reveal that the upcoming adoption of JPEG2000 (and in particular, the JPX file type) by some digital camera manufacturers promises to allow greater technical metadata capture, but this prospect may apply only to manufacturers willing to offer JPEG2000 as an optional file format from the device. More work must be done to ensure that device manufacturers are recording and making available the metadata and file formats needed by the cultural heritage community.
The actual collection of metadata, and the level of detail at which an institution documents its digital collections, is also a significant problem. Because collecting metadata is generally labor-intensive, institutions have routinely collected and documented minimal metadata to reduce the overall cost of creating and storing the collection. But this cost-cutting measure is potentially short-sighted. Will an institution have the capability to render its digital files over time? Or more critically, will an institution have enough information to perform appropriate preservation measures and keep information viable? Do we really know how much metadata we will need in order to preserve image files for the long term?
The answers to these questions are not yet known. It is unlikely that we will soon know exactly how much metadata is really needed to support future access and management of digital images, though we know that the current practice of minimal metadata collection is unlikely to be enough. Several institutions have begun to perform preservation actions on certain image files, and whispers are beginning to be heard regarding the necessity of enough appropriate metadata to perform the tasks at hand. But how can an institution acquire enough of the correct technical metadata to hedge bets over time and facilitate the economic creation of digital collections? (Even Z39.87 comprises approximately 125 metadata elements, approximately 40 of which are mandatory.) The only realistic and feasible answer is to automate metadata collection and extraction to the extent possible.
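As a rough illustration of what automated extraction can look like, the sketch below reads a few basic image parameters directly from the first image file directory of a TIFF file, using only the Python standard library. It is a minimal sketch, not one of the tools discussed in this article: the function name read_basic_tiff_tags is hypothetical, only single-valued SHORT and LONG fields are handled, and multi-valued fields (for example, BitsPerSample in an RGB image) are skipped.

```python
import struct

# Baseline TIFF tag numbers for a few basic image parameters.
TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
}


def read_basic_tiff_tags(data: bytes) -> dict:
    """Return selected tag values from the first IFD of a TIFF byte string."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    found = {}
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, count = struct.unpack_from(endian + "HHI", data, entry)
        if tag not in TAGS or count != 1:
            continue  # skip unknown tags and multi-valued fields
        if field_type == 3:  # SHORT: 16-bit value stored in the entry itself
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
        elif field_type == 4:  # LONG: 32-bit value stored in the entry itself
            (value,) = struct.unpack_from(endian + "I", data, entry + 8)
        else:
            continue
        found[TAGS[tag]] = value
    return found
```

A caller would pass the bytes of an image file, for example the result of reading a file opened in binary mode, and could then map the returned values onto the corresponding Z39.87 elements instead of keying them in by hand.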
Automating Technical Metadata Collection: The Automatic Exposure Initiative
The first step toward automating technical metadata collection is the identification of a target metadata element set. The Z39.87 element set fills that role as a recognized, community-created, soon-to-be international standard.4 The accompanying MIX schema serves as a placeholder for institutions to record the information, especially within METS. Problem one solved. Yet until recently, two problems continued to impede progress on automated metadata extraction: the inability of capture devices to record the range of technical metadata required to support long-term preservation and management; and the inability to easily expose and capture the metadata that does exist in digital image files. To address both of the remaining problems, RLG formed the Automatic Exposure initiative.
The Automatic Exposure initiative helps institutions meet the technical metadata imperative by pursuing a variety of implementation strategies. The initiative engages manufacturers of high-end scanners and digital cameras in a dialog about how their products can automatically capture technical metadata and make it available for transfer into digital repositories and asset management systems. Furthermore, it identifies existing or emerging technologies for harvesting technical metadata developed at individual institutions or by the industry, and explores how those tools could be leveraged to serve the entire community. NISO as the custodial home for NISO Z39.87 co-sponsors the initiative, and the Digital Library Federation (DLF) and the Museum Computer Network (MCN) pledged their support from the outset.
In the first phase of Automatic Exposure, RLG distributed an informal survey in June 2003 to identify stakeholders, current practices and common digital capture devices in the community. Despite limited circulation, well over 100 responses were received. In summary, the responses verified that capturing technical metadata tends to be a manual, time-consuming process. All of the responding institutions wholeheartedly subscribed to the value of recording technical metadata, yet only a minority had the ability to capture the technical properties of files even at the most basic level.
The survey responses drove subsequent activities, including a white paper outlining the problems, solutions, and opportunities to be investigated as a part of the Automatic Exposure initiative. Further, the survey identified the prevalent digital capture devices used in the cultural heritage community, thereby providing a “shortlist” of manufacturers with which to work over the course of the initiative. Finally, the white paper identified a number of software programs that have been developed by or for cultural heritage institutions to help them expose and export technical metadata for image files. The compiled list represents significant work of the community to create a suite of tools necessary to support digital preservation. More importantly, most of these tools have been developed using the open source model and are available for use by other cultural heritage institutions.
New Dialog with Device Manufacturers
In the dialog with manufacturers, the project has aimed to find common interests in recording metadata and making it available for further processing, such as ingest into preservation systems. While the cultural heritage community has defined a standard metadata element set for digital preservation in Z39.87, the industry has launched a number of initiatives that promise to deliver self-describing digital files, or files that carry within their code vital information about their origination, content, access rights, etc. In some instances, these initiatives propose metadata element sets that include tags relevant to digital preservation (such as DIG35 or EXIF); in other instances, they propose specific or generic transfer mechanisms for self-describing metadata (such as the XML box in JPEG 2000’s JP2 and JPX file types or Adobe’s eXtensible Metadata Platform, Adobe XMP). The industry at large and the manufacturers of digital capture devices have already made an investment by developing and implementing some of these technologies, though a review of most industry initiatives revealed that none of the existing specifications delivers the complete metadata set crucial for digital preservation as outlined in NISO Z39.87.5
Over the course of the initiative, many device manufacturers have responded positively to the invitation to participate, among them Betterlight, Creo/Leaf, HP, Kirtas Technologies, Kodak, Sinar Bron, and PhaseOne. Assistance from experts such as Franziska Frey (Rochester Institute of Technology) and Don Williams (Eastman Kodak) has been instrumental in further connecting this community-based effort with industry. Although no promises have been made, each of the above-named device manufacturers has expressed interest in responding to the needs of this community. Future hardware and software developments from these manufacturers will tell the story, and cultural heritage institutions are urged to contact device manufacturer representatives and emphasize their needs.
Looking to the Future — New File Formats and New Tools
Recently, several device and imaging software manufacturers have announced plans to develop new, “archival” file formats such as the newly introduced Adobe Digital Negative (DNG) raw file format or those that will be created through the new Picture Archiving and Sharing Standard (PASS) group. We hope that this convergence in interests presents us with the opportunity to work with the new initiatives so that future file formats can supply a complete Z39.87 technical metadata set.
At the same time, the community cannot afford to wait for a “magic bullet” file format because it is unlikely to come. Instead, institutions should become familiar with existing tools that will assist with metadata exposure and extraction. The Spotlight below contains a list of metadata harvesting tools that are available for use now. In addition, RLG will be releasing two new tools as a part of the Automatic Exposure initiative. The first tool, the “Automatic Exposure Scorecards,” will profile and review the available technologies for capturing technical metadata. Scorecards will be available on the Automatic Exposure web site in the coming months. A second tool under development is a Z39.87-Adobe Extensible Metadata Platform (XMP) panel that will extend the metadata handling capabilities of Adobe Photoshop, a commonly used software package in the cultural heritage digitization process. When completed, this tool will be announced in a future issue of RLG DigiNews and will be freely available on the RLG web site.
1 The Digital Imaging Group (DIG) merged with the Photographic and Imaging Manufacturers Association (PIMA) to form the International Imaging Industry Association (I3A) in July 2001.
2 Both the Automatic Exposure survey and the survey conducted by the Implementation subgroup of the PREMIS working group found this alternate use of the Z39.87 data dictionary.
3 Automatic Exposure: Capturing Technical Metadata for Digital Still Images. Mountain View, CA: RLG, 2004. See Appendix 3: Kodak’s Professional Camera Metadata Survey (2002), Appendix 4: Kodak’s 2002 Consumer Digital Camera Metadata Survey, and Appendix 5: Kodak’s 2003 Consumer Digital Camera Metadata Survey.
4 Though the NISO acronym stands for the National Information Standards Organization, the standards this organization creates are largely those utilized in the networked environment. The “national” label is therefore misleading: the standards this organization creates and supports have international support and impact.
5 Though none of the existing specifications delivers the complete technical metadata set as outlined in NISO Z39.87, all of them cover at least some of the data elements specified there.