This site was frozen in 2004 and is now out of date. Please go to RLG's current Web site for all information. Questions? Contact us.
RLG and Preservation

Working Group on Preservation Issues of Metadata

Final Report
May, 1998


WORKING GROUP MEMBERS: Barbara Berger, Cornell University; Jim Coleman, Stanford University (Co-Chair); Willy Cromwell-Kessler, RLG (Co-Chair); Robin Dale, RLG; Bob DeCandido, New York Public Library; Carla Montori, University of Michigan; Seamus Ross, University of Glasgow.

Original Charge to Working Group

BACKGROUND

Digital materials are increasingly important in the development of research collections. In particular, the preservation and reformatting community is in the process of incorporating digitization into their repertoire along with microfilming efforts. A significant component of creating and managing digital collections is ensuring that the information essential to their continued use is preserved in an accessible form. The Working Group on Preservation Issues of Metadata was constituted in May 1997 as a first step in the process of addressing this issue. The group was asked to identify the descriptive data elements that should be associated with digital master files that have preservation-based intent.

It is a commonplace that metadata serves many purposes, but to date the main emphasis has been on defining elements essential for discovery and retrieval. Consequently, the starting place for the group was to examine two prominent metadata systems that purport to offer a set of "core" elements necessary for discovery of resources: the Dublin Core elements and the Program for Cooperative Cataloging's USMARC-based core record standard. The group decided to specify the elements, beyond those in these core element lists, that are important to serve preservation needs for digital masters. The list of data elements below is the result of this process.

Simultaneously, another group, the RLG Working Group on Preservation and Reformatting Information (PRI), was examining the mechanism for sharing preservation information through the medium of the USMARC record. Consequently, the Metadata Working Group also took care to ensure that its recommendations would be compatible with the work of PRI.

SCOPE

Since the concept of metadata takes in a lot of territory, the Working Group had to begin by defining the constraints that should govern the scope of its activity:

Technological constraints
Given the fact that the relevant technologies are in a state of ongoing and rapid development and that digitization efforts are still evolving in many respects, the group limited its task as follows:

The Working Group concluded that it is premature to make recommendations concerning the way that preservation information should be stored. Such information may be included in a header of a digital file, it may exist in some separate but linked format, or it may be incorporated in a USMARC cataloging record that may or may not be linked to a corresponding digital file.

The Working Group noted that many categories of information important to preservation needs might be automatically captured at the point of digitization, and it supports efforts to define a preservation standard for the formatting and retention of such information. The Working Group particularly noted the efforts of the Society of Motion Picture and Television Engineers (SMPTE) to define a universal preservation format for videos as an important step in this direction. However, it is too early for this report to attempt to take such work into account in the preparation of its recommendations.

Format constraints
The Working Group also limited itself to a consideration of data elements that describe digital image files. Doing so allowed the group to address the most significant need within a timeframe short enough to be meaningful. Members also agreed that it would be most efficient to constitute other specialist groups to supplement the list of data elements, adding elements for other formats (e.g., audio files, moving images) as the need becomes more pressing.

Functional constraints
Members of the Working Group noted that information that is not specifically related to preservation tasks may be of potential interest to the preservation community. For example, copyright and use restriction information can be crucial and might appropriately be recorded at the time that preservation staff are creating the digital master. Members concluded that since the scope of such information often exceeds preservation needs, it should more appropriately be dealt with by other specialist groups. However, data elements that might serve other purposes as well are included as long as they address a core preservation information need.

SUPPORTING RECOMMENDATIONS

As a result of the above considerations, the group endorses the following recommendations:

PRESERVATION METADATA ELEMENTS

The following list of sixteen elements represents information that the working group deems crucial to the continued viability of a digital master file. Institutions may exceed this list or not, but the Working Group recommends that all the enumerated elements that are relevant to a specific file be recorded.

Since it is recognized that these elements may be recorded according to the specifications of any one of a number of metadata systems, no effort has been made to specify syntax. The list below, including examples, is meant to provide a semantic framework only. The format of the examples is intended to be illustrative, not prescriptive. In order to demonstrate how the list might be used, possible implementations are included in the attached appendices.

1. DATE

DEFINITION: Date file is created

FORMAT: yyyyddmm

2. TRANSCRIBER

DEFINITION: Required: Name of the agency responsible for transcribing the metadata. Optional: may include identification of individual transcribing metadata.

EXAMPLE: Stanford University Libraries. Conservation and Preservation Dept. ; BLK.

3. PRODUCER

DEFINITION: Required: agency responsible for the physical creation of the file. One agency may have caused the file to be created by a second (possibly commercial) agency. In this case, record the name of the agency responsible for the actual creation of the file, not the delegating agency. Optional: May additionally identify the individual primarily responsible for scanning, etc.

EXAMPLE 1 (Research Library with in-house scanning operation; includes initials of scanner): Stanford University Libraries. Conservation and Preservation Department; KES

EXAMPLE 2 (Commercial firm to which scanning has been outsourced): Luna Imaging, Inc., 1315 Innes Place, Venice, CA 90291-3617, USA

4. CAPTURE DEVICE

DEFINITION: Indicate make and model of digital camera or scanner

EXAMPLE: Kronton 3012

5. CAPTURE DETAILS

DEFINITION 1 (Capture device is a scanner): Name scanner software, including version information; give scanner settings, gamma correction, and other relevant details pertaining to scanning

EXAMPLE: PixelCraft Proimager 8000

DEFINITION 2 (Capture device is a digital camera): Give lens type, focal length, and light source type, and indicate if image is tiled.

EXAMPLE: Nikon 24mm lens; high frequency fluorescent studio camera lights, Videsence, model Pl330, with Osram 55 watt 3200 degree color temperature

6. CHANGE HISTORY

DEFINITION: A record of modifications made to the file, and significant versions generated, identifying the person/institution who made them and the date they were made.

EXAMPLE 1: Original digital master image file migrated from TIFF v.X to TIFF v.X+1 using YYY software by JWC on 20010206.

EXAMPLE 2: Printing file created from original digital master using YYY software by JWC on 19990411. Color bars cropped out, pixel dimensions retained, image sharpened.

7. VALIDATION KEY

DEFINITION: A mechanism, usually consisting of a number, that allows one to verify that an electronically transmitted file is what it purports to be, i.e., that the file is what is described in the metadata. At the simplest level, such a key might consist of the number of lines in a file (similar to the way that one indicates the number of pages that are transmitted via fax). Especially prevalent is the use of a checksum, which is an algorithm based on a manipulation of the sum of the bits that make up a file to yield a number that serves as a unique identifier for that file.

EXAMPLES: Standard internet checksum; Roland checksum
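
As a rough illustration of the two kinds of key described above, the sketch below computes a simple line count and a 16-bit one's-complement checksum. It assumes that "standard internet checksum" refers to the algorithm described in RFC 1071. The function names and the sample bytes are hypothetical and are not part of the report, which does not prescribe an algorithm.

```python
def line_count(data: bytes) -> int:
    """Simplest validation key: the number of lines in the file."""
    return len(data.splitlines())


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:                  # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:                 # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF             # one's complement of the folded sum


def matches_recorded_key(data: bytes, recorded_key: int) -> bool:
    """Compare a freshly computed checksum with the key stored in the metadata."""
    return internet_checksum(data) == recorded_key


sample = bytes.fromhex("0001f203f4f5f6f7")   # hypothetical file contents
key = internet_checksum(sample)              # value that would be recorded
print(f"{key:04x}", matches_recorded_key(sample, key))
```

A 16-bit sum of this kind can collide, so in practice it detects accidental corruption rather than guaranteeing a truly unique identifier for the file.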

8. ENCRYPTION

DEFINITION: Technique by which data is scrambled before transmission in order to ensure privacy. Encrypted data must be unscrambled (decrypted) by the receiver. If a file is encrypted, the type of encryption should be indicated.

EXAMPLE: RSA Public Key Cryptosystem

9. WATERMARK

DEFINITION: Indicate whether or not some bits in the file have been altered in order to create a "digital fingerprint" that can serve to establish ownership of an image and prevent unauthorized use.

EXAMPLES: Watermark by Digimarc Professional, Watermark by Invisible Ink for Images

10. RESOLUTION (e.g. pixel dimensions, dpi, ppi)

DEFINITION: Traditionally determined by the number of pixels used to represent the scanned image, expressed as pixel dimensions, pixels per inch or dots per inch. Current research into the use of Modulation Transfer Function (MTF - a function of the spatial wave number) to measure resolution should allow a more objective numerical value to be assigned as the measurement.

EXAMPLES: 4096 x 6144 pixels; 600 dpi; 320 dpi
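
The relationship between pixel dimensions and pixels per inch can be sketched as simple arithmetic. The example below is illustrative only: the function names are hypothetical, and the 20 x 25 cm source size is borrowed from the SOURCE examples in this list rather than from any measured scan.

```python
CM_PER_INCH = 2.54


def pixel_dimensions(width_cm, height_cm, ppi):
    """Pixel dimensions produced by scanning a source of the given size."""
    width_px = round(width_cm / CM_PER_INCH * ppi)
    height_px = round(height_cm / CM_PER_INCH * ppi)
    return width_px, height_px


def effective_ppi(pixels, length_cm):
    """Pixels per inch implied by a pixel count along one edge of the source."""
    return pixels / (length_cm / CM_PER_INCH)


# A 20 x 25 cm source scanned at 600 ppi.
print(pixel_dimensions(20, 25, 600))
# Working backwards from the pixel count to the scanning resolution.
print(round(effective_ppi(4724, 20)))
```

Recording both the pixel dimensions and the source size lets a later user recover the scanning resolution even if only one of the two figures was stored in the file itself.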

11. COMPRESSION

DEFINITION: Indicate whether or not the file has been compressed (i.e. reduced in size), and if it has, identify the level and method of compression.

EXAMPLES: LZW; JPEG, compression level 10 (Corel Photopaint)

12. SOURCE

DEFINITION: Describe physical characteristics of the source such as its size, condition, and its place in the chain (e.g., original, copy, or copy of a copy). Include information about modifications made to the source to enable better digitization. For images of photographs and digitized microforms, include image type (i.e., positive or negative image).

EXAMPLES: Photocopy, 20 x 25 cm.; Original, waterstained, 18 x 22 cm.

13. COLOR

DEFINITION: Indicate pixel depth.

EXAMPLES: 1-bit; 8-bit
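
To show why pixel depth matters for a digital master, the sketch below computes the number of distinct values a pixel can take and the approximate uncompressed size of an image. The function names are hypothetical, and the size calculation ignores file headers and any row padding a particular file format may add.

```python
def tonal_levels(bit_depth: int) -> int:
    """Number of distinct values one pixel can take at this pixel depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_px: int, height_px: int, bit_depth: int) -> int:
    """Approximate raw image size in bytes (no header, no row padding)."""
    bits = width_px * height_px * bit_depth
    return (bits + 7) // 8   # round up to whole bytes


for depth in (1, 8):
    print(depth, tonal_levels(depth), uncompressed_bytes(4096, 6144, depth))
```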

14. COLOR MANAGEMENT

DEFINITION: Identify system, if any, that is used to improve consistency of color across capture, display and output of an image

EXAMPLES: Photo CD; OptiCal (color management system); Profile/80 (color sync profile maker); Softproof (Photoshop Plugin)

15. COLOR BAR/ GRAY SCALE BAR

DEFINITION: Indicate presence or absence of either and, if present, identify the type.

EXAMPLES: Kodak Q13 or Q14 Color Separation Guide and Gray Scale; Kodak Q60 Color Input Target

16. CONTROL TARGETS

DEFINITION: Include information about targets included in the scanned file for purposes of quality control, calibration, verification, etc.

EXAMPLES: AIIM Scanning Test Chart #2; RIT Alphanumeric Resolution Test Object, RT-1-71; IEEE Std 167A-1995 Standard Facsimile Test Chart
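
One way an institution might hold the sixteen elements while deciding which are relevant to a given file is sketched below. This is a minimal sketch under stated assumptions, not a format recommended by the Working Group: the class name, the field names, and the helper function are all hypothetical, and every value is stored as free text because the report specifies semantics rather than syntax.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PreservationMetadata:
    """The sixteen elements, in the order listed above (all optional)."""
    date: Optional[str] = None
    transcriber: Optional[str] = None
    producer: Optional[str] = None
    capture_device: Optional[str] = None
    capture_details: Optional[str] = None
    change_history: Optional[str] = None
    validation_key: Optional[str] = None
    encryption: Optional[str] = None
    watermark: Optional[str] = None
    resolution: Optional[str] = None
    compression: Optional[str] = None
    source: Optional[str] = None
    color: Optional[str] = None
    color_management: Optional[str] = None
    color_bar_gray_scale_bar: Optional[str] = None
    control_targets: Optional[str] = None


def unrecorded_elements(record: PreservationMetadata) -> List[str]:
    """Names of elements that have not been recorded for this file."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Values taken from the examples in the list above.
record = PreservationMetadata(
    capture_device="Kronton 3012",
    resolution="600 dpi",
    compression="LZW",
    color="8-bit",
)
print(unrecorded_elements(record))
```

A list of unrecorded elements such as this could prompt staff to confirm that each omission is deliberate, that is, that the element is not relevant to the file in question.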


Appendix 1: Dublin Core Implementation

Presented below is an effort to incorporate the metadata elements enumerated in the body of the report into a Dublin Core record template. Some data elements have been created as extensions to currently agreed Dublin Core metadata elements and are tagged as RLG (for RLG Preservation Metadata) elements rather than DC elements for illustrative purposes.

This example is not intended to be prescriptive, but to suggest directions that might be explored further and experimented with more extensively. There are undoubtedly a number of alternative ways to embed preservation metadata into Dublin Core records, ranging from simple links to associated files to more elaborate container architectures. Shared experiments in this direction and continued discussion among the members of the preservation community might be especially fruitful in developing future guidelines.

Hypothetical Dublin Core Record Incorporating Preservation Metadata Elements

DC.Title: [Title of digitized item]
DC.Creator.PersonalName: [Author or creator of intellectual content]
DC.Creator.Role: Author
DC.Contributor.CorporateName: [Agency responsible for transcribing metadata]
DC.Contributor.Role: Transcriber (Metadata)
DC.Contributor.CorporateName: [Agency to which digitization was outsourced]
DC.Contributor.Role: [Producer]
DC.Contributor.CorporateName.Address: [Address of outsourcing agency]
DC.Publisher: [Institution responsible for digitization]
DC.Date: [date digital preservation copy created--YYYY-DD-MM]
DC.Form: Image

RLG.Form.Capture: [Make and model of scanner or digital camera and relevant capture details]
RLG.Form.Validation: [Validation Key, Watermark]
RLG.Form.Encryption: [Encryption technique]
RLG.Form.Compression.Method: [e.g., JPEG, LZW]
RLG.Form.Compression.Level: [value including capture device information that makes this information meaningful]
RLG.Form.Color: [The color palette with which the associated image or information is rendered]
RLG.Form.ColorManagement: [Associated color management systems]
RLG.Form.Resolution: [e.g., pixel dimensions, dpi, ppi, mtf]
RLG.Form.Modification: [Change History]
DC.Description: [Color Bar/Gray Scale Bar; Control targets]
DC.Identifier: [URL of document if metadata not carried in header]
DC.Source.Date: [Date of print version that is digitally reproduced]
DC.Source.Publisher: [publisher of print version that is digitally reproduced]
RLG.Source.Condition: [Physical condition of source item, etc.]

NOTE: Alternatively, instead of source use RELATION element to identify print version:

DC.Relation
DC.Relation.Type: IsVersionOf
DC.Relation.Identifier: [e.g., catalog record no. for original]
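
The template above could be filled in mechanically from recorded values. The sketch below is one hedged way to do so: the labels are copied from the hypothetical record above, while the dictionary keys and the function name are assumptions introduced purely for illustration.

```python
# Labels copied from the hypothetical Dublin Core record above;
# the short keys on the left are illustrative assumptions.
DC_LABELS = {
    "date": "DC.Date",
    "capture": "RLG.Form.Capture",
    "validation": "RLG.Form.Validation",
    "encryption": "RLG.Form.Encryption",
    "compression_method": "RLG.Form.Compression.Method",
    "compression_level": "RLG.Form.Compression.Level",
    "color": "RLG.Form.Color",
    "color_management": "RLG.Form.ColorManagement",
    "resolution": "RLG.Form.Resolution",
    "change_history": "RLG.Form.Modification",
    "targets": "DC.Description",
    "source_condition": "RLG.Source.Condition",
}


def render_record(values: dict) -> list:
    """Return 'Label: value' lines for the elements actually recorded."""
    lines = []
    for key, label in DC_LABELS.items():   # keep the template's order
        if key in values:
            lines.append(f"{label}: {values[key]}")
    return lines


recorded = {"resolution": "600 dpi", "compression_method": "LZW", "color": "8-bit"}
print("\n".join(render_record(recorded)))
```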

Appendix 2: Preservation-Related Metadata Recorded In USMARC Records

The templates below offer maps of the 16 Preservation Metadata Elements (described previously) to a USMARC record. Bracketed numbers correspond to the list of the 16 recommended data elements.

Please note the following points:

Please also note that the RLG Working Group on Preservation and Reformatting Information (PRI), which is explicitly concerned with the USMARC record, has prepared a discussion paper for ALA's Machine Readable Bibliographic Information (MARBI) Committee which would extend the 007 in order to include in coded form much of the information that must otherwise be included in variable data fields. The PRI Working Group is also preparing examples demonstrating a potential standard configuration of the 533 field that could be used in conjunction with the extended 007. The adoption of these proposals would considerably simplify the addition of information corresponding to the recommended preservation metadata elements.

TEMPLATE 1: Description of digital master added to record for hard copy (monograph).

040 NUC$dNUC [2]
100 1 Author, Major.
245 12 A very important book /$cby Major Author; edited by Serious Scholar.
250 4th ed., rev.
260 London :$bProminent Publisher,$c1854.
300 672 p. :$bill. ;$c28 cm.
500 Includes index and bibliographies.
533 Computer file. $bBig City: $cBig University Preservation Dept. $d1997. $f(Scanning Project Series ; 34556)$nChange history [6]. $n795 image files; Capture device [4] and details [5]; Validation key [7]; Encryption [8]; Watermark [9].$nResolution [10]; compression [11]; color [13]; color management details [14].$nPresence/type of targets [16], color bar/gray scale bar [15].
583 $b1997-10-10 [1]; $lScanned $xImage Outsourcing Co., 1234 Industrial Park St., Big City, CA [3]; $xCapture device operator [3]
590 Big City Univ. copy: Pages 2-4 lacking. [12].
650 0 Subject 1
650 0 Subject 2
700 10 Scholar, Serious.
830 0 Scanning project series ;$v34556.
856 41 $uhttp://www.abcd.edu/library/dlib/authorm1.tif

TEMPLATE 2. Separate computer file record for digital version.

040 NUC$dNUC [2]
100 1 Author, Major.
245 12 A very important book $h[computer file] /$cby Major Author; edited by Serious Scholar.
260 University Town, CA :$bBig University Preservation Dept.,$c1997 $e(Big City (1234 Industrial Park St., Big City 94025) [3] :$fImage Outsourcing Co.) [3]
256 Data (795 image files)
440 0 Scanning project series ;$v34556
538 Change history [6]
538 Capture device [4] and details [5]; validation key [7]; encryption [8]; watermark [9].
538 Compression [11]; resolution [10]; color [13]; color management details [14].
500 Presence/type of control target [16], color bar/gray scale bar [15].
534 $pDigital reproduction of: $b4th ed., rev. $cLondon: Prominent Publisher, 1854. $e672 p. : ill. ; 28 cm. $nBig. Univ. copy: p. 2-4 lacking. [12]
590 Scanned 1997-10-10. [1]
650 0 Subject 1.
650 0 Subject 2.
700 10 Scholar, Serious.
830 0 Scanning project series ;$v34556.
856 41 $uhttp://www.abcd.edu/library/dlib/authorm1.tif
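
The bracketed numbers in the templates can be read mechanically to see which USMARC field carries each element. The sketch below does this for an abbreviated excerpt of Template 1; the excerpt drops lines and subfields that carry no bracketed number, and the function name is hypothetical.

```python
import re

# Abbreviated excerpt of Template 1: only lines that carry bracketed numbers.
TEMPLATE_1_EXCERPT = """\
040 NUC$dNUC [2]
533 Computer file. $nChange history [6]. $nCapture device [4] and details [5]; Validation key [7]; Encryption [8]; Watermark [9].$nResolution [10]; compression [11]; color [13]; color management details [14].$nPresence/type of targets [16], color bar/gray scale bar [15].
583 $b1997-10-10 [1]; $lScanned $xImage Outsourcing Co. [3]; $xCapture device operator [3]
590 Big City Univ. copy: Pages 2-4 lacking. [12].
"""


def fields_by_element(template: str) -> dict:
    """Map each element number to the first field tag that mentions it."""
    mapping = {}
    for line in template.splitlines():
        tag = line[:3]                                # three-character field tag
        for number in re.findall(r"\[(\d+)\]", line):
            mapping.setdefault(int(number), tag)
    return mapping


mapping = fields_by_element(TEMPLATE_1_EXCERPT)
for element in sorted(mapping):
    print(element, mapping[element])
```

Run against Template 1, a check of this kind confirms that all sixteen elements have a home in the record and shows how heavily the 533 field is used.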


Appendix 3: XML Implementation

The model below shows how the conservation elements designated in the report might be configured in a simple XML record. The model record below would, of course, reflect the specifications of a DTD, which is not reproduced in this report. Note that the model below does not conform to the RDF specification, which would provide another significant way to present the requisite conservation data in XML format.

Model XML Record Incorporating Preservation Metadata Elements

<RLG.SOURCE_TITLE>[Title of item that is digitized]</RLG.SOURCE_TITLE>
<RLG.SOURCE_CREATOR ROLE="Author">
      <RLG.PERSONAL_NAME> [Author/creator of original item]</RLG.PERSONAL_NAME>
</RLG.SOURCE_CREATOR>
<RLG.SOURCE_PUBLISHER>[Publisher of original item]</RLG.SOURCE_PUBLISHER>
<RLG.SOURCE_DATE>[Publication date of original item]</RLG.SOURCE_DATE>
<RLG.SOURCE_CONDITION>Pages 3-5 missing; waterstained</RLG.SOURCE_CONDITION>
<RLG.DIGITIZED_VERSION URL="[URL for digitized version]">

<RLG.TRANSCRIBER>
<RLG.TRANSCRIBER_NAME>[Name of agency that transcribes metadata]</RLG.TRANSCRIBER_NAME>
</RLG.TRANSCRIBER>
<RLG.PRODUCER>
<RLG.PRODUCER_NAME>[agency that created the digitized version, e.g. outsource agency]</RLG.PRODUCER_NAME>
<RLG.PRODUCER_ADDRESS>[address of agency that created the digitized version]</RLG.PRODUCER_ADDRESS>
</RLG.PRODUCER>
<RLG.CAPTURE_DEVICE>[Make and model of digital camera or scanner]</RLG.CAPTURE_DEVICE>
<RLG.CAPTURE_DETAILS>[Details about scanner (e.g., software, version information, scanner settings, gamma corrections, etc.) or digital camera (e.g., lens type, focal length, light source type, etc.)]</RLG.CAPTURE_DETAILS>
<RLG.DATE_DIGITIZED>[yyyy-dd-mm]</RLG.DATE_DIGITIZED>
<RLG.IMAGE_DETAILS>
<RLG.VALIDATION>[Validation Key, Watermark, etc.]</RLG.VALIDATION>
<RLG.ENCRYPTION>[Encryption Technique]</RLG.ENCRYPTION>
<RLG.COMPRESSION METHOD="[Compression method]" LEVEL="[Compression level]"> </RLG.COMPRESSION>
<RLG.COLOR>[The color palette with which the associated image or information is rendered]</RLG.COLOR>
<RLG.COLOR_MANAGEMENT>[Associated color management systems]</RLG.COLOR_MANAGEMENT>
<RLG.RESOLUTION>[e.g., pixel dimensions, dpi, ppi, mtf]</RLG.RESOLUTION>
<RLG.MODIFICATION>[History of changes to digital version]</RLG.MODIFICATION>
</RLG.IMAGE_DETAILS>
<RLG.DESCRIPTION>[Color Bar/Gray Scale Bar; Control targets]</RLG.DESCRIPTION>

</RLG.DIGITIZED_VERSION>
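
A record of this shape can be generated and checked for well-formedness programmatically. The sketch below builds a small subset of the model with Python's standard xml.etree.ElementTree module. The element and attribute names come from the model above; the function name is hypothetical, the values are taken from examples elsewhere in the report, and no DTD validation is attempted.

```python
import xml.etree.ElementTree as ET


def build_digitized_version(url, capture_device, resolution, method, level):
    """Build a partial RLG.DIGITIZED_VERSION element from recorded values."""
    root = ET.Element("RLG.DIGITIZED_VERSION", {"URL": url})
    ET.SubElement(root, "RLG.CAPTURE_DEVICE").text = capture_device
    details = ET.SubElement(root, "RLG.IMAGE_DETAILS")
    ET.SubElement(details, "RLG.COMPRESSION", {"METHOD": method, "LEVEL": level})
    ET.SubElement(details, "RLG.RESOLUTION").text = resolution
    return root


record = build_digitized_version(
    url="http://www.abcd.edu/library/dlib/authorm1.tif",
    capture_device="Kronton 3012",
    resolution="600 dpi",
    method="JPEG",
    level="10",
)
xml_text = ET.tostring(record, encoding="unicode")
reparsed = ET.fromstring(xml_text)   # raises ParseError if not well formed
print(xml_text)
```

Generating the record through a library, rather than by hand, avoids mismatched or unclosed tags, since the serializer always emits balanced elements.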


Last updated May 1998
