
Issue index: Apr 15, 2007 · Dec 15, 2006 · Oct 15, 2006 · Aug 15, 2006 · Jun 15, 2006 · Apr 15, 2006 · Feb 15, 2006 · Dec 15, 2005 · Oct 15, 2005 · Aug 15, 2005 · Jun 15, 2005 · Apr 15, 2005 · Feb 15, 2005 · Dec 15, 2004 · Oct 15, 2004 · Aug 15, 2004 · Jun 15, 2004 · Apr 15, 2004 · Feb 15, 2004 · Dec 15, 2003 · Oct 15, 2003 · Aug 15, 2003 · Jun 15, 2003 · Apr 15, 2003 · Feb 15, 2003 · Dec 15, 2002 · Oct 15, 2002 · Aug 15, 2002 · Jun 15, 2002 · Apr 15, 2002 · Feb 15, 2002 · Dec 15, 2001 · Oct 15, 2001 · Aug 15, 2001 · Jun 15, 2001 · Apr 15, 2001 · Feb 15, 2001 · Dec 15, 2000 · Oct 15, 2000 · Aug 15, 2000 · Jun 15, 2000 · Apr 15, 2000 · Feb 15, 2000 · Dec 15, 1999 · Oct 15, 1999 · Aug 15, 1999 · Jun 15, 1999 · Apr 15, 1999 · Feb 15, 1999 · Dec 15, 1998 · Oct 15, 1998 · Aug 15, 1998 · Jun 15, 1998 · Apr 15, 1998 · Feb 15, 1998 · Dec 15, 1997 · Aug 15, 1997 · Apr 15, 1997
Editor's Note
 A Fond Farewell

As Jim and Lorcan have noted, this is the last issue of RLG DigiNews in its current incarnation. It really hit me last week as I sat in my living room chair early one morning, drinking coffee and reviewing the feature articles, that the job I’ve been doing for a decade has come to an end. Devoting this last issue under the editorship of Cornell University Library to reflecting on a decade of change has allowed my colleagues and me to reach closure more easily. Ten years ago, Google was neither a household name nor a verb. Mass digitization was imagined in the thousands of images rather than the millions and billions. And “digital preservation” was commonly used interchangeably with digitization. There was no OAIS, no PREMIS, no JHOVE; MTF wasn’t a term that tripped off the tongue easily. Certification referred to methylene blue tests, not trustworthy digital repositories.
Our feature articles highlight some of the changes in the two key areas consistently covered by RLG DigiNews over the years: digital imaging and digital preservation. It’s gratifying to see the progress that has occurred in our understanding of the issues, particularly as they have been informed by practical experience at a range of cultural institutions. The FAQ continues a long tradition of probing assumptions about what is and what isn’t—this time focusing on legal impediments to digital preservation and the role of Open Archives. And, if we can be forgiven for being self-referential, it’s only fitting to showcase RLG DigiNews as the last Highlighted Web Site.
Over the past decade, I’ve had the privilege of working with wonderful colleagues both at RLG and Cornell. Robin Dale served as Associate Editor from the very beginning, providing invaluable advice, support, and content along the way. Other RLG contributors included Nancy Elkington, Jennifer Hartzell, and Jane Moss. A total of seventeen staff at Cornell helped produce RLG DigiNews over the years. I’d particularly like to acknowledge the contributions of Oya Rieger (co-editor from 1997 to 2001), who co-developed the newsletter’s focus, and Nancy McGovern (co-editor from 2002 to 2006), for her deep understanding of digital preservation. Rich Entlich served as the FAQ editor, gaining well-deserved kudos for his thoughtful insights into technical dimensions of digital imaging and preservation. Peter Hirtle served as Advisor and frequent contributor, lending his expertise in intellectual property issues. Barbara Berger Eden edited the announcements and calendar of events for a number of years in her capacity as Production Editor. Carla DeMello brought her considerable design skills to bear on the look and feel of the newsletter. Others involved in production and editing included Ellie Buckley, Peter Botticelli, Jenn Colt-Demaree, Martha Crowe, John Dean, Kimberly Gazzo, Robert Glase, Valerie Jacoski, Erica Olsen, and Allen Quirk.
I can safely speak for all of my colleagues in expressing our gratitude to RLG for this decade of collaboration, to the many authors who contributed feature articles, FAQs, editor’s interviews and conference reports, and to the readers for their interest and timely feedback. We’ll continue to support RLG DigiNews as eager consumers of its newly conceived focus and direction.
Anne R. Kenney, Editor, RLG DigiNews
Feature Article 1
 Digital Imaging - How Far Have We Come and What Still Needs to be Done?
Authors: Steven Puglia - US National Archives and Records Administration (steven.puglia@nara.gov), Erin Rhodes - US National Archives and Records Administration (Erin.Rhodes@nara.gov)

Introduction
Libraries, archives, and museums have been engaged in the digitization of their collections for well over a decade now. As we look back over the past ten years, what is the best way to assess how far we have come and what work on defining digital imaging approaches still needs to be done?
This article attempts to provide a brief overview of the conceptual and technical influences that have defined digital imaging in cultural institutions during the last decade—by looking at goals and objectives for digital reformatting and how they have changed, by looking at specifications and imaging guidelines and how they have evolved, and by identifying areas that still merit further investigation. The focus is on digitization used to create raster images, as this type of work represents a very high percentage of the digitization that has been done to date.
In general, digitization has moved past the experimental, startup, standalone operation phase. Unfortunately, in many organizations, digitization projects still have not been fully realized as “mainstreamed programs.” Even if digital imaging activities are not quite institutionalized, digitization has found its place within a larger context and is directly related to work being done in the following areas:
- archival and preservation issues and activities
- managed repositories
- IT infrastructure (networks, databases, storage)
- on-going collection and digital project management and policy issues
- Web and online access; metadata and cataloging; and other digital library activities
In many institutions, digitization programs have forged relationships with allied departments, such as faculty media labs, academic computing centers, campus museums, faculty research projects, and with IT systems, such as bibliographic catalogs, collection management systems, digital asset management systems, etc. As we move towards supporting large-scale digital imaging programs organizationally, technically, and with the dedication of more resources, there is a growing and sobering understanding of the significant investment that will be required to carry out effective digital imaging initiatives. This is especially true in the areas of staff expertise, IT systems and infrastructure, digitization and metadata specifications and standards, and the costs to create digital resources and manage them over the long term.
As digital imaging activities move beyond special collections in libraries and archives to include large-scale commercial partnerships for mass digitization of more general collections, the use and nature of these collections are being transformed as well: from fixed, discrete, unique collections into resources that will provide the groundwork for networked information, research, and services that we had not previously envisioned.
From some perspectives, a great deal of progress has been made in our understanding of digital imaging as a technology and how to use it within cultural institutions for reformatting collections and making them more accessible. Conversely, some, like Nicholson Baker, have argued that prior approaches to “preservation reformatting,” such as microfilming, were conceptually and technically flawed, and we are only carrying forward similar problems into the digitization environment.
It is undeniably true that the more we learn about digital imaging as a technology, and about digitization as an institutionalized program, the more we realize there is much more to learn. In reviewing efforts during the last decade to define digital imaging approaches, we conclude that in some areas, not a lot of progress has been made.
Goals of Digital Reformatting
“Technology” as a term has come to be synonymous with computers and information technology, particularly in the phrase “high technology.” The dictionary definition of technology is “the practical application of knowledge especially in a particular area” (Merriam-Webster online, 2007). Technology is never “THE answer” to our problems in cultural heritage institutions (and in many ways the same can be said for life in general): technologies are simply the tools available to us to address problems. Over time, the nature and types of tools change, and so does our understanding of problems and how best to address them. The goal is to become sophisticated users of all appropriate tools, by selecting and using them wisely. We need to acknowledge both the advantages and disadvantages of our tools, and of our corresponding technological choices for solving specific problems. What are our goals and assumptions for digital imaging in the context of cultural heritage institutions?
Early digital imaging efforts focused markedly on the technology itself: on the technological feasibility of scanning, on defining work processes, and on accomplishing the actual conversion, rather than on the bigger-picture questions of how best to use digitization in an institutional, preservation, or other particular context. The complete range of issues that must be addressed for digitization to be an effective tool within our institutions was not initially tackled. From an imaging perspective, addressing them entails asking ourselves: what are the essential characteristics of originals that we want to replicate and carry forward in the digital copy?
Essential characteristics will inform future users about the original resources. The definition and selection of essential characteristics are informed primarily by curatorial/archival and preservation decisions. Often, they are defined by a variety of physical and qualitative properties (for photographs, properties such as generation, size, quality, condition, and intended use). They will also be unique to the collection, record, or media type, and at times are likely to be institution-specific. In all cases, they should be well defined and appropriate to the original resources. In the context of using digitization for preservation reformatting, the ability to define the essential characteristics of originals at a very high level allows us to determine whether the digital copy could truly “stand in” for the original.
The identification and definition of essential characteristics for collection materials in cultural institutions is not new. Approaches to analog preservation reformatting have included specific conceptual rationales regarding essential characteristics and corresponding imaging approaches to reproduce those characteristics. For example, industry standards relating to microfilming documents and preservation microfilming guidelines (http://www.loc.gov/preserv/usnpguidelines.html, http://www.oclc.org/preservation/microfilming/standards/default.htm, and http://www.archives.gov/about/regulations/part-1230.html#partc) focus on the essential characteristic of text legibility. Micrographics standards and guidelines outline specific imaging approaches to maintain this characteristic on the microfilm.
Early approaches to digital imaging came from primarily two perspectives: jump in and just do it, or conduct a pilot project to learn about the technology and the process and then to build a program around these experiences. Each of these approaches has fostered different cultures of digital imaging programs within libraries, archives, and museums. For the preservation community in particular, initial forays into digitization were in many ways an extension of brittle book reformatting projects—should we scan rather than microfilm? Early on, the community recognized that digitization increases and enhances access, but does not guarantee preservation in the same way as microfilm. 
Coming from a long tradition of microfilming, there was much initial interest in digitizing text. Early digitization approaches in the library community focused on defining essential characteristics for text-based originals and corresponding approaches to digitizing to match these characteristics. This includes work done at Cornell and Yale universities in the mid-1990s. In researching the scanning of text-based originals directly, the scanning of microfilm, and the feasibility of hybrid approaches (film-first and then scan, compared to scan first with the subsequent creation of computer output microfilm or COM), these projects focused on two essential characteristics—legibility of the smallest significant character and accurate rendering of type faces or fonts—as metrics for evaluating the appropriateness of specific digital imaging parameters.
While these are appropriate characteristics for high-contrast, text-based information, other types of originals have other or additional characteristics. Generally, early digitization guidelines adopted the approaches developed for high-contrast text. Many institutions simply accepted specifications developed from these projects without much consideration of the particular originals or formats being digitized. In this phase of digital imaging in libraries, in many ways we were only asking: what is the digital equivalent of microfilm? Given the other technological and economic limitations everyone was wrestling with in implementing large-scale digitization initiatives, even achieving the digital equivalent of microfilm seemed a major challenge.
Cornell University Library’s workshops on digitization in the 1990s provided the community with an excellent technical foundation for digitizing library collections. A major component of the workshops was an approach to defining essential characteristics, called benchmarking. The benchmarking concept was applied initially to text, then to graphic illustrations, and finally to photographs. The process focused on identifying the smallest significant character or feature as the primary metric for determining sampling frequency or spatial resolution. The benchmarking approach has had a significant influence on the development of digitizing guidelines over the last 10 years. The Digital Library Federation adopted recommendations for spatial resolution and bit depth as the Benchmark for Faithful Digital Reproductions of Monographs and Serials based on the extensive work conducted by Cornell and other institutions.
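Benchmarking ties sampling frequency to the size of the smallest significant character. As a minimal sketch, assuming the Quality Index (QI) formula as it is commonly cited from the Cornell benchmarking work (h is the character height in millimetres, 0.039 converts millimetres to inches, and the divisor is 3 for bitonal and 2 for grayscale or color capture), the calculation and its inverse look like this. The function names are illustrative, and the constants should be checked against the 1995 CLIR report listed in the chronology before being used for production decisions.

```python
# Sketch of the benchmarking Quality Index (QI) calculation, assuming the
# commonly cited form: QI = dpi * 0.039 * h / divisor, with h in millimetres,
# divisor 3 for bitonal capture and 2 for grayscale or color capture.
MM_TO_INCH = 0.039  # approximate millimetre-to-inch factor used in the formula


def _divisor(bitonal: bool) -> float:
    """Bitonal capture needs more samples per character than grayscale/color."""
    return 3.0 if bitonal else 2.0


def quality_index(dpi: float, char_height_mm: float, bitonal: bool) -> float:
    """QI achieved for the smallest significant character at a given dpi."""
    return dpi * MM_TO_INCH * char_height_mm / _divisor(bitonal)


def required_dpi(target_qi: float, char_height_mm: float, bitonal: bool) -> float:
    """Invert the formula: sampling frequency needed to reach a target QI."""
    return target_qi * _divisor(bitonal) / (MM_TO_INCH * char_height_mm)


if __name__ == "__main__":
    # Hypothetical smallest significant character: 1 mm high.
    # QI 8 is the level usually described as "excellent" in the micrographics scale.
    for bitonal in (True, False):
        dpi = required_dpi(8, 1.0, bitonal)
        print("bitonal" if bitonal else "grayscale", round(dpi))
```

The same character needs a lower sampling frequency in grayscale than in bitonal mode, which is one reason benchmarking produced different recommendations for different capture modes.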
Unfortunately, over the last ten years the digital equivalent of microfilm has not held up as an entirely acceptable model for digitizing text-based originals. Users are demanding that other essential characteristics, such as color and the ability to see fine detail, be carried forward in the digital versions. Approaches to digitization are moving towards considering both user expectations and the characteristics of the original resources, rather than being tied to an approach that replicates an earlier technology.
Another example of an analog reformatting approach that replicates essential characteristics is the National Archives and Records Administration’s (NARA) photographic duplication specifications (developed jointly with the Library of Congress over a period of several years with input from commercial vendors – available at http://www.archives.gov/preservation/formats/bw-copying-specs.pdf) for historic negatives. The NARA specifications and other approaches to duplicating still photographic negatives focus on the creation of duplicate negatives that have the same photographic properties as the original negatives. So that the photographic duplicates can be used and printed in the darkroom just like the originals, certain photographic properties are considered essential characteristics.
As described in the duplication specifications, we adopted a new approach to tone reproduction for the duplicates, called shadow normalization. Traditional duplication approaches were set up to create duplicate negatives that matched the original negatives in overall density, density range, and the relationship between the tones of the image. To optimize the duplication process for original negatives with large density ranges, and to provide a means of objective assessment of the duplicates (using statistical process control), we opted to adjust the exposure for each negative and place the shadow density at a specified aimpoint on the duplicates. We concluded that the benefits of this approach outweighed the loss of one of the essential characteristics: the duplicates no longer had the same overall density as the originals (unless, by coincidence, the shadow density of the original was close to the aimpoint density).
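The trade-off in shadow normalization can be made concrete with a simple numeric model. The sketch below assumes, purely for illustration, a unit-gamma duplication process in which changing the exposure shifts every density on the duplicate by the same amount; real duplicating films have their own characteristic curves. The aimpoint of 0.30 and all density values are hypothetical, not figures from the NARA specification. It also shows one simple statistical process control check: limits at plus or minus three standard deviations around the mean measured shadow density of a run of duplicates.

```python
from statistics import mean, stdev


def normalize_to_shadow_aimpoint(original: list[float], aimpoint: float) -> list[float]:
    """Predict duplicate densities when exposure places the shadow on the aimpoint.

    Assumes a hypothetical unit-gamma model: exposure changes add the same
    density offset everywhere, so density range and tonal relationships are
    kept, but overall density generally is not.
    """
    shadow = min(original)  # on a negative, the shadows are the thinnest areas
    offset = aimpoint - shadow
    return [d + offset for d in original]


def control_limits(measured_shadows: list[float]) -> tuple[float, float]:
    """Lower and upper limits at the mean plus or minus three standard deviations."""
    center, spread = mean(measured_shadows), stdev(measured_shadows)
    return center - 3 * spread, center + 3 * spread


if __name__ == "__main__":
    original = [0.45, 1.10, 1.80]  # hypothetical shadow, midtone, highlight densities
    duplicate = normalize_to_shadow_aimpoint(original, aimpoint=0.30)
    print("duplicate densities:", [round(d, 2) for d in duplicate])
    print("density range kept:", round(max(duplicate) - min(duplicate), 2))

    run = [0.29, 0.31, 0.30, 0.32, 0.28]  # hypothetical measured duplicate shadows
    low, high = control_limits(run)
    print("control limits:", round(low, 3), round(high, 3))
```

Because only the shadow is pinned, the duplicate matches the original's overall density only when the original shadow happens to sit near the aimpoint. In exchange, every duplicate in a run should land on the same shadow value, which is what makes a control chart meaningful.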
We have certainly moved beyond the phase of limiting ourselves to deficiencies inherent in old technologies and approaching imaging as an extension or equivalent to microfilm. As we consider digitization as a means for preservation reformatting, we will have to weigh similar considerations as we define approaches to digital imaging. Defining characteristics for each type or class of digital object will most likely result in approaches that are not as consistent or standard across different resources, and may also be more difficult to implement.
Imaging Specifications and Guidelines
The following represents a chronology of digital imaging specifications and guidelines, and of other articles and publications that have been influential. It is a fairly comprehensive list but is not intended to be the “definitive” one; we apologize for omitting any other significant documents.
1995
- Digital Resolution Requirements for Replacing Text-Based Material: Methods for Benchmarking Image Quality. CLIR Report pub53. By Anne R. Kenney and Stephen Chapman, 1995. http://www.clir.org/pubs/reports/reports.html

1996
- Conversion of Microfilm to Digital Images. Request for Proposal, Library of Congress, February 1996. http://memory.loc.gov/ammem/prpsal5/rfp5.pdf or http://memory.loc.gov/ammem/prpsal5/coverpag.html
- Requirements and Options for the Digitization of the Illustration Collections of the National Museum of Natural History. National Museum of Natural History’s Collections and Research Information System. By Donald D’Amato and Rex Klopfenstein, March 1996. http://www.nmnh.si.edu/cris/techrpts/imagopts/
- Recommendations for the Evaluation of Digital Images Produced from Photographic, Microphotographic, and Various Paper Formats. By Franziska Frey and James Reilly, May 1996. http://lcweb2.loc.gov/ammem/Ipireprt.pdf
- Digital Imaging for Libraries and Archives. Cornell University Library. By Anne R. Kenney and Stephen Chapman, June 1996. http://www.library.cornell.edu/preservation/dila.html
- Digital Images from Original Documents – Text Conversion and SGML-Encoding. Request for Proposal, Library of Congress, June 1996. http://memory.loc.gov/ammem/prpsal/rfp18.pdf or http://memory.loc.gov/ammem/prpsal/coverpag.html
- Digital Conversion of Research Library Materials – A Case for Full Information Capture. D-Lib. By Stephen Chapman and Anne R. Kenney, October 1996. http://www.dlib.org/dlib/october96/cornell/10chapman.html

1997
- Digital to Microfilm Conversion: A Demonstration Project 1994-1996. Final Report to the National Endowment for the Humanities. By Anne R. Kenney, 1997. http://www.library.cornell.edu/preservation/com/comfin.html
- Conversion of Pictorial Materials to Digital Images. Request for Proposal, Library of Congress, May 1997. http://memory.loc.gov/ammem/prpsal9/rfp9.pdf or http://memory.loc.gov/ammem/prpsal9/coverpag.html

1998
- Guidelines for Digitizing Archival Materials for Electronic Access. U.S. National Archives and Records Administration. By Steven Puglia and Barry Roginski, January 1998. Guidelines: http://www.archives.gov/preservation/technical/guidelines-1998.pdf; Matrix: http://www.archives.gov/preservation/technical/guidelines-matrix.pdf
- What is an MTF…and Why Should You Care? RLG DigiNews. By Don Williams, February 1998. http://www.rlg.org/preserv/diginews/diginews21.html#technical
- Digital Formats for Content Reproductions. Library of Congress. By Carl Fleischhauer. August 1996: http://memory.loc.gov/ammem/formatold.html; July 1998: http://memory.loc.gov/ammem/formatold.html
- Guidelines for Image Capture. Joint RLG and NPO Preservation Conference – Guidelines for Digital Imaging. By Stephen Chapman, September 1998. http://www.rlg.org/preserv/joint/chapman.html
- Manuscript Digitization Demonstration Project. By Louis Sharpe and Michael Ott for the Library of Congress, October 1998. http://memory.loc.gov/ammem/pictel/pictel.pdf or http://memory.loc.gov/ammem/pictel/index.html

1999
- Digital Imaging and Preservation Microfilm: The Future of the Hybrid Approach for Preservation of Brittle Books. RLG DigiNews. By Stephen Chapman, Paul Conway, and Anne R. Kenney, February 1999. http://www.rlg.org/legacy/preserv/diginews/diginews3-1.html#feature1
- Imaging Pictorial Collections at the Library of Congress. RLG DigiNews. By John Stokes, April 1999. http://www.rlg.org/legacy/preserv/diginews/diginews3-2.html#feature
- Illustrated Book Study: Digital Conversion Requirements of Printed Illustration. By Anne R. Kenney and Louis Sharpe for the Library of Congress, July 1999. http://memory.loc.gov/ammem/techdocs/ibs.pdf or http://www.loc.gov/preserv/rt/illbk/ibs.htm
- Digital Imaging for Photographic Collections – Foundations for Technical Standards. Image Permanence Institute. By Franziska Frey and James Reilly. December 1997 article: http://www.rlg.org/preserv/diginews/diginews3.html#com; 1999: http://www.imagepermanenceinstitute.org/shtml_sub/digibook.pdf

2000
- Image Quality Metrics. RLG DigiNews. By Don Williams, August 2000. http://www.rlg.org/legacy/preserv/diginews/diginews4-4.html#technical1
- Digital Imaging Production Services at the Harvard College Library. RLG DigiNews. By Stephen Chapman and William Comstock, December 2000. http://www.rlg.org/legacy/preserv/diginews/diginews4-6.html#feature1

2001
- Report of Imaging Practitioners Meeting on 30 March 2001 to Consider How the Quality of Digital Imaging Systems and Digital Images May be Fairly Evaluated. Digital Library Federation. By Stephen Chapman, May 2001. http://www.diglib.org/standards/imqualrep.htm
- Digital Reproduction Quality: Benchmark Recommendations. RLG DigiNews. By Daniel Greenstein and Gerald George, August 2001. http://www.rlg.org/legacy/preserv/diginews/diginews5-4.html#featured
- Guidelines for Digital Imaging Projects. University of Illinois at Urbana-Champaign, December 2001. http://images.library.uiuc.edu/resources/digitalguidev3.pdf

2002
- Benchmark for Faithful Digital Reproductions of Monographs and Serials. Digital Library Federation, December 2002. http://www.diglib.org/standards/bmarkfin.pdf or http://www.diglib.org/standards/bmarkfin.htm

2003
- Western States Digital Imaging Best Practices. Collaborative Digitization Program (formerly the Colorado Digitization Program), January 2003. http://www.cdpheritage.org/digital/scanning/documents/WSDIBP_v1.pdf
- Debunking of Specsmanship. RLG DigiNews. By Don Williams, February 2003. http://www.rlg.org/legacy/preserv/diginews/diginews7-1.html#feature1

2004
- Technical Guidelines for Digitizing Archival Materials for Electronic Access: Production Master Files – Raster Images. U.S. National Archives and Records Administration. By Steven Puglia, Jeffrey Reed, and Erin Rhodes, June 2004. http://www.archives.gov/preservation/technical/guidelines.pdf
- Digital Master Images – Sample Technical Specifications for Photograph Collections. Library of Congress, Prints and Photographs Division. Compiled by Kit Peterson, June 2004. http://www.loc.gov/rr/print/tp/DgtlMastersSamplSpecsSelctdRcmndFinal7_2004.pdf
- Standards Related to Digital Imaging of Pictorial Materials. Library of Congress, Prints and Photographs Division. Compiled by Kit Peterson, September 2004. http://www.loc.gov/rr/print/tp/DigitizationStandardsPictorial.pdf

2005
- Introduction to Basic Measures of a Digital Image for Pictorial Collections. Library of Congress, Prints and Photographs Division. By Kit Peterson, June 2005. http://www.loc.gov/rr/print/tp/IntroDgtlImage.pdf
- FDsys Specifications for Converted Content – Digitization Specifications and Operating Procedures for Archiving Materials: Creation of Preservation Master Files. U.S. Government Printing Office, June 2005. http://www.gpoaccess.gov/legacy/FDsys_ccspecs.pdf
- CDL Guidelines for Digital Images. California Digital Library. July 2001: http://chnm.gmu.edu/digitalhistory/links/pdf/chapter3/3.29b.pdf; November 2005: http://www.cdlib.org/inside/diglib/guidelines/bpgimages/cdl_gdi_v2.pdf or http://www.cdlib.org/inside/diglib/guidelines/bpgimages/
- Digitization for Preservation Reformatting of Photographs. DLF Fall Forum, BOF session. Presented by Erin Rhodes, November 2005. http://www.diglib.org/forums/fall2005/presentations/rhodes-2005-11.pdf

2006
- Technical Standards for Digital Conversion of Text and Graphic Materials. Library of Congress, December 2006. http://memory.loc.gov/ammem/about/techStandards122106.pdf
General Trends in Digital Imaging
Looking at the above chronology, we can conclude the following:
In general, the trends for digital imaging have been:
- From lower minimal spatial resolution to higher spatial resolution
- From 1-bit scanning, to grayscale scanning, and finally to color scanning
- From low-bit (8 bits per channel) to high-bit (16 bits per channel) for grayscale and color images
- From scanning for a specific purpose to digitizing in a “use neutral” manner
Digitization has been limited by the technology:
- People did as little as they could get by with (as low a resolution and/or bit depth as possible), facilitating access only.
- Minimum specifications (primarily for textual materials digitization) have been a big cost driver—scanning at lower resolutions means you can do twice as much.
- Digital storage was and is expensive. While less expensive today, high-capacity storage area networks (SAN) with automated tape libraries for backups and off-site mirroring (good IT practices for risk mitigation) all remain beyond the financial means of most cultural institutions.
- We are still wrestling with limitations of the science and technology—digital preservation repository infrastructure remains expensive and, to a large degree, undefined.
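The cost pressures listed above follow from simple arithmetic: uncompressed file size grows with the square of spatial resolution and linearly with bit depth and channel count. A minimal sketch, assuming a hypothetical 8.5 by 11 inch page and ignoring file headers and compression:

```python
def uncompressed_bytes(width_in: float, height_in: float, ppi: int,
                       channels: int, bits_per_channel: int) -> int:
    """Raster size before compression: pixels x channels x bits, in bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * channels * bits_per_channel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    page = (8.5, 11.0)  # hypothetical letter-size original, in inches
    cases = [
        ("1-bit bitonal, 300 ppi", 300, 1, 1),
        ("8-bit grayscale, 300 ppi", 300, 1, 8),
        ("24-bit RGB, 400 ppi", 400, 3, 8),
        ("48-bit RGB, 400 ppi", 400, 3, 16),
    ]
    for label, ppi, channels, bits in cases:
        mb = uncompressed_bytes(*page, ppi, channels, bits) / 1_000_000
        print(f"{label}: {mb:.1f} MB")
```

Doubling the ppi quadruples the pixel count, and moving from 8 to 16 bits per channel doubles the size again, so the progression from bitonal access scans to high-bit color masters multiplies storage needs by roughly two orders of magnitude for the same page.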
Early digitization replicated capabilities of the prior technology in the digital capture:
- Many of the early digitizing efforts matched digitization to microfilm.
- Early digitization emphasized scanning existing intermediates – a major problem with this approach is carrying forward the limitations of the previous technology (inaccurate tone reproduction and film grain), as well as carrying forward any defects in the intermediate (photographic and/or physical).
- Early guidelines were based on concepts like the Quality Index (QI) that came from the micrographics industry.
- There was an early realization that not all approaches used to assess microfilm quality worked for digital imaging, prompting a move toward spatial frequency response (SFR) and modulation transfer function (MTF) measurement and away from resolution charts.
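The move from resolution charts to SFR/MTF is a move from a pass/fail question (is this line pattern resolved?) to a measurement of how much contrast survives at each spatial frequency. The sketch below illustrates only the underlying definition, using sine-wave patterns of known modulation; it is not the slanted-edge SFR procedure described in ISO 12233, and it assumes linear (not gamma-encoded) sample values. The target modulation and the captured values are hypothetical.

```python
def modulation(samples: list[float]) -> float:
    """Michelson modulation (Imax - Imin) / (Imax + Imin) across a pattern."""
    i_max, i_min = max(samples), min(samples)
    return (i_max - i_min) / (i_max + i_min)


def modulation_transfer(target_modulation: float, captured: list[float]) -> float:
    """Fraction of the target's modulation preserved in the captured image."""
    return modulation(captured) / target_modulation


if __name__ == "__main__":
    target_mod = 0.8  # hypothetical modulation of the printed sine patterns
    captured = {  # cycles/mm -> hypothetical linear values sampled across a pattern
        1.0: [0.12, 0.50, 0.88, 0.50],
        2.0: [0.20, 0.50, 0.80, 0.50],
        4.0: [0.40, 0.50, 0.60, 0.50],
    }
    for freq, samples in captured.items():
        print(f"{freq} cycles/mm: MTF ~ {modulation_transfer(target_mod, samples):.2f}")
```

A falling curve like this says more than a single "limiting resolution" figure, because two devices with the same nominal ppi can preserve very different amounts of contrast at the frequencies that matter for a given original.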
In general, for many projects items were and still are scanned at less than the recommendations cited in the DLF benchmark. This approach contributes to the “building a critical mass” of resources perspective. Large projects often look towards scanning homogeneous materials that are easy to scan both technically and legally, which allows a large amount of data to be created. This trend has accelerated in the last few years with large-scale digitization efforts by Google, the Open Content Alliance, the Million Book Project, and the like.
The trend has been toward adopting fixed approaches rather than defining the process needed to achieve a specific result: for example, scanning all originals at a fixed high spatial resolution, rather than assessing the characteristics of specific groups of originals and adjusting the digital imaging requirements to match each group. Many assumptions in the field have become truisms, such as the notion that a fixed high spatial resolution and bit depth is a good thing, but neither guarantees quality.
More recently, the focus has been on high spatial resolution and high-bit sampling, but minimal effort has gone into defining other quality parameters. In the end, spatial resolution by itself neither defines digitization requirements nor guarantees quality. It represents the maximum spatial detail or acuity a device is capable of achieving, if designed well. Bit depth indicates only the maximum number of tones a device is capable of differentiating, and it too is no guarantee of quality. There needs to be more emphasis on ensuring the quality of the pixels, and this is still problematic for the field. Tools have not improved, and imaging still cannot be done well without experienced people.
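The point that bit depth is no guarantee of quality can be shown in a few lines. A minimal sketch, assuming tonal data that was quantized to 8 bits at some earlier step (for example by an 8-bit adjustment) and then saved in a 16-bit container: the file can hold 65,536 levels per channel, but counting the distinct values shows that only 256 are in use. The check says nothing about noise, which can make neighboring levels indistinguishable even when they are all present.

```python
def nominal_levels(bits_per_channel: int) -> int:
    """Number of code values a channel of this bit depth can hold."""
    return 2 ** bits_per_channel


def widen_8_to_16(value: int) -> int:
    """Rescale an 8-bit code value to the 16-bit range (255 maps to 65535)."""
    return value * 257


def distinct_levels(values: list[int]) -> int:
    """Number of different code values actually present in the data."""
    return len(set(values))


if __name__ == "__main__":
    ramp_8bit = list(range(256))  # a full 8-bit tonal ramp
    ramp_16bit = [widen_8_to_16(v) for v in ramp_8bit]  # saved in a 16-bit file
    print("levels the 16-bit file can hold:", nominal_levels(16))
    print("levels actually used:", distinct_levels(ramp_16bit))
```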
One conceptual approach for information capture is to regard digital imaging along a spectrum. As you move from one end of the spectrum to the other, the amount of information and the accuracy of information that is captured increases.

At one end of the spectrum is a very defined imaging environment. Capture is done to at least minimal specifications, spatial resolution is based on formats and sizes of originals, images are encoded in RGB, and images are processed in a manner to facilitate a specific output (i.e., images are adjusted for generic monitor display or for printing). In imaging science, this may be called an “output-referred” approach to the image state for the digital images. Both NARA’s 1998 and 2004 technical guidelines are based on an output-referred approach and recommend bringing all images to a common rendition that is based on generic monitor display.
At the other end of the spectrum, the imaging environment is less defined and minimal image processing is done; here the image state may be either “original-referred” or “input-referred.” Imaging may be done in a manner that relates the image more closely back to the original (although this is also possible with an input-referred image state), images may exist in optimized three-channel or multi-channel color encodings, spatial resolution is uniformly high or based on assessment of the original, and images are less processed for any particular use. Original-referred and input-referred images will need to be adjusted before they can be used, so that the display or output looks like the original. We are not entirely sure today just how to define digital image capture at this end of the spectrum. These are emerging approaches that warrant further investigation.
The unprocessed end of the spectrum will place a bigger burden on making these resources usable in the future. The more defined the output, the less work there is to do. The more open, more raw the resource, the more work it takes to make it usable. The approach of bringing images to a common rendering solves some of the usability problem.
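The extra work implied by the less-processed end of the spectrum can be illustrated with the simplest possible rendering step. A minimal sketch, assuming input-referred data stored as linear values in [0, 1] and choosing the sRGB transfer function (IEC 61966-2-1) as the "generic monitor display" target. This is not the rendition NARA's guidelines specify, and a real workflow would also convert color primaries and white point through a color management step, which this omits.

```python
def srgb_encode(linear: float) -> float:
    """sRGB transfer function (IEC 61966-2-1): linear [0, 1] -> encoded [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


def render_for_display(linear_values: list[float]) -> list[int]:
    """Output-referred 8-bit code values for a generic sRGB display."""
    rendered = []
    for v in linear_values:
        v = min(max(v, 0.0), 1.0)  # clip anything outside the displayable range
        rendered.append(round(srgb_encode(v) * 255))
    return rendered


if __name__ == "__main__":
    # Hypothetical linear (input-referred) values: black, 18% gray, white.
    print(render_for_display([0.0, 0.18, 1.0]))
```

An output-referred master has already had a step like this applied and can be displayed as-is; an input-referred master carries the unrendered values, keeping more options open but requiring every future user to perform the rendering.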
Although there is the potential for having more functionality at the less-defined end of the imaging spectrum, we still want to ensure we have captured the appropriate essential characteristics that tell us about the original resource. From a preservation reformatting perspective, we would prefer to use an approach that is original-referred.
We feel it is technically feasible to define approaches to digitization that will produce very accurate visual surrogates (for many originals, accurate visual representation is a major aspect of carrying forward the essential characteristics) and create a “good data-set” as Carl Fleischauer of the Library of Congress describes it. We are treading a fine line between a traditional approach that defines a specific visual representation and moving forward to one that is more “use neutral” and accommodates other future (but undefined and unknown) uses.
Comments on Other Aspects of Digitization Guidelines
Scanner and digital camera assessment
- Still in progress, but not as much progress as we would like.
- Early guidelines were not based on the capabilities of the equipment; the assumption was that the equipment performed at an appropriate level, even though no one was really measuring the performance of the equipment.
- NARA’s 2004 Guidelines were the first to define capture device performance parameters.
- Only have limits for noise and channel registration
- Higher limit for noise level for text docs—lower maximum density
- Lower limit for noise level for photographs—higher maximum density
- We picked limits based on actual equipment—we ran the tests on a range of scanners and digital cameras and picked numbers that were reasonable
- Other parameters used as a guide for determining the suitability of a particular capture device for a particular original
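As a rough illustration of what a noise limit means in practice, the Python sketch below estimates noise as the standard deviation of pixel values within a target patch that should be perfectly uniform, then compares it with a limit. This is a simplified stand-in, not the NARA test procedure; the limit value, the patch values, and the function names are hypothetical.

```python
import statistics

# Hypothetical limit for illustration only; the NARA 2004 guidelines define
# their own limits, which differ for text documents and photographs.
HYPOTHETICAL_NOISE_LIMIT = 1.5

def patch_noise(patch: list[list[int]]) -> float:
    """Estimate noise as the population standard deviation of the pixel
    values in a target patch that should be perfectly uniform."""
    values = [v for row in patch for v in row]
    return statistics.pstdev(values)

def passes_noise_limit(patch: list[list[int]],
                       limit: float = HYPOTHETICAL_NOISE_LIMIT) -> bool:
    """True if the measured noise does not exceed the limit."""
    return patch_noise(patch) <= limit

# A 2x2 crop from a scanned mid-gray target patch (made-up values).
gray_patch = [[100, 102], [98, 100]]
print(round(patch_noise(gray_patch), 3))  # 1.414
print(passes_noise_limit(gray_patch))     # True
```

In practice the measurement would be repeated on several patches of different densities and on each color channel, since acceptable noise depends on density.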
Viewing environment
- There has been acceptance of the standardized viewing environment defined by the graphic arts industry, if not wide adoption, implementation, and use.
- As we move toward preservation digitization, this becomes even more critical – particularly monitor calibration, if the monitor is used for a basic visual assessment as opposed to an analysis of the capture device performance.
Color management and ICC compliant workflows
- People are trying to implement color managed workflows.
- The current ICC color management process is not always useful for our work – a new CIE committee on Archival Color has been established and hopes to address the specific needs of our community.
- We still believe in doing the imaging/encoding/image state in a way that would allow us to ignore the ICC profiles – we want the option of interpreting the numerical values literally and still have reasonably accurate color and tone reproduction.
- Rendering intents – when performing color space transformations using the current ICC color management process – relative colorimetric intent is most appropriate for near neutral originals like old documents, and perceptual intent is most appropriate for photographic images.
- Color spaces – assumes RGB encoding (emerging practices may use other encodings)
- NARA’s 1998 guidelines – suggest using sRGB (by assigning), which we still think is appropriate for text documents (they tend to have a smaller color gamut and less saturated colors)
- NARA’s 2004 guidelines – moved to a recommendation of the larger-gamut AdobeRGB 1998 color space
- The future - assume in some cases an even larger gamut color space will be desirable, achievable only with high-bit sampling
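The practical difference between these encodings lies partly in their transfer functions. The sketch below uses the published sRGB (IEC 61966-2-1) encoding curve and the simple power-law gamma of 563/256 (about 2.2) used by Adobe RGB (1998); the helper names are our own. It also hints at why larger gamuts call for high-bit sampling: an 8-bit channel has 256 code values and a 16-bit channel 65,536, so spreading a larger gamut over only 256 steps leaves coarser gaps between adjacent colors.

```python
# Transfer functions for two common RGB working spaces (helper names are ours).
# sRGB: piecewise curve from IEC 61966-2-1, with a linear segment near black.
# Adobe RGB (1998): a pure power law with gamma 563/256 (about 2.2).

ADOBE_RGB_GAMMA = 563 / 256

def srgb_encode(linear: float) -> float:
    """Map linear-light intensity (0..1) to an sRGB-encoded value (0..1)."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

def srgb_decode(encoded: float) -> float:
    """Inverse of srgb_encode: encoded value (0..1) back to linear light."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4

def adobe_rgb_encode(linear: float) -> float:
    """Map linear-light intensity (0..1) to an Adobe RGB (1998) value (0..1)."""
    return linear ** (1 / ADOBE_RGB_GAMMA)

def quantize(encoded: float, bits: int) -> int:
    """Round an encoded 0..1 value to an integer code value at a bit depth."""
    return round(encoded * (2 ** bits - 1))

# An 18% gray card, measured in linear light:
print(quantize(srgb_encode(0.18), 8))   # 118
print(quantize(srgb_encode(0.18), 16))  # same tone, finer 16-bit step size
```

Note that this only covers tone encoding; the gamut difference between the two spaces comes from their different primaries, which a full conversion would handle with a 3x3 matrix in linear light.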
Reference targets
- General targets and multiple test targets have been used, but usage and implementation have varied.
- Suggestion of using targets specific to the types of originals we are scanning – for example, aged albumen target for old albumen prints – but this would be very difficult to do.
- Don Williams has worked on an integrated target – the “Golden Thread” target that integrates multiple targets so all aspects can be evaluated.
- Current work at Library of Congress, overseen by Michael Stelmach, with Don Williams and Peter Burns – capture device assessment target, image characterization target (scanned with original), and software for automated analysis – being called the Digital Image Conformance Evaluation (DICE).
Tone and color reproduction aimpoints
- NARA’s aimpoints geared toward generic monitor display and an output referred environment
- Others geared toward prepress work - including the Government Printing Office’s guidelines (http://www.gpoaccess.gov/legacy/FDsys_ccspecs.pdf)
- Still a big question about the variability of digitization
- Currently the Library of Congress is trying to address this
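Checking a scan against tone aimpoints amounts to comparing measured values on neutral target patches with target values, within a tolerance. The sketch below assumes 8-bit RGB readings; the aimpoint values, the tolerance, and the function name are hypothetical placeholders, not NARA's or anyone else's published numbers.

```python
# Hypothetical aimpoints (8-bit RGB levels) and tolerance, for illustration only;
# a real program would take these values from the guideline it follows.
HYPOTHETICAL_AIMPOINTS = {"white": 245, "mid_gray": 118, "black": 12}
HYPOTHETICAL_TOLERANCE = 3

def check_aimpoints(measured: dict[str, tuple[int, int, int]],
                    aimpoints: dict[str, int] = HYPOTHETICAL_AIMPOINTS,
                    tolerance: int = HYPOTHETICAL_TOLERANCE) -> dict[str, bool]:
    """For each neutral patch, report whether every RGB channel lies within
    the tolerance of the aimpoint. Checking all three channels also catches
    a color cast on a patch that should be neutral."""
    return {
        name: all(abs(channel - aim) <= tolerance for channel in measured[name])
        for name, aim in aimpoints.items()
    }

scan = {"white": (244, 246, 243), "mid_gray": (118, 119, 125), "black": (12, 11, 13)}
print(check_aimpoints(scan))
# {'white': True, 'mid_gray': False, 'black': True}
```

Here the mid-gray patch fails because its blue channel reads 7 levels high, a slight cast that a single luminance reading would miss.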
Image processing workflows
- NARA’s illustrative sample workflow intended to minimize doing anything “bad” to the image quality versus leaving the images in a less defined, less processed, “raw” state
- When people have described their image processing, it has almost always been specific to their local process, as presented at the IS&T Archiving Conference panel on imaging workflows in Washington, DC, 2005
- Sharpening – do it or not? Still a question.
Image quality defects
Document Types
- Define text by the characteristics of the information and type
- 1-bit for printed high-contrast text
- 8-bit for low contrast, diffuse characters, staining, faded, mixed content, etc.
- 24-bit for cases where color is important to the interpretation of the information
- Define photographs by
- Transmissive camera originals – negatives, slides, transparencies
- Reflective positives – prints
- Pixel array tied to format and dimensions of the originals – a major departure from earlier guidelines – acknowledging the amount of information in originals varies
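The document-type rules and the link between pixel array and original size can be expressed as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, and the 4000-pixel long-side target in the example is an arbitrary illustration, not a value from any guideline. The example shows why a small transmissive original demands far higher scanning resolution than a large print to yield the same pixel array.

```python
def text_bit_depth(color_important: bool, high_contrast_print: bool) -> int:
    """Pick a bit depth following the document-type rules listed above."""
    if color_important:
        return 24  # color is important to interpreting the information
    if high_contrast_print:
        return 1   # printed, high-contrast text
    return 8       # low contrast, diffuse characters, staining, fading, mixed content

def pixel_dimensions(width_in: float, height_in: float, ppi: float) -> tuple[int, int]:
    """Pixel array produced by scanning an original at a given resolution."""
    return round(width_in * ppi), round(height_in * ppi)

def required_ppi(long_side_in: float, target_long_side_px: int) -> float:
    """Resolution needed so the original's long side yields the target pixel count."""
    return target_long_side_px / long_side_in

MM_PER_INCH = 25.4
TARGET_LONG_SIDE_PX = 4000  # hypothetical target, for illustration only

print(round(required_ppi(36 / MM_PER_INCH, TARGET_LONG_SIDE_PX)))  # 2822 (35 mm frame)
print(round(required_ppi(10, TARGET_LONG_SIDE_PX)))                # 400 (8x10 in print)
print(pixel_dimensions(8, 10, 400))                                # (3200, 4000)
```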
Quality Control
- Not standardized and no community-wide standards
Derivatives
- Fixed size vs. dynamic creation
- In general, moving toward JPEG2000 – although implementation in a high-demand environment is still problematic
- Dynamic creation is not limited to the JPEG2000 format; it has been available for many years for on-the-fly creation of derivatives from traditional raster image formats like TIFF
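On-the-fly derivatives from JPEG2000 rely on its resolution scalability: each wavelet decomposition level halves the image dimensions, so a server can decode a reduced-resolution level directly instead of decoding and resampling the full master, as it would with TIFF. The sketch below assumes the image origin is at (0, 0) so each level rounds up; the function names and the 6000x4000, five-level example are hypothetical.

```python
def resolution_levels(width: int, height: int, levels: int) -> list[tuple[int, int]]:
    """Image size at each JPEG2000 resolution reduction r = 0..levels.
    Each level halves both dimensions, rounding up (origin assumed at 0, 0)."""
    return [(-(-width // 2 ** r), -(-height // 2 ** r)) for r in range(levels + 1)]

def best_reduction(width: int, height: int, levels: int, target_long_side: int) -> int:
    """Highest reduction whose long side is still >= the requested size, so the
    server decodes that level and only downsamples slightly. Returns 0 (full
    resolution) if even the full image is smaller than the request."""
    best = 0
    for r, (w, h) in enumerate(resolution_levels(width, height, levels)):
        if max(w, h) >= target_long_side:
            best = r
    return best

print(resolution_levels(6000, 4000, 5))
# [(6000, 4000), (3000, 2000), (1500, 1000), (750, 500), (375, 250), (188, 125)]
print(best_reduction(6000, 4000, 5, 800))  # 2: decode 1500x1000, then scale to 800
```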
File Formats for master files
- TIFF – still the de facto standard, advocated by people who like to “keep it simple”
- JPEG2000 being considered more seriously now
- Resiliency to corruption due to data redundancy
- On the fly conversion of derivative to any size
- Difficult to implement
- Limited choices for software toolkits to support JPEG2000 within IT infrastructure
- High demand on infrastructure when trying to create derivatives on-the-fly
Where are We Headed and What Still Needs to be Done?
A great deal of progress has been made in some areas, and in other areas not as quickly as desired, but much has been learned over the past decade. As we move towards better definition of what digital imaging means in a preservation context, there is still work to be done. It is a little humbling to look back and admit that we are still asking many of the difficult questions that we were asking over a decade ago – particularly about the relationship of digitization to preservation and agreement on approaches that are appropriate for preservation reformatting using digitization.
For the most part, we have accepted that digitization can meet current needs for facilitating access, and by doing so, also fulfill basic preservation needs by limiting handling of originals. However, digitizing is not yet completely synonymous with preservation – see Appendix A of NARA’s 2004 Technical Guidelines. Beyond the benchmarking concept, which has been well established for text-based materials through the work done by Cornell and later by the Digital Library Federation’s Benchmarks for the Reformatting of Monographs and Serials, only recently has there been community-wide discussion regarding requirements for preservation reformatting, particularly for non-text original formats. We are moving closer to a better understanding of what is needed for digitization as a preservation reformatting approach, and in many ways we have already defined some of the requirements that we would be willing to accept as preservation requirements – see “Digitization for Preservation Reformatting of Photographs.”
The move from creating a digital copy primarily for access purposes to one that is more focused on the quality of the digital copy—one that is worth sustaining over time—takes into account not only the properties of the original that are deemed important to carry forward on one hand, but also changing user expectations, the capabilities of the technology at the time, and the purpose or use of the digital image on the other.

Only at the highest technical quality level do you get a digital resource that matches the original, or even the analog preservation copy, if this is the intent. As Stephen Chapman from Harvard University notes, sustainability is a key attribute of good digital collections, and the best time to build in sustainability is at the point of creation. Although the concept of sustainability applies to both use and content, from a digital imaging perspective, what approaches do we follow to create digital images that are worth sustaining over time? How do we start to define a technical approach that could also serve as a preservation approach that takes into account some or all of the factors in the illustration above?

This article has discussed two broad concepts, information capture and essential characteristics, that we think address sustainability in an imaging context. Information capture addresses the concept of producing a good reproduction. Conceptually, the community is moving toward capturing information to produce “good data sets.” These representations may not look like the originals we are copying, but they can serve as-yet-unknown needs, such as scientific analysis and research.
Defining essential characteristics of originals helps us to move beyond the limitations of current technology to designing specifications based on these properties. These can be based on many factors, including physical and chemical attributes of the original, condition, quality, defects, date of production, generation (photos), curatorial or financial value, etc. We need further investigation into the essential characteristics of different classes of digital objects/files, how to tie these properties to digitizing approaches, and how to determine the best approaches to digitizing classes of originals with similar properties and characteristics. Essential characteristics can be identified via the capture process or in metadata about the image. Metadata should include information about the original and the digital resource, and should document information about characteristics that are not inherent in the digital version.
A focus on essential characteristics does not preclude valid reasons for digitizing collections based on concepts of intended use, affordability, and sustainability over time. “Fitness to purpose” has been a driver for digital imaging for some time; not every program will have the same goals of fidelity to the original, longevity, or preservation. Institutions should take into account the context and reasons for digitization in their individual cases. In a preservation context, however, there may be a higher risk of not achieving preservation goals at this end of the spectrum.
Even as we move toward less-defined imaging approaches, we still need to create images in consistent ways that will allow us to automate ingest into digital repositories - including characterization and validation of the digital objects and data formats, automated transformation of digital objects, and automated creation of reference and/or use copies.
There are gaps in specific technical areas that should be addressed by the larger preservation and imaging community in order for digitizing for preservation reformatting to be fully scoped and defined. In order to consider using digitization as a method of preservation reformatting, it will be necessary to specify more about the characteristics and quality of the digital images beyond specifying minimal and optimal levels of spatial and signal resolution. High-bit, high resolution imaging has been the focus of imaging specifications, but there has been minimal effort put into defining other quality parameters, such as tone reproduction, color reproduction, color mode, capture device performance, assessment of source, and image state, for example.
As mentioned above, one area that still needs a lot of work is capture device performance. Pixel resolution is a good marketing device, but the internal processing of the scanner or camera has a big influence on image quality. Besides tests for spatial frequency response and dynamic range, evaluation of the capture device might include tests to measure noise levels, uniformity in tone and color reproduction, channel registration, dimensional accuracy, etc. More importantly, can we define trustworthy pass/fail limits for each of these tests so that scanner and camera performance is more easily measured and documented? Although much effort has gone into quantifying the performance of scanners and digital cameras objectively, to date there has not been enough progress in making device performance assessment readily understood, usable, and easily integrated into imaging workflows, nor have comprehensive guidelines with sophisticated approaches to device performance assessment emerged. Simple approaches, or simply taking the manufacturer’s specifications at face value, come at the expense of quality. We assume that the capabilities and performance of capture devices will need to improve to accommodate high levels of information capture. Currently, the Office of Strategic Initiatives at the Library of Congress is working with consultants on developing better test targets and software for automated evaluation of capture device performance.
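One way to picture the pass/fail limits asked for above is a small table of tests, each with a threshold and a direction. In this sketch the test names, thresholds, and function names are all hypothetical; choosing real values is exactly the community work the article calls for.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limit:
    test: str
    threshold: float
    higher_is_better: bool  # True: value must be >= threshold; False: <= threshold

# Hypothetical limits for illustration only; these are not published values.
HYPOTHETICAL_LIMITS = (
    Limit("noise_std_dev", 2.0, higher_is_better=False),
    Limit("channel_misregistration_px", 0.5, higher_is_better=False),
    Limit("dimensional_error_pct", 1.0, higher_is_better=False),
    Limit("sfr_at_half_nyquist", 0.4, higher_is_better=True),
)

def evaluate_device(measurements: dict[str, float],
                    limits: tuple[Limit, ...] = HYPOTHETICAL_LIMITS) -> dict[str, str]:
    """Grade each test as pass, fail, or not measured."""
    results = {}
    for limit in limits:
        value = measurements.get(limit.test)
        if value is None:
            results[limit.test] = "not measured"
        elif limit.higher_is_better:
            results[limit.test] = "pass" if value >= limit.threshold else "fail"
        else:
            results[limit.test] = "pass" if value <= limit.threshold else "fail"
    return results

report = evaluate_device({
    "noise_std_dev": 1.2,
    "channel_misregistration_px": 0.8,
    "sfr_at_half_nyquist": 0.55,
})
for test, outcome in report.items():
    print(f"{test}: {outcome}")
```

With these made-up readings the device passes noise and spatial frequency response, fails channel registration, and has no dimensional accuracy result. Reporting untested parameters as “not measured” keeps gaps visible rather than silently trusting manufacturer specifications.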
We certainly need consensus on applying scanner performance test limits that are acceptable to the imaging community. The digital library community has been relatively silent on some of these technical issues, and there has been a heavy reliance on scanner manufacturers and the digital camera industry to define the criteria. Imaging practitioners should work to define both assessment criteria and pass/fail limits. To a certain extent, the imaging science community can assist us in this process. For particular applications, certain performance criteria will be more critical than others and will need to exceed established minimum limits, such as level of dimensional accuracy for aerial photography scanning.
We need to be sophisticated users of the technology and the tools. We should continue to acknowledge that doing imaging well is difficult and requires expertise. While digital imaging is a versatile tool, it does not accomplish specific functions well without a certain amount of operator expertise, and in many cases it still does not work as well as we would like. People have been more than willing to accept very limited, undefined digital imaging guidelines as acceptable for preservation reformatting. We should look to designing and endorsing more comprehensive and sophisticated approaches to imaging, especially in imaging specifications, guidelines, and best practices, which should articulate the entire digitization approach, not just specifications for imaging “at the scanner.” Guidelines should take into consideration a wider range of technical parameters, assessments of the original on a more granular basis, and an acknowledgement that approaches will differ depending on the role or purpose of imaging within a particular context.
Feature Article 2
 A Digital Decade: Where Have We Been and Where Are We Going in Digital Preservation?
Author: Nancy Y. McGovern - ICPSR (nancymcg@umich.edu)
There has been measurable progress in the digital preservation community since the seminal work Preserving Digital Information: Final Report and Recommendations was published by the Commission on Preservation and Access and RLG more than a decade ago. Those concerned about digital preservation in 1996 did not have the Open Archival Information System (OAIS) standard to frame the development and discussion of digital preservation; or a set of attributes of a trusted digital repository to delineate the organizational context for digital preservation; or a data dictionary for preservation metadata; or the concept of institutional repositories made real by a range of software options. All of these developments have emerged within the past decade. Today, we have conferences that are entirely devoted to digital preservation (e.g., the International Preservation (iPres) conference) and peer-reviewed journals for digital preservation (e.g., The International Journal of Digital Curation). One can follow the maturation of the digital preservation community in a decade of RLG DigiNews articles.
RLG DigiNews originally focused on “the converging fields of preservation and digitization”; the first article to specifically address digital preservation appeared in 1998. In 2000, the RLG DigiNews editorial staff significantly expanded the coverage of digital preservation, highlighting articles with the now familiar symbol, which added digital to the established infinity notation from print preservation. The cumulative contribution by RLG DigiNews to the digital preservation literature over the past decade includes more than fifty feature articles plus a sequence of highlighted websites and FAQs. These articles and other features stressed practical steps in digital preservation with an emphasis on the development and evaluation of relevant strategies, applications of research results, the integration and use of tools, and national and community-level agendas.
This tenth anniversary review of digital preservation developments takes an informal gap analysis approach, measuring where we are (the “as is”) against where we might like to be (the “to be”). This gap analysis has three components reflecting the core aspects of digital preservation: organizational infrastructure, technological infrastructure, and requisite resources.
 Figure 1. Three-Legged Stool for Digital Preservation.
These three components comprise the three-legged stool for digital preservation (Figure 1), a concept developed at Cornell for the Digital Preservation Management (DPM) Workshop series, which was funded by the National Endowment for the Humanities from 2003 to 2006. The workshop curriculum uses the three-legged stool as a means for an organization to assess its development within the context of a maturity model composed of five sequential stages: acknowledge, act, consolidate, institutionalize, and externalize.[1] This review takes a more basic step by considering the status of the three legs of the stool within the community from the “as is” and “to be” perspectives.
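The “as is” versus “to be” comparison can be sketched by rating each leg of the stool against the five maturity stages and measuring the distance between the two. Treating the stages as ordered integers is our simplification, and the function names and example self-assessment are hypothetical.

```python
from enum import IntEnum

class Stage(IntEnum):
    """The five sequential stages of the DPM maturity model."""
    ACKNOWLEDGE = 1
    ACT = 2
    CONSOLIDATE = 3
    INSTITUTIONALIZE = 4
    EXTERNALIZE = 5

LEGS = ("organizational", "technological", "resources")

def gap_analysis(as_is: dict[str, Stage], to_be: dict[str, Stage]) -> dict[str, int]:
    """Number of stages each leg must still advance to reach its target."""
    return {leg: max(0, to_be[leg] - as_is[leg]) for leg in LEGS}

def weakest_leg(as_is: dict[str, Stage]) -> str:
    """The leg at the earliest stage, i.e., the one most likely to tip the stool."""
    return min(LEGS, key=lambda leg: as_is[leg])

# A hypothetical self-assessment.
as_is = {"organizational": Stage.ACT, "technological": Stage.CONSOLIDATE,
         "resources": Stage.ACKNOWLEDGE}
to_be = {leg: Stage.INSTITUTIONALIZE for leg in LEGS}
print(gap_analysis(as_is, to_be))  # {'organizational': 2, 'technological': 1, 'resources': 3}
print(weakest_leg(as_is))          # resources
```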
The Organizational Leg
The organizational leg determines the “what” of digital preservation: the mandate, the scope, the objectives, and the staffing with which an organization engages in digital preservation. Ten years ago, the organizational leg was arguably the weakest leg, as evidenced by the general absence of explicit mission statements that referenced digital preservation, of policies that specifically addressed the preservation of digital assets, and of sustained digital preservation programs within organizations.
The “As Is”
There have been several important developments for the organizational leg over the past ten years, including the development and promulgation of the RLG/OCLC report on the Attributes of a Trusted Digital Repository (TDR), an increase in the development of digital preservation policies by organizations, and an acknowledgement of the central role of procedural accountability for audit and certification.
Trusted digital repositories
TDR represents the best expression of the organizational leg for digital preservation and has become a de facto standard for the digital preservation community since its release in 2002. Prior to the development of TDR, the community had no formal expression of the organizational context for digital preservation.
 Figure 2. The Cornell Model for Trusted Digital Repository Attributes.
The Trusted Digital Repositories document defines seven attributes of a conformant organization: OAIS compliance, administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The relationships between the TDR attributes are portrayed in the Cornell model (Figure 2), developed to support the DPM workshop series. OAIS compliance is implicit in the diagram. TDR stresses the importance of the organizational context and places technology within that context. This placement recognizes that technology should be suited to the scope and requirements of each digital preservation program. The Cornell model for TDR added a “digital archives border” to the TDR attributes because one organization might maintain more than one repository instance, in which case the outer layers might be coordinated across the organization, and a group of organizations might come together to manage one repository (e.g., in a consortial effort).
Digital preservation policy development
Policies and other documentation of decisions and actions represent one of the best indicators of the development of the organizational leg. At the 2006 Best Practices Exchange in North Carolina, “participants stressed again and again that a successful digital preservation program requires a strong foundation…Participants identified four essential elements for building a strong foundation for a digital preservation program: support and buy-in from stakeholders; ‘good enough’ practices implemented now; collaborations and partnerships; and documentation for policies, procedures, and standards.”[2]
This brief list of digital preservation policies is suggestive of the increase in policy development within the digital preservation community worldwide.
The advent of the World Wide Web, which was also in its nascent stage in 1996, has made possible more effective and global exchange of information about policies and practices. More work is underway on developing policies. For example, the nestor policy project in Germany is working on a profile for a national long-term preservation policy.
Providing the evidence for audit and certification
“A well-written policy should serve as historical proof of an institution’s commitment to digital preservation now and long into the future.” This conclusion from the 2006 Best Practices Exchange reflects an implicit principle that underlies the evidence requirements for the audit and certification of digital archives. The October 2005 issue of RLG DigiNews featured articles on the major digital archive audit initiatives in the US, the UK, and Germany. The Center for Research Libraries (CRL) conducted a series of test audits of digital archives, with funding from The Andrew W. Mellon Foundation, and hosted a meeting with the UK and German audit projects that produced a set of common audit principles. CRL released the “Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)” in March 2007 and should be releasing the principles and test audit report soon. TRAC is a revised version of the RLG/NARA document, Audit Checklist for Certifying Digital Repositories, which was released for public comment in January 2006. An ISO standard development effort is underway that will build on the work of these initiatives and integrate the relevant requirements from the information technology and security domains. In considering the basis and means for digital archive certification, these initiatives have shifted their focus toward the benefits of, and the tools needed for, self-assessment and third-party audits. The results so far have demonstrated that self-assessments and audits effectively identify the strengths and weaknesses of digital preservation programs and define a development plan for organizations to incrementally address the full set of criteria defined for trusted digital repositories.
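A first-pass self-assessment against the seven TDR attributes can start as simply as listing, for each attribute, the documents that provide evidence for it and flagging attributes with none. The sketch below is hypothetical (the function name and example documents are invented) and is far coarser than the criteria in TRAC.

```python
# The seven attributes of a trusted digital repository, as listed in TDR.
TDR_ATTRIBUTES = (
    "OAIS compliance",
    "administrative responsibility",
    "organizational viability",
    "financial sustainability",
    "technological and procedural suitability",
    "system security",
    "procedural accountability",
)

def undocumented_attributes(evidence: dict[str, list[str]]) -> list[str]:
    """Attributes with no supporting documents, in TDR order.
    An attribute missing from `evidence` or mapped to an empty list counts."""
    return [attr for attr in TDR_ATTRIBUTES if not evidence.get(attr)]

# A hypothetical inventory of local documentation.
evidence = {
    "administrative responsibility": ["mission statement referencing digital preservation"],
    "procedural accountability": ["ingest procedures", "fixity-check logs"],
    "system security": [],
}
for attr in undocumented_attributes(evidence):
    print("no evidence yet:", attr)
```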
The “To Be”
Though the “as is” perspective on the organizational leg has improved substantively over the past decade, there are at least two notable areas of development for the “to be” view: the need to integrate the organizational policies for digital preservation into technological implementations and the need to develop and evolve digital preservation skills. 
Integrating policies into action
The organizational leg (the “what”) and the technological leg (the “how”) of the digital preservation stool need to be coordinated to develop compliant and feasible digital preservation strategies. The theory is in place. The OAIS Reference Model, for example, identifies specific documents that are needed, including submission agreements, format standards, documentation standards, physical access control, database administration, storage management, disaster recovery, system evolution, migration standards, and procedures regarding most of these areas. In practice, the organizational leg, represented by policies, and the technological leg, represented by digital repositories, may develop separately and not always in parallel. There are ongoing developments to watch in bridging this gap between the organizational and technological legs. The EU-funded project, PLANETS, promises technology-based preservation planning and tools that reflect organizational policies. The PLEDGE project, a collaborative initiative by the Massachusetts Institute of Technology and University of California at San Diego Libraries and the San Diego Supercomputer Center, has developed a promising policy engine prototype. Integrating the organizational and technological legs represents a tangible intersection of theory (what should be done) and practice (what is done).
Developing requisite skills
As technology evolves, digital preservation skills need to evolve. Preservation metadata provides an illustrative example of this often unmet requirement. In 2005, the OCLC-RLG Preservation Metadata Implementation Strategies (PREMIS) Working Group released the first version of the preservation metadata data dictionary and is continuing to revise and enhance its results. RLG DigiNews featured PREMIS updates in the October 2004 and December 2004 issues. PREMIS has become a de facto standard that may become a formal standard for preservation metadata in the future. Yet practitioners continue to struggle with implementing preservation metadata, as participants at Cornell’s DPM workshops confirmed. One aspect of this struggle is that there are digital preservation specialists who are able to devise digital preservation policies and strategies, and there are metadata specialists who are versed in metadata standards, schemas, and tools. A useful hybrid skillset would be a digital preservation metadata specialist who is able to bring the best of both together and to apply the policies and requirements at high and low levels of granularity. As digital preservation strategies emerge and evolve, similar hybrid roles that combine organizational and technical skillsets may be needed for specific types of digital content and for functions such as digital preservation workflow management and archival storage management. In the long term, the digital preservation community will have developed a comprehensive set of specialized roles and skills for digital curators. The Digital Curation Centre in the UK and the Digital Curation Curriculum project at UNC Chapel Hill are examples of initiatives to watch in this area.
The Technological Leg
The technology leg addresses the “how” of digital preservation – the specific digital preservation strategies, staff, tools, equipment, and other means for achieving digital preservation objectives. The technology leg combines hardware, software, formats, storage media, networks, security measures, workflows, procedures, protocols, documentation, and skills, both technical and archival. A decade ago, the hope of a “silver bullet” for digital preservation, typically in the form of a technology-only solution, was still strong and served as an inhibitor to the development of organizational responsibility for digital preservation.
The “As Is”
Arguably, technology has been viewed as both the problem and solution for digital preservation. The lessons from the past decade have demonstrated to the community that a balanced three-legged stool with a sturdy technology leg will be more effective in establishing a sustainable digital preservation program than a technology pogo stick. Certainly, there have been notable technology leg developments, including the OAIS Reference Model and open source repository software and tools.
OAIS Reference Model
The development of the OAIS Reference Model, begun more than a decade ago, reflects the work of an international group of experts. It is intended for use in any context in which digital preservation occurs, and it represents the most formal and comprehensive expression of the archival process that is available to the community. The stages of development for OAIS can be traced on the OAIS website.
 Figure 3. The high-level diagram for the OAIS Reference Model.
The high-level OAIS diagram depicted in Figure 3 has become ubiquitous in digital preservation presentations. OAIS provides a common language and a set of functions for use in community-wide discussions and in mapping organizational developments. Cal Lee at UNC Chapel Hill wrote an evaluation of the OAIS development for his dissertation, “Defining Digital Preservation Work: A Case Study of the Development of the Reference Model for an Open Archival Information System (OAIS).”
Repository software and tools
Examples of repository software developed over the past 10 years include DSpace, the Flexible Extensible Digital Object and Repository Architecture (Fedora), Greenstone digital library software, the Berkeley Electronic Press (bepress), and the Dark Archive In The Sunshine State (DAITSS). Even with these examples of available repository software, organizations need to decide how to select an appropriate repository option by considering the capabilities and limitations of each and the extent to which the repository software meets archival requirements and suits the digital content to be preserved. Organizations may opt to build their own repository, as the National Library of the Netherlands has done, or to subscribe to a digital preservation service provider, such as bepress or the OCLC Digital Archive. None of these options was available to organizations a decade ago.
Repository software may integrate digital preservation tools (or equivalent functionality) or an organization may define for itself a digital preservation workflow that integrates tools at appropriate points in the process. Recent examples of tools used for digital preservation include those that identify and evaluate file formats (e.g., JHOVE, DROID), that normalize files to preservable formats (e.g., XENA), that generate and capture metadata (e.g., the NLNZ metadata extractor), and that produce a unique identifier and aid in detecting changes to files (e.g., checksums). The October 2006 RLG DigiNews FAQ reviewed the NLNZ Metadata Extractor and several other tools. These developments represent progress, but the community has some way to go before digital preservation is fully automated and fully compliant digital preservation systems are available.
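Checksums support fixity checking: record a digest for each file at ingest, then recompute and compare later to detect silent changes. This Python sketch uses the standard-library hashlib with SHA-256 as an assumed algorithm choice; the manifest format and function names are hypothetical, and a real repository would also log when each check ran and its outcome.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(paths) -> dict[str, str]:
    """Record a digest for each file at ingest time."""
    return {str(p): file_digest(p) for p in paths}

def verify_manifest(manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose digest no longer matches."""
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or file_digest(path) != expected:
            failures.append(path)
    return failures

with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master_0001.tif"  # illustrative file name and contents
    master.write_bytes(b"abc")
    manifest = build_manifest([master])
    print(verify_manifest(manifest))                    # []
    master.write_bytes(b"abd")                          # simulate silent corruption
    print(verify_manifest(manifest) == [str(master)])   # True
```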
The “To Be”
There is significant research and development work underway that is targeting the development, enhancement, and scalability of tools and repository software. RLG DigiNews highlighted 10 promising digital preservation research programs in August 2005. The “to be” category for the technology leg could be categorized as making it possible to do more through automation and to provide the means to integrate audit requirements and measures into digital preservation management.
Scalable capabilities
Scaling repository software to the increasing size of digital content containers (e.g., digital video files) and the extent of digital content to be preserved is a capacity and capability issue that largely remains in the “to be” category. The past decade has also seen the publication of recommendations from the National Science Foundation (NSF) in the US and the Community Research & Development Information Service (CORDIS) in the EU about the infrastructure that will harness the potential of technology developments to support and enable research. These programs provide a framework for development.
Workflows and suites of tools Still on the wish list for the digital preservation community is the capability to easily define, customize, change, and extend a modular digital preservation workflow into which tools can be readily integrated. There have been developments in generating or extracting metadata for submissions, but this work is still in its infancy, and it is not always possible to incorporate tools into a workflow easily. Moving from individual tools to suites of tools and workflows that can be shared and exchanged between organizations seems a natural path for development.
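To make the idea of a modular workflow more concrete, here is a minimal Python sketch under stated assumptions: a workflow is an ordered list of named steps, and each step takes and returns a record describing one submitted file. The Workflow class and both example steps are hypothetical placeholders, not the interface of DSpace, Fedora, or any other software mentioned in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical convention: a step takes a record (a dict describing one
# submitted file) and returns it with new properties added.
Step = Callable[[Dict], Dict]


@dataclass
class Workflow:
    """A hypothetical modular preservation workflow: an ordered list of named steps."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add_step(self, name: str, step: Step, before: Optional[str] = None) -> None:
        """Append a step, or insert it ahead of the existing step called `before`."""
        if before is None:
            self.steps.append((name, step))
        else:
            position = [existing for existing, _ in self.steps].index(before)
            self.steps.insert(position, (name, step))

    def run(self, record: Dict) -> Dict:
        """Pass the record through every step in order, logging each step's name."""
        for name, step in self.steps:
            record = step(record)
            record.setdefault("log", []).append(name)
        return record


def identify_format(record: Dict) -> Dict:
    # Placeholder: a real workflow might call a format identification tool here;
    # this sketch only maps the file extension to a MIME type.
    extension = record["filename"].rsplit(".", 1)[-1].lower()
    record["format"] = {"tif": "image/tiff", "pdf": "application/pdf"}.get(extension, "unknown")
    return record


def record_size(record: Dict) -> Dict:
    # Placeholder metadata step: record the size of the submitted content.
    record["size_bytes"] = len(record["content"])
    return record


workflow = Workflow()
workflow.add_step("record_size", record_size)
# A new tool slots in ahead of an existing step without changing that step.
workflow.add_step("identify_format", identify_format, before="record_size")
result = workflow.run({"filename": "scan.TIF", "content": b"1234"})
print(result["log"])  # ['identify_format', 'record_size']
```

The design choice being illustrated is that steps share only a simple record convention, so replacing the placeholder identify_format with a wrapper around a real tool would not require changes to the other steps or to the workflow itself.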
 Figure 5. The Integrated Digital Preservation Matrix.
As more organizations develop trusted digital repositories based upon sound and continuous workflows, the potential exists to leverage capacity and capabilities across repositories and across the community to realize cost savings, more effective results through collaboration, and community-wide action, as envisioned in the integrated digital preservation matrix (Figure 5, developed for the Cornell DPM workshop series).
Audit capabilities As institutions begin to rely upon each other, they need to develop trust through verification. It is not enough to provide assurances about performance and reliability in digital preservation; it is necessary to demonstrate effective and sustained action. As audit and certification for digital preservation develop, organizations will require the means to conduct self-assessments and participate in external audits. Incorporating self-assessment and audit tools into digital preservation repositories would lighten the burden of preparing for audits and make it easier, and less costly, for organizations to meet audit requirements. The audit and certification initiatives have provided tools for self-assessment and are increasingly providing examples for audit; organizations need to step up and contribute local examples and lessons learned.
The Resources Leg
The resources leg addresses the “how much”: the human, technological, and financial resources needed to produce desired digital preservation outcomes. A decade ago, the question “How much does digital preservation cost?” was enough to bring a digital preservation discussion to a shuddering stop. At that time, the resources component of digital preservation had not been explicitly separated from the organizational component. As a distinct component, the resources required for a digital preservation program can be identified, quantified, and measured comprehensively and objectively, although for the most part this potential has yet to be realized.
The “As Is”
Unlike the organizational leg, which is embodied in the TDR document, and the technological leg, which is defined in the OAIS Reference Model, the resources leg of digital preservation has no community document that expresses its scope and requirements. The inclusion of financial sustainability as an attribute of a TDR signifies an important development for digital preservation because it was the first time that the cost of digital preservation was explicitly acknowledged as an organizational requirement. Additional indicators of progress toward a sound resources leg include the designation of digital preservation funding by organizations (e.g., DSpace at MIT); digital preservation programs that outlast individual digital preservation projects, as evidenced by organizations that have developed digital preservation policies; and research funding for digital preservation that is ongoing if not permanent (e.g., JISC, NEH, NSF, and NHPRC programs). In addition to these indicators, the digital preservation community has a growing body of literature that addresses digital preservation costs, including Brian Lavoie’s proposed economic models for digital preservation; Shelby Sanett’s research on cost models and cost frameworks; and the approach developed in the Netherlands (Oltmans and Kol), which provides a tool to compare the costs of migration and emulation over time. The most comprehensive cost formula for digital preservation to date was proposed by the LIFE project in 2006. These examples have deepened the community’s understanding of digital preservation costs, but they do not amount to a comprehensive community document for the resources leg. Nor are organizations systematically collecting and sharing resource information.
Figure 4. Integrating the organizational and technological legs of digital preservation.
The resources perspective considers the “what” and the “how” of digital preservation to determine the “how much” (represented by financial sustainability in Figure 4, developed for Cornell’s DPM workshop series). The resources leg is informed by the organizational context and tied to the technological implementation for an organization’s digital preservation program. Figure 4 illustrates the technological implementation expressed by OAIS within the organizational context expressed by TDR and the separation of financial sustainability within the organizational context for digital preservation.
The “To Be”
There has been progress in developing the resources leg, though two areas seem ripe for further development: the designation of funding by organizations for digital preservation and the definition of a community document that addresses resources.
Designating digital preservation funding Organizations are still struggling to secure resources for digital preservation. One of the research library directors interviewed for the recent Metes and Bounds report on e-journal archiving observed that digital preservation is a “just-in-case scenario, and this is very much a just-in-time operation.” (p. 11) Respondents to the institutional readiness survey for Cornell’s DPM workshop identified insufficient resources as the second-highest threat to digital content, after insufficient policies or plans. Survey respondents also identified a complicating factor in designating resources for digital preservation: organizations have commonly established a digital preservation initiative by assigning a share of the responsibility to several staff members, often located across the organization, which makes it difficult to consolidate or coordinate resources. The digital preservation community also needs a means of being transparent about resources, recognizing that specific details may include confidential or internal-only information.
Defining a community document for resources The “as is” examples of resource-related writings and developments for digital preservation (e.g., the Lavoie, Sanett, Oltmans and Kol, and LIFE examples presented above) provide a starting point for defining a community document for resources. Elements common to TDR and OAIS include the definition of core concepts, the definition of roles and responsibilities, descriptions of components and attributes, and a discussion of implementation issues with examples and/or recommendations. A productive first step for the community might be to consolidate and rationalize the resource issues and elements presented in these examples, then apply a gap analysis to fill in missing elements. So far, the community has produced few responses to these contributions toward strengthening the resources leg of digital preservation.
Stabilizing the Three-legged Stool
Taking the three legs of the stool together, there are a number of indicators that the digital preservation community is coalescing and maturing. Communities by nature share common interests and objectives. Indicators of the development of the digital preservation community include accepted standards and practice and an increasingly effective communication network.
Standards and practice A decade ago there were no formal shared standards or practice for digital preservation. Today, we have OAIS, TDR, and PREMIS, for example. The sustainable formats website at the Library of Congress and PRONOM are contributing to the development of preservation strategies for classes of digital content. RLG DigiNews featured articles about PRONOM developments in the October 2003 and April 2005 issues. These examples reflect community practice as defined by representatives of archives, libraries, museums, and other cultural heritage institutions. Domain-specific developments, such as the Canadian Heritage Information Network (CHIN) report on digital preservation for museums, have also contributed to the development of community-wide practice. In addition, the standards of our community are regularly supplemented by standards developments in other communities, including information technology, information security, telecommunications, and the Internet. We are moving towards more comprehensive codification of accepted practice, the promulgation of standards and practice through community channels, and the means to develop and maintain policies and procedures as needed.
Communication network A challenge for organizations engaged in digital preservation is to balance the time and resources devoted to developing the repository internally against those spent monitoring the external environment for relevant developments, updates, standards, and warnings. The difficulty of keeping up with digital preservation developments is illustrated by a quick review of the ten “watch this space” digital preservation research projects listed in the August 2005 issue of RLG DigiNews. Three of the project websites had updates and current information that were fairly easy to locate. For three others, the current status was unclear; judging from the obvious places for updates and news on their websites, the projects seemed to be stalled or abandoned. Three of the project websites had few or no updates since August 2005; it was possible to find results or presentations about these projects by searching, but difficult to determine their current status with confidence. The URLs for two of the projects had changed, and the new locations could not easily be found by searching. Of course, there are several possible explanations for this, and the projects could be alive and thriving elsewhere. One project website required logging in. Requiring a log-in is not a bad thing, but it takes time and a bit more effort. For an organization trying to track a number of digital preservation developments, these examples represent potential barriers. The PADI website has provided an excellent information service to the digital preservation community for the past decade, and other services contribute as well, but there is currently no “one stop shopping” for keeping up with digital preservation research and development. Keeping up takes effort, but it is worthwhile. The digital preservation community is active and offers many opportunities for organizations to participate, contribute, and learn.
“One participant [in the 2006 Best Practices Exchange] characterized a ‘community of practice’ as a flock of birds. Each bird may ultimately have a different end destination, but since they are flying in the same general direction, it is more efficient to fly together as a flock.” A fitting close to this anniversary review of the migration patterns of a community over the past decade. How far will we have gotten towards the “to be” by 2012 or 2017? Stay tuned…
Author's Addendum (7 May 2007): An alert reader contacted me about my list of digital preservation policy examples, questioning the dates of some and the inclusion of another. I am submitting this brief response to correct and clarify my list. The reader wondered if I should have cited earlier dates for the National Library of Australia (NLA), the UK Data Archive (UKDA), and the Arts and Humanities Data Service (AHDS). After checking, I can report that 2001 is the correct date for the NLA digital preservation policy and 2004 is the date for version 1.0 of the AHDS digital preservation policy. Both of these institutions have been major contributors to digital preservation progress. An important caveat for the AHDS is that 2004 was the date of their first policy to address the preservation of the digital collections within their care; however, the AHDS developed an early strategic policy framework document (http://ahds.ac.uk/strategic.doc) in 1997 that reported the results of a study they conducted, including recommendations to the community on developing digital preservation policies. I should have cited the date for version 1.0 of the UKDA policy as 2003 and the date for the British Library policy as 2001. I included the Digital Library Sunsite policy because it is both a collection development and a preservation policy. It is an important early example of the definition of preservation levels for digital content and of a preservation policy that addresses Web content. An interesting thing about digital preservation policies is that even institutions that have been early adopters and pioneers in digital preservation often took a while to develop formal digital preservation policies. We should have many more policy examples that are readily available, though we should also be pleased with the progress we have made and continue to make. Thank you to the diligent reader, and my apologies to the British Library and the UKDA for misdating their policies.
Notes [1] Anne R. Kenney and Nancy Y. McGovern, “The Five Organizational Stages of Digital Preservation,” in Digital Libraries: A Vision for the Twenty-First Century, a festschrift to honor Wendy Lougee, 2003.
[2] Christy E. Allen, “Foundations for a Successful Digital Preservation Program: Discussions from Digital Preservation in State Government: Best Practices Exchange 2006,” RLG DigiNews, June 2006, Vol. 10, No. 3.