RLG
 Feature Article 1  

Digital Imaging - How Far Have We Come and What Still Needs to be Done?

Authors: Steven Puglia - US National Archives and Records Administration (steven.puglia@nara.gov), Erin Rhodes - US National Archives and Records Administration (Erin.Rhodes@nara.gov)

Introduction

Libraries, archives, and museums have been engaged in the digitization of their collections for well over a decade now. As we look back over the past ten years, what is the best way to assess how far we have come and what work on defining digital imaging approaches still needs to be done?

This article attempts to provide a brief overview of the conceptual and technical influences that have defined digital imaging in cultural institutions during the last decade—by looking at goals and objectives for digital reformatting and how they have changed, by looking at specifications and imaging guidelines and how they have evolved, and by identifying areas that still merit further investigation. The focus is on digitization used to create raster images, as this type of work represents a very high percentage of the digitization that has been done to date.

In general, digitization has moved past the experimental, startup, standalone operation phase. Unfortunately, in many organizations, digitization projects still have not been fully realized as “mainstreamed programs.” Even if digital imaging activities are not quite institutionalized, digitization has found its place within a larger context and is directly related to work being done in the following areas:

  • archival and preservation issues and activities
  • managed repositories
  • IT infrastructure (networks, databases, storage)
  • on-going collection and digital project management and policy issues
  • Web and online access; metadata and cataloging; and other digital library activities

In many institutions, digitization programs have forged relationships with allied departments, such as faculty media labs, academic computing centers, campus museums, faculty research projects, and with IT systems, such as bibliographic catalogs, collection management systems, digital asset management systems, etc. As we move towards supporting large scale digital imaging programs organizationally, technically, and with the dedication of more resources, there is a growing and sobering understanding of the significant investment that will be required to carry out effective digital imaging initiatives. This is especially true in the areas of staff expertise, IT systems and infrastructure, digitization and metadata specifications and standards, and the costs to create digital resources and manage them over the long-term.

As digital imaging activities also move beyond digitization in special collections in libraries and archives to include involvement in large-scale commercial partnerships for mass digitization of more general collections, the use and nature of these collections are being transformed as well from fixed, discrete, unique collections to resources that will provide the groundwork for networked information, research, and services that we had not envisioned prior to this time.

From some perspectives, a great deal of progress has been made in our understanding of digital imaging as a technology and how to use it within cultural institutions for reformatting collections and making them more accessible. Conversely, some, like Nicholson Baker, have argued that prior approaches to “preservation reformatting,” such as microfilming, were conceptually and technically flawed, and we are only carrying forward similar problems into the digitization environment.

It is undeniably true that the more we learn about digital imaging as a technology, and about digitization as an institutionalized program, the more we realize there is much more to learn. In reviewing efforts during the last decade to define digital imaging approaches, we conclude that in some areas, not a lot of progress has been made.

Goals of Digital Reformatting

“Technology” as a term has come to be synonymous with computers and information technology, particularly the term “high technology.” The dictionary definition of technology is “the practical application of knowledge especially in a particular area” (Merriam-Webster online, 2007). Technology is never “THE answer” to our problems in cultural heritage institutions (and in many ways the same can be said for life in general): technology is just the tools we have available to us to address problems. Over time, the nature and types of tools change, and so does our understanding of problems and how best to address problems. The goal is to become sophisticated users of all appropriate tools, by selecting and using them wisely. We need to acknowledge both the advantages and disadvantages of our tools, and our corresponding technological choices for solving specific problems. What are our goals and assumptions for digital imaging in the context of cultural heritage institutions?

Early digital imaging efforts focused markedly on the technology itself: on the technological feasibility of scanning, on defining work processes, and on accomplishing the actual conversion, rather than on the bigger-picture questions of how best to use digitization in an institutional, preservation, or other particular context. The complete range of issues needing to be addressed in order for digitization to be an effective tool within our institutions was not initially tackled. From an imaging perspective, this entails asking ourselves: what are the essential characteristics of originals that we want to replicate and carry forward in the digital copy?

Essential characteristics will inform future users about the original resources. The definition of and selection of essential characteristics is informed primarily by curatorial/archival and preservation decisions. Often, they are defined by a variety of physical and qualitative properties (for photographs—such as generation, size, quality, condition, intended use, etc.). Also, they will be unique to the collection/record/media type, and at times are likely to be institution-specific. In all cases, they should be well defined and appropriate to the original resources. In the context of using digitization for preservation reformatting, the ability to define the essential characteristics of originals at a very high level allows us to determine whether the digital copy could truly “stand in” for the original.

The identification and definition of essential characteristics for collection materials in cultural institutions is not new. Approaches to analog preservation reformatting have included specific conceptual rationales regarding essential characteristics and corresponding imaging approaches to reproduce those characteristics. For example, industry standards relating to microfilming documents and preservation microfilming guidelines (http://www.loc.gov/preserv/usnpguidelines.html, http://www.oclc.org/preservation/microfilming/standards/default.htm, and http://www.archives.gov/about/regulations/part-1230.html#partc) focus on the essential characteristic of text legibility. Micrographics standards and guidelines outline specific imaging approaches to maintain this characteristic on the microfilm.

Early approaches to digital imaging came primarily from two perspectives: jump in and just do it, or conduct a pilot project to learn about the technology and the process and then build a program around these experiences. Each of these approaches has fostered different cultures of digital imaging programs within libraries, archives, and museums. For the preservation community in particular, initial forays into digitization were in many ways an extension of brittle book reformatting projects: should we scan rather than microfilm? Early on, the community recognized that digitization increases and enhances access, but does not guarantee preservation in the same way as microfilm.

Coming from a long tradition of microfilming, there was much initial interest in digitizing text. Early digitization approaches in the library community focused on defining essential characteristics for text-based originals and corresponding approaches to digitizing to match these characteristics. This includes work done at Cornell and Yale universities in the mid-1990s. In researching the scanning of text-based originals directly, the scanning of microfilm, and the feasibility of hybrid approaches (film-first and then scan, compared to scan first with the subsequent creation of computer output microfilm or COM), these projects focused on two essential characteristics—legibility of the smallest significant character and accurate rendering of type faces or fonts—as metrics for evaluating the appropriateness of specific digital imaging parameters.

While these are appropriate characteristics for high-contrast text-based information, other types of originals have other or additional characteristics. Generally, early digitization guidelines adopted the approaches developed for high-contrast text. Many institutions simply accepted specifications developed from these projects, without much consideration of the particular originals or formats that were being digitized. In this phase of digital imaging in libraries, in many ways we were only asking: what is the digital equivalent of microfilm? Given the other technological and economic limitations everyone was wrestling with to implement large-scale digitization initiatives, even trying to achieve the digital equivalent of microfilm seemed to be a major challenge.

Cornell University Library’s workshops on digitization in the 1990s provided the community with an excellent technical foundation for digitizing library collections. A major component of the workshop included an approach to defining essential characteristics, which was called benchmarking. The benchmarking concept was applied initially to text, then to graphic illustrations, and finally to photographs. The process focused on identifying the smallest significant character or feature as the primary metric for determining sampling frequency or spatial resolution. The benchmarking approach has had a significant influence on the development of digitizing guidelines over the last 10 years. The Digital Library Federation adopted recommendations for spatial resolution and bit depth as the Benchmark for Faithful Digital Reproductions of Monographs and Serials based on the extensive work conducted by Cornell and other institutions.
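
As a rough illustration of the benchmarking idea, the sketch below uses the digital Quality Index (QI) relation commonly attributed to the Cornell benchmarking work: QI = (dpi x 0.039 x h) / 3 for 1-bit capture and QI = (dpi x 0.039 x h) / 2 for grayscale or color capture, where h is the height in millimeters of the smallest significant character. The constants are quoted from memory of that work and the function names are invented for this example, so both should be checked against the cited reports before being relied on.

```python
MM_TO_INCH = 0.039  # approximate inches per millimeter used in the QI relation


def quality_index(dpi, char_height_mm, bitonal=True):
    """Estimate the digital Quality Index for a given scanning resolution.

    Assumption: divisor 3 for 1-bit capture, 2 for grayscale or color.
    """
    divisor = 3 if bitonal else 2
    return dpi * MM_TO_INCH * char_height_mm / divisor


def required_dpi(target_qi, char_height_mm, bitonal=True):
    """Invert the QI relation to get the resolution needed for a target QI."""
    divisor = 3 if bitonal else 2
    return target_qi * divisor / (MM_TO_INCH * char_height_mm)


if __name__ == "__main__":
    # Smallest significant character assumed to be 1 mm high.
    print(quality_index(600, 1.0))                      # 1-bit capture at 600 dpi
    print(round(required_dpi(8, 1.0)))                  # 1-bit, target QI of 8
    print(round(required_dpi(8, 1.0, bitonal=False)))   # grayscale, target QI of 8
```

The point of the calculation is that the resolution requirement follows from a measured feature of the original rather than from a fixed number applied to every item.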

Unfortunately, over the last ten years the digital equivalent of microfilm has not held up as an entirely acceptable model for digitizing text-based originals. Users are demanding that other essential characteristics be carried forward in the digital versions, such as color and the ability to see fine detail. Approaches to digitization are moving towards the consideration of both user expectations and the characteristics of the original resources, rather than being tied to an approach that replicates an earlier technology.

Another example of an analog reformatting approach that replicates essential characteristics is the National Archives and Records Administration’s (NARA) photographic duplication specifications (developed jointly with the Library of Congress over a period of several years with input from commercial vendors – available at http://www.archives.gov/preservation/formats/bw-copying-specs.pdf) for historic negatives. The NARA specifications and other approaches to duplicating still photographic negatives focus on the creation of duplicate negatives that have the same photographic properties as the original negatives. So that the photographic duplicates can be used and printed in the darkroom just like the originals, certain photographic properties are considered essential characteristics.

As described in the duplication specifications, we adopted a new approach to the tone reproduction for the duplicates, called shadow normalization. Traditional duplication approaches were set up to create duplicate negatives that matched the original negatives in terms of overall density, density range, and the relationship between the tones of the image. In order to optimize the duplication process for original negatives with large density ranges and to provide a means of objective assessment of the duplicates (using statistical process control), we opted to adjust the exposure for each negative and place the shadow density at a specified aimpoint on the duplicates. We concluded that the benefits of this approach outweighed losing one of the essential characteristics: the duplicates no longer had the same overall density as the originals (unless by coincidence the shadow density of the original was close to the aimpoint density).
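
The trade-off described above can be sketched numerically. The example below is a simplified model, not the procedure in the NARA specifications: it assumes the duplication system preserves tone relationships exactly, so that an exposure change shifts every density by the same amount, and it treats the lowest density on the negative as the shadow density. The aimpoint, tolerance, and density readings are invented values.

```python
def normalize_shadow(original_densities, aimpoint):
    """Shift all densities so the shadow (lowest) density lands on the aimpoint.

    Assumption: a uniform shift, i.e. the density range and tone
    relationships of the original are carried over unchanged.
    """
    offset = aimpoint - min(original_densities)
    return [round(d + offset, 2) for d in original_densities]


def within_tolerance(measured_shadow, aimpoint, tolerance):
    """Simple acceptance check of a measured duplicate against the aimpoint."""
    return abs(measured_shadow - aimpoint) <= tolerance


if __name__ == "__main__":
    original = [0.30, 0.90, 1.80]   # hypothetical density readings
    duplicate = normalize_shadow(original, aimpoint=0.40)
    print(duplicate)
    # The density range survives, but the overall density does not.
    print(round(max(duplicate) - min(duplicate), 2))
    print(within_tolerance(0.43, 0.40, 0.05))
```

Because every duplicate is aimed at the same shadow density, a single measurement per negative can be tracked over time, which is what makes the process suitable for statistical process control.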

We have certainly moved beyond the phase of limiting ourselves to deficiencies inherent in old technologies and approaching imaging as an extension or equivalent to microfilm. As we consider digitization as a means for preservation reformatting, we will have to weigh similar considerations as we define approaches to digital imaging. Defining characteristics for each type or class of digital object will most likely result in approaches that are not as consistent or standard across different resources, and may also be more difficult to implement.

Imaging Specifications and Guidelines

The following represents a chronology of digital imaging specifications and guidelines, and other articles and publications that have been influential. The list is fairly comprehensive but is not intended to be the “definitive” list; we apologize for leaving off any other significant documents.

1995

Digital Resolution Requirements for Replacing Text-Based Material: Methods for Benchmarking Image Quality
CLIR Report pub53
By Anne R. Kenney and Stephen Chapman
1995
http://www.clir.org/pubs/reports/reports.html

1996

Conversion of Microfilm to Digital Images
Request for Proposal
Library of Congress
February 1996
http://memory.loc.gov/ammem/prpsal5/rfp5.pdf or http://memory.loc.gov/ammem/prpsal5/coverpag.html

Requirements and Options for the Digitization of the Illustration Collections of the National Museum of Natural History
National Museum of Natural History’s Collections and Research Information System
By Donald D’Amato and Rex Klopfenstein
March 1996
http://www.nmnh.si.edu/cris/techrpts/imagopts/

Recommendations for the Evaluation of Digital Images Produced from Photographic, Microphotographic, and Various Paper Formats
By Franziska Frey and James Reilly
May 1996
http://lcweb2.loc.gov/ammem/Ipireprt.pdf

Digital Imaging for Libraries and Archives
Cornell University Library
By Anne R. Kenney and Stephen Chapman
June 1996
http://www.library.cornell.edu/preservation/dila.html

Digital Images from Original Documents – Text Conversion and SGML-Encoding
Request for Proposal
Library of Congress
June 1996
http://memory.loc.gov/ammem/prpsal/rfp18.pdf or http://memory.loc.gov/ammem/prpsal/coverpag.html

Digital Conversion of Research Library Materials – A Case for Full Information Capture
D-Lib
By Stephen Chapman and Anne R. Kenney
October 1996
http://www.dlib.org/dlib/october96/cornell/10chapman.html

1997

Digital to Microfilm Conversion: A Demonstration Project 1994-1996
Final Report to the National Endowment for the Humanities
By Anne R. Kenney
1997
http://www.library.cornell.edu/preservation/com/comfin.html

Conversion of Pictorial Materials to Digital Images
Request for Proposal
Library of Congress
May 1997
http://memory.loc.gov/ammem/prpsal9/rfp9.pdf or http://memory.loc.gov/ammem/prpsal9/coverpag.html

1998

Guidelines for Digitizing Archival Materials for Electronic Access
U.S. National Archives and Records Administration
By Steven Puglia and Barry Roginski
January 1998
Guidelines - http://www.archives.gov/preservation/technical/guidelines-1998.pdf
Matrix - http://www.archives.gov/preservation/technical/guidelines-matrix.pdf

What is an MTF…and Why Should You Care?
RLG DigiNews
By Don Williams
February 1998
http://www.rlg.org/preserv/diginews/diginews21.html#technical

Digital Formats for Content Reproductions
Library of Congress
By Carl Fleischhauer
August 1996 - http://memory.loc.gov/ammem/formatold.html
July 1998 - http://memory.loc.gov/ammem/formatold.html

Guidelines for Image Capture
Joint RLG and NPO Preservation Conference – Guidelines for Digital Imaging
By Stephen Chapman
September 1998
http://www.rlg.org/preserv/joint/chapman.html

Manuscript Digitization Demonstration Project
By Louis Sharpe and Michael Ott
For Library of Congress
October 1998
http://memory.loc.gov/ammem/pictel/pictel.pdf or http://memory.loc.gov/ammem/pictel/index.html

1999

Digital Imaging and Preservation Microfilm: The Future of the Hybrid Approach for Preservation of Brittle Books
RLG DigiNews
By Stephen Chapman, Paul Conway, and Anne R. Kenney
February 1999
http://www.rlg.org/legacy/preserv/diginews/diginews3-1.html#feature1

Imaging Pictorial Collections at the Library of Congress
RLG DigiNews
By John Stokes
April 1999
http://www.rlg.org/legacy/preserv/diginews/diginews3-2.html#feature

Illustrated Book Study: Digital Conversion Requirements of Printed Illustration
By Anne R. Kenney and Louis Sharpe
For the Library of Congress
July 1999
http://memory.loc.gov/ammem/techdocs/ibs.pdf or http://www.loc.gov/preserv/rt/illbk/ibs.htm

Digital Imaging for Photographic Collections – Foundations for Technical Standards
Image Permanence Institute
By Franziska Frey and James Reilly
December 1997 article - http://www.rlg.org/preserv/diginews/diginews3.html#com
1999 - http://www.imagepermanenceinstitute.org/shtml_sub/digibook.pdf

2000

Image Quality Metrics
RLG DigiNews
By Don Williams
August 2000
http://www.rlg.org/legacy/preserv/diginews/diginews4-4.html#technical1

Digital Imaging Production Services at the Harvard College Library
RLG DigiNews
By Stephen Chapman and William Comstock
December 2000
http://www.rlg.org/legacy/preserv/diginews/diginews4-6.html#feature1

2001

Report of Imaging Practitioners Meeting on 30 March 2001 to Consider How the Quality of Digital Imaging Systems and Digital Images May be Fairly Evaluated
Digital Library Federation
By Stephen Chapman
May 2001
http://www.diglib.org/standards/imqualrep.htm

Digital Reproduction Quality: Benchmark Recommendations
RLG DigiNews
By Daniel Greenstein and Gerald George
August 2001
http://www.rlg.org/legacy/preserv/diginews/diginews5-4.html#featured

Guidelines for Digital Imaging Projects
University of Illinois at Urbana-Champaign
December 2001
http://images.library.uiuc.edu/resources/digitalguidev3.pdf

2002

Benchmark for Faithful Digital Reproductions of Monographs and Serials
Digital Library Federation
December 2002
http://www.diglib.org/standards/bmarkfin.pdf or http://www.diglib.org/standards/bmarkfin.htm

2003

Western States Digital Imaging Best Practices
Collaborative Digitization Program (formerly the Colorado Digitization Program)
January 2003
http://www.cdpheritage.org/digital/scanning/documents/WSDIBP_v1.pdf

Debunking of Specsmanship
RLG DigiNews
By Don Williams
February 2003
http://www.rlg.org/legacy/preserv/diginews/diginews7-1.html#feature1

2004

Technical Guidelines for Digitizing Archival Materials for Electronic Access: Production Master Files – Raster Images
U.S. National Archives and Records Administration
By Steven Puglia, Jeffrey Reed, and Erin Rhodes
June 2004
http://www.archives.gov/preservation/technical/guidelines.pdf

Digital Master Images – Sample Technical Specifications for Photograph Collections
Library of Congress, Prints and Photographs Division
Compiled by Kit Peterson
June 2004
http://www.loc.gov/rr/print/tp/DgtlMastersSamplSpecsSelctdRcmndFinal7_2004.pdf

Standards Related to Digital Imaging of Pictorial Materials
Library of Congress, Prints and Photographs Division
Compiled by Kit Peterson
September 2004
http://www.loc.gov/rr/print/tp/DigitizationStandardsPictorial.pdf

2005

Introduction to Basic Measures of a Digital Image for Pictorial Collections
Library of Congress, Prints and Photographs Division
By Kit Peterson
June 2005
http://www.loc.gov/rr/print/tp/IntroDgtlImage.pdf

FDsys Specifications for Converted Content – Digitization Specifications and Operating Procedures for Archiving Materials: Creation of Preservation Master Files
U.S. Government Printing Office
June 2005
http://www.gpoaccess.gov/legacy/FDsys_ccspecs.pdf

CDL Guidelines for Digital Images
California Digital Library
July 2001 - http://chnm.gmu.edu/digitalhistory/links/pdf/chapter3/3.29b.pdf
November 2005 - http://www.cdlib.org/inside/diglib/guidelines/bpgimages/cdl_gdi_v2.pdf or http://www.cdlib.org/inside/diglib/guidelines/bpgimages/

Digitization for Preservation Reformatting of Photographs
DLF Fall Forum, BOF session
Presented by Erin Rhodes
November 2005
http://www.diglib.org/forums/fall2005/presentations/rhodes-2005-11.pdf

2006

Technical Standards for Digital Conversion of Text and Graphic Materials
Library of Congress
December 2006
http://memory.loc.gov/ammem/about/techStandards122106.pdf

General Trends in Digital Imaging

Looking at the above chronology, we can conclude the following:

In general, the trends for digital imaging have been:

  • From lower minimal spatial resolution to higher spatial resolution
  • From 1-bit scanning, to grayscale scanning, and finally to color scanning
  • From low-bit (8-bits per channel) to high-bit (16-bits per channel) for grayscale and color images
  • From scanning for a specific purpose to digitizing in a “use neutral” manner

Digitization has been limited by the technology:

  • People did as little (as low resolution and/or as low a bit-depth) as they could get by with to facilitate only access.
  • Minimum specifications (primarily for textual materials digitization) have been a big cost driver—scanning at lower resolutions means you can do twice as much.
  • Digital storage was and is expensive. While less expensive today, high-capacity storage area networks (SAN) with automated tape libraries for backups and off-site mirroring (good IT practices for risk mitigation) all remain beyond the financial means of most cultural institutions.
  • We are still wrestling with limitations of the science and technology—digital preservation repository infrastructure remains expensive and, to a large degree, undefined.
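
To make the cost pressure in the list above concrete, the following sketch estimates uncompressed image sizes from the dimensions of the original, the spatial resolution, and the bit depth. The page size and settings are example values only, and the function name is invented for this illustration; real files also carry headers and may be compressed.

```python
def uncompressed_bytes(width_in, height_in, ppi, channels, bits_per_channel):
    """Approximate uncompressed image size in bytes (no headers, no compression)."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * channels * bits_per_channel // 8


if __name__ == "__main__":
    # An 8.5 x 11 inch page captured with different settings.
    settings = [
        ("1-bit, 300 ppi", 300, 1, 1),
        ("8-bit grayscale, 300 ppi", 300, 1, 8),
        ("24-bit color, 300 ppi", 300, 3, 8),
        ("24-bit color, 600 ppi", 600, 3, 8),
        ("48-bit color, 600 ppi", 600, 3, 16),
    ]
    for label, ppi, channels, bits in settings:
        size_mb = uncompressed_bytes(8.5, 11, ppi, channels, bits) / 1_000_000
        print(f"{label}: {size_mb:.1f} MB")
```

Under this simple model the pixel count grows with the square of the resolution, so doubling the resolution quadruples the data, and moving from 1-bit to 24-bit capture multiplies it by 24.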

Early digitization replicated capabilities of the prior technology in the digital capture:

  • Many of the early digitizing efforts matched digitization to microfilm.
  • Early digitization emphasized scanning existing intermediates – a major problem with this approach is carrying forward the limitations of the previous technology (inaccurate tone reproduction and film grain), as well as carrying forward any defects in the intermediate (photographic and/or physical).
  • Early guidelines were based on concepts like QI that came from the micrographic industry
  • Realization early on that not all approaches used to assess microfilm quality worked for digital imaging—move toward SFR/MTF and away from resolution charts.
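
The move toward SFR/MTF noted in the last item can be illustrated with the basic definition of modulation transfer at a single spatial frequency: the modulation of the captured pattern divided by the modulation of the target pattern, where modulation is (max - min) / (max + min). The sketch below uses invented sample values and assumes they are linear with respect to reflectance; it is not the edge-based SFR procedure used by measurement software.

```python
def modulation(values):
    """Modulation (contrast) of a periodic pattern: (max - min) / (max + min)."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)


def mtf_at_frequency(target_values, captured_values):
    """Ratio of captured modulation to target modulation at one frequency."""
    return modulation(captured_values) / modulation(target_values)


if __name__ == "__main__":
    # Hypothetical bar pattern: the target swings between 50 and 200,
    # while the captured image only swings between 80 and 170.
    target = [50, 200, 50, 200, 50, 200]
    captured = [80, 170, 80, 170, 80, 170]
    print(round(mtf_at_frequency(target, captured), 2))
```

Unlike reading the finest resolved line pair off a resolution chart, repeating this ratio across many frequencies describes how well detail of every size is transferred, not only where it disappears.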

In general, for many projects items were and still are scanned at less than the recommendations cited in the DLF benchmark. This approach contributes to the “building a critical mass” of resources perspective. Often, large projects look towards scanning homogeneous materials that are easy to scan both technically and legally, which correlates to a large amount of data created. This trend has accelerated over the last few years with large-scale digitization efforts by Google, the Open Content Alliance, the Million Book Project, and the like.

The trend has been for the adoption of fixed approaches, rather than defining the process to achieve a specific result; for example, scanning at a fixed high spatial resolution for all originals, rather than assessing the characteristics of specific groups of originals and adjusting the digital imaging requirements to match the group. There are lots of assumptions in the field that have become truisms, such as fixed high spatial resolution and bit-depth is a good thing—but there is no guarantee of quality.

More recently, the focus has been on high spatial resolution and high-bit sampling, but there has been minimal effort put into defining other quality parameters. In the end, spatial resolution by itself is not a defining factor for digitization requirements, nor does it guarantee quality. It represents the maximum spatial detail or acuity a device is capable of achieving, if designed well. Bit-depth only indicates the maximum range of tones a device is capable of differentiating, but also is not a guarantee of quality. There needs to be more emphasis on ensuring the quality of the pixels, and this is still problematic for the field. Tools have not improved and imaging is still at a point where it cannot be done well without experienced people.

One conceptual approach for information capture is to regard digital imaging along a spectrum. As you move from one end of the spectrum to the other, the amount of information and the accuracy of information that is captured increases.

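[Figure: the digital imaging spectrum, from a defined, output-referred environment to a less defined, original-referred or input-referred environment]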

At one end of the spectrum is a very defined imaging environment. Capture is done to at least minimal specifications, spatial resolution is based on formats and sizes of originals, images are encoded in RGB, and images are processed in a manner to facilitate a specific output (i.e., images are adjusted for generic monitor display or for printing). In imaging science, this may be called an “output-referred” approach to the image state for the digital images. Both NARA’s 1998 and 2004 technical guidelines are based on an output-referred approach and recommend bringing all images to a common rendition that is based on generic monitor display.

At the other end of the spectrum, the imaging environment is less defined and minimal image processing is done. At this end of the spectrum, the image state may be either “original-referred” or “input-referred.” Imaging may be done in a manner that relates the image more closely back to the original (although this is also possible with an input-referred image state), images may exist in optimized three-channel or multi-channel color encodings, spatial resolution is uniformly high or based on assessment of the original, and images are less processed for any particular use. Original-referred and input-referred images will need to be adjusted in order to be used, so that the display or output will look like the original. We are not entirely sure today just how to define digital image capture at this end of the spectrum. These are emerging approaches that warrant further investigation.

The unprocessed end of the spectrum will place a bigger burden on making these resources usable in the future. The more defined the output, the less work there is to do. The more open, more raw the resource, the more work it takes to make it usable. The approach of bringing images to a common rendering solves some of the usability problem.

Although there is the potential for having more functionality at the less-defined end of the imaging spectrum, we still want to ensure we have captured the appropriate essential characteristics that tell us about the original resource. From a preservation reformatting perspective, we would prefer to use an approach that is original-referred.

We feel it is feasible technically to define approaches to digitization that will produce very accurate visual surrogates (for many originals, accurate visual representation is a major aspect of carrying forward the essential characteristics) and create a “good data-set” as Carl Fleischhauer of the Library of Congress describes it. We are treading a fine line between a traditional approach that defines a specific visual representation and moving forward to one that is more “use neutral” and accommodates other future (but undefined and unknown) uses.

Comments on Other Aspects of Digitization Guidelines

Scanner and digital camera assessment

  • Still in progress, not as much progress as we would like.
  • Early guidelines were not based on the capabilities of the equipment; assumption was that the equipment performed at an appropriate level—even though no one was really measuring the performance of the equipment.
  • NARA’s 2004 Guidelines were the first to define capture device performance parameters.
    • Only have limits for noise and channel registration
      • Higher limit for noise level for text docs—lower maximum density
      • Lower limit for noise level for photographs—higher maximum density
      • We picked limits based on actual equipment—we ran the tests on a range of scanners and digital cameras and picked numbers that were reasonable
    • Other parameters used as a guide for determining the suitability of a particular capture device for a particular original
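
The two parameters with limits, noise and channel registration, can be sketched as simple measurements. The example below assumes noise is taken as the standard deviation of pixel values in a uniform gray patch and misregistration as the spread in sub-pixel edge position between the red, green, and blue channels. These definitions, the sample data, and the limit values are illustrative assumptions and are not taken from the NARA guidelines.

```python
from statistics import pstdev

# Hypothetical limits: a higher noise limit for text, a lower one for photographs.
NOISE_LIMITS = {"text": 2.0, "photograph": 1.0}


def patch_noise(patch_values):
    """Noise estimate: standard deviation of values in a uniform patch."""
    return pstdev(patch_values)


def edge_position(profile):
    """Sub-pixel index where a dark-to-light edge profile crosses its midpoint."""
    mid = (min(profile) + max(profile)) / 2
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if a < mid <= b:
            return i + (mid - a) / (b - a)
    raise ValueError("no rising edge found in profile")


def misregistration(red, green, blue):
    """Largest difference in edge position between the three channels, in pixels."""
    positions = [edge_position(channel) for channel in (red, green, blue)]
    return max(positions) - min(positions)


if __name__ == "__main__":
    noise = patch_noise([100, 102, 98, 100])
    print(round(noise, 3), noise <= NOISE_LIMITS["text"], noise <= NOISE_LIMITS["photograph"])
    red = green = [10, 10, 110, 210, 210]
    blue = [10, 10, 60, 210, 210]   # blue edge is displaced slightly
    print(round(misregistration(red, green, blue), 3))
```

In this invented case the same scanner would pass the looser text limit and fail the tighter photograph limit, which is the kind of distinction the guidelines draw between document types.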

Viewing environment

  • There has been acceptance of the standardized viewing environment defined by the graphic arts industry, if not wide adoption, implementation, and use.
  • As we move toward preservation digitization, this becomes even more critical – particularly monitor calibration, if the monitor is used for a basic visual assessment compared to an analysis of the capture device performance.

Color management and ICC compliant workflows

  • People are trying to implement color managed workflows.
  • The current ICC color management process is not always useful for our work – a new CIE committee on Archival Color has been established and hopes to address the specific needs of our community.
  • We still believe in doing the imaging/encoding/image state in a way that would allow us to ignore the ICC profiles – we want the option of interpreting the numerical values literally and still have reasonably accurate color and tone reproduction.
  • Rendering intents – when performing color space transformations using the current ICC color management process – relative colorimetric intent is most appropriate for near neutral originals like old documents, and perceptual intent is most appropriate for photographic images.
  • Color spaces – assumes RGB encoding (emerging practices may use other encodings)
    • NARA’s 1998 guidelines – suggest using sRGB (by assigning), which we still think is appropriate for text documents (they tend to have a smaller color gamut and less saturated colors)
    • NARA’s 2004 guidelines – moved to a recommendation of the larger-gamut AdobeRGB 1998 color space
    • The future - assume in some cases an even larger gamut color space will be desirable, achievable only with high-bit sampling

Reference targets

  • General targets and multiple test targets have been used, but usage and implementation has varied.
  • Suggestion of using targets specific to the types of originals we are scanning – for example, aged albumen target for old albumen prints – but this would be very difficult to do.
  • Don Williams has worked on an integrated target – the “Golden Thread” target that integrates multiple targets so all aspects can be evaluated.
  • Current work at Library of Congress, overseen by Michael Stelmach, with Don Williams and Peter Burns – capture device assessment target, image characterization target (scanned with original), and software for automated analysis – being called the Digital Image Conformance Evaluation (DICE).

Tone and color reproduction aimpoints

  • NARA’s aimpoints geared toward generic monitor display and an output referred environment
  • Others geared toward prepress work, including the Government Printing Office’s guidelines (http://www.gpoaccess.gov/legacy/FDsys_ccspecs.pdf)
  • Still a big question about the variability of digitization
    • Currently the Library of Congress is trying to address this
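
Checking a scan against tone reproduction aimpoints amounts to comparing the measured values of gray target patches with the intended values. The sketch below shows one possible form of that check for 8-bit RGB data; the patch names, aim values, and tolerance are placeholders invented for illustration and should be replaced with the figures from whichever guideline is being followed.

```python
# Hypothetical aimpoints for three gray patches (8-bit values) and a tolerance.
AIMPOINTS = {"white": 242, "mid-gray": 104, "black": 12}
TOLERANCE = 3


def patch_on_aim(rgb, aim, tolerance=TOLERANCE):
    """True if every channel is within tolerance of the aim and the patch is neutral."""
    on_aim = all(abs(channel - aim) <= tolerance for channel in rgb)
    neutral = max(rgb) - min(rgb) <= tolerance
    return on_aim and neutral


def check_target(measured):
    """Return the names of patches that miss their aimpoints."""
    return [name for name, rgb in measured.items()
            if not patch_on_aim(rgb, AIMPOINTS[name])]


if __name__ == "__main__":
    measured = {
        "white": (241, 243, 242),
        "mid-gray": (104, 105, 111),   # blue channel is too high
        "black": (12, 13, 11),
    }
    print(check_target(measured))
```

A check of this kind only confirms that a capture was brought to a given rendition; it says nothing about how much scans of the same original vary between devices or operators, which is the open question noted above.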

Image processing workflows

  • NARA’s illustrative sample workflow intended to minimize doing anything “bad” to the image quality versus leaving the images in a less defined, less processed, “raw” state
  • When people have described their image processing – it is almost always specific to their local process – as presented at the IS&T Archiving Conference panel on imaging workflows, in Washington, DC, 2005
  • Sharpening – do it or not? Still a question.

Image quality defects

  • Not standardized

Document Types

  • Define text by the characteristics of the information and type
    • 1-bit for printed high-contrast text
    • 8-bit for low contrast, diffuse characters, staining, faded, mixed content, etc.
    • 24-bit for cases where color is important to the interpretation of the information
  • Define photographs by
    • Transmissive camera originals – negatives, slides, transparencies
    • Reflective positives – prints
    • Pixel array tied to format and dimensions of the originals – a major departure from earlier guidelines – acknowledging the amount of information in originals varies
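
The arithmetic behind these choices is simple, and a minimal sketch may help. Bit depth follows from the characteristics of the text, and the pixel array follows from the dimensions of the original and a sampling rate. The 400 ppi figure and the function names are assumptions for illustration only.

```python
import math


def suggest_bit_depth(color_is_significant, high_contrast_print):
    """Pick bits per pixel from the characteristics of a text original."""
    if color_is_significant:
        return 24  # color matters to the interpretation of the information
    if high_contrast_print:
        return 1   # clean, printed, high-contrast text
    return 8       # low contrast, staining, fading, mixed content, etc.


def pixel_array(width_in, height_in, ppi):
    """Pixel dimensions for an original of the given size in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Approximate uncompressed image size, ignoring headers and row padding."""
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


# An 8 x 10 inch print sampled at a hypothetical 400 ppi.
w, h = pixel_array(8, 10, 400)
for bits in (1, 8, 24):
    print(f"{w} x {h} px at {bits}-bit: {uncompressed_bytes(w, h, bits):,} bytes")
```

Because the pixel array scales with the area of the original, doubling the sampling rate quadruples the storage requirement at any bit depth.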

Quality Control

  • Not standardized; there are no community-wide standards

Derivatives

  • Fixed size vs. dynamic creation
    • In general, moving toward JPEG2000 – although implementation in a high-demand environment is still problematic
    • Dynamic creation is not limited to the JPEG2000 format – on-the-fly creation of derivatives from traditional raster image formats like TIFF has been available for many years
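
The difference between the two approaches can be sketched with the sizing calculation alone. A fixed-size approach computes a small set of derivative dimensions in advance, while dynamic creation applies the same calculation to whatever size is requested. The size names and pixel values are hypothetical, and the actual resampling (done by an imaging library or image server) is left out.

```python
def fit_within(width_px, height_px, max_side):
    """Scale dimensions so the longer side equals max_side, never upscaling."""
    longest = max(width_px, height_px)
    if longest <= max_side:
        return width_px, height_px
    return (
        max(1, round(width_px * max_side / longest)),
        max(1, round(height_px * max_side / longest)),
    )


# Fixed-size approach: a predefined set of derivatives (hypothetical sizes).
FIXED_SIZES = {"thumbnail": 150, "screen": 800, "zoom": 1600}
master = (3200, 4000)
fixed = {name: fit_within(*master, side) for name, side in FIXED_SIZES.items()}
print(fixed)

# Dynamic approach: the same calculation, run per request for any size.
print(fit_within(*master, 1234))
```

The fixed approach trades storage for predictable load, while the dynamic approach trades processing on each request for flexibility.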

File Formats for master files

  • TIFF – still the de facto standard, advocated by people who like to “keep it simple”
  • JPEG2000 being considered more seriously now
    • Resilience to corruption, owing to data redundancy
    • On-the-fly creation of derivatives at any size
    • Difficult to implement
      • Limited choices for software toolkits to support JPEG2000 within IT infrastructure
      • High demand on infrastructure when trying to create derivatives on-the-fly

Where are We Headed and What Still Needs to be Done?

A great deal of progress has been made in some areas, while in others progress has not come as quickly as desired, but much has been learned over the past decade. As we move toward a better definition of what digital imaging means in a preservation context, there is still work to be done. It is a little humbling to look back and admit that we are still asking many of the difficult questions that we were asking over a decade ago – particularly about the relationship of digitization to preservation and about agreement on approaches that are appropriate for preservation reformatting using digitization.

For the most part, we have accepted that digitization can meet current needs for facilitating access, and by doing so, also fulfill basic preservation needs by limiting handling of originals. However, digitizing is not yet completely synonymous with preservation – see Appendix A of NARA’s 2004 Technical Guidelines. Beyond the benchmarking concept, which has been well established for text-based materials through the work done by Cornell and later by the Digital Library Federation’s Benchmarks for the Reformatting of Monographs and Serials, only recently has there been community-wide discussion regarding requirements for preservation reformatting, particularly for non-text original formats. We are moving closer to a better understanding of what is needed for digitization as a preservation reformatting approach, and in many ways we have already defined some of the requirements that we would be willing to accept as preservation requirements – see “Digitization for Preservation Reformatting of Photographs.”

The move from creating a digital copy primarily for access purposes to creating one that is worth sustaining over time puts more focus on the quality of the digital copy. It takes into account, on one hand, the properties of the original that are deemed important to carry forward and, on the other, changing user expectations, the capabilities of the technology at the time, and the purpose or use of the digital image.

[Illustration]

Only at the highest technical quality level do you get a digital resource that matches the original, or even the analog preservation copy, if this is the intent. As Stephen Chapman from Harvard University notes, sustainability is a key attribute of good digital collections, and the best time to build in sustainability is at the point of creation. Although the concept of sustainability applies to both use and content, from a digital imaging perspective, what approaches do we follow to create digital images that are worth sustaining over time? How do we start to define a technical approach that could also serve as a preservation approach that takes into account some or all of the factors in the illustration above?

[Illustration]

This article has discussed two broad concepts, information capture and essential characteristics, that we think address sustainability in an imaging context. Information capture addresses the concept of producing a good reproduction. Conceptually, the community is moving toward capturing information to produce “good data sets.” These representations may not look like the originals we are copying, but they can serve as-yet-unknown needs, such as scientific analysis and research.

Defining essential characteristics of originals helps us to move beyond the limitations of current technology to designing specifications based on these properties. These can be based on many factors, including physical and chemical attributes of the original, condition, quality, defects, date of production, generation (photos), curatorial or financial value, etc. We need further investigation into the essential characteristics of different classes of digital objects/files, how to tie these properties to digitizing approaches, and how to determine the best approaches to digitizing classes of originals with similar properties and characteristics. Essential characteristics can be identified via the capture process or in metadata about the image. Metadata should include information about the original and the digital resource, and should document information about characteristics that are not inherent in the digital version.

A focus on essential characteristics does not preclude valid reasons for digitizing collections based on concepts of intended use, affordability, and sustainability over time. “Fitness to purpose” has been a driver for digital imaging for some time; not every program will have the same goals of fidelity to the original, longevity, or preservation. Institutions should take into account the context and reasons for digitization in their individual cases. In a preservation context, however, there may be a higher risk of not achieving preservation goals at the “fitness to purpose” end of the spectrum.

Even as we move toward less-defined imaging approaches, we still need to create images in consistent ways that will allow us to automate ingest into digital repositories – including characterization and validation of the digital objects and data formats, automated transformation of digital objects, and automated creation of reference and/or use copies.
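
As one small illustration of automated characterization at ingest, a file's format can be identified from its leading bytes and not from its extension. The sketch below recognizes only the TIFF and JPEG 2000 signatures; the function names are hypothetical, and identification of this kind is only a first step, not validation of the file's internal structure.

```python
from typing import Optional

# Leading-byte signatures for the master formats discussed in this article.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 file (JP2-family signature box)"),
    (b"\xff\x4f\xff\x51", "JPEG 2000 codestream"),
]


def identify_format(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read just enough of a file to check it against SIGNATURES."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(12))


print(identify_format(b"II*\x00\x08\x00\x00\x00"))
print(identify_format(b"not an image"))
```

Files that return no match can be routed to manual review instead of being ingested under an assumed format.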

There are gaps in specific technical areas that should be addressed by the larger preservation and imaging community in order for digitizing for preservation reformatting to be fully scoped and defined. In order to consider using digitization as a method of preservation reformatting, it will be necessary to specify more about the characteristics and quality of the digital images beyond specifying minimal and optimal levels of spatial and signal resolution. High-bit, high resolution imaging has been the focus of imaging specifications, but there has been minimal effort put into defining other quality parameters, such as tone reproduction, color reproduction, color mode, capture device performance, assessment of source, and image state.

As mentioned above, one area that still needs a lot of work is capture device performance. Pixel resolution is a good marketing device, but the internal processing of the scanner or camera has a big influence on image quality. Besides tests for spatial frequency response and dynamic range, evaluation of the capture device might include tests to measure noise levels, uniformity in tone and color reproduction, channel registration, dimensional accuracy, etc. More importantly, can we define trustworthy pass/fail limits for each of these tests so that scanner and camera performance is more easily measured and documented? Although much effort has gone into quantifying the performance of scanners and digital cameras in an objective manner, there has not yet been enough progress in making device performance assessment readily understood, usable, and easily integrated into imaging workflows. Comprehensive guidelines with sophisticated approaches to device performance assessment have not been advanced. Simple approaches, or simply taking the manufacturer’s specifications at face value, come at the expense of quality. We assume that the capabilities and performance of capture devices will need to improve to accommodate high levels of information capture. Currently, the Office of Strategic Initiatives at the Library of Congress is working with consultants on developing better test targets and software for automated evaluation of capture device performance.

We certainly need consensus on applying scanner performance test limits that are acceptable to the imaging community. The digital library community has been relatively silent on some of these technical issues, and there has been a heavy reliance on scanner manufacturers and the digital camera industry to define the criteria. Imaging practitioners should work to define both assessment criteria and pass/fail limits. To a certain extent, the imaging science community can assist us in this process. For particular applications, certain performance criteria will be more critical than others and will need to exceed established minimum limits, such as level of dimensional accuracy for aerial photography scanning.
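
To make the idea of documented pass/fail limits concrete, here is a minimal sketch covering three of the tests named above: noise, uniformity, and dimensional accuracy. The metric definitions are simplified and the limits are invented for illustration; agreeing on real limits is exactly the task that remains for the community.

```python
import statistics

# Hypothetical pass/fail limits, for illustration only.
LIMITS = {
    "noise_sd": 2.0,               # max std. dev. within a uniform gray patch (8-bit levels)
    "nonuniformity_pct": 3.0,      # max spread of patch means across the image field
    "dimensional_error_pct": 0.5,  # max error in a measured known distance
}


def noise_sd(patch_pixels):
    """Standard deviation of pixel values sampled from one uniform patch."""
    return statistics.pstdev(patch_pixels)


def nonuniformity_pct(region_means):
    """Spread of mean values of the same gray measured in several regions."""
    spread = max(region_means) - min(region_means)
    return 100.0 * spread / statistics.mean(region_means)


def dimensional_error_pct(measured, actual):
    """Relative error between a measured and a known target distance."""
    return 100.0 * abs(measured - actual) / actual


def evaluate(measurements):
    """Map each metric name to (value, passed) using LIMITS."""
    return {name: (value, value <= LIMITS[name]) for name, value in measurements.items()}


report = evaluate({
    "noise_sd": noise_sd([128, 129, 127, 128, 130, 126, 128, 128]),
    "nonuniformity_pct": nonuniformity_pct([200, 198, 202, 199]),
    "dimensional_error_pct": dimensional_error_pct(measured=254.5, actual=254.0),
})
for name, (value, passed) in report.items():
    print(f"{name}: {value:.2f} {'pass' if passed else 'fail'}")
```

Keeping the limits in one table separates the measurement code from the policy decision, so an application with stricter needs, such as aerial photography scanning, can tighten a single limit without changing the tests.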

We need to be sophisticated users of the technology and the tools. We should continue to acknowledge that doing imaging well is difficult and requires expertise. While digital imaging is a versatile tool, it does not accomplish specific functions well without a certain amount of operator expertise, and in many cases it still does not work as well as we would like. People have been more than willing to accept very limited, undefined digital imaging guidelines as acceptable for preservation reformatting. We should look to designing and endorsing more comprehensive and sophisticated approaches to imaging, especially as expressed in imaging specifications, guidelines, and best practices, which should articulate the entire digitization approach, not just specifications for imaging “at the scanner.” Guidelines should take into consideration a wider range of technical parameters, assessments of the original on a more granular basis, and an acknowledgement that there will be different approaches depending on the role or purpose of imaging within a particular context.


Copyright 2004 RLG.