Guidelines for Image Capture
The reasons to scan historic collections range from enhancing catalog records with thumbnail images to creating facsimile reprints. Given this broad array of goals it is reasonable to ask, "How can we find consensus?" In the conference Digitizing Photographic Collections, held at the Rochester Institute of Technology in 1997, James Reilly stated that in all forms of imaging there is a spectrum of choice, and that we cannot rely on standards to direct institutional purposes. Rather, he insisted that our purposes must drive technical choices. The corollary to Mr. Reilly's observation is that in all types of institutions there is a spectrum of purposes. The goal of this paper is to promote discussions that will lead us to articulate common objectives related to digital image capture.
As a preservation librarian, I will admit to a bias in providing this summary of imaging practice in U.S. libraries and archives. It is not, as you may think, to suggest that preservation-quality products -- "boutique images" as Michael Lesk has dryly observed -- are the only ones worth making, or that investments in the highest quality you can afford will pay off in the long run. Instead, my training and my background lead me to advocate controlling the process as a common objective, regardless of the nature of the source materials, user needs, or image quality requirements. We are talking, after all, about best practice, not best products.
What strikes me about the language of our profession is the degree to which we talk about process rather than product -- "digital imaging," "digitising," "microfilming," "photocopying," or even the catch-all "reformatting" are familiar terms. Perhaps we take for granted the permanence and quality of the products we create routinely in brittle book programs. Or is it possible that we focus on process for two reasons? First, no matter which technology is used we know how many things can go wrong from the source material to the copy; and, second, we know that in all cases we must give up something from the original -- that to balance quality and cost we must sustain some loss of information. For these reasons, we organize reformatting activities into a series of controlled workflows. I think we should sustain, if not strengthen, this approach to best practice as we establish new guidelines for new technologies. Systems measurement, thorough inspection of pictorial reproduction (image quality) and completeness, the testing and certification of media stability, the training of technicians, and the systematic gathering and distribution of documentation arguably serve all scanning objectives equally well.
This Working Group has decided to review five topics related to digital image capture, with the objective of identifying the points in the reformatting workflow that would most benefit from the development of guidelines. They are materials handling, systems quality, digital master quality, derivatives, and file naming. Each of the five sections includes introductory comments, a list of issues, a review of technology, and a summary of practice to gauge where perspectives are similar and where they vary. The sections conclude with lists of questions to promote further discussion.
In one of our conversations about this conference, my colleague Jane Williams used the wonderfully apt phrase "fit for purpose" as an umbrella term to put image capture guidelines into a usable context. This phrase nicely encapsulates James Reilly's adage that there are spectrums of choice in imaging technology and techniques, and spectrums of purpose. Naturally, guides to practice should not be so rigid that they apply only to narrow bands of the spectrum. We have the best chance of being inclusive if we can identify the boundaries and work between them.
I. Handling
"Physical handling is one of the most destructive things that can happen to a fragile object. One of the best ways to preserve it is to limit physical access to it. This is a very strong case for creating a digital library of such objects."
Peter Noerr, The Digital Library Tool Kit, Sun Microsystems, Inc., April 1998, p. 21. <http://www.sun.com/products-n-solutions/edu/libraries/digitaltoolkit.html>
The potential preservation benefits of digitizing valuable, unique, or fragile materials are so well understood that they are now stated almost as a matter of course among the rationales for scanning historic collections. What is pointed out too rarely, however, is that digital images do not make themselves. The logic that is offered to invest in digital image quality -- namely, that reducing use of materials enhances their preservation -- also applies to the digital conversion process. Physical handling threatens historic collections. When considering handling at the level of the collection rather than the single item, it is not unreasonable to argue that a group of materials may never be handled to such an extent again during its lifetime.
Handling is sometimes required twice in the reformatting workflow, once for processing and cataloging, and again for imaging. Although we can presume that scanning technology will evolve, handling is the one area of digital capture where we should not expect industry to develop acceptable products without significant participation from our community. We are not a large enough market to entice manufacturers to develop scanners that "fit" non-standard materials, and it may take some time to convince engineers, for example, that opening bindings fully to 180° is often a bad idea, whether the bound material is imaged face up or face down.
Reviewing library imaging practices to date, it is therefore not surprising to find that compromises have been made to get the job done. In some cases, handling practices have been less than ideal; in others, image quality was lowered or costs were raised in order to protect source materials.
For example:
A. Issues
B. Technology
Risks to materials are generally lower in face-up reformatting workflows, so digital cameras and overhead scanners are of particular interest to the library, archives, and museum communities. Within the past year or two, imaging experts appear to be approaching consensus that direct digital photography offers exceptional capabilities for image quality -- in some cases even exceeding the reproduction capabilities of film -- and that it may not pose undue risk to historic photographs and other works of art. (1) (Colet, D'Amato, Ester, Frey, Mintzer) Lighting and its effects regularly emerge as issues that concern curators. One wishes for a definitive study about the effects of lighting -- at which cumulative point does light damage a given object? -- and the actual, rather than relative, harmful effects of the types of light used with digital cameras and flatbed scanners. (2)
Pending the outcomes of more research and discussion in this area, we should follow with interest a number of improvements in digital cameras. New area array cameras such as the Leaf DCB support the short exposures of flash photography, although their resolution -- and potential for scanning oversize materials -- is much lower than the longer exposure linear array systems such as the Phase One. The Marc II system now being used in the Library of Congress Prints & Photographs Division is one of the newer cameras that combines the advantages of the area and linear array cameras by stitching together multiple exposures to create large files (up to 11K x 11K) at relatively high speeds.
All this is to say that face-up, "contact-free scanning" is becoming affordable. Some institutions are reaching the conclusion that the costs of producing reasonably high-quality digital photographs are comparable to the combined costs of producing high-quality 35mm photo intermediates and production Kodak Photo CD scans. For digital access projects, this raises the question about the preservation value of 35mm film. Should guidelines for digital image capture recommend that direct digital capture is preferred over the photo intermediate-to-scan approach?
For non-continuous tone materials, particularly the "mixed" format of the prototypical brittle book, face-up (overhead) scanning is something of a mixed blessing. The cradle design of the Minolta PS3000 scanner, for example, presumes that one always wants to open materials fully to 180°, and then to press down on the pages to hold the item flat. (3) Until very recently, the ImageAccess BookEye scanner did not have any type of binding support integrated with its system, so to our community it represents something of a small victory to see that this vendor has recently upgraded their system to include an adjustable cradle. Like the Minolta, however, this 1-bit scanning system has been optimized for modern materials. Getting legible images from aged low-contrast paper with peaks and valleys across the surface requires a high number of rescans with 1-bit scanners, regardless of their optical resolution.
In two exceptional cases (Zeutschel and Xerox PARC), industry is working to develop a digital camera/integrated cradle optimized to rare books. Unfortunately, these scanners are among the most expensive; and the Xerox book scanner is still in development. The more general trend in rare book scanning appears to be the use of a custom cradle with a standard digital camera. These cradles are almost always designed by conservators. Notable prototype systems have been used in several library scanning projects. (Mayer, Mintzer, Riser, University of Oxford) Collaboration in this area will be instrumental in developing systems that balance the needs of handling and production.
C. Summary of Practice
As might be expected, handling practices vary widely, according to the value and condition of the source materials, the preservation objective (to represent or to replace), cost, willingness to customize equipment, and, perhaps most importantly, the level of participation by conservators.
D. Discussion Questions
II. System Performance*
*in this context, "system" refers to the combined performance of scanning hardware, software, file format, and compression algorithm used to save images to disk.
There are several incentives to define technical guidelines for system components (scanners, software, monitors, printers): to be able to provide specific contractual language regarding the baseline performance for equipment; to make informed comparisons of products and services; and to document whether an imaging system is operating consistently at optimal levels.
While it is true that clearly written technical literature helps us become educated about the key components of scanners and their relationship to image quality, imaging experts remind us that judgments about quality are ultimately subjective. Producing a good image results from knowing what a system can do and what the observer wants to see. Selecting and evaluating equipment, however, can be based upon objective measures. Guidelines and tools in this area of image capture give us control over the system by allowing us to measure the quality of the signal, rather than the quality of the image from a given scanner.
Engineers and image scientists report that the four most important system characteristics to evaluate and monitor are noise, detail, tone, and color reproduction (D'Amato, Gann, Reilly). The first two are critical for 1-bit scanning; all four pertain to color imaging. One of the more appealing aspects of digital technology is that it makes it possible to conduct objective assessments of system performance in these key areas. With the proper tools, precise measurements can be obtained for each of these values. Moving from theory to practice, however, will require the development of easy-to-use targets and associated software -- likely to be available in months rather than years, but not here today. (PIMA, Reilly, Williams)
Where objective measures are in use to document system performance (LC, Smithsonian, NARA), there has been resident photographic expertise, training from imaging experts, or both. Subjective techniques have been used in other in-house projects, where practice generally falls into two categories: evaluating images on screen or in print. Imaging service bureaus, as might be expected, have various levels of expertise and experience. The better ones are advising the libraries and museums how to set up their systems.
When subjective methods are used, experts unanimously agree that system calibration is essential. It is something that we must learn to do if we are to represent our digital products as being of high quality.
A. Issues
B. Technology
From the librarians' perspective, the state of technology may be summarized as the need to integrate technical concepts, tools, and training into the digital image capture workflow if we are to move objective quality assessment from theory to practice. As illustrated by the examples in the Appendix, technical specifications must not be taken at face value. Forecasting image quality with a single specification (such as dpi or bit depth) is particularly risky.
We know that it is important to test and monitor equipment, but why not just look at scanned targets on screen or in print? One of the limitations to this subjective approach is that even with calibrated systems it is not easy to determine where quality loss occurs from one component to another. We also cannot infer from a high-contrast target how a scanner will perform with low contrast material, or what its capabilities might be to capture shadow detail.
Targets designed to measure Modulation Transfer Function (MTF) make it possible to assess detail reproduction across a range of tones. Experts explain that the MTF test reveals whether a scanner's sampling rate on output is actually less than its specification (i.e., the optical resolution or input dpi). Don Williams, an image scientist at Eastman Kodak Company, explains that although it is reasonable to assume a 400 dpi scanner will sample a document every 1/400", it is not safe to assume that the scanner will actually resolve details this small. (Williams) In fact, it is possible that one scanner with a better signal-to-noise ratio will resolve the same detail at a lower sampling rate. In other words, the potential payoff in implementing the use of MTF targets and analysis is to produce higher quality in smaller file sizes. When comparing systems and services, one other advantage to measuring MTF is that a scanner's exposure setting has no effect on the accuracy of the reading. (Gann)
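The distinction between sampling rate and resolved detail can be sketched in a few lines of Python. This is an illustration only, not the procedure of any particular target or analysis package: the function names and the pixel values are assumptions made up for the example. Modulation is computed from the lightest and darkest readings across a bar pattern, and the MTF at that spatial frequency is taken as the ratio of the scanned modulation to the modulation of the target itself.

```python
def sampling_interval_inches(dpi):
    """Distance between samples for a scanner sampling at `dpi`."""
    return 1.0 / dpi


def nyquist_cycles_per_inch(dpi):
    """Highest spatial frequency (line pairs per inch) the sampling grid can represent."""
    return dpi / 2.0


def modulation(values):
    """Michelson modulation, (max - min) / (max + min), of readings across a bar pattern.

    Assumes the readings are not all zero.
    """
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)


def mtf(scanned_values, target_values):
    """Ratio of scanned modulation to target modulation at one spatial frequency."""
    return modulation(scanned_values) / modulation(target_values)


# Hypothetical readings across three dark/light bar pairs (not measured data).
target_bars = [20, 220, 20, 220, 20, 220]    # modulation known from the target
scanned_bars = [80, 160, 80, 160, 80, 160]   # what the scanner delivered

interval = sampling_interval_inches(400)     # 0.0025 inch between samples
limit = nyquist_cycles_per_inch(400)         # 200 line pairs per inch at best
response = mtf(scanned_bars, target_bars)    # fraction of target contrast retained
```

A scanner can sample at 400 dpi and still return a low response for fine bar patterns, which is the gap between specification and performance that the MTF measurement is meant to expose.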
Traditional photographic targets (Macbeth ColorChecker, the Kodak Q60 targets, Kodak Q13 and Q14 gray scales and color patches) are being used to measure noise, tone reproduction, and color reproduction, although in the long run these may prove to be less than ideal for objective measurement of digital systems. Perhaps with some beta testing and funding from our community, organizations such as the Photographic & Imaging Manufacturers Association (PIMA) or the Image Permanence Institute will be able to manufacture a single integrated target for system evaluation. Targets such as the RIT Alphanumeric Test Object to measure 1-bit systems may not be sophisticated enough to determine how well a scanner will represent details in low-contrast or illustrated documents. These 1-bit targets can, however, be used to compare systems, particularly the thresholding capabilities of different scanners which have the same optical resolution.
C. Summary of practice
As with handling, practice varies widely. Several institutions have set up in-house imaging facilities, ranging from document scanning labs to fully configured digital photography studios. Some institutions regularly use targets, some do not. Scans of technical targets could be saved as administrative metadata to accompany digital objects, but this practice is rarely followed.(4) Grayscales and color bars are sometimes included with digitized photographs -- see, for example, the imaging recommendations in the MOA II White Paper (5) -- but there is a difference between using targets to document the image and using them to document the scanner.
Several of the leading imaging service bureaus in the United States have established, refined, and documented the techniques they use to monitor scanning systems. (Ester, Preservation Resources) Collaboration among experienced practitioners, imaging scientists, professional photographers, and "digital librarians" will likely be the successful formula to document, then implement best practices in the appropriate use of targets for all types of scanners.
Poor environments can and will degrade system performance and image quality. Scanning of Caribbean Newspapers was actually suspended at the University of Florida during building renovations because vibration and dust made system performance unreliable. When environmental control is feasible, specifications range from the wall sockets (using line conditioners to control voltage) to the ceilings (requirements for paint color and indirect lighting to create calibrated viewing environments), and from the HVAC system to the furniture. Controlled-environment imaging labs have been created at the Denver Public Library, the National Archives and Records Administration, the Museum of Modern Art, the Library of Congress, and selected service bureaus in the U.S. Specifications for environmental control should also be included in any guidelines for systems quality.
D. Discussion Questions
III. Quality of Digital Masters
Creating digital images worth keeping is the heart of the matter in proposing best practices for digital image capture. Preservation librarians sometimes make the case that increasing quality will extend the usable life of a digital collection. Associating longevity with image quality is arguably a matter of speculation, but there are limits to this logic. At some point, one reaches the maximum lifespan. Another, perhaps more cynical, point of view is that only images that are used have a chance of long-term survival.
At the Rochester conference on Digitizing Photographic Collections, James Reilly stated that it is "utterly crazy to think that one image will serve all purposes." Other imaging experts assume that there will be only one chance to scan materials, so one should attempt to capture all of the information content in the original to create a "rich" master that will have great flexibility and lasting value. The two points of view are not diametrically opposed, but their implementation logically suggests different concepts of "best practice." With proper allowances for cost differences in each approach, perhaps two sets of guidelines could be developed.
In this arena of digital image capture, guidelines should address every component of the digital image with an eye toward ensuring openness and flexibility (and therefore longevity). Open formats are those widely supported by software that can both read from and write to the format. It is up to us to decide whether or not a format must be supported by a standard in order to be considered open.
The guidelines should also provide clear statements of the value judgments that are at work in defining pictorial quality in a master file. Digital masters, we must be reminded, are rarely delivered to the user. Assessing the quality of master images quickly teaches the lesson that "digital image" is an oxymoron. A viewable image is a product of the encoded file and the characteristics of a given viewing device. To define pictorial quality objectively, we need to respond to the technical challenge of eliminating the monitor or printer from quality assessment. With targets and associated software, this can be done.
A. Issues
B. Technology
There are a number of technical issues in this category. Quality and longevity goals are paramount, but it is also important to account for the tradeoffs between quality and cost. Determining how best to stitch the following pieces together to create good master images can be somewhat tricky. Here are some of the separate issues that have a bearing on the quality of the master image:
To view these technical issues from the management perspective is to see a long list of decisions that must be made. But if we don't make them, someone else will on our behalf. To control this part of the digital image capture process, we must become technically literate, and even more importantly, we must have the vocabulary to describe what we want. Just as rules for metadata syntax ensure reliable searching, rules for image quality syntax will help to ensure consistencies in image quality. As an international group, can we reach consensus on this issue?
One industry trend is giving us more latitude to configure scanners to meet guidelines for file format, compression, and image enhancement. With more scanners being built to either the ISIS or TWAIN standards, scanning software and hardware can be evaluated separately. Using the objective metrics described in the previous section, we might configure a scanning system by selecting the hardware based upon engineering, image quality, and price, then select (or write) the scanning software based upon file format specifications, desired enhancements, and ease of use.
C. Summary of Practice
In surveying the guidelines that have been published, it is important to assess first whether scanning guidelines were related to project goals of any kind, or were selected because "Institution X" had previously used that standard. Second, it is important not to compare apples to oranges. Differences in source material must be taken into account, of course, but it is also essential to consider the philosophy that informs the specifications.
C.1. Quality
Broadly speaking, there are two schools of thought regarding image quality:
example: brittle books scanning projects at Cornell University; specification: 600 dpi 1-bit Group IV TIFF images, with use of Xerox resampling/descreening algorithms for halftones
example 1, image capture guideline driven by scanning technology: Caribbean Newspapers Project at the University of Florida to scan 35mm microfilm; specification: 400 dpi 1-bit Group IV TIFF images, because it was the highest resolution offered by the microfilm scanner that met their performance requirements (budget and quality); to produce legible master images, the microfilm images could be enlarged to one-half the size of the original newspaper
example 2, image capture recommendation driven by printing technology: recommendation by Mitretek Systems, Inc. and Allen Press to the Smithsonian Institution National Museum of Natural History (NMNH) to scan illustrations at twice the anticipated halftone line frequency (e.g., scan at 600 dpi for printing at the highest line frequency of 300 lpi); specification: 600 dpi 24 bits per pixel: quality deemed comparable to 4 x 5 color film (D'Amato)
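The arithmetic behind the second example can be sketched in Python. The factor of two and the 300 lpi figure come from the recommendation quoted above; the 8 x 10 inch illustration size and the function names are assumptions added for illustration.

```python
def scan_dpi_for_halftone(line_frequency_lpi, factor=2):
    """Scanning resolution as a multiple of the anticipated halftone line frequency."""
    return line_frequency_lpi * factor


def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of an original scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


dpi = scan_dpi_for_halftone(300)                  # 600 dpi for 300 lpi printing
width_px, height_px = pixel_dimensions(8, 10, dpi)  # hypothetical 8 x 10 inch original
uncompressed_bytes = width_px * height_px * 3     # 24 bits (3 bytes) per pixel
```

For the assumed 8 x 10 inch original this gives 4800 x 6000 pixels and 86,400,000 bytes before compression, which suggests why file size enters so quickly into any discussion of resolution.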
Photographs belong in a slightly more complex category, which combines elements from the two perspectives of image quality summarized above. At Corbis, they have established four categories of image reproduction, and four sets of corresponding guidelines for digital image capture (Süsstrunk):
C.2 File Format
Whether the scanning objectives are to replace or to represent source materials, the choice of file format, compression, color representation, and file-naming convention should, in the words of George Farr at the National Endowment for the Humanities, "close no doors."
The most popular and open file format for master images is TIFF, Intel byte order, version 5.0 or above. For 1-bit images, Group IV compression is widely used. Grayscale and color TIFF images are often stored in uncompressed form. Other formats are also being accepted by digital repositories. Columbia University's guidelines include Photo CD as a master image format, and the Library of Congress guideline for illustrations is PCX (Xerox 5200 scanner to produce diffuse dithered images of printed halftones). In their evaluation of approaches to create high-quality digital reproductions of complex illustrations, the Smithsonian Institution NMNH preferred JPEG (Baseline Sequential compression) as the format for digital masters. At approximately 10:1 for color images and 5:1 for grayscale, images met the NMNH goals for both (faithful reproduction) image quality and file size. (D'Amato) The National Library of the Czech Republic, also setting the standard of faithful reproduction, adopted JPEG with 3:1 compression as the master file format for manuscript scanning.(6)
The conservative approach to creating digital masters favors using no compression. At the other end of the spectrum, one might choose to save images in a proprietary form of wavelet compression. As digital collections increase in size, this decision has important economic ramifications. Even with a repository of 1-bit images, JSTOR was able to save considerable sums of money annually by moving their digital masters from Group IV to Cartesian Perceptual Compression (CPC), which is more efficient. The logic behind the decision to avoid all types of compression is to minimize loss when things go wrong. When a bit flips in an uncompressed image, it produces one dead pixel, but with certain types of compression much larger portions of an image can be lost. In practice, however, this rarely occurs, particularly if digital masters are stored in a system that is programmed for automatic error detection and analysis. (7) Guidelines for compression and file formats should appropriately weigh both (preservation) risks and (fiscal) rewards.
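The difference in exposure to a single flipped bit can be demonstrated with a small Python sketch. It is a stand-in, not a test of Group IV, CPC, or wavelet compression: zlib (a general-purpose compressor from the Python standard library) is used only because it is readily available, and the synthetic 64 x 64 image is an assumption. The last lines show the kind of fixity check, here a SHA-256 digest, that lets a storage system detect such damage automatically.

```python
import hashlib
import zlib

WIDTH, HEIGHT = 64, 64


def make_image():
    """Synthetic 8-bit grayscale image: a simple diagonal gradient."""
    return bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))


def flip_bit(data, byte_index, bit=0):
    """Return a copy of `data` with one bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def count_differences(a, b):
    """Number of byte positions that differ, counting any length mismatch."""
    return sum(1 for p, q in zip(a, b) if p != q) + abs(len(a) - len(b))


original = make_image()

# Case 1: the flipped bit lands in an uncompressed image.
damaged_raw = flip_bit(original, len(original) // 2)
raw_pixels_lost = count_differences(original, damaged_raw)

# Case 2: the flipped bit lands in the compressed stream (zlib as a stand-in).
compressed = zlib.compress(original)
damaged_compressed = flip_bit(compressed, len(compressed) // 2)
try:
    recovered = zlib.decompress(damaged_compressed)
    compressed_pixels_lost = count_differences(original, recovered)
except zlib.error:
    # The decoder rejected the stream, so the whole image is treated as lost.
    compressed_pixels_lost = len(original)

# Fixity check: a stored digest no longer matches the damaged file.
damage_detected = (
    hashlib.sha256(original).hexdigest() != hashlib.sha256(damaged_raw).hexdigest()
)
```

In the uncompressed case exactly one pixel changes. In the compressed case the damage is not confined to one pixel, which is the risk that has to be weighed against the storage savings.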
C.3 Color Space
Decisions about file format and color space sometimes go hand-in-hand. Kodak's PCD and Flashpix formats, for example, restrict color interchange to YCC and NIF RGB. In the past year, industry consortia have adopted sRGB as a standard for color encoding, in part to improve color matching between scanners and peripherals, but mostly in pursuit of the goal to represent color consistently on the Internet. The relative advantages and disadvantages of file format/color encoding combinations should be fully explored in digital image capture guidelines, particularly for the digital master. Is one across-the-board recommendation viable for all materials? This decision has ramifications in the choice of scanning software, the scripts that will be written to create derivatives, and migration schedules.
C.4 Bit Depth
Guidelines for bit depth have traditionally been specified in one of three categories: 1-bit, 8-bit, and 24-bit, even though scanners sample as many as 14 bits per pixel. Dr. Franziska Frey and others recommend capturing a minimum of 12 bits per pixel to capture the dynamic range of photographic prints. (8) (Reilly/Frey) Newer scanning software and image processing programs (e.g., Photoshop 5) accommodate up to 16 bits per pixel. Again, scanning guidelines should consider the tradeoffs between file size and information loss in this category.
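A short Python sketch shows what is at stake in the tradeoff. It assumes the bit depths are counted per channel (so that 24-bit color is three 8-bit channels) and that reduction is done by simply discarding low-order bits; real scanning software may use other methods, and the function names are illustrative.

```python
def tone_levels(bits):
    """Number of distinct tone values that `bits` can encode."""
    return 2 ** bits


def reduce_bit_depth(value, from_bits, to_bits):
    """Requantize a sample by discarding its low-order bits."""
    return value >> (from_bits - to_bits)


levels_12 = tone_levels(12)             # 4096 levels captured
levels_8 = tone_levels(8)               # 256 levels stored
codes_merged = levels_12 // levels_8    # 16 captured levels collapse into each stored level

# Two distinct 12-bit readings that become indistinguishable at 8 bits.
a = reduce_bit_depth(2048, 12, 8)
b = reduce_bit_depth(2063, 12, 8)
```

Both readings are stored as 128, so any tonal difference between them is gone from the master, in exchange for a file with one byte per sample rather than two.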
C.5 Tone Distribution
In addition to making a decision about bit depth for tone reproduction, project managers have instituted practices to control tone distribution. The Library of Congress RFP for pictorial materials (97-9), and Steven Puglia and Barry Roginski's guidelines (NARA) provide detailed overviews of this technique, which depends on the use of targets. I am slow to appreciate fully whether this practice alters the appearance of some images -- especially the high-key and low-key originals -- to the point that they "look wrong" when compared to the original. Decisions about black point, white point and gamma presumably have a similar effect. (Corbis, for example, uses a gamma setting of 1.6 to produce a neutral, relatively flat image.) Guidelines about tone distribution and representation in digital masters can be stated in the negative -- avoid clipping -- but, in practice, how does one determine if the master is right? Should quality control be restricted to evaluating histograms, or should the image be evaluated on a calibrated monitor or printer? Or both?
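The "avoid clipping" rule is one of the few tone guidelines that can be checked mechanically from a histogram. The following Python sketch assumes 8-bit grayscale values and an arbitrary tolerance of 0.5 percent at either end of the scale; the sample pixels, the tolerance, and the function names are all hypothetical and are not drawn from the LC or NARA guidelines.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall at each tone value."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def clipped_fractions(pixels, levels=256):
    """Fractions of pixels at pure black and at pure white."""
    counts = histogram(pixels, levels)
    total = len(pixels)
    return counts[0] / total, counts[levels - 1] / total


def is_clipped(pixels, tolerance=0.005):
    """True if either end of the tonal scale holds more than `tolerance` of the pixels."""
    shadows, highlights = clipped_fractions(pixels)
    return shadows > tolerance or highlights > tolerance


# Hypothetical pixel values, not taken from a real scan.
sample = [0, 0, 12, 40, 128, 200, 255, 255, 255, 90]
shadows, highlights = clipped_fractions(sample)
flagged = is_clipped(sample)
```

A check of this kind can flag a scan for review, but it cannot say whether the master "looks right" against a high-key or low-key original; that judgment still requires a calibrated viewing environment.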
At the risk of sounding naive to professional photographers in the audience, I will ask the question, "Is it possible that in applying these techniques to regularize tone distribution that a 'good scan' would not necessarily look the same as a 'pleasing image'?" Image capture guidelines must be explicit on this point so that quality control procedures will be consistent with the desired outcome for the digital master.
C.6 Targets
The decision about whether or not to scan gray scale and color wedge targets with the master image raises other questions of documentation. Should dimension scales also be specified? If grayscales and color wedges are photographed with the source material, should their digital values be recorded as administrative metadata? Is there software that can do this automatically? Preservation metadata is a topic that will be addressed separately during this conference, but image capture guidelines must specify not only what metadata are essential to preserve image quality from generation to generation, but also where this metadata should be located. There are three options: within the file header, outside the header, or both. Guidance from imaging experts should be solicited to advise us of the pros and cons of various formats to record this information in file headers (e.g., TIFF EP will purportedly accommodate additional data about the signal of digital cameras).
C.7 Resolution/Sampling
Given what imaging experts have been saying about MTF, it is with some reservation that the following numbers of sampling rate (i.e., input resolution) are offered to define a spectrum of image quality. Nevertheless, resolution is one of the most important decisions to make about image capture as it has major ramifications in file size.
Depending upon the format of the source material, a resolution specification (e.g., 600 dpi) can be tantamount to saying, "there is only one scanner you can use." In Yale University's Project Open Book, for example, no microfilm scanner met their specification for 600 dpi 1-bit scanning, so they created one by investing in custom software.
Specifications for resolution vary. In some cases, a minimum file size is specified. In others, a single dpi resolution is used in order to ensure that enough pixels are in the master image to create a specified output, such as a full-screen image. (To a certain extent, this was the logic behind the 18MB Kodak Photo CD file format, which ensured that the 16 Base image could produce a quality 8 x 10 print.)
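The Photo CD figure can be reproduced with simple arithmetic, sketched here in Python. The 16 Base pixel dimensions of 2048 x 3072 are the commonly cited values rather than something stated in the text above, and the calculation assumes 3 bytes per pixel with no compression.

```python
def uncompressed_bytes(width_px, height_px, bytes_per_pixel=3):
    """Size of an uncompressed image in bytes."""
    return width_px * height_px * bytes_per_pixel


def pixels_per_inch(pixels, inches):
    """Output resolution when `pixels` are spread across `inches` of print."""
    return pixels / inches


size_bytes = uncompressed_bytes(2048, 3072)
size_mib = size_bytes / (1024 * 1024)      # 18.0

# Resolution available for an 8 x 10 inch print.
ppi_short_side = pixels_per_inch(2048, 8)   # 256.0
ppi_long_side = pixels_per_inch(3072, 10)   # 307.2
```

Working in this direction, from the desired output back to a pixel count, is the same reasoning that underlies specifications stated as a minimum file size rather than as dpi.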
Examples of resolution specifications include:
One needs to develop a translation matrix to compare these numbers. (Which of the above, for example, represents the highest dpi?) When evaluating these specifications, one should segregate them according to the quality objective they were intended to support. To achieve fidelity (and if so, at 1:1 with respect to the original)? Or to achieve a target level of quality for a target output device? Either could be interpreted as meeting a preservation objective. A comparison of guidelines from several projects shows that practices vary widely:
C.8 Intent/Documentation
A final point to consider about the digital master, particularly for visual collections, is the issue of photographer's intent. In every reformatting project, we will apply our biases (sometimes under the guise of "best practice") in the digital capture process. If we want our future colleagues who will manage these digital collections to preserve our vision of the "right image," we must find a way to document what effectively is the "copy photographer's intent." All of the specifications reviewed above, when used in combination, are directed to serve a specific purpose -- either fidelity or a preferred representation. I would suggest that this is where our guidelines must stake one or more claims about the importance of documentation. Because of the great flexibility inherent to digital images, as well as the premise that digital masters will be carried forward to subsequent generations of display and print technology, it is important to document what we wanted the digital master to do when we first copied the original analog print or film. We should indicate, for example, which medium conveys the message -- print, screen, or film?
Examples of copy photographer's intent include:
Many more examples can be given, but this short list suffices to reiterate James Reilly's point that one image (digital master) will not always serve all purposes. We might want to consider the question, "Which purposes earn the designation of 'preservation quality'?"
C.9 Quality Control
Since digital master creation has been informed both by the "fidelity" and the "presentation" objectives, it follows that quality control procedures also fall into two categories:
D. Discussion Questions
IV. Quality of Derivatives
All of the care and attention given to image quality prior to this stage can be vitiated by poor decisions when creating derivatives from the master images. Were it not for limitations of network bandwidth and the limited choice of file formats supported natively by web browsers, guidelines for this stage of production would be focused exclusively on the goal of producing good (i.e., "pleasing") images. For image presentation, knowledge of audience is critical, as is the control over interface design. In current networked applications, image quality and file transfer speed are in direct competition unless proprietary compression schemes are used. (Some wavelet formats, for example, achieve high compression ratios and can decompress quickly at the client.) For this reason, perhaps we should reflect on whether a universal standard for "open, non-proprietary formats" best meets preservation and access objectives. This principle certainly makes sense for master images stored in reserve in a repository, but does it also make sense for images that are to be used?
This consortial, international conference provides an ideal opportunity to test whether it will be practical, or possible, to develop general guidelines in this area. They may, in the end, be defined by the policies of each holding repository, which account for the needs of their community, and which depend upon the depth of technical and administrative infrastructure.
Setting aside the question of service models and the user interface, we can address many of the same issues of pictorial and functional intent that relate to the digital master. We might want to consider if the rules/recommendations should distinguish between print and on-screen derivatives. How much skew is acceptable? Should images, particularly those scanned from microfilm, be cropped? Is zooming necessary? Should dark, low contrast originals be delivered as "authentic" or as "easy-to-read" copies? Should targets be retained with pictorial images?
A. Issues
B. Technology
The operative rule is that whatever was not done during scanning to create a pleasing image must be done at this stage. In a number of photograph scanning projects, this task requires the judgment of a human observer with a trained eye. (Bancroft Library) Digital photography workflows raise interesting questions about derivatives. If "raw scans" are sent to another workstation for tone and color correction, then saved as high-resolution files, is the second image the digital master or the first derivative? Should this be documented in any way?
Image processing software is widely available, and each software upgrade seems to make it easier to execute scripts for batch production. As is the case with scanning, however, there are significant differences between creating digital images and creating good digital images. Where the scanning objective is to create images worth keeping, it seems that the goal here is to execute a program that will automatically generate images worth distributing.
Questions about parent-child image relationships -- if such a concept is relevant in an image database -- must be answered at this stage of the image capture workflow. Scanners will soon offer the capability of simultaneous output of grayscale and bitonal images. The image processing boards that make it possible to create multiple images during scanning could also be installed at the server to generate multiple images on-the-fly. High-speed, affordable grayscale scanning will create, for example, the technical challenge of making good 1-bit images for printing.
With 1-bit digital masters, the opposite challenge applies: the need to create grayscale derivatives for on-screen display. Programmers at the University of Michigan met this challenge several years ago by writing the (publicly available) TIF2GIF utility that is an important part of the JSTOR and University of Michigan Digital Library infrastructures. In these applications, grayscale GIF images are created on-the-fly "just in time" from 1-bit digital masters. (Price-Wilkin)
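One common way to derive a grayscale image from a 1-bit master is to average blocks of bitonal pixels while reducing resolution. The sketch below illustrates that general idea only; it is not a description of how TIF2GIF works, and the function name and the list-of-rows representation are assumptions made for the example.

```python
def bitonal_to_gray(bitmap, factor):
    """Downsample a 1-bit raster (rows of 0/1, where 1 = black) by `factor`.

    Each factor x factor block becomes one 8-bit gray value
    (0 = black, 255 = white). Partial blocks at the right and
    bottom edges are dropped for simplicity.
    """
    height, width = len(bitmap), len(bitmap[0])
    gray = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            black = sum(
                bitmap[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            # The fraction of white pixels in the block sets the gray level.
            row.append(round(255 * (1 - black / (factor * factor))))
        gray.append(row)
    return gray


page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(bitonal_to_gray(page, 2))
```

The trade is spatial resolution for tonal depth: a 600 dpi bitonal page reduced by a factor of 4 yields a 150 dpi image with up to 17 gray levels, which is often easier to read on screen than a directly subsampled 1-bit image.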
Collaboration between library practitioners and image scientists could be extremely fruitful in working to develop production tools optimized to meet digital image capture specifications. Anyone can purchase Debabelizer, Photoshop, or Image Alchemy, but how many librarians and computer programmers have the skills to apply filters in the correct order in a batch script, or to make use of Photoshop's "predefined mathematical operation known as convolution" to make custom filters?
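For readers unfamiliar with the operation behind custom filters, the following is a minimal sketch of a 3 x 3 kernel applied to a grayscale image. It is not Photoshop's implementation; the function name, the kernels, and the border handling are assumptions chosen to keep the example short.

```python
def convolve3x3(image, kernel, divisor=1):
    """Apply a 3x3 kernel to a grayscale image (list of rows of 0-255 ints).

    Border pixels are copied unchanged and results are clamped to 0-255.
    The kernel is applied without flipping, which gives the same result
    as true convolution for the symmetric kernels used here.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = max(0, min(255, round(acc / divisor)))
    return out


# Illustrative kernels (assumptions, not taken from any product).
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
BLUR = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # use with divisor=9

spot = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(convolve3x3(spot, BLUR, divisor=9))
```

Because each filter changes the pixel values that the next filter sees, blurring and then sharpening does not give the same result as sharpening and then blurring, which is one reason the order of operations in a batch script matters.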
B.1 Need for Production Tools
In this area, I believe we should not settle for guidelines, but work together to develop the tools that can create derivatives with no distinguishable pictorial loss, or derivatives optimized to the presentation of tone and color on a target device (such as a 1.8 gamma 800 x 600 monitor).
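As one illustration of what optimizing tone for a target device might involve, the sketch below re-encodes 8-bit values prepared for one display gamma so that they produce the same relative luminance on another. The 1.8 target comes from the example above; the 2.2 source gamma, the simple power-law display model, and the function name are assumptions.

```python
def regamma(value, source_gamma, target_gamma):
    """Re-encode an 8-bit value so that a display with `target_gamma`
    shows the luminance that `value` would produce on a display with
    `source_gamma`, assuming luminance = (value / 255) ** gamma.
    """
    normalized = value / 255
    return round(255 * normalized ** (source_gamma / target_gamma))


# Lookup table for converting a derivative prepared for an assumed
# 2.2-gamma display to a 1.8-gamma display.
LUT = [regamma(v, 2.2, 1.8) for v in range(256)]
print(LUT[0], LUT[128], LUT[255])
```

A production tool would more likely rely on device profiles than on a single exponent, but the lookup table shows the kind of per-device adjustment that a batch process could apply consistently.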
C. Summary of Practice
Practices vary according to the quality of digital masters and the target audience(s). At one extreme (of production and quality), 90% of the Corbis images are edited manually for tone and color reproduction. (Süsstrunk) At the other end of the spectrum, digital masters are downsampled without enhancements and converted to JPEG or GIF images. Where one expert favors JPEG for illustrations of natural scenes (D'Amato), another favors GIF for scanned photographs. (Puglia) GIF compression is lossless, but its palette is limited to 8 bits (256 colors). The Columbia University guidelines recommend JPEG compression at a "Quality Level of 50." The Library of Congress recommends 15:1 compression for 24-bit images and 10:1 compression for 8-bit images. There are practical limitations to these recommendations, as no software exists to specify a set ratio of JPEG compression. These numbers, therefore, should be viewed as rules of thumb rather than prescriptive guidelines.
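Because the compression ratio is an outcome of encoding rather than a setting, one way to approximate a recommended ratio is to search over the encoder's quality setting. The sketch below assumes a caller-supplied function, here called encoded_size, that returns the compressed size in bytes at a given quality and never shrinks as quality rises; that function, the 1-95 quality range, and the stand-in encoder in the usage lines are all hypothetical.

```python
def quality_for_ratio(encoded_size, raw_bytes, target_ratio, lo=1, hi=95):
    """Return the highest quality setting whose output meets `target_ratio`.

    `encoded_size(q)` is a hypothetical caller-supplied function giving the
    compressed size in bytes at quality q, assumed non-decreasing in q.
    Returns None if even the lowest quality produces too large a file.
    """
    limit = raw_bytes / target_ratio
    if encoded_size(lo) > limit:
        return None
    # Binary search for the largest q with encoded_size(q) <= limit.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if encoded_size(mid) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Stand-in encoder for illustration only: size grows linearly with quality.
fake_encoder = lambda q: 1000 * q
print(quality_for_ratio(fake_encoder, raw_bytes=900_000, target_ratio=15))
```

The quality setting that achieves 15:1 will differ from image to image, which is consistent with treating the published ratios as rules of thumb.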
Target sizes for thumbnail images range from 15-20KB (NARA, Univ. Virginia) to 55KB (CA Heritage); the Library of Congress sets 150 pixels as the maximum image dimension. For full-screen "reference" images, sizes increase to 100-200KB (Univ. of Virginia, Bancroft Library), and to 640 pixels at the Library of Congress. This spectrum could be subdivided further with numbers from other projects, but the question of greatest relevance in interpreting practices and specifications at these and other institutions is this: Are users satisfied with the images?
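A specification expressed as a maximum pixel dimension translates directly into a scaling rule. The following is a minimal sketch; the function name is an assumption, and the 150- and 640-pixel limits are the Library of Congress figures cited above.

```python
def fit_within(width, height, max_dim):
    """Scale (width, height) so the longer side equals max_dim,
    preserving aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3000, 2000, 150))   # thumbnail from a landscape master
print(fit_within(2000, 3000, 640))   # reference image from a portrait master
```

A file-size target such as 15-20KB cannot be computed this way, since the compressed size depends on image content; only pixel dimensions can be fixed in advance.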
How can we answer this important question? The project teams in the Museum Educational Site Licensing Project, which made 9,000 images available for study and teaching in seven universities, conducted user evaluations and found that . . . opinions varied. The following excerpts from their final report send a clear message. The problem with practices to date is that derivative images have been made for computers rather than for people:
For now, most designers of delivery systems select standard image sizes for their derivatives based primarily on dimensions of display devices . . . trying to strive for some balance between dimensions, quality and file size/compression. . . . [T]here is little we can say conclusively about the relationship between image production specifications and user satisfaction. . . . so many variables are involved that it is impossible to draw conclusions or make clear recommendations. . . .
It is unclear whether articulating any absolute guidelines for producing digital images is possible. What is possible is the articulation of a sound project planning framework and guidelines developed for particular types of originals and surrogates, digitized for specific uses and users. (Stephenson, p. 60)
Perhaps the guidelines for this area of digital image capture will be addressed indirectly by the Working Group on Selection. In the process of determining what content should be digitized, it may be possible to specify what the content is supposed to do in electronic form. If derivatives are to be optimized to uses and users, we need some way of finding out what our audiences want.
D. Discussion Questions
V. File Organization
Much has been said and written about digital object identifiers, but relatively little appears on the subject of digital file identifiers. Every file must have a name, of course, and rules about these names can be significant in a project workflow. Image capture guidelines should address the implications of file naming practices for cost and for quality (as it relates to functionality in the database). (Note: the administrative metadata issues which also relate to file names and to file headers are to be addressed in another session at this conference.)
A. Issue
B. Technology
Some scanning drivers offer the capability of programming a file naming scheme, with wildcards, for a designated batch of materials. As noted below, what they do not do is make exceptions for anomalies. The best technology for this task might be no technology at all. It may be worthwhile to explore alternative database models that do not depend upon meaningful file names for navigation.
C. Summary of Practice
With a file naming specification, the first thing a project manager would do is determine whether the names could be generated automatically by the scanning software. If not, then the names have to be changed in a post-scanning operation. If there is a logical order to an object, then the files often have to be named so that they sort into sequential order. In some cases, guidelines also stipulate that page numbers be embedded in the file name. Due to the irregularity of pagination for historic materials, it is impossible to program a file naming utility (such as TiffView Tools) to incorporate this information. Some tools are available to display page images next to a column of file names to make it easier to key the data, but the challenge remains to reduce the manual activity of this operation, which can be considerable.
In some cases, file names also contain feature codes to designate image type, or even image quality. Library of Congress NDLP guidelines include one-letter codes at the end of the file name for selected images: u = archival, r = ref, and t = thumbnail; the code "v" is used in some projects to denote images of very high quality. Other projects embed ownership information in file names, although when this administrative metadata resides in each image, it is more often located in the file header.
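The two naming practices described above, sequence numbers that sort correctly and one-letter feature codes, can be sketched as follows. The name pattern, the item identifier, and the function name are hypothetical and are not the NDLP scheme itself; only the letter codes u, r, and t are taken from the text.

```python
# One-letter codes as described in the text.
FEATURE_CODES = {"archival": "u", "reference": "r", "thumbnail": "t"}


def image_filename(item_id, sequence, kind, extension, digits=4):
    """Build a name such as 'abc0007t.gif' (hypothetical pattern).

    Zero-padding the sequence number makes a plain alphabetical sort
    match the physical order of the images.
    """
    code = FEATURE_CODES[kind]
    return f"{item_id}{sequence:0{digits}d}{code}.{extension}"


names = [image_filename("abc", n, "archival", "tif") for n in (10, 2, 1)]
print(sorted(names))
```

Note that the sequence number records capture order only. Printed page numbers, with their irregularities, would still have to be keyed by hand or recorded elsewhere as structural metadata.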
Structural metadata can be created early or late in the imaging workflow. Data should be evaluated from a number of comparable projects to determine when this activity is most efficient and cost-effective. Until tools can be developed to automate fully some of these procedures, image capture guidelines should mediate between functional considerations and cost effectiveness. In this case, file naming represents both digital image quality and database design quality. It is important to confirm that we are establishing models that will scale and will persist.
D. Discussion Questions
VI. Conclusion
The development of digital imaging guidelines that can serve the varied needs of a range of cultural institutions promises to be a huge undertaking. Keeping them up to date will be an even bigger challenge if the goals are to keep pace with technology and to respond to users' expectations of image databases.
We can accomplish much, however, if we follow the lead of our sponsoring organizations. As noted in the introduction to this joint conference, RLG and NPO "act on behalf of their constituencies to establish uniform best practices and to disseminate widely the results of consensus-based working groups." We can establish consensus-based guidelines, if we focus on best practices, rather than best products. Despite changes in technology -- even the arrival of whatever will replace raster images as the best format for copying and distributing materials in our collections -- we can rely upon a stable approach to managing the process of converting materials into electronic form. Controlling the digital capture workflow does not guarantee that things will go right, but making reasonable efforts to control what we can will put us in the best position to ensure that high production will have the potential to yield high quality.
Selected Resources
*all URLs valid as of 28 February 1999
Handling
"Digitising the Primary Source Material at the English Faculty Library, University of Oxford." <http://info.ox.ac.uk/jtap/reports/digit/>
Library of Congress National Digital Library Program, RFP96-18
for Digital Images from Original Documents, Sections C.4 and C.6.2.
<http://memory.loc.gov/ammem/prpsal/ctoca.html>
Mayer, Manfred. "Scanners," message posted to Conservation DistList, 9 December 1997. <http://palimpsest.stanford.edu/byform/mailing-lists/cdl/1997/1549.html>
Examples of Cradles
ImageAccess Home Page, click on "New Features" hyperlink
beneath the BookEye Scanner.
<http://www.imageaccess.com/>
Minolta PS3000 scanner.
<http://misi.minolta.com/products/Scanners/PS3000.htm>
Also see, Peter Leggate, "Internet Library of Early Journals
Annual Report (August 1997)," scroll to the section, "Scanning
with the Minolta PS3000 face-up book scanner."
<http://www.bodley.ox.ac.uk/ilej/papers/ar1997.htm>
Mintzer, Fred C., et al. "Toward on-line, worldwide access
to Vatican Library materials,"
IBM Journal of Research and Development, Vol. 40, No. 2 - Services,
Applications, and Solutions, 1996, see Figure 3.
<http://www.almaden.ibm.com/journal/rd/mintz/mintzer.html>
Riser, John. "Hand-crafted cherry book cradles."
<http://www.lib.virginia.edu/speccol/scdc/cradle.html>
See also, <http://www.lib.virginia.edu/speccol/scdc/cradle2.html>
Systems: Evaluating Quality
Colet, Linda Serenson, Kate Keller, and Erik Landsberg. Digitizing Photographic Collections: A Case Study at the Museum of Modern Art, NY, Presented at the Electronic Imaging and the Visual Arts Conference, The Louvre Museum: Paris, September 2, 1997. See, especially, the section, "Evaluating Digital Equipment."
Gann, Robert, Ph.D., Judith Goode, and Rod Wadas. Reviewing and Testing Desktop Scanners. Second edition. Hewlett-Packard Company, Learning Products Department, 700 71st Avenue, Greeley, CO 80634, 5963-5550E, January 1995.
PIMA (Photographic & Imaging Manufacturers Association, Inc.) / IT10 Technical Committee on Electronic Still Picture Imaging home page. Follow various links to track progress in developing standards and tools to measure noise, resolution, tone, and color reproduction.
<http://www.pima.net/it10a.htm>
Reilly, James and Franziska Frey. Recommendations for the Evaluation
of Digital Images Produced from Photographic, Micrographic, and Various Paper Formats, May 1996.
<http://memory.loc.gov/ammem/ipirpt.html>
Williams, Don. "What is an MTF ... and Why Should You Care?"
RLG DigiNews, vol. 2, no. 1, February 15, 1998.
<http://www.rlg.org/preserv/diginews/diginews21.html>
Digital Masters: Quality Issues
Each of the following publications presents a clear rationale for the scanning guidelines that have been advocated. In some cases, the authors relate scanning decisions to the characteristics of the source material; in others, to meet baseline output characteristics of digital images, prints, or film.
Agfa Educational Publications. An Introduction to Digital Scanning,
Digital Color Prepress Series, Vol. 4.
<http://www.agfahome.com/publications/dcp4page.html>
D'Amato, Donald, Ph.D., and Rex C. Klopfenstein, Mitretek Systems, Inc. "Requirements and Options for the Digitization of the Illustration Collections of the National Museum of Natural History," March 1996.
<http://www.nmnh.si.edu/cris/techrpts/imagopts/index.html>
Ester, Michael. Digital Image Collections: Issues and Practice. Washington, DC: Commission on Preservation and Access, 1996.
Kenney, Anne R. "Digital-to-Microfilm Conversion: An Interim Preservation Solution," Library Resources & Technical Services 37 (October 1993) pp. 380-401; Erratum 38 (January 1994) pp. 87-95.
Neff, Raymond K., "A New Consortial Model for Building Digital
Libraries," Session #7 Multi-Institutional Cooperation, Scholarly Communication and Technology: Conference Organized by The Andrew W. Mellon Foundation at Emory University April 24-25, 1997. See, "Appendix E: Technical Justification for A Digitization Standard for the Consortium."
<http://www.arl.org/scomm/scat/neff.html>
Puglia, Steven and Barry Roginski. NARA Guidelines for Digitizing
Archival Materials for Electronic Access, January 1998.
<http://www.nara.gov/nara/vision/eap/eapspec.html>
Derivatives: Quality and Production Issues
Bancroft Library, "Digitizing the Collection: Image Capture,"
California Heritage Collection, University of California, Berkeley.
<http://sunsite.berkeley.edu/CalHeritage/image.html>
Fleischhauer, Carl. "Digital Formats for Content Reproductions."
National Digital Library Program, Library of Congress, July 13, 1998. See Section IV, and note descriptions of contrast stretching, diffuse dithering,
and other enhancements.
<http://memory.loc.gov/ammem/formats.html#IV>
Price-Wilkin, John. "Just-in-time Conversion, Just-in-case
Collections: Effectively leveraging rich document formats for the WWW," D-Lib Magazine, May 1997.
<http://www.dlib.org/dlib/may97/michigan/05pricewilkin.html>
Stephenson, Christie and Patricia McClung, eds. Delivering Digital Images, Cultural Heritage Resources for Education: The Museum Educational Site Licensing Project, Volume 1. Los Angeles, California: The Getty Information Institute, 1998.
Süsstrunk, Sabine. "Imaging Production Systems at Corbis Corporation," RLG DigiNews, vol. 2, no. 4 (August 15, 1998).
<http://www.rlg.org/preserv/diginews/diginews2-4.html#technical>
University of Virginia Library Electronic Text Center, "Sample
Scans: The Electronic Archive of Early American Fiction," scroll to "File size and image quality comparisons."
<http://etext.lib.virginia.edu/projects/scantest.html>
File Organization: File Names as Structural Metadata
Library of Congress Internal Documentation, "Turning pages
within in a digital reproduction," May 4, 1998.
<http://memory.loc.gov/ammem/award/docs/page-turning.html>
"Naming and Linking Strategy," RLG Digital Collections
Project: Studies in Scarlet: Marriage, Women, and the Law, 1815-1914.
<http://www.rlg.org/scarlet/name.html>
Preservation Resources, Scanning Questionnaire, 1998. p. 4; quality
control documentation.
<http://www.oclc.org/oclc/presres/scanning/scanquestion.pdf>
<http://www.oclc.org/presres/scanning/qa.htm>
Seaman, David. "Guidelines for SGML Text Mark-up at the
Electronic Text Center."
See section, "Specific Procedures for Adding Image Headers,"
which outlines batch methods to populate file headers with metadata.
<http://etext.lib.virginia.edu/tei/uvatei11.html>
Appendix
Technical specifications alone are not accurate predictors of image quality:
[Figure: two sample images shown side by side. Left: 300 dpi 1-bit TIFF image, Scanner A. Right: 300 dpi 1-bit TIFF image, Scanner B.]
Note: The images above were created by the same service bureau from the same reel of first-generation (camera negative) black and white 35mm microfilm created in 1998 at Harvard University. These examples are provided to demonstrate that standardized measurement tools (a scanning target and associated software) for objective measurements of output image quality would be tremendously useful to assess and compare scanners with similar technical specifications. In this case, the linear array of the CCD of Scanner B was slightly higher than that of Scanner A, yet Scanner A is far more expensive and has more sophisticated software. Even more dramatic quality differences in the reproduction of detail from similar materials have been noted in other Harvard projects where different scanners of the same optical resolution (400 dpi) were used. A target that would permit objective measurements of noise, spatial resolution, and image processing combined would presumably help to make "digital benchmarking" -- currently defined as relating dpi to detail to predict quality -- a more viable technique for determining the minimum resolution needed from a given scanner to meet quality requirements.
References