
As might be expected, newspapers proved particularly troubling, especially
because of their size. However, Ross Coleman identified additional obstacles
stemming from "the variety within any one title, from foxing and discoloration,
to the use of varying fonts and point sizes on the one page." Coleman
also acknowledges that newspapers require quality OCR in order to truly
justify their digitization, since they generally lack even rudimentary indexing.
Unfortunately, marginal print quality and type size variation often thwart
the creation of an accurate body of searchable text, even with current technology.
(For example, ProQuest Historical Newspapers™ reports 80-90% OCR accuracy
for the article text from its New York Times microfilm.)
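A rough calculation suggests why accuracy in this range limits full-text searching. The sketch below assumes, purely for illustration, that the quoted figure is a per-character rate and that errors fall on characters independently. The source does not say how ProQuest measures accuracy, and real OCR errors tend to cluster on damaged areas of a page, so the results are indicative only. The function name `word_accuracy` and the six-letter word length are hypothetical choices.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character of a word is recognized correctly.

    Assumes independent per-character errors, which is a simplification.
    """
    return char_accuracy ** word_length


if __name__ == "__main__":
    # Compare the reported 80-90% range with a hypothetical 99% rate.
    for acc in (0.80, 0.90, 0.99):
        intact = word_accuracy(acc, 6)
        print(f"{acc:.0%} per character: about {intact:.0%} of 6-letter words intact")
```

Under these assumptions, a search term only matches when every character of the word was captured correctly, so even modest per-character error rates can hide a large share of occurrences from a keyword search.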
Despite having successfully completed its initial objectives, ACDP exceeded
its anticipated resource consumption to such a degree that conversion of
additional 19th century periodical titles has been put on hold. Within
the selection of periodicals, newspapers continue to be viewed as especially
daunting targets for digital capture. Coleman reports "the fact that
no more have been done, or even contemplated, highlights the fact that—at
the time—we were not confident in the technology, or our procedures, or
in the effectiveness in delivering such things over the Web in a usable
manner."
So while ACDP has succeeded in greatly expanding access to a corpus documenting
an important slice of Australian history, it has not, as yet, provided the
basis for expanded conversion of other materials from that period.
Conclusion
In revisiting these two projects, we encountered somewhat different perspectives
about the current viability of digitizing, OCRing, and providing Web access
to microfilmed newspapers. One possible explanation for the differing
opinions is the timing of the initiatives. ACDP began just as the Burney
experiment (along with many other early digitization experiments) was wrapping
up. Burney was conceived as more of an experiment, and was carried out
at a time when the technology was clearly not up to the task. It was shelved
until very recently, when the technology seemed like it might finally be
able to tackle the challenge.
On the other hand, ACDP was conceived as a production enterprise, and was
carried to completion despite knocking against technological barriers at
several points. Having only recently completed the mounting of files,
ACDP is still reluctant to take on additional conversion, given the
technological obstacles it encountered.
It is noteworthy, however, that what most distinguishes Ross Coleman's perspective
from that of John Goldfinch has little to do with the technological underpinnings
of the respective projects. Although both speak to the frustrations of digitizing
challenging older materials, the most striking difference is in Coleman's
emphasis on the obstacles created by management issues. Problems faced in
the management arena remain underreported and under-discussed within digital
imaging circles, compared to those in the technical realm. Even as some
(though by no means all) of the technological barriers to effective large-scale
digitization of older printed materials begin to fall, we would be wise
not to downplay the ongoing challenges represented by funding, staffing,
vendor relations, planning, and the like.
Perhaps the ultimate lesson from the experiences described above is that
there is still no such thing as a large-scale, cookie-cutter digitization
project. Despite many successfully completed efforts and improved availability
of training and documentation, the work remains technically complex, time-consuming,
and expensive. Working from marginal source materials introduces additional
complexities, and newspapers continue to push the limits of current digital
capture, image processing, OCR, and Web delivery technologies.
Further reading
In addition to the references already given, here are some useful readings
on recent newspaper digitization efforts:
The ProQuest Historical Newspapers™ project (backfiles of the Christian
Science Monitor, the Wall Street Journal, the New York Times,
the Washington Post and Canadian newspapers digitized by Cold North
Wind ("practically every newspaper published in Canada from 1750 to
1950") with plans to add other national, regional and local publications).
The home page provides links to a slide show about the project. An additional
demo is also available.
OCLC Digital & Preservation Resources and Olivesoft digitization of historic
newspaper collections (an initiative "to help libraries provide full online
searchable access to their historic newspapers"). Read the press release for
this collaboration and read about Olivesoft's ActivePaper Archive™ software.
The Nordic Digital Newspaper Library (Nordic newspapers from 1640-1860).
Read a paper by Majlis Bremer-Laamanen presented at the 2001 Annual Meeting
of the United States Newspaper Program, held at the Library of Congress in
Washington, DC, on April 26, 2001.
Digitisation of Newspaper Clippings: The LAURIN Project by Günter Mühlberger.
RLG DigiNews, v. 3, no. 6, December 15, 1999.
--RE
Footnotes
(1) Erich Kesse, Robert Harrell, Richard Phillips and
Cecilia Botero, Caribbean Newspaper Imaging Project, Phase
I: Imaging and Indexing Model and Phase
II: OCR Gateway to Indexing.
(2) Hazel Podmore, “The Digitisation of Microfilm” in
L. Carpenter, S. Shaw and A. Prescott, eds., Towards the Digital Library
(London, 1998).
(3) Colin Webb and Ross Coleman, Digital conversion of Nineteenth century
publications: Production management in the Australian Cooperative Digitisation
Project 1840-45. LASIE, v. 31, no. 2, June 2000, pp. 5-20. Also available in HTML.
(4) Ibid.
Publishing Information
RLG DigiNews (ISSN 1093-5371) is a newsletter conceived by the members of the Research Libraries Group's PRESERV community. Funded in part by the Council on Library and Information Resources (CLIR) 1998-2000, it is available internationally via the RLG PRESERV Web site (http://www.rlg.org/preserv/). It will be published six times in 2001. Materials contained in RLG DigiNews are subject to copyright and other proprietary rights. Permission is hereby given for the material in RLG DigiNews to be used for research purposes or private study. RLG asks that you observe the following conditions: Please cite the individual author and RLG DigiNews (please cite URL of the article) when using the material; please contact Jennifer Hartzell, RLG Corporate Communications, when citing RLG DigiNews.
Any use other than for research or private study of these materials requires prior written authorization from RLG, Inc. and/or the author of the article.
RLG DigiNews is produced for the Research Libraries Group, Inc. (RLG) by the staff of the Department of Preservation and Conservation, Cornell University Library. Co-Editors, Anne R. Kenney and Nancy Y. McGovern; Production Editor, Barbara Berger Eden; Associate Editor, Robin Dale (RLG); Technical Researchers, Richard Entlich and Peter Botticelli; Technical Coordinator, Carla DeMello.
All links in this issue were confirmed accurate as of February 14, 2002.
Please send your comments and questions to preservation@cornell.edu.