RLG
 Contents of: Volume 10, Number 5 ISSN 1093-5371  
  Feature Article 1: Fedora and the Preservation of University Records Project  
  Feature Article 2: Digging Up Bits of the Past: Hands-on With Obsolescence  
  Highlighted Web Site: Collaborative Digitization Program  
  FAQ: Trial by File: Five Tools for Managing Formats  
  Calendar of Events  
  Announcements  
  Publishing Information  
 Feature Article 1  

Fedora and the Preservation of University Records Project

Authors: Kevin Glick - Yale University (kevin.glick@yale.edu), Eliot Wilczek - Tufts University (eliot.wilczek@tufts.edu), Robert Dockins - Tufts University (robert.dockins@tufts.edu)

Introduction

The Digital Collections and Archives of Tufts University and Manuscripts and Archives of Yale University have recently completed a National Historical Publications and Records Commission (NHPRC) electronic records research grant (grant number 2004-083) entitled “Fedora and the Preservation of University Records.” The Tufts-Yale Project focused on three main areas of research: requirements for trustworthy recordkeeping systems and preservation activities, the ingest of records into a preservation system, and the maintenance of records in a preservation system. The project reports are listed in Figure 1.

The project aimed to combine electronic records preservation research and theory with digital library research and practice. In particular, the Tufts-Yale Project planned on answering the question: Does Fedora have the ability to serve as an electronic records preservation system?[1] Tufts University has been using Fedora as the basis of the Tufts Digital Repository for several years.[2] As it was already strongly invested in developing and managing this repository with an expanding set of services, Tufts was keen on exploring Fedora’s ability to serve as a preservation system for electronic archival records. At the start of this project, Yale had been considering various alternatives for a preservation system, including a Fedora-based solution.

The Tufts-Yale Project focused on university records because each institution has a primary responsibility to preserve these records. However, the findings of this project are not particularly university-specific and are easily applicable to the management and preservation of electronic records in most industries.

Figure 1. Project Reports.

The Tufts-Yale Project framed its efforts within the Reference Model for an Open Archival Information System (OAIS), and the resulting research products can be mapped to that framework.[3] The Ingest Guide describes the Ingest function as well as much of the Administration function, namely Establish Standards and Policies, Audit Submission, and Negotiate Submission Agreement. The Maintain Guide covers the Data Management and Archival Storage functions. The requirements for recordkeeping attempt to guide the activities of a Producer, while the requirements for preservation activities attempt to guide the activities of an Archive and thus represent all the functional areas of the OAIS Reference Model.

The OAIS Reference Model, the requirements, the Ingest and Maintain guides, the resources and services that support the guides, and the implementation of the guides should be viewed as a tightly related set of steps that build on each other. The OAIS Reference Model is the overarching conceptual structure for preservation activities and systems. Beneath OAIS sits a layer of requirement sets for preservation activities or systems, such as the Tufts-Yale Project preservation requirements. These requirements further articulate OAIS by describing the attributes of preservers that fit within the context of the Reference Model. Beneath these requirements are the Ingest Guide and Maintain Guide, which translate requirements into actions for those functional areas of preservation. The Tufts-Yale Project did not develop guides for every functional area of preservation; Access and Preservation Planning, for example, are not covered. Resources and services (ideally standardized and openly available) support the execution of the activities defined in the guides. This interconnectedness reinforces each level, giving context to the frameworks, requirements, guides, resources and services, and implementation decisions, and helping institutions use them intelligently. Institutions will still have implementation decisions to make within the context of the guides, resources, and services: they cannot simply take the guides and call them their procedures.

Fedora

Over the course of our work, the Tufts-Yale project shifted its attention away from assessing whether Fedora could serve as a preservation system to concentrating on developing requirements for recordkeeping and preservation, and creating the Ingest and Maintain guides. We changed our focus when we realized that we were asking the wrong question. In serving as the repository core of a preservation system, a Fedora instance (or instances) would only be one part of an overall preservation environment. Large portions of ingest and access activities in addition to preservation planning decisions would occur outside of the Fedora instance. Even though some preservation policies may be articulated and managed through Fedora, an institution still must formulate these policies—they are not pre-set in Fedora. Rather than an out-of-the-box, limited repository solution, Fedora is a repository architecture upon which an institution can shape a repository in many different ways. Thus, the suitability of Fedora as the basis of a preservation system depends significantly on its implementation.

Furthermore, as of version 2.1 (released February 2006), Fedora operates within the Fedora Service Framework, which provides the architecture for new services that support a Fedora repository instance but are outside and independent of the repository itself.[4] Two such services that currently exist are Directory Ingest and OAI Provider, and members of the Fedora community are currently developing additional open-source services. The Fedora Preservation Working Group is currently investigating and developing services for supporting preservation activities.[5] Which of these external services an institution uses, and how it employs them, will be a significant factor in the suitability of a Fedora-based repository as a preservation system.

The question we should have asked was: “Can a Fedora repository, surrounded by the proper preservation policies, tools, and Fedora services, serve as the basis of a trustworthy preservation system?” Or put another way: “Does the use of a Fedora repository necessarily prevent the development of a trustworthy preservation system?” The Tufts-Yale Project team feels it can answer yes to the first question and no to the second. The Fedora core provides a promising basis for a preservation system. Its agnostic view of file formats and object types enables it to manage essentially any type of file. It has the ability to manage objects with complex—including hierarchical—relationships with its use of RDF or METS metadata. It can manage multiple bitstreams for a single object, which can enable archivists to track and store the original bitstream of a record and the bitstreams of any subsequent transformations. It has versioning and persistent identifier capabilities. With XACML, it can articulate policies that manage access to records and prevent unwanted modifications. Fedora is a transparent system and Fedora objects are articulated in XML (usually FOXML or METS), making it feasible to migrate records out of Fedora.
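The idea of keeping a record's original bitstream alongside every later transformation can be illustrated with a small data model. The sketch below is only an illustration under stated assumptions: the class names, the labels such as "original" and "pdf", and the identifier "demo:1" are hypothetical, and nothing here reproduces Fedora's actual object model or API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Bitstream:
    """One stored byte sequence of a record (hypothetical model, not Fedora's API)."""
    label: str                          # hypothetical labels, e.g. "original" or "pdf"
    mime_type: str
    content: bytes
    derived_from: Optional[str] = None  # label of the source bitstream, if any

    @property
    def sha256(self) -> str:
        """Checksum that later integrity checks could use as a baseline."""
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class RecordObject:
    """A record with a persistent identifier that never discards a bitstream."""
    pid: str
    bitstreams: List[Bitstream] = field(default_factory=list)

    def _find(self, label: str) -> Bitstream:
        for stream in self.bitstreams:
            if stream.label == label:
                return stream
        raise KeyError(f"no bitstream labelled {label!r}")

    def _require_unused(self, label: str) -> None:
        if any(stream.label == label for stream in self.bitstreams):
            raise ValueError(f"bitstream {label!r} is already stored")

    def add_original(self, mime_type: str, content: bytes) -> None:
        """Store the bitstream as received; it can be added only once."""
        self._require_unused("original")
        self.bitstreams.append(Bitstream("original", mime_type, content))

    def add_transformation(self, source: str, label: str,
                           mime_type: str, content: bytes) -> None:
        """Store a derived bitstream and remember which one it came from."""
        self._find(source)            # raises KeyError if the source is unknown
        self._require_unused(label)
        self.bitstreams.append(Bitstream(label, mime_type, content, source))

    def lineage(self, label: str) -> List[str]:
        """Labels from the given bitstream back to the original."""
        chain: List[str] = []
        current: Optional[str] = label
        while current is not None:
            stream = self._find(current)
            chain.append(stream.label)
            current = stream.derived_from
        return chain


record = RecordObject("demo:1")
record.add_original("application/msword", b"bytes as received")
record.add_transformation("original", "pdf", "application/pdf", b"migrated bytes")
print(record.lineage("pdf"))
```

Because transformations are appended rather than written over their source, the original bytes remain available for comparison after any number of migrations.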

There are, however, many Fedora services critical to preservation that require further development—some of which is already underway. For example, the Fedora Preservation Services Working Group is currently developing an alerting service that would support the documentation, encoding, and management of events that impact preservation.[6] The Working Group is also investigating the feasibility of developing a repository history service. Finally, the Fedora community is working towards formalizing content models, which could possibly be used to define record series. More important than the development of any particular preservation-supporting service, an entity such as the Preservation Services Working Group will have to provide the Fedora community with a roadmap of preservation needs and priorities for new Fedora services or ensure that Fedora can work smoothly with existing or future tools, such as those that enable format validation or integrity checks. This will be a challenging task as Fedora moves beyond grant funding in 2007. However, a well-guided and active community should enable Fedora—with its hallmark of flexibility and adaptability—to support the services needed to meet the evolving challenge of preserving electronic records.

Recordkeeping Systems and Preservation Activities

What turned out to be the most beguiling and difficult part of the project was the activity we originally envisioned as the easiest and most straightforward. To ensure records would be created and kept by producers in a form that could be preserved, we began to create a set of requirements for recordkeeping systems by synthesizing ten requirement sets developed during the 1990s and early 2000s into a single set that was appropriate for a university setting. We further intended to undertake a parallel process to develop requirements for preservation systems at universities.

We initially organized our requirements according to Indiana University’s “Requirements for Electronic Records Management Systems (ERMS)” because it was the requirement set most closely associated with universities.[7] However, we soon felt that some of the categories of requirements were concerns that either permeated every aspect of recordkeeping (Audit Trails, Metadata, Documentation) or were the goal of the requirements (Authenticity). This led us to develop our own array of ten requirement categories[8] that organized just over 200 requirements. All of these requirements implicitly assumed nine concerns: Audit, Authorization, Automation, Compliance, Documentation, Financial Sustainability, Metadata, Reporting, and Training. In other words, nearly every activity within a trustworthy recordkeeping system had to be auditable, automated, authorized, documented and reportable; generate appropriate metadata; be undertaken or managed by properly trained personnel in a compliant manner; and be supported by a stable source of funding. We undertook the same process for developing a requirement set for a trustworthy preservation system. We organized just over 140 requirements into seven categories that closely followed the OAIS Reference Model.[9] Both sets of requirements outlined the attributes needed to support a trustworthy recordkeeping and preservation system. A trustworthy system allows a person to presume the authenticity of records managed by the system.

This initial attempt at a set of requirements for recordkeeping and preservation systems generated many problematic issues. First, we struggled to precisely define the elements that compose a recordkeeping or preservation “system,” in part because the existing literature does not agree on terminology. Second, it was difficult to describe the relationship between the recordkeeping and preservation system. We had developed a very life-cycle-centric relationship between two distinct systems, presuming an institution would always move records from a recordkeeping system to a separate preservation system—not always the case in the real world. In addition, we identified preservation requirements for the recordkeeping system that were repeated throughout the preservation system. Finally, we had organized the ten categories in the recordkeeping requirements arbitrarily and did not firmly base the categories on any previous work.  

With the invaluable help of Nancy McGovern of Cornell University [Editor's note: Nancy McGovern is now at the Inter-university Consortium for Political and Social Research (ICPSR).] we resolved these issues and formulated a single document divided into two different chapters, one for recordkeeping system requirements and the other for records preservation requirements. Both sets of requirements are organized and grouped into sections and subsections that correspond to existing intellectual frameworks for recordkeeping and preservation. The recordkeeping chapter is organized into seven sections based loosely on the framework presented in the records management and controls section of ISO 15489-1: Information and documentation – Records management.[10] The records preservation requirements chapter is divided into seven sections and thirty-four subsections, with the requirements grouped loosely according to the functions of the OAIS Reference Model.

Our conception of the difference between a recordkeeping system and preservation activities presumes that a producer will create, acquire, use, and manage records in a recordkeeping system to suit its current business needs, while the central purpose of preservation activities is to preserve records. In a pure records lifecycle model environment, an archive will later ingest some records from a recordkeeping system into a separate preservation system that the archive administers, undertaking preservation activities in this system. In a records continuum model, recordkeeping is a continuous process that does not necessarily move from a recordkeeping system to a separate preservation system administered by completely separate juridical entities. In this model, preservation activities may take place in the recordkeeping system. Many producers and archives operate in a mixed world between these two models. This new specification of requirements should be suitable for any of these situations.

Ingest

The main product of our research into ingest is the Ingest Guide, which describes the actions needed for a trustworthy ingest process. The Guide refers to ingest broadly, defining it as the entire process involved in moving records from a recordkeeping system to a preservation system. This process consists of the Producer and Archive agreeing to and defining what records will be transferred and the manner of the transfer, validation, and transformation, as well as getting the records into the preservation system.

As mentioned earlier, the Ingest Guide covers all of the OAIS Ingest function and the following activities within Administration: Establish Standards and Policies, Audit Submission, and Negotiate Submission Agreement. It builds directly on the work of the Producer-Archive Interface Methodology Abstract Standard (PAIMAS), which was created by CCSDS as part of its continued examination of the ingest process.[11] Composed of four phases, Preliminary, Formal, Transfer, and Validation, the PAIMAS makes a detailed examination of the process for creating a submission agreement in the Preliminary and Formal phases, but gives only a cursory study of transfer and validation activities. In developing the Ingest Guide, the Tufts-Yale Project team found the division of a Preliminary and Formal phase for creating a submission agreement too formal for a university setting. We generated a guide that describes a simplified, more action-oriented negotiate submission process. While the PAIMAS was very succinct, we chose to be much more detailed in our description of the transfer and validation process.

The Ingest Guide contains two main sections. Section A, Negotiate Submission Agreement, details how the producer and the archive create and arrange a submission agreement that defines the terms and conditions of the transfer of records from the producer to the archive and details the scope of the records along with the nature of their validation and transformation. Section B, Transfer and Validation, details the actual transfer, validation, and transformation of records. The Guide is presented in the form of a large flowchart. Every part includes a narrative summary, a flowchart illustrating all of its steps, a description of each step, and a list of resources that each step utilizes and/or produces. The parts that comprise the two sections are listed in Figure 2. 

Figure 2. Ingest Guide.

Although the Ingest Guide is a prescriptive guide for a trustworthy ingest process, it is not a detailed manual of procedures. The Guide describes the actions that must be undertaken to trust the ingest process and prescribes how to undertake these steps at a high level, but it does not prescribe how to proceed in full detail. For example, the Guide calls for an archive to select preservation formats for records it chooses to transform, but it does not dictate what those preservation formats should be.

The Ingest Guide gives archives a roadmap to build a network of well-documented resources, including tools, procedures, and policies, that serve as the foundation for well-documented appraisal decisions and accessioning activities. This documentation is a key element of a successful preservation program. Archives making undocumented decisions based on ad-hoc policies, procedures, and tools cannot hope to successfully preserve records.

The Guide is geared toward enabling archives to ingest records in a semi-automated and scalable manner by helping them regularize and streamline many decision-making steps. We envision that archives could manage many of the resources described in the Guide as machine-readable objects. The more machine-readable resources an archive has, the more it can automate its ingest process. Obviously, expressing resources as machine-readable objects can take a considerable investment of effort. Each archive will have to determine the degree of automation that is appropriate for its operations. However, we feel that the growing scale and complexity of electronic records and data objects that archives will have to preserve will force them to rely on semi-automated, regularized processes to meet the demands of this task.
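To make the notion of a machine-readable resource concrete, the following sketch treats a submission agreement as a small data object and uses it to validate a transfer automatically. It is a minimal illustration under assumptions that do not come from the Ingest Guide: the field names, the use of file extensions to stand in for formats, the SHA-256 manifest, and the sample producer "Registrar" are all hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class SubmissionAgreement:
    """Hypothetical machine-readable form of a submission agreement."""
    producer: str
    accepted_extensions: FrozenSet[str]   # assumption: formats named by extension


def validate_transfer(agreement: SubmissionAgreement,
                      manifest: Dict[str, str],
                      files: Dict[str, bytes]) -> List[str]:
    """Compare transferred files with the producer's manifest and the agreement.

    manifest maps file name to the SHA-256 hex digest the producer declared;
    files maps file name to the bytes actually received.
    Returns a list of problems; an empty list means the transfer validated.
    """
    problems: List[str] = []
    for name in sorted(manifest):
        if name not in files:
            problems.append(f"missing: {name}")
            continue
        if hashlib.sha256(files[name]).hexdigest() != manifest[name]:
            problems.append(f"checksum mismatch: {name}")
        if os.path.splitext(name)[1].lower() not in agreement.accepted_extensions:
            problems.append(f"format not covered by agreement: {name}")
    for name in sorted(set(files) - set(manifest)):
        problems.append(f"not listed in manifest: {name}")
    return problems


agreement = SubmissionAgreement("Registrar", frozenset({".pdf", ".txt"}))
received = {"minutes.pdf": b"meeting minutes"}
declared = {"minutes.pdf": hashlib.sha256(b"meeting minutes").hexdigest()}
print(validate_transfer(agreement, declared, received))
```

Once the agreement exists as data, the same check can run unattended for every transfer, leaving staff to review only the transfers that report problems.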

Maintain

In order for long-term preservation to be possible, archives must keep, store, and protect from harm records under their care; in short, they must maintain these records. This process is roughly equivalent to the Data Management and Archival Storage functions of the OAIS Reference Model and the Maintain Electronic Records process for the InterPARES Project’s Preservation Model.[12] The central thrust of our research on maintain is the Maintain Guide, which describes ten scheduled and twenty irregular event types that may occur during an archive’s maintenance of electronic records. The event types are listed in Figure 3. 


Figure 3. Maintain Guide.

The Maintain Guide does not represent the entire preservation process, but only a core subset of that larger process. The Guide has excluded any management—such as preservation planning and administration—or subject-related decision-making activities from its purview in order to focus on the technical and procedural activities of maintaining data integrity. The Guide describes activities that an automated system or systems administrator or technician can execute without needing the subject or management knowledge of the records to undertake this work. Any maintenance work that rises to the level of administration or preservation planning falls outside of the scope of the Guide.

The Maintain Guide does not present a sequence of event types, but rather a set of event types that are each triggered by different circumstances. Each event type in the Guide describes the nature of the event, the preconditions for the event occurring, and a list of activities an archive must follow in response to an event. Although a prescriptive guide to undertaking maintenance activities, the Maintain Guide, like the Ingest Guide, does not fully prescribe all the details and decisions archives need to make.
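The structure of an event type (its nature, its preconditions, and the activities that follow) lends itself to a simple trigger-based representation. The sketch below is an assumption-laden illustration: the two event names, the state keys, the five-year threshold, and the activity lists are invented for the example and are not taken from the Maintain Guide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EventType:
    """Hypothetical encoding of one maintenance event type."""
    name: str
    scheduled: bool                                  # scheduled vs. irregular
    precondition: Callable[[Dict[str, int]], bool]   # when the event applies
    activities: Tuple[str, ...]                      # required responses, in order


# Illustrative entries only; the Guide's own event types are not reproduced here.
EVENT_TYPES = [
    EventType(
        name="fixity failure",
        scheduled=False,
        precondition=lambda state: state.get("checksum_mismatches", 0) > 0,
        activities=("quarantine affected files",
                    "restore from a verified copy",
                    "document the event"),
    ),
    EventType(
        name="media refresh due",
        scheduled=True,
        precondition=lambda state: state.get("media_age_years", 0) >= 5,
        activities=("copy to new media",
                    "verify checksums",
                    "document the event"),
    ),
]


def respond(state: Dict[str, int]) -> Dict[str, List[str]]:
    """Return the activities for every event type whose precondition holds.

    No ordering between event types is implied: each one is evaluated
    independently against the observed state.
    """
    return {event.name: list(event.activities)
            for event in EVENT_TYPES
            if event.precondition(state)}


print(respond({"checksum_mismatches": 2, "media_age_years": 1}))
```

Evaluating each precondition independently mirrors the point that the event types form a set rather than a sequence.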

The event types are much the same as those managed by any typical information systems (IS) department. However, the nature of the response to these events and the activities specified do not necessarily follow the standard operating procedures of the typical IS department. The requirements inherent in preserving electronic records may force those maintaining electronic records to undertake different and perhaps more expensive activities than most IS departments normally execute.

The Maintain Guide assumes that no archivist or electronic records preservation officer should attempt to maintain electronic records in isolation, particularly in a university setting. The expenses in technology and staff for conducting a trustworthy maintain process are significant. These costs will likely dwarf the normal operating budgets of most archives and will necessitate finding ways to utilize existing resources or sharing expenses across departments or even across institutions. Nearly all archives will have to collaborate with others such as their institutional information systems departments, collaborators at other institutions, or outside vendors.

Conclusion

The products of the Tufts-Yale Project are meant to help bring the electronic records research of the past two decades closer to the daily work of archivists and others charged with the preservation of electronic records and other digital objects. The work of this project does not provide complete solutions that archivists can simply turn into their policies and procedures; rather, it provides detailed frameworks to help archives make better, more systematic decisions about their work.

The three main products of the Tufts-Yale Project (the requirements for recordkeeping systems and preservation activities, the Ingest Guide, and the Maintain Guide) all suggest areas of further work. The requirement sets point to the need for creating new evaluation tools or implementing ones currently under development, such as the RLG-NARA Audit Checklist for Certifying Digital Repositories. The Ingest Guide and Maintain Guide provide a roadmap for institutions engaged in digital preservation to carefully reexamine and possibly reengineer their business processes. Both guides also express the need for numerous machine-readable resources and services, many of which do not yet exist. Considerable community-based work needs to be done to develop these tools. We also hope that members of the archival, digital library, and digital preservation communities can take our work in directions we have not imagined.


1. Fedora is a general purpose repository system developed jointly by Cornell University and the University of Virginia Library. For more information, see <http://www.fedora.info>.

2. For more information, see “TDR: Tufts Digital Repository,” <http://dca.tufts.edu/tdr/index.html>.

3. ISO 14721:2003, Space data and information transfer systems -- Open Archival Information System -- Reference model. Available at <http://public.ccsds.org/publications/archive/650x0b1.pdf>.

7. Indiana University, 2002, “Requirements for Electronic Records Management Systems (ERMS),” <http://www.indiana.edu/~libarch/ER/requirementsforrk.doc>.

8. The ten categories were: Compliance, Creation and Capture, Maintenance, Classification, Retention and Disposition, Protective, Preservation, Use Rights, Discovery and Delivery, and Design and Performance.

9. The seven categories were: Common Services, Ingest, Archival Storage, Data Management, Administration, Preservation Planning, and Access.

10. ISO 15489-1: 2001, Information and documentation – Records management – Part 1: General. During this process we also gave careful consideration to mapping the recordkeeping requirements to Trusted Digital Repositories: Attributes and Responsibilities, but ultimately decided that ISO 15489 was a better fit for our requirements. Organizing the requirements according to ISO 15489 gave us an existing conceptual framework upon which we could shape the requirements. It is our opinion that there is no consensus or preferred framework for recordkeeping system requirements comparable to the OAIS Reference Model framework for preservation requirements.

11. Consultative Committee for Space Data Systems, Producer-Archive Interface Methodology Abstract Standard, CCSDS 651.0-B-1, Blue Book, May 2004 <http://public.ccsds.org/publications/archive/651x0b1.pdf>.

12. “A Model of the Preservation Function,” Appendix 5 of The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project (San Miniato, Italy: Archilab, 2005).


 Feature Article 2  

Digging Up Bits of the Past: Hands-on With Obsolescence

Authors: Richard Entlich - Cornell University (rge1@cornell.edu), Ellie Buckley - Cornell University (elb34@cornell.edu)

Introduction

Over a decade has passed since “Preserving Digital Information,” the seminal CPA/RLG (now CLIR/OCLC) report, sounded the library community’s first major declaration of intent to actively engage the battle against digital decay. Since then a huge body of work on the theoretical and applied aspects of digital preservation has emerged. The authors of these papers have sought examples of loss of significant digital content resulting from physical degradation and/or obsolescence in order to illustrate the nature and severity of the problem. In the analog world, examples of content loss are commonplace and fairly well-known, ranging from spontaneously combusting Hollywood films on cellulose nitrate stock to crumbling printed matter on acidic paper. However, compelling and well-documented examples of digital loss have been relatively scarce, and this has hampered the effort to raise the profile of digital preservation as a problem of immediate import that merits substantial resource allocation and persistent attention by governments, the press, and the general public.

In fact, the paucity of good exemplars, the exposure of some popular anecdotes as apocryphal, and the use of near-loss scenarios as stand-ins for actual loss have led to something of a backlash, with claims that the urgency called for by digital preservation proponents is excessive. For example, in 2003, technology writer Simson Garfinkel, writing in the MIT Technology Review, ridiculed claims of wide-scale endangerment of digital content in a piece entitled “The Myth of Doomed Data.” Garfinkel cites the heroic rescue of the BBC Domesday videodisc project as evidence, not of the need for more rigorous attention to digital preservation issues, but as proof that when the content is valuable enough, a technological fix will be found. He then offers a simple formula for eliminating future problems—use widely supported file formats and avoid file compression schemes.

More recently, in February 2006, Chris Rusbridge, director of the UK Digital Curation Centre, published a provocative article in Ariadne entitled “Excuse Me... Some Digital Preservation Fallacies?” in which he expressed skepticism that truly obsolete commercial software actually exists and issued a challenge for readers to submit bona fide examples of older consumer-oriented commercial software products where the data files are “completely inaccessible” today.

Neither author claimed that digital preservation is a non-issue, and both acknowledged that certain types of obsolescence (e.g., media formats and non-standard file formats) present more significant problems. But both asserted that the sky may not be falling quite as severely or as imminently as often depicted, particularly for commonly used media and file formats.

To an extent, they have a point. The nearly doomed Domesday Project data was published in 1986, just five years after IBM’s introduction of its first microcomputer lent instant credibility to personal computing and catapulted an industry already buzzing with activity into a frenetic era of growth. Technological revolutions are times for experimentation and risk-taking, not stability and standardization. The characteristically intense competition and rapid innovation of such periods present technology users and creators with tremendous opportunity but also substantial risk. Early adopters of new technologies always have to consider the tradeoffs between taking advantage of new approaches and the potential for being saddled with an orphan if the selected technology fails to take hold. Many early optical disk projects faced a fate similar to that of Domesday. As the industry settled down and consolidated, numerous manufacturers of proprietary disc technologies went out of business, leaving early adopters with rapidly obsolete media and hardware.

A similar evolution occurred in the software world. As illustrated in Figure 1, even in the late 1980s there was still a rich assortment of competing software products available, each offering its own proprietary native format and battling to carve out a small slice of market share.


As the Internet evolved and digital content sharing was facilitated via ftp, gopher, WAIS, email attachments, and ultimately the Web, the importance of interoperability increased and use of outlier software became more of a liability. In relatively short order, industry consolidation reduced the number of competitors and today most people would be hard pressed to name three or four word processors across all computing platforms, a far cry from the dozens available for MS-DOS alone fewer than twenty years ago.

Consolidation of formats presents obvious advantages for digital preservation. Having a few dominant products lessens the effort needed to manage a large universe of data files, and the resulting critical mass of users dramatically lessens the likelihood that support for any particular format will completely disappear.

We do not agree, however, that the maturation of the PC market has greatly lessened the threat of digital obsolescence. Consolidation alone doesn’t guarantee stability. Even major corporations can fail or be bought out and have their products disappear. In a span of just four years at the beginning of the 1990s, former market leader WordPerfect saw its flagship word processor fall from a 46% share to 27%, and eventually to single digits today.

Furthermore, having one company’s proprietary format dominate the market doesn’t necessarily guarantee long-term support. Witness Microsoft’s recent commitment to Open XML as the default format for Word. Though this move may bode well for future preservation, its impact on the huge volume of existing files in the proprietary and unpublished “.doc” format is unclear. Finally, although the word processing arena may have settled down and started moving toward standardization and open source products, other genres of digital content are not much farther along today than word processing was twenty years ago and are far less uniform (e.g., 3D and interactive media).

The real bottom line seems to be somewhere between “doom and gloom” and “don’t worry, be happy,” but that covers a lot of territory. One reason for the paucity of examples of digital loss is that most institutions aren’t exactly anxious to publicize their digital stewardship errors. Since we tend to learn far better from our failures than from our successes, we’ll take a moment here to plug the idea of an anonymous reporting system for digital screw-ups (see E-Journal Archiving Metes and Bounds, p. 35.)

Institutions may shy away from publicly airing their dirty laundry, but confidential surveys can be more revealing. Over the past four years, Cornell University Library has conducted more than a dozen Digital Preservation Management Workshops, involving participants from nearly 150 different organizations. In preparation for the workshop, each attendee has been asked to complete an Institutional Readiness Survey, designed as a tool for participants to gauge their institution’s status on a five-stage digital preservation program development scale. One of the questions asked of attendees in the last 6 workshops (2005-2006) was:

Are there any digital materials in your holdings for which you lack the operational and/or technical capacity to mount, read, and access?

Of these respondents, 70% answered yes or don’t know, indicating either the existence of such materials or ignorance of the potential problem. The kinds of content identified ranged from “Unidentified magnetic and optical disks” to “Bootable diskettes but no hardware to boot them” to older tapes, Videodiscs, Lotus 1-2-3 files, older Microsoft Office files, and a wide range of other kinds of obsolete media and file formats.

This told us a great deal about problems at other institutions, but what was the situation at Cornell itself? We conducted an informal survey of Cornell’s library units several years ago and turned up at least a few problems with stacks material, including some holdings of 5.25" floppies in libraries that no longer had the necessary hardware to read them. For the most part, however, the library was paying attention to obsolescence issues and was at least aware of its vulnerabilities. What, we wondered, was the status of digital assets held by individual members of Cornell’s academic community?

The File Format and Media Migration Pilot Service

In an effort to get a better read on the scope and seriousness of digital obsolescence in the unmanaged digital holdings of Cornell’s scholarly community, the Cornell University Library Research and Assessment Services unit established a File Format and Media Migration Pilot Service (FFMM) and advertised that we’d attempt to provide “new life for old digital information” for any faculty members wishing to rescue their obsolete files or media. Our goal in setting up this free service was to get some hands-on experience with obsolescence, and we specifically wanted to:

  • Respond to a perceived faculty need (no one else on campus was offering this service)
  • Gain a better understanding of the scope and nature of the problem
  • Familiarize ourselves with some of the necessary tools and procedures and understand their limitations
  • Add to the profession’s knowledge of the obstacles involved
  • Lay groundwork for assessment of other potentially at-risk digital resources, such as stacks holdings and archives
  • Determine if the problem was serious enough to raise awareness of the threat and serve to encourage faculty to deposit their personal papers in a well-managed central repository

This article describes the establishment and operation of the File Format and Media Migration Pilot Service and our experiences in running the service over the first two years. It is primarily focused on lessons learned rather than technical details. We may address the technical aspects in a later publication.

FFMM Planning Stage


Early planning focused on the scope of the service that we would provide. Unsure of the potential demand, we took precautions to limit the duration and extent of the project. The project was called a pilot service, suggesting that it was not a full-blown and dedicated library service and that it would be operational for a limited duration. Additionally, we decided to reserve the right to limit the number of pieces of media or files that we would take from any one individual. We worried about clients showing up with overflowing shoeboxes of dusty disks. In fact, that is how one of our first jobs, 66 Apple II 5.25" disks, was delivered! (See Figure 2.)

We also needed to determine the scope of media and file formats that we would attempt to handle. We conducted scans to determine if other institutions had similar operations and explored the websites of commercial services. We took a guess at the kinds of media and file formats that would be most prevalent, considered what kind of hardware would be reasonable to get us started, and settled on a base level of capability (detailed below). We took a wait-and-see approach for adding service capacity based on demand and our ability to procure the appropriate hardware and tools.

Early on, the entire Research and Assessment Services team had input into the conceptualization and planning stages of the service. The bulk of the planning and nearly all of the operations of the service were conducted by the authors. We had part-time student help for two semesters and, in one case, had the good fortune of finding a student with a knack for old operating systems that were in use around the time he was born!


Sufficient space to house the service was available in our existing work area. The hardware (as well as overflow parts and supplies) rests on an “L”-shaped desktop in an approximately 8 x 10 foot space (Figures 3 and 4). A locking file cabinet drawer holds clients’ media for active jobs; another drawer holds supplies. There was also a workstation for student use during the semesters we had their assistance.


We wanted to roll the service out with at least a modicum of capability already established so we could process some requests without delay. This turned out to be a wise decision, because most users could supply little information about their materials and thus offered little guidance regarding what resources we would need. Since we had no expectation of offering a truly comprehensive data conversion service, we decided to limit ourselves to relatively mainstream platforms, especially Classic Mac OS (Macs were once widely used on the Cornell campus), MS-DOS, and MS Windows. In terms of media, we suspected that 5.25" floppies would predominate on the PC side (Figure 5), 3.5" floppies on the Mac side (given Apple’s relatively early elimination of floppy drives from its computers), and perhaps certain cartridge media such as Iomega’s Zip and Jaz and SyQuest on both platforms. We decided not to invest in tape drives, given the proliferation of proprietary types and formats and the fact that tape has never been a particularly popular storage medium for end users.

We then had to decide what kind of hardware to purchase in order to support the selected operating system platforms and media drives. On the PC side, we thought the ideal machine (and we wanted to limit ourselves to one, in order to simplify support and minimize space utilization) should accommodate both 5.25" and 3.5" floppies and have room for several other removable media drives, have Ethernet, and run Windows 98 (the last version of Windows with real MS-DOS behind it). It turned out that many PCs from the mid-1990s meet these requirements fairly well. We wound up purchasing a Pentium II tower system with five bays for removable media drives.

Things were a bit more complicated on the Mac side. We suspected we would not be able to meet all our needs with one machine because of the numerous major technological transitions that Apple has made. Nevertheless, we started out looking for a machine with a floppy drive, Ethernet, and the capacity for expansion using both internal and external drives. We again settled on a mid-1990s era machine, a PowerMac 7300 running Mac OS 8.6.

Sources for hardware included the Cornell Library desktop services scrap heap, eBay, a local non-profit computer recycler (an arm of Ithaca’s Sciencenter called Babbage’s Basement), and contributions from staff (especially the authors—both notorious pack rats).

On the software side, we scoured the Internet for useful utilities, including file format converters, foreign disk format readers, and hardware emulators, and purchased and downloaded an assortment for the MS-DOS, Windows, and Classic Mac OS platforms. We located copies of popular software from the 1980s and 1990s, such as WordPerfect and Lotus 1-2-3 for MS-DOS, and WriteNow for Classic Mac OS. We also obtained a small amount of old media so we could test the various drives we had purchased.

Lastly, we developed a simple website and service forms to track client requests and workflow. The website initially consisted of a description of the service and contact information and a simple Web form (name, email address, media and data description fields) for submitting a request or soliciting more information. A more comprehensive description of the service was added later in the form of an FAQ. The following workflow forms were developed: Inquiry Form, Manifest, Migration Form. The Inquiry Form is completed for all requests and includes contact information and details of the materials to be migrated. If the job is accepted, it is noted on this form and given a batch id. Additional fields on this form track the dates that the job was started, returned, and the date of follow up (discussed below). The Manifest is completed when the client’s media is received. This form includes space to document the number of pieces of media that were submitted to the service, a brief description of the migration plan, an estimated completion date, and a short statement addressing privacy and liability. The statement of privacy and liability informs the client that 1) we will need to examine the content of the files, but will take reasonable steps to preserve confidentiality and 2) the disks could be damaged or data could be lost in the process of attempting migration activities (indeed the disks and/or data may be damaged before they come through the door).
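For readers who would rather track such jobs in software than on paper, the three forms can be modeled as simple records. What follows is only a sketch: the field names come from the descriptions above, but the class layout, the `accept` and `log` helpers, and the example values are assumptions made for illustration, not a description of the forms as actually implemented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InquiryForm:
    """Completed for every request, whether or not the job is accepted."""
    client_name: str
    email: str
    media_description: str
    data_description: str
    accepted: bool = False
    batch_id: Optional[str] = None          # assigned only on acceptance
    date_started: Optional[date] = None
    date_returned: Optional[date] = None
    date_followed_up: Optional[date] = None

    def accept(self, batch_id: str) -> None:
        """Note acceptance on the form and attach the batch id."""
        self.accepted = True
        self.batch_id = batch_id


@dataclass
class Manifest:
    """Completed when the client's media is received."""
    batch_id: str
    media_count: int
    migration_plan: str
    estimated_completion: date
    privacy_liability_acknowledged: bool = False


@dataclass
class MigrationForm:
    """Running notes kept while the migration is performed."""
    batch_id: str
    notes: List[str] = field(default_factory=list)

    def log(self, note: str) -> None:
        self.notes.append(note)


# Example values below are invented for illustration.
inquiry = InquiryForm("A. Client", "client@example.edu",
                      "5.25-inch floppy disks", "word processing files")
inquiry.accept("batch-001")
manifest = Manifest(inquiry.batch_id, media_count=12,
                    migration_plan="Copy disks, then convert documents",
                    estimated_completion=date(2005, 1, 31))
```

Keeping the batch id on all three records is what ties a client request to its media and its migration notes.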

FFMM Operations

We publicized the service at launch time (early October 2004) with emails to personal contacts and posts to message lists targeting various people involved in IT areas who might come into contact with faculty with outdated media and files. This was followed by two short articles in Cornell Library electronic newsletters. In January 2006, a second wave of advertising consisted of postcard mailings to all faculty (Figure 6), an announcement on the Library’s home page, and a link on the library’s “Services for Faculty and Instructors” Web page. We are currently set for a third wave with poster-sized announcements ready for distribution to academic departments.

Many clients learned about the service either from our advertising efforts or word-of-mouth. The number of requests seems to spike just after we publicize the service, but requests have also trickled in over the two-year span. The charts in Figure 7 detail the type of requests that we have received to date.


If a request is out of the scope or capability of our service, we make some attempt (often conducting additional research) to provide helpful information and resources. If we can accept the job, arrangements are made to either pick up the media from the faculty member’s office or accommodate faculty who elect to bring their disks to us. The Manifest is completed at this time. Great care is taken to keep the disks safe and together. Removable labels are attached to the boxes or cases holding the media identifying the client and batch id. We use the Migration Form to record notes and to document the migration process. The migration process generally proceeds as follows:

  1. Prepare hard drive with directories for each piece of media.
  2. Label each piece of media with a removable label with client, batch, and disk ids.
  3. Apply write protection measures when possible (e.g., by sliding up the write protection tab on a 3.5" disk or taping the cut out on a 5.25" disk).
  4. Visually inspect the media.
  5. Copy all files onto the hard drive into the folders designated for each piece of media.
  6. Remove write protection and return media to original boxes or cases.
  7. For media migration only: transfer files to a machine equipped with a CD writer (if needed) and burn to a CD.
  8. For file format migration: duplicate the files on the hard drive to create a working copy. We made every attempt to put a potentially fragile piece of media into a drive only once. All format migration activities were performed using the working files.
  9. Examine/diagnose files to identify formats and develop migration strategies.
  10. Perform migration.
  11. Transfer files (both the untouched original and the converted files) to a machine equipped with a CD writer (if needed) and burn to a CD.
  12. Return media and migrated files to client.
  13. Contact client 1-2 weeks later to ensure the client could open the files and to solicit feedback on the success of the migration. (Formatting errors are likely to occur during migration, and we were, at times, ill-equipped to gauge how closely the converted file resembled the original.)
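The file-handling parts of this procedure (steps 1, 5, and 8) lend themselves to a short script. The sketch below is a hedged illustration rather than the tooling we used: the directory layout, the function names, and the assumption that the media appears as an ordinary mounted directory are all hypothetical. A plain copy of this kind also ignores platform-specific metadata such as Classic Mac OS resource forks.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def prepare_batch(root: Path, client: str, batch: str,
                  disk_ids: List[str]) -> Dict[str, Path]:
    """Step 1: create one 'original' directory per piece of media."""
    dirs = {}
    for disk_id in disk_ids:
        target = root / client / batch / disk_id / "original"
        target.mkdir(parents=True, exist_ok=True)
        dirs[disk_id] = target
    return dirs


def copy_media(mount_point: Path, original_dir: Path) -> int:
    """Step 5: copy everything off the media in a single pass.

    Returns the number of files now held in the target directory.
    """
    shutil.copytree(mount_point, original_dir, dirs_exist_ok=True)
    return sum(1 for p in original_dir.rglob("*") if p.is_file())


def make_working_copy(original_dir: Path) -> Path:
    """Step 8: duplicate the originals; conversions touch only the copy.

    Raises FileExistsError if a working copy already exists, so an
    earlier round of conversion work is never silently overwritten.
    """
    working = original_dir.parent / "working"
    shutil.copytree(original_dir, working)
    return working
```

Separating "original" from "working" directories mirrors the rule of reading each fragile disk only once: every later experiment runs against files already on the hard drive.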

Performing Migrations

The two simple words of step number 10, “Perform Migration,” belie the amount of time, iterative work, problem-solving, technical expertise, hardware modification, software searching, and, sometimes, luck involved in completing this step. Though our approach is, in theory, fairly systematic and linear (prompt client for information, inspect media and files for clues, perform background research, apply tools as appropriate), the end result often looks more like trial and error. We discuss here a few important issues and then present a brief series of case studies to help illustrate some of the more significant obstacles. We encourage the reader wishing for more specific technical details to contact us directly.

Lack of documentation presents maintenance obstacles for all kinds of digital objects, but it is particularly troublesome for legacy materials, since they may embody obscure and obsolete content for which no independent source of documentation is readily available. After all, Web-based support documents did not begin to be heavily used by commercial hardware and software firms until the mid-1990s, and the Web now tends to be sparsely populated with information about products from earlier eras.

Unfortunately, when a user winds up in a situation where they no longer have the necessary hardware or software to render their old data files, it usually means the media containing the files were put aside and ignored for an extended period of time. More often than not, this means that the information we can gather is limited to the type of computer the files were created on (Mac or PC) and a general sense of the time period when they were being actively used. Details such as specific operating system versions, name and (particularly) version of applications software, disk capacity, and file systems are frequently lacking.

In many cases, the lack of adequate documentation posed a significant obstacle to efficient rescue of the client’s files. Assuming the media can be read, file extensions (in MS-DOS and MS Windows) and type and creator codes (in Mac OS) provided clues to the origin of files, though often without a sufficient level of specificity to identify the correct version. The commercial file conversion utilities could occasionally identify mysterious file types, but they often failed on obscure files of unknown origin. Even more daunting, however, were cases where just getting to the point of being able to read a file directory was the major obstacle. This occasionally occurred even when the media in question appeared common and recognizable.
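To make the limits of such clues concrete, here is a minimal sketch that compares what a file's extension suggests with what its first bytes suggest. The signature and extension tables are a tiny illustrative sample, not a registry, and the function name is hypothetical. It does not attempt to read Mac OS type and creator codes, which are stored outside the file's data.

```python
from pathlib import Path
from typing import Dict, Optional

# Illustrative sample only; real identification tools use far larger tables.
SIGNATURES = [
    (b"\xffWPC", "WordPerfect document (5.x or later)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE2 compound file (used by Word 97-2003 .doc, among others)"),
    (b"%PDF", "PDF"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"\xff\xd8\xff", "JPEG"),
]

EXTENSION_HINTS = {
    ".wk1": "Lotus 1-2-3 worksheet",
    ".wpd": "WordPerfect document",
    ".doc": "word processing document (application and version unknown)",
}


def guess_format(path: Path) -> Dict[str, Optional[str]]:
    """Return the format suggested by the leading bytes and by the extension."""
    with path.open("rb") as fh:
        header = fh.read(16)
    by_signature = next(
        (name for magic, name in SIGNATURES if header.startswith(magic)), None
    )
    by_extension = EXTENSION_HINTS.get(path.suffix.lower())
    return {"signature": by_signature, "extension": by_extension}
```

When the two answers disagree, or when both come back empty, the file falls into the "mysterious" category described above and needs manual investigation.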

Case Studies

The following case studies offer specifics on problems encountered over the past 18 months.

Case study #1

Main content: A variety of files on Macintosh floppy disks.

Main obstacle: Most of the disks were not readable in our machine.

Resolution: The disks had been created using MFS (Macintosh File System), which was used by Apple for less than two years before being superseded. Apple’s support for MFS in Mac OS was dropped as of system 7.6, about 10 years after the successor to MFS was introduced. We needed to use an earlier version of the OS in order to successfully read these disks.

Lessons learned: Causes of obsolescence can be obscure. Attaching relatively detailed technical metadata to today’s digital objects can be key to unlocking the content in the future.


Case study #2

Main content: Word processing files on Macintosh floppy disks.

Main obstacle: The disks were readable, but most of the data files could not be properly rendered. The problem files had been created by Microsoft Word v.1.00, using a file format that was abandoned by Microsoft in later versions of Word. Luckily, there was a copy of the software on one of the client’s disks; however, the application would not run under even the oldest version of Mac OS we were able to install on our machine.

Resolution: We decided against setting up another and very old Macintosh capable of running an early enough version of Mac OS. Instead we installed an open source Mac Plus emulator and an early version of Mac OS (within the emulator) on our existing machine. This allowed us to run the old version of Word and render the files.

Lessons learned: 1) Proper rendering is a necessary first step in rescuing obsolete digital content, but may not be sufficient. Though we could render the files reasonably well inside the emulator, we lacked a good migration path because the only export format supported by Microsoft Word v.1.00 was plain text. Saving the files as plain text meant losing all character formatting and specialized position formatting. 2) Obsolescence problems can be multi-layered. Once we were able to run the original software and open the files, we found that we still couldn’t properly render some files due to the absence of necessary fonts.


Case study #3

Main content: Apple II 5.25" floppy disks using ProDOS and Apple DOS 3.3.

Main obstacle: Lack of hardware for handling the media format and operating system.

Resolution: Initially, we obtained use of a fairly complete Apple II setup off campus. This solution became unsatisfactory because of the time it would have required to copy the disks and travel to the off campus site. Ultimately, we purchased a used Apple IIe card for the Macintosh and installed it in an older Mac. We also bought (from eBay) a pair of external 5.25" floppy disk drives that work with the Apple IIe card.

Lessons learned: Ideal solutions using original technology can be hard to find and time-consuming to utilize. We had access to the off-campus Apple II on a very limited basis, and the migration process was extremely slow, averaging 15 minutes per 160 KB disk side to transfer the files (there were 66 disks, including many with content on both sides). Bridge technologies such as the Apple IIe card for the Macintosh can be easier to deal with, but usually involve compromises. The transfers with the IIe card went much faster, but introduced additional compatibility problems.
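The figures quoted for the off-campus Apple II imply a substantial time commitment. The short calculation below bounds it, assuming (since the exact number of double-sided disks is not given) that the total lies somewhere between every disk using one side and every disk using both.

```python
MINUTES_PER_SIDE = 15   # average transfer time per disk side, from the text
DISK_COUNT = 66         # disks in the batch, from the text


def transfer_hours(disks: int, sides_per_disk: int,
                   minutes_per_side: int = MINUTES_PER_SIDE) -> float:
    """Total transfer time in hours for a batch of disks."""
    return disks * sides_per_disk * minutes_per_side / 60


lower_bound = transfer_hours(DISK_COUNT, 1)   # every disk single-sided
upper_bound = transfer_hours(DISK_COUNT, 2)   # every disk double-sided
print(f"Between {lower_bound} and {upper_bound} hours of transfer time")
```

That works out to somewhere between 16.5 and 33 hours of machine time for a single client, before any travel or format conversion.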


Case study #4

Main content: Single-sided type 2 DVD-RAM.

Main obstacle: Lack of necessary hardware for handling the media.

Resolution: None. After doing considerable research, we determined that there was still a single DVD reader being manufactured that could read this early format. We supplied the client with information about possible sources for both a new drive, and for older drives that could read the format, but we decided against investing in such a drive ourselves.

Lessons learned: Obsolescence can develop rapidly. The DVD in question was only five years old.


Case study #5

Main content: Word processing and graphics files on Macintosh floppy disks.

Main obstacle: The files had been produced by a variety of applications and there were a large number of files in multiple directories and sub-directories. Although we had the ability to read the disks and most of the files, conversion was slow and tedious, with no obvious means for automation. Again, just copying the disks took considerable time: about 1.5 hours for 37 disks. Even using a straightforward batch conversion utility, the conversion of the files on these disks took over 3 hours.

Resolution: We were able to migrate most of the files, but the work was time-consuming.

Lessons learned: Even when there are no significant technical obstacles, migrating obsolete content can be expensive because of the staff time required for manual operations.



Discussion and Conclusions

Our experiences over the course of two years and with more than two dozen inquiries from faculty members lead us to a number of conclusions about the level of threat posed by technical obsolescence and the best means to tackle the problem. First and foremost, perhaps, is that the issue is real, and though there may be a shortage of verifiable examples of high profile losses at the institutional level, there is no shortage of everyday examples within the academy. It may take some effort to get content owners to dig up their digital skeletons, but they’re out there.

We consciously designed a service that we knew would not be comprehensive. Not only did we have a limited capacity to deal with obsolete media, but we also excluded from our service offerings recovery of data from damaged or corrupt media. Interestingly, in the course of migrating data from many hundreds of 20+ year-old floppy disks, we encountered comparatively few examples of defective media (less than 5%). Obsolescence and inadequate documentation were far more significant threats to data longevity than media failure, lending support to the contention that long-life media is a poor investment and offers only minor advantages, except perhaps in environments where proper storage conditions cannot be maintained.

Technological solutions to legacy obsolescence are available, but depending on the age and nature of the content, the solutions can be hard to discover, complex to implement, and far from ideal in outcome. Though many dedicated data conversion and data recovery firms exist, their services can be prohibitively expensive, especially for individuals. Furthermore, the more time that has passed since the content became functionally obsolete, the more difficult it is to work with, the fewer the options for recovery, and the less satisfactory the results are likely to be. Delay in dealing with obsolescent or obsolete material almost always exacerbates the problem.

Establishing the capacity to deal with obsolete data and media in-house can be a significant undertaking. In particular, it is essential to have staff with substantial knowledge of (and experience using) legacy hardware and software. Existing IT staff are unlikely to want to have anything to do with maintaining obsolete equipment, so that burden will fall on whatever department is offering the service. Though much of the equipment needed for a basic file format and media migration service can be obtained inexpensively, multiple sources will probably be needed and setup can take months. However, the primary expense, by far, is staff time.

Nevertheless, there are potential rewards, at least in the short-term, for offering such a service in an academic library environment. Such a service is appreciated by faculty and extends the profile of the library in a new direction. As one faculty member wrote: “Many thanks to you and your colleagues for this amazing help. I had pretty much given up hope of using these old files again.”

If sufficiently well-publicized, such a service can help ascertain the depth of the legacy obsolescence problem for faculty holdings. At Cornell, we weren’t flooded with requests (perhaps concomitant with our somewhat intentionally limited efforts to publicize the service), but received a steady trickle that amounted to about one request per month, with requests continuing to come in even after we stopped publicizing the service.

Ultimately, we do not endorse relying on technology preservation as a long-term solution to technological obsolescence. There are far too many obstacles. (We remain skeptical of predictions such as that in a November 2004 New York Times article, published at the same time we launched the FFMM service, of “a whole industry of people who will ... have every ancient computer available.”) Over time, old equipment becomes more and more difficult to obtain and maintain. Even specialized tools such as dedicated file format utilities become obsolete over time, and their support for older formats is dropped as demand lessens. Another significant obstacle is the gradual loss of documentation and the specialized knowledge of how to operate obsolete equipment.

We do think that technology preservation provides a useful stopgap and short-term solution for improperly curated legacy data and media. There will continue to be a demand for conversion services, though at any time, such services are unlikely to be able to work with materials more than two to three decades old. Formats and media that are least standardized today are the most likely to be the prime targets of recovery services in years to come.

What are the long-term solutions? Our experience suggests that the most productive outcome of a pilot project like FFMM may be the opportunity to create a teachable moment. One of the lessons we take from two years of offering this service is that leaving data curation in the hands of individuals is highly problematic. Despite best intentions, most people simply don’t make a sufficiently high priority of employing sound digital preservation strategies, including creation of sufficient metadata, routine auditing of existing content for corruption, planned migration of file formats and media, creation of off-site backup copies, and proper storage and handling regimens.
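Of the practices listed, routine auditing for corruption is perhaps the easiest to automate. The sketch below records a checksum for every file under a directory and later reports files that have gone missing or changed. It is an illustration under stated assumptions, not part of the FFMM service: the function names are hypothetical, and SHA-256 is chosen here simply because it is available in Python's standard library.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare current contents with a stored manifest.

    Returns (relative path, problem) pairs, where the problem is
    'missing' or 'changed'. An empty list means nothing has drifted.
    """
    current = build_manifest(root)
    problems = []
    for rel_path, digest in manifest.items():
        if rel_path not in current:
            problems.append((rel_path, "missing"))
        elif current[rel_path] != digest:
            problems.append((rel_path, "changed"))
    return problems
```

A check like this detects corruption but cannot repair it, which is why it belongs alongside the off-site backup copies mentioned above rather than in place of them.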

We believe a superior alternative is to establish institutional repositories in which faculty are encouraged to deposit their work. This enhances the survivability of digital content by putting it into a well-managed, centralized environment where it can be subject to state-of-the-art technological and organizational digital preservation techniques. It also has the potential for more efficient resource utilization due to economies of scale and avoidance of duplicated effort.

Finally, our experience in FFMM leads us to conclude that regular and timely migration offers the best prospect for long-term preservation of digital objects, at least for those stored in proprietary formats with closed specifications. The approach of multiple migrations over time has been criticized for encouraging the propagation and accumulation of errors and for failure to continue to reference the original, authentic digital object. However, our experience suggests that delaying migration of such formats significantly reduces the likelihood that the converted file will retain an acceptable level of fidelity. Ultimately this may be seen as yet another argument for avoiding use of non-standard file formats altogether. But in cases where the files already exist and the choice is between migrating now and hoping for a better solution to emerge later, prompt action using the manufacturer’s own migration tools will almost certainly produce better results (at least, until the device in Figure 8 becomes a reality).


References

Darlington, Jeffrey, Andy Finney and Adrian Pearce, “Domesday Redux: The rescue of the BBC Domesday Project videodiscs,” Ariadne, Issue 36, July 30, 2003. Available at http://www.ariadne.ac.uk/issue36/tna/.

Garfinkel, Simson, “The Myth of Doomed Data,” Technology Review, December 3, 2003. Available at http://www.technologyreview.com/read_article.aspx?id=13419&ch=infotech.

Hafner, Katie, “Digital Memories, Piling Up, May Prove Fleeting,” The New York Times, Wednesday November 10, 2004, Late Edition - Final, Section A, Page 1, Column 3. Available at http://www.nytimes.com/2004/11/10/technology/10archive.html?ex=1257742800&en=e07d6277aca5e520&ei=5088.

Kenney, Anne R. and Ellie Buckley, “Developing Digital Preservation Programs: the Cornell Survey of Institutional Readiness, 2003-2005,” RLG DigiNews, v.9, no. 4 (August 15, 2005). Available at http://www.rlg.org/en/page.php?Page_ID=20744#article0.

Mellor, Phil, Paul Wheatley, and Derek M. Sergeant, “Migration on Request, a Practical Technique for Preservation,” Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries, p.516-526, September 16-18, 2002. Available at http://www.si.umich.edu/CAMILEON/reports/migreq.pdf.

Rusbridge, Chris, “Excuse Me... Some Digital Preservation Fallacies?” Ariadne, Issue 46, February 28, 2006. Available at http://www.ariadne.ac.uk/issue46/rusbridge/intro.html.

University Museum of the University of Amsterdam. Computer Museum. “Troubles in computer conservation: some examples.” Available at http://www.science.uva.nl/museum/rampspoed.html.

Waters, Donald, and John Garrett, eds. “Preserving Digital Information: Report of the Task Force on Archiving of Digital Information,” The Commission on Preservation and Access and the Research Libraries Group. 1996. Available at http://www.rlg.org/en/page.php?Page_ID=20442.


 Highlighted Web Site  

Collaborative Digitization Program




The Collaborative Digitization Program (CDP) evolved from the Colorado Digitization Project and serves primarily the western US states, but its resources benefit the larger cultural community as well. Its mission is:  “To achieve high quality digital access to cultural heritage collections. To provide resources and training to create digital surrogates of primary source collections.”

In addition to hosting digital collections, the CDP website features many resources and training opportunities targeted to both managers and consumers (specifically educators) of digitized collections:

  • The Digital Toolbox includes CDP resources on best practices (e.g., for digital imaging and digital audio); links to CDP training workshops (e.g., “Digitization and Museums”); and a mini-tutorial on project management for digitization projects.

  • The Teacher Toolbox puts digitized collections at educators’ fingertips and is chock full of lesson plans “created by teachers for teachers,” as well as a mini-tutorial for helping educators to use primary sources, especially those in digital formats.

The organization has also made publicly available many policy documents, reports, papers, and presentations from CDP projects and working groups that may be useful examples to others involved in creating, managing, and marketing digitization projects.


 FAQ  

Trial by File: Five Tools for Managing Formats

Author: Brian Franklin - Cornell University 2006 Summer Intern (bri1976@gmail.com)

We hear a lot about a few digital preservation tools (JHOVE, DROID, XENA) for managing file formats. Are there any other tools that merit attention?

FAQ Editor’s Note: Brian Franklin contributed this issue’s FAQ. In each of the past two years, the Department of Research and Assessment Services at Cornell University Library, as a component of its Digital Preservation Management Workshop, has awarded summer distance internships in digital preservation to two graduate students in library and information science. Brian Franklin, at the time an MS candidate in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, was selected for one of the 2006 internships (he has since completed his degree). Brian conducted an inventory of digital preservation tools and submitted his final report in August 2006. That report included detailed profiles of nearly ten distinct tools, as well as less formal discussions of about a dozen others.

For this FAQ, we are providing edited excerpts from Brian’s report that focus on five lesser-known tools for managing file formats, including two format identifiers (The National Library of New Zealand Metadata Extraction Tool and TrID) and three format conversion tools (the IBM Digital Asset Preservation Tool, LuraDocument PDF Compressor Desktop v.4, and the TOM (Typed Object Module) Conversion Service).

In addition to tools for which Brian did extensive profiling and testing, there were several for which he had time only for more limited exploration. One of them, TrID, is introduced below. We’ve seen little mention of TrID except in discussions by the Format Exchange, a project of the Long Now Foundation that has also largely stayed beneath the digital preservation community’s radar. However, unlike the Format Exchange, TrID is showing signs of recent growth and development.

Brian’s work was supported by grant funds from the National Endowment for the Humanities, and supervised by Richard Entlich, Digital Projects Librarian in the Department of Research and Assessment Services at Cornell University Library.



TOOL PROFILES

1. Digital Asset Preservation Tool

Name of developer(s): Developed at IBM by a team led by Raymond Lorie of IBM Almaden and Raymond van Diessen of IBM BCS in the Netherlands.

Date development began and/or release date: 2001

Version #(s) of release tested:

  • UVC - Version 1.3
  • Image LDV Viewer - Version 1.3
  • General LDV Viewer - Version 1.3
  • UVC Application Runner - Version 1.0
  • JPEG decoder - Version 1.3
  • GIF 87a decoder - Version 1.3
  • Factorial program - Version 1.0

Home page for tool or project: http://www.alphaworks.ibm.com/tech/uvc

URL for download of the code:
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=AW-0IK

Open source or commercial: Open source

Nature of version tested: Digital Asset Preservation Tool is a “proof-of-concept” demonstration of the Universal Virtual Computer (UVC) solution.

Source language and operating system platform used: The Digital Asset Preservation Tool was implemented in Java and tested under Windows XP.

Main function or category of tool: According to the project home page, “A UVC program can be written to decode a specific file format. In this demo, the JPEG (JFIF 1.02) and GIF87a formats have been selected. So that this solution can be used for the rendering of the JPEG and GIF87a format decoders, a translator (Decoder) has been developed that runs on the UVC and translates the specific image files into a Logical Data View (LDV). The LDV is a structured description of a digital object, generated according to a specific schema (the Logical Data Schema or LDS). If somebody in the future (in ten, fifty, or even a hundred years) wants to view a 2003 jpeg object, a UVC emulator can be written that runs the jpeg Decoder, which generates an LDV. Because the Logical Data Schema is also preserved, future programmers are able to understand the LDV and develop a viewer.”

The project website provides a diagram of the proposed functionality of the tool. A discussion of the UVC approach can be found in a past issue of RLG DigiNews.

Operational details: Handles the JPEG (JFIF 1.02) and GIF87a formats
 
Supports batch functionality? No.

Command line, GUI, or both? GUI.

Nature of download: Zip. Executable JAR files.

Prerequisites for installation and/or use: Java Runtime Environment (JDK1.4.1 or later).

Modularity: Again, from the project home page, “The UVC is carefully described and can be interpreted to develop a UVC emulator for any given platform. Once a UVC emulator has been developed, programs that have been written for this UVC can be executed. Because the UVC can be applied to any computer, these UVC programs become technology-independent.”

OAIS functional entity group(s) and task(s) this tool addresses: The UVC approach is designed to avoid the need for periodic migration of files. Instead, the UVC emulator itself is migrated and allows file viewers written to the correct specification to properly render the file in a future environment. A decision to utilize UVC would be made in Preservation Planning and incorporated in policies in Administration, where it might be applied only to some file formats. For those formats, the UVC along with appropriate viewers would be actively deployed within Access, in order to generate a DIP compatible with the environment at the time of access.

Information about future development plans: All of the information available via the project homepage is dated, with the most current accompanying information (via the FAQ section) being from 2002. Because this is a proof-of-concept demonstration, I assume that no further development of the demo itself is planned. A relevant question for future development is whether there are any plans to extend this technology to other popular formats, such as TIFF and PDF.

Difficulty level for downloading: LOW The links for downloading the tool provided on the project page are straightforward, but user registration is required.

Difficulty level for installation: MEDIUM The zip file containing the executable files/folders will need to be extracted. User should be familiar with working with BAT and JAR files.

Difficulty level for configuration: LOW No manual configuration of installed files.

Difficulty level for use: MEDIUM When using both the General and Image LDV Viewers, the user should know the basic differences between a JPEG and a GIF. Also, there really is no manual that describes the relationship and/or functionality between the various tools in the package (the decoders and viewers). To figure this out, users would have to experiment by trial and error to some extent.

Level of development: I didn’t have any trouble operating the LDV viewers, but there is the potential for a complete novice to get lost. If anything, I would recommend that each viewer come with an accompanying help manual that could be easily identified/accessed within the viewer (my instinct was to go to the “About” menu, but the only information provided there is about the version number of the tool and basic intellectual property data).

Documentation: The documentation is minimal and there is no associated documentation with the download that explains the installation or purpose of the tool in user-friendly terms. The only document that accompanies the download is a limited installation “Read-Me” manual. A user could consult the project Web page, but the information provided there is highly technical and not easy to grasp.

Errors, bugs, anomalies, usage problems encountered: There was a pesky error that would occasionally manifest itself in attempts to process a jpeg file. There was no obvious difference between the jpeg images that generated the error and those that I successfully processed. The error message that I received was: “Error while executing instruction. java.lang.NullPointerException.” The cause may be either a bug in the program, or a flaw in the Java environment.

How well the tool meets its stated objectives: The program deconstructs each pixel into its RGB values and assigns each pixel an ID. The concept behind this approach for preserving popular image formats seems sound, but before this tool could be realistically implemented in a production environment there would have to be significant improvement in the efficiency of processing an image into an LDV. It took over two hours to convert a 737 KB jpeg into an LDV. The enormous size of the resulting XML file is also a drawback: this same jpeg produced an output file of over 350 MB.
 
Scalability: As noted above, a production version of the Digital Asset Preservation Tool would have to be substantially faster and produce more compact output files in order to be practical for deployment in a heavily used repository.

Results/Observations of any performance testing: These have been discussed above. As an additional example, a 4 KB jpeg image that I processed took about ten minutes to convert and the resulting XML file was 3.5 MB.
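The size figures reported above can be put in perspective with a quick calculation. The Python sketch below computes the expansion factor for the two test files and then illustrates why a one-element-per-pixel XML representation grows so quickly. It is only an illustration: the element and attribute names (pixel, id, r, g, b) are hypothetical and do not reproduce the actual Logical Data Schema, and 1 MB is assumed to equal 1024 KB.

```python
# Expansion factors for the two test conversions reported above.
# Assumption: 1 MB = 1024 KB.
def expansion_factor(input_kb, output_mb):
    return (output_mb * 1024) / input_kb

large = expansion_factor(737, 350)   # 737 KB jpeg -> LDV reported as "over 350 MB"
small = expansion_factor(4, 3.5)     # 4 KB jpeg -> 3.5 MB LDV

# Hypothetical per-pixel XML, loosely modeled on the description of the LDV.
# The element and attribute names are invented for illustration only.
def pixels_to_xml(pixels):
    lines = ["<image>"]
    for pixel_id, (r, g, b) in enumerate(pixels):
        lines.append(f'  <pixel id="{pixel_id}" r="{r}" g="{g}" b="{b}"/>')
    lines.append("</image>")
    return "\n".join(lines)

# 100 uncompressed RGB pixels occupy 300 bytes as raw data.
pixels = [(255, 128, 0)] * 100
xml_text = pixels_to_xml(pixels)
raw_bytes = len(pixels) * 3
xml_bytes = len(xml_text.encode("utf-8"))

print(f"737 KB jpeg: about {large:.0f}x larger as an LDV")
print(f"4 KB jpeg: about {small:.0f}x larger as an LDV")
print(f"100 pixels: {raw_bytes} raw bytes vs {xml_bytes} bytes of XML")
```

Because a jpeg is itself compressed, the growth relative to the original file is larger still than the growth relative to raw pixel data, which is consistent with the several-hundred-fold expansion observed in testing.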

Overall level of technical competence needed to download, install, configure, and use this tool: MEDIUM


2. LuraDocument PDF Compressor Desktop v.4

Name of developer(s): LuraTech Imaging GmbH
 
Date development began and/or release date: 2003-2006

Version # of release tested: 4.2.02.15

Version # (date) of current release: 2006

Home page for tool or project:
http://www.luratech.com/products/luradocument/pdf/compressor/index.jsp

URL for download of the code:

http://www.luratech.com/download/files/PDF_CompressorDT_win.exe

Open source or commercial: Commercial.

Nature of version tested: Trial.

Source language and operating system platform used: A Windows application. Windows XP was used for testing.

Cost: The trial version is freely downloadable. The full version is $348.

Limitations, if trial or demo: The trial version marks all output pages with a Luratech logo, resulting in a 5 KB larger output size and a slightly slower page display in Acrobat Reader compared to the licensed version.

Main function or category of tool: According to the documentation within the accompanying manual, “LuraDocument is a document compression procedure which preserves text legibility together with high visual and color quality.” The main interest in this tool relative to digital preservation is the fact that it will create PDF/A-1 files. PDF/A-1 is a subset of PDF 1.4 that enforces certain behaviors (e.g. all fonts must be embedded and be legally embeddable) and prohibits some content deemed problematic for long-term preservation, such as executables and video.

Operational details: The supported input formats are TIFF, jpeg, PDF and PNM.

Supports automated/batch functionality? Yes, but this capability is disabled in the trial version, which only allows one file at a time to be processed.

Command line, GUI, or both? GUI.

Nature of download: EXE (Windows executable).

Prerequisites for installation and/or use: Windows 95/98/Me/NT4.0/2000/XP/Server2003. Minimum 3 MB free disk space. Pentium processor running at 300 MHz or greater. Minimum 128 MB memory.

OAIS functional entity group(s) and task(s) this tool addresses: Ingest.

Information about future development plans: Unknown, but there are possible peripheral effects of PDF/A-2, the successor to PDF/A-1, which is the current standard. PDF/A-2 will be based on PDF 1.6 and support some additional features. PDF/A-2 will not impact files that are PDF/A-1 compliant, but PDF/A-2 compliant files will not be PDF/A-1 compliant if they incorporate features from PDF 1.6 that are not supported in PDF 1.4.

Language(s) of available documentation: English and Dutch.

Difficulty level for downloading: LOW The links for downloading the tool provided on the project page are straightforward.

Difficulty level for installation: LOW  Self-extracting binary.

Difficulty level for configuration: MEDIUM  Requires configuration of input and output data. One can also use different profiles to optimize compression results for specific input document types.

Difficulty level for use: MEDIUM  The only relatively complex aspect of utilizing this tool was adjusting the encoder options. Some users who do not have a basic understanding of the issues related to compression might be a bit lost in customizing the encoder options. For example, some users might not understand the difference between JBIG2 and Fax G4. I would recommend consulting the manual prior to using the application in order to get a brief overview of the tool and its encoder options and to get background information on the purpose of PDF/A and its relevance to digital preservation.

Level of development: I didn’t find the interface to be particularly intuitive, and I consulted the accompanying documentation to assist me in navigating and identifying the functionality of the various buttons and icons. I think some of the functionality of the interface was compromised due to this being a trial version, as there were many features that were disabled, which added a slight overall incoherence.

Documentation: The documentation is extensive and thorough. The manual that accompanies the downloaded application provides a step-by-step walk through of installation, configuration, and use.

Errors, bugs, anomalies, usage problems encountered: At times the application would mysteriously quit. It wouldn’t freeze prior to quitting, the GUI would just disappear, and I would have to re-execute the program. I didn’t identify specific actions that preceded this glitch.

How well the tool meets its stated objectives: Since this was a trial version, there were limitations on its functionality. Also, although the application claims to produce PDF/A compliant files, I had no way to verify this. As one of the first tools to claim that it conforms to PDF/A, version 4 of LuraDocument PDF Compressor should be tested on a wide variety of PDF files to determine how well it conforms to the standard.
 
Scalability: I’m not able to comment. The batch processing option was disabled.

Results and/or Observations of any performance testing: It processed individual files very quickly.

Overall level of technical competence needed to download, install, configure, and use this tool: LOW



3. TOM (Typed Object Model) Conversion Service

Name of developer(s): TOM is being developed by a team at the University of Pennsylvania Library led by John Mark Ockerbloom.
 
Date development began: 1995.

Home page for tool or project: http://tom.library.upenn.edu/convert/tom.html

URL for download of the code: http://tom.library.upenn.edu/sw/

The version tested here is a Web-based online demonstration of TOM’s file conversion capabilities.

The software download page describes various options for installing the software on one’s own computer:

  • The TOM Toolkit for Perl has all the basic code you’ll need to set up TOM clients, servers, and brokers. It also comes with a simple script to invoke conversions, and a Web-based TOM type browser and editor.
  • The TOM Client Toolkit for Java allows Java-based programs and systems to access all the services TOM servers and brokers offer.
  • The TOM Conversion Service lets you run a TOM-based conversion service on your own website, once you’ve installed the TOM Toolkit for Perl.

Open source or commercial: The downloadable software is open source.

Nature of version tested: Demo/Free Online Conversion Service

Source language and operating system platform used: I accessed the Web page and uploaded files for testing using Internet Explorer under Windows XP.

Main function or category of tool: The TOM Conversion Service is a website that lets users upload files and convert them to other formats, using the TOM system to carry out the conversions. It is a distributed system for converting, documenting, programming, and sharing services for diverse data formats.

Operational details (varies from tool to tool--formats supported, etc.): Extracts preservation metadata from the header of a broad range of file formats: BMP, MS Excel, GIF, HTML, jpeg, MP3, Open Office, PDF, MS Powerpoint, TIFF, WAV, MS Word 2, MS Word 6, Word Perfect, MS Works. Data is output in XML.

Supports automated/batch functionality? No.

Command line, GUI, or both? GUI (Web version).

Tools that this tool is related to: TOM technology is also used in Fred, a demonstration of a simple data format registry that is now online. Fred is an experimental system that demonstrates a simple digital format registry service. The information model is based on models discussed in meetings of the Global Digital Format Registry group that is sponsored by the Digital Library Federation. Fred is being developed as part of a Mellon Foundation funded project to apply TOM to the needs of digital preservation and learning systems. As the TOM documentation notes, “Fred is built in part on TOM, which can be thought of as Fred’s older, more nerdy brother.”

Modularity: There is documentation available on writing one’s own scripts to enable conversions.

OAIS functional entity group(s) and task(s) this tool addresses: Ingest.

Information about future development plans:

From the TOM Project page:

“We plan to make TOM services available through the Ockham initiative, so that they can be used in the National Science Digital Library and other educational applications. Look for more information as Ockham gets underway.”

I searched for any documentation that established a relationship between Ockham and TOM but didn’t locate any.

Language(s) of available documentation: Only English documentation is provided via the homepage of the tool.

Source(s) of external funding for development: Grant from the Andrew W. Mellon Foundation.

Difficulty level for use: MEDIUM In order to utilize the conversion service, a user should read the online information on converters supported by the service. Otherwise, it’s likely that the user may attempt to set up source and target file pairings that are not compatible or at least not optimal.
 
Level of development: The interface is extremely basic, intuitive, and to the point. It could be a bit more aesthetic, but clear functionality is what’s important, and I don’t believe even a novice user would become lost in navigating this demo interface.

Errors, bugs, anomalies, usage problems encountered: I was unable to convert PDF files to HTML, nor was I able to convert XML files to HTML files, even though they are noted as supported conversions within the available documentation.

How well the tool meets its stated objectives: Content seemed to convert without any problems, other than those noted just above. Yet, while the conversions worked in regard to the transfer of content, there were “look and feel” issues with the formatting. This could be a significant issue, especially if visual formatting is considered relevant to the meaning of the content of a digital object (e.g., illuminated manuscripts, visual poetry, or charts/tables).

Scalability: Since I only performed testing on the demo, I’m not able to comment. The conversion service via the Web only offered the option of processing one file at a time. With that noted, the time it took to convert files seemed lengthy. For example, converting a 46 KB Microsoft Word document to a PDF took about 30 seconds. However, network bottlenecks could account for some of the delay, and I did not test the locally installable version of the TOM software.

Results/Observations of any performance testing: In all of my experiments in converting Microsoft Word files to other formats, the content was converted but the formatting was not retained. In addition, as noted earlier, it is possible to choose source and target combinations (e.g. jpeg to GIF) that result in significant loss of fidelity. The online demo service does not warn you about such possibilities. However, the TOM Conversion Service terms of use clearly state that “[the authors] make no warranties or guarantees of any sort, and in particular do not guarantee the availability, accuracy, reliability, or confidentiality of this service.”

Overall level of technical competence needed to download, install, configure, and use this tool: LOW


4. National Library of New Zealand (NLNZ) Metadata Extraction Tool

Name of developer(s): The metadata extraction tool was built by Sytec Resources for the National Library of New Zealand.
 
Date development began and/or release date: 2002

Version # of release: Version 1.0. All of the documentation accompanying the download refers to a version 2.0, but the most recent download I could identify is version 1.0.

Home page for tool or project: http://www.natlib.govt.nz/en/whatsnew/4initiatives.html#extraction

URL(s) for download of the code:
Setup: http://www.natlib.govt.nz/files/Preservation/NLNZSetup.exe
Adapters: http://www.natlib.govt.nz/files/Preservation/adapters.zip
Documentation: http://www.natlib.govt.nz/files/Preservation/docs.zip

Open source (which license?) or commercial: Open source.

Nature of version tested: Full.

Source language and operating system platform used: Implemented in Java, and outputs in XML. Tested under Windows XP.

Main function or category of tool: Processes a variety of digital formats and extracts metadata about those formats.

Operational details: Extracts preservation metadata from the header of  a broad range of file formats: BMP, MS Excel, GIF, HTML, jpeg, MP3, Open Office, PDF, MS Powerpoint, TIFF, WAV, MS Word 2, MS Word 6, Word Perfect, MS Works. Data is output in XML.

Supports automated/batch functionality? Yes.

Command line, GUI, or both? Tested using GUI.

Nature of download: EXE (Windows executable) and Zip (for adapters).

Prerequisites for installation and/or use: Java Runtime Environment (JDK1.4.1 or later). To run the application in an environment where the Java Runtime Environment is not configured to execute jar files, use the command: java -jar NLNZ.jar

Tools that this tool is related to: Stephen Abrams notes in his paper, Establishing a Global Digital Format Registry (GDFR), that future developments of the NLNZ Metadata Extraction Tool could involve a relationship with the GDFR: “The JSTOR/Harvard JHOVE tool for format-specific object identification, validation, and characterization and the National Library of New Zealand (NLNZ) Metadata Extraction Tool are two well-known examples of systems whose implementation and maintenance would be facilitated by the existence of the GDFR to provide sufficiently detailed and authoritative format specifications.”

OAIS functional entity group(s) and task(s) this tool addresses: Ingest.

Information about future development plans: See above for information about possible relationship with GDFR.

Language(s) of available documentation: Only English documentation is provided via the homepage of the tool.

Difficulty level for downloading: LOW  The links for downloading the tool provided on the project Web page are straightforward.

Difficulty level for installation: LOW For Windows platforms there is a NLNZSetup.exe wizard that guides the user through the stages of installation. The zip file containing the adapters will also need to be extracted.

Difficulty level for configuration: MEDIUM The most difficult aspect of configuring this tool involved adding the adapters that correspond to each file format. This entailed going into the Administration window of the GUI, and then selecting and activating each adapter. After the user clicks the install button, the adapter is installed immediately and appears in all other admin screens (adapters, mappings, etc.). Then, the user is required to match the adapter with the applicable xslt mapping (e.g., Mp3 Audio Adapter to mp3_to_nlnz_presmet.xslt). Configuring the adapters required a close reading of the documentation.

Difficulty level for use: MEDIUM I would recommend that the user read the accompanying documentation prior to using the tool. I do think a user with a technical background and/or some background with digital preservation/metadata extraction could probably forgo the documentation. I consulted the User Guide for configuration purposes, but not as a manual for using the GUI and processing files.

Level of development: The interface is extremely user-friendly, well designed, and most importantly, simple. It’s not congested with a lot of excess functionality. I would even go so far as to refer to it as almost “intuitive” to those users with a knowledge of metadata concepts, since much of the terminology used on the interface aligns itself with standard conceptual terminology (e.g., one button on the interface is labeled “Schedule a Harvest”). I also like the fact that it offers a mouse rollover effect for active buttons on the interface, making it easy to identify the function of each button.
 
According to an evaluation of the NLNZ Metadata Extraction Tool in the UK Arts and Humanities Data Service’s Digital Images Archiving Study, published in March 2006, “The whole is more consistent but less detailed than the output produced by JHOVE. It extracts a very limited element set....The National Library of New Zealand Metadata Extract tool is open source like JHOVE but there has been less take-up and the National Library has not committed to institutional support for it. However, unlike JHOVE, it can theoretically handle complex relationships—for instance defining website files and relationships between them or spreadsheets.”

Documentation: The documentation is thorough. The download of the tool comes complete with a User Guide, a Glossary of conceptual and technical terms, System Requirements, Software Architecture, and Solution Architecture. However, as stated earlier, much of the documentation refers to a Version 2.0. I was unable to locate this version from the project page. Version 1.0 seems to be the most current.

The user guide was helpful in assisting me in installing the adapters. The language was clear, and there were illustrations/screenshots of the installation process taking place in the administration window of the GUI.

Errors, bugs, anomalies, usage problems encountered: There were several. When I initially installed the adapters in the administration window of the GUI, I assumed that they would be saved when I clicked “OK” and exited out of the window. However, I found that every time I ended the program and then restarted it, I would have to add the adapters all over again, even though the documentation clearly notes, “The adapters are loaded into the adapters folder on the tool—while the adapter set up is manual, you do only have to do it the once.” During my testing, this wasn’t the case.

Also, at times when I wanted to install the PDF adapter, I would receive an error noting: “No Adapter Class Found!” If I would restart my machine and try again, it would work.

I could only get the tool to extract the most basic of data about a file, including filename, last modified date, MIME type, path, permissions, size, and URL. During my testing, I couldn’t get the program to extract more complex data such as keywords and comments that I embedded in the header (properties) of a PDF file.

Lastly, there were times the tool would unexpectedly quit, but I couldn’t identify factors that would establish a clear pattern for the reason it would do this.

How well the tool meets its stated objectives: The software meets its basic objective as a tool for metadata extraction.

Scalability: The documentation that accompanied this tool claimed that it could process 10,000 jpeg images per hour. I didn’t test the tool to this capacity, but the batch testing I did perform indicates that the tool does have a highly developed, quick batch processing ability. Therefore I could easily see it accommodating a fast-paced production environment. For results of my batch testing, refer to next section.

Results/Observations of any performance testing:

Batch processing: This tool is superbly fast in its ability to batch process multiple files. I didn’t test the developer’s claim that the tool can process 10,000 JPEG files in less than an hour, but I did batch process 56 files (PDF, DOC, and GIF) in under ten seconds.
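To compare the documented claim with the small batch test described above, the two rates can be put on the same footing. The Python sketch below does only that arithmetic. It assumes the observed rate could be sustained for an hour and treats ten seconds as an upper bound on the elapsed time. The batch also mixed PDF, DOC, and GIF files rather than JPEGs, so this is a rough comparison and not a benchmark.

```python
# Throughput comparison: documented claim vs. the small batch test above.
SECONDS_PER_HOUR = 3600

claimed_per_hour = 10_000                 # figure from the tool's documentation
claimed_per_second = claimed_per_hour / SECONDS_PER_HOUR

observed_files = 56                       # mixed PDF, DOC, and GIF files
observed_seconds = 10                     # upper bound: the batch took "under ten seconds"
observed_per_second = observed_files / observed_seconds
observed_per_hour = observed_per_second * SECONDS_PER_HOUR

print(f"Claimed:  {claimed_per_second:.2f} files/second")
print(f"Observed: at least {observed_per_second:.1f} files/second "
      f"(about {observed_per_hour:,.0f} files/hour if sustained)")
```

On these assumptions the observed rate is roughly twice the documented one, which is at least consistent with the developer's claim.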

Also, based upon researching available documentation and my own experimentation, I found that the tool identifies formats based upon file extension, and because of this I could trick the program. For example, I renamed a doc file (Microsoft Word) using a gif extension, and then processed it via the tool. The tool output a report that identified the file as a GIF (but with the correct file size and date information).
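The renaming experiment above shows the weakness of identifying formats by extension alone. The Python sketch below contrasts that approach with a check of a file's leading bytes against a few well-known signatures ("magic numbers"). It is not the NLNZ tool's code. The function names, the file name report.gif, and the small signature table are assumptions made for illustration, and a real identifier would need a far larger signature set.

```python
import os

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Word .doc)"),
]

EXTENSIONS = {".gif": "GIF", ".pdf": "PDF", ".jpg": "JPEG", ".jpeg": "JPEG", ".doc": "MS Word"}

def identify_by_extension(filename):
    """Trust the file name: easily fooled by renaming."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext, "unknown")

def identify_by_signature(data):
    """Inspect the first bytes of the content instead of the name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"

# Simulate the renamed file: Word (OLE2) content saved with a .gif extension.
renamed_name = "report.gif"
renamed_content = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 16

print("By extension:", identify_by_extension(renamed_name))
print("By signature:", identify_by_signature(renamed_content))
```

In this sketch the extension check reports a GIF, as the tool did in testing, while the signature check recognizes the underlying compound-file container.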

I also tried processing a few unknown formats with the tool, and basic metadata, such as size, filename, and creation date, was still extracted.

Overall level of technical competence needed to download, install, configure, and use this tool: MEDIUM


5. TrID File Identifier

TrID, another file format identifier, is the work of Marco Pontello, and is available at his website. TrID bases its identifications on pattern matching and is trainable. A program called TrIDScan can be used to analyze a group of files conforming to a particular format specification, looking for any similarities in structure, byte strings and positions, or other repetitive elements. It then generates an XML file containing a definition for that file type. Thus far, there are 2254 file type definitions in the TrID library.

The TrID engine uses the definitions library to attempt an identification of an unknown file. If it finds multiple matches, it ranks them by assigning probabilities to each. TrID can be used online or can be downloaded (Windows or Unix/Linux) and run standalone.
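The general idea described above (learn what a set of sample files has in common, then rank candidate types for an unknown file) can be sketched in a few lines of Python. This is a simplified illustration and not TrID's actual algorithm or definition format. The function names, the scoring rule, and the sample byte strings are all assumptions. In particular, it only looks at bytes shared at fixed offsets and converts raw scores to percentages by simple normalization.

```python
def build_definition(samples):
    """Return {offset: byte} for every offset where all samples share the same byte."""
    shortest = min(len(s) for s in samples)
    return {
        offset: samples[0][offset]
        for offset in range(shortest)
        if all(s[offset] == samples[0][offset] for s in samples)
    }

def score(data, definition):
    """Count matching positions; any mismatch disqualifies the definition."""
    points = 0
    for offset, value in definition.items():
        if offset >= len(data) or data[offset] != value:
            return 0
        points += 1
    return points

def identify(data, definitions):
    """Rank candidate types by their share of the total points scored."""
    scores = {name: score(data, d) for name, d in definitions.items()}
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(100.0 * pts / total, name) for name, pts in scores.items() if pts > 0]
    return sorted(ranked, reverse=True)

# "Train" two toy definitions from invented sample files.
definitions = {
    "PDF": build_definition([b"%PDF-1.4\nabc", b"%PDF-1.3\nxyz"]),
    "GIF": build_definition([b"GIF89a\x10\x00", b"GIF89a\x20\x00"]),
}

for percent, name in identify(b"%PDF-1.4\nhello", definitions):
    print(f"{percent:.1f}%  {name}")
```

A scheme of this kind depends heavily on the quality and variety of its training samples: a definition built from too few or too similar files will treat incidental bytes as significant.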

I did some brief testing of the online version. It showed varying degrees of success identifying files. For example, a PDF file returned a perfect match:

Match  | Ext | File type                      | Related URL                      | Def's author
100.0% | PDF | Adobe Portable Document Format | http://en.wikipedia.org/wiki/Pdf | Marco Pontello

In another example, a jpeg file returned both a JPG and an MP3 match:

Match | Ext | File type        | Related URL | Def's author
80.0% | JPG | JFIF JPEG Bitmap | --          | Marco Pontello
20.0% | MP3 | MP3 audio        | --          | Marco Pontello

A Quark file returned a BONK and CGM match in addition to Quark:

Match | Ext  | File type                            | Related URL                              | Def's author
50.0% | QXD  | Quark XPress document                | http://www.quark.com                     | Thomas Werner
25.0% | QXD  | Quark Xpress project                 | http://www.quark.com                     | Marco Pontello
12.5% | BONK | BONK lossless/lossy audio compressor | http://www.bonkenc.org/                  | James Heinrich
12.5% | CGM  | Computer Graphics Metafile           | http://www.itl.nist.gov/div897/ctg/graph | Marco Pontello


 Calendar of Events  





Scanning Forum 2006: “Balancing Quality, Automation, and Service”
November 6-7, 2006
Charlottesville, Virginia

This two-day forum is designed to allow participants and vendors to share best practices and practical information. Participants will:

   * Discuss current problems and solutions for your scanning operations.
   * Obtain practical information about imaging, scanning, and workflow.
   * See demonstrations of equipment and software.
   * Ask questions of practitioners and vendors.
   * See and plan the future of digital services and scanning.

International Workshop on Greenstone Digital Library Software
November 27-December 2, 2006
Kerala, India

Sponsored by UNESCO and the Indian Institute of Management Kozhikode, this week-long workshop aims to provide intensive coverage of the Greenstone repository software, including: “software installation, configuration, customization, digitization and other related workflow operations, content development and management, designing and creating standard metadata sets to describe digital objects and encoding it in standard markup formats.”
 

NEDCC Persistence of Memory: Stewardship of Digital Assets
December 5-6, 2006
Tucson, Arizona

The Northeast Document Conservation Center will hold this two-day conference to “... highlight evolving best practices for digital preservation.” A slate of leading experts will address current issues facing those tasked with managing and preserving collections of digitized and born-digital resources.

ICDL 2006
December 5-8, 2006
New Delhi, India

The theme of this year’s International Conference on Digital Libraries will be Digital Libraries: Information Management for Global Access. Special focus will be on the creation, adoption, implementation, and utilization of digital libraries, e-learning, and knowledge management systems.

Second International Conference on Open Repositories
January 23-27, 2007
San Antonio, Texas

Open Repositories 2007 will build on its inaugural meeting with the successful format of offering Open User Group meetings for DSpace, Fedora, and EPrints, followed by overarching conference sessions with both technical and managerial themes. Users of EPrints should note that conference organizers have announced that EPrints v3.0 will be formally launched at the conference.

Electronic Imaging 2007
January 28-February 1, 2007
San Jose, California

Co-sponsored by the Society for Imaging Science and Technology and the Society of Photo-Optical Instrumentation Engineers, this annual meeting will feature symposia, continuing education sessions, and technical exhibits. Topics to be covered include: 3D Imaging, Interaction, and Measurement; Image Processing; Multimedia Processing; and others.

Electronic Resources & Libraries 2007
February 22-24, 2007
Atlanta, Georgia

The Electronic Resources & Libraries community has issued a call for proposals for its second annual conference. This conference provides face-to-face networking and instructional opportunities for the online community of information professionals working with technologies related to electronic resources and digital services.

 Announcements  





E-Journal Archiving Metes and Bounds: A Survey of the Landscape

The Council on Library and Information Resources (CLIR) has released a new report that reviews 12 e-journal archiving programs from the perspective of concerns expressed by directors of academic libraries. The report “argues that current license arrangements are inadequate to protect a library’s long-term interest in electronic journals, that individual libraries cannot address the preservation needs of e-journals on their own, that much scholarly e-literature is not covered by archiving arrangements, and that while e-journal archiving programs are becoming available, no comprehensive solution has emerged and large parts of e-literature go unprotected.” Print versions of the report will be available late October 2006.

PLANETS

A new four-year collaborative European libraries and archives project has been initiated under the European Commission Information Society Technologies Framework Programme 6 Call 5 (FP6 Call 5). The Planets project aims to: provide preservation planning services; develop methodologies, tools, and services for the characterization of digital objects; evaluate existing tools and services to support preservation actions; establish a preservation test bed; develop a framework to support interoperability for project products; and produce a dissemination plan to facilitate take-up of project deliverables.

Web-at-Risk Needs Assessment Report

The Web-at-Risk project (California Digital Library, the University of North Texas, and New York University), funded by the Library of Congress under the National Digital Information Infrastructure and Preservation Program, has released a “Summary Report of the Needs Assessment.” The report reviews survey, interview, and focus group data from curators, librarians, content providers, and end users. The project is also currently pilot-testing its Web Archiving Service (WAS) architecture that will “…enable curators to build, store, and manage collections of Web-published materials in Web archives.” You can follow its development at the Web-at-Risk Wiki.

Stanford University to Preserve the Monterey Jazz Festival Collection

Stanford University Libraries and the Monterey Jazz Festival have announced a project to preserve the Monterey Jazz Festival Collection: audio and video recordings that date back to 1958 and document the world’s longest-running jazz festival. The project partners have received support from the Grammy Foundation and the National Historical Publications and Records Commission to digitize the fragile and degrading analog audio tapes; additional grant funding will be sought to preserve the video recordings.


 Publishing Information  





RLG DigiNews (ISSN 1093-5371) is a Web-based newsletter conceived by the RLG preservation community and developed to serve a broad readership around the world. It is produced by staff in the Department of Research, Cornell University Library, in consultation with RLG Programs, OCLC Office of Programs and Research, and is published six times a year at www.rlg.org.

Materials in RLG DigiNews are subject to copyright and other proprietary rights. Permission is hereby given to use material found here for research purposes or private study. When citing RLG DigiNews, include the article title and author referenced plus “RLG DigiNews”. Any uses other than for research or private study require written permission from RLG and/or the author of the article. To receive this, and prior to using RLG DigiNews contents in any presentations or materials you share with others, please contact Robert Bollander (bolander@oclc.org), OCLC Office of Programs and Research.

Please send comments and questions about this or other issues to the RLG DigiNews editors.

Co-Editors: Anne R. Kenney and Nancy Y. McGovern; Associate Editor: Robin Dale (RLG Programs, OCLC); FAQ Editor: Richard Entlich; Contributor & Production: Ellie Buckley; Advisor: Peter Hirtle.


All links in this issue were confirmed accurate as of October 16, 2006.


Copyright 2004 RLG.