
Introduction
The Digital Collections and Archives of Tufts University and Manuscripts and Archives of Yale University have recently completed a National Historical Publications and Records Commission (NHPRC) electronic records research grant (grant number 2004-083) entitled “Fedora and the Preservation of University Records.” The Tufts-Yale Project focused on three main areas of research: requirements for trustworthy recordkeeping systems and preservation activities, the ingest of records into a preservation system, and the maintenance of records in a preservation system. The project reports are listed in Figure 1.
The project aimed to combine electronic records preservation research and theory with digital library research and practice. In particular, the Tufts-Yale Project planned on answering the question: Does Fedora have the ability to serve as an electronic records preservation system?[1] Tufts University has been using Fedora as the basis of the Tufts Digital Repository for several years.[2]
As it was already strongly invested in developing and managing this repository with an expanding set of services, Tufts was keen on exploring Fedora’s ability to serve as a preservation system for electronic archival records. At the start of this project, Yale had been considering various alternatives for a preservation system, including a Fedora-based solution.
The Tufts-Yale Project focused on university records because each institution has a primary responsibility to preserve these records. However, the findings of this project are not particularly university-specific and are easily applicable to the management and preservation of electronic records in most industries.

Figure 1. Project Reports.
The Tufts-Yale Project framed its efforts within the Reference Model for an Open Archival Information System (OAIS) and the resulting research products can be mapped to that framework.[3] The Ingest Guide describes the Ingest function as well as much of the Administration function: Establish Standards and Policies, Audit Submission, and Negotiate Submission Agreement within the Administration function. The Maintain Guide covers the Data Management and Archival Storage functions. The requirements for recordkeeping attempt to guide the activities of a Producer, while the requirements for preservation activities attempt to guide the activities of an Archive and thus represent all the functional areas of the OAIS Reference Model.
The OAIS Reference Model, the requirements, the Ingest and Maintain guides, the resources and services that support the guides, and the implementation of the guides should be viewed as a tightly related set of steps that build on each other. The OAIS Reference Model is the overarching conceptual structure for preservation activities and systems. Beneath OAIS sits a layer of requirement sets for preservation activities or systems, such as the Tufts-Yale Project preservation requirements. These requirements further articulate OAIS by describing the attributes of preservers that fit within the context of the Reference Model. Beneath these requirements are the Ingest Guide and Maintain Guide, which translate requirements into actions for those functional areas of preservation. The Tufts-Yale Project did not develop guides for all functional areas of preservation such as Access and Preservation Planning. Resources and services—ideally, standardized and openly available—support the execution of the activities defined in the guides. This interconnectedness reinforces each level, giving context to the frameworks, requirements, guides, resources and services, and implementation decisions, helping to enable their intelligent utilization. Institutions will still have implementation decisions to make within the context of the guides, resources, and services: they cannot simply take the guides and call them their procedures. 
Fedora
Over the course of our work, the Tufts-Yale project shifted its attention away from assessing whether Fedora could serve as a preservation system to concentrating on developing requirements for recordkeeping and preservation, and creating the Ingest and Maintain guides. We changed our focus when we realized that we were asking the wrong question. In serving as the repository core of a preservation system, a Fedora instance (or instances) would only be one part of an overall preservation environment. Large portions of ingest and access activities in addition to preservation planning decisions would occur outside of the Fedora instance. Even though some preservation policies may be articulated and managed through Fedora, an institution still must formulate these policies—they are not pre-set in Fedora. Rather than an out-of-the-box, limited repository solution, Fedora is a repository architecture upon which an institution can shape a repository in many different ways. Thus, the suitability of Fedora as the basis of a preservation system depends significantly on its implementation.
Furthermore, as of version 2.1 (released February 2006), Fedora operates within the Fedora Service Framework, which provides the architecture for new services that support a Fedora repository instance but are outside and independent of the repository itself.[4] Two such services that currently exist are Directory Ingest and OAI Provider, and members of the Fedora community are currently developing additional open-source services. The Fedora Preservation Working Group is currently investigating and developing services for supporting preservation activities.[5] Which of these external services an institution uses, and how it employs them, will be a significant factor in the suitability of a Fedora-based repository as a preservation system. 
The question we should have asked was: “Can a Fedora repository, surrounded by the proper preservation policies, tools, and Fedora services, serve as the basis of a trustworthy preservation system?” Or put another way: “Does the use of a Fedora repository necessarily prevent the development of a trustworthy preservation system?” The Tufts-Yale Project team feels it can answer yes to the first question and no to the second. The Fedora core provides a promising basis for a preservation system. Its agnostic view of file formats and object types enables it to manage essentially any type of file. It has the ability to manage objects with complex—including hierarchical—relationships with its use of RDF or METS metadata. It can manage multiple bitstreams for a single object, which can enable archivists to track and store the original bitstream of a record and the bitstreams of any subsequent transformations. It has versioning and persistent identifier capabilities. With XACML, it can articulate policies that manage access to records and prevent unwanted modifications. Fedora is a transparent system and Fedora objects are articulated in XML (usually FOXML or METS), making it feasible to migrate records out of Fedora.
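The idea of keeping a record's original bitstream alongside the bitstreams of later transformations can be illustrated with a small data model. The sketch below is only an illustration under stated assumptions: the class names `Bitstream` and `RecordObject`, the labels, and the identifier format are hypothetical, and the code does not use or represent Fedora's actual API or object model.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bitstream:
    """One stored byte sequence belonging to a record (hypothetical model)."""
    label: str
    mime_type: str
    content: bytes
    derived_from: Optional[str] = None  # label of the source bitstream, if any

    @property
    def sha256(self) -> str:
        # A checksum lets later integrity checks detect unwanted changes.
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class RecordObject:
    """A record with its original bitstream and any transformations."""
    pid: str  # persistent identifier; the format used here is an assumption
    bitstreams: List[Bitstream] = field(default_factory=list)

    def add(self, bitstream: Bitstream) -> None:
        labels = {b.label for b in self.bitstreams}
        if bitstream.label in labels:
            raise ValueError(f"duplicate label: {bitstream.label}")
        if bitstream.derived_from is not None and bitstream.derived_from not in labels:
            raise ValueError(f"unknown source: {bitstream.derived_from}")
        self.bitstreams.append(bitstream)

    def original(self) -> Bitstream:
        # The original is the bitstream that was not derived from another one.
        return next(b for b in self.bitstreams if b.derived_from is None)

    def lineage(self, label: str) -> List[str]:
        """Return labels from the given bitstream back to the original."""
        by_label = {b.label: b for b in self.bitstreams}
        chain = []
        current: Optional[Bitstream] = by_label[label]
        while current is not None:
            chain.append(current.label)
            source = current.derived_from
            current = by_label[source] if source is not None else None
        return chain


record = RecordObject(pid="example:1")
record.add(Bitstream("ORIGINAL", "application/msword", b"memo"))
record.add(Bitstream("PDFA", "application/pdf", b"%PDF memo", derived_from="ORIGINAL"))
```

Because each transformation names its source, an archivist could trace any preservation copy back to the bitstream that was originally ingested.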
There are, however, many Fedora services critical to preservation that require further development—some of which is already underway. For example, the Fedora Preservation Services Working Group is currently developing an alerting service that would support the documentation, encoding, and management of events that impact preservation.[6] The Working Group is also investigating the feasibility of developing a repository history service. Finally, the Fedora community is working towards formalizing content models, which could possibly be used to define record series. More important than the development of any particular preservation-supporting service, an entity such as the Preservation Services Working Group will have to provide the Fedora community with a roadmap of preservation needs and priorities for new Fedora services or ensure that Fedora can work smoothly with existing or future tools, such as those that enable format validation or integrity checks. This will be a challenging task as Fedora moves beyond grant funding in 2007. However, a well-guided and active community should enable Fedora—with its hallmark of flexibility and adaptability—to support the services needed to meet the evolving challenge of preserving electronic records.
Recordkeeping Systems and Preservation Activities
What turned out to be the most beguiling and difficult part of the project was the activity we originally envisioned as the easiest and most straightforward. To ensure records would be created and kept by producers in a form that could be preserved, we began to create a set of requirements for recordkeeping systems by synthesizing ten requirement sets developed during the 1990s and early 2000s into a single set that was appropriate for a university setting. We further intended to undertake a parallel process to develop requirements for preservation systems at universities.
We initially organized our requirements according to Indiana University’s “Requirements for Electronic Records Management Systems (ERMS)” because it was the requirement set most closely associated with universities.[7] However, we soon felt that some of the categories of requirements were concerns that either permeated every aspect of recordkeeping—Audit Trails, Metadata, Documentation—or the goal of the requirements—Authenticity. This led us to develop our own array of ten requirement categories[8] that organized just over 200 requirements. All of these requirements implicitly assumed nine concerns—Audit, Authorization, Automation, Compliance, Documentation, Financial Sustainability, Metadata, Reporting, and Training. In other words, nearly every activity within a trustworthy recordkeeping system had to be auditable, automated, authorized, documented and reportable; generate appropriate metadata; be undertaken or managed by properly trained personnel in a compliant manner; and be supported by a stable source of funding. We undertook the same process for developing a requirement set for a trustworthy preservation system. We organized just over 140 requirements into seven categories that closely followed the OAIS Reference Model.[9] Both sets of requirements outlined the attributes needed to support a trustworthy recordkeeping and preservation system. A trustworthy system allows a person to presume the authenticity of records managed by the system.
This initial attempt at a set of requirements for recordkeeping and preservation systems generated many problematic issues. First, we struggled to precisely define the elements that compose a recordkeeping or preservation “system,” in part because the existing literature does not agree on terminology. Second, it was difficult to describe the relationship between the recordkeeping and preservation system. We had developed a very life-cycle-centric relationship between two distinct systems, presuming an institution would always move records from a recordkeeping system to a separate preservation system—not always the case in the real world. In addition, we identified preservation requirements for the recordkeeping system that were repeated throughout the preservation system. Finally, we had organized the ten categories in the recordkeeping requirements arbitrarily and did not firmly base the categories on any previous work.
With the invaluable help of Nancy McGovern of Cornell University [Editor's note: Nancy McGovern is now at the Inter-university Consortium for Political and Social Research (ICPSR).] we resolved these issues and formulated a single document divided into two different chapters, one for recordkeeping system requirements and the other for records preservation requirements. Both sets of requirements are organized and grouped into sections and subsections that correspond to existing intellectual frameworks for recordkeeping and preservation. The recordkeeping chapter is organized into seven sections based loosely on the framework presented in the records management and controls section of ISO 15489-1: Information and documentation – Records management.[10] The records preservation requirements chapter is divided into seven sections and thirty-four subsections, with the requirements grouped loosely according to the functions of the OAIS Reference Model.
Our conception of the difference between a recordkeeping system and preservation activities presumes that a producer will create, acquire, use, and manage records in a recordkeeping system to suit its current business needs, while the central purpose of preservation activities is to preserve records. In a pure records lifecycle model environment, an archive will later ingest some records from a recordkeeping system into a separate preservation system that the archive administers, undertaking preservation activities in this system. In a records continuum model, recordkeeping is a continuous process that does not necessarily move from a recordkeeping system to a separate preservation system administered by completely separate juridical entities. In this model preservation activities may take place in the recordkeeping system. Many producers and archives operate in a mixed world between these two models. This new specification of requirements should be suitable for any of these situations. 
Ingest
The main product of our research into ingest is the Ingest Guide, which describes the actions needed for a trustworthy ingest process. The Guide refers to ingest broadly, defining it as the entire process involved in moving records from a recordkeeping system to a preservation system. This process consists of the Producer and Archive agreeing to and defining what records will be transferred and the manner of the transfer, validation, and transformation, as well as getting the records into the preservation system.
As mentioned earlier, the Ingest Guide covers all of the OAIS Ingest function and the following activities within Administration: Establish Standards and Policies, Audit Submission, and Negotiate Submission Agreement. It builds directly on the work of the Producer-Archive Interface Methodology Abstract Standard (PAIMAS), which was created by CCSDS as part of its continued examination of the ingest process.[11] Composed of four phases, Preliminary, Formal, Transfer, and Validation, the PAIMAS makes a detailed examination of the process for creating a submission agreement in the Preliminary and Formal phases, but gives only a cursory study of transfer and validation activities. In developing the Ingest Guide, the Tufts-Yale Project team found the division of a Preliminary and Formal phase for creating a submission agreement too formal for a university setting. We generated a guide that describes a simplified, more action-oriented negotiate submission process. While the PAIMAS was very succinct, we chose to be much more detailed in our description of the transfer and validation process.
The Ingest Guide contains two main sections. Section A, Negotiate Submission Agreement, details how the producer and the archive create and arrange a submission agreement that defines the terms and conditions of the transfer of records from the producer to the archive and details the scope of the records along with the nature of their validation and transformation. Section B, Transfer and Validation, details the actual transfer, validation, and transformation of records. The Guide is presented in the form of a large flowchart. Every part includes a narrative summary, a flowchart illustrating all of its steps, a description of each step, and a list of resources that each step utilizes and/or produces. The parts that comprise the two sections are listed in Figure 2.

Figure 2. Ingest Guide.
Although the Ingest Guide is a prescriptive guide for a trustworthy ingest process, it is not a detailed manual of procedures. The Guide describes the actions that must be undertaken to trust the ingest process and prescribes how to undertake these steps at a high level, but it does not prescribe how to proceed in full detail. For example, the Guide calls for an archive to select preservation formats for records it chooses to transform, but it does not dictate what those preservation formats should be.
The Ingest Guide gives archives a roadmap to build a network of well-documented resources, including tools, procedures, and policies, that serve as the foundation for well-documented appraisal decisions and accessioning activities. This documentation is a key element of a successful preservation program. Archives making undocumented decisions based on ad-hoc policies, procedures, and tools cannot hope to successfully preserve records.
The Guide is geared toward enabling archives to ingest records in a semi-automated and scalable manner by helping them regularize and streamline many decision-making steps. We envision that archives could manage many of the resources described in the Guide as machine-readable objects. The more machine-readable resources an archive has, the more it can automate its ingest process. Obviously, expressing resources as machine-readable objects can take a considerable investment of effort. Each archive will have to determine the degree of automation that is appropriate for its operations. However, we feel that the growing scale and complexity of electronic records and data objects that archives will have to preserve will force them to rely on semi-automated, regularized processes to meet the demands of this task.
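As a rough illustration of what a machine-readable resource could make possible, the sketch below expresses a submission agreement as data and checks a transfer against it. Everything here is an assumption made for the example: the `SubmissionAgreement` fields, the use of file extensions as a stand-in for format identification, and the manifest of SHA-256 checksums are hypothetical and are not taken from the Ingest Guide.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class SubmissionAgreement:
    """Hypothetical machine-readable form of a submission agreement."""
    producer: str
    record_series: str
    accepted_formats: FrozenSet[str]  # file extensions, a simplification


def validate_transfer(
    agreement: SubmissionAgreement,
    manifest: Dict[str, str],   # file name -> expected SHA-256 hex digest
    files: Dict[str, bytes],    # file name -> transferred content
) -> List[str]:
    """Return a list of problems; an empty list means the transfer passed."""
    problems = []
    for name, expected in sorted(manifest.items()):
        if name not in files:
            problems.append(f"{name}: listed in manifest but not transferred")
            continue
        extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if extension not in agreement.accepted_formats:
            problems.append(f"{name}: format '{extension}' not covered by agreement")
        if hashlib.sha256(files[name]).hexdigest() != expected:
            problems.append(f"{name}: checksum mismatch")
    for name in sorted(set(files) - set(manifest)):
        problems.append(f"{name}: transferred but not listed in manifest")
    return problems


agreement = SubmissionAgreement(
    producer="Office of the Registrar",      # illustrative value
    record_series="Course catalogs",         # illustrative value
    accepted_formats=frozenset({"pdf", "txt"}),
)
files = {"catalog.txt": b"2006 catalog"}
manifest = {"catalog.txt": hashlib.sha256(b"2006 catalog").hexdigest()}
problems = validate_transfer(agreement, manifest, files)
```

A check of this kind could run without staff intervention for routine transfers, leaving archivists to review only the transfers that report problems.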
Maintain
In order for long-term preservation to be possible, archives must keep, store, and protect from harm records under their care; in short, they must maintain these records. This process is roughly equivalent to the Data Management and Archival Storage functions of the OAIS Reference Model and the Maintain Electronic Records process for the InterPARES Project’s Preservation Model.[12] The central thrust of our research on maintain is the Maintain Guide, which describes ten scheduled and twenty irregular event types that may occur during an archive’s maintenance of electronic records. The event types are listed in Figure 3.

Figure 3. Maintain Guide.
The Maintain Guide does not represent the entire preservation process, but only a core subset of that larger process. The Guide has excluded any management—such as preservation planning and administration—or subject-related decision-making activities from its purview in order to focus on the technical and procedural activities of maintaining data integrity. The Guide describes activities that an automated system or systems administrator or technician can execute without needing the subject or management knowledge of the records to undertake this work. Any maintenance work that rises to the level of administration or preservation planning falls outside of the scope of the Guide.
The Maintain Guide does not present a sequence of event types, but rather a set of event types that are each triggered by different circumstances. Each event type in the Guide describes the nature of the event, the preconditions for the event occurring, and a list of activities an archive must follow in response to an event. Although a prescriptive guide to undertaking maintenance activities, the Maintain Guide, like the Ingest Guide, does not fully prescribe all the details and decisions archives need to make.
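The structure described above, a set of independently triggered event types that each carry preconditions and a list of response activities, can be sketched as a simple data structure. This is a minimal sketch under stated assumptions: the `EventType` class, the two example events, their thresholds, and the activity wording are all hypothetical and are not drawn from the Maintain Guide's own list of event types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EventType:
    """Hypothetical representation of one maintenance event type."""
    name: str
    scheduled: bool                          # scheduled versus irregular
    precondition: Callable[[Dict], bool]     # is the event triggered?
    activities: Tuple[str, ...]              # ordered response steps


def triggered(event_types: List[EventType], state: Dict) -> List[EventType]:
    """Return every event type whose precondition holds for this state.

    The event types form a set rather than a sequence, so each one is
    tested independently of the others.
    """
    return [event for event in event_types if event.precondition(state)]


# Illustrative event types only; names and thresholds are assumptions.
EVENT_TYPES = [
    EventType(
        name="periodic integrity check",
        scheduled=True,
        precondition=lambda s: s["days_since_integrity_check"] >= 30,
        activities=("recompute checksums", "compare with stored values", "log result"),
    ),
    EventType(
        name="storage media failure",
        scheduled=False,
        precondition=lambda s: len(s["failed_media"]) > 0,
        activities=("isolate media", "restore from copy", "verify restored records", "log result"),
    ),
]

state = {"days_since_integrity_check": 45, "failed_media": []}
due = triggered(EVENT_TYPES, state)
```

Keeping the preconditions and activities as data, rather than burying them in procedures, is one way an archive could document its responses and audit them later.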
The event types are much the same as those managed by any typical information systems (IS) department. However, the nature of the response to these events and the activities specified do not necessarily follow the standard operating procedures of the typical IS department. The requirements inherent in preserving electronic records may force those maintaining electronic records to undertake different and perhaps more expensive activities than most IS departments normally execute.
The Maintain Guide assumes that no archivist or electronic records preservation officer should attempt to maintain electronic records in isolation, particularly in a university setting. The expenses in technology and staff for conducting a trustworthy maintain process are significant. These costs will likely dwarf the normal operating budgets of most archives and will necessitate finding ways to utilize existing resources or sharing expenses across departments or even across institutions. Nearly all archives will have to collaborate with others such as their institutional information systems department, collaborators at other institutions, or outside vendors.
Conclusion
The products of the Tufts-Yale Project are meant to help bring the electronic records research of the past two decades closer to the daily work of archivists and others charged with the preservation of electronic records and other digital objects. The work of this project does not provide complete solutions that archivists can simply turn into their policies and procedures; rather, this project provides detailed frameworks to help archives make better, more systematic decisions about their work.
The three main products of the Tufts-Yale Project (the requirements for recordkeeping systems and preservation activities, the Ingest Guide, and the Maintain Guide) all suggest areas of further work. The requirement sets point to the need for creating new evaluation tools or implementing ones currently under development, such as the RLG-NARA Audit Checklist for Certifying Digital Repositories. The Ingest Guide and Maintain Guide provide a roadmap for institutions engaged in digital preservation to carefully reexamine and possibly reengineer their business processes. Both guides also express the need for numerous machine-readable resources and services, many of which do not yet exist. Considerable community-based work needs to be done to develop these tools. We also hope that members of the archival, digital library, and digital preservation communities can take our work in directions we have not imagined.
1. Fedora is a general purpose repository system developed jointly by Cornell University and the University of Virginia Library. For more information, see <http://www.fedora.info>.
8. The ten categories were: Compliance, Creation and Capture, Maintenance, Classification, Retention and Disposition, Protective, Preservation, Use Rights, Discovery and Delivery, and Design and Performance.
9. The seven categories were: Common Services, Ingest, Archival Storage, Data Management, Administration, Preservation Planning, and Access.
10. ISO 15489-1: 2001, Information and documentation – Records management – Part 1: General. During this process we also gave careful consideration to mapping the recordkeeping requirements to Trusted Digital Repositories: Attributes and Responsibilities, but ultimately decided that ISO 15489 was a better fit for our requirements. Organizing the requirements according to ISO 15489 gave us an existing conceptual framework upon which we could shape the requirements. It is our opinion that there is no consensus or preferred framework for recordkeeping system requirements comparable to the OAIS Reference Model framework for preservation requirements.
12. “A Model of the Preservation Function,” Appendix 5 of The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project (San Miniato, Italy: Archilab, 2005).
