RLG
 Feature Article 1  

Treasuring the Digital Records of Science: Archiving E-Journals at the Koninklijke Bibliotheek

Author: Johan F. Steenbakkers - Koninklijke Bibliotheek, National Library of the Netherlands (johan.steenbakkers@kb.nl)

 

Overview

The Koninklijke Bibliotheek (KB) is the national and depository library of the Netherlands. The depository collection relies on voluntary arrangements with the Dutch publishers, not on a legislative mandate. In 1994, the KB decided to include e-publications with a Dutch imprint in its deposit collection and renewed its arrangements with the Dutch publishers. To accomplish this, the KB developed a dedicated infrastructure for processing and safekeeping the e-publications. In 2002, the KB took the step of including international scientific e-journals in its deposit collection by signing the first formal archiving agreement with Elsevier Science. By doing so, the KB became treasurer of an important part of the digital Records of Science. This responsibility implies an ongoing search for solutions for preservation and permanent access.

On August 20, 2002, at the Conference of the International Federation of Library Associations and Institutions (IFLA) in Glasgow, Elsevier Science and the Koninklijke Bibliotheek announced an electronic archiving agreement that broke new ground for publishers and libraries worldwide. The need to provide for permanent digital archiving has been evident to libraries [1] and to Elsevier for several years, and Elsevier has been a leader in advocating publisher responsibility in this area. In 1999, Elsevier Science made a public commitment to ensure digital archiving with a trusted repository as part of its license with library customers.

Publisher’s View

The KB was a logical partner, well-known as a leader worldwide in experimentation and investment in digital preservation. Karen Hunter, Senior Vice President, Strategy at Elsevier and responsible for this digital archiving initiative, explains the relevance of this agreement:

“It is essential that we will be able to guarantee both authors and researchers using the journals that the electronic files will be permanently available. Journals have been called ‘the minutes of science.’ As we move toward journals being available only in electronic form and being held centrally on publishers’ computers, the public has the right to be assured that, should a publisher go out of business, these files will not be lost. This agreement provides that assurance for Elsevier Science titles, which constitute an essential part of the core scientific literature currently published.”

Librarian’s View

Research and development on long-term digital archiving has been a top priority at the KB. “Ensuring permanent availability of information and knowledge is at the heart of the KB's mission,” declared Wim van Drimmelen, Director General of the KB.

“Digital archiving is a logical extension of the role we always had and will have in the area of printed material, the modern version of a traditional task. In this era of electronic publishing new arrangements are needed globally in order to preserve our intellectual heritage. The KB wants to take an active part in these evolving new arrangements. It's an exciting challenge to find ways of coping with the fast pace of change in platforms and formats. From the start we committed ourselves strongly to this challenge. We take pride in this groundbreaking agreement with Elsevier and see it as a recognition of our achievements so far and a milestone on the way to our strategic goals.”

KB’s e-Depot

e-Depot is the name of the organization and infrastructure at the KB for archiving e-publications. The purpose of e-Depot is to ensure long-term availability of the digital files (the bits and bytes) and permanent access to the content (the information) captured in the files.

Within the organization of the KB three divisions are jointly responsible for running and developing the e-Depot: Acquisition & Processing Division, Information & Communication Technology, and Research & Development Division. The Acquisition & Processing Division is in charge of acquiring, processing, and archiving of e-publications. e-Depot is a special unit within this division and in charge of the day-to-day operations of obtaining, checking, and loading the e-publications, including their metadata. The Division for Information & Communication Technology is responsible for the technical maintenance of the infrastructure for the e-Depot. This task includes expanding the storage capability, guaranteeing backup, and providing media migration. This division also manages integration of the deposit system within the digital library infrastructure for cataloguing, search and retrieval, user registration, etc.

The Research & Development Division performs studies and experiments to develop and maintain the functionality of the e-Depot. These activities are usually joint projects with the two divisions mentioned before. External technology partners are often involved. The Research & Development Division also organizes or participates in international activities (e.g., development of standards, preservation studies and projects, conferences). For these activities a dedicated research unit named "Digital Preservation" has been created.


Figure 1. Organizational Structure of the Koninklijke Bibliotheek

To coordinate the activities and policy development concerning the e-Depot, the KB has implemented the e-Depot Steering Board, chaired by the Director of Information Technology and Facility Management. In addition to the three divisions already mentioned, the User Services Division also participates in the board. This division is in charge of providing access to the e-publications under conditions specified by the publishers. Because of the strategic impact of the e-Depot on the KB’s policy and organization, the Director General of the KB usually takes part in the board meetings.

e-Depot Infrastructure

The infrastructure of the e-Depot consists both of components that were specifically developed for processing, archiving, and maintaining e-publications, and of typical digital library functions. According to the NEDLIB Guidelines, [2] the deposit system should be a separate, dedicated entity within the library’s digital infrastructure. For the traditional library processes, such as cataloguing, search and retrieval, and user registration and authentication, the KB uses the provisions already in place, so these functions have not been duplicated within the deposit system. This approach allows both the deposit system and the traditional library systems to evolve at their own pace (e.g., in terms of new functionality and technical updating). Keeping e-archiving and the traditional library as separate entities also keeps matters as simple as possible, both for the library and for the system providers.

The deposit system DIAS (Digital Information Archiving System) is the technical core of the e-Depot. The functions at the left of Figure 2 are for receiving and loading: EPO is the Electronic Post Office; BER is the Basic Error Recovery; NBN is the National Bibliographic Number generator. The functions at the right are for search, retrieval, and delivery: GGC is the Central Cataloguing System of Pica/OCLC; KB-TITEL is the local overall catalogue database at the KB; IAA is the function for Identification, Authentication, and Authorization of end users.

Figure 2. The Deposit System Within the Digital Library Environment

The Depository Task Extended

As the national library of the Netherlands, a key role of the KB is to serve as the depository library for publications produced in the country. In the early 1990s it became clear that, after about two decades of experimentation by publishers, e-journals were getting off the ground. Having determined in 1994 to include electronic publications in its deposit collection, the KB initiated discussions with Elsevier Science (ES) in 1995 about depositing e-copies of the ES journals with a Dutch imprint. By 1996 a preliminary agreement was signed and the first e-journals (315 in total in the end) were deposited at the KB. Finally, in 1999 the Dutch Publishers Association and the KB made an arrangement under which publishers would deposit all electronic publications with a Dutch imprint at the KB. The arrangement covers offline and online publications and prescribes restricted access conditions.

In August 2002, ES signed the archiving agreement with the KB to ensure permanent archiving of all their electronic publications, most of them e-journals. ES is prepared to establish formal archival agent relationships internationally with a limited number of libraries or other institutions, such as the KB.

The e-publications in the KB deposit can be used onsite by persons authorized and registered by the KB as pass holders. Print or fax copies of articles may also be supplied for interlibrary loan within the Netherlands. The KB may open access to the journals to users in general if neither ES nor a successor offers these publications to customers. Open access may also be offered to certain journals or publication years upon notice from ES. Information about the e-publications may be included in the KB’s online public catalogue or in the National Bibliography.

Currently three international publishers have signed an archiving agreement with the KB: Elsevier, Kluwer Academic, and the open access publisher BioMed Central. Agreements with more scientific publishers are in preparation. The decision of the KB to establish a formal archival relationship with international publishers builds on the national depository task. By extending this task to include international e-publications (at the moment mostly in science, technology, and medicine), the KB intends to contribute to the development of a global solution for safeguarding these e-publications. A global solution is needed because, for international e-publications, the traditional approach of national deposit and national bibliographic control is no longer valid. To be sustainable, global depositing must eventually be based on new business models that take into account the permanent effort, and hence costs, of digital archiving. I have suggested earlier [3] that these costs should be an integral part of the costs of e-publishing. The experience at the KB has shown that once an e-deposit is in place, the costs to scale up the infrastructure and organization to include more publications are fairly modest.

e-Depot & Dutch Academic Repositories

In the Netherlands, Dutch universities, the KB, and three other academic institutions co-operate with the SURF Foundation (the foundation for the national science data network) in project DARE (Digital Academic Repositories). [4] The aim of DARE is to create an infrastructure of institutional repositories that will enable digital recording services, access, storage, and distribution of the Dutch academic output. The DARE infrastructure will closely interface with the e-Depot so that the published electronic academic output will be archived and preserved for the long term. Specific procedures and technological solutions will be developed, including provisions for return delivery, from the e-Depot to the repositories, of a copy of the original e-publication or a preserved and accessible copy.

Developing a Dedicated Deposit System

To handle the electronic publications, the KB needed a deposit system. In 1996 a first pilot system was developed in co-operation with AT&T. This pilot system was replaced in 1998 by a larger pilot system (up to 2 TB of storage) that was developed in co-operation with IBM. After several years of experiments and studies, a list of requirements for an operational deposit system could be compiled. A market scan had shown that a deposit system could not be bought off the shelf, so in 2000 the KB decided to tender for the development of one. Through a European tender procedure, IBM was selected as the best technology partner. The system was created on site at the KB premises. In October 2002, DIAS (Digital Information Archiving System) was handed over to the KB.


Figure 3. DIAS Configuration and OAIS

The functional design of DIAS is based on a standard for digital archiving, the Open Archival Information System Reference Model (OAIS-RM), ISO 14721:2003. The system is designed to be durable and provides for scalability, extensibility, and flexibility. It was built using off-the-shelf components as much as possible. [5] Figure 3 represents the functions of the system developed together with IBM for the e-Depot. The design complies with the OAIS model, the ISO standard for digital archives, which is shown in the background of IBM's functional design.

The key functions of DIAS are storage and long-term preservation. It allows both manual and automated ingest of digital publications. Once a publication is successfully stored, it is managed for preservation and permanent access. The preservation functionality is currently being developed further. For details about the configuration of DIAS delivered to the KB in 2002, see report 1 of the Long-Term Preservation Study.

e-Depot Statistics

The deposit system is capable of ingesting over 60,000 articles (mostly PDF) a day. The articles and their metadata are checked, processed for loading, and stored. The descriptive metadata are also copied to the KB catalogue database (see KB-TITEL in Figure 2) for search and retrieval purposes.
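The check, store, and copy steps described above can be pictured with a minimal Python sketch. It is an illustration only, not the DIAS implementation: the class and field names are hypothetical, and the use of a SHA-256 checksum and a required title field as the "check" step is an assumption, since the article does not say which checks the e-Depot applies.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Submission:
    """One delivered article with its metadata (hypothetical structure)."""
    identifier: str
    content: bytes
    metadata: dict
    declared_sha256: str  # assumed: the publisher supplies a checksum


@dataclass
class Depot:
    """Toy stand-in for the archive store and the catalogue database."""
    archive: dict = field(default_factory=dict)
    catalogue: dict = field(default_factory=dict)
    rejected: list = field(default_factory=list)

    def ingest(self, sub: Submission) -> bool:
        # Check: the file must match its declared checksum and carry a title.
        actual = hashlib.sha256(sub.content).hexdigest()
        if actual != sub.declared_sha256 or "title" not in sub.metadata:
            self.rejected.append(sub.identifier)
            return False
        # Store: keep the file together with all of its metadata.
        self.archive[sub.identifier] = (sub.content, dict(sub.metadata))
        # Copy: only descriptive metadata goes to the catalogue for retrieval.
        self.catalogue[sub.identifier] = {"title": sub.metadata["title"]}
        return True


if __name__ == "__main__":
    depot = Depot()
    payload = b"%PDF-1.4 example"
    ok = Submission("art-1", payload, {"title": "Example"},
                    hashlib.sha256(payload).hexdigest())
    print(depot.ingest(ok), sorted(depot.catalogue))
```

The sketch keeps the archive and the catalogue as two separate stores, mirroring the separation between the deposit system and the traditional library systems described earlier.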

                      2003         2004 (growth)   2004 (prognosis)
e-journals            1.2 TB       1.8 TB          3.0 TB
CD-ROMs               0.7 TB       1.3 TB          2.0 TB
Total storage         1.9 TB       3.1 TB          5.0 TB
e-journal titles      1,200        1,400           2,600
e-journal articles    1,600,000    2,900,000       4,500,000

Table 1. Terabytes of Storage Used and Quantity of Content by Type in the e-Depot, 2003-2004
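The figures in Table 1 can be cross-checked with a few lines of Python. The sketch below assumes that the first column gives the position for 2003 and that the prognosis column is that figure plus the 2004 growth; this reading is an interpretation of the column headings, not something the table states. The function and variable names are illustrative.

```python
import math

# Figures copied from Table 1: (2003, 2004 growth, 2004 prognosis).
TABLE_1 = {
    "e-journals (TB)": (1.2, 1.8, 3.0),
    "CD-ROMs (TB)": (0.7, 1.3, 2.0),
    "Total storage (TB)": (1.9, 3.1, 5.0),
    "e-journal titles": (1_200, 1_400, 2_600),
    "e-journal articles": (1_600_000, 2_900_000, 4_500_000),
}


def row_is_consistent(base, growth, prognosis):
    """True if the 2003 figure plus the 2004 growth equals the prognosis."""
    return math.isclose(base + growth, prognosis, abs_tol=1e-9)


def columns_sum_to_total(table):
    """True if e-journal and CD-ROM storage add up to the total in each column."""
    journals = table["e-journals (TB)"]
    cdroms = table["CD-ROMs (TB)"]
    total = table["Total storage (TB)"]
    return all(
        math.isclose(j + c, t, abs_tol=1e-9)
        for j, c, t in zip(journals, cdroms, total)
    )


# Ingest rate quoted in the text: over 60,000 articles a day.
ARTICLES_PER_DAY = 60_000
days_for_2004_growth = TABLE_1["e-journal articles"][1] / ARTICLES_PER_DAY

if __name__ == "__main__":
    for name, row in TABLE_1.items():
        print(name, row_is_consistent(*row))
    print("columns add up:", columns_sum_to_total(TABLE_1))
    print("ingest days for 2004 growth:", round(days_for_2004_growth, 1))
```

Under these assumptions every row and column is consistent, and the 2.9 million articles expected in 2004 correspond to roughly 48 days of ingest at the quoted rate of 60,000 articles a day.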

Studying Long-Term Preservation

The contract for developing the deposit system included a joint research obligation, referred to as the ‘Long-Term Preservation Study.’ The research work was necessary because at that time the KB could not define specific enough requirements for preservation to demand development and delivery of the preservation functionality of the deposit system. It was agreed that IBM would take into account the results of the research effort when designing the depository system.

As a result of the Long-Term Preservation Study, a preliminary module for preservation could be realized. In addition, six reports were published in December 2002 summarizing the research results. The reports can be ordered in print from the KB or IBM, and are also available in PDF on the KB site. The titles of the reports illustrate the variety of preservation issues that have been covered:
1: The Long-Term Preservation Study of the DNEP Project—an Overview of the Results
2: Authenticity in a Digital Environment
3: Preservation Requirements in a Deposit System
4: The UVC: a Method for Preserving Digital Documents–Proof of Concept
5: Managing Media Migration in a Deposit System
6: Archiving Web Publications

After the deposit system was delivered and implemented in 2002, the KB continued its research on digital preservation at a modest scale. [6] In 2003, the KB and IBM worked on designing and developing further functionality for preservation management and for permanent access. The result is a further detailing of the ‘preservation planning’ function of the OAIS model into a Preservation Subsystem. Figure 4 shows the three components envisaged within the Preservation Subsystem: the Preservation Manager, the Preservation Processor, and the Permanent Access Toolbox. Starting in 2003, a first version of the Preservation Manager has been developed. The Preservation Manager will be tested soon and, if appropriate, will be implemented within the e-Depot in 2004. [7]

Another result is a first permanent access tool, based on Raymond Lorie’s Universal Virtual Computer concept (see Long-Term Study Report 4). The tool enables one to view images in the future, regardless of any change in technical circumstances. [8] The development of more permanent access tools will need continuous dedicated research and development.


Figure 4. Preservation Subsystem in the OAIS Model

Promoting Digital Preservation in Practice

The challenge of preserving digital information and guaranteeing permanent access to it can only be addressed successfully through long-standing, close co-operation among three key players: leading memory institutions (national libraries and archives), main producers of information (publishers and public agencies), and, last but not least, leading IT companies. The development of the e-Depot at the KB together with the science publisher Elsevier and the IT company IBM is a good example of such co-operation. These three partners have been breaking new ground in the functional, technical, and policy areas in order to develop permanent availability of digital information. It is hoped that more major players in the areas mentioned will take up their responsibility for digital preservation and start pushing back frontiers.

Notes

[1] National Library of the Netherlands and Elsevier Science make digital preservation history. Permanent digital archive assures perpetual accessibility of scientific heritage. Press release, Glasgow, 20th August 2002, by Elsevier Science and the Koninklijke Bibliotheek.

[2] Johan Steenbakkers. The NEDLIB Guidelines. Setting up a Deposit System for Electronic Publications. NEDLIB Reports Series 5, November 2000, Koninklijke Bibliotheek.

[3] Johan F. Steenbakkers. Digital archiving: a necessary evil or a new opportunity. Serials Review, 30/1, pp. 29-32, 2004.

[4] For more information on DARE see www.surf.nl.

[5] About DIAS see www-5.ibm.com/nl/dias.

[6] In April 2003 a consortium of libraries, archives and IT companies unsuccessfully turned to the European Commission for financial support for an integrated preservation research project under the title PATCH (Permanent Access Toolbox for the digital Cultural Heritage).

[7] Raymond J. van Diessen, Erik Oltmans and Hilde van Wijngaarden. Preservation Functionality in a Digital Archive. To be published in the proceedings of the Joint Conference on Digital Libraries 2004, Tucson, Arizona, June 2004.

[8] Hilde van Wijngaarden and Erik Oltmans. Digital Preservation in Practice: The UVC for Images. To be published in Proceedings of the IS&T Archiving Conference, San Antonio, Texas. April 23rd, 2004.
Copyright 2004 RLG.