RLG DigiNews
  August 15, 2003, Volume 7, Number 4
ISSN 1093-5371


Table of Contents


Feature Article 1
Building a Digital Library of Kinematics, by Kizer Walker, John M. Saylor, Francis C. Moon, David W. Henderson, Hod Lipson, Daina Taimina, Ron Rice

Feature Article 2
The Cost to Preserve Authentic Electronic Records in Perpetuity: Comparing Costs across Cost Models and Cost Frameworks, by Shelby Sanett

Highlighted Web Site
The Museum of Hoaxes

FAQ
XML/XSLT-mediated File Format Migration as a Digital Preservation Strategy, by Christopher Hamilton

Calendar of Events

Announcements

RLG News
Automatic Exposure: Capturing Technical Metadata for Digital Still Images


Building a Digital Library of Kinematics


Kizer Walker
John M. Saylor
Francis C. Moon
David W. Henderson
Hod Lipson
Daina Taimina
Ron Rice
Cornell University
Peaucellier straight-line mechanism
 
Photograph by K. Loeffler,
Dept. of Plant Pathology, Cornell University

As a team of Cornell University librarians and faculty in mathematics and mechanical engineering, we are building a digital library for teaching the principles of kinematics, the geometry of pure motion. The Kinematic Models for Design Digital Library (K-MODDL)[1] project is funded by a two-year collection grant from the National Science Digital Library, a program of the National Science Foundation to build shared digital collections of high-quality materials and services in support of science education at all levels. The project is scheduled for completion in summer 2004. The core of K-MODDL is the Reuleaux Collection of Mechanisms and Machines, an important collection of nineteenth-century model machine elements held by Cornell’s Sibley School of Mechanical and Aerospace Engineering.

K-MODDL will make freely available on the Web:

  • still and navigable moving images of these rare kinematic teaching models with systematic descriptions,
  • computer simulations of mathematical relationships associated with the mechanisms’ movements,
  • stereolithography files for “printing” working physical replicas,
  • historical and contemporary documents related to the collection of the mechanisms,
  • sample teaching modules that employ the models and simulations in the classroom at the undergraduate, high school, and middle school levels.

Kinematics is central to machine and mechanism design in engineering; it is part of the teaching of basic ideas of dynamics in physics, as well as geometric ideas and the ideas of motion in mathematics. The models in Cornell’s collection were designed for research and teaching by the German engineering professor Franz Reuleaux (1829-1905), the founder of modern kinematics and a forerunner of modern design theory of machines. Reuleaux set out to codify, analyze, and synthesize kinematic mechanisms. He laid the foundation for a systematic study of machines by defining clearly the machine and mechanism, determining the basic mechanical building blocks, and developing a system for classifying known mechanism types. Reuleaux created over eight hundred models of mechanisms to embody his basic machine elements and authorized the manufacture of over three hundred of these for technical schools to use in teaching engineers about machines.[2] Cornell’s collection was acquired in 1882.

A freely accessible, Web-based resource, K-MODDL documents a beautiful and historically significant artifact collection. In addition, the inclusion of navigable moving images and simulations of mathematical principles related to the machines’ movements restores the objects to their intended classroom use as teaching models of geometric and kinematic principles.

National Science Digital Library (NSDL)

The mission of the NSDL is to enhance science, technology, engineering, and mathematics education through a partnership of digital libraries joined by common technical and organizational frameworks. Individually and collectively, these partners engage and inform multiple clienteles, using shared resources to serve many communities of users, each with its own level of knowledge and approach to learning. The NSDL embodies long-standing library traditions of service, longevity, equal access, fair use, and privacy, as well as innovations that foster a spirit of inquiry and the accessibility of science to all.[3]

K-MODDL will exist as an autonomous collection housed at Cornell University Library and will also be searchable through the shared NSDL portal. For a collection such as K-MODDL to participate in the NSDL, it is necessary that the descriptive metadata associated with each item in the collection be shared with the NSDL central metadata repository. The Open Archives Initiative (OAI) Protocol for Metadata Harvesting facilitates this exchange by defining a mechanism for harvesting XML-formatted metadata from repositories.[4] OAI compatibility is thus an essential requirement of the K-MODDL project.
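To make the harvesting exchange concrete, here is a minimal sketch of what an OAI harvester does: it issues a `ListRecords` request for unqualified Dublin Core (`oai_dc`) and reads identifiers and titles out of the XML response. The base URL, the record identifier, and the sample response are illustrative assumptions, not the actual K-MODDL or NSDL endpoints; only the verb, the `metadataPrefix` argument, and the namespace URIs come from the OAI protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the article does not give the repository's OAI base URL.
BASE_URL = "https://example.org/oai"

# Namespace URIs defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester sends to fetch all records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

# Made-up response for illustration only; a real one comes from the repository.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:model-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Peaucellier straight-line mechanism</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

def parse_records(xml_text):
    """Extract the identifier and first title of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records

print(list_records_url(BASE_URL))
print(parse_records(SAMPLE_RESPONSE))
```

A production harvester would also follow the protocol's resumption tokens for large result sets; that is omitted here for brevity.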

Representing Kinematic Motion

In his article "Developing a 3D Digital Library for Spatial Data," Jeremy Rowe describes topological modeling techniques that permit precise analysis of an artifact’s surface and volume based on its digital representation.[5] The K-MODDL project has taken a different approach to representing three-dimensionality in a two-dimensional medium. For this project the principal spatial quality that must be represented is three-dimensional kinematic motion. We concluded that interactive photographic animations coupled with abstract simulations of the mechanisms suffice to illustrate this factor in most cases.

Interactive Moving Images

The K-MODDL team is producing navigable movies that show the kinematic performances of the model machines. A navigable movie is a collection of still images that appear in a sequence determined by the motion of a user interface device, such as a mouse. This allows the display of images in a user-controlled way. For example, if the images depict an object viewed from various viewpoints revolving around the object, then moving the cursor will create the illusion of spinning the object, making the experience three-dimensional. The navigable movies in K-MODDL either depict a machine from various viewpoints, so it appears to spin about its center when the movie is navigated, or they portray various stages in its kinematic motion, so that the machine appears to function when the movie is navigated. This allows the user to go back and forth and examine the kinematic causality in detail.
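The core of a navigable movie is a mapping from cursor position to frame number. The sketch below is one plausible way to write that mapping, not the project's actual viewer code; the function name and parameters are hypothetical. Wrapping suits a movie that spins an object through a full turn, while clamping suits a movie that steps through a mechanism's stages of motion.

```python
import math

def frame_for_cursor(cursor_x, view_width, frame_count, wrap=True):
    """Map a horizontal cursor position to the index of the still image to show.

    wrap=True  : positions past either edge cycle around (spinning an object).
    wrap=False : positions past either edge stick to the first or last frame.
    """
    if view_width <= 0 or frame_count <= 0:
        raise ValueError("view_width and frame_count must be positive")
    index = math.floor(cursor_x * frame_count / view_width)
    if wrap:
        return index % frame_count
    return max(0, min(frame_count - 1, index))

# 36 stills taken at 10-degree intervals, shown in a 360-pixel-wide viewer.
for x in (0, 90, 180, 359):
    print(x, frame_for_cursor(x, 360, 36))
```

Because the frame depends only on where the cursor is, moving the mouse backward replays the motion in reverse, which is what lets a user step back and forth through the kinematic sequence.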

A sequence of snapshots illustrating the motion of a spiral pump as the user slides the mouse

Kinematic Simulations

Still images and movies demonstrate the functionality of the machine but often obscure the pure kinematic motion associated with it. We have therefore developed a number of kinematic simulators to illustrate the geometric motion first hand. Moreover, a simulator allows users to interact with the machine, pushing and pulling in unscripted ways, modifying it and observing the consequences, and even breaking it.

The simulator is written as an applet so that it executes on the user’s computer and is thus fast and responsive. It simulates propagation of forces and motion using a relaxational algorithm. The user interacts with the machine using a "rubber band" that intuitively translates displacement to force. The machine moves, and any overloaded links change colors to red or blue depending on whether they are in tension or compression, respectively. The users can modify link lengths, connect and disconnect bars, and remove or modify grounding points, thereby creating inversions. They can also erase a machine and build a different one from scratch.
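The article does not describe the applet's algorithm beyond calling it relaxational, so the following is only a sketch of one common technique of that kind: iterative constraint relaxation, in which each bar in turn nudges its endpoints until its length is restored. All names are hypothetical. The `pull` function stands in for the "rubber band," and `strain` gives the signed quantity a display could color (positive for tension, negative for compression).

```python
import math

def pull(points, node, target, stiffness=0.5):
    """Rubber band: move `node` toward `target` in proportion to the stretch."""
    x, y = points[node]
    points[node] = [x + stiffness * (target[0] - x),
                    y + stiffness * (target[1] - y)]

def relax(points, bars, pinned, iterations=200):
    """Repeatedly correct each bar's length, leaving pinned (grounded) nodes fixed.

    points: dict of node name -> [x, y]
    bars:   list of (node_a, node_b, rest_length)
    pinned: set of node names that may not move
    """
    for _ in range(iterations):
        for a, b, rest in bars:
            dx = points[b][0] - points[a][0]
            dy = points[b][1] - points[a][1]
            dist = math.hypot(dx, dy)
            if dist == 0:
                continue  # direction undefined; skip this bar for now
            share_a = 0.0 if a in pinned else 1.0
            share_b = 0.0 if b in pinned else 1.0
            total = share_a + share_b
            if total == 0:
                continue  # both ends grounded; nothing can move
            error = (dist - rest) / dist
            points[a][0] += dx * error * share_a / total
            points[a][1] += dy * error * share_a / total
            points[b][0] -= dx * error * share_b / total
            points[b][1] -= dy * error * share_b / total
    return points

def strain(points, bar):
    """Signed relative length error: > 0 means tension, < 0 means compression."""
    a, b, rest = bar
    dist = math.hypot(points[b][0] - points[a][0], points[b][1] - points[a][1])
    return (dist - rest) / rest

# A two-bar chain grounded at O; the user drags the free end B to the right.
points = {"O": [0.0, 0.0], "A": [1.0, 0.0], "B": [1.0, 1.0]}
bars = [("O", "A", 1.0), ("A", "B", 1.0)]
pull(points, "B", (2.0, 1.0))
relax(points, bars, pinned={"O"})
print(points)
```

Removing a name from `pinned` or editing a rest length in `bars` corresponds to the modifications the article describes users making, such as changing link lengths or grounding points.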

Simulator for exploring kinematics in two dimensions: (a) user exploring a sketched 4-bar mechanism, (b) drawing a Peaucellier straight-line mechanism, and (c) testing its performance

“Printing” in Three Dimensions[6]

What cannot be experienced with a digital collection is the physical handling of the models. Physical models of machines were prevalent in exhibitions and universities in the nineteenth and early twentieth centuries; today their role is largely filled by computer-aided design (CAD) models and simulations. These computational models are more versatile and of lower cost, but they lack the physical embodiment that is essential for an intuitive appreciation of many critical concepts of motion and force.

Members of the K-MODDL team are using current rapid-prototyping technology to reproduce physical models as three-dimensional “prints” from digital files. The replicas are based on CAD drawings of the Reuleaux models, captured in stereolithography (STL) format. STL files can be exported for printing on a rapid-prototyping fabricator. This process creates a sequence of thermoplastic layers from a filament-wound coil that is heated and extruded through a nozzle. In order to create functioning mechanisms, a second, water-soluble release material is placed in the gaps between the movable parts.
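For readers unfamiliar with STL, the format describes a solid purely as a list of triangles, each with a surface normal. The sketch below writes the plain-text (ASCII) variant of STL for a simple tetrahedron. It is an illustration of the file layout only, with hypothetical function names; the project's files come from CAD drawings of the Reuleaux models, not from code like this.

```python
def facet_normal(v0, v1, v2):
    """Unit normal of a triangle, from the cross product of two of its edges."""
    ux, uy, uz = (v1[i] - v0[i] for i in range(3))
    vx, vy, vz = (v2[i] - v0[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    if length == 0:
        return (0.0, 0.0, 0.0)  # degenerate triangle
    return (nx / length, ny / length, nz / length)

def to_ascii_stl(name, triangles):
    """Serialize triangles (each a tuple of three (x, y, z) vertices) as ASCII STL."""
    lines = [f"solid {name}"]
    for tri in triangles:
        normal = facet_normal(*tri)
        lines.append("  facet normal {:e} {:e} {:e}".format(*normal))
        lines.append("    outer loop")
        for vertex in tri:
            lines.append("      vertex {:e} {:e} {:e}".format(*vertex))
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines) + "\n"

# Corner points of a unit tetrahedron; each face lists its vertices
# counter-clockwise as seen from outside, so the normals point outward.
O, X, Y, Z = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
TETRAHEDRON = [(O, Y, X), (O, X, Z), (O, Z, Y), (X, Y, Z)]

print(to_ascii_stl("tetra", TETRAHEDRON))
```

A working mechanism is simply a much larger set of such triangles; the gaps between movable parts are what the fabricator fills with the release material.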

STL files for several of the Reuleaux models will be available at the K-MODDL site, allowing users with access to rapid-prototyping equipment to download, 3D-print, and interact with their own fully functional physical replicas. While the audience for this part of the collection is clearly limited to a few large research facilities, the project team expects that as rapid-prototyping becomes more commonly available, such forms of documentation will become increasingly prevalent. Meanwhile, this technology is already reproducing accurate historical kinematic models as tools for both teaching and artifact conservancy.

The team has reproduced several pre-assembled, fully functional mechanisms; a sample of a clock escapement is shown in the figure below.

A clock escapement mechanism: (a) original Reuleaux model, (b) rapid-prototype model

Reading and Writing Kinematics

Along with the machine images and their descriptions, K-MODDL will be a rich source of text materials pertaining to kinematics and the history and theory of mechanisms and machines. The collection will include original scholarship by project team members and others in the form of preprint articles, book chapters, and the like, as well as historical books, digitized in their entirety, and tutorials that model ways of using the collection’s resources in the classroom.

Historical Resources

The K-MODDL project has selected fifty books and other print documents for digitization and inclusion in the collection. These stem principally from the nineteenth and early twentieth centuries; some are older, and several are rare titles from Cornell Library’s History of Science Collection. These items will constitute a freely accessible and searchable digital collection of the historical literature of kinematics and the theory of machines. The materials are being scanned in a nondestructive process and stored as 600 dpi TIFF image files backed by searchable OCR’d text. K-MODDL will display PDF versions in a reader similar to the pages in other Cornell retrodigitization projects, such as the Making of America collection.

Learning Modules

The project team is developing several tutorials or learning modules that will aid instructors at various educational levels in integrating K-MODDL materials into their students’ curricula. At release in summer 2004, K-MODDL will include tutorials suitable for students in undergraduate, high school, and middle school mathematics and technology classes, as well as for undergraduate education in engineering design and the history of technology. The high school and middle school tutorials are being developed in collaboration with teachers from several school districts in Ithaca, New York, and surrounding areas. This direct contact with members of a key target audience has been crucial in ensuring that the materials developed for the K-MODDL collection are understandable, useful, and usable in as many different ways as possible. It is the aim of the project to encourage other educators to produce tutorials of their own from the K-MODDL source material and submit them for possible inclusion in the collection.

Technical Issues

Mathematics on the Web

The display of mathematical symbols and equations presents a challenging implementation issue, not only for K-MODDL, but for any Web-based mathematics project. To date, there is no accepted standard for mathematical notation in HTML. The World Wide Web Consortium (W3C) has issued a recommendation for MathML, an XML application for mathematics, but it is not yet widely implemented. The remaining option is to reproduce the equations and symbols as embedded graphics files, which in some implementations appear fuzzy or otherwise difficult to read. All of these issues can be avoided using PDF files, but at the sacrifice of the flexibility of HTML pages. The project team is still wrestling with how best to display mathematics on the K-MODDL site.

Metadata Creation

The K-MODDL project team includes subject experts—mechanical engineers and mathematicians—who are acting as descriptive catalogers for the objects in the collection. The project team is working with Digital Consulting and Production Services, Cornell University Library’s internal consulting unit, to build a rich, Qualified Dublin Core-based metadata schema to drive collocation and retrieval. The team plans to declare an XML namespace that would establish qualifying element and attribute names for describing mechanical objects.
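As a rough illustration of what a Qualified Dublin Core record with a project-specific namespace could look like, the sketch below builds one with Python's standard XML library. The `dc` and `dcterms` namespace URIs are the published Dublin Core ones. The `mech` namespace, its URI, and the `mechanismType` element are invented for this example and are not the schema the team is designing.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element set
DCTERMS = "http://purl.org/dc/terms/"        # Dublin Core qualifiers/terms
MECH = "https://example.org/mechanism-terms/"  # hypothetical project namespace

# Register readable prefixes so the serialized XML uses dc:, dcterms:, mech:.
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("mech", MECH)

def build_record(title, is_part_of, mechanism_type):
    """Return an XML string describing one model with core and local elements."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}title").text = title
    ET.SubElement(record, f"{{{DCTERMS}}}isPartOf").text = is_part_of
    # Hypothetical qualifying element for describing mechanical objects.
    ET.SubElement(record, f"{{{MECH}}}mechanismType").text = mechanism_type
    return ET.tostring(record, encoding="unicode")

print(build_record(
    "Peaucellier straight-line mechanism",
    "Reuleaux Collection of Mechanisms and Machines",
    "straight-line mechanism",
))
```

Keeping the local elements in their own namespace means a harvester that understands only Dublin Core can ignore them without misreading the record.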

K-MODDL contains several types of digital object: documents in a variety of formats, bibliographic records, images, and audiovisual formats, as well as multiformat, multipage learning modules, Java-based simulations, etc. Many of these relate in a one-to-one relationship to a specific model, and the concept of the "model" serves as a base element from which the related components inherit certain descriptive metadata.

Repository Development

The project team deliberated at length on the question of a database structure that would be suitable for the K-MODDL collection. Two proprietary solutions for Internet-based image management and delivery already in use at CUL were considered, as was the University of Wisconsin’s open-source Scout Portal Toolkit. The latter has been developed in close association with the NSDL and is promising where a flat database structure and URL aggregation meet a project’s needs. As K-MODDL progressed, it became clear that a relational structure was needed and the decision was made to pursue a homespun solution.

Having established the metadata we will share with other systems via the NSDL Metadata Repository and OAI, we are now mapping metadata requirements to fields and tables in a relational database management system, using a MySQL database server. Our user interfaces for database administration, as well as for searching and browsing the data, are coded in PHP; in a later phase of development, PHP will also be used to develop the OAI interface for metadata harvesting. The K-MODDL Web server is Apache. MySQL, PHP, and Apache are robust, widely used, open source technologies. The use of open source software both technically and philosophically supports our mission to build a repository that is secure, reliable, interoperable, and sustainable over time.
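The idea of a "model" record from which related objects inherit descriptive metadata maps naturally onto two related tables. The sketch below shows one possible arrangement; it uses Python's built-in SQLite module as a stand-in for the MySQL and PHP stack named above, and every table name, column name, and row is a hypothetical example rather than the project's actual schema.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with one model and two related resources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model (
            id      INTEGER PRIMARY KEY,
            title   TEXT NOT NULL,
            subject TEXT NOT NULL
        );
        CREATE TABLE resource (
            id       INTEGER PRIMARY KEY,
            model_id INTEGER NOT NULL REFERENCES model(id),
            kind     TEXT NOT NULL,   -- e.g. image, movie, simulation, STL file
            subject  TEXT             -- NULL means: inherit from the model
        );
    """)
    conn.execute(
        "INSERT INTO model (id, title, subject) VALUES (?, ?, ?)",
        (1, "Peaucellier straight-line mechanism", "straight-line mechanisms"),
    )
    conn.executemany(
        "INSERT INTO resource (id, model_id, kind, subject) VALUES (?, ?, ?, ?)",
        [(1, 1, "navigable movie", None),
         (2, 1, "simulation", "kinematic simulation")],
    )
    return conn

def describe_resources(conn):
    """List each resource with its model's title and its effective subject."""
    return conn.execute("""
        SELECT resource.kind,
               model.title,
               COALESCE(resource.subject, model.subject) AS subject
        FROM resource
        JOIN model ON model.id = resource.model_id
        ORDER BY resource.id
    """).fetchall()

for row in describe_resources(build_demo_db()):
    print(row)
```

The join is what a flat structure cannot express: the movie carries no subject of its own and picks it up from its model, while the simulation overrides it.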

Ease of Use

Though a functional prototype of the K-MODDL repository is not yet complete, the project team is working to identify target audiences and their characteristics as Web users (for example, bandwidth limitations, technical ability, system limitations, etc.). K-MODDL is a pedagogical space designed for use by teachers and researchers, as well as students at a range of educational levels, and other young and adult learners in a variety of environments; the project team cannot assume that all our end users possess fast connections to the Internet, up-to-date computers, and advanced computer skills. Special care is being taken to assure that all target audiences will find K-MODDL easy to use and valuable as a learning resource.

K-MODDL will be a resource of lasting value for interdisciplinary teaching and research in machine design, mathematics, the history of science, and other fields. The project breaks new ground in the scholarly application of multimedia techniques on the Web and offers new ways of approaching representations of movement and three-dimensionality in digital libraries. While much work remains to be done, we hope that the project can offer lessons and perhaps even templates that can be applied to digital library work in other areas.

Acknowledgments

Thanks to Paul White, Susan Peck, Kathryn Gelsone, Jimmy Hai, and Carlo Paventi for assisting in modeling, photographing, and simulating the exhibits and to Javier Lezaun and David Caruso for project research.

Footnotes

[1] The K-MODDL Web site contains project information, news, and pertinent links. Samples and demonstrations from the collection are added and updated periodically. The full collection will debut in summer 2004.

[2] Francis C. Moon, “Franz Reuleaux: Contributions to 19th C. Kinematics and Theory of Machines,” Cornell University Library Technical Reports and Papers and Applied Mechanics Reviews 56.2 (Mar. 2003): 261-285.

[3] On the NSDL, see Lee L. Zia, “The NSF National Science, Technology, Engineering, and Mathematics Education Digital Library (NSDL) Program: New Projects and a Progress Report,” D-Lib Magazine 7.11 (Nov. 2001).

[4] For more information on the NSDL’s prescription and use of metadata, see the NSDL Metadata Primer. The NSDL Communications portal has more information about the architecture of the metadata repository, as well as other technical and social aspects of the NSDL.

[5] Jeremy Rowe, "Developing a 3D Digital Library for Spatial Data: Issues Identified and Description of Prototype," RLG DigiNews 6.5 (15 Oct. 2002).

[6] On 3D-printing technology and its application in K-MODDL, see Hod Lipson, Francis C. Moon, Jimmy Hai, Carlo Paventi, “3D-Printing the History of Mechanisms,” Cornell University Library Technical Reports and Papers.


The Cost to Preserve Authentic Electronic Records in Perpetuity: Comparing Costs across Cost Models and Cost Frameworks

Shelby Sanett
Amigos Library Services, Inc

“Within the U.S. and elsewhere, funding agencies are advancing digital preservation as a serious research area. Digital preservation projects and cooperative international efforts have increased significantly over the past decade. Examples include the US National Science Foundation (NSF), collaborative international programs with the UK Joint Information Systems Committee (JISC), with the Deutsche Forschungsgemeinschaft (DFG), and with the European Union (EU); and the international InterPARES Project, which has received funding from a number of countries. These have spurred the development of an interdisciplinary domain that has as its primary goal ensuring long-term access to materials in digital format for legal, economic, and cultural purposes. This domain unites the interests of librarians, archivists, museum specialists, and other preservation professionals with digital object creators, computer scientists, lawyers, publishers and others. The issues cut across government, non-profit, commercial, and academic sectors.”

Editor's Note, RLG DigiNews, October 15, 2002

The economics of digital preservation underlies many projects and programs exploring how to identify and resolve various practical and theoretical problems of preserving digital objects. There is a need to provide scalable, workable solutions quickly. Large numbers of born-digital and born-again electronic records and materials require immediate attention,[1] and more will be produced over time. Along with technological advances goes a responsibility on the part of the creators and the preservers to develop both an economic framework and a context within which these processes can assure continuing access to information preserved in electronic form.

This paper explores issues of cost modeling and proposes a possible methodology to evaluate costing frameworks and models to preserve authentic electronic records. The methodology could be adapted by institutions interested in the costs of the preservation strategy under consideration. For the purposes of this paper, the term electronic materials will refer to authentic electronic records in born-digital or born-again (reformatted) digital form.

Currently several research projects and institutional initiatives are investigating a broad spectrum of issues in preserving electronic materials. The emphasis in research so far has been on the development of software and hardware to support the implementation of long-term preservation strategies. Significant funding has been provided to various projects to assess whether and how authentic electronic records can be preserved and to address other questions that have arisen from previous research. Assuming there are workable strategies for maintaining digital information, I believe we must now consider how to evaluate costing strategies, develop policies to ensure continued preservation and access, and formulate other long-term mechanisms for digital preservation.

Rationale for a Proposed Methodology to Evaluate Cost Models

Cost models facilitate an informed decision-making process. Over the past several years a number of cost models and costing frameworks for the preservation of electronic materials have been advanced that consider a variety of ways to determine the full extent of the costs, including possible hidden ones. Some relate costs to the life cycle of the records (Hendley),[2] the OAIS model, or a particular project[3]; identify elements of the digital preservation process; or otherwise attempt to determine categories of costs.

It is expected, however, that the full costs of preserving electronic records will be high and will extend over a long period. Therefore it is particularly important for decision makers to use a methodology to evaluate the various frameworks and models, because they must have information that is as specific as possible. This information will support the choice of a preservation strategy (indicative of the full range of costs) or suite of strategies appropriate to a particular institution, its mission, and anticipated use of the materials.

An evaluative process is needed that can be applied when the decision-making process has begun. In the end this process should facilitate making an appropriate choice from among the cost models. So far, such a methodology to evaluate across models has not been addressed in the literature.

The requirements for an evaluative strategy of this type are complex. The proposed methodology must be flexible enough to be applied to a broad spectrum of extant models, yet credible so that the results have merit. It must be applicable to costing frameworks and models not yet developed. As well, the methodology should be user friendly and easy to apply. A daunting prospect.

A Proposed Methodology to Evaluate Cost Models

For an evaluative methodology in this area to be effective, it must be straightforward. Costing models and frameworks can be evaluated in terms of (1) acquisition and preservation-related activities, and (2) access-related activities.

Earlier I proposed a cost framework that includes three categories: (1) Costs for Preserving Electronic Records (table 1), which include capital costs, direct operating costs, and indirect operating costs; (2) Costs for Use (table 2), which are costs associated with the continued institutional use of the preserved records; and (3) User Populations (table 3), which provides information relating to access and the users’ use of the records. This activity includes gathering various types of information that could then be used to provide access and user services.

Costing categories were then established in the first two categories in combination with the preservation process model developed by the Preservation Task Force of the InterPARES 1 Project.[4] The components of the activity categories may shift as necessary in the future, but the cost categories themselves are consistent with generally accepted accounting principles.

Table 1

Costs for Preserving Electronic Records

Part 1. Capital Costs

  • Software development
  • Hardware (for preservation processing)
  • Research and development
  • Facilities
  • Interface design for processing electronic records

Part 2. Direct Operating Costs

  • Identify potential records
  • Evaluate/Examine (negotiate intellectual property issues and rights)
  • Acquire records (staff and purchase or royalty payment)
  • Establish inventory record
  • Process (prepare for preservation, confirm authenticity/integrity of record)
  • Produce metadata
  • Preserve (select and implement appropriate strategy)
  • Storage (container/other)
  • Maintenance (refresh/migrate)
  • Monitor
  • Evaluate

Part 3. Indirect Operating Costs (Overhead)

  • Indirect staff (supervision, clerical support, benefit times, training times, unallocated times)
  • Facilities (rent, utilities, off-site storage of records)
  • Amortization of capital costs
  • General and administrative (human resources, accounting, funding development and grant writing, staff training and professional development, partnerships with other institutions, policy development)

Table 2

Costs for Use of Preserved Electronic Records

Part 1. Capital Costs

  • Equipment, software, user training, facilities, interface design, etc.

Part 2. Direct Operating Costs

  • Storage, royalties, communications, record access mechanisms
  • Staff for monitoring, user query response and services, records access management

Part 3. Indirect Operating Costs (Overhead)

  • Indirect staff, facilities, amortization of capital costs, general and administrative

Table 3

User Populations
Part 1. Mission statement, legal mandate
Part 2. Target user population
Part 3. Unintended audience, i.e., as a result of exposure to records on the Web

Part 4. User statistics

A) The capital costs for preserving electronic records (table 1, part 1) are costs incurred at the beginning. They must be amortized over a time period, such as five years, that can then be used as the period for present value calculations.
B) Indirect and direct operating costs for preserving electronic records (table 1, parts 2 and 3) are costs incurred on a yearly basis. They should be brought to present value (the current value of a sum of money expected to be received in the future). The period of five years is suggested because the magnitude of the investment in hardware and software is great enough to justify replacement at five years, rather than earlier.
C) The sum of A) and B) is the total cost for preserving electronic records brought to present value. The cost per item preserved is (A+B)/(total number of items preserved).
D) Operating costs for the use of preserved electronic records (table 2) are incurred on a yearly basis. These costs should be brought to present value.
E) The sum of C) and D) is the total present value for preservation and use of electronic records. The cost per use is (C+D)/(total use of electronic records over five years [or the period used for present value calculations]).

To apply the proposed evaluative methodology, acquisition and preservation-related activities would include the following:

Table 4

Costs for Acquiring and Preserving Electronic Records

Part 1. Capital Costs
  • Software development
  • Hardware (for preservation processing)
  • Research and development
  • Facilities
  • Interface design for processing electronic records

Part 2. Direct Operating Costs
  • Identify potential records
  • Evaluate/Examine (negotiate intellectual property issues and rights)
  • Acquire records (staff and purchase or royalty payment)
  • Establish inventory record
  • Process (prepare for preservation, confirm authenticity/integrity of record)
  • Produce metadata
  • Preserve (select and implement appropriate strategy)
  • Storage (container/other)
  • Maintenance (refresh/migrate)
  • Monitor
  • Evaluate
  • Delete

Part 3. Indirect Operating Costs (Overhead)
  • Indirect staff (supervision, clerical support, benefit times, training times, unallocated times)
  • Facilities (rent, utilities, off-site storage of records)
  • Amortization of capital costs
  • General and administrative (human resources, accounting, funding development and grant writing, staff training and professional development, partnerships with other institutions, policy development)

Costs associated with access-related activities, including the institution’s own use, would include:

Table 5

Costs for Institutional Use/Outside Access of Preserved Electronic Records
Part 1. Capital Costs for Use

  • Equipment, software, user training, facilities, interface design, etc.
Part 2. Direct Operating Costs for Use

  • Storage, royalties, communications, record access mechanisms
  • Staff for monitoring, re-appraising records with each new migration, deleting records, user query response and services, records access management
Part 3. Indirect Operating Costs for Use

  • Indirect staff, facilities, amortization of capital costs, general and administrative
Part 4. Mission statement, legal mandate
Part 5. Target user population
Part 6. Unintended audience, i.e., as a result of exposure to records on the Web
Part 7. User statistics

Thus the categories referenced in tables 1, 2, and 3 have been reduced to two tables (4 and 5) when an activity-driven evaluative methodology of those categories is applied. The discussion of the process to arrive at costs to acquire, preserve, and access the records is revised as follows:

A = The capital costs for preserving electronic records (table 4, part 1) are costs incurred at the beginning. They must be amortized over a time period, such as five years, which can then be used as the period for present value calculations. This section remains the same.
B = Indirect and direct operating costs for preserving electronic records (table 4, parts 2 and 3) are costs which are incurred on a yearly basis. They should be brought to present value (the current value of a sum of money expected to be received in the future). To remain consistent with the original framework, the period of five years is suggested. This section also remains the same.
C = The sum of A and B is the total cost for preserving electronic records brought to present value. The cost per item preserved is (A+B)/(total number of items preserved). This section also remains unchanged.
D = Costs for institutional use/outside access of preserved electronic records (table 5, parts 1, 2, and 3) are incurred on a yearly basis. These costs should be brought to present value.
E = The sum of C and D is the total present value for acquisition, preservation, and access to electronic records. The cost per use is (C+D)/(total use of electronic records over five years [or the period used for present value calculations]).
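The steps A through E can be worked through numerically. The sketch below is one reading of them, under assumptions the article does not state: capital costs are incurred at the outset and so are taken at face value, yearly costs fall at the end of each year, and a single discount rate applies throughout. The function names, the discount rate, and all figures are hypothetical.

```python
def present_value(yearly_costs, rate):
    """Discount a series of end-of-year costs back to today's value."""
    return sum(cost / (1 + rate) ** year
               for year, cost in enumerate(yearly_costs, start=1))

def evaluate(capital, preserve_yearly, use_yearly, rate, items_preserved, total_uses):
    """Compute quantities A through E and the two unit costs.

    capital         : capital costs for preservation (table 4, part 1)
    preserve_yearly : direct + indirect operating costs per year (table 4, parts 2-3)
    use_yearly      : costs for institutional use/outside access per year (table 5)
    rate            : assumed annual discount rate, e.g. 0.05
    """
    a = capital                                # A: incurred up front (assumption)
    b = present_value(preserve_yearly, rate)   # B: operating costs at present value
    c = a + b                                  # C: total cost of preservation
    d = present_value(use_yearly, rate)        # D: use/access costs at present value
    e = c + d                                  # E: preservation plus access
    return {
        "A": a, "B": b, "C": c, "D": d, "E": e,
        "cost_per_item": c / items_preserved,
        "cost_per_use": e / total_uses,
    }

# Illustrative figures only, over the suggested five-year period.
result = evaluate(
    capital=1000.0,
    preserve_yearly=[100.0] * 5,
    use_yearly=[50.0] * 5,
    rate=0.05,
    items_preserved=100,
    total_uses=500,
)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Running the same function over the cost categories extracted from two different models, with the same period and rate, is what makes their totals comparable.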

This evaluative strategy can be applied to extant and future cost models to determine the costs to be incurred to preserve electronic materials. Using this activity-driven methodology, one can compare similar categories of costs across the cost models and frameworks and against the context of a particular preservation strategy or suite of preservation strategies being examined. In other words, cost-related decisions can be made in the context of other models as well as according to the requirements of a particular preservation strategy.

The methodology can also be customized to a particular institution by adding or deleting appropriate components of the categories. For example, an institution may determine that it must delete records or re-appraise records with each new migration and consider each of these actions to be an institutional use activity. The activity can be added to the direct operating costs in table 5 and would include the cost (planned or actual) allocated for staff to accomplish the task. If the institution determined that costs for the activity were associated with the acquisition and preservation of electronic records, the cost category would be added to table 4, part 2 (direct operating costs).

When the methodology is applied, it may in fact turn out to be less expensive for an institution to continue to maintain the records if its mission permits. Not all the costs in this exercise are allocated evenly. Individual institutions, knowing their own priorities and situation best, would choose where to allocate a number of the capital and indirect costs. However, this approach is a step toward identifying commonalities among the extant models. Having that information should result in more-informed choices on the part of the decision makers.

How Does This Fit into the Larger Picture?

Who will pay? Now is the time to develop frameworks against which cost, policy, and other issues may be examined and answered. It is clear that the soft-funding scenario of the past and present is not sufficient to fund present and projected activities to preserve electronic materials. The issue of institutional sustainability in preservation must be discussed and resolved. Who will pay for the costs involved with acquiring, preserving, and accessing the materials? A number of strategies have been proposed, some of which are continued institutional support, fee for use, fee from the author, fee from the publisher, and legislative support. Infrastructure funding should be explored as well, to determine whether strategies may be successfully applied by other institutions, e.g., in universities, to determine how computer networking and other costs are paid or funded. Not all of these are possible solutions for all institutions. If institutions had a realistic idea of costs, they could plan accordingly. A cost model makes intelligent planning possible.

We must develop a strategic plan for the future to fund the long-term preservation of the world’s digital and born-again digital materials. This plan should include preservation process models; costing frameworks; preservation policies; a financial, organizational, and economic infrastructure to support ongoing preservation efforts; a pedagogical platform to train future preservation administrators; a centralized funded agency to coordinate these activities; and a blueprint to develop a model of coordinated cross-institutional cooperation and regional repositories.

Footnotes

[1] Born-digital electronic records refer to those that originated in electronic form; born-again digital electronic records refer to those that originated in an analog form and were subsequently transformed, e.g., reformatted, into digital form.

[2] In appendix 3 of his paper, Hendley provides a Table of Digital Preservation Cost Elements compiled by Neil Beagrie, Daniel Greenstein, and the Arts and Humanities Data Service.

[3] Sanett, Shelby. “Toward Developing a Framework of Cost Elements for Preserving Authentic Electronic Records into Perpetuity,” College & Research Libraries 63(5) (September 2002): 388-404.

[4] Both the US team and the International team have Web sites. The InterPARES 1 Project is an international research initiative that involves national archives, university archives, and various government agencies working together with industry representatives and a team of academic researchers in archival science, preservation, and computer science to address important issues of permanent preservation of authentic electronic records. The mandate of the InterPARES 1 Project was to investigate and develop theoretical frameworks, methodologies, and prototype systems. The InterPARES 1 Project focused on the permanent preservation of inactive electronic records, that is, records that are no longer needed for day-to-day business activity but need to be preserved for administrative, legal, or historical reasons. Examples of such records include organizational records, legal records, and research data. Among the electronic forms these records might take are ASCII text files, graphics, video and audio material, moving graphics, e-mail with attachments, materials incorporated into a database management system, and PDF viewer materials. The InterPARES 2 Project is currently under way.


Highlighted Web Site

The Museum of Hoaxes

This site features a well-documented gallery of photo hoaxes, as well as a detailed list of some of the more-famous and infamous Web site hoaxes. Museum curator Alex Boese provides insightful comments, along with descriptions of how information was manipulated to make the sites appear credible. A few moments of viewing some of these elaborate hoaxes will cause anyone to give a second thought to issues of authenticity and trust in the digital realm. In case you have ever wondered about the authenticity of the “British Stick Insect Foundation” or the “National Driver's License Search,” the museum will give you the facts . . . unless, of course, the museum itself is a hoax.



FAQ

XML/XSLT-Mediated File Format Migration as a Digital Preservation Strategy

I've heard of XML being used for file formats such as SVG (Scalable Vector Graphics) and OpenOffice and to encode metadata. Can XML also be used to transform one file format into another?

This issue's FAQ is answered by Christopher Hamilton, Programmer/Analyst in the IRIS Research Department, Cornell University Library.

Taking Responsibility and Control of Your Data

Extensible Markup Language (XML) allows for the markup of information in a standard vocabulary free from control of proprietary software vendors. It has been said, “XML is shifting the balance of power from software vendors to software users” (Tidwell) because XML and its related family of technologies are open, standards-based, and platform-neutral. Metadata embedded in platform-neutral XML documents could outlive the original digital object or storage medium. However, to preserve metadata in XML documents is only one way to harness the power of XML for digital preservation. The XML standard for transforming XML documents into different formats is Extensible Stylesheet Language (XSL).

What Is XSL?

XSL consists of a family of three World Wide Web Consortium (W3C) recommendations. These three technologies together form a set of tools for writing style sheets that transform XML documents and format them for display. XSL Transformations (XSLT) is the general-purpose tag-based language that defines rules for transforming an XML document into another XML document (such as XHTML or SVG), HTML, or plain text; binary formats such as PDF or JPG are typically produced by a further rendering step, for example an XSL-FO formatter for PDF. XML Path Language (XPath) is the path expression language that is used with multiple XML standards for locating nodes within, and traversing the tree structure of, an XML document. XPath is embedded within an XSLT style sheet, and together they instruct the XSLT processor to transform an XML document. The other part of XSL is XSL Formatting Objects (XSL-FO), another tag-based language that adds advanced styling features and is optimized for describing the page layout of print and Web documents.
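
To make the idea of a path expression concrete, here is a minimal Python sketch. It is not XSLT: it uses the standard library's ElementTree module, which supports only a subset of XPath, and a hand-written function stands in for a transformation rule. The sample document, its element names, and the function names are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small, invented XML document describing two archived objects.
SAMPLE = """
<collection>
  <object format="TIFF"><title>Map of Ithaca</title></object>
  <object format="PDF"><title>Annual report</title></object>
</collection>
"""


def titles_in_format(root, fmt):
    """Use a path expression to select titles of objects in one format."""
    path = f"./object[@format='{fmt}']/title"
    return [title.text for title in root.findall(path)]


def to_xhtml_list(root):
    """Hand-written stand-in for a transformation rule: one <li> per object."""
    ul = ET.Element("ul")
    for obj in root.findall("./object"):
        li = ET.SubElement(ul, "li")
        li.text = f"{obj.findtext('title')} ({obj.get('format')})"
    return ET.tostring(ul, encoding="unicode")


root = ET.fromstring(SAMPLE)
print(titles_in_format(root, "TIFF"))
print(to_xhtml_list(root))
```

In a real XSLT style sheet the path expression would appear inside a template's match or select attribute, and the processor, rather than hand-written code, would build the output tree.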

XSLT as a Format Migration Tool

With the ability to transform XML documents into other formats, it becomes possible to use XSLT to migrate digital objects to new formats. A typical example is using XSLT to transform an XML document into a PDF or an XHTML document. However, current research is working to extend XSLT to allow for the transformation of multimedia files (images, audio, video) to other formats. For example, XSLT could transform a TIFF file from an archive of multimedia objects into JPEG2000 format for display on the Web.

However, a standard TIFF file is not encoded in an XML-compliant format. Therefore, something new is needed for parsing and generating XML descriptions of bitstreams, the most basic form of a digital object. For this purpose two languages are currently in development: Bitstream Syntax Description Language (BSDL) and Formal Language for Audio-Visual Object Representation (XFlavor). To continue with the above example, BSDL or XFlavor would be used to generate an XML-based bitstream-level description of a TIFF file, which is then transformable via XSLT into the desired JPEG2000 format.

Figure: Bitstream-level image transformation to derivative format (flowchart from TIFF image to JPEG image)

Moreover, using BSDL or XFlavor would not alter or incrementally corrupt the original object over time, thereby helping to ensure its longevity.
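
As a rough illustration of what an XML description of a bitstream might look like, the following Python sketch reads the eight-byte header of a TIFF file (byte-order mark, the number 42, and the offset of the first image file directory) and records it as XML. It does not use BSDL or XFlavor syntax; the element names and the function name are invented for this example, and a real description would cover the whole file rather than the header alone.

```python
import struct
import xml.etree.ElementTree as ET


def describe_tiff_header(data: bytes) -> ET.Element:
    """Return an XML description of the 8-byte TIFF file header.

    Element names are invented for this sketch.
    """
    if len(data) < 8:
        raise ValueError("a TIFF header is 8 bytes long")
    order = data[:2]
    if order == b"II":
        endian = "<"   # little-endian
    elif order == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF magic number")

    header = ET.Element("tiffHeader")
    ET.SubElement(header, "byteOrder").text = order.decode("ascii")
    ET.SubElement(header, "magic").text = str(magic)
    ET.SubElement(header, "firstIfdOffset").text = str(ifd_offset)
    return header


# Hand-built header: little-endian, magic 42, first IFD at byte 8.
sample = b"II" + struct.pack("<HI", 42, 8)
description = describe_tiff_header(sample)
print(ET.tostring(description, encoding="unicode"))
```

The function only reads the input bytes, which mirrors the point above: producing the description leaves the original object untouched.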

Endless Possibilities

Once it’s possible to generate an XML-based description of a multimedia object’s bitstream, an XSLT style sheet could be applied to transform the file into another format. Entire digital collections could be migrated to the latest file format or platform. Tools could also be developed that automate XML/XSLT-based migration of files (i.e., with one XSLT style sheet an entire digital collection could be migrated into a derivative format). Users would no longer need to rely on software that reads obsolete file formats, since obsolete formats would be transformable into more-stable formats with XSLT. While much work remains before such scenarios become real, successful implementation would represent a substantial improvement over file format migration using current tools and techniques.

References

Amielh, M. and Devillers, S. “Bitstream Syntax Description Language: Application of XML-Schema to Multimedia Content Adaptation.” WWW2002: The Eleventh International World Wide Web Conference, Honolulu, Hawaii, USA, May 2002.

Eleftheriadis, A. and Hong, D. “Flavor: A Language for Media Representation.” To appear as chapter four in Handbook of Video Databases: Design and Applications, Furht, B., and Marques, O., Eds. CRC Press, September 2003.

The World Wide Web Consortium. “Extensible Markup Language (XML).”

The World Wide Web Consortium. “The Extensible Stylesheet Language Family (XSL).”

Tidwell, Doug. XSLT, O'Reilly & Associates, Inc. 2001.

Calendar of Events

Digital Resources for the Humanities Conference
August 31-September 3, 2003
Cheltenham, UK

Conference themes include the impact of access to digital resources on teaching and learning, as well as issues related to digital libraries, archives, and museums.

ERPANET Training Seminar: Metadata in Digital Preservation
September 3-5, 2003
Marburg, Germany

The seminar will discuss various perspectives on the use of metadata to facilitate preservation and issues of interoperability, as well as the role of standards and schemas.

In Practice, Good Practice: Fourth Open Archives Forum
September 4-5, 2003
Bath, UK

The event will focus on good practice in the implementation of open archives. A particular theme of the workshop will be the use of the Open Archives Initiative Protocol for Metadata Harvesting [OAI-PMH] in the area of cultural heritage.

Toward a User-Centered Approach to Digital Libraries Conference
September 8-9, 2003
Espoo, Finland

This digital libraries conference will focus on user experience, challenges, evaluation, and open access to scientific publications.

ICHIM03: International Cultural Heritage Informatics Meeting Exploring Cultural Institutions and Digital Technology
September 8-12, 2003
Paris, France

Major themes include cultural institutions and digital publishing, management and technological strategies for digitization of cultural heritage materials, and the dissemination, exploitation, and enrichment of digital assets.

ERPANET Training Workshop: DSpace Installation
September 9-11, 2003
Glasgow, Scotland

Jointly sponsored by the Cambridge-MIT Institute and ERPANET, this workshop provides training and guidance for technical staff of institutions that are considering implementing a digital repository using the “DSpace” software.

Preservation of Electronic Records: New Knowledge and Decision-Making
September 15-18, 2003
Ottawa, Ontario

Cohosted by the Canadian Conservation Institute (CCI), the Library and Archives of Canada (LAC), and the Canadian Heritage Information Network (CHIN), the goal of this symposium is to increase awareness of the issues surrounding digital preservation. Panels will focus on making decisions and finding practical solutions that can be implemented immediately.

Document Management and Document Imaging Course
September 16-18, 2003
Los Angeles, California

The course is free to graduate students in library science, to persons traveling from Africa, and to the native peoples of the United States, Canada, Australia, and New Zealand. Please ask for a scholarship request review.

Safeguarding European Photographic Images for Access (SEPIA)
September 18-20, 2003
Helsinki, Finland

This conference will look at issues of digital preservation, scanning requirements, and other photographic preservation issues.

Digital Library Program Development
September 24-25, 2003
Spartanburg, South Carolina

A two-day SOLINET workshop on developing local digital resources in a responsible, sustainable, and systematic manner.

Thesauri & Taxonomies: An International Conference and Workshop
September 29-30, 2003
London, UK

2003 Dublin Core Conference
September 28-October 2, 2003
Seattle, WA

The conference will provide participants with a forum for interaction with researchers, practitioners, and decision makers concerned with advances in metadata for resource discovery, retrieval, management, and use. This year’s conference theme is Supporting Communities of Discourse and Practice: Metadata Research & Applications.

Digital Preservation Management: Short-Term Solutions to Long-Term Problems
October 20-24, 2003
Cornell University Library, Ithaca, NY

Cornell University Library is offering the second digital preservation training program October 20-24 with funding from the National Endowment for the Humanities. The workshop targets managers at organizations that are facing the digital preservation challenge and highlights the need for the integration of organizational and technological issues to devise an appropriate approach. This limited-enrollment workshop has a registration fee of $750 per participant. Registration is now open for the October workshop. There will be three additional offerings of the workshop in 2004.

Off the Wall and Online: A Preconference to the Museum Computer Network Annual Meeting
November 4-5, 2003
Las Vegas, Nevada

This conference explores digitization for collections management and education in museums and other cultural institutions.

Museum Computer Network Annual Meeting
November 5-8, 2003
Las Vegas, Nevada

SIS 2004: Digital Information Exchange, Pathways to Build Global Information Society
January 21-23, 2004
New Delhi, India

Conference themes include metadata strategy, digitization, metadata formats and standards, digital resources, open access initiatives, and management of digital resources.

International Conference on Digital Libraries: Knowledge Creation, Preservation, Access and Management
February 24-27, 2004
New Delhi, India


Announcements

OCLC and RLG Announce the Formation of PREMIS
The PREservation Metadata: Implementation Strategies working group will address practical aspects of implementing preservation metadata in digital preservation systems.

NISO Announces New Registration Process
The National Information Standards Organization (NISO) is expanding its standards development program by offering a new registration process. The registration process is designed to make specifications and guidelines developed outside the formal consensus process available to a larger community of potential implementers.

ALCTS Metadata Enrichment Task Force (METF)
The final draft of Marcia Bates’s report "Improving User Access to Library Catalog and Portal Information" is now available.

NINCH Symposium Report Released
The report on the April 8 NINCH symposium, The Price of Digitization: New Cost Models for Cultural and Educational Institutions, is now available.

NISO Announces New Thesaurus Initiative
The National Information Standards Organization (NISO) has announced a new initiative to revise the leading standard for thesaurus construction. The new standard will introduce more user-friendly language, update the content to reflect new technology, and expand the scope to a wider variety of producing organizations and content.

Seeking Comments for a Web-based PRONOM
PRONOM is an application developed by the Digital Preservation Department of the UK National Archives for managing information about the file formats used to store electronic records and the software applications used to create and render those formats. The National Archives are currently seeking comments on the user requirements for developing a Web-accessible version of the system.

ERPANET Announces ErpaEPrints
ERPANET, in collaboration with Project Daedalus, announced at IFLA 2003 an ePrints Service for the digital preservation sphere to provide a platform to address the lack of persistence in online references. The service hopes to provide the digital preservation community with the opportunity to build a digital library of e-prints and ensure their long-term accessibility.

LizardTech Acquired by Celartem Technology (PDF)
LizardTech, Inc., the owner of several important digital imaging technologies, including DjVu, MrSID, and Genuine Fractals, is being acquired by Celartem Technology USA, Inc., a subsidiary of Japan's Celartem Technology, Inc. The acquisition follows a period of instability at LizardTech during which staffing was reduced, the CEO departed, and the company became unresponsive to some outside inquiries. In its press release announcing the acquisition, Celartem said, "There are no expected short-term changes to the existing product and technology brands being used by LizardTech." This is Celartem Technology's second major digital imaging acquisition in the past year. In fall 2002, it purchased Extensis, Inc., the maker of the popular digital asset management package Portfolio.

NDIIPP Calls for Applications
The National Digital Information Infrastructure and Preservation Program (NDIIPP) is seeking applications for projects that will advance the nationwide program to collect and preserve digital materials. The "Program Announcement to Support Building a Network of Partners" begins the next phase of the National Digital Information Infrastructure and Preservation Program, which outlines three broad categories of investment projects: practical applications and models, the digital preservation technical architecture, and basic digital preservation research. This program announcement is for projects that focus on the first investment category.


RLG News

Automatic Exposure: Capturing Technical Metadata for Digital Still Images
For most digital files, technical metadata can be seen as a first line of defense against losing access to digital assets and the investment they represent. But thus far, capturing technical metadata has tended to be a manual, time-consuming process, so little actual "capture" is taking place despite the acknowledgment of its importance. As cultural institutions and commercial organizations continue to create libraries of digital still images, it has become increasingly necessary to determine how technical metadata for digital still images can be captured and recorded in an easier and more cost-effective manner.

Automatic Exposure, a new RLG-led initiative, seeks to minimize the cost of technical metadata acquisition and maximize the cultural heritage community's capability of ensuring long-term access to digital assets. The project will engage manufacturers of high-end scanners and digital cameras in a dialog about how their products can automatically capture NISO Z39.87 technical metadata and make it available for transfer into digital repositories and asset management systems. (NISO Z39.87: Technical Metadata for Digital Still Images [Draft Standard for Trial Use] defines a standard, comprehensive set of data elements key to an institution's ability to manage and preserve its digital images.)

The first phase of the initiative - a survey of existing technical metadata practices of cultural heritage institutions - has been completed. Over 100 responses, representing a broad variety of institutions, have been received. Survey responses are now being compiled and analyzed. The survey results will help us identify the stakeholders in the cultural heritage and vendor communities, as well as common technical metadata practices across institutions. The results will also help determine the invitation list for the next phase of the initiative - a joint meeting of vendors and cultural heritage professionals in November 2003.

Potential outcomes of this project include a crosswalk (a mapping between the elements of two standards) for NISO Z39.87 and the industry's consumer standard DIG35, to enable both cultural heritage institutions and vendors to map existing DIG35 data to Z39.87; identification of existing equipment or software that automatically captures and records technical metadata; identification of potential partnerships between the manufacturer and cultural heritage communities; and an opportunity to influence further development of digital camera/scanner products that may better meet the long-term needs of the cultural heritage community.

For more details on this RLG initiative, please visit our website at http://www.rlg.org/longterm/autotechmetadata.html.

 

Publishing Information

RLG DigiNews (ISSN 1093-5371) is a Web-based newsletter conceived by the RLG preservation community and developed to serve a broad readership around the world. It is produced by staff in the Department of Research, Cornell University Library, in consultation with RLG and is published six times a year at www.rlg.org.

Materials in RLG DigiNews are subject to copyright and other proprietary rights. Permission is hereby given to use material found here for research purposes or private study. When citing RLG DigiNews, include the article title and author referenced plus "RLG DigiNews,

." Any uses other than for research or private study require written permission from RLG and/or the author of the article. To receive this, and prior to using RLG DigiNews contents in any presentations or materials you share with others, please contact Jennifer Hartzell (jlh@notes.rlg.org), RLG Corporate Communications.

Please send comments and questions about this or other issues to the RLG DigiNews editors.

Co-Editors: Anne R. Kenney and Nancy Y. McGovern; Associate Editor: Robin Dale (RLG); Technical Researcher: Richard Entlich; Contributor: Erica Olsen; Copy Editor: Martha Crowe; Production Coordinator: Carla DeMello; Assistant: Valerie Jacoski.

All links in this issue were confirmed accurate as of August 15, 2003.

 
   
 