RLG DigiNews
  October 15, 2002, Volume 6, Number 5
ISSN 1093-5371


Table of Contents

Feature Article 1
Developing a 3D Digital Library for Spatial Data: Issues Identified and Description of Prototype, by Jeremy Rowe

Feature Article 2
The National Science Foundation Digital Government Research Program’s Role in the Long-Term Preservation of Digital Materials, by Larry Brandt, Valerie Gregg, and Sue Stendebach

Highlighted Web Site
National Digital Information Infrastructure and Preservation Program (NDIIPP)

FAQ
Reference Linking

Calendar of Events

Announcements

RLG News


 


Developing a 3D Digital Library for Spatial Data:
Issues Identified and Description of Prototype

Jeremy Rowe
Head, Media Development
Information Technology
Partnership for Research in Spatial Modeling
Arizona State University
jeremy.rowe@asu.edu

Figure 1. Polygonal mesh model of Native American ceramic vessel with regions of curvature identified by color

Overview

The increasing power of computing techniques to model complex geometry and to compare models to identify similarities among them has created powerful new capabilities to analyze and interact with data representing three-dimensional (3D) objects. The techniques to model and extract meaning from 3D information create complex data that must be described, stored, and displayed to be useful to researchers. Because two-dimensional (2D) data representations afford a limited view to scientists in related disciplines, the Partnership for Research in Spatial Modeling (PRISM) project at Arizona State University (ASU) developed modeling and analytic tools that raise the level of abstraction and add semantic value to 3D data.

The goals of the project have been to improve scientific communication and assist in generating new knowledge, particularly about natural objects, whose asymmetry makes study using 2D representations insufficient. The tools developed use curvature and topology to help researchers understand and interact with 3D data, thus simplifying the analysis of surface and volume in the representation of an object. The tools automatically extract information about features and regions of interest to researchers; calculate quantifiable, replicable metric data; and generate metadata about the object being studied. To make this information useful to researchers, the project developed prototype interactive, sketch-based interfaces that permit researchers to remotely search, identify, and interact with the detailed, highly accurate 3D models of the objects. The results support comparative analysis of contextual and spatial information and extend research on asymmetric man-made and natural objects.

Background

Digital libraries are in the midst of a significant evolutionary process. Ever since the initial efforts to apply computers to library catalogs and circulation control, the scope and complexity of the technology-assisted interactions between libraries and their users have created both challenges and opportunities. As catalogs were computerized and the Internet began to link libraries, the emphasis of projects shifted to creating linkages between catalogs to let users search across multiple libraries.

Developing standards for communication, search, and display were initial challenges that, once addressed, gave researchers access to library collections around the world. Soon, modem access permitted researchers using home or office computers to search as effectively as at the dedicated terminals in the physical library. Projects to provide internal access to graphics and images of objects within collections were undertaken using stand-alone applications like HyperCard or Macromedia Director, which stored content on videodiscs or CD-ROM.

As graphic browsers became more powerful, remote access to images became possible. Faster connections, more-powerful hardware and software, and cheaper storage fostered significant projects to digitize graphic content and expanded their scope even further to create digital virtual collections. The interpretive material woven into the primary source material provided by digital libraries continued to increase tremendously, as did access to collateral material available via the Web. As computing power grew, objects could be captured and displayed on the Web.

Scientific visualization has followed a similar path, from computer-assisted statistical analysis to the sophisticated modeling and visualization of complex scientific data that have dramatically changed the kit of research tools in virtually every field. Data can be acquired as textual or numeric descriptions, as well as optically using cameras or microscopes, or electronically using sensors or CCDs. Two examples of the many scientific visualization projects displaying images of 3D objects include:

  • Forma Urbis Romae project at Stanford (FURStanford), which uses a viewer to display 3D images of the fragments of a Roman artifact
  • digital morphology project at the University of Texas (DigiMorph), which displays surface-modeled images derived from computer-aided tomography (CAT) scan data of biological specimens

Figure 2. Screen capture, Forma Urbis Romae project

Figure 3. Screen capture of DigiMorph, the digital morphology project at the University of Texas

Projects such as these permit users to zoom, rotate, and visually interact with the image using their keyboard and mouse. The DigiMorph project also provides access to the individual-slice images that are 2D representations of the interior composition of CAT-scanned specimens.

2D vs. 3D Object Representations

Unfortunately, most digital libraries to date use two-dimensional images to represent 3D objects. Single slices of CAT scan or confocal microscope data can provide valuable information about the interior of the object, and QuickTime VR views of exterior surfaces let researchers view detailed pictures of object surfaces. Though these images of 3D objects display surface attributes and can support detailed visual analysis and comparisons of surface models, the image representations lack key components essential for many types of scientific analysis. These 2D image models do not capture the underlying geometry and topology that define the relationships between the points that comprise the surface or interior components of the object. This spatial data about the geometry and topology associated with the object is necessary for comprehensive modeling, measurement, and analysis of the 3D objects.

Objects with greater variety in shape and more changes in curvature are harder to quantify and analyze. Mathematical techniques that represent shape and curvature make it possible to create accurate surface models of 3D objects such as the ceramic vessels, bones, and lithics involved in the PRISM pilot projects. The surface models and the associated mathematical tools make it possible to analyze, identify, and compare the objects they represent.

The precision of the models supports measurements whose accuracy equals or exceeds that possible using traditional 2D tools such as calipers and rulers. For example, one of the scanners captures surface data points less than 300 microns (0.3 mm) apart, producing high-density triangular meshes with an average resolution of over 1000 points per cm². In addition, measurements such as height, width, maximum height or width, surface area, or volume can be easily, consistently, and accurately calculated from the scanned data using software tools developed by the project team, even for asymmetric natural objects.
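
As an illustration of how such measurements can be derived from a triangular mesh, the following minimal Python sketch computes surface area, enclosed volume, and height for a closed mesh, and checks the quoted point density. It is not the PRISM software: the function names, the unit-cube test mesh, and the square-grid assumption behind the density estimate are all illustrative assumptions.

```python
import math

def points_per_cm2(spacing_mm):
    """Approximate point density, assuming points lie on a square grid (an assumption)."""
    return (10.0 / spacing_mm) ** 2

def triangle_area(a, b, c):
    """Area of a 3D triangle: half the length of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def signed_tetra_volume(a, b, c):
    """Signed volume of the tetrahedron formed by the origin and triangle (a, b, c)."""
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return det / 6.0

def mesh_metrics(vertices, triangles):
    """Surface area, enclosed volume, and height (z extent) of a closed,
    consistently oriented triangular mesh."""
    area = sum(triangle_area(*(vertices[i] for i in t)) for t in triangles)
    volume = abs(sum(signed_tetra_volume(*(vertices[i] for i in t)) for t in triangles))
    zs = [v[2] for v in vertices]
    return {"surface_area": area, "volume": volume, "height": max(zs) - min(zs)}

# Hypothetical test mesh: a unit cube with outward-facing triangles.
CUBE_VERTICES = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                 (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_TRIANGLES = [(0, 2, 1), (0, 3, 2), (4, 5, 6), (4, 6, 7),
                  (0, 1, 5), (0, 5, 4), (3, 7, 6), (3, 6, 2),
                  (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]

if __name__ == "__main__":
    print(points_per_cm2(0.3))
    print(mesh_metrics(CUBE_VERTICES, CUBE_TRIANGLES))
```

Under the square-grid assumption, a 0.3 mm spacing corresponds to roughly 1,100 points per cm², which is consistent with the resolution quoted above. The volume calculation only holds for a watertight mesh, which is one reason topological completeness of a scan matters.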

Figure 4. Conceptual model of PRISM digital library for 3D spatial data

Use of 3D data also makes possible new measures based on topology and global or local changes in curvature that define the shape of the original object. Using mathematical models and surface and volume information, many new and powerful analytic tools become available—boundaries can be objectively identified, small local areas of changes in curvature identified and compared, and accurate, replicable measurements calculated automatically.
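
One common way to quantify local curvature on a polygonal mesh is the angle deficit at each vertex: 2π minus the sum of the triangle angles that meet there. The source does not say which curvature measure the PRISM tools use, so the sketch below is a generic illustration under that assumption, with hypothetical function names and a regular tetrahedron as test data.

```python
import math

def angle_at(p, q, r):
    """Interior angle (radians) at vertex p of triangle (p, q, r)."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def angle_deficits(vertices, triangles):
    """For each vertex of a closed mesh, return 2*pi minus the sum of the
    incident triangle angles. Zero means locally flat; larger magnitudes
    mean sharper changes in curvature."""
    totals = [0.0] * len(vertices)
    for i, j, k in triangles:
        totals[i] += angle_at(vertices[i], vertices[j], vertices[k])
        totals[j] += angle_at(vertices[j], vertices[k], vertices[i])
        totals[k] += angle_at(vertices[k], vertices[i], vertices[j])
    return [2.0 * math.pi - t for t in totals]

# Hypothetical test mesh: a regular tetrahedron.
TETRA_VERTICES = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
TETRA_TRIANGLES = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

if __name__ == "__main__":
    print(angle_deficits(TETRA_VERTICES, TETRA_TRIANGLES))
```

Each tetrahedron vertex is met by three 60-degree angles, so every deficit equals π, and the deficits sum to 4π as expected for a closed surface without holes. Thresholding such values is one possible way to flag small local areas where curvature changes.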

Access to the object geometry adds the capacity to objectively quantify and analyze spatial measures that define the object. Such geometric information permits analysis of the object using spatial descriptive characteristics such as:

  • area of features or overall surface of the object
  • volume within the object or its internal components
  • orientation of internal components and of the overall object
  • proximity and distance between points of interest
  • changes in local or overall curvature
  • object symmetry

These objective measures augment subjective descriptions and are extremely helpful to domain researchers, particularly when studying asymmetrical objects made by man or occurring in nature.

Features

Once meaning has been linked to the changes in topology, shape, or curvature by the domain scientists, a meaningful "feature" can be defined. The modeling process can provide an objective method to calculate physical measurements and to identify boundaries and local areas of interest to researchers, by the changes that are associated with the feature. Once identified, each feature can be described by its size, position, shape, or curvature.  Examples of features that can be extracted from the model data include the maximum diameter or height of a ceramic vessel.

Figure 5. Mathematically extracted features: bone surfaces (left), lithic stone tools (right)

Features that are mathematically abstract can also be of interest to the researcher, such as the base or neck of a vessel, the keel of a ship, boundaries of the joint surfaces on a bone, or spindles that form in the nucleus of a cell during meiosis. Often the tools developed to identify features and regions offer additional capabilities that raise new research questions within the disciplines. For example, ceramic analysts have used tools that identify mathematically defined features found on the vertical profile curve of a vessel such as end points, points of vertical tangency, inflection points, and corner points. These features are extremely helpful in analyzing abstract concepts such as vessel shape and style.
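
The profile-curve features named above can be illustrated on a sampled profile. The sketch below treats a vessel profile as a list of (radius, height) points and labels end points, points of vertical tangency, inflection points, and corner points with simple discrete tests. The thresholds, names, and sample profile are assumptions for illustration; the PRISM tools work on mathematically modeled curves and are not described at this level of detail in the article.

```python
import math

def profile_features(profile, corner_angle_deg=60.0):
    """Classify points on a vessel profile given as (radius, height) pairs
    ordered from base to rim. Returns indices into the profile."""
    n = len(profile)
    turn = [0.0] * n  # signed turning angle at each interior point
    for i in range(1, n - 1):
        (r0, z0), (r1, z1), (r2, z2) = profile[i - 1], profile[i], profile[i + 1]
        ax, az = r1 - r0, z1 - z0
        bx, bz = r2 - r1, z2 - z1
        turn[i] = math.atan2(ax * bz - az * bx, ax * bx + az * bz)

    features = {"end_points": [0, n - 1], "vertical_tangency": [],
                "inflection": [], "corner": []}
    corner_limit = math.radians(corner_angle_deg)
    for i in range(1, n - 1):
        dr_in = profile[i][0] - profile[i - 1][0]
        dr_out = profile[i + 1][0] - profile[i][0]
        if dr_in * dr_out < 0:
            # Radius stops growing or shrinking: the tangent is vertical here.
            features["vertical_tangency"].append(i)
        if abs(turn[i]) >= corner_limit:
            features["corner"].append(i)
    for i in range(1, n - 2):
        if turn[i] * turn[i + 1] < 0:
            # Turning direction flips between points i and i + 1.
            features["inflection"].append((i, i + 1))
    return features

# Hypothetical profile: widens to a maximum diameter, narrows to a neck, flares at the rim.
VESSEL_PROFILE = [(2.0, 0), (4.0, 1), (5.0, 2), (5.5, 3),
                  (5.0, 4), (4.0, 5), (3.5, 6), (4.0, 7)]

if __name__ == "__main__":
    print(profile_features(VESSEL_PROFILE))
```

On this sample profile the points of vertical tangency fall at the maximum diameter and at the neck, and the single inflection lies between them where the curve changes from convex to concave.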

Ideally, fresh research questions arise for the domain scientists as each new tool is applied to the object data, which raises new challenges for the computer scientists and fosters another cycle of new tool development. In addition to the tangible research benefit, a significant result of this process has been the cross-pollination of graduate students and a considerable increase in collaboration among faculty researchers across disciplines.

The PRISM Project

To begin to address these issues, PRISM at Arizona State University has developed prototype digital collections of scanned data describing 3D objects. The scanned data includes descriptions of geometric and topologic data in addition to the object surface. Goals of the project were couched in terms of developing partnerships between computer scientists and domain researchers to develop processes to:

  • create quantifiable, measurable models
  • automatically identify and extract data
  • create catalog information
  • automatically populate databases
  • support analysis and interaction
  • help answer research questions and generate new knowledge

The project grew from an interdisciplinary team of researchers from Computer Science, Mechanical Engineering, Anthropology, Fine Arts, and Information Technology. Two lab areas were used, one proximal to the computer science researchers, the other adjacent to the domain science departments. Additional research partners created a web of physical resources and personnel across the university that encouraged interaction and team-based development.

Initial research questions posed by a discipline scientist were shared at team meetings. Research and exploration were initiated, potential solutions brainstormed, and development tasks assigned to smaller teams. As further discussion was needed, prototypes were developed, evaluation input was sought, and formal and informal activity within the team moved the tools and techniques forward. Formally, presentations at team meetings were shared discussions. Informally, after the initial team formation, when sharing expertise and approaches was sufficient, the team members would initiate ad hoc group interaction to work through problems, share ideas and solutions, and compare progress in other project areas.

A summary of the project development sequence includes:

  • team formation and process development
  • shared vocabulary development
  • XML schema and DTD development
  • modeling and visualization research and development
  • data acquisition—scanning/digitizing the objects
  • data extraction—using the tools to extract features, identify regions of interest, and create metadata tags
  • data storage
    • text and tabular data
      • links to other databases
      • generated or derived data
    • binary geometric/spatial data
      • point cloud data
      • polygonal mesh
      • surface/volume models
  • query interface development
  • evaluation
  • revision/identification of new research questions
Figure 6. Prototype profile-based visual query interface for searching ceramic vessels

Metadata

Metadata to describe the object—and the modeled and derived representations and measures—was an important project design issue. A conceptual goal of the metadata component of the project was to develop an extensible schema structure that could accommodate the addition of new types of objects as the project evolved. An object class was defined as the master class document type definition (DTD) for each item in the digital library database. For the 3D digital library project, the additional descriptive data about each object was defined and organized as classes of contextual or spatial data definitions.

Contextual definitions describe text and metric information about the object. This class includes subclasses for metadata such as type, item name, catalog number, collection, or provenance that are associated with objects as they are acquired, processed, and archived. These fields were initially determined by existing descriptive data elements, though efforts were made to design a schema structure that would accommodate other object types. Several design iterations to refine the schema so it would work across object types have been completed.

Spatial data types define the 3D attributes of the object, including raw data, thumbnails, models, and calculated and derived data about the topology, shape, and composition of the object. Common descriptive components and geometric elements permit shared use of the modeling and analysis tools across classes of objects as new object types are added. An additional goal was to develop standards for description and organization that permit automated cataloging and population of databases as objects are scanned and processed.
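
The split between contextual and spatial classes under a master object class can be sketched as follows. This is a hypothetical rendering, not the project's DTD: the contextual field names follow the subclasses listed above, while the element names, the URLs, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

@dataclass
class Contextual:
    object_type: str
    item_name: str
    catalog_number: str
    collection: str
    provenance: str

@dataclass
class Spatial:
    raw_data_url: str
    thumbnail_url: str
    model_url: str
    derived: dict = field(default_factory=dict)  # calculated measures by name

@dataclass
class ObjectRecord:
    contextual: Contextual
    spatial: Spatial

    def to_xml(self):
        """Serialize the record with hypothetical element names."""
        root = ET.Element("object")
        ctx = ET.SubElement(root, "contextual")
        for name, value in vars(self.contextual).items():
            ET.SubElement(ctx, name).text = value
        sp = ET.SubElement(root, "spatial")
        ET.SubElement(sp, "raw_data", {"href": self.spatial.raw_data_url})
        ET.SubElement(sp, "thumbnail", {"href": self.spatial.thumbnail_url})
        ET.SubElement(sp, "model", {"href": self.spatial.model_url})
        for name, value in self.spatial.derived.items():
            ET.SubElement(sp, "measure", {"name": name}).text = str(value)
        return ET.tostring(root, encoding="unicode")

# Invented sample record for illustration only.
EXAMPLE = ObjectRecord(
    Contextual("ceramic vessel", "Example jar", "EX-001", "Example collection", "unknown"),
    Spatial("http://example.org/ex-001/points.bin",
            "http://example.org/ex-001/thumb.jpg",
            "http://example.org/ex-001/mesh.ply",
            {"height_mm": 12.5}),
)

if __name__ == "__main__":
    print(EXAMPLE.to_xml())
```

Keeping the spatial class independent of the object type is what would let a new kind of object reuse the same modeling and analysis tools, with only the contextual fields changing.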

Because of its familiarity and the availability of resources, an SQL database was used to store the contextual and spatial data for the initial project. Fields were assigned to each data element and metadata description. The large spatial data files were held in separate data storage databases and referenced by hyperlinks. Generally accepted data formats such as binary, PLY, HTML, and XML were used to keep the data accessible and to simplify migration over time.
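
A storage layout of this kind might look like the following sketch, which uses Python's built-in sqlite3 module purely for convenience. The article does not name the database product or its tables, so the table names, columns, and URLs here are assumptions.

```python
import sqlite3

def create_schema(conn):
    """Create one table for contextual fields and one for links to large spatial files."""
    conn.executescript("""
        CREATE TABLE item (
            item_id        INTEGER PRIMARY KEY,
            object_type    TEXT NOT NULL,
            item_name      TEXT,
            catalog_number TEXT UNIQUE,
            collection     TEXT
        );
        CREATE TABLE spatial_file (
            file_id INTEGER PRIMARY KEY,
            item_id INTEGER NOT NULL REFERENCES item(item_id),
            kind    TEXT NOT NULL,   -- e.g. point cloud, mesh, thumbnail
            format  TEXT NOT NULL,   -- e.g. binary, PLY
            url     TEXT NOT NULL    -- hyperlink to the externally stored file
        );
    """)

def add_item(conn, object_type, item_name, catalog_number, collection, files):
    """Insert one object and its file links; files is a list of (kind, format, url)."""
    cur = conn.execute(
        "INSERT INTO item (object_type, item_name, catalog_number, collection) "
        "VALUES (?, ?, ?, ?)",
        (object_type, item_name, catalog_number, collection))
    item_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO spatial_file (item_id, kind, format, url) VALUES (?, ?, ?, ?)",
        [(item_id, kind, fmt, url) for kind, fmt, url in files])
    return item_id

def files_for(conn, catalog_number):
    """Return (kind, format, url) rows for the object with this catalog number."""
    rows = conn.execute(
        "SELECT f.kind, f.format, f.url FROM spatial_file AS f "
        "JOIN item AS i ON i.item_id = f.item_id "
        "WHERE i.catalog_number = ? ORDER BY f.file_id",
        (catalog_number,))
    return rows.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_item(conn, "ceramic vessel", "Example jar", "EX-001", "Example collection",
             [("mesh", "PLY", "http://example.org/ex-001/mesh.ply")])
    print(files_for(conn, "EX-001"))
```

Storing only links in the relational tables keeps queries over contextual fields fast while the multi-megabyte geometry stays in whatever store suits it.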

The process of acquiring object data from the ceramic vessel, bone, and lithic pilot projects starts with laser scanning to capture the 3D data that defines the object. Cellular data is obtained from a confocal microscope. Other data from digital cameras, CAT scanners, MRI, and satellite and aerial scanning has also been used. Once the point cloud data has been obtained from assembling the scanned data, mathematical modeling is applied to identify features and regions of interest to the domain scientists. Software tools developed by the project team generate analytic data about the original object, automatically assign metadata about spatial characteristics, and populate the database.

A visual query process was developed to permit researchers to search and interact with the data using both contextual (text and numeric descriptive data) and spatial (shape and topological attribute) data. A sketch-based interface was developed that permits users to input both context and sketches to visually describe the object to initiate the search. Several text and spatial matching algorithms are used to identify and rank order objects in the database that match the search criteria.
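
To make the combination of contextual filtering and shape-based ranking concrete, here is a minimal sketch. The matching algorithms PRISM actually uses are not described in the article, so this substitutes a deliberately simple stand-in: both the sketched and stored profiles are resampled at normalized heights, scaled to a maximum radius of 1, and compared by mean absolute difference. All names and sample data are hypothetical.

```python
def resample(profile, samples=16):
    """Interpolate radius at evenly spaced normalized heights and scale so the
    maximum radius is 1. profile is a list of (radius, height) pairs with
    strictly increasing height."""
    z0, z1 = profile[0][1], profile[-1][1]
    norm = [((z - z0) / (z1 - z0), r) for r, z in profile]
    out, j = [], 0
    for k in range(samples):
        t = k / (samples - 1)
        while j < len(norm) - 2 and norm[j + 1][0] < t:
            j += 1
        (t0, r0), (t1, r1) = norm[j], norm[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(r0 + w * (r1 - r0))
    peak = max(out)
    return [r / peak for r in out]

def shape_distance(profile_a, profile_b, samples=16):
    """Mean absolute difference between two size-normalized profiles."""
    a, b = resample(profile_a, samples), resample(profile_b, samples)
    return sum(abs(x - y) for x, y in zip(a, b)) / samples

def search(catalog, sketch, object_type=None, limit=5):
    """Filter by contextual data, then rank by shape similarity (best first)."""
    candidates = [rec for rec in catalog
                  if object_type is None or rec["object_type"] == object_type]
    scored = [(rec["catalog_number"], shape_distance(sketch, rec["profile"]))
              for rec in candidates]
    return sorted(scored, key=lambda pair: pair[1])[:limit]

# Invented sample data: the jar has the same shape as the sketch at twice the size.
SKETCH = [(1, 0), (2, 1), (1, 2)]
CATALOG = [
    {"catalog_number": "EX-BOWL", "object_type": "bowl",
     "profile": [(1, 0), (2, 1), (3, 2)]},
    {"catalog_number": "EX-JAR", "object_type": "jar",
     "profile": [(2, 0), (4, 2), (2, 4)]},
]

if __name__ == "__main__":
    print(search(CATALOG, SKETCH))
```

Because profiles are normalized before comparison, a rough sketch can match vessels of any size, and the contextual filter narrows the candidates before the more expensive shape comparison runs.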

Figure 7. Query interface and search process diagram for PRISM ceramic vessel data

Initial development of the digital collections focused on Classic Period (A.D. 1250-1450) prehistoric Hohokam ceramic vessels from central Arizona housed at the Archeological Research Institute at ASU. Additional development has involved bone shape and surface, lithic tools, brain structures, and DNA structures in fertilized mouse egg cells. Research has extended to other disciplines with interest in spatial analysis, including cloud formation, wind erosion, and facial recognition.

We feel that one of the next key challenges in digital library development is to create the processes and tools to support interaction with 3D geometrically and topologically rich information about 3D objects. Capturing the breadth and complexity of data that spatially defines 3D objects offers many design and process challenges in developing digital collections. Several important issues regarding standards and conventions must be addressed to create geometrically accurate 3D digital libraries, including:

  • acquiring 3D object data: file formats, the "resolution" (in terms of numbers of points or polygons) needed to describe the object, requirements for topological integrity or completeness of the scan, etc.
  • describing object data: the standards, semantic linkages, etc., needed to describe the contextual and spatial data so that individual objects can be identified and data collections searched
  • storing data: file formats, etc.
  • representing the data: conventions for modeling and displaying the object data
  • interacting with the data: common tools and interface conventions to assist user interaction with the data and to begin building a visual literacy of 3D query and analysis

Discussion

One of the pleasant surprises during this project has been the ease of extending the modeling and analytic tools developed for one specific discipline to other research domains. The interactive growth of the tools for surface and volume modeling was another. The improvements that have resulted from the iterative process of identifying a domain research question, developing an application tool, deployment, analysis of potential applications across other research domains, and identification of new research questions have generated significant progress in developing modeling and analytic tools applicable to 3D data.

Figure 8. Bone editor developed by PRISM team with plane representing the angle of the trapezium joint surface

As 3D data-acquisition tools become more affordable and readily available, the amount of 3D data that must be described, stored, and displayed will grow dramatically. Accommodating this huge data-management challenge will require the establishment of standards and tools to analyze and add meaning to the data.

Several efforts by the PRISM team are underway or planned to further extend the capabilities of the tools already developed and their application to domain research. In terms of infrastructure, the move from custom plug-ins to Java-based display will simplify deployment. We are exploring alternatives to the SQL database currently used, such as object-oriented databases. Another effort to improve searching is a pilot XML search protocol developed by the National Science Foundation Biological Databases and Informatics project at ASU in conjunction with the ASU Long Term Ecological Research Metadata Committee and the Knowledge Network for Biocomplexity project at the National Center for Ecological Analysis and Synthesis. The "Xanthoria" metadata query system uses SOAP (Simple Object Access Protocol) to send XML query requests and responses and supports simultaneous Web-based querying of distributed, structurally different metadata repositories.
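
For readers unfamiliar with SOAP, the sketch below shows the general shape of an XML query wrapped in a SOAP 1.1 envelope. Only the envelope namespace is standard; the query namespace, element names, and field names are hypothetical, because the article does not describe Xanthoria's actual message format.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
QUERY_NS = "urn:example:metadata-query"                # hypothetical payload namespace

def build_query_envelope(terms):
    """Wrap a set of field/value search terms in a SOAP envelope and return it as text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{QUERY_NS}}}query")
    for field_name, value in terms.items():
        term = ET.SubElement(query, f"{{{QUERY_NS}}}term", {"field": field_name})
        term.text = value
    return ET.tostring(envelope, encoding="unicode")

if __name__ == "__main__":
    print(build_query_envelope({"object_type": "ceramic vessel", "collection": "Example"}))
```

A query system of this kind could send the same envelope to several repositories and translate the generic terms into each repository's own metadata structure, which is how structurally different collections can be searched at once.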

The analytic tools continue to develop as improvements are made in the feature extraction and region-editing applications and as more powerful techniques are developed to compare curvature, identify matches, and rank search results. Key to these efforts are the expanding partnerships with other research areas with their own unique modeling and visualization needs. Included to date are more complex anatomical data from CAT scanners and MRI, cloud-formation pattern recognition, geological erosion, and identification of targets within complex, noisy environmental data.

Figure 9. Prototype bone joint surface tool interface

Interface design continues to evolve. The project is evaluating models developed for 3D query and display by other projects.

The development of realistic 3D interface models that permit the researcher to sculpt the query image in 3D space is progressing, as are additional analytic tools such as planar overlays to visualize and objectively compare joint surfaces of bones. Techniques to bookmark searches to permit replication and simplify the comparison of objects in the databases are also being explored. A complex variation of bookmarks involves researchers' using the region editor and additional analytic tools such as the planar overlay to interact with the data and create their own interpretive models. Creating storage techniques for these derived, researcher-defined or modeled data and managing "version control" to permit replication and deconstruction of the analysis are further challenges.

User evaluation of the current interface layout, color palette, and design continues using both surface and volume model data. In addition to initially developing specific bone or ceramic vessel interfaces for the different research domains, the project is working to identify commonalities and conventions to develop a unified interface model. A common design appears to be possible for the initial query screens; the interface displays then differentiate as objects are identified, search results are returned, and researchers drill down into object data that may vary across disciplines.

Conclusion

Development of the current model has been an enlightening exercise in interdisciplinary project development. The translation of ideas, approaches, and vocabulary among disciplines has taken significant time and effort. Even when common vocabulary is used, the discipline-specific definitions and nuances can vary significantly.

The tools developed by the PRISM team and other researchers working to model and visualize 3D data have great potential to extend research in many disciplines. The initial challenges have focused on data acquisition and the development and display of models. Initial efforts to display images of surface models using QuickTime and plug-ins have significantly expanded research and science education as complex natural objects become approachable through such visualization. The addition of modeling and analytic tools based on surface and volume that permit objective quantification and analysis of 3D data has the potential to further extend discipline research.

As 3D data and the tools for visualization and analysis become more available, there is an increasing need for intuitive interfaces to provide gateways to the data. Digital libraries of 3D data will need to design effective processes to provide access to content created for specific projects and accommodate 3D data that they obtain from the increasing number of applications in business and industry (e.g., e-commerce, GIS, medical imaging, GPR, satellite and aerial scanning). Standards are needed for data description, storage, interchange, and searching. Conventions for display and organizing research tools are essential to generate broad acceptance and foster effective use.

Because researchers and patrons bring different strategies and approaches to their quests for information, organization and interfaces need to accommodate differences in learning styles, visual literacy, and sophistication. Evaluation data and continued research into learning styles, communication preferences, and visual communication and display are needed to guide interface design. Clearly, the development of simple, elegant, easy-to-use interfaces to accommodate the range of tools and user preferences will be a significant challenge now and in the future.

Acknowledgements

This work was supported in part by the National Science Foundation (grant IIS-9980166) and funding from the Vice Provost for Research and Economic Development at Arizona State University. The authors would like to thank all of the collaborators that make up the Partnership for Research in Spatial Modeling (PRISM) team, particularly Anshuman Razdan, Gerald Farin, Daniel Collins, Peter McCartney, Matthew Tocheri, Mary Zhu, Mark Henderson, Arleyn Simon, Mary Marzke, and David Capco.



 


Editors’ Note:
Within the U.S. and elsewhere, funding agencies are advancing digital preservation as a serious research area.  Digital preservation projects and cooperative international efforts have increased significantly over the past decade.  Examples include: the US National Science Foundation (NSF) collaborative international programs with the UK Joint Information Systems Committee (JISC), with the Deutsche Forschungsgemeinschaft (DFG), and with the European Union (EU); and the international InterPARES recordkeeping project, which has received funding from a number of countries. These have spurred the development of an interdisciplinary domain that has as its primary goal ensuring long-term access to materials in digital format for legal, economic, and cultural purposes.  This domain unites the interests of librarians, archivists, museum specialists, and other preservation professionals with digital object creators, computer scientists, lawyers, publishers, and others.  The issues cut across government, non-profit, commercial, and academic sectors.  This article discusses one program that has increasing ties to digital program initiatives.

The National Science Foundation Digital Government Research Program’s Role in the Long-Term Preservation of Digital Materials

Digital Government Research Program Managers:
Larry Brandt
lbrandt@nsf.gov

Valerie Gregg
vgregg@nsf.gov

Sue Stendebach
sstendeb@nsf.gov

As part of a major effort to preserve vital cultural heritage material, the National Science Foundation’s Digital Government Research Program (DG) and Digital Libraries Initiative Phase 2 are working closely with the Library of Congress, the National Archives and Records Administration, the National Library of Medicine, the National Agricultural Library, the Institute of Museum and Library Services, and other organizations to map out a comprehensive research agenda. The intent of this collaboration is to establish a research program that will support and encourage the exploration of innovative information technologies (IT), policies, economic models, and education and training that could ensure long-term preservation and future availability of such digital materials.

The National Science Foundation and its Digital Government Research Program

The NSF promotes science education and scientific advancement via grants to academic researchers. Within the NSF, the DG Research Program is in the Computer and Information Science and Engineering Directorate. The DG Program expands on this standard NSF model by including government entities (federal, state, local, and international) in the equation.

The need for the government to respond to rapid technological change provided the impetus for establishing the DG program.  The emergence of the Internet and its applications has fundamentally altered the environment in which government agencies at all levels conduct their missions and deliver services. These sweeping changes have also affected the mechanisms that underpin democracy and civic discourse. Concurrently, the public’s heightened expectations for government services are being driven by their increasing familiarity with the private sector’s rapid deployment of these new technologies to provide business-critical applications.  The government’s commitment to the development and application of information technology is defined by the scope and scale of the information services it is required to provide; the nature of its role as a collector, interpreter, distributor, and preserver of very large public data sets; the requirement it has to deliver services to all sectors of society, regardless of location, income level, or extent of computer expertise; the need to uphold its tacit and inviolable contract for accessible and reliable information sources and services; the need to balance national security requirements and the privacy of citizens; the need to select and maintain in perpetuity digital objects that are of value to the government or its citizens; and the need to implement the political, economic, and societal mandates that are expressed in law, regulation, and administrative procedures. 

The goal of the DG Research Program is to fund research at the intersections of the computer and information sciences research communities, related social, political, and behavioral science research communities, and the problems and missions of government agencies. The DG Research Program is predicated on three viewpoints:

  • The government sector can usefully inform and enhance its strategic vision through academic research collaborations, thus speeding the innovation, development, deployment, and application of more advanced technologies, methods, policies and processes into usable systems.
  • The unique combination of participants and requirements in the government sector presents new opportunities for academic researchers to gain access to important problems and data in real-world large-scale contexts.
  • To make the best use of available resources to meet its broad range of goals and objectives, the government sector needs to understand and predict the impact of these technologies on government agencies and services, governance, and the democratic process.

The DG Research Program solicits two classes of proposals (or a combination of both) as follows:

  1. Multi-disciplinary and multi-sector partnerships of researchers in information technologies and government agencies at all levels in order to foster collaboration among societal sectors. 
  2. Social, political, and behavioral research on the effect of information technologies on the forms, processes, impact and outcomes of IT within government, both from the standpoint of government agencies and from the standpoint of the public at large.

The DG Research Program, like other programs at NSF, often holds workshops to bring interested agencies, researchers and others together to develop a research agenda on a specific topic. The research agenda then serves as a guide for grant proposers to the DG Research Program and to the Program’s peer reviewers in examining submitted IT research proposals relevant to that topic. Researchers may also submit an ad hoc proposal during the year for a small grant for exploratory research (SGER) or a workshop grant.

The DG Research Program is entering its fifth year of competition. The next submission deadline is November 7, 2002; in subsequent years the deadline will fall on the second Wednesday in October. Some examples of DG Research domain areas include electronic grants administration; interoperable data, networks and architectures; security, privacy and information assurance; Federal statistics; and long-term archiving of digital objects. Details about research areas and applications can be found on the DG program’s Web site.

Digital Government and the Library/Archival Community: Project for the Long-Term Preservation of Digital Materials

How then can archivists and librarians make use of and help formulate research that is funded by the DG Research Program? A prime opportunity is the current long-term digital preservation project. In accordance with the Digital Government model, the staff of the Library of Congress and other government librarians and preservation professionals brought to NSF’s attention a pressing need to develop and employ cutting-edge technologies to preserve the myriad heterogeneous digital materials, which are increasing at an unprecedented rate. Technological challenges abound in this area. Further challenges in deciding upon and implementing new technologies confront those responsible for ensuring long-term preservation. At present, common, consensus-based policies and procedures are not in place to guide the collection, sharing, preservation and archiving of information in digital format. Finding innovative methods for the long-term preservation of this information and associated materials will require intensive IT research and development, along with agreed-upon policies, in collaboration with the affected libraries and archives, as well as non-profit and commercial interests. Without full cooperation across the library and archive community, a universal set of technologies and associated policies can be neither developed nor implemented successfully.

Similarly, digital preservation research cannot just begin at the door of IT, but must also include exploration of legal, organizational, political, and societal needs and impacts.  The technology is ultimately key to a successful long-term preservation system; however, without considering these other impacts, a system is not likely to succeed.

In April 2002, NSF hosted a workshop in cooperation with the Library of Congress.  Dr. Margaret Hedstrom, as Principal Investigator, led the two-day working meeting of a multi-disciplinary group of 50 participants, including those from academic institutions, government agencies, professional associations, and private businesses.  The main research categories derived from the workshop are: attributes of digital repositories, attributes of archived collections, tools and technologies, and economic and policy models.  Some of the specifically identified IT needs include:

  • Identifier systems to formulate advanced naming hierarchies for digital collections of all sizes
  • Enhanced and explicit collection definitions
  • Standards and mechanisms for specifying the preservation characteristics of compound, hyperlinked and nested digital objects
  • Approaches to maintain the consistency and longevity of digital objects in the face of a rapidly changing administrative and technological landscape.

Workshop participants also recommend that:

  • NSF, in collaboration with other government agencies, should establish a multi-disciplinary research program for the long-term preservation of digital materials.
  • Government agencies, such as LC, NARA, NLM, IMLS, DOD, and others, should collaborate with the researchers to develop research proposals and to contribute resources, as appropriate.
  • The new research program should be up to ten years in duration, reaching an annual award amount of $10 million by 2005.

The full research agenda for digital preservation, based on these and other identified needs, will be included in the final report, which should be available on-line and in hard copy format later this year. A pre-publication draft of the report, Research Challenges in Digital Archiving: Towards a National Infrastructure for Long-Term Preservation of Digital Information, is available at: http://www.si.umich.edu/digarch/.

Following this path, NSF’s DG Research Program hopes to announce a special call for proposals specific to preservation of digital materials in the spring of 2003.  Given the keen interest of numerous researchers, collaboration among the relevant government agencies, and the willingness to develop partnerships in pursuing innovative research, the DG Program is confident that new technologies to accomplish the desired and necessary long-term preservation of digital materials will result.

Continually evolving long-term IT research will be necessary in all areas, as well as in digital preservation, to keep step with rapid changes in data, information availability and communication, technologies, government administrators and policies, and overall missions.  If we wish to retain our prominence as a society, government needs to lead the way, rather than follow.  The long-term preservation of digital materials project is but one example of how the DG Research Program can pave the way for researchers and agencies to bring government to the technological forefront.  Other ways in which libraries and archives might participate in and contribute to the DG program are:

  • Defining and promoting the role of libraries and archives in the provision of products and services that support and enable digital government
  • Developing approaches, tools, and techniques that enhance or encourage the information literacy of citizens
  • Developing specific applications that improve or enable accessibility to digital formats or collections that fit within the scope of digital government

The DG Web site provides the most current information on the research agenda, program calls and announcements, and the annual dg.o conference that features the results of funded projects. 

Further information and context for DG research can be obtained from the following reports:



Highlighted Web Site

National Digital Information Infrastructure and Preservation Program (NDIIPP)

This program was begun in 2000 by the U.S. Congress, which appropriated $100 million for the development of digital preservation policies, standards and technologies. NDIIPP was envisioned as a collaborative effort, led by the Library of Congress, and involving other research libraries, government agencies, universities, and private companies with expertise in managing and preserving digital information resources.

The site provides news about the program, along with planning documents, meeting reports, and presentations. Also included are the results of commissioned studies, including a series of “Environmental Scans,” commissioned in 2001, covering Web sites, e-books, e-journals, digital sound and video files, and digital television. As the NDIIPP progresses, this site will be a vital source of information on the state of the art in digital preservation.




FAQ

I'm developing a Web site of text, digital images, and references. Should I be making special efforts to implement any kind of reference linking?

First, be clear about what reference linking is. It is the process by which a bibliographic reference or citation to a document becomes a link to that document. How does it differ from a link on a Web page to another Web page? The answer is that, at its simplest, it does not. An HTML link is an in-line reference link. Sometimes a set of links is gathered at the end of a Web document, reproducing the format of traditional, non-digital journal articles, but the placement of the links in the document is a convention, not a necessity. The same is true for links in PDF documents or in other digital formats. Another convention from the print world is the format that displays metadata about the referenced document—author, title, publisher, and so on. In non-digital documents, these elements of a bibliographic citation help a reader find the cited work. In a reference-linked digital document, the link "finds" the document for the reader. Thus, reference linking is a value-added feature of digitized materials.

Second, decide whether the users of this material will find reference links useful. Assuming the answer is "yes," you are still faced with several important questions. Does the community of users have permission to view the cited documents, and does the mechanism used to link to the documents recognize those rights? Some of the documents you might want to link to could have access restrictions that exclude some potential users from viewing the documents. Depending on the domain of the collection and the cited documents, you could be providing users with many links that end in refusal. The "appropriate copy" problem, also known as the "localization problem," occurs when a document is available from multiple sources, some of which may not be available to a particular user. In this case, the link you create will provide some users with access but will exclude others who rightfully should be able to view the document from a different resource provider. To solve this problem, an institution can provide localized links using a tool like SFX, one product that has been fairly widely used for this purpose. (1)
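The idea behind localized links can be sketched in a few lines of Python. This is only an illustration, not a description of how SFX or any other product works: the resolver addresses and the ISSN are made up, and the metadata keys (genre, issn, volume, issue, spage, date) are assumed here to follow the OpenURL style of passing citation metadata in a query string.

```python
from urllib.parse import urlencode

# Hypothetical resolver addresses, one per institution (placeholders only).
RESOLVERS = {
    "library-a": "http://resolver.library-a.example/openurl",
    "library-b": "http://resolver.library-b.example/openurl",
}


def localized_link(institution, citation):
    """Return a link that sends citation metadata to the user's own resolver."""
    base = RESOLVERS[institution]
    # Sort the keys so the same citation always yields the same query string.
    query = urlencode(sorted(citation.items()))
    return f"{base}?{query}"


# Citation metadata for one referenced article (the ISSN is invented).
citation = {
    "genre": "article",
    "issn": "1234-5678",
    "volume": "6",
    "issue": "5",
    "spage": "1",
    "date": "2002",
}

link_a = localized_link("library-a", citation)
link_b = localized_link("library-b", citation)
print(link_a)
print(link_b)
```

Both links carry identical metadata; only the resolver differs, so each institution can direct its own users to a copy they are entitled to view.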

Third, you need to consider the stability of the links. Are the cited documents located at a stable address? A permanent address is an unrealistic wish in the fluid digital world. The only way to make sure that an address points to a document is to change the address whenever a document moves. If this change had to be made for every link that pointed to a particular document, the cost would be such that few people would ever bother to use reference linking. Some sort of persistent identifier is needed to decouple a document from its current location. Several systems of redirection are now in use, among them PURL servers for local collections, Digital Object Identifier resolution services, and persistent identification numbers assigned by reviewing bodies such as Mathematical Reviews or by e-print services such as the arXiv e-print archive.
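A toy example may make the decoupling concrete. The sketch below is not modeled on any particular PURL or DOI service; the class, the identifier format, and the addresses are all hypothetical. It shows only that citing documents store an identifier, and that a single update at the resolver repairs every link at once.

```python
class Resolver:
    """Toy redirection service: maps a persistent identifier to a current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Called by the document owner at publication time,
        # and again whenever the document moves.
        self._locations[identifier] = url

    def resolve(self, identifier):
        # Citing documents store only the identifier, never the URL itself.
        return self._locations[identifier]


resolver = Resolver()
resolver.register("doc:0001", "http://old-host.example/papers/1.html")

# Three different citing documents all hold the same identifier.
citations = ["doc:0001"] * 3

# The document moves: one update at the resolver, no edits to the citations.
resolver.register("doc:0001", "http://new-host.example/archive/1.html")

targets = [resolver.resolve(c) for c in citations]
print(targets)
```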

Fourth, you need to consider whether there are sufficient resources available to create and maintain the links. Can you afford to add this value to the collection, and can you ensure the maintenance of the links' integrity? Adding reference links has a cost, whether the links are created manually or automatically. There are trade-offs in using either method. Humans are relatively slow and expensive, but they parse references into semantic constructs that can be sent to a lookup service easily. Automated methods are usually fast and inexpensive for large numbers of links, but they are not reliable in making decisions when ambiguity exists. Moreover, automatic linking processes are inexpensive only during operation, not during the programming phase. General linkers that take a document and find the locations of every referenced document do not yet exist.
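The trade-off can be seen in even a very small automated parser. The sketch below assumes a single citation layout ("Author. Title. Journal volume(issue), year.") chosen for illustration; real reference lists vary far more. It parses a well-formed reference quickly, but returns nothing when a period after an initial makes the field boundaries ambiguous, leaving that case for a human.

```python
import re

# Assumed layout: "Author. Title. Journal volume(issue), year."
PATTERN = re.compile(
    r"^(?P<author>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>[^.\d]+?)\s+"
    r"(?P<volume>\d+)\((?P<issue>\d+)\),\s+"
    r"(?P<year>\d{4})\.$"
)


def parse_reference(text):
    """Return the citation fields as a dict, or None if the layout does not fit."""
    match = PATTERN.match(text.strip())
    return match.groupdict() if match else None


clear = (
    "Rowe, Jeremy. Developing a 3D Digital Library for Spatial Data: "
    "Issues Identified and Description of Prototype. RLG DigiNews 6(5), 2002."
)
# Same reference, but the period after the initial looks like a field separator.
ambiguous = (
    "J. Rowe. Developing a 3D Digital Library for Spatial Data: "
    "Issues Identified and Description of Prototype. RLG DigiNews 6(5), 2002."
)

parsed = parse_reference(clear)
unparsed = parse_reference(ambiguous)
print(parsed)
print(unparsed)
```

The fields returned for the first reference could be passed to a lookup service; the second would need either a more elaborate pattern or manual attention.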

Once the elements of a query to a lookup service are in hand, there can still be costs involved in finding a location to link to. Using a standard Web search engine takes time. The current economic model of sending metadata to a DOI resolution service involves payments during the registration and lookup phases (but not when a user clicks on a link). Behind every persistent identifier is some infrastructure for updating links. The costs are sometimes carried by the document owners, sometimes by the lookup services, but in every case there is a cost that you should consider before you make a decision about reference linking a collection.

In summary, reference linking is a potentially important value-added feature to digital documents. The decision to add that value depends on your users, the nature of your collection, and your budget.

Errata: Clarification note added 17 October 2002: Original text named SFX as "the mechanism" to provide local links. SFX is an example of an implementation of the OpenURL technique for reference linking; other tools also exist.


Calendar of Events

Open Archival Information System Training Seminar
November 28-29, 2002
The Royal Library, Denmark
ERPANET is pleased to announce its first training seminar on the Open Archival Information System (OAIS) Model. Participants will obtain a working knowledge of the full range of preservation functions of OAIS. The seminar will provide an overview of the model, an introduction to OAIS metadata, current applications, and group work sessions that will focus on the possible application of the model in the participants' own organizations.

The 5th International Conference on Asian Digital Libraries (ICADL'02)
December 11-14, 2002
Singapore
ICADL 2002 is the fifth in a series of annual Asian Digital Libraries conferences, and focuses on the use, adoption, and adaptation of digital libraries. This will include work surrounding digital libraries and related technologies, the management of knowledge in digital libraries, and the associated usability and social issues. This year’s theme is Digital Libraries: People, Knowledge & Technology.

2003 Joint Conference on Digital Libraries (JCDL)
May 27-31, 2003
Houston, TX
Jointly sponsored by the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (ACM SIGIR), the Special Interest Group on Hypertext, Hypermedia, and the Web (ACM SIGWEB), and the Institute of Electrical and Electronics Engineers Computer Society (IEEE Computer Society) Technical Committee on Digital Libraries (TCDL), JCDL is an international forum focusing on digital libraries and associated technical, practical, and social issues. JCDL encompasses the many meanings of the term digital libraries, including new forms of information; information systems with all types of digital content; new means of selecting, collecting, organizing, and distributing digital content; digital preservation and archiving; and theoretical models of information media, including document genres and electronic publishing. Call for Papers: Due January 13, 2003.



Announcements

Updated Digital Imaging Tutorial Available
The English-language version of Cornell University Library's Moving Theory Into Practice: Digital Imaging Tutorial has been revised. First published in 2000, the tutorial has received periodic updates in order to reflect changes in digital imaging technology and process. In addition to correcting or replacing outdated links and references, the latest update includes major revisions on topics such as storage and display technology, as well as updated hardware recommendations. It also includes new information on scanning technology, image file formats, and compression schemes. An update of the Spanish version and the availability of a new translation into French will be announced before the end of calendar year 2002.

Exploring Charging Models for Digital Cultural Heritage
The final report for the Higher Education Digitisation Service (HEDS) study on behalf of The Andrew W. Mellon Foundation is now available. The purpose of this study is to investigate some of the underlying assumptions being made in the move from previously analog photographic services into the realm of digital capture and delivery. The report considers how marketable, cost efficient, and income-stable the new digital services and resources are in comparison with previous methods.

The Performing Arts Data Service Announces Two New Guides to Good Practice For Creating Digital Resources
Creating Digital Performance Resources: A Guide to Good Practice, edited by Barry Smith, explores the advantages of using digital resources for the performing arts. Creating Digital Audio Resources: A Guide to Good Practice, by Nick Fells, Pauline Donachy, and Catherine Owen, is a basic how-to guide for those using audio materials in the creation of digital resources. The guide addresses issues of copyright, the choice of appropriate equipment, presenting and delivering audio material, and data management. Both publications can be ordered directly from the publisher, Oxbow Books, Park End Place, Oxford, OX1 1HN, United Kingdom. Email: oxbow@oxbowbooks.com

International Research on Permanent Authentic Records in Electronic Systems Project (InterPARES Project)
The task force report is available on the project's Web site. InterPARES is a multidisciplinary, collaborative research project, the goal of which is to develop the theoretical and methodological knowledge required for the permanent preservation of authentic electronic records.

The Koninklijke Bibliotheek (KB) Hosts the Digital Archive for Elsevier Science Journals
The National Library of the Netherlands and Elsevier Science are establishing a permanent digital archive of Elsevier journals.  The library will receive digital copies of all Elsevier journals made available on its Web platform, ScienceDirect, representing approximately 1,500 journals covering all areas of science, technology, and medicine and exceeding 7 TB of data.



RLG News

Metadata Matters: an RLG-sponsored Forum at SAA
During the recent Annual Meeting of the Society of American Archivists, RLG sponsored a half-day forum on current metadata initiatives. Participants were briefed on new initiatives that have special relevance to the archival community.

Daniel Pitti of the University of Virginia began the session with a report on Encoded Archival Context (EAC), an ongoing initiative within the international archival community to design and implement a prototype standard based on Extensible Markup Language (XML) for encoding descriptions of record creators. Identifying record creating entities, recording the names or designations used by and for them, describing their essential functions, activities, and characteristics, and the dates when and places in which they were active or over which they had some responsibility are essential components of archival description. Creator information facilitates both access to and interpretation of records.

Martin Halbert from Emory University gave an update on the Open Archives Initiative (OAI), an enabling framework for the development of innovative, networked information services. Data providers support a simple harvesting protocol and provide extracts of metadata in a common, minimal-level format in response to requests from service providers. Service providers use extracted metadata to build higher level, user-oriented services, such as catalogs and portals to materials distributed across multiple sites. Emory University is participating in a Mellon-funded effort to explore OAI by enabling the harvesting of metadata from finding aids, item-level descriptions of texts and photographs, full text encodings of documents, and Web resources created by Emory faculty for research purposes.
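The harvesting exchange described above can be sketched as follows. The example assumes version 2.0 of the OAI protocol for metadata harvesting and its default Dublin Core format (oai_dc); the repository address and the sample response are invented, and the response is trimmed to the elements the code reads.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces assumed from OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a service provider sends to a data provider."""
    query = urlencode([("verb", "ListRecords"), ("metadataPrefix", metadata_prefix)])
    return f"{base_url}?{query}"


def extract_records(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    return records


# Invented, simplified response; a real one carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample finding aid</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://repository.example/oai")
harvested = extract_records(SAMPLE_RESPONSE)
print(request_url)
print(harvested)
```

A service provider would repeat this against many data providers and merge the harvested records into a catalog or portal.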

Tony Gill, former program officer at RLG, spoke about the CIDOC Conceptual Reference Model, an object-oriented data model that attempts to create a formal framework in which to express the implicit and explicit semantic concepts of cultural heritage information. A discovery system based on this object-oriented model allows even minimally documented collections to "borrow" descriptive detail and contextual background from other resources, so that the whole becomes much greater than the sum of the parts. Through these cross-collection links, the model gives emphasis to people, places, and events whose identity may be only implicit in descriptive records from different contributors. This context creation can significantly enhance the value of individual collections when they are brought together for the scholar's investigation.

Merrilee Proffitt, also from RLG, gave an overview of the Metadata Encoding and Transmission Standard (METS), a generalized metadata framework developed to encode the structural metadata for objects within a digital library and related descriptive and administrative metadata. METS provides for the responsible management and transfer of digital library objects by bundling and storing appropriate metadata along with the digital objects. METS is being used by some institutions as a means of instantiating various information packages as outlined in the OAIS Reference Model (PDF). Those currently involved in or planning digitization projects learned a great deal about METS and how it can help to structure data for presentation, archiving, or both.

The final speaker was Nancy McGovern from Cornell University. Actively involved in the development of a digital archive based on the OAIS (Open Archival Information System), McGovern provided background on the recently-issued RLG/OCLC report that deals with OAIS, Trusted Digital Repositories: Attributes and Responsibilities. The report provides a framework for discussions about identifying potential archiving services, and the standards, criteria, and mechanisms necessary for certifying digital repositories, to help achieve an international consensus. McGovern also discussed the certification of digital repositories and the relationship between it and trusted digital repositories. This discussion is relevant to local, regional, national, and international efforts, as successful scholarship in the future will depend heavily on coordinated, interoperable digital archiving.

The meeting was very well attended with nearly 90 participants. The discussion following the forum focused on the challenges of digital archive certification, the need for better understanding about the relationship between all of these initiatives, and the interest in keeping abreast of these developments in as timely a manner as possible. Many participants were interested in seeing a similar kind of forum next year. RLG is already planning to host another forum covering some of these initiatives along with other current developments in May 2003 in New York City. Details from the SAA forum presentations will soon be available on the RLG Web site. Details of other upcoming events can be found on the RLG events page.



Publishing Information

RLG DigiNews (ISSN 1093-5371) is a newsletter conceived by the members of the Research Libraries Group's PRESERV community. Funded in part by the Council on Library and Information Resources (CLIR) 1998-2000, it is available internationally via the RLG PRESERV Web site. It will be published six times in 2002. Materials contained in RLG DigiNews are subject to copyright and other proprietary rights. Permission is hereby given for the material in RLG DigiNews to be used for research purposes or private study. RLG asks that you observe the following conditions: Please cite the individual author and RLG DigiNews (please cite URL of the article) when using the material; please contact Jennifer Hartzell, RLG Corporate Communications, when citing RLG DigiNews.

Any use other than for research or private study of these materials requires prior written authorization from RLG, Inc. and/or the author of the article.

RLG DigiNews is produced for the Research Libraries Group, Inc. (RLG) by the staff of the Department of Preservation and Conservation, Cornell University Library. Co-Editors, Anne R. Kenney and Nancy Y. McGovern; Production Editors, Martha Crowe and Barbara Berger Eden; Associate Editor, Robin Dale (RLG); Technical Researchers, Richard Entlich and Peter Botticelli; Technical Coordinator, Carla DeMello; Technical Assistant, Kimberly Gazzo.

All links in this issue were confirmed accurate as of October 10, 2002.

Please send your comments and questions to RLG DigiNews Editorial Staff.

   
 