Feature Article 1
 Ephemeral Cities
Author: Erich Kesse - Digital Library Center, University of Florida (kesse@ufl.edu)

Introduction Ephemeral Cities is a historical digital atlas project, currently under construction, using a geographic information system (GIS) as a map interface for access to digital library content and as a tool for understanding local and state history. Based on historic fire insurance maps created by the Sanborn® National Insurance Diagram Bureau, the atlas integrates documents, ephemera, maps, museum objects, and photographs to populate a rich, geo-temporal mosaic. Funded in part by the Institute for Museum and Library Services, this prototype project focuses on three key Florida cities: Gainesville in northern Florida, the site of the state’s largest university; Tampa in central Florida, the west coast hub of commerce and finance; and Key West in southern Florida, an island rich with Bahamian and Cuban influences. The time span is 1884–1903.
Partnerships Ephemeral Cities was born of existing and new partnerships. Lead and primary partners, the University of Florida in Gainesville, the University of South Florida in Tampa, and Florida International University in Miami, are all members of the Publication of Archival, Library & Museum Materials (PALMM) digital library collaborative. Each has contributed to Florida Heritage and Florida Environments and other Florida-related collections for more than five years. Each also maintains relationships with local libraries, historical and genealogical societies, museums, and other cultural institutions. Several of these local institutions, in turn, have partnered to contribute additional content. Some, including the Alachua County Library District, the Alachua County Clerk of the Court, and the Monroe County Public Library, have created and maintain their own digital libraries.
Partnerships were vital to the project; no single cultural institution held a sufficiently complete historical record for any given place. Museum collections, in particular, lent a tactile flair to an appreciation of the past. The regionalized hub-and-spokes approach to building the most complete collection possible gave partners new to digitization immediate access to an experienced partner and to digital technologies and services. Typical of the PALMM experience, digitization and object description were carried out by the holding institution, using common guidelines, while more complex and more expensive tasks, e.g., text conversion (optical character recognition and mark-up) and programming, were centralized. Centralized programming for the description and contribution of digital objects helped to ensure consistency among the partners. Figures 1-3 illustrate partner relationships.
The partners include five libraries, four museums, and two public records offices: (in Gainesville) Alachua County Clerk of the Court, Alachua County Historic Trust/Matheson Museum, Alachua County Library District, and the University of Florida; (in Tampa) Henry B. Plant Museum, Tampa Bay History Center, and the University of South Florida Libraries; and (for Key West) the City of Key West, Florida International University, Key West Art & Historical Society, and the Monroe County Public Library. The Florida Center for Library Automation (FCLA) provides digital library technologies and digital archiving services. The Florida Center for Instructional Technology at the University of South Florida creates the educational modules. The Digital Library Center (DLC) at the University of Florida provides text services and project coordination. The University of Florida’s Government Documents and Systems Departments, in collaboration with the DLC, provide GIS and associated programming and database services. As with any project of this size and complexity, communication is both a constant necessity and a constant problem.
Additionally, because many resources are held privately, the project partners sought the assistance of local collectors. Although many allowed access to their collections in situ, many more collectors brought their resources to “My Town” events, where objects were imaged and descriptions collected. “My Town” events, while allowing collectors to share their sense of history, suffered one limitation: many of those participating were elderly. Their participation represented tremendous physical effort on their part, and many potential participants were undoubtedly dissuaded by the effort. The project would have benefited from additional planning and outreach to this community of “living history.”
Figure 1: Imaging Partnerships
Figure 2: Text Processing
Figure 3: Remote Query
Collective Content Ephemeral Cities references previously and newly digitized resources. Content ranges from published materials (the strength of existing digital collections) to diaries, letters, oral histories, and photographs. Museum and “My Town” partners contribute artifacts, a gamut of everyday objects: kitchen utensils, photographic equipment, writing implements, clothing, etc. The project’s work, however, could not have been completed without maps and name-rich resources.
Sanborn® Fire Insurance Maps of Florida, historically published every seven years on average, establish the base layers of the GIS. Sanborn® maps are geographically precise and historically accurate. They provide construction information on each mapped structure and additional use information for many of the same. Each sheet of all map sets published for each of the three cities was commercially geo-rectified. Rectification allowed programmers to associate known places with geographic (and temporal) coordinates and, subsequently, to layer one map atop another. Layering revealed change, often pervasive, over time across the city landscapes.
Figure 4: Tampa Over Time
Tampa 1884: wood (yellow) and stone (blue) construction
Tampa 1903: wood (yellow) and brick (magenta) construction
New uses changed the character of Tampa between 1884 and 1903 (Figure 4). This block near the waterfront changed from a shopping district to an after-hours entertainment center, as transportation took more goods inland and as more goods took more people to handle and deliver them. In 1884, Tampa’s economy and population were centered on Tampa Bay. The southern corner of Franklin was occupied by a barber, a cabinet maker, and a seller of fancy oils. By 1903, the railroads had finally reached Tampa. The southern corner of Franklin supported a café, two saloons, and a liquor store.
Without name-rich resources the reasons for change would remain mute. Those resources allow causative factors, such as people and events, to be examined. They answer the question “Why?” Ephemeral Cities postulates that giving voice to history can lead to a sea-change in our understanding of history. More importantly, it suggests that, by bringing together whole communities’ resources we can see change; and, if we can see it, we can understand it. Name-rich resources come from three sources: (1) newspapers, which fix people and events in specific time at specific places; (2) government records, particularly transactional, legal, and census records that capture people in civil actions (e.g., birth, marriage, and death); and (3) city directories, telephone books and the like, which fix people to known places (e.g., homes, workplaces, churches, and clubs). Many other written and printed sources provide names but most of these are low-yield sources. The parsing of name-rich resources is discussed below.
Assumptions Ephemeral Cities makes several assumptions, most of which have thus far proven true, though some only with considerable effort. First, it is assumed that citizens of Florida and tourists have general knowledge of the state’s geography. At present, information about the use of the PALMM Aerial Photography: Florida collection, another GIS project, supports the validity of the assumption. This largely anecdotal information also suggests that the GIS interface requires simplification. Second, it is assumed that everything can be fixed in time and place. This has proven to be true generally, at least among name-rich resources. The project continues to explore means of collaborative information building in the web environment. Similar to the (analog) experience of the “My Town” events, such methods would serve to fix date and place to other resources and to collect personal histories that would otherwise be lost. Methods of establishing and attaching levels of trust and independent verification remain outstanding work items. Third, it is assumed that semantic processes can be applied to metadata and searchable text to facilitate tagging and discovery. Although software-enabled processes (e.g., Gate and Annie; cf. http://gate.ac.uk/) facilitate tagging, they work well only against structured authorities. Augmentation of existing authority systems is discussed below; the task suggests the need for gazetteers and enhancement of name authority records.
Technologies Ephemeral Cities makes use of several technologies for GIS, for distribution of digital objects, and for tagging. The GIS interface is built upon the ESRI Map Server and the Tomcat servlet container for ESRI’s ArcIMS. Digital object distribution is supported by an FCLA-administered XPAT engine and SQL (Structured Query Language) applications of some of the independent partners.
Because research-discovery occurs within the text of documents rather than against search-discovery based on descriptive metadata, all textual resources have been converted to searchable text. Prime Recognition, configured with six optical character recognition (OCR) engines, ensures optimal text conversion quality.
Selected name-rich resources were parsed into enhanced authority records in an SQL database. These records are based on MARC name authority record structure and on a variety of common geographic schemes (e.g., Alexandria Digital Library [ADL] Feature Type metadata, Geographic Names Information System [GNIS], etc.). Authority records associate persons, named things, and events with places at specific times. Augmented with information from city directories, land ownership files, and other information sources, these records become the base of a gazetteer. Geographic coordinates in the gazetteer allowed linkage to the Sanborn® maps in the GIS.
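As an illustration only, the kind of record described above can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows below are invented for this sketch and are not the project's actual schema or data; the coordinates are rough placeholders.

```python
import sqlite3


def build_gazetteer():
    """Create an in-memory table of hypothetical authority records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE authority (
            name      TEXT,
            role      TEXT,
            city      TEXT,
            latitude  REAL,
            longitude REAL,
            year_from INTEGER,
            year_to   INTEGER
        )
        """
    )
    # Invented sample rows: each ties a name to a place for a span of years.
    rows = [
        ("Example Cigar Company", "cigar factory", "Gainesville", 29.6516, -82.3248, 1884, 1897),
        ("Sample Grocery", "grocery", "Gainesville", 29.6520, -82.3250, 1890, 1903),
        ("Placeholder Saloon", "saloon", "Tampa", 27.9478, -82.4584, 1897, 1903),
    ]
    conn.executemany("INSERT INTO authority VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def occupants_at(conn, city, year, role=None):
    """Return (name, role) pairs recorded in a city during a given year."""
    sql = (
        "SELECT name, role FROM authority "
        "WHERE city = ? AND year_from <= ? AND year_to >= ?"
    )
    params = [city, year, year]
    if role is not None:
        sql += " AND role = ?"
        params.append(role)
    sql += " ORDER BY name"
    return conn.execute(sql, params).fetchall()
```

A call such as occupants_at(conn, "Gainesville", 1895, role="grocery") filters the sample rows by place, time, and use at once, which is the kind of association between names, places, and dates that the authority records are meant to capture.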
Gate and Annie scripts subsequently use these records to identify and tag names in texts, thereby enriching associations. This both augments the authority records and links archival, library, and museum resources to the maps. Taken together the process of parsing name-rich resources and using parsed information to uncover related information in other texts and metadata is akin to sampling DNA to establish a family tree.
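The tagging step can be pictured with a much simpler stand-in for Gate and Annie: a lookup of known names against running text. The sketch below uses only Python's re module; the function name, the tag format, and the identifiers are hypothetical and do not reflect the project's scripts.

```python
import re


def tag_names(text, authority):
    """Wrap each known name found in text in a <name id="..."> tag.

    authority maps a name string to a (hypothetical) record identifier.
    Longer names are tried first so that a long name is not split by a
    shorter one that it contains.
    """
    names = sorted(authority, key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )

    def wrap(match):
        name = match.group(1)
        return '<name id="{}">{}</name>'.format(authority[name], name)

    return pattern.sub(wrap, text)
```

Given an authority of {"Franklin Street": "place-001", "Example Cigar Company": "org-002"}, the sentence "The Example Cigar Company opened on Franklin Street in 1884." comes back with both names wrapped in tags that carry their record identifiers, so the text can be linked to the gazetteer entry and, through its coordinates, to the map.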
Access A query engine is currently under construction and a prototype is working in a test environment. The user enters the “collection” through a map interface, potentially zooming in on an area of interest. Guided queries help the user to find specific information. The GIS helps them to explore relationships. An illustrated example follows.
The user has zoomed into a location of interest on a period Sanborn® map: here, Gainesville, 1884. The user may either point and click or construct a particular search strategy here by building information.
In this case, the user has decided to search “Building Use” for “Cigar Factories.”
The cigar factory in this block is identified by a red dot. If the user so desired, all of the cigar factories in Gainesville, or in any or all of the target cities, could be shown, in one or many places and across various layers of time.
Clicking on a red dot (or on any building) displays all of the known information about that building’s use and occupants at that approximate time in history. Advanced queries will allow searches of building uses and occupants over time. This information is extracted from name-rich resources.
When the user clicks on an alternate use (say, Grocery), red dots indicate the location of other grocers.
Clicking the red dot at any of these locations displays information about that location.
Clicking either a use or an occupant’s name launches a query against targeted collections.
Retrieval lists, sorted by holding institution, display a thumbnail together with brief descriptive information.
A selection invokes a new browser window to display the selected resource.
Towards the Future Ephemeral Cities suggests the creation of a historic city atlas encompassing the nation. Sanborn® and similarly precise historical map sets were created for U.S. cities with populations over 2,500. Updated periodically, they represent snapshots of the growth of cities. Upon public launch planned for mid-2005, fully elaborated procedures will be released. The project’s planners anticipate extending the model to other Florida cities and hope that others will do so elsewhere.
Programs not yet able to support GIS might also look to the future. Research resources valued by genealogists are an excellent place to begin. City directories and similar content, surprisingly, have been widely overlooked. These resources begin to tell us who lived where and when. Development of services supporting conversion and text-searching of newspapers is also key. Partnership with local government records offices unlocks a wealth of temporal and geographic information for future discovery.1
As we free the content of our digital collections for temporal and geographic discovery, Ephemeral Cities also makes possible a more fantastic vision. The geographically aware cellular telephones and digital cameras of the future will enable a form of education and tourism that turns historic sites into museums and libraries that serve up information as one passes through a location (cf. www.uflib.ufl.edu/digital/collections/EphemeralCities/EPCfuture.avi).
Issues Ephemeral Cities is an attempt to address research functions rather than to provide additional research objects. Its most apparent means is the GIS-enabled map interface. It joins a number of similarly enabled digital libraries.2 Each is predicated upon the notion that, regardless of the type of research or information use conceived, time and place are constants of the process. Nothing exists outside of time or place. Ephemeral Cities distinguishes itself both in its depth and in its attempts to mine data from digital texts. No other GIS-enabled project has attempted to establish layers of information at the building level that can be peeled back or piled on in sheets of time.
Like previous geo-temporal digital library projects, Ephemeral Cities identifies particular difficulties as development needs. More digital gazetteers, and gazetteers with more historical detail, are needed. Enhancement of authority record structures (e.g., Library of Congress Name Authority and Geographic Names Information System) is also needed, as are methods for trusted, distributed augmentation of those records. Authority systems currently lack sufficient historical record of name changes and of uses. And there is a need for a distributed system of contribution and record augmentation, if only because of the sheer volume of names and name information that can be compiled through machine processes. Finally, though not discussed here, Ephemeral Cities identifies a need to move from association of named places with points, a single set of coordinates roughly identifying the center of an object, toward association with bounding boxes or polygons, roughly identifying the boundaries within which an object rests. A point is an existential designation that says nothing of the shape that an object takes either at any given time or as it moves through time.3 As cities grow and counties shrink, a point is a virtual needle in a haystack.
Notes
1 Alachua County Clerk of the Court’s Ancient Records Program (http://www.clerk-alachua-fl.org/archive/default.cfm) is an excellent model for release of historic public records. Laws of the State of Florida mandate governments to make records digitally available (cf. http://www.clerk-alachua-fl.org/clerk/IALaw.html) and are a good example of enabling legislation.
2 Reports of other notable projects examining issues of space and time include:
- Gregory Crane. “Designing Documents to Enhance the Performance of Digital Libraries: Time, Space, People and a Digital Library on London,” D-Lib Magazine, v.6, no. 7/8, July/August 2000, http://www.dlib.org/dlib/july00/crane/07crane.html;
- Scott R. McEathron et al. “Naming the Landscape: Building the Connecticut Digital Gazetteer,” INSPEL 36(2002)1, pp. 83-93, http://www.fh-potsdam.de/~IFLA/INSPEL/02-1mcsc.pdf;
- Michael Buckland and Lewis Lancaster. “Combining Place, Time, and Topic: The Electronic Cultural Atlas Initiative,” D-Lib Magazine, v.10, no. 5, May 2004, http://www.dlib.org/dlib/may04/buckland/05buckland.html; and
- Patrick McGlamery. “A Decade of Spatial Metadata Content Standards,” Readex Digital Institute (2nd: 2004 October 7-9). Unpublished.
3 For illustration of the problem, see Alachua County (FL) Ancient Records’ Census Maps http://www.clerk-alachua-fl.org/Archive/AncientJ/1830map.html.
Feature Article 2
 X Marks the Spot: The Role of Geographic Location in Metadata Schemas and Digital Collections
Author: Stephanie C. Haas - Digital Library Center, University of Florida (haas@uflib.ufl.edu)

The Importance of Location From pirate maps to GPS (Global Positioning System) units in cars, where we’re going and where we’ve been all revolve around places. Much as taxonomists use scientific names as identifiers for the rich complexity of an organism’s genetic footprint, most of us talk in terms of place names rather than in the more spatially exact terms of latitude and longitude coordinates. Linda Hill, of the Alexandria Digital Library writes, “Place-names are used in discourse and text, subject headings and index terms, labels on maps, and to identify administrative districts for addresses, statistics, and data. Geospatial coordinates are used to represent the location of features on the surface of the Earth and the coverage of maps, aerial photographs, remote-sensing images, and datasets of various kinds.” 1
In the past, and even now for many purposes, the inexactness of place names is functional, but precludes us from effectively integrating the textual and spatial information universes. As society becomes increasingly complex, and multiple and often conflicting human activities demand equal space, exactitude of location takes on a critical function.
Emergency preparedness issues have forced all communities to take an active role in preparing plans that accurately show where key resources (hospitals, power plants, etc.) are located. Prior to this escalated national initiative, governmental and non-governmental organizations were finding it increasingly desirable to pinpoint the spatial, or geographic, footprint of various kinds of data collections.
In the library world, geographic visualization is becoming a defining functionality in digital library collections. Gregory Crane, Perseus Project, Tufts University, writes, “In a mature digital library (DL), documents should coexist with a Geographic Information System (GIS). The GIS component of the DL should be able to scan documents for toponyms and then generate a map illustrating the places cited in a document.” 2 Of equal importance is the conceptualization and development of distributed geolibraries. The National Research Council defines a geolibrary as “a digital library filled with geoinformation—information associated with a distinct area or footprint on the earth’s surface—and for which the primary search mechanism is place. A geolibrary is distributed if its users, services, metadata, and information assets can be integrated among many distinct locations.” 3
Library Traditions of Geographic Access MARC records may contain several geographic access points. Classification numbers may include coding for place. Traditionally, map catalogers have entered latitude and longitude coordinates in the 034 field (coded cartographic mathematical data) and the 255 field (cartographic mathematical data). The most familiar access is through geographic names found in the 651 subject fields.
Geographic Subject Headings The Library of Congress Subject Headings (LCSH) approach most geographic issues from an administrative unit focus. Michael Buckland comments, “The names, the boundaries, and the political structures of these entities tend to be unstable over time. This situation is generally acceptable for current affairs and political topics, but provides limited help for scientific, historical, and narrowly local searches when the interest is in geographical region (space) rather than geopolitical unit.” 4
The Geographic Names Information System (GNIS) is the official authority for all U.S. government agencies and is maintained by the U.S. Geological Survey under the auspices of the U.S. Board on Geographic Names. Although catalogers draw on GNIS as the primary authority for U.S. place names, they have never systematically included the latitude/longitude coordinates found in the GNIS database. Also, although it seems counterintuitive, when catalogers use other resources to create U.S. place name entries, they do not submit them to GNIS. GNIS and its companion GEOnet (foreign names database) serve as core data sets for all of the well-known digital gazetteer initiatives: Alexandria Digital Library, Getty Thesaurus of Geographic Names, etc.
Spatial Coordinates Spatial coordinates (latitude/longitude pairs) are used to identify the exact location of features on the Earth’s surface. Coordinates define features with points, bounding boxes, or polygons. As noted above, the spatially more exact 034 and 255 fields have been seen as the domain of map catalogers. Hill states, “it is not current practice to use these fields to catalog documents such as environmental impact reports that are also explicitly associated with coordinate-defined locations.” 5
Spatial searching either by lat/long or through a graphical map interface appears to be very limited in the bibliographic world. The National Oceanic and Atmospheric Administration (NOAA) library catalog and NOAA’s Coral Reef Information System are searchable by bounding box coordinates. The reef literature also has a map interface. Additionally, the National Geospatial-Intelligence Agency library catalog and Princeton’s Geosciences and Map Library catalog GEOMAP [not accessible in late November 2004] are purported to be coordinate friendly. GEOREF, the abstracting service of the American Geological Institute, has also enabled bounding box coordinate searching.
Geographic Access in Non-MARC Metadata The impetus for creating rigorous geospatial metadata came in 1994 with President Clinton’s issuance of Executive Order 12906: COORDINATING GEOGRAPHIC DATA ACQUISITION AND ACCESS: THE NATIONAL SPATIAL DATA INFRASTRUCTURE, which states:
Geographic information is critical to promote economic development, improve our stewardship of natural resources, and protect the environment. Modern technology now permits improved acquisition, distribution, and utilization of geographic (or geospatial) data and mapping. The National Performance Review has recommended that the executive branch develop, in cooperation with State, local, and tribal governments, and the private sector, a coordinated National Spatial Data Infrastructure to support public and private sector applications of geospatial data in such areas as transportation, community development, agriculture, emergency response, environmental management, and information technology. 6
This Executive Order led to the establishment of the Federal Geographic Data Committee (FGDC) with the mission to minimize the duplication of effort and cost in collecting data sets. To accomplish that mission, a Content Standard for Digital Geospatial Metadata (CSDGM)(FGDC-STD-001-1998) was developed with an associated metadata clearinghouse. The standard was to be used by all U.S. governmental agencies for sharing information with others on the data they were collecting.
Geographic elements in the Digital Geospatial Metadata are defined by latitude and longitude coordinates. The same geographic approach is used by the National Biological Information Infrastructure in its Content Standard for Digital Geospatial Metadata, Part 1: Biological Data Profile, by the Bathymetric Subcommittee in its Metadata Profile for Shoreline Data (FGDC-STD-001.2-2001), and in the remote sensing community in its Content Standard for Digital Geospatial Metadata: Extensions for Remote Sensing Metadata.
Currently, FGDC is being harmonized with ISO 19115, the metadata part of the ISO 19100 family of standards entitled Geographic Information/Geomatics. The ISO standard provides for the inclusion of polygons and bounding boxes in the metadata. Kresse and Fadaie provide a complete description in their ISO Standards for Geographic Information. 7
Museum curators and biological researchers recognized early on the importance of spatial elements. Although academic librarians are very familiar with cultural and historical collections, the specimen records in natural history museums often offer the only baseline data available on past environments. In the 1990s, a group of natural history museum experts created the Darwin Core metadata, comprising 24 elements including the spatially related elements of country, state/province, county, locality, latitude, longitude, and bounding box.
Ecological Metadata Language was developed as a metadata scheme for exchanging information on ecological data sets. It is based on work done by the Ecological Society of America and the efforts of W.K. Michener and others who published the seminal article “Nongeospatial metadata for the ecological sciences,” Ecological Applications, 7(1): 330-342, 1997. Again, geographic coverage is given in bounding boxes.
This brief review of geographic elements in non-MARC metadata clearly indicates the disparity in approaches to geographic access.
Breaching Metadata Boundaries Creating online digital collections brought into sharp focus the incompatibility of the place name world and the spatial world.
In 1997, an IMLS project entitled “Linking Florida’s Natural Heritage” explored the potential of using Z39.50 to simultaneously search museum specimen records and scientific literature. The geographic and taxonomic information of the Darwin Core metadata were identified as critical elements to tie records together. Standard MARC cataloging practices did not address either of these elements effectively. To ameliorate this deficit, a pseudo-MARC metadata was created for scientific literature that incorporated latitude/longitude coordinates in the 034 field to describe spatial coverage. The 752 field was used to represent the GNIS hierarchy of location: Country/State/County/Named Place. Part of the intent of the spatial enhancement was to create records that could function in a GIS mapping environment. This functionality cannot be fulfilled until a geographic search interface is programmed into the OPAC.8
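For readers unfamiliar with coded cartographic data, the MARC 034 field can carry a bounding box in subfields $d, $e, $f, and $g (westernmost longitude, easternmost longitude, northernmost latitude, southernmost latitude), commonly written in the form hdddmmss (hemisphere, degrees, minutes, seconds). The following sketch is not taken from the project and handles only that one form; it converts such values to signed decimal degrees of the kind a GIS expects. The function names and sample values are illustrative.

```python
def marc034_to_decimal(value):
    """Convert an hdddmmss string such as 'W0823000' to signed decimal degrees.

    South and West are returned as negative numbers.
    """
    if len(value) != 8 or value[0] not in "NSEW" or not value[1:].isdigit():
        raise ValueError("expected hdddmmss, got {!r}".format(value))
    degrees = int(value[1:4])
    minutes = int(value[4:6])
    seconds = int(value[6:8])
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if value[0] in "SW" else decimal


def marc034_bounding_box(subfields):
    """Return (west, south, east, north) from a dict keyed by subfield code.

    Expects the keys 'd' (west), 'e' (east), 'f' (north), and 'g' (south).
    """
    return (
        marc034_to_decimal(subfields["d"]),
        marc034_to_decimal(subfields["g"]),
        marc034_to_decimal(subfields["e"]),
        marc034_to_decimal(subfields["f"]),
    )
```

With subfield values of W0823000, W0820000, N0300000, and N0293000 for $d, $e, $f, and $g, the second function yields a box running from 82.5 to 82 degrees west and from 29.5 to 30 degrees north.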
In 1998, Tufts University began to build a temporal-spatial front end for its Perseus digital project. Although searching by latitude/longitude is not functional as of this writing, map visualizations are used extensively with the series of humanities texts and images that comprise the Perseus project.
Two existing digital projects do permit geographic searching using map interfaces and longitude/latitude searches. The Alexandria Digital Library provides access to 15,000 maps, images, and datasets. The Electronic Cultural Atlas Initiative of Berkeley uses the TimeMap project programs developed at the University of Sydney. This project allows the user to search for digital data on historical and archaeological resources by entering bounding box coordinates or by drawing a bounding box on a map. Of critical interest in this project is the metadata. It is enhanced Dublin Core where spatial coverage is defined by bounding boxes, e.g., dc.coverage.x.min -122.5184, dc.coverage.x.max -81.1516, dc.coverage.y.min 26.5847, dc.coverage.y.max 38.2987. This is the first enhancement of Dublin Core that facilitates spatial functionality.
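A bounding box search over metadata of this kind reduces to a rectangle-overlap test. The sketch below assumes records held as plain dictionaries keyed by the dc.coverage element names quoted above; the record titles, the query boxes, and the function names are invented for illustration, and boxes that cross the 180° meridian are not handled.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    x_min: float  # western longitude
    y_min: float  # southern latitude
    x_max: float  # eastern longitude
    y_max: float  # northern latitude


def from_dublin_core(record):
    """Read a bounding box from dc.coverage.* keys of a metadata record."""
    return BoundingBox(
        x_min=float(record["dc.coverage.x.min"]),
        y_min=float(record["dc.coverage.y.min"]),
        x_max=float(record["dc.coverage.x.max"]),
        y_max=float(record["dc.coverage.y.max"]),
    )


def intersects(a, b):
    """True when two boxes overlap or touch."""
    return (
        a.x_min <= b.x_max
        and b.x_min <= a.x_max
        and a.y_min <= b.y_max
        and b.y_min <= a.y_max
    )


def search(records, query):
    """Return the titles of records whose coverage overlaps the query box."""
    return [
        record["title"]
        for record in records
        if intersects(from_dublin_core(record), query)
    ]
```

A record carrying the coverage values quoted above would be returned for a query box drawn around northern Florida, but not for one drawn over the North Atlantic.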
Into The Future Increasingly, digital initiatives benefit from integrating metadata from multiple disciplines. Without appropriate geographic elements, bibliographic metadata will remain isolated, accessible only through inexact textual searching. Spatial searching opens new investigative avenues in humanities as well as the social sciences and sciences. Linda Hill encapsulates the vision needed in libraries to successfully address a spatial future when she writes: “The metadata design challenge for non–GIS metadata is to incorporate a simplified representation of geospatial location that is consistent with the standards developed by the geospatial data community.” 9
Notes
1 Linda L. Hill and Greg Janee. “The Alexandria Digital Library Project: Metadata Development and Use,” in Metadata in Practice, editors Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004, p. 118.
2 Gregory Crane. “Designing Documents to Enhance the Performance of Digital Libraries: Time, Space, People and a Digital Library on London,” D-Lib Magazine, v.6, no.7/8, July/August 2000, p. 1.
3 Distributed Geolibraries: Spatial Information Resources. A Summary of a Workshop Panel on Distributed Geolibraries [with members of the] Mapping Science Committee; Board on Earth Sciences and Resources; Commission on Geosciences, Environment, and Resources; and the National Research Council. Washington, D.C.: National Academy Press, 1999, p. 1.
4 Michael Buckland and Lewis Lancaster. “Combining Place, Time, and Topic: The Electronic Cultural Atlas Initiative,” D-Lib Magazine, v.10, no.5 May 2004, p. 4.
5 Hill, op.cit., p. 118.
6 Executive Order 12906 Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure. Published in the April 13, 1994, edition of the Federal Register, Volume 59, Number 71, pp. 17671-17674. Amended by Executive Order 13286, published in the March 5, 2003, edition of the Federal Register, Volume 68, Number 43, pp. 10619-10633.
7 Wolfgang Kresse and Kian Fadaie. ISO Standards for Geographic Information. New York: Springer, 2004.
8 Fuller descriptions of this project are found in Priscilla Caplan and Stephanie Haas. “Metadata Rematrixed: merging museum and library boundaries,” Library Hi Tech, v. 22, no.3, Sept. 2004; and Stephanie Haas, Elaine Henjum, Mary Ann O’Daniel, and Joe Aufmuth. “DARWIN and MARC: A Voyage of Metadata Discovery,” Library Collections, Acquisitions, and Technical Services, v. 27, p. 291-304, 2004.
9 Hill, op.cit., p. 128.
Feature Article 3
 PREMIS — Preservation Metadata Implementation Strategies Update 2: Core Elements for Metadata to Support Digital Preservation
Author: Rebecca Guenther - Library of Congress (rgue@loc.gov)

OCLC and RLG established the PREMIS Working Group to develop a common, implementable core set of metadata elements to support preservation of digital objects and to explore alternative strategies for the encoding, storage, and management of preservation metadata within a digital preservation system. The PREMIS Working Group consists of two subgroups: the Core Elements and the Implementation Strategies groups. In a previous issue of RLG DigiNews, Priscilla Caplan described a survey conducted by the Implementation Strategies subgroup. The survey was intended to support the creation of a data dictionary and its implementation by exploring what institutions were doing in the area of digital preservation. The present article will present an update of the work of the Core Elements subgroup. A full PREMIS report will be made available early in 2005.
The Core Elements group has spent almost a year and a half working on a list of core elements needed to support the long-term preservation of digital objects. After much deliberation about the meaning of “core,” the group came up with a practical definition: those elements that a working archive needs to support the functions of ensuring viability, renderability, understandability, authenticity, and identity in a preservation context. Initially the group felt that all core elements should be considered mandatory by definition, but some optionality crept in, with the acknowledgement that some elements are more core than others, and even necessary information cannot always be provided. The group acknowledged that it was important that the elements be populated in an automated way as much as possible, either from the object itself, or as defaults according to the policies and business rules of a given repository.
The group decided that the data dictionary should attempt to be implementation-independent, that is, it should not specify how the metadata should be stored (e.g., with each object, within tables, in a file format registry, etc.). For instance, some mandatory metadata elements may be implicit within the system, perhaps because of the repository’s business rules, and it would not be a requirement to explicitly supply these. The principle is that the repository needs to know the information, not that it be recorded in a certain way.
The Core Elements subgroup began by analyzing the recommendations of the earlier OCLC/RLG Preservation Metadata Framework Working Group related to Preservation Description Information. This included reference information (identification systems), context information (relationships among objects), digital provenance (the history of an object), and fixity (data integrity information). Those members of the subgroup from institutions actively running or developing preservation repositories mapped the elements from the framework to their own systems. It became clear that the elements detailed in the previous work (which themselves had been mapped to the OAIS information model) did not always correspond to elements implemented in practice, and did not give adequate guidance on how to use them. However, the exercise was useful in generating a common denominator for diverse implementations; the group discussed each element in conference calls to determine commonality in usage. Elements that emerged as being widely used across implementations were considered the beginning of a core element list.
The second category of metadata that the group considered was technical metadata, or detailed information about the physical characteristics of digital objects that is needed for the digital preservation process. Since the group lacked the expertise to recommend detailed technical metadata for specific file formats, it decided to limit its deliberations to technical metadata that applies to all objects regardless of the specific file format. By scoping the work to include only metadata applicable to all (or at least most) digital formats, the group was able to limit the work to a reasonable set of semantic units and leave further development to format experts. An example of a format-specific technical metadata element set is NISO Z39.87 (Technical Metadata for Still Images).1 This limited scope does not mean that format-specific technical metadata is not needed, but that it should be developed by groups of experts outside the PREMIS effort.
As the data dictionary was being developed, the working group realized that it would be useful to establish an abstract data model to guide its work by providing a structure for the data dictionary and to assist implementers in applying the semantic units. In the PREMIS data model there are five types of entity: intellectual entities, objects, agents, rights, and events. Most of the data dictionary involves objects and events. Intellectual entities and agents are not fully described, on the assumption that metadata for these entities is being developed by other groups. The minimum core rights information that a preservation repository must know is the permissions that have been granted to the repository itself to carry out actions related to objects within the repository.
The main deliverable is a data dictionary, which will contain the core elements (called “semantic units”) and detailed information about how to apply them. Included will be semantic unit name, definition, the level of object to which the semantic unit applies, a rationale for its inclusion, examples, creation/implementation notes, usage notes, whether it is repeatable, and whether it is mandatory or optional (Table 1).
| Semantic unit | Size |
| Semantic components | None |
| Definition | The size of the file or bitstream in bytes. |
| Rationale | Size is useful for knowing whether you have retrieved the correct number of bytes from storage and whether an application has enough room to move or process files. It might also be used when billing for storage. |
| Data constraint | Integer |
| LEVEL | Representation | File | Bitstream |
| Scope | Not applicable | Applicable | Applicable |
| Examples | | 2038927 | |
| Repeatability | | Not repeatable | Not repeatable |
| Obligation | | Optional | Optional |
| Usage Notes | May be repeated for embedded files. |
Table 1. An example of the data dictionary for an entry in the objects entity.
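The group's wish that semantic units be populated automatically wherever possible can be illustrated with the Size entry in Table 1. The following is a minimal sketch in Python, not part of the PREMIS deliverables: the function names and dictionary keys are hypothetical, and SHA-256 is simply one plausible choice of fixity algorithm, which the data dictionary does not prescribe here.

```python
import hashlib
import io


def describe_stream(stream, chunk_size=65536):
    """Derive size and a fixity digest from the object itself, with no manual input.

    `stream` is any binary file-like object. The returned keys are
    illustrative names, not official semantic unit names.
    """
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return {
        "size": size,                      # bytes, an integer (see Table 1)
        "fixityAlgorithm": "SHA-256",      # assumption: algorithm chosen for this sketch
        "fixityDigest": digest.hexdigest(),
    }


def size_matches(record, retrieved_bytes):
    """Check that the number of bytes retrieved from storage equals the recorded size."""
    return len(retrieved_bytes) == record["size"]


if __name__ == "__main__":
    record = describe_stream(io.BytesIO(b"example content"))
    print(record["size"], record["fixityDigest"])
```

The second function corresponds to the first use named in the rationale for Size: confirming that the correct number of bytes came back from storage.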
Note that there are three subtypes of the object entity detailed in the data dictionary: representation, file, and bitstream. A representation is the set of files, including structural metadata, needed to provide a complete and reasonable rendition of an intellectual entity. For instance, a journal article may be represented by one PDF file; this single file constitutes the representation. Another journal article may be represented by one SGML file and two image files; these three files constitute the representation. A file is a sequence of bits stored as a single unit in a computer system, typically in a file system on disk or magnetic tape. A file may contain a single bitstream or more than one bitstream (for example, when a PDF file embeds images). A bitstream is a contiguous sequence of bits, with a defined starting position and length, that has common attributes for preservation purposes.2 An additional subtype, a “filestream,” was discussed but not included separately in the data dictionary, since the metadata that applies to it has the same characteristics as a file; it is defined as a contiguous bitstream within a file that can be transformed into a stand-alone file conforming to some file format without adding any additional information (headers, etc.) to the bitstream.
The most difficult part of the development of the data model has been to appropriately identify, name, and define these subtypes. These distinctions are important because different semantic units of metadata apply at different levels. The intellectual entity may have an ISBN or technical report number, but the representation does not. The representation may have an identifier known to the preservation repository, but the intellectual entity does not. The file will have a filename and file format, the filestream will have a file format but no filename, and the bitstream will have no filename or file format, although it may have other format characteristics.3
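The last of these distinctions (which levels carry a filename and which a file format) can be written down as a small lookup. This is only a sketch of the rule as stated in the paragraph above; the property names and the dictionary layout are hypothetical and do not come from the data dictionary.

```python
# Which of the two properties discussed above apply at each level.
# Names and structure are illustrative assumptions, not PREMIS semantic units.
APPLICABLE = {
    "file": {"filename", "file_format"},
    "filestream": {"file_format"},   # a format, but no filename
    "bitstream": set(),              # neither, though other format characteristics may apply
}


def inapplicable_properties(level, metadata):
    """Return, in sorted order, the supplied property names that do not apply at `level`."""
    allowed = APPLICABLE[level]
    return sorted(name for name in metadata if name not in allowed)


if __name__ == "__main__":
    print(inapplicable_properties("filestream", {"filename": "fig1.jpg", "file_format": "JPEG"}))
```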
Semantic units associated with object entities include identifiers, environment information (e.g., hardware and software), location information, technical characteristics that apply regardless of format (e.g., fixity, size, significant properties, inhibitors, creating application information), and relationships to other objects. In anticipation of the development of digital format registries, the data dictionary also contains semantics for referencing format registry entries. Similarly, it provides for basic software and hardware environment information and anticipates adding references to future environment registries.
Digital provenance metadata is centered around events that have acted upon objects and is intended to record processes during the period of archival retention. Semantic units include event identifier, event type (e.g., compression, fixity check, migration, validation, etc.), event outcome, event date/time, and related agents (Table 2).
| Semantic unit | EventOutcome |
| Semantic components | None |
| Definition | A high-level categorization of the result of the event. |
| Rationale | A coded way of representing the outcome of an event at a high level can be useful for machine-processing and reporting. If, for example, a fixity check fails, the event record provides both an actionable and a permanent record. |
| Data constraint | Values should be taken from a controlled vocabulary. |
| Examples | 00 [meaning, action successfully completed]; CV-01 [meaning, checksum validated] |
| Repeatability | Not repeatable |
| Obligation | Optional |
| Creation/ Maintenance Notes | Recommended best practice is to use actionable coded values. |
| Usage Notes | More detail about the outcome may be recorded in eventOutcomeDetail. |
Table 2. An example of an entry in the events entity.
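To make the event entity more concrete, here is a minimal sketch of how a repository might record a fixity check with a coded outcome. It is an illustration under stated assumptions, not the PREMIS schema: the class and field names are hypothetical renderings of the semantic units listed above, "CV-01" is taken from the example in Table 2, and "CV-02" is an invented code for a failed check.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

OUTCOME_VALIDATED = "CV-01"   # example value from Table 2: checksum validated
OUTCOME_FAILED = "CV-02"      # hypothetical code, invented for this sketch


@dataclass(frozen=True)
class Event:
    """One digital provenance event; frozen so the record cannot be altered later."""
    event_identifier: str
    event_type: str
    event_date_time: str
    event_outcome: str
    event_outcome_detail: str = ""


def fixity_check(event_id, data, expected_sha256):
    """Compare the digest of `data` with the stored digest and record the result."""
    actual = hashlib.sha256(data).hexdigest()
    ok = actual == expected_sha256
    return Event(
        event_identifier=event_id,
        event_type="fixity check",
        event_date_time=datetime.now(timezone.utc).isoformat(),
        event_outcome=OUTCOME_VALIDATED if ok else OUTCOME_FAILED,
        event_outcome_detail="" if ok else f"expected {expected_sha256}, computed {actual}",
    )
```

Because the outcome is a coded value rather than free text, a report of all failed checks reduces to filtering events on `event_outcome`, while the human-readable explanation stays in the detail field.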
In addition to the data dictionary, the final PREMIS report will provide a review of the methodology followed, a description of the data model and the entities defined, a discussion about issues that were not addressed because of overlap with other efforts, background about specific semantic units that warranted lengthy discussions, examples, and implementation considerations.
Once the data dictionary is completed, a member of the group will write XML schemas to enable the implementation. The survey conducted by the Implementation Strategies subgroup found that over half the respondents were using or planning to use METS in their preservation repositories, so METS-compatible schemas are critical to deploying the emerging PREMIS standard. A digital provenance schema will include semantic units in the events entity, and a general technical metadata schema will include many of those in the objects entity. These could be used as METS extension schemas in the administrative metadata sections of METS objects, specifically the digiprovMD and techMD sections. Additional work will be required to resolve any overlaps between the PREMIS technical metadata, which is format-independent, and any existing format-specific technical metadata schemas, e.g., NISO Metadata for Images in XML Schema (MIX).
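As a rough picture of what such an extension might look like, the sketch below builds a METS administrative metadata section with one technical metadata block and one digital provenance block. The METS namespace and the element names amdSec, techMD, digiprovMD, mdWrap, and xmlData come from the METS schema. Everything inside xmlData is an assumption: the PREMIS schemas had not been written at the time of this article, so the payload namespace and element names here are placeholders.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Placeholder namespace: a stand-in for the not-yet-written PREMIS schemas.
SKETCH_NS = "urn:example:premis-sketch"

ET.register_namespace("mets", METS_NS)


def _wrap(parent, section, section_id, payload):
    """Wrap `payload` in <section><mdWrap><xmlData> under the given amdSec."""
    section_el = ET.SubElement(parent, f"{{{METS_NS}}}{section}", {"ID": section_id})
    md_wrap = ET.SubElement(
        section_el,
        f"{{{METS_NS}}}mdWrap",
        {"MDTYPE": "OTHER", "OTHERMDTYPE": "PREMIS-SKETCH"},  # label is hypothetical
    )
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(payload)
    return section_el


def build_amd_sec(size_bytes, event_outcome):
    """Return an amdSec holding object metadata in techMD and an event in digiprovMD."""
    amd_sec = ET.Element(f"{{{METS_NS}}}amdSec")

    obj = ET.Element(f"{{{SKETCH_NS}}}object")
    ET.SubElement(obj, f"{{{SKETCH_NS}}}size").text = str(size_bytes)
    _wrap(amd_sec, "techMD", "TMD1", obj)

    event = ET.Element(f"{{{SKETCH_NS}}}event")
    ET.SubElement(event, f"{{{SKETCH_NS}}}eventOutcome").text = event_outcome
    _wrap(amd_sec, "digiprovMD", "DP1", event)
    return amd_sec


if __name__ == "__main__":
    print(ET.tostring(build_amd_sec(2038927, "00"), encoding="unicode"))
```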
Opportunities for establishing testbeds for the PREMIS elements are under discussion. Many members of the group are planning to experiment within their own repository applications. It is expected that experimentation will also involve the exchange of information packages including both content data objects and PREMIS-compliant preservation metadata. Plans also include setting up a forum for public comment, and ultimately, some form of maintenance activity for the data dictionary following its release.
The full report and data dictionary will be accessible once completed at the PREMIS web site: http://www.oclc.org/research/projects/pmwg/.
Notes
1 Information about the committee developing this standard can be found at: http://www.niso.org/committees/committee_au.html.
2 These definitions will be in the final report of the PREMIS working group, to be available in early 2005.
3 Priscilla Caplan and Rebecca Guenther, Practical Preservation: the PREMIS Experience, Library Trends (forthcoming).

Highlighted Web Site
 SourceForge

SourceForge: A source for open source digital preservation tools
More than just a Web site, SourceForge.net is an online interface to a vast community of open source software developers. It offers a gateway to resources and services and provides free space for interested parties to host, discuss, and manage open source software. We chose to highlight it because software tools and suites developed as part of digital library and preservation research and implementation projects are increasingly being made available through the site. There is no area set aside on the site specifically for these projects (yet), and searching SourceForge requires a careful choice of search terms (be sure to check "Require All Words" for multiterm searches). Terms for specific standards, like "OAI" and "Dublin Core", produce well-targeted lists. "Metadata", "digital library", and "archiving", on the other hand, bring up long lists, only a minority of which are of interest.
A sample of developments from digital library and preservation projects you can find on the site includes:
- Xena
A tool, developed by the National Archives of Australia, that normalizes file formats into XML. Note: Another development to watch for is their Digital Preservation Recorder, which tracks preservation actions on preserved digital objects.
SourceForge allows users to check on upcoming software functionality, follow development news, contact a development team to provide input or feedback, and preview a trove of nascent, open source applications, utilities, tools, and source code.
SourceForge is part of the OSTG [Open Source Technology Group] network, which also produces Slashdot.org, NewsForge.com, and ITManagersJournal.com.

FAQ
 One Last Spin: Floppy Disks Head Toward Retirement
Author: Richard Entlich - Cornell University (rge1@cornell.edu)

It appears the floppy disk is going the way of the long playing record and the rotary dial telephone. Is there any cause for concern?
Stories about the death of the floppy disk have been coming with increasing frequency in recent years and there are clear signs that the venerable floppy is on its way out. Apple introduced its first floppy-free design (the original iMac) in 1998 and not long afterward eliminated floppy drives from all its systems. Most laptop manufacturers removed floppy drives as standard equipment years ago. Dell announced in early 2003 that its high-end desktops would no longer include floppy drives and other manufacturers have made similar moves. Widespread availability of high-speed networks, CD writers, USB keys, and other data transfer and storage technologies — along with the floppy's paltry capacity in a world of huge multi-media files — have dramatically lessened dependence on this once essential technology.
All technological change produces a certain amount of dislocation. Floppy disks have been around in one form or another for over 30 years and have already caused their share of heartache. The length, depth, and nature of one's relationship with this technology will determine whether its decline will be greeted with panic, concern, a shoulder shrug, or a cry of joy. Read on to see whether you're ready to say a painless goodbye to the floppy.
Background Those who are familiar with the history of small computers since their advent in the 1970s know that floppy disk technology has come in three major waves. In 1971, IBM introduced an 8" variety, initially as a data loading peripheral for its System/370 mainframe computers. Five years later, in 1976, Shugart Associates brought out a 5.25" version, dubbed the minifloppy, that became the standard in IBM PCs and their many clones during the early to mid-1980s.
In 1981, Sony debuted the 3.5" floppy that was adopted by Apple for its Macintosh computer in 1984 and gradually became the predominant form of floppy, displacing all other designs. The 3.5" floppy 1 (also called a microfloppy) became ubiquitous not only on computers, but other electronic equipment as varied as music keyboards, laboratory instrumentation, and medical diagnostic devices.
Many other floppy designs were proposed, in a range of sizes from two to four inches. Few had any real market impact, but the 3" floppy designed by Hitachi was utilized in several mid-1980s computers, most notably some models by Amstrad. We can be thankful, however, that "only" three major size variants of floppies exist, for there is no shortage of technical usage issues among them.
Design and Usage Issues Many people have never seen an 8" floppy disk, but if you've seen a 5.25" one, you can easily picture its larger forebear. Besides size, the two were essentially identical in appearance. Both consisted of a round flexible plastic disk with a magnetic oxide coating, encased in a somewhat stiffer (but still eminently bendable) square plastic envelope. To enable the read/write heads of the drive to access the media, oval holes were cut in the outer envelope. This also exposed the sensitive media to dust, hair, and the hazards of human handling, especially since the disks often got separated from the paper (or Tyvek) sleeves they were intended to be stored in when not inside a drive mechanism. Thus, both sizes were vulnerable to bending and creasing as well as to contamination and damage to the oxide coating of the media.
The 3.5" floppy represented a major advance in engineering design. The vulnerable magnetic media was encased in a rigid plastic shell, and a metal shutter, meant to be opened only when the disk was in use, replaced the permanent opening of the earlier types. Slightly oblong dimensions (the 3.5" floppy is slightly longer than it is wide), along with a strategically placed notch in one corner, foil any attempt to insert an incorrectly oriented disk into a drive (a naive user of the older types had a seven out of eight chance of choosing the wrong orientation). The write protect mechanism uses a captive plastic slide, rather than requiring an adhesive label that could fall off inside the drive and gum up the works.
Intrinsically, the 3.5" floppy provided much better protection for fragile data than its earlier cousins. However, the look of durability, along with the convenient pocket size, resulted in less ginger treatment that may have negated the advantage. Microfloppies are not sealed units (there is an opening around the hub on the bottom), and an unsheathed disk in a pocket or pocketbook is subject to contamination and damage. Thus, although a well-cared-for floppy can maintain its data for decades, users have typically found them to be unreliable and short-lived.
Compatibility Issues The first 10-15 years of the floppy disk coincided with (and to no small degree, helped make possible) the explosive growth phase of the early microcomputer industry. Dozens of companies offered competing designs, at the same time that incremental improvements in floppy disk drive design were a regular occurrence. The result was an almost complete lack of standardization amongst floppy disks.
Early successful "home" computers, such as the Apple II, Commodore 64, and Radio Shack TRS-80, were available with minifloppy drives, as were dozens of less well-known systems using the CP/M operating system. Although they all used identical-looking 5.25" floppy media, hundreds of incompatible variants emerged. Some used single-sided media, others double-sided. Some used so-called soft-sectored disks, while others required hard-sectored disks 2. They also employed several different techniques for encoding the data, organizing file directories, and laying out the invisible tracks and sectors that held that data. For the most part, these systems could not read each other's data.
From the early 1980s onward, variability lessened considerably. The minifloppy disk market eventually settled on two major flavors: 360KB double-sided double-density (DS/DD) and 1.2MB double-sided high-density (DS/HD). These types were used almost exclusively on IBM PCs and their clones. The high-density drives can ostensibly read the low-density disks, but cannot be used to write to low-density disks without risking their continued usability in low-density drives. High-density media cannot be formatted for use in low-density drives.
The microfloppy market has also been dominated by two formats. On the PC side, there are 720KB DS/DD and 1.44MB DS/HD disks. There was also a 360KB single-sided and a 2.88MB extra high density type, but neither is very common. Most Macintoshes used either 800KB DS/DD or 1.44MB DS/HD disks, but the earliest models used 400KB single-sided disks. The 400KB and 800KB Macintosh disks used a different encoding scheme that is fundamentally incompatible with the encoding on PCs. Thus, older Mac floppies cannot be read on PC drives without special controller hardware. Macintosh floppy drives after about 1988 support both types of encoding and can read and write a variety of Mac and PC formats. However, even new style Mac DS/HD disks cannot be read on PCs without special software.
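The PC capacities quoted in the last two paragraphs follow directly from disk geometry: tracks per side, times sides, times sectors per track, times bytes per sector. The sketch below works through that arithmetic. The geometry figures are the standard PC values and are supplied here as background; they do not appear in the article itself. The Macintosh 400KB and 800KB formats are left out because their encoding does not use a fixed number of sectors per track.

```python
SECTOR_BYTES = 512  # standard sector size on PC floppies

# (tracks per side, sides, sectors per track) for the PC formats named above.
# Geometry values are supplied for illustration; they are not from the article.
FORMATS = {
    '5.25" DS/DD 360KB': (40, 2, 9),
    '5.25" DS/HD 1.2MB': (80, 2, 15),
    '3.5" DS/DD 720KB': (80, 2, 9),
    '3.5" DS/HD 1.44MB': (80, 2, 18),
    '3.5" ED 2.88MB': (80, 2, 36),
}


def capacity_bytes(tracks, sides, sectors_per_track, sector_bytes=SECTOR_BYTES):
    """Formatted capacity in bytes for a fixed-geometry floppy format."""
    return tracks * sides * sectors_per_track * sector_bytes


if __name__ == "__main__":
    for name, geometry in FORMATS.items():
        total = capacity_bytes(*geometry)
        print(f"{name}: {total} bytes = {total // 1024} x 1024")
```

Note that the familiar "1.44MB" label is a mixed unit: the disk holds 1,440 blocks of 1,024 bytes, which is 1,474,560 bytes.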
A variety of hardware problems can also affect compatibility of floppy drives and their media. Unlike modern hard drives, which have a mechanism to keep the heads properly aligned over the data tracks, most floppy drives cannot find the data tracks other than by relative position. A sensor tells the drive where track zero is and the drive then counts steps until it arrives at where the data should be. If, however, it started in the wrong position because the track zero sensor is out of alignment, it may not be able to read the data. This kind of problem accounts for disks that can be read on the drive that wrote them, but not on other drives of exactly the same kind. Other problems can be caused by speed fluctuation (sometimes resulting from an accumulation of dust in the mechanism), improper head angle, disk or motor spindle eccentricity, and use of poor quality media.
Why the Floppy has Endured Given its fragility and myriad other problems, one could be forgiven for wondering why this class of media has stuck around for so long. The simple answer is that, despite all their warts, floppies have many desirable characteristics, and no one competing technology has come along that surpasses the floppy in all areas. Among the microfloppy's most endearing qualities:
- small size—takes up little desk, drawer or shelf space
- easily transportable—fits in pocket; can be mailed without excessive packaging
- inexpensive—can be given away without a second thought
- foolproof insertion
- nearly ubiquitous drive availability—more widely available than any other removable, read/write storage technology, especially in a global context
- standardized data format and capacity (especially in the last 10 years)
- transparent to use—rewritable, and handles files and directories just like a hard disk (unlike CD-R or CD-RW)
- accessible from front of computer (unlike USB ports for memory keys on many machines)
- bootable
Certainly, there are many removable media technologies that offer some of the above benefits, along with others such as far greater capacity, faster access, and better reliability. But not one of those technologies offers all of the floppy's advantages on as large a proportion of the current installed base of computers.
Ultimately, it is the floppy's limited storage capacity that has spelled its doom. Even users who would prefer to use it to move and share files may have no option but to learn a different way because the content is too large to fit. But as discussions on Slashdot and ZDNet clearly indicate, the floppy's demise is causing distress even among some technically savvy users. Common areas of concern include:
- Performing BIOS upgrades (requires bootable media other than the hard drive)
- Installing drivers for SATA drives or RAID (sometimes installable from floppy only)
- Performing rescue and recovery operations on systems with problem hard drives or CD drives
- Working with systems where CD drives are absent, non-functional, or read-only
- Dealing with service bureaus that supply data on floppy disk
- Exchanging data with users in countries where removable media other than floppies is less common
- Exchanging data for children whose school systems supply floppies for home use
There are workarounds in some of these situations, but in most cases, they require procedures that are more complex, more time consuming, and/or more expensive.
Salvaging Data Still on Floppy Disks Some of the situations described above may be annoying and inconvenient, but for those who are still maintaining valuable data on floppy disks, especially on types other than microfloppies, modestly loud alarm bells should be starting to ring. For a variety of technical and commercial reasons, it is becoming increasingly difficult to salvage data from old floppies.
Hardware is becoming harder to find. Eight-inch floppy drives on functioning systems are extremely rare these days, outside of service bureaus that specialize in recovery of data from obsolete media. Though it is still possible to find 8" floppy drives for sale from specialty retailers and on eBay, finding a compatible system may be difficult, and hooking up the drives is most assuredly not a plug-and-play operation.
Even the once common 360KB and 1.2MB 5.25" drives are becoming uncommon on functioning systems. The drives themselves are still widely available (mostly used), but they generally cannot be hooked up to modern PCs because the systems lack the proper cabling, controller support, internal drive bay, and/or BIOS support. Hardware for more unusual minifloppy types, such as hard-sectored disks, is becoming much harder to find and get operational.
Software tools are disappearing or only run on old systems. In the aftermath of the demise of many older systems, software tools were developed to allow (where technically possible) newer systems to read disks produced by those systems. For example, there used to be several companies that made software to allow pre-IBM PC minifloppy formats (especially CP/M formats such as Osborne and Kaypro) to be read on a standard PC minifloppy drive. Due to low demand, however, these products are no longer marketed, and even if one can find a copy, they haven't been updated in years and only run in a real DOS environment, which is no longer available under Windows 2000 or XP.
Old media is no longer made. Some transfer from old media can be accomplished through the use of "bridge" systems. In these situations, an old format that is still readable on modern machines (through backwards compatibility) serves as an intermediate conversion target from an even older, completely obsolete type. It may be possible to find a functioning machine from the right era that combines both technologies. However, success in using this technique requires the availability of the bridge media, which may no longer be available.
Copying the files may not be sufficient to salvage them. In some cases, the arrangement of files on a floppy disk is important, or they are expected to be on a floppy disk and not on other media. Copying them to a hard drive may render them unusable. This problem can often be overcome by creating a disk image, a file that retains all the characteristics of the original disk and is treated as such by the system.
Expertise in the use of old hardware is becoming less available. The knowledge necessary to work with obsolete hardware is a dwindling commodity. Technical support personnel are often young and lack familiarity with hardware and media that were already obsolete when they started learning their trade.
Practical Responses to the Floppy's Demise There is no denying the facts. Whether it takes two years or five or ten, all floppy disk formats will be obsolete within the foreseeable future. Any data stored solely on 8" or 5.25" floppies should be considered endangered, not only because of the technical and commercial obstacles already discussed, but because the media itself can fail over time (this is more likely with single and double density media which used lower level magnetic fields to record the data than high density media) and because the data is likely to have been created using software applications that are now obsolete themselves.
For institutions looking to migrate data from large numbers of older style floppies, it may be cost-effective to set up an old system that can support the older style disks. The case studies cited in the resources section below provide a sense of the pitfalls one might encounter in a large-scale migration effort. Otherwise, there are numerous commercial firms (see resources below or try your favorite search engine) offering media migration, data conversion, and data recovery services. Costs vary considerably depending on the precise needs. Simple migration of standard format 5.25" disks can cost anywhere from $5-$50 per disk and up, not including shipping or handling. Data conversion will add to the cost, and data recovery (e.g., from damaged or corrupted media) is substantially more expensive. Migration from more obscure floppy types and file formats is also considerably more expensive. It should be noted that the data on even badly damaged floppies can often be fully recovered, though heroic rescue is quite expensive and should be reserved only for especially valuable data.
Due to the sheer number of compatible systems still around, data stored on PC microfloppies, if properly stored, is probably not immediately threatened with loss. However, even though there are plenty of older Macintosh systems with internal or external USB floppy drives still around, it is certainly time to start moving data on Mac floppies to other media. In fact, all 400KB and 800KB Macintosh microfloppies written using the MFS file system are already in greater danger, since support for that file system was last provided in MacOS v.7.5.5, which was superseded in 1997 and definitely qualifies as obsolete.
For data that must remain on a floppy in order to be usable, disk images will help extend the life of the data while allowing migration of the media. Windows, MacOS Classic, and Unix/Linux all have utilities available for creating and reading floppy disk images. See the resource listings for further information.
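The idea behind a disk image is simple: copy every sector, in order, into a single file, so that layout is preserved along with the files. The following is a minimal sketch of that idea rather than a substitute for the utilities mentioned above. It assumes the floppy has already been opened as a binary stream (how to do that differs by operating system and is not shown), assumes 512-byte sectors, and makes no attempt to recover from read errors, which real imaging tools must handle.

```python
import hashlib
import io

SECTOR_BYTES = 512  # assumption: the usual sector size on PC floppies


def make_image(source, image, sector_bytes=SECTOR_BYTES):
    """Copy `source` to `image` one sector at a time.

    Both arguments are binary file-like objects. Returns the number of
    sectors copied and a SHA-256 digest of the image for later fixity checks.
    """
    digest = hashlib.sha256()
    sectors = 0
    while True:
        block = source.read(sector_bytes)
        if not block:
            break
        if len(block) != sector_bytes:
            # A partial sector means the read stopped early; do not pass it off as complete.
            raise ValueError("source ended mid-sector; the image would be incomplete")
        image.write(block)
        digest.update(block)
        sectors += 1
    return sectors, digest.hexdigest()


if __name__ == "__main__":
    fake_disk = io.BytesIO(bytes(1474560))   # stand-in for a blank 1.44MB floppy
    out = io.BytesIO()
    count, checksum = make_image(fake_disk, out)
    print(count, checksum)
```

Recording the digest at the time of imaging gives later custodians a way to confirm that the image has not changed as it moves from one storage medium to the next.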
Conclusions The history of the floppy disk provides an excellent object lesson in the management of content on digital media. All media is subject to obsolescence. Media types that are less mainstream, less standardized, and less widely adopted are hit harder and faster by obsolescence. If one is paying any attention, there is generally ample opportunity to recognize that a medium is headed for obsolescence at a time when migration can be accomplished relatively painlessly and inexpensively. Of course, it goes without saying that no data of value should be stored on only a single piece of media. Placing backup copies on different kinds of media diminishes the likelihood of loss.
Most physical media formats go through a period of evolution before reaching a dead end. Typically, the storage capacity is increased one or more times and it is often the case that later generation drives are backward compatible with one or even two generations of previous capacities. It's easy to be lulled into complacency by backwards compatibility, but once a media capacity is no longer the current specification, it is time to consider migrating data to newer formats.
A dead end occurs when backwards compatibility is no longer available. All three common sizes of floppy disks should now be considered to be at a dead end, including the still popular microfloppy. There is no forward migration path that allows data to remain safely on microfloppies. In the late 1990s, there were several efforts to develop high-capacity removable media drives that could also read standard 3.5" floppies. Notable among these were the Imation SuperDisk and the Sony HiFD. Neither survived in the marketplace.
What will replace the floppy? None of the dozens of proprietary, higher-capacity, removable, magnetic media devices that have emerged in the past ten years can lay claim to the crown. Some contenders, like those from SyQuest and Iomega's Jaz, were discontinued. Others, like the Castlewood ORB, have tiny installed bases. Even Iomega's successful Zip drive has nothing like the huge installed base of microfloppy drives, and Zip disks exist in three different capacities, meaning that owners of the older drives cannot read the newer disks. Finally, the media for all of these drives (and for USB key drives where the media and drive are one unit) is too expensive to be parted with casually.
Only CD drives and media, which like the microfloppy are based on international standards, come anywhere close to matching the floppy's universality and low-cost media. But CDs lack many of the microfloppy's most desirable characteristics. Thus, we can expect that users will gravitate to a variety of floppy replacements that best meet their specific needs. A survey of libraries conducted in 2001 by Hendricks and Wang 3 found no consensus on a target technology to replace floppies and concluded that " ... libraries, (especially academic) do not have a plan to cope with the forecast obsolescence of the 3.5in floppy disk."
A recent article 4 from The New York Times suggests that the problem of preserving data on obsolete storage media will be solved by for-profit computer museums. It posits the emergence of “a whole industry of people who will have shops of old machines, like the original Mac Plus.” This seems unlikely, given the enormous technical obstacles to such an endeavor, along with its doubtful commercial viability. Rather than relying on “Ye Olde Antique Computer Shoppe” to save the day, long-term digital stewardship requires vigilant monitoring of new developments in storage technology and file formats along with diligent and timely responses to those changes.
Floppy Disk Resources
History and Technical Details
Wikipedia: Floppy Disk
Floppy Disk Drive Primer Accurite
Upgrading & Repairing PCs, Eighth Edition, Chapter 13 Floppy Disk Drives.
Migration and Data Recovery: Case Studies
"Excavating Data: Retrieving the Newham Archive," Arts and Humanities Data Service (UK). [Describes efforts to preserve data from 239 hastily created and poorly documented 3.5" floppy disks.]
UCSD GPO Data Migration Project [Detailed procedural information from a project to migrate media and files from 5.25" Microsoft DOS floppy disks.]
Van Bogart, Dr. John W. C. and John Merz, eds., "St. Thomas Electronic Records Disaster Recovery Effort," National Media Lab Technical Report RE0025, November 1995. [Section 4.2.2 discusses recovery of data from rain and seawater damaged microfloppies.]
Webb, Colin and Deborah Woodyard, "Migration Trials: From Floppy Disk to Recordable Compact Disc (CD-R)," LASIE (Library Automated Systems Information Exchange), v.29, no.3, September 1998, pp. 27-32. [Describes a trial to migrate an assortment of 66 mini- and microfloppies to CD-R. The floppies covered a range of characteristics including operating systems, hardware requirements, software requirements, and age.]
Migration and Data Recovery: Service Providers
Yahoo Directory: Data Conversion
Yahoo Directory: Data Recovery
Care and Handling
Dawson, Bill, "How to recover data from an improperly stored floppy diskette."
Imation care and handling recommendations for floppy disks
Floppy Disk Images
Disk Images and ShrinkWrap
Rawrite and related (floppy) disk imaging programs [Provides useful listings for all major operating systems except MacOS]
Notes
1. Though commonly referred to as 3.5" disks, ISO/IEC standards for microfloppies specify their size in metric units (90mm x 94mm).
2. Soft-sectored disks used a single index hole to locate the origin for sector counting, while hard-sectored disks had one hole for each sector.
3. Hendricks, Arthur, and Jian Wang, "Libraries and desktop storage options: results of a Web-based survey," Library Hi-Tech, v.20, no.3, 2002, pp. 270-284.
4. Hafner, Katie, "Digital Memories, Piling Up, May Prove Fleeting," The New York Times, Wednesday November 10, 2004, Late Edition - Final, Section A, Page 1, Column 3.
Calendar of Events
Multilingual Computing and Information Management in Networked Digital Environment: CALIBER 2005 February 2 – 4, 2005 Kochi, Kerala, India
The annual Convention on Automation of Libraries in Education and Research Institutions (CALIBER) is sponsored by the INFLIBNET Centre. The 2005 theme is "Multilingual Computing and Information Management in Digital Networked Environment" with subthemes including archiving, digitization, preservation, interoperability, and OAI-PMH.
ECURE 2005: Preservation and Access for Electronic College and University Records February 28 – March 2, 2005 Tempe, Arizona
Arizona State University will host this interdisciplinary meeting focusing on technical, policy, and training issues important for managing the electronic records of higher education institutions.
EVA 2005 Florence: Digital Imaging & the Electronic Arts March 14 – 18, 2005 Palazzo Degli Affari, Florence, Italy
This event includes a conference, workshops, and training sessions centered around electronic imaging and the visual arts. Target audiences include professionals from cultural heritage, IT, media, government, and media research sectors.
Computers in Libraries March 16 – 18, 2005 Washington, DC
The Computers in Libraries conference and exhibition celebrates its 20th anniversary in 2005. This event focuses on technology for information-age librarians, and includes multi-track conference sessions, keynote speeches, workshops, and an exhibition.
Association of Recorded Sound Collections Conference 2005 March 30 – April 2, 2005 Austin, Texas
The non-profit Association of Recorded Sound Collections sponsors this meeting each year. ARSC is dedicated to all facets of collecting and preserving sound recordings as a part of cultural heritage.
Museums and the Web 2005 April 13 – 16, 2005 Vancouver, British Columbia, Canada
This annual conference uses a variety of presentation and networking formats to review, analyze, and discuss social, design, technological, economic, organizational, and cultural issues of the on-line presence of culture and heritage.
Society for Imaging Science & Technology Archiving Conference 2005 April 26 – 29, 2005 Washington, D.C.
“Introduction to Archiving” is the topic of this conference slated to discuss and benchmark the systems currently in place for archiving digital (as well as hard copy) materials, and to uncover inadequacies and directions for innovation and research.
The International Association for Social Science Information Service and Technology (IASSIST) Conference May 24 – 27, 2005 Edinburgh, Scotland
Proposals for papers, sessions and poster/demonstrations will be accepted through January 10, 2005 for the joint conference of IASSIST, the International Association for Social Science Information Service & Technology, and IFDO, the International Federation of Data Organisations. The theme for 2005 will be “Evidence and Enlightenment.” Submissions focusing on data access, data documentation, data preservation, data use, and current research activity are especially encouraged.
Libraries in the Digital Age (LIDA) 2005 May 30 – June 3, 2005 Dubrovnik and Mljet, Croatia
Paper submissions will be accepted through January 10, 2005 for this conference. Preliminary conference themes are:
- What can digital libraries do that traditional cannot? Or do in addition?
- Building a small digital library and digital library network.
Announcements
DescribeThis metadata generator Sand’s Dublin Core Services has launched a beta version of DescribeThis, a web-based Dublin Core metadata generator. The website interface offers an easy means to automatically extract metadata from a variety of internet-based resources.
The National Library of the Netherlands to preserve Oxford Journals’ archive Oxford University Press’ Oxford Journals group has negotiated a plan for deposit and long-term preservation of the Journals’ archive of 184 scholarly digital journals with the Netherlands’ Koninklijke Bibliotheek (National Library).
Science Commons A new branch of Creative Commons, named Science Commons, will launch in January 2005. Building from Creative Commons’ work toward expanded options for open access to creative materials, the mission of Science Commons is “to encourage scientific innovation by making it easier for scientists, universities, and industries to use literature, data, and other scientific intellectual property and to share their knowledge with others.”
Digital Curation Centre is formally launched The UK’s Digital Curation Centre is a collaborative initiative to provide means for managing and preserving the digital information and data produced by researchers and scientists throughout the UK. Under a distributed leadership, the Digital Curation Centre will develop and foster programs and services to support coordinated and effective efforts by digital curators across institutions.
ECAR releases “Study of Students and Information Technology” survey The EDUCAUSE Center for Applied Research (ECAR) has released a survey of freshman and senior undergraduates focusing on the roles of IT in student life. The ECAR Study of Students and Information Technology, 2004: Convenience, Connection, and Control includes data from 4,500 survey respondents as well as focus group and interview feedback from small samples of students and administrators.
A Framework of Guidance for Building Good Digital Collections The National Information Standards Organization has released the second edition of A Framework of Guidance for Building Good Digital Collections. This edition builds on the first edition, which was released in 2001 under sponsorship of IMLS.
Open WorldCat OCLC has announced that Open WorldCat has concluded its pilot phase and that most of the 57 million WorldCat database records can now be included in search results from search engines such as Google and Yahoo! A “Find it in a library” link will direct web surfers to resources related to their search that are available in an OCLC member library near a user-supplied zip code.
JISC Joins the DLF The United Kingdom’s Joint Information Systems Committee (JISC) has joined the Digital Library Federation (DLF), becoming the first DLF “Ally” from outside the United States. The DLF consortium provides “leadership for libraries by: identifying standards and ‘best practices’ for digital collections and network access; coordinating research and development in libraries' use of technology; and incubating projects and services that libraries need but cannot develop individually.”
SEPIADES Model for Photo Cataloguing The SEPIA (Safeguarding European Photographic Images for Access) project has released an advisory report for cataloguing photographic collections. The report presents a model, called SEPIA Data Element Set (SEPIADES), developed to define a wide range of suggested elements to describe digital and analogue photographic items and collections. In tandem with the model, the SEPIA working group has released an open-source software tool to implement the SEPIADES model.
National Archives Standard for Record Repositories The National Archives, UK, has announced the publication of The National Archives Standard for record repositories. The new National Archives Standard provides best practices for operating records repositories, covering topics such as storage, public access, cataloguing, and preservation.
A 21st Century Digital Information Factory The US Government Printing Office has published a new document, A Strategic Vision for the 21st Century, outlining important transformations to its operations, especially in terms of managing digital documents.