RLG
 Feature Article 2  

MIC (Moving Image Collections)

Author: Jane D. Johnson - Library of Congress (jjohnson@loc.gov)

An Innovative Partnership

MIC (Moving Image Collections, pronounced “mike”) is the product of an innovative partnership between the Library of Congress (LC) and the Association of Moving Image Archivists (AMIA).[1] Emerging from the National Moving Image Preservation Plans,[2] MIC began as a preservation initiative. It has since evolved to demonstrate that the practical requirements of preserving analog artifacts can provide the foundation for an advanced R&D platform. MIC reaches an audience beyond archivists by exploiting current developments in non-textual indexing, digital rights management, and educational use, while still meeting the daily needs of archivists with informational resources and support for collaborative preservation, access, digitization, exhibition, and metadata initiatives. MIC has served as a model and building block for similar initiatives, including the Women Artists Archives National Directory (WAAND)[3] and the New Jersey Digital Highway.[4] Grace Agnew, Associate University Librarian for Digital Library Systems for Rutgers University Libraries, is MIC’s architect. Rutgers University Libraries is the lead developer, working with the University of Washington and Georgia Institute of Technology. Project management is provided by the Library of Congress, which will serve as the permanent host site.

The MIC website integrates a Union Catalog, an Archive Directory, and informational resources within a portal structure that delivers customized information on archival moving images, their preservation, and the images themselves. The Archive Directory was originally envisioned as a complement to the Union Catalog, providing collection descriptions at the repository level. Within the Union Catalog, users are able to limit searches to digital video that can be downloaded or streamed. A custom mapping utility facilitates participation in MIC’s Union Catalog by allowing other archives to map their own local metadata schema into the MIC core registry for import. A cataloging utility is in the works that will enable archives to input records directly into the Union Catalog—records incorporating not just descriptive metadata, but all the types of metadata required to manage a resource through its entire lifecycle.

A Tool for Preservation

Beginning in 1994, the Library of Congress published two documents detailing the crisis in film preservation. Redefining Film Preservation (1994), mandated by the US Congress as part of the National Film Preservation Act of 1992, was the first national moving image preservation plan. Three years later it was complemented by a follow-up plan entitled Television and Video Preservation 1997. MIC began when the Library of Congress turned to AMIA for assistance in developing strategies to implement the numerous recommendations included in the two preservation plans. AMIA identified the first and most crucial step in any preservation solution: a standardized way to identify holdings, particularly unique titles, so that strategic planning and collaborative decision making could occur.

Through MIC’s Union Catalog, archivists can identify past preservation work and emerging critical need, thereby reducing duplication of effort and preventing loss through deterioration, while ensuring that titles are preserved from the best surviving footage. On the public side, MIC seeks to raise awareness about preservation issues and risks to our film, television, and video heritage by enlightening readers about methods to care for home collections, the role of archives, and the preservation process. MIC’s informational resources, numbering in the hundreds, have been created and gathered by experts within AMIA, working in accordance with that organization’s educational mission. The Library of Congress’ role is to provide the management and technical infrastructure for MIC’s long-term maintenance and ongoing development. This distinctive partnership between a professional association and the National Library maximizes the strengths of both organizations to make a total contribution to the field beyond the sum of its parts.

MIC is committed to helping underfunded archives with tools and standards. Any organization holding archival moving images can participate in MIC. What are “archival moving images?” Moving images can be film, video, or digital files. Audio recordings associated with moving images, such as soundtracks, are also within MIC’s scope. Archival moving images are defined as those intended to be kept for future generations, regardless of their age at the time of acquisition. MIC is documenting moving images all over the world, whether held by organizations or individuals, and wherever they reside: in corporations, museums, historical societies, and motion picture studios.

That said, MIC was meant to be extensible. In fact, the acronym was chosen to also stand for “Media In Collections,” anticipating the future inclusion of sound recordings of all kinds. MIC’s architecture has also been carefully designed to allow expansion from a catalog and directory to a cataloging utility with full asset and digital rights management implementations as well as low-level indexing capability. It can be viewed as a prototype for all types of materials.

A Tool for Education

MIC was never envisioned solely as a tool for archivists. Early on, we knew that access for the public and for educators was the key to a sustainable preservation strategy. MIC’s mission is to immerse moving images into the education mainstream, recognizing that what society uses, it values; what it values, it preserves. This mission is addressed in three ways: by providing access to more moving images, by enabling greater processing of those images, and by facilitating digital rights management to expand use of moving images, including those outside the public domain.

Very recently Google announced its plans to partner with the National Archives and Records Administration (NARA) to make NARA’s public domain film holdings available on the Web for free.[5] For moving images, this ready browsability alone is a huge advance over traditional means of access. Nonetheless, some have criticized Google for its non-disclosure policy and speculated that Google is starting, at least, with the “low-hanging fruit” of video (as opposed to digitized film).[6] Often Google relies on extant text (such as closed captions) for retrieval.[7] MIC also allows users to search across numerous archives and limit results to digital video. In contrast to Google, however, MIC exploits the metadata that libraries are already creating for their own patrons, for all materials. This substantially increases search precision and leverages the labor that libraries will continue to employ. MIC allows users to search all moving images in its catalog, whether digitized or not, putting these resources at the fingertips of educators and students. By aggregating records from many moving image collections and by providing access to moving images exclusively, MIC provides services beyond those traditionally provided by libraries and archives.

As Don Waters has pointed out, digital materials can provide greater functionality for teaching and research than legacy analog resources.[8] “What unites our interest in digitization and open access in a digital world is that the material becomes ‘processable,’ or subject to computational processing.”[9] The Union Catalog’s MPEG-7 export capability enables segmenting and low-level indexing to facilitate this type of processing for indexing and creation of learning objects.

MIC can also advance educational use of moving images by making full digital rights management capability possible. A digital rights management implementation would allow teachers and educators to incorporate moving images and sound recordings into the classroom and electronic scholarly publications. Development of standards and practices that support broad, non-infringing use of digital multimedia requires incorporation of rights metadata into MIC’s planned cataloging utility and an expansion of MIC’s LDAP directory architecture to authoritatively identify rights holders. The LDAP directory is designed to interact with both the international authority work spearheaded by the Library of Congress and International Federation of Library Associations (IFLA) and the directory-based authentication and access initiatives spearheaded by Internet2, such as Shibboleth and the emerging eduPerson directory standard. A community collaboration that engages both nonprofit and for-profit moving image archives is important for demonstrating that both communities can come together to promote non-infringing use of digital resources for education.

Finally, it’s not just teachers that license footage for educational purposes. News and documentary production, for better or for worse, is largely driven by the availability of topical footage. Increased availability of moving images and expanded licensing opportunities promise to expose the general public to new ideas and previously unknown or underreported events via televised documentaries and theatrical releases. As an added bonus, historically underfunded repositories could use this licensing mechanism to generate revenue to better support their organizational missions and preservation activities.

Many Routes to Multi-Functional Components

To provide this multi-faceted functionality, we have developed MIC incrementally. The Archive Directory was built first, then the Union Catalog, now a mapping utility, and next a cataloging utility. MIC is designed to be a flexible, extensible, and interoperable tool. It relies on open-source software and is easy to maintain and use. MIC offers a number of services to archives and features components with multiple functions and multiple avenues for archives to participate and collaborate.

For example, while Web directories typically serve merely as pointers to participating organizations’ websites,[10] the MIC directory can actually take users to the organization’s records in the MIC Union Catalog, or directly to the organization’s own catalog, or to its website. A feature of MIC is the integration of Archive Directory and Union Catalog databases. This integration allows end users to limit Union Catalog searches to a single archive or a selection of archives. Once a Union Catalog record is retrieved, the system queries the LDAP directory database and displays information about obtaining the resource with the bibliographic record itself. This information, specific to the organization holding the title, is pulled from the Archive Directory entry. This functionality was crucial for obtaining buy-in from the archival moving image community. Much of what is held by moving image archives is inaccessible. Film may be on negative or other pre-print stock, or it may be unique. Video materials frequently reside on obsolete formats for which viewing equipment is no longer available. In field-wide planning interviews conducted by Grace Agnew, archives rightly insisted that organization-specific access policies be prominently displayed with each record.
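The integration of the Archive Directory and the Union Catalog described above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the LDAP directory, and the record fields, organization identifiers, and function names are hypothetical rather than taken from MIC.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Stand-in for one Archive Directory entry (hypothetical fields)."""
    org_id: str
    name: str
    access_policy: str


@dataclass
class CatalogRecord:
    """Stand-in for one Union Catalog record (hypothetical fields)."""
    title: str
    org_id: str


# Assumption: a plain dict plays the role of the LDAP directory.
DIRECTORY = {
    "arch-001": DirectoryEntry(
        "arch-001",
        "Example Film Archive",
        "On-site viewing by appointment; no loans.",
    ),
}

CATALOG = [
    CatalogRecord("Harbor Newsreel", "arch-001"),
    CatalogRecord("Campus Parade", "arch-002"),
]


def search(catalog, term, org_ids=None):
    """Title search, optionally limited to a selection of archives."""
    term = term.lower()
    return [
        record
        for record in catalog
        if term in record.title.lower()
        and (org_ids is None or record.org_id in org_ids)
    ]


def display(record, directory):
    """Show the bibliographic record with the holder's access policy."""
    entry = directory.get(record.org_id)
    if entry is None:
        holder = record.org_id
        access = "Access information unavailable."
    else:
        holder = entry.name
        access = entry.access_policy
    return f"{record.title}\nHeld by: {holder}\nAccess: {access}"


if __name__ == "__main__":
    for hit in search(CATALOG, "newsreel", org_ids={"arch-001"}):
        print(display(hit, DIRECTORY))
```

The point of the sketch is that the access policy is looked up at display time from the directory entry, so an archive maintains that information in one place and it appears with every one of its records.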

Finally, the Archive Directory is a tool for collaboration and community building. Through the Directory input form, information is systematically gathered about the collections, services, and cataloging and preservation activities of a wide array of archives. These detailed descriptions give archivists the information they need to evaluate archival activities in similar repositories and to identify organizations with common interests. They can then utilize MIC’s portal structure to build communities for collaborative projects. The Directory also enables the Library of Congress and AMIA to identify community needs, potential collaborations, and emerging trends, in order to focus community training and support.

End users can enjoy any of several modes of access to the MIC Union Catalog. First, of course, there is the Web search from any of several MIC portals. The database can also be accessed remotely using MIC’s Z39.50 capability, and Dublin Core export supports OAI harvesting. Even within a Web search, there are multiple avenues of access, since MIC includes six separate portals, and different portals yield different results. For example, the Science Educators portal, dubbed “Science Goes to the Movies,” retrieves only moving images related to science. And while the MIC records display is consistent across portals, Archivists Portal users have the option to display records in any of a number of different schemas, including MIC XML, MARC HTML, MPEG-7 XML, and Dublin Core XML. Similarly, informational resources vary by portal, as do Archive Directory displays. The Archivists Portal, for example, is the only place where Directory entries include information about cataloging and preservation activities.
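For readers unfamiliar with OAI harvesting, the request a harvester sends is a simple HTTP query. The sketch below builds an OAI-PMH ListRecords request for Dublin Core records. The verb and argument names come from the OAI-PMH protocol; the base URL and the set name are placeholders, since this article does not give MIC's harvesting endpoint.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint, not MIC's actual OAI base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it is sent alone with the verb,
    because the protocol treats resumptionToken as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # First request, then a follow-up using a token from the response.
    print(list_records_url(BASE_URL))
    print(list_records_url(BASE_URL, resumption_token="token-from-response"))
```

A harvester would fetch the first URL, parse the returned XML, and keep requesting with each resumption token until the repository returns none.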

By the same token, archives are offered several means by which they can contribute their records to the catalog. MARC records can be loaded and mapped automatically. For local metadata schemas, organizations can use the mapping utility to map and ingest records into the catalog. Direct input will be a third option once the cataloging utility is up and running.

Just as there are multiple ways into the catalog, the catalog records serve multiple purposes as well. For the first time, MIC allows users to search across multiple repositories to find current, detailed descriptions of moving images, and the images themselves. As with any large aggregation of records, searching can reveal heretofore-unseen relationships that can suggest new areas of research. Because the MIC Union Catalog includes only bibliographic descriptions of moving images, users wishing to search only moving images can do so easily. By contrast, in many large academic institutions, users need a complicated set of instructions simply to limit search results to moving images; in some major research organizations, a comprehensive search of only moving images is impossible. MIC users can search across all or selected repositories or limit to moving images within a particular institution.

Finally, MIC enables research and development in emerging technologies by making a sizeable set of bibliographic records representing a cross section of the archival moving image community available in a variety of metadata schemas. Computer science researchers can partner with the library and archival community to explore low-level indexing, authority control, events-based directories, FRBR implementations, digital rights management, active privacy policies, and fair use.

A Metadata-Driven Strategy

MIC’s innovative design employs a metadata-driven strategy to simultaneously address multiple goals of expanding education, outreach, access, preservation, and research in culture and information technology. This strategy is five-fold:

1. Embrace the inherent diversity in the field.
2. Promote metadata standards.
3. Democratize digital resource management.
4. Enable exploration of new technologies.
5. Provide a model extensible to other archive and library communities.

The most salient characteristic of the archival moving image field is its diversity. Currently moving images can reside in almost any type of organization, from large national institutions like the Library of Congress to small public, private, or non-profit institutions with specialized collections. The diversity of these organizations, like the diversity of materials they hold, continues to grow as media permeates all aspects of society. Organizations can be big or small; often, critically important works are held by individuals. With differences in size come differences in available financial resources. Similarly, differences in repository missions suggest a variety of user needs. The end result is that repositories everywhere employ a huge variety of metadata schemas, many of them local or proprietary. Developing a system that would embrace these differences rather than force conformity would attract the largest possible number of participants. Since MIC’s original mission was preservation, it was important to document extant moving images wherever they might be found.

Central to the system is MIC’s Core Registry, a list of about 50 data elements for moving image description. This schema’s context-independence is the key to its accommodating the multiplicity of extant schemas. Context independence also means that the schema will address needs not just of current users, but also future users and constituencies, some of them as yet unidentified.

The MIC Core Registry is a rigorously maintained and standardized application recorded in a modified ISO 11179 registry format. Data elements from virtually any schema can be mapped to these fields for later export in any of several other standard schemas, including MARC21, Dublin Core, and MPEG-7. Other mappings in the works include PBCore, MODS, IEEE-LOM, and SMPTE.

Mapping a few core data elements across schemas is relatively simple. More difficult to achieve is a rich mapping that supports a range of user information needs, some as yet unforeseen. MIC data must also meaningfully participate in other collaborations using other metadata standards, and be extensible beyond descriptive metadata, to incorporate METS-compliant preservation, rights, and technical metadata. This range of functionality requires more careful design and effort, so that data can be exported in different schemas for different purposes.


Figure 1. MIC Mapping 

For example, MARC records allow interoperability, especially with print materials, and can be retrieved via the Z39.50 protocol. Dublin Core enables OAI harvesting for the National Science Digital Library and other consortia. MPEG-7 is one of the few metadata schemas developed specifically to describe, manage, and provide access to moving images. Unlike MARC and Dublin Core, MPEG-7 was designed to support multiple manifestations and accommodate technical and administrative metadata. Moreover, MPEG-7 supports the automatic generation of segments for storyboards and summary clips that can be combined into learning objects. By providing schemas that process the digital bit stream for low-level, non-textual, digital video indexing applications, it allows for retrieval of digital objects by properties like color, shape, and texture. Using the MIC export utility, an archive employing Dublin Core for ease of use in-house can make its records available in MARC for Z39.50 access or in MPEG-7 for low-level indexing. It has been said that “the historical antecedent for this kind of thing is the Rosetta Stone.”
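To make the export idea concrete, here is a minimal sketch of a crosswalk from a core record to simple Dublin Core XML of the kind used in OAI harvesting. The two namespace URIs are the standard Dublin Core and oai_dc ones. The core element names on the left of the crosswalk are invented for illustration, because this article does not list the Core Registry's actual elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Assumption: hypothetical core element names mapped to Dublin Core terms.
CORE_TO_DC = {
    "title": "title",
    "creator": "creator",
    "date_created": "date",
    "physical_format": "format",
    "summary": "description",
}


def to_dublin_core(core_record):
    """Serialize a core record (dict of str or list of str) as oai_dc XML.

    Core elements without a Dublin Core equivalent in the crosswalk are
    dropped, which is why export to a simpler schema is lossy.
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for core_name, dc_name in CORE_TO_DC.items():
        values = core_record.get(core_name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = {
        "title": "Harbor Newsreel",
        "creator": ["Example Productions"],
        "date_created": "1948",
        "preservation_status": "Preserved from best surviving element",
    }
    print(to_dublin_core(record))
```

Keeping each target schema in its own crosswalk table means a record is mapped once into the core and can then be exported to any schema for which a table exists.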

The MIC Core Registry is detailed enough to retain the richness of extensive archival descriptions, but simple enough to provide users with readable, succinct displays of heterogeneous metadata from widely divergent source records. The MIC Core Registry supports the core user needs as defined by IFLA’s Functional Requirements for Bibliographic Records (FRBR): to find entities that correspond to the user’s stated search criteria, to identify an entity, to select an entity appropriate to the user’s needs, and to obtain a copy of a selected entity.

Once MIC’s Core Registry was established, with maps to MARC21, Dublin Core, and MPEG-7, a mechanism was needed to allow organizations with local systems to map their own schema to MIC’s schema. As recently as 1998, Footage: the Worldwide Moving Image Sourcebook documented 3,000 moving image archives in the US alone, and numbers have increased exponentially since then. To ingest millions of records into the MIC Union Catalog would require a production line-like mechanism. In the archival moving image community, a significant number of small institutions rely on commercial databases like FileMaker Pro, or even spreadsheets, to hold their descriptive metadata. For maximum participation, a self-service mapping tool that worked for just about everyone with minimal use of intermediaries was needed. Yang Yu, database programmer at Rutgers University Libraries, worked with system architect Grace Agnew and Project Manager Jane D. Johnson to design the MIC mapping utility to serve this need.

The utility is currently in beta testing and has been successfully used by several organizations to map homegrown schema into MIC. The first step in the mapping process is submission of an online application by the archive. The 20-question form takes a few minutes to fill out and includes contact information, cataloging practices, record format, intended mode of delivery, and space for comments. If an organization uses more than one schema, it can assign a memorable name to each for reference purposes. Once the application is submitted and reviewed, the organization uploads a small set of sample records and its own list of data elements. (If the organization finds it easier to submit its entire record set, it can do that, and we will sample the records at our end.) The system then populates the online mapping form with the field list. The form leads the user through the list of MIC data elements. It asks the user to select, from a pulldown menu, the organization’s own equivalent for each MIC data element and to provide a sample value for each. At any point in the mapping the user can opt to preview its sample values in a MIC display. Once completed, the mapping is reviewed by the MIC administrator, revisions are made as necessary, and the mapping is approved. Finally, the organization submits its full record set, the MIC administrator identifies links to digital video for indexing, and the records are ready for ingest.
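The heart of the mapping step can be pictured as a lookup table from MIC data elements to an archive's own field names. The sketch below is not the utility's implementation. The element names, the sample spreadsheet fields, and the function names are all hypothetical, chosen only to show how a mapping might be applied, previewed, and checked for local fields left unmapped.

```python
# Assumption: MIC element name -> the archive's own field name,
# or None where the archive has no equivalent.
SAMPLE_MAPPING = {
    "title": "Film Name",
    "date_created": "Yr",
    "creator": None,
}

# Assumption: sample records as they might come from a spreadsheet export.
SAMPLE_RECORDS = [
    {"Film Name": "Harbor Newsreel", "Yr": "1948", "Shelf": "B-12"},
    {"Film Name": "Campus Parade", "Yr": "", "Shelf": "C-03"},
]


def apply_mapping(local_record, mapping):
    """Return the record keyed by MIC element, skipping empty values."""
    mapped = {}
    for mic_element, local_field in mapping.items():
        if local_field is None:
            continue
        value = local_record.get(local_field)
        if value not in (None, ""):
            mapped[mic_element] = value
    return mapped


def unmapped_local_fields(local_records, mapping):
    """List local fields that no MIC element draws on, for review."""
    used = {field for field in mapping.values() if field}
    seen = set()
    for record in local_records:
        seen.update(record)
    return sorted(seen - used)


def preview(local_record, mapping):
    """Render mapped sample values as simple 'element: value' lines."""
    mapped = apply_mapping(local_record, mapping)
    return "\n".join(f"{name}: {value}" for name, value in mapped.items())


if __name__ == "__main__":
    print(preview(SAMPLE_RECORDS[0], SAMPLE_MAPPING))
    print("Not mapped:", unmapped_local_fields(SAMPLE_RECORDS, SAMPLE_MAPPING))
```

A report of unmapped local fields gives a reviewer a quick way to spot information that would otherwise be silently left behind at ingest.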

The MIC mapping utility has the potential to transform and democratize cataloging. Any moving image repository, using any metadata schema, can easily map its own records for sharing globally through the MIC Union Catalog and elsewhere. Small institutions can make their holdings accessible at a low cost on the Web, comply with national and international standards, and use existing personnel and infrastructure. Larger institutions employing multiple or legacy metadata schema can map different collections through the mapping utility and export records in a single metadata schema. Individual collectors are given a platform for making their materials available to a wider public, exposing previously unknown footage and genres to scholars, educators, and the public.

With standard mappings and the mapping utility in place, it is possible for almost any archive with machine-readable records to readily load records into the MIC Union Catalog. Still, Union Catalog records are limited to largely descriptive metadata, which is not enough metadata for management of archival resources through their lifecycles. The next development will be the MIC cataloging utility to provide a front-end input form for inputting records directly into MIC. This form would accommodate descriptive, administrative (including preservation, technical, rights), and structural metadata. A downloadable METS-compliant cataloging utility, available from Library of Congress and MIC websites, would extend metadata for collection description, management, and access to any repository, regardless of technical readiness or cataloging expertise. This product would leverage LC-supported standards for description (MODS) and digital preservation (PREMIS), but go further to address the integrated management of legacy analog source materials and digital resources and rights for access to digital video resources. Rutgers has developed the bulk of this utility as its Workflow Management System. The Rutgers University Libraries is a recognized leader and innovator in the development of digital library technologies, particularly in the development of repository architectures and services with a dual focus on the long-term preservation and discovery of digital resources.
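As a rough picture of how descriptive, administrative, and structural metadata sit together in one METS document, the sketch below builds an empty skeleton. The element and attribute names (dmdSec, amdSec, techMD, rightsMD, sourceMD, digiprovMD, structMap, mdWrap, MDTYPE) and the two namespace URIs are from the METS and MODS schemas. The identifiers, the division type, and the choice of what goes in each section are illustrative assumptions, not the cataloging utility's design, and the output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"


def _wrapped_section(parent, tag, section_id, mdtype):
    """Add a metadata section and return its empty xmlData container."""
    section = ET.SubElement(parent, f"{{{METS_NS}}}{tag}", ID=section_id)
    wrap = ET.SubElement(section, f"{{{METS_NS}}}mdWrap", MDTYPE=mdtype)
    return ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")


def build_mets(title):
    """Build a METS skeleton with one MODS title and empty admin sections."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("mods", MODS_NS)
    root = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata, wrapped as MODS.
    dmd = _wrapped_section(root, "dmdSec", "DMD1", "MODS")
    mods = ET.SubElement(dmd, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # Administrative metadata: technical, rights, analog source, provenance.
    amd = ET.SubElement(root, f"{{{METS_NS}}}amdSec", ID="AMD1")
    _wrapped_section(amd, "techMD", "TECH1", "OTHER")
    _wrapped_section(amd, "rightsMD", "RIGHTS1", "OTHER")
    _wrapped_section(amd, "sourceMD", "SOURCE1", "OTHER")
    _wrapped_section(amd, "digiprovMD", "PROV1", "PREMIS")

    # Structural metadata: a single division pointing at the description.
    struct = ET.SubElement(root, f"{{{METS_NS}}}structMap", TYPE="logical")
    ET.SubElement(struct, f"{{{METS_NS}}}div", TYPE="moving image", DMDID="DMD1")
    return root


if __name__ == "__main__":
    print(ET.tostring(build_mets("Harbor Newsreel"), encoding="unicode"))
```

The sourceMD section is where information about a legacy analog original could travel alongside the technical metadata for its digital copy.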

Although MIC is dedicated to accommodating local and proprietary schemas, one of MIC’s core missions is to promote metadata standards, reflecting the Library of Congress’ leadership role in this area. MIC is committed to open-source, standards-based interoperability protocols. By easily ingesting records in standard formats and providing homogeneous displays from widely divergent collections, MIC illustrates by example the value of standards. Beyond that, MIC’s informational resources promote standards use by educating archivists in their use.

Indeed, MIC’s informational resources were originally conceived to educate archivists in the use of cataloging standards and practices. The scope of MIC’s education and outreach space has now expanded to include preservation, exhibition, and collection management information, as well as resources for the general public. In the General Users Portal, links to popular sites such as Internet Movie Database and Rotten Tomatoes are intended to draw people to the MIC site, where they can learn about preserving their own home movies or find out how they can contribute to public moving image preservation efforts.

Conclusion

MIC is a collaborative effort to promote discovery, preservation, and educational use of moving image materials. By acknowledging the diversity in the field and building an innovative and extensible platform to accommodate it, MIC provides broad and open access to asset management, promotes standards, and supports collaborative preservation. Building a preservation infrastructure on an organization-by-organization basis is not practical; collective action is required. MIC’s metadata strategy provides an exemplar in its dual commitment to widespread access and collaborative preservation of moving image resources, in both analog and digital form. Its impact is demonstrated by impressive statistics of use. As of March 2006, the site has had 2.8 million total hits, currently averaging over 3,000 hits per day. There have been 323,000 total visitors from 92,000 unique IP addresses. The site averages over 350 visitors per day.

Notes:
[1] The Association of Moving Image Archivists (AMIA) is a non-profit professional association established to advance the field of moving image archiving by fostering cooperation among individuals and organizations concerned with the acquisition, preservation, exhibition, and use of moving image materials. The AMIA website may be found at: http://amianet.org/.
[2] Melville, Annette and Scott Simmon. Redefining Film Preservation, a National Plan: Recommendations of the Librarian of Congress in Consultation with the National Film Preservation Board. Washington, D.C.: Library of Congress, 1994. Murphy, William. Television and Video Preservation 1997. Washington, D.C.: Library of Congress, 1997. Copies of the plans are available at: http://www.loc.gov/film/filmpres.html.
[3] The Women Artists Archives National Directory, or WAAND, is a Web directory to US archival collections holding primary source materials by and about women visual artists active in the U.S. since 1945. The WAAND website may be found at: http://waand.rutgers.edu.
[4] The New Jersey Digital Highway is a portal providing access to the collections of several New Jersey cultural heritage institutions.  The website may be found at: http://www.njdigitalhighway.org.
[5] “National Archives and Google Launch Pilot Project to Digitize and Offer Historic Films Online” (press release posted on Business Wire, February 24, 2006); Borland, John, “Google puts National Archives video online,”  ZDNet News, February 24, 2006.  
[6] See, for example, Rick Prelinger’s February 24, 2006 posting on AMIA-L:  http://lsv.uky.edu/scripts/wa.exe?A1=ind0602&L=amia-l.
[7] Olsen, Stefanie, “Coming soon: Google TV?” ZDNet Australia, November 30, 2004.
[8] Waters, Donald J., “Managing Digital Assets in Higher Education: an Overview of Strategic Issues,” Managing Digital Assets: Strategic Issues for Research Libraries, October 28, 2005:  Forum Proceedings, p. 9.
[9] Joseph J. Esposito, cited in Waters, Donald J., “Managing Digital Assets in Higher Education: an Overview of Strategic Issues,” Managing Digital Assets: Strategic Issues for Research Libraries, October 28, 2005:  Forum Proceedings, p. 14.
[10] See, for example, the UNESCO Archives Portal at: http://portal.unesco.org/ci/en/ev.php-URL_ID=5761&URL_DO=DO_TOPIC&URL_SECTION=201.html.


Copyright 2004 RLG.