Nancy Brodie, Chief Information Officer Branch, Treasury Board of Canada Secretariat
Note: The views in this paper are those of the author and do not represent official views of either the Treasury Board of Canada or the National Library of Canada.
Introduction
Archives and libraries, particularly national deposit libraries, have long been considered sources/holders of authentic and/or authoritative texts. As a librarian I have attempted to characterize the professional and institutional practices which support this role and to project these historical practices into the digital environment. In this paper I would like to step back and see how creators and users of information in different communities view the authenticity of digital information and then revisit the practices of libraries and archives. I will draw primarily on recent Canadian research from government, academia and the legal community.
Authentication is very closely associated with preservation in the minds of Canadian scholars, librarians, information managers and lawyers. This paper will review the work of the National Library of Canada with the Government of Canada, the Canadian Association of Law Libraries and the Humanities and Social Sciences Federation of Canada relating to assurance of the authenticity of digital information.
Definitions
Definitions of authenticity and authentication of digital information vary by community.
The Report of the CPA/RLG Task Force on Archiving of Digital Information links authentication to integrity. The Report defines preservation of information integrity as "definition and preservation of those features of an information object that distinguish it as a whole and singular work." The Report lists the determining features as content, fixity, reference, provenance, and context. The Task Force Report takes a narrower view of authentication: "Authentication provides verification that a digital object is what it purports to be and contains the contents that the author/creator or publisher originally intended."
The University of British Columbia study, The Preservation of the Integrity of Electronic Records, links reliability and authenticity: "Reliability refers to a record’s authority and trustworthiness, i.e., its ability to stand for the fact it is about. The concept is linked exclusively to records creation. Authenticity, on the other hand, refers to a record’s reliability over time and is linked to the record’s status, mode, and form of transmission and the manner of its preservation and custody."
Government security specialists define authentication as one of the four components of a basic security service: authentication, integrity, confidentiality, and non-repudiation. Authentication can be applied to data (proof of the source of the data) or to users (to support access controls so that only those who are authorized to view or modify data can access that data). Encryption addresses the confidentiality and access control requirements, while digital signatures address the integrity, authentication and non-repudiation requirements. In the context of a business transaction, electronic authentication is the process by which an electronic authorization is verified to ensure, before further processing, that the authorizer can be positively identified, that the integrity of the authorized data was preserved and that the data are original. Security specialists further define levels of authentication, for example:
Simple authentication: Authentication by means of simple password arrangements.
Strong authentication: Authentication by means of cryptographically derived credentials.
For the legal community, an authentic document is one that can be used as an official version in a court of law. Authentication may be a procedure established by a court or other legal entity to prove the authenticity of an object.
Canadian scholars in the social sciences and humanities use the term credibility rather than authenticity in a recent study of electronic scholarly publishing.
All the above definitions have in common an understanding that authentication is a process which allows future users to determine the authenticity of objects or proves the authenticity of objects. The definitions also imply that different levels of authenticity may be required by different communities for different functions and hence that there are different levels of authentication.
This paper will go beyond the definitions of authenticity and authentication to consider related concepts of integrity, reliability, trustworthiness and credibility.
Authentication as viewed by creators
Creators of digital objects take decisions that affect the ability of libraries, archives and other institutions to preserve the digital objects. Creators view certain attributes of digital objects as key to their authenticity and apply different authentication processes. These attributes and processes are important for both preservation and accessibility.
Government
Most research libraries collect government publications. In many countries government is the largest publisher. Government records are the raison d’être of national archives. Governments are very conscious that they must establish an environment of trust as they expand electronic service delivery. Government information may receive a higher level of authentication than other information that must be preserved and made accessible over the long term.
The Government of Canada (GOC) has recently enacted legislation to support electronic commerce by protecting personal information, by admitting electronic documents as evidence and by giving electronic publication of statutes and regulations the same legal authority as notices and Acts published on paper. However, the electronic publications portion of the legislation will only be brought into force "when the appropriate technology is in place for ensuring the integrity of the electronic versions."
The Government recognizes that citizens and organizations need a secure and trusted environment to conduct business with government. The Government has adopted a policy to use public key cryptography in order to provide for service delivery, public administration, and communications in a secure manner electronically and support electronic alternatives to the use of paper. The initial focus of use of a Public Key Infrastructure (PKI) is to identify parties to a transaction and protect personal information. The possible use of PKI for public information has not yet been defined. A Public Key Infrastructure Information Management Working Group has been established to work with the Information Management Forum to address issues such as the preservation of records through time. However, the National Archives of Canada has already taken a policy decision that it will only preserve records in unencrypted form.
The Government has recognized the need for less formal authentication of government information through adoption of standards for a Common Look and Feel for government Web sites. These standards promote the clear identification of Government of Canada information through graphical or visual identifiers such as the Canada wordmark, textual and graphical identification of name of the responsible government department and a minimal set of metadata (originating organization, title, language, date, subject). Accessibility is an important component of these standards. Sites must be designed and information made accessible in such a way that a wide range of technologies, including personal computers, assistive devices, and advanced technologies can be used.
The Information Management Forum has developed guidance for the Government of Canada on managing Internet information for long term access and accountability. Although this guidance does not address authentication specifically, it proposes a management framework and risk management approach to preservation. Risk management is a key driver in preservation planning.
Legal community
The Canadian Association of Law Libraries (CALL) recognizes that authentication, preservation and citation are all issues that have to be addressed to assure that legal information in digital form becomes the "official version." In 1997 CALL sponsored "The Official Version", A National Summit To Solve the Problems of Authenticating, Preserving and Citing Legal Information in Digital Form. Proceedings of the Summit include several papers on authentication. However, CALL recognizes that the problems of preservation and authentication have not yet been solved and continues to work with partners in the Canadian legal community to address these issues. The CALL Preservation Committee developed a framework and checklist for digital preservation that identified the following attributes of authenticity where a source document has been digitized: source of document, image or text, changes from original source, documentation of digitization process, extent of encryption and completeness. The checklist also asks whether official or court standards for authenticity, where they exist, are met. CALL agrees that there are different levels of legal demand for authenticity and that expectations of authenticity are lower for secondary source legal materials than for primary materials.
Academic community
Canadian scholars surveyed in 1999 by the Humanities and Social Sciences Federation of Canada (HSSFC) about electronic publishing were most concerned about the credibility of the source. Credibility was strongly linked to the need to ensure long term accessibility of electronic publications through proper archiving. The results of a follow-up study, The Credibility of Electronic Publishing, were available in draft form as this paper was being written. The goal of the credibility study was to recommend strategies to encourage high quality publication in electronic format, and to overcome the skepticism that surrounds the perception of academic electronic publishing. Four sub-studies were undertaken: Peer Review and Imprint, Copyright, and Archiving and Text Fluidity / Version Control.
Peer review, a step in the creation process, is seen as a key indicator of credibility in the academic community. The reputation of a publisher or journal is carried in its imprint and this reputation is another key indicator. Jean-Claude Guédon asserts that publishing in a reputable, peer reviewed scholarly publication is the phase in the scholarly communication process where distinctions are attributed. In the print world this is synonymous with the publishing phase. But in the electronic world it is more and more common for the distinction phase to be separate from the publishing phase or the phase where documents are first made public. Electronic pre-print archives are examples of this latter phase as are the draft reports of this research study. Expectations of authenticity are higher for publications in the distinction phase.
Alan Burk and colleagues discuss the number and variety of digital objects that are not a part of the scholarly record, per se, but are cited within the scholarly record. These references to ancillary material contribute to the authenticity of the scholarly document but are not a part of it. Most readers do not wish to read every reference in a print article; nor will they need to follow all the links in an electronic article. Information contained in the citation should be adequate to support trust in the scholarly process. Burk is of the opinion that the scholarly process will not be undermined if links to these cited objects become inaccessible.
The fluidity of electronic information has been considered a barrier to authentication. Burk posits that electronic texts are not fluid but that modifications to texts produce distinct versions. Clear naming of versions and identification of changes between versions go a long way to authenticating these texts. Metadata provided by the creator or publisher assists authentication but metadata provided by libraries is also important. The issue is not how to authenticate but what to authenticate; not all versions may warrant authentication or preservation. In scholarly communication the refereeing and editing process will continue to produce a version that is fixed and must be preserved.
Authentication provided by libraries, archives or other depositories
The assurance of authenticity is not simply a technological problem. There is a chain of persons, organizations and processes that contributes to authenticity. This chain begins with the creator and publisher but it includes organizations that preserve and provide long term access to information.
Bibliographers, librarians and archivists have established principles of authenticity and practices to support assurance of authenticity in the world of print. Library and archival collections and methods of preservation and custody are important components of the authenticity chain.
Selection of a document or collection by a library or archive gives the information resource some authenticity. Libraries apply unique naming schemes to documents or record the names already assigned. Libraries record the attributes of authentic information in their catalogues: author, publisher, date. These attributes distinguish different versions of publications. The catalogue puts the author in a context through collocation and authority files. Archival finding aids are even more explicit in recording context. Libraries and archives are trusted intermediaries that protect the documents they hold and assure anonymity of users. Library and archival procedures are based on professional standards and practices and are documented.
Many professional and institutional principles and practices of the print world are relevant in the digital world. A primary recommendation from the HSSFC Study is for the academic community to work with publishers and libraries to ensure that standards and best practices for dissemination, availability and preservation are integrated into publishing and preservation processes for electronic publications.
Research from other communities supports the role of institutions and processes in addition to technology. Security policy speaks of the integrity of the information and related processes. Active information protection is an important aspect of security that involves processes for protection, detection, response, and recovery. The trustworthiness of an electronic medical records system is essential to authenticity of the records. The business processes of legal authorities such as court clerks serve to authenticate the documents that they handle. Even the management of the authentication technology itself must be trustworthy. According to the United Nations Commission on International Trade Law (UNCITRAL), trustworthy authentication systems, procedures and personnel must operate with quality equipment and resources, be subject to audits, follow transparent procedures, and comply with applicable laws.
Experience of the National Library of Canada with Electronic Publications
The National Library of Canada (NLC) has been building a collection of networked electronic publications since 1994. It has been a goal of NLC to build an authoritative and reliable collection and to preserve the authenticity of publications. Its Networked Electronic Publications Policy refers to preserving the integrity of publications as originally released.
Mechanisms for backing up, storing back-ups, and recovering/reconstructing the NLC databases and on-line digital resources, and emergency policies and procedures were in place and have been upgraded, particularly through Y2K exercises. NLC servers are protected by firewalls. These partially address authentication concerns. More rigorous methods at the document level to protect contents from deliberate or accidental change, such as hashing or digital time-stamping, were envisioned in 1994 but have yet to be implemented, either by publishers depositing electronic publications with NLC or by NLC itself. However, there are many processes in place which contribute to authenticity.
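The document-level hashing mentioned above can be illustrated with a minimal sketch. It is not a description of any NLC system: the manifest structure, the function names and the sample publication are hypothetical, and SHA-256 is simply one widely available hash function chosen for the illustration.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(name: str, data: bytes, manifest: dict) -> None:
    """Store the digest of a deposited publication in a manifest (hypothetical structure)."""
    manifest[name] = sha256_digest(data)


def verify_fixity(name: str, data: bytes, manifest: dict) -> bool:
    """Check that the bytes still match the digest recorded at deposit."""
    return manifest.get(name) == sha256_digest(data)


# Hypothetical deposit of one publication.
manifest = {}
deposited = b"Report on electronic publishing, first edition."
record_fixity("report-ed1", deposited, manifest)

# An untouched copy verifies; any change to the bytes does not.
unchanged_ok = verify_fixity("report-ed1", deposited, manifest)
altered_ok = verify_fixity("report-ed1", deposited + b" (revised)", manifest)
print(unchanged_ok, altered_ok)
```

A digest of this kind detects change but says nothing about who made the deposit or when; those questions are the province of digital signatures and time-stamping, and the manifest itself would have to be protected by the custodial processes described in this section.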
NLC considers its Electronic Collection as the equal of its collections of tangible materials. Collection management policies, guidelines and procedures are in place as well as staff dedicated to the collection. These processes give authority to the Collection; however, gaps in policy and law (e.g. legal deposit), gaps in procedures and lack of dedicated preservation staff negatively influence authenticity in the Electronic Collection.
NLC applies international naming schemes to networked electronic publications, e.g. ISSN and, more recently, ISBN. NLC applies AACR cataloguing to electronic publications and includes these authoritative bibliographic records in Canadiana, the National Bibliography.
NLC devotes considerable effort to collecting versions and to version control, highlighted by Burk (above) as a key attribute of authenticity. By policy NLC does not necessarily collect every version of all networked electronic publications. NLC collects the first and subsequent editions of a publication. To determine whether a new version should be regarded as a new edition, the following criteria are applied:
(a) the version of the publication is announced as a new edition, i.e., as indicated in wording or by means of a number, date, or code.
(b) the publisher regards the version as a new edition.
(c) the changes to the new version of the publication are significant.
(NLC encourages publishers to identify significant changes.) "The frequency of capture will vary from comprehensive to representative and will depend on factors such as publication pattern, scope of changes, and the overall significance of the publication."
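The three criteria above can be read as a simple decision rule. The sketch below is one possible reading only: the policy text does not say whether the criteria combine as "any" or "all", so the code assumes that meeting any one of them is enough, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VersionInfo:
    """Facts a cataloguer might record about a newly captured version (hypothetical fields)."""
    announced_as_new_edition: bool   # criterion (a): wording, number, date, or code
    publisher_regards_as_new: bool   # criterion (b)
    changes_significant: bool        # criterion (c)


def is_new_edition(v: VersionInfo) -> bool:
    # Assumption: any single criterion suffices. The source does not state
    # how the criteria are combined.
    return (
        v.announced_as_new_edition
        or v.publisher_regards_as_new
        or v.changes_significant
    )


# A silent typo correction meets none of the criteria.
minor_fix = VersionInfo(False, False, False)
# A version labelled "2nd edition" meets criterion (a).
second_ed = VersionInfo(True, False, False)

print(is_new_edition(minor_fix), is_new_edition(second_ed))
```

Criterion (c) remains a human judgment in practice; the code only records the outcome of that judgment and does not attempt to measure significance.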
NLC attempts to collect publications as they are published, ensuring that the initial published version can be preserved even if the publisher updates (or corrects) the document soon after publication. NLC preserves all versions acquired. Where publications are updated by adding to the original publication rather than releasing a new issue of a serial, each NLC version is in fact a snapshot of the full publication, resulting in considerable redundancy in the Electronic Collection. Versions are identified either through full catalogue records or through annotations in the Electronic Publications Management System displayed in the Web access pages. Versions of publications which are updated by addition of data or where there is a significant gap between date of publication and date of acquisition may be identified by a capture date.
With HTML publications NLC has found it difficult to preserve every bit in the original publication as it appeared on a publisher’s Web site. Moving HTML documents from one environment to another often requires revising HTML links and directory structures. Some images may not be included in the transfer because they are not in the same directory. Migration from one software version to another or from one systems environment to another will undoubtedly require even more changes. These kinds of changes make it impossible to preserve hash codes or digital signatures that are based on bits in the source document. But these changes may not move the contents far from what the author or publisher originally intended.
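The effect described above can be shown with a small sketch. The two HTML fragments and their paths are invented for illustration, and the text-only digest is an assumption introduced here to show the contrast, not a method used by NLC. Rewriting a single link changes the bytes, so a digest of the bytes no longer matches, while a digest computed over the visible text alone is unaffected.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def byte_digest(html: str) -> str:
    """Digest of the document exactly as stored."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def text_digest(html: str) -> str:
    """Digest of the visible text only, with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical page before and after its link is rewritten for a new directory structure.
original = '<p>See the <a href="/pubs/annex.html">annex</a>.</p>'
migrated = '<p>See the <a href="/archive/e-coll/annex.html">annex</a>.</p>'

bytes_match = byte_digest(original) == byte_digest(migrated)
text_match = text_digest(original) == text_digest(migrated)
print(bytes_match, text_match)
```

A text-only digest is weaker evidence than a digest of the bytes, since it ignores images, link targets and markup; it simply illustrates that the intellectual content can survive changes that invalidate a bit-level hash.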
NLC has put in place many links in the authenticity chain. It is aware of certain missing links. Some technological links are still unknown. Costs of maintaining authenticity of electronic publications are far higher than costs of maintaining authenticity of print collections. Society has invested in national libraries and archives as institutions for the preservation of the national memory. Society has not yet made the commitment to the additional investment required to preserve the national electronic memory. Like other institutions NLC struggles to plan and undertake a national digital preservation role with inadequate funding.
Implications for preservation
Authentication can be considered the enemy of preservation of authenticity. Technological tools of authentication such as encryption make preservation of authentic documents very difficult. All the pieces of a public key infrastructure (encryption algorithms, encryption and decryption software, private keys, public keys, certificates, certificate authorities, etc.) would have to be preserved along with the encrypted document. Institutions such as the National Archives of Canada have decided not to preserve encrypted documents but only to preserve the contents in unencrypted form. The Archives will use its status as a trusted repository and custodian to achieve authentication through other means. And as this paper has shown, there are many other links in the authenticity chain.
Libraries are in a more difficult situation because their tradition of open access to collections adds risk to custodianship of unencrypted authentic digital objects. In addition, publishers are using technological tools very similar to authentication tools to manage intellectual property rights and control access to commercial assets. Libraries may only be able to collect and preserve such commercial publications if they guarantee preservation in encrypted form.
Implications for access
Scholars surveyed in the HSSFC studies placed high importance on the long-term accessibility of electronic publications. Scholarly research builds on past research and that past research must be available for consultation. Canada’s legal community operates in a common law framework where past precedents must be available for research and citation. While government information is generally disseminated or made available to meet short-term objectives, governments recognize that they have a long term accountability to citizens and that there is a requirement for permanent public access to electronic government publications. A recent court case in Canada, the Authorson case on veterans’ pensions, has highlighted both legal and government needs for historic government publications and records.
The security specialist looks at risks. "To be useful information must be accessible, and this very accessibility puts it at risk.... Connectivity exposes information to risks outside each organization’s control." Institutions striving to meet both access needs and preservation needs must manage this risk and assess the risk to authenticity based on the functions of authenticity being supported and the levels of authenticity required for classes of documents.
As mentioned in the previous section, technological solutions may introduce barriers to long term access to electronic publications. But long term access is in itself a key component of authenticity. Libraries must play an active role in resolving any tension between access and authenticity. However, both are dependent on the preservation of digital documents in electronic collections.
Conclusion
Many links in the authenticity chain have been identified. We must continue the research and experimentation to identify other links. Further analysis of the components of authentication included as an Appendix may help further this research. But in the end it will be an environment of trust that holds the links together and is the key contributor to authenticity in the digital world.
Appendix: Components of authentication in an electronic environment
The following components of authentication were identified from the research, policy and practice described in the paper. More work from all communities is required to clarify the variances in definitions of these components and define undefined components.
Attributes of authenticity
RLG: preservation of integrity: content, fixity, reference, provenance and context
Records (UBC): Status, mode, form of transmission, manner of preservation and custody
GOC: identifiers (visual, textual, metadata) including jurisdiction, responsible government department, title, date
CALL: document source, text or image, changes from source, completeness, extent of encryption, documentation of digitization process
HSSFC: inclusion in an archive, accessibility, peer review, publisher reputation, references, version naming, identification of changes between versions, fixed version
Library science: selection for a library collection, unique naming, title page information i.e. author, title, publisher, date, name authorities, pagination, binding, existence of multiple copies, comparison to standard bibliographic description prepared by authoritative source, comparison to other copies
NLC: international naming schemes, authoritative bibliographic record, version control, capture date
Functions of authenticity
Government: environment of trust
Legal: official version that can be used in court, giving documents legal authority
Academic: credible scholarly resource
Security: component of a security service, verification of an electronic authorization
Levels of authentication
Security: none, simple, strong, "two factor" e.g. smart card or token
GOC: high level (PKI) to ensure trust; Common Look and Feel for Web sites
CALL: higher level for primary than secondary legal materials
Academic: higher for publications in the "distinction" phase, higher for the scholarly record
Processes of authentication
Library standards and practices
Archival standards and practices
Business processes
Trustworthy systems
Transparent procedures
Active information protection
Audit
Acquisition at time of publication
Version control
Back-up and recovery procedures
Emergency policies and procedures
Hashing
Digital time-stamping
Digital signatures
Encryption
PKI