
Introduction
PANDORA, Australia’s Web Archive at the National Library of Australia (NLA), has been archiving Web-based publications for 10 years, in conjunction with participants at the Australian State Libraries and other cultural organisations, including the Australian War Memorial, National Film and Sound Archive, and the Australian Institute of Aboriginal and Torres Strait Islander Studies. There are approximately 12,000 titles within the Archive; each title may be a single discrete document or a whole government website containing thousands of pages.
Many studies and articles examining archival practice and policy have emanated from PANDORA. None, however, has attempted to gauge the effect of archiving on the archived: that is, on the publishers and their publications.
This study examines publisher behaviour and attitudes in relation to Internet archiving. Data for the study were obtained in several ways. The NLA placed an online survey on its website and invited the 4920 people who had given permission for PANDORA to archive a resource between 1996 and May 2005 to complete it. The May 2005 cut-off date was chosen so that responses would come from publishers whose work had been archived for more than one year. To complement the survey, a selected range of archived material was examined to discover publication patterns before and after archiving. A small number of electronic resources that had not been archived by PANDORA, but had been archived in the Library's Whole of Domain Harvest or by the Internet Archive, were compared with items archived by PANDORA. This provided a sample of knowingly and unknowingly archived items for comparison. Published comments appearing on archived websites were also analysed.
A number of Internet archiving projects are currently gathering websites for preservation. Most of these, including the largest, the Internet Archive, generally do so without the express consent or knowledge of the Web publisher. Consequently, Web publishers do not automatically know that a copy of their publication exists elsewhere, or that what they produce may have a much longer life than they intended. This is not the case with PANDORA, which is one of the few archiving projects that explicitly seeks permission from publishers before archiving and notifies them afterwards. This study, therefore, surveys only publishers who knowingly had their work archived.
The PANDORA Publisher’s Survey
Material produced for the Internet is generally still not afforded the respect given to traditional print publishing: it is often not subject to peer review, it is frequently perceived to contain less reliable information than print publications, and it is often hard to rank. Usage statistics are one means of gauging quality and usefulness, but popularity does not always indicate quality, reliability, or link stability.
One way that an Australian Internet publication can gain permanence and recognition is by being invited to be archived in PANDORA. PANDORA is a selective archive: Web publications found within PANDORA have been chosen by staff using explicit selection criteria, which gives many people the impression that archived items are in some way different from, and more significant than, those not archived. The PANDORA form letter sent to publishers when initially asking for archival permission states that the desired item has both "lasting cultural value" and "national significance." These compliments clearly resonate with many publishers and seem to convey recognition and a formal imprimatur. Some publishers have quoted this letter in their publications to make their audience aware of the Library's estimation of their work. One publisher of an online novel has even used the excerpted sentence to make it read like a positive review.

Figure 1. Effect on public perception.
Many publishers are therefore very happy to be archived. When asked in the survey whether PANDORA archiving was worthwhile, 97% said that it was; 96% also thought that archiving had been positive for their publication. However, the survey also showed that, before our first contact requesting archival permission, just over 52% of publishers had not heard of the Archive. Once aware of the Archive, only 35% had ever used it to view any other website. Interestingly, 29% of publishers also believed that PANDORA was unlikely to preserve their publications in the long term.

Figure 2. Effect of archiving.

Figure 3. Role of Pandora as a back-up strategy.

Most publishers also did not appear to rely on the Archive as a back-up of their publications, suggesting either that they are unaware of the importance of back-ups or that they work in organisations that already have risk management strategies in place. On occasion, however, we have been able to provide copies of content to publishers who had suffered serious problems with their computers or networks and had lost the content of their own websites. PANDORA also allows some publishers of websites and online journals to point to the Archive for past issues of their publication, so that they do not have to host those issues themselves, presumably saving on storage and maintenance costs.
Survey Findings
Some publishers have worried that a "light" archive, which makes material publicly accessible, would draw readers away from their own websites, particularly since most of the sites in the Archive are still available from the publisher. The PANDORA Archive does receive a relatively large number of hits: usage in 2004-2005 was 5,390,459 page views. However, the archived sites most frequently accessed in PANDORA are almost invariably websites that are no longer available on the live Web.

Figure 4. Impact of PANDORA on website hits.
The PANDORA survey asked Web publishers whether they believed that archiving had affected the number of hits to their publications. Sixty-five percent said that it had not; of the 34% who said that archiving had had an effect, 92% said that the effect had been positive (roughly 31% of all respondents). Caution is needed, however, when extrapolating these results to all archives. PANDORA may lead to increased hits and usage of live websites only because it actively tries to do so as a reciprocal gesture to publishers who participate in the Archive. Every title within PANDORA has a Title Entry Page (TEP) that serves as the first point of entry, and this page carries a link to the live site. PANDORA also uses a pop-up window to inform users that they are entering an archive and not the live site; this window appears when users enter the Archive from a link outside the National Library's Web domain. Robots exclusion rules have also been used to direct search engines to return the TEP in their search results rather than direct links to lower-level pages. These measures, especially the active link from the National Library of Australia, together with our metadata on the TEP and the individual catalogue records created for each title in Libraries Australia, all raise the visibility of the live resource markedly.
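The pop-up rule described above is essentially a check on where a visitor came from. The following is a minimal sketch, assuming the decision is made on the host in the HTTP Referer header and that the Library's Web domain is `nla.gov.au`. The function name, and the choice to treat a missing referrer as an external visit, are hypothetical illustrations rather than a description of PANDORA's actual implementation.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed domain of the Library's own website (illustrative only).
LIBRARY_DOMAIN = "nla.gov.au"


def should_show_archive_notice(referrer: Optional[str],
                               library_domain: str = LIBRARY_DOMAIN) -> bool:
    """Return True if the visitor appears to arrive from outside the library's domain.

    Hypothetical helper: a visit with no referrer (typed URL, bookmark,
    browser privacy settings) is assumed here to count as external.
    """
    if not referrer:
        return True
    host = (urlparse(referrer).hostname or "").lower()
    # Internal if the host is the domain itself or one of its subdomains.
    internal = host == library_domain or host.endswith("." + library_domain)
    return not internal


print(should_show_archive_notice("https://www.google.com/search?q=pandora"))
print(should_show_archive_notice("http://pandora.nla.gov.au/tep/12345"))
```

Matching on the `"." + library_domain` suffix, rather than a bare substring, keeps look-alike hosts such as `evilnla.gov.au` from being treated as internal.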
Blogs
The National Library began a concerted effort to archive blogs in May 2005. Before that date, it had selected and archived the blogs of some notable Australian politicians and journalists. This was done not because the medium was a blog, but because the individuals were high-profile and we wanted to capture the online-only adjuncts to their traditional media output.
From mid-2005, the NLA began to archive a representative sample of blogs to document this popular means of communication and personal publication. Bloggers were particularly enthusiastic about being archived, and many documented the experience on their blogs. One young man wrote an online letter for future youth; another reflected on the effect on their publishing habits:
One of the weird things is that I’m going to need to resist the urge to be more self-conscious, now that I know that my words here will be preserved in this manner. I feel kind of inspired to keep this blog going and may end up trying to improve the quality and quantity of my posts, which is a good thing … I guess ;) [1]
Another was less impressed, writing that, for posterity:
...you’ll still be able to get your fix of ill-informed commentary, shilling for gambling sites and hot Asian chicks. And it’s all thanks to you, the Australian taxpayer. I would have preferred they just gave me one of those $50,000 grants, but beggars can’t be choosers.[2]
To what extent should the NLA be concerned about the effect of archiving on the documentary record? There is some justified fear that telling content providers that what they produce will be recorded and made available over the long term may influence what they produce. Archiving could thus create an "observer effect," whereby things are changed merely by being observed. Fortunately, a comparison of blogs before and after archiving by PANDORA suggests that archiving has not affected their content. Although bloggers may reflect on being archived in the short term (and some have commented on it), they do not appear to keep writing self-consciously for a possible future audience. Instead, they concentrate on the immediate and the quotidian. A brief textual analysis found no evidence that archived bloggers censor themselves any more than they did before being archived, a result confirmed by the general survey responses. Bloggers who discuss sexual, political, and personal information continue to do so, and where no illegality is involved, the Library does not intervene, other than to restrict access to some archived websites and blogs to adult researchers.

Figure 5. Effect of archiving on self-censorship.
A study of a set of both archived and un-archived blogs also found no apparent change in blog longevity attributable to archiving. The author's personal circumstances, available time, and "having something to say" seem to be far greater determinants of a blog's longevity than any effect of archiving.
E-Journals
E-journals are journals that are published only in online form; they can emanate from any source. In Australia they are most often produced by government and academic publishers rather than by commercial ones. The National Library and PANDORA are frequently involved at the outset of these publications, since one of the first tasks for a newly created serial is usually to apply to the Library for an ISSN. The ISSN application form includes a PANDORA archival notification clause. PANDORA is therefore able to archive many Australian online journals from their inception.
An analysis of archived serials cannot prove that archiving in PANDORA significantly improves quality or prolongs publishing life. The survey asked publishers whether archiving had increased submissions, comments, or other contributions. Most publishers reported no change, but where there was a change, it was predominantly positive. There also appeared to be no evidence that archiving prolonged publishing life.
Figure 6. Effect of archiving on submissions and comments.

Figure 7. Effect of archiving on citation rate.
There was some indication that certain serials had increased citation rates as a result of being archived. Publishers, however, placed little value on our creation of Persistent Uniform Resource Identifiers (PIs) for their publications: only 14% of survey respondents believed that creating PIs for their publication had any benefit. Yet we know that a very large number of links to the Archive are made by indexing agencies using PIs, since PANDORA has ongoing relationships with a number of indexing agencies, which also actively advise us on selection decisions in their specialist areas. Publishers' lack of awareness of PI usage is possibly because these links point to the Archive rather than to the live resource.
Commercial Websites
Most websites within the Archive exist to disseminate information, and many publishers welcome PANDORA's role in promoting it further. Commercial websites differ in that they exist to earn revenue from users, so diverting their customers elsewhere could cost them revenue. Notably, publishers of commercial websites reported in the survey that they had not been detrimentally affected by being archived.
There are three major ways in which an archive could cause economic damage to a live website. First, the archive may take away hits, reducing page-view-based advertising revenue and possible click-through revenue. Second, an archive may display material that is normally viewable only at a cost, such as the contents of a commercial subscription online journal. Third, users may be confused: a user who unknowingly accesses the archive rather than the live website may try but fail to complete a purchase, leading to dissatisfaction with the company and lost revenue.
PANDORA has made efforts not to interfere with commercial websites' activities. If commercially available material is archived, access is restricted for a period negotiated in advance with the publisher. PANDORA also tries to ensure that users know they are in the Archive, and it minimises sales confusion by not archiving or enabling any transaction pages or functions, and by using redirects that point to the live website.
Consequently, PANDORA has limited effects on commercial activity. The survey results suggest that this policy has worked: only 1% of commercial publishers believe that archiving has had a negative impact.

Figure 8. Effect of archiving on revenue.
Conclusion
This study does not seek to give an opinion on how publishers perceive all Internet archiving projects. If, however, a well-known, openly searchable archive such as PANDORA causes few problems for publishers, then a less searchable (deep Web) whole-domain or larger archive should present even fewer.
The National Library has a statutory duty to collect and preserve Australia's documentary history, and born-digital publications are no exception. Creating an archive that encompasses the broadest range of publications requires the ongoing consent of publishers. As long as both archived copies and live websites exist, archives will remain responsible for making sure that their activity does not adversely affect online publications, in either their content or their commercial value. In the long term, many publishers' websites will no longer be available outside the Archive. However, the archived publications will remain protected by copyright for years to come, and the Library will therefore need the continuing consent of publishers to make material accessible.
The results of the study show that PANDORA archiving has so far had no detrimental effect on publications; its effect is mostly benign and in some cases beneficial. It is hoped that the knowledge that Internet archiving need not create conflict between archivists and publishers will help guide future negotiations.
Notes:
[1] Wilson, Morgan, explodedlibrary.info, http://www.explodedlibrary.info/2005/07/the_national_li.html, accessed 14 June 2006.
[2] Ward, Sam, A yobbos view, http://www.gravett.org/yobbo/?m=200508, accessed 14 June 2006.
