Authority Control in the 21st Century: An Invitational Conference
From August 1995 to March 1996 I made a methodical study of new personal name headings entering the bibliographic database of Cooperative Computer Services. My primary objective was to evaluate the criteria by which I was selecting names to be searched against the LC/NACO Name Authority File on OCLC. In the process, however, I also hoped to gain some insight into the numbers and characteristics of personal name headings entering the database. Because I was also thinking about petitioning the CCS libraries to apply for NACO membership, I was curious whether there were really enough unauthorized names for us to bother with the process. As I began to accumulate statistics, I wondered if the patterns I was seeing duplicated others' findings. I was encouraged to find that some of the aspects of my local study mimicked research done by Arlene Taylor, Mark R. Watson, Tamara Weintraub and others.
Cooperative Computer Services (CCS) is a consortium of twenty-four medium-sized public libraries in three counties of suburban Chicago. CCS's shared bibliographic database of nearly 650,000 MARC records increases by about 1300 records every month. The computer system currently in use is Geac's LibsPlus system. All CCS libraries use OCLC as a source for cataloging copy. Original bibliographic and authority records are contributed under the consortium's shared OCLC and NUC symbols.
CCS's LibsPlus system allows integrated authority control. Authority records can be batch-loaded, exported from OCLC, and created manually at a computer terminal. The integrated system allows global change of headings and supports simple SEE: and SEE ALSO: references in the online public access catalog.
While LibsPlus allows CCS the sophistication of keyword searching and right-hand truncation, it also requires the user to select the most promising search technique at the outset, i.e., keyword search or examination of an alphabetical list of headings which are linked to bibliographic records. Should the initial search be unsuccessful, the searcher must rekey the search and choose the alternative technique. Given this and other considerations, CCS has decided that the default search will be an examination of alphabetical indexes and not a keyword search. In such an environment, up-to-date cross references are critical.
In 1994 the consortium's bibliographic database was sent offsite for cleanup and then reloaded, along with 260,000 authority records. Since that time the CCS authority files have been maintained in-house by a full-time Authorities Librarian who adds and makes changes to all authority records. The CCS authority file grows at the rate of about 1100 records per month.
A portion of day-to-day authority file maintenance at CCS involves matching headings which are entering the database for the first time against the authority files on the OCLC Prism system and exporting records into CCS. New headings are reported by means of a list which displays the normalized headings and their MARC field tags. On average, 500 new controlled headings are reported each day. The size of the report precludes checking every single heading, so criteria were established for selective checking. My study concerns the criteria for checking personal names.
To understand the criteria for checking name headings, it is first necessary to understand the purpose of the local authority file. The local file does not exist to provide authority data on every personal name in the database. The LC/NACO Name Authority File, which is available on OCLC, is the primary file against which CCS catalogers verify personal name headings. The local authority file has three functions. It is a repository of local authority work. It is also where internal decisions are recorded, some of which would likely shrivel and die under the stern eye of NACO. Finally, and most pertinent to my study, it is the records in the local authority file which create the cross references in the libraries' OPACs.
One challenge the Authorities Librarian faces is to pick out, from the daily list of normalized headings, those names which might require cross references. Initially it was decided to look for 1) names which were used as subjects, 2) compound surnames, 3) surnames with separately written prefixes, 4) women's names which indicated a possible name change, 5) names for which the entry element was other than a surname, and 6) all instances of four common English surnames: Jones, Johnson, Smith and Miller.
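The original criteria can be sketched as a simple screening function. This is a minimal illustration, not the procedure actually used at CCS, which was a manual scan of a printed report. The `Heading` class, the `reasons_to_check` function, and the prefix list are all assumptions introduced for the example; the first-indicator value "0" is taken to mean a forename entry. Criterion 4 (a possible name change) is left out because it depends on the searcher's judgment rather than on the form of the heading.

```python
from dataclasses import dataclass

# The four common English surnames named in the criteria.
COMMON_SURNAMES = {"jones", "johnson", "smith", "miller"}
# Illustrative prefix list only; the study does not enumerate prefixes.
PREFIXES = {"de", "del", "di", "du", "la", "le", "van", "von"}


@dataclass
class Heading:
    tag: str         # MARC tag: "100", "600", or "700"
    indicator1: str  # assumed: "0" = forename entry, otherwise surname entry
    text: str        # name portion, e.g. "Smith, John"


def reasons_to_check(h: Heading) -> list[str]:
    """Return the criteria (if any) that flag a heading for searching."""
    reasons = []
    if h.tag == "600":
        reasons.append("used as subject")
    if h.indicator1 == "0":
        reasons.append("entry element not a surname")
        return reasons
    # Everything before the first comma is treated as the surname.
    surname = h.text.split(",", 1)[0].strip().lower()
    words = surname.replace("-", " ").split()
    if len(words) > 1:
        if words[0] in PREFIXES:
            reasons.append("separately written prefix")
        else:
            reasons.append("compound surname")
    if surname in COMMON_SURNAMES:
        reasons.append("common surname")
    return reasons
```

A heading that returns an empty list would be passed over under the selective search; any other result sends it to be checked against the Name Authority File.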
You will notice there is no mention of checking "names that look peculiar" or "suspicious headings." In order to arrive at an objective measurement, I tried to eliminate from my study and from a formal policy statement the unquantifiable, the intuitive and the judgmental. In fact, as you certainly know, intuition is as beneficial as policy when it comes to exception reports. It's just not very measurable.
After about a year of working with the original criteria, I was challenged, indirectly, to defend their reliability. I needed to estimate how soon and how frequently it would be advisable to have our authority file outsourced for cleanup. One measurable part of that answer would be how many possibly important cross references are lacking. So I set out to test my criteria.
The method used to measure the effectiveness of the criteria for capturing authority records with cross references was simply to substitute a blanket search of personal name headings for the normal selective search. On fourteen days, at intervals ranging from one week to one month, every new personal name heading (MARC tag 100, 600, or 700) entering the database was checked against the LC/NACO Name Authority File on OCLC. Subject headings with subdivisions and name-title headings were not searched. When a searched name exactly matched the name on an authority record, the record was exported into the CCS database. The decision to record only exact matches meant I made no allowance for near misses such as typos or missing dates. My intent was to duplicate, as closely as possible, the matching algorithm of the LibsPlus system. For the computer, being close is being wrong. If the authority record contained cross references, this was noted also. Authority records with cross references which would have been examined under the existing search criteria were marked. Those which would have been missed were printed out to be examined later.
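The exact-match rule can be illustrated with a short sketch. The normalization shown here (uppercase, punctuation replaced by spaces, whitespace collapsed) is an assumed stand-in, since the actual LibsPlus normalization rules are not described in this paper. The function names and the sample headings are invented for the example.

```python
import re


def normalize(heading: str) -> str:
    """Hypothetical normalization: uppercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", heading.upper())
    return " ".join(cleaned.split())


def exact_match(new_heading: str, authority_headings: list[str]) -> bool:
    """True only if the normalized heading equals a normalized authority heading."""
    target = normalize(new_heading)
    return any(normalize(a) == target for a in authority_headings)


# Invented sample authority heading, for illustration only.
sample_file = ["Doe, Jane, 1950-"]

print(exact_match("Doe, Jane, 1950-", sample_file))  # same form: match
print(exact_match("Doe, Jane", sample_file))         # missing date: no match
print(exact_match("Doe, Jayne, 1950-", sample_file)) # typo: no match
```

Under a rule of this kind a heading that differs only by a missing date or a single mistyped letter is counted as unmatched, which is why the match rates reported here should be read as lower bounds on how many names actually have authority records.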
After the downloaded authority records had been incorporated into the indexes, the personal names were checked against the CCS database. This was done for two reasons: to identify false matches and to remove from the study the headings on preliminary records. False matches are defined, for the purpose of this study, as headings which match the form of a heading or cross reference, but not the bibliographic identity of the name on an authority record. In other words, the right name but the wrong person. Arlene Taylor, in an article published in 1992, acknowledged what she termed an "exact match problem" in a sample of records drawn at random from OCLC, but did not count the instances of the problem in her study. (1)
Preliminary records are brief, pseudo-MARC records entered locally to represent an item until full MARC cataloging is found. Little, if any, attempt is made to assure consistency in the headings on these temporary records; consequently they were eliminated from the statistics.
Daily and cumulative totals were kept for 1) the number of all new headings entering the database, 2) the number of personal name headings, 3) the number of personal name headings which had corresponding authority records in OCLC, 4) the number of those records with cross references, and, finally, 5) the number of records with cross references which would have been identified by the search criteria. The authority records with cross references which would have been missed were collected and charted. They were then examined for patterns which could be translated into additional criteria to make selective searching of personal names more successful.
On the fourteen days of the study 6310 new controlled headings entered the CCS database. Forty-three percent (2740) of those headings were personal names. Authority records for 1624 of the names were found in the LC/NACO Name Authority File on OCLC; however, 10% (148) of those were false matches.
Personal Name Headings (as percentage of all headings)
| | Number | Percent of Total |
|---|---|---|
| New Headings | 6310 | --- |
| Personal Names | 2740 | 43.4 |
| Not Personal Names | 3570 | 56.6 |
The remaining 1468 personal names represent a 54% match rate against the Name Authority File. Again, these are exact matches. The 46% (1272) of the names which did not match include transcription errors, false hits, and names lacking authority records, with the latter group being by far the largest. Here, then, was my answer to whether there would be enough work to keep a public library NACO participant busy.
Personal Name Headings with Authority Records
| | Number | Percent of Total |
|---|---|---|
| Personal Headings | 2740 | --- |
| Match with Authority File | 1468 | 53.6 |
| No Match | 1272 | 46.4 |
Two hundred twenty-eight of the 1468 personal name authority records (PNARs) found in OCLC included 4xx or 5xx cross references. In other words, 85% of the authority records did not contain references. It was here that I found several published reports against which to compare my statistics. Weintraub in 1991 (2), Taylor in 1984 (3), and Watson and Taylor in 1987 (4) found that authority records without references accounted for 58% to 68% of their samples. Given that public library acquisitions are heavily popular, contemporary, English language works, the discrepancy between my values and theirs, which were measured in university databases or randomly from the NAF, is not surprising.
Cross References on PNARs
| | Number | Percent of Total |
|---|---|---|
| Total Matches | 1468 | --- |
| With Cross References | 228 | 15.5 |
| Without Cross References | 1240 | 84.5 |
The Authorities Librarian's selective criteria would have captured 39% (89) of the 228 records with references. The 139 personal name authority records which escaped capture, when examined, yielded several possibilities for amending the search criteria, and thus ensuring a better success rate than 39%.
Success Rate: Authority Records with References
| | Number | Percent of Total |
|---|---|---|
| Total with Cross References | 228 | --- |
| Captured | 89 | 39.0 |
| Not Captured | 139 | 61.0 |
Twenty-four of the missed headings contained a subfield q. In 18 cases both surname and forename suggested the names were from Russian, Hebrew, Chinese, or other languages where multiple romanizations might be expected. Another 8 headings included a pre-20th-century birth date. Allowing for some overlap, these three categories accounted for 48 headings. Moving the 48 from the debit to the credit side of the tally sheet raised the success rate for searching personal name headings under the amended criteria from 39% (89 of 228) to about 60% (137 of 228).
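The tally can be rechecked from the counts reported above. The sketch below is only arithmetic on those counts; the helper name `pct` and the variable names are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


with_references = 228  # authority records that carried cross references
captured = 89          # found by the original selective criteria
recovered = 48         # missed records the three new categories would catch

rate_before = pct(captured, with_references)
rate_after = pct(captured + recovered, with_references)

print(f"original criteria: {rate_before}%")
print(f"amended criteria:  {rate_after}%")
```

On these counts the original criteria capture 39.0% of the records with references, and the amended criteria capture 137 of 228, or 60.1%.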
As a result of my study the criteria have indeed been rewritten to include three new categories of headings: headings with subfield q, foreign names possibly romanized, and headings with pre-20th-century dates. Obviously the new criteria are impractical if they increase the total number of headings needing examination to an unmanageable number. While I have not tabulated the increase in the total number of headings being examined, I have not found the change to be burdensome.
To keep things in perspective, I needed to remind myself where the subject of this sometimes tedious exercise, new personal name headings with cross references, fit into the whole scheme of things. Graphically represented, the whole idea was to get column 5 as close as possible to column 4 and still have time left to deal with the rest of the job.

The study of new personal name headings in the CCS database did achieve the desired end, that of evaluating the search criteria and suggesting ways to improve them. However, even with the new proven criteria, the ability to capture personal names with cross references, under the scheme of centralized, post-cataloging authority control, remains at about 60%.
In retrospect I wonder if this study could have been more about significance than numbers. I did not allow myself the luxury of judging the usefulness in the CCS database of the cross references I searched for. In truth I am uneasy with the concept. Some research has suggested doing away with cross references which rotate compound names or names with prefixes, on the premise that keyword searching makes them unnecessary. I do know that this would seriously threaten access in our database. For me to tinker with references because I doubt their utility strikes me as being dangerously short-sighted, more authoritarian than authority, at least without further study.
1. Arlene G. Taylor, "Variations in Personal Name Access Points in OCLC Bibliographic Records," Library Resources and Technical Services 36 (April 1992): pp. 224-241.
2. Tamara S. Weintraub, "Personal Name Variations: Implications for Authority Control in Computerized Catalogs," Library Resources and Technical Services 35 (April 1991): pp. 217-228.
3. Arlene G. Taylor, "Authority Files in Online Catalogs: An Investigation of Their Value," Cataloging & Classification Quarterly 4 (Spring 1984): pp. 14-15.
4. Mark R. Watson and Arlene G. Taylor, "Implications of Current Reference Structures for Authority Work in Online Environments," Information Technology and Libraries 6 (March 1987): pp. 10-19.