Authority Control in the 21st Century: An Invitational Conference
Authority Control Profile of a Large Academic Research Library Database
Mary Ann Itoga
John Attig
Christine Avery
Pennsylvania State University
Background
- Planning for automated authority control component for LIAS, Penn State's local system
- Presentation and article by Jennifer Younger
- Concept of utility in authority control
- Maximizing the impact of authority work
- Correlation between disciplines and the way information is likely to be retrieved
- Do humanists and scientists search online catalogs differently?
- Need for research
Research Plan: Two Studies
- Survey of the content of the online catalog [preliminary results reported here]
- Examination of user behavior related to use of the online catalog [to be conducted later in 1996]
Basic Question: Is there a relationship between subject matter and the distribution of types of headings or the frequency of occurrence of individual headings?
The Database Survey
The Sample
- A random sample of 449 records
- Sampled from records added to the online catalog 1983 through 1995
- All formats included
- In-process records excluded
- Expected to be representative of current cataloging
The "Questionnaire"
- A stripped-down MARC record
- Included title and all access points
- Data collected on print-outs of records
- 68 questions asked for each record
- Data gathered on call number, language, names, subjects, and series
Sample Validation
- Call number breakdown of sample compared with October 1995 call number data for the entire Penn State database
- Almost an exact match
Disciplines
Based on Library of Congress call numbers
| Humanities | A-B (except BF), M-N, P |
| History/Geography | C-F, G |
| Social Sciences | BF, GN-GV, H-L, U, V |
| Science/Technology | GA-GF, Q-T |
Note: Z's assigned to discipline categories based on subject content of item.
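The class-to-discipline table above can be written as a small lookup. This is a minimal sketch, not the procedure the authors used: the function name `discipline_for` is hypothetical, and it assumes each call number begins with its LC class letters. Class Z returns `None` because those items were assigned by subject content rather than by call number.

```python
import re

HUMANITIES = "Humanities"
HISTORY = "History/Geography"
SOCIAL = "Social Sciences"
SCIENCE = "Science/Technology"


def discipline_for(call_number):
    """Map an LC call number to a discipline, following the table above.

    Returns None for class Z (assigned by subject content) and for
    anything that does not start with a recognized class letter.
    """
    match = re.match(r"[A-Za-z]{1,3}", call_number.strip())
    if not match:
        return None
    cls = match.group(0).upper()
    first = cls[0]

    if first == "Z":
        return None                  # needs a manual, content-based decision
    if cls[:2] == "BF":
        return SOCIAL                # BF is the exception within A-B
    if first == "G":
        second = cls[1:2]
        if second == "":
            return HISTORY           # class G proper
        if "A" <= second <= "F":
            return SCIENCE           # GA-GF
        if "N" <= second <= "V":
            return SOCIAL            # GN-GV
        return None
    if first in "ABMNP":
        return HUMANITIES            # A-B (except BF), M-N, P
    if first in "CDEF":
        return HISTORY               # C-F
    if "H" <= first <= "L" or first in "UV":
        return SOCIAL                # H-L, U, V
    if first in "QRST":
        return SCIENCE               # Q-T
    return None
```

For example, `discipline_for("BF311 .C55")` falls under Social Sciences, while `discipline_for("GB651")` falls under Science/Technology.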
Results of the Survey
Profile of the Sample
Number of Records by Language
| English | 382 | 85% |
| German | 22 | 5% |
| Spanish | 12 | 3% |
| French | 11 | 2% |
| Other | 22 | 5% |
| TOTAL | 449 | 100% |
- English accounted for an overwhelming majority of records in the sample
Chart 1
Number of Records by Discipline
| Humanities | 141 | 31.4% |
| History/Geography | 48 | 10.7% |
| Social Sciences | 142 | 31.6% |
| Science & Technology | 118 | 26.3% |
| TOTAL | 449 | 100% |
- Humanities, Social Sciences, and Science are more or less balanced.
Chart 2
Number of Headings Per Record: Names
| None | 7 | 1.5% |
| 1 | 289 | 64.4% |
| 2 | 105 | 23.4% |
| 3-9 | 47 | 10.5% |
| 10+ | 1 | 0.2% |
| TOTAL | 449 | 100% |
- The overwhelming majority (almost 88%) of the records had 1 or 2 names.
- Only 7 records had no name headings.
Chart 3
Number of Headings Per Record: Subjects
| None | 45 | 10.0% |
| 1 | 145 | 32.3% |
| 2 | 117 | 26.1% |
| 3 | 78 | 17.4% |
| 4 | 40 | 8.9% |
| 5+ | 24 | 5.3% |
| TOTAL | 449 | 100% |
- 45 records (10%) had no subject headings.
- A large majority (about 76%) of the records had 1 to 3 subject headings.
Chart 4
Number of Headings Per Record: Series
- 63% of the records had no series.
Name Headings in the Sample
There were a total of 667 name headings in the sample.
Number of Names by Type of Name
| Personal | 505 | 75.7% |
| Corporate | 144 | 21.6% |
| Conference | 18 | 2.7% |
| TOTAL | 667 | 100% |
Chart 5
Number of Names by Discipline: Personal Names
| Humanities | 188 | 37.2% |
| History | 45 | 8.9% |
| Social Sciences | 140 | 27.7% |
| Science | 132 | 26.2% |
| TOTAL | 505 | 100% |
- 37% of the personal name headings were in Humanities (only 31% of the records in the sample were in Humanities)
Chart 6
Number of Names by Discipline: Corporate Names
| Humanities | 19 | 13.2% |
| History | 16 | 11.1% |
| Social Sciences | 61 | 42.4% |
| Science | 48 | 33.3% |
| TOTAL | 144 | 100% |
- 42% of the corporate name headings were in Social Sciences (only 32% of the records in the sample were in Social Sciences).
- 33% of the corporate name headings were in Science (only 26% of the records in the sample were in Science).
- In contrast, only 13% of the corporate name headings were in Humanities (31% of the records in the sample were in Humanities).
Chart 7
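The comparisons in the bullets above (share of headings versus share of records) can be reproduced from the counts in the tables. The sketch below is illustrative only; the function names `shares` and `representation_ratio` are hypothetical. A ratio above 1 means a discipline holds more of that heading type than its share of records would suggest.

```python
# Counts taken from the tables in this section.
records = {"Humanities": 141, "History/Geography": 48,
           "Social Sciences": 142, "Science/Technology": 118}
personal = {"Humanities": 188, "History/Geography": 45,
            "Social Sciences": 140, "Science/Technology": 132}
corporate = {"Humanities": 19, "History/Geography": 16,
             "Social Sciences": 61, "Science/Technology": 48}


def shares(counts):
    """Convert raw counts to fractions of the total."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def representation_ratio(headings, records):
    """Share of headings divided by share of records, per discipline."""
    heading_share = shares(headings)
    record_share = shares(records)
    return {key: heading_share[key] / record_share[key] for key in records}


for label, counts in (("Personal", personal), ("Corporate", corporate)):
    for discipline, ratio in representation_ratio(counts, records).items():
        print(f"{label:9} {discipline:20} {ratio:.2f}")
```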
Number of Names by Discipline: Conference Names
| Humanities | 0 | 0.0% |
| History | 1 | 5.6% |
| Social Sciences | 4 | 22.2% |
| Science | 13 | 72.2% |
| TOTAL | 18 | 100% |
- Most of the conference names were in Science.
- However, out of the 667 name headings in the sample, there were only 18 conference names.
Chart 8
Frequency of Sample Headings in the Database
Each name heading in the sample was searched in the Penn State online catalog (about 1.75 million records), and the number of records with that heading was counted.
Number of Names by Type of Name
| Frequency | 1 | 2-5 | 6-10 | 11-99 | 100+ |
| Personal Names | 148 | 184 | 66 | 96 | 11 |
| Corporate/Conference Names | 21 | 22 | 10 | 36 | 55 |
- Most personal names occur five times or fewer in the database.
- Most corporate and conference names occur more than 10 times.
Chart 9
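The two observations above follow from summing the frequency bins in the table. As a minimal sketch (the helper `share` is a hypothetical name, and the counts are copied from the table as printed):

```python
# Frequency bins, in table order: 1, 2-5, 6-10, 11-99, 100+
personal = [148, 184, 66, 96, 11]
corporate_conference = [21, 22, 10, 36, 55]


def share(counts, first_bin, last_bin):
    """Fraction of headings falling in bins first_bin..last_bin (inclusive)."""
    return sum(counts[first_bin:last_bin + 1]) / sum(counts)


# Personal names occurring 1 to 5 times in the catalog.
print(f"{share(personal, 0, 1):.1%}")              # 65.7%
# Corporate and conference names occurring 11 times or more.
print(f"{share(corporate_conference, 3, 4):.1%}")  # 63.2%
```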
Number of Personal Names by Discipline
| Frequency | 1 | 2-5 | 6-10 | 11-99 | 100+ |
| Humanities | 52 | 55 | 25 | 45 | 11 |
| History | 10 | 18 | 9 | 8 | 0 |
| Social Sciences | 37 | 60 | 17 | 26 | 0 |
| Science | 49 | 51 | 15 | 17 | 0 |
- Personal names are most likely to occur frequently in Humanities.
Chart 10
Number of Corporate and Conference Names by Discipline
| Frequency | 1 | 2-5 | 6-10 | 11-99 | 100+ |
| Humanities | 4 | 5 | 3 | 4 | 3 |
| History | 4 | 1 | 1 | 1 | 10 |
| Social Sciences | 12 | 2 | 6 | 17 | 28 |
| Science | 10 | 20 | 1 | 16 | 14 |
- Corporate and conference names are most likely to occur frequently in Social Sciences.
Chart 11
Conclusions
- Type of Name
- Personal names are concentrated in Humanities
- Corporate and Conference names are concentrated in Social Sciences and Science.
- Frequency of Occurrence
- Personal names are less likely to occur frequently.
- Corporate and conference names are more likely to occur frequently.
Subject Headings
Topics of Interest
- Structure of subject headings:
- How complex are the headings?
- How many subdivisions do they contain?
- Verification of subject headings:
- How many headings are in LCSH?
- What is not in LCSH?
Complexity of Subject Headings
Of 906 subject headings in the sample:
- 31% had only one element (the main heading, $a)
- 42% had two elements
- 27% had three or more elements
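A complexity count of this kind amounts to splitting each heading into its subfields (the main heading $a plus any $x, $y, or $z subdivisions) and tallying how many each heading has. The sketch below assumes headings are stored as strings with `$`-prefixed subfield codes; the function names and the three example headings are hypothetical and are not drawn from the sample.

```python
import re
from collections import Counter


def parse_heading(heading):
    """Split '$aMain$xTopic' into [('a', 'Main'), ('x', 'Topic')]."""
    pairs = re.findall(r"\$([a-z])([^$]*)", heading)
    return [(code, value.strip()) for code, value in pairs]


def complexity_profile(headings):
    """Tally headings by number of subfields: '1', '2', or '3+'."""
    profile = Counter()
    for heading in headings:
        n = len(parse_heading(heading))
        profile["3+" if n >= 3 else str(n)] += 1
    return profile


# Illustrative headings only.
examples = [
    "$aAuthority files (Information retrieval)",
    "$aLibraries$xAutomation",
    "$aUnited States$xHistory$yCivil War, 1861-1865",
]
print(complexity_profile(examples))
```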
Frequency of Occurrence
Each subdivision in the sample was searched in the entire Penn State online catalog, independent of the rest of the heading:
- 49% of the main headings ($a) occurred in more than 100 headings in the catalog
- 92% of the topical subdivisions ($x) occurred in more than 100 headings
- 85% of the chronological subdivisions ($y) occurred in more than 100 headings
- 93% of the geographic subdivisions ($z) occurred in more than 100 headings
Verification Against LCSH
Complete Headings
- Total of 816 LC subject headings (650 or 651)
- 365 of these (45%) could be verified completely in LCSH
- 451 (55%) could not be verified completely
- Almost always, this was because one or more subdivisions were not in LCSH
Topical Subdivisions ($x)
- The 816 subject headings contained 473 topical subdivisions
- 149 of these subdivisions were in LCSH, 324 were not
- Of these 324 unverified subdivisions, there were 122 unique subdivisions
- Of these 122 unverified topical subdivisions, 111 are free-floating subdivisions
- 75 of these appear in H1095 (the Subject Cataloging Manual's list of free-floating subdivisions)
- Conclusion: Subdivision records for the headings in H1095 would greatly enhance the odds of verification
Chronological Subdivisions ($y)
- The 816 subject headings contained 51 chronological subdivisions
- 39 of these subdivisions were in LCSH, 12 were not
- Of these 12 unverified subdivisions, there were only 2 unique subdivisions: 19th century and 20th century
Geographic Subdivisions ($z)
- The 816 subject headings contained 264 geographic subdivisions
- Only 33 of these subdivisions were in LCSH, 231 were not
- Of these 231 unverified subdivisions, there were 74 unique subdivisions
- Of the 74 unique geographic subdivisions:
- 3 were names of continents
- 35 were names of countries
- 11 were names of states
- 49 of the subdivisions belong to these three categories
- Conclusion: If subdivision records were created for these three categories, the odds of verification would be greatly increased
- Note: For all three categories, the form of the subdivision is the same as the form of the main heading, so the subdivision records could be created automatically.
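One way to picture the proposal is as a two-step check: accept a heading if the complete string is established, and otherwise accept it if the main heading is established and every subdivision has its own subdivision record. The sketch below is an assumption about how such a check might work, not a description of LIAS or of the authors' procedure; the data structures, function names, and sample values are all hypothetical.

```python
def make_geographic_subdivisions(place_headings):
    """Derive $z subdivision records from established place-name headings.

    Relies on the note above: for continents, countries, and states the
    subdivision form is the same as the main-heading form.
    """
    return {("z", name) for name in place_headings}


def is_verified(heading, established, subdivision_records):
    """heading: list of (subfield code, value) pairs, main heading first.

    established: set of complete heading strings, subdivisions joined by '--'.
    subdivision_records: set of (code, value) pairs for subdivisions.
    """
    full = "--".join(value for _, value in heading)
    if full in established:
        return True                      # complete heading is established
    main = heading[0][1]
    if main not in established:
        return False                     # main heading itself is unverified
    return all(part in subdivision_records for part in heading[1:])


# Illustrative data only.
established = {"Libraries", "Libraries--Automation", "France"}
records = make_geographic_subdivisions({"France"}) | {("x", "History")}

heading = [("a", "Libraries"), ("z", "France"), ("x", "History")]
print(is_verified(heading, established, set()))     # without subdivision records
print(is_verified(heading, established, records))   # with subdivision records
```

With an empty set of subdivision records the example heading fails verification; once the derived geographic record and a topical record are available, it passes.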
Impact of Subdivision Records on Our Sample
- Using just LCSH:
- 365 headings (45%) were verified
- 451 headings (55%) were not
- Using LCSH plus the subdivision records recommended:
- 705 headings (86.4%) were verified
- Only 111 headings (13.6%) were not
Conclusions
- Subject headings are complex and hierarchical.
- LCSH does not contain a majority of the complete subject headings appearing in bibliographic records.
- Subdivision records can dramatically increase odds of verification.
Copyright © 1996 Christine Avery, Mary Ann Itoga, John C. Attig