From WorldCat Developers' Network
WorldCat Search API: Specific Index Searching tips
This page describes indexes that have unusual features or normalization rules. See also the general notes on searching indexes and the indexes offered at each Service Level.
The Keyword index is described first; the remaining indexes follow in alphabetical order.
Keyword
The keyword index for the WorldCat Search API is not the same index used in the WorldCat.org service. It matches the index used in the cataloging and FirstSearch versions of the WorldCat database. The main difference is that this index, unlike the WorldCat.org index, does not include standard numbers other than the ISBN. So, for example, the ISSN is not included in this index.
The Keyword search finds information in the author, title, subject, notes, ISBN, year, year 2 and a few fields specific to the keyword search (034/a,b,d,e,f,g, 052/a,b, 255/a,b,c,d,e).
The Year and Year2 data from the 008 field is indexed following the same rules as the Year index, described under Year limit below.
The ISBN is indexed without hyphens. However, any keyword search term that has the characteristics of an ISBN and is entered with hyphens is automatically concatenated by the search processor.
A geographic field found only in the keyword index is 052/a,b, which is useful to map catalogers. The 052/a subfield is indexed alone. In any record that also contains 052/b, each 052/b is additionally indexed concatenated to 052/a without spaces, as a single word. So a record with 052 #a 1234 #b P4 #b C2 is searchable as 1234, 1234p4, and 1234c2.
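As an illustration of the 052 rule, the following minimal sketch builds the keyword terms for one field. The function name is hypothetical, and lowercasing the terms is an assumption taken from the lowercase forms in the example above; this is not OCLC's indexing code.

```python
def geographic_terms(subfield_a, subfields_b):
    """Return the keyword terms produced by one 052 field (illustrative only)."""
    a = subfield_a.strip().lower()      # assumption: terms are case-folded
    terms = [a]                         # 052/a is indexed alone
    for b in subfields_b:
        # each 052/b is joined to 052/a with no space, forming a single word
        terms.append(a + b.strip().lower())
    return terms


print(geographic_terms("1234", ["P4", "C2"]))  # ['1234', '1234p4', '1234c2']
```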
Access Method
The most direct search of internet resources is the Access Method search, which searches the URLs found in some WorldCat records. The characters between the punctuation marks in the URL are the "words" that are searched. For example, http://www.oclc.org is searched by the "words" www, oclc, and org. The most distinctive term in the string is the most useful one to search on. All the usual stopwords apply to this index, plus two additional stopwords, "http" and "https".
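To see which "words" a URL contributes, the splitting can be approximated as below. This is a rough sketch: it assumes that every non-alphanumeric character counts as punctuation, and it filters only the two extra stopwords named above, because the general stopword list is not given on this page. The function name is hypothetical.

```python
import re

# The two stopwords specific to this index; the general list is not applied here.
EXTRA_STOPWORDS = {"http", "https"}


def access_method_words(url):
    """Split a URL into searchable words (illustrative approximation)."""
    # assumption: any run of non-alphanumeric characters separates words
    words = re.split(r"[^A-Za-z0-9]+", url.lower())
    return [w for w in words if w and w not in EXTRA_STOPWORDS]


print(access_method_words("http://www.oclc.org"))  # ['www', 'oclc', 'org']
```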
Author indexes
The author index actually includes all people and corporations that have participated in the creation of the item, including sponsors, editors, directors, actors, illustrators and many other roles. The Personal Name index includes only people. The Corporate/Conference Name index includes only organizations such as corporations as well as meetings and conferences.
There are no stopwords in the Author, Corporate and Conference Name, and Personal Name indexes.
Dewey Class Number
This is available as an index at the full service level. At the default service level it is available only as a limit.
The data has been normalized so that all spaces and punctuation were removed except periods. The Dewey Decimal index also indexes each classification number up to each slash or prime mark. So a Dewey number of 123.4/56/789 is indexed as 123.4, 123.456, and 123.456789.
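The segment-by-segment indexing of a Dewey number can be sketched as follows. The function name is hypothetical, and treating the apostrophe as the prime mark is an assumption; the sketch only reproduces the example given above.

```python
import re


def dewey_index_terms(number):
    """Return the cumulative forms of a Dewey number (illustrative only)."""
    # assumption: segments are separated by "/" or by "'" used as a prime mark
    segments = re.split(r"[/']", number)
    terms = []
    current = ""
    for segment in segments:
        # drop spaces and punctuation, keeping periods
        current += re.sub(r"[^0-9A-Za-z.]", "", segment)
        terms.append(current)
    return terms


print(dewey_index_terms("123.4/56/789"))  # ['123.4', '123.456', '123.456789']
```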
DLC Limit
The DLC limit identifies records that were either cataloged by the United States Library of Congress or cataloged under the auspices of a United States national program such as CONSER or PCC. The only value to search is “y” to limit the result to include only these records.
Government document number
This data was normalized by removing all punctuation including periods and by removing all spaces.
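A search term can be prepared the same way before it is sent. The sketch below is a hypothetical helper, not part of the API, and the sample number is made up for illustration. It leaves letter case unchanged because this page does not say how case is handled.

```python
import re


def normalize_govdoc(number):
    """Strip all spaces and punctuation, including periods (illustrative only)."""
    return re.sub(r"[^0-9A-Za-z]", "", number)


print(normalize_govdoc("A 1.10:976"))  # A110976
```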
ISBN
A search can be entered with all the hyphens included or with the hyphens removed and the digits concatenated.
ISSN
This data is searchable only by including the hyphen in the number.
LCCN
The Library of Congress Control Numbers can be searched in a variety of ways. A number can be searched with the hyphen included or with the zero-fill characters that are used when the number is stored, and with or without its alphabetic prefix. So, for example, sn92-1234 is searchable as 92-1234 or 92001234 or sn92-1234 or sn92001234.
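The four searchable forms in the example can be generated with a short helper. This is a sketch under stated assumptions: the function name is hypothetical, the serial part is assumed to be zero-filled to six digits (which is what the 92001234 example implies), and only the two-digit-year form shown above is handled.

```python
import re


def lccn_search_forms(lccn, serial_width=6):
    """List the forms under which an LCCN like 'sn92-1234' can be searched."""
    match = re.fullmatch(r"([a-z]*)(\d{2})-(\d+)", lccn.strip().lower())
    if match is None:
        raise ValueError(f"unsupported LCCN form: {lccn!r}")
    prefix, year, serial = match.groups()
    hyphenated = f"{year}-{serial}"
    # assumption: the stored form pads the serial part with zeros to six digits
    zero_filled = year + serial.zfill(serial_width)
    forms = [hyphenated, zero_filled]
    if prefix:
        forms += [prefix + hyphenated, prefix + zero_filled]
    return forms


print(lccn_search_forms("sn92-1234"))
# ['92-1234', '92001234', 'sn92-1234', 'sn92001234']
```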
Language limit
The language index not only includes the primary language value found in the MARC 008, but it also includes all the languages when an item includes multiple languages. It also includes the language of summaries and additional textual material.
The Language index includes both the three-letter codes for languages found in the OCLC Marc Code Lists (ISBN 1-55653-169-9) and the expanded value for each language code in English. The Language index also indexes the less common two-character ISO codes for languages.
And while the English values for these codes are also searchable, the only precise search term is the three-letter code. The English search terms may group language codes together. For example, English as a word search brings together Modern English, Old English, and Creole or Pidgin English.
Please also see the Primary Language limit described below.
Library Holdings limit
This limit will determine if a library has attached their holdings to a record within the OCLC WorldCat database. The search term to use is the OCLC symbol. To find the OCLC symbol for an institution, please see Find an OCLC Library.
Library Holdings Group limit
Records that more libraries report holding can be seen as more valuable to more libraries. This index limits results to records for items held by a set number of libraries.
The index has number codes that can be searched to limit the results to only records that have that number of OCLC libraries holding the item. Only one value can be searched, and only by using the SRU relation “=”. These numbers cannot be ranged, so for example it isn’t possible to search “>” 17, but it is possible to get that pre-determined range by searching the code 08.
The following codes can be searched:
| Number of Library Holdings | Search code |
|---|---|
| 5 or more holdings | 05 |
| 10 or more holdings | 06 |
| 50 or more holdings | 07 |
| 100 or more holdings | 08 |
| 500 or more holdings | 09 |
| No holdings | 10 |
| 1 holding only | 11 |
| 2 – 4 holdings | 12 |
| 5 – 9 holdings | 13 |
| 10 – 24 holdings | 14 |
| 25 – 49 holdings | 15 |
| 50 – 74 holdings | 16 |
| 75 – 99 holdings | 17 |
| 100 – 149 holdings | 18 |
| 150 – 199 holdings | 19 |
| 200 – 299 holdings | 20 |
| 300 – 399 holdings | 21 |
| 400 – 499 holdings | 22 |
| 500 – 599 holdings | 23 |
| 600 – 699 holdings | 24 |
| 700 – 799 holdings | 25 |
| 800 – 899 holdings | 26 |
| 900 – 999 holdings | 27 |
| 1,000 – 1,499 holdings | 28 |
| 1,500 – 1,999 holdings | 29 |
| 2,000 – 2,499 holdings | 30 |
| 2,500 or more holdings | 31 |
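Because the codes cannot be ranged, a client has to pick the right code itself. The sketch below transcribes the table into two lookups; the function and variable names are hypothetical, and only the code values come from the table above.

```python
# (low, high, code) for the exact-range codes; high=None means "or more"
RANGE_CODES = [
    (0, 0, "10"), (1, 1, "11"), (2, 4, "12"), (5, 9, "13"),
    (10, 24, "14"), (25, 49, "15"), (50, 74, "16"), (75, 99, "17"),
    (100, 149, "18"), (150, 199, "19"), (200, 299, "20"), (300, 399, "21"),
    (400, 499, "22"), (500, 599, "23"), (600, 699, "24"), (700, 799, "25"),
    (800, 899, "26"), (900, 999, "27"), (1000, 1499, "28"),
    (1500, 1999, "29"), (2000, 2499, "30"), (2500, None, "31"),
]

# thresholds for the "N or more holdings" codes
MINIMUM_CODES = {5: "05", 10: "06", 50: "07", 100: "08", 500: "09"}


def code_for_count(count):
    """Return the range code whose bucket contains an exact holdings count."""
    if count < 0:
        raise ValueError("holdings count cannot be negative")
    for low, high, code in RANGE_CODES:
        if count >= low and (high is None or count <= high):
            return code


def code_for_minimum(minimum):
    """Return the 'or more' code for a threshold, or None if none exists."""
    return MINIMUM_CODES.get(minimum)


print(code_for_count(87))     # 17
print(code_for_minimum(100))  # 08
```

A threshold that has no predefined code, such as "18 or more", returns None here; the table offers no single code for it.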
Library of Congress Class number
The data has been normalized so that all spaces and punctuation were removed except periods.
Material type limit
The Material type index searches the record to identify different kinds of items. The complete list of codes can be found here.
While many of these codes are also searchable in the Primary Document Type index, not all of the codes have the same meaning in both indexes. The codes whose meanings differ between the two indexes are listed here.
• art is for Article, chapters, papers, etc. as the primary document type in document type index
• acp is for the same type of material, with additional article items, in material type index
• art is for 3-d items or artifacts in the material type index
• bks is for Books that are primarily books and not articles or internet resources in the document type index
• bks is for Books or Text of any kind in the material type index, including articles and internet resources cataloged as text
• bnu is for Books that are not internet resources, including additional items, in the material type index
• map is for Cartographic Material including maps in the document type index
• cmt is for Cartographic material including records with additional cartographic information in the material type index
• map is only items cataloged as having an 007 to indicate maps in the record in the material type index
Primary document type limit
The Primary document type index assigns a single document type to each record. It first determines whether the record qualifies as an Internet Resource. If it does not, the record is assigned a document type based on the value in the Leader field of the MARC record.
The searchable values are:
| Search code | Primary document type |
|---|---|
| art | Articles |
| bks | Books |
| com | Computer files |
| int | Continually updated resources |
| map | Maps |
| mix | Mixed material (Archival Materials) |
| sco | Musical scores |
| ser | Serials (Journals and Magazines) |
| rec | Sound recordings |
| url | Internet Resource |
| vis | Visual Materials |
While many of these codes are also searchable in the Material Type index, not all of the codes have the same meaning there. See the information on the Material Type index above.
Primary language limit
The primary language index searches the three-letter codes for languages found in the OCLC Marc Code Lists (ISBN 1-55653-169-9). There is only one primary language code per record: the three-character language code that the cataloger entered in the 008 field of the MARC record.
See the information on the full service level Language index above.
Publisher and music number
This index covers a field that was originally used for music numbers and was later expanded to include other publisher numbers. The data in this field has been normalized so that the different ways in which it could be entered can be searched in a consistent manner. The rules for normalizing the data are given below:
• The data is concatenated, removing punctuation, up to parentheses, commas, and dashes (double hyphens). Numbers in parentheses follow the same rules. So "ab 123", "ab.123", and "ab-123" are all searched as "ab123". However, two spaces start a new number: "ab 123" with one space is indexed as "ab123", but "ab  123" with two spaces is indexed as "ab" and "123".
• Each number up to a comma space is indexed alone. So [028 #a ab123, ab124, ab125] has three numbers that are searchable: "ab123," "ab124," and "ab125."
• Numbers on either side of a dash are indexed, and possibly the series between them as well. The series is indexed only if the beginning of the second number after the dash matches the beginning of the first number. So [262 #c ab123--ab125] has three searchable terms: "ab123," "ab124," and "ab125." However, if "ab" does not start the second number, the range is not searchable. So [262 #c ab123--ac125] has two searchable terms, "ab123" and "ac125." However, [265 #c ab123-ab125] has three searchable terms, "ab123", "ab124", and "ab125". While the start and end of a series are always indexed, the range is only indexed up to the first 20 values.
• Information within parentheses behaves as if it were a new field, with all the rules applied. So [028 #a 123--125 (cd345--cd347, dd123) ] has searchable "123," "124," "125," "cd345," "cd346," "cd347," and "dd123."
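The range rule is the least obvious of these, so here is a minimal sketch of it in isolation. It makes several assumptions that this page does not confirm: the input is already concatenated and lowercased, "matches the beginning" means the two numbers share the same alphabetic prefix, and "the first 20 values" are counted from the start of the range. The function name is hypothetical.

```python
import re

MAX_RANGE_VALUES = 20  # per the rule above: only the first 20 values of a range


def expand_range(text):
    """Expand one 'start--end' publisher number range (illustrative only)."""
    start, dash, end = text.partition("--")
    if not dash:
        return [start]
    s = re.fullmatch(r"([a-z]*)(\d+)", start)
    e = re.fullmatch(r"([a-z]*)(\d+)", end)
    # assumption: the series is indexed only when both prefixes are identical
    if s is None or e is None or s.group(1) != e.group(1):
        return [start, end]
    prefix, width = s.group(1), len(s.group(2))
    first, last = int(s.group(2)), int(e.group(2))
    if last < first:
        return [start, end]
    stop = min(last, first + MAX_RANGE_VALUES - 1)
    terms = [f"{prefix}{n:0{width}d}" for n in range(first, stop + 1)]
    if end not in terms:
        terms.append(end)  # the end of the series is always indexed
    return terms


print(expand_range("ab123--ab125"))  # ['ab123', 'ab124', 'ab125']
print(expand_range("ab123--ac125"))  # ['ab123', 'ac125']
```

Splitting on commas and parentheses would sit in front of this step; it is left out to keep the sketch short.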
Standard Number
The standard number index includes ISBNs, ISSNs, LCCNs, and many other standard numbers. For all of these, all punctuation is removed and the remaining characters are concatenated. However, the OCLC number is not part of this index.
For LCCNs, only the zero-filled version of the number is retained. Further, an alphabetic prefix of three letters stays attached to the number, while a prefix of one or two letters is not attached. So, for example, sn92-1234 is indexed as 92001234 only, and map92-1234 is indexed as map92001234 only.
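The prefix rule for LCCNs in this index can be sketched as below. The function name is hypothetical, the six-digit zero fill is an assumption inferred from the 92001234 example, and only the two-digit-year form used in the examples is handled.

```python
import re


def standard_number_lccn(lccn, serial_width=6):
    """Return the single form of an LCCN kept in the Standard Number index."""
    match = re.fullmatch(r"([a-z]*)(\d{2})-(\d+)", lccn.strip().lower())
    if match is None:
        raise ValueError(f"unsupported LCCN form: {lccn!r}")
    prefix, year, serial = match.groups()
    # assumption: the serial part is zero-filled to six digits
    number = year + serial.zfill(serial_width)
    # a three-letter prefix stays attached; shorter prefixes are dropped
    return prefix + number if len(prefix) == 3 else number


print(standard_number_lccn("sn92-1234"))   # 92001234
print(standard_number_lccn("map92-1234"))  # map92001234
```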
Title
The Title phrase search combines the 245/a and 245/b subfields into a single phrase, searched with structure attribute 1. Each of these subfields (245/a, 245/b) can also be searched separately.
Year limit
The data indexed is the 008 Date1 data. In this data, unknown digits are stored as “u”. For the index, every “u” is indexed as a zero, so for example 199u is indexed as 1990. Years that are shorter than four digits have leading zeros added: to search for the year 999, enter 0999. While a bounded range covers all the years indexed, any search with an unbounded range searches only the years 1000-2030. So a search for years less than 1900 searches the years 1000-1899.
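The year rules can be summarized in a small sketch. Both function names are hypothetical; the floor and ceiling values are the 1000 and 2030 limits quoted above.

```python
RANGE_FLOOR, RANGE_CEILING = 1000, 2030  # limits applied to unbounded ranges


def normalize_year(date1):
    """Index form of an 008 Date1 value: 'u' becomes 0, padded to four digits."""
    return date1.strip().lower().replace("u", "0").zfill(4)


def effective_range(first=None, last=None):
    """Inclusive years actually searched when one end of a range is left open."""
    return (RANGE_FLOOR if first is None else first,
            RANGE_CEILING if last is None else last)


print(normalize_year("199u"))      # 1990
print(normalize_year("999"))       # 0999
print(effective_range(last=1899))  # (1000, 1899)
```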
