The ATIS3 corpus, on three CD-ROMs, includes over 774 scenarios completed by 137 subjects, yielding a total of over 7,300 utterances. All utterances are transcribed, and 2,900 of them have been categorized and annotated with canonical reference answers. The relational database for this dataset includes flight information for 46 cities and 52 airports. Data was collected at BBN, CMU, MIT and SRI using their own ATIS systems, and at NIST using systems provided by BBN and SRI. Two 1,000-utterance test sets were set aside from the data pooled by the collection sites. The first set was used in a December 1993 ARPA test and is included in ATIS3. The second has been reserved for future testing.
This release contains a corpus of speech and natural language data collected under the auspices of the Advanced Research Projects Agency Spoken Language Systems (ARPA-SLS) technology development program. The corpus, which contains data in the Air Travel Information Services (ATIS) domain, was designed by the ARPA-SLS Multi-site ATIS Data COllection Working (MADCOW) group and was collected by five sites at locations across the U.S.:

* BBN Systems & Technologies, Cambridge, MA
* Carnegie Mellon University, Pittsburgh, PA
* MIT Laboratory for Computer Science, Boston, MA
* National Institute of Standards and Technology, Gaithersburg, MD
* SRI International, Menlo Park, CA

The corpus is part of the third phase of collection of ATIS data (ATIS3) and comprises the development test material (NIST Speech Disc 17-4.2) and the evaluation test material (NIST Speech Disc 17-5.1) used in the December 1994 ARPA SLS Benchmark Tests. As in the previous ATIS corpora, the speech contained in this corpus was elicited by presenting subjects with various hypothetical travel planning scenarios to solve. The resulting spontaneous spoken queries were recorded as the subjects interacted with partially or completely automated ATIS systems to solve the scenarios. Note that the ATIS3 training data is available on NIST Speech Discs 17-1.1 - 17-3.1.

*Data*

The recorded speech has been transcribed and annotated with categorizations and canonical reference answers. All of the utterances were recorded using a close-talking, noise-canceling, head-mounted Sennheiser microphone. For some subjects, data from a secondary (noisier) microphone was recorded simultaneously. This release also contains the ATIS3 46 city/52 airport relational database, a revised Principles of Interpretation, test implementation and scoring instructions, and other general documentation.
The ATIS3 corpus has been verified, collated, and documented by the National Institute of Standards and Technology (NIST) in cooperation with MADCOW, and is distributed by the Linguistic Data Consortium (LDC).
The ATIS2 corpus contains approximately 15,000 utterances recorded from approximately 450 subjects at five sites: ATT, BBN, CMU, MIT's Laboratory for Computer Science and SRI. All utterances have been transcribed, and almost 10,000 of them have been annotated with categorizations and canonical reference answers. Unlike the ATIS0 corpus, much of the data in ATIS2 was collected using partially or fully automated data collection systems. The fully automated data collection systems were, in fact, working ATIS prototypes. For ATIS2, the ten-city relational database of ATIS0 was revised to accommodate connecting flights and fares, and some table headings were renamed. In addition to training data, the February and November '92 ATIS Benchmark Tests are included. Each contains approximately 1,000 utterances from the pool of data collected by the five sites.

*Update*

This publication has been condensed from four CD-ROMs to a single web download.
