The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

Authors: 

John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren

 

Description of the Dataset:

The TIMIT corpus is designed to provide speech data for the acquisition of acoustic-phonetic knowledge and the development and evaluation of automatic speech recognition systems.

It includes read speech from 630 speakers from 8 major dialect regions of the United States.

The text material consists of 2 dialect sentences, 450 phonetically compact sentences, and 1890 phonetically diverse sentences (2342 distinct sentence texts in total).

 

Data Creation Method:

  • The TIMIT corpus was created through a joint effort involving the Massachusetts Institute of Technology (MIT), SRI International (SRI), and Texas Instruments (TI).
  • Text corpus design was a collaboration among MIT, SRI, and TI.
  • Speech was recorded at TI.
  • Transcription was done at MIT.
  • Data was maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).

 

Number of Speakers:

  • The dataset contains speech from 630 speakers.

Total Size of Data:

  • The corpus contains 6300 recorded utterances (10 sentences read by each of the 630 speakers).

Real Samples:

  • All samples in the dataset are real speech recordings from 630 speakers.

Fake Samples:

  • There are no fake samples in the TIMIT corpus. All data is real.

 

Extra Details about the Data:

  • dr1: New England
  • dr2: Northern
  • dr3: North Midland
  • dr4: South Midland
  • dr5: Southern
  • dr6: New York City
  • dr7: Western
  • dr8: Army Brat (moved around)
  • The data includes orthographic transcriptions, word-level transcriptions, and phonetic transcriptions.
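
The dialect-region codes listed above lend themselves to a simple lookup table. The following is a minimal sketch, assuming that a file path contains the region code as one of its directory components; the example paths and the helper name dialect_region are hypothetical and not taken from the corpus documentation.

```python
# Dialect-region codes and names, copied from the list above.
DIALECT_REGIONS = {
    "dr1": "New England",
    "dr2": "Northern",
    "dr3": "North Midland",
    "dr4": "South Midland",
    "dr5": "Southern",
    "dr6": "New York City",
    "dr7": "Western",
    "dr8": "Army Brat (moved around)",
}


def dialect_region(path: str) -> str:
    """Return the region name for a path with a drN directory component.

    Assumption: the region code appears as its own path component,
    in either upper or lower case (e.g. "train/dr3/spk0/utt.wav").
    """
    for part in path.replace("\\", "/").split("/"):
        name = DIALECT_REGIONS.get(part.lower())
        if name is not None:
            return name
    raise ValueError(f"no dialect-region code found in path: {path!r}")


# Hypothetical path, used only to illustrate the lookup.
print(dialect_region("train/dr3/spk0/utt.wav"))
```

Keeping the table as plain data makes it easy to group speakers or utterances by region without hard-coding the names elsewhere.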

 

Data Types:

  • .wav: Speech waveform files (stored with NIST SPHERE headers in the original distribution).
  • .txt: Orthographic transcriptions.
  • .wrd: Time-aligned word transcriptions.
  • .phn: Time-aligned phonetic transcriptions.
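
The time-aligned files can be read with a few lines of code. The sketch below assumes that each line of a .wrd or .phn file holds a start sample index, an end sample index, and a label, separated by whitespace. That layout is not described in this card, the names Segment and parse_alignment are illustrative, and the sample text is made up rather than copied from the corpus.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: int   # first sample of the segment
    end: int     # end sample of the segment
    label: str   # word (.wrd) or phone symbol (.phn)


def parse_alignment(text: str) -> List[Segment]:
    """Parse alignment text, assuming '<start> <end> <label>' per line."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Raises ValueError if a line has fewer than three fields.
        start, end, label = line.split(maxsplit=2)
        segments.append(Segment(int(start), int(end), label))
    return segments


# Made-up alignment text for illustration only.
example = "0 100 a\n100 250 b\n"
for seg in parse_alignment(example):
    print(seg.label, seg.start, seg.end)
```

The same function serves both file types because, under this assumption, they differ only in what the label column contains.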

 

Average Length of Data:

  • Each speaker reads 10 sentences, for a total of 6300 utterances. The average duration per utterance is not specified, but the sentence set is designed to give broad coverage of phonetic contexts.
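
The counts in this card can be cross-checked with simple arithmetic. The sketch below assumes the per-speaker split usually given in the corpus documentation (2 dialect, 5 phonetically compact, and 3 phonetically diverse sentences per speaker); this card does not state that split, so treat it as an assumption.

```python
SPEAKERS = 630
SENTENCES_PER_SPEAKER = 10

# Assumption (not stated in this card): sentences of each type per speaker.
PER_SPEAKER = {"dialect": 2, "compact": 5, "diverse": 3}

# Distinct sentence texts of each type, as given in the description.
UNIQUE_TEXTS = {"dialect": 2, "compact": 450, "diverse": 1890}

total_utterances = SPEAKERS * SENTENCES_PER_SPEAKER

# Utterances recorded for each sentence type under the assumed split.
utterances_by_type = {k: SPEAKERS * n for k, n in PER_SPEAKER.items()}

# How many speakers read each distinct text, on average.
readers_per_text = {k: utterances_by_type[k] // UNIQUE_TEXTS[k] for k in UNIQUE_TEXTS}

print(total_utterances)           # 630 * 10
print(utterances_by_type)
print(readers_per_text)
print(sum(UNIQUE_TEXTS.values()))  # distinct sentence texts
```

Under the assumed split, the per-type utterance counts add up to the same total as 630 speakers times 10 sentences, which is a useful sanity check when verifying a local copy of the corpus.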

 

Data Collection Period:

  • Not specified.

 

Annotation Process:

  • Speech was transcribed at MIT, and phonetic transcriptions were created by trained linguists.

 

Usage Scenarios:

  • The TIMIT corpus is widely used for:
    • Research in acoustic-phonetic studies.
    • Dialect and linguistic analysis.

 

Technical Specifications:

  • Sampling Rate: The waveform files (.wav) are sampled at 16 kHz.
  • Bit Depth: 16 bits per sample.
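
These two figures are enough to convert between sample counts, durations, and raw audio size. The following is a minimal sketch, assuming single-channel audio and ignoring any file header; the function names are illustrative.

```python
SAMPLE_RATE_HZ = 16_000   # samples per second, from the specification above
BITS_PER_SAMPLE = 16      # bit depth, from the specification above


def samples_to_seconds(n_samples: int) -> float:
    """Convert a sample count (or sample index) to seconds."""
    return n_samples / SAMPLE_RATE_HZ


def pcm_bytes(duration_s: float, channels: int = 1) -> int:
    """Raw audio size in bytes, excluding any header.

    Assumption: uncompressed samples, mono unless stated otherwise.
    """
    n_samples = round(duration_s * SAMPLE_RATE_HZ)
    return n_samples * (BITS_PER_SAMPLE // 8) * channels


print(samples_to_seconds(48_000))  # 48,000 samples at 16 kHz
print(pcm_bytes(1.0))              # one second of mono audio
```

The same conversion applies to sample-based time alignments: dividing a sample index by the sampling rate gives the corresponding time in seconds.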

 

Data Accessibility:

  • The dataset was originally distributed on CD-ROM; it is now available through the Linguistic Data Consortium (LDC) under catalog number LDC93S1.

 

Challenges and Limitations:

  • Dialect representation is limited for certain regions, particularly the Western region, whose dialect boundaries are less clearly defined.

Dataset Link:


Main Paper Link:


License Link:

  • The license for the dataset is not specified in the source abstract. The work may be protected by copyright; further details may be available from the distributor.

Last Accessed: 6/19/2024

NSF Award #2346473