Authors:
John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren
Description of the Dataset:
The TIMIT corpus is designed to provide speech data for the acquisition of acoustic-phonetic knowledge and the development and evaluation of automatic speech recognition systems.
It includes read speech from 630 speakers from 8 major dialect regions of the United States.
The corpus consists of 2 dialect sentences, 450 phonetically compact sentences, and 1890 phonetically diverse sentences.
Data Creation Method:
- The TIMIT corpus was created through a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI), and Texas Instruments (TI).
- Text corpus design was a collaboration among MIT, SRI, and TI.
- Speech was recorded at TI.
- Transcription was done at MIT.
- Data was maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).
Number of Speakers:
- The dataset contains speech from 630 speakers.
Total Size of Data:
- The corpus consists of 6300 utterances: 10 sentences read by each of the 630 speakers.
Real Samples:
- All samples in the dataset are real speech recordings from 630 speakers.
Fake Samples:
- There are no fake samples in the TIMIT corpus. All data is real.
Extra Details about the Data:
- The data includes orthographic transcriptions, word-level transcriptions, and phonetic transcriptions.
Data Types:
- .wav: Speech waveform files (stored with a NIST SPHERE header in the original distribution, not as RIFF WAVE).
- .txt: Orthographic transcriptions.
- .wrd: Time-aligned word transcriptions.
- .phn: Time-aligned phonetic transcriptions.
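The time-aligned files can be read with a few lines of code. The sketch below assumes that each line of a .wrd or .phn file has the form "start end label", where start and end are integer sample offsets; this layout is an assumption not stated in this card, and the example content is illustrative rather than copied from the corpus. The names `Segment` and `parse_alignment` are hypothetical.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: int   # sample offset where the segment begins
    end: int     # sample offset where the segment ends
    label: str   # word (.wrd) or phone symbol (.phn)


def parse_alignment(text: str) -> List[Segment]:
    """Parse the contents of a .wrd or .phn file into segments.

    Assumes one "start end label" triple per line (an assumption
    about the file layout, not taken from this card).
    """
    segments = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        segments.append(Segment(int(start), int(end), label.strip()))
    return segments


# Illustrative content only; not copied from the corpus.
example_phn = """0 2400 h#
2400 3800 sh
3800 5200 iy
"""

segments = parse_alignment(example_phn)
for seg in segments:
    print(seg.start, seg.end, seg.label)
```

To read a real file, pass the result of `open(path).read()` to `parse_alignment`; the same function should work for both .wrd and .phn files if they share this layout.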
Average Length of Data:
- Each speaker reads 10 sentences, for a total of 6300 utterances. The average utterance duration is not specified. The sentence set is designed to give broad coverage of phonetic contexts.
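The counts in this card can be cross-checked with simple arithmetic. The sketch below uses the figures stated above (630 speakers, 10 sentences each, and 2, 450, and 1890 distinct sentences per type). The per-speaker split of 2 dialect, 5 compact, and 3 diverse sentences is an assumption that does not appear in this card.

```python
# Figures stated in this card.
NUM_SPEAKERS = 630
SENTENCES_PER_SPEAKER = 10
DIALECT, COMPACT, DIVERSE = 2, 450, 1890  # distinct sentence texts per type

total_utterances = NUM_SPEAKERS * SENTENCES_PER_SPEAKER
distinct_sentences = DIALECT + COMPACT + DIVERSE

# Assumption (not stated in this card): each speaker reads
# 2 dialect, 5 compact, and 3 diverse sentences.
PER_SPEAKER = {"dialect": 2, "compact": 5, "diverse": 3}
utterances_by_type = {k: n * NUM_SPEAKERS for k, n in PER_SPEAKER.items()}

print(total_utterances)      # 6300
print(distinct_sentences)    # 2342
print(utterances_by_type)    # {'dialect': 1260, 'compact': 3150, 'diverse': 1890}
```

Under that assumed split, the per-type utterance counts sum to 6300, which is consistent with the total reported here.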
Data Collection Period:
- Not specified.
Annotation Process:
- Speech was transcribed at MIT, and phonetic transcriptions were created by trained linguists.
Usage Scenarios:
- The TIMIT corpus is widely used for:
  - Development and evaluation of automatic speech recognition systems.
  - Research in acoustic-phonetic studies.
  - Dialect and linguistic analysis.
Technical Specifications:
- Sampling Rate: The waveform files (.wav) are sampled at 16 kHz.
- Bit Depth: 16-bit audio.
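These two figures are enough to convert between sample counts, durations, and raw audio sizes. The sketch below assumes single-channel audio and assumes that time alignments are expressed as sample offsets; neither is stated in this card. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, as listed above
BYTES_PER_SAMPLE = 2      # 16-bit samples
NUM_CHANNELS = 1          # assumption: single-channel audio


def samples_to_seconds(num_samples: int) -> float:
    """Convert a sample count or sample offset to seconds."""
    return num_samples / SAMPLE_RATE_HZ


def pcm_bytes(duration_seconds: float) -> int:
    """Size in bytes of the raw audio data for a given duration (header excluded)."""
    return int(duration_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * NUM_CHANNELS)


print(samples_to_seconds(16_000))  # 1.0
print(pcm_bytes(3.0))              # 96000
```

At this rate, one second of audio occupies 32,000 bytes before any file header is counted.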
Data Accessibility:
- The dataset was originally distributed on CD-ROM and is now available from the Linguistic Data Consortium (LDC) as catalog item LDC93S1.
Challenges and Limitations:
- Limited dialect representation for certain regions, particularly the Western dialect region, where boundaries are less clearly defined.
License Link:
- The license for the dataset is not specified in the provided abstract. The work may be protected by copyright, and further details might be available.
Last Accessed: 6/19/2024