The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

Authors:

John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren

Description of the Dataset:

The TIMIT corpus is designed to provide speech data for the acquisition of acoustic-phonetic knowledge and the development and evaluation of automatic speech recognition systems.

It includes read speech from 630 speakers from 8 major dialect regions of the United States.

The text material consists of 2342 distinct sentences: 2 dialect sentences, 450 phonetically compact sentences, and 1890 phonetically diverse sentences.

Data Creation Method:

The TIMIT corpus was created through a joint effort involving the Massachusetts Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas Instruments (TI), which collaborated on the design of the text corpus.
Speech was recorded at TI.
Transcription was done at MIT.
Data was maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).

Number of Speakers:

The dataset contains speech from 630 speakers.

Total Size of Data:

The corpus consists of 6300 recorded utterances: 10 sentences read by each of the 630 speakers.

Real Samples:

All samples in the dataset are real speech recordings from 630 speakers.

Fake Samples:

There are no fake samples in the TIMIT corpus. All data is real.

Extra Details about the Data:

Dialect regions

dr1: New England
dr2: Northern
dr3: North Midland
dr4: South Midland
dr5: Southern
dr6: New York City
dr7: Western
dr8: Army Brat (moved around)
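
The region codes above lend themselves to a simple lookup table when grouping recordings by dialect. The following is a minimal sketch; the names `DIALECT_REGIONS` and `dialect_name` are illustrative and not part of the corpus, and the case-insensitive handling is an assumption about how the codes might appear in directory names.

```python
# Dialect-region codes and names exactly as listed in this description.
DIALECT_REGIONS = {
    "dr1": "New England",
    "dr2": "Northern",
    "dr3": "North Midland",
    "dr4": "South Midland",
    "dr5": "Southern",
    "dr6": "New York City",
    "dr7": "Western",
    "dr8": "Army Brat (moved around)",
}


def dialect_name(code: str) -> str:
    """Return the region name for a code such as 'dr3' or 'DR3'.

    Raises KeyError for codes outside dr1..dr8.
    """
    return DIALECT_REGIONS[code.strip().lower()]


if __name__ == "__main__":
    for code in sorted(DIALECT_REGIONS):
        print(code, "->", dialect_name(code))
```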

The data includes orthographic transcriptions, word-level transcriptions, and phonetic transcriptions.

Data Types:

.wav: Speech waveform files.
.txt: Orthographic transcriptions.
.wrd: Time-aligned word transcriptions.
.phn: Time-aligned phonetic transcriptions.
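
As an illustration of how the time-aligned files might be read, the sketch below parses alignment text into segments. It assumes each line has the form `<start_sample> <end_sample> <label>`, which should be verified against the corpus documentation. The `Segment` type, the `parse_alignment` helper, and the sample lines are hypothetical and made up for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: int  # sample index where the segment begins
    end: int    # sample index where the segment ends
    label: str  # phone or word label


def parse_alignment(text: str) -> List[Segment]:
    """Parse lines of the assumed form '<start> <end> <label>'."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=2 keeps any spaces inside the label intact.
        start, end, label = line.split(maxsplit=2)
        segments.append(Segment(int(start), int(end), label))
    return segments


if __name__ == "__main__":
    # Illustrative input only, not copied from the corpus.
    sample = "0 3050 h#\n3050 4559 sh\n4559 5723 ix\n"
    for seg in parse_alignment(sample):
        print(seg)
```

The same helper would work for word-level files under the same format assumption, since only the label column differs.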

Average Length of Data:

Each speaker reads 10 sentences (2 dialect, 5 phonetically compact, and 3 phonetically diverse), giving 630 × 10 = 6300 utterances in total. The average utterance duration isn’t specified here; the sentence set was designed to cover a broad range of phonetic contexts.
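
The counts quoted in this description can be cross-checked with a few lines of arithmetic. This is only a sanity check on the figures stated here (630 speakers, 10 sentences each, and the 2 / 450 / 1890 sentence sets); the variable names are illustrative.

```python
# Figures taken from this description.
speakers = 630
sentences_per_speaker = 10
dialect, compact, diverse = 2, 450, 1890

# Total recorded utterances: every speaker reads 10 sentences.
total_utterances = speakers * sentences_per_speaker

# Distinct sentence texts across the three sentence sets.
distinct_texts = dialect + compact + diverse

if __name__ == "__main__":
    print("utterances:", total_utterances)
    print("distinct sentence texts:", distinct_texts)
```

The utterance total exceeds the number of distinct texts because many sentences are read by more than one speaker.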

Data Collection Period:

Not specified.

Annotation Process:

Speech was transcribed at MIT, and phonetic transcriptions were created by trained linguists.

Usage Scenarios:

The TIMIT corpus is widely used for:
- Research in acoustic-phonetic studies.
- Dialect and linguistic analysis.

Technical Specifications:

Sampling Rate: The waveform files (.wav) are sampled at 16 kHz.
Bit Depth: 16 bits per sample.
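
Given a 16 kHz sampling rate and 16-bit samples, sample indices and durations convert as sketched below. The helper names are hypothetical, and the byte estimate assumes single-channel, uncompressed audio and ignores any file header.

```python
SAMPLE_RATE_HZ = 16_000  # 16 kHz, as stated above
BYTES_PER_SAMPLE = 2     # 16-bit samples


def samples_to_seconds(n_samples: int) -> float:
    """Convert a sample count or sample index to seconds."""
    return n_samples / SAMPLE_RATE_HZ


def pcm_bytes(duration_seconds: float) -> int:
    """Estimate raw audio size in bytes (assumes mono, no header)."""
    n_samples = int(round(duration_seconds * SAMPLE_RATE_HZ))
    return n_samples * BYTES_PER_SAMPLE


if __name__ == "__main__":
    print(samples_to_seconds(4_000))  # a quarter of a second
    print(pcm_bytes(3.0))             # bytes for three seconds of audio
```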

Data Accessibility:

The dataset was originally distributed on CD-ROM. It is now distributed by the Linguistic Data Consortium (LDC) under catalog number LDC93S1, and may also be accessible through research institutions that hold a license.

Challenges and Limitations:

Dialect representation is limited for certain regions, particularly the Western region (dr7), where dialect boundaries are less clearly defined.

Dataset Link

Main Paper Link

License Link:

The license for the dataset is not specified in the provided abstract. The work may be protected by copyright, and further details may be available from the distributor.

Last Accessed: 6/19/2024

NSF Award #2346473

Community Infrastructure to Strengthen AI for Audio Deepfake analysis (CISAAD)
