The M-AILABS Speech Dataset

Authors:
M-AILABS

Description of the Dataset:
The M-AILABS Speech Dataset provides extensive audio and text data for training speech recognition and synthesis models. The dataset consists of WAV files.

Data Creation Method:
Audio was recorded by volunteers (LibriVox) and is in the public domain. Texts are sourced from LibriVox and Project Gutenberg and are in the public domain, published between 1884 and 1964.

Number of Speakers:

Not specified

Total Size:

18.7 hours

Number of Real Samples:

9265

Number of Fake Samples:

Extra Details:
The speech is in German.

Data Type:

WAV files

Average Length:

1-20 seconds
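
As a rough consistency check, the total size (18.7 hours) and sample count (9265) listed in this card imply a mean clip length of about 7.3 seconds, which falls inside the stated 1-20 second range. The sketch below shows that arithmetic and one way to measure an individual clip with Python's standard `wave` module. The helper names `mean_clip_seconds` and `wav_duration_seconds` are illustrative, not part of the dataset's tooling.

```python
import wave


def mean_clip_seconds(total_hours, num_clips):
    """Mean clip length in seconds implied by a total duration and a clip count."""
    return total_hours * 3600 / num_clips


def wav_duration_seconds(path):
    """Duration of one WAV file: number of frames divided by the sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Figures taken from this card: 18.7 hours across 9265 samples.
implied_mean = mean_clip_seconds(18.7, 9265)
print(round(implied_mean, 2))
```

Running `wav_duration_seconds` over every file and summing the results would be one way to verify the total size independently.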

Keywords:

Speech Recognition, Training Data

When Published:

3rd January 2019

Annotation Process:
Users can preprocess the data using existing tools that support the LJSpeech data format or perform their own preprocessing.
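
A minimal sketch of such preprocessing follows. It assumes the metadata uses the LJSpeech convention of pipe-separated lines (`id|transcription|normalized transcription`) with audio stored under a `wavs` directory. The function name `parse_ljspeech_metadata`, the `wav_dir` default, and the sample rows are hypothetical; check the actual file layout of the download before relying on this.

```python
from pathlib import Path


def parse_ljspeech_metadata(lines, wav_dir="wavs"):
    """Turn LJSpeech-style metadata lines into (wav_path, text) pairs."""
    items = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("|")
        if len(fields) < 2:
            # Skip blank or malformed lines that lack a transcription.
            continue
        clip_id = fields[0]
        # Prefer the normalized transcription (third field) when it is present.
        text = fields[2] if len(fields) > 2 and fields[2] else fields[1]
        wav_path = (Path(wav_dir) / f"{clip_id}.wav").as_posix()
        items.append((wav_path, text))
    return items


# Hypothetical rows for illustration only; not taken from the dataset.
sample = [
    "clip_0001|Hallo Welt 1|Hallo Welt eins\n",
    "clip_0002|Guten Tag\n",
    "\n",
]
pairs = parse_ljspeech_metadata(sample)
```

In practice the lines would come from opening the metadata file with `encoding="utf-8"`, since the German transcriptions contain non-ASCII characters.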

Usage Scenarios:
Training and evaluation of ASR and TTS models

Data Accessibility:
Publicly accessible

Dataset Link

Main Paper Link

License:
Copyright (c) 2017-2019 by the original creators @ M-AILABS with the following license

Last Accessed: 6/19/2024

NSF Award #2346473
