The M-AILABS Speech Dataset

Authors:
M-AILABS

 

Description of the Dataset:
The M-AILABS Speech Dataset provides audio recordings paired with their text for training speech recognition and speech synthesis models. The audio is distributed as WAV files.

 

Data Creation Method:
Audio was recorded by LibriVox volunteers and is in the public domain. The texts are sourced from LibriVox and Project Gutenberg, were published between 1884 and 1964, and are also in the public domain.

 

Number of Speakers:

  • Not specified

Total Size:

  • 18.7 hours

Number of Real Samples:

  • 9265

Number of Fake Samples:

  • 806

 

Extra Details:
The speech is in German.

 

Data Type:

  • WAV files

Average Length:

  • 1-20 seconds per clip (range)

Keywords:

  • Speech Recognition, Training Data

When Published:

  • 3rd January 2019

 

Annotation Process:
Users can preprocess the data using existing tools that support the LJSpeech data format or perform their own preprocessing.
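
As a rough illustration of the "own preprocessing" option, the sketch below reads LJSpeech-style metadata in Python. It assumes the usual LJSpeech layout, one pipe-delimited line per clip in the form id|transcription|normalized transcription; whether every file in this dataset follows that layout exactly is an assumption. The names Utterance and parse_ljspeech_metadata and the sample lines are hypothetical and were made up for the example.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    """One metadata row (hypothetical helper type, not part of the dataset)."""
    clip_id: str
    text: str
    normalized_text: str


def parse_ljspeech_metadata(lines: Iterable[str]) -> List[Utterance]:
    """Parse pipe-delimited rows: id|transcription|normalized transcription.

    Assumption: quoting is disabled because transcripts may contain
    literal quote characters that are not CSV quoting.
    """
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    utterances = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        clip_id = row[0]
        text = row[1] if len(row) > 1 else ""
        # Fall back to the raw text when no normalized column is present.
        normalized = row[2] if len(row) > 2 else text
        utterances.append(Utterance(clip_id, text, normalized))
    return utterances


# Made-up sample rows standing in for a real metadata file.
sample = io.StringIO(
    "clip_0001|Es war 1884.|Es war achtzehnhundertvierundachtzig.\n"
    "clip_0002|Guten Tag.\n"
)
utterances = parse_ljspeech_metadata(sample)
for u in utterances:
    print(u.clip_id, "->", u.normalized_text)
```

To use it on real data, pass an open file handle instead of the in-memory sample, for example open(path, encoding="utf-8", newline=""), where path points at a metadata file in the downloaded archive.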

 

Usage Scenarios:
Training and evaluation of ASR and TTS models

 

Data Accessibility:
Publicly accessible

Dataset Link


Main Paper Link


License:
Copyright (c) 2017-2019 by the original creators @ M-AILABS with the following license


Last Accessed: 6/19/2024

NSF Award #2346473