CMU Wilderness Multilingual Speech Dataset
Alan W Black & Language Technologies Institute, Carnegie Mellon University
2019
This paper describes the CMU Wilderness Multilingual Speech Dataset, a dataset covering over 700 different languages that provides audio, aligned text, and word pronunciations.
Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, Gregor Weber
2020
The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and development. Common Voice is designed for Automatic Speech Recognition purposes but can be useful in other domains (e.g. language identification).
Faked Speech Detection with Zero Prior Knowledge
Sahar Al Ajmi, Khizar Hayat, Alaa M. Al Obaidi, Naresh Kumar, Munaf Najmuldeen
2024
Detection, Dataset, English, Arabic, Multiple Languages
This work introduces a neural network method for developing a classifier that blindly classifies input audio as real or mimicked; the word ‘blindly’ refers to the ability to detect mimicked audio without references or real sources.
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger
2024
This paper presents the Multi-Language Audio Anti-Spoof Dataset (MLAAD), created using 82 TTS models, comprising 33 different architectures, to generate 378.0 hours of synthetic voice in 38 different languages.
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Tomáš Nekvinda, Ondřej Dušek
2020
Generation, Multiple Languages
This paper introduces an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.
OpenSLR
Jan “Yenda” Trmal
N/A
OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition, and software related to speech recognition.