Data Sets

WAV Files


ADD 2022 (The First Audio Deep Synthesis Detection Challenge)

The dataset for the challenge consists of training, development, adaptation, and test sets.

Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Ar-DAD: Arabic Diversified Audio Dataset

This dataset contains 15,810 audio clips of 30 popular reciters cantillating verses from the Holy Quran (chapters 78-114).

Mohammed Lataifeh, Ashraf Elnagar

The LJ Speech Dataset

A dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books.

Keith Ito, Linda Johnson

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

Speech data for the acquisition of acoustic-phonetic knowledge and the development and evaluation of automatic speech recognition systems.

John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren

The M-AILABS Speech Dataset

The M-AILABS Speech Dataset provides multi-language audio (distributed as WAV files) with matching transcriptions for training speech recognition and speech synthesis models.

M-AILABS

 

MP3 Files


Baidu Silicon Valley AI Lab cloned audio (Neural Voice Cloning with a Few Samples)

Contains cloned audio samples accompanying the paper "Neural Voice Cloning with a Few Samples," which describes a system that learns to synthesize a person's voice from only a few audio samples.

Sercan Ö. Arık, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

FoR: Fake or Real Dataset for Synthetic Speech Detection

The dataset includes both real and synthetic speech samples for training and evaluating machine learning and deep learning models that detect synthetic speech.

Ricardo Reimao (York University), Vassilios Tzerpos (York University)

FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

FakeAVCeleb is a multimodal dataset featuring synchronized fake audio and video, created to enhance the development of deepfake detection systems capable of identifying both visual and audio manipulations.

Hasam Khalid, Shahroz Tariq, Minha Kim, Simon S. Woo

 

PCM Files


ASVspoof 2021 (DF)

The DF (deepfake) task involves detecting deepfake speech processed with various lossy codecs typically used for media storage; the best-performing challenge system achieved an equal error rate (EER) of 15.64% (a sketch of how EER is computed appears after the author list below).

Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Hector Delgado
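Several ASVspoof entries report performance as an equal error rate (EER). Below is a minimal Python sketch of how an EER can be computed from detector scores; the function name, score distributions, and threshold sweep are illustrative assumptions rather than the official ASVspoof evaluation code.

import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER: the operating point where the false acceptance
    rate on spoofed trials equals the false rejection rate on bona fide trials."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bonafide_scores),
                             np.zeros_like(spoof_scores)])
    order = np.argsort(scores)              # sweep every observed score as a threshold
    labels = labels[order]
    frr = np.cumsum(labels) / labels.sum()                  # bona fide rejected at or below threshold
    far = 1 - np.cumsum(1 - labels) / (1 - labels).sum()    # spoofed accepted above threshold
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

# Hypothetical scores: bona fide speech tends to score higher than spoofed speech.
rng = np.random.default_rng(0)
print(f"EER: {compute_eer(rng.normal(2, 1, 1000), rng.normal(0, 1, 1000)):.2%}")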

ASVspoof 2021 (PA)

The PA (physical access) task involves replay attacks recorded in real physical spaces under a variety of noise and reverberation conditions.

Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Hector Delgado

ASVspoof 2021: Accelerating Progress in Spoofed and Deepfake Speech Detection

The overall challenge paper; the associated evaluation involves detecting synthetic and converted speech injected into communication systems without acoustic propagation.

Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Hector Delgado

ASVspoof 2021 (LA)

The LA (logical access) partition of ASVspoof 2021 advances spoofed and deepfake speech detection by introducing more challenging and realistic conditions, fostering the development of robust countermeasures.

Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Hector Delgado

WaveFake: A Data Set to Facilitate Audio Deepfake Detection

Addresses the threat of audio deepfakes with a collection of generated audio samples from six network architectures across two languages.

Joel Frank, Lea Schönherr

 

 

Other Files


ASVspoof 2015 (The First Automatic Speaker Verification Spoofing and Countermeasures Challenge)

The dataset includes genuine and spoofed speech, partitioned into training, development, and evaluation sets.

Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilçi, Md Sahidullah, Aleksandr Sizov

ASVspoof 2019 (A large-scale public database of synthesized, converted and replayed speech)

The dataset includes various state-of-the-art spoofing techniques to provide a challenging test bed for anti-spoofing research.

Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, et al.

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

The dataset includes various combinations of real and fake audio-visual segments, providing a comprehensive benchmark for state-of-the-art deepfake detection and localization methods.

Zhixi Cai, Shreya Ghosh, Aman Pankaj, Munawar Hayat, Abhinav Dhall, Tom Gedeon, Kalin Stefanov

H-Voice: Histograms of Original and Fake Voice Recordings

The dataset consists of 6,672 histograms of voice recordings, both original and fake, organized into six directories for training, validation, and external testing (see the histogram sketch below).

Dora M. Ballesteros, Yohanna Rodriguez, Diego Renza
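Because H-Voice ships histograms rather than raw waveforms, the short Python sketch below illustrates one way to turn a recording into a normalized amplitude histogram; the soundfile dependency, bin count, and file name are assumptions for illustration and may differ from the dataset's own extraction procedure.

import numpy as np
import soundfile as sf  # assumed available for reading audio files

def amplitude_histogram(path, n_bins=256):
    """Return a normalized histogram of the sample amplitudes in a recording."""
    samples, _sr = sf.read(path)      # float samples, typically in [-1, 1]
    if samples.ndim > 1:              # mix multichannel audio down to mono
        samples = samples.mean(axis=1)
    counts, edges = np.histogram(samples, bins=n_bins, range=(-1.0, 1.0))
    return counts / counts.sum(), edges

hist, _edges = amplitude_histogram("recording.wav")  # hypothetical file name
print(hist.shape)  # (256,)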

VoxCeleb2

Contains audio-visual recordings from over 6,000 speakers, extracted from YouTube videos.

Joon Son Chung, Arsha Nagrani, Andrew Zisserman

 

NSF Award #2346473