Authors:
Hasam Khalid, Shahroz Tariq, Minha Kim, Simon S. Woo
Description of Dataset:
FakeAVCeleb is a multimodal dataset featuring synchronized fake audio and video, created to enhance the development of deepfake detection systems capable of identifying both visual and audio manipulations. The dataset covers multiple ethnicities (Caucasian, Black, South Asian, East Asian) and is balanced for gender diversity. The four different combinations of real/fake audio and video provide a diverse and challenging dataset for detecting deepfakes.
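The four real/fake combinations mentioned above can be enumerated programmatically. A minimal sketch, assuming category names of the form "RealVideo-RealAudio" (the labels below are illustrative, not necessarily the dataset's exact directory names):

```python
from itertools import product

states = ["Real", "Fake"]  # authenticity state of each modality

# Enumerate all four video/audio combinations, e.g. "RealVideo-FakeAudio"
combinations = [f"{v}Video-{a}Audio" for v, a in product(states, states)]
print(combinations)
# One fully real category, two partially fake, one fully fake
```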
Data Creation Method:
Ten sample sets were collected from six different network architectures, spanning two languages (English and Japanese). The samples resemble the training distributions, enabling one-to-one comparisons of audio clips across architectures.
Number of Speakers:
- 2 speakers (one per reference dataset).
Total Size:
- Approximately 196 hours
Number of Real Samples:
- Not specified
Number of Fake Samples:
- 117,985 generated audio clips
Extra Details:
The dataset includes samples based on the LJSpeech and JSUT datasets, which cover passages from non-fiction books and the basic kanji of the Japanese language, respectively. It also provides a detailed analysis of the frequency statistics and prosody of the generated samples.
Data Type:
- 16-bit PCM WAV files
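Since the clips are 16-bit PCM WAV files, they can be read with Python's standard-library `wave` module. The sketch below writes and reads back a short synthetic clip; the file name and 16 kHz sample rate are assumptions, not taken from the dataset:

```python
import wave
import struct
import math

SR = 16000  # assumed sample rate; check the actual dataset files

# Write a short synthetic 16-bit PCM mono clip (1 second of a 440 Hz tone)
samples = [int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / SR))
           for n in range(SR)]
with wave.open("example_clip.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
    f.setframerate(SR)
    f.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# Read it back the way a detection pipeline would load a dataset clip
with wave.open("example_clip.wav", "rb") as f:
    assert f.getsampwidth() == 2  # confirm 16-bit PCM
    n = f.getnframes()
    audio = struct.unpack(f"<{n}h", f.readframes(n))

print(len(audio) / SR, "seconds")  # → 1.0 seconds
```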
Average Length:
- The average clip length is around 4.8 seconds (JSUT), 6 seconds (LJSpeech), and 3.8 seconds (Text-to-Speech).
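As a rough sanity check, the clip count and average lengths above are consistent with the stated total of approximately 196 hours (taking ~6 seconds as a representative average):

```python
n_clips = 117_985   # number of generated audio clips
avg_seconds = 6.0   # representative average clip length (LJSpeech)

total_hours = n_clips * avg_seconds / 3600
print(round(total_hours, 1))  # → 196.6, close to the stated ~196 hours
```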
Keywords:
- Audio Deepfake Detection, Speech Synthesis, Training Data, GANs, TTS, Generative Models
When Published:
- August 26, 2021
Annotation Process:
The dataset comprises ten sample sets from six network architectures. Samples were generated by first extracting Mel spectrograms from the original audio files and then feeding these spectrograms to the respective models. The dataset is annotated with frequency-analysis and prosody statistics.
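The Mel-spectrogram extraction step described above can be sketched in plain NumPy. The parameter values below (sample rate, FFT size, hop length, number of mel bands) are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=40):
    # Frame the signal, window it, and take the magnitude spectrum
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1))  # (T, n_fft//2 + 1)
    mel = mag @ mel_filterbank(n_mels, n_fft, sr).T      # (T, n_mels)
    return np.log(mel + 1e-6)

# Example on a synthetic 1-second 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
```

In the dataset's pipeline, spectrograms like `spec` were fed to the generative models, whose outputs form the fake samples.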
Usage Scenarios:
Training and evaluation of audio deepfake detection models, comparison of different generative models, research into audio deepfakes, and development of robust ASR systems.
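For the detection-evaluation use case, a standard summary metric is the equal error rate (EER). A minimal NumPy sketch (a generic implementation, not taken from the dataset's paper; it assumes both classes are present):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: operating point where the false-reject rate on fakes
    equals the false-accept rate on reals.
    scores: higher = more likely fake; labels: 1 = fake, 0 = real."""
    order = np.argsort(scores)[::-1]        # sort by descending score
    labels = np.asarray(labels)[order]
    n_fake = labels.sum()
    n_real = len(labels) - n_fake
    tp = np.cumsum(labels)                  # fakes caught at each cutoff
    fp = np.cumsum(1 - labels)              # reals flagged at each cutoff
    frr = 1 - tp / n_fake                   # fakes missed
    far = fp / n_real                       # reals wrongly flagged
    idx = np.argmin(np.abs(frr - far))      # closest crossing point
    return (frr[idx] + far[idx]) / 2

# Perfectly separated scores give an EER of 0
print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 0.0
```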