Fighting AI with AI: Fake Speech Detection using Deep Learning

Authors:
Hafiz Malik, Raghavendar Changalvala

Where published:
Audio Engineering Society Conference on Audio Forensics, Porto, Portugal


Dataset names (used for):

  • Baidu Cloned Audio Dataset: includes 10 ground-truth audio samples, 120 cloned recordings, and 4 morphed speech recordings


Some description of the approach:
The study evaluates a deep learning-based fake speech detection method on a dataset of cloned and bona-fide speech samples. Audio recordings are converted to spectrograms, which serve as input to a deep learning model that classifies each recording as genuine or fake.


Some description of the data:
The dataset consists of 124 cloned speech samples, 124 bona-fide speech samples, and additional morphed speech recordings. Each sample is 5 seconds long, sampled at 16 kHz, and processed into a spectrogram.
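The preprocessing described above (5-second clips at 16 kHz converted to spectrograms) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the window length and overlap are assumptions, and a synthetic tone stands in for a real recording.

```python
import numpy as np
from scipy.signal import spectrogram

SAMPLE_RATE = 16_000  # Hz, per the dataset description
DURATION_S = 5        # seconds per sample

# Stand-in signal: a 440 Hz tone in place of an actual speech recording.
t = np.arange(SAMPLE_RATE * DURATION_S) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 440 * t)

# Short-time spectrogram; window/overlap values here are illustrative.
freqs, times, Sxx = spectrogram(audio, fs=SAMPLE_RATE,
                                nperseg=512, noverlap=256)

# Sxx is a frequency-bins x time-frames matrix; rendering it as an
# image yields the kind of input the paper feeds to its CNN.
print(Sxx.shape)
```

Rendering `Sxx` as an image (e.g. on a decibel scale) would produce spectrogram pictures analogous to the 625×469×3-pixel images the dataset uses.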


Keywords:
Voice cloning, deep learning, speech synthesis, voice biometrics, AI security, fake speech detection

Instance Represent:
Audio recordings of speech, both bona-fide (authentic) and cloned (synthesized)

Dataset Characteristics:
248 speech samples, with cloned speech generated by both speaker-adaptation and speaker-encoding methods. Spectrograms of the audio recordings serve as input to the CNN model.

Subject Area:
Audio forensics, speech processing, and AI security

Associated Tools:
Fake speech detection using deep learning

Feature Type:
Spectral-temporal representations (spectrograms) of audio recordings

Main Paper Link

Document Link


License: Not explicitly specified; the dataset is distributed as spectrogram images (size 625×469×3 pixels)


Last Accessed: 7/7/2024

NSF Award #2346473