Authors:
Hafiz Malik, Raghavendar Changalvala
Where published:
Audio Engineering Society Conference on Audio Forensics, Porto, Portugal
Dataset names (used for):
- Baidu Cloned Audio Dataset: includes 10 ground-truth audio samples, 120 cloned recordings, and 4 morphed speech recordings
The study evaluates a deep learning-based fake speech detection method on a dataset of cloned and bona-fide speech samples, using spectrograms of the audio recordings as input to the model.
The dataset consists of 124 cloned speech samples, 124 bona-fide speech samples, and additional morphed speech recordings. Each sample is 5 seconds of audio at a 16 kHz sampling rate, converted into a spectrogram.
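As a minimal sketch of the preprocessing described above (5-second clips at 16 kHz turned into spectrograms), the following uses `scipy.signal.spectrogram` on a synthetic signal. The window length and overlap here are illustrative assumptions, not parameters taken from the paper:

```python
import numpy as np
from scipy.signal import spectrogram

# Assumed parameters from the dataset description: 5 s clips at 16 kHz.
SAMPLE_RATE = 16_000
DURATION_S = 5

# Synthetic stand-in for a speech clip (the real data are audio recordings).
t = np.linspace(0, DURATION_S, SAMPLE_RATE * DURATION_S, endpoint=False)
audio = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

# Short-time spectrogram; nperseg/noverlap are illustrative choices.
freqs, times, sxx = spectrogram(audio, fs=SAMPLE_RATE, nperseg=512, noverlap=256)

# Log-scale the power values, as is typical before rendering a spectrogram image.
log_sxx = 10 * np.log10(sxx + 1e-10)

print(log_sxx.shape)  # (frequency bins, time frames)
```

The resulting 2-D array would then be rendered or resized into the fixed-size image fed to the CNN.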
Keywords:
Voice cloning, deep learning, speech synthesis, voice biometrics, AI security, fake speech detection
Instance Represent:
Audio recordings of speech, both bona-fide (authentic) and cloned (synthesized)
Dataset Characteristics:
248 speech samples, including variations based on speaker adaptation and speaker encoding methods for cloned speech. Spectrograms of audio recordings are used as input for the CNN model.
Subject Area:
Audio forensics, speech processing, and AI security
Associated Tools:
Fake speech detection using deep learning
Feature Type:
Spectral-temporal representations (spectrograms) of audio recordings
License: Not explicitly specified. The dataset is distributed as spectrogram images (size 625×469×3 pixels).
Last Accessed: 7/7/2024
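To illustrate how the 625×469×3 spectrogram images noted above could be staged as CNN input, here is a small sketch using NumPy. The normalization and batch layout are common conventions, assumed rather than taken from the paper, and the random array stands in for a decoded image file:

```python
import numpy as np

# Image dimensions as stated in the dataset card: 625×469×3 pixels.
IMG_SHAPE = (625, 469, 3)

# Stand-in for a decoded spectrogram image; real data would be loaded from disk.
image = np.random.randint(0, 256, size=IMG_SHAPE).astype(np.float32)

# Scale pixel values to [0, 1] and add a leading batch dimension,
# the usual layout for feeding a single image to a CNN.
batch = (image / 255.0)[np.newaxis, ...]

print(batch.shape)  # (1, 625, 469, 3)
```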