A Deep Learning Framework for Audio Deepfake Detection

Authors:

Janavi Khochare, Chaitali Joshi, Bakul Yenarkar, Shraddha Suratkar, Faruk Kazi

Where published:

Arabian Journal for Science and Engineering, 2021, Issue 3, pp. 3447-3458


Dataset names (used for):

  • Fake or Real (FoR)


Some description of the approach:

The authors used a Temporal Convolutional Network (TCN) and a Spatial Transformer Network (STN) to classify samples from the benchmark Fake or Real (FoR) dataset, with mel spectrograms of the audio as the input feature. Being limited to the FoR dataset reduces the generalizability of the model, since that dataset covers only one sub-type of audio deepfake: text-to-speech (TTS).
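The mel spectrogram front end mentioned above can be sketched in plain NumPy. This is not the authors' pipeline; the frame size, hop length, and number of mel bands below are illustrative assumptions, and in practice a library such as librosa would typically be used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with center frequencies evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=80):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    return power @ mel_filterbank(sr, n_fft, n_mels).T  # (frames, n_mels)

# Example: one second of a 440 Hz tone at 16 kHz (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr)
print(S.shape)  # (59, 80): 59 frames x 80 mel bands
```

The resulting 2-D array (time frames x mel bands) is the image-like representation that convolutional models such as a TCN or STN consume.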


Some description of the data (number of data points, any other features that describe the data):

The FoR dataset includes only text-to-speech (TTS) samples.


Keywords:

TTS, audio anti-spoofing

Instance Represent:

Hand-crafted acoustic features and more

Dataset Characteristics:

Only audio deepfake TTS samples

Subject Area:

Security of audio authentication systems

Dataset Link


Main Paper Link


License Link


Last Accessed: 11/26/2024

NSF Award #2346473