Authors:
Janavi Khochare, Chaitali Joshi, Bakul Yenarkar, Shraddha Suratkar, Faruk Kazi
Where published:
Arabian Journal for Science and Engineering, 2021, № 3, p. 3447-3458
Dataset names (used for):
- Fake or Real (FoR)
Some description of the approach:
A Temporal Convolutional Network (TCN) and a Spatial Transformer Network (STN) were used to classify the benchmark Fake or Real (FoR) dataset, with the mel spectrogram as the input feature for the audio data. Being limited to the FoR dataset reduces the generalizability of the model, since the dataset covers only one sub-type of audio deepfake, text-to-speech (TTS).
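The core building block of a TCN is a dilated causal 1-D convolution over the time axis of the feature sequence (e.g., mel-spectrogram frames). The sketch below is illustrative only: the function name, kernel weights, and layer sizes are assumptions for demonstration, not the paper's actual architecture.

```python
# Minimal sketch of a TCN's core operation: a dilated causal 1-D convolution.
# All names, weights, and sizes here are illustrative, not the paper's model.

def dilated_causal_conv(x, weights, dilation=1):
    """Convolve sequence x with `weights`, attending only to past samples
    spaced `dilation` steps apart (causal: output at t depends on inputs <= t)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation  # reach back i * dilation steps
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially, which is how a TCN covers long temporal context in audio.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
layer1 = dilated_causal_conv(signal, [0.5, 0.5], dilation=1)
layer2 = dilated_causal_conv(layer1, [0.5, 0.5], dilation=2)
```

In a full detector, such blocks would be stacked with residual connections and followed by a classification head that outputs a real/fake decision per utterance.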
Some description of the data (number of data points, any other features that describe the data):
The FoR dataset includes only TTS-generated samples
Keywords:
TTS, audio anti-spoofing
Instance Represent:
Hand-crafted acoustic features, among others
Dataset Characteristics:
Audio only; exclusively TTS deepfake samples
Subject Area:
Security of audio authentication systems