Authors:
Yang, Yujie and Qin, Haochen and Zhou, Hang and Wang, Chengcheng and Guo, Tianyu and Han, Kai and Wang, Yunhe
Where published:
ICASSP
Dataset names (used for):
- ASVspoof 2019
- In-the-Wild
Some description of the approach:
This study shows that the three pretrained models with the smallest performance drop and the best generalizability are Wav2Vec2-XLSR [3], HuBERT [4], and WavLM [5]. The study also proposes a feature-fusion method to incorporate all of these representations into the classifier, and shows that another popular pretrained audio representation [6] does not generalize well enough.
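The fusion idea can be sketched as simple concatenation of utterance-level embeddings from the three encoders before classification. This is a minimal illustrative sketch, not the paper's implementation: the encoder outputs are mocked with random vectors, and the 1024-dim size is an assumption (typical of the large variants of these models); real use would run the pretrained models and pool their frame-level outputs.

```python
import numpy as np

# Hypothetical embedding dimensions for the three pretrained encoders
# (assumed; actual sizes depend on the checkpoint used).
DIMS = {"wav2vec2_xlsr": 1024, "hubert": 1024, "wavlm": 1024}

def extract_features(audio, dims=DIMS):
    """Stand-in for the pretrained encoders: returns one utterance-level
    embedding per model. Here these are random vectors; a real pipeline
    would run each model and mean-pool its frame-level representations."""
    rng = np.random.default_rng(0)
    return {name: rng.standard_normal(d) for name, d in dims.items()}

def fuse(features):
    """Concatenation fusion: stack the per-model embeddings into one
    vector that feeds the downstream real/fake classifier."""
    return np.concatenate([features[k] for k in sorted(features)])

audio = np.zeros(16000)  # placeholder: 1 s of silence at 16 kHz
fused = fuse(extract_features(audio))
print(fused.shape)  # (3072,) = 1024 + 1024 + 1024
```

Concatenation is only one possible fusion strategy; weighted sums or learned attention over the per-model features are common alternatives.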
Some description of the data (number of data points, any other features that describe the data):
The paper focuses on audio deepfake detection by integrating learning-based features through feature selection and fusion methods. (Specific dataset sizes are not summarized here.)
Keywords:
Audio deepfake detection, anti-spoofing, feature incorporation, learning-based audio features.
Instance Represent:
Real and fake speech audio
Dataset Characteristics:
Handcrafted features: Mel spectrogram, MFCC. Learning-based features: HuBERT, XLS-R, Whisper.
Subject Area:
Security of audio authentication systems