A Robust Audio Deepfake Detection System via Multi-view Feature

Authors:

Yang, Yujie and Qin, Haochen and Zhou, Hang and Wang, Chengcheng and Guo, Tianyu and Han, Kai and Wang, Yunhe

Where published:

ICASSP

 

Dataset names (used for):

  • ASVspoof 2019
  • In-the-Wild

 

Some description of the approach:

This study shows that the three pretrained models with the smallest performance drop and the best generalizability are Wav2Vec-XLSR [3], HuBERT [4], and WavLM [5]. It also proposes a feature-fusion method to incorporate all of these representations into the classifier, and shows that another popular pretrained audio representation [6] does not generalize well enough.
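The fusion idea can be sketched as follows: pool frame-level embeddings from each pretrained encoder into utterance-level vectors, concatenate the views, and score the fused vector with a classifier head. This is a minimal illustrative sketch, not the paper's implementation; the embedding dimensions, the mean-pooling, and the linear scorer are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level embeddings from three pretrained encoders.
# Shapes are (frames, hidden_dim); the dimensions here are illustrative,
# not the models' actual hidden sizes.
emb_xlsr = rng.standard_normal((200, 1024))   # stand-in for Wav2Vec-XLSR
emb_hubert = rng.standard_normal((200, 768))  # stand-in for HuBERT
emb_wavlm = rng.standard_normal((200, 768))   # stand-in for WavLM

def pool(e):
    """Mean-pool frame embeddings into one utterance-level vector."""
    return e.mean(axis=0)

# Multi-view fusion by concatenating the pooled views.
fused = np.concatenate([pool(emb_xlsr), pool(emb_hubert), pool(emb_wavlm)])

# A simple linear scorer standing in for the classifier head (assumed).
w = rng.standard_normal(fused.shape[0])
score = 1.0 / (1.0 + np.exp(-(fused @ w)))  # sigmoid -> spoof probability
print(fused.shape, float(score))
```

In practice the fused vector would feed a trained classifier; feature selection (as mentioned below) could drop views or dimensions before fusion.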

 

Some description of the data (number of data points, any other features that describe the data):

The paper focuses on deepfake detection by integrating learning-based features through feature selection and fusion methods.

 

Keywords:

Audio deepfake detection, anti-spoofing, feature incorporation, learning-based audio features.

Instance Represent:

Real and fake speech audio

Dataset Characteristics:

Handcrafted features: Mel spectrogram, MFCC. Learning-based features: HuBERT, XLS-R, Whisper.

Subject Area:

Security of audio authentication systems

Dataset Link


Main Paper Link


License Link


Last Accessed: 11/26/2024

NSF Award #2346473