Authors:
Xie, Yuankun and Cheng, Haonan and Wang, Yutian and Ye, Long
Where published:
INTERSPEECH
Dataset names (used for):
- ASVspoof 2019 LA
- WaveFake
- FakeAVCeleb
Some description of the approach:
This approach uses Wav2Vec-XLSR [3] as a front-end to extract domain-invariant feature representations, which are then fed to the classifier. After the Wav2Vec-XLSR front-end, a Light Convolutional Neural Network (LCNN) followed by a transformer block serves as the back-end. This design yields a feature space in which real audio utterances cluster together, while all other audio types (any kind of attack) scatter across the feature space.
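The back-end described above can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's exact configuration: the Wav2Vec-XLSR front-end is stood in for by a random feature tensor, and all layer sizes (`d_model`, number of heads, pooling choices) are assumptions. The LCNN block is represented here by a single convolution with the Max-Feature-Map (MFM) activation that characterizes LCNNs.

```python
import torch
import torch.nn as nn

class MFM(nn.Module):
    """Max-Feature-Map: split channels in half, take the elementwise max."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return torch.max(a, b)

class BackEnd(nn.Module):
    """Hypothetical LCNN + transformer back-end over frame-level embeddings."""
    def __init__(self, feat_dim=1024, d_model=128, n_classes=2):
        super().__init__()
        self.lcnn = nn.Sequential(
            nn.Conv2d(1, 2 * d_model, kernel_size=3, padding=1),
            MFM(),                             # halves channels back to d_model
            nn.AdaptiveAvgPool2d((None, 1)),   # pool over the feature axis
        )
        enc = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                         batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=1)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, feats):                  # feats: (batch, frames, feat_dim)
        x = feats.unsqueeze(1)                 # (batch, 1, frames, feat_dim)
        x = self.lcnn(x)                       # (batch, d_model, frames, 1)
        x = x.squeeze(-1).transpose(1, 2)      # (batch, frames, d_model)
        x = self.transformer(x)
        return self.classifier(x.mean(dim=1))  # utterance-level logits

# Stand-in for Wav2Vec-XLSR output: 2 utterances, 50 frames, 1024-dim features
xlsr_feats = torch.randn(2, 50, 1024)
logits = BackEnd()(xlsr_feats)
print(logits.shape)  # torch.Size([2, 2])
```

In the paper's setting the binary logits would separate real utterances (one tight cluster in the learned feature space) from all spoofed ones.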
Some description of the data (number of data points, any other features that describe the data):
Across the training datasets, there are 26,065 real utterances and 212,035 fake utterances in total.
Keywords:
Audio deepfake detection, self-supervised representation, domain generalization, feature space
Instance Represent:
Real and fake audio utterances across multiple domains.
Dataset Characteristics:
Raw waveforms processed for domain diversity
Subject Area:
Security of audio authentication systems