Towards Generalisable and Calibrated Audio Deepfake Detection with Self-supervised Representations

Authors:

Pascu, Octavian and Stan, Adriana and Oneata, Dan and Oneata, Elisabeta and Cucu, Horia

Where published:

INTERSPEECH


Dataset names (used for):

  • ASVspoof 2019
  • In-the-Wild
  • FoR
  • MLAAD
  • TIM


Some description of the approach:

To achieve generalisation in audio deepfake detection, this study reuses a representation-learning approach previously employed for generalisable image deepfake detection [7]. As its calibration technique, it proposes a “direct method of estimating the uncertainty from the output probabilities of the detector, by computing the entropy over the outputs” (a sketch of this computation follows). At the time of writing, the study represents the state of the art in both generalisation and calibration performance.
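
A minimal sketch of the quoted entropy computation (in Python; the function and variable names, and the example probabilities, are ours for illustration):

import numpy as np

def predictive_entropy(probs):
    # Entropy (in nats) of each output distribution; higher means more uncertain.
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=-1)

# Hypothetical detector outputs for two clips: P(spoof) = 0.97 and 0.55.
p_spoof = np.array([0.97, 0.55])
probs = np.stack([1.0 - p_spoof, p_spoof], axis=-1)
print(predictive_entropy(probs))  # approx. [0.135, 0.688]; the second clip is far more uncertain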


Some description of the data (number of data points, any other features that describe the data):

The paper focuses on data that includes both full utterances and partially fake utterances, covering diverse domains, languages, and spoofing systems. Representations are extracted from self-supervised models (e.g., wav2vec2, XLS-R), as sketched below.
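
As a rough illustration of extracting such self-supervised representations, the sketch below mean-pools frozen features from a Hugging Face XLS-R checkpoint and places a linear classifier on top. The checkpoint, layer choice, pooling, and classifier head are our assumptions for illustration, not necessarily the authors' exact configuration:

import numpy as np
import torch
from transformers import AutoFeatureExtractor, AutoModel

# XLS-R checkpoint chosen for illustration; wav2vec2 checkpoints work the same way.
ckpt = "facebook/wav2vec2-xls-r-300m"
extractor = AutoFeatureExtractor.from_pretrained(ckpt)
ssl_model = AutoModel.from_pretrained(ckpt).eval()  # frozen backbone

def embed(waveform, sr=16000):
    # Mean-pool the final-layer hidden states for one audio clip.
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = ssl_model(**inputs).last_hidden_state  # shape (1, T, D)
    return hidden.mean(dim=1)                           # shape (1, D)

# Linear classifier over the frozen features (training loop omitted).
probe = torch.nn.Linear(ssl_model.config.hidden_size, 2)
clip = np.random.randn(16000).astype("float32")  # 1 s of dummy audio at 16 kHz
logits = probe(embed(clip))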


Keywords:

Deepfake detection, anti-spoofing, pretrained representations

Instances Represent:

Audio clips (real or synthesized)

Dataset Characteristics:

N/A

Subject Area:

Security of audio authentication systems

Dataset Link:


Main Paper Link:


License Link:

Last Accessed: 11/26/2024

NSF Award #2346473