Who Are You (I Really Wanna Know)? Detecting Audio DeepFakes Through Vocal Tract Reconstruction

Authors:

Logan Blue, Kevin Warren, Hadi Abdullah, Cassidy Gibson, Luis Vargas, Jessica O’Dell, Kevin Butler, Patrick Traynor

Where published:

USENIX Security Symposium

 

Dataset names (used for):

  • TIMIT (transfer function) 
  • Own synthetic audio samples generated with the Real-Time-Voice-Cloning (RTVC) tool, an implementation of Tacotron 2
  • Datasets published by Lyrebird (evaluation)

 

Some description of the approach:

The study uses the TIMIT Acoustic-Phonetic Continuous Speech Corpus to test the deepfake audio detection method it describes.

 

Some description of the data:

The TIMIT dataset includes recordings of 630 speakers of eight major American English dialects, each speaking ten phonetically rich sentences. This provides a diverse set of speech data for analyzing and testing speech-related technologies.
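
As a quick sanity check on the corpus size described above, the following minimal Python sketch computes the total number of utterances implied by those figures. The constant and function names are illustrative assumptions and are not part of TIMIT or the paper.

```python
# Figures taken from the TIMIT description above.
# All names here are hypothetical and used only for illustration.
NUM_SPEAKERS = 630
SENTENCES_PER_SPEAKER = 10


def total_utterances(speakers: int = NUM_SPEAKERS,
                     sentences_per_speaker: int = SENTENCES_PER_SPEAKER) -> int:
    """Return the number of recorded utterances implied by the counts."""
    return speakers * sentences_per_speaker


if __name__ == "__main__":
    print(total_utterances())
```

With the stated counts, this gives 630 × 10 = 6300 utterances.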

 

Keywords:

Spoofed audio detection, linguistic data augmentation, AI models, audio deepfake

Instances Represent:

Audio samples used for detecting spoofed audio.

Dataset Characteristics:

Mixed type (real and synthetic)

Subject Area:

Audio Security, Machine Learning

Associated Tools:

Spoofed audio detection

Feature Type:

Audio files with linguistic annotations

Number of Instances:

344

Number of Features:

Includes features like pitch, pause, word-initial or word-final consonant stops, audible intake or outtake of breath, and audio quality.

Main Paper Link


License: ©USENIX 2024


Last Accessed: 6/13/2024 (3:07PM)

NSF Award #2346473