Who Are You (I Really Wanna Know)? Detecting Audio DeepFakes Through Vocal Tract Reconstruction


Logan Blue, Kevin Warren, Hadi Abdullah, Cassidy Gibson, Luis Vargas, Jessica O’Dell, Kevin Butler, Patrick Traynor

Where published:

USENIX Security Symposium


Dataset names (used for):

  • TIMIT (transfer function) 
  • Own synthetic audio samples generated with the Real-Time-Voice-Cloning (RTVC) tool (an implementation of Tacotron 2)
  • Evaluated datasets published by Lyrebird


Some description of the approach:

This study uses the TIMIT Acoustic-Phonetic Continuous Speech Corpus to test the deepfake audio detection method it describes.


Some description of the data:

The TIMIT dataset includes recordings of 630 speakers of eight major American English dialects, each speaking ten phonetically rich sentences. This provides a diverse and comprehensive set of speech data for analyzing and testing speech-related technologies.



Keywords:

Spoofed audio detection, linguistic data augmentation, AI models, audio deepfake

Instances Represent:

Audio samples used for detecting spoofed audio.

Dataset Characteristics:

Mixed type (real and synthetic)

Subject Area:

Audio Security, Machine Learning

Associated Tools:

Spoofed audio detection

Feature Type:

Audio files with linguistic annotations

Number of Instances:


Number of Features:

Includes features like pitch, pause, word-initial or word-final consonant stops, audible intake or outtake of breath, and audio quality.

Main Paper Link

License: ©USENIX 2024

Last Accessed: 6/13/2024 (3:07PM)

NSF Award #2346473