Who Are You (I Really Wanna Know)? Detecting Audio DeepFakes Through Vocal Tract Reconstruction

Authors:

Logan Blue, Kevin Warren, Hadi Abdullah, Cassidy Gibson, Luis Vargas, Jessica O’Dell, Kevin Butler, Patrick Traynor

Where published:

USENIX Security Symposium

Dataset names (used for):

TIMIT (transfer function)
Own synthetic audio samples generated with the Real-Time-Voice-Cloning (RTVC) tool (an implementation of Tacotron 2)
Evaluated datasets published by Lyrebird

Some description of the approach:

This study uses the TIMIT Acoustic-Phonetic Continuous Speech Corpus to test the deepfake audio detection method it describes.

Some description of the data:

The TIMIT dataset includes recordings of 630 speakers of eight major American English dialects, each reading ten phonetically rich sentences. This provides a diverse set of speech data for analyzing and testing speech-related technologies.

Keywords:

Spoofed audio detection, linguistic data augmentation, AI models, audio deepfake

Instances Represent:

Audio samples used for detecting spoofed audio.

Dataset Characteristics:

Mixed type (real and synthetic)

Subject Area:

Audio Security, Machine Learning

Associated Tools:

Spoofed audio detection

Feature Type:

Audio files with linguistic annotations

Number of Instances:

344

Number of Features:

Includes features like pitch, pause, word-initial or word-final consonant stops, audible intake or outtake of breath, and audio quality.

Main Paper Link

Last Accessed: 6/13/2024 (3:07PM)

NSF Award #2346473

Community Infrastructure to Strengthen AI for Audio Deepfake analysis (CISAAD)

College of Engineering and Information Technology
