Authors:
Logan Blue, Kevin Warren, Hadi Abdullah, Cassidy Gibson, Luis Vargas, Jessica O’Dell, Kevin Butler, Patrick Traynor
Where published:
USENIX Security Symposium
Dataset names (used for):
- TIMIT (transfer function)
- Own synthetic audio samples generated with the Real-Time-Voice-Cloning (RTVC) tool, an implementation of Tacotron 2
- Evaluated datasets published by Lyrebird
Some description of the approach:
This study uses the TIMIT Acoustic-Phonetic Continuous Speech Corpus to test the deepfake audio detection method it describes.
Some description of the data:
The TIMIT dataset includes recordings of 630 speakers of eight major American English dialects, each speaking ten phonetically rich sentences. This provides a diverse and comprehensive set of speech data for analyzing and testing speech-related technologies.
Keywords:
Spoofed audio detection, linguistic data augmentation, AI models, audio deepfake
Instances Represent:
Audio samples used for detecting spoofed audio.
Dataset Characteristics:
Mixed type (real and synthetic)
Subject Area:
Audio Security, Machine Learning
Associated Tools:
Spoofed audio detection
Feature Type:
Audio files with linguistic annotations
Number of Instances:
344
Number of Features:
Not stated as a count. Features include pitch, pauses, word-initial or word-final consonant stops, audible intake or outtake of breath, and audio quality.