Auto Annotation of Linguistic Features for Audio Deepfake Discernment

Authors:
Kifekachukwu Nwosu, Chloe Evered, Zahra Khanjani, Noshaba Bhalli, Lavon Davis, Christine Mallinson, Vandana P. Janeja

Where published:
AAAI Fall Symposium Series (FSS-23). Author affiliations: University of Maryland, Baltimore County; Rochester Institute of Technology; Georgetown University.

Dataset names (used for):

  • A set of 50 audio clips: 20 Text-to-Speech, 20 Voice Conversion, and 10 genuine, used to test the auto-annotation methodology.

Some description of the approach:
The study detects audio deepfakes through linguistic analysis: audio samples are annotated for specific linguistic features (pitch, pause, breath, consonant bursts, and audio quality), and an auto-annotation methodology based on time series discords is tested against expert annotations (see the sketch below).
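
To make the discord idea concrete, here is a minimal Python sketch that flags anomalous regions in a clip's pitch track using the matrix profile (via the stumpy library). The pitch extractor (librosa's YIN), the window length, the file name, and all parameter values are illustrative assumptions, not the paper's actual pipeline.

    # Minimal sketch of time-series discord detection on a pitch contour.
    # Assumptions (not from the paper): librosa's YIN for pitch, the
    # stumpy matrix profile for discords, and all parameter values.
    import librosa
    import numpy as np
    import stumpy

    def find_pitch_discords(wav_path, window_sec=0.5, top_k=3):
        # Load audio and estimate a frame-level pitch (F0) contour.
        y, sr = librosa.load(wav_path, sr=16000)
        f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)  # one value per frame
        f0 = np.nan_to_num(f0, nan=0.0)

        # Matrix profile: distance from each window of the contour to its
        # nearest neighbor elsewhere in the clip; large values = discords.
        hop = 512                               # librosa's default hop length
        m = max(4, int(window_sec * sr / hop))  # window length in frames
        mp = stumpy.stump(f0.astype(np.float64), m)
        profile = mp[:, 0].astype(np.float64)

        # Return the top_k most anomalous windows (possibly overlapping)
        # as (start, end) times in seconds.
        order = np.argsort(profile)[::-1][:top_k]
        hop_sec = hop / sr
        return [(i * hop_sec, (i + m) * hop_sec) for i in order]

    if __name__ == "__main__":
        for start, end in find_pitch_discords("clip.wav"):  # hypothetical file
            print(f"candidate pitch anomaly: {start:.2f}s to {end:.2f}s")

The same search could be repeated on other feature tracks (e.g., an energy contour for pauses and breaths), with the flagged spans then compared against expert annotations.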

Some description of the data:
The annotated corpus contains 344 audio samples labeled for five main linguistic features: pitch, pause, breath, consonant bursts, and audio quality. A test set of 50 clips (20 Text-to-Speech, 20 Voice Conversion, and 10 genuine) was used to evaluate the auto-annotation methodology. A hypothetical record layout is sketched below.
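
For illustration only, one annotated clip might be represented as follows; this is a hypothetical schema inferred from the description above, not the dataset's actual file format or field names.

    # Hypothetical record layout for one annotated clip. Field names and
    # values are illustrative only, not the dataset's actual schema.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    FEATURES = ["pitch", "pause", "breath", "consonant bursts", "audio quality"]

    @dataclass
    class AnnotatedClip:
        clip_id: str
        source: str  # "text-to-speech", "voice conversion", or "genuine"
        # Annotated spans: (feature name, start seconds, end seconds).
        annotations: List[Tuple[str, float, float]] = field(default_factory=list)

    # Example record with made-up values.
    clip = AnnotatedClip(
        clip_id="clip_001",
        source="text-to-speech",
        annotations=[("pitch", 1.2, 1.9), ("breath", 3.4, 3.6)],
    )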

Keywords:
Audio Deepfake Detection, Linguistic Features, Time Series Discords, Expert Annotations

Instance Represent:
Audio samples analyzed for linguistic features

Dataset Characteristics:
Includes 344 audio samples annotated for five linguistic features: pitch, pause, breath, consonant bursts, and audio quality.

Subject Area:
Audio security, linguistics, deepfake detection

Associated Tools:
Time series discord detection applied to audio feature tracks for auto-annotation of linguistic features

Feature Type:
Audio and linguistic features

Main Paper Link:


License: The paper is accessible through the conference series’ platform; no additional license information is provided.


Last Accessed: 6/14/2024

NSF Award #2346473