Learning to Listen and Listening to Learn: Spoofed Audio Detection Through Linguistic Data Augmentation

Authors:
Zahra Khanjani, Lavon Davis, Anna Tuz, Kifekachukwu Nwosu, Christine Mallinson, Vandana P. Janeja

Where published:
2023 IEEE International Conference on Intelligence and Security Informatics (ISI) by the Institute of Electrical and Electronics Engineers (IEEE)


Dataset names (used for):

  • WaveFake, FoR, ASVspoof 2015, ASVspoof 2019, ASVspoof 2021 (all for spoofed audio detection)
  • ASSEM-VC (for voice conversion)
  • LJspeech


Some description of the approach:
The study constructs a hybrid dataset that combines several types of spoofed audio (replay attacks, Text-to-Speech, Voice Conversion, and mimicry) with genuine samples. The dataset is intended to support the development and evaluation of spoofed audio detection techniques.


Some description of the data:
The spoofed portion of the dataset comprises 25% replay attack, 30% Text-to-Speech, 30% Voice Conversion, and 15% mimicry samples, alongside genuine samples drawn from multiple sources. Average sample duration varies by type.
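The stated proportions can be turned into concrete per-category counts for any target dataset size. The sketch below is illustrative only: the total of 10,000 spoofed samples is an assumed placeholder, not a figure reported in the paper.

```python
# Hypothetical sketch: allocating spoofed samples across attack types
# using the proportions stated above. The total size is an assumption.
SPOOF_MIX = {
    "replay_attack": 0.25,
    "text_to_speech": 0.30,
    "voice_conversion": 0.30,
    "mimicry": 0.15,
}

def category_counts(total_spoofed: int) -> dict:
    """Split a target number of spoofed samples by the stated ratios."""
    return {name: round(total_spoofed * share) for name, share in SPOOF_MIX.items()}

print(category_counts(10_000))
# → {'replay_attack': 2500, 'text_to_speech': 3000, 'voice_conversion': 3000, 'mimicry': 1500}
```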


Keywords:
Audio deepfake, spoofed audio detection, artificial intelligence, linguistics, sociolinguistics, linguistic perception

Instance Represent:
Audio samples analyzed for authenticity based on linguistic features

Dataset Characteristics:
Contains both genuine and spoofed audio samples from multiple sources, annotated with linguistic features such as pitch, pause, and voice quality to support authenticity judgments.
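One of the linguistic features mentioned above, pause, can be located automatically before human annotation. The following is a minimal sketch of one common approach (a frame-energy threshold), not the authors' implementation; the frame length, threshold, and sample rate are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumption, not the paper's method): mark pause regions
# in an audio signal wherever per-frame RMS energy falls below a threshold.
def find_pauses(signal: np.ndarray, sr: int = 16_000,
                frame_len: int = 400, threshold: float = 0.01) -> list:
    """Return (start_sec, end_sec) spans whose RMS energy is below threshold."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    quiet = rms < threshold
    pauses, start = [], None
    for i, q in enumerate(quiet):
        if q and start is None:
            start = i                      # entering a quiet region
        elif not q and start is not None:
            pauses.append((start * frame_len / sr, i * frame_len / sr))
            start = None                   # leaving a quiet region
    if start is not None:                  # signal ends inside a pause
        pauses.append((start * frame_len / sr, n_frames * frame_len / sr))
    return pauses

# Synthetic check: 1 s tone, 0.5 s silence, 1 s tone at 16 kHz
t = np.linspace(0, 1, 16_000, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
audio = np.concatenate([tone, np.zeros(8_000), tone])
print(find_pauses(audio))
# → [(1.0, 1.5)]
```

Pause counts and durations extracted this way can then serve as one input feature alongside pitch and quality annotations.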

Subject Area:
Audio security, linguistics, AI in speech recognition

Associated Tools:
Spoofed audio detection

Feature Type:
Audio features with linguistic annotations

Main Paper Link


Code Link


License: Not specified


Last Accessed: 6/14/2024

NSF Award #2346473