Authors:
Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Hector Delgado
Abstract:
This dataset targets the detection of synthetic and converted speech injected directly into communication systems, without acoustic propagation.
Data Creation Method:
Bona fide utterances and spoofed utterances, the latter generated using text-to-speech (TTS) and voice conversion (VC) algorithms, are transmitted across telephony and VoIP networks with various coding and transmission effects.
Number of Speakers:
- Training and Development: 30 speakers (20 for training, 10 for development)
- Evaluation: 48 speakers (21 male, 27 female)
Total Size:
- Not specified
Number of Real Samples:
- Not specified
Number of Fake Samples:
- Not specified; spoofed utterances were produced with more than 100 different spoofing algorithms
Description of the Dataset:
- The dataset includes bona fide and spoofed utterances communicated across telephony and VoIP networks with various coding and transmission effects.
Extra Details:
The best performing system achieved a minimum t-DCF of 0.2177 and an EER of 1.32%.
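The EER reported here is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (bona fide trials rejected). The following is a minimal sketch of that calculation, not the official scoring tool: the function name `compute_eer`, the threshold sweep, and the convention that higher scores mean "more likely bona fide" are all assumptions made for illustration.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a countermeasure.

    Assumption: a higher score means the trial looks more bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score observed, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scoring below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest; with a finite
    # number of trials they may not cross exactly, so average the pair.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]

if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

The sketch sweeps only the observed scores as thresholds, which is enough for small score lists. It does not compute the minimum t-DCF, which additionally depends on the speaker verification system's error rates and on cost parameters not given here.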
Data Type:
- PCM files
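As a rough sketch of how PCM sample data might be loaded for analysis: the sample format is not stated here, so the code below assumes 16-bit signed little-endian mono samples. The helper name `decode_pcm16` is hypothetical, and the actual files may use a different bit depth or a container with a header, so check the distribution before relying on this.

```python
import numpy as np

def decode_pcm16(raw_bytes):
    """Decode raw PCM bytes into float32 samples in [-1.0, 1.0).

    Assumption: 16-bit signed little-endian mono audio with no file header.
    """
    samples = np.frombuffer(raw_bytes, dtype="<i2")
    # Scale by 2**15 so the most negative integer maps to exactly -1.0.
    return samples.astype(np.float32) / 32768.0

if __name__ == "__main__":
    # Three synthetic samples, invented purely for illustration.
    raw = np.array([0, 16384, -32768], dtype="<i2").tobytes()
    print(decode_pcm16(raw))
```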
Average Length:
- Not specified
Keywords:
- Speaker verification, Spoofing, Anti-spoofing, Countermeasure, Deepfake detection
When Published:
- 2021
Annotation Process:
Genuine utterances were recorded from speakers in controlled environments. Spoofed utterances were generated using various speech synthesis and voice conversion algorithms.
Usage Scenarios:
Evaluating and improving detection systems for synthetic and converted speech communicated over telephony and VoIP networks.
Miscellaneous Information:
The dataset provides a benchmark for detecting spoofed speech in communication systems with various coding and transmission effects.
Credits:
Datasets Used:
- Text-to-speech (TTS) and voice conversion (VC) algorithms
Speech Synthesis Models Referenced:
- Various speech synthesis and voice conversion algorithms