H-Voice: A Dataset of Histograms of Original and Fake Voice Recordings

Authors:
Dora M. Ballesteros
Yohanna Rodriguez
Diego Renza

 

Description of the Dataset:
The dataset consists of 6672 histograms (in PNG format) computed from original and fake voice recordings. The fake recordings were produced with the Imitation and Deep Voice methods. The histograms are organized into six directories covering training, validation, and external testing:

  • Training_fake: 2088 histograms from fake voice recordings.
  • Training_original: 2020 histograms from original voice recordings.
  • Validation_fake: 864 histograms from fake voice recordings.
  • Validation_original: 864 histograms from original voice recordings.
  • External_test1: 380 histograms from original and 380 from fake voice recordings (Imitation method).
  • External_test2: 4 histograms from original and 72 from fake voice recordings (Deep Voice method).
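After downloading, the directory layout above can be checked with a short script. This is a minimal sketch, assuming the six directory names match the list exactly and that every histogram file carries a lowercase .png extension; the helper names count_pngs and mismatches are hypothetical and not part of the dataset.

```python
from pathlib import Path

# Expected number of histograms per directory, taken from the list above.
EXPECTED = {
    "Training_fake": 2088,
    "Training_original": 2020,
    "Validation_fake": 864,
    "Validation_original": 864,
    "External_test1": 760,  # 380 original + 380 fake
    "External_test2": 76,   # 4 original + 72 fake
}


def count_pngs(root):
    """Return {directory name: number of .png files found beneath it}."""
    root = Path(root)
    counts = {}
    for name in EXPECTED:
        folder = root / name
        # A missing directory is reported as zero files rather than an error.
        counts[name] = sum(1 for _ in folder.rglob("*.png")) if folder.is_dir() else 0
    return counts


def mismatches(root):
    """Return {directory name: (found, expected)} for every count that differs."""
    found = count_pngs(root)
    return {n: (found[n], EXPECTED[n]) for n in EXPECTED if found[n] != EXPECTED[n]}
```

The expected counts sum to 6672, the stated size of the dataset, so an empty result from mismatches("path/to/H-Voice") would suggest a complete download.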

 

Data Creation Method:
The dataset was created by computing a histogram for each voice recording, original or fake. The fake recordings were generated with two methods: the Imitation method and the Deep Voice method. Each recording was re-quantized to 16 bits, and a histogram with 65,536 bins (one per possible 16-bit sample value) was generated from it.
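The re-quantization and histogram steps can be illustrated roughly as follows. This is a sketch under stated assumptions, not the authors' code: it assumes the input samples are floats in the range [-1.0, 1.0], uses a simple scale-and-round mapping to signed 16-bit values, and leaves out the rendering of the histogram to a PNG image, which the description does not detail. The function names are hypothetical.

```python
def requantize_16bit(samples):
    """Map float samples (assumed to lie in [-1.0, 1.0]) to signed 16-bit integers."""
    pcm = []
    for x in samples:
        x = max(-1.0, min(1.0, x))       # clip out-of-range samples
        pcm.append(int(round(x * 32767)))
    return pcm


def histogram_65536(pcm16):
    """Count how often each of the 2**16 possible 16-bit sample values occurs."""
    hist = [0] * 65536
    for s in pcm16:
        # Shift from [-32768, 32767] to [0, 65535] so the value indexes its bin.
        hist[s + 32768] += 1
    return hist
```

With 16-bit samples there are 2^16 = 65,536 distinct values, which is why the bin count equals 65,536 and each bin corresponds to exactly one amplitude level.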

 

Number of Speakers:

  • Not specified (voice recordings obtained from the Imitation and Deep Voice methods).

Total Size:

  • 6672 histograms

Number of Real Samples:

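  • A total of 3268 histograms from original voice recordings (2020 training, 864 validation, 380 in External_test1, 4 in External_test2).

Number of Fake Samples:

  • A total of 3404 histograms from fake voice recordings (2088 training, 864 validation, 380 in External_test1, 72 in External_test2), obtained via the Imitation and Deep Voice methods.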

 

Extra Details:
The training, validation, and External_test1 directories are approximately balanced between original and fake voice recordings; External_test2 is heavily skewed toward fake recordings (4 original vs. 72 fake). The dataset can be used to train, cross-validate, and test machine learning models for fake voice detection, particularly convolutional neural networks (CNNs), framed as a binary classification task between original and fake voice recordings.
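For the binary classification setup, labels can be derived from the directory names. The following is a minimal sketch, assuming the training and validation directories are named as in the description and contain .png files directly; the label convention (0 = original, 1 = fake) and the helper name labeled_files are assumptions made for illustration, not something the dataset prescribes.

```python
from pathlib import Path

# Assumed label convention (not specified by the dataset): 0 = original, 1 = fake.
LABELS = {"original": 0, "fake": 1}


def labeled_files(root, split):
    """Return (path, label) pairs for split "Training" or "Validation"."""
    pairs = []
    for kind, label in LABELS.items():
        folder = Path(root) / f"{split}_{kind}"
        if folder.is_dir():
            # Sort for a reproducible order before any shuffling by the caller.
            pairs += [(p, label) for p in sorted(folder.glob("*.png"))]
    return pairs
```

The resulting list can be passed to whatever image-loading pipeline a given CNN framework provides. The external test directories mix both classes, so they would need a separate labeling rule that is not covered here.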

 

Data Type:

  • Histograms in PNG format.

Average Length:

  • Not applicable (histograms are static images).

Keywords:

  • Fake voice, Machine learning, Convolutional neural networks, Binary classification, Imitation, Deep Voice, H-Voice

When Published:

  • 31st January 2020

 

Annotation Process:
Histograms were generated from original and fake voice recordings. The fake voice recordings were created using the Imitation and Deep Voice methods, and the histograms were then categorized into training, validation, and test sets.

 

Usage Scenarios:

  • Training machine learning models for fake voice detection.
  • Performance comparison of different fake voice classification models.
  • Research on anti-spoofing techniques in speaker verification systems.

Dataset Link


Main Paper Link


License Link


Last Accessed: 6/24/2024

NSF Award #2346473