Authors:
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-jin Yu, Nicholas W. D. Evans
Where published:
ICASSP
Dataset names (used for):
- ASVspoof 2019 Logical Access (LA) Dataset
Some description of the approach:
This study uses a RawNet2-based encoder to extract audio representations from the raw waveform. Unlike RawNet2, the model treats the output of the sinc-convolution layer as a 2-dimensional, single-channel image (similar to a spectrogram) rather than a 1-dimensional sequence. A series of residual blocks with pre-activation then extracts a high-level representation, which is passed to a graph module built on graph attention networks. The authors introduce a heterogeneous stacking graph attention layer, which combines a heterogeneous attention mechanism with a stack node to capture spoofing artifacts that span the temporal and spectral domains.
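The idea of heterogeneous attention with a stack node can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `hetero_stack_attention`, the attention vectors `w_tt`, `w_ss`, `w_ts`, and `w_stack`, and the element-wise-product scoring are all hypothetical choices made for this example. A trained layer would typically also include learned projections and nonlinearities, which are omitted here.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Combine vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def hetero_stack_attention(temporal, spectral, stack, w_tt, w_ss, w_ts, w_stack):
    """One simplified heterogeneous attention pass with a stack node.

    temporal, spectral: lists of node feature vectors of the same dimension.
    stack: a single feature vector that summarizes the whole graph.
    w_tt, w_ss, w_ts: hypothetical attention vectors for temporal-temporal,
        spectral-spectral, and cross-domain node pairs.
    w_stack: hypothetical attention vector used by the stack node.
    """
    nodes = temporal + spectral
    n_t = len(temporal)

    def pair_weight(i, j):
        # "Heterogeneous" here means the scoring vector depends on
        # which domains the two nodes come from.
        if i < n_t and j < n_t:
            return w_tt
        if i >= n_t and j >= n_t:
            return w_ss
        return w_ts

    updated = []
    for i, h_i in enumerate(nodes):
        scores = []
        for j, h_j in enumerate(nodes):
            prod = [a * b for a, b in zip(h_i, h_j)]  # assumed pairwise feature
            scores.append(dot(pair_weight(i, j), prod))
        updated.append(weighted_sum(softmax(scores), nodes))

    # The stack node attends over every node in both domains.
    stack_scores = [dot(w_stack, [a * b for a, b in zip(stack, h)]) for h in nodes]
    new_stack = weighted_sum(softmax(stack_scores), nodes)
    return updated[:n_t], updated[n_t:], new_stack


# Usage with two temporal nodes and two spectral nodes (toy values).
temporal_nodes = [[1.0, 0.0], [3.0, 0.0]]
spectral_nodes = [[0.0, 2.0], [0.0, 6.0]]
zeros = [0.0, 0.0]
new_t, new_s, new_stack = hetero_stack_attention(
    temporal_nodes, spectral_nodes, [0.0, 0.0], zeros, zeros, zeros, zeros
)
```

With all attention vectors set to zero, every score is equal, so each updated node and the stack node reduce to the mean of all four input nodes, `[1.0, 2.0]`. Non-zero attention vectors shift that average toward the nodes that score highest for each pair type.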
Some description of the data (number of data points, any other features that describe the data):
Raw audio waveforms. Spoofed utterances are categorized by the attack algorithm that generated them, with attacks A07–A19 used for evaluation.
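As a rough illustration of how such instances might be represented, the sketch below models an utterance record and tallies utterances per attack type. The `Utterance` class, its field names, and the example identifiers are hypothetical and are not taken from the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Attack identifiers A07 through A19, as named in the description above.
EVAL_ATTACKS = [f"A{i:02d}" for i in range(7, 20)]


@dataclass
class Utterance:
    utt_id: str            # hypothetical identifier
    label: str             # "bonafide" or "spoof"
    attack: Optional[str]  # e.g. "A07"; None for bona fide speech


def count_by_attack(utterances: List[Utterance]) -> Dict[str, int]:
    """Count utterances per attack type, grouping genuine speech as 'bonafide'."""
    counts: Dict[str, int] = {}
    for u in utterances:
        key = u.attack if u.label == "spoof" and u.attack else "bonafide"
        counts[key] = counts.get(key, 0) + 1
    return counts


# Toy records for illustration only.
examples = [
    Utterance("utt_0001", "bonafide", None),
    Utterance("utt_0002", "spoof", "A07"),
    Utterance("utt_0003", "spoof", "A07"),
    Utterance("utt_0004", "spoof", "A19"),
]
per_attack = count_by_attack(examples)
```

Grouping by attack type in this way makes it straightforward to report detection performance separately for each spoofing algorithm.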
Keywords:
Audio spoofing detection, anti-spoofing, graph attention networks, end-to-end, heterogeneous.
Instances Represent:
Bona-fide (genuine) and spoofed speech utterances.
Dataset Characteristics:
Contains raw waveform audio, with each utterance annotated as bona fide or spoofed and spoofed utterances labeled by attack algorithm.
Subject Area:
Security of voice authentication (automatic speaker verification) systems.