AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks

Authors:

Jee-weon Jung and Hee-Soo Heo and Hemlata Tak and Hye-jin Shim and Joon Son Chung and Bong-Jin Lee and Ha-jin Yu and Nicholas W. D. Evans

Where published:

ICASSP 2022

 

Dataset names (used for):

  • ASVspoof 2019 Logical Access (LA) Dataset

 

Some description of the approach:

This study uses a RawNet2-based encoder to extract audio representations directly from raw waveforms. In contrast to RawNet2, the output of the sinc-convolution layer is treated as a two-dimensional, single-channel image (akin to a spectrogram) rather than a one-dimensional sequence. A stack of residual blocks with pre-activation then extracts high-level representations, which are fed to a graph module built on graph attention networks. The authors introduce a heterogeneous stacking graph attention layer (HS-GAL) that combines a heterogeneous attention mechanism with a stack node to capture spoofing artifacts spanning both the temporal and spectral domains.
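The encoder's key departure from RawNet2 can be illustrated in a minimal NumPy sketch: a bank of band-pass sinc filters is applied to the raw waveform, and the stacked filter responses are kept as a single-channel 2-D map (filters × time) instead of a 1-D sequence. The filter count, kernel length, and band edges below are illustrative assumptions, not the paper's hyperparameters, and `sinc_bandpass`/`sinc_encoder_2d` are hypothetical helper names.

```python
import numpy as np

def sinc_bandpass(f_lo, f_hi, kernel_len, fs):
    """Band-pass FIR kernel built as the difference of two low-pass sinc
    kernels, windowed with a Hamming window to reduce ripple."""
    t = np.arange(kernel_len) - (kernel_len - 1) / 2
    lowpass = lambda fc: 2 * fc / fs * np.sinc(2 * fc / fs * t)
    return (lowpass(f_hi) - lowpass(f_lo)) * np.hamming(kernel_len)

def sinc_encoder_2d(wave, fs=16000, n_filters=20, kernel_len=129):
    """Convolve the waveform with a bank of band-pass sinc filters and
    return the responses as a single-channel 2-D map (1, n_filters, T),
    i.e. a spectrogram-like image rather than a 1-D sequence."""
    # Linearly spaced band edges (illustrative; learnable in the real model).
    edges = np.linspace(30, fs / 2 - 100, n_filters + 1)
    rows = [np.convolve(wave, sinc_bandpass(lo, hi, kernel_len, fs), mode="valid")
            for lo, hi in zip(edges[:-1], edges[1:])]
    feat = np.abs(np.stack(rows))   # magnitude map, shape (n_filters, T)
    return feat[np.newaxis]         # add a channel axis -> (1, n_filters, T)

wave = np.random.randn(16000)       # 1 s of noise as a stand-in utterance
feat = sinc_encoder_2d(wave)
print(feat.shape)                   # (1, 20, 15872)
```

Keeping the channel axis explicit is what lets the subsequent pre-activation residual blocks operate on the map with ordinary 2-D convolutions, as they would on a spectrogram.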

 

Some description of the data (number of data points, any other features that describe the data):

Bona-fide and spoofed speech waveforms (16 kHz audio). Spoofed utterances are generated by text-to-speech and voice-conversion algorithms: attacks A01–A06 appear in the training and development partitions, while thirteen unseen attack types (A07–A19) are reserved for evaluation.

 

Keywords:

Audio spoofing detection, anti-spoofing, graph attention networks, end-to-end, heterogeneous.

Instance Represent:

Bona-fide (genuine) and spoofed speech utterances.

Dataset Characteristics:

Contains raw waveform audio data with per-utterance bona-fide/spoof annotations.

Subject Area:

Security of audio authentication systems

Dataset Link


Main Paper Link


License Link


Last Accessed: 11/26/2024

NSF Award #2346473