CISAAD

This project conducts exploratory development towards a prototype Community Infrastructure to Strengthen AI for Audio Deepfake analysis (CISAAD), particularly for English-language audio, by increasing access to and availability of datasets and support for audio deepfake analysis. Deepfakes, or AI-generated content, are widely recognized as a major societal concern and challenge. This project 1) addresses the challenges of limited data availability and human-augmented data through open datasets shared by the community, 2) enables both single- and multi-speaker deepfake analysis across various use cases, and 3) addresses ethical, social, and political challenges associated with deploying deepfake technology developed from open-sourced community data.

CISAAD advances cybersecurity research and information trustworthiness by focusing on audio deepfakes, an under-explored type of AI-generated content, and employs a new transdisciplinary approach to strengthening AI models by incorporating human knowledge. CISAAD strengthens AI for English-language audio analysis in both generative applications, such as voice reconstruction, and discriminative applications, such as audio deepfake detection. The project advances the current state of the art in audio deepfake analysis by enabling unique and compelling research opportunities otherwise inaccessible to the CISE research community, such as human knowledge-augmented deepfake models, auto-annotation of linguistic features, and multi-speaker deepfake models, in addition to single-speaker use cases.

The work will inform our understanding of mis- and disinformation as a major societal concern and challenge, and will also offer opportunities for content generation in positive applications such as voice restoration and smart and connected community research. With an interdisciplinary team spanning AI, linguistics, cyberinfrastructure, and human-centered computing, the project develops an innovative infrastructure for expanding research informed by types of audio deepfakes. Together, our research and dissemination efforts expand formal and informal learning in AI and STEM fields related to cybersecurity analytics at the intersection of technology, language, behavior, and society. The principles developed through this project will expand to multiple types of deepfakes and support media and communications experts working to address challenges related to information integrity.

CISAAD is a prototype community resource; it includes a deepfake data catalog and repository for English audio data, tools and models for deepfake audio analysis use cases, and educational materials.