AISHELL-3: A MULTI-SPEAKER MANDARIN TTS CORPUS AND THE BASELINES
Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li
2020
In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese Mandarin speakers.
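To give a sense of how such a corpus is typically consumed, below is a minimal sketch of indexing wav files by speaker for multi-speaker TTS training; the directory layout and path are illustrative assumptions, not the corpus's documented structure.

from collections import defaultdict
from pathlib import Path

def index_corpus(root: str) -> dict[str, list[Path]]:
    """Group wav files by speaker, assuming one subdirectory per speaker."""
    by_speaker: dict[str, list[Path]] = defaultdict(list)
    for wav in Path(root).rglob("*.wav"):
        by_speaker[wav.parent.name].append(wav)  # assumed: parent dir names the speaker
    return dict(by_speaker)

index = index_corpus("AISHELL-3/train/wav")  # hypothetical path
print(len(index), "speakers,", sum(len(v) for v in index.values()), "utterances")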
Controllable Context-aware Conversational Speech Synthesis
Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su
2021
Generation, AI x Human Dialogue, Chinese
This study presents a framework for synthesizing human-like conversational speech by modeling spontaneous behaviors, such as filled pauses and prolongations, and speech entrainment. By predicting and controlling these behaviors, the approach generates realistic, contextually aligned speech, with experiments demonstrating its effectiveness in producing natural-sounding conversations.
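As a rough illustration of the behavior-control idea, the PyTorch sketch below conditions a text encoder on per-token spontaneous-behavior labels; the label set, dimensions, and module names are assumptions, not the paper's implementation.

import torch
import torch.nn as nn

BEHAVIORS = {"none": 0, "filled_pause": 1, "prolongation": 2}  # assumed tag set

class BehaviorConditionedEncoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.phone_emb = nn.Embedding(vocab_size, dim)
        self.behavior_emb = nn.Embedding(len(BEHAVIORS), dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, phones: torch.Tensor, behaviors: torch.Tensor) -> torch.Tensor:
        # Adding behavior embeddings lets a downstream decoder realize (or
        # suppress) spontaneous phenomena at specific token positions.
        x = self.phone_emb(phones) + self.behavior_emb(behaviors)
        out, _ = self.rnn(x)
        return out

enc = BehaviorConditionedEncoder(vocab_size=100)
phones = torch.randint(0, 100, (1, 12))
tags = torch.zeros(1, 12, dtype=torch.long)
tags[0, 5] = BEHAVIORS["filled_pause"]  # force a filled pause mid-utterance
hidden = enc(phones, tags)  # (1, 12, 256) context for an acoustic decoder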
Conversational End-to-End TTS for Voice Agents
Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie
2021
Generation, AI x Human Dialogue, Chinese
Building a high-quality conversational TTS system remains a challenge due to limitations in corpus availability and modeling capability. This study aims at building a conversational TTS system for a voice agent under a sequence-to-sequence modeling framework.
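One common way to inject conversational context into a sequence-to-sequence TTS system is to summarize the preceding dialogue turns into a single conditioning vector; the sketch below is illustrative only and is not the paper's architecture.

import torch
import torch.nn as nn

class DialogueContextEncoder(nn.Module):
    """Pool sentence embeddings of previous turns into one context vector."""
    def __init__(self, sent_dim: int = 384, ctx_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(sent_dim, ctx_dim, batch_first=True)

    def forward(self, turn_embeddings: torch.Tensor) -> torch.Tensor:
        # turn_embeddings: (batch, n_turns, sent_dim), oldest turn first
        _, last_hidden = self.rnn(turn_embeddings)
        return last_hidden.squeeze(0)  # (batch, ctx_dim) conversation summary

ctx_enc = DialogueContextEncoder()
history = torch.randn(1, 3, 384)  # embeddings of three previous turns (assumed)
ctx = ctx_enc(history)            # e.g., concatenated to the TTS encoder states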
SPONTTS: MODELING AND TRANSFERRING SPONTANEOUS STYLE FOR TTS
Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie
2024
The paper introduces SponTTS, a two-stage text-to-speech (TTS) approach that models and transfers spontaneous speaking styles via neural bottleneck features. By capturing prosody and spontaneous phenomena, SponTTS generates natural, expressive spontaneous speech for target speakers, even in zero-shot scenarios where the target speaker has no spontaneous training data.
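To make the two-stage structure concrete, here is a structural sketch in the spirit of the described pipeline (stage 1: text to bottleneck features; stage 2: bottleneck features plus a speaker embedding to mel frames). The module internals and dimensions are placeholders, not the paper's model.

import torch
import torch.nn as nn

class TextToBN(nn.Module):
    """Stage 1: predict speaker-independent bottleneck (BN) features that
    carry prosody and spontaneous phenomena."""
    def __init__(self, vocab: int = 100, bn_dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab, 256)
        self.proj = nn.Linear(256, bn_dim)

    def forward(self, phones: torch.Tensor) -> torch.Tensor:
        return self.proj(self.emb(phones))

class BNToMel(nn.Module):
    """Stage 2: render BN features as mel frames for a target speaker; the
    speaker embedding is what enables zero-shot transfer to unseen voices."""
    def __init__(self, bn_dim: int = 64, spk_dim: int = 128, n_mels: int = 80):
        super().__init__()
        self.proj = nn.Linear(bn_dim + spk_dim, n_mels)

    def forward(self, bn: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        spk = spk_emb.unsqueeze(1).expand(-1, bn.size(1), -1)
        return self.proj(torch.cat([bn, spk], dim=-1))

phones = torch.randint(0, 100, (1, 20))
spk_emb = torch.randn(1, 128)  # e.g., from a speaker-verification model
mel = BNToMel()(TextToBN()(phones), spk_emb)  # (1, 20, 80)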