Considering Temporal Connection between Turns for Conversational Speech Synthesis
Kangdi Mei, Zhaoci Liu, Huipeng Du, Hengyu Li, Yang Ai, Liping Chen, Zhenhua Ling
2024
Most studies in conversational speech synthesis focus only on the synthesis performance of the current speaker's turn and neglect the temporal relationship between the turns of interlocutors. This work therefore considers the temporal connection between turns, which is crucial for the naturalness and coherence of conversations. Specifically, the paper formulates a task in which turns do not overlap and only one history turn is considered.
Controllable Context-aware Conversational Speech Synthesis
Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su
2021
Generation, AI x Human Dialogue, Chinese
This study presents a framework for synthesizing human-like conversational speech by modeling spontaneous behaviors, such as filled pauses and prolongations, and speech entrainment. By predicting and controlling these behaviors, the approach generates realistic, contextually aligned speech, with experiments demonstrating its effectiveness in producing natural-sounding conversations.
Conversational End-to-End TTS for Voice Agents
Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie
2021
Generation, AI x Human Dialogue, Chinese
Building a high-quality conversational TTS system remains challenging because of limited corpora and modeling capability. This study aims to build a conversational TTS system for a voice agent within a sequence-to-sequence modeling framework.
Evaluating Comprehension of Natural and Synthetic Conversational Speech
Mirjam Wester, Oliver Watts, Gustav Eje Henter
2016
AI x Human Dialogue, Discernment, English
In an effort to develop more ecologically relevant evaluation techniques that go beyond isolated sentences, this paper investigates listeners' comprehension of natural and synthetic speech dialogues.