AI x Human Dialogue

Considering Temporal Connection between Turns for Conversational Speech Synthesis


Kangdi Mei, Zhaoci Liu, Huipeng Du, Hengyu Li, Yang Ai, Liping Chen, Zhenhua Ling

2024

AI x Human Dialogue, English


Most studies in conversational speech synthesis focus only on the synthesis performance of the current speaker's turn and neglect the temporal relationship between the interlocutors' turns. This work therefore considers the temporal connection between turns, which is crucial for the naturalness and coherence of conversations. Specifically, the paper formulates a task in which turns do not overlap and only one history turn is considered.

Controllable Context-aware Conversational Speech Synthesis


Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

2021

Generation, AI x Human Dialogue, Chinese


This study presents a framework for synthesizing human-like conversational speech by modeling spontaneous behaviors (such as filled pauses and prolongations) and speech entrainment. By predicting and controlling these behaviors, the approach generates realistic, contextually aligned speech, and experiments demonstrate its effectiveness in producing natural-sounding conversations.

Conversational End-to-End TTS for Voice Agents


Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie

2021

Generation, AI x Human Dialogue, Chinese


Building a high-quality conversational TTS system remains challenging because of limited corpora and modeling capability. This study aims to build a conversational TTS for a voice agent within a sequence-to-sequence modeling framework.

Evaluating Comprehension of Natural and Synthetic Conversational Speech


Mirjam Wester, Oliver Watts, Gustav Eje Henter

2016

AI x Human Dialogue, Discernment, English


In an effort to develop more ecologically relevant evaluation techniques that go beyond isolated sentences, this paper investigates comprehension of natural and synthetic speech dialogues.

NSF Award #2346473