Feedback loop for prosody prediction in concatenative speech synthesis

Sergio Gracia, Javier Latorre, Masami Akamine

doi:10.21437/interspeech.2009-593

Abstract

We propose a method for concatenative speech synthesis that improves the match between the logF0 and duration predicted by the prosody module and the waveform generation back-end. The proposed method is based on our previous multilevel parametric F0 model and Toshiba’s plural unit selection and fusion synthesizer. The method adds a feedback loop from the back-end into the prosody module so that the prosodic information of the selected units is used to re-estimate the prosody values. The feedback loop defines a frame-level prosody model which consists of the average value and variance of the duration and logF0 of the selected units. The log-likelihood defined by this model is added to the log-likelihood of the prosody model. By maximizing this total log-likelihood, we obtain the prosody values that produce the optimum compromise between the distortion introduced by F0 discontinuities and the one created by the prosody-adjusting signal processing.

Index Terms: speech synthesis, multilevel, parametric F0, prosody, discrete cosine transform, log-likelihood maximization
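To make the idea of maximizing a summed log-likelihood concrete, the following is a minimal sketch, not the authors' implementation. It assumes, purely for illustration, that both the prosody model and the feedback model reduce to independent one-dimensional Gaussians per frame; the paper's prosody model is a multilevel parametric model, which this sketch does not reproduce. The function names and the example numbers are hypothetical. Under this assumption, the value that maximizes the total log-likelihood is the precision-weighted average of the two means.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def total_log_likelihood(x, mu_model, var_model, mu_units, var_units):
    """Sum of the prosody-model term and the feedback (selected-units) term.

    Assumption: both terms are independent 1-D Gaussians for one frame.
    """
    return (gaussian_log_likelihood(x, mu_model, var_model)
            + gaussian_log_likelihood(x, mu_units, var_units))


def reestimate_frame(mu_model, var_model, mu_units, var_units):
    """Closed-form maximizer of total_log_likelihood over x.

    Setting the derivative to zero gives a precision-weighted average:
    a low-variance term pulls the result more strongly toward its mean.
    """
    w_model = 1.0 / var_model
    w_units = 1.0 / var_units
    return (w_model * mu_model + w_units * mu_units) / (w_model + w_units)


if __name__ == "__main__":
    # Hypothetical logF0 values for a single frame.
    predicted = reestimate_frame(mu_model=5.0, var_model=1.0,
                                 mu_units=6.0, var_units=0.25)
    print(predicted)
```

In this simplified picture, a small variance among the selected units means the units agree on a value, so the re-estimated prosody moves toward them and less signal-processing adjustment is needed; a large variance leaves the prosody model's prediction mostly unchanged.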
