Feedback loop for prosody prediction in concatenative speech synthesis
Description
Abstract: We propose a method for concatenative speech synthesis that improves the match between the logF0 and duration predicted by the prosody module and the waveform generation back-end. The proposed method is based on our previous multilevel parametric F0 model and Toshiba's plural unit selection and fusion synthesizer. The method adds a feedback loop from the back-end into the prosody module so that the prosodic information of the selected units is used to re-estimate the prosody values. The feedback loop defines a frame-level prosody model consisting of the mean and variance of the duration and logF0 of the selected units. The log-likelihood defined by this model is added to the log-likelihood of the prosody model. By maximizing this total log-likelihood, we obtain the prosody values that give the best compromise between the distortion introduced by F0 discontinuities and the distortion created by the prosody-adjusting signal processing.
Index Terms: speech synthesis, multilevel, parametric F0, prosody, discrete cosine transform, log-likelihood maximization
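The log-likelihood combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that both the prosody model and the selected-unit model are independent per-frame Gaussians over logF0, whereas the paper uses a multilevel parametric (DCT-based) F0 model. Under that assumption, maximizing the summed log-likelihood reduces to a precision-weighted average. The function names `unit_statistics` and `fuse_gaussian_targets` and the `var_floor` parameter are hypothetical.

```python
import numpy as np


def unit_statistics(candidates, var_floor=1e-4):
    """Per-frame mean and variance of the logF0 of the selected units.

    candidates: array-like of shape (n_units, n_frames).
    var_floor is an assumed safeguard against zero variance.
    """
    candidates = np.asarray(candidates, dtype=float)
    mean = candidates.mean(axis=0)
    var = np.maximum(candidates.var(axis=0), var_floor)
    return mean, var


def fuse_gaussian_targets(mu_model, var_model, mu_units, var_units):
    """Maximize the sum of two per-frame Gaussian log-likelihoods.

    For independent Gaussians the maximizer is the precision-weighted
    average of the two means (an assumption of this sketch, not a claim
    about the paper's model).
    """
    mu_model = np.asarray(mu_model, dtype=float)
    mu_units = np.asarray(mu_units, dtype=float)
    w_model = 1.0 / np.asarray(var_model, dtype=float)
    w_units = 1.0 / np.asarray(var_units, dtype=float)
    return (w_model * mu_model + w_units * mu_units) / (w_model + w_units)


if __name__ == "__main__":
    # Hypothetical logF0 values for two selected units over two frames.
    mu_u, var_u = unit_statistics([[5.2, 5.0], [5.4, 5.0]])
    # Hypothetical prosody-model prediction with a wider variance.
    fused = fuse_gaussian_targets([5.0, 5.1], [0.04, 0.04], mu_u, var_u)
    print(fused)
```

In this simplified setting, frames where the selected units agree closely (small variance) pull the re-estimated logF0 toward the units, while frames with scattered units stay closer to the prosody model's prediction.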
Published in
Interspeech 2009, 2067-2070, 2009-09-06
ISCA