Feedback loop for prosody prediction in concatenative speech synthesis

Description

Abstract: We propose a method for concatenative speech synthesis that achieves a better match between the logF0 and duration predicted by the prosody module and the waveform generation back-end. The proposed method is based on our previous multilevel parametric F0 model and Toshiba's plural unit selection and fusion synthesizer. The method adds a feedback loop from the back-end into the prosody module so that the prosodic information of the selected units is used to re-estimate the prosody values. The feedback loop defines a frame-level prosody model that consists of the mean and variance of the duration and logF0 of the selected units. The log-likelihood defined by this model is added to the log-likelihood of the prosody model. By maximizing this total log-likelihood, we obtain the prosody values that give the optimum compromise between the distortion introduced by F0 discontinuities and the distortion created by the prosody-adjusting signal processing.

Index Terms: speech synthesis, multilevel, parametric F0, prosody, discrete cosine transform, log-likelihood maximization
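The idea of adding the two log-likelihoods and maximizing their sum can be illustrated with a deliberately simplified sketch. It assumes that, for a single frame, both the prosody model and the selected units describe logF0 (or duration) as an independent one-dimensional Gaussian; the abstract does not state this, and the actual method builds on a multilevel parametric F0 model, so this is not the authors' formulation. Under that assumption the maximizer has a closed form, the precision-weighted mean of the two means. The names `fuse_gaussians` and `reestimate` are hypothetical and chosen only for this illustration.

```python
def fuse_gaussians(mu_model, var_model, mu_units, var_units):
    """Return the x that maximizes
        log N(x; mu_model, var_model) + log N(x; mu_units, var_units).

    Setting the derivative of the summed log-likelihood to zero gives the
    precision-weighted mean of the two Gaussian means.
    Assumption: one scalar value per frame, frames treated independently.
    """
    if var_model <= 0 or var_units <= 0:
        raise ValueError("variances must be positive")
    w_model = 1.0 / var_model   # precision of the prosody-model prediction
    w_units = 1.0 / var_units   # precision of the selected-unit statistics
    return (w_model * mu_model + w_units * mu_units) / (w_model + w_units)


def reestimate(model_frames, unit_frames):
    """Apply the per-frame fusion to two equal-length lists of
    (mean, variance) pairs and return the re-estimated values."""
    if len(model_frames) != len(unit_frames):
        raise ValueError("both sequences must have the same number of frames")
    return [
        fuse_gaussians(m, v, mu, vu)
        for (m, v), (mu, vu) in zip(model_frames, unit_frames)
    ]
```

In this sketch, a small variance among the selected units pulls the re-estimated value toward what the units already provide, which reduces the amount of prosody modification needed, while a large variance leaves the original prediction nearly unchanged.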
