Spectral Modification for Voice Gender Conversion using Temporal Decomposition

Abstract

In most state-of-the-art voice gender conversion systems, the converted speech still sounds unnatural, mainly because the converted spectra are not smooth enough between frames and the spectral modification is ineffective. In this paper, we present a new method for voice gender conversion using a speech analysis technique called temporal decomposition (TD). TD is used to model spectral evolution effectively. Instead of modifying speech spectra frame by frame, we only need to modify event targets and event functions, and the smoothness of the converted speech is ensured by the shape of the event functions. To overcome the ineffective spectral modification, we use Gaussian mixture model (GMM) parameter sets as the input to TD to model the spectral envelope flexibly, and we develop a new method of modifying the GMM parameters in accordance with formant scaling factors. Fundamental frequencies are transformed using STRAIGHT, a high-quality vocoder. Experimental results show that the proposed method significantly improves the quality of the converted speech.
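
The idea of modifying event targets rather than individual frames can be illustrated with a minimal sketch. It assumes the standard TD model, in which each frame is a weighted sum of event targets, y(n) = sum_k a_k * phi_k(n). Everything else is hypothetical and not taken from the paper: the triangular event functions stand in for the estimated ones, each event target is reduced to a list of GMM mean frequencies in Hz, and a single uniform factor of 1.2 stands in for the formant scaling rule, which the abstract does not specify.

```python
# Hypothetical sketch: TD-style reconstruction after scaling event targets.
# Not the authors' implementation; all names and values are illustrative.

def triangular_event_functions(event_frames, n_frames):
    """Piecewise-linear event functions (an assumption): each one equals 1 at
    its own event frame and falls to 0 at the neighbouring event frames,
    so the functions sum to 1 at every frame."""
    phis = [[0.0] * n_frames for _ in event_frames]
    for n in range(n_frames):
        if n <= event_frames[0]:
            phis[0][n] = 1.0
        elif n >= event_frames[-1]:
            phis[-1][n] = 1.0
        else:
            for k in range(len(event_frames) - 1):
                left, right = event_frames[k], event_frames[k + 1]
                if left <= n < right:
                    w = (n - left) / (right - left)
                    phis[k][n] = 1.0 - w
                    phis[k + 1][n] = w
                    break
    return phis


def scale_targets(targets, factor):
    """Multiply every GMM mean frequency in every event target by one
    scaling factor (a simplification of formant scaling)."""
    return [[freq * factor for freq in target] for target in targets]


def reconstruct(targets, phis):
    """TD synthesis: y(n) = sum_k a_k * phi_k(n) for every frame n."""
    n_frames = len(phis[0])
    dim = len(targets[0])
    return [
        [sum(targets[k][d] * phis[k][n] for k in range(len(targets)))
         for d in range(dim)]
        for n in range(n_frames)
    ]


# Three events over nine frames; each target holds two mean frequencies (Hz).
event_frames = [0, 4, 8]
targets = [[500.0, 1500.0], [700.0, 1200.0], [300.0, 2300.0]]

phis = triangular_event_functions(event_frames, n_frames=9)
converted = reconstruct(scale_targets(targets, 1.2), phis)

for n, frame in enumerate(converted):
    print(n, [round(v, 1) for v in frame])
```

Only the three event targets are modified, yet every frame of the output changes, and the frame-to-frame trajectory stays as smooth as the event functions themselves. This is the property the abstract relies on when it says smoothness is ensured by the shape of the event functions.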

identifier:https://dspace.jaist.ac.jp/dspace/handle/10119/4888

Published in

  • Journal of Signal Processing

    Journal of Signal Processing 11 (4), 333-336, 2007-07

    Research Institute of Signal Processing Japan(信号処理学会)

Cited by (1)
