Spectral Modification for Voice Gender Conversion using Temporal Decomposition

Nguyen, Binh Phu; Akagi, Masato

Description

In most state-of-the-art voice gender conversion systems, the converted speech still sounds unnatural. This is mainly attributed to insufficient smoothness of the converted spectra between frames and to ineffective spectral modification. In this paper, we present a new method for voice gender conversion using a speech analysis technique called temporal decomposition (TD), which models spectral evolution effectively. Instead of modifying speech spectra frame by frame, we only need to modify event targets and event functions, and the smoothness of the converted speech is ensured by the shape of the event functions. To overcome ineffective spectral modification, we explore Gaussian mixture model (GMM) parameter sets as the input to TD to flexibly model the spectral envelope, and we develop a new method of modifying GMM parameters in accordance with formant scaling factors. For transforming fundamental frequencies, our system is based on STRAIGHT, a very high-quality vocoder. Experimental results show that the quality of the speech converted by the proposed method is significantly improved.
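
The idea of modifying event targets rather than individual frames can be illustrated with a minimal sketch. It assumes the general form of temporal decomposition, in which each frame is a weighted sum of event targets, with the weights given by overlapping event functions. The triangular event functions, the two-dimensional targets, and the uniform scaling factor below are hypothetical stand-ins chosen for illustration; they are not the event functions, GMM parameters, or formant scaling rule of the paper.

```python
def triangular_event_functions(centers, n_frames):
    """Hypothetical event functions: linear interpolation weights
    between adjacent event centers (they sum to 1 at every frame)."""
    phis = [[0.0] * n_frames for _ in centers]
    for n in range(n_frames):
        if n <= centers[0]:
            phis[0][n] = 1.0
        elif n >= centers[-1]:
            phis[-1][n] = 1.0
        else:
            for k in range(len(centers) - 1):
                left, right = centers[k], centers[k + 1]
                if left <= n < right:
                    w = (n - left) / (right - left)
                    phis[k][n] = 1.0 - w
                    phis[k + 1][n] = w
                    break
    return phis


def reconstruct(targets, phis):
    """Rebuild frames as y[n] = sum_k phis[k][n] * targets[k]."""
    n_frames = len(phis[0])
    dim = len(targets[0])
    return [
        [sum(phis[k][n] * targets[k][d] for k in range(len(targets)))
         for d in range(dim)]
        for n in range(n_frames)
    ]


def scale_targets(targets, factor):
    """Hypothetical modification: scale every target parameter uniformly.
    This only stands in for a real spectral modification rule."""
    return [[factor * v for v in t] for t in targets]


# Three events over nine frames; targets are made-up parameter vectors.
centers = [0, 4, 8]
targets = [[1.0, 2.0], [3.0, 0.0], [1.0, 4.0]]
phis = triangular_event_functions(centers, n_frames=9)

frames = reconstruct(targets, phis)
converted = reconstruct(scale_targets(targets, 1.2), phis)
```

Only three target vectors are changed here, yet all nine frames follow, and the frame-to-frame trajectory keeps the shape imposed by the unchanged event functions.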

Identifier: https://dspace.jaist.ac.jp/dspace/handle/10119/4888

Published in

Journal of Signal Processing

Journal of Signal Processing 11 (4), 333-336, 2007-07

Research Institute of Signal Processing Japan（信号処理学会）

Details

CRID: 1050282812513890816

NII Article ID: 120000861659

NII Bibliographic ID: AA11147833

ISSN: 1342-6230

NDL Bibliographic ID: 8918499

Web Site: http://hdl.handle.net/10119/4888; http://id.ndl.go.jp/bib/8918499; https://ndlsearch.ndl.go.jp/books/R000000004-I8918499

Text language code: en

Resource type: journal article

Data source type

IRDB
NDL Search
CiNii Articles
