English-Read-By-Japanese Speech Synthesis Preserving Speaker Individuality Based on Partial Correction of Prosody and Phonetic Sounds and Effects of English Proficiency Level on Its Performance

Bibliographic Information

Other Title
  • 韻律・音韻の部分補正に基づく話者性を保持した日本人英語音声合成と英語習熟度が与える影響

Description

Cross-lingual speech synthesis based on voice conversion and HMM-based speech synthesis, used to generate natural-sounding English speech uttered by Japanese speakers, tends to degrade speaker individuality in the synthetic speech compared with intra-lingual speech synthesis. To address this issue, we have proposed an ERJ (English Read by Japanese) speech synthesis method that preserves speaker individuality in synthetic speech, together with a prosody correction method that improves its naturalness. However, their effectiveness has never been evaluated by native listeners; the effect of each speaker's English proficiency level on their performance has never been evaluated; and incorrect phonetic sounds in ERJ speech have never been addressed. In this paper, we address these points by applying the proposed methods to multiple speakers with various English proficiency levels, and we also propose a method for correcting some incorrect phonetic sounds based on spectrum swapping for unvoiced consonants. The experimental results demonstrate that (1) the effectiveness of power correction is confirmed by native listeners; (2) the prosody correction method successfully improves the naturalness of ERJ synthetic speech across various English proficiency levels; and (3) the proposed phonetic sound correction method is also effective in further improving naturalness.

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2015 (3), 1-6, 2015-02-20

    Information Processing Society of Japan (IPSJ)

Details

  • CRID
    1570572702916817408
  • NII Article ID
    110009877335
  • NII Book ID
    AN10442647
  • Text Lang
    ja
  • Data Source
    • CiNii Articles
