Speech recognition using voice-characteristic-dependent acoustic models

Chiyomi Miyajima, Tadashi Kitamura, Keiichi Tokuda, Heiga Zen, Yoshihiko Nankaku, Hiroyuki Suzuki

doi:10.1109/icassp.2003.1198887

This paper proposes a speech recognition technique based on acoustic models that account for variations in voice characteristics. Continuous speech recognition systems commonly use context-dependent acoustic models, typically triphone HMMs. This work hypothesizes that the speaker voice characteristics that humans can perceive by listening are also a source of acoustic variation that should be considered when constructing acoustic models. A tree-based clustering technique is therefore applied to speaker voice characteristics as well, yielding voice-characteristic-dependent acoustic models. In recognition with triphone models, the neighboring phonetic context is known in advance from linguistic-phonetic knowledge; in contrast, the voice characteristics of the input speech are unknown when recognizing with voice-characteristic-dependent acoustic models. The paper therefore also proposes a method for recognizing speech when the voice characteristics of the input are unknown. In a gender-dependent speech recognition experiment, the proposed method achieves higher recognition performance than conventional methods.
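The abstract does not say how the unknown voice characteristics are handled at recognition time. One plausible reading, offered here only as an illustrative sketch and not as the authors' method, is to score the input under a model for each candidate voice-characteristic condition and keep the best-scoring one. The condition labels, the single diagonal Gaussian per condition (standing in for full HMMs), and the function names below are all assumptions made for this example.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def best_condition(frames, models):
    """Return (label, score) for the condition whose model best explains frames.

    `models` maps a hypothetical voice-characteristic label to (mean, variance).
    Frames are treated as independent, so their log-likelihoods are summed.
    """
    scores = {
        label: sum(log_gauss(f, mean, var) for f in frames)
        for label, (mean, var) in models.items()
    }
    label = max(scores, key=scores.get)
    return label, scores[label]


# Toy, made-up models: one Gaussian per assumed voice-characteristic condition.
models = {
    "low_pitch": ([0.0, 0.0], [1.0, 1.0]),
    "high_pitch": ([3.0, 3.0], [1.0, 1.0]),
}

# Toy feature frames whose values lie near the "high_pitch" mean.
frames = [[2.8, 3.1], [3.2, 2.9]]
label, score = best_condition(frames, models)
```

In a real recognizer the per-condition score would come from decoding with the corresponding HMM set rather than from a single Gaussian, and the search could be carried out jointly with word decoding instead of as a separate selection step.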
