Performance evaluation of noisy shouted speech detection based on acoustic model with rahmonic and mel-frequency cepstrum coefficients

Bibliographic Information

Other Title
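  • Performance evaluation of shouted speech detection in noisy environments based on an acoustic model using rahmonic and mel-cepstrum (Rahmonicとメルケプストラムを用いた音響モデルに基づく騒音環境下叫び声検出の性能評価)
  • Poster presentation: Performance evaluation of shouted speech detection in noisy environments based on an acoustic model using rahmonic and mel-cepstrum (ポスター講演 Rahmonicとメルケプストラムを用いた音響モデルに基づく騒音環境下叫び声検出の性能評価)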

Description

This paper describes a method for robustly detecting shouted speech in noisy environments using a new combined feature set of mel-frequency cepstrum coefficients (MFCCs) and rahmonic. MFCCs collectively make up the mel-frequency cepstrum, while rahmonic represents a subharmonic of the fundamental frequency in the cepstral domain. In our previous method, a Gaussian mixture model (GMM) was trained on the proposed features, which were extracted from training data containing many normal and shouted speech samples. In this paper, shouted speech detection experiments in noisy environments were conducted using not only a GMM but also hidden Markov models (HMMs) and a deep neural network (DNN). The results show that MFCCs and rahmonic are effective for representing the speech production mechanism, covering both the vocal tract and the vocal cords. In addition, the DNN achieved higher performance in noisy environments than the GMM and the HMMs.
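
The description does not define either feature precisely, so the following is only a minimal NumPy sketch of what a per-frame combined MFCC and rahmonic vector could look like. It assumes that the rahmonic feature is the largest real-cepstrum value among quefrencies that correspond to a plausible pitch period (an assumed F0 range of 80 to 800 Hz, chosen so that raised-pitch shouted speech is still covered). It also assumes a simple from-scratch MFCC (26 mel filters, 13 coefficients). The function names, parameter values, and the ordering of the feature vector are hypothetical illustrations, not the authors' settings.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of a Hann-windowed frame."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))


def rahmonic_feature(frame, sr, f0_min=80.0, f0_max=800.0):
    """ASSUMED rahmonic feature: the largest cepstral value among quefrencies whose
    period matches an F0 between f0_min and f0_max (assumed range).
    Returns (peak value, F0 implied by the peak quefrency)."""
    ceps = real_cepstrum(frame)
    q_lo = int(sr / f0_max)                           # shortest pitch period, in samples
    q_hi = min(int(sr / f0_min), len(ceps) // 2 - 1)  # longest pitch period, capped at N/2
    idx = q_lo + int(np.argmax(ceps[q_lo:q_hi + 1]))
    return float(ceps[idx]), sr / idx


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x


def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    """MFCCs: log mel filterbank energies followed by a DCT."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    energies = mel_filterbank(n_filters, len(frame), sr) @ power
    return dct_ii(np.log(energies + 1e-10), n_coeffs)


def combined_features(frame, sr):
    """Hypothetical per-frame vector: 13 MFCCs followed by the rahmonic peak value."""
    peak, _ = rahmonic_feature(frame, sr)
    return np.concatenate([mfcc(frame, sr), [peak]])
```

For example, a 1024-sample frame at 16 kHz built from harmonics of 200 Hz should give a cepstral peak near quefrency 80 samples, so `rahmonic_feature` reports an F0 close to 200 Hz. The MFCCs summarize the spectral envelope (vocal tract) and the rahmonic peak reflects periodic excitation (vocal cords), which is one way to read the paper's claim that the combination covers both. In a detector of the kind described above, such vectors from normal and shouted training speech would be used to fit one model per class (GMM, HMM, or DNN), and a test utterance would be labeled by comparing the class scores.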
