Emotion recognition method based on normalization of prosodic features

Shohei Nakagawa, Motoyuki Suzuki, Kenji Kita

doi:10.1109/apsipa.2013.6694147

Emotion recognition from speech signals is one of the most important technologies for natural conversation between humans and robots. Most emotion recognizers extract prosodic features from input speech and use them for emotion recognition. However, prosodic features change drastically depending on the uttered text. To solve this problem, we previously proposed a method that normalizes prosodic features using synthesized speech that has the same word sequence but is uttered with a “neutral” emotion. In that method, all prosodic features (pitch, power, etc.) are normalized. However, it is not known which kinds of prosodic features should be normalized. In this paper, all combinations of features with and without normalization were examined, and the most appropriate normalization method was identified. When both “RMS Energy” (root mean square frame energy) and “VoiceProb” (power of harmonics divided by the total power) were normalized, emotion recognition accuracy became 5.98% higher than the recognition accuracy without normalization.
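The idea of normalizing only some features, and of trying every with/without combination, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the normalization is computed, so subtraction of the neutral synthesized-speech value is assumed here, and the feature names, the numeric values, and the helpers `normalize` and `all_subsets` are hypothetical.

```python
from itertools import combinations


def normalize(features, neutral, selected):
    """Normalize only the features named in `selected`.

    Assumption: normalization is the difference between the input value
    and the value measured on neutral synthesized speech of the same text.
    Features not in `selected` are passed through unchanged.
    """
    return {
        name: value - neutral[name] if name in selected else value
        for name, value in features.items()
    }


def all_subsets(names):
    """Yield every with/without-normalization combination of feature names."""
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield frozenset(subset)


# Hypothetical per-utterance feature values (illustrative numbers only).
features = {"pitch": 220.0, "rms_energy": 0.5, "voice_prob": 0.75}
neutral = {"pitch": 200.0, "rms_energy": 0.25, "voice_prob": 0.5}

# Three features give 2**3 = 8 candidate normalization settings.
subsets = list(all_subsets(sorted(features)))

# The setting singled out in the abstract: normalize RMS energy and VoiceProb only.
example = normalize(features, neutral, {"rms_energy", "voice_prob"})
```

In a full experiment, each subset would be used to build normalized features for training and testing a recognizer, and the subset with the highest recognition accuracy would be kept. That evaluation step is omitted here because the abstract does not describe the classifier or the data.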
