Emotion recognition method based on normalization of prosodic features
Description
Emotion recognition from speech signals is one of the most important technologies for natural conversation between humans and robots. Most emotion recognizers extract prosodic features from the input speech and use them for emotion recognition. However, prosodic features change drastically depending on the uttered text. To solve this problem, we previously proposed a method that normalizes prosodic features using synthesized speech that has the same word sequence but is uttered with a “neutral” emotion. In that method, all prosodic features (pitch, power, etc.) are normalized. However, it has not been established which prosodic features should be normalized. In this paper, all combinations of features with and without normalization were examined, and the most appropriate normalization method was identified. When both “RMS Energy” (root mean square frame energy) and “VoiceProb” (power of harmonics divided by the total power) were normalized, emotion recognition accuracy was 5.98% higher than without normalization.
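The search over with/without-normalization combinations described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how a feature is normalized against the neutral synthesized speech, so subtraction is assumed here, and the feature names, the `normalize`, `all_subsets`, and `best_subset` helpers, and the caller-supplied `evaluate` scoring function are all hypothetical.

```python
from itertools import combinations


def normalize(features, neutral, selected):
    """Normalize only the selected features against the neutral-speech values.

    Assumption: normalization is a plain subtraction; the abstract does not
    state the actual rule used in the paper.
    """
    return {
        name: value - neutral[name] if name in selected else value
        for name, value in features.items()
    }


def all_subsets(names):
    """Yield every with/without combination, including 'normalize nothing'."""
    names = list(names)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield frozenset(subset)


def best_subset(names, evaluate):
    """Return the subset for which the hypothetical evaluate() scores highest.

    evaluate(subset) would typically train and test a recognizer using
    features normalized according to `subset` and return its accuracy.
    """
    return max(all_subsets(names), key=evaluate)
```

With n candidate features this examines 2^n combinations, so an exhaustive search of this kind is only practical for a small feature set.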
Published in
2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 1-5, 2013-10-01
IEEE