Recognition of Spoken Words Using Motion Features Extracted from Time Series Imagery

NAKAMURA Ryota, AKAMATSU Shigeru

doi:10.11371/wiieej.09-06.0_19

Bibliographic Information

Other Title

Spoken Word Recognition Using Motion Features of Time-Series Face Images
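Spoken Word Recognition Using Motion Features of Time-Series Face Images: Comparison of Temporal and Spatial Normalization Conditions for Feature Extraction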
- Comparison of Temporal and Spatial Normalization Conditions for Feature Extraction -


Abstract

This paper describes a vision-based spoken word recognition system that uses, instead of an audio signal, the visual motion signal obtained from video of the face recorded during speech. Motion information for each pixel of the input time-series imagery was obtained by computing optical flow, and for each frame, feature values representing the spatial configuration of pixel-wise velocities were extracted. The start and end points of each spoken word were determined from these velocity feature values, and a high-dimensional feature vector was constructed to represent the time variation of the velocity distribution within the period of utterance. As a preliminary evaluation of the proposed feature for spoken word recognition, a discrimination test on five spoken words, including A-RI-GA-TO-U and KO-N-NI-CHI-WA, was conducted, and fairly promising results were achieved. Moreover, experiments using motion extracted not only from the region around the mouth but also from other facial regions suggested that spoken word recognition benefits more from the motion of the entire face than from the motion around the mouth alone.
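The processing chain summarized in the abstract (per-frame velocity features, velocity-based detection of the utterance start and end points, a fixed-length vector describing time variation, and discrimination between words) can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The optical flow fields are assumed to be precomputed by some standard method (e.g. Lucas-Kanade or Horn-Schunck). The grid-cell mean speed used as the spatial feature, the fixed motion threshold for endpoint detection, linear-interpolation resampling as the temporal normalization, unit-length scaling, and the nearest-template classifier are all hypothetical choices. The helper names (`frame_feature`, `utterance_span`, `resample`, `word_vector`, `classify`) are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: flow[y][x] = (u, v), the optical-flow velocity
# of pixel (x, y) between two consecutive frames, assumed precomputed.
Flow = Sequence[Sequence[Tuple[float, float]]]


def frame_feature(flow: Flow, grid: int = 4) -> List[float]:
    """Assumed spatial feature: mean speed |(u, v)| in each cell of a grid x grid partition."""
    h, w = len(flow), len(flow[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            rows = flow[gy * h // grid:(gy + 1) * h // grid]
            speeds = [math.hypot(u, v)
                      for row in rows
                      for (u, v) in row[gx * w // grid:(gx + 1) * w // grid]]
            feats.append(sum(speeds) / len(speeds) if speeds else 0.0)
    return feats


def utterance_span(feats: List[List[float]], threshold: float) -> Tuple[int, int]:
    """Assumed endpoint rule: first and last frame whose total motion exceeds threshold."""
    active = [i for i, f in enumerate(feats) if sum(f) > threshold]
    if not active:
        raise ValueError("no frame exceeds the motion threshold")
    return active[0], active[-1]


def resample(seq: List[List[float]], n: int) -> List[List[float]]:
    """Assumed temporal normalization: linear interpolation to exactly n frames."""
    if len(seq) == 1:
        return [list(seq[0]) for _ in range(n)]
    out = []
    for k in range(n):
        t = k * (len(seq) - 1) / (n - 1) if n > 1 else 0.0
        i = min(int(t), len(seq) - 2)  # left neighbour frame
        a = t - i                      # interpolation weight toward frame i + 1
        out.append([(1 - a) * p + a * q for p, q in zip(seq[i], seq[i + 1])])
    return out


def word_vector(feats: List[List[float]], threshold: float, n_frames: int = 10) -> List[float]:
    """Concatenate the resampled per-frame features of the utterance and scale to unit length."""
    start, end = utterance_span(feats, threshold)
    vec = [x for frame in resample(feats[start:end + 1], n_frames) for x in frame]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def classify(vec: List[float], templates: Dict[str, List[float]]) -> str:
    """Hypothetical classifier: label of the nearest template by Euclidean distance."""
    return min(templates, key=lambda label: math.dist(vec, templates[label]))
```

In this sketch, templates could be built by applying `word_vector` to a training utterance of each word (e.g. A-RI-GA-TO-U, KO-N-NI-CHI-WA), and a test utterance would then be labeled with `classify`. Resampling every utterance to the same number of frames is what lets words of different durations be compared as vectors of equal dimension.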

Journal

Reports of the Technical Conference of the Institute of Image Electronics Engineers of Japan

Reports of the Technical Conference of the Institute of Image Electronics Engineers of Japan 09-06 (0), 19-24, 2010

The Institute of Image Electronics Engineers of Japan

