Recognition of Spoken Words Using Motion Features Extracted from Time Series Imagery

Bibliographic Information

Other Title
  • 時系列顔画像の動き特徴を用いた発声単語認識
  • 時系列顔画像の動き特徴を用いた発声単語認識--特徴抽出の時間・空間的正規化条件の比較
  • ジケイレツ カオ ガゾウ ノ ウゴキ トクチョウ オ モチイタ ハッセイ タンゴ ニンシキ トクチョウ チュウシュツ ノ ジカン クウカンテキ セイキカ ジョウケン ノ ヒカク
  • -特徴抽出の時間・空間的正規化条件の比較-
  • - Comparison of Temporal and Spatial Normalization Condition of Feature Extraction -

Abstract

This paper describes a vision-based spoken word recognition system that uses, instead of an audio signal, a visual motion signal obtained from video of the speaker's face during speech. Motion information for each pixel in the input time-series imagery was obtained by computing optical flow, and feature values representing the spatial configuration of the pixel-wise velocities were extracted for each frame. The start and end times of each spoken word were determined from the velocity feature values, and a high-dimensional feature vector was constructed to represent the time variation of the velocity distribution over the period of utterance. As a preliminary evaluation of the proposed feature for spoken word recognition, a discrimination test was conducted on five spoken words, including A-RI-GA-TO-U and KO-N-NI-CHI-WA, and promising results were obtained. Moreover, results obtained using motion extracted from other regions in addition to the motion around the mouth suggest that spoken word recognition should draw on the motion of the entire face rather than only the motion around the mouth.
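The abstract does not give the exact procedure for finding the utterance boundaries or for building the high-dimensional vector, so the following is only a minimal sketch under stated assumptions. It assumes each frame has already been reduced to a short list of velocity feature values (for example, from optical flow), that the utterance spans the first to the last frame whose summed absolute feature value exceeds a threshold, and that temporal normalization is done by linear interpolation to a fixed number of frames. The names `motion_energy`, `find_utterance`, `resample`, and `word_feature_vector`, as well as the threshold and frame count, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

# One per-frame velocity feature vector (hypothetical layout, not from the paper).
Frame = Sequence[float]


def motion_energy(frame: Frame) -> float:
    """Sum of absolute feature values, used as a crude per-frame activity score."""
    return sum(abs(v) for v in frame)


def find_utterance(frames: Sequence[Frame], threshold: float) -> Tuple[int, int]:
    """Return inclusive (start, end) indices of the first and last frames
    whose motion energy exceeds the threshold (an assumed criterion)."""
    active = [i for i, f in enumerate(frames) if motion_energy(f) > threshold]
    if not active:
        raise ValueError("no frame exceeds the motion threshold")
    return active[0], active[-1]


def resample(frames: Sequence[Frame], n_out: int) -> List[List[float]]:
    """Linearly interpolate a frame sequence to exactly n_out frames."""
    n_in = len(frames)
    if n_in == 0 or n_out < 1:
        raise ValueError("need at least one input frame and n_out >= 1")
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def word_feature_vector(
    frames: Sequence[Frame], threshold: float, n_out: int
) -> List[float]:
    """Cut out the utterance, normalize its length, and flatten it into one vector."""
    start, end = find_utterance(frames, threshold)
    normalized = resample(frames[start:end + 1], n_out)
    return [v for frame in normalized for v in frame]


if __name__ == "__main__":
    # Toy sequence: two features per frame, motion only in frames 2 to 4.
    toy = [[0.0, 0.0], [0.1, 0.0], [1.0, 2.0], [3.0, 1.0],
           [2.0, 2.0], [0.0, 0.1], [0.0, 0.0]]
    print(find_utterance(toy, threshold=0.5))
    print(len(word_feature_vector(toy, threshold=0.5, n_out=5)))
```

With a fixed output length, words spoken at different speeds yield vectors of the same dimension, which is one way to make them directly comparable by a classifier. Whether the paper uses this or a different normalization is not stated in the abstract.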
