Multi-modal Gesture Recognition Using Integrated Model of Motion, Audio and Video

  • GOUTSU Yusuke
    Graduate School of Information Science and Technology, The University of Tokyo
  • KOBAYASHI Takaki
    Graduate School of Information Science and Technology, The University of Tokyo
  • OBARA Junya
    Graduate School of Information Science and Technology, The University of Tokyo
  • KUSAJIMA Ikuo
    Graduate School of Information Science and Technology, The University of Tokyo
  • TAKEICHI Kazunari
    Graduate School of Information Science and Technology, The University of Tokyo
  • TAKANO Wataru
    Graduate School of Information Science and Technology, The University of Tokyo
  • NAKAMURA Yoshihiko
    Graduate School of Information Science and Technology, The University of Tokyo

Bibliographic Information

Other Title
  • 身体運動・音声・映像の特徴を用いた統合モデルによるマルチモーダルジェスチャー認識

Abstract

Gesture recognition is used in many practical applications, such as human-robot interaction, medical rehabilitation, and sign language understanding. With the development of motion-sensing devices, multiple data sources have become available, which has given rise to multi-modal gesture recognition. Because our previous approach to gesture recognition depended on a unimodal system, it had difficulty classifying similar motion patterns. To solve this problem, the present paper proposes a novel approach that integrates motion, audio, and video models, using a dataset captured with a Kinect sensor. The proposed system recognizes observed gestures with three models: the motion and audio models are constructed with Hidden Markov Models, while the video model is learned with a Random Forest classifier. The recognition results of the three models are integrated by the proposed framework, and the integrated output becomes the final result. In experiments evaluating the proposed system, the motion and audio models best suited to gesture recognition are selected by varying the feature vectors and learning methods, and the unimodal and multi-modal models are compared with respect to recognition accuracy. All experiments are conducted on the dataset provided by the organizers of the Multi-modal Gesture Recognition Challenge (MMGRC) workshop. The comparison shows that the multi-modal model composed of all three models achieves the highest recognition rate, indicating that the complementary relationship among the three modalities improves the accuracy of gesture recognition.
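
This record gives no implementation details, so the following Python sketch is only illustrative of the kind of per-class HMM classifier the abstract describes for the motion and audio modalities: one Gaussian-emission HMM is trained per gesture class, and an observed sequence is scored against each. The hmmlearn library, the feature layout, and all parameter values are assumptions, not the authors' actual configuration.

    # Minimal per-class HMM classifier sketch (assumes the hmmlearn package;
    # the paper's actual HMM topology, features, and hyperparameters are unknown).
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def train_class_hmms(sequences_by_class, n_states=5):
        # sequences_by_class: dict mapping gesture label -> list of (T_i, D) arrays.
        models = {}
        for label, seqs in sequences_by_class.items():
            X = np.concatenate(seqs)             # stack frames from all sequences
            lengths = [len(s) for s in seqs]     # per-sequence frame counts
            hmm = GaussianHMM(n_components=n_states,
                              covariance_type="diag", n_iter=50)
            hmm.fit(X, lengths)                  # Baum-Welch training
            models[label] = hmm
        return models

    def class_logliks(models, seq):
        # Score one observed feature sequence under every class HMM.
        return np.array([models[c].score(seq) for c in sorted(models)])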

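Likewise, the integration framework itself is not specified in this record. The sketch below shows one common late-fusion scheme consistent with the description: HMM log-likelihoods from the motion and audio models (e.g., from class_logliks above) are mapped onto the same probability scale as the Random Forest's class probabilities and combined, and the top-scoring class becomes the final result. The weighted product rule and all names here are hypothetical.

    # Hypothetical late fusion of the three unimodal scores (numpy only).
    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))    # shift by the max for numerical stability
        return e / e.sum()

    def fuse(motion_loglik, audio_loglik, video_proba,
             weights=(1.0, 1.0, 1.0)):
        # Map HMM log-likelihoods onto the same probability scale as the
        # Random Forest output, then combine with a weighted product rule.
        p = [softmax(np.asarray(motion_loglik)),
             softmax(np.asarray(audio_loglik)),
             np.asarray(video_proba)]
        combined = sum(w * np.log(pi + 1e-12) for w, pi in zip(weights, p))
        return int(np.argmax(combined))          # index of the final gesture class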