Sign Language Recognition Based on Spatial-Temporal Graph Convolution-Transformer

Bibliographic Information

Other Title
  • Spatial-Temporal Graph Convolution-Transformerに基づく手話認識 (original Japanese title)

Abstract

This paper reports on sign language recognition based on human body part tracking. Tracking-based sign language recognition has practical advantages, such as robustness against variations in clothing and scene backgrounds. However, there is still room to improve feature extraction in tracking-based sign language recognition. This paper presents a tracking-based continuous sign language word recognition method called Spatial-Temporal Graph Convolution-Transformer (STGC-Transformer). Spatial-temporal graph convolution improves framewise feature extraction from tracking points, while the Transformer allows the model to recognize word sequences of arbitrary length. Besides the model design, the training strategy also affects recognition performance. The proposed method is trained with multi-task learning that combines connectionist temporal classification (CTC) and cross-entropy losses, and this training strategy improved recognition performance by a significant margin. The method was evaluated statistically on a sign language video dataset consisting of 275 types of isolated words and 120 types of sentences. The evaluation results show that the STGC-Transformer with multi-task learning achieved word error rates of 12.14% for isolated words and 2.07% for sentences.
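The abstract names two design elements, a spatial-temporal graph convolution front end over tracking points followed by a Transformer encoder, and a multi-task loss that combines CTC and cross-entropy. The sketch below illustrates how such a combination can be wired up in PyTorch; it is not the authors' implementation, and the identity-placeholder adjacency, the framewise cross-entropy targets, the loss weight alpha, and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTemporalGraphConv(nn.Module):
    # One ST-GCN-style block: a spatial graph convolution over the tracking
    # points followed by a temporal convolution along the frame axis.
    def __init__(self, in_ch, out_ch, adjacency, t_kernel=9):
        super().__init__()
        self.register_buffer("A", adjacency)           # (V, V) normalized adjacency
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch,
                                  kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0))
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):                               # x: (N, C, T, V)
        x = self.spatial(x)                             # per-joint channel mixing
        x = torch.einsum("nctv,vw->nctw", x, self.A)    # aggregate over the skeleton graph
        x = self.temporal(x)                            # convolve along the time axis
        return F.relu(self.bn(x))


class STGCTransformer(nn.Module):
    def __init__(self, num_joints, num_classes, in_ch=2, d_model=128):
        super().__init__()
        A = torch.eye(num_joints)                       # placeholder adjacency (assumption)
        self.stgc = nn.Sequential(
            SpatialTemporalGraphConv(in_ch, 64, A),
            SpatialTemporalGraphConv(64, d_model, A),
        )
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                               # x: (N, C, T, V) tracking points
        h = self.stgc(x)                                # framewise graph features
        h = h.mean(dim=-1).transpose(1, 2)              # pool joints -> (N, T, d_model)
        h = self.encoder(h)                             # temporal context over the sequence
        return self.classifier(h)                       # (N, T, num_classes + 1) logits


def multitask_loss(logits, targets, input_lens, target_lens, frame_labels, alpha=0.5):
    # CTC loss on the word sequence plus a framewise cross-entropy term;
    # the weight alpha and the framewise targets are assumptions.
    log_probs = logits.log_softmax(-1).transpose(0, 1)  # (T, N, C) as CTC expects
    ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens,
                     blank=logits.size(-1) - 1)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), frame_labels.reshape(-1))
    return ctc + alpha * ce

A forward pass takes tracking-point coordinates of shape (batch, channels, frames, joints) and returns framewise logits that feed both loss terms, mirroring the multi-task training strategy described in the abstract.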

Published In

  • Journal of the Japan Society for Precision Engineering (精密工学会誌), 87 (12), 1028-1035, 2021-12-05

    Publisher: Japan Society for Precision Engineering (公益社団法人 精密工学会)

