A Method for Displaying Timing between Speaker's Face and Captions for a Real-time Speech-to-Caption System
- Kuroki Hayato (Tsukuba University of Technology / Utsunomiya University)
- Ino Shuichi (National Institute of Advanced Industrial Science and Technology)
- Nakano Satoko (University of Tokyo)
- Hori Kotaro (B.U.G., Inc.)
- Ifukube Tohru (University of Tokyo)
- Aayama Miyoshi (Utsunomiya University)
- Hasegawa Hiroshi (Utsunomiya University)
- Yuyama Ichiro (Utsunomiya University)
Bibliographic Information
- Other Title
- 聴覚障害者のためのリアルタイム字幕システムにおける話者顔映像と誤認識字幕の呈示タイミングに関する研究 (A study on the display timing of the speaker's face images and misrecognized captions in a real-time captioning system for hearing-impaired people)
Abstract
We have been studying a real-time speech-to-caption system based on speech recognition with a re-speaking method, in which a repeat speaker listens to a lecturer's voice and speaks the lecturer's utterances back into a speech recognition computer. In trials, our system achieved a caption accuracy of about 97% for Japanese-to-Japanese conversion, and a voice-to-caption conversion time of about 4 seconds for English-to-English conversion at some international conferences, although achieving this performance was costly. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. We therefore sought a suitable way to display captions together with images of the speaker's face movements, after briefly buffering both in a computer, so as to achieve higher comprehension. In this paper, we investigated the relationship between the display sequence and display timing of captions containing speech recognition errors and of the speaker's face movement images. The results showed that displaying the caption before the speaker's face image improved comprehension of the captions; displaying both simultaneously yielded an improvement of only a few percent over that of the question sentence alone; and displaying the speaker's face image before the caption produced almost no change. Moreover, displaying the caption 1 second before the speaker's face image yielded the largest improvement of all conditions for the hearing-impaired participants.
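The timing scheme described in the abstract (buffer both streams, then let the caption lead the buffered face video by a fixed offset) can be illustrated with a minimal sketch. All names here (`Event`, `schedule`, `FACE_DELAY_S`) are hypothetical and not from the paper; only the 1-second caption lead reflects the reported best condition.

```python
from dataclasses import dataclass

# Caption leads the face video by 1 s (the best condition reported in the paper).
FACE_DELAY_S = 1.0

@dataclass
class Event:
    time_s: float   # presentation time on the viewer's display
    kind: str       # "caption" or "face"
    payload: str

def schedule(captions, face_frames, face_delay_s=FACE_DELAY_S):
    """Merge caption and face-video events into one presentation timeline.

    captions:    list of (t, text) pairs, t = time the caption became available
    face_frames: list of (t, frame_id) pairs from the buffered video stream
    The face stream is shifted later by face_delay_s, so each caption
    appears before its corresponding face images.
    """
    events = [Event(t, "caption", text) for t, text in captions]
    events += [Event(t + face_delay_s, "face", frame) for t, frame in face_frames]
    events.sort(key=lambda e: e.time_s)
    return events
```

For example, a caption and a face frame that both become available at t = 0 would be presented at t = 0 and t = 1 respectively, reproducing the caption-first ordering the experiment found most effective.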
Journal
- The Journal of The Institute of Image Information and Television Engineers, 65 (12), 1750-1757, 2011
- Publisher: The Institute of Image Information and Television Engineers
Details
- CRID: 1390001205128000128
- NII Article ID: 10030041343
- NII Book ID: AN10588970
- ISSN: 18816908, 13426907
- NDL BIB ID: 023348649
- Text Lang: ja
- Data Source: JaLC, NDL, Crossref, CiNii Articles, KAKEN
- Abstract License Flag: Disallowed