A Method for Displaying Timing between Speaker's Face and Captions for a Real-time Speech-to-Caption System
- Kuroki Hayato (Tsukuba University of Technology / Utsunomiya University)
- Ino Shuichi (National Institute of Advanced Industrial Science and Technology)
- Nakano Satoko (University of Tokyo)
- Hori Kotaro (B.U.G., Inc.)
- Ifukube Tohru (University of Tokyo)
- Aayama Miyoshi (Utsunomiya University)
- Hasegawa Hiroshi (Utsunomiya University)
- Yuyama Ichiro (Utsunomiya University)
Bibliographic Information
- Other Title
- 聴覚障害者のためのリアルタイム字幕システムにおける話者顔映像と誤認識字幕の呈示タイミングに関する研究 (A study on the display timing of the speaker's face images and misrecognized captions in a real-time captioning system for hearing-impaired people)
Abstract
We have been studying a real-time speech-to-caption system based on speech recognition with a re-speaking method, in which a repeat speaker listens to a lecturer's voice and speaks the lecturer's utterances back into a speech recognition computer. In trials, our system achieved a caption accuracy of about 97% for Japanese-to-Japanese conversion, and a voice-to-caption conversion time of about 4 seconds for English-to-English conversion at some international conferences, although achieving this performance was costly. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. We therefore sought a suitable way to display captions together with images of the speaker's face movements, after briefly buffering both in a computer, so as to achieve higher comprehension. In this paper, we investigated the relationship between the display sequence and display timing of captions containing speech recognition errors and of the speaker's face movement images. The results showed that displaying the caption before the speaker's face image improved comprehension of the captions; displaying both simultaneously yielded an improvement of only a few percent over that of the question sentence alone; and displaying the speaker's face image before the caption produced almost no change. Moreover, displaying the caption 1 second before the speaker's face image yielded the largest improvement of all conditions for the hearing-impaired participants.
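The timing scheme described in the abstract (buffer both streams, then let the caption lead the buffered face video by a fixed offset) can be illustrated with a minimal sketch. All names here (`Event`, `schedule`, `FACE_DELAY_S`) are hypothetical and not from the paper; only the 1-second caption lead reflects the reported best condition.

```python
from dataclasses import dataclass

# Caption leads the face video by 1 s (the best condition reported in the paper).
FACE_DELAY_S = 1.0

@dataclass
class Event:
    time_s: float   # presentation time on the viewer's display
    kind: str       # "caption" or "face"
    payload: str

def schedule(captions, face_frames, face_delay_s=FACE_DELAY_S):
    """Merge caption and face-video events into one presentation timeline.

    captions:    list of (t, text) pairs, t = time the caption became available
    face_frames: list of (t, frame_id) pairs from the buffered video stream
    The face stream is shifted later by face_delay_s, so each caption
    appears before its corresponding face images.
    """
    events = [Event(t, "caption", text) for t, text in captions]
    events += [Event(t + face_delay_s, "face", frame) for t, frame in face_frames]
    events.sort(key=lambda e: e.time_s)
    return events
```

For example, a caption and a face frame that both become available at t = 0 would be presented at t = 0 and t = 1 respectively, reproducing the caption-first ordering the experiment found most effective.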
Journal
- The Journal of The Institute of Image Information and Television Engineers, 65 (12), 1750-1757, 2011
- Publisher: The Institute of Image Information and Television Engineers
Details
- CRID: 1390001205128000128
- NII Article ID: 10030041343
- NII Book ID: AN10588970
- ISSN: 18816908, 13426907
- NDL BIB ID: 023348649
- Text Lang: ja
- Data Source: JaLC, NDL, Crossref, CiNii Articles, KAKEN
- Abstract License Flag: Disallowed