Construction of Spoken Language Model Considering Pause

Kengo Ohta (太田 健吾)

This paper addresses the mismatch between pauses in input speech and punctuation marks in the training corpus of a language model, a problem that arises in speech recognition of spoken language. In spontaneous speech, many pauses occur at positions unrelated to linguistic boundaries, so it is inappropriate to train a language model by treating the punctuation marks in a corpus as pauses. The simplest way to address this problem is to build a language model that reflects actual pauses from a corpus that includes pause information, but such corpora are available in very few domains. We therefore propose a method for building a language model that explicitly considers pauses from a corpus without pause annotations. In the proposed method, a pause insertion model trained on spontaneous speech corpora supplies the missing pause information, and the language model is built using this model. In recognition experiments on committee meetings of the Japanese National Diet, the proposed language model improved recognition accuracy over a conventional language model that uses punctuation-based recognition units.
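
The two-stage idea in the abstract (learn where pauses occur from pause-annotated speech transcripts, then use that knowledge to add pause tokens to text that has none before collecting language model statistics) can be illustrated with a toy sketch. This is not the paper's model: the abstract does not specify the form of the pause insertion model, so the bigram-context relative-frequency estimate, the `<pause>` token, the 0.5 threshold, and all function names below are assumptions made purely for illustration.

```python
from collections import Counter

PAUSE = "<pause>"  # hypothetical pause token, not taken from the paper


def train_pause_model(annotated_sentences):
    """Estimate P(pause | left word, right word) from pause-annotated transcripts.

    Assumption: a simple relative-frequency estimate over word boundaries.
    """
    boundary = Counter()  # times each (left, right) word boundary was seen
    paused = Counter()    # times that boundary contained a pause
    for tokens in annotated_sentences:
        words = []  # [word, pause_follows] pairs with pause tokens folded in
        for tok in tokens:
            if tok == PAUSE:
                if words:
                    words[-1][1] = True
            else:
                words.append([tok, False])
        for (left, has_pause), (right, _) in zip(words, words[1:]):
            boundary[(left, right)] += 1
            if has_pause:
                paused[(left, right)] += 1
    return {ctx: paused[ctx] / n for ctx, n in boundary.items()}


def insert_pauses(tokens, model, threshold=0.5):
    """Insert PAUSE at boundaries whose estimated pause probability >= threshold.

    Boundaries never seen in training get probability 0 (no backoff here).
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and model.get((tok, tokens[i + 1]), 0.0) >= threshold:
            out.append(PAUSE)
    return out


def bigram_counts(sentences):
    """Collect bigram counts, treating PAUSE as an ordinary token."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Toy pause-annotated transcripts (invented data, for illustration only).
annotated = [
    ["we", PAUSE, "will", "discuss", "the", "budget"],
    ["we", PAUSE, "will", "vote", "today"],
    ["we", "will", "discuss", PAUSE, "the", "plan"],
]
pause_model = train_pause_model(annotated)

# Text corpus without pause information.
text_corpus = [["we", "will", "discuss", "the", "plan"]]
augmented = [insert_pauses(s, pause_model) for s in text_corpus]
counts = bigram_counts(augmented)
print(augmented[0])
```

A real system would likely need smoothing or backoff for unseen contexts and a probabilistic rather than thresholded insertion; the sketch only shows how pause statistics from one corpus can be transferred to another before n-gram counting.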
