Morphological Analysis of Unsegmented Kana Strings Using Recurrent Neural Network Language Model

森山 柊平 (Shuhei Moriyama), 大野 誠寛 (Tomohiro Ohno), 増田 英孝 (Hidetaka Masuda), 絹川 博之 (Hiroshi Kinukawa)

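In elementary Japanese-language education for foreign learners, learners first write compositions in unsegmented kana (kana written without word boundaries) so that they learn the readings of Japanese words. A learning-support system for beginning learners must therefore perform morphological analysis of unsegmented kana sentences. However, conventional morphological analyzers are trained mainly on text written in mixed kanji and kana, and cannot be applied as-is to unsegmented kana sentences. Some research has attempted morphological analysis of kana-only picture-book texts using an analyzer retrained on unsegmented kana sentences, but it has not reached accuracy comparable to the analysis of mixed kanji-kana text. In this paper, we therefore propose a morphological analysis method for error-free unsegmented kana sentences that uses morpheme marginal probabilities and a recurrent neural network language model (RNNLM). The RNNLM is expected to let the analysis capture the semantic naturalness of word sequences, and the morpheme marginal probabilities are expected to reduce the chance that the optimal path is pruned during beam search. In the evaluation experiment, we performed morphological analysis on unsegmented kana sentences generated from newspaper articles. Although some failures remained, caused by adverse effects of the RNNLM and by the optimal path still being pruned in some cases, the proposed method achieved an F-measure of 95.52 under the strictest criterion, which counts an analysis as correct only when the word segmentation and all word features match, and we confirmed that it significantly (p<0.01) outperformed the conventional method.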
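The abstract relies on morpheme marginal probabilities but does not give their definition. As an illustration only, the sketch below computes them with the standard forward-backward algorithm over a word lattice: the marginal of a morpheme spanning positions i to j is α(i)·w·β(j)/Z, where Z is the total weight of all complete segmentations. The toy dictionary, its weights, and the purely per-morpheme path score (no transition costs) are hypothetical assumptions; the paper's own analyzer and scoring model are not described here.

```python
"""Morpheme marginal probabilities on a toy lattice (illustrative sketch)."""

# Hypothetical toy dictionary: surface form -> list of (part of speech, weight).
# Each weight stands in for an unnormalized morpheme score; a real analyzer
# would derive scores from its trained model and would also score transitions.
TOY_DICT = {
    "に": [("particle", 0.5)],
    "わ": [("noun", 0.1)],
    "にわ": [("noun", 0.3)],
    "とり": [("noun", 0.4)],
    "にわとり": [("noun", 0.2)],
    "が": [("particle", 0.5)],
    "いる": [("verb", 0.3)],
}
MAX_LEN = max(len(surface) for surface in TOY_DICT)


def build_lattice(text):
    """Return every dictionary match in text as (start, end, surface, pos, weight)."""
    lattice = []
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + MAX_LEN) + 1):
            surface = text[i:j]
            for pos, weight in TOY_DICT.get(surface, []):
                lattice.append((i, j, surface, pos, weight))
    return lattice


def morpheme_marginals(text):
    """Forward-backward: P(morpheme) = alpha[start] * weight * beta[end] / Z."""
    n = len(text)
    lattice = build_lattice(text)
    alpha = [0.0] * (n + 1)  # alpha[i]: summed weight of all paths over text[:i]
    beta = [0.0] * (n + 1)   # beta[j]: summed weight of all paths over text[j:]
    alpha[0] = 1.0
    beta[n] = 1.0
    # Forward pass: sort edges by start so alpha[start] is complete when used.
    for start, end, _, _, weight in sorted(lattice, key=lambda e: e[0]):
        alpha[end] += alpha[start] * weight
    # Backward pass: sort edges by descending end so beta[end] is complete.
    for start, end, _, _, weight in sorted(lattice, key=lambda e: -e[1]):
        beta[start] += weight * beta[end]
    z = alpha[n]  # total weight of all complete segmentations
    if z == 0.0:
        raise ValueError("no segmentation of the input is covered by the dictionary")
    return {
        (start, end, surface, pos): alpha[start] * weight * beta[end] / z
        for start, end, surface, pos, weight in lattice
    }


if __name__ == "__main__":
    marginals = morpheme_marginals("にわとりがいる")
    for (start, end, surface, pos), p in sorted(marginals.items()):
        print(f"{start}-{end} {surface}/{pos}: {p:.3f}")
```

For にわとりがいる, the three competing analyses of the first four characters (に+わ+とり, にわ+とり, にわとり) split the probability mass, while が and いる receive a marginal of 1 because every path uses them. One plausible use, consistent with the abstract's motivation, is to keep high-marginal candidates in the beam so the optimal path is less likely to be pruned; how the paper actually combines marginals with beam search is not specified here.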

In elementary Japanese language education for foreigners, learners write using only kana characters so that they learn how Japanese words are pronounced. Therefore, a learning-support system for beginners needs to analyze unsegmented kana strings as a preprocessing step for detecting errors and advising learners on how to correct them. Conventional morphological analyzers, however, are trained mainly on ordinary written Japanese, which mixes kanji and kana, so they cannot simply be applied to sentences composed only of kana. Although previous research has performed morphological analysis of kana-only picture-book texts using an analyzer retrained on kana strings, its accuracy is not high enough. We propose a morphological analysis method for kana-string sentences that contain no grammatical errors, which integrates a conventional method with a recurrent neural network language model (RNNLM). Through the RNNLM, our method can capture the semantic plausibility of word sequences during analysis. We conducted an experiment on morphological analysis of kana-string sentences generated from newspaper articles. Although some errors were caused by harmful effects of the RNNLM, our method achieved an F-measure of 95.52 under the strictest evaluation criterion and significantly outperformed the conventional methods (p<0.01).
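The strictest criterion counts a predicted morpheme as correct only when both its segmentation and all of its word features match the gold analysis. A minimal sketch of such scoring follows, assuming each sentence analysis is a list of (surface, features) pairs and that scores are micro-averaged over morphemes; the representation and the names `to_items` and `strict_prf` are hypothetical, and the paper's exact scoring procedure is not given in this abstract.

```python
"""Strict-criterion precision/recall/F-measure (illustrative sketch)."""
from typing import List, Set, Tuple

# Hypothetical representation: a morpheme is (surface, features), where
# features is a tuple such as (part of speech, conjugation type, base form).
Morpheme = Tuple[str, Tuple[str, ...]]
Item = Tuple[int, int, str, Tuple[str, ...]]


def to_items(analysis: List[Morpheme]) -> Set[Item]:
    """Turn a morpheme sequence into (start, end, surface, features) items."""
    items = set()
    offset = 0
    for surface, features in analysis:
        items.add((offset, offset + len(surface), surface, features))
        offset += len(surface)
    return items


def strict_prf(gold: List[List[Morpheme]],
               pred: List[List[Morpheme]]) -> Tuple[float, float, float]:
    """A predicted morpheme is correct only if its span and every feature match."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        # Both analyses must cover the same unsegmented input string.
        if "".join(s for s, _ in g) != "".join(s for s, _ in p):
            raise ValueError("gold and predicted analyses cover different text")
        g_items, p_items = to_items(g), to_items(p)
        correct += len(g_items & p_items)
        n_gold += len(g_items)
        n_pred += len(p_items)
    precision = 100.0 * correct / n_pred if n_pred else 0.0
    recall = 100.0 * correct / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gold = [[("にわとり", ("noun",)), ("が", ("particle",)), ("いる", ("verb",))]]
    pred = [[("にわ", ("noun",)), ("とり", ("noun",)),
             ("が", ("particle",)), ("いる", ("verb",))]]
    print(strict_prf(gold, pred))
```

In the toy example, splitting にわとり into にわ + とり lowers both precision (2 of 4 predicted morphemes correct) and recall (2 of 3 gold morphemes found), even though が and いる are analyzed correctly. A morpheme with the right span but a single wrong feature would also count as an error under this criterion.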
