Segmenting Long Japanese Sentences into Short Sentences Using Decision Trees (決定木による日本語長文の短文分割)

Bibliographic Information

Alternative Titles
  • The Application of Decision Trees to Segmentation of Long Japanese Sentences
  • ケッテイギ ニ ヨル ニホンゴ チョウブン ノ タンブン ブンカツ

Abstract

It is well known that directly parsing a long Japanese sentence containing many conjunctive clauses is extremely difficult. It is therefore preferable to segment such a sentence into shorter, simpler sentences before parsing. Several sentence segmentation methods have been reported. However, because these conventional methods rely on handmade segmentation patterns or rules, it is difficult to keep the patterns consistent and to decide the optimal order in which to apply the rules. This paper proposes a new sentence segmentation method using a decision tree, which automatically acquires from a corpus both the optimal segmentation patterns and the optimal order of their application, taking into account linguistic phenomena and their occurrence frequencies. A decision tree for sentence segmentation was generated and evaluated on the EDR corpus. For 400 evaluation sentences, precision and recall were both 84%, and 77% of the sentences were segmented correctly. It was also confirmed that pruning reduces the tree size significantly without degrading performance.
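The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which learning algorithm, features, or pruning scheme were used, so the sketch assumes an ID3-style learner driven by information gain, omits pruning, and uses two hypothetical features for each candidate boundary (`ending`, the form of the clause ending, and `comma`, whether a comma follows). The toy training rows are invented for illustration and are not taken from the EDR corpus.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_feature(rows, labels, features):
    """Pick the feature with the highest information gain (ID3 criterion, an assumption)."""
    base = entropy(labels)
    def gain(feature):
        remainder = 0.0
        for value in set(row[feature] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(features, key=gain)

def build_tree(rows, labels, features):
    """Recursively build a tree; leaves are booleans (True = segment here)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    rest = [f for f in features if f != feature]
    branches = {}
    for value in set(row[feature] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {"feature": feature, "branches": branches, "default": majority}

def classify(tree, row):
    """Walk the tree; unseen feature values fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree

def precision_recall(predicted, gold):
    """Precision and recall of predicted segmentation points against gold labels."""
    true_pos = sum(1 for p, g in zip(predicted, gold) if p and g)
    pred_pos, gold_pos = sum(predicted), sum(gold)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall

# Hypothetical training data: one row per candidate boundary in a long sentence.
TRAIN_ROWS = [
    {"ending": "ga", "comma": True},
    {"ending": "ga", "comma": False},
    {"ending": "te", "comma": True},
    {"ending": "te", "comma": False},
    {"ending": "renyo", "comma": True},
    {"ending": "renyo", "comma": False},
]
TRAIN_LABELS = [True, True, False, False, True, False]

tree = build_tree(TRAIN_ROWS, TRAIN_LABELS, ["ending", "comma"])
print(tree["feature"])
print(classify(tree, {"ending": "renyo", "comma": True}))
```

In this toy data the learner tests `ending` first and consults `comma` only where the ending alone is ambiguous, which mirrors the idea of acquiring both the segmentation patterns and the order of their application from labelled examples rather than from hand-ordered rules.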

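Journal

  • Journal of Natural Language Processing (自然言語処理)

    Journal of Natural Language Processing 7 (1), 13-30, 2000

    The Association for Natural Language Processing (一般社団法人 言語処理学会)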
