Neural RST-Style Discourse Parsing Exploiting Agreement Sub-trees as Silver Data

Bibliographic Information

Other Title
  • 疑似正解データを活用したニューラル修辞構造解析 (Neural rhetorical structure parsing exploiting pseudo-gold data)

Abstract

<p>Recent Rhetorical Structure Theory (RST)-style discourse parsing methods are trained by supervised learning and therefore require an annotated corpus of sufficient size and quality. However, the RST Discourse Treebank, the most extensive such corpus, consists of only 385 documents, which is insufficient for learning the long-tailed distribution of rhetorical-relation labels. To address this problem, we propose a novel approach for improving performance on low-frequency labels. Our approach utilizes a silver dataset obtained from existing parsers acting as teacher parsers: we extract agreement subtrees from the RST trees built by multiple teacher parsers to obtain a more reliable silver dataset. As the student parser, we use a span-based top-down RST parser, a neural state-of-the-art model. In our training procedure, we first pre-train the student parser on the silver dataset and then fine-tune it on a gold, human-annotated dataset. Experimental results show that our parser achieved excellent scores for nuclearity and relation labeling, 64.7 and 54.1, respectively, under Original Parseval.</p>
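The agreement-subtree idea described above can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: it assumes RST trees are flattened into labeled spans over EDU indices, and keeps as silver data only the spans on which two teacher parsers agree in both boundaries and labels. All function and variable names are hypothetical.

```python
# Hedged sketch of agreement-span extraction from two teacher parses.
# A tree node is (start, end, nuclearity, relation, children); spans are
# over EDU indices. Only spans identical in both parses are kept as silver.

def flatten(tree):
    """Flatten a nested RST tree into {(start, end): (nuclearity, relation)}."""
    start, end, nuc, rel, children = tree
    spans = {(start, end): (nuc, rel)}
    for child in children:
        spans.update(flatten(child))
    return spans

def agreement_spans(tree_a, tree_b):
    """Return the spans on which both teacher parsers fully agree."""
    a, b = flatten(tree_a), flatten(tree_b)
    return {span: label for span, label in a.items() if b.get(span) == label}

# Toy example: two parses of a 4-EDU document that disagree on one relation.
t1 = (0, 4, "root", "Root",
      [(0, 2, "N", "Elaboration", []),
       (2, 4, "S", "Attribution", [])])
t2 = (0, 4, "root", "Root",
      [(0, 2, "N", "Elaboration", []),
       (2, 4, "S", "Background", [])])

silver = agreement_spans(t1, t2)
# spans (0, 4) and (0, 2) agree; (2, 4) is discarded (relation mismatch)
```

In practice the paper extracts whole agreement *subtrees* rather than individual spans, but the span-level intersection above conveys the core filtering step: disagreement between teachers is treated as a signal of unreliable silver annotation.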
