Neural RST-Style Discourse Parsing Exploiting Agreement Sub-trees as Silver Data

Bibliographic Information

Other Title
  • 疑似正解データを活用したニューラル修辞構造解析 (Neural rhetorical structure parsing exploiting pseudo-gold data)

Abstract

<p>Recent Rhetorical Structure Theory (RST)-style discourse parsing methods are trained by supervised learning and therefore require an annotated corpus of sufficient size and quality. However, the RST Discourse Treebank, the most extensive such corpus, consists of only 385 documents, which is insufficient for learning the long-tailed distribution of rhetorical-relation labels. To address this problem, we propose a novel approach for improving performance on low-frequency labels. Our approach utilizes a silver dataset obtained from existing parsers acting as teacher parsers: we extract agreement subtrees from the RST trees built by multiple teacher parsers to obtain a more reliable silver dataset. As the student parser, we use a span-based top-down RST parser, a neural state-of-the-art model. In our training procedure, we first pre-train the student parser on the silver dataset and then fine-tune it on a gold, human-annotated dataset. Experimental results show that our parser achieved excellent scores for nuclearity and relation labeling, 64.7 and 54.1, respectively, under Original Parseval.</p>
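The agreement-subtree idea described above can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: it assumes RST trees are flattened into labeled spans over EDU indices, and keeps as silver data only the spans on which two teacher parsers agree in both boundaries and labels. All function and variable names are hypothetical.

```python
# Hedged sketch of agreement-span extraction from two teacher parses.
# A tree node is (start, end, nuclearity, relation, children); spans are
# over EDU indices. Only spans identical in both parses are kept as silver.

def flatten(tree):
    """Flatten a nested RST tree into {(start, end): (nuclearity, relation)}."""
    start, end, nuc, rel, children = tree
    spans = {(start, end): (nuc, rel)}
    for child in children:
        spans.update(flatten(child))
    return spans

def agreement_spans(tree_a, tree_b):
    """Return the spans on which both teacher parsers fully agree."""
    a, b = flatten(tree_a), flatten(tree_b)
    return {span: label for span, label in a.items() if b.get(span) == label}

# Toy example: two parses of a 4-EDU document that disagree on one relation.
t1 = (0, 4, "root", "Root",
      [(0, 2, "N", "Elaboration", []),
       (2, 4, "S", "Attribution", [])])
t2 = (0, 4, "root", "Root",
      [(0, 2, "N", "Elaboration", []),
       (2, 4, "S", "Background", [])])

silver = agreement_spans(t1, t2)
# spans (0, 4) and (0, 2) agree; (2, 4) is discarded (relation mismatch)
```

In practice the paper extracts whole agreement *subtrees* rather than individual spans, but the span-level intersection above conveys the core filtering step: disagreement between teachers is treated as a signal of unreliable silver annotation.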
