Quantifying Appropriateness of Summarization Data for Curriculum Learning

Kano Ryuji, Taniguchi Tomoki, Ohkuma Tomoko

doi:10.5715/jnlp.29.144

Bibliographic Information

Other Title

要約データの適切性定量化を利用したカリキュラムラーニング

Description

Previous research on summarization models has treated titles as summaries of their source texts. However, many studies have reported that such training data are noisy. We propose an effective curriculum learning method for training summarization models on noisy data. Curriculum learning improves performance by ordering training data according to difficulty or noisiness, and it is effective for training models on noisy data. However, previous research has not applied curriculum learning to summarization tasks. One aim of this research is to validate the effectiveness of curriculum learning for summarization tasks. In translation tasks, previous research quantified noise using two models trained on noisy and clean corpora. Because such corpora do not exist in the summarization field, it is difficult to apply this method to summarization tasks. Another aim of this research is to propose a model that can quantify noise using a single noisy corpus. The proposed model, the Appropriateness Estimator, is trained to distinguish correct source-summary pairs from randomly assigned pairs. Through this training, the model learns to compute the appropriateness of source-summary pairs. We conduct experiments on three summarization models and verify that curriculum learning and our method improve performance.
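
The two ideas in the abstract, building a training signal from randomly reassigned pairs and ordering data by an appropriateness score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (make_training_pairs, curriculum_order, overlap_score) are hypothetical, the token-overlap scorer is only a toy stand-in for a trained Appropriateness Estimator, and presenting the most appropriate pairs first is an assumed ordering that the abstract does not specify.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source text, summary)


def make_training_pairs(pairs: List[Pair], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Build classifier examples: original pairs get label 1,
    pairs with a randomly reassigned summary get label 0."""
    rng = random.Random(seed)
    n = len(pairs)
    examples = [(src, summ, 1) for src, summ in pairs]
    if n < 2:
        return examples  # no other summary to swap in
    for i, (src, _) in enumerate(pairs):
        # Draw an index j != i so the negative never reuses the true summary slot.
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        examples.append((src, pairs[j][1], 0))
    return examples


def overlap_score(source: str, summary: str) -> float:
    """Toy stand-in for a trained estimator (assumption):
    fraction of summary tokens that also occur in the source."""
    src_tokens = set(source.lower().split())
    sum_tokens = summary.lower().split()
    if not sum_tokens:
        return 0.0
    return sum(t in src_tokens for t in sum_tokens) / len(sum_tokens)


def curriculum_order(pairs: List[Pair], score: Callable[[str, str], float]) -> List[Pair]:
    """Sort pairs from highest to lowest score (assumed ordering)."""
    return sorted(pairs, key=lambda p: score(p[0], p[1]), reverse=True)


pairs = [
    ("the cat sat on the mat", "cat sat on mat"),
    ("stocks fell sharply on monday", "click here for more"),
    ("rain is expected tomorrow in tokyo", "rain expected in tokyo"),
]
examples = make_training_pairs(pairs, seed=1)
ordered = curriculum_order(pairs, overlap_score)
```

In this sketch the pair whose "summary" shares no tokens with its source is placed last, so a summarization model trained in this order would see it only late in training. A real estimator would be a learned classifier trained on the labeled examples rather than a token-overlap heuristic.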

Journal

Journal of Natural Language Processing

Journal of Natural Language Processing 29 (1), 144-165, 2022

The Association for Natural Language Processing

Details

CRID: 1390291767636827904

DOI: 10.5715/jnlp.29.144

ISSN: 21858314; 13407619

Web Site: https://www.jstage.jst.go.jp/article/jnlp/29/1/29_144/_pdf

Text Lang: ja

Data Source

JaLC
Crossref
OpenAIRE

Abstract License Flag: Disallowed
