Bibliographic Information
- Alternative title
  - Data Augmentation Using Pretrained Models in Japanese Grammatical Error Correction
Abstract
<p>Grammatical error correction (GEC) is commonly framed as a machine translation task that converts an ungrammatical sentence into a grammatical one. This task requires a large amount of parallel data consisting of pairs of ungrammatical and grammatical sentences. However, for Japanese GEC, only limited large-scale parallel data are available. Therefore, data augmentation (DA), which generates pseudo-parallel data, is being actively researched. Many previous studies have focused on generating ungrammatical sentences rather than grammatical sentences. To address this, this study proposes BERT-DA, a DA algorithm that generates grammatical sentences using a pre-trained BERT model. In our experiments, we focused on two factors: the source data and the amount of generated data. Accounting for these factors made BERT-DA more effective. Across evaluations on multiple domains, the BERT-DA model outperformed the existing system in terms of Max Match and GLEU+.</p>
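This record does not include the paper's algorithmic details, but the abstract describes generating grammatical sentences with a pre-trained BERT model. A common way to do this is mask-and-fill: mask a token in a sentence and let the masked language model propose a replacement, yielding a new grammatical sentence for pseudo-parallel data. The sketch below illustrates only that general idea; `TOY_FILLS` and `fill_mask` are hypothetical stand-ins for a real Japanese BERT (e.g. via a fill-mask model), so the example runs without downloading anything and should not be read as the paper's actual method.

```python
import random

# Hypothetical stand-in for a pre-trained BERT masked LM: maps a masked
# sentence to plausible fillers. A real BERT-DA setup would query an
# actual (Japanese) masked language model here instead.
TOY_FILLS = {
    "[MASK] went to school": ["She", "He", "Ken"],
    "She [MASK] to school": ["went", "walked"],
    "She went [MASK] school": ["to"],
    "She went to [MASK]": ["school", "class"],
}

def fill_mask(masked: str, rng: random.Random) -> str:
    """Return one plausible token for the [MASK] slot (toy model)."""
    return rng.choice(TOY_FILLS.get(masked, ["the"]))

def mask_and_fill(sentence: str, rng: random.Random) -> str:
    """Mask one random token and let the masked LM rewrite it, producing
    a new grammatical sentence for the target side of pseudo-parallel data."""
    tokens = sentence.split()
    i = rng.randrange(len(tokens))
    masked = " ".join("[MASK]" if j == i else t for j, t in enumerate(tokens))
    tokens[i] = fill_mask(masked, rng)
    return " ".join(tokens)

rng = random.Random(0)
for _ in range(3):
    print(mask_and_fill("She went to school", rng))
```

Each call changes at most one token, so the outputs stay grammatical variants of the seed sentence; an ungrammatical source side would then be produced separately (e.g. by rule-based noising) to complete each pseudo-parallel pair.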
Journal
- Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌), 38 (4), A-L41_1-10, 2023-07-01
- The Japanese Society for Artificial Intelligence (一般社団法人 人工知能学会)
Details
- CRID: 1390015191520703488
- ISSN: 13468030, 13460714
- Text language code: ja
- Data source type: JaLC, Crossref
- Abstract license flag: Not available for use