Bibliographic Information
- Other Title
- 説明文を対象とした日本語文末述語の平易化 (Simplification of Japanese Sentence-Ending Predicates in Descriptive Text)
Description
Japanese sentence-ending predicates often consist of a complex combination of content words and functional expressions, such as aspect, modality, and politeness markers, which frequently hinders reading comprehension for learners of Japanese. Most conventional lexical simplification methods replace difficult words with simpler synonyms one word at a time, and are therefore not always suitable for simplifying such predicates. Here, we propose a method that paraphrases an entire sentence-ending predicate into a simpler one, while following the basic lexical simplification process of detecting difficult expressions and then generating, validating, and ranking paraphrase candidates. The principal feature of the method is that candidate generation makes effective use of BERT, a pre-trained masked language model, to simplify the whole predicate at once while preserving the core meaning of the sentence; this makes it possible to generate a diverse range of candidate expressions. In a human evaluation on descriptive text, the proposed method consistently produced many more candidates that are both fluent and adequate than multiple baseline methods. Furthermore, we conducted in-depth analyses of (1) the effectiveness of average token embeddings and dropout, (2) the simplicity of the generated candidates, (3) differences in performance across text domains, and (4) the remaining errors of the proposed method, clarifying the characteristics of the method's behavior and directions for improvement.
Journal
- 情報処理学会論文誌 (IPSJ Journal)
情報処理学会論文誌 62 (9), 1605-1619, 2021-09-15
Details
- CRID: 1390009225897246976
- NII Article ID: 170000185550
- NII Book ID: AN00116647
- ISSN: 18827764
- Web Site: http://id.nii.ac.jp/1001/00212765/
- Text Lang: ja
- Article Type: journal article
- Data Source: JaLC, IRDB, CiNii Articles, KAKEN