Neural Machine Translation of Classical Japanese Texts in the Late 19th Century Using Pretrained Language Models

喜友名 朝視顕, 平澤 寅庄, 小町 守, 小木曽 智信

doi:10.20729/00216233

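Modern literary-style Japanese (kindai bungobun), the written style widely used in the Meiji and Taisho periods, is difficult for present-day Japanese speakers to read without specialized knowledge. One characteristic of this style is that it shares many words with present-day written Japanese but has almost no n-grams with large n in common with it. This study focuses on the translation of “An Encouragement of Learning” (学問のすゝめ, 1872-1876) and “Three Treasures in Human Life” (人世三宝説, 1875). Because few parallel corpora are available for training neural translation models, we use pretrained models. Experimental results show that using only a monolingual corpus, without any parallel corpus, yields outputs that are highly similar to the source text. In addition, we investigate how well existing automatic evaluation metrics and their variants account for post-editing cost.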
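The observation that source and reference share many words but few longer n-grams can be made concrete with a clipped n-gram overlap count, the kind of statistic BLEU-style metrics build on. The following is a minimal illustrative sketch, not the authors' method: the tokens are a hypothetical toy pair rather than data from the paper, and real Japanese text would first need word segmentation, which is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_ratio(source, reference, n):
    """Fraction of reference n-grams (counted with multiplicity) that also
    occur in the source, clipping each count as in BLEU's modified precision."""
    ref = ngrams(reference, n)
    if not ref:
        return 0.0
    src = ngrams(source, n)
    # A missing key in a Counter yields 0, so unmatched n-grams add nothing.
    matched = sum(min(count, src[g]) for g, count in ref.items())
    return matched / sum(ref.values())


# Hypothetical toy pair: the same five words in a different order.
source = ["A", "B", "C", "D", "E"]
reference = ["E", "C", "A", "D", "B"]

for n in (1, 2):
    print(n, overlap_ratio(source, reference, n))
```

In this toy case the unigram overlap is 1.0 while the bigram overlap is 0.0: reordering alone is enough to separate the two statistics.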

Classical Japanese texts from the late 19th century are difficult for contemporary Japanese readers to understand without specialized knowledge. Specifically, this paper focuses on the translation of “An Encouragement of Learning” (1872-1876) and “Three Treasures in Human Life” (1875), which have many identical unigrams between the source and reference sentences but no significant overlap in larger n-grams. We approach this task by using pretrained language models to address the associated data acquisition bottleneck. The results show that an unsupervised method, without fine-tuning on parallel data, produces translations with a high degree of similarity to the source text. In addition, we investigate the extent to which existing automatic evaluation metrics and their variants account for post-editing cost.
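The abstract does not say which metric variants were studied, so the following only illustrates the general idea behind edit-based measures of post-editing cost: count the character edits needed to turn a system output into its post-edited version, normalised by the length of the post-edited text. This is a minimal sketch in the spirit of HTER (which also counts block shifts; this version does not). The function names are hypothetical, and character level is chosen because Japanese is not whitespace-segmented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn a into b (dynamic programming, O(len(a) * len(b)))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_rate(hypothesis: str, post_edited: str) -> float:
    """Character edits from MT output to its post-edited version, divided by
    the post-edited length (guarded against division by zero)."""
    return levenshtein(hypothesis, post_edited) / max(len(post_edited), 1)


print(levenshtein("kitten", "sitting"))
print(edit_rate("abc", "abd"))
```

A lower rate means less post-editing. Whether a given automatic metric tracks this rate is the kind of question the abstract says the authors examined.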