Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation
- Ma Chunpeng (Harbin Institute of Technology)
- Tamura Akihiro (Ehime University)
- Utiyama Masao (National Institute of Information and Communications Technology)
- Zhao Tiejun (Harbin Institute of Technology)
- Sumita Eiichiro (National Institute of Information and Communications Technology)
Description
The encoder-decoder attention matrix has been regarded as the (soft) alignment model for conventional neural machine translation (NMT) models such as RNN-based models. However, we show empirically that this is not true for the Transformer. By comparing the Transformer with the RNN-based NMT model, we find two inherent differences, and accordingly present two methods of capturing word alignments in the Transformer. Furthermore, instead of focusing on the Transformer, we present three axioms for the attention mechanism that captures word alignments, and propose a new attention mechanism based on these axioms that we have termed the axiomatic attention mechanism (AAM), which is applicable to any NMT model. The AAM depends on a perturbation function, and we apply several perturbation functions to the AAM, including a novel function based on the masked language model (Devlin, Chang, Lee, and Toutanova 2019). Using the AAM to guide the training of an NMT model improved both the translation performance and the learning of word alignments of the NMT model. Our research sheds light on the interpretation of sequence-to-sequence models in neural machine translation.
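The abstract's starting point, reading word alignments off a soft attention matrix, can be illustrated with a minimal sketch. This is a generic argmax heuristic, not the authors' AAM; the matrix values and function name below are hypothetical, for illustration only.

```python
import numpy as np

# Toy soft attention matrix: rows = target tokens, columns = source tokens.
# The weights are made up; in practice they would come from a model's
# encoder-decoder attention.
attention = np.array([
    [0.7, 0.2, 0.1],   # target token 0 attends mostly to source token 0
    [0.1, 0.8, 0.1],   # target token 1 -> source token 1
    [0.2, 0.1, 0.7],   # target token 2 -> source token 2
])

def hard_alignments(attn):
    """For each target position, pick the source position with the
    highest attention weight, yielding (target, source) alignment pairs."""
    return [(t, int(np.argmax(row))) for t, row in enumerate(attn)]

print(hard_alignments(attention))  # [(0, 0), (1, 1), (2, 2)]
```

The paper's empirical claim is that, for the Transformer, this direct reading of encoder-decoder attention does not yield good word alignments, which motivates the axiomatic approach.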
Journal
- Journal of Natural Language Processing, 27 (3), 531-552, 2020-09-15
- The Association for Natural Language Processing
Details
- CRID: 1390568456336515584
- NII Article ID: 130007956043
- NII Book ID: AN10472659
- ISSN: 21858314, 13407619
- NDL BIB ID: 030660404
- Text Lang: en
- Data Source: JaLC, NDL, Crossref, CiNii Articles, KAKEN
- Abstract License Flag: Disallowed