Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation


Description

The encoder-decoder attention matrix has been regarded as a (soft) word-alignment model for conventional neural machine translation (NMT) models such as RNN-based models. However, we show empirically that this does not hold for the Transformer. By comparing the Transformer with the RNN-based NMT model, we identify two inherent differences and accordingly present two methods for capturing word alignments in the Transformer. Furthermore, rather than focusing on the Transformer alone, we present three axioms for an attention mechanism that captures word alignments and propose a new attention mechanism based on these axioms, termed the axiomatic attention mechanism (AAM), which is applicable to any NMT model. The AAM depends on a perturbation function, and we apply several perturbation functions to the AAM, including a novel function based on the masked language model (Devlin, Chang, Lee, and Toutanova 2019). Using the AAM to guide the training of an NMT model improves both the translation performance and the word alignments learned by the model. Our research sheds light on the interpretation of sequence-to-sequence models in neural machine translation.
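The abstract does not give the concrete formulation of the AAM, but the underlying perturbation idea can be illustrated with a minimal sketch: perturb (here, mask) one source word at a time and align each target token to the source position whose perturbation changes that token's score the most. The names below (perturbation_alignment, score_target, toy_score) are hypothetical placeholders, not the paper's implementation; toy_score stands in for an NMT model's per-token target probabilities.

    # Minimal sketch of perturbation-based word alignment (assumed formulation,
    # not the paper's AAM): mask each source token and measure the effect on
    # the target-side scores.
    import numpy as np

    def perturbation_alignment(src_tokens, tgt_tokens, score_target, mask_token="<mask>"):
        """Align each target token to the source token whose masking
        changes that target token's score the most."""
        base = score_target(src_tokens, tgt_tokens)            # shape: (len(tgt),)
        influence = np.zeros((len(tgt_tokens), len(src_tokens)))
        for i in range(len(src_tokens)):
            perturbed = list(src_tokens)
            perturbed[i] = mask_token                           # perturbation: mask one source word
            influence[:, i] = np.abs(base - score_target(perturbed, tgt_tokens))
        return influence.argmax(axis=1)                         # one source index per target token

    # Toy scorer standing in for an NMT model: a target token scores high
    # if its lowercase form appears among the source tokens.
    def toy_score(src_tokens, tgt_tokens):
        src_set = {t.lower() for t in src_tokens}
        return np.array([1.0 if t.lower() in src_set else 0.1 for t in tgt_tokens])

    if __name__ == "__main__":
        src = ["Das", "Haus", "ist", "klein"]
        tgt = ["Das", "Haus", "ist", "klein"]   # identity toy example
        print(perturbation_alignment(src, tgt, toy_score))      # -> [0 1 2 3]

In the paper, the perturbation function is the component that varies (including the masked-language-model-based variant), while the AAM itself is used as a training signal for the NMT model rather than as a post-hoc extraction step like this sketch.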

Published in

  • 自然言語処理 (Journal of Natural Language Processing)

    自然言語処理 27 (3), 531-552, 2020-09-15

    The Association for Natural Language Processing (一般社団法人 言語処理学会)
