Sense-Aware Decoder for Character Based Japanese-Chinese NMT

  • LI Zezhong
    Faculty of Computer Science, Zhejiang University of Water Resources and Electric Power
  • REN Fuji
    Faculty of Computer Science, University of Electronic Science and Technology of China

抄録

<p>Compared to subword based Neural Machine Translation (NMT), character based NMT eschews linguistic-motivated segmentation which performs directly on the raw character sequence, following a more absolute end-to-end manner. This property is more fascinating for machine translation (MT) between Japanese and Chinese, both of which use consecutive logographic characters without explicit word boundaries. However, there is still one disadvantage which should be addressed, that is, character is a less meaning-bearing unit than the subword, which requires the character models to be capable of sense discrimination. Specifically, there are two types of sense ambiguities existing in the source and target language, separately. With the former, it has been partially solved by the deep encoder and several existing works. But with the later, interestingly, the ambiguity in the target side is rarely discussed. To address this problem, we propose two simple yet effective methods, including a non-parametric pre-clustering for sense induction and a joint model to perform sense discrimination and NMT training simultaneously. Extensive experiments on Japanese↔Chinese MT show that our proposed methods consistently outperform the strong baselines, and verify the effectiveness of using sense-discriminated representation for character based NMT.</p>

収録刊行物

参考文献 (11)*注記

もっと見る

詳細情報 詳細情報について

問題の指摘

ページトップへ