Sense-Aware Decoder for Character Based Japanese-Chinese NMT

LI Zezhong, REN Fuji

doi:10.1587/transinf.2023edl8059

<p>Compared to subword-based Neural Machine Translation (NMT), character-based NMT eschews linguistically motivated segmentation and operates directly on the raw character sequence, following a more thoroughly end-to-end approach. This property is especially attractive for machine translation (MT) between Japanese and Chinese, both of which use consecutive logographic characters without explicit word boundaries. However, one disadvantage remains to be addressed: a character is a less meaning-bearing unit than a subword, which requires character models to be capable of sense discrimination. Specifically, two types of sense ambiguity exist, one in the source language and one in the target language. The former has been partially solved by the deep encoder and several existing works. The latter, ambiguity on the target side, is interestingly rarely discussed. To address this problem, we propose two simple yet effective methods: a non-parametric pre-clustering for sense induction, and a joint model that performs sense discrimination and NMT training simultaneously. Extensive experiments on Japanese↔Chinese MT show that our proposed methods consistently outperform strong baselines and verify the effectiveness of using sense-discriminated representations for character-based NMT.</p>
