- Chay-intr Thodsaporn, School of Engineering, Tokyo Institute of Technology
- Kamigaito Hidetaka, Division of Information Science, Nara Institute of Science and Technology
- Funakoshi Kotaro, Institute of Innovative Research, Tokyo Institute of Technology
- Okumura Manabu, Institute of Innovative Research, Tokyo Institute of Technology
Abstract

A character sequence admits at least one segmentation alternative, and often more than one. This segmentation ambiguity may weaken word segmentation performance, and handling it properly reduces ambiguous decisions on word boundaries. Previous work has achieved remarkable segmentation performance and alleviated the ambiguity problem by incorporating a lattice, owing to its ability to capture segmentation alternatives, along with graph-based and pre-trained models. However, the multi-granularity information in a lattice, including characters and words, may not be attentively exploited when encoded with such models. To strengthen multi-granularity representations in a lattice, we propose the Lattice ATTentive Encoding (LATTE) method for character-based word segmentation. Our model employs the lattice structure to handle segmentation alternatives and utilizes graph neural networks with an attention mechanism to attentively extract multi-granularity representations from the lattice, complementing the character representations. Our experimental results demonstrated improvements in segmentation performance on the BCCWJ, CTB6, and BEST2010 datasets in three languages: Japanese, Chinese, and Thai.
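The abstract's core idea, representing the segmentation alternatives of a character sequence as a word lattice, can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's LATTE implementation: the dictionary, the `max_len` bound, and the path enumeration are assumptions for demonstration only (the paper encodes the lattice with graph neural networks rather than enumerating paths).

```python
def build_lattice(chars, vocab, max_len=4):
    """Enumerate candidate word edges over a character sequence.

    Each edge (i, j, word) marks a dictionary word spanning chars[i:j].
    Single characters are always included, so every sequence has at
    least one segmentation path (the all-characters path).
    """
    n = len(chars)
    edges = []
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            cand = "".join(chars[i:j])
            if j == i + 1 or cand in vocab:
                edges.append((i, j, cand))
    return edges


def enumerate_segmentations(chars, edges):
    """List every complete path through the lattice, i.e. every
    segmentation alternative of the character sequence."""
    n = len(chars)
    out_edges = {}
    for i, j, w in edges:
        out_edges.setdefault(i, []).append((j, w))

    def walk(i):
        if i == n:
            return [[]]
        paths = []
        for j, w in out_edges.get(i, []):
            for rest in walk(j):
                paths.append([w] + rest)
        return paths

    return walk(0)
```

For example, with the sequence `abc` and a toy dictionary `{"ab", "bc"}`, the lattice yields three alternatives: `a|b|c`, `ab|c`, and `a|bc`; it is exactly this multiplicity of paths that the abstract calls segmentation ambiguity.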
Journal

- 自然言語処理 (Journal of Natural Language Processing), 30 (2), 456-488, 2023
- The Association for Natural Language Processing
Details
- CRID: 1390577917543772544
- ISSN: 2185-8314 (online), 1340-7619 (print)
- Text language: en
- Data sources: JaLC, Crossref
- Abstract license flag: unavailable