Construction of Japanese Semantic Role Labeling System Using Hierarchical Tag Context Trees Extracted from Tail Expressions of Dependency Elements

石原 靖弘 (Yasuhiro Ishihara), 竹内 孔一 (Koichi Takeuchi)

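In recent years, research on semantic role labeling, which classifies the relations between a predicate and its dependents into types, has advanced in natural language processing, mainly for English. English semantic role labeling uses syntactic parse information as features for statistical learning models. For Japanese, on the other hand, effective features have been proposed for predicate-argument structure analysis, which uses surface cases as relation labels, and these are expected to be similarly effective for semantic role labeling. Whether other effective features exist for semantic role labeling, which handles a more diverse set of relation labels, remains open to investigation. However, little semantic role labeled data exists for Japanese, so such investigation has hardly been carried out so far. We therefore build a semantic role labeling system based on a statistical learning model on recently constructed Japanese semantic role labeled data, and clarify which features contribute to improved accuracy. In this paper, we focus on the fact that Japanese functional expressions are involved in determining semantic roles, and use HTCT (hierarchical tag context trees), which acquire hierarchical variable-length n-grams from a corpus, as a feature extractor. We show that labeling accuracy improves in semantic role labeling experiments when features obtained from HTCT are used. We also show that features extracted with HTCT are more effective than those obtained with a function-word dictionary or a long-unit analyzer. Furthermore, we show that combining them with fixed-length n-gram features acquired directly from the corpus yields better accuracy. Finally, we show that the proposed features, including HTCT, also work effectively when used together with features from predicate-argument structure analysis.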

In recent natural language processing research, semantic role labeling (SRL), which determines the semantic relations between predicates and their arguments, has been studied mainly for English. In English, syntactic features are effective for semantic role labelers based on statistical learning models. In Japanese, on the other hand, effective features have been identified in case marker-based predicate-argument structure analysis, and they should also be effective for Japanese SRL because the tasks are similar. These features alone, however, might not be enough for Japanese SRL because of the diversity of the labels it considers. Until recently, few Japanese language resources could be used for SRL, and consequently there has been little research on effective features for it. This paper identifies effective features for SRL systems based on a statistical learning approach, using a recently released Japanese semantic role labeled corpus. In a preliminary study, we found that functional multi-word expressions in the arguments have a great influence on determining their semantic role labels. We therefore exploit hierarchical tag context trees (HTCTs), which can obtain variable-length n-grams, to extract generalized functional multi-word expressions as features for SRL systems. The experimental results show that SRL systems with the added HTCT features outperform the system that uses features extracted with a dictionary of multi-word expressions. Additionally, systems that use both the variable-length features from HTCTs and fixed-length features extracted directly from the corpus achieve better accuracy. An experimental comparison with the features developed for Japanese predicate-argument structure analysis shows that the proposed features remain effective when used together with them.
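As a rough illustration of the fixed-length n-gram features mentioned above, the following Python sketch takes the last few tokens of an argument phrase and turns them into feature strings. It is not the HTCT method, which selects variable-length n-grams hierarchically from a corpus; it only shows the simpler fixed-length baseline. The function name `tail_ngram_features`, the feature-string format, and the tokenization of the example phrase are all assumptions made for illustration and do not come from the paper.

```python
from typing import List


def tail_ngram_features(tokens: List[str], max_n: int = 3) -> List[str]:
    """Build fixed-length n-gram features from the tail of a token list.

    Hypothetical helper: for each n from 1 up to max_n (or the phrase
    length, whichever is smaller), join the last n tokens into one
    feature string.
    """
    feats = []
    for n in range(1, min(max_n, len(tokens)) + 1):
        feats.append(f"tail{n}=" + "_".join(tokens[-n:]))
    return feats


# Assumed short-unit tokenization of the argument 「会議について」.
example = ["会議", "に", "つい", "て"]
print(tail_ngram_features(example))
```

Features of this kind could be passed to any statistical classifier as sparse indicator features. The functional expression at the end of the argument (here, "について") is the part the abstract identifies as informative for the semantic role.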
