Applying Suffix Rules to Organization Name Recognition
- INUI Takashi
  Graduate School of Systems and Information Engineering, University of Tsukuba
- MURAKAMI Koji
  Graduate School of Information Science, Nara Institute of Science and Technology
- HASHIMOTO Taiichi
  Integrated Research Institute, Tokyo Institute of Technology
- UTSUMI Kazuo
  Integrated Research Institute, Tokyo Institute of Technology
- ISHIKAWA Masamichi
  Integrated Research Institute, Tokyo Institute of Technology
Bibliographic Information
- Other Title: 接尾辞情報を利用した文書からの組織名抽出 (Extracting Organization Names from Documents Using Suffix Information)
Abstract
This paper presents a method for improving the performance of organization name recognition, a subtask of named entity recognition (NER). Gazetteers (lists of NEs) are known to be effective features for supervised machine learning approaches to NER, but previous methods have applied them in a very simple way: gazetteers were used only to find exact matches between the input text and the NEs they contain. The proposed method generates regular expression rules from gazetteers and uses these rules to perform high-coverage searches based on looser matches between the input text and NEs. To generate the rules, we focus on two well-known characteristics of NE expressions: 1) most NE expressions can be divided into two parts, a class-reference part and an instance-reference part; 2) in most NE expressions, the class-reference part is located at the suffix position. A pattern mining algorithm is run on the set of NEs in the gazetteers to find frequent word sequences from which NEs are constructed. We then adopt as suffix rules only those word sequences that have a class-reference part at the suffix position. Experimental results show that the proposed method improves the performance of organization name recognition, achieving an F-value of 84.58 on the evaluation data.
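The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English gazetteer, whitespace tokenization, the frequency threshold, and the names `mine_suffixes`, `build_rule`, `GAZETTEER`, and `RULE` are all assumptions made for illustration, and simple suffix counting stands in for the pattern mining algorithm the paper uses.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer; the paper works on Japanese organization names.
GAZETTEER = [
    "Kyoto University",
    "Waseda University",
    "Toyota Motor Corporation",
    "Sony Corporation",
    "Mizuho Bank",
]


def mine_suffixes(gazetteer, min_count=2, max_len=2):
    """Return trailing word sequences seen at least min_count times.

    Frequent trailing sequences are treated as candidate class-reference
    parts (an assumption; the paper filters mined patterns differently).
    """
    counts = Counter()
    for name in gazetteer:
        words = name.split()
        # Keep at least one leading word as the instance-reference part.
        for n in range(1, min(max_len, len(words) - 1) + 1):
            counts[tuple(words[-n:])] += 1
    return [suffix for suffix, c in counts.items() if c >= min_count]


def build_rule(suffixes):
    """Compile a regex: one or more capitalized words, then a mined suffix."""
    # Try longer suffixes first so the alternation prefers them.
    alts = sorted((" ".join(s) for s in suffixes), key=len, reverse=True)
    alternation = "|".join(re.escape(a) for a in alts)
    return re.compile(r"\b(?:[A-Z][\w&-]*\s+)+(?:" + alternation + r")\b")


RULE = build_rule(mine_suffixes(GAZETTEER))

if __name__ == "__main__":
    text = "She joined Osaka University before moving to Hitachi Corporation."
    # Neither name is in the gazetteer, so exact matching would miss both.
    print(RULE.findall(text))
```

In this sketch the rule matches names that share a suffix with gazetteer entries even though the names themselves are not listed, which is the kind of looser matching the abstract describes. A real system would feed such matches to the supervised recognizer as features rather than treat them as final decisions.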
Journal
- Transactions of the Japanese Society for Artificial Intelligence 24 (6), 469-479, 2009
The Japanese Society for Artificial Intelligence
Details
- CRID: 1390282680085791744
- NII Article ID: 130000137882
- ISSN: 13468030, 13460714
- Text Lang: ja
- Data Source: JaLC, Crossref, CiNii Articles
- Abstract License Flag: Disallowed