An Efficient Method of Determining Field Association Terms of Compound Words
- TSUJI TAKAKO, Dept. of Information Science & Intelligent Systems, The University of Tokushima
- FUKETA MASAO, Dept. of Information Science & Intelligent Systems, The University of Tokushima
- MORITA KAZUHIRO, Dept. of Information Science & Intelligent Systems, The University of Tokushima
- AOE JUN-ICHI, Dept. of Information Science & Intelligent Systems, The University of Tokushima
Bibliographic Information
- Other Title: 複合語の分野連想語の効率的決定法 (フクゴウゴ ノ ブンヤ レンソウゴ ノ コウリツテキ ケッテイホウ)
Description
Although much research on text classification relies on term information drawn from the whole text, humans can recognize the field of a text from a small number of specific words in it. In this paper, such words, which can be directly related to the field of a text, are called field association (FA) terms. Single-word FA terms can be collected because their number is finite, but selecting useful compound FA terms from the huge number of combinations of single-word FA terms is difficult. For FA terms, five association levels are defined, and two kinds of ranks, based on stability and inheritance, are presented. Using the levels and ranks, redundant candidates for compound FA terms can be removed effectively. Simulation results on Japanese text files covering 180 fields show that the 88,782 candidates for compound FA terms can be reduced to 8,405, about 9% of the original number, with recall above 0.77 and precision above 0.90. Experimental results of field determination using FA terms on 264 text fragments show that the presented method attains an accuracy of more than 90%, about 30% higher than when only single-word FA terms are used.
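The reduction figure and the recall and precision measures quoted above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration under stated assumptions: the function names `reduction_ratio` and `precision_recall` and the toy term sets are hypothetical and not taken from the paper, and precision and recall are assumed to follow their standard set-based definitions, which the abstract does not spell out.

```python
def reduction_ratio(original: int, remaining: int) -> float:
    """Fraction of candidates that remain after pruning."""
    return remaining / original


def precision_recall(selected: set, relevant: set) -> tuple:
    """Standard set-based precision and recall (assumed definitions)."""
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Candidate counts reported in the abstract.
ratio = reduction_ratio(88_782, 8_405)
print(f"{ratio:.1%}")  # 9.5%

# Hypothetical toy data: terms kept by a filter vs. terms judged correct.
selected_terms = {"a", "b", "c"}
relevant_terms = {"b", "c", "d", "e"}
precision, recall = precision_recall(selected_terms, relevant_terms)
```

The computed ratio of roughly 9.5% is consistent with the "about 9%" stated in the description. In the toy example, two of the three selected terms are relevant, so precision is 2/3 and recall is 2/4.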
Journal
- Journal of Natural Language Processing 7 (2), 3-26, 2000
- The Association for Natural Language Processing
Details
- CRID: 1390282679452139392
- NII Article ID: 10008829582
- NII Book ID: AN10472659
- ISSN: 21858314, 13407619
- NDL BIB ID: 5437692
- Text Lang: ja
- Data Source: JaLC, NDL Search, Crossref, CiNii Articles
- Abstract License Flag: Disallowed