An Efficient Method of Determining Field Association Terms of Compound Words

  • TSUJI TAKAKO
    Dept. of Information Science & Intelligent Systems, The University of Tokushima
  • FUKETA MASAO
    Dept. of Information Science & Intelligent Systems, The University of Tokushima
  • MORITA KAZUHIRO
    Dept. of Information Science & Intelligent Systems, The University of Tokushima
  • AOE JUN-ICHI
    Dept. of Information Science & Intelligent Systems, The University of Tokushima

Bibliographic Information

Other Title
  • 複合語の分野連想語の効率的決定法
  • フクゴウゴ ノ ブンヤ レンソウゴ ノ コウリツテキ ケッテイホウ

Abstract

Although much research on text classification relies on term information drawn from the whole text, humans can recognize the field of a text by finding a small number of specific words in it. In this paper, such terms, which can be directly related to the field of a text, are called field association (FA) terms. Single-word FA terms can be collected because their number is finite, but it is difficult to select useful compound FA terms from the huge number of combinations of single-word FA terms. For FA terms, five association levels are defined, and two kinds of ranks, based on stability and inheritance, are presented. Using the levels and the ranks, a large share of the redundant candidates for compound FA terms can be removed. Simulation results on Japanese text files covering 180 fields show that the 88,782 candidates for compound FA terms can be reduced to 8,405, about 9% of the original number, with recall and precision above 0.77 and 0.90, respectively. Experimental results of field determination using FA terms on 264 text fragments show that the accuracy of the presented method exceeds 90%, about 30% higher than when only single-word FA terms are used.
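The idea of determining a field from a few FA terms, and why compound terms help, can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not define the five association levels or the stability and inheritance ranks, so they are omitted here. The dictionary `FA_TERMS`, its entries, and the function `determine_field` are hypothetical names chosen for illustration, and the longest-match voting rule is an assumption.

```python
from collections import Counter

# Hypothetical toy dictionary: FA term (tuple of words) -> field.
# Entries are invented for illustration and are not taken from the paper.
FA_TERMS = {
    ("home", "run"): "baseball",      # compound FA term
    ("pitcher",): "baseball",         # single-word FA term
    ("penalty", "kick"): "soccer",    # compound FA term
    ("goalkeeper",): "soccer",        # single-word FA term
    ("run",): "athletics",            # single word that is ambiguous alone
}
MAX_LEN = max(len(term) for term in FA_TERMS)


def determine_field(words):
    """Return the field with the most FA-term matches, or None if no term matches."""
    votes = Counter()
    i = 0
    while i < len(words):
        # Try the longest candidate starting at position i first, so that a
        # compound FA term takes priority over the single words it contains.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            field = FA_TERMS.get(tuple(words[i:i + n]))
            if field is not None:
                votes[field] += 1
                i += n
                break
        else:
            i += 1  # no FA term starts here; move to the next word
    return votes.most_common(1)[0][0] if votes else None


fragment = "the pitcher gave up a home run".split()
print(determine_field(fragment))
```

In this toy setup the compound term "home run" is matched as a unit, so the word "run" does not cast a vote for an unrelated field, which is the kind of benefit the abstract attributes to compound FA terms.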
