Text Classification Using the Sum of Frequency Ratios of Word and N-gram Over Categories
-
- Suzuki Makoto
- Dept. of Information Science, Shonan Institute of Technology
-
- Hirasawa Shigeichi
- School of Creative Science and Engineering, Waseda University
Bibliographic Information
- Other Title
-
- 単語とN-gramの各カテゴリにおける出現頻度の比の和を用いたテキスト自動分類手法
- タンゴ ト N gram ノ カク カテゴリ ニ オケル シュツゲン ヒンド ノ ヒ ノ ワ オ モチイタ テキスト ジドウ ブンルイ シュホウ
Description
In this paper, we treat automatic text classification as a sequence of information-processing steps and propose a new classification technique, the “Frequency Ratio Accumulation Method (FRAM)”. FRAM is a simple technique that sums the ratios of term frequencies in each category. It nevertheless has a desirable property: feature terms can be used without a separate extraction procedure. Exploiting this property, we use character N-grams and word N-grams as feature terms. We evaluate the technique experimentally by classifying newspaper articles from the Japanese “CD-Mainichi 2002” and the English “Reuters-21578” collections with both Naive Bayes (the baseline method) and the proposed method. The results show that the proposed method greatly improves classification accuracy over the baseline, reaching 89.6% on Mainichi and 87.8% on Reuters. Thus, the proposed method achieves very high performance. Although it is a simple technique, it offers a new viewpoint, has high potential, and is language-independent, so further development can be expected.
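The abstract describes the method only as summing ratios of term frequency in each category, with character or word N-grams used directly as feature terms. The following is a minimal sketch of one possible reading of that description, not the authors' implementation. It assumes that the ratio for a term in a category is that term's frequency in the category divided by its total frequency over all categories, and that a document is assigned to the category with the largest accumulated ratio. The function names (`char_ngrams`, `train`, `classify`) and the toy training data are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=2):
    """Build a term -> {category: ratio} table from (text, category) pairs.

    Assumption: ratio = frequency of the term in the category divided by
    the term's total frequency over all categories.
    """
    counts = defaultdict(Counter)  # term -> Counter(category -> frequency)
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            counts[term][category] += 1

    ratios = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        ratios[term] = {c: f / total for c, f in per_category.items()}
    return ratios


def classify(text, ratios, n=2):
    """Accumulate the ratios of every n-gram in `text` and pick the top category.

    Every n-gram is used as a feature term; no feature selection step is run.
    Returns None when no n-gram of `text` was seen during training.
    """
    scores = Counter()
    for term in char_ngrams(text, n):
        for category, ratio in ratios.get(term, {}).items():
            scores[category] += ratio
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
training_docs = [
    ("stock market rises", "economy"),
    ("team wins match", "sports"),
]
ratios = train(training_docs, n=2)
predicted = classify("market stock", ratios, n=2)
```

Because the sketch works on raw character sequences, it needs no tokenizer, which matches the language-independence the abstract claims. Word N-grams could be substituted by replacing `char_ngrams` with a function that slides over a list of tokens.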
Journal
-
- IEEJ Transactions on Electronics, Information and Systems
-
IEEJ Transactions on Electronics, Information and Systems 129 (1), 118-124, 2009
The Institute of Electrical Engineers of Japan
Details
-
- CRID
- 1390282679581925504
-
- NII Article ID
- 10023999364
-
- NII Book ID
- AN10065950
-
- ISSN
- 1348-8155
- 0385-4221
-
- NDL BIB ID
- 9763778
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- NDL Search
- Crossref
- CiNii Articles
- OpenAIRE
-
- Abstract License Flag
- Disallowed