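Bibliographic information
- Alternative titles
- Text Classification Using the Sum of Frequency Ratios of Word and N-gram Over Categories
- タンゴ ト N gram ノ カク カテゴリ ニ オケル シュツゲン ヒンド ノ ヒ ノ ワ オ モチイタ テキスト ジドウ ブンルイ シュホウ (katakana reading of the Japanese title, roughly "An automatic text classification method using the sum of the ratios of word and N-gram occurrence frequencies in each category")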
Abstract
In this paper, we treat automatic text classification as a sequence of information-processing steps and propose a new classification technique, the “Frequency Ratio Accumulation Method (FRAM)”. FRAM is simple: for each category, it computes the sum of the ratios of term frequencies in that category. Even so, it has the desirable property that feature terms can be used without a separate feature-extraction step. Exploiting this property, we use “character N-grams” and “word N-grams” as feature terms. We then evaluate the technique experimentally, classifying Japanese newspaper articles from “CD-Mainichi 2002” and English articles from “Reuters-21578” with Naive Bayes (the baseline method) and with the proposed method. The proposed method achieves substantially higher classification accuracy than the baseline: 89.6% on Mainichi and 87.8% on Reuters, indicating very high performance. Although simple, the method offers a new perspective, shows high potential, and is language-independent, so further development can be expected.
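The abstract describes FRAM only at a high level, so the following is a hypothetical Python sketch of the frequency-ratio-accumulation idea, not the authors' implementation. It assumes that the frequency ratio of a term t for category c is f(t, c) divided by the total frequency of t over all categories, computed from raw training counts, and that a document's score for c is the sum of these ratios over every feature-term occurrence in the document; the paper may normalize differently (for example by category size). The names `FrequencyRatioClassifier`, `char_ngrams`, and `word_ngrams` are introduced here for illustration. Note that every N-gram is used as a feature, with no selection step, which mirrors the property the abstract highlights.

```python
from collections import Counter, defaultdict
from typing import Callable


def char_ngrams(text: str, n: int) -> list[str]:
    """All overlapping character n-grams; needs no word segmentation."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text: str, n: int) -> list[str]:
    """All overlapping word n-grams, assuming whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class FrequencyRatioClassifier:
    """Hypothetical sketch of frequency-ratio accumulation.

    Assumption (not confirmed by the source): ratio(t, c) is
    f(t, c) / sum over all categories c' of f(t, c'), using raw counts.
    """

    def __init__(self, features: Callable[[str], list[str]] = lambda t: char_ngrams(t, 2)):
        self.features = features
        # term -> Counter mapping category -> frequency in training data
        self.counts: defaultdict[str, Counter] = defaultdict(Counter)
        self.categories: set[str] = set()

    def fit(self, docs: list[str], labels: list[str]) -> "FrequencyRatioClassifier":
        for doc, label in zip(docs, labels):
            self.categories.add(label)
            for term in self.features(doc):
                self.counts[term][label] += 1
        return self

    def ratio(self, term: str, category: str) -> float:
        per_cat = self.counts.get(term)  # .get avoids inserting unseen terms
        if not per_cat:
            return 0.0  # a term never seen in training contributes nothing
        return per_cat[category] / sum(per_cat.values())

    def scores(self, doc: str) -> dict[str, float]:
        # Accumulate the ratio of every feature occurrence for each category.
        totals = {c: 0.0 for c in self.categories}
        for term in self.features(doc):
            for c in self.categories:
                totals[c] += self.ratio(term, c)
        return totals

    def predict(self, doc: str) -> str:
        s = self.scores(doc)
        return max(sorted(s), key=s.get)  # sorted() makes ties deterministic
```

For example, after `clf = FrequencyRatioClassifier().fit(["soccer goal", "stock price"], ["sports", "finance"])`, the bigrams of `"goal"` ("go", "oa", "al") occur only in the sports document, so `clf.scores("goal")` is `{"sports": 3.0, "finance": 0.0}` and `clf.predict("goal")` returns `"sports"`. Passing `features=lambda t: word_ngrams(t, 1)` switches the same classifier to word features.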
Published in
- IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C(電子・情報・システム部門誌)) 129 (1), 118-124, 2009
The Institute of Electrical Engineers of Japan (一般社団法人 電気学会)
Details
- CRID: 1390282679581925504
- NII article ID: 10023999364
- NII bibliographic ID: AN10065950
- ISSN: 1348-8155, 0385-4221
- NDL bibliographic ID: 9763778
- Full-text language code: ja
- Data source types: JaLC, NDL, Crossref, CiNii Articles
- Abstract license flag: not permitted for use