A Note on Document Classification with Small Training Data
- Maeda Yasunari (Dept. of Computer Science, Kitami Institute of Technology)
- Yoshida Hideki (Dept. of Computer Science, Kitami Institute of Technology)
- Suzuki Masakiyo (Dept. of Computer Science, Kitami Institute of Technology)
- Matsushima Toshiyasu (Department of Applied Mathematics, Waseda University)
Bibliographic Information
- Other Title: 学習データが少量しかない場合の文書分類に関する一考察 (English: "A Study on Document Classification When Only a Small Amount of Training Data Is Available")
Abstract
Document classification is one of the important topics in the field of NLP (natural language processing). Previous research proposed a document classification method that minimizes the error rate with respect to the Bayes criterion. However, when the training data contains only a small number of documents, the accuracy of that method is low. In this research, we therefore use estimating data to estimate the prior distributions. When the training data is small, using estimating data gives higher accuracy than the previous method; when the training data is large, however, it gives lower accuracy. We therefore also propose another technique whose accuracy is higher than that of the previous method when the training data is small, and almost the same as that of the previous method when the training data is large.
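The abstract does not state the probabilistic model, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a multinomial word model per class with a Dirichlet prior on the word probabilities, classifies with the Bayes decision rule (pick the class with the largest posterior probability, which minimizes the expected error rate), and illustrates two ways of choosing the Dirichlet hyperparameters: a symmetric prior with no outside knowledge, and a prior derived from word frequencies in separate "estimating data". The helper names `uniform_alpha` and `alpha_from_estimating_data` and the parameters `strength` and `floor` are hypothetical. The paper's second, proposed technique is not reproduced here.

```python
import math
from collections import Counter


def train_counts(docs, labels):
    """Per-class word counts and per-class document counts from training data."""
    word_counts = {}
    for doc, y in zip(docs, labels):
        word_counts.setdefault(y, Counter()).update(doc)
    return word_counts, Counter(labels)


def uniform_alpha(vocab, value=1.0):
    """Symmetric Dirichlet hyperparameters: no outside knowledge about words."""
    return {w: value for w in vocab}


def alpha_from_estimating_data(est_docs, vocab, strength=10.0, floor=0.1):
    """HYPOTHETICAL helper: set alpha_w = floor + strength * (relative frequency
    of w in the estimating data). Falls back to a uniform prior if empty."""
    freq = Counter(w for doc in est_docs for w in doc if w in vocab)
    total = sum(freq.values())
    if total == 0:
        return uniform_alpha(vocab)
    return {w: floor + strength * freq[w] / total for w in vocab}


def log_predictive(doc, counts, alpha, vocab):
    """log p(doc | class, training data), with the multinomial parameters
    integrated out against the Dirichlet(alpha) prior (Polya sequence)."""
    alpha_sum = sum(alpha.values())
    n = sum(counts[w] for w in vocab)
    seen = Counter()  # words already consumed from this test document
    logp = 0.0
    for w in doc:
        if w not in vocab:
            continue  # out-of-vocabulary words are ignored in this sketch
        logp += math.log((counts[w] + alpha[w] + seen[w]) / (n + alpha_sum))
        seen[w] += 1
        n += 1
    return logp


def classify(doc, word_counts, doc_counts, alpha, vocab):
    """Bayes decision rule: return the class with the largest posterior."""
    n_docs = sum(doc_counts.values())
    k = len(doc_counts)

    def score(y):
        # Class prior from a uniform Dirichlet over class proportions.
        log_prior = math.log((doc_counts[y] + 1) / (n_docs + k))
        return log_prior + log_predictive(doc, word_counts[y], alpha, vocab)

    return max(sorted(doc_counts), key=score)


if __name__ == "__main__":
    train_docs = [["goal", "match", "team"], ["team", "win", "goal"],
                  ["vote", "election", "party"], ["party", "policy", "vote"]]
    train_labels = ["sports", "sports", "politics", "politics"]
    vocab = {w for d in train_docs for w in d}
    wc, dc = train_counts(train_docs, train_labels)
    est_docs = [["goal", "team", "match"], ["vote", "party"]]
    for alpha in (uniform_alpha(vocab), alpha_from_estimating_data(est_docs, vocab)):
        print(classify(["goal", "team"], wc, dc, alpha, vocab))
```

In this toy run both priors label `["goal", "team"]` as `sports`. The design choice being illustrated is that the prior only matters when class counts are small: as the training counts grow, they dominate `alpha` in the predictive probability, which is consistent with the abstract's observation that the benefit of estimating data depends on the training-set size.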
Journal
- IEEJ Transactions on Electronics, Information and Systems, Vol. 131, No. 8, pp. 1459-1466, 2011
- Publisher: The Institute of Electrical Engineers of Japan
Details
- CRID: 1390282679584458368
- NII Article ID: 10030527175
- NII Book ID: AN10065950
- ISSN: 1348-8155, 0385-4221
- NDL BIB ID: 11196040
- Text language: Japanese
- Data sources: JaLC, IRDB, NDL, Crossref, CiNii Articles
- Abstract License Flag: Disallowed