-
- WANG Xiang
- School of Computer, National University of Defense Technology
-
- JIA Yan
- School of Computer, National University of Defense Technology
-
- CHEN Ruhua
- School of Computer, National University of Defense Technology
-
- FAN Hua
- School of Computer, National University of Defense Technology
-
- ZHOU Bin
- School of Computer, National University of Defense Technology
Description
Text categorization, especially short text categorization, is a challenging task because the text data is sparse and multidimensional. Traditional text classification methods represent documents with the “Bag of Words (BOW)” representation scheme, which is based on word co-occurrence and has many limitations. In this paper, we map document texts to Wikipedia concepts and use this Wikipedia-concept-based document representation in place of the traditional BOW model for text classification. To overcome the weakness of ignoring semantic relationships among terms in the document representation model, and to exploit the rich semantic knowledge in Wikipedia, we construct a semantic matrix that enriches the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
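The abstract does not give the construction in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: a document is mapped to a vector over Wikipedia concepts, and that vector is multiplied by a concept-relatedness ("semantic") matrix so that related concepts also receive weight. The concept list, the phrase-to-concept dictionary, the matrix values, and the function names `concept_vector` and `enrich` are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy data: none of these concepts, mappings, or similarity
# values come from the paper; they only illustrate the shape of the idea.
CONCEPTS = ["Machine learning", "Statistics", "Football"]

# Assumed token-to-concept dictionary (a stand-in for a real Wikipedia mapping).
PHRASE_TO_CONCEPT = {
    "classifier": "Machine learning",
    "regression": "Statistics",
    "goal": "Football",
}

# Assumed symmetric concept-relatedness matrix, with 1.0 on the diagonal.
SEMANTIC_MATRIX = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def concept_vector(text):
    """Count how often each concept is triggered by a token in the text."""
    counts = [0.0] * len(CONCEPTS)
    for token in text.lower().split():
        concept = PHRASE_TO_CONCEPT.get(token)
        if concept is not None:
            counts[CONCEPTS.index(concept)] += 1.0
    return counts


def enrich(vector, matrix):
    """Return the row vector times the matrix, spreading weight to related concepts."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]


doc = "a classifier beats another classifier"
raw = concept_vector(doc)
enriched = enrich(raw, SEMANTIC_MATRIX)
print(raw)
print(enriched)
```

In this toy example the document only mentions a machine-learning term, yet after enrichment the related "Statistics" concept also gets a nonzero weight, which is the kind of term relatedness a plain BOW vector cannot express.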
Published in
IEICE Transactions on Information and Systems E96.D (12), 2786-2794, 2013
The Institute of Electronics, Information and Communication Engineers (IEICE)
Details
-
- CRID
- 1390282679355953152
-
- NII Article ID
- 130003385449
-
- ISSN
- 17451361
- 09168532
-
- Text language code
- en
-
- Data source type
-
- JaLC
- Crossref
- CiNii Articles
-
- Abstract license flag
- Not permitted