Improving Text Categorization with Semantic Knowledge in Wikipedia

WANG Xiang, JIA Yan, CHEN Ruhua, FAN Hua, ZHOU Bin

doi:10.1587/transinf.e96.d.2786

WANG Xiang
School of Computer, National University of Defense Technology

JIA Yan
School of Computer, National University of Defense Technology

CHEN Ruhua
School of Computer, National University of Defense Technology

FAN Hua
School of Computer, National University of Defense Technology

ZHOU Bin
School of Computer, National University of Defense Technology

Description

Text categorization, especially short text categorization, is a challenging task because the text data is sparse and high-dimensional. Traditional text classification methods represent documents with the “Bag of Words (BOW)” scheme, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used a Wikipedia-concept-based document representation in place of the traditional BOW model for text classification. Because such a representation still ignores the semantic relationships among terms, we constructed a semantic matrix from the rich semantic knowledge in Wikipedia to enrich the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
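
The enrichment step described above can be illustrated with a minimal sketch. The abstract does not say how the semantic matrix is built or applied, so everything here is an assumption made for illustration: the three concept names, the relatedness scores, and the choice to enrich a document by multiplying its concept vector by the matrix. It is not the authors' implementation.

```python
# Hypothetical toy vocabulary of Wikipedia concepts (not from the paper).
CONCEPTS = ["Machine learning", "Neural network", "Football"]

# Assumed concept-to-concept relatedness scores in [0, 1]. How the paper
# actually builds its semantic matrix is not stated in the abstract.
SEMANTIC_MATRIX = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def concept_vector(doc_concepts):
    """Count how often each known concept was matched in a document."""
    return [float(doc_concepts.count(c)) for c in CONCEPTS]


def enrich(vector, matrix):
    """Spread weight to related concepts: enriched = vector x matrix."""
    n = len(matrix)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# A short text in which only one concept was detected.
doc = ["Machine learning"]
plain = concept_vector(doc)
enriched = enrich(plain, SEMANTIC_MATRIX)

print(plain)     # [1.0, 0.0, 0.0]
print(enriched)  # [1.0, 0.5, 0.0]
```

In this toy case the enriched vector gives "Neural network" a non-zero weight even though it never appears in the text, which is the kind of effect that could help with sparse short texts.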

Journal

IEICE Transactions on Information and Systems E96.D (12), 2786-2794, 2013

The Institute of Electronics, Information and Communication Engineers (IEICE)

Details

CRID: 1390282679355953152
NII Article ID: 130003385449
DOI: 10.1587/transinf.e96.d.2786
ISSN: 17451361, 09168532
Web Site: https://www.jstage.jst.go.jp/article/transinf/E96.D/12/E96.D_2786/_pdf
Text language code: en
Data source type: JaLC, Crossref, CiNii Articles
Abstract license flag: Not available for use
