Improving Text Categorization with Semantic Knowledge in Wikipedia

WANG Xiang, JIA Yan, CHEN Ruhua, FAN Hua, ZHOU Bin

doi:10.1587/transinf.e96.d.2786

WANG Xiang
School of Computer, National University of Defense Technology

JIA Yan
School of Computer, National University of Defense Technology

CHEN Ruhua
School of Computer, National University of Defense Technology

FAN Hua
School of Computer, National University of Defense Technology

ZHOU Bin
School of Computer, National University of Defense Technology

Description

Text categorization, especially of short texts, is challenging because the text data is sparse and multidimensional. Traditional text classification methods represent documents with the bag-of-words (BOW) scheme, which is based on word co-occurrence and has many limitations. In this paper, we map document texts to Wikipedia concepts and use a Wikipedia-concept-based document representation in place of the traditional BOW model for text classification. To overcome the weakness of document representation models that ignore semantic relationships among terms, and to exploit the rich semantic knowledge in Wikipedia, we construct a semantic matrix to enrich the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
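
The two steps named in the description (mapping text to Wikipedia concepts, then enriching the concept vector with a semantic matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the description does not say how concepts are matched, how the matrix is built, or how it is applied. The concept list, the phrase dictionary, the relatedness values in `S`, and the choice to enrich by a vector-matrix product are all assumptions made for illustration.

```python
import re

import numpy as np

# Hypothetical toy set of Wikipedia concepts (not taken from the paper).
CONCEPTS = ["Machine learning", "Artificial intelligence", "Football"]

# Hypothetical phrase-to-concept dictionary standing in for a real
# Wikipedia mapping step.
PHRASE_TO_CONCEPT = {
    "machine learning": "Machine learning",
    "ai": "Artificial intelligence",
    "football": "Football",
    "soccer": "Football",
}

# Assumed semantic matrix: S[i, j] is a made-up relatedness score between
# concept i and concept j, with 1.0 on the diagonal.
S = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def concept_vector(text):
    """Count concept mentions by matching unigrams and bigrams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    vec = np.zeros(len(CONCEPTS))
    for gram in grams:
        concept = PHRASE_TO_CONCEPT.get(gram)
        if concept is not None:
            vec[CONCEPTS.index(concept)] += 1.0
    return vec


def enrich(vec):
    """Spread weight to related concepts (assumed form: vec @ S)."""
    return vec @ S


doc = "Machine learning beats soccer"
raw = concept_vector(doc)
enriched = enrich(raw)
```

In this toy example the raw vector has no weight on "Artificial intelligence" because the text never mentions it; after enrichment that concept receives 0.8 through its assumed relatedness to "Machine learning", while "Football" is unchanged. A bag-of-words vector would treat the two related concepts as unrelated dimensions.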

Journal

IEICE Transactions on Information and Systems

IEICE Transactions on Information and Systems E96.D (12), 2786-2794, 2013

The Institute of Electronics, Information and Communication Engineers

Details

CRID
1390282679355953152

NII Article ID
130003385449

DOI
10.1587/transinf.e96.d.2786

ISSN
17451361
09168532

Web Site
https://www.jstage.jst.go.jp/article/transinf/E96.D/12/E96.D_2786/_pdf

Text Lang
en

Data Source
- JaLC
- Crossref
- CiNii Articles

Abstract License Flag
Disallowed
