Information Retrieval Using Non-negative Matrix Factorization
-
- Tsuge Satoru
- Faculty of Engineering, Tokushima University
-
- Shishibori Masami
- Faculty of Engineering, Tokushima University
-
- Kuroiwa Singo
- Faculty of Engineering, Tokushima University
-
- Kita Kenji
- Center for Advanced Information Technology, Tokushima University
Bibliographic Information
- Other Title
-
- A Dimensionality Reduction Method for the Vector Space Information Retrieval Model Using Non-negative Matrix Factorization
Description
The Vector Space Model (VSM) is a conventional information retrieval model that represents a document collection as a term-by-document matrix. Because term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise, and it is difficult to capture the underlying semantic structure from them. Dimensionality reduction is one way to overcome these problems. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are popular matrix-decomposition techniques for dimensionality reduction; however, the decomposed matrices contain both positive and negative values. In the work described here, we use Non-negative Matrix Factorization (NMF) for dimensionality reduction of the vector space model. Because matrices decomposed by NMF contain only non-negative values, the original data are represented by purely additive, never subtractive, combinations of the basis vectors. This parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Using the MEDLINE collection, we showed experimentally that NMF offers a large improvement over the vector space model.
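The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which NMF algorithm, rank, or query-projection scheme the authors used, so everything below is an assumption: the sketch uses the well-known multiplicative update rules for the squared-error objective, a tiny made-up term-by-document matrix rather than MEDLINE, and hypothetical helper names (nmf, project, rank). It factors a non-negative matrix X (terms x documents) as W H with W and H non-negative, maps a query into the reduced space, and ranks documents by cosine similarity there.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative X (terms x docs) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared-error objective
    (an assumed choice; the abstract does not name the algorithm).
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    W = rng.random((n_terms, k)) + eps   # basis vectors (columns)
    H = rng.random((k, n_docs)) + eps    # reduced document vectors (columns)
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(W, q, n_iter=200, eps=1e-9):
    """Fit a non-negative reduced vector h for query q with W held fixed."""
    h = np.ones(W.shape[1])
    WtW, Wtq = W.T @ W, W.T @ q
    for _ in range(n_iter):
        h *= Wtq / (WtW @ h + eps)
    return h

def rank(H, h_q, eps=1e-12):
    """Rank documents by cosine similarity to the query in the reduced space."""
    sims = (H.T @ h_q) / (np.linalg.norm(H, axis=0) * np.linalg.norm(h_q) + eps)
    return np.argsort(-sims), sims

# Toy data (made up): terms 0-1 occur in docs 0-1, terms 2-3 in docs 2-3.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2]], dtype=float)
W, H = nmf(X, k=2)
q = np.array([1.0, 1.0, 0.0, 0.0])       # query using terms 0 and 1
order, sims = rank(H, project(W, q))
print(order, np.round(sims, 3))
```

Because W and H never go negative, each document column of X is approximated by adding scaled basis vectors together, which is the additive, parts-based property the abstract contrasts with PCA and SVD.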
Journal
-
- IEEJ Transactions on Electronics, Information and Systems
-
IEEJ Transactions on Electronics, Information and Systems 124 (7), 1500-1506, 2004
The Institute of Electrical Engineers of Japan
Details
-
- CRID
- 1390282679580788480
-
- NII Article ID
- 10013268306
-
- NII Book ID
- AN10065950
-
- ISSN
- 1348-8155
- 0385-4221
-
- NDL BIB ID
- 7020378
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- NDL Search
- Crossref
- CiNii Articles
- KAKEN
- OpenAIRE
-
- Abstract License Flag
- Disallowed