Description
We propose a new unsupervised learning method called multinomial PCA (MuPCA) for efficiently extracting the major latent topics from a document stream based on the "bag-of-words" (BOW) representation of a document. Unlike PCA, MuPCA follows a suitable probabilistic generative model for the document stream, represented as a time series of word-frequency vectors. Using real document streams from the Web, we experimentally demonstrate the effectiveness of the proposed method.
Published in
Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., vol. 2, pp. 238-243, 2006-01-05
IEEE