Two-dimensional clustering for text categorization
Description
We propose a new method that uses two-dimensional clustering to improve the accuracy of text categorization. Many previous probabilistic approaches implicitly assume that texts in the same category are generated from an identical distribution. We show empirically that this assumption is not accurate, and propose a new framework based on two-dimensional clustering to alleviate the problem. In our method, training texts are clustered so that the assumption is more likely to hold, and at the same time features are clustered to tackle the data sparseness problem. We conduct experiments to validate the proposed two-dimensional clustering method.
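The idea of clustering texts and features at the same time can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so plain k-means with a deterministic initialization stands in here as an assumption. The function names (`kmeans_labels`, `two_dim_cluster`), the toy count matrix `COUNTS`, and its vocabulary are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means (an assumed stand-in, not the paper's algorithm).

    Centers are initialized from the first k points, so results are
    deterministic. Returns one cluster label per point.
    """
    centers = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_dim_cluster(
    counts: List[List[int]], k_docs: int, k_feats: int
) -> Tuple[List[int], List[int], List[List[int]]]:
    """Cluster rows (texts) and columns (features) of a count matrix.

    Returns the text labels, the feature labels, and the aggregated
    (text cluster x feature cluster) count table.
    """
    doc_labels = kmeans_labels(counts, k_docs)
    columns = [list(col) for col in zip(*counts)]  # transpose: features as points
    feat_labels = kmeans_labels(columns, k_feats)
    blocks = [[0] * k_feats for _ in range(k_docs)]
    for i, row in enumerate(counts):
        for j, value in enumerate(row):
            blocks[doc_labels[i]][feat_labels[j]] += value
    return doc_labels, feat_labels, blocks


# Hypothetical training texts from one category ("sports").
# Columns: goal, serve, pitch, racket.
COUNTS = [
    [3, 0, 2, 0],  # football text
    [0, 2, 0, 3],  # tennis text
    [2, 0, 3, 0],  # football text
    [0, 3, 0, 2],  # tennis text
]

if __name__ == "__main__":
    print(two_dim_cluster(COUNTS, k_docs=2, k_feats=2))
```

In this toy category the texts split into two subclusters (football-like and tennis-like), which is the kind of within-category heterogeneity the abstract argues against ignoring. Pooling the features into two groups turns the sparse 4 x 4 count matrix into a denser 2 x 2 table, which is one simple way to picture how feature clustering could ease data sparseness.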
Published in
Proceedings of the 6th Conference on Natural Language Learning - COLING-02, vol. 20, pp. 1-7, 2002-01-01
Association for Computational Linguistics (ACL)