Comparison of Clustering Results for k-means by using different seeding methods

  • Onoda Takashi
    Central Research Institute of Electric Power Industry; Tokyo Institute of Technology
  • Sakai Miho
    Tokyo Institute of Technology
  • Yamada Seiji
    Tokyo Institute of Technology; National Institute of Informatics

Bibliographic Information

Other Title
  • 初期値設定法の違いによるk-means法の性能比較 (Performance comparison of the k-means method under different initial-value setting methods)

Description

The k-means clustering method is widely used for the Web because of its simplicity and speed. However, the clustering result depends heavily on the initial cluster centers, which are conventionally chosen uniformly at random from the data points. We propose a seeding method for k-means clustering based on independent component analysis (ICA). We evaluate the proposed method on benchmark datasets and compare it with other seeding methods. We also apply it to a Web corpus provided by the Open Directory Project (ODP). In these experiments, the proposed method achieves higher normalized mutual information (NMI) than both standard k-means clustering and k-means++ clustering. We therefore conclude that the proposed method is useful for clustering Web corpora.
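The two baseline seedings named in the abstract, and the NMI score used to compare them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record gives no details of the ICA-based seeding, so it is not shown here. All function names are hypothetical, the toy data is invented for the demonstration, and the NMI normalization (division by the geometric mean of the two entropies) is one common choice that the paper may or may not use.

```python
import math
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def uniform_seeds(points, k, rng):
    """Standard k-means seeding: k distinct data points, uniformly at random."""
    return rng.sample(points, k)

def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: each further center is drawn with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, seeds, iterations=20):
    """Lloyd's algorithm started from the given seeds.
    Returns the labels from the last assignment step."""
    centers = list(seeds)
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def nmi(labels_true, labels_pred):
    """NMI = I(U;V) / sqrt(H(U) * H(V)) (assumed normalization).
    Returns 0.0 when either labeling has a single cluster."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(c * n / (cu[u] * cv[v]))
             for (u, v), c in joint.items())
    hu = -sum(c / n * math.log(c / n) for c in cu.values())
    hv = -sum(c / n * math.log(c / n) for c in cv.values())
    return mi / math.sqrt(hu * hv) if hu > 0 and hv > 0 else 0.0

# Toy data (invented): two well-separated groups with known labels.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
truth = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
for name, seeder in [("uniform", uniform_seeds), ("k-means++", kmeanspp_seeds)]:
    labels = kmeans(points, seeder(points, 2, rng))
    print(name, "NMI:", nmi(truth, labels))
```

NMI is invariant to how the clusters are numbered, so a clustering that swaps the two label names but groups the points identically still scores 1. On a real corpus the comparison would be averaged over many random seeds, since a single run of either seeding can be lucky or unlucky.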

Journal

Details

  • CRID
    1390282680650589184
  • NII Article ID
    130004591966
  • DOI
    10.14864/fss.27.0.55.0
  • Text Lang
    ja
  • Data Source
    • JaLC
    • CiNii Articles
  • Abstract License Flag
    Disallowed
