An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

67 Scopus Citations

Original language: English
Pages (from-to): 785-795
Journal / Publication: Knowledge-Based Systems
Issue number: 6
Publication status: Published - Aug 2011


The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, in k-modes-type algorithms, clustering performance depends on the initial cluster centers, and the number of clusters needs to be known or given in advance. This paper proposes a novel initialization method for categorical data that can be applied to k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion for finding candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets. © 2011 Elsevier B.V. All rights reserved.
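
To make the dependence on initialization concrete, the following is a minimal sketch of a plain k-modes loop, not the initialization method proposed in the paper. It assumes the standard simple matching dissimilarity and per-attribute mode updates. The function names (`matching_distance`, `k_modes`) and the toy data are hypothetical and chosen only for illustration. The initial centers, and through them the number of clusters, are passed in explicitly, which is exactly the input the abstract says must otherwise be supplied in advance.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, initial_centers, max_iter=100):
    """Plain k-modes sketch: k is fixed by len(initial_centers)."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest center (ties go to the lowest index).
        new_labels = [
            min(range(len(centers)), key=lambda j: matching_distance(row, centers[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not change either
        labels = new_labels
        # Update step: each center becomes the per-attribute mode of its members.
        for j in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(
                    Counter(column).most_common(1)[0][0] for column in zip(*members)
                )
    return centers, labels


# Hypothetical toy data set with three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]

# Two different choices of initial centers for k = 2.
centers_a, labels_a = k_modes(data, [data[0], data[3]])
centers_b, labels_b = k_modes(data, [data[0], data[1]])
```

On this toy data the two starting points converge to different partitions, which is the sensitivity that motivates a dedicated initialization method. Each iteration costs time proportional to the number of objects times the number of clusters times the number of attributes.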

Research Area(s)

  • Categorical data, Density measure, Initial cluster centers, The k-modes-type algorithms, The number of clusters