An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Bai, Liang; Liang, Jiye; Dang, Chuangyin
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 785-795 |
Journal / Publication | Knowledge-Based Systems |
Volume | 24 |
Issue number | 6 |
Publication status | Published - Aug 2011 |
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the clustering performance of k-modes-type algorithms depends on the initial cluster centers, and the number of clusters needs to be known or given in advance. This paper proposes a novel initialization method for categorical data that is applied to k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion for finding candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets. © 2011 Elsevier B.V. All rights reserved.
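For readers unfamiliar with k-modes, the following is a minimal sketch of the generic algorithm, not the initialization method proposed in the paper. It assumes the usual simple matching dissimilarity (the count of attributes on which two records differ) and updates each center to the per-attribute mode of its cluster. The function names (`matching_distance`, `column_modes`, `k_modes`) and the toy data are hypothetical and chosen only for illustration. The initial centers and their number are passed in by the caller, which is exactly the dependence the abstract describes.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_centers, max_iter=100):
    """Generic k-modes: the caller supplies the initial centers (and hence k)."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each record to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: matching_distance(x, centers[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: replace each center by the per-attribute mode of its cluster.
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = column_modes(members)
    return centers, labels


# Toy categorical data (hypothetical), with the first and fourth records as initial centers.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
centers, labels = k_modes(data, initial_centers=[data[0], data[3]])
print(centers)
print(labels)
```

Each iteration of this sketch visits every record once per center, so its cost grows linearly with the number of data points for a fixed number of clusters and attributes.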
Research Area(s)
- Categorical data, Density measure, Initial cluster centers, The k-modes-type algorithms, The number of clusters
Citation Format(s)
An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data. / Bai, Liang; Liang, Jiye; Dang, Chuangyin.
In: Knowledge-Based Systems, Vol. 24, No. 6, 08.2011, p. 785-795.