Regularized k-means clustering of high-dimensional data and its asymptotic consistency
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Sun, Wei; Wang, Junhui; Fang, Yixin
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 148-167 |
Journal / Publication | Electronic Journal of Statistics |
Volume | 6 |
Publication status | Published - 2012 |
Externally published | Yes |
Abstract
K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate the k-means clustering in a form of regularization, with an adaptive group lasso penalty term on cluster centers. In order to optimally balance the trade-off between the clustering model fitting and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of the regularized k-means clustering with diverging dimension is established. The effectiveness of the regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to the general model-based clustering.
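The key idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the objective takes the form (1/n) Σ_i ||x_i − c_label(i)||² + λ Σ_j w_j ||c_(j)||₂, where c_(j) collects the j-th coordinate of all K centers and the adaptive weights w_j are the inverse column norms of plain k-means centers. The function names, the farthest-first seeding, and the proximal-gradient center update are choices made here for illustration only, and the stability-based selection of λ described in the abstract is not reproduced.

```python
import numpy as np


def assign(X, C):
    """Label each row of X with the index of its nearest center."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def farthest_first_init(X, K):
    """Deterministic seeding (an assumption of this sketch): start at the
    largest-norm point, then keep adding the point farthest from the seeds."""
    idx = [int(np.argmax((X ** 2).sum(axis=1)))]
    for _ in range(K - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    return X[idx].copy()


def cluster_means(X, labels, C):
    """Per-cluster means and sizes; an empty cluster keeps its previous center."""
    means = C.copy()
    counts = np.bincount(labels, minlength=C.shape[0]).astype(float)
    for k in np.flatnonzero(counts):
        means[k] = X[labels == k].mean(axis=0)
    return means, counts


def penalized_centers(X, labels, C, lam, w, n_inner=50):
    """Proximal-gradient update of the centers for fixed labels.

    Smooth part: (1/n) * sum_k n_k * ||c_k - mean_k||^2 (plus a constant).
    Penalty:     lam * sum_j w_j * ||C[:, j]||_2, handled by group soft-thresholding.
    """
    means, counts = cluster_means(X, labels, C)
    n = X.shape[0]
    step = n / (2.0 * counts.max())  # 1 / Lipschitz constant of the smooth part
    for _ in range(n_inner):
        grad = 2.0 * counts[:, None] * (C - means) / n
        Z = C - step * grad
        norms = np.maximum(np.linalg.norm(Z, axis=0), 1e-12)
        # Shrink each feature's column of centers; small columns become exactly 0.
        C = Z * np.maximum(0.0, 1.0 - step * lam * w / norms)
    return C


def regularized_kmeans(X, K, lam, n_outer=20):
    """Return (labels, centers); a zero column of centers marks a dropped feature."""
    X = X - X.mean(axis=0)  # center, so a zero column means "uninformative"
    C = farthest_first_init(X, K)
    for _ in range(n_outer):  # stage 1: plain k-means for the adaptive weights
        C, _counts = cluster_means(X, assign(X, C), C)
    w = 1.0 / np.maximum(np.linalg.norm(C, axis=0), 1e-8)
    for _ in range(n_outer):  # stage 2: alternate assignment and penalized update
        C = penalized_centers(X, assign(X, C), C, lam, w)
    return assign(X, C), C
```

Calling `regularized_kmeans(X, K, lam)` on an n-by-p array returns the cluster labels and a K-by-p matrix of centers. Features whose column of centers is entirely zero contribute nothing to the assignment step, which is how clustering and variable elimination happen simultaneously in this sketch. Larger values of `lam` remove more features.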
Research Area(s)
- Diverging dimension, K-means, Lasso, Selection consistency, Stability, Variable selection
Citation Format(s)
Regularized k-means clustering of high-dimensional data and its asymptotic consistency. / Sun, Wei; Wang, Junhui; Fang, Yixin.
In: Electronic Journal of Statistics, Vol. 6, 2012, p. 148-167.