TY - GEN
T1 - Stability of K-means clustering
AU - Rakhlin, Alexander
AU - Caponnetto, Andrea
PY - 2007
Y1 - 2007
N2 - We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of H K with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n 1/2) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. We conclude by proving that stability of the functions in H K implies stability of the actualcenters of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K.
AB - We phrase K-means clustering as an empirical risk minimization procedure over a class HK and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of H K with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while for the case of multiple minimizers, the change of Ω(n 1/2) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinite minimizers requires more refined tools. We conclude by proving that stability of the functions in H K implies stability of the actualcenters of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K.
UR - http://www.scopus.com/inward/record.url?scp=38049080280&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-38049080280&origin=recordpage
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9780262195683
SP - 1121
EP - 1128
BT - Advances in Neural Information Processing Systems
T2 - 20th Annual Conference on Neural Information Processing Systems, NIPS 2006
Y2 - 4 December 2006 through 7 December 2006
ER -