TY - JOUR
T1 - Clustering ensemble based on sample's stability
AU - Li, Feijiang
AU - Qian, Yuhua
AU - Wang, Jieting
AU - Dang, Chuangyin
AU - Jing, Liping
PY - 2019/8
Y1 - 2019/8
N2 - The objective of clustering ensemble is to find the underlying structure of data based on a set of clustering results. It has been observed that the samples can change between clusters in different clustering results. This change shows that samples may have different contributions to the detection of the underlying structure. However, the existing clustering ensemble methods treat all samples equally. To tackle this deficiency, we introduce the stability of a sample to quantify its contribution and present a methodology to determine this stability. We propose two formulas in accordance with this methodology to calculate a sample's stability. Then, we develop a clustering ensemble algorithm based on the sample's stability. With either formula, this algorithm divides a data set into two classes: cluster core and cluster halo. With the core and halo, the proposed algorithm then discovers a clear structure using the samples in the cluster core and assigns samples in the cluster halo to the clear structure gradually. The experiments on eight synthetic data sets illustrate how the proposed algorithm works. This algorithm statistically outperforms twelve state-of-the-art clustering ensemble algorithms on ten real data sets from UCI and six document data sets. The experimental analysis on the case of image segmentation shows that cluster cores discovered by the stability are rational.
AB - The objective of clustering ensemble is to find the underlying structure of data based on a set of clustering results. It has been observed that the samples can change between clusters in different clustering results. This change shows that samples may have different contributions to the detection of the underlying structure. However, the existing clustering ensemble methods treat all samples equally. To tackle this deficiency, we introduce the stability of a sample to quantify its contribution and present a methodology to determine this stability. We propose two formulas in accordance with this methodology to calculate a sample's stability. Then, we develop a clustering ensemble algorithm based on the sample's stability. With either formula, this algorithm divides a data set into two classes: cluster core and cluster halo. With the core and halo, the proposed algorithm then discovers a clear structure using the samples in the cluster core and assigns samples in the cluster halo to the clear structure gradually. The experiments on eight synthetic data sets illustrate how the proposed algorithm works. This algorithm statistically outperforms twelve state-of-the-art clustering ensemble algorithms on ten real data sets from UCI and six document data sets. The experimental analysis on the case of image segmentation shows that cluster cores discovered by the stability are rational.
KW - Clustering analysis
KW - Clustering ensemble
KW - Ensemble learning
KW - Sample's stability
UR - http://www.scopus.com/inward/record.url?scp=85062145422&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85062145422&origin=recordpage
U2 - 10.1016/j.artint.2018.12.007
DO - 10.1016/j.artint.2018.12.007
M3 - RGC 21 - Publication in refereed journal
SN - 0004-3702
VL - 273
SP - 37
EP - 55
JO - Artificial Intelligence
JF - Artificial Intelligence
ER -