TY - JOUR
T1 - WC-KNNG-PC
T2 - Watershed clustering based on k-nearest-neighbor graph and Pauta Criterion
AU - Xia, Jianhua
AU - Zhang, Jinbing
AU - Wang, Yang
AU - Han, Lixin
AU - Yan, Hong
PY - 2022/1
Y1 - 2022/1
N2 - Watershed clustering applies the concept of the watershed algorithm to cluster analysis. The most attractive characteristic of this method is its ability to determine the number of clusters automatically from the data set. However, according to the literature, both the original watershed clustering algorithm and its improved version are designed to detect clusters in two-dimensional linear data sets. To enable watershed clustering to handle data sets with multiple dimensions and nonlinear structures, we introduce the k-nearest neighbor graph (KNNG), the shared nearest neighbor method and the Pauta Criterion into watershed clustering, yielding a new watershed graph clustering method with noise detection, WC-KNNG-PC. This approach first computes a KNNG for the data set and then computes catchment basins (subclusters), basin immersions (connectivity between basins) and outliers. To prevent illegal subclusters from being merged, a maximum normalization stability factor (MNSF), based on t-nearest neighbors and angle, is proposed to detect invalid basin immersions. Finally, a basin-level similarity using a median criterion is presented to merge the catchment basins and obtain the final clustering. Experiments on complex synthetic data sets and multidimensional real-world data sets demonstrate the performance of WC-KNNG-PC in clustering complex data sets of various dimensions with heterogeneous density and diverse shapes.
AB - Watershed clustering applies the concept of the watershed algorithm to cluster analysis. The most attractive characteristic of this method is its ability to determine the number of clusters automatically from the data set. However, according to the literature, both the original watershed clustering algorithm and its improved version are designed to detect clusters in two-dimensional linear data sets. To enable watershed clustering to handle data sets with multiple dimensions and nonlinear structures, we introduce the k-nearest neighbor graph (KNNG), the shared nearest neighbor method and the Pauta Criterion into watershed clustering, yielding a new watershed graph clustering method with noise detection, WC-KNNG-PC. This approach first computes a KNNG for the data set and then computes catchment basins (subclusters), basin immersions (connectivity between basins) and outliers. To prevent illegal subclusters from being merged, a maximum normalization stability factor (MNSF), based on t-nearest neighbors and angle, is proposed to detect invalid basin immersions. Finally, a basin-level similarity using a median criterion is presented to merge the catchment basins and obtain the final clustering. Experiments on complex synthetic data sets and multidimensional real-world data sets demonstrate the performance of WC-KNNG-PC in clustering complex data sets of various dimensions with heterogeneous density and diverse shapes.
KW - K-nearest neighbor graph (KNNG)
KW - Pauta criterion
KW - Shared nearest neighbor (SNN)
KW - Watershed clustering
UR - http://www.scopus.com/inward/record.url?scp=85111537219&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85111537219&origin=recordpage
U2 - 10.1016/j.patcog.2021.108177
DO - 10.1016/j.patcog.2021.108177
M3 - RGC 21 - Publication in refereed journal
SN - 0031-3203
VL - 121
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108177
ER -