Unsupervised data pruning for clustering of noisy data

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

7 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 612-616
Journal / Publication: Knowledge-Based Systems
Volume: 21
Issue number: 7
Publication status: Published - Oct 2008

Abstract

Data pruning identifies noisy instances in a data set and removes them in order to improve the generalization of a learning algorithm. It has been well studied in supervised classification, where the identification and removal of noisy instances are guided by the available instance labels. However, to the best of the authors' knowledge, very little work has been done on data pruning for unsupervised clustering. This paper addresses the problem of data pruning for unsupervised clustering, where instance labels are unknown beforehand. We claim that unsupervised data pruning can benefit the clustering of noisy data. We propose a feasible approach, termed unsupervised Data Pruning using Ensembles of multiple Clusterers (DPEC), to identify noisy instances in a data set. DPEC checks every instance of a data set and identifies noisy ones using ensembles of the clustering results that different clusterers produce on the same data set. We test the performance of DPEC on several real data sets with artificial noise. Experimental results demonstrate that DPEC is often able to improve the accuracy and robustness of the clustering algorithm. © 2008 Elsevier B.V. All rights reserved.
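
The abstract does not state how DPEC scores an instance, so the following is only an illustrative sketch of the general idea of flagging noisy instances from an ensemble of clusterings, not the authors' algorithm. It assumes a co-association rule: an instance is suspicious when the clusterers disagree about which other instances it belongs with. The function names, the scoring formula, the threshold value, and the example label lists are all hypothetical.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of instances shares a cluster.

    `labelings` is a list of label lists, one per clusterer, all of equal length.
    Cluster label values need not match across clusterers.
    """
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def consistency_scores(labelings):
    """Assumed scoring rule (not from the paper): mean of |2*c_ij - 1| over j != i.

    A score near 1 means the clusterers agree about instance i's neighbours;
    a score near 0 means they are split.
    """
    co = co_association(labelings)
    n = len(co)
    scores = []
    for i in range(n):
        decisiveness = [abs(2 * co[i][j] - 1) for j in range(n) if j != i]
        scores.append(sum(decisiveness) / len(decisiveness))
    return scores


def flag_noisy_instances(labelings, threshold=0.8):
    """Return indices whose consistency score falls below the (hypothetical) threshold."""
    scores = consistency_scores(labelings)
    return [i for i, s in enumerate(scores) if s < threshold]


# Hypothetical output of three clusterers on seven instances.
# Instance 6 switches sides between clusterers; the third clusterer
# uses swapped label names, which the co-association matrix ignores.
example_labelings = [
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1],
]
print(flag_noisy_instances(example_labelings))  # → [6]
```

Working from pairwise co-membership rather than raw labels avoids having to align cluster labels across clusterers. In a pruning pipeline of this kind, the flagged indices would be removed before the final clustering is run on the remaining instances.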

Research Area(s)

  • Clustering analysis, Clustering ensembles, Data pruning

Citation Format(s)

Unsupervised data pruning for clustering of noisy data. / Hong, Yi; Kwong, Sam; Chang, Yuchou; Ren, Qingsheng.

In: Knowledge-Based Systems, Vol. 21, No. 7, 10.2008, p. 612-616.
