GCA : A real-time grid-based clustering algorithm for large data set

Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) > 32_Refereed conference paper (with ISBN/ISSN) > peer-review

8 Scopus Citations

Author(s)

Yu, Zhiwen; Wong, Hau-San

Detail(s)

Original language: English
Title of host publication: Proceedings - International Conference on Pattern Recognition
Pages: 740-743
Volume: 2
Publication status: Published - 2006

Publication series

Name
Volume: 2
ISSN (Print): 1051-4651

Conference

Title: 18th International Conference on Pattern Recognition, ICPR 2006
Place: China
City: Hong Kong
Period: 20 - 24 August 2006

Abstract

Few existing unsupervised learning (clustering) algorithms consider clustering data points in a low-dimensional subspace in real time. In this paper, we present a grid-based clustering algorithm (GCA) with O(n) time complexity. Unlike previous clustering algorithms, GCA pays more attention to the running time of the algorithm. GCA achieves low running time by (i) determining the number of clusters according to the point density of the grid cells and (ii) computing the distances between the cluster centers and the grid cells rather than the data points. To make GCA more efficient, principal component analysis (PCA) is introduced to transform the data points from a high-dimensional space to a low-dimensional one. Finally, we analyze the performance of GCA and show that it outperforms most current state-of-the-art methods in terms of efficiency. In particular, it outperforms the k-means algorithm by several orders of magnitude in running time. © 2006 IEEE.
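
The abstract gives only the outline of the method, so the following is a minimal Python sketch of the general idea and not the authors' implementation. It projects the data with PCA, bins the projected points into a uniform grid, picks dense cells as cluster centers, and assigns each occupied cell (not each point) to its nearest center. The function names `pca_project` and `grid_cluster`, the parameters `bins`, `min_frac`, and `min_sep`, and the greedy rule for choosing centers are all assumptions made for illustration; the paper may define density and centers differently.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def grid_cluster(X, bins=10, min_frac=0.05, min_sep=2, n_components=2):
    """Hypothetical grid-based clustering sketch; returns (labels, n_clusters)."""
    Z = pca_project(np.asarray(X, dtype=float), n_components)

    # Uniform square cells: cell width is the largest axis span divided by `bins`.
    lo = Z.min(axis=0)
    span = (Z.max(axis=0) - lo).max()
    cell = span / bins if span > 0 else 1.0
    idx = np.minimum(((Z - lo) / cell).astype(int), bins - 1)

    # Occupied cells, the cell index of every point, and per-cell point counts.
    cells, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()

    # Assumed rule: visit cells from densest to sparsest and accept a cell as a
    # center if it is dense enough and more than `min_sep` cells (Chebyshev
    # distance) away from every center accepted so far.
    min_count = max(1, int(min_frac * len(Z)))
    order = np.argsort(-counts)
    centers = []
    for c in order:
        if counts[c] < min_count:
            break
        if all(np.abs(cells[c] - cells[k]).max() > min_sep for k in centers):
            centers.append(c)
    if not centers:
        centers = [order[0]]  # fall back to the single densest cell

    # Distances are computed between occupied cells and centers, not points.
    diff = cells[:, None, :] - cells[centers][None, :, :]
    cell_labels = np.linalg.norm(diff, axis=2).argmin(axis=1)

    # Each point inherits the label of the cell it falls in.
    return cell_labels[inverse], len(centers)
```

Calling `grid_cluster(X)` on an array of shape `(n_points, n_features)` returns one integer label per point and the number of clusters found. Note that `np.unique` sorts the cell indices, so the binning step in this sketch is O(n log n); counting cells with a hash map would keep that step linear in the number of points, in the spirit of the O(n) bound stated in the abstract.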

Citation Format(s)

GCA : A real-time grid-based clustering algorithm for large data set. / Yu, Zhiwen; Wong, Hau-San.

Proceedings - International Conference on Pattern Recognition. Vol. 2 2006. p. 740-743 1699311.
