Data Clustering with Cluster Size Constraints Using a Modified k-means Algorithm

Nuwan Ganganath*, Chi-Tsun Cheng, Chi K. Tse

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Data clustering is a frequently used technique in finance, computer science, and engineering. In most applications, cluster sizes are either constrained to particular values or available as prior knowledge. Unfortunately, traditional clustering methods cannot impose constraints on cluster sizes. In this paper, we propose some vital modifications to the standard k-means algorithm such that it can incorporate size constraints for each cluster separately. The modified k-means algorithm can be used to obtain clusters of preferred sizes. A potential application would be obtaining clusters of equal size. Moreover, the modified algorithm makes use of prior knowledge of the given data set to selectively initialize the cluster centroids, which helps it escape from local minima. Simulation results on multidimensional data demonstrate that the k-means algorithm with the proposed modifications can fulfill cluster size constraints and lead to more accurate and robust results.
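
The abstract does not spell out the proposed modifications, so the following is only a minimal sketch of one simple way to enforce per-cluster size constraints inside a k-means loop: a greedy, capacity-limited assignment step. It is not the authors' algorithm. The names `constrained_kmeans`, `sizes`, `n_iter`, and `seed` are hypothetical, and the centroids are initialized at random here rather than from prior knowledge as the paper describes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, sizes, n_iter=20, seed=0):
    """Illustrative size-constrained k-means (assumption, not the paper's method).

    points: list of equal-length numeric tuples.
    sizes:  required number of points per cluster; must sum to len(points).
    Returns (labels, centroids).
    """
    if sum(sizes) != len(points):
        raise ValueError("sizes must sum to the number of points")
    rng = random.Random(seed)
    k = len(sizes)
    # Random initialization from the data (the paper instead uses prior knowledge).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)

    for _ in range(n_iter):
        # Assignment step: visit (point, cluster) pairs from closest to farthest
        # and accept a pair only if the point is free and the cluster has room.
        pairs = sorted(
            (sq_dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        new_labels = [-1] * len(points)
        remaining = list(sizes)
        for _, i, j in pairs:
            if new_labels[i] == -1 and remaining[j] > 0:
                new_labels[i] = j
                remaining[j] -= 1

        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    return labels, centroids
```

Calling `constrained_kmeans(points, [3, 3])` on six points requests two clusters of equal size, the application the abstract mentions. The greedy step is a heuristic: it always satisfies the size constraints but does not minimize the total within-cluster distance in general.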
Original language: English
Title of host publication: Proceedings - 2014 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery
Publisher: IEEE
Pages: 158-161
ISBN (Print): 9781479962358, 9781479962365
Publication status: Published - Oct 2014
Externally published: Yes
Event: 6th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC 2014) - Shanghai, China
Duration: 10 Oct 2014 to 12 Oct 2014

Conference

Conference: 6th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC 2014)
Place: China
City: Shanghai
Period: 10/10/14 to 12/10/14

Research Keywords

  • constrained clustering
  • data clustering
  • data mining
  • k-means
  • size constraints
