Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering

Kaile Zhou*, Shanlin Yang

*Corresponding author for this work

    Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

    59 Citations (Scopus)

    Abstract

    Data distribution has a significant impact on clustering results. This study focuses on the effect of cluster size distribution on clustering, namely the uniform effect of k-means and fuzzy c-means (FCM) clustering. We first review related work on k-means and FCM clustering. Then, a structure decomposition analysis of the objective functions of k-means and FCM is presented. Afterward, extensive experiments are conducted on synthetic two-dimensional and three-dimensional data sets and on real-world data sets from the UCI machine learning repository. The results demonstrate that FCM has a stronger uniform effect than k-means clustering. They also reveal that the fuzzifier value m = 2 in FCM, which has been widely adopted in many applications, is not a good choice, particularly for data sets with great variation in cluster sizes. Therefore, for data sets with significantly uneven cluster size distributions, a smaller fuzzifier value is preferred for FCM clustering, and k-means clustering is a better choice than FCM clustering.
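
The comparison described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names (`kmeans`, `fcm`, `size_cv`) are hypothetical, the FCM update rules are the standard textbook ones, and using the coefficient of variation of cluster sizes as a measure of how uniform a partition is, is an assumption made here for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (centers, hard labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment against the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def fcm(X, c, m=2.0, n_iter=100, seed=0, eps=1e-12):
    """Standard fuzzy c-means with fuzzifier m > 1.

    Returns (centers, U), where U[i, j] is the membership of point i
    in cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distances; eps avoids division by zero.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
        # u_ij is proportional to d_ij^(-2/(m-1)), normalised per point.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U


def size_cv(labels, k):
    """Coefficient of variation of cluster sizes (0 means equal sizes)."""
    sizes = np.bincount(labels, minlength=k)
    return sizes.std() / sizes.mean()
```

To try it, one could generate a large and a small Gaussian cluster, run `kmeans(X, 2)` and `fcm(X, 2, m=m)` for several values of `m`, harden the FCM result with `U.argmax(axis=1)`, and compare `size_cv` of each partition with that of the true labels. A partition whose `size_cv` falls well below the true value has been pulled toward equal-sized clusters.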
    Original language: English
    Pages (from-to): 455–466
    Journal: Pattern Analysis and Applications
    Volume: 23
    Online published: 6 Mar 2019
    Publication status: Published - Feb 2020

    Bibliographical note

    Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

    Research Keywords

    • Clustering
    • Data distribution
    • Fuzzifier
    • Fuzzy c-means (FCM)
    • k-means
    • Uniform effect
