Scalable Co-Clustering for Large-Scale Data Through Dynamic Partitioning and Hierarchical Merging

Zihan Wu*, Zhaoke Huang, Hong Yan

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices. © 2024 IEEE
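
The partition-then-merge pattern described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the fixed grid partition, the thresholding co-clusterer `toy_cocluster`, the Jaccard-based merge rule, and every function and parameter name below are assumptions chosen for illustration. The paper's probabilistic choice of submatrix configuration is not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition(n_rows, n_cols, block_rows, block_cols):
    """Yield (row_indices, col_indices) for a grid of non-overlapping blocks."""
    for r in range(0, n_rows, block_rows):
        for c in range(0, n_cols, block_cols):
            yield (range(r, min(r + block_rows, n_rows)),
                   range(c, min(c + block_cols, n_cols)))


def toy_cocluster(matrix, rows, cols, threshold):
    """Hypothetical stand-in for a real co-clusterer.

    Keeps the rows of the block whose mean exceeds `threshold`, then the
    columns whose mean over those rows exceeds it. Returns None if empty.
    """
    rows, cols = list(rows), list(cols)
    keep_rows = [r for r in rows
                 if sum(matrix[r][c] for c in cols) / len(cols) > threshold]
    if not keep_rows:
        return None
    keep_cols = [c for c in cols
                 if sum(matrix[r][c] for r in keep_rows) / len(keep_rows) > threshold]
    if not keep_cols:
        return None
    return frozenset(keep_rows), frozenset(keep_cols)


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def merge_coclusters(coclusters, min_overlap):
    """Repeatedly merge any pair whose row sets or column sets overlap enough."""
    clusters = list(coclusters)
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            (r1, c1), (r2, c2) = clusters[i], clusters[j]
            if jaccard(r1, r2) >= min_overlap or jaccard(c1, c2) >= min_overlap:
                clusters[i] = (r1 | r2, c1 | c2)  # union of both co-clusters
                del clusters[j]
                merged = True
                break  # restart the scan after every merge
    return clusters


def scalable_cocluster(matrix, block_rows, block_cols,
                       threshold=0.4, min_overlap=0.5):
    """Partition the matrix, co-cluster each block in parallel, then merge."""
    blocks = list(partition(len(matrix), len(matrix[0]), block_rows, block_cols))
    with ThreadPoolExecutor() as pool:
        found = pool.map(
            lambda b: toy_cocluster(matrix, b[0], b[1], threshold), blocks)
        local = [c for c in found if c is not None]
    return merge_coclusters(local, min_overlap)


# An 8x8 binary matrix with one planted co-cluster on rows 2-5, columns 2-5.
# The planted block straddles all four 4x4 submatrices.
demo = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
        for r in range(8)]
result = scalable_cocluster(demo, block_rows=4, block_cols=4)
```

In this toy setting each submatrix sees only a fragment of the planted co-cluster, and the merge step stitches the four fragments back together. The union-based merge rule can over-merge on real data, so a practical version would likely score candidate merges before accepting them.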
Original language: English
Title of host publication: 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Publisher: IEEE
Pages: 4686-4691
Number of pages: 6
ISBN (Electronic): 978-1-6654-1020-5
ISBN (Print): 978-1-6654-1021-2
Publication status: Published - 20 Jan 2025
Event: 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2024): Sustainable Futures: Harmonizing Humanity and Technology for a Thriving World - Borneo Convention Centre Kuching, Sarawak, Malaysia
Duration: 6 Oct 2024 to 10 Oct 2024
https://www.ieeesmc2024.org/home

Conference

Conference: 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2024)
Abbreviated title: IEEE SMC 2024
Place: Malaysia
City: Sarawak
Period: 6/10/24 to 10/10/24

Funding

This work is supported by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA) and the Hong Kong Research Grants Council (Project CityU 11204821).

RGC Funding Information

  • RGC-funded
