Projects per year
Abstract
Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science, and statistics. Among others, the Stochastic Block Model (SBM) serves as a canonical model for community detection and clustering, and the Massively Parallel Computation (MPC) model is a mathematical abstraction of real-world parallel computing systems which provides a powerful computational framework for handling large-scale datasets. We study the problem of exactly recovering the communities in a graph generated from the SBM in the MPC model. Specifically, given kn vertices that are partitioned into k equal-sized clusters (i.e., each has size n), a graph on these kn vertices is randomly generated such that each pair of vertices is connected with probability p if they are in the same cluster and with probability q if not, where p > q > 0.
We give an MPC algorithm that recovers the ground-truth clusters when (p-q)/√p ≥˜Ω(k1/2n(-1/2+1/(2r-2))) for any integer r ∈ [3, O(log n)] in O(kr/δ) rounds in the sublinear space MPC model, where each machine has local memory O(nδ) for some constant δ > 0. When (p-q)/√p≥ ˜Ω(k3/4n-1/4), we also give an MPC clustering algorithm that works in O(logs n) rounds in the s-space MPC model where each machine is only guaranteed to have memory s = Ω(log n). To implement the latter algorithm, we propose new algorithms for some basic graph operations in the s-space MPC model. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave an algorithm that only works in the sublinear space MPC model with a much stronger condition on p, q, k. Our algorithms are based on collecting the r-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices.
Original language | English |
---|---|
Title of host publication | 31st Annual European Symposium on Algorithms (ESA 2023) |
Publisher | Schloss Dagstuhl – Leibniz-Zentrum für Informatik |
Pages | 78:1–78:17 |
ISBN (Electronic) | 978-3-95977-295-2 |
DOIs | |
Publication status | Published - Sept 2023 |
Event | 31st Annual European Symposium on Algorithms (ESA 2023) - Amsterdam, Netherlands Duration: 4 Sept 2023 → 6 Sept 2023 https://algo-conference.org/2023/esa/ https://algo-conference.org/esa/ |
Conference
Conference | 31st Annual European Symposium on Algorithms (ESA 2023) |
---|---|
Abbreviated title | ESA 2023 |
Country/Territory | Netherlands |
City | Amsterdam |
Period | 4/09/23 → 6/09/23 |
Internet address |
Funding
Zelin Li and Pan Peng: Supported in part by NSFC grant 62272431 and “the Fundamental Research Funds for the Central Universities”. Xianbin Zhu: Supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 11213620].
Research Keywords
- Massively Parallel Computation
- k-Clustering
- SBM
- Graph algorithms
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
Fingerprint
Dive into the research topics of 'Massively Parallel Algorithms for the Stochastic Block Model'. Together they form a unique fingerprint.Projects
- 1 Finished
-
GRF: Algorithmic Foundations of Distributed Large-Scale Graph Processing
LI, M. (Principal Investigator / Project Coordinator), Konrad, C. (Co-Investigator) & ROBINSON, P. (Co-Investigator)
1/01/21 → 5/06/25
Project: Research