Massively Parallel Algorithms for the Stochastic Block Model

Zelin Li*, Pan Peng*, Xianbin Zhu*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary WorksRGC 32 - Refereed conference paper (with host publication)peer-review

20 Downloads (CityUHK Scholars)

Abstract

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science, and statistics. Among others, the Stochastic Block Model (SBM) serves as a canonical model for community detection and clustering, and the Massively Parallel Computation (MPC) model is a mathematical abstraction of real-world parallel computing systems which provides a powerful computational framework for handling large-scale datasets. We study the problem of exactly recovering the communities in a graph generated from the SBM in the MPC model. Specifically, given kn vertices that are partitioned into k equal-sized clusters (i.e., each has size n), a graph on these kn vertices is randomly generated such that each pair of vertices is connected with probability p if they are in the same cluster and with probability q if not, where p > q > 0.

 

We give an MPC algorithm that recovers the ground-truth clusters when (p-q)/√p ≥˜Ω(k1/2n(-1/2+1/(2r-2))) for any integer ∈ [3, O(log n)] in O(kr/δ) rounds in the sublinear space MPC model, where each machine has local memory O(nδ) for some constant δ > 0. When (p-q)/√p≥ ˜Ω(k3/4n-1/4), we also give an MPC clustering algorithm that works in O(logn) rounds in the s-space MPC model where each machine is only guaranteed to have memory s = Ω(log n). To implement the latter algorithm, we propose new algorithms for some basic graph operations in the s-space MPC model. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave an algorithm that only works in the sublinear space MPC model with a much stronger condition on p, q, k.  Our algorithms are based on collecting the r-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices.

Original languageEnglish
Title of host publication31st Annual European Symposium on Algorithms (ESA 2023)
PublisherSchloss Dagstuhl – Leibniz-Zentrum für Informatik
Pages78:1–78:17
ISBN (Electronic)978-3-95977-295-2
DOIs
Publication statusPublished - Sept 2023
Event31st Annual European Symposium on Algorithms (ESA 2023) - Amsterdam, Netherlands
Duration: 4 Sept 20236 Sept 2023
https://algo-conference.org/2023/esa/
https://algo-conference.org/esa/

Conference

Conference31st Annual European Symposium on Algorithms (ESA 2023)
Abbreviated titleESA 2023
Country/TerritoryNetherlands
CityAmsterdam
Period4/09/236/09/23
Internet address

Funding

Zelin Li and Pan Peng: Supported in part by NSFC grant 62272431 and “the Fundamental Research Funds for the Central Universities”. Xianbin Zhu: Supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 11213620].

Research Keywords

  • Massively Parallel Computation
  • k-Clustering
  • SBM
  • Graph algorithms

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/

Fingerprint

Dive into the research topics of 'Massively Parallel Algorithms for the Stochastic Block Model'. Together they form a unique fingerprint.

Cite this