Towards big topic modeling

JianFeng Yan, Jia Zeng*, Zhi-Qiang Liu, Lu Yang, Yang Gao

*Corresponding author for this work

Research output: Journal Publications and Reviews (RGC 21 - Publication in refereed journal, peer-reviewed)

4 Citations (Scopus)

Abstract

To solve the big topic modeling problem, we need to reduce both the time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on multi-processor architectures have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalability problem. To reduce the communication complexity among processors to achieve improved scalability, we propose a novel communication-efficient parallel topic modeling architecture based on a power law, which consumes orders of magnitude less communication time when the number of topics is large. We combine the proposed communication-efficient parallel architecture with the online belief propagation (OBP) algorithm, referred to as POBP, for big topic modeling tasks. Extensive empirical results confirm that POBP has the following advantages for solving the big topic modeling problem when compared with recent state-of-the-art parallel LDA algorithms on multi-processor architectures: (1) high accuracy, (2) high communication efficiency, (3) high speed, and (4) constant memory usage.
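The power-law idea behind the communication-efficient architecture can be illustrated with a small sketch (a hypothetical simulation, not the paper's actual implementation): because word occurrences in a corpus follow a power law, most entries of the word-topic count matrix that processors must synchronize are zero, so transmitting only the non-zero entries in sparse form sends far fewer values than a dense exchange scaling with vocabulary size times the number of topics.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K = 10_000, 1_000  # vocabulary size, number of topics (illustrative values)

# Zipf-like (power-law) word frequencies: a few words dominate the corpus.
freq = (1.0 / np.arange(1, V + 1)) ** 1.1
freq /= freq.sum()

# Simulate a word-topic count matrix: each word's tokens concentrate on a
# handful of topics, as typically observed after LDA inference.
n_tokens = 1_000_000
word_counts = rng.multinomial(n_tokens, freq)
nwk = np.zeros((V, K), dtype=np.int64)
for w, c in enumerate(word_counts):
    if c == 0:
        continue
    topics = rng.choice(K, size=8, replace=False)
    nwk[w, topics] = rng.multinomial(c, np.ones(8) / 8)

dense_entries = V * K                  # what a naive synchronization sends
sparse_entries = int((nwk > 0).sum())  # (word, topic, count) triples only

print(f"dense:     {dense_entries:,} entries")
print(f"sparse:    {sparse_entries:,} entries")
print(f"reduction: {dense_entries / sparse_entries:.0f}x")
```

With these assumed sizes the sparse exchange is two orders of magnitude smaller than the dense one, and the gap widens as the number of topics grows, which matches the scalability argument in the abstract.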
Original language: English
Pages (from-to): 15-31
Journal: Information Sciences
Volume: 390
DOIs
Publication status: Published - 1 Jun 2017

Research Keywords

  • Big topic modeling
  • Communication complexity
  • Latent Dirichlet allocation
  • Multi-processor architecture
  • Online belief propagation
  • Power law
