TY - JOUR
T1 - Towards big topic modeling
AU - Yan, JianFeng
AU - Zeng, Jia
AU - Liu, Zhi-Qiang
AU - Yang, Lu
AU - Gao, Yang
PY - 2017/6/1
Y1 - 2017/6/1
AB - To solve the big topic modeling problem, we need to reduce both the time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on multi-processor architectures have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalability problem. To reduce the communication complexity among processors to achieve improved scalability, we propose a novel communication-efficient parallel topic modeling architecture based on a power law, which consumes orders of magnitude less communication time when the number of topics is large. We combine the proposed communication-efficient parallel architecture with the online belief propagation (OBP) algorithm, referred to as POBP, for big topic modeling tasks. Extensive empirical results confirm that POBP has the following advantages for solving the big topic modeling problem when compared with recent state-of-the-art parallel LDA algorithms on multi-processor architectures: (1) high accuracy, (2) high communication efficiency, (3) high speed, and (4) constant memory usage.
KW - Big topic modeling
KW - Communication complexity
KW - Latent Dirichlet allocation
KW - Multi-processor architecture
KW - Online belief propagation
KW - Power law
UR - http://www.scopus.com/inward/record.url?scp=85010653950&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85010653950&origin=recordpage
U2 - 10.1016/j.ins.2016.12.014
DO - 10.1016/j.ins.2016.12.014
M3 - RGC 21 - Publication in refereed journal
SN - 0020-0255
VL - 390
SP - 15
EP - 31
JO - Information Sciences
JF - Information Sciences
ER -