TY - JOUR
T1 - Powered embarrassing parallel MCMC sampling in Bayesian inference, a weighted average intuition
AU - Li, Song
AU - Tso, Geoffrey K.F.
AU - Long, Lufan
PY - 2017/11
Y1 - 2017/11
N2 - Although Markov Chain Monte Carlo (MCMC) is very popular in parameter inference, alleviating the computational burden is crucial due to processor, memory, and disk bottlenecks, especially when handling big data. In recent years, researchers have developed parallel MCMC algorithms in which the full data are partitioned into subdatasets. Samples are drawn from the subdatasets independently on different machines without communication. In the extant literature, all machines are deemed identical. However, due to the heterogeneity of the data assigned to different machines and the random nature of MCMC, the assumption of “identical machines” is questionable. Here we propose a Powered Embarrassing Parallel MCMC (PEPMCMC) algorithm, in which the full data posterior density is the product of the sub-posterior densities (the posterior densities of the different subdatasets) raised to constrained powers. This is proven to be equivalent to a weighted averaging procedure. In our work, the powers are determined by a maximum likelihood criterion, which leads to finding a maximum likelihood point within the convex hull of the estimates from the different machines. We prove the asymptotic exactness of the algorithm and apply it to several cases to verify its strength in comparison with unparallel and unpowered parallel algorithms. Furthermore, the connection between normal kernel density and parametric density estimation under certain conditions is investigated.
KW - Markov Chain Monte Carlo
KW - Maximum likelihood
KW - Powered parallel
KW - Weighted average
UR - http://www.scopus.com/inward/record.url?scp=85020300016&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85020300016&origin=recordpage
U2 - 10.1016/j.csda.2017.05.005
DO - 10.1016/j.csda.2017.05.005
M3 - Journal article
VL - 115
SP - 11
EP - 20
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
SN - 0167-9473
ER -