Powered embarrassing parallel MCMC sampling in Bayesian inference, a weighted average intuition

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review

3 Scopus Citations
View graph of relations

Author(s)

Related Research Unit(s)

Detail(s)

Original languageEnglish
Pages (from-to)11-20
Journal / PublicationComputational Statistics and Data Analysis
Volume115
Online published25 May 2017
Publication statusPublished - Nov 2017

Abstract

Although the Markov Chain Monte Carlo (MCMC) is very popular in parameter inference, the alleviation of the burden of calculation is crucial due to the limit of processors, memory, and disk bottleneck. This is especially true in terms of handling big data. In recent years, researchers have developed a parallel MCMC algorithm, in which full data are partitioned into subdatasets. Samples are drawn from the subdatasets independently at different machines without communication. In the extant literature, all machines are deemed to be identical. However, due to the heterogeneity of the data put into different machines, and the random nature of MCMC, the assumption of “identical machines” is questionable. Here we propose a Powered Embarrassing Parallel MCMC (PEPMCMC) algorithm, in which the full data posterior density is the product of the sub-posterior densities (posterior densities of different subdatasets) raised by some constraint powers. This is proven to be equivalent to a weighted averaging procedure. In our work, the powers are determined based on a maximum likelihood criterion, which leads to finding a maximum likelihood point within the convex hull of the estimates from different machines. We prove the asymptotic exactness and apply it to several cases to verify its strength in comparison with the unparallel and unpowered parallel algorithms. Furthermore, the connection between normal kernel density and parametric density estimations under certain conditions is investigated.

Research Area(s)

  • Markov Chain Monte Carlo, Maximum likelihood, Powered parallel, Weighted average