Studies on Improving the Sampling Efficiency of Markov Chain Monte Carlo Algorithms

關於提高馬爾可夫鏈蒙特卡羅算法抽樣效率的研究

Student thesis: Doctoral Thesis

Award date: 26 Aug 2019

Abstract

Markov chain Monte Carlo (MCMC) is a class of sampling algorithms that is widely used because it can sample from complex distributions and is easy to implement. Various MCMC algorithms have been developed, for example the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970), slice sampling (Neal, 2003) and Hamiltonian Monte Carlo (Neal, 2011). However, MCMC suffers from some problems that greatly limit its sampling efficiency. In the MCMC literature, `efficiency' is a general term that covers generating accurate samples from the target distribution, converging to the target distribution in less time and consuming fewer computing resources. One problem is that MCMC carries a heavy computational burden when the rejection rate is high and when parameters are estimated from a large-scale dataset. Another problem is that MCMC methods have difficulty dealing with multi-modal distributions. Nowadays, with rapidly increasing data volume and complexity, heavy data loads and multi-modality are quite common, so it is urgent to address these issues. In this thesis, three MCMC algorithms are proposed: 1. a powered embarrassingly parallel MCMC (PEPMCMC) algorithm, 2. a parallel adaptive regional pseudo-prior generalized elliptical slice sampling (RGESS) algorithm and 3. an error-corrected neural network gradient Hamiltonian Monte Carlo (ECNNGHMC) algorithm. The three algorithms improve existing MCMC sampling methods in three different directions, but all aim to improve sampling efficiency. Next, we describe the main ideas and strengths of the proposed methods.
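To make the rejection-rate issue concrete, the first of these classical samplers can be sketched as follows. This is a minimal random-walk Metropolis-Hastings sketch; the function names, the Gaussian proposal and the step size are illustrative, not any implementation from the thesis:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_target: unnormalized log-density of the target distribution.
    A proposal is accepted with probability min(1, p(x')/p(x)); a
    rejected proposal repeats the current state, which is why a high
    rejection rate wastes computation.
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, step)      # symmetric proposal
        logp_prop = log_target(x_prop)
        if math.log(rng.random()) < logp_prop - logp:
            x, logp = x_prop, logp_prop        # accept
        samples.append(x)                      # else keep current x
    return samples

# Sample from a standard normal via its unnormalized log-density.
draws = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```

Because each rejected proposal still costs a full evaluation of the target density, the chain's efficiency drops directly with the rejection rate, which is the bottleneck the algorithms below attack from different directions.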

For the PEPMCMC algorithm, we partition the full dataset into different sub-datasets, and we account for the heterogeneity of the sub-datasets by raising the sub-posterior of each sub-dataset to a power. The powers are determined using maximum likelihood estimation (MLE). Interestingly, the final result is proven to be a weighted average of the results on the different sub-datasets, where the weights are a function of the powers. Experimental results indicate that the PEPMCMC algorithm achieves much better sampling efficiency and estimation accuracy than algorithms without power terms. In addition, the connection between normal kernel density estimation and parametric density estimation under certain conditions is investigated.
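The powered embarrassingly-parallel idea can be illustrated on a toy conjugate model. This is a sketch under strong simplifying assumptions: a normal mean with known unit variance and a flat prior, so each powered sub-posterior is available in closed form, and the proportional weighting rule below is only a stand-in for the thesis's weights, which are a function of the MLE-estimated powers:

```python
import math
import random

def powered_subposterior_draws(shard, power, n_draws, seed):
    """Draws from the powered sub-posterior of a normal mean with known
    unit variance and a flat prior.  Raising the sub-posterior to
    `power` scales its precision, giving N(mean(shard), 1/(power*n)).
    In general one would run an MCMC sampler on the powered
    sub-posterior instead of using this closed form."""
    rng = random.Random(seed)
    n = len(shard)
    mean = sum(shard) / n
    sd = math.sqrt(1.0 / (power * n))
    return [rng.gauss(mean, sd) for _ in range(n_draws)]

def combine(sub_draws, powers):
    """Combine per-shard draws by a weighted average, with weights
    proportional to the powers (an illustrative stand-in, not the
    thesis's derived weighting)."""
    total = sum(powers)
    n = len(sub_draws[0])
    return [sum(p * d[i] for p, d in zip(powers, sub_draws)) / total
            for i in range(n)]

# Toy run: data from N(2, 1) split into three shards.
rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(300)]
shards = [data[0:100], data[100:200], data[200:300]]
powers = [1.0, 1.0, 1.0]
sub = [powered_subposterior_draws(s, p, 2000, seed=i)
       for i, (s, p) in enumerate(zip(shards, powers))]
full_draws = combine(sub, powers)
```

The appeal of the embarrassingly-parallel scheme is that each shard's sampler runs independently with no communication, so the expensive MCMC phase parallelizes perfectly and only the cheap combination step touches all shards.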

The RGESS algorithm is proposed to explore distant modes of the target distribution and reduce the rejection rate. MCMC algorithms are well known for having difficulty exploring distant modes when the target distribution is multi-modal, because a proposed state is likely to be rejected when traversing low-density regions. Focusing on this issue, we propose a parallel generalized elliptical slice sampling algorithm with adaptive regional pseudo-priors. Different pseudo-priors are used in different regions to conduct the generalized elliptical slice sampling (GESS) algorithm of Fagan et al. (2016) and Nishihara et al. (2014). The acceptance probability is modified to guarantee the detailed balance condition. We also employ an adaptive transition kernel and parallel computing to accelerate sampling. Experimental results on one synthetic and one real-world dataset show that the proposed algorithm has the following advantages: with the same starting points, it spends less time finding the modes of the target distribution; it has lower rejection rates owing to parameter adaptation; and, when estimating the parameters of multi-modal posterior distributions, the samples it generates cover the different modes well.
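The elliptical slice move that GESS builds on never rejects: the angle bracket on the ellipse shrinks until an acceptable point is found. Below is a minimal one-dimensional sketch of that base move (Murray et al.'s elliptical slice sampling) with a fixed standard normal prior; in GESS and its regional pseudo-prior variant, fitted (regional) pseudo-priors would take the place of this fixed prior, and all names here are illustrative:

```python
import math
import random

def elliptical_slice_step(x, log_lik, rng):
    """One elliptical slice sampling update for a target proportional
    to a N(0, 1) prior times exp(log_lik(x)).  The angle bracket is
    shrunk toward theta = 0 (the current state), so the loop always
    terminates and no proposal is ever rejected outright."""
    nu = rng.gauss(0.0, 1.0)                        # auxiliary prior draw
    log_y = log_lik(x) + math.log(rng.random())     # slice height
    theta = rng.uniform(0.0, 2.0 * math.pi)
    theta_min, theta_max = theta - 2.0 * math.pi, theta
    while True:
        x_new = x * math.cos(theta) + nu * math.sin(theta)
        if log_lik(x_new) > log_y:
            return x_new                            # on-slice point found
        # Shrink the bracket toward theta = 0 and retry.
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# Toy target: N(0,1) prior times N(x; 1, 1) likelihood,
# i.e. posterior N(0.5, 0.5).
rng = random.Random(0)
x, chain = 0.0, []
for _ in range(10000):
    x = elliptical_slice_step(x, lambda t: -0.5 * (t - 1.0) ** 2, rng)
    chain.append(x)
```

Because the move follows an ellipse drawn from the (pseudo-)prior, a pseudo-prior fitted to a local region lets the ellipse reach across that region, which is what makes regional pseudo-priors helpful for hopping between distant modes.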

For the ECNNGHMC algorithm, we add the friction term from stochastic gradient Hamiltonian Monte Carlo (SGHMC; Chen et al., 2014) to the neural network gradient Hamiltonian Monte Carlo (NNGHMC; Li et al., 2018). In this way, the ECNNGHMC algorithm not only exploits the approximation power of the neural network to reduce the time taken to compute gradients on a large-scale dataset, but also attains a rejection rate of $0$ because of the friction term. Therefore, the ECNNGHMC algorithm is much more efficient than the NNGHMC algorithm. Furthermore, the neural network in ECNNGHMC can adopt newly developed architectural techniques, such as deeper layers and skip connections, to achieve better accuracy.
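A minimal sketch of the SGHMC-style friction dynamics (Chen et al., 2014) that ECNNGHMC inherits is given below. Here `grad_u` is an ordinary Python function standing in for the trained neural-network gradient approximator, and the step size and friction values are illustrative, not tuned settings from the thesis:

```python
import math
import random

def sghmc(grad_u, x0, n_steps, eta=0.01, alpha=0.1, seed=0):
    """SGHMC-style dynamics with a friction term.

    grad_u approximates the gradient of the potential U(x) = -log p(x);
    in ECNNGHMC this role is played by a neural network.  The friction
    (alpha) together with matched injected noise compensates for
    gradient error, so every step is kept: there is no Metropolis
    accept/reject, i.e. the rejection rate is 0.
    """
    rng = random.Random(seed)
    x, v = x0, 0.0
    noise_sd = math.sqrt(2.0 * alpha * eta)
    samples = []
    for _ in range(n_steps):
        x += v
        v += -eta * grad_u(x) - alpha * v + rng.gauss(0.0, noise_sd)
        samples.append(x)
    return samples

# Toy target: standard normal, so grad U(x) = x.
chain = sghmc(lambda x: x, 0.0, 50000)
```

The key design point is that removing the accept/reject step eliminates wasted gradient evaluations entirely, at the price of a small, step-size-controlled discretization bias.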

Research areas

  • Markov chain Monte Carlo, Efficiency