Studies on Improving the Sampling Efficiency of Markov Chain Monte Carlo Algorithms
關於提高馬爾可夫鏈蒙特卡羅算法抽樣效率的研究
Student thesis: Doctoral Thesis
Award date  26 Aug 2019 
Link(s)
Permanent Link  https://scholars.cityu.edu.hk/en/theses/theses(1445b66c3ea647afb78646caa406cd52).html 

Abstract
Markov chain Monte Carlo (MCMC) is a class of sampling algorithms that is widely used because it can sample from complex distributions and is easy to implement. Various MCMC algorithms have been developed, for example the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970), slice sampling (Neal, 2003) and Hamiltonian Monte Carlo (Neal, 2011). However, MCMC suffers from some problems that greatly limit its sampling efficiency. In the MCMC literature, `efficiency' is a general term that covers generating accurate samples from the target distribution, converging to the target distribution in less time and consuming fewer computing resources. One problem is that MCMC carries a heavy computational burden when the rejection rate is high and when parameters are estimated from a large-scale dataset. Another problem is that MCMC has difficulty dealing with multimodal distributions. Nowadays, with rapidly increasing data volume and complexity, heavy data loads and multimodality are quite common, so it is urgent to address these issues. In this thesis, three MCMC algorithms are proposed: 1. a powered embarrassingly parallel MCMC (PEPMCMC) algorithm, 2. a parallel adaptive regional pseudo-prior generalized elliptical slice sampling (RGESS) algorithm and 3. an error-corrected neural network gradient Hamiltonian Monte Carlo (ECNNGHMC) algorithm. The three algorithms improve existing MCMC sampling methods in three different directions, but all aim to improve sampling efficiency. Next, we describe the main ideas and strengths of the proposed methods.
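To make the rejection-rate problem concrete, the following is a minimal random-walk Metropolis-Hastings sketch (not code from the thesis; the function name and step size are illustrative). A proposal is accepted with probability min(1, pi(x')/pi(x)); each rejection wastes a target evaluation, which is the computational burden the abstract refers to.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = np.empty(n_samples)
    accepted = 0
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
            accepted += 1
        samples[i] = x          # on rejection, the old state is repeated
    return samples, accepted / n_samples

# Example: sample from a standard normal target, log pi(x) = -x^2/2 + const.
samples, rate = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```

With a multimodal target, a proposal crossing a low-density valley between modes is almost always rejected, which is why vanilla random-walk chains rarely visit distant modes.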
For the PEPMCMC algorithm, we partition the full dataset into different sub-datasets, and we account for the heterogeneity of the sub-datasets by applying a power term to the sub-posterior of each sub-dataset. The powers are determined using maximum likelihood estimation (MLE). Interestingly, the final result is proven to be a weighted average of the results on the different sub-datasets, where the weights are a function of the powers. Experimental results indicate that the PEPMCMC algorithm achieves much better sampling efficiency and estimation accuracy than algorithms without power terms. In addition, the connection between normal kernel density estimation and parametric density estimation under certain conditions is investigated.
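The combination step can be sketched schematically. The thesis derives the exact weight formula from the MLE-fitted powers; the normalised-power weighting and the numeric values below are purely illustrative stand-ins.

```python
import numpy as np

def combine_subposterior_means(theta, powers):
    """Combine per-sub-dataset estimates theta_m with weights that are a
    function of the power terms a_m (here, simple normalisation of the
    powers; the thesis's actual weight formula may differ)."""
    powers = np.asarray(powers, dtype=float)
    weights = powers / powers.sum()      # weights as a function of the powers
    return float(np.dot(weights, np.asarray(theta, dtype=float)))

theta_sub = [1.2, 0.8, 1.0]   # hypothetical sub-posterior means from 3 sub-datasets
powers = [1.5, 0.5, 1.0]      # hypothetical power terms fitted by MLE
combined = combine_subposterior_means(theta_sub, powers)
```

The key point is that each chain runs independently on its own sub-dataset (the "embarrassingly parallel" part), and only the cheap weighted combination touches all results.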
The RGESS algorithm is proposed to explore distant modes of the target distribution and reduce the rejection rate. MCMC algorithms are well known to have difficulty exploring distant modes when the target distribution is multimodal, because a proposed state is likely to be rejected when traversing low-density regions. Focusing on this issue, we propose a parallel generalized elliptical slice sampling (RGESS) algorithm with adaptive regional pseudo-priors. Different pseudo-priors are used in different regions to conduct the generalized elliptical slice sampling (GESS) algorithm of Fagan et al. (2016) and Nishihara et al. (2014). The rejection rate is modified to guarantee the detailed balance condition. We also employ an adaptive transition kernel and parallel computing to accelerate sampling. Experimental results on one synthetic and one real-world dataset show that the proposed algorithm has the following advantages: starting from the same points, it takes less time to find the modes of the target distribution; it has lower rejection rates owing to parameter adaptation; and when estimating the parameters of multimodal posterior distributions, the samples it generates locate the different modes well.
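For context, the building block that GESS and the regional pseudo-prior idea generalise is the elliptical slice sampling update of Murray et al. (2010) for a Gaussian prior. The sketch below is a standard single update, not the RGESS algorithm itself; the Gaussian prior here plays the role that a regional pseudo-prior would play in RGESS.

```python
import numpy as np

def elliptical_slice_step(x, log_lik, prior_draw, rng):
    """One elliptical slice sampling update for a zero-mean Gaussian prior."""
    nu = prior_draw()                           # auxiliary draw from the (pseudo-)prior
    log_y = log_lik(x) + np.log(rng.uniform())  # slice level under the likelihood
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        # All points on the ellipse through x and nu share the prior density.
        candidate = x * np.cos(theta) + nu * np.sin(theta)
        if log_lik(candidate) > log_y:
            return candidate                    # no outright rejection, only shrinkage
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)             # shrink the angle bracket towards x

# Prior N(0,1), likelihood N(1,1): the posterior is N(0.5, 0.5).
rng = np.random.default_rng(1)
x, draws = 0.0, []
for _ in range(5000):
    x = elliptical_slice_step(x, lambda t: -0.5 * (t - 1.0) ** 2,
                              rng.standard_normal, rng)
    draws.append(x)
```

Because the bracket shrinks until a point on the slice is found, each update always moves (or stays) without a Metropolis reject step, which is the property RGESS exploits to keep rejection rates low.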
For the ECNNGHMC algorithm, we add the friction term from stochastic gradient Hamiltonian Monte Carlo (SGHMC; Chen et al., 2014) to neural network gradient Hamiltonian Monte Carlo (NNGHMC; Li et al., 2018). In this way, the ECNNGHMC algorithm not only exploits the approximation power of the neural network to reduce the time needed to compute gradients on a large-scale dataset, but also attains a rejection rate of zero because of the friction term. Therefore, the ECNNGHMC algorithm is much more efficient than the NNGHMC algorithm. Furthermore, the neural network in ECNNGHMC can adopt newly developed architectural techniques, such as deep layers and skip-connection layers, to achieve better accuracy.
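The friction term being borrowed can be seen in the simplified SGHMC update of Chen et al. (2014): the momentum is damped by a friction coefficient and reinflated by matched injected noise, so no Metropolis accept/reject step is needed. The sketch below uses an exact gradient and assumes the gradient-noise estimate is zero, purely to stay self-contained; in ECNNGHMC the gradient would instead come from the neural network approximation.

```python
import numpy as np

def sghmc(grad_U, theta0, n_steps, eta=1e-2, alpha=0.1, rng=None):
    """Simplified SGHMC update with friction alpha (Chen et al., 2014).
    Assumes the gradient-noise estimate B-hat = 0, so the injected noise
    has variance 2 * alpha * eta per step."""
    rng = rng or np.random.default_rng(0)
    theta, v = float(theta0), 0.0
    noise_std = np.sqrt(2.0 * alpha * eta)
    out = np.empty(n_steps)
    for i in range(n_steps):
        # Friction (-alpha * v) damps the momentum; noise compensates.
        v = v - eta * grad_U(theta) - alpha * v + noise_std * rng.standard_normal()
        theta = theta + v
        out[i] = theta
    return out

# Target: standard normal, U(theta) = theta^2 / 2, so grad_U(theta) = theta.
samples = sghmc(lambda t: t, theta0=0.0, n_steps=50000)
```

Since every step is accepted, the sampler never discards an expensive gradient evaluation, which is the efficiency gain the abstract attributes to the friction term.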
Keywords: Markov chain Monte Carlo, efficiency