Bayesian Learning from Unstructured Data: Hierarchical Prior, Inference and Approximation


Student thesis: Doctoral Thesis

Award date: 2 Aug 2021


With the explosion of information, unstructured data is becoming widely available, including but not limited to graphs, texts, and relational and behavioral data. New statistical tools are needed to better understand and exploit such unstructured information. This thesis develops Bayesian methods that incorporate particular structures of interest to model and represent user-generated data, such as crowdsourced data, social networks, and model ensembles.

Chapter 1 gives a high-level review of Bayesian methods and their applications in machine learning, especially in matrix factorization, factor analytic models, and their extensions to the exponential family. Special attention is paid to variational inference, a technique that makes inference tractable for models with complicated posterior structure, including possibly non-conjugate hierarchical models.

Chapter 2 presents a Bayesian model for crowdsourced labels, a commonly available alternative to ground-truth labels for training different types of classifiers. Unlike ground-truth targets, crowdsourced labels carry information that is relevant both to discriminating between instances and to other aspects of interest, such as the texture of images or the meaning of words. More specifically, a Bayesian factor analytic model with a Gaussian mixture model (GMM) prior is proposed, which aims to learn latent representations and simultaneously cluster instances, based either purely on crowdsourced labels or on labels together with features. For the proposed model, we derive a closed-form variational posterior update scheme based on Böhning's bound. Two specific approximating families are considered, and the resulting optimal posterior distributions are compared. Empirically, we demonstrate that the proposed method achieves good label aggregation accuracy and retains meaningful information in the learned embeddings.
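
To illustrate why a bound of Böhning's type yields closed-form updates, the following minimal sketch covers only the scalar (binary logistic) case, which is an assumption made for illustration and not the thesis's model. In this case the bound replaces log(1 + exp(x)) with a quadratic in x whose curvature is fixed at 1/4, so a Gaussian prior stays conjugate to the bounded likelihood. The function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bohning_bound(x, psi):
    """Quadratic upper bound on softplus(x), expanded around psi.

    Uses the fact that the second derivative of softplus,
    sigmoid(x) * (1 - sigmoid(x)), never exceeds 1/4.
    The bound is tight at x == psi.
    """
    d = x - psi
    return softplus(psi) + sigmoid(psi) * d + 0.125 * d * d
```

Because the bound is quadratic in x, its expectation under a Gaussian variational factor is available analytically; the expansion point psi plays the role of a variational parameter that is re-tightened between updates.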

In Chapter 3, we extend the hierarchical method to statistical networks, such as social, citation, and co-authorship networks. In contrast to the model introduced in Chapter 2, which is bipartite in nature, the adapted model structure is symmetric for undirected networks. The latent structure induced by an infinite GMM prior assists the discovery and detection of communities, which are commonly observed in real-world networks. Extending the previous model and existing methods, we adopt a fully nonparametric Dirichlet process (DP) mixture with conjugate automatic relevance determination (ARD)-type and Gaussian-Wishart base distributions to formulate the structured prior. This prior estimates the unknown number of communities and latent dimensions in a principled manner, and thus alleviates the tuning effort required by non-Bayesian methods. For posterior inference, the Bernoulli likelihood is handled by the Pólya-Gamma augmentation method, resulting in closed-form variational updates for all latent variables within a fixed-point iteration scheme. The effectiveness and efficiency of the proposed method are demonstrated on both synthetic and real data.
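
The way Pólya-Gamma augmentation produces closed-form variational updates can be sketched on a toy problem. The example below is a scalar Bayesian logistic regression with a Gaussian prior, chosen as an assumption for brevity; it is not the network model of the thesis. It relies on the known mean of a PG(b, c) variable, b / (2c) · tanh(c / 2), and alternates between the Pólya-Gamma factors and the Gaussian factor until a fixed point is reached. All names and the default prior variance are hypothetical.

```python
import math


def pg_mean(b, c):
    """Mean of a Polya-Gamma PG(b, c) variable: b / (2c) * tanh(c / 2)."""
    if abs(c) < 1e-8:
        return b / 4.0  # limit of the expression as c -> 0
    return b / (2.0 * c) * math.tanh(c / 2.0)


def fit_logistic_vi(xs, ys, prior_var=10.0, tol=1e-10, max_iter=500):
    """Mean-field VI for y_i ~ Bernoulli(sigmoid(w * x_i)), w ~ N(0, prior_var).

    Returns the mean and variance of the Gaussian factor q(w).
    """
    mean, var = 0.0, prior_var
    for _ in range(max_iter):
        # q(omega_i) = PG(1, c_i) with c_i = sqrt(E[(w * x_i)^2]).
        second_moment = mean * mean + var
        omegas = [pg_mean(1.0, abs(x) * math.sqrt(second_moment)) for x in xs]
        # q(w) is Gaussian: the augmented likelihood is quadratic in w.
        precision = 1.0 / prior_var + sum(o * x * x for o, x in zip(omegas, xs))
        new_mean = sum((y - 0.5) * x for x, y in zip(xs, ys)) / precision
        new_var = 1.0 / precision
        converged = abs(new_mean - mean) < tol and abs(new_var - var) < tol
        mean, var = new_mean, new_var
        if converged:
            break
    return mean, var
```

For example, `fit_logistic_vi([-2, -1, -0.5, 0.5, 1, 2], [0, 0, 1, 0, 1, 1])` returns a positive posterior mean, since larger inputs are more often labeled 1 in this toy data.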

Chapter 4 formulates a scheme for learning from Bayesian classifier ensembles. For a generic Bayesian classifier, we explore the probabilistic structure of the predictive distribution and develop an adversarial information distillation method that uses the idea of amortization with a black-box model, without losing the ability to quantify predictive uncertainty. The approximation error introduced by the explicit model is analyzed under the maximum mean discrepancy (MMD) metric. It is demonstrated empirically that, while introducing negligible amortization loss, the proposed approximation greatly accelerates prediction compared with naive Monte Carlo (MC) integration. Experiments also suggest that the approximation retains information originating from the target Bayesian model, which facilitates out-of-domain detection and improves predictive robustness.
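
As a rough illustration of the MMD metric mentioned above, the sketch below computes the standard biased (V-statistic) estimate of squared MMD between two sets of scalar samples. The Gaussian (RBF) kernel, the bandwidth, and the function names are assumptions made for this example; the abstract does not state which kernel or estimator the thesis uses.

```python
import math


def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    Computes mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    with all pairs (including identical ones) in each average.
    """
    def mean_kernel(a, b):
        total = sum(rbf(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when both sample sets are identical and grows as the two empirical distributions move apart, which is what makes it usable as a measure of how far an amortized predictor is from its target.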