Abstract
Accurate estimation of mutual information (MI) between high-dimensional random variables remains a fundamental challenge in information theory and machine learning. Recent progress has been made by employing deep neural networks to optimize variational MI formulations, demonstrating significant success and offering greater flexibility for complex data distributions. However, this thesis reveals that current approaches may fail to provide appropriate learning signals when observed samples from the target distributions lack overlapping support, or when reference samples overshadow data samples, causing MI estimates either to overshoot to infinity or to be capped at the logarithm of the sample size. This thesis presents a systematic investigation into the challenges of overfitting in MI neural estimation and proposes several smoothing techniques for different estimators. The research is organized into four main parts.

The first part focuses on overconfidence in classifier-based MI estimators, which can lead to numerical instability and biased estimates. We address this problem by introducing soft labels that adapt to the classifier's predictions, yielding a more stable and generalizable estimator. We demonstrate that the proposed estimator is unbiased and consistent under mild assumptions. Experimental results on self-supervised learning tasks show that the proposed method mitigates overfitting and helps learn more informative encoders in representation learning.
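To make the first part's setting concrete, below is a minimal PyTorch sketch of a classifier-based MI estimator with prediction-dependent soft labels. This is an illustration under assumptions, not the thesis's implementation: the critic architecture and the adaptive smoothing rule `eps = 0.1 * confidence` are hypothetical stand-ins for the method described above.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the thesis's implementation): a binary classifier
# separates joint samples (x, y) ~ P_XY from product samples
# (x, y') ~ P_X P_Y.  For a well-calibrated classifier, the logit
# approximates log dP_XY / d(P_X P_Y), whose mean under the joint
# distribution is the mutual information.

class Critic(nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def train_step(critic: Critic, opt: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    y_perm = y[torch.randperm(len(y))]   # approximate draws from P_X P_Y
    logit_j = critic(x, y)               # joint pairs
    logit_p = critic(x, y_perm)          # product pairs

    # Hard 0/1 labels reward unbounded logits on finite samples.
    # Instead, soften each target in proportion to the classifier's own
    # confidence (an illustrative adaptive rule, not the thesis's exact one).
    with torch.no_grad():
        eps_j = 0.1 * torch.sigmoid(logit_j)        # confident "joint" -> more smoothing
        eps_p = 0.1 * (1 - torch.sigmoid(logit_p))  # confident "product" -> more smoothing
    bce = nn.functional.binary_cross_entropy_with_logits
    loss = bce(logit_j, 1.0 - eps_j) + bce(logit_p, eps_p)

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def mi_estimate(critic: Critic, x: torch.Tensor, y: torch.Tensor) -> float:
    # Plug-in estimate: average log density ratio over joint samples.
    return critic(x, y).mean().item()
```

With hard 0/1 targets, the classifier can drive its logits to infinity on finite samples, which is the overconfidence failure mode; the softened targets keep the logits, and hence the plug-in MI estimate, finite.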
In the second part, we reveal that the estimate produced by Information Noise-Contrastive Estimation (InfoNCE) can deviate significantly from the true MI, and that its common fix can cause the MI estimate to overshoot without bound. To tackle these issues, we establish a novel MI variational representation framework based on InfoNCE, which introduces a mixture of the data and reference distributions to ensure proper gradients for effective model training. A probabilistic classifier pretrained on the mixed dataset approximates the posterior distribution and serves as the smoothing weight in the contrastive estimation. The optimal solution of the proposed model is characterized by this approximate probabilistic classifier, and the estimate is guaranteed to be finite under mild assumptions.
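For context, the standard InfoNCE estimator that this part builds on takes the following well-known form from the contrastive-learning literature (not a formula reproduced from the thesis); its hard cap at log n is exactly the sample-size limitation noted above.

```latex
\widehat{I}_{\mathrm{NCE}}
  = \frac{1}{n}\sum_{i=1}^{n}
    \log \frac{e^{f(x_i, y_i)}}{\tfrac{1}{n}\sum_{j=1}^{n} e^{f(x_i, y_j)}},
\qquad
\mathbb{E}\big[\widehat{I}_{\mathrm{NCE}}\big] \le I(X; Y),
\qquad
\widehat{I}_{\mathrm{NCE}} \le \log n .
```

Removing the cap naively is what risks unbounded overshoot; the mixture-based smoothing above is designed to keep the training gradients informative while keeping the estimate finite.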
The third part investigates the overshooting issue in f-divergence estimation. We establish a notion of correlation embedding that measures the correlation between an arbitrary function and the feature map in a Reproducing Kernel Hilbert Space (RKHS). By minimizing the distance between the correlation embeddings of the parameterized neural network and the true density ratio, we propose a novel f-divergence estimator derived from the dual of the f-divergence variational formula. This approach avoids the normalization-constant estimation problem of current neural estimators and generalizes existing work by providing smooth density-ratio estimates beyond finite samples.
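For reference, the variational formula whose dual this part works with is the standard one below, where $f^{*}$ denotes the convex conjugate of $f$ and the supremum is attained at the true density ratio; the correlation-embedding construction itself is specific to the thesis and is not reproduced here.

```latex
D_f(P \,\|\, Q)
  = \sup_{T}\; \mathbb{E}_{P}\big[T(X)\big]
    - \mathbb{E}_{Q}\big[f^{*}\!\big(T(X)\big)\big],
\qquad
T^{\star} = f'\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right).
```

When a flexible neural network plays the role of $T$, the empirical supremum can be driven arbitrarily high on samples with little overlap, which is the finite-sample overshoot the proposed estimator is designed to avoid.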
The final part of the thesis introduces a new realistic image dataset for MI estimation that differs from the synthetic datasets commonly used in the literature. We show that inherent noise in the data can be misleading and cause many existing MI estimators to fail to capture true patterns. We propose a novel approach that applies smoothing techniques in a lower-dimensional latent space of the data to squash large variations. Theoretical support is provided through a proof of necessary and sufficient conditions for preserving mutual information during the encoding process. Experimental results confirm the effectiveness of the proposed approach across synthetic and realistic datasets.
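As a point of reference for the stated theorem (the exact conditions are the thesis's; what follows is only the general information-theoretic fact that frames them): for a deterministic encoder $Z = g(X)$, the data processing inequality bounds the encoded MI, with equality exactly when the encoding loses nothing about $Y$.

```latex
I(g(X); Y) \le I(X; Y),
\qquad
I(g(X); Y) = I(X; Y)
\;\Longleftrightarrow\;
X \perp\!\!\!\perp Y \mid g(X).
```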
In conclusion, this thesis offers a thorough examination of the challenges in MI neural estimation and proposes practical, effective smoothing techniques that advance this critical area of information theory and machine learning.
| Date of Award | 20 May 2024 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Chung CHAN (Supervisor) |