Improving the Stability and Convergence of GAN Training
Student thesis: Doctoral Thesis
Award date | 5 Sept 2022
---|---
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(34c1e716-3c80-4024-861a-5aa8aae3979f).html
Abstract
Enabling machines to automatically understand the rich underlying structure of high-dimensional data (e.g., images) remains one of the main challenges in computer science. One way to tackle this problem is generative modelling. With the advent of deep neural networks as universal approximators, a broad family of such algorithms has flourished: flow-based models, diffusion models, VAEs (variational autoencoders), and GANs (generative adversarial networks), to name a few. Among them, the GAN is distinctive for 1) its implicit density formulation as a zero-sum game between two players (the generator and the discriminator) and 2) its ability to generate higher-quality samples. Although the game formulation is key to the GAN's success, it also gives rise to shortcomings such as mode collapse, vanishing gradients, training instability, and poor convergence. In this thesis, we tackle these shortcomings from three different but complementary perspectives:
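For context, the zero-sum game mentioned above is the standard minimax objective of Goodfellow et al. (2014), in which the discriminator D and the generator G play against each other:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

At the optimum of the inner maximization, minimizing V over G is equivalent to minimizing the Jensen-Shannon divergence between the data and model distributions, which is why the shortcomings listed above are usually analysed through the lens of divergence minimization.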
First, we demonstrate that the original GAN objective function, based on the KL-divergence, is prone to overfitting and can yield poorer generations characterized by mode collapse. We therefore advocate a more robust and expressive objective based on the αβ-divergence. Unlike with the KL-divergence, we prove that even when the data and model distributions have disjoint supports, our proposed objective still provides gradient updates to the generator.
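For reference, a widely used parameterization of the αβ-divergence (following Cichocki et al.) between densities p and q is, for α, β, α + β ≠ 0; the exact form adopted in the thesis may differ:

```latex
D_{AB}^{(\alpha,\beta)}(p \,\|\, q) =
  -\frac{1}{\alpha\beta} \int \Big(
      p(x)^{\alpha}\, q(x)^{\beta}
      - \frac{\alpha}{\alpha+\beta}\, p(x)^{\alpha+\beta}
      - \frac{\beta}{\alpha+\beta}\, q(x)^{\alpha+\beta}
  \Big)\, dx
```

Suitable limits of (α, β) recover the KL-divergence and other classical divergences, which is what makes this family strictly more expressive than the KL objective alone.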
Second, we present a theoretical analysis of three well-known GAN algorithms, in which we find that the distribution of the GAN discriminator's output correlates with GAN training stability. We therefore propose to learn a higher-order distribution over the likelihood of the discriminator output by modelling both model (epistemic) and data (aleatoric) uncertainty. Unlike other uncertainty-modelling methods such as Bayesian and ensemble learning, our proposed method is both time- and space-efficient and does not require MCMC (Markov chain Monte Carlo) approximation.
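As an illustrative sketch (not necessarily the thesis's exact formulation): one MCMC-free way to place a higher-order distribution over the discriminator's output likelihood is to have the discriminator predict the concentration parameters of a Beta distribution rather than a point probability. The Beta mean then plays the role of the usual output D(x), while its variance acts as an uncertainty proxy. Layer names and sizes below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialDiscriminator(nn.Module):
    """Discriminator head that outputs Beta(alpha, beta) concentrations
    over the real/fake likelihood instead of a single point estimate.
    Illustrative sketch only; architecture and sizes are hypothetical."""

    def __init__(self, in_dim: int = 784, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.LeakyReLU(0.2),
        )
        # Two positive outputs: the Beta concentration parameters.
        self.head = nn.Linear(hidden, 2)

    def forward(self, x: torch.Tensor):
        h = self.body(x)
        # softplus + 1 keeps both concentrations > 1 (unimodal Beta).
        conc = F.softplus(self.head(h)) + 1.0
        alpha, beta = conc.unbind(dim=-1)
        dist = torch.distributions.Beta(alpha, beta)
        mean = alpha / (alpha + beta)  # plays the role of D(x) in [0, 1]
        return mean, dist.variance, dist

# Usage: one forward pass yields both the likelihood and its spread.
x = torch.randn(8, 784)
D = EvidentialDiscriminator()
p_real, uncertainty, _ = D(x)
```

A single deterministic forward pass produces both quantities, which is what makes this style of higher-order modelling time- and space-efficient compared with ensembles or MCMC-based Bayesian approximations.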
Third, we show that divergence minimization via the zero-sum formulation of the GAN leads to the vanishing gradient problem, resulting in poor convergence of the algorithm. Inspired by closed-loop control theory, we mitigate this shortcoming by giving the GAN generator an objective function that is the residual between the loss for generated data to be classified real and the loss for it to be classified fake, from the discriminator's perspective. We prove that optimizing this residual objective minimizes both the f-divergence and the IPM (integral probability metric), thereby bridging the gap between these two families of divergences.
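A minimal sketch of such a residual generator objective, assuming binary cross-entropy losses for the two terms (the exact losses used in the thesis may differ):

```python
import torch
import torch.nn.functional as F

def residual_generator_loss(d_fake_logits: torch.Tensor) -> torch.Tensor:
    """Residual objective for the generator: the loss for generated
    samples to be classified real minus the loss for them to be
    classified fake, from the discriminator's point of view.
    Sketch under the assumption of BCE-with-logits terms."""
    real_targets = torch.ones_like(d_fake_logits)
    fake_targets = torch.zeros_like(d_fake_logits)
    loss_to_be_real = F.binary_cross_entropy_with_logits(d_fake_logits, real_targets)
    loss_to_be_fake = F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets)
    return loss_to_be_real - loss_to_be_fake
```

Note that with BCE terms the residual simplifies algebraically to the negated discriminator logit (softplus(-x) - softplus(x) = -x), i.e. a linear, Wasserstein-style loss, which is consistent with the claim that the residual objective connects the f-divergence and IPM families.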
Comprehensive experiments on both synthetic and real image datasets show that our proposed methods significantly mitigate these GAN shortcomings.