Image Classification and Synthesis Using Deep Learning Techniques


Student thesis: Doctoral Thesis

Award date: 20 Nov 2018


With the increasing digitization of society, people carry out more and more activities on digital devices and the Internet, and massive amounts of data of different types are recorded. As one of the most important types, image data carries valuable information about human activities. Hence, there is an urgent demand for processing images with computer vision technologies to make daily life more digital and intelligent. As two fundamental tasks in image processing, image classification and image synthesis have received a lot of attention from researchers. Briefly, image classification aims to assign each image to one of several predefined classes, and image synthesis aims to generate synthetic images that are visually realistic and, to some degree, indistinguishable from natural images. Viewed from the perspective of data distribution modeling and data representation learning, these two tasks are closely related. Specifically, data representations can be inferred when performing image classification, and images labeled by classification models can be used as an augmented dataset to improve image synthesis. Similarly, the data distribution can be modeled when performing image synthesis, and images generated by synthesis models can serve as an augmented dataset to enhance image classification.

Recently, deep learning has become one of the most active subfields of machine learning. It allows computational models composed of multiple processing layers to learn data representations at multiple levels of abstraction. With this multi-layer architecture, deep learning can be applied to complicated problems that cannot be easily solved by traditional methods. In particular, deep convolutional neural networks (CNNs), a class of deep learning models inspired by the biological vision system, have intrinsic properties that make them well suited to computer vision problems such as image classification and image synthesis.

In this thesis, motivated by the advances and strong performance of deep learning, we investigated how to exploit deep learning techniques to improve image classification and image synthesis. For image classification, we considered the importance of the network architecture of deep CNNs, with a focus on the activation functions, which play an important role in architecture design. For image synthesis, research was carried out from two perspectives. Firstly, considering the correlation between color image channels, a deep CNN-based generative model combining the self-supervised learning framework and the generative adversarial network (GAN) framework was proposed to improve image generation. Secondly, considering the shortcomings of existing unsupervised learning frameworks in modeling both the data distribution and the latent representation distribution, a deep CNN-based generative model based on the maximum mean discrepancy (MMD) framework was proposed to achieve both image generation and latent representation inference. More specifically, our study comprised the following three works:

1. We sought to learn activation functions by combining basic activation functions in a data-driven way. Specifically, three strategies were proposed to make the activation operation adaptive to its inputs. First, two strategies that combine basic activation functions linearly and nonlinearly, respectively, were examined. Then, a strategy that combines basic activation functions through hierarchical integration was investigated. Experimental results showed that the adaptive activation functions based on these strategies could improve the performance of image classification.

2. Currently, most GAN-based methods generate all channels of a color image as a whole, and whether self-supervised information from the correlation between image channels can further improve image generation has not been investigated. By leveraging the close semantic relationship between color image channels, we introduced self-supervised learning into the GAN framework and proposed a generative model called self-supervised GAN (SSGAN). Specifically, SSGAN explicitly decomposed the image generation process into three procedures: (1) generate image channels, (2) correlate image channels, and (3) concatenate image channels into the whole image. Based on this decomposition, a basic adversarial learning task for generating images was performed, and an auxiliary self-supervised learning task was constructed at the same time to further regularize the generation procedures. Experimental results demonstrated that the proposed SSGAN could improve image generation and was also capable of image colorization and image texturization.

3. To model the data distribution or the latent representation distribution, deep learning methods such as GAN and the variational autoencoder (VAE) have been proposed. However, both have shortcomings: GAN models only the data distribution and relies on adversarial training, which is challenging and unstable, and VAE tends to learn less meaningful latent representations. To address these issues, we proposed an unsupervised learning framework called CoupledMMD. Specifically, CoupledMMD performed coupled learning for image generation and latent representation inference based on kernel maximum mean discrepancy (MMD). It consists of an inference network and a generation network for mapping between the data space and the latent space, and two MMD testers for performing two-sample tests in these two spaces. By imposing the structural regularization that the two networks are inverses of each other, it links the learning of these two distributions. Experimental results indicated that the proposed CoupledMMD is competitive on data generation and latent representation inference.
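As an illustration of the linear combination strategy in work 1, the following is a minimal sketch and not the implementation used in the thesis. The choice of basic activation functions (ReLU, tanh, identity) and the name `combined_activation` are assumptions made for this example. The abstract does not state which basic functions were used, and in practice the combination weights would be learned by backpropagation rather than fixed by hand.

```python
import numpy as np


def relu(x):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(x, 0.0)


def identity(x):
    """Identity activation, applied elementwise."""
    return x


# Assumed set of basic activation functions (hypothetical choice).
BASIC_ACTIVATIONS = (relu, np.tanh, identity)


def combined_activation(x, weights):
    """Weighted sum of the basic activations evaluated at x.

    `weights` holds one coefficient per basic activation; these are the
    parameters that a data-driven method would learn during training.
    """
    x = np.asarray(x, dtype=float)
    return sum(w * f(x) for w, f in zip(weights, BASIC_ACTIVATIONS))
```

With weights `[0.5, 0.0, 0.5]`, for example, the function averages ReLU and the identity, which yields a leaky-ReLU-like shape with slope 0.5 for negative inputs.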
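The kernel MMD underlying work 3 can be illustrated with the standard biased estimator of the squared MMD between two samples. This is a sketch under stated assumptions: the Gaussian kernel, the fixed `bandwidth`, and the function names are choices made for this example and are not taken from the thesis, whose MMD testers may differ.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip small negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y.

    x and y are 2-D arrays of shape (num_samples, num_features).
    """
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

The estimate is close to zero when both samples come from the same distribution and grows as the distributions move apart. In a coupled setup like the one described, the same kind of statistic would be computed once in the data space (real versus generated images) and once in the latent space (inferred versus prior codes).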