Deep Learning for Image Representation in Non-Euclidean Feature Spaces


Student thesis: Doctoral Thesis





Award date: 8 Aug 2019


The explosive growth of visual content on social media drives an urgent need for more effective image retrieval and recognition methods. Owing to the rapid development of deep convolutional neural networks (CNNs), deep representation learning has become a major approach to large-scale visual search and classification. A deep representation learning method finds a non-linear mapping function, such as a CNN, that projects an image into a compact and discriminative feature space. In general, this feature space is treated as a Euclidean space. However, Euclidean features are inefficient to store, and Euclidean distances are inefficient to compute. Moreover, Euclidean feature spaces suffer from the "curse of dimensionality".

Apart from the classical Euclidean feature space, two other spaces, the hyper-sphere and the Hamming space, can be employed as feature spaces for deep representation learning. The hyper-sphere feature space is a point set in which all points have unit norm. It has been proposed for fine-grained image classification and retrieval problems such as face recognition, person re-identification, and online product retrieval. In the Hamming space, a CNN converts an image to a binary code to reduce storage and computational cost, which makes it more efficient for large-scale image retrieval tasks. This thesis aims to develop new deep representation learning models using these two non-Euclidean feature spaces.
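To make the hyper-sphere feature space concrete, the following minimal sketch L2-normalizes two feature vectors and checks that, for unit-norm vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity. The function names and the feature values are hypothetical, chosen only for illustration; they are not taken from the thesis.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a feature vector to unit L2 norm so it lies on the unit hyper-sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def cosine_similarity(u, v):
    """Dot product of two unit-norm vectors, i.e. the cosine of their angle."""
    return sum(a * b for a, b in zip(u, v))

def squared_euclidean(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical raw CNN outputs (illustrative values only).
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([1.0, 0.0, 2.0])

# On the unit hyper-sphere: ||a - b||^2 = 2 - 2 * cos(a, b),
# so ranking by Euclidean distance and by cosine similarity agree.
cos_ab = cosine_similarity(a, b)
dist_sq = squared_euclidean(a, b)
```

Because the two quantities are tied together on the hyper-sphere, only the direction of a feature carries information, not its magnitude.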

The first part of this thesis focuses on deep representation learning with a hyper-sphere feature space. In general, a hyper-sphere feature space is obtained by L2-normalizing the features. L2-normalization can serve as an effective method to enhance the discriminative power of deep representations. However, without exploiting the geometric properties of the feature space, commonly used gradient-based optimization methods fail to track global information during training. To overcome this problem, we propose a novel deep metric learning model based on directional distributions. By defining a probabilistic model of the loss function from the von Mises-Fisher distribution, we propose an effective alternative learning algorithm that periodically updates the class centers. This metric learning procedure not only captures global information about the embedding space but also yields an approximate representation of the class distribution during training. In classification and retrieval tasks, our experiments on benchmark datasets demonstrate improved performance with the proposed algorithm. In particular, with a small number of convolutional layers, a significant increase in accuracy is observed compared with widely used gradient-based methods.
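The idea of a von Mises-Fisher (vMF) based loss with periodically updated class centers can be sketched as follows. This is an illustrative simplification, not the thesis's exact formulation: it assumes a single concentration parameter `kappa` shared by all classes and equal class priors, so the vMF normalizing constants cancel and the class posterior reduces to a softmax over scaled cosine similarities. The class centers are estimated as the normalized sum of each class's embeddings, which is the standard mean-direction estimate. All names and values are hypothetical.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Project a vector onto the unit hyper-sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vmf_posteriors(x, centers, kappa):
    """Class posteriors for a unit vector x, assuming vMF class densities with
    a shared concentration kappa and equal priors (constants cancel)."""
    logits = [kappa * dot(mu, x) for mu in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def vmf_loss(x, label, centers, kappa):
    """Negative log posterior of the true class."""
    return -math.log(vmf_posteriors(x, centers, kappa)[label])

def update_centers(embeddings, labels, num_classes):
    """Mean-direction estimate: normalized sum of each class's embeddings.
    In training this would be recomputed periodically, not at every step."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    for x, y in zip(embeddings, labels):
        for i, xi in enumerate(x):
            sums[y][i] += xi
    return [l2_normalize(s) for s in sums]

# Hypothetical 2-D embeddings for two classes (illustrative values only).
raw = ([1.0, 0.1], [0.9, -0.2], [-0.1, 1.0], [0.2, 0.8])
embeddings = [l2_normalize(v) for v in raw]
labels = [0, 0, 1, 1]

centers = update_centers(embeddings, labels, num_classes=2)
posteriors = vmf_posteriors(embeddings[0], centers, kappa=10.0)
loss = vmf_loss(embeddings[0], labels[0], centers, kappa=10.0)
```

Because each center summarizes all embeddings of its class, the loss for a single sample depends on the class's overall direction, which is one way to read the "global information" mentioned above.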

In the second part of this thesis, Hamming space-based deep representation learning methods, also known as deep hashing, are investigated. Deep supervised hashing has emerged as an effective solution to large-scale semantic image retrieval problems in computer vision. CNN-based hashing methods typically rely on pairwise or triplet labels for similarity-preserving learning. However, the complex semantic concepts of visual content are hard to capture with similar/dissimilar labels, which limits retrieval performance. In general, pairwise and triplet losses not only incur expensive training costs but also lack sufficient semantic information. We propose a novel deep supervised hashing model that learns more compact, class-level similarity-preserving binary codes. Our method is motivated by deep metric learning, which takes semantic labels directly as supervision during training, and it generates correspondingly discriminative hash codes. Specifically, a novel cubic constraint loss function based on the Gaussian distribution is proposed, which preserves semantic variations while penalizing the overlap between different classes in the embedding space. To address the discrete optimization problem introduced by binary codes, a two-step optimization strategy is proposed that provides efficient training and avoids vanishing gradients. Extensive experiments on five large-scale benchmark databases show that our model achieves state-of-the-art retrieval performance.
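The retrieval side of hashing can be illustrated with a short sketch: real-valued network outputs are thresholded into binary codes, and database items are ranked by Hamming distance to the query. This covers only the lookup step. It assumes simple sign thresholding and does not reproduce the cubic constraint loss or the two-step optimization strategy described above. The function names and feature values are hypothetical.

```python
def binarize(features):
    """Pack a real-valued vector into an int bit string: bit is 1 where value > 0."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code

def hamming_distance(a, b):
    """Number of differing bits between two codes (XOR, then count the ones)."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query_code, database_codes):
    """Database indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )

# Hypothetical 4-dimensional network outputs (illustrative values only).
database = [
    binarize(v)
    for v in ([0.7, -1.2, 0.3, 0.9], [-0.5, -0.8, 0.1, 1.1], [0.6, -0.4, 0.2, -0.3])
]
query = binarize([0.9, -0.1, 0.4, 0.8])
ranking = rank_by_hamming(query, database)
```

Each code here occupies 4 bits instead of four floating-point values, and a comparison is one XOR plus a bit count, which is where the storage and computation savings of the Hamming space come from.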