Kernel and spectral methods for representation and learning in image understanding
Student thesis: Doctoral Thesis
This thesis investigates the representation and learning problems that arise in image understanding tasks such as image categorization, annotation, and retrieval. To reduce the semantic gap, which remains the key open issue in image understanding, we propose novel representation and learning approaches based on kernel and spectral methods. A distinct advantage of these methods is that they can be readily combined with other machine learning techniques widely used in image understanding.
To capture the context within images, we propose spatial Markov kernels built on an image representation with visual keywords. Based on 2D Markov models, the spatial dependencies between visual keywords are incorporated into two kernels, which differ in whether the class labels of the training images are used in the kernel definition. These kernels can be applied to different image understanding tasks, such as image categorization and annotation. We further present a novel semantics-aware image representation that is derived from, but goes beyond, the traditional bag-of-features representation. Specifically, we learn latent semantics automatically from a large vocabulary of visual keywords through contextual spectral embedding, exploiting two types of context between visual keywords for graph construction. The learnt latent semantics provide a representation that is more succinct than the visual keywords yet richer as a descriptor.
Based on our spatial Markov kernels, we propose an exhaustive and efficient constraint propagation approach to weakly-supervised image categorization. Unlike most previous methods, which are limited to two-class problems or use only must-link constraints, this approach is a very general technique that is free from such limitations. More significantly, we first explicitly show how pairwise constraints are propagated independently and then accumulated into a conciliatory closed-form solution.
Our spatial Markov kernels can also be applied to interactive image categorization. The context across visual keywords within an image is first captured by the kernels. After graph construction with these kernels, the large amount of unlabeled data can be exploited by graph-based semi-supervised learning through label propagation with inter-image consistency. For interactive image categorization, we further combine this semi-supervised learning with active learning by defining a new diversity-based data selection criterion using spectral embedding. In this way, we develop a novel graph-based framework that exploits context, consistency, and diversity cues for interactive image categorization.
The kernel and spectral methods proposed above are shown to achieve superior performance on a number of benchmark image datasets, and we also demonstrate their potential in other applications such as action recognition. Following the idea of latent semantic learning for image understanding by contextual spectral embedding, we propose two novel spectral methods that learn latent semantics for action recognition by parameter-free spectral embedding with sparse representation and with hypergraphs, respectively. The superior performance of these two spectral methods on unconstrained videos verifies that our proposed methods for image understanding can be extended to the semantic understanding of other types of multimedia content.
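As an illustration of the graph-based label propagation step mentioned in the abstract, the following is a minimal sketch, not the thesis's own formulation, which the abstract does not give. It assumes the standard closed-form propagation F = (I - alpha * S)^(-1) Y over a symmetrically normalized affinity matrix S. The function name `label_propagation`, the parameter `alpha`, and the toy affinity matrix are all hypothetical; in the setting described above, the affinity matrix would instead come from a kernel between images.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9):
    """Closed-form label propagation on a graph (illustrative sketch).

    W     : (n, n) symmetric, nonnegative affinity matrix with zero diagonal.
    Y     : (n, c) initial labels; one-hot rows for labeled nodes, zero rows otherwise.
    alpha : assumed trade-off in (0, 1) between graph smoothness and initial labels.
    """
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degrees, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Solve (I - alpha * S) F = Y rather than forming the inverse explicitly.
    n = W.shape[0]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1), F

# Toy graph (hypothetical data): two tightly connected groups of three nodes,
# joined by a single weak edge between nodes 2 and 3.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.01

# Only node 0 (class 0) and node 5 (class 1) are labeled.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

labels, scores = label_propagation(W, Y)
print(labels)
```

Solving the linear system directly keeps the sketch short; for large image collections an iterative or sparse solver would be the more practical choice.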
- Image processing, Digital techniques