This thesis investigates the challenging representation and learning problems that
occur in many image understanding tasks such as image categorization, annotation,
and retrieval. To reduce the semantic gap, which is the key open issue in image
understanding, we propose novel representation and learning approaches based on
kernel and spectral methods. A distinct advantage of these methods is that they can
be readily combined with other machine learning techniques that are widely used in
image understanding.
To capture the context within images, we propose spatial Markov kernels that
operate on an image representation built from visual keywords. Based on 2D Markov
models, the spatial dependencies between visual keywords are incorporated into two
different kernels, which differ in whether the class labels of the training images
are used in the kernel definition.
Our spatial Markov kernels can be applied to different image understanding
tasks such as image categorization and annotation.
We further present a novel semantics-aware image representation which is derived
from, but goes beyond, the traditional bag-of-features representation. Specifically, we learn
latent semantics automatically from a large vocabulary of visual keywords through
contextual spectral embedding by exploiting two types of context between visual
keywords for graph construction. The learnt latent semantics provide a representation
that is both more succinct and more descriptive than the visual keywords themselves.
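As a rough illustration of spectral embedding over a graph of visual keywords, the sketch below uses a standard normalized-Laplacian embedding in NumPy. It is not the contextual spectral embedding developed in the thesis: the co-occurrence affinity `W`, the toy histogram matrix `H`, the function name `spectral_embedding`, and the projection `H @ E` are all assumptions made for this example, and only a single, simple notion of context is used.

```python
import numpy as np

def spectral_embedding(W, dim):
    """Embed graph nodes with the leading eigenvectors of the
    symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Largest eigenvalues of S correspond to the smallest
    # eigenvalues of the normalized Laplacian L = I - S.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:dim]]

# Toy bag-of-features histograms: 4 images x 6 visual keywords (made-up data).
H = np.array([
    [2, 1, 0, 0, 0, 1],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 1, 1, 2, 0],
], dtype=float)

# Assumed context: keywords are linked when they co-occur in the same images.
W = H.T @ H
np.fill_diagonal(W, 0.0)

E = spectral_embedding(W, dim=2)   # one 2-D latent vector per keyword
Z = H @ E                          # images re-expressed in the latent space
```

Here six keyword counts per image are reduced to two latent coordinates, which is the sense in which a learnt representation can be more succinct than the original vocabulary.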
Based on our spatial Markov kernels, we propose an exhaustive and efficient constraint
propagation approach to weakly-supervised image categorization. Unlike most previous
methods, which are limited to two-class problems or use only must-link constraints,
our approach is a general technique that is free from such limitations. More
significantly, we first explicitly show how pairwise constraints are propagated independently
and then accumulated into a conciliatory closed-form solution.
Moreover, our spatial Markov kernels can also be applied to interactive image
categorization. The context across visual keywords within an image is first captured
by our spatial Markov kernels. After graph construction with our kernels, the large
amount of unlabeled data can be exploited by graph-based semi-supervised learning through
label propagation with inter-image consistency. For interactive image categorization,
we further combine this semi-supervised learning with active learning by defining a
new diversity-based data selection criterion using spectral embedding. In this way,
we succeed in developing a novel graph-based framework which can exploit context,
consistency, and diversity cues for interactive image categorization.
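Graph-based label propagation of the kind mentioned above can be sketched with the well-known closed form F = (I - alpha S)^{-1} Y. This is a generic illustration, not the thesis's formulation: the RBF kernel stands in for the spatial Markov kernels, and the function name `label_propagation`, the parameter `alpha`, the toy points, and the convention of marking unlabeled samples with -1 are all assumptions. The active-learning selection step is not shown.

```python
import numpy as np

def label_propagation(K, labels, alpha=0.9):
    """Propagate labels over a kernel graph.

    K      : (n, n) symmetric non-negative kernel matrix
    labels : length-n integer array, -1 marks an unlabeled sample
    alpha  : propagation weight in (0, 1)
    """
    K = np.array(K, dtype=float)
    np.fill_diagonal(K, 0.0)                      # no self-loops
    degree = K.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    S = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

    labels = np.asarray(labels)
    n, n_classes = len(labels), labels.max() + 1
    Y = np.zeros((n, n_classes))
    labeled = labels >= 0
    Y[labeled, labels[labeled]] = 1.0             # one-hot rows for labeled samples

    # Closed-form solution of the propagation fixed point.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)

# Toy data: two well-separated groups, one labeled sample in each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists)                             # stand-in RBF kernel
labels = np.array([0, -1, -1, 1, -1, -1])

pred = label_propagation(K, labels)
```

With only one labeled sample per group, the remaining samples receive the label of their own group because the graph connects them far more strongly to it than to the other group.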
The kernel and spectral methods proposed above achieve superior performance on a
number of benchmark image datasets. We also demonstrate their potential in other
applications such as action recognition. Following the idea of latent semantic learning for
image understanding by contextual spectral embedding, we further propose two novel
spectral methods to learn latent semantics for action recognition by parameter-free
spectral embedding with sparse representation and hypergraphs, respectively. The
superior performance of these two spectral methods on unconstrained videos verifies
that our proposed methods for image understanding can be extended to semantic
understanding of other types of multimedia content.
| Date of Award | 15 Jul 2011 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Ho Shing Horace IP (Supervisor) |
- Image processing
- Digital techniques
Kernel and spectral methods for representation and learning in image understanding
LU, Z. (Author). 15 Jul 2011
Student thesis: Doctoral Thesis