Image understanding based on sparse coding and semi-supervised learning
Student thesis: Doctoral Thesis
This thesis proposes novel representation and learning approaches for image understanding tasks such as image classification and annotation. First, to capture spatial context in the data, we propose a novel coding scheme called Robust Regularized Coding (RRC), which exploits the geometrical information among local descriptors to significantly boost the discriminating capability of the resulting features. More specifically, both a locality constraint and a smoothness constraint on the RRC codes are incorporated into the objective function to preserve the local invariance of the codes, and the global similarity between local features is used to constrain sparsity. To scale RRC to larger databases, we also propose an online codebook learning algorithm, which processes a small subset of the dataset at a time and loops through the whole dataset to update the codebook incrementally. RRC can also be readily combined with other machine learning techniques that are widely used in image understanding.
Second, because most graph-based semi-supervised learning approaches do not exploit label dissimilarity knowledge, we propose a novel graph-based label propagation framework that incorporates both similarity and dissimilarity information into semi-supervised learning. Class mass normalization is used to make the label decision rule match the class priors, and a function induction algorithm is presented to predict the labels of test data. More importantly, by solving a quadratic optimization problem, the approach yields closed-form solutions for the classification functions of both unlabeled data and out-of-sample data.
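As a rough illustration of the graph-based ideas summarized above, the following Python sketch shows a standard similarity-only closed-form label propagation, F = (I - alpha S)^{-1} Y, followed by class mass normalization. It is not the formulation proposed in the thesis (in particular, it omits the dissimilarity term and function induction); the function names, the `alpha` parameter, and the toy graph are assumptions made for illustration only.

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9):
    """Closed-form propagation on a similarity graph (illustrative, similarity only).

    W: (n, n) symmetric non-negative similarity matrix with positive row sums.
    Y: (n, c) initial labels, one-hot rows for labeled nodes, zero rows otherwise.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetrically normalized similarity S = D^{-1/2} W D^{-1/2}
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    # Solve (I - alpha * S) F = Y instead of forming the inverse explicitly
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

def class_mass_normalize(F, priors):
    """Rescale each class column so that its total mass equals the class prior."""
    mass = F.sum(axis=0)
    return F * (np.asarray(priors, dtype=float) / mass)

# Toy graph (hypothetical): two tight clusters joined by one weak edge.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

# One labeled node per cluster: node 0 is class 0, node 5 is class 1.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

F = propagate_labels(W, Y)
F_norm = class_mass_normalize(F, priors=[0.5, 0.5])
pred = F_norm.argmax(axis=1)
```

In this toy setting each cluster inherits the label of its single labeled node. With unequal priors, the normalization step shifts the decision boundary toward the class with the smaller prior.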
Furthermore, we propose a novel algorithm, Iterative Semi-supervised Sparse Coding (ISSC), which combines the advantages of sparse coding and graph-based semi-supervised learning to learn discriminative sparse codes together with an effective classification function. ISSC uses both the initial labels and the subsequently predicted labels when learning the sparse codes. In turn, during the graph-based semi-supervised learning stage, the similarity matrix is first adjusted using the most recently learned sparse codes and is then used to obtain a better classification function. ISSC also yields closed-form solutions for both the sparse codes and the classification function. Finally, because traditional one-to-one similarity measures are often ineffective for annotating images that contain multiple objects or semantics, we present a reconstruction-based image annotation (RBIA) algorithm that propagates the labels of training images to a test image by multi-label linear embedding. All approaches proposed in this thesis have been extensively evaluated on several benchmark datasets. The experimental results show that each proposed approach achieves significant performance improvements over the state of the art.
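The idea of annotating a test image by reconstruction, rather than by one-to-one similarity, can be sketched roughly as follows. This is a generic illustration in the style of locally linear embedding weights, not the RBIA algorithm itself: the test feature is reconstructed from its nearest training features with weights that sum to one, and the same weights are applied to the training label vectors. The names `reconstruction_weights` and `annotate`, the `k` and `reg` parameters, and the toy data are all assumptions.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Sum-to-one weights that best reconstruct x from the rows of `neighbors`."""
    Z = neighbors - x                      # neighbors shifted so x is the origin
    G = Z @ Z.T                            # local Gram matrix
    # Regularize, since G is singular when there are more neighbors than dimensions
    G = G + reg * np.trace(G) * np.eye(len(neighbors))
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()

def annotate(x, X_train, Y_train, k=3):
    """Score each label for x by applying the reconstruction weights to labels."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]            # indices of the k nearest training images
    w = reconstruction_weights(x, X_train[idx])
    return w @ Y_train[idx]                # one score per label

# Toy data (hypothetical): four training features with three binary labels each.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
Y_train = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)

scores = annotate(np.array([0.5, 0.5]), X_train, Y_train)
```

Here the test feature lies midway between the second and third training features, so the label they share receives the highest score, and a label carried by only one of them receives roughly half of that. Thresholding or ranking the scores would then give the final annotation.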
- Supervised learning (Machine learning), Coding theory, Digital techniques, Image processing