Multiscale Spatial Markov Model in image categorization


Student thesis: Master's Thesis



  • Lihua WANG

Award date: 17 Feb 2010


To meet the increasing demand for efficient and accurate image retrieval, the field of content-based image retrieval (CBIR) has been booming since the 1970s. More recently, machine learning has begun to play an influential role in advancing CBIR techniques. As a machine learning technique, the Hidden Markov Model (HMM) has been applied to the image classification problem for the past decade. The Image Computing Group at City University of Hong Kong has previously proposed the Spatial Hidden Markov Model (SHMM), a two-dimensional generalization of the traditional HMM, with the capability of block-based annotation as well as semantic classification of images. Based upon the work on SHMM, the contributions of this thesis can be summarized as follows. We have analyzed the sensitivity of SHMM semantic classification to different block sizes. We have proposed a multiscale SHMM that combines multiple SHMMs, each classifying the image at a different scale. By regarding each SHMM as a distinct classifier, classifier combination algorithms are proposed to integrate the outputs of the respective SHMMs and improve image classification accuracy. Experimental results demonstrate that the multiscale SHMM consistently outperforms a single SHMM in semantic image classification. Furthermore, we propose a Hierarchical Semantic Markov Model (HSMM) for image categorization, which is an original hierarchical extension of previous 2-D Markov models. The proposed HSMM combines the advantage of informative hierarchical image features with the advantage of the Bag-of-Words model: it represents image features with visual words, thus avoiding manual annotation, and provides a multiscale representation of image features and their relations across scales.
The HSMM is designed to describe the distribution of visual words over each image category by capturing the neighboring relationship of visual words within the same scale as well as the compositional relationship of visual words across different scales. Moreover, a novel idea of semantic hierarchy has been developed in the model to represent the compositional relationship of visual words at the semantic level. Experiments demonstrate that our model performs better than other related approaches in terms of image classification. By virtue of the work presented in the thesis, our understanding of the properties of SHMM is enhanced and, consequently, the performance of the previous SHMM is improved. Moreover, the SHMM is further extended to a hierarchical Markov model, which bridges the gap between low-level visual features and high-level semantics by analyzing features as well as their spatial relations over several different scales. The proposed HSMM has a remarkable advantage over the previous SHMM in both reducing manual annotation and improving the accuracy of image categorization.
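
The abstract does not state which classifier combination algorithms the multiscale SHMM uses. As a rough illustration of the general idea only, the sketch below fuses per-scale category scores with the standard sum and product rules. The function name `combine_scales`, the uniform category prior, and the example categories and log-likelihood values are all assumptions made for this sketch and do not come from the thesis.

```python
import math


def combine_scales(log_likelihoods, rule="sum"):
    """Fuse per-scale classifier outputs into one decision.

    log_likelihoods: list with one dict per scale, mapping a category
    name to the log-likelihood that scale's model assigns to the image.
    (Hypothetical interface; a uniform category prior is assumed.)
    """
    categories = list(log_likelihoods[0])

    # Turn each scale's log-likelihoods into posteriors (softmax).
    posteriors = []
    for scores in log_likelihoods:
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {c: math.exp(scores[c] - top) for c in categories}
        total = sum(weights.values())
        posteriors.append({c: weights[c] / total for c in categories})

    if rule == "sum":
        # Sum rule: average the posteriors over scales.
        combined = {
            c: sum(p[c] for p in posteriors) / len(posteriors)
            for c in categories
        }
    elif rule == "product":
        # Product rule: multiply the posteriors, then renormalize.
        raw = {c: math.prod(p[c] for p in posteriors) for c in categories}
        total = sum(raw.values())
        combined = {c: v / total for c, v in raw.items()}
    else:
        raise ValueError(f"unknown combination rule: {rule}")

    best = max(combined, key=combined.get)
    return best, combined


# Made-up scores for three block sizes (scales) and two categories.
scores_per_scale = [
    {"beach": -10.0, "forest": -11.0},
    {"beach": -12.0, "forest": -11.5},
    {"beach": -9.0, "forest": -11.0},
]
label_sum, fused_sum = combine_scales(scores_per_scale, rule="sum")
label_prod, fused_prod = combine_scales(scores_per_scale, rule="product")
```

In this made-up example the middle scale on its own would favor "forest", while both fusion rules select "beach", which is the kind of disagreement between scales that a combination step is meant to resolve.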

    Research areas

  • Markov processes, Computer vision, Digital techniques, Image processing