Visual language modeling for image classification

Lei Wu, Mingjing Li, Zhiwei Li, Wei-Ying Ma, Nenghai Yu

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

57 Citations (Scopus)

Abstract

Although it has been studied for many years, image classification is still a challenging problem. In this paper, we propose a visual language modeling method for content-based image classification. It transforms each image into a matrix of visual words and assumes that each visual word is conditionally dependent on its neighbors. For each image category, a visual language model is constructed from a set of training images; the model captures both the co-occurrence and the proximity information of visual words. Depending on how many neighbors are taken into consideration, three kinds of language models can be trained: unigram, bigram, and trigram, each corresponding to a different level of model complexity. Given a test image, its category is determined by estimating how likely the image is to have been generated under each category's model. Compared with traditional methods based on bag-of-words models, the proposed method can exploit the spatial correlation of visual words effectively in image classification. In addition, we propose to use absent words, that is, words that appear frequently in a category but not in the target image, to help image classification. Experimental results show that our method can achieve comparable accuracy while performing classification much more quickly. Copyright 2007 ACM.
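
The bigram case described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of the left neighbor as the conditioning context, the add-one smoothing, and the function names `train_bigram`, `log_likelihood`, and `classify` are all assumptions made for illustration. Each image is assumed to be already converted into a grid (list of rows) of integer visual-word IDs.

```python
import math
from collections import Counter


def train_bigram(grids, vocab_size):
    """Count (left neighbor, word) pairs over one category's training grids.

    Assumption: the bigram context is the word immediately to the left.
    Words in the first column have no left neighbor and are skipped here.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for grid in grids:
        for row in grid:
            for left, word in zip(row, row[1:]):
                pair_counts[(left, word)] += 1
                context_counts[left] += 1
    return pair_counts, context_counts, vocab_size


def log_likelihood(grid, model):
    """Log-probability of a grid under a category model."""
    pair_counts, context_counts, vocab_size = model
    total = 0.0
    for row in grid:
        for left, word in zip(row, row[1:]):
            # Add-one smoothing is an assumption; it keeps unseen pairs
            # from producing a zero probability.
            p = (pair_counts[(left, word)] + 1) / (context_counts[left] + vocab_size)
            total += math.log(p)
    return total


def classify(grid, models):
    """Return the category whose model gives the grid the highest likelihood."""
    return max(models, key=lambda category: log_likelihood(grid, models[category]))


# Toy data with a vocabulary of two visual words (0 and 1).
stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]
blocks = [[0, 0, 1, 1], [0, 0, 1, 1]]
models = {
    "stripes": train_bigram([stripes], vocab_size=2),
    "blocks": train_bigram([blocks], vocab_size=2),
}
print(classify([[1, 0, 1, 0]], models))
```

Both toy categories contain the same words in the same proportions, so a bag-of-words model could not separate them; only the neighbor statistics differ, which is the spatial information the abstract refers to.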
Original language: English
Title of host publication: International Multimedia Conference, MM'07 - Proceedings of the 9th ACM SIG Multimedia International Workshop on Multimedia Information Retrieval, MIR'07
Pages: 115-124
Publication status: Published - 2007
Externally published: Yes
Event: International Multimedia Conference, MM'07 - 9th ACM SIG Multimedia International Workshop on Multimedia Information Retrieval, MIR'07 - Augsburg, Bavaria, Germany
Duration: 28 Sept 2007 - 28 Sept 2007

Publication series

Name: Proceedings of the ACM International Multimedia Conference and Exhibition

Conference

Conference: International Multimedia Conference, MM'07 - 9th ACM SIG Multimedia International Workshop on Multimedia Information Retrieval, MIR'07
Place: Germany
City: Augsburg, Bavaria
Period: 28/09/07 - 28/09/07

Funding

The research is supported in part by the National Natural Science Foundation of China (60672056) and the Microsoft Research Asia Internet Services in Academic Research Fund. This work was performed at Microsoft Research Asia.

Research Keywords

  • Absent word criterion
  • Image classification
  • Visual language model
