Abstract
Although it has been studied for many years, image classification is still a challenging problem. In this paper, we propose a visual language modeling method for content-based image classification. It transforms each image into a matrix of visual words and assumes that each visual word is conditionally dependent on its neighbors. For each image category, a visual language model is constructed from a set of training images; the model captures both the co-occurrence and the proximity information of visual words. Depending on how many neighbors are taken into consideration, three kinds of language models can be trained: unigram, bigram, and trigram, each corresponding to a different level of model complexity. Given a test image, its category is determined by estimating how likely the image is to have been generated under a given category. Compared with traditional methods based on bag-of-words models, the proposed method can effectively exploit the spatial correlation of visual words in image classification. In addition, we propose using absent words, that is, words that appear frequently in a category but not in the target image, to aid classification. Experimental results show that our method can achieve comparable accuracy while performing classification much more quickly. Copyright 2007 ACM.
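The bigram case described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that each visual word is conditioned only on its left neighbor, that add-one smoothing handles unseen pairs, and that images are already quantized into grids of integer word indices. The function names `train_bigram`, `log_likelihood`, and `classify` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(images, vocab_size):
    """Count (left neighbor, word) pairs over one category's training images.

    Each image is a grid (list of rows) of integer visual-word indices.
    Conditioning on the left neighbor only is an assumption of this sketch.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for grid in images:
        for row in grid:
            for left, word in zip(row, row[1:]):
                pair_counts[(left, word)] += 1
                context_counts[left] += 1
    return pair_counts, context_counts, vocab_size


def log_likelihood(grid, model):
    """Sum of log P(word | left neighbor) with add-one smoothing (assumed)."""
    pair_counts, context_counts, vocab_size = model
    total = 0.0
    for row in grid:
        for left, word in zip(row, row[1:]):
            numerator = pair_counts[(left, word)] + 1
            denominator = context_counts[left] + vocab_size
            total += math.log(numerator / denominator)
    return total


def classify(grid, models):
    """Return the category whose model gives the image the highest likelihood."""
    return max(models, key=lambda category: log_likelihood(grid, models[category]))


# Toy data with a two-word vocabulary: alternating words vs. constant rows.
models = {
    "stripes": train_bigram([[[0, 1, 0, 1], [0, 1, 0, 1]]], vocab_size=2),
    "flat": train_bigram([[[0, 0, 0, 0], [1, 1, 1, 1]]], vocab_size=2),
}
striped_label = classify([[0, 1, 0, 1]], models)
flat_label = classify([[1, 1, 1, 1]], models)
```

A bag-of-words model would see the same word histogram for `[0, 1, 0, 1]` and `[0, 0, 1, 1]`; the pair counts above distinguish them, which is the spatial information the abstract refers to.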
| Original language | English |
|---|---|
| Title of host publication | International Multimedia Conference, MM'07 - Proceedings of the 9th ACM SIG Multimedia International Workshop on Multimedia Information Retrieval, MIR'07 |
| Pages | 115-124 |
| Publication status | Published - 2007 |
| Externally published | Yes |
| Event | International Multimedia Conference, MM'07 - 9th ACM SIG Multimedia International Workshop on Multimedia Information Retrieval, MIR'07 - Augsburg, Bavaria, Germany Duration: 28 Sept 2007 → 28 Sept 2007 |
Publication series
| Name | Proceedings of the ACM International Multimedia Conference and Exhibition |
|---|---|
Conference
| Conference | International Multimedia Conference, MM'07 - 9th ACM SIG Multimedia International Workshop on Multimedia Information Retrieval, MIR'07 |
|---|---|
| Place | Germany |
| City | Augsburg, Bavaria |
| Period | 28/09/07 → 28/09/07 |
Bibliographical note
Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].
Funding
This research was supported in part by the National Natural Science Foundation of China (60672056) and the Microsoft Research Asia Internet Services in Academic Research Fund. The work was performed at Microsoft Research Asia.
Research Keywords
- Absent word criterion
- Image classification
- Visual language model