Common pattern discovery and multiple appearance modeling for visual content analysis
通用模式檢測與多可視模塊建模技術用於可視代內容分析
Student thesis: Master's Thesis
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 15 Feb 2007 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(3c22180a-4425-4c31-bf1c-b32590c6b4f9).html |
---|---|
Abstract
Advances in modern multimedia technology have led to an ever-growing archive of multimedia documents. To provide better browsing, summarization and indexing of visual content, there is a need to bridge the gap between low-level features and their high-level semantic content. In this thesis, we investigate two major tasks related to content analysis: common pattern discovery (CPD) from a small set of images, and the mining and modeling of multiple visual parts of keyword categories from a training set. Both tasks are conducted using only partial labels at the image level. Data mining is explored, as an alternative to learning, to discover interesting patterns from the image set using partial information.
In the first task, common patterns in multiple images are discovered using matching. The issues of feature robustness, matching robustness and noise artifacts are addressed to explore the potential of using regions as the basic matching unit. As a novel contribution, we employ the many-to-many (M2M) matching strategy, specifically with the Earth Mover's Distance (EMD), to increase resilience to the structural inconsistency that arises when a pattern is improperly segmented into regions under various geometric and photometric transformations. However, the matching pattern of M2M is dispersed and unregulated in nature, which makes it challenging to mine a common pattern while identifying the underlying transformation. To avoid analysis on unregulated matching, we propose monolithic matching for the collaborative mining of a common pattern from multiple images. The patterns are refined iteratively using the Expectation-Maximization algorithm by taking advantage of the crowding phenomenon in the EMD flows. Experimental results show that our approach is robust and can effectively handle images with background clutter. To demonstrate the potential of CPD, we further use image retrieval as an example to show the application of CPD to pattern learning in relevance feedback.
In the second task, the problem of learning and modeling the visual semantics of keyword categories is investigated for image annotation. Supervised learning algorithms that learn only a single concept point for a category are limited in their effectiveness. A data mining technique is employed to mine multiple concepts, where each concept may consist of one or more visual parts, to capture the diverse visual appearances of a single keyword category. For training, the Apriori principle is used to efficiently mine a set of frequent blobsets that capture the semantics of a rich and diverse visual category. Each concept is ranked based on a discriminative or diverse density measure. For annotation, level-sensitive matching is used to rank words given an unannotated image.
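The Apriori-based mining of frequent blobsets mentioned above can be illustrated with a minimal sketch. It assumes each training image has already been reduced to a set of discrete blob labels; the function name `frequent_blobsets`, the blob labels, and the support threshold are all hypothetical and are not taken from the thesis, whose blob definition and ranking measures are not given in this abstract.

```python
from itertools import combinations


def frequent_blobsets(images, min_support):
    """Level-wise Apriori mining (illustrative only).

    images: list of sets of blob labels (assumed representation).
    min_support: minimum fraction of images that must contain a blobset.
    Returns a dict mapping each frequent blobset (frozenset) to its support.
    """
    n = len(images)
    # Level 1 candidates: every single blob seen in any image.
    candidates = {frozenset([blob]) for img in images for blob in img}
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of images containing the whole candidate set.
        level = {}
        for cand in candidates:
            support = sum(1 for img in images if cand <= img) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-sets into (k+1)-set candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: keep a candidate only if every (k-1)-subset
        # was itself frequent at the previous level.
        candidates = {
            cand for cand in joined
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1))
        }
    return frequent


# Hypothetical blob labels for four training images of one keyword category.
images = [
    {"sky", "grass", "tiger"},
    {"sky", "tiger"},
    {"grass", "tiger"},
    {"sky", "water"},
]
result = frequent_blobsets(images, min_support=0.5)
```

With these toy inputs, `{"sky", "tiger"}` and `{"grass", "tiger"}` survive as two-blob sets, while `{"sky", "grass", "tiger"}` is pruned because its subset `{"sky", "grass"}` is not frequent. The pruning step is what keeps the search efficient: a blobset is counted only if all of its smaller subsets were already found to be frequent.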
- Pattern recognition systems, Digital techniques, Image processing