Clustering and searching WWW images using link and page layout analysis

Xiaofei He, Deng Cai, Ji-Rong Wen, Wei-Ying Ma, Hong-Jiang Zhang

Research output: Journal Publications and Reviews (RGC 21 - Publication in refereed journal), peer-reviewed

48 Citations (Scopus)

Abstract

Due to the rapid growth in the number of digital images on the Web, there is an increasing demand for effective and efficient methods for organizing and retrieving the available images. This article describes iFind, a system for clustering and searching WWW images. A vision-based page segmentation algorithm partitions each Web page into blocks, so that the textual and link information of an image can be accurately extracted from the block containing that image. The textual information is used for image indexing. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods such as PageRank, HITS, and PicASHOW, and hence the image graph better reflects the semantic relationships between images. By modeling the image graph as a Markov chain, we compute the limiting probability distribution over the images, called ImageRank, which characterizes the importance of each image. The ImageRanks are combined with the relevance scores to produce the final ranking for image search. With these graph models, we can also apply techniques from spectral graph theory for image clustering and for embedding images into a 2-D space for visualization. The article reports experimental results on 11.6 million images downloaded from the Web. © 2007 ACM.
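The ImageRank step described in the abstract (the limiting distribution of a Markov chain over the image graph, later combined with text relevance) can be illustrated with a small power-iteration sketch. This is not the authors' implementation. It assumes the image graph is already given as a nonnegative weight matrix (a hypothetical input format), it adds a PageRank-style damping term (an assumption; the abstract does not say whether iFind uses one), and it mixes scores with a hypothetical linear weight `alpha`, because the abstract does not specify how ImageRank and relevance are combined.

```python
from typing import List


def image_rank(weights: List[List[float]], damping: float = 0.85,
               tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Approximate the limiting distribution of a random walk on an image graph.

    weights[i][j] >= 0 is a hypothetical edge weight from image i to image j.
    The damping (teleportation) term is an assumption borrowed from PageRank;
    it guarantees a unique limiting distribution on any graph.
    """
    n = len(weights)
    # Row-normalize edge weights into transition probabilities. An image with
    # no outgoing edges jumps to a uniformly random image (assumed fix).
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One power-iteration step: new = (1 - d)/n + d * rank^T P
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new[j] += damping * rank[i] * trans[i][j]
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def combine_scores(relevance: List[float], ranks: List[float],
                   alpha: float = 0.5) -> List[float]:
    """Hypothetical linear mix of text relevance and ImageRank.

    ImageRanks are rescaled by their maximum so both terms lie in [0, 1];
    alpha is an illustrative parameter, not taken from the article.
    """
    top = max(ranks)
    return [alpha * r + (1.0 - alpha) * (p / top)
            for r, p in zip(relevance, ranks)]


if __name__ == "__main__":
    # Toy graph: images 1 and 2 both link to image 0, which links back to both.
    w = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
    ranks = image_rank(w)
    scores = combine_scores([0.2, 0.9, 0.5], ranks)
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    print([round(x, 3) for x in ranks], order)  # prints [0.486, 0.257, 0.257] [1, 0, 2]
```

In the toy graph, image 0 receives links from both other images and so gets the highest ImageRank, yet image 1 still ranks first overall because of its much higher text relevance. This shows how link-based importance and textual relevance can pull the final ordering in different directions.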
Original language: English
Article number: 1230816
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 3
Issue number: 2
Publication status: Published - 1 May 2007
Externally published: Yes

Research Keywords

  • Image clustering
  • Image search
  • Link analysis
  • Web mining
