Improving pseudo-relevance feedback in web information retrieval using web page segmentation

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search, because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure of a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme uses visual cues to obtain a better partition of a page at the semantic level. By using our VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback for web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
Copyright is held by the author/owner(s).
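
As an illustration of the idea in the abstract, the following minimal Python sketch selects query expansion terms from page segments rather than from whole pages. It is not the authors' implementation: it assumes the top-ranked pages have already been split into text blocks by some segmenter (VIPS itself is not implemented here), and the function names (`tokenize`, `score_segment`, `expansion_terms`), the stopword list, and the simple term-count scoring are all hypothetical stand-ins for whatever ranking and term-selection formulas the paper uses.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "are"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def score_segment(query_terms, segment):
    """Stand-in relevance score: number of query-term occurrences in the segment."""
    counts = Counter(tokenize(segment))
    return sum(counts[t] for t in query_terms)


def expansion_terms(query, segmented_pages, top_segments=2, num_terms=3):
    """Pick expansion terms from the best-scoring segments of the feedback pages.

    segmented_pages: list of pages, each page a list of segment strings
    (assumed to come from an external page segmenter).
    """
    query_terms = set(tokenize(query))
    # Pool the segments of all feedback pages and rank them against the query.
    segments = [seg for page in segmented_pages for seg in page]
    ranked = sorted(segments, key=lambda s: score_segment(query_terms, s), reverse=True)
    # Keep only top segments that actually mention the query.
    feedback = [s for s in ranked[:top_segments] if score_segment(query_terms, s) > 0]
    # Count candidate terms in the feedback segments, excluding the query terms.
    counts = Counter(
        t for seg in feedback for t in tokenize(seg) if t not in query_terms
    )
    return [term for term, _ in counts.most_common(num_terms)]


if __name__ == "__main__":
    pages = [
        ["Home News Contact Login",
         "Jaguar cars are luxury vehicles. Jaguar vehicles use powerful engines.",
         "Copyright 2003 all rights reserved"],
        ["Site map About us",
         "The jaguar is a big cat. Jaguar habitat includes rainforest."],
    ]
    print(expansion_terms("jaguar cars", pages))
```

Because terms are drawn only from segments that match the query, words from navigation bars and footers (such as "login" or "copyright" in the toy data) never become expansion candidates, which is the motivation for segment-level feedback stated in the abstract.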
Original language: English
Title of host publication: Proceedings of the 12th International Conference on World Wide Web, WWW 2003
Pages: 11-18
Publication status: Published - 2003
Externally published: Yes
Event: 12th International Conference on World Wide Web, WWW 2003 - Budapest, Hungary
Duration: 20 May 2003 - 24 May 2003

Publication series

Name: Proceedings of the 12th International Conference on World Wide Web, WWW 2003

Conference

Conference: 12th International Conference on World Wide Web, WWW 2003
Place: Hungary
City: Budapest
Period: 20/05/03 - 24/05/03

Research Keywords

  • page segmentation
  • query expansion
  • relevance feedback
  • web information retrieval
