Block-based web search

Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma

Research output: Chapters, Conference Papers, Creative and Literary WorksRGC 32 - Refereed conference paper (with host publication)peer-review

130 Citations (Scopus)

Abstract

Multiple-topic and varying-length of web pages are two negative factors significantly affecting the performance of web search. In this paper, we explore the use of page segmentation algorithms to partition web pages into blocks and investigate how to take advantage of block-level evidence to improve retrieval performance in the web context. Because of the special characteristics of web pages, different page segmentation method will have different impact on web search performance. We compare four types of methods, including fixed-length page segmentation, DOM-based page segmentation, vision-based page segmentation, and a combined method which integrates both semantic and fixed-length properties. Experiments on block-level query expansion and retrieval are performed. Among the four approaches, the combined method achieves the best performance for web search. Our experimental results also show that such a semantic partitioning of web pages effectively deals with the problem of multiple drifting topics and mixed lengths, and thus has great potential to boost up the performance of current web search engines.
Original languageEnglish
Title of host publicationProceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PublisherAssociation for Computing Machinery
Pages456-463
ISBN (Print)1581138814, 9781581138818
DOIs
Publication statusPublished - 2004
Externally publishedYes
EventProceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Sheffield, United Kingdom
Duration: 25 Jul 200429 Jul 2004

Publication series

NameProceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

ConferenceProceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PlaceUnited Kingdom
CitySheffield
Period25/07/0429/07/04

Bibliographical note

Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].

Research Keywords

  • Page Segmentation
  • Passage Retrieval
  • Query Expansion
  • Vision-based Page Segmentation
  • Web Information Retrieval

Fingerprint

Dive into the research topics of 'Block-based web search'. Together they form a unique fingerprint.

Cite this