Abstract
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search, because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure of a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme uses visual cues to obtain a better partition of a page at the semantic level. By using the VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback for web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
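The idea of drawing expansion terms from page segments rather than whole pages can be illustrated with a minimal sketch. It assumes the top-ranked pages have already been split into text blocks by some segmenter (VIPS itself is not implemented here). The overlap-based block score, the frequency-based term selection, the stopword list, and all names such as `expansion_terms` are hypothetical stand-ins, not the ranking or selection method used in the paper.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "are", "with"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def expansion_terms(query, segmented_pages, top_blocks=3, num_terms=5):
    """Pick expansion terms from the blocks that best match the query.

    segmented_pages: one list of block strings per top-ranked page,
    assumed to come from a page segmenter.
    """
    query_terms = set(tokenize(query))
    scored = []
    for blocks in segmented_pages:
        for block in blocks:
            tokens = tokenize(block)
            if not tokens:
                continue
            # Score a block by the fraction of its tokens that are query terms.
            overlap = sum(1 for t in tokens if t in query_terms)
            scored.append((overlap / len(tokens), tokens))

    # Highest-scoring blocks first; navigation or footer blocks score near zero.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    counts = Counter()
    for score, tokens in scored[:top_blocks]:
        if score == 0:
            continue  # ignore blocks that share no term with the query
        counts.update(t for t in tokens if t not in query_terms)
    return [term for term, _ in counts.most_common(num_terms)]
```

Because candidate terms come only from the best-matching blocks, words from navigation bars or copyright footers on the same pages do not enter the expanded query, which is the motivation the abstract gives for segment-level feedback.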
Copyright is held by the author/owner(s).
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 12th International Conference on World Wide Web, WWW 2003 |
| Pages | 11-18 |
| Publication status | Published - 2003 |
| Externally published | Yes |
| Event | 12th International Conference on World Wide Web, WWW 2003 - Budapest, Hungary. Duration: 20 May 2003 → 24 May 2003 |
Publication series
| Name | Proceedings of the 12th International Conference on World Wide Web, WWW 2003 |
|---|---|
Conference
| Conference | 12th International Conference on World Wide Web, WWW 2003 |
|---|---|
| Place | Hungary |
| City | Budapest |
| Period | 20/05/03 → 24/05/03 |
Bibliographical note
Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].
Research Keywords
- page segmentation
- query expansion
- relevance feedback
- web information retrieval
Fingerprint
Dive into the research topics of 'Improving pseudo-relevance feedback in web information retrieval using web page segmentation'. Together they form a unique fingerprint.