A Study on Web Searching: Overlap and Distance of the Search Engine Results

Shanfeng Zhu, Xiaotie Deng, Qizhi Fang, Weimin Zheng

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 12 - Chapter in an edited book (Author); peer-review

Abstract

Web search engines are among the most popular services for helping users find useful information on the Web. Although many studies have estimated the size and overlap of the general web search engines, such estimates may be of little benefit to ordinary users, who care more about the overlap of the top N (N = 10, 20 or 50) search results for concrete queries than about the overlap of the total index databases. In this study, we present experimental results comparing the overlap of the top N (N = 10, 20 or 50) search results from AlltheWeb, Google, AltaVista and WiseNut for the 58 most popular queries, as well as the distance of the overlapped results. These 58 queries are chosen from the WordTracker service, which records the most popular queries submitted to well-known metasearch engines such as MetaCrawler and Dogpile. We divide the 58 queries into three categories for further investigation. Our in-depth study yields a number of interesting observations: the overlap of the top N results retrieved by different search engines is very small; the search results for queries in different categories behave in dramatically different ways; Google, on average, has the highest overlap among the four search engines; and each search engine tends to adopt its own ranking algorithm independently. © 2004 by Idea Group Inc.
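
The abstract does not define its overlap and distance measures, so the following is only a minimal sketch under stated assumptions: overlap is taken to be the set of URLs shared by two engines' top N lists, and distance is taken to be the mean absolute difference in rank position over those shared URLs. The function names, engine labels and URLs are hypothetical placeholders, not the chapter's method or data.

```python
from itertools import combinations


def top_n_overlap(results_a, results_b, n):
    """Return the URLs that appear in the top n of both ranked lists.

    Assumption: overlap is plain set intersection of the two top-n lists.
    """
    return set(results_a[:n]) & set(results_b[:n])


def rank_distance(results_a, results_b, n):
    """Mean absolute rank difference over URLs shared by both top-n lists.

    Assumption: this footrule-style measure stands in for the "distance"
    named in the abstract. Returns None when the lists share no URL.
    """
    top_a, top_b = results_a[:n], results_b[:n]
    shared = set(top_a) & set(top_b)
    if not shared:
        return None
    total = sum(abs(top_a.index(url) - top_b.index(url)) for url in shared)
    return total / len(shared)


def pairwise_overlap(results_by_engine, n):
    """Overlap count for every pair of engines, keyed by (engine, engine)."""
    return {
        (a, b): len(top_n_overlap(results_by_engine[a], results_by_engine[b], n))
        for a, b in combinations(sorted(results_by_engine), 2)
    }


if __name__ == "__main__":
    # Hypothetical ranked results for one query; not data from the study.
    results = {
        "engine_a": ["u1", "u2", "u3", "u4"],
        "engine_b": ["u3", "u1", "u5", "u6"],
        "engine_c": ["u7", "u8", "u1", "u9"],
    }
    print(pairwise_overlap(results, n=3))
    print(rank_distance(results["engine_a"], results["engine_b"], n=3))
```

Averaging these per-query values over a query set, for each N, would give the kind of engine-by-engine comparison the abstract summarizes.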
Original language: English
Title of host publication: Intelligent Agents for Data Mining and Information Retrieval
Publisher: IGI Global Publishing
Pages: 207-224
ISBN (Print): 9781591401957, 9781591401940
Publication status: Published - 1 Jan 2003
Externally published: Yes

