TY - GEN
T1 - Exploring URL hit priors for Web search
AU - Song, Ruihua
AU - Xin, Guomao
AU - Shi, Shuming
AU - Wen, Ji-Rong
AU - Ma, Wei-Ying
PY - 2006
Y1 - 2006
N2 - A URL usually contains meaningful information for measuring the relevance of a Web page to a query in Web search. Some existing work uses URL depth priors (i.e. the probability of being a good page given the length and depth of a URL) to improve some types of Web search tasks. This paper proposes using the location at which query terms occur in a URL to measure how well a Web page matches a user's information need. First, we define URL hit types and estimate URL hit priors, i.e. the prior probability of being a good answer given the type of query term hit in the URL. The main advantage of URL hit priors over depth priors is that they achieve stable improvement for both informational and navigational queries. Second, an obstacle to exploiting such priors is that shortening and concatenation are frequently used in URLs. Our investigation shows that only 30% of URL hits are recognized by an ordinary word-breaking approach. We therefore combine three methods to improve matching. Finally, the priors are integrated into the probabilistic model to enhance Web document retrieval. Our experiments were conducted on 7 query sets from TREC2002, TREC2003 and TREC2004 and show that the proposed approach is stable and improves retrieval effectiveness by 4%-11% for navigational queries and 10% for informational queries. © Springer-Verlag Berlin Heidelberg 2006.
AB - A URL usually contains meaningful information for measuring the relevance of a Web page to a query in Web search. Some existing work uses URL depth priors (i.e. the probability of being a good page given the length and depth of a URL) to improve some types of Web search tasks. This paper proposes using the location at which query terms occur in a URL to measure how well a Web page matches a user's information need. First, we define URL hit types and estimate URL hit priors, i.e. the prior probability of being a good answer given the type of query term hit in the URL. The main advantage of URL hit priors over depth priors is that they achieve stable improvement for both informational and navigational queries. Second, an obstacle to exploiting such priors is that shortening and concatenation are frequently used in URLs. Our investigation shows that only 30% of URL hits are recognized by an ordinary word-breaking approach. We therefore combine three methods to improve matching. Finally, the priors are integrated into the probabilistic model to enhance Web document retrieval. Our experiments were conducted on 7 query sets from TREC2002, TREC2003 and TREC2004 and show that the proposed approach is stable and improves retrieval effectiveness by 4%-11% for navigational queries and 10% for informational queries. © Springer-Verlag Berlin Heidelberg 2006.
UR - http://www.scopus.com/inward/record.url?scp=33745869365&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-33745869365&origin=recordpage
U2 - 10.1007/11735106_25
DO - 10.1007/11735106_25
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 3540333479
SN - 9783540333470
VL - 3936 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 277
EP - 288
BT - Advances in Information Retrieval - 28th European Conference on IR Research, ECIR 2006, Proceedings
T2 - 28th European Conference on Information Retrieval Research, ECIR 2006
Y2 - 10 April 2006 through 12 April 2006
ER -