TY - GEN
T1 - AggregateRank
T2 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06)
AU - Feng, Guang
AU - Liu, Tie-Yan
AU - Wang, Ying
AU - Bao, Ying
AU - Ma, Zhiming
AU - Zhang, Xu-Dong
AU - Ma, Wei-Ying
PY - 2006
Y1 - 2006
N2 - Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website. In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods. Copyright 2006 ACM.
AB - Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website. In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods. Copyright 2006 ACM.
KW - AggregateRank
KW - Coupling matrix
KW - Stochastic complement theory
UR - http://www.scopus.com/inward/record.url?scp=33750313238&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-33750313238&origin=recordpage
U2 - 10.1145/1148170.1148187
DO - 10.1145/1148170.1148187
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781595933690
T3 - Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 75
EP - 82
BT - SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
PB - Association for Computing Machinery
Y2 - 6 August 2006 through 11 August 2006
ER -