TY - GEN
T1 - Event detection from evolution of click-through data
AU - Zhao, Qiankun
AU - Liu, Tie-Yan
AU - Bhowmick, Sourav S.
AU - Ma, Wei-Ying
N1 - Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].
PY - 2006
Y1 - 2006
N2 - Previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data. In this paper, we propose the first approach to detect events from the click-through data, which is the log data of web search engines. The intuition behind event detection from click-through data is that such data is often event-driven and each event can be represented as a set of query-page pairs that are not only semantically similar but also have similar evolution pattern over time. Given the click-through data, in our proposed approach, we first segment it into a sequence of bipartite graphs based on the user-defined time granularity. Next, the sequence of bipartite graphs is represented as a vector-based graph, which records the semantic and evolutionary relationships between queries and pages. After that, the vector-based graph is transformed into its dual graph, where each node is a query-page pair that will be used to represent real world events. Then, the problem of event detection is equivalent to the problem of clustering the dual graph of the vector-based graph. The clustering process is based on a two-phase graph cut algorithm. In the first phase, query-page pairs are clustered based on the semantic-based similarity such that each cluster in the result corresponds to a specific topic. In the second phase, query-page pairs related to the same topic are further clustered based on the evolution pattern-based similarity such that each cluster is expected to represent a specific event under the specific topic. Experiments with real click-through data collected from a commercial web search engine show that the proposed approach produces high quality results. Copyright 2006 ACM.
AB - Previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data. In this paper, we propose the first approach to detect events from the click-through data, which is the log data of web search engines. The intuition behind event detection from click-through data is that such data is often event-driven and each event can be represented as a set of query-page pairs that are not only semantically similar but also have similar evolution pattern over time. Given the click-through data, in our proposed approach, we first segment it into a sequence of bipartite graphs based on the user-defined time granularity. Next, the sequence of bipartite graphs is represented as a vector-based graph, which records the semantic and evolutionary relationships between queries and pages. After that, the vector-based graph is transformed into its dual graph, where each node is a query-page pair that will be used to represent real world events. Then, the problem of event detection is equivalent to the problem of clustering the dual graph of the vector-based graph. The clustering process is based on a two-phase graph cut algorithm. In the first phase, query-page pairs are clustered based on the semantic-based similarity such that each cluster in the result corresponds to a specific topic. In the second phase, query-page pairs related to the same topic are further clustered based on the evolution pattern-based similarity such that each cluster is expected to represent a specific event under the specific topic. Experiments with real click-through data collected from a commercial web search engine show that the proposed approach produces high quality results. Copyright 2006 ACM.
KW - Click-through data
KW - Dynamic web
KW - Event detection
KW - Evolution pattern
UR - http://www.scopus.com/inward/record.url?scp=33749567382&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-33749567382&origin=recordpage
U2 - 10.1145/1150402.1150456
DO - 10.1145/1150402.1150456
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 1595933395
SN - 9781595933393
VL - 2006
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 484
EP - 493
BT - KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Y2 - 20 August 2006 through 23 August 2006
ER -