TY - GEN
T1 - A unified framework for clustering heterogeneous Web objects
AU - Zeng, Hua-Jun
AU - Chen, Zheng
AU - Ma, Wei-Ying
PY - 2002
Y1 - 2002
N2 - We introduce a novel framework for clustering Web data which is often heterogeneous in nature. As most existing methods often integrate heterogeneous data into a unified feature space, their flexibilities to explore and adjust contributing effects from different heterogeneous information are compromised. In contrast, our framework enables separate clustering of homogeneous data in the entire process based on their respective features, and a layered structure with link information is used to iteratively project and propagate the clustered results between layers until it converges. Our experimental results show that such a scheme not only effectively overcomes the problem of data sparseness caused by the high dimensional link space but also improves the clustering accuracy significantly. We achieve 19% and 41% performance increases when clustering Web-pages and users based on a semi-synthetic Web log. Finally, we show a real clustering result based on UC Berkeley's Web log. © 2002 IEEE.
UR - http://www.scopus.com/inward/record.url?scp=26944503672&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-26944503672&origin=recordpage
U2 - 10.1109/WISE.2002.1181653
DO - 10.1109/WISE.2002.1181653
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 0769517668
SN - 9780769517667
T3 - WISE 2002 - Proceedings of the 3rd International Conference on Web Information Systems Engineering
SP - 161
EP - 170
BT - WISE 2002 - Proceedings of the 3rd International Conference on Web Information Systems Engineering
PB - IEEE
T2 - 3rd International Conference on Web Information Systems Engineering, WISE 2002
Y2 - 12 December 2002 through 14 December 2002
ER -