A unified framework for clustering heterogeneous Web objects

Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

42 Citations (Scopus)

Abstract

We introduce a novel framework for clustering Web data, which is often heterogeneous in nature. Because most existing methods integrate heterogeneous data into a unified feature space, their flexibility to explore and adjust the contributing effects of different heterogeneous information is compromised. In contrast, our framework clusters homogeneous data separately throughout the entire process based on their respective features, and uses a layered structure with link information to iteratively project and propagate the clustered results between layers until the process converges. Our experimental results show that such a scheme not only effectively overcomes the problem of data sparseness caused by the high-dimensional link space but also significantly improves clustering accuracy. We achieve 19% and 41% performance increases when clustering Web pages and users based on a semi-synthetic Web log. Finally, we show a real clustering result based on UC Berkeley's Web log. © 2002 IEEE.
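
The layered scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes two layers (pages and users), substitutes plain k-means for whatever clustering method the paper uses, and assumes that "projecting" means collapsing each object's raw link counts into counts per cluster of the other layer. The names `kmeans`, `project`, `cluster_layers`, and the `links[u][p]` layout are all hypothetical.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def project(links, other_labels, k):
    """Collapse each row of raw link counts into counts per cluster of the other layer."""
    projected = []
    for row in links:
        vec = [0.0] * k
        for j, weight in enumerate(row):
            vec[other_labels[j]] += weight
        projected.append(vec)
    return projected

def cluster_layers(page_feats, user_feats, links, k=2, max_rounds=10):
    """links[u][p] = visits by user u to page p (assumed layout, not from the paper)."""
    links_t = [list(col) for col in zip(*links)]  # page-by-user view of the same links
    # Start from each layer's own features only.
    page_labels = kmeans(page_feats, k)
    user_labels = kmeans(user_feats, k)
    for _ in range(max_rounds):
        # Re-cluster pages on their own features plus links projected onto user clusters.
        new_pages = kmeans(
            [f + l for f, l in zip(page_feats, project(links_t, user_labels, k))], k)
        # Propagate the fresh page clusters back to the user layer.
        new_users = kmeans(
            [f + l for f, l in zip(user_feats, project(links, new_pages, k))], k)
        if new_pages == page_labels and new_users == user_labels:
            break  # assignments stopped changing
        page_labels, user_labels = new_pages, new_users
    return page_labels, user_labels

# Toy data (invented for illustration): two groups of pages, two groups of users.
page_feats = [[0.0], [0.1], [1.0], [1.1]]
user_feats = [[0.0], [0.2], [1.0], [0.9]]
links = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
page_labels, user_labels = cluster_layers(page_feats, user_feats, links)
print(page_labels, user_labels)
```

In this sketch the link vector of each object has only `k` dimensions after projection, rather than one dimension per linked object, which is one plausible reading of how the layered approach sidesteps the sparse, high-dimensional link space.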
Original language: English
Title of host publication: WISE 2002 - Proceedings of the 3rd International Conference on Web Information Systems Engineering
Publisher: IEEE
Pages: 161-170
ISBN (Print): 0769517668, 9780769517667
Publication status: Published - 2002
Externally published: Yes
Event: 3rd International Conference on Web Information Systems Engineering, WISE 2002 - Singapore, Singapore
Duration: 12 Dec 2002 - 14 Dec 2002

Publication series

Name: WISE 2002 - Proceedings of the 3rd International Conference on Web Information Systems Engineering

Conference

Conference: 3rd International Conference on Web Information Systems Engineering, WISE 2002
Place: Singapore
City: Singapore
Period: 12/12/02 - 14/12/02

