Extracting content structure for web pages based on visual representation

Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma

Research output: Chapters, Conference Papers, Creative and Literary WorksRGC 12 - Chapter in an edited book (Author)peer-review

233 Citations (Scopus)

Abstract

A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on his visual perception. Comparing to other existing techniques, our approach is independent to underlying documentation representation such as HTML and works well even when the HTML structure is far different from layout structure. Experiments show satisfactory results. © Springer-Verlag Berlin Heidelberg 2003.
Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PublisherSpringer Verlag
Pages406-417
Volume2642
ISBN (Print)9783540023548
DOIs
Publication statusPublished - 2003
Externally publishedYes

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume2642
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Bibliographical note

Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].

Fingerprint

Dive into the research topics of 'Extracting content structure for web pages based on visual representation'. Together they form a unique fingerprint.

Cite this