Hybrid method for automated news content extraction from the Web

Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) > 32_Refereed conference paper (with ISBN/ISSN) > peer-review

8 Scopus Citations

Author(s)

  • Yu Li
  • Xiaofeng Meng
  • Qing Li
  • Liping Wang

Detail(s)

Original language: English
Title of host publication: Web Information Systems - WISE 2006
Subtitle of host publication: 7th International Conference on Web Information Systems Engineering, Proceedings
Publisher: Springer Verlag
Pages: 327-338
Volume: 4255 LNCS
ISBN (Print): 3540481052, 9783540481058
Publication status: Published - 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4255 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Title: 7th International Conference on Web Information Systems Engineering, WISE 2006
Place: China
City: Wuhan
Period: 23 - 26 October 2006

Abstract

Web news content extraction is vital for improving news indexing and search in today's search engines, especially for news search services. In this paper we study the Web news content extraction problem and propose an automated extraction algorithm for it. Our method is a hybrid that takes advantage of both sequence matching and tree matching techniques. We propose TSReC, a variant of tag sequence representation suitable for both sequence matching and tree matching, along with an associated algorithm for automated Web news content extraction. We implement a prototype system for Web news content extraction and evaluate it empirically; the results show that our method is highly effective and efficient. © Springer-Verlag Berlin Heidelberg 2006.
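
The abstract does not define TSReC, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea behind sequence matching on tag sequences, under the assumption that two news pages from the same site share a template: each page is flattened into a token sequence of tags and text chunks, the sequences are aligned, and text that fails to align is treated as page-specific content. The names `TagSequenceParser`, `to_tokens`, and `differing_text` are hypothetical, and the tree matching half of the hybrid method is not shown.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Flatten an HTML page into a list of tokens: tags plus text chunks."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored so that template tags compare equal.
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text nodes
            self.tokens.append(text)


def to_tokens(html):
    """Return the token sequence (hypothetical helper) for an HTML string."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


def differing_text(page_a, page_b):
    """Return text tokens of page_a that do not align with page_b.

    Assumption: aligned tokens belong to the shared template, while
    unaligned text is page-specific content.
    """
    a, b = to_tokens(page_a), to_tokens(page_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    content = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            content.extend(t for t in a[i1:i2] if not t.startswith("<"))
    return content
```

Given two pages that differ only in the text of one paragraph, `differing_text` returns that paragraph's text from the first page and drops the shared menu and footer text. Real news pages vary far more than this, which is presumably where a representation that also supports tree matching becomes useful.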

Citation Format(s)

Hybrid method for automated news content extraction from the Web. / Li, Yu; Meng, Xiaofeng; Li, Qing; Wang, Liping.

Web Information Systems - WISE 2006: 7th International Conference on Web Information Systems Engineering, Proceedings. Vol. 4255 LNCS. Springer Verlag, 2006. p. 327-338 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4255 LNCS).
