Hybrid method for automated news content extraction from the Web
Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) › 32_Refereed conference paper (with ISBN/ISSN) › peer-review
Author(s)
Li, Yu; Meng, Xiaofeng; Li, Qing; Wang, Liping
Detail(s)
Original language | English |
---|---|
Title of host publication | Web Information Systems - WISE 2006 |
Subtitle of host publication | 7th International Conference on Web Information Systems Engineering, Proceedings |
Publisher | Springer Verlag |
Pages | 327-338 |
Volume | 4255 LNCS |
ISBN (Print) | 3540481052, 9783540481058 |
Publication status | Published - 2006 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 4255 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Title | 7th International Conference on Web Information Systems Engineering, WISE 2006 |
---|---|
Place | China |
City | Wuhan |
Period | 23 - 26 October 2006 |
Abstract
Web news content extraction is vital for improving news indexing and searching in today's search engines, especially for news search services. In this paper we study the Web news content extraction problem and propose an automated extraction algorithm for it. Our method is a hybrid one that takes advantage of both sequence matching and tree matching techniques. We propose TSReC, a variant of tag sequence representation suitable for both sequence matching and tree matching, along with an associated algorithm for automated Web news content extraction. We implemented a prototype system for Web news content extraction and evaluated it empirically; the results show that our method is highly effective and efficient. © Springer-Verlag Berlin Heidelberg 2006.
Citation Format(s)
Hybrid method for automated news content extraction from the Web. / Li, Yu; Meng, Xiaofeng; Li, Qing; Wang, Liping.
Web Information Systems - WISE 2006: 7th International Conference on Web Information Systems Engineering, Proceedings. Vol. 4255 LNCS Springer Verlag, 2006. p. 327-338 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4255 LNCS).