Title extraction from Loosely Structured Data Records

Yi-Pu Wu, Xue-Jie Zhang, Qing Li, Jing Chen

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in every Data Record, we select, from the candidate titles, the one with the largest length of "same content" as the accurate title. For a Web page whose title occurs before the first Data Record, the candidate title with the largest length of "different content" is taken as the accurate title. Our experiment on two databases collected from the Internet demonstrates that the automatic algorithm is robust and effective. © 2008 IEEE.
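
The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not define "same content" or "different content" precisely, so everything below is an assumption made for illustration and not the authors' algorithm: content is compared as lowercase whitespace-separated tokens, "same content" is taken to mean the candidate's tokens that occur in every data record, and "different content" is taken to mean the candidate's tokens absent from a reference text (for example, site boilerplate). The names `same_content_length`, `different_content_length`, `pick_title`, and the `reference` parameter are hypothetical.

```python
from typing import List


def tokens(text: str) -> List[str]:
    """Split text into lowercase whitespace-separated tokens (an assumed notion of 'content')."""
    return text.lower().split()


def same_content_length(candidate: str, records: List[str]) -> int:
    """Total character length of the candidate's tokens that occur in every record."""
    record_tokens = [set(tokens(r)) for r in records]
    return sum(
        len(t) for t in tokens(candidate) if all(t in rt for rt in record_tokens)
    )


def different_content_length(candidate: str, reference: str) -> int:
    """Total character length of the candidate's tokens that do not occur in the reference text."""
    ref = set(tokens(reference))
    return sum(len(t) for t in tokens(candidate) if t not in ref)


def pick_title(
    candidates: List[str],
    records: List[str],
    title_in_every_record: bool,
    reference: str = "",
) -> str:
    """Pick the candidate with the largest 'same' or 'different' content length.

    title_in_every_record stands in for the format-identification step,
    which the abstract mentions but does not describe.
    """
    if title_in_every_record:
        return max(candidates, key=lambda c: same_content_length(c, records))
    return max(candidates, key=lambda c: different_content_length(c, reference))
```

For a forum thread in which every post repeats the thread title, calling `pick_title(candidates, records, True)` favors the candidate whose tokens recur across all posts; with `title_in_every_record=False`, the candidate that differs most from the supplied `reference` text is chosen instead.
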
Original language: English
Title of host publication: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Pages: 2623-2628
Volume: 5
Publication status: Published - 2008
Event: 7th International Conference on Machine Learning and Cybernetics, ICMLC - Kunming, China
Duration: 12 Jul 2008 to 15 Jul 2008

Publication series

Name
Volume: 5

Conference

Conference: 7th International Conference on Machine Learning and Cybernetics, ICMLC
Place: China
City: Kunming
Period: 12/07/08 to 15/07/08

Research Keywords

  • Forum data
  • Loosely structured data records
  • Structured data records
  • Title extraction
