ODE: Ontology-assisted data extraction

  • Weifeng Su
  • Jiying Wang
  • Frederick H. Lochovsky

Research output: Journal Publications and Reviews > RGC 21 - Publication in refereed journal > peer-review

Abstract

Online databases respond to a user query with result records encoded in HTML files. Data extraction, which is important for many applications, extracts the records from the HTML files automatically. We present a novel data extraction method, ODE (Ontology-assisted Data Extraction), which automatically extracts the query result records from the HTML pages. ODE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different Web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods. Experimental results show that ODE is extremely accurate for identifying the query result section in an HTML page, segmenting the query result section into query result records, and aligning and labeling the data values in the query result records. © 2009 ACM.
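The label-assignment step described in the abstract can be illustrated with a toy sketch. This is not ODE's algorithm: the paper constructs its domain ontology automatically from query interfaces and result pages, whereas the `TOY_ONTOLOGY` below is a hand-written, hypothetical stand-in for a book domain, and `label_values` is an illustrative helper that simply matches each extracted data value against one regular expression per attribute.

```python
import re

# Hypothetical, hand-written "ontology" for a book domain (assumption):
# each attribute is reduced to a single value pattern. A real domain
# ontology would be built automatically and carry far richer information.
TOY_ONTOLOGY = {
    "price": re.compile(r"\$\d+(\.\d{2})?"),
    "isbn": re.compile(r"\d{9}[\dX]|\d{13}"),
    "year": re.compile(r"(19|20)\d{2}"),
}


def label_values(record):
    """Label each data value with the first attribute whose pattern fully matches it."""
    labeled = []
    for value in record:
        text = value.strip()
        label = next(
            (attr for attr, pattern in TOY_ONTOLOGY.items() if pattern.fullmatch(text)),
            "unknown",  # no attribute in the toy ontology matches this value
        )
        labeled.append((label, value))
    return labeled


# One extracted query result record, already segmented into data values.
record = ["Database Systems", "$59.99", "2008", "0131873253"]
print(label_values(record))
```

Values that match no attribute pattern, such as the title here, are labeled `unknown`; covering free-text attributes like these takes more than a single pattern per attribute.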
Original language: English
Article number: 12
Journal: ACM Transactions on Database Systems
Volume: 34
Issue number: 2
Publication status: Published - 1 Jun 2009

Research Keywords

  • Data value alignment
  • Domain ontology
  • Label assignment
