Cross language information extraction knowledge adaptation

Tak-Lam Wong, Kai-On Chow, Wai Lam

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

1 Citation (Scopus)

Abstract

We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites written in different languages. The framework reuses the previously learned information extraction knowledge and the items previously extracted or collected from the source Web site. This knowledge and data are automatically translated into the language of the unseen sites via online Web resources such as online Web dictionaries or maps. Multiple text mining methods are employed to automatically discover machine-labeled training examples in the unseen site. Both content-oriented and site-dependent features of the machine-labeled training examples are used to learn the new wrapper for the unseen site with our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework. © 2009 Springer Berlin Heidelberg.
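The pipeline outlined in the abstract (translate the source-site items, find them in the unseen site to obtain machine-labeled examples, then learn a new wrapper from those examples) can be illustrated with a toy Python sketch. This is not the authors' implementation: the dictionary, the function names, the fixed-width delimiter contexts, and the "most common delimiter pair" wrapper are all simplifying assumptions made here for illustration, and the abstract does not specify the actual text mining methods or wrapper induction algorithm.

```python
from collections import Counter

# Hypothetical toy dictionary standing in for an online Web dictionary.
TOY_DICTIONARY = {"red wine": "vino rosso", "white wine": "vino bianco"}


def translate_items(items, dictionary):
    """Translate previously extracted source-site items; skip unknown ones."""
    return [dictionary[item] for item in items if item in dictionary]


def label_examples(page, translated_items, left_width=4, right_width=5):
    """Locate translated items in the unseen page (assumed matching strategy).

    Each hit becomes a machine-labeled example made of the item and the
    raw markup immediately to its left and right.
    """
    examples = []
    for item in translated_items:
        start = page.find(item)
        while start != -1:
            end = start + len(item)
            left = page[max(0, start - left_width):start]
            right = page[end:end + right_width]
            examples.append((left, item, right))
            start = page.find(item, end)
    return examples


def induce_wrapper(examples):
    """Return the most common (left, right) delimiter pair, or None."""
    if not examples:
        return None
    pairs = Counter((left, right) for left, _, right in examples)
    return pairs.most_common(1)[0][0]


def apply_wrapper(page, wrapper):
    """Extract every string enclosed by the wrapper's delimiters."""
    left, right = wrapper
    results = []
    pos = page.find(left)
    while pos != -1:
        start = pos + len(left)
        end = page.find(right, start)
        if end == -1:
            break
        results.append(page[start:end])
        pos = page.find(left, end)
    return results


if __name__ == "__main__":
    # Hypothetical unseen page in another language.
    page = "<ul><li>vino rosso</li><li>vino bianco</li><li>spumante</li></ul>"
    translated = translate_items(["red wine", "white wine"], TOY_DICTIONARY)
    wrapper = induce_wrapper(label_examples(page, translated))
    print(apply_wrapper(page, wrapper))
```

In this toy setting the wrapper learned from two translated items also extracts a third item that was never in the dictionary, which is the kind of generalization the framework aims for. A realistic system would need far more robust matching and features than fixed-width string contexts.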
Original language: English
Title of host publication: Rough Sets and Knowledge Technology
Subtitle of host publication: 4th International Conference, RSKT 2009, Proceedings
Publisher: Springer Verlag
Pages: 520-528
Volume: 5589 LNAI
ISBN (Print): 3642029612, 9783642029615
Publication status: Published - 2009
Event: 4th International Conference on Rough Sets and Knowledge Technology, RSKT 2009 - Gold Coast, QLD, Australia
Duration: 14 Jul 2009 - 16 Jul 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5589 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Conference on Rough Sets and Knowledge Technology, RSKT 2009
Place: Australia
City: Gold Coast, QLD
Period: 14/07/09 - 16/07/09
