TY - GEN
T1 - Cross language information extraction knowledge adaptation
AU - Wong, Tak-Lam
AU - Chow, Kai-On
AU - Lam, Wai
PY - 2009
Y1 - 2009
AB - We propose a framework for adapting a wrapper previously learned from a source Web site to unseen sites written in different languages. The framework utilizes the previously learned information extraction knowledge and the items previously extracted or collected from the source Web site. This knowledge and data are automatically translated into the language of the unseen sites via online Web resources such as online Web dictionaries or maps. Multiple text mining methods are then employed to automatically discover machine-labeled training examples in the unseen site. Both content-oriented features and site-dependent features of these machine-labeled training examples are used to learn a new wrapper for the unseen site with our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework. © 2009 Springer Berlin Heidelberg.
UR - http://www.scopus.com/inward/record.url?scp=69049116345&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-69049116345&origin=recordpage
U2 - 10.1007/978-3-642-02962-2_66
DO - 10.1007/978-3-642-02962-2_66
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 3642029612
SN - 9783642029615
VL - 5589 LNAI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 520
EP - 528
BT - Rough Sets and Knowledge Technology
PB - Springer Verlag
T2 - 4th International Conference on Rough Sets and Knowledge Technology, RSKT 2009
Y2 - 14 July 2009 through 16 July 2009
ER -