TY - GEN
T1 - Extracting loosely structured data records through mining strict patterns
AU - Wu, Yipu
AU - Chen, Jing
AU - Li, Qing
PY - 2008
Y1 - 2008
N2 - Extracting loosely structured data records (DRs) has wide applications in many domains, such as forum pattern recognition, blog data analysis, and books and news review analysis. Currently existing methods work well for strongly structured DRs only. In this paper, we address the problem of extracting loosely structured DRs through mining strict patterns. In our method, we utilize both content feature and tag tree feature to recognize the loosely structured DRs, and propose a new approach to extract the DRs automatically. Through experimental study we demonstrate that this method is both effective and robust in practice. © 2008 IEEE.
UR - http://www.scopus.com/inward/record.url?scp=52649084677&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-52649084677&origin=recordpage
U2 - 10.1109/ICDE.2008.4497543
DO - 10.1109/ICDE.2008.4497543
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781424418374
SP - 1322
EP - 1324
BT - Proceedings - International Conference on Data Engineering
T2 - 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Y2 - 7 April 2008 through 12 April 2008
ER -