TY - GEN
T1 - 2D conditional random fields for Web information extraction
AU - Zhu, Jun
AU - Nie, Zaiqing
AU - Wen, Ji-Rong
AU - Zhang, Bo
AU - Ma, Wei-Ying
PY - 2005
Y1 - 2005
N2 - The Web contains an abundance of useful semi-structured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web information about objects of the same type across different Web sites. Conditional Random Fields (CRFs) are the state of the art approaches taking the sequence characteristics to do better labeling. However, as the information on a Web page is two-dimensionally laid out, previous linear-chain CRFs have their limitations for Web information extraction. To better incorporate the two-dimensional neighborhood interactions, this paper presents a two-dimensional CRF model to automatically extract object information from the Web. We empirically compare the proposed model with existing linear-chain CRF models for product information extraction, and the results show the effectiveness of our model.
UR - http://www.scopus.com/inward/record.url?scp=31844452562&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-31844452562&origin=recordpage
U2 - 10.1145/1102351.1102483
DO - 10.1145/1102351.1102483
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 1595931805
T3 - ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
SP - 1049
EP - 1056
BT - ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
T2 - ICML 2005: 22nd International Conference on Machine Learning
Y2 - 7 August 2005 through 11 August 2005
ER -