TY - GEN
T1 - Title extraction from Loosely Structured Data Records
AU - Wu, Yi-Pu
AU - Zhang, Xue-Jie
AU - Li, Qing
AU - Chen, Jing
PY - 2008
Y1 - 2008
N2 - In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we select, from the candidate titles, the one with the longest "same content" as the accurate title. For a Web page whose title occurs before the first Data Record, the candidate title with the longest "different content" is considered the accurate title. Our experiment demonstrates that our automatic algorithm is robust and effective on two databases collected from the Internet. © 2008 IEEE.
AB - In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we select, from the candidate titles, the one with the longest "same content" as the accurate title. For a Web page whose title occurs before the first Data Record, the candidate title with the longest "different content" is considered the accurate title. Our experiment demonstrates that our automatic algorithm is robust and effective on two databases collected from the Internet. © 2008 IEEE.
KW - Forum data
KW - Loosely structured data records
KW - Structured data records
KW - Title extraction
UR - http://www.scopus.com/inward/record.url?scp=57849155380&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-57849155380&origin=recordpage
U2 - 10.1109/ICMLC.2008.4620851
DO - 10.1109/ICMLC.2008.4620851
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781424420964
VL - 5
SP - 2623
EP - 2628
BT - Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
T2 - 7th International Conference on Machine Learning and Cybernetics, ICMLC
Y2 - 12 July 2008 through 15 July 2008
ER -