TY - GEN
T1 - Web object indexing using domain knowledge
AU - Wang, Muyuan
AU - Li, Zhiwei
AU - Lu, Lie
AU - Ma, Wei-Ying
AU - Zhang, Naiyao
PY - 2005
Y1 - 2005
N2 - A web object is defined as any meaningful object embedded in web pages (e.g. images, music) or pointed to by hyperlinks (e.g. downloadable files). In many cases, users would like to search for information about a particular 'object' rather than for a web page containing the query terms. To facilitate searching and organizing web objects, this paper proposes a novel approach to web object indexing that discovers an object's inherent structure information using existing domain knowledge. First, Layered LSI spaces are built to better represent the hierarchically structured domain knowledge, emphasizing the specific semantics and term space of each layer. Meanwhile, the web object representation is constructed by hyperlink analysis and then pruned to remove noise. Next, an optimal matching between the web object and the domain knowledge is performed to pick out the structure attributes of the web object from the knowledge. Finally, the obtained structure attributes are used to re-organize and index the web objects. The approach also suggests a promising new way to use trustworthy Deep Web knowledge to help organize dispersed information on the Surface Web. Copyright 2005 ACM.
KW - Confidence propagation
KW - Domain knowledge
KW - Indexing
KW - Information retrieval
KW - Latent semantic indexing
KW - Link analysis
KW - Music indexing
KW - Web object
UR - http://www.scopus.com/inward/record.url?scp=32344452076&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-32344452076&origin=recordpage
U2 - 10.1145/1081870.1081905
DO - 10.1145/1081870.1081905
M3 - RGC 32 - Refereed conference paper (with host publication)
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 294
EP - 303
BT - KDD-2005 - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Y2 - 21 August 2005 through 24 August 2005
ER -