A multi-level matching method with hybrid similarity for document retrieval

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal

12 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 2710-2719
Journal / Publication: Expert Systems with Applications
Volume: 39
Issue number: 3
Publication status: Published - 15 Feb 2012

Abstract

This paper presents a multi-level matching method for document retrieval (DR) using a hybrid document similarity. Documents are represented by a multi-level structure comprising a document level and a paragraph level. This multi-level representation is designed to model the underlying semantics more flexibly and accurately than conventional flat term histograms, which struggle to capture them. The matching between documents is then formulated as an optimization problem using the Earth Mover's Distance (EMD). A hybrid similarity combines the global and local semantics of documents to improve retrieval accuracy. In this paper, we performed an extensive experimental study and verification. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms. © 2011 Elsevier Ltd. All rights reserved.
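
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the paper's formulas. The sketch assumes that paragraphs are bag-of-words histograms, that paragraph weights are proportional to paragraph length, that the ground distance between paragraphs is cosine distance, and that the hybrid score is a linear blend controlled by a hypothetical parameter `alpha`. The EMD is solved as a small min-cost flow problem.

```python
import math
from collections import Counter

INF = float("inf")
EPS = 1e-12  # tolerance for floating-point residue in capacities and costs


def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (na * nb))


def emd(w_a, w_b, dist):
    """Earth Mover's Distance between weight vectors that each sum to 1.

    Solved as min-cost flow with successive shortest paths (Bellman-Ford).
    """
    m, n = len(w_a), len(w_b)
    src, snk, size = m + n, m + n + 1, m + n + 2
    edges, graph = [], [[] for _ in range(size)]

    def add_edge(u, v, cap, cost):
        # Edge k and edge k ^ 1 form a forward/backward residual pair.
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0.0, -cost])

    for i in range(m):
        add_edge(src, i, w_a[i], 0.0)
    for j in range(n):
        add_edge(m + j, snk, w_b[j], 0.0)
    for i in range(m):
        for j in range(n):
            add_edge(i, m + j, INF, dist[i][j])

    total_cost = 0.0
    while True:
        best, prev = [INF] * size, [-1] * size
        best[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if best[u] == INF:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > EPS and best[u] + cost < best[v] - EPS:
                        best[v], prev[v], changed = best[u] + cost, k, True
            if not changed:
                break
        if best[snk] == INF:
            return total_cost  # no augmenting path left: all mass is moved
        # Find the bottleneck along the shortest path, then push flow.
        push, v = INF, snk
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        total_cost += push * best[snk]


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Blend document-level and paragraph-level similarity.

    doc_a, doc_b: lists of non-empty paragraph strings.
    alpha: assumed blending weight (hypothetical, not from the paper).
    """
    paras_a = [Counter(p.lower().split()) for p in doc_a]
    paras_b = [Counter(p.lower().split()) for p in doc_b]
    # Global level: cosine similarity of whole-document histograms.
    global_sim = 1.0 - cosine_distance(sum(paras_a, Counter()),
                                       sum(paras_b, Counter()))
    # Local level: EMD between paragraph sets, weighted by paragraph length.
    len_a = [sum(p.values()) for p in paras_a]
    len_b = [sum(p.values()) for p in paras_b]
    w_a = [x / sum(len_a) for x in len_a]
    w_b = [x / sum(len_b) for x in len_b]
    dist = [[cosine_distance(p, q) for q in paras_b] for p in paras_a]
    local_sim = 1.0 - emd(w_a, w_b, dist)
    return alpha * global_sim + (1.0 - alpha) * local_sim
```

Calling `hybrid_similarity(doc_a, doc_b)` with two lists of paragraph strings returns a score in [0, 1] under these assumptions. Setting `alpha` closer to 1 favors the document-level match, and closer to 0 favors the paragraph-level EMD match.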

Research Area(s)

  • Document retrieval, EMD, Hybrid similarity, Multi-level matching, Multi-level structure