A coarse-to-fine framework to efficiently thwart plagiarism

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

25 Scopus Citations

Author(s)

  • Haijun Zhang
  • Tommy W.S. Chow

Detail(s)

Original language: English
Pages (from-to): 471-487
Journal / Publication: Pattern Recognition
Volume: 44
Issue number: 2
Publication status: Published - Feb 2011

Abstract

This paper presents a systematic framework that uses a multilevel matching approach for plagiarism detection (PD). A multilevel structure, i.e. document-paragraph-sentence, is used to represent each document. At the document and paragraph levels, we use a traditional dimensionality reduction technique to project high-dimensional histograms into a latent semantic space. The Earth Mover's Distance (EMD), instead of exhaustive matching, is employed to retrieve relevant documents, which enables us to markedly shrink the search domain. Two PD algorithms are designed and implemented to efficiently flag the suspected plagiarized document sources. We conduct extensive experimental verification, including document retrieval, PD, a study of the effects of parameters, and an empirical study of the system response. The results corroborate that the proposed approach is accurate and computationally efficient for performing PD. © 2010 Elsevier Ltd.
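
The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain cosine similarity on raw word counts stands in for both the dimensionality reduction and the EMD-based retrieval, the paragraph level is skipped, and every name here (`coarse_to_fine`, `top_k`, `threshold`) is a hypothetical choice made for illustration only. It shows just the two-stage shape: a cheap document-level ranking shrinks the search domain, and sentence-level matching then runs only on the shortlist.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count histograms (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def coarse_to_fine(query, corpus, top_k=2, threshold=0.8):
    """Return (shortlist, flagged).

    Coarse stage: rank whole documents by similarity to the query and keep
    the top_k names. Fine stage: compare sentences only against that
    shortlist and flag pairs whose similarity reaches the threshold.
    """
    q_vec = Counter(tokens(query))
    ranked = sorted(
        corpus,
        key=lambda name: cosine(q_vec, Counter(tokens(corpus[name]))),
        reverse=True,
    )
    shortlist = ranked[:top_k]

    flagged = []
    for name in shortlist:
        for q_sent in split_sentences(query):
            for d_sent in split_sentences(corpus[name]):
                score = cosine(Counter(tokens(q_sent)), Counter(tokens(d_sent)))
                if score >= threshold:
                    flagged.append((name, q_sent, d_sent, round(score, 2)))
    return shortlist, flagged


corpus = {
    "a": "The earth mover distance compares two histograms. It is used in image retrieval.",
    "b": "Cooking pasta requires boiling water. Add salt to taste.",
    "c": "Histograms summarize data. Retrieval systems rank documents.",
}
query = "We study matching. The earth mover distance compares two histograms."
shortlist, flagged = coarse_to_fine(query, corpus)
print(shortlist)
print(flagged)
```

In this toy corpus the unrelated document "b" never reaches the sentence-level stage, which is the point of the coarse filter: the expensive pairwise comparison runs on a fraction of the collection.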

Research Area(s)

  • Document retrieval, EMD, Multilevel matching, Plagiarism detection

Citation Format(s)

A coarse-to-fine framework to efficiently thwart plagiarism. / Zhang, Haijun; Chow, Tommy W.S.
In: Pattern Recognition, Vol. 44, No. 2, 02.2011, p. 471-487.
