Short text similarity based on probabilistic topics

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review

67 Scopus Citations

Author(s)

  • Xiaojun Quan
  • Gang Liu
  • Zhi Lu
  • Xingliang Ni
  • Liu Wenyin

Detail(s)

Original language: English
Pages (from-to): 473-491
Journal / Publication: Knowledge and Information Systems
Volume: 25
Issue number: 3
Publication status: Published - Dec 2010

Abstract

In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with a set of probabilistic topics. Specifically, our method first finds the distinguishing terms between the two short text snippets and compares them with a series of probabilistic topics extracted by the Gibbs sampling algorithm. The relationship between the distinguishing terms of the two snippets can be discovered by examining their probabilities under each topic. The similarity between the two snippets is then calculated from their common terms and the relationship between their distinguishing terms. Extensive experiments on paraphrasing and question categorization show that the proposed method calculates the similarity of short text snippets more accurately than other methods, including the pure TF-IDF measure. © 2009 Springer-Verlag London Limited.
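The abstract outlines the pipeline (common terms, distinguishing terms, topic-based relatedness, combined score) but not its exact formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: the topic-word table `PHI` is hypothetical toy data standing in for Gibbs-sampled topics, relatedness between two distinguishing terms is assumed to be the cosine of their per-topic probability vectors, each distinguishing term is credited with its best match on the other side, and the final normalization is an illustrative choice. Terms are unweighted here for simplicity. All function names are hypothetical.

```python
import math
from typing import Dict, List, Set

# Hypothetical topic-word probabilities P(w | z = k). In the paper these come
# from Gibbs sampling; here they are toy values chosen only for illustration.
PHI: List[Dict[str, float]] = [
    {"car": 0.30, "automobile": 0.25, "engine": 0.20, "road": 0.15, "drive": 0.10},
    {"bank": 0.30, "money": 0.25, "loan": 0.20, "pay": 0.15, "account": 0.10},
]


def tokenize(text: str) -> Set[str]:
    """Lower-case whitespace tokenization into a set of terms (assumption)."""
    return {t for t in text.lower().split() if t}


def topic_vector(word: str) -> List[float]:
    """Probability of `word` under each topic; 0.0 if the topic lacks it."""
    return [topic.get(word, 0.0) for topic in PHI]


def term_relatedness(a: str, b: str) -> float:
    """Assumed relatedness: cosine of the two words' topic vectors, in [0, 1]."""
    va, vb = topic_vector(a), topic_vector(b)
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    if na == 0.0 or nb == 0.0:
        return 0.0  # word unseen by every topic: no topical evidence
    return dot / (na * nb)


def short_text_similarity(s1: str, s2: str) -> float:
    """Combine exact common terms with topic-based matches of distinguishing terms."""
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    common = t1 & t2
    d1, d2 = t1 - t2, t2 - t1  # distinguishing terms of each snippet

    def best(term: str, others: Set[str]) -> float:
        # Credit a distinguishing term with its most related term on the other side.
        return max((term_relatedness(term, o) for o in others), default=0.0)

    matched = sum(best(a, d2) for a in d1) + sum(best(b, d1) for b in d2)
    # Each common term counts once per snippet; the score lies in [0, 1].
    return (2 * len(common) + matched) / (len(t1) + len(t2))
```

With these toy topics, "drive my car" and "drive my automobile" score close to 1.0 because "car" and "automobile" load on the same topic, whereas "drive my car" and "drive my loan" score 4/6, the same value a common-terms-only score would give under this normalization. A TF-IDF weighting could replace the unit term weights without changing the structure of the sketch.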

Research Area(s)

  • Information retrieval, Query expansion, Question answering, Text mining, Text similarity measures

Citation Format(s)

Short text similarity based on probabilistic topics. / Quan, Xiaojun; Liu, Gang; Lu, Zhi et al.

In: Knowledge and Information Systems, Vol. 25, No. 3, 12.2010, p. 473-491.

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review