
Short text similarity based on probabilistic topics

  • Xiaojun Quan
  • Gang Liu
  • Zhi Lu
  • Xingliang Ni
  • Liu Wenyin

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with probabilistic topics. Specifically, our method first finds the distinguishing terms between the two snippets and compares them with a series of probabilistic topics extracted by the Gibbs sampling algorithm. The relationship between the distinguishing terms of the two snippets is then discovered by examining their probabilities under each topic. Finally, the similarity between the two snippets is calculated from their common terms and the relationship between their distinguishing terms. Extensive experiments on paraphrasing and question categorization show that the proposed method calculates the similarity of short text snippets more accurately than other methods, including the pure TF-IDF measure. © 2009 Springer-Verlag London Limited.
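The abstract outlines a pipeline: split the two snippets into common and distinguishing terms, relate the distinguishing terms through their probabilities under each topic, and combine both into one score. The Python sketch below illustrates that idea under loudly stated assumptions. The topic-word table `PHI` is a hypothetical toy stand-in for topics that would be extracted by Gibbs sampling (no sampler is implemented here); term relatedness is taken as the cosine of two terms' per-topic probability vectors; and the scoring formula in `similarity` is a hypothetical choice, not the formula from the paper.

```python
from math import sqrt

# Hypothetical toy topic-word distributions: PHI[k][w] = P(w | topic k).
# In practice these would come from a topic model fitted with Gibbs sampling;
# the values here are invented for illustration only.
PHI = [
    {"car": 0.4, "automobile": 0.35, "engine": 0.25},
    {"bank": 0.5, "money": 0.3, "loan": 0.2},
]


def split_terms(a, b):
    """Return (common terms, terms only in a, terms only in b)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return ta & tb, ta - tb, tb - ta


def topic_vector(term, phi):
    """Probability of `term` under each topic (0.0 if the topic lacks it)."""
    return [topic.get(term, 0.0) for topic in phi]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def term_relatedness(u, v, phi):
    """Assumed relatedness: cosine of the two terms' topic profiles."""
    return cosine(topic_vector(u, phi), topic_vector(v, phi))


def best_matches(terms, others, phi):
    """Sum, over `terms`, of the strongest relatedness to any term in `others`."""
    return sum(
        max((term_relatedness(t, o, phi) for o in others), default=0.0)
        for t in terms
    )


def similarity(a, b, phi=PHI):
    """Hypothetical score in [0, 1]: common terms count fully, each
    distinguishing term counts by its best topic-based match."""
    common, only_a, only_b = split_terms(a, b)
    denom = 2 * len(common) + len(only_a) + len(only_b)
    if denom == 0:
        return 0.0
    related = best_matches(only_a, only_b, phi) + best_matches(only_b, only_a, phi)
    return (2 * len(common) + related) / denom


if __name__ == "__main__":
    print(similarity("my car needs repair", "my automobile needs repair"))
    print(similarity("my car needs repair", "my loan needs repair"))
```

With these toy topics, "my car needs repair" and "my automobile needs repair" score about 1.0 because "car" and "automobile" have the same topic profile, whereas swapping in "loan" gives 0.75, which is also what counting only the shared terms would give for both pairs.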
Original language: English
Pages (from-to): 473-491
Journal: Knowledge and Information Systems
Volume: 25
Issue number: 3
Publication status: Published - Dec 2010

Research Keywords

  • Information retrieval
  • Query expansion
  • Question answering
  • Text mining
  • Text similarity measures
