Recognition of word collocation habits using frequency rank ratio and inter-term intimacy

Peng Tang, Tommy W.S. Chow

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

7 Citations (Scopus)

Abstract

An effective algorithm is proposed for extracting two useful features from text documents for analyzing word collocation habits: "Frequency Rank Ratio" (FRR) and "Intimacy". FRR is derived from a word's ranking index according to its frequency. Intimacy, computed by a compact language model called the Influence Language Model (ILM), measures how close a word is to other words within the same sentence. Using the proposed features, a visualization framework is developed for word collocation analysis. To evaluate the framework, two corpora covering diverse topics and genres are designed and collected from real-life data. Extensive simulations are conducted to illustrate the feasibility and effectiveness of the visualization framework. The results demonstrate that the proposed features and algorithm can perform reliable text analysis efficiently. © 2013 Elsevier Ltd. All rights reserved.
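The abstract names the two features but does not give their formulas, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes (hypothetically) that FRR compares a word's frequency rank in a reference corpus with its rank in the analyzed document, and it replaces the Influence Language Model with a plainly different stand-in: an inverse-distance co-occurrence score summed over sentences. The helper names `tokenize`, `frequency_ranks`, `frequency_rank_ratio` and `intimacy_scores` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    # Simplistic lowercase word tokenizer, chosen only for illustration.
    return re.findall(r"[a-z']+", sentence.lower())


def frequency_ranks(tokens):
    """Rank words by descending frequency (rank 1 = most frequent).
    Ties are broken alphabetically so the ranking is deterministic."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ordered)}


def frequency_rank_ratio(word, doc_tokens, ref_tokens):
    """HYPOTHETICAL FRR: reference-corpus rank divided by document rank.
    Values > 1 mean the word ranks relatively higher in the document."""
    doc_rank = frequency_ranks(doc_tokens)[word]
    ref_rank = frequency_ranks(ref_tokens).get(word)
    if ref_rank is None:
        # Words unseen in the reference rank just after its last word.
        ref_rank = len(set(ref_tokens)) + 1
    return ref_rank / doc_rank


def intimacy_scores(sentences):
    """Stand-in for Intimacy (NOT the paper's ILM): for each unordered pair
    of distinct words, sum 1/distance over all same-sentence co-occurrences."""
    scores = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        # combinations() yields index pairs with i < j, so j - i >= 1.
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if a != b:
                scores[tuple(sorted((a, b)))] += 1.0 / (j - i)
    return scores


if __name__ == "__main__":
    doc_sents = ["strong tea is good", "strong tea tastes strong"]
    ref_sents = ["the tea is hot", "the coffee is strong"]
    doc_tokens = [t for s in doc_sents for t in tokenize(s)]
    ref_tokens = [t for s in ref_sents for t in tokenize(s)]
    print(frequency_rank_ratio("strong", doc_tokens, ref_tokens))  # 5.0
    print(intimacy_scores(doc_sents)[("strong", "tea")])  # 2.5
```

In this toy example "strong" is the most frequent document word but only fifth in the reference, giving a ratio of 5.0; adjacent occurrences of "strong" and "tea" dominate their pair score. Inverse distance was picked only because it is a simple, transparent proximity measure; the ILM described in the paper may weight context quite differently.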
Original language: English
Pages (from-to): 4301-4314
Journal: Expert Systems with Applications
Volume: 40
Issue number: 11
Publication status: Published - 1 Sept 2013

Research Keywords

  • Frequency rank ratio
  • Intimacy
  • Text classification
  • Text visualization