TY - JOUR
T1 - Recognition of word collocation habits using frequency rank ratio and inter-term intimacy
AU - Tang, Peng
AU - Chow, Tommy W.S.
PY - 2013/9/1
Y1 - 2013/9/1
N2 - An effective algorithm for extracting two useful features from text documents for analyzing word collocation habits, "Frequency Rank Ratio" (FRR) and "Intimacy", is proposed. FRR is derived from a ranking index of a word according to its word frequency. Intimacy, computed by a compact language model called Influence Language Model (ILM), measures how close a word is to others within the same sentence. Using the proposed features, a visualization framework is developed for word collocation analysis. To evaluate our proposed framework, two corpora are designed and collected from the real-life data covering diverse topics and genres. Extensive simulations are conducted to illustrate the feasibility and effectiveness of our visualization framework. Our results demonstrate that the proposed features and algorithm are able to conduct reliable text analysis efficiently. © 2013 Elsevier Ltd. All rights reserved.
KW - Frequency rank ratio
KW - Intimacy
KW - Text classification
KW - Text visualization
UR - http://www.scopus.com/inward/record.url?scp=84876042627&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84876042627&origin=recordpage
U2 - 10.1016/j.eswa.2013.01.003
DO - 10.1016/j.eswa.2013.01.003
M3 - RGC 21 - Publication in refereed journal
SN - 0957-4174
VL - 40
SP - 4301
EP - 4314
JO - Expert Systems with Applications
JF - Expert Systems with Applications
IS - 11
ER -