Inducing Word Clusters from Classical Chinese Poems

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review

Author(s)

Lee, John; Luo, Mengqi

Detail(s)

Original language: English
Pages (from-to): 13-30
Journal / Publication: International Journal of Asian Language Processing
Volume: 28
Issue number: 1
Publication status: Published - 30 Jun 2018

Abstract

Parallelism is a literary device that is frequently used in Classical Chinese poetry. Within the two lines of a parallel couplet, the words in one line are expected to mirror those in the other in terms of syntax and meaning. Judicious selection of pairs of related words is thus important for poem composition. This article investigates statistical approaches to word clustering, such that all words in each cluster can serve as candidates to form appropriate word pairs. We compare three corpus-based methods for computing word similarity and relatedness, and apply a graph-based clustering algorithm to induce word clusters. We evaluate the quality of the automatically induced clusters against a gold standard proposed by a literary scholar. Experimental results show that similarity scores estimated by the word2vec model lead to more accurate clusters than pointwise mutual information and chi-square, reaching 61.2% precision, 70.8% recall and 61.2% purity. Our work lays a foundation to support further studies on parallelism in Classical Chinese literature, and to provide training data for computer-assisted poem composition.
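
As a rough illustration of two quantities named in the abstract, the sketch below scores word pairs with pointwise mutual information (PMI) and computes cluster purity against gold labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how co-occurrence is counted, so the code assumes that words at the same position in the two lines of a couplet form a pair. The function names, the toy couplets and the gold labels are all hypothetical.

```python
import math
from collections import Counter


def pmi_scores(couplets):
    """Score word pairs by PMI.

    Assumption (not stated in the abstract): a pair is formed by the words
    at the same position in the two lines of a couplet.
    couplets: list of (line_a, line_b), each a list of words of equal length.
    Returns a dict mapping a sorted (w1, w2) tuple to its PMI in bits.
    """
    pair_counts = Counter()
    word_counts = Counter()
    total_pairs = 0
    for line_a, line_b in couplets:
        for w1, w2 in zip(line_a, line_b):
            pair_counts[tuple(sorted((w1, w2)))] += 1
            word_counts[w1] += 1
            word_counts[w2] += 1
            total_pairs += 1

    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_pair = n / total_pairs
        # Each aligned pair contributes two word tokens.
        p1 = word_counts[w1] / (2 * total_pairs)
        p2 = word_counts[w2] / (2 * total_pairs)
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores


def purity(clusters, gold):
    """Fraction of words that carry the majority gold label of their cluster.

    clusters: list of sets of words; gold: dict mapping word -> gold label.
    """
    total = sum(len(cluster) for cluster in clusters)
    majority = sum(
        max(Counter(gold[w] for w in cluster).values()) for cluster in clusters
    )
    return majority / total


# Toy data invented for illustration only; not taken from the paper's corpus.
toy_couplets = [
    (["天", "山"], ["地", "水"]),
    (["天", "風"], ["地", "雨"]),
]
toy_scores = pmi_scores(toy_couplets)

toy_clusters = [{"天", "地", "山"}, {"風", "雨"}]
toy_gold = {"天": "cosmos", "地": "cosmos", "山": "landscape",
            "風": "weather", "雨": "weather"}
toy_purity = purity(toy_clusters, toy_gold)
```

In a pipeline of the kind the abstract describes, scores like these (or word2vec similarities) would serve as edge weights for a graph-based clustering step, which is not reproduced here because the abstract does not name the algorithm.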

Research Area(s)

  • Classical Chinese, parallelism, Chinese poetry, word clustering

Bibliographic Note

Information for this record is provided by the author(s) concerned.

Citation Format(s)

Inducing Word Clusters from Classical Chinese Poems. / Lee, John; Luo, Mengqi.
In: International Journal of Asian Language Processing, Vol. 28, No. 1, 30.06.2018, p. 13-30.
