Exploring topic discriminating power of words in Latent Dirichlet Allocation

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

22 Scopus Citations
Detail(s)

Original language: English
Title of host publication: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics
Subtitle of host publication: Technical Papers
Publisher: The COLING 2016 Organizing Committee
Pages: 2238-2247
ISBN (print): 978-4-87974-702-0
Publication status: Published - Dec 2016

Conference

Title: 26th International Conference on Computational Linguistics, COLING 2016
Location: Osaka International Convention Center
Place: Japan
City: Osaka
Period: 11 - 16 December 2016

Abstract

Latent Dirichlet Allocation (LDA) and its variants have been widely used to discover latent topics in textual documents. However, some of the topics generated by LDA may be noisy, with irrelevant words scattered across them. We call such words topic-indiscriminate words; they tend to make topics more ambiguous and less interpretable by humans. In this work, we propose a new topic model named TWLDA, which assigns low weights to words with low topic-discriminating power. Our experimental results show that the proposed approach effectively reduces the number of topic-indiscriminate words in the discovered topics and improves the effectiveness of LDA.

Citation Format(s)

Exploring topic discriminating power of words in Latent Dirichlet Allocation. / Yang, Kai; Cai, Yi; Chen, Zhenhong et al.
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, 2016. p. 2238-2247.
