COS-training: A new semi-supervised learning method for keyphrase extraction based on co-training and SMOTE

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) | 21_Publication in refereed journal | Not applicable | peer-review

1 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 233-238
Journal / Publication: ICIC Express Letters, Part B: Applications
Volume: 6
Issue number: 1
Publication status: Published - 1 Jan 2015

Abstract

Because keyphrases are a small set of words that best represent a document, they play a significant role in a variety of text-related tasks. In recent years, many unsupervised and supervised methods have been proposed for keyphrase extraction. However, keyphrase extraction is by nature an imbalanced classification problem and involves a large amount of unlabeled data, and neither property has received attention in previous studies. In this research, a new semi-supervised learning method, COS-training, is proposed for keyphrase extraction based on co-training and SMOTE. For testing and illustration, a keyphrase extraction dataset is selected to verify the effectiveness of the proposed method. Empirical results show that COS-training is a promising solution for keyphrase extraction: among the compared methods, COS-training achieves the best result. These results indicate that COS-training can be used as an alternative method for keyphrase extraction.
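
The abstract names the two building blocks but does not say how COS-training combines them. The following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes each candidate phrase has two feature views, that a simple k-nearest-neighbour classifier is trained per view, and that the labeled minority class (keyphrases, label 1) is oversampled with SMOTE before each training step. All function names (`smote`, `balance`, `knn_predict`, `co_train`) and parameters are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def balance(train, rng):
    """Oversample class 1 with SMOTE until it matches class 0 in size."""
    pos = [p for p, y in train if y == 1]
    neg = [p for p, y in train if y == 0]
    if 2 <= len(pos) < len(neg):
        extra = smote(pos, len(neg) - len(pos), rng=rng)
        train = train + [(p, 1) for p in extra]
    return train


def knn_predict(train, x, k=3):
    """Return (label, confidence), where confidence is the share of the
    k nearest training points that voted for the predicted label."""
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    pos_share = sum(y for _, y in nearest) / len(nearest)
    return (1, pos_share) if pos_share >= 0.5 else (0, 1 - pos_share)


def co_train(labeled, unlabeled, rounds=3, seed=0):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of
    (view_a, view_b). In each round, the classifier of each view labels
    the unlabeled example it is most confident about and moves it into
    the shared labeled pool."""
    rng = random.Random(seed)
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                return labeled, unlabeled
            # Assumption: SMOTE is applied to the labeled pool of this view.
            train = balance([(x[view], y) for x, y in labeled], rng)
            best = max(unlabeled,
                       key=lambda x: knn_predict(train, x[view])[1])
            label, _ = knn_predict(train, best[view])
            unlabeled.remove(best)
            labeled.append((best, label))
    return labeled, unlabeled
```

In this sketch the oversampling matters because a k-nearest-neighbour vote tends to favour the majority class when keyphrases are rare; a real system would likely use stronger classifiers and richer features per view.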

Research Area(s)

  • Co-training, Keyphrase extraction, Semi-supervised learning, SMOTE