Utility-based feature selection for text classification

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

4 Scopus Citations

Original language: English
Pages (from-to): 197–226
Journal / Publication: Knowledge and Information Systems
Issue number: 1
Online published: 8 Dec 2018
Publication status: Published - Oct 2019


Feature selection is an important step that precedes a classification task; it reduces excessive computational costs and enhances classification performance. This paper presents a novel feature selection method based on the concept of utility, which is grounded in economics theory. In particular, we focus on a utility-based feature selection method for enhancing text classification. Unlike existing feature selection methods, the proposed method selects discriminative semantic terms according to how authors utilize terms to express the main ideas in textual documents, i.e., the “utility of terms,” a criterion that measures the usefulness of terms in expressing authors’ main ideas. To the best of our knowledge, our work represents the first successful research on leveraging economics theory to develop a semantically rich feature selection method that improves text classification. Our empirical tests on six UCI benchmark datasets confirm that the proposed method often outperforms other state-of-the-art feature selection methods in text classification. Moreover, our method provides an economics-based explanation of term weighting for information retrieval and semantic information acquisition in textual documents.

Research Area(s)

  • Economics theory, Feature selection, Text classification, Text mining, Utility theory

Citation Format(s)

Utility-based feature selection for text classification. / Wang, Heyong; Hong, Ming; Lau, Raymond Yiu Keung.
In: Knowledge and Information Systems, Vol. 61, No. 1, 10.2019, p. 197–226.
