Utility-based feature selection for text classification

Heyong Wang*, Ming Hong, Raymond Yiu Keung Lau

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

4 Citations (Scopus)

Abstract

Feature selection is a significant step before a classification task, used to reduce excessive computational costs and enhance classification performance. This paper illustrates a novel feature selection method based on the concept of utility, which is grounded in economics theory. In particular, we focus on a utility-based feature selection method for enhancing text classification. Unlike existing feature selection methods, the proposed method selects discriminative semantic terms according to how authors utilize terms to express the main ideas in textual documents, i.e., the “utility of terms,” a criterion that can be used to measure the usefulness of terms in expressing authors’ main ideas. To the best of our knowledge, our work represents successful research on leveraging economics theory to develop a semantically rich feature selection method that improves text classification. Our empirical tests based on six UCI benchmark datasets confirm that the proposed method often outperforms other state-of-the-art feature selection methods in text classification. Moreover, our method provides an economics explanation of term weighting for information retrieval and semantic information acquisition in textual documents.
Original language: English
Pages (from-to): 197–226
Journal: Knowledge and Information Systems
Volume: 61
Issue number: 1
Online published: 8 Dec 2018
DOIs
Publication status: Published - Oct 2019

Research Keywords

  • Economics theory
  • Feature selection
  • Text classification
  • Text mining
  • Utility theory

RGC Funding Information

  • RGC-funded
