Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis

Cheng Liu, Hau San Wong*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

28 Citations (Scopus)

Abstract

In gene expression data analysis, the problems of cancer classification and gene selection are closely related. Successfully selecting informative genes significantly improves classification performance. Various methods have been proposed to identify informative genes from a large number of candidate genes. However, gene expression data may contain important correlation structures, and some genes can be divided into groups based on their biological pathways. Many existing methods do not take the exact correlation structure within the data into consideration. Therefore, from both the knowledge discovery and biological perspectives, an ideal gene selection method should take this structural information into account. Moreover, better generalization performance can be obtained by discovering the correlation structure within the data. To discover structural information in the data and improve learning performance, we propose a structured penalized logistic regression model that simultaneously performs feature selection and model learning for gene expression data analysis. An efficient coordinate descent algorithm has been developed to optimize the model. Numerical simulation studies demonstrate that our method is able to select highly correlated features. In addition, results on real gene expression datasets show that the proposed method performs competitively with previous approaches.
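
The abstract does not state the form of the structured penalty, so the following is only a rough illustration of the general idea and not the authors' model. It swaps in an elastic-net penalty (L1 plus ridge), a well-known penalty that tends to keep correlated features together, and fits a logistic regression by coordinate-wise majorization-minimization. All names (`soft_threshold`, `fit_enet_logistic`, `lam1`, `lam2`, `n_sweeps`) and the synthetic data are assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(v, t):
    """Scalar soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def sigmoid(z):
    # Clip to keep exp() well-behaved for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_enet_logistic(X, y, lam1=0.1, lam2=0.05, n_sweeps=200):
    """Elastic-net logistic regression (a stand-in penalty, see lead-in).

    Minimizes mean logistic loss + lam1*||w||_1 + (lam2/2)*||w||_2^2 for
    labels y in {0, 1}. Each coordinate step minimizes a quadratic upper
    bound of the loss, using the fact that its curvature is at most 1/4.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    z = np.zeros(n)                        # linear predictor X @ w + b
    h = (X ** 2).sum(axis=0) / (4.0 * n)   # per-coordinate curvature bound
    for _ in range(n_sweeps):
        # Unpenalized intercept: curvature bound is 1/4, so step = -4 * grad.
        step = -4.0 * np.mean(sigmoid(z) - y)
        b += step
        z += step
        for j in range(p):
            if h[j] == 0.0:
                continue                   # constant-zero column, nothing to fit
            g = X[:, j] @ (sigmoid(z) - y) / n
            w_new = soft_threshold(h[j] * w[j] - g, lam1) / (h[j] + lam2)
            if w_new != w[j]:
                z += (w_new - w[j]) * X[:, j]
                w[j] = w_new
    return w, b

# Synthetic data (hypothetical): features 0 and 1 are noisy copies of one
# latent signal that drives the label; the other 18 features are pure noise.
rng = np.random.default_rng(0)
n, p = 200, 20
u = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, 0] = u + 0.1 * rng.normal(size=n)
X[:, 1] = u + 0.1 * rng.normal(size=n)
y = (rng.random(n) < sigmoid(3.0 * u)).astype(float)

w, b = fit_enet_logistic(X, y)
selected = np.flatnonzero(w)
print("selected feature indices:", selected)
```

In this toy setting the ridge term is what encourages the two correlated features to enter the model together, whereas a pure L1 penalty may keep only one of them. The paper's structured penalty may behave differently.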
Original language: English
Pages (from-to): 312-321
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 16
Issue number: 1
Online published: 30 Oct 2017
Publication status: Published - Feb 2019

Funding

The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 11300715].

Research Keywords

  • Analytical models
  • Correlation
  • Data analysis
  • Data models
  • Gene expression
  • Logistics
  • Microarray
  • Penalized logistic regression model
  • Structured penalized regularization
