Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) | 21_Publication in refereed journal | peer-review

21 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 312-321
Journal / Publication: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 16
Issue number: 1
Online published: 30 Oct 2017
Publication status: Published - Feb 2019

Abstract

In gene expression data analysis, the problems of cancer classification and gene selection are closely related: successfully selecting informative genes significantly improves classification performance. Various methods have been proposed to identify informative genes from a large number of candidates. However, gene expression data may contain important correlation structures, and some genes can be divided into groups based on their biological pathways. Many existing methods do not take the exact correlation structure within the data into consideration. From both the knowledge discovery and biological perspectives, an ideal gene selection method should therefore take this structural information into account. Moreover, better generalization performance can be obtained by discovering the correlation structure within the data. To discover structural information in the data and improve learning performance, we propose a structured penalized logistic regression model that simultaneously performs feature selection and model learning for gene expression data analysis. An efficient coordinate descent algorithm is developed to optimize the model. Numerical simulation studies demonstrate that our method is able to select highly correlated features. In addition, results on real gene expression datasets show that the proposed method performs competitively with previous approaches.
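As a rough illustration of the general idea of penalized logistic regression fitted by coordinate descent, the sketch below may help. The abstract does not state the form of the structured penalty or the details of the authors' algorithm, so this example substitutes a plain elastic-net penalty (a swapped-in technique, chosen because it tends to keep correlated features together) and a simple majorization-based coordinate update. The function names `soft_threshold`, `objective`, and `fit_enet_logistic_cd` and the parameters `lam1` and `lam2` are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def objective(X, y, intercept, beta, lam1, lam2):
    """Mean logistic loss plus an elastic-net penalty (assumed, not the paper's)."""
    eta = intercept + X @ beta
    loss = np.mean(np.logaddexp(0.0, eta) - y * eta)
    return loss + lam1 * np.abs(beta).sum() + 0.5 * lam2 * (beta ** 2).sum()


def fit_enet_logistic_cd(X, y, lam1=0.05, lam2=0.05, n_sweeps=200, tol=1e-6):
    """Cyclic coordinate descent for elastic-net logistic regression.

    X: (n, p) feature matrix, y: (n,) labels in {0, 1}.
    Each coordinate step minimizes a quadratic upper bound of the loss,
    using the fact that the logistic curvature is at most 1/4.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    eta = np.zeros(n)                       # current linear predictor
    h = (X ** 2).sum(axis=0) / (4.0 * n)    # per-coordinate curvature bound

    def prob():
        return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

    for _ in range(n_sweeps):
        # Unpenalized intercept step (curvature bound 1/4).
        delta = -np.mean(prob() - y) / 0.25
        intercept += delta
        eta += delta
        max_change = abs(delta)

        for j in range(p):
            if h[j] == 0.0:
                continue                    # constant-zero column, nothing to fit
            g = X[:, j] @ (prob() - y) / n  # partial derivative of the loss
            new = soft_threshold(h[j] * beta[j] - g, lam1) / (h[j] + lam2)
            delta = new - beta[j]
            if delta != 0.0:
                eta += delta * X[:, j]
                beta[j] = new
                max_change = max(max_change, abs(delta))

        if max_change < tol:
            break
    return intercept, beta
```

Called as `fit_enet_logistic_cd(X, y)` on a standardized expression matrix `X` with binary class labels `y`, the nonzero entries of the returned coefficient vector would be read as the selected genes. A structured penalty of the kind the abstract describes would replace the elastic-net term and change the per-coordinate update accordingly.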

Research Area(s)

  • Analytical models, Correlation, Data analysis, Data models, Gene expression, Logistics, Microarray, Penalized logistic regression model, Structured penalized regularization