TY - GEN
T1 - PCA based sequential feature space learning for gene selection
AU - Yang, Jing-Lin
AU - Li, Han-Xiong
PY - 2010
Y1 - 2010
N2 - Gene expression can be used for tumor subtype classification, clinical diagnosis, and prognosis outcome prediction, but the underlying mechanism remains unknown. Data-based machine learning methods can be employed for the phenotype classification problem, but high dimensionality and small sample size make many of them fail. In this research, a PCA-based sequential feature space learning method is proposed for gene selection. A two-level feature selection process is conducted. In the first level, PCA decomposition is conducted to obtain the orthogonal axis, and the features are then projected and evaluated on the orthogonal axis. In the second level, the features that have large projections are selected to form the feature space, and the projections of all features onto the feature space are then evaluated. Only features that have large projections on both the orthogonal axis and the feature subspace are selected as the feature subset. A neural network (NN) is then employed to learn the classification model. The PCA-based feature space learning is repeated sequentially until the classification performance is under a pre-specified threshold and stable. The proposed methods have been applied to two gene microarray databases and show good results. © 2010 IEEE.
AB - Gene expression can be used for tumor subtype classification, clinical diagnosis, and prognosis outcome prediction, but the underlying mechanism remains unknown. Data-based machine learning methods can be employed for the phenotype classification problem, but high dimensionality and small sample size make many of them fail. In this research, a PCA-based sequential feature space learning method is proposed for gene selection. A two-level feature selection process is conducted. In the first level, PCA decomposition is conducted to obtain the orthogonal axis, and the features are then projected and evaluated on the orthogonal axis. In the second level, the features that have large projections are selected to form the feature space, and the projections of all features onto the feature space are then evaluated. Only features that have large projections on both the orthogonal axis and the feature subspace are selected as the feature subset. A neural network (NN) is then employed to learn the classification model. The PCA-based feature space learning is repeated sequentially until the classification performance is under a pre-specified threshold and stable. The proposed methods have been applied to two gene microarray databases and show good results. © 2010 IEEE.
KW - Feature selection
KW - Gene expressions
KW - Microarray
KW - PCA
UR - http://www.scopus.com/inward/record.url?scp=78149311173&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-78149311173&origin=recordpage
U2 - 10.1109/ICMLC.2010.5580720
DO - 10.1109/ICMLC.2010.5580720
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781424465262
VL - 6
SP - 3079
EP - 3084
BT - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
T2 - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Y2 - 11 July 2010 through 14 July 2010
ER -