TY - JOUR
T1 - Multiobjective Semisupervised Classifier Ensemble
AU - Yu, Zhiwen
AU - Zhang, Yidong
AU - Chen, C. L. Philip
AU - You, Jane
AU - Wong, Hau-San
AU - Dai, Dan
AU - Wu, Si
AU - Zhang, Jun
PY - 2019/6
Y1 - 2019/6
AB - Classification of high-dimensional data with very limited labels is a challenging task in the field of data mining and machine learning. In this paper, we propose the multiobjective semisupervised classifier ensemble (MOSSCE) approach to address this challenge. Specifically, a multiobjective subspace selection process (MOSSP) in MOSSCE is first designed to generate the optimal combination of feature subspaces. Three objective functions are then proposed for MOSSP, which include the relevance of features, the redundancy between features, and the data reconstruction error. Then, MOSSCE generates an auxiliary training set based on the sample confidence to improve the performance of the classifier ensemble. Finally, the training set, combined with the auxiliary training set, is used to select the optimal combination of basic classifiers in the ensemble, train the classifier ensemble, and generate the final result. In addition, diversity analysis of the ensemble learning process is applied, and a set of nonparametric statistical tests is adopted for the comparison of semisupervised classification approaches on multiple datasets. The experiments on 12 gene expression datasets and two large image datasets show that MOSSCE has a better performance than other state-of-the-art semisupervised classifiers on high-dimensional data.
KW - Clustering algorithms
KW - Cybernetics
KW - Ensemble learning
KW - feature selection
KW - Linear programming
KW - multiobjective optimization
KW - Partitioning algorithms
KW - Robustness
KW - Semisupervised learning
KW - Training
UR - http://www.scopus.com/inward/record.url?scp=85045736512&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85045736512&origin=recordpage
U2 - 10.1109/TCYB.2018.2824299
DO - 10.1109/TCYB.2018.2824299
M3 - RGC 21 - Publication in refereed journal
SN - 2168-2267
VL - 49
SP - 2280
EP - 2293
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 6
ER -