TY - JOUR
T1 - Evolving pathway activation from cancer gene expression data using nature-inspired ensemble optimization
AU - Wang, Xubin
AU - Wang, Yunhe
AU - Ma, Zhiqiang
AU - Wong, Ka-Chun
AU - Li, Xiangtao
PY - 2024/8/15
Y1 - 2024/8/15
AB - Class-imbalanced biological datasets pose significant challenges in machine learning and data analysis tasks. Prior methods for handling imbalance rely on data oversampling, which increases computational cost and the risk of overfitting. While feature selection and ensemble learning are promising techniques, their application in imbalanced settings remains limited. To address these challenges, we present a novel framework called Hybrid Sampling Nature-Inspired Optimization Ensemble (HSNOE) to enhance the identification of hidden responders in imbalanced biological datasets. Our contributions are three-fold: 1) a hybrid undersampling and oversampling technique to mitigate class imbalance; 2) an ant colony optimization-based feature selection method that identifies informative feature subsets; 3) an ensemble classifier that integrates diverse models trained on the optimized features to improve performance. Experiments on five biological datasets demonstrate that HSNOE exhibits more stable overall performance across six evaluation metrics than ten benchmark methods. We also conducted a biological analysis on the Pan-cancer dataset. The HSNOE method is publicly available on GitHub. © 2024 Elsevier Ltd.
KW - Ant colony optimization
KW - Class-imbalanced learning
KW - Ensemble learning
KW - Feature selection
KW - Sampling
UR - http://www.scopus.com/inward/record.url?scp=85185255898&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85185255898&origin=recordpage
U2 - 10.1016/j.eswa.2024.123469
DO - 10.1016/j.eswa.2024.123469
M3 - RGC 21 - Publication in refereed journal
SN - 0957-4174
VL - 248
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 123469
ER -