TY - JOUR
T1 - Advancing text classification
T2 - a novel two-stage multi-objective feature selection framework
AU - Liu, Yan
AU - Cheng, Xian
AU - Stephen, Liao Shaoyi
AU - Wei, Shansen
PY - 2025/4/13
Y1 - 2025/4/13
N2 - In the realm of text classification, feature selection stands as a pivotal element, focusing on the identification of relevant terms through filter indicators or accuracy measures. Given the plethora of available indicators and measures, the diverse information they unveil leads to disparate feature selection outcomes. This paper presents a novel two-stage multi-objective feature selection framework that encompasses multiple filter indicators and accuracy measures in both the filter and wrapper stages. Employing Data Envelopment Analysis (DEA), the framework addresses the multi-objective decision-making challenge by exploring the Pareto efficient frontier. To comprehensively assess the framework's efficacy, experiments were conducted on twelve datasets using six distinct classification algorithms. The results highlight the superiority of the DEA Filter-Wrapper model (DEAFW), constructed based on this innovative framework. DEAFW consistently outperformed five single-objective filter models and a one-stage multi-objective filter model across six performance metrics in the majority of cases. For instance, in the case of logistic regression, DEAFW achieved the highest average rank among twelve datasets across all performance metrics. Furthermore, a comparative analysis with four existing feature selection techniques affirmed the consistent superiority of the DEAFW model, as it consistently attained the smallest grand average rank value across twelve datasets for most performance metrics. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
AB - In the realm of text classification, feature selection stands as a pivotal element, focusing on the identification of relevant terms through filter indicators or accuracy measures. Given the plethora of available indicators and measures, the diverse information they unveil leads to disparate feature selection outcomes. This paper presents a novel two-stage multi-objective feature selection framework that encompasses multiple filter indicators and accuracy measures in both the filter and wrapper stages. Employing Data Envelopment Analysis (DEA), the framework addresses the multi-objective decision-making challenge by exploring the Pareto efficient frontier. To comprehensively assess the framework's efficacy, experiments were conducted on twelve datasets using six distinct classification algorithms. The results highlight the superiority of the DEA Filter-Wrapper model (DEAFW), constructed based on this innovative framework. DEAFW consistently outperformed five single-objective filter models and a one-stage multi-objective filter model across six performance metrics in the majority of cases. For instance, in the case of logistic regression, DEAFW achieved the highest average rank among twelve datasets across all performance metrics. Furthermore, a comparative analysis with four existing feature selection techniques affirmed the consistent superiority of the DEAFW model, as it consistently attained the smallest grand average rank value across twelve datasets for most performance metrics. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
KW - Data envelopment analysis
KW - Feature selection
KW - Multi-objective decision making
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=105002424768&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105002424768&origin=recordpage
U2 - 10.1007/s10799-025-00450-9
DO - 10.1007/s10799-025-00450-9
M3 - RGC 21 - Publication in refereed journal
SN - 1385-951X
JO - Information Technology and Management
JF - Information Technology and Management
M1 - 107057
ER -