TY - JOUR
T1 - Stability analysis of feature ranking techniques in the presence of noise
T2 - a comparative study
AU - Ramezani, Iman
AU - Niaki, Mojtaba Khorram
AU - Dehghani, Milad
AU - Rezapour, Mostafa
PY - 2020
Y1 - 2020
N2 - Noisy data is a common problem in real-world datasets and may affect the performance of data models, the decisions based on them, and the performance of feature ranking techniques. In this paper, we show how stability changes when different feature ranking methods are applied to data containing attribute noise and class noise. We use Kendall's Tau rank correlation and Spearman rank correlation to evaluate the stability of various feature ranking methods, quantifying the degree of agreement between the ordered list of features produced by a filter on a clean dataset and the lists it produces on the same dataset corrupted with different combinations of noise levels. According to the Kendall and Spearman measures, Gini index (GI) and information gain (IG) perform best, respectively. Both measures, however, show that ReliefF (RF) is the most sensitive to noise and thus performs worst.
AB - Noisy data is a common problem in real-world datasets and may affect the performance of data models, the decisions based on them, and the performance of feature ranking techniques. In this paper, we show how stability changes when different feature ranking methods are applied to data containing attribute noise and class noise. We use Kendall's Tau rank correlation and Spearman rank correlation to evaluate the stability of various feature ranking methods, quantifying the degree of agreement between the ordered list of features produced by a filter on a clean dataset and the lists it produces on the same dataset corrupted with different combinations of noise levels. According to the Kendall and Spearman measures, Gini index (GI) and information gain (IG) perform best, respectively. Both measures, however, show that ReliefF (RF) is the most sensitive to noise and thus performs worst.
KW - Attribute noise
KW - Class noise
KW - Filter-based feature ranking
KW - Kendall's Tau rank correlation
KW - Spearman rank correlation
KW - Stability
KW - Threshold-based feature ranking
UR - http://www.scopus.com/inward/record.url?scp=85094161245&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85094161245&origin=recordpage
U2 - 10.1504/IJBIDM.2020.110371
DO - 10.1504/IJBIDM.2020.110371
M3 - RGC 21 - Publication in refereed journal
SN - 1743-8187
VL - 17
SP - 413
EP - 427
JO - International Journal of Business Intelligence and Data Mining
JF - International Journal of Business Intelligence and Data Mining
IS - 4
ER -