TY - GEN
T1 - MAHAKIL
T2 - 40th International Conference on Software Engineering (ICSE 2018)
AU - Bennin, Kwabena E.
AU - Keung, Jacky
AU - Phannachitta, Passakorn
AU - Monden, Akito
AU - Mensah, Solomon
PY - 2018/5
Y1 - 2018/5
AB - This study presents MAHAKIL, a novel and efficient synthetic oversampling approach for software defect datasets that is based on the chromosomal theory of inheritance. Exploiting this theory, MAHAKIL interprets two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent and contributes to the diversity within the data distribution. We extensively compare MAHAKIL with five other sampling approaches using 20 releases of defect datasets from the PROMISE repository and five prediction models. Our experiments indicate that MAHAKIL improves the prediction performance for all the models and achieves better and more significant pf (probability of false alarm) values than the other oversampling approaches, based on robust statistical tests.
KW - Class imbalance learning
KW - Classification problems
KW - Data sampling methods
KW - Software defect prediction
KW - Synthetic sample generation
UR - http://www.scopus.com/inward/record.url?scp=85049405348&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85049405348&origin=recordpage
U2 - 10.1145/3180155.3182520
DO - 10.1145/3180155.3182520
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781450356381
SP - 699
BT - ICSE '18 - Proceedings of the 40th International Conference on Software Engineering
PB - Association for Computing Machinery
Y2 - 27 May 2018 through 3 June 2018
ER -