MAHAKIL: Diversity based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction: Extended Abstract

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

12 Scopus Citations

Detail(s)

Original language: English
Title of host publication: ICSE '18 - Proceedings of the 40th International Conference on Software Engineering
Publisher: Association for Computing Machinery (ACM)
Pages: 699
ISBN (print): 9781450356381
Publication status: Published - May 2018

Publication series

Name
ISSN (Print): 0270-5257

Conference

Title: 40th International Conference on Software Engineering (ICSE 2018)
Place: Sweden
City: Gothenburg
Period: 27 May - 3 June 2018

Abstract

This study presents MAHAKIL, a novel and efficient synthetic oversampling approach for software defect datasets that is based on the chromosomal theory of inheritance. Exploiting this theory, MAHAKIL treats two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent, thereby contributing to the diversity within the data distribution. We extensively compare MAHAKIL with five other sampling approaches using 20 releases of defect datasets from the PROMISE repository and five prediction models. Our experiments indicate that MAHAKIL improves the prediction performance of all the models and achieves better and more significant pf (probability of false alarm) values than the other oversampling approaches, based on robust statistical tests.
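The parent-and-child idea in the abstract can be illustrated with a simplified, single-generation sketch. The abstract does not give the algorithmic details, so everything specific here is an assumption made for illustration: the minority (defective) instances are ranked by Mahalanobis distance to their mean, split into two parent groups, and each child is the average of one parent from each group. The function name `diversity_oversample_once` is hypothetical, and this is not the authors' reference implementation.

```python
import numpy as np


def diversity_oversample_once(minority):
    """One generation of parent-pair averaging (illustrative sketch only).

    Assumptions (not stated in the abstract): diversity is measured by the
    Mahalanobis distance to the minority-class mean, the ranked instances
    are split into two halves that act as parent groups, and a child is
    the mean of one parent from each group.
    """
    X = np.asarray(minority, dtype=float)
    if X.ndim != 2 or len(X) < 2:
        raise ValueError("expected a 2-D array with at least two instances")

    # Squared Mahalanobis distance of every instance to the class mean.
    diff = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    dist = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

    # Rank from most to least distant, then split into two parent groups.
    order = np.argsort(-dist)
    half = len(X) // 2
    parents_a = X[order[:half]]
    parents_b = X[order[half:2 * half]]

    # Pair the i-th member of each group; the child inherits from both.
    return (parents_a + parents_b) / 2.0


# Usage with made-up data: six minority instances, two features each.
rng = np.random.default_rng(0)
minority = rng.normal(size=(6, 2))
children = diversity_oversample_once(minority)
print(children.shape)
```

This sketch produces at most half as many synthetic instances as there are minority instances. Reaching a target class balance would need further generations (for example, pairing children with their parents), which is left out here.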

Research Area(s)

  • Class imbalance learning, Classification problems, Data sampling methods, Software defect prediction, Synthetic sample generation

Citation Format(s)

MAHAKIL: Diversity based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction: Extended Abstract. / Bennin, Kwabena E.; Keung, Jacky; Phannachitta, Passakorn et al.
ICSE '18 - Proceedings of the 40th International Conference on Software Engineering. Association for Computing Machinery (ACM), 2018. p. 699.
