Cross project defect prediction using class distribution estimation and oversampling
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › Not applicable › peer-review
|Journal / Publication||Information and Software Technology|
|Early online date||12 Apr 2018|
|State||Published - Aug 2018|
|Link to Scopus||https://www.scopus.com/record/display.uri?eid=2-s2.0-85046137081&origin=recordpage|
Objective: To alleviate the negative effects of class imbalance and distribution mismatch on the performance of cross-project defect prediction (CPDP) models by using Class Distribution Estimation (CDE) and the Synthetic Minority Oversampling Technique (SMOTE). A novel approach, Class Distribution Estimation with Synthetic Minority Oversampling Technique (CDE-SMOTE), is proposed to improve CPDP performance while avoiding excessive oversampling.
Method: The proposed CDE-SMOTE employs CDE to estimate the class distribution of the target project. SMOTE is then used to modify the class distribution of the training data until the distribution becomes the reverse of the approximated class distribution of the target project. Four comprehensive experiments are conducted on 14 open source software projects.
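The oversampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the CDE step is not reproduced, so the estimated defect ratio of the target project is simply passed in as `est_target_defect_ratio`; "reverse" is read here as swapping the two class proportions (an estimated ratio of p defective becomes a training ratio of 1 - p defective); and the SMOTE routine is a simplified pure-Python version. All function and parameter names are hypothetical.

```python
import math
import random


def synthetic_count(n_defective, n_clean, est_target_defect_ratio):
    """Synthetic defective samples needed so that the training defect ratio
    becomes 1 - est_target_defect_ratio (assumed meaning of "reverse")."""
    if not 0.0 < est_target_defect_ratio < 1.0:
        raise ValueError("estimated ratio must lie strictly between 0 and 1")
    desired = 1.0 - est_target_defect_ratio
    # Solve (n_defective + k) / (n_defective + k + n_clean) = desired for k.
    k = desired * n_clean / est_target_defect_ratio - n_defective
    return max(0, round(k))


def smote(minority, n_new, k_neighbors=5, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k_neighbors, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def cde_smote(train_X, train_y, est_target_defect_ratio, rng=None):
    """Oversample the defective class (label 1) until the training class
    distribution is the reverse of the estimated target distribution."""
    minority = [x for x, y in zip(train_X, train_y) if y == 1]
    n_clean = len(train_y) - len(minority)
    n_new = synthetic_count(len(minority), n_clean, est_target_defect_ratio)
    if n_new == 0:
        return list(train_X), list(train_y)  # already at or past the goal
    new_X = smote(minority, n_new, rng=rng)
    return train_X + new_X, train_y + [1] * n_new
```

Because the number of synthetic samples is derived from the estimated target distribution rather than fixed in advance, oversampling stops once the desired ratio is reached, which is the sense in which the approach avoids excessive oversampling.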
Results: The proposed approach improves the overall performance of CPDP models when compared to the performance of other CPDP approaches. Significant improvements are observed in 63% of the test cases according to the Wilcoxon signed-rank tests with 16.421%, 29.687% and 20.259% improvements in terms of Balance, G-measure, and F-measure, respectively. Application of CDE-SMOTE on NN-filtered datasets significantly improved prediction performance.
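For reference, the three reported measures can be computed from a confusion matrix as sketched below. The formulas are the ones commonly used in the defect-prediction literature; the abstract itself does not state them, so treating them as the exact definitions used in the study is an assumption. Function names are hypothetical, and the sketch assumes all denominators are non-zero.

```python
import math


def rates(tp, fn, fp, tn):
    """Return (pd, pf, precision) from confusion-matrix counts."""
    pd = tp / (tp + fn)          # probability of detection (recall)
    pf = fp / (fp + tn)          # probability of false alarm
    precision = tp / (tp + fp)
    return pd, pf, precision


def balance(pd, pf):
    """1 minus the normalised distance from the ideal point (pf=0, pd=1)."""
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)


def g_measure(pd, pf):
    """Harmonic mean of pd and (1 - pf)."""
    return 2.0 * pd * (1.0 - pf) / (pd + (1.0 - pf))


def f_measure(pd, precision):
    """Harmonic mean of precision and recall (pd)."""
    return 2.0 * precision * pd / (precision + pd)
```

All three measures range from 0 to 1, with higher values indicating better prediction performance.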
Conclusions: CDE-SMOTE mitigates the class imbalance and distribution mismatch problems and also helps prevent the excessive oversampling that degrades the performance of prediction models. This approach is thus recommended for CPDP studies in software engineering.
Keywords: Class distribution estimation, Class imbalance learning, Cross-Project defect prediction, Oversampling, Software fault prediction
Cross project defect prediction using class distribution estimation and oversampling. / Limsettho, Nachai; Bennin, Kwabena Ebo; Keung, Jacky W.; Hata, Hideaki; Matsumoto, Kenichi. In: Information and Software Technology, Vol. 100, 08.2018, p. 87-102.