A Drift Propensity Detection Technique to Improve the Performance for Cross-Version Software Defect Prediction

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

7 Scopus Citations

Detail(s)

Original language: English
Title of host publication: Proceedings - 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC 2020)
Publisher: Institute of Electrical and Electronics Engineers, Inc.
Pages: 882-891
Number of pages: 10
ISBN (electronic): 9781728173030
Publication status: Published - Jul 2020

Publication series

Name: Proceedings - International Computer Software and Applications Conference
ISSN (Print): 0730-3157

Conference

Title: 44th IEEE Computer Society International Conference on Computers, Software, and Applications (COMPSAC 2020)
Location: Virtual
Place: Spain
City: Madrid
Period: 13 - 17 July 2020

Abstract

In cross-version defect prediction (CVDP), historical data is derived from the prior version of the same project to predict defects of the current version. Recent studies in CVDP focus on subset selection to deal with the changes of the data distributions. No prior study has focused on training data arriving in a streaming fashion across the versions, where the significant differences between versions make the prediction unreliable. We refer to this situation as Drift Propensity (DP). By identifying DP, necessary steps can be taken (e.g., updating or retraining the model) to improve the prediction performance. In this paper, we investigate the chronological defect datasets and identify DP in the datasets. The no-memory data management technique is employed to manage the data distributions and a DP detection technique is proposed. The idea behind the proposed DP detection technique is to monitor the algorithm's error-rate. The DP detector triggers DP, warning, and control flags to take necessary steps. The proposed technique is significantly superior in identifying the distribution differences (p-value < 0.05). The DPs identified in the data distributions achieve large effect sizes (Hedges' g ≥ 0.80) during the pair-wise comparisons. We observe that if the error-rate exponentially increases, it causes DP, resulting in prediction performance deterioration. We thus recommend that researchers and practitioners address DP in the chronological datasets. Due to its potential effects in the datasets, the prediction models could be enhanced to get the best results in CVDP.
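
The abstract says the detector monitors the algorithm's error-rate and raises DP, warning, and control flags, but it gives no thresholds or update rule. The sketch below illustrates that general idea with a standard error-rate drift monitor in the style of the Drift Detection Method (DDM); it is not the authors' algorithm. The class name, the `flag_stream` helper, the 2-sigma and 3-sigma levels, and the 30-sample warm-up are all assumptions made for illustration.

```python
import math


class ErrorRateMonitor:
    """DDM-style monitor (illustrative assumption, not the paper's detector).

    Tracks the running error rate p and its standard deviation
    s = sqrt(p * (1 - p) / n), and compares p + s with the best
    (lowest) level seen so far.
    """

    def __init__(self, warning_level=2.0, dp_level=3.0, min_samples=30):
        self.warning_level = warning_level  # assumed 2-sigma warning band
        self.dp_level = dp_level            # assumed 3-sigma drift band
        self.min_samples = min_samples      # assumed warm-up length
        self.reset()

    def reset(self):
        # Discard all statistics; no earlier data is remembered.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Consume one prediction outcome and return a flag string."""
        self.n += 1
        self.errors += int(bool(is_error))
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)

        if self.n < self.min_samples:
            return "control"  # too few samples to judge

        # Remember the lowest error level observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s

        if p + s > self.p_min + self.dp_level * self.s_min:
            self.reset()      # start over after signalling drift
            return "dp"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "control"


def flag_stream(outcomes):
    """Hypothetical helper: run a fresh monitor over 0/1 error outcomes."""
    monitor = ErrorRateMonitor()
    return [monitor.update(o) for o in outcomes]
```

In a cross-version setting, `outcomes` would be the per-module misclassification indicators of a model trained on earlier versions and evaluated in chronological order. A "warning" flag could prompt buffering of recent data, and a "dp" flag could prompt the updating or retraining that the abstract mentions.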

Research Area(s)

  • cross-version defect prediction, drift propensity, software defect prediction, streaming data, two-window-based data distributions

Citation Format(s)

A Drift Propensity Detection Technique to Improve the Performance for Cross-Version Software Defect Prediction. / Kabir, Md Alamgir; Keung, Jacky W.; Bennin, Kwabena E. et al.
Proceedings - 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC 2020). Institute of Electrical and Electronics Engineers, Inc., 2020. p. 882-891 9202527 (Proceedings - International Computer Software and Applications Conference).
