Predicting the precise number of software defects : Are we there yet?
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Related Research Unit(s)
Detail(s)
Original language | English |
---|---|
Article number | 106847 |
Journal / Publication | Information and Software Technology |
Volume | 146 |
Online published | 22 Jan 2022 |
Publication status | Published - Jun 2022 |
Link(s)
Abstract
Context: Defect Number Prediction (DNP) models can offer more benefits than classification-based defect prediction. Recently, many researchers proposed to employ regression algorithms for DNP, and found that the algorithms achieve low Average Absolute Error (AAE) and high Pred(0.3) values. However, since the defect datasets generally contain many non-defective modules, even if a DNP model predicts the number of defects in all modules as zero, the AAE value of the model will be low and Pred(0.3) value will be high. Therefore, the good performance of the regression algorithms in terms of AAE and Pred(0.3) may be questioned due to the imbalanced distribution of the number of defects.
Objective: To revisit the impact of regression algorithms for predicting the precise number of defects.
Method: We examine the practical effects of 12 widely-used regression algorithms, two data resampling algorithm (SmoteR and ROS), and three ensemble learning algorithms (gradient boosting regression, AdaBoost.R2, and Bagging), one feature selection method (information gain) and one parameter optimization method (grid search) for predicting the precise number of defects on the 18 PROMISE datasets. We propose to evaluate the AAE and Pred(0.3) values for the modules with different numbers of defects separately.
Results: The AAE values for defective modules are very high and the Pred(0.3) values are very low, i.e., the regression algorithms are very inaccurate for predicting the precise number of defects in defective modules.
Conclusion: The problem of predicting the precise number of defects via regression algorithms is far from being solved. We recommend that software testers use regression algorithms to rank modules for testing resource allocation, rather than predict the precise number of defects to evaluate the software reliability and maintenance effort. In addition, most existing DNP studies employing the whole AAE and Pred(0.3) values of all modules as the evaluation metrics for the proposed DNP algorithms should be revisited.
© 2022 Elsevier B.V. All rights reserved.
Objective: To revisit the impact of regression algorithms for predicting the precise number of defects.
Method: We examine the practical effects of 12 widely-used regression algorithms, two data resampling algorithm (SmoteR and ROS), and three ensemble learning algorithms (gradient boosting regression, AdaBoost.R2, and Bagging), one feature selection method (information gain) and one parameter optimization method (grid search) for predicting the precise number of defects on the 18 PROMISE datasets. We propose to evaluate the AAE and Pred(0.3) values for the modules with different numbers of defects separately.
Results: The AAE values for defective modules are very high and the Pred(0.3) values are very low, i.e., the regression algorithms are very inaccurate for predicting the precise number of defects in defective modules.
Conclusion: The problem of predicting the precise number of defects via regression algorithms is far from being solved. We recommend that software testers use regression algorithms to rank modules for testing resource allocation, rather than predict the precise number of defects to evaluate the software reliability and maintenance effort. In addition, most existing DNP studies employing the whole AAE and Pred(0.3) values of all modules as the evaluation metrics for the proposed DNP algorithms should be revisited.
© 2022 Elsevier B.V. All rights reserved.
Research Area(s)
- Empirical software engineering, Number of defects, Regression algorithms, Software Defect Prediction
Citation Format(s)
Predicting the precise number of software defects: Are we there yet? / Yu, Xiao; Keung, Jacky; Xiao, Yan et al.
In: Information and Software Technology, Vol. 146, 106847, 06.2022.
In: Information and Software Technology, Vol. 146, 106847, 06.2022.
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review