Abstract
Effort-Aware Defect Prediction (EADP) uses learning to rank algorithms to rank software modules by their likelihood of being defective, their predicted number of defects, or their defect density. Prior empirical studies compared only a few learning to rank algorithms on a small number of datasets, evaluated them with inappropriate performance measures or with only one type of measure, and relied on non-robust statistical tests. To address these concerns and investigate the impact of learning to rank algorithms on the performance of EADP models, we examine the practical effects of 23 learning to rank algorithms on 41 available defect datasets from the PROMISE repository using a module-based effort-aware performance measure (FPA) and a source lines of code (SLOC) based effort-aware performance measure (Norm(Popt)). In addition, we compare the performance of these algorithms when they are trained on a more relevant feature subset selected by the Information Gain feature selection method. For both FPA and Norm(Popt), statistically significant differences are observed among these algorithms: BRR (Bayesian Ridge Regression) performs best in terms of FPA, and BRR and LTR (Learning-To-Rank) perform best in terms of Norm(Popt). When the algorithms are trained on the feature subset selected by Information Gain, LTR and BRR still perform best, with significant differences in terms of FPA and Norm(Popt). We therefore recommend BRR and LTR for building EADP models in order to find more defects when inspecting a given number of modules or lines of code.
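The abstract does not define FPA, so the following is only a minimal sketch. It assumes the fault-percentile-average as it is commonly formulated in the defect prediction literature: modules are ranked by predicted defect count, and the score is the average, over every cutoff m, of the fraction of actual defects contained in the top m modules. The function name `fpa` and its parameters are illustrative and do not come from the paper.

```python
from typing import Sequence


def fpa(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Fault-percentile-average of a ranking (higher is better).

    Assumed formulation, not taken from the abstract: the mean over
    m = 1..K of the share of actual defects in the top-m ranked modules.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    k = len(actual)
    total = sum(actual)
    if k == 0 or total == 0:
        raise ValueError("need at least one module and one defect")

    # Rank modules from highest to lowest predicted defect count.
    order = sorted(range(k), key=lambda i: predicted[i], reverse=True)

    found = 0          # actual defects in the top-m modules so far
    cumulative = 0.0   # sum over m of the fraction found in the top m
    for i in order:
        found += actual[i]
        cumulative += found / total
    return cumulative / k
```

Under this assumed definition, a ranking that places the most defective modules first scores higher: `fpa([0.1, 2.5, 0.9, 0.0], [0, 3, 1, 0])` evaluates to 0.9375, while the reversed ordering `fpa([3, 0, 1, 2], [0, 3, 1, 0])` evaluates to 0.3125.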
| Original language | English |
|---|---|
| Title of host publication | SANER '19 - Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering |
| Editors | Xinyu Wang, David Lo, Emad Shihab |
| Publisher | IEEE |
| Pages | 298-309 |
| ISBN (Electronic) | 9781728105918 |
| ISBN (Print) | 9781728105925 |
| Publication status | Published - Feb 2019 |
| Event | 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2019), Zhejiang University, Hangzhou, China. Duration: 24 Feb 2019 → 27 Feb 2019. Conference number: 26th |
Publication series
| Name | Proceedings of the ... IEEE International Conference on Software Analysis, Evolution, and Reengineering |
|---|---|
| ISSN (Print) | 1534-5351 |
Conference
| Conference | 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2019) |
|---|---|
| Abbreviated title | SANER 2019 |
| Place | China |
| City | Hangzhou |
| Period | 24/02/19 → 27/02/19 |
Research Keywords
- effort-aware defect prediction
- empirical study
- learning to rank
- Scott-Knott ESD test