Research on Methods of Ranking-Oriented Software Defect Prediction


Student thesis: Doctoral Thesis




Supervisors
  • Jacky Keung (Supervisor)
  • Jin Liu (External Supervisor)
  • Qing Li (External Co-Supervisor)
Award date: 22 Dec 2020


Classification-based software defect prediction methods build prediction models from historical software data and use the built models to predict whether new software modules are defective. Based on such predictions, software testers allocate equal testing resources to all predicted defective modules, which wastes limited testing resources. Ranking-Oriented Defect Prediction (RODP) methods use learning-to-rank algorithms to build prediction models and rank software modules by the predicted number of defects. By guiding software testers to inspect the modules with more defects first, RODP allocates limited testing resources more efficiently than classification-based methods. To address four problems in the field of RODP, this thesis carries out an in-depth study to find the best modeling method for RODP:

(1) Motivated by the inconsistent conclusions in recent RODP studies about which learning-to-rank algorithm is the best modeling method, we find that a ranking instability problem exists, caused mainly by weaknesses in the experimental methods, such as comparing few learning-to-rank algorithms, using a small number of datasets, and evaluating with inappropriate or too few measures. To obtain a stable ranking of learning-to-rank algorithms, this thesis evaluates 22 learning-to-rank algorithms for RODP on 41 defect datasets with 6 performance measures, using the double Scott-Knott test with Cohen's d effect size awareness (Scott-Knott ESD test). The experimental results show that: (1) some ensemble learning algorithms (e.g., random forest, gradient boosting regression, and RankBoost) perform well; (2) ridge regression and LTR-linear (Learning-To-Rank with the linear model) perform best among the 22 algorithms in terms of FPA (Fault-Percentile-Average) and PofB@20%module (Proportion of the inspected Bugs when inspecting the top 20% of modules), but LTR-linear requires inspecting more LOC (Lines Of Code). Following the first result, Chapters 4 and 5 of this thesis investigate whether two other data imbalance learning methods (i.e., data resampling and cost-sensitive learning) can improve the performance of RODP models. Following the second result, Chapter 6 proposes a multi-objective search based RODP algorithm.
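To make the two headline measures concrete, the sketch below (an illustrative implementation based on the measures' standard definitions, not the thesis's own code) computes FPA and PofB@20%module from actual defect counts and predicted scores:

```python
import numpy as np

def fpa(actual_defects, predicted_scores):
    """Fault-Percentile-Average: the mean, over every top-m cut of the
    predicted ranking, of the fraction of all defects found in that cut.
    Higher is better; a perfect ranking lists defect-heavy modules first."""
    order = np.argsort(predicted_scores)[::-1]       # most defective first
    ranked = np.asarray(actual_defects, dtype=float)[order]
    total = ranked.sum()
    if total == 0:
        return 0.0
    cum = np.cumsum(ranked) / total                  # defects found in top m
    return float(cum.mean())

def pofb_at(actual_defects, predicted_scores, fraction=0.20):
    """PofB@20%module: proportion of all defects found when inspecting
    the top `fraction` of modules in the predicted ranking."""
    order = np.argsort(predicted_scores)[::-1]
    ranked = np.asarray(actual_defects, dtype=float)[order]
    k = max(1, int(np.ceil(fraction * len(ranked))))
    return float(ranked[:k].sum() / ranked.sum())
```

For example, a ranking that exactly matches the true defect counts scores higher on both measures than the reversed ranking.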

(2) When a RODP model is trained on a dataset with an imbalanced number of defects, learning-to-rank algorithms often fail to rank the defective software modules accurately. To alleviate this imbalance, this thesis proposes two RODP algorithms based on data resampling and ensemble learning, which introduce SmoteR (Synthetic Minority Oversampling Technique for Regression) and RusR (Random Under-Sampling for Regression) into the gradient boosting regression algorithm. In each boosting iteration, the two algorithms first employ SmoteR and RusR, respectively, to balance the distribution of the number of defects, and then use gradient boosting regression to update the model built on the balanced training dataset. Experimental results on 26 defect datasets show that the two algorithms alleviate the imbalance problem and achieve better performance than the baseline algorithms in terms of PofB@20%module and FPA.
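The core SmoteR idea is to synthesize extra examples in the rare (high-defect) region by interpolating both features and targets between rare examples. The following is a minimal sketch under simplifying assumptions: a fixed defect-count threshold defines "rare", and interpolation partners are drawn at random rather than from the k nearest neighbours used by the full technique.

```python
import numpy as np

def smoter(X, y, rare_threshold=2, n_synthetic=None, rng=None):
    """Minimal SmoteR sketch: modules with more than `rare_threshold`
    defects form the rare region; synthetic examples interpolate the
    features and the target between two randomly paired rare examples."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X, float), np.asarray(y, float)
    rare = np.where(y > rare_threshold)[0]
    if len(rare) < 2:
        return X, y                       # nothing to interpolate between
    n_synthetic = n_synthetic or len(rare)
    new_X, new_y = [], []
    for _ in range(n_synthetic):
        i, j = rng.choice(rare, size=2, replace=False)
        t = rng.random()                  # interpolation factor in [0, 1)
        new_X.append(X[i] + t * (X[j] - X[i]))
        new_y.append(y[i] + t * (y[j] - y[i]))
    return np.vstack([X, new_X]), np.concatenate([y, new_y])
```

Each synthetic target lies between the two parent defect counts, which is what shifts the training distribution toward the under-represented high-defect modules.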

(3) Because failing to find defects can severely degrade software quality, incorrectly ranking software modules with more defects incurs a much higher cost than incorrectly ranking modules with fewer defects. To address this cost problem, this thesis proposes a cost-sensitive Ranking SVM (CSRankSVM) based RODP algorithm. CSRankSVM modifies the loss function of the original Ranking SVM algorithm: it increases the penalty when a module with more defects is ranked incorrectly, so that such modules can be ranked correctly. To handle the data imbalance in the training dataset, CSRankSVM places more weight on the module pairs that occupy a small proportion of the data. Finally, the modified loss function of CSRankSVM is optimized using a genetic algorithm. Experimental results on 41 defect datasets show that CSRankSVM handles the cost of incorrectly ranking the modules with more defects and achieves better performance than the baseline algorithms in terms of PofB@20%module and FPA.
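The cost-sensitive modification can be illustrated with a weighted pairwise hinge loss. The sketch below captures the spirit of the approach rather than the thesis's exact formulation: `weight_fn` is an assumed stand-in for the actual cost weighting, here simply the larger defect count in the pair.

```python
import numpy as np

def cs_pairwise_hinge(w, X, y, weight_fn=lambda yi, yj: max(yi, yj)):
    """Cost-sensitive pairwise hinge loss sketch: for every module pair
    where module i has more defects than module j, penalise the linear
    model when score(i) does not exceed score(j) by a margin of 1, and
    scale the penalty so that mis-ranking defect-heavy modules costs more.
    `weight_fn` is a hypothetical weighting, not the thesis's exact one."""
    scores = X @ w
    loss = 0.0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                margin = scores[i] - scores[j]
                loss += weight_fn(y[i], y[j]) * max(0.0, 1.0 - margin)
    return loss
```

A search procedure such as a genetic algorithm can then minimize this loss over `w`, since the weighted hinge is simple to evaluate for any candidate weight vector.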

(4) Existing RODP algorithms ignore the size of the top-ranked software modules, yet modules with more LOC require more inspection effort. To solve this problem, this thesis proposes a Multi-Objective Search (MOS) based RODP algorithm. MOS first uses a linear model to build the RODP model, and then employs NSGA-II (Non-dominated Sorting Genetic Algorithm II) to generate a set of Pareto non-dominated solutions by simultaneously maximizing the PofB@20%module value and minimizing the PLI@20%module (Proportion of LOC Inspected when inspecting the top 20% of modules) value. Finally, MOS selects the non-dominated solution with the highest PofB@20%module value on the training dataset to construct the RODP model. Experimental results on 41 defect datasets show that, compared with the baseline methods, MOS finds more defects and inspects fewer LOC when inspecting the top 20% of software modules.
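The "Pareto non-dominated" notion underlying NSGA-II can be made concrete with a small filter over candidate solutions, here represented as assumed (PofB@20%, PLI@20%) pairs where the first objective is maximized and the second minimized:

```python
def pareto_front(solutions):
    """Return the non-dominated subset of (pofb, pli) pairs, i.e. the
    Pareto front NSGA-II converges toward. Solution a dominates b when a
    is no worse on both objectives (pofb higher-or-equal, pli lower-or-
    equal) and strictly better on at least one."""
    def dominates(a, b):
        return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]
```

The final selection step described above then reduces to picking the front member with the highest first objective, e.g. `max(front, key=lambda s: s[0])`.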

In conclusion, this thesis addresses difficult problems in the field of RODP and proposes four novel algorithms to improve the performance of RODP models, which is of great significance for improving software quality.

Research areas

  • defect prediction, learning to rank, data resampling, ensemble learning, cost-sensitive, multi-objective search