An Empirical Study of Learning to Rank Techniques for Effort-Aware Defect Prediction

Xiao Yu, Kwabena Ebo Bennin, Jin Liu*, Jacky Wai Keung, Xiaofei Yin, Zhou Xu

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

42 Citations (Scopus)

Abstract

Effort-Aware Defect Prediction (EADP) uses learning to rank algorithms to rank software modules by their likelihood of being defective, their predicted number of defects, or their defect density. Prior empirical studies compared only a few learning to rank algorithms on a small number of datasets, evaluated them with inappropriate performance measures or with only one type of measure, and relied on non-robust statistical tests. To address these concerns and investigate the impact of learning to rank algorithms on the performance of EADP models, we examine the practical effects of 23 learning to rank algorithms on 41 available defect datasets from the PROMISE repository using a module-based effort-aware performance measure (FPA) and a source lines of code (SLOC) based effort-aware performance measure (Norm(Popt)). In addition, we compare the performance of these algorithms when they are trained on a more relevant feature subset selected by the Information Gain feature selection method. Statistically significant differences are observed among these algorithms on both measures: BRR (Bayesian Ridge Regression) performs best in terms of FPA, and BRR and LTR (Learning-To-Rank) perform best in terms of Norm(Popt). When the algorithms are trained on the feature subset selected by Information Gain, LTR and BRR still perform best, with significant differences in terms of FPA and Norm(Popt). Therefore, we recommend BRR and LTR for building EADP models in order to find more defects when inspecting a given number of modules or lines of code.
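
As an illustration of the module-based measure named in the abstract, the following minimal sketch computes FPA (fault-percentile-average) for a ranking of modules. This record does not define FPA, so the sketch assumes the definition commonly used in the defect prediction literature: rank modules by predicted score, take the fraction of all actual defects contained in the top m modules for every m from 1 to K, and average those K fractions. The function name `fpa` and the sample data are hypothetical, and the scores could come from any ranking model (for example a regression model such as BRR); this is not the authors' implementation.

```python
from typing import Sequence


def fpa(predicted: Sequence[float], actual_defects: Sequence[int]) -> float:
    """Fault-percentile-average of a ranking, under the assumed definition.

    Modules are sorted by predicted score, highest first. For each cutoff m
    (top 1 module, top 2 modules, ..., all K modules) we take the fraction of
    all actual defects found in the top m, then average over the K cutoffs.
    """
    if len(predicted) != len(actual_defects):
        raise ValueError("predicted and actual_defects must have equal length")
    total = sum(actual_defects)
    if not predicted or total == 0:
        raise ValueError("need at least one module and at least one defect")

    # Rank module indices from most to least suspicious (ties keep input order).
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)

    found = 0                   # defects seen so far while walking down the ranking
    cumulative_fractions = 0.0  # running sum of the per-cutoff defect fractions
    for i in order:
        found += actual_defects[i]
        cumulative_fractions += found / total
    return cumulative_fractions / len(predicted)


# Hypothetical data: four modules with made-up scores and defect counts.
defects = [3, 0, 1, 0]
print(fpa([0.9, 0.1, 0.5, 0.3], defects))  # ranking matches defect counts: 0.9375
print(fpa([0.1, 0.9, 0.3, 0.5], defects))  # ranking is reversed: 0.3125
```

A higher value means that more of the defects sit near the top of the ranking, which is the sense in which a better EADP model finds more defects for a given number of inspected modules.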
Original language: English
Title of host publication: SANER '19 - Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering
Editors: Xinyu Wang, David Lo, Emad Shihab
Publisher: IEEE
Pages: 298-309
ISBN (Electronic): 9781728105918
ISBN (Print): 9781728105925
Publication status: Published - Feb 2019
Event: 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2019) - Zhejiang University, Hangzhou, China
Duration: 24 Feb 2019 to 27 Feb 2019
Conference number: 26th

Publication series

Name: Proceedings of the ... IEEE International Conference on Software Analysis, Evolution, and Reengineering
ISSN (Print): 1534-5351

Conference

Conference: 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2019)
Abbreviated title: SANER 2019
Place: China
City: Hangzhou
Period: 24/02/19 to 27/02/19

Research Keywords

  • effort-aware defect prediction
  • empirical study
  • learning to rank
  • Scott-Knott ESD test
