Using Class Imbalance Learning for Cross-Company Defect Prediction

Xiao Yu, Mingsong Zhou, Xu Chen*, Lijun Deng, Lu Wang

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

7 Citations (Scopus)

Abstract

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, the performance of such CCDP models is susceptible to the highly imbalanced distribution of defect-prone and non-defect-prone classes in cross-company (CC) data. Class imbalance learning is applied to alleviate this issue. Because many class imbalance learning methods have been proposed, there is a pressing need to analyze and compare the performance of these methods for CCDP. Prior empirical studies have shown that the AdaBoost.NC algorithm achieves the best performance for defect prediction. This observation leads us to conduct a careful empirical study of whether and how class imbalance learning methods can benefit cross-company defect prediction. We investigate the effect of different types of class imbalance learning methods, including under-sampling, over-sampling, and over-sampling followed by under-sampling, on cross-company defect prediction performance over 15 publicly available datasets. Experimental results show that under-sampling achieves the best overall performance in terms of g-measure among the methods we studied.
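As a rough illustration of two ideas named in the abstract, the sketch below shows random under-sampling of the majority class and a g-measure computation. It is not the authors' implementation. The function names `random_under_sample` and `g_measure` are hypothetical, and the abstract does not define the g-measure; the code assumes the definition commonly used in defect prediction work, the harmonic mean of the probability of detection (pd) and one minus the probability of false alarm (pf).

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper: labels are assumed to be 1 (defect-prone) or 0.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def g_measure(y_true, y_pred):
    """Harmonic mean of pd and (1 - pf); an assumed definition, see lead-in."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pd = tp / (tp + fn) if (tp + fn) else 0.0  # probability of detection
    pf = fp / (fp + tn) if (fp + tn) else 0.0  # probability of false alarm
    denom = pd + (1.0 - pf)
    return 2.0 * pd * (1.0 - pf) / denom if denom else 0.0
```

In a cross-company setting, the under-sampling step would be applied only to the source-company training data, and the g-measure would be computed on predictions for the target company.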
Original language: English
Title of host publication: SEKE 2017
Subtitle of host publication: Proceedings of the 29th International Conference on Software Engineering & Knowledge Engineering
Publisher: KSI Research Inc. and Knowledge Systems Institute
Pages: 117-122
ISBN (Print): 1891706411
Publication status: Published - 5 Jul 2017
Event: 29th International Conference on Software Engineering and Knowledge Engineering, SEKE 2017 - Wyndham Pittsburgh University Center, Pittsburgh, United States
Duration: 5 Jul 2017 to 7 Jul 2017
http://ksiresearchorg.ipage.com/seke/seke17.html
http://ksiresearchorg.ipage.com/seke/Proceedings/seke/SEKE2017_Proceedings.pdf

Publication series

ISSN (Print): 2325-9000
ISSN (Electronic): 2325-9086

Conference

Conference: 29th International Conference on Software Engineering and Knowledge Engineering, SEKE 2017
Country/Territory: United States
City: Pittsburgh
Period: 5/07/17 to 7/07/17

Research Keywords

  • Class imbalance learning
  • Cross-company defect prediction
  • Software defect prediction
