TSTSS: A two-stage training subset selection framework for cross version defect prediction
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 59-78 |
Journal / Publication | Journal of Systems and Software |
Volume | 154 |
Online published | 23 Mar 2019 |
Publication status | Published - Aug 2019 |
Link(s)
DOI | DOI |
---|---|
Document Link | |
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85064625005&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(1f3e2e3b-6093-44d5-871d-490069df2361).html |
Abstract
Cross Version Defect Prediction (CVDP) is a practical scenario in which a classification model is trained on the historical data of the prior version and then used to predict the defect labels of modules in the current version. Unfortunately, differences in data distribution across versions may hinder the effectiveness of the trained CVDP model. Thus, selecting a suitable training subset from the prior version to improve CVDP performance is not trivial. In this paper, we propose a novel method, called Two-Stage Training Subset Selection (TSTSS), to address this challenging issue. In the first stage, TSTSS uses a sparse modeling representative selection method to select an initial module subset from the prior version that can reconstruct the data of the prior version well. In the second stage, TSTSS leverages a dissimilarity-based sparse subset selection method to further refine the selected module subset, so that the selected modules represent the modules of the current version well. Finally, we use a novel weighted extreme learning machine classifier to construct the CVDP model. We evaluate the CVDP performance of TSTSS on 50 cross-version pairs using 6 indicators. The experiments show that TSTSS can efficiently improve the CVDP performance compared with 11 baseline methods.
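For readers unfamiliar with the final classification step, the following is a minimal sketch of a generic weighted extreme learning machine, assuming the commonly used closed-form solution with inverse-class-frequency sample weights. It is not the paper's novel classifier, and it does not cover the two subset selection stages. The function names `train_weighted_elm` and `predict_weighted_elm`, the sigmoid activation, and the parameters `n_hidden`, `C`, and `seed` are illustrative assumptions that do not appear in the abstract.

```python
import numpy as np


def _hidden(X, W_in, b):
    # Random-feature hidden layer with a sigmoid activation (an assumed choice).
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))


def train_weighted_elm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a generic weighted ELM for binary integer labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))  # fixed random input weights
    b = rng.normal(size=n_hidden)                   # fixed random biases
    H = _hidden(X, W_in, b)

    # Per-sample weight = 1 / (size of the sample's class), so the rare
    # (defective) class contributes as much total weight as the common one.
    counts = np.bincount(y, minlength=2)
    w = 1.0 / counts[y]

    # Targets in {-1, +1}; output weights solve the weighted ridge system
    # (H^T W H + I / C) beta = H^T W t.
    t = np.where(y == 1, 1.0, -1.0)
    HtW = H.T * w
    beta = np.linalg.solve(HtW @ H + np.eye(n_hidden) / C, HtW @ t)
    return W_in, b, beta


def predict_weighted_elm(model, X):
    """Return predicted labels in {0, 1} for the rows of X."""
    W_in, b, beta = model
    return (_hidden(X, W_in, b) @ beta > 0).astype(int)
```

In a cross-version setting, `X` and `y` would come from the selected training subset of the prior version and `predict_weighted_elm` would be applied to the modules of the current version. The class-balanced weighting is one common way to handle the scarcity of defective modules, and other weighting schemes are possible.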
Research Area(s)
- Cross version defect prediction, Sparse modeling, Training subset selection, Weighted extreme learning machine
Citation Format(s)
TSTSS: A two-stage training subset selection framework for cross version defect prediction. / Xu, Zhou; Li, Shuai; Luo, Xiapu et al.
In: Journal of Systems and Software, Vol. 154, 08.2019, p. 59-78.