Feature selection and embedding based cross project framework for identifying crashing fault residence

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

9 Scopus Citations

Author(s)

  • Zhou Xu
  • Tao Zhang
  • Meng Yan
  • Xiapu Luo
  • Xiaohong Zhang
  • Ling Xu
  • Yutian Tang

Detail(s)

Original language: English
Article number: 106452
Journal / Publication: Information and Software Technology
Volume: 131
Online published: 15 Oct 2020
Publication status: Published - Mar 2021

Abstract

Context: Automatically produced crash reports can be used to analyze the root cause of the fault behind a crash (crashing fault for short), which is a critical activity for software quality assurance.
Objective: Correctly predicting whether the crashing fault resides in the stack trace of a crash report can speed up program debugging and optimize debugging effort. Existing work relies on label information collected from bug-fixing logs, and on features of crash instances extracted from stack traces and source code, for Identification of Crashing Fault Residence (ICFR) of newly submitted crashes. This work develops a novel cross project ICFR framework that addresses the data scarcity problem by using labeled crash data from other projects for the ICFR task of the project at hand. The framework removes irrelevant features, reduces distribution differences, and eases the class imbalance issue in cross project data, since these factors may negatively impact ICFR performance.
Method: The proposed framework, called FSE, combines Feature Selection and feature Embedding techniques. FSE first uses an information gain ratio based feature ranking method to select a relevant feature subset for the cross project data, and then employs a state-of-the-art Weighted Balanced Distribution Adaptation (WBDA) method to map the features of the cross project data into a common space. WBDA considers both the marginal and conditional distributions, as well as their weights, to reduce data distribution discrepancies. In addition, WBDA balances the class proportions of each project's data to alleviate the class imbalance issue.
Results: We conduct experiments on 7 projects to evaluate the performance of our FSE framework. The results show that FSE outperforms 25 methods under comparison.
Conclusion: This work proposes a cross project learning framework for ICFR, which uses feature selection and embedding to remove irrelevant features and reduce distribution differences, respectively. The results illustrate the performance superiority of our FSE framework.
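
The feature selection step named in the Method paragraph can be illustrated with a minimal sketch of information gain ratio based ranking. This is not the authors' implementation: the function names, the toy feature names, and the assumption that features are already discrete (continuous features would first need binning) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain ratio of one discrete feature with respect to the labels."""
    total = len(labels)
    # Group labels by feature value to compute the conditional entropy H(Y | X).
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # intrinsic information of the feature itself
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels):
    """Return feature names sorted by descending gain ratio."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: label 1 means the crashing fault resides in the stack trace.
labels = [1, 1, 0, 0]
columns = {
    "noisy": [0, 1, 0, 1],       # unrelated to the label
    "partial": [1, 1, 1, 0],     # partly informative
    "predictive": [1, 1, 0, 0],  # perfectly aligned with the label
}
ranking = rank_features(columns, labels)
```

A framework of this kind would keep only the top-ranked features before the embedding step; how many to keep is a tuning decision that the abstract does not specify.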

Research Area(s)

  • Crashing fault, Cross project framework, Feature embedding, Feature selection, Stack trace