Accuracy Analysis of Spectrum-based Fault Localization - Relations and Ensembles

基於頻譜的缺陷定位方法之精確度分析 ─ 關係與組合

Student thesis: Doctoral Thesis

View graph of relations


Related Research Unit(s)


Awarding Institution
Award date26 May 2017



Spectrum-based fault localization (SBFL) is a class of program debugging techniques which aim to estimate the fault-proneness of program entities responsible to the observed failures of executing a program over a test suite. The fault-proneness estimation is computed by the SBFL technique’s underlying formula based on the execution coverage profile achieved by the test suite on individual program entities. Comparison of the fault-proneness estimation accuracy of a small set of SBFL formulas has been successfully studied in a theoretical context to form a hierarchy of these SBFL formulas. Nonetheless, existing theoretical results rest on several assumptions that can only be validated after the program faults have been located. Also, no study has been done to apply SBFL to the scenario of debugging a forthcoming program version while making use of the execution profiles generated by live test data collected from the real usage of a currently deployed version of the same program. We address these important problems in this thesis.

The first contribution of the thesis is our proposal of a first technique that can empirically construct a graph representing the accuracy ordering relations among an arbitrary set of SBFL formulas without making assumptions of program faults and their locations. The technique systematically accounts for the effect of variations in fault-proneness estimation through both statistical assessment at the program level and consistency analysis among different programs. It can replicate the theoretically-derived accuracy hierarchy of SBFL formulas under the same setting and assumptions, and furthermore it has been extended to produce accuracy graphs of an expanded set of SBFL formulas as well as analyzing and contrasting these graphs under a variety of settings. We also show that collecting execution coverage profile at the instruction level for fault-proneness estimation could be more effective.

The second contribution of the thesis is a novel technique that can dynamically synthesize an adaptive ensemble SBFL formula based on a set of solo SBFL formulas and the execution coverage of a pair of successive code versions of the same program. This technique is novel in that the ensemble formula is adaptive to the fault-proneness estimated by the underlying solo formulas and is generated at strategic time points adaptive to the cumulative coverage profile pair at runtime. We show by experiments that some synthesized formulas are statistically more accurate than their respective underlying solo formulas and conducted a comprehensive analysis of the factors that affect the accuracy of the synthesized formulas.

In summary, this thesis reports our contributions of proposing 1) a first technique in accuracy relation graph construction and analysis through experimentation, and 2) a novel technique that synthesizes an ensemble SBFL formula responsive to adaptive execution profiles at runtime.