Bug Localization with Semantic and Structural Features using Convolutional Neural Network and Cascade Forest

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

25 Scopus Citations

Detail(s)

Original language: English
Title of host publication: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018, EASE 2018
Publisher: Association for Computing Machinery
ISBN (print): 9781450364034
Publication status: Published - Jun 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Title: 22nd Evaluation and Assessment in Software Engineering Conference (EASE 2018)
Location: University of Canterbury
Place: New Zealand
City: Christchurch
Period: 28 - 29 June 2018

Abstract

Background: Correctly localizing the buggy files for a bug report is a crucial task, and exploiting the semantic and structural information in bug reports and source files should substantially improve the accuracy of bug localization techniques. Aims: To empirically evaluate and demonstrate the effect of both semantic and structural information in bug reports and source files on bug localization performance, we propose CNN_Forest, which combines a convolutional neural network with an ensemble of random forests, techniques that have shown excellent performance in semantic parsing and structural information extraction. Method: We first employ a convolutional neural network with multiple filters and an ensemble of random forests with multi-grained scanning to extract semantic and structural features from the word vectors derived from bug reports and source files. A subsequent cascade forest (a cascade of ensembles of random forests) is then used to extract deeper features and to observe the correlations between bug reports and source files. CNN_Forest is empirically evaluated on 10,754 bug reports extracted from the AspectJ, Eclipse UI, JDT, SWT, and Tomcat projects. Results: The experiments empirically demonstrate the significance of including semantic and structural information in bug localization, and further show that the proposed CNN_Forest achieves higher Mean Average Precision and Mean Reciprocal Rank than the best results of four current state-of-the-art approaches (NP-CNN, LR+WE, DNNLOC, and BugLocator). Conclusion: CNN_Forest is capable of capturing the correlations between bug reports and source files, and we empirically show that semantic and structural information in bug reports and source files is crucial to improving bug localization.
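
The two measures named in the Results, Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR), can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy file names, and the choice to normalize average precision by the number of truly buggy files are assumptions made for illustration. Each bug report is represented as a ranked list of candidate files plus the set of files actually fixed for that report.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# One query = (ranked candidate files, set of truly buggy files).
# This representation is an assumption for illustration only.
Query = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1 / rank of the first buggy file, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average the precision values taken at each rank holding a buggy file.

    Assumption: the sum is divided by the total number of buggy files,
    so buggy files missing from the ranking lower the score.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """Mean of the reciprocal ranks over all bug reports."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_average_precision(queries: List[Query]) -> float:
    """Mean of the average precisions over all bug reports."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Toy data with hypothetical file names, not results from the paper.
toy_queries: List[Query] = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
]

if __name__ == "__main__":
    print("MRR:", mean_reciprocal_rank(toy_queries))
    print("MAP:", mean_average_precision(toy_queries))
```

In the toy data, the first report has its only buggy file at rank 2 and the second has buggy files at ranks 1 and 3, so MRR is (1/2 + 1) / 2 = 0.75 and MAP is (1/2 + (1 + 2/3)/2) / 2, roughly 0.667. Higher values on either measure mean that buggy files appear nearer the top of the ranking.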

Research Area(s)

  • Bug localization, Cascade forest, Convolutional neural network, Semantic information, Structural information, Word embedding

Citation Format(s)

Bug Localization with Semantic and Structural Features using Convolutional Neural Network and Cascade Forest. / Xiao, Yan; Keung, Jacky; Mi, Qing et al.
Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018, EASE 2018. Association for Computing Machinery, 2018. (ACM International Conference Proceeding Series).
