Improving bug localization with word embedding and enhanced convolutional neural networks

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

81 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 17-29
Journal / Publication: Information and Software Technology
Volume: 105
Online published: 15 Aug 2018
Publication status: Published - Jan 2019

Abstract

Context: Automatic localization of buggy files can speed up the process of bug fixing to improve the efficiency and productivity of software quality assurance teams. Useful semantic information is available in bug reports and source code, but it is usually underutilized by existing bug localization approaches. 
Objective: To improve the performance of bug localization, we propose DeepLoc, a novel deep learning-based model that makes full use of semantic information.
Method: DeepLoc is composed of an enhanced convolutional neural network (CNN) that considers bug-fixing recency and frequency, together with word-embedding and feature-detecting techniques. DeepLoc uses word embeddings to represent the words in bug reports and source files in a way that retains their semantic information, and uses different CNNs to detect features from them. DeepLoc is evaluated on over 18,500 bug reports extracted from the AspectJ, Eclipse, JDT, SWT, and Tomcat projects.
Results: The experimental results show that DeepLoc achieves 10.87%–13.4% higher MAP (mean average precision) than a conventional CNN. DeepLoc outperforms four current state-of-the-art approaches (DeepLocator, HyLoc, LR+WE, and BugLocator) in terms of Accuracy@k (the percentage of bug reports for which at least one real buggy file is located within the top k ranks), MAP, and MRR (mean reciprocal rank), while using less computation time.
Conclusion: DeepLoc is capable of automatically connecting bug reports to the corresponding buggy files and achieves better performance than four state-of-the-art approaches based on a deep understanding of semantics in bug reports and source code.
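
The three evaluation metrics named in the Results (Accuracy@k, MAP, and MRR) can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the toy file names are hypothetical, and it assumes that each bug report comes with a full ranking of candidate files and a non-empty set of truly buggy files, so that average precision is normalized by the number of buggy files.

```python
from typing import List, Sequence, Set


def accuracy_at_k(rankings: Sequence[List[str]], truths: Sequence[Set[str]], k: int) -> float:
    """Fraction of bug reports with at least one buggy file in the top k ranks."""
    hits = sum(
        1 for ranked, buggy in zip(rankings, truths)
        if any(f in buggy for f in ranked[:k])
    )
    return hits / len(rankings)


def reciprocal_rank(ranked: List[str], buggy: Set[str]) -> float:
    """1 / rank of the first buggy file, or 0.0 if none is ranked."""
    for rank, f in enumerate(ranked, start=1):
        if f in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: List[str], buggy: Set[str]) -> float:
    """Mean of precision@rank over the ranks at which buggy files appear.

    Assumption: normalized by the number of buggy files (len(buggy)).
    """
    hits = 0
    total = 0.0
    for rank, f in enumerate(ranked, start=1):
        if f in buggy:
            hits += 1
            total += hits / rank
    return total / len(buggy) if buggy else 0.0


def mean_reciprocal_rank(rankings: Sequence[List[str]], truths: Sequence[Set[str]]) -> float:
    return sum(reciprocal_rank(r, b) for r, b in zip(rankings, truths)) / len(rankings)


def mean_average_precision(rankings: Sequence[List[str]], truths: Sequence[Set[str]]) -> float:
    return sum(average_precision(r, b) for r, b in zip(rankings, truths)) / len(rankings)


# Toy data (hypothetical): two bug reports, each with a ranked list of files.
rankings = [["A.java", "B.java", "C.java"], ["B.java", "C.java", "A.java"]]
truths = [{"B.java"}, {"A.java", "B.java"}]

print(accuracy_at_k(rankings, truths, k=1))      # 0.5
print(mean_reciprocal_rank(rankings, truths))    # 0.75
print(mean_average_precision(rankings, truths))  # about 0.667
```

In the toy data, the first report's only buggy file sits at rank 2 (reciprocal rank 0.5, average precision 0.5), while the second report has buggy files at ranks 1 and 3 (reciprocal rank 1.0, average precision (1 + 2/3) / 2).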

Research Area(s)

  • Bug localization, Convolutional neural network, Deep learning, Semantic information, TF-IDF, Word embedding