Research on Methods in Locating Software Features and Bugs
軟件特徵與缺陷的定位方法研究
Student thesis: Doctoral Thesis
Detail(s)
Award date | 23 Jul 2018
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(b66b9567-b0b6-4a2e-8125-d44734e64278).html
Abstract
Software products are now used in a wide range of fields. During software development and maintenance, features described in requirements documents and bugs described in bug reports represent, respectively, the wanted and unwanted functionality of a software project. To implement a feature or fix a bug, developers must first find the relevant locations in the source code. This is especially hard for developers who are new to a project, and automatic localization approaches can help them quickly find starting points for features and bugs in unfamiliar code. These approaches are generally referred to as feature location and bug localization. One of the main techniques used for both is based on Information Retrieval (IR): it computes the textual similarity between each source code entity and the feature description (a bug report, in the case of a bug), and produces a list of source code entities ranked by that similarity. Developers then examine the ranked list from the top entity downward until they find the target entity relevant to the feature or bug. Because feature location and bug localization share the same basic framework in IR-based approaches, bug localization can be regarded as a special case of feature location, with bugs treated as unwanted features.
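The IR-based pipeline described above can be sketched with a plain Vector Space Model: TF-IDF weights and cosine similarity between a query (the feature description or bug report) and each source entity. This is a minimal illustration under stated assumptions, not the thesis's implementation; the camelCase-splitting tokenizer, the `+1` smoothed idf, and the file names and code snippets in the example are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    # Split camelCase identifiers (checkPassword -> check Password), then keep
    # lowercase alphanumeric runs. Assumed preprocessing; real pipelines usually
    # also stem words and drop stop words and language keywords.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tf: Counter, idf: Dict[str, float]) -> Vector:
    # TF-IDF weight; terms never seen in the corpus get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def build_index(entities: Dict[str, str]) -> Tuple[Dict[str, Vector], Dict[str, float]]:
    tokens = {name: tokenize(text) for name, text in entities.items()}
    n = len(tokens)
    df = Counter(t for toks in tokens.values() for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed idf (assumption)
    vectors = {name: weigh(Counter(toks), idf) for name, toks in tokens.items()}
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, vectors: Dict[str, Vector], idf: Dict[str, float]) -> List[str]:
    # Highest similarity first; ties broken by entity name for determinism.
    q = weigh(Counter(tokenize(query)), idf)
    scores = {name: cosine(q, v) for name, v in vectors.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical three-file project and bug report.
entities = {
    "LoginController.java": "class LoginController { void checkPassword(String password) {} }",
    "ReportExporter.java": "class ReportExporter { void exportPdf(Report report) {} }",
    "UserRepository.java": "class UserRepository { User findUserByName(String name) {} }",
}
vectors, idf = build_index(entities)
ranking = rank("Login fails when the password contains spaces", vectors, idf)
print(ranking)  # -> ['LoginController.java', 'ReportExporter.java', 'UserRepository.java']
```

The developer would then inspect `ranking` from the top; only `LoginController.java` shares vocabulary (`login`, `password`) with the report, so it comes first.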
Within this IR-based framework, several problems limit location performance: (1) Recent automatic location methods do not take conventional methods into account, and little research has addressed the bugs on which automatic methods perform poorly. (2) Existing hybrid bug localization methods pay little attention to how attributes are combined and how the combination method affects localization performance. (3) Little research has examined the BM25 and BM25F algorithms in IR-based feature location.
These three problems arise in three different parts of the overall location process. To improve the performance of feature location and bug localization, this thesis introduces three approaches that address them:
1) This thesis proposes a bug localization strategy that switches from automatic to conventional localization, based on predictions of how well each approach will perform on the current software project. Although some IR-based location techniques perform very well on some bugs, other bugs remain hard to localize automatically, and for these bugs conventional localization approaches may be more suitable. The proposed strategy takes both automatic and conventional bug localization into account and optimizes the point at which developers switch from the automatic method to the conventional one. It improves localization performance for bugs that are hard to locate with automatic methods, while having little effect on bugs that automatic methods already localize well.
2) For hybrid bug localization, this thesis compares eight Learning to Rank (LtR) techniques for combining beneficial attributes of six different types, in order to find a suitable integration technique. In recent decades, many additional attributes have been found to benefit bug localization: attributes derived from version history, stack traces, source code structure and other sources have been incorporated into localization approaches to locate buggy source entities more precisely. However, most recent hybrid methods integrate these attributes with a linear combination or use only a single LtR method, and little research has examined how the choice of combination method affects bug localization performance. This thesis evaluates the eight LtR-based bug localization methods and finds that the coordinate ascent algorithm performs best on the selected attributes and data.
3) This thesis investigates BM25- and BM25F-based feature location and shows that it outperforms three conventional IR methods. BM25 and BM25F are popular information retrieval ranking algorithms whose performance depends heavily on their parameter settings. This thesis compares the feature location performance of BM25 and BM25F with that of three IR models, namely the Vector Space Model (VSM), the Unigram Model (UM) and Latent Dirichlet Allocation (LDA), under a range of parameter values for each. When applying BM25F, the source code text is divided into two fields according to whether the source entity is called. The results show that BM25 and BM25F are more effective than the three basic IR models for feature location.
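The switching strategy in item 1) hinges on deciding when to stop walking down the automatic ranking. The sketch below makes that trade-off concrete under a deliberately simple, hypothetical cost model that is not the thesis's prediction model: inspecting one ranked entity costs 1, switching to a conventional search costs a fixed `conventional_cost`, and the switch point `k` is chosen to minimise the average effort over historical bugs whose true ranks are known. The function names, the cost model and the data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def effort(target_rank: int, k: int, conventional_cost: float) -> float:
    # Hypothetical model: inspect the top-k entities of the automatic ranking;
    # if the buggy entity is among them, the effort is its rank, otherwise
    # k wasted inspections plus the cost of a conventional search.
    if target_rank <= k:
        return float(target_rank)
    return k + conventional_cost


def best_switch_point(ranks: Sequence[int], conventional_cost: float,
                      max_k: int) -> Tuple[int, float]:
    # Exhaustively try every switch point 0..max_k (0 = go conventional at once)
    # and return the one with the lowest mean effort over the historical bugs.
    best_k, best_total = 0, float("inf")
    for k in range(max_k + 1):
        total = sum(effort(r, k, conventional_cost) for r in ranks)
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total / len(ranks)


# Ranks of the buggy file for six past bugs (illustrative data only).
history = [1, 2, 1, 3, 40, 120]
k, mean_effort = best_switch_point(history, conventional_cost=20.0, max_k=50)
print(k, round(mean_effort, 2))  # -> 3 8.83
```

Under this toy model the four well-ranked bugs (ranks 1 to 3) are found before the switch, while the two poorly ranked bugs are handed to the conventional search after only three inspections instead of forty or more.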
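Item 2) reports that coordinate ascent performed best among the eight LtR techniques. As a rough sketch of what that algorithm does (a simplified version written for illustration, not the configuration used in the thesis), it keeps a linear scoring function over the attribute scores and nudges one weight at a time, accepting a change only if a ranking metric, here Mean Average Precision (MAP), improves on the training bug reports. The two attributes in the example (a textual-similarity score and a version-history score), the step sizes and the toy data are hypothetical; practical implementations add weight normalisation, random restarts and more careful tie handling.

```python
from typing import List, Sequence, Tuple

# One training query (bug report): candidate files as (attribute scores, is_buggy).
Query = List[Tuple[List[float], bool]]


def average_precision(scored: List[Tuple[float, bool]]) -> float:
    # Rank by descending score; Python's stable sort keeps input order on ties.
    ranked = sorted(scored, key=lambda item: -item[0])
    hits, total = 0, 0.0
    for position, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def mean_ap(weights: Sequence[float], queries: List[Query]) -> float:
    aps = []
    for query in queries:
        scored = [(sum(w * f for w, f in zip(weights, feats)), rel) for feats, rel in query]
        aps.append(average_precision(scored))
    return sum(aps) / len(aps)


def coordinate_ascent(queries: List[Query], n_features: int,
                      steps: Sequence[float] = (-1.0, -0.5, -0.1, 0.1, 0.5, 1.0),
                      rounds: int = 10) -> Tuple[List[float], float]:
    weights = [1.0 / n_features] * n_features  # start from a uniform linear combination
    best = mean_ap(weights, queries)
    for _ in range(rounds):
        improved = False
        for j in range(n_features):           # optimise one coordinate at a time
            for step in steps:
                candidate = list(weights)
                candidate[j] += step
                score = mean_ap(candidate, queries)
                if score > best:              # greedy: keep only strict improvements
                    weights, best, improved = candidate, score, True
        if not improved:                      # converged: no single step helps
            break
    return weights, best


# Toy data: attribute 0 = text similarity, attribute 1 = version-history score.
train = [
    [([0.9, 0.0], False), ([0.5, 0.3], True)],
    [([0.8, 0.1], False), ([0.4, 0.4], True)],
]
print(mean_ap([0.5, 0.5], train))  # -> 0.5 (uniform weights rank both buggy files second)
weights, best = coordinate_ascent(train, n_features=2)
print(weights, best)  # -> [-0.5, 0.5] 1.0
```

On this tiny example the search ends with a negative weight on text similarity; that is an artefact of the two-query toy data and the unnormalised steps, not a finding about real attributes.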
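For item 3), the standard BM25 formula and a simple BM25F variant can be written out directly. This is a minimal sketch, not the thesis's code: it uses a common non-negative idf, ln(1 + (N - df + 0.5) / (df + 0.5)), the frequently used defaults k1 = 1.2 and b = 0.75 (the thesis tunes these parameters), and computes BM25F by summing field-weighted, per-field length-normalised term frequencies before a single saturation step. The field names `own` and `called` are hypothetical stand-ins for the two fields the thesis derives from whether a source entity is called; how tokens are assigned to them here is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Doc = List[str]


def idf(term: str, docs: Sequence[Doc]) -> float:
    # Non-negative BM25 idf variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
    n = len(docs)
    df = sum(1 for doc in docs if term in doc)
    return math.log(1.0 + (n - df + 0.5) / (df + 0.5))


def bm25(query: Doc, docs: Sequence[Doc], k1: float = 1.2, b: float = 0.75) -> List[float]:
    avgdl = sum(len(d) for d in docs) / len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in set(query):
            if tf[term]:
                # Term-frequency saturation (k1) with document-length normalisation (b).
                denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
                score += idf(term, docs) * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def bm25f(query: Doc, docs: Sequence[Dict[str, Doc]], weights: Dict[str, float],
          bs: Dict[str, float], k1: float = 1.2) -> List[float]:
    fields = list(weights)
    avglen = {f: sum(len(d[f]) for d in docs) / len(docs) for f in fields}
    merged = [[t for f in fields for t in d[f]] for d in docs]  # idf over whole entities
    scores = []
    for doc in docs:
        score = 0.0
        for term in set(query):
            # Field-weighted, per-field length-normalised pseudo term frequency.
            pseudo_tf = 0.0
            for f in fields:
                tf = doc[f].count(term)
                if tf:
                    pseudo_tf += weights[f] * tf / (1 - bs[f] + bs[f] * len(doc[f]) / avglen[f])
            if pseudo_tf:
                score += idf(term, merged) * pseudo_tf * (k1 + 1) / (k1 + pseudo_tf)
        scores.append(score)
    return scores


docs = [["login", "check", "password"], ["report", "export", "pdf"], ["user", "find", "name"]]
print(bm25(["password", "login"], docs))

# Two hypothetical fields per entity (see the lead-in above).
entities = [
    {"own": ["login", "check"], "called": ["password", "hash"]},
    {"own": ["password", "reset"], "called": ["mail", "send"]},
]
print(bm25f(["password"], entities, weights={"own": 1.0, "called": 0.5},
            bs={"own": 0.75, "called": 0.75}))
```

With a single field of weight 1, `bm25f` reduces exactly to `bm25`, which is a useful sanity check. In the two-field example, down-weighting the `called` field ranks the entity whose own text mentions `password` above the one where the term appears only in the `called` field.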
In conclusion, this thesis contributes three approaches that improve the performance of IR-based feature location and bug localization from three distinct aspects. They can help software developers quickly locate the source code entities that need to be modified when they receive functional requirements or bug reports, thereby improving the efficiency of software development and reducing development and maintenance costs.