Analyzing the Open Source Software Quality Using the Debugging Activities


Student thesis: Doctoral Thesis




Awarding Institution
Award date: 9 Aug 2019


System quality cannot be achieved without software quality, a realization that has been the focus of researchers for the past four decades. Open source software (OSS) development has become popular in the software industry. The presence of bugs in software is critical to its quality. The biggest concern for open source software is reliability, because of its inverse relationship to the number of software bugs. This thesis provides a broad, systematic view of past software reliability efforts and their research trends. It is observed that the major portion of the software life cycle cost is spent on detecting and removing bugs. In addition, some bugs penetrate even after a particular release has been deployed. The count and frequency of these penetrating bugs can affect the efficiency of debugging. This thesis addresses several such problems in order to evaluate the effect of delayed and prolonged bugs in single and multiple releases of open source software. The purpose is to study the characteristics of these bugs so that their long-lasting impact on software release time and cost can be evaluated.

A cost- and time-effective solution for software reliability analysis is becoming the biggest concern for open source software development. OSS development is bug driven, and its life cycle cost is mainly incurred in the bug correction process. Bug reports, with their various features, play an important role in improving the quality of software products. Unfortunately, little attention is paid to extracting important attributes from bug logs. Open source software bug repositories provide abundant data that can be drawn on to assess the reliability of the deployed software product. The major problem of filtering and extracting relevant information from these bug logs still exists. Investigating the temporal aspect of debugging processes can help make open source software debugging more effective. The first part of the thesis focuses on the extraction of relevant data for efficient debugging. It uses the comment feature of bug repositories to extract bug judgment and correction times for building OSS reliability models. To this end, it presents a bug report life cycle model and proposes a method to compute these two time points.
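As a minimal sketch of how two such time points might be derived from comment time stamps, the Python example below makes several assumptions that are not taken from the thesis: the judgment point is taken to be the first comment by a core contributor, the correction point is the resolution time stamp, and the `Comment` record and its `author_is_core` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Comment:
    """Hypothetical record for one comment on a bug report."""
    author_is_core: bool   # assumed flag: written by a core contributor
    posted_at: datetime    # comment time stamp from the bug log


def judgment_and_correction_times(
    reported_at: datetime,
    resolved_at: datetime,
    comments: List[Comment],
) -> Optional[Tuple[timedelta, timedelta]]:
    """Return (judgment time, correction time) for one bug report.

    Assumption: the report counts as "judged" at the first core-contributor
    comment posted between submission and resolution. Returns None when no
    such comment exists.
    """
    core_times = sorted(
        c.posted_at
        for c in comments
        if c.author_is_core and reported_at <= c.posted_at <= resolved_at
    )
    if not core_times:
        return None
    judged_at = core_times[0]
    # Judgment time: submission to judgment. Correction time: judgment to fix.
    return judged_at - reported_at, resolved_at - judged_at
```

Under these assumptions, a report submitted on day 0, first answered by a core contributor on day 2, and resolved on day 9 yields a judgment time of two days and a correction time of seven days. A different definition of the judgment point would only change how `judged_at` is selected.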

The low quality of the data is another challenge related to bug repositories, and it hinders the effective use of bug logs for improving software quality. The second part of the thesis deals with the classification of bug reports submitted to the bug tracking systems of open source software. This part classifies problem reports into fourteen categories on the basis of key attributes found in the bug logs, namely bug resolution type, comment text, and comment time stamps. The methods for extracting the key attributes are then introduced. In addition, this part is evaluated using an extended mathematical formulation of problem report judging time and bug fixing time, defined as a function of the delay in responses by core and normal contributors to the open source software development.

Over the years, quality has been studied through software reliability growth models with various assumptions. However, modeling reliability under optimized assumptions of debugging reveals unrealistic patterns in practice. Researchers have worked extensively on various aspects of bug reports to predict, prevent, and categorize software bugs. Unfortunately, the survival aspect of software bugs is rarely considered in relation to bug removal efficiency. Surviving bugs are far more crucial for software reliability than bugs detected in time. Mining software repositories exposes bugs that survive and penetrate successive releases even after a new version of the release is deployed. This may contribute to deficiencies in the defect detection and correction process. The third part of the thesis introduces the bugs that survive the debugging process. It analyzes the debugging process to interpret bug survival through the community involved in open source software development. A causal assessment model is developed using a Bayesian network to draw probabilistic inferences that answer the proposed research questions.
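To illustrate the kind of probabilistic inference a Bayesian network supports, the Python sketch below performs exact inference by enumeration on a toy three-node network. Everything in it is an assumption made for illustration: the variables (delayed response, complex bug, bug survives), the network structure, and all probability values are placeholders, not the causal model or the empirical estimates of the thesis.

```python
from itertools import product

# Illustrative placeholder priors (NOT values from the thesis).
P_DELAY = {True: 0.3, False: 0.7}     # P(response to the report is delayed)
P_COMPLEX = {True: 0.2, False: 0.8}   # P(bug is complex)

# Placeholder conditional table: P(bug survives | delayed, complex).
P_SURVIVE = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(delayed: bool, complex_bug: bool, survives: bool) -> float:
    """Joint probability from the factorization P(D) * P(C) * P(S | D, C)."""
    p_s = P_SURVIVE[(delayed, complex_bug)]
    return P_DELAY[delayed] * P_COMPLEX[complex_bug] * (p_s if survives else 1.0 - p_s)


def posterior_delay_given_survival(survives: bool = True) -> float:
    """P(delayed = True | survives), summing out the unobserved variable."""
    numerator = sum(joint(True, c, survives) for c in (True, False))
    evidence = sum(
        joint(d, c, survives) for d, c in product((True, False), repeat=2)
    )
    return numerator / evidence
```

With these placeholder numbers, observing that a bug survived raises the probability of a delayed response from the prior 0.3 to a posterior of 0.6. The same enumeration pattern extends to larger networks, although dedicated libraries become preferable as the number of variables grows.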

Ignoring surviving bugs and error generation during the debugging process is risky for software reliability. Bug survival during imperfect debugging creates the possibility of bug generation over multiple versions of a release. The fourth part highlights the strength of surviving bugs when they penetrate across multiple versions of a release. To investigate the probability of surviving bugs and of propagated, newly generated errors, two models are proposed in this part. The first model, using a Bayesian network, assesses the probability of surviving bugs and new errors in each release. The second model assesses the new errors propagated across the versions of a release.

The thesis employs datasets from ten official releases of Apache 2.0 to draw empirical results in each part. However, the data used depends on the problem under consideration in each part. The data sets in bug tracking systems are growing fast, with structured, semi-structured, and unstructured data from various developers and numerous sources forming big bug repositories. The last part of the thesis presents the challenges associated with growing open source software bug repositories. The trend of machine learning in the study of OSS quality is discussed, and the need for machine reasoning is presented. The proposition is illustrated using the bug life cycle as a case of Bayesian reasoning.