A Framework to Sample Bellwether Moving Windows for Improving Software Effort Estimation


Student thesis: Doctoral Thesis

Award date: 16 Aug 2018


Context: Software effort estimation (SEE) is the process of predicting the effort needed for software development, thus supporting scheduling, costing and the allocation of resources to meet project delivery deadlines. Accurately predicting the required effort of a new software project plays a key role in the strategic planning of project development and delivery since it minimises overestimation and underestimation. Previous studies have considered the use of recently completed projects (defined as moving windows) in developing SEE models, on the assumption that moving windows are more relevant for improving the prediction accuracy of new projects. However, the following constraints exist when considering moving windows for setting up SEE models for new projects: (1) the relevancy constraint of projects forming the moving window, (2) the sizing constraint of the moving window, (3) the aging constraint of the moving window, and (4) the weighting function constraint of the moving window. Thus, the sampling of moving windows from chronological projects to be used as the training set for SEE models is known to affect prediction accuracy through these four constraints. Furthermore, conclusion instability across SEE models (or learners) for a given dataset still threatens the reliability of prediction accuracy for new projects. Such instability may be attributed to the sampling problem arising from the four constraints and to the lack of an effort classification benchmark in SEE. Thus, given a relevant sample of moving windows from historical projects on which a set of SEE learners is trained, the learner(s) that yield accurate classifications of the predicted effort of hold-out projects can be used for new project estimation. For example, given a set of learners l_1, l_2, …, l_n (n ∈ N) trained and validated on a given set of historical projects with their respective effort values, if learners l_i and l_j (for any i, j ∈ {1, …, n}) yield accurate classifications of their respective estimated efforts, then their prediction accuracies can be relatively trusted when used for new project estimation.

Objectives: The study seeks to tackle the four aforementioned constraints by conducting theoretical and empirical investigations. The study therefore sets out first to investigate the existence of exemplary and recently completed projects with defined window size and age parameters, and whether their use in SEE modelling improves prediction accuracy. Such exemplary and recently completed projects are referred to as the Bellwether moving window (BMW). The study also seeks to ameliorate the conclusion instability issue across learners using the BMW and an effort classification scheme. The study contributes a guideline (based on the studied datasets) to assist researchers and practitioners in sampling the BMW to be used for training prediction models.

Method: The moving window assumption was investigated empirically based on the theory that the prediction outcome of a future event depends on the outcome of prior events. The investigation began with six postulations, which were theoretically and empirically proven to establish the existence of the BMW in a given chronological dataset to be used for SEE modelling. Sampling of the exemplary or relevant projects (Bellwethers) from four chronological datasets (i.e., the ISBSG dataset, Kitchenham dataset, Desharnais dataset and Maxwell dataset) was undertaken using three introduced Bellwether methods (SSPM, SysSam and RandSam). The sizing and aging constraints of the BMW were addressed based on the Markov chain Monte Carlo methodology. The ergodic Markov chain was used to determine the stationarity of the BMW. Two weighting functions (namely the Biweight and Triweight functions) were introduced in addition to the existing four weighting functions (i.e., Triangular, Epanechnikov, Gaussian and Rectangular) to further investigate their effects on prediction accuracy. The sampled BMW was benchmarked against the growing portfolio (the entire collection of historical projects) and was used to predict the targets of hold-out projects. Eubank's optimal spacing theory, based on the density-quantile function, was adopted to discretise the software effort values of the studied datasets into three classes (high, moderate and low). This classification scheme provides a benchmark to assist researchers and practitioners in assessing prediction results from different learners and selecting the learner(s) that provide accurate classifications of the hold-out project estimates. Robust test statistics comprising Brunner's ANOVA-like test, Yuen's test and Cliff's δ effect size, as well as mean absolute error (MAE), were used to assess the prediction performance of three learners: the baseline Automatically Transformed Linear Model (ATLM), ElasticNet regression and deep neural networks. Statistical inferences were made at the 5% asymptotic significance level.
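Two of the measures named above, MAE and Cliff's δ, can be illustrated with a minimal Python sketch. The function names and the sample inputs are hypothetical and are not taken from the thesis; δ is computed here with its standard definition (the proportion of pairs where a value from the first sample exceeds one from the second, minus the proportion where it is smaller), which may differ in detail from the robust procedure used in the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted effort."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

When the two samples are the absolute errors of two learners on the same hold-out projects, a negative δ suggests that the first learner's errors tend to be smaller, and a value near zero suggests no practical difference.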

Results: Empirical results show that (1) Bellwethers are evident in the studied chronological datasets for SEE modelling, (2) the BMW has an approximate size of 40 to 80 exemplary projects that should not be more than 3 years old relative to the hold-out projects being estimated, (3) weighting the BMW with the Triweight function was more advantageous with respect to relative prediction accuracy against the growing portfolio benchmark, (4) training the deep neural networks with the BMW minimises conclusion instability in SEE modelling against the growing portfolio benchmark, and (5) when the characteristics of the exemplary projects that constituted the BMW were further investigated, it was found that trimming and log-transforming the data yielded improved prediction accuracy compared with using untransformed data.
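The size, age and weighting findings in (2) and (3) can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the thesis's procedure: it selects projects by recency only and does not reproduce the Bellwether sampling methods (SSPM, SysSam, RandSam); the function names, the tuple layout, and the choice to weight by age normalised to the maximum age are all hypothetical. The Triweight function is given in its common unnormalised form, (1 - u^2)^3 for |u| ≤ 1.

```python
from datetime import date


def triweight(u):
    """Unnormalised Triweight kernel: (1 - u^2)^3 on [-1, 1], else 0."""
    return (1 - u * u) ** 3 if abs(u) <= 1 else 0.0


def sample_window(projects, holdout_date, max_age_years=3.0, max_size=80):
    """Return up to max_size recent projects with Triweight weights.

    projects: list of (completion_date, effort) tuples (assumed layout).
    Each result is (completion_date, effort, weight), newest first.
    """
    max_age_days = max_age_years * 365.25
    # Keep projects completed on or before the hold-out date and not too old.
    recent = [
        (d, e) for d, e in projects
        if 0 <= (holdout_date - d).days <= max_age_days
    ]
    # Most recently completed projects first, then cap the window size.
    recent.sort(key=lambda p: p[0], reverse=True)
    window = recent[:max_size]
    # Assumption: weight decays with age normalised to the maximum age.
    return [
        (d, e, triweight((holdout_date - d).days / max_age_days))
        for d, e in window
    ]
```

Under these assumptions, a project finished on the hold-out date receives weight 1, and the weight falls smoothly to 0 as the project's age approaches three years.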

Conclusion: Results from the study demonstrate the effectiveness of the BMW and support its recommended use for improved prediction accuracy in SEE modelling. The introduced guideline for the BMW will serve as a foundation to assist software engineers and researchers in sampling relevant training and validation sets with defined window size and age parameters prior to the development of new software projects. The software effort classification scheme is intended to assist researchers and practitioners in interpreting the likely level of effort to be expended in software project development.

    Research areas

  • Bellwether effect, Bellwether moving windows, Growing portfolio, Markov chains, Software effort estimation