Finding conclusion stability for selecting the best effort predictor in software effort estimation

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

63 Scopus Citations

Original language: English
Pages (from-to): 543-567
Journal / Publication: Automated Software Engineering
Issue number: 4
Publication status: Published - Dec 2013
Externally published: Yes


Background: Conclusion instability in software effort estimation (SEE) refers to the inconsistent results produced by diverse predictors applied to different datasets. This is largely due to the "ranking instability" problem, which is closely tied to the evaluation criteria and to the subset of the data being used.

Aim: To determine stable rankings of different predictors.

Method: 90 predictors are applied to 20 datasets and evaluated using 7 performance measures, and the results are subjected to a Wilcoxon rank test (95%). These results are called the "aggregate results". The aggregate results are then challenged by a sanity check, which focuses on a single error measure (MRE) and uses a newly developed evaluation algorithm called CLUSTER. These results are called the "specific results".

Results: The aggregate results show that (1) it is now possible to draw stable conclusions about the relative performance of SEE predictors, and (2) regression trees and analogy-based methods are the best performers. The aggregate results are also confirmed by the specific results of the sanity check.

Conclusion: This study offers a means to address the conclusion instability issue in SEE, which is an important finding for empirical software engineering.

© 2012 Springer Science+Business Media, LLC.
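To make the "ranking instability" idea concrete, the following is a minimal sketch and not the procedure used in the paper. It relies on the standard definition of MRE, |actual - predicted| / actual, and ranks two hypothetical predictors ("A" and "B") under two summary measures, the mean of MRE (MMRE) and the median of MRE (MdMRE). The effort values are toy numbers invented for illustration, and the function names are assumptions.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for a single project."""
    return abs(actual - predicted) / actual


def rank_predictors(actuals, predictions, summarize):
    """Order predictor names from best (lowest error) to worst.

    `summarize` collapses a list of per-project MRE values into one score,
    e.g. `mean` for MMRE or `median` for MdMRE.
    """
    scores = {
        name: summarize([mre(a, p) for a, p in zip(actuals, preds)])
        for name, preds in predictions.items()
    }
    return sorted(scores, key=scores.get)


# Toy data (invented): actual effort for four projects and two
# hypothetical predictors. "A" is accurate except for one large miss;
# "B" is uniformly off by 30%.
actuals = [100, 200, 400, 800]
predictions = {
    "A": [110, 220, 440, 1600],
    "B": [130, 260, 520, 1040],
}

rank_by_mmre = rank_predictors(actuals, predictions, mean)
rank_by_mdmre = rank_predictors(actuals, predictions, median)

print("Ranking by MMRE: ", rank_by_mmre)
print("Ranking by MdMRE:", rank_by_mdmre)
```

On this toy data the two measures disagree about which predictor is better, because the mean is sensitive to the single large miss of "A" while the median is not. A disagreement of this kind is what motivates evaluating across several performance measures and datasets rather than relying on one.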

Research Area(s)

  • Analogy, Data mining, Effort estimation, Evaluation criteria, Linear regression, MMRE, Neural nets, Regression trees, Stability