Case consistency : a necessary data quality property for software engineering data sets

Research output: Conference Papers (RGC: 31A, 31B, 32, 33) / 32_Refereed conference paper (no ISBN/ISSN) / peer-review

Detail(s)

Original language: English
Publication status: Published - 27 Apr 2015

Conference

Title: International Conference on Evaluation and Assessment in Software Engineering
Place: China
City: Nanjing
Period: 27 - 29 April 2015

Abstract

Data quality is an essential aspect of any empirical study, because the validity of models and analysis results derived from empirical data is inherently influenced by the quality of that data. In this empirical study, we focus on data consistency as a critical factor influencing the accuracy of prediction models in software engineering. We propose a software metric called Cases Inconsistency Level (CIL) for analyzing conflicts within software engineering data sets by leveraging probability statistics on project cases and counting the number of conflicting pairs. The results demonstrate that CIL can be used as a metric to identify whether a data set is consistent or inconsistent, which is valuable for building robust prediction models. In addition to measuring the level of consistency, CIL is shown to be applicable to predicting whether an effort model built from a data set can achieve higher accuracy, an important indicator for empirical experiments in software engineering.

Citation Format(s)

Case consistency : a necessary data quality property for software engineering data sets. / Passakorn, Phannachitta; Akito, Monden; KEUNG, Wai Jacky; MATSUMOTO, Kenichi.

2015. Paper presented at International Conference on Evaluation and Assessment in Software Engineering, Nanjing, China.