Case consistency: a necessary data quality property for software engineering data sets
Research output: Conference Papers › RGC 32 - Refereed conference paper (without host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Publication status | Published - 27 Apr 2015 |
Conference
Title | International Conference on Evaluation and Assessment in Software Engineering |
---|---|
Place | China |
City | Nanjing |
Period | 27 - 29 April 2015 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(375dc8f8-191d-4aa1-9783-f8c27f58d773).html |
---|
Abstract
Data quality is an essential aspect of any empirical study, because the validity of models and/or analysis results derived from empirical data is inherently influenced by the quality of that data. In this empirical study, we focus on data consistency as a critical factor influencing the accuracy of prediction models in software engineering. We propose a software metric called Cases Inconsistency Level (CIL) for analyzing conflicts within software engineering data sets by leveraging probability statistics on project cases and counting the number of conflicting pairs. The results demonstrate that CIL can be used as a metric to identify whether a data set is consistent or inconsistent, which is valuable for building robust prediction models. In addition to measuring the level of consistency, CIL is shown to be applicable to predicting whether an effort model built from a data set can achieve higher accuracy, an important indicator for empirical experiments in software engineering.
Citation Format(s)
Case consistency: a necessary data quality property for software engineering data sets. / Passakorn, Phannachitta; Akito, Monden; KEUNG, Wai Jacky et al.
2015. Paper presented at International Conference on Evaluation and Assessment in Software Engineering, Nanjing, China.