Validating Unsupervised Machine Learning Techniques for Software Defect Prediction with Generic Metamorphic Testing

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-reviewed



Detail(s)

Original language: English
Pages (from-to): 165155-165172
Journal / Publication: IEEE Access
Volume: 12
Online published: 8 Nov 2024
Publication status: Published - 2024


Abstract

In the realm of software defect prediction, unsupervised models often step in when labelled datasets are scarce, despite the challenge of validating models without prior knowledge of the data. Addressing this, we propose a novel approach that leverages generic metamorphic testing (MT) to validate such models effectively, bypassing the need for expert-derived metamorphic relations. Our method identifies five categories of generic metamorphic relations, further divided into twenty-one individual generic metamorphic relations, all formulated through generic Data Mutation Operators. This framework generates follow-up datasets from the source datasets, which are used to train corresponding software defect prediction models. By comparing the predictions of the source and follow-up software defect prediction models and identifying inconsistencies, we can assess a model's sensitivity to the generic metamorphic relations as a form of validation. This approach was rigorously evaluated across twenty software defect prediction models, incorporating fourteen different machine learning algorithms and twenty high-dimensional public datasets. Remarkably, the robustness of our generic MT approach was confirmed, showing substantial effectiveness in validating software defect prediction models regardless of whether they were supervised or unsupervised. Software defect prediction models built with Agglomerative Clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) did not violate any metamorphic relation, and nineteen software defect prediction models did not significantly violate the generic metamorphic relation "Shrinkage and Expansion". Our findings suggest that employing generic metamorphic relations, especially "Shrinkage and Expansion", can universally enhance the validation of defect prediction models. Furthermore, our approach can be applied to continuously monitor software defect prediction models. © 2024 The Authors.
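The workflow described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes a KMeans-based unsupervised defect predictor, a synthetic unlabelled dataset, and a simple uniform-scaling mutation operator standing in for a "Shrinkage and Expansion"-style relation, purely to show how predictions from the source and follow-up models are compared to count metamorphic-relation violations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical source dataset: 200 modules described by 20 software metrics, no labels.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(200, 20))

# Assumed generic Data Mutation Operator: uniformly shrink every feature ("shrinkage").
X_followup = 0.5 * X_source

def predict_defect_clusters(X):
    """Train an unsupervised defect predictor (here: 2-cluster KMeans)
    and return its cluster assignments as pseudo defect labels."""
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

source_pred = predict_defect_clusters(X_source)
followup_pred = predict_defect_clusters(X_followup)

# The relation expects predictions to be preserved up to a relabelling of the
# two clusters; the violation rate counts modules whose prediction changes.
agreement = max(
    np.mean(source_pred == followup_pred),
    np.mean(source_pred == 1 - followup_pred),
)
print(f"MR violation rate: {1.0 - agreement:.3f}")
```

A violation rate near zero indicates the model is consistent with this relation; a large rate flags an inconsistency worth investigating, which is the validation signal the paper's framework relies on.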

Research Area(s)

  • Clustering, machine learning, metamorphic relation, metamorphic testing, software defect prediction, validation
