Abstract
Pre-trained large language models (LLMs) are increasingly used in software development for code generation, with a preference for private LLMs over public ones to avoid the risk of exposing corporate secrets. Validating the stability of these LLMs’ outputs is crucial, and our study proposes using symmetric Metamorphic Relations (MRs) from Metamorphic Testing (MT) for this purpose. Our study involved an empirical experiment with ten LLMs (eight private and two public) and two publicly available datasets. We defined seven symmetric MRs to generate “Follow-up” datasets from “Source” datasets for testing. Our evaluation aimed to detect violations (inconsistent predictions) between the “Source” and “Follow-up” datasets and to assess the effectiveness of the MRs in identifying correct and incorrect non-violated predictions against the ground truths. Results showed that one public and four private LLMs did not violate the “Case transformation of prompts” MR. Furthermore, the effectiveness and performance results indicated that the proposed MRs are effective tools for explaining the instability of LLMs’ outputs under “Case transformation of prompts”, “Duplication of prompts”, and “Paraphrasing of prompts”. The study underscored the importance of enhancing LLMs’ semantic understanding of prompts for better stability and highlighted potential future research directions, including exploring different MRs, enhancing semantic understanding, and applying symmetry to prompt engineering. © 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
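The violation check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy model, and the use of exact string equality to compare predictions are all assumptions made for illustration. Only two of the named MRs (case transformation and duplication of prompts) are shown.

```python
from typing import Callable, List

# Hypothetical type alias: a "model" is any function from prompt to generated code.
Model = Callable[[str], str]
Relation = Callable[[str], str]


def case_transform(prompt: str) -> str:
    """Follow-up prompt for a 'Case transformation of prompts' MR (assumed: upper-casing)."""
    return prompt.upper()


def duplicate(prompt: str) -> str:
    """Follow-up prompt for a 'Duplication of prompts' MR (assumed: repeat the prompt once)."""
    return f"{prompt}\n{prompt}"


def find_violations(model: Model, prompts: List[str], mr: Relation) -> List[int]:
    """Return indices of source prompts whose prediction changes on the follow-up prompt.

    Assumption: two predictions are 'consistent' only if they are identical strings.
    """
    violations = []
    for i, source in enumerate(prompts):
        follow_up = mr(source)
        if model(source) != model(follow_up):
            violations.append(i)
    return violations


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: a case-sensitive keyword matcher."""
    if "add" in prompt:
        return "def add(a, b): return a + b"
    return "pass"
```

With this toy model, a prompt containing the lowercase keyword changes its prediction when upper-cased, so the case-transformation MR flags it, while the duplication MR does not. A real study would replace `toy_model` with calls to the LLM under test and would likely need a less strict notion of equivalence than string equality.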
Original language | English |
---|---|
Article number | 112330 |
Journal | The Journal of Systems and Software |
Volume | 222 |
Online published | 25 Dec 2024 |
DOIs | |
Publication status | Published - Apr 2025 |
Funding
This work is supported in part by the General Research Fund of the Research Grants Council of Hong Kong and the research funds of the City University of Hong Kong (6000796, 9229109, 9229098, 9220103, 9229029), and the Natural Science Foundation of Shandong Province under Grant ZR2024QF093.
Research Keywords
- Metamorphic testing
- Metamorphic relation
- True satisfaction
- Large language model
- Code generation
Fingerprint
Dive into the research topics of 'Effectiveness of Symmetric Metamorphic Relations on Validating the Stability of code generation LLM'. Together they form a unique fingerprint.
-
DON_RMG: Deep Probabilistic Reasoning and Statistical Analysis Using Deep-Learning – Phase 2 - RMGS
Keung, J. W. (Principal Investigator / Project Coordinator)
1/08/22 → …
Project: Research
-
DON_RMG: Deep Learning-based Technologies for Practical Data Analytics in Technological Innovation - RMGS
Keung, J. W. (Principal Investigator / Project Coordinator)
1/04/22 → …
Project: Research
-
DON_RMG: Smart Intelligent Process Automation for the Mortgage Lending Industry - RMGS
Keung, J. W. (Principal Investigator / Project Coordinator)
1/06/20 → …
Project: Research