Effectiveness of Symmetric Metamorphic Relations on Validating the Stability of code generation LLM
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Detail(s)
Original language | English |
---|---|
Article number | 112330 |
Journal / Publication | The Journal of Systems and Software |
Online published | 25 Dec 2024 |
Publication status | Online published - 25 Dec 2024 |
Link(s)
DOI | DOI |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(f46f31e9-a95e-4a77-82a8-995a781bcb30).html |
Abstract
Pre-trained large language models (LLMs) are increasingly used in software development for code generation, with a preference for private LLMs over public ones to avoid the risk of exposing corporate secrets. Validating the stability of these LLMs’ outputs is crucial, and our study proposes using symmetric Metamorphic Relations (MRs) from Metamorphic Testing (MT) for this purpose. Our study involved an empirical experiment with ten LLMs (eight private and two public) and two publicly available datasets. We defined seven symmetric MRs to generate “Follow-up” datasets from “Source” datasets for testing. Our evaluation aimed to detect violations (inconsistent predictions) between the “Source” and “Follow-up” datasets and to assess the effectiveness of the MRs in identifying correct and incorrect non-violated predictions against ground truths. Results showed that one public and four private LLMs did not violate the “Case transformation of prompts” MR. Furthermore, effectiveness and performance results indicated that the proposed MRs are effective tools for explaining the instability of LLMs’ outputs through “Case transformation of prompts”, “Duplication of prompts”, and “Paraphrasing of prompts”. The study underscored the importance of enhancing LLMs’ semantic understanding of prompts for better stability and highlighted potential future research directions, including exploring different MRs, enhancing semantic understanding, and applying symmetry to prompt engineering. © 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
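The abstract's idea of deriving a “Follow-up” prompt from a “Source” prompt and flagging inconsistent outputs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the choice of upper-casing for case transformation, a single repetition for duplication, exact string equality as the consistency check, and the `toy_model` stand-in that replaces a real LLM. The record does not give the paper's actual definitions.

```python
from typing import Callable, List


def case_transform(prompt: str) -> str:
    """Hypothetical 'case transformation' MR: upper-case the whole prompt."""
    return prompt.upper()


def duplicate(prompt: str) -> str:
    """Hypothetical 'duplication' MR: repeat the prompt once on a new line."""
    return f"{prompt}\n{prompt}"


def find_violations(
    prompts: List[str],
    generate: Callable[[str], str],
    transform: Callable[[str], str],
) -> List[str]:
    """Return the source prompts whose output differs from the follow-up output.

    Exact string equality is an assumed consistency check; a real study
    might compare outputs by behaviour (e.g. running tests) instead.
    """
    return [p for p in prompts if generate(p) != generate(transform(p))]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: deliberately case-sensitive keyword matching."""
    if "add" in prompt:
        return "def add(a, b): return a + b"
    return "pass"


PROMPTS = [
    "write a function to add two numbers",
    "return an empty function",
]

case_violations = find_violations(PROMPTS, toy_model, case_transform)
dup_violations = find_violations(PROMPTS, toy_model, duplicate)
print("case transformation violations:", case_violations)
print("duplication violations:", dup_violations)
```

With this toy model, upper-casing hides the lowercase keyword and so produces a violation for the first prompt, while duplication leaves both outputs unchanged. Swapping `toy_model` for a call to an actual code generation model would be the natural next step.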
Research Area(s)
- Metamorphic testing, Metamorphic relation, True satisfaction, Large Language model, Code generation
Bibliographic Note
Information for this record is provided by the author(s) concerned.
Citation Format(s)
Effectiveness of Symmetric Metamorphic Relations on Validating the Stability of code generation LLM. / Chan, Pak Yuen Patrick; Keung, Jacky; Yang, Zhen.
In: The Journal of Systems and Software, 25.12.2024.