Effectiveness of Symmetric Metamorphic Relations on Validating the Stability of code generation LLM

Pak Yuen Patrick Chan, Jacky Keung, Zhen Yang*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Pre-trained large language models (LLMs) are increasingly used in software development for code generation, with a preference for private LLMs over public ones to avoid the risk of exposing corporate secrets. Validating the stability of these LLMs’ outputs is crucial, and our study proposes using symmetric Metamorphic Relations (MRs) from Metamorphic Testing (MT) for this purpose. Our study involved an empirical experiment with ten LLMs (eight private and two public) and two publicly available datasets. We defined seven symmetric MRs to generate “Follow-up” datasets from “Source” datasets for testing. Our evaluation aimed to detect violations (inconsistent predictions) between the “Source” and “Follow-up” datasets and to assess the effectiveness of the MRs in identifying correct and incorrect non-violated predictions from ground truths. Results showed that one public and four private LLMs did not violate the “Case transformation of prompts” MR. Furthermore, the effectiveness and performance results indicated that the proposed MRs are effective tools for explaining the instability of LLMs’ outputs under “Case transformation of prompts”, “Duplication of prompts”, and “Paraphrasing of prompts”. The study underscored the importance of enhancing LLMs’ semantic understanding of prompts for better stability and highlighted potential future research directions, including exploring different MRs, enhancing semantic understanding, and applying symmetry to prompt engineering. © 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
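The MT workflow the abstract describes, applying a symmetric transformation to a source prompt and flagging a violation when the model's outputs diverge, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_code` is a hypothetical deterministic stub standing in for a real LLM call, and the function names are invented for this example.

```python
def generate_code(prompt: str) -> str:
    """Stand-in for an LLM code-generation call (hypothetical stub).

    A stable model should be insensitive to prompt casing; this stub
    simulates that by normalizing the prompt before 'generating'.
    """
    return f"def solution(): ...  # generated for: {prompt.strip().lower()}"


def mr_case_transformation(prompt: str) -> str:
    """Symmetric MR: derive the follow-up prompt by upper-casing the source."""
    return prompt.upper()


def check_violation(source_prompt: str, mr) -> bool:
    """Return True if the MR is violated, i.e. the source and follow-up
    prompts yield inconsistent predictions."""
    source_output = generate_code(source_prompt)
    followup_output = generate_code(mr(source_prompt))
    return source_output != followup_output


violated = check_violation("Write a function that sums a list", mr_case_transformation)
print("violation detected:", violated)
```

In the study's terms, running such a check over a whole "Source" dataset and its MR-derived "Follow-up" dataset yields the violation counts used to assess each model's stability; a real harness would replace the stub with calls to the LLM under test.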
Original language: English
Article number: 112330
Journal: The Journal of Systems and Software
Volume: 222
Online published: 25 Dec 2024
DOIs
Publication status: Published - Apr 2025

Funding

This work is supported in part by the General Research Fund of the Research Grants Council of Hong Kong and the research funds of the City University of Hong Kong (6000796, 9229109, 9229098, 9220103, 9229029), and the Natural Science Foundation of Shandong Province under Grant ZR2024QF093.

Research Keywords

  • Metamorphic testing
  • Metamorphic relation
  • True satisfaction
  • Large language model
  • Code generation

