
A Zero-Training Error Correction System with Large Language Models

Yangyang Wu, Chen Yang, Mengying Zhu, Xiaoye Miao, Wei Ni, Meng Xi, Xinkui Zhao, Jianwei Yin

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Correcting missing or erroneous data values is an essential task in data cleaning. Traditional pre-configuration error correction (EC) methods rely heavily on predefined rules or constraints, demanding significant domain knowledge and manual effort. Configuration-free EC approaches have been explored, but they still require extensive feature engineering or labeled data for intensive model training. In this paper, we propose a zero-training and interpretable EC system, named ZeroEC, that leverages large language models (LLMs) to generate chain-of-thoughts (CoTs) and correction rules for EC, without the need for model training. ZeroEC consists of two modules: contextual-relevant tuple search (CTS) and training-free explainable correction (TEC). CTS constructs a contextual-relevant tuple retriever using a weighted cosine similarity function to efficiently identify the tuples most relevant to each dirty tuple, reducing redundancy in the LLM prompts and lowering computational costs. TEC employs a clustering-based representative tuple sampling strategy that alleviates the risk of 'hallucination' by exposing the LLM to diverse types of data errors. It then prompts the LLM to generate correction CoTs for the user-corrected representative tuples, and further prompts it to create correction rules and produce explainable corrections, each accompanied by an automatically generated explanation, all without model training. Extensive experiments conducted on various real-world datasets demonstrate that ZeroEC achieves a 66.82% increase in accuracy and a 6.87x speedup compared to state-of-the-art methods. The codes and datasets of this paper are available at https://github.com/YangChen32768/ZeroEC. © 2025 IEEE.
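The abstract names two concrete mechanisms: weighted cosine similarity for retrieving contextually relevant tuples (CTS) and clustering-based sampling of representative tuples (TEC). The sketch below illustrates both under stated assumptions; it is not the authors' implementation (see the linked repository). The per-attribute embedding representation, the attribute weight vector, and the cluster count are all illustrative choices.

# Illustrative sketch only; ZeroEC's actual code is in the linked repository.
# Assumes each tuple is represented as a (n_attrs, dim) matrix of attribute
# embeddings, with one similarity weight per attribute.
import numpy as np
from sklearn.cluster import KMeans

def weighted_cosine(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    """Weighted cosine similarity between two tuples a and b, each given as
    a (n_attrs, dim) matrix; w holds one weight per attribute."""
    # Per-attribute cosine similarities, then a weight-normalized average.
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float(np.dot(w, num / den) / (w.sum() + 1e-12))

def retrieve_context(dirty: np.ndarray, table: np.ndarray,
                     w: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k tuples most similar to the dirty tuple,
    mirroring the contextual-relevant tuple retrieval idea in CTS."""
    sims = np.array([weighted_cosine(dirty, t, w) for t in table])
    return np.argsort(-sims)[:k]

def sample_representatives(table: np.ndarray, n_clusters: int = 8,
                           seed: int = 0) -> list:
    """Cluster flattened tuple embeddings and pick the tuple nearest each
    centroid, so the sample covers diverse error types (as in TEC)."""
    flat = table.reshape(len(table), -1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(flat)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(flat[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[np.argmin(dists)]))
    return reps

# Toy usage: 20 tuples, 4 attributes, 8-dim embeddings (all values assumed).
rng = np.random.default_rng(0)
table = rng.normal(size=(20, 4, 8))
weights = np.array([2.0, 1.0, 1.0, 0.5])  # hypothetical attribute weights
print(retrieve_context(table[0], table, weights, k=3))
print(sample_representatives(table, n_clusters=4))

Feeding only the retrieved neighbors into the prompt, rather than the whole table, is what keeps prompt size (and hence LLM cost) bounded; the centroid-nearest sampling is one common way to get a small but diverse set for the user to correct.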
Original language: English
Title of host publication: Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
Publisher: IEEE
Pages: 2949-2962
ISBN (Electronic): 979-8-3315-3603-9
ISBN (Print): 979-8-3315-3604-6
Publication status: Published - 2025
Event: 41st IEEE International Conference on Data Engineering (ICDE 2025) - Hong Kong SAR, China
Duration: 19 May 2025 - 23 May 2025
https://ieee-icde.org/2025
https://ieeexplore.ieee.org/xpl/conhome/11112833/proceeding

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 41st IEEE International Conference on Data Engineering (ICDE 2025)
Place: China
City: Hong Kong SAR
Period: 19/05/25 - 23/05/25

Funding

This work is supported by the National Key R&D Program under Grant No. 2023YFC2706404, the National NSFC under Grant No. 62372404, the Leading Goose R&D Program of Zhejiang under Grant No. 2024C01109, and the Fundamental Research Funds for the Central Universities under Grant No. 226-2024-00030.

Research Keywords

  • Correction Chain-of-thoughts
  • Correction Rule Generation
  • Error Correction
  • Large Language Models
