LLM4CGDS: Large language model-based agents for Chinese graded document simplification

Dengzhao Fang, Jipeng Qiang*, Wenjie Hou, Yi Zhu, Jingtong Gao, Xiangyu Zhao

*Corresponding author for this work

Research output: Journal Publications and Reviews, RGC 21 - Publication in refereed journal, peer-review

Abstract

Graded reading tailors text difficulty to learners’ proficiency by producing multiple versions of the same content, an approach long embraced in language education but still dependent on labor-intensive, expert-driven adaptation. In this paper, we introduce the task of Chinese Graded Document Simplification (CGDS) for non-native learners, which seeks to automate the creation of multi-level reading materials in accordance with established proficiency standards. Guided by the three stages of the Hanyu Shuiping Kaoshi (HSK) 3.0 framework (Levels 1–3 for Beginner, Levels 4–6 for Intermediate, and Levels 7–9 for Advanced learners), we propose Large Language Model for Chinese Graded Document Simplification (LLM4CGDS), a rule-guided, large language model (LLM)-based framework that integrates HSK-level readability constraints and external knowledge retrieval to control document-level simplification without requiring supervised fine-tuning. To foster further research, we construct two complementary datasets, Journey to the West Document Simplification (JWDS) and Multi-Domain Document Simplification (MDDS), which cover diverse genres and difficulty levels. Experimental evaluation on both datasets demonstrates that LLM4CGDS substantially outperforms direct prompting of state-of-the-art LLMs in both readability control and meaning preservation. © 2026 Elsevier Ltd.
Original language: English
Article number: 113905
Number of pages: 13
Journal: Engineering Applications of Artificial Intelligence
Volume: 169
Online published: 7 Feb 2026
Publication status: Published - 1 Apr 2026

Funding

This research is partially supported by the National Natural Science Foundation of China under grant 62076217 and by the National Language Commission under grant ZDI145-71.

Research Keywords

  • Graded reading
  • Text simplification
  • Large language modeling
  • Hanyu shuiping kaoshi
