Leveraging large language models to supplement corpus-based inductive learning of Chinese as a second language

Tiffany Tsz-Yin Pang*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Corpus tools have proven effective for supporting inductive language learning by enabling learners to observe multiple examples, form hypotheses, and verify those hypotheses against additional examples. However, when applied to Chinese as a Second Language (CSL), these tools encounter limitations that disrupt the observe-hypothesize-verify process. Sketch Engine, for example, misanalyzes Chinese word boundaries, topicalized objects, and ba-constructions, and so provides inaccurate observational data that undermines the effectiveness of inductive learning. This paper proposes integrating Large Language Models (LLMs) with corpus tools to address these limitations. Using Sketch Engine and Claude Opus 4 as exemplars, I demonstrate how LLMs serve three pedagogical functions: (1) error detection, to identify misanalyzed features in corpus outputs; (2) guided pattern discovery, to help learners recognize linguistic regularities across examples; and (3) hypothesis verification, to confirm or refine learners' observations. Through analysis of specific Chinese features, I show how LLM integration preserves the discovery process while ensuring accurate linguistic input for learners. The proposed corpus-LLM integration represents an advancement in leveraging AI for language pedagogy. The paper concludes with future research directions for optimizing this integration in CSL acquisition and emphasizes the need to balance technological innovation with pedagogical principles. © 2025 The Author. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Original language: English
Article number: 100170
Journal: Applied Corpus Linguistics
Volume: 6
Issue number: 1
Online published: 20 Nov 2025
Publication status: Online published - 20 Nov 2025

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Research Keywords

  • Chinese as a Second Language (CSL)
  • Corpus-based learning
  • Inductive learning
  • Large Language Models (LLMs)

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
