A Psycholinguistic and Corpus-based Study of Chunk: Ontology and Pedagogical Application in TCSL


Student thesis: Doctoral Thesis




Award date: 1 Sept 2021


Historically, there has been a gap between the outcomes of second-language acquisition and those of first-language acquisition. Scholars have tried to explain this gap with different theories, including the Universal Grammar Theory and the Emergentist Theory. The concept of chunks has been proposed and used to resolve many problems in language acquisition. However, current research on chunks lacks depth, and there is no consensus on the definition, classification, and features of chunks.

John McHardy Sinclair proposed two language-processing mechanisms: the Idiom Principle and the Open-choice Principle. These two principles are complementary and work together; they cover the complete process of language perception and production, and they account for both efficiency and creativity in language. In accordance with these two principles, we proposed a language-processing model that incorporates memory and cognitive systems. By reviewing previous studies on chunks, we summarised the features of chunks common to these studies. We then analysed the chunk’s position in this language-processing model and proposed our definition of the chunk from the perspective of psycholinguistics.

Combining our research objects and research questions, we posited a series of hypotheses and, based on them, designed two experiments (a mental-representation experiment and a teaching experiment) and a corpus analysis. Each experiment or analysis sought to verify one hypothesis.

The mental-representation experiment tested different language units on 16 native Chinese speakers and 16 Chinese-as-a-second-language (CSL) learners. The results indicated that both native speakers and learners memorised some multi-word combinations as whole units. As integrated memory units, these language components (chunks) were qualitatively distinguishable, in cognitive language processing, from those that combined both the lexicon and the syntactic component. A chunk, as a multi-word unit, is not analysed by language users for its internal components and grammatical relations during memorisation and use. The results also revealed differences in the types of integrally memorised units between native speakers and learners. Native speakers generally remembered some of the following as one unit: semantically related binary phrases, expressions with discourse functions, common sayings, and idiomatic phrases. In contrast, learners remembered only some of the semantically related binary phrases and expressions with discourse functions as whole units. This difference in chunk types between native speakers and learners reflected the learners’ insufficient second-language comprehension. The classification of chunks was further discussed in light of the test results. Notably, the number of Chinese units memorised also differed significantly between native speakers and learners: on average, native speakers’ memory span for Chinese units was larger than that of the learners.

The corpus analysis identified the chunk distribution in three native-speaker corpora of different styles, a learner corpus, and a textbook corpus. The results indicated that chunk distributions in the corpora of different styles (textual, spoken, and mixed) differed in both types and tokens. These varying distributions showed that each type of chunk has specific application scenarios. Through these differences, we analysed the roles and functions of different chunk types. For spoken language, the distribution of chunks differed among the native-speaker corpus, the learner corpus, and the textbook corpus. Comparing these corpora, we concluded that the learners had insufficiently mastered chunks and that the textbooks deviated from authentic spoken content, since the chunk usage of both learners and textbooks diverged from that of native speakers.

In the teaching experiment, the pre-test and post-test scores of the control group (n = 23) and the experimental group (n = 25) were compared to test the influence of short-term intensive chunk teaching on the learners’ oral performance. The scoring items included fluency, vocabulary, grammar, pronunciation, content, and idiomaticity. The results indicated that short-term intensive chunk teaching increased the number of specific kinds of chunks learners used and significantly improved the idiomaticity of their spoken language. We also found that the effects of short-term intensive teaching differed by chunk type, depending on the chunk’s function, its formation, and users’ preferences.

Finally, we proposed several suggestions on the ontology and pedagogical application of chunks.