A Framework for New Language Learning Models: Production and Perception of English Initial Clusters by Cantonese and Mandarin Chinese Speakers


Student thesis: Doctoral Thesis



  • Yizhou LAN


Awarding Institution
Award date: 28 Dec 2016


Second language speech acquisition has been explored through numerous speech production and perception experiments, as well as through influential theoretical models such as the Perceptual Assimilation Model (PAM) and the Speech Learning Model (SLM). These models predict speech learning outcomes by examining the acoustic or perceptual distance between L1 and L2 sound categories, and are therefore often referred to as distance-based models. However, they seldom include more universal factors beyond L1, such as the complexity of articulatory gestures and the influence of prosody on segments. The present thesis tests these distance-based models by examining L2 pronunciation of English Cr- clusters. Cr- was chosen because the distance between /r/ in Cr- and L2-accented sounds such as /w/ can be measured acoustically through formant analysis.

Three experiments examine Cantonese and Mandarin speakers' productions of English Cr- consonant clusters and compare them with those of native speakers of American English. The experiments test PAM and SLM in terms of general patterns (Experiment 1), gestural variations (Experiment 2) and tonal variations (Experiment 3).

Experiment 1 explores the general production and perception patterns of consonant clusters in the three groups of participants. The productions of Cantonese and Mandarin speakers show acoustic properties different from those of native English speakers. For Cantonese and Mandarin speakers, productions of cluster words differ significantly from those of words with full phonological deletion or substitution. The perception tests, comprising AX identification and ABX discrimination tasks, show similar variation.

Experiment 2 investigates the effect of vowel context on alveolar cluster perception. Results show that perceptual accuracy for the alveolar cluster is significantly lower than for the other two places of articulation.

Experiment 3 examines Cantonese and Mandarin speakers’ tonal variations in their productions of cluster words and of words with an added vowel within the cluster. Results show that tonal aspects are significantly correlated with speakers’ varying use of deletion or epenthesis. A follow-up perceptual study, using manipulated stimuli adapted from the Cantonese and Mandarin productions to mask each possible acoustic cue, supports the view that tone is the most heavily weighted cue in cluster perception. The findings may have important implications for reviewing existing theoretical frameworks of L2 speech learning, including PAM, SLM, and alternative models such as the Motor Theory and the Automatic Selective Perception model (ASP).

Based on the findings, it is suggested that the formation of an L2 speech category is not purely a phonological transfer process, but rather a complex system influenced by L1 background, motor control of gestures, and prosodic factors. In other words, the realizations of L2 categories may not be fully predicted by the L1-L2 phonological contrast. The findings partly conform to distance-based models in that contrasts may differ according to how categorially independent the C-/r/ cluster is from deleted or substituted categories, or from other approximants or vowels. However, it is argued that in different tone contrasts or vowel environments that may induce gestural reduction, the formation of L2 speech categories shows significant variation. In this regard, the findings serve as an extension to distance-based models: they argue for the inclusion of more fine-grained aspects in the models, such as the complexity of articulatory gestures and the covariance between segments and suprasegmentals. Pedagogically, the findings suggest that teachers and learners, as well as L2 speakers and listeners, should seek manageable ways to direct attentional resources to the features on which L1 and L2 sounds may be confused. Specifically, minimal pair practice, attention shifting and visual aids are suggested for inclusion in pronunciation pedagogy.