Perception and Production of Mandarin Tones with Nondiagnostic F0


Student thesis: Doctoral Thesis




Awarding Institution
  • Wai Sum LEE (Supervisor)
  • Hongwei Ding (External person) (External Supervisor)
Award date: 25 Apr 2023


Speech contrasts are realized by multiple dimensions, among which the more informative cues receive more perceptual weight for phonetic recognition. In the case of Mandarin tones, F0 dominates tone identification over other cues, such as secondary cues (e.g., duration and amplitude) and external contextual cues. Nevertheless, it is not necessarily the case that the non-dominant cues have no perceptual value. Their perceptual weight varies according to listening situations and becomes important when F0 is not informative. The present study investigates the production and perception of Mandarin tones with non-diagnostic/uninformative F0 from the perspective of (i) the dynamic adjustment of perceptual weight on non-primary dimensions across listening situations and (ii) the coordination and joint efforts of speakers and listeners.

Two listening situations where F0 is uninformative are examined: coarticulated tones in connected speech and unphonated tones in whispered speech. In the first situation, the F0 contour or shape is often changed by coarticulation such that the variant contour deviates from its canonical form, thus giving rise to perceptual ambiguity or misperception. In the second situation, F0 is totally missing in whispered speech. We explore whether listeners increase the perceptual weight on less prominent cues when F0 is insufficient. We then examine the mechanism that increases the perceptual weight of these cues. Finally, the adjustment of perceptual weight is explored in both production and perception to seek evidence of the coordination and joint efforts of speakers and listeners. To answer these questions, both perceptual and production experiments were conducted in each situation.

Chapters One to Three discuss the fundamentals, including the dissertation outline, theoretical frameworks, research questions, and literature review. Chapter Four explores the perceptual weight on phonetic contextual cues for tone identification in disyllabic utterances, the shortest unit in connected speech. Listeners were presented with a series of tone targets varying acoustically and perceptually from Mandarin Tone 3 to Tone 4 in isolation or following a tone precursor of Tone 1, Tone 2, or Tone 4. The results showed that tone precursors affected the identification of the target tones such that the targets were more likely to be categorized as Tone 3 following the tone precursors with high offset F0 (Tone 1 and Tone 2) than following those with low offset F0 (Tone 4). The perceptual effect only occurred on the ambiguous tonal targets. This suggests that when F0 information is insufficient, the perceptual weight on the external context F0 increases, reflecting the adjustment of perceptual weight on phonetic contextual cues. The perceptual effect of tone precursors compensated for the acoustic consequence of coarticulation because the tone precursors with a high offset F0 produced a Tone 4-like Tone 3 variant in production but tended to recover a Tone 3 identification of this variant in perception (the perceptual context [PC] effect). When the tone precursors were replaced by nonspeech pure tones that preserved only the F0 information of speech contexts, the PC effect persisted. However, the effect of nonspeech contexts was significantly smaller than that of speech contexts. The larger effect of speech contexts than of nonspeech contexts was not caused by a possible lack of focal attention on the nonspeech context or by different mechanisms of processing the nonspeech context and speech targets.
Our findings suggest that a general auditory and learning mechanism must be operative in causing the PC effect and point to future work for clarifying the factors that may modulate the magnitude of the PC effect. The adjustment of perceptual weight on tone precursors was also observed in production, providing evidence for the speaker-listener coordination and joint efforts. Specifically, speakers raised the offset F0 of tone precursors to a larger extent for a following Tone 3 than for a following Tone 4. The rise of offset F0 in tone precursors had perceptual significance in that it enhanced the PC effect and facilitated the identification of the following tone as Tone 3, thus uncovering the original form of Tone 3 in the face of a Tone 4-like variant.

Chapter Five studies the use of secondary cues when F0 is unavailable in whispered speech. Speakers produced phonated and whispered tones that were subsequently presented to listeners in a tone identification task. Spectral (first and second formants and center of gravity) and non-spectral (amplitude-F0 correlation index, mean amplitude, and duration) cues of the whispered tones were evaluated regarding their perceptual value for tone identification. Results showed that only non-spectral cues, namely amplitude and duration, had perceptual value, though the specific cue varied across the identification of the four Mandarin tones. In a subsequent perceptual analysis, amplitude-modulated noises that only kept the amplitude and duration characteristics of whispered tones were presented to listeners in another tone identification test. The accuracy of tone identification for amplitude-modulated noises was very similar to that of naturally produced whispered tones, corroborating the findings that non-spectral cues, instead of spectral cues, had perceptual value for tone recognition. Results also showed that amplitude dominated tone identification over duration. Duration complemented the perceptual value of amplitude by mildly modifying the identification rate. Results of the acoustic analysis suggest that the dominant role of amplitude is not directly related to its acoustic informativeness but possibly is due to auditory reasons. The adjustment of weight on non-spectral cues was also observed in production. Speakers enhanced the use of amplitude and duration more in whispered speech than in phonated speech, as evidenced by less overlap of tones in duration and a higher amplitude-F0 correlation index in whispered tones than in phonated tones. However, the enhancement was only observed in Tone 3 and Tone 4, suggesting possible constraints on the use of amplitude and duration for Tone 1 and Tone 2.
Finally, the enhancement of amplitude, but not duration, helped listeners more correctly recognize tones, indicating the coordination and joint efforts of speakers and listeners.

The general discussion that incorporates the PC effect and use of secondary cues for the perception of Mandarin tones is presented in Chapter Six. The two situations reveal the dynamic adjustment of perceptual weights across listening situations and the coordination and joint efforts of speakers and listeners. Moreover, the empirical evidence points to (i) a general auditory and learning mechanism that causes an increase in the perceptual weight of the contextual cues and (ii) an auditory enhancement mechanism for the perceptual weight of the secondary cues, in particular amplitude.

The dissertation concludes by summarizing the research questions, the major findings, the theoretical and pedagogical implications, and directions for future research in Chapter Seven. The findings of this dissertation deepen our understanding of Mandarin tones in perception and production when F0 is uninformative: Listeners and speakers adjust the weight on the non-dominant cues in a cooperative manner with the aim of maintaining smooth communication.

    Research areas

  • Mandarin tones, perception and production, insufficient F0, dynamic adjustment of perceptual weight, speaker-listener coordination