Designing and implementing a corpus-based online pronunciation learning platform for Cantonese learners of Mandarin

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

3 Scopus Citations

Original language: English
Pages (from-to): 18-31
Number of pages: 14
Journal / Publication: Interactive Learning Environments
Issue number: 1
Online published: 23 Aug 2018
Publication status: Published - 2020


As an international financial centre, Hong Kong has become increasingly multilingual in recent years. In addition to Cantonese and English, which serve mostly as first and second languages, Hong Kong residents have increasingly begun to develop a third or even a fourth language. The biliteracy and trilingualism (兩文三語) language policy encourages Mandarin as the third language. This paper introduces a corpus-based online pronunciation learning platform that helps Mandarin teachers, learners, and researchers better understand the major problems that Cantonese-speaking Hong Kong learners encounter in learning Mandarin pronunciation. A phonological corpus was established and analysed in order (a) to identify learners’ recurring difficulties in using Mandarin segmental and suprasegmental features accurately and appropriately and (b) to suggest possible solutions to reduce or eliminate such difficulties. The corpus contains recordings of four spoken tasks (reading of monosyllabic words, reading of multisyllabic words, reading of a passage, and free speech) produced by Cantonese-speaking college students in Hong Kong. The phonological annotations of the recordings focus mainly on two areas of segmental features (vowels and consonants), two areas of suprasegmental features (tone and retroflex finals), and mispronunciation. In addition to the corpus, a pronunciation learning website was developed for learners to (a) practise segmental and suprasegmental aspects of pronunciation through a variety of perception and production exercises and (b) discover the possible causes of common Mandarin pronunciation features found in the corpus. Based on the corpus, 40 datasets were analysed, and a checklist of common Mandarin pronunciation errors made by Cantonese learners was made available to teachers and learners. The use and evaluation of the pronunciation learning platform are also introduced and discussed.

Research Area(s)

  • Spoken corpora, Mandarin acquisition, pronunciation learning