A Counselling Corpus in Cantonese

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

Abstract

Virtual agents are increasingly used for delivering health information in general, and mental health assistance in particular. This paper presents a corpus designed for training a virtual counsellor in Cantonese, a variety of Chinese. The corpus consists of a domain-independent subcorpus that supports small talk for rapport building with users, and a domain-specific subcorpus that provides material for a particular area of counselling. The former consists of ELIZA-style responses, chitchat expressions, and a dataset of general dialog, all of which are reusable across counselling domains. The latter consists of example user inputs and appropriate chatbot replies relevant to the specific domain. In a case study, we created a chatbot with a domain-specific subcorpus that addressed 25 issues in test anxiety, with 436 inputs solicited from native speakers of Cantonese and 150 chatbot replies harvested from mental health websites. Preliminary evaluations show that Word Mover’s Distance achieved 56% accuracy in identifying the issue in user input, outperforming a number of baselines.
Original language: English
Title of host publication: Proceedings of the LREC 2020 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020)
Editors: Dorothee Beermann, Laurent Besacier, Sakriani Sakti
Publisher: European Language Resources Association (ELRA)
Pages: 358-361
ISBN (Electronic): 9791095546351
Publication status: Published - May 2020
Event: 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) - Marseille, France
Duration: 11 May 2020 to 16 May 2020

Conference

Conference: 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020)
Abbreviated title: SLTU-CCURL 2020
Country/Territory: France
City: Marseille
Period: 11/05/20 to 16/05/20

Research Keywords

  • Cantonese
  • chatbot
  • counselling
  • test anxiety

Publisher's Copyright Statement

  • European Language Resources Association (ELRA), licensed under CC-BY-NC.
