Universal Dependencies for Mandarin Chinese

Rafaël Poiret*, Tak-Sum Wong, John Lee, Kim Gerdes, Herman Leung

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

8 Citations (Scopus)

Abstract

This article presents a Universal Dependencies (UD) annotation scheme for Mandarin Chinese, as well as the current UD Chinese HK treebank. Our focus is mainly on part-of-speech tags and syntactic relations, covering a wide range of phenomena. The main goal is to make transparent the linguistic considerations behind our annotation choices, and to show how we aligned these choices with the criteria of Universal Dependencies. The scheme has been developed with reference to two other dependency schemes for this language, namely the Chinese Stanford Dependencies (Chang et al., 2009) and the Chinese Dependency Treebank (HIT-SCIR, 2010). We provide mappings between our scheme and the two others. The content of the UD Chinese HK treebank is discussed in relation to the other UD treebanks for Chinese, and the inter-annotator agreement on POS and dependency annotation is reported. Our proposed scheme is motivated by reasoned linguistic analysis, is suitable for cross-linguistic comparison, and yields a high level of agreement between annotators.

© The Author(s), under exclusive licence to Springer Nature B.V. 2021, corrected publication 2022
Original language: English
Pages (from-to): 673-710
Journal: Language Resources and Evaluation
Volume: 57
Issue number: 2
Online published: 24 Nov 2021
Publication status: Published - Jun 2023

Research Keywords

  • Chinese
  • Universal dependencies
  • Treebank
  • Annotation scheme
