Analyzing who, what, and where in a mediaeval Chinese corpus: A case study on the Chinese Buddhist Canon

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 12 - Chapter in an edited book (Author); peer-reviewed


Detail(s)

Original language: English
Title of host publication: Advances in Corpus Applications in Literary and Translation Studies
Editors: Riccardo Moratto, Defeng Li
Place of publication: London
Publisher: Routledge
Pages: 81-102
ISBN (electronic): 9781003298328
ISBN (print): 9781032287386, 9781032287409
Publication status: Published - 2023

Publication series

Name: Routledge Advances in Translation and Interpreting Studies

Abstract

Information extraction from historical text is challenging because of the lack of data to train natural language processing tools. This chapter evaluates the utility of in-domain training data for data-driven profiling of characters, verbs, and toponyms and reports a case study on a corpus of Chinese Buddhist text. As is typical for such a corpus, the Chinese Buddhist Canon has few annotated linguistic resources other than lexica of names, places, and domain-specific terms. We apply a lexicon-based approach for named entity recognition and then report an analysis of the “who,” “what,” and “where” of the Canon: who the characters were, what they did, and where they were. Experimental results also show that even a small amount of word segmentation, part-of-speech, and dependency annotation can improve accuracy in named entity recognition and in extraction of character-verb associations.
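The abstract does not spell out how the lexicon is matched against the text. As a rough illustration only, the sketch below implements one common lexicon-based technique, greedy forward maximum matching, in Python. The function name `lexicon_ner`, the toy lexicon, and the PER/LOC labels are hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Tuple


def lexicon_ner(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Hypothetical lexicon-based NER via greedy forward maximum matching.

    Scans left to right; at each position tries the longest lexicon entry
    first. Returns (start, end, surface, label) tuples. Characters not
    covered by any entry are skipped.
    """
    max_len = max((len(entry) for entry in lexicon), default=0)
    entities: List[Tuple[int, int, str, str]] = []
    i = 0
    while i < len(text):
        # Longest candidate first, so 舍衛國 wins over its prefix 舍衛.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                entities.append((i, i + length, candidate, lexicon[candidate]))
                i += length
                break
        else:
            # No entry starts here: advance one character.
            i += 1
    return entities


if __name__ == "__main__":
    # Toy lexicon (assumed for illustration, not from the chapter).
    toy_lexicon = {
        "佛": "PER",
        "給孤獨": "PER",
        "舍衛": "LOC",
        "舍衛國": "LOC",
        "祇樹給孤獨園": "LOC",
    }
    print(lexicon_ner("一時佛在舍衛國祇樹給孤獨園", toy_lexicon))
    # [(2, 3, '佛', 'PER'), (4, 7, '舍衛國', 'LOC'), (7, 13, '祇樹給孤獨園', 'LOC')]
```

Because this kind of matching ignores word boundaries, an entry such as 佛 can also fire inside a longer word that is not a name; word segmentation of the sort mentioned in the abstract is one way to reduce such false matches.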

Citation Format(s)

Analyzing who, what, and where in a mediaeval Chinese corpus: A case study on the Chinese Buddhist Canon. / Wong, Tak-sum; Lee, John Sie Yuen.
Advances in Corpus Applications in Literary and Translation Studies. eds. / Riccardo Moratto; Defeng Li. London: Routledge, 2023. p. 81-102 (Routledge Advances in Translation and Interpreting Studies).
