Character profiling in low-resource language documents

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

This paper focuses on automatic character profiling — connecting “who”, “what” and “when” — in literary documents. This task is especially challenging for low-resource languages, since off-the-shelf tools for named entity recognition, syntactic parsing and other natural language processing tasks are rarely available. We investigate the impact of human annotation on automatic profiling. Based on a Medieval Chinese corpus, experimental results show that even a relatively small amount of word segmentation, part-of-speech and dependency annotation can improve accuracy in named entity recognition and in identifying character-verb associations, but not character-toponym associations.
Original language: English
Title of host publication: ADCS '19: Proceedings of the 24th Australasian Document Computing Symposium
Editors: Gianluca Demartini, Paul Thomas
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450377669
Publication status: Published - Dec 2019
Event: 24th Australasian Document Computing Symposium (ADCS 2019) - University of Technology, Sydney, Australia
Duration: 5 Dec 2019 to 6 Dec 2019
http://adcs-conference.org/2019/

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 24th Australasian Document Computing Symposium (ADCS 2019)
Abbreviated title: ADCS 2019
Place: Australia
City: Sydney
Period: 5/12/19 to 6/12/19
Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Research Keywords

  • Dependency parsing
  • Information extraction
  • Low-resource language
  • Medieval Chinese
  • Named entity recognition
