Abstract
This paper focuses on automatic character profiling (connecting “who”, “what” and “when”) in literary documents. This task is especially challenging for low-resource languages, since off-the-shelf tools for named entity recognition, syntactic parsing and other natural language processing tasks are rarely available. We investigate the impact of human annotation on automatic profiling. Experimental results on a Medieval Chinese corpus show that even a relatively small amount of word segmentation, part-of-speech and dependency annotation can improve accuracy in named entity recognition and in identifying character-verb associations, but not character-toponym associations.
| Original language | English |
|---|---|
| Title of host publication | ADCS '19: Proceedings of the 24th Australasian Document Computing Symposium |
| Editors | Gianluca Demartini, Paul Thomas |
| Publisher | Association for Computing Machinery |
| ISBN (Electronic) | 9781450377669 |
| Publication status | Published - Dec 2019 |
| Event | 24th Australasian Document Computing Symposium (ADCS 2019), University of Technology, Sydney, Australia. Duration: 5 Dec 2019 → 6 Dec 2019. http://adcs-conference.org/2019/ |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 24th Australasian Document Computing Symposium (ADCS 2019) |
|---|---|
| Abbreviated title | ADCS 2019 |
| Place | Australia |
| City | Sydney |
| Period | 5/12/19 → 6/12/19 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Research Keywords
- Dependency parsing
- Information extraction
- Low-resource language
- Medieval Chinese
- Named entity recognition