Non-deterministic segmentation for Chinese lattice parsing

Hai Hu, Daniel Dakota, Sandra Kübler

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Parsing Chinese critically depends on correct word segmentation, since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correct segmentation from thousands of options, thus drastically reducing the number of unparsed sentences. Lexicon-based parsing models have better coverage than the CRF-based approach, but the many options they produce are more difficult to handle. We reach our best result by using a lexicon from the n-best CRF analyses, combined with highly probable words. © 2018 Association for Computational Linguistics (ACL).
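
To make the idea of a lexicon-based word lattice concrete, the following is a minimal sketch, not the authors' implementation. It assumes a lattice is a list of `(start, end, word)` edges over character positions, that every single character is kept as a fallback edge, and that candidate words are capped at a hypothetical `max_len` characters. The names `build_lattice` and `count_segmentations` and the toy lexicon are illustrative only.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, str]  # (start position, end position, word)


def build_lattice(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Edge]:
    """Collect one edge per substring that is a lexicon word.

    Single characters are always added (an assumption of this sketch),
    so at least one path through the lattice always exists.
    """
    n = len(sentence)
    edges: List[Edge] = []
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if end - start == 1 or word in lexicon:
                edges.append((start, end, word))
    return edges


def count_segmentations(edges: List[Edge], n: int) -> int:
    """Count the distinct segmentations (paths from 0 to n) in the lattice."""
    paths = [0] * (n + 1)
    paths[0] = 1
    # Edges only point forward, so visiting them in order of start position
    # guarantees paths[start] is final before it is used.
    for start, end, _ in sorted(edges):
        paths[end] += paths[start]
    return paths[n]


# Toy example: an ambiguous six-character string and a four-word lexicon.
sentence = "研究生命起源"
lexicon = {"研究", "研究生", "生命", "起源"}
lattice = build_lattice(sentence, lexicon)
num_options = count_segmentations(lattice, len(sentence))
```

Even this six-character example yields several competing segmentations, which illustrates why a full-coverage lexicon can leave a parser with thousands of options on realistic sentences.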

Original language: English
Title of host publication: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Editors: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Place of Publication: Shoumen, Bulgaria
Publisher: Incoma Ltd
Pages: 316-324
Number of pages: 9
ISBN (Electronic): 9789544520496
ISBN (Print): 9789544520489
Publication status: Published - Sept 2017
Externally published: Yes
Event: 11th International Conference on Recent Advances in Natural Language Processing (RANLP 2017) - Varna, Bulgaria
Duration: 2 Sept 2017 to 8 Sept 2017
https://lml.bas.bg/ranlp2017/start.php

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
Volume: 2017-September
ISSN (Print): 1313-8502
ISSN (Electronic): 2603-2813

Conference

Conference: 11th International Conference on Recent Advances in Natural Language Processing (RANLP 2017)
Place: Bulgaria
City: Varna
Period: 2/09/17 to 8/09/17

Funding

H. Hu is funded by the China Scholarship Council.

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
