Abstract
Parsing Chinese depends critically on correct word segmentation, since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correct segmentation from thousands of options, thus drastically reducing the number of unparsed sentences. Lexicon-based parsing models have better coverage than the CRF-based approach, but the many options are more difficult to handle. We reach our best result by using a lexicon from the n-best CRF analyses, combined with highly probable words. © 2018 Association for Computational Linguistics (ACL).
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017 |
| Editors | Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova |
| Place of Publication | Shoumen, Bulgaria |
| Publisher | Incoma Ltd |
| Pages | 316-324 |
| Number of pages | 9 |
| ISBN (Electronic) | 9789544520496 |
| ISBN (Print) | 9789544520489 |
| Publication status | Published - Sept 2017 |
| Externally published | Yes |
| Event | 11th International Conference on Recent Advances in Natural Language Processing (RANLP 2017), Varna, Bulgaria. Duration: 2 Sept 2017 → 8 Sept 2017. https://lml.bas.bg/ranlp2017/start.php |
Publication series
| Name | International Conference Recent Advances in Natural Language Processing, RANLP |
|---|---|
| Volume | 2017-September |
| ISSN (Print) | 1313-8502 |
| ISSN (Electronic) | 2603-2813 |
Conference
| Conference | 11th International Conference on Recent Advances in Natural Language Processing (RANLP 2017) |
|---|---|
| Place | Bulgaria |
| City | Varna |
| Period | 2/09/17 → 8/09/17 |
Funding
H. Hu is funded by the China Scholarship Council.
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/