Abstract
Sentence Simplification aims to make sentences easier to read and understand. With most effort on corpus development focused on English, the amount of annotated data is limited in Chinese. To address this need, we introduce CSSWiki, an open-source dataset for Chinese sentence simplification based on Wikipedia. This dataset contains 1.6k source sentences paired with their simplified versions. Each sentence pair is annotated with operation tags that distinguish between linguistic and content modifications. We analyze differences in annotation scheme and data statistics between CSSWiki and existing datasets. We then report baseline sentence simplification performance on CSSWiki using zero-shot and few-shot approaches with Large Language Models. © 2024 ELRA Language Resource Association
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) |
Publisher | European Language Resources Association (ELRA) |
Pages | 4205-4213 |
ISBN (Print) | 9782493814104 |
Publication status | Published - 23 May 2024 |
Event | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) - Hybrid, Torino, Italy Duration: 20 May 2024 → 25 May 2024 https://lrec-coling-2024.org/ https://aclanthology.org/volumes/2024.isa-1/ https://aclanthology.org/2024.lrec-main |
Publication series
Name | Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING - Main Conference Proceedings |
---|
Conference
Conference | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) |
---|---|
Abbreviated title | LREC-COLING 2024 |
Country/Territory | Italy |
City | Torino |
Period | 20/05/24 → 25/05/24 |
Funding
This work was partly supported by the Language Fund from the Standing Committee on Language Education and Research (project EDB(LE)/P&R/EL/203/14) and by the General Research Fund (project 11207320).
Research Keywords
- Chinese sentence simplification
- Corpus creation
- Linguistic simplification operations
- Content simplification operations
Publisher's Copyright Statement
- This full text is made available under CC-BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/
Fingerprint
Dive into the research topics of 'CSSWiki: A Chinese Sentence Simplification Dataset with Linguistic and Content Operations'. Together they form a unique fingerprint.
Projects
- 1 Finished
- GRF: Semantic Modeling for Sentence-level Readability Assessment
LEE, J. S. Y. (Principal Investigator / Project Coordinator), LIU, M. (Co-Investigator) & Sun, W. (Co-Investigator)
1/01/21 → 17/06/24
Project: Research