Abstract
Recent studies in automatic readability assessment have shown that hybrid models, which combine linguistically motivated features with neural models, can outperform purely neural models. However, most evaluations of hybrid models have been based on in-domain English data. This paper provides further evidence of the contribution of linguistic features by reporting the first direct comparison between hybrid, neural, and linguistic models on cross-domain data. In experiments on a Chinese dataset, the hybrid model outperforms the neural model on both in-domain and cross-domain data. Importantly, the hybrid model exhibits much smaller performance degradation in the cross-domain setting, suggesting that linguistic features are more robust and better capture salient indicators of text difficulty.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 20th Workshop of the Australasian Language Technology Association |
| Subtitle of host publication | ALTA 2022 |
| Publisher | Australasian Language Technology Association |
| Pages | 62-67 |
| Publication status | Published - Dec 2022 |
| Event | 20th Annual Workshop of the Australasian Language Technology Association (ALTA 2022), Flinders University, Adelaide, Australia. Duration: 14 Dec 2022 → 16 Dec 2022. https://alta2022.alta.asn.au/ |
Publication series
| Name | Proceedings of the Australasian Language Technology Workshop |
|---|---|
| ISSN (Print) | 1834-7037 |
Conference
| Conference | 20th Annual Workshop of the Australasian Language Technology Association (ALTA 2022) |
|---|---|
| Country | Australia |
| City | Adelaide |
| Period | 14/12/22 → 16/12/22 |
| Internet address | https://alta2022.alta.asn.au/ |