Abstract
Automatic generation of multiple-choice (MC) items for reading comprehension can support language learning by providing large amounts of practice materials. To enable rapid development of MC generation models, automatic assessment is essential since it is time-consuming to manually evaluate question and distractor quality. Although Text Informativity (TI) has been adopted as an automatic evaluation metric, the ability of Large Language Models (LLMs) to estimate the TI scores of different categories of questions and distractors has not yet been thoroughly analyzed. This paper investigates LLM performance in calculating TI scores for the range of questions and distractors defined in the PIRLS (Progress in International Reading Literacy Study) and STARC (Structured Annotations for Reading Comprehension) frameworks. We show that automatically estimated TI scores may result in systematic preferences for some question and distractor categories, and recommend that TI scores be used for within-category comparisons only.
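The recommendation to compare TI scores within categories only can be illustrated with a minimal sketch. It assumes TI scores have already been estimated elsewhere and does not show how they are computed. The function name `rank_within_category`, the category labels, the item ids, and the scores are all hypothetical placeholders, not values or code from the paper.

```python
from collections import defaultdict


def rank_within_category(items):
    """Group (category, item_id, ti_score) records by category and
    sort each group by score, highest first.

    Scores are never compared across categories.
    """
    groups = defaultdict(list)
    for category, item_id, score in items:
        groups[category].append((item_id, score))
    return {
        category: sorted(members, key=lambda m: m[1], reverse=True)
        for category, members in groups.items()
    }


# Placeholder records: labels, ids, and scores are invented for illustration only.
items = [
    ("category_a", "q1", 0.62),
    ("category_a", "q2", 0.48),
    ("category_b", "q3", 0.91),
    ("category_b", "q4", 0.87),
]

ranking = rank_within_category(items)
for category, members in ranking.items():
    print(category, members)
```

In this sketch, an item in `category_a` is ranked only against other `category_a` items, so a systematic score offset between categories cannot change the ordering.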
© European Language Resources Association (ELRA), 2026
| Original language | English |
|---|---|
| Title of host publication | Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026 |
| Editors | Arturo Montejo-Ráez, Cristina Grisot, Joanna Blochowiak |
| Publisher | European Language Resources Association (ELRA) |
| Pages | 170-174 |
| ISBN (Electronic) | 978-2-493814-85-2 |
| Publication status | Published - 11 May 2026 |
| Event | 15th International Conference on Language Resources and Evaluation, Palau de Congressos de Palma, Palma, Spain. Duration: 11 May 2026 → 16 May 2026. https://lrec2026.info/ |
Conference
| Conference | 15th International Conference on Language Resources and Evaluation |
|---|---|
| Abbreviated title | LREC 2026 |
| Place | Spain |
| City | Palma |
| Period | 11/05/26 → 16/05/26 |
| Internet address | https://lrec2026.info/ |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 4 Quality Education
Research Keywords
- multiple-choice items
- text informativity
- question generation
- distractor generation
Title
Automatic Evaluation of Multiple-Choice Items for Reading Comprehension: Effects of Question and Distractor Categories