
Grade-Like-a-Human: Rethinking Automated Assessment with Large Language Models

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Grading is a foundational component of assessment in higher education, aiming to evaluate student work in a reliable, repeatable, and interpretable manner. Short-answer questions effectively assess understanding, analysis, and articulation, but their open-ended nature makes traditional workflows reliant on detailed rubrics and manual review, resulting in substantial time and labor. Although recent work has explored using large language models (LLMs) for automated short-answer grading (ASAG), significant gaps remain in rubric design and in ensuring scoring consistency and fairness. Inspired by best practices in human grading, we propose Grade-Like-a-Human, a systematic multi-agent framework that spans the full pipeline: iteratively aligning rubrics with real answers, leveraging cross-item memory to enhance scoring consistency, and integrating a post-grading audit-and-feedback loop. We evaluate our method on an open-source short-answer grading benchmark and deploy it in a real undergraduate Operating Systems course, using authentic questions and student submissions for evaluation. We further release the questions, student submissions, and grading artifacts as the OS dataset. Experiments demonstrate substantial improvements in accuracy, consistency, and fairness. © 2025 the owner/author(s).
Original language: English
Title of host publication: RACS '25: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
Place of Publication: New York, NY
Publisher: Association for Computing Machinery
Number of pages: 8
ISBN (Print): 9798400722318
Publication status: Published - Nov 2025
Event: 2025 International Conference on Research in Adaptive and Convergent Systems (RACS 2025) - Industrial University of Ho Chi Minh City, Ho Chi Minh, Viet Nam
Duration: 16 Nov 2025 to 19 Nov 2025
https://www.sigapp.org/RACS/RACS2025/index.php

Publication series

Name: Research in Adaptive and Convergent Systems, RACS

Conference

Conference: 2025 International Conference on Research in Adaptive and Convergent Systems (RACS 2025)
Abbreviated title: ACM RACS 2025
Place: Viet Nam
City: Ho Chi Minh
Period: 16/11/25 to 19/11/25

Research Keywords

  • Automated short-answer grading
  • Large language model
  • Post-grading audit-and-feedback
  • Rubric refinement
