An automated system with a versatile test oracle for assessing student programs

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › Publication in refereed journal › peer-review



Original language: English
Pages (from-to): 176-199
Journal / Publication: Computer Applications in Engineering Education
Issue number: 1
Online published: 11 Oct 2022
Publication status: Published - Jan 2023


Automated program assessment systems have been widely adopted in many universities. Many of these systems judge the correctness of student programs by comparing their actual outputs with predefined expected outputs for selected test inputs. A common weakness of such systems is that student programs are marked as incorrect whenever their outputs deviate from the predefined ones, even if the deviations are minor and insignificant, and a human assessor would accept that the programs have satisfied the specifications. This critical weakness causes undue frustration to students and has undesirable pedagogical consequences that undermine these systems’ benefits. To address this issue, we developed an improved mechanism for program output comparison that serves as a versatile test oracle and brings the results of automated assessment much closer to those of human assessors. We evaluated the new mechanism in real programming classes using an existing automated program assessment system. We found that the new mechanism achieved zero false-positive error (it did not wrongly accept any incorrect output) and very low (0%–0.02%) false-negative error (it rarely rejected correct outputs), with very high accuracy (99.8%–100%) in recognizing outputs deemed acceptable by instructors. This represents a major improvement over an existing assessment mechanism, which had a 56.4%–64.1% false-negative error rate with an accuracy of 25.4%–40.9%. Moreover, about 67%–96% of students achieved their best results on their first attempt, which could encourage them and reduce their frustration. Furthermore, students generally welcomed the new assessment mechanism and agreed that it was beneficial to their learning.
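To illustrate the kind of tolerant output comparison the abstract describes, the sketch below shows a minimal comparator that ignores letter case and extra whitespace and treats numeric tokens as equal within a small tolerance. This is only an illustrative assumption of how such an oracle might relax strict string matching, not the mechanism actually developed in the paper; the function name `tolerant_match` and the specific tolerance rules are hypothetical.

```python
def tolerant_match(actual: str, expected: str, num_tol: float = 1e-6) -> bool:
    """Leniently compare a student program's output against the expected output.

    Rules (illustrative assumptions, not the paper's actual oracle):
    - whitespace runs are collapsed by tokenizing on whitespace;
    - numeric tokens match if they differ by at most `num_tol`;
    - other tokens match case-insensitively.
    """
    a_tokens = actual.split()
    e_tokens = expected.split()
    if len(a_tokens) != len(e_tokens):
        return False
    for a, e in zip(a_tokens, e_tokens):
        try:
            # Both tokens numeric: compare within tolerance.
            if abs(float(a) - float(e)) <= num_tol:
                continue
            return False
        except ValueError:
            # Non-numeric tokens: case-insensitive exact match.
            if a.lower() != e.lower():
                return False
    return True
```

Under these rules, `tolerant_match("Answer: 3.14159", "answer:  3.141590")` is accepted even though a byte-for-byte comparison would reject it, while a genuinely wrong value such as `"Result: 5"` versus `"Result: 6"` is still rejected.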

Research Area(s)

  • automated program assessment system, computer science education, learning computer programming, program assessment, test oracle