Abstract
The explosive proliferation of Generative Artificial Intelligence (GenAI) has posed an unprecedented existential challenge to traditional evaluation systems in higher education. In response to this wave, the higher education sector is undergoing a profound transformation, shifting from a focus on technological containment toward an assessment governance paradigm. Understanding the characteristics of detectors is the foundational prerequisite for utilizing such tools to optimize educational assessment; however, their effectiveness within authentic educational contexts remains insufficiently explored. To address this research gap, the study constructs three large-scale ecological datasets—StuTask, StuThesis, and DataCode—comprising over 280,000 authentic samples of student coursework, academic theses, and engineering code. A systematic evaluation was conducted on 13 mainstream detectors, covering both commercial and open-source models, across multiple dimensions including overall performance, task complexity, disciplinary variations, and adversarial robustness. The results indicate that while detectors achieve acceptable performance on long-form theses, they exhibit systematic failures in engineering code and short-form coursework tasks. Due to the formulaic nature of technical writing, STEM disciplines are subject to significant algorithmic bias. Furthermore, robustness tests reveal the extreme vulnerability of current detection tools, as a hybrid editing strategy can enable up to 88% of AI-generated content to evade detection successfully. These findings suggest that existing detection technologies are inadequate to support high-stakes educational assessments. In the future educational trajectory of ‘embracing AI,’ AIGC detectors should function as reference metrics within the assessment system, serving to quantify the depth of human–AI collaboration across distinct disciplinary logical frameworks. 
Detection technology must evolve toward the optimization of ‘logical innovation recognition,’ thereby establishing a robust academic integrity defense line that is truly resilient within the future ecosystem of human–AI symbiosis.

© 2026 The Authors
| Original language | English |
|---|---|
| Article number | 105616 |
| Number of pages | 21 |
| Journal | Computers and Education |
| Volume | 249 |
| Online published | 19 Mar 2026 |
| DOIs | |
| Publication status | Online published - 19 Mar 2026 |
Funding
This research is funded by the Hong Kong Metropolitan University R&D Fund (Grant No. RD/2025/1.24). All authors gratefully acknowledge the financial and academic support provided by this institution.
Research Keywords
- Academic integrity
- AIGC detection
- Assessment reform
- Engineering education
- Higher education
- Large language models
Fingerprint
Dive into the research topics of 'Trusting AI to detect AI? A systematic evaluation of the reliability and robustness of current AIGC detection tools for student academic work'. Together they form a unique fingerprint.