Toward Effective Knowledge Distillation: Navigating Beyond Small-data Pitfall

  • Zhiwei Hao (Co-first Author)
  • Jianyuan Guo (Co-first Author)
  • Kai Han
  • Han Hu*
  • Chang Xu
  • Yunhe Wang

*Corresponding author for this work

Research output: Journal Publications and Reviews > RGC 21 - Publication in refereed journal > peer-review

Abstract

The spectacular success of training large models on extensive datasets highlights the potential of scaling up for exceptional performance. To deploy these models on edge devices, knowledge distillation (KD) is commonly used to create a compact model from a larger, pretrained teacher model. However, as models and datasets rapidly scale up in practical applications, it is crucial to consider the applicability of existing KD approaches originally designed for limited-capacity architectures and small-scale datasets. In this paper, we revisit current KD methods and identify the presence of a small-data pitfall, where most modifications to vanilla KD prove ineffective on large-scale datasets. To guide the design of consistently effective KD methods across different data scales, we conduct a meticulous evaluation of the knowledge transfer process. Our findings reveal that incorporating more useful information is crucial for achieving consistently effective KD methods, while modifications in loss functions show relatively less significance. In light of this, we present a paradigmatic example that combines vanilla KD with deep supervision, incorporating additional information into the student during distillation. This approach surpasses almost all recent KD methods. We believe our study will offer valuable insights to guide the community in navigating beyond the small-data pitfall and toward consistently effective KD.

© 2025 IEEE. All rights reserved.
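To make the idea of combining vanilla KD with deep supervision more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used form of vanilla KD (a KL divergence between temperature-softened teacher and student distributions, scaled by T^2, mixed with cross-entropy on the label), and it assumes that deep supervision means attaching auxiliary classifier heads to intermediate student stages and applying the same loss to each head. The abstract does not specify these details. The function names and the parameters `alpha`, `aux_weight`, and `temperature` are hypothetical and chosen for illustration only.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def cross_entropy(logits, label):
    """Standard cross-entropy between the logits and a hard label index."""
    return -log_softmax(logits)[label]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Vanilla KD term: T^2 * KL(teacher || student) on softened outputs."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def deeply_supervised_kd_loss(final_logits, aux_logits_list, teacher_logits,
                              label, alpha=0.5, aux_weight=0.5,
                              temperature=4.0):
    """Assumed combination: every student head (final and auxiliary)
    receives cross-entropy plus the vanilla KD term; auxiliary heads
    are down-weighted by the hypothetical factor aux_weight."""

    def branch(logits):
        hard = cross_entropy(logits, label)
        soft = kd_loss(logits, teacher_logits, temperature)
        return (1.0 - alpha) * hard + alpha * soft

    total = branch(final_logits)
    for aux_logits in aux_logits_list:
        total += aux_weight * branch(aux_logits)
    return total


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.5]   # logits from a pretrained teacher (made up)
    student = [1.0, 2.0, 3.0]   # final student head (made up)
    aux = [[0.2, 0.1, 0.4]]     # one intermediate student head (made up)
    print(deeply_supervised_kd_loss(student, aux, teacher, label=2))
```

In this sketch the loss function itself stays the vanilla one; the only change is that additional student outputs are exposed to the teacher signal, which is one plausible reading of "incorporating additional information into the student during distillation".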
Original language: English
Pages (from-to): 542-556
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 48
Issue number: 1
Online published: 9 Sept 2025
Publication status: Published - Jan 2026

Funding

This work was supported by the Joint Funds of the National Natural Science Foundation of China (NSFC) under Grant U2336211 and in part by the Major Research Plan of the NSFC under Grant 92467206.

Research Keywords

  • computer vision
  • deep supervision
  • dense prediction
  • knowledge distillation
