
Spatial-Channel Token Distillation for Vision MLPs

Yanxi Li, Xinghao Chen, Minjing Dong, Yehui Tang, Yunhe Wang*, Chang Xu*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Recently, neural architectures built entirely from Multi-layer Perceptrons (MLPs) have attracted great research interest from the computer vision community. However, the inefficient mixing of spatial-channel information causes MLP-like vision models to demand tremendous pre-training on large-scale datasets. This work addresses the problem from a novel knowledge distillation perspective. We propose a novel Spatial-channel Token Distillation (STD) method, which improves the information mixing in the two dimensions by introducing distillation tokens to each of them. A mutual information regularization is further introduced to make the distillation tokens focus on their specific dimensions and maximize the performance gain. Extensive experiments on ImageNet with several MLP-like architectures demonstrate that the proposed token distillation mechanism can efficiently improve accuracy. For example, the proposed STD boosts the top-1 accuracy of Mixer-S16 on ImageNet from 73.8% to 75.7% without any costly pre-training on JFT-300M. When applied to stronger architectures, e.g., CycleMLP-B1 and CycleMLP-B2, STD still harvests about 1.1% and 0.5% accuracy gains, respectively. Copyright 2022 by the author(s).
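To make the core idea concrete, the sketch below illustrates how per-dimension distillation tokens could be attached to a Mixer-style block: one extra token appended along the spatial axis before token mixing, and one appended along the channel axis before channel mixing. This is a minimal numpy illustration under our own assumptions, not the authors' implementation; the token names, random weights, and the tanh activation are ours, and the paper's teacher losses and mutual information regularization are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

S, C, H = 4, 8, 16  # spatial tokens, channels, hidden width of the mixing MLPs

def mlp(x, w1, w2):
    # Two-layer MLP; tanh stands in for GELU to keep the sketch dependency-free.
    return np.tanh(x @ w1) @ w2

# Hypothetical learned distillation tokens (illustrative, not the paper's code):
spatial_dist_token = rng.standard_normal((1, C))      # one extra "row" token
channel_dist_token = rng.standard_normal((S + 1, 1))  # one extra "column" token

x = rng.standard_normal((S, C))  # patch embeddings of one image

# 1) Append the spatial distillation token, then mix across the spatial axis.
x = np.concatenate([x, spatial_dist_token], axis=0)   # (S+1, C)
w1s = rng.standard_normal((S + 1, H))
w2s = rng.standard_normal((H, S + 1))
x = x + mlp(x.T, w1s, w2s).T                          # token-mixing MLP, residual

# 2) Append the channel distillation token, then mix across the channel axis.
x = np.concatenate([x, channel_dist_token], axis=1)   # (S+1, C+1)
w1c = rng.standard_normal((C + 1, H))
w2c = rng.standard_normal((H, C + 1))
x = x + mlp(x, w1c, w2c)                              # channel-mixing MLP, residual

print(x.shape)  # (5, 9): original 4x8 grid plus one token per dimension
```

In the full method, the extra row and column would be supervised by a teacher network so that each distillation token absorbs knowledge specific to its own mixing dimension.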

Original language: English
Title of host publication: Proceedings of the 39th International Conference on Machine Learning
Editors: Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato
Publisher: ML Research Press
Pages: 12685-12695
Publication status: Published - Jul 2022
Externally published: Yes
Event: 39th International Conference on Machine Learning (ICML 2022) - Hybrid, Baltimore, United States
Duration: 17 Jul 2022 - 23 Jul 2022
https://icml.cc/virtual/2022/index.html
https://icml.cc/Conferences/2022
https://proceedings.mlr.press/v162/

Publication series

Name: Proceedings of Machine Learning Research
Volume: 162
ISSN (Electronic): 2640-3498

Conference

Conference: 39th International Conference on Machine Learning (ICML 2022)
Place: United States
City: Baltimore
Period: 17/07/22 - 23/07/22

Funding

The authors would like to thank the area chairs and the reviewers for their constructive comments. This work was supported in part by the Australian Research Council under Project DP210101859 and the University of Sydney SOAR Prize.
