Spatial-Channel Token Distillation for Vision MLPs

Abstract
Recently, all-MLP (Multi-Layer Perceptron) neural architectures have attracted great research interest from the computer vision community. However, inefficient mixing of spatial and channel information causes MLP-like vision models to demand tremendous pre-training on large-scale datasets. This work addresses the problem from a novel knowledge distillation perspective. We propose Spatial-channel Token Distillation (STD), which improves information mixing in the two dimensions by introducing a distillation token into each of them. A mutual information regularization is further introduced to let the distillation tokens focus on their specific dimensions and maximize the performance gain. Extensive experiments on ImageNet with several MLP-like architectures demonstrate that the proposed token distillation mechanism efficiently improves accuracy. For example, STD boosts the top-1 accuracy of Mixer-S16 on ImageNet from 73.8% to 75.7% without any costly pre-training on JFT-300M. When applied to stronger architectures, e.g. CycleMLP-B1 and CycleMLP-B2, STD still harvests about 1.1% and 0.5% accuracy gains, respectively.
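The core idea the abstract describes — attaching a learnable distillation token to each mixing dimension of an MLP-like block — can be sketched roughly as follows. This is a minimal NumPy illustration based only on the abstract; the shapes, variable names, and the plain linear stand-ins for the token-mixing and channel-mixing MLPs are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

S, C = 196, 512              # spatial tokens (patches) and channels, Mixer-S16-like
X = rng.normal(size=(S, C))  # token matrix for one image

# Hypothetical learnable distillation tokens, one per mixing dimension:
# a spatial token appended as an extra row (an extra "patch"), and
# a channel token appended as an extra column (an extra channel).
spatial_tok = rng.normal(size=(1, C))
channel_tok = rng.normal(size=(S + 1, 1))

X = np.concatenate([X, spatial_tok], axis=0)  # shape (S+1, C)
X = np.concatenate([X, channel_tok], axis=1)  # shape (S+1, C+1)

# Plain linear mixing layers as stand-ins for the MLP blocks:
W_spatial = rng.normal(size=(S + 1, S + 1)) / np.sqrt(S + 1)
W_channel = rng.normal(size=(C + 1, C + 1)) / np.sqrt(C + 1)

X = W_spatial @ X  # mix across the spatial dimension (tokens interact)
X = X @ W_channel  # mix across the channel dimension

# After mixing, the appended row/column have aggregated information from
# their respective dimensions; in STD they would receive the distillation
# supervision (e.g. matching a teacher's predictions).
spatial_out = X[-1, :-1]  # (C,) summary gathered along the spatial dimension
channel_out = X[:-1, -1]  # (S,) summary gathered along the channel dimension
print(spatial_out.shape, channel_out.shape)
```

The mutual information regularization mentioned in the abstract would then act on these two outputs, pushing each token to specialize in its own dimension; its exact form is not given here.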
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 39th International Conference on Machine Learning |
| Editors | Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato |
| Publisher | ML Research Press |
| Pages | 12685-12695 |
| Publication status | Published - Jul 2022 |
| Externally published | Yes |
| Event | 39th International Conference on Machine Learning (ICML 2022) - Hybrid, Baltimore, United States |
| Duration | 17 Jul 2022 → 23 Jul 2022 |
| Internet addresses | https://icml.cc/virtual/2022/index.html, https://icml.cc/Conferences/2022, https://proceedings.mlr.press/v162/ |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Volume | 162 |
| ISSN (Electronic) | 2640-3498 |
Conference
| Conference | 39th International Conference on Machine Learning (ICML 2022) |
|---|---|
| Place | United States |
| City | Baltimore |
| Period | 17/07/22 → 23/07/22 |
| Internet address | |
Funding
The authors would like to thank the area chairs and the reviewers for their constructive comments. This work was supported in part by the Australian Research Council under Project DP210101859 and the University of Sydney SOAR Prize.
Fingerprint
Dive into the research topics of 'Spatial-Channel Token Distillation for Vision MLPs'. Together they form a unique fingerprint.