Abstract
In computer vision, recent work shows that a pure MLP architecture, built mainly by stacking fully-connected layers, can achieve performance competitive with CNNs and transformers. The input image of a vision MLP is usually split into multiple tokens (patches), but existing MLP models aggregate them directly with fixed weights, neglecting the varying semantic information of tokens from different images. To aggregate tokens dynamically, we propose to represent each token as a wave function with two parts, amplitude and phase. The amplitude is the original feature, and the phase term is a complex value that changes according to the semantic content of the input image. Introducing the phase term dynamically modulates the relationship between tokens and the fixed weights in the MLP. Based on this wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp. © 2022 IEEE.
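The effect of the phase term on aggregation with fixed weights can be illustrated with a small sketch. This is not the authors' implementation (see the repositories linked in the abstract for that). It assumes each channel of a token is written as amplitude times exp(i · phase), and it sets the phases by hand, whereas the abstract describes them as depending on the image content. The helper names `to_wave` and `aggregate` are hypothetical.

```python
import cmath
import math


def to_wave(amplitude, phase):
    """Assumed wave form: each channel becomes amplitude * exp(i * phase)."""
    return [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]


def aggregate(weights, wave_tokens):
    """Mix tokens with fixed real weights: out[j] = sum_k weights[j][k] * token_k."""
    n_channels = len(wave_tokens[0])
    return [
        [sum(w * tok[c] for w, tok in zip(row, wave_tokens)) for c in range(n_channels)]
        for row in weights
    ]


# Two single-channel tokens with identical amplitudes (illustrative values).
amplitudes = [[1.0], [1.0]]
weights = [[0.5, 0.5]]  # fixed token-mixing weights producing one output token

# Same amplitudes, different hand-picked phases.
in_phase = [to_wave(a, p) for a, p in zip(amplitudes, [[0.0], [0.0]])]
out_of_phase = [to_wave(a, p) for a, p in zip(amplitudes, [[0.0], [math.pi]])]

# Magnitude of the aggregated token in each case.
constructive = abs(aggregate(weights, in_phase)[0][0])
destructive = abs(aggregate(weights, out_of_phase)[0][0])

print(constructive, destructive)
```

With the weights held fixed, tokens that share a phase reinforce each other, while tokens with opposite phases nearly cancel. In this toy setting the phase alone changes how much each token contributes to the aggregate.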
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
| Place of Publication | Los Alamitos, Calif. |
| Publisher | IEEE Computer Society |
| Pages | 10925-10934 |
| ISBN (Electronic) | 9781665469463 |
| ISBN (Print) | 9781665469470 |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), Hybrid, New Orleans, United States. Duration: 19 Jun 2022 → 24 Jun 2022. https://cvpr2022.thecvf.com/ |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| Volume | 2022-June |
| ISSN (Print) | 1063-6919 |
| ISSN (Electronic) | 2575-7075 |
Conference
| Conference | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) |
|---|---|
| Place | United States |
| City | New Orleans |
| Period | 19/06/22 → 24/06/22 |
| Internet address | https://cvpr2022.thecvf.com/ |
Funding
This work is supported by National Natural Science Foundation of China under Grant No.61876007, Australian Research Council under Project DP210101859 and the University of Sydney SOAR Prize.
Research Keywords
- Deep learning architectures and techniques
- Efficient learning and inferences
- Machine learning
- Recognition: detection, categorization, retrieval
- Representation learning
Fingerprint
Dive into the research topics of 'An Image Patch is a Wave: Phase-Aware Vision MLP'. Together they form a unique fingerprint.