
An Image Patch is a Wave: Phase-Aware Vision MLP

  • Yehui Tang
  • Kai Han
  • Jianyuan Guo
  • Chang Xu
  • Yanxi Li
  • Chao Xu
  • Yunhe Wang*

* Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

Abstract

In the field of computer vision, recent works show that a pure MLP architecture built mainly by stacking fully-connected layers can achieve performance competitive with CNNs and transformers. The input image of a vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in the MLP. Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp. © 2022 IEEE.
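
The wave-like token representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repositories linked above for that). It assumes each token is a wave amplitude * exp(i * phase), that Euler's formula unfolds the wave into a cosine part and a sine part, and that each part is aggregated across tokens by its own fixed weight matrix. The function name wave_token_mixing, the weight names, and the linear map used to produce the phase are hypothetical choices made for illustration.

```python
import numpy as np


def wave_token_mixing(amplitude, phase, w_real, w_imag):
    """Aggregate tokens represented as waves amplitude * exp(i * phase).

    amplitude, phase: arrays of shape (num_tokens, channels)
    w_real, w_imag:   fixed token-mixing weights, shape (num_tokens, num_tokens)
    """
    # Euler's formula: amplitude * exp(i * phase)
    #   = amplitude * cos(phase) + i * amplitude * sin(phase)
    real_part = amplitude * np.cos(phase)
    imag_part = amplitude * np.sin(phase)
    # Assumption: each part is mixed across tokens with its own fixed weights,
    # and the two results are summed into a real-valued output.
    return w_real @ real_part + w_imag @ imag_part


rng = np.random.default_rng(0)
num_tokens, channels = 4, 8
amplitude = rng.standard_normal((num_tokens, channels))
w_real = rng.standard_normal((num_tokens, num_tokens))
w_imag = rng.standard_normal((num_tokens, num_tokens))

# With zero phase every token is purely real, so the mixing reduces to
# plain fixed-weight aggregation of the original features.
out_static = wave_token_mixing(amplitude, np.zeros_like(amplitude), w_real, w_imag)

# Hypothetical content-dependent phase: a linear map of the token features.
w_phase = rng.standard_normal((channels, channels))
phase = amplitude @ w_phase
out_dynamic = wave_token_mixing(amplitude, phase, w_real, w_imag)
```

In this sketch the mixing weights stay fixed, but because the phase is computed from the token features, the effective contribution of each token to the aggregate changes with the input, which is the modulation effect the abstract attributes to the phase term.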
Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Place of Publication: Los Alamitos, Calif.
Publisher: IEEE Computer Society
Pages: 10925-10934
ISBN (Electronic): 9781665469463
ISBN (Print): 9781665469470
Publication status: Published - 2022
Externally published: Yes
Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) - Hybrid, New Orleans, United States
Duration: 19 Jun 2022 to 24 Jun 2022
https://cvpr2022.thecvf.com/

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2022-June
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
Place: United States
City: New Orleans
Period: 19/06/22 to 24/06/22

Funding

This work is supported by the National Natural Science Foundation of China under Grant No. 61876007, the Australian Research Council under Project DP210101859, and the University of Sydney SOAR Prize.

Research Keywords

  • Deep learning architectures and techniques
  • Efficient learning and inferences
  • Machine learning
  • Recognition: detection, categorization, retrieval
  • Representation learning
