Adaptive Split-Fusion Transformer

Zixuan Su, Jingjing Chen*, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

3 Citations (Scopus)

Abstract

Neural networks for visual content understanding have recently evolved from convolutional neural networks (CNNs) to transformers. The former relies on small-windowed kernels to capture regional cues, demonstrating solid local expressiveness. In contrast, the latter establishes long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that utilize both techniques. Current hybrids merely use convolutions as simple approximations of linear projections, or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former) that weights the convolutional and attention branches differently using adaptive weights. Specifically, an ASF-former encoder splits the feature channels into two equal halves to fit the dual-path inputs. The outputs of the two paths are then fused with weights calculated from visual cues. We also design a compact convolutional path for efficiency. Extensive experiments on standard benchmarks show that our ASF-former outperforms its CNN, transformer, and hybrid counterparts in terms of accuracy (83.9% on ImageNet-1K) under similar conditions (12.9G MACs / 56.7M Params, without large-scale pre-training). The code is available at: https://github.com/szx503045266/ASF-former. © 2023 IEEE.
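The split-and-fuse step described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The function names, the parameter dictionary, the 1-D depthwise convolution over a token sequence, the single attention head, and the sigmoid gate computed from average-pooled features are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_path(x, wq, wk, wv):
    """Single-head self-attention over all tokens (global mixing)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def conv_path(x, kernel, w_out):
    """Depthwise 1-D convolution along the token axis (local mixing).

    Assumes an odd kernel size so that zero padding keeps the token count.
    """
    n = x.shape[0]
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    local = np.stack([(xp[i:i + k] * kernel).sum(axis=0) for i in range(n)])
    return local @ w_out

def split_fusion_block(x, params):
    """Split channels in half, run both paths, fuse with a learned gate."""
    half = x.shape[1] // 2
    x_conv, x_attn = x[:, :half], x[:, half:]
    conv_out = conv_path(x_conv, params["kernel"], params["w_conv"])
    attn_out = attention_path(x_attn, params["wq"], params["wk"], params["wv"])
    # Assumed gate: one scalar in (0, 1) computed from average-pooled features.
    alpha = sigmoid(x.mean(axis=0) @ params["w_gate"])
    fused = alpha * attn_out + (1.0 - alpha) * conv_out
    return fused, alpha

def init_params(c, k, rng, scale=0.1):
    """Random weights for illustration; c is the channel count, k the kernel size."""
    half = c // 2
    return {
        "kernel": rng.standard_normal((k, half)) * scale,
        "w_conv": rng.standard_normal((half, c)) * scale,
        "wq": rng.standard_normal((half, half)) * scale,
        "wk": rng.standard_normal((half, half)) * scale,
        "wv": rng.standard_normal((half, c)) * scale,
        "w_gate": rng.standard_normal(c) * scale,
    }

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))  # 16 tokens, 8 channels
fused, alpha = split_fusion_block(tokens, init_params(8, 3, rng))
print(fused.shape, float(alpha))
```

In this sketch each path projects its half of the channels back to the full width so that the fused output has the same shape as the input. A gate value near 1 favours the attention (global) path and a value near 0 favours the convolutional (local) path.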
Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE
Pages: 1169-1174
ISBN (Electronic): 9781665468916
ISBN (Print): 978-1-6654-6892-3
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Multimedia and Expo (ICME 2023) - Brisbane Convention and Exhibition Centre, Brisbane, Australia
Duration: 10 Jul 2023 to 14 Jul 2023
https://www.2023.ieeeicme.org/

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2023 IEEE International Conference on Multimedia and Expo (ICME 2023)
Place: Australia
City: Brisbane
Period: 10/07/23 to 14/07/23

Research Keywords

  • CNN
  • gating
  • hybrid
  • transformer
  • Visual understanding
