Abstract
Neural networks for visual content understanding have recently evolved from convolutional networks (CNNs) to transformers. The former (CNN) relies on small-windowed kernels to capture regional cues, demonstrating solid local expressiveness. In contrast, the latter (transformer) establishes long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that utilize both techniques. Current hybrids merely substitute convolutions as simple approximations of linear projection, or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former) that treats the convolutional and attention branches differently with adaptive weights. Specifically, an ASF-former encoder splits the feature channels equally into two halves to fit the dual-path inputs. Then, the outputs of the dual paths are fused with weights calculated from visual cues. We also design a compact convolutional path for efficiency. Extensive experiments on standard benchmarks show that our ASF-former outperforms its CNN, transformer, and hybrid counterparts in terms of accuracy (83.9% on ImageNet-1K) under similar conditions (12.9G MACs / 56.7M Params, without large-scale pre-training). The code is available at: https://github.com/szx503045266/ASF-former. © 2023 IEEE.
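As a rough illustration of the split-and-fuse idea summarized in the abstract, the following minimal sketch splits token features into two channel halves, sends one half through a global path and the other through a local path, and blends the two outputs with a data-dependent weight. It is not the authors' implementation (see the linked repository for that). The function names, the uniform-averaging stand-in for attention, the window-3 averaging stand-in for convolution, and the scalar sigmoid gate over pooled features are all assumptions made for illustration only.

```python
import math

def split_channels(x):
    """Split each token's channel vector into two equal halves."""
    channels = len(x[0])
    if channels % 2 != 0:
        raise ValueError("channel count must be even to split equally")
    half = channels // 2
    return [t[:half] for t in x], [t[half:] for t in x]

def global_path(tokens):
    """Hypothetical stand-in for the attention branch: every token
    receives the uniform average over all tokens."""
    n = len(tokens)
    mean = [sum(t[c] for t in tokens) / n for c in range(len(tokens[0]))]
    return [mean[:] for _ in tokens]

def local_path(tokens):
    """Hypothetical stand-in for the convolutional branch: each token
    is averaged with its immediate neighbours (window of 3, 1-D)."""
    out = []
    for i in range(len(tokens)):
        window = tokens[max(0, i - 1): i + 2]
        out.append([sum(t[c] for t in window) / len(window)
                    for c in range(len(tokens[0]))])
    return out

def gate(x, w, b=0.0):
    """Assumed gate: sigmoid of a linear map of the mean-pooled input,
    giving one fusion weight in (0, 1)."""
    n = len(x)
    pooled = [sum(t[c] for t in x) / n for c in range(len(x[0]))]
    z = sum(wi * pi for wi, pi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-z))

def split_fuse(x, w, b=0.0):
    """Split channels, run both paths, fuse with weights alpha and 1 - alpha."""
    x_global, x_local = split_channels(x)
    g = global_path(x_global)
    l = local_path(x_local)
    alpha = gate(x, w, b)
    fused = [[alpha * gv + (1.0 - alpha) * lv for gv, lv in zip(gt, lt)]
             for gt, lt in zip(g, l)]
    return fused, alpha
```

With zero gate weights, for example `split_fuse([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], [0.0] * 4)`, the gate returns 0.5 and the two paths contribute equally. Non-zero weights shift the balance toward the global or the local path depending on the input.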
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 |
| Publisher | IEEE |
| Pages | 1169-1174 |
| ISBN (Electronic) | 978-1-6654-6891-6 |
| ISBN (Print) | 978-1-6654-6892-3 |
| DOIs | |
| Publication status | Published - 2023 |
| Event | 2023 IEEE International Conference on Multimedia and Expo (ICME 2023), Brisbane Convention and Exhibition Centre, Brisbane, Australia. Duration: 10 Jul 2023 → 14 Jul 2023. https://www.2023.ieeeicme.org/ |
Publication series
| Name | Proceedings - IEEE International Conference on Multimedia and Expo |
|---|---|
| ISSN (Print) | 1945-7871 |
| ISSN (Electronic) | 1945-788X |
Conference
| Conference | 2023 IEEE International Conference on Multimedia and Expo (ICME 2023) |
|---|---|
| Place | Australia |
| City | Brisbane |
| Period | 10/07/23 → 14/07/23 |
| Internet address | |
Research Keywords
- CNN
- gating
- hybrid
- transformer
- Visual understanding