FastViT: Real-Time Linear Attention Accelerator for Dense Predictions of Vision Transformer (ViT)

Zhuoheng Ran*, Zewen Ye, Chong Wu, Ray C.C. Cheung, Hong Yan

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

2 Citations (Scopus)

Abstract

The commercial success of generative artificial intelligence (GenAI) has driven an exponential surge in demand for real-time inference in Vision Transformer (ViT) applications, including latency-sensitive domains in autonomous driving, medical imaging and computational photography. This paper introduces FastViT, a high-performance and energy-efficient hardware accelerator for emerging kernel function-based linear attention mechanisms. By leveraging cost-efficient multiplication, mixed-precision quantisation and optimised data flow, FastViT improves real-time performance for high-resolution dense prediction tasks. Compared to existing approaches, experiments demonstrate that FastViT achieves higher throughput and energy efficiency while maintaining negligible accuracy degradation and balanced resource allocation. In the future, we will improve its scalability for next-generation hardware equipped with advanced DSP cores.
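For readers unfamiliar with the term, the following is a minimal sketch of what "kernel function-based linear attention" generally refers to. It is not the FastViT design. The abstract does not name a kernel, so the feature map `phi` (a ReLU with a small offset) and all function names here are illustrative assumptions. The sketch shows only the reordering of the computation: instead of forming all N x N query-key scores, the keys and values are summarised once into a small d x d_v matrix, so the cost grows linearly with the number of tokens N.

```python
# Illustrative sketch only: generic kernel-based linear attention in pure Python.
# The feature map `phi` is an assumption, not taken from the paper.

def phi(x):
    """Hypothetical kernel feature map: ReLU plus a small offset so the normaliser stays positive."""
    return [max(v, 0.0) + 1e-6 for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(Q, K, V):
    """Reference form: computes every query-key similarity, O(N^2 * d) work."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [dot(fq, phi(k)) for k in K]      # N similarities per query
        z = sum(w)                            # normaliser for this query
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) / z
                    for j in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """Reordered form: accumulate S = sum phi(k) v^T and z = sum phi(k) once, O(N * d * d_v) work."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]        # d x d_v summary of keys and values
    z = [0.0] * d                             # running sum of phi(k)
    for k, v in zip(K, V):
        fk = phi(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out
```

Both functions compute the same result up to floating-point rounding. The second never materialises the N x N score matrix, which is the property that makes this family of attention mechanisms attractive for high-resolution dense prediction.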

© 2025 IEEE.
Original language: English
Title of host publication: IEEE ISCAS 2025 SYMPOSIUM PROCEEDINGS
Publisher: IEEE
Number of pages: 5
ISBN (Electronic): 979-8-3503-5683-0
ISBN (Print): 979-8-3503-5684-7
Publication status: Published - 2025
Event: 2025 IEEE International Symposium on Circuits and Systems - London, United Kingdom
Duration: 25 May 2025 - 28 May 2025
https://2025.ieee-iscas.org/

Publication series

ISSN (Print): 0271-4302
ISSN (Electronic): 2158-1525

Conference

Conference: 2025 IEEE International Symposium on Circuits and Systems
Abbreviated title: ISCAS 2025
Place: United Kingdom
City: London
Period: 25/05/25 - 28/05/25

Funding

This work is supported by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA), the Hong Kong Research Grants Council (Project 11204821), and City University of Hong Kong (Project 9610460).

Research Keywords

  • vision transformer (ViT)
  • mixed-precision quantisation
  • kernel-based linear attention
  • hardware acceleration

RGC Funding Information

  • RGC-funded
