
DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference

Chong Wu (Co-first Author), Jiawang Cao (Co-first Author), Renjie Xu (Co-first Author), Zhuoheng Ran, Maolin Che*, Wenbo Zhu, Hong Yan

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

This paper proposes the Dual-Stage Sparse Attention (DuSA) mechanism for accelerating attention in transformers. In the first stage, DuSA performs intra-block sparse attention to aggregate local inductive biases; in the second stage, it performs inter-block sparse attention to capture long-range dependencies. Both stages have low computational complexity and can be further accelerated directly by memory-efficient attention mechanisms, which makes DuSA faster than several highly optimized attention mechanisms. The dual-stage design approximates vanilla scaled dot-product attention with lower error than basic single-stage sparse attention mechanisms, and advances those basic mechanisms to match or even outperform vanilla scaled dot-product attention. Even in plug-and-play settings, DuSA maintains a low performance loss. DuSA can be used to accelerate both training and inference. It achieves leading performance across different benchmarks, including Long Range Arena, image classification, semantic segmentation, object detection, text-to-video generation, and long-context understanding, and accelerates models of different sizes.
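The two-stage structure described in the abstract can be pictured with a short sketch. The PyTorch snippet below is an illustrative approximation only, not the authors' implementation: the block size, the mean-pooled inter-block summaries, and the additive fusion of the two stages are assumptions made here to show how an intra-block stage (attention restricted to each block) and an inter-block stage (attention over one summary per block) can each run in sub-quadratic time.

# Hypothetical sketch of a dual-stage block-sparse attention pass.
# Block size, the pooling used for inter-block summaries, and the way the two
# stages are combined are illustrative assumptions, not the paper's definitions.
import torch
import torch.nn.functional as F


def dual_stage_sparse_attention(q, k, v, block_size=64):
    """q, k, v: (batch, heads, seq_len, dim); seq_len assumed divisible by block_size."""
    b, h, n, d = q.shape
    nb = n // block_size

    # Stage 1: intra-block attention -- each query attends only to keys in its
    # own block, aggregating local inductive biases at O(n * block_size) cost.
    qb = q.reshape(b, h, nb, block_size, d)
    kb = k.reshape(b, h, nb, block_size, d)
    vb = v.reshape(b, h, nb, block_size, d)
    local = F.scaled_dot_product_attention(qb, kb, vb)        # (b, h, nb, block_size, d)

    # Stage 2: inter-block attention -- each query attends to one summary
    # (here: mean-pooled) key/value per block, capturing long-range
    # dependencies at O(n * n / block_size) cost.
    k_sum = kb.mean(dim=3)                                    # (b, h, nb, d)
    v_sum = vb.mean(dim=3)
    global_ = F.scaled_dot_product_attention(q, k_sum, v_sum) # (b, h, n, d)

    # Combine the two stages (a simple sum here; the actual fusion rule is an assumption).
    return local.reshape(b, h, n, d) + global_

With sequence length n and block size b, the first stage costs O(n·b) and the second O(n²/b), versus O(n²) for full attention; since both stages reduce to dense calls to scaled_dot_product_attention, memory-efficient kernels can accelerate them directly, which is the property the abstract highlights.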
Original language: English
Title of host publication: The Thirty-ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025
Number of pages: 27
Publication status: Published - Dec 2025
Event: 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) - San Diego, United States
Duration: 2 Dec 2025 - 7 Dec 2025
https://neurips.cc/Conferences/2025

Conference

Conference: 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
Abbreviated title: NeurIPS 2025
Place: United States
City: San Diego
Period: 2/12/25 - 7/12/25
Internet address: https://neurips.cc/Conferences/2025

Funding

This work is supported by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA), the Institute of Digital Medicine, City University of Hong Kong (Projects 9229503 and 9610460), the National Natural Science Foundation of China (No. 12561095), and the Special Posts of Guizhou University (No. [2025]06).

Research Keywords

  • Efficient Attention Mechanism
  • Sparse Attention Mechanism
  • Transformer
