
Hardware-Friendly 3-D CNN Acceleration With Balanced Kernel Group Sparsity

  • Mengshu Sun
  • Kaidi Xu
  • Xue Lin
  • Yongli Hu
  • Baocai Yin*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Because they can extract more information than 2-D convolutional neural networks (CNNs), 3-D CNNs play a vital role in video analysis tasks such as human action recognition. However, their massive number of operations hinders real-time execution on edge devices with limited computation and memory resources. Although various model compression techniques have been applied to accelerate 2-D CNNs, few efforts have investigated hardware-friendly pruning of 3-D CNNs and their acceleration on customizable edge platforms such as FPGAs. This work first proposes a kernel group row-column (KGRC) weight sparsity pattern. The pattern is fine-grained, which allows high pruning ratios with negligible accuracy loss, and balanced across kernel groups, which enables high computation parallelism on hardware. A reweighted pruning algorithm for this sparsity is then presented and applied to 3-D CNNs, followed by quantization at different precisions. Alongside model compression, FPGA-based accelerators with four modes are designed to support kernel group sparsity in multiple dimensions. The co-design framework of the pruning algorithm and the accelerator is evaluated on two representative 3-D CNNs, C3D and R(2+1)D, on the Xilinx ZCU102 FPGA platform for action recognition. The experimental results indicate that the accelerator implementation with KGRC sparsity and 8-bit quantization strikes a good balance between speedup and model accuracy, achieving acceleration ratios of 4.12× for C3D and 3.85× for R(2+1)D compared with 16-bit baseline designs that support only dense models. © 2024 IEEE.
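The abstract names the KGRC pattern but does not define it; the exact row-column layout is given in the full paper. As a rough illustration of why balance across kernel groups matters for parallel hardware, the following Python sketch prunes a flattened weight matrix group by group so that every kernel group keeps the same number of nonzero columns. Everything here is an assumption for illustration, not the authors' method: the flattening layout (one row per output filter, one column per input-channel and kernel position), the function names `balanced_group_column_prune` and `kept_columns_per_group`, the column L2-norm criterion, and the use of one-shot magnitude pruning in place of the paper's reweighted pruning algorithm.

```python
"""Hypothetical sketch of balanced, group-wise structured pruning.

Assumptions (not taken from the paper):
- a 3-D conv layer's weights are flattened into a matrix with one row per
  output filter and one column per (in_channel, kt, kh, kw) position;
- filters are split into consecutive groups of `group_size` rows;
- inside each group, the columns with the smallest L2 norm (over that
  group's rows) are zeroed, and every group keeps the same column count.
"""
import math
from typing import List

Matrix = List[List[float]]


def balanced_group_column_prune(w: Matrix, group_size: int, keep_ratio: float) -> Matrix:
    """Return a pruned copy of `w`; the input matrix is left unchanged."""
    rows, cols = len(w), len(w[0])
    if rows % group_size != 0:
        raise ValueError("number of filters must be divisible by group_size")
    # Same column budget for every group: this is what keeps the groups balanced.
    keep = min(cols, max(1, round(cols * keep_ratio)))
    pruned = [row[:] for row in w]
    for start in range(0, rows, group_size):
        group = range(start, start + group_size)
        # L2 norm of each column, restricted to this group's rows.
        norms = [math.sqrt(sum(w[r][c] ** 2 for r in group)) for c in range(cols)]
        # Keep the `keep` strongest columns; ties are broken by lower column index.
        kept = set(sorted(range(cols), key=lambda c: (-norms[c], c))[:keep])
        for r in group:
            for c in range(cols):
                if c not in kept:
                    pruned[r][c] = 0.0
    return pruned


def kept_columns_per_group(w: Matrix, group_size: int) -> List[int]:
    """Count the columns that still hold any nonzero weight in each group."""
    counts = []
    for start in range(0, len(w), group_size):
        group = range(start, start + group_size)
        counts.append(sum(1 for c in range(len(w[0])) if any(w[r][c] != 0.0 for r in group)))
    return counts


# Usage: 4 filters split into 2 groups of 2, keeping half of the 4 columns per group.
w = [
    [0.9, 0.1, 0.5, 0.0],
    [0.8, 0.2, 0.4, 0.1],
    [0.1, 0.7, 0.0, 0.6],
    [0.2, 0.9, 0.1, 0.5],
]
pruned = balanced_group_column_prune(w, group_size=2, keep_ratio=0.5)
print(pruned)
print(kept_columns_per_group(pruned, group_size=2))
```

Each group ends up with the same number of surviving columns (here 2 of 4), even though the two groups keep different columns. If a hypothetical accelerator assigned one kernel group to each processing lane, every lane would receive equal work; an unbalanced scheme that let one group keep more columns than another would leave some lanes idle while the slowest one finishes.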
Original language: English
Pages (from-to): 3027-3040
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume: 43
Issue number: 10
Online published: 16 Apr 2024
Publication status: Published - Oct 2024
Externally published: Yes

Funding

Manuscript received 11 November 2023; revised 1 March 2024 and 10 April 2024; accepted 12 April 2024. Date of publication 16 April 2024; date of current version 20 September 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 62376014, Grant U21B2038, and Grant U19B2039; in part by the National Key Research and Development Program of China under Grant 2021ZD0111902; and in part by the Research and Development Program of Beijing Municipal Education Commission under Grant KZ202210005008. This article was recommended by Associate Editor M. D. Santambrogio. (Corresponding author: Baocai Yin.) Mengshu Sun, Yongli Hu, and Baocai Yin are with the Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Institute of Artificial Intelligence, and the Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.

Research Keywords

  • 3-D convolutional neural network (CNN)
  • edge device inference
  • FPGA
  • model compression
  • weight pruning

