Aaron: Compile-time Kernel Adaptation for Multi-DNN Inference Acceleration on Edge GPU

Zhihe Zhao, Neiwen Ling, Nan Guan, Guoliang Xing*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

4 Citations (Scopus)

Abstract

Poster Abstract:
AI applications powered by deep learning are increasingly running on edge devices. Meanwhile, many real-world IoT applications demand multiple real-time tasks to run on the same device, for example, to achieve both object tracking and image segmentation simultaneously on a pair of augmented reality glasses. However, current solutions cannot yet support such multi-tenant real-time DNN inference on edge devices. Techniques such as on-device model compression trade inference accuracy for speed, while traditional DNN compilers mainly focus on single-tenant DNN model optimization. To fill this gap, we propose Aaron, which leverages DNN compiling techniques to accelerate multi-DNN inference on edge GPUs via compile-time kernel adaptation with no accuracy loss. Aaron integrates both DNN graph and kernel optimization to maximize on-device parallelism and minimize contention brought by concurrent inference. © 2022 Owner/Author.
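The abstract's core observation, that overlapping kernels from different DNNs can shorten end-to-end latency while contention on shared execution units caps the gain, can be illustrated with a toy list-scheduling model. This is only a conceptual sketch, not Aaron's actual compile-time algorithm: the kernel durations and the two-unit "GPU" are hypothetical.

```python
def sequential_makespan(models):
    # Baseline: every kernel of every model runs back-to-back on one queue.
    return sum(sum(m) for m in models)

def concurrent_makespan(models, num_units=2):
    # Greedy scheduler: place each model's next kernel on the earliest-free
    # execution unit (a stand-in for a GPU stream), keeping kernels within
    # a model in order. With fewer units than models, kernels contend.
    unit_free = [0.0] * num_units          # time each unit becomes idle
    model_ready = [0.0] * len(models)      # time each model's next kernel may start
    remaining = [list(m) for m in models]  # per-model queues of kernel durations
    finish = 0.0
    while any(remaining):
        # Pick the model whose next kernel can start soonest.
        i = min((j for j in range(len(models)) if remaining[j]),
                key=lambda j: model_ready[j])
        u = min(range(num_units), key=lambda k: unit_free[k])
        start = max(model_ready[i], unit_free[u])
        end = start + remaining[i].pop(0)
        unit_free[u] = end
        model_ready[i] = end
        finish = max(finish, end)
    return finish

# Two toy DNNs with per-kernel run times (arbitrary units):
models = [[3, 2, 4], [1, 5, 2]]
print(sequential_makespan(models))        # 17
print(concurrent_makespan(models))        # 9
```

With two units the two models overlap almost perfectly (makespan 9 vs. 17), but shrinking `num_units` to 1 recovers the sequential makespan, which is the contention effect the paper's kernel adaptation aims to minimize.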
Original language: English
Title of host publication: SenSys 2022 - Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems
Publisher: Association for Computing Machinery
Pages: 802-803
ISBN (Print): 9781450398862
DOIs
Publication status: Published - Nov 2022
Event: 20th ACM Conference on Embedded Networked Sensor Systems (SenSys 2022) - Hynes Convention Center, Boston, United States
Duration: 6 Nov 2022 - 9 Nov 2022
https://sensys.acm.org/2022/

Publication series

Name: SenSys - Proceedings of the ACM Conference on Embedded Networked Sensor Systems

Conference

Conference: 20th ACM Conference on Embedded Networked Sensor Systems (SenSys 2022)
Place: United States
City: Boston
Period: 6/11/22 - 9/11/22

Funding

The work described in this article was partially supported by the Research Grants Council of Hong Kong under Grant No. 14203420, and by the Centre for Perceptual and Interactive Intelligence (CPII) Ltd under the Innovation and Technology Fund.

Research Keywords

  • DNN compiler
  • efficient DNN processing
  • real-time system

RGC Funding Information

  • RGC-funded
