Abstract
AI applications powered by deep learning are increasingly running on edge devices. Meanwhile, many real-world IoT applications demand that multiple real-time tasks run on the same device, for example, to perform both object tracking and image segmentation simultaneously on augmented reality glasses. However, current solutions cannot yet support such multi-tenant real-time DNN inference on edge devices. Techniques such as on-device model compression trade inference accuracy for speed, while traditional DNN compilers focus mainly on single-tenant DNN model optimization. To fill this gap, we propose Aaron, which leverages DNN compilation techniques to accelerate multi-DNN inference on edge GPUs through compile-time kernel adaptation with no accuracy loss. Aaron integrates both DNN graph and kernel optimization to maximize on-device parallelism and minimize the contention caused by concurrent inference. © 2022 Owner/Author.
| Original language | English |
|---|---|
| Title of host publication | SenSys 2022 - Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems |
| Publisher | Association for Computing Machinery |
| Pages | 802-803 |
| ISBN (Print) | 9781450398862 |
| DOIs | |
| Publication status | Published - Nov 2022 |
| Event | 20th ACM Conference on Embedded Networked Sensor Systems (SenSys 2022), Hynes Convention Center, Boston, United States. Duration: 6 Nov 2022 → 9 Nov 2022. https://sensys.acm.org/2022/ |
Publication series
| Name | SenSys - Proceedings of the ACM Conference on Embedded Networked Sensor Systems |
|---|
Conference
| Conference | 20th ACM Conference on Embedded Networked Sensor Systems (SenSys 2022) |
|---|---|
| Place | United States |
| City | Boston |
| Period | 6/11/22 → 9/11/22 |
| Internet address | https://sensys.acm.org/2022/ |
Funding
The work described in this article was partially supported by the Research Grants Council of Hong Kong under Grant No. 14203420, and by the Centre for Perceptual and Interactive Intelligence (CPII) Ltd under the Innovation and Technology Fund.
Research Keywords
- DNN compiler
- efficient DNN processing
- real-time system
RGC Funding Information
- RGC-funded