TY - GEN
T1 - Delay-aware DNN inference throughput maximization in edge computing via jointly exploring partitioning and parallelism
AU - Li, Jing
AU - Liang, Weifa
AU - Li, Yuchen
AU - Xu, Zichuan
AU - Jia, Xiaohua
PY - 2021
Y1 - 2021
AB - Mobile Edge Computing (MEC) has emerged as a promising paradigm for coping with the explosive growth of mobile applications by offloading compute-intensive tasks to an MEC network for processing. The surge of deep learning brings new vigor and vitality to the prospect of an intelligent Internet of Things (IoT), and edge intelligence has arisen to provision real-time deep neural network (DNN) inference services for users. To accelerate the processing of the DNN inference of a request in an MEC network, the DNN inference model usually can be partitioned into two connected parts: one part is processed on the local IoT device of the request, and the other part is processed on a cloudlet (server) in the MEC network. The DNN inference can be further accelerated by allocating multiple threads on the cloudlet to which the request is assigned. In this paper, we study a novel delay-aware DNN inference throughput maximization problem with the aim of maximizing the number of delay-aware DNN service requests admitted, by accelerating each DNN inference through jointly exploring DNN model partitioning and multi-thread parallelism of DNN inference. To this end, we first show that the problem is NP-hard. We then devise a constant approximation algorithm for it. We finally evaluate the performance of the proposed algorithm through experimental simulations. Experimental results demonstrate that the proposed algorithm is promising.
UR - http://www.scopus.com/inward/record.url?scp=85118443109&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85118443109&origin=recordpage
U2 - 10.1109/LCN52139.2021.9524928
DO - 10.1109/LCN52139.2021.9524928
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9780738124766
T3 - Proceedings - Conference on Local Computer Networks, LCN
SP - 193
EP - 200
BT - Proceedings of the IEEE 46th Conference on Local Computer Networks (LCN 2021)
A2 - Khoukhi, Lyes
A2 - Oteafy, Sharief
A2 - Bulut, Eyuphan
PB - IEEE Computer Society
T2 - 46th IEEE Conference on Local Computer Networks (LCN 2021)
Y2 - 4 October 2021 through 7 October 2021
ER -