TY - JOUR
T1 - Binary convolutional neural network acceleration framework for rapid system prototyping
AU - Xu, Zhe
AU - Cheung, Ray C.C.
PY - 2020/10
Y1 - 2020/10
N2 - The huge model size and high computational complexity of emerging convolutional neural network (CNN) models make them unsuitable for deployment on current embedded or edge computing devices. Recently, binary neural networks (BNNs) have been explored to reduce network model size and avoid complex multiplication. In this paper, a binary network acceleration framework for rapid system prototyping is proposed to promote the deployment of CNNs on embedded devices. First, trainable scaling factors are adopted in binary network training to improve network accuracy. The hardware/software co-design framework supports various compact network structures such as the residual block, the 1 × 1 squeeze convolution layer, and depthwise separable convolution. With flexible network binarization and efficient hardware architecture optimization, the acceleration system achieves over 2 TOPS of throughput, comparable to a modern desktop GPU, with much higher power efficiency.
AB - The huge model size and high computational complexity of emerging convolutional neural network (CNN) models make them unsuitable for deployment on current embedded or edge computing devices. Recently, binary neural networks (BNNs) have been explored to reduce network model size and avoid complex multiplication. In this paper, a binary network acceleration framework for rapid system prototyping is proposed to promote the deployment of CNNs on embedded devices. First, trainable scaling factors are adopted in binary network training to improve network accuracy. The hardware/software co-design framework supports various compact network structures such as the residual block, the 1 × 1 squeeze convolution layer, and depthwise separable convolution. With flexible network binarization and efficient hardware architecture optimization, the acceleration system achieves over 2 TOPS of throughput, comparable to a modern desktop GPU, with much higher power efficiency.
KW - Binarization
KW - Convolutional neural network
KW - FPGA
KW - Hardware acceleration
KW - Rapid system prototyping
UR - http://www.scopus.com/inward/record.url?scp=85082566267&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2020.101762
DO - 10.1016/j.sysarc.2020.101762
M3 - RGC 21 - Publication in refereed journal
SN - 1383-7621
VL - 109
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 101762
ER -