An Efficient Parallel Processor for Dense Tensor Computation
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 1335-1347 |
Journal / Publication | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |
Volume | 29 |
Issue number | 7 |
Online published | 27 May 2021 |
Publication status | Published - Jul 2021 |
Abstract
Many data sets are multidimensional and are represented as tensors. Tensor computations have been applied in many fields, and various software libraries have been developed. However, little attention has been paid to developing hardware architectures that accelerate tensor computations. In this article, an efficient and unified processing element (PE) array for 3-D tensor computation is demonstrated. Our PE array is optimized for thin and tall tensor-matrix multiplication and two types of tensor times matrices chain (TTMc) operations. Our design is evaluated in three case studies and compared with the state-of-the-art design. By using computation partition and rearrangement, data movement between the field-programmable gate array (FPGA) and off-chip DDR memory can be reduced by O(I²), where I is the maximum range among all the dimensions of the data tensor. For the TTMc implementation, the clock frequency is increased by 18% compared with the state-of-the-art implementation on the same FPGA chip. An experiment on 3-D volumetric data set rendering by the tensor approximation method is conducted for demonstration. For the brick reconstruction process, the runtime decreases by 50%, i.e., it is two times faster, on our FPGA implementation compared with that running on a GPU. In CANDECOMP/PARAFAC decomposition, the runtime of one iteration is decreased by up to 93% compared with programs implemented with TensorLy, a Python library.
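For readers unfamiliar with the operations named in the abstract, the following is a minimal software sketch of a mode-n tensor-matrix multiplication (TTM) and a TTM chain (TTMc) on a 3-D tensor, written with NumPy. It is not the article's hardware design. The function names `mode_n_product` and `ttmc`, the `skip` parameter, and the example shapes are illustrative assumptions. The abstract does not specify which two TTMc variants the PE array targets, so the full chain and the "skip one mode" chain shown here are only common examples.

```python
import numpy as np


def mode_n_product(tensor, matrix, mode):
    """Mode-n TTM: contract axis `mode` of `tensor` with the columns of `matrix`.

    `matrix` has shape (R, I_mode); the result has size R along axis `mode`.
    """
    # tensordot places the new axis (size R) first, followed by the remaining tensor axes.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    # Move the new axis back to position `mode`.
    return np.moveaxis(result, 0, mode)


def ttmc(tensor, matrices, skip=None):
    """TTM chain: apply a mode-n product for every mode except `skip` (hypothetical parameter)."""
    out = tensor
    for mode, matrix in enumerate(matrices):
        if mode == skip:
            continue
        out = mode_n_product(out, matrix, mode)
    return out


# Illustrative shapes only: a 6 x 5 x 4 data tensor and "thin" factor matrices.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
U = [rng.standard_normal((r, n)) for r, n in zip((3, 2, 2), X.shape)]

core = ttmc(X, U)             # all three modes contracted
partial = ttmc(X, U, skip=0)  # mode 0 left untouched
```

In this sketch each mode-n product shrinks the intermediate tensor when the factor matrix has fewer rows than columns, which is why the order and partitioning of the chain affects how much data has to be moved.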
Research Area(s)
- Computer architecture, Field programmable gate arrays, Field-programmable gate array (FPGA), Hardware, hardware architecture, Matrix decomposition, parallel processor, Task analysis, tensor computation, Tensors, Very large scale integration
Citation Format(s)
An Efficient Parallel Processor for Dense Tensor Computation. / Huang, Wei-Pei; Cheung, Ray C. C.; Yan, Hong.
In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 29, No. 7, 07.2021, p. 1335-1347.