Randomized tensor decomposition using parallel reconfigurable systems
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English
---|---
Article number | 543
Number of pages | 32
Journal / Publication | Journal of Supercomputing
Volume | 81
Issue number | 4
Online published | 25 Feb 2025
Publication status | Published - Mar 2025
Link(s)
DOI | DOI
---|---
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85218627028&origin=recordpage
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(8f8faaf5-cd28-4088-96e2-00b32cfe1fd0).html
Abstract
Tensor decomposition algorithms are essential for extracting meaningful latent variables and uncovering hidden structures in real-world data tensors. Unlike conventional deterministic tensor decomposition algorithms, randomized methods offer higher efficiency by reducing memory requirements and computational complexity. This paper proposes an efficient hardware architecture for randomized tensor decomposition implemented on a field-programmable gate array (FPGA) using high-level synthesis (HLS). The proposed architecture integrates random projection, power iteration, and subspace approximation via QR decomposition to achieve low-rank approximation of multidimensional datasets. It exploits the capabilities of reconfigurable systems to accelerate tensor computation and comprises three central units implementing a three-stage algorithm: (1) a tensor-times-matrix chain (TTMc) unit, (2) a tensor unfolding unit, and (3) a QR decomposition unit. Experimental results demonstrate that our FPGA design achieves up to a 14.56 times speedup over a well-optimized tensor decomposition implementation using the Tensor Toolbox software library on an Intel i7-9700 CPU. For a large input tensor of size 512×512×512, the proposed design achieves a 5.55 times speedup over an Nvidia Tesla T4 GPU. Furthermore, we apply our hardware-based high-order singular value decomposition (HOSVD) accelerator to two real applications: background subtraction in dynamic video datasets and data compression. In both applications, our proposed design achieves high efficiency in terms of both accuracy and computation time.
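The three-stage pipeline the abstract describes (random projection, power iteration, and QR-based subspace approximation, applied per mode via unfolding and tensor-times-matrix products) can be sketched in software as a randomized HOSVD. The sketch below is an illustrative NumPy reconstruction under stated assumptions, not the paper's hardware design: the function names (`randomized_range`, `randomized_hosvd`), the single power-iteration step, and the Gaussian test matrix are choices made here for clarity.

```python
import numpy as np

def randomized_range(A, rank, n_power=1, rng=None):
    """Orthonormal basis for the approximate range of A:
    random projection -> power iteration -> QR (subspace approximation)."""
    rng = np.random.default_rng(rng)
    omega = rng.standard_normal((A.shape[1], rank))  # random projection matrix
    Y = A @ omega
    for _ in range(n_power):                         # power iteration sharpens the subspace
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                           # orthonormal basis via QR
    return Q

def randomized_hosvd(T, ranks, rng=None):
    """Sketch of randomized HOSVD: one factor per mode from the randomized
    range of each mode-n unfolding, then the core via a TTM chain."""
    factors = []
    core = T
    for n, r in enumerate(ranks):
        # mode-n unfolding of the original tensor
        unfolding = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
        Q = randomized_range(unfolding, r, rng=rng)
        factors.append(Q)
        # tensor-times-matrix: contract current leading mode with Q;
        # the projected mode moves to the last axis, so the next original
        # mode is always at axis 0 on the following iteration
        core = np.tensordot(core, Q, axes=([0], [0]))
    return core, factors

# Usage: build an exactly low-multilinear-rank tensor and recover it
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4, 4))
Us = [np.linalg.qr(rng.standard_normal((16, 4)))[0] for _ in range(3)]
T = G
for U in Us:
    T = np.tensordot(T, U.T, axes=([0], [0]))        # T = G x1 U1 x2 U2 x3 U3

core, factors = randomized_hosvd(T, ranks=(4, 4, 4), rng=1)
approx = core
for U in factors:                                    # reconstruct from core and factors
    approx = np.tensordot(approx, U.T, axes=([0], [0]))
error = np.linalg.norm(approx - T) / np.linalg.norm(T)
```

Because the test tensor has exact multilinear rank (4, 4, 4), the randomized subspaces capture each unfolding's range almost surely, and the relative reconstruction `error` is at machine-precision level; on real data the truncation ranks trade accuracy for compression, as in the paper's background-subtraction and compression applications.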
© The Author(s) 2025
Research Area(s)
- High-level synthesis (HLS), Field programmable gate array (FPGA), Randomized algorithm, Low-rank tensor computing, High order singular value decomposition (HOSVD)
Citation Format(s)
Randomized tensor decomposition using parallel reconfigurable systems. / Misra, Ajita (Co-first Author); Abdelgawad, Muhammad A. A. (Co-first Author); Jing, Peng (Co-first Author) et al.
In: Journal of Supercomputing, Vol. 81, No. 4, 543, 03.2025.