Randomized tensor decomposition using parallel reconfigurable systems

Ajita Misra* (Co-first Author), Muhammad A. A. Abdelgawad (Co-first Author), Peng Jing (Co-first Author), Ray C. C. Cheung, Hong Yan

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review


Abstract

Tensor decomposition algorithms are essential for extracting meaningful latent variables and uncovering hidden structures in real-world data tensors. Compared with conventional deterministic tensor decomposition algorithms, randomized methods are more efficient because they reduce memory requirements and computational complexity. This paper proposes an efficient hardware architecture for randomized tensor decomposition, implemented on a field-programmable gate array (FPGA) using high-level synthesis (HLS). The architecture integrates random projection, power iteration, and subspace approximation via QR decomposition to compute low-rank approximations of multidimensional datasets, and it exploits the parallel processing capabilities of reconfigurable systems to accelerate tensor computation. It comprises three central units that together implement a three-stage algorithm: (1) a tensor-times-matrix chain (TTMc) unit, (2) a tensor unfolding unit, and (3) a QR decomposition unit. Experimental results show that our FPGA design achieves up to a 14.56× speedup over a well-established software implementation of tensor decomposition in the Tensor Toolbox library running on an Intel i7-9700 CPU. For a large input tensor of size 512×512×512, the proposed design achieves a 5.55× speedup over an Nvidia Tesla T4 GPU. Furthermore, we apply our hardware-based high-order singular value decomposition (HOSVD) accelerator to two real applications: background subtraction on dynamic video datasets and data compression. In both applications, the proposed design performs efficiently in terms of both accuracy and computation time.
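The three stages named above (random projection, power iteration, and QR-based subspace approximation) can be illustrated with a small software model of a randomized HOSVD. This is a minimal NumPy sketch under stated assumptions, not the authors' HLS design: the function names (`unfold`, `mode_product`, `randomized_hosvd`, `reconstruct`), the Gaussian test matrix, the default of one power iteration, and the absence of oversampling are all hypothetical choices for illustration. The unfolding step, the QR step, and the final tensor-times-matrix chain (TTMc) that forms the core tensor loosely mirror the three hardware units. The column ordering used by `unfold` differs from the Kolda-Bader convention, which does not change the column space the QR step extracts.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, M, mode):
    """Mode-n product X x_n M, where M has shape (J, X.shape[mode])."""
    Y = np.tensordot(M, X, axes=(1, mode))  # contracted axis is replaced by J, placed first
    return np.moveaxis(Y, 0, mode)          # move J back to position `mode`

def randomized_hosvd(X, ranks, power_iters=1, seed=0):
    """Hypothetical randomized HOSVD: one orthonormal factor per mode, then a TTMc for the core."""
    rng = np.random.default_rng(seed)
    factors = []
    for mode, r in enumerate(ranks):
        A = unfold(X, mode)                                # stage: tensor unfolding
        omega = rng.standard_normal((A.shape[1], r))       # stage: Gaussian random projection
        Q, _ = np.linalg.qr(A @ omega)                     # stage: subspace approximation via QR
        for _ in range(power_iters):                       # power iteration, re-orthonormalized by QR
            Q, _ = np.linalg.qr(A.T @ Q)
            Q, _ = np.linalg.qr(A @ Q)
        factors.append(Q)
    core = X
    for mode, U in enumerate(factors):                     # TTMc: X x_1 U1^T x_2 U2^T x_3 U3^T
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Low-rank approximation: core x_1 U1 x_2 U2 x_3 U3."""
    X = core
    for mode, U in enumerate(factors):
        X = mode_product(X, U, mode)
    return X
```

For example, `core, factors = randomized_hosvd(X, ranks=(3, 4, 2))` returns a 3×4×2 core and three factor matrices with orthonormal columns. The saving comes from the projection: each I_n × (product of the other dimensions) unfolding is compressed to an I_n × r_n sketch before the QR factorization, instead of computing a full SVD of the unfolding.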

© The Author(s) 2025
Original language: English
Article number: 543
Number of pages: 32
Journal: Journal of Supercomputing
Volume: 81
Issue number: 4
Online published: 25 Feb 2025
Publication status: Published - Mar 2025

Funding

This work is supported by Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA), Hong Kong Research Grants Council (Project 11204821), and City University of Hong Kong (Projects 9610034 and 9610460).

Research Keywords

  • High-level synthesis (HLS)
  • Field-programmable gate array (FPGA)
  • Randomized algorithm
  • Low-rank tensor computing
  • High-order singular value decomposition (HOSVD)

Publisher's Copyright Statement

  • This full text is made available under CC BY 4.0. https://creativecommons.org/licenses/by/4.0/

RGC Funding Information

  • RGC-funded
