A high performance hardware architecture for non-negative tensor factorization

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review

1 Scopus Citations

Original language: English
Pages (from-to): 25-33
Journal / Publication: Microelectronics Journal
Online published: 16 Nov 2018
Publication status: Published - Mar 2019


Non-negative tensor factorization (NTF) is an emerging method for high-dimensional data analysis that is applied in many fields, such as computer vision and bioinformatics. This paper presents an effective method to accelerate NTF computations and proposes a corresponding hardware architecture consisting of multiple processing units. The decomposed factors are calculated using shared intermediate results. With the proposed method, NTF can be computed in parallel, and hardware resources are saved through sharing. We evaluate the proposed architecture on a Xilinx Virtex-6 FPGA (XC6VLX760T) and apply it to two applications: video background estimation and facial image processing. The experimental results show that the proposed hardware architecture runs over 80 times faster than a CPU implementation of NTF. Compared with implementations on GPGPU, the proposed architecture achieves nearly the same speedup. When the strategy in this paper is applied to GPU platforms, the execution time can be reduced by half owing to the computational sharing.
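
The abstract does not give the update rules, so the following is only a rough illustration of what "shared intermediate results" can look like in software. It is a minimal NumPy sketch that assumes a three-way tensor, a CP-style model with one non-negative factor matrix per mode, and Lee-Seung-style multiplicative updates; none of these choices is confirmed by the source. The function names `ntf_multiplicative` and `reconstruct`, and the parameters `rank`, `n_iter`, `eps` and `seed`, are hypothetical. The sharing shown here is the reuse of the small Gram matrices (`GA`, `GB`, `GC`): each is recomputed once after its factor changes and then reused in the other two factor updates.

```python
import numpy as np


def ntf_multiplicative(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Illustrative NTF for a 3-way non-negative tensor X (assumed CP model).

    Uses multiplicative updates; this is a generic textbook-style scheme,
    not necessarily the method of the paper.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive random initial factors.
    A = rng.random((I, rank)) + eps
    B = rng.random((J, rank)) + eps
    C = rng.random((K, rank)) + eps

    # Shared intermediates: rank x rank Gram matrices of each factor.
    GA, GB, GC = A.T @ A, B.T @ B, C.T @ C

    for _ in range(n_iter):
        # Update A; the numerator contracts X with B and C.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ (GB * GC) + eps)
        GA = A.T @ A  # refreshed once, reused in the B and C updates

        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ (GA * GC) + eps)
        GB = B.T @ B  # refreshed once, reused in the C and next A updates

        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ (GA * GB) + eps)
        GC = C.T @ C  # refreshed once, reused in the next A and B updates

    return A, B, C


def reconstruct(A, B, C):
    """Rebuild the tensor from its three factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)


if __name__ == "__main__":
    # Synthetic non-negative tensor of known rank 3 (illustrative data only).
    rng = np.random.default_rng(1)
    X = reconstruct(rng.random((6, 3)), rng.random((5, 3)), rng.random((4, 3)))
    A, B, C = ntf_multiplicative(X, rank=3)
    rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
    print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the three factor updates are independent across rows, each one maps naturally onto parallel processing units; how the paper partitions the work across its hardware units is not stated in this record.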

Research Area(s)

  • Field-programmable gate array (FPGA), Hardware architecture, High-dimensional data, Non-negative tensor factorization, Parallel architecture