Abstract
As tensors become widespread in modern data analysis, Tucker low-rank Principal Component Analysis (PCA) has become essential for dimensionality reduction and structural discovery in tensor datasets. Motivated by the common scenario in which large-scale tensors are distributed across diverse geographic locations, this article investigates tensor PCA within a distributed framework where direct data pooling is theoretically suboptimal or practically infeasible. We offer a comprehensive analysis of three scenarios in distributed tensor PCA: a homogeneous setting, in which tensors at various locations are generated from a single noise-affected model; a heterogeneous setting, in which tensors at different locations come from distinct models but share some principal components, and the goal is to improve estimation at all locations; and a targeted heterogeneous setting, in which the goal is to boost estimation accuracy at a specific location with limited samples by transferring knowledge from other sites with ample data.
We introduce novel estimation methods tailored to each scenario, establish statistical guarantees, and develop distributed inference techniques to construct confidence regions. Our theoretical findings demonstrate that these distributed methods achieve sharp statistical rates by efficiently aggregating shared information across different tensors, while keeping communication costs reasonable. Empirical validation through simulations and real-world data applications highlights the advantages of our approaches, particularly in managing heterogeneous tensor data. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
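The abstract does not spell out the estimators, so the following is only a rough illustration of the homogeneous setting. It uses a generic one-shot scheme common in the distributed PCA literature and is not necessarily the authors' method. Each site computes HOSVD-style factor matrices from its own tensor and sends only those p_k × r_k matrices to a central node. The center averages the projectors U U^T and keeps the leading eigenvectors. All function names are hypothetical, the Tucker ranks are assumed known, and the data are synthetic.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k matricization (rows indexed by `mode`); column order does not affect the column space."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def local_factor(tensor, mode, rank):
    """Hypothetical local step: top-`rank` left singular vectors of the mode-k unfolding."""
    u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    return u[:, :rank]

def aggregate_factors(factors, rank):
    """Hypothetical central step: average projectors U_i U_i^T, keep the top-`rank` eigenvectors."""
    avg_proj = sum(u @ u.T for u in factors) / len(factors)
    _, eigvecs = np.linalg.eigh(avg_proj)  # eigenvalues returned in ascending order
    return eigvecs[:, ::-1][:, :rank]

def distributed_tucker_factors(site_tensors, ranks):
    """One communication round: each site sends one p_k x r_k matrix per mode."""
    return [
        aggregate_factors([local_factor(t, k, r) for t in site_tensors], r)
        for k, r in enumerate(ranks)
    ]

def subspace_dist(u, v):
    """Frobenius sin-theta distance between the column spaces of orthonormal u and v."""
    return np.linalg.norm(u @ u.T - v @ v.T) / np.sqrt(2)

# Synthetic homogeneous setting (assumed): 5 sites share one Tucker low-rank signal plus noise.
rng = np.random.default_rng(0)
dims, ranks = (20, 15, 10), (2, 2, 2)
true_factors = [np.linalg.qr(rng.standard_normal((p, r)))[0] for p, r in zip(dims, ranks)]
core = np.zeros(ranks)
core[0, 0, 0], core[1, 1, 1] = 100.0, 80.0  # well-separated signal strengths
signal = np.einsum("abc,ia,jb,kc->ijk", core, *true_factors)
sites = [signal + rng.standard_normal(dims) for _ in range(5)]

est = distributed_tucker_factors(sites, ranks)
errors = [subspace_dist(u, v) for u, v in zip(est, true_factors)]

# Numbers each site transmits vs. the size of its raw tensor.
comm_per_site = sum(p * r for p, r in zip(dims, ranks))
raw_size = int(np.prod(dims))
print("sin-theta errors per mode:", np.round(errors, 3))
print("sent per site:", comm_per_site, "vs raw entries:", raw_size)
```

In this toy configuration each site transmits 20·2 + 15·2 + 10·2 = 90 numbers instead of its 3,000 raw entries, which is the kind of communication saving the abstract refers to. The heterogeneous and targeted settings would need additional steps (for example, separating shared from site-specific components) that this sketch does not attempt.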
© The Author(s)
| Original language | English |
|---|---|
| Pages (from-to) | 2619-2631 |
| Journal | Journal of the American Statistical Association |
| Volume | 120 |
| Issue number | 552 |
| Online published | 22 May 2025 |
| DOIs | |
| Publication status | Published - Dec 2025 |
| Externally published | Yes |
Funding
Xi Chen acknowledges support from the National Science Foundation via grant IIS-1845444. Elynn Chen's research was supported in part by the National Science Foundation under Award ID 2412577.
Research Keywords
- Tensor Principal Component Analysis
- Distributed inference
- Data heterogeneity
- Communication efficiency
- Tucker decomposition
Publisher's Copyright Statement
- This full text is made available under CC-BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/
Fingerprint
Dive into the research topics of 'Distributed Tensor Principal Component Analysis with Data Heterogeneity'. Together they form a unique fingerprint.