TY - JOUR
T1 - Re-thinking Memory-Bound Limitations in CGRAs
AU - LIU, XIANGFENG
AU - JIANG, ZHE
AU - ZHU, ANZHEN
AU - HAN, XIAOMENG
AU - LYU, MINGSONG
AU - DENG, QINGXU
AU - GUAN, NAN
PY - 2025
Y1 - 2025/11//
N2 - Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly employed to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations aimed at improving CGRA performance, energy efficiency, flexibility, and area utilization, under the idealistic assumption that kernels can access all data from Scratchpad Memory (SPM). However, certain complex workloads, particularly in fields like graph analytics, irregular database operations, and specialized forms of high-performance computing (e.g., unstructured mesh simulations), exhibit irregular memory access patterns that hinder CGRA utilization, sometimes driving it below 1.5% and making the CGRA memory-bound. To address this challenge, we conduct a thorough analysis of the underlying causes of performance degradation, then propose a redesigned memory subsystem and refine the memory model. With both microarchitectural and theoretical optimizations, our solution effectively manages irregular memory accesses through a CGRA-specific runahead execution mechanism and cache reconfiguration techniques. Our results demonstrate that we can achieve performance comparable to the original SPM-only system while requiring only 1.27% of the storage size. The runahead execution mechanism achieves an average 3.04× speedup (up to 6.91×), and the cache reconfiguration technique provides an additional 6.02% improvement, significantly enhancing CGRA performance for irregular memory access patterns.
© 2025 Copyright held by the owner/author(s).
AB - Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly employed to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations aimed at improving CGRA performance, energy efficiency, flexibility, and area utilization, under the idealistic assumption that kernels can access all data from Scratchpad Memory (SPM). However, certain complex workloads, particularly in fields like graph analytics, irregular database operations, and specialized forms of high-performance computing (e.g., unstructured mesh simulations), exhibit irregular memory access patterns that hinder CGRA utilization, sometimes driving it below 1.5% and making the CGRA memory-bound. To address this challenge, we conduct a thorough analysis of the underlying causes of performance degradation, then propose a redesigned memory subsystem and refine the memory model. With both microarchitectural and theoretical optimizations, our solution effectively manages irregular memory accesses through a CGRA-specific runahead execution mechanism and cache reconfiguration techniques. Our results demonstrate that we can achieve performance comparable to the original SPM-only system while requiring only 1.27% of the storage size. The runahead execution mechanism achieves an average 3.04× speedup (up to 6.91×), and the cache reconfiguration technique provides an additional 6.02% improvement, significantly enhancing CGRA performance for irregular memory access patterns.
© 2025 Copyright held by the owner/author(s).
KW - Coarse-Grained Reconfigurable Array (CGRA)
KW - irregular memory access
KW - memory subsystem
KW - runahead execution
KW - cache reconfiguration
U2 - 10.1145/3760386
DO - 10.1145/3760386
M3 - RGC 21 - Publication in refereed journal
SN - 1539-9087
VL - 24
JO - ACM Transactions on Embedded Computing Systems
JF - ACM Transactions on Embedded Computing Systems
IS - 5s
M1 - 105
ER -