TY - JOUR
T1 - Maximize parallelism minimize overhead for nested loops via loop striping
AU - Xue, Chun
AU - Shao, Zili
AU - Sha, Edwin H.-M.
PY - 2007/5
Y1 - 2007/5
N2 - The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for these nested loops. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only with complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively. © Springer Science+Business Media, LLC 2007.
AB - The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for these nested loops. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only with complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively. © Springer Science+Business Media, LLC 2007.
KW - Loop transformation
KW - Optimization
KW - Parallelism
UR - http://www.scopus.com/inward/record.url?scp=34248162475&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-34248162475&origin=recordpage
U2 - 10.1007/s11265-006-0034-5
DO - 10.1007/s11265-006-0034-5
M3 - RGC 22 - Publication in policy or professional journal
SN - 1387-5485
VL - 47
SP - 153
EP - 167
JO - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
JF - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
IS - 2
ER -