
An adaptive Non-Uniform Loop Tiling for DMA-based bulk data transfers on many-core processor

Keni Qiu*, Yuanhui Ni, Weigong Zhang, Jing Wang, Xiaoqiang Wu, Chun Jason Xue, Tao Li

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Mesh Network-on-Chip (NoC) is a key fabric for interconnecting many cores with desirable scalability, reliability and interoperability. We observe that DMA-based bulk data block transfer exhibits non-negligible NoC latency due to heavy congestion. Loop tiling is an effective way to partition the data space for SPM+DMA-based data block transfer. Nevertheless, we observe that unbalanced NoC latency can degrade the effectiveness of uniform loop tiling. In this paper, we propose a NoC-aware Non-Uniform Loop Tiling (NULT) scheme to improve DMA performance. A NULT framework is built on the proposed model to adaptively hide DMA latency behind computation time and reduce the overall execution time. The framework first groups cores into different families according to their distance-to-data in the NoC. Then a heuristic method is presented to solve for near-optimal tiling factors for each core family. In this way, different core families are assigned non-uniform tiling sizes. We evaluate the NULT scheme on the NIRGAM platform. Compared to the traditional uniform tiling approach, the proposed NULT technique better overlaps memory access time with computation time and thus reduces the overall execution time of a loop nest.
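
The two steps named in the abstract (grouping cores by distance-to-data, then choosing a tiling factor per family) can be illustrated with a minimal Python sketch. It is not the authors' model or heuristic: the linear DMA cost, the double-buffering condition, the "smallest tile that hides latency" rule, and every name and constant below are assumptions made for illustration only.

```python
from collections import defaultdict


def group_cores_by_distance(mesh_w, mesh_h, mem_xy):
    """Group the cores of a mesh_w x mesh_h mesh into families keyed by
    Manhattan hop count to a memory interface at mem_xy (assumed location)."""
    families = defaultdict(list)
    for y in range(mesh_h):
        for x in range(mesh_w):
            hops = abs(x - mem_xy[0]) + abs(y - mem_xy[1])
            families[hops].append((x, y))
    return dict(families)


def dma_latency(tile_elems, hops, setup=50, per_hop=20, per_elem=1):
    """Hypothetical DMA cost in cycles: fixed setup, a per-hop NoC penalty,
    and a per-element transfer term. All constants are made up."""
    return setup + per_hop * hops + per_elem * tile_elems


def compute_time(tile_elems, cycles_per_elem=4):
    """Hypothetical computation cost in cycles for one tile."""
    return cycles_per_elem * tile_elems


def pick_tile_size(hops, spm_elems, candidates):
    """Assumed heuristic: return the smallest candidate tile whose compute
    time covers the DMA latency of the next tile (double buffering), with
    two tile buffers fitting in the scratchpad. If no candidate hides the
    latency, fall back to the largest tile that fits."""
    fitting = [t for t in sorted(candidates) if 2 * t <= spm_elems]
    if not fitting:
        raise ValueError("no candidate tile fits in the scratchpad")
    for t in fitting:
        if compute_time(t) >= dma_latency(t, hops):
            return t
    return fitting[-1]


def non_uniform_tiling(mesh_w, mesh_h, mem_xy, spm_elems, candidates):
    """Map each core family (keyed by hop count) to its own tile size."""
    families = group_cores_by_distance(mesh_w, mesh_h, mem_xy)
    return {hops: pick_tile_size(hops, spm_elems, candidates)
            for hops in sorted(families)}


if __name__ == "__main__":
    plan = non_uniform_tiling(4, 4, (0, 0), spm_elems=256,
                              candidates=[8, 16, 32, 64, 128])
    for hops, tile in plan.items():
        print(f"family at {hops} hops -> tile of {tile} elements")
```

Under these assumed costs, families farther from the memory interface need larger tiles before computation can cover the transfer, which is the intuition behind assigning non-uniform tile sizes.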
Original language: English
Title of host publication: Proceedings of the 34th IEEE International Conference on Computer Design, ICCD 2016
Publisher: IEEE
Pages: 9-16
ISBN (Print): 9781509051427
DOIs
Publication status: Published - 22 Nov 2016
Event: 34th IEEE International Conference on Computer Design, ICCD 2016 - Scottsdale, United States
Duration: 2 Oct 2016 to 5 Oct 2016

Publication series

Name: Proceedings IEEE International Conference on Computer Design
Publisher: IEEE
ISSN (Print): 1063-6404

Conference

Conference: 34th IEEE International Conference on Computer Design, ICCD 2016
Place: United States
City: Scottsdale
Period: 2/10/16 to 5/10/16

Funding

This work is supported by the Beijing Advanced Innovation Center for Imaging Technology, the Guangdong Innovative Research Team Program [Project No. 201001D0104726115], the PhD Start-up Fund of the Natural Science Foundation of Guangdong Province, China [Project No. 2014A030310344], the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality [Project No. IDHT20150507], and the National Natural Science Foundation of China [Project Nos. 61472260, 61402302 and 61502321]. Weigong Zhang is the corresponding author.

Research Keywords

  • DMA
  • loop tiling
  • many-core system
  • cost model
  • non-uniform
  • ARCHITECTURE
