Automatic Mapping of Parallel Pattern-based Algorithms on Heterogeneous Architectures

EasyChair Preprint 5597 • 15 pages • Date: May 23, 2021

Abstract

Nowadays, specialized hardware is often found in clusters to improve compute performance and energy efficiency. Porting and tuning scientific codes for these heterogeneous clusters requires significant development effort. To reduce this effort while maintaining high performance, modern parallel programming models introduce a second layer of abstraction, in which architecture-agnostic source code can be maintained and automatically optimized for the target architecture. However, with increasing heterogeneity, mapping an application to a specific architecture becomes a complex decision in itself, one that requires a differentiated consideration of processor features and algorithmic properties. Furthermore, architecture-agnostic global transformations are necessary to maximize the simultaneous utilization of different processors. Therefore, we introduce a combinatorial optimization approach to globally transform parallel algorithms and automatically map them to heterogeneous architectures. We derive a global transformation and mapping algorithm that is based on a static performance model. Moreover, we demonstrate the approach on four typical algorithmic kernels, showing automatic global transformations such as reordering, pipelining, and cache blocking, as well as optimal mapping strategies for an exemplary CPU-GPU compute node. Our algorithm achieves performance on par with hand-tuned implementations of all four kernels.

Keyphrases: Algorithmic efficiency, abstract pattern tree, automatic mapping, heterogeneous systems, parallel algorithms, performance analysis, performance optimization, programming languages, programming models
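As a rough illustration of what mapping by combinatorial optimization over a static performance model can look like, the following Python sketch exhaustively assigns a chain of kernels to either a CPU or a GPU and keeps the assignment with the lowest estimated cost. It is not the authors' algorithm. The kernel names, the per-processor runtime estimates, and the flat data-transfer penalty are hypothetical values chosen for illustration, and plain exhaustive enumeration stands in for whatever search the paper actually performs.

```python
from itertools import product

# Hypothetical static performance model: estimated runtime (seconds) of each
# kernel on each processor. These numbers are illustrative assumptions only.
RUNTIME = {
    "stencil":   {"cpu": 4.0, "gpu": 1.0},
    "reduction": {"cpu": 1.5, "gpu": 1.2},
    "sort":      {"cpu": 1.0, "gpu": 2.5},
}

# Hypothetical flat cost of moving data between processors whenever two
# consecutive kernels are mapped to different devices.
TRANSFER = 0.8


def chain_cost(kernels, mapping):
    """Estimated cost of running `kernels` in order under a processor `mapping`."""
    total = 0.0
    for i, (kernel, proc) in enumerate(zip(kernels, mapping)):
        total += RUNTIME[kernel][proc]
        # Charge a transfer when the data has to move to a different device.
        if i > 0 and mapping[i - 1] != proc:
            total += TRANSFER
    return total


def best_mapping(kernels, processors=("cpu", "gpu")):
    """Enumerate every kernel-to-processor assignment and return the cheapest.

    Returns a (mapping, cost) pair, where mapping is a tuple of processor names.
    """
    candidates = product(processors, repeat=len(kernels))
    return min(
        ((mapping, chain_cost(kernels, mapping)) for mapping in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    kernels = ["stencil", "reduction", "sort"]
    mapping, cost = best_mapping(kernels)
    for kernel, proc in zip(kernels, mapping):
        print(f"{kernel:>10} -> {proc}")
    print(f"estimated cost: {cost:.2f} s")
```

With these assumed numbers, the search keeps the stencil and the reduction on the GPU and moves the sort to the CPU, because the sort's CPU advantage outweighs the cost of one transfer. Exhaustive enumeration grows as (number of processors)^(number of kernels), so it only suits small illustrative cases. A practical mapper would need a more scalable search and a richer model that also captures transformations such as pipelining or cache blocking.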