Exploring Strategies to Improve Locality Across Many-Core Affinities

EasyChair Preprint 7471 • 12 pages • Date: February 15, 2022

Abstract

Several recent systems ranked first on the Top500 list are built from many-core chips with complex memory systems, including intermediate levels of memory, multiple memory channels, and explicit affinity between specific memory channels and specific sub-blocks of cores. Writing code that uses these features efficiently is therefore a significant challenge. This paper uses Intel's Knights Landing (KNL) processor as a testbed, because it provides both intermediate memory and multiple architectural knobs for adjusting affinity. It uses a 2D Fast Fourier Transform (FFT) as a test case to explore which combinations of architectural and algorithmic techniques are most beneficial. Several codes are evaluated: the state-of-the-art FFT libraries FFTW and MKL, and two additional simple parallel 2D FFT codes that explore explicit options. We conclude that intermediate memory provides a significant performance boost, that some architectural modes of the memory subsystem suit FFT better than others, and that a cache-oblivious FFT performs consistently across affinity modes.

Keyphrases: FFT, affinity, buffering, cache-oblivious, multilevel memory