Swizzle Inventor: Data Movement Synthesis for GPU Kernels
Phitchaya Mangpo Phothilimthana, Archibald Samuel Elliott, Abhinav Jangda, Bastian Hagedorn, Henrik Barthels, Rastislav Bodik, Vinod Grover
November 2018 • Paper • Accepted to ASPLOS 2019
Utilizing memory and register bandwidth in modern architectures may require irregular data placement and movement, such as shuffles and broadcasts. We unify these swizzle optimizations in a framework, called Swizzle Inventor, which allows programmers to express high-level optimization strategies, delegating the creation of swizzles to an automatic synthesizer. Our synthesis algorithm scales to real-world programs, allowing us to invent new GPU kernels for stencil computations, matrix transposition, and a finite field multiplication algorithm (used in cryptographic applications). The synthesized 2D convolution and finite-field multiplication kernels are on average 1.5–3.2x and 1.1–1.7x faster, respectively, than expert-optimized CUDA kernels.
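To make the notion of a swizzle concrete, the following is a minimal sketch that simulates two register-level data movements within a single warp: a shuffle, where each lane reads a value held by another lane, and a broadcast, where every lane reads from the same lane. It is plain Python rather than CUDA, and the names `shfl`, `broadcast`, `stencil3`, and `WARP_SIZE` are hypothetical and chosen for illustration; this is not Swizzle Inventor's interface or its synthesis algorithm. On real hardware the lane-to-lane read would typically be a warp shuffle intrinsic such as CUDA's `__shfl_sync`. The stencil here wraps around at the warp boundary, an assumption made only to keep the example short.

```python
WARP_SIZE = 32  # assumed warp width, as on current NVIDIA GPUs


def shfl(values, src_lane_of):
    """Simulate a warp shuffle: lane i receives the register of lane src_lane_of(i).

    Source lanes wrap around modulo WARP_SIZE (an illustrative simplification).
    """
    return [values[src_lane_of(i) % WARP_SIZE] for i in range(WARP_SIZE)]


def broadcast(values, src):
    """Every lane reads the register held by lane `src`."""
    return shfl(values, lambda i: src)


def stencil3(values):
    """Cyclic 3-point sum stencil computed with two shuffles.

    Each lane holds one input element in a register and obtains its left and
    right neighbours from adjacent lanes instead of re-reading memory.
    """
    left = shfl(values, lambda i: i - 1)
    right = shfl(values, lambda i: i + 1)
    return [l + c + r for l, c, r in zip(left, values, right)]


if __name__ == "__main__":
    regs = list(range(WARP_SIZE))  # lane i holds the value i
    print(stencil3(regs)[:4])
    print(broadcast(regs, 5)[:4])
```

The lane-index functions passed to `shfl` (here `i - 1` and `i + 1`) are the kind of index expression a programmer would otherwise have to derive by hand; the abstract describes delegating the creation of such swizzles to a synthesizer.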