Abstract
General-purpose GPU architectures provide several types of on-chip memory: registers, software-managed cache, and hardware-managed cache. These on-chip memory resources are powerful yet difficult to manage. Each type has its own advantages and disadvantages, which make it suitable for different types of data. Further, contention for on-chip memory at different levels affects the hardware concurrency that can be achieved on a GPU. Unlike on a CPU architecture, where on-chip memory allocation is performed under a fixed resource bound, the on-chip memory resource bound on a GPU is variable because it depends on the adjustable hardware concurrency. In this paper, we look at the data values that are analyzable at compile time for placement in registers, software-managed cache, and hardware-managed cache. We propose a unified data placement strategy that is applicable to every type of on-chip memory, yet flexible enough to maximize synergy among the different types of on-chip memory.
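To make the coupling between on-chip memory usage and hardware concurrency concrete, the following is a minimal sketch, not the method proposed in this paper. It estimates how many threads can be resident on one streaming multiprocessor (SM) given per-thread register use and per-block software-managed cache use. The function name `max_resident_threads` and the default per-SM limits (65536 registers, 48 KB of software-managed cache, 2048 threads) are illustrative assumptions, and hardware allocation granularity is ignored.

```python
def max_resident_threads(regs_per_thread, smem_per_block, threads_per_block,
                         regs_per_sm=65536, smem_per_sm=49152,
                         max_threads_per_sm=2048):
    """Estimate resident threads per SM under assumed, illustrative limits.

    All default limits are hypothetical placeholders; real devices differ
    and also round allocations up to hardware-specific granularities.
    """
    # Blocks allowed by the hardware thread limit.
    blocks_by_threads = max_threads_per_sm // threads_per_block
    # Blocks allowed by the register file.
    blocks_by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    # Blocks allowed by software-managed cache; unconstrained if unused.
    if smem_per_block > 0:
        blocks_by_smem = smem_per_sm // smem_per_block
    else:
        blocks_by_smem = blocks_by_threads
    # The tightest resource determines how many blocks are resident.
    blocks = min(blocks_by_threads, blocks_by_regs, blocks_by_smem)
    return blocks * threads_per_block


# Doubling register use per thread halves the achievable concurrency.
print(max_resident_threads(32, 0, 256))
print(max_resident_threads(64, 0, 256))
# Using 16 KB of software-managed cache per block becomes the new bound.
print(max_resident_threads(32, 16384, 256))
```

Under these assumed limits, placing more data in registers or software-managed cache lowers the number of resident threads, so the resource bound seen by a placement strategy changes with the concurrency level it targets.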