Improving Inter-thread Data Sharing with GPU Caches
Technical documentation   Open access

Lingda Li, Kun Wang, Eddy Z. Zhang and Mario Szegedy
Rutgers University
2014
DOI: https://doi.org/10.7282/T39Z98F3

Abstract

The massive amount of fine-grained parallelism exposed by a GPU program makes it difficult to exploit the benefits of a shared cache, even when the program has good locality. The nondeterministic order of thread execution in the bulk synchronous parallel (BSP) model makes the situation even worse. Most prior work on exploiting GPU cache sharing focuses on regular applications whose memory access indices are linear. In this paper, we formulate a generic workload partitioning model that systematically explores the complexity and approximation bounds of optimal cache sharing among GPU threads. Our exploration demonstrates that it is possible to use the GPU cache efficiently without significant programming overhead or ad hoc, application-specific implementations.