Abstract
The widening gap between processor speed and main memory speed has generated interest in compile-time optimizations to improve memory locality. The possibility that processors will continue to outpace memory systems raises the question of whether these techniques can be used to produce ever higher degrees of locality. In this article, we discuss techniques that can be used to produce scalable locality": locality that can be adjusted for any ratio of processor and memory speeds. We identify a class of calculations for which existing techniques do not generally produce scalable locality, and give an algorithm for obtaining scalable locality for a subset of this class. We include empirical evidence that this transformation can provide dramatic speedups when processor speed is far higher than memory speed.