Abstract
Besides algorithm selection, the choice of data layout is the key intellectual step in writing an efficient HPF program. Although finding an efficient data layout fully automatically may not be possible in all cases, HPF users will need support during the difficult data layout selection process. This support is particularly necessary if the user is not familiar with the characteristics of the target HPF compiler and target architecture, or even with HPF itself. In addition to the target compiler and architecture, the quality of a data layout depends on the problem size, the number of processors, and the memory available on each processor. Therefore, tools and techniques for automatic data layout will be crucial if HPF is to find general acceptance in the scientific community. The memory requirements of a data layout are particularly important for applications that are run on a parallel machine mainly for the amount of main memory it provides, rather than for its computational power. Such a memory-intensive program may not be executable on a conventional uniprocessor at all, because the uniprocessor lacks the necessary memory. This paper discusses a new framework for automatic data layout that considers read--only data replication and minimizes the overall execution time under given memory constraints. The framework can be used to generate data layout specifications with additional read--only array copies. Many applications use arrays that are assigned a value and keep this value over large portions of the program. In such read--only regions, multiple read--only copies of the array, each with a different data layout, may avoid otherwise necessary communication and thereby reduce the overall execution time. Read--only replication is not free, however, since it increases the memory requirements of the program.
The approach presented in this paper addresses the necessary tradeoff decisions between read--only replication and memory requirements in a new, unified framework that extends our previous framework for automatic data layout with remapping. As in our previous work, the data layout selection problem is formulated as a 0--1 integer programming problem that can be solved efficiently. Preliminary experiments show the performance tradeoffs between the new and old formulations.
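To make the tradeoff concrete, the following minimal Python sketch treats a toy instance as a 0--1 selection problem. It is an illustration under stated assumptions, not the formulation developed in this paper: the function name `best_replication`, the cost and memory figures, and the layout names are all hypothetical, remapping costs are ignored, and the search is exhaustive rather than handed to an integer programming solver.

```python
from itertools import product


def best_replication(phase_cost, copy_mem, mem_budget):
    """Pick which read-only copies of one array to keep (hypothetical toy model).

    phase_cost: list of dicts, one per read-only phase, mapping layout -> time
    copy_mem:   dict mapping layout -> memory needed for a copy in that layout
    mem_budget: total memory allowed for all copies of the array

    Returns (total_time, kept_layouts), or None if no choice fits the budget.
    """
    layouts = list(copy_mem)
    best = None
    # Each bit is a 0-1 decision variable: 1 means "keep a copy in this layout".
    for bits in product((0, 1), repeat=len(layouts)):
        kept = [layout for layout, bit in zip(layouts, bits) if bit]
        if not kept:
            continue  # the array must exist in at least one layout
        if sum(copy_mem[layout] for layout in kept) > mem_budget:
            continue  # violates the memory constraint
        # Each phase reads from whichever kept copy is cheapest for it.
        total = sum(min(costs[layout] for layout in kept) for costs in phase_cost)
        if best is None or total < best[0]:
            best = (total, kept)
    return best


# Assumed example data: two phases that favor opposite layouts.
phase_cost = [
    {"row": 10, "col": 50},
    {"row": 40, "col": 5},
]
copy_mem = {"row": 100, "col": 100}

tight = best_replication(phase_cost, copy_mem, mem_budget=100)
roomy = best_replication(phase_cost, copy_mem, mem_budget=200)
print(tight)  # (50, ['row'])
print(roomy)  # (15, ['row', 'col'])
```

With room for only one copy, the cheapest single layout is chosen and one phase pays for the mismatch; with room for two copies, each phase reads the copy that suits it, at the price of doubling the array's memory footprint.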