Abstract
Automatic partitioning, scheduling and code generation are of major importance in the development of compilers for massively parallel architectures. In this thesis we consider these problems, propose efficient algorithms for automatic scheduling and code generation, and analyze their performance.

In the first part of this thesis, we consider compile-time static scheduling when communication overhead is not negligible. We provide a new quantitative analysis of granularity issues to identify the impact of partitioning on optimal scheduling. We propose a new algorithm, named DSC, for scheduling on an unbounded number of processors, which outperforms existing algorithms in both complexity and performance. Furthermore, we study algorithms for scheduling on a bounded number of processors based on a multistep approach: we use DSC to cluster tasks and then apply a load balancing and physical mapping heuristic to map the clusters onto processors. Finally, we introduce a new task ordering algorithm that tries to minimize parallel time and overlap communication with computation. What distinguishes our approach from others in the literature is its low complexity combined with very good performance. We present theoretical and experimental results to verify the performance of these algorithms.

In the second part of the thesis, we consider the code generation problem for message-passing architectures. We propose a new optimized code generation method for executing a schedule of an arbitrary task graph based on an asynchronous communication model. We optimize the code by reducing communication overhead, eliminating redundant communication and improving memory utilization. We present a correctness analysis of the generated code to ensure data coherence and deadlock-free communication. We also present a new multicasting algorithm for a hypercube that eliminates unnecessary routing processors.

We have implemented a compiler tool system named PYRROS that integrates the above algorithms for scheduling and code generation. The input to this system is a C program with annotated dependence information, and the output is optimized parallel C code for the nCUBE-I, nCUBE-II and iPSC/860 machines. The experimental results show that the performance of the automatically produced code is comparable to that of hand-written code.