Abstract
Problems in the class of unstructured sparse matrix computations are characterized by highly irregular dependencies and communication patterns that are not known at compile time, but can be completely determined at run time before the computations are actually performed. For this class of problems, current parallelizing compilers are unable to produce efficient code on large-scale distributed memory MIMD multiprocessors, and manual techniques are inflexible and too ad hoc to be generally effective. In this thesis, we propose a run-time automatic partitioning and scheduling methodology for unstructured sparse matrix computations on large-scale multiprocessors. Our methodology is based on extracting information from the problem instance by preprocessing its symbolic structure, and using this information to achieve high performance in repeated iterations of the computations during which the symbolic structure is unchanged. We present efficient software tools to help users build parallelization systems by following this methodology. We demonstrate the efficacy of our methodology on sparse Cholesky factorization, which has historically proven hard to parallelize. The highlight of our approach is a new two-dimensional block partitioning scheme. We build a run-time parallel system for block sparse Cholesky factorization called Sparse Hybrid Automatic Parallelization Environment (SHAPE), consisting of a parallel partitioner, a parallel scheduler, and a parallel communication optimization algorithm. These are modular tools tied together by an explicit representation for block-based unstructured computations. We employ SHAPE to carry out an extensive experimental study of sparse Cholesky factorization on the iPSC/860.
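The preprocess-once, execute-many strategy described above is often called the inspector/executor pattern. A minimal sketch of the idea follows; the function names and the greedy load-balancing rule are hypothetical illustrations, not the thesis's actual SHAPE interfaces.

```python
# Hedged sketch of run-time preprocessing (inspector/executor style).
# All names here are illustrative, not the SHAPE tools themselves.

def inspect(row_nnz, nprocs):
    """Inspector: derive a partition from the symbolic structure alone,
    assigning each row to the currently least-loaded processor."""
    loads = [0] * nprocs
    owner = []
    for nnz in row_nnz:
        p = loads.index(min(loads))  # greedy load balancing
        owner.append(p)
        loads[p] += nnz
    return owner

# The inspector runs once; its cost is amortized over every later
# iteration that reuses the same symbolic structure.
owner = inspect([5, 1, 4, 2, 3, 5], nprocs=2)
print(owner)  # [0, 1, 1, 0, 1, 0]
```

Because the sparsity pattern is fixed across iterations, the partition and schedule computed here would be reused unchanged on every subsequent numerical pass.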
The experimental results show that with a judicious choice of partitioning parameters, our block-based partitioning and scheduling method delivers higher performance than a well-known column-based method on a variety of structured and unstructured matrices. The preprocessing itself is shown to be very efficient, its cost being recovered within a small number of factorization iterations. Our methodology and tools may be used to parallelize other unstructured sparse matrix computations for which the same symbolic structure is reused across several iterations of the computations. Such computations include sparse triangular solution and sparse matrix-vector multiplication.