Abstract
Many parallel applications exhibit a behavior in which each computation entity communicates with a small set of other entities and the communication pattern changes slowly with respect to time. We call this phenomenon switching locality. The Interconnection Cached Network (ICN) is a reconfigurable network suitable for exploiting such locality. The ICN contains a number of small crossbar switches that connect clusters of processing elements to the input/output ports of a single large crossbar. Technology restrictions impose a trade-o between the size of a switch and its switching speed. By using a large crossbar for topology configuration, and small crossbars for the more frequent task of message switching, the ICN effectively combines the connectivity of the large switch with the speed of the smaller switches. This is analogous to the concept of memory caching. Embedding communication patterns efficiently in an ICN requires finding a special kind of partitioning, called a bounded l-contraction, of the corresponding communication graph. The problem of identifying whether a graph has a bounded l-contraction for a given integer l is NP-complete for l > 2. We extend the class of classical communication graphs that are known to have efficient embeddings in the ICN. For general graphs, we develop a heuristic algorithm based on simulated annealing to solve this partitioning problem. In addition to providing a mapping strategy for assigning processes to processing elements, this partitioning also generates the topology to which the ICN must be configured. For applications with sufficient switching locality, good mappings combined with topology reconfiguration in the ICN ensure that communication path lengths are uniformly short. Conventional networks are less successful in meeting these objectives. Using both analysis and simulations, we show that the ICN outperforms other networks, such as multistage interconnection networks and low degree k-ary n-cubes, in terms of message latency, highest sustainable throughput, processor utilization and application scalability.