Abstractions, algorithms, and tools for the mapping of parallel algorithms to message-passing machines when both the computation and the interconnection network are regular in structure.
Funding: Abstractions and Algorithms for Mapping Parallel Algorithms to Parallel Architectures, supported by NSF MIP91-08528 and OACIS. Virginia Lo also supported by CCR-8808532; Sanjay Rajopadhye (rajopadh@irisa.fr) also supported by MIP-8802454.
Authors: Virginia M. Lo, Sanjay Rajopadhye, Samik Gupta, David Keldsen, Moataz A. Mohamed, Bill Nitzberg, Jan Arne Telle, and Xiaoxiong Zhong
The OREGAMI project involves the design, implementation, and testing of algorithms for mapping parallel computations to message-passing parallel architectures. This paper is an overview of the project and describes the underlying formal model used by OREGAMI, the LaRCS graph description language, and gives an overview of OREGAMI's mapping algorithms and the OREGAMI software tool.
Authors: Virginia M. Lo, Sanjay Rajopadhye, Moataz A. Mohamed, Samik Gupta, Bill Nitzberg, Jan Arne Telle, and Xiaoxiong Zhong
LaRCS is a graph description language which enables the
programmer of parallel algorithms to specify information about the
static and temporal communication structure of parallel
algorithms. The information contained in a LaRCS program is used
for the mapping problem: to assign tasks in the parallel
computation to processors, and to route inter-task messages along
the links of the interconnection network. This paper describes the
LaRCS language, gives a comparison to other graph description
languages, and illustrates its use in mapping.
Author: Virginia M. Lo
The Temporal Communication Graph (TCG) is the graph theoretic
model of parallel computation underlying OREGAMI. It can be viewed
as an augmented version of Lamport's process-time graphs. The TCG
integrates the classic static task graph of Stone and Bokhari with
the DAG model used for scheduling.
Authors: Virginia M. Lo, Rajen Datta, and Kurt Windisch
METRICS is software tool for the static (compile time) analysis
of mappings. It is designed for analysis of message-passing
parallel computations that have been mapped to machines with mesh
or hypercube toplogies, including the iPSC/2, iPSC/860, NCUBE-640,
NCUBE-320, Paragon, Touchstone Delta, and iWarp.
Techreport number CIS-TR-92-22 - mesh-binomial.old.ps.Z
As Submitted for publication in TPDS - mesh-binomial.ps.gz
Authors: Virginia M. Lo, Sanjay Rajopadhye, Jan Arne Telle, and Xiaoxiong Zhong
This paper addresses the problem of mapping divide-and-conquer programs to mesh-connected multicomputers with wormhole or store-and-forward routing. Performance metrics are defined and the two mapping algorithms are described and analyzed. The Reflecting Mapping is shown to be optimal with respect to "communication slowdown" for wormhole routing, while the Growing Mapping is shown to be better suited for store-and-forward.
Authors: Virginia Lo, Kurt Windisch, Wanqian Liu, Bill Nitzberg.
Current processor allocation techniques for highly parallel systems are typically restricted to contiguous allocation strategies for which performance suffers significantly due to the inherent problem of fragmentation. As a result, message passing systems have yet to achieve the high utilization levels exhibited by traditional vector supercomputers. We are investigating processor allocation algorithms which lift the restriction on contiguity of processors in order to address the problem of fragmentation.
Three non-contiguous processor allocation strategies: Paging allocation, Random allocation and the Multiple Buddy Strategy (MBS) are proposed and studied in this paper. Simulations compare the performance of the non-contiguous strategies with that of several well-known contiguous algorithms. We show that non-contiguous allocation algorithms perform better overall than the contiguous ones, even when message-passing contention is considered. We also present the results of experiments on an Intel Paragon XP/S-15 with 208 nodes that show non-contiguous allocation is feasible with current technologies.
Authors: Kurt Windisch, Virginia Lo, and Bella Bose.
Efficient utilization of processing resources in a large, multi-user parallel computer depends on processor allocation algorithms that minimize system fragmentation. We propose three processor allocation algorithms for the k-ary n-cube class of parallel architectures, which includes the hypercube and multidimensional torus. The k-ary Partner strategy is a conventional contiguous processor allocation strategy that improves subcube recognition. The non-contiguous Multiple Buddy and Multiple Partner strategies lift the restriction of contiguity in order to address the problem of fragmentation associated with contiguous strategies. Simulations compare the performance of these three strategies with the performance of other k-ary n-cube allocation strategies, showing that non-contiguous allocation provides significantly increased system utilization by eliminating fragmentation.
Authors: Wanqian Liu, Virginia M. Lo, Kurt Windisch, Bill Nitzberg.
Current processor allocation techniques for highly parallel systems have thus far been restricted to contiguous allocation strategies for which performance suffers significantly due to the inherent problem of fragmentation. We are investigating processor allocation algorithms which lift the restriction on contiguity of processors in order to address the problem of fragmentation. Three non-contiguous processor allocation strategies: Naive, Random and the Multiple Buddy Strategy (MBS) are proposed and studied in this paper. Simulations compare the performance of the non-contiguous strategies with that of several well-known contiguous algorithms. We show that non-contiguous allocation algorithms perform better overall than the contiguous ones, even when message-passing contention is considered. We also present the results of experiments on an Intel Paragon XP/S-15 with 208 nodes that show non-contiguous allocation is feasible with current technologies.
Authors: Kurt Windisch, Jayne Valenti Miller, Virginia Lo.
ProcSimity is a software tool that supports research in processor allocation and scheduling for highly parallel systems. Allocation involves selection of a set of processors for a newly arrived job, while scheduling refers to the job level scheduling discipline. ProcSimity's multicomputer simulator supports experimentation with selected allocation and scheduling algorithms on architectures with a range of network topologies and for several current routing and flow control mechanisms (wormhole, virtual cut-through, and store-and-forward). Message-passing can be simulated in detail at the flit level or at a higher level of modeling. Our tool supports both stochastic job streams as well as communication patterns from actual parallel applications, including several of the NAS parallel benchmarks. ProcSimity's visualization and performance analysis tool allows the user to view a dynamic animation of the selected algorithms as well as a variety of system and job level performance metrics. ProcSimity has been successfully used in experiments investigating the feasibility of non-contiguous processor allocation in meshes.
Authors: Virginia Lo, Kurt Windisch, Bill Nitzberg, Dror Feitelson, and Reagan Moore.
Authors: Virginia Lo, Kurt Windisch, Bill Nitzberg, Dror Feitelson, and Reagan Moore.
Authors: Virginia Lo, Kurt Windisch, Bill Nitzberg, Dror Feitelson, and Reagan Moore.