To provide the programmer with a useful set of abstractions for high-performance computing, the facilities of a language (or abstract machine model) must be a subset of those provided by the type architecture, which in turn must be a subset of those of the physical machine. More importantly, the performance costs of these facilities must be accurately represented and apparent in the source code; this is known as the manifest cost principle [Cha95]. The Phase Abstractions programming model and the Candidate Type Architecture provide this foundation for ZPL.
The Candidate Type Architecture was proposed by Snyder [Sny86, Sny94] as representative of most parallel machines. The CTA is a set of von Neumann processors, each with its own local memory and program counter, connected by a sparse communication network (i.e., one of constant node degree). The network provides channels over which messages are sent with a latency that is large compared to local memory access time. For messages beyond a certain size, transmission time is proportional to the size of the message. The CTA also has a global controller processor, which is connected to each processor by a single channel.
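The communication cost described above can be sketched as a simple model: a fixed per-message latency, plus a per-byte cost once the message exceeds some threshold size. This is an illustrative reading of the CTA's cost assumptions, not a model given in [Sny86]; the constants and the function name are hypothetical.

```python
def message_time(size_bytes, latency=100.0, per_byte=0.1, threshold=64):
    """Estimated transmission time (arbitrary units) for one message.

    All parameter values are illustrative. Small messages are
    latency-dominated; large ones grow linearly with size, matching
    the CTA's assumption that latency is large relative to local
    memory access and that long messages cost time proportional
    to their length.
    """
    if size_bytes <= threshold:
        return latency
    return latency + per_byte * (size_bytes - threshold)
```

Under such a model, an efficient ZPL phase prefers few large messages over many small ones, since each message pays the full latency.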
The Phase Abstractions programming model is based on the XYZ programming levels, which provide a structure for organizing a computation.
ZPL (the Z Programming Language) consists of the SIMD subset of a complete parallel language known as A-ZPL [Lin93], and supports the Z Level, phase composition. Since this level is (conceptually) executed by the (sequential) global controller of the CTA, ZPL is an implicitly parallel language, consisting of sequences of phase invocations. Because ZPL is only a subset of a complete language (and no compiler for A-ZPL exists), these phase invocations are limited to operations on regions of parallel arrays.
A region is simply a set of indices. A region can be used to declare an array, or to specify the section of an array to which an operation is to be applied. The abstract machine model provides a separate thread of execution for each index value in a given region, so operations applied to regions of arrays are calculated in parallel. Inter-thread communication is specified at the array level, using shift operations, so communication patterns are always regular.
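The region semantics just described can be sketched as a toy model in Python. The `Region` class, the explicit index loop, and the dict-based arrays are illustrative stand-ins for ZPL's implicit per-index threads, not ZPL's actual implementation; the shifted reference `C[i + 1]` plays the role of an array-level shift operation.

```python
class Region:
    """A region is simply a set of indices (1-D here for brevity)."""
    def __init__(self, lo, hi):
        self.indices = range(lo, hi + 1)

def apply_over(region, out, f):
    # Conceptually, one thread per index computes f(i) independently;
    # this sequential loop stands in for that parallel execution.
    for i in region.indices:
        out[i] = f(i)

# Example mimicking an array statement over region R, where the
# reference to C is shifted by one index (a regular communication
# pattern in the CTA model).
R = Region(1, 4)
B = {i: i * 10 for i in range(1, 6)}
C = {i: i for i in range(1, 6)}
A = {}
apply_over(R, A, lambda i: B[i] + C[i + 1])
# A == {1: 12, 2: 23, 3: 34, 4: 45}
```

Because every thread applies the same shifted reference, the communication it implies is uniform across the region, which is what makes the pattern regular.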
Control variables such as loop counters can be arrays with associated regions, in which case the flow of control is said to shatter, and each thread in the region is controlled by the corresponding array value. A related means of specifying parallel computation is the promoted scalar function: the user can apply any function or procedure that contains no parallelism (i.e., no explicit references to parallel arrays) to all elements of an array in parallel.
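A promoted scalar function can be sketched as follows. The helper `promote` is hypothetical; in ZPL the promotion is implicit when a scalar function is applied to a parallel array. The list comprehension stands in for the per-element parallel threads.

```python
def clamp(x):
    # A purely scalar function: it makes no reference to parallel
    # arrays, so it is eligible for promotion.
    return max(0, min(x, 100))

def promote(f, array):
    # Conceptually, each element is processed by its own thread;
    # no inter-element communication is possible, so the elements
    # are trivially independent.
    return [f(x) for x in array]

result = promote(clamp, [-5, 42, 250])
# result == [0, 42, 100]
```

Because a promoted function cannot touch parallel arrays, each element's computation is independent, so no communication is needed and the promotion is always safe to parallelize.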
For more information about ZPL syntax and semantics, see [Sny96].