A far more interesting benchmark in the NAS suite is the random sparse conjugate gradient computation. This benchmark requires the repeated solution of Ax = b, where A is a random sparse matrix. There are two ways to approach the parallelization of this task. The best, and most difficult, is to view the matrix as the connectivity graph for the elements in the vector. By a preprocessing step one can partition the vector into segments of nearly equal size that minimize the communication between subsets. The resulting algorithm can have the same low communication to computation ratio as our simple grid CG example above.
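To make the structure of the computation concrete, the following is a minimal sequential sketch of conjugate gradient for a sparse symmetric positive definite system Ax = b, written in plain C++ rather than pC++. The names (CsrMatrix, conjugate_gradient) and the compressed sparse row storage are illustrative assumptions, not taken from the benchmark code; the point is that the sparse matrix-vector product is the only step whose data access pattern depends on the matrix graph, and hence the step whose partitioning determines communication cost.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct CsrMatrix {                 // compressed sparse row storage (illustrative)
    std::size_t n;                 // dimension
    std::vector<std::size_t> row_ptr, col;
    std::vector<double> val;
    // y = A * x: the only step that needs communication when distributed
    std::vector<double> apply(const std::vector<double>& x) const {
        std::vector<double> y(n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                y[i] += val[k] * x[col[k]];
        return y;
    }
};

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Standard (unpreconditioned) conjugate gradient iteration for A x = b.
std::vector<double> conjugate_gradient(const CsrMatrix& A,
                                       const std::vector<double>& b,
                                       double tol = 1e-10,
                                       int max_iter = 1000) {
    std::vector<double> x(A.n, 0.0), r = b, p = b;  // x0 = 0, so r0 = b
    double rr = dot(r, r);
    for (int it = 0; it < max_iter && std::sqrt(rr) > tol; ++it) {
        std::vector<double> Ap = A.apply(p);
        double alpha = rr / dot(p, Ap);             // step length
        for (std::size_t i = 0; i < A.n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;                  // direction update weight
        rr = rr_new;
        for (std::size_t i = 0; i < A.n; ++i)
            p[i] = r[i] + beta * p[i];
    }
    return x;
}
```

In a graph-partitioned parallel version, each processor would own a contiguous segment of x and the matrix rows touching it, so apply() communicates only the vector elements on the cut between subsets; the dot products reduce to global sums.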
The second approach is not as efficient in terms of locality, but it is a more direct implementation of the benchmark. In this version we represent the matrix as a distributed random sparse matrix. More specifically, we represent the matrix as a Distributed Matrix of Sparse Matrices. The DistributedMatrix collection class in the pC++ library allows any class that has the algebraic structure of a ring to be the element of the matrix. Furthermore, an n by n matrix whose elements are m by m matrices has all the same mathematical properties as an nm by nm matrix. Indeed, this is the key property used in the blocking algorithms used in most BLAS level 3 libraries. Consequently, we can select the element class to be a sparse matrix. This distributed matrix of sparse matrices can then be used like an ordinary matrix.
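The ring-of-blocks property can be illustrated in a few lines of plain C++ (not the pC++ DistributedMatrix itself). Below, a small dense Block type defines + and *, making it a ring, and a generic matmul template then works identically whether its elements are scalars or Blocks; the Block and matmul names are assumptions for the sketch. Substituting a sparse matrix class for Block is exactly the construction the text describes.

```cpp
#include <array>
#include <cstddef>

struct Block {                       // dense 2x2 block; forms a ring under + and *
    std::array<std::array<double, 2>, 2> a{};  // zero-initialized
    Block operator+(const Block& o) const {
        Block r;
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                r.a[i][j] = a[i][j] + o.a[i][j];
        return r;
    }
    Block operator*(const Block& o) const {
        Block r;
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int k = 0; k < 2; ++k)
                    r.a[i][j] += a[i][k] * o.a[k][j];
        return r;
    }
};

// Generic matrix product over any ring element type T: the same code serves
// doubles and Blocks alike, which is the property blocked BLAS-3 codes exploit.
template <typename T, std::size_t N>
std::array<std::array<T, N>, N>
matmul(const std::array<std::array<T, N>, N>& A,
       const std::array<std::array<T, N>, N>& B) {
    std::array<std::array<T, N>, N> C{};   // value-initialized to ring zero
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
    return C;
}
```

Multiplying a 2 by 2 matrix of 2 by 2 Blocks with this template produces, block by block, the same result as the corresponding flat 4 by 4 product, so the blocked matrix can be used like an ordinary one.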