The KSR-1 is a shared virtual memory, massively parallel computer. The memory is physically distributed on the nodes and organized as a hardware-coherent distributed cache. The machine can scale to 1088 nodes, in clusters of 32. Nodes in a cluster are interconnected with a pipelined slotted ring. Clusters are connected by a higher-level ring. Each node has a superscalar 64-bit custom processor, a 0.5 Mbyte local sub-cache, and a 32 Mbyte local cache memory.
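The configuration figures above imply some simple arithmetic, which the following C++ sketch makes explicit. The node numbering (ids 0 to 1087, with consecutive ids placed on the same ring) and the names `ring_of` and `same_ring` are assumptions made for illustration only; they do not describe how the KSR-1 actually identifies nodes.

```cpp
#include <cstddef>

// Figures quoted in the text.
constexpr std::size_t kNodesPerRing = 32;    // nodes per cluster (one ring)
constexpr std::size_t kMaxNodes     = 1088;  // largest configuration
constexpr std::size_t kLocalCacheMB = 32;    // local cache memory per node

// Derived quantities for a fully populated machine.
constexpr std::size_t kMaxRings     = kMaxNodes / kNodesPerRing;   // 34 rings
constexpr std::size_t kTotalCacheMB = kMaxNodes * kLocalCacheMB;   // 34816 Mbyte

static_assert(kMaxNodes % kNodesPerRing == 0,
              "the maximum configuration is a whole number of rings");

// ASSUMPTION (hypothetical numbering): nodes are numbered 0..kMaxNodes-1
// and consecutive ids share a ring.
constexpr std::size_t ring_of(std::size_t node) {
    return node / kNodesPerRing;
}

// True when two nodes sit on the same lower-level ring, so that traffic
// between them does not have to cross the higher-level ring.
constexpr bool same_ring(std::size_t a, std::size_t b) {
    return ring_of(a) == ring_of(b);
}
```

Under these assumptions a full machine consists of 34 rings and holds 34816 Mbyte (34 Gbyte) of local cache memory in total, and `same_ring(31, 32)` is false because node 32 is the first node of the second ring.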
For the pC++ runtime system implementation, we used the POSIX thread package with a KSR-supplied extension for barrier synchronization. The collection allocation strategy is exactly the same as for the Sequent, except that no special shared memory allocation is required; data is automatically shared between threads. However, the hierarchical memory system of the KSR is more complex than that of the Sequent machine. Latencies for accessing data in the local sub-cache and the local cache memory are 2 and 18 cycles, respectively. Latencies between node caches are significantly larger: 150 cycles within the same ring and 500 cycles across rings. Although our current implementation simply calls the standard memory allocation routine, we suspect that more sophisticated memory allocation and management strategies will be important in optimizing performance on the KSR.
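To illustrate why data placement may matter, the latencies quoted above can be combined into a rough expected cost per memory reference. The sketch below is a simplified model, not a measurement: it assumes each reference is satisfied at exactly one level, that the quoted latency is the full cost of that reference, and that contention on the rings is ignored. The type `AccessMix` and the function `expected_cycles` are hypothetical names introduced here.

```cpp
// Latencies in processor cycles, as quoted in the text.
constexpr double kSubCacheCycles   = 2.0;    // local sub-cache
constexpr double kLocalCacheCycles = 18.0;   // local cache memory
constexpr double kSameRingCycles   = 150.0;  // another node, same ring
constexpr double kCrossRingCycles  = 500.0;  // another node, different ring

// ASSUMPTION (hypothetical): fraction of references satisfied at each
// level of the hierarchy. The four fractions are expected to sum to 1.
struct AccessMix {
    double sub_cache;
    double local_cache;
    double same_ring;
    double cross_ring;
};

// Weighted average latency per reference under the simplified model.
constexpr double expected_cycles(AccessMix m) {
    return m.sub_cache   * kSubCacheCycles
         + m.local_cache * kLocalCacheCycles
         + m.same_ring   * kSameRingCycles
         + m.cross_ring  * kCrossRingCycles;
}
```

As a purely illustrative case, a mix of 90% sub-cache, 8% local cache, 1.5% same-ring, and 0.5% cross-ring references gives about 8 cycles per reference under this model, with more than half of that cost coming from the 2% of references that leave the node. This is the kind of effect that an allocation strategy aware of the memory hierarchy would aim to reduce.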