The BBN TC2000 is a scalable multiprocessor architecture that can support up to 512 computational nodes. The nodes are interconnected by a variant of a multistage cube network called the butterfly switch. Each node contains a 20 MHz Motorola 88100 microprocessor and memory that can be configured for local or shared access. The amount of each node's memory contributed to the interleaved shared memory pool is set at boot time.
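To make the "multistage cube network" concrete, the sketch below models generic destination-tag routing through a butterfly of radix-k switches. It is an illustration under stated assumptions, not a description of the TC2000 hardware. The switch radix of 8 (8-by-8 switching elements, so that three stages cover 512 = 8^3 nodes) is our assumption, and the helper names `stages_needed`, `to_digits`, and `butterfly_path` are hypothetical. At each stage the switch selects the output port given by one base-k digit of the destination address, most significant digit first. After stage i, the message occupies the row whose leading i+1 digits come from the destination and whose remaining digits still come from the source.

```python
def stages_needed(nodes: int, radix: int) -> int:
    """Smallest s with radix**s >= nodes (integer arithmetic, no floats)."""
    stages, capacity = 0, 1
    while capacity < nodes:
        capacity *= radix
        stages += 1
    return stages


def to_digits(x: int, radix: int, stages: int) -> list:
    """Base-`radix` digits of x, most significant first, padded to `stages`."""
    out = []
    for _ in range(stages):
        out.append(x % radix)
        x //= radix
    return out[::-1]


def from_digits(digits: list, radix: int) -> int:
    value = 0
    for d in digits:
        value = value * radix + d
    return value


def butterfly_path(src: int, dst: int, radix: int, stages: int) -> list:
    """Row occupied after each stage under destination-tag routing (hypothetical model)."""
    row = to_digits(src, radix, stages)
    tag = to_digits(dst, radix, stages)
    path = []
    for stage in range(stages):
        # The switch at this stage picks output port tag[stage],
        # overwriting one source digit with the matching destination digit.
        row[stage] = tag[stage]
        path.append(from_digits(row, radix))
    return path


if __name__ == "__main__":
    s = stages_needed(512, 8)
    print(s, [oct(r) for r in butterfly_path(0o012, 0o537, 8, s)])
```

Because each stage consumes one digit of the destination address, a route can be computed from the destination alone. No global routing table is needed.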
The parallel processes are forked one at a time with the nX system routine fork_and_bind, which creates a child process via the UNIX fork mechanism and binds the child to the specified processor node. The collection element tables and local collection elements are allocated in the local memory of each TC2000 node. For collection elements in shared memory, nX offers several placement choices: spreading them across node memories (e.g., interleaved or random), or placing them in a particular node's memory under a chosen caching policy (e.g., uncached, or cached with copy-back or write-through cache coherency). The TC2000 pC++ runtime system currently allocates each collection element in its "owner's" node memory with a write-through caching strategy.
The TC2000 has no special barrier synchronization hardware, so we implemented the logarithmic barrier algorithm described in . Our implementation takes approximately 70 microseconds to synchronize 32 nodes, and this time grows with the logarithm of the number of processors.
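The fork-then-bind pattern used by fork_and_bind can be approximated on a modern Linux system. The sketch below is a hypothetical stand-in and not the nX routine. It assumes Linux, where Python's `os.sched_getaffinity` and `os.sched_setaffinity` are available, and it treats a CPU as the analogue of a TC2000 processor node. The function names `fork_and_bind_sketch`, `check_binding`, and `spawn` are ours. The parent forks the children one at a time, each child pins itself to one CPU before running its work function, and the parent then collects the exit codes.

```python
import os


def fork_and_bind_sketch(node: int, work) -> int:
    """Hypothetical analogue of nX fork_and_bind (Linux only).

    Forks a child, pins the child to a single CPU chosen from `node`,
    runs work(node) in the child, and returns the child's pid to the parent.
    work(node) must return an integer exit code.
    """
    cpus = sorted(os.sched_getaffinity(0))
    cpu = cpus[node % len(cpus)]  # wrap around if there are fewer CPUs than nodes
    pid = os.fork()
    if pid == 0:  # child process
        code = 1
        try:
            os.sched_setaffinity(0, {cpu})  # bind this process to one CPU
            code = int(work(node))
        finally:
            os._exit(code)  # never fall back into the parent's code path
    return pid


def check_binding(node: int) -> int:
    """Exit code 0 if the calling process is pinned to exactly one CPU."""
    return 0 if len(os.sched_getaffinity(0)) == 1 else 1


def spawn(nprocs: int, work) -> list:
    # Children are forked one at a time, then reaped in creation order.
    pids = [fork_and_bind_sketch(node, work) for node in range(nprocs)]
    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return codes


if __name__ == "__main__":
    print(spawn(4, check_binding))
```

Binding in the child immediately after `fork` mirrors the idea of attaching each process to its node before it touches any node-local data. On the TC2000 the same step determines which local memory the process's collection elements live in.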
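The source does not say which logarithmic barrier was used, so the sketch below substitutes one well-known algorithm, the dissemination barrier, purely for illustration. It uses Python threads in place of TC2000 processes, and the names `rounds`, `DisseminationBarrier`, and `demo` are hypothetical. In round r, participant i signals participant (i + 2^r) mod n and waits for a signal from (i - 2^r) mod n. After ceil(log2 n) rounds, every participant has transitively heard from all the others, so no one can leave before everyone has arrived.

```python
import threading


def rounds(n: int) -> int:
    """ceil(log2 n): number of rounds a dissemination barrier needs."""
    return (n - 1).bit_length()


class DisseminationBarrier:
    """Reusable dissemination barrier for n threads (illustrative sketch)."""

    def __init__(self, n: int):
        self.n = n
        self.k = rounds(n)
        # flags[r][i] counts signals delivered to participant i in round r.
        # Counting semaphores keep the barrier correct across repeated episodes.
        self.flags = [[threading.Semaphore(0) for _ in range(n)]
                      for _ in range(self.k)]

    def wait(self, i: int) -> None:
        for r in range(self.k):
            partner = (i + (1 << r)) % self.n
            self.flags[r][partner].release()  # signal the partner 2**r ahead
            self.flags[r][i].acquire()        # wait for the one 2**r behind


def demo(n: int = 8, episodes: int = 3) -> bool:
    """True if no thread ever passes a barrier before all n have arrived."""
    barrier = DisseminationBarrier(n)
    arrived = [[False] * n for _ in range(episodes)]
    ok = [True] * n

    def worker(i: int) -> None:
        for e in range(episodes):
            arrived[e][i] = True
            barrier.wait(i)
            if not all(arrived[e]):
                ok[i] = False

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(ok)


if __name__ == "__main__":
    print(rounds(32), demo())
```

If the cited algorithm behaves like this one, synchronizing 32 nodes takes rounds(32) = 5 rounds, so the measured 70 microseconds would correspond to roughly 14 microseconds per round. A tournament-style barrier, with separate arrival and wake-up phases, would take about twice as many steps for the same node count.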