Colloquium Details

Redesigning the Cluster Architecture (or How to Build an 8TF 1024 Node Cluster)

Author: Matt Sottile, Los Alamos National Laboratory
Date: December 02, 2002
Time: 15:30
Location: 220 Deschutes

Note: Special Day

Abstract

Clusters have proven to be capable of supporting high-performance scientific computations over the past decade, yet the technologies we use to build and manage them are the same as, or in some ways worse than, those used over a decade ago. This is due in part to the fact that, with great human effort, these technologies can be made to work for today's clusters with node counts in the hundreds. As the DOE and others begin investing in terascale clusters with node counts in the thousands, it is becoming increasingly clear that existing technologies will not scale. In this talk, we describe Clustermatic, our redesigned cluster architecture, and some of the key technologies (e.g., LinuxBIOS, BProc) that distinguish it from other approaches. These technologies are all open source and can be used independently of each other, but together they form an architecture that is both scalable and manageable at scale. Clustermatic has been used to build several mid-sized clusters, which have served as proving grounds for a 1024-node cluster that will be installed at LANL in January 2003.

The LANL Cluster home page is at http://www.acl.lanl.gov/cluster

Biography