Colloquium Details
Redesigning the Cluster Architecture (or How to Build an 8TF 1024 Node Cluster)
Author: | Matt Sottile, Los Alamos National Laboratory |
---|---|
Date: | December 02, 2002 |
Time: | 15:30 |
Location: | 220 Deschutes |
Note: Special Day
Abstract
Clusters have proven capable of supporting high-performance scientific computations over the past decade, yet the technologies we use to build and manage them are the same as, or in some ways worse than, those used over a decade ago. This is due in part to the fact that, with great human effort, these technologies can be made to work for today's clusters with node counts in the hundreds. As the DOE and others begin investing in terascale clusters with node counts in the thousands, it is becoming increasingly clear that existing technologies will not scale. In this talk, we describe Clustermatic, our redesigned cluster architecture, and some of the key technologies (e.g., LinuxBIOS, BProc) that distinguish it from other approaches. These technologies are all open source and can be used independently of each other, but together they form an architecture that is both scalable and manageable at scale. Clustermatic has been used to build several mid-sized clusters, which have served as proving grounds for a 1024-node cluster that will be installed at LANL in January 2003.
The LANL Cluster Home Page is at http://www.acl.lanl.gov/cluster
Biography