Cornell Theory Centre Migrates to Dell Clusters
By Cornell Theory Centre (Issue 4 2001)
The Cornell Theory Centre has found that high-performance cluster systems from Dell can provide leading-edge performance while improving reliability and manageability. This article describes the centre's Velocity cluster complex and some of the applications that run on it.
The Cornell Theory Centre (CTC) is a centre of excellence in high-performance computing (HPC) and interdisciplinary research located at Cornell University. CTC supports faculty and staff from more than 100 different research areas as well as corporate clients that require leading-edge computational resources.
As one of the four original National Science Foundation supercomputing centres, CTC ran proprietary UNIX®-based systems from IBM, SGI, and others for more than 10 years. At the sunset of its national mission, CTC began to look for more cost-effective computing solutions that would provide leading-edge performance for its users while improving reliability and manageability. CTC has found that high-performance cluster systems from Dell can more than meet this challenge.
Building the Velocity cluster
In 1997, CTC received its first Dell® cluster as part of the Intel® Technology for Education 2000 program. This trial system consisted of 12 Dell PowerEdge® servers, six with dual Pentium® II 300 MHz symmetric multiprocessing (SMP) processors and six with dual Pentium II 450 MHz (SMP) processors, all running Microsoft® Windows NT® 4.0 Server. Ethernet (100 Mbps) was used for the cluster interconnect.
CTC wanted a platform that provided all the development and systems tools necessary to make it as functional for CTC users as the CTC production UNIX system. The platform also needed to focus on commercially available and supported tools.
Today, CTC runs more than 600 processors in a variety of cluster configurations. These range from the original production systems, Velocity with 256 processors and Velocity+ with 128, to smaller systems customized for data warehousing, bioinformatics, and materials science.
Commercially available tools
CTC believed that both hardware and software in an industry-standard solution should be readily available and competitively priced. This means that it should be possible for anyone to duplicate the CTC HPC environment. CTC's challenge was to identify the commercially available tools necessary to offer quality production computing services to the CTC user community. CTC worked with software vendors to pull together the essentials, such as message-passing libraries, performance tools, math libraries, and compilers.
However, CTC did not find an adequate solution for parallel job scheduling, so it developed the ClusterController® scheduling system, which leverages the Windows® domain security model. ClusterController is now fully supported and commercially available from MPI Software Technology, Inc.
Once the tools were in place, installation and implementation of CTC's first large-scale cluster, Velocity, proceeded smoothly. Velocity comprises 64 Dell PowerEdge 6250 servers with quad Pentium III Xeon™ 500 MHz processors (SMP) that have 2 MB cache per processor and 4 GB RAM per node. Each node has 54 GB of disk capacity (redundant array of independent disks, RAID-0, striped set). The interconnects are an Emulex® (formerly Giganet) switch (100 MB/sec) and 100 Mbps switched Ethernet. See Figure 1.
Figure 1. Front view of CTC Velocity cluster system
CTC installed Velocity in less than 10 hours, brought it online just over 24 hours after the hardware delivery, and had applications running within a week. This system runs Windows 2000.
Price, performance, and reliability
The price for the initial Velocity system was approximately one-fifth of the previous system. The inclusion of three years' maintenance in the price dramatically reduced total cost of ownership (TCO).
Within a few months, CTC ran the MP-Linpack benchmark on this machine and achieved 50 gigaflops (GFLOPS), which at the time was enough to make Velocity one of the first two Windows systems on the TOP500 Supercomputer Sites list. The other system was at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.
CTC achieved 99.9986 percent availability (as independently verified by Massachusetts Institute of Technology) during the first three months it ran Windows 2000 on Velocity, and by late 2001, CTC reached 99.99999 percent across the entire machine room.
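To put these availability percentages in more familiar terms, the short Python sketch below converts them into implied downtime. It is an illustration only: the function name is hypothetical, "three months" is assumed to mean 90 days, and because no measurement period is given for the machine-room figure, that one is expressed per 365-day year.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_percent, period_days):
    """Return the minutes of downtime implied by an availability figure."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * MINUTES_PER_DAY


# Assumption: "three months" is taken as 90 days.
velocity_minutes = downtime_minutes(99.9986, period_days=90)

# Assumption: no period is stated for the machine-room figure,
# so it is expressed here per 365-day year, in seconds.
machine_room_seconds = downtime_minutes(99.99999, period_days=365) * 60

print(f"Velocity, first three months: {velocity_minutes:.2f} minutes")
print(f"Machine room, per year: {machine_room_seconds:.2f} seconds")
```

Under these assumptions, the figures correspond to roughly two minutes of downtime over the quarter and a few seconds per year.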
Once Velocity was up and running, the CTC consulting staff worked with the CTC user community to port their applications from UNIX to Windows. In many cases, this process was straightforward. However, several million lines of legacy code benefited from the CTC staff's experience with the Windows environment and tools like Cygwin, which assist migrations from UNIX to Windows.
In addition, CTC established a Collaboratory, a facility equipped with 10 high-end Dell workstations and all the necessary tools for training and development, where users could port their applications with the support of CTC consultants.
Velocity for parallel applications
The Velocity system was saturated with users within six months of going online, and CTC soon installed an additional system named Velocity+ to support strategic users who could take advantage of the entire system for parallel applications.
Velocity+ consists of 64 Dell PowerEdge 2450 servers with dual Pentium III 733 MHz processors and 2 GB of RAM per node. Each processor has 256 KB cache. Each node has 27 GB disk capacity (RAID-0), and the system uses Emulex cLAN™ interconnect technology. Velocity+, with half the processors (and installed at one-third the cost) of the original Velocity cluster, also achieved 50 GFLOPS on MP-Linpack.
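The comparison between the two clusters can be made concrete with a little arithmetic. The Python sketch below uses only figures quoted in this article (node counts, processors per node, and the 50 GFLOPS MP-Linpack results) to compute sustained GFLOPS per processor. The function and variable names are illustrative and are not part of any CTC tooling.

```python
def gflops_per_processor(total_gflops, nodes, processors_per_node):
    """Return sustained benchmark GFLOPS per processor for a cluster."""
    return total_gflops / (nodes * processors_per_node)


# Velocity: 64 quad-processor nodes, 50 GFLOPS on MP-Linpack.
velocity = gflops_per_processor(50.0, nodes=64, processors_per_node=4)

# Velocity+: 64 dual-processor nodes, also 50 GFLOPS on MP-Linpack.
velocity_plus = gflops_per_processor(50.0, nodes=64, processors_per_node=2)

ratio = velocity_plus / velocity

print(f"Velocity:  {velocity:.3f} GFLOPS per processor")
print(f"Velocity+: {velocity_plus:.3f} GFLOPS per processor")
print(f"Velocity+ delivers {ratio:.1f}x the per-processor result")
```

Because both systems reached the same benchmark total, halving the processor count doubles the sustained result per processor.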
As the queue of parallel users grew for this system, CTC implemented a special-purpose cluster of 36 serial nodes for code development and serial users, so that the parallel users received priority access to the Velocity systems. CTC also installed a small development cluster with eight dual-processor Pentium III Dell PowerEdge 1550 servers. The short time limit set on this system provides researchers quick turnaround on debugging and tuning runs.
Custom clusters for research groups
Research groups soon realized that they could afford to purchase their own custom systems for applications that range from bioinformatics to demographics. For example, the Cornell Institute for Social and Economic Research (CISER) system is a 32-processor cluster dedicated to running SAS, both parallel and serial, against secure census data. One of the groups that drove the purchase and installation of Velocity+, the Computational Materials Institute (CMI), later purchased a dedicated 64-processor parallel system designed to meet its needs for compute-intensive simulations in computational materials science.
When integrated into the overall computing environment with secure access for specific user groups, these custom clusters allow for best use of shared resources and demonstrate the flexibility of Dell server systems to meet a variety of needs.
An important piece of any HPC environment is a scientific visualization system such as the Cave Automatic Virtual Environment (CAVE) virtual reality system. CTC has recently implemented the world's first Windows-based CAVE with software from VRCO, Inc. CTC's CAVE is powered by five Dell Precision® 620 workstations with Xeon processors and 3Dlabs® Wildcat™ graphics cards. This solution operates under the same Windows security domain and job scheduling system as the Velocity complex, making possible application steering for parallel simulations.
The other advantage of the Windows-based CAVE is the price/performance. CTC purchased the entire system for the price of six months' maintenance on the previous system, and performance is better. In addition, CTC recently purchased several Dell OptiPlex™ workstations with stereo-capable graphics cards for a CAVE Collaboratory. These systems are more than adequate for development, and the applications port seamlessly to the CAVE from the stereo desktops.
CTC strives to implement and demonstrate new ways to provide user-friendly and efficient interfaces to HPC systems, such as Web-based eScience applications. CTC has developed applications for bioinformatics, fracture mechanics, and computational finance that allow researchers to enter their data and an e-mail address on a custom Web page that submits an appropriate job to the Velocity clusters and then sends the results via e-mail to the user when the job is complete.
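The submit-run-notify flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CTC's implementation: the `submit` and `run_next` functions, the in-memory queue, and the stand-in solver and mail sender are all hypothetical.

```python
from collections import deque


def submit(queue, input_data, email):
    """Validate a Web form submission and place a job on the queue."""
    if "@" not in email:
        raise ValueError("a valid e-mail address is required")
    job = {"input": input_data, "email": email}
    queue.append(job)
    return job


def run_next(queue, solver, send_mail):
    """Run the oldest queued job and mail the result to its submitter."""
    job = queue.popleft()
    result = solver(job["input"])
    send_mail(job["email"], result)
    return result


# Hypothetical stand-ins: `sum` plays the role of the simulation code,
# and appending to `outbox` plays the role of sending e-mail.
outbox = []
jobs = deque()
submit(jobs, [1.0, 2.0, 3.0], "researcher@example.edu")
run_next(jobs, solver=sum, send_mail=lambda to, body: outbox.append((to, body)))
print(outbox)
```

Passing the solver and mail sender in as parameters keeps the sketch self-contained. A real deployment would hand the job to the cluster's scheduler rather than run it in-process.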
Researchers use the fracture mechanics site primarily in production runs. The site provides a cookbook approach to computational materials science applications for students and researchers, from model mesh generation to time-based simulation of crack growth. Engineers can easily leverage the codes that have been developed for this application without learning all the flags and options for each code. This example can be applied directly to streamline industrial research and development (R&D).
CTC's current Web-based implementation interfaces with the job scheduler, and the queue represents a bottleneck. By extending these eScience applications to .NET using Microsoft Application Center 2000, CTC can run these types of problems in real time and use features such as automatic load balancing and failover. In this setup, the system automatically finds the server with the lightest load and the job runs immediately, eliminating the queue.
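The idea of sending each job to the server with the lightest load, while skipping servers that have failed, can be sketched as follows. This Python example is a generic least-loaded dispatcher written for illustration; it does not use or represent the Microsoft product's actual interface, and the `Server` class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_jobs: int = 0
    healthy: bool = True


def pick_server(servers):
    """Return the healthy server that currently has the fewest active jobs."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy server available")
    return min(candidates, key=lambda s: s.active_jobs)


def dispatch(servers):
    """Assign one job to the least-loaded healthy server; return its name."""
    server = pick_server(servers)
    server.active_jobs += 1
    return server.name


pool = [
    Server("node-a", active_jobs=3),
    Server("node-b", active_jobs=1),
    Server("node-c", active_jobs=0, healthy=False),  # failed node is skipped
]
first = dispatch(pool)
print(first)
```

Here `node-c` has the fewest jobs but is marked unhealthy, so the job goes to `node-b`. That filtering step is the failover behaviour in miniature.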
Working on the leading edge
CTC's focus on Dell systems has helped the centre stay at the leading edge of technology. For example, a team of CTC CMI staff researchers recently acquired one of the first Dell Precision 730 Itanium™ workstations and ported simulation code for fracture mechanics to this new architecture.
After working with Intel to optimize the code, the CMI team achieved a 340 percent speed improvement for single-processor execution on Itanium. The optimizations ranged from disambiguation and software pipelining to prefetching and tuning BLAS/LAPACK (Basic Linear Algebra Subprograms/Linear Algebra PACKage) routines. The speedup was sustained when they scaled up from one to four processors.
Industry-standard clusters help move HPC to Main Street
CTC has benefited from its move to Windows cluster computing on Dell systems. The performance meets the requirements of its demanding users, the systems are reliable and flexible, and the total cost of operation is a fraction of that for large custom systems. In addition, CTC has provided its users and staff with a fully integrated environment—from desktops to high-end server-based clusters and visualization systems.
When the migration began, the CTC team believed that if cluster computing were truly an industry-standard commodity, anyone should be able to purchase the software they need and install a cluster quickly without a sophisticated systems staff. CTC still believes that the ease of use and manageability that Microsoft Windows brings to the desktop will make high-performance computing truly available to the masses, not just to well-funded academic institutions and national laboratories. CTC continues its efforts to make parallel programming easy and to move supercomputing out of the laboratory (research, engineering, and other scientific disciplines) and onto Main Street.
Cornell Theory Centre (www.tc.cornell.edu) is a high-performance computing and interdisciplinary computational research centre located at Cornell University. Researchers associated with the centre work in fields such as genomics, digital materials, drug design, and financial risk analysis. CTC supports faculty and staff from more than 100 different research areas as well as corporate clients that require leading-edge computational resources.
For more information
TOP500 Supercomputer Sites: www.top500.org