Advanced Micro Devices

AMD Virtualization Journal


Linux.SYS-CON.com Cover Story: Rapid Cluster Deployment

From delivery to production in hours

After building a number of clusters from the ground up - including one that made it to the Top500 Supercomputer list - I decided to try a service that many vendors now offer: having the system racked and stacked at the factory, then shipped to us. Such a service saves a huge amount of time, not to mention my back, since we don't have to build the cluster and cable all the equipment together. I've been a fan of well-cabled systems and have found the factory quality control to be acceptable. The key is nailing down the pre-build requirements and verifying them before the system is built; this ensures the system that ships is the one you expect when it arrives at your front door. A fair amount of cabling may still be needed once it arrives if you have a multi-rack configuration, but it's usually limited to plugging in the system's power and public network.

Once this is done, the fun begins...
I've tried a few cluster distribution toolkits, and the one that works for me is the Rocks Cluster Distribution from the San Diego Supercomputer Center. I came across the package in a simple Google search in 2002 and was immediately sold on it. I use the term "sold" loosely, since it's available for download under an Open Source BSD-style license and supported by a broad range of technical people who answer most questions on the Rocks user list. I've found support on the list to be better than that of most commercial distributions, though this may be because there are over 500 registered systems on the Rocks Register.

Here's how simple it is: insert the boot CD, complete a few screens' worth of configuration data, and grab a coffee while the fairly simple base installation runs. The Rocks solution is extensible, with a mechanism for users and software vendors to ensure customizations are correctly installed on the system at setup. The mechanism is called a Roll.

The Roll typically consists of packages (RPMS/SRPMS/source) that have to be installed and scripts that are needed to ensure the packages are properly installed and distributed on the cluster. The Rocks team has extensive documentation for the Roll developer in the user manual.

Rocks 4.0.0 is a "cluster on a CD" set. That is, it contains all the bits and configuration needed to build a cluster from "naked" hardware. The core OS bundled with Rocks is CentOS 4, a freely downloadable rebuild of Red Hat Enterprise Linux 4. As a side note, in Rocks CentOS 4 is encapsulated as the "OS Roll," and this OS Roll can be substituted with any Red Hat Enterprise Linux 4 rebuild (e.g., Scientific Linux), including the official bits from Red Hat. Rolls are used in Rocks to customize your cluster. For example, the HPC Roll contains cluster-specific packages, such as an MPI environment for developing and running parallel programs. Two other examples are the Ganglia Roll, which provides cluster-monitoring tools, and the Area51 Roll, which provides security tools such as Tripwire and chkrootkit.

The Software
The core OS we used for the cluster in this article is CentOS 4.0, and the rolls we used to customize the cluster to our needs were the Compute Roll and the PBS Roll from the University of Tromsø in Norway.

The Hardware

  • 1 - Front-end node - a Dell PowerEdge 2850 with dual 3.6GHz Intel Xeon EM64T processors and 4GB RAM
  • 48 - Compute nodes - Dell PowerEdge SC 1425s with dual 3.4GHz Intel Xeon EM64T processors, 2GB RAM and a Topspin PCI-X InfiniBand HCA card
  • 1 - Topspin 270 InfiniBand chassis with modules
  • 4 - Dell PowerConnect 5324 Gigabit Ethernet switches
  • 1 - Panasas Storage Cluster with one DirectorBlade and 10 StorageBlades
  • 2 - Dell 19-inch racks

    Start the build process ***time 0:00:00***
    Setting up the front-end:
    - Insert Compute Roll and boot the system
    - Select hpc, kernel, ganglia, base, java, and area51 as the rolls to install
    - Select "Yes" for additional roll
    - Insert CentOS disk 1
    - Select "Yes" for additional roll
    - Insert CentOS disk 2
    - Select "Yes" for additional roll
    - Insert PBS roll
    - Select "No" for additional rolls
    - Input data on the configuration screen (e.g., fully qualified domain name, root password, IP addresses)
    - Select "Disk Druid" to create partitions
    - Create the / partition: ext3, 64GB
    - Create a swap partition: 4GB
    - Create the /export partition: 64GB
    - Insert CDs as requested to merge them into the distribution

The most important step...grab a mocha and enjoy it while the install runs.

After the front-end installation completes, the site-specific customization of the front-end starts. The base installation of CentOS 4.0 x86_64 has the 2.6.9-5.0.5.ELsmp kernel, and we need 2.6.9-11.ELsmp for many of the packages that will be included with our cluster. Below we'll describe how we do this key upgrade, then continue with many package and mount-point customizations.

Customization of the front-end:
The first step is to apply the updated kernel packages:

  • # rpm -ivh kernel-smp-2.6.9-11.EL.x86_64.rpm
  • # rpm -ivh kernel-smp-devel-2.6.9-11.EL.x86_64.rpm
  • # rpm -ivh kernel-sourcecode-2.6.9-11.EL.x86_64.rpm

    I always check /boot/grub/grub.conf to be sure the system is booting from the proper kernel after an update.
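This check is easy to script. The sketch below is only a guide - the kernel version matches this article's front-end, and the sample grub.conf fragment is hypothetical, since real layouts vary:

```shell
# Sketch: confirm grub's "default" entry points at the updated kernel.
check_default_kernel() {
  conf=$1; want=$2
  # Which title stanza does grub boot by default?
  idx=$(awk -F= '/^default/ {print $2}' "$conf")
  # Walk the title stanzas; succeed if the default one names the kernel.
  awk -v idx="$idx" -v want="$want" '
    /^title/ { if (n == idx && index($0, want)) found = 1; n++ }
    END { exit !found }' "$conf"
}

# Hypothetical grub.conf fragment as it might look after the kernel RPMs:
cat > /tmp/grub.conf.sample <<'EOF'
default=0
timeout=5
title CentOS (2.6.9-11.ELsmp)
        kernel /vmlinuz-2.6.9-11.ELsmp ro root=LABEL=/
title CentOS (2.6.9-5.0.5.ELsmp)
        kernel /vmlinuz-2.6.9-5.0.5.ELsmp ro root=LABEL=/
EOF

check_default_kernel /tmp/grub.conf.sample 2.6.9-11.ELsmp \
  && echo "default boots the updated kernel"
```

On the real system you'd point the function at /boot/grub/grub.conf instead of the sample file.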

    Then apply an RPM to resolve a known (to us) library issue:

  • # rpm -ivh compat-libstdc++-33-3.2.3-47.3.i386.rpm

    Prepare for Panasas Storage Cluster and Topspin integration on the front-end:

  • # rpm -ivh panfs-2.6.9-11.EM-mp-2.2.3-166499.27.rhel_4_amd64.rpm
  • # rpm -ivh topspin-ib-rhel4-3.1.0-113.x86_64.rpm
  • # rpm -ivh topspin-ib-mod-rhel4-2.6.9-11.ELsmp-3.1.0-113.x86_64.rpm

    Time for a break. ***time 1:05:00***

I need to complete the setup of the disks on the front-end because there are two RAID volumes and Rocks only configures the first disk (the boot disk) on the front-end leaving the other disks untouched.

Create a second partition for applications:

# fdisk /dev/sdb
(400GB single partition on our system)

Create the file system and mount point:

# mkfs -t ext3 /dev/sdb1
# mkdir /space

Modify /etc/fstab to include the mount point then mount it:

# mount /space
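The fstab entry itself is worth spelling out. A plausible line, given the single partition created on /dev/sdb above (the final two fields enable dump and boot-time fsck):

```
/dev/sdb1    /space    ext3    defaults    1 2
```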

Now let's start adding some of the goods...

  • Install Portland Group compilers in /space/apps/pgi/
  • Install Intel 9.0 compilers in /space/apps/intel
  • Install OSU MVAPICH 0.95 in /space/apps/mvapich
  • Build versions of MVAPICH for the Intel, GNU, and PGI compilers
  • Install our own version of Python in /space/apps/python64
  • Install f2py, Numeric, pyMPI built against our vanilla version of python64
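One way to keep the three MVAPICH builds straight is one install prefix per compiler family, selected by prepending the matching bin directory to PATH. The layout below is a hypothetical naming convention of ours, not anything mandated by MVAPICH:

```shell
# Hypothetical per-compiler layout for the MVAPICH builds listed above.
MVAPICH_ROOT=/space/apps/mvapich
mpi_bin() { echo "$MVAPICH_ROOT/$1/bin"; }   # $1 = intel | gnu | pgi

# A user selecting the PGI build prepends its bin directory to PATH:
PATH="$(mpi_bin pgi):$PATH"
echo "$PATH" | cut -d: -f1   # -> /space/apps/mvapich/pgi/bin
```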
The Rocks solution uses a simple XML framework to provide a programmatic way to apply site-specific customizations to compute nodes. While the XML syntax imposes a small learning curve, once you make the transition from "administering" your cluster to "programming" your cluster, you'll find that writing programs (in the form of scripts) is a powerful way to ensure your site customizations are consistent across all compute nodes. The following describes how we used the XML framework to apply our customizations.
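As an illustration of the framework, a site profile follows the extend-compute.xml convention documented in the Rocks user manual; the package name and post-install commands below are hypothetical placeholders, not our actual customizations:

```xml
<?xml version="1.0" standalone="no"?>
<!-- Hypothetical site profile, e.g.
     /home/install/site-profiles/4.0.0/nodes/extend-compute.xml -->
<kickstart>
  <description>Site-specific compute node customizations</description>

  <!-- Extra package to install on every compute node -->
  <package>compat-libstdc++-33</package>

  <!-- Shell commands run after package installation -->
  <post>
mkdir -p /space
  </post>
</kickstart>
```

Because the profile is applied at install time, every compute node that kickstarts picks up the same customizations automatically.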

More Stories By Steve Jones

Steve Jones is currently the technology operations manager at the Institute for Computational and Mathematical Engineering at Stanford University. Steve designed and administered a Top 500 Supercomputer and speaks regularly about the design and management of High Performance Computing Clusters, most recently as a keynote speaker at the annual Rocks-a-Palooza conference at the San Diego Supercomputer Center. His free time is spent with his significant other, Leilani, far away from a keyboard. More information about Steve can be found at http://www.hpcclusters.org.
