A lot of time and work has passed since our blade server technology proposal, proof-of-concept, purchase, installation, and configuration. Now we are getting ready to implement one of the cornerstones of this year’s projects by migrating our PeopleSoft Financial applications from HP-UX servers into one BladeCenter, while re-locating our core clinical applications from their current rack & stack implementation into another BladeCenter.
The first challenge set before us was to make certain all the modules in the IBM BladeCenter H chassis were in proper working order and had compatible firmware revisions for the forthcoming onslaught of host environments. Each module (advanced management, LAN and SAN switches, concurrent KVM, HBA cards, and BIOS) was checked and flashed when appropriate. For instance, since we are making use of the Open Fabric Manager feature, we discovered that the second chassis needed its AMM and HBA cards flashed to later firmware releases than those shipped from the factory. This was necessary to enable the AMM to properly apply the next set of unique WWN and MAC addresses and so avoid conflicts with the first chassis.
The next challenge was to integrate the embedded Cisco SAN switches into our existing SAN infrastructure, made up primarily of Cisco and EMC equipment. Fortunately for us, we have a crack Storage Team that was able to deliver trunking, zoning, multipathing (PowerPath), and automatic trespassing (ALUA) to enable our diskless blades to boot-from-SAN. To expedite the zoning, I was able to construct a Linux Live CD image (using Red Hat Fedora livecd-tools) to remote boot the blades over the network via the AMM and KVM option card, launching the required Navisphere agent to auto-register with the SAN management processors.
Once the boot LUNs were zoned and masked, it was a fairly simple task to install Red Hat Enterprise Linux 4 and 5 host environments. We are choosing to limit our boot-from-SAN to only a couple of large LUNs, with each root filesystem residing within its own logical volume (LVOL) on the LUN. This makes for simpler provisioning and helps reduce the number of devices enumerated over all the SCSI software layers. As seen in the screenshot, the included GRand Unified Bootloader (GRUB) menu is all that is needed for the admin to make the proper LVOL selection.
Working with the Server Connectivity Modules (SCM) as the LAN switches into our center row distribution layer has produced visible benefits and revealed some interesting challenges. First, this implementation significantly reduces space and cabling count, to highlight:
- Occupies less than 1/3 the rack space;
- 81% reduction in power cords;
- 93% reduction in fiber SAN cables; and
- 70% reduction in copper LAN cables.
The challenge in using this new network interface was configuring Linux to properly team its NICs for fault tolerance. Normally, we just allow the bonding driver to fail over to another slave Ethernet device when it detects loss of a physical link. That sensor does not work for all use-cases with SCMs, because there are no patch cables between the SCM and the blades’ NICs, so a blade’s NIC can still report link when the path beyond the SCM has failed. Fortunately, the bonding driver has a built-in option for ARP monitoring. We configured it to send ARP requests to a list of listening devices at half-second intervals whenever there is no other normal network traffic. That has the advantage of not adding any gratuitous overhead when normal networking traffic is already flowing. After a few rounds of configuration and testing, we have two pairs of teamed NICs per blade, which can be further carved up for virtual guests, or parted out dynamically if a dedicated NIC requirement becomes necessary.
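For illustration, here is a minimal Python sketch of how the ARP-monitoring settings described above could be assembled into an option string for the Linux bonding driver. The `mode`, `arp_interval` and `arp_ip_target` option names are the driver's own; the helper name `bonding_options`, the choice of active-backup mode, and the two target addresses are assumptions made for this example, not our actual configuration.

```python
def bonding_options(arp_targets, interval_ms=500, mode="active-backup"):
    """Build an option string for the Linux bonding driver with ARP monitoring.

    interval_ms=500 matches the half-second interval described in the post;
    the active-backup mode is an assumption for this sketch.
    """
    # The bonding driver accepts at most 16 ARP targets.
    if not 1 <= len(arp_targets) <= 16:
        raise ValueError("need between 1 and 16 ARP targets")
    return "mode={} arp_interval={} arp_ip_target={}".format(
        mode, interval_ms, ",".join(arp_targets))


# Hypothetical listening devices (for example, distribution-layer gateways).
opts = bonding_options(["192.168.0.250", "192.168.0.251"])
print(opts)
```

Where the resulting string goes depends on the release: older systems typically pass it as module options, while newer ones set it per bond interface.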
Our clinical applications also make use of Red Hat’s Cluster Suite and its Global File System (GFS). It was a fairly simple task to re-configure them for BladeCenter, again with benefits. The fencing agent for BladeCenter is vastly simpler and more robust than the one for the HP iLO management card. There are also no additional software fees to enable such features within IBM BladeCenter, whereas HP charges $285 per card just to unlock access to features already in the iLO management card. Ugh!!
Here’s a look-see at its current cluster configuration:
[root@acropolis ~]# cman_tool status
Protocol version: 5.0.1
Config version: 9
Cluster name: ccc_cluster
Cluster ID: 6324
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 12
Expected_votes: 22
Total_votes: 22
Quorum: 12
Active subsystems: 0
Node name: acropolis
Node ID: 1
Node addresses: 192.168.0.1

[root@acropolis ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1   22   M   acropolis
   2    1   22   M   atlantia
   3    1   22   M   columbia
   4    1   22   M   cerberus
   5    1   22   M   galactica
   6    6   22   M   pacifica
   9    6   22   M   pegasus
  10    1   22   M   prometheus
  11    1   22   M   rycon
  12    1   22   M   solaria
  13    1   22   M   triton
  14    1   22   M   valkyrie

[root@acropolis ~]# df
Filesystem                            Type   Size  Used Avail Use% Mounted on
/dev/mapper/VGCCCBOOT1-lvolacropolis  ext3   5.4G  3.3G  1.9G  64% /
none                                  tmpfs  7.9G     0  7.9G   0% /dev/shm
/dev/mapper/VGCCC-lvolshare           gfs    6.5G   20K  6.5G   1% /cluster/share
/dev/mapper/VGCCC-lvolhome            gfs    6.5G  264K  6.5G   1% /home
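The vote and quorum figures in the cman_tool output can be cross-checked with a little arithmetic. The Python sketch below copies the per-node votes from the node listing and assumes the usual majority rule, quorum = floor(expected_votes / 2) + 1; the function and variable names are made up for this example.

```python
# Votes per node, copied from the `cman_tool nodes` listing.
votes = {
    "acropolis": 1, "atlantia": 1, "columbia": 1, "cerberus": 1,
    "galactica": 1, "pacifica": 6, "pegasus": 6, "prometheus": 1,
    "rycon": 1, "solaria": 1, "triton": 1, "valkyrie": 1,
}


def quorum(expected_votes):
    """Simple majority: more than half of the expected votes (assumed rule)."""
    return expected_votes // 2 + 1


total_votes = sum(votes.values())
needed = quorum(total_votes)
print(len(votes), total_votes, needed)  # prints: 12 22 12

# The two heavily weighted nodes together carry enough votes to hold quorum.
heavy_pair_has_quorum = votes["pacifica"] + votes["pegasus"] >= needed
print(heavy_pair_has_quorum)  # prints: True
```

The result agrees with the Total_votes of 22 and Quorum of 12 that cman_tool reports.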
The blade names were chosen from ships in Battlestar Galactica. I initially wanted the twelve zodiac signs, but a few of those names were already taken. I ran into the same issue when I thought I could use the original twelve battle-cruiser class starship names from Star Trek. These names are a better fit anyway, because once the blades are clustered, they justify my blog title, Quorum of Twelve, a reference to the presiding body over the original twelve colonies.






