Linux IPVS

There are a number of ways to implement Linux IP Virtual Server; one of them, on Red Hat Enterprise Linux, is to use the Cluster Suite packages ipvsadm and piranha.  We use most of piranha's components to front-end clients into our Oracle Peoplesoft systems running on IBM BladeCenter: its pulse service for heartbeat fail-over and lvsd as the IPVS director, which spawns nanny processes for service health monitoring.  We skip piranha's web administration tools, since we manage most of that by editing /etc/sysconfig/ha/lvs.cf directly and have written our own utility scripts to wrap the various ipvsadm calls.
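
For illustration, such a utility script need not be much more than a thin layer over ipvsadm.  The sketch below is hypothetical (the script name and arguments are invented here), but it shows the kind of wrapping we mean: list the current table, or adjust a real server's weight to drain it:

#!/bin/bash
# lvsctl.sh -- hypothetical wrapper around common ipvsadm calls
# usage: lvsctl.sh list
#        lvsctl.sh weight <vip:port> <rip:port> <weight>

case "$1" in
  list)
    # dump the current IPVS table, numeric addresses only
    ipvsadm -L -n
    ;;
  weight)
    # edit an existing real server entry, e.g. weight 0 to drain it
    ipvsadm -e -t "$2" -r "$3" -w "$4"
    ;;
  *)
    echo "usage: $0 {list|weight vip:port rip:port weight}" >&2
    exit 1
    ;;
esac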

Peoplesoft is set up to use the IPVS direct routing method.  That is, clients connect to the IPVS director hosting the virtual IP address.  IPVS redirects each packet to a physical host running one of the Weblogic services.  After Weblogic processes the client request, the reply packets are sent directly back to the client machine, without traversing the IPVS layer again.  This is the most lightweight of the forwarding methods offered by IPVS.

To configure IPVS for direct routing, all of the hosts must be on the same VLAN.  And since IPVS in this mode does not do any network address translation (NAT), every virtual service port number must match the real service port number, e.g., https://hrms.somedomain.com:8443 might redirect to https://blade4.chassis2:8443.
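
Stripped of piranha, the direct routing setup boils down to a handful of ipvsadm calls.  A minimal sketch, using the addresses and persistence value from the configuration shown further below, is roughly what lvsd ends up maintaining in the kernel on our behalf:

# create the virtual service: wlc scheduler, 1200-second persistence
ipvsadm -A -t 10.25.63.64:8443 -s wlc -p 1200

# add the two real servers in gatewaying (direct routing) mode
ipvsadm -a -t 10.25.63.64:8443 -r 10.25.63.60:8443 -g -w 1
ipvsadm -a -t 10.25.63.64:8443 -r 10.25.63.61:8443 -g -w 1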

It is important to set an IPVS timeout that works with your application's needs.  For example, our Peoplesoft portal pages have an enforced application timeout of 20 minutes, so IPVS should not drop idle client connections before that.  Just modify /etc/sysconfig/ipvsadm and add a line like:

--set 1200 0 0
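
To confirm what the kernel is actually using, ipvsadm can print (and change) the connection timeout values directly; a quick check looks something like this:

# show the current tcp / tcpfin / udp timeouts
sudo ipvsadm -L --timeout

# or change the tcp timeout on a running director; a 0 leaves that value unchanged
sudo ipvsadm --set 1200 0 0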

Start the ipvsadm service, which merely echoes this file into ipvsadm -R.  You can always change this value on the fly, too, so no worries here.  Then configure /etc/sysconfig/ha/lvs.cf to load-balance two web services on different hosts:

serial_no = 12
primary = 10.25.63.62
service = lvs
backup_active = 1
backup = 10.25.63.63
heartbeat = 1
heartbeat_port = 539
keepalive = 3
deadtime = 6
syncdaemon = 1
network = direct
debug_level = NONE
monitor_links = 0
virtual hrms8443 {
  active = 1
  address = 10.25.63.64 eth3:0
  vip_nmask = 255.255.255.0
  port = 8443
  persistent = 1200
  expect = "200 OK"
  use_regex = 0
  send_program = "/usr/local/sbin/check_hrms_ssl.sh %h 8443"
  load_monitor = none
  scheduler = wlc
  protocol = tcp
  timeout = 6
  reentry = 15
  quiesce_server = 0
  server hrwebp01 {
    address = 10.25.63.60
    active = 1
    weight = 1
  }
  server hrwebp02 {
    address = 10.25.63.61
    active = 1
    weight = 1
  }
}
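
The send_program above is a site-local health check; nanny runs it with the real server address substituted for %h and compares its output against the expect string.  The actual check_hrms_ssl.sh is not shown here, but a minimal sketch of such a script (assuming curl is available on the director) could be:

#!/bin/bash
# hypothetical health check: print "200 OK" only if the real server answers over HTTPS
host="$1"
port="$2"

code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "https://${host}:${port}/")
if [ "$code" = "200" ]; then
  echo "200 OK"
fi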

Start the pulse service and dump the IPVS table:

$ sudo ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.25.63.64:8443 wlc persistent 1200
  -> 10.25.63.60:8443             Route   1      0          0
  -> 10.25.63.61:8443             Route   1      0          0
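
Once clients start hitting the virtual address, the same listing is a handy way to watch connections accumulate; the statistics flag adds cumulative packet and byte counters, and the connection flag lists the individual entries being tracked:

# per-service and per-real-server traffic counters
sudo ipvsadm -L -n --stats

# individual connection entries, including persistence templates
sudo ipvsadm -L -n -c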

The last step is to set up each of the real servers running a web service to process the redirected TCP packets for the virtual IP address, i.e., 10.25.63.64.  So, modify their /etc/sysconfig/iptables and add the following ruleset:

*nat
:PREROUTING ACCEPT [136:11568]
:POSTROUTING ACCEPT [12:1557]
:OUTPUT ACCEPT [12:1557]
-A PREROUTING -d 10.25.63.64 -p tcp -m tcp --dport 8443 -j REDIRECT
COMMIT

Start up the iptables service to enable these rules.  Again, there is no NAT'ing going on here, but these rules ALLOW the redirected packets from the IPVS host to be processed on this host's networking stack, just as if it held the virtual IP itself.
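
A quick way to confirm the rule is live, and to watch its packet counters climb as the director redirects traffic, is:

sudo iptables -t nat -L PREROUTING -n -v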

For a number of years we have enjoyed the high-availability success of IPVS handling this Peoplesoft architecture.  Thus, we have also implemented IPVS for our core clinical information system running InterSystems Caché.  This implementation, however, does not make any use of piranha and does not use the direct routing method.  The main reason we chose not to use piranha's pulse / lvsd services is that there is a large mix of stateless (web) and stateful (telnet, ssh) connections into the applications.  And while the pulse / lvsd services could be configured to accommodate these requirements, there are associated complexities in their configuration, startup, and fail-over scenarios that were undesirable to the system engineers.

Also, we were planning to run multiple Caché application services per physical blade, but alas, those services listen on ALL of the host's adapters (0.0.0.0) and cannot be configured to listen on assigned virtual IP addresses, only on port numbers.  The thought of running one instance per KVM virtual guest to circumvent the port conflict was explored, but that introduced an unsupported RHEL 5 cluster configuration (it may be supported under RHEL 6, so that might be a future consideration), since the application also depends on integral, shared GFS2 clustered filesystems.

These limiting factors did not hinder us from moving forward with IPVS.  Instead, we developed our own framework to operate this useful load-balancer without introducing “too much” complexity.  That is, the service configurations are maintained consistently on all physical hosts (dev-test-production), making for simpler startup and shutdown sequences, yet allowing for real-time dynamic changes to any environment without contending with fixed, running daemon services.

The IPVS host must have IP forwarding enabled for NAT mode to work, so turn that on in /etc/sysctl.conf:

# Controls IP packet forwarding
net.ipv4.ip_forward = 1
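
The setting takes effect at boot; to apply it to a running host without rebooting, reload the file or set the key directly:

# re-read /etc/sysctl.conf
sudo sysctl -p

# or flip the single key immediately
sudo sysctl -w net.ipv4.ip_forward=1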

The IPVS ruleset can be maintained in /etc/sysconfig/ipvsadm, although we have a cluster service (start, stop, status) script that invokes the appropriate ipvsadm commands, like this:

start)
  # if either LAN is NOT configured, do nothing but exit normally,
  # because we may be running in some monolithic mode
  if ! alive $VIP ; then
    println "$INSTANCE ($VIP) is not alive"
    exit
  fi
  if ! alive $DIP ; then
    println "$INSTANCE ($DIP) is not alive"
    exit
  fi 

  # ok, setup as an IPVS director
  echo "1" > /proc/sys/net/ipv4/ip_forward
  grep -v '^#' /etc/sysconfig/ipvsadm | $ipvsadm -R
  $ipvsadm -A -t $VIP:22
  $ipvsadm -A -t $VIP:2972 -p
  $ipvsadm -A -t $VIP:9671
  ;;
stop)
  # downtime, teardown an IPVS director
  $ipvsadm -D -t $VIP:22
  $ipvsadm -D -t $VIP:2972
  $ipvsadm -D -t $VIP:9671
  ;;
status)
  # not a running IPVS director, just exit
  [ ! -f /proc/net/ip_vs ] && exit
...
  ;;
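
The alive and println helpers used above are local to our script and are not shown; as an assumption of what they do, minimal versions might be nothing more than:

# hypothetical helpers: alive() tests whether an address answers a single ping,
# println() is just echo with a consistent prefix
alive() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

println() {
  echo "ipvs: $*"
}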

The resulting IPVS table on this host looks something like this:

$ sudo ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.25.63.79:22 wlc
  -> 192.168.2.131:22             Masq    1      1          0
  -> 192.168.2.132:22             Masq    1      0          0
  -> 192.168.2.133:22             Masq    1      1          0
  -> 192.168.2.134:22             Masq    1      0          0

TCP  10.25.63.79:9671 wlc
  -> 192.168.2.131:9671           Masq    1      105        0
  -> 192.168.2.132:9671           Masq    1      103        0
  -> 192.168.2.133:9671           Masq    1      103        0
  -> 192.168.2.134:9671           Masq    1      101        0

TCP  10.25.63.79:2972 wlc persistent 360
  -> 192.168.2.131:1981           Masq    1      10         0
  -> 192.168.2.132:1982           Masq    1      9          0
  -> 192.168.2.133:1983           Masq    1      10         0
  -> 192.168.2.134:1984           Masq    1      0          0

Note that only three clustered IIS web servers connect into the four-server application pool, so each web server (and each of its threads) connects to a discrete application server; the web client paradigm typically requires client persistence, so there are no concurrency issues of the kind that come with multi-threaded requests.

Optionally, you may need to set up some iproute2 rules on each of the application servers, but only if you want the hosts running those target virtual services to also be reachable by other means, such as ssh'ing directly to the physical hostname.  Here's how to configure that on a RHEL 5 server.  Modify /etc/iproute2/rt_tables and append:

118    development
128    test
138    production

… and create two new files in /etc/sysconfig/network-scripts as rule-{adapter} and route-{adapter}, substituting {adapter} with the name of the physical network adapter, e.g., eth0 or br1.  First, the rule file:

from 192.168.2.111 lookup development
from 192.168.2.112 lookup development

from 192.168.2.121 lookup test
from 192.168.2.122 lookup test

from 192.168.2.131 lookup production
from 192.168.2.132 lookup production
from 192.168.2.133 lookup production
from 192.168.2.134 lookup production

Then the route file:

default via 192.168.2.118 table development
default via 192.168.2.128 table test
default via 192.168.2.138 table production
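
These files are read when the interface is brought up, after which the policy routing can be verified.  For example, assuming eth0 is the adapter in question (be careful if you are logged in over that same interface):

# re-read the rule-/route- files for the adapter
sudo ifdown eth0 && sudo ifup eth0

# list the policy rules and the per-table default routes
ip rule show
ip route show table production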