

Using Xen for High Availability Clusters

by Kris Buytaert and Johan Huysmans

02/05/2008

The idea of using virtual machines to build highly available clusters is not new. Some software companies claim that virtualization is the answer to your HA problems; of course that's not true. Yes, you can reduce downtime by migrating virtual machines to another physical machine for maintenance or when you suspect hardware is about to fail, but if an application crashes you still need to make sure another application instance takes over the service. And by the time your hardware fails, it's usually already too late to initiate the migration.

So, for each and every application you still need to decide whether you want it constantly available, whether you can afford for it to be down for some time, or whether your users won't mind having to log in again when one server fails.

If you need your application to be always available, you'll need to investigate whether you can run two instances of it at the same time, and what happens when a user is connected to another application instance in the middle of a session.

Using Xen at Newtec

These are exactly the kind of challenges we ran into at Newtec, a market leader in satellite communications delivering an extensive range of highly innovative products and solutions. Since the company was founded in 1985, TV broadcasters, telecom service providers, integrators, and satellite operators from all over the world have been relying on the exceptional performance and reliability of Newtec's products.

Newtec has several test, development, and validation platforms with identical network topologies, each isolated on its own VLAN. For each of these platforms, a gateway has been created that allows developers and testers to log on to their platform. All of these gateways have a small footprint and, apart from their VLAN configuration, they are identical. There are no applications running on these gateways; they mainly function as access points to the platforms. A user SSHes to the platform's gateway and from there connects to the server or application he needs. So, we implemented a Xen virtual machine host that hosts 12 of those virtual gateways.

If this machine goes down, a couple dozen developers can't access the machines they are supposed to be working on. Upgrades or maintenance have to be done in a time window when nobody is working on the platforms, or we have to provide an alternative path to the platforms.

The initial idea was to provide two completely identical machines. If an error occurred on the running machine, the network cables would be switched over to the standby machine. That idea was not really feasible because switching the cables requires manual intervention and physical access to the machines. It is also impossible to remotely log in to the standby machine to perform administrative tasks, as it is disconnected.

We wanted to avoid those problems, so both physical machines had to be accessible at all times. Only the virtual machines on the active host may be remotely accessible, but the virtual machines on the passive host must nevertheless be running to allow a fast switchover.

The main problem was to find a way to keep the virtual machines running but disconnected from the network, while the physical machine itself stayed connected to the network and accessible.

This had to be combined with a system that can automatically switch this network blocking over to the other host.

Isolating Virtual Machines on the Network

If you want to block network traffic to the virtual hosts, you first have to understand how Xen handles networking.

Whenever a system is booted with a Xen kernel, it starts out with just an eth0, like any other system. This changes when the xend service is started: xend activates the bridging and the necessary devices, which allows network traffic from the outside to reach the virtual hosts.

First, the current eth0, which is the physical network card, is renamed to peth0. The same happens for the other network interfaces (eth1 becomes peth1, and so on). From now on you have to check the link status on device peth0 instead of eth0.

Next, a new eth0 is created on dom0, the same way as on every virtual host. A virtual host only sees its eth0, which has a corresponding vif (virtual interface) on dom0. Eth0 of dom0 (which has ID 0) is mapped to vif0.0, eth1 of dom0 to vif0.1. On a virtual host with ID 3, the eth0 interface is mapped to vif3.0. In general there is a unique vifY.X interface on dom0 for every interface of each virtual host, where Y is the ID of the domain and X the number of the interface within that domain (eth0, eth1, ...).

The connection between the physical interface (peth0) and the interface of the virtual host (vifY.X) is made by bridging. The number of bridges, and which interface is bound to which bridge, is handled by the Xen configuration.
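
You can inspect this wiring on dom0 with the standard bridge and interface tools; the commands are generic, but the bridge and vif names you get back depend on your Xen network script and running domains:

[root@XEN-A ~]# brctl show
[root@XEN-A ~]# ip link show | grep -E 'peth|vif'

The first command lists each bridge together with the peth and vifY.X devices attached to it, the second one shows the renamed physical NICs and the per-domain virtual interfaces.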

So we know how the network communication inside Xen works, but how can we stop the network communication between the outside and our virtual machines?

We could remove the vif interface from the bridge. But if we reboot a virtual machine, it is recreated with a new ID and a new vif interface, which is automatically connected to the bridge again. Any rebooted or freshly started virtual machine would immediately be connected to the outside world.
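
For reference, detaching and re-attaching a guest interface by hand would look like this (the bridge name xenbr0 and the vif number are only illustrative and depend on your configuration):

brctl delif xenbr0 vif3.0
brctl addif xenbr0 vif3.0

As explained above, anything done this way is undone as soon as the guest reboots and comes back with a new vif that xend attaches to the bridge again.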

We could instead configure the virtual machines to come up without a connection to the outside world after a fresh boot. But then a virtual machine rebooted on the active host wouldn't have any connection with the outside world either.

Both approaches fail because disabling or enabling the network traffic to the virtual machines is a one-time operation, while a rebooted virtual machine must come back with the same connectivity it had before.

Instead of removing or adding these devices to the bridge, we can just block all traffic to these specific devices. This can easily be done with iptables.

With iptables we can't specify "all virtual machines." So we assume that "all virtual machines" means all the network devices except the ones belonging to dom0.

iptables -I FORWARD 1 -m physdev --physdev-in vif0.0 -j ACCEPT
iptables -I FORWARD 2 -m physdev --physdev-out vif0.0 -j ACCEPT
iptables -I FORWARD 3 -m physdev --physdev-in vif0.1 -j ACCEPT
iptables -I FORWARD 4 -m physdev --physdev-out vif0.1 -j ACCEPT
iptables -I FORWARD 5 -j DROP

The first four rules allow traffic to and from dom0's own interfaces (vif0.0 and vif0.1); rule 5 drops all other bridged traffic. The order of the rules is very important, which is why we insert them at a specific position (-I with an explicit rule number) instead of just appending them.
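
To check that the rules ended up in the intended order, list the FORWARD chain with its rule numbers (the packet counters and columns in the output will of course differ per system):

[root@XEN-A ~]# iptables -L FORWARD -n -v --line-numbers

Rule 5 should be the catch-all DROP, with the four ACCEPT rules for vif0.0 and vif0.1 above it.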

All traffic for the virtual hosts is now blocked. But whenever you start the networking on one of the vhosts, it will complain that its IP address is already in use. This happens because, when an IP address is assigned to an interface, the host first sends out an ARP probe (arping) to check whether the address is already taken. ARP packets can't be blocked by iptables, so arptables is needed as well.
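
You can reproduce that duplicate address check by hand with the arping utility from iputils; -D selects duplicate address detection mode, and the interface and address below are just examples from this setup:

[root@XEN-A ~]# arping -D -I eth0 -c 2 192.168.253.97

If another host answers, the address is reported as already in use, which is what the guests on the standby node would run into without the arptables rules.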

Arptables comes with most distributions; on a Red Hat style distribution you will need the arptables_jf package.

Check that both iptables and arptables are installed.

[root@XEN-A ~]# rpm -qa | grep tables
arptables_jf-0.0.8-2
iptables-1.2.11-3.1.RHEL4

Check that both services are started during boot.

[root@XEN-A ~]# chkconfig --list | grep tables
arptables_jf 0:off 1:off 2:on 3:on 4:on 5:on 6:off
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off

From the package info: "The arptables_jf utility controls the arpfilter network packet filtering code in the Linux kernel. You do not need this program for normal network firewalling. If you need to manually control which arp requests and/or replies this machine accepts and sends, you should install this package."

An arptables ruleset looks very similar to an iptables ruleset:

arptables -I FORWARD 1 -i vif0.0 -j ACCEPT
arptables -I FORWARD 2 -o vif0.0 -j ACCEPT 
arptables -I FORWARD 3 -i vif0.1 -j ACCEPT 
arptables -I FORWARD 4 -o vif0.1 -j ACCEPT 
arptables -I FORWARD 5 -j DROP

These arptables rules do exactly the same as the iptables rules, but for ARP packets.

The iptables and arptables rules will block all traffic to the virtual hosts. If we want to enable the traffic on one machine, only one iptables and one arptables rule has to be changed.

iptables -R FORWARD 5
arptables -R FORWARD 5

These commands replace the DROP rule (rule number 5) with a rule that has no target. Traffic hitting that rule is neither accepted nor dropped; it simply continues past it, which is exactly what we want.

Blocking the traffic again is just as easy:

iptables -R FORWARD 5 -j DROP
arptables -R FORWARD 5 -j DROP

You can save these rules so that the initscripts you enabled earlier restore them at boot time, using "iptables-save" and "arptables-save".
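
On a Red Hat style system the initscripts read their saved rules from /etc/sysconfig, so persisting them should look roughly like this (the service names match the packages listed above; double-check the exact names and files on your distribution):

[root@XEN-A ~]# service iptables save
[root@XEN-A ~]# service arptables_jf save

It makes sense to save the rules while the DROP rules are in place, so that a freshly booted node comes up with its vhosts blocked and leaves the unblocking decision to the cluster software described below.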

Adding Heartbeat

Linux-HA, aka Heartbeat, is used to monitor which node is active and to automate the failover. We wrote an init script around these rules that can be used as a Heartbeat resource, allowing Heartbeat to start (unblock) or stop (block) the network traffic to the vhosts.

While creating this setup, we encountered some problems with Xen and Heartbeat. Initially we were working with Heartbeat version 1. One of the error messages was:

ERROR: No local heartbeat. Forcing shutdown

This error message is explained in the Heartbeat FAQ, but the causes it lists didn't make sense in our case. An update from Xen 3 to Xen 3.0.4 didn't solve the problem either. Upgrading Heartbeat from v1 to v2 made the error disappear, and Heartbeat started working without any problems.

Heartbeat v2 introduces a completely new configuration format: cluster resources are now described in an XML file (the CIB) instead of the haresources file. But don't worry, if you don't want to migrate, you can keep working with the v1 config files. You won't be able to use the new features of Heartbeat v2, but at least you're taking advantage of the bugfixes ;).

A Linux-HA v1 cluster needs three key files to function; these configs still work with Heartbeat v2.

/etc/ha.d/authkeys

auth 2
2 sha1 SomeVeryLongSentenceWhichWillMakeTheTrafficABitMoreSecure!

/etc/ha.d/ha.cf

debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 1
deadtime 5
warntime 2
initdead 120
baud    19200
ucast eth0 192.168.253.99

auto_failback off
node    GW-X
node    GW-Y
ping_group group1 192.168.253.244 192.168.253.99
respawn hacluster /usr/lib/heartbeat/ipfail

In the above file we define the two nodes that make up the cluster; make sure the host names you mention are known on both nodes! The last file you need is /etc/ha.d/haresources.

XEN-A 192.168.253.97 unblock-vhosts-network

This last file lists the resources the active node should carry: the IP address the node actually listens on for the gateways, and the unblock-vhosts-network script. Once a node becomes active, Heartbeat runs this script with the start parameter, unblocking access to the virtual machines. When a node is demoted, Heartbeat stops the same script, blocking ARP and IP level access to the virtual machines again.

The unblock-vhosts-network script is built like any other init script and handles these parameters: start, stop, restart, and status. The script lives in /etc/rc.d/init.d/unblock-vhosts-network.

Make sure this script is not started during boot; it will be called by Heartbeat.

#!/bin/sh
#
# unblock-vhosts-network
#
# chkconfig: 2345 08 92
# description: Allow or block traffic from/to xen vhosts
#

# Source function library.
. /etc/init.d/functions

start() {
    # Replace rule 5 with a rule that has no target, so bridged traffic
    # falls through instead of being dropped (unblock the vhosts).
    # The echo runs before the command so that $? still holds the
    # command's exit status when we test it.
    echo -n "iptables -R FORWARD 5"
    iptables -R FORWARD 5
    if [ $? -eq 0 ]; then
        success; echo; ret=0
    else
        failure; echo; return 1
    fi

    echo -n "arptables -R FORWARD 5"
    arptables -R FORWARD 5
    if [ $? -eq 0 ]; then
        success; echo; ret=0
    else
        failure; echo; return 1
    fi

    return $ret
}

stop() {
    # Put the DROP target back into rule 5 (block the vhosts again).
    echo -n "iptables -R FORWARD 5 -j DROP"
    iptables -R FORWARD 5 -j DROP
    if [ $? -eq 0 ]; then
        success; echo; ret=0
    else
        failure; echo; return 1
    fi

    echo -n "arptables -R FORWARD 5 -j DROP"
    arptables -R FORWARD 5 -j DROP
    if [ $? -eq 0 ]; then
        success; echo; ret=0
    else
        failure; echo; return 1
    fi

    return $ret
}

status() {
    # Show rule 5 of both FORWARD chains: a DROP target means the vhosts
    # are blocked, no target means they are reachable.
    iptables -L FORWARD -n -v --line-numbers | grep "^5"
    arptables -L FORWARD -n -v --line-numbers | grep "^5"

    return $?
}

restart() {
    stop
    start
}

case "$1" in
    start)
        start
        RETVAL=$?
        ;;
    stop)
        stop
        RETVAL=$?
        ;;
    restart)
        restart
        RETVAL=$?
        ;;
    status)
        status
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac

exit $RETVAL
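
Once Heartbeat is running on both nodes you can test a switchover without touching any cables. The hb_standby helper ships with Heartbeat; the path below is where the usual packages install it, adjust it if yours differs:

[root@XEN-A ~]# /usr/lib/heartbeat/hb_standby
[root@XEN-A ~]# /etc/rc.d/init.d/unblock-vhosts-network status

The first command asks the local node to give up its resources; the status command should then show rule 5 back at DROP on this node, while the same command on the other node shows the no-op rule and the virtual gateways become reachable again through the cluster IP address.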

Conclusion

With this setup we can survive a failure of one of the two Xen hosts, we can schedule downtime for hardware maintenance, or we can move the machines in coordination with the users of the platforms. A failure of the service does mean that users have to reconnect, but they won't have to wait until the node has rebooted or until we have fixed the hardware problem.

These virtual gateways are not the only Xen or Heartbeat implementation at Newtec: we also use Xen for build machines, for testing deployment and configuration management tools, and to run different lightweight applications isolated from each other on the same hardware.

Links & References

http://www.newtec.eu/

http://www.inuits.be/

http://howto.krisbuytaert.be/AutomatingVirtualMachineDeployment/

http://www.krisbuytaert.be/blog/

http://www.raskas.be/blog/

http://www.linux-ha.org/

http://www.xen.org/

Kris Buytaert is a Linux and open source consultant operating in the Benelux. He currently maintains the openMosix HOWTO.

Johan Huysmans is a Linux and open source consultant at Inuits. He maintains a blog at http://www.raskas.be/blog/ where he documents his findings about Linux system administration topics, including Xen and kickstart.



Copyright © 2009 O'Reilly Media, Inc.