Adding a Host Machine to the Virtual Machine Cluster

From Livedoc - The Documentation Repository

This guide assumes that the virtual machine cluster has already been set up and is running, so it does not cover the initial setup procedures. At least one iSCSI volume is assumed to be accessible from the server that will be added to the cluster, and this volume should already be configured with LVM.

For more information on the various components, please see the other pages on this wiki. This guide is meant merely as a reference for those who already know the theory, not as an instructive document describing how everything works. It includes only the configuration details necessary to add a system to the virtual machine cluster and is not meant to be a comprehensive setup guide.

This guide is meant for a system running Debian GNU/Linux. It could easily be adapted for other distributions, although the configuration details may differ slightly.

Required packages

  • ccs: the cluster configuration system
  • clvm: the cluster LVM daemon
  • cman: the cluster manager
  • multipath-tools: the utilities for administering multipath
  • open-iscsi: an implementation of iSCSI (RFC 3720)
  • redhat-cluster-modules: the Redhat Cluster infrastructure modules
  • vlan: the user mode programs to enable VLANs (needed only if using VLANs)
  • xen-linux-system: the metapackage for running Xen

Network

One interface of the server must be on the network on which the storage is located. As CHAP is insecure, all storage is served via a separate storage VLAN. The server can reach this VLAN either through a virtual interface with VLAN tagging or through a separate physical interface.

Separate physical connection

First is the network configuration in /etc/network/interfaces, where eth1 is the connection that will be used for storage data. In order to keep things simple, the xx.xx in the IP addresses should match the second half of each TJ host's IP address. For example, bottom (198.38.17.55) would have the storage address 172.16.17.55.

allow-hotplug eth1
iface eth1 inet static
   address 172.16.xx.xx
   netmask 255.255.0.0
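
The address rule above can be sketched as a small shell helper. The function name storage_addr is hypothetical and not part of any package; it simply keeps the last two octets of the host's address and prefixes them with 172.16.

```shell
#!/bin/sh
# storage_addr IP: print the storage-network address for a host.
# Hypothetical helper: keeps the last two octets of IP and prefixes
# them with 172.16, following the convention described above.
storage_addr() {
    echo "$1" | awk -F. 'NF == 4 { printf "172.16.%s.%s\n", $3, $4 }'
}

# Example from the text: bottom is 198.38.17.55.
storage_addr 198.38.17.55
```

For bottom, this prints 172.16.17.55, which is the value to use for the address line.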

The configuration of the interface through which the server connects to the storage network should look like the following in Cisco IOS. Here, VLAN 16 is the storage network.

interface SomeInterface3/14
 description Your description here
 switchport access vlan 16
 switchport mode access
 logging event link-status
 spanning-tree portfast

VLAN tagging (802.1q)

After installing the vlan package, the following is possible in /etc/network/interfaces. In order to keep things simple, the xx.xx in the IP addresses should match the second half of each TJ host's IP address. For example, bottom (198.38.17.55) would have the storage address 172.16.17.55.

auto vlan16
iface vlan16 inet static
   address 172.16.xx.xx
   netmask 255.255.0.0
   vlan_raw_device eth0

The VLAN tagging must be configured on the switch. For Cisco IOS, the interface should look similar to the following:

interface SomeInterface3/14
 description Your description here
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1,16
 switchport mode trunk
 logging event link-status
 spanning-tree portfast

Finally, the host must know to load the 8021q kernel module if this functionality is not embedded into the kernel. Add the following line to /etc/modules:

8021q
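
When this step is scripted, the line should be added only once. The following is a minimal sketch; ensure_line is a hypothetical helper name, and the commands under "Intended use" are meant to be run as root on the new host.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical
# line is already present. Hypothetical helper for illustration.
ensure_line() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Intended use on the host (as root):
#   ensure_line /etc/modules 8021q
#   modprobe 8021q    # load the module now, without waiting for a reboot
```

The same helper could be used later for the cman and dlm entries in /etc/modules.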

iSCSI

/etc/initiatorname.iscsi

(Also link /etc/iscsi/initiatorname.iscsi to this file)

## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator.  The InitiatorName must be unique
## for each iSCSI initiator.  Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.1992-03.edu.tjhsst:initiator:HOSTNAME.0
InitiatorAlias=HOSTNAME
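
As a sketch, the two settings can be generated from the hostname rather than edited by hand. The function name initiator_config is hypothetical; the IQN prefix is the one shown above. Note that redirecting the output into the file replaces the comment header.

```shell
#!/bin/sh
# initiator_config HOSTNAME: print the two initiator settings for one host.
# Hypothetical helper; the IQN prefix is copied from the file above.
initiator_config() {
    printf 'InitiatorName=iqn.1992-03.edu.tjhsst:initiator:%s.0\n' "$1"
    printf 'InitiatorAlias=%s\n' "$1"
}

# Show the settings for the host "bottom".
initiator_config bottom

# Intended use on the new host (as root):
#   initiator_config "$(hostname -s)" > /etc/initiatorname.iscsi
#   ln -sf /etc/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
```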

/etc/iscsid.conf

(Also link /etc/iscsi/iscsid.conf to this file)

#
# Open-iSCSI configuration.
#
node.active_cnx = 1
node.startup = automatic
node.session.auth.username = SOME_USERNAME
node.session.auth.password = SOME_PASSWORD
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 10
node.session.err_timeo.reset_timeout = 30
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Wait = 0
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.MaxConnections = 0
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65536
#discovery.sendtargets.auth.authmethod = CHAP
#discovery.sendtargets.auth.username = SOME_USERNAME
#discovery.sendtargets.auth.password = SOME_PASSWORD

Discovering and connecting to the storage

Run the following command for each storage server, replacing 172.16.xx.xx with the IP addresses of the storage. If a storage device has multiple IP addresses, run the command for each IP address.

iscsiadm -m discovery -t sendtargets -p 172.16.xx.xx

This command should return the iqn of the storage server. Use this iqn in the next command.

iscsiadm -m node -T iqn.1992-03.edu.tjhsst.csl:storage:alexandria -p 172.16.xx.xx -l

The previous command logs in to the storage server using the username and password configured earlier. Use -u instead of -l to log out. Again, be sure to run this command for each storage IP address. When storage can be reached through multiple IP addresses, each LUN will show up more than once in /dev, once per path. This is intentional, as it allows the multipath software to perform load balancing. Finally, stop and restart the open-iscsi service to confirm that the connections are reopened automatically.
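
With several storage addresses, the discovery and login steps can be chained. The sketch below assumes that each line of discovery output has the form "address:port,tag iqn"; login_commands is a hypothetical helper that only prints the login commands so they can be reviewed before being run. The address 172.16.0.10 is a made-up example.

```shell
#!/bin/sh
# login_commands: read iscsiadm discovery output on stdin and print one
# login command per discovered portal. Hypothetical helper; it assumes
# lines of the form "172.16.0.10:3260,1 iqn.1992-03.edu.tjhsst.csl:storage:alexandria".
login_commands() {
    awk 'NF >= 2 {
        portal = $1
        sub(/,.*/, "", portal)    # drop the ",<tag>" suffix, keep address:port
        printf "iscsiadm -m node -T %s -p %s -l\n", $2, portal
    }'
}

# Demonstration with a sample discovery line (made-up address).
printf '%s\n' '172.16.0.10:3260,1 iqn.1992-03.edu.tjhsst.csl:storage:alexandria' |
    login_commands

# Intended use on the host (as root), one discovery per storage address:
#   iscsiadm -m discovery -t sendtargets -p 172.16.xx.xx | login_commands | sh -x
```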

Multipath

/etc/multipath.conf

Notes on the following file:

  • The WWN (world wide name) or WWID (world wide identifier) can be found on the storage server. For our storage arrays, a 2 must be prepended to the provided WWN. All dashes should be removed from the identifier in the configuration.
  • The path_grouping_policy should be set to multibus in order to ensure load balancing, although other values are possible. Please refer to the multipath manpage for more information.
  • A multipath block should be added for each LUN that multipath is going to control.
  • The alias will appear under /dev/mapper. For our purposes, this is currently vmx, where x is replaced by a different number for each VM LUN. If an alias is not given, a generic name will be assigned by the multipath tools.
  • This file has been configured for our storage arrays. The configuration may be different depending on how the iSCSI target is configured.

defaults {
    user_friendly_names yes
}
blacklist {
        devnode cciss
        devnode fd
        devnode hd
        devnode md
        devnode sr
        devnode scd
        devnode st
        devnode ram
        devnode raw
        devnode loop
}

devices {
        device {
                vendor                  "Promise"
                product                 "VTrak M310i"
                path_grouping_policy    multibus
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                features                "1 queue_if_no_path"
                path_checker            readsector0
                failback                immediate
        }
}

multipaths {
    multipath {
        wwid    (wwn / wwid here)
        alias   (an alias here)
    }
}
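
The WWN rule from the notes (remove the dashes, prepend a 2) is easy to get wrong by hand. The following sketch prints a multipath block for one LUN; both function names are hypothetical, and the WWN in the demonstration is made up.

```shell
#!/bin/sh
# wwn_to_wwid WWN: strip all dashes and prepend "2", as described in the
# notes for these storage arrays. Hypothetical helper.
wwn_to_wwid() {
    printf '2%s\n' "$(printf '%s' "$1" | sed 's/-//g')"
}

# multipath_block WWN ALIAS: print one multipath block for multipath.conf.
multipath_block() {
    printf '    multipath {\n'
    printf '        wwid    %s\n' "$(wwn_to_wwid "$1")"
    printf '        alias   %s\n' "$2"
    printf '    }\n'
}

# Demonstration with a made-up WWN and the alias vm1.
multipath_block 22-b4-00-01-55-7e-1c-9a vm1
```

The output is meant to be pasted inside the multipaths section, one block per LUN.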

The multipath tool

Running multipath on the command line will read the configuration file and set up the multipath environment. However, this should normally be handled by the multipath services, which can also be restarted. Another useful command is multipath -ll, which displays the current multipath status. Please see the manpage for other options.

CCS

Installation of the ccs package will fail when the package tries to start ccs. No default configuration file is shipped with the package, so the daemon naturally fails to start. Copy /etc/cluster/ from an existing node onto the new cluster node and then run the installation command again. Apt should pick up from where it left off.

/etc/cluster/cluster.conf

Now that a sample file is in place, it is time to add the new cluster node into the configuration file. Add a clusternode section to the cluster.conf file on an existing cluster node, not the new node. In addition, increase the config_version, again on an existing cluster node, not the new node. Next, run ccs_tool update /etc/cluster/cluster.conf to load the new configuration file on, you guessed it, an existing cluster node, not the new node. Restart CCS on the new node (it should already be running after the package installation). CCS will automatically distribute the new file to all nodes in the cluster.

There is a ccs_tool addnode command, but the command mangles the configuration file, making the file hard for humans to read.
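
Incrementing config_version is easy to forget. The following is a minimal sketch that bumps the number in place and prints the new value; bump_config_version is a hypothetical helper, and it assumes the attribute is written as config_version="N" with a plain integer. It is meant to be run on an existing cluster node after the clusternode section has been added by hand.

```shell
#!/bin/sh
# bump_config_version FILE: increment the config_version attribute in
# FILE and print the new value. Hypothetical helper; assumes the
# attribute has the form config_version="N" with N a plain integer.
bump_config_version() {
    bcv_cur=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1)
    [ -n "$bcv_cur" ] || { echo "no numeric config_version in $1" >&2; return 1; }
    bcv_new=$((bcv_cur + 1))
    sed "s/config_version=\"$bcv_cur\"/config_version=\"$bcv_new\"/" "$1" > "$1.new" || return 1
    # Write back through the original file so its owner and mode are kept.
    cat "$1.new" > "$1" && rm -f "$1.new"
    echo "$bcv_new"
}

# Intended use on an existing cluster node (as root):
#   new=$(bump_config_version /etc/cluster/cluster.conf)
#   ccs_tool update /etc/cluster/cluster.conf
#   cman_tool version -r "$new"
```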

Example /etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster name="(Cluster name here)" config_version="(An increasing number)">
   <cman>
   </cman>
   <clusternodes>
      <clusternode name="(Hostname Here)" votes="1">
         <fence>
            <method name="single">
               <device name="manual" ipaddr="(IP Address Here)"/>
            </method>
         </fence>
      </clusternode>
   </clusternodes>
   <fencedevices>
      <fencedevice name="manual" agent="fence_manual"/>
   </fencedevices>
</cluster>

Runlevel configuration

Ensure that the following, and only the following, symlinks exist in the runlevel directories. This should be correct by default.

/etc/rc0.d/S05ccs
/etc/rc6.d/S05ccs
/etc/rcS.d/S61ccs
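
The "only the following" condition can be checked mechanically. The sketch below lists every runlevel link for a given init script name and compares the result with the expected set; both function names are hypothetical.

```shell
#!/bin/sh
# rc_links ROOT NAME: list every runlevel link for init script NAME
# under ROOT (normally /etc), one per line, sorted. Hypothetical helper.
rc_links() {
    ( cd "$1" && ls -d rc?.d/[SK][0-9][0-9]"$2" 2>/dev/null | sort )
}

# check_rc_links ROOT NAME EXPECTED...: succeed only if the links found
# are exactly the EXPECTED ones, with no extras and none missing.
check_rc_links() {
    crl_root=$1
    crl_name=$2
    shift 2
    crl_want=$(printf '%s\n' "$@" | sort)
    [ "$(rc_links "$crl_root" "$crl_name")" = "$crl_want" ]
}

# Intended use on the host:
#   check_rc_links /etc ccs rc0.d/S05ccs rc6.d/S05ccs rcS.d/S61ccs && echo "ccs links OK"
```

The same check applies to the cman links by changing the name and the expected list.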

Upgrading the cluster configuration version

On an existing node in the cluster, run cman_tool version -r (config_version), where (config_version) is the new config_version of the cluster.conf configuration file.

CMAN

Redhat cluster modules

Before installing CMAN, be sure the redhat cluster modules are installed. For the 2.6.18-6-xen-686 kernel, the package is called redhat-cluster-modules-2.6.18-6-xen-686. Once the kernel modules are installed, modprobe the cman kernel module. If this is not done before CMAN is installed, the package will encounter installation errors while trying to start the service. To ensure the modules are loaded at every boot, add the following lines to /etc/modules:

cman
dlm

Installing CMAN

After the cman package is installed, the node should connect to the cluster. Check with the cman_tool status command.

Runlevel configuration

Ensure that the following, and only the following, symlinks exist in the runlevel directories. This should be correct by default.

/etc/rc0.d/S04cman
/etc/rc6.d/S04cman
/etc/rcS.d/S62cman

CLVM

Init script

The Debian CLVM package currently does not come with an init script for the CLVM daemon. The following init script, largely from a Debian bug report, works nicely.

#! /bin/sh
#
# clvmd         Start/Stop script for the cluster LVM daemon
#
# Author:       Daniel Bertolo <dbertolo@hsr.ch>.
#
# Version:      @(#)clvmd  1.00  25-Jun-2006  dbertolo@hsr.ch
#
set -e
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="The cluster LVM daemon"
NAME=clvmd
DAEMON=/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
#
#       Function that starts the daemon/service.
#
d_start() {
        start-stop-daemon --start --quiet --exec $DAEMON
}
#
#       Function that stops the daemon/service.
#
d_stop() {
        start-stop-daemon --stop --quiet --name $NAME
}
#
#       Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
        start-stop-daemon --stop --quiet --name $NAME --signal 1
}
case "$1" in
  start)
        echo -n "Starting $DESC: $NAME"
        d_start
        echo "."
        ;;
  stop)
        echo -n "Stopping $DESC: $NAME"
        d_stop
        echo "."
        ;;
  restart|force-reload)
        #
        #       If the "reload" option is implemented, move the "force-reload"
        #       option to the "reload" entry above. If not, "force-reload" is
        #       just the same as "restart".
        #
        echo -n "Restarting $DESC: $NAME"
        d_stop
        sleep 1
        d_start
        echo "."
        ;;
  *)
        # echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
        exit 1
        ;;
esac 
exit 0

Runlevels

Create the following start and stop links in the runlevel directories:

rc0.d/K12clvm
rc1.d/K49clvm
rc2.d/S74clvm
rc3.d/S74clvm
rc4.d/S74clvm
rc5.d/S74clvm
rc6.d/K12clvm
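
The links can be created in one pass. This is a sketch only: make_rc_links is a hypothetical helper, and the usage line assumes the init script above was saved as /etc/init.d/clvm, which this guide does not state explicitly.

```shell
#!/bin/sh
# make_rc_links ROOT SCRIPT LINK...: create each LINK under ROOT as a
# relative symlink to ../init.d/SCRIPT. Hypothetical helper.
make_rc_links() {
    mrl_root=$1
    mrl_script=$2
    shift 2
    for mrl_link in "$@"; do
        ln -sf "../init.d/$mrl_script" "$mrl_root/$mrl_link"
    done
}

# Intended use on the host (as root), assuming the script is /etc/init.d/clvm:
#   make_rc_links /etc clvm rc0.d/K12clvm rc1.d/K49clvm rc2.d/S74clvm \
#       rc3.d/S74clvm rc4.d/S74clvm rc5.d/S74clvm rc6.d/K12clvm
```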

LVM configuration

Uncomment the following options in /etc/lvm/lvm.conf. Be sure to comment out the existing locking_type option.

file = "/var/log/lvm2.log"
locking_library = "liblvm2clusterlock.so"
locking_type = 2
library_dir = "/lib/lvm2"
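
Because the file ends up with both commented and uncommented copies of some options, it helps to confirm which value is actually active. The following is a minimal sketch; active_setting is a hypothetical helper that prints the value of the first uncommented line for a given key.

```shell
#!/bin/sh
# active_setting FILE KEY: print the value of the first uncommented
# "KEY = value" line in FILE. Lines starting with "#" are ignored because
# the key must be the first non-blank text on the line. Hypothetical helper.
active_setting() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Intended use on the host:
#   [ "$(active_setting /etc/lvm/lvm.conf locking_type)" = 2 ] ||
#       echo "cluster locking is not enabled" >&2
```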

DLM kernel module

Ensure the dlm kernel module is loaded before trying to start CLVM. If it is not loaded, CLVM will give a file not found error. This module was added to /etc/modules above, so it should be loaded when the system is rebooted.
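
To avoid waiting for a reboot, the module can be checked and loaded by hand. The sketch below reads the kernel's module list from /proc/modules; module_loaded is a hypothetical helper, and its optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME appears in the loaded-module
# list. FILE defaults to /proc/modules. Hypothetical helper.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Intended use before starting CLVM (as root):
#   module_loaded dlm || modprobe dlm
```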

Finding and activating LVM volumes

As it turns out, CLVM alone is not enough to attach to the LVM volumes over iSCSI. LVM needs to start early in the boot process, in the special runlevel (rcS.d), in order for the system to run properly. It therefore starts before the iSCSI storage is attached, so the LVM volumes on the iSCSI storage are not activated automatically at startup. Instead, we use a san-lvm init script to find and activate all LVM volumes. This could potentially cause problems if there are volumes that should not be activated; this has not been tested, as we have no such volumes. The LVM init script will find the volumes if iSCSI is already running.

/etc/init.d/san-lvm:

#! /bin/sh
### BEGIN INIT INFO
# Provides:          san-vg
# Required-Start:    $local_fs cman clvm ccs open-iscsi multipath-tools
# Required-Stop:     $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Find and activate the san LVM VGs
# Description:       Activates all LVM VGs on the system. Required because the
#                    LVM volume groups on the SAN are not found when LVM
#                    starts.
### END INIT INFO

# Author: Brandon Vargo

# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="SAN LVM Volume Groups"

set -e

test -x /sbin/lvmiopversion -a -x /sbin/vgscan -a -x /sbin/vgchange || exit 0
test -f /etc/default/lvm-common && . /etc/default/lvm-common

case "$1" in
start)
    vgscan
    vgchange -aly
    ;;
stop|restart|reload|force-reload)
    ;;
esac

exit 0

Create the following start links in the runlevel directories:

rc2.d/S75san-lvm
rc3.d/S75san-lvm
rc4.d/S75san-lvm
rc5.d/S75san-lvm

Xen

Enable the following options in /etc/xen/xend-config.sxp:

(xend-relocation-server yes)
(xend-relocation-hosts-allow '')

This will enable live migration support between hosts. NOTE: As configured above, this opens up the host node to any node that is able to migrate a VM to it. This has security implications, so consider restricting the xend-relocation-hosts-allow option to the other cluster hosts.

Be sure to restart xend to ensure the changes take effect.
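
One way to restrict relocation is to list the cluster hosts explicitly. The sketch below assumes that the option takes a space-separated list of regular expressions with doubled backslashes, as in the commented example shipped in xend-config.sxp; relocation_hosts_allow is a hypothetical helper and the hostnames are made up.

```shell
#!/bin/sh
# relocation_hosts_allow HOST...: print an xend-relocation-hosts-allow
# line that admits only the named hosts. Each name becomes an anchored
# regular expression with its dots escaped. Hypothetical helper.
relocation_hosts_allow() {
    rha_list=
    for rha_host in "$@"; do
        # Replace each "." with "\\." (two literal backslashes and a dot).
        rha_re=$(printf '%s' "$rha_host" | sed 's/\./\\\\./g')
        rha_list="$rha_list${rha_list:+ }^$rha_re\$"
    done
    printf "(xend-relocation-hosts-allow '%s')\n" "$rha_list"
}

# Demonstration with two made-up hostnames.
relocation_hosts_allow node1.example.org node2.example.org
```

The printed line would replace the empty setting in /etc/xen/xend-config.sxp, followed by a restart of xend.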

Grub configuration

Xen's handling of the console conflicts with that of iLO, HP's Integrated Lights-Out manager. In order to use the iLO serial port to run a getty, for example, add the following to the kernel options list in the GRUB configuration file. Note that, as a result of xencons=tty, the Linux console messages will still not appear on the serial console, but a getty can be set up to run on the serial port.

console=ttyS0,115200n8 console=tty0 xencons=tty

Starting it all up

In theory, everything should have been started along the way. However, that does not mean it will all start up properly when booting, so now would be the time to reboot your system, if possible, and make sure it all works.

Management and maintenance

TODO

  • Configuration file management
  • Adding more LVM volume groups

Problems

TODO

  • Shutdown host machine
  • Network latency
  • Loss of network
  • NX bit

Further reading

TODO
