Adding a Host Machine to the Virtual Machine Cluster
From Livedoc - The Documentation Repository
This guide assumes that the virtual machine cluster has already been set up and is running, and that at least one iSCSI volume, already configured with LVM, is accessible from the server that will be added to the cluster. As such, this guide does not cover the initial setup procedures.
For more information on the various components, please see the other pages on this wiki. This guide is meant as a reference for those who already know the theory, not as a tutorial describing how everything works. It includes only the configuration details needed to add a system to the virtual machine cluster and is not a comprehensive setup guide.
This guide is meant for a system running Debian GNU/Linux. However, it could be easily adapted for other distributions, although the configuration details may be slightly different.
Required packages
- ccs: the cluster configuration system
- clvm: the cluster LVM daemon
- cman: the cluster manager
- multipath-tools: the utilities for administering multipath
- open-iscsi: an implementation of iSCSI (RFC 3720)
- redhat-cluster-modules: the Red Hat Cluster infrastructure modules
- vlan: the user mode programs to enable VLANs (needed only if using VLANs)
- xen-linux-system: the metapackage for running Xen
Network
One of the server's interfaces must be on the network where the storage is located. As CHAP is insecure, all storage is served via a separate storage VLAN. This can be achieved either by creating a virtual interface and using VLAN tagging, or by using a separate physical interface.
Separate physical connection
The storage interface is configured in /etc/network/interfaces; here, eth1 is the interface that carries storage traffic. To keep things simple, the xx.xx in the storage address should match the last two octets of the TJ host's public IP address. For example, bottom (198.38.17.55) would have the storage address 172.16.17.55.
allow-hotplug eth1
iface eth1 inet static
    address 172.16.xx.xx
    netmask 255.255.0.0
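The address convention above can be expressed as a small helper. This is only a sketch: `storage_ip` is a hypothetical function name, and it assumes the public address is a plain dotted-quad IPv4 address.

```shell
#!/bin/sh
# Hypothetical helper: map a host's public IPv4 address to its storage
# address by keeping the last two octets and placing them under 172.16.
storage_ip() {
    # Prints nothing and fails unless $1 has exactly four dot-separated fields
    echo "$1" | awk -F. 'NF == 4 { printf "172.16.%s.%s\n", $3, $4; ok = 1 } END { exit !ok }'
}
```

For example, `storage_ip 198.38.17.55` prints 172.16.17.55, the value for the address line.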
The configuration of the interface through which the server connects to the storage network should look like the following in Cisco IOS. Here, VLAN 16 is the storage network.
interface SomeInterface3/14
 description Your description here
 switchport access vlan 16
 switchport mode access
 logging event link-status
 spanning-tree portfast
VLAN tagging (802.1q)
With the vlan package installed, a tagged interface can be defined in /etc/network/interfaces as follows. To keep things simple, the xx.xx in the storage address should match the last two octets of the TJ host's public IP address. For example, bottom (198.38.17.55) would have the storage address 172.16.17.55.
auto vlan16
iface vlan16 inet static
    address 172.16.xx.xx
    netmask 255.255.0.0
    vlan_raw_device eth0
The VLAN tagging must be configured on the switch. For Cisco IOS, the interface should look similar to the following:
interface SomeInterface3/14
 description Your description here
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1,16
 switchport mode trunk
 logging event link-status
 spanning-tree portfast
Finally, the host must know to load the 8021q kernel module if this functionality is not embedded into the kernel. Add the following line to /etc/modules:
8021q
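Appending to /etc/modules by hand is easy to repeat by accident. Below is a minimal sketch of an idempotent append, using a hypothetical helper named `ensure_module`. The optional file argument exists only so the helper can be tried on a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: add a module name to a modules file (default
# /etc/modules) only if that exact line is not already present.
ensure_module() {
    module=$1
    file=${2:-/etc/modules}
    # -x matches whole lines, -F treats the module name literally
    if ! grep -qxF "$module" "$file" 2>/dev/null; then
        echo "$module" >> "$file"
    fi
}
```

On the host itself, `ensure_module 8021q && modprobe 8021q` adds the entry once and loads the module immediately instead of waiting for a reboot.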
iSCSI
/etc/initiatorname.iscsi
(Also link /etc/iscsi/initiatorname.iscsi to this file)
## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator. The InitiatorName must be unique
## for each iSCSI initiator. Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.1992-03.edu.tjhsst:initiator:HOSTNAME.0
InitiatorAlias=HOSTNAME
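When preparing several hosts, the file above can be generated instead of edited by hand. This sketch uses a hypothetical `write_initiatorname` helper and assumes that HOSTNAME means the host's short name. Run it only on a new host, because changing an existing InitiatorName may cause access control lists to reject the initiator.

```shell
#!/bin/sh
# Hypothetical helper: write the initiator name file for host $1 to $2
# (default /etc/initiatorname.iscsi), following the naming shown above.
write_initiatorname() {
    host=$1
    out=${2:-/etc/initiatorname.iscsi}
    cat > "$out" <<EOF
## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator. The InitiatorName must be unique
## for each iSCSI initiator. Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.1992-03.edu.tjhsst:initiator:${host}.0
InitiatorAlias=${host}
EOF
}
```

For example: `write_initiatorname "$(hostname -s)"`.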
/etc/iscsid.conf
(Also link /etc/iscsi/iscsid.conf to this file)
#
# Open-iSCSI configuration.
#
node.active_cnx = 1
node.startup = automatic
node.session.auth.username = SOME_USERNAME
node.session.auth.password = SOME_PASSWORD
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 10
node.session.err_timeo.reset_timeout = 30
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Wait = 0
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.MaxConnections = 0
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65536
#discovery.sendtargets.auth.authmethod = CHAP
#discovery.sendtargets.auth.username = SOME_USERNAME
#discovery.sendtargets.auth.password = SOME_PASSWORD
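Both files live in /etc and are linked from /etc/iscsi. The sketch below creates those links using a hypothetical `link_iscsi_configs` helper. Its optional root argument allows a dry run in a scratch directory. Note that `ln -sf` replaces whatever the open-iscsi package installed at those paths, so keep a copy if the packaged files matter.

```shell
#!/bin/sh
# Hypothetical helper: point /etc/iscsi/initiatorname.iscsi and
# /etc/iscsi/iscsid.conf at the copies in /etc. $1 is an optional
# prefix (default empty, i.e. the real /etc).
link_iscsi_configs() {
    root=${1:-}
    mkdir -p "$root/etc/iscsi"
    for f in initiatorname.iscsi iscsid.conf; do
        # -f replaces an existing file or link at the destination
        ln -sf "/etc/$f" "$root/etc/iscsi/$f"
    done
}
```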
Discovering and connecting to the storage
Run the following command for each storage server, replacing 172.16.xx.xx with the IP addresses of the storage. If a storage device has multiple IP addresses, run the command for each IP address.
iscsiadm -m discovery -t sendtargets -p 172.16.xx.xx
This command should return the IQN of the storage target. Use this IQN in the next command, for example:
iscsiadm -m node -T iqn.1992-03.edu.tjhsst.csl:storage:alexandria -p 172.16.xx.xx -l
The previous command logs in to the storage server using the username and password configured earlier; use -u instead of -l to log out. Again, be sure to run this command for each storage IP address. When storage can be reached through multiple IP addresses, each LUN shows up in /dev once per path (for example, twice for two addresses). This is intended: the multipath software uses these duplicate paths for load balancing. Finally, stop and restart the open-iscsi service to make sure the connections are reopened automatically.
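The discovery and login steps can be looped over all storage addresses. This is a sketch, not a tested tool. `login_all` is a hypothetical helper, and it assumes the usual sendtargets output format of one `ADDRESS:PORT,TPGT IQN` entry per line. It logs in only through the portal that was queried, so targets that advertise several portals are not logged in twice. `ISCSIADM` can be pointed at a stub for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: discover and log in to the target behind each
# storage IP address given as an argument.
ISCSIADM=${ISCSIADM:-iscsiadm}

login_all() {
    for ip in "$@"; do
        # each output line: "ADDRESS:PORT,TPGT TARGET_IQN"
        "$ISCSIADM" -m discovery -t sendtargets -p "$ip" |
        while read -r portal iqn; do
            case $portal in
                # only log in through the portal we just queried
                "$ip":*) "$ISCSIADM" -m node -T "$iqn" -p "$ip" -l ;;
            esac
        done
    done
}
```

For example, `login_all` followed by each storage IP address, then a restart of the open-iscsi service to confirm that the sessions come back on their own.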
Multipath
/etc/multipath.conf
Notes on the following file:
- The WWN (world wide name) or WWID (world wide identifier) can be found on the storage server. For our storage arrays, a 2 must be prepended to the provided WWN. All dashes should be removed from the identifier in the configuration.
- The path_grouping_policy should be set to multibus in order to ensure load balancing, although other values are possible. Please refer to the multipath manpage for more information.
- A multipath block should be added for each LUN that multipath is going to control.
- The alias will appear under /dev/mapper. For our purposes, this is currently vmx, where x is replaced by a different number for each VM LUN. If an alias is not given, a generic name will be assigned by the multipath tools.
- This file has been configured for our storage arrays. The configuration may be different depending on how the iSCSI target is configured.
defaults {
    user_friendly_names yes
}
blacklist {
    devnode cciss
    devnode fd
    devnode hd
    devnode md
    devnode sr
    devnode scd
    devnode st
    devnode ram
    devnode raw
    devnode loop
}
devices {
    device {
        vendor "Promise"
        product "VTrak M310i"
        path_grouping_policy multibus
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        features "1 queue_if_no_path"
        path_checker readsector0
        failback immediate
    }
}
multipaths {
    multipath {
        wwid (wwn / wwid here)
        alias (an alias here)
    }
}
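The WWN conversion described in the notes and the per-LUN multipath block can be sketched as follows. `wwn_to_wwid` and `multipath_stanza` are hypothetical helpers, and the rule of removing dashes and prepending a 2 applies to our arrays only.

```shell
#!/bin/sh
# Hypothetical helper: strip dashes from a WWN and prepend "2".
wwn_to_wwid() {
    printf '2%s\n' "$(printf '%s' "$1" | sed 's/-//g')"
}

# Hypothetical helper: print a multipath block for the multipaths section.
# $1: WWN from the storage server, $2: alias (e.g. vm1)
multipath_stanza() {
    printf '\tmultipath {\n\t\twwid %s\n\t\talias %s\n\t}\n' "$(wwn_to_wwid "$1")" "$2"
}
```

For example, `multipath_stanza aa-bb-cc vm1` prints a block containing `wwid 2aabbcc` and `alias vm1`. The WWN here is made up for illustration.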
The multipath tool
Running multipath on the command line reads the configuration file and sets up the multipath devices. Normally, however, this is handled by the multipath-tools service, which can also be restarted. Another useful command is multipath -ll, which displays the current multipath status. See the multipath manpage for other options.
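After the multipath-tools service is restarted, the aliases may take a moment to appear. The sketch below waits for one using a hypothetical `wait_for_alias` helper. `MAPPER_DIR` is an override that exists only for testing.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds (default 30) for alias $1
# to appear under /dev/mapper. Returns 0 if it appears, 1 otherwise.
MAPPER_DIR=${MAPPER_DIR:-/dev/mapper}

wait_for_alias() {
    alias_name=$1
    timeout=${2:-30}
    while :; do
        [ -e "$MAPPER_DIR/$alias_name" ] && return 0
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
}
```

For example: `/etc/init.d/multipath-tools restart && wait_for_alias vm1 && multipath -ll`.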
CCS
Installing the ccs package fails when the package tries to start the ccs daemon. No default configuration file is shipped with the ccs package, so the daemon naturally fails to start. Copy /etc/cluster/ from an existing node onto the new cluster node and run the installation command again; apt should pick up where it left off.
/etc/cluster/cluster.conf
Now that a sample file is in place, it is time to add the new node to the configuration. All of the following steps are done on an existing cluster node, not the new node: add a clusternode section for the new node to cluster.conf, increase config_version, and run ccs_tool update /etc/cluster/cluster.conf to load the new configuration file. CCS will automatically distribute the new file to all nodes in the cluster. Then restart CCS on the new node (it should already be running after the package installation).
There is a ccs_tool addnode command, but it mangles the configuration file, making the file hard for humans to read.
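It is easy to forget to increase config_version when editing by hand. The sketch below increments the value in place. It uses a hypothetical `bump_config_version` helper, assumes that config_version="N" appears exactly once in the file (on the cluster line), and relies on GNU sed's -i option.

```shell
#!/bin/sh
# Hypothetical helper: increment config_version in cluster.conf ($1,
# default /etc/cluster/cluster.conf) and print the new value.
bump_config_version() {
    conf=${1:-/etc/cluster/cluster.conf}
    old=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf")
    [ -n "$old" ] || return 1
    new=$((old + 1))
    sed -i "s/config_version=\"$old\"/config_version=\"$new\"/" "$conf"
    echo "$new"
}
```

On an existing node: `new=$(bump_config_version) && ccs_tool update /etc/cluster/cluster.conf`. The printed number is the new config_version.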
Example /etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster name="(Cluster name here)" config_version="(An increasing number)">
  <cman>
  </cman>
  <clusternodes>
    <clusternode name="(Hostname Here)" votes="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="(IP Address Here)"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
Runlevel configuration
Ensure that the following, and only the following, symlinks exist in the runlevel directories. This should be correct by default.
/etc/rc0.d/S05ccs
/etc/rc6.d/S05ccs
/etc/rcS.d/S61ccs
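To compare the actual links with the expected list, here is a sketch using a hypothetical `rc_links` helper. The optional root argument exists only for testing.

```shell
#!/bin/sh
# Hypothetical helper: list every rc?.d start/stop link (SNNname or
# KNNname) for service $1 under optional prefix $2.
rc_links() {
    service=$1
    root=${2:-}
    for link in "$root"/etc/rc?.d/[SK][0-9][0-9]"$service"; do
        # an unmatched glob is left as-is, so skip it unless it really exists
        [ -L "$link" ] || [ -e "$link" ] || continue
        echo "${link#"$root"}"
    done
}
```

`rc_links ccs` should print exactly the three links above, and `rc_links cman` can be checked the same way against the CMAN list.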
Upgrading the cluster configuration version
On an existing node in the cluster, run cman_tool version -r (config_version), where (config_version) is the new config_version of the cluster.conf configuration file.
CMAN
Redhat cluster modules
Before installing CMAN, make sure the Red Hat cluster modules are installed. For the 2.6.18-6-xen-686 kernel, the package is called redhat-cluster-modules-2.6.18-6-xen-686. Once the kernel modules are installed, load the cman kernel module with modprobe. If this is not done before CMAN is installed, the package installation will fail while trying to start the service. To load the modules at every boot, add the following lines to /etc/modules:
cman
dlm
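Before installing CMAN (and again before starting CLVM), it is worth confirming that both modules are actually loaded. This sketch reads /proc/modules and uses a hypothetical `modules_loaded` helper. The `PROC_MODULES` override exists only for testing.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every module named in the
# arguments appears in /proc/modules.
PROC_MODULES=${PROC_MODULES:-/proc/modules}

modules_loaded() {
    for m in "$@"; do
        # the first field of each /proc/modules line is the module name
        awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$PROC_MODULES" || {
            echo "module $m is not loaded" >&2
            return 1
        }
    done
}
```

For example: `modprobe cman && modprobe dlm && modules_loaded cman dlm`.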
Installing CMAN
Once the cman package is installed, the node should join the cluster. Check this with the cman_tool status command.
Runlevel configuration
Ensure that the following, and only the following, symlinks exist in the runlevel directories. This should be correct by default.
/etc/rc0.d/S04cman
/etc/rc6.d/S04cman
/etc/rcS.d/S62cman
CLVM
Init script
The Debian CLVM package currently does not come with an init script for the CLVM daemon. The following init script, largely from a Debian bug report, works nicely.
#! /bin/sh
#
# clvmd Start/Stop script for the cluster LVM daemon
#
# Author: Daniel Bertolo <dbertolo@hsr.ch>.
#
# Version: @(#)clvmd 1.00 25-Jun-2006 dbertolo@hsr.ch
#
set -e
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="The cluster LVM daemon"
NAME=clvmd
DAEMON=/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
#
# Function that starts the daemon/service.
#
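d_start() {
# --oknodo: exit 0 if the daemon is already running, so that
# "set -e" does not abort the script
start-stop-daemon --start --quiet --oknodo --exec $DAEMON
}
#
# Function that stops the daemon/service.
#
d_stop() {
# --oknodo: exit 0 if no daemon was running (e.g. on restart),
# so that "restart" still reaches d_start
start-stop-daemon --stop --quiet --oknodo --name $NAME
}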
#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --name $NAME --signal 1
}
case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
sleep 1
d_start
echo "."
;;
*)
# echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 1
;;
esac
exit 0
Runlevels
Create the following start and stop links in the runlevel directories:
rc0.d/K12clvm
rc1.d/K49clvm
rc2.d/S74clvm
rc3.d/S74clvm
rc4.d/S74clvm
rc5.d/S74clvm
rc6.d/K12clvm
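The links can be created by hand. The sketch below assumes the init script above is saved as /etc/init.d/clvm (which the link names suggest) and uses a hypothetical `make_rc_links` helper that takes RUNLEVEL:LINKNAME pairs.

```shell
#!/bin/sh
# Hypothetical helper: $1 = service, $2 = prefix ("" for the real
# system), then RUNLEVEL:LINKNAME pairs. Each link points at
# ../init.d/SERVICE.
make_rc_links() {
    service=$1
    root=$2
    shift 2
    for spec in "$@"; do
        level=${spec%%:*}
        name=${spec#*:}
        mkdir -p "$root/etc/rc$level.d"
        ln -sf "../init.d/$service" "$root/etc/rc$level.d/$name"
    done
}
```

For example: `make_rc_links clvm "" 0:K12clvm 1:K49clvm 2:S74clvm 3:S74clvm 4:S74clvm 5:S74clvm 6:K12clvm`. On Debian systems of this era, `update-rc.d clvm start 74 2 3 4 5 . stop 12 0 6 . stop 49 1 .` should produce the same set.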
LVM configuration
Uncomment the following options in /etc/lvm/lvm.conf. Be sure to comment the existing locking_type option.
file = "/var/log/lvm2.log"
locking_library = "liblvm2clusterlock.so"
locking_type = 2
library_dir = "/lib/lvm2"
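A quick check that the edit took effect can use a hypothetical `check_lvm_locking` helper. It looks only at uncommented lines, expects exactly one active locking_type set to 2, and requires the cluster locking library to be named.

```shell
#!/bin/sh
# Hypothetical helper: verify clustered locking settings in lvm.conf ($1,
# default /etc/lvm/lvm.conf). Returns 0 only if all checks pass.
check_lvm_locking() {
    conf=${1:-/etc/lvm/lvm.conf}
    # uncommented locking_type lines only
    active=$(grep -E '^[[:space:]]*locking_type[[:space:]]*=' "$conf")
    [ "$(printf '%s\n' "$active" | grep -c .)" -eq 1 ] || return 1
    printf '%s\n' "$active" | grep -qE '=[[:space:]]*2[[:space:]]*$' || return 1
    grep -qE '^[[:space:]]*locking_library[[:space:]]*=[[:space:]]*"liblvm2clusterlock\.so"' "$conf"
}
```

For example: `check_lvm_locking && echo "cluster locking configured"`.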
DLM kernel module
Ensure the dlm kernel module is loaded before trying to start CLVM. If it is not loaded, CLVM will give a file not found error. This module was added to /etc/modules above, so it should be loaded when the system is rebooted.
Finding and activating LVM volumes
As it turns out, CLVM alone is not enough to attach the LVM volumes on the iSCSI storage. LVM needs to start early in the boot process, in the special rcS runlevel, for the system to run properly. It therefore starts before the iSCSI storage is attached, so the LVM volumes on the iSCSI storage are not activated automatically at startup. Instead, we use a san-lvm init script to find and activate all LVM volumes once iSCSI, multipath and CLVM are running. This could cause problems if there are volumes that should not be activated; this has not been tested, as we have no such volumes. The stock LVM init script does find the volumes if iSCSI is already running.
/etc/init.d/san-lvm:
#! /bin/sh
### BEGIN INIT INFO
# Provides: san-vg
# Required-Start: $local_fs cman clvm ccs open-iscsi multipath-tools
# Required-Stop: $local_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Find and activate the san LVM VGs
# Description: Activates all LVM VGs on the system. Required because the
# LVM volume groups on the SAN are not found when LVM
# starts.
### END INIT INFO
# Author: Brandon Vargo
# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="SAN LVM Volume Groups"
set -e
test -x /sbin/lvmiopversion -a -x /sbin/vgscan -a -x /sbin/vgchange || exit 0
test -f /etc/default/lvm-common && . /etc/default/lvm-common
case "$1" in
start)
vgscan
vgchange -aly
;;
stop|restart|reload|force-reload)
;;
esac
exit 0
Create the following start links in the runlevel directories:
rc2.d/S75san-lvm
rc3.d/S75san-lvm
rc4.d/S75san-lvm
rc5.d/S75san-lvm
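Because san-lvm activates every volume group it finds, it can help to check which ones LVM considers clustered before relying on the script. In the vg_attr field reported by vgs, the sixth character is c for a clustered volume group. This sketch uses a hypothetical `clustered_vgs` filter.

```shell
#!/bin/sh
# Hypothetical filter: read "vg_name vg_attr" pairs on stdin and print
# the names of volume groups whose attribute string marks them clustered.
clustered_vgs() {
    awk 'substr($2, 6, 1) == "c" { print $1 }'
}
```

For example: `vgs --noheadings -o vg_name,vg_attr | clustered_vgs`.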
Xen
Enable the following options in /etc/xen/xend-config.sxp:
(xend-relocation-server yes)
(xend-relocation-hosts-allow )
This enables live migration between hosts. NOTE: it also allows any host that can reach the relocation port to migrate a VM to this host. An empty xend-relocation-hosts-allow value accepts connections from all hosts, so consider restricting this option to the cluster nodes.
Be sure to restart xend to ensure the changes take effect.
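To restrict relocation, xend reads xend-relocation-hosts-allow as a space-separated list of regular expressions. The sketch below builds an anchored entry for each cluster node using a hypothetical `relocation_allow_line` helper. The doubled backslash before each dot follows the style of the example in the stock xend-config.sxp.

```shell
#!/bin/sh
# Hypothetical helper: print an xend-relocation-hosts-allow line that
# accepts only the host names given as arguments.
relocation_allow_line() {
    regexes=""
    for h in "$@"; do
        # escape dots so that "a.b" cannot match "aXb"
        esc=$(printf '%s' "$h" | sed 's/\./\\\\./g')
        regexes="$regexes ^$esc\$"
    done
    printf "(xend-relocation-hosts-allow '%s')\n" "${regexes# }"
}
```

For example, `relocation_allow_line node1.example localhost` (made-up names) prints `(xend-relocation-hosts-allow '^node1\\.example$ ^localhost$')`.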
Grub configuration
Xen's console handling conflicts with that of iLO, HP's Integrated Lights-Out manager. To use the iLO serial port to run a getty, for example, add the following to the kernel options in the GRUB configuration file. Note that Linux console messages will still not appear on the serial console because of xencons=tty, but a getty can be set up to run on the serial port.
console=ttyS0,115200n8 console=tty0 xencons=tty
Starting it all up
In theory, everything should have been started along the way. However, that does not guarantee that everything will start properly at boot, so if possible, reboot the system now and make sure it all works.
Management and maintenance
TODO
- Configuration file management
- Adding more LVM volume groups
Problems
TODO
- Shutdown host machine
- Network latency
- Loss of network
- NX bit
Further reading
TODO