SAN/Installation

Summary
This article covers the process of installing and initially configuring the hardware and software required to run the CSL SAN.

Prerequisites
This setup assumes that you have a multipath-capable SAS storage array, at least two servers, and the requisite HBAs and cables to connect the array to the servers. You will also need a third server to serve as a dummy node in the cluster to break ties. See the main SAN article for details on the hardware currently used in the CSL.

Preparation
Install and update the servers that you will be using in the SAN. You will need a total of three systems, at least two of which should be as close to identical as reasonable. For detailed instructions on installing systems with the CSL Gentoo Server image, see the Gentoo_Server_Install guide.

Hardware Configuration
Install a SAS HBA into each of the two servers that will be connected to the storage array. Then connect each server to the storage array using appropriate SAS cables. Make sure to check the array's documentation to ensure that both servers are connected to all of the drives in the array. For the SC847, this means that each server should have one connection to the bottom pair of ports and one connection to the top pair of ports. At this point you should also connect any network cables to the servers; a minimum of two NICs is recommended for both redundancy and performance.

Once everything is connected, boot both servers and make sure that all of the hardware shows up correctly. In particular make sure that all of the drives in the array are listed in /dev/ on both servers. If not, double-check the cabling before continuing.

Software Installation
Install the following packages onto the two storage nodes:
 * iproute2
 * ZFS on Linux (ZoL)
 * pacemaker 1.1.8 or higher
 * corosync 2.3.0 or higher
 * nfs-utils

On Gentoo this can be done with the following steps. First, add the following lines to /etc/portage/package.keywords (replace version numbers where applicable):
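For example (package atoms are illustrative; adjust to the versions currently in the tree):

 # /etc/portage/package.keywords
 sys-cluster/pacemaker ~amd64
 sys-cluster/corosync ~amd64
 sys-kernel/spl ~amd64
 sys-kernel/zfs-kmod ~amd64
 sys-kernel/zfs ~amd64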

Add the following line to /etc/portage/package.unmask (replace version numbers if needed):
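The exact atom depends on what is masked at the time; as an illustrative example:

 # /etc/portage/package.unmask
 =sys-cluster/pacemaker-1.1.8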

Add the following line to /etc/portage/package.use:
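The original line was not preserved in this revision; as an illustrative example, enabling NFSv4 support in nfs-utils would look like:

 # /etc/portage/package.use
 net-fs/nfs-utils nfsv4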

Run the following command to emerge the necessary packages and dependencies onto the storage servers:
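For example, assuming the package names above:

 emerge -av sys-apps/iproute2 sys-kernel/spl sys-kernel/zfs-kmod sys-kernel/zfs \
     sys-cluster/corosync sys-cluster/pacemaker net-fs/nfs-utils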

Required Kernel Options
You will need the following options enabled in your storage server kernels for the LIO iSCSI Target, and additionally for the NFS file server:
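The option lists were not preserved in this revision; the following is a sketch of the usual requirements (exact CONFIG_* names can vary between kernel versions):

 # LIO iSCSI Target
 CONFIG_CONFIGFS_FS=y
 CONFIG_TARGET_CORE=y
 CONFIG_TCM_IBLOCK=y
 CONFIG_ISCSI_TARGET=y
 
 # NFS file server
 CONFIG_NFSD=y
 CONFIG_NFSD_V3=y
 CONFIG_NFSD_V4=y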

Quorum Node
The quorum node only needs the basic cluster software; it does not need any of the resource-specific files. Repeat the above steps to install the required software on the quorum node; however, you may optionally skip anything related to the following packages:
 * sys-kernel/spl
 * sys-kernel/zfs-kmod
 * sys-kernel/zfs
 * LIO kernel configuration
 * NFS kernel configuration

Software Configuration
With all of the necessary software installed, it is now time to configure and initialize the cluster. All of the below steps should be performed on all three cluster nodes (both storage servers and the quorum node) unless otherwise indicated.

ZFS on Linux
Storage servers only. For ease of disk management, and to simplify identifying and replacing failed disks, we will configure friendly names for our disks before creating the zpools. This is done through /etc/zfs/vdev_id.conf, which should be created with the following format:
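Each line maps a persistent device path to a short alias. For example (aliases and paths are illustrative; substitute the by-path entries of your own drives):

 # /etc/zfs/vdev_id.conf
 # alias <name> <persistent device path>
 alias d1  /dev/disk/by-path/pci-0000:03:00.0-sas-phy0-lun-0
 alias d2  /dev/disk/by-path/pci-0000:03:00.0-sas-phy1-lun-0
 alias d3  /dev/disk/by-path/pci-0000:03:00.0-sas-phy2-lun-0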

After this, run the following command to create the drive aliases in /dev/disk/by-vdev/:
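Assuming the udev rules shipped with ZFS on Linux are installed, re-triggering udev creates the aliases:

 udevadm trigger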

Next you need to create one or more zpools out of the drives in your array. Currently we use a single zpool consisting of a 10-disk RAID-Z2 array with a hot spare. Depending on your planned I/O and redundancy needs, you may benefit from different drive configurations. When in doubt, the ZFS on Linux zfs-discuss mailing list is a good place to ask. Our zpool was created with the following command:
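As a sketch matching the layout described above (the pool name and drive aliases are illustrative):

 zpool create tank \
     raidz2 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 \
     spare d11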

IMPORTANT - only create the zpool on one of the two servers. The same zpool will be migrated between servers using the zpool export and import commands.

Once the zpool is created, there are a number of configuration options that can be changed. We recommend immediately creating a reserved 'safety' dataset as well as enabling transparent data compression. The safety dataset is necessary because ZFS is a copy-on-write filesystem: if the pool completely runs out of disk space, it locks up and can no longer write or even delete files. These steps can be done with the following commands:
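For example (the dataset name and reservation size are illustrative; size the reservation to a few percent of the pool):

 # reserved dataset that guarantees the pool always has free space
 zfs create -o refreservation=100G tank/safety
 # enable transparent compression pool-wide
 zfs set compression=lz4 tank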

Networking
A stable, high-speed network is crucial to the performance and operation of the cluster. In addition, because iSCSI traffic is unencrypted and only lightly secured, it is highly recommended that it only be run on an isolated VLAN to prevent snooping. Because our storage servers each have two NICs, we use the following network configuration:
 * eth0 - tagged onto our server VLAN for management, tagged onto the SAN VLAN for storage operations
 * eth1 - untagged onto the SAN VLAN for storage operations

This is done on Gentoo using a configuration similar to the following:
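With netifrc this looks similar to the following (VLAN IDs and addresses are illustrative):

 # /etc/conf.d/net
 vlans_eth0="10 20"
 config_eth0="null"
 config_eth0_10="10.0.10.11/24"   # server VLAN (management, tagged)
 config_eth0_20="10.0.20.11/24"   # SAN VLAN (tagged)
 config_eth1="10.0.20.12/24"      # SAN VLAN (untagged)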

Each storage server is given two separate IPs on the SAN network which then allows the VM Servers to use multipath I/O to increase bandwidth between them and the storage servers.

Once all addresses are configured, make sure reverse DNS is working for all cluster IPs. All cluster nodes need to be able to resolve all cluster IPs in order for the cluster to function properly. You may need to add reverse zones to your site DNS servers for the SAN network if they do not already exist.

Corosync
We need to configure a corosync 'ring' on each of our SAN interfaces for redundancy and communication. Corosync will use these rings to share cluster information among the various nodes.

Add or edit the following lines in /etc/corosync/corosync.conf. Note that the bindnetaddr values must match the addresses configured on each server; the mcastaddr values, however, must be the same on every server.
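A sketch with illustrative addresses (rings 0 and 1 correspond to the two SAN interfaces):

 totem {
     version: 2
     rrp_mode: passive
     interface {
         ringnumber: 0
         bindnetaddr: 10.0.20.0
         mcastaddr: 239.255.1.1
         mcastport: 5405
     }
     interface {
         ringnumber: 1
         bindnetaddr: 10.0.21.0
         mcastaddr: 239.255.2.1
         mcastport: 5405
     }
 }
 quorum {
     provider: corosync_votequorum
 }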

Now start corosync on each of the cluster nodes with the following command:
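On Gentoo with OpenRC:

 rc-service corosync start
 rc-update add corosync default   # optional: start at boot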

Corosync should now be running on each of the cluster nodes. You can check the status of the cluster with the following commands:
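For example:

 corosync-cfgtool -s      # status of each ring on the local node
 corosync-quorumtool -s   # cluster membership and quorum state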

Pacemaker
Start pacemaker on each of the cluster nodes with the following command:
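On Gentoo with OpenRC:

 rc-service pacemaker start
 rc-update add pacemaker default  # optional: start at boot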

Make sure that pacemaker has started successfully on all of the nodes with the following command:
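For example, a one-shot status check; all three nodes should be listed as Online:

 crm_mon -1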

Start the pacemaker configuration shell, change to configure mode, and display the current configuration:
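For example:

 crm
 crm(live)# configure
 crm(live)configure# show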

Edit the configuration for the quorum-only node and add the standby attribute so that it does not attempt to start cluster resources:
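Assuming the quorum node appears as 'quorum' in the configuration (the name is illustrative):

 crm(live)configure# edit quorum
 # in the editor, add the attribute:  attributes standby="on"
 crm(live)configure# commit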

STONITH
The first resource that we will configure on our new cluster is STONITH. STONITH is needed to ensure the safe failover of resources in the event of a failed node.

We use STONITH's iLO plugin to forcibly power off nodes via the Integrated Lights-Out management built into our HP servers. To configure this, first create a stonith user on each storage server's iLO and assign it power privileges. Then use the following command in the configure shell to create the primitive for each iLO device. Note that we specify the iLO by IP address to ensure that STONITH will function even if DNS is offline.
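A sketch using the external/riloe STONITH plugin from cluster-glue (hostnames, addresses, and credentials are illustrative):

 crm(live)configure# primitive stonith-san1 stonith:external/riloe \
     params hostlist="san1" ilo_hostname="10.0.10.201" \
         ilo_user="stonith" ilo_password="secret" ilo_can_reset="1" \
     op monitor interval="1h"
 crm(live)configure# location l-stonith-san1 stonith-san1 -inf: san1

The location constraint keeps each STONITH primitive off the node it is responsible for killing.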

We also set up a backup STONITH system called meatware. This is a manual STONITH method that requires a systems administrator to manually ensure that the affected system is powered off and then acknowledge to the cluster that this has been done. It is ordinarily not needed, but it provides a fallback if one of the iLOs dies.
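A sketch (hostnames illustrative):

 crm(live)configure# primitive stonith-meat stonith:meatware \
     params hostlist="san1 san2" \
     op monitor interval="1h"

When meatware fires, power the affected node off by hand and then acknowledge from a surviving node with meatclient -c <hostname>.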

Resource Agents
Resource agents form the backbone of the cluster. Primarily they manage cluster resources, but a few are essential to the proper operation of the cluster itself.

We set up a ping resource that monitors each storage node's connection to our central switching infrastructure. This is then combined with rules that force a node that loses its connection to the network to remove itself from the cluster before it needs to be killed.
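A sketch using ocf:pacemaker:ping (the ping target, resource names, and the constrained resource g_storage are illustrative):

 crm(live)configure# primitive p-ping ocf:pacemaker:ping \
     params host_list="10.0.10.1" multiplier="1000" \
     op monitor interval="15s"
 crm(live)configure# clone c-ping p-ping
 crm(live)configure# location l-connected g_storage \
     rule -inf: not_defined pingd or pingd lte 0

The ping agent records reachability in the pingd node attribute; the location rule bans resources from any node whose attribute is missing or zero.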