AFS
The Andrew File System (AFS) is the network file system used by the Computer Systems Lab. It is a distributed file system with a global namespace, and it is used by many universities and companies.
Notes
Currently all Solaris systems in the CSL use Transarc paths. For those not familiar with the difference in paths, see the Gentoo Linux OpenAFS Guide (see External Links below) for a handy comparison chart. At the time of writing, the guide does not specify a directory for client binaries in the Transarc paths section, but according to the old docs, they can be found at /usr/afsws/bin.
Implementations
The CSL AFS servers and clients all run the OpenAFS implementation, but two others exist: Arla and IBM/Transarc. The IBM/Transarc implementation is an old version from the time when AFS was being developed by IBM. It is no longer maintained; IBM open-sourced the code when it decided to stop maintaining it, and that code developed into the OpenAFS project. Arla was written while IBM's AFS was still closed source, in order to provide an open source implementation. Its client is very functional today and is actively maintained, but its server side is not considered finished and is not widely used. Arla's client is (mostly) compatible with OpenAFS servers, so the client has seen widespread use. Although the OpenAFS client is probably more popular in general, Arla runs on several platforms that the OpenAFS client has issues with (such as the BSDs), and it is popular on those platforms.
AFS Servers
The CSL's main AFS servers are currently Solaris zones running on Seatac and Dulles, aptly named haafs1 and haafs2. Solaris Cluster runs on Seatac and Dulles, allowing for automatic fail-over of either AFS zone to the other host.
Oracle Solaris Cluster Installation/Configuration
HA-AFS Server Installation
Setting up the Zone
One of the base technologies that HA-AFS is built on is ZFS, which is built into Solaris. We tune the filesystem's properties to balance storage capacity against speed.
Create the storage pool for the zone. This is done on each host for the two storage pools utilized. For TJ, the pools are skillet_a and skillet_b hosted on haafs1 and haafs2, respectively.
zfs set compression=on <pool>
zfs set atime=off <pool>
zfs set recordsize=64k <pool>
zfs set mountpoint=/<vicep> <pool>/<volume>
zfs set quota=<amount> <pool>/<volume>
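The property settings for one vice partition can be generated as a dry run before anything is changed. This is a minimal sketch, assuming a POSIX shell such as bash or ksh; the function name `zfs_tuning_cmds` and the 500G quota are hypothetical, and the volume name vicepa is only an example.

```shell
# Print (do not run) the ZFS tuning commands for one vice partition.
# All argument values are placeholders supplied by the caller.
zfs_tuning_cmds() {
    pool=$1; volume=$2; vicep=$3; quota=$4
    echo "zfs set compression=on $pool"
    echo "zfs set atime=off $pool"
    echo "zfs set recordsize=64k $pool"
    echo "zfs set mountpoint=/$vicep $pool/$volume"
    echo "zfs set quota=$quota $pool/$volume"
}

# Example values only: the 500G quota is made up for illustration.
zfs_tuning_cmds skillet_a vicepa vicepa 500G
```

Once the printed commands have been reviewed, the output can be piped to sh on the pool's host to apply them.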
Set up the zone. This is done via the zonecfg -z <zone> command on the zone host
zonecfg -z <zone>
create -b #this creates a zone with a full root, instead of using sparse
add dataset #the dataset created in the previous steps is used here
add viceps to zone (mountpoint=/vicepa <zpool>/vicepa)
add networking
set autoboot to true
Set ZFS ARC maximum usage on the zone host:
vim /etc/system
set zfs:zfs_arc_max = 2147483648
This sets the maximum amount of RAM that ZFS is allowed to use to cache data actively being used by zpools. As shown above, this is currently set at 2 GB on seatac and dulles, both of which have 4 GB of RAM. Depending on the amount of RAM available on the HA-AFS systems, this limit may be raised so that more data can be held in the ARC, which improves performance. On the other hand, letting the ARC grow too large may decrease performance by limiting the amount of RAM available to other system processes.
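The value in /etc/system is a byte count, so a different cap has to be converted from gigabytes by hand. The following is a small sketch of that arithmetic, assuming a POSIX shell with 64-bit arithmetic (bash or ksh, not the legacy Solaris /bin/sh); the function names are hypothetical.

```shell
# Convert a whole number of GiB into the byte count used by zfs_arc_max.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Emit the /etc/system line for a given ARC cap in GiB.
arc_max_line() {
    echo "set zfs:zfs_arc_max = $(arc_max_bytes "$1")"
}

# 2 GiB reproduces the value currently used on seatac and dulles.
arc_max_line 2
```

Running it with 2 prints `set zfs:zfs_arc_max = 2147483648`, the same line shown above.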
Modify zone timeout to 1800 seconds:
svccfg -s system/zones
setprop start/timeout_seconds = 1800
exit
Install and boot the zone:
zoneadm -z <zone> install; zlogin -C <zone>
zoneadm -z <zone> boot
The installation of the zone is pretty much a slimmed-down version of a Solaris install and will prompt for the same general questions. The Solaris postinstall (/afs/csl.tjhsst.edu/common/sun/OS_install/postinstall) should be run after this. Keep in mind that not all steps will apply, since things like networking and datasets are handled by the zone's host. Likewise, any directions that involve the kernel will not apply.
Compile AFS either on the zone host or within the zone.
./configure --enable-namei-fileserver --enable-transarc-paths \
--enable-fast-restart --enable-bitmap-later \
--enable-bos-restricted-mode --enable-bos-new-config \
--enable-supergroups
make
make dest
NOTE: --enable-fast-restart and --enable-bitmap-later will be deprecated on the OpenAFS 1.6 branch. A 'demand attach file server (DAFS)' architecture will replace these flag options. After a period of time, the OpenAFS team plans to make demand attach the default configuration (1.10.x or 2.0.x). This is in active discussion, so the information here may not be 100% accurate. Please check the openafs-devel list for more recent information.
Install AFS server on the zone
mkdir /usr/afs
copy /etc/openafs/server or /usr/afs/etc from another AFS server to /usr/afs/etc - do NOT copy the whole /etc/openafs/ or /usr/afs directory
vim NetInfo, change to server's IP
cd ~/openafs-1.4*/dest/sun4x_510/root.client/usr/vice/etc
cp -p afs.rc /etc/init.d/afs (make sure +x is set)
#Comment all kernel and afsd related lines; since this is a zone, these do not apply here
cp -r root.client/usr/vice /usr/
ln -s /usr/afs/etc/ThisCell /usr/vice/etc/ThisCell
ln -s /usr/afs/etc/CellServDB /usr/vice/etc/CellServDB
cp -pr root.server/usr/afs/bin /usr/afs/bin
Install AFS on the zone host
vim /etc/name_to_sysnum, add "afs 65"
init 6
cd ~/openafs-1.4*/sun4x_510/root.client/usr/vice/etc
cp -p modload/afs.rc /etc/init.d/afs
#Comment all afsd related lines here - the kernel lines are really the ones we care about
cp -p modload/libafs64.nonfs.o /kernel/fs/sparcv9/afs
cd /etc/rc3.d
ln -s ../init.d/afs S99afs
cd /etc/rc0.d
ln -s ../init.d/afs K66afs
/etc/init.d/afs start
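The "afs 65" entry should be added to /etc/name_to_sysnum only once. As a hedged sketch of an idempotent version of that edit, the hypothetical helper below appends the line only when no afs entry exists. It assumes a POSIX shell and POSIX grep, takes the file path as an argument so it can be tried on a copy first, and uses scratch file contents that are stand-ins rather than a real name_to_sysnum.

```shell
# Append "afs 65" to a name_to_sysnum-style file unless an afs entry
# is already present. The path is an argument so a copy can be tested.
ensure_afs_sysnum() {
    file=$1
    if grep '^afs[[:space:]]' "$file" >/dev/null 2>&1; then
        echo "afs entry already present in $file"
    else
        echo "afs 65" >> "$file"
        echo "added afs 65 to $file"
    fi
}

# Try it on a scratch file before touching /etc/name_to_sysnum.
scratch=$(mktemp)
printf 'syscall 0\nexit 1\n' > "$scratch"   # stand-in contents
ensure_afs_sysnum "$scratch"                # adds the entry
ensure_afs_sysnum "$scratch"                # second call changes nothing
rm -f "$scratch"
```

The reboot (init 6) is still required afterward, as in the steps above.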
Before starting AFS on the zone, make sure BosConfig has the most current options (see BosConfig.20090215 in /afs/csl.tjhsst.edu/common/sun/afs):
-fileserver: -vattachpar (attach vols in parallel from each vicep)
-salvager: -DontSalvage **specify number of viceps; only 1 instance per!
On the zone:
/etc/init.d/afs start
cd /etc/rc3.d
ln -s ../init.d/afs S99afs
cd /etc/rc0.d
ln -s ../init.d/afs K66afs
/usr/afs/bin/bos status localhost (should say "running unauthenticated")
/usr/afs/bin/bos create localhost fs fs /usr/afs/bin/fileserver \
/usr/afs/bin/volserver /usr/afs/bin/salvager -cell csl.tjhsst.edu \
-noauth (use -localauth if root@<zone>)
After completing these steps, your zone on the Solaris 10 machine should be working. You may wish to run `bos status <zone>` to make sure that AFS is behaving properly and that you can reach the server.
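The status check can be scripted so that a problem is flagged without reading the output by eye. The following is a sketch only: it assumes `bos status` reports each process on a line of the form `Instance <name>, currently running normally.`, the function name `bos_unhealthy` is hypothetical, and the sample input is illustrative rather than captured from a CSL server.

```shell
# Read `bos status` output on stdin and print every instance line that
# is not "currently running normally". Returns 0 when all are healthy.
bos_unhealthy() {
    bad=$(grep '^Instance ' | grep -v 'currently running normally' || true)
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
    return 0
}

# Illustrative input; on a live server the pipeline would instead be:
#   /usr/afs/bin/bos status <zone> | bos_unhealthy
printf 'Instance fs, currently running normally.\n' | bos_unhealthy \
    && echo "all instances healthy"
```

A non-zero return means at least one instance line needs attention, and the offending lines are printed.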
Solaris Cluster
WARNING: If for some reason, you are installing Solaris Cluster with ZFS on an older Solaris 10 build that has patch 137137-09/137138-09 but not 139579-02/139580-02 or later, or if the system is older than snv_104 (OpenSolaris/Nevada), do NOT proceed! See SunSolve Alert 245626 for details.
The automatic fail-over between Seatac and Dulles is managed by Solaris Cluster 3.2. The two servers are directly attached to a Sun StorEdge D2 array named skillet, which is split into two raidz2 zpools named skillet_a and skillet_b. The two hosts are joined in a Solaris Cluster named 'jetblue'.
The current setup comprises two cluster resource groups, haafs1 and haafs2:
=== Cluster Resource Groups ===
Group Name    Node Name                Suspended    Status
----------    ---------                ---------    ------
haafs2        seatac.sun.tjhsst.edu    No           Online
              dulles.sun.tjhsst.edu    No           Offline
haafs1        dulles.sun.tjhsst.edu    No           Online
              seatac.sun.tjhsst.edu    No           Offline
Each of these resource groups consists of three resources:
=== Cluster Resources ===
Resource Name    Node Name                State      Status Message
-------------    ---------                -----      --------------
skillet_b        seatac.sun.tjhsst.edu    Online     Online
                 dulles.sun.tjhsst.edu    Offline    Offline
haafs2-lh        seatac.sun.tjhsst.edu    Online     Online - LogicalHostname online.
                 dulles.sun.tjhsst.edu    Offline    Offline - LogicalHostname offline.
haafs2-rs        seatac.sun.tjhsst.edu    Online     Online - Service is online.
                 dulles.sun.tjhsst.edu    Offline    Offline
skillet_a        dulles.sun.tjhsst.edu    Online     Online
                 seatac.sun.tjhsst.edu    Offline    Offline
haafs1-lh        dulles.sun.tjhsst.edu    Online     Online - LogicalHostname online.
                 seatac.sun.tjhsst.edu    Offline    Offline - LogicalHostname offline.
haafs1-rs        dulles.sun.tjhsst.edu    Online     Online - Service is online.
                 seatac.sun.tjhsst.edu    Offline    Offline
The three resources in each group manage the logical hostname (-lh), the service resource (-rs), and the storage pool (skillet_a or skillet_b).
Quorum
The cluster is held together by a quorum that the Solaris Cluster software establishes and maintains.
=== Cluster Quorum ===
--- Quorum Votes Summary ---
Needed    Present    Possible
------    -------    --------
2         3          3
--- Quorum Votes by Node ---
Node Name                Present    Possible    Status
---------                -------    --------    ------
seatac.sun.tjhsst.edu    1          1           Online
dulles.sun.tjhsst.edu    1          1           Online
--- Quorum Votes by Device ---
Device Name    Present    Possible    Status
-----------    -------    --------    ------
d1             1          1           Online
Both cluster nodes and quorum devices vote to form quorum. By default, cluster nodes acquire a quorum vote count of one when they boot and become cluster members. Nodes can have a vote count of zero when the node is being installed, or when an administrator has placed a node into the maintenance state.
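The "Needed" figure in the quorum summary follows from the vote counts: quorum requires a strict majority of all possible votes. The following is a short sketch of that arithmetic, assuming a POSIX shell such as bash or ksh; the function name `quorum_needed` is hypothetical.

```shell
# Votes needed for quorum: a strict majority of all possible votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Two nodes plus one quorum device (d1), one vote each.
possible=$(( 1 + 1 + 1 ))
echo "possible=$possible needed=$(quorum_needed "$possible")"
```

With three possible votes the result is two, so the cluster keeps quorum if either node fails while the other node and the quorum device remain online.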
Cluster Installation
Sun provides an in-depth installation guide for a two-node cluster, which is what we have. This guide was used to create the original cluster, and should provide all the commands and directions needed to recreate the current setup. The installation guide is found here