Warning Livedoc is no longer being updated and will be deprecated shortly. Please refer to https://documentation.tjhsst.edu.


Revision as of 20:31, 12 October 2010 by William Yang (talk | contribs) (Solaris Cluster: add reference to sun doc 245626 regarding data corruption issue)

The Andrew File System (AFS) is the network file system used by the Computer Systems Lab (CSL). It provides a global namespace and is in use at many universities and companies.


Currently all Solaris systems in the CSL use Transarc paths. For those not familiar with the difference between path conventions, the Gentoo Linux OpenAFS Guide (see External Links below) has a handy comparison chart. At the time of writing, the guide does not specify a directory for client binaries in its Transarc paths section, but according to the old docs, they can be found at /usr/afsws/bin.


The CSL AFS servers and clients all run the OpenAFS implementation, but two others exist: Arla and IBM/Transarc. The IBM/Transarc implementation is an old version, dating from when AFS was developed by IBM. It is no longer maintained, but IBM open-sourced the code when it decided to stop maintaining it, and that code developed into the OpenAFS project. Arla was developed while IBM's AFS was not open source, in order to provide an open source implementation. Its client is very functional today and is actively maintained, but its server side is not considered finished and is not widely used. Arla's client is (mostly) compatible with OpenAFS servers, so the client has seen widespread use. Although the OpenAFS client is probably more popular in general, Arla runs on several platforms that the OpenAFS client has issues with (such as the BSDs), and it has become popular on those platforms.

AFS Servers

The CSL's main AFS servers are currently Solaris zones running on Seatac and Dulles, aptly named haafs1 and haafs2. Solaris Cluster runs on Seatac and Dulles, allowing for automatic fail-over of either AFS zone to the other host.

HA-AFS Server Installation

Setting up the Zone

One of the base technologies that HA-AFS is built on is ZFS, which is built into Solaris. We tune the filesystem's properties to find a balance between maximum storage space and speed.

Create the storage pool for the zone. This is done on each host, once for each of the two storage pools in use. For TJ, the pools are skillet_a and skillet_b, hosted on haafs1 and haafs2, respectively.

zfs set compression=on <pool>
zfs set atime=off <pool>
zfs set recordsize=64k <pool>
zfs set mountpoint=/<vicep> <pool>/<volume>
zfs set quota=<amount> <pool>/<volume>
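
As a sanity check before touching a real pool, the property settings above can be printed as a dry run. This is only a sketch: the helper name print_vicep_zfs_cmds is hypothetical, and the 100G quota is an illustrative value, not the CSL's actual setting. Note that zfs set takes property=value first and the pool or dataset name last.

```shell
#!/bin/sh
# Print (do not run) the zfs commands for one vicep dataset.
# Arguments: pool, dataset (volume) name, mountpoint, quota.
# The helper name and the sample values below are illustrative assumptions.
print_vicep_zfs_cmds() {
    pool=$1; volume=$2; mountpoint=$3; quota=$4
    echo "zfs set compression=on $pool"
    echo "zfs set atime=off $pool"
    echo "zfs set recordsize=64k $pool"
    echo "zfs set mountpoint=$mountpoint $pool/$volume"
    echo "zfs set quota=$quota $pool/$volume"
}

# Example: review the commands for /vicepa on skillet_a before running them by hand.
print_vicep_zfs_cmds skillet_a vicepa /vicepa 100G
```

Once the output looks right, the printed lines can be pasted into a root shell on the zone host.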

Set up the zone. This is done with the zonecfg -z <zone> command on the zone host:

zonecfg -z <zone>
 create -b                  # blank configuration: a zone with a full root, instead of a sparse one
 add dataset                # the dataset created in the previous steps is used here
  set name=<zpool>/vicepa   # repeat for each vicep (mountpoint=/vicepa)
 end
 add net                    # networking; fill in the zone's own address and interface
  set address=<ip>
  set physical=<interface>
 end
 set autoboot=true

Set ZFS ARC maximum usage on the zone host:

vim /etc/system
set zfs:zfs_arc_max = 2147483648

This sets the maximum amount of RAM that ZFS is allowed to use to cache data actively being used by zpools. As shown above, this is currently set to 2GB on seatac and dulles, both of which have 4GB of RAM. Depending on the amount of RAM available on the HA-AFS systems, this value may be increased so that more data can be held in the ARC, which improves performance. On the other hand, letting the ARC grow too large may decrease performance by limiting the amount of RAM available to other system processes.
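
The value in /etc/system is given in bytes, so 2GB is 2 x 1024 x 1024 x 1024. A small sketch of the conversion follows; the function name arc_max_bytes is hypothetical, and it only prints the line to add rather than editing /etc/system.

```shell
#!/bin/sh
# Convert a whole number of gigabytes (GiB) to bytes for zfs_arc_max.
# arc_max_bytes is an illustrative helper, not part of Solaris.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the /etc/system line for a 2GB ARC limit, as used on seatac and dulles.
echo "set zfs:zfs_arc_max = $(arc_max_bytes 2)"
```

For 2 the printed line matches the setting shown above (2147483648).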

Modify zone timeout to 1800 seconds:

svccfg -s system/zones
 setprop start/timeout_seconds = 1800

Install and boot the zone:

zoneadm -z <zone> install
zlogin -C <zone>          # attach to the zone console (e.g. in a second terminal) to answer the first-boot prompts
zoneadm -z <zone> boot

Installing the zone is essentially a slimmed-down Solaris install and will prompt for the same general questions. The Solaris postinstall (/afs/csl.tjhsst.edu/common/sun/OS_install/postinstall) should be run after this. Keep in mind that not all steps will apply, since things like networking and datasets are handled by the zone's host. Likewise, any directions that involve the kernel will not apply.

Compile AFS either on the zone host or within the zone.

./configure --enable-namei-fileserver --enable-transarc-paths \
            --enable-fast-restart --enable-bitmap-later \
            --enable-bos-restricted-mode --enable-bos-new-config
make dest
NOTE: --enable-fast-restart and --enable-bitmap-later will be deprecated on the OpenAFS 1.6 branch. Instead of these flags, there will be new ways of setting up a 'demand attach file server (DAFS)' architecture. After a period of time, the OpenAFS team will make demand attach the default configuration (1.10.x or 2.0.x). This is in active discussion, so the information here may not be 100% accurate. Please check the openafs-devel list for more recent information.

Install AFS server on the zone:

mkdir /usr/afs
# Copy /etc/openafs/server or /usr/afs/etc from another AFS server to /usr/afs/etc.
# Do NOT copy the whole /etc/openafs/ or /usr/afs directory.
vim NetInfo   # change to this server's IP
cd ~/openafs-1.4*/dest/sun4x_510/root.client/usr/vice/etc
cp -p afs.rc /etc/init.d/afs   # make sure +x is set
# In /etc/init.d/afs, comment out all kernel and afsd related lines; since this is a zone, these do not apply here
#cd ~/openafs-1.4*/dest/sun4x_510/root.server/usr/afs

Install AFS on the zone host

vim /etc/name_to_sysnum   # add the line "afs 65"
init 6                    # reboot so the new entry is picked up
cd ~/openafs-1.4*/sun4x_510/root.client/usr/vice/etc
cp -p modload/afs.rc /etc/init.d/afs #Comment all afsd related lines here - the kernel lines are really the ones we care about
cp -p modload/libafs64.nonfs.o /kernel/fs/sparcv9/afs
cd /etc/rc3.d
ln -s ../init.d/afs S99afs
cd /etc/rc0.d
ln -s ../init.d/afs K66afs
/etc/init.d/afs start
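
If the name_to_sysnum step is scripted rather than done in an editor, it helps to make it safe to run twice. The following is a minimal sketch under that assumption; add_afs_sysnum is a hypothetical helper, and the demonstration runs against a scratch file rather than the real /etc/name_to_sysnum.

```shell
#!/bin/sh
# Append "afs 65" to a name_to_sysnum-style file only if no afs entry exists yet.
# add_afs_sysnum is an illustrative helper name, not a Solaris or OpenAFS tool.
add_afs_sysnum() {
    file=$1
    if grep -q '^afs[[:space:]]' "$file"; then
        echo "afs entry already present in $file"
    else
        echo "afs 65" >> "$file"
        echo "added afs 65 to $file"
    fi
}

# Demonstrate on a scratch file; the second call leaves the file unchanged.
scratch=$(mktemp)
add_afs_sysnum "$scratch"
add_afs_sysnum "$scratch"
rm -f "$scratch"
```

On the real zone host the argument would be /etc/name_to_sysnum, followed by the reboot described above.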

Before starting AFS on the zone: Make sure BosConfig has the most current options: (see BosConfig.20090215 in /afs/csl.tjhsst.edu/common/sun/afs)

-fileserver: -vattachpar (attach vols in parallel from each vicep)
-salvager: -DontSalvage
**specify number of viceps; only 1 instance per!

On the zone:

/etc/init.d/afs start
cd /etc/rc3.d
ln -s ../init.d/afs S99afs
cd /etc/rc0.d
ln -s ../init.d/afs K66afs
cd /usr/afs/bin   # assumes bos was installed alongside the server binaries
./bos status localhost   # should say "running unauthenticated"
# Use -localauth instead of -noauth if running as root@<zone>
./bos create localhost fs fs /usr/afs/bin/fileserver \
    /usr/afs/bin/volserver /usr/afs/bin/salvager -cell csl.tjhsst.edu \
    -noauth

After completing these steps, your zone on the Solaris 10 machine should be working. You may wish to run `bos status <zone>` to make sure that AFS is behaving properly and that you can reach the server.

Solaris Cluster

WARNING: If, for some reason, you are installing Solaris Cluster with ZFS on an older Solaris 10 build that has patch 137137-09/137138-09 but not 139579-02/139580-02 or later, or if the system is older than snv_104 (OpenSolaris/Nevada), do NOT proceed! See SunSolve Alert 245626 for details.

The automatic fail-over between Seatac and Dulles is managed by Solaris Cluster 3.2. The two servers are directly attached to a Sun StorEdge D2 array named skillet, which is split into two raidz2 zpools named skillet_a and skillet_b. The two hosts are joined in a Solaris Cluster named 'jetblue'.

The current setup consists of two cluster resource groups, haafs1 and haafs2:

=== Cluster Resource Groups ===

Group Name   Node Name               Suspended   Status
----------   ---------               ---------   ------
haafs2       seatac.sun.tjhsst.edu   No          Online
             dulles.sun.tjhsst.edu   No          Offline

haafs1       dulles.sun.tjhsst.edu   No          Online
             seatac.sun.tjhsst.edu   No          Offline

Each of these resource groups consists of three resources:

=== Cluster Resources ===

Resource Name   Node Name               State     Status Message
-------------   ---------               -----     --------------
skillet_b       seatac.sun.tjhsst.edu   Online    Online
                dulles.sun.tjhsst.edu   Offline   Offline

haafs2-lh       seatac.sun.tjhsst.edu   Online    Online - LogicalHostname online.
                dulles.sun.tjhsst.edu   Offline   Offline - LogicalHostname offline.

haafs2-rs       seatac.sun.tjhsst.edu   Online    Online - Service is online.
                dulles.sun.tjhsst.edu   Offline   Offline

skillet_a       dulles.sun.tjhsst.edu   Online    Online
                seatac.sun.tjhsst.edu   Offline   Offline

haafs1-lh       dulles.sun.tjhsst.edu   Online    Online - LogicalHostname online.
                seatac.sun.tjhsst.edu   Offline   Offline - LogicalHostname offline.

haafs1-rs       dulles.sun.tjhsst.edu   Online    Online - Service is online.
                seatac.sun.tjhsst.edu   Offline   Offline

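These three resources manage the logical hostname (the -lh resource), the AFS service itself (the -rs resource), and the storage (the skillet_a or skillet_b zpool).


The cluster is held together by quorum, which the cluster software establishes from the votes of the nodes and the quorum device.

=== Cluster Quorum ===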

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            2        3         3

--- Quorum Votes by Node ---

Node Name                Present    Possible    Status
---------                -------    --------    ------
seatac.sun.tjhsst.edu    1          1           Online
dulles.sun.tjhsst.edu    1          1           Online

--- Quorum Votes by Device ---

Device Name       Present      Possible      Status
-----------       -------      --------      ------
d1                1            1             Online

Both cluster nodes and quorum devices vote to form quorum. By default, cluster nodes acquire a quorum vote count of one when they boot and become cluster members. Nodes can have a vote count of zero when the node is being installed, or when an administrator has placed a node into the maintenance state.
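
The "Needed" figure in the summary above follows from requiring a majority of the possible votes. A short worked sketch of that arithmetic is given here; the function name quorum_needed is hypothetical and this is not a Solaris Cluster command.

```shell
#!/bin/sh
# Votes needed for quorum: a strict majority of the possible votes.
# quorum_needed is an illustrative helper, not part of Solaris Cluster.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Two nodes (seatac, dulles) plus one quorum device (d1), one vote each.
possible=$(( 1 + 1 + 1 ))
echo "possible=$possible needed=$(quorum_needed "$possible")"
```

With three possible votes, two are needed, so a single surviving node plus the quorum device d1 can keep the cluster running. Without the quorum device, two votes out of two would be needed and the loss of either node would cost the cluster its quorum.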

Cluster Installation

Sun provides an in-depth installation guide for a two-node cluster, which is what we have. This guide was used to create the original cluster, and should provide all the commands and directions needed to recreate the current setup. The installation guide is found here

External Links