The CSL SAN (Storage Area Network) is a redundant cluster providing iSCSI and NFS storage to other servers. Its primary purpose is to provide iSCSI storage for VM hard drives. It also provides NFS storage for bulk shared file storage.
Snares and Bottom each have an LSI PCI-E dual-port SAS-2 HBA. They are connected via copper SFF-8088 cables to Apocalypse, a dual-port SAS array, such that each server has access to all of the drives in Apocalypse. Any server, HBA, or backplane component within Apocalypse can fail without a loss of functionality.
In addition, Rockhopper is included in the cluster as a standby member. It provides a third cluster member to break ties and prevent other messy situations. While it is possible to run a two-node cluster, this is not recommended because it is difficult to determine which cluster node (if any) is operational in the event of network disconnects or other severe disruptions. Because Rockhopper does not have any connections to a storage array, it is neither capable of running SAN services nor configured to do so.
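The tie-breaking role of the third member can be illustrated with a simplified majority rule. This is only a sketch of the general idea, not Corosync's actual quorum implementation; the function name `has_quorum` is made up for the example, and each node is assumed to carry one vote.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Return True when a partition holds a strict majority of all votes."""
    return votes_present > total_votes // 2


# Two-node cluster split down the middle: neither side has a majority,
# so neither side can tell whether it should keep running services.
print(has_quorum(1, 2))  # False

# Three-node cluster (two SAN nodes plus the standby) split 2/1:
print(has_quorum(2, 3))  # True: the pair keeps running services
print(has_quorum(1, 3))  # False: the isolated node must stand down
```

With three votes, any network split leaves at most one side with a majority, which is why the standby member is useful even though it cannot run SAN services itself.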
Several pieces of software are used on top of the SAN hardware to provide data management and redundancy, high availability, and iSCSI and NFS access.
All three servers run a standard CSL Gentoo Linux Server Image as their operating system and base software. Linux was chosen for its flexibility and the ready availability of all required software. In this instance, Gentoo was used for commonality with other CSL systems; however, most Linux distributions should be usable.
ZFS, via the ZFS on Linux project, is used as the filesystem on the SAN. Currently there is a single zpool, called Apocalypse, with 10 drives in a RAID-Z2 vdev and an eleventh drive as an online spare. This allows any two drives in the pool to fail without any loss of availability or data. After a drive failure, the system automatically starts a rebuild onto the spare disk.
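The arithmetic behind this layout can be sketched as follows. The drive size is not stated here, so `drive_tb=2.0` is a purely hypothetical value, and the usable figure is a rough estimate that ignores ZFS metadata and padding overhead. The helper name `raidz_summary` is also invented for the example.

```python
def raidz_summary(drives: int, parity: int, drive_tb: float) -> dict:
    """Rough capacity and fault-tolerance figures for a single RAID-Z vdev."""
    data_drives = drives - parity
    return {
        "data_drives": data_drives,
        "tolerated_failures": parity,
        # Approximate: real usable space is lower due to ZFS overhead.
        "approx_usable_tb": data_drives * drive_tb,
    }


# 10 drives in RAID-Z2 (two drives' worth of parity); drive size is assumed.
summary = raidz_summary(drives=10, parity=2, drive_tb=2.0)
print(summary)
```

The online spare does not add capacity or raise the number of simultaneous failures the vdev tolerates. It only shortens the time the pool spends degraded after a failure.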
Using ZFS as our base filesystem provides a number of benefits, including transparent data checksumming and compression. ZFS is also capable of transparent deduplication; however, we do not currently have this feature enabled because of the amount of memory it requires and because it is not well tested in ZFS on Linux.
Corosync and Pacemaker are used to provide high availability fail-over of SAN services in the event of a hardware or software failure. All SAN resources should be managed through the cluster software and not directly through the OS configuration files.
Corosync provides messaging and cluster engine services between cluster nodes. It handles the establishment of the cluster and the management of membership and quorum within the cluster.
Pacemaker runs on top of Corosync and manages cluster resources using Resource Agents. Pacemaker handles starting and stopping cluster resources on each node as well as failing resources over between nodes.
Resource agents are small scripts used by Pacemaker that define how to start, stop, and monitor cluster resources. A large number are included by default with Pacemaker and new ones can be written following a specification provided by Pacemaker.
Currently we use the following resource agents in the cluster:
- ocf::tjhsst:ZPool (custom written)
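To make the start, stop, and monitor contract concrete, here is a minimal sketch of what a zpool-style resource agent could look like. It is not the actual ocf::tjhsst:ZPool agent, whose source is not shown here. The numeric exit codes are the standard OCF ones; the choice of `zpool list`, `zpool import`, and `zpool export` as the underlying commands is an assumption, and the required `meta-data` action and the command-line and environment wiring are omitted for brevity.

```python
import subprocess

# Exit codes defined by the OCF resource agent specification.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def run(cmd):
    """Run a command quietly and return its exit status."""
    try:
        return subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except FileNotFoundError:
        return 127  # command not installed on this machine


def monitor(pool, run=run):
    # Assumption: `zpool list <pool>` exits non-zero if the pool is not imported.
    return OCF_SUCCESS if run(["zpool", "list", "-H", pool]) == 0 else OCF_NOT_RUNNING


def start(pool, run=run):
    if monitor(pool, run) == OCF_SUCCESS:
        return OCF_SUCCESS  # already imported; start must be idempotent
    return OCF_SUCCESS if run(["zpool", "import", pool]) == 0 else OCF_ERR_GENERIC


def stop(pool, run=run):
    if monitor(pool, run) == OCF_NOT_RUNNING:
        return OCF_SUCCESS  # already exported; stop must be idempotent
    return OCF_SUCCESS if run(["zpool", "export", pool]) == 0 else OCF_ERR_GENERIC


def dispatch(action, pool, run=run):
    """Map the action name passed by the cluster manager to a handler."""
    handlers = {"start": start, "stop": stop, "monitor": monitor}
    handler = handlers.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler(pool, run)
```

The `run` parameter is injectable so that the logic can be exercised with a fake command runner on a machine without ZFS. A real agent would typically be a shell script that reads its parameters from `OCF_RESKEY_*` environment variables.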
STONITH (Shoot The Other Node In The Head) provides a means for the cluster to ensure the complete removal of a malfunctioning node from the cluster prior to taking over its resources. This is particularly important when managing non-clustered filesystems which will suffer corruption if they are simultaneously activated on multiple cluster nodes.
STONITH provides resource agents that can kill our cluster nodes using the built-in iLO remote management, as well as an agent that can ask the systems administrator to manually kill a node if iLO fails.
On Gentoo, STONITH is a part of the cluster-glue package.
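The ordering that STONITH enforces (fence first, take over second, with a manual fallback) can be sketched as below. This is an illustration of the principle only, not how Pacemaker or cluster-glue implement it. All function names are hypothetical, and the two fencing functions are stand-ins that simply return fixed results.

```python
from typing import Callable, List, Tuple

FenceMethod = Tuple[str, Callable[[str], bool]]


def fence_then_recover(
    node: str, methods: List[FenceMethod], recover: Callable[[], None]
) -> str:
    """Try each fencing method in order; recover only after one succeeds."""
    for name, fence in methods:
        if fence(node):
            recover()
            return name
    # Without confirmed fencing, taking over could corrupt the filesystem.
    raise RuntimeError(f"could not fence {node}; refusing to take over its resources")


# Stand-ins for real fencing agents (hypothetical, for illustration only).
def ilo_power_off(node: str) -> bool:
    return False  # pretend the iLO interface is unreachable


def operator_confirms(node: str) -> bool:
    return True  # pretend an administrator confirmed the node is powered off


log = []
used = fence_then_recover(
    "snares",
    [("ilo", ilo_power_off), ("manual", operator_confirms)],
    recover=lambda: log.append("resources started on surviving node"),
)
print(used, log)
```

The key design point is that `recover` is never called unless some fencing method has reported success, which mirrors the guarantee that a non-clustered filesystem is never active on two nodes at once.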
We use the LIO Unified Target integrated into the Linux Kernel to supply iSCSI targets for VM Storage. The LIO LUNs are backed by ZFS Block Devices (ZVOLs). LIO was chosen over other Linux iSCSI targets due to its integration into the Linux Kernel and its active development.
Because all LIO targets and LUNs are managed through Pacemaker, it is not necessary to install the management utility (targetcli) on the cluster nodes.
We use the NFSv4 server integrated into the Linux Kernel to provide NFS exports for VM support and mail storage.