SAN

The CSL SAN (Storage Area Network) is a redundant cluster providing iSCSI and NFS storage to other servers. Its primary purpose is to provide iSCSI storage for VM hard drives; it also provides NFS storage for bulk shared file storage.

Storage arrays

{{#ask: [[Page describes type::Storage array]] |?Number of drives |?Contact person |?Criticality}}

Storage pools

{{#ask: [[Page describes type::Storage pool]] |?Located on storage array}}

Hardware

Snares and Bottom each have an LSI PCI-E dual-port SAS-2 HBA. They are connected via copper SFF-8088 cables to Apocalypse, a dual-port SAS array, such that each server has access to all of the drives in Apocalypse. Any server, HBA, or backplane component within Apocalypse can fail without a loss of functionality.
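
To sanity-check that a server actually sees both paths into the array, something like the following should suffice (the exact output depends on the HBA model and enclosure):

lspci | grep -i sas            # confirm the dual-port SAS-2 HBA is visible to the OS
ls -l /dev/disk/by-path/       # each drive should appear via the expected SAS path(s)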

In addition, Rockhopper is included in the cluster as a standby member, providing a third vote to break ties and avoid split-brain situations. While it is possible to run a two-node cluster, this is not recommended because of the difficulty in determining which cluster node (if any) is operational in the event of network disconnects or other severe disruptions. Because Rockhopper does not have any connections to a storage array, it is neither capable of nor configured to run any SAN services.

Valdes and Barrel are connected to the Apocalypse array, but there is currently no redundancy at the server level; adding it would be a worthwhile improvement.

Software

A number of pieces of software are used on top of the SAN hardware to provide data management and redundancy, high availability, and iSCSI and NFS access.

Ubuntu Linux

Valdes and Barrel run Ubuntu 16.04 LTS as their operating system. Ubuntu Linux was chosen for its flexibility and the ready availability of all required software, in particular, ZFS on Linux.

When setting up a new Ubuntu storage server, make sure to generate a unique hostid for the system. This can be done with

hostid | fold -w 2 | xargs -n1 -I {} echo -e -n \\x{} > /etc/hostid

If this is not done, ZFS may behave unexpectedly, because ZFS on Linux defaults the hostid to 0x00000000 when /etc/hostid does not exist.
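
As a quick check (not part of the original setup steps, just a suggested verification), the bytes written to /etc/hostid should correspond to the hex digits printed by hostid:

hostid                        # prints the hostid as eight hex digits
od -A x -t x1 /etc/hostid     # dump the four bytes written by the command above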

ZFS

ZFS, via the ZFS on Linux project, is used as the filesystem on the SAN. Currently there is a single zpool, called Apocalypse, with 10 drives in a RAID-Z2 vdev and an eleventh drive as an online spare. This allows any two drives in the pool to fail without any loss of availability or data, and after a drive failure the system will automatically start a rebuild using the spare disk.
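
For reference, a pool with this layout would be created along the following lines. The device names below are placeholders, not the actual disks in Apocalypse; in practice the stable /dev/disk/by-id/ names should be used:

# 10-disk RAID-Z2 vdev plus one hot spare (device names are placeholders)
zpool create apocalypse \
    raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj \
    spare sdk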

Using ZFS as our base filesystem provides a number of benefits, including transparent data checksumming and compression. ZFS is also capable of transparent deduplication; however, we do not currently enable this feature because of the amount of memory it requires and because it is not well tested in ZFS on Linux.
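
The current state of these features can be inspected per dataset (shown here on the pool root as an example):

zfs get compression,checksum,dedup apocalypse    # current settings on the pool root
zfs get compressratio apocalypse                 # achieved compression ratio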

Creating a new zvol

To create a new zvol:

zfs create -V 16G apocalypse/vms/whatever/root  # replace 16G with the desired volume size
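
The new zvol appears as a block device that can then be handed to LIO as a LUN backstore. To confirm it exists (the path continues the example above):

zfs list -t volume                              # list all zvols in the pool
ls -l /dev/zvol/apocalypse/vms/whatever/root    # block device node created for the zvol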

Corosync/Pacemaker

Corosync and Pacemaker are used to provide high availability fail-over of SAN services in the event of a hardware or software failure. All SAN resources should be managed through the cluster software and not directly through the OS configuration files.

Corosync provides messaging and cluster engine services between cluster nodes. It handles the establishment of the cluster and the management of membership and quorum within the cluster.

Pacemaker runs on top of Corosync and provides management of cluster resources using Resource Agents. Pacemaker handles actually starting and stopping cluster resources on each node as well as failing resources over between nodes.
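
The current cluster state can be inspected with the standard tools (these assume the stock corosync and pacemaker packages; no cluster-specific names are needed):

corosync-quorumtool -s    # Corosync membership and quorum status
crm_mon -1                # one-shot view of Pacemaker resources and where they are running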

Resource Agents

Resource agents are small scripts used by Pacemaker that define how to start, stop, and monitor cluster resources. A large number are included by default with Pacemaker and new ones can be written following a specification provided by Pacemaker.

Currently we use the following resource agents in the cluster:

  • ocf::tjhsst:ZPool (custom written)
  • ocf::heartbeat:IPaddr2
  • ocf::heartbeat:IPv6addr
  • ocf::heartbeat:iSCSITarget
  • ocf::heartbeat:iSCSILogicalUnit
  • ocf::heartbeat:nfsserver
  • stonith:external/riloe
  • stonith:meatware
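
For illustration only (the resource name and IP address below are placeholders, not the cluster's actual configuration), a resource using one of these agents is defined through crmsh rather than in OS config files:

# Example floating service address managed by the cluster
crm configure primitive p_san_ip ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=30s
crm configure show    # review the resulting configuration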

STONITH

STONITH (Shoot The Other Node In The Head) provides a means for the cluster to ensure the complete removal of a malfunctioning node from the cluster prior to taking over its resources. This is particularly important when managing non-clustered filesystems which will suffer corruption if they are simultaneously activated on multiple cluster nodes.

STONITH provides resource agents that can power off our cluster nodes using the built-in iLO remote management (external/riloe), as well as one (meatware) that asks the systems administrator to manually kill a node if iLO fails.
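
A minimal sketch of the manual fallback agent, with placeholder node names (the external/riloe definition additionally needs iLO host and credential parameters and is omitted here):

# Ask a human to confirm a node is really dead before its resources are taken over
crm configure primitive st_manual stonith:meatware \
    params hostlist="node1 node2"
crm configure property stonith-enabled=true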

On Gentoo, STONITH is a part of the cluster-glue package.

LIO

We use the LIO Unified Target integrated into the Linux Kernel to supply iSCSI targets for VM storage. The LIO LUNs are backed by ZFS block devices (zvols). LIO was chosen over other Linux iSCSI targets due to its integration into the Linux Kernel and its active development.

Because all LIO targets and LUNs are managed through Pacemaker, it is not necessary to install the management utility (targetcli) on the cluster nodes.
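
As a sketch of what those cluster-managed definitions look like (the IQN, resource names, and zvol path here are placeholders, and parameters such as the implementation are omitted):

# iSCSI target plus one LUN backed by a zvol (all values are illustrative)
crm configure primitive p_vm_target ocf:heartbeat:iSCSITarget \
    params iqn=iqn.2016-01.example.csl:vms \
    op monitor interval=30s
crm configure primitive p_vm_lun0 ocf:heartbeat:iSCSILogicalUnit \
    params target_iqn=iqn.2016-01.example.csl:vms lun=0 \
        path=/dev/zvol/apocalypse/vms/whatever/root \
    op monitor interval=30s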

NFS

We use the NFSv4 server integrated into the Linux Kernel to provide NFS exports for VM support and mail storage.
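
From a client, the exports offered by whichever node currently holds the NFS service can be listed with showmount (the host name is a placeholder):

showmount -e nfs.example.csl    # list exports from the active NFS node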