SAN

The CSL SAN (Storage Area Network) is a redundant cluster providing iSCSI and NFS storage to other servers. Its primary purpose is to provide iSCSI storage for VM hard drives. It also provides NFS storage for bulk shared file storage.

Hardware

Snares and Bottom each have an LSI PCI-E dual-port SAS-2 HBA. They are connected via copper SFF-8088 cables to Apocalypse such that each server has access to all of the drives in Apocalypse. Any server, HBA, or Apocalypse backplane component can fail without a loss of functionality.
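
Because every drive in Apocalypse is visible to both heads, a quick sanity check after any cabling or HBA change is to confirm that both nodes see the same set of disks. A minimal sketch (the device naming is illustrative and the exact drive count depends on the enclosure):

  # Run on both Snares and Bottom and compare the output;
  # both heads should list the same WWNs for the shared drives.
  ls -l /dev/disk/by-id/ | grep wwn- | grep -v part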

In addition, Rockhopper is included in the cluster as a standby member to provide a third cluster member to break quorum ties and prevent split-brain situations. Because it does not have any connections to a storage array, it is neither capable of nor configured to run any SAN services.

Software

A number of pieces of software are used on top of the SAN hardware to provide data management and redundancy, high availability, and iSCSI and NFS access.

Gentoo Linux

Both Snares and Bottom run a standard CSL Gentoo Linux Server Image as their operating system and base software. Linux was chosen for its flexibility and the ready availability of all required software. In this instance, Gentoo was used for commonality with other CSL systems; however, most Linux distributions should be usable.

ZFS

ZFS, via the ZFS on Linux project, is used as the filesystem on the SAN. Currently there is a single zpool, called apocalypse, with 10 drives in a RAID-Z2 vdev and an eleventh drive as an online spare. This allows any two drives in the pool to fail without any loss of availability or data, and after a single drive failure the system will automatically begin rebuilding onto the spare disk.
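
For reference, a pool with this layout could be created along the following lines; the device names are placeholders rather than the actual members of apocalypse (in practice persistent /dev/disk/by-id names are preferable to sdX names):

  # Sketch only: 10-disk RAID-Z2 vdev plus one hot spare (device names are examples)
  zpool create apocalypse raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk spare sdl
  # Confirm the vdev layout, the spare, and the pool's health
  zpool status apocalypse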

Using ZFS as our base filesystem provides a number of benefits, including transparent data checksumming and compression. ZFS is also capable of transparent deduplication; however, we do not currently have this feature enabled because of the amount of memory it requires and because it is not well tested in ZFS on Linux.
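
Checksumming is always active in ZFS, while compression and deduplication are per-dataset properties. A hedged example of how these settings would typically be inspected and adjusted (lz4 assumes a ZFS on Linux release that supports it; older releases would use lzjb):

  # Inspect the current settings on the pool's root dataset
  zfs get compression,dedup,checksum apocalypse
  # Enable compression and make sure deduplication stays off
  zfs set compression=lz4 apocalypse
  zfs set dedup=off apocalypse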

Corosync/Pacemaker

Corosync and Pacemaker are used to provide high-availability failover of SAN services in the event of a hardware or software failure. All SAN resources should be managed through the cluster software rather than directly through OS configuration files.
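
In practice this means that SAN services are started and stopped through the cluster shell rather than through init scripts or fstab entries. A minimal sketch using crmsh (the resource name is hypothetical):

  # Let Pacemaker stop and start the resource on whichever node it belongs
  crm resource stop r_nfsserver
  crm resource start r_nfsserver
  # Starting the same service behind the cluster's back, e.g. with
  #   /etc/init.d/nfs start
  # would leave Pacemaker unaware of the service's real state.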

Corosync provides messaging and cluster engine services between cluster nodes. It handles the establishment of the cluster and the management of membership and quorum within the cluster.
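
Ring and membership status can be checked from any node; the exact tooling varies with the Corosync version in use, so treat these commands as a sketch:

  # Show the status of the Corosync rings (the network paths between nodes)
  corosync-cfgtool -s
  # Show membership and quorum information where the quorum service is in use
  corosync-quorumtool -s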

Pacemaker runs on top of Corosync and provides management of cluster resources using Resource Agents. Pacemaker handles actually starting and stopping cluster resources on each node, as well as failing resources over between nodes.
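
Day-to-day monitoring and manual failover go through Pacemaker's tools. A hedged example with crmsh (the resource and node names are placeholders):

  # One-shot view of nodes, resources, and where each resource is running
  crm_mon -1
  # Manually fail a resource over to the other head, then clear the constraint it leaves behind
  crm resource migrate r_iscsi_target snares
  crm resource unmigrate r_iscsi_target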

Resource Agents

Resource agents are small scripts used by Pacemaker that define how to start, stop, and monitor cluster resources. A large number are included by default with Pacemaker, and new ones can be written following the OCF specification that Pacemaker uses.
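
The custom ZPool agent follows the same pattern as any other OCF resource agent: a script that implements start, stop, monitor, and meta-data actions and reports its results through well-defined exit codes. The skeleton below is a generic illustration of that structure, not the actual tjhsst agent:

  #!/bin/sh
  # Illustrative OCF resource agent skeleton, not the actual ocf:tjhsst:ZPool agent.
  # The helper library path may vary with the installed resource-agents version.
  : ${OCF_ROOT:=/usr/lib/ocf}
  . ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs

  agent_start()   { true; }  # e.g. import the pool or start the daemon
  agent_stop()    { true; }  # e.g. export the pool or stop the daemon
  agent_monitor() { true; }  # exit 0 if running, non-zero if not

  case "$1" in
      meta-data)  # a real agent prints full OCF meta-data XML describing its parameters
          printf '<?xml version="1.0"?>\n<resource-agent name="Example" version="0.1"/>\n'
          exit $OCF_SUCCESS ;;
      start)      agent_start   && exit $OCF_SUCCESS || exit $OCF_ERR_GENERIC ;;
      stop)       agent_stop    && exit $OCF_SUCCESS || exit $OCF_ERR_GENERIC ;;
      monitor)    agent_monitor && exit $OCF_SUCCESS || exit $OCF_NOT_RUNNING ;;
      *)          exit $OCF_ERR_UNIMPLEMENTED ;;
  esac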

Currently we use the following resource agents in the cluster:

  • ocf::tjhsst:ZPool (custom written)
  • ocf::heartbeat:IPaddr2
  • ocf::heartbeat:IPv6addr
  • ocf::heartbeat:iSCSITarget
  • ocf::heartbeat:iSCSILogicalUnit
  • ocf::heartbeat:nfsserver
  • stonith:external/riloe
  • stonith:meatware
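
As an illustration of how an agent from this list becomes a cluster resource, the following crmsh sketch defines a floating service address; the resource ID, address, and netmask are placeholders and do not reflect the live configuration:

  # Floating service address managed by the cluster (values are examples)
  crm configure primitive r_san_ip ocf:heartbeat:IPaddr2 \
      params ip=198.51.100.10 cidr_netmask=24 \
      op monitor interval=30s
  # Review the full configuration, including the other primitives listed above
  crm configure show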

STONITH

STONITH (Shoot The Other Node In The Head) provides a means for the cluster to ensure the complete removal of a malfunctioning node from the cluster prior to taking over its resources. This is particularly important when managing non-clustered filesystems which will suffer corruption if they are simultaneously activated on multiple cluster nodes.

STONITH provides resource agents that can kill our cluster nodes using the built-in iLO remote management, as well as one that asks the systems administrator to manually kill a node if iLO fails.

On Gentoo, STONITH is a part of the cluster-glue package.
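
cluster-glue also installs a stonith command line tool that can be used to explore the available plugins and the parameters each one expects before wiring them into Pacemaker:

  # List the STONITH plugins installed by cluster-glue
  stonith -L
  # Show the parameters a particular plugin requires
  stonith -t meatware -n
  stonith -t external/riloe -n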

LIO

We use the LIO unified target integrated into the Linux kernel to supply iSCSI targets for VM storage. The LIO LUNs are backed by ZFS block devices (ZVOLs). LIO was chosen over other Linux iSCSI targets due to its integration into the Linux kernel and its active development.
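
Each LIO LUN is backed by a ZVOL, which is simply a block-device dataset inside the pool. A hedged sketch of creating one (the name and size are illustrative):

  # Create a 20 GiB block device under the apocalypse pool for a VM disk
  zfs create -V 20G apocalypse/vm-example-disk0
  # The resulting device appears under /dev/zvol/ and is what LIO exports as a LUN
  ls -l /dev/zvol/apocalypse/vm-example-disk0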

Because all LIO targets and LUNs are managed through Pacemaker, it is not necessary to install the management utility (targetcli) on the cluster nodes.
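
Instead of targetcli, targets and LUNs are defined as cluster resources. A hedged crmsh sketch with an example IQN, LUN number, and ZVOL path (none of which reflect the live configuration):

  # iSCSI target served by the LIO implementation (the IQN is an example)
  crm configure primitive r_tgt_example ocf:heartbeat:iSCSITarget \
      params iqn=iqn.2013-07.edu.tjhsst.example:vmstore implementation=lio \
      op monitor interval=30s
  # LUN 0 of that target, backed by a ZVOL such as the one created above
  crm configure primitive r_lun_example ocf:heartbeat:iSCSILogicalUnit \
      params target_iqn=iqn.2013-07.edu.tjhsst.example:vmstore lun=0 \
             path=/dev/zvol/apocalypse/vm-example-disk0 \
      op monitor interval=30s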

NFS

We use the NFSv4 server integrated into the Linux kernel to provide NFS exports for VM support and mail storage.
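
The exports themselves are ordinary kernel NFS exports, while the NFS server is started and stopped by the cluster through the nfsserver resource agent. A hedged example of what an exports entry might look like (the path, network, and options are placeholders):

  # /etc/exports (illustrative entry, not the live configuration)
  /export/mail  198.51.100.0/24(rw,sync,no_subtree_check,fsid=1)
  # Re-export after editing and show what is currently exported
  exportfs -ra
  exportfs -v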