Warning: Livedoc is no longer being updated and will be deprecated shortly. Please refer to https://documentation.tjhsst.edu.

HA-iSCSI

From Livedoc - The Documentation Repository
'''HA-iSCSI''' is a high-availability storage system which serves most of the Xen Virtual Machines. [[Royal]] and [[Magellanic]] are the lab's storage servers.
  
 
==Hardware==
[[Royal]] and [[Magellanic]] each have a QLogic ISP2312-based 2Gb Fibre Channel HBA in them. These HBAs are connected via fiber-optic links to a pair of Sun StorEdge 3510 FC JBOD arrays. The connections are daisy-chained so that, in the event of a storage array failure, one of the servers will still be able to reach the working array.
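To confirm that the HBA link and both arrays are visible from a storage server, the stock Solaris FC tools that ship with NexentaOS can be used (a quick sketch; output and device names will vary):
*fcinfo hba-port #link state and speed of the QLogic HBA
*echo | format #lists every FC disk the server can currently see
*cfgadm -al #attachment-point view of the FC controllers and their LUNs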
  
Each array has 12x72GB FC SCSI Drives. A single RAID-Z2 zpool was created on each array; these are named Fryingpan (array ID 0) and Dutchoven (array ID 1). Individual ZVOLs are then created in these zpools for exporting as VM partitions. In addition, space is reserved on Fryingpan for the vminfo NFS share that is used to store the VM configurations and kernels. As of November 2011, Fryingpan is used for storing production VMs while Dutchoven is used for testing purposes and as a warm backup should Fryingpan catastrophically fail.
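A few ZFS commands are enough to inspect this layout (a sketch; the lower-case pool names and the fryingpan/vminfo dataset name are assumptions based on the description above):
*zpool status fryingpan #health of the production pool (likewise for dutchoven)
*zfs list -t volume -r fryingpan #the ZVOLs that back individual VM disks
*zfs get sharenfs fryingpan/vminfo #the NFS share holding VM configurations and kernels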
  
 
==Software==
Currently, the "iSCSI Enterprise Target" is used (http://iscsitarget.sourceforge.net/).  This is the iscsitarget package in portage and is also known as iet.  IETD provides two IO methods for targets, BlockIO and FileIO.  FileIO uses the Linux page cache (filesystem cache) allowing reads to be cached, but everything must be copied twice.  BlockIO does not use the page cache and thus does not have to copy data twice, resulting in faster write speeds, but slightly reduced read speeds.  Since VMs can cache reads, but not writes, it was decided to go for faster write speeds and use BlockIO.
+
iSCSI management is done using the COMSTAR iSCSI target built into NexentaOS.
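The resulting configuration can be inspected with the standard COMSTAR tools (a sketch; the GUID is a placeholder for whatever sbdadm reports for a given ZVOL):
*stmfadm list-lu -v #the logical units backed by the ZVOLs
*stmfadm list-view -l <LU GUID> #which initiators are allowed to see a given LU
*itadm list-target -v #the iSCSI target(s) and their state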
  
High availability management is done using the SimpleHA package of scripts from the Rochester Center for Brain Imaging; the scripts live in /opt/SimpleHA/ on both Royal and Magellanic. SimpleHA uses a primary heartbeat over TCP/IP on the SAN VLAN and a secondary heartbeat over a null-modem serial cable connecting the two servers, with a tie-breaking quorum check made by pinging the 4506. If the slave node decides it needs to take over operations, it first attempts to kill the active primary node with an iLO power reset and only brings up services if it can confirm the reset succeeded.
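The decision logic described above amounts to roughly the following (an illustrative sketch only, not the actual SimpleHA scripts; the hostname and quorum address are placeholders):

 #!/bin/sh
 # Sketch of the slave node's takeover check -- NOT the real SimpleHA code.
 PEER=royal             # the other storage server (placeholder)
 QUORUM=192.168.1.1     # the 4506's address (placeholder)
 if ping -c 3 "$PEER" >/dev/null 2>&1; then
     exit 0             # primary heartbeat answers; nothing to do
 fi
 # (SimpleHA would also consult the serial-line heartbeat here.)
 if ! ping -c 3 "$QUORUM" >/dev/null 2>&1; then
     exit 0             # no quorum either; assume we are the isolated node
 fi
 # Peer is unreachable but quorum holds: fence it via its iLO, and only
 # start services once the power reset is confirmed successful.
 echo "would reset $PEER via iLO and then start services"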
 
 
In addition to HA-iSCSI, the storage servers provide imageserver (not yet HA) and HA-NFS, which is used for /usr/portage on the VMs and for VM configurations and locks. To keep locks intact across a failover, they are stored on the shared array instead of on local storage.
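On a VM host this looks like an ordinary NFS mount of the service address (a sketch; the server name, export paths, and mount points are placeholders):
*mount -t nfs haiscsi:/vminfo /var/lib/xen/vminfo #VM configurations and locks
*mount -t nfs haiscsi:/portage /usr/portage #shared portage tree for the VMs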
 
  
 
==Administration==
 
===Bringing up a storage server===
These steps apply to the previous Linux-based configuration (IETD plus Pacemaker/Corosync) and are kept for reference. After booting a storage server (royal or fiord), perform them to allow it to run services again (a short verification sketch follows the list):
 
*modprobe qla2xxx #loads the HBA modules; takes a while to load and detect all disk drives
*multipath #creates multipath devices for each drive
*/etc/init.d/multipathd start #starts the multipath path-checker daemon (may not be needed)
*/etc/init.d/ietd start #starts the iSCSI target daemon
*/etc/init.d/corosync start #starts Corosync, which automatically starts Pacemaker
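A minimal sanity check after the services are up, assuming the same Linux stack:
*cat /proc/mdstat #the software RAID arrays should be assembled and clean
*multipath -ll #every drive should show both of its paths
*crm_mon -1 #one-shot view of Pacemaker resource status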
 
  
 
===Monitoring Cluster Status===
Under the previous Pacemaker setup, running crm_mon on a cluster node gave a real-time view of resource status.

On the current system, cat /opt/SimpleHA/status.txt on either node; the possible states are MASTER, SLAVE, and INIT (the node is in the process of becoming MASTER).

===Adding a new VM on HA-iSCSI===
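The storage side of this procedure boils down to carving a ZVOL and exporting it through COMSTAR (a rough sketch based on the Hardware and Software sections above; the VM name, size, and pool path are placeholders):
*zfs create -V 20G fryingpan/newvm #a ZVOL for the new VM's disk
*sbdadm create-lu /dev/zvol/rdsk/fryingpan/newvm #register it as a logical unit and note the GUID that is printed
*stmfadm add-view <GUID> #export the LU to initiators (host/target group restrictions omitted)
The VM's configuration file and kernel then go on the vminfo share.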

Revision as of 01:12, 20 November 2011
