Guardian Backup System

The Guardian Backup System (named after the zone/zpool, for lack of a better name) is a set of scripts currently used to back up systems to Rockhopper.

Layout

The backup system consists of three main parts: a script and excludes file located on the host to be backed up, an intermediary zone, and a backup host that interfaces with the backup storage and also runs the zone.

Backup Script

A script and excludes file are used on the host to be backed up. The script backs up the entire system (minus anything in the excludes file) to an intermediate zone (currently Guardian) using rsync and the host's keytab. The script also takes an argument to dump databases from a local MySQL or LDAP server before rsyncing, in order to preserve consistency. In this case, a databases file is also needed, which lists the databases to be backed up as well as their engine type (currently MyISAM and InnoDB are supported). The backup script is executed by an entry in the root crontab. Hosts are currently scheduled at five-minute intervals, with the slots on the hour and half-hour left empty to give any long-running backups some extra time to finish.
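
A minimal sketch of what the host-side script might look like is below. The exact paths, the destination hostname, the dump directory, and the database-dump handling are assumptions based on the description above, not the actual script:

  #!/bin/bash
  # Sketch of a per-host backup script (paths and hostnames are assumptions).
  HOST=$(hostname -f)
  DEST=guardian.tjhsst.edu
  EXCLUDES=/root/scripts/backup-excludes
  LOG=/root/scripts/backup.log

  # Get Kerberos credentials from the host keytab so rsync over ssh can
  # authenticate with GSSAPI, without a password.
  kinit -k "host/${HOST}"

  # Optionally dump local databases first so the on-disk copy is consistent.
  if [ "$1" = "mysql" ]; then
      while read db engine; do
          if [ "$engine" = "InnoDB" ]; then
              mysqldump --single-transaction "$db" > "/root/backup-dumps/${db}.sql"
          else
              mysqldump --lock-tables "$db" > "/root/backup-dumps/${db}.sql"
          fi
      done < /root/scripts/backup-databases
  fi

  # Push the whole filesystem, minus the excludes, to the intermediate zone.
  rsync -aAX --delete --exclude-from="$EXCLUDES" / "root@${DEST}:/backup/${HOST}/" >> "$LOG" 2>&1

  # Record the completion time so the migration script can verify the check-in.
  date +%s | ssh "root@${DEST}" "cat > /backup/${HOST}.checkin"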

A variant of this script is used to back up shared storage directories independently of their hosts (for example, nfs-mail).

Container

Guardian is the current intermediate container that all of the systems back up to, and from which the backup host copies the backups to more permanent storage. Every host to be backed up needs its host principal in ~root/.k5login on the zone so that it can rsync without a password using GSSAPI. There are two reasons for using an intermediate zone: 1) Speed: the zone's storage is tuned primarily for speed so that backups have minimal impact on the host, while the permanent storage is tuned for maximum space savings. 2) Security: if a host is compromised, the attacker gains only root on Guardian, and previously made backups are much harder to compromise.

Guardian is stored on the same ZPool as the long-term backup storage.

Backup Host

Rockhopper, with iSCSI drives exported from Alexandria, is currently being used as the backup host for long-term storage. The ZFS filesystem here (via the ZFS on Linux project from Lawrence Livermore National Laboratory) is organized by system. Every morning at 0600, Rockhopper rsyncs a copy of each host's latest backup from Guardian to permanent storage and then snapshots the backup. The script also checks that each host actually ran its backup, using a per-host checkin file containing the UNIX timestamp of when the host finished running its backup script. When it is done, it sends a summary email with information about which hosts did and did not back up successfully, along with how long the migration took.
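
A rough sketch of the nightly migration loop follows. The dataset layout, the mount point of the zone's storage, the checkin-file format, and the paths are all assumptions based on the description above:

  #!/bin/bash
  # Sketch of the nightly migration on the backup host (names are assumptions;
  # the zone's storage is assumed to be mounted locally under /guardian).
  NOW=$(date +%s)
  FAILED=""

  while read host; do
      # Copy the latest backup from the intermediate zone into permanent storage.
      rsync -aAX --delete "/guardian/backup/${host}/" "/backup/${host}/"

      # Snapshot the host's ZFS filesystem so older backups remain recoverable.
      zfs snapshot "backup/${host}@$(date +%Y-%m-%d)"

      # Verify the host actually checked in within the last 12 hours (43200 s).
      checkin=$(cat "/guardian/backup/${host}.checkin" 2>/dev/null || echo 0)
      if [ $((NOW - checkin)) -gt 43200 ]; then
          FAILED="${FAILED} ${host}"
      fi
  done < /root/scripts/backup-hosts

  # Send the summary email described above.
  echo "Hosts that failed to check in:${FAILED}" | mail -s "Backup summary" root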

Initially, Agni was used as the backup host; however, it lacked sufficient memory to run many backups at speed, and backup services were transferred to Fiordland as the number of backups grew. Backups were later migrated to Rockhopper when Fiordland developed temporary stability issues.

Adding a host

To set up a host to back up, copy the backup script and backup excludes file to the host and edit them appropriately. For a host with a MySQL server, you will also need a backup-databases file, as well as a .my.cnf file for root with appropriate credentials for each database to be backed up. For a host with an LDAP server, you will need a backup-ldapdatabases file.
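
For a host running MySQL, the two extra files might look roughly like this (the file locations, the two-column format of the databases file, and the credentials are placeholders/assumptions):

  /root/scripts/backup-databases:
      iodine     InnoDB
      wordpress  MyISAM

  /root/.my.cnf:
      [client]
      user     = backup
      password = CHANGEME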

On Guardian, add the host's principal to /root/.k5login.
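
For example, to allow a host named ion to connect (the FQDN and Kerberos realm shown are assumptions):

  echo 'host/ion.tjhsst.edu@CSL.TJHSST.EDU' >> /root/.k5login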

On Rockhopper, create a zfs filesystem for the host and add its FQDN to /root/scripts/backup-hosts so that the migration script knows about it.
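
Something along these lines should work, assuming the long-term pool is named backup and datasets are named by FQDN (both assumptions):

  zfs create backup/ion.tjhsst.edu
  echo 'ion.tjhsst.edu' >> /root/scripts/backup-hosts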

Now run the backup script to make sure everything works properly. You can check your excludes file by tailing the script's log file (the default is /root/scripts/backup.log). Once everything is working, add the backup script to the host's crontab in the next available timeslot.
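
The crontab entry might look like the following, assuming the host takes the 0150 slot (currently listed as empty in the schedule below) and the script lives at /root/scripts/backup.sh (the path is an assumption):

  # m   h   dom mon dow   command
  50    1   *   *   *     /root/scripts/backup.sh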

Offsite Backups

Periodically, both host and AFS backups are migrated manually from Rockhopper's long-term storage to a pair of 2 TB Western Digital USB 3.0 drives. One of these drives is stored offsite while the other is connected to Antipodes in the machine room. The two drives are rotated at least weekly to ensure that the offsite backups stay reasonably up to date.

Notes/Bugs

Backups must check in within 12 hours of the time the migration script runs, or the migration script will flag them as having failed. Since the migration script runs at 6 AM, this means backups should be scheduled sometime between 6 PM and 5:30 AM (currently our host backups start just after midnight and finish by about 2:30 AM).

Current Backup Schedule

The following is the current schedule of host backups. These are set up primarily in the order in which hosts were added to the system. All hosts run the standard backup script unless noted.

  • 2200: nfs-mail (Directory Backup from Casey)
  • 0005: ion
  • 0010: bugs
  • 0015: license
  • 0020: monitor
  • 0025: ns1
  • 0030: INTENTIONALLY LEFT EMPTY
  • 0035: Bottom
  • 0040: ns2
  • 0045: Galapagos
  • 0050: smith
  • 0055: Rockhopper
  • 0100: INTENTIONALLY LEFT EMPTY
  • 0105: openvpn
  • 0110: Snares
  • 0115: casey
  • 0120: Antipodes
  • 0125: iodine (runs MySQL)
  • 0130: INTENTIONALLY LEFT EMPTY
  • 0135: Weather
  • 0140: www
  • 0145: openafs4 (runs AFS)
  • 0150: EMPTY
  • 0155: openldap1 (runs LDAP)
  • 0200: INTENTIONALLY LEFT EMPTY
  • 0205: openldap2
  • 0210: lists
  • 0215: mysql1 (runs MySQL)
  • 0220: openafs1 (runs AFS)
  • 0225: EMPTY
  • 0230: INTENTIONALLY LEFT EMPTY
  • 0235: cups2
  • 0240: haimageserver