Guardian Backup System
The backup system consists of three main parts: a script and excludes file located on the host to be backed up, an intermediary zone, and a backup host that interfaces with the backup storage and also runs the zone.
A script and an excludes file are used on the host to be backed up. The script backs up the entire system (minus anything listed in the excludes file) to an intermediate zone (currently Guardian) using rsync and the host's keytab. The script also takes an argument to dump databases from a local MySQL or LDAP server before rsyncing, so that the copied data is consistent. In this case, a databases file is also needed; it lists the databases to be backed up along with their engine type (MyISAM and InnoDB are currently supported). The backup script is executed by an entry in the root crontab. Hosts are currently scheduled at 5-minute intervals, with the slots on the hour and half-hour left empty to give any long-running backups extra time to finish.
A variant of this script is used to back up shared storage directories independently of their hosts (for example, nfs-mail).
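The host-side steps described above can be sketched as a list of commands built in order: get a ticket from the host keytab, optionally dump databases, then rsync. This is only an illustration, not the actual script. The excludes path, destination host name, destination directory, and the choice of mysqldump flags per engine are all assumptions; the engine-specific flags show one plausible reason the databases file records the engine type.

```python
def dump_command(database, engine):
    """Build a mysqldump command for one database (assumed flag choice per engine)."""
    if engine == "InnoDB":
        # InnoDB can be dumped from a consistent snapshot without locking tables.
        return ["mysqldump", "--single-transaction", database]
    if engine == "MyISAM":
        # MyISAM has no transactions, so lock the tables while dumping.
        return ["mysqldump", "--lock-tables", database]
    raise ValueError(f"unsupported engine: {engine}")


def build_backup_commands(fqdn, databases=(),
                          excludes_file="/root/scripts/backup-excludes",  # hypothetical path
                          dest_host="guardian",                           # hypothetical name
                          dest_root="/backups"):                          # hypothetical path
    """Return the commands a backup run might execute, in order."""
    # Obtain a Kerberos ticket from the host keytab so rsync needs no password.
    commands = [["kinit", "-k", f"host/{fqdn}"]]
    # Dump databases first so the files rsync copies are consistent.
    commands += [dump_command(name, engine) for name, engine in databases]
    # Copy the whole filesystem, skipping patterns listed in the excludes file.
    commands.append([
        "rsync", "-a",
        f"--exclude-from={excludes_file}",
        "-e", "ssh -o GSSAPIAuthentication=yes",
        "/", f"root@{dest_host}:{dest_root}/{fqdn}/",
    ])
    return commands
```

Returning the commands instead of running them makes it easy to review what a run would do before wiring it to `subprocess.run`.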
Guardian is the current intermediate container that all of the systems back up to, and from which the backup host copies the backups to more permanent storage. Every host that is to be backed up needs its host principal in ~root/.k5login on the zone so that it can rsync without a password using GSSAPI. The reasons for using an intermediate zone are 1) Speed: the zone's storage is tuned primarily for speed so that backups have minimal impact on the host, while the permanent storage is tuned for maximum space savings, and 2) Security: if a host is compromised, the attacker gains only root on Guardian, and previously made backups are much harder to compromise.
Guardian is stored on the same ZPool as the long-term backup storage.
Snares and Bottom are currently used as the backup hosts for long-term storage, with iSCSI drives exported from Alexandria. The ZFS filesystem here (via the zfs-linux project from Lawrence Livermore National Laboratory) is organized by system. Every morning at 0430, the active server rsyncs a copy of each host's latest backup from Guardian to permanent storage and then snapshots the backup. The script also checks that each host actually ran its backup, using a per-host check-in file that contains the UNIX timestamp of when the host finished running its backup script. When it is done, it sends a summary email listing which hosts did and did not back up successfully, along with how long the migration took.
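As a rough illustration of the summary email described above, the sketch below builds a message from the lists of hosts that did and did not back up and the elapsed migration time. The addresses, subject wording, and body layout are assumptions made for the example, not the format the real script uses.

```python
from datetime import timedelta
from email.message import EmailMessage


def build_summary(succeeded, failed, elapsed_seconds,
                  sender="backup@example.org",      # hypothetical address
                  recipient="admins@example.org"):  # hypothetical address
    """Build a summary email for one migration run."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Backup migration: {len(succeeded)} ok, {len(failed)} failed"
    lines = [
        # timedelta renders seconds as H:MM:SS.
        f"Migration took {timedelta(seconds=elapsed_seconds)}",
        "",
        "Backed up: " + (", ".join(sorted(succeeded)) or "none"),
        "Did NOT back up: " + (", ".join(sorted(failed)) or "none"),
    ]
    msg.set_content("\n".join(lines))
    return msg
```

The resulting message could then be handed to `smtplib.SMTP.send_message` on whichever mail relay the site uses.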
Initially, Agni was used as the backup host; however, it lacked sufficient memory to run many backups at speed, and backup services were transferred to Fiordland as the number of backups grew. Backups were later migrated to Rockhopper when Fiordland developed temporary stability issues. As the number of backups grew, Rockhopper also had insufficient memory to handle daily backup duties, and they were transferred to the SAN servers (although they are currently not managed by the cluster).
Adding a host
To set up a host for backup, copy the backup script and backup excludes file to the host and edit them appropriately. For a host with a MySQL server, you will also need a backup-databases file as well as a .my.cnf file for root with appropriate credentials for each database to be backed up. For a host with an LDAP server, you will need a backup-ldapdatabases file.
On Guardian, add the host's principal to /root/.k5login.
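Since .k5login holds one principal per line, this step can be made idempotent so that re-running it never adds a duplicate. The helper below is a minimal sketch; the function name and the example principal are hypothetical.

```python
from pathlib import Path


def add_principal(k5login_path, principal):
    """Append a principal to a .k5login file unless it is already listed.

    Returns True if the file was changed, False if the principal was present.
    """
    path = Path(k5login_path)
    lines = path.read_text().splitlines() if path.exists() else []
    if principal in (line.strip() for line in lines):
        return False
    lines.append(principal)
    # .k5login is a plain list with one principal per line.
    path.write_text("\n".join(lines) + "\n")
    return True
```

For example, `add_principal("/root/.k5login", "host/newhost.example.org@EXAMPLE.ORG")` would add the entry once and do nothing on later runs.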
On Snares or Bottom (whichever has the zpool imported), create a zfs filesystem for the host and add its FQDN to /root/scripts/backup-hosts so that the migration script knows about it. Then copy the backup-hosts file to the inactive server.
Now run the backup script to make sure everything works properly. You can check your excludes file by tailing the script's log file (default is /root/scripts/backup.log). After everything is working, add the backup script to the host crontab in the next available timeslot.
Periodically, both host and AFS backups are migrated manually from long-term storage to LTO-5 tapes stored in Scribe. These tapes are then taken offsite or stored in the school's vault. Because the LTFS system used on the tapes does not support directories or permissions, each host backup is tarred prior to being copied to tape. Over the summer, archival backups are also made of legacy student and host data that no longer needs to be backed up on a daily basis. These archives are also stored both offsite and in the vault.
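The tarring step can be sketched with Python's standard tarfile module: each host's backup directory becomes a single archive file, which keeps the directory tree and permissions inside the archive even though the tape filesystem stores only a flat file. The directory layout and the function name are assumptions for illustration.

```python
import tarfile
from pathlib import Path


def tar_host_backup(host_dir, out_dir):
    """Pack one host's backup directory into <out_dir>/<hostname>.tar."""
    host_dir = Path(host_dir)
    archive = Path(out_dir) / f"{host_dir.name}.tar"
    with tarfile.open(archive, "w") as tar:
        # arcname keeps paths inside the archive relative to the host name.
        tar.add(host_dir, arcname=host_dir.name)
    return archive
```

The resulting .tar file is what would be copied to the tape mount point.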
Backups must check in within the 12 hours before the migration script runs, or it will flag them as having failed. Since the migration script runs at 4:30AM, a check-in must be no older than 4:30PM the previous day; in practice, backups should be scheduled sometime between 6PM and 3:30AM (currently our host backups start just after midnight and finish by about 0300).
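The 12-hour rule amounts to comparing the timestamp in a host's check-in file with the time the migration runs. The sketch below assumes the file holds a single decimal UNIX timestamp; the function names and the example dates are illustrative only.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 12 * 3600  # check-ins older than this count as failed


def parse_checkin(text):
    """Parse the contents of a check-in file (assumed: one integer timestamp)."""
    return int(text.strip())


def checked_in_on_time(checkin_ts, migration_ts, window=WINDOW_SECONDS):
    """True if the check-in happened within `window` seconds before the migration."""
    age = migration_ts - checkin_ts
    return 0 <= age <= window


# Example: a migration at 04:30 on an arbitrary day.
migration = datetime(2024, 1, 2, 4, 30, tzinfo=timezone.utc).timestamp()
after_midnight = datetime(2024, 1, 2, 1, 20, tzinfo=timezone.utc).timestamp()
previous_afternoon = datetime(2024, 1, 1, 16, 0, tzinfo=timezone.utc).timestamp()

print(checked_in_on_time(after_midnight, migration))      # True: 3h10m old
print(checked_in_on_time(previous_afternoon, migration))  # False: 12h30m old
```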
Current Backup Schedule
The following is the current schedule of host backups. These are set up primarily in the order in which hosts were added to the system. All hosts run the standard backup script unless noted.
- 2200: nfs-mail (Directory Backup from Casey)
- 0005: ion
- 0010: bugs
- 0015: license
- 0020: monitor
- 0025: ns1
- 0030: INTENTIONALLY LEFT EMPTY
- 0035: Bottom
- 0040: ns2
- 0045: Galapagos
- 0050: smith
- 0055: Rockhopper
- 0100: INTENTIONALLY LEFT EMPTY
- 0105: openvpn
- 0110: Snares
- 0115: casey
- 0120: Antipodes
- 0125: iodine (runs MySQL)
- 0130: INTENTIONALLY LEFT EMPTY
- 0135: Weather
- 0140: www
- 0145: openafs4 (runs AFS)
- 0150: EMPTY
- 0155: openldap1 (runs LDAP)
- 0200: INTENTIONALLY LEFT EMPTY
- 0205: openldap2
- 0210: lists
- 0215: mysql1 (runs MySQL)
- 0220: openafs1 (runs AFS)
- 0225: EMPTY
- 0230: INTENTIONALLY LEFT EMPTY
- 0235: cups2
- 0240: haimageserver
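Finding the next available timeslot for a new host can be sketched as follows: walk the 5-minute slots after midnight, skip the hour and half-hour, and return the first slot not already taken. The set of taken slots is copied from the schedule above; the function names, the search range, and the script path in the crontab example are assumptions.

```python
# Slots assigned in the schedule above (nfs-mail at 2200 is outside this range).
TAKEN = {
    "0005", "0010", "0015", "0020", "0025",
    "0035", "0040", "0045", "0050", "0055",
    "0105", "0110", "0115", "0120", "0125",
    "0135", "0140", "0145", "0155",
    "0205", "0210", "0215", "0220",
    "0235", "0240",
}


def candidate_slots(start_minute=5, end_minute=180, step=5):
    """Yield HHMM slots every `step` minutes, skipping the hour and half-hour."""
    for minute in range(start_minute, end_minute, step):
        if minute % 30 == 0:
            continue  # intentionally left empty for long-running backups
        yield f"{minute // 60:02d}{minute % 60:02d}"


def next_free_slot(taken):
    """Return the first candidate slot not in `taken`, or None if all are used."""
    for slot in candidate_slots():
        if slot not in taken:
            return slot
    return None


def crontab_entry(slot, command):
    """Format a daily crontab line (minute hour dom month dow command)."""
    return f"{int(slot[2:])} {int(slot[:2])} * * * {command}"


slot = next_free_slot(TAKEN)
print(slot)                                           # 0150
print(crontab_entry(slot, "/path/to/backup-script"))  # 50 1 * * * /path/to/backup-script
```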