Todo - Summer 2013


The following list of items should ideally be accomplished over the summer. If you are interested in working on a task below, please add your name next to the system name (see iodine-ldap for an example). Items in red need a downtime notification provided to users and the Sysadmins mailing list at least 24 hours in advance of maintenance. Items in orange are on hold until the building is accessible. Maintenance times should be posted here as soon as they are available for general awareness.

General Notes

Downtime notifications for items highlighted in red should be posted via Iodine at least 24 hours in advance (but ideally as soon as possible). They should include both a start time and an estimated end time. Be generous when estimating end times :).

In general, a good pattern to follow when updating systems is: reboot, update, reboot again, verify. This way you are sure that the system is in working order before beginning work, and the verify step ensures you leave it in working order when you are done :). When running updates, check the package list for unexpected downgrades before starting the emerge; these can indicate packages that need a later version unmasked or keyworded. Always run updates from within a screen session on the host system in case of an unexpected disconnection.
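
A rough sketch of that pattern on a Gentoo host (the exact emerge options and service checks will vary by server, so treat this as an outline rather than a recipe):

  # first reboot: confirm the host is healthy before changing anything
  reboot

  # once it is back up, do the update inside screen in case the SSH session drops
  screen -S updates
  emerge --sync
  emerge -avuDN @world   # inspect the pending list first; unexpected downgrades usually
                         # mean a newer version still needs to be unmasked or keyworded

  # second reboot to pick up the new kernel and libraries, then verify
  reboot
  rc-status              # confirm all expected services came back up before moving on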

You should also make sure you have a current backup (if the system is running Guardian, check /root/scripts/backup.log to verify that the last backup is recent and completed successfully).
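
A quick pre-maintenance check on a Guardian-backed host might look like the following; the exact wording backup.log uses for a successful run is an assumption, so adjust the grep accordingly:

  tail -n 20 /root/scripts/backup.log                     # eyeball the most recent run
  grep -i success /root/scripts/backup.log | tail -n 1    # assumed log format; confirm the date and status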

For paired/redundant systems (e.g. casey/smith or ns1/ns2), there should be at least 24 hours between the maintenance windows for the two servers to allow time for any subtle problems to surface.

Please do not claim tasks unless you intend to start working on them shortly. Claiming a bunch of jobs right off the bat leaves other people looking for things to do.

Updates

VM Servers

Antipodes

  • update system software Samuel Damashek 23:23, 29 July 2013 (EDT)
  • update system kernel to git version 3.4.0-kvm Samuel Damashek 23:23, 29 July 2013 (EDT)

Bottom

  • update system software Samuel Damashek 17:31, 21 July 2013 (EDT)
  • update cluster software Samuel Damashek 17:31, 21 July 2013 (EDT)
    • Pacemaker to 1.1.8-r2
    • Corosync to 2.3.0-r1
    • crmsh to 1.2.5-r3
  • test cluster operation without DNS (with Rockhopper/Snares) Cluster operation without DNS works perfectly: DNS was blocked on Snares and Bottom, and Snares was rebooted. All VMs running on the Apocalypse cluster were unaffected, and cluster services were transitioned to Bottom by Pacemaker successfully. (There was a STONITH loop due to the shutdown, but in a real situation this would not be an issue.) Samuel Damashek 20:10, 21 July 2013 (EDT)

Galapagos

  • update system software ahamilto 15:00 30 July 2013 (EDT)
  • update system kernel to git version 3.4.0-kvm

Littleblue (2016fwilson)

  • finish OS installation Fox Wilson 23:24, 29 July 2013 (EDT)
  • update system software Fox Wilson 23:24, 29 July 2013 (EDT)
  • update system kernel to git version 3.4.0-kvm Fox Wilson 23:24, 29 July 2013 (EDT)

Rockhopper

  • update system software 2017sdamashe (talk) 23:25, 20 July 2013 (EDT)
  • update cluster software 2017sdamashe (talk) 23:25, 20 July 2013 (EDT)
    • Pacemaker to 1.1.8-r2
    • Corosync to 2.3.0-r1
    • crmsh to 1.2.5-r3
  • test cluster operation without DNS (with Bottom/Snares) See above Samuel Damashek 20:10, 21 July 2013 (EDT)

Snares

  • update system software Samuel Damashek 19:05, 21 July 2013 (EDT)
  • update cluster software Samuel Damashek 19:05, 21 July 2013 (EDT)
    • Pacemaker to 1.1.8-r2
    • Corosync to 2.3.0-r1
    • crmsh to 1.2.5-r3
  • test cluster operation without DNS (With Bottom/Rockhopper) See above Samuel Damashek 20:10, 21 July 2013 (EDT)

Vega

  • update system software
  • update system kernel to git version 3.4.0-kvm

Waitaha

  • update system software 2017sdamashe (talk) 02:05, 21 July 2013 (EDT)
  • update system kernel to git version 3.4.0-kvm Already done
  • clean up Hackathon files 2015msmith Done
  • configure backups 2017sdamashe (talk) 02:05, 21 July 2013 (EDT)
  • configure portage check 2017sdamashe (talk) 02:05, 21 July 2013 (EDT)

VMs

bugs

casey (Srijan Karan)

Remove from mail.tjhsst.edu round-robin prior to maintenance

  • update system software
  • enable virtio memory ballooning (see the sketch after this list)
  • reconfigure memory limits
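
A sketch of what enabling ballooning could look like, assuming these guests are defined through libvirt (casey is used only as the example here, and the 2 GiB figure is a placeholder):

  virsh dumpxml casey | grep memballoon    # look for <memballoon model='virtio'/>
  virsh edit casey                         # if it is missing, add it under <devices>:
                                           #   <memballoon model='virtio'/>
  virsh setmem casey 2097152 --live        # 2 GiB (size is in KiB); only works once ballooning is enabled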

cups2

haimageserver (2014jforcier)

  • update system software 2014jforcier (talk) 22:49, 29 August 2013 (EDT)
  • enable virtio memory ballooning
  • reconfigure memory limits

iodine (Andrew Hamilton)

iodine-ldap (Andrew Hamilton)

license

lists

ltsp2

mysql1

ns1 (2014jforcier)

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits
  • develop a way to monitor failing zone transfers (see the sketch after this list)
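
One possible approach is to compare SOA serials between the master and the slave; a serial that stays mismatched means transfers are failing. The zone name below and the Nagios-style exit codes are placeholders:

  #!/bin/bash
  # compare SOA serials on ns1 and ns2; zone and server names are placeholders
  ZONE="tjhsst.edu"
  MASTER="ns1.tjhsst.edu"
  SLAVE="ns2.tjhsst.edu"
  m=$(dig +short SOA "$ZONE" @"$MASTER" | awk '{print $3}')
  s=$(dig +short SOA "$ZONE" @"$SLAVE" | awk '{print $3}')
  if [ -z "$m" ] || [ "$m" != "$s" ]; then
      echo "CRITICAL: $ZONE serial mismatch (master=$m, slave=$s)"
      exit 2
  fi
  echo "OK: $ZONE serial $m on both servers"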

ns2 (Andrew Hamilton)

openafs1

openldap1

Move ldap-sun service IP prior to maintenance

openldap2

Move ldap-sun service IP prior to maintenance

openvpn

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

smith

Remove from mail.tjhsst.edu round-robin prior to maintenance

stage64

steeltoe (2017sdamashe)

  • update system software Samuel Damashek 08:29, 23 July 2013 (EDT)
  • enable virtio memory ballooning
  • reconfigure memory limits
  • update netroot environment Samuel Damashek 07:48, 24 July 2013 (EDT)

www (Andrew Hamilton)

Other Servers

Agni (2014jforcier)

  • update system to OpenBSD 5.3
  • reconfigure backups
  • install and configure tac_plus to provide TACACS+ for network management

Scylla (2014jforcier)

  • update system to OpenBSD 5.3
  • configure backups
  • install and configure tac_plus to provide TACACS+ for network management

Nebula

  • install and configure tac_plus to provide TACACS+ for network management (a configuration sketch follows this list)
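
Since the same tac_plus task appears for Agni, Scylla, and Nebula, here is one possible minimal configuration to start from; the file path, shared key, and user/group entries are all placeholders and would need to match whatever authentication scheme is actually chosen:

  # /etc/tac_plus.conf (location may differ on OpenBSD) -- placeholder values throughout
  key = "CHANGE-ME-SHARED-SECRET"

  group = netadmins {
      default service = permit
      service = exec {
          priv-lvl = 15
      }
  }

  user = jdoe {
      member = netadmins
      login = cleartext "CHANGE-ME"   # replace with des hashes or PAM in practice
  }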

Infrastructure Changes

VM Cluster

apcupsd

Install and configure apcupsd (instructions: Apcupsd#Configuration) on the servers in VM Rack 0. Servers should be evenly split between the two UPSes in the rack.
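
A minimal sketch of the per-server steps on a Gentoo host; the cable/type settings and thresholds below assume a directly attached USB unit and are placeholders, so follow Apcupsd#Configuration for the real values:

  emerge -av sys-power/apcupsd

  # key lines for /etc/apcupsd/apcupsd.conf (placeholder values):
  #   UPSNAME rack0-ups-a
  #   UPSCABLE usb
  #   UPSTYPE usb
  #   DEVICE
  #   BATTERYLEVEL 10
  #   MINUTES 5

  rc-update add apcupsd default
  /etc/init.d/apcupsd start
  apcaccess status      # sanity check: should report the UPS model and charge level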

Apocalypse

Begin migrating VMs from Fryingpan to Apocalypse, starting with VMs that have online redundant backups or are non-mission-critical:

  • bugs
  • ns2
  • openldap2
  • smith
  • steeltoe

Clustering

Investigate using pacemaker to manage VM migrations / failovers.
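
If Pacemaker does end up managing the VMs, one plausible shape for a resource is the VirtualDomain agent (crmsh syntax, matching the cluster software versions listed above); the VM name, XML path, and timeouts here are placeholders:

  crm configure primitive vm_example ocf:heartbeat:VirtualDomain \
      params config="/etc/libvirt/qemu/example.xml" \
             hypervisor="qemu:///system" \
             migration_transport="ssh" \
      meta allow-migrate="true" \
      op monitor interval="30s" timeout="60s" \
      op migrate_to timeout="120s" \
      op migrate_from timeout="120s"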

Hardware Moves

  • Antipodes to VM Rack 0 ahamilto 15:00 30 July 2013 (EDT)
  • Galapagos to VM Rack 0 ahamilto 15:00 30 July 2013 (EDT)

Live Migration

Configure and test live migration of VMs between servers.
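
Assuming the guests are libvirt/KVM domains, a basic test between two Apocalypse nodes could look like this (the VM and target host names are placeholders):

  virsh migrate --live --verbose example-vm qemu+ssh://bottom/system
  virsh -c qemu+ssh://bottom/system list    # confirm the VM is now running on the target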

New Switch

  • rackmount new switch ahamilto 16:00, 26 July 2013 (EDT)
  • install / configure ahamilto 16:00, 26 July 2013 (EDT)
  • repatch rack equipment to new switch ahamilto 16:00, 26 July 2013 (EDT)

Backups (Samuel Damashek)

Set up Rockhopper to create tars of completed backups for archiving to tape. Archives should be refreshed at least once a week.

Done. /root/scripts/archive-backups.sh, scheduled once a week on Monday at 6 am. 2017sdamashe (talk) 01:51, 19 July 2013 (EDT)
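
The actual contents of archive-backups.sh are not reproduced on this page; a script along these lines (the backup and staging paths are guesses) would cover the task, with a crontab entry for the weekly Monday run:

  #!/bin/bash
  # tar each completed backup into a staging area for tape (paths are assumptions)
  BACKUP_ROOT=/backup
  ARCHIVE_DIR=/backup/archives
  mkdir -p "$ARCHIVE_DIR"
  for dir in "$BACKUP_ROOT"/*/; do
      name=$(basename "$dir")
      [ "$name" = "archives" ] && continue
      tar czf "$ARCHIVE_DIR/$name-$(date +%Y%m%d).tar.gz" -C "$BACKUP_ROOT" "$name"
  done

  # crontab entry for the Monday 6 am run:
  # 0 6 * * 1 /root/scripts/archive-backups.sh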

Migrate backup operations to the SAN Cluster. Backup operations should try to run opposite the Apocalypse ZPool for redundancy.

In progress; Apocalypse is experiencing issues. Samuel Damashek 12:44, 27 August 2013 (EDT)

Nagios

  • fix / update parent-child relationships (example below)
  • ensure that all systems are appropriately monitored
  • ensure that all necessary admins are notified of problems in their area of responsibility
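
For the parent-child item, the relevant piece is the Nagios parents directive; the host below is only illustrative (a VM pointing at whichever VM server it normally runs on), and the real object definitions in the Nagios config will differ:

  define host {
      use        generic-host
      host_name  steeltoe
      alias      steeltoe
      address    steeltoe.tjhsst.edu    ; illustrative address
      parents    snares                 ; if snares is down, steeltoe shows as UNREACHABLE rather than DOWN
  }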

nfs-mail

  • migrate nfs-mail from Dulles/haafs1 to Apocalypse
  • separate current and inactive mail directories
  • configure direct backups from snares/bottom (backups need to failover with Apocalypse)

Sysmon (Samuel Damashek)

Set up a system to display CSL system information on the Machine Room TV.

I'd like to work on this; if anyone else does, I'd be more than willing to work out some solution together. 2014jforcier (talk) 15:28, 17 July 2013 (EDT)

I missed this and have been working on it since yesterday. I'll be offline while working on it, so a collaboration might not be possible at the moment. 2017sdamashe (talk) 10:52, 18 July 2013 (EDT)
Not quite finished but my progress is at [1]. (Requires VPN) Suggestions welcome. I need more ideas for system stats. 2017sdamashe (talk) 01:55, 19 July 2013 (EDT)

Apache VCL / OpenStack

  • reinstall altair with Gentoo Linux Server Image
  • update system software
  • connect to Apocalypse SAN
  • install VCL 2.3 or OpenStack Grizzly for testing

Clusters

i7 Cluster

If there is any physical work (cabling/booting) that needs to be done with these, let me know and I can take care of it. --Andrew Hamilton

  • Status on nodes?
  • Finish installation/configuration
  • Documentation

Itanium Cluster

  • Any plans/ideas?
    • fwilson and I were discussing the possibility of using the Itanium cluster for OpenStack. The VMs would be used for senior research projects, along with other school-related projects for which normal web hosting is too restrictive but which still need to run in a server environment. It would also serve as a pilot of OpenStack for TJ. What does everybody else think of this idea? Samuel Damashek 23:25, 7 August 2013 (EDT)
      • Scratch the idea of running virtualization on Itanium; however, we may pilot OpenStack on Altair and Sirius. Samuel Damashek 23:30, 7 August 2013 (EDT)

Workstations

  • install new tables from 231 to replace the old narrow tables along the machine room wall
  • organize workstations / cables
  • test all HDDs
  • updates
  • new projectors for teacher stations
  • configure projection station for HD TV

Graduates

  • archive 2013 AFS home directories to openafs2 (see the sketch after this list) Samuel Damashek 14:26, 19 September 2013 (EDT)
  • copy 2013 email forwardings to aliases
  • deactivate 2013 email accounts (not before August 1)
  • archive 2013 maildirs to Apocalypse
  • generate 2013 maildir archive for backup to tape
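
For reference on the AFS item (marked done above), one way the archive step can be scripted is with a vos move per user volume; the volume naming scheme, partition names, and user list file below are assumptions about the local setup:

  # move each 2013 user volume to openafs2 (all names and paths are placeholders)
  while read user; do
      vos move -id "user.$user" \
          -fromserver openafs1 -frompartition vicepa \
          -toserver openafs2 -topartition vicepa -localauth
  done < /root/graduates-2013.txt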