Warning Livedoc is no longer being updated and will be deprecated shortly. Please refer to https://documentation.tjhsst.edu.

Todo - Summer 2013

From Livedoc - The Documentation Repository
Revision as of 13:04, 17 July 2013 by Andrew Hamilton (talk | contribs) (add additional tasks)

The following items should ideally be completed over the summer. If you are interested in working on a task below, please add your name next to the system name (see iodine-ldap for an example). Items in red require a downtime notification to users and the Sysadmins mailing list at least 24 hours in advance of maintenance. Items in orange are on hold until the building is accessible. Maintenance times should be posted here as soon as they are available, for general awareness.

General Notes

Downtime notifications for items highlighted in red should be posted via Iodine at least 24 hours in advance (but ideally as soon as possible). They should include both a start time and an estimated end time. Be generous when estimating end times :).

In general, a good pattern to follow when updating systems is: Reboot, Update, Reboot again, Verify. The first reboot confirms that the system is in working order before you begin work. The Verify step is also very important to make sure that you leave systems in working order :). When running updates, check the package list for unexpected downgrades before starting an emerge; these can indicate packages that need a later version unmasked or keyworded. Always run updates from within a screen session on the host system in case of an unexpected disconnection.
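The downgrade check described above can be sketched as a small script. This is a minimal sketch, assuming the text comes from a pretend run such as `emerge -pvuDN @world`, where a `D` in the flag field of an `[ebuild ...]` line marks a downgrade. The function name `find_downgrades` and the sample package versions are made up for illustration.

```python
import re

# Matches emerge --pretend lines such as:
#   [ebuild     UD ] dev-libs/foo-1.2 [1.3]
# Group 1 is the flag field, group 2 is the package atom.
EBUILD_LINE = re.compile(r"^\[ebuild\s+([^\]]*)\]\s+(\S+)")


def find_downgrades(pretend_output):
    """Return the package atoms that emerge plans to downgrade."""
    downgrades = []
    for line in pretend_output.splitlines():
        match = EBUILD_LINE.match(line.strip())
        # An uppercase "D" in the flag field marks a downgrade.
        if match and "D" in match.group(1):
            downgrades.append(match.group(2))
    return downgrades


# Hypothetical sample output; the versions are illustrative only.
SAMPLE = """\
[ebuild     U  ] sys-libs/glibc-2.15-r3 [2.15-r2]
[ebuild     UD ] dev-libs/libxml2-2.8.0-r3 [2.9.1]
[ebuild   R    ] app-misc/screen-4.0.3-r6
"""

if __name__ == "__main__":
    for atom in find_downgrades(SAMPLE):
        print("unexpected downgrade:", atom)
```

Any atom it reports is a candidate for unmasking or keywording a later version before the real emerge is started.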

You should also make sure you have a current backup. If the system is running Guardian, check /root/scripts/backup.log to confirm that the last backup is both recent and successful.
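As a first pass at the "current" half of that check, the log file's modification time can be tested. This is only a sketch: the 24-hour threshold and the function name `backup_log_is_recent` are assumptions, and because the format of backup.log is not described here, the script does not try to confirm that the backup succeeded. The log still needs to be read for that.

```python
import os
import time


def backup_log_is_recent(log_path, max_age_hours=24, now=None):
    """Return True if log_path was modified within the last max_age_hours.

    This only checks recency; it says nothing about whether the backup
    recorded in the log actually succeeded.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(log_path)
    except OSError:
        # A missing or unreadable log counts as "not current".
        return False
    return (now - mtime) <= max_age_hours * 3600


if __name__ == "__main__":
    path = "/root/scripts/backup.log"
    if backup_log_is_recent(path):
        print("backup log updated recently; now read it to confirm success")
    else:
        print("backup log is stale or missing; do not start maintenance")
```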

For paired/redundant systems (e.g., casey/smith or ns1/ns2), there should be at least 24 hours between the maintenance windows for the two servers to allow time for any subtle problems to surface.
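The 24-hour rule for paired systems is easy to check when scheduling. The sketch below assumes each maintenance window is a `(start, end)` pair of `datetime` values and measures the gap from the end of the earlier window to the start of the later one. The function names and the example times are illustrative only.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=24)


def gap_between(window_a, window_b):
    """Return the time from the end of the earlier window to the start of the later one.

    Each window is a (start, end) tuple. Overlapping windows give a negative gap.
    """
    first, second = sorted([window_a, window_b])
    return second[0] - first[1]


def windows_ok(window_a, window_b, min_gap=MIN_GAP):
    """Return True if the two windows are separated by at least min_gap."""
    return gap_between(window_a, window_b) >= min_gap


if __name__ == "__main__":
    # Hypothetical windows for a redundant pair such as casey/smith.
    casey = (datetime(2013, 7, 20, 9, 0), datetime(2013, 7, 20, 11, 0))
    smith = (datetime(2013, 7, 21, 10, 0), datetime(2013, 7, 21, 12, 0))
    print("gap:", gap_between(casey, smith), "ok:", windows_ok(casey, smith))
```

Measuring from the end of the first window, rather than its start, leaves the full 24 hours for problems to surface after the first server is back in service.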

Please do not claim tasks unless you intend to start working on them shortly. Claiming a bunch of jobs right off the bat leaves other people looking for things to do.

Updates

VM Servers

Antipodes

  • update system software
  • update system kernel to git version 3.4.0-kvm

Bottom

  • update system software
  • update cluster software
    • Pacemaker to 1.1.8-r2
    • Corosync to 2.3.0-r1
    • crmsh to 1.2.5-r3
  • test cluster operation without DNS (with Rockhopper/Snares)

Galapagos

  • update system software
  • update system kernel to git version 3.4.0-kvm

Littleblue

  • finish OS installation
  • update system software
  • update system kernel to git version 3.4.0-kvm

Rockhopper

  • update system software
  • update cluster software
    • Pacemaker to 1.1.8-r2
    • Corosync to 2.3.0-r1
    • crmsh to 1.2.5-r3
  • test cluster operation without DNS (with Bottom/Snares)

Snares

  • update system software
  • update cluster software
    • Pacemaker to 1.1.8-r2
    • Corosync to 2.3.0-r1
    • crmsh to 1.2.5-r3
  • test cluster operation without DNS (with Bottom/Rockhopper)

Vega

  • update system software
  • update system kernel to git version 3.4.0-kvm

Waitaha

  • update system software
  • update system kernel to git version 3.4.0-kvm
  • clean up Hackathon files
  • configure backups
  • configure portage check

VMs

bugs

  • update system software
  • update Bugzilla
  • enable virtio memory ballooning
  • reconfigure memory limits

casey

Remove from mail.tjhsst.edu round-robin prior to maintenance

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

cups2

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

haimageserver

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

iodine

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

iodine-ldap (Andrew Hamilton)

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

license

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

lists

  • convert to Gentoo
  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

ltsp2

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

mysql1

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

ns1

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits
  • develop way to monitor failing zone transfers

ns2

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

openafs1

  • convert to Gentoo
  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

openldap1

Move ldap-sun service IP prior to maintenance

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

openldap2

Move ldap-sun service IP prior to maintenance

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

openvpn

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

smith

Remove from mail.tjhsst.edu round-robin prior to maintenance

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

stage64

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits
  • fix nagios diskspace check

steeltoe

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits
  • update netroot environment

www

  • update system software
  • enable virtio memory ballooning
  • reconfigure memory limits

Other Servers

Agni

  • update system to OpenBSD 5.3
  • reconfigure backups

Scylla

  • update system to OpenBSD 5.3
  • configure backups

Infrastructure Changes

VM Cluster

apcupsd

Install and configure apcupsd on the servers in VM Rack 0. Servers should be split as evenly as possible between the two UPSes in the rack, as follows:

  • apcupsnet14
    • bottom
    • waitaha
    • antipodes (once moved)
  • apcupsnet15
    • snares
    • littleblue
    • rockhopper (sync disks, do not shut down)
    • galapagos (once moved)

Apocalypse

Begin migrating VMs from Fryingpan to Apocalypse, starting with VMs that have online redundant backups or are not mission-critical:

  • bugs
  • ns2
  • openldap2
  • smith
  • steeltoe

Clustering

Investigate using pacemaker to manage VM migrations / failovers.

Hardware Moves

  • Antipodes to VM Rack 0 (on hold pending new switch)
  • Galapagos to VM Rack 0 (on hold pending new switch)

Live Migration

Configure and test live migration of VMs between servers.

New Switch

  • rackmount new switch (waiting on hardware, no current ETA)
  • install / configure
  • repatch rack equipment to new switch

Backups

Set up rockhopper to create tars of completed backups for archiving to tape. Archives should be refreshed at least once a week.
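One possible shape for this job is sketched below, assuming each completed backup is a directory and that an archive older than seven days should be rebuilt. The paths in the example, the function names, and the choice of an uncompressed tar are all assumptions rather than an existing setup on rockhopper.

```python
import os
import tarfile
import time


def archive_is_stale(archive_path, max_age_days=7, now=None):
    """Return True if the archive is missing or older than max_age_days."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(archive_path)
    except OSError:
        return True
    return (now - mtime) > max_age_days * 86400


def create_archive(backup_dir, archive_path):
    """Write an uncompressed tar of backup_dir to archive_path."""
    top_level = os.path.basename(os.path.normpath(backup_dir))
    with tarfile.open(archive_path, "w") as tar:
        # Store paths relative to the backup directory's own name.
        tar.add(backup_dir, arcname=top_level)
    return archive_path


def refresh_archive(backup_dir, archive_path, max_age_days=7):
    """Rebuild the archive if it is stale; return True if it was rebuilt."""
    if archive_is_stale(archive_path, max_age_days):
        create_archive(backup_dir, archive_path)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever completed backups actually live.
    rebuilt = refresh_archive("/backups/completed/example-host",
                              "/backups/tape-staging/example-host.tar")
    print("rebuilt" if rebuilt else "archive still fresh")
```

Run daily from cron, this keeps every archive no more than about a week old without rebuilding tars that are still fresh.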

Nagios

  • fix / update parent-child relationships
  • ensure that all systems are appropriately monitored
  • ensure that all necessary admins are notified of problems in their area of responsibility

nfs-mail

  • migrate nfs-mail from Dulles/haafs1 to Apocalypse
  • separate current and inactive mail directories
  • configure direct backups from snares/bottom (backups need to fail over with Apocalypse)

Sysmon

Set up a system to display CSL System Information on the Machine Room TV.

Apache VCL

  • reinstall altair with Gentoo Linux Server Image
  • update system software
  • connect to Apocalypse SAN
  • install VCL 2.3 for testing

Clusters

i7 Cluster

  • Status on nodes?
  • Finish installation/configuration
  • Documentation

Itanium Cluster

  • Any plans/ideas?

Workstations

  • install new tables from 231 to replace the old narrow tables along the machine room wall
  • organize workstations / cables
  • test all HDDs
  • updates
  • new projectors for teacher stations
  • configure projection station for HD TV

Graduates

  • archive 2013 AFS home directories to openafs2
  • copy 2013 email forwardings to aliases
  • deactivate 2013 email accounts (not before August 1st)
  • archive 2013 maildirs to Apocalypse
  • generate 2013 maildir archive for backup to tape