Todo - Summer 2013
The following items should ideally be accomplished over the summer. If you are interested in working on a task below, please add your name next to the system name; see iodine-ldap for an example. Items in red require downtime notification to users and the Sysadmins mailing list at least 24 hours before maintenance. Items in orange are on hold until the building is accessible. Maintenance times should be posted here as soon as they are known, for general awareness.
Contents
- 1 General Notes
- 2 Updates
- 2.1 VM Servers
- 2.2 VMs
- 2.2.1 bugs
- 2.2.2 casey
- 2.2.3 cups2
- 2.2.4 haimageserver (2014jforcier)
- 2.2.5 iodine
- 2.2.6 iodine-ldap (Andrew Hamilton)
- 2.2.7 license
- 2.2.8 lists
- 2.2.9 ltsp2
- 2.2.10 mysql1
- 2.2.11 ns1 (2014jforcier)
- 2.2.12 ns2 (2014jforcier)
- 2.2.13 openafs1
- 2.2.14 openldap1
- 2.2.15 openldap2
- 2.2.16 openvpn
- 2.2.17 smith
- 2.2.18 stage64
- 2.2.19 steeltoe (2017sdamashe)
- 2.2.20 www
- 2.3 Other Servers
- 3 Infrastructure Changes
- 4 Clusters
- 5 Workstations
- 6 Graduates
General Notes
Downtime notifications for items highlighted in red should be posted via Iodine at least 24 hours in advance (but ideally as soon as possible). They should include both a start time and an estimated end time. Be generous when estimating end times :).
In general, a good pattern to follow when updating systems is: reboot, update, reboot again, verify. The first reboot confirms that the system is in working order before you begin work, and the verify step is just as important to make sure that you leave it in working order :). When running updates, check the package list for unexpected downgrades before starting an emerge; these can indicate packages that need a later version unmasked or keyworded. Always run updates from within a screen session on the host system in case of an unexpected disconnection.
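The downgrade check can be done mechanically on the saved output of an `emerge --pretend` run. Below is a minimal sketch, assuming each merge line starts with a bracketed flag field such as `[ebuild     UD ]`, where a `D` in that field marks a downgrade (per the emerge man page). The function name is hypothetical; treat any hit as a reason to look at the package, not as a verdict.

```python
import re

# Matches the leading "[ebuild <flags>] category/package-version" part of a
# merge line from `emerge --pretend` output ("[binary ...]" for binpkgs).
MERGE_LINE = re.compile(r"^\[(?:ebuild|binary)([^\]]*)\]\s+(\S+)")


def find_downgrades(pretend_output):
    """Return the package atoms whose flag field contains 'D' (downgrade)."""
    downgrades = []
    for line in pretend_output.splitlines():
        match = MERGE_LINE.match(line.strip())
        # group(1) holds only the flag letters/spaces, so 'D' here means downgrade
        if match and "D" in match.group(1):
            downgrades.append(match.group(2))
    return downgrades
```

Usage: save the pretend output to a file inside your screen session, read it, and pass the text to `find_downgrades`; an empty list means no downgrades were flagged.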
You should also make sure you have a current backup (if the system is running Guardian, check /root/scripts/backup.log to make sure the last backup is current and successful).
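The "is it current?" half of that check can be scripted. This is a minimal sketch that only looks at how recently /root/scripts/backup.log was written. The log's contents are not described here, so whether the backup was *successful* still has to be checked by reading it. The 24-hour threshold and the function name are assumptions for illustration.

```python
import os
import time

# Path from the note above. The 24-hour default is an assumed meaning of "current".
BACKUP_LOG = "/root/scripts/backup.log"


def backup_log_is_recent(path=BACKUP_LOG, max_age_hours=24.0):
    """Return True if the log exists and was modified within max_age_hours."""
    try:
        age_seconds = time.time() - os.path.getmtime(path)
    except OSError:
        return False  # a missing or unreadable log counts as "not current"
    return age_seconds <= max_age_hours * 3600
```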
For paired/redundant systems (e.g. casey/smith or ns1/ns2), there should be at least 24 hours between the two servers' maintenance windows to allow time for any subtle problems to surface.
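The two 24-hour rules in these notes (downtime notice lead time and spacing between paired servers) are easy to check before posting a schedule. This is a minimal sketch. It assumes "between the maintenance windows" means from the end of the first window to the start of the second. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Both 24-hour figures come from the General Notes.
MIN_NOTICE = timedelta(hours=24)
MIN_PAIR_GAP = timedelta(hours=24)


def notice_ok(posted, window_start):
    """The downtime notice must be posted at least 24 hours before the window opens."""
    return window_start - posted >= MIN_NOTICE


def pair_gap_ok(first_end, second_start):
    """The second server of a pair starts at least 24 hours after the first one finishes."""
    return second_start - first_end >= MIN_PAIR_GAP
```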
Please do not claim tasks unless you intend to start working on them shortly. Claiming a bunch of jobs right off the bat leaves other people looking for things to do.
Updates
VM Servers
Antipodes
- update system software
- update system kernel to git version 3.4.0-kvm
Bottom
- update system software [Samuel Damashek 17:31, 21 July 2013 (EDT)]
- update cluster software [Samuel Damashek 17:31, 21 July 2013 (EDT)]
  - Pacemaker to 1.1.8-r2
  - Corosync to 2.3.0-r1
  - crmsh to 1.2.5-r3
- test cluster operation without DNS (with Rockhopper/Snares): Cluster operation without DNS works perfectly. DNS was blocked on Snares and Bottom, and Snares was rebooted. All VMs running on the Apocalypse cluster were unaffected, and Pacemaker successfully transitioned cluster services to Bottom. [Samuel Damashek 20:10, 21 July 2013 (EDT)]
Galapagos
- update system software
- update system kernel to git version 3.4.0-kvm
Littleblue (2016fwilson)
(done)
- finish OS installation
- update system software
- update system kernel to git version 3.4.0-kvm
Rockhopper
- update system software [2017sdamashe 23:25, 20 July 2013 (EDT)]
- update cluster software [2017sdamashe 23:25, 20 July 2013 (EDT)]
  - Pacemaker to 1.1.8-r2
  - Corosync to 2.3.0-r1
  - crmsh to 1.2.5-r3
- test cluster operation without DNS (with Bottom/Snares): see above [Samuel Damashek 20:10, 21 July 2013 (EDT)]
Snares
- update system software [Samuel Damashek 19:05, 21 July 2013 (EDT)]
- update cluster software [Samuel Damashek 19:05, 21 July 2013 (EDT)]
  - Pacemaker to 1.1.8-r2
  - Corosync to 2.3.0-r1
  - crmsh to 1.2.5-r3
- test cluster operation without DNS (with Bottom/Rockhopper): see above [Samuel Damashek 20:10, 21 July 2013 (EDT)]
Vega
- update system software
- update system kernel to git version 3.4.0-kvm
Waitaha
- update system software [2017sdamashe 02:05, 21 July 2013 (EDT)]
- update system kernel to git version 3.4.0-kvm: already done
- clean up Hackathon files
- configure backups [2017sdamashe 02:05, 21 July 2013 (EDT)]
- configure portage check [2017sdamashe 02:05, 21 July 2013 (EDT)]
VMs
bugs
- update system software [2017sdamashe 02:07, 20 July 2013 (EDT)]
- update Bugzilla [2017sdamashe 02:18, 20 July 2013 (EDT)]
- enable virtio memory ballooning
- reconfigure memory limits
casey
Remove from mail.tjhsst.edu round-robin prior to maintenance
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
cups2
- update system software [2017sdamashe 02:07, 20 July 2013 (EDT)]
- enable virtio memory ballooning
- reconfigure memory limits
haimageserver (2014jforcier)
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
iodine
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
iodine-ldap (Andrew Hamilton)
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
license
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
lists
- convert to Gentoo
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
ltsp2
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
mysql1
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
ns1 (2014jforcier)
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
- develop way to monitor failing zone transfers
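One common way to detect failing zone transfers is to compare each zone's SOA serial on the primary with the serial on the secondaries. A serial that stays behind suggests transfers are not completing. This is a minimal sketch, assuming `dig` is installed on the monitoring host and that one server is the primary for the zone. The function names are hypothetical. A mismatch right after a zone edit is normal until the secondary refreshes, so alert only on mismatches that persist (for example, across several Nagios check intervals).

```python
import subprocess


def soa_serial(dig_short_output):
    """Extract the serial from `dig +short SOA <zone>` output.

    The short form prints: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM
    """
    fields = dig_short_output.split()
    if len(fields) != 7:
        raise ValueError("unexpected SOA answer: %r" % dig_short_output)
    return int(fields[2])


def query_serial(zone, server):
    """Ask one nameserver directly for the zone's SOA serial (needs dig on PATH)."""
    out = subprocess.run(
        ["dig", "+short", "SOA", zone, "@" + server],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    return soa_serial(out)


def stale_secondaries(primary_serial, secondary_serials):
    """Names of secondaries whose serial differs from the primary's."""
    return [name for name, serial in secondary_serials.items()
            if serial != primary_serial]
```

Usage: call `query_serial(zone, "ns1")` and `query_serial(zone, "ns2")`, then pass the results to `stale_secondaries`.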
ns2 (2014jforcier)
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
openafs1
- convert to Gentoo
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
openldap1
Move ldap-sun service IP prior to maintenance
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
openldap2
Move ldap-sun service IP prior to maintenance
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
openvpn
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
smith
Remove from mail.tjhsst.edu round-robin prior to maintenance
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
stage64
- update system software [2017sdamashe 02:07, 20 July 2013 (EDT)]
- enable virtio memory ballooning
- reconfigure memory limits
- fix Nagios disk space check [2017sdamashe 02:07, 20 July 2013 (EDT)]
steeltoe (2017sdamashe)
- update system software [Samuel Damashek 08:29, 23 July 2013 (EDT)]
- enable virtio memory ballooning
- reconfigure memory limits
- update netroot environment [Samuel Damashek 07:48, 24 July 2013 (EDT)]
www
- update system software
- enable virtio memory ballooning
- reconfigure memory limits
Other Servers
Agni (2014jforcier)
- update system to OpenBSD 5.3
- reconfigure backups
- install and configure tac_plus to provide TACACS+ for network management
Scylla (2014jforcier)
- update system to OpenBSD 5.3
- configure backups
- install and configure tac_plus to provide TACACS+ for network management
Nebula
- install and configure tac_plus to provide TACACS+ for network management
Infrastructure Changes
VM Cluster
apcupsd
install / configure apcupsd (instructions: Apcupsd#Configuration) on the servers in VM Rack 0. Servers should be evenly split between the two UPSes in the rack as follows:
- apcupsnet14
  - bottom [Samuel Damashek 19:06, 21 July 2013 (EDT)]
  - waitaha [Samuel Damashek 14:18, 21 July 2013 (EDT)]
  - antipodes (once moved)
- apcupsnet15
  - snares [Samuel Damashek 19:06, 21 July 2013 (EDT)]
  - littleblue [Fox Wilson 09:49, 24 July 2013 (EDT)]
  - rockhopper (sync disks, do not shut down) [2017sdamashe 23:27, 20 July 2013 (EDT)]
  - galapagos (once moved)
Apocalypse
Begin migrating VMs from Fryingpan to Apocalypse, starting with VMs that have online redundant backups or are not mission-critical:
- bugs
- ns2
- openldap2
- smith
- steeltoe
Clustering
Investigate using Pacemaker to manage VM migrations / failovers.
Hardware Moves
Live Migration
Configure and test live-migration of VMs between servers
New Switch
- rackmount new switch [ahamilto 16:00, 26 July 2013 (EDT)]
- install / configure [ahamilto 16:00, 26 July 2013 (EDT)]
- repatch rack equipment to new switch [ahamilto 16:00, 26 July 2013 (EDT)]
Backups (Samuel Damashek)
Set up rockhopper to create tars of completed backups for archiving to tape. Archives should be refreshed at least once a week.
- Done. /root/scripts/archive-backups.sh is scheduled once a week, on Mondays at 6 am (changed from midnight). [2017sdamashe 01:51, 19 July 2013 (EDT)]
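For anyone who wants to check or replicate the weekly archive step, here is a minimal sketch of the two parts: writing a dated tarball of one completed backup and confirming the newest archive is under a week old. It is not the contents of /root/scripts/archive-backups.sh. The directory layout, the `<name>-YYYY-MM-DD.tar.gz` naming, and the function names are all hypothetical.

```python
import os
import tarfile
import time
from datetime import date

MAX_ARCHIVE_AGE_DAYS = 7  # "Archives should be refreshed at least once a week."


def archive_backup(backup_dir, archive_dir, today=None):
    """Write <archive_dir>/<name>-YYYY-MM-DD.tar.gz of backup_dir and return its path."""
    today = today or date.today()
    name = os.path.basename(os.path.normpath(backup_dir))
    path = os.path.join(archive_dir, "%s-%s.tar.gz" % (name, today.isoformat()))
    with tarfile.open(path, "w:gz") as tar:
        tar.add(backup_dir, arcname=name)  # recursive; stored under "<name>/..."
    return path


def newest_archive_is_fresh(archive_dir, max_age_days=MAX_ARCHIVE_AGE_DAYS):
    """True if the most recently modified .tar.gz in archive_dir is within max_age_days."""
    archives = [os.path.join(archive_dir, f)
                for f in os.listdir(archive_dir) if f.endswith(".tar.gz")]
    if not archives:
        return False
    newest = max(os.path.getmtime(p) for p in archives)
    return time.time() - newest <= max_age_days * 86400
```

A freshness check like `newest_archive_is_fresh` could also back a Nagios alert if the weekly job silently stops running.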
Migrate backup operations to the SAN Cluster. Backup operations should try to run opposite the Apocalypse ZPool for redundancy.
Nagios
- fix / update parent-child relationships
- ensure that all systems are appropriately monitored
- ensure that all necessary admins are notified of problems in their area of responsibility
nfs-mail
- migrate nfs-mail from Dulles/haafs1 to Apocalypse
- separate current and inactive mail directories
- configure direct backups from snares/bottom (backups need to failover with Apocalypse)
Sysmon (Samuel Damashek)
Set up a system to display CSL System Information on the Machine Room TV.
I'd like to work on this; if anyone else does, I'd be more than willing to work out some solution together. 2014jforcier 15:28, 17 July 2013 (EDT)
- I missed this and have been working on this since yesterday. I'll be offline working on it, so a collab might not be possible at the moment. 2017sdamashe 10:52, 18 July 2013 (EDT)
- Not quite finished, but my progress is at [1] (requires VPN). Suggestions welcome; I need more ideas for system stats. 2017sdamashe 01:55, 19 July 2013 (EDT)
Apache VCL
- reinstall altair with Gentoo Linux Server Image
- update system software
- connect to Apocalypse SAN
- install VCL 2.3 for testing
Clusters
i7 Cluster
If there is any physical work (cabling/booting) that needs to be done with these, let me know and I can take care of it. [Andrew Hamilton]
- Status on nodes?
- Finish installation/configuration
- Documentation
Itanium Cluster
- Any plans/ideas?
Workstations
- install new tables from 231 to replace the old narrow tables along the machine room wall
- organize workstations / cables
- test all HDDs
- updates
- new projectors for teacher stations
- configure projection station for HD TV
Graduates
- archive 2013 AFS home directories to openafs2
- copy 2013 email forwardings to aliases
- deactivate 2013 email accounts (not before August 1)
- archive 2013 maildirs to Apocalypse
- generate 2013 maildir archive for backup to tape
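The forwarding-to-aliases step and the August 1 guard can be sketched together. This assumes the mail server's aliases file uses the standard `aliases(5)` format (`name: address1, address2`), with `newaliases` run afterwards. It also assumes the 2013 forwardings have already been collected into a `{username: [addresses]}` mapping (for example, from each user's `.forward`). The collection step, the usernames in the test, and the function names are hypothetical.

```python
from datetime import date

# "deactivate 2013 email accounts (not before August 1)"
DEACTIVATION_EARLIEST = date(2013, 8, 1)


def aliases_lines(forwardings):
    """Render {username: [addresses]} as aliases(5) entries, skipping empty forwards."""
    lines = []
    for user in sorted(forwardings):
        targets = [a.strip() for a in forwardings[user] if a.strip()]
        if targets:
            lines.append("%s: %s" % (user, ", ".join(targets)))
    return lines


def may_deactivate(today):
    """Accounts may be deactivated on or after August 1, 2013."""
    return today >= DEACTIVATION_EARLIEST
```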