2020-10-05

It is time to do some deferred maintenance on Greenplanet, which will require draining all
jobs and rebooting all systems. If everything goes smoothly, the core systems and most of the
nodes should be down for only a day.

This is not a critical emergency shutdown, so the exact date is flexible and can be
pushed back if there are major conflicts. However, many core systems have been running for more
than 2 years and need significant firmware and OS upgrades.

The current target is Thursday, 5 November. Please let us know if this is a horrible, rather
than merely a bad, time to go down. Jobs still running at shutdown will be killed, but pending
jobs should be OK, so we will place a maintenance reservation that holds any job that cannot
finish before the shutdown.
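
As a rough illustration of how the reservation affects scheduling, the sketch below checks whether a job's requested time limit still fits before the shutdown. The 08:00 start hour is an assumption (only the date has been announced), and the function names are hypothetical helpers for illustration, not part of Slurm.

```python
from datetime import datetime, timedelta

# Hypothetical shutdown start: only the date (Thursday, 5 November 2020)
# has been announced, so the 08:00 hour here is an assumption.
SHUTDOWN_START = datetime(2020, 11, 5, 8, 0)


def can_start(now: datetime, time_limit: timedelta,
              shutdown: datetime = SHUTDOWN_START) -> bool:
    """Return True if a job started now would end by the shutdown."""
    return now + time_limit <= shutdown


def max_time_limit(now: datetime,
                   shutdown: datetime = SHUTDOWN_START) -> timedelta:
    """Longest time limit that still fits before the shutdown (never negative)."""
    return max(shutdown - now, timedelta(0))


if __name__ == "__main__":
    now = datetime(2020, 11, 4, 8, 0)
    print(can_start(now, timedelta(hours=12)))  # fits before the shutdown
    print(can_start(now, timedelta(hours=36)))  # would be held as pending
    print(max_time_limit(now))
```

A job that would be held this way can usually still run beforehand if it is submitted with a shorter time limit (for example via `sbatch --time`), provided the work really does fit in that window.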

Planned changes:
1) Update Lustre (/DFS-L) to 2.12.5 (currently 2.10.6)
2) Update BeeGFS (/DFS-B) to 7.2 (currently 6.19)
3) Convert old NFS servers to new system (/data12 through /data25)
4) Retire oldest NFS servers (/data1 through /data11)
5) Update all nodes to CentOS 7.8 (currently 7.5-7.7)
6) Update Slurm to 20.02.6 (currently 19.05.4)
7) Replace any remaining DDR InfiniBand with QDR