Please read: Reboot of all CHTC servers NEXT WEEK to fix security vulnerability


Date: Fri, 17 Aug 2018 16:31:12 -0500
From: chtc-users@xxxxxxxxxxx
Subject: Please read: Reboot of all CHTC servers NEXT WEEK to fix security vulnerability
Greetings CHTC Users,

In order to patch CHTC systems for a vulnerability that became apparent this week, a full reboot of all CHTC servers is required ASAP. We will begin rebooting servers in our HTC System and HPC Cluster on Monday (Aug 20), as described below.

For the HTC System:
  • Starting Monday, execute servers will be rebooted on a rolling basis over the course of the week.
  • Already-running jobs that have not completed by the time an execute server is rebooted will be evicted, which means that HTCondor will interrupt the jobs but keep them in the queue (back in the Idle state) to run again.
  • Submit servers (including group-owned submit servers) will be briefly unavailable (less than 1 hour) when they are individually rebooted sometime on Monday.
For the HPC Cluster:
  • By Monday, execute servers will not start new jobs. Starting Wednesday (8/22), execute servers will be rebooted on a rolling basis.
  • The queue will interrupt and remove an already-running job when any of its execute servers begins the reboot process. These jobs will need to be re-submitted by users.
  • The head nodes will be briefly unavailable (less than 1 hour) when they are rebooted sometime on/after Monday, though the queue will be preserved (to run queued jobs afterwards).
For *some* licensed software:
  • The license server supporting Matlab, Lumerical, Comsol, and Converge will be unavailable Wednesday afternoon into the evening.
  • Any CHTC jobs depending on the licenses for these software packages will fail during this downtime.
  • Any CHTC jobs using already-compiled Matlab code will be unaffected (since compiled Matlab code does not use licenses), but new Matlab compilations will fail while the license server is down.

IMPORTANT NOTE ON JOB PERFORMANCE/DURATION:
This reboot is taking place to address the 'Foreshadow' security vulnerability, which was disclosed this week and affects ALL Intel processors; so all CHTC servers require the patch. ALL patched Intel processors are predicted to perform somewhat slower as a result of the patch (worst-case estimates are still emerging). Therefore, be aware that in certain cases, jobs may run for slightly longer than before. If you are concerned that the performance effect may make your jobs run longer than allowed on CHTC systems, we strongly recommend running one (or a few) test jobs after the system has been updated, so that you can prepare for the new duration of your jobs.

We thank you for your patience and understanding while we work to keep our compute systems secure for all users. As always, please send any questions to chtc@xxxxxxxxxxx, rather than replying to this email.

Best,
Your CHTC Team (chtc@xxxxxxxxxxx)