Please read: Reboot of all CHTC servers to fix security vulnerability (1/8/17)


Date: Mon, 08 Jan 2018 11:39:54 -0600
From: chtc-users@xxxxxxxxxxx
Subject: Please read: Reboot of all CHTC servers to fix security vulnerability (1/8/17)
Greetings CHTC Users,

In order to patch CHTC systems for a vulnerability that became apparent last week (more details below, esp. regarding job performance/duration), a full reboot of all CHTC servers is required ASAP. We will begin rebooting servers in our HTC System and HPC Cluster today (January 8), as described below.

For the HTC System:
  • Starting this morning, servers are being rebooted on a rolling basis over the course of 24 hours.
  • Already-running jobs that have not completed by the time an execute server is rebooted will be evicted, which means that HTCondor will interrupt the jobs, but keep them in the queue (back in idle state) to run again.
  • Submit servers will be briefly unavailable (less than 1 hour) when they're individually rebooted sometime today (including group-owned submit servers).
For the HPC Cluster:
  • Starting later today, execute servers will not start new jobs. Over today and tomorrow, execute servers will be rebooted on a rolling basis.
  • The queue will interrupt and remove already-running jobs when the execute servers start rebooting. These jobs will need to be re-submitted by users.
  • The head nodes will be briefly unavailable (less than 1 hour) when they are rebooted sometime today or tomorrow, though the queue will be preserved (to run queued jobs afterwards).
IMPORTANT NOTE ON JOB PERFORMANCE/DURATION:
This reboot is taking place to address the Meltdown/Spectre security vulnerabilities, which were disclosed last week and affect ALL Intel processors; the security patch will therefore impact all CHTC servers. ALL patched Intel processors (including the ones in CHTC) are predicted to perform anywhere from 0% to 19% (likely less) more slowly as a result of the patch. Therefore, be aware that in certain cases, jobs may run for slightly longer than before. If you are concerned that the performance effect may make your jobs run longer than allowed on CHTC systems, we strongly recommend running one (or a few) test jobs after the system has been updated, so that you can prepare for the new duration of your jobs.
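
As a rough illustration of the arithmetic above, the following Python sketch estimates a worst-case job duration after the patch and compares it to a time limit. It assumes that "19% more slowly" means runtime grows by up to 19%, which is only one reading of the prediction. The function names and the limit value passed in are hypothetical and are not CHTC policy; a real test job remains the reliable way to measure the new duration.

```python
def slowed_runtime_hours(current_hours, slowdown_fraction=0.19):
    """Estimate runtime after the patch.

    Assumption: a slowdown of X% lengthens runtime by X%.
    The default 0.19 is the upper end of the predicted range.
    """
    return current_hours * (1.0 + slowdown_fraction)


def exceeds_limit(current_hours, limit_hours, slowdown_fraction=0.19):
    """Return True if the estimated post-patch runtime is over the limit.

    limit_hours is a hypothetical parameter: supply the limit that
    actually applies to your jobs.
    """
    return slowed_runtime_hours(current_hours, slowdown_fraction) > limit_hours


if __name__ == "__main__":
    # Example: a job that currently takes 10 hours, checked against
    # an illustrative 12-hour limit.
    print(slowed_runtime_hours(10.0))
    print(exceeds_limit(10.0, 12.0))
```

Jobs whose estimate lands close to their limit are the ones most worth re-testing once the servers have been updated.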

Because Meltdown is tied to Intel processors and Spectre implicates all processors, these vulnerabilities impact desktops and laptops as well! This article from lawfareblog.com provides a nice summary of the issue, as well as the actions that users and administrators can take:

We thank you for your patience and understanding while we work to keep our compute systems secure for all users. As always, please send any questions to chtc@xxxxxxxxxxx, rather than replying to this email.

Best,
Your CHTC Team (chtc@xxxxxxxxxxx)