Reboots of all CHTC servers today and next week to fix security vulnerability


Date: Thu, 11 Oct 2018 15:53:42 -0500
From: chtc-users@xxxxxxxxxxx
Subject: Reboots of all CHTC servers today and next week to fix security vulnerability

Greetings CHTC Users,

In order to patch CHTC systems for a new Linux vulnerability that became apparent last week, a full reboot of all CHTC servers is required ASAP. We started rebooting servers in our HTC System yesterday (Oct 10) and will begin rebooting servers in the HPC Cluster next Thursday (Oct 18), as described below.

For the HTC System:
  • Execute servers are currently being rebooted.
  • Already-running jobs that have not completed by the time their execute server is rebooted will be evicted: HTCondor will interrupt them but keep them in the queue (back in the Idle state) to run again.
  • Submit servers (including group-owned submit servers) and the transfer server will each be briefly unavailable (less than 1 hour) when they are rebooted sometime in the next 48 hours.
For the HPC Cluster:
  • Starting next Thursday morning (Oct 18), execute servers will be rebooted.
  • Starting today, queued jobs will only begin running if their requested time limit allows them to finish by 9am next Thursday.
  • When any of a running job's execute servers begins the reboot process, the queue will interrupt that job and put it on hold, so users can either remove it or release it to run again.
    • To release a held job, use the command: scontrol release JobNumber (replacing JobNumber with the job's ID)
  • The head nodes will be briefly unavailable (less than 1 hour) when they are rebooted sometime on Thursday, but the queue will be preserved so that queued jobs can run afterwards.
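To make the "finish by 9am next Thursday" rule concrete, here is a minimal sketch of the check, assuming the scheduler simply compares a job's start time plus its requested time limit against the start of the reboot window. The function names are hypothetical, only the "HH:MM:SS" and "D-HH:MM:SS" time-limit forms are parsed, and time zones are ignored.

```python
from datetime import datetime, timedelta

# Start of the HPC reboot window from the announcement: 9am, Thursday Oct 18, 2018.
# Assumption: local cluster time; this sketch ignores time zones.
REBOOT_START = datetime(2018, 10, 18, 9, 0)

def parse_time_limit(limit: str) -> timedelta:
    """Parse a time limit written as "HH:MM:SS" or "D-HH:MM:SS".

    Slurm accepts other forms too (e.g. plain minutes); this sketch does not.
    """
    days = 0
    if "-" in limit:
        day_part, limit = limit.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in limit.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

def can_start(start: datetime, limit: str, deadline: datetime = REBOOT_START) -> bool:
    """True if a job starting at `start` would finish its full time limit by `deadline`."""
    return start + parse_time_limit(limit) <= deadline

# A 3-day request starting Monday Oct 15 at 9am ends exactly at the deadline: allowed.
print(can_start(datetime(2018, 10, 15, 9, 0), "3-00:00:00"))   # True
# The same request starting an hour later would run past 9am Thursday: it waits.
print(can_start(datetime(2018, 10, 15, 10, 0), "3-00:00:00"))  # False
```

In other words, shortening a job's time request is what lets it start before the reboots; jobs whose requests do not fit simply wait until afterwards.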
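If several jobs end up held, they can be released in one call, since scontrol release accepts a comma-separated list of job IDs. The sketch below is only an illustration: it assumes you have captured output from squeue -u $USER -h -t PD -o "%i %r" (one job ID and pending reason per line), and treating reasons that start with "JobHeld" as holds is an assumption. Check the actual reasons on your own jobs before releasing them. The helper names are hypothetical.

```python
from typing import List

def held_job_ids(squeue_output: str) -> List[str]:
    """Return IDs of jobs whose pending reason looks like a hold.

    Expects "JOBID REASON" per line. Matching reasons that start with
    "JobHeld" (e.g. JobHeldUser, JobHeldAdmin) is an assumption.
    """
    ids = []
    for line in squeue_output.splitlines():
        parts = line.split(None, 1)  # split off the job ID; keep the rest as the reason
        if len(parts) == 2 and parts[1].startswith("JobHeld"):
            ids.append(parts[0])
    return ids

def release_command(job_ids: List[str]) -> str:
    """Build a single `scontrol release` command for a comma-separated job list."""
    if not job_ids:
        raise ValueError("no held jobs to release")
    return "scontrol release " + ",".join(job_ids)

# Hypothetical squeue output, for illustration only.
sample = """\
1234 JobHeldAdmin
1235 Resources
1236 JobHeldUser
"""
ids = held_job_ids(sample)
print(ids)                   # ['1234', '1236']
print(release_command(ids))  # scontrol release 1234,1236
```

The sketch only builds the command string rather than running it, so you can review the list before releasing anything.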
We thank you for your patience and understanding while we work to keep our compute systems secure for all users. As always, please send any questions to chtc@xxxxxxxxxxx, rather than replying to this email.

Best,
Your CHTC Team (chtc@xxxxxxxxxxx)