[Chtc-users] submit-3.chtc.wisc.edu still down after second reboot, today


Date: Fri, 28 Oct 2016 16:11:33 -0500
From: chtc-users@xxxxxxxxxxx
Subject: [Chtc-users] submit-3.chtc.wisc.edu still down after second reboot, today
Greetings,

Following information from yesterday's reboot, the submit-3.chtc.wisc.edu server needed to be rebooted again today so that some automated checks on its disk space could be performed. We are letting it continue these checks, so it may remain unresponsive to login attempts for several more hours. We apologize for the unforeseen inconvenience and will try to let you know when it is up and running again.

Thank you,
Lauren Michael

Lauren Michael - Research Computing Facilitator, Center for High Throughput Computing, University of Wisconsin - Madison

On Thu, Oct 27, 2016 at 4:34 PM, <chtc-users@xxxxxxxxxxx> wrote:
Greetings,

As promised, we are writing to provide notice that the HPC Cluster will be rebooted at 1:00pm tomorrow, October 28 (Friday). Implications for HPC Cluster users are listed in the previous email, below.

The HTC System submit servers have been rebooted. Execute servers are still gradually being rebooted over the next 24 hours, so it's still possible that running jobs will be interrupted between now and mid-day tomorrow (Friday).

Thank you, again, for your patience while we secure CHTC compute systems with the patch for this Linux security vulnerability; please send any questions to chtc@xxxxxxxxxxx.

Cheers,
Your CHTC Team

On Wed, Oct 26, 2016 at 5:40 PM, Lauren Michael <lmichael@xxxxxxxxxxx> wrote:
Greetings CHTC Users,

A Linux security vulnerability has become apparent and will require a full reboot of all CHTC servers ASAP. We will start rebooting servers in our HTC System tomorrow (October 27) at noon, and servers in the HPC Cluster will be rebooted at a later time that we'll announce tomorrow.

For users of our HTC System:
  • submit servers will be briefly unavailable when they're rebooted (including group-owned submit servers)
  • execute servers will be gradually rebooted over the course of 24 hours, starting at noon tomorrow (Thursday, Oct. 27)
  • jobs that are running when the execute servers are rebooted will be evicted, but HTCondor will keep them in the queue and automatically re-run them
For users of our HPC Cluster:
  • jobs running when the head node and execute nodes are rebooted will need to be resubmitted
  • the exact time of the reboot will be announced tomorrow, likely to take place tomorrow afternoon or on Friday

We thank you for your patience and understanding while we work to keep our compute systems secure for all users. As always, please send any questions to chtc@xxxxxxxxxxx, rather than replying to this email.

Happy Computing,
Your CHTC Team



--
Lauren Michael - Research Computing Facilitator, University of Wisconsin - Madison



