[Chtc-users] Interruption to HTC and HPC systems, January 30-31


Date: Wed, 20 Jan 2016 14:12:06 -0600
From: chtc-users@xxxxxxxxxxx
Subject: [Chtc-users] Interruption to HTC and HPC systems, January 30-31
Greetings CHTC users,

Our HPC cluster and part of our HTC pool, including our two main submit nodes (submit-3 and submit-5), will be shut down and unavailable on January 30-31 (the last weekend of the month). This downtime is necessary to repair the main power source in a server room, with the aim of preventing future power outages.

What will be affected?
  • Users will be unable to log in to the HPC head nodes (aci-service-1, aci-service-2) and the HTC submit nodes (submit-3, submit-5) during the outage. Other HTC submit servers should be unaffected.
  • Jobs still in the HPC cluster queue when the cluster is taken down on January 30 will be killed and will need to be resubmitted after access to the head nodes is restored on January 31.
  • Jobs submitted to the HTC system via submit-3 or submit-5 will remain in the queue, but will not complete until the submit nodes are turned back on. Any jobs interrupted by the outage will be automatically re-run by HTCondor after submit-3 and submit-5 are rebooted on January 31.
We anticipate that all machines will be restarted by the end of Sunday, January 31, and we will inform you (via this email address) as our resources become available again.

Thank you for your patience during this temporary inconvenience, as we work to improve the long-term reliability of our resources.

Best wishes,

Your CHTC team