Certain execute nodes shut down


Date: Wed, 12 May 2021 15:37:55 -0500
From: chtc-users@xxxxxxxxxxx
Subject: Certain execute nodes shut down
Greetings CHTC users,

There was a brief chilled water interruption this morning, which led to the shutdown of 5 HTC System execute nodes and 20 HPC Cluster execute nodes.

We expect the affected servers to be back up within the next day. However, jobs that were running on these nodes were interrupted and may require action from users:
  • In the HTC System, these jobs are automatically re-queued to run again elsewhere, likely requiring no action from the user.
  • In the HPC Cluster, such jobs may need to be re-queued manually.
Please review your jobs and email us at chtc@xxxxxxxxxxx if you're not sure how to cleanly re-run work that may have been affected. We'll be happy to help.

Best,
Your CHTC team