Outage since late Friday, Nov 26; HPC Cluster and parts of the HTC System are still down


Date: Mon, 29 Nov 2021 15:12:14 +0000
From: chtc-users@xxxxxxxxxxx
Subject: Outage since late Friday, Nov 26; HPC Cluster and parts of the HTC System are still down

Greetings,

 

As some of you noticed over the holiday weekend, the HPC Cluster and parts of the HTC System have been down since late Friday, Nov 26, after new complications following the previous weekend’s planned maintenance. We are working to bring them back online as soon as we are confident they can be stably supported, and we will provide updates as we have them.

 

Affected systems are the same as for the recent planned maintenance (all in the same server room in the Discovery building):

  • HPC Cluster (entire; no login possible)
  • a portion of the HTC execute servers (including some researcher-owned and GPU hardware)

 

The HTC submit servers and the majority of the HTC execute capacity are still up (nearly all of these are in another building). We are not yet certain of the state of the HPC Cluster queue.

 

We hope you had a nice Thanksgiving, and we understand the frustration of coming back to downed components. Thank you for your patience, and please contact us with any questions or issues at chtc@xxxxxxxxxxx

 

Regards,

Your CHTC Team
