Date: | Mon, 06 Apr 2020 17:40:44 -0500 |
---|---|
From: | chtc-users@xxxxxxxxxxx |
Subject: | HTC system components back up; HPC Cluster still down |
Greetings CHTC users,

We have an update on the systems impacted by the cooling maintenance performed last week and over the weekend. The HPC Cluster is still down and HTC system components are back up. More details are below.

HPC Cluster

We started bringing the HPC Cluster back up today, but there were unforeseen issues with bringing up the file system in a stable state. We have disabled login access to perform additional diagnosis and maintenance. Once we have a more specific timeline for making the cluster available, we will notify this email list.

Note: if you log in to aci-service-1.chtc.wisc.edu or aci-service-2.chtc.wisc.edu you are using the HPC Cluster.

HTC System

Affected components of the HTC System (as detailed in our previous emails) are back online or will be back up soon. All submit servers are on as of this morning.

Note: if you log in to submit-1.chtc.wisc.edu, submit2.chtc.wisc.edu, and/or submit3.chtc.wisc.edu you are using the HTC System.

As always, please get in touch (chtc@xxxxxxxxxxx) with any questions or concerns.

Best,
Your CHTC Team

---------- Forwarded message ---------
Date: Thu, Apr 2, 2020 at 3:39 PM
Subject: Additional HPC Cluster and HTC System servers down

Greetings CHTC users,
Additional CHTC services have been turned off due to an unexpected failure in the backup cooling system for the server room currently undergoing maintenance. In addition to our previously communicated outages (described in our original email, below), the following services are impacted: High Performance Cluster
We don't yet know if the situation will improve to the point where we can turn certain key services back on. If any additional servers go down, or we're able to bring other servers back up, we will let you know via the chtc-users mailing list.

Again, please get in touch at chtc@xxxxxxxxxxx with any questions or concerns, especially if this outage means that you won't make a hard deadline.

Best,
Your CHTC team

---------- Forwarded message ---------
Date: Wed, Apr 1, 2020 at 5:41 PM
Subject: Immediate: HPC Cluster and portions of HTC System down April 1 - 5

Greetings CHTC users,

Due to a campus chilled-water maintenance announced this afternoon, CHTC needs to turn off major components of our computing services for the next 4 days (our server rooms depend heavily on chilled water for server cooling). We've already begun powering down a number of servers, with more to come as described for the below categories.

The HPC Cluster will be down:
There is a chance that we will need to turn off more servers; we will endeavor to provide immediate (or advance) notice if this becomes necessary.

The campus maintenance is expected to conclude by Sunday, April 5 at 8pm CDT. We will send an email via this address (chtc-users@xxxxxxxxxxx) confirming when our systems are back online.

Please get in touch at chtc@xxxxxxxxxxx with any questions or concerns, especially if this outage means that you won't make a hard deadline. We'll do our best to help you with potential alternative solutions.

Best,
Your CHTC Team |