HPC Cluster unplanned downtime starting now, possibly through next week (6/11)


Date: Fri, 4 Jun 2021 10:05:15 -0500
From: chtc-users@xxxxxxxxxxx
Subject: HPC Cluster unplanned downtime starting now, possibly through next week (6/11)
Greetings CHTC users,

This is an urgent message for users of our HPC Cluster.

We apologize for the lack of a detailed update until now. We believed the HPC Cluster filesystem had been repaired and stabilized early yesterday afternoon, and had specific users running tests, but we discovered at the end of the day (just before we would have sent a positive update) that the issues had returned. The filesystem appears to need more extended maintenance work, which we are beginning ASAP and hope to complete sometime next week.

As of now, we need to take the following steps to prevent further issues until the downtime has ended:
  • users will not be able to log into the HPC Cluster until the TBD completion of the downtime
  • running jobs will be re-queued, and so will remain in the queue until users can log in again
  • jobs queued but not yet running will also remain in the queue
Additional details:
  • While we plan to complete the maintenance work next week, this timeline is subject to change, and we'll notify users of any changes as soon as we can.
  • At this time we expect that all user data in the filesystem (i.e. /home and /software locations) will be preserved. The issue appears to be with the filesystem's metadata (which keeps track of which data is where in the filesystem).
  • If you have imminently urgent work delayed by this downtime, please get in touch ASAP to discuss.

As always, please email us with any questions or issues via chtc@xxxxxxxxxxx

Thank you for your patience,
Your CHTC Team