[Chtc-users] Some CHTC systems down again due to power outage


Date: Mon, 05 Oct 2015 09:35:52 -0500
From: chtc-users@xxxxxxxxxxx
Subject: [Chtc-users] Some CHTC systems down again due to power outage
Greetings,

Several campus buildings experienced another power outage over the weekend, which has affected all CHTC servers yet again. We are increasingly frustrated with the power outages, as we're sure you are too. Please know that we're always working as quickly as possible to bring the various CHTC compute systems back online.

Here is a summary of known issues for CHTC Compute Systems, which we're still working on:

HPC Cluster is Completely Down
- We hope to restore full functionality by the end of the day
- All queued jobs will need to be resubmitted

HTC System is Mostly Back
- CHTC's HTCondor pool is up and running, but with depleted execute server capacity (~1/2) that *should* be restored by the end of the day
- CHTC's primary submit servers (submit-3/4/5) are back up and running
- Most group-specific submit servers managed by CHTC are still down, but *should* be restored today
- Any jobs in the queue on a submit server that may have been interrupted by the outage will be automatically re-run by HTCondor, but not until the submit server is up and running again.
Let us know if you notice any behavior inconsistent with the above.


We appreciate your patience and will provide updates as they become available!

Regards,
Your CHTC Team