Greetings,
Numerous servers in CHTC’s HTC System and the entirety of the HPC Cluster went down in a fairly disruptive way because of power outages to campus buildings during this afternoon’s storm. We are working to restore functionality and will
provide updates as we can. In the meantime, here are some general expectations:
All jobs running on the HPC Cluster and most running in the HTC System will have been interrupted. While queued jobs on the HTC System will remain in the queue to run again, interrupted jobs on the HPC Cluster may need to be resubmitted.
As we bring up HTC submit nodes and the HPC Cluster head nodes, users are welcome to log in and clean up incomplete data and remove jobs. However, please know that there may be additional interruptions (especially on the HPC Cluster) or
missing functionality as we ensure that servers are rebooted in a proper state. Additionally, it may take time to restore the full capacity of down execute nodes.
More updates to come.
Thank you,
Your CHTC Team
lmichael@xxxxxxxx,
go.wisc.edu/lmcal, Discovery 2262, (608)316-4430, she/her