CHTC maintenance complete; old HPC cluster still down


Date: Wed, 17 Jul 2024 21:51:28 +0000
From: chtc-users@xxxxxxxxxxx
Subject: CHTC maintenance complete; old HPC cluster still down

Hello CHTC users, 

 

The maintenance planned for this week is complete. Most services are back up, but access to the old HPC cluster (hpclogin3.chtc.wisc.edu and related partitions) remains unavailable. Our next steps for the old cluster are still being decided; please see the note below and contact us if you anticipate any significant impact on your work.

 

Restored services: 

  • HTC users of ap2001.chtc.wisc.edu and ap2002.chtc.wisc.edu should be able to log in and submit jobs.
    • More maintenance on ap2002 will be needed in the future.
  • HPC users can use the new cluster login node (spark-login.chtc.wisc.edu) and submit jobs to the partitions available from that node.

 

Ongoing outage: 

The old HPC cluster (hpclogin3.chtc.wisc.edu and job submission to its partitions) is still down due to hardware failures. We are evaluating next steps, including accelerating the shutdown of the old cluster and moving all old cluster worker nodes to the new cluster (accessed through spark-login.chtc.wisc.edu). 

Please contact chtc@xxxxxxxxxxx as soon as possible if this would cause significant problems for you, especially if you have upcoming deadlines.

 

We will continue providing updates for ongoing outages via our status page: https://status.chtc.wisc.edu/

 

Contact us at chtc@xxxxxxxxxxx with any questions or concerns. 

 

Best,

The CHTC Team

 
