Full HPC and Partial HTC Outages Nov 18

Date:	Wed, 10 Nov 2021 15:43:39 +0000
From:	chtc-users@xxxxxxxxxxx
Subject:	Full HPC and Partial HTC Outages Nov 18 - Nov 22

Greetings,

Due to just-confirmed maintenance for the cooling infrastructure in one of CHTC’s server rooms, we will experience full HPC Cluster and partial HTC System outages beginning in the afternoon on Thursday, November 18, with service being restored by Monday, November 22.

Impacts to the HPC Cluster

All hardware (head nodes, execute nodes, storage) in the HPC cluster will be powered down during the planned outage.

To prevent HPC Cluster jobs from being interrupted by the downtime, we will begin draining the nodes one week prior to the downtime. Jobs whose requested time would extend past the start of the November 18 downtime will be accepted into the queue, but will not run until after the cluster is back up. Jobs can still run on the cluster in the week before the downtime, IF their time request (“--time=” in the submit file) indicates that they will complete before the morning of November 18.
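
As a rough illustration of the time-request rule above, the following Python sketch estimates the longest "--time=" value that would still finish before the downtime. It is only a sketch under stated assumptions: the announcement says "the morning of November 18" without giving an exact hour, so the 08:00 cutoff below is a hypothetical placeholder, and the function names are made up for this example. The D-HH:MM:SS string is one of the time formats Slurm's "--time" option accepts, assuming the HPC Cluster scheduler is Slurm.

```python
from datetime import datetime, timedelta
from typing import Optional

# ASSUMPTION: the announcement only says "the morning of November 18".
# 08:00 is a placeholder for illustration, not an official cutoff.
ASSUMED_CUTOFF = datetime(2021, 11, 18, 8, 0, 0)


def max_time_request(start: datetime,
                     cutoff: datetime = ASSUMED_CUTOFF) -> Optional[str]:
    """Longest time request (D-HH:MM:SS) that ends by the cutoff.

    Returns None if the job would start at or after the cutoff.
    """
    remaining = cutoff - start
    if remaining <= timedelta(0):
        return None
    total_seconds = int(remaining.total_seconds())
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def fits_before_downtime(start: datetime, requested: timedelta,
                         cutoff: datetime = ASSUMED_CUTOFF) -> bool:
    """True if a job starting at `start` with this request ends by the cutoff."""
    return start + requested <= cutoff


if __name__ == "__main__":
    start = datetime(2021, 11, 15, 12, 0, 0)
    print(max_time_request(start))
    print(fits_before_downtime(start, timedelta(days=3)))
```

Note that the relevant start time is when the scheduler actually launches the job, not when it is submitted, so a request with some margin below the computed maximum is more likely to run before the downtime.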

Impacts to the HTC System

The following components of the HTC system will be powered down during the outage:

a subset of HTC execute nodes
the following submit servers may go down (and would likely be inaccessible through Nov 22), but we hope to keep them up: submit2.chtc.wisc.edu, submit3.chtc.wisc.edu, learn.chtc.wisc.edu

Jobs on the affected submit servers and execute servers will be interrupted when those servers go down, but they will remain in the queue to run again once the submit servers are back up. Otherwise, HTC users should not be impacted by this outage.

The exact dates of the outage may shift. We realize this is somewhat short notice, and we plan to provide a reminder or update at least one day prior to the start of the downtime.

Please contact us at chtc@xxxxxxxxxxx with any questions or concerns.

Best,

Your CHTC team

CARE OF:

Lauren Michael - Research Computing Facilitator, Center for High Throughput Computing, University of Wisconsin - Madison

Research Facilitation Lead, Open Science Grid; co-PI, PATh; co-PI, CaRCC

lmichael@xxxxxxxx, go.wisc.edu/lmcal, Discovery 2262, (608)316-4430, she/her
