Reminder: Interruption to certain HTC services on Monday, February 25


Date: Fri, 22 Feb 2019 09:55:39 -0600
From: chtc-users@xxxxxxxxxxx
Subject: Reminder: Interruption to certain HTC services on Monday, February 25

Greetings CHTC users,

This message is for users of our high-throughput computing (HTC) system.

Reminder: on Monday, February 25, there will be a scheduled downtime that affects one of our server rooms, interrupting certain services that are part of the HTC system.

Affected services are listed in the email copied below; an update to the implications for HTC jobs is listed here:
  • Over the weekend, we will limit the types of jobs that can run on the affected execute servers to those that specify "WantFlocking" or "WantGlidein", so that more long-running jobs will not be interrupted when the downtime begins. Jobs not already specifying "WantFlocking" or "WantGlidein" will experience less throughput because they'll only run on the portion of CHTC servers that are unaffected by maintenance.
  • Any jobs running on the high-memory servers will be evicted when the maintenance window begins on Monday, February 25. Evicted jobs will remain in the queue and be re-run automatically.
  • Jobs submitted from submit-1 or a research group-owned submit server may be interrupted if they are running when the maintenance window begins on Monday, February 25. These jobs will remain in the queue and be re-run automatically.
  • Gluster-dependent jobs that haven't started by the morning of Feb 22 (today) will not run until after the downtime completes on Feb 25.
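
For readers unsure how a job specifies "WantFlocking" or "WantGlidein", the fragment below is a minimal sketch of an HTCondor submit file that sets both. It assumes the usual convention of declaring them as custom job attributes with a leading "+"; the message above does not spell out the syntax, and the executable name and resource requests are hypothetical placeholders.

```
# Hypothetical submit file: the executable and resource values are placeholders.
executable     = my_job.sh
request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB

# Assumed syntax: custom job attributes that opt the job in to
# flocking and glidein resources.
+WantFlocking = true
+WantGlidein  = true

queue 1
```

Jobs that opt in this way may run on servers outside CHTC, so it is worth confirming that a job is suited to that before adding these lines.
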
Email us (chtc@xxxxxxxxxxx) with any questions.

Cheers,
Your CHTC team

---------- Forwarded message ---------
From: <chtc-users@xxxxxxxxxxx>
Date: Mon, Feb 18, 2019 at 1:36 PM
Subject: Today's downtime rescheduled to Monday, February 25
To: chtc-users <chtc-users@xxxxxxxxxxx>

Hi CHTC users,

This message is for users of our high-throughput computing (HTC) system.

Due to the last-minute discovery that today's planned downtime would affect more HTC services than anticipated, we are pushing it back a week to next Monday, February 25.

The following resources will be unavailable or taken down during the downtime:
  • submit-1 and most research group-owned submit servers
  • the high-memory servers
  • the Gluster file share and transfer server
  • about half of our execute servers
Implications for HTC jobs:
  • Any jobs running on the affected execute servers (including the high-memory servers) will be evicted when the maintenance window begins on Monday, February 25. Evicted jobs will remain in the queue and be re-run automatically.
  • Jobs submitted from submit-1 or a research group-owned submit server may be interrupted if they are running when the maintenance window begins on Monday, February 25. These jobs will also remain in the queue and be re-run automatically.
  • Gluster-dependent jobs that haven't started by the morning of Feb 22 will not run until after the downtime completes on Feb 25.
Get in touch with us at chtc@xxxxxxxxxxx with any questions or concerns.

Cheers,
Your CHTC team
