HTC Gluster file system outage (as of 8/28)


Date: Wed, 28 Aug 2019 18:08:52 -0500
From: chtc-users@xxxxxxxxxxx
Subject: HTC Gluster file system outage (as of 8/28)

Greetings CHTC users,

This message is for users of our HTC system who use our Gluster file system.

An unexpected network interruption occurred in the server room that houses our Gluster file system. The interruption has caused problems with how Gluster connects to CHTC servers and how it handles quotas, so Gluster is currently unreliable (and in some cases unavailable) for running jobs.

We cannot fully address these issues today. For now, HTC jobs that use Gluster will be affected as follows:
  • Idle jobs: We have configured HTCondor to stop matching new Gluster-requiring jobs as of 6:00pm today. Jobs requiring Gluster will stay idle until Gluster is fixed and back to normal.
  • Running jobs: We expect further interruptions to Gluster's availability as we fix it tomorrow, so be aware that currently running jobs that depend on Gluster are unlikely to complete successfully. Jobs that were running at the time of the network interruption may have already failed.

We will send an update tomorrow once we have a better idea of when Gluster will be back to normal and Gluster-dependent jobs can resume running.

Thanks for your patience! Get in touch with us at chtc@xxxxxxxxxxx with any questions.

Best,
Your CHTC team