HTC System to hold jobs over-using disk as of 2019-01-23


Date: Fri, 18 Jan 2019 11:15:23 -0600
From: chtc-users@xxxxxxxxxxx
Subject: HTC System to hold jobs over-using disk as of 2019-01-23
For users of CHTC's HTC System:
(users of the HPC Cluster can ignore)

As we continue to improve CHTC compute systems, we will be implementing a new HTCondor policy to place jobs on hold for using more 'disk' (file space used within a running job) than specified by the user's submit file under "request_disk". This change will take effect on or after next Wednesday, January 23, such that users with jobs that use more disk than requested will see their jobs placed on hold with a corresponding hold reason in the HTCondor job log file. This change will help to make sure that the jobs of some users are not running a server out of disk space and thereby affecting the jobs of other users running on the same server, a problem that has become increasingly common over the last year.
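
For illustration, here is a minimal Python sketch of the comparison this policy describes. It assumes HTCondor's submit-file conventions for request_disk: a bare number is taken as KiB, and the suffixes K/KB, M/MB, G/GB and T/TB are accepted. The function names are hypothetical, and the actual hold expression used on CHTC's servers may differ (for example, in how it rounds):

```python
import re

# Assumed request_disk unit suffixes (case-insensitive), expressed in KiB.
# A bare number with no suffix is treated as KiB.
_UNITS_KIB = {"": 1, "K": 1, "KB": 1, "M": 1024, "MB": 1024,
              "G": 1024**2, "GB": 1024**2, "T": 1024**3, "TB": 1024**3}


def request_disk_kib(value: str) -> int:
    """Convert a request_disk string such as '2GB' or '500000' to KiB."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized request_disk value: {value!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in _UNITS_KIB:
        raise ValueError(f"unknown request_disk unit: {unit!r}")
    return int(float(number) * _UNITS_KIB[unit])


def would_be_held(disk_used_kib: int, request_disk: str) -> bool:
    """Hypothetical policy check: hold when disk usage exceeds the request."""
    return disk_used_kib > request_disk_kib(request_disk)
```

With "request_disk = 2GB" (2097152 KiB), a job that grows to 3,000,000 KiB of files would be held, while one that stays at 1,500,000 KiB would keep running.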

As with memory over-use, users will need to modify the disk request values for any held jobs before resubmitting. As always, we recommend that you review the log files of your recent jobs and new tests before deciding on memory and disk requests, so that your jobs request values only slightly above what you expect them to use. It's almost always worth testing a few jobs for these values before submitting many jobs, in order to avoid delays caused by holds for over-use or by longer queue wait times for jobs requesting much more than they truly need.
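
As a rough illustration of "slightly above what you expect", the Python sketch below takes the peak disk usage (in KiB) reported by a few test jobs and adds a margin before rounding up to whole MiB. The 20% default margin and the function name are hypothetical choices for this example, not a CHTC rule:

```python
from typing import Sequence


def suggest_request_disk(observed_kib: Sequence[int], headroom_pct: int = 20) -> str:
    """Suggest a request_disk line from test jobs' disk usage (KiB).

    headroom_pct is a hypothetical safety margin; the result is the peak
    usage plus that margin, rounded up to whole MiB.
    """
    if not observed_kib:
        raise ValueError("need disk usage from at least one test job")
    peak = max(observed_kib)
    # Integer ceiling division: peak * (1 + pct/100) / 1024, rounded up.
    target_mib = -(-peak * (100 + headroom_pct) // (100 * 1024))
    return f"request_disk = {target_mib}MB"
```

For three test jobs that used 800,000, 950,000 and 1,000,000 KiB, this suggests "request_disk = 1172MB". Basing the margin on the largest test job, rather than the average, keeps the occasional heavier job from being held.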

If you have any questions about how to choose the 'right' request for memory or disk for your HTC jobs, we're very happy to help! Just send an email to chtc@xxxxxxxxxxx to get in touch.

Thank you,
CHTC's Research Computing Facilitators