|
Hi Gerard,
My colleague reminded me this morning of the following two first-class JDL commands:
allowed_execute_duration (maximum execution time of one job epoch). While this command is intended to be used by users, the AP can set this limit via a submit transform to ensure that all jobs placed on that AP have a two-day limit:
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetTimeLimit
JOB_TRANSFORM_SetTimeLimit @=end
# Set 2 day limit for any jobs that don't define a max duration or define a duration greater than the two day limit
    REQUIREMENTS AllowedExecuteDuration =?= UNDEFINED || AllowedExecuteDuration > (2 * 24 * 60 * 60)
    EVALSET AllowedExecuteDuration (2 * 24 * 60 * 60)
@end
This should cause non-checkpointing jobs to go on hold with a nice message, while checkpointing jobs go back into the queue for further matchmaking. Note that in this sample configuration I am overwriting any user-defined execute duration greater than the desired limit (2 days). If you want this behavior to be less silent, you could move the second clause of the requirements into an explicit submit requirement, inverted, so that job placement fails if the user defines an allowed execution duration greater than the system's desired limit.
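A minimal sketch of that less silent variant follows. It assumes the transform is narrowed to fill in only a missing duration, and that the schedd applies job transforms before it checks submit requirements. The names SetTimeLimit and MaxDuration and the wording of the reason string are illustrative, so check the whole fragment against your HTCondor version before relying on it.

```
# Transform: only supply the two-day default when the user set nothing
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetTimeLimit
JOB_TRANSFORM_SetTimeLimit @=end
    REQUIREMENTS AllowedExecuteDuration =?= UNDEFINED
    EVALSET AllowedExecuteDuration (2 * 24 * 60 * 60)
@end

# Submit requirement (hypothetical name MaxDuration): reject jobs that ask for more than two days
SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) MaxDuration
SUBMIT_REQUIREMENT_MaxDuration = AllowedExecuteDuration =?= UNDEFINED || AllowedExecuteDuration <= (2 * 24 * 60 * 60)
SUBMIT_REQUIREMENT_MaxDuration_REASON = "allowed_execute_duration may not exceed two days (172800 seconds)"
```

With this arrangement a user who requests three days should see the submission fail with the reason string, rather than having the value quietly lowered to two days.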
Cheers,
Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, March 24, 2026 3:27 PM
To: gweatherby@xxxxxxxx <gweatherby@xxxxxxxx>; HTCondor Users <htcondor-users@xxxxxxxxxxx>
Cc: Cole Bollig <cabollig@xxxxxxxx>
Subject: Re: [HTCondor-users] Limiting jobs to two days
Hi Gerard,
If you want to control this logic from the Access Point (AP), you would want to use SYSTEM_PERIODIC_VACATE to kick any jobs exceeding the desired execute time and allow them to go back into the queue for matchmaking. Here in our local CHTC pool, we enforce the maximum execution time on the Execution Point side. It would take some time to dig that configuration out and strip out the CHTC pool specifics, but it is based on this 2015 HTC presentation.
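For the AP-side approach, a rough sketch of such an expression is shown here. It is an assumption, not the CHTC configuration: it treats JobStatus == 2 as "running" and uses EnteredCurrentStatus as the start of the current run, with the two-day threshold written out in seconds.

```
# Assumed AP (schedd) configuration: vacate any job that has been running for more than two days
SYSTEM_PERIODIC_VACATE = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > (2 * 24 * 60 * 60))
```

A vacated job returns to the idle state and is matched again, so a job that does not checkpoint would restart from the beginning each time it hits the limit.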
-Cole Bollig
From: Weatherby,Gerard <gweatherby@xxxxxxxx>
Sent: Tuesday, March 24, 2026 1:40 PM
To: Cole Bollig <cabollig@xxxxxxxx>; HTCondor Users <htcondor-users@xxxxxxxxxxx>
Subject: Re: Limiting jobs to two days
We want the job to release the scarce resource on the EP (the GPUs) and let other jobs that have been waiting have a turn. Ideally, the job would get back in line. (We will be urging our users to checkpoint their jobs).
|