
Re: [HTCondor-users] Limiting jobs to two days



Hi Gerard,

My colleague just reminded me this morning of the first-class JDL command allowed_execute_duration (the maximum execution time of one job epoch). While this command is intended to be used by users, the AP can set the limit via a submit transform to ensure that all jobs placed on that AP have a two-day limit:

JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetTimeLimit
JOB_TRANSFORM_SetTimeLimit @=end
    # Set a 2 day limit for any job that doesn't define a max duration or defines a duration greater than the two day limit
    REQUIREMENTS AllowedExecuteDuration =?= UNDEFINED || AllowedExecuteDuration > (2 * 24 * 60 * 60)
    EVALSET AllowedExecuteDuration (2 * 24 * 60 * 60)
@end

This should cause non-checkpointing jobs to go on hold with a nice message, while checkpointing jobs go back into the queue for further matchmaking. Note that in this sample configuration I am overwriting any user-defined execute duration greater than the desired limit (2 days). If you wanted to make this behavior less silent, you could move the second clause of the requirements into an explicit submit requirement, inverted, so that job placement fails if the user defines an allowed execution duration greater than the system's desired limit.
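
A minimal sketch of that less silent variant, assuming the standard SUBMIT_REQUIREMENT_NAMES configuration family is available on your AP. The name MaxExecuteDuration and the wording of the reason string are placeholders of my own, and this is untested, so treat it as a starting point rather than a verified configuration:

```
# Reject placement when the user asks for more than two days.
# A submit requirement must evaluate to true for the job to be accepted,
# so this is the inverse of the second clause of the transform's REQUIREMENTS.
SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) MaxExecuteDuration
SUBMIT_REQUIREMENT_MaxExecuteDuration = AllowedExecuteDuration =?= UNDEFINED || AllowedExecuteDuration <= (2 * 24 * 60 * 60)
SUBMIT_REQUIREMENT_MaxExecuteDuration_REASON = "allowed_execute_duration may not exceed two days (172800 seconds)."
```

With this in place, the transform's REQUIREMENTS line would be reduced to its first clause (AllowedExecuteDuration =?= UNDEFINED), so it only fills in a default for jobs that did not set a duration.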

Cheers,
Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, March 24, 2026 3:27 PM
To: gweatherby@xxxxxxxx <gweatherby@xxxxxxxx>; HTCondor Users <htcondor-users@xxxxxxxxxxx>
Cc: Cole Bollig <cabollig@xxxxxxxx>
Subject: Re: [HTCondor-users] Limiting jobs to two days
 
Hi Gerard,

If you want to control this logic from the Access Point (AP), then you would want to use SYSTEM_PERIODIC_VACATE to kick any jobs exceeding the desired execute time and allow them to go back into the queue for matchmaking. Here in our local CHTC pool we enforce the maximum execution time on the Execution Point (EP) side of things. It would take some time to dig that configuration out and strip out the CHTC pool specifics, but it is based on this 2015 HTC presentation.
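
As a rough sketch only, an AP-side expression along these lines might look like the following. It assumes that EnteredCurrentStatus marks when the job last entered the running state (JobStatus == 2), and it has not been tested against a live pool:

```
# Vacate any job that has been running for more than two days (172800 seconds)
# so that it returns to the queue for matchmaking.
SYSTEM_PERIODIC_VACATE = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > (2 * 24 * 60 * 60))
```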

-Cole Bollig


From: Weatherby,Gerard <gweatherby@xxxxxxxx>
Sent: Tuesday, March 24, 2026 1:40 PM
To: Cole Bollig <cabollig@xxxxxxxx>; HTCondor Users <htcondor-users@xxxxxxxxxxx>
Subject: Re: Limiting jobs to two days
 
We want the job to release the scarce resource on the EP (the GPUs) and let other jobs that have been waiting have a turn. Ideally, the job would get back in line. (We will be urging our users to checkpoint their jobs).


From: Cole Bollig <cabollig@xxxxxxxx>
Date: Tuesday, March 24, 2026 at 1:51 PM
To: HTCondor Users <htcondor-users@xxxxxxxxxxx>
Cc: Weatherby,Gerard <gweatherby@xxxxxxxx>
Subject: Re: Limiting jobs to two days

 
Hi Gerard,


-Cole Bollig 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Weatherby,Gerard via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, March 24, 2026 12:24 PM
To: HTCondor Users <htcondor-users@xxxxxxxxxxx>
Cc: gweatherby@xxxxxxxx <gweatherby@xxxxxxxx>
Subject: [HTCondor-users] Limiting jobs to two days
 
We want to limit user jobs to two days to more fairly allocate resources. We're asking users to checkpoint their jobs if they are going to run longer than that.

It's not clear which SYSTEM_PERIODIC_ we should set to best implement this.


-----------------------------------

 

GERARD WEATHERBY

Application Architect

 

NMRhub

nmrhub.org

 
