
Re: [HTCondor-users] Limiting jobs to two days



Hi,


I was just trying to implement something similar. In the past I went the job transform route, adding periodic_remove and periodic_release for a small group of people.

However, I am now planning to implement this for a number of nodes and considered going the SYSTEM_PERIODIC_VACATE route instead, since it seemed purpose-built.
That does not seem to do anything, though, which makes sense looking at the codebase: m_sys_periodic_vacates is only filled, but never handled anywhere: https://github.com/htcondor/htcondor/blob/e1fcc751b2272449d3faeedbe16dd0f8a2395f52/src/condor_utils/user_job_policy.cpp#L493


So, assuming we want a periodic vacate after N hours before this is actually implemented, we could add PeriodicVacate (which does work) as part of a job transform.
However, I am wondering how robust this approach is: would a user be able to simply remove this entry with condor_qedit?
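
For illustration, the transform idea above might look roughly like the following access point configuration. This is an untested sketch: the transform name LimitRunTime, the 48-hour threshold, and the use of EnteredCurrentStatus to measure run time are assumptions, not something stated in this thread. The last setting relates to the condor_qedit question: PROTECTED_JOB_ATTRS is documented as limiting changes to the queue super-user, but whether it behaves as hoped for a transform-set PeriodicVacate would need to be verified.

```
# Hypothetical transform name; appended to any existing list.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) LimitRunTime

JOB_TRANSFORM_LimitRunTime @=end
   # Assumption: vacate once the job has been in the Running state
   # (JobStatus == 2) for more than 48 hours.
   SET PeriodicVacate = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > (48 * 3600))
@end

# Assumption: listing the attribute here keeps ordinary users from
# changing it with condor_qedit; verify on your HTCondor version.
PROTECTED_JOB_ATTRS = $(PROTECTED_JOB_ATTRS) PeriodicVacate
```

Note that SET overwrites any periodic_vacate expression the user supplied at submit time, which may or may not be what you want.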


Thanks,
- Joachim


On 24.03.26 at 21:27, Cole Bollig via HTCondor-users wrote:
Hi Gerard,

If you want to control this logic from the Access Point (AP), you would use SYSTEM_PERIODIC_VACATE to kick any jobs exceeding the desired execute time and let them go back into the queue for matchmaking. Here in our local CHTC pool we enforce the maximum execution time on the Execution Point (EP) side. It would take some time to dig that configuration out and strip out the CHTC pool specifics, but it is based on this 2015 HTC presentation.
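
As a rough sketch of the AP-side approach described above, a two-day limit might be written as follows. The expression is an assumption (it treats time spent in the Running state as the execute time), and the reply at the top of this thread reports that SYSTEM_PERIODIC_VACATE did not appear to take effect when tried, so treat this as a starting point rather than a known-good configuration.

```
# Assumption: vacate jobs that have been Running (JobStatus == 2)
# for more than two days so they return to the queue for rematching.
SYSTEM_PERIODIC_VACATE = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > (2 * 24 * 3600))
```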

-Cole Bollig


From: Weatherby,Gerard <gweatherby@xxxxxxxx>
Sent: Tuesday, March 24, 2026 1:40 PM
To: Cole Bollig <cabollig@xxxxxxxx>; HTCondor Users <htcondor-users@xxxxxxxxxxx>
Subject: Re: Limiting jobs to two days

We want the job to release the scarce resource on the EP (the GPUs) and let other jobs that have been waiting have a turn. Ideally, the job would get back in line. (We will be urging our users to checkpoint their jobs).


From: Cole Bollig <cabollig@xxxxxxxx>
Date: Tuesday, March 24, 2026 at 1:51 PM
To: HTCondor Users <htcondor-users@xxxxxxxxxxx>
Cc: Weatherby,Gerard <gweatherby@xxxxxxxx>
Subject: Re: Limiting jobs to two days

Hi Gerard,


-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Weatherby,Gerard via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, March 24, 2026 12:24 PM
To: HTCondor Users <htcondor-users@xxxxxxxxxxx>
Cc: gweatherby@xxxxxxxx <gweatherby@xxxxxxxx>
Subject: [HTCondor-users] Limiting jobs to two days

We want to limit user jobs to two days to allocate resources more fairly. We're asking users to checkpoint their jobs if they are going to run longer than that.

It's not clear which SYSTEM_PERIODIC_ we should set to best implement this.


-----------------------------------

GERARD WEATHERBY
Application Architect
NMRhub
nmrhub.org

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
--

Joachim Meyer
HPC Coordination & Support

Universität des Saarlandes FR Informatik | HPC

Postal address: Postfach 15 11 50 | 66041 Saarbrücken

Visitor address: Campus E1 3 | Room 4.03 66123 Saarbrücken

T: +49 681 302-57522 jmeyer@xxxxxxxxxxxxxxxxxx www.uni-saarland.de