Re: [HTCondor-users] Configuring preemption based on JobPrio



Thanks, I tried this configuration yesterday. It gives undesired results.

The first few batches were submitted with JobPrio from 1 to 20. As expected, the jobs with priority 4 to 20 started running, after which the node ran out of resources.

I then submitted new batches with JobPrio from 11 to 15. Ideally this should have preempted the jobs with priority 4, 5, 6, 7 and 8, but it doesn't. Instead it picks jobs with higher JobPrio for preemption, and jobs from both the old and the new batch keep getting preempted.
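
To make the expectation above concrete, here is a minimal Python sketch of the victim selection I was hoping for. It is only a model of the desired policy, not of what HTCondor actually does: it assumes every job needs one equally sized slot, and the function name pick_victims is made up for this illustration.

```python
def pick_victims(running, incoming):
    """Return the JobPrio values of running jobs that should be evicted.

    Assumed policy (hypothetical, for illustration only): each incoming job
    may evict the lowest-priority running job, but only if that job's
    JobPrio is strictly lower than its own.
    """
    candidates = sorted(running)          # lowest JobPrio first
    victims = []
    # Serve the highest-priority incoming jobs first.
    for new_prio in sorted(incoming, reverse=True):
        if candidates and candidates[0] < new_prio:
            victims.append(candidates.pop(0))
    return victims


running = list(range(4, 21))     # first batch: JobPrio 4..20 is running
incoming = list(range(11, 16))   # second batch: JobPrio 11..15 is idle
print(pick_victims(running, incoming))   # [4, 5, 6, 7, 8]
```

The sketch ignores retirement time and does not put the newly started jobs back into the running set, so it only describes the first round of evictions.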


$ condor_q -af:h globaljobid jobprio jobruncount jobstatus
globaljobid                              jobprio jobruncount jobstatus
testmachine.example.com#503.0#1747403832 1       undefined   1
testmachine.example.com#504.0#1747403832 2       undefined   1
testmachine.example.com#505.0#1747403832 3       undefined   1
testmachine.example.com#506.0#1747403832 4       8           2
testmachine.example.com#507.0#1747403832 5       1           2
testmachine.example.com#508.0#1747403832 6       1           2
testmachine.example.com#509.0#1747403832 7       1           2
testmachine.example.com#510.0#1747403832 8       1           2
testmachine.example.com#511.0#1747403832 9       1           2
testmachine.example.com#512.0#1747403832 10      1           2
testmachine.example.com#513.0#1747403832 11      1           2
testmachine.example.com#514.0#1747403832 12      1           1
testmachine.example.com#515.0#1747403832 13      1           1
testmachine.example.com#516.0#1747403832 14      7           2
testmachine.example.com#517.0#1747403832 15      8           1
testmachine.example.com#518.0#1747403832 16      10          2
testmachine.example.com#519.0#1747403832 17      10          2
testmachine.example.com#520.0#1747403832 18      10          2
testmachine.example.com#521.0#1747403832 19      14          2
testmachine.example.com#522.0#1747403832 20      15          2
testmachine.example.com#523.0#1747404022 11      undefined   1
testmachine.example.com#524.0#1747404022 12      undefined   1
testmachine.example.com#525.0#1747404022 13      undefined   1
testmachine.example.com#526.0#1747404022 14      9           2
testmachine.example.com#527.0#1747404022 15      11          2
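
To show where the churn is concentrated, the following Python sketch lists the JobPrio values whose jobruncount is above 1 in the condor_q output above. The numbers are copied by hand from that output, and the helper name restarted is made up for this illustration. I am assuming that a jobruncount above 1 means the job was evicted and restarted at least once, which may not hold if jobs restart for other reasons.

```python
# JobPrio -> jobruncount, copied from the condor_q output above.
# Rows with an undefined jobruncount (jobs that never started) are left out.
first_batch = {
    4: 8, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1,
    14: 7, 15: 8, 16: 10, 17: 10, 18: 10, 19: 14, 20: 15,
}
second_batch = {14: 9, 15: 11}


def restarted(batch, threshold=1):
    """Return the JobPrio values whose run count exceeds the threshold."""
    return sorted(prio for prio, runs in batch.items() if runs > threshold)


print(restarted(first_batch))    # [4, 14, 15, 16, 17, 18, 19, 20]
print(restarted(second_batch))   # [14, 15]
```

Apart from the job with priority 4, the repeated starts are all on jobs with priority 14 and above, while the jobs with priority 5 to 13 have started only once.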


Thanks & Regards,
Vikrant Aggarwal


On Fri, May 16, 2025 at 1:25 AM Carles Acosta <cacosta@xxxxxx> wrote:
Hi Vikrant,

Our site allows preemption by priority (not JobPrio) and by Rank in the STARTD under certain conditions. We define different RANK numbers depending on the job Owner, OwnerGroup, whatever. So, in your case, I think that you can do something like this on the STARTD side:

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS), JobPrio
RANK = -My.JobPrio
RetirementTime = 2 * $(MINUTE)
MAXJOBRETIREMENTTIME = $(RetirementTime)

Cheers,

Carles

On Thu, 15 May 2025 at 19:44, Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
I realized PREEMPT is not what I was supposed to use.

On worker node:

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS), JobPrio
PREEMPTION_REQUIREMENTS = ( My.JobPrio < Target.JobPrio )
PREEMPTION_RANK = -My.JobPrio
RetirementTime = 2 * $(MINUTE)
MAXJOBRETIREMENTTIME = $(RetirementTime)


It still doesn't preempt lower-priority jobs on the worker node.


Thanks & Regards,
Vikrant Aggarwal


On Thu, May 15, 2025 at 1:08 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
After going through a couple of presentations and the official docs, I realized that I don't need any other configuration to make PREEMPT work.

However, this minimal configuration on the worker node doesn't evict the jobs with lower priority.

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS), JobPrio
PREEMPT = Target.JobPrio > My.JobPrio



My requirement: a single user submits multiple jobs, and jobs with a high job priority should evict the running jobs with a low priority.


Thanks & Regards,
Vikrant Aggarwal


On Wed, May 14, 2025 at 4:46 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

Following settings on negotiator:

# condor_config_val ALLOW_PSLOT_PREEMPTION NEGOTIATOR_CONSIDER_EARLY_PREEMPTION NEGOTIATOR_CONSIDER_PREEMPTION PREEMPTION_RANK PREEMPTION_REQUIREMENTS
True
True
true
(RemoteUserPrio * 1000000) - ifThenElse(isUndefined(TotalJobRuntime), 0, TotalJobRuntime)
True


Following settings on worker node:

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS), JobPrio
ALLOW_PSLOT_PREEMPTION = True
PREEMPT = (Target.JobPrio > My.JobPrio)
SHUTDOWN_GRACEFUL_TIMEOUT = 1 * $(MINUTE)

I have jobs running and other jobs from the same user waiting with a higher JobPrio, but the waiting jobs can't preempt the running ones. Am I missing something?

# condor_who -af JobPrio | sort | uniq -c
   8 20
   8 200


Thanks & Regards,
Vikrant Aggarwal

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

Join us in June at Throughput Computing 25: https://osg-htc.org/htc25

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/


--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es