Hi Joe,

Still can't make it work for some reason. I tried adding 'Rank = 1000000.0' to the submit file. condor_q -long does show the new rank on the job, but it still won't take precedence when all the other jobs are idle.

I tried adding 'DEDICATED_SCHEDULER_USE_FIFO = False' to the CM's config file, but nothing changed.

I also tried replacing 'RANK = Scheduler =?= $(DedicatedScheduler)' on the execute node with:

RANK = ("AcctGroupUser" == "pronto" * 1000000000000) + (Scheduler =?= $(DedicatedScheduler))

or simply:

RANK = ("AcctGroupUser" == "pronto" * 1000000000000)

and still nothing changed.

Finally, I also updated from 9.0.17 to 10.0.8. Other than all 3 jobs waiting longer in IDLE before the first one went back to RUN, it didn't seem to change anything. Somewhere during my tests I also tried 10.7.0, but then it was the second job that started running instead of the third when the first got preempted, and I'm not sure whether that was because the schedd suddenly kept crashing or something else...

Martin

From: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Hi Martin,

So, with parallel universe jobs things will work a little differently, because the parallel universe runs jobs FIFO. You will need to assign the job a rank within the submit file: add 'Rank = <floating_point_rank>', and the job with the higher rank should be run first when trying to match to a machine. Here's the documentation:
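As a minimal sketch of what that looks like in a parallel universe submit file (the executable, machine_count, and rank value here are just illustrative, not specific to your setup):

```
# Minimal parallel universe submit sketch -- values are illustrative.
universe      = parallel
executable    = /bin/sleep
arguments     = 300
machine_count = 2

# Jobs with a higher Rank should be preferred when matching to machines.
Rank          = 1000000.0

queue
```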
https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html?#about-requirements-and-rank

Best,
Joe

From: Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>

Hi Joseph,

Thanks for the quick reply! Will this work with parallel universe jobs (DedicatedScheduler)? I'm trying what you said right now and it doesn't seem to work.

Job 116 is from user "test". Job 117 is from user "test2". Job 118 is from user "test2", with "accounting_group_user = pronto" added to its submit file. All 3 jobs are parallel universe. Jobs are set to be preempted after running for 120 seconds (for quick testing purposes):

use POLICY: Preempt_if_Runtime_Exceeds( 120 )

Job 116 keeps going back to the running state after being preempted. I would have assumed Job 118 would start running instead. Is this about FIFO? If so, is there any way to change it?

Also, I have dynamic partitionable slots configured:

DedicatedScheduler = "DedicatedScheduler@sms1"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = Scheduler =?= $(DedicatedScheduler)
use FEATURE: PartitionableSlot( 1, auto )

Thanks!
Martin

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of JOSEPH RYAN REUSS via HTCondor-users

Hi Martin!

Condor assigns fair share by user, which is not necessarily a human, so let's create a high-priority user that a human can utilize so jobs can get high priority. You would need to set 'accounting_group_user = <some_user>' in your submit file to override the default user selected and charge the job to <some_user> instead. You can then set the priority of that user by running 'condor_userprio -setfactor <some_user> <priority number>' on the AP you are submitting the job from. Here's the documentation for reference:
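As a sketch, assuming a high-priority accounting user named "pronto" (the name and factor value are just examples):

```
# In the submit file: charge this job's usage to a dedicated
# high-priority accounting user ("pronto" is an example name).
accounting_group_user = pronto
```

Then, on the access point, something like 'condor_userprio -setfactor pronto 1' would give that user the best possible priority factor (a lower factor means better priority; 1 is the minimum).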
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>

Hi,

We sometimes have urgent jobs that we'd want to bypass all other jobs as soon as possible, something like a reversed nice_user (greedy_user?). Now that I know how to hold or preempt jobs with a time limit, I'd like a way for an urgent job to be put at the front of the queue, regardless of other users, fair share, priorities, weights, quotas, job universe, etc. The system would then wait for enough resources to be free and launch that job before every other regular job in the queue. Is there a configuration that could enable such behavior?

Thanks!
Martin