Hi Michael,
I tried setting NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = True (it's probably the longest Condor attr I ever set! :)) and set the group quota to a huge number, but it did not really
affect the speed of matching empty slots to high-priority post-processing jobs. I still suspect that some claims or timeouts are delaying the matchmaking.
Cheers,
Szabolcs

On Thu, Dec 1, 2016 at 7:27 PM, Michael Pelletier <Michael.V.Pelletier@raytheon.com> wrote:

While pondering this question, I found what looks like the information you need on page 334 of the 8.4.9 manual. In effect you want a "strict priority" policy for the post-processing DAG nodes:
One possible group quota policy is strict priority. For example, a site prefers physics users to match as many
slots as they can, and only when all the physics jobs are running, and idle slots remain, are chemistry jobs allowed
to run. The default "starvation group order" can be used to implement this. By setting configuration variable
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION to True, and setting the physics quota to a number so large that it
cannot ever be met, such as one million, the physics group will always be the "most starving" group, will
always negotiate first, and will always be unable to meet the quota. Only when all the physics jobs are running will
the chemistry jobs then run.
Your post-job is equivalent to "physics" and everything else is equivalent to "chemistry," I think.
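As a rough sketch only, the quoted policy might translate into negotiator configuration along these lines. The group names group_postprocess and group_other and the quota of 100 are hypothetical placeholders, not values from this thread; GROUP_NAMES, GROUP_QUOTA_<groupname> and NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION are the documented configuration variables.

```
# Hypothetical accounting groups: one for the post-processing DAG nodes
# ("physics" in the manual's example), one for everything else ("chemistry").
GROUP_NAMES = group_postprocess, group_other

# A quota so large it can never be met, so this group is always the
# "most starving" one and negotiates first.
GROUP_QUOTA_group_postprocess = 1000000

# Assumed placeholder quota for all other work; pick a value that fits the pool.
GROUP_QUOTA_group_other = 100

# Allow the sum of the group quotas to exceed the number of slots in the pool.
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = True
```

The post-processing jobs would also have to be submitted under that group, presumably with something like accounting_group = group_postprocess in the submit description, and the negotiator would need a condor_reconfig to pick up the change.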
-Michael Pelletier.
From: HTCondor-users [mailto:htcondor-users-bounces@cs.wisc.edu] On Behalf Of Szabolcs Horvátth
Sent: Thursday, December 01, 2016 12:07 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Execute last DAGMan job as soon as possible
It turned out that we had changed the default priority factor to 10 (before the Condor default switched to 1000), so I changed every user's priority factor to 1000 and set the urgent group's priority to 1. It did shorten the time the jobs take to grab free slots, but that still takes 10 to 15 minutes. What's interesting is that after these ten minutes lots of slots are allocated to the group, so the group priority is obviously affecting something. There might be some unintentional claim or timeout setting behind all this, but I don't know what to look for.
My main gripe is this: why do the jobs wait for minutes when the jobs' machine rank is the highest in the pool, the group priority factor is the lowest, the job priority is also high, PRIORITY_HALFLIFE = 1 so the amount of resources used should not matter, and there *are* free slots that get matched to other users?
Cheers,
Szabolcs
_________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxx.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/