Cheers,
Szabolcs
On Fri, Dec 2, 2016 at 4:35 PM, Szabolcs Horvátth
<szabolcs.horvatth@xxxxxxxxx> wrote:
Hi Michael,
I tried setting NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = True (it's
probably the longest Condor attr I ever set! :)) and set the group
quota to a huge number, but it did not really affect the speed of
matching empty slots to high-priority post-process jobs. I still
suspect that there are some claims and timeouts that delay the
matchmaking.
Cheers,
Szabolcs
On Thu, Dec 1, 2016 at 7:27 PM, Michael Pelletier
<Michael.V.Pelletier@xxxxxxxxxxxx> wrote:
While pondering this question, I found what looks like the
information you need on page 334 of the 8.4.9 manual – in effect
you want a “strict priority” policy for the post-processing DAG
nodes:

One possible group quota policy is strict priority. For example,
a site prefers physics users to match as many slots as they can,
and only when all the physics jobs are running, and idle slots
remain, are chemistry jobs allowed to run. The default "starvation
group order" can be used to implement this. By setting
configuration variable NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION to
True, and setting the physics quota to a number so large that it
cannot ever be met, such as one million, the physics group will
always be the "most starving" group, will always negotiate first,
and will always be unable to meet the quota. Only when all the
physics jobs are running will the chemistry jobs then run.
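
To make the "most starving" idea concrete, here is a rough sketch of how an ordering by starvation could work. It assumes, as a simplification, that groups are sorted by slots in use divided by quota, lowest ratio first; the Group class, the function names, and the numbers are all hypothetical and are not part of any HTCondor API.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    quota: float  # configured group quota, in slots (hypothetical values)
    in_use: int   # slots the group currently occupies


def starvation_ratio(group: Group) -> float:
    """Assumed metric: fraction of the quota already in use."""
    if group.quota <= 0:
        # Groups without a positive quota are treated as least starving.
        return float("inf")
    return group.in_use / group.quota


def negotiation_order(groups: list[Group]) -> list[Group]:
    """Most starving group (lowest usage/quota ratio) negotiates first."""
    return sorted(groups, key=starvation_ratio)


groups = [
    Group("chemistry", quota=100, in_use=10),
    Group("physics", quota=1_000_000, in_use=500),
]

for g in negotiation_order(groups):
    print(g.name)
```

With a quota of one million, the physics ratio stays close to zero however many slots the group holds, so under this assumption it always sorts ahead of chemistry.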
Your post-job is equivalent to “physics” and everything else is
equivalent to “chemistry,” I think.

-Michael Pelletier.
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Szabolcs Horvátth
Sent: Thursday, December 01, 2016 12:07 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Execute last DAGMan job as soon as possible
It turned out that we modified the default prio factor to 10
(before the condor default switched to 1000) so I changed all
users' priority factors to 1000 and set the urgent group's
priority to 1. It did help in shortening the process of the jobs
grabbing free slots, but it still takes between 10 and 15 minutes
to do so. What's interesting is that after these ten minutes lots
of slots are allocated to the group, so there is obviously
something affected by the group priority. There might be some
unintentional claim / timeout setting behind all this but I
don't know what to look for.
My main gripe is this: why do the jobs wait for minutes when the
jobs' machine rank is the highest in the pool, the group
priority factor is the lowest, the job priority is also high,
PRIORITY_HALFLIFE = 1 so the amount of resources used should not
matter, and there *are* free slots that get matched to other
users?
Cheers,
Szabolcs
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/