Re: [HTCondor-users] jobprio isn't global?
- Date: Mon, 10 Mar 2014 14:26:30 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] jobprio isn't global?
Hi,
Yes, I think this is a pretty rare setup. Usually there's a wider distribution of job types and runtimes, which lets the jobs mix.
Unfortunately, building a scale test to look like a "real job mixture" is quite difficult.
Brian
On Mar 7, 2014, at 2:39 AM, Pek Daniel <pekdaniel@xxxxxxxxx> wrote:
> Thanks for the answer! But am I right in thinking that in real-life
> scenarios this kind of situation practically never happens, because
> there are usually plenty of users with ever-changing priorities,
> which naturally distinguish between jobs on different schedds, so
> the load is basically spread across schedds during negotiation?
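Daniel's hypothesis can be illustrated with a toy model. This is a sketch under stated assumptions, not HTCondor's negotiator: each hypothetical user submits from their own schedd, each cycle the negotiator serves only the user with the numerically lowest priority value (in HTCondor a lower value is the better priority), and a user's value grows with the slots they receive. Under those assumptions, shifting priorities rotate service across schedds.

```python
# Toy model (assumption, heavily simplified): one user per schedd, one user
# served per cycle, and "usage" standing in for the user priority value
# (lower is better). Not HTCondor's actual negotiation algorithm.
def simulate(users: dict[str, str], cycles: int, slots_per_cycle: int) -> list[str]:
    usage = {u: 0 for u in users}
    served_schedds = []
    for _ in range(cycles):
        # Best (lowest) priority value wins; ties broken by name for determinism.
        user = min(users, key=lambda u: (usage[u], u))
        usage[user] += slots_per_cycle  # receiving slots worsens the priority
        served_schedds.append(users[user])
    return served_schedds


# Hypothetical users and schedd names.
users = {"alice": "schedd01", "bob": "schedd02", "carol": "schedd03"}
print(simulate(users, cycles=6, slots_per_cycle=100))
# ['schedd01', 'schedd02', 'schedd03', 'schedd01', 'schedd02', 'schedd03']
```

With a single user, as in Daniel's test, there is nothing to rotate between, which is consistent with the sequential behaviour he reported.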
>
> 2014-03-06 22:36 GMT+01:00 Brian Bockelman <bbockelm@xxxxxxxxxxx>:
>>
>> On Mar 6, 2014, at 10:42 AM, Pek Daniel <pekdaniel@xxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> First, I turned off the negotiator.
>>>
>>> I submitted 80,000 identical jobs from 10 schedd nodes as the
>>> same user, 8,000 jobs per schedd.
>>>
>>> Afterwards, I turned the negotiator back on.
>>>
>>> I noticed that during negotiation, all of the jobs from a single
>>> schedd are dispatched first, then all of the jobs from the second
>>> one, and so on, sequentially.
>>>
>>> Then I tried to work around this a bit by assigning a randomized
>>> JobPrio to every job (0-1000000) with the 'priority' submit file
>>> command. I saw the same behaviour.
>>>
>>> I can imagine two explanations:
>>> - jobprio is local to a specific schedd, and doesn't have any effect
>>> on the order of dispatching across different schedds.
>>>
>>> - jobprio is ignored for some reason, maybe because of a global
>>> setting that overrides it.
>>> The configuration settings that, in my opinion, could affect this:
>>>
>>> ## When is this machine willing to start a job?
>>> START = TRUE
>>>
>>> ## We don't want preemption ever to be used
>>> PREEMPT = FALSE
>>> SUSPEND = FALSE
>>> KILL = FALSE
>>> PREEMPTION_REQUIREMENTS = FALSE
>>> NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
>>> RANK = 0
>>>
>>> Any idea what could cause this, and how to work around the original problem?
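The first explanation (JobPrio only orders jobs within a single schedd) can be sketched as a toy model. It is an assumption for illustration, not HTCondor's code: the hypothetical `local_order` drains schedds in a fixed order and sorts by JobPrio only inside each one, while `global_order` sorts all 80,000 jobs together, which is roughly the cross-schedd behaviour Daniel expected. `schedd_switches` counts how often consecutive matches come from different schedds.

```python
import random

# Toy model (an assumption, not HTCondor's implementation): each schedd orders
# its own queue by JobPrio (higher first), but the negotiator visits schedds in
# a fixed order, so JobPrio never moves a job ahead of jobs on an earlier schedd.
random.seed(1)
NUM_SCHEDDS, JOBS_PER_SCHEDD = 10, 8000

# Each job is a (jobprio, schedd) tuple; JobPrio drawn from 0..1000000 as in the test.
queues = {
    s: [(random.randint(0, 1_000_000), s) for _ in range(JOBS_PER_SCHEDD)]
    for s in range(NUM_SCHEDDS)
}


def local_order(queues: dict[int, list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """JobPrio honoured only inside each schedd; schedds drained one after another."""
    order = []
    for s in sorted(queues):
        order.extend(sorted(queues[s], reverse=True))
    return order


def global_order(queues: dict[int, list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """JobPrio compared across every schedd's queue at once."""
    return sorted((job for q in queues.values() for job in q), reverse=True)


def schedd_switches(order: list[tuple[int, int]]) -> int:
    """Count consecutive matches that come from different schedds."""
    return sum(1 for a, b in zip(order, order[1:]) if a[1] != b[1])


print(len(local_order(queues)))               # 80000 (10 schedds x 8000 jobs)
print(schedd_switches(local_order(queues)))   # 9: one switch per schedd boundary
print(schedd_switches(global_order(queues)))  # far more than 9: schedds interleave
```

In the local model, randomizing JobPrio reshuffles jobs within each schedd but leaves the schedd-by-schedd pattern untouched, matching what Daniel observed.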
>>>
>>
>> Hi Daniel,
>>
>> It's a bit of an undocumented knob, but you can set:
>>
>> USE_GLOBAL_JOB_PRIOS = true
>>
>> in the configuration of both the negotiator and the schedds, IIRC.
>>
>> You're not going to be very happy though - that protocol was designed to work with dozens of different priorities, not tens of thousands. I bet it won't work well (would love to hear I'm wrong!).
>>
>> No clever ideas on how to randomly order the schedd list. I suppose one could argue that if the jobs have identical priority, then HTCondor can do whatever it wants.
>>
>> However, I also suspect this would be pretty easy to make configurable.
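One way the schedd ordering could be made configurable is to shuffle the visiting order each cycle when priorities are equal. This sketch is purely hypothetical: it makes no claim that HTCondor has such an option, and the `randomize` flag, the `schedd_visit_order` function, and the schedd hostnames are invented for illustration.

```python
import random


# Hypothetical sketch (not an existing HTCondor option): when jobs have
# identical priority, optionally shuffle the order in which schedds are
# visited each negotiation cycle instead of always using a fixed order.
def schedd_visit_order(schedds: list[str], randomize: bool, rng: random.Random) -> list[str]:
    order = sorted(schedds)   # deterministic default order
    if randomize:
        rng.shuffle(order)    # tie-break with a random permutation
    return order


schedds = [f"schedd{i:02d}.example.org" for i in range(10)]  # hypothetical hosts
rng = random.Random(42)

# Over many cycles, every schedd should get a turn at being visited first.
first_visited = [schedd_visit_order(schedds, True, rng)[0] for _ in range(1000)]
print(len(set(first_visited)))                     # 10
print(schedd_visit_order(schedds, False, rng)[0])  # schedd00.example.org
```

Keeping the sorted order as the default preserves today's deterministic behaviour; the shuffle only changes which schedd is served first when nothing else distinguishes the jobs.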
>>
>> What do others think?
>>
>> Brian
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/