Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] jobprio isn't global?

Date: Mon, 10 Mar 2014 14:26:30 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] jobprio isn't global?

Hi,

Yes - I think this is a pretty rare setup.  Usually there's a wider distribution of job types and runtimes, allowing things to mix.

Unfortunately, building a scale test to look like a "real job mixture" is quite difficult.

Brian

On Mar 7, 2014, at 2:39 AM, Pek Daniel <pekdaniel@xxxxxxxxx> wrote:

> Thanks for the answer! But am I right in thinking that in real-life
> scenarios this kind of situation practically never happens, because
> there are usually plenty of users whose ever-changing priorities
> naturally distinguish between jobs on different schedds, so the load
> is effectively spread across schedds during negotiation?
> 
> 2014-03-06 22:36 GMT+01:00 Brian Bockelman <bbockelm@xxxxxxxxxxx>:
>> 
>> On Mar 6, 2014, at 10:42 AM, Pek Daniel <pekdaniel@xxxxxxxxx> wrote:
>> 
>>> Hi,
>>> 
>>> First, I turned off the negotiator.
>>> 
>>> I submitted 80 000 identical jobs as the same user from 10 schedd
>>> nodes, 8000 jobs per schedd.
>>> 
>>> After that, I turned the negotiator back on.
>>> 
>>> I noticed that during negotiation, all of the jobs from a single
>>> schedd are dispatched first, then all of those from the second one,
>>> and so on, sequentially.
>>> 
>>> Then I tried a workaround: I assigned a randomized JobPrio
>>> (0-1000000) to every job with the 'priority' submit file command.
>>> I saw the same behaviour.
>>> 
>>> I can imagine two explanations:
>>> - jobprio is local to a specific schedd, and doesn't have any effect
>>> on the order of dispatching across different schedds.
>>> 
>>> - jobprio is ignored for some reason, maybe because a global setting
>>> overrides it.
>>> 
>>> The configuration settings that, in my opinion, could affect this:
>>> 
>>> ##  When is this machine willing to start a job?
>>> START = TRUE
>>> 
>>> ## We don't want preemption ever to be used
>>> PREEMPT = FALSE
>>> SUSPEND = FALSE
>>> KILL = FALSE
>>> PREEMPTION_REQUIREMENTS = FALSE
>>> NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
>>> RANK = 0
>>> 
>>> Any idea what can cause this, and how to circumvent the original problem?
>>> 
>> 
>> Hi Daniel,
>> 
>> It's a bit of an undocumented knob, but you can set:
>> 
>> USE_GLOBAL_JOB_PRIOS = true
>> 
>> in the configuration of the negotiator and the schedds, IIRC.
>> 
>> You're not going to be very happy though - that protocol was designed to work with dozens of different priorities, not tens of thousands.  I bet it won't work well (would love to hear I'm wrong!).
>> 
>> No clever ideas on how to randomly order the schedd list.  I suppose one could argue that if the jobs have identical priority, then HTCondor can do whatever it wants.
>> 
>> However, I also suspect this would be pretty easy to make configurable.
>> 
>> What do others think?
>> 
>> Brian
>> 
>> 
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> 
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/

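For readers of the archive, the difference between the two orderings discussed in this thread can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not HTCondor's negotiator code, the function names (make_queues, per_schedd_order, global_order, schedd_switches) are hypothetical, and it assumes a single user, enough free slots for every job, and a negotiator that simply walks the schedds one after another. The first ordering drains each schedd completely, using JobPrio only within that schedd. The second merges all jobs and sorts them by JobPrio, which is roughly the effect one would hope for from a global job priority setting.

```python
import random
from typing import Dict, List, Tuple

Job = Tuple[str, int]  # (schedd name, JobPrio); a toy stand-in for a job ad


def make_queues(num_schedds: int, jobs_per_schedd: int, seed: int = 0) -> Dict[str, List[int]]:
    """Build hypothetical schedd queues with random JobPrio values in 0..1000000."""
    rng = random.Random(seed)
    return {
        f"schedd{i}": [rng.randint(0, 1_000_000) for _ in range(jobs_per_schedd)]
        for i in range(num_schedds)
    }


def per_schedd_order(queues: Dict[str, List[int]]) -> List[Job]:
    """JobPrio is local: each schedd is drained fully before the next one."""
    order: List[Job] = []
    for name, prios in queues.items():
        for prio in sorted(prios, reverse=True):
            order.append((name, prio))
    return order


def global_order(queues: Dict[str, List[int]]) -> List[Job]:
    """JobPrio is global: all jobs are merged and sorted, highest first."""
    jobs = [(name, prio) for name, prios in queues.items() for prio in prios]
    return sorted(jobs, key=lambda job: job[1], reverse=True)


def schedd_switches(order: List[Job]) -> int:
    """Count how often consecutive dispatched jobs come from different schedds."""
    return sum(1 for a, b in zip(order, order[1:]) if a[0] != b[0])


if __name__ == "__main__":
    queues = make_queues(num_schedds=10, jobs_per_schedd=8)
    print("switches, per-schedd order:", schedd_switches(per_schedd_order(queues)))
    print("switches, global order:    ", schedd_switches(global_order(queues)))
```

In this model the per-schedd ordering changes schedd only when a queue is exhausted, which matches the sequential draining described in the original report, while the global ordering interleaves schedds according to the random priorities. It says nothing about the cost of doing this with tens of thousands of distinct priorities, which is the concern raised in the reply.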