
Re: [HTCondor-users] scheduling delay



On 10/11/2018 2:35 PM, Dimitri Maziuk via HTCondor-users wrote:
> Apologies if this is a FAQ:
> 
> I've a 64-core VM host with cycles to spare. I want it to run jobs but
> keep condor load down at all times. So I configure with
> 
> NUM_SLOTS = 32
> START = ( TotalLoadAvg < 0.5 )
> 
> -- and I have 32 jobs start and drive load average into 30s.
> 
> What I think happens is the jobs are "slow-start" and the load won't go
> up right away -- long enough for the next job to get scheduled, and the
> next one, etc.
> 
> Does that sound right and if so, what's the knob for "wait 60 seconds
> before scheduling another job on this particular host"?
> 

Hi Dimitri,

I'm not sure what policy you are really after with this VM host, nor am I sure what you mean by "I want it to run jobs but keep condor load down at all times". Perhaps you mean you want to run jobs only when another service on the machine is not using the CPU? In that case, perhaps you want START to look at NonCondorLoadAvg instead of TotalLoadAvg?

Since I don't fully understand the problem, I am not sure that "waiting 60+ seconds before scheduling another job on a particular host" is what you really want, but I took it as a configuration challenge :). I think appending the following condor_config knobs will ensure that the startd waits a minimum of 60 seconds between launching jobs (possibly longer, depending on how often negotiation happens, which is governed by NEGOTIATOR_INTERVAL):

  USE FEATURE:PARTITIONABLESLOT
  CLAIM_WORKLIFE = 0
  CLAIM_PARTITIONABLE_LEFTOVERS = False
  START = $(START) && ((CurrentTime - (max(ChildEnteredCurrentState)=?=UNDEFINED ? 0 : max(ChildEnteredCurrentState))) > 60 )

Note that the above switches the startd to use partitionable slots, which lets my START expression
simply reference the most recent time a slot was claimed via max(ChildEnteredCurrentState).
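
In case it helps to see the logic of the added START clause outside of ClassAd syntax, here is a rough Python sketch of just the throttle term (it does not model the original $(START) part). The function name, its parameters, and the choice to treat a missing or empty list as UNDEFINED are my own assumptions for illustration, not anything HTCondor provides:

```python
from typing import Optional, Sequence

# Same threshold as the "> 60" in the START expression.
MIN_GAP_SECONDS = 60


def throttle_allows_start(
    current_time: int,
    child_entered: Optional[Sequence[int]],
    min_gap: int = MIN_GAP_SECONDS,
) -> bool:
    """Hypothetical model of the throttle clause in the START expression.

    child_entered stands in for ChildEnteredCurrentState: the times at which
    each dynamic slot entered its current state. None or an empty list is
    treated here as UNDEFINED (an assumption), in which case the last claim
    time falls back to 0, mirroring the "? 0 :" branch.
    """
    last_claim = max(child_entered) if child_entered else 0
    # Strictly greater than, as in the expression: exactly min_gap seconds
    # since the last claim is still too soon.
    return (current_time - last_claim) > min_gap


if __name__ == "__main__":
    now = 1_000
    print(throttle_allows_start(now, None))        # no child slots yet
    print(throttle_allows_start(now, [900, 950]))  # last claim 50s ago
    print(throttle_allows_start(now, [900, 939]))  # last claim 61s ago
```

The point is only that the newest entry in the list decides the outcome, so one recently claimed dynamic slot is enough to hold off the next match.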

regards and hope the above helps,
Todd