
Re: [HTCondor-users] Is ExpectedMachineGracefulDrainingBadput the sum of subslots and related defrag questions



# this ought to work even if ExpectedRuntimeHours were undefined, right?
START = $(START) && (KillableJob =?= true || ExpectedRuntimeHours <= 6)
	Seems unlikely:

$ classad_eval 'ExpectedRunTimeHours <= 6'
[  ]
undefined
$ classad_eval 'ExpectedRunTimeHours = 5' 'ExpectedRunTimeHours <= 6'
[ ExpectedRunTimeHours = 5 ]
true
$ classad_eval 'START=true' 'START && (KillableJOb =?= true || ExpectedRunTimeHours <= 6)'
[ START = true ]
undefined
$ classad_eval 'ExpectedRuntimeHours = 5' 'START=true' 'START && (KillableJOb =?= true || ExpectedRunTimeHours <= 6)'
[ START = true; ExpectedRuntimeHours = 5 ]
true
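
	One way to make the result definite when the attribute is missing is to guard the comparison explicitly. This is only a sketch, reusing the attribute names from the expression above and relying on the ClassAd rules that `undefined =!= undefined` is false and `false && undefined` is false:

```
# Sketch: test that the attribute exists before comparing it, so the
# parenthesized clause is true or false, never undefined.
START = $(START) && (KillableJob =?= true || (ExpectedRuntimeHours =!= undefined && ExpectedRuntimeHours <= 6))
```

With this form, a job that sets neither KillableJob nor ExpectedRuntimeHours should make the clause evaluate to false rather than undefined; running it through classad_eval as above is a quick way to confirm that on your version.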

Also, while I expect DEFRAG_RANK to mostly steer condor_defrag to the machines with lower MaxJobVacateTime, should we worry about DEFRAG_MAX_CONCURRENT_DRAINING = 10 if we have many more than 10 of the second kind of machines defined? If so, any idea which handle to use to ensure a good turn-around time?
	DEFRAG_MAX_CONCURRENT_DRAINING is just a throttle, and what you 
want to set it to is as much a matter of your job mix as your hardware 
configuration.  To absolutely minimize turn-around time of the "big" jobs, 
of course, you'd just not run "small" jobs on the big-job machines. 
Otherwise, it seems like setting the throttles to allow the defrag daemon 
to start draining all of your second type of machines would result in the 
shortest turn-around time.  It's just not as efficient.
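
	As a rough illustration of opening up the throttles, the fragment below assumes a pool with about 40 machines of the second kind. The number 40 is purely hypothetical, and the other two knobs are included on the assumption that they would otherwise cap draining before DEFRAG_MAX_CONCURRENT_DRAINING does:

```
# Hypothetical values for a pool with ~40 big-job machines; tune to your site.
# Allow every big-job machine to be draining at the same time.
DEFRAG_MAX_CONCURRENT_DRAINING = 40
# Do not let the hourly rate limit become the bottleneck instead.
DEFRAG_DRAINING_MACHINES_PER_HOUR = 40
# Keep draining until this many machines count as "whole".
DEFRAG_MAX_WHOLE_MACHINES = 40
```

The trade-off is the one described above: more machines draining at once means more idle cores and more badput while they empty out.
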
Have we?
	Looks generally sane to me, although I can't speak to the question 
of whether the badput numbers are summed across d-slots.
	Depending on how much need there is for these very large slots, 
you may also want to discourage them from matching smaller jobs -- you 
spent quite a bit of effort draining them.  One trick I've heard is to 
adjust the START expression for the designated big-job slots to avoid 
matching small jobs for some amount of time after a defrag.  (HTCondor 
matches jobs based on user priority, so this allows that startd to wait 
until the high-priority but small jobs have all been started elsewhere.)
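
	A minimal sketch of that trick follows. It assumes the machine ad advertises LastDrainStopTime on your HTCondor version (worth checking with condor_status -l), and the macro names, the 16-core cutoff, and the 30-minute window are all made up for illustration:

```
# Hypothetical knobs; names and values are illustrative only.
BIG_JOB_MIN_CPUS = 16
POST_DRAIN_HOLD_SECS = 1800

# True only inside the hold window after the last drain stopped.
# ASSUMPTION: the machine ad carries LastDrainStopTime; verify on your version.
IN_POST_DRAIN_HOLD = (MY.LastDrainStopTime =!= undefined && (time() - MY.LastDrainStopTime) < $(POST_DRAIN_HOLD_SECS))

# Outside the window, accept whatever the existing START allows;
# inside it, accept only jobs requesting at least BIG_JOB_MIN_CPUS cores.
START = $(START) && (!$(IN_POST_DRAIN_HOLD) || TARGET.RequestCpus >= $(BIG_JOB_MIN_CPUS))
```

The explicit undefined check keeps the hold clause false on machines that have never been drained, so they behave exactly as before.
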
- ToddM