# this ought to work even if ExpectedRuntimeHours were undefined, right?
START = $(START) && (KillableJob =?= true || ExpectedRuntimeHours <= 6)
Seems unlikely:

$ classad_eval 'ExpectedRunTimeHours <= 6'
[ ]
undefined

$ classad_eval 'ExpectedRunTimeHours = 5' 'ExpectedRunTimeHours <= 6'
[ ExpectedRunTimeHours = 5 ]
true

$ classad_eval 'START=true' 'START && (KillableJOb =?= true || ExpectedRunTimeHours <= 6)'
[ START = true ]
undefined

$ classad_eval 'ExpectedRuntimeHours = 5' 'START=true' 'START && (KillableJOb =?= true || ExpectedRunTimeHours <= 6)'
[ START = true; ExpectedRuntimeHours = 5 ]
true
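If the intent is that jobs which don't define ExpectedRuntimeHours should still match, one way to keep START well-defined is to handle the undefined case explicitly with the meta-equals operator. A sketch, using the same attribute names as above:

# Sketch only: explicitly allow jobs that don't define ExpectedRuntimeHours,
# so the || chain never evaluates to undefined.  (If you'd rather those jobs
# not match, you can leave the expression as it was: an undefined START is
# treated as "don't match".)
START = $(START) && ( KillableJob =?= true \
                      || ExpectedRuntimeHours =?= undefined \
                      || ExpectedRuntimeHours <= 6 )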
Also, while I expect DEFRAG_RANK to mostly steer condor_defrag to the machines with lower MaxJobVacateTime, should we worry about DEFRAG_MAX_CONCURRENT_DRAINING = 10 if we have many more than 10 of the second kind of machine defined? If so, any idea which knob to use to ensure a good turn-around time?
DEFRAG_MAX_CONCURRENT_DRAINING is just a throttle, and what you want to set it to is as much a matter of your job mix as of your hardware configuration. To absolutely minimize turn-around time for the "big" jobs, of course, you'd just not run "small" jobs on the big-job machines at all. Otherwise, setting the throttles to allow the defrag daemon to start draining all of your second type of machine at once should give the shortest turn-around time. It's just not as efficient, since more machines are sitting partially idle (or killing jobs) while they drain.
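Purely as an illustration -- the machine counts below are made up, and the rank expression is just an example of preferring cheaper-to-drain machines, not something tuned to your pool -- the "drain them all at once" approach might look like:

# Hypothetical numbers: suppose there are ~40 machines of the second type.
DEFRAG_MAX_CONCURRENT_DRAINING    = 40   # let all of them drain in parallel
DEFRAG_DRAINING_MACHINES_PER_HOUR = 40   # don't rate-limit the start of new drains
DEFRAG_MAX_WHOLE_MACHINES         = 10   # stop once this many whole machines are free
# Prefer machines that will waste the least badput while draining; ranking on
# a lower MaxJobVacateTime, as you describe, is another reasonable choice.
DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput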
Have we?
Looks generally sane to me, although I can't speak to the question of whether the badput numbers are summed across d-slots.
Depending on how much need there is for these very large slots, you may also want to discourage them from matching smaller jobs -- you spent quite a bit of effort draining them. One trick I've heard is to adjust the START expression for the designated big-job slots to avoid matching small jobs for some amount of time after a defrag. (HTCondor matches jobs based on user priority, so this allows that startd to wait until the high-priority but small jobs have all been started elsewhere.)
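A sketch of that trick, with made-up numbers (the 16-core threshold and the one-hour window), and assuming DaemonStartTime gets reset -- e.g. by restarting the startd -- when the drain completes:

# On the designated big-job machines only: for the first hour after the startd
# (re)starts, match only jobs asking for at least 16 cores; after that, fall
# back to whatever the normal START policy allows.
START = $(START) && ( TARGET.RequestCpus >= 16 \
                      || (time() - DaemonStartTime) > 3600 )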
- ToddM