
Re: [HTCondor-users] Tuning condor_defrag



On 4/21/2015 4:04 PM, Anthony Tiradani wrote:
Apologies if you get this twice, I think I sent this to the wrong address
originally.

For the moment we are avoiding large-scale fragmentation of our cluster by
asking our stakeholders to submit jobs in 8-core chunks.  We only have a
small number of other jobs coming in with different request sizes.  However, we
want to be able to handle varying request sizes more frequently, and that
requires tuning condor_defrag to avoid slot starvation.

What solutions do other sites use?  Preferably, I would like to have something
that "auto-tunes" the condor_defrag settings.

We have also been thinking about how to 'auto-tune' condor_defrag, along the same lines as how condor_rooster knows when (and which) machines to wake up from hibernation. With the condor_rooster approach, when nobody is using an execute node and HTCondor takes it offline by hibernating it, the negotiator leaves behind hints in the machine ClassAd that effectively say "if this machine were actually awake, I could match it". condor_rooster then incorporates these hints from the negotiator into its wake-up policy.

Similarly, the matchmaker could leave behind hints like "I could make more desirable matches to machine X if it were drained", which condor_defrag could then incorporate into its policy to "auto-tune". Several gory details about this line of thinking are written down in this first-draft developer design document at http://goo.gl/eMwJCv. Interested in your feedback, as always...
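
To make the idea a bit more concrete, here is a rough Python sketch of how a defrag policy might consume such hints. Everything in it is an assumption for illustration only: the MachineAd class, the drain_hint_time field (standing in for whatever attribute the negotiator might leave in the machine ClassAd), the 1200-second freshness window, and the "cheapest to drain first" ranking are all hypothetical and are not existing HTCondor attributes or behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MachineAd:
    """Hypothetical, simplified stand-in for a machine ClassAd."""
    name: str
    total_cpus: int
    free_cpus: int
    # Hypothetical hint: time (in seconds) at which the negotiator last noted
    # "I could make a better match here if this machine were drained".
    drain_hint_time: Optional[int] = None


def pick_machines_to_drain(ads: List[MachineAd], now: int,
                           hint_max_age: int = 1200,
                           max_concurrent: int = 2) -> List[str]:
    """Return names of machines worth draining, best candidates first."""
    candidates = [
        ad for ad in ads
        if ad.drain_hint_time is not None            # negotiator left a hint
        and now - ad.drain_hint_time <= hint_max_age  # hint is still fresh
        and ad.free_cpus < ad.total_cpus              # not already whole
    ]
    # Prefer machines that are closest to fully free, since they are the
    # cheapest to drain; break ties by name so the order is deterministic.
    candidates.sort(key=lambda ad: (-ad.free_cpus / ad.total_cpus, ad.name))
    return [ad.name for ad in candidates[:max_concurrent]]
```

With no fresh hints the function returns an empty list, so nothing would be drained when there is no unmet demand for larger slots; that is the "auto-tune" effect this sketch is trying to illustrate.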

thanks
Todd