On 18 Jun 2024, at 13:56, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
Hi Jeff,
that problem, as you might know, is as old as batch scheduling ;)
Yes :) On Torque, I know how to do it - I turn on reservations and allow backfilling. On HTCondor I don't know the tools well enough yet. Hence my question.
There are a lot of different approaches depending on your overall pool setup.
If your pool is never really full, you could teach the negotiator to completely fill up a node before using a 'new' one.
This is something I could look into, indeed. 'Never' is a strong statement, but often it's not full.
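If I understand the negotiation knobs correctly, this kind of packing would be steered with NEGOTIATOR_PRE_JOB_RANK on the central manager; a rough sketch of what I have in mind (untested on my side, and the weights are just placeholders):

    # Depth-first fill sketch (untested, weights are placeholders): prefer slots
    # on machines that already have fewer free cores, so busy nodes get packed
    # before idle ones are touched. 'Cpus' in the slot ad is the number of
    # unallocated cores left in the partitionable slot.
    NEGOTIATOR_PRE_JOB_RANK = (100000 * My.Rank) - (100 * Cpus) - Memory

Is that roughly the mechanism you have in mind, or is there a more direct way to ask for depth-first filling?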
If your workload is very predictable, you could provide some static slots for the multicore usage, or tag some worker nodes to only run multicore jobs.
How would this 'tagging' work in practice? I guess the node would need to have some ClassAd attribute, maybe something that would only match jobs that ask for more than N CPUs?
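To make my question concrete, I was imagining something along these lines on the designated nodes (a sketch only; the attribute name 'MulticoreOnly' and the 8-core threshold are placeholders of mine):

    # Sketch for a 'multicore only' node (untested). Publish a custom attribute
    # in the machine ClassAd and refuse jobs that ask for fewer than 8 cores.
    MulticoreOnly = True
    STARTD_ATTRS = $(STARTD_ATTRS) MulticoreOnly
    START = ($(START)) && (TARGET.RequestCpus >= 8)

Multicore jobs would then land there automatically once they request enough cores, and small jobs would be kept off; is that roughly the idea?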
The defrag daemon can be used to drain a configurable number of slots down to a 'whole-machine' definition, which in your case would be '32 cores == whole machine'. Then multicore jobs would jump on these slots.
This sounds good. It would also help if whatever does the draining realised that there may be a specific reason to drain a particular node, for example because a user is asking for it.
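For the record, my reading of the defrag idea is something like the following on the central manager (a sketch only; the numbers are made-up placeholders I would have to tune):

    # condor_defrag sketch (untested): keep a few 32-core machines drained down
    # to whole-machine state so multicore jobs have somewhere to start.
    DAEMON_LIST = $(DAEMON_LIST) DEFRAG
    # A machine counts as 'whole' when all cores are free in the partitionable slot.
    DEFRAG_WHOLE_MACHINE_EXPR = Cpus == TotalCpus
    # Target number of whole machines, and how aggressively to drain towards it.
    DEFRAG_MAX_WHOLE_MACHINES = 4
    DEFRAG_MAX_CONCURRENT_DRAINING = 1
    DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0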
The startd-policy section in the docs is a good read, and the defrag daemon part is also useful!
I will take a look, thanks for the suggested starting point.