Hi Jeff,
Tagging a worker node is most easily done by altering the START expression. The startd on the execution point has the final say on whether to start a job or put it back into the idle state.
Something like
START = ($(START)) && (TARGET.RequestCpus >= 8)
would do the trick (for 8-core jobs).
You could also be smarter about it and introduce a tunable ClassAd attribute on the execution point, such as 'multicorerequest = x'.
Make it settable remotely and use it in the START expression:
START = ($(START)) && (TARGET.RequestCpus >= multicorerequest)
Then you could build some logic that checks what kind of 'big slots' you need based on the queued jobs ...
The START expression in general is a powerful thing and is worth some reading in the manual ...
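As a rough illustration of the "logic that checks what you need" idea, here is a minimal Python sketch. It is a toy model only: start_allows() mimics the extra START clause, and pick_multicore_request() is a hypothetical policy (not anything HTCondor ships) that picks the most common core count among the idle multicore jobs. How the list of idle RequestCpus values is obtained from the queue is left out on purpose.

```python
from collections import Counter


def start_allows(request_cpus, multicore_request):
    """Toy model of the clause: TARGET.RequestCpus >= multicorerequest."""
    return request_cpus >= multicore_request


def pick_multicore_request(idle_request_cpus, min_big=2, default=1):
    """Pick a threshold from the RequestCpus values of idle jobs.

    Hypothetical policy: take the most common core count among idle jobs
    that ask for at least `min_big` cores. With no such jobs, return
    `default`, which lets every job through.
    """
    big = [c for c in idle_request_cpus if c >= min_big]
    if not big:
        return default
    counts = Counter(big)
    # Most frequent size wins; ties go to the larger request.
    return max(counts, key=lambda c: (counts[c], c))


if __name__ == "__main__":
    idle = [1, 1, 8, 8, 8, 4, 1, 16]  # made-up queue snapshot
    threshold = pick_multicore_request(idle)
    accepted = [c for c in idle if start_allows(c, threshold)]
    print(threshold, accepted)
```

The resulting threshold would then be pushed to the node as the value of 'multicorerequest'. The mechanism for doing that remotely is covered in the manual and is not shown here.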
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
From: "Jeff Templon" <templon@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 18 June 2024 14:09:32
Subject: Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started
Hi Christoph,
Thanks for your response!
On 18 Jun 2024, at 13:56, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
Hi Jeff,
That problem, as you might know, is as old as batch scheduling ;)
Yes :) On Torque, I know how to do it - I turn on reservations and allow backfilling. On HTCondor I don't know the tools well enough yet. Hence my question.
There are a lot of different approaches depending on your overall pool setup.
If your pool is never really full you could teach the negotiator to completely fill up a node before using a 'new' one.
This is something I could look into, indeed. "Never" is a strong statement, but often it's not full.
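To see why filling one node before touching the next helps multicore jobs, here is a small Python toy simulation. It is not the negotiator's actual algorithm, just a sketch under simple assumptions: three hypothetical 32-core nodes, 24 single-core jobs, and a placement rule that either packs onto the fullest node that still fits or spreads onto the emptiest one.

```python
def place_jobs(free_cores, job_sizes, fill_first):
    """Place jobs on nodes and return the remaining free cores per node.

    fill_first=True  -> prefer the fullest node that still fits the job
    fill_first=False -> prefer the emptiest node (spread the load)
    """
    free = list(free_cores)
    for size in job_sizes:
        candidates = [i for i, f in enumerate(free) if f >= size]
        if not candidates:
            continue  # the job stays idle in this toy model
        chooser = min if fill_first else max
        target = chooser(candidates, key=lambda i: free[i])
        free[target] -= size
    return free


def whole_nodes(free, cores_per_node=32):
    """Count nodes that are still completely empty."""
    return sum(1 for f in free if f == cores_per_node)


if __name__ == "__main__":
    nodes = [32, 32, 32]
    jobs = [1] * 24
    packed = place_jobs(nodes, jobs, fill_first=True)
    spread = place_jobs(nodes, jobs, fill_first=False)
    print("packed:", packed, "whole nodes left:", whole_nodes(packed))
    print("spread:", spread, "whole nodes left:", whole_nodes(spread))
```

With packing, two nodes stay completely free for whole-machine jobs. With spreading, none do, even though the pool is only a quarter full.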
If your workload is very predictable you could provide some static slots for multicore usage or tag some worker nodes to only run multicore jobs.
How would this "tagging" work in practice? I guess the node would need to have some ClassAd, maybe something that would only match jobs that ask for more than N CPUs?
The defrag daemon can be used to drain a configurable number of slots down to a 'whole-machine' definition which would be '32 cores == whole-machine' in your case. Then multicore jobs would jump on these slots.
This is good. It would be even better if whatever does this realised that there may be a reason to drain a particular node, namely that a user is asking for it.
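The demand-driven idea can be sketched in a few lines of Python. This is a toy model, not what the defrag daemon actually does: nodes_to_drain() and its inputs are hypothetical names, and the sketch only decides which nodes would be worth draining when whole-machine jobs are waiting and too few whole machines are free.

```python
WHOLE_MACHINE_CORES = 32  # '32 cores == whole-machine', as in this thread


def nodes_to_drain(free_cores_by_node, idle_multicore_jobs, max_draining=1):
    """Return the names of nodes to drain, cheapest first.

    free_cores_by_node: dict of node name -> currently free cores
    idle_multicore_jobs: number of waiting jobs that need a whole machine
    max_draining: upper limit on nodes drained at the same time
    """
    whole = [n for n, f in free_cores_by_node.items()
             if f >= WHOLE_MACHINE_CORES]
    missing = idle_multicore_jobs - len(whole)
    if missing <= 0:
        return []  # no demand, or enough whole machines already free
    partial = [n for n, f in free_cores_by_node.items()
               if f < WHOLE_MACHINE_CORES]
    # Nodes with the most free cores have the fewest busy cores to wait for.
    partial.sort(key=lambda n: (-free_cores_by_node[n], n))
    return partial[:min(missing, max_draining)]


if __name__ == "__main__":
    pool = {"wn1": 32, "wn2": 20, "wn3": 4, "wn4": 28}  # made-up pool state
    print(nodes_to_drain(pool, idle_multicore_jobs=3, max_draining=2))
    print(nodes_to_drain(pool, idle_multicore_jobs=1, max_draining=2))
```

Nothing gets drained unless a user is actually asking for whole machines, which is the behaviour asked for above.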
The startd policy section in the docs is a good read, and the defrag daemon part is useful too!
I will take a look, thanks for the starting point suggestion.
JT