
Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started



Hi Jeff,

Tagging a worker node is most easily done by altering the START expression; the startd on the execution point has the final say over whether a job starts or goes back to idle.

Something like

START = ($(START)) && (TARGET.RequestCpus >= 8)

would do the trick (for 8-core jobs).

You could also be smarter about it and introduce a tunable ClassAd attribute on the execution point, like 'multicorerequest = x'.

Make that settable remotely and put it in the START expression:

START = ($(START)) && (TARGET.RequestCpus >= multicorerequest)
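
As a rough sketch of how the tunable variant could be wired up on the execution point: the attribute name MulticoreRequest and the host name wn042.example.org are placeholders I made up, and the values are only illustrative. STARTD_ATTRS, ENABLE_RUNTIME_CONFIG and SETTABLE_ATTRS_ADMINISTRATOR are regular configuration knobs, but please check the manual for your version before relying on this.

```
# Execution point configuration (sketch, names and values are assumptions).
# MulticoreRequest is a hypothetical attribute name; any unused name works.
MulticoreRequest = 8

# Publish the value in the slot ClassAd so START can refer to it.
STARTD_ATTRS = $(STARTD_ATTRS) MulticoreRequest

# Only start jobs that request at least that many cores.
START = ($(START)) && (TARGET.RequestCpus >= MY.MulticoreRequest)

# Allow an administrator to change the value remotely at runtime.
ENABLE_RUNTIME_CONFIG = True
SETTABLE_ATTRS_ADMINISTRATOR = MulticoreRequest

# From a host with ADMINISTRATOR access (host name is a placeholder):
#   condor_config_val -name wn042.example.org -startd -rset "MulticoreRequest = 4"
#   condor_reconfig wn042.example.org
```

Note that a value set with -rset is not persistent and is lost when the startd restarts, and that a node configured like this refuses every job requesting fewer cores than the threshold.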

Then you could build something intelligent that checks what kind of 'big slots' you need based on the queued jobs ...

The START expression in general is a powerful thing and is well worth some reading in the manual ...

Best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Jeff Templon" <templon@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 18 June 2024 14:09:32
Subject: Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started

Hi Christoph,
Thanks for your response!

On 18 Jun 2024, at 13:56, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:

Hi Jeff,

that problem as you might know is as old as batch scheduling ;)

Yes :) On Torque, I know how to do it - I turn on reservations and allow backfilling. On HTCondor I don't know the tools well enough yet. Hence my question.

There are a lot of different approaches depending on your overall pool setup.

If your pool is never really full, you could teach the negotiator to fill up a node completely before using a 'new' one.

This is something I could look into, indeed. "Never" is a strong statement, but often it's not full.
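
One knob that, as far as I know, corresponds to the "fill up a node first" idea is NEGOTIATOR_DEPTH_FIRST. The fragment below is a minimal sketch, assuming the execution points use partitionable slots; whether it is sufficient on its own depends on the rest of the pool policy.

```
# Central manager (negotiator) configuration, a sketch only.
# Assumes partitionable slots on the execution points.
# Ask the negotiator to pack as many jobs as possible onto one machine
# before moving on to the next one.
NEGOTIATOR_DEPTH_FIRST = True
```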

If your workload is very predictable, you could provide some static slots for multicore usage or tag some worker nodes to run only multicore jobs.
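
A possible layout for the static-slot variant, assuming a 32-core execution point: two static 8-core slots and one partitionable slot holding the remaining 16 cores. The split is purely illustrative and would need to match the real job mix.

```
# Execution point configuration (sketch for an assumed 32-core node).

# Two static slots, each with 25% of the machine (8 cores on 32).
SLOT_TYPE_1 = 25%
SLOT_TYPE_1_PARTITIONABLE = False
NUM_SLOTS_TYPE_1 = 2

# One partitionable slot with the remaining 50% for everything else.
SLOT_TYPE_2 = 50%
SLOT_TYPE_2_PARTITIONABLE = True
NUM_SLOTS_TYPE_2 = 1
```

The percentages apply to memory and disk as well as CPUs, so each static slot also gets a quarter of the node's memory.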

How would this "tagging" work in practice? I guess the node would need to have some ClassAd, maybe something that would only match jobs that ask for more than N CPUs?

The defrag daemon can be used to drain a configurable number of slots down to a 'whole-machine' definition, which would be '32 cores == whole-machine' in your case. Multicore jobs would then jump on these slots.

This is good. It would be even better if whatever does this realises that there may be a reason to drain a particular node, because a user is asking for it.
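
For orientation, a defrag setup along the lines described above might look like the fragment below. The knob names are from the condor_defrag configuration, but every number is an assumption chosen for illustration, and the whole-machine expression simply encodes the '32 cores == whole-machine' definition from this thread.

```
# Central manager configuration, a sketch with illustrative values.

# Run the defrag daemon alongside the other daemons.
DAEMON_LIST = $(DAEMON_LIST) DEFRAG

# Seconds between checks for machines to drain.
DEFRAG_INTERVAL = 600

# Throttles: how fast and how many machines may be draining.
DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
DEFRAG_MAX_CONCURRENT_DRAINING = 2

# Stop draining once this many whole machines are available.
DEFRAG_MAX_WHOLE_MACHINES = 4

# What counts as a whole machine (assumes 32-core nodes).
DEFRAG_WHOLE_MACHINE_EXPR = PartitionableSlot && Cpus >= 32

# Which slots are candidates for draining.
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline =!= True
```

These throttles are static settings, so on their own they do not react to what is waiting in the queue.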

The startd-policy section in the docs is a good read, and the defrag daemon part is useful too!

I will take a look, thanks for the starting point suggestion.

JT

