Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Oh,

and of course, if it is just a static user name, you can simply put it into the START expression and Bob's your uncle; the remotely settable option is only useful for more volatile settings ...
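For the static case, a minimal sketch could look like the lines below. The user name is the placeholder from Jeff's message further down. Keeping the existing policy by wrapping it in $(START) is an assumption about what you want.

# Hypothetical placeholder user; keeps the existing START policy and adds the owner check
START = ($(START)) && (TARGET.Owner == "the_owner_in_question")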

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

From: "Christoph Beyer" <christoph.beyer@xxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 18 June 2024 16:16:02
Subject: Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started

Hi,

you're getting there :D ;)

Setting an attribute is indeed a bit tricky security-wise. For one thing, the host you are coming from needs write access on the target host. In addition, you need to mark the attribute as settable, e.g.:

We use an attribute on the execution point called 'StartJobs' to disable a node by setting it to 'false'. It goes like this:

# set default
StartJobs = True
# make it a startd attribute
STARTD_ATTRS = $(STARTD_ATTRS) StartJobs
# make it settable by administrators
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs
# use it as a start condition
START = ($(START)) && (StartJobs =?= True)

In addition, you can make it persistent over reboots using:

ENABLE_PERSISTENT_CONFIG = True
PERSISTENT_CONFIG_DIR = /etc/condor/persistent

Changing the ClassAd value is also a bit tricky and might be better wrapped in a bash script ;)

condor_config_val -name <workernode> -startd -set "StartJobs = False"
condor_reconfig <workernode> -daemon startd

I would not expect to be able to alter the START expression as such remotely ...

Best

christoph

From: "Jeff Templon" <templon@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 18 June 2024 15:52:57
Subject: Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started

Hi,

Thanks, I'd just read it and came up with a temporary solution, as today it's only one user:

START = (TARGET.Owner == "the_owner_in_question")

However, I couldn't get the startd to accept the condor_config_val -set, even though I tried as root. I guess there is some security setting somewhere.

6/18/24 15:35:29 WARNING: Someone at 145.107.4.124 is trying to modify "START"

06/18/24 15:35:29 WARNING: Potential security problem, request refused

J "a lot to learn" T

On 18 Jun 2024, at 15:34, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:

Hi Jeff,

Tagging a worker node is most easily done by altering the START expression: the startd on the execution point has the final say over whether a job starts or stays idle.

Something like

START = ($(START)) && (TARGET.RequestCpus >= 8)

would do the trick (for jobs requesting 8 or more cores).

You could also play it smarter and introduce a tunable execution point ClassAd attribute like 'multicorerequest = x'.

Make that remotely settable and put it in the START expression:

START = ($(START)) && (TARGET.RequestCpus >= multicorerequest)

Then you could build some intelligent tooling that checks what kind of 'big slots' you need based on the queued jobs ...
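A minimal sketch of the tunable-attribute idea, reusing the settable-attribute pattern of the StartJobs example above. The attribute name MultiCoreRequest and the default of 8 are illustrative assumptions, not an established setting:

# Hypothetical attribute: minimum RequestCpus this node accepts (default 8)
MultiCoreRequest = 8
# Publish it in the startd ClassAd
STARTD_ATTRS = $(STARTD_ATTRS) MultiCoreRequest
# Allow administrators to change it remotely (list StartJobs here too if you use both)
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = MultiCoreRequest
# Only start jobs that ask for at least MultiCoreRequest cores
START = ($(START)) && (TARGET.RequestCpus >= MY.MultiCoreRequest)

Assuming persistent config is enabled as in the StartJobs example, the threshold could then be changed per node with condor_config_val -name <workernode> -startd -set "MultiCoreRequest = 16" followed by condor_reconfig.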

The START expression in general is a powerful thing and is worth some reading in the manual ...

Best
christoph

From: "Jeff Templon" <templon@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 18 June 2024 14:09:32
Subject: Re: [HTCondor-users] How is multicore supposed to work in HTCondor? How to get started

Hi Christoph,
Thanks for your response!

On 18 Jun 2024, at 13:56, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:

Hi Jeff,

That problem, as you might know, is as old as batch scheduling ;)

Yes :) On Torque, I know how to do it: I turn on reservations and allow backfilling. On HTCondor I don't know the tools well enough yet. Hence my question.

There are a lot of different approaches depending on your overall pool setup.

If your pool is never really full, you could teach the negotiator to completely fill up a node before using a 'new' one.
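A sketch of that, assuming your HTCondor release provides the NEGOTIATOR_DEPTH_FIRST setting for partitionable slots (please check the configuration reference for your version). This is not a tested recipe:

# Assumed available in your release: pack each machine's partitionable slot before using the next machine
NEGOTIATOR_DEPTH_FIRST = True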

This is something I could look into, indeed. "Never" is a strong statement, but often it's not full.

If your workload is very predictable, you could provide some static slots for multicore use or tag some worker nodes to run only multicore jobs.
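For the static-slot variant, one possible sketch for a 32-core node reserves one 8-core static slot and leaves the rest as a partitionable slot. The 8/24 split, the memory percentages, and the use of SlotTypeID to keep the static slot for multicore jobs are all illustrative assumptions:

# Assumed split: one static 8-core slot intended for multicore jobs
SLOT_TYPE_1 = cpus=8, ram=25%
NUM_SLOTS_TYPE_1 = 1
# Assumed split: the remaining 24 cores as one partitionable slot for everything else
SLOT_TYPE_2 = cpus=24, ram=75%
SLOT_TYPE_2_PARTITIONABLE = True
NUM_SLOTS_TYPE_2 = 1
# Only jobs asking for 8+ cores may start on slot type 1; other slots are unrestricted
START = ($(START)) && ((SlotTypeID =!= 1) || (TARGET.RequestCpus >= 8))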

How would this "tagging" work in practice? I guess the node would need to have some ClassAd, maybe something that would only match jobs that ask for more than N CPUs?

The defrag daemon can be used to drain a configurable number of slots down to a 'whole-machine' definition, which in your case would be '32 cores == whole machine'. Multicore jobs would then jump on these slots.
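A minimal sketch of such a defrag setup, typically placed on the central manager. The 32-core threshold comes from this thread. The rate limits are illustrative assumptions to be tuned for your pool:

# Run the defrag daemon alongside the other central manager daemons
DAEMON_LIST = $(DAEMON_LIST) DEFRAG
# Only consider partitionable slots that are not offline
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline =!= True
# A machine counts as "whole" once all 32 cores (value from this thread) are free
DEFRAG_WHOLE_MACHINE_EXPR = PartitionableSlot && Cpus >= 32
# Illustrative limits: drain slowly and keep only a few whole machines around
DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
DEFRAG_MAX_CONCURRENT_DRAINING = 2
DEFRAG_MAX_WHOLE_MACHINES = 4
# Drain gracefully, honouring the running jobs' retirement time
DEFRAG_SCHEDULE = graceful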

This is good. It would be even better if whatever does this realised that there may be a reason to drain a particular node because a user is asking for it.

The startd policy section in the docs is a good read, and the defrag daemon part is useful too!

I will take a look, thanks for the starting point suggestion.

JT

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/