We are doing something like what you are doing here at Fermilab. Basically, our slot-weight expression charges the user by the number of CPUs or the number of 2 GB memory chunks, whichever is higher: 1 CPU / 2 GB = 1, 1 CPU / 3 GB = 1.5, 1 CPU / 4 GB = 2, 2 CPUs / 2 GB = 2, and so forth. What I don't understand is why you would set the weight of the partitionable slot to 1; it should be set to however many CPUs remain in it at the time.
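As a rough sketch (not our exact config), an expression like that could be written in the startd configuration roughly as follows, assuming the standard Cpus and Memory (in MB) slot attributes and 2048 MB per memory chunk:

    # Charge by CPUs or by 2 GB memory chunks, whichever is larger
    SLOT_WEIGHT = ifThenElse(Cpus >= Memory / 2048.0, Cpus, Memory / 2048.0)

With that expression, 1 CPU with 3072 MB evaluates to 1.5 and 2 CPUs with 2048 MB evaluates to 2, matching the examples above.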
One trap: if you have a lot of small submitters, sometimes the slot weight of a slot is so big that a submitter with a low limit will never be matched to it. In theory the negotiator can hand either an existing dynamic slot or the whole partitionable slot to the schedd; in practice it is mostly the latter that we see. The effect is that small submitters sometimes get frozen out, but the developers recently put in a patch that fixes most of the problem.
The other thing to watch for is that condor_userprio doesn't accurately show the effects of a floating-point slot weight; it only reports integer resources used. The underlying math is still right, though.
Finally, make sure the slot-weight expression can never evaluate to undefined; major craziness can happen then.
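One way to protect against that, as a sketch building on the expression above, is to fall back to Cpus if the attributes the expression references are ever undefined:

    # Fall back to Cpus if Memory is somehow undefined
    SLOT_WEIGHT = ifThenElse(isUndefined(Memory), Cpus, ifThenElse(Cpus >= Memory / 2048.0, Cpus, Memory / 2048.0))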
Steve Timm
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Sent: Tuesday, August 13, 2019 11:12:20 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Changing slotweight for few nodes in pool

Hello Experts,
A gentle follow-up email.
On Mon, 12 Aug, 2019, 19:52 Vikrant Aggarwal, <ervikrant06@xxxxxxxxx> wrote: