Hello, everyone.
I'm running a cluster with multiple accounting groups.
Here is the group quota information for the cluster.
Group              Computed  Config  Quota   Use      Auto     Claimed  Requestd  Submters  Allocatd
Name               quota     quota   static  surplus  Regroup  cores    cores     in group  cores
-----------------------------------------------------------------------------------------------------
<none>             0         0       N       Y        N        0        0         0         0
group_alice        1173.28   0.2255  N       Y        N        3576     91324     2         0
group_cms          2314.29   0.4448  N       Y        N        20       21        3         0
group_genome       1715.43   0.3297  N       Y        N        0        0         0         0
group_genome.bio   1715.43   1       N       Y        N        0        0         0         0
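As a quick cross-check of the table above, here is a minimal Python sketch that compares each group's claimed cores with its computed quota. The numbers are copied from the table; the names `groups` and `over_quota` are only illustrative and are not part of any HTCondor tool.

```python
# (computed quota, claimed cores, requested cores), copied from the quota table.
groups = {
    "group_alice": (1173.28, 3576, 91324),
    "group_cms": (2314.29, 20, 21),
    "group_genome": (1715.43, 0, 0),
    "group_genome.bio": (1715.43, 0, 0),
}


def over_quota(table):
    """Return {group: claimed - computed quota} for groups above their quota."""
    return {
        name: round(claimed - quota, 2)
        for name, (quota, claimed, _requested) in table.items()
        if claimed > quota
    }


excess = over_quota(groups)
for name, extra in excess.items():
    print(f"{name}: {extra} cores above its computed quota")
```

With these numbers, group_alice is the only group whose claimed cores exceed its computed quota.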
I'm writing because group_alice has many jobs queued in my environment, and when I submit a test job, the negotiator seems to ignore it.
Below is an excerpt from the negotiator log.
07/18/24 18:01:11 group quotas: groups= 5 requesting= 2 served= 2 unserved= 0 requested= 91345 allocated= 6918.43 surplus= 0 maxdelta= 3321.43
...
07/18/24 18:01:11 Phase 4.1: Negotiating with schedds ...
07/18/24 18:01:11 Negotiating with group_alice.kiaf@xxxxxxxxx at <134.75.125.41:9618?addrs=134.75.125.41-9618+[2001-320-15-125-ca1f-66ff-fedb-5d65]-9618&alias=kiaf-ui.sdfarm.kr&noUDP&sock=schedd_2978860_c47c>
07/18/24 18:01:11 0 seconds so far for this submitter
07/18/24 18:01:11 0 seconds so far for this schedd
07/18/24 18:01:11 Negotiating with group_alice.<Heavy User>@sdfarm.kr at <134.75.125.41:9618?addrs=134.75.125.41-9618+[2001-320-15-125-ca1f-66ff-fedb-5d65]-9618&alias=kiaf-ui.sdfarm.kr&noUDP&sock=schedd_2978860_c47c>
07/18/24 18:01:11 0 seconds so far for this submitter
07/18/24 18:01:11 0 seconds so far for this schedd
07/18/24 18:01:11 Starting prefetch round; 1 potential prefetches to do.
07/18/24 18:01:11 Starting prefetch negotiation for group_alice.<Heavy User>@sdfarm.kr.
07/18/24 18:01:11 Got NO_MORE_JOBS; schedd has no more requests
07/18/24 18:01:11 Prefetch summary: 1 attempted, 1 successful.
07/18/24 18:01:11 Phase 4.2: Negotiating with schedds ...
07/18/24 18:01:11 Negotiating with group_alice.<Heavy User>@sdfarm.kr at <134.75.125.41:9618?addrs=134.75.125.41-9618+[2001-320-15-125-ca1f-66ff-fedb-5d65]-9618&alias=kiaf-ui.sdfarm.kr&noUDP&sock=schedd_2978860_c47c>
07/18/24 18:01:11 0 seconds so far for this submitter
07/18/24 18:01:11 0 seconds so far for this schedd
07/18/24 18:01:11 negotiateWithGroup resources used submitterAds length 0
...
07/18/24 18:01:11 Round 1 totals: allocated= 6918.43 usage= 3596
07/18/24 18:01:11 group quotas: allocation round 2
07/18/24 18:01:11 group quotas: groups= 5 requesting= 2 served= 2 unserved= 0 requested= 3596 allocated= 3596 surplus= 3322.43 maxdelta= 0
07/18/24 18:01:11 group quotas: entering RR iteration n= 0
07/18/24 18:01:11 Group group_genome - skipping, zero slots allocated
07/18/24 18:01:11 Group group_genome.bio - skipping, zero slots allocated
07/18/24 18:01:11 Group group_cms - skipping, no submitters (usage=20)
07/18/24 18:01:11 Group group_alice - skipping, no submitters (usage=3576)
07/18/24 18:01:11 Group <none> - skipping, zero slots allocated
07/18/24 18:01:11 Round 2 totals: allocated= 3596 usage= 3596
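To make the log above easier to scan, here is a rough Python sketch that lists which submitters the negotiator contacted and which groups it skipped, with the reason given. The function name `summarize` and the two regular expressions are my own assumptions based only on the line formats visible in this excerpt; the sample lines are copied from the log, with the schedd address abbreviated to `<...>`.

```python
import re

# Patterns inferred from the log excerpt above (an assumption, not an official format).
NEGOTIATING = re.compile(r"Negotiating with (.+?) at <")
SKIPPING = re.compile(r"Group (.+?) - skipping, (.*)$")


def summarize(log_text):
    """Return (submitters contacted in order, {group: skip reason})."""
    negotiated, skipped = [], {}
    for line in log_text.splitlines():
        m = NEGOTIATING.search(line)
        if m:
            if m.group(1) not in negotiated:  # keep first occurrence only
                negotiated.append(m.group(1))
            continue
        m = SKIPPING.search(line)
        if m:
            skipped[m.group(1)] = m.group(2)
    return negotiated, skipped


sample = """\
07/18/24 18:01:11 Negotiating with group_alice.kiaf@xxxxxxxxx at <...>
07/18/24 18:01:11 Negotiating with group_alice.<Heavy User>@sdfarm.kr at <...>
07/18/24 18:01:11 Got NO_MORE_JOBS; schedd has no more requests
07/18/24 18:01:11 Group group_cms - skipping, no submitters (usage=20)
07/18/24 18:01:11 Group group_alice - skipping, no submitters (usage=3576)
"""

negotiated, skipped = summarize(sample)
print(negotiated)
print(skipped)
```

Run against the full NegotiatorLog, this only summarizes what the log already says; it does not explain why a submitter received no match.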
The negotiation cannot have failed for lack of resources, because there is a dedicated slot for running the test job. Instead, no matchmaking appears to have happened at all: there are no entries in MatchLog for the group_alice.kiaf account.
If that is the case, it seems that when a user other than the heavy submitter submits a job, that job is not run first; it has to wait until the group's running jobs finish and its usage falls back to the quota level.
If you know anything about this, please share.
Thank you.
Regards,
-- Geonmo