
[HTCondor-users] About negotiation opportunity conditions in a multi-accounting group environment



Hello, everyone.


I'm running a cluster with multiple accounting groups.


Here is the accounting group information for the cluster, including quotas:


Group              Computed   Config    Quota      Use     Auto  Claimed  Requested  Submitters  Allocated
Name                  quota    quota   static  surplus  Regroup    cores      cores    in group      cores
----------------------------------------------------------------------------------------------------------
<none>                    0        0        N        Y        N        0          0           0          0
group_alice         1173.28   0.2255        N        Y        N     3576      91324           2          0
group_cms           2314.29   0.4448        N        Y        N       20         21           3          0
group_genome        1715.43   0.3297        N        Y        N        0          0           0          0
group_genome.bio    1715.43        1        N        Y        N        0          0           0          0
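
As a rough sanity check on the table above, the following Python sketch compares each group's claimed cores with its computed quota. The figures are copied from the table; the helper name `over_quota` is only illustrative and is not part of HTCondor.

```python
# (computed quota, claimed cores) per group, copied from the table above.
groups = {
    "<none>": (0, 0),
    "group_alice": (1173.28, 3576),
    "group_cms": (2314.29, 20),
    "group_genome": (1715.43, 0),
    "group_genome.bio": (1715.43, 0),
}


def over_quota(table):
    """Return {group: claimed - computed quota} for groups above their quota."""
    return {
        name: round(claimed - quota, 2)
        for name, (quota, claimed) in table.items()
        if claimed > quota
    }


excess = over_quota(groups)
for name, extra in excess.items():
    print(f"{name} is about {extra} cores over its computed quota")
```

With these numbers only group_alice is above its computed quota, which suggests it is currently running on surplus.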


I'm writing because, in my environment, group_alice has many jobs queued, and when I submit a test job, the negotiator seems to ignore it.


Below is an excerpt from the negotiator log.

07/18/24 18:01:11 group quotas: groups= 5  requesting= 2  served= 2  unserved= 0  requested= 91345  allocated= 6918.43  surplus= 0  maxdelta= 3321.43
...
07/18/24 18:01:11 Phase 4.1:  Negotiating with schedds ...
07/18/24 18:01:11   Negotiating with group_alice.kiaf@xxxxxxxxx at <134.75.125.41:9618?addrs=134.75.125.41-9618+[2001-320-15-125-ca1f-66ff-fedb-5d65]-9618&alias=kiaf-ui.sdfarm.kr&noUDP&sock=schedd_2978860_c47c>
07/18/24 18:01:11 0 seconds so far for this submitter
07/18/24 18:01:11 0 seconds so far for this schedd
07/18/24 18:01:11   Negotiating with group_alice.<Heavy User>@sdfarm.kr at <134.75.125.41:9618?addrs=134.75.125.41-9618+[2001-320-15-125-ca1f-66ff-fedb-5d65]-9618&alias=kiaf-ui.sdfarm.kr&noUDP&sock=schedd_2978860_c47c>
07/18/24 18:01:11 0 seconds so far for this submitter
07/18/24 18:01:11 0 seconds so far for this schedd
07/18/24 18:01:11 Starting prefetch round; 1 potential prefetches to do.
07/18/24 18:01:11 Starting prefetch negotiation for group_alice.<Heavy User>@sdfarm.kr.
07/18/24 18:01:11     Got NO_MORE_JOBS;  schedd has no more requests
07/18/24 18:01:11 Prefetch summary: 1 attempted, 1 successful.
07/18/24 18:01:11 Phase 4.2:  Negotiating with schedds ...
07/18/24 18:01:11   Negotiating with group_alice.<Heavy User>@sdfarm.kr at <134.75.125.41:9618?addrs=134.75.125.41-9618+[2001-320-15-125-ca1f-66ff-fedb-5d65]-9618&alias=kiaf-ui.sdfarm.kr&noUDP&sock=schedd_2978860_c47c>
07/18/24 18:01:11 0 seconds so far for this submitter
07/18/24 18:01:11 0 seconds so far for this schedd
07/18/24 18:01:11  negotiateWithGroup resources used submitterAds length 0 
...
07/18/24 18:01:11 Round 1 totals: allocated= 6918.43  usage= 3596
07/18/24 18:01:11 group quotas: allocation round 2
07/18/24 18:01:11 group quotas: groups= 5  requesting= 2  served= 2  unserved= 0  requested= 3596  allocated= 3596  surplus= 3322.43  maxdelta= 0
07/18/24 18:01:11 group quotas: entering RR iteration n= 0
07/18/24 18:01:11 Group group_genome - skipping, zero slots allocated
07/18/24 18:01:11 Group group_genome.bio - skipping, zero slots allocated
07/18/24 18:01:11 Group group_cms - skipping, no submitters (usage=20)
07/18/24 18:01:11 Group group_alice - skipping, no submitters (usage=3576)
07/18/24 18:01:11 Group <none> - skipping, zero slots allocated
07/18/24 18:01:11 Round 2 totals: allocated= 3596  usage= 3596
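
To see at a glance why each group was skipped in round 2, the skip lines can be pulled out with a small Python sketch. The regular expression is my own assumption based only on the lines shown in this excerpt, not a documented log format, and `skipped_groups` is a hypothetical helper name.

```python
import re

# Skip lines copied from the negotiator log excerpt above.
LOG = """\
07/18/24 18:01:11 Group group_genome - skipping, zero slots allocated
07/18/24 18:01:11 Group group_genome.bio - skipping, zero slots allocated
07/18/24 18:01:11 Group group_cms - skipping, no submitters (usage=20)
07/18/24 18:01:11 Group group_alice - skipping, no submitters (usage=3576)
07/18/24 18:01:11 Group <none> - skipping, zero slots allocated
"""

# Assumed pattern: "Group <name> - skipping, <reason>" with an optional
# trailing "(usage=N)" that is not kept as part of the reason.
SKIP_RE = re.compile(r"Group (\S+) - skipping, (.+?)(?: \(usage=(\d+)\))?$")


def skipped_groups(text):
    """Map each skipped group name to the reason given in the log line."""
    reasons = {}
    for line in text.splitlines():
        match = SKIP_RE.search(line)
        if match:
            reasons[match.group(1)] = match.group(2)
    return reasons


reasons = skipped_groups(LOG)
for group, reason in reasons.items():
    print(f"{group}: {reason}")
```

For the excerpt here, group_alice and group_cms are reported as skipped for "no submitters" while the remaining groups are skipped for "zero slots allocated".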

The negotiation did not fail for lack of resources, since there is a slot dedicated to running the test job. Rather, the negotiation does not appear to have happened at all, and there are no entries in MatchLog for the group_alice.kiaf account.

If this is the case, the conclusion would seem to be that when a user other than the heavy user submits a job, that job will not be run first unless the group's running jobs finish and its usage falls to the appropriate quota level.

If you know anything about this, please share.

Thank you.

Regards,

-- Geonmo