[HTCondor-users] Effective Priority lower but job stays idle

Hello,

I have these (among others):

[1]

$ condor_userprio -priority

Last Priority Update: 7/21 12:39

                                                 Effective     Real  Priority
User Name                                         Priority Priority    Factor
--------------------------------------------- ------------ -------- ---------
group_atlas.mcore.atlpilot001@xxxxxxxxxxxxxxx   1065005.62    10.65 100000.00
group_cms.mcore.cmspilot003@xxxxxxxxxxxxxxx     3259168.25    32.59 100000.00
group_alice.sgmalice@xxxxxxxxxxxxxxx          657536192.00  6575.36 100000.00
--------------------------------------------- ------------ -------- ---------
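As a sanity check on how I read [1], here is a small Python sketch. It assumes that the effective priority is the real priority multiplied by the priority factor, and that a numerically lower effective priority is negotiated first. The user names are shortened and the helper `expected_effective` is my own, purely for illustration.

```python
# Values copied from the condor_userprio -priority output in [1].
# Tuple layout: (effective priority, real priority, priority factor)
users = {
    "group_atlas.mcore.atlpilot001": (1065005.62, 10.65, 100000.00),
    "group_cms.mcore.cmspilot003": (3259168.25, 32.59, 100000.00),
    "group_alice.sgmalice": (657536192.00, 6575.36, 100000.00),
}


def expected_effective(real, factor):
    """Assumed relation: effective priority = real priority * priority factor."""
    return real * factor


# The real priority is printed with two decimals only, so the product can
# differ from the printed effective priority by up to about 0.005 * factor.
for name, (effective, real, factor) in users.items():
    difference = abs(expected_effective(real, factor) - effective)
    print(f"{name}: difference {difference:.2f}")

# Assumption: lower effective priority is served first.
service_order = sorted(users, key=lambda name: users[name][0])
print(service_order)
```

Under these assumptions the order comes out as mcore atlas, then mcore cms, then alice.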

[2]

$ condor_userprio -usage

Group                                     Res  Total Usage            Usage             Last
User Name                              In Use (wghted-hrs)       Start Time       Usage Time
-------------------------------------- ------ ------------ ---------------- ----------------
group_cms                                  40   5201044.50  4/17/2020 10:55  7/21/2021 12:46
  mcore.cmspilot003@xxxxxxxxxxxxxxx        40   4117231.75  9/17/2020 14:12  7/21/2021 12:46
group_atlas                              1685  20845206.00  4/17/2020 10:55  7/21/2021 12:46
  sgmatlas@xxxxxxxxxxxxxxx                  1       712.06  4/17/2020 10:55  7/21/2021 12:39
  mcore.atlpilot001@xxxxxxxxxxxxxxx         8    406800.62  9/17/2020 19:30  7/21/2021 12:46
  atlpilot001@xxxxxxxxxxxxxxx             260   2988551.00  6/30/2020 09:56  7/21/2021 12:46
  mcore.prdatl008@xxxxxxxxxxxxxxx         701   3553249.75  9/17/2020 14:50  7/21/2021 12:46
  prdatl008@xxxxxxxxxxxxxxx               716  11126445.00  6/30/2020 09:35  7/21/2021 12:46
group_alice                              6762  45051064.00  4/17/2020 10:55  7/21/2021 12:46
Number of users: 11                      8492  67240840.00  7/20/2021 12:46

[3]

$ condor_userprio -quotas

Group                                  Effective    Config     Use   Subtree  Requested
Name                                       Quota     Quota Surplus     Quota  Resources
-------------------------------------- --------- --------- ------- --------- ----------
group_alice                              1552.07      0.18 Regroup   1552.07       6796
group_atlas                              3657.77      0.42 Regroup   3657.77       3266
group_cms                                2220.79      0.32 Regroup   2741.38       1640

From [2] I get the following shares of the total usage:

alice => 45051064/67240840 = 0.67

cms   =>  5201044/67240840 = 0.08

atlas => 20845206/67240840 = 0.31
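The same arithmetic as a short Python sketch, with the config quotas from [3] next to it. Comparing the accumulated usage share with the configured quota is only my own reading of the numbers; I am not claiming this is how the negotiator evaluates quotas. The helper `usage_share` is made up for illustration.

```python
# Accumulated usage (weighted hours) per group, copied from [2].
total_usage = 67240840.00
group_usage = {
    "group_alice": 45051064.00,
    "group_cms": 5201044.50,
    "group_atlas": 20845206.00,
}

# Config quota per group (fraction of the pool), copied from [3].
config_quota = {
    "group_alice": 0.18,
    "group_cms": 0.32,
    "group_atlas": 0.42,
}


def usage_share(usage, total):
    """Fraction of the total accumulated usage taken by one group."""
    return usage / total


shares = {group: usage_share(usage, total_usage)
          for group, usage in group_usage.items()}

for group, share in shares.items():
    quota = config_quota[group]
    # A ratio above 1 means the usage share exceeds the configured quota.
    print(f"{group:12s} share={share:.2f} quota={quota:.2f} "
          f"share/quota={share / quota:.2f}")
```

With these numbers alice is well above its configured quota, while cms and atlas are below theirs.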

My problem is that, according to [1], mcore atlas is served first, then cms, then alice.

BUT

1) alice runs only single-core jobs and always gets slots, even though its usage is far over its quota (0.67 instead of 0.18);

2) mcore atlas always gets in, while mcore cms NEVER does (I had to reserve one worker node so that they could have jobs running).

The analysis says there are suitable slots, but that they are busy. (I have also seen some lines in NegotiatorLog saying it is over quota, which is not the case, but I can't find those lines anymore.)

Does anyone know:

1) how to use defrag to free up room for 8-core jobs?

2) why cms never gets in?

Thanks for any help,

SF.

[ANALYSE]

# condor_q -better-analyse 5914078.0

-- Schedd: node16.datagrid.cea.fr : <192.54.206.43:28348>
The Requirements expression for job 5914078.000 is

    ((NumJobStarts == 0) && ((IfThenElse(RequestCpus isnt undefined,(RequestCpus == 8 || RequestCpus == 1),true)))) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") &&
    (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)

Job 5914078.000 defines the following attributes:

    DiskUsage = 150
    NumJobStarts = 0
    RequestCpus = 8
    RequestDisk = DiskUsage
    RequestMemory = 24000

The Requirements expression for job 5914078.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[3]        8030  TARGET.Arch == "X86_64"
[5]        8030  TARGET.OpSys == "LINUX"
[7]        8030  TARGET.Disk >= RequestDisk
[9]         289  TARGET.Memory >= RequestMemory
[11]         93  TARGET.Cpus >= RequestCpus

No successful match recorded.
Last failed match: Wed Jul 21 13:04:22 2021

Reason for last match failure: no match found

5914078.000:  Run analysis summary ignoring user priority.  Of 212 machines,
      1 are rejected by your job's requirements
     41 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
    170 are able to run your job

---------------------
Sophie Ferry |
CEA Saclay |
91190 Gif-sur-Yvette |
DRF/IRFU/DEDIP/LIS |
GRIF-IRFU |
Bat 141 p023B |
+33(0)1 69 08 76 45 |
---------------------