Hello,

I have these (among others):

[1]

    $ condor_userprio -priority
    Last Priority Update: 7/21 12:39
                                                      Effective    Real     Priority
    User Name                                     Priority     Priority Factor
    --------------------------------------------- ------------ -------- ---------
    group_atlas.mcore.atlpilot001@xxxxxxxxxxxxxxx   1065005.62    10.65 100000.00
    group_cms.mcore.cmspilot003@xxxxxxxxxxxxxxx     3259168.25    32.59 100000.00
    group_alice.sgmalice@xxxxxxxxxxxxxxx          657536192.00  6575.36 100000.00
    --------------------------------------------- ------------ -------- ---------

[2]

    $ condor_userprio -usage
    Group                                  Res    Total Usage  Usage            Last
    User Name                              In Use (wghted-hrs) Start Time       Usage Time
    -------------------------------------- ------ ------------ ---------------- ----------------
    group_cms                                  40   5201044.50 4/17/2020 10:55  7/21/2021 12:46
    mcore.cmspilot003@xxxxxxxxxxxxxxx          40   4117231.75 9/17/2020 14:12  7/21/2021 12:46
    group_atlas                              1685  20845206.00 4/17/2020 10:55  7/21/2021 12:46
    sgmatlas@xxxxxxxxxxxxxxx                    1       712.06 4/17/2020 10:55  7/21/2021 12:39
    mcore.atlpilot001@xxxxxxxxxxxxxxx           8    406800.62 9/17/2020 19:30  7/21/2021 12:46
    atlpilot001@xxxxxxxxxxxxxxx               260   2988551.00 6/30/2020 09:56  7/21/2021 12:46
    mcore.prdatl008@xxxxxxxxxxxxxxx           701   3553249.75 9/17/2020 14:50  7/21/2021 12:46
    prdatl008@xxxxxxxxxxxxxxx                 716  11126445.00 6/30/2020 09:35  7/21/2021 12:46
    group_alice                              6762  45051064.00 4/17/2020 10:55  7/21/2021 12:46
    Number of users: 11                      8492  67240840.00                  7/20/2021 12:46

[3]

    $ condor_userprio -quotas
    Group                                  Effective Config    Use     Subtree   Requested
    Name                                   Quota     Quota     Surplus Quota     Resources
    -------------------------------------- --------- --------- ------- --------- ----------
    group_alice                              1552.07      0.18 Regroup   1552.07       6796
    group_atlas                              3657.77      0.42 Regroup   3657.77       3266
    group_cms                                2220.79      0.32 Regroup   2741.38       1640

From [2] I get these usage shares:

    alice => 45051064 / 67240840 = 0.67
    cms   =>  5201044 / 67240840 = 0.08
    atlas => 20845206 / 67240840 = 0.31

My problem is that, according to [1], mcore atlas is served first, then cms, then alice. But:

1) alice runs only single-core jobs, and it always gets slots, even though its usage is far over its quota (0.67 instead of 0.18).
2) mcore atlas always gets in, and mcore cms NEVER does (I had to reserve one worker node so that they could have jobs running). The -better-analyse output says there are suitable slots, but they are busy. (I have also seen some lines in the NegotiatorLog saying the group is over quota, which is not the case, but I can't find those lines anymore.)

Does anyone know:

1) how to use defrag to make room for 8-core jobs?
2) why cms never gets in?

Thanks for any help,
SF.

[ANALYSE]

    # condor_q -better-analyse 5914078.0

    -- Schedd: node16.datagrid.cea.fr : <192.54.206.43:28348>
    The Requirements expression for job 5914078.000 is

        ((NumJobStarts == 0) && ((IfThenElse(RequestCpus isnt undefined,(RequestCpus == 8 || RequestCpus == 1),true)))) &&
        (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
        (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)

    Job 5914078.000 defines the following attributes:

        DiskUsage = 150
        NumJobStarts = 0
        RequestCpus = 8
        RequestDisk = DiskUsage
        RequestMemory = 24000

    The Requirements expression for job 5914078.000 reduces to these conditions:

              Slots
    Step    Matched  Condition
    -----  --------  ---------
    [3]        8030  TARGET.Arch == "X86_64"
    [5]        8030  TARGET.OpSys == "LINUX"
    [7]        8030  TARGET.Disk >= RequestDisk
    [9]         289  TARGET.Memory >= RequestMemory
    [11]         93  TARGET.Cpus >= RequestCpus

    No successful match recorded.
    Last failed match: Wed Jul 21 13:04:22 2021

    Reason for last match failure: no match found

    5914078.000:  Run analysis summary ignoring user priority.  Of 212 machines,
          1 are rejected by your job's requirements
         41 reject your job because of their own requirements
          0 match and are already running your jobs
          0 match but are serving other users
        170 are able to run your job

---------------------
Sophie Ferry
CEA Saclay | 91190 Gif-sur-Yvette
DRF/IRFU/DEDIP/LIS | GRIF-IRFU
Bat 141 p023B | +33(0)1 69 08 76 45
---------------------
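The usage shares in the post can be re-derived from the condor_userprio figures. Below is a minimal Python sketch; every number in it is copied from outputs [1] to [3], and nothing is queried from a live pool. It relies on two HTCondor facts: a user's effective priority is its real priority multiplied by its priority factor, and the negotiator serves the lowest effective priority first. It also makes one assumption, flagged in the code: that group quotas are enforced against resources currently in use ("Res In Use") rather than against accumulated weighted hours, so it prints the in-use share and the cores in use next to the quotas. Note that the three group totals add up to about 71.1 million weighted hours, more than the 67240840.00 on the summary line, so the three usage shares sum to roughly 1.06 and should be read as approximate.

```python
# A minimal sketch: all figures are copied from the condor_userprio output
# quoted in the post; nothing is queried from a live pool.

# -usage: accumulated usage (weighted hours) and resources currently in use.
USAGE_HRS = {"group_alice": 45051064.00, "group_atlas": 20845206.00, "group_cms": 5201044.50}
IN_USE = {"group_alice": 6762, "group_atlas": 1685, "group_cms": 40}
TOTAL_USAGE_HRS = 67240840.00  # "Number of users" summary line of -usage
TOTAL_IN_USE = 8492            # "Number of users" summary line of -usage

# -quotas: configured fraction and effective quota per group.
CONFIG_QUOTA = {"group_alice": 0.18, "group_atlas": 0.42, "group_cms": 0.32}
EFFECTIVE_QUOTA = {"group_alice": 1552.07, "group_atlas": 3657.77, "group_cms": 2220.79}

# -priority: (effective priority, real priority, priority factor) per user.
PRIORITY = {
    "group_atlas.mcore.atlpilot001": (1065005.62, 10.65, 100000.00),
    "group_cms.mcore.cmspilot003": (3259168.25, 32.59, 100000.00),
    "group_alice.sgmalice": (657536192.00, 6575.36, 100000.00),
}


def usage_share(group: str) -> float:
    """Share of accumulated weighted hours: the ratio computed in the post."""
    return USAGE_HRS[group] / TOTAL_USAGE_HRS


def in_use_share(group: str) -> float:
    """Share of the resources the group is using right now."""
    return IN_USE[group] / TOTAL_IN_USE


def negotiation_order() -> list[str]:
    """Users sorted by effective priority; a lower value is served first."""
    return sorted(PRIORITY, key=lambda user: PRIORITY[user][0])


if __name__ == "__main__":
    for group in CONFIG_QUOTA:
        # Assumption: quotas are compared with resources in use, not with
        # accumulated hours, so both shares are shown side by side.
        print(
            f"{group}: usage share {usage_share(group):.2f}, "
            f"in-use share {in_use_share(group):.2f}, "
            f"config quota {CONFIG_QUOTA[group]:.2f}, "
            f"in use {IN_USE[group]} vs effective quota {EFFECTIVE_QUOTA[group]:.0f}"
        )
    # Effective priority = real priority x priority factor, up to the
    # rounding of the printed real priority.
    for user, (eff, real, factor) in PRIORITY.items():
        assert abs(eff / factor - real) < 0.005, user
    print("negotiation order:", negotiation_order())
```

Run as a script, it prints one line per group and the order implied by [1] (mcore atlas, then cms, then alice), which matches the behaviour described in the post.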