We use group accounting. In the negotiator D_FULLDEBUG output below I've
marked two lines with the word "HERE". At the first HERE I would expect
the negotiator to report that group_MCprod is over quota and skip it,
but instead it reports the group's usage as 0. It then goes ahead and
negotiates with group_MCprod, even though the second HERE shows it
knows the submitter is using 3591 slots against a quota of 520. The
condor_userprio output at the bottom also shows the slots in use. Near
the end of the debug output there is also a matchmakingAlgorithm: line
that again reports the usage as 0.

I've been fighting with this for a long time. Occasionally one of our
groups manages to take all our slots even though it is over quota.
Most of the time the quotas appear to work.

Has anyone seen this before?

Thanks,
joe
03/22 11:56:06 group group_italy dynamic quota for 11106 slots = 188.000
03/22 11:56:06 Group Table : group group_italy quota 188.000 usage 115.000 prio 61.17
03/22 11:56:06 group group_japan dynamic quota for 11106 slots = 233.000
03/22 11:56:06 Group Table : group group_japan quota 233.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_karlsruhe dynamic quota for 11106 slots = 55.000
03/22 11:56:06 Group Table : group group_karlsruhe quota 55.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_mit dynamic quota for 11106 slots = 33.000
03/22 11:56:06 Group Table : group group_mit quota 33.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_physmon dynamic quota for 11106 slots = 11.000
03/22 11:56:06 Group Table : group group_physmon quota 11.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_prd dynamic quota for 11106 slots = 815.000
03/22 11:56:06 Group Table : group group_prd quota 815.000 usage 299.000 prio 36.69
03/22 11:56:06 group group_sam dynamic quota for 11106 slots = 277.000
03/22 11:56:06 Group Table : group group_sam quota 277.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_fixedwntest dynamic quota for 11106 slots = 55.000
03/22 11:56:06 Group Table : group group_fixedwntest quota 55.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_fnal dynamic quota for 11106 slots = 233.000
03/22 11:56:06 Group Table : group group_fnal quota 233.000 usage 173.000 prio 74.25
03/22 11:56:06 group group_highprio dynamic quota for 11106 slots = 888.000
03/22 11:56:06 Group Table : group group_highprio quota 888.000 usage 147.000 prio 16.55
03/22 11:56:06 group group_ntp dynamic quota for 11106 slots = 916.000
03/22 11:56:06 Group Table : group group_ntp quota 916.000 usage 567.000 prio 61.90
03/22 11:56:06 group group_mcprod dynamic quota for 11106 slots = 520.000
HERE --------> 03/22 11:56:06 Group Table : group group_mcprod quota 520.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_btagging dynamic quota for 11106 slots = 222.000
03/22 11:56:06 Group Table : group group_btagging quota 222.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_dbg dynamic quota for 11106 slots = 55.000
03/22 11:56:06 Group Table : group group_dbg quota 55.000 usage 0.000 prio 0.00
03/22 11:56:06 Group group_alignment - skipping, no submitters
03/22 11:56:06 Group group_calib - skipping, no submitters
03/22 11:56:06 Group group_dqm - skipping, no submitters
03/22 11:56:06 Group group_florida - skipping, no submitters
03/22 11:56:06 Group group_japan - skipping, no submitters
03/22 11:56:06 Group group_karlsruhe - skipping, no submitters
03/22 11:56:06 Group group_mit - skipping, no submitters
03/22 11:56:06 Group group_physmon - skipping, no submitters
03/22 11:56:06 Group group_sam - skipping, no submitters
03/22 11:56:06 Group group_fixedwntest - skipping, no submitters
03/22 11:56:06 Group group_mcprod - negotiating
03/22 11:56:06 Phase 3: Sorting submitter ads by priority ...
03/22 11:56:06 Phase 4.1: Negotiating with schedds ...
03/22 11:56:06 numSlots = 520
03/22 11:56:06 slotWeightTotal = 520.000000
03/22 11:56:06 pieLeft = 520.000
03/22 11:56:06 NormalFactor = 1.000000
03/22 11:56:06 MaxPrioValue = 25528.660156
03/22 11:56:06 NumSubmitterAds = 1
03/22 11:56:06 Negotiating with group_MCprod.vellidis@xxxxxxxx at <131.225.240.215:38554>
03/22 11:56:06 0 seconds so far
03/22 11:56:06 Calculating submitter limit with the following parameters
03/22 11:56:06 SubmitterPrio = 25528.660156
03/22 11:56:06 SubmitterPrioFactor = 20.000000
03/22 11:56:06 submitterShare = 1.000000
03/22 11:56:06 submitterAbsShare = 1.000000
03/22 11:56:06 submitterLimit = 520.000000
HERE ---------> 03/22 11:56:06 submitterUsage = 3591.000000
03/22 11:56:06 Socket to group_MCprod.vellidis@xxxxxxxx (<131.225.240.215:38554>) already in cache, reusing
03/22 11:56:06 Sending SEND_JOB_INFO/eom
03/22 11:56:06 Getting reply from schedd ...
03/22 11:56:06 Got JOB_INFO command; getting classad/eom
03/22 11:56:06 Request 17947890.00000:
03/22 11:56:06 matchmakingAlgorithm: limit 520.000000 used 0.000000 pieLeft 520.000000
03/22 11:56:06 Start of sorting MatchList (len=44)
03/22 11:56:06 Finished sorting MatchList
03/22 11:56:06 Connecting to startd glidein_5068@xxxxxxxxxxxxxxxxxxxx at <131.225.238.42:43337>
03/22 11:56:06 Sending PERMISSION, claim id, startdAd to schedd
03/22 11:56:06 Matched 17947890.0 group_MCprod.vellidis@xxxxxxxx <131.225.240.215:38554> preempting none <131.225.238.42:43337> glidein_5068@xxxxxxxxxxxxxxxxxxxx
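
To look for the same disagreement across a longer negotiator log, a small script along the following lines might help. It is only a sketch: it assumes the log lines have exactly the format shown above, it assumes that group names differing only in case refer to the same group (so it lowercases them before comparing), and the function names are made up for this example rather than taken from Condor.

```python
import re

# Patterns follow the log format shown above; \s+ tolerates mail line-wrapping.
GROUP_TABLE = re.compile(
    r"Group\s+Table\s+:\s+group\s+(\S+)\s+quota\s+([\d.]+)"
    r"\s+usage\s+([\d.]+)\s+prio\s+([\d.]+)"
)
# Requiring "@" skips the "Negotiating with schedds ..." phase line.
NEGOTIATING = re.compile(r"Negotiating with (\S+@\S+)")
SUBMITTER_USAGE = re.compile(r"submitterUsage\s+=\s+([\d.]+)")


def group_table(log_text):
    """Map lowercased group name -> (quota, usage) from Group Table lines."""
    return {
        m.group(1).lower(): (float(m.group(2)), float(m.group(3)))
        for m in GROUP_TABLE.finditer(log_text)
    }


def latest_submitter_usage(log_text):
    """Map submitter name -> most recent submitterUsage value in the log."""
    usage = {}
    current = None
    for line in log_text.splitlines():
        m = NEGOTIATING.search(line)
        if m:
            current = m.group(1)
            continue
        m = SUBMITTER_USAGE.search(line)
        if m and current is not None:
            usage[current] = float(m.group(1))
    return usage


def mismatches(log_text):
    """List (group, table_usage, submitter_usage) where the two disagree."""
    table = group_table(log_text)
    per_group = {}
    for name, used in latest_submitter_usage(log_text).items():
        # Assumption: the group is the part of the submitter name before ".".
        group = name.split(".", 1)[0].lower()
        per_group[group] = per_group.get(group, 0.0) + used
    return [
        (group, table[group][1], used)
        for group, used in sorted(per_group.items())
        if group in table and table[group][1] != used
    ]


sample = """03/22 11:56:06 Group Table : group group_mcprod quota
520.000 usage 0.000 prio 0.00
03/22 11:56:06 Negotiating with group_MCprod.vellidis@xxxxxxxx at
<131.225.240.215:38554>
03/22 11:56:06 submitterUsage = 3591.000000
"""
print(mismatches(sample))  # [('group_mcprod', 0.0, 3591.0)]
```

Run against the excerpt above, this flags group_mcprod: the Group Table line reports 0 while the submitter line reports 3591.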
[cdfcaf@fcdfhead10 /export/condor_local/spool] condor_userprio -getreslist group_MCprod.vellidis@xxxxxxxx | tail -1
Number of Resources Used: 3579
[cdfcaf@fcdfhead10 /export/condor_local/spool] condor_userprio -getreslist group_mcprod.vellidis@xxxxxxxx | tail -1
Number of Resources Used: 0
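
The two condor_userprio queries above differ only in the case of the group name (group_MCprod versus group_mcprod), and the Group Table lines print the name in lowercase. The Python sketch below illustrates how a case-sensitive usage table would produce this pattern. It is purely a guess at the mechanism, not Condor's accounting code, and the table and function names are hypothetical.

```python
# Hypothetical usage table, keyed by the submitter name exactly as recorded.
usage_by_name = {"group_MCprod.vellidis": 3579}


def usage_exact(name):
    """Case-sensitive lookup: a name that is not a key counts as zero usage."""
    return usage_by_name.get(name, 0)


def usage_folded(name):
    """Case-insensitive lookup: sum usage over keys equal after lowercasing."""
    key = name.lower()
    return sum(n for k, n in usage_by_name.items() if k.lower() == key)


print(usage_exact("group_MCprod.vellidis"))   # 3579
print(usage_exact("group_mcprod.vellidis"))   # 0
print(usage_folded("group_mcprod.vellidis"))  # 3579
```

If something like this is what is happening, it would explain the group table showing usage 0 for group_mcprod while the submitter itself shows thousands of slots in use.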