The negotiator log for a similar run is listed below. I do not see the negotiator writing anything like the line you mentioned from your trials, i.e.:

08/22/12 11:15:24 Rejected 3.1 alice@.internal <10.220.159.79:59831>: fair share exceeded

NegotiatorLog:

09/20/12 20:01:27 Got SIGTERM. Performing graceful shutdown.
09/20/12 20:01:27 **** condor_negotiator (condor_NEGOTIATOR) pid 27571 EXITING WITH STATUS 0
09/20/12 20:03:09 Setting maximum accepts per cycle 8.
09/20/12 20:03:09 ******************************************************
09/20/12 20:03:09 ** condor_negotiator (CONDOR_NEGOTIATOR) STARTING UP
09/20/12 20:03:09 ** /usr/local/condor-7.8.0/sbin/condor_negotiator
09/20/12 20:03:09 ** SubsystemInfo: name=NEGOTIATOR type=NEGOTIATOR(4) class=DAEMON(1)
09/20/12 20:03:09 ** Configuration: subsystem:NEGOTIATOR local:<NONE> class:DAEMON
09/20/12 20:03:09 ** $CondorVersion: 7.8.0 May 08 2012 $
09/20/12 20:03:09 ** $CondorPlatform: x86_64_rhap_5.7 $
09/20/12 20:03:09 ** PID = 27664
09/20/12 20:03:09 ** Log last touched 9/20 20:01:27
09/20/12 20:03:09 ******************************************************
09/20/12 20:03:09 Using config source: /usr/local/condor-7.8.0/etc/condor_config
09/20/12 20:03:09 Using local config sources:
09/20/12 20:03:09    /usr/local/condor/condor_config.local
09/20/12 20:03:09 DaemonCore: command socket at <10.0.3.124:34940>
09/20/12 20:03:09 DaemonCore: private command socket at <10.0.3.124:34940>
09/20/12 20:03:09 Setting maximum accepts per cycle 8.
09/20/12 20:03:09 About to rotate ClassAd log /usr/local/condor/spool/Accountantnew.log
09/20/12 20:03:09 NEGOTIATOR_SOCKET_CACHE_SIZE = 16
09/20/12 20:03:09 PREEMPTION_REQUIREMENTS = ((SubmitterGroup =?= RemoteGroup) && ((time() - EnteredCurrentState) > (1 * (60 * 60))) && (RemoteUserPrio > TARGET.SubmitterUserPrio * 1.2)) || (MY.NiceUser == True)
09/20/12 20:03:09 ACCOUNTANT_HOST = None (local)
09/20/12 20:03:09 NEGOTIATOR_INTERVAL = 60 sec
09/20/12 20:03:09 NEGOTIATOR_TIMEOUT = 30 sec
09/20/12 20:03:09 MAX_TIME_PER_SUBMITTER = 31536000 sec
09/20/12 20:03:09 MAX_TIME_PER_PIESPIN = 31536000 sec
09/20/12 20:03:09 PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
09/20/12 20:03:09 NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
09/20/12 20:03:09 NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * (KFlops - SlotID - 1.0e10*(Offline=?=True))
09/20/12 20:03:09 ---------- Started Negotiation Cycle ----------
09/20/12 20:03:09 Phase 1: Obtaining ads from collector ...
09/20/12 20:03:09 Getting Scheduler, Submitter and Machine ads ...
09/20/12 20:03:09 Sorting 19 ads ...
09/20/12 20:03:09 Getting startd private ads ...
09/20/12 20:03:09 Got ads: 19 public and 16 private
09/20/12 20:03:09 Public ads include 2 submitter, 16 startd
09/20/12 20:03:09 Phase 2: Performing accounting ...
09/20/12 20:03:09 Phase 3: Sorting submitter ads by priority ...
09/20/12 20:03:09 Phase 4.1: Negotiating with schedds ...
09/20/12 20:03:09 Negotiating with eitan@xxxxxxxxxxx at <10.0.3.124:57920>
09/20/12 20:03:09 0 seconds so far
09/20/12 20:03:09 Request 00288.00000:
09/20/12 20:03:09 Matched 288.0 eitan@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot9@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot9@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00288.00001:
09/20/12 20:03:09 Matched 288.1 eitan@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot10@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot10@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00288.00002:
09/20/12 20:03:09 Matched 288.2 eitan@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot11@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot11@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00288.00003:
09/20/12 20:03:09 Matched 288.3 eitan@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot12@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot12@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00288.00004:
09/20/12 20:03:09 Matched 288.4 eitan@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot13@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot13@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Got NO_MORE_JOBS; done negotiating
09/20/12 20:03:09 Negotiating with leader@xxxxxxxxxxx at <10.0.3.124:57920>
09/20/12 20:03:09 0 seconds so far
09/20/12 20:03:09 Request 00289.00000:
09/20/12 20:03:09 Matched 289.0 leader@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot14@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot14@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00289.00001:
09/20/12 20:03:09 Matched 289.1 leader@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot15@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot15@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00289.00002:
09/20/12 20:03:09 Matched 289.2 leader@xxxxxxxxxxx <10.0.3.124:57920> preempting none <10.0.3.124:36464> slot16@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Successfully matched with slot16@xxxxxxxxxxxxxxxxxxxxxx
09/20/12 20:03:09 Request 00289.00003:
09/20/12 20:03:09 Rejected 289.3 leader@xxxxxxxxxxx <10.0.3.124:57920>: no match found
09/20/12 20:03:09 Got NO_MORE_JOBS; done negotiating
09/20/12 20:03:09 negotiateWithGroup resources used scheddAds length 0
09/20/12 20:03:09 ---------- Finished Negotiation Cycle ----------

--
Yuval Leader
Design Automation Engineer, Mellanox Technologies
mailto: leader@xxxxxxxxxxxx
Tel: +972-74-7236360
Fax: +972-4-9593245
Beit Mellanox, 6th Floor, R-620
P.O. Box 586, Yokneam Industrial Park, 20692 Israel

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
On Behalf Of Yuval Leader

Hi,

I am finally in a position to verify the allocations kindly described by Ian Chesal on 22-Aug (see thread below). I am trying to verify that a fair share of resources is indeed matched to 2 users with the same user priority, both submitting jobs of the same job priority. I expect that a limited set of equal resources will be shared 50-50 between the users. The actual result I am seeing is that one user still gets all of its requests filled.

Background: the machine I have has 16 CPUs, thus giving me 16 slots. I have assigned a custom startd attribute to just 8 of the slots: MT_Model = "PowerEdge 1850". The other 8 slots have different values for this attribute.

My test flow:

1. The 2 users are assigned the same user priority.
2. The first user submits a cluster with 'queue 5'; the job requirements include: TARGET.MT_Model =?= "PowerEdge 1850".
3. The second user submits a cluster with 'queue 6' and the same job requirements: TARGET.MT_Model =?= "PowerEdge 1850".
4. I expect that each user will get exactly 4 slots and have its remaining jobs left unmatched.
5. What I actually see is that the second user gets all 6 of its requested slots while the first gets only the remaining 2, with 3 jobs left unmatched.

I would like to understand what I am doing wrong here.
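For reference, the submit description I use for each cluster is essentially the following sketch (the executable and file names are placeholders rather than my actual test scripts; only the requirements expression and the queue count matter for this test):

universe     = vanilla
# Placeholder job; stands in for the real simulation command.
executable   = simul.sh
log          = simul.log
output       = simul.$(Cluster).$(Process).out
error        = simul.$(Cluster).$(Process).err
# Match only the 8 slots that advertise the custom attribute.
requirements = TARGET.MT_Model =?= "PowerEdge 1850"
queue 5

The second user's file is identical except for 'queue 6'.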
Test flow commands:

The users' priorities are set to an equal value:

>condor_userprio -setprio eitan@xxxxxxxxxxx 3
The priority of eitan@xxxxxxxxxxx was set to 3.000000

>condor_userprio -setprio leader@xxxxxxxxxxx 3
The priority of leader@xxxxxxxxxxx was set to 3.000000

>condor_userprio -all
Last Priority Update: 9/20 19:52
                      Effective   Real      Priority  Res     Total Usage   Usage             Last              Time Since
User Name             Priority    Priority  Factor    In Use  (wghted-hrs)  Start Time        Usage Time        Last Usage
------------------    ---------   --------  --------  ------  ------------  ----------------  ----------------  ----------
eitan@xxxxxxxxxxx          3.00       3.00      1.00       0         17.06   8/23/2012 10:28   9/20/2012 19:52       <now>
leader@xxxxxxxxxxx         3.00       3.00      1.00       0         28.20   8/23/2012 10:28   9/20/2012 19:52       <now>
------------------    ---------   --------  --------  ------  ------------  ----------------  ----------------  ----------
Number of users: 2                                         0         45.26   9/19/2012 19:53           0+23:59

I turn OFF the negotiator:

>condor_off -negotiator
Sent "Kill-Daemon" command for "negotiator" to local master

User eitan then executes:
>condor_submit eitan:ENG:hmake_or_jk:eitan:5:3.simul.cmd
Submitting job(s).....
5 job(s) submitted to cluster 282.
>condor_prio -p 17 282

User leader then executes:
>condor_submit aviram:SW:umake:leader:6:3.simul.cmd
Submitting job(s)......
6 job(s) submitted to cluster 283.
>condor_prio -p 17 283

The resulting pending queue is as expected:

>condor_q
-- Submitter: MTLSLURM02.yok.mtl.com : <10.0.3.124:57920> : MTLSLURM02.yok.mtl.com
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
282.0   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
282.1   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
282.2   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
282.3   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
282.4   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
283.0   leader   9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le
283.1   leader   9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le
283.2   leader   9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le
283.3   leader   9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le
283.4   leader   9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le
283.5   leader   9/20 19:17   0+00:00:00 I  17  0.0  aviram:SW:umake:le

11 jobs; 0 completed, 0 removed, 11 idle, 0 running, 0 held, 0 suspended

I turn ON the negotiator and see the results:

>condor_on -negotiator
Sent "Spawn-Daemon" command for "negotiator" to local master

>condor_q
-- Submitter: MTLSLURM02.yok.mtl.com : <10.0.3.124:57920> : MTLSLURM02.yok.mtl.com
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
282.0   eitan    9/20 19:16   0+00:00:01 R  17  0.0  eitan:ENG:hmake_or
282.1   eitan    9/20 19:16   0+00:00:01 R  17  0.0  eitan:ENG:hmake_or
282.2   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
282.3   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
282.4   eitan    9/20 19:16   0+00:00:00 I  17  0.0  eitan:ENG:hmake_or
283.0   leader   9/20 19:17   0+00:00:02 R  17  0.0  aviram:SW:umake:le
283.1   leader   9/20 19:17   0+00:00:02 R  17  0.0  aviram:SW:umake:le
283.2   leader   9/20 19:17   0+00:00:02 R  17  0.0  aviram:SW:umake:le
283.3   leader   9/20 19:17   0+00:00:01 R  17  0.0  aviram:SW:umake:le
283.4   leader   9/20 19:17   0+00:00:01 R  17  0.0  aviram:SW:umake:le
283.5   leader   9/20 19:17   0+00:00:01 R  17  0.0  aviram:SW:umake:le

11 jobs; 0 completed, 0 removed, 3 idle, 8 running, 0 held, 0 suspended

# As you can see, user leader got 6 matches ("R") while user eitan got only the remaining 2.
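(As a quick cross-check, counting running jobs per owner with condor_q's -constraint and -format options shows the same split; this pipeline is just a convenience, and in this state it prints 2 for eitan and 6 for leader.)

>condor_q -constraint 'JobStatus == 2' -format "%s\n" Owner | sort | uniq -c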
>condor_q -anal
-- Submitter: MTLSLURM02.yok.mtl.com : <10.0.3.124:57920> : MTLSLURM02.yok.mtl.com
--- 282.000: Request is being serviced
--- 282.001: Request is being serviced
--- 282.002: Run analysis summary. Of 16 machines,
    8 are rejected by your job's requirements
    0 reject your job because of their own requirements
    8 match but are serving users with a better priority in the pool
    0 match but reject the job for unknown reasons
    0 match but will not currently preempt their existing job
    0 match but are currently offline
    0 are available to run your job
    No successful match recorded.
    Last failed match: Thu Sep 20 19:35:58 2012
    Reason for last match failure: insufficient priority

The Requirements expression for your job is:

( ( TARGET.MT_Model is "PowerEdge 1850" ) &&
  ( target.MT_SimulTime >= 1373881200 ) && ( target.MT_SimulTime <= 1376047080 ) &&
  ( target.MT_SimulMachineState == "unclaimed" ) ) &&
( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

    Condition                                                                        Machines Matched  Suggestion
    ---------                                                                        ----------------  ----------
1   ( TARGET.MT_Model is "PowerEdge 1850" )                                          8
2   ( target.MT_SimulTime >= 1373881200 )                                            16
3   ( target.MT_SimulTime <= 1376047080 )                                            16
4   ( target.MT_SimulMachineState == "unclaimed" )                                   16
5   ( TARGET.Arch == "X86_64" )                                                      16
6   ( TARGET.OpSys == "LINUX" )                                                      16
7   ( TARGET.Disk >= 1 )                                                             16
8   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )        16
9   ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "yok.mtl.com" ) )   16

--- 282.003: Run analysis summary. Of 16 machines,
    8 are rejected by your job's requirements
    0 reject your job because of their own requirements
    8 match but are serving users with a better priority in the pool
    0 match but reject the job for unknown reasons
    0 match but will not currently preempt their existing job
    0 match but are currently offline
    0 are available to run your job

The Requirements expression for your job is:

( ( TARGET.MT_Model is "PowerEdge 1850" ) &&
  ( target.MT_SimulTime >= 1373881200 ) && ( target.MT_SimulTime <= 1376047080 ) &&
  ( target.MT_SimulMachineState == "unclaimed" ) ) &&
( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

    Condition                                                                        Machines Matched  Suggestion
    ---------                                                                        ----------------  ----------
1   ( .RIGHT.MT_Model is "PowerEdge 1850" )                                          8
2   ( .RIGHT.MT_SimulTime >= 1373881200 )                                            16
3   ( .RIGHT.MT_SimulTime <= 1376047080 )                                            16
4   ( .RIGHT.MT_SimulMachineState == "unclaimed" )                                   16
5   ( .RIGHT.Arch == "X86_64" )                                                      16
6   ( .RIGHT.OpSys == "LINUX" )                                                      16
7   ( .RIGHT.Disk >= 1 )                                                             16
8   ( .RIGHT.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )        16
9   ( ( .RIGHT.HasFileTransfer ) || ( .RIGHT.FileSystemDomain == "yok.mtl.com" ) )   16

--- 282.004: Run analysis summary. Of 16 machines,
    8 are rejected by your job's requirements
    0 reject your job because of their own requirements
    8 match but are serving users with a better priority in the pool
    0 match but reject the job for unknown reasons
    0 match but will not currently preempt their existing job
    0 match but are currently offline
    0 are available to run your job

The Requirements expression for your job is:

( ( TARGET.MT_Model is "PowerEdge 1850" ) &&
  ( target.MT_SimulTime >= 1373881200 ) && ( target.MT_SimulTime <= 1376047080 ) &&
  ( target.MT_SimulMachineState == "unclaimed" ) ) &&
( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

    Condition                                                                        Machines Matched  Suggestion
    ---------                                                                        ----------------  ----------
1   ( .RIGHT.MT_Model is "PowerEdge 1850" )                                          8
2   ( .RIGHT.MT_SimulTime >= 1373881200 )                                            16
3   ( .RIGHT.MT_SimulTime <= 1376047080 )                                            16
4   ( .RIGHT.MT_SimulMachineState == "unclaimed" )                                   16
5   ( .RIGHT.Arch == "X86_64" )                                                      16
6   ( .RIGHT.OpSys == "LINUX" )                                                      16
7   ( .RIGHT.Disk >= 1 )                                                             16
8   ( .RIGHT.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )        16
9   ( ( .RIGHT.HasFileTransfer ) || ( .RIGHT.FileSystemDomain == "yok.mtl.com" ) )   16

--- 283.000: Request is being serviced
--- 283.001: Request is being serviced
--- 283.002: Request is being serviced
--- 283.003: Request is being serviced
--- 283.004: Request is being serviced
--- 283.005: Request is being serviced

<<END OF MY TEST FLOW>>

--
Yuval Leader
Design Automation Engineer, Mellanox Technologies
mailto: leader@xxxxxxxxxxxx
Tel: +972-74-7236360
Fax: +972-4-9593245
Beit Mellanox, 6th Floor, R-620
P.O. Box 586, Yokneam Industrial Park, 20692 Israel

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
On Behalf Of Ian Chesal

Hi Yuval,

On Wednesday, 22 August, 2012 at 7:35 AM, Yuval Leader wrote:
Add another assumption and yes, this is what you'll see. The other assumption you need to add is that all four users have the exact same effective user priority (EUP). See:

http://research.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html#25902

If they all have the same EUP, they'll all get exactly 1/4 of the system after one negotiation cycle, assuming everything about their jobs is equal.

This is easy enough to test. I queued up 10 sleep jobs from each of four users in a new pool that has four slots available in it. None of these users had accumulated any usage history, so all had identical EUPs of 0. Before I queued up the jobs, I shut down the negotiator with:

condor_off -negotiator

You can see the jobs ready to go:

-bash-3.2# condor_status -submitter
Name                 Machine     Running  IdleJobs  HeldJobs
alice@.internal      domU-12-31        0        10         0
bob@.internal        domU-12-31        0        10         0
eve@.internal        domU-12-31        0        10         0
test.user@.internal  domU-12-31        0        10         0

                     RunningJobs  IdleJobs  HeldJobs
alice@.internal                0        10         0
bob@.internal                  0        10         0
eve@.internal                  0        10         0
test.user@.internal            0        10         0
Total                          0        40         0

I turned on the negotiator for one negotiation cycle and got one job from each user assigned to each of the four slots in my pool:

-bash-3.2# condor_q -const 'jobstatus == 2'
-- Submitter: Q1@domU-12-31-38-04-9C-A1 : <10.220.159.79:59831> : domU-12-31-38-04-9C-A1.compute-1.internal
 ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
 2.0     test.user  8/22 11:12   0+00:05:44 R  0   0.0  sleeper.py --min=6
 3.0     alice      8/22 11:14   0+00:05:47 R  0   0.0  sleeper.py --min=6
 4.0     bob        8/22 11:14   0+00:05:47 R  0   0.0  sleeper.py --min=6
 5.0     eve        8/22 11:14   0+00:05:45 R  0   0.0  sleeper.py --min=6

08/22/12 11:15:23 ---------- Started Negotiation Cycle ----------
08/22/12 11:15:23 Phase 1: Obtaining ads from collector ...
08/22/12 11:15:23 Getting all public ads ...
08/22/12 11:15:24 Sorting 17 ads ...
08/22/12 11:15:24 Getting startd private ads ...
08/22/12 11:15:24 Got ads: 17 public and 4 private
08/22/12 11:15:24 Public ads include 4 submitter, 4 startd
08/22/12 11:15:24 Phase 2: Performing accounting ...
08/22/12 11:15:24 Phase 3: Sorting submitter ads by priority ...
08/22/12 11:15:24 Phase 4.1: Negotiating with schedds ...
08/22/12 11:15:24 Negotiating with alice@.internal at <10.220.159.79:59831>
08/22/12 11:15:24 0 seconds so far
08/22/12 11:15:24 Request 00003.00000:
08/22/12 11:15:24 Matched 3.0 alice@.internal <10.220.159.79:59831> preempting none <10.123.7.99:57106> ip-10-123-7-99.ec2.internal
08/22/12 11:15:24 Successfully matched with ip-10-123-7-99.ec2.internal
08/22/12 11:15:24 Request 00003.00001:
08/22/12 11:15:24 Rejected 3.1 alice@.internal <10.220.159.79:59831>: fair share exceeded
08/22/12 11:15:24 Got NO_MORE_JOBS; done negotiating
08/22/12 11:15:24 Negotiating with bob@.internal at <10.220.159.79:59831>
08/22/12 11:15:24 0 seconds so far
08/22/12 11:15:24 Request 00004.00000:
08/22/12 11:15:24 Matched 4.0 bob@.internal <10.220.159.79:59831> preempting none <10.93.21.85:53716> ip-10-93-21-85.ec2.internal
08/22/12 11:15:24 Successfully matched with ip-10-93-21-85.ec2.internal
08/22/12 11:15:24 Request 00004.00001:
08/22/12 11:15:24 Rejected 4.1 bob@.internal <10.220.159.79:59831>: fair share exceeded
08/22/12 11:15:25 Got NO_MORE_JOBS; done negotiating
08/22/12 11:15:25 Negotiating with eve@.internal at <10.220.159.79:59831>
08/22/12 11:15:25 0 seconds so far
08/22/12 11:15:25 Request 00005.00000:
08/22/12 11:15:25 Matched 5.0 eve@.internal <10.220.159.79:59831> preempting none <10.127.163.251:50135> ip-10-127-163-251.ec2.internal
08/22/12 11:15:25 Successfully matched with ip-10-127-163-251.ec2.internal
08/22/12 11:15:25 Request 00005.00001:
08/22/12 11:15:25 Rejected 5.1 eve@.internal <10.220.159.79:59831>: fair share exceeded
08/22/12 11:15:25 Got NO_MORE_JOBS; done negotiating
08/22/12 11:15:25 Negotiating with test.user@.internal at <10.220.159.79:59831>
08/22/12 11:15:25 0 seconds so far
08/22/12 11:15:25 Request 00002.00000:
08/22/12 11:15:25 Matched 2.0 test.user@.internal <10.220.159.79:59831> preempting none <10.220.109.195:45947> domU-12-31-38-04-6E-39.compute-1.internal
08/22/12 11:15:25 Successfully matched with domU-12-31-38-04-6E-39.compute-1.internal
08/22/12 11:15:25 Reached submitter resource limit: 1.000000 ... stopping
08/22/12 11:15:25 negotiateWithGroup resources used scheddAds length 4
08/22/12 11:15:25 ---------- Finished Negotiation Cycle ----------

Condor determines the fair-share allotments at the outset of the negotiation cycle, so it stopped after each user got one machine -- their fair share.
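To make that up-front computation concrete, here is the back-of-the-envelope version (a simplification of the pie-slice model in the manual page linked above; the real negotiator also folds in priority factors, preemption policy, and group quotas):

share(u) = (1/EUP_u) / sum over all submitters v of (1/EUP_v)

4 submitters with equal EUPs  =>  share = 1/4 each
4 slots * 1/4                 =>  1 slot per submitter per cycle

That one-slot allotment is exactly the "Reached submitter resource limit: 1.000000 ... stopping" line in the log above.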
Yes. This is what will happen. Again, assuming their EUPs are all equal.
No, that's not what happens. The negotiator determines up front, using the EUP of each submitter, what each submitter's fair share of the machines should be for this negotiation cycle, and based on that it moves through each submitter's list of idle jobs and tries to match them to slots. If the EUPs of your users aren't all identical, the allocations will not be equal: some users will get more because they've used less in the recent past, and some will get less because they've used more in the recent past.
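If you want a clean-slate comparison in a test pool, you can wipe the accumulated usage before submitting (the user names below are from your pool; condor_userprio's -resetusage flag clears one user's history):

condor_userprio -resetusage eitan@xxxxxxxxxxx
condor_userprio -resetusage leader@xxxxxxxxxxx

Or make history decay faster through the negotiator's configuration, e.g.:

# Default is 86400 seconds (one day); smaller values make past
# usage stop influencing EUP sooner.
PRIORITY_HALFLIFE = 3600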
Only if you also add the assumption that all EUPs are identical for the users.
Accounting groups help ensure that, regardless of EUP, people get some minimum (and possibly maximum) number of slots in your pool when they have jobs in a queue. If you wanted each user to always get 50 machines, but to be able to use more than 50 machines when other users aren't using theirs, you'd set up soft quotas for 4 different groups and put each user in a unique group (a configuration sketch is in the P.S. below). Condor will then attempt to fulfill the quotas first and, once all the quotas have been satisfied, it'll let the excess free resources be used, fair share, by anyone who has a soft quota limit.

Regards,
- Ian

---
Ian Chesal
Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools
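P.S. A minimal sketch of that soft-quota setup, in case it helps. The group names and quota numbers are illustrative, not a drop-in config; GROUP_ACCEPT_SURPLUS is the 7.8-era knob, while older pools used GROUP_AUTOREGROUP for the same effect.

# Negotiator configuration:
GROUP_NAMES = group_a, group_b, group_c, group_d
GROUP_QUOTA_group_a = 50
GROUP_QUOTA_group_b = 50
GROUP_QUOTA_group_c = 50
GROUP_QUOTA_group_d = 50
# Soft quotas: groups may exceed their quota when other
# groups leave resources unused.
GROUP_ACCEPT_SURPLUS = True

And each user's submit description joins its group with a line like:

+AccountingGroup = "group_a.alice"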