Thanks - that got things going. With that negotiator option, will I be able to use something like:

match_list_length = 1
Rank = TARGET.Name != LastMatchName0

in a job submit file?

Warren

Dan Bradley wrote:

This is a bug in Condor. A fix for it has been discussed but not yet implemented. The workaround is to add the following to your fake startd ads:

RemoteUser = "fake_user"
Rank = 1.0
CurrentRank = 0.0

and to add the following to your negotiator configuration:

NEGOTIATOR_MATCHLIST_CACHING = false

--Dan

Warren Smith wrote:

Hi,

I'm working on deploying Condor-G and matchmaking. My problem is that while jobs are being matched and executed, they are only matched to a system one at a time. I'd like Condor-G to have several jobs submitted to a system at the same time.

I have a simple test job that only can match to a single class ad:

executable = /bin/hostname
arguments = --fqdn
transfer_executable = false
output = hostname-match-$(CLUSTER)-$(PROCESS).out
error = hostname-match-$(CLUSTER)-$(PROCESS).err
log = hostname-match-$(CLUSTER)-$(PROCESS).log
universe = grid
x509userproxy=/home/utexas/staff/wsmith/.globus/userproxy.pem
grid_resource = $$(GridResource)
Requirements = (Name=="tacc.lonestar.serial")
globusrsl = (maxWallTime=5)(count=1)(queue=$$(Queue))
queue 10

And the classad in Condor is:

lslogin2$ condor_status -l tacc.lonestar.serial
MyType = "Machine"
TargetType = "Job"
Requirements = (TARGET.JobUniverse == 9)
Rank = 0.000000
CurrentRank = 0.000000
WantAdRevaluate = TRUE
CurMatches = 0
Name = "tacc.lonestar.serial"
Machine = "gatekeeper.lonestar.tacc.teragrid.org"
StartdIpAddr = "<129.114.50.32>"
GridResource = "gt2 gatekeeper.lonestar.tacc.teragrid.org:2119/jobmanager-lsf"
State = "Unclaimed"
Activity = "Idle"
UpdateSequenceNumber = 1220367368
Arch = "X86_64"
OpSys = "LINUX"
LoadAvg = 0.865580
TotalMemory = 11840721
Memory = 1725537
Queue = "serial"
Priority = 0.030000
MaxWallTime = 720
MaxProcessors = 1
MyAddress = "<192.5.198.172:0>"
LastHeardFrom = 1220367369
UpdatesTotal = 1328
UpdatesSequenced = 0
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"

From the Condor manual, it seems like setting WantAdRevaluate to True will result in Condor matching multiple jobs to this system. What I'm seeing is that the jobs run one at a time on the system. Here's part of the MatchLog:

9/2 09:48:49 Matched 153.0 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 09:48:49 Rejected 153.1 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 09:53:51 Matched 153.1 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 09:53:51 Rejected 153.2 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 09:58:52 Matched 153.2 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 09:58:52 Rejected 153.3 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:03:53 Matched 153.3 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 10:03:53 Rejected 153.4 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:08:55 Matched 153.4 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 10:08:55 Rejected 153.5 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:13:56 Matched 153.5 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 10:13:56 Rejected 153.6 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:18:58 Matched 153.6 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 10:18:58 Rejected 153.7 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:24:00 Matched 153.7 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 10:24:00 Rejected 153.8 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:29:01 Matched 153.8 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 10:29:01 Rejected 153.9 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:34:02 Matched 153.9 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial

As you can see, all of the jobs get matched and run, but only one gets matched every 5 mins (every Negotiator cycle?). The serial queue on lonestar was empty so the jobs ran quickly.

The collector and negotiator are from Condor 7.1.0. I sent an earlier query to the list about a STARTD_AD_REEVAL_EXPR error message in my NegotiatorLog that I don't think is related to this...

Thanks for the help,

Warren

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/
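
For anyone who wants to check the "one match every 5 mins" observation against their own MatchLog, here is a minimal Python sketch. It is not a Condor tool. It assumes the whitespace-separated line layout seen in the excerpt quoted in this thread (date, time, event, job id) and, because the log lines carry no year, it supplies one itself. The names match_intervals and ASSUMED_YEAR are made up for this illustration.

```python
from datetime import datetime

# Assumption: the MatchLog lines carry no year, so one is supplied only to
# make the timestamps parseable. It does not affect same-year intervals.
ASSUMED_YEAR = 2008

# The first few entries from the MatchLog quoted in the thread.
MATCH_LOG = """\
9/2 09:48:49 Matched 153.0 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 09:48:49 Rejected 153.1 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 09:53:51 Matched 153.1 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 09:53:51 Rejected 153.2 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 09:58:52 Matched 153.2 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
9/2 09:58:52 Rejected 153.3 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761>: no match found
9/2 10:03:53 Matched 153.3 wsmith@xxxxxxxxxxxxxxxxx <129.114.69.97:50761> preempting none <129.114.50.32> tacc.lonestar.serial
"""


def match_intervals(log_text, year=ASSUMED_YEAR):
    """Return (job_id, seconds_since_previous_match) for each 'Matched' line
    after the first one."""
    intervals = []
    previous = None
    for line in log_text.splitlines():
        fields = line.split()
        # Expected layout: date, time, event, job id, ...; skip anything else.
        if len(fields) < 4 or fields[2] != "Matched":
            continue
        stamp = datetime.strptime(
            f"{year} {fields[0]} {fields[1]}", "%Y %m/%d %H:%M:%S"
        )
        if previous is not None:
            gap = int((stamp - previous).total_seconds())
            intervals.append((fields[3], gap))
        previous = stamp
    return intervals


if __name__ == "__main__":
    for job_id, seconds in match_intervals(MATCH_LOG):
        print(f"{job_id}: {seconds} s after the previous match")
```

Gaps that stay close to 300 seconds, as in the excerpt, are consistent with one match per negotiation cycle. If the workaround from Dan's reply takes effect, several jobs would be expected to show up as matched within the same cycle.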