Re: [Condor-users] condor_negotiator/condor_collector scheduling problem
- Date: Fri, 05 May 2006 16:36:53 -0400
- From: Armen Babikyan <armenb@xxxxxxxxxx>
- Subject: Re: [Condor-users] condor_negotiator/condor_collector scheduling problem
Hi Erik,
Upon further examination, I think that my condor_negotiator isn't
checking to see if all the jobs that are currently idle can be matched.
Rather, it seems to stop on the first "no match found". Below is a
snippet of two iterations of the negotiation cycle from NegotiatorLog.
I'm not sure why the Negotiator "gets" a NO_MORE_JOBS message, since
there are 6 more jobs that it is not considering. Is there a flag I
need to pass to the Negotiator to force it to consider all idle jobs in
every iteration of the negotiation cycle instead of just stopping at the
first?
I've been deleting the spool directory every time I try this. Jobs 1-4
require MY_RESOURCE_1, and Jobs 5-8 require MY_RESOURCE_2. I also
changed the submit files so the dummy programs sleep for 90 seconds
instead of 600. I should also mention that despite the fact that my
negotiation cycles are set to run more frequently, I get exactly the
same problem when I use the default timings.
In case the inline snippet isn't helpful enough, I've uploaded all the
log files to:
http://www.static.net/~armenb/condor-negotiator-problem/
q1.txt and s1.txt are dumps of condor_q -l and condor_status -l when all
jobs are running, and q2.txt and s2.txt are dumps of the same programs
when Condor runs the first MY_RESOURCE_2-needing program concurrently
with the last MY_RESOURCE_1-needing program.
Please let me know if you have any questions. Thanks!
- Armen
5/5 15:56:05 ---------- Started Negotiation Cycle ----------
5/5 15:56:05 Phase 1: Obtaining ads from collector ...
5/5 15:56:05 Getting all public ads ...
5/5 15:56:05 Sorting 8 ads ...
5/5 15:56:05 Getting startd private ads ...
5/5 15:56:05 Got ads: 8 public and 4 private
5/5 15:56:05 Public ads include 1 submitter, 4 startd
5/5 15:56:05 Phase 2: Performing accounting ...
5/5 15:56:05 Phase 3: Sorting submitter ads by priority ...
5/5 15:56:05 Phase 4.1: Negotiating with schedds ...
5/5 15:56:05 Negotiating with armenb@xxxxxxxxxxxxxxxxxxxxxxxxx at <155.34.66.121:50431>
5/5 15:56:05 0 seconds so far
5/5 15:56:05 Request 00001.00000:
5/5 15:56:05 Matched 1.0 armenb@xxxxxxxxxxxxxxxxxxxxxxxxx <155.34.66.121:50431> preempting none <155.34.66.121:50432> vm1@xxxxxxxxxxxxxxxxxxxxxxxxx
5/5 15:56:05 Successfully matched with vm1@xxxxxxxxxxxxxxxxxxxxxxxxx
5/5 15:56:05 Got NO_MORE_JOBS; done negotiating
5/5 15:56:05 ---------- Finished Negotiation Cycle ----------
5/5 15:56:25 ---------- Started Negotiation Cycle ----------
5/5 15:56:25 Phase 1: Obtaining ads from collector ...
5/5 15:56:25 Getting all public ads ...
5/5 15:56:25 Sorting 8 ads ...
5/5 15:56:25 Getting startd private ads ...
5/5 15:56:25 Got ads: 8 public and 4 private
5/5 15:56:25 Public ads include 1 submitter, 4 startd
5/5 15:56:25 Phase 2: Performing accounting ...
5/5 15:56:25 Phase 3: Sorting submitter ads by priority ...
5/5 15:56:25 Phase 4.1: Negotiating with schedds ...
5/5 15:56:25 Negotiating with armenb@xxxxxxxxxxxxxxxxxxxxxxxxx at <155.34.66.121:50431>
5/5 15:56:25 0 seconds so far
5/5 15:56:25 Request 00002.00000:
5/5 15:56:25 Rejected 2.0 armenb@xxxxxxxxxxxxxxxxxxxxxxxxx <155.34.66.121:50431>: no match found
5/5 15:56:25 Got NO_MORE_JOBS; done negotiating
5/5 15:56:25 ---------- Finished Negotiation Cycle ----------
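
For anyone who wants to check the full logs for the same pattern, here is a minimal Python sketch that counts how many requests the Negotiator considers per cycle. It assumes only the line format visible in the NegotiatorLog excerpt above; the function name summarize_cycles and the embedded sample text are illustrative, not part of Condor.

```python
import re

# Sample lines shortened from the NegotiatorLog excerpt in this message.
SAMPLE_LOG = """\
5/5 15:56:05 ---------- Started Negotiation Cycle ----------
5/5 15:56:05 Request 00001.00000:
5/5 15:56:05 Matched 1.0 armenb@xxxxxxxxxxxxxxxxxxxxxxxxx <155.34.66.121:50431> preempting none
5/5 15:56:05 Successfully matched with vm1@xxxxxxxxxxxxxxxxxxxxxxxxx
5/5 15:56:05 Got NO_MORE_JOBS; done negotiating
5/5 15:56:05 ---------- Finished Negotiation Cycle ----------
5/5 15:56:25 ---------- Started Negotiation Cycle ----------
5/5 15:56:25 Request 00002.00000:
5/5 15:56:25 Rejected 2.0 armenb@xxxxxxxxxxxxxxxxxxxxxxxxx <155.34.66.121:50431>: no match found
5/5 15:56:25 Got NO_MORE_JOBS; done negotiating
5/5 15:56:25 ---------- Finished Negotiation Cycle ----------
"""

# Patterns are based only on the lines shown above; other Condor
# versions may format these messages differently (assumption).
START = re.compile(r"^(\S+ \S+) -+ Started Negotiation Cycle -+$")
REQUEST = re.compile(r"Request (\d+)\.(\d+):")
MATCHED = re.compile(r"\bMatched (\d+\.\d+)\b")
REJECTED = re.compile(r"\bRejected (\d+\.\d+)\b")


def summarize_cycles(log_text):
    """Return one dict per negotiation cycle listing the job IDs seen."""
    cycles = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        start = START.match(line)
        if start:
            current = {"started": start.group(1),
                       "requests": [], "matched": [], "rejected": []}
            cycles.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first cycle header
        req = REQUEST.search(line)
        if req:
            # "00001.00000" -> "1.0" (cluster.proc without zero padding)
            current["requests"].append(f"{int(req.group(1))}.{int(req.group(2))}")
        hit = MATCHED.search(line)
        if hit:
            current["matched"].append(hit.group(1))
        miss = REJECTED.search(line)
        if miss:
            current["rejected"].append(miss.group(1))
    return cycles


if __name__ == "__main__":
    for cycle in summarize_cycles(SAMPLE_LOG):
        print(cycle["started"], "requests:", cycle["requests"],
              "matched:", cycle["matched"], "rejected:", cycle["rejected"])
```

Run against the whole NegotiatorLog, a cycle that lists a single request while more jobs are idle would show the behavior described above.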
Erik Paulson wrote:
> On Thu, May 04, 2006 at 02:34:11PM -0400, Armen Babikyan wrote:
> > Hi Condor Team,
> > A few weeks ago I described a problem I was having with Condor not
> > scheduling jobs on available resources. I've recreated the problem in a
> > simpler way, without the need for a DAG. It seems like
> > condor_negotiator and/or condor_collector are somehow misbehaving and
> > not matching jobs when there are resources and jobs that match.
> It'd be more useful to see the output of
> condor_status -l and
> condor_q -l
> when the situation you're describing is happening, along with
> the NegotiatorLog, and possibly the ScheddLog.
> -Erik
--
Armen Babikyan
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796