Re: [Condor-users] jobs stop running when lots of people submit jobs
- Date: Tue, 20 May 2008 13:58:23 +0000 (GMT)
- From: Ben Clifford <benc@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] jobs stop running when lots of people submit jobs
On Tue, 20 May 2008, Dan Bradley wrote:
> The problem you describe sounds like a problem that was fixed in 7.0.0.
> Here's the entry in the 7.0.0 version history:
[...]
> assigned to anybody. The message in the condor_negotiator log in this
> case was this:
>
> Over submitter resource limit (0) ... only consider startd ranks
I don't see anything in NegotiatorLog that looks like that.
A negotiation cycle from the log is pasted at the end of this message.
The first job mentioned, 2255, shows this in better-analyze:
root@workshop2:/sw/condor-6.8.4/local/log# condor_q -better-analyze 2255
-- Submitter: workshop2.ci.uchicago.edu : <127.0.1.1:34935> :
workshop2.ci.uchicago.edu
---
2255.000: Run analysis summary. Of 2 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
2 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
No successful match recorded.
Last failed match: Tue May 20 08:55:42 2008
Reason for last match failure: no match found
The timestamp in the 'Last failed match' line is from a few minutes ago.
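For anyone who wants to compare these summaries across many jobs, here is a rough Python sketch that pulls the per-category machine counts out of a summary block like the one above. The function name parse_analysis_summary and the SUMMARY_SAMPLE string are my own illustration, not part of Condor, and the parsing assumes the layout shown in this message (6.8.4); other versions may format it differently.

```python
import re

# Text copied from the condor_q -better-analyze output in this message.
SUMMARY_SAMPLE = """\
2255.000: Run analysis summary. Of 2 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
2 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
"""

def parse_analysis_summary(text):
    """Return (total_machines, {category: count}) for one summary block.

    Hypothetical helper: assumes each category line starts with a count.
    """
    total = None
    counts = {}
    for raw in text.splitlines():
        line = raw.strip()
        header = re.search(r"Of (\d+) machines", line)
        if header:
            total = int(header.group(1))
            continue
        category = re.match(r"(\d+) (.+)$", line)
        if category:
            counts[category.group(2)] = int(category.group(1))
    return total, counts

if __name__ == "__main__":
    total, counts = parse_analysis_summary(SUMMARY_SAMPLE)
    print("machines:", total)
    for name, n in counts.items():
        if n:
            print(n, name)
```

In this case the only non-zero category is "match but reject the job for unknown reasons", which accounts for both machines.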
5/20 08:54:01 ---------- Started Negotiation Cycle ----------
5/20 08:54:01 Phase 1: Obtaining ads from collector ...
5/20 08:54:01 Getting all public ads ...
5/20 08:54:01 Sorting 20 ads ...
5/20 08:54:01 Getting startd private ads ...
5/20 08:54:01 Got ads: 20 public and 2 private
5/20 08:54:01 Public ads include 15 submitter, 2 startd
5/20 08:54:01 Phase 2: Performing accounting ...
5/20 08:54:01 Phase 3: Sorting submitter ads by priority ...
5/20 08:54:01 Phase 4.1: Negotiating with schedds ...
5/20 08:54:01 Negotiating with train07@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02255.00000:
5/20 08:54:01 Rejected 2255.0 train07@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
5/20 08:54:01 Negotiating with train08@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02128.00000:
5/20 08:54:01 Rejected 2128.0 train08@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
5/20 08:54:01 Negotiating with train15@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02134.00000:
5/20 08:54:01 Rejected 2134.0 train15@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
5/20 08:54:01 Negotiating with train19@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02149.00000:
5/20 08:54:01 Rejected 2149.0 train19@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
5/20 08:54:01 Negotiating with train21@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02145.00000:
5/20 08:54:01 Rejected 2145.0 train21@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
5/20 08:54:01 Negotiating with train39@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02169.00000:
5/20 08:54:01 Rejected 2169.0 train39@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
5/20 08:54:01 ---------- Finished Negotiation Cycle ----------
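To summarise a cycle like the one above without reading it line by line, a small Python sketch along these lines can list every rejected request with its reason. This is only an illustration: the helper names (unwrap, rejections) are made up, the regular expressions are guessed from the lines pasted in this message, and the unwrapping step exists only because the mailer wrapped the long log lines.

```python
import re

# A log line starts with a "month/day hh:mm:ss " timestamp; anything else is
# treated as a continuation produced by mail line-wrapping (an assumption).
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")
REJECTED = re.compile(r"Rejected (\d+\.\d+) (\S+) (<[^>]+>): (.+)$")

def unwrap(lines):
    """Join wrapped continuation lines onto the preceding timestamped line."""
    joined = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line
    return joined

def rejections(lines):
    """Return a list of (job_id, submitter, reason) for each rejected request."""
    found = []
    for line in unwrap(lines):
        m = REJECTED.search(line)
        if m:
            job_id, submitter, _schedd, reason = m.groups()
            found.append((job_id, submitter, reason))
    return found

# Excerpt copied from the negotiation cycle pasted above.
LOG_SAMPLE = """\
5/20 08:54:01 Negotiating with train07@xxxxxxxxxxxxxxxxxxxxxxxxx at
<127.0.1.1:34935>
5/20 08:54:01 0 seconds so far
5/20 08:54:01 Request 02255.00000:
5/20 08:54:01 Rejected 2255.0 train07@xxxxxxxxxxxxxxxxxxxxxxxxx
<127.0.1.1:34935>: no match found
5/20 08:54:01 Got NO_MORE_JOBS; done negotiating
""".splitlines()

if __name__ == "__main__":
    for job_id, submitter, reason in rejections(LOG_SAMPLE):
        print(job_id, submitter, reason)
```

Run against the whole cycle, this should report one "no match found" rejection for each of the six submitters, and no "Over submitter resource limit" lines.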