Re: [Condor-users] Failed to send REQUEST_CLAIM to startd
- Date: Tue, 23 Jun 2009 07:38:29 -0400
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] Failed to send REQUEST_CLAIM to startd
陈婷 wrote:
> Hi everyone,
> There are five machines in the pool. The result of executing condor_status is as follows:
> ============================================================
> Name              OpSys  Arch   State      Activity  LoadAv  Mem  ActvtyTime
> 10.10.4.214       LINUX  INTEL  Unclaimed  Idle      0.000   512  0+00:00:04
> slot1@xxxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   442  0+16:42:40
> slot2@xxxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   442  0+03:05:06
> slot3@xxxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   442  0+17:02:43
> slot4@xxxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   442  0+02:50:08
>
>              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
> INTEL/LINUX      5      0        0          4        1           0         0
>       Total      5      0        0          4        1           0         0
> =============================================================
>
> Machine "10.10.4.214" is a virtual machine with Condor installed. I submit a job from m2m.jsi.cn, and the content of test.cmd is:
> ====================================
> Universe = vanilla
> CMD = test.bat
> output = condor.output
> error = condor.error
> log = condor.log
> Requirements = Name == "10.10.4.214"
> WhenToTransferOutput = ON_EXIT_OR_EVICT
> queue
> ====================================
>
> The job cannot be dispatched to "10.10.4.214".
>
> Here is the result when I execute condor_q -analyze.
> ======================================================
> -- Submitter: m2m.jsi.cn : <10.10.3.11:35384> : m2m.jsi.cn
> ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
> ---
> 066.000: Run analysis summary. Of 5 machines,
> 4 are rejected by your job's requirements
> 1 reject your job because of their own requirements
> 0 match but are serving users with a better priority in the pool
> 0 match but reject the job for unknown reasons
> 0 match but will not currently preempt their existing job
> 0 are available to run your job
> Last successful match: Fri Jun 19 16:26:15 2009
> ======================================================
>
> The SchedLog is:
> ===================================================================================
> 6/19 16:26:15 (pid:8919) Sent ad to central manager for agrid@xxxxxx
> 6/19 16:26:15 (pid:8919) Sent ad to 1 collectors for agrid@xxxxxx
> 6/19 16:26:15 (pid:8919) Called reschedule_negotiator()
> 6/19 16:26:15 (pid:8919) Activity on stashed negotiator socket
> 6/19 16:26:15 (pid:8919) Negotiating for owner: agrid@xxxxxx
> 6/19 16:26:15 (pid:8919) Checking consistency running and runnable jobs
> 6/19 16:26:15 (pid:8919) Tables are consistent
> 6/19 16:26:15 (pid:8919) Rebuilt prioritized runnable job list in 0.000s.
> 6/19 16:26:15 (pid:8919) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
> 6/19 16:26:15 (pid:8919) Response problem from startd when requesting claim 10.10.4.214 <10.10.4.214:50611> for agrid@xxxxxx 66.0.
> 6/19 16:26:15 (pid:8919) Failed to send REQUEST_CLAIM to startd 10.10.4.214 <10.10.4.214:50611> for agrid@xxxxxx:
> 6/19 16:26:15 (pid:8919) Match record (10.10.4.214 <10.10.4.214:50611> for agrid@xxxxxx, 66.0) deleted
> 6/19 16:26:20 (pid:8919) Sent ad to central manager for agrid@xxxxxx
> 6/19 16:26:20 (pid:8919) Sent ad to 1 collectors for agrid@xxxxxx
> =====================================================================================
> The NegotiatorLog is:
> =================================================================================
> 6/19 16:26:15 ---------- Started Negotiation Cycle ----------
> 6/19 16:26:15 Phase 1: Obtaining ads from collector ...
> 6/19 16:26:15 Getting all public ads ...
> 6/19 16:26:15 Sorting 11 ads ...
> 6/19 16:26:15 Getting startd private ads ...
> 6/19 16:26:15 Got ads: 11 public and 5 private
> 6/19 16:26:15 Public ads include 1 submitter, 5 startd
> 6/19 16:26:15 Phase 2: Performing accounting ...
> 6/19 16:26:15 Phase 3: Sorting submitter ads by priority ...
> 6/19 16:26:15 Phase 4.1: Negotiating with schedds ...
> 6/19 16:26:15 Negotiating with agrid@xxxxxx at <10.10.3.11:35384>
> 6/19 16:26:15 0 seconds so far
> 6/19 16:26:15 Request 00066.00000:
> 6/19 16:26:15 Matched 66.0 agrid@xxxxxx <10.10.3.11:35384> preempting none <10.10.4.214:50611> 10.10.4.214
> 6/19 16:26:15 Successfully matched with 10.10.4.214
> 6/19 16:26:15 Got NO_MORE_JOBS; done negotiating
> 6/19 16:26:15 ---------- Finished Negotiation Cycle ----------
> ================================================================================
>
> All the information is shown above. Can anybody help me, please? Thanks very much.
>
> Jassy
You need to take a look at the StartLog on 10.10.4.214 at the time
of the issue.
Best,
matt
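
One way to follow the advice above is to pull out only the StartLog lines
stamped close to the failure time seen in the SchedLog (6/19 16:26:15).
The following is a minimal sketch, not part of the original reply. It
assumes the StartLog uses the same "M/D HH:MM:SS" line prefix as the
SchedLog excerpt, and the helper name lines_near is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the SchedLog excerpt, e.g. "6/19 16:26:15".
# Assumption: the StartLog uses the same prefix. The year is not logged,
# so it is taken from the `when` argument.
STAMP = re.compile(r"^(\d{1,2})/(\d{1,2}) (\d{2}):(\d{2}):(\d{2})\b")


def lines_near(lines, when, window_seconds=30):
    """Yield log lines stamped within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines that do not start with a timestamp
        month, day, hour, minute, second = map(int, m.groups())
        stamp = datetime(when.year, month, day, hour, minute, second)
        if abs(stamp - when) <= window:
            yield line.rstrip("\n")
```

To use it, open the StartLog on 10.10.4.214 and pass the file object
together with datetime(2009, 6, 19, 16, 26, 15). The log directory on that
machine can usually be printed with condor_config_val LOG.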