Re: [Condor-users] starter failed to connect to collector
- Date: Sat, 15 Oct 2005 14:17:01 -0500
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [Condor-users] starter failed to connect to collector
On Oct 14, 2005, at 1:44 AM, DeVoil, Peter wrote:
I have a few machines in a 30 node winXP pool that refuse to start
jobs.
I see these in the starter log:
...
10/14 14:59:38 vm2: Received match <192.168.0.162:1353>#4441711918
10/14 14:59:38 vm2: State change: match notification protocol successful
10/14 14:59:38 vm2: Changing state: Unclaimed -> Matched
10/14 15:01:38 vm2: State change: match timed out
10/14 15:01:38 vm2: Changing state: Matched -> Owner
10/14 15:01:38 vm2: State change: IS_OWNER is false
10/14 15:01:38 vm2: Changing state: Owner -> Unclaimed
...
10/14 15:04:50 DaemonCore: Command received via TCP from host <192.168.0.98:3484>
10/14 15:04:50 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
10/14 15:04:50 Error: can't find resource with capability (<192.168.0.162:1353>#4441711918)
....
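It appears the schedd that was matched to this startd took over 5
minutes to connect to it to start the job: the match arrived at
14:59:38, but REQUEST_CLAIM did not arrive until 15:04:50. By then the
startd had already timed out the match (at 15:01:38) and returned to
Unclaimed, so it no longer recognized the capability in the request.
We'd have to look at the schedd log to see why it took so long.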
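
For anyone who wants to check the timing in their own logs, here is a rough Python sketch that measures the gaps between the three relevant entries. The log lines are copied from the excerpt above with wrapped lines joined; the helper names (stamp, first_time) are made up for illustration, and the year is an assumption taken from the date of this message, since the log itself does not record one.

```python
from datetime import datetime

# Log entries copied from the excerpt above (wrapped lines joined).
LOG = """\
10/14 14:59:38 vm2: Received match <192.168.0.162:1353>#4441711918
10/14 15:01:38 vm2: State change: match timed out
10/14 15:04:50 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
"""


def stamp(line, year=2005):
    """Parse the leading 'MM/DD HH:MM:SS' timestamp of a log line.

    The log carries no year; 2005 is assumed from the message date.
    """
    return datetime.strptime(f"{year}/{line[:14]}", "%Y/%m/%d %H:%M:%S")


def first_time(lines, needle):
    """Return the timestamp of the first line containing `needle`."""
    for line in lines:
        if needle in line:
            return stamp(line)
    raise ValueError(f"no log line contains {needle!r}")


lines = LOG.splitlines()
matched = first_time(lines, "Received match")
timed_out = first_time(lines, "match timed out")
claim = first_time(lines, "REQUEST_CLAIM")

# Seconds from the match to the timeout, and from the match to the claim.
timeout_after = (timed_out - matched).total_seconds()
claim_after = (claim - matched).total_seconds()

print(f"match timed out after {timeout_after:.0f} s")
print(f"claim request arrived after {claim_after:.0f} s")
print("claim arrived after the timeout:", claim > timed_out)
```

If the claim request consistently arrives after the timeout, the delay is on the schedd side and that is the log to look at next.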
+----------------------------------+---------------------------------+
| Jaime Frey | Public Split on Whether |
| jfrey@xxxxxxxxxxx | Bush Is a Divider |
| http://www.cs.wisc.edu/~jfrey/ | -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+