[Condor-users] Bit of a problem with HAD
- Date: Tue, 10 Jan 2006 12:37:18 -0800
- From: "Finch, Ralph" <rfinch@xxxxxxxxxxxx>
- Subject: [Condor-users] Bit of a problem with HAD
condor_version
$CondorVersion: 6.7.13 Nov 7 2005 $
$CondorPlatform: INTEL-WINNT50 $
My desktop machine and another machine are the HAD machines, and also
serve as condor executors.
When I installed this a few weeks ago things were working OK, though I
don't think I tested DAGMan then. Now, when I submit a DAGMan job, the
jobs wait in the queue for several minutes. Then a condor_exec.exe
starts on my machine (MERRIT) and runs at full CPU speed, but no other
jobs start to run.
I also get this in MERRIT's SchedLog:
1/10 12:24:19 (pid:2144) Sent ad to central manager for rfinch@xxxxxxxxxxxx
1/10 12:24:19 (pid:2144) Sent ad to 2 collectors for rfinch@xxxxxxxxxxxx
1/10 12:24:19 (pid:2144) Haven't heard from negotiator, trying to claim local startd
1/10 12:24:19 (pid:2144) Claiming local startd vm 2 at <136.200.YYYYY:1219>
1/10 12:24:19 (pid:2144) Negotiator gone, trying to use our local startd
1/10 12:24:27 (pid:2144) Starting add_shadow_birthdate(1287.0)
1/10 12:24:27 (pid:2144) Started shadow for job 1287.0 on "<136.200.YYYYY:1219>", (shadow pid = 1492)
1/10 12:24:27 (pid:2144) Sent ad to central manager for rfinch@xxxxxxxxxxxx
1/10 12:24:27 (pid:2144) Sent ad to 2 collectors for rfinch@xxxxxxxxxxxx
1/10 12:24:27 (pid:2144) Haven't heard from negotiator, trying to claim local startd
1/10 12:24:32 (pid:2144) DaemonCore: PERMISSION DENIED to unknown user from host <136.200.XXXXX:1831> for command 416 (NEGOTIATE)
In the log above, YYYYY is MERRIT and XXXXX is the other HAD machine
(delta-mod).
The HADLog and CollectorLog show no problems.
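In case it helps anyone scan a longer SchedLog for the same symptom, here
is a minimal Python sketch (not part of Condor). It re-joins entries that
were wrapped across lines and lists every PERMISSION DENIED entry with the
refusing host and command. The function names and the regular expressions
are my own assumptions, based only on the excerpt above; other Condor
versions may format these lines differently.

```python
import re

# Pattern inferred from the SchedLog excerpt only (an assumption, not a Condor spec).
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host <(?P<host>[^>]+)> "
    r"for command (?P<num>\d+) \((?P<name>\w+)\)"
)
# A new log entry is assumed to start with an "M/D HH:MM:SS " timestamp.
ENTRY_START = re.compile(r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")


def join_wrapped(lines):
    """Re-join log entries that were wrapped onto continuation lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)          # start of a new entry
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def find_denied(lines):
    """Return (host, command number, command name) for each denied request."""
    hits = []
    for entry in join_wrapped(lines):
        m = DENIED.search(entry)
        if m:
            hits.append((m.group("host"), int(m.group("num")), m.group("name")))
    return hits
```

Feeding it the lines of a log file (for example, `find_denied(open(path))`
with a hypothetical `path` to the SchedLog) gives one tuple per denied
request, which makes it easy to see which host is being refused and for
which command.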
Any clues appreciated.
Ralph Finch, P.E.
Dept. of Water Resources
Bay-Delta Office, Room 215-13
Sacramento, CA 95814
916-653-7552
rfinch@xxxxxxxxxxxx