
[Condor-users] Condor on XP stops talking to pool manager



Hi,
I've installed Condor 6.6.10 on 28 Windows XP machines.
After the first day of running, 11 of them have stopped showing up in the
condor_status list.

The MasterLog on some of the affected machines ends with this:
> 8/30 18:17:33 Child 2424 died, but not a daemon -- Ignored

When I check the processes on an affected machine, the Condor daemons do appear
to be running, but I can't contact them from the pool manager:

> condor_restart -direct 130.195.109.42
> 9/1 10:04:19 TOOL_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> Can't find address for master 130.195.109.42

Is there anything else I can do to get these machines back online without
actually rebooting them?


thanks

Mel.



8/30 17:17:32 ******************************************************
8/30 17:17:32 ** Condor (CONDOR_MASTER) STARTING UP
8/30 17:17:32 ** C:\Condor\bin\condor_master.exe
8/30 17:17:32 ** $CondorVersion: 6.6.10 Jun 22 2005 $
8/30 17:17:32 ** $CondorPlatform: INTEL-WINNT50 $
8/30 17:17:32 ** PID = 2196
8/30 17:17:32 ******************************************************
8/30 17:17:32 Using config file: C:\Condor\condor_config
8/30 17:17:32 Using local config files: C:\Condor/condor_config.local
8/30 17:17:32 DaemonCore: Command Socket at <130.195.109.42:3022>
8/30 17:17:32 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 916
8/30 18:17:32 Preen pid is 2424
8/30 18:17:33 DaemonCore: Command received via UDP from host <130.195.109.42:3182>
8/30 18:17:33 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
8/30 18:17:33 Child 2424 died, but not a daemon -- Ignored
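The last MasterLog line may be less alarming than it looks. One second before "Child 2424 died, but not a daemon -- Ignored", the master logged "Preen pid is 2424", so the child it ignored appears to be the preen process finishing rather than a daemon crashing. The Python sketch below is a hypothetical helper, not part of Condor. It assumes only the "M/D HH:MM:SS message" line format seen in this log, rejoins lines that were wrapped by the mailer, cross-references those two messages, and pulls out the master's command socket address. In practice you would feed it the contents of the affected machine's MasterLog instead of the embedded sample.

```python
import re

# Assumed line format, taken from the MasterLog excerpt: "M/D HH:MM:SS message".
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")


def parse_master_log(text):
    """Return (timestamp, message) tuples, rejoining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            entries.append([m.group(1), m.group(2)])
        elif entries:
            # No timestamp: assume the mailer wrapped the previous entry.
            entries[-1][1] += " " + line.strip()
    return [tuple(e) for e in entries]


def summarize(entries):
    """Find the command socket and classify 'not a daemon' child exits."""
    command_socket = None
    preen_pids = set()
    ignored_children = []
    for _, msg in entries:
        m = re.search(r"Command Socket at (<[\d.]+:\d+>)", msg)
        if m:
            command_socket = m.group(1)
        m = re.search(r"Preen pid is (\d+)", msg)
        if m:
            preen_pids.add(int(m.group(1)))
        m = re.search(r"Child (\d+) died, but not a daemon", msg)
        if m:
            ignored_children.append(int(m.group(1)))
    return {
        "command_socket": command_socket,
        "last_message": entries[-1][1] if entries else None,
        # Exits of PIDs the master had announced as preen processes.
        "ignored_preen_exits": [p for p in ignored_children if p in preen_pids],
        # Any other ignored child exits, which would deserve a closer look.
        "ignored_other_exits": [p for p in ignored_children if p not in preen_pids],
    }


# Sample taken from the MasterLog above, with the mailer's line wrapping kept.
SAMPLE = """8/30 17:17:32 DaemonCore: Command Socket at <130.195.109.42:3022>
8/30 17:17:32 Started DaemonCore process "C:\\Condor/bin/condor_startd.exe",
pid and pgroup = 916
8/30 18:17:32 Preen pid is 2424
8/30 18:17:33 DaemonCore: Command received via UDP from host
<130.195.109.42:3182>
8/30 18:17:33 Child 2424 died, but not a daemon -- Ignored
"""

entries = parse_master_log(SAMPLE)
summary = summarize(entries)
print(summary)
```

On this excerpt the only ignored exit is the preen PID, which suggests the log simply stops at a routine event. That would be consistent with the master being alive but unreachable, rather than with a crash the log recorded.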