Hello,
We installed Condor 6.6.10 and 6.6.11 on two of our Windows 2003 servers. The rest of the cluster consists of Windows 2000 and XP machines (the master is a Windows 2000 PC). Jobs can be submitted from the servers to Condor, but not from other machines to the servers, or even from one server to the other. In both cases the error message is always the same:
7/13 10:18:36 Using config file: C:\Condor\condor_config
7/13 10:18:36 Using local config files: C:\Condor/condor_config.local
7/13 10:18:36 DaemonCore: Command Socket at <10.13.1.18:3556>
7/13 10:18:37 Initializing a VANILLA shadow
7/13 10:18:37 (1.0) (4648): Request to run on <10.13.1.18:2335> was ACCEPTED
7/13 10:18:39 (1.0) (4648): condor_read(): recv() returned -1, errno = 10054, assuming failure.
7/13 10:18:39 (1.0) (4648): IO: Failed to read packet header
7/13 10:18:39 (1.0) (4648): ERROR "Can no longer talk to condor_starter on execute machine (10.13.1.18)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
We have now found that the dynamically installed Condor users on the servers exist but are set to "not active". This is not the case on the Windows 2000 machines, where the Condor users are always set to "active". Could this be the cause of the error? If so, how can it be changed?
Thanks for your help
Helge