[Condor-users] Windows executor nodes lose communication to Linux CM after its restart
- Date: Thu, 21 May 2009 13:05:53 -0300
- From: kschwarz@xxxxxxxxxxxxxx
- Subject: [Condor-users] Windows executor nodes lose communication to Linux CM after its restart
Hi all,
When I restart the Condor Central Manager (CM) machine (Linux), I lose
communication with the Windows executor nodes that are on a different
subnet. Executor nodes on the same subnet as the CM re-establish their
communication with it.
Logging in to an affected executor machine and stopping and starting the
Condor service restores communication, but this should not have to be the
normal procedure.
No local firewall is enabled (Windows Firewall is off, and the firewall
on the CM machine is also off).
See the attached MasterLog file.
Is there any guideline for avoiding this loss of connection?
Thanks, Klaus
This message is intended solely for the
use of its addressee and may contain privileged or confidential information.
All information contained herein shall be treated as confidential and shall
not be disclosed to any third party without Embraer’s prior written approval.
If you are not the addressee you should not distribute, copy or file this
message. In this case, please notify the sender and destroy its contents
immediately.
5/16 13:28:49 UnsetEnv(NET_REMAP_ENABLE): SetEnvironmentVariable failed, errno=203
5/16 13:28:49 ******************************************************
5/16 13:28:49 ** Condor (CONDOR_MASTER) STARTING UP
5/16 13:28:49 ** C:\Condor\bin\condor_master.exe
5/16 13:28:49 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
5/16 13:28:49 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
5/16 13:28:49 ** $CondorVersion: 7.2.1 Feb 19 2009 BuildID: 133382 $
5/16 13:28:49 ** $CondorPlatform: INTEL-WINNT50 $
5/16 13:28:49 ** PID = 324
5/16 13:28:49 ** Log last touched 5/16 13:00:18
5/16 13:28:49 ******************************************************
5/16 13:28:49 Using config source: C:\Condor\condor_config
5/16 13:28:49 Using local config sources:
5/16 13:28:49 \\smbsjk01\grid_env\CONDOR\condor_config.1
5/16 13:28:49 \\smbsjk01\grid_env\CONDOR\1-start\condor_config.master.PC263439
5/16 13:28:49 \\smbsjk01\grid_env\CONDOR\2-main\condor_config.INTEL.WINNT51
5/16 13:28:49 \\smbsjk01\grid_env\CONDOR\2-main\condor_config.common
5/16 13:28:49 \\smbsjk01\grid_env\CONDOR\3-pool\pc222771\condor_config.pool.pc222771
5/16 13:28:49 \\smbsjk01\grid_env\CONDOR\3-pool\pc222771\PC263439\condor_config.local
5/16 13:28:50 DaemonCore: Command Socket at <10.20.12.1:1028>
5/16 13:28:50 Started DaemonCore process "C:\Condor/bin/condor_schedd.exe", pid and pgroup = 920
5/16 13:28:50 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 536
5/16 14:28:50 Preen pid is 3528
5/16 14:28:51 Child 3528 died, but not a daemon -- Ignored
5/17 14:28:50 Preen pid is 2052
5/17 14:28:51 Child 2052 died, but not a daemon -- Ignored
5/18 14:28:50 Preen pid is 3788
5/18 14:28:51 Child 3788 died, but not a daemon -- Ignored
5/19 14:28:50 Preen pid is 488
5/19 14:28:51 Child 488 died, but not a daemon -- Ignored
5/20 14:28:50 Preen pid is 2716
5/20 14:28:51 Child 2716 died, but not a daemon -- Ignored
5/21 10:54:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:54873>.
5/21 10:54:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:54873>.
5/21 10:54:16 IO: Failed to read packet header
5/21 10:54:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 10:59:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:56031>.
5/21 10:59:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:56031>.
5/21 10:59:16 IO: Failed to read packet header
5/21 10:59:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:04:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:41212>.
5/21 11:04:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:41212>.
5/21 11:04:16 IO: Failed to read packet header
5/21 11:04:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:09:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:58919>.
5/21 11:09:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:58919>.
5/21 11:09:16 IO: Failed to read packet header
5/21 11:09:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:14:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:54807>.
5/21 11:14:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:54807>.
5/21 11:14:16 IO: Failed to read packet header
5/21 11:14:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:19:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:51189>.
5/21 11:19:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:51189>.
5/21 11:19:16 IO: Failed to read packet header
5/21 11:19:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:24:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:52571>.
5/21 11:24:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:52571>.
5/21 11:24:16 IO: Failed to read packet header
5/21 11:24:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:29:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:41606>.
5/21 11:29:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:41606>.
5/21 11:29:16 IO: Failed to read packet header
5/21 11:29:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:34:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:52965>.
5/21 11:34:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:52965>.
5/21 11:34:16 IO: Failed to read packet header
5/21 11:34:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:39:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:37451>.
5/21 11:39:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:37451>.
5/21 11:39:16 IO: Failed to read packet header
5/21 11:39:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:44:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:42761>.
5/21 11:44:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:42761>.
5/21 11:44:16 IO: Failed to read packet header
5/21 11:44:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:49:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:53223>.
5/21 11:49:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:53223>.
5/21 11:49:16 IO: Failed to read packet header
5/21 11:49:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:54:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:59908>.
5/21 11:54:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:59908>.
5/21 11:54:16 IO: Failed to read packet header
5/21 11:54:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 11:59:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:36863>.
5/21 11:59:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:36863>.
5/21 11:59:16 IO: Failed to read packet header
5/21 11:59:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 12:04:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:41069>.
5/21 12:04:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:41069>.
5/21 12:04:16 IO: Failed to read packet header
5/21 12:04:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 12:09:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:60121>.
5/21 12:09:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:60121>.
5/21 12:09:16 IO: Failed to read packet header
5/21 12:09:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 12:14:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:52260>.
5/21 12:14:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:52260>.
5/21 12:14:16 IO: Failed to read packet header
5/21 12:14:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
5/21 12:19:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 4 bytes from <10.3.29.209:52639>.
5/21 12:19:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.3.29.209:52639>.
5/21 12:19:16 IO: Failed to read packet header
5/21 12:19:16 DaemonCore: Can't receive command request from 10.3.29.209 (perhaps a timeout?)
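For anyone reading the log above: Winsock error 10054 is WSAECONNRESET (connection reset by peer), and the failures repeat every five minutes from the same address. Below is a minimal Python sketch, not part of Condor, that tallies these condor_read() failures per peer address so the pattern is easier to see in a long MasterLog. The function name summarize_read_failures and the sample list are made up for illustration, and the regular expression only assumes the line format shown in this log.

```python
import re
from collections import Counter

# Matches the condor_read() failure lines as they appear in the MasterLog above.
# The pattern is an assumption based only on the lines in this post.
READ_FAIL = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d+:\d+:\d+) "
    r"condor_read\(\): recv\(\) returned -1, "
    r"errno = (?P<errno>\d+), .* from <(?P<ip>[\d.]+):(?P<port>\d+)>"
)

# Winsock error 10054 is WSAECONNRESET (connection reset by peer).
WINSOCK_NAMES = {10054: "WSAECONNRESET"}


def summarize_read_failures(lines):
    """Count condor_read() failures per (peer IP, errno) pair."""
    counts = Counter()
    for line in lines:
        match = READ_FAIL.match(line.strip())
        if match:
            counts[(match.group("ip"), int(match.group("errno")))] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = [
    "5/21 10:54:16 condor_read(): recv() returned -1, errno = 10054, "
    "assuming failure reading 4 bytes from <10.3.29.209:54873>.",
    "5/21 10:54:16 condor_read(): recv() returned -1, errno = 10054, "
    "assuming failure reading 5 bytes from <10.3.29.209:54873>.",
    "5/21 10:54:16 IO: Failed to read packet header",
    "5/21 10:59:16 condor_read(): recv() returned -1, errno = 10054, "
    "assuming failure reading 4 bytes from <10.3.29.209:56031>.",
]

for (ip, errno), n in summarize_read_failures(sample).items():
    print(ip, errno, WINSOCK_NAMES.get(errno, "unknown"), n)
```

To run it against a real file, pass the open file object instead of the sample list, for example summarize_read_failures(open(path)) where path points at the node's MasterLog. Whether 10.3.29.209 is the CM is not stated in the log itself, so the sketch only reports the address.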