Hi!

I am trying to configure Condor on a Linux cluster consisting of 13 machines
plus the administrator machine, which is running Windows 7. I get two main
error messages:

    WARNING: Unable to determine local IP address. Condor might not work
    propertly until you set NETWORK_INTERFACE=<machine IP address>

and

    In order for Condor to work properly you must set your CONDOR_CONFIG
    environment variable to point to your Condor configuration file:
    /home/sonia/condor-7.4.1/etc/condor_config
    before running Condor commands/daemons.

How should I solve these problems? I have tried several alternatives, but
it’s still not working… Any hints?

Another error message that I receive now
and then is the following:

    This is an automated email from the Condor system
    on machine "o2f-sth-lap-016.un.dr.dgcsystems.net". Do not reply.

    "C:\condor/bin/condor_schedd.exe" on "o2f-sth-lap-016.un.dr.dgcsystems.net" exited with status 4.
    Condor will automatically restart this process in 11 seconds.

    *** Last 20 line(s) of file C:\condor/log/SchedLog:
    06/18 09:10:17 (pid:5892) Using local config sources:
    06/18 09:10:17 (pid:5892) C:\condor/condor_config.local
    06/18 09:10:17 (pid:5892) DaemonCore: Command Socket at <10.110.44.113:62060>
    06/18 09:10:17 (pid:5892) History file rotation is enabled.
    06/18 09:10:17 (pid:5892) Maximum history file size is: 20971520 bytes
    06/18 09:10:17 (pid:5892) Number of rotated history files is: 2
    06/18 09:10:17 (pid:5892) my_popen: CreateProcess failed
    06/18 09:10:17 (pid:5892) Failed to execute C:\condor/bin/condor_shadow.std.exe, ignoring
    06/18 09:10:17 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051. Will keep trying for 20 total seconds (20 to go).
    06/18 09:10:37 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
    06/18 09:10:37 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
    06/18 09:10:37 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...
    06/18 09:10:42 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051. Will keep trying for 20 total seconds (20 to go).
    06/18 09:11:02 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
    06/18 09:11:02 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
    06/18 09:11:02 (pid:5892) Failed to send alive to <169.254.67.219:49157>, will try again...
    06/18 09:11:07 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051. Will keep trying for 20 total seconds (20 to go).
    06/18 09:11:27 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
    06/18 09:11:27 (pid:5892) ERROR: SECMAN:2003:TCP auth connection to <169.254.67.219:49157> failed.
    06/18 09:11:27 (pid:5892) ERROR "FAILED TO SEND INITIAL KEEP ALIVE TO OUR PARENT <169.254.67.219:49157>" at line 9312 in file ..\src\condor_daemon_core.V6\daemon_core.cpp
    *** End of file SchedLog

What does this mean?

Cheers,
Sónia

Sónia Liléo
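
A note for anyone reading the log above: the parent address the schedd keeps failing to reach, 169.254.67.219, lies in the IPv4 link-local range 169.254.0.0/16, which Windows assigns to an interface on its own when it gets no DHCP lease. Winsock error 10051 is WSAENETUNREACH ("network is unreachable"). The schedd's own command socket is on 10.110.44.113, so one plausible reading, not a confirmed diagnosis, is that the parent daemon picked a different network interface, which would fit the NETWORK_INTERFACE warning quoted earlier. The sketch below only shows how such addresses could be spotted in a log excerpt; the function name `link_local_addresses` and the sample text are made up for illustration and are not part of Condor.

```python
import ipaddress
import re

# Matches addresses written in the "<ip:port>" form that appears in SchedLog.
ADDRESS = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def link_local_addresses(log_text):
    """Return the sorted, de-duplicated link-local IPv4 addresses in log_text.

    Hypothetical helper for illustration only; not a Condor tool.
    """
    found = set()
    for ip, _port in ADDRESS.findall(log_text):
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            # Skip strings that look like an address but are not valid IPv4.
            continue
        if address.is_link_local:  # True for 169.254.0.0/16
            found.add(ip)
    return sorted(found)


# Two lines copied from the log excerpt above, used as sample input.
sample = """\
06/18 09:10:17 (pid:5892) DaemonCore: Command Socket at <10.110.44.113:62060>
06/18 09:10:37 (pid:5892) attempt to connect to <169.254.67.219:49157> failed: connect errno = 10051.
"""

print(link_local_addresses(sample))
```

If the function reports an address, that host is advertising a self-assigned address that other machines on the 10.x network generally cannot route to.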