Dear all,
I'm a new Condor user, and this is my first time installing it.
After installing Condor as a central manager, I started it with "./sbin/condor_master".
I then ran "ps -ef | grep condor" to check the processes:
condor 8394 1 0 13:49 ? 00:00:00 ./sbin/condor_master
condor 8395 8394 0 13:49 ? 00:00:05 condor_startd -f
condor 8396 8394 0 13:49 ? 00:00:00 condor_schedd -f
root 8398 8396 0 13:49 ? 00:00:00 condor_procd -A /data/condor/log/procd_pipe.SCHEDD -R 10000000 -S 60 -C 60000
I think something is wrong here. I checked the log files (listed below), and it looks like there is a problem with the TCP connection. All three log files say: "TCP connection to <10.122.226.129:9618> failed." However, I have turned off my firewall. Could anybody help me find where the error is and how to fix it?
The MasterLog file is:
03/01/11 13:49:10 Setting maximum accepts per cycle 4.
03/01/11 13:49:10 ******************************************************
03/01/11 13:49:10 ** condor_master (CONDOR_MASTER) STARTING UP
03/01/11 13:49:10 ** /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_master
03/01/11 13:49:10 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
03/01/11 13:49:10 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
03/01/11 13:49:10 ** $CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $
03/01/11 13:49:10 ** $CondorPlatform: X86_64-LINUX_x86_64_rhas_3 $
03/01/11 13:49:10 ** PID = 8394
03/01/11 13:49:10 ** Log last touched time unavailable (No such file or directory)
03/01/11 13:49:10 ******************************************************
03/01/11 13:49:10 Using config source: /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/etc/condor_config
03/01/11 13:49:10 Using local config sources:
03/01/11 13:49:10 /data/condor/condor_config.local
03/01/11 13:49:10 DaemonCore: command socket at <10.122.226.129:48922>
03/01/11 13:49:10 Setting maximum accepts per cycle 4.
03/01/11 13:49:10 Started DaemonCore process "/mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_startd", pid and pgroup = 8395
03/01/11 13:49:10 Started DaemonCore process "/mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_schedd", pid and pgroup = 8396
03/01/11 13:49:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:49:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:49:15 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:54:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:54:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:54:15 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:59:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:59:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:59:15 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 14:04:15 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 14:04:15 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 14:04:15 Failed to start non-blocking update to <10.122.226.129:9618>.
The SchedLog file is:
03/01/11 13:49:10 (pid:8396) Setting maximum accepts per cycle 4.
03/01/11 13:49:10 (pid:8396) ******************************************************
03/01/11 13:49:10 (pid:8396) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
03/01/11 13:49:10 (pid:8396) ** /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_schedd
03/01/11 13:49:10 (pid:8396) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
03/01/11 13:49:10 (pid:8396) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
03/01/11 13:49:10 (pid:8396) ** $CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $
03/01/11 13:49:10 (pid:8396) ** $CondorPlatform: X86_64-LINUX_x86_64_rhas_3 $
03/01/11 13:49:10 (pid:8396) ** PID = 8396
03/01/11 13:49:10 (pid:8396) ** Log last touched time unavailable (No such file or directory)
03/01/11 13:49:10 (pid:8396) ******************************************************
03/01/11 13:49:10 (pid:8396) Using config source: /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/etc/condor_config
03/01/11 13:49:10 (pid:8396) Using local config sources:
03/01/11 13:49:10 (pid:8396) /data/condor/condor_config.local
03/01/11 13:49:10 (pid:8396) DaemonCore: command socket at <10.122.226.129:48924>
03/01/11 13:49:10 (pid:8396) Setting maximum accepts per cycle 4.
03/01/11 13:49:10 (pid:8396) History file rotation is enabled.
03/01/11 13:49:10 (pid:8396) Maximum history file size is: 20971520 bytes
03/01/11 13:49:10 (pid:8396) Number of rotated history files is: 2
03/01/11 13:49:16 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:49:16 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:49:16 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:54:17 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:54:17 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:54:17 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:59:18 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:59:18 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:59:18 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 14:04:19 (pid:8396) attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 14:04:19 (pid:8396) ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 14:04:19 (pid:8396) Failed to start non-blocking update to <10.122.226.129:9618>.
The StartLog file is:
03/01/11 13:49:10 Setting maximum accepts per cycle 4.
03/01/11 13:49:10 ******************************************************
03/01/11 13:49:10 ** condor_startd (CONDOR_STARTD) STARTING UP
03/01/11 13:49:10 ** /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/sbin/condor_startd
03/01/11 13:49:10 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
03/01/11 13:49:10 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
03/01/11 13:49:10 ** $CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $
03/01/11 13:49:10 ** $CondorPlatform: X86_64-LINUX_x86_64_rhas_3 $
03/01/11 13:49:10 ** PID = 8395
03/01/11 13:49:10 ** Log last touched time unavailable (No such file or directory)
03/01/11 13:49:10 ******************************************************
03/01/11 13:49:10 Using config source: /mnt/disk2/yw60175/WORK_1/JFT/Software/Condor/condor-7.5.5-x86_64_rhas_3-unstripped/etc/condor_config
03/01/11 13:49:10 Using local config sources:
03/01/11 13:49:10 /data/condor/condor_config.local
03/01/11 13:49:10 DaemonCore: command socket at <10.122.226.129:48923>
03/01/11 13:49:10 Setting maximum accepts per cycle 4.
03/01/11 13:49:16 VM-gahp server reported an internal error
03/01/11 13:49:16 VM universe will be tested to check if it is available
03/01/11 13:49:16 History file rotation is enabled.
03/01/11 13:49:16 Maximum history file size is: 20971520 bytes
03/01/11 13:49:16 Number of rotated history files is: 2
03/01/11 13:49:16 slot1: New machine resource allocated
03/01/11 13:49:16 slot2: New machine resource allocated
03/01/11 13:49:16 About to run initial benchmarks.
03/01/11 13:49:21 Completed initial benchmarks.
03/01/11 13:49:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:49:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:49:25 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:49:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:49:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:49:26 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:54:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:54:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:54:25 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:54:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:54:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:54:26 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:59:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:59:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:59:25 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 13:59:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 13:59:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 13:59:26 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 14:04:25 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 14:04:25 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 14:04:25 Failed to start non-blocking update to <10.122.226.129:9618>.
03/01/11 14:04:26 attempt to connect to <10.122.226.129:9618> failed: Connection refused (connect errno = 111).
03/01/11 14:04:26 ERROR: SECMAN:2004:Failed to create security session to <10.122.226.129:9618> with TCP.
|SECMAN:2003:TCP connection to <10.122.226.129:9618> failed.
03/01/11 14:04:26 Failed to start non-blocking update to <10.122.226.129:9618>.
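Since every log above reports "Connection refused (connect errno = 111)" for the same address, one way to narrow this down is to probe that port directly, independent of Condor. The following is only a minimal sketch: the function name probe_tcp and the 3-second timeout are my own choices, not anything from Condor. As far as I understand, a "refused" result generally means the host was reachable but nothing accepted the connection on that port, whereas packets silently dropped by a firewall tend to show up as a timeout instead.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <ERRNO NAME>".
    (Hypothetical helper for illustration; not part of Condor.)
    """
    try:
        # create_connection raises on failure; the "with" closes the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition as errno 111 (ECONNREFUSED) in the logs on Linux.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. EHOSTUNREACH or ENETUNREACH.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

Run on the central manager itself, print(probe_tcp("10.122.226.129", 9618)) would show whether anything is accepting connections on the address and port taken from the logs.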