We use Rocks to install the Condor RPM. We have the following line in /etc/sysconfig/condor to point to the system-wide configuration file:

CONDOR_CONFIG="/share/apps/condor/etc/condor_config_7.6.6"

The condor_config_7.6.6 file is attached.

We did not see any alarming errors in either the MasterLog or the SchedLog file. Both are attached as well.

BTW, we did not see either a Schedd_Event_Log or a ShadowLog file, which leads us to believe that it is not accepting jobs.

Thanks.....

Steven.....

On 04/03/2012 06:50 PM, Alain Roy wrote:
> On Apr 3, 2012, at 8:37 PM, Steven Lo wrote:
>
>> Hi,
>>
>> We just upgraded Condor from 7.4.1 to 7.6.6 on one of our CE. When we do a condor_q, the following error pops out:
>>
>> # condor_q
>> Error:
>>
>> Extra Info: You probably saw this error because the condor_schedd is not
>> running on the machine you are trying to query.
>>
>> We did see that both schedd and startd are running:
>>
>> condor 6518 6490 0 17:42 ? 00:00:00 condor_startd -f
>> condor 6520 6490 0 17:42 ? 00:00:00 condor_schedd -f
>
> That's interesting. How did you install Condor? Do you have CONDOR_CONFIG set? Are there errors in the MasterLog or the SchedLog?
>
> -alain
>
> ------------------------------
> Alain Roy
> Condor Project
> roy@xxxxxxxxxxx
> http://www.cs.wisc.edu/condor
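
One way to double-check the CONDOR_CONFIG question is to read the assignment back out of /etc/sysconfig/condor and confirm it matches the "Using config source" line in the daemon logs. The following is only a minimal sketch in Python, assuming the file holds plain shell-style KEY=value lines; the helper name read_sysconfig_value is hypothetical and is not part of Condor.

```python
import shlex


def read_sysconfig_value(text, key):
    """Return the last value assigned to `key` in sysconfig-style text, or None.

    Assumes simple shell-style lines such as KEY="value"; anything more
    elaborate (variable expansion, line continuations) is not handled.
    """
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            # shlex strips the surrounding quotes and any trailing comment
            parts = shlex.split(rest, comments=True)
            value = parts[0] if parts else ""
    return value


# The line quoted in this message, used here as sample input.
sample = 'CONDOR_CONFIG="/share/apps/condor/etc/condor_config_7.6.6"\n'
print(read_sysconfig_value(sample, "CONDOR_CONFIG"))
```

On a real node the text would come from reading /etc/sysconfig/condor; the value can then be compared with what the daemons report at startup.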
Attachment:
condor_config_7.6.6
Description: Unix manual page
04/03/12 17:23:27 Setting maximum accepts per cycle 4.
04/03/12 17:23:27 ******************************************************
04/03/12 17:23:27 ** condor_startd (CONDOR_STARTD) STARTING UP
04/03/12 17:23:27 ** /usr/sbin/condor_startd
04/03/12 17:23:27 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
04/03/12 17:23:27 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
04/03/12 17:23:27 ** $CondorVersion: 7.6.6 Jan 17 2012 BuildID: 401976 $
04/03/12 17:23:27 ** $CondorPlatform: x86_64_rhap_5 $
04/03/12 17:23:27 ** PID = 16225
04/03/12 17:23:27 ** Log last touched time unavailable (No such file or directory)
04/03/12 17:23:27 ******************************************************
04/03/12 17:23:27 Using config source: /share/apps/condor/etc/condor_config_7.6.6
04/03/12 17:23:27 Using local config sources:
04/03/12 17:23:27 /share/apps/condor/hosts/cithep252/condor_config.local
04/03/12 17:23:27 DaemonCore: command socket at <10.3.255.253:56762>
04/03/12 17:23:27 DaemonCore: private command socket at <10.3.255.253:56762>
04/03/12 17:23:27 Setting maximum accepts per cycle 4.
04/03/12 17:23:32 VM-gahp server reported an internal error
04/03/12 17:23:32 VM universe will be tested to check if it is available
04/03/12 17:23:32 History file rotation is enabled.
04/03/12 17:23:32 Maximum history file size is: 20971520 bytes
04/03/12 17:23:32 Number of rotated history files is: 2
04/03/12 17:23:32 slot1: New machine resource allocated
04/03/12 17:23:32 slot2: New machine resource allocated
04/03/12 17:23:32 slot3: New machine resource allocated
04/03/12 17:23:32 slot4: New machine resource allocated
04/03/12 17:23:32 slot5: New machine resource allocated
04/03/12 17:23:32 slot6: New machine resource allocated
04/03/12 17:23:32 slot7: New machine resource allocated
04/03/12 17:23:32 slot8: New machine resource allocated
04/03/12 17:23:32 CronJobList: Adding job 'MIPS'
04/03/12 17:23:32 CronJobList: Adding job 'KFLOPS'
04/03/12 17:23:32 CronJob: Initializing job 'MIPS' (/usr/libexec/condor/condor_mips)
04/03/12 17:23:32 CronJob: Initializing job 'KFLOPS' (/usr/libexec/condor/condor_kflops)
04/03/12 17:39:35 Got SIGTERM. Performing graceful shutdown.
04/03/12 17:39:35 shutdown graceful
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 Killing job MIPS
04/03/12 17:39:35 Killing job KFLOPS
04/03/12 17:39:35 Deleting cron job manager
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 CronJobList: Deleting all jobs
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 CronJobList: Deleting all jobs
04/03/12 17:39:35 Deleting benchmark job mgr
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 Killing job MIPS
04/03/12 17:39:35 Killing job KFLOPS
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 Killing job MIPS
04/03/12 17:39:35 Killing job KFLOPS
04/03/12 17:39:35 CronJobList: Deleting all jobs
04/03/12 17:39:35 CronJobList: Deleting job 'MIPS'
04/03/12 17:39:35 CronJob: Deleting job 'MIPS' (/usr/libexec/condor/condor_mips), timer -1
04/03/12 17:39:35 CronJobList: Deleting job 'KFLOPS'
04/03/12 17:39:35 CronJob: Deleting job 'KFLOPS' (/usr/libexec/condor/condor_kflops), timer -1
04/03/12 17:39:35 Cron: Killing all jobs
04/03/12 17:39:35 CronJobList: Deleting all jobs
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 All resources are free, exiting.
04/03/12 17:39:35 **** condor_startd (condor_STARTD) pid 16225 EXITING WITH STATUS 0
04/03/12 17:42:22 Setting maximum accepts per cycle 4.
04/03/12 17:42:22 ******************************************************
04/03/12 17:42:22 ** condor_startd (CONDOR_STARTD) STARTING UP
04/03/12 17:42:22 ** /usr/sbin/condor_startd
04/03/12 17:42:22 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
04/03/12 17:42:22 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
04/03/12 17:42:22 ** $CondorVersion: 7.6.6 Jan 17 2012 BuildID: 401976 $
04/03/12 17:42:22 ** $CondorPlatform: x86_64_rhap_5 $
04/03/12 17:42:22 ** PID = 6518
04/03/12 17:42:22 ** Log last touched 4/3 17:39:35
04/03/12 17:42:22 ******************************************************
04/03/12 17:42:22 Using config source: /share/apps/condor/etc/condor_config_7.6.6
04/03/12 17:42:22 Using local config sources:
04/03/12 17:42:22 /share/apps/condor/hosts/cithep252/condor_config.local
04/03/12 17:42:22 DaemonCore: command socket at <10.3.255.253:50008>
04/03/12 17:42:22 DaemonCore: private command socket at <10.3.255.253:50008>
04/03/12 17:42:22 Setting maximum accepts per cycle 4.
04/03/12 17:42:30 VM-gahp server reported an internal error
04/03/12 17:42:30 VM universe will be tested to check if it is available
04/03/12 17:42:30 History file rotation is enabled.
04/03/12 17:42:30 Maximum history file size is: 20971520 bytes
04/03/12 17:42:30 Number of rotated history files is: 2
04/03/12 17:42:30 slot1: New machine resource allocated
04/03/12 17:42:30 slot2: New machine resource allocated
04/03/12 17:42:30 slot3: New machine resource allocated
04/03/12 17:42:30 slot4: New machine resource allocated
04/03/12 17:42:30 slot5: New machine resource allocated
04/03/12 17:42:30 slot6: New machine resource allocated
04/03/12 17:42:30 slot7: New machine resource allocated
04/03/12 17:42:30 slot8: New machine resource allocated
04/03/12 17:42:30 CronJobList: Adding job 'MIPS'
04/03/12 17:42:30 CronJobList: Adding job 'KFLOPS'
04/03/12 17:42:30 CronJob: Initializing job 'MIPS' (/usr/libexec/condor/condor_mips)
04/03/12 17:42:30 CronJob: Initializing job 'KFLOPS' (/usr/libexec/condor/condor_kflops)
04/03/12 17:23:27 (pid:16226) Setting maximum accepts per cycle 4.
04/03/12 17:23:27 (pid:16226) ******************************************************
04/03/12 17:23:27 (pid:16226) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
04/03/12 17:23:27 (pid:16226) ** /usr/sbin/condor_schedd
04/03/12 17:23:27 (pid:16226) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
04/03/12 17:23:27 (pid:16226) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
04/03/12 17:23:27 (pid:16226) ** $CondorVersion: 7.6.6 Jan 17 2012 BuildID: 401976 $
04/03/12 17:23:27 (pid:16226) ** $CondorPlatform: x86_64_rhap_5 $
04/03/12 17:23:27 (pid:16226) ** PID = 16226
04/03/12 17:23:27 (pid:16226) ** Log last touched time unavailable (No such file or directory)
04/03/12 17:23:27 (pid:16226) ******************************************************
04/03/12 17:23:27 (pid:16226) Using config source: /share/apps/condor/etc/condor_config_7.6.6
04/03/12 17:23:27 (pid:16226) Using local config sources:
04/03/12 17:23:27 (pid:16226) /share/apps/condor/hosts/cithep252/condor_config.local
04/03/12 17:23:27 (pid:16226) DaemonCore: command socket at <10.3.255.253:32919>
04/03/12 17:23:27 (pid:16226) DaemonCore: private command socket at <10.3.255.253:32919>
04/03/12 17:23:27 (pid:16226) Setting maximum accepts per cycle 4.
04/03/12 17:23:27 (pid:16226) History file rotation is enabled.
04/03/12 17:23:27 (pid:16226) Maximum history file size is: 20971520 bytes
04/03/12 17:23:27 (pid:16226) Number of rotated history files is: 2
04/03/12 17:23:27 (pid:16226) Logging per-job history files to: /osg/1.2.8/gratia/var/data
04/03/12 17:23:32 (pid:16226) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:28:33 (pid:16226) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:33:34 (pid:16226) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:38:35 (pid:16226) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:39:35 (pid:16226) Got SIGTERM. Performing graceful shutdown.
04/03/12 17:39:35 (pid:16226) Deleting CronJobMgr
04/03/12 17:39:35 (pid:16226) Cron: Killing all jobs
04/03/12 17:39:35 (pid:16226) Cron: Killing all jobs
04/03/12 17:39:35 (pid:16226) CronJobList: Deleting all jobs
04/03/12 17:39:35 (pid:16226) Cron: Killing all jobs
04/03/12 17:39:35 (pid:16226) CronJobList: Deleting all jobs
04/03/12 17:39:35 (pid:16226) sendMsg:sendto failed - errno: 101
04/03/12 17:39:35 (pid:16226) All shadows are gone, exiting.
04/03/12 17:39:35 (pid:16226) error reading from named pipe: watchdog pipe has closed
04/03/12 17:39:35 (pid:16226) ProcFamilyClient: failed to read response from ProcD
04/03/12 17:39:35 (pid:16226) error telling ProcD to exit
04/03/12 17:39:35 (pid:16226) **** condor_schedd (condor_SCHEDD) pid 16226 EXITING WITH STATUS 0
04/03/12 17:42:23 (pid:6520) Setting maximum accepts per cycle 4.
04/03/12 17:42:23 (pid:6520) ******************************************************
04/03/12 17:42:23 (pid:6520) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
04/03/12 17:42:23 (pid:6520) ** /usr/sbin/condor_schedd
04/03/12 17:42:23 (pid:6520) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
04/03/12 17:42:23 (pid:6520) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
04/03/12 17:42:23 (pid:6520) ** $CondorVersion: 7.6.6 Jan 17 2012 BuildID: 401976 $
04/03/12 17:42:23 (pid:6520) ** $CondorPlatform: x86_64_rhap_5 $
04/03/12 17:42:23 (pid:6520) ** PID = 6520
04/03/12 17:42:23 (pid:6520) ** Log last touched 4/3 17:39:35
04/03/12 17:42:23 (pid:6520) ******************************************************
04/03/12 17:42:23 (pid:6520) Using config source: /share/apps/condor/etc/condor_config_7.6.6
04/03/12 17:42:23 (pid:6520) Using local config sources:
04/03/12 17:42:23 (pid:6520) /share/apps/condor/hosts/cithep252/condor_config.local
04/03/12 17:42:23 (pid:6520) DaemonCore: command socket at <10.3.255.253:48116>
04/03/12 17:42:23 (pid:6520) DaemonCore: private command socket at <10.3.255.253:48116>
04/03/12 17:42:23 (pid:6520) Setting maximum accepts per cycle 4.
04/03/12 17:42:23 (pid:6520) History file rotation is enabled.
04/03/12 17:42:23 (pid:6520) Maximum history file size is: 20971520 bytes
04/03/12 17:42:23 (pid:6520) Number of rotated history files is: 2
04/03/12 17:42:23 (pid:6520) Logging per-job history files to: /osg/1.2.8/gratia/var/data
04/03/12 17:42:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:47:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:52:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:57:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:02:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:07:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:12:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:17:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:22:28 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:27:29 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:32:30 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:37:31 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:42:32 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:47:33 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:52:34 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 18:57:35 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 19:02:36 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 19:07:37 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 19:12:38 (pid:6520) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
04/03/12 17:23:26 Setting maximum accepts per cycle 4.
04/03/12 17:23:26 ******************************************************
04/03/12 17:23:26 ** condor_master (CONDOR_MASTER) STARTING UP
04/03/12 17:23:26 ** /usr/sbin/condor_master
04/03/12 17:23:26 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
04/03/12 17:23:26 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
04/03/12 17:23:26 ** $CondorVersion: 7.6.6 Jan 17 2012 BuildID: 401976 $
04/03/12 17:23:26 ** $CondorPlatform: x86_64_rhap_5 $
04/03/12 17:23:26 ** PID = 16224
04/03/12 17:23:26 ** Log last touched time unavailable (No such file or directory)
04/03/12 17:23:26 ******************************************************
04/03/12 17:23:26 Using config source: /share/apps/condor/etc/condor_config_7.6.6
04/03/12 17:23:26 Using local config sources:
04/03/12 17:23:26 /share/apps/condor/hosts/cithep252/condor_config.local
04/03/12 17:23:26 DaemonCore: command socket at <10.3.255.253:55402>
04/03/12 17:23:26 DaemonCore: private command socket at <10.3.255.253:55402>
04/03/12 17:23:26 Setting maximum accepts per cycle 4.
04/03/12 17:23:26 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 16225
04/03/12 17:23:27 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 16226
04/03/12 17:39:35 Got SIGTERM. Performing graceful shutdown.
04/03/12 17:39:35 SafeMsg: sending small msg failed. errno: 101
04/03/12 17:39:35 Sent SIGTERM to SCHEDD (pid 16226)
04/03/12 17:39:35 Sent SIGTERM to STARTD (pid 16225)
04/03/12 17:39:35 The STARTD (pid 16225) exited with status 0
04/03/12 17:39:35 The SCHEDD (pid 16226) exited with status 0
04/03/12 17:39:35 All daemons are gone. Exiting.
04/03/12 17:39:35 **** condor_master (condor_MASTER) pid 16224 EXITING WITH STATUS 0
04/03/12 17:42:22 Setting maximum accepts per cycle 4.
04/03/12 17:42:22 ******************************************************
04/03/12 17:42:22 ** condor_master (CONDOR_MASTER) STARTING UP
04/03/12 17:42:22 ** /usr/sbin/condor_master
04/03/12 17:42:22 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
04/03/12 17:42:22 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
04/03/12 17:42:22 ** $CondorVersion: 7.6.6 Jan 17 2012 BuildID: 401976 $
04/03/12 17:42:22 ** $CondorPlatform: x86_64_rhap_5 $
04/03/12 17:42:22 ** PID = 6490
04/03/12 17:42:22 ** Log last touched 4/3 17:39:35
04/03/12 17:42:22 ******************************************************
04/03/12 17:42:22 Using config source: /share/apps/condor/etc/condor_config_7.6.6
04/03/12 17:42:22 Using local config sources:
04/03/12 17:42:22 /share/apps/condor/hosts/cithep252/condor_config.local
04/03/12 17:42:22 DaemonCore: command socket at <10.3.255.253:46860>
04/03/12 17:42:22 DaemonCore: private command socket at <10.3.255.253:46860>
04/03/12 17:42:22 Setting maximum accepts per cycle 4.
04/03/12 17:42:22 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 6518
04/03/12 17:42:22 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 6520
04/03/12 18:42:22 Preen pid is 24588
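
Logs of this length are easier to review if the entries that mention a failure are pulled out first. The sketch below is a rough illustration in Python, assuming one entry per line in the "MM/DD/YY HH:MM:SS [(pid:N)] message" layout seen above; the function name suspicious_entries and the keyword list are hypothetical choices, not anything Condor provides. For reference, errno 101 on Linux is ENETUNREACH (network is unreachable), and in these logs it only shows up during the 17:39:35 shutdown.

```python
import re

# Timestamp, optional "(pid:N)" field, then the free-form message.
ENTRY = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?:\(pid:(\d+)\) )?(.*)$"
)

# Keywords treated as worth a second look (an assumption, tune as needed).
SUSPICIOUS = ("error", "failed", "errno")


def suspicious_entries(lines):
    """Return (timestamp, pid, message) for entries containing a keyword."""
    hits = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if not match:
            continue  # not a log entry in the expected layout
        timestamp, pid, message = match.groups()
        if any(word in message.lower() for word in SUSPICIOUS):
            hits.append((timestamp, pid, message))
    return hits


# A few entries copied from the logs in this message.
sample_log = [
    "04/03/12 17:39:35 (pid:16226) sendMsg:sendto failed - errno: 101",
    "04/03/12 17:42:23 (pid:6520) History file rotation is enabled.",
    "04/03/12 17:42:30 VM-gahp server reported an internal error",
]

for timestamp, pid, message in suspicious_entries(sample_log):
    print(timestamp, pid, message)
```

Entries without a pid field, as in the MasterLog and startd log, come back with pid set to None.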