I noticed a few jobs in my 6.9.2pre cluster that had been evicted and restarted even though the cluster is not full. I finally found an error message in the StarterLog for one of the jobs (see below). It seems that the job was restarted due to a 'ProcAPI short read' error. Note that the job had been running for just over 7 hours, which is the expected normal runtime for this job.

What might be the cause for an error like this?

--Mike

7/23 00:16:23 ******************************************************
7/23 00:16:23 ** condor_starter (CONDOR_STARTER) STARTING UP
7/23 00:16:23 ** /share/apps/condor-6.9.2/sbin/condor_starter
7/23 00:16:23 ** $CondorVersion: 6.9.2 Jan 17 2007 PRE-RELEASE-UWCS $
7/23 00:16:23 ** $CondorPlatform: I386-LINUX_RHEL3 $
7/23 00:16:23 ** PID = 29667
7/23 00:16:23 ** Log last touched 7/23 00:16:21
7/23 00:16:23 ******************************************************
7/23 00:16:23 Using config source: /home/condor/condor_config
7/23 00:16:23 Using local config sources:
7/23 00:16:23 /share/apps/condor/hosts/cithep184/condor_config.local
7/23 00:16:23 DaemonCore: Command Socket at <10.255.255.201:40565>
7/23 00:16:23 Done setting resource limits
7/23 00:16:23 Communicating with shadow <10.255.255.216:46438>
7/23 00:16:23 Submitting machine is "gatekeeper-0-2.local"
7/23 00:16:24 File transfer completed successfully.
7/23 00:16:25 Starting a VANILLA universe job with ID: 61742.0
7/23 00:16:25 IWD: /state/partition1/tmp/cithep184/execute/dir_29667
7/23 00:16:25 Output file: /state/partition1/tmp/cithep184/execute/dir_29667/_condor_stdout
7/23 00:16:25 Error file: /state/partition1/tmp/cithep184/execute/dir_29667/_condor_stderr
7/23 00:16:25 About to exec /state/partition1/tmp/cithep184/execute/dir_29667/condor_exec.exe
7/23 00:16:25 Create_Process succeeded, pid=29671
7/23 07:27:27 ProcAPI: Unexpected short scan on /proc/11191/stat, errno: 3.
7/23 09:53:03 Process exited, pid=29671, status=0
7/23 09:53:03 Got SIGQUIT. Performing fast shutdown.
7/23 09:53:03 ShutdownFast all jobs.