Hi,

We are running a Condor 6.8 pool (around 150 nodes), and until now it
has run fine. It looks like somebody submitted 2000 jobs. Since then
none of the jobs are running, and I keep getting email from Condor like
this:

    Condor job 13221.0 has been put on hold.
    No condor_shadow installed that supports vanilla jobs on V6.3.3 or
    newer resources
    Please correct this problem and release the job with
    "condor_release"

In addition, condor_schedd exited with status 44, and I see a file
named "dprintf_failure.SCHEDD" in the log directory. Here is its
content:

    3/21 18:55:48 dprintf() had a fatal error in pid 6362
    Can't link(/u/condor/log/SchedLog,/u/condor/log/SchedLog.old)
    errno: 17 (File exists) euid: 32768, ruid: 0

The SchedLog is included at the end of this message. I even killed
condor_master and restarted it, then released all the jobs, but the
problem persists: all the jobs go on hold again.

Could you please help me work out what the problem might be and how to
fix this?

ps -ef | grep condor

    condor 19294     1 0 Mar21 ? 00:00:03 /usr/local/condor/sbin/condor_master
    condor 19295 19294 0 Mar21 ? 00:00:11 condor_collector -f
    condor 19296 19294 0 Mar21 ? 00:00:03 condor_negotiator -f
    condor 19297 19294 0 Mar21 ? 00:00:06 condor_startd -f
    condor 19298 19294 4 Mar21 ? 00:03:25 condor_schedd -f -p 9600

So the daemons are up, but the pool cannot run jobs: they go on hold,
and Condor keeps complaining about "No condor_shadow installed ...".

Is it possible to fix it without
reinstalling Condor?

Thanks,
Senthil

SchedLog
***********
3/22 00:00:22 Marked job 14508.0 as IDLE
3/22 00:00:22 Job 14508.0 put on hold: No condor_shadow installed that supports vanilla jobs on V6.3.3 or newer resources
3/22 00:00:22 abort_job_myself: 14508.0 action:Hold log_hold:true notify:true
3/22 00:00:22 Writing record to user logfile=/u/jum18/AIRTP/Patient_278.log owner=jum18
3/22 00:00:22 Forking Mailer process...
3/22 00:00:22 start next job after 2 sec, JobsThisBurst 0
3/22 00:00:22 DaemonCore: No more children processes to reap.
3/22 00:00:24 Job prep for 14509.0 will not block, calling aboutToSpawnJobHandler() directly
3/22 00:00:24 aboutToSpawnJobHandler() completed for job 14509.0, attempting to spawn job handler
3/22 00:00:24 Trying to run a VANILLA job on a 6.3.3 or later resource, but you do not have condor_shadow that will work, aborting
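
With this many jobs going on hold, it may help to tally the hold lines in the SchedLog by reason, to check whether every job is held for the same cause. The following is only a minimal Python sketch, assuming each log entry sits on one unwrapped line in the format shown above. The names `HOLD_RE`, `summarize_holds`, and `SAMPLE` are illustrative and not part of Condor.

```python
import re
from collections import Counter

# Matches SchedLog lines of the form seen in the excerpt above:
#   3/22 00:00:22 Job 14508.0 put on hold: <reason>
# (Assumed format; other Condor versions may log this differently.)
HOLD_RE = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) "
    r"Job (?P<job>\d+\.\d+) put on hold: (?P<reason>.*)$"
)


def summarize_holds(lines):
    """Return (Counter of hold reasons, list of held job ids)."""
    reasons = Counter()
    jobs = []
    for line in lines:
        match = HOLD_RE.match(line.strip())
        if match:
            reasons[match.group("reason")] += 1
            jobs.append(match.group("job"))
    return reasons, jobs


# A few lines copied from the SchedLog excerpt above, one entry per line.
SAMPLE = [
    "3/22 00:00:22 Marked job 14508.0 as IDLE",
    "3/22 00:00:22 Job 14508.0 put on hold: No condor_shadow installed "
    "that supports vanilla jobs on V6.3.3 or newer resources",
    "3/22 00:00:22 Forking Mailer process...",
]

reasons, jobs = summarize_holds(SAMPLE)
for reason, count in reasons.most_common():
    print(count, reason)
print("held jobs:", ", ".join(jobs))
```

To run it against the real log, replace `SAMPLE` with the lines read from the SchedLog file (here /u/condor/log/SchedLog, per the dprintf failure message).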