On 4/12/11 12:46 PM, Carsten Aulbert wrote:
> Hi Dan
>
> On Tuesday 12 April 2011 16:58:52 Dan Bradley wrote:
>> I am puzzled about why preemption is ineffective in the case where the work-fetch job has higher rank than the existing claim. What version of condor is this?
>
> Version 7.4.4
>
> But I was not aware that preemption is needed to claim an idle slot
The logs you posted showed the slot transitioning to Claimed/Idle, not Unclaimed/Idle. Therefore, the work-fetch job must preempt the claim of the schedd that is holding it.

I can't think of any reason why the schedd would hold the claim for an hour after a job completes without starting another job, other than the schedd being very, very busy. Perhaps it would be worth looking into what exactly is going on with that. One place to start would be the shadow log. Look at the shadow for the last job that ran on the claim before it sat in Claimed/Idle for a long period of time. Did the shadow exit cleanly? In the schedd log, can you see the schedd handling the exit of that shadow? It should immediately launch another job on the claim at that point.
>> I am also curious why claims are sitting in Claimed/Idle for so long after a job finishes. Is the schedd severely overloaded?
>
> Not really - as far as I can tell, busy as usual with < ~50% CPU time on a single node
The schedd is single-threaded. The CPU may not be very busy while the schedd still has performance problems due to disk I/O or blocking network communications. Is the schedd responsive to condor_q queries?
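
One rough way to check responsiveness is to time a condor_q call from the submit machine. Here is a minimal sketch in Python, assuming condor_q is on the PATH there. The helper name time_command and the 60-second default timeout are made up for illustration and are not part of Condor.

```python
import subprocess
import time


def time_command(argv, timeout=60.0):
    """Run argv and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within `timeout`
    seconds. The name and the default timeout are illustrative assumptions.
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,  # only the timing matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return_code = result.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers
        return_code = None
    return time.perf_counter() - start, return_code


# Example use on the submit machine (assumes condor_q is installed):
#   elapsed, code = time_command(["condor_q"])
#   print(f"condor_q took {elapsed:.1f}s, exit code {code}")
```

If a plain condor_q regularly takes many seconds, or times out, while CPU usage stays low, that would point toward the schedd blocking on disk or network rather than being CPU-bound.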
--Dan