HTCondor Project List Archives




Re: [Condor-devel] proposal: PREEMPT_AND_HOLD



On Mon, Mar 03, 2008 at 12:20:52PM -0600, Dan Bradley wrote:
> 
> Admins have requested the ability to put jobs on hold in the machine 
> policy rather than just evicting them.  I agree with this.  There are 
> cases where the user needs to be better informed of why their job is 
> being booted (often because they did not specify appropriate 
> requirements), just as when we changed exec failures and specific file 
> transfer errors to put jobs on hold.
> 
> Here's the proposal:
> 
> # this is just an example macro used in the following policy
> MEMORY_EXCEEDED = (ImageSize > VirtualMemory*0.8)
> 
> # if this becomes true in the startd, the job is preempted and goes on hold
> PREEMPT_AND_HOLD = $(MEMORY_EXCEEDED)
> 
> # if this is defined, it sets the hold reason; otherwise, a default
> # hold reason will be used
> PREEMPT_AND_HOLD_REASON = \
>    ifThenElse( $(MEMORY_EXCEEDED), "Virtual memory exhausted.", Undefined )
> 
> # if this is defined, it sets the hold subcode
> PREEMPT_AND_HOLD_SUBCODE = \
>    ifThenElse( $(MEMORY_EXCEEDED), 1, Undefined )
> 
> The starter already has a mechanism to tell the shadow to put the job on 
> hold, but I think it should be the startd which evaluates 
> PREEMPT_AND_HOLD, in case it depends on dynamic information in the 
> machine ad.  Therefore, I think the best route is to have the startd 
> tell the starter and then the starter can tell the shadow to put the job 
> on hold using the existing mechanism for that.
> 
> Thoughts?
> 
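The evaluation path Dan proposes (startd evaluates PREEMPT_AND_HOLD against the job and machine ads, then the starter relays a hold to the shadow) can be modelled in a few lines. This is a minimal sketch under stated assumptions, not HTCondor code: ClassAds are plain dicts, ClassAd `Undefined` is modelled as `None`, and `startd_evaluate`, `starter_to_shadow`, `HoldRequest`, and the text of `DEFAULT_HOLD_REASON` are hypothetical names. The job attributes the shadow updates (`JobStatus` = 5 for held, `HoldReason`, `HoldReasonSubCode`) are assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Ad = dict  # a ClassAd modelled as a plain dict (assumption)
# An expression takes (job_ad, machine_ad); None stands in for ClassAd Undefined.
Expr = Callable[[Ad, Ad], object]

# Hypothetical placeholder: the proposal promises "a default hold reason"
# but does not give its text.
DEFAULT_HOLD_REASON = "Job held by machine policy (PREEMPT_AND_HOLD)."


@dataclass
class HoldRequest:
    reason: str
    subcode: Optional[int]


def memory_exceeded(job: Ad, machine: Ad) -> bool:
    # MEMORY_EXCEEDED = (ImageSize > VirtualMemory*0.8)
    return job["ImageSize"] > machine["VirtualMemory"] * 0.8


# The example policy from the proposal, one callable per proposed knob.
preempt_and_hold: Expr = memory_exceeded
preempt_and_hold_reason: Expr = (
    lambda j, m: "Virtual memory exhausted." if memory_exceeded(j, m) else None
)
preempt_and_hold_subcode: Expr = lambda j, m: 1 if memory_exceeded(j, m) else None


def startd_evaluate(job: Ad, machine: Ad, hold: Expr,
                    reason: Optional[Expr] = None,
                    subcode: Optional[Expr] = None) -> Optional[HoldRequest]:
    """Startd side: decide whether to preempt and hold, using the machine ad."""
    if hold(job, machine) is not True:  # False or Undefined: leave the job alone
        return None
    r = reason(job, machine) if reason is not None else None
    s = subcode(job, machine) if subcode is not None else None
    # An undefined (or absent) reason falls back to the default text.
    return HoldRequest(r if r is not None else DEFAULT_HOLD_REASON, s)


def starter_to_shadow(job: Ad, request: HoldRequest) -> None:
    """Starter relays the startd's decision; the shadow marks the job held."""
    job["JobStatus"] = 5  # 5 == Held
    job["HoldReason"] = request.reason
    if request.subcode is not None:
        job["HoldReasonSubCode"] = request.subcode


if __name__ == "__main__":
    job = {"ImageSize": 900_000}
    machine = {"VirtualMemory": 1_000_000}
    request = startd_evaluate(job, machine, preempt_and_hold,
                              preempt_and_hold_reason, preempt_and_hold_subcode)
    if request is not None:
        starter_to_shadow(job, request)
    print(job)
```

Keeping the three knobs as separate expressions mirrors the proposal: only PREEMPT_AND_HOLD decides whether to act, while the reason and subcode are optional decorations that may evaluate to Undefined.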

Not a fan.

I'm all for relaying some sort of eviction information back to the schedd,
but a startd should never have any say about what's happening in the queue.
To use your example, just because a job runs out of virtual memory on one
machine doesn't mean that it wouldn't succeed on another machine, so why
should it go on hold?
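Erik's objection can be made concrete with the proposal's own MEMORY_EXCEEDED test. In this minimal sketch the job size and the two machines' VirtualMemory values are hypothetical, and both are assumed to be in the same units (KiB); the point is only that one job gets different verdicts on different machine ads.

```python
# MEMORY_EXCEEDED from the proposal: (ImageSize > VirtualMemory*0.8)
def memory_exceeded(image_size: int, virtual_memory: int) -> bool:
    return image_size > virtual_memory * 0.8


# Hypothetical values; both assumed to be in KiB.
job_image_size = 900_000
machines = {"small": 1_000_000, "large": 4_000_000}

# Evaluate the same startd-side test for one job against each machine.
verdicts = {name: memory_exceeded(job_image_size, vm) for name, vm in machines.items()}
eligible = [name for name, exceeded in verdicts.items() if not exceeded]

print(verdicts)  # {'small': True, 'large': False}
print(eligible)  # ['large']
```

Holding the job because of the `small` verdict would also keep it off `large`, where the same expression says it fits.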

Now, if the schedd wants to use information from the eviction to make a 
scheduling decision, I'm all for that.

-Erik

> --Dan