HTCondor Project List Archives




[Condor-devel] proposal: PREEMPT_AND_HOLD




Admins have asked for a way to put jobs on hold from the machine policy instead of just evicting them, and I agree. There are cases where the user needs to be better informed about why their job is being booted (often because they did not specify appropriate requirements). This is the same reasoning behind our earlier change to put jobs on hold for exec failures and specific file transfer errors.

Here's the proposal:

# this is just an example macro used in the following policy
MEMORY_EXCEEDED = (ImageSize > VirtualMemory*0.8)

# if this becomes true in the startd, the job is preempted and goes on hold
PREEMPT_AND_HOLD = $(MEMORY_EXCEEDED)

# if this is defined, it sets the hold reason; otherwise, a default
# hold reason will be used
PREEMPT_AND_HOLD_REASON = \
  ifThenElse( $(MEMORY_EXCEEDED), "Virtual memory exhausted.", Undefined )

# if this is defined, it sets the hold subcode
PREEMPT_AND_HOLD_SUBCODE = \
  ifThenElse( $(MEMORY_EXCEEDED), 1, Undefined )
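To make the intended semantics concrete, here is a rough Python sketch of how the three expressions might be resolved. It is not HTCondor code: `decide_hold`, `HoldDecision`, and the text of `DEFAULT_HOLD_REASON` are hypothetical names made up for illustration, `None` stands in for Undefined, and the machine ad is modeled as a plain dict.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical default text; the proposal only says "a default hold reason".
DEFAULT_HOLD_REASON = "Job preempted and held by machine policy."


@dataclass
class HoldDecision:
    reason: str
    subcode: Optional[int]  # None models an Undefined subcode


def memory_exceeded(ad: dict) -> bool:
    # Mirrors MEMORY_EXCEEDED = (ImageSize > VirtualMemory*0.8)
    return ad["ImageSize"] > ad["VirtualMemory"] * 0.8


def decide_hold(
    ad: dict,
    preempt_and_hold: Callable[[dict], bool],
    reason_expr: Optional[Callable[[dict], Optional[str]]] = None,
    subcode_expr: Optional[Callable[[dict], Optional[int]]] = None,
) -> Optional[HoldDecision]:
    """Return a HoldDecision if the job should be held, else None."""
    # Only an expression that evaluates to True triggers the hold.
    if preempt_and_hold(ad) is not True:
        return None
    # An unset or Undefined reason falls back to the default.
    reason = reason_expr(ad) if reason_expr else None
    if reason is None:
        reason = DEFAULT_HOLD_REASON
    # An unset or Undefined subcode stays Undefined.
    subcode = subcode_expr(ad) if subcode_expr else None
    return HoldDecision(reason=reason, subcode=subcode)


# The example policy from the config above, expressed as callables.
def example_reason(ad: dict) -> Optional[str]:
    return "Virtual memory exhausted." if memory_exceeded(ad) else None


def example_subcode(ad: dict) -> Optional[int]:
    return 1 if memory_exceeded(ad) else None
```

With a machine ad of `{"ImageSize": 900, "VirtualMemory": 1000}`, `decide_hold(ad, memory_exceeded, example_reason, example_subcode)` yields a hold with the custom reason and subcode 1; leaving out `reason_expr` yields the default reason instead.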



The starter already has a mechanism for telling the shadow to put the job on hold, but I think the startd should be the one to evaluate PREEMPT_AND_HOLD, in case the expression depends on dynamic information in the machine ad. So I think the best route is for the startd to tell the starter, and for the starter to then tell the shadow to put the job on hold using that existing mechanism.
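As a rough illustration of that chain of responsibility, the Python sketch below models the three daemons as plain objects. None of this is real HTCondor API: the class and method names (`Startd.evaluate_policy`, `Starter.request_hold`, `Shadow.put_job_on_hold`) and the default reason string are hypothetical stand-ins, and the actual daemons talk over the network rather than through direct method calls.

```python
from typing import Callable, Optional


class Shadow:
    """Submit-side stand-in: owns the job's state in the queue."""

    def __init__(self) -> None:
        self.job_status = "RUNNING"
        self.hold_reason: Optional[str] = None
        self.hold_subcode: Optional[int] = None

    def put_job_on_hold(self, reason: str, subcode: Optional[int]) -> None:
        self.job_status = "HELD"
        self.hold_reason = reason
        self.hold_subcode = subcode


class Starter:
    """Execute-side stand-in: already knows how to ask the shadow for a hold."""

    def __init__(self, shadow: Shadow) -> None:
        self.shadow = shadow

    def request_hold(self, reason: str, subcode: Optional[int]) -> None:
        # Reuses the existing starter -> shadow hold path.
        self.shadow.put_job_on_hold(reason, subcode)


class Startd:
    """Evaluates the machine policy, since it has the live machine ad."""

    def __init__(self, starter: Starter, policy: Callable[[dict], bool]) -> None:
        self.starter = starter
        self.policy = policy

    def evaluate_policy(self, machine_ad: dict,
                        reason: str = "Held by machine policy.",
                        subcode: Optional[int] = None) -> bool:
        # If PREEMPT_AND_HOLD is true, hand the decision to the starter.
        if self.policy(machine_ad):
            self.starter.request_hold(reason, subcode)
            return True
        return False


# Example wiring: the startd decides, the starter relays, the shadow holds.
shadow = Shadow()
startd = Startd(Starter(shadow),
                policy=lambda ad: ad["ImageSize"] > ad["VirtualMemory"] * 0.8)
startd.evaluate_policy({"ImageSize": 900, "VirtualMemory": 1000},
                       reason="Virtual memory exhausted.", subcode=1)
```

The point of the layering is that only the startd needs to see the machine ad, while the starter and shadow keep the roles they already have.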

Thoughts?

--Dan