[Condor-devel] proposal: PREEMPT_AND_HOLD
- Date: Mon, 03 Mar 2008 12:20:52 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: [Condor-devel] proposal: PREEMPT_AND_HOLD
Admins have requested the ability to put jobs on hold from the machine
policy rather than just evicting them. I agree with this. There are
cases where the user needs to be better informed of why their job is
being booted (often because they did not specify appropriate
requirements). This is similar to the change we made to put jobs on hold
for exec failures and specific file transfer errors.
Here's the proposal:

# this is just an example macro used in the following policy
MEMORY_EXCEEDED = (ImageSize > VirtualMemory*0.8)

# if this becomes true in the startd, the job is preempted and goes on hold
PREEMPT_AND_HOLD = $(MEMORY_EXCEEDED)

# if this is defined, it sets the hold reason; otherwise, a default
# hold reason will be used
PREEMPT_AND_HOLD_REASON = \
  ifThenElse( $(MEMORY_EXCEEDED), "Virtual memory exhausted.", Undefined )

# if this is defined, it sets the hold subcode
PREEMPT_AND_HOLD_SUBCODE = \
  ifThenElse( $(MEMORY_EXCEEDED), 1, Undefined )
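
As a rough sketch of how this might extend to more than one condition,
the expressions could be chained. DISK_EXCEEDED is a hypothetical macro
that is not part of the proposal, the subcode values are arbitrary, and
this assumes DiskUsage and Disk are reported in the same units:

# hypothetical second condition, for illustration only
DISK_EXCEEDED = (DiskUsage > Disk)

# hold the job if either condition becomes true
PREEMPT_AND_HOLD = $(MEMORY_EXCEEDED) || $(DISK_EXCEEDED)

# the first matching condition supplies the hold reason
PREEMPT_AND_HOLD_REASON = \
  ifThenElse( $(MEMORY_EXCEEDED), "Virtual memory exhausted.", \
  ifThenElse( $(DISK_EXCEEDED), "Disk space exceeded.", Undefined ) )

# arbitrary example subcodes: 1 for memory, 2 for disk
PREEMPT_AND_HOLD_SUBCODE = \
  ifThenElse( $(MEMORY_EXCEEDED), 1, \
  ifThenElse( $(DISK_EXCEEDED), 2, Undefined ) )
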
The starter already has a mechanism to tell the shadow to put the job on
hold, but I think the startd should be the one to evaluate
PREEMPT_AND_HOLD, in case it depends on dynamic information in the
machine ad. Therefore, I think the best route is to have the startd
tell the starter, and the starter can then tell the shadow to put the
job on hold using that existing mechanism.
Thoughts?
--Dan