Re: [HTCondor-devel] new name for MaxJobRetirementTime


Date: Thu, 07 Jun 2018 11:10:29 -0500
From: Dan Bradley <dan@xxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-devel] new name for MaxJobRetirementTime


On 6/7/18 9:26 AM, Greg Thain wrote:
> On 06/06/2018 07:54 PM, Dan Bradley wrote:
>> And you are correct that the draining policy does not accommodate such a policy. That was intentional, but perhaps the ramifications were not fully digested, or perhaps the intention was to not rely on shifty expressions that zero themselves out but rather to introduce a first class policy mechanism for interrupting retirement.
>
> Dan:
>
> It's always good to hear from you! FWIW, the particular problem that set this off was the fact that PREEMPT (meaning evict) honors MaxJobRetirementTime. Generally speaking, if we want to PREEMPT jobs because they are over some resource limit, we want that to happen right away, and not honor MJRT. So, we can hack up the PREEMPT expression to reset MJRT back to zero, but this doesn't work as expected in the draining case.
Makes sense. I think there are two semantically different types of policies enforced by PREEMPT. One is evicting a misbehaving job. MJRT should not apply to that. My idea of using factors that dynamically zero out the expression is probably not a good way to achieve that and, anyway, it was not incorporated into draining.

The other use of PREEMPT is preempting a job in order to devote resources to some higher-priority task (e.g. desktop user or other local process). That second case was the one in mind when MJRT was designed. Typically, on machines that use PREEMPT for that reason, one would set MJRT=0, so having MJRT override PREEMPT seems like an inconvenience in this case, but at least it encourages one not to engage in false advertising.
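
To make the distinction between the two kinds of PREEMPT concrete, here is a toy model in Python. It is only a sketch and not HTCondor code: the Job fields, the function names, and the misbehaving_bypasses_mjrt switch are all hypothetical, and it assumes retirement time is counted from the moment the job started running.

```python
from dataclasses import dataclass


@dataclass
class Job:
    runtime_s: int        # seconds the job has been running (assumed counted from job start)
    memory_mb: int        # memory the job currently uses
    memory_limit_mb: int  # limit the machine owner wants to enforce


def retirement_left(job: Job, mjrt_s: int) -> int:
    """Seconds of retirement time the job may still claim."""
    return max(0, mjrt_s - job.runtime_s)


def seconds_until_eviction(job: Job, mjrt_s: int, misbehaving_bypasses_mjrt: bool) -> int:
    """How long the job keeps running after PREEMPT fires, in this toy model.

    misbehaving_bypasses_mjrt is a hypothetical switch standing in for a
    first-class "interrupt retirement" policy; it is not an HTCondor knob.
    """
    over_limit = job.memory_mb > job.memory_limit_mb
    if over_limit and misbehaving_bypasses_mjrt:
        return 0  # misbehaving job: evict right away, ignore MJRT
    return retirement_left(job, mjrt_s)  # otherwise the advertised retirement is honored


hog = Job(runtime_s=600, memory_mb=4096, memory_limit_mb=2048)
polite = Job(runtime_s=600, memory_mb=1024, memory_limit_mb=2048)

print(seconds_until_eviction(hog, 3600, False))    # 3000: MJRT shields the misbehaving job
print(seconds_until_eviction(hog, 3600, True))     # 0: evicted immediately
print(seconds_until_eviction(polite, 3600, True))  # 3000: well-behaved job keeps its retirement
```

The first call shows the behavior Greg ran into, where a job over its limit still gets the rest of its retirement time. The other two show the separation argued for above: eviction for misbehavior skips MJRT, while preemption in favor of another task still honors it.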

--Dan
