Re: [HTCondor-devel] new name for MaxJobRetirementTime


Date: Wed, 06 Jun 2018 19:54:32 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [HTCondor-devel] new name for MaxJobRetirementTime
I've always felt a bit sheepish about the naming of MaxJobRetirementTime. It will be nice to know that it is fixed!

When some condor policy wants to terminate a job, there is a maximum amount of time that it might have to wait before moving on to the next step. It could be less, depending on circumstances. Hence the name. It fails to express what those circumstances are in which the time could be less.

It sounds like you are aware that the expression can renege on its promise by containing a factor that zeroes the whole thing out in some conditions. And you are correct that the draining policy does not accommodate such a policy. That was intentional, but perhaps the ramifications were not fully digested, or perhaps the intention was to not rely on shifty expressions that zero themselves out but rather to introduce a first-class policy mechanism for interrupting retirement.
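
As a rough illustration of the "factor that zeroes the whole thing out" idea, here is a minimal Python sketch. It is not ClassAd syntax and not HTCondor code; the function name, the parameters, and the memory-based condition are all hypothetical, chosen only to mirror the "uses more memory than requested" example in the quoted message.

```python
def retirement_seconds(base_seconds: int,
                       memory_used_mb: int,
                       memory_requested_mb: int) -> int:
    """Hypothetical self-zeroing retirement expression.

    The boolean factor is 1 while the job stays within its memory
    request and 0 once it exceeds it, so the promised retirement
    time collapses to zero for a misbehaving job.
    """
    well_behaved = memory_used_mb <= memory_requested_mb
    return base_seconds * int(well_behaved)


if __name__ == "__main__":
    # Job within its request: full retirement time applies.
    print(retirement_seconds(3600, memory_used_mb=900, memory_requested_mb=1024))
    # Job over its request: the factor zeroes the expression out.
    print(retirement_seconds(3600, memory_used_mb=2048, memory_requested_mb=1024))
```

The sketch only shows why such an expression "reneges": the advertised number is real only while the condition holds.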

--Dan

On 6/6/18 11:48 AM, Todd L Miller wrote:
We've come to the conclusion that every word in MaxJobRetirementTime is wrong, so we'd like to rename it. To be precise, we must observe that there's a job ad attribute and a startd configuration knob with the same name. The job ad attribute can only shorten the duration specified by the startd configuration knob. Therefore, since the job is not informed when it enters vacating state, the only utility for the job ad attribute is match-making, meaning "don't bother to start me if you won't guarantee me enough time to make forward progress."
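
The relationship between the two values can be sketched in Python as follows. This is an illustration under stated assumptions, not HTCondor's implementation: both function names are hypothetical, durations are assumed to be in seconds, and the match-making check is just one reading of "don't bother to start me if you won't guarantee me enough time".

```python
from typing import Optional


def effective_retirement_seconds(startd_limit: int,
                                 job_value: Optional[int] = None) -> int:
    """Hypothetical: the job ad value can only shorten the startd's value."""
    if job_value is None:
        return startd_limit
    return min(startd_limit, job_value)


def worth_starting(startd_limit: int, job_needs: int) -> bool:
    """Hypothetical match-making check: start the job only if the
    startd advertises at least as much time as the job needs."""
    return startd_limit >= job_needs


if __name__ == "__main__":
    print(effective_retirement_seconds(3600, 600))   # job shortens the limit
    print(effective_retirement_seconds(3600, 7200))  # job cannot lengthen it
    print(worth_starting(3600, 7200))                # not enough time offered
```

Because the job-side value never raises the effective duration, it is useful only as a match-making request.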

I therefore propose that we call the job ad attribute "request_duration", since that's the only thing it can do.

The startd configuration knob should thus include the word duration. It can't be /just/ duration (unlike MEMORY), because (unlike MEMORY and NUM_CPUs), it's a conditional minimum, not a configured maximum. (The Miron directive for CHTC implies a knob called MAXIMUM_DURATION, but that's a different problem.) Maybe call it ADVERTISED_DURATION?

Things that ignore MJRT currently include condor_reassign_slot(s) (which may be renamed to condor_now) and one (or more?) of the shutdown styles. A job may also not run for the MJRT because it didn't need that much time or because its policy expressions indicated it shouldn't. We agree that a job should also not run for the MJRT if it abuses the system (e.g., uses more memory than requested), but that's not currently implementable if the machine is draining. Because of these conditions, I feel we shouldn't name the knob 'promised' or 'minimum'.

- ToddM
