I've always felt a bit sheepish about the naming of
MaxJobRetirementTime. It will be nice to know that it is fixed!
When some condor policy wants to terminate a job, there is a maximum
amount of time that it might have to wait before moving on to the next
step. It could be less, depending on circumstances. Hence the name. It
fails to express what those circumstances are in which the time could be
less.
It sounds like you are aware that the expression can renege on its
promise by containing a factor that zeroes the whole thing out in some
conditions. And you are correct that the draining policy does not
accommodate such an expression. That was intentional, but perhaps the
ramifications were not fully digested, or perhaps the intention was to
not rely on shifty expressions that zero themselves out but rather to
introduce a first-class policy mechanism for interrupting retirement.
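
To make the "zeroing" trick concrete, here is a minimal Python sketch of such an expression. It is not HTCondor code or ClassAd syntax; the function name and the job_is_abusive flag are hypothetical, chosen only to show how one factor can cancel the advertised retirement time.

```python
def retirement_seconds(base_seconds: int, job_is_abusive: bool) -> int:
    """Model a retirement-time expression that zeroes itself out.

    base_seconds and job_is_abusive are illustrative names, not HTCondor
    attributes. The 0/1 factor plays the role of the condition embedded
    in the policy expression.
    """
    factor = 0 if job_is_abusive else 1
    return base_seconds * factor


# A well-behaved job keeps the full hour; an abusive one gets no retirement.
well_behaved = retirement_seconds(3600, job_is_abusive=False)
abusive = retirement_seconds(3600, job_is_abusive=True)
```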
--Dan
On 6/6/18 11:48 AM, Todd L Miller wrote:
    We've come to the conclusion that every word in
MaxJobRetirementTime is wrong, so we'd like to rename it. To be
precise, we must observe that there's a job ad attribute and a startd
configuration knob with the same name. The job ad attribute can only
shorten the duration specified by the startd configuration knob.
Therefore, since the job is not informed when it enters vacating
state, the only utility for the job ad attribute is match-making,
meaning "don't bother to start me if you won't guarantee me enough
time to make forward progress."
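
The rule described above (the job value can only shorten the startd value, so its practical use is match-making) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the startd's implementation; both function names and all parameter names are hypothetical.

```python
from typing import Optional


def effective_retirement(startd_seconds: int, job_seconds: Optional[int]) -> int:
    """Return the retirement time that applies to a job.

    Assumption taken from the message above: the job's value can only
    shorten the startd's value, never extend it.
    """
    if job_seconds is None:
        return startd_seconds
    return min(startd_seconds, job_seconds)


def worth_starting(startd_seconds: int, needed_seconds: int) -> bool:
    """Match-making view: start only if the machine offers enough time."""
    return startd_seconds >= needed_seconds


# A job asking for more than the machine offers is capped at the machine's value.
capped = effective_retirement(startd_seconds=3600, job_seconds=7200)
# A job asking for less shortens its own retirement time.
shortened = effective_retirement(startd_seconds=3600, job_seconds=600)
```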
    I therefore propose that we call the job ad attribute
"request_duration", since that's the only thing it can do.
    The startd configuration knob should thus include the word
duration. It can't be /just/ duration (unlike MEMORY), because
(unlike MEMORY and NUM_CPUs), it's a conditional minimum, not a
configured maximum. (The Miron directive for CHTC implies a knob
called MAXIMUM_DURATION, but that's a different problem.) Maybe call
it ADVERTISED_DURATION?
    Things that ignore MJRT currently include condor_reassign_slot(s)
(which may be renamed to condor_now) and one (or more?) of the
shutdown styles. A job may also not run for the MJRT because it
didn't need that much time or because its policy expressions indicated
it shouldn't. We agree that a job should also not run for the MJRT if
it abuses the system (e.g., uses more memory than requested), but
that's not currently implementable if the machine is draining.
Because of these conditions, I feel we shouldn't name the knob
'promised' or 'minimum'.
- ToddM
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel