We've come to the conclusion that every word in
MaxJobRetirementTime is wrong, so we'd like to rename it. To be precise,
we must observe that there's a job ad attribute and a startd configuration
knob with the same name. The job ad attribute can only shorten the
duration specified by the startd configuration knob. Therefore, since the
job is not informed when it enters vacating state, the only utility for
the job ad attribute is match-making, meaning "don't bother to start me if
you won't guarantee me enough time to make forward progress."
I therefore propose that we call the job ad attribute
"request_duration", since that's the only thing it can do.
The startd configuration knob should thus include the word
duration. It can't be /just/ duration, because (unlike MEMORY and
NUM_CPUS) it's a conditional minimum, not a configured
maximum. (The Miron directive for CHTC implies a knob called
MAXIMUM_DURATION, but that's a different problem.)
Maybe call it ADVERTISED_DURATION?
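
For illustration only, here is a minimal Python sketch of the semantics
described above. It is not HTCondor code: the function names are made up,
the two parameter names simply reuse the names proposed here, and durations
are assumed to be plain integer seconds rather than ClassAd expressions.

```python
from typing import Optional


def effective_retirement(startd_limit: int, job_limit: Optional[int]) -> int:
    """Retirement time a job would actually get (hypothetical helper).

    The job ad attribute can only shorten the startd's value, never
    lengthen it, so the result is the smaller of the two.
    """
    if job_limit is None:
        # The job did not set the attribute: the startd's value applies.
        return startd_limit
    # Assumption: a negative job value is treated as zero.
    return min(startd_limit, max(job_limit, 0))


def worth_starting(advertised_duration: int,
                   request_duration: Optional[int]) -> bool:
    """Match-making use of the attribute (hypothetical helper).

    "Don't bother to start me if you won't guarantee me enough time
    to make forward progress."
    """
    if request_duration is None:
        # No request means the job accepts any advertised duration.
        return True
    return advertised_duration >= request_duration
```

For example, under these assumptions a job asking for 7200 seconds on a
slot advertising 3600 would not match, while a job asking for 600 seconds
would match and would be limited to 600 seconds of retirement time.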
Things that ignore MJRT currently include condor_reassign_slot(s)
(which may be renamed to condor_now) and one (or more?) of the shutdown
styles. A job may also not run for the MJRT because it didn't need that
much time or because its policy expressions indicated it shouldn't. We
agree that a job should also not run for the MJRT if it abuses the system
(e.g., uses more memory than requested), but that's not currently
implementable if the machine is draining. Because of these conditions, I
feel we shouldn't name the knob 'promised' or 'minimum'.
- ToddM