Re: [Condor-devel] RFC: starter-enforced eviction policy expressions
- Date: Fri, 8 Apr 2005 10:11:18 +0100
- From: Matt Hope <matthew.hope@xxxxxxxxx>
- Subject: Re: [Condor-devel] RFC: starter-enforced eviction policy expressions
On Apr 8, 2005 12:35 AM, Derek Wright <wright@xxxxxxxxxxx> wrote:
>
> hi folks. one of our high-prio support customers wants to be able to
> have machine policies like "evict the job if the imagesize grows
> larger than the allocated memory on the virtual machine where it's
> running". unfortunately, it's the starter that monitors the job's
> imagesize, not the startd, so to make the above possible using
> existing policy expressions, we'd have to provide a mechanism for the
> starter to share info with the startd.
Makes sense
external forces -> startd -> starter -> job
job specific details should go via the starter
> moreover, we might want similar sorts of eviction to happen in cases
> where the starter is running without a startd at all (local universe,
> gridshell, etc). so, i'm proposing we add some additional policy
> expressions and logic into the starter itself. below i'm including
> the write-up i did about how i think it should all work. if anyone
> has time to read, think about, and comment on this proposal, that'd be
> swell. thanks,
nice idea - look forward to more of them...
> the starter currently maintains the state of the job, either "running"
> or "suspended". this would be expanded so that the state could be any
> of: "running", "suspended", "vacating" (graceful/soft eviction), or
> "killing" (hard eviction).
what about retirement? how will this fit in...
> admins would be able to define a few new expressions in their config
> file to control the transitions between these states. to simplify
> things, "suspended" would still only be triggered by the startd.
indeed - suspended should be a response to an external action, a job
can take care of pausing itself if it ever wants to.
> i see these policy expressions used to allow eviction if
> the job is "misbehaving".
good to have a clear use case in advance.
> whenever the job is either "running" or "suspended", the starter would
> evaluate the "STARTER_EVICT" expression to decide if we should evict
> the job (analogous to the startd's "PREEMPT" expression, which i think
> is unfortunately named, and should probably be called "EVICT"). if
> undefined, "STARTER_EVICT" would default to FALSE (don't do
> starter-based eviction at all, let the startd's policy settings
> control things). if "STARTER_EVICT" becomes TRUE, the starter would
> evaluate "STARTER_WANT_VACATE" (analogous to the startd's
> "WANT_VACATE") to decide what kind of eviction to perform. if
> "STARTER_WANT_VACATE" is FALSE, we go into the "killing" state and
> immediately hard-kill the job (and all its children) with a SIGKILL.
Will the starter's vacate signals match those defined in the submit file?
> if "STARTER_WANT_VACATE" is TRUE, we go into the "vacating" state and
> begin a graceful eviction (sending the "KillSig" specified in the job
> classad, or SIGTERM by default). if "STARTER_WANT_VACATE" is
> undefined, it defaults to TRUE.
What happens if the job specified it didn't want to vacate?
If STARTER_KILL is false (or never evaluates to true), what happens?
Do the startd's expressions kick in if they want to trigger a
vacation? How about if someone issues a vacate command directly, with
or without -fast?
I think a state transition diagram with the (potentially noop)
responses to all the various actions/commands/expression evaluations
would be useful
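As a starting point, here is a rough sketch of the transitions as I
read the proposal (including the STARTER_KILL step quoted further
down). It is only an illustration, not starter code: the names
JobState, DEFAULTS and next_state are made up, and the expressions
are assumed to have been evaluated to plain booleans already. It
deliberately leaves out retirement, direct vacate commands and the
startd's own expressions, since those are the open questions.

```python
from enum import Enum


class JobState(Enum):
    """The four job states named in the proposal."""
    RUNNING = "running"
    SUSPENDED = "suspended"
    VACATING = "vacating"   # graceful/soft eviction
    KILLING = "killing"     # hard eviction


# Values the proposal says to use when an expression is undefined.
DEFAULTS = {
    "STARTER_EVICT": False,
    "STARTER_WANT_VACATE": True,
    "STARTER_KILL": False,
}


def next_state(state, policy):
    """Return the job state after one round of policy evaluation.

    `policy` maps expression names to already-evaluated booleans
    (an assumption of this sketch); a missing name is treated as an
    undefined expression and takes the proposal's default.
    """
    def value(name):
        return policy.get(name, DEFAULTS[name])

    if state in (JobState.RUNNING, JobState.SUSPENDED):
        if not value("STARTER_EVICT"):
            return state                 # no starter-based eviction
        if value("STARTER_WANT_VACATE"):
            return JobState.VACATING     # graceful eviction
        return JobState.KILLING          # immediate hard kill
    if state is JobState.VACATING and value("STARTER_KILL"):
        return JobState.KILLING          # give up on graceful eviction
    return state                         # noop in every other case
```

With nothing defined, next_state(JobState.RUNNING, {}) stays in
"running", which matches the "let the startd's policy settings control
things" default. Everything that falls through to the last line is one
of the noop responses that I think ought to be spelled out explicitly.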
> while the job is in the "vacating" state, the starter would also
> evaluate the "STARTER_KILL" expression, to decide if it should give up
> on the graceful eviction and move immediately to the hard-kill
> eviction. this is analogous to the startd's "KILL" expression. if
> undefined, this would default to FALSE (never move to hard killing,
> allow graceful eviction to run its course).
>
> all of these expressions would be evaluated in the context of the copy
> of the job classad used to spawn the job, and a starter-constructed
> classad that contained information the starter had gathered about the
> job, including image size, cpu usage (user and system rusage), total
> wallclock runtime, and number of children processes. so far, that's
> all the info the starter is monitoring about the job.
If you are looking at misbehaving jobs you may want to track:
Network IO usage (very tricky what with multiple processes)
Number of open file handles/descriptors (since these are a limited resource)
An additional, possibly useful piece of functionality is some concept
of heartbeating.
If a file such as .condor.heartbeat (making the name controllable by a
variable is probably a good idea) was created by the job and touched
every so often, then the starter could expose the time stamp on the
file as a classad attribute like "LastFileHeartbeat".
Then the expressions could say things like
CurrentTime - LastFileHeartbeat > 10 * Minute
and such like.
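To make the idea concrete, a minimal sketch of the check, assuming the
starter simply reads the file's modification time. The function names,
the MINUTE constant and the choice to treat a missing file as "job has
not opted in" are all my own assumptions, not anything that exists in
the starter.

```python
import os
import time

MINUTE = 60  # seconds


def last_file_heartbeat(path):
    """Return the heartbeat file's mtime in epoch seconds, or None if absent."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def heartbeat_expired(path, limit=10 * MINUTE, now=None):
    """Mirror the expression CurrentTime - LastFileHeartbeat > limit."""
    current_time = time.time() if now is None else now
    last = last_file_heartbeat(path)
    if last is None:
        # Assumption: a job that never created the file is not
        # using heartbeats, so it is never considered expired.
        return False
    return current_time - last > limit
```

The job side is then nothing more than touching the file every so
often; the policy expression decides how stale is too stale.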
> NOTE: all of this would only be added to the starter that manages
> vanilla, java, MPI and local universes. i.e. these features would NOT
> be available for standard or PVM universe.
out of interest why not? not that it matters personally being firmly
"clipped" :)
> one undecided issue is if the starter should notify the startd when it
> decides to change the job state, so that the startd's state/activity
> could reflect what the starter is doing. if we make no such attempt,
> the startd will still report the virtual machine as "Claimed/Busy",
> even though the starter might be evicting the job.
Not nice - this should be viewed as a temporary stopgap to enable
testing of the functionality...
Though on a related point - are these classads going to be
broadcast/stored by the collector?
> however, adding
> this kind of reporting would involve additional complications in the
> startd and would delay providing this functionality, so if we do it,
> it should probably be a "phase 2" change, something we can do
> separately, after we get the initial changes outlined above working.
Should definitely be viewed as necessary - whether you implement it in
two phases is entirely up to you, since one can always not use it till
it is complete if you like.
A good idea - though with some complexities that need some ironing out
perhaps...
Matt