Re: [HTCondor-users] peaceful node drain and shutdown
- Date: Wed, 13 Jul 2016 17:12:53 -0400
- From: Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] peaceful node drain and shutdown
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Date: 07/13/2016 04:28 PM
> You sure about this?
>
> I also recall the same behavior that Bob describes - if START goes to
> FALSE instead of UNDEFINED, then the node transitions to Owner state,
> which then kills off running jobs.
>
> (Again, might have changed at some point)
I use it to manage machine oversubscription, among other things.
However, I've always set PREEMPT to false and used partitionable
slots, so perhaps once I start trying to use preemption all that's
going to fall apart on me.
For instance, I have the START expression go false when the load
average of the machine exceeds 125% of the CPU capacity, while the
dynamic slots continue to run. Likewise if a remote filesystem
runs low on disk space. This has been quite handy.
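
A minimal sketch of what such a load-based policy might look like, not the
actual configuration from this post. It assumes the standard machine ClassAd
attributes TotalLoadAvg and TotalCpus; the macro name LOAD_OK is hypothetical,
and the remote-filesystem check is left out because it would need a
site-specific custom attribute.

```
# Hypothetical sketch: refuse new jobs when the machine-wide load average
# exceeds 125% of the core count.
# LOAD_OK is an illustrative macro name, not a built-in knob.
LOAD_OK = (TotalLoadAvg <= 1.25 * TotalCpus)

# A real policy would normally AND this with the site's existing START terms.
START = $(LOAD_OK)

# As described above, preemption is disabled.
PREEMPT = False
```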
Maybe I need to add a check of SlotType to limit this application
of the START expression to partitionable slots only, or look at
the state and activity so that a non-Unclaimed slot doesn't go
into Owner and then try to preempt when low disk space or whatever
pulls my START expression false?
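
One way the SlotType idea could be sketched, assuming the standard SlotType
machine attribute (whose values include "Partitionable" and "Dynamic") and a
hypothetical LOAD_OK macro. Whether this actually keeps claimed dynamic slots
out of the Owner state is untested here.

```
# Hypothetical sketch: apply the load guard to the partitionable slot only.
# LOAD_OK is an illustrative macro name, not a built-in knob.
LOAD_OK = (TotalLoadAvg <= 1.25 * TotalCpus)

# For any slot that is not partitionable (for example a dynamic slot), the
# first term is true, so the load guard never pulls START to false there.
START = (SlotType =!= "Partitionable") || $(LOAD_OK)
```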
Or is it just a matter of moving it to UNDEFINED instead of False?
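
For reference, a sketch of how the UNDEFINED variant could be written, using
the ClassAd ifThenElse() function and a hypothetical LOAD_OK macro. It only
shows the syntax; whether the startd treats UNDEFINED differently from False
for running jobs is exactly the open question in this thread.

```
# Hypothetical sketch: evaluate to UNDEFINED rather than False when the
# load guard fails. LOAD_OK is an illustrative macro name.
LOAD_OK = (TotalLoadAvg <= 1.25 * TotalCpus)

# True when the guard holds, UNDEFINED otherwise.
START = ifThenElse($(LOAD_OK), True, UNDEFINED)
```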
-Michael Pelletier.