Subject: Re: [HTCondor-users] peaceful node drain and shutdown
From: Brian Bockelman <bbockelm@xxxxxxxxxxx> Date: 07/13/2016 04:28 PM
> You sure about this?
>
> I also recall the same behavior that Bob describes - if START goes to
> FALSE instead of UNDEFINED, then the node transitions to Owner state,
> which then kills off running jobs.
>
> (Again, might have changed at some point)
I use it to manage machine oversubscription, among
other things.
However, I've always set PREEMPT to false and used
partitionable slots, so perhaps once I start trying to use preemption
all that's going to fall apart on me.
For instance, I have the START expression go false
when the load average of the machine exceeds 125% of the CPU capacity
while the dynamic slots continue to run. Likewise if a remote
filesystem runs low on disk space. This has been quite handy.
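
As a rough sketch of the load check described above (an illustration, not the configuration actually in use): TotalLoadAvg and TotalCpus are standard machine ClassAd attributes, while the macro name LOAD_OK is made up for this example. The disk-space check is left out, and any existing START policy would still need to be combined with it.

```
# LOAD_OK is a hypothetical macro name: true while the machine-wide
# load average is at or below 125% of the number of cores.
LOAD_OK = (TotalLoadAvg <= 1.25 * TotalCpus)

# START goes false once the load threshold is exceeded.
START = $(LOAD_OK)
```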
Maybe I need to add a check of SlotType to limit this
application of the START expression to Partitionable slots only,
or look at the state and activity so that a non-Unclaimed slot
doesn't go into Owner and then try to preempt when low disk space
or whatever pulls my START expression false?
Or is it just a matter of moving it to UNDEFINED instead
of False?
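
A possible shape for the SlotType idea, as an untested sketch: it assumes the SlotType machine attribute reports "Partitionable" for the parent slot, and LOAD_OK is a hypothetical macro name. Whether this actually keeps claimed dynamic slots out of the Owner state is exactly the open question here.

```
# LOAD_OK is a hypothetical macro name for the load check.
LOAD_OK = (TotalLoadAvg <= 1.25 * TotalCpus)

# Apply the load check only to the partitionable slot; every other
# slot type (including dynamic slots) always sees START as true.
START = (SlotType =!= "Partitionable") || ($(LOAD_OK))
```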