
Re: [HTCondor-users] peaceful node drain and shutdown



I have my START expression evaluate to FALSE when certain filesystems become unmounted/unmountable (using STARTD_CRON_*). Empirically the cluster keeps on rolling; it just doesn't start new jobs. This is the behavior that I expect and want.

I use partitionable slots throughout.
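
For anyone wanting to try the same approach, here is a minimal sketch of the kind of probe a STARTD_CRON job could run. It is an illustration under stated assumptions, not the configuration described above: the attribute name FS_OK, the job name FSCHECK, the function names, and the paths in the comments are all hypothetical, and the config knobs shown in the comments should be checked against the manual for your HTCondor version.

```shell
#!/bin/sh
# Sketch of a startd cron probe. A startd cron job prints
# "Attribute = value" lines on stdout, and the startd merges them
# into the machine ClassAd, where START can reference them.
#
# Config sketch (job name, path, and attribute name are assumptions):
#   STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) FSCHECK
#   STARTD_CRON_FSCHECK_EXECUTABLE = /usr/local/libexec/fs_probe.sh
#   STARTD_CRON_FSCHECK_PERIOD = 300
#   START = ($(START)) && (FS_OK =?= True)

# is_mounted MOUNTPOINT [MOUNTS_TABLE]
# Succeeds if MOUNTPOINT appears as the second field of the mounts
# table (defaults to /proc/mounts, so this is Linux-specific).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# fs_probe MOUNTPOINT [MOUNTS_TABLE]
# Prints the hypothetical FS_OK attribute for the startd to pick up.
fs_probe() {
    if is_mounted "$1" "$2"; then
        echo "FS_OK = True"
    else
        echo "FS_OK = False"
    fi
}
```

To use it as a real probe, end the script with a call such as `fs_probe /your/mount/point`. With `=?= True` in START, the slot also refuses new jobs until the probe has reported at least once; whether that is desirable depends on the site.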

--
Tom Downes
Senior Scientist and Data Center Manager
Center for Gravitation, Cosmology and Astrophysics
University of Wisconsin-Milwaukee
414.229.2678

On Wed, Jul 13, 2016 at 3:27 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
You sure about this?

I also recall the same behavior that Bob describes - if START goes to FALSE instead of UNDEFINED, then the node transitions to Owner state, which then kills off running jobs.

(Again, might have changed at some point)

Brian

> On Jul 13, 2016, at 3:09 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
>
> On 7/13/2016 3:03 PM, Bob Ball wrote:
>> Maybe this info is now obsolete, but I remember once setting the START
>> to an expression that evaluated "FALSE" and caused all the running jobs
>> to terminate....
>>
>> bob
>>
>
> Only if $(START) is referenced in the PREEMPT expression....
>
> START just controls when new jobs can be launched.
>
> PREEMPT controls when to kick off jobs (really would be more accurate to have named it "Evict" instead of "Preempt", sigh...).
>
> regards
> Todd
>
>
>> On 7/13/2016 3:56 PM, Fox, Kevin M wrote:
>>> I'm guessing the condor_drain command will have similar issues to the
>>> condor_off -peaceful command? That you have to have all the
>>> permissions set up right?
>>>
>>> The nice thing about the START=FALSE config trick is you only need
>>> root on the machine to do it.
>>>
>>> Thanks,
>>> Kevin
>>> ________________________________________
>>> From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of
>>> Todd Tannenbaum [tannenba@xxxxxxxxxxx]
>>> Sent: Wednesday, July 13, 2016 12:46 PM
>>> To: HTCondor-Users Mail List
>>> Subject: Re: [HTCondor-users] peaceful node drain and shutdown
>>>
>>> On 7/13/2016 2:29 PM, Fox, Kevin M wrote:
>>>> Ah. I had seen the docs for START but didn't realize it would affect new
>>>> job startup too. It seemed to imply that it's for eviction.
>>>>
>>>> But, the following seems to work to drain the node gracefully, as you
>>>> suggested:
>>>> echo START=FALSE > /etc/condor/config.d/00shutdown
>>>> kill -HUP <PID OF MASTER>
>>>>
>>>> and to reverse it
>>>> rm -f /etc/condor/config.d/00shutdown
>>>> kill -HUP <PID OF MASTER>
>>>>
>>>> Thanks for the help. :)
>>>>
>>> Hi Kevin,
>>>
>>> If the above satisfies your needs, great. But just wanted to point out
>>> you can do the same thing (drain a node gracefully) with the
>>> condor_drain tool. Do "man condor_drain", or see
>>>   http://htcondor.org/manual/v8.4/condor_drain.html
>>>
>>> Also in the upcoming HTCondor v8.5.6, the condor_drain functionality is
>>> exposed via HTCondor's Python API. :)
>>>
>>> regards,
>>> Todd
>>>
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>> with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>>
>>
>
>
> --
> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
> Center for High Throughput Computing   Department of Computer Sciences
> HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
> Phone: (608) 263-7132                  Madison, WI 53706-1685
