Mailing List Archives
Re: [HTCondor-users] Drain HTCondor worker by setting instance metadata value
- Date: Tue, 05 Sep 2017 15:55:52 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Drain HTCondor worker by setting instance metadata value
On 9/5/2017 1:49 PM, Dimitri Maziuk wrote:
> On 09/05/2017 01:28 PM, Todd Tannenbaum wrote:
>> On 9/5/2017 1:19 PM, Dimitri Maziuk wrote:
>>> On 09/05/2017 11:28 AM, Todd Tannenbaum wrote:
>>>> condor_drain <machine-name>
>>>
>>> Quick question: will it reset if I bounce the node or will I need to run
>>> condor_drain -cancel after reboot?
>>
>> It will reset after rebooting the node. No need for -cancel.
>
> Thank you, but condor_drain -graceful has just SIGTERM'ed the running
> jobs, which is not quite the same as setting START to false and running
> condor_reconfig.
Good point.

If you don't want HTCondor to preempt (i.e., SIGTERM) a running job
unless the job has already run for over X seconds, set
MaxJobRetirementTime to X in condor_config on the execute node.
condor_drain -graceful will honor the MaxJobRetirementTime attribute, as
will preemption for any other reason, e.g., user priority, startd rank
expression, preempt expression, etc. See
http://research.cs.wisc.edu/htcondor/manual/v8.6/3_5Configuration_Macros.html#25630
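
To make the retirement-time rule concrete, here is a small Python model of the behavior described above. It is a simplified sketch, not HTCondor's implementation: it assumes the retirement clock is the job's total runtime so far (as the paragraph describes), and the function name seconds_until_eviction is hypothetical.

```python
def seconds_until_eviction(job_runtime: int, max_job_retirement_time: int) -> int:
    """Hypothetical model: after preemption is requested (for example by
    condor_drain -graceful), return how many more seconds the job may run.

    Assumption: retirement time is compared against the job's total runtime,
    so a job that has already run for at least X seconds is preempted at once.
    """
    if job_runtime < 0 or max_job_retirement_time < 0:
        raise ValueError("times must be non-negative")
    # Remaining grace period; clamped so it is never negative.
    return max(0, max_job_retirement_time - job_runtime)


if __name__ == "__main__":
    x = 3600  # MaxJobRetirementTime = 3600 (one hour), chosen for illustration
    print(seconds_until_eviction(600, x))   # job ran 10 min: 3000 s more
    print(seconds_until_eviction(7200, x))  # job ran 2 h: preempted at once (0)
    print(seconds_until_eviction(600, 0))   # retirement time 0: preempted at once (0)
```

With the default of 0, every preemption (graceful drain included) stops the job right away, which matches the SIGTERM behavior reported in the quoted message.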

If you want to configure things so that preemption of a job is delayed
only in the case of draining, but the job is still preempted immediately
in the case of user priority/rank/preempt expression, etc., note that
MaxJobRetirementTime is a ClassAd expression evaluated in the context of
the slot ad. So if you put something like this in the condor_config on
your execute machine:

MaxJobRetirementTime = ifThenElse(Draining =?= True, 8*60*60, 0)

it will tell HTCondor on that execute node to let jobs continue running
unmolested for up to eight hours when the node receives a condor_drain
command. The key here is that the condor_startd helpfully sets the slot
attribute Draining=True whenever it is in the draining state.
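
The expression relies on =?=, which yields True or False even when Draining is undefined, so MaxJobRetirementTime always evaluates to a definite number. Below is a purely illustrative Python model of that expression; the function and constant names are hypothetical, and using None as a stand-in for ClassAd UNDEFINED is an assumption of the sketch.

```python
from typing import Optional

EIGHT_HOURS = 8 * 60 * 60  # 28800 seconds


def max_job_retirement_time(draining: Optional[bool]) -> int:
    """Hypothetical Python model of the slot-ad expression
        ifThenElse(Draining =?= True, 8*60*60, 0)

    None stands in for ClassAd UNDEFINED (e.g. no Draining value in the slot ad).
    """
    # =?= never evaluates to UNDEFINED: a missing value is simply not
    # identical to True, so the expression falls through to 0.
    if draining is True:
        return EIGHT_HOURS
    return 0


if __name__ == "__main__":
    print(max_job_retirement_time(True))   # draining: 28800
    print(max_job_retirement_time(False))  # not draining: 0
    print(max_job_retirement_time(None))   # value missing: 0
```

In this model a draining slot grants up to 28800 seconds of retirement, while preemption for user priority, rank, or a preempt expression (Draining not True) gets 0, which is the split the config line above is meant to achieve.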
Hope the above helps,
Todd

> So it appears "instead of twiddling with START expressions ... simply
> invoke the condor_drain" is not entirely correct.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>
HTCondor Technical Lead
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
1210 W. Dayton St. Rm #4257
Madison, WI 53706-1685
Phone: (608) 263-7132