
Re: [HTCondor-users] Jobs restarting



On Wed, Nov 18, 2015 at 6:28 AM, Peter Ellevseth
<Peter.Ellevseth@xxxxxxxxxx> wrote:

Peter,

> 1.     How can I change the time it takes before the head node orders a
> restart of a job.
>
I know I've answered this question before, but I can't find the answer
(or the source of the answer) right now. Sorry.

> 2.     Is it possible to change what is done when a restart is issued. Could
> I, instead of condor sending a SIGKILL to the job, tell it to run a script
> that shuts the job down safely? It would be preferable to have condor shut
> the job quietly down instead of restarting it.
>
On Linux, you can use the kill_sig command in the submit file to tell
HTCondor which signal to send. Your code (or a wrapper around it) would
need to trap that signal and take the appropriate action. If it's a
vanilla universe job, you can also use a tool such as DMTCP to do
checkpointing.
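As a rough illustration of the "trap the signal" approach, here is a minimal Python sketch. It assumes the submit file sets something like `kill_sig = 15` (SIGTERM on Linux) and that the job does its work in small steps it can stop between. `run` and `save_state` are hypothetical names, and `save_state` is only a placeholder for whatever "shut down safely" means for your application.

```python
import signal
import time

# Set by the signal handler and checked by the main loop between work steps.
_stop_requested = False


def _request_stop(signum, frame):
    """Signal handler: just record the request; the real cleanup happens in run()."""
    global _stop_requested
    _stop_requested = True


def save_state(step):
    """Hypothetical cleanup/checkpoint hook; replace with your job's own shutdown logic."""
    print(f"caught signal, saving state at step {step}", flush=True)


def run(max_steps=1000, step_seconds=0.01, sig=signal.SIGTERM):
    """Do work in small steps and stop cleanly if `sig` arrives.

    Returns the number of steps completed before stopping.
    """
    global _stop_requested
    _stop_requested = False
    # Assumption: `sig` matches whatever kill_sig is set to in the submit file.
    signal.signal(sig, _request_stop)
    for step in range(max_steps):
        if _stop_requested:
            save_state(step)
            return step
        time.sleep(step_seconds)  # stand-in for one unit of real work
    return max_steps


if __name__ == "__main__":
    run()
```

Keeping the handler tiny and doing the cleanup in the main loop avoids running complex code inside the signal handler itself. Whether HTCondor then treats the exited job as finished or as evicted is a separate question this sketch does not address.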


Thanks,
BC

-- 
Ben Cotton

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing