Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] Jobs restarting

Date: Thu, 19 Nov 2015 17:42:52 -0500
From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Jobs restarting

On Wed, Nov 18, 2015 at 6:28 AM, Peter Ellevseth
<Peter.Ellevseth@xxxxxxxxxx> wrote:

Peter,

> 1. How can I change the time it takes before the head node orders a
> restart of a job?
>
I know I've answered this question before, but I can't find the answer
(or the source of the answer) right now. Sorry.

> 2. Is it possible to change what is done when a restart is issued? Could
> I, instead of condor sending a SIGKILL to the job, tell it to run a script
> that shuts the job down safely? It would be preferable to have condor shut
> the job down quietly instead of restarting it.
>
On Linux, you can use the kill_sig command in the submit file to tell
HTCondor which signal to send. Your code (or a wrapper around it) would
need to trap whatever signal you set and take the appropriate action. If
it's a vanilla universe job, you can also use something like DMTCP to
do checkpointing.


Thanks,
BC

-- 
Ben Cotton

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing
