Re: [Condor-users] Disabling the restarting of jobs





Jaime Frey wrote:

You can also prevent the jobs from rerunning if accidentally released like so:
requirements = NumJobStarts =?= 0 || NumJobStarts =?= Undefined
periodic_hold = NumJobStarts > 0 && JobStatus == 1

This is better than my answer, which was simply periodic_hold = NumJobStarts > 0. To be clear, this policy keeps a job from restarting once it has started and then returned to the idle state (JobStatus == 1). The periodic_hold expression puts such a job on hold, and the requirements expression keeps it from matching a machine again if the hold is released by accident. A job can return to idle if Condor hits a problem while trying to run it, for example the condor_startd or condor_starter getting killed, or a communication failure between the submit and execute nodes. For preemptible jobs, it can also happen when Condor evicts the job from the machine.

Jaime's other suggestion (below) applies if the job runs to completion (or is killed by something external to Condor). In that case, Condor's default behavior is to remove the job from the job queue, so you usually don't have to do anything special to prevent it from rerunning.

Adding this to your submit file will put the job on hold if it doesn't exit with status 0:
on_exit_hold = ExitCode =!= 0
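
Putting the pieces together, a submit file might look roughly like the sketch below. The three policy lines are the ones quoted in this thread; the universe and the executable name (my_job) are placeholders I made up for illustration, so adjust them to your setup.

```
# Minimal sketch of a submit file that combines the policies above.
# "vanilla" and "my_job" are assumptions, not taken from the original posts.
universe      = vanilla
executable    = my_job

# Only match a machine if the job has never been started.
requirements  = NumJobStarts =?= 0 || NumJobStarts =?= Undefined

# Hold a job that has started at least once and is idle again.
periodic_hold = NumJobStarts > 0 && JobStatus == 1

# Keep the job in the queue, on hold, if it does not exit with status 0.
on_exit_hold  = ExitCode =!= 0

queue
```

One design note: =!= is the "is not identical to" operator, so the on_exit_hold expression is also true when ExitCode is undefined. As far as I understand, that should cover a job that dies from a signal and has no exit code, but I have not verified it here.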



--Dan