Hi Thomas,
I think that decreasing the values of the configuration variables
MAX_CLAIM_ALIVES_MISSED and ALIVE_INTERVAL will help you.
Details are in the manual:
http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:AliveInterval
http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxClaimAlivesMissed
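As a rough sketch, the change in the local configuration file could look something like the fragment below. The numbers are only illustrative assumptions, not tuned recommendations; please check the manual pages above for the defaults in your version and for which daemons read each setting.

```
# Illustrative values only; pick ones that suit your pool.
# Seconds between keep-alive messages sent for a claim.
ALIVE_INTERVAL = 60
# Number of missed keep-alives after which the claim is considered timed out.
MAX_CLAIM_ALIVES_MISSED = 3
```

After editing the configuration, running condor_reconfig on the affected machines should make the daemons pick up the new values.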
Regards, Lukas
On Tue, Nov 22, 2011 at 01:59:01PM +0000, Thomas Luff wrote:
If a target machine shuts down or crashes whilst a job is running on
it, the job will hang around in the queue with the status
'Running'.
Even if the machine is shut down and left off, the job still acts as
if it's running, and it has been like this for over an hour now.
Is it possible to make these jobs automatically fail or requeue if
the target machine goes down?
Thanks