I'll add one thing: there's a grace period - http://research.cs.wisc.edu/condor/manual/v7.6/3_5Policy_Configuration.html#SECTION00455100000000000000

If you are seeing all jobs restart when you reboot your submit node, there may be a bug. Send along some details and we can have a look at it.

Best,
matt

On 03/21/2012 06:24 AM, Hermann Fuchs wrote:
Thank you very much for your replies. It seems this is exactly what we need.

Thanks a lot,
Hermann

On Wed, 2012-03-21 at 09:51 +0000, Alex Iosup - EWI wrote:

Hermann,

You could try the High Availability Daemons [ http://research.cs.wisc.edu/condor/manual/v7.0/3_10High_Availability.html ]. Section 3.10.1, High Availability of the Job Queue, is what you're looking for. AFAIR, Condor uses hot spares and a fail-over mechanism.

Regards,
Alexandru