Hi, I found that Condor does not reschedule my jobs that were running on nodes which failed because of hardware or power problems. I think there is a way to tell Condor to do this. Can anyone point me to it? Or must I monitor the log file and reschedule the jobs myself?
I also found that if a node shuts down, the jobs running on it are terminated abnormally by signal 9.
I hope the solution can be applied to cluster jobs. Thanks very much!