[HTCondor-users] Faulty node and idle state



Dear all,

I ran into a problem (now solved) with a faulty compute node: the
scheduler had trouble reaching it, yet the node was still able to
confirm acceptance of the job to the central manager, which runs on
another machine.
The job never started and remained in the idle state; according to the
scheduler log, it was resubmitted to the same node again and again for
hours. Hence, I was wondering whether the scheduler / central manager
can be configured to avoid this kind of behaviour, i.e. so that the
scheduler asks the central manager for another node when a job has
stayed idle for a while without starting and it is always the same node
that has responded to the central manager?
HTCondor version is 8.8.15-1

Best regards,

Xavier