[HTCondor-users] Faulty node and idle state
- Date: Fri, 24 Sep 2021 08:59:37 +0200
- From: Xavier OUVRARD <xavier.ouvrard@xxxxxxx>
- Subject: [HTCondor-users] Faulty node and idle state
Dear all,
I ran into a problem (now solved) with a faulty compute node: the
scheduler had trouble reaching it, yet the node was still able to
confirm acceptance of the job to the central manager, which runs on
another machine.
The job stayed in the idle state and, according to the scheduler log,
was resubmitted to the same node over and over for hours. Hence, I was
wondering whether this behaviour can be avoided through the
configuration of the scheduler / central manager, i.e. whether the
scheduler can ask the central manager for another node once a job has
remained idle and unstarted for a while and the same node keeps being
the one that responds to the central manager.
HTCondor version is 8.8.15-1
Best regards,
Xavier