Hi Xavier,

It sounds like you've stumbled on the "HTCondor Black Hole" problem, which has come up several times before. I'm not sure we have a clear solution to it; I think it largely depends on your cluster size and configuration.

One option is to set a START expression on the failing machine:

    STARTD.STATISTICS_TO_PUBLISH_LIST = JobDuration JobBusyTime
    START = RecentJobBusyTimeAvg is Undefined || RecentJobBusyTimeAvg > $(MIN_JOB_TIME)

Then set MIN_JOB_TIME to whatever you consider a reasonable minimum job time, maybe 60 seconds? With this in place, the machine stops accepting new jobs once the average busy time of its recent jobs drops below that threshold.

Another solution is to use a requirements expression, although this can be inefficient in larger pools. There's some information and an example on this page:

https://research.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.CondorHowTo#howto_failing

If you can describe your setup, maybe I'll think of something else. How big is your pool? Why are you keeping that node around at all if it's faulty?

Mark

On Fri, Sep 24, 2021 at 1:59 AM Xavier OUVRARD <xavier.ouvrard@xxxxxxx> wrote:

Dear all,

I encountered a (now solved) problem with a faulty compute node that the scheduler had trouble reaching, but that was still able to confirm acceptance of the job to the central manager, which runs on another machine. The job remained stuck in the idle state, and the scheduler log showed that it was resubmitted to the same node again and again for hours.

Hence, I was wondering whether the scheduler / central manager can be configured to avoid this kind of behaviour, i.e. so that the scheduler asks the central manager for another node once a job has sat idle, unstarted, for a while and the same node keeps responding to the central manager?
HTCondor version is 8.8.15-1

Best regards,
Xavier

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
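
P.S. To make the requirements option mentioned above a bit more concrete, here is a minimal sketch of a submit description that steers a job away from one known-bad machine. The executable name and the hostname are placeholders I made up for illustration, not anything from your pool, and this only helps once you already know which node is misbehaving.

```
# Hypothetical submit description: my_job.sh and the hostname are placeholders.
executable   = my_job.sh

# Refuse to match any slot on the known-faulty machine.
# Machine is the fully qualified hostname advertised in the slot's ClassAd.
requirements = (TARGET.Machine != "faulty-node.example.com")

queue
```

Every job has to carry this expression, which is part of why the approach can become unwieldy in larger pools; the START expression lives on the machine itself and needs no change on the submit side.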