Hi Xavier,
Without additional information, it's hard to say what was
happening. One execute node being down shouldn't cause jobs
to idle in the queue; they would simply match to one of the
other execute nodes (provided those nodes satisfy the job's requirements).
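To illustrate the point, here is a deliberately simplified toy model of matchmaking. It is not how HTCondor is implemented: real matching evaluates ClassAd Requirements/Rank expressions, and the Slot/Job classes, the `fits` check and the `matchmake` function below are all hypothetical. The idea it shows is that an offline node is just skipped, and a job only stays idle when no remaining online slot fits it.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified model: real HTCondor matches ClassAds,
# not Python objects. Only CPU and memory requests are considered here.

@dataclass
class Slot:
    node: str
    cpus: int
    memory_mb: int
    online: bool = True
    claimed: bool = False

@dataclass
class Job:
    job_id: str
    request_cpus: int
    request_memory_mb: int

    def fits(self, slot: Slot) -> bool:
        # Stand-in for the job's Requirements expression.
        return (slot.cpus >= self.request_cpus
                and slot.memory_mb >= self.request_memory_mb)

def matchmake(jobs, slots):
    """Return {job_id: node or None}; None means the job stays idle."""
    assignments = {}
    for job in jobs:
        # Offline or already-claimed slots are skipped, not waited on.
        match = next(
            (s for s in slots if s.online and not s.claimed and job.fits(s)),
            None,
        )
        if match is not None:
            match.claimed = True
            assignments[job.job_id] = match.node
        else:
            assignments[job.job_id] = None
    return assignments

slots = [
    Slot("node1", cpus=4, memory_mb=8192, online=False),  # the faulty node
    Slot("node2", cpus=4, memory_mb=8192),
    Slot("node3", cpus=2, memory_mb=4096),
]
jobs = [
    Job("1.0", request_cpus=4, request_memory_mb=4096),
    Job("2.0", request_cpus=2, request_memory_mb=2048),
    Job("3.0", request_cpus=4, request_memory_mb=8192),
]
result = matchmake(jobs, slots)
print(result)  # {'1.0': 'node2', '2.0': 'node3', '3.0': None}
```

In this toy run, job 3.0 stays idle only because the one other slot big enough for it is already claimed, not because node1 is down. On a real pool, `condor_q -better-analyze` on an idle job is the usual way to see why it isn't matching.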
Can you post the job log from one of the stuck jobs somewhere?
Perhaps that will give us more information.
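If the job log is long, a quick way to skim it is to list the event sequence per job. The sketch below uses only the standard library and assumes the usual event header layout `NNN (cluster.proc.subproc) date time text`, with each event terminated by a `...` line. It only names a handful of common event codes, and the sample log is invented for illustration. The HTCondor Python bindings (`htcondor.JobEventLog`) would be a sturdier way to read real logs where they are available.

```python
import re
from collections import defaultdict

# A few common user-log event codes; anything else is shown as "event NNN".
EVENT_NAMES = {
    0: "submitted",
    1: "executing",
    4: "evicted",
    5: "terminated",
    7: "shadow exception",
    9: "aborted",
    12: "held",
    13: "released",
    22: "disconnected",
    23: "reconnected",
    24: "reconnect failed",
}

# Matches an event header such as "000 (012.000.000) 04/07 05:00:00 ...".
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")

def event_history(log_text):
    """Map each "cluster.proc" id to the event names seen, in log order."""
    history = defaultdict(list)
    for line in log_text.splitlines():
        m = HEADER.match(line)
        if m:
            code = int(m.group(1))
            job_id = f"{int(m.group(2))}.{int(m.group(3))}"
            history[job_id].append(EVENT_NAMES.get(code, f"event {code:03d}"))
    return dict(history)

# Invented sample log for illustration only.
SAMPLE_LOG = """\
000 (012.000.000) 04/07 05:00:00 Job submitted from host: <10.0.0.1:9618>
...
001 (012.000.000) 04/07 05:01:10 Job executing on host: <10.0.0.2:9618>
...
007 (012.000.000) 04/07 05:20:42 Shadow exception!
...
000 (013.000.000) 04/07 05:00:02 Job submitted from host: <10.0.0.1:9618>
...
"""

for job_id, events in event_history(SAMPLE_LOG).items():
    print(job_id, "->", ", ".join(events))
# 12.0 -> submitted, executing, shadow exception
# 13.0 -> submitted
```

A job that never got past "submitted", or that keeps cycling through executing and shadow exception, would point in quite different directions.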
Thanks,
-Mat
On 4/7/21 5:00 AM, Xavier OUVRARD wrote:
Dear all,
Since yesterday I have had 6 jobs sitting idle on a scheduler. One
computation node was faulty, and I kept seeing "attempt to connect to ..."
messages in the SchedLog; it seems this was blocking all the remaining
jobs in the condor_q. Rebooting the faulty node (not the scheduler)
allowed all the remaining idle jobs to run without any additional
intervention.
Is this normal behaviour?
The condor version is 8.8.13 on all machines.
Best regards,
Xavier
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/