Condor 6.8.5
Occasionally, some sort of lock-up occurs in my cluster. The
symptoms are:
- condor_status hangs indefinitely
- condor_q hangs for about a minute and returns 'Failed to fetch ads
  from: <... : 9683> : ..'
- condor_restart -subsystem schedd hangs
  - I tried this after looking through the condor-users mail
- the condor processes are still running (although with no apparent activity)
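
The hanging commands above can be detected from a script by running them with a timeout. This is only a minimal sketch in Python: the helper name run_with_timeout and the 60-second limit are my own assumptions, and it only tells a hang apart from a normal failure, it does not diagnose the cause.

```python
import subprocess

def run_with_timeout(cmd, timeout_s=60):
    """Run cmd and classify the result.

    Returns (status, output), where status is one of:
      'ok'      - the command exited with code 0
      'failed'  - the command exited with a non-zero code
      'hung'    - the command did not finish within timeout_s seconds
      'missing' - the executable could not be found
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this
        return "hung", ""
    except FileNotFoundError:
        return "missing", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

With the symptoms described above, I would expect run_with_timeout(["condor_status"]) to come back as 'hung' and run_with_timeout(["condor_q"], timeout_s=120) to come back as 'failed', assuming condor_q exits with a non-zero code after its 'Failed to fetch ads' message.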
Logs:
- MasterLog shows normal activity
- NegotiatorLog seems to have stopped reporting
  - normally it writes messages every 5 minutes
  - the last message was "Getting all public ads ..."
- SchedLog reports 'Called reschedule_negotiator()' as its last message
  - a condor_submit_dag had been run in the same time frame
  - normally, the next message is "Activity on stashed negotiator
    socket"
- StartLog has nothing special (although the file is still being touched)
  - the only other file still being touched is MasterLog
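
The "which logs are still being touched" check above can be automated by comparing each log's modification time against a threshold. This is a rough sketch in Python: the function name stale_logs is my own, the log directory has to be supplied by the caller (it depends on the local configuration), and the 600-second threshold in the usage note is simply twice the 5-minute interval mentioned above.

```python
import os
import time

def stale_logs(log_dir, max_age_s, names, now=None):
    """Return {name: age_in_seconds} for logs not modified within max_age_s.

    log_dir   - directory holding the daemon logs (site-specific)
    max_age_s - a log older than this many seconds counts as stale
    names     - log file names to check, e.g. "NegotiatorLog"
    now       - reference time in epoch seconds; defaults to time.time()
    """
    now = time.time() if now is None else now
    stale = {}
    for name in names:
        path = os.path.join(log_dir, name)
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            # log missing or unreadable; this sketch simply skips it
            continue
        if age > max_age_s:
            stale[name] = age
    return stale
```

Called as stale_logs(log_dir, 600, ["MasterLog", "NegotiatorLog", "SchedLog", "StartLog"]), it would presumably report NegotiatorLog and SchedLog in the situation described here, since only MasterLog and StartLog are still being touched.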
My conclusion would be that the negotiator is somehow stuck.
Any ideas?
Thank you,
Andy Pleat