On Jul 28, 2017, at 4:27 AM, Justin Fisher <justin0419@xxxxxxxxx> wrote:
I occasionally get this error. 192.168.1.206 is the machine I use to submit the jobs. I think it's some kind of network issue, but I'm not sure. My workaround is to reboot the submit machine, but is there a less drastic method?
I can ping all the other machines on the network and the NFS shares needed for Condor are all there.
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <192.168.1.206:9618?addrs=192.168.1.206-9618+[--1]-9618&noUDP&sock=1870_3e31_4
This looks like an error message that condor_submit prints. When this error occurs, does it happen every time, or does condor_submit still work sometimes? Do other commands that talk to the schedd (e.g. condor_q, condor_rm) also fail?
You say you can ping all of the other machines on the network. Can you ping this machine (192.168.1.206) when the errors occur? If the machine is otherwise healthy, you can try restarting just the HTCondor daemons.
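A successful ping only shows that the host answers ICMP; it does not show that anything is accepting TCP connections on the port named in the error (9618). As a rough check, assuming Python is available on the submit machine, a small script like the following could be run while the error is occurring. The function name can_connect and the 3-second timeout are illustrative choices, not part of HTCondor.

```python
import socket


def can_connect(host, port=9618, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    9618 is the port shown in the error message; the timeout is an
    arbitrary illustrative value.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example use on the submit machine (address taken from the error message):
#   print(can_connect("192.168.1.206"))
```

If this returns False while ping still succeeds, the problem is more likely with the daemons on that port than with the network itself, and restarting only HTCondor (for example with condor_restart) is a reasonable next step before rebooting.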
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/