Hello,
I've fixed the issue. For some reason the /var/lock/condor directory had been re-owned to root (both user and group). Changing it back to the condor user/group and restarting was enough for the nodes to register in the pool.

Thanks,
Iain

From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Iain Bradford Steers [iain.steers@xxxxxxx]
Sent: 10 March 2015 08:45
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] SharedPortEndpoint: failed to bind to /var/lock/condor/daemon_sock/25689_90ae: Permission denied

Hi,
I noticed some of my worker nodes never showed up in condor_status after creating them. Doing a pstree on the nodes shows that startd wasn't running. I attempted to start it and encountered the following situation.

~]# condor_startd
03/10/15 08:38:05 Can't open "/var/log/condor/StartLog"
ERROR "Cannot open log file '/var/log/condor/StartLog'" at line 208 in file /slots/01/dir_21000/userdir/src/condor_utils/dprintf_setup.cpp

So I temporarily renamed the file and I'm now getting the following in the StartLog.

03/10/15 08:24:38 ERROR: SharedPortEndpoint: failed to bind to /var/lock/condor/daemon_sock/25689_90ae: Permission denied
03/10/15 08:24:38 ERROR "Failed to start local listener (USE_SHARED_PORT=true)" at line 2897 in file /slots/01/dir_21000/userdir/src/condor_daemon_core.V6/daemon_core.cpp

I'm using Puppet to configure htcondor so it doesn't appear to be a differing config between successful worker nodes and this.

Regards,
Iain
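
For anyone hitting the same "Permission denied" on the daemon_sock path, the ownership check and fix described in the reply at the top could be sketched roughly as below. This is only an illustration, not the exact commands that were run: the path /var/lock/condor comes from the thread, while the expected owner "condor:condor", the helper names (owner_of, owned_by, fix_lock_dir) and the use of GNU stat are assumptions.

```shell
#!/bin/sh
# Sketch: report and, if needed, repair ownership of the HTCondor lock directory.
# LOCK_DIR is the path from the thread; WANT ("condor:condor") is an assumption
# about the account HTCondor runs as on this system.

LOCK_DIR="${LOCK_DIR:-/var/lock/condor}"
WANT="${WANT:-condor:condor}"

# Print "user:group" for a path (relies on GNU coreutils stat).
owner_of() {
    stat -c '%U:%G' "$1"
}

# Succeed if the path is already owned by the given user:group.
owned_by() {
    [ "$(owner_of "$1")" = "$2" ]
}

# Change ownership recursively, but only when it differs from WANT.
fix_lock_dir() {
    if owned_by "$LOCK_DIR" "$WANT"; then
        echo "$LOCK_DIR already owned by $WANT"
    else
        echo "$LOCK_DIR is owned by $(owner_of "$LOCK_DIR"), changing to $WANT"
        chown -R "$WANT" "$LOCK_DIR"
    fi
}
```

The script only defines functions, so nothing changes until fix_lock_dir is called (as root). HTCondor then needs a restart, as in the reply above; on an init-script based install that is probably something like "service condor restart", but the exact command depends on the system.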