Where did you get that address? When the schedd is behind a shared port, its address will include a shared port id using the keyword sock=<port-id>, something like this:

<172.1.3.3:9618?addrs=172.1.3.3-9618&noUDP&sock=5044_80fc_5>

-tj

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx]
On Behalf Of robert smith via HTCondor-users

Hi,

I can't get condor_submit -addr to work when condor_schedd is behind a condor_shared_port.

Output from condor_submit is below:

sh-4.2$ condor_submit -debug -addr "<172.1.3.3:9618>" job.sub
Submitting job(s)07/04/18 10:06:49 condor_read() failed: recv(fd=4) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from schedd at <172.1.3.3:9618>.
07/04/18 10:06:49 IO: Failed to read packet header
07/04/18 10:06:49 SECMAN: no classad from server, failing
ERROR: Failed to connect to local queue manager
SECMAN:2007:Failed to end classad message.

The error message written to /var/log/condor/SharedPortLog in the schedd container is:

07/04/18 10:06:49 SharedPortServer: server was busy, failed to connect collector as requested by <172.1.3.3:46528>: primary (7d2cc1f5fc7f6a4e2eb39facb9bb27877fdd809e4b7fa28fd830cd99c77172ee/collector): Connection refused (111); alt (/var/lock/condor/daemon_sock/collector): Connection refused (111)

Nothing is written to /var/log/condor/SchedLog.

Why is condor_submit even trying to access the collector when -addr is meant to tell it to connect straight to the schedd? Is there a bug in condor_submit that means it asks the shared_port_daemon to connect to the collector, not the schedd, even when the -addr option is set?

Everything works fine when the schedd isn't running behind a condor_shared_port, so I've worked round this issue by simply not using a shared port.

Relevant versions are:

sh-4.2$ condor_version
$CondorVersion: 8.6.11 May 10 2018 BuildID: 440910 $
$CondorPlatform: x86_64_RedHat7 $

Relevant files are:

sh-4.2$ cat job.sub
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Universe = vanilla
Executable = /bin/bash
Arguments = test.sh
Log = job.log
Output = job.out
Error = job.error
transfer_input_files = test.sh
Queue
sh-4.2$
sh-4.2$
sh-4.2$ cat test.sh
echo Starting test.sh
whoami
id
hostname
/usr/sbin/ip a
echo Ending test.sh
sh-4.2$

I'm running HTCondor in a container on Kubernetes, but doubt that is relevant to this problem.

Thanks,
Rob
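
For anyone checking their own address strings against the format tj describes at the top of this thread, here is a minimal sketch in Python. The function name shared_port_id is hypothetical, and the parsing rule (parameters follow the first "?" and are separated by "&") is an assumption read off the example address above; this is not HTCondor's own parser.

```python
def shared_port_id(address):
    """Return the sock=<port-id> value from an address string, or None if absent.

    Assumption: parameters follow the first '?' and are separated by '&',
    as in the example address quoted in this thread.
    """
    inner = address.strip()
    # Drop the surrounding angle brackets if present.
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    # Split "host:port" from the parameter list.
    _, _, params = inner.partition("?")
    for item in params.split("&"):
        key, _, value = item.partition("=")
        if key == "sock" and value:
            return value
    return None


# The two addresses that appear in this thread.
with_id = "<172.1.3.3:9618?addrs=172.1.3.3-9618&noUDP&sock=5044_80fc_5>"
bare = "<172.1.3.3:9618>"

print(shared_port_id(with_id))
print(shared_port_id(bare))
```

The address passed to condor_submit -addr in the original message is the bare form, so the check finds no shared port id in it, which is the point of tj's question.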