
Re: [HTCondor-users] Condor_ssh_to_job not working with Shared Port across WAN



Hi Todd,

Thanks for the answer. But I thought there had been some rework on condor_ssh_to_job, because I still cannot get it to work over the WAN (as root or non-root):

[1032] alarmstr@uclhc-1 ~$ condor_ssh_to_job -debug 856033.29
10/12/18 10:32:56 SharedPortClient: sent connection request to schedd at <192.5.19.13:9615> for shared port id 1256425_f007_4
10/12/18 10:32:56 SharedPortClient: sent connection request to local schedd for shared port id 1256425_f007_4
10/12/18 10:32:56 Response for GET_JOB_CONNECT_INFO:
StarterIpAddr = "<169.228.131.243:39092?CCBID=169.228.130.106:9622%3faddrs%3d169.228.130.106-9622+[--1]-9622#1251&PrivNet=cabinet-0-0-11.t2.ucsd.edu&addrs=169.228.131.243-39092+[--1]-39092&noUDP>"
Result = true
ServerTime = 1539365576
CondorVersion = "$CondorVersion: 8.6.12 Jul 31 2018 BuildID: 446077 $"

10/12/18 10:32:56 Got connect info for starter <169.228.131.243:39092?CCBID=169.228.130.106:9622%3faddrs%3d169.228.130.106-9622+[--1]-9622#1251&PrivNet=cabinet-0-0-11.t2.ucsd.edu&addrs=169.228.131.243-39092+[--1]-39092&noUDP>
10/12/18 10:32:56 No shared_port cookie available; will fall back to using on-disk $(DAEMON_SOCKET_DIR)
10/12/18 10:32:56 No shared_port cookie available; will fall back to using on-disk $(DAEMON_SOCKET_DIR)
10/12/18 10:32:56 Executing ssh command: ssh -oUser=cuser2 -oIdentityFile=/tmp/alarmstr.condor_ssh_to_job_8d60cdd6/ssh_key -oStrictHostKeyChecking=yes -oUserKnownHostsFile=/tmp/alarmstr.condor_ssh_to_job_8d60cdd6/known_hosts -oGlobalKnownHostsFile=/tmp/alarmstr.condor_ssh_to_job_8d60cdd6/known_hosts -oProxyCommand="condor_ssh_to_job"' '"-debug"' '"-proxy"' '"/tmp/alarmstr.condor_ssh_to_job_8d60cdd6/fdpass" condor-job.cabinet-0-0-11.t2.ucsd.edu
10/12/18 10:32:56 Passed ssh connection to ssh proxy.
10/12/18 10:32:56 Setting up ssh proxy on file descriptor 4
ssh_exchange_identification: Connection closed by remote host
10/12/18 10:32:57 Attempting to remove /tmp/alarmstr.condor_ssh_to_job_8d60cdd6 as unknown user
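The `StarterIpAddr` value in the log above is an HTCondor address ("sinful") string. The Python sketch below splits it into host, port and parameters. It assumes the `<host:port?key=value&flag>` layout seen in the log, URL-encoded parameter values, a single CCB broker in `CCBID` written as `broker#id`, and that a shared-port ID would show up as a `sock=` parameter. `parse_sinful` and `ccb_route` are hypothetical helpers for illustration, not part of any HTCondor API. Run on the starter address, it shows no shared-port ID and a CCB route through the broker at 169.228.130.106:9622.

```python
from urllib.parse import unquote


def parse_sinful(sinful: str) -> dict:
    """Split an HTCondor-style address string '<host:port?k=v&flag>' into parts.

    Assumptions (hypothetical helper): parameters are '&'-separated, values
    are URL-encoded, and a bare token such as 'noUDP' is a flag with no value.
    """
    s = sinful.strip()
    if not (s.startswith("<") and s.endswith(">")):
        raise ValueError("not a sinful string: " + sinful)
    s = s[1:-1]
    # Nested '?' inside CCBID is encoded as %3f, so the first literal '?' ends host:port.
    hostport, _, query = s.partition("?")
    host, _, port = hostport.rpartition(":")
    params = {}
    for item in query.split("&") if query else []:
        key, sep, value = item.partition("=")
        # unquote() leaves '+' alone, so the '+'-separated addrs list survives.
        params[key] = unquote(value) if sep else True
    return {"host": host, "port": int(port), "params": params}


def ccb_route(params: dict):
    """Return (broker_address, ccb_id) if a CCBID is present, else None.

    Assumes a single broker written as 'broker_sinful#id'.
    """
    ccbid = params.get("CCBID")
    if not isinstance(ccbid, str):
        return None
    broker, _, ccb_id = ccbid.rpartition("#")
    return broker.split("?", 1)[0], ccb_id


if __name__ == "__main__":
    starter = parse_sinful(
        "<169.228.131.243:39092?CCBID=169.228.130.106:9622%3faddrs%3d"
        "169.228.130.106-9622+[--1]-9622#1251&PrivNet=cabinet-0-0-11.t2.ucsd.edu"
        "&addrs=169.228.131.243-39092+[--1]-39092&noUDP>"
    )
    print(starter["host"], starter["port"])  # 169.228.131.243 39092
    print(ccb_route(starter["params"]))      # ('169.228.130.106:9622', '1251')
    print("sock" in starter["params"])       # False: no shared-port ID advertised
```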

So I can run condor_tail.


Edgar M Fajardo Hernandez



On Oct 11, 2018, at 3:23 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

On 10/11/2018 3:48 PM, Edgar M Fajardo Hernandez wrote:
It seems to me that the starter is not aware that the submit host is
behind the shared port, since it is trying to connect back to it on
ephemeral ports rather than on shared port 9615.

condor_tail shows a similar error:

[1327] dantrim@uclhc-1 ~$ condor_tail -debug 856006.142
10/10/18 13:27:22 Requesting GoAhead from the transfer queue manager.
10/10/18 13:27:22 Received GoAhead from the transfer queue manager.
10/10/18 13:27:22 CCBClient: received failure message from CCB server
collector 169.228.130.106:9647?addrs=169.228.130.106-9647+[--1]-9647
in response to request for reversed connection to starter at
<169.228.132.166:2574>: failed to connect
10/10/18 13:27:22 Failed to reverse connect to starter at
<169.228.132.166:2574> via CCB.
Failed to peek at file from starter: Failed to connect to starter

However, it works when I run it as root:

[snip]

Any ideas on what to try?



Yep.

My guess is you are encountering the same issue as back in Feb.

Refer to

 https://lists.cs.wisc.edu/archive/htcondor-users/2018-February/msg00104.shtml

for solutions.

Best regards,
Todd