Re: [HTCondor-devel] condor_ssh_to_job is cool, but DNS would be cooler


Date: Fri, 22 Mar 2013 14:26:31 -0500
From: Erik Paulson <epaulson@xxxxxxxxxxxx>
Subject: Re: [HTCondor-devel] condor_ssh_to_job is cool, but DNS would be cooler


On Thu, Mar 21, 2013 at 4:53 PM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:

On 3/21/13 4:01 PM, Erik Paulson wrote:

I will report back. (After building a more modern version of Python for submit-1.chtc)

We may have a slight plumbing problem here.  ssh_to_job wants to set up the connection, and keys, and then invoke the ssh client.  Now for the convoluted part: the command-line options ssh_to_job passes to the ssh client by default contain a ProxyCommand option, which invokes ssh_to_job.  When this inner ssh_to_job runs, it gets the file descriptor of the connection passed to it from the outer ssh_to_job, and it then proxies this connection via its stdin/stdout for the ssh client.

It sounds like what you want is to dispense with the top level ssh_to_job and just have the inner one set up the connection and do the proxying.  But how will your ssh know to use the key that ssh_to_job sets up for the connection?  Also, how will it know which username to try to login as?  These things are normally passed from ssh_to_job as options to the ssh client that it launches.

Anyway, once those questions are answered, here is a way to trick ssh_to_job into forming the connection and proxying it for an outer ssh client:

ssh -oProxyCommand='/usr/bin/condor_ssh_to_job -ssh "/bin/sh -c %%x"  jobid' whatever

Instead of launching an ssh client, it launches itself in proxy mode.  (That's what the %%x expands to.  I had to double the % to get it to pass through the outer ssh's parser for ProxyCommand.  Your case may differ.)  Clearly, if this were something we wanted to support for real, we could make the outer ssh_to_job do the proxying directly, rather than having it invoke a second copy of itself to do it.

If you try the above example, you will find that you can't log in, because the outer ssh doesn't have the right key, and it probably isn't logging in as the right user.


Right. I kind of got it to work, but it would be pretty hacky to script. You can sort of hack around it by watching the debug output of condor_ssh_to_job:

03/22/13 14:05:27 Executing ssh command: /bin/sh -c "/usr/bin/condor_ssh_to_job"' '"-debug"' '"-proxy"' '"/tmp/epaulson.condor_ssh_to_job_e93ccc2f/fdpass" -oUser=slot1 -oIdentityFile=/tmp/epaulson.condor_ssh_to_job_e93ccc2f/ssh_key -oStrictHostKeyChecking=yes -oUserKnownHostsFile=/tmp/epaulson.condor_ssh_to_job_e93ccc2f/known_hosts -oGlobalKnownHostsFile=/tmp/epaulson.condor_ssh_to_job_e93ccc2f/known_hosts -oProxyCommand="/usr/bin/condor_ssh_to_job"' '"-debug"' '"-proxy"' '"/tmp/epaulson.condor_ssh_to_job_e93ccc2f/fdpass" condor-job.e103.chtc.wisc.edu
03/22/13 14:05:27 OpSysMajorVersion:  5 
03/22/13 14:05:27 OpSysShortName:  SL 
03/22/13 14:05:27 OpSysLongName:  Scientific Linux SL release 5.7 (Boron) 
03/22/13 14:05:27 OpSysAndVer:  SL5 
03/22/13 14:05:27 OpSysLegacy:  LINUX 
03/22/13 14:05:27 OpSysName:  SL 
03/22/13 14:05:27 OpSysVer:  507 
03/22/13 14:05:27 OpSys:  LINUX 
03/22/13 14:05:27 Using IDs: 12 processors, 12 CPUs, 0 HTs
03/22/13 14:05:27 Reading condor configuration from '/etc/condor/condor_config'
03/22/13 14:05:27 Enumerating interfaces: lo 127.0.0.1 up
03/22/13 14:05:27 Enumerating interfaces: bond0 128.104.100.43 up
03/22/13 14:05:27 Enumerating interfaces: virbr0 192.168.122.1 up
03/22/13 14:05:27 Disabling ConvertDefaultIPToSocketIP() because NETWORK_INTERFACE does not match multiple IPs.
03/22/13 14:05:27 Setting up ssh proxy on file descriptor 4
03/22/13 14:05:27 Passed ssh connection to ssh proxy.
debug1: Remote protocol version 2.0, remote software version OpenSSH_4.3
debug1: match: OpenSSH_4.3 pat OpenSSH*

The outer ssh pauses at this point waiting for me to accept the private key - I could also run it as
ssh -oProxyCommand='/usr/bin/condor_ssh_to_job -debug -ssh "/bin/sh -c %%x"  22413109' -oPreferredAuthentications=keyboard-interactive,password,publickey -v -l slot1 22431093 
and get a pause

While the outer ssh is paused, you can grab the ssh_key that condor_ssh_to_job writes out under /tmp and copy it to ~/.ssh/id_rsa. The outer ssh doesn't check for the key at startup, so a later authentication step will still find it.

I think you're screwed on the 'which user' bit: you invoke the outer ssh without knowing what to specify as -l on the command line. You can discover that by looking at the debug output as well, but by then it's too late to get it onto the command line. (The same goes for the pre-populated known_hosts file.)
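For what it's worth, scraping those values out of the debug output could look roughly like the Python sketch below. It assumes the "Executing ssh command:" line keeps the shape shown in the log above. The function name, the `WANTED` tuple, and the regexes are illustrative and are not part of HTCondor.

```python
import re

# ssh options worth recovering for an outer ssh invocation (illustrative choice).
WANTED = ("User", "IdentityFile", "UserKnownHostsFile")


def parse_ssh_command_line(log_line):
    """Pull login details out of condor_ssh_to_job's -debug output.

    ASSUMPTION: the line looks like the "Executing ssh command:" sample
    in this thread; other HTCondor versions may format it differently.
    Returns None for any other log line.
    """
    if "Executing ssh command:" not in log_line:
        return None
    info = {}
    # Simple -oKey=Value tokens; keep the first occurrence of each wanted key.
    for key, value in re.findall(r"-o(\w+)=(\S+)", log_line):
        if key in WANTED and key not in info:
            info[key] = value
    # The fd-passing socket path follows the (shell-quoted) -proxy argument.
    match = re.search(r'-proxy\W+(/[^"\s]+)', log_line)
    if match:
        info["fdpass"] = match.group(1)
    return info
```

This only recovers the values after the fact, so the timing problem remains: the outer ssh has already been started by the time the line is printed.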

Hacking condor_ssh_to_job so that it doesn't invoke itself, but instead just runs the proxy and spits out the necessary info for a second copy to be invoked, seems like it would work, i.e.:

condor_ssh_to_job -createProxy -outputFile=/path/to/results/dictionaryfile jobid &
condor_ssh_to_job -proxy dictionaryfile.fdpath -l dictionaryfile.username -oIdentityFile=dictionaryfile.ssh_key -oUserKnownHostsFile=dictionaryfile.knownhost

I think the second condor_ssh_to_job can be a simple program that just reads the fd over the domain socket and spits data in and out - if nc knew how to read that first fd over the domain socket it'd be done. 
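A minimal sketch of what such a helper might look like, in Python 3.9+ on Unix. It rests on assumptions: that the fdpass path is a Unix domain socket the helper connects to, and that the peer hands over the connection as SCM_RIGHTS ancillary data along with at least one ordinary byte. The real handshake condor_ssh_to_job uses isn't shown in this thread, so which side listens and what gets exchanged are guesses. `receive_fd`, `write_all`, and `pump` are hypothetical names.

```python
import os
import select
import socket


def receive_fd(sock_path):
    """Connect to a Unix domain socket and receive one passed descriptor.

    ASSUMPTION: the peer listens on sock_path and sends the descriptor
    via SCM_RIGHTS; this is not a documented HTCondor protocol.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        _data, fds, _flags, _addr = socket.recv_fds(s, 1024, 1)
    if not fds:
        raise RuntimeError("peer closed without passing a descriptor")
    return fds[0]


def write_all(fd, data):
    """os.write may write only part of the buffer, so loop until done."""
    while data:
        data = data[os.write(fd, data):]


def pump(conn, in_fd=0, out_fd=1):
    """Shuttle bytes in_fd -> conn and conn -> out_fd until conn hits EOF."""
    stdin_open = True
    while True:
        watch = [conn] + ([in_fd] if stdin_open else [])
        ready, _, _ = select.select(watch, [], [])
        if in_fd in ready:
            chunk = os.read(in_fd, 65536)
            if chunk:
                conn.sendall(chunk)
            else:
                # Our input is done; tell the remote side but keep reading.
                stdin_open = False
                conn.shutdown(socket.SHUT_WR)
        if conn in ready:
            chunk = conn.recv(65536)
            if not chunk:
                return
            write_all(out_fd, chunk)
```

Used as a ProxyCommand, it would boil down to `pump(socket.socket(fileno=receive_fd(path)))`, with ssh talking to the helper's stdin and stdout.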

-Erik

