Hi, I am submitting MPI jobs (parallel universe, using MPICH 1.2.4).
I have set up a separate user (something like “condor-user”)
to run Condor jobs on all the dedicated nodes. I created the certificates and
copied them to all the nodes. So the user (“condor-user”) can ssh without a password,
within all the nodes and to its own node. But the job fails, complaining that the connection
was refused by the same machine, i.e. the job runs on Machine A but cannot
connect to Machine A. Here is the error from one of the nodes:

connect to address xxx.xx.xxx.xx: Connection refused
connect to address xxx.xx.xxx.xx: Connection refused
trying normal rsh (/usr/bin/rsh)
MachineA: Connection refused

By default Condor tries ssh, right? And then it tries rsh,
right? I ask because the machine does not allow rsh. Could you please let me know what might be the problem?

Thanks,
Senthil