On Tue, Feb 01, 2005 at 12:42:23PM +0100, Tobias Edler wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello everybody !
>
> I try to set up condor to use MPI, so i installed mpi to /usr/global/mpi
> and linked mpirun to /usr/bin/
>
> no matter where i have mpi installed or not , whenever i try run the
> simplempi from the users manual, all i get is this output:
>
> p0_2957: p4_error: Child process exited while making connection to
> remote process on c029.cip.physik.local: 0
> p0_2957: (6.333597) net_send: could not write to fd=4, errno = 32
>
> As far as i understand, condor uses /home/condor/condor/sbin/rsh to
> start the job, right ? this doesn't work, as for security reasons, rsh is
> not allowed here.
You'll note that /home/condor/condor/sbin/rsh is not really rsh; it's
just named rsh. It does not have the security problems of the Berkeley
rsh.
> So i set up ssh :
>
> bash-2.05b$ whoami
> condor
> bash-2.05b$ ssh c029 date
> Tue Feb 1 12:41:20 CET 2005
>
> and linked it there, but this didn't help either.
>
That was your mistake. Put the Condor program named 'rsh' back.
> So
> a) how do i tell condor where to look for mpi
> b) how do i tell condor to use ssh ?
>
a) You don't need to.
b) You can't.
Link your job with MPICH 1.2.4 for the ch_p4 device. Condor does not
need any MPI runtime support (we don't use mpirun).
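
For illustration, a minimal submit description file for such a job might
look roughly like the sketch below. It assumes the executable is the
simplempi example already linked against MPICH 1.2.4 (ch_p4), and the
node count and the log, output, and error file names are placeholders
chosen for this example, not values from your setup. Check the MPI
section of the manual for your Condor version before relying on it.

```
# Sketch only: file names and node count are assumed placeholders.
universe      = MPI
executable    = simplempi
machine_count = 2
log           = simplempi.log
output        = simplempi.out.$(NODE)
error         = simplempi.err.$(NODE)
queue
```

Note that nothing in the file points at mpirun or at an MPI install
directory; the job is handed to condor_submit like any other.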
-Erik