OK, I did that, but there seems to be a problem. /home is shared among the nodes.
[mahmood@rocks7 ~]$ which mpirun
/opt/openmpi/bin/mpirun
[mahmood@rocks7 ~]$ grep MPDIR openmpiscript
# $MPDIR points to the location of the OpenMPI install
MPDIR=/opt/openmpi
MPDIR=$(condor_config_val OPENMPI_INSTALL_PATH)
# If MPDIR is not set, then use a default value
if [ -z $MPDIR ]; then
echo "WARNING: Using default value for \$MPDIR in openmpiscript"
MPDIR=/usr/lib64/openmpi
PATH=$MPDIR/bin:.:$PATH
mpirun -v --prefix $MPDIR --mca $mca_ssh_agent $CONDOR_SSH -n $_CONDOR_NPROCS -hostfile machines $EXECUTABLE $@ &
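Putting the grep hits back in context, I think the MPDIR block effectively reads as below. This is a reconstruction, not a paste: the closing fi is implied, and the quotes around "$MPDIR" in the test are my addition (the script itself has [ -z $MPDIR ], as grep shows). Note that the condor_config_val line overwrites my hardcoded /opt/openmpi, so if OPENMPI_INSTALL_PATH is undefined the script falls back to /usr/lib64/openmpi:

  # $MPDIR points to the location of the OpenMPI install
  MPDIR=/opt/openmpi                                # my edit, overwritten below
  MPDIR=$(condor_config_val OPENMPI_INSTALL_PATH)   # wins even when undefined
  # If MPDIR is not set, then use a default value
  if [ -z "$MPDIR" ]; then
      echo "WARNING: Using default value for \$MPDIR in openmpiscript"
      MPDIR=/usr/lib64/openmpi
  fi
  PATH=$MPDIR/bin:.:$PATH

To see what the lookup actually returns here, I suppose one can run:

  condor_config_val OPENMPI_INSTALL_PATH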
[mahmood@rocks7 ~]$ cat mpi.ht
universe = parallel
executable = openmpiscript
arguments = mpihello
log = hellompi.log
output = hellompi.out
error = hellompi.err
machine_count = 2
queue
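If I remember the manual's example submit file correctly, it also enables file transfer. I left those lines out because /home is shared, but for completeness it would be something like:

  should_transfer_files = yes
  when_to_transfer_output = on_exit
  transfer_input_files = mpihello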
[mahmood@rocks7 ~]$ condor_submit mpi.ht
Submitting job(s).
1 job(s) submitted to cluster 13.
[mahmood@rocks7 ~]$ cat hellompi.err
Not defined: MOUNT_UNDER_SCRATCH
Not defined: MOUNT_UNDER_SCRATCH
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[compute-0-1.local:9520] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[compute-0-1.local:9521] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[compute-0-1.local:09228] [[38005,0],0]->[[38005,0],2] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 15]
[compute-0-1.local:09228] [[38005,0],0]->[[38005,0],2] mca_oob_tcp_peer_send_handler: unable to send message ON SOCKET 15
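I assume the "Not defined: MOUNT_UNDER_SCRATCH" lines come from a condor_config_val lookup inside openmpiscript, so defining that knob in the execute nodes' configuration should at least silence the warning. The value below is only a guess at something sensible:

  # condor_config.local on the execute nodes (value is a guess)
  MOUNT_UNDER_SCRATCH = /tmp

Whether this is related to the MPI_Init abort, I cannot tell.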
[mahmood@rocks7 ~]$ cat hellompi.out
WARNING: MOUNT_UNDER_SCRATCH not set in condor_config
WARNING: MOUNT_UNDER_SCRATCH not set in condor_config
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[38005,1],0]
Exit code: 1
--------------------------------------------------------------------------
[mahmood@rocks7 ~]$
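As a sanity check outside of Condor, I guess the binary can be launched with mpirun directly. The paths below are assumed (mpihello sits in my shared home), and compute-0-1 is listed twice only to get two slots on the one node the logs mention:

  /opt/openmpi/bin/mpirun -np 2 -host compute-0-1,compute-0-1 /home/mahmood/mpihello

If that prints the hello lines, the OpenMPI install itself is fine and the problem is on the Condor side.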
Regards,
Mahmood