Subject: [HTCondor-users] MPICH job run on different machines
Hi,

I am using HTCondor to submit an MPI job that needs 16 slots. Running the job on two machines with mpiexec directly works fine. But if I use mp1script to run it, Condor gives the following error:

10.1.1.103 no such file or directory /tmp/var/condor/execute/dir_2109562

The following is the contact file:

8 10.1.1.103 4444 condor /tmp/var/condor/execute/dir_3338933 1460387177
0 node70 4444 condor /tmp/var/condor/execute/dir_2109562 1460387177
7 node70 4445 condor /tmp/var/condor/execute/dir_2109566 1460387177
4 10.1.1.103 4445 condor /tmp/var/condor/execute/dir_3338930 1460387177
3 node70 4446 condor /tmp/var/condor/execute/dir_2109564 1460387177
14 10.1.1.103 4446 condor /tmp/var/condor/execute/dir_3338936 1460387177
1 node70 4447 condor /tmp/var/condor/execute/dir_2109563 1460387177
12 10.1.1.103 4447 condor /tmp/var/condor/execute/dir_3338935 1460387177
11 node70 4448 condor /tmp/var/condor/execute/dir_2109568 1460387177
10 10.1.1.103 4448 condor /tmp/var/condor/execute/dir_3338934 1460387177
6 10.1.1.103 4449 condor /tmp/var/condor/execute/dir_3338932 1460387177
9 node70 4449 condor /tmp/var/condor/execute/dir_2109567 1460387177
5 node70 4450 condor /tmp/var/condor/execute/dir_2109565 1460387177
2 10.1.1.103 4450 condor /tmp/var/condor/execute/dir_3338929 1460387177
15 10.1.1.103 4451 condor /tmp/var/condor/execute/dir_3338938 1460387177
13 node70 4451 condor /tmp/var/condor/execute/dir_2109569 1460387177
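
In case it helps with diagnosis, here is a minimal Python sketch that looks up which host the contact file lists for a given scratch directory. It assumes each entry has six whitespace-separated fields (node number, host, port, user, scratch directory, run id); those field meanings are inferred from the layout above, and the helper names parse_contact and owner_of are made up for illustration. Only the first four entries are included.

```python
from collections import defaultdict

# A few entries copied from the contact file above. The field meanings
# (node number, host, port, user, scratch directory, run id) are an
# assumption inferred from the layout.
CONTACT = """\
8 10.1.1.103 4444 condor /tmp/var/condor/execute/dir_3338933 1460387177
0 node70 4444 condor /tmp/var/condor/execute/dir_2109562 1460387177
7 node70 4445 condor /tmp/var/condor/execute/dir_2109566 1460387177
4 10.1.1.103 4445 condor /tmp/var/condor/execute/dir_3338930 1460387177
"""


def parse_contact(text):
    """Return {host: {node_number: scratch_dir}} for each well-formed line."""
    by_host = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        node, host, _port, _user, scratch, _run_id = fields
        by_host[host][int(node)] = scratch
    return dict(by_host)


def owner_of(text, path):
    """Return the host whose entry lists `path`, or None if no entry does."""
    for host, dirs in parse_contact(text).items():
        if path in dirs.values():
            return host
    return None


if __name__ == "__main__":
    # Directory taken from the error message.
    print(owner_of(CONTACT, "/tmp/var/condor/execute/dir_2109562"))
```

Run on these entries it prints node70: the directory in the error message is the scratch directory of node 0 on node70, while the error is reported for 10.1.1.103.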

Does this mean the MPICH application must be in the same directory on every machine? If so, how can I solve this?

Thanks,
HaozhanW