Hi Pat,
I guess your job is running as user nobody. The mpd daemon needs to be
started for each user on each machine, so make sure that it is started
on every machine that runs your job.
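As a rough sketch, a check like the one below could be placed near the top of the wrapper script to confirm that mpd is reachable for the user the job actually runs as. The function name check_mpd and the MPDIR variable (an MPICH2 install prefix) are assumptions for illustration, not something taken from your script. The "mpd: not found" message in your output files is the shell reporting that it could not find an mpd executable in PATH.

```shell
#!/bin/sh
# Hypothetical pre-flight check for an MPI wrapper script.
# MPDIR is an assumed variable holding the MPICH2 install prefix.
check_mpd() {
    # Record which user the job runs as (for example "nobody").
    user=$(id -un 2>/dev/null || echo unknown)

    # Prepend the assumed MPICH2 bin directory, if one was given.
    if [ -n "$MPDIR" ]; then
        PATH="$MPDIR/bin:$PATH"
        export PATH
    fi

    # command -v is the POSIX way to ask whether a command resolves.
    if command -v mpd >/dev/null 2>&1; then
        echo "mpd found at $(command -v mpd) (running as $user)"
        return 0
    fi

    echo "mpd not found in PATH=$PATH (running as $user)" >&2
    return 1
}
```

Calling `check_mpd || exit 1` before the line that starts mpd would make the job fail early with the search path and user name in the error file, which should show whether the problem is the PATH on the execute machine or the account the job runs under.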
Chunbao Miao
Hi,
I'm trying to run a simple parallel MPI hello world on Condor, but I
keep getting errors. My code works using mpirun.
Here's my submit file:
universe = parallel
requirements = (TARGET.OpSys=="LINUX" && TARGET.Arch=="INTEL")
executable = mp2script
arguments = hello
log = hello.log
output = hello.out
error = hello.err
machine_count = 2
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = hello
+ParallelShutdownPolicy = "WAIT_FOR_ALL"
queue
And here's the error that I get from the generated files:
mpd.out.0:
/var/lib/condor/execute/dir_3282/condor_exec.exe: 60: /var/lib/condor/execute/dir_3282/condor_exec.exe: mpd: not found
mpd.out.1:
/var/lib/condor/execute/dir_5103/condor_exec.exe: 101: /var/lib/condor/execute/dir_5103/condor_exec.exe: mpd: not found
Any help would be appreciated. Thanks!
Regards,
Pat