Dear All,

Following up on the problem I reported previously, I replaced mp2script with the new version, which is attached. A few minutes after submission, the job exited with the attached error and log file.

Thanks for your consideration.

Sincerely,
Arash

From: arash
[mailto:anoorghorbani@xxxxxxxxx]

Dear All,

Regarding my first e-mail with the subject "mpich2 error '.../condor_exec.exe' with arguments hellow.exe: No such file or directory", I have attached the corresponding parts of all my log files, which may be useful. Please note that I use Ubuntu 7.10.

Regards,
Arash

From: arash
[mailto:anoorghorbani@xxxxxxxxx]

Dear All,

I have configured two quad-core computers (called mpi0 and mpi1) as dedicated resources, with mpi0 set as the scheduler. I can run simple parallel jobs, but I cannot run MPI jobs. I received the error '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory in the file
log.#pArAlLeLnOdE#.

I submitted the following file:

#########################################
universe = parallel
executable = mp2script.smp
arguments = hellow.exe
machine_count = 3
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = hellow.exe
+WantParallelSchedulingGroups = False
notification = never
log = log.$(NODE)
error = err.$(NODE)
output = out.$(NODE)
queue
#########################################

Here hellow.exe was built with mpicc from the following source:

*****************************************
/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
 *  (C) 2001 by Argonne National Laboratory.
 *      See COPYRIGHT in top-level directory.
 */

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int rank;
    int size;

    MPI_Init( 0, 0 );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
*****************************************

I used the mp2script described on the CamGrid page, which
is:

#!/bin/sh
#
# File: mp2script.smp
#
# Edit MPDIR and LD_LIBRARY_PATH to suit your
# local configuration.

_CONDOR_PROCNO=$_CONDOR_PROCNO
_CONDOR_NPROCS=$_CONDOR_NPROCS

EXECUTABLE=$1
shift

# the binary is copied but the executable flag is cleared.
# so the script have to take care of this
chmod +x $EXECUTABLE

# Set this to the bin directory of your mpich2 installation
MPDIR=/usr/local/mpich2
PATH=$MPDIR/bin:.:$PATH
export PATH

# When a job is killed by the user, this script will get sigterm
# This script has to catch it and do the cleaning for the
# mpich2 environment
finalize()
{
mpdallexit
exit
}
trap finalize TERM

# start the mpich2 environment
if [ $_CONDOR_PROCNO -eq 0 ]
then
    # MPICH2 requires an mpd.conf file with a
    # password in it on the host starting the job.
    # We'll generate one on the fly, though we could
    # use a pre-prepared one, e.g:
    # export MPD_CONF_FILE=~/.mpd.conf
    export MPD_CONF_FILE=`pwd`/mpd.conf
    echo "secretword=MySecretWord" > $MPD_CONF_FILE
    chmod 600 $MPD_CONF_FILE

    # Adjust the following to your needs. I use Intel
    # compilers to build MPICH2
    export LD_LIBRARY_PATH=/lib:/usr/lib:/$MPDIR/lib

    mpd --daemon --debug
    val=$?
    if [ $val -ne 0 ]
    then
        echo "mp2script error booting mpd: $val"
        exit 1
    fi

    ## Run the actual mpi job. Note pre-prepared machine file.
    mpiexec -l -machinefile $MPDIR/etc/machfile -envall -n $_CONDOR_NPROCS $EXECUTABLE $@
    mpdallexit
    rm $MPD_CONF_FILE
else
    wait
    exit 0
fi

exit $?

###### End of mp2script.smp ######

And the file log.#pArAlLeLnOdE# was
generated as follows:

000 (030.000.000) 02/02 17:28:00 Job submitted from host: <x.x.x.27:54299>
...
014 (030.000.000) 02/02 17:33:04 Node 0 executing on host: <x.x.x.27:39023>
...
014 (030.000.001) 02/02 17:33:04 Node 1 executing on host: <x.x.x.27:39023>
...
014 (030.000.002) 02/02 17:33:04 Node 2 executing on host: <x.x.x.27:39023>
...
001 (030.000.000) 02/02 17:33:04 Job executing on host: MPI_job
...
007 (030.000.000) 02/02 17:33:04 Shadow exception!
    Error from starter on vm3@xxxxxxxxxx: Failed to execute '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory
    0 - Run Bytes Sent By Job
    1621935 - Run Bytes Received By Job
...
012 (030.000.000) 02/02 17:33:04 Job was held.
    Error from starter on vm3@xxxxxxxxxx: Failed to execute '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory
    Code 6 Subcode 2
...

I would be grateful for any hints.

Regards,
Arash
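
A note on the error in the log, offered as a possibility rather than a diagnosis: Condor runs the transferred executable under the name condor_exec.exe, so the file the starter failed to execute is mp2script.smp itself. On Linux, executing a script can fail with "No such file or directory" even though the script exists, if the interpreter named on its #! line cannot be found, for instance because the file was saved with CRLF line endings. The sketch below checks a script for that condition. The function name check_shebang and its status words are hypothetical and are not part of Condor or MPICH2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor or MPICH2): report why the kernel
# might refuse to execute a script even though the file itself exists.
check_shebang() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "missing-file"
        return 1
    fi

    # Read only the first line of the script.
    first_line=$(head -n 1 "$script")

    # The first line must start with "#!" for the kernel to treat it as a script.
    case $first_line in
        '#!'*) ;;
        *) echo "no-shebang"; return 1 ;;
    esac

    # A trailing carriage return means CRLF line endings, so the kernel
    # would look for an interpreter whose name ends in a CR character.
    cr=$(printf '\r')
    case $first_line in
        *"$cr") echo "crlf-line-endings"; return 1 ;;
    esac

    # Strip "#!", optional spaces, and any interpreter arguments.
    interp=$(printf '%s\n' "$first_line" |
        sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
    if [ ! -x "$interp" ]; then
        echo "missing-interpreter"
        return 1
    fi

    echo "ok"
}
```

Running check_shebang mp2script.smp in the submit directory prints a single status word; anything other than ok would point at the script's first line rather than at hellow.exe.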
Attachment:
mp2script.smp
Description: Binary data
Attachment:
err.0
Description: Binary data
Attachment:
log.#pArAlLeLnOdE#
Description: Binary data
Attachment:
log.rar
Description: Binary data