Re: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"
- Date: Mon, 4 Feb 2008 14:22:45 -0000
- From: "Kewley, J (John)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"
Do you not need to add
transfer_executable=true
so that your "executable" (mp2script.smp) is transferred?
(I haven't used the parallel universe, but in other universes that error commonly
has this cause, and I noticed you were transferring other files, so presumably you
are not in a shared filestore environment)
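As a sketch of what that suggestion would look like, the snippet below writes a copy of the submit file quoted further down in this thread with the one extra line added. The file name hellow.submit is hypothetical; every other line is copied from the original submit file. Condor's documentation describes transfer_executable as defaulting to true, so setting it explicitly may change nothing, and whether it resolves this error is untested.

```shell
#!/bin/sh
# Write a variant of the submit file with transfer_executable set explicitly.
# "hellow.submit" is a hypothetical file name; the remaining lines are copied
# from the submit file quoted in this thread.
# The quoted 'EOF' stops the shell from expanding $(NODE) inside the heredoc.
cat > hellow.submit <<'EOF'
universe = parallel
executable = mp2script.smp
arguments = hellow.exe
machine_count = 3
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_executable = true
transfer_input_files = hellow.exe
+WantParallelSchedulingGroups = False
notification = never
log = log.$(NODE)
error = err.$(NODE)
output = out.$(NODE)
queue
EOF
# Then submit as usual: condor_submit hellow.submit
```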
Cheers
JK
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of arash
Sent: Mon 04/02/2008 14:16
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"
Dear All,
Following up on my last reported problem, I changed mp2script to the new one,
which is attached.
After a few minutes, the submitted job exited with the attached error and
log file.
Thanks for your consideration.
Sincerely,
Arash
From: arash [mailto:anoorghorbani@xxxxxxxxx]
Sent: Monday, February 04, 2008 5:39 PM
To: Condor-Users Mail List
Subject: RE: mpich2 error " '.../condor_exec.exe' with arguments hellow.exe:
No such file or directory"
Dear All,
Regarding my first e-mail with the subject "mpich2 error " '.../condor_exec.exe'
with arguments hellow.exe: No such file or directory", I have attached the
corresponding parts of all of my log files, which may be useful. Please note that
I use Ubuntu 7.10.
Regards,
Arash
From: arash [mailto:anoorghorbani@xxxxxxxxx]
Sent: Sunday, February 03, 2008 2:38 PM
To: Condor-Users Mail List
Subject: mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No
such file or directory"
Dear All,
I have configured two quad-core computers (called mpi0 and mpi1) as dedicated
resources, with mpi0 set as the scheduler. I can run simple
parallel jobs, but I cannot run MPI jobs.
I receive the following error in the file log.#pArAlLeLnOdE#:
'/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe:
No such file or directory
I submitted the following file:
#########################################
universe = parallel
executable = mp2script.smp
arguments = hellow.exe
machine_count = 3
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = hellow.exe
+WantParallelSchedulingGroups = False
notification =never
log =log.$(NODE)
error =err.$(NODE)
output =out.$(NODE)
queue
#########################################
where hellow.exe was compiled with mpicc from the following source:
*****************************************
/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
* (C) 2001 by Argonne National Laboratory.
* See COPYRIGHT in top-level directory.
*/
#include <stdio.h>
#include "mpi.h"
int main( int argc, char *argv[] )
{
    int rank;
    int size;
    MPI_Init( 0, 0 );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
And I used the mp2script described on the CamGrid page, which is:
#!/bin/sh
#
# File: mp2script.smp
#
# Edit MPDIR and LD_LIBRARY_PATH to suit your
# local configuration.
_CONDOR_PROCNO=$_CONDOR_PROCNO
_CONDOR_NPROCS=$_CONDOR_NPROCS
EXECUTABLE=$1
shift
# The binary is copied but the executable flag is cleared,
# so the script has to take care of this
chmod +x $EXECUTABLE
# Set this to the bin directory of your mpich2 installation
MPDIR=/usr/local/mpich2
PATH=$MPDIR/bin:.:$PATH
export PATH
# When a job is killed by the user, this script will get sigterm
# This script has to catch it and do the cleaning for the
# mpich2 environment
finalize()
{
mpdallexit
exit
}
trap finalize TERM
# start the mpich2 environment
if [ $_CONDOR_PROCNO -eq 0 ]
then
# MPICH2 requires an mpd.conf file with a
# password in it on the host starting the job.
# We'll generate one on the fly, though we could
# use a pre-prepared one, e.g:
# export MPD_CONF_FILE=~/.mpd.conf
export MPD_CONF_FILE=`pwd`/mpd.conf
echo "secretword=MySecretWord" > $MPD_CONF_FILE
chmod 600 $MPD_CONF_FILE
# Adjust the following to your needs. I use Intel
# compilers to build MPICH2
export LD_LIBRARY_PATH=/lib:/usr/lib:/$MPDIR/lib
mpd --daemon --debug
val=$?
if [ $val -ne 0 ]
then
echo "mp2script error booting mpd: $val"
exit 1
fi
## Run the actual mpi job. Note pre-prepared machine file.
mpiexec -l -machinefile $MPDIR/etc/machfile -envall -n $_CONDOR_NPROCS $EXECUTABLE $@
mpdallexit
rm $MPD_CONF_FILE
else
wait
exit 0
fi
exit $?
###### End of mp2script.smp ######
And the file log.#pArAlLeLnOdE# was generated as follows:
000 (030.000.000) 02/02 17:28:00 Job submitted from host: <x.x.x.27:54299>
...
014 (030.000.000) 02/02 17:33:04 Node 0 executing on host: <x.x.x.27:39023>
...
014 (030.000.001) 02/02 17:33:04 Node 1 executing on host: <x.x.x.27:39023>
...
014 (030.000.002) 02/02 17:33:04 Node 2 executing on host: <x.x.x.27:39023>
...
001 (030.000.000) 02/02 17:33:04 Job executing on host: MPI_job
...
007 (030.000.000) 02/02 17:33:04 Shadow exception!
Error from starter on vm3@xxxxxxxxxx: Failed to execute
'/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe:
No such file or directory
0 - Run Bytes Sent By Job
1621935 - Run Bytes Received By Job
...
012 (030.000.000) 02/02 17:33:04 Job was held.
Error from starter on vm3@xxxxxxxxxx: Failed to execute
'/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe:
No such file or directory
Code 6 Subcode 2
...
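The log above reports "No such file or directory" for condor_exec.exe even though it records 1621935 bytes received by the job. On Unix systems, executing a script can fail with that message when the file is present but the interpreter named on its first line cannot be found, for example when the line ends in a carriage return. The sketch below checks for that condition; check_exec is a hypothetical helper, not part of Condor or MPICH2, and this is a general check rather than a confirmed diagnosis of this job.

```shell
#!/bin/sh
# check_exec FILE: hypothetical helper, not part of Condor or MPICH2.
# Reports conditions under which executing FILE can fail with
# "No such file or directory" although FILE itself may exist.
check_exec() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "missing file: $f"
        return 1
    fi
    # Read only the first two bytes so binaries are not parsed as text.
    magic=$(dd if="$f" bs=2 count=1 2>/dev/null)
    if [ "$magic" != '#!' ]; then
        echo "no interpreter line (probably a binary): $f"
        return 0
    fi
    first=$(head -n 1 "$f")
    cr=$(printf '\r')
    case $first in
        *"$cr")
            echo "carriage return at end of interpreter line: $f"
            return 1
            ;;
    esac
    # Strip the leading "#!" and any arguments after the interpreter path.
    interp=$(printf '%s\n' "$first" | sed 's/^#![[:space:]]*//; s/[[:space:]].*$//')
    if [ ! -x "$interp" ]; then
        echo "interpreter not found: $interp"
        return 1
    fi
    echo "ok: $f uses $interp"
}
```

For instance, running check_exec mp2script.smp on the submit machine, and again on an execute node, would show whether the script's interpreter line is usable in both places.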
I would be grateful for any hints.
Regards,
Arash