Dear All,

I am so sorry about forgetting to attach the related files. They are all attached now.

Best wishes,
Arash

-----Original Message-----
From: arash [mailto:anoorghorbani@xxxxxxxxx]
Sent: Tuesday, February 05, 2008 7:01 PM
To: 'Condor-Users Mail List'
Subject: RE: [Condor-users] mpich2 error "'.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Thanks for your consideration. I added this line, but I get the same result. I also had another error in my configuration: I was calling condor start twice in my Linux startup scripts. After fixing that, the job seems to run, but I get no output, and I still receive very similar error files. Again, I have attached all of the related files.

I think there is an error in Mark Calleja's mp2script, or I am using the file wrongly. In particular, at the end of my error files you can see:
___________________________________________________
+ hostname=mpi0
+ pwd
+ currentDir=/home/condor/execute/dir_6717
+ whoami
+ user=condor
+ echo hellow.exe mpi0 4446 condor /home/condor/execute/dir_6717
+ /usr/local/condor/libexec/condor_chirp put -mode cwa - /home/condor/spool/cluster41.proc0.subproc0/contact
+ [ 0 -ne 0 ]
+ [ hellow.exe -eq 0 ]
[: 1: hellow.exe: bad number
+ EXECUTABLE=hellow.exe
+ shift
+ chmod +x hellow.exe
+ MPDIR=/usr/local/mpich2
+ PATH=/usr/local/mpich2/bin:.:/usr/local/condor/bin:/sbin:/bin:/usr/sbin:/usr/bin
+ export PATH
+ export SCRATCH_LOC=loclocloc
/home/condor/execute/dir_6717/condor_exec.exe: 39: cannot create ~/loclocloc: Directory nonexistent
+ echo /home/condor/execute/dir_6717
+ trap finalize TERM
+ [ hellow.exe -ne 0 ]
[: 1: hellow.exe: bad number
+ [ hellow.exe -eq 0 ]
[: 1: hellow.exe: bad number
+ exit 0
___________________________________________________
I don't know what loclocloc is, and I am also confused about the meaning of "[: 1: hellow.exe: bad number" (a short shell sketch illustrating this message appears below, after the quoted messages).

Again, thanks for your consideration.

Regards,
Arash

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Kewley, J (John)
Sent: Monday, February 04, 2008 5:53 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error "'.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Do you not need to add transfer_executable = true so that your "executable" (mp2script.smp) is transferred?

(I haven't used the parallel universe, but that error message is common for this problem in other universes, and I noticed you were transferring other files, hence not in a shared-filesystem environment.)

Cheers
JK

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of arash
Sent: Mon 04/02/2008 14:16
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error "'.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Dear All,

Continuing from my last reported problem, I changed mp2script to the new one, which is attached. After a few minutes the submitted job exited with the attached error and log files.

Thanks for your consideration.

Sincerely,
Arash

From: arash [mailto:anoorghorbani@xxxxxxxxx]
Sent: Monday, February 04, 2008 5:39 PM
To: Condor-Users Mail List
Subject: RE: mpich2 error "'.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Dear All,

With regard to my first e-mail with the subject "mpich2 error '.../condor_exec.exe' with arguments hellow.exe: No such file or directory", I have attached the corresponding parts of all of my log files, which may be useful. Please also note that I use Ubuntu 7.10.

Regards,
Arash
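A note on the "bad number" messages in the trace above: in the POSIX shell, -eq and -ne are integer comparisons, and Ubuntu 7.10's /bin/sh (dash) reports an error when one operand is not a number, which is exactly what happens when the test receives the executable name hellow.exe. The sketch below only illustrates that behaviour and one defensive way to test the Condor node number; the ":-0" default is an assumption for this sketch, not part of Mark Calleja's script.

#!/bin/sh
# Illustration only: why "[ hellow.exe -eq 0 ]" produces a "bad number"-style error.
arg=hellow.exe

# -eq is an integer comparison; with a non-numeric operand the test prints
# an error and returns a failure status, so the branch is skipped and the
# script simply carries on, as in the trace above.
if [ "$arg" -eq 0 ]; then
    echo "never reached"
fi

# A more defensive test of the Condor node number: fall back to 0 if the
# variable happens to be unset (assumed behaviour for this sketch only).
if [ "${_CONDOR_PROCNO:-0}" -eq 0 ]; then
    echo "node 0: would boot mpd and run mpiexec here"
else
    echo "other node: would just wait here"
fi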
From: arash [mailto:anoorghorbani@xxxxxxxxx]
Sent: Sunday, February 03, 2008 2:38 PM
To: Condor-Users Mail List
Subject: mpich2 error "'.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Dear All,

I have configured two quad-core computers (called mpi0 and mpi1) as dedicated resources, with mpi0 set as the scheduler. I can run simple parallel jobs, but I cannot run MPI jobs. In the file log.#pArAlLeLnOdE# I receive the error:

'/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory

I submitted the following file (a variant with transfer_executable set explicitly, per JK's suggestion above, is sketched after this message):
#########################################
universe = parallel
executable = mp2script.smp
arguments = hellow.exe
machine_count = 3
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = hellow.exe
+WantParallelSchedulingGroups = False
notification = never
log = log.$(NODE)
error = err.$(NODE)
output = out.$(NODE)
queue
#########################################
Here hellow.exe is the binary produced by mpicc from the following source (a sample compile command and machine file are sketched at the end of this thread):
*****************************************
/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
 * (C) 2001 by Argonne National Laboratory.
 *     See COPYRIGHT in top-level directory.
 */
#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int rank;
    int size;

    MPI_Init( 0, 0 );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
*****************************************
I used the mp2script described on the CamGrid page, which is:

#!/bin/sh
#
# File: mp2script.smp
#
# Edit MPDIR and LD_LIBRARY_PATH to suit your
# local configuration.

_CONDOR_PROCNO=$_CONDOR_PROCNO
_CONDOR_NPROCS=$_CONDOR_NPROCS

EXECUTABLE=$1
shift

# the binary is copied but the executable flag is cleared,
# so the script has to take care of this
chmod +x $EXECUTABLE

# Set this to the bin directory of your mpich2 installation
MPDIR=/usr/local/mpich2
PATH=$MPDIR/bin:.:$PATH
export PATH

# When a job is killed by the user, this script will get SIGTERM.
# This script has to catch it and do the cleaning for the
# mpich2 environment
finalize()
{
    mpdallexit
    exit
}
trap finalize TERM

# start the mpich2 environment
if [ $_CONDOR_PROCNO -eq 0 ]
then
    # MPICH2 requires an mpd.conf file with a
    # password in it on the host starting the job.
    # We'll generate one on the fly, though we could
    # use a pre-prepared one, e.g.:
    # export MPD_CONF_FILE=~/.mpd.conf
    export MPD_CONF_FILE=`pwd`/mpd.conf
    echo "secretword=MySecretWord" > $MPD_CONF_FILE
    chmod 600 $MPD_CONF_FILE

    # Adjust the following to your needs. I use Intel
    # compilers to build MPICH2
    export LD_LIBRARY_PATH=/lib:/usr/lib:/$MPDIR/lib

    mpd --daemon --debug
    val=$?
    if [ $val -ne 0 ]
    then
        echo "mp2script error booting mpd: $val"
        exit 1
    fi

    ## Run the actual mpi job. Note pre-prepared machine file.
    mpiexec -l -machinefile $MPDIR/etc/machfile -envall -n $_CONDOR_NPROCS $EXECUTABLE $@
    mpdallexit
    rm $MPD_CONF_FILE
else
    wait
    exit 0
fi
exit $?
###### End of mp2script.smp ######

The file log.#pArAlLeLnOdE# was generated as follows:

000 (030.000.000) 02/02 17:28:00 Job submitted from host: <x.x.x.27:54299>
...
014 (030.000.000) 02/02 17:33:04 Node 0 executing on host: <x.x.x.27:39023>
...
014 (030.000.001) 02/02 17:33:04 Node 1 executing on host: <x.x.x.27:39023>
...
014 (030.000.002) 02/02 17:33:04 Node 2 executing on host: <x.x.x.27:39023>
...
001 (030.000.000) 02/02 17:33:04 Job executing on host: MPI_job
...
007 (030.000.000) 02/02 17:33:04 Shadow exception!
    Error from starter on vm3@xxxxxxxxxx: Failed to execute '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory
    0  -  Run Bytes Sent By Job
    1621935  -  Run Bytes Received By Job
...
012 (030.000.000) 02/02 17:33:04 Job was held.
    Error from starter on vm3@xxxxxxxxxx: Failed to execute '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory
    Code 6 Subcode 2
...

I would be pleased to receive any hints.

Regards,
Arash
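As JK suggests further up the thread, the likely cause of the starter's "No such file or directory" error is that the submit file never asks for mp2script.smp itself to be transferred to the execute machines. A minimal variant of the submit file above with that made explicit might look like the following sketch (untested; everything except the added transfer_executable line is copied from the original):

#########################################
universe = parallel
executable = mp2script.smp
# Ship the wrapper script itself to the execute machines
# (there is no shared filesystem here).
transfer_executable = true
arguments = hellow.exe
machine_count = 3
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = hellow.exe
+WantParallelSchedulingGroups = False
notification = never
log = log.$(NODE)
error = err.$(NODE)
output = out.$(NODE)
queue
#########################################

This only makes JK's suggestion concrete; whether it is sufficient depends on the local Condor version and configuration.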
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
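For completeness, a rough sketch of how the pieces referenced in the original message might be prepared on the submit host. The host names (mpi0, mpi1) and the MPICH2 path come from the messages above; the source file name hellow.c and the exact machine-file format are assumptions to be checked against the local MPICH2 documentation.

# Compile the MPI test program with the MPICH2 compiler wrapper
# (assumes /usr/local/mpich2/bin is on PATH, as mp2script.smp sets it).
mpicc hellow.c -o hellow.exe

# mp2script.smp runs:
#   mpiexec -l -machinefile $MPDIR/etc/machfile -envall -n $_CONDOR_NPROCS ...
# so it expects a pre-prepared machine file at /usr/local/mpich2/etc/machfile.
# A minimal example listing the two dedicated hosts, one per line:
cat > /usr/local/mpich2/etc/machfile <<'EOF'
mpi0
mpi1
EOF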
Attachment:
simple_mpi.rar
Description: Binary data