++
Nicolas
----------------
On Thu, 30 Aug 2007 12:36:08 +0200
Nicolas GUIOT wrote:
Hi
I'm trying to submit an MPI job to my Condor pool.
The problem is that when I ask it to run on 2 CPUs (i.e. 1
computer), it works fine, but when I ask for 4 CPUs (i.e. 2 computers),
one of the nodes seems unable to find the file it should write its output to.
Here is the submission script:
$ cat sub-cond.cmd
universe = parallel
executable = mp2script
arguments = /nfs/opt/amber/amber9/exe/sander.MPI -O -i md.in -o TGA07.1.out -p TGA07.top -c TGA07.0.rst -r TGA07.1.rst -x TGA07.1.trj -e TGA07.1.ene
machine_count = 4
should_transfer_files = yes
when_to_transfer_output = on_exit_OR_EVICT
transfer_input_files = /nfs/opt/amber/amber9/exe/sander.MPI,md.in,TGA07.top,TGA07.0.rst
Output = sanderMPI.out
Error = sanderMPI.err
Log = sanderMPI.log
queue
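To narrow down the "file transfer?" question, it can help to separate the files sander reads from the files it writes. Below is a minimal Python sketch. It assumes the usual sander flag meanings: -i (mdin), -p (prmtop) and -c (inpcrd) are inputs, while -o (mdout), -r (restrt), -x (mdcrd) and -e (mden) are outputs. The helper names classify and missing_inputs are hypothetical.

```python
import shlex

# Values copied from sub-cond.cmd above (wrapped lines rejoined).
ARGUMENTS = ("/nfs/opt/amber/amber9/exe/sander.MPI -O -i md.in -o TGA07.1.out "
             "-p TGA07.top -c TGA07.0.rst -r TGA07.1.rst -x TGA07.1.trj -e TGA07.1.ene")
TRANSFER_INPUT_FILES = "/nfs/opt/amber/amber9/exe/sander.MPI,md.in,TGA07.top,TGA07.0.rst"

# Assumed meaning of the sander flags used in this job.
INPUT_FLAGS = {"-i", "-p", "-c"}         # mdin, prmtop, inpcrd
OUTPUT_FLAGS = {"-o", "-r", "-x", "-e"}  # mdout, restrt, mdcrd, mden


def classify(arguments):
    """Split sander's file arguments into (inputs, outputs)."""
    tokens = shlex.split(arguments)[1:]  # drop the executable path
    inputs, outputs = [], []
    # Look at each flag together with the token that follows it.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in INPUT_FLAGS:
            inputs.append(value)
        elif flag in OUTPUT_FLAGS:
            outputs.append(value)
    return inputs, outputs


def missing_inputs(arguments, transfer_list):
    """Input files named on the command line but absent from transfer_input_files."""
    transferred = {name.strip() for name in transfer_list.split(",")}
    inputs, _ = classify(arguments)
    return [name for name in inputs if name not in transferred]
```

With the values from sub-cond.cmd, missing_inputs returns an empty list. Every file sander reads is therefore listed in transfer_input_files. The four files it writes (TGA07.1.out, TGA07.1.rst, TGA07.1.trj, TGA07.1.ene) are not listed, which is expected for outputs.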
I'm starting the script from a directory that is NFS-shared:
(/nfs/test-space/amber)$ ls
blu.sh clean.sh md.in mdinfo mp2script mpd.hosts run_MD.sh
sub-cond.cmd TGA07.0.rst TGA07.top
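As a quick sanity check on the listing above, the relative entries of transfer_input_files can be compared against what ls shows. The following is a small Python sketch. The helper name relative_files_not_listed is hypothetical, and the helper skips absolute paths such as the sander.MPI binary on NFS.

```python
# Directory listing from `ls` above, and transfer_input_files from sub-cond.cmd.
LISTING = """blu.sh clean.sh md.in mdinfo mp2script mpd.hosts run_MD.sh
sub-cond.cmd TGA07.0.rst TGA07.top"""
TRANSFER_INPUT_FILES = "/nfs/opt/amber/amber9/exe/sander.MPI,md.in,TGA07.top,TGA07.0.rst"


def relative_files_not_listed(listing, transfer_list):
    """Relative entries of transfer_input_files that the listing does not show.

    Absolute paths are skipped, since they are not expected to live in
    the submit directory.
    """
    present = set(listing.split())
    wanted = [name.strip() for name in transfer_list.split(",")]
    return [name for name in wanted
            if not name.startswith("/") and name not in present]
```

For this listing the function returns an empty list: md.in, TGA07.top and TGA07.0.rst are all present in the submit directory.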
The error is the typical Amber error raised when it can't open the output
file (TGA07.1.out is an output file, so it doesn't exist before running
the program):
$ more sanderMPI.err
0:
0: Unit 6 Error on OPEN: TGA07.1.out
0: [cli_0]: aborting job:
0: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
$
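One way to test the "NFS?" hypothesis is to check, on each execute node, whether the output file can be created in the job's working directory. The following is a rough Python sketch. It assumes Python is available on the nodes. It also assumes that "Unit 6" refers to sander's mdout file (the one given with -o). can_create is a hypothetical helper.

```python
import os
import tempfile


def can_create(directory, name):
    """Return True if a file called `name` could be written in `directory`.

    An existing file is never touched: only its write permission is checked.
    Otherwise the file is created exclusively and then removed again, which
    roughly mirrors what sander must do with the file given to -o.
    """
    path = os.path.join(directory, name)
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    try:
        with open(path, "x"):
            pass
    except OSError:
        return False
    os.remove(path)
    return True


if __name__ == "__main__":
    # Demonstration on a scratch directory; on a real node one would pass
    # the job's working directory instead (assumption).
    with tempfile.TemporaryDirectory() as scratch:
        print(can_create(scratch, "TGA07.1.out"))
        print(can_create(os.path.join(scratch, "missing"), "TGA07.1.out"))
```

A False result on the second machine would point to a directory or permission problem there, not to a missing input file.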
So, where is my problem? NFS? File transfer?
Any help would be greatly appreciated :)
Nicolas
----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE
Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------
_______________________________________________
Condor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/