[Condor-users] condor_shadow exits with STATUS 100 on MPI jobs
- Date: Thu, 28 Apr 2005 08:16:55 +0100
- From: Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx>
- Subject: [Condor-users] condor_shadow exits with STATUS 100 on MPI jobs
Hi,
I'm trying to get MPI universe jobs to work without using shared disc
space, but have hit a bit of a hitch. The setup uses MPICH v1.2.4 compiled
with Intel's ifc 7.1, and raw MPI jobs work well. When I submit a simple
"hello world" program via Condor's MPI universe, the jobs also run to
completion and return the data, but the nodes don't exit cleanly: they
remain in a Claimed/Idle state, and the ShadowLog on the submit host
ends up with:
4/28 07:58:27 (22.0) (1040): Job 22.0 terminated: exited with status 0
4/28 07:58:27 (22.0) (1040): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100
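For what it's worth, the executable is just the usual sort of MPI hello
world. A minimal C sketch of that kind of program (my actual code may
differ, and could equally be Fortran given ifc) is:
=================
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* Standard MPI start-up; one copy runs on each allocated node */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each node's stdout ends up in its own outfile.$(NODE) */
    printf("Hello world from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
=================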
The StartLog and StarterLog on the execute nodes seem happy enough, and
jobs on those nodes run as the dedicated user condor_user, which has
passwordless rsh set up between all the execute nodes.
The submit script is:
=================
universe = MPI
executable = hello
machine_count = 6
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
log = logfile
input = /dev/null
output = outfile.$(NODE)
error = errfile.$(NODE)
queue
=================
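For completeness, I submit and check the pool in the obvious way (the
submit file name here is just illustrative):
=================
$ condor_submit hello.sub
$ condor_status
=================
and it's in the condor_status output that the execute slots sit at
Claimed/Idle even after the outfile.* come back.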
Now, I realise I could get round all this by NFS-mounting all the home
space, but I'd like to avoid that if possible for performance reasons.
Any suggestions?
Cheers,
Mark