[Condor-users] condor_shadow exits with STATUS 100 on MPI jobs
- Date: Thu, 28 Apr 2005 08:16:55 +0100
- From: Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx>
- Subject: [Condor-users] condor_shadow exits with STATUS 100 on MPI jobs
Hi,
I'm trying to get MPI universe jobs to work without using shared disc
space, but have hit a bit of a hitch. The setup uses MPICH v1.2.4 compiled
with Intel's ifc 7.1, and raw MPI jobs work well. When I submit a simple
"hello world" program via Condor's MPI universe, the jobs also run to
completion and return the data, but the nodes don't exit cleanly: they
remain in a Claimed/Idle state, and the ShadowLog on the submit host
ends up with:
4/28 07:58:27 (22.0) (1040): Job 22.0 terminated: exited with status 0
4/28 07:58:27 (22.0) (1040): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100
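For what it's worth, the executable is just the usual sort of MPI hello
world. A minimal C sketch of that kind of program (my actual code may
differ, and could equally be Fortran given ifc) is:
=================
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* Standard MPI start-up; one copy runs on each allocated node */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each node's stdout ends up in its own outfile.$(NODE) */
    printf("Hello world from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
=================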
The StartLog and StarterLog on the execute nodes seem happy enough, and
jobs on those nodes run as the dedicated user condor_user, which has
passwordless rsh set up between all the execute nodes.
The submit script is:
=================
universe = MPI
executable = hello
machine_count = 6
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
log = logfile
input = /dev/null
output = outfile.$(NODE)
error = errfile.$(NODE)
queue
=================
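For completeness, I submit and check the pool in the obvious way (the
submit file name here is just illustrative):
=================
$ condor_submit hello.sub
$ condor_status
=================
and it's in the condor_status output that the execute slots sit at
Claimed/Idle even after the outfile.* come back.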
Now, I realise I could get round all this by NFS-mounting all the home
space, but I'd like to avoid that if possible for performance reasons.
Any suggestions?
Cheers,
Mark