[Condor-users] MPI - What the heck does this mean?
- Date: Fri, 10 Feb 2006 12:33:02 -0700
- From: rnayar@xxxxxxxx
- Subject: [Condor-users] MPI - What the heck does this mean?
Hey everyone, I've been mucking around with the parallel universe, and I tried
the basic sleep program as indicated in the manual:
#############################################
## submit description file for parallel program
#############################################
universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 2
queue
Anyhow, after the job completed I got an email that stated the following:
From: condor
Message-Id: <200001011014.e01AESvB003893@xxxxxxxxxxxxxxxx>
To: condor@xxxxxxxx
Subject: [Condor] Condor Job 43.0
This is an automated email from the Condor system
on machine "panndaa.nmsu.edu". Do not reply.
Your Condor-MPI job 43.0 has completed.
Here are the machines that ran your MPI job.
They are listed in the order they were started
in, which is the same as MPI_Comm_rank.
Machine Name              Result
------------------------  -----------
panndaa.nmsu.edu          exited normally with status 0
gutti.nmsu.edu            was removed by the user
Have a nice day.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Questions about this message or Condor in general?
Email address of the local Condor administrator: condor@xxxxxxxxxxxxxxxx
The Official Condor Homepage is http://www.cs.wisc.edu/condor
So I was like, interesting, but shouldn't both nodes exit with status 0?
Anyone have any ideas what's going on? Below is the local config file for gutti.
It is pretty much your general run-of-the-mill
condor_config.local.dedicated.resource modification.
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxx"
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = Scheduler =?= $(DedicatedScheduler)
MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
Well, if anyone knows what's up or has run into this problem, let me know. Also,
it's weird: even when nothing is being executed, my machines stay in the
Claimed state.
Name          OpSys  Arch   State    Activity  LoadAv  Mem  ActvtyTime
gutti.nmsu.ed LINUX  INTEL  Claimed  Idle      0.000   495  [?????]
panndaa.nmsu. LINUX  INTEL  Claimed  Idle      1.110   503  0+00:03:36

             Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX         2     0       2         0       0          0
Total               2     0       2         0       0          0
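For anyone trying to reproduce this, a variant of the submit file above with a user log and per-node output files might help show what happened to each node. This is only a sketch: the file names are placeholders, and it assumes $(NODE) expands to the node number in the parallel universe.

```
#############################################
## sketch: same sleep job, with logging added
## (file names below are placeholders)
#############################################
universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 2
## user log of job events (execute, terminate, evict)
log = sleep.log
## one stdout/stderr file per node, assuming $(NODE) is expanded
output = sleep.out.$(NODE)
error = sleep.err.$(NODE)
queue
```

The events recorded in sleep.log should make it easier to tell whether the second node was torn down once the first node's process exited or was removed for some other reason.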
Thanks in Advance
Danny Nayar
New Mexico State University