Re: [Condor-users] MPI job problem
Hi Mark and Greg,
Thanks for your responses.
I changed the START attribute from Scheduler =?= $(DedicatedScheduler) to True
in the local configuration files of pragma002 and pragma004, and indeed the
status became "Unclaimed":
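For reference, the change amounts to something like this in each node's local configuration file (a sketch; the DedicatedScheduler value is an assumption based on pragma001 being the submit host):

```
## condor_config.local on pragma002 and pragma004 (sketch)
## DedicatedScheduler value is assumed -- set it to your dedicated submit host
DedicatedScheduler = "DedicatedScheduler@pragma001.grid.sinica.edu.tw"
START = True
```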
------------------------------------------------------------------------
[lyho@pragma001 lyho]$ condor_status

Name          OpSys    Arch   State      Activity  LoadAv  Mem   ActvtyTime

pragma001.gri LINUX    INTEL  Owner      Idle      0.010    469  0+00:10:04
pragma002.gri LINUX    INTEL  Unclaimed  Idle      0.290    469  0+03:21:02
pragma004.gri LINUX    INTEL  Unclaimed  Idle      0.150   1004  0+03:19:48

              Machines  Owner  Claimed  Unclaimed  Matched  Preempting

  INTEL/LINUX        3      1        0          2        0           0
        Total        3      1        0          2        0           0
-------------------------------------------------------------------------
but the MPI job is still idle:
-------------------------------------------------------------------------
[lyho@pragma001 lyho]$ condor_q

-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
 ID      OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD
140.0    lyho    4/29 17:44    0+00:00:00 I  0   0.3  cpi

1 jobs; 1 idle, 0 running, 0 held
------------------------------------------------------------------------
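A common next step when a job sits idle is to ask the scheduler why it is not being matched (a sketch; the exact output of condor_q -analyze varies by Condor version):

```
[lyho@pragma001 lyho]$ condor_q -analyze 140.0
```

This reports, for each machine in the pool, whether it rejected the job's requirements or the job rejected the machine.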
Then I tested a vanilla job. The job description file:
============================
universe = vanilla
executable = cpi
log = logofcpi.new
error = errofcpi.$(NODE).new
output = outofcpi.$(NODE).new
queue
=============================
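For comparison, the idle MPI job above would typically use a submit file along these lines (a sketch of what I assume the MPI submit file looks like; machine_count = 2 is an assumption):

```
universe      = MPI
executable    = cpi
machine_count = 2
log           = logofcpi
error         = errofcpi.$(NODE)
output        = outofcpi.$(NODE)
queue
```

In the MPI universe, the job is matched only against machines advertising the dedicated scheduler, which is why the START configuration on the execute nodes matters here.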
and it ran successfully:
------------------------------------------------------------------------
[lyho@pragma001 condor_test]$ condor_q

-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
 ID      OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD
142.0    lyho    5/2  13:18    0+00:00:00 R  0   0.3  cpi

1 jobs; 0 idle, 1 running, 0 held
---------------------------------------------------------------------
The log, error, and output files:
---------------------------------------------------------------------
[lyho@pragma001 condor_test]$ more *.new
::::::::::::::
errofcpi..new
::::::::::::::
Process 0 on pragma002.grid.sinica.edu.tw
::::::::::::::
logofcpi.new
::::::::::::::
000 (142.000.000) 05/02 13:18:57 Job submitted from host:
<140.109.98.21:33670>
...
001 (142.000.000) 05/02 13:19:00 Job executing on host: <140.109.98.22:48852>
...
005 (142.000.000) 05/02 13:19:00 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
0 - Total Bytes Received By Job
...
::::::::::::::
outofcpi..new
::::::::::::::
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000055
--------------------------------------------------------------------
So, something is wrong with the MPI job.
Can anyone help me?
On Fri, 29 Apr 2005 12:11:53 +0300, Mark Silberstein wrote:
> The problem seems to be the fact that all your computers are in
> the "Owner" state, i.e. Condor is NOT allowed to start any job on them.
> Obviously you're using a START expression (in condor_config)
> which makes your resources reject Condor jobs when they are under
> load or when there's some keyboard activity. (The output you sent was
> produced on pragma001, so you were working on it, and the two others
> had a load average of 1.000.) To TEST that MPI really works you
> might want to disable this by putting START = TRUE (which would
> allow any job to be invoked, regardless of the current computer
> activity), or START = ($(START)) || (Scheduler =?= $(DedicatedScheduler)).
>
> Mark
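Mark's second suggestion, spelled out as a local-config fragment (a sketch; the DedicatedScheduler value is an assumption and should match your dedicated submit host):

```
## Keep the existing START policy, but always accept jobs coming
## from the dedicated scheduler (value assumed)
DedicatedScheduler = "DedicatedScheduler@pragma001.grid.sinica.edu.tw"
START = ($(START)) || (Scheduler =?= $(DedicatedScheduler))
```

This preserves the owner-friendly policy for ordinary jobs while letting dedicated (MPI) jobs claim the machine unconditionally.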