Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine
- Date: Wed, 22 Aug 2007 13:48:17 -0700
- From: "Jones, Torrin A (US SSA)" <torrin.jones@xxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine
What does condor_q -analyze say?
What does condor_q -better-analyze say?
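As a minimal sketch of how the two commands above might be run, assuming the idle job is still 2.0 (the ID shown in the condor_q listing quoted below) and that the commands are run on the submit machine, nodeA. The JOB_ID variable and the PATH check are illustrative additions and not part of the original message; the output depends on the pool and is not shown here.

```shell
#!/bin/sh
# Job ID taken from the condor_q listing in the quoted message (assumption:
# the job is still in the queue under this ID). Adjust as needed.
JOB_ID="2.0"

if command -v condor_q >/dev/null 2>&1; then
    # Report why the job has not been matched to a machine.
    condor_q -analyze "$JOB_ID"
    # Same question, with a more detailed analysis.
    condor_q -better-analyze "$JOB_ID"
else
    # Illustrative guard so the sketch also runs where Condor is not installed.
    echo "condor_q not found in PATH; run this on the submit machine"
fi
```

Passing the job ID limits the analysis to the one idle job instead of the whole queue.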
Hi all,
I have two machines: nodeA has 2 CPUs and nodeB has 1 CPU. Here is the CPU information:
_______________________
ye@nodea:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 2048 KB
...
...
bogomips        : 4265.69
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 2048 KB
...
...
bogomips        : 4262.73
clflush size    : 64
_______________________
ye@nodeb:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Pentium(R) 4 CPU 1.70GHz
... ...
bogomips        : 3393.42
clflush size    : 64
_______________________
I followed the Condor (6.8.4) tutorial (http://www.cs.wisc.edu/condor/tutorials/intl-grid-school-3/)
to get started. At the step "Submitting your first Condor job", I find that
all jobs submitted on nodeA stay idle:
_______________________
ye@nodea:~$ condor_q
-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   globus          8/22 18:45   0+00:00:00 I  0   9.8  simple 4 10

1 jobs; 1 idle, 0 running, 0 held
_______________________
But when I submit the same job on nodeB, it works perfectly.
I then checked the Condor status on both machines; here is the output:
_______________________
ye@nodea:~$ condor_status
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000  1000  0+03:45:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000  1000  0+03:45:05

              Total Owner Claimed Unclaimed Matched Preempting Backfill
  INTEL/LINUX     2     0       0         2       0          0        0
        Total     2     0       0         2       0          0        0
_______________________
ye@nodeb:~$ condor_status
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
nodeb.gridgro LINUX       INTEL  Unclaimed  Idle       0.000  1011  0+03:09:53

              Total Owner Claimed Unclaimed Matched Preempting Backfill
  INTEL/LINUX     1     0       0         1       0          0        0
        Total     1     0       0         1       0          0        0
_______________________
I don't know whether this is caused by nodeA having 2 CPUs, so that the jobs
submitted on nodeA stay idle because Condor does not know where to execute them.
How can I fix this problem on nodeA (the multi-CPU machine)?
Thanks a lot!
Best regards,
ye