[Condor-users] Jobs blocked as Idle in Multi-CPU machine
- Date: Thu, 23 Aug 2007 04:40:36 +0800
- From: "ye huang" <ye.huang@xxxxxxxx>
- Subject: [Condor-users] Jobs blocked as Idle in Multi-CPU machine
Hi, All:
I have two machines: nodeA has 2 CPUs and nodeB has 1 CPU. Here is the CPU information:
_______________________
ye@nodea:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping : 2
cpu MHz : 1596.000
cache size : 2048 KB
... ...
bogomips : 4265.69
clflush size : 64
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping : 2
cpu MHz : 1596.000
cache size : 2048 KB
... ...
bogomips : 4262.73
clflush size : 64
_______________________
ye@nodeb:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Pentium(R) 4 CPU 1.70GHz
... ...
bogomips : 3393.42
clflush size : 64
_______________________
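As a quicker check than reading the full dump, the processor count can be read off with a small shell helper. This is only a minimal sketch, assuming a Linux-style /proc/cpuinfo in which each logical CPU has a line starting with "processor"; count_cpus is a hypothetical helper name, not a Condor command.

```shell
# count_cpus: hypothetical helper (not part of Condor).
# Prints the number of lines that start with "processor" in a
# cpuinfo-style file. Defaults to /proc/cpuinfo when no file is given.
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

Calling count_cpus with no argument reads /proc/cpuinfo. Given the dumps above, I would expect it to print 2 on nodeA and 1 on nodeB.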
I followed the Condor (6.8.4) tutorial (http://www.cs.wisc.edu/condor/tutorials/intl-grid-school-3/) as my starting point. At the step "Submitting your first Condor job", I find that every job submitted on nodeA stays idle:
_______________________
ye@nodea:~$ condor_q
-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
ID   OWNER   SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
2.0  globus  8/22 18:45   0+00:00:00  I   0    9.8   simple 4 10
1 jobs; 1 idle, 0 running, 0 held
_______________________
But when I submit the same job on nodeB, it works perfectly.
I then checked the Condor status on both machines. Here is the output:
_______________________
ye@nodea:~$ condor_status
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1@xxxxxxxxx LINUX  INTEL  Unclaimed  Idle      0.000   1000  0+03:45:04
vm2@xxxxxxxxx LINUX  INTEL  Unclaimed  Idle      0.000   1000  0+03:45:05
             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX      2      0        0          2        0           0         0
      Total      2      0        0          2        0           0         0
_______________________
ye@nodeb:~$ condor_status
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
nodeb.gridgro LINUX  INTEL  Unclaimed  Idle      0.000   1011  0+03:09:53
             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX      1      0        0          1        0           0         0
      Total      1      0        0          1        0           0         0
_______________________
I don't know whether this is caused by nodeA having 2 CPUs. Are the jobs on nodeA blocked because they don't know where to execute?
And how can I fix this problem on nodeA (the multi-processor machine)?
Thanks a lot!
Best regards
ye