Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine

Date: Wed, 22 Aug 2007 13:48:17 -0700

From: "Jones, Torrin A $US SSA$" <torrin.jones@xxxxxxxxxxxxxx>

Subject: Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine


What does condor_q -analyze say?

What does condor_q -better-analyze say?

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of ye huang
Sent: Wednesday, August 22, 2007 13:41
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Jobs blocked as Idle in Multi-CPU machine

Hi all,

I have two machines: nodeA has 2 CPUs and nodeB has 1 CPU. Here is the CPU information:
_______________________

ye@nodea:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6400 @ 2.13GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 2048 KB
... ...
bogomips        : 4265.69
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6400 @ 2.13GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 2048 KB
... ...
bogomips        : 4262.73
clflush size    : 64

_______________________

ye@nodeb:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Pentium(R) 4 CPU 1.70GHz
... ...
bogomips        : 3393.42
clflush size    : 64
_______________________

I followed the Condor (6.8.4) tutorial (http://www.cs.wisc.edu/condor/tutorials/intl-grid-school-3/) to get started. At the step "Submitting your first Condor job", I found that every job submitted on nodeA stays blocked as idle:
_______________________
ye@nodea:~$ condor_q

-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   globus          8/22 18:45   0+00:00:00 I 0   9.8 simple 4 10

1 jobs; 1 idle, 0 running, 0 held
_______________________

But when I submit the same job on nodeB, it works perfectly.
I then checked condor_status on both machines. Here is the output:
_______________________

ye@nodea:~$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@xxxxxxxxx LINUX       INTEL Unclaimed Idle       0.000 1000 0+03:45:04
vm2@xxxxxxxxx LINUX       INTEL Unclaimed Idle       0.000 1000 0+03:45:05

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     2     0       0         2       0          0        0

               Total     2     0       0         2       0          0        0
_______________________

ye@nodeb:~$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

nodeb.gridgro LINUX       INTEL Unclaimed Idle       0.000 1011 0+03:09:53

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     1     0       0         1       0          0        0

               Total     1     0       0         1       0          0        0
_______________________

I wonder whether this happens because nodeA has 2 CPUs: are the jobs on nodeA blocked because Condor doesn't know which CPU to run them on?
And how can I fix this problem on nodeA (a multi-processor machine)?

Thanks a lot!

Best regards
ye