
Re: [Condor-users] Problems with jobs



I have managed to get that number up as high as 20 and even 50, with little difference. I am seeing
more running jobs, but not many more: only 7 VMs at most so far.
 
Thanks
 
Chris
 
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.290  2048  0+00:01:28
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.140  2048  0+00:00:04
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.240  2048  0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.100  2048  0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:05
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.290  2048  0+00:01:28
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:04
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.240  2048  0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.100  2048  0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160  2048  0+00:00:05
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:07
chris@tux:/usr/share/gridshare/condorjobs/som$
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160  2048  0+00:00:04
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:05
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160  2048  0+00:00:04
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:05
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.220  2048  0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:06
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:06
chris@tux:/usr/share/gridshare/condorjobs/som$
----- Original Message -----
From: Ian Chesal
Sent: Wednesday, December 07, 2005 8:52 PM
Subject: RE: [Condor-users] Problems with jobs

That's it! That's the key: the jobs run very quickly (I'm guessing in the range of a few minutes, right?).

 

In that case Condor can't spawn shadows fast enough. The shadow spawn rate on the schedds is throttled to prevent overloading the machine by starting many, many processes at the same time. There are two variables that control the spawn rate, JOB_START_COUNT and JOB_START_DELAY. You'll only want to change JOB_START_COUNT.
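Why short jobs leave slots idle can be illustrated with a rough model. The sketch below is a simplified simulation, not Condor code: it assumes the schedd starts up to JOB_START_COUNT jobs per batch and waits JOB_START_DELAY seconds between batches, that every job runs for a fixed, assumed runtime, and it ignores negotiation cycles and shadow startup cost. The function names `start_times` and `peak_running` are hypothetical. Under those assumptions the number of jobs running at once is capped at roughly JOB_START_COUNT times runtime divided by JOB_START_DELAY, no matter how many free slots the pool has.

```python
# Simplified model of schedd start throttling (an assumption, not Condor code):
# the schedd starts up to `start_count` jobs (JOB_START_COUNT), then waits
# `start_delay` seconds (JOB_START_DELAY) before starting the next batch.
# Every job runs for exactly `runtime` seconds. Negotiation cycles, claim
# reuse and shadow startup cost are ignored.


def start_times(n_jobs, start_count, start_delay):
    """Return the time (in seconds) at which each of n_jobs is started."""
    return [(i // start_count) * start_delay for i in range(n_jobs)]


def peak_running(n_jobs, start_count, start_delay, runtime):
    """Return the largest number of jobs running at the same moment."""
    events = []
    for t in start_times(n_jobs, start_count, start_delay):
        events.append((t, +1))            # job starts
        events.append((t + runtime, -1))  # job finishes
    # Tuples sort by time first, then -1 before +1, so a job that finishes
    # at time t is counted as gone before a job starting at time t.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    # 200 queued jobs, 2 s between batches, 5 s per job (assumed values).
    for count in (1, 2, 20):
        print(f"JOB_START_COUNT={count}: peak running = "
              f"{peak_running(200, count, 2, 5)}")
```

Because the model leaves out the negotiator and every other schedd limit, its figures are best read as a rough upper bound rather than a prediction for a real pool.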

 

Put this in the condor_config file used by all your schedds:

            ##  Start more than one job at a time
            JOB_START_COUNT = 2

Once that's deployed in all your condor_config files, issue the following from your central negotiator to reconfigure all of them:

            condor_reconfig -all

 

You can raise that number until the Claimed+Idle machines disappear, but keep a careful eye on CPU usage on your schedd machines. It can spike when spawning too many shadow processes at once.
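Raising the value until the Claimed+Idle slots disappear means checking for those slots after each change. The minimal sketch below counts them in plain condor_status output, assuming the eight-column layout seen in Chris's listings above (Name, OpSys, Arch, State, Activity, LoadAv, Mem, ActvtyTime); other Condor versions may print different columns. The function name `count_claimed_idle` and the host names in the sample are hypothetical.

```python
# Count slots that are Claimed but Idle in plain `condor_status` output.
# Assumes the eight-column layout seen in this thread:
#   Name OpSys Arch State Activity LoadAv Mem ActvtyTime


def count_claimed_idle(status_text):
    """Return the number of slots whose State is Claimed and Activity is Idle."""
    count = 0
    for line in status_text.splitlines():
        fields = line.split()
        # Anything that is not an eight-column row is skipped. Eight-column
        # rows that are not slots (the header, summary totals) never have
        # "Claimed" and "Idle" in the State/Activity positions.
        if len(fields) != 8:
            continue
        state, activity = fields[3], fields[4]
        if state == "Claimed" and activity == "Idle":
            count += 1
    return count


if __name__ == "__main__":
    sample = """\
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
vm1@node01    LINUX       X86_64 Claimed    Busy       1.000  2048  0+00:00:05
vm2@node01    LINUX       X86_64 Claimed    Idle       0.000  2048  0+00:00:03
vm1@node02    LINUX       X86_64 Unclaimed  Idle       0.000  2048  0+00:10:00
"""
    print(count_claimed_idle(sample))  # prints 1
```

Feeding it the output of successive condor_status runs after each condor_reconfig gives a simple number to watch while tuning.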

 

- Ian

 

0 jobs; 0 idle, 0 running, 0 held

on any machine that I try. I think by the time I SSH to a node that's running a job, it's already finished, hence the empty queue. The jobs run very quickly.