Re: [Condor-users] Problems with jobs


Date: Wed, 7 Dec 2005 23:55:36 -0000

From: "Chris Miles" <chrismiles@xxxxxxxxxxxxxxxx>

Subject: Re: [Condor-users] Problems with jobs

I have managed to get that number up as high as 20 and even 50, with only a little difference. I am seeing more running jobs, but not many more. Only 7 VMs busy at most so far.

thanks

Chris

chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.290 2048 0+00:01:28
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.140 2048 0+00:00:04
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.240 2048 0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.100 2048 0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:05
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.290 2048 0+00:01:28
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:04
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.240 2048 0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.100 2048 0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160 2048 0+00:00:05
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:07
chris@tux:/usr/share/gridshare/condorjobs/som$
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160 2048 0+00:00:04
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:05
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160 2048 0+00:00:04
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:05
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.220 2048 0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:06
chris@tux:/usr/share/gridshare/condorjobs/som$ condor_status | grep Busy
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:06
vm2@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:06
chris@tux:/usr/share/gridshare/condorjobs/som$
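Eyeballing repeated `condor_status | grep Busy` runs gets tedious, so here is a minimal sketch that counts busy slots from captured output instead. It assumes the eight-column layout shown in the listings above (name, OpSys, Arch, State, Activity, LoadAv, Mem, ActvtyTime). The function name `count_busy` is made up for illustration, and the `Idle` row in the sample is invented to show that non-busy slots are skipped.

```python
def count_busy(status_text: str) -> int:
    """Count rows whose State is 'Claimed' and Activity is 'Busy'."""
    count = 0
    for line in status_text.splitlines():
        fields = line.split()
        # Assumed column order, matching the listings above:
        # name, OpSys, Arch, State, Activity, LoadAv, Mem, ActvtyTime
        if len(fields) >= 5 and fields[3] == "Claimed" and fields[4] == "Busy":
            count += 1
    return count


# Two rows copied from the output above, plus one hypothetical Idle row.
sample = """\
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       1.000 2048 0+00:00:05
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Busy       0.160 2048 0+00:00:04
vm1@xxxxxxxxx LINUX       X86_64 Claimed    Idle       0.000 2048 0+00:00:05
"""

print(count_busy(sample))  # prints 2
```

Feeding it the saved output of each `condor_status` run gives one number per sample, which makes it easier to see whether a config change actually moved the busy count.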

----- Original Message -----

From: Ian Chesal

To: Chris Miles ; Condor-Users Mail List

Sent: Wednesday, December 07, 2005 8:52 PM

Subject: RE: [Condor-users] Problems with jobs

That's it! That's the key: the jobs run very quickly (I'm guessing in the range of a few minutes, right?).

In that case Condor can't spawn shadows fast enough. The shadow spawn rate on the schedds is throttled to prevent overloading the machine by starting many, many processes at the same time. There are two variables that control the spawn rate. You'll only want to change JOB_START_COUNT.

Put this in the condor_config file used by all your schedds:

            ## Start more than one job at a time

            JOB_START_COUNT = 2

Once that's deployed in all your condor_config files issue:

            condor_reconfig -all

From your central negotiator to reconfigure all of them.

You can up that number until the Claimed+Idle machines disappear, but keep a careful eye on CPU usage on your schedd machines. It can spike when spawning too many shadow processes at once.
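To see why short jobs hit this throttle so hard, a back-of-envelope estimate helps: in steady state, the number of jobs running at once is roughly the start rate multiplied by how long each job runs. The sketch below is only that rough model, not anything Condor computes itself. The function name and parameters are hypothetical, the 2-second delay between start batches is an assumed value, and the 6-second runtime is read off the ActvtyTime column in the listings earlier in the thread.

```python
def max_running(start_count: int, start_delay_s: float, job_runtime_s: float) -> float:
    """Rough steady-state estimate: jobs started per second times job runtime."""
    if start_delay_s <= 0:
        raise ValueError("start_delay_s must be positive")
    start_rate = start_count / start_delay_s  # jobs started per second
    return start_rate * job_runtime_s


# Assumed values: one job per 2 s batch, jobs lasting about 6 s.
print(max_running(1, 2.0, 6.0))  # prints 3.0

# Doubling the number of jobs started per batch doubles the estimate.
print(max_running(2, 2.0, 6.0))  # prints 6.0
```

Under these assumptions only a handful of slots can ever be busy at once, regardless of pool size, which is in line with the 2 to 7 busy VMs reported above. The estimate ignores negotiation and claim overhead, so treat it as a ceiling rather than a prediction.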

- Ian

0 jobs; 0 idle, 0 running, 0 held

on any machine that I try. I think that by the time I SSH to a node that's running a job, it has already finished, hence the empty queue. The jobs run very quickly.
