
Re: [Condor-users] Problems with jobs

Date: Thu, 8 Dec 2005 14:13:53 -0000
From: "Chris Miles" <chrismiles@xxxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] Problems with jobs

Basically, I have a pool with a shared file system and 25 machines. These are all very powerful. I think the weakest link in my chain is my submitting machine, which is just a lone server with its own configuration. It's a 1 GHz, 512 MB Mini-ITX box: not the fastest in the world, and it has a few other (required) applications running.

Is this the machine I should set JOB_START_COUNT on, or should it be set on the machines that actually run the jobs?
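To see why a start throttle matters for a large cluster, it can help to estimate how long the throttle alone delays the last job start. This is a minimal sketch under a simplified, assumed model: the schedd launches up to JOB_START_COUNT jobs, then waits JOB_START_DELAY seconds (the companion Condor setting, not mentioned in this thread) before the next batch. It ignores negotiation cycles, shadow start-up cost and free slots, and the exact semantics and defaults vary between Condor versions, so check the manual for your release. The function name is hypothetical.

```python
import math


def throttled_start_time(n_jobs: int, job_start_count: int, job_start_delay: float) -> float:
    """Seconds until the last of n_jobs is launched, assuming a batch-and-wait
    throttle: up to job_start_count starts, then job_start_delay seconds of waiting.
    Negotiation, shadow start-up cost and slot availability are ignored."""
    if n_jobs <= 0:
        return 0.0
    if job_start_count < 1:
        raise ValueError("job_start_count must be >= 1")
    batches = math.ceil(n_jobs / job_start_count)
    # The first batch starts immediately; every later batch waits one delay.
    return (batches - 1) * job_start_delay
```

Under this model, filling 50 VMs one job at a time with a 2-second delay takes 98 seconds, while batches of 10 take 8 seconds.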

My submitting machine is the one where I see the condor_shadow daemons firing up:

chris 18253 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.9<146.191.100.202:46251> -
chris 18325 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.12<146.191.100.202:46251> -
chris 18362 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.13<146.191.100.202:46251> -
chris 18396 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.14<146.191.100.202:46251> -
chris 18454 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.15<146.191.100.202:46251> -
chris 18464 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.16<146.191.100.202:46251> -
chris 18499 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.17<146.191.100.202:46251> -
chris 18533 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.18<146.191.100.202:46251> -
chris 18570 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.19<146.191.100.202:46251> -
chris 18579 5552 0 14:01 ? 00:00:00 condor_shadow -f 80.20<146.191.100.202:46251> -

How many of these is normal? I am submitting a 1000-job cluster to the pool of 25 machines (50 VMs).
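Each job the submit machine currently has running is served by its own condor_shadow process, so the shadow count is a direct count of running jobs from this schedd. A small sketch, assuming `ps -ef` style lines like the listing above, tallies shadows per cluster from the job ID on each shadow's command line (the function and pattern names are hypothetical):

```python
import re
from collections import Counter

# Matches the job ID passed to condor_shadow in the listing above,
# e.g. "condor_shadow -f 80.9<146.191.100.202:46251>" -> cluster 80, proc 9.
SHADOW_RE = re.compile(r"condor_shadow\s+-f\s+(\d+)\.(\d+)")


def shadows_per_cluster(ps_text: str) -> Counter:
    """Count condor_shadow processes per cluster in ps -ef style text."""
    counts = Counter()
    for cluster, _proc in SHADOW_RE.findall(ps_text):
        counts[int(cluster)] += 1
    return counts
```

Fed the ten lines above, this reports ten shadows for cluster 80, i.e. ten running jobs against 50 available VMs.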


Looks like I may be running low on memory on my submitting machine as well.

top - 14:01:52 up 20 days, 21:24,  3 users,  load average: 0.79, 1.21, 0.92
Tasks:  71 total,   1 running,  70 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3% us,  1.7% sy,  0.7% ni, 97.0% id,  0.0% wa,  0.3% hi,  0.0% si
Mem:    484284k total,   476868k used,     7416k free,    14216k buffers
Swap:   999928k total,        0k used,   999928k free,   249768k cached
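On Linux, the `used` figure in top's Mem line includes buffers and page cache, which the kernel can largely reclaim; older versions of top print the page-cache figure on the Swap line as `cached`. For the output above, processes use about 476868 - 14216 - 249768 = 212884k, roughly 7416 + 14216 + 249768 = 271400k is effectively available, and no swap is in use. A minimal sketch that does this arithmetic from the two lines (helper names are hypothetical):

```python
import re


def parse_kb(line: str) -> dict:
    """Turn 'Mem: 484284k total, 476868k used, ...' into {'total': 484284, ...}."""
    return {name: int(value) for value, name in re.findall(r"(\d+)k (\w+)", line)}


def memory_summary(mem_line: str, swap_line: str) -> dict:
    mem = parse_kb(mem_line)
    swap = parse_kb(swap_line)
    # Buffers and page cache are counted as "used" but are largely reclaimable.
    reclaimable = mem["buffers"] + swap["cached"]
    return {
        "process_used_kb": mem["used"] - reclaimable,
        "available_kb": mem["free"] + reclaimable,
        "swap_used_kb": swap["used"],
    }
```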

I'm still unsure about some of this. Where exactly does the problem lie: the submitter or the executors?


thanks again

Chris

----- Original Message -----
From: "Matt Hope" <matthew.hope@xxxxxxxxx>

To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Cc: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
Sent: Thursday, December 08, 2005 8:21 AM
Subject: Re: [Condor-users] Problems with jobs

On 12/7/05, Chris Miles <chrismiles@xxxxxxxxxxxxxxxx> wrote:

I have managed to get that number up as high as 20 and even 50 with little difference. I am seeing more running jobs, but not many more: only 7 VMs max so far.


How many (non-held) clusters and jobs* are in your queue, and how often do you negotiate?

Since the schedd can only do one of its two tasks at a time (starting shadows or serving queue-info requests), it can fail to keep up.

A similar situation can occur if something/someone is running condor_q
against your schedd repeatedly.
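The effect Matt describes can be illustrated with a toy model: a single-threaded server works through tasks in FIFO order, so every queue query it serves pushes back the shadow starts queued behind it. This is a sketch only; the task labels and service times are made-up illustration values, not measurements of condor_schedd.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str      # "start_shadow" or "queue_query" (hypothetical labels)
    cost_s: float  # assumed service time in seconds


def shadow_start_times(tasks: list[Task]) -> list[float]:
    """Simulate a single-threaded server handling tasks in FIFO order and
    return the time at which each shadow start completes."""
    clock = 0.0
    starts = []
    for task in tasks:
        clock += task.cost_s  # only one task is processed at a time
        if task.kind == "start_shadow":
            starts.append(clock)
    return starts
```

With five 0.5-second shadow starts, the last one finishes at 2.5 s; insert a single 3-second query after the second start and the last one slips to 5.5 s. Repeated condor_q calls compound this delay.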

* If NEGOTIATE_ALL_JOBS_IN_CLUSTER is true, then jobs matter; if not, clusters matter.
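A literal reading of the footnote as a counting rule: with NEGOTIATE_ALL_JOBS_IN_CLUSTER true, the work scales with the number of non-held jobs; otherwise it scales with the number of clusters that still have non-held jobs. The sketch below uses hypothetical names and is an illustration of that rule, not Condor's negotiation code.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    cluster_id: int
    non_held_jobs: int  # jobs in this cluster that are not on hold


def negotiation_units(queue: list[Cluster], negotiate_all_jobs_in_cluster: bool) -> int:
    """Count what 'matters' per the footnote: jobs if the flag is true, else clusters."""
    active = [c for c in queue if c.non_held_jobs > 0]
    if negotiate_all_jobs_in_cluster:
        return sum(c.non_held_jobs for c in active)
    return len(active)
```

For a single 1000-job cluster like the one in this thread, the count is 1000 with the flag set and 1 without it.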

Matt

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

