Re: [Condor-devel] Feedback: Quill stress testing
- Date: Tue, 18 Oct 2005 15:20:15 -0400
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: Re: [Condor-devel] Feedback: Quill stress testing
> On 10/18/05, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> > seconds. It's when we remove large numbers of jobs that the quill
> > binary really doesn't do well.
> >
> > If I submit 300 1000-job clusters and then condor_rm all 300 clusters
> > with one command line the database grinds to a halt.
>
> On a side note, I should point out the same thing* happens to many of
> my users when removing large numbers of jobs without quill (even when
> those jobs aren't running at all and are on hold!)
>
> The condor_rm process in general seems seriously expensive.
>
> Matt
>
> * takes ages - the schedd stops responding, and it ends up needing a
> net stop condor to sort it out
NB: my original email should have read: "If I submit 300 100-job
clusters" not "1000-job clusters" -- the total # of procs was 30k across
300 clusters. Apparently I have forgotten how to divide.
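For reference, the stress test boiled down to something like the
following. The submit file here is an illustrative sketch, not our
actual job -- the universe, executable, and arguments are placeholders:

    # stress.sub -- hypothetical submit description
    universe   = vanilla
    executable = /bin/sleep
    arguments  = 600
    queue 100

    # Submit 300 clusters of 100 procs each (30k jobs total):
    for i in `seq 1 300`; do condor_submit stress.sub; done

    # Then remove everything with one command line:
    condor_rm -all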
Weird, our schedd didn't seem to be very bogged down when we did this. I
was able to condor_submit and get a prompt back within a few minutes
despite the removal of 30k jobs going on in the background. The schedd
certainly wasn't taking much CPU: looking at top, it was all
condor_quill, and the rest were just any old processes from the OS. Does
the submission request get cached? I didn't think it did.
I know that a 30k-job removal isn't necessarily a realistic usage
scenario, but it's good to know the limits. The upper comfortable
removal bound seems to be around 3000 jobs at once; beyond that,
performance degrades pretty quickly.
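In the meantime, a workaround that stays under that bound is to remove
clusters in smaller batches rather than all at once. A rough, untested
sketch (the cluster IDs and the pause length are illustrative):

    # Remove clusters 1 through 300 one at a time, pausing between
    # removals so the schedd and quill can catch up:
    for c in `seq 1 300`; do condor_rm $c; sleep 5; done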
- Ian