Re: [Condor-devel] Feedback: Quill stress testing



> On 10/18/05, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> > seconds. It's when we remove large numbers of jobs that the quill
> > binary really doesn't do well.
> >
> > If I submit 300 1000-job clusters and then condor_rm all 300
> > clusters with one command line the database grinds to a halt.
> 
> On a side note I should point out the same thing* happens to many of
> my users when removing large numbers of jobs without quill (even if
> those jobs aren't even running and are on hold!)
> 
> the condor_rm process in general seems seriously expensive
> 
> Matt
> 
> * takes ages - the schedd stops responding, and it ends up needing a
> net stop condor to sort it out

NB: my original email should have read: "If I submit 300 100-job
clusters" not "1000-job clusters" -- total # of procs was 30k across 300
clusters. Apparently I have forgotten how to divide.
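
For the record, the test amounted to roughly the following (the submit
file contents and the sleep job are placeholders here, not our real
workload, and the cluster ids are assumed to run 1..300):

    # stress.sub -- trivial 100-proc cluster
    universe   = vanilla
    executable = /bin/sleep
    arguments  = 600
    queue 100

    # submit 300 clusters (30k procs total) ...
    for i in $(seq 1 300); do condor_submit stress.sub; done

    # ... then remove all 300 clusters with one command line
    condor_rm $(seq 1 300)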

Weird, our schedd didn't seem to be very bogged down when we did this. I
was able to condor_submit and get a prompt back within a few minutes
despite the removal of 30k jobs going on in the background. The schedd
certainly wasn't taking much CPU: looking at top, it was all
condor_quill, and the rest were just ordinary OS processes. Does the
submission request get cached? I didn't think it did.
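
A crude way to see the effect is to watch which daemon is eating CPU
and to time a fresh submit while the removal is running; something like
this (Linux-specific flags, adjust as needed):

    # is it the schedd or the quill daemon that's busy?
    top -b -n 1 | grep -E 'condor_(schedd|quill)'

    # how long does a submit take while the removal is in progress?
    time condor_submit stress.sub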

I know that a 30k job removal isn't necessarily a realistic usage
scenario, but it's good to know the limits. The upper comfortable
removal bound seems to be around 3000 jobs at once. Beyond that,
performance degrades pretty quickly.
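
If you do have to clear out a big backlog, staying under that bound by
removing a handful of clusters per command seems safer. A rough sketch,
assuming 100-proc clusters with ids 1..300 and a batch size picked from
the 3000-job figure:

    # remove 30 clusters (~3000 jobs) at a time, pausing between batches
    for batch in $(seq 1 30 300); do
        condor_rm $(seq $batch $((batch + 29)))
        sleep 60
    done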

- Ian