Re: [Condor-devel] Feedback: Quill stress testing
- Date: Wed, 19 Oct 2005 09:29:45 +0100
- From: Matt Hope <matthew.hope@xxxxxxxxx>
- Subject: Re: [Condor-devel] Feedback: Quill stress testing
On 10/18/05, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
> On Oct 18, 2005, at 1:54 PM, Matt Hope wrote:
>
> > On 10/18/05, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> >
> >> seconds. It's when we remove large numbers of jobs that the quill binary
> >> really doesn't do well.
> >>
> >> If I submit 300 1000-job clusters and then condor_rm all 300 clusters with
> >> one command line the database grinds to a halt.
> >
> > On a side note, I should point out that the same thing* happens to many of
> > my users when removing large numbers of jobs without quill (even if
> > those jobs aren't running and are on hold!)
> >
> > The condor_rm process in general seems seriously expensive.
> >
> > Matt
> >
> > * It takes ages: the schedd stops responding, and it ends up needing a
> > net stop condor to sort it out.
>
> Does the condor_rm take a long time to complete, or does it take a
> long time for the jobs to leave the queue? Leaving the queue forces
> at least one disk sync per job.
condor_rm takes under a minute to return, which seems reasonable.
But after that, the schedd won't respond (at least not to condor_q)
for at least tens of minutes*.
* Most users give up before then and start killing processes and
wiping their job queue log.
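Jaime's point that leaving the queue forces at least one disk sync per job can be turned into a rough back-of-envelope estimate for Ian's 300 x 1000-job scenario. This is a hypothetical sketch, not anything taken from Condor: it assumes exactly one serial disk sync per removed job, the per-sync latencies (1, 5 and 10 ms) are made-up illustrative values rather than measurements, and the function name removal_sync_time_s is invented for this example.

```python
# Back-of-envelope estimate of time spent on disk syncs after a mass condor_rm.
# ASSUMPTIONS (not from the thread or from Condor's code): every removed job
# costs `syncs_per_job` synchronous disk syncs, the syncs happen one after
# another, and the latencies used below are illustrative guesses.


def removal_sync_time_s(clusters: int, jobs_per_cluster: int,
                        sync_latency_ms: float, syncs_per_job: int = 1) -> float:
    """Return the total seconds spent syncing for a mass removal (hypothetical model)."""
    total_jobs = clusters * jobs_per_cluster
    return total_jobs * syncs_per_job * sync_latency_ms / 1000.0


if __name__ == "__main__":
    # Ian's scenario from the thread: 300 clusters of 1000 jobs each.
    for latency_ms in (1.0, 5.0, 10.0):  # hypothetical per-sync latencies
        seconds = removal_sync_time_s(300, 1000, latency_ms)
        print(f"{latency_ms:4.1f} ms/sync -> {seconds / 60:5.1f} minutes")
```

Under those assumptions, 300,000 removals at 5 to 10 ms per sync come to roughly 25 to 50 minutes, the same order as the tens of minutes of schedd unresponsiveness described above; actual sync latency depends heavily on the disk and filesystem.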