HTCondor Project List Archives




Re: [Condor-devel] Feedback: Quill stress testing



On 10/18/05, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
> On Oct 18, 2005, at 1:54 PM, Matt Hope wrote:
>
> > On 10/18/05, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> >
> >> seconds. It's when we remove large numbers of jobs that the quill binary
> >> really doesn't do well.
> >>
> >> If I submit 300 1000-job clusters and then condor_rm all 300 clusters
> >> with one command line the database grinds to a halt.
> >
> > On a side note I should point out the same thing* happens to many of
> > my users when removing large numbers of jobs without quill (even if
> > those jobs aren't even running and are on hold!)
> >
> > the condor_rm process in general seems seriously expensive
> >
> > Matt
> >
> > * takes ages - schedd stops responding, ends up needing a net stop
> > condor to sort it out
>
> Does the condor_rm take a long time to complete, or does it take a
> long time for the jobs to leave the queue? Leaving the queue forces
> at least one disk sync per job.

condor_rm takes under 1 min to return (seems reasonable)

But after this, for at least tens of minutes*, the schedd won't respond
- at least to condor_q

* Most users give up before this and start killing processes and
wiping their job queue log
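
To put a rough shape on the "at least one disk sync per job" point: removing 300 clusters of 1000 jobs each means 300 x 1000 = 300,000 jobs leaving the queue, so by that rule at least 300,000 disk syncs. The sketch below is only an illustration of why that pattern is expensive; it is not how the schedd writes its job queue log. The function names (write_log, compare), the record text and the file names are all made up for the example. It writes the same records to a file twice, once with an fsync after every record and once with a single fsync at the end, and reports the sync count and elapsed time for each.

```python
import os
import tempfile
import time


def write_log(path, n_records, sync_every):
    """Write n_records lines, forcing a disk sync after every
    `sync_every` records. Returns (elapsed_seconds, fsync_count).

    Hypothetical helper for illustration; not HTCondor code.
    """
    syncs = 0
    start = time.perf_counter()
    with open(path, "w") as f:
        for i in range(n_records):
            f.write(f"remove job {i}\n")
            if (i + 1) % sync_every == 0:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to push it to disk
                syncs += 1
        # One last sync if some trailing records were not yet synced.
        if n_records % sync_every != 0:
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return time.perf_counter() - start, syncs


def compare(n_records=1000):
    """Run the per-record and the batched variant on temporary files."""
    with tempfile.TemporaryDirectory() as d:
        per_job = write_log(os.path.join(d, "per_job.log"), n_records, 1)
        batched = write_log(os.path.join(d, "batched.log"), n_records, n_records)
    return per_job, batched


if __name__ == "__main__":
    (t_per, s_per), (t_batch, s_batch) = compare()
    print(f"per-job sync: {s_per} fsyncs in {t_per:.3f}s")
    print(f"batched sync: {s_batch} fsyncs in {t_batch:.3f}s")
```

The timings depend heavily on the disk and filesystem, so no numbers are quoted here; the sync counts are the part that carries over directly to the 300,000-job case.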