HTCondor Project List Archives




Re: [Condor-devel] Feedback: Quill stress testing



> On Oct 18, 2005, at 1:54 PM, Matt Hope wrote:
> 
> > On 10/18/05, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> >
> >> seconds. It's when we remove large numbers of jobs that the Quill
> >> binary really doesn't do well.
> >>
> >> If I submit 300 1000-job clusters and then condor_rm all 300
> >> clusters with one command line, the database grinds to a halt.
> >
> > On a side note, I should point out that the same thing* happens to many of
> > my users when removing large numbers of jobs without Quill (even if
> > those jobs aren't running and are on hold!)
> >
> > The condor_rm process in general seems seriously expensive.
> >
> > Matt
> >
> > * takes ages - the schedd stops responding, and it ends up needing a
> > "net stop condor" to sort it out
> 
> Does the condor_rm take a long time to complete, or does it take a
> long time for the jobs to leave the queue? Leaving the queue forces
> at least one disk sync per job.
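As a rough plausibility check on the point that leaving the queue forces at least one disk sync per job, the Python sketch below multiplies out the job count from Ian's test (300 clusters of 1,000 jobs). The per-sync latencies are hypothetical placeholders, not HTCondor measurements, and the estimate covers only the schedd's own syncs, not any work Quill does in its database.

```python
# Back-of-envelope lower bound on time spent in disk syncs when a large
# batch of jobs leaves the schedd queue.
# ASSUMPTION: the per-sync latencies used below are hypothetical
# placeholders, not measured HTCondor figures; substitute a value
# measured on your own disk.

CLUSTERS = 300            # clusters submitted in the test described above
JOBS_PER_CLUSTER = 1000   # jobs per cluster
SYNCS_PER_JOB = 1         # lower bound: "at least one disk sync per job"


def estimated_sync_seconds(sync_latency_ms: float) -> float:
    """Return the minimum total time (in seconds) spent in disk syncs."""
    total_jobs = CLUSTERS * JOBS_PER_CLUSTER
    total_syncs = total_jobs * SYNCS_PER_JOB
    return total_syncs * sync_latency_ms / 1000.0


if __name__ == "__main__":
    for latency_ms in (1.0, 5.0, 10.0):  # hypothetical per-sync latencies
        minutes = estimated_sync_seconds(latency_ms) / 60.0
        print(f"{latency_ms:>5.1f} ms/sync -> at least {minutes:.1f} minutes")
```

Under these assumptions, 300,000 removals at 5 ms per sync come to about 25 minutes of syncing alone, so a stall lasting tens of minutes would not be surprising even before any Quill overhead is counted.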

In my case the condor_rm command returns reasonably quickly, within 2
minutes. What ensues, though, is a complete lockup of the Quill DB. If I
stop all daemons after about 20 minutes, delete the Quill database, and
bring everything back up, the jobs are gone from the DB and from the schedd
queue. Because Quill is running, and there is no quick and easy way to
bypass its use, I'm not sure how long it took the schedd to clear
its queue.

- Ian