
Re: [Condor-users] Condor overload???



At 09:29 AM 7/14/2006 +0000, John Coulthard wrote:
> I submitted 744,400 jobs to our condor cluster. Was I a little
> overambitious? Is there a recommended limit?
Yes, that's probably too ambitious if you just do it all at once.

It's hard to give a precise recommended limit. People who have tuned their systems reasonably well can submit a few thousand jobs at a time. Here are a few thoughts on what you can do:
1) If your jobs are short-running jobs, is it possible for you to
combine them? Condor excels at running longer jobs, and if you end
up running fewer, larger jobs, everything will work better. (There
is a rough sketch of this below the manual link.)
2) You can use DAGMan to throttle your submissions. DAGMan lets you
manage sets of dependent jobs (job A runs, then B and C can run
simultaneously, then D runs, that sort of thing), but you don't have
to use it for that purpose. You can make a single DAG with 800,000
independent jobs in it, then tell DAGMan to submit the jobs to
Condor bit by bit. (A sketch of this also follows below.)
DAGMan is described in Section 2.12 of the Condor 6.7 manual. Note
the -maxidle option, which limits how many idle jobs DAGMan will
allow in the queue at once; this effectively throttles how much you
submit to Condor at a time.
http://www.cs.wisc.edu/condor/manual/v6.7/2_12DAGMan_Applications.html
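
To make point 1 concrete, here is a rough sketch of batching many
short tasks into a single Condor job. The script name, the chunking
scheme, and the per-task command are placeholders, and it assumes a
shared filesystem (no file-transfer settings shown):

    #!/bin/sh
    # process_chunk.sh -- runs many short tasks inside one Condor job.
    # $1 names a file listing the inputs this chunk should handle.
    while read input; do
        ./my_short_task "$input"   # placeholder for whatever each small job does
    done < "$1"

    # combined.submit -- queue one job per chunk of inputs, not one per input
    universe   = vanilla
    executable = process_chunk.sh
    arguments  = chunk.$(Process)
    output     = chunk.$(Process).out
    error      = chunk.$(Process).err
    log        = combined.log
    # e.g. 744,400 inputs split into chunks of roughly 1000
    queue 745

That way Condor sees about 745 jobs instead of 744,400, which it
will handle much more comfortably.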
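
And here is a minimal sketch of the DAGMan approach from point 2.
The file names and the throttle value are arbitrary examples:

    # jobs.dag -- one JOB line per independent job; no PARENT/CHILD
    # lines are needed because the jobs don't depend on each other
    JOB job0 myjob.submit
    VARS job0 infile="input.0"
    JOB job1 myjob.submit
    VARS job1 infile="input.1"
    # ... and so on for the rest of the jobs
    # (myjob.submit can refer to $(infile) in its arguments)

    # Submit the whole DAG, keeping at most 1000 idle jobs queued at once:
    condor_submit_dag -maxidle 1000 jobs.dag

You would generate the .dag file with a small script rather than by
hand, of course.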


> Anyway, what I need is a method of clearing the queued jobs so I can
> get back to work on smaller batches, but condor_q seems to hang, so I
> can't actually determine what's in the queue, and 'condor_rm job#'
> also seems to hang.  I've tried restarting Condor, but obviously the
> queue remains.  Is there a backdoor method of clearing this?
If you want to totally clear the queue, remove the job_queue* files 
in your spool directory.
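
A rough sketch of the whole sequence, assuming a typical
installation (exactly how you stop and restart Condor depends on
your setup, e.g. init script vs. condor_off/condor_master):

    # Stop Condor first so nothing is holding the queue files open
    condor_off -master               # or e.g. /etc/init.d/condor stop
    # The spool directory can be found with condor_config_val
    rm "$(condor_config_val SPOOL)"/job_queue*
    # Restart Condor; it will come back with an empty queue
    condor_master                    # or e.g. /etc/init.d/condor start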
-alain