
Re: [Condor-users] slow scheduling of dagman jobs



On Wed, 7 Sep 2011, Patty Bragger wrote:

I'm running into a performance issue of sorts when submitting DAGMan jobs.
When submitting a DAGMan job of, say, 100 nodes, I find that it takes quite a
while for all 100 nodes to show up in the queue.  After an initial wait of
about 12 seconds, the nodes are added to the queue at a rate of about 7 per
second.  The nodes have no dependencies on each other; they are completely
standalone and could be submitted without using DAGMan.  When I do submit jobs
without using DAGMan, the jobs are added to the queue much faster, about
100/second.  I can get that submission rate whether submitting one job with
a "queue 100" or submitting 100 separate jobs in one submit file.

Well, keep in mind that DAGMan is doing a separate condor_submit for each
node.  When I do that (outside of DAGMan) it's much slower than doing a
single condor_submit that queues 100 jobs.

So I think you're basically seeing the overhead of a condor_submit call
for every job versus a single condor_submit call.
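
To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures quoted in the question (about 12 seconds of initial wait, about 7 nodes/second under DAGMan, about 100 jobs/second with a plain condor_submit).  The function and variable names are just for illustration, and treating the whole gap as per-call condor_submit overhead is an assumption, not something measured here.

```python
# Approximate figures as reported in the original question.
NODES = 100
DAG_INITIAL_WAIT_S = 12.0    # delay before the first node shows up
DAG_RATE_PER_S = 7.0         # nodes added per second under DAGMan
DIRECT_RATE_PER_S = 100.0    # jobs added per second with one condor_submit


def time_to_queue(jobs, rate_per_s, initial_wait_s=0.0):
    """Seconds until `jobs` jobs are in the queue at a constant rate."""
    return initial_wait_s + jobs / rate_per_s


dag_total_s = time_to_queue(NODES, DAG_RATE_PER_S, DAG_INITIAL_WAIT_S)
direct_total_s = time_to_queue(NODES, DIRECT_RATE_PER_S)

# Implied cost per queued job, if the rate were set purely by per-call overhead.
per_node_ms = 1000.0 / DAG_RATE_PER_S
per_job_ms = 1000.0 / DIRECT_RATE_PER_S

print(f"DAGMan, one condor_submit per node: about {dag_total_s:.1f} s")
print(f"Single condor_submit, queue 100:    about {direct_total_s:.1f} s")
print(f"Implied cost per job: {per_node_ms:.0f} ms vs {per_job_ms:.0f} ms")
```

So for 100 nodes the reported rates work out to roughly 26 seconds versus roughly 1 second, which fits the picture of a fixed cost paid once per condor_submit call.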

Keep in mind that (at least with recent versions of DAGMan) you can queue
multiple jobs in a single submit file (as long as they are all part of the
same cluster).  I'm pretty sure (but not 100% sure) that that feature was
in 7.4.4.  Of course, depending on exactly how you are using DAGMan, this
may not be a good idea, but the option is there if one of your main goals
is to get jobs into the queue as fast as possible.
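
As a minimal sketch of what that could look like, the script below writes a DAG file with a single node whose submit file ends in "queue 100", so all 100 jobs land in one cluster from one condor_submit call.  The file names (batch.dag, batch.sub, batch.log), the node name AllJobs, and the executable my_job.sh are all hypothetical placeholders; adjust the submit commands to whatever your jobs actually need.

```python
from pathlib import Path
import tempfile


def write_single_node_dag(directory, n_jobs=100, executable="my_job.sh"):
    """Write a one-node DAG whose submit file queues n_jobs jobs in one cluster.

    All file names, the node name, and the executable are placeholders.
    """
    directory = Path(directory)
    submit_path = directory / "batch.sub"
    dag_path = directory / "batch.dag"

    submit_lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",           # each job gets its index 0..n_jobs-1
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = batch.log",
        f"queue {n_jobs}",                   # one cluster containing n_jobs jobs
    ]
    submit_path.write_text("\n".join(submit_lines) + "\n")

    # A single DAG node, so DAGMan runs condor_submit once for the whole batch.
    dag_path.write_text(f"JOB AllJobs {submit_path.name}\n")
    return submit_path, dag_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sub, dag = write_single_node_dag(tmp)
        print(dag.read_text(), end="")
        print(sub.read_text(), end="")
```

You would then run condor_submit_dag on the generated DAG file.  The trade-off is that DAGMan sees the whole batch as one node rather than 100, so things like per-node dependencies or retries apply to the batch as a whole, which is why this may not suit every workflow.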
Kent Wenger
Condor Team