
Re: [Condor-users] Notification of multiple job completion



On Fri, 29 Oct 2010, Rob Matthews wrote:

I am new to Condor and am using it for Monte Carlo simulation. Each MC run
is independent and carried out by a given executable which produces a
results file, so I have a wrapper program which populates the inputs for
these and submits all the needed runs to the Condor queue. This all works
great except now I need some way of knowing when all the MC runs I submitted
are complete so I can postprocess results (i.e. parse all the individual
results files and operate as needed).

Right now my wrapper code does this by polling the local directory every 5
seconds looking for the needed results files but this becomes inefficient
with large simulations. Is there a mechanism in Condor to possibly execute a
program (like my postprocessing code) once all the jobs submitted to the
queue are complete?

You can do this by putting all of your MC jobs into a DAG with no 
dependencies; the DAG as a whole only finishes once every job in it has 
finished (see 
http://www.cs.wisc.edu/condor/manual/v7.5/2_10DAGMan_Applications.html#SECTION003106500000000000000 
for info about DAGMan).
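
As a rough sketch of what such a flat DAG could look like, the Python below 
builds the text of a DAG input file with one JOB line per run and no 
PARENT/CHILD lines at all. The node names (mc0, mc1, ...) and the submit file 
names (mc_0.sub, ...) are made up for illustration, and it assumes you 
already have one Condor submit file per MC run.

```python
def flat_dag(submit_files):
    """Return DAG file text: one JOB line per submit file, no dependencies."""
    lines = []
    for i, sub in enumerate(submit_files):
        # "JOB <node name> <submit file>" declares one node of the DAG.
        lines.append(f"JOB mc{i} {sub}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical submit file names, one per MC run.
    print(flat_dag([f"mc_{i}.sub" for i in range(3)]), end="")
```

You would save that text to a file (say mc_runs.dag, also a made-up name) 
and submit it with condor_submit_dag instead of submitting each run 
yourself.
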
However, from your description, it sounds like you might benefit from 
using DAGMan for more than just getting the notification when things are 
done.  You could make the code that creates the input files a node in the 
DAG, then have all of the actual MC jobs be dependent on that node, and 
then have another node that does the postprocessing that's dependent on 
all of the MC nodes.  This would get you the correct sequence of job 
submissions without any coding on your part, and it would also enable you 
to get rid of your wrapper code that does the actual submits.  Plus you 
get all the other goodness of DAGMan, like options to re-try failed 
nodes...
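
To make that layout concrete, here is a minimal sketch in Python that 
generates the text of such a three-stage DAG: one node that creates the 
inputs, N MC nodes that depend on it, and one postprocessing node that 
depends on all of the MC nodes, with a RETRY line for each MC node. The 
submit file names (prepare.sub, mc_run.sub, postprocess.sub), the node 
names, and the "run" macro are assumptions for illustration, not anything 
from your setup.

```python
def staged_dag(n_runs, retries=2):
    """Return DAG file text for: prepare -> n_runs MC nodes -> post."""
    if n_runs < 1:
        raise ValueError("need at least one MC run")
    mc = [f"mc{i}" for i in range(n_runs)]
    lines = ["JOB prepare prepare.sub"]  # hypothetical submit file
    for name in mc:
        # All MC nodes share one submit file; VARS passes a per-node macro
        # that the submit file could reference as $(run).
        lines.append(f"JOB {name} mc_run.sub")
        lines.append(f'VARS {name} run="{name}"')
        # Re-run a failed MC node up to `retries` times.
        lines.append(f"RETRY {name} {retries}")
    lines.append("JOB post postprocess.sub")  # hypothetical submit file
    # Every MC node waits for prepare; post waits for every MC node.
    lines.append(f"PARENT prepare CHILD {' '.join(mc)}")
    lines.append(f"PARENT {' '.join(mc)} CHILD post")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(staged_dag(3), end="")
```

Written to a file and handed to condor_submit_dag, this would let DAGMan 
do the submits in order, so the postprocessing only starts after the last 
MC node has succeeded.
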
Kent Wenger
Condor Team