I’m pretty new to Condor, and I’m trying to understand the best approach for our application. We need to process thousands of images through a series of algorithms that do things like feature extraction. These algorithms can be, and have been, represented in a DAG like this:

    # DVF.DAG
    #
    JOB ProcessingArea PA.condor
    JOB EDMS0 EDMS0.condor
    JOB QVT QVT.condor
    JOB MMlnD MMLND.condor
    JOB PCFF PCFF.condor
    JOB EDMS1 EDMS1.condor
    JOB DLP DLP.condor
    PARENT ProcessingArea CHILD EDMS0 EDMS1 QVT MMlnD PCFF
    PARENT QVT MMlnD PCFF CHILD DLP
    DOT dvf.dot

See attached for the diagram.

I’ve been doing a lot of reading lately, trying to figure out the best (or a good enough) approach for our application. One change I’ll be making is to use the VARS syntax to create a single submission
file for the DAG, since every algorithm is implemented in the same executable and only one or two command-line arguments vary between algorithms.

We need to run these seven algorithms over each image, and each image is in its own directory, so I’m trying to figure out how others approach this. I thought I’d be able to use something like the flexible queue command to iterate over the images, but my reading through the mailing list archive explained why this isn’t supported with DAGMan. At this point, the only option I’ve come up with is to write a script that creates a unique DAG file for each image, and then either submit each DAG file individually or wrap all of the individual DAGs into a “master” DAG as SUBDAGs.

I guess I’m ultimately asking for pointers: what approaches have others used in situations like this?

-Sean Milligan
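A minimal sketch of the script-based approach described above, in Python. It writes one DAG per image directory, reusing the node names and PARENT/CHILD dependencies from DVF.DAG, and a master DAG that lists each per-image DAG with SUBDAG EXTERNAL. Everything else is hypothetical: the shared submit file name `dvf.condor`, the VARS macro names `algo` and `image_dir`, and the convention of passing the node name as the algorithm argument. The real submit file would have to reference those macros (for example `arguments = $(algo) $(image_dir)`), and the sketch assumes ProcessingArea can use the same submit file as the other nodes.

```python
"""Hypothetical generator: one DAG per image directory plus a master DAG.

Assumptions (not from the original post): a single shared submit file named
dvf.condor that reads $(algo) and $(image_dir), and the node name doubling as
the algorithm argument.
"""
from pathlib import Path

# Node names and dependencies copied from DVF.DAG.
NODES = ["ProcessingArea", "EDMS0", "QVT", "MMlnD", "PCFF", "EDMS1", "DLP"]
EDGES = [
    (["ProcessingArea"], ["EDMS0", "EDMS1", "QVT", "MMlnD", "PCFF"]),
    (["QVT", "MMlnD", "PCFF"], ["DLP"]),
]
SUBMIT_FILE = "dvf.condor"  # hypothetical shared submit description


def per_image_dag(image_dir: str) -> str:
    """Return the text of a DAG that runs every node against one image directory."""
    lines = [f"# DAG for {image_dir}"]
    for node in NODES:
        lines.append(f"JOB {node} {SUBMIT_FILE}")
        # VARS sets $(algo) and $(image_dir) for this node's copy of the submit file.
        lines.append(f'VARS {node} algo="{node}" image_dir="{image_dir}"')
    for parents, children in EDGES:
        lines.append(f"PARENT {' '.join(parents)} CHILD {' '.join(children)}")
    return "\n".join(lines) + "\n"


def master_dag(dag_files: list[str]) -> str:
    """Return a top-level DAG that runs each per-image DAG as an external SUBDAG."""
    lines = [f"SUBDAG EXTERNAL img{i} {path}" for i, path in enumerate(dag_files)]
    return "\n".join(lines) + "\n"


def write_all(image_dirs: list[str], out_dir: str) -> Path:
    """Write one DAG per image plus master.dag into out_dir; return the master path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    dag_files = []
    for i, image_dir in enumerate(image_dirs):
        dag_path = out / f"image{i}.dag"
        dag_path.write_text(per_image_dag(image_dir))
        # Absolute paths so the master DAG does not depend on the submit directory.
        dag_files.append(str(dag_path.resolve()))
    master = out / "master.dag"
    master.write_text(master_dag(dag_files))
    return master
```

For example, `write_all(["/data/img001", "/data/img002"], "dags")` followed by `condor_submit_dag dags/master.dag` would run the full DAG for both images. SUBDAG EXTERNAL starts a separate condor_dagman for each image. DAGMan's SPLICE keyword is an alternative that folds each per-image DAG into the single top-level DAGMan process, which may scale better to thousands of images.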
Attachment:
dag.jpeg