
[Condor-users] Condor's DAG scalability?

Hello,

I'm planning to use Condor on a cluster of ~50 CPUs to carry out a large set of experiments. Each experiment consists of several modules that must execute sequentially. The block diagram of each experiment includes both loops and nested loops. Fortunately, loop iterations are completely independent of one another data-wise.

I see that Condor's DAG functionality allows only one job per submit file referenced with the "JOB" directive. Therefore, the most straightforward way to condor-ize my experiments seems to be to dynamically generate a DAG file with (potentially) hundreds or thousands of JOB entries, and PARENT/CHILD entries with hundreds or thousands of arguments.
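
For concreteness, here is a minimal sketch of what one generated DAG might look like, with a three-iteration loop unrolled into independent nodes (node names and submit-file names here are just placeholders):

    # experiment.dag -- generated by a script
    JOB  iter0  iter0.sub
    JOB  iter1  iter1.sub
    JOB  iter2  iter2.sub
    JOB  merge  merge.sub
    # iterations are data-independent, so they only serialize at the merge
    PARENT iter0 iter1 iter2 CHILD merge

The real generated files would have hundreds or thousands of such JOB lines, and PARENT/CHILD lines with correspondingly long node lists.
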
May I solicit some words of wisdom on how well Condor's DAG functionality scales when used this way? :-) Have others used Condor's DAG tools for single experiments with thousands (or even millions) of component processes? Of course, some of these components will be hidden under nested condor_dagman executions (sketched below), but there will nevertheless be a lot of schedule processing going on. Will Condor and/or condor_dagman be able to handle this?
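
For the nesting, my understanding is that a DAG node can run condor_dagman itself: condor_submit_dag -no_submit inner.dag should generate inner.dag.condor.sub, which the outer DAG can then treat as an ordinary node (file and node names here are hypothetical):

    # outer.dag -- one node is itself a DAGMan instance
    JOB  setup       setup.sub
    JOB  inner_loop  inner.dag.condor.sub
    PARENT setup CHILD inner_loop

so the outer condor_dagman would end up scheduling other condor_dagman instances.
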
Any advice is appreciated!  Thanks,

 - Armen

--
Armen Babikyan
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796