
[Condor-users] Best Practices: how to handle a DAG with an unknown number of jobs in one step...



G'day.

We have a regularly recurring set of jobs, and I am a bit puzzled about how
best to model them in Condor.  Specifically, the process is this:

Step 1: Fetch data from our collection system.
Step 2: Process that data into an unknown number of "packs".
Step 3: Generate one report for each pack produced in step 2.

Now, steps one and two are pretty easy, but I would ideally like to have a
single DAG that encapsulates the whole process.
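
The fixed part of that DAG is simple enough; a minimal sketch, with made-up
node and file names:

,----[ pipeline.dag ]
| # steps 1 and 2 are a straight chain:
| JOB fetch   fetch.sub
| JOB process process.sub
| PARENT fetch CHILD process
| # ...but step 3 needs one node per pack, and we don't know how many
| # packs exist until "process" has finished.
`----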

What gives me trouble is working out how to get step 3 to generate one
Condor job for each pack: the reports can trivially run in parallel, and we
generally only run one or two of these overall pipelines at any given
time.[1]


What I would like is, effectively, to have something like this:

,----[ example.sub ]
| executable = generate-report
| arguments  = pack.$(Process)
| queue <magic here to count the number of pack files>
`----
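
One workaround would be a wrapper script that counts the packs and writes
the queue line itself.  A minimal sketch, assuming the packs land in the
current directory as pack.0, pack.1, and so on (submit-reports.sh is a
made-up name):

,----[ submit-reports.sh ]
| #!/bin/sh
| # Count the pack files and submit one generate-report job per pack.
| count=$(ls pack.* 2>/dev/null | wc -l)
| [ "$count" -gt 0 ] || { echo "no packs found" >&2; exit 1; }
| {
|     echo "executable = generate-report"
|     echo "arguments  = pack.\$(Process)"
|     echo "queue $count"
| } > example.sub
| condor_submit example.sub
`----

...but that is exactly the sort of custom glue I was hoping Condor could
save us from.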

Alternately, the DAG itself could do the same thing, submitting the job
once for each file.
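
The only way I can see to do that is with a nested DAG: a POST script on
the processing node writes out a second DAG with one report node per pack,
which DAGMan then runs as a nested workflow.  A sketch, assuming SUBDAG
EXTERNAL behaves the way I hope (make-reports-dag.sh is hypothetical, and
would emit one JOB line per pack file):

,----[ pipeline.dag ]
| JOB fetch   fetch.sub
| JOB process process.sub
| # after processing, write reports.dag: one JOB line per pack file
| SCRIPT POST process make-reports-dag.sh
| SUBDAG EXTERNAL reports reports.dag
| PARENT fetch CHILD process
| PARENT process CHILD reports
`----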

Is this possible, or do we need to write something custom?

        Daniel

Footnotes: 
[1]  Which, if it isn't clear, means we can't count on several of these
     pipelines running at once to take advantage of the multiple machines
     we have available to run the jobs.

-- 
✣ Daniel Pittman            ✉ daniel@xxxxxxxxxxxx            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons