[Condor-users] Best Practices: how to handle a DAG with an unknown number of jobs in one step...
- Date: Thu, 25 Feb 2010 19:57:00 +1100
- From: Daniel Pittman <daniel@xxxxxxxxxxxx>
G'day.
We have a regular set of jobs that I am a bit puzzled about how best to model
in Condor. Specifically, the model is this:
Step 1: Fetch data from our collection system.
Step 2: Process that data into an unknown number of "packs".
Step 3: Generate one report for each pack from step 2.
Now, steps one and two are pretty easy, but I would ideally like to have a
single DAG that would encapsulate the whole process.
What gives me trouble is working out how to get step 3 to generate one Condor
job for each pack, since the packs can trivially be processed in parallel, and
we generally only run one or two of the overall jobs at any given time.[1]
What I would like is, effectively, to have something like this:
,----[ example.sub ]
| executable = generate-report
| arguments = pack.$(Process)
| queue <magic here to count the number of pack files>
`----
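Failing built-in support for that, a small wrapper run after step 2 could count
the pack files and write the submit file itself. A minimal sketch, assuming the
pack.* naming and generate-report executable from the example above (the
make_submit helper is hypothetical):

```shell
#!/bin/sh
# make_submit DIR OUTFILE: write a submit file that queues one
# generate-report job per pack.* file found in DIR.
make_submit() {
    dir=$1
    out=$2
    n=0
    for f in "$dir"/pack.*; do
        [ -e "$f" ] && n=$((n + 1))   # skip the unexpanded glob when no packs exist
    done
    {
        echo "executable = generate-report"
        echo "arguments = pack.\$(Process)"
        echo "queue $n"
    } > "$out"
}
```

Running make_submit on the pack directory and then condor_submit on the
resulting file would queue exactly as many report jobs as there are packs.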
...or, alternatively, to have the DAG do the same thing, submitting the job
once for each pack file.
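For the DAG variant, much the same script could emit one node per pack,
passing the pack name to a shared submit file via VARS; step 2's job could
generate this file for the outer DAG to run. A sketch, again assuming the
pack.* naming and a hypothetical report.sub that uses arguments = $(pack):

```shell
#!/bin/sh
# make_dag DIR OUTFILE: write a DAG with one "report" node per pack.*
# file in DIR. Each node runs report.sub, which is assumed to contain
#   arguments = $(pack)
# so that every node processes its own pack.
make_dag() {
    dir=$1
    out=$2
    : > "$out"                        # truncate/create the DAG file
    i=0
    for f in "$dir"/pack.*; do
        [ -e "$f" ] || continue       # no packs at all: leave the DAG empty
        printf 'JOB report%d report.sub\n' "$i" >> "$out"
        printf 'VARS report%d pack="%s"\n' "$i" "$(basename "$f")" >> "$out"
        i=$((i + 1))
    done
}
```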
Is this possible, or do we need to write something custom?
Daniel
Footnotes:
[1] Which, if it isn't clear, means we can't count on running multiple of
these overall processes at once to take advantage of the multiple machines
that we have available to run the jobs on.
--
✣ Daniel Pittman ✉ daniel@xxxxxxxxxxxx ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons