Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Dagman with a variable number of jobs

Date: Wed, 2 Jun 2010 09:25:14 -0500 (CDT)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [Condor-users] Dagman with a variable number of jobs

On Tue, 1 Jun 2010, Robert Sandilands wrote:

I am just fishing for some ideas on how to better handle the following scenario:
Job 1 collects a list of data objects to process from a database. This list can be of variable size and this size is unknown until the first job finishes.
Job 2 .. N then processes the data objects in batches of up to 1,000 items per job and updates the database.
As additional fun this needs to run once a day or on a continuous basis depending on the specific data objects.
My current attempt to solve this involves running a script that submits a dag (right terminology?) and waits for it to finish. It then will sleep for the required amount of time and resubmit the dag. Inside the .dag file I use PRE scripts to determine which individual jobs need to be submitted and which not.
This works fine if there is a reasonable upper limit to the number of data objects. The number of items in the list can be anything from 1,000 to 1,000,000.
If we assume that no single job should process more than 1,000 items then it implies that there can be between 2 and 1,001 jobs in the dag.
Is it even possible to write a dag with that number of dependencies especially since there is only 1 parent? I have tested up to 51 jobs and that seems to run without any issues.
And what do you do if the list suddenly grows to have 1,000,001 entries?

Any ideas would be appreciated.


We have users running DAGs with up to around 500k nodes, so having 1000
children of one parent should be no problem.

I'd recommend slightly changing your approach, though -- I think this is a great case for using nested DAGs. If you take this approach, your top-level DAG would have two nodes -- one is the node that figures out how much processing has to be done, and writes the lower-level DAG that does the processing, and one is the node that actually runs the lower-level DAG.
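
A minimal sketch of the nested-DAG layout described above, in Python. The file names (collect.sub, process.sub, inner.dag), the node names, and the VARS macro names start and count are assumptions chosen for illustration; none of them come from the original thread. The top-level DAG is fixed, and the first node (or a POST script attached to it) would call something like inner_dag_text() once the item count is known and write the result to inner.dag.

```python
BATCH_SIZE = 1000  # upper bound per job, taken from the question

# Top-level DAG: node A collects the list and writes inner.dag;
# node B runs inner.dag as a nested DAG after A finishes.
# collect.sub and inner.dag are hypothetical file names.
OUTER_DAG = """\
JOB A collect.sub
SUBDAG EXTERNAL B inner.dag
PARENT A CHILD B
"""


def inner_dag_text(num_items, batch_size=BATCH_SIZE, submit_file="process.sub"):
    """Return the text of a lower-level DAG with one node per batch.

    Each node reuses the same (hypothetical) submit file and receives
    its slice of the list through VARS macros named start and count.
    """
    num_batches = (num_items + batch_size - 1) // batch_size  # integer ceiling
    lines = []
    for i in range(num_batches):
        start = i * batch_size
        count = min(batch_size, num_items - start)
        lines.append(f"JOB batch{i} {submit_file}")
        lines.append(f'VARS batch{i} start="{start}" count="{count}"')
    return "".join(line + "\n" for line in lines)


def write_inner_dag(path, num_items):
    """Write the lower-level DAG file; intended to run as part of node A."""
    with open(path, "w") as f:
        f.write(inner_dag_text(num_items))
```

Under these assumptions, process.sub would refer to $(start) and $(count) in its arguments line, and a list of 1,000,001 items simply produces 1,001 batch nodes in inner.dag with no change to the top-level DAG. The sketch does not handle an empty list, which would yield an empty DAG file.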

You can find more information about nested DAGs in the DAGMan section of the Condor manual:


http://www.cs.wisc.edu/condor/manual/v7.5/2_10DAGMan_Applications.html#SECTION003106700000000000000

Also, this will be easier to implement if you are running the latest DAGMan (7.5.2). If you're running an older DAGMan, you have to have a "dummy" lower-level DAG in place when you submit the upper-level DAG, etc.

If you're not running Condor 7.5.2, you can run the latest DAGMan without upgrading your overall Condor version -- just grab the 7.5.2 condor_dagman and condor_submit_dag binaries, and put them somewhere that's closer to the beginning of your $path list than the "regular" Condor binaries -- maybe ~/bin, for example.
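
One way to confirm that the newer binaries are the ones being picked up is to resolve both tool names against the search path, as in the sketch below. It assumes a POSIX-style system where the shell searches the PATH environment variable; the helper names resolve_tools and prepend_dir are made up for this example.

```python
import os
import shutil


def prepend_dir(directory, path=None):
    """Return a search path with `directory` placed ahead of `path`."""
    if path is None:
        path = os.environ.get("PATH", "")
    return directory + os.pathsep + path if path else directory


def resolve_tools(path=None, names=("condor_dagman", "condor_submit_dag")):
    """Map each tool name to the first executable found on the search path.

    A value of None means the tool was not found at all.
    """
    return {name: shutil.which(name, path=path) for name in names}
```

For example, resolve_tools(prepend_dir(os.path.expanduser("~/bin"))) should report both tools under ~/bin if the 7.5.2 binaries were copied there and are executable.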


Kent Wenger
Condor Team
