Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] DAG questions

Date: Mon, 21 Dec 2009 16:27:25 -0600 (CST)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [Condor-users] DAG questions

On Mon, 21 Dec 2009, Ian Stokes-Rees wrote:

R. Kent Wenger wrote:
Unless your DAG is really "wide" (most of the 3500 nodes in the queue at one time), upgrading to 7.4 should fix your file
Yes, ours is really wide: 100k nodes, with no dependencies. We use MAX_JOBS to limit how many DAGMan releases at any one time. The main reasons we use DAGMan are for pre/post scripts, the retry mechanism, and to slowly release jobs to Condor. The jobs themselves are independent parts of a parameter sweep. We then collect results from completed jobs.


Well, as long as you have maxjobs set, you'll limit your fd usage with
7.4.

As promised, 7.4 is much better: 26 minutes to submit the DAG with 7.2 was reduced to 7 seconds.


Good to hear!

I'm looking for more opportunities to speed things up with DAGMan. My new slowdown is with the rate at which DAGMan attempts to submit jobs. I submitted my 100k node DAG around noon, and now 3 hours later I only have 250 jobs running, 700 queued, tens of thousands left. These run for around 5 minutes, so if we have a steady state of 250 running jobs, I'd at least like to have my MAX JOB limit number of jobs queued (currently set to 2000). I have DAGMAN_MAX_SUBMITS_PER_INTERVAL=250, which seemed reasonable, but perhaps is too low.

You could also set DAGMAN_USER_LOG_SCAN_INTERVAL to 1 (second). That should help some.
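
As a rough sanity check on the two settings mentioned above, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification, that DAGMan attempts at most DAGMAN_MAX_SUBMITS_PER_INTERVAL submits once per DAGMAN_USER_LOG_SCAN_INTERVAL seconds; the function names and the 5-second starting interval are illustrative assumptions, not values taken from this thread.

```python
def submit_rate_ceiling(max_submits_per_interval, scan_interval_s):
    """Upper bound on submit attempts per second under the simplified model:
    one batch of at most max_submits_per_interval per scan interval."""
    return max_submits_per_interval / scan_interval_s


def drain_rate(running_jobs, job_runtime_s):
    """Jobs finishing per second when running_jobs slots stay busy and
    each job takes job_runtime_s seconds."""
    return running_jobs / job_runtime_s


# Figures from the thread: 250 submits per interval, 250 running jobs,
# about 5 minutes (300 s) per job. The 5 s interval is an assumed baseline.
print(submit_rate_ceiling(250, 5))   # ceiling with an assumed 5 s interval
print(submit_rate_ceiling(250, 1))   # ceiling with the interval set to 1 s
print(drain_rate(250, 300))          # completions per second at steady state
```

If the ceiling comes out well above the drain rate, the configured limits are probably not what is holding the queue down, and the time taken by each individual submit is a more likely suspect.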

If it would help, I could also investigate setting up a single classad for all the jobs and using the VARS command in the DAG file to customize each instance.

I don't think that will speed up the submits. (I assume you mean a single submit file...)
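
For reference, the single-submit-file layout being discussed might be generated along these lines. This is only a sketch: the submit file name node.sub, the macro name param, the node naming scheme, and the retry count are all hypothetical, and the shared submit file is assumed to reference the macro as $(param). As noted above, this arrangement is not expected to make submission any faster.

```python
def render_dag(params, submit_file="node.sub", retries=3):
    """Return DAG file text with one independent node per parameter value.

    Every node points at the same submit file; VARS supplies the
    per-node value and RETRY sets the per-node retry count.
    """
    lines = []
    for i, value in enumerate(params):
        name = f"node{i}"  # hypothetical naming scheme
        lines.append(f"JOB {name} {submit_file}")
        lines.append(f'VARS {name} param="{value}"')
        lines.append(f"RETRY {name} {retries}")
    return "\n".join(lines) + "\n"


# Example: a two-point parameter sweep.
print(render_dag([0.1, 0.2], retries=2))
```

The output can be written to a file and handed to condor_submit_dag, with a throttle such as -maxjobs applied as before.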

If you're running a 7.4 DAGMan, a new feature is that you don't have to specify a log file at all in your submit file -- if you don't, DAGMan will assign a default log file for you. In fact, this may be the preferred way to do things, especially if you want to re-use your submit files in more than one DAG. The default log files are per-DAG, so if you use the same submit file in two different DAGs you won't have to worry about log file collisions if you use the default log file feature.

This sounds interesting. Is there any way to force DAGMan to do this, even if a log file is specified in the individual classad files? The reason I ask is because I'd like to keep the layered model I have right now where the node classads are self-contained and can be individually submitted if required. These will need the "Log = ... " attribute.

As of now, there's no way to override the log file specified in each submit file, if there is a 'log=' line. Maybe that's an option we should add, though...

Finally, we are working on figuring out how to monitor and visualize the progress of our DAG. Is there some way to do DOT file generation "on demand"? Or does someone with more experience think it is safe in our environment (100k nodes, 6000 active, 5-10 minutes per node to complete, 500-2000 running at any given time) to have UPDATE enabled for automatic DOT file generation? On the command/file side, it seems the dagman.out log file and condor_q -dag are the only sources of monitoring information pertaining to the DAG's state and progress, or are there other places/commands I'm not aware of?


I haven't tried the automatic DOT file generation on a DAG of that size.

We want to add another output file from DAGMan that's designed to be a concise, machine-readable record of the status of the DAG's jobs. The dagman.out file is really designed to be read by a human for debugging, so it's not a good idea to build tools on top of it if you can avoid that.

I don't know how soon we'll have the other log file, though.

Kent Wenger
Condor Team
