[Condor-users] DAG questions
I had a few DAG-specific questions to follow up on.  I have increased my 
file handle and process limits to 40k and 20k respectively.
Ian Stokes-Rees wrote:
In particular, I'm trying to create a 100k node DAG (flat, no 
dependencies), with MAXJOBS 6000 and I'm getting the error:
...
These are in 100k separate classads in 100k directories (in a 2-tier 
hierarchy groupX/nodeY, so as to avoid overloading a single 
directory), with 100k log files in each of the node directories.
It takes about 1 hour for the DAG to be submitted.  I've bumped up 
ulimits to a level which should get rid of the problem, but it isn't 
clear whether I need to re-submit the DAG, restart Condor, log out and 
back in, or even reboot the machine for these changes to take effect.  
Any advice kindly appreciated.
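For reference, I'm checking the limits from the shell I run 
condor_submit_dag in.  My understanding (to be confirmed) is that ulimit 
values are per-process and inherited by children, so a re-login plus a 
restart of the Condor daemons should pick up new limits without a reboot:

```shell
# ulimit values are per-process and inherited by child processes,
# so a new limit only applies to processes started after it is set.
# Daemons started earlier (e.g. condor_master) keep their old limits.
ulimit -n    # current open-file (file handle) limit
ulimit -u    # current max user processes
```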
I've read and re-read some of the DAGMan documentation.  I've now set:
DAGMAN_MAX_SUBMITS_PER_INTERVAL=250
DAGMAN_LOG_ON_NFS_IS_ERROR=False
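Concretely, in my local Condor configuration file these two knobs look 
like this (my own annotation of what I believe they do):

```
# Throttle how many node jobs DAGMan submits per scheduling interval.
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 250

# Tolerate node job user logs on NFS.  My understanding is this is
# normally an error because NFS caching can make DAGMan miss or
# mis-order log events.
DAGMAN_LOG_ON_NFS_IS_ERROR = False
```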
The latter is surprising, since I understand the default is "True" and 
the 7.0 docs say that logs on NFS should cause the DAG to fail, yet my 
jobs submitted OK.  All my job files are on NFS; I don't have space on 
local disk for the 20+ GB this DAG will produce on each iteration.  I'm 
using Condor 7.2.  I should also mention that I have DOT generation 
turned on and set to UPDATE, which may not be a good idea.  In the short 
term I can move job submission to a local disk for testing and turn off 
DOT generation.
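For completeness, the relevant line in my .dag file looks roughly like 
this (filename illustrative):

```
# UPDATE regenerates the DOT graph file every time node state changes;
# with 100k nodes that rewrite happens constantly.  Omitting UPDATE
# would write the file only once.
DOT status.dot UPDATE
```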
My dagman.out file is huge: 200 MB.  Is there some way to reduce the 
logging level?  I couldn't see any option to do this.  I seem to get one 
line per DAG node every time DAGMan re-evaluates the DAG.  100k lines 
every few minutes is too much.  My ideal scenario:
1. Specify the location of the DAG log, out, and err files explicitly 
(rather than have them end up in the directory where condor_submit_dag 
is executed).
2. Limit logging to remove per-DAG-node lines.
3. Rotate log files that could grow large.
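For (2) and (3), the closest knobs I've found so far (names and 
availability in 7.2 to be checked against the manual) are the debug 
level passed to condor_dagman and a MAX_DAGMAN_LOG-style setting for 
rotating dagman.out:

```
# Lower DAGMan's debug verbosity at submit time (levels 0-7, default 3;
# lower levels should drop the per-node status lines).
condor_submit_dag -debug 1 my.dag

# In the Condor config: cap/rotate dagman.out like other daemon logs
# (analogous to the MAX_*_LOG macros for the daemons).
MAX_DAGMAN_LOG = 50000000
```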
Finally, condor_submit_dag is silent while it processes the DAG.  I 
don't want a flood of output, but it would be nice to know that 
*something* is going on.  Instead it outputs nothing for 60 minutes and 
then dumps the status of the DAG submission.
Thanks for advice on how to improve our use of DAGMan.
Ian
--
Ian Stokes-Rees, Research Associate
SBGrid, Harvard Medical School
http://sbgrid.org