This may not be relevant, but one thing I do when submitting many jobs 
is send stdout & stderr to /dev/null. 
 
This may free up some file descriptors.  
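
As a rough illustration of the suggestion above, the sketch below builds
a submit description whose output and error both point at /dev/null. The
executable name process.sh, the log file name job1.log, and the vanilla
universe are placeholders chosen for the example, not details from this
thread.

```python
def submit_description(executable, log="job1.log", universe="vanilla"):
    """Return submit-file text that discards the job's stdout and stderr.

    All argument values are illustrative placeholders.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        "output     = /dev/null",   # discard stdout
        "error      = /dev/null",   # discard stderr
        f"log        = {log}",      # the user log is still kept
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Calling submit_description("process.sh") returns the text, which could
then be saved as a submit file such as job1.linux. Whether this actually
lowers descriptor usage depends on where the descriptors are being
consumed, which is still an open question here.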
It's not clear from your email where all the file descriptors are 
being used up. Have you identified that yet? 
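
One possible way to look into this on Linux is to read /proc/<pid>/fd
for the process in question. The sketch below is only an assumption
about how one might check; it needs a Linux /proc filesystem and
permission to read the target process's entry. The function names are
made up for the example.

```python
import os

def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def fd_targets(pid="self"):
    """Map each descriptor number to the file, pipe, or socket it refers to."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            pass
    return targets
```

Passing the pid of a running condor_dagman or condor_schedd process
(instead of the default "self") would show what that process has open,
which may help tell log files apart from stdout/stderr files.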
 
rob  
 On Feb 11, 2005, at 5:41 PM, Daniel Durand wrote:
  
Hi  
I am rather new to Condor, although I did go through a fair amount of 
help and web pages before posting to the list to ask for some 
precious help. 
 
Here is the situation.  
I have to run a large number of DAGs, about 100,000, all of which are 
quite simple. 
 
I used to submit each DAG independently when the number of jobs was 
small (<300), but with a large number of jobs I quickly ran out of 
file descriptors. 
 
I tried a solution, which is to put all the independent DAGs in one 
master DAG, like: 
Job solaris_1 job1.opus 
Job linux_1 job1.linux 
Script POST linux_1 remove_tar.pl job1.tar 
Parent solaris_1 Child linux_1 
Job solaris_2 job2.opus 
Job linux_2 job2.linux 
Script POST linux_2 remove_tar.pl job2.tar 
Parent solaris_2 Child linux_2 
. 
. 
. 
 
This was repeated many times and submitted via condor_submit_dag 
-maxjobs 40 file.dag 
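
For reference, a master DAG following the four-line pattern above can be
generated rather than written by hand. This is a minimal sketch: the
function names are invented for the example, and it assumes the node and
file names follow the solaris_N / linux_N / jobN.* scheme shown in the
listing.

```python
def sub_dag_lines(i):
    """DAG entries for one independent solaris -> linux pair with index i."""
    return [
        f"Job solaris_{i} job{i}.opus",
        f"Job linux_{i} job{i}.linux",
        f"Script POST linux_{i} remove_tar.pl job{i}.tar",
        f"Parent solaris_{i} Child linux_{i}",
    ]

def master_dag(n):
    """Concatenate n independent pairs into the text of one master DAG file."""
    lines = []
    for i in range(1, n + 1):
        lines.extend(sub_dag_lines(i))
    return "\n".join(lines) + "\n"
```

Writing the result of master_dag(n) to file.dag gives the input for the
condor_submit_dag command above. Generating the file does not change the
order in which the nodes are run.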
 
This ran much better but still ran out of file descriptors at some 
point. The reason is that all the parent tasks got executed first, so 
I end up with tons of tar files (which pass data fine between parent 
and child) in the submission directory, filling up precious disk 
space. It looks like all the parents are executed first, with Condor 
not finishing a given sub-DAG before starting a new one. 
 
Is there a better way to do this?  
My system manager tried to change the number of file descriptors 
available for my account, but any change to the default of 1024 
rendered my account unusable: any shell would give up immediately 
after logging in. We tried to change /etc/security/limits.conf 
without any success. 
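
For reference, the per-process descriptor limit has a soft and a hard
value, and a process may raise its own soft value up to the hard one
without root. The sketch below uses Python's standard resource module
to inspect and raise it; the function names are made up for the
example. It only affects the calling process and its children, so
whether it helps here depends on which process is running out of
descriptors, and that has not been established.

```python
import resource

def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def raise_soft_nofile(target):
    """Raise the soft limit toward target, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already at or above the requested value
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

If the hard limit itself is 1024, this cannot go any higher, and the
hard limit would have to be changed by the administrator.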
 
Here is my setup:
host 31% cat /proc/sys/fs/file-max
209664  
host 34% limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       unlimited
coredumpsize    1 kbytes
memoryuse       unlimited
vmemoryuse      unlimited
descriptors     1024
memorylocked    unlimited
maxproc         7168  
host 37% condor_version
$CondorVersion: 6.6.6 Jul 26 2004 $
$CondorPlatform: I386-LINUX_RH9 $  
Linux host 2.4.22-1.2188.nptlsmp #1 SMP Wed Apr 21 20:12:56 EDT 2004 
i686 athlon i386 GNU/Linux 
 
Many thanks  
Daniel  
Daniel Durand | Tel/Tél: +1 250 363 0052 | FAX: +1 250 363 0045 
HST archives scientist             | Responsable Archive HST 
Herzberg Institute of Astrophysics | Institut Herzberg Astrophysique 
National Research Council Canada   | Conseil National de Recherches du Canada 
5071 W. Saanich Road               | 5071 W. Saanich Road 
Victoria, B.C.                     | Victoria, C.B. 
Canada V9E 2E7 
 
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users  
 
 
  
 