This may not be relevant, but one thing I do when submitting many jobs
is send stdout & stderr to /dev/null.
This may free up some file descriptors.
It's not clear from your email where all the file descriptors are
being used up. Have you identified that yet?
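As a minimal sketch of the /dev/null suggestion above, the Python snippet below builds a submit description whose output and error lines point at /dev/null. The executable name and log file name are hypothetical placeholders, and whether this actually frees descriptors in a given setup is untested here. The log line is kept because DAGMan follows job progress through the user log.

```python
# Sketch only: the executable and log file names are hypothetical placeholders.

def make_submit_description(executable, log_file):
    """Return submit-description text that discards the job's stdout and stderr."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "output     = /dev/null",   # discard the job's stdout
        "error      = /dev/null",   # discard the job's stderr
        f"log        = {log_file}",  # user log, still needed under DAGMan
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description("process_job.sh", "job1.log"), end="")
```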
rob
On Feb 11, 2005, at 5:41 PM, Daniel Durand wrote:
Hi
I am rather new to Condor, although I did go through a fair amount of
the help and web pages before posting to
the list for some help.
Here is the situation.
I have to run a large number of DAGs, about 100,000, all of which are
quite simple.
I used to submit each DAG independently when the number of jobs was
small (<300), but with a large number of
jobs I quickly ran out of file descriptors.
I tried a solution that puts all the independent DAGs in one
master DAG, like:
Job solaris_1 job1.opus
Job linux_1 job1.linux
Script POST linux_1 remove_tar.pl job1.tar
Parent solaris_1 Child linux_1
Job solaris_2 job2.opus
Job linux_2 job2.linux
Script POST linux_2 remove_tar.pl job2.tar
Parent solaris_2 Child linux_2
.
.
.
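The repeated four-line pattern above can be generated rather than written by hand. This is a rough sketch that assumes every pair follows exactly the naming shown in the example (jobN.opus, jobN.linux, jobN.tar); the function name is made up for illustration.

```python
# Sketch only: assumes the file naming used in the example stanzas above.

def dag_lines(n_pairs):
    """Return the master DAG lines for n_pairs independent solaris -> linux pairs."""
    lines = []
    for i in range(1, n_pairs + 1):
        lines.append(f"Job solaris_{i} job{i}.opus")
        lines.append(f"Job linux_{i} job{i}.linux")
        # The POST script removes the tar file once the child job has finished.
        lines.append(f"Script POST linux_{i} remove_tar.pl job{i}.tar")
        lines.append(f"Parent solaris_{i} Child linux_{i}")
    return lines


if __name__ == "__main__":
    print("\n".join(dag_lines(2)))
```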
This was repeated many times and submitted via
condor_submit_dag -maxjobs 40 file.dag
This ran much better but still ran out of file descriptors at some
point. The reason is that all
the parent tasks are executed first, and I end up with tons of tar
files (which pass data between
parent and child just fine) in the submission directory, filling up
precious disk space. It looks like
Condor does not finish a given
sub-DAG before starting a new one.
Is there a better way to do this?
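One possible workaround, offered only as a sketch and not as a confirmed fix, is to split the master DAG into smaller DAG files and submit them one after another, so that fewer parents can run ahead of their children at any time. The helper below assumes each independent sub-DAG is exactly four consecutive lines, as in the example; the function name and chunk size are hypothetical.

```python
# Sketch only: assumes every sub-DAG is a block of four consecutive lines.

def split_stanzas(dag_text, stanzas_per_file, lines_per_stanza=4):
    """Split a flat master DAG into smaller DAG texts, keeping each stanza whole."""
    lines = [ln for ln in dag_text.splitlines() if ln.strip()]
    if len(lines) % lines_per_stanza != 0:
        raise ValueError("DAG text is not a whole number of stanzas")
    # Group the lines into stanzas (one independent sub-DAG each).
    stanzas = [
        lines[i:i + lines_per_stanza]
        for i in range(0, len(lines), lines_per_stanza)
    ]
    chunks = []
    for start in range(0, len(stanzas), stanzas_per_file):
        group = stanzas[start:start + stanzas_per_file]
        chunks.append("\n".join("\n".join(s) for s in group) + "\n")
    return chunks
```

Each returned chunk could be written to its own file and submitted with condor_submit_dag once the previous one has finished.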
My system manager tried to change the number of file descriptors
available for my account, but
any change to the default of 1024 rendered my account unusable:
any shell would give up
immediately after logging in. We tried changing
/etc/security/limits.conf
without any success.
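To see where descriptors are going, it can help to compare the per-process limit with the number of descriptors a process actually holds. The sketch below is Linux-specific: it reads the limit with Python's resource module and counts entries under /proc/<pid>/fd. The function names are made up for illustration, and the count is approximate because listing the directory briefly uses a descriptor itself.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors of a process via /proc; None if unavailable."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        # No /proc on this system, no such process, or not permitted to look.
        return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("descriptors open in this process:", open_fd_count())
```

Passing the PID of a running condor_dagman or condor_schedd process instead of the default would show which one is approaching the limit, provided the account is allowed to read that process's /proc entry.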
Here is my setup:
host 31% cat /proc/sys/fs/file-max
209664
host 34% limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize unlimited
coredumpsize 1 kbytes
memoryuse unlimited
vmemoryuse unlimited
descriptors 1024
memorylocked unlimited
maxproc 7168
host 37% condor_version
$CondorVersion: 6.6.6 Jul 26 2004 $
$CondorPlatform: I386-LINUX_RH9 $
Linux host 2.4.22-1.2188.nptlsmp #1 SMP Wed Apr 21 20:12:56 EDT 2004
i686 athlon i386 GNU/Linux
Many thanks
Daniel
Daniel Durand | Tel/Tél: +1 250 363 0052 | FAX: +1 250 363 0045
HST archives scientist | Responsable Archive HST
Herzberg Institute of Astrophysics | Institut Herzberg Astrophysique
National Research Council Canada | Conseil National de Recherches du Canada
5071 W. Saanich Road | 5071 W. Saanich Road
Victoria, B.C. | Victoria, C.B.
Canada V9E 2E7