This may not be relevant, but one thing I do when submitting many jobs
is send stdout and stderr to /dev/null, which may free up some file
descriptors.
It's not clear from your email where all the file descriptors are
being used up. Have you identified that yet?
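
For what it's worth, here is a minimal sketch of what I mean, written
as a small Python helper that builds the text of a submit description
file with both streams pointed at /dev/null. The function name, the
executable path, and the log file name are made up for illustration;
only the output, error, and log submit commands are standard. The log
line stays because DAGMan follows the user log to track each job.

```python
def make_submit_description(executable, log_path, universe="vanilla"):
    """Return submit-file text that discards the job's stdout and stderr.

    The executable and log_path values are placeholders supplied by the
    caller; nothing here is specific to any particular job.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        "output     = /dev/null",   # discard stdout
        "error      = /dev/null",   # discard stderr
        f"log        = {log_path}", # keep the user log
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(make_submit_description("/path/to/run_job.sh", "job1.log"), end="")
```

Whether this helps depends on where the descriptors are actually going,
so treat it as something to try rather than a fix.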
rob
On Feb 11, 2005, at 5:41 PM, Daniel Durand wrote:
> Hi
>
> I am rather new to Condor, although I did go through a fair number of
> help and web pages before posting to
> the list for some precious help.
>
> Here is the situation.
>
> I have to run a large number of DAGs, about 100,000, all of which are
> quite simple.
>
> I used to submit every DAG independently when the number of jobs was
> small (<300), but with a large number of
> jobs I quickly ran out of file descriptors.
>
> I tried a solution, which is putting all the independent DAGs in one
> master DAG like:
> Job solaris_1 job1.opus
> Job linux_1 job1.linux
> Script POST linux_1 remove_tar.pl job1.tar
> Parent solaris_1 Child linux_1
> Job solaris_2 job2.opus
> Job linux_2 job2.linux
> Script POST linux_2 remove_tar.pl job2.tar
> Parent solaris_2 Child linux_2
> .
> .
> .
>
> This was repeated many times and submitted via condor_submit_dag
> -maxjobs 40 file.dag
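
A master file of that shape does not have to be written by hand. Below
is a rough sketch of a generator, assuming every sub-DAG follows the
naming pattern in your example (jobN.opus, jobN.linux, jobN.tar). The
function names are mine and purely illustrative. Generating the lines
also guarantees that the node names in each Parent/Child line match
the names in the Job lines.

```python
def dag_entry(i):
    """Return the four DAG lines for sub-DAG number i."""
    return [
        f"Job solaris_{i} job{i}.opus",
        f"Job linux_{i} job{i}.linux",
        f"Script POST linux_{i} remove_tar.pl job{i}.tar",
        f"Parent solaris_{i} Child linux_{i}",
    ]


def master_dag(count):
    """Return the text of a master DAG holding `count` independent sub-DAGs."""
    lines = []
    for i in range(1, count + 1):
        lines.extend(dag_entry(i))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the first two sub-DAGs; redirect to a file to submit it.
    print(master_dag(2), end="")
```

The output can be redirected to file.dag and submitted with the same
condor_submit_dag command as before.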
>
> This ran much better but still ran out of file descriptors at some
> point. The reason is that all
> the parent tasks are executed first, and I end up with tons of tar
> files (which pass data between
> parents and children just fine) in the submission directory, filling
> up precious disk space. It looks like
> Condor does not finish a given sub-DAG before starting a new one.
>
> Is there a better way to do this?
>
> My system manager tried to raise the number of file descriptors
> available to my account, but
> any change from the default of 1024 rendered my account unusable:
> every shell would give up
> immediately after login. We also tried changing
> /etc/security/limits.conf,
> without any success.
>
> Here is my setup:
> host 31% cat /proc/sys/fs/file-max
> 209664
>
> host 34% limit
> cputime unlimited
> filesize unlimited
> datasize unlimited
> stacksize unlimited
> coredumpsize 1 kbytes
> memoryuse unlimited
> vmemoryuse unlimited
> descriptors 1024
> memorylocked unlimited
> maxproc 7168
>
> host 37% condor_version
> $CondorVersion: 6.6.6 Jul 26 2004 $
> $CondorPlatform: I386-LINUX_RH9 $
>
> Linux host 2.4.22-1.2188.nptlsmp #1 SMP Wed Apr 21 20:12:56 EDT 2004
> i686 athlon i386 GNU/Linux
>
> Many thanks
>
> Daniel
>
>
> Daniel Durand | Tel/Tél: +1 250 363 0052 | FAX: +1 250 363 0045
> HST archives scientist | Responsable Archive HST
> Herzberg Institute of Astrophysics | Institut Herzberg Astrophysique
> National Research Council Canada | Conseil National de Recherches du Canada
> 5071 W. Saanich Road | 5071 W. Saanich Road
> Victoria, B.C. | Victoria, C.B.
> Canada V9E 2E7
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>