Re: [Condor-users] Large Number of small jobs


Date: Sun, 13 Feb 2005 08:24:34 -0800
From: Daniel Durand <Daniel.Durand@xxxxxx>
Subject: Re: [Condor-users] Large Number of small jobs
Yes, they were all used up: in my first scenario, when I was using one condor_submit_dag per pair of jobs, and in the second
scenario, when I was grouping all the job pairs into a single DAG file and submitting without
-maxjobs. Right now I am limiting the number of running jobs to 40, but I ran into this other problem where the parents
are all executed before the children, creating a potential disk space problem.
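
One workaround that may keep each pair together is the nested-DAG technique: wrap every parent/child pair in its own small DAG, generate a submit file for each with condor_submit_dag -no_submit, and list those submit files as nodes of a master DAG. A minimal sketch (the pairN.dag file names are hypothetical):

    # pair1.dag -- one small DAG per parent/child pair:
    #   Job solaris_1 job1.opus
    #   Job linux_1 job1.linux
    #   Script POST linux_1 remove_tar.pl job1.tar
    #   Parent solaris_1 Child linux_1

    # Write pairN.dag.condor.sub without submitting anything:
    condor_submit_dag -no_submit pair1.dag
    condor_submit_dag -no_submit pair2.dag

    # master.dag -- each node runs one inner DAG to completion:
    #   Job pair_1 pair1.dag.condor.sub
    #   Job pair_2 pair2.dag.condor.sub

    # At most 40 pairs in flight; each pair's POST script removes its
    # tar file before a new pair takes the slot:
    condor_submit_dag -maxjobs 40 master.dag

With this layout, -maxjobs bounds whole pairs rather than individual jobs, so the tar files should never accumulate much past 40.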


Your idea of setting stderr and stdout to /dev/null is nice, but it only partly solves the problem. I really need
to submit thousands of jobs...
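
At that scale it may also help to generate the per-pair DAGs and the master DAG with a small script rather than by hand. A hypothetical generator, assuming the jobN.opus/jobN.linux/jobN.tar naming from the example below and the nested layout sketched above:

    #!/bin/sh
    # Emit one inner DAG per pair, plus one master DAG of pairs.
    : > master.dag
    i=1
    while [ $i -le 100000 ]; do
        {
            echo "Job solaris_$i job$i.opus"
            echo "Job linux_$i job$i.linux"
            echo "Script POST linux_$i remove_tar.pl job$i.tar"
            echo "Parent solaris_$i Child linux_$i"
        } > pair$i.dag
        condor_submit_dag -no_submit pair$i.dag
        echo "Job pair_$i pair$i.dag.condor.sub" >> master.dag
        i=`expr $i + 1`
    done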


Many thanks

Daniel


Robert E. Parrott wrote:

This may not be relevant, but one thing I do when submitting many jobs
is to send stdout & stderr to /dev/null.

This may free up some file descriptors.
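
For reference, the relevant submit-description lines might look like this (a sketch; the executable name is a placeholder, and note that DAGMan still needs the user log, so only output and error should point at /dev/null):

    # e.g. in job1.opus (contents are hypothetical)
    universe   = vanilla
    # placeholder executable name
    executable = run_job1
    output     = /dev/null
    error      = /dev/null
    # DAGMan watches the user log, so keep it a real file
    log        = job1.log
    queue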

It's not clear from your email where all the file descriptors are
being used up. Have you identified that yet?

rob


On Feb 11, 2005, at 5:41 PM, Daniel Durand wrote:

> Hi
>
> I am rather new to Condor, although I did go through a fair number of
> help/web pages before having to post to
> the list to gather some precious help.
>
> Here is the situation.
>
> I have to run a large number of DAGs, about 100,000, all of which are
> quite simple.
>
> I used to submit each DAG independently when the job count was small
> (<300), but with a large number of
> jobs I quickly ran out of file descriptors.
>
> I tried a solution, which is putting all the independent DAGs in one
> master DAG, like:
> Job solaris_1 job1.opus
> Job linux_1 job1.linux
> Script POST linux_1 remove_tar.pl job1.tar
> Parent solaris_1 Child linux_1
> Job solaris_2 job2.opus
> Job linux_2 job2.linux
> Script POST linux_2 remove_tar.pl job2.tar
> Parent solaris_2 Child linux_2
> .
> .
> .
>
> This was repeated many times and submitted via condor_submit_dag
> -maxjobs 40 file.dag
>
> This ran much better but still ran out of file descriptors at some
> point. The reason is that all
> the parent tasks get executed first, and I end up with tons of tar
> files (used to pass data between
> parent and child) in the submission directory, filling up precious
> disk space. It looks like
> all the parents are executed first, with Condor not finishing a given
> sub-DAG before starting a new one.
>
> Is there a better way to do this?
>
> My system manager tried to change the number of file descriptors
> available for my account, but
> any change to the default of 1024 would render my account unusable:
> any shell would give up
> immediately after logging in. We tried to change
> /etc/security/limits.conf
> without any success (see the limits.conf sketch after this message).
>
> Here is my setup:
> host 31% cat /proc/sys/fs/file-max
> 209664
>
> host 34% limit
> cputime         unlimited
> filesize        unlimited
> datasize        unlimited
> stacksize       unlimited
> coredumpsize    1 kbytes
> memoryuse       unlimited
> vmemoryuse      unlimited
> descriptors     1024
> memorylocked    unlimited
> maxproc         7168
>
> host 37% condor_version
> $CondorVersion: 6.6.6 Jul 26 2004 $
> $CondorPlatform: I386-LINUX_RH9 $
>
> Linux host 2.4.22-1.2188.nptlsmp #1 SMP Wed Apr 21 20:12:56 EDT 2004
> i686 athlon i386 GNU/Linux
>
> Many thanks
>
> Daniel
>
>
> Daniel Durand | Tel/Tél: +1 250 363 0052 | FAX: +1 250 363 0045
> HST archives scientist             | Responsable Archive HST
> Herzberg Institute of Astrophysics | Institut Herzberg Astrophysique
> National Research Council Canada   | Conseil National de Recherches du Canada
> 5071 W. Saanich Road               | 5071 W. Saanich Road
> Victoria, B.C.                     | Victoria, C.B.
> Canada V9E 2E7
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
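
On the limits.conf point in the quoted message: for what it's worth, the usual form of the entries is shown below (a sketch; the username is a placeholder, and this assumes pam_limits is enabled for the login service). If raising the value breaks logins, it may be the limit being rejected at login time rather than a syntax problem.

    # /etc/security/limits.conf
    # <domain>  <type>  <item>   <value>
    daniel      soft    nofile   1024
    daniel      hard    nofile   8192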




