Date: | Fri, 11 Feb 2005 14:41:18 -0800 |
---|---|
From: | Daniel Durand <Daniel.Durand@xxxxxx> |
Subject: | [Condor-users] Large Number of small jobs |
Hi,

I am rather new to Condor, although I did go through a fair number of help and web pages before posting to the list for some precious help. Here is the situation.

I have to run a large number of DAGs, about 100,000, all of which are quite simple. When the number of jobs was small (<300) I used to submit each DAG independently, but with a large number of jobs I quickly ran out of file descriptors.

I then tried putting all the independent DAGs into one master DAG, like this:

    Job solaris_1 job1.opus
    Job linux_1 job1.linux
    Script POST linux_1 remove_tar.pl job1.tar
    Parent solaris_1 Child linux_1

    Job solaris_2 job2.opus
    Job linux_2 job2.linux
    Script POST linux_2 remove_tar.pl job2.tar
    Parent solaris_2 Child linux_2
    .
    .
    .

This block was repeated many times, and the file was submitted with:

    condor_submit_dag -maxjobs 40 file.dag

This ran much better, but it still ran out of file descriptors at some point. The reason is that all the parent jobs got executed first. As a result, I ended up with tons of tar files in the submission directory, filling up precious disk space. (The tar files themselves pass data from parent to child without problems.) It looks as if Condor does not finish a given sub-DAG before starting a new one. Is there a better way to do this?

My system manager tried to change the number of file descriptors available to my account. However, any change from the default of 1024 rendered my account unusable: any shell would exit immediately after logging in.
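The master DAG above repeats the same four lines once per job pair, so for 100,000 pairs it is easiest to generate the file. Below is a minimal Python sketch. It assumes the naming in the example (`jobN.opus`, `jobN.linux`, `jobN.tar`) continues for every pair. The helper names `dag_entries`, `render_dag` and `write_dag` are hypothetical and are not part of Condor. The script only writes the DAG file; it does not change the order in which DAGMan runs the nodes.

```python
"""Generate a master DAG of independent solaris -> linux job pairs.

Assumption: pair i uses job{i}.opus, job{i}.linux and job{i}.tar, following
the naming in the example above. dag_entries/render_dag/write_dag are
hypothetical helpers, not part of Condor.
"""
from pathlib import Path


def dag_entries(n_pairs: int) -> list[str]:
    """Return the DAG lines for n_pairs independent solaris -> linux pairs."""
    lines: list[str] = []
    for i in range(1, n_pairs + 1):
        lines += [
            f"Job solaris_{i} job{i}.opus",
            f"Job linux_{i} job{i}.linux",
            # POST script for the child node, given that pair's tar file.
            f"Script POST linux_{i} remove_tar.pl job{i}.tar",
            # The Child name must match the Job name exactly (linux_i).
            f"Parent solaris_{i} Child linux_{i}",
        ]
    return lines


def render_dag(n_pairs: int) -> str:
    """Join the entries into the text of a DAG file, with a trailing newline."""
    return "\n".join(dag_entries(n_pairs)) + "\n"


def write_dag(path: Path, n_pairs: int) -> None:
    """Write the master DAG to `path`, overwriting any existing file."""
    path.write_text(render_dag(n_pairs))


if __name__ == "__main__":
    # Print a small DAG to stdout; write_dag(Path("file.dag"), 100_000)
    # would produce the full-size file.
    print(render_dag(2), end="")
```

The generated file is submitted as before (`condor_submit_dag -maxjobs 40 file.dag`). Deriving every line from one template also rules out hand-typed name mismatches between `Job` and `Child` entries.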
We also tried changing /etc/security/limits.conf, without any success. Here is my setup:

    host 31% cat /proc/sys/fs/file-max
    209664
    host 34% limit
    cputime         unlimited
    filesize        unlimited
    datasize        unlimited
    stacksize       unlimited
    coredumpsize    1 kbytes
    memoryuse       unlimited
    vmemoryuse      unlimited
    descriptors     1024
    memorylocked    unlimited
    maxproc         7168
    host 37% condor_version
    $CondorVersion: 6.6.6 Jul 26 2004 $
    $CondorPlatform: I386-LINUX_RH9 $

    Linux host 2.4.22-1.2188.nptlsmp #1 SMP Wed Apr 21 20:12:56 EDT 2004 i686 athlon i386 GNU/Linux

Many thanks,
Daniel

Daniel Durand                        | Tel/Tél: +1 250 363 0052 | FAX: +1 250 363 0045
HST archives scientist               | Responsable Archive HST
Herzberg Institute of Astrophysics   | Institut Herzberg Astrophysique
National Research Council Canada     | Conseil National de Recherches du Canada
5071 W. Saanich Road                 | 5071 W. Saanich Road
Victoria, B.C.                       | Victoria, C.B.
Canada V9E 2E7                       |
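The `limit` output above shows `descriptors 1024`, which is the per-process soft limit on open files (RLIMIT_NOFILE). `/proc/sys/fs/file-max` (209664 here) is a separate, system-wide ceiling. An unprivileged process may raise its own soft limit up to, but never beyond, its hard limit. Below is a minimal Python sketch of inspecting and raising the soft limit that way. The helper names are hypothetical. It does not touch limits.conf or the hard limit, which still requires the system manager.

```python
"""Inspect and raise this process's open-file (descriptor) limit.

RLIMIT_NOFILE is the per-process limit that csh's `limit descriptors`
reports; /proc/sys/fs/file-max is the system-wide ceiling. The helper
names below are hypothetical.
"""
import resource
from pathlib import Path
from typing import Optional, Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def raise_soft_nofile(wanted: int) -> int:
    """Raise the soft limit toward `wanted`, capped at the hard limit.

    Never lowers the soft limit and never changes the hard limit, so no
    privileges are needed. If the hard limit is unlimited, a `wanted` above
    the kernel's per-process maximum makes setrlimit raise an error.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = nofile_limits()
    inf = resource.RLIM_INFINITY
    target = wanted if hard == inf else min(wanted, hard)
    if soft != inf and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return nofile_limits()[0]


def system_file_max() -> Optional[int]:
    """Return the system-wide open-file limit on Linux, or None elsewhere."""
    path = Path("/proc/sys/fs/file-max")
    return int(path.read_text()) if path.exists() else None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} file-max={system_file_max()}")
```

Plain `limit` lists only the soft limits. In tcsh, `limit -h` lists the hard limits, which cap how far the soft value can be raised.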