Steve,
Most scaling issues of this sort can be addressed by adding more submit nodes. However, in many situations, a single submit node can handle 2000-3000 running jobs without sweating, so some investigation into your case may be worthwhile.
Windows or Linux? Scaling a submit node under Linux is much better supported.
What version of HTCondor?
Does the machine grind to a halt due to thrashing of the swap device? That is, is it out of memory? On a 64-bit machine running HTCondor 6.8, I'd expect each running job to require about 1.5MB on the submit machine, so 3000 running jobs would need roughly 4.5GB. 16GB RAM should therefore be enough at your scale, but perhaps other things are eating some of the memory.
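As a rough sketch of that memory arithmetic (Python, purely illustrative): the 1.5MB per running job is the estimate quoted above, while the function name and the 1 GB = 1000 MB rounding are assumptions made for this example, not anything HTCondor reports.

```python
# Back-of-the-envelope estimate of submit-node memory used by running jobs.
# 1.5 MB per running job is the rough figure from the message above;
# treating 1 GB as 1000 MB is a simplification for this sketch.
MB_PER_RUNNING_JOB = 1.5


def submit_memory_gb(running_jobs, mb_per_job=MB_PER_RUNNING_JOB):
    """Return the approximate memory (in GB) needed for the running jobs."""
    return running_jobs * mb_per_job / 1000


if __name__ == "__main__":
    for jobs in (500, 2000, 3000):
        print(f"{jobs} running jobs: about {submit_memory_gb(jobs):.2f} GB")
```

If the machine is swapping well below these numbers, something other than the per-job overhead is probably using the memory.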
How long do individual jobs typically take to complete? Job completion rates above roughly 20 Hz (20 jobs per second) on a single submit node are possible, but may require some attention to details, such as the ephemeral port range.
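To see why job length matters, here is a minimal sketch (Python, illustrative only) of the steady-state completion rate. It assumes every slot is refilled as soon as a job finishes and that all jobs take about the same time; the function names are hypothetical, and the 20 Hz threshold is the rough figure mentioned above.

```python
# Steady-state completion rate for a pool kept full of equally long jobs.
# Assumption: a finished job is replaced immediately, so on average
# running_jobs / mean_runtime_s jobs complete every second.


def completion_rate_hz(running_jobs, mean_runtime_s):
    """Return the approximate number of job completions per second."""
    return running_jobs / mean_runtime_s


def min_runtime_s(running_jobs, max_rate_hz=20.0):
    """Return the shortest mean runtime that keeps the rate at or below max_rate_hz."""
    return running_jobs / max_rate_hz


if __name__ == "__main__":
    # 3000 running jobs of 10 minutes each.
    print(completion_rate_hz(3000, 600))
    # Shortest mean runtime that keeps 3000 running jobs at or under 20 Hz.
    print(min_runtime_s(3000))
```

Under these assumptions, 3000 running jobs only approach the 20 Hz range when individual jobs finish in a couple of minutes or less.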
--Dan
On 3/19/13 10:35 AM, Rochford, Steve wrote:
We have a user who is submitting a lot of jobs to our condor system. He’s hitting some limits and I want to work out how we can help.
He would like to be able to have 2000-3000 jobs running simultaneously – we have enough nodes to cope with this – but actually submitting them is causing problems.
Essentially, his job is running the program but using slightly different parameters each time, so he has a submit file with (e.g.) queue 500 at the end.
He can submit about 500 jobs simultaneously and everything works, but if he tries to submit more than that, his machine grinds to a halt – presumably the overhead of communicating with all the nodes is too much (the machine has 16GB RAM and a reasonably decent CPU).
If I give him (say) another 6 machines set up as submit nodes, will this work, or will we hit other bottlenecks (or is this too vague a question)?
Thanks
Steve
_______________________________________________ HTCondor-users mailing list To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/