
[Condor-users] how to beef up the submit host ?



Hi Ian,

The better practice is to keep the queue small. You can do this with DAGMan, which
is part of the Condor installation. Wrap your jobs in a DAG and limit the maximum
number of jobs (with -maxjobs x) when you submit it. DAGMan submits further jobs
as soon as there are fewer than x jobs in the queue. DAGMan also offers much more,
such as dependencies between jobs, resubmission of failed jobs, etc.
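
As a rough illustration of the wrapping step, here is a minimal Python sketch that writes
one DAG node per job and builds the matching condor_submit_dag command line. The submit
file names (job_0.sub, ...), the node names and the file name jobs.dag are made up for
the example; JOB, RETRY and -maxjobs are the DAGMan keywords and option referred to above.

```python
def build_dag(submit_files, retries=0):
    """Return the text of a DAG file with one independent node per submit file."""
    lines = []
    for i, sub in enumerate(submit_files):
        node = f"job{i}"  # hypothetical node name
        lines.append(f"JOB {node} {sub}")
        if retries > 0:
            # Ask DAGMan to resubmit a failed node up to `retries` times.
            lines.append(f"RETRY {node} {retries}")
    return "\n".join(lines) + "\n"


def submit_command(dag_path, max_jobs):
    """Return the argument list for submitting the DAG with a throttle."""
    return ["condor_submit_dag", "-maxjobs", str(max_jobs), dag_path]


if __name__ == "__main__":
    # Hypothetical submit files, one per job.
    subs = [f"job_{i}.sub" for i in range(3)]
    print(build_dag(subs, retries=2), end="")
    print(" ".join(submit_command("jobs.dag", 2000)))
```

Note that the throttle only helps if the work is split into many DAG nodes; a single
submit file that queues thousands of jobs would still land in the queue all at once.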

This is working fine here in Plymouth with 151000 jobs waiting and up to 2000 in the
queue. The queue is kept small by DAGMan, and the scheduler responds quickly when
you query it with condor_q -direct schedd.
To take load off your submitter you can use Quill to mirror the queue in a database. With
Quill the scheduler no longer answers condor_history or condor_q requests,
as these are served from the database, which is much faster for big queues anyway.

The scheduler is a single-threaded program. Keep in mind that one shadow (which is spawned
for every executing node) uses around 1MB of RAM, so 1000 shadows will need about 1GB of
RAM. We have 2GB and this is working fine for 1200 nodes. Soon we plan to scale up
to 5500 nodes, but we will have to use more submitters then.
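
The arithmetic behind this can be sketched in a few lines of Python. The 1MB per shadow
is the rough figure quoted above; the function names and the optional reserve for the
operating system and the schedd itself are assumptions for illustration, not measured
values.

```python
import math


def shadow_memory_mb(running_jobs, per_shadow_mb=1.0):
    """Estimated RAM used by shadows, one shadow per running job."""
    return running_jobs * per_shadow_mb


def submitters_needed(running_jobs, ram_mb_per_host, per_shadow_mb=1.0, reserve_mb=0.0):
    """Estimated number of submit hosts so that all shadows fit in RAM.

    reserve_mb is a hypothetical allowance for everything that is not a shadow.
    """
    usable = ram_mb_per_host - reserve_mb
    if usable <= 0:
        raise ValueError("reserve_mb leaves no RAM for shadows")
    return math.ceil(shadow_memory_mb(running_jobs, per_shadow_mb) / usable)


if __name__ == "__main__":
    print(shadow_memory_mb(1200))         # shadows for 1200 nodes, in MB
    print(submitters_needed(5500, 2048))  # 2GB hosts, no reserve
```

Under these assumptions 5500 nodes on 2GB hosts come out at 3 submitters with no
reserve, and at 4 if 512MB per host is set aside for other processes.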


Best regards,

Michael Hess
PlymGrid Officer
University of Plymouth
Devon, UK

> Hi,
> 
> I've got a quick question. Our submit host is currently
> a single processor Sun Blade-1000 server with 1 GB RAM. We
> have a user who has recently submitted a cluster of around
> 20,000 jobs and as a result the schedd is taking a bit of
> hammering. This means that the response time of any commands
> involving schedd (condor_q, condor_rm, condor_submit etc) is
> going through the roof.
> 
> Question is: if we had the money to beef up the submit host
> would it be worth going for a multi-processor (can schedd
> work with multiple threads) ? Or would more memory be the
> answer (memory usage seems OK at the moment though).
> 
> Anyone have a handle on what hardware spec would cope with this
> kind of thing - I know there are some big Condor installations
> out there.
> 
> regards,
> 
> -ian.
> 
> -----------------------------------
> Dr Ian C. Smith,
> e-Science team,
> University of Liverpool
> Computing Services Department
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at either
> https://lists.cs.wisc.edu/archive/condor-users/
> http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
>