
[Condor-users] New to Condor - Difficult (I think) problem...



Hi All,

I'm new to Condor and distributed computing, so the problem I'm trying to solve may be trivial, difficult, or impossible. Briefly, here is what I need to do.
We have a pool of multi-CPU (actually dual-CPU) Windows machines on which
we would like to maximize CPU utilization. We have three types of jobs
to run, with the following requirements for each job type:
1. Single-CPU (about 80% of jobs). These jobs require only one CPU and  
thus can run concurrently on the same multi-CPU machine up to the number  
of CPUs on the machine. This seems easy enough and should work "straight  
out of the box".
2. Multi-CPU (about 15% of jobs). These jobs require all the CPUs on the
machine, with no other job running on the machine. The application will take
care of starting its own processes/threads to make full use of all CPUs.
3. Multi-CPU, Multi-Machine (about 5% of jobs). These jobs require  
multiple multi-CPU machines, one master and one or more "slaves". Each  
machine will be dedicated to this job (i.e. no other jobs on these  
machines). The application, running on the "master" machine, will take care
of starting its own processes/threads (local and remote) to fully utilize
the machines assigned to the job. In addition, the "master" machine needs  
to get a list of all the "slave" machines. (It may be sufficient to limit  
this to one slave.)
Once started, each job must complete before another is started. If it  
helps, we may be able to identify two machines to handle the "Multi-CPU,  
Multi-Machine" case, as long as they can also run type 1 and 2 jobs when  
type 3 jobs are not in the queue. Writing scripts around the application  
to gather information to pass to the application is also a possible  
solution (we have MKS and perl available on all machines).
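
To make the requirements above concrete, the placement rules can be modeled roughly as in the Python sketch below. This is only an illustration of the constraints, not Condor configuration; the names `Machine`, `place`, and the job-type labels are hypothetical and exist only for this example. It assumes a job that cannot be placed simply waits, and that running jobs are never evicted.

```python
from dataclasses import dataclass

# Hypothetical labels for the three job types described above.
SINGLE, WHOLE_MACHINE, MULTI_MACHINE = "single", "whole", "multi"


@dataclass
class Machine:
    name: str
    cpus: int = 2            # dual-CPU machines, as in our pool
    busy_cpus: int = 0       # CPUs currently held by type 1 jobs
    dedicated: bool = False  # True while a type 2 or type 3 job owns the machine

    def is_idle(self):
        return self.busy_cpus == 0 and not self.dedicated


def place(job_type, machines, slaves_needed=1):
    """Return the machines the job would claim, or None if it has to wait."""
    if job_type == SINGLE:
        # Type 1: any free CPU on a machine that is not dedicated.
        for m in machines:
            if not m.dedicated and m.busy_cpus < m.cpus:
                m.busy_cpus += 1
                return [m]
        return None

    # Types 2 and 3 need completely idle machines: one for type 2,
    # one master plus the requested slaves for type 3.
    wanted = 1 if job_type == WHOLE_MACHINE else 1 + slaves_needed
    idle = [m for m in machines if m.is_idle()]
    if len(idle) < wanted:
        return None
    claimed = idle[:wanted]
    for m in claimed:
        m.dedicated = True
    return claimed  # claimed[0] is the master; the rest are slaves
```

For a type 3 job, the names of `claimed[1:]` are the slave list that would have to be handed to the application on the master machine, for example through a wrapper script.
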
If this is fairly straightforward, please say so, but also point me in the
direction of some documentation and, preferably, examples.
Any pointers and/or advice will be greatly appreciated.

Thanks,
Bob Mortensen