
Re: [Condor-users] Condor and GPUs



Hi Miron,

Miron Livny wrote:
> How should we go about collecting the requirements for the most basic 
> support we should offer in this space? What I am looking for is input 
> that will help us pick the right cost/benefit ratio for an effort to 
> support GPUs and/or multi-core nodes.

I don't know of a good metric for that, but I can tell you about one of
our current problems, which cries out for a (in my eyes rather simple)
solution, though I don't know whether one exists yet:

On a large cluster we have a user running jobs that are extremely I/O
intensive (over the network) at the beginning and then, after a while,
become purely CPU bound.

We have quad-core CPUs, and when two or even four of these jobs land on
the same node, they get in each other's way so badly that the total run
time grows almost linearly with the number of jobs started on that node.

Of course we could restrict these jobs to, say, slot 1 or slot 3 on each
node, but that's not really what we want, since those slots might
already be occupied by other long-running, CPU-bound jobs.

What we would like is a setting that lets us tell Condor in a job's
requirements:

slot_number==SINGLE

or something similar, meaning: run only a single instance of such a job
on this node.

Given that six- and eight-core CPUs are already appearing on the
roadmaps, I think we definitely need this sooner rather than later.

Just my $0.031612

Cheers

Carsten

PS: Of course I also agree with Steffen that we need a more flexible,
dynamic memory scheme that makes much better use of the available
resources than any static system could.