HTCondor Project List Archives




Re: [Condor-devel] Per machine resource limits in 7.6/7.8



On 05/01/2012 03:52 PM, Alan De Smet wrote:
I had an interesting chat with a Condor Week attendee who has
some challenges that I'm not sure how to handle.  They have
various limited resources (software licenses), but the limits
aren't global, they're per machine, so the concurrency limits
aren't a good fit.  Furthermore a job might claim multiple
identical resources at once, in much the same way a single job
might claim multiple cores.

Custom resource limits per eje's proposal
(https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2905) seem
like a good fit, and he was open to the idea.  However, that's
7.9 territory, and they'd prefer to not use a development
release.

Is there a good solution for 7.6 or 7.8?  All I've got is having
slots' START expressions total up usage by other slots and test
against the limit, which seems clumsy.
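
To make the START-expression idea above concrete, here is a rough sketch of the check in plain Python rather than Condor configuration. It only models the logic: add up what the other slots on the machine are using and refuse the job if its request would push the total past the per-machine limit. The names (Slot, licenses_in_use, can_start, machine_limit) are made up for illustration and are not Condor attributes or APIs.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for one slot on a machine."""
    name: str
    licenses_in_use: int = 0  # assumed per-slot usage counter


def can_start(slots, target, requested, machine_limit):
    """Return True if `target` may start a job needing `requested` licenses.

    Mirrors the clumsy START approach: total the usage of every
    *other* slot, then test the sum against the per-machine limit.
    """
    used_elsewhere = sum(s.licenses_in_use for s in slots if s.name != target)
    return used_elsewhere + requested <= machine_limit


# Example machine: 4 licenses total, 3 already in use on other slots.
slots = [Slot("slot1", 2), Slot("slot2", 1), Slot("slot3", 0)]
one_ok = can_start(slots, "slot3", 1, machine_limit=4)
two_ok = can_start(slots, "slot3", 2, machine_limit=4)
```

With these numbers a one-license job fits on slot3 and a two-license job does not. Every slot's expression has to enumerate all of its siblings, which is where the clumsiness comes from.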

There is some dark magic you can try in 7.6 or 7.8, described at the link below, but it suffers from a few problems.

http://spinningmatt.wordpress.com/2010/02/21/node-local-limiters-e-g-gpus/

You could also try making some extra slots and making their use mutually exclusive with some generic slots, kinda like the whole-machine slots configuration. Also dark magic.

A backport to 7.8 isn't out of the question, though it seems awfully late to me.

At some point the stable-devel split just breaks down. I wish we had data on devel feature impact on the stability of existing (presumably stable series) features.

Best,


matt