Hi,
I recently came across a very interesting condor-admin thread from a few
years ago that raised the issue of having Condor enforce slot (then VM)
limits:
http://www.cs.wisc.edu/condor/ligo-tickets/12616.html
Towards the end of that exchange, one of the Condor developers, in
response to a user's comment that it would be useful for Condor to
police these limits using setrlimit, writes:
"Agreed--I'm just trying hard to find a short-term solution that can
hold you over while we improve things."
With no intention of putting anyone on the spot, can I ask whether this
ever led to anything concrete? Speaking personally, I'd be very keen to
see such a solution implemented, since in our flocked environment the
execute nodes cannot depend on cooperation from the submit hosts (e.g.
using periodic removes) to perform the policing. I realise that one can
go some way towards achieving this using PREEMPT expressions, e.g. Todd's
comments in
https://lists.cs.wisc.edu/archive/condor-users/2007-November/msg00156.shtml,
but that doesn't seem able to discriminate between the heterogeneous
slot definitions on a multi-processor machine, e.g. where we allow some
slots more resources than others. Also, it would be nice to be able to
enforce all of the resources in a slot definition, e.g. disk usage.
Cheers,
Mark