On 09/09/13 21:11, Todd Tannenbaum wrote:
On 9/9/2013 9:22 AM, Joan J. Piles wrote:
Hi Brian,
Well, my idea was more from the POV of a systems guy... Giving the choice to the users is (for me at least) optional and just a nicety... But I would like to limit the possibility of a user requesting 1 GB of RAM and then using over 10 GB, thus making the machine swap like hell and impacting the rest of the users.
If user A has all of its processes in RAM and user B is swapping like hell, is user A actually impacted by user B? Or does user B only impact other users that have swapping processes?
Well, heavy swapping (say 8 GB, and we've got well over that) can certainly impact the whole system. For instance, all I/O is almost halted (and this includes even condor activity). We allow interactive logins through condor, and they become completely unusable. Any other job wanting to read or write a file (and most of them do, at some point) will be severely impacted, since swap runs with the highest priority.
I know one option is limiting the swap, but I'd rather have a big swap space just in case, and then limit the available swap for each job (proportionally to the RAM requested, for instance).
Limiting swap space makes sense in that swap is a shared resource and thus should be requested/managed, but are swap activity and swap size closely correlated? Seems like what you are really worried about is lots of swap activity slowing down response time for all users of the system, not about exhaustion of swap space itself...
As a general rule we try to avoid OOM situations because they are very unpredictable, i.e. the kernel kills the wrong process (not the "culprit"), and more often than not some system process is killed and we end up having to physically (or via IPMI, thankfully) reboot the machine.
To avoid this we are kind of generous with the swap space, but this doesn't mean we intend all this swap to be used; we just leave it there in case something unexpected happens. Until now a user's job running amok was one such occurrence, but now that we have cgroups, we'd like to use them to limit this.
I'm aware that one possible solution would be to limit the memsw usage in the htcondor parent cgroup, and this would limit the swap available to all the condor jobs, but I'd rather use a more fine-grained solution where we can define a policy for each job (because there is no documentation on which job the kernel chooses to free memory from when more than one is over the limit).
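(For reference, a minimal sketch of what such a per-job policy boils down to at the kernel interface, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory; the job cgroup path and the sizes are hypothetical, only the file names are the real interface:)

    # Sketch: cap RAM and RAM+swap for a single job's cgroup (cgroup v1).
    # memory.limit_in_bytes must be set before memory.memsw.limit_in_bytes,
    # since the kernel requires memsw >= mem at all times.
    import os

    CGROUP = "/sys/fs/cgroup/memory/htcondor/job_12345"  # hypothetical path

    def set_limit(name, nbytes):
        with open(os.path.join(CGROUP, name), "w") as f:
            f.write(str(nbytes))

    request_memory = 1 * 1024**3    # 1 GB of RAM, from the submit file
    request_swap = 512 * 1024**2    # 512 MB of swap, the proposed knob

    set_limit("memory.limit_in_bytes", request_memory)
    # memsw counts RAM + swap together, hence the sum:
    set_limit("memory.memsw.limit_in_bytes", request_memory + request_swap)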
I've already devised a dirty hack using a job wrapper (or hook, to be decided yet) and a setuid binary which would get its own cgroup from /proc/self/cgroup and would tune it accordingly, but as I've said, I think it's a kludge.
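(For concreteness, a minimal sketch of the self-tuning part of such a wrapper, assuming cgroup v1; the limit value is hypothetical, and the setuid privilege handling and error checking are omitted:)

    # Sketch of the hack: the wrapper finds its own memory cgroup in
    # /proc/self/cgroup and writes the combined RAM+swap limit itself.
    # In practice the write needs root, hence the setuid helper.

    def own_memory_cgroup():
        # Lines look like "4:memory:/htcondor/job_12345" (cgroup v1).
        with open("/proc/self/cgroup") as f:
            for line in f:
                _, controllers, path = line.strip().split(":", 2)
                if "memory" in controllers.split(","):
                    return "/sys/fs/cgroup/memory" + path
        raise RuntimeError("memory controller not found")

    cgroup = own_memory_cgroup()
    with open(cgroup + "/memory.memsw.limit_in_bytes", "w") as f:
        f.write(str(2 * 1024**3))  # hypothetical: RAM+swap capped at 2 GB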
With this I am only explaining my use case, but I'm fairly confident there are other people out there who would also find this useful, and that a tunable knob in HTCondor is worthwhile, provided it is easy enough to implement.
Regards,
Joan
regards,
Todd
And I think that making this tunable is just a small step that is worth it for the (admittedly few) users that would profit from this.
Regards,
Joan
On 09/09/13 15:49, Brian Bockelman wrote:
On Sep 4, 2013, at 8:54 AM, Joan J. Piles <jpiles@xxxxxxxxx> wrote:
Hi all,
We have recently started using cgroups to limit the RAM usage of our users. We want to avoid the situation where badly predicted requirements can bring down the whole machine where the job is executing, impacting other users' jobs.
HTCondor does a great job using cgroups to achieve this, but I have the feeling that this can be improved.
Right now, RAM usage is limited whilst swap is not. I am aware that you can tune this using swappiness parameters, but it is neither straightforward nor optimal, and furthermore it is difficult to do on a per-job basis.
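(For context, the per-cgroup swappiness tuning in question is a single file write under cgroup v1, but it only biases the kernel's reclaim decisions; it is not a hard cap on how much swap a job consumes, which is the point above. The cgroup path is hypothetical:)

    # Sketch: bias one cgroup away from swapping (cgroup v1).
    # This is a reclaim preference, not a limit on swap consumed.
    with open("/sys/fs/cgroup/memory/htcondor/job_12345/memory.swappiness",
              "w") as f:
        f.write("0")  # 0 = avoid swapping this cgroup's pages if possible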
Right now HTCondor tunes the memory.limit_in_bytes or memory.soft_limit_in_bytes files within the cgroup to limit the RAM usage.
I think HTCondor could provide a "request_swap" parameter in the submit file (and an associated RequestSwap job ClassAd) that would be used to compute the value for memory.memsw.limit_in_bytes (which would of course be RequestMemory + RequestSwap).
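(As a purely hypothetical illustration of the proposed submit-file syntax - request_memory exists today, request_swap does not:)

    # Hypothetical submit file snippet using the proposed knob:
    request_memory = 1024    # MB of RAM
    request_swap   = 512     # MB of swap (proposed, not an existing command)
    # -> memory.memsw.limit_in_bytes would become (1024 + 512) MB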
There would also be the associated MODIFY_REQUEST_EXPR_REQUESTSWAP which could be used (for instance) to limit the amount of swap reserved to a % of the RAM or to provide a sensible (or even unlimited) default.
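(Hypothetically, by analogy with the existing MODIFY_REQUEST_EXPR_REQUESTMEMORY knob, an admin could then write something like:)

    # Hypothetical config: round swap requests up to 128 MB steps,
    # by analogy with the existing RequestMemory/RequestDisk knobs.
    MODIFY_REQUEST_EXPR_REQUESTSWAP = quantize(RequestSwap, {128})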
What do you think about this idea? I think it could easily piggyback on the existing cgroup infrastructure without too much hassle.
Hi Joan,
I'm not too hot on this idea - how does the user know what value to provide for RequestSwap? Determining a working set size for an application is a black art; knowing the memory requirements is hard enough for most users!
Brian
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Systems analyst
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 876 55 51 47 (ext. 845147)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------