Mailing List Archives
Re: [HTCondor-users] Using cgroups to limit job memory
- Date: Fri, 03 Apr 2015 13:36:34 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Using cgroups to limit job memory
On 4/2/2015 9:30 AM, Roderick Johnstone wrote:
Todd
The HOWTO recipes are your friend. From the HTCondor.org homepage look
for "HOWTO recipes"; the direct link is
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAdminRecipes
Thanks for the pointer, yes these are really useful.
Glad to hear it!
OK, I think I understand this, but it would be good to have
clarification on one thing please:
1) Is MemoryUsage tracking the resident memory usage (i.e. excluding any
virtual memory) of the whole job process tree when cgroups is configured?
Yes.
And even if you do not configure/use cgroups, HTCondor attempts to make
MemoryUsage mean the same thing. It does this by summing the resident
set size for each process in the job process tree. This may end up
overestimating the memory usage compared to what cgroups would report
(imagine a job that has dozens of child processes that all load the same
shared library), but it is a pretty reasonable approximation. Without
cgroups, HTCondor tracks what processes are in a group via several
different algorithms that can work very accurately in practice,
especially if you give HTCondor a range of GIDs to use (see
http://goo.gl/LVDSys).
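To make the overestimate concrete, here is a minimal Python sketch, not HTCondor code. The `Proc` type, the function names, and all the numbers are hypothetical; it assumes every child maps the same shared library and that group-level accounting would charge those shared pages roughly once.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    private_mb: int  # resident pages only this process uses
    shared_mb: int   # resident pages of a library mapped by every process


def summed_rss_mb(procs):
    # Per-process RSS includes the shared pages, so summing RSS
    # counts the shared library once per process.
    return sum(p.private_mb + p.shared_mb for p in procs)


def shared_once_mb(procs):
    # Count the shared library a single time for the whole tree.
    shared = max((p.shared_mb for p in procs), default=0)
    return sum(p.private_mb for p in procs) + shared


# Two dozen children, each with 10 MB private and a 50 MB shared library.
procs = [Proc(private_mb=10, shared_mb=50) for _ in range(24)]
print(summed_rss_mb(procs))   # 1440
print(shared_once_mb(procs))  # 290
```

With these made-up figures the summed-RSS estimate is about five times the shared-once figure, which is the kind of gap described above for jobs with many children sharing a library.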
If so, would something like the following, (based on examples from the
wiki page), in an environment with cgroups enabled, place a job on hold
when the job process tree allocates more resident memory than in the
request_memory submit file attribute?
# Allow jobs to not be limited by request_memory otherwise
# this policy can never be triggered
CGROUP_MEMORY_LIMIT_POLICY=none
# hold jobs that are more than 10% over requested memory
MEMORY_EXCEEDED = ((MemoryUsage*1.1 > request_memory) =!= TRUE)
PREEMPT = $(PREEMPT)) || $(MEMORY_EXCEEDED)
WANT_SUSPEND = False
WANT_HOLD = $(MEMORY_EXCEEDED)
WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
    "Your job used more resident memory than it requested.", \
    undefined )
Without actually testing the above, off the top of my head the idea
looks like it should work. Note that the PREEMPT expression above has a
syntax error due to an unmatched parenthesis; you probably wanted
PREEMPT = ($(PREEMPT)) || $(MEMORY_EXCEEDED)
Also note that jobs will not be preempted until they exhaust their
MaxJobRetirementTime, which is time HTCondor promises to let the job run
without being preempted for any reason. So if you want to immediately
hold jobs that exceed their requested memory even if the jobs have specified a
MaxJobRetirementTime and you are using HTCondor v8.2 or above, you will
want to use the template at
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitCpuUsage
and just replace $(CPU_EXCEEDED) with $(MEMORY_EXCEEDED).
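As a sanity check on the threshold itself, here is a minimal Python sketch of the intent stated in the config comment ("hold jobs that are more than 10% over requested memory"). It is not HTCondor code: the function name, its parameters, and the choice to treat an undefined usage value as "not exceeded" are all assumptions made for illustration. Since the ClassAd operator `=!=` means "is not identical to", it is worth confirming that a MEMORY_EXCEEDED expression ends up true when usage is over the limit rather than when it is not.

```python
from typing import Optional


def memory_exceeded(memory_usage_mb: Optional[float],
                    request_memory_mb: float,
                    tolerance: float = 0.10) -> bool:
    """Return True when usage is more than `tolerance` over the request."""
    if memory_usage_mb is None:
        # Assumption: usage not reported yet, so do not flag the job.
        return False
    return memory_usage_mb > request_memory_mb * (1.0 + tolerance)


print(memory_exceeded(1000, 1000))  # False: exactly at the request
print(memory_exceeded(1101, 1000))  # True: more than 10% over
print(memory_exceeded(None, 1000))  # False: usage undefined
```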
Nice work Roderick, thanks for sharing!
Hope the above helps,
Todd
Also likely of interest is
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitCpuUsage
Thanks, that's my next project!
Hope the above helps. Also interested in any thoughts you may have to
improve the above HOWTOs.
Thanks again. See above for (minimal) feedback.
Roderick
regards,
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
HTCondor Technical Lead 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685