Re: [HTCondor-users] ResidentSetSize
- Date: Tue, 10 May 2016 15:01:08 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] ResidentSetSize
On 5/10/2016 2:38 PM, Bob Ball wrote:
It seems that the ResidentSetSize value does not appear in the job
ClassAd immediately when a job starts, but only a short time later.
OK, I can live with that.
However, what is harder to understand is that we have seen instances
where it displays a value of 27GB when a job starts! I was trying
to use this on the schedd machine:
SYSTEM_PERIODIC_REMOVE = ResidentSetSize > 5000*RequestMemory
but that 27GB reading was causing the jobs to be removed. Once I
removed this from the schedd, similar jobs finished fine, with their
ClassAds reporting a final ResidentSetSize of only about 2.7GB.
Has anyone seen this kind of thing before? How could this be real, and
why does the ResidentSetSize not appear in the job ClassAd at job start?
We are running HTCondor 8.4.6 with cgroups enabled.
Hi Bob,
A couple of quick thoughts:
1. We've seen Linux cgroups report large memory sizes for jobs because
the figure includes memory used by the kernel to buffer file system
writes. If the condor_config knob ENABLE_KERNEL_TUNING is set to True
(the default), HTCondor should prevent this problem starting with
v8.4.5. But if you disabled it for whatever reason, that could be the
culprit. For more info see
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5500
2. I suggest using MemoryUsage instead of ResidentSetSize in all policy
expressions.
3. As to why ResidentSetSize does not appear in the job ClassAd at job
start: immediately at job start (i.e. right after the job is exec'ed),
many jobs haven't allocated much memory yet, because they are still
initializing. So the question is how long to wait... 1 second? 1
minute? By default HTCondor waits 8 seconds after job startup before
updating ResidentSetSize and other attributes. The condor_starter
config knob that controls this is STARTER_INITIAL_UPDATE_INTERVAL.
4. You may be interested in the HOWTO at
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
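A minimal condor_config sketch of points 1 to 3, assuming the first two knobs go in the local config of the execute machines and the policy goes in the schedd's config. The factor of 5 and the interval value are illustrative only, not recommendations from this thread. The unit note reflects the HTCondor convention that ResidentSetSize is reported in KiB while MemoryUsage and RequestMemory are in MiB, which is why the original 5000*RequestMemory threshold corresponds to roughly 5x RequestMemory.

```
# Execute machines (point 1): keep kernel tuning enabled so that cgroup
# memory figures do not count kernel file-write buffers (fix in 8.4.5+).
# True is already the default; set explicitly only if it was disabled.
ENABLE_KERNEL_TUNING = True

# Execute machines (point 3): seconds the condor_starter waits after job
# startup before its first update of ResidentSetSize and related
# attributes. 8 is the default described above; lowering it reports
# sooner, but many jobs will not have allocated their memory yet.
STARTER_INITIAL_UPDATE_INTERVAL = 8

# Schedd (point 2): remove on MemoryUsage instead of ResidentSetSize.
# MemoryUsage and RequestMemory are both in MiB, so 5 * RequestMemory is
# close to the original 5000*RequestMemory (ResidentSetSize is in KiB).
# The =!= undefined test makes explicit that jobs which have not yet
# reported any usage are never removed by this expression.
SYSTEM_PERIODIC_REMOVE = (MemoryUsage =!= undefined) && (MemoryUsage > 5 * RequestMemory)
```

After editing the config, a condor_reconfig on the affected machines should pick up the new values.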
Hope the above helps,
Todd