Hi Todd,

(sorry to fork in between) I am a bit confused regarding the soft limits.
So far I had assumed that the kernel would allow a cgroup to exceed its
soft limit as long as there is free memory available - and kill a group's
processes if the system runs low on unwired memory (assuming the Condor
limits translate to the cgroup limits).

So, we have effectively not set a 'real' cgroup hard limit, assuming that
the soft limit would be sufficient - e.g., would the kernel kill the group
in [1] when it exceeds its 4 GB soft limit and the system runs low on
memory? (Looking now at the values: would memsw, set to such a large
value, actually let the job swap heavily...?)

Cheers,
  Thomas

[1]
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.limit_in_bytes
  142668537856
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.memsw.limit_in_bytes
  142668541952
/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
  4294967296

On 2017-10-20 18:26, Todd Tannenbaum wrote:
> On 10/20/2017 9:44 AM, Alessandra Forti wrote:
>> Hi,
>>
>> is more information needed?
>>
>
> Hi Alessandra,
>
> The version of HTCondor you are using would be helpful :).
>
> But I have some answers/suggestions below that I hope will help...
>
>>> * On the head node
>>>
>>> RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
>>> SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage) || <OtherParameters>
>>>
>>> So the questions are two
>>>
>>> 1) Why didn't SYSTEM_PERIODIC_REMOVE work?
>
> Because the (system_)periodic_remove expressions are evaluated by the
> condor_shadow while the job is running, and the *_RAW attributes are
> only updated in the condor_schedd.
>
> A simple solution is to use the attribute MemoryUsage instead of
> ResidentSetSize_RAW. So I think things will work as you want if you
> instead did:
>
>   RemoveMemoryUsage = ( MemoryUsage > 2*RequestMemory )
>   SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage) || <OtherParameters>
>
> Note that MemoryUsage is in the same units as RequestMemory, so you
> only need to multiply by 2 instead of 2000.
>
> You are not the first person to be tripped up by this. :( I realize it
> is not at all intuitive. I think I will add a quick patch to the code
> to allow _RAW attributes to be referenced inside job policy
> expressions, to help prevent frustration for the next person.
>
> Also, you may want to place your memory limit policy on the execute
> nodes via a startd policy expression, instead of having it enforced on
> the submit machine (what I think you are calling the head node). The
> reason is that the execute node policy is evaluated every five seconds,
> while the submit machine policy is evaluated every several minutes. A
> runaway job could consume a lot of memory in a few minutes :).
>
>>> 2) Shouldn't htcondor set the job soft limit with this configuration?
>>> or is the site expected to set the soft limit separately?
>>>
>
> Personally, I think "soft" limits in cgroups are completely bogus. The
> way the Linux kernel treats soft limits does not do in practice what
> anyone (including HTCondor itself) expects. I recommend setting
> CGROUP_MEMORY_LIMIT to either none or hard; soft makes no sense imho.
>
> "CGROUP_MEMORY_LIMIT=hard" is easy to understand: if the job uses more
> memory than it requested, it is __immediately__ kicked off and put on
> hold.
> This way users get a consistent experience.
>
> If you want jobs to be able to go over their requested memory so long
> as the machine isn't swapping, consider disabling swap on your execute
> nodes (not a bad idea for compute servers in general) and simply
> leaving "CGROUP_MEMORY_LIMIT=none". What will happen is that, if the
> system is stressed, the Linux OOM (out-of-memory) killer will
> eventually kick in and pick a process to kill. HTCondor sets the OOM
> priority of job processes such that the OOM killer should always pick
> job processes ahead of other processes on the system. Furthermore,
> HTCondor "captures" the OOM request to kill a job and only allows the
> kill to go ahead if the job is indeed using more memory than it
> requested (i.e. than was provisioned in the slot). This is probably
> what you wanted by setting the limit to soft in the first place.
>
> I am thinking we should remove the "soft" option to CGROUP_MEMORY_LIMIT
> in future releases; it just causes confusion imho. Curious if others on
> the list disagree...
>
> Hope the above helps,
> regards,
> Todd
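
For anyone reading along: Todd's suggestion to enforce the memory policy
on the execute nodes via a startd expression is not spelled out above, so
here is a rough, untested sketch of what it might look like in the
condor_config of the execute nodes. MEMORY_EXCEEDED is just a local macro
name chosen here for readability; MemoryUsage (job ad) and Memory (slot
ad) are the standard attributes, both in megabytes, and the sketch assumes
PREEMPT and WANT_SUSPEND already have (default) definitions to append to.

  # Untested sketch for the execute nodes' condor_config: put jobs on
  # hold when they use more memory than the slot provisioned.
  MEMORY_EXCEEDED = (MemoryUsage =!= UNDEFINED) && (MemoryUsage > Memory)

  # Evict and hold (rather than merely evict) a job that goes over.
  PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
  WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
  WANT_HOLD = ($(MEMORY_EXCEEDED))
  WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
                                 "Job exceeded its requested memory", \
                                 undefined )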
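
A small shell sketch (untested, bash assumed) for making sense of the raw
byte values quoted in [1]: it just prints each limit file in GiB. With the
values above, the hard and memsw limits come out to roughly 132.9 GiB
(presumably close to the machine total), while the soft limit is exactly
4 GiB, i.e. the slot's provisioned memory.

  # Untested sketch: print the memory limits of the slot cgroup from [1]
  # in GiB instead of raw bytes. The path is copied from the footnote;
  # adjust it for your own slot/cgroup name.
  CG=/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx
  for f in memory.limit_in_bytes memory.memsw.limit_in_bytes memory.soft_limit_in_bytes; do
      printf '%-28s %8.1f GiB\n' "$f" "$(awk '{ print $1 / 1024^3 }' "$CG/$f")"
  done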
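
Finally, since Todd mentions that HTCondor raises the OOM-kill priority of
job processes, here is another untested shell sketch for checking that on
a worker node: it lists the oom_score_adj of every process in the slot's
cgroup (again reusing the path from [1]); higher values mean the kernel's
OOM killer prefers that process as a victim.

  # Untested sketch: show the OOM score adjustment of every process in
  # the slot cgroup, to check that job processes are preferred victims
  # of the kernel OOM killer. Path reused from [1].
  CG=/sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx
  while read -r pid; do
      printf 'pid %-7s oom_score_adj %s\n' "$pid" "$(cat /proc/$pid/oom_score_adj 2>/dev/null)"
  done < "$CG/cgroup.procs"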