Hi Thomas,

Do you set the limit, or does your htcondor? Because my htcondor doesn't set that limit. Maybe I'm doing something wrong.

On 24/10/2017 09:17, Thomas Hartmann wrote:
> Hi Todd,
>
> (sorry to fork in between) I am a bit confused regarding the soft limits. So far I had assumed that the kernel would allow a cgroup to exceed its soft limit usage as long as there is free memory available - and kill a group's processes if the system runs low on unwired memory (assuming a translation between the limits in condor and the cgroup limits).

No, the kernel doesn't kill with the soft limit. This is why SYSTEM_PERIODIC_REMOVE is needed.

> So, we have effectively not set a 'real' cgroup hard limit, assuming that the soft limit would be sufficient, e.g., would the kernel kill [1] when exceeding its 4GB soft limit and running low on system-wide memory?

In fact memsw is the place where RAM+swap is limited. However, as pointed out in the thread, you may end up with a job which has 0 memory and 4GB of swap.

> (looking now at the values: would memsw - set to such a large value - actually send the job into heavy swapping...?)
>
> Cheers,
>   Thomas
>
> [1]
> /sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.limit_in_bytes
> 142668537856
> /sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.memsw.limit_in_bytes
> 142668541952
> /sys/fs/cgroup/memory/system.slice/condor.service/condor_var_lib_condor_execute_slot1_6@xxxxxxxxxxxxxxxxx/memory.soft_limit_in_bytes
> 4294967296
>
> On 2017-10-20 18:26, Todd Tannenbaum wrote:
>> On 10/20/2017 9:44 AM, Alessandra Forti wrote:
>>> Hi,
>>> is more information needed?
>>
>> Hi Alessandra,
>>
>> The version of HTCondor you are using would be helpful :). But I have some answers/suggestions below that I hope will help...
>>
>>> * On the head node
>>>   RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
>>>   SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage) || <OtherParameters>
>>>
>>> So the questions are two:
>>> 1) Why didn't SYSTEM_PERIODIC_REMOVE work?
>>
>> Because the (system_)periodic_remove expressions are evaluated by the condor_shadow while the job is running, and the *_RAW attributes are only updated in the condor_schedd. A simple solution is to use the attribute MemoryUsage instead of ResidentSetSize_RAW. So I think things will work as you want if you instead did:
>>
>>   RemoveMemoryUsage = ( MemoryUsage > 2*RequestMemory )
>>   SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage) || <OtherParameters>
>>
>> Note that MemoryUsage is in the same units as RequestMemory, so you only need to multiply by 2 instead of 2000. You are not the first person to be tripped up by this. :( I realize it is not at all intuitive. I think I will add a quick patch to the code to allow _RAW attributes to be referenced inside job policy expressions, to help prevent frustration for the next person.
>>
>> Also, you may want to place your memory limit policy on the execute nodes via a startd policy expression (see the sketch at the end of this message), instead of having it enforced on the submit machine (what I think you are calling the head node). The reason is that the execute node policy is evaluated every five seconds, while the submit machine policy is evaluated only every several minutes. A runaway job could consume a lot of memory in a few minutes :).
>>
>>> 2) Shouldn't htcondor set the job soft limit with this configuration? Or is the site expected to set the soft limit separately?
>>
>> Personally, I think "soft" limits in cgroups are completely bogus. The way the Linux kernel treats soft limits does not do in practice what anyone (including htcondor itself) expects.
>>
>> I recommend setting CGROUP_MEMORY_LIMIT to either none or hard; soft makes no sense imho.
>>
>> "CGROUP_MEMORY_LIMIT=hard" is easy to understand: if the job uses more memory than it requested, it is __immediately__ kicked off and put on hold. This way users get a consistent experience.
>>
>> If you want jobs to be able to go over their requested memory so long as the machine isn't swapping, consider disabling swap on your execute nodes (not a bad idea for compute servers in general) and simply leaving "CGROUP_MEMORY_LIMIT=none". What will happen is that if the system is stressed, eventually the Linux OOM (out-of-memory) killer will kick in and pick a process to kill. HTCondor sets the OOM priority of job processes such that the OOM killer should always pick job processes ahead of other processes on the system. Furthermore, HTCondor "captures" the OOM request to kill a job and only allows it to continue if the job is indeed using more memory than requested (i.e. provisioned in the slot). This is probably what you wanted by setting the limit to soft in the first place.
>>
>> I am thinking we should remove the "soft" option to CGROUP_MEMORY_LIMIT in future releases; it just causes confusion imho. Curious if others on the list disagree...
>>
>> Hope the above helps,
>> regards,
>> Todd
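To illustrate Todd's suggestion above of enforcing the memory limit on the execute nodes, a minimal startd-side sketch could look roughly like the following. This is an assumption-laden illustration, not something quoted from the thread: MEMORY_EXCEEDED is just a local macro name chosen for the example, and it compares the job attribute MemoryUsage against the slot attribute Memory (both in MBytes).

  # Execute-node (startd) configuration sketch - an illustration, not a tested recipe.
  # True once the job's measured memory usage exceeds what the slot provisioned.
  MEMORY_EXCEEDED = ( MemoryUsage =!= undefined && MemoryUsage > Memory )
  # Kick the job off the slot as soon as the condition becomes true
  # (startd policy expressions are re-evaluated every few seconds).
  PREEMPT = ($(PREEMPT:False)) || ($(MEMORY_EXCEEDED))
  # ...and put it on hold instead of letting it go back to idle and match again.
  WANT_HOLD = ($(MEMORY_EXCEEDED))
  WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), "memory usage exceeded request_memory", undefined )

If your HTCondor version ships the policy metaknobs, "use POLICY : HOLD_IF_MEMORY_EXCEEDED" should give roughly the same behaviour without writing these expressions by hand.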
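On the cgroup side, the knob Todd refers to is, if I am not mistaken, spelled CGROUP_MEMORY_LIMIT_POLICY in the execute-node configuration. A minimal sketch of the two variants he recommends:

  # Execute-node configuration sketch - pick one of the two behaviours described above.
  # Hard limit: the job is put on hold as soon as it exceeds its requested memory.
  CGROUP_MEMORY_LIMIT_POLICY = hard
  # No cgroup limit at all: rely on the OOM-killer handling described above,
  # ideally with swap disabled on the execute node.
  # CGROUP_MEMORY_LIMIT_POLICY = none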