Hi,

I set up cgroups on my HTCondor cluster some months ago. I expected cgroups to enforce soft limits and HTCondor to remove jobs via SYSTEM_PERIODIC_REMOVE once memory usage reaches twice the requested memory. However, last week a user wreaked havoc on the nodes, using up to 35GB of RSS when his limit should have been 4GB.

My settings are as follows:

* On the WNs:

    # Enable CGROUP
    BASE_CGROUP = /system.slice/condor.service
    CGROUP_MEMORY_LIMIT = soft

* On the head node:

    RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
    SYSTEM_PERIODIC_REMOVE = $(RemoveMemoryUsage) || <OtherParameters>

This is a setup other sites use. The cgroup has no memory limit set, neither soft nor hard.

So I have two questions:

1) Why didn't SYSTEM_PERIODIC_REMOVE work? Here is an example of a job that exceeded the 4GB limit:

    condor_history 66469.0 -autoformat ClusterId 2000*RequestMemory ResidentSetSize_RAW
    66469 4000000 34723028

2) Shouldn't HTCondor set the job's soft limit with this configuration, or is the site expected to set the soft limit separately?

Thanks, cheers,
alessandra
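To double-check the numbers in question 1, here is a minimal sketch that evaluates RemoveMemoryUsage by hand on the values printed by condor_history. It assumes the units documented for job ClassAds (RequestMemory in MiB, ResidentSetSize_RAW in KiB); the helper name `remove_memory_usage` is hypothetical and only mirrors the expression above. On the final ad the expression is true by a factor of about 8.7, so the arithmetic is not the problem; what is unclear is whether, and when, the schedd saw these values while the job was still running.

```python
# Values copied from the condor_history output for job 66469.0.
# Assumed units (HTCondor job ClassAd docs): RequestMemory in MiB,
# ResidentSetSize_RAW in KiB, so 2000*RequestMemory is roughly 2x the request in KiB.
threshold_kib = 4000000  # 2000*RequestMemory as printed
rss_kib = 34723028       # ResidentSetSize_RAW as printed

# Back out RequestMemory from the printed threshold.
request_memory_mib = threshold_kib // 2000


def remove_memory_usage(rss_kib: int, request_memory_mib: int) -> bool:
    """Hypothetical mirror of RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )."""
    return rss_kib > 2000 * request_memory_mib


print(f"RequestMemory: {request_memory_mib} MiB")
print(f"threshold: {threshold_kib / 1024**2:.2f} GiB")
print(f"peak RSS: {rss_kib / 1024**2:.2f} GiB")
print(f"RSS / threshold: {rss_kib / threshold_kib:.2f}")
print(f"RemoveMemoryUsage: {remove_memory_usage(rss_kib, request_memory_mib)}")
```

Using 2000 rather than 2048 in the expression puts the threshold slightly below exactly twice the request (3.81 GiB instead of about 3.91 GiB here), which makes no difference for this job.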