On 03/15/2016 02:45 PM, Thomas Hartmann wrote:
> Thank you for the suggestion, Dimitri. But if I understand correctly
> what is happening there, a job that exceeds the limit will be put on
> hold and then rescheduled, even if it would be possible to just
> increase the request_memory on the same machine. We cannot work with
> checkpoints here (at least not using HTCondor's standard universe), so
> jobs would need to rerun from the very beginning.

The obvious question is what you expect to happen when you "increase
the request_memory on the same machine" past the virtual memory the
machine actually has.

In general, when I think of how MemoryUsage plays together with
overcommit, request_memory, the OOM killer, and now cgroups, my brane
hurtz. So I try to keep it simple, stupid: I have other jobs that run
with request_memory = 0. I happen to *know* that the calculated
MemoryUsage for them is wildly incorrect, I *know* how much RAM they
really need, and I run them on dedicated machines that *have* that
RAM. Unless you know those three things, I don't advise running with
"request_memory = 0".
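For reference, the hold-and-release recipe being discussed looks
roughly like this in a submit description. This is a sketch, untested
here; the 2048 MB starting value and the retry cap are made-up
placeholders, and a real setup would also check HoldReasonCode so that
only memory-related holds get released:

    # Start at 2048 MB; after a restart, request 1.5x the last
    # observed MemoryUsage instead of rerunning with the same
    # too-small request. (Units are megabytes.)
    request_memory = ifThenElse(MemoryUsage =!= undefined, \
                                MemoryUsage * 3 / 2, 2048)

    # Put the job on hold while running (JobStatus == 2) if it
    # outgrows its current request...
    periodic_hold = (JobStatus == 2) && (MemoryUsage > RequestMemory)

    # ...and release held jobs (JobStatus == 5) at most a couple of
    # times, so each retry re-evaluates request_memory above. In
    # production, also test HoldReasonCode here so you don't release
    # jobs held for unrelated reasons.
    periodic_release = (JobStatus == 5) && (NumJobStarts < 3)

Note the trade-off Thomas describes: without checkpointing, each
release restarts the job from the very beginning, just with a bigger
memory request.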
HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu