
Re: [HTCondor-users] dynamic allocation of RAM



On 03/15/2016 02:45 PM, Thomas Hartmann wrote:
> Thank you for the suggestion, Dimitri. But if I understand correctly
> what is happening there, a job that exceeds the limit will be put on
> hold and then rescheduled, even when it would be possible to simply
> increase request_memory on the same machine. We cannot work with
> checkpoints here (at least not with HTCondor's standard universe),
> so jobs would need to rerun from the very beginning.

The obvious question is what you expect to happen when you "increase
the request_memory on the same machine" past the virtual memory the
machine actually has.
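
For the record, the usual shape of "hold the job, then restart it
with a bigger request" is a periodic_hold/periodic_release pair plus
a request_memory expression, roughly like the sketch below. This is a
sketch only (the 1.5x growth factor, the 1024 MB floor, and the retry
limits are made-up numbers); check the expressions against your
HTCondor version's documentation before relying on them:

# Sketch: ask for 1024 MB on the first run; after a restart, ask for
# 1.5x the MemoryUsage measured on the previous run (made-up numbers).
request_memory = ifThenElse(MemoryUsage =!= undefined, \
                            3 * MemoryUsage / 2, 1024)

# Hold the job once it outgrows its request ...
periodic_hold = MemoryUsage > RequestMemory

# ... and release it after 5 minutes so it rematches with the larger
# request, giving up after 5 starts. A real setup should also check
# HoldReasonCode so it doesn't release jobs held for other reasons.
periodic_release = (JobStatus == 5) && \
                   (time() - EnteredCurrentStatus > 300) && \
                   (NumJobStarts < 5)

Note that nothing above caps the escalation, which is exactly the
catch: once the request grows past the biggest slot in the pool, the
job never matches again and sits idle forever.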

In general, when I think of how MemoryUsage plays together with
overcommit, request_memory, the OOM killer, and now cgroups, my brane
hurtz. So I try to keep it simple, stupid: I have other jobs that run
with request_memory = 0. I happen to *know* that the calculated
MemoryUsage for them is wildly incorrect, I *know* how much RAM they
really need, and I run them on dedicated machines that *have that
RAM*. But unless you know those three things, I don't advise running
with "request_memory = 0".
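
A minimal sketch of what that setup looks like in a submit file; the
machine names are hypothetical, not from the original mail:

# MemoryUsage is known to be wildly wrong for these jobs, so don't
# let it drive matching at all; placement is handled by pinning the
# jobs to dedicated machines known to have enough RAM.
request_memory = 0
requirements = (Machine == "bigmem01.example.edu" || \
                Machine == "bigmem02.example.edu")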

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
