Hi all,

maybe a bit blunt, but would it be possible to update the memory limits of Condor's main cgroup slice directly?

  echo "${NEWLIMIT}" | tee /sys/fs/cgroup/memory/system.slice/condor.service/{memory,memsw}.limit_in_bytes

(This would happen outside of the Condor context, and I have no idea how, or if, Condor would take note of the changed cgroup values.)

Cheers,
  Thomas

On 2018-01-31 22:37, Steve Huston wrote:
> On Wed, Jan 31, 2018 at 12:43 PM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>> What you can't do is tell HTCondor that it can have all of the memory and also let some other scheduler use all the memory
>> and expect HTCondor to dynamically adjust its allocations to account for non-HTCondor memory usage.
>
> I thought that was the whole point?
>
> "All machines in the HTCondor pool advertise their resource
> properties, both static and dynamic, such as *available RAM memory*,
> CPU type, CPU speed, virtual memory size, physical location, and
> current load average, in a resource offer ad." --
> http://research.cs.wisc.edu/htcondor/manual/current/1_2HTCondor_s_Power.html
> (emphasis mine)
>
> Of course I could restrict the memory allowed for Condor, and I could
> probably with the right settings restrict the available memory for
> console (owner) usage to something so that Condor jobs always have
> resources. But just like a CPU core can be used by the owner and then
> free for Condor usage later, I would think RAM should be as well.
>
> On Wed, Jan 31, 2018 at 3:29 PM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>> You could probably do something using a startd cron script to push a value
>> into the slot ads that represents the amount of non-HTCondor memory usage,
>> and then have the START expression refer to that value in order to prevent
>> matches. There will be some delay between when the startd sees the updated
>> value for non-HTCondor usage and when the Negotiator and Schedd see that
>> value, so you will still probably get some jobs starting that then just get OOM
>> killed a little while later, but it won't *keep* happening.
>
> I suppose that's the route I'll have to take, if this becomes
> problematic enough. So far it hasn't before, and I've been running a
> Condor scheduler here for just over 13 years, so it might not be worth
> the hassle. I was just confused that it didn't already exist, and
> figured I was overlooking something simple.
>
> Thanks all.
>
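
A minimal sketch of the startd cron idea quoted above, assuming a Linux host with /proc/meminfo and the cgroup v1 path used at the top of this message. The attribute name NonCondorMemoryMB and both helper functions are hypothetical, not HTCondor built-ins; the script only prints one ClassAd-style "Name = value" line on stdout, which is the form a startd cron job is expected to emit.

```shell
#!/bin/sh
# Hypothetical startd cron helper. NonCondorMemoryMB is an illustrative
# attribute name chosen for this sketch, not something HTCondor defines.

# non_condor_mb TOTAL_KB AVAILABLE_KB CONDOR_BYTES
# Prints memory in use on the host minus the condor cgroup's usage, in MB.
non_condor_mb() {
    used_kb=$(( $1 - $2 ))
    condor_kb=$(( $3 / 1024 ))
    other_kb=$(( used_kb - condor_kb ))
    # Accounting skew can make the difference negative; clamp to zero.
    if [ "$other_kb" -lt 0 ]; then
        other_kb=0
    fi
    echo $(( other_kb / 1024 ))
}

# Reads live values and prints the attribute line for the slot ad.
report() {
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    # Older kernels lack MemAvailable; fall back to 0 (all memory counted as used).
    total_kb=${total_kb:-0}
    avail_kb=${avail_kb:-0}
    # Assumed cgroup v1 location, taken from the path in this thread.
    cg=/sys/fs/cgroup/memory/system.slice/condor.service/memory.usage_in_bytes
    condor_bytes=0
    if [ -r "$cg" ]; then
        condor_bytes=$(cat "$cg")
    fi
    echo "NonCondorMemoryMB = $(non_condor_mb "$total_kb" "$avail_kb" "$condor_bytes")"
}

# Only report when the host actually exposes /proc/meminfo.
if [ -r /proc/meminfo ]; then
    report
fi
```

Such a script would be registered through the STARTD_CRON_JOBLIST, STARTD_CRON_<JobName>_EXECUTABLE and STARTD_CRON_<JobName>_PERIOD settings, and the START expression could then compare NonCondorMemoryMB against a threshold. The exact expression depends on the pool's policy and is left out here. As noted in the quoted reply, the value reaches the Negotiator and Schedd with some delay, so this limits repeated OOM kills rather than preventing every one.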