Hi Greg,

On 4/8/26 17:16, Greg Thain via HTCondor-users wrote:
> On 4/8/26 07:04, Emily Kooistra wrote:
>> Yeah, that was also my impression, although being able to limit the total amount of memory a job can store in zswap would be beneficial, given that this is currently unlimited (or, well, up to the system max). Condor could also set memory.zswap.max based on a ClassAd expression, similarly to the other cgroup limits.
>>
>>> In my opinion, that's the advantage of zswap: the kernel manages the lot, there's no tuning needed (balancing between zswap and disk-cache).
>
> Hi Emily:
>
> Historically, we have been cautious about encouraging the use of swap space for jobs. While this can result in an increase in memory utilization and perhaps throughput for a well-controlled and well-understood workflow, it is easy for a poorly behaved job to have astronomically bad results.
>
> I'm curious if your idea is that the *user* (e.g. the job) would control the use of zswap, or the *admin* (e.g. the condor_starter)? Already today we find that users have difficulty estimating and measuring their memory needs.
For swap I fully agree with your point, and yes, users specifying their memory requirements is surprisingly hard. But zswap is a lot more controlled in this respect. You can specify how much of the total system RAM zswap is allowed to use. So, for example, you reserve a percentage of your total memory pool for zswap and do not hand that portion out to jobs as regular memory, only in the form of zswap. And as long as you disable writeback, the jobs' pages never actually get flushed to disk.
The main idea is that a lot of our EPs currently have far more memory than people request, and with the cgroup hard caps that spare memory cannot be used for disk cache. Of course one can raise the hard cap to some degree, but temporarily allowing jobs to burst and spill over into zswap does not hurt, in my opinion. Letting a job burst into zswap for a while is still better than killing it because it ran out of memory, which is worse for throughput.
I would see it as admin-configurable, set up in the same way the cgroup memory limits are configured, including whether writeback is enabled or not. That way you could allocate a 10-20% burst buffer for jobs, if needed, in the form of zswap.
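As a rough sketch of the idea only (this is not existing HTCondor behaviour), the snippet below computes a per-job zswap allowance as a percentage of the requested memory and writes it to the job's cgroup. The cgroup v2 files memory.zswap.max and memory.zswap.writeback are real kernel interfaces (the latter needs Linux 6.8 or newer); the function names, the burst_percent parameter, and the way the cgroup directory is passed in are assumptions made for illustration. It also assumes zswap is enabled on the host and that the cgroup is otherwise allowed to swap.

```python
from pathlib import Path


def zswap_burst_bytes(request_memory_mb: int, burst_percent: int) -> int:
    """Return a zswap allowance in bytes (hypothetical policy: a percentage
    of the memory the job requested)."""
    if request_memory_mb < 0:
        raise ValueError("request_memory_mb must be non-negative")
    if not 0 <= burst_percent <= 100:
        raise ValueError("burst_percent must be between 0 and 100")
    return request_memory_mb * 1024 * 1024 * burst_percent // 100


def apply_zswap_limits(cgroup_dir: Path, request_memory_mb: int,
                       burst_percent: int = 20,
                       writeback: bool = False) -> None:
    """Write the zswap limits into a job's cgroup v2 directory.

    cgroup_dir is assumed to be the job's cgroup, as created by whatever
    manages the job; nothing here creates or locates it.
    """
    limit = zswap_burst_bytes(request_memory_mb, burst_percent)
    # memory.zswap.max caps how much the cgroup may keep in zswap.
    (cgroup_dir / "memory.zswap.max").write_text(f"{limit}\n")
    # memory.zswap.writeback = 0 keeps zswap pages from being written
    # out to the backing swap device (Linux 6.8+).
    (cgroup_dir / "memory.zswap.writeback").write_text(
        "1\n" if writeback else "0\n")
```

On a real EP, cgroup_dir would be the job's directory somewhere under /sys/fs/cgroup and the write needs sufficient privileges. The system-wide share of RAM that zswap may use is set separately through /sys/module/zswap/parameters/max_pool_percent.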
Emily