Re: [HTCondor-users] zswap for jobs
Hi all,
to add another data point: we have also been using zswap on all EPs for many years, and for a year or so on all the desktops we manage as well.
We have not yet encountered any issues. In fact, we occasionally have workloads where users run garbage-collected languages (or hoard memory in self-written applications...) that benefit from actually swapping out almost-never-accessed memory.
In some cases such workloads seem hard to avoid, when the memory access patterns of the algorithms they use are hard to predict and involve a significant fraction of rarely accessed memory.
zswap definitely helps in such cases. One reason is, of course, that "burst to zswap temporarily is still better than killing a job when it runs out of memory" (quoting Emily); the other is that swapping into zswap is reasonably cheap, and the freed RAM can be used more efficiently (e.g. for file cache).
I've yet to see a clean study of this (or we need to find the time to do one), but this is what I observe when investigating cases of heavy memory pressure: zswap always seems to improve the situation in the cases I checked manually.
On our production systems, we also switched to zswap.zpool=zsmalloc about half a year ago (this is not yet the default on all distros / enterprise Linuxes, but it improves compression ratios a bit).
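For reference, the zpool allocator can be inspected and switched at runtime on any kernel built with zswap support; this is a minimal sketch (the sysfs paths are the standard zswap module parameters, the GRUB line is just an example of how one might persist the setting):

```shell
# Show all current zswap module parameters (enabled, zpool, compressor, ...)
grep -r . /sys/module/zswap/parameters/

# Switch the pool allocator to zsmalloc on the running system
echo zsmalloc > /sys/module/zswap/parameters/zpool

# To persist across reboots, set it on the kernel command line instead,
# e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="... zswap.enabled=1 zswap.zpool=zsmalloc"
```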
On my personal systems, I went "the other way" and run without a swap partition, with zswap.enabled=0 and these sysctl overrides:
vm.swappiness = 180
vm.watermark_boost_factor = 0
vm.watermark_scale_factor = 125
vm.page-cluster = 0
(see also: https://wiki.archlinux.org/title/Zram#Optimizing_swap_on_zram ). That works quite well for me to swap out things like idle Firefox and Thunderbird memory regions into compressed "cold" memory ;-).
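A setup along those lines might be sketched as follows (sizes and the zstd algorithm are example choices, not a recommendation; zswap.enabled=0 is needed so zswap does not sit in front of the zram device):

```shell
# Create a zram-backed swap device (zramctl is part of util-linux)
dev=$(zramctl --find --size 8G --algorithm zstd)
mkswap "$dev"
swapon --priority 100 "$dev"

# Apply the sysctl overrides from above (persist them via /etc/sysctl.d/)
sysctl -w vm.swappiness=180 \
          vm.watermark_boost_factor=0 \
          vm.watermark_scale_factor=125 \
          vm.page-cluster=0
```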
This is likely not the best choice for servers (zram with a disk-backed backing store might be, but I never got around to testing it), so on servers we use zswap with a classic swap disk device as described before.
I second the "admin configurable" point raised by Emily ;-).
Cheers,
Oliver
On 09.04.26 at 13:06, Emily Kooistra wrote:
Hi Greg,
On 4/8/26 17:16, Greg Thain via HTCondor-users wrote:
On 4/8/26 07:04, Emily Kooistra wrote:
In my opinion, that's the advantage of zswap: the kernel manages the lot, there's no tuning needed (balancing between zswap and disk cache).
Yes, that was also my impression, although being able to limit the total amount of memory a job can store in zswap would be beneficial, given that right now it is unlimited (or, well, up to the system max). Condor could also set memory.zswap.max based on a ClassAd expression, similar to the other cgroup limits.
Hi Emily:
Historically, we have been cautious about encouraging the use of swap space for jobs. While this can result in an increase in memory utilization, and perhaps throughput, for a well-controlled and well-understood workflow, it is easy for a poorly behaved job to have astronomically bad results.
I'm curious if your idea is that the *user* (e.g. the job) would control the use of zswap, or the *admin* (e.g. the condor_starter)? Already today we find that users have difficulty estimating and measuring their memory needs.
For swap I fully agree with your point; users specifying their memory requirements correctly is surprisingly hard. But zswap is a lot more controlled in this respect. One can specify how much of the system's total RAM may be used by zswap. So, for example, you reserve a percentage of your total memory pool for zswap, and then simply do not hand that memory out to jobs directly, but only in the form of zswap. And as long as you disable writeback, you never actually flush it to disk for the jobs.
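The system-wide reservation mentioned here maps onto an existing zswap module parameter; a minimal sketch (the 25 is an example value, the default cap is 20% of RAM):

```shell
# zswap caps its compressed pool at a percentage of total RAM
cat /sys/module/zswap/parameters/max_pool_percent    # default: 20

# Raise the cap, i.e. reserve a larger share of RAM for compressed pages
echo 25 > /sys/module/zswap/parameters/max_pool_percent
```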
The main idea is mostly that on a lot of our EPs we currently have far more memory than people request, and with the cgroup hard caps the surplus cannot even be used for disk cache. Of course one can increase the hard cap to some degree, but temporarily allowing jobs to burst and spill over into zswap does not hurt, in my opinion. In that sense, allowing a job to burst into zswap temporarily is still better than killing the job when it runs out of memory; that is worse for throughput.
I would see it as admin configurable, in the same way the cgroup memory limits are configured, including whether writeback is enabled or not. That way you could allocate a 10-20% burst buffer for jobs, if needed, in the form of zswap.
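The per-job knobs this proposal relies on already exist in the cgroup v2 memory controller; a hedged sketch of what the starter might write (the cgroup path is hypothetical, memory.zswap.max requires kernel >= 5.19, and memory.zswap.writeback a considerably newer kernel still):

```shell
# Hypothetical cgroup path for a condor job slot
CG=/sys/fs/cgroup/htcondor/job_slot1

# Cap how much this job may store compressed in zswap (example: 2 GiB)
echo $((2 * 1024 * 1024 * 1024)) > "$CG/memory.zswap.max"

# Forbid writeback, so compressed pages never spill to the disk swap device
echo 0 > "$CG/memory.zswap.writeback"
```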
Emily
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.055
Käthe-Kümmel-Straße 1
53115 Bonn
--
Tel.: +49 228 73 2367
Fax: +49 228 73 7869
--