Add zswap support to the kernel to improve swap performance #2649
Comments
The main problem with zswap is its performance impact on memory-intensive applications. There is a significant drop in performance when the working set is just below the amount of physical RAM, since zswap sets aside a portion of RAM for its pool. It is also only supported on Linux 4.14 or later. For in-memory data that is even slightly compression-unfriendly, LZ4 doesn't bring much benefit either, since it is built for speed and cannot compress three pages into one.
Sometimes the performance impact can be quite major, so this should be carefully evaluated.
This is something I looked at a year ago. Let me check results... The benchmark was running:
and time how long it takes for CPU usage to drop.
Default:
With 256M of ZRAM: 369s
With zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=50: 523s
With zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=50: 419s
I didn't see a benefit from zram/zswap with this particular test. But with all of these tests, whether you see a benefit depends on exactly how much memory is in use, so it's pretty hard to judge whether a change will be beneficial to the majority.
Another thing to mention about zswap is that its pool should never exceed 1/3 of RAM, which means the memory amplification factor never exceeds 50%. It is also much better suited to systems with large amounts of memory than to systems with small amounts, since idle pages occur more often on large systems. On a system with high RAM turnover, low physical RAM, limited memory bandwidth, and limited CPU performance like the Raspberry Pi, allocating more than 25% is a very bad idea. Setting it at 10-15% gives 20-30% memory amplification, which is almost a best-case scenario.
Something seems wrong then, because my Pi 3B can load those pages in about 90 seconds with my current configuration (dropping to a lower steady-state CPU of 20% or so, with all the throbbers done; the Pi blog currently has some GIF video above the fold). And I'm not on a fast internet connection. Your zswap.max_pool_percent is way, way high in those tests, though. Try 15-20% and see how it works. And what sort of heatsinking are you using? My 3B+ is in a FLIRC case, which is the only case I've found so far that will keep it at 1.2GHz sustained.

I'll get a fresh, clean Raspbian install set up (most of my installs are unusual, to say the least) and do some benchmarking along these lines. For me, though, it was more a "the system locks up when I ask it to load this sufficiently complicated website" issue than just a performance/timing issue.

dsx724 - you can easily calculate the amount of traffic that bypasses zswap from the debug counters. The reject_compress_poor count shows how many pages could not be suitably compressed, and that number, while not zero, remains quite low compared to the total pages stored in zswap. If you split the swap partition onto a separate device, you can also see the low rate of actual writes to it.

Though I was running zswap on the stock Pi kernel 4.9 - it's supported, you just have to rebuild the kernel for it (same as on 4.14).
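A quick way to check those counters on a running system (the files below are the standard zswap debugfs entries, readable as root):

# as root: zswap's debugfs directory is only accessible to root
cd /sys/kernel/debug/zswap
# stored_pages: pages currently held compressed in RAM
# reject_compress_poor: pages that compressed too poorly to keep in the pool
# written_back_pages: pages later evicted from the pool to the disk-backed swap
grep -H . stored_pages reject_compress_poor written_back_pages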
I believe my numbers were from a Pi 1 (with the more limited 512MB of SDRAM).
@Syonyk 4.9 only supports zbud, which compresses two pages into one page. 4.14 supports z3fold, which compresses three pages into one page. zbud brings almost negligible memory amplification, while z3fold brings modest benefits. For the Pi 0 it is not recommended at all, since those devices are extremely limited to begin with. For the Pi 2/3, one might be able to get some benefits.
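For anyone who wants to compare the two allocators on their own system, the zpool is an ordinary zswap module parameter, so it can be chosen at boot or switched on the fly (the new allocator applies to pages swapped out from then on); a minimal sketch, assuming a kernel with both allocators built:

# at boot, on the kernel command line:
#   zswap.zpool=zbud      (two compressed pages per 4k page, the conservative choice)
#   zswap.zpool=z3fold    (up to three compressed pages per 4k page, needs 4.14+)
# or at runtime:
echo z3fold | sudo tee /sys/module/zswap/parameters/zpool
cat /sys/module/zswap/parameters/zpool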
During the testing of an application benchmark suite for Libre Computer, no significant benefit was found for zbud or z3fold on 1GB systems. There were significant benefits on 2GB and 4GB systems. For matrix compute workloads that used slightly under the total physical RAM, there were significant regressions across the board, but that was an expected corner case.
Ah, that would explain a lot - the Pi 1 is a far less powerful system. But we shouldn't be restricting the Pi 3B/3B+ boards based on benchmarking results from an older system, as they're under a separate kernel. Though I admit that I don't have anything older than a Pi 2 to run tests on. I expect the benefits are greater on a multi-core system, where otherwise-idle cores can be used for the compression and swapping as well. I'm fairly certain the 4.9 kernel did support z3fold, as I was using it, but as the version currently in use is 4.14, it doesn't matter anymore.

I will attempt to get some benchmarks done on the 3B+ with some different configurations. I recognize that my goals (light desktop use) are not entirely what the Pi is built for, but with proper tuning, it's entirely possible to make it work better than it does out of the box. One thing to be aware of with wallclock-based benchmarks is the throttling behavior of the CPUs. The 3B, without a heatsink, goes into throttling very quickly. The 3B+ is far better about that.

How many people are running matrix compute workloads sized to fit in RAM on a Raspberry Pi, though? That seems oddly specialized for a general-purpose little SBC.
Before 4.14, zswap would lock up quite often, which made it unsuitable. There are a lot of compute workloads optimized for the amount of RAM on a Pi; I brought up our matrix workload just as one example among many others. Those workloads also hit and replace that data constantly, which can wreak havoc when paired with zswap. Reserving 100-200MB for the zswap pool is quite significant, so such a change should be evaluated carefully.
I don't believe zswap "reserves" memory - it will use memory as needed, but if swap is not being used, zswap shouldn't interfere with other workloads. If those workloads have been pushing stale pages out to disk swap and they go into zswap instead, then yes, I can see that causing performance regressions. The max_pool_percent parameter sets an upper limit, but the pool will often be smaller.
Sorry, I may be using the wrong wording. I meant the steady-state amount of RAM allocated to the zpool when under pressure. I think a low zpool size like 10%, a quick compressor with a decent ratio, and z3fold definitely soften the hit to the response-time curve, so that there are fewer IO stalls in desktop use cases. It should definitely be evaluated. The benefit is just not as pronounced as on large-memory systems, where it is night and day.
I think 10% or 15% is a good size to target for desktop use, certainly. And if people have use cases that require it, they can tune it higher at runtime as well. In the event that something is custom-written to take advantage of all the memory without swap, certainly you could get into trouble if you've filled zswap first. But if whatever was using memory has terminated, the zswap buffer should shrink down and leave more physical RAM available than, at least, not having swap. It's not quite as good as pushing swap purely to disk for some edge cases, certainly, but I do question how often the Pi hardware is used, with stock desktop Raspbian, for those use cases.

I still think that having more forgiving desktop settings, for the desktop install, is the right path. I think I finally even found a good MicroSD card to put the latest stock image on for testing!
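For reference, the pool ceiling is exposed as a module parameter, so it can be inspected and raised on a live system with no reboot; a minimal sketch:

# show the current zswap parameters (enabled, compressor, zpool, max_pool_percent, ...)
grep -H . /sys/module/zswap/parameters/*
# raise the ceiling to 20% of RAM for a memory-hungry session; takes effect immediately
echo 20 | sudo tee /sys/module/zswap/parameters/max_pool_percent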
Experimental results on the current 4.14 kernel in the Raspberry Pi kernel repo: z3fold is unstable under heavy load and still has some null-pointer-dereference issues that cause kernel panics and make it unusable when swap is being thrashed. Which, obviously, is bad. It behaves properly (and very nicely) under light load, but not heavy load, and the failures are still an open problem per kernel-list discussions.

As such, I retract my request to set it up by default, but I would still like to propose building the zswap/lzo/zbud modules as part of the default kernel install so that people can use them if they wish. The only difference, if they're modules, will be a small increase in disk space for the kernel. When z3fold becomes stable (again? I don't recall the issues with 4.9, but there have been some changes in z3fold since 4.9), we can revisit this.

Benchmarking mostly consisted of opening https://inbox.google.com, https://docs.google.com, http://reddit.com, and a few other sites, and watching to see if I got kernel panics and how the debug parameters looked. I consistently saw a compression ratio of about 2.95 with lzo and z3fold, as larger pages spill to disk. With 15% of RAM in use for the pool, this offers ~450MB of useful swap, plus the disk-backed pages, and allows far better behavior under at least browser load.
If you merge some commits, zswap with z3fold on 4.14 is stable.
Could you point me to those? I've attempted to merge some of the recent patches I've found lying around, and while they change the nature of the kernel panics, I can still reliably produce a panic. Or if there's another bug somewhere better suited to tracking this, I can work there. It only seems to die under really heavy pressure.
https://bugs.chromium.org/p/chromium/issues/detail?id=822360 I think only one made it into 4.18. They should all be in 4.19.
I don't appear to have access to all of the commits. Is there a chance you could paste the fully updated z3fold.c file to try against 4.14, if you have access? Or I can try a 4.19 build.
If I use the z3fold.c file out of the 4.19 kernel tree and apply the last patch listed (https://lore.kernel.org/patchwork/patch/959862/), I get the following results (on multiple block devices, so I don't believe this is purely a corrupted swap partition):

[81427.960029] Adding 4194300k swap on /dev/mmcblk0p2. Priority:-2 extents:1 across:4194300k SSFS

At this point, running swapoff is likely to hard-freeze the system, and kswapd0 is using 100% of CPU until I reboot. If I use zbud with the exact same configuration, I don't have this error. I'm unable to reproduce the issue with the stock swapfile size, however. With the stock swapfile size of 100MB, I fill swap (apparently without using the swapfile) to 100MB, using typically 35-40MB of zswap space. If I attach a larger swapfile, and add memory pressure until
Continued updates on this: With the 4.18 source in the tree (at least as of several days ago), with z3fold, I can still reproduce lockups under heavy memory pressure with Chromium, though I've not had luck with some simpler synthetic test cases. In all cases I've tested, zbud performs properly (without either a hard lockup or a kernel bug).

I typically set the pool limit to 15% of RAM (so ~150MB), with a large swapfile (~1GB) attached. Start loading up webpages in Chromium, in multiple tabs, while monitoring the zswap use stats (/sys/kernel/debug/zswap). Once the pool size approaches the set limit and zswap has to start rejecting pages or pushing old pages out to disk, the system will likely either lock up entirely or suffer a kernel bug (I tend to monitor dmesg and stop as soon as I see the bug, as sustained stress will then lock the system). With zbud, I cannot reproduce this behavior at all; it just works as expected.

However, with the same_filled_page support in 4.18, I still see very good compression ratios, typically from 2-4, depending on use and how long the system has been up. This still deflects a large amount of traffic from the disk-backed swap file/partition. Currently, on a system that's been up under typical desktop use (browser, IRC, terminals, etc.) for several days: Compression ratio: 2.15x
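A convenient way to keep an eye on those stats while the workload runs (assuming the standard watch utility):

# refresh the full zswap debugfs view every second; needs root for debugfs
sudo watch -n1 grep -R . /sys/kernel/debug/zswap/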
I would like to at least get zswap included in the stock kernel, even if not enabled. The Pi kernel includes zram support as a module, and adding zswap support (with at least zbud) as a module would allow people to use it as well, if their use case works for it. I still think enabling it by default would be good for typical use, though with the stock 100MB swap file, it's somewhat limited in how effective it can be (as it doesn't use the swapfile, but still "counts" as swap file used).
I can still reproduce a crash on z3fold with the 4.19 kernel in the tree, as of commit 0038f2a:

[ 311.955909] z3fold: unknown buddy id 0
Good day to all! PS: You can test it, and ask me if you have questions. PPS: I wrote the first Russian manual for compiling a kernel with zswap+z3fold back in mar17, for rpi-4.9.y.
@Syonyk Maybe email the mailing list or Jongseok Kim directly?
I've struggled to reproduce it on x86. But that patch may be worth trying.
I would try the latest patch without Jongseok Kim's patch.
Raspberry Pi 3B here, running 32-bit Gentoo GNU/Linux on armv7a-unknown-linux-gnueabihf with kernel 4.18.17. The system is stable; I have not seen any issues so far. The settings:
Current values:
Swap usage:
Compression ratio (likely high due to the large number of same-filled pages): 7.7, obtained from the zswap debugfs counters.
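One way to compute that ratio from the counters (a sketch; assumes bc is installed):

# as root: uncompressed bytes stored (stored_pages x 4096) over the pool's actual size
cd /sys/kernel/debug/zswap
echo "scale=2; $(cat stored_pages) * 4096 / $(cat pool_total_size)" | bc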
Kernel swappiness:
HTH,
Tested on an RPi 3 and RPi 3B+ with a self-built arm64 kernel on Debian testing. Works like a charm. With systemd-swap it's also a no-brainer.
I've been using z3fold on the 4.20 branch with zero observed issues, so I think it's mature enough to fold in as modules. What is required to make this happen? A pull request?
I would also like to ask what is required to make this happen.
Don't use z3fold until mid-4.20; it's still not stable in 4.19. However, it doesn't matter that much. Once you add same-filled page merging, the value of z3fold drops significantly. In long-term running, I see roughly the same total compression with z3fold and zbud, as long as same-filled merging is enabled. The value of z3fold seems to mostly be allowing one to store a zero-filled page in the slack between two data-filled pages. In some situations, I'm sure it has a slight benefit, but I genuinely couldn't tell you which allocator I'm running based on the total compression in recent kernel versions. I suppose I can submit a patch to the config and put it in the pipeline.
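The effect of same-filled merging is visible directly in debugfs on 4.16 and later kernels, which expose a separate counter for it; a quick check:

# as root: pages stored without touching the allocator at all vs. total pages stored;
# the larger this fraction, the smaller the practical difference between zbud and z3fold
cd /sys/kernel/debug/zswap
grep -H . same_filled_pages stored_pages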
My call is for zswap in 4.19; apologies for the wording. z3fold, yeah, great, and if it needs to come later then OK.
As far as I can see this still isn't available - any chance we could enable it sometime soon?
I suppose I'll get a pull request together... unless someone else wants to do it!
zswap for the Raspbian .conf please :)
+1 for adding zswap & all the folding & compression goodies as modules to the official kernel configuration and saving us the time & energy to recompile.
I've created a patch to add the zswap feature to the three kernels - bcmrpi_defconfig, bcm2709_defconfig, bcm2711_defconfig. Linked above. I'm slightly unclear as to what the proper next step is, though. Hopefully the patch gets pulled in soon!
This is definitely an improvement, especially for low-memory boards. Waiting to see it merged.
Sorry, I misread the issue as a request to enable zswap by default (as RAM compression is in Windows, Mac, Android...). What are your thoughts @Syonyk? Did you find any standard benchmark to show the improvement? Web browsing definitely improves.
I haven't come up with any synthetic benchmark suites to demonstrate the improvement, but anything with reasonably compressible memory that goes to swap ought to be improved. I just use zswap on my memory-limited systems and it radically improves how much they can handle. However, with the Pi 4 having sane amounts of memory (4GB/8GB), it matters a bit less for my needs anymore. As it's part of the Pi kernel now, I see no reason to keep this thread open, though.
In personal experimentation working towards "Using the Raspberry Pi 3B/3B+ as a light duty desktop," I've discovered that fronting my swap file with zswap makes a huge difference in system capability, most notably in how Chromium functions. With stock settings, Chromium on the Pi 3B cannot load Google Inbox (https://inbox.google.com, assuming one's account is enabled) or Google Docs properly. With zswap enabled, I can load both, simultaneously, and still have a usable system.
Under normal operation, with Chromium running, I have a fully responsive system with 300-500MB of memory swapped out - this being memory that, while not able to be discarded, is not actively in use.
I'm aware of the concerns about thrashing the SD card (and the glacial performance of said SD card under swap use), which is why zswap works so well.
zswap, in a nutshell, is a compressed frontend for swap. It's quite configurable, with multiple compression options (lzo and lz4 being the most useful), several ways of storing compressed pages in memory (two and three "slots" per 4k page for compressed data), configurable in terms of percent of total system memory it will cache, etc.
It also includes a LRU (Least Recently Used) algorithm for evicting pages from compressed swap to physical disk swap when the cache is full, which prevents the priority inversion issues one can run into when using zram and physical swap with priorities set (zram fills up with the first stuff swapped out, which is typically least important, leaving physical disk to handle the later, higher priority swap that you'd like to keep in RAM).
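To make the priority-inversion point concrete, here is a sketch of the zram-plus-disk arrangement being described, using standard swapon priorities (the device and file paths are illustrative):

# zram plus a disk swapfile relies on priorities: zram fills first with the earliest
# (often least important) evictions, pushing later, hotter pages to the slow disk.
sudo swapon -p 100 /dev/zram0   # highest priority: used first
sudo swapon -p 10  /var/swap    # only used once zram is full
# zswap needs no such ordering: it fronts whatever swap is active, and when its pool
# fills it writes its least-recently-used compressed pages back to the disk swap.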
Enabling zswap in the kernel requires the following changes to .config:
CONFIG_ZSWAP=y
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
CONFIG_Z3FOLD=y
And, optionally, if you want lzo compression (somewhat better than lz4, but somewhat slower):
CONFIG_CRYPTO_LZO=y
I believe these can be built as modules as well, with no loss of functionality. If zswap is not enabled by default, they should be modules, but if the decision is made to use zswap on all installs, these should be built in.
To enable zswap, there needs to be a backing swapfile (already the case, though 100MB is a bit small in 2018), and zswap needs to be enabled via kernel parameters. I've done this in /boot/cmdline.txt, though other locations would probably work as well.
At a minimum, this requires: zswap.enabled=1
I've also set the following on my install, though a smaller value may be a better default initially. With Chromium being as memory hungry as it is, I normally raise this at runtime.
zswap.max_pool_percent=15
One can also set:
zswap.zpool=z3fold
However, while this worked properly on 4.9, with 4.14, I've seen a few kernel oopses related to this (buddy ID of 0 - I haven't worked out the details on this bug), so I have reverted to using zbud for now. The effective compression is worse than using z3fold, but the stability is better.
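Putting the pieces above together, the additions to the single line in /boot/cmdline.txt for the configuration described here (lzo compression, zbud, 15% pool) look something like this:

zswap.enabled=1 zswap.compressor=lzo zswap.zpool=zbud zswap.max_pool_percent=15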
Current zswap parameters on my light desktop:
root@raspberrypi:/sys/kernel/debug/zswap# grep -R .
stored_pages:68856
pool_total_size:151773184
duplicate_entry:0
written_back_pages:0
reject_compress_poor:1091
reject_kmemcache_fail:0
reject_alloc_fail:0
reject_reclaim_fail:0
pool_limit_hit:0
This works out to 282MB of data swapped into 151MB, for a compression ratio of 1.86:1. z3fold is better, but, as previously noted, seems somewhat unstable right now. I will investigate that further when I have time.
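That ratio follows directly from the counters above; a quick check of the arithmetic (assuming bc):

# stored_pages x 4096 bytes of uncompressed data over pool_total_size bytes of pool
echo "scale=3; 68856 * 4096 / 151773184" | bc   # 1.858, i.e. roughly 1.86:1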
I encourage the maintainers to build a kernel with zswap enabled, and use Chromium for a while to observe the difference. It makes a substantial difference in what can be loaded without grinding the system to a halt. If you still run into memory pressure, try adding more swapfile. I currently have a 4GB swapfile, which is entirely excessive and mostly unused, but I'm experimenting and have no particular storage pressures at the moment. I would suggest increasing the default swapfile size to 200MB if zswap is used, although this may be something to simply note for users.
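On Raspbian the swapfile size is managed by dphys-swapfile, so one way to move to the suggested 200MB (assuming the stock dphys-swapfile setup) would be:

# set CONF_SWAPSIZE (in MB) and let the service recreate the swapfile
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=200/' /etc/dphys-swapfile
sudo systemctl restart dphys-swapfile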
Unlike zram, zswap allows data to overflow out of RAM to physical swap, which allows for better system performance and a higher ratio of "Getting stuff that's actually unused out of RAM."
Some relevant documentation for reading:
https://www.kernel.org/doc/Documentation/vm/zswap.txt
https://lwn.net/Articles/537422/
Let me know what sort of benchmarking or other testing you would like to see in this thread. I understand that the maintainers are touchy about adding anything that requires additional kernel size, but experimentally, zswap is a massive win in terms of usability with the new default browser of Chromium.