Add zswap support to the kernel to improve swap performance #2649

Closed
Syonyk opened this issue Aug 16, 2018 · 40 comments
@Syonyk

Syonyk commented Aug 16, 2018

In personal experimentation working towards "Using the Raspberry Pi 3B/3B+ as a light duty desktop," I've discovered that fronting my swap file with zswap makes a huge difference in system capability, most notably in how Chromium functions. With stock settings, Chromium on the Pi 3B cannot load Google Inbox (https://inbox.google.com, assuming one's account is enabled) or Google Docs properly. With zswap enabled, I can load both, simultaneously, and still have a usable system.

Under normal operation, with Chromium running, I have a fully responsive system with 300-500MB of memory swapped out - this being memory that, while not able to be discarded, is not actively in use.

I'm aware of the concerns about thrashing the SD card (and the glacial performance of said SD card under swap use), which is why zswap works so well.

zswap, in a nutshell, is a compressed frontend for swap. It's quite configurable: there are multiple compression options (lzo and lz4 being the most useful), several ways of storing compressed pages in memory (zbud and z3fold, with two and three "slots" per 4k page for compressed data, respectively), a configurable cap on the percent of total system memory it will use for the cache, etc.

It also includes an LRU (Least Recently Used) algorithm for evicting pages from compressed swap to physical disk swap when the cache is full, which prevents the priority inversion issues one can run into when using zram and physical swap with priorities set (zram fills up with the first stuff swapped out, which is typically least important, leaving physical disk to handle the later, higher priority swap that you'd like to keep in RAM).

Enabling zswap in the kernel requires the following changes to .config:
CONFIG_ZSWAP=y
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
CONFIG_Z3FOLD=y

And, optionally, if you want lzo compression (a somewhat better compression ratio than lz4, but somewhat slower):
CONFIG_CRYPTO_LZO=y

I believe these can be built as modules as well, with no loss of functionality. If zswap is not going to be enabled by default, they should be built as modules, but if the decision is made to use zswap on all installs, they should be built in.
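One way to see how an existing kernel build handles the options listed above is to scan its config text. The sketch below is only an illustration: the helper name and the sample text are hypothetical, and it assumes the config is available as plain text (for example the `.config` in a build tree). It reports whether each option is built in (`y`), a module (`m`), or not set.

```python
# Options named in this issue; CONFIG_CRYPTO_LZO is the optional one.
OPTIONS = [
    "CONFIG_ZSWAP",
    "CONFIG_ZPOOL",
    "CONFIG_ZBUD",
    "CONFIG_Z3FOLD",
    "CONFIG_CRYPTO_LZO",
]


def zswap_option_states(config_text, options=OPTIONS):
    """Map each option to 'y', 'm', or None (unset or absent)."""
    states = {name: None for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines are skipped here
        key, sep, value = line.partition("=")
        if sep and key in states and value in ("y", "m"):
            states[key] = value
    return states


if __name__ == "__main__":
    # Made-up sample config, not taken from a real Pi kernel.
    sample = (
        "CONFIG_ZSWAP=y\n"
        "CONFIG_ZPOOL=y\n"
        "CONFIG_ZBUD=y\n"
        "# CONFIG_Z3FOLD is not set\n"
        "CONFIG_CRYPTO_LZO=m\n"
    )
    for name, state in zswap_option_states(sample).items():
        print(name, state or "not set")
```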

To enable zswap, there needs to be a backing swapfile (already the case, though 100MB is a bit small in 2018), and zswap needs to be turned on via kernel parameters. I've done this in /boot/cmdline.txt, though other locations would probably work as well.

At a minimum, this requires: zswap.enabled=1

I've also set the following on my install, though a smaller value may be a better default initially. With Chromium being as memory hungry as it is, I normally raise this at runtime.
zswap.max_pool_percent=15

One can also set:
zswap.zpool=z3fold

However, while this worked properly on 4.9, with 4.14, I've seen a few kernel oopses related to this (buddy ID of 0 - I haven't worked out the details on this bug), so I have reverted to using zbud for now. The effective compression is worse than using z3fold, but the stability is better.
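To double-check which zswap parameters a given command line actually carries, a small parser is enough. This is a sketch with a hypothetical helper name and a made-up example command line; on a running system the same text can be read from /proc/cmdline, and the live values should also be visible under /sys/module/zswap/parameters/.

```python
def zswap_cmdline_params(cmdline):
    """Return {'enabled': '1', ...} for every zswap.<name>=<value> token."""
    prefix = "zswap."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


if __name__ == "__main__":
    # Illustrative command line; only the zswap.* tokens come from this issue.
    example = (
        "console=tty1 rootwait "
        "zswap.enabled=1 zswap.max_pool_percent=15 zswap.zpool=zbud"
    )
    print(zswap_cmdline_params(example))
```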

Current zswap parameters on my light desktop:

root@raspberrypi:/sys/kernel/debug/zswap# grep -R .
stored_pages:68856
pool_total_size:151773184
duplicate_entry:0
written_back_pages:0
reject_compress_poor:1091
reject_kmemcache_fail:0
reject_alloc_fail:0
reject_reclaim_fail:0
pool_limit_hit:0

This works out to 282MB of data swapped into 151MB, for a compression ratio of 1.86:1. z3fold is better, but, as previously noted, seems somewhat unstable right now. I will investigate that further when I have time.
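The arithmetic behind those figures can be reproduced from the debug counters. The sketch below assumes 4 KiB pages and treats `stored_pages * page size` as the uncompressed size; the function names are hypothetical, and the input is the `grep -R .` output format shown above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_counters(grep_output):
    """Turn 'name:value' lines into a dict of integer counters."""
    counters = {}
    for line in grep_output.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name] = int(value)
    return counters


def compression_ratio(counters, page_size=PAGE_SIZE):
    """Uncompressed bytes held by zswap divided by pool bytes used."""
    stored_bytes = counters["stored_pages"] * page_size
    return stored_bytes / counters["pool_total_size"]


if __name__ == "__main__":
    output = (
        "stored_pages:68856\n"
        "pool_total_size:151773184\n"
        "reject_compress_poor:1091\n"
    )
    c = parse_counters(output)
    stored_mb = c["stored_pages"] * PAGE_SIZE / 1e6
    pool_mb = c["pool_total_size"] / 1e6
    print(f"{stored_mb:.1f} MB in {pool_mb:.1f} MB, "
          f"ratio {compression_ratio(c):.2f}:1")
```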

I encourage the maintainers to build a kernel with zswap enabled, and use Chromium for a while to observe the difference. It makes a substantial difference in what can be loaded without grinding the system to a halt. If you still run into memory pressure, try a larger swapfile. I currently have a 4GB swapfile, which is entirely excessive and mostly unused, but I'm experimenting and have no particular storage pressures at the moment. I would suggest increasing the default swapfile size to 200MB if zswap is used, although this may be something to simply note for users.

Unlike zram, zswap allows data to overflow out of RAM to physical swap, which allows for better system performance and a higher ratio of "Getting stuff that's actually unused out of RAM."

Some relevant documentation for reading:
https://www.kernel.org/doc/Documentation/vm/zswap.txt
https://lwn.net/Articles/537422/

Let me know what sort of benchmarking or other testing you would like to see in this thread. I understand that the maintainers are touchy about adding anything that requires additional kernel size, but experimentally, zswap is a massive win in terms of usability with the new default browser of Chromium.

@dsx724

dsx724 commented Aug 16, 2018 via email

@popcornmix
Collaborator

This is something I looked at a year ago. Let me check results...

The benchmark was running:

chromium-browser https://www.raspberrypi.org/blog/ http://www.bbc.co.uk/ http://www.antipope.org/charlie/blog-static/index.html

and timing how long it takes for CPU usage to drop.

Default:
383s
368s
376s

With 256M of ZRAM

369s
386s
386s

With zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=50
CONF_SWAPSIZE=100

523s
542s
529s

With zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=50
CONF_SWAPSIZE=512

419s
389s
384s

I didn't see a benefit with zram/zswap with this current test.

But with all these tests, it depends on exactly how much memory is in use whether you see a benefit or not so it's pretty hard to judge whether a change will be beneficial to the majority.
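For a quick side-by-side of the timings listed above, the runs can be averaged and compared against the default configuration. This is only a summary sketch: the labels are shorthand for the four configurations, and three runs each is too few to say much about variance.

```python
from statistics import mean

# Timings in seconds, copied from the results above.
RUNS = {
    "default": [383, 368, 376],
    "zram 256M": [369, 386, 386],
    "zswap lz4 50%, swap 100M": [523, 542, 529],
    "zswap lz4 50%, swap 512M": [419, 389, 384],
}


def summarize(runs, baseline="default"):
    """Return {label: (mean seconds, relative change vs. baseline)}."""
    base = mean(runs[baseline])
    return {
        name: (mean(times), mean(times) / base - 1.0)
        for name, times in runs.items()
    }


if __name__ == "__main__":
    for name, (avg, delta) in summarize(RUNS).items():
        print(f"{name}: mean {avg:.0f}s ({delta:+.1%} vs default)")
```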

@dsx724

dsx724 commented Aug 16, 2018

Another thing to mention about zswap is that it should never exceed 1/3 of RAM which means the memory amplification factor never exceeds 50%. It is also much better suited for systems with large amounts of memory rather than systems with small amounts of memory since idle pages occur more often on large systems. In a system with high RAM turnover, low physical RAM, limited memory bandwidth, and limited CPU performance like the Raspberry Pi, allocating more than 25% is a very bad idea. Setting it at 10-15% gives 20%-30% memory amplification which is almost a best case scenario.
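One simple way to model these amplification figures, offered as an interpretation rather than a formula from this thread: a full pool occupying a fraction p of RAM at compression ratio r holds p*r worth of pages, so the net gain is p*(r-1). The sketch assumes an ideal 3:1 ratio (the z3fold best case), which reproduces the 20%-30% quoted for a 10-15% pool.

```python
def memory_amplification(pool_fraction, compression_ratio):
    """Extra capacity, as a fraction of RAM, once the pool is full.

    Assumes the whole pool is in use and every page compresses at the
    same ratio, which real workloads will not do.
    """
    return pool_fraction * (compression_ratio - 1.0)


if __name__ == "__main__":
    ASSUMED_RATIO = 3.0  # assumption: ideal z3fold packing
    for fraction in (0.10, 0.15, 0.25):
        gain = memory_amplification(fraction, ASSUMED_RATIO)
        print(f"pool {fraction:.0%} of RAM -> {gain:.0%} extra capacity")
```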

@Syonyk
Author

Syonyk commented Aug 16, 2018

Something seems wrong then, because my Pi3B can load those pages in about 90 seconds with my current configuration (dropping to a lower steady state CPU of 20% or so and all the throbbers done - the Pi blog currently has some gif video above the fold). And I'm not on a fast internet connection.

Your zswap.max_pool_percent is way, way high in those tests, though. Try 15-20% and see how it works. And what sort of heatsinking are you using? My 3B+ is in a FLIRC case, which is the only case I've found so far that will keep it at 1.2GHz sustained.

I'll get a fresh, clean Raspbian install set up (most of my installs are unusual, to say the least), and do some benchmarking along these lines. For me, though, it was more a "The system locks up when I ask it to load this sufficiently complicated website" issue than just a performance/timing issue.

dsx724 - you can calculate the amount of traffic that bypasses zswap easily, from the debug counters. The reject_compress_poor count shows how many pages were unable to be suitably compressed, and the number, while not zero, remains quite low compared to the total pages stored in zswap. If you split the swap partition onto a separate device, you can see the low rate of actual writes to it as well.
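As a rough illustration of that calculation, using the counters posted at the top of this issue: the helper name is hypothetical, and the result is only approximate because `stored_pages` reflects what is in the pool right now while the reject counters accumulate since boot.

```python
def rejected_share(stored_pages, reject_compress_poor):
    """Approximate share of pages that bypassed zswap as incompressible."""
    total = stored_pages + reject_compress_poor
    return reject_compress_poor / total if total else 0.0


if __name__ == "__main__":
    # Counter values from the debugfs listing earlier in this issue.
    share = rejected_share(stored_pages=68856, reject_compress_poor=1091)
    print(f"about {share:.1%} of pages were rejected as poorly compressible")
```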

Though I was running zswap on the stock Pi kernel 4.9 - it's supported, you just have to rebuild the kernel for it (same as on 4.14).

@popcornmix
Collaborator

I believe my numbers were using Pi1 (and the more limited 512M sdram).

@dsx724

dsx724 commented Aug 16, 2018

@Syonyk 4.9 only supports zbud which compresses 2 pages into one page. 4.14 supports z3fold which compresses 3 pages into one page. zbud will bring almost negligible memory amplification while z3fold will bring modest benefits. For Pi 0, it is not recommended at all since the devices are extremely limited to begin with. For Pi 2/3, one might be able to get some benefits.

@dsx724

dsx724 commented Aug 16, 2018

During the testing of an application benchmark suite for Libre Computer, no significant benefit was found for zbud and z3fold on 1GB systems. There were significant benefits for 2GB and 4GB systems. For matrix compute workloads that used slightly under the total physical RAM, there were significant regressions across the board but that was an expected corner case.

@Syonyk
Author

Syonyk commented Aug 16, 2018

Ah, that would explain a lot - the Pi1 is a far less powerful system. But we shouldn't be restricting the Pi3B/3B+ boards based on benchmarking results from an older system, as they're under a separate kernel. Though I admit that I don't have anything older than a Pi2 to run tests on. I expect the benefits are greater on a multi-core system when otherwise idle cores can be used for the compression and swapping as well.

I'm fairly certain the 4.9 kernel did support z3fold as I was using it, but as the version currently in use is 4.14, it doesn't matter anymore.

I will attempt to get some benchmarks done on the 3B+ with some different configurations. I recognize that my goals (light desktop use) are not entirely what the Pi is built for, but with proper tuning, it's entirely possible to make it work better than it does out of the box.

One thing to be aware of with wallclock based benchmarks is the throttling behavior of the CPUs. The 3B, without a heatsink, goes into throttling behavior very quickly. The 3B+ is far better about that.

How many people are running matrix compute workloads sized to fit in RAM on a Raspberry Pi, though? That seems oddly specialized for a general purpose little SBC.

@dsx724

dsx724 commented Aug 16, 2018

Before 4.14, zswap would lock up quite often which made it unsuitable. There are a lot of compute workloads optimized for the amount of RAM on a Pi. I brought up our matrix workload just as an example among many others. Those workloads also hit and replace that data constantly which can wreak havoc when paired with zswap. Reserving 100-200MB for the zswap pool is quite significant so such a change should be evaluated carefully.

@Syonyk
Author

Syonyk commented Aug 16, 2018

I don't believe zswap "reserves" memory - it will use memory as needed, but if swap is not being used, zswap shouldn't interfere with other workloads. If those workloads have been pushing stale pages out to disk swap and they go into zswap instead, yes, I can see that causing performance regressions. The max_pool_percent parameter sets an upper limit, but the pool will often be smaller.

@dsx724

dsx724 commented Aug 16, 2018

Sorry, I may be using the wrong wording. I meant the steady state condition of RAM allocated for zpool when under pressure. I think a low zpool size like 10%, a quick compressor with a decent ratio, and z3fold definitely soften the hit to the response time curve so that there are fewer IO stalls in desktop use cases. Definitely should be evaluated. The benefit is just not as pronounced as on large memory systems where it is night and day.

@Syonyk
Author

Syonyk commented Aug 16, 2018

I think 10% or 15% is a good size to target for desktop use, certainly. And if people have use cases that require it, they can tune it higher at runtime as well.

In the event that something is custom written to take advantage of all the memory without swap, certainly you could get into trouble if you've filled zswap first. But if whatever was using memory has terminated, the zswap buffer should shrink down, and leave at least more physical RAM available than having no swap at all. It's not quite as good as pushing swap purely to disk for some edge cases, certainly, but I do question how often the Pi hardware is used, with stock desktop Raspbian, for those use cases.

I still think that having more forgiving desktop settings, for the desktop install, is the right path. I think I finally even found a good MicroSD card to put the latest stock image on for testing!

@Syonyk
Author

Syonyk commented Aug 21, 2018

Experimental results on the current 4.14 kernel in the Raspberry Pi kernel repo:

z3fold is unstable under heavy loads and still has some null pointer dereference issues that cause kernel panics and make it unusable when the swap is being thrashed. Which, obviously, is bad. It behaves properly (and very nicely) under light load, but not heavy load, and the failures are still an open problem per kernel list discussions.

As such, I retract my request to set it up by default, but I would still like to propose building the zswap/lzo/zbud modules as part of the default kernel install such that people can use them if they wish. The only difference, if they're modules, will be a small increase in disk space for the kernel.

When z3fold becomes stable (again? I don't recall the issues with 4.9, but there have been some changes in z3fold since 4.9), we can revisit this.

Benchmarking mostly consisted of opening https://inbox.google.com, https://docs.google.com, http://reddit.com, with a few other sites, and watching to see if I got kernel panics and how the debug parameters looked. I consistently saw a compression ratio of about 2.95 with lzo and z3fold, as larger pages spill to disk. With 15% of RAM in use for the pool, this offers ~450MB of useful swap, plus the disk backed pages, and allows far better behavior under at least browser load.
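The ~450MB figure follows from multiplying the pool cap by the observed ratio. A minimal sketch of that arithmetic, assuming 1024 MB of RAM (the kernel-visible total on a Pi 3B will be somewhat lower once GPU memory is carved out) and a hypothetical helper name:

```python
def effective_zswap_capacity_mb(ram_mb, pool_percent, compression_ratio):
    """Uncompressed data a full zswap pool can hold, in MB."""
    pool_mb = ram_mb * pool_percent / 100.0
    return pool_mb * compression_ratio


if __name__ == "__main__":
    ASSUMED_RAM_MB = 1024  # assumption: nominal 1 GB board
    capacity = effective_zswap_capacity_mb(ASSUMED_RAM_MB, 15, 2.95)
    print(f"roughly {capacity:.0f} MB of swapped data held in RAM")
```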

@dsx724

dsx724 commented Aug 21, 2018

If you merge some commits, zswap with z3fold on 4.14 is stable.

@Syonyk
Author

Syonyk commented Aug 21, 2018

Could you point me to those? I've attempted to merge some of the recent patches I've found lying around, and while it changes the nature of the kernel panics, I can still reliably produce a panic. Or if there's another bug somewhere better suited to tracking this, I can work there. It's only under really heavy pressure it seems to die.

@dsx724

dsx724 commented Aug 21, 2018

https://bugs.chromium.org/p/chromium/issues/detail?id=822360

I think only 1 made it for 4.18. They should all be in 4.19.

@Syonyk
Author

Syonyk commented Aug 21, 2018

I don't appear to have access to all of the commits. Is there a chance you could paste the fully updated z3fold.c file to try against 4.14, if you have access? Or I can try a 4.19 build.

@Syonyk
Author

Syonyk commented Aug 21, 2018

If I use the z3fold.c file out of the 4.19 kernel tree, and apply the last patch listed (https://lore.kernel.org/patchwork/patch/959862/), I get the following results (on multiple block devices, so I don't believe this is purely a corrupted swap partition):

[81427.960029] Adding 4194300k swap on /dev/mmcblk0p2. Priority:-2 extents:1 across:4194300k SSFS
[81569.387470] BUG: Bad page state in process bash pfn:34da6
[81569.387490] page:ba716b58 count:-1 mapcount:0 mapping: (null) index:0x0
[81569.387499] flags: 0x0()
[81569.387510] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000100 00000200 00000001
[81569.387515] raw: 00000000
[81569.387519] page dumped because: nonzero _refcount
[81569.387524] Modules linked in: fuse ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfcomm bnep hci_uart btbcm serdev bluetooth ecdh_generic binfmt_misc joydev evdev ftdi_sio usbserial sg brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq uio fixed i2c_dev ip_tables x_tables ipv6
[81569.387609] CPU: 2 PID: 1742 Comm: bash Tainted: G C 4.14.61-v7+ #2
[81569.387611] Hardware name: BCM2835
[81569.387633] [<8010ffc0>] (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
[81569.387644] [<8010c1fc>] (show_stack) from [<8089cddc>] (dump_stack+0xc8/0x10c)
[81569.387655] [<8089cddc>] (dump_stack) from [<80225db8>] (bad_page+0x108/0x16c)
[81569.387664] [<80225db8>] (bad_page) from [<80225ea0>] (free_pages_check_bad+0x84/0x88)
[81569.387672] [<80225ea0>] (free_pages_check_bad) from [<802265e0>] (free_pcppages_bulk+0x394/0x4e8)
[81569.387680] [<802265e0>] (free_pcppages_bulk) from [<802283f4>] (free_hot_cold_page+0x28c/0x2a0)
[81569.387690] [<802283f4>] (free_hot_cold_page) from [<80228798>] (free_hot_cold_page_list+0x68/0xe4)
[81569.387699] [<80228798>] (free_hot_cold_page_list) from [<80238e74>] (shrink_page_list+0x498/0xe3c)
[81569.387708] [<80238e74>] (shrink_page_list) from [<80239f64>] (shrink_inactive_list+0x230/0x62c)
[81569.387715] [<80239f64>] (shrink_inactive_list) from [<8023aad0>] (shrink_node_memcg+0x364/0x6a4)
[81569.387723] [<8023aad0>] (shrink_node_memcg) from [<8023af04>] (shrink_node+0xf4/0x364)
[81569.387730] [<8023af04>] (shrink_node) from [<8023b280>] (do_try_to_free_pages+0x10c/0x3a4)
[81569.387738] [<8023b280>] (do_try_to_free_pages) from [<8023b680>] (try_to_free_pages+0x168/0x474)
[81569.387746] [<8023b680>] (try_to_free_pages) from [<8022a1e0>] (__alloc_pages_nodemask+0x5e4/0x1130)
[81569.387756] [<8022a1e0>] (__alloc_pages_nodemask) from [<8011b000>] (copy_process.part.5+0xec/0x17d8)
[81569.387766] [<8011b000>] (copy_process.part.5) from [<8011c864>] (_do_fork+0xb0/0x3ec)
[81569.387775] [<8011c864>] (_do_fork) from [<8011ccbc>] (SyS_clone+0x30/0x38)
[81569.387785] [<8011ccbc>] (SyS_clone) from [<80108000>] (ret_fast_syscall+0x0/0x28)
[81569.387789] Disabling lock debugging due to kernel taint
[81569.394031] BUG: Bad page state in process bash pfn:2c18f
[81569.394049] page:ba5db81c count:-1 mapcount:0 mapping: (null) index:0x0
[81569.394058] flags: 0x0()
[81569.394069] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000100 00000200 00000001
[81569.394074] raw: 00000000
[81569.394078] page dumped because: nonzero _refcount
[81569.394082] Modules linked in: fuse ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfcomm bnep hci_uart btbcm serdev bluetooth ecdh_generic binfmt_misc joydev evdev ftdi_sio usbserial sg brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq uio fixed i2c_dev ip_tables x_tables ipv6
[81569.394162] CPU: 2 PID: 1742 Comm: bash Tainted: G B C 4.14.61-v7+ #2
[81569.394164] Hardware name: BCM2835
[81569.394186] [<8010ffc0>] (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
[81569.394196] [<8010c1fc>] (show_stack) from [<8089cddc>] (dump_stack+0xc8/0x10c)
[81569.394206] [<8089cddc>] (dump_stack) from [<80225db8>] (bad_page+0x108/0x16c)
[81569.394215] [<80225db8>] (bad_page) from [<80225ea0>] (free_pages_check_bad+0x84/0x88)
[81569.394223] [<80225ea0>] (free_pages_check_bad) from [<802265e0>] (free_pcppages_bulk+0x394/0x4e8)
[81569.394232] [<802265e0>] (free_pcppages_bulk) from [<802283f4>] (free_hot_cold_page+0x28c/0x2a0)
[81569.394241] [<802283f4>] (free_hot_cold_page) from [<80228798>] (free_hot_cold_page_list+0x68/0xe4)
[81569.394251] [<80228798>] (free_hot_cold_page_list) from [<80238e74>] (shrink_page_list+0x498/0xe3c)
[81569.394260] [<80238e74>] (shrink_page_list) from [<80239f64>] (shrink_inactive_list+0x230/0x62c)
[81569.394267] [<80239f64>] (shrink_inactive_list) from [<8023aad0>] (shrink_node_memcg+0x364/0x6a4)
[81569.394274] [<8023aad0>] (shrink_node_memcg) from [<8023af04>] (shrink_node+0xf4/0x364)
[81569.394282] [<8023af04>] (shrink_node) from [<8023b280>] (do_try_to_free_pages+0x10c/0x3a4)
[81569.394289] [<8023b280>] (do_try_to_free_pages) from [<8023b680>] (try_to_free_pages+0x168/0x474)
[81569.394297] [<8023b680>] (try_to_free_pages) from [<8022a1e0>] (__alloc_pages_nodemask+0x5e4/0x1130)
[81569.394307] [<8022a1e0>] (__alloc_pages_nodemask) from [<8011b000>] (copy_process.part.5+0xec/0x17d8)
[81569.394318] [<8011b000>] (copy_process.part.5) from [<8011c864>] (_do_fork+0xb0/0x3ec)
[81569.394327] [<8011c864>] (_do_fork) from [<8011ccbc>] (SyS_clone+0x30/0x38)
[81569.394336] [<8011ccbc>] (SyS_clone) from [<80108000>] (ret_fast_syscall+0x0/0x28)
[81585.125095] INFO: rcu_sched self-detected stall on CPU
[81585.125115] 3-...: (2099 ticks this GP) idle=04e/140000000000001/0 softirq=1871919/1871932 fqs=1044
[81585.125117] (t=2100 jiffies g=1290443 c=1290442 q=14079)
[81585.125128] NMI backtrace for cpu 3
[81585.125134] CPU: 3 PID: 44 Comm: kswapd0 Tainted: G B C 4.14.61-v7+ #2
[81585.125137] Hardware name: BCM2835
[81585.125160] [<8010ffc0>] (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
[81585.125170] [<8010c1fc>] (show_stack) from [<8089cddc>] (dump_stack+0xc8/0x10c)
[81585.125181] [<8089cddc>] (dump_stack) from [<808a2d1c>] (nmi_cpu_backtrace+0x11c/0x120)
[81585.125191] [<808a2d1c>] (nmi_cpu_backtrace) from [<808a2e00>] (nmi_trigger_cpumask_backtrace+0xe0/0x12c)
[81585.125199] [<808a2e00>] (nmi_trigger_cpumask_backtrace) from [<8010e648>] (arch_trigger_cpumask_backtrace+0x20/0x24)
[81585.125209] [<8010e648>] (arch_trigger_cpumask_backtrace) from [<80185710>] (rcu_dump_cpu_stacks+0xb0/0xdc)
[81585.125220] [<80185710>] (rcu_dump_cpu_stacks) from [<801850b0>] (rcu_check_callbacks+0x7e8/0x960)
[81585.125229] [<801850b0>] (rcu_check_callbacks) from [<8018b314>] (update_process_times+0x44/0x70)
[81585.125239] [<8018b314>] (update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
[81585.125248] [<8019d32c>] (tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
[81585.125256] [<8019d590>] (tick_sched_timer) from [<8018c2cc>] (__hrtimer_run_queues+0x158/0x2ec)
[81585.125263] [<8018c2cc>] (__hrtimer_run_queues) from [<8018c704>] (hrtimer_interrupt+0xb8/0x20c)
[81585.125273] [<8018c704>] (hrtimer_interrupt) from [<8075175c>] (arch_timer_handler_phys+0x40/0x48)
[81585.125286] [<8075175c>] (arch_timer_handler_phys) from [<8017a150>] (handle_percpu_devid_irq+0x88/0x23c)
[81585.125295] [<8017a150>] (handle_percpu_devid_irq) from [<801749fc>] (generic_handle_irq+0x34/0x44)
[81585.125303] [<801749fc>] (generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
[81585.125311] [<80175050>] (__handle_domain_irq) from [<80101520>] (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
[81585.125320] [<80101520>] (bcm2836_arm_irqchip_handle_irq) from [<808b88fc>] (__irq_svc+0x5c/0x7c)
[81585.125324] Exception stack(0xb9ba5b18 to 0xb9ba5b60)
[81585.125329] 5b00: 983de008 00000000
[81585.125335] 5b20: 00005ae8 00000006 983de000 ba310b38 ba310b4c ba310b4c 80e1b6e8 80d069c4
[81585.125341] 5b40: b0ac7400 b9ba5b74 b9ba5b78 b9ba5b68 8028a6a0 808b80b0 88000013 ffffffff
[81585.125351] [<808b88fc>] (__irq_svc) from [<808b80b0>] (_raw_spin_lock+0x40/0x54)
[81585.125361] [<808b80b0>] (_raw_spin_lock) from [<8028a6a0>] (z3fold_zpool_shrink+0x32c/0x4d8)
[81585.125369] [<8028a6a0>] (z3fold_zpool_shrink) from [<80288748>] (zpool_shrink+0x24/0x28)
[81585.125379] [<80288748>] (zpool_shrink) from [<802746f0>] (zswap_frontswap_store+0x22c/0x478)
[81585.125389] [<802746f0>] (zswap_frontswap_store) from [<80272dd8>] (__frontswap_store+0x8c/0x160)
[81585.125399] [<80272dd8>] (__frontswap_store) from [<8026c290>] (swap_writepage+0x54/0x94)
[81585.125409] [<8026c290>] (swap_writepage) from [<802393b8>] (shrink_page_list+0x9dc/0xe3c)
[81585.125418] [<802393b8>] (shrink_page_list) from [<80239f64>] (shrink_inactive_list+0x230/0x62c)
[81585.125425] [<80239f64>] (shrink_inactive_list) from [<8023aad0>] (shrink_node_memcg+0x364/0x6a4)
[81585.125433] [<8023aad0>] (shrink_node_memcg) from [<8023af04>] (shrink_node+0xf4/0x364)
[81585.125440] [<8023af04>] (shrink_node) from [<8023c080>] (kswapd+0x308/0x7f0)
[81585.125449] [<8023c080>] (kswapd) from [<8013db68>] (kthread+0x144/0x174)
[81585.125458] [<8013db68>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)

At this point, running swapoff is likely to hard-freeze the system, and kswapd0 is using 100% of CPU until I reboot.

If I use zbud with the exact same configuration, I don't have this error.

I'm unable to reproduce the issue with the stock swapfile size, however. With the stock swapfile size of 100MB, I fill swap (apparently without using the swapfile) to 100MB, using typically 35-40MB of zswap space. If I attach a larger swapfile, and add memory pressure until

@Syonyk
Author

Syonyk commented Sep 6, 2018

Continued updates on this:

With the 4.18 source in the tree (at least as of several days ago), with z3fold, I can still reproduce lockups under heavy memory pressure with Chromium, though I've not had luck with some simpler synthetic test cases. In all cases I've tested, zbud performs properly (without either a hard lockup or a kernel bug).

I typically set the pool limit to 15% of RAM (so ~150MB), with a large swapfile (~1GB) attached.

Start loading up webpages in Chromium, in multiple tabs, while monitoring the zswap use stats (/sys/kernel/debug/zswap). Once the pool size approaches the set limit and zswap has to start rejecting pages or pushing old pages out to disk, the system will likely either lock up entirely or suffer a kernel bug (I tend to monitor dmesg and stop as soon as I see the bug, as sustained stress will then lock the system).
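The monitoring step can be scripted along these lines. This is a sketch under assumptions, not the exact procedure used here: it expects debugfs to be mounted at /sys/kernel/debug, needs root to read it, and the function names, sample count, and the 15% limit in the example are illustrative.

```python
import os
import time

DEBUG_DIR = "/sys/kernel/debug/zswap"


def read_zswap_stats(path=DEBUG_DIR):
    """Read every integer counter file in the zswap debugfs directory."""
    stats = {}
    for name in os.listdir(path):
        try:
            with open(os.path.join(path, name)) as f:
                stats[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip subdirectories and non-numeric entries
    return stats


def pool_usage(stats, ram_bytes, max_pool_percent):
    """Fraction of the configured pool limit currently in use."""
    limit = ram_bytes * max_pool_percent / 100.0
    return stats.get("pool_total_size", 0) / limit


def watch(ram_bytes, max_pool_percent, samples=3, interval=1.0, path=DEBUG_DIR):
    """Print pool usage and writeback counters a few times."""
    for _ in range(samples):
        s = read_zswap_stats(path)
        print(f"pool at {pool_usage(s, ram_bytes, max_pool_percent):.0%} of limit, "
              f"written_back_pages={s.get('written_back_pages', 0)}, "
              f"pool_limit_hit={s.get('pool_limit_hit', 0)}")
        time.sleep(interval)


if __name__ == "__main__" and os.path.isdir(DEBUG_DIR):
    ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    watch(ram, max_pool_percent=15)
```

Watching `pool_limit_hit` and `written_back_pages` start to climb while also following dmesg should show roughly when the pool reaches its limit.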

With zbud, I cannot reproduce this behavior at all. It just works as expected.

However, with the same_filled_page support in 4.18, I still see very good compression ratios from 2-4, typically, depending on use and how long the system has been up. This still deflects a large amount of traffic from the disk backed swap file/partition. Currently, on a system that's been up under typical desktop use (browser, IRC, terminals, etc) for several days:

Compression ratio: 2.15x
Zswap Used: 241.89 MB Stored in 112.39 MB of RAM
Same filled pages: 53.07 MB
Swapfile Used: 45.19 MB

pi@raspberrypi:~ $ cat /proc/swaps
Filename        Type        Size      Used      Priority
/dev/sda3       partition   789332    293976    -2
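The figures above appear to be consistent with each other: the "Used" column in /proc/swaps counts pages held in zswap as well as pages that reached the disk, so subtracting the zswap figure leaves roughly the "Swapfile Used" number. A sketch of that cross-check, with a hypothetical parser name and the values copied from this comment:

```python
def parse_proc_swaps(text):
    """Return one dict per swap area, with sizes in KiB."""
    areas = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        areas.append({
            "filename": fields[0],
            "type": fields[1],
            "size_kib": int(fields[2]),
            "used_kib": int(fields[3]),
            "priority": int(fields[4]),
        })
    return areas


if __name__ == "__main__":
    sample = (
        "Filename Type Size Used Priority\n"
        "/dev/sda3 partition 789332 293976 -2\n"
    )
    used_mb = parse_proc_swaps(sample)[0]["used_kib"] / 1024
    zswap_mb = 241.89  # "Zswap Used" figure quoted above
    print(f"swap used per /proc/swaps: {used_mb:.2f} MB")
    print(f"not held in zswap: {used_mb - zswap_mb:.2f} MB")
```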

I would like to at least get zswap included in the stock kernel, even if not enabled. The Pi kernel includes the zram support as a module, and adding zswap support (with at least zbud) as a module would allow people to use it as well, if their use case works for it.

I still think enabling it by default would be good for typical use, though with the stock 100MB swap file, it's somewhat limited in how effective it can be (as it doesn't use the swapfile, but still "counts" as swap file used).

@Syonyk
Author

Syonyk commented Sep 6, 2018

I can still reproduce a crash on z3fold with the 4.19 kernel in the tree, as of commit 0038f2a:

[ 311.955909] z3fold: unknown buddy id 0
[ 311.955933] ------------[ cut here ]------------
[ 311.955978] WARNING: CPU: 2 PID: 46 at mm/z3fold.c:971 z3fold_zpool_map+0xa0/0xec
[ 311.955983] Modules linked in: rfcomm bnep fuse hci_uart btbcm serdev bluetooth ecdh_generic binfmt_misc evdev brcmfmac brcmutil sha256_generic snd_bcm2835(C) snd_pcm cfg80211 snd_timer rfkill snd uio_pdrv_genirq fixed uio i2c_dev ip_tables x_tables ipv6
[ 311.956098] CPU: 2 PID: 46 Comm: kswapd0 Tainted: G C 4.19.0-rc2-v7+ #1
[ 311.956101] Hardware name: BCM2835
[ 311.956143] [<80111cdc>] (unwind_backtrace) from [<8010d2dc>] (show_stack+0x20/0x24)
[ 311.956170] [<8010d2dc>] (show_stack) from [<8080a630>] (dump_stack+0xc8/0x10c)
[ 311.956191] [<8080a630>] (dump_stack) from [<801205ac>] (__warn+0x104/0x11c)
[ 311.956204] [<801205ac>] (__warn) from [<801206fc>] (warn_slowpath_null+0x50/0x58)
[ 311.956220] [<801206fc>] (warn_slowpath_null) from [<802a4a60>] (z3fold_zpool_map+0xa0/0xec)
[ 311.956234] [<802a4a60>] (z3fold_zpool_map) from [<802a3bf0>] (zpool_map_handle+0x24/0x28)
[ 311.956246] [<802a3bf0>] (zpool_map_handle) from [<8028d268>] (zswap_writeback_entry+0x58/0x408)
[ 311.956255] [<8028d268>] (zswap_writeback_entry) from [<802a4538>] (z3fold_zpool_evict+0x40/0x4c)
[ 311.956265] [<802a4538>] (z3fold_zpool_evict) from [<802a5a94>] (z3fold_zpool_shrink+0x2d0/0x488)
[ 311.956272] [<802a5a94>] (z3fold_zpool_shrink) from [<802a3bc0>] (zpool_shrink+0x2c/0x38)
[ 311.956283] [<802a3bc0>] (zpool_shrink) from [<8028e074>] (zswap_frontswap_store+0x2f8/0x604)
[ 311.956297] [<8028e074>] (zswap_frontswap_store) from [<8028c584>] (__frontswap_store+0x8c/0x160)
[ 311.956309] [<8028c584>] (__frontswap_store) from [<80284be0>] (swap_writepage+0x54/0x94)
[ 311.956326] [<80284be0>] (swap_writepage) from [<802500e0>] (shrink_page_list+0xa8c/0xf98)
[ 311.956338] [<802500e0>] (shrink_page_list) from [<80250e44>] (shrink_inactive_list+0x2f4/0x6c8)
[ 311.956345] [<80250e44>] (shrink_inactive_list) from [<80251a58>] (shrink_node_memcg+0x37c/0x6d0)
[ 311.956352] [<80251a58>] (shrink_node_memcg) from [<80251e9c>] (shrink_node+0xf0/0x50c)
[ 311.956359] [<80251e9c>] (shrink_node) from [<802532d8>] (kswapd+0x330/0x83c)
[ 311.956376] [<802532d8>] (kswapd) from [<80141d74>] (kthread+0x140/0x170)
[ 311.956389] [<80141d74>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28)
[ 311.956393] Exception stack(0xb9fb5fb0 to 0xb9fb5ff8)
[ 311.956400] 5fa0: 00000000 00000000 00000000 00000000
[ 311.956406] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 311.956410] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 311.956419] ---[ end trace ebcf936fd4d2ecee ]---
[ 311.956442] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 311.956472] pgd = e8caa937
[ 311.956479] [00000000] *pgd=00000000
[ 311.956499] Internal error: Oops: 5 [#1] SMP ARM
[ 311.956508] Modules linked in: rfcomm bnep fuse hci_uart btbcm serdev bluetooth ecdh_generic binfmt_misc evdev brcmfmac brcmutil sha256_generic snd_bcm2835(C) snd_pcm cfg80211 snd_timer rfkill snd uio_pdrv_genirq fixed uio i2c_dev ip_tables x_tables ipv6
[ 311.956578] CPU: 2 PID: 46 Comm: kswapd0 Tainted: G WC 4.19.0-rc2-v7+ #1
[ 311.956586] Hardware name: BCM2835
[ 311.956600] PC is at zswap_writeback_entry+0x5c/0x408
[ 311.956610] LR is at __warn+0xbc/0x11c
[ 311.956617] pc : [<8028d26c>] lr : [<80120564>] psr: 60000013
[ 311.956622] sp : b9fb5a98 ip : 00000000 fp : b9fb5b24
[ 311.956627] r10: baa5d26c r9 : 80e15c10 r8 : 80d04d48
[ 311.956632] r7 : 80e14eb4 r6 : baa5d26c r5 : 96bda000 r4 : b9685c80
[ 311.956638] r3 : 00000019 r2 : 00000600 r1 : 96bda000 r0 : 00000000
[ 311.956648] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 311.956653] Control: 10c5383d Table: 1d88806a DAC: 00000055
[ 311.956662] Process kswapd0 (pid: 46, stack limit = 0xf2416b17)
[ 311.956667] Stack: (0xb9fb5a98 to 0xb9fb6000)
[ 311.956672] 5a80: b9fb5aac 00000004
[ 311.956681] 5aa0: ffffffef b8d6ea00 b9720000 00001000 00000000 00000000 00000000 00000000
[ 311.956690] 5ac0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 311.956698] 5ae0: 00000000 00000000 00000000 00000000 b9fb5b1c b1ab2ef0 00000000 baa5d268
[ 311.956706] 5b00: 96bda000 baa5d26c 80e14eb4 80d07a84 b9684b80 baa5d26c b9fb5b34 b9fb5b28
[ 311.956715] 5b20: 802a4538 8028d21c b9fb5b8c b9fb5b38 802a5a94 802a4504 ba5f1e90 00000000
[ 311.956725] 5b40: 00000001 00000000 96bda003 38e38e39 ffffffef 00000007 b9684b84 b9684b90
[ 311.956732] 5b60: 80d21dc0 80e15c10 b8d6ea00 80d22050 80d04d48 80d0546c b94c10c0 ba5f1e90
[ 311.956741] 5b80: b9fb5b9c b9fb5b90 802a3bc0 802a57d0 b9fb5bf4 b9fb5ba0 8028e074 802a3ba0
[ 311.956749] 5ba0: 27404000 b9fb5c18 a3025840 9d9bf720 000243f7 80d0546c 00001000 b9fb5bc8
[ 311.956756] 5bc0: 000243f7 b1ab2ef0 80286584 80d22068 000243f7 00000000 ba5f1e90 b5c77600
[ 311.956763] 5be0: 80d055dc b9fb5ca0 b9fb5c24 b9fb5bf8 8028c584 8028dd88 000004ef ba5f1e90
[ 311.956771] 5c00: ba5f1e90 b9fb5cb0 b6bf4b60 b9fb5d54 00000001 b9fb5ca0 b9fb5c44 b9fb5c28
[ 311.956780] 5c20: 80284be0 8028c504 ba5f1e94 ba5f1e90 b9fb5f0c b6bf4b60 b9fb5d24 b9fb5c48
[ 311.956787] 5c40: 802500e0 80284b98 6907c000 90291a40 80c8feb0 00000000 00000000 80d8abc0
[ 311.956794] 5c60: 80d04d48 b9fb5d5c 00000000 0000000b 00000000 00000000 00000000 b9fb4000
[ 311.956802] 5c80: b9fb5c48 00000000 00000000 00000000 00000000 00000000 b9fb5d24 00005ca8
[ 311.956810] 5ca0: ba607c08 ba607c08 ba607c2c ba64dc40 00000020 00000000 00000000 00000000
[ 311.956819] 5cc0: ffffffff 7fffffff 00000000 00000008 00000000 00000000 00000000 00000000
[ 311.956829] 5ce0: 00000000 00000000 00000000 00000000 00000020 b1ab2ef0 b9fb5d24 b9fb5f0c
[ 311.956836] 5d00: 80d8b2c4 b9fb5d54 80d8b2c0 00000020 00000020 60000093 b9fb5dac b9fb5d28
[ 311.956844] 5d20: 80250e44 8024f660 b9fb5d5c 00000000 00000000 80d04d48 00000000 fffffffe
[ 311.956851] 5d40: 80d04d48 00000007 00000000 80d8abc0 00000020 ba756c60 ba5f1edc 00000000
[ 311.956859] 5d60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 b1ab2ef0
[ 311.956867] 5d80: 00000000 00000000 b9fb5dec 00000074 b9fb5f0c 80d8b2c4 00000000 00000020
[ 311.956874] 5da0: b9fb5e44 b9fb5db0 80251a58 80250b5c 00000001 80200fb0 80c91e40 0000019d
[ 311.956882] 5dc0: 80d04d48 51eb851f 00000000 00001800 8014a548 b9fb5dd4 b9fb5dd4 b9739e30
[ 311.956890] 5de0: b973a830 b9fb5de4 b9fb5de4 000000d9 000000f8 00000000 00000076 9eb4efc0
[ 311.956898] 5e00: 00000139 00000138 00000038 000000b6 9eb4efc0 b1ab2ef0 80821c68 b9fb5f0c
[ 311.956906] 5e20: 00000000 00000000 00000000 80d8abc0 00000150 00000000 b9fb5eb4 b9fb5e48
[ 311.956913] 5e40: 80251e9c 802516e8 b9fb5eb4 b9fb5e58 80d04d48 80d04ac0 b9fb5f28 b9fb5f04
[ 311.956921] 5e60: b9fb4000 00000000 00000000 00000150 00001800 0001b32e 80d8abc0 00000006
[ 311.956928] 5e80: 00000000 b1ab2ef0 8024df1c 00000000 80d8b2c4 00000001 00000001 80d8b2c4
[ 311.956935] 5ea0: 00000000 80d8abc0 b9fb5f74 b9fb5eb8 802532d8 80251db8 00000001 b9fb8000
[ 311.956943] 5ec0: 80c8fed8 80c8fedc 80d04d48 b9fb8000 80d04d70 ffffe000 80168020 b9fb5eb8
[ 311.956951] 5ee0: 00000003 b9fb4000 80dc38d0 80e15bd8 80e17474 00000000 00000000 00000150
[ 311.956958] 5f00: b9d9031c 00000000 00000000 00001800 00000000 00000000 00060007 006000c0
[ 311.956967] 5f20: 000000d2 00000150 00000000 00000000 00000000 00000000 00000000 00000038
[ 311.956975] 5f40: 00000078 b1ab2ef0 80141888 b9d90300 b9f3aa00 00000000 b9fb4000 80d8abc0
[ 311.956986] 5f60: b9d9031c b9d0be1c b9fb5fac b9fb5f78 80141d74 80252fb4 8012dfcc 80252fa8
[ 311.956992] 5f80: ffffe000 b9f3aa00 80141c34 00000000 00000000 00000000 00000000 00000000
[ 311.956999] 5fa0: 00000000 b9fb5fb0 801010ac 80141c40 00000000 00000000 00000000 00000000
[ 311.957006] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 311.957013] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 311.957039] [<8028d26c>] (zswap_writeback_entry) from [<802a4538>] (z3fold_zpool_evict+0x40/0x4c)
[ 311.957052] [<802a4538>] (z3fold_zpool_evict) from [<802a5a94>] (z3fold_zpool_shrink+0x2d0/0x488)
[ 311.957064] [<802a5a94>] (z3fold_zpool_shrink) from [<802a3bc0>] (zpool_shrink+0x2c/0x38)
[ 311.957075] [<802a3bc0>] (zpool_shrink) from [<8028e074>] (zswap_frontswap_store+0x2f8/0x604)
[ 311.957092] [<8028e074>] (zswap_frontswap_store) from [<8028c584>] (__frontswap_store+0x8c/0x160)
[ 311.957104] [<8028c584>] (__frontswap_store) from [<80284be0>] (swap_writepage+0x54/0x94)
[ 311.957117] [<80284be0>] (swap_writepage) from [<802500e0>] (shrink_page_list+0xa8c/0xf98)
[ 311.957126] [<802500e0>] (shrink_page_list) from [<80250e44>] (shrink_inactive_list+0x2f4/0x6c8)
[ 311.957136] [<80250e44>] (shrink_inactive_list) from [<80251a58>] (shrink_node_memcg+0x37c/0x6d0)
[ 311.957145] [<80251a58>] (shrink_node_memcg) from [<80251e9c>] (shrink_node+0xf0/0x50c)
[ 311.957153] [<80251e9c>] (shrink_node) from [<802532d8>] (kswapd+0x330/0x83c)
[ 311.957165] [<802532d8>] (kswapd) from [<80141d74>] (kthread+0x140/0x170)
[ 311.957177] [<80141d74>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28)
[ 311.957182] Exception stack(0xb9fb5fb0 to 0xb9fb5ff8)
[ 311.957188] 5fa0: 00000000 00000000 00000000 00000000
[ 311.957194] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 311.957200] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 311.957213] Code: e3a02001 e1a00004 eb005a58 e1a01005 (e5907000)
[ 311.957269] ---[ end trace ebcf936fd4d2ecef ]---
[ 311.957281] ------------[ cut here ]------------
[ 311.957296] WARNING: CPU: 2 PID: 46 at kernel/exit.c:773 do_exit+0x6c/0xb7c
[ 311.957301] Modules linked in: rfcomm bnep fuse hci_uart btbcm serdev bluetooth ecdh_generic binfmt_misc evdev brcmfmac brcmutil sha256_generic snd_bcm2835(C) snd_pcm cfg80211 snd_timer rfkill snd uio_pdrv_genirq fixed uio i2c_dev ip_tables x_tables ipv6
[ 311.957370] CPU: 2 PID: 46 Comm: kswapd0 Tainted: G D WC 4.19.0-rc2-v7+ #1
[ 311.957374] Hardware name: BCM2835
[ 311.957392] [<80111cdc>] (unwind_backtrace) from [<8010d2dc>] (show_stack+0x20/0x24)
[ 311.957407] [<8010d2dc>] (show_stack) from [<8080a630>] (dump_stack+0xc8/0x10c)
[ 311.957422] [<8080a630>] (dump_stack) from [<801205ac>] (__warn+0x104/0x11c)
[ 311.957433] [<801205ac>] (__warn) from [<801206fc>] (warn_slowpath_null+0x50/0x58)
[ 311.957442] [<801206fc>] (warn_slowpath_null) from [<80124704>] (do_exit+0x6c/0xb7c)
[ 311.957452] [<80124704>] (do_exit) from [<8010d544>] (die+0x264/0x364)
[ 311.957463] [<8010d544>] (die) from [<80116590>] (__do_kernel_fault.part.0+0x74/0x84)
[ 311.957477] [<80116590>] (__do_kernel_fault.part.0) from [<8082805c>] (do_page_fault+0x240/0x3ac)
[ 311.957490] [<8082805c>] (do_page_fault) from [<80828284>] (do_translation_fault+0xbc/0xc0)
[ 311.957498] [<80828284>] (do_translation_fault) from [<801163a4>] (do_DataAbort+0x5c/0xf8)
[ 311.957506] [<801163a4>] (do_DataAbort) from [<80101934>] (__dabt_svc+0x54/0x80)
[ 311.957511] Exception stack(0xb9fb5a48 to 0xb9fb5a90)
[ 311.957518] 5a40: 00000000 96bda000 00000600 00000019 b9685c80 96bda000
[ 311.957525] 5a60: baa5d26c 80e14eb4 80d04d48 80e15c10 baa5d26c b9fb5b24 00000000 b9fb5a98
[ 311.957531] 5a80: 80120564 8028d26c 60000013 ffffffff
[ 311.957542] [<80101934>] (__dabt_svc) from [<8028d26c>] (zswap_writeback_entry+0x5c/0x408)
[ 311.957556] [<8028d26c>] (zswap_writeback_entry) from [<802a4538>] (z3fold_zpool_evict+0x40/0x4c)
[ 311.957565] [<802a4538>] (z3fold_zpool_evict) from [<802a5a94>] (z3fold_zpool_shrink+0x2d0/0x488)
[ 311.957573] [<802a5a94>] (z3fold_zpool_shrink) from [<802a3bc0>] (zpool_shrink+0x2c/0x38)
[ 311.957581] [<802a3bc0>] (zpool_shrink) from [<8028e074>] (zswap_frontswap_store+0x2f8/0x604)
[ 311.957593] [<8028e074>] (zswap_frontswap_store) from [<8028c584>] (__frontswap_store+0x8c/0x160)
[ 311.957603] [<8028c584>] (__frontswap_store) from [<80284be0>] (swap_writepage+0x54/0x94)
[ 311.957616] [<80284be0>] (swap_writepage) from [<802500e0>] (shrink_page_list+0xa8c/0xf98)
[ 311.957625] [<802500e0>] (shrink_page_list) from [<80250e44>] (shrink_inactive_list+0x2f4/0x6c8)
[ 311.957633] [<80250e44>] (shrink_inactive_list) from [<80251a58>] (shrink_node_memcg+0x37c/0x6d0)
[ 311.957642] [<80251a58>] (shrink_node_memcg) from [<80251e9c>] (shrink_node+0xf0/0x50c)
[ 311.957650] [<80251e9c>] (shrink_node) from [<802532d8>] (kswapd+0x330/0x83c)
[ 311.957661] [<802532d8>] (kswapd) from [<80141d74>] (kthread+0x140/0x170)
[ 311.957671] [<80141d74>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28)
[ 311.957676] Exception stack(0xb9fb5fb0 to 0xb9fb5ff8)
[ 311.957685] 5fa0: 00000000 00000000 00000000 00000000
[ 311.957699] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 311.957705] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 311.957713] ---[ end trace ebcf936fd4d2ecf0 ]---

@monetny

monetny commented Sep 22, 2018

Good day to all!
I build a desktop Raspbian image for Russian users every month.
(The first release was made on 23 Sep 2017.)
I use zswap + z3fold + a sparse file + f2fs and find that it works very well.

PS: You can test it, and ask me if you have questions.
You can see my built-in script for updating the kernel: /usr/bin/rpi-zswap

PPS: I wrote the first Russian manual for compiling a kernel with zswap + z3fold in March 2017, for rpi-4.9.y.

PPPS: z3fold was added to rpi-4.7.y in May 2016.

@dsx724

dsx724 commented Oct 25, 2018

@Syonyk Maybe email the mailing list or Jongseok Kim directly?
https://lore.kernel.org/patchwork/patch/1003157/

@Syonyk
Author

Syonyk commented Oct 25, 2018

I've struggled to reproduce it on x86.

But that patch may be worth trying.

@dsx724

dsx724 commented Oct 25, 2018

I would try the latest patch without Jongseok Kim's patch.

@nkichukov

nkichukov commented Dec 3, 2018

Raspberry Pi 3B here, running 32-bit GNU/Gentoo Linux on armv7a-unknown-linux-gnueabihf with kernel 4.18.17. The system is stable, and I have not seen any issues so far. The settings:

grep . -R /sys/module/zswap/parameters/
/sys/module/zswap/parameters/enabled:Y
/sys/module/zswap/parameters/max_pool_percent:40
/sys/module/zswap/parameters/zpool:z3fold
/sys/module/zswap/parameters/same_filled_pages_enabled:Y
/sys/module/zswap/parameters/compressor:lz4

Current values:

same_filled_pages:6279
stored_pages:9856
pool_total_size:5255168
duplicate_entry:0
written_back_pages:0
reject_compress_poor:0
reject_kmemcache_fail:0
reject_alloc_fail:0
reject_reclaim_fail:0
pool_limit_hit:0

Swap usage:

free -m
              total        used        free      shared  buff/cache   available
Mem:            816         301           9           0         506         501
Swap:           999          39         960

Compression ratio (possibly this high due to the large number of same-filled pages): 7.7, obtained like this:

9856*4096/5255168
7.68199532346063912704
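
The same arithmetic as a small sketch, using the counter values posted above. The helper name is made up for illustration. The second figure rests on an assumption: that same-filled pages are counted in stored_pages but take up no space in the pool. That matches my reading of the zswap code but is worth verifying on your kernel version.

```python
PAGE_SIZE = 4096  # bytes per page, the value used in the calculation above


def zswap_compression_ratio(stored_pages, pool_total_size, page_size=PAGE_SIZE):
    """Uncompressed bytes held by zswap divided by the bytes the pool occupies."""
    if pool_total_size == 0:
        return None  # nothing stored in the pool yet
    return stored_pages * page_size / pool_total_size


# Counter values from the "Current values" listing in this comment.
stored_pages = 9856
same_filled_pages = 6279
pool_total_size = 5255168

overall = zswap_compression_ratio(stored_pages, pool_total_size)

# Assumption: same-filled pages occupy no pool space, so subtracting them
# gives the ratio achieved by the compressor on the remaining pages.
compressor_only = zswap_compression_ratio(
    stored_pages - same_filled_pages, pool_total_size
)

print(f"overall ratio: {overall:.2f}")
print(f"ratio excluding same-filled pages: {compressor_only:.2f}")
```

Under that assumption, most of the headline ratio here comes from same-filled pages rather than from lz4 itself.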

Kernel swappiness:

sysctl -a | grep swappiness
vm.swappiness = 100

HTH,
-N

@burnbabyburn

Tested on an RPi 3 and an RPi 3B+ with a self-built arm64 kernel on Debian testing. Works like a charm. With systemd-swap it's also a no-brainer.

@Syonyk
Author

Syonyk commented Mar 15, 2019

I've been using z3fold on the 4.20 branch with zero observed issues, so I think it's mature enough to fold in as modules.

What is required to make this happen? A pull request?

@StuartIanNaylor

StuartIanNaylor commented Apr 14, 2019

I would also like to ask what is required to make this happen.
It looks like I have been beaten to it by quite a while, as I have only just started playing with dphys-swap on f2fs fronted by a zswap cache. Those clever damn Russians @monetny :) Great idea.
Because SD media is so cheap, f2fs with larger-than-default over-provisioning is as valid as any other SD-life-extending technique, and even more so with a zswap cache fronting it.
All modules are optional by choice, but please try to include zswap in 4.19.

@Syonyk
Author

Syonyk commented Apr 14, 2019

Don't use z3fold until mid-4.20. It's still not stable in 4.19.

However, it doesn't matter that much. When you add same-filled page merging, the value of z3fold drops significantly. In long-term running, I see roughly the same total compression with z3fold and zbud, as long as same-filled merging is enabled. The value of z3fold seems to mostly be in allowing one to store a zero-filled page in the slack between two data-filled pages. In some situations, I'm sure it has a slight benefit, but I genuinely couldn't tell you what allocator I'm running based on the total compression in recent kernel versions.
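
To make "same-filled" concrete: a page qualifies when every machine word in it holds the same value (zero-filled being the common case), so only that one value has to be remembered instead of a compressed copy. A rough userspace sketch of the check follows; the function name is hypothetical, and the 4-byte word size assumes a 32-bit kernel. This is not the kernel's code.

```python
PAGE_SIZE = 4096  # bytes per page
WORD_SIZE = 4     # assumption: 32-bit kernel, so one machine word is 4 bytes


def same_filled_word(page, word_size=WORD_SIZE):
    """Return the repeated word if `page` is one word repeated, else None."""
    if not page or len(page) % word_size:
        return None
    word = page[:word_size]
    # The page is same-filled when repeating its first word rebuilds it exactly.
    return word if page == word * (len(page) // word_size) else None


zero_page = bytes(PAGE_SIZE)                                   # all zeroes
pattern_page = b"\xde\xad\xbe\xef" * (PAGE_SIZE // WORD_SIZE)  # one repeated word
mixed_page = bytes(PAGE_SIZE - 1) + b"\x01"                    # differs in the last byte

for name, page in [("zero", zero_page), ("pattern", pattern_page), ("mixed", mixed_page)]:
    print(name, same_filled_word(page))
```

A page caught by this check never reaches the allocator at all, which would explain why the choice between zbud and z3fold matters less once the feature is on.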

I suppose I can submit a patch to the config and put it in the pipeline.

@StuartIanNaylor

StuartIanNaylor commented Apr 14, 2019

My call is for zswap in 4.19; apologies for the wording. z3fold would be great too, and if it needs to come later, then OK.

@pwr22

pwr22 commented Dec 11, 2019

As far as I can see this still isn't available - any chance we could enable it sometime soon?

@Syonyk
Author

Syonyk commented Dec 18, 2019

I suppose I'll get a pull request together... unless someone else wants to do it!

@yutayu

yutayu commented Jan 1, 2020

zswap for raspbian .conf please :)

@yutayu

yutayu commented Jan 1, 2020

@zertyz

zertyz commented Jan 5, 2020

+1 for adding the zswap & all folding & compression goodies as modules to the official kernel configuration and saving us the time & energy to recompile

@Syonyk
Author

Syonyk commented Jan 27, 2020

I've created a patch to add the zswap feature to the three kernels - bcmrpi_defconfig, bcm2709_defconfig, bcm2711_defconfig. Linked above.

I'm slightly unclear as to what the proper next step is, though. Hopefully the patch gets swallowed in soon!
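
For anyone wanting to check whether a given kernel build already has these options, here is a small sketch that scans kernel config text for the zswap-related symbols. The symbol names are the mainline Kconfig ones. Which of them the patch actually sets, and to what, is not stated in this thread, so the sample text below is made up purely for illustration. Reading /proc/config.gz only works if the kernel exposes it.

```python
import gzip
from pathlib import Path

# Mainline Kconfig symbols related to zswap; the choice of list is an assumption.
OPTIONS = ("CONFIG_ZSWAP", "CONFIG_ZPOOL", "CONFIG_ZBUD", "CONFIG_Z3FOLD")


def parse_kernel_config(text, options=OPTIONS):
    """Map each option to 'y', 'm', or 'n' (unset or absent) from config text."""
    found = {name: "n" for name in options}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments such as "# CONFIG_X is not set" count as 'n'
        name, value = line.split("=", 1)
        if name in found:
            found[name] = value
    return found


def read_running_config(path=Path("/proc/config.gz")):
    """Return the running kernel's config text, or None if it is not exposed."""
    if not path.exists():
        return None
    with gzip.open(path, "rt") as fh:
        return fh.read()


# Hypothetical sample, NOT the contents of the actual patch.
SAMPLE_CONFIG = """\
CONFIG_ZSWAP=y
CONFIG_ZPOOL=y
# CONFIG_ZBUD is not set
CONFIG_Z3FOLD=m
"""

if __name__ == "__main__":
    text = read_running_config() or SAMPLE_CONFIG
    for name, value in parse_kernel_config(text).items():
        print(f"{name}={value}")
```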

@CarlosGS

CarlosGS commented Dec 3, 2020

This is definitely an improvement, especially for low-memory boards. Waiting to see it merged.

@popcornmix
Collaborator

@CarlosGS zswap support was merged about 6 months ago.
#3626

@CarlosGS

CarlosGS commented Dec 3, 2020

Sorry, I misread the issue as a request to enable zswap by default (as RAM compression is in Windows, Mac, Android...).

What are your thoughts @Syonyk? Did you find any standard benchmark to show the improvement? Web browsing definitely improves.

@Syonyk
Author

Syonyk commented Dec 3, 2020

I haven't come up with any synthetic benchmark suites to demonstrate the improvement, but anything with reasonably compressible memory that goes to swap ought to be improved. I just use zswap on my memory-limited systems, and it radically improves how much they can handle. However, with the Pi 4 having sane amounts of memory (4GB/8GB), it matters a bit less for my needs now.

As it's part of the Pi kernel now, I see no reason to keep this thread open, though.

@Syonyk Syonyk closed this as completed Dec 3, 2020