Description
As we looked back over the year, we noticed that the BiogoKrishna benchmark had regressed significantly (~20%) between the Go 1.20 and Go 1.21 releases on our linux-amd64-perf builder.
Upon further investigation, the culprit appears to be 9f9bb26. This change partially rolled back a switch, made during Go 1.21 development, to setting `MADV_HUGEPAGE` on all heap memory. For a number of reasons, this switch turned out not to be a good idea. Mainly, as it turns out, marking memory as `MADV_HUGEPAGE` can trigger an unbounded stall on a memory access to that memory in many common kernel configurations for huge pages. See #61718. (9f9bb26 replaced `MADV_HUGEPAGE` with something else that had the same latency problem, just at syscall time. This too was rolled back.) As of Go 1.21.4, the Go runtime no longer tries to mark any memory for huge page tracking purposes. What is left is a policy change in the Go runtime's scavenger (the part that returns memory back to the OS) that makes it significantly friendlier to OS huge page heuristics, which at this point I believe is about as good as we can get. The Linux APIs unfortunately do not give the Go runtime the precise control it wants over the situation, and seem mostly focused on the use case of operators and application owners tweaking huge page settings, not memory allocators or language runtimes, despite some of the messaging around these features.
(For more details about huge pages see https://go.dev/doc/gc-guide#Linux_transparent_huge_pages, which was added during the Go 1.21 release.)
One thing to note is that this culprit doesn't, by itself, explain why there is a regression between Go 1.20 and Go 1.21. The reason is that prior to the policy change that landed in Go 1.21, Go 1.20 would occasionally mark memory as `MADV_HUGEPAGE`. The behavior was hard to predict (since it depended on the order of memory allocations), but deterministic. In short, if a memory allocation contained at least one complete aligned huge page, that huge page would get marked as `MADV_HUGEPAGE`.
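The "complete aligned huge page" condition above can be sketched as a small alignment computation. This is a hypothetical illustration of the heuristic as described, not the runtime's actual code; the 2 MiB huge page size is an assumption (the common x86-64 transparent huge page size).

```go
package main

import "fmt"

// hugePageSize is the common x86-64 transparent huge page size (2 MiB).
const hugePageSize = 2 << 20

// containedHugePage reports the first huge-page-aligned region that lies
// entirely within the allocation [addr, addr+size), and whether one exists.
// Hypothetical sketch of the pre-Go-1.21 heuristic described above.
func containedHugePage(addr, size uintptr) (start uintptr, ok bool) {
	// Round addr up to the next huge page boundary.
	start = (addr + hugePageSize - 1) &^ (hugePageSize - 1)
	if start+hugePageSize <= addr+size {
		return start, true
	}
	return 0, false
}

func main() {
	// A 3 MiB allocation starting 1 MiB past a huge page boundary
	// contains exactly one complete aligned 2 MiB huge page.
	if start, ok := containedHugePage(1<<20, 3<<20); ok {
		fmt.Printf("would mark MADV_HUGEPAGE at %#x\n", start)
	}
}
```

Note that under this rule a small allocation, or a large one that happens to straddle boundaries without fully covering any aligned 2 MiB region, would never get marked, which is what made the behavior depend on allocation order.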
My best understanding of the BiogoKrishna benchmark is that it behaves mostly like a microbenchmark. It operates on a significant data source, but the analysis it performs on that data involves a series of tight loops that run for a very long time. It is quite sensitive to the caching effects of the memory accesses in those loops. Hence, no longer backing that memory with a huge page led to a significant performance regression.
Crucially though, `MADV_HUGEPAGE` only made a meaningful difference if the memory wasn't already backed by a huge page. As of this writing, the default kernel configuration of the linux-amd64-perf builder sets `/sys/kernel/mm/transparent_hugepage/enabled` to `madvise`, which means huge pages are only enabled for memory regions marked `MADV_HUGEPAGE`. This regression did not always reproduce, and it did not reproduce on machines that set `/sys/kernel/mm/transparent_hugepage/enabled` to `always`, presumably because that memory region was already backed by a huge page.
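For anyone checking their own builder's configuration: the sysfs file lists all modes with the active one in brackets (e.g. `always [madvise] never`). A small sketch of parsing that format (reading the file itself is omitted for brevity; the `thpMode` helper name is mine):

```go
package main

import (
	"fmt"
	"strings"
)

// thpMode extracts the active transparent huge page mode from the contents
// of /sys/kernel/mm/transparent_hugepage/enabled, which looks like
// "always [madvise] never" with the active mode in brackets.
func thpMode(contents string) string {
	for _, f := range strings.Fields(contents) {
		if strings.HasPrefix(f, "[") && strings.HasSuffix(f, "]") {
			return strings.Trim(f, "[]")
		}
	}
	return ""
}

func main() {
	// In practice, read the contents with os.ReadFile on Linux.
	fmt.Println(thpMode("always [madvise] never")) // madvise
}
```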
Therefore, I conclude that this performance regression is an unfortunate consequence of the following factors:
- BiogoKrishna behaves more like a microbenchmark and isn't as realistic as I originally thought. It held up for a long time, but this regression is comparable to a microbenchmark regressing because of arbitrary code alignment changes. (And we have no end of those on https://perf.golang.org/dashboard.)
- We made a conscious decision in the Go 1.21.4 release that the Go runtime would no longer try to impose a huge page policy itself. It would still strive to be friendly to huge pages, picking allocation and release policies that aid the kernel in applying and maintaining them, but it would no longer force the kernel to back any heap memory with huge pages.