Description
Proposal: Soft memory limit
Author: Michael Knyszek
Summary
I propose a new option for tuning the behavior of the Go garbage collector by setting a soft memory limit on the total amount of memory that Go uses.
This option comes in two flavors: a new runtime/debug function called SetMemoryLimit, and a GOMEMLIMIT environment variable. In sum, the runtime will try to maintain this memory limit by limiting the size of the heap and by returning memory to the underlying platform more aggressively. This includes a mechanism to help mitigate garbage collection death spirals. Finally, by setting GOGC=off, the Go runtime will always grow the heap to the full memory limit.
This new option gives applications better control over their resource economy. It empowers users to:
- Better utilize the memory that they already have,
- Confidently decrease their memory limits, knowing Go will respect them,
- Avoid unsupported forms of garbage collection tuning.
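For concreteness, here is a minimal sketch of how the proposed API might be used. The SetMemoryLimit signature and the GOMEMLIMIT invocation follow the design document, but the specific values (the 8 GiB limit, the GiB suffix, the binary name) are illustrative assumptions, not a final API.

```go
package main

import "runtime/debug"

func main() {
	// Illustrative only: set a soft limit of 8 GiB on the total amount of
	// memory the Go runtime uses (heap, stacks, and other runtime-managed
	// memory). Per the design, the previous limit is returned.
	prev := debug.SetMemoryLimit(8 << 30)
	_ = prev

	// Equivalently (assumed invocation), the limit could be set via the
	// environment, optionally with GOGC=off so the heap is allowed to grow
	// all the way to the limit before the GC runs:
	//
	//   GOGC=off GOMEMLIMIT=8GiB ./myserver
}
```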
Details
Full design document found here.
Note that, for the time being, this proposal intends to supersede #44309. Frankly, I haven't been able to find a significant use case for it, as opposed to a soft memory limit overall. If you believe you have a real-world use case for a memory target where a memory limit with GOGC=off would not solve the same problem, please do not hesitate to post on that issue, contact me on the Gophers Slack, or via email at mknyszek@golang.org. Please include as much detail as you can.
Activity
gopherbot commentedon Sep 15, 2021
Change https://golang.org/cl/350116 mentions this issue:
design: add proposal for a soft memory limit
mpx commentedon Sep 21, 2021
Afaict, the impact of the memory limit is visible once the GC is CPU throttled, but not before. Would it be worth exposing the current effective GOGC as well?
mknyszek commentedon Sep 21, 2021
@mpx I think that's an interesting idea. If GOGC is not off, then you have a very clear sign of throttling in telemetry. However, if GOGC=off, I think it's harder to tell, and it gets blurry once the runtime starts bumping up against the GC CPU utilization limit, i.e. what does effective GOGC mean when the runtime is letting itself exceed the heap goal?

I think that's pretty close. Ideally we would have just one metric that could show, at a glance, "are you in the red, and if so, how far?"
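As background for this telemetry discussion, here is a hedged sketch of reading GC-related values from the existing runtime/metrics package. The two metric names below already exist; any single "are you in the red" metric tied to the memory limit is hypothetical at this point.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Read two existing runtime/metrics values relevant to GC pacing:
	// the current heap goal and the runtime's total mapped memory.
	samples := []metrics.Sample{
		{Name: "/gc/heap/goal:bytes"},
		{Name: "/memory/classes/total:bytes"},
	}
	metrics.Read(samples)

	for _, s := range samples {
		fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
	}
}
```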
raulk commentedon Sep 27, 2021
In case you find this useful as a reference (and possibly to include in "prior art"), the go-watchdog library schedules GC according to a user-defined policy. It can infer limits from the environment (host or container), and it can target a maximum heap size defined by the user. I built this library to deal with #42805, and ever since we integrated it into https://github.com/filecoin-project/lotus, we haven't had a single OOM reported.
rsc commentedon Oct 6, 2021
This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group
rsc commentedon Oct 13, 2021
@mknyszek what is the status of this?
mknyszek commentedon Oct 13, 2021
@rsc I believe the design is complete. I've received feedback on the design, iterated on it, and I've arrived at a point where there aren't any major remaining comments that need to be addressed. I think the big question at the center of this proposal is whether the API benefit is worth the cost. The implementation can change and improve over time; most of the details are internal.
Personally, I think the answer is yes. I've found that mechanisms that respect users' memory limits and that give the GC the flexibility to use more of the available memory are quite popular. Where Go users implement this themselves, they're left working with tools (like runtime.GC/debug.FreeOSMemory and heap ballasts) that have some significant pitfalls. The proposal also takes steps to mitigate the most significant costs of having a new GC tuning knob.

In terms of implementation, I have some of the foundational bits up for review now that I wish to land in 1.18 (I think they're uncontroversial improvements, mostly related to the scavenger). My next step is to create a complete implementation and trial it on real workloads. I suspect that a complete implementation won't land in 1.18 at this point, which is fine. It'll give me time to work out any unexpected issues with the design in practice.
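For readers who haven't seen it, this is roughly what the heap-ballast workaround mentioned above looks like. It is a minimal sketch of the pattern, not code from the proposal, and it illustrates why such workarounds are fragile: the ballast size must be hand-tuned, and the slice must stay reachable or the effect silently disappears.

```go
package main

import "runtime"

func main() {
	// Heap ballast: a large allocation that is never written to. Most
	// operating systems won't commit physical pages for untouched memory,
	// but the GC still counts the slice toward the live heap, so with the
	// default GOGC=100 the next collection triggers at roughly twice the
	// live heap including the ballast.
	ballast := make([]byte, 2<<30) // 2 GiB, hand-tuned per deployment

	// ... run the real workload here ...

	// Keep the ballast reachable for the lifetime of the process.
	runtime.KeepAlive(ballast)
}
```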
rsc commentedon Oct 20, 2021
Thanks for the summary. Overall the reaction here seems overwhelmingly positive.
Does anyone object to doing this?