-
Notifications
You must be signed in to change notification settings - Fork 3.9k
cli,server: set GOMEMLIMIT by default #97666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
I didn't add any tests since I'm not sure whether and how to test this (if this is needed). Any pointers are welcome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since I'm not sure whether and how to test this (if this is needed)
I'm pretty sure we want tests. Maybe you can ask test-eng for ideas?
We have some tests in pkg/cli/interactive_tests/test_sql_mem_monitor.tcl.disabled
already. (This is currently disabled but should be reinstated ASAP)
Reviewed 7 of 7 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @yuzefovich)
pkg/cli/start.go
line 1342 at r1 (raw file):
maybeWarnMemorySizes(ctx) if serverCfg.GoMemLimit == goMemLimitNotSet {
use the flag interface .Lookup().Changed
like is done in other places.
pkg/cli/cliflags/flags.go
line 163 at r1 (raw file):
GoMemLimit = FlagInfo{ Name: "go-mem-limit",
maybe make this consistent with the other flags and call it --max-go-memory
pkg/server/config.go
line 261 at r1 (raw file):
ObsServiceAddr string // GoMemLimit is the soft memory limit enforced by the Go runtime. Set to 0
See my other comment about the code location. This may not need to be included in BaseConfig.
pkg/server/server.go
line 572 at r1 (raw file):
if cfg.GoMemLimit != 0 { runtimeDebug.SetMemoryLimit(cfg.GoMemLimit)
This is not the right place to do this. Things that impact the process-wide configuration should remain scope to pkg/cli, for example in start.go. Somewhere around the call to cgroups.AdjustMaxProcs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Re: tests - what kind of tests do you envision? That the soft memory limit prevents OOMs? (This will be implicitly tested via tpch_concurrency
roachtest.) Or that the default calculation is correct? (That seems like a reasonable unit test for this commit, but I'm not sure how to test it.) Or something else?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/cli/start.go
line 1342 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
use the flag interface
.Lookup().Changed
like is done in other places.
Done.
pkg/cli/cliflags/flags.go
line 163 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
maybe make this consistent with the other flags and call it
--max-go-memory
Done.
pkg/server/config.go
line 261 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
See my other comment about the code location. This may not need to be included in BaseConfig.
Done.
pkg/server/server.go
line 572 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
This is not the right place to do this. Things that impact the process-wide configuration should remain scope to pkg/cli, for example in start.go. Somewhere around the call to
cgroups.AdjustMaxProcs
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or that the default calculation is correct? (That seems like a reasonable unit test for this commit, but I'm not sure how to test it.)
Yes that is what i thought about.
I would still use a global variable in the cli
package and hook it up to your flag, alongside with the others at the top of pkg/cli/flags.go.
Then you'd ensure it's reset in cli/context.go
Then you can make unit tests like the others in cli/flags_test.go.
Reviewed 6 of 7 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @yuzefovich)
pkg/cli/start.go
line 592 at r2 (raw file):
} } if goMemLimit == 0 {
We need to also be conservative about the lower bound: it would be unreasonable if the above computation results in the limit set to just 10MB, for example.
So we need to see the following in the implementation:
- the limit should never be set lower than some "reasonable" minimum value (maybe 200MB? maybe 1GB?)
- there should be warnings when the computation results in very low memory bounds
(I think the end result is that we'll be informing the user when there's really too little memory available in the system, and that would be a good addition next to your primary change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I added some sanity tests. I couldn't figure out how to use the global default that I just introduced at the top of flags.go
, so I extracted the calculation into a method that is called in the test.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/cli/start.go
line 592 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
We need to also be conservative about the lower bound: it would be unreasonable if the above computation results in the limit set to just 10MB, for example.
So we need to see the following in the implementation:
- the limit should never be set lower than some "reasonable" minimum value (maybe 200MB? maybe 1GB?)
- there should be warnings when the computation results in very low memory bounds
(I think the end result is that we'll be informing the user when there's really too little memory available in the system, and that would be a good addition next to your primary change)
Good point, I added a lower bound of 256MiB and a couple of additional warnings.
pkg/cli/flags_test.go
line 213 at r3 (raw file):
} for i, tc := range []struct {
Two of these test cases result in warnings (which is expected), is this acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your tests.
Reviewed 8 of 8 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @yuzefovich)
-- commits
line 18 at r3:
I recommend you add a few examples in the release note. The complexity of the change is rather high, and we'll need to explain this in docs.
pkg/cli/flags.go
line 66 at r3 (raw file):
// initPreFlagsDefaults initializes the values of the global variables // defined above. func initPreFlagsDefaults() {
You need to reset your new global variable here, so it gets reset in-between tests.
pkg/cli/start.go
line 689 at r3 (raw file):
// those unaccounted for allocations. limit := int64(2.25 * float64(serverCfg.MemoryPoolSize)) if limit < minDefaultGoMemLimit {
I don't understand this comparison -- why do you apply the limit for --max-go-memory
to --max-sql-memory
? Don't we need different minimums?
pkg/cli/flags_test.go
line 213 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Two of these test cases result in warnings (which is expected), is this acceptable?
yes.
pkg/cli/flags_test.go
line 234 at r3 (raw file):
maxSQLMemory: "200MiB", cache: ".8", // --cache is set too high so that we truncate to the lower bound.
are you sure about this test? If the test computer has, say, 100GB of memory, then cache is 80GB and you still have 20GB to go with.
Can you confirm this will lead to clusters upgrade from 22.2 to 23.1 to having No need to block merging on approval of @1lann but if he has a different suggested parametrizaton, can we consider it? Ideally, this change allows us to roll out to CC without putting in overrides on the flag. That might not be possible if we have env-specific reasons to set lower, etc. But let's see. Jason has thought a bit about how we should set in CC. CC @reuter-roach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm this will lead to clusters upgrade from 22.2 to 23.1 to having
GOMEMLIMT
set? As a result, upgrading a CC cluster to 23.1 will lead to havingGOMEMLIMT
set.
Correct. If GOMEMLIMIT
env var is not set, then when upgrading to 23.1 we will either use the operator-provided value for --max-go-memory
flag or the default computed value to set the memory limit on the Go runtime; if GOMEMLIMIT
env var is set and --max-go-memory
is not, then the env var is used.
No need to block merging on approval of @1lann but if he has a different suggested parametrizaton, can we consider it?
Of course, all feedback is very welcome!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @1lann, @andreimatei, and @knz)
Previously, knz (Raphael 'kena' Poss) wrote…
I recommend you add a few examples in the release note. The complexity of the change is rather high, and we'll need to explain this in docs.
I extended the release note, but please let me know if you think some other details should be mentioned there.
pkg/cli/flags.go
line 66 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
You need to reset your new global variable here, so it gets reset in-between tests.
Done (although currently it doesn't seem necessary since the tests don't modify this global variable).
pkg/cli/start.go
line 689 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I don't understand this comparison -- why do you apply the limit for
--max-go-memory
to--max-sql-memory
? Don't we need different minimums?
This is effectively using max(minDefaultGoMemLimit, 2.25*MemoryPoolSize)
which is the current default formula plus a lower bound. Seems reasonable, no? I can omit / move the warning about --max-sql-memory
being too low to maybeWarnMemorySizes
.
pkg/cli/flags_test.go
line 234 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
are you sure about this test? If the test computer has, say, 100GB of memory, then cache is 80GB and you still have 20GB to go with.
The current upper bound (maxGoMemLimit
) is 0.9 * sysMem - 1.15 * cache
and with cache=0.8
we always will get a negative number there, regardless of total system memory. We'll override maxGoMemLimit
to minDefaultGoMemLimit
which is what getDefaultGoMemLimit
will return.
Thanks for doing this! Very excited about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @1lann, @andreimatei, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I extended the release note, but please let me know if you think some other details should be mentioned there.
This release note is excellent.
Can you also add a markdown table underneath with some example values? something like the following:
| Command line flags | Computed max SQL memory | Computed max Go memory |
|---|---|---|
| ... | ... | ... |
pkg/cli/flags.go
line 66 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Done (although currently it doesn't seem necessary since the tests don't modify this global variable).
they do, since your new test code touches it via
startCtx.goMemLimitValue = makeBytesOrPercentageValue(&goMemLimit, memoryPercentResolver)
pkg/cli/flags_test.go
line 234 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
The current upper bound (
maxGoMemLimit
) is0.9 * sysMem - 1.15 * cache
and withcache=0.8
we always will get a negative number there, regardless of total system memory. We'll overridemaxGoMemLimit
tominDefaultGoMemLimit
which is whatgetDefaultGoMemLimit
will return.
I think this is a sign that you should give your constants (1.15, 2.25, 0.9) explanatory names, and then use these names to construct your test case here.
b9b4ad1
to
d0691ee
Compare
This commit introduces new `--max-go-memory` flag to CLI `start` command which controls the soft memory limit set on the Go runtime. The default value for this flag is computed as `2.25x --max-sql-memory` (which should give enough room for allocations not registered with our memory accounting system), subject to not exceeding `90% RAM - 1.15x pebble cache`. The rationale for this upper bound is as follows: - we don't want for the estimated max memory usage to exceed 90% of the available RAM to prevent the OOMs - Go runtime doesn't control the pebble cache, so we need to subtract it - anecdotally, the pebble cache can have some slop over its target size (perhaps, due to memory fragmentation), so we adjust the footprint of the cache by 15%. The lower bound on the default value is set to 256MiB. Release note (cli change): New flag `--max-go-memory` is introduced to `start` command. It controls the soft memory limit on the Go runtime which adjusts the behavior of the Go garbage collector to try keeping the memory usage under the soft memory limit (the limit is "soft" in a sense that it is not enforced if live objects (RSS) exceed it). Similar to the `--max-sql-memory` flag, the new flag `--max-go-memory` accepts numbers interpreted as bytes, size suffixes (e.g. 1GB and 1GiB) or a percentage of physical memory (e.g. .25). If left unspecified, the flag defaults to 2.25x of `--max-sql-memory` (subject to `--max-go-memory + 1.15x --cache` not exceeding 90% of available RAM). Set to 0 to disable the soft memory limit (not recommended). If GOMEMLIMIT env var is set and `--max-go-memory` is not, then the value from the env var is used; if both are set, then the flag takes precedence. Here is a few examples of how the default value is calculated on a machine with 16GiB of RAM: | Command line flags | Computed max SQL memory | Computed cache size | Computed max Go memory | | --- | --- | --- | --- | | --max-sql-memory=.25 --cache=.25 | 4GiB | 4GiB | 9GiB | | --max-sql-memory=.1 --cache=.5 | 1.6GiB | 8GiB | 3.6GiB | | --max-sql-memory=.25 --cache=.4 | 4GiB | 6.4GiB | 7.04GiB | | --max-sql-memory=100MiB | 100MiB | 128MiB | 256MiB | | --max-sql-memory=.4 --cache=.2 --max-go-memory=100MiB | 6.4GiB | 3.2GiB | 100MiB | Explanation: - in the first two lines we just use the default formula `2.25x --max-sql-memory` - in the third line, the default formula results in exceeding the upper bound on total usage (including the cache), so we use the upper bound determined as `0.9 * total RAM - 1.15 * cache size` - in the fourth line, the default formula results in 225MiB which is smaller than the lower bound of 256MiB, so we bump the value to that lower bound - in the fifth line, we use the value specified by the user (even though it is smaller than the lower bound on the default value). Release note (general change): CockroachDB now uses the soft memory limit of Go runtime by default. This feature of Go has been available since 22.2 version by setting GOMEMLIMIT environment variable and was opt-in for 22.2. Now it is enabled by default which should reduce the likelihood of CockroachDB process OOMing. This soft memory limit can be disabled by specifying `--max-go-memory=0` in `cockroach start`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @1lann, @andreimatei, and @knz)
Previously, knz (Raphael 'kena' Poss) wrote…
This release note is excellent.
Can you also add a markdown table underneath with some example values? something like the following:
| Command line flags | Computed max SQL memory | Computed max Go memory | |---|---|---| | ... | ... | ... |
Thanks, done.
pkg/cli/flags.go
line 66 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
they do, since your new test code touches it via
startCtx.goMemLimitValue = makeBytesOrPercentageValue(&goMemLimit, memoryPercentResolver)
Ack.
pkg/cli/flags_test.go
line 234 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I think this is a sign that you should give your constants (1.15, 2.25, 0.9) explanatory names, and then use these names to construct your test case here.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @1lann and @andreimatei)
@1lann @reuter-roach should I wait for your feedback on this PR? |
lgtm. Similar to @joshimhoff, I'm excited for this change to land! I can see two potential problems in the cloud environment:
The first problem is mitigated by setting |
Thanks Josh and Jordan for taking a look! We can always update the default calculation later if we find that in cloud clusters some other value works better. Thanks Raphael for thorough review! bors r+ |
Build succeeded: |
This commit introduces new
--max-go-memory
flag to CLIstart
commandwhich controls the soft memory limit set on the Go runtime. The default
value for this flag is computed as
2.25x --max-sql-memory
(whichshould give enough room for allocations not registered with our memory
accounting system), subject to not exceeding
90% RAM - 1.15x pebble cache
.The rationale for this upper bound is as follows:
available RAM to prevent the OOMs
(perhaps, due to memory fragmentation), so we adjust the footprint of the
cache by 15%.
The lower bound on the default value is set to 256MiB.
Fixes: #95690.
Release note (cli change): New flag
--max-go-memory
is introduced tostart
command. It controls the soft memory limit on the Go runtimewhich adjusts the behavior of the Go garbage collector to try keeping the
memory usage under the soft memory limit (the limit is "soft" in a sense
that it is not enforced if live objects (RSS) exceed it). Similar to the
--max-sql-memory
flag, the new flag--max-go-memory
accepts numbersinterpreted as bytes, size suffixes (e.g. 1GB and 1GiB) or a percentage
of physical memory (e.g. .25). If left unspecified, the flag defaults to
2.25x of
--max-sql-memory
(subject to--max-go-memory + 1.15x --cache
not exceeding 90% of available RAM). Set to 0 to disable the soft memory
limit (not recommended). If GOMEMLIMIT env var is set and
--max-go-memory
is not, then the value from the env var is used; ifboth are set, then the flag takes precedence.
Here is a few examples of how the default value is calculated on
a machine with 16GiB of RAM:
Explanation:
2.25x --max-sql-memory
bound on total usage (including the cache), so we use the upper bound
determined as
0.9 * total RAM - 1.15 * cache size
smaller than the lower bound of 256MiB, so we bump the value to that
lower bound
it is smaller than the lower bound on the default value).
Release note (general change): CockroachDB now uses the soft memory
limit of Go runtime by default. This feature of Go has been available
since 22.2 version by setting GOMEMLIMIT environment variable and was
opt-in for 22.2. Now it is enabled by default which should reduce the
likelihood of CockroachDB process OOMing. This soft memory limit can be
disabled by specifying
--max-go-memory=0
incockroach start
.