sql/colexecerror: CatchVectorizedRuntimeError uses an expensive debug.Stack call #123235
Labels
branch-release-24.1
Used to mark GA and release blockers, technical advisories, and bugs for 24.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
GA-blocker
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
T-sql-queries
SQL Queries Team
Comments
Apr 30, 2024: michae2 added four commits to michae2/cockroach that referenced this issue. Three carry "Informs: cockroachdb#123235 Release note: None" and one carries "Fixes: cockroachdb#123235 Release note (performance improvement): Make error handling in the vectorized execution engine much cheaper. This should help avoid bad metastable regimes perpetuated by statement timeout handling consuming all CPU time, leading to more statement timeouts."
May 1, 2024: michae2 added four commits to michae2/cockroach with the same messages (three Informs, one Fixes).
May 2, 2024: michae2 added four commits to michae2/cockroach with the same messages (three Informs, one Fixes), followed by two more Informs commits.
May 2, 2024: blathers-crl bot pushed two commits that referenced this issue, one with "Informs: #123235 Release note: None" and one with "Fixes: #123235" and the same release note as above.
May 2, 2024: michae2 added four more commits to michae2/cockroach with the same messages (three Informs, one Fixes).
May 3, 2024: michae2 added four commits to michae2/cockroach with the same messages (three Informs, one Fixes).
May 6, 2024: michae2 added four commits to michae2/cockroach with the same messages (three Informs, one Fixes).
Describe the problem
The usage of debug.Stack here is problematic: cockroach/pkg/sql/colexecerror/error.go, line 41 in 81569be.
During a recent customer outage, we captured this profile:

(Full profile available here.)
In the outage, the CRDB behavior was that once CPU usage became high enough, response times started to go up, and execution throughput went down. Notably, the throughput/latency impact is correlated with a large increase in concurrent open transactions. See the support ticket for more details.
The main part of the issue appears to be that the usage of debug.Stack causes lock contention, due to system calls that rely on futex.
To Reproduce
This can be seen with the admission-control/tpcc-severe-overload test. See these notes.
Expected behavior
Make CatchVectorizedRuntimeError less expensive.
There are two main ideas:
1. ExpectedError already wraps many errors with a known type (cockroach/pkg/sql/colexecerror/error.go, lines 202 to 204 in 81569be). The InternalError function could do the same.
2. Use runtime.Callers and runtime.CallersFrames instead of debug.Stack. It should be much cheaper. (See golang/go#56400, "runtime: Stack() calls lock against each other unnecessarily, even when single-goroutine", for some discussion of why debug.Stack is slower.)
In addition to these two changes, there might be ways to improve the function even more. We can use this issue to track those ideas as well.
@michae2 has some early benchmark results from making some of these changes.
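For anyone who wants to compare the two approaches locally, here is a rough sketch of a parallel microbenchmark. It is not the benchmark mentioned above and makes no claim about results: the helper names stackTextLen, frameCount, and benchParallel are made up for this example, and the 64-frame buffer is an arbitrary choice.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"testing"
)

// stackTextLen captures the current stack as formatted text via debug.Stack.
func stackTextLen() int { return len(debug.Stack()) }

// frameCount captures up to 64 program counters with runtime.Callers and
// resolves them with runtime.CallersFrames, returning the number of frames.
func frameCount() int {
	var pcs [64]uintptr
	n := runtime.Callers(1, pcs[:]) // skip runtime.Callers itself
	frames := runtime.CallersFrames(pcs[:n])
	count := 0
	for {
		_, more := frames.Next()
		count++
		if !more {
			return count
		}
	}
}

// benchParallel runs capture from many goroutines at once, since the problem
// described in this issue is contention under concurrency.
func benchParallel(capture func() int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				_ = capture()
			}
		})
	})
}

func main() {
	fmt.Println("debug.Stack:    ", benchParallel(stackTextLen))
	fmt.Println("runtime.Callers:", benchParallel(frameCount))
}
```

The stack here is only a few frames deep, so the numbers will understate the cost on a deep operator tree. Calling the helpers from a deeper call chain would give a more representative comparison.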
Jira issue: CRDB-38252