Skip to content

Skip CPU time instrumentation if incompatible rollbar is detected #1362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Feb 11, 2021

When I initially worked on #1300 ("Warn on incompatible rollbar version") my understanding of the issue on rollbar/rollbar-gem#1018 was incomplete and I did not realize that old rollbar versions, beyond breaking datadog profiler's CPU time instrumentation, could actually break the customer's application.

One of our private beta customers mentioned he had seen both the warning but also had seen a stack trace:

[1] ! Unable to load application: SystemStackError: stack level too deep
/usr/local/bundle/gems/rollbar-3.1.1/lib/rollbar/plugins/thread.rb:5:in `initialize_with_rollbar': stack level too deep (SystemStackError)
	from /usr/local/bundle/gems/ddtrace-0.45.0.feature.profiling.109457/lib/ddtrace/profiling/ext/cthread.rb:47:in `initialize'
	from /usr/local/bundle/gems/rollbar-3.1.1/lib/rollbar/plugins/thread.rb:6:in `initialize_with_rollbar'
	from /usr/local/bundle/gems/ddtrace-0.45.0.feature.profiling.109457/lib/ddtrace/profiling/ext/cthread.rb:47:in `initialize'

This led me to try to reproduce this issue. I discovered that with

ENV['DD_PROFILING_ENABLED'] = 'true'
require 'ddtrace/profiling/preload'

require 'rollbar'
Rollbar.configure do |c|
  c.access_token = "nope"
end

Thread.new { }.join

I could trigger the issue, and that the current behavior was that the warning from #1300 would be printed, but then the application would proceed to fail.

Having a bit more experience with the codebase now, I realize that the correct place to put this check is in the CPU extension #unsupported_reason method, which will ensure the extension is not loaded at all if an incompatible version of rollbar is around.

The resulting behavior is that even with an old rollbar version, profiler will happily load and work; it will just omit the CPU
time profiling, similarly to how it behaves on other platforms where for different reasons we don't support CPU time profiling.

When I initially worked on #1300 ("Warn on incompatible rollbar
version") my understanding of the issue on rollbar/rollbar-gem#1018 was
incomplete and I did not realize that old rollbar versions, beyond
breaking datadog profiler's CPU time instrumentation, could actually
break the customer's application.

One of our private beta customers mentioned he had seen both the
warning but also had seen a stack trace:

```
[1] ! Unable to load application: SystemStackError: stack level too deep
/usr/local/bundle/gems/rollbar-3.1.1/lib/rollbar/plugins/thread.rb:5:in `initialize_with_rollbar': stack level too deep (SystemStackError)
	from /usr/local/bundle/gems/ddtrace-0.45.0.feature.profiling.109457/lib/ddtrace/profiling/ext/cthread.rb:47:in `initialize'
	from /usr/local/bundle/gems/rollbar-3.1.1/lib/rollbar/plugins/thread.rb:6:in `initialize_with_rollbar'
	from /usr/local/bundle/gems/ddtrace-0.45.0.feature.profiling.109457/lib/ddtrace/profiling/ext/cthread.rb:47:in `initialize'
```

This led me to try to reproduce this issue. I discovered that with

```ruby
ENV['DD_PROFILING_ENABLED'] = 'true'
require 'ddtrace/profiling/preload'

require 'rollbar'
Rollbar.configure do |c|
  c.access_token = "nope"
end

Thread.new { }.join
```

I could trigger the issue, and that the current behavior was that
the warning from #1300 would be printed, but then the application
would proceed to fail.

Having a bit more experience with the codebase now, I realize that
the correct place to put this check is in the CPU extension
 #unsupported_reason method, which will ensure the extension is
not loaded at all if an incompatible version of rollbar is around.

The resulting behavior is that even with an old rollbar version,
profiler will happily load and work; it will just omit the CPU
time profiling, similarly to how it behaves on other platforms where
for different reasons we don't support CPU time profiling.
@ivoanjo ivoanjo requested a review from a team February 11, 2021 15:05
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. For some reason I thought the old warning was disabling CPU when triggered, but I suppose it wasn't at the right level to do so. This makes sense, and the bonus of this update is that we now have a specific Rollbar version to put in the warning.

@ivoanjo ivoanjo merged commit 6fe9cf1 into feature/profiling Feb 17, 2021
@ivoanjo ivoanjo deleted the fix/skip-cpu-time-instrumentation-if-rollbar-is-around branch February 17, 2021 17:00
ivoanjo added a commit that referenced this pull request Mar 11, 2021
…on-if-rollbar-is-around

Skip CPU time instrumentation if incompatible rollbar is detected
ivoanjo added a commit that referenced this pull request Mar 16, 2021
…on-if-rollbar-is-around

Skip CPU time instrumentation if incompatible rollbar is detected
ivoanjo added a commit that referenced this pull request Mar 29, 2021
…on-if-rollbar-is-around

Skip CPU time instrumentation if incompatible rollbar is detected
ivoanjo added a commit that referenced this pull request Apr 12, 2021
…on-if-rollbar-is-around

Skip CPU time instrumentation if incompatible rollbar is detected
ivoanjo added a commit that referenced this pull request Jun 17, 2021
This is a repeat of #1362 -- the logging gem also monkey patches
`Thread` initialization using `alias_method`, which is incompatible
with our instrumentation that uses `preload`.

For details, see TwP/logging#230 where we
submitted a fix to the upstream gem.

Right now the only way to avoid this issue and still enable proflier is
to either remove the logging gem or to disable the feature using the
`LOGGING_INHERIT_CONTEXT` environment variable.

If/when the gem creators accept our proposed change and release it,
we'll need to revise this to take that into consideration, as we do
for rollbar.

NOTE: I'm not entirely happy with the complexity that is accumulating
for the tests, but I've decided to not refactor it yet. If we need
to add more gems, we should definitely refactor the tests. My hope is
that soon we can get rid of our `Thread` monkey patches, and thus get
rid of all of this instead.
ivoanjo added a commit that referenced this pull request Jun 17, 2021
Instead, customers will get the following log message:

```
INFO -- ddtrace: [ddtrace] (/app/lib/ddtrace/profiling/tasks/setup.rb
:44:in `activate_cpu_extensions') CPU time profiling skipped because
native CPU time is not supported: The `logging` gem is installed and
its thread inherit context feature is enabled. Please add
LOGGING_INHERIT_CONTEXT=false to your application's environment
variables to disable the conflicting `logging` gem feature. See
TwP/logging#230 for details. Profiles
containing Wall time will still be reported.
```

This is a repeat of #1362 -- the logging gem also monkey patches
`Thread` initialization using `alias_method`, which is incompatible
with our instrumentation that uses `preload`.

For details, see TwP/logging#230 where we
submitted a fix to the upstream gem.

Right now the only way to avoid this issue and still enable proflier is
to either remove the logging gem or to disable the feature using the
`LOGGING_INHERIT_CONTEXT` environment variable.

If/when the gem creators accept our proposed change and release it,
we'll need to revise this to take that into consideration, as we do
for rollbar.

NOTE: I'm not entirely happy with the complexity that is accumulating
for the tests, but I've decided to not refactor it yet. If we need
to add more gems, we should definitely refactor the tests. My hope is
that soon we can get rid of our `Thread` monkey patches, and thus get
rid of all of this instead.
ivoanjo added a commit that referenced this pull request Jun 17, 2021
Instead, customers will get the following log message:

```
INFO -- ddtrace: [ddtrace] (/app/lib/ddtrace/profiling/tasks/setup.rb
:44:in `activate_cpu_extensions') CPU time profiling skipped because
native CPU time is not supported: The `logging` gem is installed and
its thread inherit context feature is enabled. Please add
LOGGING_INHERIT_CONTEXT=false to your application's environment
variables to disable the conflicting `logging` gem feature. See
TwP/logging#230 for details. Profiles
containing Wall time will still be reported.
```

This is a repeat of #1362 -- the logging gem also monkey patches
`Thread` initialization using `alias_method`, which is incompatible
with our instrumentation that uses `preload`.

For details, see TwP/logging#230 where we
submitted a fix to the upstream gem.

Right now the only way to avoid this issue and still enable proflier is
to either remove the logging gem or to disable the feature using the
`LOGGING_INHERIT_CONTEXT` environment variable.

If/when the gem creators accept our proposed change and release it,
we'll need to revise this to take that into consideration, as we do
for rollbar.

NOTE: I'm not entirely happy with the complexity that is accumulating
for the tests, but I've decided to not refactor it yet. If we need
to add more gems, we should definitely refactor the tests. My hope is
that soon we can get rid of our `Thread` monkey patches, and thus get
rid of all of this instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants