Skip to content

Integrate OpenTelemetry #57992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mcollina opened this issue Apr 23, 2025 · 21 comments
Open

Integrate OpenTelemetry #57992

mcollina opened this issue Apr 23, 2025 · 21 comments

Comments

@mcollina
Copy link
Member

OpenTelemetry (OTEL) has emerged as the de facto standard for observability. It's vendor-neutral and a CNCF project. Lately, Deno shipped support for OTEL.

Based on a few conversations in the hallway at the recent collaboration summit, there is interest in building this within Node.js. Moreover, it would simplify many integrations and improve overall performance.

@mcollina
Copy link
Member Author

cc @bengl @targos @legendecas @cjihrig @nodejs/diagnostics

@cjihrig
Copy link
Contributor

cjihrig commented Apr 23, 2025

Would the plan be to vendor the JavaScript library (or maybe the C++ library)? One concern could be that @opentelemetry/core alone is 9.3MB according to pkg-size.dev. Maybe that's fine though?

It also looks like Deno has run into at least one rough edge here.

@mhdawson
Copy link
Member

@mcollina thanks for opening the issue, worth discussing.

@davidfiala
Copy link

Would be good to ensure there's a compile time flag to exclude this entirely, not just a runtime flag. Mostly to save space and resources on embedded environments.

@Flarna
Copy link
Member

Flarna commented Apr 23, 2025

Do you think about removing the "burden" to manually install OTel components like SDK, exporter,... by selecting a specific set of them and include it into node core?
I fear we will run into similar compat problems as that one linked above for deno. Also topics like long term stability need to be clarified. Quite some parts of OTel are still experimental and move/change fast.
OTel is quite extensible, e.g. it supports exporting data via HTTP, grpc,... therefore it's also important to choose wise what is included always and what can be extended by users using npm modules. Which again raises the compat topic between node internal implementation and OTel JS implementation.

Or is it more about creating spans for node internals (e.g. HTTP) using OTel API but leave the SDK, Exporter,.. installation/configuration up to user? So somewhat replace existing diagnostic channels by calls to OTel.

@rochdev
Copy link
Contributor

rochdev commented Apr 23, 2025

I don't think integrating OpenTelemetry is the right way to go. What should be done instead is adding diagnostics_channel support everywhere telemetry is needed, which OpenTelemetry can then listen to when installed. This is also how it should be integrated into libraries instead of adding a dependency on OpenTelemetry everywhere as is currently done. This separates the concern of reporting the data from collecting and processing it.

I opened a similar issue a few years ago in open-telemetry/opentelemetry-js#1263.

@AbhiPrasad
Copy link

From the perspective of someone who's contributed to OpenTelemetry, and currently works at an observability vendor (Sentry), I would also advocate for putting diagnostics_channel everywhere we need telemetry. OpenTelemetry instrumentation (and other kind of telemetry data) can easily build integrations on-top of diagnostics_channel, which we've seen does work in use cases like Undici.

There are other kinds of telemetry than logs, traces, and metrics (:cough: errors) that could benefit from this, and defaulting to OTEL for observability means we restrict the kind of signals that can be collected from Node.js and it's internals. I'd prefer if Node.js kept this open, as it leaves room for innovation, although I am a bit biased working at Sentry.

Bundling in OTEL means quite a bit of API surface area that Node.js has to expose. It will also introduce possibilities of conflicting versions, especially for instrumentations and semantic conventions (semantic conventions frequently experience changes every release). We built the Sentry Node SDKs on-top of OpenTelemetry, but are now experiencing pains with the breaking change between 1.x and 2.x of the OTEL JS SDKs (it's very hard to support both in the way the packages are architected).

@bengl
Copy link
Member

bengl commented Apr 23, 2025

Overall I see a bunch of wins in doing this roughly similar to how Deno did it.

  1. There would be less non-core code needed for OTel-style observability. This is great for scenarios like serverless where bundle sizes matter quite a bit.
  2. Code in core can be optimized in ways that are difficult or impossible to do outside of core.
  3. For observability vendors, zero-install onboarding is desirable, even if it only provides a subset of a vendor's offerings.

All that said, as @rochdev mentioned, as did @AbhiPrasad and @Flarna, there ought to be a focus on including diagnostics_channel support in as many places as possible, both in Node.js core and in popular packages. Rather than including OTel API components everywhere, diagnostics_channel already provides a sufficient API for tracing, logging, metrics, and a host of other observability possibilities. It's a lower-level API that's included, meaning OTel and other observability concepts can be build upon it relatively trivially. By building OTel data on top of it, it's also much easier to share the context objects, which is great for folks building on top of built-in instrumentations. Self-instrumentation is a goal of OTel, but it's better in a lot of ways to do it via diagnostics_channel.

But it's not an either-or situation. We can have both! To serve both these needs, I propose the following approach:

  1. Ensure that diagnostics_channel is used appropriately in every place in core that would be useful for OTel-style observability. This primarily means HTTP inbounds and outbounds.
  2. Create internals that subscribe to those channels, create those spans/traces, insert context propafation headers, and ship all that data via OTLP (via HTTP for now). This would all only be activated when an OTLP endpoint is provided via an environment variable.
  3. Create a new public-facing API that takes arbitrary tracing channels, and subscribes to them and creates spans out of them when the OTel subsystem mentioned above is activated. I'm doing a lot of hand-waving here around converting diagnostics_channel data into span tags, but we ought to be able to have a mostly-declarative way of doing this.
  4. As a stopgap, provide TracerProvider, etc., implementations to enable @opentelemetry/api usage.

This would create a subsystem that's runtime-optional, API compatible, can be optimized in core, and provides a nice OOTB experience, while ensuring that low-level observability primitives are still usable for all kinds of vendor needs, and providing a lighter-weight API for library authors to add their own instrumentation with.

I'm glossing over a lot of finer details and rough edges here because I wanted to chime in before the thread got too long 😄.

@cjihrig
Copy link
Contributor

cjihrig commented Apr 23, 2025

Another option could be for Node.js to develop a blessed OpenTelemetry integration that lives on npm, but is in the Node.js GitHub org. Node core could work backward from that to provide any missing telemetry data or APIs. It wouldn't be quite as simple to get started with, but maybe it means npm installing one thing instead of many things and dealing with the config headache that sometimes comes with OTel.

@legendecas
Copy link
Member

legendecas commented Apr 24, 2025

Another option could be for Node.js to develop a blessed OpenTelemetry integration that lives on npm, but is in the Node.js GitHub org. Node core could work backward from that to provide any missing telemetry data or APIs. It wouldn't be quite as simple to get started with, but maybe it means npm installing one thing instead of many things and dealing with the config headache that sometimes comes with OTel.

I'm not sure I'm following the reason behind maintaining a https://github.com/nodejs version of OpenTelemetry. Node.js is a first class runtime that is officially supported by https://github.com/open-telemetry/opentelemetry-js and I don't think there is any reason that will block us from landing any improvement in the official repo or npm packages.

I'd agree with @bengl that the ultimate goal is to improve the onboarding setup for users to get a full observerbility experience. OpenTelemetry has its own API and SDK versioning pace and there are disparities between OTel's and Node.js'. I'd agree that the first step is to improve diagnostic_channel in Node.js and reduce the instrumentation cost in OpenTelemetry.

However, shipping OTel in Node.js core is not the only approach that we can do towards the goal of improving observerbility setup. We can also provide an integration stack, like a Docker image setup, to avoid coupling the versioning issue of the two project at the first stage.

Overall, I'm glad that this is brought up!

(/cc @open-telemetry/javascript-maintainers @blumamir @dyladan @JamieDanielson @pichlermarc @trentm)

@cjihrig
Copy link
Contributor

cjihrig commented Apr 24, 2025

I'm not sure I'm following the reason behind maintaining a https://github.com/nodejs version of OpenTelemetry. Node.js is a first class runtime that is officially supported by https://github.com/open-telemetry/opentelemetry-js and I don't think there is any reason that will block us from landing any improvement in the official repo or npm packages.

Sorry, I wasn't implying any changes to the OpenTelemetry packages. By "provide any missing telemetry data or APIs" I meant in core, such as additional diagnostic channels, etc.

As far as something that could be put on npm, what I meant is: Deno is offering zero config OTel integration. What if we could do something similar via a single npm package, assuming that OTel would not be integrated directly into core. I think that's something that could be useful to users due to the complexity and number of dependencies that need to be configured when setting up OTel by hand.

@Flarna
Copy link
Member

Flarna commented Apr 24, 2025

Shipping preconfigured/zero config packages for OTel is usually done by APM tool vendors.
They know/decide on details like

  • export protocol
  • export endpoints
  • authentication on these endpoints
  • which propagators to use
  • which resource detectors to use
  • opinionated selection and config of 3rd party instrumentations
  • ...
    to get results fitting best to their backend capabilities.

Sure, node project could provide one more opinion here. But why would node project do a better job here then OTel project could do?
There is already an npm package which takes care about most of it and allows config via env vars.

@cjihrig
Copy link
Contributor

cjihrig commented Apr 24, 2025

Shipping preconfigured/zero config packages for OTel is usually done by APM tool vendors.
They know/decide on details like

My understanding is that this issue is proposing something similar to what Deno is doing. If Deno can do it, so can Node even though we aren't an APM vendor. If you don't think this is a good idea, that is a fine opinion to have.

But why would node project do a better job here then OTel project could do? There is already an npm package which takes care about most of it and allows config via env vars.

My understanding from proposing auto-instrumentation for node:sqlite is that the OTel team isn't overly eager to support this (but maybe I'm wrong there).

@Qard
Copy link
Member

Qard commented Apr 24, 2025

Very much agreed with @bengl. We've talked at length about this direction in the past. 🙂

Definitely biased as the creator of diagnostics_channel here, but providing a base on which OTel can be built without its core needing to exist within Node.js core was literally the primary goal I had in mind when creating it.

I wanted to have an API that we could integrate in core with minimal dependencies or overhead, but with the flexibility that additional systems could be connected in userland or potentially exist in core too, but lazy-loaded.

With diagnostics_channel as the base architecture, the heavier capabilities of OTel can be layered on top without needing any of its infrastructure loaded or initialized until needed.

@Flarna
Copy link
Member

Flarna commented Apr 24, 2025

My understanding from proposing auto-instrumentation for node:sqlite is that the OTel team isn't overly eager to support this (but maybe I'm wrong there).

A concrete instrumentation for node:sqlite is for sure a good thing. Wherever it is hosted it requires to be maintained. OTel project usually doesn't take the maintenance part for instrumentations but they offer to host it on their repo which comes with pros/conts.

Having diagnostic channels (or some other hooks) in node:sqlite would avoid the need for monkey patching without binding node core to OTel API.
On top of the same hooks other tools could also build their stuff.

Having instrumentation/monitoring friendly built-ins and maybe even implement/maintain some of these instrumentations is one topic. But IMHO providing a preconfigured OTel setup is a different beast by far harder to get right.

@tedsuo
Copy link

tedsuo commented Apr 24, 2025

Very much agreed with @bengl. We've talked at length about this direction in the past. 🙂

Definitely biased as the creator of diagnostics_channel here, but providing a base on which OTel can be built without its core needing to exist within Node.js core was literally the primary goal I had in mind when creating it.

I wanted to have an API that we could integrate in core with minimal dependencies or overhead, but with the flexibility that additional systems could be connected in userland or potentially exist in core too, but lazy-loaded.

With diagnostics_channel as the base architecture, the heavier capabilities of OTel can be layered on top without needing any of its infrastructure loaded or initialized until needed.

I feel like there is a lot of architectural alignment here. OpenTelemetry keeps the API packages separate from the implementation packages for the same reason - the presence of OpenTelemetry instrumentation in a package does not automatically require the application to take a dependency on span processors, exporters, or other implementation packages.

So one way to rephrase the proposal could be: should Otel and diagnostics channel merge in some way, so that there is one instrumentation API that is aligned with the rest of the OpenTelemetry toolchain?

@pichlermarc
Copy link

My understanding from open-telemetry/opentelemetry-js-contrib#2698 is that the OTel team isn't overly eager to support this (but maybe I'm wrong there).

@cjihrig I posted a comment on the instrumentation request for node:sqlite to clarify what I meant. I think the discussion about instrumentation hosting is related, but slightly off-topic for this thread, so we can continue over there. 🙂

@Qard
Copy link
Member

Qard commented Apr 25, 2025

should Otel and diagnostics channel merge in some way, so that there is one instrumentation API that is aligned with the rest of the OpenTelemetry toolchain?

I would be happy to entertain that idea, though I'd be a bit wary with OpenTelemetry trying to unify to a single shape across languages, which is not actually optimal when languages themselves are shaped quite differently.

In my opinion each language should likely have its own thing, which maybe looks similar to what diagnostics_channel is, but should probably differ in various ways to match the semantics of the language. It can be the responsibility of APIs layered over top of that to smooth out those inconsistencies, if desirable.

Part of what makes OpenTelemetry such a heavyweight thing to propose integrating into a language runtime is how much work it needs to do to contort itself into the shape of what the language-neutral spec says. Another cost being all the additional state management required as the OpenTelemetry spec mixes the data extraction need with the data propagation need.

Node.js solves these problems with the isolation of diagnostics_channel and AsyncLocalStorage as two separate generic systems for data extraction and data propagation. I would really like to see this separated model replicated in other languages as I think it would help a lot to make the two purposes better fit the differences between the languages. That and there are quite a lot of use cases for both capabilities which exist outside the tracing-specific case.

@fengmk2
Copy link
Contributor

fengmk2 commented Apr 26, 2025

Actually, just like undici being built into Node.js as the implementation of fetch, it would be very developer-friendly for OpenTelemetry to be built-in as well.
I've been working on upgrading OpenTelemetry from v1 to v2 recently, and unless you're quite familiar with OpenTelemetry, the upgrade can be a bit difficult.

@timfish
Copy link
Contributor

timfish commented Apr 26, 2025

I agree with much of what's been said already!

It seems like one one of the main reasons to integrate otel into Node is because diagnostics_channel use hasn't become ubiquitous in libraries. I think the major obstacles to library authors using it have been:

  • The diagnostics_channel APIs are low level and vastly different to what otel and most APM vendors have settled on. Most library maintainers probably just want to startSpan and rarely care about much else or the underlying detail.
  • Even if a library is instrumented with diagnostics_channel, it still requires an otel plugin/instrumentation to convert the events to spans. This is more work and more maintenance.

We could do a lot more in userland to improve this:

  • Add some higher level APIs on top of diagnostics_channel that cover basic use cases.
  • Create a corresponding otel plugin that can listen for the events and create spans
  • TracingChannel is opt-in from the subscriber for performance reasons so... create a common way for libraries to expose a list of their tracing channels to the above plugin to subscribe to.

@Qard
Copy link
Member

Qard commented Apr 28, 2025

Well, I never really expected the ecosystem to adopt diagnostics_channel on their own. My expectation was APM vendors would go and make PRs to the libraries from which they care to get insight. I had intended on Datadog leading the charge, but that never really materialized, likely somewhat due to a reorg that happened which made ownership and priority of that work nebulous.

The intent was for it to fit a space similar to the tracing Rust crate, replacing the use case of the debug module in Node.js ecosystem with something that can be made more structural as desired, but did not too strongly impose structure on the consumer, just suggested it.

It may be the case that more tracing-shaped abstractions are needed, and if so I would suggest they should be in core. But I tried including those as part of the original implementation and got a lot of debate so removed them just to land a simpler core that could be iterated on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests