[Core,Frontend,Doc] Trace v1 cuda start up with opentelemetry (vllm-project#19318) #20229
Conversation
Signed-off-by: ibl <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Summary of Changes
Hello @ibl-g, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces OpenTelemetry tracing for the vLLM startup process, specifically focusing on GPU/CUDA initialization within the `openai.api_server` entrypoint. It establishes a new, more granular tracing pattern using per-module scopes and ensures trace context propagation across different processes, allowing for a unified view of the startup sequence. The implementation maintains OpenTelemetry as an optional dependency, enabling tracing by default if the necessary environment variables are configured.
Highlights
- OpenTelemetry Integration: Introduced OpenTelemetry tracing capabilities for vLLM's startup process, specifically focusing on GPU/CUDA initialization within the `openai.api_server` entrypoint. This allows for detailed visibility into cold start phases.
- Trace Context Propagation: Implemented mechanisms to propagate trace context between the API server process and the engine core process. This ensures that all related startup spans are grouped into a single, unified trace view, simplifying debugging and performance analysis.
- Flexible Tracing Activation: Tracing is now enabled by default if OpenTelemetry packages are installed and a trace endpoint is configured via environment variables. OpenTelemetry remains an optional dependency, gracefully falling back to no-op tracing if not available.
- Granular Tracing Scopes: Adopted a new pattern of per-module trace 'scopes' (similar to logging loggers) to provide more granular and organized tracing, allowing for detailed instrumentation of specific components like tokenizer initialization, model loading, and KV cache setup (a minimal sketch of this pattern follows this list).
- New Startup Spans: Added a comprehensive set of startup-related spans, including `vllm.startup`, `vllm.python_imports`, `vllm.asyncllm`, `vllm.asyncllm.tokenizer`, `vllm.model_registry.inspect_model`, `vllm.engine_core`, `vllm.engine_core_client`, `vllm.engine_core.kv_cache`, `vllm.engine_core.model_executor`, `vllm.engine_core.model_runner.load_model`, `vllm.engine_core.model_runner.profile_run`, `vllm.api_server.init_app_state`, `vllm.engine_core.torch_compile`, and `vllm.engine_core.model_runner.model_capture`, each capturing relevant attributes.
- Documentation and Testing: Added new documentation (`examples/others/tracing_vllm_startup.md`) detailing how to use the new tracing features, along with a new unit test (`tests/tracing/test_startup_tracing.py`) to validate that the API server correctly exports trace spans via gRPC.
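As a minimal sketch of the per-module scope pattern referenced in the Granular Tracing Scopes item above, the standard OpenTelemetry API looks roughly like this; the span name comes from the PR's span list, while the function and attribute names are hypothetical, not the PR's actual code:

```python
# Illustrative sketch of a per-module trace "scope" (instrumentation scope),
# analogous to logging.getLogger(__name__). Not the PR's actual helper code.
from opentelemetry import trace

# One tracer per module, named after the module, like a logger.
tracer = trace.get_tracer(__name__)


def initialize_kv_cache(num_gpu_blocks: int) -> None:
    # With no SDK/exporter configured this yields a no-op span, matching the
    # optional-dependency fallback described above.
    with tracer.start_as_current_span("vllm.engine_core.kv_cache") as span:
        span.set_attribute("num_gpu_blocks", num_gpu_blocks)  # hypothetical attribute
        ...  # actual KV cache initialization would go here
```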
Code Review
This pull request introduces OpenTelemetry tracing for vLLM startup, which is a valuable addition for observability and performance analysis. The implementation is well-structured, making tracing an optional dependency and providing a clear pattern for adding more spans. The context propagation between processes is also handled correctly.
I've identified a couple of medium-severity issues. One is in the documentation and could lead to misconfiguration by users. The other is in the new test, which could make it brittle to future changes. After addressing these points, the PR will be in great shape.
3. Configure [OpenTelemetry environment variables](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/) for vLLM

```
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://localhost:4317
```
The `grpc://` scheme is not standard for `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` when using gRPC and might not be handled correctly by the OpenTelemetry library, leading to connection errors. The protocol is determined by `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` (which defaults to `grpc`), and security by `OTEL_EXPORTER_OTLP_TRACES_INSECURE`. It's better to provide just the host and port, as the test does.
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://localhost:4317
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=localhost:4317
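For reference, a rough programmatic equivalent of the suggested configuration (a sketch only, not code from this PR), showing how the gRPC exporter takes a bare host:port with an explicit insecure flag:

```python
# Sketch: programmatic equivalent of
#   OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=localhost:4317
#   OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
# Assumes opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc are installed.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# No URL scheme is needed for the gRPC exporter; just host and port.
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```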
for span in scope.spans:
    spans[span.name] = span

assert len(spans) == 12, (f"Expected 12 spans but got {len(spans)}.")
This assertion on an exact number of spans is brittle and likely to break as new spans are added or existing ones are refactored. The following assertion, `assert expected_spans <= found_spans`, is more robust, since it checks for a minimum set of required spans while allowing for future additions. Removing this line will make the test more maintainable.
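For context, a rough sketch of the subset-style assertion the comment refers to; the span names are taken from the PR's span list, and the variable names are assumptions rather than the test's exact code:

```python
# Subset assertion sketch: require a minimum set of spans, tolerate additions.
expected_spans = {
    "vllm.startup",
    "vllm.engine_core",
    "vllm.engine_core.model_runner.load_model",
}
found_spans = set(spans)  # `spans` is the name -> span dict built above

# Passes as long as every required span was exported, even if new
# startup spans are added later.
assert expected_spans <= found_spans, (
    f"Missing spans: {expected_spans - found_spans}")
```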
My apologies; running pre-commit locally did not reveal the issues noted by your neat automation. I'll get those addressed.
Purpose
The PR is in response to #19318 and:

- Adds MVP-level OpenTelemetry tracing of a starting set of spans covering GPU/CUDA start-up for the `openai.api_server` entrypoint.
- Turns tracing on by default if OpenTelemetry is installed and a trace endpoint is configured via environment variables. This is the default OpenTelemetry behaviour but differs from v0 request tracing, which requires the CLI arg `--otlp-traces-endpoint`.
- Keeps OpenTelemetry an optional dependency. It uses no-op trace providers/spans if the OpenTelemetry packages are not available, similar to how OpenTelemetry behaves when no trace provider is configured or tracing is disabled.
- Forwards trace context between the API server/AsyncLLM process and the engine core process so that all spans are grouped together into a single trace view (a rough sketch of the propagation mechanism follows below).
- Adds a new pattern of per-module trace "scopes" (OpenTelemetry terminology), similar to logging loggers. This is a common OpenTelemetry pattern but differs from v0 request tracing, which exports a single span at the end of a request based on data collected by vLLM over time.
This PR is intended to be a starting point for iteration. We'll want to add coverage for other hardware and entrypoints and iterate on the set of spans and their attributes.
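For the cross-process grouping described in the bullet on forwarding trace context, the standard OpenTelemetry propagation API looks roughly like the sketch below; the carrier handoff and function names are assumptions about the mechanism, not the PR's exact code:

```python
# Sketch of W3C trace-context propagation between two processes. The carrier
# dict stands in for whatever IPC the API server uses to hand data to the
# engine core process (an assumption here).
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer(__name__)


# In the API server / AsyncLLM process:
def make_engine_core_args() -> dict:
    carrier: dict[str, str] = {}
    inject(carrier)  # writes "traceparent"/"tracestate" when a span is active
    return {"trace_headers": carrier}


# In the engine core process:
def engine_core_startup(trace_headers: dict[str, str]) -> None:
    parent_ctx = extract(trace_headers)
    with tracer.start_as_current_span("vllm.engine_core", context=parent_ctx):
        ...  # KV cache, model executor, etc. spans become children of this span
```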
Test Plan
Unit tests verifying that the API server exports trace spans via gRPC, similar to v0 request tracing.
We may want to expand the test to also cover `llm.py`, and perhaps share more of the test utilities with the v0 request tracing; currently there's some duplication.
Happy to do this and more testing. I mostly wanted to get the PR in motion for early feedback.
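As a rough illustration of the gRPC export check described above, a fake OTLP collector can be stood up in-process, in the spirit of the v0 request-tracing tests; this is only a sketch of the general approach using the opentelemetry-proto package, not the PR's actual test code:

```python
# Sketch of a fake OTLP gRPC collector for a test; not the PR's test code.
from concurrent import futures

import grpc
from opentelemetry.proto.collector.trace.v1.trace_service_pb2 import (
    ExportTraceServiceResponse)
from opentelemetry.proto.collector.trace.v1.trace_service_pb2_grpc import (
    TraceServiceServicer, add_TraceServiceServicer_to_server)


class FakeTraceService(TraceServiceServicer):
    """Records every ExportTraceServiceRequest the server under test sends."""

    def __init__(self):
        self.requests = []

    def Export(self, request, context):
        self.requests.append(request)
        return ExportTraceServiceResponse()


def start_fake_collector(port: int = 4317):
    # The test would point OTEL_EXPORTER_OTLP_TRACES_ENDPOINT at this port,
    # start the API server, then assert on the collected span names.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    service = FakeTraceService()
    add_TraceServiceServicer_to_server(service, server)
    server.add_insecure_port(f"localhost:{port}")
    server.start()
    return server, service
```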
Test Result
The new test passes. I've not yet been able to run the full suite of tests locally. If this isn't done automatically on the PR, I'll continue investigating my environment setup to resolve the missing imports that are causing test failures.
(Optional) Documentation Update
I've added example documentation under "other"; let me know if you prefer it under "online serving".
Example screenshots from Jaeger