Description
What happened?
I have an application that uses a multi-threaded tokio runtime. Calling trace_provider.force_flush()
in our pre-shutdown routine consistently deadlocks the application: the call never returns.
I suspect the internal use of futures_executor::block_on
is the culprit. My hypothesis is that calling force_flush
from an async task blocks the runtime worker thread, so the tasks the flush is waiting on can never make progress if they are scheduled on that same thread. This is further supported by the observation that starting the opentelemetry_otlp
pipeline on a dedicated, single-threaded tokio runtime no longer deadlocks when trace_provider.force_flush()
is called, presumably because a separate thread remains available to drive the internal export tasks while the caller is blocked.
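Below is a minimal, self-contained sketch of the pattern I suspect. This is hypothetical code, not the SDK internals: the oneshot channel stands in for the processor's response channel, and worker_threads = 1 just makes the starvation deterministic.

```rust
// Assumed Cargo dependencies: tokio = { version = "1", features = ["rt-multi-thread", "macros", "sync"] }
// and futures-executor = "0.3".
use tokio::sync::oneshot;

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    // Run the "flush" from inside a spawned task so it executes on a worker
    // thread, the way an application task calling force_flush would.
    let flusher = tokio::spawn(async {
        let (tx, rx) = oneshot::channel::<()>();

        // Stand-in for the batch-processor work that would resolve the flush.
        tokio::spawn(async move {
            let _ = tx.send(());
        });

        // Stand-in for force_flush: block the worker thread on the receiver.
        // The inner task is queued on this same (now blocked) worker and can
        // never be polled, so block_on never returns.
        futures_executor::block_on(rx).expect("sender dropped");
    });

    flusher.await.expect("flusher panicked");
    println!("flushed"); // never reached
}
```

With more worker threads the same starvation can still occur whenever the tasks that would complete the flush end up queued on the blocked worker.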
opentelemetry-rust/opentelemetry-sdk/src/trace/span_processor.rs, lines 290 to 292 at commit 073f7a6:

```rust
futures_executor::block_on(res_receiver)
    .map_err(|err| TraceError::Other(err.into()))
    .and_then(|identity| identity)
```
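A possible mitigation on the application side (an untested sketch, assuming the flush only needs the async workers to stay unblocked; `provider` is the opentelemetry_sdk::trace::TracerProvider handle kept from pipeline installation) would be to move the blocking call onto tokio's blocking pool:

```rust
use opentelemetry_sdk::trace::TracerProvider;

// Untested sketch: run force_flush on the blocking thread pool so its internal
// futures_executor::block_on does not occupy an async worker, leaving the
// runtime free to drive the BatchSpanProcessor and exporter tasks.
async fn flush_traces(provider: TracerProvider) {
    let results = tokio::task::spawn_blocking(move || provider.force_flush())
        .await
        .expect("flush task panicked");
    for result in results {
        if let Err(err) = result {
            eprintln!("span flush failed: {err}");
        }
    }
}
```

That only sidesteps the problem, though; the underlying issue remains that force_flush blocks the calling thread via futures_executor::block_on.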
API Version
Not sure; we use the opentelemetry-collector with Jaeger, so likely the latest API version.
SDK Version
opentelemetry 0.21.0
opentelemetry-otlp 0.14.0
opentelemetry_sdk 0.21.1
What Exporters are you seeing the problem on?
OTLP
Relevant log output
No useful log output. I did some println debugging of the opentelemetry-sdk internals (with a local fork), which showed that once force_flush
was called, the internal methods of the BatchSpanProcessor stopped processing messages.