Your current environment
I ran the following to start the server:
sudo docker pull vllm/vllm-openai:latest
sudo docker run -d \
--gpus all \
--name vllm-8b-bf16-b200 \
-p 8000:8000 \
--ipc=host \
-e HF_TOKEN=... \
vllm/vllm-openai:latest \
--model meta-llama/Llama-3.1-8B-Instruct \
--tensor-parallel-size 8 \
--trust-remote-code \
--host 0.0.0.0 \
--port 8000
I then attempted a sweep across input/output length configurations using vllm bench serve and ran into the issues below.
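The sweep itself looked roughly like the following. This is a minimal sketch, not the exact commands: the specific input/output lengths and prompt counts varied per run, and the flag names assume the random-dataset mode of vllm bench serve.

# Hypothetical sweep sketch; actual lengths and prompt counts varied per run.
for in_len in 1024 2048 4096; do
  for out_len in 128 512 1024; do
    vllm bench serve \
      --model meta-llama/Llama-3.1-8B-Instruct \
      --host 127.0.0.1 \
      --port 8000 \
      --dataset-name random \
      --random-input-len "$in_len" \
      --random-output-len "$out_len" \
      --num-prompts 1000
  done
done

The crash appeared partway through the sweep while requests were in flight, as shown in the logs below.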
🐛 Describe the bug
(APIServer pid=1) INFO: 127.0.0.1:42352 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 08-25 12:24:14 [loggers.py:123] Engine 000: Avg prompt throughput: 26298.0 tokens/s, Avg generation throughput: 10103.2 tokens/s, Running: 224 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.2%, Prefix cache hit rate: 6.8%
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] WorkerProc hit an exception.
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] Traceback (most recent call last):
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] output = func(*args, **kwargs)
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] return func(*args, **kwargs)
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] return func(*args, **kwargs)
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1733, in execute_model
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596]
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] Traceback (most recent call last):
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] output = func(*args, **kwargs)
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] return func(*args, **kwargs)
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] return func(*args, **kwargs)
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1733, in execute_model
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596]
(VllmWorker TP1 pid=406) ERROR 08-25 12:24:17 [multiproc_executor.py:596]
[rank1]:[E825 12:24:17.876068535 ProcessGroupNCCL.cpp:1899] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7fb2b19785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7fb2b190d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7fb2b1d24422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fb24756d5a6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7fb24757d840 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7fb24757f3d2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fb247580fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7fb2379b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7fb2b2602ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7fb2b2693a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7fb2b19785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7fb2b190d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7fb2b1d24422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fb24756d5a6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7fb24757d840 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7fb24757f3d2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fb247580fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7fb2379b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7fb2b2602ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7fb2b2693a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1905 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7fb2b19785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xcc7b9e (0x7fb24754fb9e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x9165ed (0x7fb24719e5ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xdc253 (0x7fb2379b3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #4: <unknown function> + 0x94ac3 (0x7fb2b2602ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #5: clone + 0x44 (0x7fb2b2693a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(EngineCore_0 pid=271) ERROR 08-25 12:24:19 [multiproc_executor.py:146] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
(VllmWorker TP0 pid=405) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP2 pid=407) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP3 pid=408) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP4 pid=409) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP5 pid=410) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP6 pid=411) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP7 pid=412) INFO 08-25 12:24:19 [multiproc_executor.py:520] Parent process exited, terminating worker
(APIServer pid=1) INFO 08-25 12:24:24 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7599.4 tokens/s, Running: 46 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 6.8%
(APIServer pid=1) INFO 08-25 12:24:34 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 46 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 6.8%
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.10.1.1) with config: model='meta-llama/Llama-3.1-8B-Instruct', speculative_config=None, tokenizer='meta-llama/Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=meta-llama/Llama-3.1-8B-Instruct, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null},
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[], scheduled_cached_reqs=CachedRequestData(req_ids=['cmpl-a1608f9679d149b986d6c224b1692e1a-0', 'cmpl-b9f9b4cad7c842c6af3f97cd025b8827-0', 'cmpl-e4ff1c56885d4842ab7b9cc64a74f2cc-0', 'cmpl-e1d55e6ae7c548a197239f8fdadee2ce-0', 'cmpl-5df04664825344ddb17a4a2e03d71fc0-0', 'cmpl-6af42ebea36845f8ac231f6b5a85d788-0', 'cmpl-4ad81fc4d4cb4b4f8040716c6a88e93f-0', 'cmpl-7140426cd0274962a6a11e34a15f1c59-0', 'cmpl-f788a2daf8404fd9b022c7fa09a7a647-0', 'cmpl-c039f23811e84f62bfc4e91c1889ee89-0', 'cmpl-1f3a5ce1766f4089a1d5a5525b1d40ba-0', 'cmpl-4a2b48ee95894d268f0154ae3767a4f2-0', 'cmpl-fbd5471ba88d47f7b25f74fd36878654-0', 'cmpl-2d5a59add4a64ff0be382b958b67d7a6-0', 'cmpl-a8c734c5e2fc4068afa5da1a81db36f7-0', 'cmpl-db324d84c39a46fc96337dca0c5fb016-0', 'cmpl-fd46a95e15df4b479c449777f657bb7c-0', 'cmpl-67843866b6aa4e38aa711a6d7f965d64-0', 'cmpl-dbe7e63fa7204fc4972d56770b10f7e2-0', 'cmpl-f4bcdaea24a54d6b9f030357f63e51c8-0', 'cmpl-0055582e065f485b9544012ee7f10421-0', 'cmpl-7b37a606b8224758a21d919d80be033e-0', 'cmpl-54cc950534024949a6d81f3a8a754c5c-0', 'cmpl-b24988d1c4e54d71a4b98229fc153ec4-0', 'cmpl-e093e5e9465340a9a094a6fbf8717dc7-0', 'cmpl-b2b106465b9242a3981e8e41d3f52059-0', 'cmpl-1f9111a751bf4b8d9900e220977f1999-0', 'cmpl-643c6c37535c47b29eb2bce1eed10fa8-0', 'cmpl-86728ec6560a407d8362075a334b3bec-0', 'cmpl-59ca12e89c60434c845094de2eb20c9a-0', 'cmpl-5c17eac5412f4f27a80697e7fb2d897c-0', 'cmpl-9300f636cd6947a0ba2942e5c06ade7c-0', 'cmpl-73dc79082dab4d2c9865b1f07ce56066-0', 'cmpl-86372334c65b4b5b8c721c630bf7f2eb-0', 'cmpl-e4a76396c85140b2a51bd5ba1153dbdb-0', 'cmpl-721c99b3cf6a4291bbe6733dc2aab4a3-0', 'cmpl-8caaebbd79e748d68b9d22e2dc8fba73-0', 'cmpl-73a225f1176a4115b39c3fc9fc40a473-0', 'cmpl-4f2efb18da2c4a8081108869ae6ef5c7-0', 'cmpl-ea717b38e897454887bf5211398001dd-0', 'cmpl-b5453d5862794f9d983d589396451c8f-0', 'cmpl-b54360cc76ff4079bf66699b81753ae1-0', 'cmpl-935e4c690e93471a82c9fcd90ff4fd73-0', 'cmpl-ad50070e4db743a4a2d0e77634fef7ce-0', 'cmpl-921a25acd8954387b9649171350be691-0', 'cmpl-9d84b57ac81c484b9876dfa52d66a50d-0'], resumed_from_preemption=[false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false], new_token_ids=[], new_block_ids=[[[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]]], num_computed_tokens=[1790, 1790, 1790, 1790, 1790, 1790, 1789, 1789, 1789, 1789, 1789, 1789, 1789, 1789, 1788, 1788, 1788, 1788, 1788, 1788, 1788, 1788, 1787, 1787, 1787, 1787, 1787, 1787, 1787, 1786, 1786, 1786, 1786, 1786, 1786, 1786, 1786, 1785, 1785, 1785, 1785, 1785, 1785, 1785, 1784, 1784]), num_scheduled_tokens={cmpl-f788a2daf8404fd9b022c7fa09a7a647-0: 1, cmpl-1f9111a751bf4b8d9900e220977f1999-0: 1, cmpl-e4a76396c85140b2a51bd5ba1153dbdb-0: 1, cmpl-b54360cc76ff4079bf66699b81753ae1-0: 1, cmpl-7140426cd0274962a6a11e34a15f1c59-0: 1, cmpl-ad50070e4db743a4a2d0e77634fef7ce-0: 1, cmpl-67843866b6aa4e38aa711a6d7f965d64-0: 1, cmpl-b2b106465b9242a3981e8e41d3f52059-0: 1, cmpl-935e4c690e93471a82c9fcd90ff4fd73-0: 1, 
cmpl-1f3a5ce1766f4089a1d5a5525b1d40ba-0: 1, cmpl-b5453d5862794f9d983d589396451c8f-0: 1, cmpl-fd46a95e15df4b479c449777f657bb7c-0: 1, cmpl-5df04664825344ddb17a4a2e03d71fc0-0: 1, cmpl-b9f9b4cad7c842c6af3f97cd025b8827-0: 1, cmpl-73a225f1176a4115b39c3fc9fc40a473-0: 1, cmpl-b24988d1c4e54d71a4b98229fc153ec4-0: 1, cmpl-e1d55e6ae7c548a197239f8fdadee2ce-0: 1, cmpl-ea717b38e897454887bf5211398001dd-0: 1, cmpl-86372334c65b4b5b8c721c630bf7f2eb-0: 1, cmpl-86728ec6560a407d8362075a334b3bec-0: 1, cmpl-8caaebbd79e748d68b9d22e2dc8fba73-0: 1, cmpl-5c17eac5412f4f27a80697e7fb2d897c-0: 1, cmpl-7b37a606b8224758a21d919d80be033e-0: 1, cmpl-73dc79082dab4d2c9865b1f07ce56066-0: 1, cmpl-4ad81fc4d4cb4b4f8040716c6a88e93f-0: 1, cmpl-59ca12e89c60434c845094de2eb20c9a-0: 1, cmpl-db324d84c39a46fc96337dca0c5fb016-0: 1, cmpl-c039f23811e84f62bfc4e91c1889ee89-0: 1, cmpl-a8c734c5e2fc4068afa5da1a81db36f7-0: 1, cmpl-e093e5e9465340a9a094a6fbf8717dc7-0: 1, cmpl-a1608f9679d149b986d6c224b1692e1a-0: 1, cmpl-54cc950534024949a6d81f3a8a754c5c-0: 1, cmpl-9300f636cd6947a0ba2942e5c06ade7c-0: 1, cmpl-9d84b57ac81c484b9876dfa52d66a50d-0: 1, cmpl-721c99b3cf6a4291bbe6733dc2aab4a3-0: 1, cmpl-dbe7e63fa7204fc4972d56770b10f7e2-0: 1, cmpl-f4bcdaea24a54d6b9f030357f63e51c8-0: 1, cmpl-6af42ebea36845f8ac231f6b5a85d788-0: 1, cmpl-fbd5471ba88d47f7b25f74fd36878654-0: 1, cmpl-921a25acd8954387b9649171350be691-0: 1, cmpl-4f2efb18da2c4a8081108869ae6ef5c7-0: 1, cmpl-643c6c37535c47b29eb2bce1eed10fa8-0: 1, cmpl-4a2b48ee95894d268f0154ae3767a4f2-0: 1, cmpl-0055582e065f485b9544012ee7f10421-0: 1, cmpl-2d5a59add4a64ff0be382b958b67d7a6-0: 1, cmpl-e4ff1c56885d4842ab7b9cc64a74f2cc-0: 1}, total_num_scheduled_tokens=46, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=['cmpl-9e8212d8f3df4fcfbc8a02e347a50af9-0', 'cmpl-141566cfe0d74757902f26519a45838d-0', 'cmpl-0bf2ca4c7ca54bf2b4ea1404847e424a-0', 'cmpl-ee984d614f544d6fa38c0f170a6657f8-0', 'cmpl-f26038f1788a477c8053acd5e5b49ec5-0', 'cmpl-801bcde4959d4a86bd3f4f7ce5df741a-0', 'cmpl-37b1364bc95148a9b512f8baee500fa1-0', 'cmpl-417cf722c01a4844ba0dbd89297d1d91-0'], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=46, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.00820318034420564, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, num_corrupted_reqs=0)
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] EngineCore encountered a fatal error.
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] Traceback (most recent call last):
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] result = get_response(w, dequeue_timeout)
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 226, in get_response
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] status, result = w.worker_response_mq.dequeue(
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 507, in dequeue
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] with self.acquire_read(timeout, cancel) as buf:
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] return next(self.gen)
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 469, in acquire_read
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] raise TimeoutError
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] TimeoutError
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702]
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] The above exception was the direct cause of the following exception:
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702]
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] Traceback (most recent call last):
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 693, in run_engine_core
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] engine_core.run_busy_loop()
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 720, in run_busy_loop
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] self._process_engine_step()
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 745, in _process_engine_step
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] outputs, model_executed = self.step_fn()
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 288, in step
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 274, in execute_model_with_error_logging
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] raise err
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 265, in execute_model_with_error_logging
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] return model_fn(scheduler_output)
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 173, in execute_model
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] (output, ) = self.collective_rpc(
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 249, in collective_rpc
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] raise TimeoutError(f"RPC call to {method} timed out.") from e
(EngineCore_0 pid=271) ERROR 08-25 12:29:17 [core.py:702] TimeoutError: RPC call to execute_model timed out.
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] AsyncLLM output_handler failed.
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] Traceback (most recent call last):
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 389, in output_handler
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] outputs = await engine_core.get_output_async()
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 843, in get_output_async
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] raise self._format_exception(outputs) from None
(APIServer pid=1) ERROR 08-25 12:29:17 [async_llm.py:430] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_0 pid=271) Process EngineCore_0:
(EngineCore_0 pid=271) Traceback (most recent call last):
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
(EngineCore_0 pid=271) result = get_response(w, dequeue_timeout)
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 226, in get_response
(EngineCore_0 pid=271) status, result = w.worker_response_mq.dequeue(
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 507, in dequeue
(EngineCore_0 pid=271) with self.acquire_read(timeout, cancel) as buf:
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(EngineCore_0 pid=271) return next(self.gen)
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 469, in acquire_read
(EngineCore_0 pid=271) raise TimeoutError
(EngineCore_0 pid=271) TimeoutError
(EngineCore_0 pid=271)
(EngineCore_0 pid=271) The above exception was the direct cause of the following exception:
(EngineCore_0 pid=271)
(EngineCore_0 pid=271) Traceback (most recent call last):
(EngineCore_0 pid=271) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=271) self.run()
(EngineCore_0 pid=271) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=271) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
(EngineCore_0 pid=271) raise e
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 693, in run_engine_core
(EngineCore_0 pid=271) engine_core.run_busy_loop()
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 720, in run_busy_loop
(EngineCore_0 pid=271) self._process_engine_step()
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 745, in _process_engine_step
(EngineCore_0 pid=271) outputs, model_executed = self.step_fn()
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 288, in step
(EngineCore_0 pid=271) model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 274, in execute_model_with_error_logging
(EngineCore_0 pid=271) raise err
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 265, in execute_model_with_error_logging
(EngineCore_0 pid=271) return model_fn(scheduler_output)
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 173, in execute_model
(EngineCore_0 pid=271) (output, ) = self.collective_rpc(
(EngineCore_0 pid=271) ^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=271) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 249, in collective_rpc
(EngineCore_0 pid=271) raise TimeoutError(f"RPC call to {method} timed out.") from e
(EngineCore_0 pid=271) TimeoutError: RPC call to execute_model timed out.
(APIServer pid=1) INFO: Shutting down
(APIServer pid=1) INFO: Waiting for application shutdown.
(APIServer pid=1) INFO: Application shutdown complete.
(APIServer pid=1) INFO: Finished server process [1]
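For reference, the error output above suggests passing CUDA_LAUNCH_BLOCKING=1 to localize the faulting kernel. An untested sketch of the same docker run with that variable added (everything else unchanged from the command above):

# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the stack trace
# points at the actual faulting call; this slows the server and is debug-only.
sudo docker run -d \
  --gpus all \
  --name vllm-8b-bf16-b200 \
  -p 8000:8000 \
  --ipc=host \
  -e HF_TOKEN=... \
  -e CUDA_LAUNCH_BLOCKING=1 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 8 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000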