
[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred requests: "vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already." #6421

@ZHJ19970917

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-73-generic-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA A100-PCIE-40GB
Nvidia driver version: 535.129.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          80
On-line CPU(s) list:             0-79
Vendor ID:                       GenuineIntel
Model name:                      Intel Xeon Processor (Skylake, IBRS)
CPU family:                      6
Model:                           85
Thread(s) per core:              2
Core(s) per socket:              20
Socket(s):                       2
Stepping:                        4
BogoMIPS:                        5986.22
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ibrs ibpb fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat
L1d cache:                       2.5 MiB (80 instances)
L1i cache:                       2.5 MiB (80 instances)
L2 cache:                        160 MiB (40 instances)
L3 cache:                        32 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0-39
NUMA node1 CPU(s):               40-79
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Mitigation; IBRS
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

Versions of relevant libraries:
[pip3] numpy==1.26.4
[conda] numpy                     1.26.4                   pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-79    0-1             N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

frame #27: _PyEval_EvalFrameDefault + 0x53d6 (0x4f34c6 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #28: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #29: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #31: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #32: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #34: _PyFunction_Vectorcall + 0x6f (0x4fe0cf in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #35: _PyObject_FastCallDictTstate + 0x17d (0x4f681d in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #36: _PyObject_Call_Prepend + 0x66 (0x507f36 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #37: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5cf883]
frame #38: _PyObject_MakeTpCall + 0x25b (0x4f741b in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x5757 (0x4f3847 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #40: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #41: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #42: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #43: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #44: _PyObject_FastCallDictTstate + 0xcd (0x4f676d in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #45: _PyObject_Call_Prepend + 0xe0 (0x507fb0 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #46: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5cf883]
frame #47: _PyObject_MakeTpCall + 0x25b (0x4f741b in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #48: _PyEval_EvalFrameDefault + 0x4dde (0x4f2ece in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #49: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #50: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #51: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x509b26]
frame #52: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #53: _PyObject_FastCallDictTstate + 0xcd (0x4f676d in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #54: _PyObject_Call_Prepend + 0x66 (0x507f36 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #55: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5cf883]
frame #56: _PyObject_MakeTpCall + 0x25b (0x4f741b in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #57: _PyEval_EvalFrameDefault + 0x53d6 (0x4f34c6 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #58: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #59: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #60: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #61: /root/autodl-tmp/miniconda3/envs/llm/bin/python() [0x5099ce]
frame #62: PyObject_Call + 0xb8 (0x50a508 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)
frame #63: _PyEval_EvalFrameDefault + 0x2b79 (0x4f0c69 in /root/autodl-tmp/miniconda3/envs/llm/bin/python)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in call
return await self.app(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in call
await self.app(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 758, in call
await self.middleware_stack(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/api/app.py", line 85, in create_chat_completion
return await create_chat_completion_response(request, chat_model)
File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/api/chat.py", line 132, in create_chat_completion_response
responses = await chat_model.achat(
File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 56, in achat
return await self.engine.chat(messages, system, tools, image, **input_kwargs)
File "/root/autodl-tmp/apps/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 178, in chat
async for request_output in generator:
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 662, in generate
async for output in self._process_request(
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 756, in _process_request
stream = await self.add_request(
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 561, in add_request
self.start_background_loop()
File "/root/autodl-tmp/miniconda3/envs/llm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 431, in start_background_loop
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
INFO: 127.0.0.1:51326 - "GET /.env HTTP/1.1" 404 Not Found
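
For reference, a minimal load-driver sketch of the kind of multi-threaded access the title describes. This is not from the report: the endpoint path, port, payload shape, and worker count are assumptions based on LLaMA-Factory's OpenAI-style API; only the model name is taken from the title.

```python
# Hypothetical load driver (not from the report). Assumes the LLaMA-Factory
# server exposes an OpenAI-compatible /v1/chat/completions endpoint on
# localhost:8000; the model name is taken from the issue title.
import concurrent.futures

import requests

URL = "http://127.0.0.1:8000/v1/chat/completions"  # assumed endpoint/port
PAYLOAD = {
    "model": "qwen-32b-chat-awq",
    "messages": [{"role": "user", "content": "Hello"}],
}

def one_request(i: int) -> int:
    # Each worker thread posts one chat request and reports the HTTP status.
    resp = requests.post(URL, json=PAYLOAD, timeout=120)
    return resp.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    for i, status in enumerate(pool.map(one_request, range(1000))):
        if status != 200:
            # Once the background loop dies, every request starts failing.
            print(f"request {i} failed with HTTP {status}")
            break
```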
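The traceback shows that once the engine's background loop has errored, every subsequent `add_request` call raises `AsyncEngineDeadError`. A hedged sketch (not LLaMA-Factory's actual code; the route and `run_inference` helper are hypothetical) of catching it at the route level so clients see a 503 and back off instead of receiving repeated raw 500s:

```python
# Hedged sketch, not LLaMA-Factory code: surface the dead-engine state as a
# 503 so callers stop retrying until the server process is restarted.
from fastapi import FastAPI, HTTPException
from vllm.engine.async_llm_engine import AsyncEngineDeadError

app = FastAPI()

async def run_inference(request: dict) -> dict:
    ...  # placeholder for the real chat-completion path (hypothetical)

@app.post("/v1/chat/completions")
async def create_chat_completion(request: dict):
    try:
        return await run_inference(request)
    except AsyncEngineDeadError:
        # After the background loop has errored, every later add_request()
        # raises this; only restarting the engine/server recovers it.
        raise HTTPException(status_code=503, detail="vLLM engine is dead")
```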
