AsyncEngineDeadError when LoRA loading fails #3310

@lifuhuang

Description

Error: when a client requests a LoRA model that cannot be loaded, AsyncLLMEngine crashes with AsyncEngineDeadError and the client's HTTP session hangs indefinitely.

Expected Behavior: vLLM should either reject unloadable LoRA adapters during the init phase, so that users never run into this error, OR return a 500 error immediately.
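
Repro (minimal sketch, assuming the Python AsyncLLMEngine API; the base model name and adapter path are placeholders of my own, and the adapter is one trained with rank 256 while the engine keeps the default max_lora_rank of 16):

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.lora.request import LoRARequest


async def main():
    # Engine started with the default max_lora_rank (16).
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(
            model="meta-llama/Llama-2-7b-hf",  # placeholder base model
            enable_lora=True,
            max_lora_rank=16,
        ))

    # Adapter trained with rank 256: the adapter is only loaded when the
    # first request referencing it is scheduled, so the failure happens in
    # the background engine loop, not at startup.
    lora = LoRARequest("rank256-adapter", 1, "/path/to/rank-256-adapter")

    async for output in engine.generate(
            "Hello",
            SamplingParams(max_tokens=16),
            request_id="repro-1",
            lora_request=lora):
        print(output)


asyncio.run(main())
```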

Stacktrace:

Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f3abae836d0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f3ab0fef550>)>
Traceback (most recent call last):
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
    task.result()
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 393, in engine_step
    request_outputs = await self.engine.step_async()
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 189, in step_async
    all_outputs = await self._run_workers_async(
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
  File "/home/<user>/Repos/hello-vllm/.conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "<packages_root>/vllm/worker/worker.py", line 223, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "<packages_root>/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "<packages_root>/vllm/worker/model_runner.py", line 574, in execute_model
    self.set_active_loras(lora_requests, lora_mapping)
  File "<packages_root>/vllm/worker/model_runner.py", line 660, in set_active_loras
    self.lora_manager.set_active_loras(lora_requests, lora_mapping)
  File "<packages_root>/vllm/lora/worker_manager.py", line 112, in set_active_loras
    self._apply_loras(lora_requests)
  File "<packages_root>/vllm/lora/worker_manager.py", line 224, in _apply_loras
    self.add_lora(lora)
  File "<packages_root>/vllm/lora/worker_manager.py", line 231, in add_lora
    lora = self._load_lora(lora_request)
  File "<packages_root>/vllm/lora/worker_manager.py", line 153, in _load_lora
    raise ValueError(
ValueError: LoRA rank 256 is greater than max_lora_rank 16.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    raise exc
  File "<packages_root>/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

The repro is based on a larger-than-expected LoRA rank, but I suppose any error raised from the background loop would trigger the same unexpected behavior.

I was thinking about sending a PR to propagate the error from the background loop to the HTTP response, but I would love to confirm whether this would be the ideal solution, or whether you have better suggestions for how this should be fixed.
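
For illustration, here is a rough sketch of the contract I have in mind at the API layer (not vLLM's actual server code; it assumes the engine surfaces the per-request failure to the result generator, which is exactly the propagation proposed above; the FastAPI endpoint, base model, and request schema are placeholders of my own):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncEngineDeadError, AsyncLLMEngine
from vllm.lora.request import LoRARequest

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="meta-llama/Llama-2-7b-hf", enable_lora=True))


class GenerateRequest(BaseModel):
    prompt: str
    lora_name: str
    lora_path: str


@app.post("/generate")
async def generate(req: GenerateRequest):
    lora = LoRARequest(req.lora_name, 1, req.lora_path)
    try:
        final = None
        # If the adapter cannot be loaded, the failure should surface here
        # instead of killing the engine loop and leaving the client hanging.
        async for out in engine.generate(
                req.prompt,
                SamplingParams(max_tokens=16),
                request_id=req.lora_name,
                lora_request=lora):
            final = out
        return {"text": [o.text for o in final.outputs]}
    except (AsyncEngineDeadError, ValueError) as e:
        # Turn the background-loop error into an immediate 500 response.
        raise HTTPException(status_code=500, detail=str(e)) from e
```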
