
[Bug]: AgentWorkflow problem with VLLM initialization #18519


Open
mahmoudmohey97 opened this issue Apr 24, 2025 · 5 comments

Labels: bug, triage

Comments


mahmoudmohey97 commented Apr 24, 2025

Bug Description

I was trying to create an agent with AgentWorkflow.from_tools_or_functions using an LLM initialized from Vllm.
I used the code from the llama-index docs to initialize the model with Vllm.
When I use HuggingFaceLLM, this problem doesn't happen.
vLLM version: 0.8.4

Version

0.12.31

Steps to Reproduce

import asyncio
import time

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llama_index.llms.vllm import Vllm

# Local vLLM model, initialized as in the llama-index docs
model = Vllm(
    model="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",
    dtype="float16",
    tensor_parallel_size=1,
    max_new_tokens=5000,
)

def summarize(text: str):
    """Summarize Document"""
    return "Working"

workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[
        FunctionTool.from_defaults(
            summarize,
            name="summarization",
            description="Tool for document summarization.",
        )
    ],
    llm=model,
    verbose=True,
)

async def main():
    prompt_doc = "summarize the following: {0}"
    response = await workflow.run(prompt_doc.format("any text to summarize here"))
    print(response)

if __name__ == "__main__":
    begin = time.time()
    asyncio.run(main())
    end = time.time()
    # total time taken
    print(f"Total runtime of whole program is {end - begin}")

Relevant Logs/Tracebacks

Exception in callback Dispatcher.span.<locals>.wrapper.<locals>.handle_future_result(span_id='Workflow.run...-38c79edf3cf1', bound_args=<BoundArgumen...t_history')})>, instance=<llama_index....x7ff8c7512590>, context=<_contextvars...x7ff8a05cc6c0>)(<WorkflowHand...Implemented")>) at /home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:274
handle: <Handle Dispatcher.span.<locals>.wrapper.<locals>.handle_future_result(span_id='Workflow.run...-38c79edf3cf1', bound_args=<BoundArgumen...t_history')})>, instance=<llama_index....x7ff8c7512590>, context=<_contextvars...x7ff8a05cc6c0>)(<WorkflowHand...Implemented")>) at /home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:274>
Traceback (most recent call last):
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 583, in _step_worker
    new_ev = await instrumented_step(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/multi_agent_workflow.py", line 382, in run_agent_step
    agent_output = await agent.take_step(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/react_agent.py", line 95, in take_step
    response = await self.llm.astream_chat(input_chat)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 75, in wrapped_async_llm_chat
    f_return_val = await f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/llms/vllm/base.py", line 311, in astream_chat
    raise (ValueError("Not Implemented"))
ValueError: Not Implemented

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 286, in handle_future_result
    raise exception
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/workflow.py", line 394, in _run_workflow
    raise exception_raised
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 592, in _step_worker
    raise WorkflowRuntimeError(
llama_index.core.workflow.errors.WorkflowRuntimeError: Error in step 'run_agent_step': Not Implemented
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 583, in _step_worker
[rank0]:     new_ev = await instrumented_step(**kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
[rank0]:     result = await func(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/multi_agent_workflow.py", line 382, in run_agent_step
[rank0]:     agent_output = await agent.take_step(
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/react_agent.py", line 95, in take_step
[rank0]:     response = await self.llm.astream_chat(input_chat)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
[rank0]:     result = await func(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 75, in wrapped_async_llm_chat
[rank0]:     f_return_val = await f(_self, messages, **kwargs)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/llms/vllm/base.py", line 311, in astream_chat
[rank0]:     raise (ValueError("Not Implemented"))
[rank0]: ValueError: Not Implemented

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/mahmoud.hassanain/llama-index/vllm/appllama.py", line 162, in <module>
[rank0]:     asyncio.run(main())
[rank0]:   File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/runners.py", line 190, in run
[rank0]:     return runner.run(main)
[rank0]:            ^^^^^^^^^^^^^^^^
[rank0]:   File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/runners.py", line 118, in run
[rank0]:     return self._loop.run_until_complete(task)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
[rank0]:     return future.result()
[rank0]:            ^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/llama-index/vllm/appllama.py", line 141, in main
[rank0]:     response = await workflow.run(query4, memory=memory)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/workflow.py", line 394, in _run_workflow
[rank0]:     raise exception_raised
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 592, in _step_worker
[rank0]:     raise WorkflowRuntimeError(
[rank0]: llama_index.core.workflow.errors.WorkflowRuntimeError: Error in step 'run_agent_step': Not Implemented
[rank0]:[W424 13:11:00.727119009 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
mahmoudmohey97 added the bug and triage labels Apr 24, 2025

dosubot bot commented Apr 24, 2025

Hello @mahmoudmohey97! I'm Dosu, here to help you solve bugs, answer questions, and become a contributor while you wait for a human maintainer. Let’s tackle this issue together!

To resolve the "Not Implemented" error when using VLLM with the AgentWorkflow in LlamaIndex, you need to avoid using methods that are not implemented in the Vllm class. Specifically, the astream_chat method is not implemented and will raise this error if called. Instead, you should use the chat, complete, achat, or acomplete methods, which are implemented and should work without raising the error.

Here's an example of how you can use the complete method with VLLM:

from llama_index.llms.vllm import Vllm

# specific functions to format for mistral instruct
def messages_to_prompt(messages):
    prompt = "\n".join([str(x) for x in messages])
    return f"<s>[INST] {prompt} [/INST] </s>\n"

def completion_to_prompt(completion):
    return f"<s>[INST] {completion} [/INST] </s>\n"

llm = Vllm(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    tensor_parallel_size=4,
    max_new_tokens=256,
    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)

response = llm.complete("What is a black hole?")
print(response.text)

Make sure to adjust your workflow to use these methods instead of the ones that are not implemented [1].
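As a quick sanity check, here is a sketch (using the `model` Vllm instance from the repro above; nothing in it is specific to AgentWorkflow) showing that the error can be reproduced by calling astream_chat directly:

import asyncio

from llama_index.core.llms import ChatMessage

async def check():
    try:
        # Vllm.astream_chat raises as soon as it is awaited
        await model.astream_chat([ChatMessage(role="user", content="hi")])
    except ValueError as err:
        print(err)  # prints "Not Implemented", matching the traceback above

asyncio.run(check())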


logan-markewich (Collaborator) commented:

@mahmoudmohey97 astream_chat() is not implemented on Vllm -- maybe just deploy a vLLM server and use OpenAILike?

mahmoudmohey97 (Author) commented:

@logan-markewich you mean like the following:

from llama_index.llms.vllm import Vllm
from llama_index.llms.vllm import VllmServer
from llama_index.llms.openai_like import OpenAILike

_model = VllmServer(
    model="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",
    api_url="http://localhost:8000",
    max_new_tokens=2000,
    dtype="float16",
)
llm = OpenAILike(
    model="Qwen2.5-7B-Instruct-bnb-4bit",
    api_base="http://localhost:8000",
    api_key="fake",
    is_chat_model=True,
    is_function_calling_model=True,
)

Then use the llm variable in AgentWorkflow?

logan-markewich (Collaborator) commented Apr 25, 2025

@mahmoudmohey97 yea, except don't use the VllmServer class at all -- I meant just launch the server from the CLI and then use OpenAILike to connect 👍🏻
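For anyone landing here later, a minimal sketch of that setup (the served model name, port, and the /v1 suffix on api_base are assumptions about a locally hosted vLLM OpenAI-compatible server, not settings verified in this thread):

# 1) Launch vLLM's OpenAI-compatible server from the CLI, e.g.:
#    vllm serve unsloth/Qwen2.5-7B-Instruct-bnb-4bit --dtype float16 --port 8000

# 2) Point OpenAILike at it and hand that LLM to AgentWorkflow:
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",  # must match the name the server reports
    api_base="http://localhost:8000/v1",           # vLLM serves the OpenAI-compatible API under /v1
    api_key="fake",                                # any value works unless the server sets --api-key
    is_chat_model=True,
    is_function_calling_model=True,
)

workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[FunctionTool.from_defaults(summarize, name="summarization")],  # summarize from the repro above
    llm=llm,
    verbose=True,
)

Since OpenAILike streams chat over the OpenAI-compatible API, the astream_chat path that raises "Not Implemented" in the local Vllm class is no longer involved; the earlier traceback shows that path being hit via react_agent.py.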

mahmoudmohey97 (Author) commented:

@logan-markewich Thank you Logan, it's working now, but I have a question: do you know which tool call parser to select when using Qwen2.5?
I did some research and found that some people use hermes.
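For context on that last question: vLLM's OpenAI-compatible server selects its tool-call parser via CLI flags, and its tool-calling docs list hermes as the parser used with Qwen models. A sketch of how that would be enabled (whether this exact flag set is right for the bnb-4bit Qwen2.5 checkpoint is an assumption, not confirmed in this thread):

vllm serve unsloth/Qwen2.5-7B-Instruct-bnb-4bit \
    --dtype float16 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes

With is_function_calling_model=True on the OpenAILike side, tool calls then go through the server's parser instead of the ReAct text format.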
