
[Bug]: AgentWorkflow problem with VLLM initialization #18519


Open
mahmoudmohey97 opened this issue Apr 24, 2025 · 5 comments

Labels: bug, triage

Comments


mahmoudmohey97 commented Apr 24, 2025

Bug Description

I was trying to create an agent with AgentWorkflow.from_tools_or_functions using an LLM initialized from Vllm.
I used the code from the llama-index docs to initialize the model with Vllm.
When I use HuggingFaceLLM, this problem doesn't happen.
vLLM version: 0.8.4

Version

0.12.31

Steps to Reproduce

import asyncio
import time

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llama_index.llms.vllm import Vllm

# Local vLLM model, initialized as in the llama-index docs
model = Vllm(
    model="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",
    dtype="float16",
    tensor_parallel_size=1,
    max_new_tokens=5000,
)

def summarize(text: str):
    """Summarize Document"""
    return "Working"

workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[
        FunctionTool.from_defaults(
            summarize,
            name="summarization",
            description="Tool for document summarization.",
        )
    ],
    llm=model,
    verbose=True,
)

async def main():
    prompt_doc = "summarize the following: {0}"
    response = await workflow.run(prompt_doc.format("any text to summarize here"))
    print(response)

if __name__ == "__main__":
    begin = time.time()
    asyncio.run(main())
    end = time.time()
    # total time taken
    print(f"Total runtime of whole program is {end - begin}")

Relevant Logs/Tracebacks

Exception in callback Dispatcher.span.<locals>.wrapper.<locals>.handle_future_result(span_id='Workflow.run...-38c79edf3cf1', bound_args=<BoundArgumen...t_history')})>, instance=<llama_index....x7ff8c7512590>, context=<_contextvars...x7ff8a05cc6c0>)(<WorkflowHand...Implemented")>) at /home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:274
handle: <Handle Dispatcher.span.<locals>.wrapper.<locals>.handle_future_result(span_id='Workflow.run...-38c79edf3cf1', bound_args=<BoundArgumen...t_history')})>, instance=<llama_index....x7ff8c7512590>, context=<_contextvars...x7ff8a05cc6c0>)(<WorkflowHand...Implemented")>) at /home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:274>
Traceback (most recent call last):
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 583, in _step_worker
    new_ev = await instrumented_step(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/multi_agent_workflow.py", line 382, in run_agent_step
    agent_output = await agent.take_step(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/react_agent.py", line 95, in take_step
    response = await self.llm.astream_chat(input_chat)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 75, in wrapped_async_llm_chat
    f_return_val = await f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/llms/vllm/base.py", line 311, in astream_chat
    raise (ValueError("Not Implemented"))
ValueError: Not Implemented

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 286, in handle_future_result
    raise exception
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/workflow.py", line 394, in _run_workflow
    raise exception_raised
  File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 592, in _step_worker
    raise WorkflowRuntimeError(
llama_index.core.workflow.errors.WorkflowRuntimeError: Error in step 'run_agent_step': Not Implemented
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 583, in _step_worker
[rank0]:     new_ev = await instrumented_step(**kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
[rank0]:     result = await func(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/multi_agent_workflow.py", line 382, in run_agent_step
[rank0]:     agent_output = await agent.take_step(
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/agent/workflow/react_agent.py", line 95, in take_step
[rank0]:     response = await self.llm.astream_chat(input_chat)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 368, in async_wrapper
[rank0]:     result = await func(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 75, in wrapped_async_llm_chat
[rank0]:     f_return_val = await f(_self, messages, **kwargs)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/llms/vllm/base.py", line 311, in astream_chat
[rank0]:     raise (ValueError("Not Implemented"))
[rank0]: ValueError: Not Implemented

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/mahmoud.hassanain/llama-index/vllm/appllama.py", line 162, in <module>
[rank0]:     asyncio.run(main())
[rank0]:   File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/runners.py", line 190, in run
[rank0]:     return runner.run(main)
[rank0]:            ^^^^^^^^^^^^^^^^
[rank0]:   File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/runners.py", line 118, in run
[rank0]:     return self._loop.run_until_complete(task)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cm/shared/ebtree/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
[rank0]:     return future.result()
[rank0]:            ^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/llama-index/vllm/appllama.py", line 141, in main
[rank0]:     response = await workflow.run(query4, memory=memory)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/workflow.py", line 394, in _run_workflow
[rank0]:     raise exception_raised
[rank0]:   File "/home/mahmoud.hassanain/.local/lib/python3.11/site-packages/llama_index/core/workflow/context.py", line 592, in _step_worker
[rank0]:     raise WorkflowRuntimeError(
[rank0]: llama_index.core.workflow.errors.WorkflowRuntimeError: Error in step 'run_agent_step': Not Implemented
[rank0]:[W424 13:11:00.727119009 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
mahmoudmohey97 added the bug and triage labels Apr 24, 2025

dosubot bot commented Apr 24, 2025

Hello @mahmoudmohey97! I'm Dosu, here to help you solve bugs, answer questions, and become a contributor while you wait for a human maintainer. Let’s tackle this issue together!

To resolve the "Not Implemented" error when using VLLM with the AgentWorkflow in LlamaIndex, you need to avoid using methods that are not implemented in the Vllm class. Specifically, the astream_chat method is not implemented and will raise this error if called. Instead, you should use the chat, complete, achat, or acomplete methods, which are implemented and should work without raising the error.

Here's an example of how you can use the complete method with VLLM:

from llama_index.llms.vllm import Vllm

# specific functions to format for mistral instruct
def messages_to_prompt(messages):
    prompt = "\n".join([str(x) for x in messages])
    return f"<s>[INST] {prompt} [/INST] </s>\n"

def completion_to_prompt(completion):
    return f"<s>[INST] {completion} [/INST] </s>\n"

llm = Vllm(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    tensor_parallel_size=4,
    max_new_tokens=256,
    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)

response = llm.complete("What is a black hole?")
print(response.text)

Make sure to adjust your workflow to use these methods instead of the ones that are not implemented [1].
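As a quick sanity check, here is a sketch (using the `model` Vllm instance from the repro above; nothing in it is specific to AgentWorkflow) showing that the error can be reproduced by calling astream_chat directly:

import asyncio

from llama_index.core.llms import ChatMessage

async def check():
    try:
        # Vllm.astream_chat raises as soon as it is awaited
        await model.astream_chat([ChatMessage(role="user", content="hi")])
    except ValueError as err:
        print(err)  # prints "Not Implemented", matching the traceback above

asyncio.run(check())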


logan-markewich (Collaborator) commented:

@mahmoudmohey97 astream_chat() is not implemented on Vllm -- maybe just deploy a vLLM server and use OpenAILike?

mahmoudmohey97 (Author) commented:

@logan-markewich you mean like the following:

from llama_index.llms.vllm import Vllm
from llama_index.llms.vllm import VllmServer
from llama_index.llms.openai_like import OpenAILike

_model = VllmServer(
    model="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",
    api_url="http://localhost:8000",
    max_new_tokens=2000,
    dtype="float16",
)
llm = OpenAILike(
    model="Qwen2.5-7B-Instruct-bnb-4bit",
    api_base="http://localhost:8000",
    api_key="fake",
    is_chat_model=True,
    is_function_calling_model=True,
)

Then use the llm variable in AgentWorkflow?

logan-markewich (Collaborator) commented Apr 25, 2025

@mahmoudmohey97 yea, except don't use the VllmServer class at all -- I meant just launch the server from the CLI and then use OpenAILike to connect 👍🏻
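For anyone landing here later, a minimal sketch of that setup (the served model name, port, and the /v1 suffix on api_base are assumptions about a locally hosted vLLM OpenAI-compatible server, not settings verified in this thread):

# 1) Launch vLLM's OpenAI-compatible server from the CLI, e.g.:
#    vllm serve unsloth/Qwen2.5-7B-Instruct-bnb-4bit --dtype float16 --port 8000

# 2) Point OpenAILike at it and hand that LLM to AgentWorkflow:
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",  # must match the name the server reports
    api_base="http://localhost:8000/v1",           # vLLM serves the OpenAI-compatible API under /v1
    api_key="fake",                                # any value works unless the server sets --api-key
    is_chat_model=True,
    is_function_calling_model=True,
)

workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[FunctionTool.from_defaults(summarize, name="summarization")],  # summarize from the repro above
    llm=llm,
    verbose=True,
)

Since OpenAILike streams chat over the OpenAI-compatible API, the astream_chat path that raises "Not Implemented" in the local Vllm class is no longer involved; the earlier traceback shows that path being hit via react_agent.py.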

mahmoudmohey97 (Author) commented:

@logan-markewich Thank you Logan, it's working now, but I have a question: do you know which tool call parser to select when using Qwen2.5?
I did some research and found that some people use hermes.
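For context on that last question: vLLM's OpenAI-compatible server selects its tool-call parser via CLI flags, and its tool-calling docs list hermes as the parser used with Qwen models. A sketch of how that would be enabled (whether this exact flag set is right for the bnb-4bit Qwen2.5 checkpoint is an assumption, not confirmed in this thread):

vllm serve unsloth/Qwen2.5-7B-Instruct-bnb-4bit \
    --dtype float16 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes

With is_function_calling_model=True on the OpenAILike side, tool calls then go through the server's parser instead of the ReAct text format.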
