@@ -68,6 +68,11 @@
BuiltinTool,
ToolCall,
)
from llama_stack.providers.utils.inference.openai_compat import (
convert_message_to_openai_dict,
convert_openai_chat_completion_stream,
convert_tooldef_to_openai_tool,
)
from llama_stack.providers.utils.kvstore import KVStore
from llama_stack.providers.utils.telemetry import tracing

@@ -510,16 +515,60 @@ async def _run(
async with tracing.span("inference") as span:
if self.agent_config.name:
span.set_attribute("agent_name", self.agent_config.name)
async for chunk in await self.inference_api.chat_completion(
self.agent_config.model,
input_messages,
tools=self.tool_defs,
tool_prompt_format=self.agent_config.tool_config.tool_prompt_format,
response_format=self.agent_config.response_format,
# Convert messages to OpenAI format
openai_messages = []
for message in input_messages:
Contributor:

I think if we are going to convert to OpenAI responses, we should also shift our inputs to match, so we don't need to convert anything with the openai_compat helpers.

You can look at the inference API for how it handles this with things like openai_chat_completion.

openai_message = await convert_message_to_openai_dict(message)
openai_messages.append(openai_message)
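
For illustration, a rough sketch of the alternative the first review comment above describes: if the agent already carried OpenAI-format messages and tool definitions, the conversion loop would not be needed and the inputs could be passed straight through. This is a hypothetical sketch, not code from this PR, and it assumes the inputs are already OpenAI-shaped:

```python
# Hypothetical sketch (not part of this PR): assume input_messages and
# self.tool_defs already hold OpenAI-format messages and tool specs, so the
# convert_message_to_openai_dict / convert_tooldef_to_openai_tool steps vanish.
openai_stream = await self.inference_api.openai_chat_completion(
    model=self.agent_config.model,
    messages=input_messages,       # already OpenAI chat messages in this sketch
    tools=self.tool_defs or None,  # already OpenAI tool definitions in this sketch
    stream=True,
)
```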

# Convert tool definitions to OpenAI format
openai_tools = None
if self.tool_defs:
openai_tools = []
Contributor:

nit: more compact to write as:

openai_tools = [convert...(x) for x in (self.tool_defs or [])]
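
Spelled out with the helper used in this diff, that one-liner would read roughly as below. The comprehension yields an empty list rather than None when there are no tools, but the call further down already maps that back to None via `openai_tools if openai_tools else None`:

```python
# Rough expansion of the reviewer's suggestion, using the helper imported above.
openai_tools = [convert_tooldef_to_openai_tool(x) for x in (self.tool_defs or [])]
```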

for tool_def in self.tool_defs:
openai_tool = convert_tooldef_to_openai_tool(tool_def)
Contributor:

I could be wrong, but I think we should avoid converting any of the inputs and instead rely on OpenAI-native inputs? I will leave this up to others, though, since I don't have too much context here.

Contributor:

I think this is a good first step; when changing things, we can change them piecemeal and then ripple the changes outwards and do what you suggest. Once the changes ripple outwards, you are also going to make things backwards-incompatible, and that is something we have slated for 0.3.0.

@aakankshaduggal this is great and looks nearly ready, perhaps? Have you run the tests?

Contributor:

OK, that makes sense. We can start with this approach as long as there is follow-up work planned for those changes!

Contributor Author:

Thanks @cdoern and @ashwinb! I feel that after this is merged we can think about the larger refactor of the agents API.
I did run the tests locally and they succeed.

openai_tools.append(openai_tool)

# Extract tool_choice from tool_config for OpenAI compatibility
# Note: tool_choice can only be provided when tools are also provided
tool_choice = None
if openai_tools and self.agent_config.tool_config and self.agent_config.tool_config.tool_choice:
tool_choice = (
self.agent_config.tool_config.tool_choice.value
if hasattr(self.agent_config.tool_config.tool_choice, "value")
else str(self.agent_config.tool_config.tool_choice)
)

# Convert sampling params to OpenAI format (temperature, top_p, max_tokens)
temperature = None
top_p = None
max_tokens = None
if sampling_params:
if hasattr(sampling_params.strategy, "temperature"):
temperature = sampling_params.strategy.temperature
if hasattr(sampling_params.strategy, "top_p"):
top_p = sampling_params.strategy.top_p
if sampling_params.max_tokens:
max_tokens = sampling_params.max_tokens

# Use OpenAI chat completion
openai_stream = await self.inference_api.openai_chat_completion(
model=self.agent_config.model,
messages=openai_messages,
tools=openai_tools if openai_tools else None,
tool_choice=tool_choice,
temperature=temperature,
top_p=top_p,
max_tokens=max_tokens,
stream=True,
sampling_params=sampling_params,
tool_config=self.agent_config.tool_config,
):
)

# Convert OpenAI stream back to Llama Stack format
response_stream = convert_openai_chat_completion_stream(
Contributor (@cdoern, Aug 11, 2025):

Yeah, same here. I wonder if we should return the OpenAI format rather than the Llama Stack format. It looks like we are taking Llama Stack input, converting it to OpenAI format, making the request, and then converting back to Llama Stack format. To do a proper refactor here, I think we should depend on the OpenAI types.

WDYT about this, @aakankshaduggal? I am open to anything here.

openai_stream, enable_incremental_tool_calls=True
)
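
For illustration, a rough sketch of the direction that comment points at: consuming the OpenAI-format stream directly instead of converting it back to Llama Stack events. The chunk attributes follow the standard OpenAI chat-completion streaming shape; this is not code from this PR:

```python
# Hypothetical sketch (not part of this PR): drop convert_openai_chat_completion_stream
# and iterate over the OpenAI-format chunks directly.
async for openai_chunk in openai_stream:
    delta = openai_chunk.choices[0].delta
    if delta.content:
        ...  # emit assistant text deltas in OpenAI terms
    if delta.tool_calls:
        ...  # accumulate incremental tool-call name/argument fragments
```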

async for chunk in response_stream:
event = chunk.event
if event.event_type == ChatCompletionResponseEventType.start:
continue