refactor(agents): migrate to OpenAI chat completions API #3097

@@ -68,6 +68,11 @@
     BuiltinTool,
     ToolCall,
 )
+from llama_stack.providers.utils.inference.openai_compat import (
+    convert_message_to_openai_dict,
+    convert_openai_chat_completion_stream,
+    convert_tooldef_to_openai_tool,
+)
 from llama_stack.providers.utils.kvstore import KVStore
 from llama_stack.providers.utils.telemetry import tracing
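
For context, the helpers imported above translate between Llama Stack's message/tool types and plain OpenAI chat-completions structures. A minimal sketch of the target shapes, assuming the standard OpenAI chat format (the exact fields produced by these helpers may differ slightly, and the literals are illustrative, not taken from the PR):

```python
# Hedged sketch: approximate OpenAI-format structures the conversion helpers target.
openai_message = {
    "role": "user",                      # "system" / "user" / "assistant" / "tool"
    "content": "What's the weather in Paris?",
}

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",           # hypothetical tool name
        "description": "Look up current weather for a city",
        "parameters": {                  # JSON Schema for the tool's arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```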

@@ -510,16 +515,60 @@ async def _run(
         async with tracing.span("inference") as span:
             if self.agent_config.name:
                 span.set_attribute("agent_name", self.agent_config.name)
-            async for chunk in await self.inference_api.chat_completion(
-                self.agent_config.model,
-                input_messages,
-                tools=self.tool_defs,
-                tool_prompt_format=self.agent_config.tool_config.tool_prompt_format,
-                response_format=self.agent_config.response_format,
-                stream=True,
-                sampling_params=sampling_params,
-                tool_config=self.agent_config.tool_config,
-            ):
+            # Convert messages to OpenAI format
+            openai_messages = []
+            for message in input_messages:
+                openai_message = await convert_message_to_openai_dict(message)
+                openai_messages.append(openai_message)
+
+            # Convert tool definitions to OpenAI format
+            openai_tools = None
+            if self.tool_defs:
+                openai_tools = []
+                for tool_def in self.tool_defs:
+                    openai_tool = convert_tooldef_to_openai_tool(tool_def)
+                    openai_tools.append(openai_tool)

Review comment (on the tool-conversion loop):

> nit: compact to write as:
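
The suggested snippet is not captured above; presumably the nit asks for a list comprehension along these lines (an illustrative guess, not the reviewer's exact suggestion):

```python
# Hypothetical compact form of the tool-conversion block (same behavior as the loop above).
openai_tools = (
    [convert_tooldef_to_openai_tool(tool_def) for tool_def in self.tool_defs]
    if self.tool_defs
    else None
)
```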

Review thread (on `convert_tooldef_to_openai_tool`):

> I could be wrong, but I think we should avoid converting any inputs and instead rely on OpenAI-native inputs. I'll leave this up to others, though, since I don't have too much context here.

> I think this is a good first step; when changing things we can change them piecemeal, then ripple the changes outwards and do what you suggest. Once the changes ripple outwards, they will also be backwards incompatible, and that is something we have slated for 0.3.0. @aakankshaduggal this is great and looks nearly ready, perhaps? Have you run the tests?

> OK, makes sense. We can start with this approach as long as there is follow-up work to make those changes.

+
+            # Extract tool_choice from tool_config for OpenAI compatibility
+            # Note: tool_choice can only be provided when tools are also provided
+            tool_choice = None
+            if openai_tools and self.agent_config.tool_config and self.agent_config.tool_config.tool_choice:
+                tool_choice = (
+                    self.agent_config.tool_config.tool_choice.value
+                    if hasattr(self.agent_config.tool_config.tool_choice, "value")
+                    else str(self.agent_config.tool_config.tool_choice)
+                )
+
+            # Convert sampling params to OpenAI format (temperature, top_p, max_tokens)
+            temperature = None
+            top_p = None
+            max_tokens = None
+            if sampling_params:
+                if hasattr(sampling_params.strategy, "temperature"):
+                    temperature = sampling_params.strategy.temperature
+                if hasattr(sampling_params.strategy, "top_p"):
+                    top_p = sampling_params.strategy.top_p
+                if sampling_params.max_tokens:
+                    max_tokens = sampling_params.max_tokens
+
+            # Use OpenAI chat completion
+            openai_stream = await self.inference_api.openai_chat_completion(
+                model=self.agent_config.model,
+                messages=openai_messages,
+                tools=openai_tools if openai_tools else None,
+                tool_choice=tool_choice,
+                temperature=temperature,
+                top_p=top_p,
+                max_tokens=max_tokens,
+                stream=True,
+            )
+
+            # Convert OpenAI stream back to Llama Stack format
+            response_stream = convert_openai_chat_completion_stream(
+                openai_stream, enable_incremental_tool_calls=True
+            )

Review comment (on `convert_openai_chat_completion_stream`):

> Yeah, same here: I wonder if we should return the OpenAI format rather than the Llama Stack format. It looks to me like we are taking Llama Stack input, converting it to OpenAI format, making the request, and then converting back to Llama Stack format. I think a proper refactor here should depend on the OpenAI types. WDYT, @aakankshaduggal? I am open to anything here.
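
To make the round-trip concern concrete, the alternative the reviewer points at would have the agent consume the OpenAI-format stream directly rather than converting it back. A minimal sketch, assuming the standard OpenAI streaming-chunk shape (`choices[0].delta.content` / `.tool_calls`); this is not code from the PR:

```python
# Hedged sketch (not in this PR): consuming OpenAI-style streaming chunks directly,
# instead of converting back to Llama Stack ChatCompletionResponse events.
async def consume_openai_stream(openai_stream) -> str:
    text_parts: list[str] = []
    async for chunk in openai_stream:
        delta = chunk.choices[0].delta
        if delta.content:
            text_parts.append(delta.content)    # incremental assistant text
        if delta.tool_calls:
            for tool_call in delta.tool_calls:  # incremental tool-call deltas
                pass  # accumulate tool_call.function.name / .arguments here
    return "".join(text_parts)
```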

+
+            async for chunk in response_stream:
                 event = chunk.event
                 if event.event_type == ChatCompletionResponseEventType.start:
                     continue
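
As an aside on the sampling-params conversion in the hunk above: the `hasattr` checks exist because only some sampling strategies carry `temperature`/`top_p`. A small sketch of the resulting mapping (the helper name is illustrative, not part of the PR):

```python
# Hedged sketch: flattening Llama Stack SamplingParams into OpenAI-style arguments,
# mirroring the hasattr-based extraction in the diff above.
def to_openai_sampling_args(sampling_params) -> dict:
    return {
        "temperature": getattr(sampling_params.strategy, "temperature", None),
        "top_p": getattr(sampling_params.strategy, "top_p", None),
        "max_tokens": sampling_params.max_tokens or None,
    }
```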

Review comment (on the overall approach):

> I think if we are going to convert to OpenAI responses, we should also shift our inputs to match, so we don't need to convert anything using the openai_compat helpers. You can look at the inference API for how it handles this with things like openai_chat_completion.
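
A sketch of that suggested end state (the proposed direction, not code from this PR): the agent keeps its conversation and tool schemas in OpenAI chat-completions format from the start, so `openai_chat_completion` is called without any `convert_*` helpers. The `run_turn_openai_native` name and the message literals are illustrative assumptions:

```python
# Hedged sketch of the reviewers' suggested direction: OpenAI-native inputs throughout,
# so no openai_compat conversion is needed before or after the inference call.
async def run_turn_openai_native(agent, user_text: str):
    openai_messages = [
        {"role": "system", "content": "You are a helpful agent."},  # illustrative
        {"role": "user", "content": user_text},
    ]
    openai_stream = await agent.inference_api.openai_chat_completion(
        model=agent.agent_config.model,
        messages=openai_messages,  # already OpenAI format: no convert_message_to_openai_dict
        stream=True,
    )
    # Consumers would iterate the OpenAI-format chunks directly instead of converting
    # back with convert_openai_chat_completion_stream.
    async for chunk in openai_stream:
        yield chunk
```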