Description
System Info
CentOS 9 - CPU only
remote-vllm image from docker.io
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
I'm running a llama-stack server using the docker.io/llamastack/distribution-remote-vllm image and getting a BadRequestError on chat completions. The last image tag that worked for me was 0.1.9; every tag after that fails with the same error.
From the client side, I'm just using curl like this:
$ curl http://localhost:8321/v1/inference/chat-completion -H "Content-Type: application/json" -d '{
"model_id": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a limerick about Llama Stack."}],
"max_tokens": 100,
"temperature": 0
}'
This same request works perfectly against the 0.1.9 tag of the container.
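For reference, here is the equivalent request through the Python client (a minimal sketch assuming the llama_stack_client package; sampling parameters omitted for brevity). Since the error is raised server-side, it presumably fails the same way on the newer tags:

# Minimal sketch using the llama_stack_client package (an assumption; the curl
# call above is what I actually ran). Talks to the same server on port 8321.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a limerick about Llama Stack."},
    ],
)
print(response.completion_message.content)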
And below is how I start the server:
podman run -it --privileged --rm -p 8321:8321 docker.io/llamastack/distribution-remote-vllm:latest --port 8321 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct --env VLLM_URL=$VLLM_URL --env VLLM_API_TOKEN=$VLLM_API_TOKEN --env VLLM_MAX_TOKENS=200 --env LLAMA_STACK_PORT=8321
Error logs
BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'list_type', 'loc': ('body',
'tools'), 'msg': 'Input should be a valid list', 'input': {}}]", 'type': 'BadRequestError', 'param': None,
'code': 400}
INFO: ::1:54744 - "POST /v1/inference/chat-completion HTTP/1.1" 500 Internal Server Error
09:25:37.793 [END] /v1/inference/chat-completion [StatusCode.OK] (1393.66ms)
09:25:37.791 [ERROR] Error executing endpoint route='/v1/inference/chat-completion' method='post'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 201, in endpoint
return await maybe_await(value)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 161, in maybe_await
return await value
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
result = await method(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 324, in chat_completion
response = await provider.chat_completion(**params)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
result = await method(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 307, in chat_completion
return await self._nonstream_chat_completion(request, self.client)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 313, in _nonstream_chat_completion
r = await client.chat.completions.create(**params)
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions/completions.py", line 2002, in create
return await self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1767, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1461, in request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1524, in _request
return await self._retry_request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1594, in _retry_request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1562, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'list_type', 'loc': ('body', 'tools'), 'msg': 'Input should be a valid list', 'input': {}}]", 'type': 'BadRequestError', 'param': None, 'code': 400}
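Judging from the validation error (loc ('body', 'tools'), input {}), the provider now seems to forward a tools field as an empty dict instead of omitting it or sending a list, even though my request passes no tools at all. Below is a minimal sketch that reproduces the same 400 by posting directly to the vLLM OpenAI-compatible endpoint; it assumes VLLM_URL is the server's /v1 base URL and is purely illustrative, not taken from the llama-stack code:

# Hypothetical reproduction of the 400 directly against the vLLM
# OpenAI-compatible endpoint. Assumes VLLM_URL is the /v1 base URL
# (e.g. http://vllm-host:8000/v1) and VLLM_API_TOKEN is its API key.
import os
import requests

base_url = os.environ["VLLM_URL"].rstrip("/")
headers = {"Authorization": f"Bearer {os.environ.get('VLLM_API_TOKEN', '')}"}
body = {
    # Model name as registered with vLLM; adjust if the served model id differs.
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a limerick about Llama Stack."},
    ],
    "max_tokens": 100,
    "temperature": 0,
}

# Without "tools" the request succeeds.
ok = requests.post(f"{base_url}/chat/completions", json=body, headers=headers)
print(ok.status_code, ok.json()["choices"][0]["message"]["content"])

# With "tools" set to an empty dict instead of a list, vLLM's request
# validation rejects it with the same list_type / ('body', 'tools') error
# shown in the logs above.
bad = requests.post(
    f"{base_url}/chat/completions",
    json={**body, "tools": {}},
    headers=headers,
)
print(bad.status_code, bad.text)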
Expected behavior
On the working version (0.1.9), the server replies just fine:
$ curl http://localhost:8321/v1/inference/chat-completion -H "Content-Type: application/json" -d '{
"model_id": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a limerick about Llama Stack."}],
"max_tokens": 100,
"temperature": 0
}'
{"metrics":[{"metric":"prompt_tokens","value":32,"unit":null},{"metric":"completion_tokens","value":47,"unit":null},{"metric":"total_tokens","value":79,"unit":null}],"completion_message":{"role":"assistant","content":"There once was a Llama Stack high,\nBuilt with blocks that touched the sky,\nIt stood with great care,\nAnd a gentle air,\nThis Llama's tower reached on by.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}