422 Unprocessable Entity using Neural Chat via OpenAI interface with meta-llama/Llama-2-7b-chat-hf #1288
Description
Is there a specific version of the openai package that is aligned with the OpenAI-compatible interface offered by neural_chat? I am currently testing with the current release, 1.12.0, but encountering a 422 Unprocessable Entity error.
I saw that meta-llama/Llama-2-7b-chat-hf is a supported model and appears to be small enough to fit into my Intel Data Center Flex 170 XPU.
I can successfully run this model locally with the code outlined in deploy_chatbot_on_xpu.
However, when I attempt to use the OpenAI interface per the instructions at https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/neural_chat, the server logs 422 Unprocessable Entity and the client raises an error about a missing required value. I assume this reflects a mismatch between the fields the OpenAI client sends and the fields the neural_chat server requires. I have also included the text extracted from a tcpdump capture below.
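In case the server was written against the pre-1.0 openai client, here is the legacy call shape for comparison (this is the standard openai<1.0 API, e.g. 0.28; whether neural_chat expects it is my assumption, and note that the legacy client also sends a messages array, so a client version mismatch alone may not explain the missing prompt field):

#!/usr/bin/env python
# Hypothetical: the legacy openai<1.0 (e.g. 0.28) call shape, untested here.
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://127.0.0.1:8000/v1"  # pre-1.0 uses api_base, not base_url

response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
)
print(response["choices"][0]["message"]["content"])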
Following the notebook examples, I prepared textbot.yaml and server.py as shown below.
Starting the server
$ grep -v "^#" textbot.yaml | grep -v "^$"
host: 0.0.0.0
port: 8000
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
device: "xpu"
tasks_list: ['textchat']
$ cat server.py
#!/usr/bin/env python
import multiprocessing

import nest_asyncio
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

nest_asyncio.apply()

def start_service():
    # Loads textbot.yaml and starts the Uvicorn app defined by neural_chat.
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="textbot.yaml", log_file="neuralchat.log")

multiprocessing.Process(target=start_service).start()
$ ./server.py
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading config settings from the environment...
2024-02-19 14:11:22.837584: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-19 14:11:22.841047: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.887207: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 14:11:22.887246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 14:11:22.888669: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 14:11:22.896900: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.897194: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-19 14:11:23.782914: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-19 14:11:27,327 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
2024-02-19 14:11:27,328 - datasets - INFO - TensorFlow version 2.15.0.post1 available.
Loading model meta-llama/Llama-2-7b-chat-hf
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.25it/s]
2024-02-19 14:11:31,912 - root - INFO - Model loaded.
INFO: Started server process [2913373]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Additional logs after starting the TextChatClientExecutor client - successful inference
[2024-02-19 14:32:57,683] [ INFO] - Checking parameters of completion request...
[2024-02-19 14:32:57,683] [ INFO] - Predicting chat completion using prompt 'Tell me about Intel Xeon Scalable Processors.'
[2024-02-19 14:33:07,119] [ INFO] - Chat completion finished.
INFO: 127.0.0.1:60734 - "POST /v1/chat/completions HTTP/1.1" 200 OK
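For reference, the successful request above came from a TextChatClientExecutor client along these lines (a minimal sketch adapted from the neural_chat README; parameter names are assumed from those docs and may differ by version):

#!/usr/bin/env python
# Sketch of the textchat client call that produced the 200 OK above.
# Parameter names (prompt, server_ip, port) assumed from the neural_chat README.
from intel_extension_for_transformers.neural_chat import TextChatClientExecutor

executor = TextChatClientExecutor()
result = executor(
    prompt="Tell me about Intel Xeon Scalable Processors.",
    server_ip="127.0.0.1",
    port=8000,
)
print(result.text)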
Additional logs after connecting via the OpenAI client - failing request
INFO: 127.0.0.1:39368 - "POST /v1/chat/completions HTTP/1.1" 422 Unprocessable Entity
OpenAI client contents
Aside from the shebang and the modified model string, this should be identical to the content on the webpage.
$ cat openai-client.py
#!/usr/bin/env python
import openai

openai.api_key = "EMPTY"
openai.base_url = "http://127.0.0.1:8000/v1/"

response = openai.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
)
print(response.choices[0].message.content)
$ ./openai-client.py
Traceback (most recent call last):
File "/home/REDACTED/jupyter/./openai-client.py", line 7, in <module>
response = openai.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 663, in create
return self._post(
^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 1200, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 889, in request
return self._request(
^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 980, in _request
raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'prompt'], 'msg': 'field required', 'type': 'value_error.missing'}]}
Text from packet capture of exchange
POST /v1/chat/completions HTTP/1.1
Host: REDACTED:8000
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept: application/json
Content-Type: application/json
User-Agent: _ModuleClient/Python 1.12.0
X-Stainless-Lang: python
X-Stainless-Package-Version: 1.12.0
X-Stainless-OS: Linux
X-Stainless-Arch: x64
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.11.7
Authorization: Bearer EMPTY
X-Stainless-Async: false
Content-Length: 197
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}], "model": "meta-llama/Llama-2-7b-chat-hf"}
HTTP/1.1 422 Unprocessable Entity
date: Mon, 19 Feb 2024 23:02:02 GMT
server: uvicorn
content-length: 90
content-type: application/json
{"detail":[{"loc":["body","prompt"],"msg":"field required","type":"value_error.missing"}]}
Thank you!