422 Unprocessable Entity using Neural Chat via OpenAI interface with meta-llama/Llama-2-7b-chat-hf #1288
Description
Is there a specific version of the openai package that is aligned with the OpenAI-compatible interface offered by neural_chat? I am currently testing with the current release, 1.12.0, but encountering a 422 Unprocessable Entity error.
I saw that meta-llama/Llama-2-7b-chat-hf is a supported model and appears to be small enough to fit into my Intel Data Center Flex 170 XPU.
I can successfully run this model locally with the code outlined in deploy_chatbot_on_xpu.
However, when I attempt to use the OpenAI interface per the instructions at https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/neural_chat, the server logs 422 Unprocessable Entity and the client raises an error about a missing required value. I assume this reflects a mismatch between the fields the OpenAI client sends and the fields the neural_chat server requires. I have also included the text extracted from a tcpdump capture below.
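In case the server was written against the pre-1.0 openai client, here is the legacy call shape for comparison (this is the standard openai<1.0 API, e.g. 0.28; whether neural_chat expects it is my assumption, and note that the legacy client also sends a messages array, so a client version mismatch alone may not explain the missing prompt field):

#!/usr/bin/env python
# Hypothetical: the legacy openai<1.0 (e.g. 0.28) call shape, untested here.
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://127.0.0.1:8000/v1"  # pre-1.0 uses api_base, not base_url

response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
)
print(response["choices"][0]["message"]["content"])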
Following the notebook examples, I prepared textbot.yaml and server.py as shown below.
Starting the server
$ grep -v "^#" textbot.yaml | grep -v "^$"
host: 0.0.0.0
port: 8000
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
device: "xpu"
tasks_list: ['textchat']
$ cat server.py
#!/usr/bin/env python
import multiprocessing

import nest_asyncio
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

nest_asyncio.apply()

def start_service():
    # Loads textbot.yaml and starts the Uvicorn app defined by neural_chat.
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="textbot.yaml", log_file="neuralchat.log")

multiprocessing.Process(target=start_service).start()
$ ./server.py
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading config settings from the environment...
2024-02-19 14:11:22.837584: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-19 14:11:22.841047: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.887207: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 14:11:22.887246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 14:11:22.888669: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 14:11:22.896900: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.897194: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-19 14:11:23.782914: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-19 14:11:27,327 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
2024-02-19 14:11:27,328 - datasets - INFO - TensorFlow version 2.15.0.post1 available.
Loading model meta-llama/Llama-2-7b-chat-hf
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.25it/s]
2024-02-19 14:11:31,912 - root - INFO - Model loaded.
INFO: Started server process [2913373]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Additional logs after starting the TextChatClientExecutor client - successful inference
[2024-02-19 14:32:57,683] [ INFO] - Checking parameters of completion request...
[2024-02-19 14:32:57,683] [ INFO] - Predicting chat completion using prompt 'Tell me about Intel Xeon Scalable Processors.'
[2024-02-19 14:33:07,119] [ INFO] - Chat completion finished.
INFO: 127.0.0.1:60734 - "POST /v1/chat/completions HTTP/1.1" 200 OK
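For reference, the successful request above came from a TextChatClientExecutor client along these lines (a minimal sketch adapted from the neural_chat README; parameter names are assumed from those docs and may differ by version):

#!/usr/bin/env python
# Sketch of the textchat client call that produced the 200 OK above.
# Parameter names (prompt, server_ip, port) assumed from the neural_chat README.
from intel_extension_for_transformers.neural_chat import TextChatClientExecutor

executor = TextChatClientExecutor()
result = executor(
    prompt="Tell me about Intel Xeon Scalable Processors.",
    server_ip="127.0.0.1",
    port=8000,
)
print(result.text)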
Additional logs after connecting via the OpenAI client - failing request
INFO: 127.0.0.1:39368 - "POST /v1/chat/completions HTTP/1.1" 422 Unprocessable Entity
OpenAI client contents
Aside from the shebang and the modified model string, this should be identical to the content on the webpage.
$ cat openai-client.py
#!/usr/bin/env python
import openai

openai.api_key = "EMPTY"
openai.base_url = "http://127.0.0.1:8000/v1/"

response = openai.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
)
print(response.choices[0].message.content)
$ ./openai-client.py
Traceback (most recent call last):
File "/home/REDACTED/jupyter/./openai-client.py", line 7, in <module>
response = openai.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 663, in create
return self._post(
^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 1200, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 889, in request
return self._request(
^^^^^^^^^^^^^^
File "/home/REDACTED/miniconda3/envs/openai/lib/python3.11/site-packages/openai/_base_client.py", line 980, in _request
raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'prompt'], 'msg': 'field required', 'type': 'value_error.missing'}]}
Text from packet capture of exchange
POST /v1/chat/completions HTTP/1.1
Host: REDACTED:8000
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept: application/json
Content-Type: application/json
User-Agent: _ModuleClient/Python 1.12.0
X-Stainless-Lang: python
X-Stainless-Package-Version: 1.12.0
X-Stainless-OS: Linux
X-Stainless-Arch: x64
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.11.7
Authorization: Bearer EMPTY
X-Stainless-Async: false
Content-Length: 197
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}], "model": "meta-llama/Llama-2-7b-chat-hf"}
HTTP/1.1 422 Unprocessable Entity
date: Mon, 19 Feb 2024 23:02:02 GMT
server: uvicorn
content-length: 90
content-type: application/json
{"detail":[{"loc":["body","prompt"],"msg":"field required","type":"value_error.missing"}]}
Thank you!