
Commit 04dabe3

ywang96 authored and DarkLight1337 committed
[Frontend] Add OpenAI Vision API Support (vllm-project#5237)
Co-authored-by: DarkLight1337 <[email protected]>
1 parent df8725e commit 04dabe3

File tree

9 files changed: +653 -19 lines changed

docs/source/models/vlm.rst

Lines changed: 67 additions & 1 deletion
@@ -3,7 +3,7 @@
 Using VLMs
 ==========

-This document shows you how to run and serve Vision Language Models (VLMs) using vLLM.
+vLLM provides experimental support for Vision Language Models (VLMs). This document shows you how to run and serve these models using vLLM.

 Engine Arguments
 ----------------
@@ -54,3 +54,69 @@ For now, we only support a single image per text prompt. To pass an image to the
        print(generated_text)

A code example can be found in `examples/llava_example.py <https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py>`_.


Online OpenAI Vision API Compatible Inference
----------------------------------------------

You can serve vision language models with vLLM's HTTP server that is compatible with the `OpenAI Vision API <https://platform.openai.com/docs/guides/vision>`_.

.. note::
    Currently, vLLM supports only a **single** ``image_url`` input per ``messages``. Support for multi-image inputs will be
    added in the future.

Below is an example of how to launch the same ``llava-hf/llava-1.5-7b-hf`` model with the vLLM API server.

.. important::
    Since the OpenAI Vision API is based on the `Chat <https://platform.openai.com/docs/api-reference/chat>`_ API, a chat template
    is **required** to launch the API server if the model's tokenizer does not come with one. In this example, we use the
    HuggingFace Llava chat template that you can find in the example folder `here <https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja>`_.

.. code-block:: bash

    python -m vllm.entrypoints.openai.api_server \
        --model llava-hf/llava-1.5-7b-hf \
        --image-input-type pixel_values \
        --image-token-id 32000 \
        --image-input-shape 1,3,336,336 \
        --image-feature-size 576 \
        --chat-template template_llava.jinja
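As an illustrative aside, it can be handy to confirm that the server has loaded the model before sending requests. The snippet below is a minimal sketch, assuming the default host and port from the launch command above and that the ``requests`` package is installed; the ``/v1/models`` endpoint is part of the OpenAI-compatible API surface.

.. code-block:: python

    import requests

    # Ask the OpenAI-compatible server which models it is serving.
    # Assumes the server launched above is listening on localhost:8000.
    response = requests.get("http://localhost:8000/v1/models")
    response.raise_for_status()
    served_models = [model["id"] for model in response.json()["data"]]
    print("Served models:", served_models)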
To consume the server, you can use the OpenAI client as in the example below:

.. code-block:: python

    from openai import OpenAI

    openai_api_key = "EMPTY"
    openai_api_base = "http://localhost:8000/v1"

    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    chat_response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }],
    )
    print("Chat response:", chat_response)
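The OpenAI Vision API also allows an image to be embedded inline as a base64 ``data:`` URL instead of a remote link. Below is a hedged sketch of that variant; it assumes a local file ``image.jpg`` exists and that the running server accepts data URLs — if it does not, use an http(s) ``image_url`` as shown above.

.. code-block:: python

    import base64

    from openai import OpenAI

    client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

    # Encode a local image as a data URL, following the OpenAI Vision API convention.
    # "image.jpg" is a placeholder path; substitute your own file.
    with open("image.jpg", "rb") as f:
        image_base64 = base64.b64encode(f.read()).decode("utf-8")

    chat_response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # Inline base64 payload instead of a remote URL.
                    "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                },
            ],
        }],
    )
    print("Chat response:", chat_response)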
.. note::

    By default, the timeout for fetching images through an HTTP URL is ``5`` seconds. You can override this by setting the environment variable:

    .. code-block:: shell

        export VLLM_IMAGE_FETCH_TIMEOUT=<timeout>

.. note::
    Prompt formatting with the image token ``<image>`` is not needed when serving VLMs with the API server, since the prompt will be
    processed automatically by the server.
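Because the server speaks plain HTTP, any client library can be used instead of the OpenAI SDK. The following is a minimal sketch using ``requests``, assuming the default host and port from the launch command above; note that, as per the note above, the JSON payload contains no ``<image>`` token — the server inserts it automatically.

.. code-block:: python

    import requests

    # Same request as the OpenAI client example, sent as a raw HTTP POST.
    payload = {
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }],
    }
    response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])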

docs/source/serving/openai_compatible_server.md

Lines changed: 3 additions & 1 deletion
@@ -30,6 +30,8 @@ Please see the [OpenAI API Reference](https://platform.openai.com/docs/api-refer
 - Chat: `tools`, and `tool_choice`.
 - Completions: `suffix`.

+vLLM also provides experimental support for OpenAI Vision API compatible inference. See more details in [Using VLMs](../models/vlm.rst).
+
 ## Extra Parameters
 vLLM supports a set of parameters that are not part of the OpenAI API.
 In order to use them, you can pass them as extra parameters in the OpenAI client.
@@ -120,4 +122,4 @@ It is the callers responsibility to prompt the model with the tool information,

 vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.

-Please refer to the OpenAI API reference documentation for more information.
+Please refer to the OpenAI API reference documentation for more information.
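As an illustrative sketch of the extra-parameter mechanism mentioned above: the OpenAI Python client lets additional fields be passed through `extra_body`. The parameter name `guided_choice` below is only an example and may vary by server version; substitute any extra parameter documented for your vLLM build, and the model name is simply the one used in the vision example above.

```python
# Hedged sketch: passing a vLLM-specific extra parameter through the OpenAI client.
# Assumes an OpenAI-compatible vLLM server on localhost:8000; `guided_choice` is used
# only as an example extra parameter and may differ across server versions.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
completion = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    extra_body={"guided_choice": ["yes", "no"]},
)
print(completion.choices[0].message.content)
```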

examples/template_llava.jinja

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
{%- if messages[0]['role'] == 'system' -%}
    {%- set system_message = messages[0]['content'] -%}
    {%- set messages = messages[1:] -%}
{%- else -%}
    {% set system_message = '' -%}
{%- endif -%}

{{ bos_token + system_message }}
{%- for message in messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif -%}

    {%- if message['role'] == 'user' -%}
        {{ 'USER: ' + message['content'] + '\n' }}
    {%- elif message['role'] == 'assistant' -%}
        {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }}
    {%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {{ 'ASSISTANT:' }}
{% endif %}
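As a hedged illustration of what this template produces, the snippet below renders it directly with `jinja2`, outside of vLLM. The `bos_token`/`eos_token` values are assumptions matching the Llama-style tokenizer used by llava-1.5, and the loader path assumes the repository's `examples/` directory; in the server, the template is applied through the model's tokenizer, so this is only a local approximation.

```python
# Illustrative preview of the Llava chat template, rendered with jinja2 directly.
# Assumes the template file is at examples/template_llava.jinja and that the model's
# special tokens are "<s>"/"</s>" (Llama-style); the vLLM server applies this template
# via the tokenizer, so the exact output may differ slightly.
from jinja2 import Environment, FileSystemLoader


def raise_exception(message):
    # Mirror the helper that chat templating normally provides to the template.
    raise ValueError(message)


env = Environment(loader=FileSystemLoader("examples"))
env.globals["raise_exception"] = raise_exception
template = env.get_template("template_llava.jinja")

prompt = template.render(
    messages=[{"role": "user", "content": "What's in this image?"}],
    bos_token="<s>",
    eos_token="</s>",
    add_generation_prompt=True,
)
print(repr(prompt))
# Roughly: <s>USER: What's in this image?\nASSISTANT:
```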
