
[Usage]: Ray + vLLM OpenAI (offline) Batch Inference #8636

@mbuet2ner

Description

Your current environment

None

How would you like to use vllm

I want to use the OpenAI library to do offline batch inference, leveraging Ray (for scaling and scheduling) on top of vLLM.

Context: The plan is to build a FastAPI service that closely mimics OpenAI's batch API and allows processing a larger number of prompts (tens of thousands) within 24h. There are a few options for achieving this with vLLM, but each one has an important drawback (unless I am missing something):

  • There is an existing guide in the docs that uses the LLM class with Ray. While the LLM class shares the OpenAI sampling parameters, it lacks the OpenAI prompt (chat) templating, which matters here (a rough workaround sketch follows right after this list).
  • The run_batch.py entrypoint that was introduced here would be the simplest option, but it does not support Ray out of the box (the second sketch below shows one way to wrap it in a Ray task).
  • The third option would be to use the AsyncLLMEngine as done here and bundle it with OpenAIServingChat, as is done in run_batch.py. But this would entail some (potential) performance degradation from going async, even though that is not really needed for offline batch inference.
  • The fourth option could be to use Ray Serve, like in this example from Ray's docs. But this would lack the OpenAI batch format and is, again, async.
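
For option 1, here is roughly what I have in mind as a workaround for the missing templating. This is only a sketch, not an official vLLM batch API: Ray Data shards the requests across GPU actors, each actor runs an LLM engine, and the model's Hugging Face chat template stands in for the OpenAI prompt templating. The model name, file paths, batch size, and concurrency are placeholders, and I assume a simplified requests.jsonl with a top-level "messages" column rather than the full OpenAI batch envelope.

```python
# Sketch for option 1: Ray Data + vLLM LLM class + HF chat template.
# MODEL, paths, batch_size, and concurrency below are placeholders.
import ray
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder


class ChatPredictor:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL)
        self.llm = LLM(model=MODEL)  # one engine per GPU actor
        self.params = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch):  # batch is a pandas.DataFrame (batch_format="pandas")
        # Render OpenAI-style "messages" lists into prompts via the chat template.
        prompts = [
            self.tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
            for messages in batch["messages"]
        ]
        outputs = self.llm.generate(prompts, self.params)
        batch["completion"] = [o.outputs[0].text for o in outputs]
        return batch


ds = ray.data.read_json("requests.jsonl")  # placeholder input
ds = ds.map_batches(
    ChatPredictor,
    batch_format="pandas",
    batch_size=64,    # placeholder
    concurrency=2,    # number of GPU actors (placeholder)
    num_gpus=1,       # one GPU per actor
)
ds.write_json("results/")
```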

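And for option 2, wrapping run_batch in Ray could look something like the following (again just a sketch): each pre-split shard of the OpenAI batch file goes to a Ray task that shells out to the documented run_batch entrypoint. Shard paths and the model name are placeholders.

```python
# Sketch for option 2: Ray handles scheduling/placement only; each task runs the
# documented vllm.entrypoints.openai.run_batch entrypoint on one shard.
import subprocess
import sys

import ray


@ray.remote(num_gpus=1)
def run_batch_shard(input_path: str, output_path: str, model: str) -> str:
    """Process one OpenAI-batch-format shard on the GPU Ray assigned to this task."""
    subprocess.run(
        [
            sys.executable, "-m", "vllm.entrypoints.openai.run_batch",
            "-i", input_path,
            "-o", output_path,
            "--model", model,
        ],
        check=True,
    )
    return output_path


if __name__ == "__main__":
    ray.init()
    shards = ["shard-0.jsonl", "shard-1.jsonl"]  # pre-split batch files (placeholder)
    results = ray.get([
        run_batch_shard.remote(
            s,
            s.replace(".jsonl", ".results.jsonl"),
            "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder
        )
        for s in shards
    ])
    print(results)
```

The obvious downside of this variant is one engine startup per shard, but it keeps the OpenAI batch format end to end.
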
Maybe this helps other people as well. Would be super grateful for some feedback. 🙂
And thanks a ton for this very nice piece of software and the great community!

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
