
[Misc]: How can I serve multiple models on a single port using the OpenAI API? #5899

@SuiJiGuoChengSuiJiGuo

Description

Anything you want to discuss about vllm.

I deployed a model on port 4400 using the OpenAI-compatible API server. When I try to deploy another model on the same port, I get the error below. Is there any way to deploy two models on the same port?

command:
python -m vllm.entrypoints.openai.api_server --served-model-name Invoke --model ./models/invoke_model --gpu-memory-utilization 0.35 --port 4400
python -m vllm.entrypoints.openai.api_server --served-model-name Emotion --model ./models/emotion_model --gpu-memory-utilization 0.35 --port 4400

ERROR:
[Errno 98] error while attempting to bind on address ('0.0.0.0', 4400): address already in use
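The bind error is expected: each `vllm.entrypoints.openai.api_server` process listens on its own port and serves a single model, so two processes cannot share port 4400. A common workaround is to launch each model on its own port (the `--gpu-memory-utilization 0.35` settings already let the two servers share one GPU) and put a thin router in front that forwards requests based on the `model` field of the OpenAI-style request body. A minimal sketch of that routing logic, assuming hypothetical backend ports 4401 and 4402:

```python
# Hypothetical sketch: one vLLM api_server per model, each on its own port,
# with a router dispatching on the request's "model" field. The served model
# names match the --served-model-name values above; the ports are assumptions.

MODEL_BACKENDS = {
    "Invoke": "http://localhost:4401/v1",   # api_server for ./models/invoke_model
    "Emotion": "http://localhost:4402/v1",  # api_server for ./models/emotion_model
}

def backend_for(request_body: dict) -> str:
    """Pick the upstream vLLM server for an OpenAI-style request body."""
    model = request_body.get("model")
    try:
        return MODEL_BACKENDS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}")

if __name__ == "__main__":
    # A request asking for "Emotion" would be forwarded to the second server.
    print(backend_for({"model": "Emotion", "prompt": "hi"}))
```

In practice the same dispatch can be done by any reverse proxy or API gateway sitting on port 4400, so clients see a single endpoint while each model keeps its own backend process.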
