
[Misc]: How can I serve multiple models on a single port using the OpenAI API? #5899

@SuiJiGuoChengSuiJiGuo

Description

Anything you want to discuss about vllm.

I deployed a model on port 4400 using the OpenAI-compatible API server. When I try to deploy another model on the same port, I get the error below. Is there any way to deploy two models on the same port?

command:
python -m vllm.entrypoints.openai.api_server --served-model-name Invoke --model ./models/invoke_model --gpu-memory-utilization 0.35 --port 4400
python -m vllm.entrypoints.openai.api_server --served-model-name Emotion --model ./models/emotion_model --gpu-memory-utilization 0.35 --port 4400

ERROR:
[Errno 98] error while attempting to bind on address ('0.0.0.0', 4400): address already in use
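The bind error is expected: each `vllm.entrypoints.openai.api_server` process listens on its own port and serves a single model, so two processes cannot share port 4400. A common workaround is to launch each model on its own port (the `--gpu-memory-utilization 0.35` settings already let the two servers share one GPU) and put a thin router in front that forwards requests based on the `model` field of the OpenAI-style request body. A minimal sketch of that routing logic, assuming hypothetical backend ports 4401 and 4402:

```python
# Hypothetical sketch: one vLLM api_server per model, each on its own port,
# with a router dispatching on the request's "model" field. The served model
# names match the --served-model-name values above; the ports are assumptions.

MODEL_BACKENDS = {
    "Invoke": "http://localhost:4401/v1",   # api_server for ./models/invoke_model
    "Emotion": "http://localhost:4402/v1",  # api_server for ./models/emotion_model
}

def backend_for(request_body: dict) -> str:
    """Pick the upstream vLLM server for an OpenAI-style request body."""
    model = request_body.get("model")
    try:
        return MODEL_BACKENDS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}")

if __name__ == "__main__":
    # A request asking for "Emotion" would be forwarded to the second server.
    print(backend_for({"model": "Emotion", "prompt": "hi"}))
```

In practice the same dispatch can be done by any reverse proxy or API gateway sitting on port 4400, so clients see a single endpoint while each model keeps its own backend process.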
