🚀 The feature, motivation and pitch
The feature request is to add support for a load/unload endpoint/API in vLLM to dynamically load and unload multiple LLMs within a single GPU instance. This feature aims to enhance resource utilization and scalability by allowing concurrent operation of multiple LLMs on the same GPU.
A load/unload endpoint in vLLM would facilitate:
- Increased Resource Utilization: Enables concurrent operation of multiple LLMs on a single GPU, optimizing computational resources and system efficiency.
- Enhanced Scalability: Allows dynamic model loading and unloading based on demand, adapting to varying workloads and user requirements.
- Improved Cost-effectiveness: Maximizes throughput and performance without additional hardware investments, which is ideal for organizations with budget constraints.
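To make the request concrete, here is a minimal sketch of what such an endpoint could look like. This is not an existing vLLM API: the route names, payloads, and the per-model registry are assumptions for illustration, and the `del` + `empty_cache()` teardown in the unload path is only best-effort (reliably reclaiming the memory is exactly what official support would need to guarantee).

```python
# Hypothetical load/unload endpoint; illustrative only, not part of vLLM today.
import gc

import torch
from fastapi import FastAPI, HTTPException
from vllm import LLM, SamplingParams

app = FastAPI()
models: dict[str, LLM] = {}  # model name -> loaded vLLM engine


@app.post("/v1/models/load")
def load_model(name: str, gpu_memory_utilization: float = 0.3):
    if name in models:
        raise HTTPException(409, f"{name} is already loaded")
    # Cap per-model GPU memory so several small models can share one GPU.
    models[name] = LLM(model=name, gpu_memory_utilization=gpu_memory_utilization)
    return {"loaded": name}


@app.post("/v1/models/unload")
def unload_model(name: str):
    llm = models.pop(name, None)
    if llm is None:
        raise HTTPException(404, f"{name} is not loaded")
    # Drop the engine and try to return its GPU memory to the pool.
    del llm
    gc.collect()
    torch.cuda.empty_cache()
    return {"unloaded": name}


@app.post("/v1/generate")
def generate(name: str, prompt: str):
    llm = models.get(name)
    if llm is None:
        raise HTTPException(404, f"{name} is not loaded")
    outputs = llm.generate([prompt], SamplingParams(max_tokens=128))
    return {"text": outputs[0].outputs[0].text}
```

The key point is the unload path: today there is no supported call that guarantees the engine's KV cache and weights are released, so the registry above can only hope the memory comes back.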
Alternatives
Alternatively, exposing an API for manually unloading a model (without a full load/unload endpoint) would still offer finer-grained control over resource management.
Additional context
- The models in my context are mainly small LLMs (<= 10B parameters).
- Several community members have raised issues asking how to unload models or release GPU memory in vLLM. Workarounds exist, but their efficacy is inconsistent, so official support for these functions would be very welcome.
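For reference, one workaround that circulates in the community issues is shown below: tear the engine down manually and force CUDA to release its cached blocks. Its reliability varies, and the import path of `destroy_model_parallel` has moved between vLLM releases, so treat this as an illustrative sketch rather than a supported recipe.

```python
# Community workaround sketch for releasing GPU memory; behavior and import
# paths differ across vLLM versions.
import gc

import torch
from vllm import LLM
from vllm.distributed.parallel_state import destroy_model_parallel  # older releases use a different module path

llm = LLM(model="facebook/opt-1.3b", gpu_memory_utilization=0.3)
# ... serve requests ...

destroy_model_parallel()   # tear down vLLM's parallel/distributed state
del llm                    # drop the engine and its weights
gc.collect()
torch.cuda.empty_cache()   # return cached allocations to the driver
```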