[Misc] Add Docker compose example #20210

Open · wants to merge 1 commit into main

30 changes: 30 additions & 0 deletions docker-compose.yml-exemple
@@ -0,0 +1,30 @@
services:
  vllm:
    image: vllm/vllm-openai:v0.9.1
    container_name: vllm
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ipc: host
    ports:
      - "8000:8000"
    volumes:
      - models:/models
Contributor
high

The current volume configuration mounts a volume at /models, but vLLM and the underlying Hugging Face libraries download models to /root/.cache/huggingface by default. This means models won't be persisted in the models volume as intended. To correctly cache the models, you should mount the volume to /root/.cache/huggingface.

      - models:/root/.cache/huggingface

    environment:
      HUGGING_FACE_HUB_TOKEN: "YOUR TOKEN HERE"
Contributor
high

To ensure that models are stored in the correctly mounted volume, you should explicitly set the HF_HOME environment variable. This directs the Hugging Face library to use the specified path for caching.

      HF_HOME: /root/.cache/huggingface
      HUGGING_FACE_HUB_TOKEN: "YOUR TOKEN HERE"
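
One way to verify that models actually land in the mounted volume after a first run (assuming the service name vllm from the file above):

      docker compose exec vllm ls /root/.cache/huggingface/hub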

    restart: always
    command: >
      --model YOUR-MODEL-NALE
Contributor
medium

Typo: YOUR-MODEL-NALE should be YOUR-MODEL-NAME.

      --model YOUR-MODEL-NAME

      --tensor-parallel-size 2
      --compilation-config "{\"level\": 3}"
      --gpu-memory-utilization 0.95
      --host 0.0.0.0

volumes:
  models:
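
For reference, a sketch of the compose file with the three review suggestions applied. The model name and token remain placeholders from the original diff; HF_HOME and the volume target follow the suggested cache path and are not part of the original commit:

services:
  vllm:
    image: vllm/vllm-openai:v0.9.1
    container_name: vllm
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ipc: host
    ports:
      - "8000:8000"
    volumes:
      - models:/root/.cache/huggingface
    environment:
      HF_HOME: /root/.cache/huggingface
      HUGGING_FACE_HUB_TOKEN: "YOUR TOKEN HERE"
    restart: always
    command: >
      --model YOUR-MODEL-NAME
      --tensor-parallel-size 2
      --compilation-config "{\"level\": 3}"
      --gpu-memory-utilization 0.95
      --host 0.0.0.0

volumes:
  models:

Once started with docker compose up -d, the OpenAI-compatible server can be smoke-tested with:

      curl http://localhost:8000/v1/models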