
[Doc]: Steps to run vLLM on your RTX5080 or 5090! #14452

@pavanimajety


📚 The doc issue

Let's take a look at the steps required to run vLLM on your RTX 5080/5090!

  1. Initial Setup: To start with, we need a container with CUDA 12.8 and PyTorch 2.6 so that nvcc can compile for Blackwell.
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -it nvcr.io/nvidia/pytorch:25.02-py3 /bin/bash
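Once inside the container, it is worth a quick sanity check that the toolchain matches what we need (an optional verification step, not part of the original instructions):

# should report CUDA 12.8 and PyTorch 2.6.x, and list your RTX 5080/5090
nvcc --version
python -c "import torch; print(torch.__version__, torch.version.cuda)"
nvidia-smi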
  2. Clone vLLM Repository: Let's clone top-of-tree vLLM. If you have an existing clone or working directory, ensure that you are at or above commit ed6ea06.
git clone https://github.com/vllm-project/vllm.git && cd vllm
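If you are re-using an existing checkout, one quick way to confirm that ed6ea06 is already in your history (an optional check, not part of the original steps) is:

git fetch origin
git merge-base --is-ancestor ed6ea06 HEAD && echo "ed6ea06 is an ancestor of HEAD"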
  3. Build vLLM in the container: Now we build vLLM. Note that we can't use precompiled vLLM wheels because vllm-project/vllm has not yet moved to the required PyTorch and CUDA versions, so we rely on the PyTorch and CUDA that ship with the NGC container. The following steps are the standard build-from-source instructions, with the caveat of running use_existing_torch.py first.
python use_existing_torch.py
pip install -r requirements/build.txt
pip install setuptools_scm

# optionally create a ccache directory if you don't already have a regular CCACHE_DIR
mkdir <path/to/ccache/dir>

CCACHE_DIR=<path/to/ccache/dir> python setup.py develop
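If you want to verify that ccache is actually being hit across rebuilds, you can inspect its statistics afterwards (optional):

CCACHE_DIR=<path/to/ccache/dir> ccache -s   # shows cache hits/misses after a build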

Notes:

  • If ccache is not already installed, install it with apt-get update && apt-get install -y ccache.
  • The following packages may also be needed, depending on your environment.
apt-get update && apt-get install -y --no-install-recommends \
    kmod \
    git \
    python3-pip \
    && apt-get clean && rm -rf /var/lib/apt/lists/*
  • To speed up the build, you can use the MAX_JOBS flag. Check the number of cores on your CPU using nproc and pick a value below it. For example, if your machine has 16 cores, MAX_JOBS=10 is a good number that won't overload your CPU. Set it to 1 if you want a single-threaded build or if you run into any issues with the parallel build.
MAX_JOBS=<number> CCACHE_DIR=<path/to/ccache/dir> python setup.py develop
  • Swap steps 1 and 2 depending on whether you want to re-use your repository for development purposes. If you clone first and then start the container, you may have to grant additional permissions to modify the vLLM source inside the container; a sketch of mounting an existing clone into the container is shown below.
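As a sketch of that workflow (the paths are placeholders, adjust them to your setup): clone vLLM on the host first, then mount the checkout into the container and build from there.

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v </path/to/host/vllm>:/workspace/vllm \
    -it nvcr.io/nvidia/pytorch:25.02-py3 /bin/bash
# inside the container
cd /workspace/vllm   # then continue with step 3 above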
  4. Test vLLM: Once your build succeeds, run the following to check your installation.
python -c "import vllm; print(vllm.__version__)"

You should see a compiled version string for vLLM, 0.7.4+ at the time of writing.
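If you also want a quick end-to-end generation check, a minimal sketch using vLLM's offline LLM API looks like this (facebook/opt-125m is just a small example model and assumes you can download it from the Hugging Face Hub):

from vllm import LLM, SamplingParams

# load a small model and generate one completion as a smoke test
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)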

Congratulations, your RTX 5080/5090 is now ready to run vLLM!

Note: the Flash Attention 3 backend doesn't work on Blackwell yet; set VLLM_FLASH_ATTN_VERSION=2 if you run into any issues.
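For example, when starting the OpenAI-compatible server (the model name here is a placeholder):

VLLM_FLASH_ATTN_VERSION=2 vllm serve <your-model>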

Thanks @ywang96 for testing this out! Thanks to @kushanam, @kaixih for all the Blackwell support PRs!

