
Building PyTorch for ROCm

Peng edited this page Jul 21, 2020 · 78 revisions

General remarks

This is a quick guide to setting up PyTorch with ROCm support inside a docker container. It assumes a .deb-based system. See the ROCm install documentation for supported operating systems and general information on the ROCm software stack.

Currently, a ROCm install of version 3.5.1 is required.

  1. Follow the instructions on the ROCm installation page to install the baseline ROCm driver:
    https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
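
Since a specific ROCm version (3.5.1) is required, it can be useful to verify the installed version before proceeding. The sketch below parses a ROCm version string and compares it against the requirement; the version file path is an assumption and may differ between ROCm releases.

```python
# Sketch: check an installed ROCm version against the required 3.5.1.
# The version file location is an assumption; adjust for your install.
from pathlib import Path

REQUIRED = (3, 5, 1)

def parse_version(text):
    # "3.5.1-30" -> (3, 5, 1); ignore any build suffix after '-'
    core = text.strip().split("-")[0]
    return tuple(int(p) for p in core.split(".")[:3])

def rocm_version_ok(version_file="/opt/rocm/.info/version"):
    path = Path(version_file)
    if not path.exists():
        return False
    return parse_version(path.read_text()) >= REQUIRED
```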

Option 1 (recommended): Pull and use the published PyTorch ROCm docker image

  1. Pull the latest public PyTorch docker container. This option provides a docker image with PyTorch for ROCm preinstalled; users can launch the container and train/run deep learning models directly. The image runs on both gfx900 (Vega10-type GPUs: MI25, Vega 56, Vega 64, ...) and gfx906 (Vega20-type GPUs: MI50, MI60).

docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest
This will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount data directories into the container.
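
The docker run line above passes the GPU device nodes into the container with --device=/dev/kfd and --device=/dev/dri. A quick sanity check from inside the container is to confirm those nodes are actually visible; the helper below is an illustrative sketch, not part of any tool.

```python
# Sketch: verify the ROCm device nodes passed via --device flags are
# visible inside the container before trying to run a model.
import os

def rocm_device_nodes_present(kfd="/dev/kfd", dri="/dev/dri"):
    # /dev/kfd is a character device file; /dev/dri is a directory
    # containing the render nodes.
    return os.path.exists(kfd) and os.path.isdir(dri)
```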

Option 2: Install using published PyTorch ROCm docker image

  1. Obtain docker image:
    docker pull rocm/pytorch:latest-base or docker pull rocm/pytorch:latest-pytorch

  2. Clone PyTorch repository on the host:
    cd ~
    git clone https://github.com/pytorch/pytorch.git
    cd pytorch
    git submodule update --init --recursive

  3. Start a docker container using the downloaded image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest-base
    Note: This will mount your host home directory on /data in the container.

  4. Change to previous PyTorch checkout from within the running docker:
    cd /data/pytorch

  5. Build PyTorch for ROCm:
    By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 denotes Vega10-type GPUs (MI25, Vega 56, Vega 64, ...), which work best. If you want to compile only for your uarch, export PYTORCH_ROCM_ARCH=<uarch>, where <uarch> is one of gfx803, gfx900, or gfx906. Then build with
    .jenkins/pytorch/build.sh
    This will first hipify the PyTorch sources and then compile, requiring 16 GB of RAM to be available to the docker container.

  6. Confirm working installation:
    .jenkins/pytorch/test.sh
    runs all CI unit tests and skips them as appropriate for your system, based on the ROCm version and, e.g., single- or multi-GPU configuration. No tests should fail if the compilation and installation are correct. Additionally, this step installs torchvision, which most PyTorch scripts use to load models; e.g., running the PyTorch examples requires torchvision.
    Individual test sets can be run with
    PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
    where test_nn.py can be replaced with any other test set.

  7. Commit the container to preserve the pytorch install (from the host):
    sudo docker commit -m 'pytorch installed' <container_id>
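
The PYTORCH_ROCM_ARCH export in step 5 narrows the build to a single uarch. As a sketch of that step, the helper below validates the chosen target against the three architectures the build supports by default; the function name and the env-dict parameter are illustrative, not part of PyTorch.

```python
# Sketch: validate a PYTORCH_ROCM_ARCH value before exporting it for the
# build. The target set mirrors the build's defaults described above.
import os

KNOWN_ARCHS = {"gfx803", "gfx900", "gfx906"}

def select_rocm_arch(arch, env=None):
    env = env if env is not None else os.environ
    if arch not in KNOWN_ARCHS:
        raise ValueError(f"unknown AMD uarch: {arch!r}")
    env["PYTORCH_ROCM_ARCH"] = arch
    return env["PYTORCH_ROCM_ARCH"]
```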

Option 3: Install using PyTorch upstream docker file

  1. Clone PyTorch repository on the host:
    cd ~
    git clone https://github.com/pytorch/pytorch.git
    cd pytorch
    git submodule update --init --recursive

  2. Build PyTorch docker image:
    cd pytorch/docker/caffe2/jenkins
    ./build.sh py3.6-clang7-rocmdeb-ubuntu18.04
    This should complete with the message "Successfully built <image_id>".
    Note that other software versions may be chosen; such setups are currently untested, though!

  3. Start a docker container using the new image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video <image_id>
    Note: This will mount your host home directory on /data in the container.

  4. Change to previous PyTorch checkout from within the running docker:
    cd /data/pytorch

  5. Build PyTorch for ROCm:
    By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 denotes Vega10-type GPUs (MI25, Vega 56, Vega 64, ...), which work best. If you want to compile only for your uarch, export PYTORCH_ROCM_ARCH=<uarch>, where <uarch> is one of gfx803, gfx900, or gfx906. Then build with
    .jenkins/pytorch/build.sh
    This will first hipify the PyTorch sources and then compile using several concurrent jobs, requiring 16 GB of RAM to be available to the docker container.

  6. Confirm working installation:
    .jenkins/pytorch/test.sh
    runs all CI unit tests and skips them as appropriate for your system, based on the ROCm version and, e.g., single- or multi-GPU configuration. No tests should fail if the compilation and installation are correct. Additionally, this step installs torchvision, which most PyTorch scripts use to load models; e.g., running the PyTorch examples requires torchvision.
    Individual test sets can be run with
    PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
    where test_nn.py can be replaced with any other test set.

  7. Commit the container to preserve the pytorch install (from the host):
    sudo docker commit -m 'pytorch installed' <container_id>
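
Step 6 runs individual test sets by setting PYTORCH_TEST_WITH_ROCM=1 in the environment. That invocation can be scripted; the wrapper below is an illustrative sketch that merges the flag into the current environment and launches the given test file.

```python
# Sketch: run an individual PyTorch test file with
# PYTORCH_TEST_WITH_ROCM=1, mirroring step 6 above.
import os
import subprocess
import sys

def run_rocm_test(test_file):
    # Copy the current environment and add the ROCm test flag.
    env = dict(os.environ, PYTORCH_TEST_WITH_ROCM="1")
    result = subprocess.run(
        [sys.executable, test_file, "--verbose"],
        env=env,
    )
    return result.returncode
```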

Option 4: Install using minimal ROCm docker file

  1. Download pytorch dockerfile:
    Dockerfile

  2. Build docker image:
    cd pytorch_docker
    sudo docker build .
    This should complete with the message "Successfully built <image_id>".

  3. Start a docker container using the new image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video <image_id>
    Note: This will mount your host home directory on /data in the container.

  4. Clone the PyTorch master branch (on the host):
    cd ~
    git clone https://github.com/pytorch/pytorch.git
    or
    git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
    cd pytorch
    git submodule update --init --recursive

  5. Run "hipify" to prepare source code (in the container):
    cd /data/pytorch/
    python tools/amd_build/build_amd.py

  6. Build and install pytorch:
    By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 denotes Vega10-type GPUs (MI25, Vega 56, Vega 64, ...), which work best. If you want to compile only for your uarch, export PYTORCH_ROCM_ARCH=<uarch>, where <uarch> is one of gfx803, gfx900, or gfx906. Then build with
    USE_MKLDNN=0 USE_ROCM=1 MAX_JOBS=4 python setup.py install --user
    Use MAX_JOBS=n to limit peak memory usage. If the build fails, try falling back to fewer jobs; 4 jobs assume available main memory of 16 GB or larger.

  7. Confirm working installation:
    .jenkins/pytorch/test.sh
    runs all CI unit tests and skips them as appropriate for your system, based on the ROCm version and, e.g., single- or multi-GPU configuration. No tests should fail if the compilation and installation are correct. Additionally, this step installs torchvision, which most PyTorch scripts use to load models; e.g., running the PyTorch examples requires torchvision.
    Individual test sets can be run with
    PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
    where test_nn.py can be replaced with any other test set.

  8. Commit the container to preserve the pytorch install (from the host):
    sudo docker commit -m 'pytorch installed' <container_id>
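
Step 6 suggests MAX_JOBS=4 for 16 GB of RAM, i.e., roughly 4 GB per concurrent compile job. The sketch below turns that rule of thumb into a small helper that also caps the job count at the number of CPUs; the function and the per-job figure are illustrative assumptions, not part of the build system.

```python
# Sketch: pick a MAX_JOBS value from available memory, assuming roughly
# 4 GB of RAM per concurrent compile job (4 jobs for 16 GB, as above).
import os

GB_PER_JOB = 4  # assumed rule of thumb, not an official figure

def suggest_max_jobs(mem_gb, cpu_count=None):
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = max(1, mem_gb // GB_PER_JOB)
    return min(cpus, by_memory)
```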

Try PyTorch examples

  1. Clone the PyTorch examples repository:
    git clone https://github.com/pytorch/examples.git

  2. Run individual example: MNIST
    cd examples/mnist
    Follow instructions in README.md, in this case:
    pip install -r requirements.txt
    python main.py

  3. Run individual example: Try ImageNet training
    cd ../imagenet
    Follow instructions in README.md.
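
As noted above, the examples rely on torchvision (installed by the test.sh step). Before running an example, it can help to confirm the required packages are importable; the sketch below checks for them without importing anything, and the helper name is purely illustrative.

```python
# Sketch: list which packages required by the examples are missing,
# without actually importing them (so there are no side effects).
from importlib.util import find_spec

def missing_example_prereqs(packages=("torch", "torchvision")):
    return [p for p in packages if find_spec(p) is None]
```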
