Building PyTorch for ROCm
This is a quick guide to set up PyTorch with ROCm support inside a docker container. It assumes a .deb-based system. See the ROCm install documentation for supported operating systems and general information on the ROCm software stack.
A ROCm install of version 2.1 is currently required.
- Install or update rock-dkms on the host system:
  sudo apt-get install rock-dkms
  or
  sudo apt-get update
  sudo apt-get upgrade
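  To verify the kernel driver side on the host, something like the following can help (an optional sanity check, not part of the original steps; it assumes the dkms and lsmod tools are available):
  dkms status
  lsmod | grep amdgpu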
- Obtain the docker image:
  docker pull rocm/pytorch:rocm2.1_ubuntu16.04
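  To confirm the pull succeeded, you can list the local images for the repository (optional sanity check):
  sudo docker images rocm/pytorch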
- Clone the PyTorch repository on the host:
  cd ~
  git clone https://github.com/pytorch/pytorch.git
  cd pytorch
  git submodule init
  git submodule update
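  PyTorch carries nested submodules; if some third_party directories end up empty after the two commands above, the recursive form should cover them (an alternative to the two-step sequence, not part of the original instructions):
  git submodule update --init --recursive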
- Start a docker container using the downloaded image:
  sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm2.1_ubuntu16.04
  Note: This will mount your host home directory on /data in the container.
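  Once inside the container, you can check that the GPU devices are visible to the ROCm stack (an optional sanity check; it assumes the ROCm tools are installed under /opt/rocm in the image):
  /opt/rocm/bin/rocm-smi
  /opt/rocm/bin/rocminfo | grep gfx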
- Change to the previous PyTorch checkout from within the running docker:
  cd /data/pytorch
- Build PyTorch for ROCm:
  By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously (to see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 are Vega10-type GPUs (MI25, Vega56, Vega64, ...) and work best). If you want to compile only for your uarch, export PYTORCH_ROCM_ARCH set to gfx803, gfx900, or gfx906, e.g. export PYTORCH_ROCM_ARCH=gfx900 (see the sketch below). Then build with
  .jenkins/pytorch/build.sh
  This will first hipify the PyTorch sources and then compile using 4 concurrent jobs; 16 GB of RAM must be available to the docker image.
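  If you want the uarch picked up automatically rather than typed by hand, a minimal sketch is the following (it assumes rocm_agent_enumerator reports the CPU agent as gfx000 and a single GPU agent on its own line):
  # build only for the uarch reported for the first GPU agent
  export PYTORCH_ROCM_ARCH=$(/opt/rocm/bin/rocm_agent_enumerator | grep -v gfx000 | head -n 1)
  echo "building for $PYTORCH_ROCM_ARCH"
  .jenkins/pytorch/build.sh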
- Confirm working installation:
  PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
  No tests should fail if the compilation and installation are correct.
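  For a quicker smoke test than the full suite, you can check that the GPU is visible to PyTorch; on ROCm builds the HIP backend is exposed through the torch.cuda API (a sketch, assuming a single GPU):
  python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"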
- Install torchvision:
  pip install torchvision
  This step is optional, but most PyTorch scripts use torchvision to load models; e.g., running the PyTorch examples requires torchvision.
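  As a quick check that torchvision works together with the ROCm build, you can push a model and a dummy batch to the GPU (a sketch; the first call downloads pretrained weights):
  python -c "import torch, torchvision; m = torchvision.models.resnet18(pretrained=True).cuda(); x = torch.randn(1, 3, 224, 224).cuda(); print(m(x).shape)"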
- Commit the container to preserve the PyTorch install (from the host):
  sudo docker commit <container_id> -m 'pytorch installed'
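  The container id can be looked up with docker ps while the container is running; if you also want a memorable name for the committed image, you can pass a repository:tag (the name below is only an example):
  sudo docker ps
  sudo docker commit -m 'pytorch installed' <container_id> rocm/pytorch:installed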
Alternatively, you can build the PyTorch docker image yourself instead of pulling one:
- Clone the PyTorch repository on the host:
  cd ~
  git clone https://github.com/pytorch/pytorch.git
  cd pytorch
  git submodule init
  git submodule update
- Build the PyTorch docker image:
  cd pytorch/docker/caffe2/jenkins
  ./build.sh py2-clang7-rocmdeb-ubuntu16.04
  This should complete with the message "Successfully built <image_id>".
  Note that other software versions may be chosen here; such setups are currently not tested, though!
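  If you prefer a name over the raw <image_id>, you can tag the freshly built image (the tag below is only an example):
  sudo docker tag <image_id> rocm/pytorch:py2-clang7-rocmdeb-ubuntu16.04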
- Start a docker container using the new image:
  sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video <image_id>
  Note: This will mount your host home directory on /data in the container.
- Change to the previous PyTorch checkout from within the running docker:
  cd /data/pytorch
- Build PyTorch for ROCm:
  By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously (to see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 are Vega10-type GPUs (MI25, Vega56, Vega64, ...) and work best). If you want to compile only for your uarch, export PYTORCH_ROCM_ARCH set to gfx803, gfx900, or gfx906, e.g. export PYTORCH_ROCM_ARCH=gfx900. Then build with
  .jenkins/pytorch/build.sh
  This will first hipify the PyTorch sources and then compile using 4 concurrent jobs; 16 GB of RAM must be available to the docker image.
- Confirm working installation:
  PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
  No tests should fail if the compilation and installation are correct.
- Install torchvision:
  pip install torchvision
  This step is optional, but most PyTorch scripts use torchvision to load models; e.g., running the PyTorch examples requires torchvision.
- Commit the container to preserve the PyTorch install (from the host):
  sudo docker commit <container_id> -m 'pytorch installed'
Alternatively, you can build PyTorch inside a container created from a standalone Dockerfile:
- Download the PyTorch Dockerfile:
  Dockerfile
- Build the docker image:
  cd pytorch_docker
  sudo docker build .
  This should complete with the message "Successfully built <image_id>".
- Start a docker container using the new image:
  sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video <image_id>
  Note: This will mount your host home directory on /data in the container.
Clone pytorch master (on to the host):
cd ~
git clone https://github.com/pytorch/pytorch.git
orgit clone https://github.com/ROCmSoftwarePlatform/pytorch.git
cd pytorch
git submodule init
git submodule update
- Run "hipify" to prepare the source code (in the container):
  cd /data/pytorch/
  python tools/amd_build/build_amd.py
- Build and install PyTorch:
  By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously (to see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 are Vega10-type GPUs (MI25, Vega56, Vega64, ...) and work best). If you want to compile only for your uarch, export PYTORCH_ROCM_ARCH set to gfx803, gfx900, or gfx906, e.g. export PYTORCH_ROCM_ARCH=gfx900. Then build with
  USE_ROCM=1 MAX_JOBS=4 python setup.py install --user
  Use MAX_JOBS=n to limit peak memory usage. If building fails, try falling back to fewer jobs; 4 jobs assume 16 GB or more of available main memory (one way to pick n automatically is sketched below).
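  If 4 jobs does not fit your machine, one way to pick MAX_JOBS is to derive it from the memory visible inside the container, assuming roughly 4 GB of RAM per compile job (a sketch under that assumption, not part of the original instructions):
  # derive MAX_JOBS from available memory, assuming ~4 GB per compile job
  MEM_GB=$(awk '/MemAvailable/ {print int($2/1024/1024)}' /proc/meminfo)
  JOBS=$(( MEM_GB / 4 ))
  if [ "$JOBS" -lt 1 ]; then JOBS=1; fi
  USE_ROCM=1 MAX_JOBS=$JOBS python setup.py install --user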
- Confirm working installation:
  PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
  No tests should fail if the compilation and installation are correct.
- Install torchvision:
  pip install torchvision
  This step is optional, but most PyTorch scripts use torchvision to load models; e.g., running the PyTorch examples requires torchvision.
- Commit the container to preserve the PyTorch install (from the host):
  sudo docker commit <container_id> -m 'pytorch installed'
To try the PyTorch examples:
- Clone the PyTorch examples repository:
  git clone https://github.com/pytorch/examples.git
- Run an individual example: MNIST
  cd examples/mnist
  Follow the instructions in README.md, in this case:
  pip install -r requirements.txt
  python main.py
- Run an individual example: try ImageNet training
  cd ../imagenet
  Follow the instructions in README.md.