
Building PyTorch for ROCm

iotamudelta edited this page Feb 28, 2019 · 78 revisions

General remarks

This is a quick guide to setting up PyTorch with ROCm support inside a docker container. It assumes a .deb-based system. See ROCm install for supported operating systems and general information on the ROCm software stack.

ROCm version 2.1 is currently required.

  1. Install or update rock-dkms on the host system:
    sudo apt-get install rock-dkms
    or
    sudo apt-get update
    sudo apt-get upgrade

Recommended: Install using published PyTorch ROCm docker image

  1. Obtain docker image:
    docker pull rocm/pytorch:rocm2.1_ubuntu16.04

  2. Clone PyTorch repository on the host:
    cd ~
    git clone https://github.com/pytorch/pytorch.git
    cd pytorch
    git submodule init
    git submodule update

  3. Start a docker container using the downloaded image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm2.1_ubuntu16.04
    Note: This will mount your host home directory on /data in the container.

  4. Change to previous PyTorch checkout from within the running docker:
    cd /data/pytorch

  5. Build PyTorch for ROCm:
    By default, PyTorch builds for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 denotes Vega10-type GPUs (MI25, Vega56, Vega64, ...), which work best. To compile only for your uarch, set PYTORCH_ROCM_ARCH to gfx803, gfx900, or gfx906 (for example, export PYTORCH_ROCM_ARCH=gfx900). Then build with
    .jenkins/pytorch/build.sh
    This first hipifies the PyTorch sources and then compiles using 4 concurrent jobs, which requires 16 GB of RAM to be available to the docker container.

  6. Confirm working installation:
    PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
    No tests should fail if the compilation and installation are correct.

  7. Install torchvision:
    pip install torchvision
    This step is optional but most PyTorch scripts will use torchvision to load models. E.g., running the pytorch examples requires torchvision.

  8. Commit the container to preserve the pytorch install (from the host):
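    sudo docker commit <container_id> -m 'pytorch installed'
    Note: The container was started with --rm and is removed on exit, so commit it while it is still running.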
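
The architecture choice in step 5 can be scripted. The following is a minimal sketch, not part of the PyTorch build system: it assumes that rocm_agent_enumerator prints one agent name per line (for example gfx000 for the CPU followed by gfx900), and the helper names pick_rocm_arch and detect_rocm_arch are hypothetical. It prints an export line that can be pasted into the shell before running the build script.

```python
import subprocess

# Build targets named in this guide.
SUPPORTED_ARCHS = ("gfx803", "gfx900", "gfx906")


def pick_rocm_arch(enumerator_output, supported=SUPPORTED_ARCHS):
    """Return the first supported gfx target found in the output, or None."""
    for line in enumerator_output.splitlines():
        name = line.strip()
        if name in supported:
            return name
    return None


def detect_rocm_arch(tool="/opt/rocm/bin/rocm_agent_enumerator"):
    """Run the enumerator (path as given in this guide); None if it cannot be run."""
    try:
        output = subprocess.check_output([tool], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return pick_rocm_arch(output)


if __name__ == "__main__":
    arch = detect_rocm_arch()
    if arch is None:
        # Leaving the variable unset keeps the default multi-architecture build.
        print("No supported GPU detected; PYTORCH_ROCM_ARCH left unset.")
    else:
        print("export PYTORCH_ROCM_ARCH={}".format(arch))
```

If several different GPUs are installed, this sketch only reports the first supported one; in that case the default multi-architecture build is probably the safer choice.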

Option 2: Install using PyTorch upstream docker file

  1. Clone PyTorch repository on the host:
    cd ~
    git clone https://github.com/pytorch/pytorch.git
    cd pytorch
    git submodule init
    git submodule update

  2. Build PyTorch docker image:
    cd ~/pytorch/docker/caffe2/jenkins
    ./build.sh py2-clang7-rocmdeb-ubuntu16.04
    This should complete with the message "Successfully built <image_id>".
    Note: Other software versions may be chosen, but such setups are currently not tested.

  3. Start a docker container using the new image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video <image_id>
    Note: This will mount your host home directory on /data in the container.

  4. Change to previous PyTorch checkout from within the running docker:
    cd /data/pytorch

  5. Build PyTorch for ROCm:
    By default, PyTorch builds for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 denotes Vega10-type GPUs (MI25, Vega56, Vega64, ...), which work best. To compile only for your uarch, set PYTORCH_ROCM_ARCH to gfx803, gfx900, or gfx906 (for example, export PYTORCH_ROCM_ARCH=gfx900). Then build with
    .jenkins/pytorch/build.sh
    This first hipifies the PyTorch sources and then compiles using 4 concurrent jobs, which requires 16 GB of RAM to be available to the docker container.

  6. Confirm working installation:
    PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
    No tests should fail if the compilation and installation are correct.

  7. Install torchvision:
    pip install torchvision
    This step is optional but most PyTorch scripts will use torchvision to load models. E.g., running the pytorch examples requires torchvision.

  8. Commit the container to preserve the pytorch install (from the host):
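    sudo docker commit <container_id> -m 'pytorch installed'
    Note: The container was started with --rm and is removed on exit, so commit it while it is still running.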
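
Before running the full test suite in step 6, a quick smoke test can show whether the build sees the GPU at all. The sketch below is illustrative and does not replace test/run_test.py. It relies on ROCm builds of PyTorch exposing the GPU through the torch.cuda interface; the function name rocm_smoke_test is hypothetical.

```python
def rocm_smoke_test():
    """Return a small status dict; does not raise if torch or a GPU is missing."""
    status = {"torch_imported": False, "gpu_available": False, "matmul_ok": False}
    try:
        import torch
    except ImportError:
        return status
    status["torch_imported"] = True
    # On a ROCm build, the HIP device is reported through torch.cuda.
    status["gpu_available"] = bool(torch.cuda.is_available())
    if status["gpu_available"]:
        a = torch.ones(2, 2, device="cuda")
        product = a.mm(a)  # every entry of ones(2, 2) times ones(2, 2) equals 2
        status["matmul_ok"] = bool((product.cpu() == 2).all())
    return status


if __name__ == "__main__":
    print(rocm_smoke_test())
```

If torch_imported is true but gpu_available is false, it may be worth checking that the container was started with the --device=/dev/kfd and --device=/dev/dri options shown in step 3.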

Option 3: Install using minimal ROCm docker file

  1. Download pytorch dockerfile:
    Dockerfile

  2. Build docker image:
    cd pytorch_docker
    sudo docker build .
    This should complete with a message "Successfully built <image_id>"

  3. Start a docker container using the new image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video <image_id>
    Note: This will mount your host home directory on /data in the container.

  4. Clone pytorch master (on the host):
    cd ~
    git clone https://github.com/pytorch/pytorch.git
    or
    git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
    cd pytorch
    git submodule init
    git submodule update

  5. Run "hipify" to prepare source code (in the container):
    cd /data/pytorch/
    python tools/amd_build/build_amd.py

  6. Build and install pytorch:
    By default, PyTorch builds for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 denotes Vega10-type GPUs (MI25, Vega56, Vega64, ...), which work best. To compile only for your uarch, set PYTORCH_ROCM_ARCH to gfx803, gfx900, or gfx906 (for example, export PYTORCH_ROCM_ARCH=gfx900). Then build with
    USE_ROCM=1 MAX_JOBS=4 python setup.py install --user
    Use MAX_JOBS=n to limit peak memory usage. If the build fails, try falling back to fewer jobs. 4 jobs assume available main memory of 16 GB or more.

  7. Confirm working installation:
    PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
    No tests should fail if the compilation and installation are correct.

  8. Install torchvision:
    pip install torchvision
    This step is optional but most PyTorch scripts will use torchvision to load models. E.g., running the pytorch examples requires torchvision.

  9. Commit the container to preserve the pytorch install (from the host):
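    sudo docker commit <container_id> -m 'pytorch installed'
    Note: The container was started with --rm and is removed on exit, so commit it while it is still running.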
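
The MAX_JOBS guidance in step 6 (4 jobs for 16 GB) can be turned into a rough estimate for other machines. The sketch below assumes about 4 GB of RAM per compile job, which is only an extrapolation from that single data point and not a documented requirement. It reads the MemAvailable field of /proc/meminfo on Linux; the function names are hypothetical.

```python
def suggest_max_jobs(available_gb, gb_per_job=4.0):
    """Suggest a MAX_JOBS value, assuming roughly gb_per_job GB of RAM per job."""
    return max(1, int(available_gb // gb_per_job))


def available_gb_from_meminfo(meminfo_text):
    """Convert the MemAvailable line of /proc/meminfo (in kB) to GB; None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])
            return kb / (1024.0 * 1024.0)
    return None


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            gb = available_gb_from_meminfo(handle.read())
    except IOError:
        gb = None
    if gb is None:
        print("Could not read available memory; choose MAX_JOBS manually.")
    else:
        print("MAX_JOBS={}".format(suggest_max_jobs(gb)))
```

The result is a starting point only. If the build still runs out of memory, fall back to fewer jobs as described in step 6.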

Try PyTorch examples

  1. Clone the PyTorch examples repository:
    git clone https://github.com/pytorch/examples.git

  2. Run individual example: MNIST
    cd examples/mnist
    Follow instructions in README.md, in this case:
    pip install -r requirements.txt
    python main.py

  3. Run individual example: Try ImageNet training
    cd ../imagenet
    Follow instructions in README.md.
