diff --git a/.circleci/README.md b/.circleci/README.md deleted file mode 100644 index 5b0d56d1df2e19..00000000000000 --- a/.circleci/README.md +++ /dev/null @@ -1,498 +0,0 @@ -Structure of CI -=============== - -setup job: -1. Does a git checkout -2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why? - We don't always do a Git checkout on all subjobs, but we usually - still want to be able to call scripts one way or another in a subjob. - Persisting files this way lets us have access to them without doing a - checkout. This workspace is conventionally mounted on `~/workspace` - (this is distinguished from `~/project`, which is the conventional - working directory that CircleCI will default to starting your jobs - in.) -3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so - we can determine in subjobs if we should actually run the jobs or - not, even if there isn't a Git checkout. - - - - -CircleCI configuration generator -================================ - -One may no longer make changes to the `.circleci/config.yml` file directly. -Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory. - - -Usage ----------- - -1. Make changes to these scripts. -2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`. - -You'll see a build failure on GitHub if the scripts don't agree with the checked-in version. - - -Motivation ----------- - -These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix. -The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content. - -Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate -multiple parts of the file. - -* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets - -Also see https://github.com/pytorch/pytorch/issues/17038 - - -Future direction ----------------- - -### Declaring sparse config subsets -See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747): - -In contrast with a full recursive tree traversal of configuration dimensions, -> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this. - ----------------- ----------------- - -# How do the binaries / nightlies / releases work? - -### What is a binary? - -A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source. - -A **binary configuration** is a collection of - -* release or nightly - * releases are stable, nightlies are beta and built every night -* python version - * linux: 3.7m (mu is wide unicode or something like that. It usually doesn't matter but you should know that it exists) - * macos: 3.7, 3.8 - * windows: 3.7, 3.8 -* cpu version - * cpu, cuda 9.0, cuda 10.0 - * The supported cuda versions occasionally change -* operating system - * Linux - these are all built on CentOS. 
There haven't been any problems in the past building on CentOS and using on Ubuntu - * MacOS - * Windows - these are built on Azure pipelines -* devtoolset version (gcc compiler version) - * This only matters on Linux cause only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string - -### Where are the binaries? - -The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months. - -We have 3 types of binary packages - -* pip packages - nightlies are stored on s3 (pip install -f \). releases are stored in a pip repo (pip install torch) (ask Soumith about this) -* conda packages - nightlies and releases are both stored in a conda repo. Nighty packages have a '_nightly' suffix -* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only - * shared with dependencies (the only supported option for Windows) - * static with dependencies - * shared without dependencies - * static without dependencies - -All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release) - -# CircleCI structure of the binaries - -Some quick vocab: - -* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows. -* **jobs** are a sequence of '**steps**' -* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps* -* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps. - -## How are the workflows structured? - -The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration - -1. binary_builds - 1. every day midnight EST - 2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml - 3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml - 4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a - 1. binary_linux_conda_3.7_cpu_build - 1. Builds the build. On linux jobs this uses the 'docker executor'. - 2. Persists the package to the workspace - 2. binary_linux_conda_3.7_cpu_test - 1. Loads the package to the workspace - 2. Spins up a docker image (on Linux), mapping the package and code repos into the docker - 3. Runs some smoke tests in the docker - 4. (Actually, for macos this is a step rather than a separate job) - 3. binary_linux_conda_3.7_cpu_upload - 1. Logs in to aws/conda - 2. Uploads the package -2. update_s3_htmls - 1. every day 5am EST - 2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml - 3. See below for what these are for and why they're needed - 4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3 -3. binarysmoketests - 1. every day - 2. 
https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml - 3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a - 1. smoke_linux_conda_3.7_cpu - 1. Downloads the package from the cloud, e.g. using the official pip or conda instructions - 2. Runs the smoke tests - -## How are the jobs structured? - -The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts . - -* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml - * binary_linux_build.sh - * binary_linux_test.sh - * binary_linux_upload.sh -* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml - * binary_macos_build.sh - * binary_macos_test.sh - * binary_macos_upload.sh -* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml - * These delegate from the pytorch/builder repo - * https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh - * https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh -* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml - * These delegate from the pytorch/builder repo - * https://github.com/pytorch/builder/blob/master/run_tests.sh - * https://github.com/pytorch/builder/blob/master/smoke_test.sh - * https://github.com/pytorch/builder/blob/master/check_binary.sh -* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml - * binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh - * binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps. - * binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables - * binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image - -### **Why do the steps all refer to scripts?** - -CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems. - -### **What is binary_run_in_docker for?** - -So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus - -* linux build jobs use the docker executor. 
Running them on the docker executor was at least 2x faster than running them on the machine executor -* linux test jobs use the machine executor in order for them to properly interface with GPUs since docker executors cannot execute with attached GPUs -* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use -* linux smoke test jobs use the machine executor for the same reason as the linux test jobs - -binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs - -### **Why does binary_checkout also checkout pytorch? Why shouldn't it?** - -We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to take a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from missing pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there's two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where. - -# Azure Pipelines structure of the binaries - -TODO: fill in stuff - -## How are the workflows structured? - -TODO: fill in stuff - -## How are the jobs structured? - -TODO: fill in stuff - -# Code structure of the binaries (circleci agnostic) - -## Overview - -The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is - - -``` -# All code needed to set-up environments for build code to run in, -# but only code that is specific to the current CI system -pytorch/pytorch -- .circleci/ # Folder that holds all circleci related stuff - - config.yml # GENERATED file that actually controls all circleci behavior - - verbatim-sources # Used to generate job/workflow sections in ^ - - scripts/ # Code needed to prepare circleci environments for binary build scripts - -- setup.py # Builds pytorch. This is wrapped in pytorch/builder -- cmake files # used in normal building of pytorch - -# All code needed to prepare a binary build, given an environment -# with all the right variables/packages/paths. -pytorch/builder - -# Given an installed binary and a proper python env, runs some checks -# to make sure the binary was built the proper way. Checks things like -# the library dependencies, symbols present, etc. -- check_binary.sh - -# Given an installed binary, runs python tests to make sure everything -# is in order. These should be de-duped. Right now they both run smoke -# tests, but are called from different places. Usually just call some -# import statements, but also has overlap with check_binary.sh above -- run_tests.sh -- smoke_test.sh - -# Folders that govern how packages are built. See paragraphs below - -- conda/ - - build_pytorch.sh # Entrypoint. 
Delegates to proper conda build folder - - switch_cuda_version.sh # Switches activate CUDA installation in Docker - - pytorch-nightly/ # Build-folder -- manywheel/ - - build_cpu.sh # Entrypoint for cpu builds - - build.sh # Entrypoint for CUDA builds - - build_common.sh # Actual build script that ^^ call into -- wheel/ - - build_wheel.sh # Entrypoint for wheel builds -- windows/ - - build_pytorch.bat # Entrypoint for wheel builds on Windows -``` - -Every type of package has an entrypoint build script that handles the all the important logic. - -## Conda - -Linux, MacOS and Windows use the same code flow for the conda builds. - -Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html - -Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies in what python environment to build the package in, and what dependencies the resulting package should have, and the build script gets called in the env to build the thing. -tl;dr on conda-build is - -1. Creates a brand new conda environment, based off of deps in the meta.yaml - 1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml - 2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is. -2. Calls build.sh in the environment -3. Copies the finished package to a new conda env, also specified by the meta.yaml -4. Runs some simple import tests (if specified in the meta.yaml) -5. Saves the finished package as a tarball - -The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths. - -The entrypoint file `builder/conda/build_conda.sh` is complicated because - -* It works for Linux, MacOS and Windows - * The mac builds used to create their own environments, since they all used to be on the same machine. There’s now a lot of extra logic to handle conda envs. This extra machinery could be removed -* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed. - -## Manywheels (linux pip and libtorch packages) - -Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant. - -`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh` - -The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because - -* This used to handle building for several different python versions at the same time. The loops have been removed, but there's still unnecessary folders and movements here and there. - * The script is never used this way anymore. This extra machinery could be removed. -* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff - * The script is never used this way anymore. This extra machinery could be removed. -* This also builds libtorch packages - * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file. 
-* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed. - -## Wheels (MacOS pip and libtorch packages) - -The entrypoint file `builder/wheel/build_wheel.sh` is complicated because - -* The mac builds used to all run on one machine (we didn’t have autoscaling mac machines till circleci). So this script handled siloing itself by setting-up and tearing-down its build env and siloing itself into its own build directory. - * The script is never used this way anymore. This extra machinery could be removed. -* This also builds libtorch packages - * Ditto the comment above. This should definitely be separated out. - -Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda. - -## Windows Wheels (Windows pip and libtorch packages) - -The entrypoint file `builder/windows/build_pytorch.bat` is complicated because - -* This used to handle building for several different python versions at the same time. This is why there are loops everywhere - * The script is never used this way anymore. This extra machinery could be removed. -* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff - * The script is never used this way anymore. This extra machinery could be removed. -* This also builds libtorch packages - * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file. - -Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda. - -## General notes - -### Note on run_tests.sh, smoke_test.sh, and check_binary.sh - -* These should all be consolidated -* These must run on all OS types: MacOS, Linux, and Windows -* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn’t mess anything up. -* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package. - -### Note on libtorch - -Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this - -* It’s confusing. Most of those scripts deal with python specifics. -* The extra conditionals everywhere severely complicate the wheel build scripts -* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script) - -### Note on docker images / Dockerfiles - -All linux builds occur in docker images. The docker images are - -* pytorch/conda-cuda - * Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. 
/usr/local/cuda-10.0 to enable different CUDA builds - * Also used for cpu builds -* pytorch/manylinux-cuda90 -* pytorch/manylinux-cuda100 - * Also used for cpu builds - -The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now. - -### General Python - -* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2 - -# How to manually rebuild the binaries - -tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159 - -Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to check out what you want. - -## How to test changes to the binaries via .circleci - -Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow (a sketch of what such a hardcoded workflow entry can look like is shown at the end of this section), so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this. - -```sh -# Make your changes -touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml - -# Regenerate the yaml, has to be in python 3.7 -.circleci/regenerate.sh - -# Make a commit -git add .circleci * -git commit -m "My real changes" -git push origin my_branch - -# Now hardcode the jobs that you want in the .circleci/config.yml workflows section -# Also eliminate ensure-consistency and should_run_job checks -# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d - -# Make a commit you won't keep -git add .circleci -git commit -m "[DO NOT LAND] testing binaries for above changes" -git push origin my_branch - -# Now you need to make some changes to the first commit. -git rebase -i HEAD~2 # mark the first commit as 'edit' - -# Make the changes -touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml -.circleci/regenerate.sh - -# Amend the commit and continue the rebase -git add .circleci -git commit --amend -git rebase --continue - -# Update the PR, need to force since the commits are different now -git push origin my_branch --force -``` - -The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
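For a concrete picture of the "temporarily hardcode jobs" step above, the sketch below shows roughly what a hand-added build/test pair can look like under the `build` workflow in the regenerated `.circleci/config.yml`. The job types, docker image, and `build_environment` are copied from entries that already exist in this config; the branch name `my_branch` and the exact field set are illustrative assumptions, not the generator's exact output.

```yaml
# Rough sketch only: a temporarily hardcoded binary build/test pair.
# Field names mirror existing entries in .circleci/config.yml; the
# branch filter is the part you loosen so the jobs run on your PR branch.
workflows:
  build:
    jobs:
      - binary_linux_build:
          name: binary_linux_manywheel_3_7m_cu102_devtoolset7_build
          build_environment: "manywheel 3.7m cu102 devtoolset7"
          docker_image: "pytorch/manylinux-cuda102"
          filters:
            branches:
              only:
                - my_branch   # hypothetical PR branch; normally master/ci-all/release
      - binary_linux_test:
          name: binary_linux_manywheel_3_7m_cu102_devtoolset7_test
          build_environment: "manywheel 3.7m cu102 devtoolset7"
          docker_image: "pytorch/manylinux-cuda102"
          requires:
            - binary_linux_manywheel_3_7m_cu102_devtoolset7_build
          resource_class: gpu.nvidia.small   # CUDA smoke tests need a GPU machine
          use_cuda_docker_runtime: "1"
```

This is the kind of edit that lives in the `[DO NOT LAND]` commit in the flow above; re-running `.circleci/regenerate.sh` overwrites it, which is exactly why the real changes and the hardcoded jobs are kept in separate commits.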
- -## How to build a binary locally - -### Linux - -You can build Linux binaries locally easily using docker. - -```sh -# Run the docker -# Use the correct docker image, pytorch/conda-cuda used here as an example -# -# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the -# machine that you're running the command on) accessible to the docker -# container at path/to/bar. So if you then run `touch path/to/bar/baz` -# in the docker container then you will see path/to/foo/baz on your local -# machine. You could also clone the pytorch and builder repos in the docker. -# -# If you know how, add ccache as a volume too and speed up everything -docker run \ - -v your/pytorch/repo:/pytorch \ - -v your/builder/repo:/builder \ - -v where/you/want/packages/to/appear:/final_pkgs \ - -it pytorch/conda-cuda /bin/bash - -# Export whatever variables are important to you. All variables that you'd -# possibly need are in .circleci/scripts/binary_populate_env.sh -# You should probably always export at least these 3 variables -export PACKAGE_TYPE=conda -export DESIRED_PYTHON=3.7 -export DESIRED_CUDA=cpu - -# Call the entrypoint -# `|& tee foo.log` just copies all stdout and stderr output to foo.log -# The builds generate lots of output so you probably need this when -# building locally. -/builder/conda/build_pytorch.sh |& tee build_output.log -``` - -**Building CUDA binaries on docker** - -You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though it’s gonna take a long time). - -For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast. - -### MacOS - -There’s no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If you’re trying to repro an error on a Mac build in .circleci and you can’t seem to repro locally, then my best advice is actually to iterate on .circleci :/ - -But if you want to try, then I’d recommend - -```sh -# Create a new terminal -# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you -# know how to do - -# Install a new miniconda -# First remove any other python or conda installation from your PATH -# Always install miniconda 3, even if building for Python <3 -new_conda="~/my_new_conda" -conda_sh="$new_conda/install_miniconda.sh" -curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -chmod +x "$conda_sh" -"$conda_sh" -b -p "$MINICONDA_ROOT" -rm -f "$conda_sh" -export PATH="~/my_new_conda/bin:$PATH" - -# Create a clean python env -# All MacOS builds use conda to manage the python env and dependencies -# that are built with, even the pip packages -conda create -yn binary python=2.7 -conda activate binary - -# Export whatever variables are important to you. All variables that you'd -# possibly need are in .circleci/scripts/binary_populate_env.sh -# You should probably always export at least these 3 variables -export PACKAGE_TYPE=conda -export DESIRED_PYTHON=3.7 -export DESIRED_CUDA=cpu - -# Call the entrypoint you want -path/to/builder/wheel/build_wheel.sh -``` - -N.B. installing a brand new miniconda is important. This has to do with how conda installations work. 
See the “General Python” section above, but tldr; is that - -1. You make the ‘conda’ command accessible by prepending `path/to/conda_root/bin` to your PATH. -2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH` -3. Now say you (or some code that you ran) call python executable `foo` - 1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected. - 2. But if you forgot to installed `foo` in `new_env` but happened to previously install it in your root conda env (called ‘base’), then unix/linux will still find `path/to/conda_root/bin/foo` . This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version! - -Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe. - -### Windows - -TODO: fill in diff --git a/.circleci/cimodel/data/binary_build_data.py b/.circleci/cimodel/data/binary_build_data.py index 1c714186568f93..5df203b6ce395f 100644 --- a/.circleci/cimodel/data/binary_build_data.py +++ b/.circleci/cimodel/data/binary_build_data.py @@ -31,13 +31,6 @@ def get_processor_arch_name(gpu_version): ) CONFIG_TREE_DATA = OrderedDict( - windows=( - # Stop building Win+CU102, see https://github.com/pytorch/pytorch/issues/65648 - [v for v in dimensions.GPU_VERSIONS if v not in dimensions.ROCM_VERSION_LABELS and v != "cuda102"], - OrderedDict( - conda=dimensions.STANDARD_PYTHON_VERSIONS, - ) - ), ) # GCC config variants: diff --git a/.circleci/cimodel/data/dimensions.py b/.circleci/cimodel/data/dimensions.py index 57af81de7157eb..efdc363579003b 100644 --- a/.circleci/cimodel/data/dimensions.py +++ b/.circleci/cimodel/data/dimensions.py @@ -4,6 +4,7 @@ "102", "113", "115", + "116", ] ROCM_VERSIONS = [ diff --git a/.circleci/cimodel/data/pytorch_build_definitions.py b/.circleci/cimodel/data/pytorch_build_definitions.py index 036e8a5991919f..e3b9365b6f2607 100644 --- a/.circleci/cimodel/data/pytorch_build_definitions.py +++ b/.circleci/cimodel/data/pytorch_build_definitions.py @@ -185,7 +185,7 @@ def gen_docs_configs(xenial_parent_config): HiddenConf( "pytorch_python_doc_build", parent_build=xenial_parent_config, - filters=gen_filter_dict(branches_list=["master", "nightly"], + filters=gen_filter_dict(branches_list=["master", "main", "nightly"], tags_list=RC_PATTERN), ) ) @@ -201,7 +201,7 @@ def gen_docs_configs(xenial_parent_config): HiddenConf( "pytorch_cpp_doc_build", parent_build=xenial_parent_config, - filters=gen_filter_dict(branches_list=["master", "nightly"], + filters=gen_filter_dict(branches_list=["master", "main", "nightly"], tags_list=RC_PATTERN), ) ) diff --git a/.circleci/cimodel/data/simple/android_definitions.py b/.circleci/cimodel/data/simple/android_definitions.py deleted file mode 100644 index fb6d6f5661b8a0..00000000000000 --- a/.circleci/cimodel/data/simple/android_definitions.py +++ /dev/null @@ -1,103 +0,0 @@ -import cimodel.data.simple.util.branch_filters as branch_filters -from cimodel.data.simple.util.docker_constants import ( - DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK -) - - -class AndroidJob: - def __init__(self, - variant, - template_name, - is_master_only=True): - - self.variant = variant - self.template_name = template_name - self.is_master_only = is_master_only - - def gen_tree(self): - - base_name_parts = [ - "pytorch", - "linux", - "xenial", - "py3", - "clang5", - "android", - "ndk", - "r19c", - 
] + self.variant + [ - "build", - ] - - full_job_name = "_".join(base_name_parts) - build_env_name = "-".join(base_name_parts) - - props_dict = { - "name": full_job_name, - "build_environment": "\"{}\"".format(build_env_name), - "docker_image": "\"{}\"".format(DOCKER_IMAGE_NDK), - "requires": [DOCKER_REQUIREMENT_NDK] - } - - if self.is_master_only: - props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST) - - return [{self.template_name: props_dict}] - - -class AndroidGradleJob: - def __init__(self, - job_name, - template_name, - dependencies, - is_master_only=True, - is_pr_only=False, - extra_props=tuple()): - - self.job_name = job_name - self.template_name = template_name - self.dependencies = dependencies - self.is_master_only = is_master_only - self.is_pr_only = is_pr_only - self.extra_props = dict(extra_props) - - def gen_tree(self): - - props_dict = { - "name": self.job_name, - "requires": self.dependencies, - } - - if self.is_master_only: - props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST) - elif self.is_pr_only: - props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.PR_BRANCH_LIST) - if self.extra_props: - props_dict.update(self.extra_props) - - return [{self.template_name: props_dict}] - - -WORKFLOW_DATA = [ - AndroidJob(["x86_32"], "pytorch_linux_build", is_master_only=False), - AndroidJob(["x86_64"], "pytorch_linux_build"), - AndroidJob(["arm", "v7a"], "pytorch_linux_build"), - AndroidJob(["arm", "v8a"], "pytorch_linux_build"), - AndroidGradleJob( - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", - "pytorch_android_gradle_build-x86_32", - ["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build"], - is_master_only=False, - is_pr_only=True), - AndroidGradleJob( - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build", - "pytorch_android_gradle_build", - ["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", - "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build", - "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build", - "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]), -] - - -def get_workflow_jobs(): - return [item.gen_tree() for item in WORKFLOW_DATA] diff --git a/.circleci/cimodel/data/simple/binary_smoketest.py b/.circleci/cimodel/data/simple/binary_smoketest.py deleted file mode 100644 index 6d1d421d029cca..00000000000000 --- a/.circleci/cimodel/data/simple/binary_smoketest.py +++ /dev/null @@ -1,193 +0,0 @@ -""" -TODO: Refactor circleci/cimodel/data/binary_build_data.py to generate this file - instead of doing one offs here - Binary builds (subset, to smoke test that they'll work) - - NB: If you modify this file, you need to also modify - the binary_and_smoke_tests_on_pr variable in - pytorch-ci-hud to adjust the allowed build list - at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js - - Note: - This binary build is currently broken, see https://github_com/pytorch/pytorch/issues/16710 - - binary_linux_conda_3_6_cu90_devtoolset7_build - - binary_linux_conda_3_6_cu90_devtoolset7_test - - TODO - we should test a libtorch cuda build, but they take too long - - binary_linux_libtorch_3_6m_cu90_devtoolset7_static-without-deps_build -""" - -import cimodel.lib.miniutils as miniutils -import cimodel.data.simple.util.branch_filters - - -class SmoketestJob: - def __init__(self, - template_name, - build_env_parts, - docker_image, - job_name, - is_master_only=False, - 
requires=None, - has_libtorch_variant=False, - extra_props=None): - - self.template_name = template_name - self.build_env_parts = build_env_parts - self.docker_image = docker_image - self.job_name = job_name - self.is_master_only = is_master_only - self.requires = requires or [] - self.has_libtorch_variant = has_libtorch_variant - self.extra_props = extra_props or {} - - def gen_tree(self): - - props_dict = { - "build_environment": " ".join(self.build_env_parts), - "name": self.job_name, - "requires": self.requires, - } - - if self.docker_image: - props_dict["docker_image"] = self.docker_image - - if self.is_master_only: - props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict() - - if self.has_libtorch_variant: - props_dict["libtorch_variant"] = "shared-with-deps" - - props_dict.update(self.extra_props) - - return [{self.template_name: props_dict}] - - -WORKFLOW_DATA = [ - SmoketestJob( - "binary_linux_build", - ["manywheel", "3.7m", "cu102", "devtoolset7"], - "pytorch/manylinux-cuda102", - "binary_linux_manywheel_3_7m_cu102_devtoolset7_build", - is_master_only=True, - ), - SmoketestJob( - "binary_linux_build", - ["libtorch", "3.7m", "cpu", "devtoolset7"], - "pytorch/manylinux-cuda102", - "binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build", - is_master_only=True, - has_libtorch_variant=True, - ), - SmoketestJob( - "binary_linux_build", - ["libtorch", "3.7m", "cpu", "gcc5.4_cxx11-abi"], - "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest", - "binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", - is_master_only=False, - has_libtorch_variant=True, - ), - SmoketestJob( - "binary_mac_build", - ["wheel", "3.7", "cpu"], - None, - "binary_macos_wheel_3_7_cpu_build", - is_master_only=True, - ), - # This job has an average run time of 3 hours o.O - # Now only running this on master to reduce overhead - SmoketestJob( - "binary_mac_build", - ["libtorch", "3.7", "cpu"], - None, - "binary_macos_libtorch_3_7_cpu_build", - is_master_only=True, - ), - SmoketestJob( - "binary_windows_build", - ["libtorch", "3.7", "cpu", "debug"], - None, - "binary_windows_libtorch_3_7_cpu_debug_build", - is_master_only=True, - ), - SmoketestJob( - "binary_windows_build", - ["libtorch", "3.7", "cpu", "release"], - None, - "binary_windows_libtorch_3_7_cpu_release_build", - is_master_only=True, - ), - SmoketestJob( - "binary_windows_build", - ["wheel", "3.7", "cu113"], - None, - "binary_windows_wheel_3_7_cu113_build", - is_master_only=True, - ), - - SmoketestJob( - "binary_windows_test", - ["libtorch", "3.7", "cpu", "debug"], - None, - "binary_windows_libtorch_3_7_cpu_debug_test", - is_master_only=True, - requires=["binary_windows_libtorch_3_7_cpu_debug_build"], - ), - SmoketestJob( - "binary_windows_test", - ["libtorch", "3.7", "cpu", "release"], - None, - "binary_windows_libtorch_3_7_cpu_release_test", - is_master_only=False, - requires=["binary_windows_libtorch_3_7_cpu_release_build"], - ), - SmoketestJob( - "binary_windows_test", - ["wheel", "3.7", "cu113"], - None, - "binary_windows_wheel_3_7_cu113_test", - is_master_only=True, - requires=["binary_windows_wheel_3_7_cu113_build"], - extra_props={ - "executor": "windows-with-nvidia-gpu", - }, - ), - - - - SmoketestJob( - "binary_linux_test", - ["manywheel", "3.7m", "cu102", "devtoolset7"], - "pytorch/manylinux-cuda102", - "binary_linux_manywheel_3_7m_cu102_devtoolset7_test", - is_master_only=True, - requires=["binary_linux_manywheel_3_7m_cu102_devtoolset7_build"], - extra_props={ - "resource_class": 
"gpu.nvidia.small", - "use_cuda_docker_runtime": miniutils.quote((str(1))), - }, - ), - SmoketestJob( - "binary_linux_test", - ["libtorch", "3.7m", "cpu", "devtoolset7"], - "pytorch/manylinux-cuda102", - "binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test", - is_master_only=True, - requires=["binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build"], - has_libtorch_variant=True, - ), - SmoketestJob( - "binary_linux_test", - ["libtorch", "3.7m", "cpu", "gcc5.4_cxx11-abi"], - "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest", - "binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test", - is_master_only=True, - requires=["binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build"], - has_libtorch_variant=True, - ), -] - - -def get_workflow_jobs(): - return [item.gen_tree() for item in WORKFLOW_DATA] diff --git a/.circleci/cimodel/data/simple/nightly_android.py b/.circleci/cimodel/data/simple/nightly_android.py deleted file mode 100644 index c6da5bbc4c76b1..00000000000000 --- a/.circleci/cimodel/data/simple/nightly_android.py +++ /dev/null @@ -1,77 +0,0 @@ -from cimodel.data.simple.util.docker_constants import ( - DOCKER_IMAGE_NDK, - DOCKER_REQUIREMENT_NDK -) - - -class AndroidNightlyJob: - def __init__(self, - variant, - template_name, - extra_props=None, - with_docker=True, - requires=None, - no_build_suffix=False): - - self.variant = variant - self.template_name = template_name - self.extra_props = extra_props or {} - self.with_docker = with_docker - self.requires = requires - self.no_build_suffix = no_build_suffix - - def gen_tree(self): - - base_name_parts = [ - "pytorch", - "linux", - "xenial", - "py3", - "clang5", - "android", - "ndk", - "r19c", - ] + self.variant - - build_suffix = [] if self.no_build_suffix else ["build"] - full_job_name = "_".join(["nightly"] + base_name_parts + build_suffix) - build_env_name = "-".join(base_name_parts) - - props_dict = { - "name": full_job_name, - "requires": self.requires, - "filters": {"branches": {"only": "nightly"}}, - } - - props_dict.update(self.extra_props) - - if self.with_docker: - props_dict["docker_image"] = DOCKER_IMAGE_NDK - props_dict["build_environment"] = build_env_name - - return [{self.template_name: props_dict}] - -BASE_REQUIRES = [DOCKER_REQUIREMENT_NDK] - -WORKFLOW_DATA = [ - AndroidNightlyJob(["x86_32"], "pytorch_linux_build", requires=BASE_REQUIRES), - AndroidNightlyJob(["x86_64"], "pytorch_linux_build", requires=BASE_REQUIRES), - AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build", requires=BASE_REQUIRES), - AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build", requires=BASE_REQUIRES), - AndroidNightlyJob(["android_gradle"], "pytorch_android_gradle_build", - with_docker=False, - requires=[ - "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", - "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build", - "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build", - "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]), - AndroidNightlyJob(["x86_32_android_publish_snapshot"], "pytorch_android_publish_snapshot", - extra_props={"context": "org-member"}, - with_docker=False, - requires=["nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build"], - no_build_suffix=True), -] - - -def get_workflow_jobs(): - return [item.gen_tree() for item in WORKFLOW_DATA] diff --git a/.circleci/cimodel/data/simple/util/branch_filters.py b/.circleci/cimodel/data/simple/util/branch_filters.py 
index dfbc6e4d63bc90..ba4e00a059ef1c 100644 --- a/.circleci/cimodel/data/simple/util/branch_filters.py +++ b/.circleci/cimodel/data/simple/util/branch_filters.py @@ -1,4 +1,5 @@ NON_PR_BRANCH_LIST = [ + "main", "master", r"/ci-all\/.*/", r"/release\/.*/", diff --git a/.circleci/config.yml b/.circleci/config.yml index 57f2fba481373a..8b5d8b87793b57 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -455,234 +455,6 @@ promote_common: &promote_common # Job specs ############################################################################## jobs: - pytorch_linux_build: - <<: *pytorch_params - machine: - image: ubuntu-2004:202104-01 - steps: - # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml - - checkout - - calculate_docker_image_tag - - setup_linux_system_environment - - optional_merge_target_branch - - setup_ci_environment - - run: - name: Build - no_output_timeout: "1h" - command: | - set -e - if [[ ${BUILD_ENVIRONMENT} == *"pure_torch"* ]]; then - echo 'BUILD_CAFFE2=OFF' >> "${BASH_ENV}" - fi - if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}" - echo 'USE_TBB=1' >> "${BASH_ENV}" - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}" - fi - echo "Parallel backend flags: "${PARALLEL_FLAGS} - # Pull Docker image and run build - echo "DOCKER_IMAGE: "${DOCKER_IMAGE}:${DOCKER_TAG} - time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null - export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG}) - - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - - docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace - - export COMMAND='((echo "sudo chown -R jenkins workspace && export JOB_BASE_NAME="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1' - - echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts - - # Copy dist folder back - docker cp $id:/var/lib/jenkins/workspace/dist /home/circleci/project/. || echo "Dist folder not found" - - # Push intermediate Docker image for next phase to use - if [ -z "${BUILD_ONLY}" ]; then - # Note [Special build images] - # The xla build uses the same docker image as - # pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to - # distinguish between them so the test can pick up the correct image. 
- output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1} - if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-xla - elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-libtorch - elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-parallelnative - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64 - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v8a"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_32"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-x86_32 - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-vulkan-x86_32"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-vulkan-x86_32 - elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-vulkan - else - export COMMIT_DOCKER_IMAGE=$output_image - fi - docker commit "$id" ${COMMIT_DOCKER_IMAGE} - time docker push ${COMMIT_DOCKER_IMAGE} - fi - - run: - name: upload build & binary data - no_output_timeout: "5m" - command: | - cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - python3 -mpip install requests && \ - SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \ - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - store_artifacts: - path: /home/circleci/project/dist - - pytorch_linux_test: - <<: *pytorch_params - machine: - image: ubuntu-2004:202104-01 - steps: - # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml - - checkout - - calculate_docker_image_tag - - setup_linux_system_environment - - setup_ci_environment - - run: - name: Download Docker image - no_output_timeout: "90m" - command: | - set -e - export PYTHONUNBUFFERED=1 - if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then - export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6" - fi - # See Note [Special build images] - output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1} - if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-xla - elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-libtorch - elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-parallelnative - elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-vulkan - else - export COMMIT_DOCKER_IMAGE=$output_image - fi - echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE} - - if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}" - echo 'USE_TBB=1' >> "${BASH_ENV}" - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}" - fi - echo "Parallel backend flags: "${PARALLEL_FLAGS} - - time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null - - # TODO: Make this less painful - if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then - export id=$(docker run --env-file 
"${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all --shm-size=2g -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - elif [[ ${BUILD_ENVIRONMENT} == *"rocm"* ]]; then - hostname - export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=8g --ipc=host --device /dev/kfd --device /dev/dri --group-add video -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - else - export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=1g --ipc=host -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - fi - echo "id=${id}" >> "${BASH_ENV}" - - - run: - name: Check for no AVX instruction by default - no_output_timeout: "20m" - command: | - set -e - is_vanilla_build() { - if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-bionic-py3.7-clang9-test" ]; then - return 0 - fi - if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-xenial-py3.7-gcc5.4-test" ]; then - return 0 - fi - return 1 - } - - if is_vanilla_build; then - echo "apt-get update || apt-get install libgnutls30" | docker exec -u root -i "$id" bash - echo "apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash - echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash - else - echo "Skipping for ${BUILD_ENVIRONMENT}" - fi - - run: - name: Test - no_output_timeout: "90m" - command: | - set -e - - cat >docker_commands.sh \<> docker_commands.sh - elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then - echo ".jenkins/caffe2/test.sh" >> docker_commands.sh - else - echo ".jenkins/pytorch/test.sh" >> docker_commands.sh - fi - echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh - unbuffer bash command.sh | ts - - - run: - name: Report results - no_output_timeout: "5m" - command: | - set -e - # Retrieving test results should be done as very first step as command never fails - # But is always executed if previous step fails for some reason - echo "Retrieving test reports" - docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!' 
- docker stats --all --no-stream - - cat >docker_commands.sh \<&1" > command.sh - unbuffer bash command.sh | ts - when: always - - store_test_results: - path: test-reports binary_linux_build: <<: *binary_linux_build_params steps: @@ -1085,7 +857,7 @@ jobs: parameters: branch: type: string - default: "master" + default: "main" steps: - attach_workspace: at: /tmp/workspace @@ -1125,7 +897,7 @@ jobs: echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE} # turn v1.12.0rc3 into 1.12 tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/') - target=${tag:-master} + target=${tag:-main} echo "building for ${target}" time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) @@ -1135,7 +907,7 @@ jobs: echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts mkdir -p ~/workspace/build_artifacts - docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/master ~/workspace/build_artifacts + docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/main ~/workspace/build_artifacts docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io /tmp/workspace # Save the docs build so we can debug any problems @@ -1147,7 +919,7 @@ jobs: paths: - . - store_artifacts: - path: ~/workspace/build_artifacts/master + path: ~/workspace/build_artifacts/main destination: docs pytorch_cpp_doc_build: @@ -1171,12 +943,12 @@ jobs: echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE} # turn v1.12.0rc3 into 1.12 tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/') - target=${tag:-master} + target=${tag:-main} echo "building for ${target}" time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" master") | docker exec -u jenkins -i "$id" bash) 2>&1' + export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . 
./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" main") | docker exec -u jenkins -i "$id" bash) 2>&1' echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts @@ -1660,7 +1432,7 @@ jobs: time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG}) - echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT" + echo "Do NOT merge main branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT" git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 @@ -1904,654 +1676,8 @@ jobs: # Workflows ############################################################################## workflows: - binary_builds: - jobs: - - binary_windows_build: - name: binary_windows_conda_3_7_cpu_nightly_build - build_environment: "conda 3.7 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_8_cpu_nightly_build - build_environment: "conda 3.8 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_9_cpu_nightly_build - build_environment: "conda 3.9 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_10_cpu_nightly_build - build_environment: "conda 3.10 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_7_cu113_nightly_build - build_environment: "conda 3.7 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_8_cu113_nightly_build - build_environment: "conda 3.8 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_9_cu113_nightly_build - build_environment: "conda 3.9 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_10_cu113_nightly_build - build_environment: "conda 3.10 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_7_cu115_nightly_build - build_environment: "conda 3.7 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_8_cu115_nightly_build - build_environment: "conda 3.8 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_9_cu115_nightly_build - build_environment: "conda 3.9 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_build: - name: binary_windows_conda_3_10_cu115_nightly_build - build_environment: "conda 3.10 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - - binary_windows_test: - name: binary_windows_conda_3_7_cpu_nightly_test - build_environment: "conda 3.7 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - 
/v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_7_cpu_nightly_build - - binary_windows_test: - name: binary_windows_conda_3_8_cpu_nightly_test - build_environment: "conda 3.8 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_8_cpu_nightly_build - - binary_windows_test: - name: binary_windows_conda_3_9_cpu_nightly_test - build_environment: "conda 3.9 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_9_cpu_nightly_build - - binary_windows_test: - name: binary_windows_conda_3_10_cpu_nightly_test - build_environment: "conda 3.10 cpu" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_10_cpu_nightly_build - - binary_windows_test: - name: binary_windows_conda_3_7_cu113_nightly_test - build_environment: "conda 3.7 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_7_cu113_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_8_cu113_nightly_test - build_environment: "conda 3.8 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_8_cu113_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_9_cu113_nightly_test - build_environment: "conda 3.9 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_9_cu113_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_10_cu113_nightly_test - build_environment: "conda 3.10 cu113" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_10_cu113_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_7_cu115_nightly_test - build_environment: "conda 3.7 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_7_cu115_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_8_cu115_nightly_test - build_environment: "conda 3.8 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_8_cu115_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_9_cu115_nightly_test - build_environment: "conda 3.9 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_9_cu115_nightly_build - executor: windows-with-nvidia-gpu - - binary_windows_test: - name: binary_windows_conda_3_10_cu115_nightly_test - build_environment: "conda 3.10 cu115" - filters: - branches: - only: - - /.*/ - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - requires: - - binary_windows_conda_3_10_cu115_nightly_build - executor: windows-with-nvidia-gpu - - binary_upload: - name: binary_windows_conda_3_7_cpu_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_7_cpu_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - 
package_type: conda - upload_subfolder: cpu - - binary_upload: - name: binary_windows_conda_3_8_cpu_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_8_cpu_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cpu - - binary_upload: - name: binary_windows_conda_3_9_cpu_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_9_cpu_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cpu - - binary_upload: - name: binary_windows_conda_3_10_cpu_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_10_cpu_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cpu - - binary_upload: - name: binary_windows_conda_3_7_cu113_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_7_cu113_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu113 - - binary_upload: - name: binary_windows_conda_3_8_cu113_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_8_cu113_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu113 - - binary_upload: - name: binary_windows_conda_3_9_cu113_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_9_cu113_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu113 - - binary_upload: - name: binary_windows_conda_3_10_cu113_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_10_cu113_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu113 - - binary_upload: - name: binary_windows_conda_3_7_cu115_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_7_cu115_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu115 - - binary_upload: - name: binary_windows_conda_3_8_cu115_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_8_cu115_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu115 - - binary_upload: - name: binary_windows_conda_3_9_cu115_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_9_cu115_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu115 - - binary_upload: - name: binary_windows_conda_3_10_cu115_nightly_upload - context: org-member - requires: - - binary_windows_conda_3_10_cu115_nightly_test - filters: - branches: - only: - - nightly - tags: - only: - - /v[0-9]+(\.[0-9]+)*-rc[0-9]+/ - package_type: conda - upload_subfolder: cu115 - when: << pipeline.parameters.run_binary_tests >> build: jobs: - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build" - docker_image: 
"308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_android_gradle_build-x86_32: - filters: - branches: - only: - - /gh\/.*\/head/ - - /pull\/.*/ - name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32 - requires: - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - - pytorch_android_gradle_build: - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build - requires: - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build - - binary_linux_build: - build_environment: manywheel 3.7m cu102 devtoolset7 - docker_image: pytorch/manylinux-cuda102 - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_linux_manywheel_3_7m_cu102_devtoolset7_build - - binary_linux_build: - build_environment: libtorch 3.7m cpu devtoolset7 - docker_image: pytorch/manylinux-cuda102 - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build - - binary_linux_build: - build_environment: libtorch 3.7m cpu gcc5.4_cxx11-abi - docker_image: pytorch/pytorch-binary-docker-image-ubuntu16.04:latest - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build - - binary_mac_build: - build_environment: wheel 3.7 cpu - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_macos_wheel_3_7_cpu_build - - binary_mac_build: - build_environment: libtorch 3.7 cpu - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_macos_libtorch_3_7_cpu_build - - binary_windows_build: - build_environment: 
libtorch 3.7 cpu debug - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_windows_libtorch_3_7_cpu_debug_build - - binary_windows_build: - build_environment: libtorch 3.7 cpu release - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_windows_libtorch_3_7_cpu_release_build - - binary_windows_build: - build_environment: wheel 3.7 cu113 - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_windows_wheel_3_7_cu113_build - - binary_windows_test: - build_environment: libtorch 3.7 cpu debug - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_windows_libtorch_3_7_cpu_debug_test - requires: - - binary_windows_libtorch_3_7_cpu_debug_build - - binary_windows_test: - build_environment: libtorch 3.7 cpu release - name: binary_windows_libtorch_3_7_cpu_release_test - requires: - - binary_windows_libtorch_3_7_cpu_release_build - - binary_windows_test: - build_environment: wheel 3.7 cu113 - executor: windows-with-nvidia-gpu - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_windows_wheel_3_7_cu113_test - requires: - - binary_windows_wheel_3_7_cu113_build - - binary_linux_test: - build_environment: manywheel 3.7m cu102 devtoolset7 - docker_image: pytorch/manylinux-cuda102 - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - name: binary_linux_manywheel_3_7m_cu102_devtoolset7_test - requires: - - binary_linux_manywheel_3_7m_cu102_devtoolset7_build - resource_class: gpu.nvidia.small - use_cuda_docker_runtime: "1" - - binary_linux_test: - build_environment: libtorch 3.7m cpu devtoolset7 - docker_image: pytorch/manylinux-cuda102 - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test - requires: - - binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build - - binary_linux_test: - build_environment: libtorch 3.7m cpu gcc5.4_cxx11-abi - docker_image: pytorch/pytorch-binary-docker-image-ubuntu16.04:latest - filters: - branches: - only: - - master - - /ci-all\/.*/ - - /release\/.*/ - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test - requires: - - binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build - binary_ios_build: build_environment: libtorch-ios-12.5.1-nightly-x86_64-build context: org-member @@ -2617,60 +1743,6 @@ workflows: requires: - pytorch_ios_full_jit_12_5_1_nightly_x86_64_build - pytorch_ios_full_jit_12_5_1_nightly_arm64_build - - pytorch_linux_build: - build_environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32 - docker_image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - filters: - branches: - only: nightly - name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64 - docker_image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - filters: - branches: - only: nightly - name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - 
build_environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a - docker_image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - filters: - branches: - only: nightly - name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a - docker_image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - filters: - branches: - only: nightly - name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_android_gradle_build: - filters: - branches: - only: nightly - name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build - requires: - - nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - - nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build - - nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build - - nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build - - pytorch_android_publish_snapshot: - context: org-member - filters: - branches: - only: nightly - name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_android_publish_snapshot - requires: - - nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build - anaconda_prune: name: anaconda-prune-pytorch-nightly context: "org-member" @@ -2689,232 +1761,7 @@ workflows: branches: only: - postnightly - - update_s3_htmls: - context: org-member - filters: - branches: - only: - - postnightly - name: update_s3_htmls - - smoke_windows_test: - name: smoke_windows_conda_3_7_cpu_nightly - build_environment: "conda 3.7 cpu" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - - smoke_windows_test: - name: smoke_windows_conda_3_8_cpu_nightly - build_environment: "conda 3.8 cpu" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - - smoke_windows_test: - name: smoke_windows_conda_3_9_cpu_nightly - build_environment: "conda 3.9 cpu" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - - smoke_windows_test: - name: smoke_windows_conda_3_10_cpu_nightly - build_environment: "conda 3.10 cpu" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - - smoke_windows_test: - name: smoke_windows_conda_3_7_cu113_nightly - build_environment: "conda 3.7 cu113" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_8_cu113_nightly - build_environment: "conda 3.8 cu113" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_9_cu113_nightly - build_environment: "conda 3.9 cu113" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_10_cu113_nightly - build_environment: "conda 3.10 cu113" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_7_cu115_nightly - build_environment: "conda 3.7 cu115" 
- requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_8_cu115_nightly - build_environment: "conda 3.8 cu115" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_9_cu115_nightly - build_environment: "conda 3.9 cu115" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - smoke_windows_test: - name: smoke_windows_conda_3_10_cu115_nightly - build_environment: "conda 3.10 cu115" - requires: - - update_s3_htmls - filters: - branches: - only: - - postnightly - executor: windows-with-nvidia-gpu - - docker_build_job: - name: "docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - image_name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c" when: << pipeline.parameters.run_build >> - master_build: - jobs: - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_linux_build: - build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a-build" - docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - name: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build - requires: - - docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - pytorch_android_gradle_build: - name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build - requires: - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build - - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build - - binary_linux_build: - build_environment: manywheel 3.7m cu102 devtoolset7 - docker_image: pytorch/manylinux-cuda102 - name: binary_linux_manywheel_3_7m_cu102_devtoolset7_build - - binary_linux_build: - build_environment: libtorch 3.7m cpu devtoolset7 - docker_image: pytorch/manylinux-cuda102 - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build - - binary_linux_build: - build_environment: libtorch 3.7m cpu gcc5.4_cxx11-abi - docker_image: pytorch/pytorch-binary-docker-image-ubuntu16.04:latest - libtorch_variant: shared-with-deps - name: 
binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build - - binary_mac_build: - build_environment: wheel 3.7 cpu - name: binary_macos_wheel_3_7_cpu_build - - binary_mac_build: - build_environment: libtorch 3.7 cpu - name: binary_macos_libtorch_3_7_cpu_build - - binary_windows_build: - build_environment: libtorch 3.7 cpu debug - name: binary_windows_libtorch_3_7_cpu_debug_build - - binary_windows_build: - build_environment: libtorch 3.7 cpu release - name: binary_windows_libtorch_3_7_cpu_release_build - - binary_windows_build: - build_environment: wheel 3.7 cu113 - name: binary_windows_wheel_3_7_cu113_build - - binary_windows_test: - build_environment: libtorch 3.7 cpu debug - name: binary_windows_libtorch_3_7_cpu_debug_test - requires: - - binary_windows_libtorch_3_7_cpu_debug_build - - binary_windows_test: - build_environment: wheel 3.7 cu113 - executor: windows-with-nvidia-gpu - name: binary_windows_wheel_3_7_cu113_test - requires: - - binary_windows_wheel_3_7_cu113_build - - binary_linux_test: - build_environment: manywheel 3.7m cu102 devtoolset7 - docker_image: pytorch/manylinux-cuda102 - name: binary_linux_manywheel_3_7m_cu102_devtoolset7_test - requires: - - binary_linux_manywheel_3_7m_cu102_devtoolset7_build - resource_class: gpu.nvidia.small - use_cuda_docker_runtime: "1" - - binary_linux_test: - build_environment: libtorch 3.7m cpu devtoolset7 - docker_image: pytorch/manylinux-cuda102 - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test - requires: - - binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build - - binary_linux_test: - build_environment: libtorch 3.7m cpu gcc5.4_cxx11-abi - docker_image: pytorch/pytorch-binary-docker-image-ubuntu16.04:latest - libtorch_variant: shared-with-deps - name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test - requires: - - binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build - - docker_build_job: - name: "docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - image_name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c" - when: << pipeline.parameters.run_master_build >> # Promotion workflow promote: jobs: diff --git a/.circleci/docker/build.sh b/.circleci/docker/build.sh index dcd83f7ee0bbc0..0f372a3bb6991b 100755 --- a/.circleci/docker/build.sh +++ b/.circleci/docker/build.sh @@ -145,6 +145,17 @@ case "$image" in VISION=yes KATEX=yes ;; + pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7) + CUDA_VERSION=11.6.0 + CUDNN_VERSION=8 + ANACONDA_PYTHON_VERSION=3.7 + CMAKE_VERSION=3.10.3 + GCC_VERSION=7 + PROTOBUF=yes + DB=yes + VISION=yes + KATEX=yes + ;; pytorch-linux-xenial-py3-clang5-asan) ANACONDA_PYTHON_VERSION=3.7 CLANG_VERSION=5.0 @@ -222,21 +233,21 @@ case "$image" in DB=yes VISION=yes ;; - pytorch-linux-bionic-rocm4.3.1-py3.7) + pytorch-linux-bionic-rocm4.5-py3.7) ANACONDA_PYTHON_VERSION=3.7 GCC_VERSION=9 PROTOBUF=yes DB=yes VISION=yes - ROCM_VERSION=4.3.1 + ROCM_VERSION=4.5.2 ;; - pytorch-linux-bionic-rocm4.5-py3.7) + pytorch-linux-bionic-rocm5.0-py3.7) ANACONDA_PYTHON_VERSION=3.7 GCC_VERSION=9 PROTOBUF=yes DB=yes VISION=yes - ROCM_VERSION=4.5.2 + ROCM_VERSION=5.0 ;; *) # Catch-all for builds that are not hardcoded. 
@@ -283,6 +294,13 @@ fi tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]') +#when using cudnn version 8 install it separately from cuda +if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then + IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}" + if [[ ${CUDNN_VERSION} == 8 ]]; then + IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}" + fi +fi # Build image # TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm @@ -321,6 +339,7 @@ docker build \ --build-arg "KATEX=${KATEX:-}" \ --build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \ --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \ + --build-arg "IMAGE_NAME=${IMAGE_NAME}" \ -f $(dirname ${DOCKERFILE})/Dockerfile \ -t "$tmp_tag" \ "$@" \ diff --git a/.circleci/docker/centos-rocm/Dockerfile b/.circleci/docker/centos-rocm/Dockerfile index 264ccaf0ea7c01..f2747d58bfd652 100644 --- a/.circleci/docker/centos-rocm/Dockerfile +++ b/.circleci/docker/centos-rocm/Dockerfile @@ -42,8 +42,10 @@ RUN bash ./install_user.sh && rm install_user.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ADD requirements-ci.txt /opt/conda/requirements-ci.txt ADD ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh +RUN rm /opt/conda/requirements-ci.txt # (optional) Install protobuf for ONNX ARG PROTOBUF diff --git a/.circleci/docker/common/install_conda.sh b/.circleci/docker/common/install_conda.sh index 72f06fb2285c3e..b333051a89e6f2 100755 --- a/.circleci/docker/common/install_conda.sh +++ b/.circleci/docker/common/install_conda.sh @@ -21,7 +21,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then ;; esac - mkdir /opt/conda + mkdir -p /opt/conda chown jenkins:jenkins /opt/conda # Work around bug where devtoolset replaces sudo and breaks it. 
@@ -94,20 +94,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then conda_install nnpack -c killeent # Install some other packages, including those needed for Python test reporting - # Pin SciPy because of failing distribution tests (see #60347) - # Pin MyPy version because new errors are likely to appear with each release - # Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136 - as_jenkins pip install --progress-bar off pytest \ - scipy==1.6.3 \ - scikit-image \ - psutil \ - unittest-xml-reporting \ - boto3==1.16.34 \ - hypothesis==4.53.2 \ - expecttest==0.1.3 \ - mypy==0.812 \ - tb-nightly \ - librosa>=0.6.2 + as_jenkins pip install --progress-bar off -r /opt/conda/requirements-ci.txt # Install numba only on python-3.8 or below # For numba issue see https://github.com/pytorch/pytorch/issues/51511 diff --git a/.circleci/docker/common/install_cudnn.sh b/.circleci/docker/common/install_cudnn.sh new file mode 100644 index 00000000000000..1f1c34ea200d4f --- /dev/null +++ b/.circleci/docker/common/install_cudnn.sh @@ -0,0 +1,18 @@ +#!/bin/bash + +if [[ ${CUDNN_VERSION} == 8 ]]; then + # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement + mkdir tmp_cudnn && cd tmp_cudnn + CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive" + curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz + tar xf ${CUDNN_NAME}.tar.xz + cp -a ${CUDNN_NAME}/include/* /usr/include/ + cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/ + cp -a ${CUDNN_NAME}/include/* /usr/include/x86_64-linux-gnu/ + + cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/ + cp -a ${CUDNN_NAME}/lib/* /usr/lib/x86_64-linux-gnu/ + cd .. + rm -rf tmp_cudnn + ldconfig +fi diff --git a/.circleci/docker/common/install_rocm.sh b/.circleci/docker/common/install_rocm.sh index f5bbbe85a0d239..1a20b79ec2191b 100644 --- a/.circleci/docker/common/install_rocm.sh +++ b/.circleci/docker/common/install_rocm.sh @@ -6,7 +6,7 @@ install_magma() { # "install" hipMAGMA into /opt/rocm/magma by copying after build git clone https://bitbucket.org/icl/magma.git pushd magma - # Mar 7 - Fixes memory leaks for many linalg UTs + # Fixes memory leaks of magma found while executing linalg UTs git checkout 5959b8783e45f1809812ed96ae762f38ee701972 cp make.inc-examples/make.inc.hip-gcc-mkl make.inc echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc @@ -35,7 +35,7 @@ ver() { } # Map ROCm version to AMDGPU version -declare -A AMDGPU_VERSIONS=( ["4.5.2"]="21.40.2" ) +declare -A AMDGPU_VERSIONS=( ["4.5.2"]="21.40.2" ["5.0"]="21.50" ["5.2"]="22.20" ["5.2.1"]="22.20.1" ["5.2.3"]="22.20.3" ) install_ubuntu() { apt-get update @@ -117,7 +117,7 @@ install_centos() { echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo fi - local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}" + local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}/main" echo "[ROCm]" > /etc/yum.repos.d/rocm.repo echo "name=ROCm" >> /etc/yum.repos.d/rocm.repo echo "baseurl=${rocm_baseurl}" >> /etc/yum.repos.d/rocm.repo diff --git a/.circleci/docker/common/install_user.sh b/.circleci/docker/common/install_user.sh index 69c762350bbfb4..f0a8d86805dc0a 100755 --- a/.circleci/docker/common/install_user.sh +++ b/.circleci/docker/common/install_user.sh @@ -3,8 +3,11 @@ set -ex # Mirror jenkins user in container -echo "jenkins:x:1014:1014::/var/lib/jenkins:" >> /etc/passwd -echo "jenkins:x:1014:" >> /etc/group +# jenkins user as ec2-user should have the same 
user-id +echo "jenkins:x:1000:1000::/var/lib/jenkins:" >> /etc/passwd +echo "jenkins:x:1000:" >> /etc/group +# Needed on focal or newer +echo "jenkins:*:19110:0:99999:7:::" >>/etc/shadow # Create $HOME mkdir -p /var/lib/jenkins diff --git a/.circleci/docker/requirements-ci.txt b/.circleci/docker/requirements-ci.txt new file mode 100644 index 00000000000000..838062474d7a13 --- /dev/null +++ b/.circleci/docker/requirements-ci.txt @@ -0,0 +1,210 @@ +# Python dependencies required for unit tests + +#awscli==1.6 #this breaks some platforms +#Description: AWS command line interface +#Pinned versions: 1.6 +#test that import: + +boto3==1.19.12 +#Description: AWS SDK for python +#Pinned versions: 1.19.12, 1.16.34 +#test that import: + +click +#Description: Command Line Interface Creation Kit +#Pinned versions: +#test that import: + +coremltools==5.0b5 +#Description: Apple framework for ML integration +#Pinned versions: 5.0b5 +#test that import: + +#dataclasses #this breaks some platforms +#Description: Provides decorators for auto adding special methods to user classes +#Pinned versions: +#test that import: + +expecttest==0.1.3 +#Description: method for writing tests where test framework auto populates +# the expected output based on previous runs +#Pinned versions: 0.1.3 +#test that import: + +flatbuffers==2.0 +#Description: cross platform serialization library +#Pinned versions: 2.0 +#test that import: + +#future #this breaks linux-bionic-rocm4.5-py3.7 +#Description: compatibility layer between python 2 and python 3 +#Pinned versions: +#test that import: + +hypothesis==4.53.2 +# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136 +#Description: advanced library for generating parametrized tests +#Pinned versions: 3.44.6, 4.53.2 +#test that import: test_xnnpack_integration.py, test_pruning_op.py, test_nn.py + +junitparser==2.1.1 +#Description: unitparser handles JUnit/xUnit Result XML files +#Pinned versions: 2.1.1 +#test that import: + +librosa>=0.6.2 +#Description: A python package for music and audio analysis +#Pinned versions: >=0.6.2 +#test that import: test_spectral_ops.py + +#mkl #this breaks linux-bionic-rocm4.5-py3.7 +#Description: Intel oneAPI Math Kernel Library +#Pinned versions: +#test that import: test_profiler.py, test_public_bindings.py, test_testing.py, +#test_nn.py, test_mkldnn.py, test_jit.py, test_fx_experimental.py, +#test_autograd.py + +#mkl-devel +# see mkl + +#mock # breaks ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c +#Description: A testing library that allows you to replace parts of your +#system under test with mock objects +#Pinned versions: +#test that import: test_module_init.py, test_modules.py, test_nn.py, +#test_testing.py + +#MonkeyType # breaks pytorch-xla-linux-bionic-py3.7-clang8 +#Description: collects runtime types of function arguments and return +#values, and can automatically generate stub files +#Pinned versions: +#test that import: + +mypy==0.960 +# Pin MyPy version because new errors are likely to appear with each release +#Description: linter +#Pinned versions: 0.960 +#test that import: test_typing.py, test_type_hints.py + +#networkx +#Description: creation, manipulation, and study of +#the structure, dynamics, and functions of complex networks +#Pinned versions: 2.0 +#test that import: + +#ninja +#Description: build system. 
Note that it install from +#here breaks things so it is commented out +#Pinned versions: 1.10.0.post1 +#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py + +#numba +#Description: Just-In-Time Compiler for Numerical Functions +#Pinned versions: 0.54.1, 0.49.0, <=0.49.1 +#test that import: test_numba_integration.py + +#numpy +#Description: Provides N-dimensional arrays and linear algebra +#Pinned versions: 1.20 +#test that import: test_view_ops.py, test_unary_ufuncs.py, test_type_promotion.py, +#test_type_info.py, test_torch.py, test_tensorexpr_pybind.py, test_tensorexpr.py, +#test_tensorboard.py, test_tensor_creation_ops.py, test_static_runtime.py, +#test_spectral_ops.py, test_sort_and_select.py, test_shape_ops.py, +#test_segment_reductions.py, test_reductions.py, test_pruning_op.py, +#test_overrides.py, test_numpy_interop.py, test_numba_integration.py +#test_nn.py, test_namedtensor.py, test_linalg.py, test_jit_cuda_fuser.py, +#test_jit.py, test_indexing.py, test_datapipe.py, test_dataloader.py, +#test_binary_ufuncs.py + +#onnxruntime +#Description: scoring engine for Open Neural Network Exchange (ONNX) models +#Pinned versions: 1.9.0 +#test that import: + +#pillow +#Description: Python Imaging Library fork +#Pinned versions: +#test that import: + +protobuf==3.20.1 +#Description: Google’s data interchange format +#Pinned versions: 3.20.1 +#test that import: test_tensorboard.py + +psutil +#Description: information on running processes and system utilization +#Pinned versions: +#test that import: test_profiler.py, test_openmp.py, test_dataloader.py + +pytest +#Description: testing framework +#Pinned versions: +#test that import: test_typing.py, test_cpp_extensions_aot.py, run_test.py + +#pytest-benchmark +#Description: fixture for benchmarking code +#Pinned versions: 3.2.3 +#test that import: + +#pytest-sugar +#Description: shows failures and errors instantly +#Pinned versions: +#test that import: + +#PyYAML +#Description: data serialization format +#Pinned versions: +#test that import: + +#requests +#Description: HTTP library +#Pinned versions: +#test that import: test_type_promotion.py + +#rich +#Description: rich text and beautiful formatting in the terminal +#Pinned versions: 10.9.0 +#test that import: + +scikit-image +#Description: image processing routines +#Pinned versions: +#test that import: test_nn.py + +#scikit-learn +#Description: machine learning package +#Pinned versions: 0.20.3 +#test that import: + +scipy==1.6.3 +# Pin SciPy because of failing distribution tests (see #60347) +#Description: scientific python +#Pinned versions: 1.6.3 +#test that import: test_unary_ufuncs.py, test_torch.py,test_tensor_creation_ops.py +#test_spectral_ops.py, test_sparse_csr.py, test_reductions.py,test_nn.py +#test_linalg.py, test_binary_ufuncs.py + +#tabulate +#Description: Pretty-print tabular data +#Pinned versions: +#test that import: + +tb-nightly +#Description: TensorBoard +#Pinned versions: +#test that import: + +#typing-extensions +#Description: type hints for python +#Pinned versions: +#test that import: + +#virtualenv +#Description: virtual environment for python +#Pinned versions: +#test that import: + +unittest-xml-reporting<=3.2.0,>=2.0.0 +#Description: saves unit test results to xml +#Pinned versions: +#test that import: diff --git a/.circleci/docker/ubuntu-cuda/Dockerfile b/.circleci/docker/ubuntu-cuda/Dockerfile index 9c9e40387066e5..241b91cff394d1 100644 --- a/.circleci/docker/ubuntu-cuda/Dockerfile +++ b/.circleci/docker/ubuntu-cuda/Dockerfile @@ 
-1,12 +1,11 @@ ARG UBUNTU_VERSION ARG CUDA_VERSION -ARG CUDNN_VERSION +ARG IMAGE_NAME -FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} +FROM ${IMAGE_NAME} ARG UBUNTU_VERSION ARG CUDA_VERSION -ARG CUDNN_VERSION ENV DEBIAN_FRONTEND noninteractive @@ -27,8 +26,10 @@ RUN bash ./install_katex.sh && rm install_katex.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ADD requirements-ci.txt /opt/conda/requirements-ci.txt ADD ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh +RUN rm /opt/conda/requirements-ci.txt # Install gcc ARG GCC_VERSION @@ -99,5 +100,11 @@ ENV CUDA_PATH /usr/local/cuda # Install LLVM dev version (Defined in the pytorch/builder github repository) COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm +# Install CUDNN +ARG CUDNN_VERSION +ADD ./common/install_cudnn.sh install_cudnn.sh +RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi +RUN rm install_cudnn.sh + USER jenkins CMD ["bash"] diff --git a/.circleci/docker/ubuntu-rocm/Dockerfile b/.circleci/docker/ubuntu-rocm/Dockerfile index 73f0e1822e895a..26059287636332 100644 --- a/.circleci/docker/ubuntu-rocm/Dockerfile +++ b/.circleci/docker/ubuntu-rocm/Dockerfile @@ -28,8 +28,10 @@ RUN bash ./install_user.sh && rm install_user.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ADD requirements-ci.txt /opt/conda/requirements-ci.txt ADD ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh +RUN rm /opt/conda/requirements-ci.txt # Install gcc ARG GCC_VERSION diff --git a/.circleci/docker/ubuntu/Dockerfile b/.circleci/docker/ubuntu/Dockerfile index e0ae5c096ec9a8..d5940c7a1d55e3 100644 --- a/.circleci/docker/ubuntu/Dockerfile +++ b/.circleci/docker/ubuntu/Dockerfile @@ -36,8 +36,10 @@ RUN bash ./install_katex.sh && rm install_katex.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ADD requirements-ci.txt /opt/conda/requirements-ci.txt ADD ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh +RUN rm /opt/conda/requirements-ci.txt # Install gcc ARG GCC_VERSION diff --git a/.circleci/generate_config_yml.py b/.circleci/generate_config_yml.py index 581a32cd485c80..f089518f4f46cd 100755 --- a/.circleci/generate_config_yml.py +++ b/.circleci/generate_config_yml.py @@ -10,12 +10,8 @@ import sys from collections import namedtuple -import cimodel.data.binary_build_definitions as binary_build_definitions -import cimodel.data.simple.android_definitions -import cimodel.data.simple.binary_smoketest import cimodel.data.simple.docker_definitions import cimodel.data.simple.mobile_definitions -import cimodel.data.simple.nightly_android import cimodel.data.simple.nightly_ios import cimodel.data.simple.anaconda_prune_defintions import cimodel.lib.miniutils as miniutils @@ -83,11 +79,11 @@ def _for_all_items(items, functor) -> None: functor(item_type, item) def filter_master_only_jobs(items): - def _is_master_item(item): + def _is_main_or_master_item(item): filters = item.get('filters', None) branches = filters.get('branches', None) if filters is not None else None branches_only = branches.get('only', None) if branches is not None else None - return 'master' in branches_only if branches_only is not None else False + return ('main' in branches_only or 'master' in branches_only) 
if branches_only is not None else False master_deps = set() @@ -96,7 +92,7 @@ def _save_requires_if_master(item_type, item): item_name = item.get("name", None) if not isinstance(requires, list): return - if _is_master_item(item) or item_name in master_deps: + if _is_main_or_master_item(item) or item_name in master_deps: master_deps.update([n.strip('"') for n in requires]) def _do_filtering(items): @@ -107,7 +103,7 @@ def _do_filtering(items): item_type, item = next(iter(items.items())) item_name = item.get("name", None) item_name = item_name.strip('"') if item_name is not None else None - if not _is_master_item(item) and item_name not in master_deps: + if not _is_main_or_master_item(item) and item_name not in master_deps: return None if 'filters' in item: item = item.copy() @@ -115,7 +111,7 @@ def _do_filtering(items): return {item_type: item} # Scan of dependencies twice to pick up nested required jobs - # I.e. jobs depending on jobs that master-only job depend on + # I.e. jobs depending on jobs that main-only job depend on _for_all_items(items, _save_requires_if_master) _for_all_items(items, _save_requires_if_master) return _do_filtering(items) @@ -137,14 +133,9 @@ def _requires_docker_image(item_type, item): def gen_build_workflows_tree(): build_workflows_functions = [ - cimodel.data.simple.android_definitions.get_workflow_jobs, cimodel.data.simple.mobile_definitions.get_workflow_jobs, - cimodel.data.simple.binary_smoketest.get_workflow_jobs, cimodel.data.simple.nightly_ios.get_workflow_jobs, - cimodel.data.simple.nightly_android.get_workflow_jobs, cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs, - binary_build_definitions.get_post_upload_jobs, - binary_build_definitions.get_binary_smoke_test_jobs, ] build_jobs = [f() for f in build_workflows_functions] build_jobs.extend( @@ -155,28 +146,20 @@ def gen_build_workflows_tree(): ) master_build_jobs = filter_master_only_jobs(build_jobs) - binary_build_functions = [ - binary_build_definitions.get_binary_build_jobs, - binary_build_definitions.get_nightly_tests, - binary_build_definitions.get_nightly_uploads, - ] - - return { + rc = { "workflows": { - "binary_builds": { - "when": r"<< pipeline.parameters.run_binary_tests >>", - "jobs": [f() for f in binary_build_functions], - }, "build": { "when": r"<< pipeline.parameters.run_build >>", "jobs": build_jobs, }, - "master_build": { - "when": r"<< pipeline.parameters.run_master_build >>", - "jobs": master_build_jobs, - }, } } + if len(master_build_jobs) > 0: + rc["workflows"]["master_build"] = { + "when": r"<< pipeline.parameters.run_master_build >>", + "jobs": master_build_jobs, + } + return rc # Order of this list matters to the generated config.yml. 
@@ -189,7 +172,6 @@ def gen_build_workflows_tree(): File("build-parameters/binary-build-params.yml"), File("build-parameters/promote-build-params.yml"), Header("Job specs"), - File("job-specs/pytorch-job-specs.yml"), File("job-specs/binary-job-specs.yml"), File("job-specs/job-specs-custom.yml"), File("job-specs/job-specs-promote.yml"), diff --git a/.circleci/scripts/binary_checkout.sh b/.circleci/scripts/binary_checkout.sh index db2b0660d9f506..86bfeb77e6ac4a 100755 --- a/.circleci/scripts/binary_checkout.sh +++ b/.circleci/scripts/binary_checkout.sh @@ -49,8 +49,9 @@ if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then git reset --hard "$CIRCLE_SHA1" elif [[ -n "${CIRCLE_SHA1:-}" ]]; then # Scheduled workflows & "smoke" binary build on master on PR merges + DEFAULT_BRANCH="$(git remote show $CIRCLE_REPOSITORY_URL | awk '/HEAD branch/ {print $NF}')" git reset --hard "$CIRCLE_SHA1" - git checkout -q -B master + git checkout -q -B $DEFAULT_BRANCH else echo "Can't tell what to checkout" exit 1 diff --git a/.circleci/scripts/binary_linux_build.sh b/.circleci/scripts/binary_linux_build.sh index 42aa728d55a6fb..88561fcd80ec02 100755 --- a/.circleci/scripts/binary_linux_build.sh +++ b/.circleci/scripts/binary_linux_build.sh @@ -26,7 +26,7 @@ else build_script='manywheel/build.sh' fi -if [[ "$CIRCLE_BRANCH" == "master" ]] || [[ "$CIRCLE_BRANCH" == release/* ]]; then +if [[ "$CIRCLE_BRANCH" == "main" ]] || [[ "$CIRCLE_BRANCH" == "master" ]] || [[ "$CIRCLE_BRANCH" == release/* ]]; then export BUILD_DEBUG_INFO=1 fi diff --git a/.circleci/scripts/binary_linux_test.sh b/.circleci/scripts/binary_linux_test.sh index 5be7f7cae21375..e915903ad8746f 100755 --- a/.circleci/scripts/binary_linux_test.sh +++ b/.circleci/scripts/binary_linux_test.sh @@ -53,7 +53,7 @@ if [[ "\$python_nodot" = *39* ]]; then NUMPY_PIN=">=1.20" fi -if [[ "$DESIRED_CUDA" == "cu112" || "$DESIRED_CUDA" == "cu115" ]]; then +if [[ "$DESIRED_CUDA" == "cu115" || "$DESIRED_CUDA" == "cu116" ]]; then EXTRA_CONDA_FLAGS="-c=conda-forge" fi @@ -67,7 +67,8 @@ mv /final_pkgs/debug-*.zip /tmp/debug_final_pkgs || echo "no debug packages to m # TODO there is duplicated and inconsistent test-python-env setup across this # file, builder/smoke_test.sh, and builder/run_tests.sh, and also in the # conda build scripts themselves. 
These should really be consolidated -pkg="/final_pkgs/\$(ls /final_pkgs)" +# Pick only one package of multiple available (which happens as result of workflow re-runs) +pkg="/final_pkgs/\$(ls -1 /final_pkgs|sort|tail -1)" if [[ "$PACKAGE_TYPE" == conda ]]; then ( # For some reason conda likes to re-activate the conda environment when attempting this install diff --git a/.circleci/scripts/binary_windows_build.sh b/.circleci/scripts/binary_windows_build.sh index 2104e5728f8500..e6500b8d9c93d1 100644 --- a/.circleci/scripts/binary_windows_build.sh +++ b/.circleci/scripts/binary_windows_build.sh @@ -8,15 +8,16 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}" export USE_SCCACHE=1 export SCCACHE_BUCKET=ossci-compiler-cache-windows export SCCACHE_IGNORE_SERVER_IO_ERROR=1 -export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT" export VC_YEAR=2019 if [[ "${DESIRED_CUDA}" == *"cu11"* ]]; then export BUILD_SPLIT_CUDA=ON fi + echo "Free Space for CUDA DEBUG BUILD" if [[ "${CIRCLECI:-}" == 'true' ]]; then + export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT" if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" fi @@ -71,6 +72,7 @@ pushd "$BUILDER_ROOT" if [[ "$PACKAGE_TYPE" == 'conda' ]]; then ./windows/internal/build_conda.bat elif [[ "$PACKAGE_TYPE" == 'wheel' || "$PACKAGE_TYPE" == 'libtorch' ]]; then + export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT" ./windows/internal/build_wheels.bat fi diff --git a/.circleci/scripts/cpp_doc_push_script.sh b/.circleci/scripts/cpp_doc_push_script.sh index fa68d07e537eaa..1b4ea71ffd9dbc 100755 --- a/.circleci/scripts/cpp_doc_push_script.sh +++ b/.circleci/scripts/cpp_doc_push_script.sh @@ -20,7 +20,7 @@ echo "cpp_doc_push_script.sh: Invoked with $*" # but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to # try and gather it first, just so we don't potentially break people who rely on this script # Argument 2: What version of the Python API docs we are building. -version="${2:-${DOCS_VERSION:-master}}" +version="${2:-${DOCS_VERSION:-main}}" if [ -z "$version" ]; then echo "error: cpp_doc_push_script.sh: version (arg2) not specified" exit 1 @@ -34,9 +34,9 @@ echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified" exit 1 fi -is_master_doc=false -if [ "$version" == "master" ]; then - is_master_doc=true +is_main_doc=false +if [ "$version" == "main" ]; then + is_main_doc=true fi echo "install_path: $install_path version: $version" @@ -65,8 +65,7 @@ cp torch/_utils_internal.py tools/shared # Generate PyTorch files time python tools/setup_helpers/generate_code.py \ - --native-functions-path aten/src/ATen/native/native_functions.yaml \ - --nn-path aten/src/ + --native-functions-path aten/src/ATen/native/native_functions.yaml # Build the docs pushd docs/cpp diff --git a/.circleci/scripts/python_doc_push_script.sh b/.circleci/scripts/python_doc_push_script.sh index ccfc44917400a7..cb6d520c260de4 100755 --- a/.circleci/scripts/python_doc_push_script.sh +++ b/.circleci/scripts/python_doc_push_script.sh @@ -23,7 +23,7 @@ set -ex # but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to # try and gather it first, just so we don't potentially break people who rely on this script # Argument 2: What version of the docs we are building. 
-version="${2:-${DOCS_VERSION:-master}}" +version="${2:-${DOCS_VERSION:-main}}" if [ -z "$version" ]; then echo "error: python_doc_push_script.sh: version (arg2) not specified" exit 1 @@ -37,9 +37,9 @@ echo "error: python_doc_push_script.sh: install_path (arg1) not specified" exit 1 fi -is_master_doc=false -if [ "$version" == "master" ]; then - is_master_doc=true +is_main_doc=false +if [ "$version" == "main" ]; then + is_main_doc=true fi # Argument 3: The branch to push to. Usually is "site" @@ -86,7 +86,7 @@ pushd docs # Build the docs pip -q install -r requirements.txt -if [ "$is_master_doc" = true ]; then +if [ "$is_main_doc" = true ]; then build_docs html [ $? -eq 0 ] || exit $? make coverage diff --git a/.circleci/scripts/setup_ci_environment.sh b/.circleci/scripts/setup_ci_environment.sh index 1f2e6bfaef61bc..dab183d907a6c6 100755 --- a/.circleci/scripts/setup_ci_environment.sh +++ b/.circleci/scripts/setup_ci_environment.sh @@ -32,7 +32,7 @@ if ! command -v aws >/dev/null; then fi if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then - DRIVER_FN="NVIDIA-Linux-x86_64-495.44.run" + DRIVER_FN="NVIDIA-Linux-x86_64-510.60.02.run" wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false) nvidia-smi diff --git a/.circleci/scripts/trigger_azure_pipeline.py b/.circleci/scripts/trigger_azure_pipeline.py index b35ee5ce9def07..9dc9dff2d54de1 100644 --- a/.circleci/scripts/trigger_azure_pipeline.py +++ b/.circleci/scripts/trigger_azure_pipeline.py @@ -11,7 +11,7 @@ AZURE_DEVOPS_PAT_BASE64 = os.environ.get("AZURE_DEVOPS_PAT_BASE64_SECRET", "") PIPELINE_ID = "911" PROJECT_ID = "0628bce4-2d33-499e-bac5-530e12db160f" -TARGET_BRANCH = os.environ.get("CIRCLE_BRANCH", "master") +TARGET_BRANCH = os.environ.get("CIRCLE_BRANCH", "main") TARGET_COMMIT = os.environ.get("CIRCLE_SHA1", "") build_base_url = AZURE_PIPELINE_BASE_URL + "_apis/build/builds?api-version=6.0" diff --git a/.circleci/scripts/windows_cuda_install.sh b/.circleci/scripts/windows_cuda_install.sh index abcdcf134b3769..b12ec7516ab7b0 100644 --- a/.circleci/scripts/windows_cuda_install.sh +++ b/.circleci/scripts/windows_cuda_install.sh @@ -22,6 +22,10 @@ case ${CUDA_VERSION} in cuda_installer_name="cuda_11.5.0_496.13_win10" cuda_install_packages="thrust_11.5 nvcc_11.5 cuobjdump_11.5 nvprune_11.5 nvprof_11.5 cupti_11.5 cublas_11.5 cublas_dev_11.5 cudart_11.5 cufft_11.5 cufft_dev_11.5 curand_11.5 curand_dev_11.5 cusolver_11.5 cusolver_dev_11.5 cusparse_11.5 cusparse_dev_11.5 npp_11.5 npp_dev_11.5 nvrtc_11.5 nvrtc_dev_11.5 nvml_dev_11.5" ;; + 11.6) + cuda_installer_name="cuda_11.6.0_511.23_windows" + cuda_install_packages="thrust_11.6 nvcc_11.6 cuobjdump_11.6 nvprune_11.6 nvprof_11.6 cupti_11.6 cublas_11.6 cublas_dev_11.6 cudart_11.6 cufft_11.6 cufft_dev_11.6 curand_11.6 curand_dev_11.6 cusolver_11.6 cusolver_dev_11.6 cusparse_11.6 cusparse_dev_11.6 npp_11.6 npp_dev_11.6 nvrtc_11.6 nvrtc_dev_11.6 nvml_dev_11.6" + ;; *) echo "CUDA_VERSION $CUDA_VERSION is not supported yet" exit 1 diff --git a/.circleci/scripts/windows_cudnn_install.sh b/.circleci/scripts/windows_cudnn_install.sh index 87e8a8dd09bf20..fbcbdc4020e961 100644 --- a/.circleci/scripts/windows_cudnn_install.sh +++ b/.circleci/scripts/windows_cudnn_install.sh @@ -22,6 +22,10 @@ case ${CUDA_VERSION} in # Since cudnn 8.3 the filename have changed cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda${CUDA_VERSION}-archive" ;; + 11.6) + # Use cudnn8.3 with hard-coded cuda11.5 version + 
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive" + ;; *) echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet" exit 1 diff --git a/.circleci/verbatim-sources/job-specs/binary-job-specs.yml b/.circleci/verbatim-sources/job-specs/binary-job-specs.yml index ab60b0d372d679..f6f16ef7dd651c 100644 --- a/.circleci/verbatim-sources/job-specs/binary-job-specs.yml +++ b/.circleci/verbatim-sources/job-specs/binary-job-specs.yml @@ -1,3 +1,4 @@ +jobs: binary_linux_build: <<: *binary_linux_build_params steps: diff --git a/.circleci/verbatim-sources/job-specs/job-specs-custom.yml b/.circleci/verbatim-sources/job-specs/job-specs-custom.yml index a3c1d932d93eb5..f0f12e09b2d902 100644 --- a/.circleci/verbatim-sources/job-specs/job-specs-custom.yml +++ b/.circleci/verbatim-sources/job-specs/job-specs-custom.yml @@ -5,7 +5,7 @@ parameters: branch: type: string - default: "master" + default: "main" steps: - attach_workspace: at: /tmp/workspace @@ -45,7 +45,7 @@ echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE} # turn v1.12.0rc3 into 1.12 tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/') - target=${tag:-master} + target=${tag:-main} echo "building for ${target}" time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) @@ -55,7 +55,7 @@ echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts mkdir -p ~/workspace/build_artifacts - docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/master ~/workspace/build_artifacts + docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/main ~/workspace/build_artifacts docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io /tmp/workspace # Save the docs build so we can debug any problems @@ -67,7 +67,7 @@ paths: - . - store_artifacts: - path: ~/workspace/build_artifacts/master + path: ~/workspace/build_artifacts/main destination: docs pytorch_cpp_doc_build: @@ -91,12 +91,12 @@ echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE} # turn v1.12.0rc3 into 1.12 tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/') - target=${tag:-master} + target=${tag:-main} echo "building for ${target}" time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" master") | docker exec -u jenkins -i "$id" bash) 2>&1' + export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . 
./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" main") | docker exec -u jenkins -i "$id" bash) 2>&1' echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts @@ -580,7 +580,7 @@ time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG}) - echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT" + echo "Do NOT merge main branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT" git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 diff --git a/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml b/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml deleted file mode 100644 index 79f879a13f0197..00000000000000 --- a/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml +++ /dev/null @@ -1,229 +0,0 @@ -jobs: - pytorch_linux_build: - <<: *pytorch_params - machine: - image: ubuntu-2004:202104-01 - steps: - # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml - - checkout - - calculate_docker_image_tag - - setup_linux_system_environment - - optional_merge_target_branch - - setup_ci_environment - - run: - name: Build - no_output_timeout: "1h" - command: | - set -e - if [[ ${BUILD_ENVIRONMENT} == *"pure_torch"* ]]; then - echo 'BUILD_CAFFE2=OFF' >> "${BASH_ENV}" - fi - if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}" - echo 'USE_TBB=1' >> "${BASH_ENV}" - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}" - fi - echo "Parallel backend flags: "${PARALLEL_FLAGS} - # Pull Docker image and run build - echo "DOCKER_IMAGE: "${DOCKER_IMAGE}:${DOCKER_TAG} - time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null - export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG}) - - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - - docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace - - export COMMAND='((echo "sudo chown -R jenkins workspace && export JOB_BASE_NAME="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1' - - echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts - - # Copy dist folder back - docker cp $id:/var/lib/jenkins/workspace/dist /home/circleci/project/. || echo "Dist folder not found" - - # Push intermediate Docker image for next phase to use - if [ -z "${BUILD_ONLY}" ]; then - # Note [Special build images] - # The xla build uses the same docker image as - # pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to - # distinguish between them so the test can pick up the correct image. 
- output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1} - if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-xla - elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-libtorch - elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-parallelnative - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64 - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v8a"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_32"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-x86_32 - elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-vulkan-x86_32"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-android-vulkan-x86_32 - elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-vulkan - else - export COMMIT_DOCKER_IMAGE=$output_image - fi - docker commit "$id" ${COMMIT_DOCKER_IMAGE} - time docker push ${COMMIT_DOCKER_IMAGE} - fi - - run: - name: upload build & binary data - no_output_timeout: "5m" - command: | - cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - python3 -mpip install requests && \ - SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \ - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - store_artifacts: - path: /home/circleci/project/dist - - pytorch_linux_test: - <<: *pytorch_params - machine: - image: ubuntu-2004:202104-01 - steps: - # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml - - checkout - - calculate_docker_image_tag - - setup_linux_system_environment - - setup_ci_environment - - run: - name: Download Docker image - no_output_timeout: "90m" - command: | - set -e - export PYTHONUNBUFFERED=1 - if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then - export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6" - fi - # See Note [Special build images] - output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1} - if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-xla - elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-libtorch - elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-parallelnative - elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then - export COMMIT_DOCKER_IMAGE=$output_image-vulkan - else - export COMMIT_DOCKER_IMAGE=$output_image - fi - echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE} - - if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then - echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}" - echo 'USE_TBB=1' >> "${BASH_ENV}" - elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then - echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}" - fi - echo "Parallel backend flags: "${PARALLEL_FLAGS} - - time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null - - # TODO: Make this less painful - if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then - export id=$(docker run --env-file 
"${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all --shm-size=2g -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - elif [[ ${BUILD_ENVIRONMENT} == *"rocm"* ]]; then - hostname - export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=8g --ipc=host --device /dev/kfd --device /dev/dri --group-add video -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - else - export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=1g --ipc=host -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE}) - fi - echo "id=${id}" >> "${BASH_ENV}" - - - run: - name: Check for no AVX instruction by default - no_output_timeout: "20m" - command: | - set -e - is_vanilla_build() { - if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-bionic-py3.7-clang9-test" ]; then - return 0 - fi - if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-xenial-py3.7-gcc5.4-test" ]; then - return 0 - fi - return 1 - } - - if is_vanilla_build; then - echo "apt-get update || apt-get install libgnutls30" | docker exec -u root -i "$id" bash - echo "apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash - echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash - else - echo "Skipping for ${BUILD_ENVIRONMENT}" - fi - - run: - name: Test - no_output_timeout: "90m" - command: | - set -e - - cat >docker_commands.sh \<> docker_commands.sh - elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then - echo ".jenkins/caffe2/test.sh" >> docker_commands.sh - else - echo ".jenkins/pytorch/test.sh" >> docker_commands.sh - fi - echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh - unbuffer bash command.sh | ts - - - run: - name: Report results - no_output_timeout: "5m" - command: | - set -e - # Retrieving test results should be done as very first step as command never fails - # But is always executed if previous step fails for some reason - echo "Retrieving test reports" - docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!' - docker stats --all --no-stream - - cat >docker_commands.sh \<&1" > command.sh - unbuffer bash command.sh | ts - when: always - - store_test_results: - path: test-reports diff --git a/.gitattributes b/.gitattributes index 70246abe9bbbaf..d87495166e5c2b 100644 --- a/.gitattributes +++ b/.gitattributes @@ -2,3 +2,4 @@ .circleci/config.yml linguist-generated=true .github/workflows/generated-*.yml linguist-generated=true .github/generated-* linguist-generated=true +.github/scripts/gql_mocks.json linguist-generated=true diff --git a/.github/actions/build-android/action.yml b/.github/actions/build-android/action.yml new file mode 100644 index 00000000000000..2493bb3a76066a --- /dev/null +++ b/.github/actions/build-android/action.yml @@ -0,0 +1,82 @@ +name: build android + +description: build android for a specific arch + +inputs: + arch: + description: arch to build + required: true + arch-for-build-env: + description: | + arch to pass to build environment. + This is currently different than the arch name we use elswhere, which + should be fixed. 
+ required: true + github-secret: + description: github token + required: true + build-environment: + required: true + description: Top-level label for what's being built/tested. + docker-image: + required: true + description: Name of the base docker image to build with. + branch: + required: true + description: What branch we are building on. +outputs: + container_id: + description: Docker container identifier used to build the artifacts + value: ${{ steps.build.outputs.container_id }} + +runs: + using: composite + steps: + - name: Build-${{ inputs.arch }} + id: build + shell: bash + env: + BRANCH: ${{ inputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-build-and-test + BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-${{ inputs.arch-for-build-env }}-build + AWS_DEFAULT_REGION: us-east-1 + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + DOCKER_IMAGE: ${{ inputs.docker-image }} + MATRIX_ARCH: ${{ inputs.arch }} + run: | + # detached container should get cleaned up by teardown_ec2_linux + set -exo pipefail + export container_name + container_name=$(docker run \ + -e BUILD_ENVIRONMENT \ + -e JOB_BASE_NAME \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e AWS_DEFAULT_REGION \ + -e IS_GHA \ + -e PR_NUMBER \ + -e SHA1 \ + -e BRANCH \ + -e GITHUB_RUN_ID \ + -e SCCACHE_BUCKET \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e SKIP_SCCACHE_INITIALIZATION=1 \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --tty \ + --detach \ + --user jenkins \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 + docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace" + (echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete" | docker exec -u jenkins -i "${container_name}" bash) 2>&1 + + # Copy install binaries back + mkdir -p "${GITHUB_WORKSPACE}/build_android_install_${MATRIX_ARCH}" + docker cp "${container_name}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_${MATRIX_ARCH}" + echo "::set-output name=container_id::${container_name}" diff --git a/.github/actions/calculate-docker-image/action.yml b/.github/actions/calculate-docker-image/action.yml new file mode 100644 index 00000000000000..d32179ac78a7d8 --- /dev/null +++ b/.github/actions/calculate-docker-image/action.yml @@ -0,0 +1,93 @@ +name: Calculate docker image + +description: Determine docker image to pull, building a new one if necessary. + +inputs: + docker-image-name: + description: The name of a docker image, like `pytorch-linux-xenial-py3.7-gcc7` + required: true + xla: + description: | + Whether or not to use a pre-built XLA docker image. + Note that this is a string, either "true" or "false" due to GHA limitations. + required: false + always-rebuild: + description: If set to any value, always build a fresh docker image. + required: false + pull: + description: If set to any value, run `docker pull` on the calculated image.
+ required: false + +outputs: + docker-image: + description: The docker image to use for the rest of the workflow + value: ${{ steps.calculate-tag.outputs.docker-image }} + +runs: + using: composite + steps: + - name: Calculate docker image tag + shell: bash + id: calculate-tag + env: + IS_XLA: ${{ inputs.xla == 'true' && 'true' || '' }} + XLA_IMAGE_TAG: v0.2 + DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${{ inputs.docker-image-name }} + run: | + if [ -n "${IS_XLA}" ]; then + echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}" + DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) + echo "::set-output name=docker-tag::${DOCKER_TAG}" + echo "::set-output name=docker-image::${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" + else + DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) + echo "::set-output name=docker-tag::${DOCKER_TAG}" + echo "::set-output name=docker-image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" + fi + + - name: Check if image should be built + shell: bash + id: check + if: ${{ !inputs.always-rebuild }} + env: + BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} + DOCKER_IMAGE: ${{ steps.calculate-tag.outputs.docker-image }} + DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker-tag }} + run: | + set -x + # Check if image already exists, if it does then skip building it + if docker manifest inspect "${DOCKER_IMAGE}"; then + exit 0 + fi + if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then + # if we're on the base branch then use the parent commit + MERGE_BASE=$(git rev-parse HEAD~) + else + # otherwise we're on a PR, so use the most recent base commit + MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") + fi + # Covers the case where a previous tag doesn't exist for the tree + # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly + if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then + echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" + exit 1 + fi + PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") + # If no image exists but the hash is the same as the previous hash then we should error out here + if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then + echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" + echo " contact the PyTorch team to restore the original images" + exit 1 + fi + echo ::set-output name=rebuild::yes + + - name: Build and push docker image + if: inputs.always-rebuild || steps.check.outputs.rebuild + env: + IMAGE_NAME: ${{inputs.docker-image-name}} + DOCKER_SKIP_S3_UPLOAD: "1" + DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker-tag }} + working-directory: .circleci/docker + shell: bash + run: | + ./build_docker.sh diff --git a/.github/actions/checkout-pytorch/action.yml b/.github/actions/checkout-pytorch/action.yml new file mode 100644 index 00000000000000..eb1b728467f8f5 --- /dev/null +++ b/.github/actions/checkout-pytorch/action.yml @@ -0,0 +1,32 @@ +name: Checkout PyTorch + +description: Clean workspace and check out PyTorch + +inputs: + no-sudo: + description: If set to any value, don't use sudo to clean the workspace + required: false + +runs: + using: composite + steps: + - name: Clean workspace + shell: bash + env: + NO_SUDO: ${{ inputs.no-sudo }} + run: | + echo "${GITHUB_WORKSPACE}" + if [ -z "${NO_SUDO}" ]; then + sudo rm -rf "${GITHUB_WORKSPACE}" + else + rm -rf "${GITHUB_WORKSPACE}" + fi + mkdir "${GITHUB_WORKSPACE}" + + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + # deep clone, to allow use of git merge-base + fetch-depth: 0 + submodules: recursive diff --git a/.github/actions/chown-workspace/action.yml b/.github/actions/chown-workspace/action.yml new file mode 100644 index 00000000000000..6adc6cdc217db4 --- /dev/null +++ b/.github/actions/chown-workspace/action.yml @@ -0,0 +1,11 @@ +name: Chown workspace + +description: Ensure that the working directory gets chowned back to the current user + +runs: + using: composite + steps: + - run: docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + shell: bash + env: + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" diff --git a/.github/actions/download-build-artifacts/action.yml b/.github/actions/download-build-artifacts/action.yml new file mode 100644 index 00000000000000..a3c9444c1b98fb --- /dev/null +++ b/.github/actions/download-build-artifacts/action.yml @@ -0,0 +1,34 @@ +name: Download PyTorch Build Artifacts + +description: Download and unzip artifacts from a previous PyTorch build. + +inputs: + name: + description: Name of what artifact to download + required: true + use-gha: + description: If set to any value, use GHA to download the artifact. Otherwise use s3. 
+ required: false + +runs: + using: composite + steps: + - name: Download PyTorch Build Artifacts from S3 + if: ${{ !inputs.use-gha }} + uses: seemethere/download-artifact-s3@v3 + with: + name: ${{ inputs.name }} + + - name: Download PyTorch Build Artifacts from GHA + if: inputs.use-gha + uses: actions/download-artifact@v2 + with: + name: ${{ inputs.name }} + + - name: Unzip artifacts + shell: bash + run: unzip -o artifacts.zip + + - name: Output disk space left + shell: bash + run: df -H diff --git a/.github/actions/get-workflow-job-id/action.yml b/.github/actions/get-workflow-job-id/action.yml new file mode 100644 index 00000000000000..c7ca1e07d6bec8 --- /dev/null +++ b/.github/actions/get-workflow-job-id/action.yml @@ -0,0 +1,31 @@ +name: Get workflow job id + +description: Get the ID of the workflow job that is currently running. + +inputs: + github-token: + description: GITHUB_TOKEN + required: true + +outputs: + job-id: + description: The retrieved workflow job id + value: ${{ steps.get-job-id.outputs.job-id }} + +runs: + using: composite + steps: + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + id: get-job-id + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + with: + shell: bash + timeout_minutes: 10 + max_attempts: 5 + retry_wait_seconds: 30 + command: | + set -x + python3 -m pip install requests==2.26.0 + GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") + echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}" diff --git a/.github/actions/pull-docker-image/action.yml b/.github/actions/pull-docker-image/action.yml new file mode 100644 index 00000000000000..ad1cc1baf9d3dc --- /dev/null +++ b/.github/actions/pull-docker-image/action.yml @@ -0,0 +1,19 @@ +name: Pull docker image + +description: pull a specific docker image + +inputs: + docker-image: + description: the image to pull + required: true + +runs: + using: composite + steps: + - name: Pull Docker image + shell: bash + env: + DOCKER_IMAGE: ${{ inputs.docker-image }} + run: | + retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } + retry docker pull "${DOCKER_IMAGE}" diff --git a/.github/actions/setup-linux/action.yml b/.github/actions/setup-linux/action.yml new file mode 100644 index 00000000000000..d7500f11de7d63 --- /dev/null +++ b/.github/actions/setup-linux/action.yml @@ -0,0 +1,47 @@ +name: Setup Linux + +description: Set up Docker workspace on EC2 + +runs: + using: composite + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + + - name: Start docker if docker daemon is not running + shell: bash + run: | + if systemctl is-active --quiet docker; then + echo "Docker daemon is running..."; + else + echo "Starting docker daemon..."
&& sudo systemctl start docker; + fi + + - name: Log in to ECR + shell: bash + env: + AWS_RETRY_MODE: standard + AWS_MAX_ATTEMPTS: "5" + AWS_DEFAULT_REGION: us-east-1 + run: | + AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") + retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } + retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ + --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + + - name: Preserve github env variables for use in docker + shell: bash + run: | + env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" diff --git a/.github/actions/setup-rocm/action.yml b/.github/actions/setup-rocm/action.yml new file mode 100644 index 00000000000000..d261a557802919 --- /dev/null +++ b/.github/actions/setup-rocm/action.yml @@ -0,0 +1,57 @@ +name: Setup ROCm host + +description: Set up ROCm host for CI + +runs: + using: composite + steps: + - name: Set DOCKER_HOST + shell: bash + run: echo "DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> "${GITHUB_ENV}" + + - name: Runner health check system info + if: always() + shell: bash + run: | + cat /etc/os-release || true + cat /etc/apt/sources.list.d/rocm.list || true + cat /opt/rocm/.info/version || true + whoami + + - name: Runner health check rocm-smi + if: always() + shell: bash + run: | + rocm-smi + + - name: Runner health check rocminfo + if: always() + shell: bash + run: | + rocminfo + + - name: Runner health check GPU count + if: always() + shell: bash + run: | + ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') + if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then + echo "Failed to detect GPUs on the runner" + exit 1 + fi + + - name: Runner health check disconnect on failure + if: ${{ failure() }} + shell: bash + run: | + killall runsvc.sh + + - name: Preserve github env variables for use in docker + shell: bash + run: | + env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" + + - name: ROCm set GPU_FLAG + shell: bash + run: | + echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" diff --git a/.github/actions/setup-ssh/action.yml b/.github/actions/setup-ssh/action.yml new file mode 100644 index 00000000000000..9daed4a5f9734f --- /dev/null +++ b/.github/actions/setup-ssh/action.yml @@ -0,0 +1,16 @@ +name: Setup SSH + +description: Adds ssh keys for current user to machine + +inputs: + github-secret: + description: GitHub token + required: true + +runs: + using: composite + steps: + - name: "Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ inputs.github-secret }} diff --git a/.github/actions/setup-win/action.yml b/.github/actions/setup-win/action.yml new file mode 100644 index 00000000000000..12f287b230898a --- /dev/null +++ b/.github/actions/setup-win/action.yml @@ -0,0 +1,60 @@ +name: Setup Windows + +description: Set up for windows jobs + +inputs: + cuda-version: + description: which cuda version to install, 'cpu' for none + required: true + +runs: + using: composite + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata 
instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + + - name: Install Visual Studio 2019 toolchain + shell: powershell + env: + VS_VERSION: "16.8.6" + INSTALL_WINDOWS_SDK: "1" + run: | + .\.circleci\scripts\vs_install.ps1 + + - name: Install CUDA and CUDNN + shell: bash + if: inputs.cuda-version != 'cpu' + env: + CUDA_VERSION: ${{ inputs.cuda-version }} + run: | + .circleci/scripts/windows_cuda_install.sh + .circleci/scripts/windows_cudnn_install.sh + + - name: Setup Python3 + uses: actions/setup-python@v2 + with: + python-version: "3.x" diff --git a/.github/actions/teardown-linux/action.yml b/.github/actions/teardown-linux/action.yml new file mode 100644 index 00000000000000..9238a073a6b621 --- /dev/null +++ b/.github/actions/teardown-linux/action.yml @@ -0,0 +1,28 @@ +name: Teardown Linux + +description: Stuff that should always run at the end of a linux job + +inputs: + skip-wait-ssh: + description: If set, don't wait for ssh to drain before tearing down + required: false + default: "" + +runs: + using: composite + steps: + - name: Hold runner for 2 hours or until ssh sessions have drained + # TODO working-directory: !{{ pytorch_directory }} + # Always hold for active ssh sessions + shell: bash + if: inputs.skip-wait-ssh == '' + run: .github/scripts/wait_for_ssh_to_drain.sh + + - name: Kill containers, clean up images + shell: bash + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/actions/teardown-win/action.yml b/.github/actions/teardown-win/action.yml new file mode 100644 index 00000000000000..49c509444e095a --- /dev/null +++ b/.github/actions/teardown-win/action.yml @@ -0,0 +1,33 @@ +name: Teardown Windows + +description: Set up Docker workspace on linux + +inputs: + extra-delete-dir: + description: If set, cleaning up the workspace will delete this too + required: false + default: "" + +runs: + using: composite + steps: + - name: Wait until all sessions have drained + shell: powershell + if: always() + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + + - name: Cleanup workspace + if: always() + shell: bash + env: + EXTRA_DELETE_DIR: ${{ inputs.extra-delete-dir }} + run: | + [ ! 
-z "${EXTRA_DELETE_DIR}" ] || rm -rf "${EXTRA_DELETE_DIR}" + rm -rf ./* diff --git a/.github/actions/upload-test-artifacts/action.yml b/.github/actions/upload-test-artifacts/action.yml new file mode 100644 index 00000000000000..7a00a377fca41f --- /dev/null +++ b/.github/actions/upload-test-artifacts/action.yml @@ -0,0 +1,94 @@ +name: Upload test artifacts + +description: Upload various artifacts produced by our testing process + +inputs: + use-gha: + description: If set to any value, upload GHA. Otherwise upload to S3. + required: false + file-suffix: + description: | + Suffix to add to the filename of the artifacts. This should include the + workflow job id, see [Job id in artifacts]. + required: true + +runs: + using: composite + steps: + # Mac/Linux zip + - name: Zip JSONs for upload + if: runner.os != 'Windows' && !inputs.use-gha + shell: bash + env: + FILE_SUFFIX: ${{ inputs.file-suffix }} + run: | + # Remove any previous test jsons if they exist + rm -f test-jsons-*.zip + zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' + + - name: Zip test reports for upload + if: runner.os != 'Windows' && !inputs.use-gha + shell: bash + env: + FILE_SUFFIX: ${{ inputs.file-suffix }} + run: | + # Remove any previous test reports if they exist + rm -f test-reports-*.zip + zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' + + # Windows zip + - name: Zip JSONs for upload + if: runner.os == 'Windows' && !inputs.use-gha + shell: powershell + env: + FILE_SUFFIX: ${{ inputs.file-suffix }} + run: | + # -ir => recursive include all files in pattern + 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' + + - name: Zip test reports for upload + if: runner.os == 'Windows' && !inputs.use-gha + shell: powershell + env: + FILE_SUFFIX: ${{ inputs.file-suffix }} + run: | + # -ir => recursive include all files in pattern + 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' + + # S3 upload + - name: Store Test Downloaded JSONs on S3 + uses: seemethere/upload-artifact-s3@v4 + if: ${{ !inputs.use-gha }} + with: + retention-days: 14 + if-no-files-found: warn + path: test-jsons-*.zip + + - name: Store Test Reports on S3 + uses: seemethere/upload-artifact-s3@v4 + if: ${{ !inputs.use-gha }} + with: + retention-days: 14 + if-no-files-found: error + path: test-reports-*.zip + + # GHA upload + - name: Store Test Downloaded JSONs on Github + uses: actions/upload-artifact@v2 + if: inputs.use-gha + with: + # Add the run attempt, see [Artifact run attempt] + name: test-jsons-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip + retention-days: 14 + if-no-files-found: warn + path: test/**/*.json + + - name: Store Test Reports on Github + uses: actions/upload-artifact@v2 + if: inputs.use-gha + with: + # Add the run attempt, see [Artifact run attempt] + name: test-reports-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip + retention-days: 14 + if-no-files-found: error + path: test/**/*.xml diff --git a/.github/generated-ciflow-ruleset.json b/.github/generated-ciflow-ruleset.json deleted file mode 100644 index 3625512b7a804e..00000000000000 --- a/.github/generated-ciflow-ruleset.json +++ /dev/null @@ -1,298 +0,0 @@ -{ - "__comment": "@generated DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py", - "label_rules": { - "ciflow/all": [ - "caffe2-linux-xenial-py3.7-gcc5.4", - "docker-builds", - "ios-12-5-1-arm64", - "ios-12-5-1-arm64-coreml", - "ios-12-5-1-arm64-custom-ops", - "ios-12-5-1-arm64-metal", - "ios-12-5-1-x86-64", - "ios-12-5-1-x86-64-coreml", 
- "libtorch-linux-xenial-cuda10.2-py3.7-gcc7", - "libtorch-linux-xenial-cuda11.3-py3.7-gcc7", - "linux-bionic-cuda10.2-py3.9-gcc7", - "linux-bionic-py3.7-clang9", - "linux-bionic-rocm4.5-py3.7", - "linux-docs", - "linux-docs-push", - "linux-vulkan-bionic-py3.7-clang9", - "linux-xenial-cuda11.3-py3.7-gcc7", - "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test", - "linux-xenial-cuda11.3-py3.7-gcc7-no-ops", - "linux-xenial-py3-clang5-mobile-build", - "linux-xenial-py3-clang5-mobile-custom-build-static", - "linux-xenial-py3.7-clang7-asan", - "linux-xenial-py3.7-clang7-onnx", - "linux-xenial-py3.7-gcc5.4", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - "linux-xenial-py3.7-gcc7", - "linux-xenial-py3.7-gcc7-no-ops", - "macos-10-15-py3-arm64", - "macos-10-15-py3-lite-interpreter-x86-64", - "macos-11-py3-x86-64", - "parallelnative-linux-xenial-py3.7-gcc5.4", - "periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck", - "periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug", - "periodic-win-vs2019-cuda11.5-py3", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit", - "pytorch-xla-linux-bionic-py3.7-clang8", - "win-vs2019-cpu-py3", - "win-vs2019-cuda11.3-py3" - ], - "ciflow/android": [ - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" - ], - "ciflow/bazel": [ - "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" - ], - "ciflow/binaries": [ - "linux-binary-conda", - "linux-binary-libtorch-cxx11-abi", - "linux-binary-libtorch-pre-cxx11", - "linux-binary-manywheel", - "macos-arm64-binary-conda", - "macos-arm64-binary-wheel", - "macos-binary-conda", - "macos-binary-libtorch-cxx11-abi", - "macos-binary-libtorch-pre-cxx11", - "macos-binary-wheel", - "windows-binary-libtorch-debug", - "windows-binary-libtorch-release", - "windows-binary-wheel" - ], - "ciflow/binaries_conda": [ - "linux-binary-conda", - "macos-arm64-binary-conda", - "macos-binary-conda" - ], - "ciflow/binaries_libtorch": [ - "linux-binary-libtorch-cxx11-abi", - "linux-binary-libtorch-pre-cxx11", - "macos-binary-libtorch-cxx11-abi", - "macos-binary-libtorch-pre-cxx11", - "windows-binary-libtorch-debug", - "windows-binary-libtorch-release" - ], - "ciflow/binaries_wheel": [ - "linux-binary-manywheel", - "macos-arm64-binary-wheel", - "macos-binary-wheel", - "windows-binary-wheel" - ], - "ciflow/cpu": [ - "caffe2-linux-xenial-py3.7-gcc5.4", - "linux-bionic-py3.7-clang9", - "linux-docs", - "linux-docs-push", - "linux-vulkan-bionic-py3.7-clang9", - "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test", - "linux-xenial-py3.7-clang7-asan", - "linux-xenial-py3.7-clang7-onnx", - "linux-xenial-py3.7-gcc5.4", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - "linux-xenial-py3.7-gcc7", - "linux-xenial-py3.7-gcc7-no-ops", - "parallelnative-linux-xenial-py3.7-gcc5.4", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit", - "pytorch-xla-linux-bionic-py3.7-clang8", - "win-vs2019-cpu-py3" - ], - "ciflow/cuda": [ - 
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7", - "libtorch-linux-xenial-cuda11.3-py3.7-gcc7", - "linux-bionic-cuda10.2-py3.9-gcc7", - "linux-xenial-cuda11.3-py3.7-gcc7", - "linux-xenial-cuda11.3-py3.7-gcc7-no-ops", - "periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck", - "periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug", - "periodic-win-vs2019-cuda11.5-py3", - "win-vs2019-cuda11.3-py3" - ], - "ciflow/default": [ - "linux-binary-conda", - "linux-binary-libtorch-cxx11-abi", - "linux-binary-libtorch-pre-cxx11", - "linux-binary-manywheel", - "linux-bionic-py3.7-clang9", - "linux-bionic-rocm4.5-py3.7", - "linux-docs", - "linux-vulkan-bionic-py3.7-clang9", - "linux-xenial-cuda11.3-py3.7-gcc7", - "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test", - "linux-xenial-py3-clang5-mobile-build", - "linux-xenial-py3-clang5-mobile-custom-build-static", - "linux-xenial-py3.7-clang7-asan", - "linux-xenial-py3.7-clang7-onnx", - "linux-xenial-py3.7-gcc5.4", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - "linux-xenial-py3.7-gcc7", - "linux-xenial-py3.7-gcc7-no-ops", - "macos-arm64-binary-conda", - "macos-arm64-binary-wheel", - "macos-binary-conda", - "macos-binary-libtorch-cxx11-abi", - "macos-binary-libtorch-pre-cxx11", - "macos-binary-wheel", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit", - "win-vs2019-cpu-py3", - "win-vs2019-cuda11.3-py3", - "windows-binary-libtorch-debug", - "windows-binary-libtorch-release", - "windows-binary-wheel" - ], - "ciflow/docs": [ - "linux-docs" - ], - "ciflow/ios": [ - "ios-12-5-1-arm64", - "ios-12-5-1-arm64-coreml", - "ios-12-5-1-arm64-custom-ops", - "ios-12-5-1-arm64-metal", - "ios-12-5-1-x86-64", - "ios-12-5-1-x86-64-coreml" - ], - "ciflow/libtorch": [ - "libtorch-linux-xenial-cuda10.2-py3.7-gcc7", - "libtorch-linux-xenial-cuda11.3-py3.7-gcc7", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - "periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7" - ], - "ciflow/linux": [ - "caffe2-linux-xenial-py3.7-gcc5.4", - "libtorch-linux-xenial-cuda10.2-py3.7-gcc7", - "libtorch-linux-xenial-cuda11.3-py3.7-gcc7", - "linux-bionic-cuda10.2-py3.9-gcc7", - "linux-bionic-py3.7-clang9", - "linux-bionic-rocm4.5-py3.7", - "linux-docs", - "linux-docs-push", - "linux-vulkan-bionic-py3.7-clang9", - "linux-xenial-cuda11.3-py3.7-gcc7", - "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test", - "linux-xenial-cuda11.3-py3.7-gcc7-no-ops", - "linux-xenial-py3-clang5-mobile-build", - "linux-xenial-py3-clang5-mobile-custom-build-static", - "linux-xenial-py3.7-clang7-asan", - "linux-xenial-py3.7-clang7-onnx", - "linux-xenial-py3.7-gcc5.4", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - "linux-xenial-py3.7-gcc7", - "linux-xenial-py3.7-gcc7-no-ops", - "parallelnative-linux-xenial-py3.7-gcc5.4", - "periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck", - "periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit", - "pytorch-xla-linux-bionic-py3.7-clang8" - ], - "ciflow/macos": [ - "ios-12-5-1-arm64", - 
"ios-12-5-1-arm64-coreml", - "ios-12-5-1-arm64-custom-ops", - "ios-12-5-1-arm64-metal", - "ios-12-5-1-x86-64", - "ios-12-5-1-x86-64-coreml", - "macos-10-15-py3-arm64", - "macos-10-15-py3-lite-interpreter-x86-64", - "macos-11-py3-x86-64" - ], - "ciflow/mobile": [ - "linux-xenial-py3-clang5-mobile-build", - "linux-xenial-py3-clang5-mobile-custom-build-static", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" - ], - "ciflow/noarch": [ - "linux-bionic-py3.7-clang9" - ], - "ciflow/onnx": [ - "linux-xenial-py3.7-clang7-onnx" - ], - "ciflow/rocm": [ - "linux-bionic-rocm4.5-py3.7" - ], - "ciflow/sanitizers": [ - "linux-xenial-py3.7-clang7-asan" - ], - "ciflow/scheduled": [ - "ios-12-5-1-arm64", - "ios-12-5-1-arm64-coreml", - "ios-12-5-1-arm64-custom-ops", - "ios-12-5-1-arm64-metal", - "linux-docs-push", - "periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-bionic-cuda11.5-py3.7-gcc7", - "periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck", - "periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug", - "periodic-win-vs2019-cuda11.5-py3" - ], - "ciflow/slow": [ - "linux-bionic-cuda10.2-py3.9-gcc7", - "periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck" - ], - "ciflow/slow-gradcheck": [ - "periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck" - ], - "ciflow/trunk": [ - "caffe2-linux-xenial-py3.7-gcc5.4", - "docker-builds", - "ios-12-5-1-x86-64", - "ios-12-5-1-x86-64-coreml", - "libtorch-linux-xenial-cuda10.2-py3.7-gcc7", - "libtorch-linux-xenial-cuda11.3-py3.7-gcc7", - "linux-bionic-cuda10.2-py3.9-gcc7", - "linux-bionic-py3.7-clang9", - "linux-bionic-rocm4.5-py3.7", - "linux-docs", - "linux-vulkan-bionic-py3.7-clang9", - "linux-xenial-cuda11.3-py3.7-gcc7", - "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test", - "linux-xenial-cuda11.3-py3.7-gcc7-no-ops", - "linux-xenial-py3-clang5-mobile-build", - "linux-xenial-py3-clang5-mobile-custom-build-static", - "linux-xenial-py3.7-clang7-asan", - "linux-xenial-py3.7-clang7-onnx", - "linux-xenial-py3.7-gcc5.4", - "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - "linux-xenial-py3.7-gcc7", - "linux-xenial-py3.7-gcc7-no-ops", - "macos-10-15-py3-arm64", - "macos-10-15-py3-lite-interpreter-x86-64", - "macos-11-py3-x86-64", - "parallelnative-linux-xenial-py3.7-gcc5.4", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit", - "pytorch-xla-linux-bionic-py3.7-clang8", - "win-vs2019-cpu-py3", - "win-vs2019-cuda11.3-py3" - ], - "ciflow/vulkan": [ - "linux-vulkan-bionic-py3.7-clang9" - ], - "ciflow/win": [ - "periodic-win-vs2019-cuda11.5-py3", - "win-vs2019-cpu-py3", - "win-vs2019-cuda11.3-py3" - ], - "ciflow/xla": [ - "pytorch-xla-linux-bionic-py3.7-clang8" - ] - }, - "version": "v1" -} diff --git a/.github/merge_rules.json b/.github/merge_rules.json index dded4737f40509..56268e5381618a 100644 --- a/.github/merge_rules.json +++ b/.github/merge_rules.json @@ -2,54 +2,49 @@ { "name": "ONNX exporter", "patterns": [ - "torch/onnx/**", - "torch/csrc/jit/passes/onnx/**", - "torch/csrc/jit/passes/onnx.*", - "test/onnx/**", + ".jenkins/caffe2/*", "docs/source/onnx.rst", + "test/onnx/**", + "tools/onnx/**", + "torch/_C/__init__.pyi.in", + "torch/csrc/jit/passes/onnx.*", + "torch/csrc/jit/passes/onnx/**", "torch/csrc/jit/serialization/export.*", "torch/csrc/jit/serialization/onnx.*", - "torch/_C/__init__.pyi.in", "torch/csrc/onnx/**", - ".jenkins/caffe2/*" + 
"torch/onnx/**" ], "approved_by": ["BowenBao", "garymm"], - "mandatory_app_id": 12274 + "mandatory_checks_name": ["Facebook CLA Check", "Lint"] }, { "name": "NVFuser", "patterns": ["torch/csrc/jit/codegen/fuser/cuda/**", "torch/csrc/jit/codegen/cuda/**", "benchmarks/cpp/nvfuser/**"], "approved_by": ["csarofeen", "ngimel"], - "mandatory_app_id": 12274 + "mandatory_checks_name": ["Facebook CLA Check", "Lint"] }, { "name": "OSS CI", "patterns": [".github/**", ".circleci/**", ".jenkins/**", "scripts/**", "tools/**"], - "approved_by": ["janeyx99", "ezyang"], - "mandatory_app_id": 12274 + "approved_by": ["ezyang", "pytorch/pytorch-dev-infra"], + "mandatory_checks_name": ["Facebook CLA Check", "Lint"] }, { "name": "Documentation", "patterns": ["docs/**", "torch/*docs.py"], "approved_by": ["mruberry", "ngimel", "janeyx99"], - "mandatory_app_id": 12274 - }, - { - "name": "Android", - "patterns": ["android/**"], - "approved_by": ["linbinyu", "kit1980", "IvanKobzarev"], - "mandatory_app_id": 12274 + "mandatory_checks_name": ["Facebook CLA Check", "Lint"] }, { - "name": "iOS", - "patterns": ["ios/**"], - "approved_by": ["linbinyu", "kit1980", "xta0", "hanton"], - "mandatory_app_id": 12274 + "name": "Mobile", + "patterns": ["ios/**", "android/**", "test/mobile/**"], + "approved_by": ["linbinyu", "kit1980", "IvanKobzarev", "dreiss"], + "mandatory_checks_name": ["Facebook CLA Check", "Lint"] }, { "name": "superuser", "patterns": ["*"], - "approved_by": ["albanD", "jbschlosser", "suo", "osalpekar", "malfet", "seemethere", "ezyang"], - "mandatory_app_id": 12274 + "approved_by": ["pytorch/metamates"], + "mandatory_checks_name": ["Facebook CLA Check", "Lint"] } ] diff --git a/.github/scale-config.yml b/.github/scale-config.yml index 0670ed9598ae63..213a9942ff9071 100644 --- a/.github/scale-config.yml +++ b/.github/scale-config.yml @@ -30,7 +30,7 @@ runner_types: linux.2xlarge: instance_type: c5.2xlarge os: linux - max_available: 500 + max_available: 750 disk_size: 150 is_ephemeral: false linux.4xlarge: # for binary-builds diff --git a/.github/scripts/README.md b/.github/scripts/README.md new file mode 100644 index 00000000000000..22099c3732ea53 --- /dev/null +++ b/.github/scripts/README.md @@ -0,0 +1,58 @@ +# pytorch/.github + +> NOTE: This README contains information for the `.github` directory but cannot be located there because it will overwrite the +repo README. + +This directory contains workflows and scripts to support our CI infrastructure that runs on Github Actions. + +## Workflows + +- Pull CI (`pull.yml`) is run on PRs and on master. +- Trunk CI (`trunk.yml`) is run on trunk to validate incoming commits. Trunk jobs are usually more expensive to run so we do not run them on PRs unless specified. +- Scheduled CI (`periodic.yml`) is a subset of trunk CI that is run every few hours on master. +- Binary CI is run to package binaries for distribution for all platforms. + +## Templates + +Templates written in [Jinja](https://jinja.palletsprojects.com/en/3.0.x/) are located in the `.github/templates` directory +and used to generate workflow files for binary jobs found in the `.github/workflows/` directory. These are also a +couple of utility templates used to discern common utilities that can be used amongst different templates. 
+ +### (Re)Generating workflow files + +You will need `jinja2` in order to regenerate the workflow files; it can be installed using: +```bash +pip install -r .github/requirements.txt +``` + +Workflows can be generated / regenerated using the following command: +```bash +.github/regenerate.sh +``` + +### Adding a new generated binary workflow + +New generated binary workflows can be added in the `.github/scripts/generate_ci_workflows.py` script. You can reference +examples from that script in order to add the workflow to the stream that is relevant to what you +care about. + +Different parameters can be used to achieve different goals, e.g. running jobs on a cron, running only on trunk, etc. + +#### ciflow (trunk) + +The label `ciflow/trunk` can be used to run `trunk`-only workflows. This is especially useful if trying to re-land a PR that was +reverted for failing a `non-default` workflow. + +## Infra + +Currently most of our self-hosted runners are hosted on AWS; for a comprehensive list of available runner types you +can reference `.github/scale-config.yml`. + +Exceptions to AWS for self-hosted: +* ROCm runners + +### Adding new runner types + +New runner types can be added by committing changes to `.github/scale-config.yml`. Example: https://github.com/pytorch/pytorch/pull/70474 + +> NOTE: New runner types can only be used once the changes to `.github/scale-config.yml` have made their way into the default branch diff --git a/.github/scripts/build_publish_nightly_docker.sh b/.github/scripts/build_publish_nightly_docker.sh index 3e953db88b891d..db84704aa3e4c8 100644 --- a/.github/scripts/build_publish_nightly_docker.sh +++ b/.github/scripts/build_publish_nightly_docker.sh @@ -1,9 +1,9 @@ -#!/bin/sh +#!/usr/bin/env bash set -xeuo pipefail PYTORCH_DOCKER_TAG=$(git describe --tags --always)-devel -CUDA_VERSION=11.3 +CUDA_VERSION=11.3.1 # Build PyTorch nightly docker make -f docker.Makefile \
filename.with_suffix("").name.replace("_", "-") - if workflow_name.startswith("generated-"): - workflow_name = workflow_name[len("generated-"):] - return f"{workflow_name}-${{{{ github.event.pull_request.number || github.sha }}}}" \ - "-${{ github.event_name == 'workflow_dispatch' }}" +EXPECTED_GROUP = "${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}" \ + "-${{ github.event_name == 'workflow_dispatch' }}" def should_check(filename: Path) -> bool: @@ -38,12 +32,19 @@ def should_check(filename: Path) -> bool: errors_found = False files = [f for f in files if should_check(f)] + names = set() for filename in files: with open(filename, "r") as f: data = yaml.safe_load(f) + name = data.get("name") + if name is not None and name in names: + print("ERROR: duplicate workflow name:", name, file=sys.stderr) + errors_found = True + names.add(name) + expected = { - "group": concurrency_key(filename), + "group": EXPECTED_GROUP, "cancel-in-progress": True, } actual = data.get("concurrency", None) diff --git a/.github/scripts/generate_binary_build_matrix.py b/.github/scripts/generate_binary_build_matrix.py index 90e509d87c2762..b4476092b71a9e 100644 --- a/.github/scripts/generate_binary_build_matrix.py +++ b/.github/scripts/generate_binary_build_matrix.py @@ -10,10 +10,10 @@ * Latest ROCM """ -from typing import Dict, List, Tuple +from typing import Dict, List, Tuple, Optional -CUDA_ARCHES = ["10.2", "11.3", "11.5"] +CUDA_ARCHES = ["10.2", "11.3", "11.5", "11.6"] ROCM_ARCHES = ["4.5.2", "5.0"] @@ -59,6 +59,14 @@ def arch_type(arch_version: str) -> str: (gpu_arch, CXX11_ABI): f"pytorch/libtorch-cxx11-builder:cuda{gpu_arch}" for gpu_arch in CUDA_ARCHES }, + **{ + (gpu_arch, PRE_CXX11_ABI): f"pytorch/manylinux-builder:rocm{gpu_arch}" + for gpu_arch in ROCM_ARCHES + }, + **{ + (gpu_arch, CXX11_ABI): f"pytorch/libtorch-cxx11-builder:rocm{gpu_arch}" + for gpu_arch in ROCM_ARCHES + }, ("cpu", PRE_CXX11_ABI): "pytorch/manylinux-builder:cpu", ("cpu", CXX11_ABI): "pytorch/libtorch-cxx11-builder:cpu", } @@ -112,23 +120,29 @@ def generate_conda_matrix(os: str) -> List[Dict[str, str]]: return ret -def generate_libtorch_matrix(os: str, abi_version: str) -> List[Dict[str, str]]: - libtorch_variants = [ - "shared-with-deps", - "shared-without-deps", - "static-with-deps", - "static-without-deps", - ] +def generate_libtorch_matrix(os: str, abi_version: str, + arches: Optional[List[str]] = None, + libtorch_variants: Optional[List[str]] = None) -> List[Dict[str, str]]: + if arches is None: + arches = ["cpu"] + if os == "linux": + arches += CUDA_ARCHES + arches += ROCM_ARCHES + elif os == "windows": + # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 + arches += list_without(CUDA_ARCHES, ["10.2"]) + + if libtorch_variants is None: + libtorch_variants = [ + "shared-with-deps", + "shared-without-deps", + "static-with-deps", + "static-without-deps", + ] + ret: List[Dict[str, str]] = [] - arches = ["cpu"] - if os == "linux": - arches += CUDA_ARCHES - elif os == "windows": - # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 - arches += list_without(CUDA_ARCHES, ["10.2"]) for arch_version in arches: for libtorch_variant in libtorch_variants: - # We don't currently build libtorch for rocm # one of the values in the following list must be exactly # CXX11_ABI, but the precise value of the other one doesn't # matter @@ -156,19 +170,29 @@ def generate_libtorch_matrix(os: str, abi_version: str) -> List[Dict[str, str]]: return ret 
-def generate_wheels_matrix(os: str) -> List[Dict[str, str]]: - arches = ["cpu"] +def generate_wheels_matrix(os: str, + arches: Optional[List[str]] = None, + python_versions: Optional[List[str]] = None) -> List[Dict[str, str]]: package_type = "wheel" - python_versions = FULL_PYTHON_VERSIONS if os == "linux": - arches += CUDA_ARCHES + ROCM_ARCHES # NOTE: We only build manywheel packages for linux package_type = "manywheel" - elif os == "windows": - # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 - arches += list_without(CUDA_ARCHES, ["10.2"]) - elif os == "macos-arm64": - python_versions = list_without(python_versions, ["3.7"]) + + if python_versions is None: + # Define default python version + python_versions = FULL_PYTHON_VERSIONS + if os == "macos-arm64": + python_versions = list_without(python_versions, ["3.7"]) + + if arches is None: + # Define default compute archivectures + arches = ["cpu"] + if os == "linux": + arches += CUDA_ARCHES + ROCM_ARCHES + elif os == "windows": + # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 + arches += list_without(CUDA_ARCHES, ["10.2"]) + ret: List[Dict[str, str]] = [] for python_version in python_versions: for arch_version in arches: diff --git a/.github/scripts/generate_ci_workflows.py b/.github/scripts/generate_ci_workflows.py index dab955d3596e57..c8b815bf018036 100755 --- a/.github/scripts/generate_ci_workflows.py +++ b/.github/scripts/generate_ci_workflows.py @@ -2,10 +2,10 @@ from dataclasses import asdict, dataclass, field from pathlib import Path -from typing import Dict, Set, List, Iterable, Any +from typing import Dict, Set, List, Iterable import jinja2 -import json + import os import sys from typing_extensions import Literal, TypedDict @@ -14,88 +14,15 @@ Arch = Literal["windows", "linux", "macos"] -DOCKER_REGISTRY = "308535385114.dkr.ecr.us-east-1.amazonaws.com" GITHUB_DIR = Path(__file__).resolve().parent.parent -WINDOWS_CPU_TEST_RUNNER = "windows.4xlarge" -# contains 1 gpu -WINDOWS_CUDA_TEST_RUNNER = "windows.8xlarge.nvidia.gpu" -WINDOWS_RUNNERS = { - WINDOWS_CPU_TEST_RUNNER, - WINDOWS_CUDA_TEST_RUNNER, -} - -LINUX_CPU_TEST_RUNNER = "linux.2xlarge" -# contains 1 gpu -LINUX_CUDA_TEST_RUNNER = "linux.4xlarge.nvidia.gpu" -# contains at least 2 gpus -LINUX_ROCM_TEST_RUNNER = "linux.rocm.gpu" -LINUX_RUNNERS = { - LINUX_CPU_TEST_RUNNER, - LINUX_CUDA_TEST_RUNNER, - LINUX_ROCM_TEST_RUNNER, -} - -LINUX_DISTRIBUTED_GPU_RUNNERS = { - LINUX_CUDA_TEST_RUNNER : "linux.8xlarge.nvidia.gpu", - LINUX_ROCM_TEST_RUNNER : LINUX_ROCM_TEST_RUNNER, -} - -LINUX_MULTIGPU_RUNNERS = { - LINUX_CUDA_TEST_RUNNER : "linux.16xlarge.nvidia.gpu", - LINUX_ROCM_TEST_RUNNER : LINUX_ROCM_TEST_RUNNER, -} - -MACOS_TEST_RUNNER_10_15 = "macos-10.15" -MACOS_TEST_RUNNER_11 = "macos-11" - -MACOS_RUNNERS = { - MACOS_TEST_RUNNER_10_15, - MACOS_TEST_RUNNER_11, -} - -CUDA_RUNNERS = { - WINDOWS_CUDA_TEST_RUNNER, - LINUX_CUDA_TEST_RUNNER, -} -ROCM_RUNNERS = { - LINUX_ROCM_TEST_RUNNER, -} -CPU_RUNNERS = { - WINDOWS_CPU_TEST_RUNNER, - LINUX_CPU_TEST_RUNNER, -} - -LABEL_CIFLOW_ALL = "ciflow/all" -LABEL_CIFLOW_BAZEL = "ciflow/bazel" -LABEL_CIFLOW_CPU = "ciflow/cpu" -LABEL_CIFLOW_CUDA = "ciflow/cuda" -LABEL_CIFLOW_ROCM = "ciflow/rocm" -LABEL_CIFLOW_DOCS = "ciflow/docs" -LABEL_CIFLOW_DEFAULT = "ciflow/default" -LABEL_CIFLOW_LIBTORCH = "ciflow/libtorch" -LABEL_CIFLOW_LINUX = "ciflow/linux" -LABEL_CIFLOW_MOBILE = "ciflow/mobile" -LABEL_CIFLOW_ANDROID = "ciflow/android" -LABEL_CIFLOW_SANITIZERS = "ciflow/sanitizers" 
-LABEL_CIFLOW_ONNX = "ciflow/onnx" -LABEL_CIFLOW_SCHEDULED = "ciflow/scheduled" -LABEL_CIFLOW_SLOW = "ciflow/slow" -LABEL_CIFLOW_WIN = "ciflow/win" -LABEL_CIFLOW_XLA = "ciflow/xla" -LABEL_CIFLOW_NOARCH = "ciflow/noarch" -LABEL_CIFLOW_VULKAN = "ciflow/vulkan" -LABEL_CIFLOW_PREFIX = "ciflow/" -LABEL_CIFLOW_SLOW_GRADCHECK = "ciflow/slow-gradcheck" -LABEL_CIFLOW_DOCKER = "ciflow/docker" -LABEL_CIFLOW_IOS = "ciflow/ios" -LABEL_CIFLOW_MACOS = "ciflow/macos" LABEL_CIFLOW_TRUNK = "ciflow/trunk" +LABEL_CIFLOW_ALL = "ciflow/all" LABEL_CIFLOW_BINARIES = "ciflow/binaries" -LABEL_CIFLOW_BINARIES_WHEEL = "ciflow/binaries_wheel" -LABEL_CIFLOW_BINARIES_CONDA = "ciflow/binaries_conda" +LABEL_CIFLOW_PERIODIC = "ciflow/periodic" LABEL_CIFLOW_BINARIES_LIBTORCH = "ciflow/binaries_libtorch" - +LABEL_CIFLOW_BINARIES_CONDA = "ciflow/binaries_conda" +LABEL_CIFLOW_BINARIES_WHEEL = "ciflow/binaries_wheel" @dataclass class CIFlowConfig: @@ -108,245 +35,13 @@ class CIFlowConfig: def __post_init__(self) -> None: if not self.isolated_workflow: self.labels.add(LABEL_CIFLOW_ALL) - if LABEL_CIFLOW_SCHEDULED not in self.labels: + if LABEL_CIFLOW_PERIODIC not in self.labels: self.labels.add(LABEL_CIFLOW_TRUNK) - assert all(label.startswith(LABEL_CIFLOW_PREFIX) for label in self.labels) - - -@dataclass -class CIFlowRuleset: - version = 'v1' - output_file = f'{GITHUB_DIR}/generated-ciflow-ruleset.json' - label_rules: Dict[str, Set[str]] = field(default_factory=dict) - - def add_label_rule(self, labels: Set[str], workflow_name: str) -> None: - for label in labels: - if label in self.label_rules: - self.label_rules[label].add(workflow_name) - else: - self.label_rules[label] = {workflow_name} - - def generate_json(self) -> None: - GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file - output = { - "__comment": f"@{GENERATED} DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py", - "version": self.version, - "label_rules": { - label: sorted(list(workflows)) - for label, workflows in self.label_rules.items() - } - } - with open(self.output_file, 'w') as outfile: - json.dump(output, outfile, indent=2, sort_keys=True) - outfile.write('\n') - class Config(TypedDict): num_shards: int runner: str - -@dataclass -class CIWorkflow: - # Required fields - arch: Arch - build_environment: str - - # Optional fields - test_runner_type: str = '' - multigpu_runner_type: str = '' - distributed_gpu_runner_type: str = '' - ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig) - cuda_version: str = '' - docker_image_base: str = '' - enable_doc_jobs: bool = False - exclude_test: bool = False - build_generates_artifacts: bool = True - build_with_debug: bool = False - is_scheduled: str = '' - is_default: bool = False - on_pull_request: bool = False - num_test_shards: int = 1 - timeout_after: int = 240 - xcode_version: str = '' - ios_arch: str = '' - ios_platform: str = '' - test_jobs: Any = field(default_factory=list) - - enable_default_test: bool = True - enable_jit_legacy_test: bool = False - enable_distributed_test: bool = True - enable_multigpu_test: bool = False - enable_nogpu_no_avx_test: bool = False - enable_nogpu_no_avx2_test: bool = False - enable_slow_test: bool = False - enable_docs_test: bool = False - enable_backwards_compat_test: bool = False - enable_xla_test: bool = False - enable_noarch_test: bool = False - enable_force_on_cpu_test: bool = False - - def __post_init__(self) -> None: - if not self.build_generates_artifacts: - 
self.exclude_test = True - - self.multigpu_runner_type = LINUX_MULTIGPU_RUNNERS.get(self.test_runner_type, "linux.16xlarge.nvidia.gpu") - self.distributed_gpu_runner_type = LINUX_DISTRIBUTED_GPU_RUNNERS.get(self.test_runner_type, "linux.8xlarge.nvidia.gpu") - - if LABEL_CIFLOW_DEFAULT in self.ciflow_config.labels: - self.is_default = True - - if self.is_default: - self.on_pull_request = True - - self.test_jobs = self._gen_test_jobs() - self.assert_valid() - - def assert_valid(self) -> None: - err_message = f"invalid test_runner_type for {self.arch}: {self.test_runner_type}" - if self.arch == 'linux': - assert self.test_runner_type in LINUX_RUNNERS, err_message - if self.arch == 'windows': - assert self.test_runner_type in WINDOWS_RUNNERS, err_message - - if not self.ciflow_config.isolated_workflow: - assert LABEL_CIFLOW_ALL in self.ciflow_config.labels - if self.arch == 'linux': - assert LABEL_CIFLOW_LINUX in self.ciflow_config.labels - if self.arch == 'windows': - assert LABEL_CIFLOW_WIN in self.ciflow_config.labels - if self.arch == 'macos': - assert LABEL_CIFLOW_MACOS in self.ciflow_config.labels - # Make sure that jobs with tests have a test_runner_type - if not self.exclude_test: - assert self.test_runner_type != '' - if self.test_runner_type in CUDA_RUNNERS: - assert LABEL_CIFLOW_CUDA in self.ciflow_config.labels - if self.test_runner_type in ROCM_RUNNERS: - assert LABEL_CIFLOW_ROCM in self.ciflow_config.labels - if self.test_runner_type in CPU_RUNNERS and not self.exclude_test: - assert LABEL_CIFLOW_CPU in self.ciflow_config.labels - if self.is_scheduled: - assert LABEL_CIFLOW_DEFAULT not in self.ciflow_config.labels - assert LABEL_CIFLOW_TRUNK not in self.ciflow_config.labels - assert LABEL_CIFLOW_SCHEDULED in self.ciflow_config.labels - if self.build_with_debug: - assert self.build_environment.endswith("-debug") - - def generate_workflow_file(self, workflow_template: jinja2.Template) -> None: - output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}.yml" - with open(output_file_path, "w") as output_file: - GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file - output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"]) - try: - content = workflow_template.render(asdict(self)) - except Exception as e: - print(f"Failed on template: {workflow_template}", file=sys.stderr) - raise e - output_file.write(content) - if content[-1] != "\n": - output_file.write("\n") - print(output_file_path) - - def normalized_build_environment(self, suffix: str) -> str: - return self.build_environment.replace(".", "_") + suffix - - def _gen_test_jobs(self) -> Any: - if self.arch == "linux": - MULTIGPU_RUNNER_TYPE = "linux.16xlarge.nvidia.gpu" - DISTRIBUTED_GPU_RUNNER_TYPE = "linux.8xlarge.nvidia.gpu" - NOGPU_RUNNER_TYPE = "linux.2xlarge" - elif self.arch == "windows": - DISTRIBUTED_GPU_RUNNER_TYPE = self.test_runner_type - NOGPU_RUNNER_TYPE = "windows.4xlarge" - - test_jobs = [] - - configs: Dict[str, Config] = {} - if self.enable_jit_legacy_test: - configs["jit_legacy"] = {"num_shards": 1, "runner": self.test_runner_type} - if self.enable_multigpu_test: - configs["multigpu"] = {"num_shards": 1, "runner": MULTIGPU_RUNNER_TYPE} - - if self.enable_nogpu_no_avx_test: - configs["nogpu_NO_AVX"] = {"num_shards": 1, "runner": NOGPU_RUNNER_TYPE} - if self.enable_nogpu_no_avx2_test: - configs["nogpu_NO_AVX2"] = {"num_shards": 1, "runner": NOGPU_RUNNER_TYPE} - if self.enable_force_on_cpu_test: - configs["force_on_cpu"] = 
{"num_shards": 1, "runner": NOGPU_RUNNER_TYPE} - if self.enable_distributed_test: - configs["distributed"] = { - "num_shards": 1, - "runner": DISTRIBUTED_GPU_RUNNER_TYPE - if "cuda" in str(self.build_environment) - else self.test_runner_type, - } - if self.enable_slow_test: - configs["slow"] = {"num_shards": 1, "runner": self.test_runner_type} - if self.enable_docs_test: - configs["docs_test"] = {"num_shards": 1, "runner": self.test_runner_type} - if self.enable_backwards_compat_test: - configs["backwards_compat"] = { - "num_shards": 1, - "runner": self.test_runner_type, - } - if self.enable_xla_test: - configs["xla"] = {"num_shards": 1, "runner": self.test_runner_type} - if self.enable_noarch_test: - configs["noarch"] = {"num_shards": 1, "runner": self.test_runner_type} - - for name, config in configs.items(): - for shard in range(1, config["num_shards"] + 1): - test_jobs.append( - { - "id": f"test_{name}_{shard}_{config['num_shards']}", - "name": f"test ({name}, {shard}, {config['num_shards']}, {config['runner']})", - "config": name, - "shard": shard, - "num_shards": config["num_shards"], - "runner": config["runner"], - } - ) - - if self.enable_default_test: - for shard in range(1, self.num_test_shards + 1): - test_jobs.append( - { - "id": f"test_default_{shard}_{self.num_test_shards}", - "name": f"test (default, {shard}, {self.num_test_shards}, {self.test_runner_type})", - "config": "default", - "shard": shard, - "num_shards": self.num_test_shards, - "runner": self.test_runner_type, - } - ) - return test_jobs - -@dataclass -class DockerWorkflow: - build_environment: str - docker_images: List[str] - - # Optional fields - ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig) - cuda_version: str = '' - is_scheduled: str = '' - - def generate_workflow_file(self, workflow_template: jinja2.Template) -> None: - output_file_path = GITHUB_DIR / "workflows/generated-docker-builds.yml" - with open(output_file_path, "w") as output_file: - GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file - output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"]) - try: - content = workflow_template.render(asdict(self)) - except Exception as e: - print(f"Failed on template: {workflow_template}", file=sys.stderr) - raise e - output_file.write(content) - if content[-1] != "\n": - output_file.write("\n") - print(output_file_path) - @dataclass class BinaryBuildWorkflow: os: str @@ -358,6 +53,7 @@ class BinaryBuildWorkflow: abi_version: str = '' ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig) is_scheduled: str = '' + branches: str = 'nightly' # Mainly for macos cross_compile_arm64: bool = False xcode_version: str = '' @@ -369,7 +65,7 @@ def __post_init__(self) -> None: self.build_environment = f"{self.os}-binary-{self.package_type}" def generate_workflow_file(self, workflow_template: jinja2.Template) -> None: - output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}.yml" + output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}-{self.branches}.yml" with open(output_file_path, "w") as output_file: GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"]) @@ -383,533 +79,6 @@ def generate_workflow_file(self, workflow_template: jinja2.Template) -> None: output_file.write("\n") print(output_file_path) -WINDOWS_WORKFLOWS = [ - CIWorkflow( - 
arch="windows", - build_environment="win-vs2019-cpu-py3", - cuda_version="cpu", - enable_distributed_test=False, - test_runner_type=WINDOWS_CPU_TEST_RUNNER, - num_test_shards=2, - ciflow_config=CIFlowConfig( - run_on_canary=True, - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CPU, LABEL_CIFLOW_WIN} - ), - ), - CIWorkflow( - arch="windows", - build_environment="win-vs2019-cuda11.3-py3", - cuda_version="11.3", - enable_distributed_test=False, - test_runner_type=WINDOWS_CUDA_TEST_RUNNER, - num_test_shards=2, - enable_force_on_cpu_test=True, - # TODO: Revert back to default value after https://github.com/pytorch/pytorch/issues/73489 is closed - timeout_after=270, - ciflow_config=CIFlowConfig( - run_on_canary=True, - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN} - ), - ), - CIWorkflow( - arch="windows", - build_environment="periodic-win-vs2019-cuda11.5-py3", - cuda_version="11.5", - enable_distributed_test=False, - test_runner_type=WINDOWS_CUDA_TEST_RUNNER, - num_test_shards=2, - enable_force_on_cpu_test=True, - is_scheduled="45 4,10,16,22 * * *", - ciflow_config=CIFlowConfig( - run_on_canary=True, - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN} - ), - ), -] - -LINUX_WORKFLOWS = [ - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3.7-gcc5.4", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4", - test_runner_type=LINUX_CPU_TEST_RUNNER, - enable_jit_legacy_test=True, - enable_backwards_compat_test=True, - enable_docs_test=True, - num_test_shards=2, - ciflow_config=CIFlowConfig( - run_on_canary=True, - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU} - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-docs", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4", - test_runner_type=LINUX_CPU_TEST_RUNNER, - enable_doc_jobs=True, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_DOCS, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU} - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-docs-push", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4", - test_runner_type=LINUX_CPU_TEST_RUNNER, - enable_doc_jobs=True, - exclude_test=True, - is_scheduled="0 0 * * *", # run pushes only on a nightly schedule - # NOTE: This is purposefully left without LABEL_CIFLOW_DOCS so that you can run - # docs builds on your PR without the fear of anything pushing - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU} - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3.7-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc7", - test_runner_type=LINUX_CPU_TEST_RUNNER, - num_test_shards=2, - ciflow_config=CIFlowConfig( - run_on_canary=True, - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU} - ), - ), - # ParallelTBB does not have a maintainer and is currently flaky - # CIWorkflow( - # arch="linux", - # build_environment="paralleltbb-linux-xenial-py3.6-gcc5.4", - # docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4", - # test_runner_type=LINUX_CPU_TEST_RUNNER, - # ciflow_config=CIFlowConfig( - # labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}, - # ), - # ), - CIWorkflow( - arch="linux", - build_environment="parallelnative-linux-xenial-py3.7-gcc5.4", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4", - 
test_runner_type=LINUX_CPU_TEST_RUNNER, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}, - ), - ), - # Build PyTorch with BUILD_CAFFE2=ON - CIWorkflow( - arch="linux", - build_environment="caffe2-linux-xenial-py3.7-gcc5.4", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4", - test_runner_type=LINUX_CPU_TEST_RUNNER, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3-clang5-mobile-build", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan", - test_runner_type=LINUX_CPU_TEST_RUNNER, - build_generates_artifacts=False, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_MOBILE, LABEL_CIFLOW_DEFAULT}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3-clang5-mobile-custom-build-static", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c", - test_runner_type=LINUX_CPU_TEST_RUNNER, - build_generates_artifacts=False, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_MOBILE, LABEL_CIFLOW_DEFAULT}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4", - test_runner_type=LINUX_CPU_TEST_RUNNER, - build_generates_artifacts=False, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_MOBILE, LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_CPU}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3.7-clang7-asan", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-asan", - test_runner_type=LINUX_CPU_TEST_RUNNER, - num_test_shards=3, - timeout_after=300, - enable_distributed_test=False, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_SANITIZERS, LABEL_CIFLOW_CPU}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3.7-clang7-onnx", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx", - test_runner_type=LINUX_CPU_TEST_RUNNER, - num_test_shards=2, - enable_distributed_test=False, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_ONNX, LABEL_CIFLOW_CPU}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-bionic-cuda10.2-py3.9-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - enable_jit_legacy_test=True, - enable_multigpu_test=True, - enable_nogpu_no_avx_test=True, - enable_nogpu_no_avx2_test=True, - enable_slow_test=True, - num_test_shards=2, - ciflow_config=CIFlowConfig( - run_on_canary=True, - labels={LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA} - ), - ), - CIWorkflow( - arch="linux", - build_environment="libtorch-linux-xenial-cuda10.2-py3.7-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - build_generates_artifacts=False, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]), - ), - ), - CIWorkflow( - arch="linux", - 
build_environment="periodic-linux-bionic-cuda11.5-py3.7-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - num_test_shards=2, - is_scheduled="45 4,10,16,22 * * *", - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]), - ), - ), - CIWorkflow( - arch="linux", - build_environment="periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - build_generates_artifacts=False, - is_scheduled="45 4,10,16,22 * * *", - exclude_test=True, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]), - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-cuda11.3-py3.7-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - num_test_shards=2, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]), - ), - ), - # no-ops builds test USE_PER_OPERATOR_HEADERS=0 where ATen/ops is not generated - CIWorkflow( - arch="linux", - build_environment="linux-xenial-cuda11.3-py3.7-gcc7-no-ops", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]), - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-xenial-py3.7-gcc7-no-ops", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc7", - test_runner_type=LINUX_CPU_TEST_RUNNER, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU]), - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-bionic-rocm4.5-py3.7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm4.5-py3.7", - test_runner_type=LINUX_ROCM_TEST_RUNNER, - num_test_shards=2, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_ROCM]), - ), - ), - CIWorkflow( - arch="linux", - build_environment="libtorch-linux-xenial-cuda11.3-py3.7-gcc7", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - build_generates_artifacts=False, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]), - ), - ), - CIWorkflow( - arch="linux", - build_environment="periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - num_test_shards=2, - build_with_debug=True, - is_scheduled="45 0,4,8,12,16,20 * * *", - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA} - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-bionic-py3.7-clang9", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.7-clang9", - test_runner_type=LINUX_CPU_TEST_RUNNER, - num_test_shards=2, - enable_distributed_test=False, - enable_noarch_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, 
LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_NOARCH}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="linux-vulkan-bionic-py3.7-clang9", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.7-clang9", - test_runner_type=LINUX_CPU_TEST_RUNNER, - num_test_shards=1, - enable_distributed_test=False, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_VULKAN}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7", - test_runner_type=LINUX_CUDA_TEST_RUNNER, - num_test_shards=2, - enable_distributed_test=False, - timeout_after=360, - # Only run this on master 4 times per day since it does take a while - is_scheduled="0 */4 * * *", - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_SLOW_GRADCHECK, LABEL_CIFLOW_SLOW, LABEL_CIFLOW_SCHEDULED}, - ), - ), -] - -XLA_WORKFLOWS = [ - CIWorkflow( - arch="linux", - build_environment="pytorch-xla-linux-bionic-py3.7-clang8", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/xla_base", - test_runner_type=LINUX_CPU_TEST_RUNNER, - enable_distributed_test=False, - enable_xla_test=True, - enable_default_test=False, - on_pull_request=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_XLA}, - ), - ), - -] - -ANDROID_SHORT_WORKFLOWS = [ - CIWorkflow( - arch="linux", - build_environment="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c", - test_runner_type=LINUX_CPU_TEST_RUNNER, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_ANDROID, LABEL_CIFLOW_DEFAULT}, - ), - ), - CIWorkflow( - arch="linux", - build_environment="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c", - test_runner_type=LINUX_CPU_TEST_RUNNER, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_ANDROID, LABEL_CIFLOW_DEFAULT}, - ), - ), -] - -ANDROID_WORKFLOWS = [ - CIWorkflow( - arch="linux", - build_environment="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c", - test_runner_type=LINUX_CPU_TEST_RUNNER, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_ANDROID}, - ), - ), -] - -BAZEL_WORKFLOWS = [ - CIWorkflow( - arch="linux", - build_environment="linux-xenial-cuda11.3-py3.7-gcc7-bazel-test", - docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7", - test_runner_type=LINUX_CPU_TEST_RUNNER, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BAZEL, LABEL_CIFLOW_CPU, LABEL_CIFLOW_LINUX}, - ), - ), -] - -IOS_WORKFLOWS = [ - CIWorkflow( - arch="macos", - build_environment="ios-12-5-1-arm64", - ios_arch="arm64", - ios_platform="OS", - test_runner_type=MACOS_TEST_RUNNER_10_15, - is_scheduled="45 4,10,16,22 * * *", - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - 
arch="macos", - build_environment="ios-12-5-1-arm64-coreml", - ios_arch="arm64", - ios_platform="OS", - test_runner_type=MACOS_TEST_RUNNER_10_15, - is_scheduled="45 4,10,16,22 * * *", - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - arch="macos", - build_environment="ios-12-5-1-arm64-custom-ops", - ios_arch="arm64", - ios_platform="OS", - test_runner_type=MACOS_TEST_RUNNER_10_15, - is_scheduled="45 4,10,16,22 * * *", - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - arch="macos", - build_environment="ios-12-5-1-arm64-metal", - ios_arch="arm64", - ios_platform="OS", - test_runner_type=MACOS_TEST_RUNNER_10_15, - is_scheduled="45 4,10,16,22 * * *", - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - arch="macos", - build_environment="ios-12-5-1-x86-64", - ios_arch="x86_64", - ios_platform="SIMULATOR", - test_runner_type=MACOS_TEST_RUNNER_10_15, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - arch="macos", - build_environment="ios-12-5-1-x86-64-coreml", - ios_arch="x86_64", - ios_platform="SIMULATOR", - test_runner_type=MACOS_TEST_RUNNER_10_15, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS}, - ), - ), -] - -MACOS_WORKFLOWS = [ - # Distributed tests are still run on MacOS, but part of regular shards - CIWorkflow( - arch="macos", - build_environment="macos-11-py3-x86-64", - xcode_version="12.4", - test_runner_type=MACOS_TEST_RUNNER_11, - num_test_shards=2, - enable_distributed_test=False, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - arch="macos", - build_environment="macos-10-15-py3-lite-interpreter-x86-64", - xcode_version="12", - test_runner_type=MACOS_TEST_RUNNER_10_15, - exclude_test=True, - build_generates_artifacts=False, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_MACOS}, - ), - ), - CIWorkflow( - arch="macos", - build_environment="macos-10-15-py3-arm64", - test_runner_type=MACOS_TEST_RUNNER_10_15, - exclude_test=True, - ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_MACOS}, - ), - ), -] - -DOCKER_IMAGES = { - f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm4.3.1-py3.7", # for rocm - f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm4.5-py3.7", # for rocm -} - -DOCKER_IMAGES.update({ - workflow.docker_image_base - for workflow in [*LINUX_WORKFLOWS, *BAZEL_WORKFLOWS, *ANDROID_WORKFLOWS] - if workflow.docker_image_base -}) - -DOCKER_WORKFLOWS = [ - DockerWorkflow( - build_environment="docker-builds", - docker_images=sorted(DOCKER_IMAGES), - # Run every Wednesday at 3:01am to ensure they can build - is_scheduled="1 3 * * 3", - ), -] - class OperatingSystem: LINUX = "linux" WINDOWS = "windows" @@ -922,7 +91,7 @@ class OperatingSystem: package_type="manywheel", build_configs=generate_binary_build_matrix.generate_wheels_matrix(OperatingSystem.LINUX), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, isolated_workflow=True, ), ), @@ -931,7 +100,7 @@ class OperatingSystem: package_type="conda", build_configs=generate_binary_build_matrix.generate_conda_matrix(OperatingSystem.LINUX), 
ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, isolated_workflow=True, ), ), @@ -943,7 +112,7 @@ class OperatingSystem: OperatingSystem.LINUX, generate_binary_build_matrix.CXX11_ABI ), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, isolated_workflow=True, ), ), @@ -955,33 +124,65 @@ class OperatingSystem: OperatingSystem.LINUX, generate_binary_build_matrix.PRE_CXX11_ABI ), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, isolated_workflow=True, ), ), ] +LINUX_BINARY_SMOKE_WORKFLOWS = [ + BinaryBuildWorkflow( + os=OperatingSystem.LINUX, + package_type="manywheel", + build_configs=generate_binary_build_matrix.generate_wheels_matrix( + OperatingSystem.LINUX, + arches=["10.2"], + python_versions=["3.7"]), + branches="master", + ), + BinaryBuildWorkflow( + os=OperatingSystem.LINUX, + package_type="libtorch", + abi_version=generate_binary_build_matrix.CXX11_ABI, + build_configs=generate_binary_build_matrix.generate_libtorch_matrix( + OperatingSystem.LINUX, generate_binary_build_matrix.CXX11_ABI, + arches=["cpu"], + libtorch_variants=["shared-with-deps"], + ), + branches="master", + ), + BinaryBuildWorkflow( + os=OperatingSystem.LINUX, + package_type="libtorch", + abi_version=generate_binary_build_matrix.PRE_CXX11_ABI, + build_configs=generate_binary_build_matrix.generate_libtorch_matrix( + OperatingSystem.LINUX, generate_binary_build_matrix.CXX11_ABI, + arches=["cpu"], + libtorch_variants=["shared-with-deps"], + ), + branches="master", + ), +] + WINDOWS_BINARY_BUILD_WORKFLOWS = [ BinaryBuildWorkflow( os=OperatingSystem.WINDOWS, package_type="wheel", build_configs=generate_binary_build_matrix.generate_wheels_matrix(OperatingSystem.WINDOWS), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, + isolated_workflow=True, + ), + ), + BinaryBuildWorkflow( + os=OperatingSystem.WINDOWS, + package_type="conda", + build_configs=generate_binary_build_matrix.generate_conda_matrix(OperatingSystem.WINDOWS), + ciflow_config=CIFlowConfig( + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, isolated_workflow=True, ), ), - # NOTE: conda binaries are currently bugged on the installation step - # See, https://github.com/pytorch/pytorch/pull/71484#issuecomment-1022617195 - # BinaryBuildWorkflow( - # os=OperatingSystem.WINDOWS, - # package_type="conda", - # build_configs=generate_binary_build_matrix.generate_conda_matrix(OperatingSystem.WINDOWS), - # ciflow_config=CIFlowConfig( - # labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, - # isolated_workflow=True, - # ), - # ), BinaryBuildWorkflow( os=OperatingSystem.WINDOWS, package_type="libtorch", @@ -990,7 +191,7 @@ class OperatingSystem: OperatingSystem.WINDOWS, generate_binary_build_matrix.RELEASE ), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, isolated_workflow=True, ), ), @@ -1002,11 +203,44 @@ class OperatingSystem: OperatingSystem.WINDOWS, generate_binary_build_matrix.DEBUG 
), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, isolated_workflow=True, ), ), ] +WINDOWS_BINARY_SMOKE_WORKFLOWS = [ + BinaryBuildWorkflow( + os=OperatingSystem.WINDOWS, + package_type="wheel", + build_configs=generate_binary_build_matrix.generate_wheels_matrix( + OperatingSystem.WINDOWS, + arches=["11.3"], + python_versions=["3.7"]), + branches="master", + ), + BinaryBuildWorkflow( + os=OperatingSystem.WINDOWS, + package_type="libtorch", + abi_version=generate_binary_build_matrix.RELEASE, + build_configs=generate_binary_build_matrix.generate_libtorch_matrix( + OperatingSystem.WINDOWS, generate_binary_build_matrix.RELEASE, + arches=["cpu"], + libtorch_variants=["shared-with-deps"], + ), + branches="master", + ), + BinaryBuildWorkflow( + os=OperatingSystem.WINDOWS, + package_type="libtorch", + abi_version=generate_binary_build_matrix.DEBUG, + build_configs=generate_binary_build_matrix.generate_libtorch_matrix( + OperatingSystem.WINDOWS, generate_binary_build_matrix.DEBUG, + arches=["cpu"], + libtorch_variants=["shared-with-deps"], + ), + branches="master", + ), +] MACOS_BINARY_BUILD_WORKFLOWS = [ BinaryBuildWorkflow( @@ -1014,7 +248,7 @@ class OperatingSystem: package_type="wheel", build_configs=generate_binary_build_matrix.generate_wheels_matrix(OperatingSystem.MACOS), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, isolated_workflow=True, ), ), @@ -1023,7 +257,7 @@ class OperatingSystem: package_type="conda", build_configs=generate_binary_build_matrix.generate_conda_matrix(OperatingSystem.MACOS), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, isolated_workflow=True, ), ), @@ -1035,7 +269,7 @@ class OperatingSystem: OperatingSystem.MACOS, generate_binary_build_matrix.CXX11_ABI ), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, isolated_workflow=True, ), ), @@ -1047,7 +281,7 @@ class OperatingSystem: OperatingSystem.MACOS, generate_binary_build_matrix.PRE_CXX11_ABI ), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH}, isolated_workflow=True, ), ), @@ -1057,7 +291,7 @@ class OperatingSystem: build_configs=generate_binary_build_matrix.generate_wheels_matrix(OperatingSystem.MACOS), cross_compile_arm64=True, ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, isolated_workflow=True, ), ), @@ -1067,7 +301,7 @@ class OperatingSystem: cross_compile_arm64=True, build_configs=generate_binary_build_matrix.generate_conda_matrix(OperatingSystem.MACOS_ARM64), ciflow_config=CIFlowConfig( - labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, + labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_CONDA}, isolated_workflow=True, ), ), @@ -1079,18 +313,13 @@ def main() -> None: loader=jinja2.FileSystemLoader(str(GITHUB_DIR.joinpath("templates"))), undefined=jinja2.StrictUndefined, ) + + # not ported yet 
    template_and_workflows = [
-        (jinja_env.get_template("linux_ci_workflow.yml.j2"), LINUX_WORKFLOWS),
-        (jinja_env.get_template("linux_ci_workflow.yml.j2"), XLA_WORKFLOWS),
-        (jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS),
-        (jinja_env.get_template("bazel_ci_workflow.yml.j2"), BAZEL_WORKFLOWS),
-        (jinja_env.get_template("ios_ci_workflow.yml.j2"), IOS_WORKFLOWS),
-        (jinja_env.get_template("macos_ci_workflow.yml.j2"), MACOS_WORKFLOWS),
-        (jinja_env.get_template("docker_builds_ci_workflow.yml.j2"), DOCKER_WORKFLOWS),
-        (jinja_env.get_template("android_ci_full_workflow.yml.j2"), ANDROID_WORKFLOWS),
-        (jinja_env.get_template("android_ci_workflow.yml.j2"), ANDROID_SHORT_WORKFLOWS),
         (jinja_env.get_template("linux_binary_build_workflow.yml.j2"), LINUX_BINARY_BUILD_WORFKLOWS),
+        (jinja_env.get_template("linux_binary_build_workflow.yml.j2"), LINUX_BINARY_SMOKE_WORKFLOWS),
         (jinja_env.get_template("windows_binary_build_workflow.yml.j2"), WINDOWS_BINARY_BUILD_WORKFLOWS),
+        (jinja_env.get_template("windows_binary_build_workflow.yml.j2"), WINDOWS_BINARY_SMOKE_WORKFLOWS),
         (jinja_env.get_template("macos_binary_build_workflow.yml.j2"), MACOS_BINARY_BUILD_WORKFLOWS),
     ]
     # Delete the existing generated files first, this should align with .gitattributes file description.
@@ -1101,16 +330,12 @@ def main() -> None:
         except Exception as e:
             print(f"Error occurred when deleting file {w}: {e}")
-    ciflow_ruleset = CIFlowRuleset()
     for template, workflows in template_and_workflows:
         # added Iterable check to appease the mypy gods
         if not isinstance(workflows, Iterable):
             raise Exception(f"How is workflows not iterable? {workflows}")
         for workflow in workflows:
             workflow.generate_workflow_file(workflow_template=template)
-            ciflow_ruleset.add_label_rule(workflow.ciflow_config.labels, workflow.build_environment)
-    ciflow_ruleset.generate_json()
-

 if __name__ == "__main__":
     main()
diff --git a/.github/scripts/get_workflow_job_id.py b/.github/scripts/get_workflow_job_id.py
new file mode 100644
index 00000000000000..72aed91d55ca96
--- /dev/null
+++ b/.github/scripts/get_workflow_job_id.py
@@ -0,0 +1,60 @@
+# Helper to get the id of the currently running job in a GitHub Actions
+# workflow. GitHub does not provide this information to workflow runs, so we
+# need to figure it out based on what they *do* provide.
+
+import requests
+import os
+import argparse
+
+# Our strategy is to retrieve the parent workflow run, then filter its jobs on
+# RUNNER_NAME to figure out which job we're currently running.
+#
+# Why RUNNER_NAME? Because it's the only thing that uniquely identifies a job within a workflow.
+# GITHUB_JOB doesn't work, as it corresponds to the job yaml id
+# (https://bit.ly/37e78oI), which has two problems:
+# 1. It's not present in the workflow job JSON object, so we can't use it as a filter.
+# 2. It isn't unique; for matrix jobs the job yaml id is the same for all jobs in the matrix.
+#
+# RUNNER_NAME on the other hand is unique across the pool of runners. Also,
+# since only one job can be scheduled on a runner at a time, we know that
+# looking for RUNNER_NAME will uniquely identify the job we're currently
+# running.
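# A usage sketch (an assumed invocation for illustration, not taken from this
# patch): with GITHUB_TOKEN exported in the job's environment, a workflow step
# could run
#
#   python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
#
# and capture the job id the script prints on stdout; the script exits with
# status 1 if no job on that runner is found in the workflow run.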
+parser = argparse.ArgumentParser() +parser.add_argument( + "workflow_run_id", help="The id of the workflow run, should be GITHUB_RUN_ID" +) +parser.add_argument( + "runner_name", + help="The name of the runner to retrieve the job id, should be RUNNER_NAME", +) + +args = parser.parse_args() + + +PYTORCH_REPO = "https://api.github.com/repos/pytorch/pytorch" +GITHUB_TOKEN = os.environ["GITHUB_TOKEN"] +REQUEST_HEADERS = { + "Accept": "application/vnd.github.v3+json", + "Authorization": "token " + GITHUB_TOKEN, +} + +response = requests.get( + f"{PYTORCH_REPO}/actions/runs/{args.workflow_run_id}/jobs?per_page=100", + headers=REQUEST_HEADERS, +) + +jobs = response.json()["jobs"] +while "next" in response.links.keys(): + response = requests.get(response.links["next"]["url"], headers=REQUEST_HEADERS) + jobs.extend(response.json()["jobs"]) + +# Sort the jobs list by start time, in descending order. We want to get the most +# recently scheduled job on the runner. +jobs.sort(key=lambda job: job["started_at"], reverse=True) + +for job in jobs: + if job["runner_name"] == args.runner_name: + print(job["id"]) + exit(0) + +exit(1) diff --git a/.github/scripts/gitutils.py b/.github/scripts/gitutils.py index 7d5d24f7963043..d8d4e8f7cd8592 100644 --- a/.github/scripts/gitutils.py +++ b/.github/scripts/gitutils.py @@ -1,10 +1,11 @@ #!/usr/bin/env python3 +import os +import re +import tempfile from collections import defaultdict from datetime import datetime from typing import cast, Any, Dict, Iterator, List, Optional, Tuple, Union -import os -import re RE_GITHUB_URL_MATCH = re.compile("^https://.*@?github.com/(.+)/(.+)$") @@ -30,9 +31,9 @@ def fuzzy_list_to_dict(items: List[Tuple[str, str]]) -> Dict[str, List[str]]: def _check_output(items: List[str], encoding: str = "utf-8") -> str: - from subprocess import check_output, CalledProcessError + from subprocess import check_output, CalledProcessError, STDOUT try: - return check_output(items).decode(encoding) + return check_output(items, stderr=STDOUT).decode(encoding) except CalledProcessError as e: msg = f"Command `{' '.join(e.cmd)}` returned non-zero exit code {e.returncode}" stdout = e.stdout.decode(encoding) if e.stdout is not None else "" @@ -129,8 +130,13 @@ def current_branch(self) -> str: def checkout(self, branch: str) -> None: self._run_git("checkout", branch) - def fetch(self, ref: str, branch: str) -> None: - self._run_git("fetch", self.remote, f"{ref}:{branch}") + def fetch(self, ref: Optional[str] = None, branch: Optional[str] = None) -> None: + if branch is None and ref is None: + self._run_git("fetch", self.remote) + elif branch is None: + self._run_git("fetch", self.remote, ref) + else: + self._run_git("fetch", self.remote, f"{ref}:{branch}") def show_ref(self, name: str) -> str: refs = self._run_git('show-ref', '-s', name).strip().split('\n') @@ -188,8 +194,15 @@ def compute_branch_diffs(self, from_branch: str, to_branch: str) -> Tuple[List[s while len(from_values) > 0 and len(to_values) > 0: frc = self.get_commit(from_values.pop()) toc = self.get_commit(to_values.pop()) + # FRC branch might have PR number added to the title if frc.title != toc.title or frc.author_date != toc.author_date: - raise RuntimeError(f"Unexpected differences between {frc} and {toc}") + # HACK: Same commit were merged, reverted and landed again + # which creates a tracking problem + if ( + "pytorch/pytorch" not in self.remote_url() or + frc.commit_hash != "0a6a1b27a464ba5be5f587cce2ee12ab8c504dbf" + ): + raise RuntimeError(f"Unexpected differences between {frc} and 
{toc}") from_commits.remove(frc.commit_hash) to_commits.remove(toc.commit_hash) continue @@ -212,11 +225,19 @@ def cherry_pick_commits(self, from_branch: str, to_branch: str) -> None: self.cherry_pick(commit) self.checkout(orig_branch) - def push(self, branch: str, dry_run: bool) -> None: - if dry_run: - self._run_git("push", "--dry-run", self.remote, branch) - else: - self._run_git("push", self.remote, branch) + def push(self, branch: str, dry_run: bool, retry: int = 3) -> None: + for cnt in range(retry): + try: + if dry_run: + self._run_git("push", "--dry-run", self.remote, branch) + else: + self._run_git("push", self.remote, branch) + except RuntimeError as e: + # Check if push were rejected because branch is stale + if len(e.args) == 0 or re.search(r"\[rejected\].+\(fetch first\)\n", e.args[0]) is None: + raise + self.fetch() + self._run_git("rebase", f"{self.remote}/{branch}") def head_hash(self) -> str: return self._run_git("show-ref", "--hash", "HEAD").strip() @@ -240,6 +261,12 @@ def amend_commit_message(self, msg: str) -> None: self._run_git("commit", "--amend", "-m", msg) +def clone_repo(username: str, password: str, org: str, project: str) -> GitRepo: + path = tempfile.mkdtemp() + _check_output(['git', 'clone', f'https://{username}:{password}@github.com/{org}/{project}', path]).strip() + return GitRepo(path=path) + + class PeekableIterator(Iterator[str]): def __init__(self, val: str) -> None: self._val = val diff --git a/.github/scripts/gql_mocks.json b/.github/scripts/gql_mocks.json index 16c563eced73c5..123680f97a33e7 100644 --- a/.github/scripts/gql_mocks.json +++ b/.github/scripts/gql_mocks.json @@ -1 +1,9182 @@ -{"query_sha=fea7527b55661c30013cf0ce69b664e4ffe28199ce44b9af994c72288bde5fa0 name=pytorch number=71759 owner=pytorch": {"data": {"repository": {"pullRequest": {"closed": true, "isCrossRepository": true, "author": {"login": "coolteemf"}, "title": "Optimize grid sample 3d", "body": "Fixes #71415\r\nI have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case :\r\n\r\n> Fixes #64977\r\n> \r\n> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). Especially the tensor creation seemed time consuming (see #64977).\r\n> \r\n> Brief description of the changes:\r\n> \r\n> * I have tried to go with rather minimal changes. It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).\r\n> \r\n> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.\r\n> \r\n> * Changed the CPU kernels:\r\n> (1) added `bool input_requires_grad` template parameter to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorAccessor* gInp_slice_ptr` instead of `TensorAccessor& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. 
Perhaps there's a more elegant way to achieve this?)\r\n> \r\n> * Changed CUDA kernel:\r\n> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorInfo()` instead of `getTensorInfo(grad_input)` in case gradient for `input` is not requested.\r\n> \r\n> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.\r\n> \r\n> * Have not touched the CPU fallback kernel.\r\n\r\nNote: the changes number (3) are N/A in this case.\r\n\r\n", "headRefName": "optimize_grid_sample_3d", "headRepository": {"nameWithOwner": "coolteemf/pytorch"}, "baseRefName": "master", "baseRepository": {"nameWithOwner": "pytorch/pytorch", "isPrivate": false, "defaultBranchRef": {"name": "master"}}, "mergeCommit": null, "commits": {"nodes": [{"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "e0b0d1e695aeddceaf265da602c4704592053e9e", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "563ec73747ad53b63b36736c47c4342f962c2a09", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "51abe41a132d9dd5b1c0551bdca902aacc028ff8", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "be9898205992034a00e8ace8a55c2ecdcee2c2f8", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "2929c60b64384c2deae0f7dea8bab94ad4bc9ec8", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "9241b737e7e2b257905cc74ad9c50b737d7f9d0a", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "64d6b795d0636928a8aa2fd3da01302fb5f5f7af", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "4503577e53760a0006f1e80ca6bfe04d2be90470", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "b16f4b11ffbbbf2ca2098f9702af4ef6b6fc5e1f", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "7ffc23368a604afdc92d2818747f730ce31a2bb5", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "b85292604b9ad6c31706b76b5a5498c4f6d94309", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "9d81d7bae8ad91aaa24b3ceab83e3138894dbc69", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "e79f6a2202512b294c55bf4bfb2e0524fafd4c48", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "f683e8aec7aea76097a264eec01511e704c31154", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": 
{"author": {"user": {"login": "coolteemf"}, "email": "67541941+coolteemf@users.noreply.github.com", "name": "Fran\u00e7ois Lecomte"}, "oid": "b932e9e286c22aaf352375186df851ef060b295a", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "ghp_XXXXXX", "name": "coolteemf"}, "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}], "totalCount": 16}, "changedFiles": 9, "files": {"nodes": [{"path": "aten/src/ATen/native/GridSampler.cpp"}, {"path": "aten/src/ATen/native/cpu/GridSamplerKernel.cpp"}, {"path": "aten/src/ATen/native/cuda/GridSampler.cpp"}, {"path": "aten/src/ATen/native/cuda/GridSampler.cu"}, {"path": "aten/src/ATen/native/cuda/GridSampler.h"}, {"path": "aten/src/ATen/native/native_functions.yaml"}, {"path": "test/forward_backward_compatibility/check_forward_backward_compatibility.py"}, {"path": "test/test_nn.py"}, {"path": "tools/autograd/derivatives.yaml"}]}, "reviews": {"nodes": [{"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "APPROVED"}, {"author": {"login": "albanD"}, "state": "APPROVED"}], "totalCount": 17}, "comments": {"nodes": [{"bodyText": "Hey @coolteemf.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": {"login": "github-actions"}, "authorAssociation": "NONE", "editor": null}]}}}}}, "query_sha=1847f597bd535a3d45a5751d69792ce57f4e565713118eee6057e5ee89e17997 name=pytorch number=71759 owner=pytorch": {"data": {"repository": {"pullRequest": {"closed": true, "isCrossRepository": true, "author": {"login": "coolteemf"}, "title": "Optimize grid sample 3d", "body": "Fixes #71415\r\nI have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case :\r\n\r\n> Fixes #64977\r\n> \r\n> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). 
Especially the tensor creation seemed time consuming (see #64977).\r\n> \r\n> Brief description of the changes:\r\n> \r\n> * I have tried to go with rather minimal changes. It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).\r\n> \r\n> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.\r\n> \r\n> * Changed the CPU kernels:\r\n> (1) added `bool input_requires_grad` template parameter to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorAccessor* gInp_slice_ptr` instead of `TensorAccessor& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. Perhaps there's a more elegant way to achieve this?)\r\n> \r\n> * Changed CUDA kernel:\r\n> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorInfo()` instead of `getTensorInfo(grad_input)` in case gradient for `input` is not requested.\r\n> \r\n> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.\r\n> \r\n> * Have not touched the CPU fallback kernel.\r\n\r\nNote: the changes number (3) are N/A in this case.\r\n\r\n", "headRefName": "optimize_grid_sample_3d", "headRepository": {"nameWithOwner": "coolteemf/pytorch"}, "baseRefName": "master", "baseRepository": {"nameWithOwner": "pytorch/pytorch", "isPrivate": false, "defaultBranchRef": {"name": "master"}}, "mergeCommit": null, "commits": {"nodes": [{"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "e0b0d1e695aeddceaf265da602c4704592053e9e", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "563ec73747ad53b63b36736c47c4342f962c2a09", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "51abe41a132d9dd5b1c0551bdca902aacc028ff8", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "be9898205992034a00e8ace8a55c2ecdcee2c2f8", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "2929c60b64384c2deae0f7dea8bab94ad4bc9ec8", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "9241b737e7e2b257905cc74ad9c50b737d7f9d0a", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "64d6b795d0636928a8aa2fd3da01302fb5f5f7af", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "4503577e53760a0006f1e80ca6bfe04d2be90470", "checkSuites": {"nodes": []}}}, {"commit": {"author": 
{"user": null, "email": "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", "name": "coolteemf"}, "oid": "b16f4b11ffbbbf2ca2098f9702af4ef6b6fc5e1f", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "7ffc23368a604afdc92d2818747f730ce31a2bb5", "checkSuites": {"nodes": []}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "b85292604b9ad6c31706b76b5a5498c4f6d94309", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "9d81d7bae8ad91aaa24b3ceab83e3138894dbc69", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "e79f6a2202512b294c55bf4bfb2e0524fafd4c48", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "f683e8aec7aea76097a264eec01511e704c31154", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": {"login": "coolteemf"}, "email": "67541941+coolteemf@users.noreply.github.com", "name": "Fran\u00e7ois Lecomte"}, "oid": "b932e9e286c22aaf352375186df851ef060b295a", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}, {"commit": {"author": {"user": null, "email": "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", "name": "coolteemf"}, "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}], "totalCount": 16}, "changedFiles": 9, "files": {"nodes": [{"path": "aten/src/ATen/native/GridSampler.cpp"}, {"path": "aten/src/ATen/native/cpu/GridSamplerKernel.cpp"}, {"path": "aten/src/ATen/native/cuda/GridSampler.cpp"}, {"path": "aten/src/ATen/native/cuda/GridSampler.cu"}, {"path": "aten/src/ATen/native/cuda/GridSampler.h"}, {"path": "aten/src/ATen/native/native_functions.yaml"}, {"path": "test/forward_backward_compatibility/check_forward_backward_compatibility.py"}, {"path": "test/test_nn.py"}, {"path": "tools/autograd/derivatives.yaml"}], "pageInfo": {"endCursor": "OQ", "hasNextPage": false}}, "reviews": {"nodes": [{"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "coolteemf"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "COMMENTED"}, {"author": {"login": "albanD"}, "state": "APPROVED"}, {"author": {"login": "albanD"}, "state": "APPROVED"}], "totalCount": 
17}, "comments": {"nodes": [{"bodyText": "Hey @coolteemf.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": {"login": "github-actions"}, "authorAssociation": "NONE", "editor": null}]}}}}}, "query_sha=1847f597bd535a3d45a5751d69792ce57f4e565713118eee6057e5ee89e17997 name=pytorch number=73099 owner=pytorch": {"data": {"repository": {"pullRequest": {"closed": false, "isCrossRepository": false, "author": {"login": "BowenBao"}, "title": "[ONNX] Make graph name spec-compliant (#71961)", "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* #73104\n* #73103\n* #73102\n* #73101\n* #73100\n* __->__ #73099\n\n[According to the ONNX spec](https://github.com/onnx/onnx/blob/main/docs/IR.md#names-within-a-graph),\nall names must adhere to C90 identifier syntax rules, which means no\ndashes.\n\nFixes: #30952", "headRefName": "gh/BowenBao/138/head", "headRepository": {"nameWithOwner": "pytorch/pytorch"}, "baseRefName": "gh/BowenBao/138/base", "baseRepository": {"nameWithOwner": "pytorch/pytorch", "isPrivate": false, "defaultBranchRef": {"name": "master"}}, "mergeCommit": null, "commits": {"nodes": [{"commit": {"author": {"user": {"login": "BowenBao"}, "email": "bowbao@microsoft.com", "name": "BowenBao"}, "oid": "3038b939eb2069653305c419326a0f47d2598e39", "checkSuites": {"nodes": [{"app": {"databaseId": 12274}, "conclusion": "SUCCESS"}]}}}], "totalCount": 1}, "changedFiles": 162, "files": {"nodes": [{"path": "test/onnx/expect/TestOperators.test_acos.expect"}, {"path": "test/onnx/expect/TestOperators.test_add_broadcast.expect"}, {"path": "test/onnx/expect/TestOperators.test_add_left_broadcast.expect"}, {"path": "test/onnx/expect/TestOperators.test_add_size1_broadcast.expect"}, {"path": "test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect"}, {"path": "test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect"}, {"path": "test/onnx/expect/TestOperators.test_addconstant.expect"}, {"path": "test/onnx/expect/TestOperators.test_addmm.expect"}, {"path": "test/onnx/expect/TestOperators.test_arange_dynamic.expect"}, {"path": "test/onnx/expect/TestOperators.test_argmax.expect"}, {"path": "test/onnx/expect/TestOperators.test_asin.expect"}, {"path": "test/onnx/expect/TestOperators.test_at_op.expect"}, {"path": "test/onnx/expect/TestOperators.test_atan.expect"}, {"path": "test/onnx/expect/TestOperators.test_aten_embedding_1.expect"}, {"path": "test/onnx/expect/TestOperators.test_aten_embedding_2.expect"}, {"path": "test/onnx/expect/TestOperators.test_avg_pool2d.expect"}, {"path": "test/onnx/expect/TestOperators.test_baddbmm.expect"}, {"path": "test/onnx/expect/TestOperators.test_basic.expect"}, {"path": "test/onnx/expect/TestOperators.test_batchnorm.expect"}, {"path": "test/onnx/expect/TestOperators.test_batchnorm_1d.expect"}, {"path": "test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect"}, {"path": "test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect"}, {"path": 
"test/onnx/expect/TestOperators.test_batchnorm_training.expect"}, {"path": "test/onnx/expect/TestOperators.test_bitshift.expect"}, {"path": "test/onnx/expect/TestOperators.test_c2_op.expect"}, {"path": "test/onnx/expect/TestOperators.test_chunk.expect"}, {"path": "test/onnx/expect/TestOperators.test_clip.expect"}, {"path": "test/onnx/expect/TestOperators.test_clip_max.expect"}, {"path": "test/onnx/expect/TestOperators.test_clip_min.expect"}, {"path": "test/onnx/expect/TestOperators.test_concat2.expect"}, {"path": "test/onnx/expect/TestOperators.test_conv.expect"}, {"path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect"}, {"path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4_opset8.expect"}, {"path": "test/onnx/expect/TestOperators.test_convtranspose.expect"}, {"path": "test/onnx/expect/TestOperators.test_cos.expect"}, {"path": "test/onnx/expect/TestOperators.test_cumsum.expect"}, {"path": "test/onnx/expect/TestOperators.test_det.expect"}, {"path": "test/onnx/expect/TestOperators.test_dict.expect"}, {"path": "test/onnx/expect/TestOperators.test_dict_str.expect"}, {"path": "test/onnx/expect/TestOperators.test_dim.expect"}, {"path": "test/onnx/expect/TestOperators.test_dropout.expect"}, {"path": "test/onnx/expect/TestOperators.test_dropout_default.expect"}, {"path": "test/onnx/expect/TestOperators.test_dropout_opset12.expect"}, {"path": "test/onnx/expect/TestOperators.test_dropout_training.expect"}, {"path": "test/onnx/expect/TestOperators.test_dropout_training_opset12.expect"}, {"path": "test/onnx/expect/TestOperators.test_dynamic_axes_add.expect"}, {"path": "test/onnx/expect/TestOperators.test_dynamic_axes_add_inputs_same_symbolic_shape.expect"}, {"path": "test/onnx/expect/TestOperators.test_dynamic_axes_matmul.expect"}, {"path": "test/onnx/expect/TestOperators.test_dynamic_axes_reduce_mean.expect"}, {"path": "test/onnx/expect/TestOperators.test_dynamic_axes_unchange.expect"}, {"path": "test/onnx/expect/TestOperators.test_elu.expect"}, {"path": "test/onnx/expect/TestOperators.test_embedding_bags.expect"}, {"path": "test/onnx/expect/TestOperators.test_empty_like.expect"}, {"path": "test/onnx/expect/TestOperators.test_empty_like_opset7.expect"}, {"path": "test/onnx/expect/TestOperators.test_equal.expect"}, {"path": "test/onnx/expect/TestOperators.test_erf.expect"}, {"path": "test/onnx/expect/TestOperators.test_exp.expect"}, {"path": "test/onnx/expect/TestOperators.test_expand.expect"}, {"path": "test/onnx/expect/TestOperators.test_flatten.expect"}, {"path": "test/onnx/expect/TestOperators.test_flatten2D.expect"}, {"path": "test/onnx/expect/TestOperators.test_fmod.expect"}, {"path": "test/onnx/expect/TestOperators.test_frobenius_norm.expect"}, {"path": "test/onnx/expect/TestOperators.test_full.expect"}, {"path": "test/onnx/expect/TestOperators.test_full_like.expect"}, {"path": "test/onnx/expect/TestOperators.test_gather.expect"}, {"path": "test/onnx/expect/TestOperators.test_gather_opset11.expect"}, {"path": "test/onnx/expect/TestOperators.test_ge.expect"}, {"path": "test/onnx/expect/TestOperators.test_gelu.expect"}, {"path": "test/onnx/expect/TestOperators.test_gt.expect"}, {"path": "test/onnx/expect/TestOperators.test_hardtanh.expect"}, {"path": "test/onnx/expect/TestOperators.test_implicit_expand.expect"}, {"path": "test/onnx/expect/TestOperators.test_index.expect"}, {"path": "test/onnx/expect/TestOperators.test_isnan.expect"}, {"path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect"}, {"path": "test/onnx/expect/TestOperators.test_le.expect"}, {"path": 
"test/onnx/expect/TestOperators.test_linear.expect"}, {"path": "test/onnx/expect/TestOperators.test_log_sigmoid.expect"}, {"path": "test/onnx/expect/TestOperators.test_logsoftmax.expect"}, {"path": "test/onnx/expect/TestOperators.test_lstm_none_sequence_lens.expect"}, {"path": "test/onnx/expect/TestOperators.test_lt.expect"}, {"path": "test/onnx/expect/TestOperators.test_master_opset.expect"}, {"path": "test/onnx/expect/TestOperators.test_max.expect"}, {"path": "test/onnx/expect/TestOperators.test_maxpool.expect"}, {"path": "test/onnx/expect/TestOperators.test_maxpool_dilations.expect"}, {"path": "test/onnx/expect/TestOperators.test_maxpool_indices.expect"}, {"path": "test/onnx/expect/TestOperators.test_mean.expect"}, {"path": "test/onnx/expect/TestOperators.test_mean_dtype.expect"}, {"path": "test/onnx/expect/TestOperators.test_meshgrid.expect"}, {"path": "test/onnx/expect/TestOperators.test_min.expect"}, {"path": "test/onnx/expect/TestOperators.test_mm.expect"}, {"path": "test/onnx/expect/TestOperators.test_narrow.expect"}, {"path": "test/onnx/expect/TestOperators.test_ne.expect"}, {"path": "test/onnx/expect/TestOperators.test_nonzero.expect"}, {"path": "test/onnx/expect/TestOperators.test_norm_p1.expect"}, {"path": "test/onnx/expect/TestOperators.test_norm_p2.expect"}, {"path": "test/onnx/expect/TestOperators.test_ones_like.expect"}, {"path": "test/onnx/expect/TestOperators.test_pad.expect"}, {"path": "test/onnx/expect/TestOperators.test_params.expect"}, {"path": "test/onnx/expect/TestOperators.test_params_onnx_irv4.expect"}, {"path": "test/onnx/expect/TestOperators.test_permute2.expect"}], "pageInfo": {"endCursor": "MTAw", "hasNextPage": true}}, "reviews": {"nodes": [{"author": {"login": "garymm"}, "state": "APPROVED"}], "totalCount": 1}, "comments": {"nodes": [{"bodyText": "@malfet Thank you for info. 
Sure, I have separated the rest of stack from this one, we'll wait for the fix to try again.", "author": {"login": "BowenBao"}, "authorAssociation": "COLLABORATOR", "editor": null}]}}}}}, "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MTAw name=pytorch number=73099 owner=pytorch": {"data": {"repository": {"pullRequest": {"files": {"nodes": [{"path": "test/onnx/expect/TestOperators.test_pixel_shuffle.expect"}, {"path": "test/onnx/expect/TestOperators.test_pow.expect"}, {"path": "test/onnx/expect/TestOperators.test_prelu.expect"}, {"path": "test/onnx/expect/TestOperators.test_prod.expect"}, {"path": "test/onnx/expect/TestOperators.test_prod_dtype.expect"}, {"path": "test/onnx/expect/TestOperators.test_rand.expect"}, {"path": "test/onnx/expect/TestOperators.test_randn.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_mean.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_prod.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_sum.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect"}, {"path": "test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect"}, {"path": "test/onnx/expect/TestOperators.test_reducemax.expect"}, {"path": "test/onnx/expect/TestOperators.test_reducemin.expect"}, {"path": "test/onnx/expect/TestOperators.test_remainder.expect"}, {"path": "test/onnx/expect/TestOperators.test_repeat.expect"}, {"path": "test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect"}, {"path": "test/onnx/expect/TestOperators.test_round.expect"}, {"path": "test/onnx/expect/TestOperators.test_rrelu.expect"}, {"path": "test/onnx/expect/TestOperators.test_rsqrt.expect"}, {"path": "test/onnx/expect/TestOperators.test_rsub.expect"}, {"path": "test/onnx/expect/TestOperators.test_scatter_add.expect"}, {"path": "test/onnx/expect/TestOperators.test_scatter_add_opset11.expect"}, {"path": "test/onnx/expect/TestOperators.test_selu.expect"}, {"path": "test/onnx/expect/TestOperators.test_shape_value_map.expect"}, {"path": "test/onnx/expect/TestOperators.test_sign.expect"}, {"path": "test/onnx/expect/TestOperators.test_sin.expect"}, {"path": "test/onnx/expect/TestOperators.test_slice.expect"}, {"path": "test/onnx/expect/TestOperators.test_slice_dynamic.expect"}, {"path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy.expect"}, {"path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d.expect"}, {"path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d_none.expect"}, {"path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_4d.expect"}, {"path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_ignore_index.expect"}, {"path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_weights.expect"}, {"path": "test/onnx/expect/TestOperators.test_split.expect"}, {"path": "test/onnx/expect/TestOperators.test_split_with_sizes.expect"}, {"path": "test/onnx/expect/TestOperators.test_sqrt.expect"}, {"path": "test/onnx/expect/TestOperators.test_std.expect"}, {"path": "test/onnx/expect/TestOperators.test_sum.expect"}, {"path": "test/onnx/expect/TestOperators.test_sum_dtype.expect"}, 
{"path": "test/onnx/expect/TestOperators.test_tan.expect"}, {"path": "test/onnx/expect/TestOperators.test_topk.expect"}, {"path": "test/onnx/expect/TestOperators.test_topk_smallest_unsorted.expect"}, {"path": "test/onnx/expect/TestOperators.test_transpose.expect"}, {"path": "test/onnx/expect/TestOperators.test_type_as.expect"}, {"path": "test/onnx/expect/TestOperators.test_unfold.expect"}, {"path": "test/onnx/expect/TestOperators.test_unique.expect"}, {"path": "test/onnx/expect/TestOperators.test_unsqueeze.expect"}, {"path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect"}, {"path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect"}, {"path": "test/onnx/expect/TestOperators.test_upsample_nearest_size.expect"}, {"path": "test/onnx/expect/TestOperators.test_view.expect"}, {"path": "test/onnx/expect/TestOperators.test_view_flatten.expect"}, {"path": "test/onnx/expect/TestOperators.test_zeros_like.expect"}, {"path": "torch/csrc/jit/serialization/export.cpp"}, {"path": "torch/csrc/jit/serialization/export.h"}], "pageInfo": {"endCursor": "MTYy", "hasNextPage": false}}}}}}} +{ + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=73811 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "seemethere" + }, + "title": "ci: Migrate metrics credentials to managed IAM", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* __->__ #73811\n\r\nMigrates our credentials to upload metrics statistics to managed IAM\r\ncredentials in order to make it easier to know where the credentials are\r\ncoming from and to make it easier to add more permissions / less\r\npermissions later on.\r\n\r\nRelates to work done in [D34535827](https://www.internalfb.com/diff/D34535827)\r\n\r\nSigned-off-by: Eli Uriegas ", + "headRefName": "gh/seemethere/215/head", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "gh/seemethere/215/base", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "13c44d16a876a56bca479b4cf30715d21fa16e99" + } + }, + { + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + } + } + ], + "totalCount": 2 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOaHA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, 
+ "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7-no-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, 
+ { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqP89A=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObTk=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqUJII=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-xla-linux-bionic-py3.7-clang8" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE" + }, + { + "name": "test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + 
"pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqP_28=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7-no-ops" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQMyA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "FAILURE" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQcpA=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "cmakelint", + "conclusion": "SUCCESS" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + }, + { + "name": "mypy", + "conclusion": "SUCCESS" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS" + }, + { + "name": "toc", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcU4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObKc=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": 
"pytorch-xla-linux-bionic-py3.7-clang8" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQjCM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "FAILURE" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqUs2w=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObQs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObUI=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObSk=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQKq4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQCGQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "FAILURE" + 
} + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqRADE=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObKU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "FAILURE" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqSq34=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQFvU=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObQ4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRg=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "macos-10-15-py3-arm64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA8=", + 
"hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64-coreml" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAc=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "macos-11-py3-x86-64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, macos-11)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, macos-11)", + "conclusion": "FAILURE" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqSQ2M=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64-custom-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcBE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAA=", + "hasNextPage": true + } + }, + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + } + } + ] + }, + "changedFiles": 3, + "files": { + "nodes": [ + { + "path": ".github/templates/common.yml.j2" + }, + { + "path": ".github/workflows/generated-macos-11-py3-x86-64.yml" + }, + { + "path": ".github/workflows/update_pytorch_labels.yml" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "kit1980" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "janeyx99" + }, + "state": "APPROVED" + } + ], + "totalCount": 2 + }, + "comments": { + "nodes": [ + { + "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1988337976", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068270969 + }, + { + "bodyText": "@pytorchbot force merge this", + "author": { + "login": "seemethere" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068436128 + }, + { + "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1989076952", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068437098 + }, + { + "bodyText": "@pytorchbot merge this", + "author": { + "login": "seemethere" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068482921 + }, + { + "bodyText": "Hey @seemethere.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' 
label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1068484404 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOP6yFeQ==", + "hasPreviousPage": true + } + } + } + } + } + }, + "query_sha=a1fbb4e3efd3c0ee1c99a701334f73a0d1fd689c8341a4302ded4d4ebfa5b38a cursor=Y3Vyc29yOnYyOpHPAAAAAVFCdAA= name=pytorch number=73811 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ + { + "commit": { + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7", + "checkSuites": { + "nodes": [ + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-x86-64-coreml" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64-metal" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcBA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "macos-10-15-py3-lite-interpreter-x86-64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-x86-64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + } + ], + 
"pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdes=", + "hasNextPage": false + } + } + } + } + ] + } + } + } + } + }, + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=31093 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": false, + "isCrossRepository": true, + "author": { + "login": "mingxiaoh" + }, + "title": "improve mkldnn convolution test coverage", + "body": "This pr will improve the test coverage of mkldnn convolution.\r\n1.test input: specific sensitive numbers\r\n2.pass criteria: output of mkldnn convolution matches output of thnn convolution\r\n3.coverage: by using coverage tool, we found out the following sensitive parameters. Overall the case will test 4352 patterns, takes 8.8s on my machine.\r\n\r\nto run the test case:\r\n\r\npython test_mkldnn_conv2d_ext.py\r\nor\r\npython run_test.py -i mkldnn_conv2d_ext\r\n\r\nIn case of failure, the pattern will be printed in the log for further debugging.\r\n\r\nactually, this PR is created to replace and improve that PR we created before(https://github.com/pytorch/pytorch/pull/25085) ", + "headRefName": "master", + "headRepository": { + "nameWithOwner": "mingxiaoh/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "11pikachu" + }, + "email": "junx.du@intel.com", + "name": "dujun" + }, + "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + } + } + ], + "totalCount": 1 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "clang-format" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "clang-format", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQYu8fQ==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "flake8-py3", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQYu8qA==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "codecov/project", + "conclusion": "SUCCESS" + }, + { + "name": "codecov/patch", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQZhcFQ==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "codecov/patch", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQZZsEQ==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": 
"Y3Vyc29yOnYyOpHOUquzJg==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOWKm2eg==", + "hasNextPage": false + } + }, + "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + } + } + ] + }, + "changedFiles": 5, + "files": { + "nodes": [ + { + "path": "test/math_libraries/convolutions.py" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_googlenet_v3.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_maskrcnn_p1.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_mobilenet.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_resnet_50.json" + } + ], + "pageInfo": { + "endCursor": "NQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "CHANGES_REQUESTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "CHANGES_REQUESTED" + }, + { + "author": { + "login": "ailzhang" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mingxiaoh" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mingxiaoh" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "APPROVED" + } + ], + "totalCount": 34 + }, + "comments": { + "nodes": [ + { + "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.\n\n@mruberry sorry, we add them now, would you please check it again? 
Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 673402901 + }, + { + "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 673760580 + }, + { + "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.\n\n@mruberry It is suggested by @VitalyFedyunin that, we need to display fail test to avoid invalid inputs, I guess we should set it as expected failures under the pytest test framework, right? we will change it as expected failure cases under pytest test framework. The result will looks like be low, is it ok?\n2500 passed, 136 skipped, 0 failed, 0 errors, 2 expected failures, 0 unexpected passes", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 673816925 + }, + { + "bodyText": "Displaying tests that fail is fine, but I don't think @VitalyFedyunin meant that it was OK if the tests didn't pass. If these are expected failures then yes, you can use with self.assertRaises(RuntimeError):... when testing them. If you also want to report that the test has test cases with these properties you can print or warn, which will appear in the test output.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 673858224 + }, + { + "bodyText": "Codecov Report\n\nMerging #31093 into master will not change coverage.\nThe diff coverage is n/a.\n\n\n@@ Coverage Diff @@\n## master #31093 +/- ##\n=======================================\n Coverage 68.00% 68.00% \n=======================================\n Files 382 382 \n Lines 49527 49527 \n=======================================\n Hits 33679 33679 \n Misses 15848 15848 \n\nContinue to review full report at Codecov.\n\nLegend - Click here to learn more\n\u0394 = absolute (impact), \u00f8 = not affected, ? = missing data\nPowered by Codecov. Last update 69f6d94...29f6aa6. 
Read the comment docs.", + "author": { + "login": "codecov" + }, + "authorAssociation": "NONE", + "editor": { + "login": "codecov" + }, + "databaseId": 686921371 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOKCNQFQ==", + "hasPreviousPage": true + } + } + } + } + } + }, + "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOKCNQFQ== name=pytorch number=31093 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ + { + "bodyText": "Hi, @mingfeima @soumith @Jianhui-Li\nthis will improve the test coverage of mkldnn convolution, would you please review it?\nThe current code is forward only, do we need to cover backward, if yes, we can add backward.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 564806270 + }, + { + "bodyText": "@mingxiaoh, what is the value in testing DNNL as part of Pytorch validation for the Pytorch developers? Shouldn't having these tests run in DNNL validation be enough?", + "author": { + "login": "vpirogov" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 564808528 + }, + { + "bodyText": "@vpirogov The main value is to serve as a blind test to DNNL. If DNNL adds these test to DNNL test sets, it lost the value as a blind test. The spirit of validation is to cross check.\n@gottbrath @gchanan The test was developed per the request of Pytorch team. Mingxiao made an effort to reduce the execution time to a few second but still with good coverage. Although the test today is focused on DNNL, it could be easily extended to be blind test for any conv implementation used in Pytorch.", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 567826907 + }, + { + "bodyText": "@mruberry thanks for the comment. As for the chainer dependency, we import it is because we would like to use its testing function for pytest test cases combinations, other wise we need to write much more code to achieve same effect. So, can we use it?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 574563012 + }, + { + "bodyText": "@mingxiaoh You cannot import chainer. Looking at the code you should be able to achieve the same effect without it.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 575272358 + }, + { + "bodyText": "@mruberry ok, we will change it according to your requirement. Thanks", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 583917522 + }, + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/31093\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 29f6aa6 (more details on the Dr. CI page):\n\nCommit 29f6aa6 was recently pushed. Waiting for builds...\n\nThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "dr-ci" + }, + "authorAssociation": "NONE", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 628466876 + }, + { + "bodyText": "@mruberry how about those cudnn UT error? 
we add check for it but it should be NV to fix cudnn bugs.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 629955767 + }, + { + "bodyText": "Hey @mingxiaoh! You're right, of course, that you shouldn't have to fix cuDNN bugs. Would you please:\n\nAssert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update.\nFile a new issue explaining the behavior and providing a short PyTorch program to reproduce the issue.\n\nThen we can ping NVIDIA on that issue.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 629997129 + }, + { + "bodyText": "about the suggestion 'Assert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update. ', if we only assert it and continue the following test, I guess users might always ignore them in later test. Anyway, any similar example case for reference?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 630010734 + }, + { + "bodyText": "In this recent PR https://github.com/pytorch/pytorch/pull/38505/files, for example, you can see that the construction of bool tensors wasn't working properly, so the test author cited the relevant issue and asserted that the incorrect behavior happened, as expected. You can also see how these lines are being removed by https://github.com/pytorch/pytorch/pull/38392/files, which fixes the issue.\nAnother common pattern is to use with self.assertRaises(RuntimeError/AssertionError/etc.):.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 630014823 + }, + { + "bodyText": "@mruberry the failed UT case is not introduced by our modification, how to handle this issue?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631187735 + }, + { + "bodyText": "@mingxiaoh You mean the failures on ROCm? You may ignore them. Be sure to re-request review when you're ready.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 631191425 + }, + { + "bodyText": "@mruberry we already skipped those ROCm errors, but there are stil somel error caused by the original code, they are not introduced by our modification.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631886529 + }, + { + "bodyText": "I understand. Let me know when you're ready for me to review.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 631908011 + }, + { + "bodyText": "@mruberry thanks, we are ready for review now.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631909442 + }, + { + "bodyText": "@mingxiaoh Great! I'll take a look ASAP.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 631910556 + }, + { + "bodyText": "@mruberry we just pull the latest code and updated the patch according to your comment, may you please help double check it? BTW, the new failed case in preci is not introduced by our modification.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 633430458 + }, + { + "bodyText": "@ailzhang would you please check the comment below? 
Thanks.\nIs there a reason why this TestConv2dExt is a new class instead a test inside TestNN?\n//comment: it is actually suggested by Tongzhou Wang in another thread before.\nAlthough this test sits in generic testing framework, it's actually comparing thnn/mkldnn/cudnn results specially. I feel it's better to make it truly generic so that it compares any device result with CPU result. Alternatively you can mark this test only run when torch.backends.mkldnn.is_available()=True\n//comment: but our goal is to compare the result with that of thnn. Anyway, if you insist, we can start to compare it with cpu.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 634432326 + }, + { + "bodyText": "Pruning reviewers. @ngimel, @VitalyFedyunin, this PR is looking pretty good from a test framework perspective. Would one of you like to review?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 634557563 + }, + { + "bodyText": "@mruberry Thanks, would you please help review it again. BTW: failed case is not introduced by our modification.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 635256214 + }, + { + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code", + "author": { + "login": "1pikachu" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637364148 + }, + { + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 637444457 + }, + { + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.\n\n@mruberry thank you", + "author": { + "login": "1pikachu" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637479226 + }, + { + "bodyText": "Improving test coverage of math libraries is certainly a good goal and this PR is moving towards it. I have some doubts about implementation decisions made, and about running this PR as part of regular pytorch CI.\nIf the primary goal of this PR is to test correctness of the convolution implementations in the vendor library, then it does not serve this purpose. The absolute majority of the 4000+ test cases come from group 1, where different kernel sizes/strides/dilations are used to produce the output of size 1x1. This can test whether pytorch correctly passes convolution parameters to the backends (although there are cheaper ways to do that), but as actual library correctness check it is almost useless - libraries use very different kernels depending in the input/output sizes, and tests with toy sizes like this don't invoke the real bread-and-butter kernels.\nAlso, if this test suite is meant as primary a means of testing vendor libraries (which is a good goal!) it does not have a place as a part of pytorch regular CI, and should be run when the corresponding vendor libraries are updated. 
I'd suggest moving this test out into a separate file (maybe even outside of torch/test directory) and have it as a part of library update/qualification process rather than regular CI.\nAlso, if the primary goal is to enable easier testing of vendor libraries correctness, perhaps we should rethink the mechanism of the generation of test cases. It should be easy to add a test case with a particular set of parameters that was found to be buggy. Also, running a cross-product of cases in a multi-dimensional space (as this PR does) is rarely an efficient way of getting a signal, some forms of random sampling usually provide a way to get better correctness signal why using less resources.\nAlso, when testing libraries it is important to test both forward and backward functions, whereas this PR does forward only. I'm openminded on whether convTransposed should be tested or not - if we are testing vendor libraries, then it's not necessary, convTransposed calls the same underlying functions, if we are testing pytorch, then it makes sense to test it separately because it takes different codepaths.", + "author": { + "login": "ngimel" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 637827507 + }, + { + "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? Thanks in advance.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637912105 + }, + { + "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? Thanks in advance.\n\nWe know this PR has been open for awhile and we respect that your time is valuable, but we want to make sure we're making the right change here, and I think @ngimel's comments reflect that and should not be too difficult to address. As I understand, her points are:\n\nThis is a good PR with an exciting idea. To let it run longer and test more cases maybe it should run outside the regular PyTorch CI.\nTo remedy this, let's create a test/math_libraries folder and put this test there: test/math_libaries/convolutions.py. Yes, this is different from our requests in the past, which is our mistake, but it should be an easy change.\nTo make the test more interesting it'd be good for the test cases to resemble convolutions used in practice. The current test cases seem like similar \"toy\" examples. Without time pressure we should be able to run larger, more computationally intensive convolutions.\nLet's change the test cases to include some practical convolutions, make it easy to add test cases, and think about how we might generate other interesting cases. (We should also test backwards once we have more time!)\n\nAnd I think these are good points. 
Maybe the PR doesn't create a new way to generate interesting convolutions to start and instead only runs a few representative convolutions, but @ngimel is positioning the work for success so that it's useful and we can continue to improve on it in the future.\nDoes that make sense?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 637924703 + }, + { + "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 637960626 + }, + { + "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.\n\nI'm sorry, I don't think I've talked to @Jianhui-Li before. It's true that the team we expressed a concern about timing if the test was to be run in the CI initially, but I think now that we understand what the test is trying to do better we're not sure the CI is the best place for it. The PR was also closed after a lengthy period of inactivity, and we assumed it had simply been abandoned.\nDo you know who @Jianhui-Li spoke with about this issue originally? Maybe I can follow-up with them for more context.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 637967153 + }, + { + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637978356 + }, + { + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 638446723 + }, + { + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.\n\nLet me sync with Mingxiao and follow up with this. 
Thanks.", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 638451670 + }, + { + "bodyText": "@mruberry would you please help review it again?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 653028208 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 654443242 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 656062287 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 658071151 + }, + { + "bodyText": "super nit: renaming files to .json will make it more IDE friendly.", + "author": { + "login": "VitalyFedyunin" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 658464685 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.\n\nCool! I took a look with @ngimel, once these issues are addressed I think we're good to go!", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 659164401 + }, + { + "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 660884305 + }, + { + "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? 
Thanks.\n\nUpdated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 662678464 + }, + { + "bodyText": "Updated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.\n@mruberry we have finished the modification according to your comment, would you please review it again? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 662930687 + }, + { + "bodyText": "The code looks good, but I tried running the test suite and hit the following failures:\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float16, group:1, batchsize:22input channel:448, output channel:384, bias:False, padding:[1, 1], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float32, group:1, batchsize:22input channel:80, output channel:192, bias:False, padding:[0, 0], dilation:[1, 1], stride:[1, 1], 
kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 106, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\nLooking at the first invalid convolution, for example, it's:\n {\n \"case_name\":\"masknet_p1:conv33\",\n \"mb\":1,\n \"g\":1,\n \"ic\":512,\n \"ih\":64,\n \"iw\":64,\n \"oc\":12,\n \"kh\":1,\n \"kw\":1,\n \"sh\":1,\n \"sw\":1,\n \"ph\":0,\n \"pw\":0,\n \"dh\":0,\n \"dw\":0,\n \"bias\":\"False\"\n },\n\nwhich has a dh and dw of zero, causing it to be added to invalid cases here:\ndh, dw = case['dh'], case['dw']\n has_bias = case['bias']\n if dh == 0 or dw == 0:\n invalid_cases.append(case_name)", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "mruberry" + }, + "databaseId": 663240268 + }, + { + "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? 
Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 664373079 + }, + { + "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? Thanks.\n\nBefore I run these tests again, is an atol of 1e-2 needed for all types or just half? Also, how does 1e-2 compare to the values that are being compared?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 664569507 + }, + { + "bodyText": "@mruberry 1e-2 is experimental result, details see below, random means it might be failed sometimes.\n\n\n\natol,rtol\n1e-2,1e-2\n1e-2,1e-3\n1e-3,1e-2\n1e-3,1e-3\n1e-4,1e-3\n1e-3,1e-4\n1e-4,1e-4\n1e-4,1e-5\n1e-5,1e-4\n\n\n\n\nCuda float16\npass\npass\npass\npass\npass\nfail\nFail\nFail\nfail\n\n\nCuda float32\npass\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nfail", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 666894774 + }, + { + "bodyText": "@mruberry would you please find time to review it again? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 668380451 + }, + { + "bodyText": "@mruberry would you please find time to review it again? Thanks.\n\nI was just about to try and run this again locally but it looks like the files describing the convolutions are missing?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 670306210 + }, + { + "bodyText": "@mruberry sorry but what is missing actually?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 670322557 + }, + { + "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 670591170 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOIapCfg==", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=71759 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "coolteemf" + }, + "title": "Optimize grid sample 3d", + "body": "Fixes #71415\r\nI have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case :\r\n\r\n> Fixes #64977\r\n> \r\n> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). Especially the tensor creation seemed time consuming (see #64977).\r\n> \r\n> Brief description of the changes:\r\n> \r\n> * I have tried to go with rather minimal changes. 
It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).\r\n> \r\n> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.\r\n> \r\n> * Changed the CPU kernels:\r\n> (1) added `bool input_requires_grad` template parameter to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorAccessor* gInp_slice_ptr` instead of `TensorAccessor& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. Perhaps there's a more elegant way to achieve this?)\r\n> \r\n> * Changed CUDA kernel:\r\n> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorInfo()` instead of `getTensorInfo(grad_input)` in case gradient for `input` is not requested.\r\n> \r\n> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.\r\n> \r\n> * Have not touched the CPU fallback kernel.\r\n\r\nNote: the changes number (3) are N/A in this case.\r\n\r\n", + "headRefName": "optimize_grid_sample_3d", + "headRepository": { + "nameWithOwner": "coolteemf/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "e0b0d1e695aeddceaf265da602c4704592053e9e" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "563ec73747ad53b63b36736c47c4342f962c2a09" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "51abe41a132d9dd5b1c0551bdca902aacc028ff8" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "be9898205992034a00e8ace8a55c2ecdcee2c2f8" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "2929c60b64384c2deae0f7dea8bab94ad4bc9ec8" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "9241b737e7e2b257905cc74ad9c50b737d7f9d0a" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "64d6b795d0636928a8aa2fd3da01302fb5f5f7af" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "4503577e53760a0006f1e80ca6bfe04d2be90470" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "b16f4b11ffbbbf2ca2098f9702af4ef6b6fc5e1f" 
+ } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "7ffc23368a604afdc92d2818747f730ce31a2bb5" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "b85292604b9ad6c31706b76b5a5498c4f6d94309" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "9d81d7bae8ad91aaa24b3ceab83e3138894dbc69" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "e79f6a2202512b294c55bf4bfb2e0524fafd4c48" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "f683e8aec7aea76097a264eec01511e704c31154" + } + }, + { + "commit": { + "author": { + "user": { + "login": "coolteemf" + }, + "email": "67541941+coolteemf@users.noreply.github.com", + "name": "Fran\u00e7ois Lecomte" + }, + "oid": "b932e9e286c22aaf352375186df851ef060b295a" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + } + } + ], + "totalCount": 16 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGYqY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIob0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1E=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "FAILURE" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwMsZY=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": 
"build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwZbzg=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "mypy", + "conclusion": "SUCCESS" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS" + }, + { + "name": "toc", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGbAQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwJC4U=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ_w=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIWu4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": 
"Y3Vyc29yOnYyOpHPAAAAATwGZ1k=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwOTJ0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ80=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIUUk=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIXQk=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ9k=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ08=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIaFM=", + 
"hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ9Y=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.4xlarge)", + "conclusion": "FAILURE" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwXcvs=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIjzs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7-no-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ9Q=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ueg=", + "hasNextPage": false + } + }, + "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + } + } + ] + }, + "changedFiles": 9, + "files": { + "nodes": [ + { + "path": "aten/src/ATen/native/GridSampler.cpp" + }, + { + "path": "aten/src/ATen/native/cpu/GridSamplerKernel.cpp" + }, + { + "path": "aten/src/ATen/native/cuda/GridSampler.cpp" + }, + { + "path": "aten/src/ATen/native/cuda/GridSampler.cu" + }, + { + "path": "aten/src/ATen/native/cuda/GridSampler.h" + }, + { + "path": "aten/src/ATen/native/native_functions.yaml" + }, + { + "path": "test/forward_backward_compatibility/check_forward_backward_compatibility.py" + }, + { + "path": "test/test_nn.py" + }, + { + "path": "tools/autograd/derivatives.yaml" + } + ], + "pageInfo": { + "endCursor": "OQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": 
"coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "albanD" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "albanD" + }, + "state": "APPROVED" + } + ], + "totalCount": 17 + }, + "comments": { + "nodes": [ + { + "bodyText": "Merge failed due to 'NoneType' object is not subscriptable\nRaised by https://github.com/pytorch/pytorch/actions/runs/1887945630", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1048868910 + }, + { + "bodyText": "Thanks for the update! The windows failure is not your fault, you can ignore it!\n\nThank you very much for all of your feedback and sorry for the delay !", + "author": { + "login": "coolteemf" + }, + "authorAssociation": "CONTRIBUTOR", + "editor": null, + "databaseId": 1048983572 + }, + { + "bodyText": "@coolteemf can you please send either me or @albanD an email? (or I can send you and invite to collab on private repo)", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1049048119 + }, + { + "bodyText": "@pytorchbot merge this please", + "author": { + "login": "albanD" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1049131992 + }, + { + "bodyText": "Hey @coolteemf.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1049134520 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOPoR4Lg==", + "hasPreviousPage": true + } + } + } + } + } + }, + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=68111 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "chunyuan-w" + }, + "title": "Add JIT graph fuser for oneDNN Graph API (Preview4)", + "body": "## Description\r\nPreview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).\r\n\r\nOn the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:\r\n\r\n- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used\r\n- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.\r\n\r\n### User API:\r\nThe optimization pass is disabled by default. 
Users could enable it by:\r\n```\r\ntorch.jit.enable_onednn_fusion(True)\r\n```\r\n\r\n### Performance:\r\n[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:\r\n- SkyLake 8180 (1 socket of 28 cores):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)\r\n\r\n- SkyLake 8180 (single thread):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)\r\n \\* By mapping hardswish to oneDNN Graph, it\u2019s 8% faster than PyTorch JIT (NNC + OFI)\r\n \\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops\r\n\r\n\r\n### Directory structure of the integration code\r\nFuser-related code are placed under:\r\n```\r\ntorch/csrc/jit/codegen/onednn/\r\n```\r\n\r\nOptimization pass registration is done in:\r\n```\r\ntorch/csrc/jit/passes/onednn_graph_fuser.h\r\n```\r\n\r\nCMake for the integration code is:\r\n```\r\ncaffe2/CMakeLists.txt\r\n```\r\n\r\n## Limitations\r\n\r\n- In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step.\r\n- We have only optimized the inference use case.", + "headRefName": "chunyuan/llga_preview2", + "headRepository": { + "nameWithOwner": "chunyuan-w/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "0096fcc49f277fd8e006fcb42e0cb28a1422ec98" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "7bcc4de26a5472f1d252735dd425b46794b0844f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "3a2a588bfe6bbf9bf74d88d441cd22affda207da" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "ca7df12fbfaa3ddbabeca39b76300d17f4a33f2f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "81d44f35b8bc043c38837d0694e5bc072203b832" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "14fd5d1bfc2c58a71379f778871e3fca0a8e79b2" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "954dc23663125897f4b199eb2a8607dc5fca3274" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9f77a0b476accc678b6f0569e4ff33fa6bbe97fc" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "fbf3b23bc1288697e1aec539a7c4ee3dc0bcb84c" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": 
"f8b8e78f786586c3cdf3966fd83ffa124d3eda70" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "6fffa2f7453ee7e0f8d8e2f73ea8a65230539589" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "849385404e6f3cd1cf7cef19f931ecf4fa28afdb" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "adbae7b77f8c0dbc59fccf15207d97ba86cfade2" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "6dcf2a4981aff24fa16fc7461ae4ec29690f956f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "54f3e05ad524cffd0911ee93be3c50f589b51f58" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "edbfc640ea79a0af85757d9e73796dcc90231519" + } + }, + { + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "67654db7cba562809d1b4a44cdda58af5cc9daaf" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9c9d99b930b11af9ff03f52d45bf49c652df758d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ffb25119cd9ce815cc4d9d14a2317fcbbfa9ea86" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ab9eee84512ca1bdfbc81e25c6eb67b29d0f302a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "62a4642cf3330524990a69ac29e002c97812320a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ca9b1223be4af2c8b4929303d498eafd71793128" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "6f4a23d24514a02954d2ec792830085f612223c9" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "b2a9a9c0926b02d0b2e87722ed61450f224a61d0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e88b492be733f24b6aa395829c76add67d0901e7" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c44336d7a914952bfb78e012e08d9a6d6dde5937" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "5157930f7b3921d41a586260582b574c915f6ca1" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": 
"sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "04cb8353813f6bbd0d913a994923cc7e1e291406" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "62991eaad0e638bb0bced327e03f932f66f68732" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "7496bf1588050191595d833d23b8972b2f22655e" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "d9d35f23cca0cd29c78a845731b24826152dcf1c" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "f74ec134f18a65a7c72455bdf44f72e3ebb27105" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "eb32cc65a975361160948bfc3d6a577991ea262e" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c7665f8d695b680c54db0bad2b7b7df46d886b50" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e6321ad8f59ea01130568c202d186448bb9cb9d0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "a72cd0d02693f45e5354a70654581ad514581ec7" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "b3cd3028b4ed31805e82f7eaf02217ab74ca59b9" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "49a592d9788d08e6cd0593882f867e129057c1cc" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "0575766b2144b13f6a38227c4e2b8d22ec8db80f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "b5c9b10ff87d622350e8ca64fae3a476eb70d5aa" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "66bc652a30ccc329adb929870a4ac726bb98b38c" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "72b9ca9c8e2dac98cbb7199b3dfac7c7305b80c5" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "a7892ed7373207d96406c8b5734a089643c5cdbd" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "d54cb084e1daad8a08c3f8de0ad3f7afb5b05ac1" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": 
"aef71d692a8a159e0ca56be363e2cc1225ce7647" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "bf618e205ec31cff962dcc8ab478e0a699a9572d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e4a331f1088448f7d7d86256ce71e0e71da006b0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "0b743523d1430fec759d5fefbb687f17c89335a5" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e80a351a62d98b810ec8985c4b25257af1d6c5bb" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c189eca154b6691919d0e21489d1c322c7435c0b" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "e080a067c75d7b888a8a362682a2d5ba70e0c3a8" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "028561fbf8f3ed90e074e6e0e3a4ca4dd7ffa2a8" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "d550cf14037badd4caa2f52202e2f20bc4db8432" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "574159ebadd1dec24daaf883879ffeca8d9e71b7" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9eb3ee98ea756067ed1c8f52f309f6d3e211a904" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "29929f48be03dcdd1bbfade572de7feafa825547" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "8a7358ca8da547b40ea1a99ddc57ebed19959684" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "6606637d2c5525b43e294a8b366a85052e1be0c6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "5ecfd1f28b87045deb8bc8ffe33b3d8b906f3264" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "be2d4345c65442c4cfbe8afdfb2ae0893945da42" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "b5b89d3644a43e2dbda841cafb71b32edbe07c8a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nikita.shulga@gmail.com", + "name": "Nikita Shulga" + }, + "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" + } + } + ], + "totalCount": 62 + }, + "commits": { + "nodes": [ 
+ { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + }, + { + "name": "Meta Internal-Only Changes Check", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NXnc=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "clang-format", + "conclusion": "SUCCESS" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS" + }, + { + "name": "toc", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NZdg=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NYIw=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "FAILURE" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NZ50=", + "hasNextPage": true + } + }, + "conclusion": "FAILURE" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxQs=", + "hasNextPage": false + } + }, + "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" + } + } + ] + }, + "changedFiles": 37, + "files": { + "nodes": [ + { + "path": "aten/src/ATen/core/interned_strings.h" + }, + { + "path": "caffe2/CMakeLists.txt" + }, + { + "path": "cmake/Dependencies.cmake" + }, + { + "path": "cmake/Modules/FindMKLDNN.cmake" + }, + { + "path": "cmake/public/mkldnn.cmake" + }, + { + "path": "docs/source/jit.rst" + }, + { + "path": "test/test_jit_llga_fuser.py" + }, + { + "path": 
"torch/_C/__init__.pyi.in" + }, + { + "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/README.md" + }, + { + "path": "torch/csrc/jit/codegen/onednn/defer_size_check.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/defer_size_check.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_fuser.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_fuser.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_helper.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_helper.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_rewriter.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/guard_shape.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/guard_shape.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/interface.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/interface.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/kernel.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/kernel.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/layout_propagation.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/layout_propagation.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/operator.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/prepare_binary.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/prepare_binary.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/register_interface.cpp" + }, + { + "path": "torch/csrc/jit/ir/alias_analysis.cpp" + }, + { + "path": "torch/csrc/jit/ir/ir.cpp" + }, + { + "path": "torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp" + }, + { + "path": "torch/csrc/jit/passes/onednn_graph_fuser.h" + }, + { + "path": "torch/csrc/jit/python/init.cpp" + }, + { + "path": "torch/csrc/jit/runtime/operator.cpp" + }, + { + "path": "torch/jit/__init__.py" + } + ], + "pageInfo": { + "endCursor": "Mzc", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "chunyuan-w" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + 
"login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "wukong1992" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "malfet" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "malfet" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "malfet" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + } + ], + "totalCount": 49 + }, + "comments": { + "nodes": [ + { + "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. I am reverting.", + "author": { + "login": "suo" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074498483 + }, + { + "bodyText": "@pytorchbot revert this", + "author": { + "login": "suo" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074498550 + }, + { + "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. I am reverting.\n\nOops! Will fix it ASAP.", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1074499668 + }, + { + "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074508608 + }, + { + "bodyText": "This pull request has been reverted by e5bf879. 
To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1082508130 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQAuLsw==", + "hasPreviousPage": true + } + } + } + } + } + }, + "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOQAuLsw== name=pytorch number=68111 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ + { + "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/chunyuan-w/pytorch/blob/7496bf1588050191595d833d23b8972b2f22655e/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab 
skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries/conda\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-manywheel\nciflow/binaries, ciflow/binaries/wheel\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.1-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.1-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\n\n\nYou can add a comment to the PR and tag @pytorchbot with the following commands:\n\n# ciflow rerun, \"ciflow/default\" will always be added automatically\n@pytorchbot ciflow rerun\n\n# ciflow rerun with additional labels \"-l \", which is equivalent to adding these labels manually and 
trigger the rerun\n@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow\n\nFor more information, please take a look at the CI Flow Wiki.", + "author": { + "login": "pytorch-probot" + }, + "authorAssociation": "NONE", + "editor": { + "login": "pytorch-probot" + }, + "databaseId": 964902865 + }, + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/68111\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 7388141 (more details on the Dr. CI page):\n\n\n29/29 failures introduced in this PR\n\n\n\ud83d\udd75\ufe0f 29 new failures recognized by patterns\nThe following CI failures do not appear to be due to upstream breakages:\n pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (1/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:31:38.6978776Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:31:38.3001628Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:31:38.5169168Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:31:38.5362923Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:31:38.5413452Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:31:38.5458747Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:31:38.5484014Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:31:38.5497924Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:31:38.5656491Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:31:38.5678893Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:31:38.6888479Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f6488c20adb4dca4\n2022-03-21T21:31:38.6978776Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:31:38.6992648Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:31:38.7003010Z ##[error]Process completed with exit code 2.\n2022-03-21T21:31:38.7044027Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:31:38.7044261Z with:\n2022-03-21T21:31:38.7044413Z env:\n2022-03-21T21:31:38.7044565Z IN_CI: 1\n2022-03-21T21:31:38.7044709Z IS_GHA: 1\n2022-03-21T21:31:38.7044885Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:31:38.7045067Z ##[endgroup]\n2022-03-21T21:31:38.7060958Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge) (2/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:35:19.2635222Z python3: can't 
ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:35:18.9028722Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:35:19.1132721Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:35:19.1310590Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:35:19.1360251Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:35:19.1386865Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:35:19.1429182Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:35:19.1441925Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:35:19.1468280Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:35:19.1617667Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:35:19.2545368Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-098be2985e0392130\n2022-03-21T21:35:19.2635222Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:35:19.2648463Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:35:19.2658727Z ##[error]Process completed with exit code 2.\n2022-03-21T21:35:19.2706355Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:35:19.2706591Z with:\n2022-03-21T21:35:19.2706748Z env:\n2022-03-21T21:35:19.2706908Z IN_CI: 1\n2022-03-21T21:35:19.2707061Z IS_GHA: 1\n2022-03-21T21:35:19.2707246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:35:19.2707438Z ##[endgroup]\n2022-03-21T21:35:19.2724554Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (3/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:11:52.7662022Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T23:11:53.1213298Z ---------------------------------------- 8.1/8.1 MB 23.6 MB/s eta 0:00:00\n2022-03-21T23:11:53.1644665Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:11:53.2218699Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T23:11:53.2389674Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T23:11:53.2787295Z -------------------------------------- 247.7/247.7 KB 7.4 MB/s eta 0:00:00\n2022-03-21T23:11:53.3761842Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:11:53.5457622Z Installing 
collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T23:11:57.4175080Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T23:11:57.5296815Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0105d4db093574f40\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:11:57.5564814Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:11:57.5587712Z ##[error]Process completed with exit code 2.\n2022-03-21T23:11:57.5790311Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T23:11:57.5790832Z with:\n2022-03-21T23:11:57.5791104Z env:\n2022-03-21T23:11:57.5791358Z IN_CI: 1\n2022-03-21T23:11:57.5791620Z IS_GHA: 1\n2022-03-21T23:11:57.5791939Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:11:57.5792425Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T23:11:57.5792884Z ##[endgroup]\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu) (4/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T02:17:12.6257577Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T02:17:11.9280556Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T02:17:11.9335199Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:11.9682045Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T02:17:11.9850357Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0403171Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T02:17:12.0468875Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0590000Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T02:17:12.0607093Z Installing collected packages: jmespath, urllib3, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T02:17:12.5273459Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T02:17:12.6032812Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-114\n2022-03-22T02:17:12.6257577Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T02:17:12.6259543Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T02:17:12.6291924Z ##[error]Process completed with exit code 2.\n2022-03-22T02:17:12.6387977Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T02:17:12.6388298Z with:\n2022-03-22T02:17:12.6388521Z wait-ssh: false\n2022-03-22T02:17:12.6388727Z env:\n2022-03-22T02:17:12.6388932Z IN_CI: 1\n2022-03-22T02:17:12.6389143Z IS_GHA: 1\n2022-03-22T02:17:12.6389368Z GIT_DEFAULT_BRANCH: 
master\n2022-03-22T02:17:12.6389669Z DOCKER_HOST: unix:///run/user/1121/docker.sock\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge) (5/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:19:24.4890693Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:19:24.0962005Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:19:24.3152253Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:19:24.3341183Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:19:24.3391374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:19:24.3436392Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:19:24.3448982Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:19:24.3474092Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:19:24.3502003Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:19:24.3655072Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:19:24.4799309Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0bc9250521f338cae\n2022-03-21T22:19:24.4890693Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:19:24.4903625Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:19:24.4913841Z ##[error]Process completed with exit code 2.\n2022-03-21T22:19:24.4957338Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:19:24.4957575Z with:\n2022-03-21T22:19:24.4957735Z env:\n2022-03-21T22:19:24.4957900Z IN_CI: 1\n2022-03-21T22:19:24.4958055Z IS_GHA: 1\n2022-03-21T22:19:24.4958246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:19:24.4958437Z ##[endgroup]\n2022-03-21T22:19:24.4989649Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (6/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T01:05:07.6983899Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T01:05:06.8364546Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T01:05:06.8431763Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.8949391Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T01:05:06.9180079Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.9803351Z Using cached 
https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T01:05:06.9882133Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:07.0067062Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T01:05:07.0088676Z Installing collected packages: urllib3, jmespath, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T01:05:07.5819667Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T01:05:07.6774717Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-60\n2022-03-22T01:05:07.6983899Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T01:05:07.6988652Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T01:05:07.7023073Z ##[error]Process completed with exit code 2.\n2022-03-22T01:05:07.7102087Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T01:05:07.7102389Z with:\n2022-03-22T01:05:07.7102603Z wait-ssh: false\n2022-03-22T01:05:07.7102820Z env:\n2022-03-22T01:05:07.7103015Z IN_CI: 1\n2022-03-22T01:05:07.7103224Z IS_GHA: 1\n2022-03-22T01:05:07.7103458Z GIT_DEFAULT_BRANCH: master\n2022-03-22T01:05:07.7103737Z DOCKER_HOST: unix:///run/user/1502/docker.sock\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge) (7/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:51:39.3637996Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:51:39.2041249Z Attempting uninstall: s3transfer\n2022-03-21T20:51:39.2043010Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:51:39.2083799Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:51:39.2089675Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:51:39.2480546Z Attempting uninstall: boto3\n2022-03-21T20:51:39.2482953Z Found existing installation: boto3 1.16.34\n2022-03-21T20:51:39.2584292Z Uninstalling boto3-1.16.34:\n2022-03-21T20:51:39.2599474Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:51:39.3130921Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:51:39.3550598Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03ef7efc3078e3da5\n2022-03-21T20:51:39.3637996Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:51:39.3650651Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:51:39.3660484Z ##[error]Process completed with exit code 2.\n2022-03-21T20:51:39.3696465Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:51:39.3696693Z with:\n2022-03-21T20:51:39.3696850Z env:\n2022-03-21T20:51:39.3697012Z IN_CI: 1\n2022-03-21T20:51:39.3697161Z IS_GHA: 1\n2022-03-21T20:51:39.3697342Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:51:39.3697528Z ##[endgroup]\n2022-03-21T20:51:39.3730420Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge) (8/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:36.3916860Z python3: can't ope...ow_job_id.py': [Errno 2] No such 
file or directory\n\n2022-03-21T21:03:36.0096309Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:03:36.2278560Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:03:36.2461618Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:03:36.2513260Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:03:36.2541524Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:03:36.2554899Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:03:36.2598277Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:03:36.2758299Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:03:36.2780690Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:03:36.3825021Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0a4a552890e6ef7d3\n2022-03-21T21:03:36.3916860Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:03:36.3930343Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:03:36.3941263Z ##[error]Process completed with exit code 2.\n2022-03-21T21:03:36.3979258Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:03:36.3979496Z with:\n2022-03-21T21:03:36.3979654Z env:\n2022-03-21T21:03:36.3979814Z IN_CI: 1\n2022-03-21T21:03:36.3979968Z IS_GHA: 1\n2022-03-21T21:03:36.3980157Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:03:36.3980360Z ##[endgroup]\n2022-03-21T21:03:36.3996257Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (9/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:41:10.3015614Z Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)\n2022-03-22T00:41:10.3625659Z ---------------------------------------- 79.5/79.5 KB 1.1 MB/s eta 0:00:00\n2022-03-22T00:41:10.4120236Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-22T00:41:10.4170155Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-22T00:41:10.4722115Z -------------------------------------- 247.7/247.7 KB 5.2 MB/s eta 0:00:00\n2022-03-22T00:41:10.4843512Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:41:10.6596108Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:41:10.8733354Z Installing collected packages: python-dateutil, 
jmespath, botocore, s3transfer, boto3\n2022-03-22T00:41:15.3745408Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-22T00:41:15.4987162Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-09cacc848abc3dd32\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:41:15.5373630Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:41:15.5404353Z ##[error]Process completed with exit code 2.\n2022-03-22T00:41:15.5790508Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-22T00:41:15.5791192Z with:\n2022-03-22T00:41:15.5791530Z env:\n2022-03-22T00:41:15.5791849Z IN_CI: 1\n2022-03-22T00:41:15.5792186Z IS_GHA: 1\n2022-03-22T00:41:15.5792599Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:41:15.5793237Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-22T00:41:15.5793831Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge) (10/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:32.9799307Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:32.8167560Z Attempting uninstall: s3transfer\n2022-03-21T20:50:32.8169351Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:50:32.8213295Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:50:32.8219209Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:50:32.8602320Z Attempting uninstall: boto3\n2022-03-21T20:50:32.8603289Z Found existing installation: boto3 1.16.34\n2022-03-21T20:50:32.8704535Z Uninstalling boto3-1.16.34:\n2022-03-21T20:50:32.8719403Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:50:32.9244278Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:50:32.9710449Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0c568461a276d4a71\n2022-03-21T20:50:32.9799307Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:32.9812238Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:32.9823052Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:32.9859290Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:32.9859527Z with:\n2022-03-21T20:50:32.9859664Z env:\n2022-03-21T20:50:32.9859817Z IN_CI: 1\n2022-03-21T20:50:32.9859977Z IS_GHA: 1\n2022-03-21T20:50:32.9860144Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:32.9860327Z ##[endgroup]\n2022-03-21T20:50:32.9893642Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge) (11/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7163042Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.6660824Z #10 0x55fc8a3ea801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.6661768Z #11 0x55fc8a3f57a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.6662455Z #12 0x55fc8a3f580b in PyRun_SimpleStringFlags 
/tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.6663570Z #13 0x55fc8a3f5908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.6663952Z #14 0x55fc8a3f5908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.6664431Z #15 0x55fc8a3f5908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.6665304Z #16 0x55fc8a3f5ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7162113Z #17 0x7f940d00f83f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7162534Z #18 0x55fc8a39a554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7162711Z \n2022-03-21T21:05:00.7163042Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.7334595Z + retcode=1\n2022-03-21T21:05:00.7334954Z + set -e\n2022-03-21T21:05:00.7335215Z + return 1\n2022-03-21T21:05:00.7338688Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.7339232Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.7340113Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.7340612Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.7341187Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.7341668Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.7344466Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge) (12/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:06:03.4437430Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:06:03.0752199Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:06:03.2853252Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:06:03.3032326Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:06:03.3081589Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:06:03.3093911Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:06:03.3120244Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:06:03.3162406Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:06:03.3188431Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:06:03.3337181Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:06:03.4348072Z ++ python3 .github/scripts/get_workflow_job_id.py 
2018440039 i-0ee48c8811fafc444\n2022-03-21T22:06:03.4437430Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:06:03.4450920Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:06:03.4461263Z ##[error]Process completed with exit code 2.\n2022-03-21T22:06:03.4502346Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:06:03.4502576Z with:\n2022-03-21T22:06:03.4502730Z env:\n2022-03-21T22:06:03.4502888Z IN_CI: 1\n2022-03-21T22:06:03.4503038Z IS_GHA: 1\n2022-03-21T22:06:03.4503302Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:06:03.4503492Z ##[endgroup]\n2022-03-21T22:06:03.4519156Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (13/29)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:13.2205634Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:12.8679322Z + python3 -m pip install boto3==1.19.12\n2022-03-21T20:50:13.0744228Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T20:50:13.0916284Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T20:50:13.0964264Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T20:50:13.1005656Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T20:50:13.1017299Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T20:50:13.1041042Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T20:50:13.1189450Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T20:50:13.1208751Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T20:50:13.2119445Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d02da60fd18c22f5\n2022-03-21T20:50:13.2205634Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:13.2217939Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:13.2220259Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:13.2248664Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:13.2249012Z with:\n2022-03-21T20:50:13.2249260Z env:\n2022-03-21T20:50:13.2249500Z IN_CI: 1\n2022-03-21T20:50:13.2249738Z IS_GHA: 1\n2022-03-21T20:50:13.2250025Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:13.2250329Z ##[endgroup]\n2022-03-21T20:50:13.2272735Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (14/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:47:38.0451999Z python3: can't ope...ow_job_id.py': [Errno 2] No such file 
or directory\n\n2022-03-21T23:47:37.5554508Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:47:37.8411473Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:47:37.8631484Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:47:37.8699561Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:47:37.8737037Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:47:37.8754443Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:47:37.8814393Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:47:37.8849540Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:47:37.9059579Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:47:38.0336298Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0b44f47f4292089a2\n2022-03-21T23:47:38.0451999Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:47:38.0469471Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:47:38.0484106Z ##[error]Process completed with exit code 2.\n2022-03-21T23:47:38.0532678Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:47:38.0533007Z with:\n2022-03-21T23:47:38.0533223Z env:\n2022-03-21T23:47:38.0533440Z IN_CI: 1\n2022-03-21T23:47:38.0533649Z IS_GHA: 1\n2022-03-21T23:47:38.0533902Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:47:38.0534170Z GPU_FLAG: --gpus all\n2022-03-21T23:47:38.0534401Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge) (15/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:04:59.3115800Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:04:59.2595213Z #10 0x55a7f39a4801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:04:59.2595707Z #11 0x55a7f39af7a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:04:59.2597203Z #12 0x55a7f39af80b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:04:59.2598205Z #13 0x55a7f39af908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:04:59.2598697Z #14 0x55a7f39af908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:04:59.2599178Z #15 0x55a7f39af908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:04:59.2599747Z #16 0x55a7f39afccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:04:59.3114751Z #17 0x7f3b3822383f in __libc_start_main 
/build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:04:59.3115277Z #18 0x55a7f3954554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:04:59.3115468Z \n2022-03-21T21:04:59.3115800Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:04:59.3292385Z + retcode=1\n2022-03-21T21:04:59.3292781Z + set -e\n2022-03-21T21:04:59.3293062Z + return 1\n2022-03-21T21:04:59.3295462Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:04:59.3295802Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:04:59.3296394Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:04:59.3296700Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:04:59.3297055Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:04:59.3297416Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:04:59.3299623Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (16/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:14:25.5525714Z Collecting jmespath<1.0.0,>=0.7.1\n2022-03-21T22:14:25.5568155Z Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)\n2022-03-21T22:14:25.5952617Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:14:25.6169392Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:14:25.6629996Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:14:25.6710247Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:14:25.8284354Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:14:25.9816751Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:14:31.6672236Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:14:31.7630473Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ed0915ecee5d2424\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:14:31.7876742Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:14:31.7897140Z ##[error]Process completed with exit code 2.\n2022-03-21T22:14:31.8195621Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:14:31.8196110Z with:\n2022-03-21T22:14:31.8196356Z env:\n2022-03-21T22:14:31.8196614Z IN_CI: 1\n2022-03-21T22:14:31.8196876Z IS_GHA: 1\n2022-03-21T22:14:31.8197169Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:14:31.8197652Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:14:31.8198093Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge) (17/29)\nStep: \"Upload test statistics\" (full log 
| diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:19:15.8845728Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:19:15.5116060Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:19:15.7231476Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:19:15.7409711Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:19:15.7458478Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:19:15.7470508Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:19:15.7496799Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:19:15.7538362Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:19:15.7566161Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:19:15.7711630Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:19:15.8753543Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0e2b3b4ddb246ff2a\n2022-03-21T21:19:15.8845728Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:19:15.8859814Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:19:15.8870165Z ##[error]Process completed with exit code 2.\n2022-03-21T21:19:15.8917039Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:19:15.8917279Z with:\n2022-03-21T21:19:15.8917433Z env:\n2022-03-21T21:19:15.8917586Z IN_CI: 1\n2022-03-21T21:19:15.8917734Z IS_GHA: 1\n2022-03-21T21:19:15.8917917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:19:15.8918102Z ##[endgroup]\n2022-03-21T21:19:15.8934572Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (18/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:19:48.5900162Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:19:48.0742254Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:19:48.3742563Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:19:48.3976536Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:19:48.4048700Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:19:48.4065374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:19:48.4128076Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) 
(0.5.2)\n2022-03-21T23:19:48.4164273Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:19:48.4202610Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:19:48.4416723Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:19:48.5773033Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-07ab7a3c4a5402af2\n2022-03-21T23:19:48.5900162Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:19:48.5919822Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:19:48.5936087Z ##[error]Process completed with exit code 2.\n2022-03-21T23:19:48.6007930Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:19:48.6008268Z with:\n2022-03-21T23:19:48.6008483Z env:\n2022-03-21T23:19:48.6008701Z IN_CI: 1\n2022-03-21T23:19:48.6008920Z IS_GHA: 1\n2022-03-21T23:19:48.6009170Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:19:48.6009440Z GPU_FLAG: --gpus all\n2022-03-21T23:19:48.6009671Z ##[endgroup]\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu) (19/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:53:59.0889659Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T22:53:59.6881416Z ---------------------------------------- 8.1/8.1 MB 14.0 MB/s eta 0:00:00\n2022-03-21T22:53:59.7427779Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:53:59.7691882Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:53:59.7779847Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:53:59.8281663Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:54:00.0185115Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:54:00.2359770Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:54:04.1208891Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:54:04.2505862Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03b4fbe63be8ef4b0\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:54:04.2891082Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:54:04.2919900Z ##[error]Process completed with exit code 2.\n2022-03-21T22:54:04.3377901Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:54:04.3378575Z with:\n2022-03-21T22:54:04.3378930Z env:\n2022-03-21T22:54:04.3379275Z IN_CI: 
1\n2022-03-21T22:54:04.3379600Z IS_GHA: 1\n2022-03-21T22:54:04.3380023Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:54:04.3380691Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:54:04.3381278Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge) (20/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:09:34.0074610Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:09:33.6365531Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:09:33.8475619Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:09:33.8655152Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:09:33.8704395Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:09:33.8716774Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:09:33.8760145Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:09:33.8785000Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:09:33.8811316Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:09:33.8960134Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:09:33.9984866Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d325eb9fd156146f\n2022-03-21T22:09:34.0074610Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:09:34.0087465Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:09:34.0101743Z ##[error]Process completed with exit code 2.\n2022-03-21T22:09:34.0154014Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:09:34.0154246Z with:\n2022-03-21T22:09:34.0154412Z env:\n2022-03-21T22:09:34.0154574Z IN_CI: 1\n2022-03-21T22:09:34.0154728Z IS_GHA: 1\n2022-03-21T22:09:34.0154917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:09:34.0155112Z ##[endgroup]\n2022-03-21T22:09:34.0191047Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge) (21/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:17.8502655Z [E request_callbac...yUniqueId(created_on=0, local_id=0) to be created.\n\n2022-03-21T21:03:14.4669960Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxgdsmeer\n2022-03-21T21:03:14.4671407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxgdsmeer/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.4973023Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1i2hfmpc\n2022-03-21T21:03:14.4973800Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmp1i2hfmpc/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.5532339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgx4da7b0\n2022-03-21T21:03:14.5533064Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgx4da7b0/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.7050673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0\n2022-03-21T21:03:14.7097127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3\n2022-03-21T21:03:14.7398339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2\n2022-03-21T21:03:14.7922283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1\n2022-03-21T21:03:17.8502655Z [E request_callback_no_python.cpp:559] Received error while processing request type 261: false INTERNAL ASSERT FAILED at \"/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp\":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.\n2022-03-21T21:03:17.8503603Z Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):\n2022-03-21T21:03:17.8504385Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x69 (0x7f180df19e19 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505131Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xd2 (0x7f180df160e2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505927Z frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) + 0x4e (0x7f180df17a7e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8506674Z frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x4b4 (0x7f18118b7b64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8507642Z frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr >) const + 0x70 (0x7f18118a7bf0 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8508613Z frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0xc8 (0x7f1819736208 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8509749Z frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f18118ac914 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8510708Z frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f1819735865 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8511369Z frame #8: + 0x375249a (0x7f18118a949a in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test (22/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERR...t available for the merge-base of your branch\"\ufffd[0m\n\n2022-03-21T20:01:07.7012399Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7012634Z \ufffd[36;1m# Covers the case where a previous tag doesn't exist for the tree\ufffd[0m\n2022-03-21T20:01:07.7012992Z \ufffd[36;1m# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly\ufffd[0m\n2022-03-21T20:01:07.7013373Z \ufffd[36;1mif ! git rev-parse \"$MERGE_BASE:.circleci/docker\"; then\ufffd[0m\n2022-03-21T20:01:07.7013784Z \ufffd[36;1m echo \"Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit\"\ufffd[0m\n2022-03-21T20:01:07.7014149Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7014325Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7014573Z \ufffd[36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse \"$MERGE_BASE:.circleci/docker\")\ufffd[0m\n2022-03-21T20:01:07.7014907Z \ufffd[36;1m# If no image exists but the hash is the same as the previous hash then we should error out here\ufffd[0m\n2022-03-21T20:01:07.7015231Z \ufffd[36;1mif [[ \"${PREVIOUS_DOCKER_TAG}\" = \"${DOCKER_TAG}\" ]]; then\ufffd[0m\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch\"\ufffd[0m\n2022-03-21T20:01:07.7015931Z \ufffd[36;1m echo \" contact the PyTorch team to restore the original images\"\ufffd[0m\n2022-03-21T20:01:07.7016225Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7016400Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7016608Z \ufffd[36;1mecho ::set-output name=rebuild::yes\ufffd[0m\n2022-03-21T20:01:07.7027605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}\n2022-03-21T20:01:07.7027837Z env:\n2022-03-21T20:01:07.7028006Z IN_CI: 1\n2022-03-21T20:01:07.7028159Z IS_GHA: 1\n2022-03-21T20:01:07.7028346Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:01:07.7028589Z BASE_REVISION: 6643522db9ff595f564b8081de58b3a33c546178\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu) (23/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:49:54.2949572Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:49:53.8049151Z + python3 -m pip install boto3==1.19.12\n2022-03-22T00:49:54.0981629Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-22T00:49:54.1207562Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-22T00:49:54.1277146Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-22T00:49:54.1315027Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-22T00:49:54.1331813Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-22T00:49:54.1391622Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages 
(from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:49:54.1609217Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-22T00:49:54.1637417Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:49:54.2830197Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f7c32fe13be12fea\n2022-03-22T00:49:54.2949572Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:49:54.2966933Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:49:54.2982588Z ##[error]Process completed with exit code 2.\n2022-03-22T00:49:54.3031464Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T00:49:54.3031794Z with:\n2022-03-22T00:49:54.3032012Z env:\n2022-03-22T00:49:54.3032227Z IN_CI: 1\n2022-03-22T00:49:54.3032434Z IS_GHA: 1\n2022-03-22T00:49:54.3032681Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:49:54.3033084Z GPU_FLAG: --gpus all\n2022-03-22T00:49:54.3033312Z ##[endgroup]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (24/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:56:07.3365589Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T21:56:07.7926584Z ---------------------------------------- 8.1/8.1 MB 17.3 MB/s eta 0:00:00\n2022-03-21T21:56:07.9319362Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T21:56:07.9366132Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T21:56:08.0077590Z -------------------------------------- 247.7/247.7 KB 3.0 MB/s eta 0:00:00\n2022-03-21T21:56:08.0164070Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:56:08.1775537Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:56:08.3393469Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T21:56:12.4576766Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T21:56:12.5641959Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0afad69838118af0e\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:56:12.5905611Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:56:12.5927729Z ##[error]Process completed with exit code 2.\n2022-03-21T21:56:12.6239531Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T21:56:12.6240039Z with:\n2022-03-21T21:56:12.6240299Z env:\n2022-03-21T21:56:12.6240557Z IN_CI: 1\n2022-03-21T21:56:12.6240805Z IS_GHA: 1\n2022-03-21T21:56:12.6241118Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:56:12.6241613Z pythonLocation: 
C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T21:56:12.6242052Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (25/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:46:39.5474616Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:46:39.1884210Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:46:39.3928976Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:46:39.4105069Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:46:39.4152571Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:46:39.4194931Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:46:39.4218947Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:46:39.4230812Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:46:39.4380089Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:46:39.4399461Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:46:39.5387703Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0888bed1149cca415\n2022-03-21T21:46:39.5474616Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:46:39.5487145Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:46:39.5497480Z ##[error]Process completed with exit code 2.\n2022-03-21T21:46:39.5541319Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:46:39.5541544Z with:\n2022-03-21T21:46:39.5541698Z env:\n2022-03-21T21:46:39.5541851Z IN_CI: 1\n2022-03-21T21:46:39.5541997Z IS_GHA: 1\n2022-03-21T21:46:39.5542176Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:46:39.5542361Z ##[endgroup]\n2022-03-21T21:46:39.5557878Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (26/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:34:57.0623859Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:34:56.9039884Z Attempting uninstall: s3transfer\n2022-03-21T21:34:56.9041446Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:34:56.9090783Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:34:56.9095968Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:34:56.9453014Z Attempting uninstall: boto3\n2022-03-21T21:34:56.9454356Z Found existing installation: boto3 1.16.34\n2022-03-21T21:34:56.9564320Z Uninstalling boto3-1.16.34:\n2022-03-21T21:34:56.9578035Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:34:57.0091363Z Successfully installed boto3-1.19.12 botocore-1.22.12 
s3transfer-0.5.2\n2022-03-21T21:34:57.0536230Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-034a3afd5d80b91fd\n2022-03-21T21:34:57.0623859Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:34:57.0637167Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:34:57.0647396Z ##[error]Process completed with exit code 2.\n2022-03-21T21:34:57.0688237Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:34:57.0688481Z with:\n2022-03-21T21:34:57.0688631Z env:\n2022-03-21T21:34:57.0688769Z IN_CI: 1\n2022-03-21T21:34:57.0688930Z IS_GHA: 1\n2022-03-21T21:34:57.0689109Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:34:57.0689462Z ##[endgroup]\n2022-03-21T21:34:57.0704768Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge) (27/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7896545Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.7395504Z #10 0x5597fd5a9801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.7396330Z #11 0x5597fd5b47a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.7396688Z #12 0x5597fd5b480b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.7398664Z #13 0x5597fd5b4908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.7399177Z #14 0x5597fd5b4908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.7399663Z #15 0x5597fd5b4908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.7399986Z #16 0x5597fd5b4ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7895241Z #17 0x7f0a5905983f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7895772Z #18 0x5597fd559554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7896033Z \n2022-03-21T21:05:00.7896545Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.8063448Z + retcode=1\n2022-03-21T21:05:00.8063787Z + set -e\n2022-03-21T21:05:00.8064058Z + return 1\n2022-03-21T21:05:00.8067638Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.8068127Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.8069018Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.8069500Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.8070105Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.8070580Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.8072640Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (28/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:48:17.3384813Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or 
directory\n\n2022-03-21T22:48:16.8599645Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:48:17.1464241Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:48:17.1685222Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:48:17.1754164Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:48:17.1771662Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:48:17.1808722Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:48:17.1868636Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:48:17.1903889Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:48:17.2113746Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:48:17.3267404Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-01fe178c405417375\n2022-03-21T22:48:17.3384813Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:48:17.3402286Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:48:17.3418376Z ##[error]Process completed with exit code 2.\n2022-03-21T22:48:17.3470528Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:48:17.3470874Z with:\n2022-03-21T22:48:17.3471096Z env:\n2022-03-21T22:48:17.3471327Z IN_CI: 1\n2022-03-21T22:48:17.3471538Z IS_GHA: 1\n2022-03-21T22:48:17.3471802Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:48:17.3472083Z GPU_FLAG: --gpus all\n2022-03-21T22:48:17.3472322Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (29/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:16:38.9646300Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:16:38.7995969Z Attempting uninstall: s3transfer\n2022-03-21T21:16:38.7998039Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:16:38.8066994Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:16:38.8072844Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:16:38.8449275Z Attempting uninstall: boto3\n2022-03-21T21:16:38.8451430Z Found existing installation: boto3 1.16.34\n2022-03-21T21:16:38.8559828Z Uninstalling boto3-1.16.34:\n2022-03-21T21:16:38.8574290Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:16:38.9100438Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:16:38.9558098Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d779c59d277d32ee\n2022-03-21T21:16:38.9646300Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:16:38.9658894Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:16:38.9673240Z ##[error]Process completed with exit code 2.\n2022-03-21T21:16:38.9720106Z ##[group]Run 
pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:16:38.9720333Z with:\n2022-03-21T21:16:38.9720485Z env:\n2022-03-21T21:16:38.9720645Z IN_CI: 1\n2022-03-21T21:16:38.9720793Z IS_GHA: 1\n2022-03-21T21:16:38.9720970Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:16:38.9721151Z ##[endgroup]\n2022-03-21T21:16:38.9736762Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 964902894 + }, + { + "bodyText": "@vitaly-fedyunin @gottbrath FYI that this is the oneDNN Graph API integration. It depends on the #63748.", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 970451860 + }, + { + "bodyText": "CI failures are currently being caused by some issues in the CI infra, and are also occurring with other PRs.", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 990641309 + }, + { + "bodyText": "CI failures are unrelated.", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 991281407 + }, + { + "bodyText": "The CI failure is unrelated.", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 995389295 + }, + { + "bodyText": "Hi, thank you for the PR!\nDo you mind running a larger amount of torchbench and reporting numbers ? You can look at Jason's post here for what models are supported in script. Initially just the vision models would be useful. @Krovatkin also did some benchmarking of a traced Bert model and found on average a ~16% speedup with this PR.", + "author": { + "login": "eellison" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1015689390 + }, + { + "bodyText": "Thanks a lot for reviewing, @eellison & @Krovatkin!\nWe just wanted to let you know that we're working on the benchmarking & will get back to you in a day, or two.\nUPDATE (Jan 21): While running some TorchBench models, we discovered some composability issues, and are working to ensure that oneDNN Graph would complement PyTorch's existing fusion capabilities, not hinder them.\nUPDATE (Jan 24): We've resolved the issues & will update this PR later today. Thanks!", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1016996190 + }, + { + "bodyText": "Hello @eellison,\nWe used this TorchBench branch for comparison. compare_llga.sh can be run for comparison.\nFor benchmarking mobilenet_v3_large with hardswish support in oneDNN Graph, this oneDNN Graph branch can be used in third_party/ideep/mkl-dnn. 
It delivers a speedup over PyTorch JIT (NNC + OFI) because 21 additional reorders are prevented (the major factor here), and fusion with conv also helps further.\nThe next release of oneDNN Graph would have hardswish support.\nWe're also exploring adding a hardsigmoid op in oneDNN Graph.\nThank you!", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1022709513 + }, + { + "bodyText": "Please note that this PR should be merged after #71546, as #71546 changes the third_party/ideep commit (this PR also uses that ideep commit, but it'd probably be better to merge #71546 first, so that oneDNN v2.5.2 upgrade would be in a separate PR). Thank you!", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1026330085 + }, + { + "bodyText": "@sanchitintel mind rebasing and i'll land ?", + "author": { + "login": "eellison" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1055813984 + }, + { + "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1057203495 + }, + { + "bodyText": "Thanks a lot for taking a look, @eellison! To fix this error, we would enable Bazel build for oneDNN Graph.", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1061230087 + }, + { + "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1063276600 + }, + { + "bodyText": "@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074355779 + }, + { + "bodyText": "And graph_rewriter.cpp is full of DOS newlines...", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074407452 + }, + { + "bodyText": "Hey @chunyuan-w.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1074471758 + }, + { + "bodyText": "Thanks a ton for your help, @malfet & @eellison! 
:)\nWe'll incorporate your suggestions in subsequent PR(s).", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "NONE", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1074492365 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOOYM_0Q==", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=73969 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "malfet" + }, + "title": "Dummy change", + "body": "Test Plan: None at all\n\nDifferential Revision: D34753911\n\n", + "headRefName": "export-D34753911", + "headRepository": { + "nameWithOwner": "malfet/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "4746da707a9912356f5179625da89616b228dc21" + } + } + ], + "totalCount": 1 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRQMQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbTiXw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbY_vU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + 
"name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2ao=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRIt0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRFm4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "shellcheck", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS" + }, + { + "name": "toc", + "conclusion": "SUCCESS" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS" + }, + { + "name": "mypy", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO4Es=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2b0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2c8=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + 
}, + "checkRuns": { + "nodes": [ + { + "name": "test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2as=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbUkMA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7-no-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2d8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbWDX8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbSD-k=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + }, + { + "name": "Meta Internal-Only Changes Check", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO574=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-xla-linux-bionic-py3.7-clang8" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (xla, 1, 1, linux.2xlarge)", + "conclusion": 
"SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbSGAM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRlJs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRN_c=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRySo=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2d0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-UI=", + "hasNextPage": false + } + }, + "oid": "4746da707a9912356f5179625da89616b228dc21" + } + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ + { + "path": "tools/build_variables.bzl" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [], + "totalCount": 0 + }, + "comments": { + "nodes": [ + { + "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/malfet/pytorch/blob/4746da707a9912356f5179625da89616b228dc21/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\nAdd ciflow labels to this PR to trigger more builds:\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/all, ciflow/binaries, 
ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-manywheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-rocm4.5-py3.7\nciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build\nciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nmacos-arm64-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-arm64-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwindows-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 
triggered\n\n\nwindows-binary-libtorch-debug\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-libtorch-release\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-wheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.3-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\npytorch-xla-linux-bionic-py3.7-clang8\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla\n\ud83d\udeab skipped", + "author": { + "login": "pytorch-bot" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1063079053 + }, + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/73969\n\ud83d\udcc4 \u00a0Preview docs built 
from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 4746da7 (more details on the Dr. CI page):\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1063079113 + }, + { + "bodyText": "This pull request was exported from Phabricator. Differential Revision: D34753911", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1063079731 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOP11MjQ==", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=73099 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "BowenBao" + }, + "title": "[ONNX] Make graph name spec-compliant (#71961)", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* #73104\n* #73103\n* #73102\n* #73101\n* #73100\n* __->__ #73099\n\n[According to the ONNX spec](https://github.com/onnx/onnx/blob/main/docs/IR.md#names-within-a-graph),\nall names must adhere to C90 identifier syntax rules, which means no\ndashes.\n\nFixes: #30952", + "headRefName": "gh/BowenBao/138/head", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "gh/BowenBao/138/base", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "BowenBao" + }, + "email": "bowbao@microsoft.com", + "name": "BowenBao" + }, + "oid": "3038b939eb2069653305c419326a0f47d2598e39" + } + } + ], + "totalCount": 1 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNn9o=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkRE_E=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": 
"linux-xenial-py3.7-gcc7-no-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoIY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiwA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPxgQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoKA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkX070=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": 
"Y3Vyc29yOnYyOpHPAAAAATkPiQA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkdLEE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoIQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoG0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPfnY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiwQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoHU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": 
"SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "FAILURE" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS" + }, + { + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkQmxE=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkQNRA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPqms=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "cmakelint", + "conclusion": "SUCCESS" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS" + }, + { + "name": "mypy", + "conclusion": "SUCCESS" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + }, + { + "name": "toc", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNpZc=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNnvQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + 
}, + "conclusion": null + }, + { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAT_KTRw=", + "hasNextPage": false + } + }, + "oid": "3038b939eb2069653305c419326a0f47d2598e39" + } + } + ] + }, + "changedFiles": 162, + "files": { + "nodes": [ + { + "path": "test/onnx/expect/TestOperators.test_acos.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_left_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_addconstant.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_addmm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_arange_dynamic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_argmax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_asin.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_at_op.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_atan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_aten_embedding_1.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_aten_embedding_2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_avg_pool2d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_baddbmm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_basic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_1d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_training.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_bitshift.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_c2_op.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_chunk.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip_max.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip_min.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_concat2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4_opset8.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_convtranspose.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_cos.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_cumsum.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_det.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dict.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dict_str.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dim.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout.expect" + }, + { + 
"path": "test/onnx/expect/TestOperators.test_dropout_default.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_opset12.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_training.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_training_opset12.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add_inputs_same_symbolic_shape.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_matmul.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_reduce_mean.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_unchange.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_elu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_embedding_bags.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_empty_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_empty_like_opset7.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_equal.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_erf.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_exp.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_expand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_flatten.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_flatten2D.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_fmod.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_frobenius_norm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_full.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_full_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gather.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gather_opset11.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ge.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gelu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_hardtanh.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_implicit_expand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_index.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_isnan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_le.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_linear.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_log_sigmoid.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_logsoftmax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_lstm_none_sequence_lens.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_lt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_master_opset.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_max.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool_dilations.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool_indices.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mean.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mean_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_meshgrid.expect" + 
}, + { + "path": "test/onnx/expect/TestOperators.test_min.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_narrow.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ne.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_nonzero.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_norm_p1.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_norm_p2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ones_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_pad.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_params.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_params_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_permute2.expect" + } + ], + "pageInfo": { + "endCursor": "MTAw", + "hasNextPage": true + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "garymm" + }, + "state": "APPROVED" + } + ], + "totalCount": 1 + }, + "comments": { + "nodes": [ + { + "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet \n \n \n pytorch/.github/scripts/trymerge.py\n \n \n Line 63\n in\n 932adf2\n \n \n \n \n\n \n \n files(last: 100) { \n \n \n \n\n Can this be relaxed? If not please import.", + "author": { + "login": "BowenBao" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1048084569 + }, + { + "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet\nCan this be relaxed? If not please import.\n\nWow, you've hit a really interesting problem. 100 is a limitation enforced by GitHub, see https://docs.github.com/en/graphql/overview/resource-limitations, but I can implement a pagination. Do you mind keeping it like that for a bit, want to land a fix soonish.", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1048088691 + }, + { + "bodyText": "@malfet Thank you for info. Sure, I have separated the rest of stack from this one, we'll wait for the fix to try again.", + "author": { + "login": "BowenBao" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1048090640 + }, + { + "bodyText": "@pytorchbot merge this", + "author": { + "login": "BowenBao" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1050293881 + }, + { + "bodyText": "Hey @BowenBao.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1050295451 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOPniAWQ==", + "hasPreviousPage": true + } + } + } + } + } + }, + "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MTAw name=pytorch number=73099 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "files": { + "nodes": [ + { + "path": "test/onnx/expect/TestOperators.test_pixel_shuffle.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_pow.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_prelu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_prod.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_prod_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_rand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_randn.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_mean.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_prod.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_sum.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reducemax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_reducemin.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_remainder.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_repeat.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_round.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_rrelu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_rsqrt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_rsub.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_scatter_add.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_scatter_add_opset11.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_selu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_shape_value_map.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_sign.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_sin.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_slice.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_slice_dynamic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d_none.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_4d.expect" + }, + { + "path": 
"test/onnx/expect/TestOperators.test_softmaxcrossentropy_ignore_index.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_weights.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_split.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_split_with_sizes.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_sqrt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_std.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_sum.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_sum_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_tan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_topk.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_topk_smallest_unsorted.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_transpose.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_type_as.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_unfold.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_unique.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_unsqueeze.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_upsample_nearest_size.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_view.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_view_flatten.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_zeros_like.expect" + }, + { + "path": "torch/csrc/jit/serialization/export.cpp" + }, + { + "path": "torch/csrc/jit/serialization/export.h" + } + ], + "pageInfo": { + "endCursor": "MTYy", + "hasNextPage": false + } + } + } + } + } + }, + "query_sha=a782f66a44a63d21c9e17b1373747a1c07e50b695762a68a8b8db1203ac6c1bb name=pytorch number=74649 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "malfet" + }, + "title": "This should fail flake8", + "body": "Test issue for GHF mandatory checks", + "headRefName": "malfet-patch-8", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "57c86ff1c5ab948888fd329986c9d55796680e33" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" + } + } + ], + "totalCount": 2 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "nodes": [ + { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsK3w=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS" + }, + { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + 
"hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "clang-format", + "conclusion": "SUCCESS" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS" + }, + { + "name": "flake8-py3", + "conclusion": "FAILURE" + }, + { + "name": "mypy", + "conclusion": "SUCCESS" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsMNU=", + "hasNextPage": true + } + }, + "conclusion": "FAILURE" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsLW0=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED" + }, + { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsaNA=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkGU=", + 
"hasNextPage": false + } + }, + "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" + } + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ + { + "path": "torch/nn/cpp.py" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "seemethere" + }, + "state": "APPROVED" + } + ], + "totalCount": 1 + }, + "comments": { + "nodes": [ + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/74649\n\u21a9\ufe0f \u00a0[fb-only] Re-run with SSH instructions\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 6c3c3de (more details on the Dr. CI page):\n\n\n1/1 failures introduced in this PR\n\n\n1 failure not recognized by patterns:\n\n\n\nJob\nStep\nAction\n\n\n\n\n Lint / flake8-py3\nFail if there were any warnings\n\ud83d\udd01 rerun\n\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1076891218 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQDAOUg==", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=pytorch-dev-infra org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "kit1980" + }, + { + "login": "b0noI" + }, + { + "login": "seemethere" + }, + { + "login": "malfet" + }, + { + "login": "tenpercent" + }, + { + "login": "atalman" + }, + { + "login": "osalpekar" + }, + { + "login": "janeyx99" + }, + { + "login": "clee2000" + } + ], + "pageInfo": { + "hasNextPage": false, + "endCursor": "Y3Vyc29yOnYyOpHOAqnOlw==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "dreiss" + }, + { + "login": "kumpera" + }, + { + "login": "ezyang" + }, + { + "login": "stephenroller" + }, + { + "login": "swolchok" + }, + { + "login": "hyuen" + }, + { + "login": "orionr" + }, + { + "login": "dhruvbird" + }, + { + "login": "likethesky" + }, + { + "login": "lw" + }, + { + "login": "raziel" + }, + { + "login": "simpkins" + }, + { + "login": "ebyrne" + }, + { + "login": "Babar" + }, + { + "login": "kostmo" + }, + { + "login": "bhosmer" + }, + { + "login": "zdevito" + }, + { + "login": "bugra" + }, + { + "login": "caraya10" + }, + { + "login": "kit1980" + }, + { + "login": "shoumikhin" + }, + { + "login": "teytaud" + }, + { + "login": "xuzhao9" + }, + { + "login": "jansel" + }, + { + "login": "abhinavarora" + }, + { + "login": "b0noI" + }, + { + "login": "djthorne" + }, + { + "login": "nairbv" + }, + { + "login": "Mortimerp9" + }, + { + "login": "dadkins20" + }, + { + "login": "colesbury" + }, + { + "login": "laurencer" + }, + { + "login": "nickgg" + }, + { + "login": "yzhao30" + }, + { + "login": "bearzx" + }, + { + "login": "mattjgalloway" + }, + { + "login": "chenyang78" + }, + { + "login": "yns88" + }, + { + "login": "lc0" + }, + { + "login": "wenleix" + }, + { + "login": "aivanou" + }, + { + "login": "jingsh" + }, + { + "login": "mthrok" + }, + { + 
"login": "drdarshan" + }, + { + "login": "tvalentius" + }, + { + "login": "d4l3k" + }, + { + "login": "jamiemccrindle" + }, + { + "login": "kazhang" + }, + { + "login": "simonhollis" + }, + { + "login": "lqiao" + }, + { + "login": "ajyu" + }, + { + "login": "bitfort" + }, + { + "login": "govardhan" + }, + { + "login": "yinghai" + }, + { + "login": "zyan0" + }, + { + "login": "ajtulloch" + }, + { + "login": "pbelevich" + }, + { + "login": "VitalyFedyunin" + }, + { + "login": "dbish" + }, + { + "login": "NicolasHug" + }, + { + "login": "efaust" + }, + { + "login": "idning" + }, + { + "login": "soumith" + }, + { + "login": "nimin98" + }, + { + "login": "chaekit" + }, + { + "login": "radkris-git" + }, + { + "login": "javier-m" + }, + { + "login": "mostafaelhoushi" + }, + { + "login": "brianjo" + }, + { + "login": "ShijunK" + }, + { + "login": "suo" + }, + { + "login": "vkuzo" + }, + { + "login": "seemethere" + }, + { + "login": "qihqi" + }, + { + "login": "jackm321" + }, + { + "login": "neerajprad" + }, + { + "login": "rsemenov" + }, + { + "login": "ziky90" + }, + { + "login": "gmagogsfm" + }, + { + "login": "zzzwen" + }, + { + "login": "ikriv" + }, + { + "login": "deeptigp" + }, + { + "login": "andrewor14" + }, + { + "login": "jianyuh" + }, + { + "login": "cykustcc" + }, + { + "login": "highker" + }, + { + "login": "navahgar" + }, + { + "login": "beauby" + }, + { + "login": "jeffreyksmithjr" + }, + { + "login": "suphoff" + }, + { + "login": "smessmer" + }, + { + "login": "ananthsub" + }, + { + "login": "d1jang" + }, + { + "login": "firstprayer" + }, + { + "login": "malfet" + }, + { + "login": "fegin" + }, + { + "login": "hanton" + }, + { + "login": "zanqi" + }, + { + "login": "bujar" + }, + { + "login": "supriyar" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOACiM0Q==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOACiM0Q== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "kausv" + }, + { + "login": "divchenko" + }, + { + "login": "rahuln32" + }, + { + "login": "bilgeacun" + }, + { + "login": "caogao" + }, + { + "login": "blefaudeux" + }, + { + "login": "miguelmartin75" + }, + { + "login": "penguinwu" + }, + { + "login": "shz117" + }, + { + "login": "ajliu" + }, + { + "login": "saketh-are" + }, + { + "login": "jessebrizzi" + }, + { + "login": "msaroufim" + }, + { + "login": "mdundas" + }, + { + "login": "davides" + }, + { + "login": "alannnna" + }, + { + "login": "hlin09" + }, + { + "login": "terrychenism" + }, + { + "login": "xiaomengy" + }, + { + "login": "jisaacso" + }, + { + "login": "fkhan1337" + }, + { + "login": "xing-liu" + }, + { + "login": "alanadakotashine" + }, + { + "login": "desertfire" + }, + { + "login": "banitag1" + }, + { + "login": "letterx" + }, + { + "login": "gchanan" + }, + { + "login": "dbort" + }, + { + "login": "bilalsal" + }, + { + "login": "jaceyca" + }, + { + "login": "serhaty" + }, + { + "login": "yf225" + }, + { + "login": "yifuwang" + }, + { + "login": "piyushmh" + }, + { + "login": "z-a-f" + }, + { + "login": "superzgc" + }, + { + "login": "tenpercent" + }, + { + "login": "spaugh" + }, + { + "login": "bertmaher" + }, + { + "login": "chauhang" + }, + { + "login": "jiayisuse" + }, + { + "login": "bradleyhd" + }, + { + "login": "ZolotukhinM" + }, + { + "login": "jamesr66a" + }, + { + "login": "mullachv" + }, + { + "login": "voznesenskym" + }, + { + "login": "charliechen0401" + }, + { 
+ "login": "bwasti" + }, + { + "login": "cryptopic" + }, + { + "login": "chinannyang" + }, + { + "login": "NivekT" + }, + { + "login": "zhxchen17" + }, + { + "login": "jerryzh168" + }, + { + "login": "MohammadMahdiJavanmard" + }, + { + "login": "rajkar86" + }, + { + "login": "wconstab" + }, + { + "login": "Hangjun" + }, + { + "login": "davidberard98" + }, + { + "login": "Krovatkin" + }, + { + "login": "CamiWilliams" + }, + { + "login": "J0Nreynolds" + }, + { + "login": "datumbox" + }, + { + "login": "aartibasant" + }, + { + "login": "xta0" + }, + { + "login": "zou3519" + }, + { + "login": "xman1979" + }, + { + "login": "suraj813" + }, + { + "login": "gqchen" + }, + { + "login": "jayleverett" + }, + { + "login": "george-qi" + }, + { + "login": "abhikrish" + }, + { + "login": "zhangguanheng66" + }, + { + "login": "mikeiovine" + }, + { + "login": "Adolfo-Karim" + }, + { + "login": "Chillee" + }, + { + "login": "albanD" + }, + { + "login": "robotal" + }, + { + "login": "MarcioPorto" + }, + { + "login": "srsuryadev" + }, + { + "login": "IvanKobzarev" + }, + { + "login": "eprivezentsev" + }, + { + "login": "linux-jedi" + }, + { + "login": "chandlerzuo" + }, + { + "login": "prateek1404" + }, + { + "login": "otsneh" + }, + { + "login": "husthyc" + }, + { + "login": "briancoutinho" + }, + { + "login": "fduwjj" + }, + { + "login": "esqu1" + }, + { + "login": "prabhat00155" + }, + { + "login": "Gamrix" + }, + { + "login": "QuentinDuval" + }, + { + "login": "atalman" + }, + { + "login": "xush6528" + }, + { + "login": "dracifer" + }, + { + "login": "SS-JIA" + }, + { + "login": "helunwencser" + }, + { + "login": "xw285cornell" + }, + { + "login": "hhbyyh" + }, + { + "login": "rohan-varma" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOAHqtWg==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAHqtWg== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "teng-li" + }, + { + "login": "larryliu0820" + }, + { + "login": "lyoka" + }, + { + "login": "cbalioglu" + }, + { + "login": "hl475" + }, + { + "login": "hwangjeff" + }, + { + "login": "Jack-Khuu" + }, + { + "login": "alanwaketan" + }, + { + "login": "mehtanirav" + }, + { + "login": "nateanl" + }, + { + "login": "boyuantan" + }, + { + "login": "muntaqim" + }, + { + "login": "dennysem" + }, + { + "login": "ymao1993" + }, + { + "login": "fmassa" + }, + { + "login": "esantorella" + }, + { + "login": "HamidShojanazeri" + }, + { + "login": "jubinchheda" + }, + { + "login": "mehdimashayekhi" + }, + { + "login": "rkindi" + }, + { + "login": "wanchaol" + }, + { + "login": "zephirefaith" + }, + { + "login": "alexbeloi" + }, + { + "login": "kapilsh" + }, + { + "login": "plahera" + }, + { + "login": "SherlockNoMad" + }, + { + "login": "venkatacrc" + }, + { + "login": "pritamdamania87" + }, + { + "login": "rahxephon89" + }, + { + "login": "iseeyuan" + }, + { + "login": "Matphyler" + }, + { + "login": "protonu" + }, + { + "login": "terhuhf" + }, + { + "login": "aruntonic" + }, + { + "login": "gcatron" + }, + { + "login": "yingrliu" + }, + { + "login": "alexanderguzhva" + }, + { + "login": "zhaoalex" + }, + { + "login": "shahofblah" + }, + { + "login": "vivekmig" + }, + { + "login": "yqhu" + }, + { + "login": "jspisak" + }, + { + "login": "akshaypandian" + }, + { + "login": "HarutMov" + }, + { + "login": "tktrungna" + }, + { + "login": "eellison" + }, + { + "login": "ziab" + }, + { + "login": 
"NarineK" + }, + { + "login": "andrewconnors" + }, + { + "login": "wenwei202" + }, + { + "login": "jg2912" + }, + { + "login": "jwpark1985" + }, + { + "login": "robieta" + }, + { + "login": "davidxili" + }, + { + "login": "mreso" + }, + { + "login": "soulitzer" + }, + { + "login": "prigoyal" + }, + { + "login": "PaliC" + }, + { + "login": "anijain2305" + }, + { + "login": "pvtuan10" + }, + { + "login": "huangyi1979" + }, + { + "login": "osalpekar" + }, + { + "login": "xiaohui-zhang" + }, + { + "login": "jerry39213gh" + }, + { + "login": "jarodhou" + }, + { + "login": "hlu1" + }, + { + "login": "huiguoo" + }, + { + "login": "H-Huang" + }, + { + "login": "vtsyvina" + }, + { + "login": "qchip" + }, + { + "login": "Nitrokitty" + }, + { + "login": "satgera" + }, + { + "login": "ngimel" + }, + { + "login": "dongreenberg" + }, + { + "login": "markkm" + }, + { + "login": "EscapeZero" + }, + { + "login": "bdhirsh" + }, + { + "login": "cccclai" + }, + { + "login": "carolineechen" + }, + { + "login": "tugsbayasgalan" + }, + { + "login": "frankseide" + }, + { + "login": "YazhiGao" + }, + { + "login": "pavithranrao" + }, + { + "login": "VirgileHlav" + }, + { + "login": "mrshenli" + }, + { + "login": "lena-kashtelyan" + }, + { + "login": "brad-mengchi" + }, + { + "login": "kimishpatel" + }, + { + "login": "aaronenyeshi" + }, + { + "login": "shajrawi" + }, + { + "login": "samdow" + }, + { + "login": "dzhulgakov" + }, + { + "login": "great-way" + }, + { + "login": "ashkan-software" + }, + { + "login": "garroud" + }, + { + "login": "knottb" + }, + { + "login": "jbitton" + }, + { + "login": "jdsgomes" + }, + { + "login": "zhangxy988" + }, + { + "login": "samlurye" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOAStXFg==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAStXFg== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "EdwardTyantov" + }, + { + "login": "anjali411" + }, + { + "login": "842974287" + }, + { + "login": "JacobSzwejbka" + }, + { + "login": "nishantpdce" + }, + { + "login": "srinivas212" + }, + { + "login": "cherie11" + }, + { + "login": "shreyanb98" + }, + { + "login": "kavoor" + }, + { + "login": "dzdang" + }, + { + "login": "naveedgol" + }, + { + "login": "Nayef211" + }, + { + "login": "zrphercule" + }, + { + "login": "HengruiX" + }, + { + "login": "langong347" + }, + { + "login": "soapisnotfat" + }, + { + "login": "ebsmothers" + }, + { + "login": "anshuljain1" + }, + { + "login": "b-koopman" + }, + { + "login": "salilsdesai" + }, + { + "login": "vmoens" + }, + { + "login": "xinyang0" + }, + { + "login": "ramvenkat98" + }, + { + "login": "fbbradheintz" + }, + { + "login": "kauterry" + }, + { + "login": "VenkatSubramaniam" + }, + { + "login": "yxia11" + }, + { + "login": "anirbanraywork" + }, + { + "login": "houseroad" + }, + { + "login": "erichan1" + }, + { + "login": "hsrussell" + }, + { + "login": "ilia-cher" + }, + { + "login": "ajitmaths" + }, + { + "login": "awgu" + }, + { + "login": "wz337" + }, + { + "login": "LynneD" + }, + { + "login": "qxy11" + }, + { + "login": "janeyx99" + }, + { + "login": "msedwar" + }, + { + "login": "dustinh1999" + }, + { + "login": "glaringlee" + }, + { + "login": "anj-s" + }, + { + "login": "liuchen9494" + }, + { + "login": "jramseyer" + }, + { + "login": "zengk95" + }, + { + "login": "gtarjun" + }, + { + "login": "mikaylagawarecki" + }, + { + "login": "xianxl" + }, + { + 
"login": "lucasgadams" + }, + { + "login": "mingzhe09088" + }, + { + "login": "Vucibatina" + }, + { + "login": "aazzolini" + }, + { + "login": "nataliakliushkina" + }, + { + "login": "mruberry" + }, + { + "login": "mja314" + }, + { + "login": "HDCharles" + }, + { + "login": "mcr229" + }, + { + "login": "guangy10" + }, + { + "login": "mengwa41" + }, + { + "login": "hx89" + }, + { + "login": "kiukchung" + }, + { + "login": "hanhsienhuang" + }, + { + "login": "clee2000" + }, + { + "login": "lhuang04" + }, + { + "login": "sidneyfletcher" + }, + { + "login": "gottbrath" + }, + { + "login": "lessw2020" + }, + { + "login": "choward232" + }, + { + "login": "mmh683" + }, + { + "login": "dwarakrajagopal" + }, + { + "login": "lazysjb" + }, + { + "login": "zhaojuanmao" + }, + { + "login": "johncalab" + }, + { + "login": "dhthompson" + }, + { + "login": "superwizard2019" + }, + { + "login": "fbhuba" + }, + { + "login": "shunting314" + }, + { + "login": "edward-io" + }, + { + "login": "sean-ngo" + }, + { + "login": "bzinodev" + }, + { + "login": "xcheng16" + }, + { + "login": "adamomainz" + }, + { + "login": "sluks" + }, + { + "login": "poojahp" + }, + { + "login": "ansley" + }, + { + "login": "mvsampath" + }, + { + "login": "cheetah2216" + }, + { + "login": "pinaki-mukerji" + }, + { + "login": "hongxiayang" + }, + { + "login": "kyulee-com" + }, + { + "login": "sstsai-adl" + }, + { + "login": "dahsh" + }, + { + "login": "ohgnoes" + }, + { + "login": "szewaiyuen7" + }, + { + "login": "byterover" + }, + { + "login": "changjishi" + }, + { + "login": "ejguan" + }, + { + "login": "nimaelyasi" + }, + { + "login": "nikithamalgifb" + }, + { + "login": "qxu-fb" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOBECNfg==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOBECNfg== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "sshawnwu" + }, + { + "login": "andrewyounkins" + }, + { + "login": "njuvekar" + }, + { + "login": "iramazanli" + }, + { + "login": "jnkwok1" + }, + { + "login": "jbschlosser" + }, + { + "login": "ccongge" + }, + { + "login": "haichuan-fb" + }, + { + "login": "wwang84" + }, + { + "login": "JustinPinero" + }, + { + "login": "gcramer23" + }, + { + "login": "woo-kim" + }, + { + "login": "chowarfb" + }, + { + "login": "priyaramani" + }, + { + "login": "yidawang-oss" + }, + { + "login": "beback4u" + }, + { + "login": "asalioufb" + }, + { + "login": "four4fish" + }, + { + "login": "kkosik20" + }, + { + "login": "KZFB" + }, + { + "login": "henryliu-bluehills" + } + ], + "pageInfo": { + "hasNextPage": false, + "endCursor": "Y3Vyc29yOnYyOpHOBftYGg==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=qwertyuiop org=pytorch": { + "data": { + "organization": { + "team": null + } + } + } +} diff --git a/.github/scripts/install_nvidia_utils_linux.sh b/.github/scripts/install_nvidia_utils_linux.sh index 0db7de71f4fc80..b854320c9eaa40 100755 --- a/.github/scripts/install_nvidia_utils_linux.sh +++ b/.github/scripts/install_nvidia_utils_linux.sh @@ -3,7 +3,7 @@ set -eou pipefail DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) \ -DRIVER_FN="NVIDIA-Linux-x86_64-495.44.run" +DRIVER_FN="NVIDIA-Linux-x86_64-510.60.02.run" YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" install_nvidia_docker2_amzn2() { diff --git a/.github/scripts/syncbranches.py b/.github/scripts/syncbranches.py index 163c4b3759b800..8437e1fa9c1818 100755 --- a/.github/scripts/syncbranches.py +++ b/.github/scripts/syncbranches.py @@ -1,6 +1,6 @@ #!/usr/bin/env python3 -from gitutils import get_git_repo_dir, GitRepo +from gitutils import get_git_repo_dir, get_git_remote_name, GitRepo from typing import Any @@ -16,7 +16,7 @@ def parse_args() -> Any: def main() -> None: args = parse_args() - repo = GitRepo(get_git_repo_dir(), debug=args.debug) + repo = GitRepo(get_git_repo_dir(), get_git_remote_name(), debug=args.debug) repo.cherry_pick_commits(args.sync_branch, args.default_branch) repo.push(args.default_branch, args.dry_run) diff --git a/.github/scripts/test_trymerge.py b/.github/scripts/test_trymerge.py index 539aec9b9c6933..753936d616a488 100755 --- a/.github/scripts/test_trymerge.py +++ b/.github/scripts/test_trymerge.py @@ -1,10 +1,20 @@ #!/usr/bin/env python3 +# Tests implemented in this file rely on GitHub GraphQL APIs +# In order to avoid test flakiness, results of the queries +# are cached in gql_mocks.json +# PyTorch Lint workflow does not have GITHUB_TOKEN defined to avoid +# flakiness, so if you are making changes to merge_rules or +# GraphQL queries in trymerge.py, please make sure to delete `gql_mocks.json` +# and re-run the test locally with one's PAT + import json import os from hashlib import sha256 -from trymerge import gh_graphql, GitHubPR +from trymerge import find_matching_merge_rule, gh_graphql, gh_get_team_members, GitHubPR +from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo from typing import Any from unittest import TestCase, main, mock +from urllib.error import HTTPError def mocked_gh_graphql(query: str, **kwargs: Any) -> Any: gql_db_fname = os.path.join(os.path.dirname(__file__), "gql_mocks.json") @@ -17,7 +27,8 @@ def get_mocked_queries() -> Any: def save_mocked_queries(obj: Any) -> None: with open(gql_db_fname, encoding="utf-8", mode="w") as f: - json.dump(obj, f) + json.dump(obj, f, indent=2) + f.write("\n") key = f"query_sha={sha256(query.encode('utf-8')).hexdigest()} " + " ".join([f"{k}={kwargs[k]}" for k in sorted(kwargs.keys())]) mocked_queries = get_mocked_queries() @@ -25,7 +36,16 @@ def save_mocked_queries(obj: Any) -> None: if key in mocked_queries: return mocked_queries[key] - rc = gh_graphql(query, **kwargs) + try: + rc = gh_graphql(query, **kwargs) + except HTTPError as err: + if err.code == 401: + err_msg = "If you are seeing this message during workflow run, please make sure to update gql_mocks.json" + err_msg += f" locally, by deleting it and running {os.path.basename(__file__)} with " + err_msg += " GitHub Personal Access Token passed via GITHUB_TOKEN environment variable" + if os.getenv("GITHUB_TOKEN") is None: + err_msg = "Failed to update cached GraphQL queries as GITHUB_TOKEN is not defined."
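The mock layer in test_trymerge.py keys every recorded GraphQL response by the SHA-256 of the query text plus the sorted query variables, which is exactly the shape of the `query_sha=... cursor=... name=... org=...` keys visible in the gql_mocks.json entries earlier in this diff. A minimal sketch of that lookup, written outside the test harness with a hypothetical `lookup` helper:

```python
import json
from hashlib import sha256
from typing import Any

def mock_key(query: str, **kwargs: Any) -> str:
    # Same key shape as mocked_gh_graphql builds:
    # "query_sha=<sha256 of query text> k1=v1 k2=v2 ..." with kwargs sorted by name.
    return (f"query_sha={sha256(query.encode('utf-8')).hexdigest()} "
            + " ".join(f"{k}={kwargs[k]}" for k in sorted(kwargs)))

def lookup(cache_path: str, query: str, **kwargs: Any) -> Any:
    # Replay a previously recorded response from gql_mocks.json.
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)[mock_key(query, **kwargs)]
```

To refresh the cache after changing a query or a merge rule, one would delete gql_mocks.json and re-run this test file locally with a personal access token exported as GITHUB_TOKEN, e.g. `GITHUB_TOKEN=<PAT> python3 .github/scripts/test_trymerge.py`.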
+ err_msg + raise RuntimeError(err_msg) from err mocked_queries[key] = rc save_mocked_queries(mocked_queries) @@ -34,6 +54,29 @@ def save_mocked_queries(obj: Any) -> None: class TestGitHubPR(TestCase): + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_match_rules(self, mocked_gql: Any) -> None: + "Tests that PR passes merge rules" + pr = GitHubPR("pytorch", "pytorch", 71759) + repo = GitRepo(get_git_repo_dir(), get_git_remote_name()) + self.assertTrue(find_matching_merge_rule(pr, repo) is not None) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_lint_fails(self, mocked_gql: Any) -> None: + "Tests that PR fails mandatory lint check" + pr = GitHubPR("pytorch", "pytorch", 74649) + repo = GitRepo(get_git_repo_dir(), get_git_remote_name()) + self.assertRaises(RuntimeError, lambda: find_matching_merge_rule(pr, repo)) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_get_last_comment(self, mocked_gql: Any) -> None: + "Tests that last comment can be fetched" + pr = GitHubPR("pytorch", "pytorch", 71759) + comment = pr.get_last_comment() + self.assertEqual(comment.author_login, "github-actions") + self.assertIsNone(comment.editor_login) + self.assertTrue("You've committed this PR" in comment.body_text) + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) def test_get_author_null(self, mocked_gql: Any) -> None: """ Tests that PR author can be computed @@ -43,6 +86,7 @@ def test_get_author_null(self, mocked_gql: Any) -> None: author = pr.get_author() self.assertTrue(author is not None) self.assertTrue("@" in author) + self.assertTrue(pr.get_diff_revision() is None) @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) def test_large_diff(self, mocked_gql: Any) -> None: @@ -52,6 +96,43 @@ def test_large_diff(self, mocked_gql: Any) -> None: flist = pr.get_changed_files() self.assertEqual(len(flist), pr.get_changed_files_count()) + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_internal_changes(self, mocked_gql: Any) -> None: + "Tests that PR with internal changes is detected" + pr = GitHubPR("pytorch", "pytorch", 73969) + self.assertTrue(pr.has_internal_changes()) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_checksuites_pagination(self, mocked_gql: Any) -> None: + "Tests that PR with lots of checksuits can be fetched" + pr = GitHubPR("pytorch", "pytorch", 73811) + self.assertGreater(len(pr.get_checkrun_conclusions()), 0) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_comments_pagination(self, mocked_gql: Any) -> None: + "Tests that PR with 50+ comments can be fetched" + pr = GitHubPR("pytorch", "pytorch", 31093) + self.assertGreater(len(pr.get_comments()), 50) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_gql_complexity(self, mocked_gql: Any) -> None: + "Fetch comments and conclusions for PR with 60 commits" + # Previous version of GrapQL query used to cause HTTP/502 error + # see https://gist.github.com/malfet/9b93bc7eeddeaf1d84546efc4f0c577f + pr = GitHubPR("pytorch", "pytorch", 68111) + self.assertGreater(len(pr.get_comments()), 20) + self.assertGreater(len(pr.get_checkrun_conclusions()), 3) + self.assertGreater(pr.get_commit_count(), 60) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + def test_team_members(self, mocked_gql: Any) -> None: + "Test fetching team members works" + dev_infra_team = gh_get_team_members("pytorch", 
"pytorch-dev-infra") + self.assertGreater(len(dev_infra_team), 2) + with self.assertWarns(Warning): + non_existing_team = gh_get_team_members("pytorch", "qwertyuiop") + self.assertEqual(len(non_existing_team), 0) + if __name__ == "__main__": main() diff --git a/.github/scripts/trymerge.py b/.github/scripts/trymerge.py index 25ba3db7feb112..0f0fadbd13e2b9 100755 --- a/.github/scripts/trymerge.py +++ b/.github/scripts/trymerge.py @@ -8,6 +8,8 @@ from urllib.error import HTTPError from typing import cast, Any, Callable, Dict, List, Optional, Tuple, Union from gitutils import get_git_remote_name, get_git_repo_dir, patterns_to_regex, GitRepo +from functools import lru_cache +from warnings import warn GH_GET_PR_INFO_QUERY = """ @@ -36,7 +38,7 @@ mergeCommit { oid } - commits(first: 100) { + commits_with_authors:commits(first: 100) { nodes { commit { author { @@ -47,17 +49,44 @@ name } oid - checkSuites(filterBy: {appId: 12274}, first: 1) { + } + } + totalCount + } + commits(last: 1) { + nodes { + commit { + checkSuites(first: 50) { nodes { app { + name databaseId } + workflowRun { + workflow { + name + } + } + checkRuns(first: 10) { + nodes { + name + conclusion + } + pageInfo { + endCursor + hasNextPage + } + } conclusion } + pageInfo { + endCursor + hasNextPage + } } + oid } } - totalCount } changedFiles files(first: 100) { @@ -78,7 +107,7 @@ } totalCount } - comments(last: 1) { + comments(last: 5) { nodes { bodyText author { @@ -88,6 +117,11 @@ editor { login } + databaseId + } + pageInfo { + startCursor + hasPreviousPage } } } @@ -113,6 +147,95 @@ } """ +GH_GET_PR_NEXT_CHECK_RUNS = """ +query ($owner: String!, $name: String!, $number: Int!, $cursor: String!) { + repository(name: $name, owner: $owner) { + pullRequest(number: $number) { + commits(last: 1) { + nodes { + commit { + oid + checkSuites(first: 100, after: $cursor) { + nodes { + app { + name + databaseId + } + workflowRun { + workflow { + name + } + } + checkRuns(first: 10) { + nodes { + name + conclusion + } + pageInfo { + endCursor + hasNextPage + } + } + conclusion + } + pageInfo { + endCursor + hasNextPage + } + } + } + } + } + } + } +} +""" + +GH_GET_PR_PREV_COMMENTS = """ +query ($owner: String!, $name: String!, $number: Int!, $cursor: String!) 
{ + repository(name: $name, owner: $owner) { + pullRequest(number: $number) { + comments(last: 100, before: $cursor) { + nodes { + bodyText + author { + login + } + authorAssociation + editor { + login + } + databaseId + } + pageInfo { + startCursor + hasPreviousPage + } + } + } + } +} +""" + +# This query needs read-org permission +GH_GET_TEAM_MEMBERS_QUERY = """ +query($org: String!, $name: String!, $cursor: String) { + organization(login: $org) { + team(slug: $name) { + members(first: 100, after: $cursor) { + nodes { + login + } + pageInfo { + hasNextPage + endCursor + } + } + } + } +} +""" + RE_GHSTACK_HEAD_REF = re.compile(r"^(gh/[^/]+/[0-9]+/)head$") RE_GHSTACK_SOURCE_ID = re.compile(r'^ghstack-source-id: (.+)\n?', re.MULTILINE) RE_PULL_REQUEST_RESOLVED = re.compile( @@ -178,15 +301,41 @@ def gh_get_pr_info(org: str, proj: str, pr_no: int) -> Any: return rc["data"]["repository"]["pullRequest"] +@lru_cache(maxsize=None) +def gh_get_team_members(org: str, name: str) -> List[str]: + rc: List[str] = [] + team_members: Dict[str, Any] = {"pageInfo": {"hasNextPage": "true", "endCursor": None}} + while bool(team_members["pageInfo"]["hasNextPage"]): + query = gh_graphql(GH_GET_TEAM_MEMBERS_QUERY, org=org, name=name, cursor=team_members["pageInfo"]["endCursor"]) + team = query["data"]["organization"]["team"] + if team is None: + warn(f"Requested non-existing team {org}/{name}") + return [] + team_members = team["members"] + rc += [member["login"] for member in team_members["nodes"]] + return rc + + def parse_args() -> Any: from argparse import ArgumentParser parser = ArgumentParser("Merge PR into default branch") parser.add_argument("--dry-run", action="store_true") parser.add_argument("--revert", action="store_true") + parser.add_argument("--force", action="store_true") + parser.add_argument("--comment-id", type=int) parser.add_argument("pr_num", type=int) return parser.parse_args() +@dataclass +class GitHubComment: + body_text: str + author_login: str + author_association: str + editor_login: Optional[str] + database_id: int + + class GitHubPR: def __init__(self, org: str, project: str, pr_num: int) -> None: assert isinstance(pr_num, int) @@ -195,6 +344,8 @@ def __init__(self, org: str, project: str, pr_num: int) -> None: self.pr_num = pr_num self.info = gh_get_pr_info(org, project, pr_num) self.changed_files: Optional[List[str]] = None + self.conclusions: Optional[Dict[str, str]] = None + self.comments: Optional[List[GitHubComment]] = None def is_closed(self) -> bool: return bool(self.info["closed"]) @@ -257,28 +408,56 @@ def get_approved_by(self) -> List[str]: return [login for (login, state) in self._get_reviewers() if state == "APPROVED"] def get_commit_count(self) -> int: - return int(self.info["commits"]["totalCount"]) + return int(self.info["commits_with_authors"]["totalCount"]) def get_pr_creator_login(self) -> str: return cast(str, self.info["author"]["login"]) def get_committer_login(self, num: int = 0) -> str: - user = self.info["commits"]["nodes"][num]["commit"]["author"]["user"] + user = self.info["commits_with_authors"]["nodes"][num]["commit"]["author"]["user"] # If author is not github user, user node will be null if user is None: return "" return cast(str, user["login"]) def get_committer_author(self, num: int = 0) -> str: - node = self.info["commits"]["nodes"][num]["commit"]["author"] + node = self.info["commits_with_authors"]["nodes"][num]["commit"]["author"] return f"{node['name']} <{node['email']}>" - def get_check_suite_conclusions(self) -> Dict[int, str]: - last_commit 
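Because GH_GET_TEAM_MEMBERS_QUERY needs a token with read-org permission, gh_get_team_members is worth exercising on its own before relying on it inside merge rules. A short usage sketch, run with GITHUB_TOKEN set since it hits the live API; the second team slug is deliberately made up, mirroring the behaviour covered by test_team_members:

```python
from trymerge import gh_get_team_members

# Expands a team slug into individual logins, following pagination cursors;
# results are memoized via lru_cache for the lifetime of the process.
members = gh_get_team_members("pytorch", "pytorch-dev-infra")
print(f"fetched {len(members)} members")

# A non-existing team only emits a warning and yields an empty list.
assert gh_get_team_members("pytorch", "no-such-team") == []
```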
= self.info["commits"]["nodes"][-1]["commit"] - rc = {} - for node in last_commit["checkSuites"]["nodes"]: - rc[int(node["app"]["databaseId"])] = node["conclusion"] - return rc + def get_checkrun_conclusions(self) -> Dict[str, str]: + """ Returns list of checkrun / conclusions """ + if self.conclusions is not None: + return self.conclusions + orig_last_commit = self.info["commits"]["nodes"][-1]["commit"] + checksuites = orig_last_commit["checkSuites"] + conclusions = {} + + def add_conclusions(nodes: List[Dict[str, Any]]) -> None: + for node in nodes: + workflow_run = node["workflowRun"] + checkruns = node["checkRuns"] + if workflow_run is not None: + conclusions[workflow_run["workflow"]["name"]] = node["conclusion"] + continue + if checkruns is not None: + for checkrun_node in checkruns["nodes"]: + conclusions[checkrun_node["name"]] = checkrun_node["conclusion"] + + add_conclusions(checksuites["nodes"]) + while bool(checksuites["pageInfo"]["hasNextPage"]): + rc = gh_graphql(GH_GET_PR_NEXT_CHECK_RUNS, + name=self.project, + owner=self.org, + number=self.pr_num, + cursor=checksuites["pageInfo"]["endCursor"]) + info = rc["data"]["repository"]["pullRequest"] + last_commit = info["commits"]["nodes"][-1]["commit"] + if last_commit["oid"] != orig_last_commit["oid"]: + raise RuntimeError("Last commit changed on PR") + checksuites = last_commit["checkSuites"] + add_conclusions(checksuites["nodes"]) + self.conclusions = conclusions + return conclusions def get_authors(self) -> Dict[str, str]: rc = {} @@ -306,20 +485,64 @@ def get_merge_commit(self) -> Optional[str]: def get_pr_url(self) -> str: return f"https://github.com/{self.org}/{self.project}/pull/{self.pr_num}" - def get_comment_body(self, num: int = -1) -> str: - return cast(str, self.info["comments"]["nodes"][num]["bodyText"]) - - def get_comment_author_login(self, num: int = -1) -> str: - return cast(str, self.info["comments"]["nodes"][num]["author"]["login"]) - - def get_comment_editor_login(self, num: int = -1) -> Optional[str]: - rc = self.info["comments"]["nodes"][num]["editor"] - return rc["login"] if rc is not None else None - - def get_comment_author_association(self, num: int = -1) -> str: - return cast(str, self.info["comments"]["nodes"][num]["authorAssociation"]) - - def merge_ghstack_into(self, repo: GitRepo) -> None: + @staticmethod + def _comment_from_node(node: Any) -> GitHubComment: + editor = node["editor"] + return GitHubComment(body_text=node["bodyText"], + author_login=node["author"]["login"], + author_association=node["authorAssociation"], + editor_login=editor["login"] if editor else None, + database_id=node["databaseId"] + ) + + def get_comments(self) -> List[GitHubComment]: + if self.comments is not None: + return self.comments + self.comments = [] + info = self.info["comments"] + # Do not try to fetch more than 10K comments + for _ in range(100): + self.comments = [self._comment_from_node(node) for node in info["nodes"]] + self.comments + if not info["pageInfo"]["hasPreviousPage"]: + break + rc = gh_graphql(GH_GET_PR_PREV_COMMENTS, + name=self.project, + owner=self.org, + number=self.pr_num, + cursor=info["pageInfo"]["startCursor"]) + info = rc["data"]["repository"]["pullRequest"]["comments"] + return self.comments + + def get_last_comment(self) -> GitHubComment: + return self._comment_from_node(self.info["comments"]["nodes"][-1]) + + def get_comment_by_id(self, database_id: int) -> GitHubComment: + if self.comments is None: + # Fastpath - try searching in partial prefetched comments + for node in 
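get_checkrun_conclusions, get_comments and gh_get_team_members all follow the same cursor-driven pattern: take the first page from the already-fetched PR info, then keep issuing the follow-up query with pageInfo's endCursor (or startCursor when walking comments backwards) until the has-more flag clears. A generic sketch of the forward-walking loop, with fetch_page standing in for whichever gh_graphql call applies:

```python
from typing import Any, Callable, Dict, List

def paginate(first_page: Dict[str, Any],
             fetch_page: Callable[[str], Dict[str, Any]]) -> List[Any]:
    # Both arguments yield GraphQL connections: {"nodes": [...], "pageInfo": {...}}.
    nodes: List[Any] = list(first_page["nodes"])
    page_info = first_page["pageInfo"]
    while page_info["hasNextPage"]:
        page = fetch_page(page_info["endCursor"])
        nodes += page["nodes"]
        page_info = page["pageInfo"]
    return nodes
```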
self.info["comments"]["nodes"]: + comment = self._comment_from_node(node) + if comment.database_id == database_id: + return comment + + for comment in self.get_comments(): + if comment.database_id == database_id: + return comment + raise RuntimeError(f"Comment with id {database_id} not found") + + def get_diff_revision(self) -> Optional[str]: + rc = RE_DIFF_REV.search(self.get_body()) + return rc.group(1) if rc is not None else None + + def has_internal_changes(self) -> bool: + checkrun_name = "Meta Internal-Only Changes Check" + if self.get_diff_revision() is None: + return False + checks = self.get_checkrun_conclusions() + if checks is None or checkrun_name not in checks: + return False + return checks[checkrun_name] != "SUCCESS" + + def merge_ghstack_into(self, repo: GitRepo, force: bool) -> None: assert self.is_ghstack_pr() approved_by = self.get_approved_by() # For ghstack, cherry-pick commits based from origin @@ -340,7 +563,7 @@ def merge_ghstack_into(self, repo: GitRepo) -> None: continue approved_by = pr.get_approved_by() # Raises exception if matching rule is not found - find_matching_merge_rule(pr, repo) + find_matching_merge_rule(pr, repo, force=force) # Adding the url here makes it clickable within the Github UI approved_by_urls = ', '.join(prefix_with_github_url(login) for login in approved_by) @@ -349,9 +572,11 @@ def merge_ghstack_into(self, repo: GitRepo) -> None: msg += f"\nApproved by: {approved_by_urls}\n" repo.amend_commit_message(msg) - def merge_into(self, repo: GitRepo, dry_run: bool = False) -> None: + def merge_into(self, repo: GitRepo, *, force: bool = False, dry_run: bool = False) -> None: # Raises exception if matching rule is not found - find_matching_merge_rule(self, repo) + find_matching_merge_rule(self, repo, force=force) + if self.has_internal_changes(): + raise RuntimeError("This PR must be landed via phabricator") if repo.current_branch() != self.default_branch(): repo.checkout(self.default_branch()) if not self.is_ghstack_pr(): @@ -365,7 +590,7 @@ def merge_into(self, repo: GitRepo, dry_run: bool = False) -> None: repo._run_git("merge", "--squash", pr_branch_name) repo._run_git("commit", f"--author=\"{self.get_author()}\"", "-m", msg) else: - self.merge_ghstack_into(repo) + self.merge_ghstack_into(repo, force) repo.push(self.default_branch(), dry_run) @@ -375,7 +600,7 @@ class MergeRule: name: str patterns: List[str] approved_by: List[str] - mandatory_app_id: Optional[int] + mandatory_checks_name: Optional[List[str]] def read_merge_rules(repo: GitRepo) -> List[MergeRule]: @@ -389,57 +614,85 @@ def read_merge_rules(repo: GitRepo) -> List[MergeRule]: return cast(List[MergeRule], rc) - -def find_matching_merge_rule(pr: GitHubPR, repo: GitRepo) -> MergeRule: +def find_matching_merge_rule(pr: GitHubPR, repo: GitRepo, force: bool = False) -> MergeRule: """Returns merge rule matching to this pr or raises an exception""" changed_files = pr.get_changed_files() approved_by = set(pr.get_approved_by()) rules = read_merge_rules(repo) + reject_reason = f"PR {pr.pr_num} does not match merge rules" + # Used to determine best rejection reason + # Score 0 to 10K - how many files rule matched + # Score 10K - matched all files, but no overlapping approvers + # Score 20K - matched all files and approvers, but lacks mandatory checks + reject_reason_score = 0 for rule in rules: rule_name = rule.name - rule_approvers_set = set(rule.approved_by) + rule_approvers_set = set() + for approver in rule.approved_by: + if "/" in approver: + org, name = approver.split("/") + 
rule_approvers_set.update(gh_get_team_members(org, name)) + else: + rule_approvers_set.add(approver) patterns_re = patterns_to_regex(rule.patterns) approvers_intersection = approved_by.intersection(rule_approvers_set) - # If rule requires approvers but they aren't the ones that reviewed PR - if len(approvers_intersection) == 0 and len(rule_approvers_set) > 0: - print(f"Skipping rule {rule_name} due to no approvers overlap") - continue - if rule.mandatory_app_id is not None: - cs_conslusions = pr.get_check_suite_conclusions() - mandatory_app_id = rule.mandatory_app_id - if mandatory_app_id not in cs_conslusions or cs_conslusions[mandatory_app_id] != "SUCCESS": - print(f"Skipping rule {rule_name} as mandatory app {mandatory_app_id} is not in {cs_conslusions}") - continue non_matching_files = [] for fname in changed_files: if not patterns_re.match(fname): non_matching_files.append(fname) if len(non_matching_files) > 0: - print(f"Skipping rule {rule_name} due to non-matching files: {non_matching_files}") + num_matching_files = len(changed_files) - len(non_matching_files) + if num_matching_files > reject_reason_score: + reject_reason_score = num_matching_files + reject_reason = (f"{num_matching_files} files matched rule {rule_name}, but there are still non-matching files: " + + f"{','.join(non_matching_files[:5])}{', ...' if len(non_matching_files) > 5 else ''}") continue - print(f"Matched rule {rule_name} for {pr.pr_num}") + # If rule requires approvers but they aren't the ones that reviewed PR + if len(approvers_intersection) == 0 and len(rule_approvers_set) > 0: + if reject_reason_score < 10000: + reject_reason_score = 10000 + reject_reason = (f"Matched rule {rule_name}, but it was not reviewed yet by any of:" + + f"{','.join(list(rule_approvers_set)[:5])}{', ...' 
if len(rule_approvers_set) > 5 else ''}") + continue + if rule.mandatory_checks_name is not None: + pass_checks = True + checks = pr.get_checkrun_conclusions() + # HACK: We don't want to skip CLA check, even when forced + for checkname in filter(lambda x: force is False or "CLA Check" in x, rule.mandatory_checks_name): + if checkname not in checks or checks[checkname] != "SUCCESS": + if reject_reason_score < 20000: + reject_reason_score = 20000 + reject_reason = f"Refusing to merge as mandatory check {checkname} " + reject_reason += "has not been run" if checkname not in checks else "failed" + reject_reason += f" for rule {rule_name}" + pass_checks = False + if not pass_checks: + continue + if pr.has_internal_changes(): + raise RuntimeError("This PR has internal changes and must be landed via Phabricator") return rule - raise RuntimeError(f"PR {pr.pr_num} does not match merge rules") + raise RuntimeError(reject_reason) -def try_revert(repo: GitRepo, pr: GitHubPR, dry_run: bool = False) -> None: +def try_revert(repo: GitRepo, pr: GitHubPR, *, dry_run: bool = False, comment_id: Optional[int] = None) -> None: def post_comment(msg: str) -> None: gh_post_comment(pr.org, pr.project, pr.pr_num, msg, dry_run=dry_run) if not pr.is_closed(): return post_comment(f"Can't revert open PR #{pr.pr_num}") - if not RE_REVERT_CMD.match(pr.get_comment_body()): - raise RuntimeError(f"Comment {pr.get_comment_body()} does not seem to be a valid revert command") - if pr.get_comment_editor_login() is not None: + comment = pr.get_last_comment() if comment_id is None else pr.get_comment_by_id(comment_id) + if not RE_REVERT_CMD.match(comment.body_text): + raise RuntimeError(f"Comment {comment.body_text} does not seem to be a valid revert command") + if comment.editor_login is not None: return post_comment("Don't want to revert based on edited command") - author_association = pr.get_comment_author_association() - author_login = pr.get_comment_author_login() + author_association = comment.author_association + author_login = comment.author_login # For some reason, one can not be a member of private repo, only CONTRIBUTOR expected_association = "CONTRIBUTOR" if pr.is_base_repo_private() else "MEMBER" if author_association != expected_association and author_association != "OWNER": return post_comment(f"Will not revert as @{author_login} is not a {expected_association}, but {author_association}") - # Raises exception if matching rule is not found - find_matching_merge_rule(pr, repo) + # Raises exception if matching rule is not found, but ignores all status checks + find_matching_merge_rule(pr, repo, force=True) commit_sha = pr.get_merge_commit() if commit_sha is None: commits = repo.commits_resolving_gh_pr(pr.pr_num) @@ -473,7 +726,7 @@ def main() -> None: pr = GitHubPR(org, project, args.pr_num) if args.revert: try: - try_revert(repo, pr, dry_run=args.dry_run) + try_revert(repo, pr, dry_run=args.dry_run, comment_id=args.comment_id) except Exception as e: msg = f"Reverting PR {args.pr_num} failed due to {e}" run_url = os.getenv("GH_RUN_URL") @@ -491,7 +744,7 @@ def main() -> None: return try: - pr.merge_into(repo, dry_run=args.dry_run) + pr.merge_into(repo, dry_run=args.dry_run, force=args.force) except Exception as e: msg = f"Merge failed due to {e}" run_url = os.getenv("GH_RUN_URL") diff --git a/.github/templates/android_ci_full_workflow.yml.j2 b/.github/templates/android_ci_full_workflow.yml.j2 deleted file mode 100644 index 9736bee5c4ed81..00000000000000 --- a/.github/templates/android_ci_full_workflow.yml.j2 +++ 
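The reject_reason_score bookkeeping in find_matching_merge_rule keeps only the most specific failure: a partial file match scores by how many files the rule's patterns covered (0 to 10K), a full file match with no overlapping approver scores 10K, and a full match that only lacks a mandatory check scores 20K, so the message finally raised comes from the rule that got furthest before failing. A toy illustration of that ordering, with entirely hypothetical rule names and messages:

```python
# (score, message) pairs, as the loop above would assign them for three imaginary rules.
candidates = [
    (3, "3 files matched rule docs-only, but there are still non-matching files: setup.py"),
    (10000, "Matched rule core, but it was not reviewed yet by any of: alice, bob"),
    (20000, "Refusing to merge as mandatory check Lint failed for rule ci"),
]
# The highest score wins, so the mandatory-check failure is what the user sees.
score, reason = max(candidates, key=lambda c: c[0])
print(reason)
```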
/dev/null @@ -1,165 +0,0 @@ -{%- extends "linux_ci_workflow.yml.j2" -%} -{% import 'common_android.yml.j2' as common_android %} -{%- set exclude_test = true -%} -{% block name -%} -# Template is at: .github/templates/android_ci_full_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: -{%- if is_default %} - pull_request: -{%- endif -%} -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first %} - push: - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} - -{% block build +%} - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: !{{ test_runner_type }} - env: - JOB_BASE_NAME: !{{ build_environment }}-build-and-test - NUM_TEST_SHARDS: !{{ num_test_shards }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - !{{ common.setup_ec2_linux() }} - !{{ common.checkout() }} - !{{ common.calculate_docker_image(false) }} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - !{{ common.parse_ref() }} - !{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a-build", "arm-v7a") }} - !{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a-build", "arm-v8a") }} - !{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build", "x86_32") }} - !{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64-build", "x86_64") }} - - name: Build final artifact - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - set -eux - - docker_image_libtorch_android_x86_32="${DOCKER_IMAGE}-x86_32" - docker_image_libtorch_android_x86_64="${DOCKER_IMAGE}-x86_64" - docker_image_libtorch_android_arm_v7a="${DOCKER_IMAGE}-arm-v7a" - docker_image_libtorch_android_arm_v8a="${DOCKER_IMAGE}-arm-v8a" - - echo "docker_image_commit: ${DOCKER_IMAGE}" - echo "docker_image_libtorch_android_x86_32: ${docker_image_libtorch_android_x86_32}" - echo "docker_image_libtorch_android_x86_64: ${docker_image_libtorch_android_x86_64}" - echo "docker_image_libtorch_android_arm_v7a: ${docker_image_libtorch_android_arm_v7a}" - echo "docker_image_libtorch_android_arm_v8a: ${docker_image_libtorch_android_arm_v8a}" - - # x86_32 - time docker pull "${docker_image_libtorch_android_x86_32}" >/dev/null - export id_x86_32 - id_x86_32=$(docker run -e GRADLE_OFFLINE=1 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_x86_32}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_x86_32}" bash) 2>&1 - - # arm-v7a - time docker pull "${docker_image_libtorch_android_arm_v7a}" >/dev/null - export id_arm_v7a - id_arm_v7a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_arm_v7a}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u 
jenkins -i "${id_arm_v7a}" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" - docker cp "${id_arm_v7a}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" - - # x86_64 - time docker pull "${docker_image_libtorch_android_x86_64}" >/dev/null - export id_x86_64 - id_x86_64=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_x86_64}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_x86_64}" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_install_x86_64" - docker cp "${id_x86_64}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_x86_64" - - # arm-v8a - time docker pull "${docker_image_libtorch_android_arm_v8a}" >/dev/null - export id_arm_v8a - id_arm_v8a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_arm_v8a}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" - docker cp "${id_arm_v8a}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" - - # Putting everything together - docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v7a" - docker cp "${GITHUB_WORKSPACE}/build_android_install_x86_64" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_x86_64" - docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v8a" - - # run gradle buildRelease - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec \ - -e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build" \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --user jenkins \ - -u jenkins -i "${id_x86_32}" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_artifacts" - docker cp "${id_x86_32}:/var/lib/jenkins/workspace/android/artifacts.tgz" "${GITHUB_WORKSPACE}/build_android_artifacts/" - - output_image="${DOCKER_IMAGE}-android-x86_32-gradle" - docker commit "${id_x86_32}" "${output_image}" - time docker push "${output_image}" - !{{ common_android.upload_androind_binary_size("prebuilt", "${GITHUB_WORKSPACE}/build_android_artifacts/artifacts.tgz") }} - - uses: !{{ common.upload_artifact_s3_action }} - name: Store PyTorch Android Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - build_android_artifacts/artifacts.tgz - !{{ common.teardown_ec2_linux() }} -{%- endblock %} diff --git a/.github/templates/android_ci_workflow.yml.j2 b/.github/templates/android_ci_workflow.yml.j2 deleted file mode 100644 index 
c86b94c1ad48b8..00000000000000 --- a/.github/templates/android_ci_workflow.yml.j2 +++ /dev/null @@ -1,111 +0,0 @@ -{%- extends "linux_ci_workflow.yml.j2" -%} -{% import 'common_android.yml.j2' as common_android %} -{%- set exclude_test = true -%} -{% block name -%} -# Template is at: .github/templates/android_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: -{%- if is_default %} - pull_request: -{%- endif -%} -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first %} - push: - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} - -{% block build +%} - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: !{{ test_runner_type }} - env: - JOB_BASE_NAME: !{{ build_environment }}-build-and-test - NUM_TEST_SHARDS: !{{ num_test_shards }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - !{{ common.setup_ec2_linux() }} - !{{ common.checkout() }} - !{{ common.calculate_docker_image(false) }} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Build - run: | - set -e - # Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and share via docker, because: - # 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build; - # 2) Not parallelizable by architecture: it only builds libtorch for one architecture; - - echo "DOCKER_IMAGE: ${DOCKER_IMAGE}" - time docker pull "${DOCKER_IMAGE}" >/dev/null - - export BUILD_LITE_INTERPRETER - BUILD_LITE_INTERPRETER="1" - if [[ "${BUILD_ENVIRONMENT}" == *"full-jit" ]]; then - BUILD_LITE_INTERPRETER="0" - fi - - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - # shellcheck disable=SC2016 - export id - id=$(docker run -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e PR_LABELS \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e BUILD_LITE_INTERPRETER \ - -e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "$(pwd):/var/lib/jenkins/workspace" \ - --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined \ - -t -d -w /var/lib/jenkins "${DOCKER_IMAGE}") - - # shellcheck disable=SC2016 - export COMMAND - # shellcheck disable=SC2016 - COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1' - echo "${COMMAND}" > ./command.sh && bash ./command.sh - # Skip docker push as this job is purely 
for size analysis purpose. - # Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied. - !{{ common.parse_ref() }} - !{{ common_android.upload_androind_binary_size("custom-build-single", "") }} - !{{ common.teardown_ec2_linux() }} -{%- endblock %} diff --git a/.github/templates/bazel_ci_workflow.yml.j2 b/.github/templates/bazel_ci_workflow.yml.j2 deleted file mode 100644 index 0480835794bc84..00000000000000 --- a/.github/templates/bazel_ci_workflow.yml.j2 +++ /dev/null @@ -1,127 +0,0 @@ -{%- extends "linux_ci_workflow.yml.j2" -%} -{% import 'common_android.yml.j2' as common_android %} -{%- set exclude_test = true -%} -{% block name -%} -# Template is at: .github/templates/bazel_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: -{%- if is_default %} - pull_request: -{%- endif -%} -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first %} - push: - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} - -{% block build +%} - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: !{{ test_runner_type }} - env: - JOB_BASE_NAME: !{{ build_environment }}-build-and-test - NUM_TEST_SHARDS: !{{ num_test_shards }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - !{{ common.setup_ec2_linux() }} - !{{ common.checkout() }} - !{{ common.calculate_docker_image(false) }} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Build - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e PR_LABELS \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . 
&& sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh' - !{{ common.parse_ref() }} - !{{ common_android.upload_androind_binary_size("", "")}} - - name: Test - # Time out the test phase after 3.5 hours - timeout-minutes: 210 - run: | - # detached container should get cleaned up by teardown_ec2_linux - export SHARD_NUMBER=0 - # TODO: Stop building test binaries as part of the build phase - # Make sure we copy test results from bazel-testlogs symlink to - # a regular directory ./test/test-reports - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e SHARD_NUMBER \ - -e NUM_TEST_SHARDS \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e PR_LABELS \ - -e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports' - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - !{{ common.upload_test_reports(name='bazel') }} - !{{ common.upload_downloaded_files(name='bazel') }} - !{{ common.upload_test_statistics(build_environment) }} - !{{ common.teardown_ec2_linux() }} -{%- endblock %} diff --git a/.github/templates/common.yml.j2 b/.github/templates/common.yml.j2 index 154745bcc98271..f701f92cf64cee 100644 --- a/.github/templates/common.yml.j2 +++ b/.github/templates/common.yml.j2 @@ -1,4 +1,4 @@ -{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v3" -%} +{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v4" -%} {# squid_proxy is an private ELB that only available for GHA custom runners #} {%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%} @@ -22,6 +22,37 @@ concurrency: } {%- endmacro -%} +{%- macro gen_dispatch_rules(on_pull_request, is_scheduled, ciflow_labels, branches = ['master', 'main', 'release/*'], enable_doc_jobs = True) -%} +on: +{%- if on_pull_request %} + pull_request: +{%- endif %} + push: +{%- if enable_doc_jobs and is_scheduled %} + tags: + # NOTE: Binary build pipelines should only get triggered on release candidate builds + # Release candidate tags look like: v1.11.0-rc1 + - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ +{%- endif %} +{%- for label in ciflow_labels | sort %} + {%- if loop.first and not (enable_doc_jobs and is_scheduled) %} + tags: + {%- endif %} + - '!{{ label }}/*' +{%- endfor %} +{%- if not is_scheduled %} + branches: +{%- for branch in branches %} + - !{{ branch }} +{%- endfor %} +{%- endif %} +{%- if is_scheduled %} + schedule: + - cron: !{{ is_scheduled }} +{%- endif %} + workflow_dispatch: +{%- endmacro -%} + {%- macro display_ec2_information() -%} - name: Display EC2 information shell: bash @@ -36,6 +67,7 @@ concurrency: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo 
"system info $(uname -a)" {%- endmacro -%} {%- macro parse_ref(pytorch_directory="") -%} @@ -56,20 +88,25 @@ concurrency: if: !{{ when }} env: AWS_DEFAULT_REGION: us-east-1 + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} BRANCH: ${{ steps.parse-ref.outputs.branch }} JOB_BASE_NAME: !{{ build_environment }}-test PR_NUMBER: ${{ github.event.pull_request.number }} SHA1: ${{ github.event.pull_request.head.sha || github.sha }} TAG: ${{ steps.parse-ref.outputs.tag }} WORKFLOW_ID: '${{ github.run_id }}' + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} {%- if needs_credentials %} - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }} + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} {%- endif %} shell: bash run: | + set -x python3 -m pip install -r requirements.txt python3 -m pip install boto3==1.19.12 + GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") + export GHA_WORKFLOW_JOB_ID python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test {%- endmacro -%} @@ -87,19 +124,23 @@ concurrency: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore {%- endmacro -%} {%- macro setup_ec2_linux() -%} - !{{ display_ec2_information() }} - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - !{{ add_retry_to_env() }} - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | !{{ add_retry_to_env() }} @@ -114,9 +155,6 @@ concurrency: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" {%- endmacro -%} {%- macro setup_rocm_linux() -%} @@ -296,6 +334,25 @@ concurrency: test-reports-*.zip {%- endmacro -%} +{%- macro upload_cores(artifact_name="coredumps", config=None, shard=None, use_s3=True) -%} +{%- if use_s3 %}- uses: !{{ upload_artifact_s3_action }} + name: Store Core dumps on S3 +{%- else %}- uses: actions/upload-artifact@v2 + name: Store Core dumps on Github +{%- endif %} + if: failure() + with: +{%- if config != "" and shard != "" %} + name: !{{ artifact_name }}-!{{ config }}-!{{ shard }} +{%- else %} + name: !{{ artifact_name }} +{%- endif %} + retention-days: 14 + if-no-files-found: ignore + path: + ./**/core.[1-9]* +{%- endmacro -%} + {%- macro render_test_results() -%} - name: Install render_test_results dependencies if: always() diff --git a/.github/templates/common_android.yml.j2 b/.github/templates/common_android.yml.j2 deleted file mode 100644 index a0e4e781b6adf0..00000000000000 --- a/.github/templates/common_android.yml.j2 +++ /dev/null @@ -1,81 +0,0 @@ -{% import 'common.yml.j2' as common %} - -{%- macro upload_androind_binary_size(build_type, artifacts) -%} - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - AWS_DEFAULT_REGION: us-east-1 - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - # The artifact file is created inside docker container, which contains the result binaries. - # Now unpackage it into the project folder. The subsequent script will scan project folder - # to locate result binaries and report their sizes. - # If artifact file is not provided it assumes that the project folder has been mounted in - # the docker during build and already contains the result binaries, so this step can be skipped. 
- export ARTIFACTS=!{{ artifacts }} - if [ -n "${ARTIFACTS}" ]; then - tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" - cd "${GITHUB_WORKSPACE}" - fi - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - ANDROID_BUILD_TYPE=!{{ build_type}} - export ANDROID_BUILD_TYPE - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 -{%- endmacro -%} - -{%- macro build_android(env_name, container_suffix) -%} - - name: Build-!{{ container_suffix }} - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - #!/bin/bash -eo pipefail - # Pull Docker image and run build - time docker pull "${DOCKER_IMAGE}" >/dev/null - echo "${DOCKER_IMAGE}" - export container_name - container_name=$(docker run \ - -e BUILD_ENVIRONMENT=!{{ env_name }} \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace" - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "${container_name}" bash) 2>&1 - - # Copy dist folder back - export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-!{{ container_suffix }} - docker cp "${container_name}:/var/lib/jenkins/workspace/dist" "${GITHUB_WORKSPACE}/." 
|| echo "Dist folder not found" - docker commit "${container_name}" "${COMMIT_DOCKER_IMAGE}" - time docker push "${COMMIT_DOCKER_IMAGE}" -{%- endmacro -%} diff --git a/.github/templates/docker_builds_ci_workflow.yml.j2 b/.github/templates/docker_builds_ci_workflow.yml.j2 deleted file mode 100644 index 224f683a35a47b..00000000000000 --- a/.github/templates/docker_builds_ci_workflow.yml.j2 +++ /dev/null @@ -1,60 +0,0 @@ -{% import 'common.yml.j2' as common %} - -{%- block name -%} -# Template is at: .github/templates/docker_builds_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: - workflow_dispatch: - pull_request: - types: [opened, synchronize, reopened] - paths: - - '.circleci/docker/**' - - '.github/workflows/generated-docker-builds.yml' -{%- if is_scheduled %} - schedule: - - cron: !{{ is_scheduled }} -{%- endif %} -!{{ common.concurrency(build_environment) }} - -env: - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - AWS_DEFAULT_REGION: us-east-1 - -jobs: -{% block docker_build +%} - docker-build: - runs-on: linux.2xlarge - timeout-minutes: !{{ common.timeout_minutes }} - strategy: - matrix: - include: - {%- for docker_image in docker_images %} - - docker_image_base: '!{{ docker_image }}' - docker_image_short_name: '!{{ docker_image.split('/')[-1] }}' - {%- endfor %} - env: - DOCKER_IMAGE_BASE: '${{ matrix.docker_image_base }}' - name: docker-build (${{ matrix.docker_image_short_name }}) - steps: - !{{ common.setup_ec2_linux() }} - !{{ common.checkout() }} - !{{ common.calculate_docker_image(true) }} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" - !{{ common.parse_ref() }} - !{{ common.teardown_ec2_linux() }} - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af -{%- endblock %} diff --git a/.github/templates/ios_ci_workflow.yml.j2 b/.github/templates/ios_ci_workflow.yml.j2 deleted file mode 100644 index 0dd6cbbfff8a3a..00000000000000 --- a/.github/templates/ios_ci_workflow.yml.j2 +++ /dev/null @@ -1,184 +0,0 @@ -{% import 'common.yml.j2' as common %} - -{%- block name -%} -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: -{%- if is_default %} - pull_request: -{%- endif %} -{%- if is_scheduled %} - schedule: - - cron: !{{ is_scheduled }} -{%- endif %} - push: -{%- if not is_scheduled %} - branches: - - master - - main - - release/* -{%- endif %} -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first %} - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: !{{ build_environment }} - IN_CI: 1 - IS_GHA: 1 - IOS_PLATFORM: !{{ ios_platform }} - IOS_ARCH: !{{ ios_arch }} -!{{ common.set_xcode_version(xcode_version) }} - -jobs: -{% block build +%} - build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || 
github.event.pull_request.head.repo.full_name == github.repository }} - runs-on: macos-10.15 - timeout-minutes: !{{ common.timeout_minutes }} - env: - JOB_BASE_NAME: !{{ build_environment }}-build - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - !{{ common.checkout() }} - - name: Populate CI build options - run: | - # Most builds use the lite interpreter, if certain builds shouldn't - # build the lite interpreter this env variable should get over-written - # in the following case statement - echo "BUILD_LITE_INTERPRETER=1" >> "${GITHUB_ENV}" - - case ${BUILD_ENVIRONMENT} in - *metal*) - echo "USE_PYTORCH_METAL=1" >> "${GITHUB_ENV}" - ;; - *full_jit*) - echo "BUILD_LITE_INTERPRETER=0" >> "${GITHUB_ENV}" - ;; - *custom*) - echo "SELECTED_OP_LIST=${GITHUB_WORKSPACE}/ios/TestApp/custom_build/mobilenetv2.yaml" >> "${GITHUB_ENV}" - ;; - *coreml*) - echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" - ;; - esac - - name: Install brew dependencies - run: | - # Install dependencies - brew install libtool - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions - - name: Run Fastlane - run: | - set -x - cd ios/TestApp - # install fastlane - sudo gem install bundler && bundle install - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - export TCLLIBPATH="/usr/local/lib" - python -VV - export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} - scripts/build_ios.sh - - name: Run Build Test - run: | - PROFILE=PyTorch_CI_2022 - # run the ruby build script - if ! [ -x "$(command -v xcodebuild)" ]; then - echo 'Error: xcodebuild is not installed.' 
- exit 1 - fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi -{%- if ios_platform == "SIMULATOR" %} - - name: Run Simulator Tests - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html - # generate models for differnet backends - cd "${GITHUB_WORKSPACE}/ios/TestApp/benchmark" - mkdir -p ../models - if [ "${USE_COREML_DELEGATE}" == 1 ]; then - pip install coremltools==5.0b5 - pip install six==1.16.0 - python coreml_backend.py - else - python trace_model.py - fi - if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then - echo "Setting up the TestApp for LiteInterpreter" - ruby setup.rb --lite 1 - else - echo "Setting up the TestApp for Full JIT" - ruby setup.rb - fi - cd "${GITHUB_WORKSPACE}/ios/TestApp" - instruments -s -devices - if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then - if [ "${USE_COREML_DELEGATE}" == 1 ]; then - fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML - else - fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter - fi - else - fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT - fi -{%- endif -%} -{% endblock +%} - -!{{ common.concurrency(build_environment) }} diff --git a/.github/templates/linux_binary_build_workflow.yml.j2 b/.github/templates/linux_binary_build_workflow.yml.j2 index 97f2795f4f6405..f10b39a72ced0e 100644 --- a/.github/templates/linux_binary_build_workflow.yml.j2 +++ b/.github/templates/linux_binary_build_workflow.yml.j2 @@ -9,17 +9,22 @@ name: !{{ build_environment }} on: push: + {%- if branches == "nightly" %} # NOTE: Meta Employees can trigger new nightlies using: https://fburl.com/trigger_pytorch_nightly_build + {%- endif %} branches: - - nightly + - !{{ branches }} + {%- if branches == "nightly" %} tags: # NOTE: Binary build pipelines should only get triggered on release candidate builds # Release candidate tags look like: v1.11.0-rc1 - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ + {%- endif %} {%- for label in ciflow_config.labels | sort %} - {%- if label != "ciflow/default" %} + {%- if loop.first and branches != "nightly" %} + tags: + {%- endif %} - '!{{ label }}/*' - {%- endif %} {%- endfor %} workflow_dispatch: @@ -114,7 +119,7 @@ jobs: !{{ upload.binary_env(config) }} steps: !{{ common.setup_ec2_linux() }} - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: !{{ config["build_name"] }} @@ -172,5 +177,7 @@ jobs: docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" !{{ common.teardown_ec2_linux("pytorch/") }} + {%- if branches == "nightly" %} !{{ upload.upload_binaries(config) }} + {%- endif %} {%- endfor %} diff --git a/.github/templates/linux_ci_workflow.yml.j2 b/.github/templates/linux_ci_workflow.yml.j2 deleted file mode 100644 index 7bbdfe04b3f6e0..00000000000000 --- a/.github/templates/linux_ci_workflow.yml.j2 +++ /dev/null @@ -1,446 +0,0 @@ -{% import 'common.yml.j2' as common %} - -{%- block name -%} -# Template is at: 
.github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: -{%- if on_pull_request %} - pull_request: -{%- endif %} - push: -{%- if enable_doc_jobs and is_scheduled %} - tags: - # NOTE: Binary build pipelines should only get triggered on release candidate builds - # Release candidate tags look like: v1.11.0-rc1 - - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ -{%- endif %} -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first and not (enable_doc_jobs and is_scheduled) %} - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} -{%- if not is_scheduled %} - branches: - - master - - main - - release/* -{%- endif %} -{%- if is_scheduled %} - schedule: - - cron: !{{ is_scheduled }} -{%- endif %} - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: !{{ build_environment }} - DOCKER_IMAGE_BASE: !{{ docker_image_base }} - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -{%- if enable_xla_test == 1 %} - # This is used for XLA tests only - XLA_CUDA: 0 - XLA_IMAGE_TAG: v0.2 -{%- endif %} -{%- if build_with_debug %} - DEBUG: 1 -{%- endif %} -!{{ common.concurrency(build_environment) }} - -jobs: -{% block build +%} - build: - runs-on: linux.2xlarge - timeout-minutes: !{{ common.timeout_minutes }} - env: - JOB_BASE_NAME: !{{ build_environment }}-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - !{{ common.setup_ec2_linux() }} - !{{ common.checkout() }} - {%- if enable_xla_test == 1 %} - - name: Calculate docker image tag - id: calculate-tag - run: | - echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}" - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" - {%- else %} - !{{ common.calculate_docker_image(false) }} - {%- endif %} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" - !{{ common.parse_ref() }} - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - {%- if enable_xla_test == 1 %} - -e XLA_CUDA \ - {%- endif %} - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e 
CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - {%- if build_generates_artifacts %} - - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: !{{ common.upload_artifact_s3_action }} - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - {%- endif %} - !{{ common.teardown_ec2_linux() }} - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af -{%- endblock %} -{%- if not exclude_test %} -{% block test +%} - {%- for test_job in test_jobs %} - !{{ test_job.id }}: - name: !{{ test_job.name }} - needs: build - runs-on: !{{ test_job.runner }} - timeout-minutes: !{{ timeout_after + 30 }} - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: !{{ build_environment }}-test - TEST_CONFIG: !{{ test_job.config }} - SHARD_NUMBER: !{{ test_job.shard }} - NUM_TEST_SHARDS: !{{ test_job.num_shards }} - PR_BODY: ${{ github.event.pull_request.body }} - steps: -{%- if 'rocm' in test_runner_type %} - !{{ common.setup_rocm_linux() }} -{%- else %} - !{{ common.setup_ec2_linux() }} -{%- endif %} - !{{ common.checkout() }} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" -{%- if 'rocm' in test_runner_type and "nogpu" not in test_job.config %} - - name: ROCm set GPU_FLAG - run: | - echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" -{%- elif "cuda" in build_environment and "nogpu" not in test_job.config %} - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo 
"GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" -{%- endif %} - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | -{%- if 'rocm' in test_runner_type %} - df -H -{%- else %} - sudo df -H -{%- endif %} - !{{ common.parse_ref() }} - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after !{{ timeout_after }} minutes - timeout-minutes: !{{ timeout_after }} - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi -{%- if 'rocm' not in test_runner_type %} - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=!{{ common.squid_proxy }} -e https_proxy=!{{ common.squid_proxy }} -e no_proxy=!{{ common.squid_no_proxy }}" - fi -{%- endif %} - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - {%- if enable_xla_test == 1 %} - -e XLA_CUDA \ - {%- endif %} - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ -{%- if 'rocm' not in test_runner_type %} - ${PROXY_ENV} \ -{%- endif %} - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ -{%- if 'rocm' not in test_runner_type %} - --ipc=host \ -{%- endif %} - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) -{%- if 'rocm' in test_runner_type %} - # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home - docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" - # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct - docker exec -t "${container_name}" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" -{%- else %} - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" -{%- endif %} -{%- if 'rocm' not in test_runner_type %} - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . -{%- endif %} - !{{ common.render_test_results() }} -{%- if 'rocm' in test_runner_type %} - !{{ common.upload_downloaded_files(name='linux', use_s3=False, config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner) }} - !{{ common.upload_test_reports(name='linux', artifact_name="test-reports", use_s3=False, config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner) }} -{%- else %} - !{{ common.upload_downloaded_files(name='linux', config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner) }} - !{{ common.upload_test_reports(name='linux', config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner) }} -{%- endif %} - !{{ common.upload_test_statistics(build_environment) }} -{%- if 'rocm' in test_runner_type %} - !{{ common.teardown_rocm_linux() }} -{%- else %} - !{{ common.teardown_ec2_linux() }} -{%- endif %} -{%- endfor %} -{% endblock %} -{%- endif -%} -{%- if enable_doc_jobs %} - build-docs: - runs-on: linux.2xlarge - timeout-minutes: !{{ common.timeout_minutes }} - strategy: - matrix: - docs_type: [cpp, python] - needs: [build] - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - DOCS_TYPE: ${{ matrix.docs_type }} - WITH_PUSH: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} - steps: - !{{ common.setup_ec2_linux() }} - !{{ common.checkout() }} - - name: Pull Docker image - run: | - !{{ common.add_retry_to_env() }} - retry docker pull "${DOCKER_IMAGE}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip -{%- if is_scheduled %} - - name: Generate netrc (only for docs-push) - if: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} - env: - GITHUB_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} - run: | - # set credentials for https pushing - echo "machine github.com" > "${RUNNER_TEMP}/.netrc" - echo "login pytorchbot" >> "${RUNNER_TEMP}/.netrc" - echo "password ${GITHUB_PYTORCHBOT_TOKEN}" >> "${RUNNER_TEMP}/.netrc" -{%- endif %} - - name: Build ${{ matrix.docs_type }} docs - run: | - set -ex - time docker pull "${DOCKER_IMAGE}" > /dev/null - # Convert refs/tags/v1.12.0rc3 into 1.12 - if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then - target="${BASH_REMATCH[1]}" - else - target="master" - fi - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e IN_CI \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SHA1="$GITHUB_SHA" \ - -e DOCS_VERSION="${target}" \ - -e DOCS_TYPE 
\ - -e PR_LABELS \ - -e WITH_PUSH \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ -{%- if is_scheduled %} - -v "${RUNNER_TEMP}/.netrc":/var/lib/jenkins/.netrc \ -{%- endif %} - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh" - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: !{{ common.upload_artifact_s3_action }} - name: Upload Python Docs Preview - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }} - with: - retention-days: 14 - s3-bucket: doc-previews - if-no-files-found: error - path: pytorch.github.io/docs/master/ - s3-prefix: pytorch/${{ github.event.pull_request.number }} - - uses: !{{ common.upload_artifact_s3_action }} - name: Upload C++ Docs Preview - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }} - with: - retention-days: 14 - if-no-files-found: error - s3-bucket: doc-previews - path: cppdocs/ - s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs -{%- endif -%} diff --git a/.github/templates/macos_binary_build_workflow.yml.j2 b/.github/templates/macos_binary_build_workflow.yml.j2 index 10b0f6310d2a66..e788e608619008 100644 --- a/.github/templates/macos_binary_build_workflow.yml.j2 +++ b/.github/templates/macos_binary_build_workflow.yml.j2 @@ -33,9 +33,10 @@ on: # Release candidate tags look like: v1.11.0-rc1 - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ {%- for label in ciflow_config.labels | sort %} - {%- if label != "ciflow/default" %} + {%- if loop.first and branches != "nightly" %} + tags: + {%- endif %} - '!{{ label }}/*' - {%- endif %} {%- endfor %} workflow_dispatch: @@ -59,6 +60,7 @@ env: jobs: {%- for config in build_configs %} !{{ config["build_name"] }}-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 {%- if config["package_type"] == "libtorch" %} # libtorch builds take a long time on github hosted runners diff --git a/.github/templates/macos_ci_workflow.yml.j2 b/.github/templates/macos_ci_workflow.yml.j2 deleted file mode 100644 index 47fa86fac54b05..00000000000000 --- a/.github/templates/macos_ci_workflow.yml.j2 +++ /dev/null @@ -1,131 +0,0 @@ -{% import 'common.yml.j2' as common %} - -{%- block name -%} -# Template is at: .github/templates/macos_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} -{%- endblock %} - -on: -{%- if is_default -%} - pull_request: -{%- endif -%} - -{%- if is_scheduled %} - schedule: - - cron: !{{ is_scheduled }} -{%- else %} - push: - branches: - - master - - main - - release/* -{%- endif %} -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first %} - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} - workflow_dispatch: - -# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 -defaults: - run: - shell: bash -e -l {0} -env: - BUILD_ENVIRONMENT: !{{ build_environment }} - COMPACT_JOB_NAME: !{{ build_environment }} - IN_CI: 1 - IS_GHA: 1 - PYTORCH_RETRY_TEST_CASES: 1 -!{{ common.set_xcode_version(xcode_version) }} - 
-jobs: -{% block build +%} - build: - runs-on: !{{ test_runner_type }} - env: - JOB_BASE_NAME: !{{ build_environment }} - # For sccache access (only on non-forked PRs) - AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - !{{ common.checkout() }} - !{{ common.setup_miniconda("3.8") }} - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - - name: Install sccache (only for non-forked PRs, and pushes to trunk) - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - - name: Build - run: | - echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" - .jenkins/pytorch/macos-build.sh -{%- if build_generates_artifacts %} - - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ - - uses: actions/upload-artifact@v2 - name: Store PyTorch Build Artifacts on GHA - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip -{%- endif %} -{% endblock +%} -{%- if not exclude_test %} -{% block test +%} - {%- for test_job in test_jobs %} - !{{ test_job.id }}: - name: !{{ test_job.name }} - needs: build - runs-on: !{{ test_job.runner }} - timeout-minutes: !{{ common.timeout_minutes }} - env: - JOB_BASE_NAME: !{{ build_environment }}-test - TEST_CONFIG: !{{ test_job.config }} - SHARD_NUMBER: !{{ test_job.shard }} - NUM_TEST_SHARDS: !{{ test_job.num_shards }} - PR_BODY: ${{ github.event.pull_request.body }} - steps: - !{{ common.checkout(submodules="false") }} - - uses: actions/download-artifact@v2 - name: Download PyTorch Build Artifacts from GHA - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: . 
- - name: Unzip artifacts - run: | - unzip -o artifacts.zip - !{{ common.setup_miniconda("3.8") }} - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - !{{ common.parse_ref() }} - - name: Test - run: | - python3 -mpip install dist/*.whl - .jenkins/pytorch/macos-test.sh - !{{ common.render_test_results() }} - !{{ common.upload_downloaded_files(name='macos', config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner, artifact_name="test-jsons", use_s3=False) }} - !{{ common.upload_test_reports("macos", config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner, artifact_name="test-reports", use_s3=False) }} - !{{ common.upload_test_statistics(build_environment, needs_credentials=True) }} -{%- endfor %} -{% endblock +%} -{%- endif %} - -!{{ common.concurrency(build_environment) }} diff --git a/.github/templates/upload.yml.j2 b/.github/templates/upload.yml.j2 index 4c680eea47d714..63bec412997e27 100644 --- a/.github/templates/upload.yml.j2 +++ b/.github/templates/upload.yml.j2 @@ -52,7 +52,7 @@ - name: Clone pytorch/pytorch uses: actions/checkout@v2 {%- if use_s3 %} - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 {%- else %} - uses: actions/download-artifact@v2 {%- endif %} diff --git a/.github/templates/windows_binary_build_workflow.yml.j2 b/.github/templates/windows_binary_build_workflow.yml.j2 index df018fc43919bf..0fcfbf9096b805 100644 --- a/.github/templates/windows_binary_build_workflow.yml.j2 +++ b/.github/templates/windows_binary_build_workflow.yml.j2 @@ -21,17 +21,22 @@ name: !{{ build_environment }} on: push: + {%- if branches == "nightly" %} # NOTE: Meta Employees can trigger new nightlies using: https://fburl.com/trigger_pytorch_nightly_build + {%- endif %} branches: - - nightly + - !{{ branches }} + {%- if branches == "nightly" %} tags: # NOTE: Binary build pipelines should only get triggered on release candidate builds # Release candidate tags look like: v1.11.0-rc1 - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ + {%- endif %} {%- for label in ciflow_config.labels | sort %} - {%- if label != "ciflow/default" %} + {%- if loop.first and branches != "nightly" %} + tags: + {%- endif %} - '!{{ label }}/*' - {%- endif %} {%- endfor %} workflow_dispatch: @@ -54,6 +59,7 @@ env: jobs: {%- for config in build_configs %} !{{ config["build_name"] }}-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: !{{ common.timeout_minutes }} !{{ upload.binary_env(config, True) }} @@ -91,7 +97,7 @@ jobs: steps: !{{ common.setup_ec2_windows() }} !{{ set_runner_specific_vars() }} - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: !{{ config["build_name"] }} @@ -107,5 +113,7 @@ jobs: run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" !{{ common.wait_and_kill_ssh_windows('pytorch') }} + {%- if branches == "nightly" %} !{{ upload.upload_binaries(config, True) }} + {%- endif %} {%- endfor %} diff --git a/.github/templates/windows_ci_workflow.yml.j2 b/.github/templates/windows_ci_workflow.yml.j2 deleted file mode 100644 index af1561343a9b05..00000000000000 --- a/.github/templates/windows_ci_workflow.yml.j2 +++ /dev/null @@ -1,208 +0,0 @@ -{% import 'common.yml.j2' as common %} - -{%- macro wait_and_kill_ssh() -%} - - name: Wait until 
all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 -{%- endmacro -%} - -# Template is at: .github/templates/windows_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: !{{ build_environment }} - -on: -{%- if on_pull_request %} - pull_request: -{%- endif %} - push: -{%- for label in ciflow_config.labels | sort %} - {%- if loop.first %} - tags: - {%- endif %} - {%- if label != "ciflow/default" %} - - '!{{ label }}/*' - {%- endif %} -{%- endfor %} -{%- if not is_scheduled %} - branches: - - master - - main - - release/* -{%- endif %} -{%- if is_scheduled %} - schedule: - - cron: !{{ is_scheduled }} -{%- endif %} - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: !{{ build_environment }} - BUILD_WHEEL: 1 - MAX_JOBS: 8 - CUDA_VERSION: "!{{ cuda_version }}" - IN_CI: 1 - IS_GHA: 1 - INSTALL_WINDOWS_SDK: 1 - PYTHON_VERSION: "3.8" - PYTORCH_RETRY_TEST_CASES: 1 - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - SCCACHE_BUCKET: "ossci-compiler-cache" - VC_PRODUCT: "BuildTools" - VC_VERSION: "" - VS_VERSION: "16.8.6" - VC_YEAR: "2019" - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - no_proxy: !{{ common.squid_no_proxy }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} -{%- if build_with_debug %} - DEBUG: 1 -{%- endif %} -{%- if cuda_version != "cpu" %} - TORCH_CUDA_ARCH_LIST: "7.0" -{%- endif %} - USE_CUDA: !{{ 1 if cuda_version != "cpu" else 0 }} - -!{{ common.concurrency(build_environment) }} - -jobs: - build: - runs-on: "windows.4xlarge" - timeout-minutes: !{{ common.timeout_minutes }} - env: - JOB_BASE_NAME: !{{ build_environment }}-build - http_proxy: "!{{ common. 
squid_proxy }}" - https_proxy: "!{{ common.squid_proxy }}" - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - !{{ common.checkout() }} - !{{ common.display_ec2_information() }} - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 -{%- if cuda_version != "cpu" %} - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh -{%- endif %} - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - !{{ common.parse_ref() }} - - name: Build - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - .jenkins/pytorch/win-build.sh - # Upload to github so that people can click and download artifacts - - name: Upload artifacts to s3 - uses: !{{ common.upload_artifact_s3_action }} - with: - retention-days: 14 - if-no-files-found: error - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - !{{ common.wait_and_kill_ssh_windows() }} - - name: Cleanup build-results and workspaces - if: always() - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}" - rm -rf ./* - - {%- for test_job in test_jobs %} - !{{ test_job.id }}: - name: !{{ test_job.name }} - timeout-minutes: !{{ timeout_after + 30 }} - env: - JOB_BASE_NAME: !{{ build_environment }}-test - SHARD_NUMBER: !{{ test_job.shard }} - NUM_TEST_SHARDS: !{{ test_job.num_shards }} - TEST_CONFIG: !{{ test_job.config }} - http_proxy: "!{{ common.squid_proxy }}" - https_proxy: "!{{ common.squid_proxy }}" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: !{{ test_job.runner }} - steps: - !{{ common.display_ec2_information() }} - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - !{{ common.checkout() }} - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 -{%- if cuda_version != "cpu" and not test_job.config == 'force_on_cpu' %} - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh -{%- endif %} - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after !{{ timeout_after }} minutes - timeout-minutes: !{{ timeout_after }} - run: | - .jenkins/pytorch/win-test.sh - !{{ common.upload_downloaded_files(name='windows', config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, 
runner=test_job.runner) }} - !{{ common.upload_test_reports(name='windows', config=test_job.config, shard=test_job.shard, num_shards=test_job.num_shards, runner=test_job.runner) }} - !{{ common.render_test_results() }} - !{{ common.wait_and_kill_ssh_windows() }} - !{{ common.parse_ref() }} - !{{ common.upload_test_statistics(build_environment) }} - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* - {%- endfor %} diff --git a/.github/workflows/_android-build-test.yml b/.github/workflows/_android-build-test.yml new file mode 100644 index 00000000000000..a489d7d7e002d4 --- /dev/null +++ b/.github/workflows/_android-build-test.yml @@ -0,0 +1,150 @@ +name: android-build-test + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + docker-image-name: + required: true + type: string + description: Name of the base docker image to build with. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + build-and-test: + # Don't run on forked repos. + if: github.repository_owner == 'pytorch' + runs-on: [self-hosted, linux.2xlarge] + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Calculate docker image + id: calculate-docker-image + uses: ./.github/actions/calculate-docker-image + with: + docker-image-name: ${{ inputs.docker-image-name }} + xla: ${{ contains(inputs.build-environment, 'xla') }} + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + + - name: Output disk space left + run: | + sudo df -H + + - name: Preserve github env variables for use in docker + run: | + env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" + + - name: Build + env: + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-build-and-test + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + TORCH_CUDA_ARCH_LIST: 5.2 + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }} + run: | + set -e + # Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and share via docker, because: + # 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build; + # 2) Not parallelizable by architecture: it only builds libtorch for one architecture; + + echo "DOCKER_IMAGE: ${DOCKER_IMAGE}" + time docker pull "${DOCKER_IMAGE}" >/dev/null + + export BUILD_LITE_INTERPRETER + BUILD_LITE_INTERPRETER="1" + if [[ "${BUILD_ENVIRONMENT}" == *"full-jit" ]]; then + BUILD_LITE_INTERPRETER="0" + fi + + git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 + export id + id=$(docker run -e BUILD_ENVIRONMENT \ + -e JOB_BASE_NAME \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e SCCACHE_BUCKET \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e PR_LABELS \ + -e 
SKIP_SCCACHE_INITIALIZATION=1 \ + -e TORCH_CUDA_ARCH_LIST \ + -e BUILD_LITE_INTERPRETER \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --tty \ + --detach \ + --user jenkins \ + -v "$(pwd):/var/lib/jenkins/workspace" \ + --cap-add=SYS_PTRACE \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -t -d -w /var/lib/jenkins "${DOCKER_IMAGE}") + + export COMMAND + # shellcheck disable=SC2016 + COMMAND='(echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh" | docker exec -u jenkins -e BUILD_LITE_INTERPRETER -e GRADLE_OFFLINE=1 -i "$id" bash) 2>&1' + echo "${COMMAND}" > ./command.sh && bash ./command.sh + # Skip docker push as this job is purely for size analysis purpose. + # Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied. + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Display and upload binary build size statistics (Click Me) + # temporary hack: set CIRCLE_* vars, until we update + # tools/stats/print_test_stats.py to natively support GitHub Actions + env: + AWS_DEFAULT_REGION: us-east-1 + BRANCH: ${{ steps.parse-ref.outputs.branch }} + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + ARTIFACTS: "" + ANDROID_BUILD_TYPE: custom-build-single + run: | + # The artifact file is created inside docker container, which contains the result binaries. + # Now unpackage it into the project folder. The subsequent script will scan project folder + # to locate result binaries and report their sizes. + # If artifact file is not provided it assumes that the project folder has been mounted in + # the docker during build and already contains the result binaries, so this step can be skipped. + if [ -n "${ARTIFACTS}" ]; then + tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" + cd "${GITHUB_WORKSPACE}" + fi + COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) + export COMMIT_TIME + pip3 install requests==2.26 boto3==1.16.34 + python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 + + - name: Chown workspace + uses: ./.github/actions/chown-workspace + if: always() + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() diff --git a/.github/workflows/_android-full-build-test.yml b/.github/workflows/_android-full-build-test.yml new file mode 100644 index 00000000000000..d0b8845a662097 --- /dev/null +++ b/.github/workflows/_android-full-build-test.yml @@ -0,0 +1,222 @@ +name: android-full-build-test + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + docker-image-name: + required: true + type: string + description: Name of the base docker image to build with. 
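      # Usage sketch: a caller workflow consumes this reusable workflow via `uses:`,
      # passing the inputs above through `with:` and the secrets declared below through
      # an explicit `secrets:` block. The caller job name and the two example values
      # here are illustrative assumptions, not taken from this diff:
      #
      #   jobs:
      #     android-full-build-test:
      #       uses: ./.github/workflows/_android-full-build-test.yml
      #       with:
      #         build-environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
      #         docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c
      #       secrets:
      #         SONATYPE_NEXUS_USERNAME: ${{ secrets.SONATYPE_NEXUS_USERNAME }}
      #         SONATYPE_NEXUS_PASSWORD: ${{ secrets.SONATYPE_NEXUS_PASSWORD }}
      #         ANDROID_SIGN_KEY: ${{ secrets.ANDROID_SIGN_KEY }}
      #         ANDROID_SIGN_PASS: ${{ secrets.ANDROID_SIGN_PASS }}
      #         SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}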
+ + secrets: + SONATYPE_NEXUS_USERNAME: + description: nexus user + required: true + SONATYPE_NEXUS_PASSWORD: + description: nexus pass + required: true + ANDROID_SIGN_KEY: + description: android key + required: true + ANDROID_SIGN_PASS: + description: android pass + required: true + SCRIBE_GRAPHQL_ACCESS_TOKEN: + description: token for writing to scribe/scuba + required: true + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + build: + # Don't run on forked repos. + if: github.repository_owner == 'pytorch' + runs-on: [self-hosted, linux.2xlarge] + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Calculate docker image + id: calculate-docker-image + uses: ./.github/actions/calculate-docker-image + with: + docker-image-name: ${{ inputs.docker-image-name }} + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + + - name: Output disk space left + shell: bash + run: | + sudo df -H + + - name: Preserve github env variables for use in docker + shell: bash + run: | + env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Build arm-v7a + uses: ./.github/actions/build-android + with: + arch: arm_v7a + arch-for-build-env: arm-v7a + github-secret: ${{ secrets.GITHUB_TOKEN }} + build-environment: ${{ inputs.build-environment }} + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + branch: ${{ steps.parse-ref.outputs.branch }} + + - name: Build arm-v8a + uses: ./.github/actions/build-android + with: + arch: arm_v8a + arch-for-build-env: arm-v8a + github-secret: ${{ secrets.GITHUB_TOKEN }} + build-environment: ${{ inputs.build-environment }} + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + branch: ${{ steps.parse-ref.outputs.branch }} + + - name: Build x86_32 + id: build-x86_32 + uses: ./.github/actions/build-android + with: + arch: x86_32 + arch-for-build-env: x86_32 + github-secret: ${{ secrets.GITHUB_TOKEN }} + build-environment: ${{ inputs.build-environment }} + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + branch: ${{ steps.parse-ref.outputs.branch }} + + - name: Build x86_64 + uses: ./.github/actions/build-android + with: + arch: x86_64 + arch-for-build-env: x86_64 + github-secret: ${{ secrets.GITHUB_TOKEN }} + build-environment: ${{ inputs.build-environment }} + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + branch: ${{ steps.parse-ref.outputs.branch }} + + - name: Build final artifact + env: + BRANCH: ${{ steps.parse-ref.outputs.branch }} + DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }} + AWS_DEFAULT_REGION: us-east-1 + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + ID_X86_32: ${{ steps.build-x86_32.outputs.container_id }} + run: | + set -eux + + # Putting everything together + # 
ID_X86_32 container were created during build-x86_32 step + docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" "${ID_X86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v7a" + docker cp "${GITHUB_WORKSPACE}/build_android_install_x86_64" "${ID_X86_32}:/var/lib/jenkins/workspace/build_android_install_x86_64" + docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" "${ID_X86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v8a" + docker cp "${GITHUB_WORKSPACE}/build_android_install_x86_32" "${ID_X86_32}:/var/lib/jenkins/workspace/build_android_install_x86_32" + + # run gradle buildRelease + (echo "./.circleci/scripts/build_android_gradle.sh" | docker exec \ + -e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build" \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e AWS_DEFAULT_REGION \ + -e IS_GHA \ + -e PR_NUMBER \ + -e SHA1 \ + -e BRANCH \ + -e GITHUB_RUN_ID \ + -e SCCACHE_BUCKET \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e SKIP_SCCACHE_INITIALIZATION=1 \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --user jenkins \ + -u jenkins -i "${ID_X86_32}" bash) 2>&1 + + mkdir -p "${GITHUB_WORKSPACE}/build_android_artifacts" + docker cp "${ID_X86_32}:/var/lib/jenkins/workspace/android/artifacts.tgz" "${GITHUB_WORKSPACE}/build_android_artifacts/" + + - name: Display and upload binary build size statistics (Click Me) + # temporary hack: set CIRCLE_* vars, until we update + # tools/stats/print_test_stats.py to natively support GitHub Actions + env: + AWS_DEFAULT_REGION: us-east-1 + BRANCH: ${{ steps.parse-ref.outputs.branch }} + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + ANDROID_BUILD_TYPE: prebuilt + SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} + run: | + # The artifact file is created inside docker container, which contains the result binaries. + # Now unpackage it into the project folder. The subsequent script will scan project folder + # to locate result binaries and report their sizes. + # If artifact file is not provided it assumes that the project folder has been mounted in + # the docker during build and already contains the result binaries, so this step can be skipped. 
+ export ARTIFACTS=${GITHUB_WORKSPACE}/build_android_artifacts/artifacts.tgz + if [ -n "${ARTIFACTS}" ]; then + tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" + cd "${GITHUB_WORKSPACE}" + fi + COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) + export COMMIT_TIME + pip3 install requests==2.26 boto3==1.16.34 + python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 + + - name: Publish android snapshot + if: ${{ github.event_name == 'push' && github.event.ref == 'refs/heads/nightly' }} + env: + SONATYPE_NEXUS_USERNAME: ${{ secrets.SONATYPE_NEXUS_USERNAME }} + SONATYPE_NEXUS_PASSWORD: ${{ secrets.SONATYPE_NEXUS_PASSWORD }} + ANDROID_SIGN_KEY: ${{ secrets.ANDROID_SIGN_KEY }} + ANDROID_SIGN_PASS: ${{ secrets.ANDROID_SIGN_PASS }} + ID_X86_32: ${{ steps.build-x86_32.outputs.container_id }} + run: | + set -eux + (echo "./.circleci/scripts/publish_android_snapshot.sh" | docker exec \ + -e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot" \ + -e SONATYPE_NEXUS_USERNAME \ + -e SONATYPE_NEXUS_PASSWORD \ + -e ANDROID_SIGN_KEY \ + -e ANDROID_SIGN_PASS \ + -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ + -u jenkins -i "${ID_X86_32}" bash) 2>&1 + + - name: Store PyTorch Android Build Artifacts on S3 + uses: seemethere/upload-artifact-s3@v4 + with: + name: ${{ inputs.build-environment }} + retention-days: 14 + if-no-files-found: error + path: build_android_artifacts/artifacts.tgz + + - name: Chown workspace + uses: ./.github/actions/chown-workspace + if: always() + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() diff --git a/.github/workflows/_bazel-build-test.yml b/.github/workflows/_bazel-build-test.yml new file mode 100644 index 00000000000000..57a5d47af3c25c --- /dev/null +++ b/.github/workflows/_bazel-build-test.yml @@ -0,0 +1,185 @@ +name: bazel + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + docker-image-name: + required: true + type: string + description: Name of the base docker image to build with. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + build-and-test: + # Don't run on forked repos. 
+ if: github.repository_owner == 'pytorch' + runs-on: [self-hosted, linux.2xlarge] + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Calculate docker image + id: calculate-docker-image + uses: ./.github/actions/calculate-docker-image + with: + docker-image-name: ${{ inputs.docker-image-name }} + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + + - name: Output disk space left + run: | + sudo df -H + + - name: Preserve github env variables for use in docker + run: | + env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Build + env: + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-build-and-test + # TODO duplicated + AWS_DEFAULT_REGION: us-east-1 + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + TORCH_CUDA_ARCH_LIST: 5.2 + DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }} + run: | + # detached container should get cleaned up by teardown_ec2_linux + container_name=$(docker run \ + -e BUILD_ENVIRONMENT \ + -e JOB_BASE_NAME \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e SCCACHE_BUCKET \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e PR_LABELS \ + -e SKIP_SCCACHE_INITIALIZATION=1 \ + -e TORCH_CUDA_ARCH_LIST \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --tty \ + --detach \ + --user jenkins \ + -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . 
&& sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh' + + # !{{ common_android.upload_androind_binary_size("", "")}} + - name: Test + # Time out the test phase after 3.5 hours + timeout-minutes: 210 + env: + JOB_BASE_NAME: ${{ inputs.build-environment }}-build-and-test + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + PYTORCH_RETRY_TEST_CASES: 1 + PR_BODY: ${{ github.event.pull_request.body }} + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }} + run: | + # detached container should get cleaned up by teardown_ec2_linux + export SHARD_NUMBER=0 + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + export COMMIT_MESSAGES + # TODO: Stop building test binaries as part of the build phase + # Make sure we copy test results from bazel-testlogs symlink to + # a regular directory ./test/test-reports + container_name=$(docker run \ + -e BUILD_ENVIRONMENT \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e GITHUB_ACTIONS \ + -e GIT_DEFAULT_BRANCH="$GIT_DEFAULT_BRANCH" \ + -e IN_CI \ + -e SHARD_NUMBER \ + -e NUM_TEST_SHARDS \ + -e JOB_BASE_NAME \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e SCCACHE_BUCKET \ + -e PR_LABELS \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --shm-size="1g" \ + --tty \ + --detach \ + --user jenkins \ + -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . 
&& sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports' + + - name: Chown workspace + uses: ./.github/actions/chown-workspace + if: always() + + - name: Get workflow job id + id: get-job-id + uses: pytorch/pytorch/.github/actions/get-workflow-job-id@master + if: always() + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + + - name: Upload test artifacts + uses: ./.github/actions/upload-test-artifacts + if: always() + with: + file-suffix: bazel-${{ github.job }}_${{ steps.get-job-id.outputs.job-id }} + + - name: Upload test statistics + if: always() + env: + AWS_DEFAULT_REGION: us-east-1 + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GHA_WORKFLOW_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} + shell: bash + run: | + set -x + python3 -m pip install -r requirements.txt + python3 -m pip install boto3==1.19.12 + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() diff --git a/.github/workflows/_docs.yml b/.github/workflows/_docs.yml new file mode 100644 index 00000000000000..96ed63cbb0f6a4 --- /dev/null +++ b/.github/workflows/_docs.yml @@ -0,0 +1,132 @@ +name: build docs + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + docker-image: + required: true + type: string + description: Docker image to run in. + push: + required: false + type: boolean + default: false + description: If set, push the docs to the docs website. + + secrets: + GH_PYTORCHBOT_TOKEN: + required: false + description: Permissions for pushing to the docs site. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + +jobs: + build-docs: + # Don't run on forked repos. 
+ if: github.repository_owner == 'pytorch' + runs-on: [self-hosted, linux.2xlarge] + strategy: + matrix: + docs_type: [cpp, python] + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ inputs.docker-image }} + + - name: Download build artifacts + uses: ./.github/actions/download-build-artifacts + with: + name: ${{ inputs.build-environment }} + + - name: Generate netrc (only for docs-push) + if: inputs.push + env: + GITHUB_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} + run: | + # set credentials for https pushing + echo "machine github.com" > "${RUNNER_TEMP}/.netrc" + echo "login pytorchbot" >> "${RUNNER_TEMP}/.netrc" + echo "password ${GITHUB_PYTORCHBOT_TOKEN}" >> "${RUNNER_TEMP}/.netrc" + + - name: Build ${{ matrix.docs_type }} docs + env: + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + WITH_PUSH: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} + DOCKER_IMAGE: ${{ inputs.docker-image }} + DOCS_TYPE: ${{ matrix.docs_type }} + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + run: | + set -ex + # Convert refs/tags/v1.12.0rc3 into 1.12 + if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then + target="${BASH_REMATCH[1]}" + else + target="master" + fi + # detached container should get cleaned up by teardown_ec2_linux + container_name=$(docker run \ + -e BUILD_ENVIRONMENT \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e IN_CI \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e SHA1="$GITHUB_SHA" \ + -e DOCS_VERSION="${target}" \ + -e DOCS_TYPE \ + -e PR_LABELS \ + -e WITH_PUSH \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --tty \ + --detach \ + --user jenkins \ + -v "${RUNNER_TEMP}/.netrc":/var/lib/jenkins/.netrc \ + -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . 
&& pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh" + + - name: Chown workspace + uses: ./.github/actions/chown-workspace + if: always() + + - name: Upload Python Docs Preview + uses: seemethere/upload-artifact-s3@v4 + if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }} + with: + retention-days: 14 + s3-bucket: doc-previews + if-no-files-found: error + path: pytorch.github.io/docs/master/ + s3-prefix: pytorch/${{ github.event.pull_request.number }} + + - name: Upload C++ Docs Preview + uses: seemethere/upload-artifact-s3@v4 + if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }} + with: + retention-days: 14 + if-no-files-found: error + s3-bucket: doc-previews + path: cppdocs/ + s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs diff --git a/.github/workflows/generated-ios-12-5-1-x86-64.yml b/.github/workflows/_ios-build-test.yml similarity index 79% rename from .github/workflows/generated-ios-12-5-1-x86-64.yml rename to .github/workflows/_ios-build-test.yml index b4e762094b8a3b..fa3b7e2836f8f3 100644 --- a/.github/workflows/generated-ios-12-5-1-x86-64.yml +++ b/.github/workflows/_ios-build-test.yml @@ -1,58 +1,62 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: ios-12-5-1-x86-64 +name: ios-build-test on: - push: - branches: - - master - - main - - release/* - tags: - - 'ciflow/all/*' - - 'ciflow/ios/*' - - 'ciflow/macos/*' - - 'ciflow/trunk/*' - workflow_dispatch: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + ios-platform: + required: true + type: string + description: Which iOS platform to build for. + ios-arch: + required: true + type: string + description: Which iOS arch to build for. 
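      # Usage sketch: a generated caller (for example, a regenerated ios-12-5-1-x86-64
      # workflow) would invoke this reusable workflow roughly as below. The with: values
      # mirror the env block of the old generated workflow in this diff; the caller job
      # name is an illustrative assumption:
      #
      #   jobs:
      #     ios-12-5-1-x86-64:
      #       uses: ./.github/workflows/_ios-build-test.yml
      #       with:
      #         build-environment: ios-12-5-1-x86-64
      #         ios-platform: SIMULATOR
      #         ios-arch: x86_64
      #       secrets:
      #         IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
      #         IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }}
      #         IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }}
      #         IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}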
+ + secrets: + IOS_CERT_KEY_2022: + required: true + description: ios cert + IOS_CERT_SECRET: + required: true + description: ios cert + IOS_DEV_TEAM_ID: + required: true + description: ios cert + IOS_SIGN_KEY_2022: + required: true + description: ios cert env: - BUILD_ENVIRONMENT: ios-12-5-1-x86-64 IN_CI: 1 IS_GHA: 1 - IOS_PLATFORM: SIMULATOR - IOS_ARCH: x86_64 - + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + IOS_PLATFORM: ${{ inputs.ios-platform }} + IOS_ARCH: ${{ inputs.ios-arch }} jobs: - build: # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} + if: github.repository_owner == 'pytorch' runs-on: macos-10.15 timeout-minutes: 240 env: - JOB_BASE_NAME: ios-12-5-1-x86-64-build + JOB_BASE_NAME: ${{ inputs.build-environment }}-build IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} steps: - - name: print labels - run: echo "${PR_LABELS}" + # [see note: pytorch repo ref] - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Populate CI build options run: | # Most builds use the lite interpreter, if certain builds shouldn't @@ -74,10 +78,12 @@ jobs: echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" ;; esac + - name: Install brew dependencies run: | # Install dependencies brew install libtool + - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on @@ -98,6 +104,7 @@ jobs: requests \ setuptools \ typing_extensions + - name: Run Fastlane run: | set -x @@ -118,6 +125,7 @@ jobs: echo "${IOS_SIGN_KEY_2022}" >> cert.txt base64 --decode cert.txt -o ${PROFILE} rm cert.txt + - name: Build run: | # shellcheck disable=SC1091 @@ -126,6 +134,7 @@ jobs: python -VV export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} scripts/build_ios.sh + - name: Run Build Test run: | PROFILE=PyTorch_CI_2022 @@ -139,7 +148,9 @@ jobs: else ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" fi + - name: Run Simulator Tests + if: inputs.ios-platform == 'SIMULATOR' run: | # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" @@ -152,8 +163,10 @@ jobs: pip install six==1.16.0 python coreml_backend.py else - python trace_model.py + cd "${GITHUB_WORKSPACE}" + python test/mobile/model_test/gen_test_model.py ios-test fi + cd "${GITHUB_WORKSPACE}/ios/TestApp/benchmark" if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then echo "Setting up the TestApp for LiteInterpreter" ruby setup.rb --lite 1 @@ -167,12 +180,8 @@ jobs: if [ "${USE_COREML_DELEGATE}" == 1 ]; then fastlane scan --only_testing 
TestAppTests/TestAppTests/testCoreML else - fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter + fastlane scan --skip_testing TestAppTests/TestAppTests/testCoreML fi else fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT fi - -concurrency: - group: ios-12-5-1-x86-64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/_linux-build.yml b/.github/workflows/_linux-build.yml new file mode 100644 index 00000000000000..cf6419f208e2a8 --- /dev/null +++ b/.github/workflows/_linux-build.yml @@ -0,0 +1,158 @@ +name: linux-build + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + docker-image-name: + required: true + type: string + description: Name of the base docker image to build with. + build-generates-artifacts: + required: false + type: boolean + default: true + description: If set, upload generated build artifacts. + build-with-debug: + required: false + type: boolean + default: false + description: If set, build in debug mode. + + outputs: + docker-image: + value: ${{ jobs.build.outputs.docker-image }} + description: The docker image containing the built PyTorch. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + +jobs: + build: + # Don't run on forked repos. + if: github.repository_owner == 'pytorch' + runs-on: [self-hosted, linux.2xlarge] + timeout-minutes: 240 + outputs: + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + steps: + # [pytorch repo ref] + # Use a pytorch/pytorch reference instead of a reference to the local + # checkout because when we run this action we don't *have* a local + # checkout. In other cases you should prefer a local checkout. + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Check for new workflows + run: | + if [ ! -f "./.github/actions/setup-linux/action.yml" ]; then + echo "::error::Your PR is based on a version of master that is too old for our CI to work. Please rebase your PR on latest master and resubmit." 
+ exit 1 + fi + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Calculate docker image + id: calculate-docker-image + uses: ./.github/actions/calculate-docker-image + with: + docker-image-name: ${{ inputs.docker-image-name }} + xla: ${{ contains(inputs.build-environment, 'xla') }} + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Build + env: + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-build + # TODO duplicated + AWS_DEFAULT_REGION: us-east-1 + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + TORCH_CUDA_ARCH_LIST: 5.2 + DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }} + XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }} + DEBUG: ${{ inputs.build-with-debug && '1' || '0' }} + run: | + # detached container should get cleaned up by teardown_ec2_linux + container_name=$(docker run \ + -e BUILD_ENVIRONMENT \ + -e JOB_BASE_NAME \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e AWS_DEFAULT_REGION \ + -e IS_GHA \ + -e PR_NUMBER \ + -e SHA1 \ + -e BRANCH \ + -e GITHUB_RUN_ID \ + -e SCCACHE_BUCKET \ + -e XLA_CUDA \ + -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e SKIP_SCCACHE_INITIALIZATION=1 \ + -e TORCH_CUDA_ARCH_LIST \ + -e PR_LABELS \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --tty \ + --detach \ + --user jenkins \ + -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + docker exec -t "${container_name}" sh -c '.jenkins/pytorch/build.sh' + + - name: Display and upload binary build size statistics (Click Me) + # temporary hack: set CIRCLE_* vars, until we update + # tools/stats/print_test_stats.py to natively support GitHub Actions + env: + BRANCH: ${{ steps.parse-ref.outputs.branch }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + run: | + COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) + export COMMIT_TIME + pip3 install requests==2.26 boto3==1.16.34 + python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 + + - name: Archive artifacts into zip + if: inputs.build-generates-artifacts + run: | + zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json + + - name: Store PyTorch Build Artifacts on S3 + uses: seemethere/upload-artifact-s3@v4 + if: inputs.build-generates-artifacts + with: + name: ${{ inputs.build-environment }} + retention-days: 14 + if-no-files-found: error + path: artifacts.zip + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() diff --git a/.github/workflows/_linux-test.yml b/.github/workflows/_linux-test.yml new file mode 100644 index 00000000000000..8c203b87ebcc5b --- /dev/null +++ 
b/.github/workflows/_linux-test.yml @@ -0,0 +1,193 @@ +name: linux-test + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + test-matrix: + required: true + type: string + description: JSON description of what test configs to run. + docker-image: + required: true + type: string + description: Docker image to run in. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + test: + # Don't run on forked repos. + if: github.repository_owner == 'pytorch' + strategy: + matrix: ${{ fromJSON(inputs.test-matrix) }} + fail-fast: false + runs-on: ${{ matrix.runner }} + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ inputs.docker-image }} + + - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + if: contains(inputs.build-environment, 'cuda') && !contains(matrix.config, 'nogpu') + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + + - name: Download build artifacts + uses: ./.github/actions/download-build-artifacts + with: + name: ${{ inputs.build-environment }} + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Test + env: + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + PYTORCH_RETRY_TEST_CASES: 1 + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + TEST_CONFIG: ${{ matrix.config }} + SHARD_NUMBER: ${{ matrix.shard }} + NUM_TEST_SHARDS: ${{ matrix.num_shards }} + PR_BODY: ${{ github.event.pull_request.body }} + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + SHM_SIZE: ${{ contains(inputs.build-environment, 'cuda') && '2g' || '1g' }} + DOCKER_IMAGE: ${{ inputs.docker-image }} + XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }} + XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla + timeout-minutes: 240 + run: | + set -x + + if [[ $TEST_CONFIG == 'multigpu' ]]; then + TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh + elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then + TEST_COMMAND=.jenkins/caffe2/test.sh + else + TEST_COMMAND=.jenkins/pytorch/test.sh + fi + + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + export COMMIT_MESSAGES + + # detached container should get cleaned up by teardown_ec2_linux + # TODO: Stop building test binaries as part of the build phase + # Used for GPU_FLAG since that doesn't play nice + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BUILD_ENVIRONMENT \ + -e PR_NUMBER \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e GITHUB_ACTIONS \ + -e IN_CI \ + -e IS_GHA \ + -e BRANCH \ + -e SHA1 \ + -e 
AWS_DEFAULT_REGION \ + -e IN_WHEEL_TEST \ + -e SHARD_NUMBER \ + -e JOB_BASE_NAME \ + -e TEST_CONFIG \ + -e NUM_TEST_SHARDS \ + -e PR_BODY \ + -e COMMIT_MESSAGES \ + -e PYTORCH_RETRY_TEST_CASES \ + -e PR_LABELS \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e SCCACHE_BUCKET \ + -e XLA_CUDA \ + -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --ulimit stack=10485760:83886080 \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --ipc=host \ + --shm-size="${SHM_SIZE}" \ + --tty \ + --detach \ + --name="${container_name}" \ + --user jenkins \ + -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + docker exec -t "${container_name}" sh -c "pip install dist/*.whl && ${TEST_COMMAND}" + + - name: Get workflow job id + id: get-job-id + uses: pytorch/pytorch/.github/actions/get-workflow-job-id@master + if: always() + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + + - name: Upload test artifacts + uses: ./.github/actions/upload-test-artifacts + if: always() + with: + file-suffix: ${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}_${{ steps.get-job-id.outputs.job-id }} + + - name: Store Core dumps on S3 + uses: seemethere/upload-artifact-s3@v4 + if: failure() + with: + name: coredumps-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }} + retention-days: 14 + if-no-files-found: ignore + path: + ./**/core.[1-9]* + + - name: Upload test statistics + if: always() + env: + AWS_DEFAULT_REGION: us-east-1 + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GHA_WORKFLOW_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} + shell: bash + run: | + set -x + python3 -m pip install -r requirements.txt + python3 -m pip install boto3==1.19.12 + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() diff --git a/.github/workflows/_mac-build.yml b/.github/workflows/_mac-build.yml new file mode 100644 index 00000000000000..bfda5df5dd104e --- /dev/null +++ b/.github/workflows/_mac-build.yml @@ -0,0 +1,102 @@ +name: mac-build + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + runner-type: + required: true + type: string + description: Name of the GitHub-managed runner type to use for the build. + build-generates-artifacts: + required: true + type: boolean + description: If set, upload generated build artifacts. + xcode-version: + required: false + type: string + default: "" + description: What xcode version to build with. + + secrets: + MACOS_SCCACHE_S3_ACCESS_KEY_ID: + required: true + description: Access key for S3 bucket for macOS sccache. + MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: + required: true + description: Secret for S3 bucket for macOS sccache. 
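The same calling convention ties the build and test halves together: `_linux-build.yml` publishes the resolved `docker-image` as a workflow output, and `_linux-test.yml` consumes it together with a JSON `test-matrix` string that `fromJSON` expands into the job's strategy matrix. A minimal sketch of how a hypothetical top-level workflow might chain the two (the caller's job names, matrix contents, and exact environment label are illustrative assumptions; the input and output names come from the two workflow files above):

```yaml
# Hypothetical caller jobs (illustrative wiring); build feeds its docker-image output into test
jobs:
  linux-xenial-py3_7-gcc5_4-build:
    uses: ./.github/workflows/_linux-build.yml
    with:
      build-environment: linux-xenial-py3.7-gcc5.4
      docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4

  linux-xenial-py3_7-gcc5_4-test:
    needs: linux-xenial-py3_7-gcc5_4-build
    uses: ./.github/workflows/_linux-test.yml
    with:
      build-environment: linux-xenial-py3.7-gcc5.4
      docker-image: ${{ needs.linux-xenial-py3_7-gcc5_4-build.outputs.docker-image }}
      test-matrix: |
        { "include": [
            { "config": "default", "shard": 1, "num_shards": 2, "runner": "linux.2xlarge" },
            { "config": "default", "shard": 2, "num_shards": 2, "runner": "linux.2xlarge" }
        ]}
```

The macOS, ROCm, and Windows counterparts that follow repeat this pattern, each with its own inputs and forwarded secrets.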
+ +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + +# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 +defaults: + run: + shell: bash -e -l {0} + +jobs: + build: + # Don't run on forked repos. + if: github.repository_owner == 'pytorch' + runs-on: ${{ inputs.runner-type }} + env: + JOB_BASE_NAME: ${{ inputs.build-environment }} + # For sccache access (only on non-forked PRs) + AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + COMPACT_JOB_NAME: ${{ inputs.build-environment }} + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Set xcode version + env: + XCODE_VERSION: ${{ inputs.xcode-version }} + run: | + if [ -n "${XCODE_VERSION}" ]; then + echo "DEVELOPER_DIR=/Applications/Xcode_${XCODE_VERSION}.app/Contents/Developer" >> "${GITHUB_ENV}" + fi + + - name: Setup miniconda + uses: conda-incubator/setup-miniconda@v2 + with: + auto-update-conda: true + python-version: 3.8 + activate-environment: build + + - name: Install macOS homebrew dependencies + run: | + # Install dependencies + brew install libomp + + - name: Install sccache (only for non-forked PRs, and pushes to trunk) + if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} + run: | + sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + + - name: Build + run: | + echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" + .jenkins/pytorch/macos-build.sh + + - name: Archive artifacts into zip + if: inputs.build-generates-artifacts + run: | + zip -1 -r artifacts.zip dist/ + + - name: Store PyTorch Build Artifacts on GHA + uses: actions/upload-artifact@v2 + if: inputs.build-generates-artifacts + with: + name: ${{ env.BUILD_ENVIRONMENT }} + retention-days: 14 + if-no-files-found: error + path: artifacts.zip diff --git a/.github/workflows/_mac-test.yml b/.github/workflows/_mac-test.yml new file mode 100644 index 00000000000000..2234ae78f3206a --- /dev/null +++ b/.github/workflows/_mac-test.yml @@ -0,0 +1,120 @@ +name: mac-test + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + test-matrix: + required: true + type: string + description: JSON description of what test configs to run. + + secrets: + AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: + required: true + description: access key id for test stats upload + AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY: + required: true + description: secret acess key for test stats upload + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + +# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 +defaults: + run: + shell: bash -e -l {0} + +jobs: + test: + # Don't run on forked repos. 
+ if: github.repository_owner == 'pytorch' + strategy: + matrix: ${{ fromJSON(inputs.test-matrix) }} + fail-fast: false + runs-on: ${{ matrix.runner }} + timeout-minutes: 240 + env: + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + COMPACT_JOB_NAME: ${{ inputs.build-environment }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + TEST_CONFIG: ${{ matrix.config }} + SHARD_NUMBER: ${{ matrix.shard }} + NUM_TEST_SHARDS: ${{ matrix.num_shards }} + PR_BODY: ${{ github.event.pull_request.body }} + PYTORCH_RETRY_TEST_CASES: 1 + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Download build artifacts + uses: ./.github/actions/download-build-artifacts + with: + name: ${{ inputs.build-environment }} + use-gha: true + + - name: Setup miniconda + uses: conda-incubator/setup-miniconda@v2 + with: + auto-update-conda: true + python-version: 3.8 + activate-environment: build + + - name: Install macOS homebrew dependencies + run: | + # Install dependencies + brew install libomp + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Test + run: | + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + export COMMIT_MESSAGES + python3 -mpip install dist/*.whl + .jenkins/pytorch/macos-test.sh + + - name: Get workflow job id + id: get-job-id + uses: pytorch/pytorch/.github/actions/get-workflow-job-id@master + if: always() + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + + - name: Upload test artifacts + uses: ./.github/actions/upload-test-artifacts + if: always() + with: + use-gha: true + file-suffix: ${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}_${{ steps.get-job-id.outputs.job-id }} + + - name: Upload test statistics + if: always() + env: + AWS_DEFAULT_REGION: us-east-1 + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} + GHA_WORKFLOW_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} + shell: bash + run: | + set -x + python3 -m pip install -r requirements.txt + python3 -m pip install boto3==1.19.12 + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test diff --git a/.github/workflows/_rocm-test.yml b/.github/workflows/_rocm-test.yml new file mode 100644 index 00000000000000..167e73424def88 --- /dev/null +++ b/.github/workflows/_rocm-test.yml @@ -0,0 +1,176 @@ +# TODO: this looks sort of similar to _linux-test, but there are like a dozen +# places where you would have to insert an if statement. Probably it's better to +# just use a different workflow altogether + +name: test + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + test-matrix: + required: true + type: string + description: JSON description of what test configs to run. 
+ docker-image: + required: true + type: string + description: Docker image to run in. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + test: + # Don't run on forked repos. + if: github.repository_owner == 'pytorch' + timeout-minutes: 270 + strategy: + matrix: ${{ fromJSON(inputs.test-matrix) }} + fail-fast: false + runs-on: ${{ matrix.runner }} + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + no-sudo: true + + - name: Setup ROCm + uses: ./.github/actions/setup-rocm + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ inputs.docker-image }} + + - name: Download build artifacts + uses: ./.github/actions/download-build-artifacts + with: + name: ${{ inputs.build-environment }} + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Test + env: + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + PYTORCH_RETRY_TEST_CASES: 1 + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + TEST_CONFIG: ${{ matrix.config }} + SHARD_NUMBER: ${{ matrix.shard }} + NUM_TEST_SHARDS: ${{ matrix.num_shards }} + PR_BODY: ${{ github.event.pull_request.body }} + SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + DOCKER_IMAGE: ${{ inputs.docker-image }} + XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla + timeout-minutes: 240 + run: | + set -x + + if [[ $TEST_CONFIG == 'multigpu' ]]; then + TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh + elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then + TEST_COMMAND=.jenkins/caffe2/test.sh + else + TEST_COMMAND=.jenkins/pytorch/test.sh + fi + + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + export COMMIT_MESSAGES + + # detached container should get cleaned up by teardown_ec2_linux + # TODO: Stop building test binaries as part of the build phase + # Used for GPU_FLAG since that doesn't play nice + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BUILD_ENVIRONMENT \ + -e PR_NUMBER \ + -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ + -e GITHUB_ACTIONS \ + -e IN_CI \ + -e IS_GHA \ + -e BRANCH \ + -e SHA1 \ + -e AWS_DEFAULT_REGION \ + -e IN_WHEEL_TEST \ + -e SHARD_NUMBER \ + -e JOB_BASE_NAME \ + -e TEST_CONFIG \ + -e NUM_TEST_SHARDS \ + -e PR_BODY \ + -e COMMIT_MESSAGES \ + -e PYTORCH_RETRY_TEST_CASES \ + -e PR_LABELS \ + -e MAX_JOBS="$(nproc --ignore=2)" \ + -e SCCACHE_BUCKET \ + -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ + --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ + --ulimit stack=10485760:83886080 \ + --security-opt seccomp=unconfined \ + --cap-add=SYS_PTRACE \ + --shm-size="8g" \ + --tty \ + --detach \ + --name="${container_name}" \ + --user jenkins \ + -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ + -w /var/lib/jenkins/workspace \ + "${DOCKER_IMAGE}" + ) + # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home + docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" + # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct + docker exec -t "${container_name}" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" + + - name: Get workflow job id + id: get-job-id + uses: pytorch/pytorch/.github/actions/get-workflow-job-id@master + if: always() + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + + - name: Upload test artifacts + uses: ./.github/actions/upload-test-artifacts + if: always() + with: + use-gha: true + file-suffix: ${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}_${{ steps.get-job-id.outputs.job-id }} + + - name: Upload test statistics + if: always() + env: + AWS_DEFAULT_REGION: us-east-1 + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GHA_WORKFLOW_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} + shell: bash + run: | + set -x + python3 -m pip install -r requirements.txt + python3 -m pip install boto3==1.19.12 + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() + with: + skip-wait-ssh: true diff --git a/.github/workflows/_win-build.yml b/.github/workflows/_win-build.yml new file mode 100644 index 00000000000000..abd7aca07f7a6d --- /dev/null +++ b/.github/workflows/_win-build.yml @@ -0,0 +1,94 @@ +name: windows-build + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + cuda-version: + required: true + type: string + description: What CUDA version to build with, "cpu" for none. + build-with-debug: + required: false + type: boolean + default: false + description: If set, build in debug mode. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + build: + # Don't run on forked repos. 
+ if: github.repository_owner == 'pytorch' + runs-on: [self-hosted, windows.4xlarge] + timeout-minutes: 240 + env: + JOB_BASE_NAME: ${{ inputs.build-environment }}-build + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + no-sudo: true + + - name: Setup Windows + uses: ./.github/actions/setup-win + with: + cuda-version: ${{ inputs.cuda-version }} + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Build + shell: bash + env: + PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ + BRANCH: ${{ steps.parse-ref.outputs.branch }} + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + BUILD_WHEEL: 1 + MAX_JOBS: 8 + CUDA_VERSION: ${{ inputs.cuda-version }} + PYTHON_VERSION: "3.8" + PYTORCH_RETRY_TEST_CASES: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + SCCACHE_BUCKET: "ossci-compiler-cache" + VC_PRODUCT: "BuildTools" + VC_VERSION: "" + VC_YEAR: "2019" + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + AWS_DEFAULT_REGION: us-east-1 + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + DEBUG: ${{ inputs.build-with-debug && '1' || '0' }} + TORCH_CUDA_ARCH_LIST: "7.0" + USE_CUDA: ${{ inputs.cuda-version != 'cpu' && '1' || '0' }} + run: | + .jenkins/pytorch/win-build.sh + + # Upload to github so that people can click and download artifacts + - name: Upload artifacts to s3 + uses: seemethere/upload-artifact-s3@v4 + with: + retention-days: 14 + if-no-files-found: error + name: ${{ env.BUILD_ENVIRONMENT }} + path: C:\${{ github.run_id }}\build-results + + - name: Teardown Windows + uses: ./.github/actions/teardown-win + if: always() + timeout-minutes: 120 + with: + extra-delete-dir: /c/${{ github.run_id }}/build-results/ diff --git a/.github/workflows/_win-test.yml b/.github/workflows/_win-test.yml new file mode 100644 index 00000000000000..9aa3eb17648639 --- /dev/null +++ b/.github/workflows/_win-test.yml @@ -0,0 +1,132 @@ +name: win-test + +on: + workflow_call: + inputs: + build-environment: + required: true + type: string + description: Top-level label for what's being built/tested. + cuda-version: + required: true + type: string + description: What CUDA version to build with, "cpu" for none. + test-matrix: + required: true + type: string + description: JSON description of what test configs to run. + +env: + IN_CI: 1 # TODO delete in favor of GITHUB_ACTIONS + IS_GHA: 1 # TODO delete in favor of GITHUB_ACTIONS + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + +jobs: + test: + # Don't run on forked repos. 
+ if: github.repository_owner == 'pytorch' + strategy: + matrix: ${{ fromJSON(inputs.test-matrix) }} + fail-fast: false + runs-on: ${{ matrix.runner }} + timeout-minutes: 300 + steps: + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + no-sudo: true + + - name: Setup Windows + uses: ./.github/actions/setup-win + with: + cuda-version: ${{ inputs.cuda-version }} + + - name: Setup SSH (Click me for login details) + uses: ./.github/actions/setup-ssh + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Download PyTorch Build Artifacts + uses: seemethere/download-artifact-s3@v3 + with: + name: ${{ env.BUILD_ENVIRONMENT }} + path: C:\${{ github.run_id }}\build-results + + - name: Check build-results folder + shell: powershell + run: | + tree /F C:\$Env:GITHUB_RUN_ID\build-results + + - name: Test + shell: bash + env: + USE_CUDA: ${{ inputs.cuda-version != 'cpu' && '1' || '0' }} + INSTALL_WINDOWS_SDK: 1 + PYTHON_VERSION: 3.8 + PYTORCH_RETRY_TEST_CASES: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + VC_PRODUCT: "BuildTools" + VC_VERSION: "" + VS_VERSION: "16.8.6" + VC_YEAR: "2019" + AWS_DEFAULT_REGION: us-east-1 + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + CUDA_VERSION: ${{ inputs.cuda-version }} + PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + SHARD_NUMBER: ${{ matrix.shard }} + NUM_TEST_SHARDS: ${{ matrix.num_shards }} + TEST_CONFIG: ${{ matrix.config }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + PR_BODY: ${{ github.event.pull_request.body }} + TORCH_CUDA_ARCH_LIST: "7.0" + run: | + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + export COMMIT_MESSAGES + .jenkins/pytorch/win-test.sh + + - name: Get workflow job id + id: get-job-id + uses: pytorch/pytorch/.github/actions/get-workflow-job-id@master + if: always() + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + + - name: Upload test artifacts + uses: ./.github/actions/upload-test-artifacts + if: always() + with: + file-suffix: ${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}_${{ steps.get-job-id.outputs.job-id }} + + - name: Parse ref + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Upload test statistics + if: always() + env: + AWS_DEFAULT_REGION: us-east-1 + GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} + BRANCH: ${{ steps.parse-ref.outputs.branch }} + JOB_BASE_NAME: ${{ inputs.build-environment }}-test + BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + PR_NUMBER: ${{ github.event.pull_request.number }} + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + TAG: ${{ steps.parse-ref.outputs.tag }} + WORKFLOW_ID: ${{ github.run_id }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GHA_WORKFLOW_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} + shell: bash + run: | + set -x + python3 -m pip install -r requirements.txt + python3 -m pip install boto3==1.19.12 + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test + + - name: Teardown Windows + uses: ./.github/actions/teardown-win + if: always() + timeout-minutes: 120 diff --git a/.github/workflows/create_release.yml b/.github/workflows/create_release.yml index f32c3021e3a2ed..b23282536789c4 100644 
--- a/.github/workflows/create_release.yml +++ b/.github/workflows/create_release.yml @@ -20,6 +20,7 @@ jobs: - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: submodules: 'recursive' + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - name: Fake name for PRs if: ${{ github.event_name == 'pull_request' }} run: echo "PT_GITHUB_REF=refs/tags/pr-tag" >> "$GITHUB_ENV" @@ -51,5 +52,5 @@ jobs: files: ${{env.PT_RELEASE_FILE}} concurrency: - group: create-release-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} cancel-in-progress: true diff --git a/.github/workflows/docker-builds.yml b/.github/workflows/docker-builds.yml new file mode 100644 index 00000000000000..d294c63e7b3a30 --- /dev/null +++ b/.github/workflows/docker-builds.yml @@ -0,0 +1,76 @@ +name: docker-builds + +on: + workflow_dispatch: + pull_request: + paths: + - .circleci/docker/** + - .github/workflows/docker-builds.yml + schedule: + - cron: 1 3 * * 3 + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +env: + ALPINE_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine + AWS_DEFAULT_REGION: us-east-1 + +jobs: + docker-build: + runs-on: [self-hosted, linux.2xlarge] + timeout-minutes: 240 + strategy: + matrix: + include: + - docker-image-name: pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 + - docker-image-name: pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7 + - docker-image-name: pytorch-linux-bionic-py3.7-clang9 + - docker-image-name: pytorch-linux-bionic-rocm4.5-py3.7 + - docker-image-name: pytorch-linux-bionic-rocm5.0-py3.7 + - docker-image-name: pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 + - docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + - docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + - docker-image-name: pytorch-linux-xenial-py3-clang5-asan + - docker-image-name: pytorch-linux-xenial-py3-clang7-asan + - docker-image-name: pytorch-linux-xenial-py3-clang7-onnx + - docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4 + - docker-image-name: pytorch-linux-xenial-py3.7-gcc7 + env: + DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${{ matrix.docker-image-name }} + steps: + - name: Clean workspace + shell: bash + run: | + echo "${GITHUB_WORKSPACE}" + sudo rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + + # [see note: pytorch repo ref] + # deep clone (fetch-depth 0) required for git merge-base + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Build docker image + id: build-docker-image + uses: ./.github/actions/calculate-docker-image + with: + docker-image-name: ${{ matrix.docker-image-name }} + always-rebuild: true + + - name: Pull docker image + uses: ./.github/actions/pull-docker-image + with: + docker-image: ${{ steps.build-docker-image.outputs.docker-image }} + + - name: Chown workspace + uses: ./.github/actions/chown-workspace + if: always() + + - name: Teardown Linux + uses: ./.github/actions/teardown-linux + if: always() diff --git a/.github/workflows/generated-caffe2-linux-xenial-py3.7-gcc5.4.yml 
b/.github/workflows/generated-caffe2-linux-xenial-py3.7-gcc5.4.yml deleted file mode 100644 index d8b08b4ac55bea..00000000000000 --- a/.github/workflows/generated-caffe2-linux-xenial-py3.7-gcc5.4.yml +++ /dev/null @@ -1,251 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: caffe2-linux-xenial-py3.7-gcc5.4 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: caffe2-linux-xenial-py3.7-gcc5.4 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: caffe2-linux-xenial-py3.7-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: caffe2-linux-xenial-py3.7-gcc5.4-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-docker-builds.yml b/.github/workflows/generated-docker-builds.yml deleted file mode 100644 index 357305f2b3b2db..00000000000000 --- a/.github/workflows/generated-docker-builds.yml +++ /dev/null @@ -1,173 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/docker_builds_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: docker-builds - -on: - workflow_dispatch: - pull_request: - types: [opened, synchronize, reopened] - paths: - - '.circleci/docker/**' - - '.github/workflows/generated-docker-builds.yml' - schedule: - - cron: 1 3 * * 3 -concurrency: - group: docker-builds-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -env: - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - AWS_DEFAULT_REGION: us-east-1 - -jobs: - - docker-build: - runs-on: linux.2xlarge - timeout-minutes: 240 - strategy: - matrix: - include: - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7' - docker_image_short_name: 'pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7' - docker_image_short_name: 'pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9' - docker_image_short_name: 'pytorch-linux-bionic-py3.7-clang9' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.3.1-py3.7' - docker_image_short_name: 'pytorch-linux-bionic-rocm4.3.1-py3.7' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.5-py3.7' - docker_image_short_name: 'pytorch-linux-bionic-rocm4.5-py3.7' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7' - docker_image_short_name: 'pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7' - - docker_image_base: 
'308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7' - docker_image_short_name: 'pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c' - docker_image_short_name: 'pytorch-linux-xenial-py3-clang5-android-ndk-r19c' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan' - docker_image_short_name: 'pytorch-linux-xenial-py3-clang5-asan' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-asan' - docker_image_short_name: 'pytorch-linux-xenial-py3-clang7-asan' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-onnx' - docker_image_short_name: 'pytorch-linux-xenial-py3-clang7-onnx' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4' - docker_image_short_name: 'pytorch-linux-xenial-py3.7-gcc5.4' - - docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc7' - docker_image_short_name: 'pytorch-linux-xenial-py3.7-gcc7' - env: - DOCKER_IMAGE_BASE: '${{ matrix.docker_image_base }}' - name: docker-build (${{ matrix.docker_image_short_name }}) - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-ios-12-5-1-arm64-coreml.yml b/.github/workflows/generated-ios-12-5-1-arm64-coreml.yml deleted file mode 100644 index 7640a34c634a67..00000000000000 --- a/.github/workflows/generated-ios-12-5-1-arm64-coreml.yml +++ /dev/null @@ -1,143 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: ios-12-5-1-arm64-coreml - -on: - schedule: - - cron: 45 4,10,16,22 * * * - push: - tags: - - 'ciflow/all/*' - - 'ciflow/ios/*' - - 'ciflow/macos/*' - - 'ciflow/scheduled/*' - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: ios-12-5-1-arm64-coreml - IN_CI: 1 - IS_GHA: 1 - IOS_PLATFORM: OS - IOS_ARCH: arm64 - - -jobs: - - build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - runs-on: macos-10.15 - timeout-minutes: 240 - env: - JOB_BASE_NAME: ios-12-5-1-arm64-coreml-build - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Populate CI build options - run: | - # Most builds use the lite interpreter, if certain builds shouldn't - # build the lite interpreter this env variable should get over-written - # in the following case statement - echo "BUILD_LITE_INTERPRETER=1" >> "${GITHUB_ENV}" - - case ${BUILD_ENVIRONMENT} in - *metal*) - echo "USE_PYTORCH_METAL=1" >> "${GITHUB_ENV}" - ;; - *full_jit*) - echo "BUILD_LITE_INTERPRETER=0" >> "${GITHUB_ENV}" - ;; - *custom*) - echo "SELECTED_OP_LIST=${GITHUB_WORKSPACE}/ios/TestApp/custom_build/mobilenetv2.yaml" >> "${GITHUB_ENV}" - ;; - *coreml*) - echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" - ;; - esac - - name: Install brew dependencies - run: | - # Install dependencies - brew install libtool - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash 
"${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions - - name: Run Fastlane - run: | - set -x - cd ios/TestApp - # install fastlane - sudo gem install bundler && bundle install - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - export TCLLIBPATH="/usr/local/lib" - python -VV - export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} - scripts/build_ios.sh - - name: Run Build Test - run: | - PROFILE=PyTorch_CI_2022 - # run the ruby build script - if ! [ -x "$(command -v xcodebuild)" ]; then - echo 'Error: xcodebuild is not installed.' - exit 1 - fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi - -concurrency: - group: ios-12-5-1-arm64-coreml-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-ios-12-5-1-arm64-custom-ops.yml b/.github/workflows/generated-ios-12-5-1-arm64-custom-ops.yml deleted file mode 100644 index 75bc1f77252b21..00000000000000 --- a/.github/workflows/generated-ios-12-5-1-arm64-custom-ops.yml +++ /dev/null @@ -1,143 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: ios-12-5-1-arm64-custom-ops - -on: - schedule: - - cron: 45 4,10,16,22 * * * - push: - tags: - - 'ciflow/all/*' - - 'ciflow/ios/*' - - 'ciflow/macos/*' - - 'ciflow/scheduled/*' - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: ios-12-5-1-arm64-custom-ops - IN_CI: 1 - IS_GHA: 1 - IOS_PLATFORM: OS - IOS_ARCH: arm64 - - -jobs: - - build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - runs-on: macos-10.15 - timeout-minutes: 240 - env: - JOB_BASE_NAME: ios-12-5-1-arm64-custom-ops-build - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: 
zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Populate CI build options - run: | - # Most builds use the lite interpreter, if certain builds shouldn't - # build the lite interpreter this env variable should get over-written - # in the following case statement - echo "BUILD_LITE_INTERPRETER=1" >> "${GITHUB_ENV}" - - case ${BUILD_ENVIRONMENT} in - *metal*) - echo "USE_PYTORCH_METAL=1" >> "${GITHUB_ENV}" - ;; - *full_jit*) - echo "BUILD_LITE_INTERPRETER=0" >> "${GITHUB_ENV}" - ;; - *custom*) - echo "SELECTED_OP_LIST=${GITHUB_WORKSPACE}/ios/TestApp/custom_build/mobilenetv2.yaml" >> "${GITHUB_ENV}" - ;; - *coreml*) - echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" - ;; - esac - - name: Install brew dependencies - run: | - # Install dependencies - brew install libtool - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions - - name: Run Fastlane - run: | - set -x - cd ios/TestApp - # install fastlane - sudo gem install bundler && bundle install - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - export TCLLIBPATH="/usr/local/lib" - python -VV - export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} - scripts/build_ios.sh - - name: Run Build Test - run: | - PROFILE=PyTorch_CI_2022 - # run the ruby build script - if ! [ -x "$(command -v xcodebuild)" ]; then - echo 'Error: xcodebuild is not installed.' 
- exit 1 - fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi - -concurrency: - group: ios-12-5-1-arm64-custom-ops-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-ios-12-5-1-arm64-metal.yml b/.github/workflows/generated-ios-12-5-1-arm64-metal.yml deleted file mode 100644 index 2a9da911d79b8d..00000000000000 --- a/.github/workflows/generated-ios-12-5-1-arm64-metal.yml +++ /dev/null @@ -1,143 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: ios-12-5-1-arm64-metal - -on: - schedule: - - cron: 45 4,10,16,22 * * * - push: - tags: - - 'ciflow/all/*' - - 'ciflow/ios/*' - - 'ciflow/macos/*' - - 'ciflow/scheduled/*' - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: ios-12-5-1-arm64-metal - IN_CI: 1 - IS_GHA: 1 - IOS_PLATFORM: OS - IOS_ARCH: arm64 - - -jobs: - - build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - runs-on: macos-10.15 - timeout-minutes: 240 - env: - JOB_BASE_NAME: ios-12-5-1-arm64-metal-build - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Populate CI build options - run: | - # Most builds use the lite interpreter, if certain builds shouldn't - # build the lite interpreter this env variable should get over-written - # in the following case statement - echo "BUILD_LITE_INTERPRETER=1" >> "${GITHUB_ENV}" - - case ${BUILD_ENVIRONMENT} in - *metal*) - echo "USE_PYTORCH_METAL=1" >> "${GITHUB_ENV}" - ;; - *full_jit*) - echo "BUILD_LITE_INTERPRETER=0" >> "${GITHUB_ENV}" - ;; - *custom*) - echo "SELECTED_OP_LIST=${GITHUB_WORKSPACE}/ios/TestApp/custom_build/mobilenetv2.yaml" >> "${GITHUB_ENV}" - ;; - *coreml*) - echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" - ;; - esac - - name: Install brew dependencies - run: | - # Install dependencies - brew install libtool - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p 
"${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions - - name: Run Fastlane - run: | - set -x - cd ios/TestApp - # install fastlane - sudo gem install bundler && bundle install - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - export TCLLIBPATH="/usr/local/lib" - python -VV - export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} - scripts/build_ios.sh - - name: Run Build Test - run: | - PROFILE=PyTorch_CI_2022 - # run the ruby build script - if ! [ -x "$(command -v xcodebuild)" ]; then - echo 'Error: xcodebuild is not installed.' - exit 1 - fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi - -concurrency: - group: ios-12-5-1-arm64-metal-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-ios-12-5-1-arm64.yml b/.github/workflows/generated-ios-12-5-1-arm64.yml deleted file mode 100644 index 3463fc5c48ac63..00000000000000 --- a/.github/workflows/generated-ios-12-5-1-arm64.yml +++ /dev/null @@ -1,143 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: ios-12-5-1-arm64 - -on: - schedule: - - cron: 45 4,10,16,22 * * * - push: - tags: - - 'ciflow/all/*' - - 'ciflow/ios/*' - - 'ciflow/macos/*' - - 'ciflow/scheduled/*' - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: ios-12-5-1-arm64 - IN_CI: 1 - IS_GHA: 1 - IOS_PLATFORM: OS - IOS_ARCH: arm64 - - -jobs: - - build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - runs-on: macos-10.15 - timeout-minutes: 240 - env: - JOB_BASE_NAME: ios-12-5-1-arm64-build - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && 
github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Populate CI build options - run: | - # Most builds use the lite interpreter, if certain builds shouldn't - # build the lite interpreter this env variable should get over-written - # in the following case statement - echo "BUILD_LITE_INTERPRETER=1" >> "${GITHUB_ENV}" - - case ${BUILD_ENVIRONMENT} in - *metal*) - echo "USE_PYTORCH_METAL=1" >> "${GITHUB_ENV}" - ;; - *full_jit*) - echo "BUILD_LITE_INTERPRETER=0" >> "${GITHUB_ENV}" - ;; - *custom*) - echo "SELECTED_OP_LIST=${GITHUB_WORKSPACE}/ios/TestApp/custom_build/mobilenetv2.yaml" >> "${GITHUB_ENV}" - ;; - *coreml*) - echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" - ;; - esac - - name: Install brew dependencies - run: | - # Install dependencies - brew install libtool - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions - - name: Run Fastlane - run: | - set -x - cd ios/TestApp - # install fastlane - sudo gem install bundler && bundle install - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - export TCLLIBPATH="/usr/local/lib" - python -VV - export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} - scripts/build_ios.sh - - name: Run Build Test - run: | - PROFILE=PyTorch_CI_2022 - # run the ruby build script - if ! [ -x "$(command -v xcodebuild)" ]; then - echo 'Error: xcodebuild is not installed.' 
- exit 1 - fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi - -concurrency: - group: ios-12-5-1-arm64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-ios-12-5-1-x86-64-coreml.yml b/.github/workflows/generated-ios-12-5-1-x86-64-coreml.yml deleted file mode 100644 index d9fdd93b79abdf..00000000000000 --- a/.github/workflows/generated-ios-12-5-1-x86-64-coreml.yml +++ /dev/null @@ -1,178 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/ios_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: ios-12-5-1-x86-64-coreml - -on: - push: - branches: - - master - - main - - release/* - tags: - - 'ciflow/all/*' - - 'ciflow/ios/*' - - 'ciflow/macos/*' - - 'ciflow/trunk/*' - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: ios-12-5-1-x86-64-coreml - IN_CI: 1 - IS_GHA: 1 - IOS_PLATFORM: SIMULATOR - IOS_ARCH: x86_64 - - -jobs: - - build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - runs-on: macos-10.15 - timeout-minutes: 240 - env: - JOB_BASE_NAME: ios-12-5-1-x86-64-coreml-build - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Populate CI build options - run: | - # Most builds use the lite interpreter, if certain builds shouldn't - # build the lite interpreter this env variable should get over-written - # in the following case statement - echo "BUILD_LITE_INTERPRETER=1" >> "${GITHUB_ENV}" - - case ${BUILD_ENVIRONMENT} in - *metal*) - echo "USE_PYTORCH_METAL=1" >> "${GITHUB_ENV}" - ;; - *full_jit*) - echo "BUILD_LITE_INTERPRETER=0" >> "${GITHUB_ENV}" - ;; - *custom*) - echo "SELECTED_OP_LIST=${GITHUB_WORKSPACE}/ios/TestApp/custom_build/mobilenetv2.yaml" >> "${GITHUB_ENV}" - ;; - *coreml*) - echo "USE_COREML_DELEGATE=1" >> "${GITHUB_ENV}" - ;; - esac - - name: Install brew dependencies - run: | - # Install dependencies - brew install libtool - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p 
"${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions - - name: Run Fastlane - run: | - set -x - cd ios/TestApp - # install fastlane - sudo gem install bundler && bundle install - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - export TCLLIBPATH="/usr/local/lib" - python -VV - export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} - scripts/build_ios.sh - - name: Run Build Test - run: | - PROFILE=PyTorch_CI_2022 - # run the ruby build script - if ! [ -x "$(command -v xcodebuild)" ]; then - echo 'Error: xcodebuild is not installed.' - exit 1 - fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi - - name: Run Simulator Tests - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html - # generate models for differnet backends - cd "${GITHUB_WORKSPACE}/ios/TestApp/benchmark" - mkdir -p ../models - if [ "${USE_COREML_DELEGATE}" == 1 ]; then - pip install coremltools==5.0b5 - pip install six==1.16.0 - python coreml_backend.py - else - python trace_model.py - fi - if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then - echo "Setting up the TestApp for LiteInterpreter" - ruby setup.rb --lite 1 - else - echo "Setting up the TestApp for Full JIT" - ruby setup.rb - fi - cd "${GITHUB_WORKSPACE}/ios/TestApp" - instruments -s -devices - if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then - if [ "${USE_COREML_DELEGATE}" == 1 ]; then - fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML - else - fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter - fi - else - fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT - fi - -concurrency: - group: ios-12-5-1-x86-64-coreml-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-libtorch-linux-xenial-cuda10.2-py3.7-gcc7.yml b/.github/workflows/generated-libtorch-linux-xenial-cuda10.2-py3.7-gcc7.yml deleted file mode 100644 index 5889466d9b0824..00000000000000 --- a/.github/workflows/generated-libtorch-linux-xenial-cuda10.2-py3.7-gcc7.yml +++ /dev/null @@ -1,241 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: libtorch-linux-xenial-cuda10.2-py3.7-gcc7 - -on: 
- push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/libtorch/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda10.2-py3.7-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: libtorch-linux-xenial-cuda10.2-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: libtorch-linux-xenial-cuda10.2-py3.7-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
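The ECR login and workspace-chown steps in this build job lean on a small inline retry helper rather than a marketplace action. For reference, a minimal sketch of that pattern as a standalone step (step name is a placeholder; the helper and commands are taken from the workflow above):

```yaml
    steps:
      - name: Pull helper image with retries (sketch)
        run: |
          # Retry a command up to three times with a short backoff,
          # mirroring the inline retry() helper used throughout these jobs.
          retry () {
            "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
          }
          retry docker pull "${ALPINE_IMAGE}"
          # Chown the workspace back to the current user so later checkout
          # and cleanup steps are not blocked by root-owned files.
          docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" \
            chown -R "$(id -u):$(id -g)" .
```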
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
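The "Calculate docker image tag" and "Check if image should be built" steps above key the CI image tag to the git tree hash of `.circleci/docker`, so the image is only rebuilt when something under that directory changes. Condensed into a single illustrative step (variable names follow the workflow; error handling from the full step is omitted):

```yaml
      - name: Decide whether the CI Docker image needs a rebuild (sketch)
        run: |
          # The image tag is the tree hash of .circleci/docker, so any change
          # under that directory produces a new tag.
          DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
          DOCKER_IMAGE="${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
          # If a manifest for that tag already exists in the registry, skip
          # the rebuild entirely.
          if docker manifest inspect "${DOCKER_IMAGE}"; then
            echo "Image ${DOCKER_IMAGE} already exists, skipping build"
            exit 0
          fi
          echo "Image ${DOCKER_IMAGE} is missing, a rebuild is required"
```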
- - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-libtorch-linux-xenial-cuda11.3-py3.7-gcc7.yml b/.github/workflows/generated-libtorch-linux-xenial-cuda11.3-py3.7-gcc7.yml deleted file mode 100644 index 7c9e9f19ff3fda..00000000000000 --- a/.github/workflows/generated-libtorch-linux-xenial-cuda11.3-py3.7-gcc7.yml +++ /dev/null @@ -1,241 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: libtorch-linux-xenial-cuda11.3-py3.7-gcc7 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/libtorch/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda11.3-py3.7-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: libtorch-linux-xenial-cuda11.3-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: libtorch-linux-xenial-cuda11.3-py3.7-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata 
instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
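The Build step starts a long-lived, detached container with the workspace bind-mounted and then runs the actual build through `docker exec`; the teardown steps later stop whatever containers remain. A trimmed sketch of that shape, keeping only a few of the `-e` flags from the full step:

```yaml
      - name: Build inside a detached container (sketch)
        run: |
          # Start the build container detached so follow-up steps (and the
          # always-run teardown steps) can still reach it by id.
          container_name=$(docker run \
            -e BUILD_ENVIRONMENT \
            -e MAX_JOBS="$(nproc --ignore=2)" \
            --detach --tty --user jenkins \
            -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
            -w /var/lib/jenkins/workspace \
            "${DOCKER_IMAGE}")
          # Run the build as the jenkins user inside that container.
          docker exec -t "${container_name}" \
            sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
```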
- - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-binary-conda.yml b/.github/workflows/generated-linux-binary-conda-nightly.yml similarity index 77% rename from .github/workflows/generated-linux-binary-conda.yml rename to .github/workflows/generated-linux-binary-conda-nightly.yml index f1ff75db90d386..63861bbe87c13a 100644 --- a/.github/workflows/generated-linux-binary-conda.yml +++ b/.github/workflows/generated-linux-binary-conda-nightly.yml @@ -54,30 +54,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -94,9 +74,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -161,7 +138,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
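These jobs pass the runner's `GITHUB_*` variables into the build container by dumping them to a file and handing that file to `docker run`; the conda workflow diff that follows replaces the inline version of this (together with the EC2/ECR preamble) with shared composite actions. A minimal sketch of the env-file hand-off, assuming the same `/tmp` path convention used above:

```yaml
      - name: Preserve github env variables for use in docker (sketch)
        run: |
          # Capture every GITHUB_* variable the runner exposes.
          env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
      - name: Use those variables inside the container (sketch)
        run: |
          # --env-file re-injects the captured variables into the container.
          docker run --rm \
            --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
            "${DOCKER_IMAGE}" env | grep '^GITHUB'
```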
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: conda-py3_7-cpu retention-days: 14 @@ -201,30 +178,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -241,10 +198,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cpu @@ -343,30 +297,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -383,12 +317,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cpu @@ -459,30 +390,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function 
get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -499,9 +410,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -566,7 +474,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: conda-py3_7-cuda10_2 retention-days: 14 @@ -607,30 +515,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -647,10 +535,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cuda10_2 @@ -761,30 +646,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -801,12 +666,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cuda10_2 @@ -877,30 +739,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -917,9 +759,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -987,7 +826,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
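The build, test, and upload jobs in this nightly workflow hand the built package off through S3-backed artifacts: the build job uploads under a per-configuration name and the downstream jobs download that same name. A hedged sketch of the pairing, using only the fields visible in the diff (any `with:` options not shown here are omitted rather than guessed):

```yaml
      # In the *-build job:
      - uses: seemethere/upload-artifact-s3@v4
        with:
          name: conda-py3_7-cuda10_2
          retention-days: 14

      # In the matching *-test / *-upload job:
      - uses: seemethere/download-artifact-s3@v3
        name: Download Build Artifacts
        with:
          name: conda-py3_7-cuda10_2
```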
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: conda-py3_7-cuda11_3 retention-days: 14 @@ -1028,30 +867,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1068,10 +887,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cuda11_3 @@ -1182,30 +998,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1222,12 +1018,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cuda11_3 @@ -1298,30 +1091,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail 
- function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1338,9 +1111,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1408,7 +1178,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: conda-py3_7-cuda11_5 retention-days: 14 @@ -1449,30 +1219,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1489,10 +1239,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cuda11_5 @@ -1603,30 +1350,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1643,12 +1370,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: conda-py3_7-cuda11_5 @@ -1704,7 +1428,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cpu-build: + conda-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -1712,36 +1436,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1758,9 +1463,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1784,6 
+1486,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -1825,9 +1530,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_8-cpu + name: conda-py3_7-cuda11_6 retention-days: 14 if-no-files-found: error path: @@ -1850,45 +1555,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cpu-test: # Testing + conda-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cpu-build - runs-on: linux.4xlarge + needs: conda-py3_7-cuda11_6-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1905,13 +1591,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cpu + name: conda-py3_7-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1936,6 +1619,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> 
"${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -1993,44 +1687,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cpu-upload: # Uploading + conda-py3_7-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cpu-test + needs: conda-py3_7-cuda11_6-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2047,15 +1722,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cpu + name: conda-py3_7-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2108,7 +1780,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda10_2-build: + conda-py3_8-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -2116,37 +1788,16 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint 
for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2163,9 +1814,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2230,9 +1878,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_8-cuda10_2 + name: conda-py3_8-cpu retention-days: 14 if-no-files-found: error path: @@ -2255,46 +1903,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda10_2-test: # Testing + conda-py3_8-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda10_2-build - runs-on: linux.4xlarge.nvidia.gpu + needs: conda-py3_8-cpu-build + runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ 
-2311,13 +1938,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda10_2 + name: conda-py3_8-cpu path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2342,17 +1966,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - pushd pytorch - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - popd - name: Pull Docker image run: | retry () { @@ -2410,45 +2023,24 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda10_2-upload: # Uploading + conda-py3_8-cpu-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda10_2-test + needs: conda-py3_8-cpu-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2465,15 +2057,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda10_2 + name: conda-py3_8-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ 
github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2526,7 +2115,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda11_3-build: + conda-py3_8-cuda10_2-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -2534,37 +2123,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2581,9 +2150,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2607,9 +2173,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - name: Set BUILD_SPLIT_CUDA - run: | - echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -2651,9 +2214,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_8-cuda11_3 + name: conda-py3_8-cuda10_2 retention-days: 14 if-no-files-found: error path: @@ -2676,46 +2239,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda11_3-test: # Testing + conda-py3_8-cuda10_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_3-build + needs: conda-py3_8-cuda10_2-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2732,13 +2275,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda11_3 + name: conda-py3_8-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2831,45 +2371,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda11_3-upload: # Uploading + conda-py3_8-cuda10_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_3-test + needs: conda-py3_8-cuda10_2-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from 
instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2886,15 +2406,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda11_3 + name: conda-py3_8-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2947,7 +2464,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda11_5-build: + conda-py3_8-cuda11_3-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -2955,37 +2472,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3002,9 +2499,6 
@@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3072,9 +2566,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_8-cuda11_5 + name: conda-py3_8-cuda11_3 retention-days: 14 if-no-files-found: error path: @@ -3097,46 +2591,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda11_5-test: # Testing + conda-py3_8-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_5-build + needs: conda-py3_8-cuda11_3-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3153,13 +2627,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda11_5 + name: conda-py3_8-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -3252,45 +2723,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_8-cuda11_5-upload: # Uploading + conda-py3_8-cuda11_3-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_5-test + needs: 
conda-py3_8-cuda11_3-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3307,15 +2758,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda11_5 + name: conda-py3_8-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -3368,7 +2816,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cpu-build: + conda-py3_8-cuda11_5-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -3376,36 +2824,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - 
AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3422,9 +2851,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3448,6 +2874,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -3489,9 +2918,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_9-cpu + name: conda-py3_8-cuda11_5 retention-days: 14 if-no-files-found: error path: @@ -3514,45 +2943,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cpu-test: # Testing + conda-py3_8-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cpu-build - runs-on: linux.4xlarge + needs: conda-py3_8-cuda11_5-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3569,13 +2979,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: 
Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cpu + name: conda-py3_8-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -3600,6 +3007,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -3657,44 +3075,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cpu-upload: # Uploading + conda-py3_8-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cpu-test + needs: conda-py3_8-cuda11_5-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3711,15 +3110,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cpu + name: conda-py3_8-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || 
(startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -3772,7 +3168,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda10_2-build: + conda-py3_8-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -3780,37 +3176,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3827,9 +3203,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3853,6 +3226,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -3894,9 +3270,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_9-cuda10_2 + name: conda-py3_8-cuda11_6 retention-days: 14 if-no-files-found: error path: @@ -3919,46 +3295,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda10_2-test: # Testing + conda-py3_8-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda10_2-build + needs: conda-py3_8-cuda11_6-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3975,13 +3331,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cuda10_2 + name: conda-py3_8-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -4074,45 +3427,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda10_2-upload: # Uploading + conda-py3_8-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda10_2-test + needs: conda-py3_8-cuda11_6-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo 
pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4129,15 +3462,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cuda10_2 + name: conda-py3_8-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -4190,7 +3520,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda11_3-build: + conda-py3_9-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -4198,37 +3528,16 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown 
workspace run: | retry () { @@ -4245,9 +3554,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4271,9 +3577,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - name: Set BUILD_SPLIT_CUDA - run: | - echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -4315,9 +3618,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_9-cuda11_3 + name: conda-py3_9-cpu retention-days: 14 if-no-files-found: error path: @@ -4340,7 +3643,695 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda11_3-test: # Testing + conda-py3_9-cpu-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cpu-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cpu-test + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda10_2-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/conda/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+      - uses: seemethere/upload-artifact-s3@v4
+        with:
+          name: conda-py3_9-cuda10_2
+          retention-days: 14
+          if-no-files-found: error
+          path:
+            ${{ runner.temp }}/artifacts/*
+      - name: Hold runner for 2 hours or until ssh sessions have drained
+        working-directory: pytorch/
+        # Always hold for active ssh sessions
+        if: always()
+        run: .github/scripts/wait_for_ssh_to_drain.sh
+      - name: Chown workspace
+        if: always()
+        run: |
+          # Ensure the working directory gets chowned back to the current user
+          docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
+      - name: Kill containers, clean up images
+        if: always()
+        run: |
+          # ignore expansion of "docker ps -q" since it could be empty
+          # shellcheck disable=SC2046
+          docker stop $(docker ps -q) || true
+          # Prune all of the docker images
+          docker system prune -af
+  conda-py3_9-cuda10_2-test: # Testing
+    if: ${{ github.repository_owner == 'pytorch' }}
+    needs: conda-py3_9-cuda10_2-build
+    runs-on: linux.4xlarge.nvidia.gpu
+    timeout-minutes: 240
+    env:
+      PACKAGE_TYPE: conda
+      # TODO: This is a legacy variable that we eventually want to get rid of in
+      # favor of GPU_ARCH_VERSION
+      DESIRED_CUDA: cu102
+      GPU_ARCH_VERSION: 10.2
+      GPU_ARCH_TYPE: cuda
+      DOCKER_IMAGE: pytorch/conda-builder:cuda10.2
+      SKIP_ALL_TESTS: 1
+      DESIRED_PYTHON: "3.9"
+    steps:
+      - name: Checkout PyTorch
+        uses: pytorch/pytorch/.github/actions/checkout-pytorch@master
+      - name: Setup Linux
+        uses: ./.github/actions/setup-linux
+      - name: Chown workspace
+        run: |
+          retry () {
+              "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
+          }
+          retry docker pull "${ALPINE_IMAGE}"
+          # Ensure the working directory gets chowned back to the current user
+          docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda10_2 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda10_2-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda10_2-test + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda10_2 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/conda/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: conda-py3_9-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} needs: conda-py3_9-cuda11_3-build runs-on: linux.4xlarge.nvidia.gpu @@ -4349,37 +4340,721 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_3-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_3-test + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/conda/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: conda-py3_9-cuda11_5 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_5-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_5-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_5-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_5-test + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e 
LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/conda/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: conda-py3_9-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_6-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4396,13 +5071,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cuda11_3 + name: conda-py3_9-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -4495,45 +5167,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda11_3-upload: # Uploading + conda-py3_9-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_3-test + needs: conda-py3_9-cuda11_6-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: 
pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4550,15 +5202,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cuda11_3 + name: conda-py3_9-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -4611,7 +5260,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda11_5-build: + conda-py3_10-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -4619,37 +5268,16 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin 
"$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4666,9 +5294,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4692,9 +5317,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - name: Set BUILD_SPLIT_CUDA - run: | - echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -4736,9 +5358,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_9-cuda11_5 + name: conda-py3_10-cpu retention-days: 14 if-no-files-found: error path: @@ -4761,46 +5383,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda11_5-test: # Testing + conda-py3_10-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_5-build - runs-on: linux.4xlarge.nvidia.gpu + needs: conda-py3_10-cpu-build + runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4817,13 +5418,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - 
name: conda-py3_9-cuda11_5 + name: conda-py3_10-cpu path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -4848,17 +5446,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - pushd pytorch - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - popd - name: Pull Docker image run: | retry () { @@ -4916,45 +5503,24 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_9-cuda11_5-upload: # Uploading + conda-py3_10-cpu-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_5-test + needs: conda-py3_10-cpu-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/conda-builder:cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4971,15 +5537,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_9-cuda11_5 + name: conda-py3_10-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -5032,7 +5595,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cpu-build: + conda-py3_10-cuda10_2-build: if: ${{ 
github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -5040,36 +5603,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5086,9 +5630,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5153,9 +5694,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_10-cpu + name: conda-py3_10-cuda10_2 retention-days: 14 if-no-files-found: error path: @@ -5178,45 +5719,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cpu-test: # Testing + conda-py3_10-cuda10_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cpu-build - runs-on: linux.4xlarge + needs: conda-py3_10-cuda10_2-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5233,13 +5755,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cpu + name: conda-py3_10-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -5264,6 +5783,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -5321,44 +5851,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cpu-upload: # Uploading + conda-py3_10-cuda10_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cpu-test + needs: conda-py3_10-cuda10_2-test env: 
PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/conda-builder:cpu + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5375,15 +5886,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cpu + name: conda-py3_10-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -5436,7 +5944,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda10_2-build: + conda-py3_10-cuda11_3-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -5444,37 +5952,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts 
get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5491,9 +5979,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5517,6 +6002,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -5558,9 +6046,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_10-cuda10_2 + name: conda-py3_10-cuda11_3 retention-days: 14 if-no-files-found: error path: @@ -5583,46 +6071,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda10_2-test: # Testing + conda-py3_10-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda10_2-build + needs: conda-py3_10-cuda11_3-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5639,13 +6107,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > 
"/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cuda10_2 + name: conda-py3_10-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -5738,45 +6203,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda10_2-upload: # Uploading + conda-py3_10-cuda11_3-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda10_2-test + needs: conda-py3_10-cuda11_3-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5793,15 +6238,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cuda10_2 + name: conda-py3_10-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -5854,7 +6296,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda11_3-build: + conda-py3_10-cuda11_5-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -5862,37 +6304,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: 
cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5909,9 +6331,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5979,9 +6398,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_10-cuda11_3 + name: conda-py3_10-cuda11_5 retention-days: 14 if-no-files-found: error path: @@ -6004,46 +6423,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda11_3-test: # Testing + conda-py3_10-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_3-build + needs: conda-py3_10-cuda11_5-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6060,13 +6459,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cuda11_3 + name: conda-py3_10-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -6159,45 +6555,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda11_3-upload: # Uploading + conda-py3_10-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_3-test + needs: conda-py3_10-cuda11_5-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - 
# Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6214,15 +6590,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cuda11_3 + name: conda-py3_10-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -6275,7 +6648,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda11_5-build: + conda-py3_10-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -6283,37 +6656,17 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { 
@@ -6330,9 +6683,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6400,9 +6750,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: conda-py3_10-cuda11_5 + name: conda-py3_10-cuda11_6 retention-days: 14 if-no-files-found: error path: @@ -6425,46 +6775,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda11_5-test: # Testing + conda-py3_10-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_5-build + needs: conda-py3_10-cuda11_6-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6481,13 +6811,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cuda11_5 + name: conda-py3_10-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -6580,45 +6907,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - conda-py3_10-cuda11_5-upload: # Uploading + conda-py3_10-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: 
conda-py3_10-cuda11_5-test + needs: conda-py3_10-cuda11_6-test env: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.5 + DOCKER_IMAGE: pytorch/conda-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6635,15 +6942,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: conda-py3_10-cuda11_5 + name: conda-py3_10-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} diff --git a/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-master.yml b/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-master.yml new file mode 100644 index 00000000000000..3fa24203231b66 --- /dev/null +++ b/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-master.yml @@ -0,0 +1,283 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/linux_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: linux-binary-libtorch-cxx11-abi + +on: + push: + branches: + - master + tags: + - 'ciflow/all/*' + - 'ciflow/trunk/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BINARY_ENV_FILE: /tmp/env + BUILD_ENVIRONMENT: linux-binary-libtorch-cxx11-abi + BUILDER_ROOT: /builder + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_FINAL_PACKAGE_DIR: /artifacts + PYTORCH_RETRY_TEST_CASES: 1 + PYTORCH_ROOT: /pytorch + SHA1: ${{ 
github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: linux-binary-libtorch-cxx11-abi-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + libtorch-cpu-shared-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cpu-shared-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cpu-shared-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cpu-shared-with-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cpu-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/workflows/generated-linux-binary-libtorch-cxx11-abi.yml b/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml similarity index 56% rename from .github/workflows/generated-linux-binary-libtorch-cxx11-abi.yml rename to .github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml index 5505e4a86971e5..46a8370c1c57e1 100644 --- a/.github/workflows/generated-linux-binary-libtorch-cxx11-abi.yml +++ b/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml @@ -55,30 +55,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -95,9 +75,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -162,7 +139,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-shared-with-deps-cxx11-abi retention-days: 14 @@ -203,30 +180,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -243,10 +200,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-cxx11-abi @@ -346,30 +300,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -386,12 +320,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-cxx11-abi @@ -462,30 +393,10 @@ jobs: LIBTORCH_VARIANT: 
shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -502,9 +413,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -569,7 +477,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-shared-without-deps-cxx11-abi retention-days: 14 @@ -610,30 +518,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -650,10 +538,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-cxx11-abi @@ -753,30 +638,10 @@ jobs: LIBTORCH_VARIANT: 
shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -793,12 +658,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-cxx11-abi @@ -869,30 +731,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -909,9 +751,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -976,7 +815,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-static-with-deps-cxx11-abi retention-days: 14 @@ -1017,30 +856,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1057,10 +876,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-cxx11-abi @@ -1160,30 +976,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1200,12 +996,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-cxx11-abi @@ -1276,30 +1069,10 @@ jobs: LIBTORCH_VARIANT: 
static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1316,9 +1089,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1383,7 +1153,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-static-without-deps-cxx11-abi retention-days: 14 @@ -1424,30 +1194,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1464,10 +1214,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-cxx11-abi @@ -1567,30 +1314,10 @@ jobs: 
LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1607,12 +1334,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-cxx11-abi @@ -1684,30 +1408,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1724,9 +1428,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1791,7 +1492,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-shared-with-deps-cxx11-abi retention-days: 14 @@ -1833,30 +1534,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1873,10 +1554,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-with-deps-cxx11-abi @@ -1988,30 +1666,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2028,12 +1686,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-with-deps-cxx11-abi @@ -2105,30 +1760,10 @@ 
jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2145,9 +1780,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2212,7 +1844,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-shared-without-deps-cxx11-abi retention-days: 14 @@ -2254,30 +1886,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2294,10 +1906,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-without-deps-cxx11-abi @@ 
-2409,30 +2018,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2449,12 +2038,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-without-deps-cxx11-abi @@ -2526,30 +2112,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2566,9 +2132,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2633,7 +2196,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-static-with-deps-cxx11-abi retention-days: 14 @@ -2675,30 +2238,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2715,10 +2258,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-with-deps-cxx11-abi @@ -2830,30 +2370,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2870,12 +2390,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-with-deps-cxx11-abi @@ -2947,30 +2464,10 @@ 
jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2987,9 +2484,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3054,7 +2548,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-static-without-deps-cxx11-abi retention-days: 14 @@ -3096,30 +2590,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3136,10 +2610,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-without-deps-cxx11-abi @@ 
-3251,30 +2722,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3291,12 +2742,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-without-deps-cxx11-abi @@ -3368,30 +2816,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3408,9 +2836,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3478,7 +2903,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-shared-with-deps-cxx11-abi retention-days: 14 @@ -3520,30 +2945,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3560,10 +2965,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-cxx11-abi @@ -3675,30 +3077,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3715,12 +3097,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-cxx11-abi @@ -3792,30 +3171,10 @@ 
jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3832,9 +3191,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3902,7 +3258,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-shared-without-deps-cxx11-abi retention-days: 14 @@ -3944,30 +3300,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3984,10 +3320,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-cxx11-abi @@ 
-4099,30 +3432,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4139,12 +3452,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-cxx11-abi @@ -4216,30 +3526,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4256,9 +3546,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4326,7 +3613,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-static-with-deps-cxx11-abi retention-days: 14 @@ -4368,30 +3655,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4408,10 +3675,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-cxx11-abi @@ -4523,30 +3787,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4563,12 +3807,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-cxx11-abi @@ -4640,30 +3881,10 @@ 
jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4680,9 +3901,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4750,7 +3968,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-static-without-deps-cxx11-abi retention-days: 14 @@ -4792,30 +4010,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4832,10 +4030,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-cxx11-abi @@ 
-4947,30 +4142,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4987,12 +4162,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-cxx11-abi @@ -5064,31 +4236,11 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") @@ -5104,9 +4256,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5174,7 +4323,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-shared-with-deps-cxx11-abi retention-days: 14 @@ -5216,30 +4365,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5256,10 +4385,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-cxx11-abi @@ -5371,30 +4497,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5411,12 +4517,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-cxx11-abi @@ -5488,30 +4591,10 @@ 
jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5528,9 +4611,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5598,7 +4678,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-shared-without-deps-cxx11-abi retention-days: 14 @@ -5640,30 +4720,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5680,10 +4740,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-cxx11-abi @@ 
-5795,30 +4852,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5835,12 +4872,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-cxx11-abi @@ -5912,30 +4946,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5952,9 +4966,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6022,7 +5033,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-static-with-deps-cxx11-abi retention-days: 14 @@ -6064,30 +5075,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6104,10 +5095,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-cxx11-abi @@ -6219,30 +5207,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6259,12 +5227,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-cxx11-abi @@ -6336,30 +5301,10 @@ 
jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6376,9 +5321,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6446,7 +5388,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-static-without-deps-cxx11-abi retention-days: 14 @@ -6488,30 +5430,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6528,10 +5450,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-without-deps-cxx11-abi @@ 
-6643,30 +5562,104 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_5-static-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | 
+ # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6683,15 +5676,4066 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Clone pytorch/pytorch - uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download Build Artifacts + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: - name: libtorch-cuda11_5-static-without-deps-cxx11-abi + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v 
"${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-shared-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-cxx11-abi-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-with-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to 
the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run 
--rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-shared-without-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-cxx11-abi-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + 
docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-static-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-cxx11-abi-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to 
the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run 
--rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-static-without-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-cxx11-abi-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R 
"$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-shared-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-with-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-with-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-with-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" 
chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-shared-without-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-without-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-without-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-without-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-without-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory 
gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown 
-R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-static-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-with-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-with-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-with-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" 
chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-static-without-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-without-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-without-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-without-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-without-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory 
gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R 
"$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-shared-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-with-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-with-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-with-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to 
the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" 
. + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-shared-without-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-without-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-without-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-without-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-without-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id 
-u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-static-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-with-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-with-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-with-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to 
the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" 
. + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-static-without-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-without-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-without-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-without-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-without-deps-cxx11-abi-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-without-deps-cxx11-abi-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-without-deps-cxx11-abi path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} diff --git a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml new file mode 100644 index 00000000000000..922dbc27b7f250 --- /dev/null +++ b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml @@ -0,0 +1,283 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/linux_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: linux-binary-libtorch-pre-cxx11 + +on: + push: + branches: + - master + tags: + - 'ciflow/all/*' + - 'ciflow/trunk/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BINARY_ENV_FILE: /tmp/env + BUILD_ENVIRONMENT: linux-binary-libtorch-pre-cxx11 + BUILDER_ROOT: /builder + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_FINAL_PACKAGE_DIR: /artifacts + PYTORCH_RETRY_TEST_CASES: 1 + PYTORCH_ROOT: /pytorch + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: linux-binary-libtorch-pre-cxx11-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + 
libtorch-cpu-shared-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cpu-shared-with-deps-cxx11-abi + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cpu-shared-with-deps-cxx11-abi-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cpu-shared-with-deps-cxx11-abi-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: cxx11-abi + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cpu-shared-with-deps-cxx11-abi + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11.yml b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml similarity index 56% rename from .github/workflows/generated-linux-binary-libtorch-pre-cxx11.yml rename to .github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml index 0354e9061c546b..b34a3f3b322862 100644 --- a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11.yml +++ b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml @@ -55,30 +55,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -95,9 +75,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -162,7 +139,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-shared-with-deps-pre-cxx11 retention-days: 14 @@ -203,30 +180,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -243,10 +200,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-pre-cxx11 @@ -346,30 +300,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -386,12 +320,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-pre-cxx11 @@ -462,30 +393,10 @@ jobs: LIBTORCH_VARIANT: 
shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -502,9 +413,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -569,7 +477,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-shared-without-deps-pre-cxx11 retention-days: 14 @@ -610,30 +518,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -650,10 +538,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-pre-cxx11 @@ -753,30 +638,10 @@ jobs: LIBTORCH_VARIANT: 
shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -793,12 +658,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-pre-cxx11 @@ -869,30 +731,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -909,9 +751,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -976,7 +815,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-static-with-deps-pre-cxx11 retention-days: 14 @@ -1017,30 +856,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1057,10 +876,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-pre-cxx11 @@ -1160,30 +976,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1200,12 +996,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-pre-cxx11 @@ -1276,30 +1069,10 @@ jobs: LIBTORCH_VARIANT: 
static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1316,9 +1089,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1383,7 +1153,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cpu-static-without-deps-pre-cxx11 retention-days: 14 @@ -1424,30 +1194,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1464,10 +1214,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-pre-cxx11 @@ -1567,30 +1314,10 @@ jobs: 
LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1607,12 +1334,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-pre-cxx11 @@ -1684,30 +1408,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1724,9 +1428,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1791,7 +1492,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-shared-with-deps-pre-cxx11 retention-days: 14 @@ -1833,30 +1534,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1873,10 +1554,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-with-deps-pre-cxx11 @@ -1988,30 +1666,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2028,12 +1686,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-with-deps-pre-cxx11 @@ -2105,30 +1760,10 @@ 
jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2145,9 +1780,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2212,7 +1844,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-shared-without-deps-pre-cxx11 retention-days: 14 @@ -2254,30 +1886,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2294,10 +1906,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-without-deps-pre-cxx11 @@ 
-2409,30 +2018,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2449,12 +2038,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-shared-without-deps-pre-cxx11 @@ -2526,30 +2112,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2566,9 +2132,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2633,7 +2196,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-static-with-deps-pre-cxx11 retention-days: 14 @@ -2675,30 +2238,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2715,10 +2258,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-with-deps-pre-cxx11 @@ -2830,30 +2370,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2870,12 +2390,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-with-deps-pre-cxx11 @@ -2947,30 +2464,10 @@ 
jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2987,9 +2484,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3054,7 +2548,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda10_2-static-without-deps-pre-cxx11 retention-days: 14 @@ -3096,30 +2590,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3136,10 +2610,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-without-deps-pre-cxx11 @@ 
-3251,30 +2722,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3291,12 +2742,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda10_2-static-without-deps-pre-cxx11 @@ -3368,30 +2816,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3408,9 +2836,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3478,7 +2903,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-shared-with-deps-pre-cxx11 retention-days: 14 @@ -3520,30 +2945,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3560,10 +2965,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-pre-cxx11 @@ -3675,30 +3077,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3715,12 +3097,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-pre-cxx11 @@ -3792,30 +3171,10 @@ 
jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3832,9 +3191,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3902,7 +3258,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-shared-without-deps-pre-cxx11 retention-days: 14 @@ -3944,30 +3300,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3984,10 +3320,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-pre-cxx11 @@ 
-4099,30 +3432,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4139,12 +3452,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-pre-cxx11 @@ -4216,30 +3526,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4256,9 +3546,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4326,7 +3613,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-static-with-deps-pre-cxx11 retention-days: 14 @@ -4368,30 +3655,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4408,10 +3675,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-pre-cxx11 @@ -4523,30 +3787,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4563,12 +3807,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-pre-cxx11 @@ -4640,30 +3881,10 @@ 
jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4680,9 +3901,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4750,7 +3968,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_3-static-without-deps-pre-cxx11 retention-days: 14 @@ -4792,30 +4010,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4832,10 +4030,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-pre-cxx11 @@ 
-4947,30 +4142,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4987,12 +4162,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-pre-cxx11 @@ -5064,31 +4236,11 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") @@ -5104,9 +4256,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5174,7 +4323,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-shared-with-deps-pre-cxx11 retention-days: 14 @@ -5216,30 +4365,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5256,10 +4385,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-pre-cxx11 @@ -5371,30 +4497,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5411,12 +4517,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-pre-cxx11 @@ -5488,30 +4591,10 @@ 
jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5528,9 +4611,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5598,7 +4678,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-shared-without-deps-pre-cxx11 retention-days: 14 @@ -5640,30 +4720,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5680,10 +4740,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-pre-cxx11 @@ 
-5795,30 +4852,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5835,12 +4872,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-pre-cxx11 @@ -5912,30 +4946,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5952,9 +4966,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6022,7 +5033,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-static-with-deps-pre-cxx11 retention-days: 14 @@ -6064,30 +5075,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6104,10 +5095,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-pre-cxx11 @@ -6219,30 +5207,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6259,12 +5227,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-pre-cxx11 @@ -6336,30 +5301,10 @@ 
jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6376,9 +5321,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6446,7 +5388,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: libtorch-cuda11_5-static-without-deps-pre-cxx11 retention-days: 14 @@ -6488,30 +5430,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6528,10 +5450,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-without-deps-pre-cxx11 @@ 
-6643,30 +5562,104 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_5-static-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | 
+ # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6683,15 +5676,4066 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Clone pytorch/pytorch - uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download Build Artifacts + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: - name: libtorch-cuda11_5-static-without-deps-pre-cxx11 + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v 
"${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-shared-with-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-with-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-pre-cxx11-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-with-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the 
current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v 
"${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-shared-without-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-pre-cxx11-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned 
back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm 
-v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-static-with-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-pre-cxx11-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the 
current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v 
"${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-cuda11_6-static-without-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-pre-cxx11-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned 
back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id 
-g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-shared-with-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-with-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-with-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-with-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-with-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned 
back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id 
-u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-shared-without-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-without-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-without-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-shared-without-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-shared-without-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-shared-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id 
-u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-static-with-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-with-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-with-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-with-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-with-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned 
back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id 
-u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm4_5_2-static-without-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-without-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-without-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm4_5_2-static-without-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm4_5_2-static-without-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm4_5_2-static-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets 
chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id 
-g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-shared-with-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-with-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-with-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-with-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-with-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the 
current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-shared-without-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-without-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-without-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-shared-without-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-shared-without-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: shared-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-shared-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned 
back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-static-with-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-with-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-with-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-with-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-with-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-with-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-with-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the 
current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/libtorch/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: libtorch-rocm5_0-static-without-deps-pre-cxx11 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-without-deps-pre-cxx11-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-without-deps-pre-cxx11-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-without-deps-pre-cxx11 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-rocm5_0-static-without-deps-pre-cxx11-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-rocm5_0-static-without-deps-pre-cxx11-test + env: + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + LIBTORCH_VARIANT: static-without-deps + DESIRED_DEVTOOLSET: pre-cxx11 + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-rocm5_0-static-without-deps-pre-cxx11 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} diff --git a/.github/workflows/generated-linux-binary-manywheel-master.yml b/.github/workflows/generated-linux-binary-manywheel-master.yml new file mode 100644 index 00000000000000..d384b3e79bd0d1 --- /dev/null +++ b/.github/workflows/generated-linux-binary-manywheel-master.yml @@ -0,0 +1,294 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/linux_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: linux-binary-manywheel + +on: + push: + branches: + - master + tags: + - 'ciflow/all/*' + - 'ciflow/trunk/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BINARY_ENV_FILE: /tmp/env + BUILD_ENVIRONMENT: linux-binary-manywheel + BUILDER_ROOT: /builder + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_FINAL_PACKAGE_DIR: /artifacts + PYTORCH_RETRY_TEST_CASES: 1 + PYTORCH_ROOT: /pytorch + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: linux-binary-manywheel-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + manywheel-py3_7-cuda10_2-build: + if: ${{ github.repository_owner == 
'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/manywheel/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: manywheel-py3_7-cuda10_2 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_7-cuda10_2-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_7-cuda10_2-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_7-cuda10_2 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/workflows/generated-linux-binary-manywheel.yml b/.github/workflows/generated-linux-binary-manywheel-nightly.yml similarity index 78% rename from .github/workflows/generated-linux-binary-manywheel.yml rename to .github/workflows/generated-linux-binary-manywheel-nightly.yml index c35b6389328010..c8a7c1d73efff7 100644 --- a/.github/workflows/generated-linux-binary-manywheel.yml +++ b/.github/workflows/generated-linux-binary-manywheel-nightly.yml @@ -54,30 +54,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -94,9 +74,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -161,7 +138,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: manywheel-py3_7-cpu retention-days: 14 @@ -201,30 +178,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -241,10 +198,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cpu @@ -343,30 +297,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -383,12 +317,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cpu @@ -459,30 +390,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - 
function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -499,9 +410,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -566,7 +474,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: manywheel-py3_7-cuda10_2 retention-days: 14 @@ -607,30 +515,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -647,10 +535,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cuda10_2 @@ -761,30 +646,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -801,12 +666,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cuda10_2 @@ -877,30 +739,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -917,9 +759,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -987,7 +826,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: manywheel-py3_7-cuda11_3 retention-days: 14 @@ -1028,30 +867,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1068,10 +887,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cuda11_3 @@ -1182,30 +998,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1222,12 +1018,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cuda11_3 @@ -1298,30 +1091,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set 
-euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1338,9 +1111,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1408,7 +1178,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: manywheel-py3_7-cuda11_5 retention-days: 14 @@ -1449,30 +1219,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1489,10 +1239,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cuda11_5 @@ -1603,30 +1350,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata 
endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1643,12 +1370,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_7-cuda11_5 @@ -1704,7 +1428,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm4_5_2-build: + manywheel-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -1712,37 +1436,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1759,9 +1463,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: 
zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -1785,6 +1486,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -1826,9 +1530,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_7-rocm4_5_2 + name: manywheel-py3_7-cuda11_6 retention-days: 14 if-no-files-found: error path: @@ -1851,46 +1555,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm4_5_2-test: # Testing + manywheel-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm4_5_2-build - runs-on: linux.4xlarge + needs: manywheel-py3_7-cuda11_6-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1907,13 +1591,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_7-rocm4_5_2 + name: manywheel-py3_7-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1938,6 +1619,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + 
timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -1995,45 +1687,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm4_5_2-upload: # Uploading + manywheel-py3_7-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm4_5_2-test + needs: manywheel-py3_7-cuda11_6-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2050,15 +1722,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_7-rocm4_5_2 + name: manywheel-py3_7-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2111,7 +1780,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm5_0-build: + manywheel-py3_7-rocm4_5_2-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -2119,37 +1788,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: 
pytorch/manylinux-builder:rocm5.0 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2166,9 +1815,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2233,9 +1879,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_7-rocm5_0 + name: manywheel-py3_7-rocm4_5_2 retention-days: 14 if-no-files-found: error path: @@ -2258,46 +1904,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm5_0-test: # Testing + manywheel-py3_7-rocm4_5_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm5_0-build + needs: manywheel-py3_7-rocm4_5_2-build runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" 
| docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2314,13 +1940,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_7-rocm5_0 + name: manywheel-py3_7-rocm4_5_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2402,45 +2025,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm5_0-upload: # Uploading + manywheel-py3_7-rocm4_5_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm5_0-test + needs: manywheel-py3_7-rocm4_5_2-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2457,15 +2060,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_7-rocm5_0 + name: manywheel-py3_7-rocm4_5_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 
'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2518,7 +2118,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cpu-build: + manywheel-py3_7-rocm5_0-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -2526,36 +2126,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2572,9 +2153,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -2639,9 +2217,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_8-cpu + name: manywheel-py3_7-rocm5_0 retention-days: 14 if-no-files-found: error path: @@ -2664,45 +2242,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cpu-test: # Testing + manywheel-py3_7-rocm5_0-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cpu-build + needs: manywheel-py3_7-rocm5_0-build runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2719,13 +2278,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cpu + name: manywheel-py3_7-rocm5_0 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2807,44 +2363,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cpu-upload: # Uploading + manywheel-py3_7-rocm5_0-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cpu-test + needs: manywheel-py3_7-rocm5_0-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set 
-euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2861,15 +2398,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cpu + name: manywheel-py3_7-rocm5_0 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2922,7 +2456,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda10_2-build: + manywheel-py3_8-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -2930,37 +2464,16 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: 
./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2977,9 +2490,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3044,9 +2554,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_8-cuda10_2 + name: manywheel-py3_8-cpu retention-days: 14 if-no-files-found: error path: @@ -3069,46 +2579,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda10_2-test: # Testing + manywheel-py3_8-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda10_2-build - runs-on: linux.4xlarge.nvidia.gpu + needs: manywheel-py3_8-cpu-build + runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3125,13 +2614,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cuda10_2 + name: manywheel-py3_8-cpu path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -3156,17 +2642,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker 
runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - pushd pytorch - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - popd - name: Pull Docker image run: | retry () { @@ -3224,45 +2699,24 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda10_2-upload: # Uploading + manywheel-py3_8-cpu-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda10_2-test + needs: manywheel-py3_8-cpu-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3279,15 +2733,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cuda10_2 + name: manywheel-py3_8-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -3340,7 +2791,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda11_3-build: + manywheel-py3_8-cuda10_2-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -3348,37 +2799,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + 
DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3395,9 +2826,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3421,9 +2849,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - name: Set BUILD_SPLIT_CUDA - run: | - echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -3465,9 +2890,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_8-cuda11_3 + name: manywheel-py3_8-cuda10_2 retention-days: 14 if-no-files-found: error path: @@ -3490,46 +2915,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda11_3-test: # Testing + manywheel-py3_8-cuda10_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_3-build + needs: manywheel-py3_8-cuda10_2-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3546,13 +2951,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cuda11_3 + name: manywheel-py3_8-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -3645,45 +3047,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda11_3-upload: # Uploading + manywheel-py3_8-cuda10_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_3-test + needs: manywheel-py3_8-cuda10_2-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - 
run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3700,15 +3082,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cuda11_3 + name: manywheel-py3_8-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -3761,7 +3140,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda11_5-build: + manywheel-py3_8-cuda11_3-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -3769,37 +3148,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup 
Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3816,9 +3175,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -3886,9 +3242,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_8-cuda11_5 + name: manywheel-py3_8-cuda11_3 retention-days: 14 if-no-files-found: error path: @@ -3911,46 +3267,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda11_5-test: # Testing + manywheel-py3_8-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_5-build + needs: manywheel-py3_8-cuda11_3-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3967,13 +3303,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cuda11_5 + name: manywheel-py3_8-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -4066,45 +3399,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-cuda11_5-upload: # Uploading + manywheel-py3_8-cuda11_3-upload: # Uploading runs-on: 
linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_5-test + needs: manywheel-py3_8-cuda11_3-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4121,15 +3434,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-cuda11_5 + name: manywheel-py3_8-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -4182,7 +3492,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm4_5_2-build: + manywheel-py3_8-cuda11_5-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -4190,37 +3500,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo 
"ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4237,9 +3527,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4263,6 +3550,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -4304,9 +3594,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_8-rocm4_5_2 + name: manywheel-py3_8-cuda11_5 retention-days: 14 if-no-files-found: error path: @@ -4329,46 +3619,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm4_5_2-test: # Testing + manywheel-py3_8-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm4_5_2-build - runs-on: linux.4xlarge + needs: manywheel-py3_8-cuda11_5-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: 
pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4385,13 +3655,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-rocm4_5_2 + name: manywheel-py3_8-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -4416,6 +3683,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -4473,45 +3751,1388 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm4_5_2-upload: # Uploading + manywheel-py3_8-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm4_5_2-test + needs: manywheel-py3_8-cuda11_5-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 - SKIP_ALL_TESTS: 1 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/manywheel/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: manywheel-py3_8-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_8-cuda11_6-build + runs-on: linux.4xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" 
"${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-cuda11_6-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_8-cuda11_6-test + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-rocm4_5_2-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/manywheel/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: manywheel-py3_8-rocm4_5_2 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-rocm4_5_2-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_8-rocm4_5_2-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-rocm4_5_2 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - 
name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-rocm4_5_2-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_8-rocm4_5_2-test + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-rocm4_5_2 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c 
'.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-rocm5_0-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash 
/builder/manywheel/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - uses: seemethere/upload-artifact-s3@v4 + with: + name: manywheel-py3_8-rocm5_0 + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-rocm5_0-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_8-rocm5_0-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-rocm5_0 + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_8-rocm5_0-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_8-rocm5_0-test + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_8-rocm5_0 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_9-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Build PyTorch binary + run: | + set -x + mkdir -p artifacts/ + container_name=$(docker run \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/manywheel/build.sh" + - name: Chown artifacts + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - uses: seemethere/upload-artifact-s3@v4 + with: + name: manywheel-py3_9-cpu + retention-days: 14 + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_9-cpu-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_9-cpu-build + runs-on: linux.4xlarge + timeout-minutes: 240 + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: manywheel-py3_9-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Pull Docker image + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${DOCKER_IMAGE}" + - name: Test PyTorch binary + run: | + set -x + # shellcheck disable=SC2086,SC2090 + container_name=$(docker run \ + ${GPU_FLAG:-} \ + -e BINARY_ENV_FILE \ + -e BUILDER_ROOT \ + -e BUILD_ENVIRONMENT \ + -e BUILD_SPLIT_CUDA \ + -e DESIRED_CUDA \ + -e DESIRED_DEVTOOLSET \ + -e DESIRED_PYTHON \ + -e GPU_ARCH_TYPE \ + -e GPU_ARCH_VERSION \ + -e IS_GHA \ + -e LIBTORCH_VARIANT \ + -e PACKAGE_TYPE \ + -e PYTORCH_FINAL_PACKAGE_DIR \ + -e PYTORCH_ROOT \ + -e SKIP_ALL_TESTS \ + --tty \ + --detach \ + -v 
"${GITHUB_WORKSPACE}/pytorch:/pytorch" \ + -v "${GITHUB_WORKSPACE}/builder:/builder" \ + -v "${RUNNER_TEMP}/artifacts:/final_pkgs" \ + -w / \ + "${DOCKER_IMAGE}" + ) + docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh" + # Generate test script + docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh" + docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh" + - name: Hold runner for 2 hours or until ssh sessions have drained + working-directory: pytorch/ + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + manywheel-py3_9-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: manywheel-py3_9-cpu-test + env: + PACKAGE_TYPE: manywheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4528,15 +5149,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-rocm4_5_2 + name: manywheel-py3_9-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -4589,7 +5207,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm5_0-build: + manywheel-py3_9-cuda10_2-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -4597,37 +5215,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + 
DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4644,9 +5242,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -4711,9 +5306,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_8-rocm5_0 + name: manywheel-py3_9-cuda10_2 retention-days: 14 if-no-files-found: error path: @@ -4736,46 +5331,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm5_0-test: # Testing + manywheel-py3_9-cuda10_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm5_0-build - runs-on: linux.4xlarge + needs: manywheel-py3_9-cuda10_2-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4792,13 +5367,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-rocm5_0 + name: manywheel-py3_9-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -4823,6 +5395,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -4880,45 +5463,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm5_0-upload: # Uploading + manywheel-py3_9-cuda10_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: 
${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm5_0-test + needs: manywheel-py3_9-cuda10_2-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -4935,15 +5498,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-rocm5_0 + name: manywheel-py3_9-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -4996,7 +5556,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cpu-build: + manywheel-py3_9-cuda11_3-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -5004,36 +5564,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: 
$(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5050,9 +5591,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5076,6 +5614,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -5117,9 +5658,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_9-cpu + name: manywheel-py3_9-cuda11_3 retention-days: 14 if-no-files-found: error path: @@ -5142,45 +5683,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cpu-test: # Testing + manywheel-py3_9-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cpu-build - runs-on: linux.4xlarge + needs: manywheel-py3_9-cuda11_3-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown 
workspace run: | retry () { @@ -5197,13 +5719,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cpu + name: manywheel-py3_9-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -5228,6 +5747,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -5285,44 +5815,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cpu-upload: # Uploading + manywheel-py3_9-cuda11_3-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cpu-test + needs: manywheel-py3_9-cuda11_3-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5339,15 +5850,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cpu + name: manywheel-py3_9-cuda11_3 path: "${{ 
runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -5400,7 +5908,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda10_2-build: + manywheel-py3_9-cuda11_5-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -5408,37 +5916,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5455,9 +5943,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5481,6 +5966,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -5522,9 +6010,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_9-cuda10_2 + name: manywheel-py3_9-cuda11_5 retention-days: 14 if-no-files-found: error path: @@ -5547,46 +6035,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda10_2-test: # Testing + manywheel-py3_9-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda10_2-build + needs: manywheel-py3_9-cuda11_5-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5603,13 +6071,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cuda10_2 + name: manywheel-py3_9-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -5702,45 +6167,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda10_2-upload: # Uploading + manywheel-py3_9-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda10_2-test + needs: manywheel-py3_9-cuda11_5-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display 
EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5757,15 +6202,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cuda10_2 + name: manywheel-py3_9-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -5818,7 +6260,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda11_3-build: + manywheel-py3_9-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -5826,37 +6268,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: 
pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -5873,9 +6295,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -5943,9 +6362,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_9-cuda11_3 + name: manywheel-py3_9-cuda11_6 retention-days: 14 if-no-files-found: error path: @@ -5968,46 +6387,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda11_3-test: # Testing + manywheel-py3_9-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_3-build + needs: manywheel-py3_9-cuda11_6-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6024,13 +6423,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cuda11_3 + name: manywheel-py3_9-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -6123,45 +6519,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - 
manywheel-py3_9-cuda11_3-upload: # Uploading + manywheel-py3_9-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_3-test + needs: manywheel-py3_9-cuda11_6-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6178,15 +6554,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cuda11_3 + name: manywheel-py3_9-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -6239,7 +6612,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda11_5-build: + manywheel-py3_9-rocm4_5_2-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -6247,37 +6620,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6294,9 +6647,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6320,9 +6670,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - name: Set BUILD_SPLIT_CUDA - run: | - echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -6364,9 +6711,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_9-cuda11_5 + name: manywheel-py3_9-rocm4_5_2 retention-days: 14 if-no-files-found: error path: @@ -6389,46 +6736,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda11_5-test: # Testing + manywheel-py3_9-rocm4_5_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_5-build - runs-on: linux.4xlarge.nvidia.gpu + needs: manywheel-py3_9-rocm4_5_2-build + runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region 
"$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6445,13 +6772,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cuda11_5 + name: manywheel-py3_9-rocm4_5_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -6476,17 +6800,6 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - pushd pytorch - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - popd - name: Pull Docker image run: | retry () { @@ -6544,45 +6857,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-cuda11_5-upload: # Uploading + manywheel-py3_9-rocm4_5_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_5-test + needs: manywheel-py3_9-rocm4_5_2-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DESIRED_CUDA: rocm4.5.2 + GPU_ARCH_VERSION: 4.5.2 + GPU_ARCH_TYPE: rocm + DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6599,15 +6892,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | 
grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-cuda11_5 + name: manywheel-py3_9-rocm4_5_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -6660,7 +6950,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm4_5_2-build: + manywheel-py3_9-rocm5_0-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -6668,37 +6958,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6715,9 +6985,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -6782,9 +7049,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
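A pattern worth calling out: nearly every Chown / docker step in these jobs re-defines the same tiny `retry` helper inline. A standalone sketch of that helper follows (the function body is copied from the steps above; the `docker pull` call and the default image tag are illustrative only):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Three-attempt retry with a short fixed back-off, exactly as the workflow's
# run: steps define it inline; "$@" re-executes the command that was passed in.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}

# Illustrative usage only; the workflow wraps commands such as docker pull,
# docker login and chown in the same helper. The image tag here is just an
# example default, not taken from any particular job.
retry docker pull "${DOCKER_IMAGE:-pytorch/manylinux-builder:cpu}"
```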
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_9-rocm4_5_2 + name: manywheel-py3_9-rocm5_0 retention-days: 14 if-no-files-found: error path: @@ -6807,46 +7074,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm4_5_2-test: # Testing + manywheel-py3_9-rocm5_0-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm4_5_2-build + needs: manywheel-py3_9-rocm5_0-build runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -6863,13 +7110,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-rocm4_5_2 + name: manywheel-py3_9-rocm5_0 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -6951,45 +7195,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm4_5_2-upload: # Uploading + manywheel-py3_9-rocm5_0-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm4_5_2-test + needs: manywheel-py3_9-rocm5_0-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm4.5.2 - GPU_ARCH_VERSION: 4.5.2 + DESIRED_CUDA: rocm5.0 + GPU_ARCH_VERSION: 5.0 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm4.5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - 
run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7006,15 +7230,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-rocm4_5_2 + name: manywheel-py3_9-rocm5_0 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -7067,7 +7288,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm5_0-build: + manywheel-py3_10-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -7075,37 +7296,16 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - 
name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7122,9 +7322,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -7189,9 +7386,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_9-rocm5_0 + name: manywheel-py3_10-cpu retention-days: 14 if-no-files-found: error path: @@ -7214,46 +7411,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm5_0-test: # Testing + manywheel-py3_10-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm5_0-build + needs: manywheel-py3_10-cpu-build runs-on: linux.4xlarge timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7270,13 +7446,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-rocm5_0 + name: manywheel-py3_10-cpu path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -7358,45 +7531,24 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm5_0-upload: # Uploading + manywheel-py3_10-cpu-upload: # Uploading runs-on: 
linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm5_0-test + needs: manywheel-py3_10-cpu-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.0 - GPU_ARCH_VERSION: 5.0 - GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.0 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7413,15 +7565,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-rocm5_0 + name: manywheel-py3_10-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -7474,7 +7623,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cpu-build: + manywheel-py3_10-cuda10_2-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -7482,36 +7631,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata 
ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7528,9 +7658,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -7595,9 +7722,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_10-cpu + name: manywheel-py3_10-cuda10_2 retention-days: 14 if-no-files-found: error path: @@ -7620,45 +7747,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cpu-test: # Testing + manywheel-py3_10-cuda10_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cpu-build - runs-on: linux.4xlarge + needs: manywheel-py3_10-cuda10_2-build + runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7675,13 +7783,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - 
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cpu + name: manywheel-py3_10-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -7706,6 +7811,17 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG + with: + timeout_minutes: 10 + max_attempts: 3 + command: | + set -ex + pushd pytorch + bash .github/scripts/install_nvidia_utils_linux.sh + echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" + popd - name: Pull Docker image run: | retry () { @@ -7763,44 +7879,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cpu-upload: # Uploading + manywheel-py3_10-cuda10_2-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cpu-test + needs: manywheel-py3_10-cuda10_2-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu + DESIRED_CUDA: cu102 + GPU_ARCH_VERSION: 10.2 + GPU_ARCH_TYPE: cuda + DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7817,15 +7914,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cpu + name: manywheel-py3_10-cuda10_2 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && 
!startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -7878,7 +7972,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda10_2-build: + manywheel-py3_10-cuda11_3-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -7886,37 +7980,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -7933,9 +8007,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -7959,6 +8030,9 @@ jobs: # Remove any artifacts from the previous checkouts git clean -fxd working-directory: builder + - name: Set BUILD_SPLIT_CUDA + run: | + echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image run: | retry () { @@ -8000,9 +8074,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
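The steps this diff adds, such as `Set BUILD_SPLIT_CUDA` and the `GPU_FLAG` export inside the nvidia-driver install step, hand values to later steps by appending `KEY=VALUE` lines to the file named by `$GITHUB_ENV`. A minimal sketch of that mechanism, using a hypothetical `EXAMPLE_FLAG` variable:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Intended to run inside a GitHub Actions run: step, where GITHUB_ENV points
# at a file; KEY=VALUE lines appended to it become environment variables for
# the remaining steps of the same job. EXAMPLE_FLAG is a hypothetical name
# used only for illustration.
echo "EXAMPLE_FLAG=--gpus all" >> "${GITHUB_ENV}"

# The added workflow steps use the same mechanism, e.g.
#   echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV"
#   echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
```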
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_10-cuda10_2 + name: manywheel-py3_10-cuda11_3 retention-days: 14 if-no-files-found: error path: @@ -8025,46 +8099,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda10_2-test: # Testing + manywheel-py3_10-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda10_2-build + needs: manywheel-py3_10-cuda11_3-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8081,13 +8135,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cuda10_2 + name: manywheel-py3_10-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -8180,45 +8231,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda10_2-upload: # Uploading + manywheel-py3_10-cuda11_3-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda10_2-test + needs: manywheel-py3_10-cuda11_3-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - 
shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8235,15 +8266,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cuda10_2 + name: manywheel-py3_10-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -8296,7 +8324,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda11_3-build: + manywheel-py3_10-cuda11_5-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -8304,37 +8332,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master 
+ - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8351,9 +8359,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -8421,9 +8426,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_10-cuda11_3 + name: manywheel-py3_10-cuda11_5 retention-days: 14 if-no-files-found: error path: @@ -8446,46 +8451,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda11_3-test: # Testing + manywheel-py3_10-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_3-build + needs: manywheel-py3_10-cuda11_5-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8502,13 +8487,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cuda11_3 + name: manywheel-py3_10-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -8601,45 +8583,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda11_3-upload: # Uploading + 
manywheel-py3_10-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_3-test + needs: manywheel-py3_10-cuda11_5-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8656,15 +8618,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cuda11_3 + name: manywheel-py3_10-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -8717,7 +8676,7 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda11_5-build: + manywheel-py3_10-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: linux.4xlarge timeout-minutes: 240 @@ -8725,37 +8684,17 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL 
"http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8772,9 +8711,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -8842,9 +8778,9 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: - name: manywheel-py3_10-cuda11_5 + name: manywheel-py3_10-cuda11_6 retention-days: 14 if-no-files-found: error path: @@ -8867,46 +8803,26 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda11_5-test: # Testing + manywheel-py3_10-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_5-build + needs: manywheel-py3_10-cuda11_6-build runs-on: linux.4xlarge.nvidia.gpu timeout-minutes: 240 env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -8923,13 +8839,10 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ 
secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cuda11_5 + name: manywheel-py3_10-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -9022,45 +8935,25 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-cuda11_5-upload: # Uploading + manywheel-py3_10-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_5-test + needs: manywheel-py3_10-cuda11_6-test env: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.5 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9077,15 +8970,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-cuda11_5 + name: manywheel-py3_10-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -9153,30 +9043,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 
- category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9193,9 +9063,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -9260,7 +9127,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: manywheel-py3_10-rocm4_5_2 retention-days: 14 @@ -9301,30 +9168,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9341,10 +9188,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_10-rocm4_5_2 @@ -9444,30 +9288,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL 
"http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9484,12 +9308,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_10-rocm4_5_2 @@ -9560,30 +9381,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9600,9 +9401,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -9667,7 +9465,7 @@ jobs: run: | # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 with: name: manywheel-py3_10-rocm5_0 retention-days: 14 @@ -9708,30 +9506,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9748,10 +9526,7 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_10-rocm5_0 @@ -9851,30 +9626,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -9891,12 +9646,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: manywheel-py3_10-rocm5_0 diff --git a/.github/workflows/generated-linux-bionic-cuda10.2-py3.9-gcc7.yml 
b/.github/workflows/generated-linux-bionic-cuda10.2-py3.9-gcc7.yml deleted file mode 100644 index 2ce53ab2ecba2c..00000000000000 --- a/.github/workflows/generated-linux-bionic-cuda10.2-py3.9-gcc7.yml +++ /dev/null @@ -1,2283 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-bionic-cuda10.2-py3.9-gcc7 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/linux/*' - - 'ciflow/slow/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-bionic-cuda10.2-py3.9-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-bionic-cuda10.2-py3.9-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_jit_legacy_1_1: - name: test (jit_legacy, 1, 1, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: jit_legacy - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-jit_legacy-1-1-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-jit_legacy-1-1-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_multigpu_1_1: - name: test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu) - needs: build - runs-on: linux.16xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: multigpu - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-multigpu-1-1-linux.16xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-multigpu-1-1-linux.16xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_nogpu_NO_AVX_1_1: - name: test (nogpu_NO_AVX, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: nogpu_NO_AVX - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-nogpu_NO_AVX-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-nogpu_NO_AVX-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_nogpu_NO_AVX2_1_1: - name: test (nogpu_NO_AVX2, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: nogpu_NO_AVX2 - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-nogpu_NO_AVX2-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-nogpu_NO_AVX2-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_distributed_1_1: - name: test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) - needs: build - runs-on: linux.8xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_slow_1_1: - name: test (slow, 1, 1, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: slow - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-slow-1-1-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-slow-1-1-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-bionic-py3.7-clang9.yml b/.github/workflows/generated-linux-bionic-py3.7-clang9.yml deleted file mode 100644 index b77d051c6b62cd..00000000000000 --- a/.github/workflows/generated-linux-bionic-py3.7-clang9.yml +++ /dev/null @@ -1,995 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-bionic-py3.7-clang9 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/noarch/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-bionic-py3.7-clang9 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-bionic-py3.7-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-bionic-py3.7-clang9-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id 
-g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_noarch_1_1: - name: test (noarch, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-py3.7-clang9-test - TEST_CONFIG: noarch - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-noarch-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-noarch-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-py3.7-clang9-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-py3.7-clang9-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-py3.7-clang9-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-py3.7-clang9-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-py3.7-clang9-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-bionic-rocm4.5-py3.7.yml b/.github/workflows/generated-linux-bionic-rocm4.5-py3.7.yml deleted file mode 100644 index bc7d226e5c1e42..00000000000000 --- a/.github/workflows/generated-linux-bionic-rocm4.5-py3.7.yml +++ /dev/null @@ -1,922 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-bionic-rocm4.5-py3.7 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/linux/*' - - 'ciflow/rocm/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-bionic-rocm4.5-py3.7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.5-py3.7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-bionic-rocm4.5-py3.7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_distributed_1_1: - name: test (distributed, 1, 1, linux.rocm.gpu) - needs: build - runs-on: linux.rocm.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: Set DOCKER_HOST - run: echo "DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> "${GITHUB_ENV}" - - name: Runner health check system info - if: always() - run: | - cat /etc/os-release || true - cat /etc/apt/sources.list.d/rocm.list || true - cat /opt/rocm/.info/version || true - whoami - - name: Runner health check rocm-smi - if: always() - run: | - rocm-smi - - name: Runner health check rocminfo - if: always() - run: | - rocminfo - - name: Runner health check GPU count - if: always() - run: | - ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') - if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" - exit 1 - fi - - name: Runner health check disconnect on failure - if: ${{ failure() }} - run: | - killall runsvc.sh - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: ROCm set GPU_FLAG - run: | - echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo 
"SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home - docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" - # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct - docker exec -t "${container_name}" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.rocm.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: actions/upload-artifact@v2 - name: Store Test Downloaded JSONs on Github - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.rocm.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: actions/upload-artifact@v2 - name: Store Test Reports on Github - if: always() - with: - name: test-reports - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.rocm.gpu) - needs: build - runs-on: linux.rocm.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: Set DOCKER_HOST - run: echo "DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> "${GITHUB_ENV}" - - name: Runner health check system info - if: always() - run: | - cat /etc/os-release || true - cat /etc/apt/sources.list.d/rocm.list || true - cat /opt/rocm/.info/version || true - whoami - - name: Runner health check rocm-smi - if: always() - run: | - rocm-smi - - name: Runner health check rocminfo - if: always() - run: | - rocminfo - - name: Runner health check GPU count - if: always() - run: | - ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') - if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on 
the runner" - exit 1 - fi - - name: Runner health check disconnect on failure - if: ${{ failure() }} - run: | - killall runsvc.sh - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: ROCm set GPU_FLAG - run: | - echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home - docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" - # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct - docker exec -t "${container_name}" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.rocm.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: actions/upload-artifact@v2 - name: Store Test Downloaded JSONs on Github - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.rocm.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: actions/upload-artifact@v2 - name: Store Test Reports on Github - if: always() - with: - name: test-reports - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.rocm.gpu) - needs: build - runs-on: linux.rocm.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: Set DOCKER_HOST - run: echo "DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> "${GITHUB_ENV}" - - name: Runner health check system info - if: always() - run: | - cat /etc/os-release || true - cat /etc/apt/sources.list.d/rocm.list || true - cat /opt/rocm/.info/version || true - whoami - - name: Runner health check rocm-smi - if: always() - run: | - rocm-smi - - name: Runner health check rocminfo - if: always() - run: | - rocminfo - - name: Runner health check GPU count - if: always() - run: | - ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') - if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the 
runner" - exit 1 - fi - - name: Runner health check disconnect on failure - if: ${{ failure() }} - run: | - killall runsvc.sh - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: ROCm set GPU_FLAG - run: | - echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home - docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" - # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct - docker exec -t "${container_name}" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.rocm.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: actions/upload-artifact@v2 - name: Store Test Downloaded JSONs on Github - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.rocm.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: actions/upload-artifact@v2 - name: Store Test Reports on Github - if: always() - with: - name: test-reports - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-bionic-rocm4.5-py3.7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-docs-push.yml b/.github/workflows/generated-linux-docs-push.yml deleted file mode 100644 index 0a99fcf684f9ba..00000000000000 --- a/.github/workflows/generated-linux-docs-push.yml +++ /dev/null @@ -1,395 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-docs-push - -on: - push: - tags: - # NOTE: Binary build pipelines should only get triggered on release candidate builds - # Release candidate tags look like: v1.11.0-rc1 - - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/scheduled/*' - schedule: - - cron: 0 0 * * * - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-docs-push - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # 
This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-docs-push-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-docs-push-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - build-docs: - runs-on: linux.2xlarge - timeout-minutes: 240 - strategy: - matrix: - docs_type: [cpp, python] - needs: [build] - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - DOCS_TYPE: ${{ matrix.docs_type }} - WITH_PUSH: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Generate netrc (only for docs-push) - if: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} - env: - GITHUB_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} - run: | - # set credentials for https pushing - echo "machine github.com" > "${RUNNER_TEMP}/.netrc" - echo "login pytorchbot" >> "${RUNNER_TEMP}/.netrc" - echo "password ${GITHUB_PYTORCHBOT_TOKEN}" >> "${RUNNER_TEMP}/.netrc" - - name: Build ${{ matrix.docs_type }} docs - run: | - set -ex - time docker pull "${DOCKER_IMAGE}" > /dev/null - # Convert refs/tags/v1.12.0rc3 into 1.12 - if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then - target="${BASH_REMATCH[1]}" - else - target="master" - fi - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e IN_CI \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SHA1="$GITHUB_SHA" \ - -e DOCS_VERSION="${target}" \ - -e DOCS_TYPE \ - -e PR_LABELS \ - -e WITH_PUSH \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${RUNNER_TEMP}/.netrc":/var/lib/jenkins/.netrc \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh" - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 - name: Upload Python Docs Preview - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }} - with: - retention-days: 14 - s3-bucket: doc-previews - if-no-files-found: error - path: pytorch.github.io/docs/master/ - s3-prefix: pytorch/${{ github.event.pull_request.number }} - - uses: seemethere/upload-artifact-s3@v3 - name: Upload C++ Docs Preview - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }} - with: - retention-days: 14 - if-no-files-found: error - s3-bucket: doc-previews - path: cppdocs/ - s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs diff --git a/.github/workflows/generated-linux-docs.yml b/.github/workflows/generated-linux-docs.yml deleted file mode 100644 index f5c73edb01f531..00000000000000 --- a/.github/workflows/generated-linux-docs.yml +++ /dev/null @@ -1,386 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-docs - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/docs/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-docs - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-docs-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-docs-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: 
Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - build-docs: - runs-on: linux.2xlarge - timeout-minutes: 240 - strategy: - matrix: - docs_type: [cpp, python] - needs: [build] - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - DOCS_TYPE: ${{ matrix.docs_type }} - WITH_PUSH: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Build ${{ matrix.docs_type }} docs - run: | - set -ex - time docker pull "${DOCKER_IMAGE}" > /dev/null - # Convert refs/tags/v1.12.0rc3 into 1.12 - if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then - target="${BASH_REMATCH[1]}" - else - target="master" - fi - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e IN_CI \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SHA1="$GITHUB_SHA" \ - -e DOCS_VERSION="${target}" \ - -e DOCS_TYPE \ - -e PR_LABELS \ - -e WITH_PUSH \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh" - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - uses: seemethere/upload-artifact-s3@v3 - name: Upload Python Docs Preview - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }} - with: - retention-days: 14 - s3-bucket: doc-previews - if-no-files-found: error - path: pytorch.github.io/docs/master/ - s3-prefix: pytorch/${{ github.event.pull_request.number }} - - uses: seemethere/upload-artifact-s3@v3 - name: Upload C++ Docs Preview - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }} - with: - retention-days: 14 - if-no-files-found: error - s3-bucket: doc-previews - path: cppdocs/ - s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs diff --git a/.github/workflows/generated-linux-vulkan-bionic-py3.7-clang9.yml b/.github/workflows/generated-linux-vulkan-bionic-py3.7-clang9.yml deleted file mode 100644 index 3aeaf3d6b49996..00000000000000 --- a/.github/workflows/generated-linux-vulkan-bionic-py3.7-clang9.yml +++ /dev/null @@ -1,501 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-vulkan-bionic-py3.7-clang9 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - - 'ciflow/vulkan/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-vulkan-bionic-py3.7-clang9 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-vulkan-bionic-py3.7-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-vulkan-bionic-py3.7-clang9-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr 
get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_default_1_1: - name: test (default, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-vulkan-bionic-py3.7-clang9-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-vulkan-bionic-py3.7-clang9-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7-bazel-test.yml b/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7-bazel-test.yml deleted file mode 100644 index dc0d30c1c72e88..00000000000000 --- a/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7-bazel-test.yml +++ /dev/null @@ -1,337 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/bazel_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/bazel/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: linux.2xlarge - env: - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test-build-and-test - NUM_TEST_SHARDS: 1 - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # 
Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Build - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e PR_LABELS \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh' - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - AWS_DEFAULT_REGION: us-east-1 - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - # The artifact file is created inside docker container, which contains the result binaries. - # Now unpackage it into the project folder. The subsequent script will scan project folder - # to locate result binaries and report their sizes. 
- # If artifact file is not provided it assumes that the project folder has been mounted in - # the docker during build and already contains the result binaries, so this step can be skipped. - export ARTIFACTS= - if [ -n "${ARTIFACTS}" ]; then - tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" - cd "${GITHUB_WORKSPACE}" - fi - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - ANDROID_BUILD_TYPE= - export ANDROID_BUILD_TYPE - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 - - name: Test - # Time out the test phase after 3.5 hours - timeout-minutes: 210 - run: | - # detached container should get cleaned up by teardown_ec2_linux - export SHARD_NUMBER=0 - # TODO: Stop building test binaries as part of the build phase - # Make sure we copy test results from bazel-testlogs symlink to - # a regular directory ./test/test-reports - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e SHARD_NUMBER \ - -e NUM_TEST_SHARDS \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports' - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: 'bazel-${{ github.job }}' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: 'bazel-${{ github.job }}' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7-no-ops.yml b/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7-no-ops.yml deleted file mode 100644 index 362e4db272ebe9..00000000000000 --- a/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7-no-ops.yml +++ /dev/null @@ -1,251 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-cuda11.3-py3.7-gcc7-no-ops - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7-no-ops - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-cuda11.3-py3.7-gcc7-no-ops-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-no-ops-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run 
--pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7.yml b/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7.yml deleted file mode 100644 index 2cc43a1c3ec55d..00000000000000 --- a/.github/workflows/generated-linux-xenial-cuda11.3-py3.7-gcc7.yml +++ /dev/null @@ -1,1021 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-cuda11.3-py3.7-gcc7 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-cuda11.3-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance 
metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_distributed_1_1: - name: test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) - needs: build - runs-on: linux.8xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
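
Several steps above define the same tiny `retry` helper inline before calling flaky network commands (the ECR login, `docker pull`). A standalone sketch of that helper:

```bash
#!/usr/bin/env bash
# The inline retry helper used by the ECR login and docker pull steps:
# run the command, and on failure retry once after 1s and once more after 2s.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}

# Example usage, assuming ALPINE_IMAGE is set as in the workflow env.
retry docker pull "${ALPINE_IMAGE}"
```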
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
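
Each test job sizes `/dev/shm` from the build environment and picks its test entry point from `TEST_CONFIG`/`BUILD_ENVIRONMENT`. A condensed sketch of those two decisions as they appear in the "Determine shm-size" and "Test" steps above:

```bash
#!/usr/bin/env bash
# Condensed from the "Determine shm-size" and "Test" steps above.
set -euo pipefail

# Larger shared memory for CUDA and ROCm builds.
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
  *cuda*) shm_size="2g" ;;
  *rocm*) shm_size="8g" ;;
esac

# Pick the test entry point.
if [[ "${TEST_CONFIG}" == 'multigpu' ]]; then
  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
elif [[ "${BUILD_ENVIRONMENT}" == *onnx* ]]; then
  TEST_COMMAND=.jenkins/caffe2/test.sh
else
  TEST_COMMAND=.jenkins/pytorch/test.sh
fi

echo "shm-size=${shm_size} test command=${TEST_COMMAND}"
```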
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
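
After the test container exits, the job packages any JSON and JUnit XML output under `test/` so it can be stored on S3 and fed to the test-stats tooling. A sketch of that packaging, assuming `FILE_SUFFIX` is set as in the workflow:

```bash
#!/usr/bin/env bash
# Sketch of the "Zip JSONs/test reports for upload" steps above.
set -euo pipefail

# Drop any archives left over from a previous run, then zip only the
# relevant file types under test/.
rm -f test-jsons-*.zip test-reports-*.zip
zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
```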
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
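
Because the build and test commands run inside Docker, each job first dumps its `GITHUB_*` variables to a file on the runner and later hands that file to `docker run` with `--env-file`, so scripts inside the container see the same GitHub Actions context. A minimal sketch, assuming `DOCKER_IMAGE` is set as in the workflow:

```bash
#!/usr/bin/env bash
# Sketch of the "Preserve github env variables for use in docker" handoff above.
set -euo pipefail

# On the runner: capture the GitHub Actions context variables.
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"

# Later: expose them to the container; here we just print one back out.
docker run --rm \
  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
  "${DOCKER_IMAGE}" \
  printenv GITHUB_RUN_ID
```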
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3-clang5-mobile-build.yml b/.github/workflows/generated-linux-xenial-py3-clang5-mobile-build.yml deleted file mode 100644 index d093e0a976732e..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3-clang5-mobile-build.yml +++ /dev/null @@ -1,241 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3-clang5-mobile-build - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/linux/*' - - 'ciflow/mobile/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3-clang5-mobile-build - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3-clang5-mobile-build-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3-clang5-mobile-build-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v 
"$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3-clang5-mobile-custom-build-static.yml b/.github/workflows/generated-linux-xenial-py3-clang5-mobile-custom-build-static.yml deleted file mode 100644 index 409a0e3e95a345..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3-clang5-mobile-custom-build-static.yml +++ /dev/null @@ -1,241 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3-clang5-mobile-custom-build-static - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/linux/*' - - 'ciflow/mobile/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3-clang5-mobile-custom-build-static - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3-clang5-mobile-custom-build-static-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3-clang5-mobile-custom-build-static-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: 
$(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3.7-clang7-asan.yml b/.github/workflows/generated-linux-xenial-py3.7-clang7-asan.yml deleted file mode 100644 index 0f8858cb178f51..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3.7-clang7-asan.yml +++ /dev/null @@ -1,995 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3.7-clang7-asan - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/sanitizers/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3.7-clang7-asan - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-asan - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3.7-clang7-asan-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" 
- - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_default_1_3: - name: test (default, 1, 3, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 330 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 3 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 300 minutes - timeout-minutes: 300 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-3-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-3-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_3: - name: test (default, 2, 3, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 330 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 3 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 300 minutes - timeout-minutes: 300 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-3-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-3-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_3_3: - name: test (default, 3, 3, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 330 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-test - TEST_CONFIG: default - SHARD_NUMBER: 3 - NUM_TEST_SHARDS: 3 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 300 minutes - timeout-minutes: 300 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-3-3-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-3-3-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-asan-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3.7-clang7-onnx.yml b/.github/workflows/generated-linux-xenial-py3.7-clang7-onnx.yml deleted file mode 100644 index a2ceb91d987b1f..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3.7-clang7-onnx.yml +++ /dev/null @@ -1,748 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3.7-clang7-onnx - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/onnx/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3.7-clang7-onnx - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-onnx - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3.7-clang7-onnx-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3.7-clang7-onnx-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v 
"${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_default_1_2: - name: test (default, 1, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-onnx-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-onnx-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-onnx-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-clang7-onnx-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build.yml b/.github/workflows/generated-linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build.yml deleted file mode 100644 index 80eaabc04c7f92..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build.yml +++ /dev/null @@ -1,243 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/libtorch/*' - - 'ciflow/linux/*' - - 'ciflow/mobile/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () 
{ - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3.7-gcc5.4.yml b/.github/workflows/generated-linux-xenial-py3.7-gcc5.4.yml deleted file mode 100644 index 87df9f6ff116c1..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3.7-gcc5.4.yml +++ /dev/null @@ -1,1735 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3.7-gcc5.4 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3.7-gcc5.4 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3.7-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - 
AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_jit_legacy_1_1: - name: test (jit_legacy, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: jit_legacy - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-jit_legacy-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-jit_legacy-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_distributed_1_1: - name: test (distributed, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_docs_test_1_1: - name: test (docs_test, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: docs_test - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-docs_test-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-docs_test-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_backwards_compat_1_1: - name: test (backwards_compat, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: backwards_compat - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-backwards_compat-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-backwards_compat-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
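In the Test step above, the script run inside the container is chosen from `TEST_CONFIG` and `BUILD_ENVIRONMENT`: multigpu configs use the multi-GPU Jenkins script, onnx build environments use the Caffe2 test script, and everything else falls back to the standard PyTorch test script. The dispatch on its own, as a sketch (both variables are assumed to be exported by the job environment):

    # Sketch of the TEST_COMMAND dispatch from the Test step above.
    # TEST_CONFIG and BUILD_ENVIRONMENT are assumed to be set by the workflow.
    if [[ "${TEST_CONFIG}" == 'multigpu' ]]; then
      TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
    elif [[ "${BUILD_ENVIRONMENT}" == *onnx* ]]; then
      TEST_COMMAND=.jenkins/caffe2/test.sh
    else
      TEST_COMMAND=.jenkins/pytorch/test.sh
    fi
    echo "selected test script: ${TEST_COMMAND}"
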
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
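Each job above logs in to ECR by asking STS for the account id and piping a short-lived registry password straight into `docker login`, so no credentials are written to disk. A condensed sketch of that login (it assumes `AWS_DEFAULT_REGION` is set and that the runner's instance role provides AWS credentials):

    # Condensed sketch of the "Log in to ECR" step used by every job above.
    AWS_ACCOUNT_ID=$(aws sts get-caller-identity | grep Account | cut -f4 -d\")
    aws ecr get-login-password --region "${AWS_DEFAULT_REGION}" \
      | docker login --username AWS --password-stdin \
          "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
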
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3.7-gcc7-no-ops.yml b/.github/workflows/generated-linux-xenial-py3.7-gcc7-no-ops.yml deleted file mode 100644 index 1b507bc4831625..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3.7-gcc7-no-ops.yml +++ /dev/null @@ -1,252 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3.7-gcc7-no-ops - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3.7-gcc7-no-ops - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3.7-gcc7-no-ops-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-no-ops-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id 
-u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
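The "Calculate docker image tag" and "Check if image should be built" steps above key the CI image entirely off the git tree hash of `.circleci/docker`: that hash becomes the image tag, and a rebuild only happens when no image with that tag exists and the hash differs from the one at the merge-base. A condensed, illustrative version of that decision (the base-branch special case, the real error handling, and the `::set-output` plumbing are omitted):

    # Illustrative condensation of the docker-tag / rebuild decision above.
    # DOCKER_IMAGE_BASE and BASE_REVISION are assumed to be provided by the workflow.
    DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)        # tree hash of .circleci/docker
    DOCKER_IMAGE="${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
    if docker manifest inspect "${DOCKER_IMAGE}" >/dev/null 2>&1; then
      echo "image ${DOCKER_IMAGE} already exists, nothing to build"
    else
      MERGE_BASE=$(git merge-base HEAD "${BASE_REVISION}")
      PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:.circleci/docker")
      if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then
        echo "ERROR: tag unchanged but image is missing" >&2
      else
        echo "would rebuild and push ${DOCKER_IMAGE}"
      fi
    fi
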
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-linux-xenial-py3.7-gcc7.yml b/.github/workflows/generated-linux-xenial-py3.7-gcc7.yml deleted file mode 100644 index 59c1e771d7b611..00000000000000 --- a/.github/workflows/generated-linux-xenial-py3.7-gcc7.yml +++ /dev/null @@ -1,994 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: linux-xenial-py3.7-gcc7 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: linux-xenial-py3.7-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: linux-xenial-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_distributed_1_1: - name: test (distributed, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
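The "Determine shm-size" step above sizes the container's /dev/shm by build flavor: 1g by default, 2g for CUDA builds, 8g for ROCm builds, with the result written to `GITHUB_ENV` so the later `docker run --shm-size` picks it up. The same mapping as a standalone sketch (`BUILD_ENVIRONMENT` is assumed to be set, e.g. `linux-xenial-py3.7-gcc7`):

    # Sketch of the shm-size selection used by the test jobs above.
    shm_size="1g"
    case "${BUILD_ENVIRONMENT}" in
      *cuda*) shm_size="2g" ;;
      *rocm*) shm_size="8g" ;;
    esac
    echo "SHM_SIZE=${shm_size}"   # the workflow appends this line to "${GITHUB_ENV}"
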
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
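The zip-and-upload steps above package only the JSON timing files and the XML JUnit reports under `test/`, and name each archive with a `FILE_SUFFIX` that encodes the job, test config, shard, and runner type so parallel shards do not clobber each other on S3. A small sketch of the packaging (the suffix value shown is an illustrative placeholder):

    # Illustrative packaging of test results, mirroring the zip steps above.
    FILE_SUFFIX="test-default-1-2-linux.2xlarge"   # placeholder suffix
    rm -f test-jsons-*.zip test-reports-*.zip
    zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
    zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
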
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: linux-xenial-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-macos-10-15-py3-arm64.yml b/.github/workflows/generated-macos-10-15-py3-arm64.yml deleted file mode 100644 index 5a6c089249f661..00000000000000 --- a/.github/workflows/generated-macos-10-15-py3-arm64.yml +++ /dev/null @@ -1,89 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/macos_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: macos-10-15-py3-arm64 - -on: - push: - branches: - - master - - main - - release/* - tags: - - 'ciflow/all/*' - - 'ciflow/macos/*' - - 'ciflow/trunk/*' - workflow_dispatch: - -# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 -defaults: - run: - shell: bash -e -l {0} -env: - BUILD_ENVIRONMENT: macos-10-15-py3-arm64 - COMPACT_JOB_NAME: macos-10-15-py3-arm64 - IN_CI: 1 - IS_GHA: 1 - PYTORCH_RETRY_TEST_CASES: 1 - - -jobs: - - build: - runs-on: macos-10.15 - env: - JOB_BASE_NAME: macos-10-15-py3-arm64 - # For sccache access (only on non-forked PRs) - AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: 3.8 - activate-environment: build - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - - name: Install sccache (only for non-forked PRs, and pushes to trunk) - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - - name: Build - run: | - echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" - .jenkins/pytorch/macos-build.sh - - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ - - uses: actions/upload-artifact@v2 - name: Store PyTorch Build Artifacts on GHA - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - -concurrency: - group: macos-10-15-py3-arm64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-macos-10-15-py3-lite-interpreter-x86-64.yml b/.github/workflows/generated-macos-10-15-py3-lite-interpreter-x86-64.yml deleted file mode 100644 index af9859b138280b..00000000000000 --- a/.github/workflows/generated-macos-10-15-py3-lite-interpreter-x86-64.yml +++ 
/dev/null @@ -1,80 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/macos_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: macos-10-15-py3-lite-interpreter-x86-64 - -on: - push: - branches: - - master - - main - - release/* - tags: - - 'ciflow/all/*' - - 'ciflow/macos/*' - - 'ciflow/trunk/*' - workflow_dispatch: - -# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 -defaults: - run: - shell: bash -e -l {0} -env: - BUILD_ENVIRONMENT: macos-10-15-py3-lite-interpreter-x86-64 - COMPACT_JOB_NAME: macos-10-15-py3-lite-interpreter-x86-64 - IN_CI: 1 - IS_GHA: 1 - PYTORCH_RETRY_TEST_CASES: 1 - - # Set xcode xcode version to 12 - DEVELOPER_DIR: /Applications/Xcode_12.app/Contents/Developer - -jobs: - - build: - runs-on: macos-10.15 - env: - JOB_BASE_NAME: macos-10-15-py3-lite-interpreter-x86-64 - # For sccache access (only on non-forked PRs) - AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: 3.8 - activate-environment: build - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - - name: Install sccache (only for non-forked PRs, and pushes to trunk) - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - - name: Build - run: | - echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" - .jenkins/pytorch/macos-build.sh - - -concurrency: - group: macos-10-15-py3-lite-interpreter-x86-64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-macos-11-py3-x86-64.yml b/.github/workflows/generated-macos-11-py3-x86-64.yml deleted file mode 100644 index 7961cff18fd119..00000000000000 --- a/.github/workflows/generated-macos-11-py3-x86-64.yml +++ /dev/null @@ -1,319 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/macos_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: macos-11-py3-x86-64 - -on: - push: - branches: - - master - - main - - release/* - tags: - - 'ciflow/all/*' - - 'ciflow/macos/*' - - 'ciflow/trunk/*' - workflow_dispatch: - -# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 -defaults: - run: - shell: bash -e -l {0} -env: - BUILD_ENVIRONMENT: macos-11-py3-x86-64 - COMPACT_JOB_NAME: macos-11-py3-x86-64 - IN_CI: 1 - IS_GHA: 1 - PYTORCH_RETRY_TEST_CASES: 1 - - # Set xcode 
xcode version to 12.4 - DEVELOPER_DIR: /Applications/Xcode_12.4.app/Contents/Developer - -jobs: - - build: - runs-on: macos-11 - env: - JOB_BASE_NAME: macos-11-py3-x86-64 - # For sccache access (only on non-forked PRs) - AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: 3.8 - activate-environment: build - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - - name: Install sccache (only for non-forked PRs, and pushes to trunk) - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - - name: Build - run: | - echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" - .jenkins/pytorch/macos-build.sh - - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ - - uses: actions/upload-artifact@v2 - name: Store PyTorch Build Artifacts on GHA - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - - test_default_1_2: - name: test (default, 1, 2, macos-11) - needs: build - runs-on: macos-11 - timeout-minutes: 240 - env: - JOB_BASE_NAME: macos-11-py3-x86-64-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: false - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - uses: actions/download-artifact@v2 - name: Download PyTorch Build Artifacts from GHA - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: . 
- - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: 3.8 - activate-environment: build - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - run: | - python3 -mpip install dist/*.whl - .jenkins/pytorch/macos-test.sh - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-macos-11' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: actions/upload-artifact@v2 - name: Store Test Downloaded JSONs on Github - if: always() - with: - name: test-jsons - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-macos-11' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: actions/upload-artifact@v2 - name: Store Test Reports on Github - if: always() - with: - name: test-reports - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: macos-11-py3-x86-64-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }} - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - test_default_2_2: - name: test (default, 2, 2, macos-11) - needs: build - runs-on: macos-11 - timeout-minutes: 240 - env: - JOB_BASE_NAME: macos-11-py3-x86-64-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: false - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - uses: actions/download-artifact@v2 - name: Download PyTorch Build Artifacts from GHA - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: . 
- - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: 3.8 - activate-environment: build - - name: Install macOS homebrew dependencies - run: | - # Install dependencies - brew install libomp - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - run: | - python3 -mpip install dist/*.whl - .jenkins/pytorch/macos-test.sh - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-macos-11' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: actions/upload-artifact@v2 - name: Store Test Downloaded JSONs on Github - if: always() - with: - name: test-jsons - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-macos-11' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: actions/upload-artifact@v2 - name: Store Test Reports on Github - if: always() - with: - name: test-reports - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: macos-11-py3-x86-64-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }} - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - -concurrency: - group: macos-11-py3-x86-64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/generated-macos-arm64-binary-conda.yml b/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml similarity index 86% rename from .github/workflows/generated-macos-arm64-binary-conda.yml rename to .github/workflows/generated-macos-arm64-binary-conda-nightly.yml index 593ca5a37b6445..37e922583ae4a6 100644 --- a/.github/workflows/generated-macos-arm64-binary-conda.yml +++ b/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml @@ -38,6 +38,7 @@ concurrency: jobs: conda-py3_8-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -133,30 +134,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance 
metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -173,9 +154,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -235,6 +213,7 @@ jobs: # Prune all of the docker images docker system prune -af conda-py3_9-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -330,30 +309,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -370,9 +329,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -432,6 +388,7 @@ jobs: # Prune all of the docker images docker system prune -af conda-py3_10-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -527,30 +484,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl 
-fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -567,9 +504,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 diff --git a/.github/workflows/generated-macos-arm64-binary-wheel.yml b/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml similarity index 86% rename from .github/workflows/generated-macos-arm64-binary-wheel.yml rename to .github/workflows/generated-macos-arm64-binary-wheel-nightly.yml index b17db22d2a7c1c..a0267de766e2ac 100644 --- a/.github/workflows/generated-macos-arm64-binary-wheel.yml +++ b/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml @@ -38,6 +38,7 @@ concurrency: jobs: wheel-py3_7-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -133,30 +134,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -173,9 +154,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -235,6 +213,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_8-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -330,30 +309,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: 
"3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -370,9 +329,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -432,6 +388,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_9-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -527,30 +484,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -567,9 +504,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -629,6 +563,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_10-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -724,30 +659,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from 
instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -764,9 +679,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 diff --git a/.github/workflows/generated-macos-binary-conda.yml b/.github/workflows/generated-macos-binary-conda-nightly.yml similarity index 86% rename from .github/workflows/generated-macos-binary-conda.yml rename to .github/workflows/generated-macos-binary-conda-nightly.yml index 3fb1852c859169..d5c6eae896cb31 100644 --- a/.github/workflows/generated-macos-binary-conda.yml +++ b/.github/workflows/generated-macos-binary-conda-nightly.yml @@ -36,6 +36,7 @@ concurrency: jobs: conda-py3_7-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -131,30 +132,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -171,9 +152,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -233,6 +211,7 @@ jobs: # Prune all of the docker images docker system prune -af conda-py3_8-cpu-build: + if: ${{ github.repository_owner == 
'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -328,30 +307,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -368,9 +327,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -430,6 +386,7 @@ jobs: # Prune all of the docker images docker system prune -af conda-py3_9-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -525,30 +482,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -565,9 +502,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -627,6 +561,7 @@ jobs: # Prune all of the docker images docker system prune -af conda-py3_10-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -722,30 +657,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: 
Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -762,9 +677,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 diff --git a/.github/workflows/generated-macos-binary-libtorch-cxx11-abi.yml b/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml similarity index 87% rename from .github/workflows/generated-macos-binary-libtorch-cxx11-abi.yml rename to .github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml index a1f39d4ceea408..eac3e4019cd350 100644 --- a/.github/workflows/generated-macos-binary-libtorch-cxx11-abi.yml +++ b/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml @@ -36,6 +36,7 @@ concurrency: jobs: libtorch-cpu-shared-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -137,30 +138,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -177,9 +158,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | 
grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -239,6 +217,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-shared-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -340,30 +319,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -380,9 +339,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -442,6 +398,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-with-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -543,30 +500,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -583,9 +520,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - 
name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -645,6 +579,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-without-deps-cxx11-abi-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -746,30 +681,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: cxx11-abi steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -786,9 +701,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 diff --git a/.github/workflows/generated-macos-binary-libtorch-pre-cxx11.yml b/.github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml similarity index 87% rename from .github/workflows/generated-macos-binary-libtorch-pre-cxx11.yml rename to .github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml index cf6936d467744b..b943ea97a97011 100644 --- a/.github/workflows/generated-macos-binary-libtorch-pre-cxx11.yml +++ b/.github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml @@ -36,6 +36,7 @@ concurrency: jobs: libtorch-cpu-shared-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -137,30 +138,10 @@ jobs: LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry 
() { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -177,9 +158,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -239,6 +217,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-shared-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -340,30 +319,10 @@ jobs: LIBTORCH_VARIANT: shared-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -380,9 +339,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -442,6 +398,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-with-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -543,30 +500,10 @@ jobs: LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - 
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -583,9 +520,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -645,6 +579,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-without-deps-pre-cxx11-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 # libtorch builds take a long time on github hosted runners timeout-minutes: 720 @@ -746,30 +681,10 @@ jobs: LIBTORCH_VARIANT: static-without-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -786,9 +701,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 diff --git a/.github/workflows/generated-macos-binary-wheel.yml b/.github/workflows/generated-macos-binary-wheel-nightly.yml similarity index 86% rename from .github/workflows/generated-macos-binary-wheel.yml rename to .github/workflows/generated-macos-binary-wheel-nightly.yml index 1db195ea06d6e0..2dd93eea93ca9c 100644 --- a/.github/workflows/generated-macos-binary-wheel.yml +++ b/.github/workflows/generated-macos-binary-wheel-nightly.yml @@ -36,6 +36,7 @@ concurrency: jobs: wheel-py3_7-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -131,30 +132,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -171,9 +152,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -233,6 +211,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_8-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -328,30 +307,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -368,9 +327,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -430,6 +386,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_9-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -525,30 +482,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL 
"http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -565,9 +502,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 @@ -627,6 +561,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_10-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-10.15 timeout-minutes: 240 env: @@ -722,30 +657,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -762,9 +677,6 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - uses: actions/download-artifact@v2 diff --git a/.github/workflows/generated-parallelnative-linux-xenial-py3.7-gcc5.4.yml b/.github/workflows/generated-parallelnative-linux-xenial-py3.7-gcc5.4.yml deleted file mode 100644 index 17322971c3fc84..00000000000000 --- a/.github/workflows/generated-parallelnative-linux-xenial-py3.7-gcc5.4.yml +++ /dev/null @@ -1,746 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: parallelnative-linux-xenial-py3.7-gcc5.4 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - 
workflow_dispatch: - -env: - BUILD_ENVIRONMENT: parallelnative-linux-xenial-py3.7-gcc5.4 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: parallelnative-linux-xenial-py3.7-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: parallelnative-linux-xenial-py3.7-gcc5.4-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_distributed_1_1: - name: test (distributed, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: parallelnative-linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: parallelnative-linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_1: - name: test (default, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: parallelnative-linux-xenial-py3.7-gcc5.4-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - 
--ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: parallelnative-linux-xenial-py3.7-gcc5.4-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7.yml b/.github/workflows/generated-periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7.yml deleted file mode 100644 index bcf59941a8c7f2..00000000000000 --- a/.github/workflows/generated-periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7.yml +++ /dev/null @@ -1,239 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/libtorch/*' - - 'ciflow/linux/*' - - 'ciflow/scheduled/*' - schedule: - - cron: 45 4,10,16,22 * * * - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull 
"${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-periodic-linux-bionic-cuda11.5-py3.7-gcc7.yml b/.github/workflows/generated-periodic-linux-bionic-cuda11.5-py3.7-gcc7.yml deleted file mode 100644 index ff85e17659c075..00000000000000 --- a/.github/workflows/generated-periodic-linux-bionic-cuda11.5-py3.7-gcc7.yml +++ /dev/null @@ -1,1018 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: periodic-linux-bionic-cuda11.5-py3.7-gcc7 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/linux/*' - - 'ciflow/scheduled/*' - schedule: - - cron: 45 4,10,16,22 * * * - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: periodic-linux-bionic-cuda11.5-py3.7-gcc7 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: periodic-linux-bionic-cuda11.5-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo 
"instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_distributed_1_1: - name: test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) - needs: build - runs-on: linux.8xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-bionic-cuda11.5-py3.7-gcc7-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck.yml b/.github/workflows/generated-periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck.yml deleted file mode 100644 index 5d5c901859f0bd..00000000000000 --- a/.github/workflows/generated-periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck.yml +++ /dev/null @@ -1,764 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/linux/*' - - 'ciflow/scheduled/*' - - 'ciflow/slow/*' - - 'ciflow/slow-gradcheck/*' - schedule: - - cron: 0 */4 * * * - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 
&& "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_default_1_2: - name: test (default, 1, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 390 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 360 minutes - timeout-minutes: 360 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 390 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 360 minutes - timeout-minutes: 360 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug.yml b/.github/workflows/generated-periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug.yml deleted file mode 100644 index 8e4f047facad57..00000000000000 --- a/.github/workflows/generated-periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug.yml +++ /dev/null @@ -1,1019 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/linux/*' - - 'ciflow/scheduled/*' - schedule: - - cron: 45 0,4,8,12,16,20 * * * - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 - DEBUG: 1 -concurrency: - group: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working 
directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_distributed_1_1: - name: test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) - needs: build - runs-on: linux.8xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-test - TEST_CONFIG: distributed - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-distributed-1-1-linux.8xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_1_2: - name: test (default, 1, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-test - TEST_CONFIG: default - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - test_default_2_2: - name: test (default, 2, 2, linux.4xlarge.nvidia.gpu) - needs: build - runs-on: linux.4xlarge.nvidia.gpu - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-test - TEST_CONFIG: default - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e 
PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-linux.4xlarge.nvidia.gpu' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-periodic-win-vs2019-cuda11.5-py3.yml b/.github/workflows/generated-periodic-win-vs2019-cuda11.5-py3.yml deleted file mode 100644 index 8041eca3762360..00000000000000 --- a/.github/workflows/generated-periodic-win-vs2019-cuda11.5-py3.yml +++ /dev/null @@ -1,601 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/windows_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: periodic-win-vs2019-cuda11.5-py3 - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/scheduled/*' - - 'ciflow/win/*' - schedule: - - cron: 45 4,10,16,22 * * * - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: periodic-win-vs2019-cuda11.5-py3 - BUILD_WHEEL: 1 - MAX_JOBS: 8 - CUDA_VERSION: "11.5" - IN_CI: 1 - IS_GHA: 1 - INSTALL_WINDOWS_SDK: 1 - PYTHON_VERSION: "3.8" - PYTORCH_RETRY_TEST_CASES: 1 - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - SCCACHE_BUCKET: "ossci-compiler-cache" - VC_PRODUCT: "BuildTools" - VC_VERSION: "" - VS_VERSION: "16.8.6" - VC_YEAR: "2019" - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TORCH_CUDA_ARCH_LIST: "7.0" - USE_CUDA: 1 - -concurrency: - group: periodic-win-vs2019-cuda11.5-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - build: - runs-on: "windows.4xlarge" - timeout-minutes: 240 - env: - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-build - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - name: Install Cuda - shell: bash - 
run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - .jenkins/pytorch/win-build.sh - # Upload to github so that people can click and download artifacts - - name: Upload artifacts to s3 - uses: seemethere/upload-artifact-s3@v3 - with: - retention-days: 14 - if-no-files-found: error - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Cleanup build-results and workspaces - if: always() - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}" - rm -rf ./* - test_force_on_cpu_1_1: - name: test (force_on_cpu, 1, 1, windows.4xlarge) - timeout-minutes: 270 - env: - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-test - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - TEST_CONFIG: force_on_cpu - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.4xlarge - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: 
- python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-force_on_cpu-1-1-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-force_on_cpu-1-1-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* - test_default_1_2: - name: test (default, 1, 2, windows.8xlarge.nvidia.gpu) - timeout-minutes: 270 - env: - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-test - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - TEST_CONFIG: default - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.8xlarge.nvidia.gpu - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL 
"http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - 
AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* - test_default_2_2: - name: test (default, 2, 2, windows.8xlarge.nvidia.gpu) - timeout-minutes: 270 - env: - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-test - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - TEST_CONFIG: default - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.8xlarge.nvidia.gpu - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: 
Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: periodic-win-vs2019-cuda11.5-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* diff --git a/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build.yml b/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build.yml deleted file mode 100644 index c198168b1cd883..00000000000000 --- a/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build.yml +++ /dev/null @@ -1,510 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/android_ci_full_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build - -on: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/android/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: 
build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: linux.2xlarge - env: - JOB_BASE_NAME: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build-build-and-test - NUM_TEST_SHARDS: 1 - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build-arm-v7a - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - #!/bin/bash -eo pipefail - # Pull Docker image and run build - time docker pull "${DOCKER_IMAGE}" >/dev/null - echo "${DOCKER_IMAGE}" - export container_name - container_name=$(docker run \ - -e BUILD_ENVIRONMENT=pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a-build \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace" - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "${container_name}" bash) 2>&1 - - # Copy dist folder back - export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-arm-v7a - docker cp "${container_name}:/var/lib/jenkins/workspace/dist" "${GITHUB_WORKSPACE}/." 
|| echo "Dist folder not found" - docker commit "${container_name}" "${COMMIT_DOCKER_IMAGE}" - time docker push "${COMMIT_DOCKER_IMAGE}" - - name: Build-arm-v8a - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - #!/bin/bash -eo pipefail - # Pull Docker image and run build - time docker pull "${DOCKER_IMAGE}" >/dev/null - echo "${DOCKER_IMAGE}" - export container_name - container_name=$(docker run \ - -e BUILD_ENVIRONMENT=pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a-build \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace" - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "${container_name}" bash) 2>&1 - - # Copy dist folder back - export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-arm-v8a - docker cp "${container_name}:/var/lib/jenkins/workspace/dist" "${GITHUB_WORKSPACE}/." 
|| echo "Dist folder not found" - docker commit "${container_name}" "${COMMIT_DOCKER_IMAGE}" - time docker push "${COMMIT_DOCKER_IMAGE}" - - name: Build-x86_32 - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - #!/bin/bash -eo pipefail - # Pull Docker image and run build - time docker pull "${DOCKER_IMAGE}" >/dev/null - echo "${DOCKER_IMAGE}" - export container_name - container_name=$(docker run \ - -e BUILD_ENVIRONMENT=pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace" - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "${container_name}" bash) 2>&1 - - # Copy dist folder back - export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-x86_32 - docker cp "${container_name}:/var/lib/jenkins/workspace/dist" "${GITHUB_WORKSPACE}/." 
|| echo "Dist folder not found" - docker commit "${container_name}" "${COMMIT_DOCKER_IMAGE}" - time docker push "${COMMIT_DOCKER_IMAGE}" - - name: Build-x86_64 - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - #!/bin/bash -eo pipefail - # Pull Docker image and run build - time docker pull "${DOCKER_IMAGE}" >/dev/null - echo "${DOCKER_IMAGE}" - export container_name - container_name=$(docker run \ - -e BUILD_ENVIRONMENT=pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64-build \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace" - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "${container_name}" bash) 2>&1 - - # Copy dist folder back - export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-x86_64 - docker cp "${container_name}:/var/lib/jenkins/workspace/dist" "${GITHUB_WORKSPACE}/." 
|| echo "Dist folder not found" - docker commit "${container_name}" "${COMMIT_DOCKER_IMAGE}" - time docker push "${COMMIT_DOCKER_IMAGE}" - - name: Build final artifact - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - set -eux - - docker_image_libtorch_android_x86_32="${DOCKER_IMAGE}-x86_32" - docker_image_libtorch_android_x86_64="${DOCKER_IMAGE}-x86_64" - docker_image_libtorch_android_arm_v7a="${DOCKER_IMAGE}-arm-v7a" - docker_image_libtorch_android_arm_v8a="${DOCKER_IMAGE}-arm-v8a" - - echo "docker_image_commit: ${DOCKER_IMAGE}" - echo "docker_image_libtorch_android_x86_32: ${docker_image_libtorch_android_x86_32}" - echo "docker_image_libtorch_android_x86_64: ${docker_image_libtorch_android_x86_64}" - echo "docker_image_libtorch_android_arm_v7a: ${docker_image_libtorch_android_arm_v7a}" - echo "docker_image_libtorch_android_arm_v8a: ${docker_image_libtorch_android_arm_v8a}" - - # x86_32 - time docker pull "${docker_image_libtorch_android_x86_32}" >/dev/null - export id_x86_32 - id_x86_32=$(docker run -e GRADLE_OFFLINE=1 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_x86_32}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_x86_32}" bash) 2>&1 - - # arm-v7a - time docker pull "${docker_image_libtorch_android_arm_v7a}" >/dev/null - export id_arm_v7a - id_arm_v7a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_arm_v7a}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_arm_v7a}" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" - docker cp "${id_arm_v7a}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" - - # x86_64 - time docker pull "${docker_image_libtorch_android_x86_64}" >/dev/null - export id_x86_64 - id_x86_64=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_x86_64}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_x86_64}" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_install_x86_64" - docker cp "${id_x86_64}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_x86_64" - - # arm-v8a - time docker pull "${docker_image_libtorch_android_arm_v8a}" >/dev/null - export id_arm_v8a - id_arm_v8a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_arm_v8a}") - - # shellcheck disable=SC1105 - ((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" - docker cp "${id_arm_v8a}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" - - # Putting everything together - docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v7a" - docker cp "${GITHUB_WORKSPACE}/build_android_install_x86_64" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_x86_64" - docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v8a" - - # run gradle buildRelease - # shellcheck disable=SC1105 - ((echo 
"sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec \ - -e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build" \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --user jenkins \ - -u jenkins -i "${id_x86_32}" bash) 2>&1 - - mkdir -p "${GITHUB_WORKSPACE}/build_android_artifacts" - docker cp "${id_x86_32}:/var/lib/jenkins/workspace/android/artifacts.tgz" "${GITHUB_WORKSPACE}/build_android_artifacts/" - - output_image="${DOCKER_IMAGE}-android-x86_32-gradle" - docker commit "${id_x86_32}" "${output_image}" - time docker push "${output_image}" - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - AWS_DEFAULT_REGION: us-east-1 - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - # The artifact file is created inside docker container, which contains the result binaries. - # Now unpackage it into the project folder. The subsequent script will scan project folder - # to locate result binaries and report their sizes. - # If artifact file is not provided it assumes that the project folder has been mounted in - # the docker during build and already contains the result binaries, so this step can be skipped. - export ARTIFACTS=${GITHUB_WORKSPACE}/build_android_artifacts/artifacts.tgz - if [ -n "${ARTIFACTS}" ]; then - tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" - cd "${GITHUB_WORKSPACE}" - fi - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - ANDROID_BUILD_TYPE=prebuilt - export ANDROID_BUILD_TYPE - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Android Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - build_android_artifacts/artifacts.tgz - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit.yml b/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit.yml deleted file mode 100644 index 471b0bb759f336..00000000000000 --- a/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit.yml +++ /dev/null @@ -1,277 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/android_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/android/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: linux.2xlarge - env: - JOB_BASE_NAME: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit-build-and-test - NUM_TEST_SHARDS: 1 - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr 
get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Build - run: | - set -e - # Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and share via docker, because: - # 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build; - # 2) Not parallelizable by architecture: it only builds libtorch for one architecture; - - echo "DOCKER_IMAGE: ${DOCKER_IMAGE}" - time docker pull "${DOCKER_IMAGE}" >/dev/null - - export BUILD_LITE_INTERPRETER - BUILD_LITE_INTERPRETER="1" - if [[ "${BUILD_ENVIRONMENT}" == *"full-jit" ]]; then - BUILD_LITE_INTERPRETER="0" - fi - - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - # shellcheck disable=SC2016 - export id - id=$(docker run -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e PR_LABELS \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e BUILD_LITE_INTERPRETER \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "$(pwd):/var/lib/jenkins/workspace" \ - --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined \ - -t -d -w /var/lib/jenkins "${DOCKER_IMAGE}") - - # shellcheck disable=SC2016 - export COMMAND - # shellcheck disable=SC2016 - COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1' - echo "${COMMAND}" > ./command.sh && bash 
./command.sh - # Skip docker push as this job is purely for size analysis purpose. - # Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied. - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - AWS_DEFAULT_REGION: us-east-1 - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - # The artifact file is created inside docker container, which contains the result binaries. - # Now unpackage it into the project folder. The subsequent script will scan project folder - # to locate result binaries and report their sizes. - # If artifact file is not provided it assumes that the project folder has been mounted in - # the docker during build and already contains the result binaries, so this step can be skipped. - export ARTIFACTS= - if [ -n "${ARTIFACTS}" ]; then - tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" - cd "${GITHUB_WORKSPACE}" - fi - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - ANDROID_BUILD_TYPE=custom-build-single - export ANDROID_BUILD_TYPE - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
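The "Calculate docker image tag" and "Check if image should be built" steps above key the CI image off the contents of `.circleci/docker`: the tag is the git tree hash of that directory, so the image is rebuilt only when those files change, and a missing image whose tag matches the merge-base's tag is treated as an error. A condensed sketch of the same decision, assuming a hypothetical registry and using `origin/master` in place of the event-provided base revision:

```bash
#!/usr/bin/env bash
set -euxo pipefail

DOCKER_IMAGE_BASE="example.registry/pytorch/ci-image"   # hypothetical registry

# Tag the image with the tree hash of the Docker build context; any change to
# files under .circleci/docker produces a new tag.
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)

# If an image already exists for this tag, there is nothing to build.
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >/dev/null 2>&1; then
  echo "Image already exists, skipping rebuild"
  exit 0
fi

# Compare against the merge-base: if that commit points at the same tree hash,
# an image for this tag should already exist, so a miss means something is wrong.
MERGE_BASE=$(git merge-base HEAD origin/master)          # assumed base branch
PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:.circleci/docker")
if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then
  echo "ERROR: expected an existing image for this tag" >&2
  exit 1
fi

# Otherwise the docker context changed on this branch: rebuild and push.
echo "rebuild=yes"
```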
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single.yml b/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single.yml deleted file mode 100644 index 7d0f98c29bd698..00000000000000 --- a/.github/workflows/generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single.yml +++ /dev/null @@ -1,277 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/android_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/android/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 -concurrency: - group: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - # building and testing in a single job since bazel runs only small subset of tests - build-and-test: - runs-on: linux.2xlarge - env: - JOB_BASE_NAME: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-build-and-test - NUM_TEST_SHARDS: 1 - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login 
--username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! 
git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - name: Output disk space left - run: | - sudo df -H - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Build - run: | - set -e - # Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and share via docker, because: - # 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build; - # 2) Not parallelizable by architecture: it only builds libtorch for one architecture; - - echo "DOCKER_IMAGE: ${DOCKER_IMAGE}" - time docker pull "${DOCKER_IMAGE}" >/dev/null - - export BUILD_LITE_INTERPRETER - BUILD_LITE_INTERPRETER="1" - if [[ "${BUILD_ENVIRONMENT}" == *"full-jit" ]]; then - BUILD_LITE_INTERPRETER="0" - fi - - git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0 - # shellcheck disable=SC2016 - export id - id=$(docker run -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e PR_LABELS \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e BUILD_LITE_INTERPRETER \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "$(pwd):/var/lib/jenkins/workspace" \ - --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined \ - -t -d -w /var/lib/jenkins "${DOCKER_IMAGE}") - - # shellcheck disable=SC2016 - export COMMAND - # shellcheck disable=SC2016 - COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1' - echo "${COMMAND}" > ./command.sh && bash 
./command.sh - # Skip docker push as this job is purely for size analysis purpose. - # Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied. - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - AWS_DEFAULT_REGION: us-east-1 - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - # The artifact file is created inside docker container, which contains the result binaries. - # Now unpackage it into the project folder. The subsequent script will scan project folder - # to locate result binaries and report their sizes. - # If artifact file is not provided it assumes that the project folder has been mounted in - # the docker during build and already contains the result binaries, so this step can be skipped. - export ARTIFACTS= - if [ -n "${ARTIFACTS}" ]; then - tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}" - cd "${GITHUB_WORKSPACE}" - fi - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - ANDROID_BUILD_TYPE=custom-build-single - export ANDROID_BUILD_TYPE - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0 - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
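The Build step above (identical to the full-jit variant except for the `BUILD_LITE_INTERPRETER` default) drives the in-container build by piping a short preamble of `export` statements into `docker exec ... bash`, so toggles reach the build script without being baked into the image. A stripped-down sketch of that pattern, with a placeholder image name and an explicit `bash` keep-alive command:

```bash
#!/usr/bin/env bash
set -euo pipefail

image="example.registry/pytorch-android-ndk:latest"   # placeholder image

# Lite interpreter is the default; "full-jit" build environments turn it off.
BUILD_LITE_INTERPRETER=1
if [[ "${BUILD_ENVIRONMENT:-}" == *"full-jit"* ]]; then
  BUILD_LITE_INTERPRETER=0
fi

# Start a long-lived container with the checkout mounted at the Jenkins path.
cid=$(docker run -t -d --user jenkins \
  -v "$(pwd):/var/lib/jenkins/workspace" \
  -w /var/lib/jenkins "${image}" bash)

# Feed the exports and the build command to a shell inside the container.
{
  echo "export GRADLE_OFFLINE=1"
  echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}"
  echo "cd workspace && ./.circleci/scripts/build_android_gradle.sh"
} | docker exec -u jenkins -i "${cid}" bash
```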
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-pytorch-xla-linux-bionic-py3.7-clang8.yml b/.github/workflows/generated-pytorch-xla-linux-bionic-py3.7-clang8.yml deleted file mode 100644 index 8890295d6253cb..00000000000000 --- a/.github/workflows/generated-pytorch-xla-linux-bionic-py3.7-clang8.yml +++ /dev/null @@ -1,468 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/linux_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: pytorch-xla-linux-bionic-py3.7-clang8 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/linux/*' - - 'ciflow/trunk/*' - - 'ciflow/xla/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: pytorch-xla-linux-bionic-py3.7-clang8 - DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/xla_base - SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 - XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - TORCH_CUDA_ARCH_LIST: 5.2 - IN_CI: 1 - IS_GHA: 1 - # This is used for the phase of adding wheel tests only, will be removed once completed - IN_WHEEL_TEST: 1 - # Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh - CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - PYTORCH_RETRY_TEST_CASES: 1 - # This is used for XLA tests only - XLA_CUDA: 0 - XLA_IMAGE_TAG: v0.2 -concurrency: - group: pytorch-xla-linux-bionic-py3.7-clang8-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - - build: - runs-on: linux.2xlarge - timeout-minutes: 240 - env: - JOB_BASE_NAME: pytorch-xla-linux-bionic-py3.7-clang8-build - outputs: - docker_image: ${{ steps.calculate-tag.outputs.docker_image }} - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets 
chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Calculate docker image tag - id: calculate-tag - run: | - echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}" - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Build - env: - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - # detached container should get cleaned up by teardown_ec2_linux - container_name=$(docker run \ - -e BUILD_ENVIRONMENT \ - -e JOB_BASE_NAME \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e AWS_DEFAULT_REGION \ - -e IS_GHA \ - -e PR_NUMBER \ - -e SHA1 \ - -e BRANCH \ - -e GITHUB_RUN_ID \ - -e SCCACHE_BUCKET \ - -e XLA_CUDA \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e SKIP_SCCACHE_INITIALIZATION=1 \ - -e TORCH_CUDA_ARCH_LIST \ - -e PR_LABELS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --tty \ - --detach \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . 
&& .jenkins/pytorch/build.sh' - - name: Display and upload binary build size statistics (Click Me) - # temporary hack: set CIRCLE_* vars, until we update - # tools/stats/print_test_stats.py to natively support GitHub Actions - env: - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - run: | - COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0) - export COMMIT_TIME - pip3 install requests==2.26 boto3==1.16.34 - python3 -m tools.stats.upload_binary_size_to_scuba || exit 0 - - name: Chown workspace - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Archive artifacts into zip - run: | - zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - - uses: seemethere/upload-artifact-s3@v3 - name: Store PyTorch Build Artifacts on S3 - with: - name: ${{ env.BUILD_ENVIRONMENT }} - retention-days: 14 - if-no-files-found: error - path: - artifacts.zip - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Clean up docker images - if: always() - run: | - # Prune all of the docker images - docker system prune -af - - test_xla_1_1: - name: test (xla, 1, 1, linux.2xlarge) - needs: build - runs-on: linux.2xlarge - timeout-minutes: 270 - env: - DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }} - JOB_BASE_NAME: pytorch-xla-linux-bionic-py3.7-clang8-test - TEST_CONFIG: xla - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - PR_BODY: ${{ github.event.pull_request.body }} - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets 
chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" - - name: Determine shm-size - run: | - shm_size="1g" - case "${BUILD_ENVIRONMENT}" in - *cuda*) - shm_size="2g" - ;; - *rocm*) - shm_size="8g" - ;; - esac - echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - - name: Unzip artifacts - run: | - unzip -o artifacts.zip - - name: Output disk space left - run: | - sudo df -H - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Test - env: - PR_NUMBER: ${{ github.event.pull_request.number }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - set -x - - if [[ $TEST_CONFIG == 'multigpu' ]]; then - TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh - elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then - TEST_COMMAND=.jenkins/caffe2/test.sh - else - TEST_COMMAND=.jenkins/pytorch/test.sh - fi - PROXY_ENV= - # NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now - # We should investigate whether or not there's a list of hostnames we can add to no_proxy to - # make it so that we shouldn't have to fully disable squid for XLA tests - if [[ $TEST_CONFIG != 'xla' ]]; then - # shellcheck disable=SC2089 - PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" - fi - # detached container should get cleaned up by teardown_ec2_linux - # TODO: Stop building test binaries as part of the build phase - # Used for GPU_FLAG since that doesn't play nice - # shellcheck disable=SC2086,SC2090 - container_name=$(docker run \ - ${GPU_FLAG:-} \ - -e BUILD_ENVIRONMENT \ - -e PR_NUMBER \ - -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ - -e GITHUB_ACTIONS \ - -e IN_CI \ - -e IS_GHA \ - -e BRANCH \ - -e SHA1 \ - -e AWS_DEFAULT_REGION \ - -e IN_WHEEL_TEST \ - -e SHARD_NUMBER \ - -e JOB_BASE_NAME \ - -e TEST_CONFIG \ - -e NUM_TEST_SHARDS \ - -e PR_BODY \ - -e PYTORCH_RETRY_TEST_CASES \ - -e PR_LABELS \ - -e MAX_JOBS="$(nproc --ignore=2)" \ - -e SCCACHE_BUCKET \ - -e XLA_CUDA \ - -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ - ${PROXY_ENV} \ - 
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ - --ulimit stack=10485760:83886080 \ - --security-opt seccomp=unconfined \ - --cap-add=SYS_PTRACE \ - --ipc=host \ - --shm-size="${SHM_SIZE}" \ - --tty \ - --detach \ - --name="${container_name}" \ - --user jenkins \ - -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ - -w /var/lib/jenkins/workspace \ - "${DOCKER_IMAGE}" - ) - docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}" - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-xla-1-1-linux.2xlarge' - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-xla-1-1-linux.2xlarge' - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: pytorch-xla-linux-bionic-py3.7-clang8-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
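The XLA test job above injects the squid proxy variables into the test container only for non-XLA configs, since the XLA multiprocessing tests misbehave behind the proxy. A small sketch of that conditional environment injection, with a placeholder proxy URL; using a bash array instead of a space-separated string avoids the SC2086/SC2090 shellcheck suppressions the generated workflow needs:

```bash
#!/usr/bin/env bash
set -eo pipefail

TEST_CONFIG="${TEST_CONFIG:-default}"
PROXY_URL="http://proxy.example.internal:3128"   # placeholder, not the real ELB address

PROXY_ENV=()
if [[ "${TEST_CONFIG}" != "xla" ]]; then
  # Only non-XLA shards go through the proxy; XLA multiprocessing tests run
  # with direct network access instead.
  PROXY_ENV+=( -e "http_proxy=${PROXY_URL}" -e "https_proxy=${PROXY_URL}" \
               -e "no_proxy=localhost,127.0.0.1" )
fi

# The array expands to nothing for XLA, so the docker invocation stays valid.
docker run --rm "${PROXY_ENV[@]}" alpine:3 env | grep -i proxy || true
```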
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/workflows/generated-win-vs2019-cpu-py3.yml b/.github/workflows/generated-win-vs2019-cpu-py3.yml deleted file mode 100644 index 070d41bd20714d..00000000000000 --- a/.github/workflows/generated-win-vs2019-cpu-py3.yml +++ /dev/null @@ -1,430 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/windows_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: win-vs2019-cpu-py3 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cpu/*' - - 'ciflow/trunk/*' - - 'ciflow/win/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: win-vs2019-cpu-py3 - BUILD_WHEEL: 1 - MAX_JOBS: 8 - CUDA_VERSION: "cpu" - IN_CI: 1 - IS_GHA: 1 - INSTALL_WINDOWS_SDK: 1 - PYTHON_VERSION: "3.8" - PYTORCH_RETRY_TEST_CASES: 1 - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - SCCACHE_BUCKET: "ossci-compiler-cache" - VC_PRODUCT: "BuildTools" - VC_VERSION: "" - VS_VERSION: "16.8.6" - VC_YEAR: "2019" - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - USE_CUDA: 0 - -concurrency: - group: win-vs2019-cpu-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - build: - runs-on: "windows.4xlarge" - timeout-minutes: 240 - env: - JOB_BASE_NAME: win-vs2019-cpu-py3-build - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Parse ref - shell: bash - id: parse-ref - run: 
./.github/scripts/parse_ref.py - - name: Build - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - .jenkins/pytorch/win-build.sh - # Upload to github so that people can click and download artifacts - - name: Upload artifacts to s3 - uses: seemethere/upload-artifact-s3@v3 - with: - retention-days: 14 - if-no-files-found: error - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Cleanup build-results and workspaces - if: always() - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}" - rm -rf ./* - test_default_1_2: - name: test (default, 1, 2, windows.4xlarge) - timeout-minutes: 270 - env: - JOB_BASE_NAME: win-vs2019-cpu-py3-test - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - TEST_CONFIG: default - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.4xlarge - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ 
github.job }}-default-1-2-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: win-vs2019-cpu-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* - test_default_2_2: - name: test (default, 2, 2, windows.4xlarge) - timeout-minutes: 270 - env: - JOB_BASE_NAME: win-vs2019-cpu-py3-test - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - TEST_CONFIG: default - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.4xlarge - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: 
zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 240 minutes - timeout-minutes: 240 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: win-vs2019-cpu-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - 
run: | - rm -rf ./* diff --git a/.github/workflows/generated-win-vs2019-cuda11.3-py3.yml b/.github/workflows/generated-win-vs2019-cuda11.3-py3.yml deleted file mode 100644 index fe218f09ec6d9d..00000000000000 --- a/.github/workflows/generated-win-vs2019-cuda11.3-py3.yml +++ /dev/null @@ -1,604 +0,0 @@ -# @generated DO NOT EDIT MANUALLY -# Template is at: .github/templates/windows_ci_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: win-vs2019-cuda11.3-py3 - -on: - pull_request: - push: - tags: - - 'ciflow/all/*' - - 'ciflow/cuda/*' - - 'ciflow/trunk/*' - - 'ciflow/win/*' - branches: - - master - - main - - release/* - workflow_dispatch: - -env: - BUILD_ENVIRONMENT: win-vs2019-cuda11.3-py3 - BUILD_WHEEL: 1 - MAX_JOBS: 8 - CUDA_VERSION: "11.3" - IN_CI: 1 - IS_GHA: 1 - INSTALL_WINDOWS_SDK: 1 - PYTHON_VERSION: "3.8" - PYTORCH_RETRY_TEST_CASES: 1 - PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - SCCACHE_BUCKET: "ossci-compiler-cache" - VC_PRODUCT: "BuildTools" - VC_VERSION: "" - VS_VERSION: "16.8.6" - VC_YEAR: "2019" - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock - AWS_DEFAULT_REGION: us-east-1 - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TORCH_CUDA_ARCH_LIST: "7.0" - USE_CUDA: 1 - -concurrency: - group: win-vs2019-cuda11.3-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - build: - runs-on: "windows.4xlarge" - timeout-minutes: 240 - env: - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-build - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - steps: - - name: print labels - run: echo "${PR_LABELS}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Parse ref - shell: bash - id: parse-ref - 
run: ./.github/scripts/parse_ref.py - - name: Build - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - BRANCH: ${{ steps.parse-ref.outputs.branch }} - run: | - .jenkins/pytorch/win-build.sh - # Upload to github so that people can click and download artifacts - - name: Upload artifacts to s3 - uses: seemethere/upload-artifact-s3@v3 - with: - retention-days: 14 - if-no-files-found: error - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Cleanup build-results and workspaces - if: always() - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}" - rm -rf ./* - test_force_on_cpu_1_1: - name: test (force_on_cpu, 1, 1, windows.4xlarge) - timeout-minutes: 300 - env: - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 1 - TEST_CONFIG: force_on_cpu - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.4xlarge - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 270 minutes - timeout-minutes: 270 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - 
FILE_SUFFIX: '${{ github.job }}-force_on_cpu-1-1-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-force_on_cpu-1-1-windows.4xlarge' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* - test_default_1_2: - name: test (default, 1, 2, windows.8xlarge.nvidia.gpu) - timeout-minutes: 300 - env: - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test - SHARD_NUMBER: 1 - NUM_TEST_SHARDS: 2 - TEST_CONFIG: default - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.8xlarge.nvidia.gpu - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ 
secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 270 minutes - timeout-minutes: 270 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-1-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip 
install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* - test_default_2_2: - name: test (default, 2, 2, windows.8xlarge.nvidia.gpu) - timeout-minutes: 300 - env: - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test - SHARD_NUMBER: 2 - NUM_TEST_SHARDS: 2 - TEST_CONFIG: default - http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" - PR_BODY: ${{ github.event.pull_request.body }} - needs: build - runs-on: windows.8xlarge.nvidia.gpu - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - # deep clone, to allow use of git merge-base - fetch-depth: 0 - submodules: recursive - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - - name: Install Visual Studio 2019 toolchain - shell: powershell - run: | - .\.circleci\scripts\vs_install.ps1 - - name: Install Cuda - shell: bash - run: | - .circleci/scripts/windows_cuda_install.sh - - name: Install Cudnn - shell: bash - run: | - .circleci/scripts/windows_cudnn_install.sh - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b - name: Download PyTorch Build Artifacts - with: - name: ${{ env.BUILD_ENVIRONMENT }} - path: C:\${{ github.run_id }}\build-results - - name: Check build-results folder - shell: powershell - run: | - tree /F C:\$Env:GITHUB_RUN_ID\build-results - # Needed for coverage in win-test.sh - - uses: actions/setup-python@v2 - name: Setup Python3 - with: - python-version: '3.x' - - name: Test - shell: bash - env: - PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ - # Time out the test phase after 270 minutes - timeout-minutes: 270 - run: | - .jenkins/pytorch/win-test.sh - - name: Zip JSONs for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Downloaded JSONs on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip - - name: Zip test reports for upload - if: always() - env: - FILE_SUFFIX: '${{ github.job }}-default-2-2-windows.8xlarge.nvidia.gpu' - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a 
"test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' - - uses: seemethere/upload-artifact-s3@v3 - name: Store Test Reports on S3 - if: always() - with: - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ - - name: Wait until all sessions have drained - shell: powershell - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - - name: Parse ref - shell: bash - id: parse-ref - run: ./.github/scripts/parse_ref.py - - name: Upload test statistics - if: always() - env: - AWS_DEFAULT_REGION: us-east-1 - BRANCH: ${{ steps.parse-ref.outputs.branch }} - JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - shell: bash - run: | - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - - name: Cleanup workspace - if: always() - shell: bash - # Should remove the entirety of pytorch-${{ github.run_id }} - run: | - rm -rf ./* diff --git a/.github/workflows/generated-windows-binary-conda-nightly.yml b/.github/workflows/generated-windows-binary-conda-nightly.yml new file mode 100644 index 00000000000000..a65be8ad607705 --- /dev/null +++ b/.github/workflows/generated-windows-binary-conda-nightly.yml @@ -0,0 +1,4834 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/windows_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: windows-binary-conda + +on: + push: + # NOTE: Meta Employees can trigger new nightlies using: https://fburl.com/trigger_pytorch_nightly_build + branches: + - nightly + tags: + # NOTE: Binary build pipelines should only get triggered on release candidate builds + # Release candidate tags look like: v1.11.0-rc1 + - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ + - 'ciflow/binaries/*' + - 'ciflow/binaries_conda/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BUILD_ENVIRONMENT: windows-binary-conda + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_RETRY_TEST_CASES: 1 + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: windows-binary-conda-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + conda-py3_7-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + 
BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
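+      # Roughly how this works: any step that appends KEY=value lines to "${GITHUB_ENV}" (as the
+      # next step does for BINARY_ENV_FILE and PYTORCH_FINAL_PACKAGE_DIR) makes those values
+      # available as environment variables to every later step in the same job. An echo without
+      # the ">> ${GITHUB_ENV}" redirection, like the WIN_PACKAGE_WORK_DIR line below, only shows
+      # up in the step log.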
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_7-cpu + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cpu-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cpu-build + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cpu + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cpu-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
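+      # These upload jobs run on persistent self-hosted Linux runners, so a previous run's
+      # workspace may still be present (possibly with root-owned files left behind by containers).
+      # The chown above and the rm -rf/mkdir below reset it to a clean, user-owned state before
+      # anything else runs.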
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
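+      # Same idea as the workspace cleanup at the top of this job: since the runner is reused
+      # across workflow runs, leftover containers are stopped and images pruned here so they
+      # don't accumulate (and consume disk) between runs.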
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_7-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
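+      # Even this cu113 variant builds on a CPU-only windows.4xlarge runner; a GPU instance
+      # (windows.8xlarge.nvidia.gpu) is only used by the matching -test job further down, which
+      # exercises the built binary.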
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_7-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cuda11_3-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cuda11_3 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cuda11_3-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cuda11_3-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
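+      # Publishing is gated further down in this job: DRY_RUN is only flipped to "disabled" on
+      # pushes to the nightly branch or to a non-ciflow tag, and a tag with an -rc suffix
+      # additionally switches UPLOAD_CHANNEL to "test"; everything else leaves DRY_RUN at its
+      # default.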
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_7-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
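+      # DESIRED_CUDA (cu115 here) also drives where the package goes: the matching -upload job
+      # below passes it to binary_upload.sh as UPLOAD_SUBFOLDER, presumably to keep the
+      # cpu / cu113 / cu115 / cu116 variants apart on the upload side.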
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_7-cuda11_5 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cuda11_5-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cuda11_5-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cuda11_5 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cuda11_5-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cuda11_5-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_7-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_7-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cuda11_6 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_7-cuda11_6-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_7-cuda11_6-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_7-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+      - name: Kill containers, clean up images
+        if: always()
+        run: |
+          # ignore expansion of "docker ps -q" since it could be empty
+          # shellcheck disable=SC2046
+          docker stop $(docker ps -q) || true
+          # Prune all of the docker images
+          docker system prune -af
+  conda-py3_8-cpu-build:
+    if: ${{ github.repository_owner == 'pytorch' }}
+    runs-on: windows.4xlarge
+    timeout-minutes: 240
+    env:
+      PYTORCH_ROOT: ${{ github.workspace }}/pytorch
+      BUILDER_ROOT: ${{ github.workspace }}/builder
+      PACKAGE_TYPE: conda
+      # TODO: This is a legacy variable that we eventually want to get rid of in
+      # favor of GPU_ARCH_VERSION
+      DESIRED_CUDA: cpu
+      GPU_ARCH_TYPE: cpu
+      SKIP_ALL_TESTS: 1
+      DESIRED_PYTHON: "3.8"
+    steps:
+      - name: Display EC2 information
+        shell: bash
+        run: |
+          set -euo pipefail
+          function get_ec2_metadata() {
+            # Pulled from instance metadata endpoint for EC2
+            # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
+            category=$1
+            curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
+          }
+          echo "ami-id: $(get_ec2_metadata ami-id)"
+          echo "instance-id: $(get_ec2_metadata instance-id)"
+          echo "instance-type: $(get_ec2_metadata instance-type)"
+          echo "system info $(uname -a)"
+      - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
+        uses: seemethere/add-github-ssh-key@v1
+        with:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
+      - name: Enable long paths on Windows
+        shell: powershell
+        run: |
+          Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1
+      # Since it's just a defensive command, the workflow should continue even if the command fails
+      - name: Disable Windows Defender scheduled and real-time scanning for files in the pytorch directory
+        shell: powershell
+        run: |
+          Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
+      # NOTE: These environment variables are put here so that they can be applied on every job equally
+      # They are also here because setting them at a workflow level doesn't give us access to the
+      # runner.temp variable, which we need.
+      - name: Populate binary env
+        shell: bash
+        run: |
+          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
+          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
+          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
+      - name: Checkout PyTorch
+        uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
+        with:
+          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
+          submodules: recursive
+          path: pytorch
+      - name: Clean PyTorch checkout
+        run: |
+          # Remove any artifacts from the previous checkouts
+          git clean -fxd
+        working-directory: pytorch
+      - name: Checkout pytorch/builder
+        uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
+        with:
+          ref: main
+          submodules: recursive
+          repository: pytorch/builder
+          path: builder
+      - name: Clean pytorch/builder checkout
+        run: |
+          # Remove any artifacts from the previous checkouts
+          git clean -fxd
+        working-directory: builder
+      - name: Populate binary env
+        shell: bash
+        run: |
+          "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh"
+      - name: Build PyTorch binary
+        shell: bash
+        run: |
+          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
+      - uses: seemethere/upload-artifact-s3@v4
+        if: always()
+        with:
+          name: conda-py3_8-cpu
+          retention-days: 14
+          if-no-files-found: error
+          path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}"
+      - name: Wait until all sessions have drained
+        shell: powershell
+        working-directory: pytorch
+        if: always()
+        timeout-minutes: 120
+        run: |
+          .github\scripts\wait_for_ssh_to_drain.ps1
+      - name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
+        shell: powershell
+        working-directory: pytorch
+        if: always()
+        run: |
+          .github\scripts\kill_active_ssh_sessions.ps1
+  conda-py3_8-cpu-test: # Testing
+    if: ${{ github.repository_owner == 'pytorch' }}
+    needs: conda-py3_8-cpu-build
+    runs-on: windows.4xlarge
+    timeout-minutes: 240
+    env:
+      PYTORCH_ROOT: ${{ github.workspace }}/pytorch
+      BUILDER_ROOT: ${{ github.workspace }}/builder
+      PACKAGE_TYPE: conda
+      # TODO: This is a legacy variable that we eventually want to get rid of in
+      # favor of GPU_ARCH_VERSION
+      DESIRED_CUDA: cpu
+      GPU_ARCH_TYPE: cpu
+      SKIP_ALL_TESTS: 1
+      DESIRED_PYTHON: "3.8"
+    steps:
+      - name: Display EC2 information
+        shell: bash
+        run: |
+          set -euo pipefail
+          function get_ec2_metadata() {
+            # Pulled from instance metadata endpoint for EC2
+            # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
+            category=$1
+            curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
+          }
+          echo "ami-id: $(get_ec2_metadata ami-id)"
+          echo "instance-id: $(get_ec2_metadata instance-id)"
+          echo "instance-type: $(get_ec2_metadata instance-type)"
+          echo "system info $(uname -a)"
+      - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
+        uses: seemethere/add-github-ssh-key@v1
+        with:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
+      - name: Enable long paths on Windows
+        shell: powershell
+        run: |
+          Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1
+      # Since it's just a defensive command, the workflow should continue even if the command fails
+      - name: Disable Windows Defender scheduled and real-time scanning for files in the pytorch directory
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cpu + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cpu-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_8-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_8-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cuda11_3-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cuda11_3 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cuda11_3-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cuda11_3-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_8-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_8-cuda11_5 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cuda11_5-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cuda11_5-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cuda11_5 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cuda11_5-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cuda11_5-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_8-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_8-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cuda11_6 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_8-cuda11_6-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_8-cuda11_6-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_8-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+      - name: Kill containers, clean up images
+        if: always()
+        run: |
+          # ignore expansion of "docker ps -q" since it could be empty
+          # shellcheck disable=SC2046
+          docker stop $(docker ps -q) || true
+          # Prune all of the docker images
+          docker system prune -af
+  conda-py3_9-cpu-build:
+    if: ${{ github.repository_owner == 'pytorch' }}
+    runs-on: windows.4xlarge
+    timeout-minutes: 240
+    env:
+      PYTORCH_ROOT: ${{ github.workspace }}/pytorch
+      BUILDER_ROOT: ${{ github.workspace }}/builder
+      PACKAGE_TYPE: conda
+      # TODO: This is a legacy variable that we eventually want to get rid of in
+      # favor of GPU_ARCH_VERSION
+      DESIRED_CUDA: cpu
+      GPU_ARCH_TYPE: cpu
+      SKIP_ALL_TESTS: 1
+      DESIRED_PYTHON: "3.9"
+    steps:
+      - name: Display EC2 information
+        shell: bash
+        run: |
+          set -euo pipefail
+          function get_ec2_metadata() {
+            # Pulled from instance metadata endpoint for EC2
+            # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
+            category=$1
+            curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
+          }
+          echo "ami-id: $(get_ec2_metadata ami-id)"
+          echo "instance-id: $(get_ec2_metadata instance-id)"
+          echo "instance-type: $(get_ec2_metadata instance-type)"
+          echo "system info $(uname -a)"
+      - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
+        uses: seemethere/add-github-ssh-key@v1
+        with:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
+      - name: Enable long paths on Windows
+        shell: powershell
+        run: |
+          Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1
+      # Since it's just a defensive command, the workflow should continue even if the command fails
+      - name: Disable Windows Defender scheduled and real-time scanning for files in the pytorch directory
+        shell: powershell
+        run: |
+          Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
+      # NOTE: These environment variables are put here so that they can be applied on every job equally
+      # They are also here because setting them at a workflow level doesn't give us access to the
+      # runner.temp variable, which we need.
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_9-cpu + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cpu-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cpu-build + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cpu + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cpu-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_9-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cuda11_3-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_3 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cuda11_3-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_3-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_9-cuda11_5 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cuda11_5-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_5-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_5 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cuda11_5-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_5-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_9-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need.
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_9-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_6 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_9-cuda11_6-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_9-cuda11_6-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_9-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_10-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_10-cpu + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cpu-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cpu-build + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cpu + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cpu-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_10-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_10-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cuda11_3-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cuda11_3 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cuda11_3-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cuda11_3-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + conda-py3_10-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_10-cuda11_5 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cuda11_5-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cuda11_5-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cuda11_5 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cuda11_5-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cuda11_5-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
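+      # linux.2xlarge is a self-hosted runner (see the runs-on comment above), so the workspace is
+      # chowned back to the current user (files written from Docker are typically root-owned) and
+      # then wiped below before this run touches it.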
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
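+      # The step below leaves the shared runner clean: "|| true" keeps it from failing when no
+      # containers are running, and the prune drops all unused images.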
+      - name: Kill containers, clean up images
+        if: always()
+        run: |
+          # ignore expansion of "docker ps -q" since it could be empty
+          # shellcheck disable=SC2046
+          docker stop $(docker ps -q) || true
+          # Prune all of the docker images
+          docker system prune -af
+  conda-py3_10-cuda11_6-build:
+    if: ${{ github.repository_owner == 'pytorch' }}
+    runs-on: windows.4xlarge
+    timeout-minutes: 240
+    env:
+      PYTORCH_ROOT: ${{ github.workspace }}/pytorch
+      BUILDER_ROOT: ${{ github.workspace }}/builder
+      PACKAGE_TYPE: conda
+      # TODO: This is a legacy variable that we eventually want to get rid of in
+      #       favor of GPU_ARCH_VERSION
+      DESIRED_CUDA: cu116
+      GPU_ARCH_VERSION: 11.6
+      GPU_ARCH_TYPE: cuda
+      SKIP_ALL_TESTS: 1
+      DESIRED_PYTHON: "3.10"
+    steps:
+      - name: Display EC2 information
+        shell: bash
+        run: |
+          set -euo pipefail
+          function get_ec2_metadata() {
+            # Pulled from instance metadata endpoint for EC2
+            # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
+            category=$1
+            curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
+          }
+          echo "ami-id: $(get_ec2_metadata ami-id)"
+          echo "instance-id: $(get_ec2_metadata instance-id)"
+          echo "instance-type: $(get_ec2_metadata instance-type)"
+          echo "system info $(uname -a)"
+      - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
+        uses: seemethere/add-github-ssh-key@v1
+        with:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
+      - name: Enable long paths on Windows
+        shell: powershell
+        run: |
+          Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1
+      # Since it's just a defensive command, the workflow should continue even the command fails
+      - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory.
+        shell: powershell
+        run: |
+          Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
+      # NOTE: These environment variables are put here so that they can be applied on every job equally
+      # They are also here because setting them at a workflow level doesn't give us access to the
+      # runner.temp variable, which we need.
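+      # From here the steps mirror the cu115 build job above; presumably only DESIRED_CUDA,
+      # GPU_ARCH_VERSION and the artifact name differ, as the jobs appear to share one template.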
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: conda-py3_10-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + conda-py3_10-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: conda-py3_10-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: conda + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.10" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+        shell: powershell
+        run: |
+          Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
+      # NOTE: These environment variables are put here so that they can be applied on every job equally
+      # They are also here because setting them at a workflow level doesn't give us access to the
+      # runner.temp variable, which we need.
+      - name: Populate binary env
+        shell: bash
+        run: |
+          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
+          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
+          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
+      - uses: seemethere/download-artifact-s3@v3
+        name: Download Build Artifacts
+        with:
+          name: conda-py3_10-cuda11_6
+          path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}"
+      - name: Checkout PyTorch
+        uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
+        with:
+          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
+          submodules: recursive
+          path: pytorch
+      - name: Clean PyTorch checkout
+        run: |
+          # Remove any artifacts from the previous checkouts
+          git clean -fxd
+        working-directory: pytorch
+      - name: Checkout pytorch/builder
+        uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
+        with:
+          ref: main
+          submodules: recursive
+          repository: pytorch/builder
+          path: builder
+      - name: Clean pytorch/builder checkout
+        run: |
+          # Remove any artifacts from the previous checkouts
+          git clean -fxd
+        working-directory: builder
+      - name: Populate binary env
+        shell: bash
+        run: |
+          "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh"
+      - name: Test PyTorch binary
+        shell: bash
+        run: |
+          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh"
+      - name: Wait until all sessions have drained
+        shell: powershell
+        working-directory: pytorch
+        if: always()
+        timeout-minutes: 120
+        run: |
+          .github\scripts\wait_for_ssh_to_drain.ps1
+      - name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
+        shell: powershell
+        working-directory: pytorch
+        if: always()
+        run: |
+          .github\scripts\kill_active_ssh_sessions.ps1
+  conda-py3_10-cuda11_6-upload: # Uploading
+    runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts
+    if: ${{ github.repository_owner == 'pytorch' }}
+    needs: conda-py3_10-cuda11_6-test
+    env:
+      PYTORCH_ROOT: ${{ github.workspace }}/pytorch
+      BUILDER_ROOT: ${{ github.workspace }}/builder
+      PACKAGE_TYPE: conda
+      # TODO: This is a legacy variable that we eventually want to get rid of in
+      #       favor of GPU_ARCH_VERSION
+      DESIRED_CUDA: cu116
+      GPU_ARCH_VERSION: 11.6
+      GPU_ARCH_TYPE: cuda
+      SKIP_ALL_TESTS: 1
+      DESIRED_PYTHON: "3.10"
+    steps:
+      - name: Checkout PyTorch
+        uses: pytorch/pytorch/.github/actions/checkout-pytorch@master
+      - name: Setup Linux
+        uses: ./.github/actions/setup-linux
+      - name: Chown workspace
+        run: |
+          retry () {
+              "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
+          }
+          retry docker pull "${ALPINE_IMAGE}"
+          # Ensure the working directory gets chowned back to the current user
+          docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
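+      # The retry() helper above re-runs the docker pull up to two more times with a short
+      # sleep in between, to ride out transient registry or network failures.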
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: conda-py3_10-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
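+      # The "Upload binaries" step above runs binary_upload.sh inside a pinned miniconda image;
+      # the AWS/Anaconda secrets are expected to be blank on pull_request events, and DRY_RUN is
+      # only switched to "disabled" for nightly/tag pushes, so other runs presumably stay dry runs.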
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/workflows/generated-windows-binary-libtorch-debug-master.yml b/.github/workflows/generated-windows-binary-libtorch-debug-master.yml new file mode 100644 index 00000000000000..04188e958fecfe --- /dev/null +++ b/.github/workflows/generated-windows-binary-libtorch-debug-master.yml @@ -0,0 +1,247 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/windows_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: windows-binary-libtorch-debug + +on: + push: + branches: + - master + tags: + - 'ciflow/all/*' + - 'ciflow/trunk/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BUILD_ENVIRONMENT: windows-binary-libtorch-debug + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_RETRY_TEST_CASES: 1 + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: windows-binary-libtorch-debug-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + libtorch-cpu-shared-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cpu-shared-with-deps-debug + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cpu-shared-with-deps-debug-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cpu-shared-with-deps-debug-build + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ 
secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cpu-shared-with-deps-debug + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 diff --git a/.github/workflows/generated-windows-binary-libtorch-debug.yml b/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml similarity index 68% rename from .github/workflows/generated-windows-binary-libtorch-debug.yml rename to .github/workflows/generated-windows-binary-libtorch-debug-nightly.yml index 38ff3b9c519437..22a6b60056f4b2 100644 --- a/.github/workflows/generated-windows-binary-libtorch-debug.yml +++ b/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml @@ -37,6 +37,7 @@ concurrency: jobs: libtorch-cpu-shared-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -67,10 +68,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH 
(Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -111,7 +123,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-shared-with-deps-debug @@ -164,10 +176,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -177,7 +200,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-debug @@ -245,30 +268,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -285,12 +288,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-debug @@ -347,6 +347,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-shared-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -377,10 +378,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -421,7 +433,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-shared-without-deps-debug @@ -474,10 +486,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -487,7 +510,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-debug @@ -555,30 +578,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -595,12 +598,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: 
libtorch-cpu-shared-without-deps-debug @@ -657,6 +657,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -687,10 +688,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -731,7 +743,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-static-with-deps-debug @@ -784,10 +796,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -797,7 +820,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-debug @@ -865,30 +888,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -905,12 +908,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-debug @@ -967,6 +967,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -997,10 +998,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1041,7 +1053,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-static-without-deps-debug @@ -1094,10 +1106,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1107,7 +1130,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-debug @@ -1175,30 +1198,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1215,12 +1218,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: 
libtorch-cpu-static-without-deps-debug @@ -1277,6 +1277,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-shared-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1308,10 +1309,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1352,7 +1364,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-shared-with-deps-debug @@ -1406,10 +1418,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1419,7 +1442,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-debug @@ -1488,30 +1511,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1528,12 +1531,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-debug @@ -1590,6 +1590,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-shared-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1621,10 +1622,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1665,7 +1677,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-shared-without-deps-debug @@ -1719,10 +1731,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1732,7 +1755,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-debug @@ -1801,30 +1824,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1841,12 +1844,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-debug @@ -1903,6 +1903,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-static-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1934,10 +1935,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1978,7 +1990,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-static-with-deps-debug @@ -2032,10 +2044,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2045,7 +2068,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-debug @@ -2114,30 +2137,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2154,12 +2157,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-debug @@ -2216,6 +2216,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-static-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2247,10 +2248,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2291,7 +2303,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-static-without-deps-debug @@ -2345,10 +2357,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2358,7 +2381,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-debug @@ -2427,30 +2450,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2467,12 +2470,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-debug @@ -2529,6 +2529,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-shared-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2560,10 +2561,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2604,7 +2616,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-shared-with-deps-debug @@ -2658,10 +2670,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2671,7 +2694,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-debug @@ -2740,34 +2763,14 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } retry docker pull "${ALPINE_IMAGE}" # Ensure the working directory gets chowned back to the current user @@ -2780,12 +2783,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-debug @@ -2842,6 +2842,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-shared-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2873,10 +2874,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2917,7 +2929,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-shared-without-deps-debug @@ -2971,10 +2983,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2984,7 +3007,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-debug @@ -3053,30 +3076,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3093,12 +3096,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-debug @@ -3155,6 +3155,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-static-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -3186,10 +3187,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -3230,7 +3242,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-static-with-deps-debug @@ -3284,10 +3296,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -3297,7 +3320,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-debug @@ -3366,30 +3389,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3406,12 +3409,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-debug @@ -3468,6 +3468,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-static-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -3499,10 +3500,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -3543,7 +3555,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-static-without-deps-debug @@ -3597,10 +3609,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -3610,7 +3633,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-without-deps-debug @@ -3679,30 +3702,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3719,12 +3722,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-without-deps-debug @@ -3780,3 +3780,1247 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af + libtorch-cuda11_6-shared-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable 
SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-shared-with-deps-debug + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-with-deps-debug-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-debug-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # 
Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-debug + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-with-deps-debug-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-debug-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: 
cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-debug + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-shared-without-deps-debug + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-without-deps-debug-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-debug-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-debug + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-without-deps-debug-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-debug-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: shared-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-debug + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: static-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-static-with-deps-debug + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-with-deps-debug-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-debug-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: static-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-debug + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-with-deps-debug-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-debug-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: static-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-debug + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-debug-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: static-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-static-without-deps-debug + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-without-deps-debug-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-debug-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: static-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-debug + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-without-deps-debug-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-debug-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: debug + LIBTORCH_VARIANT: static-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-debug + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/workflows/generated-windows-binary-libtorch-release-master.yml b/.github/workflows/generated-windows-binary-libtorch-release-master.yml new file mode 100644 index 00000000000000..422cbb27cbb7e2 --- /dev/null +++ b/.github/workflows/generated-windows-binary-libtorch-release-master.yml @@ -0,0 +1,247 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/windows_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: windows-binary-libtorch-release + +on: + push: + branches: + - master + tags: + - 'ciflow/all/*' + - 'ciflow/trunk/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BUILD_ENVIRONMENT: windows-binary-libtorch-release + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_RETRY_TEST_CASES: 1 + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: windows-binary-libtorch-release-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + libtorch-cpu-shared-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cpu-shared-with-deps-release + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cpu-shared-with-deps-release-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cpu-shared-with-deps-release-build + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: 
${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cpu-shared-with-deps-release + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 diff --git a/.github/workflows/generated-windows-binary-libtorch-release.yml b/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml similarity index 68% rename from .github/workflows/generated-windows-binary-libtorch-release.yml rename to .github/workflows/generated-windows-binary-libtorch-release-nightly.yml index 262561c2b199d8..9ee9a85b3ce314 100644 --- a/.github/workflows/generated-windows-binary-libtorch-release.yml +++ b/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml @@ -37,6 +37,7 @@ concurrency: jobs: libtorch-cpu-shared-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -67,10 +68,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB 
EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -111,7 +123,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-shared-with-deps-release @@ -164,10 +176,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -177,7 +200,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-release @@ -245,30 +268,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -285,12 +288,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-release @@ -347,6 +347,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-shared-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -377,10 +378,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -421,7 +433,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-shared-without-deps-release @@ -474,10 +486,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -487,7 +510,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-release @@ -555,30 +578,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -595,12 +598,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: 
libtorch-cpu-shared-without-deps-release @@ -657,6 +657,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -687,10 +688,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -731,7 +743,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-static-with-deps-release @@ -784,10 +796,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -797,7 +820,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-release @@ -865,30 +888,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -905,12 +908,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-release @@ -967,6 +967,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cpu-static-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -997,10 +998,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1041,7 +1053,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cpu-static-without-deps-release @@ -1094,10 +1106,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1107,7 +1130,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-release @@ -1175,30 +1198,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1215,12 +1218,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: 
libtorch-cpu-static-without-deps-release @@ -1277,6 +1277,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-shared-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1308,10 +1309,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1352,7 +1364,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-shared-with-deps-release @@ -1406,10 +1418,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1419,7 +1442,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-release @@ -1488,30 +1511,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1528,12 +1531,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-with-deps-release @@ -1590,6 +1590,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-shared-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1621,10 +1622,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1665,7 +1677,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-shared-without-deps-release @@ -1719,10 +1731,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1732,7 +1755,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-release @@ -1801,30 +1824,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1841,12 +1844,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-shared-without-deps-release @@ -1903,6 +1903,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-static-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1934,10 +1935,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+        shell: powershell
+        run: |
+          Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
       # NOTE: These environment variables are put here so that they can be applied on every job equally
       # They are also here because setting them at a workflow level doesn't give us access to the
       # runner.temp variable, which we need.
@@ -1978,7 +1990,7 @@ jobs:
         shell: bash
         run: |
           "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v3
+      - uses: seemethere/upload-artifact-s3@v4
         if: always()
         with:
           name: libtorch-cuda11_3-static-with-deps-release
@@ -2032,10 +2044,21 @@ jobs:
           echo "ami-id: $(get_ec2_metadata ami-id)"
           echo "instance-id: $(get_ec2_metadata instance-id)"
           echo "instance-type: $(get_ec2_metadata instance-type)"
+          echo "system info $(uname -a)"
       - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
         uses: seemethere/add-github-ssh-key@v1
         with:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
+      - name: Enable long paths on Windows
+        shell: powershell
+        run: |
+          Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1
+      # Since it's just a defensive command, the workflow should continue even the command fails
+      - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory.
+        shell: powershell
+        run: |
+          Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
       # NOTE: These environment variables are put here so that they can be applied on every job equally
       # They are also here because setting them at a workflow level doesn't give us access to the
       # runner.temp variable, which we need.
@@ -2045,7 +2068,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-release @@ -2114,30 +2137,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2154,12 +2157,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-with-deps-release @@ -2216,6 +2216,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_3-static-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2247,10 +2248,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2291,7 +2303,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_3-static-without-deps-release @@ -2345,10 +2357,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2358,7 +2381,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-release @@ -2427,30 +2450,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2467,12 +2470,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_3-static-without-deps-release @@ -2529,6 +2529,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-shared-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2560,10 +2561,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2604,7 +2616,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-shared-with-deps-release @@ -2658,10 +2670,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2671,7 +2694,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-release @@ -2740,34 +2763,14 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } retry docker pull "${ALPINE_IMAGE}" # Ensure the working directory gets chowned back to the current user @@ -2780,12 +2783,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-with-deps-release @@ -2842,6 +2842,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-shared-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2873,10 +2874,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2917,7 +2929,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-shared-without-deps-release @@ -2971,10 +2983,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -2984,7 +3007,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-release @@ -3053,30 +3076,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3093,12 +3096,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-shared-without-deps-release @@ -3155,6 +3155,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-static-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -3186,10 +3187,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -3230,7 +3242,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-static-with-deps-release @@ -3284,10 +3296,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -3297,7 +3320,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-release @@ -3366,30 +3389,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3406,12 +3409,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-with-deps-release @@ -3468,6 +3468,7 @@ jobs: # Prune all of the docker images docker system prune -af libtorch-cuda11_5-static-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -3499,10 +3500,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -3543,7 +3555,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: libtorch-cuda11_5-static-without-deps-release @@ -3597,10 +3609,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -3610,7 +3633,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-without-deps-release @@ -3679,30 +3702,10 @@ jobs: # without this value pip does not get installed for some reason DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3719,12 +3722,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: libtorch-cuda11_5-static-without-deps-release @@ -3780,3 +3780,1247 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af + libtorch-cuda11_6-shared-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] 
Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-shared-with-deps-release + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-with-deps-release-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-release-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function 
get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-release + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-with-deps-release-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-with-deps-release-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of 
GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-with-deps-release + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-shared-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-shared-without-deps-release + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-without-deps-release-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-release-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path 
"HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-release + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-shared-without-deps-release-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-shared-without-deps-release-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: shared-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R 
"$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-shared-without-deps-release + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-with-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: static-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-static-with-deps-release + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-with-deps-release-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-release-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: static-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-release + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-with-deps-release-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-with-deps-release-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: static-with-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-with-deps-release + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + libtorch-cuda11_6-static-without-deps-release-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: static-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: libtorch-cuda11_6-static-without-deps-release + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-without-deps-release-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-release-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: static-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path 
"HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-release + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + libtorch-cuda11_6-static-without-deps-release-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: libtorch-cuda11_6-static-without-deps-release-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: libtorch + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + LIBTORCH_CONFIG: release + LIBTORCH_VARIANT: static-without-deps + # This is a dummy value for libtorch to work correctly with our batch scripts + # without this value pip does not get installed for some reason + DESIRED_PYTHON: "3.7" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R 
"$(id -u):$(id -g)" . + - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: libtorch-cuda11_6-static-without-deps-release + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af diff --git a/.github/workflows/generated-windows-binary-wheel-master.yml b/.github/workflows/generated-windows-binary-wheel-master.yml new file mode 100644 index 00000000000000..befb73dd15c241 --- /dev/null +++ b/.github/workflows/generated-windows-binary-wheel-master.yml @@ -0,0 +1,241 @@ +# @generated DO NOT EDIT MANUALLY + +# Template is at: .github/templates/windows_binary_build_workflow.yml.j2 +# Generation script: .github/scripts/generate_ci_workflows.py +name: windows-binary-wheel + +on: + push: + branches: + - master + tags: + - 'ciflow/all/*' + - 'ciflow/trunk/*' + workflow_dispatch: + +env: + # Needed for conda builds + ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" + ANACONDA_USER: pytorch + AWS_DEFAULT_REGION: us-east-1 + BUILD_ENVIRONMENT: windows-binary-wheel + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + IN_CI: 1 + IS_GHA: 1 + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + PR_NUMBER: ${{ github.event.pull_request.number }} + PYTORCH_RETRY_TEST_CASES: 1 + SHA1: ${{ github.event.pull_request.head.sha || github.sha }} + SKIP_ALL_TESTS: 1 +concurrency: + group: windows-binary-wheel-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + wheel-py3_7-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: wheel-py3_7-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_7-cuda11_3-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_7-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.7" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_7-cuda11_3 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 diff --git a/.github/workflows/generated-windows-binary-wheel.yml b/.github/workflows/generated-windows-binary-wheel-nightly.yml similarity index 68% rename from .github/workflows/generated-windows-binary-wheel.yml rename to .github/workflows/generated-windows-binary-wheel-nightly.yml index 0e763245267990..95e163841eabba 100644 --- a/.github/workflows/generated-windows-binary-wheel.yml +++ b/.github/workflows/generated-windows-binary-wheel-nightly.yml @@ -37,6 +37,7 @@ concurrency: jobs: wheel-py3_7-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -63,10 +64,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch 
directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -107,7 +119,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: wheel-py3_7-cpu @@ -156,10 +168,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -169,7 +192,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: wheel-py3_7-cpu @@ -233,30 +256,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -273,12 +276,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > 
"/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: wheel-py3_7-cpu @@ -335,6 +335,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_7-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -362,10 +363,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -406,7 +418,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: wheel-py3_7-cuda11_3 @@ -456,10 +468,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -469,7 +492,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: wheel-py3_7-cuda11_3 @@ -534,30 +557,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -574,12 +577,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: wheel-py3_7-cuda11_3 @@ -636,6 +636,7 @@ jobs: # Prune all of the docker images docker system prune -af wheel-py3_7-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -663,10 +664,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -707,7 +719,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: name: wheel-py3_7-cuda11_5 @@ -757,10 +769,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -770,7 +793,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: wheel-py3_7-cuda11_5 @@ -835,30 +858,10 @@ jobs: SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -875,12 +878,9 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: name: wheel-py3_7-cuda11_5 @@ -936,7 +936,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of 
the docker images docker system prune -af - wheel-py3_8-cpu-build: + wheel-py3_7-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -945,10 +946,11 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - name: Display EC2 information shell: bash @@ -963,10 +965,20 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1007,10 +1019,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_8-cpu + name: wheel-py3_7-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1027,10 +1039,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cpu-test: # Testing + wheel-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cpu-build - runs-on: windows.4xlarge + needs: wheel-py3_7-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1038,10 +1050,11 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - name: Display EC2 information shell: bash @@ -1056,10 +1069,20 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1069,10 +1092,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cpu + name: wheel-py3_7-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1118,45 +1141,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cpu-upload: # Uploading + wheel-py3_7-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cpu-test + needs: wheel-py3_7-cuda11_6-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.7" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1173,15 +1177,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cpu + name: wheel-py3_7-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -1234,7 +1235,8 @@ 
jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_8-cuda11_3-build: + wheel-py3_8-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1243,9 +1245,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: @@ -1262,10 +1263,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1306,10 +1318,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_8-cuda11_3 + name: wheel-py3_8-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1326,10 +1338,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_3-test: # Testing + wheel-py3_8-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_3-build - runs-on: windows.8xlarge.nvidia.gpu + needs: wheel-py3_8-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1337,9 +1349,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: @@ -1356,10 +1367,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue 
even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -1369,10 +1391,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_3 + name: wheel-py3_8-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1418,46 +1440,25 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_3-upload: # Uploading + wheel-py3_8-cpu-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_3-test + needs: wheel-py3_8-cpu-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -1474,15 +1475,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_3 + name: wheel-py3_8-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || 
(startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -1535,7 +1533,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_8-cuda11_5-build: + wheel-py3_8-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1544,8 +1543,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" @@ -1563,10 +1562,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1607,10 +1617,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_8-cuda11_5 + name: wheel-py3_8-cuda11_3 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1627,9 +1637,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_5-test: # Testing + wheel-py3_8-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_5-build + needs: wheel-py3_8-cuda11_3-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -1638,8 +1648,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" @@ -1657,10 +1667,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1670,10 +1691,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_5 + name: wheel-py3_8-cuda11_3 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1719,48 +1740,1227 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_5-upload: # Uploading + wheel-py3_8-cuda11_3-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_5-test + needs: wheel-py3_8-cuda11_3-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" - - name: Chown workspace - run: | + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_8-cuda11_3 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
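In the upload jobs above, the "Set DRY_RUN" step only writes `DRY_RUN=disabled` for pushes to the `nightly` branch or to tags outside `refs/tags/ciflow/`, so any other trigger leaves `DRY_RUN` at whatever default `binary_upload.sh` applies, and "Set UPLOAD_CHANNEL" redirects uploads to the `test` channel when the pushed tag carries a release-candidate suffix. A small illustration of that tag check (the tag names are hypothetical and used only to show how the glob matches):

    # Sketch only: how the *-rc[0-9]* glob in "Set UPLOAD_CHANNEL (only for tagged pushes)" behaves.
    - name: Illustrate UPLOAD_CHANNEL selection
      shell: bash
      run: |
        for ref_name in v1.13.0-rc2 v1.13.0; do
          if [[ ${ref_name} = *-rc[0-9]* ]]; then
            echo "${ref_name} -> UPLOAD_CHANNEL=test"        # release candidates go to the test channel
          else
            echo "${ref_name} -> UPLOAD_CHANNEL left as-is"  # final release tags keep the default channel
          fi
        done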
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + wheel-py3_8-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: wheel-py3_8-cuda11_5 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_8-cuda11_5-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_8-cuda11_5-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_8-cuda11_5 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_8-cuda11_5-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_8-cuda11_5-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_8-cuda11_5 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + wheel-py3_8-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
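One detail worth noting about the cuda11_6 jobs introduced above: they disable Defender scanning with `Set-MpPreference` and without `-ErrorAction Ignore`, whereas the other jobs in this file use `Add-MpPreference ... -ErrorAction Ignore`. In standard Defender cmdlet terms, `Add-MpPreference -ExclusionPath` appends a path to the existing exclusion list while `Set-MpPreference -ExclusionPath` replaces it, and omitting `-ErrorAction Ignore` means an error from the cmdlet can fail the step; whether that difference is intentional cannot be told from the diff. For reference, the more defensive variant used by the other jobs looks like this:

    # Sketch only; copies the append-and-ignore-errors form used by the other Windows jobs in this file.
    - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory.
      shell: powershell
      run: |
        Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore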
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: wheel-py3_8-cuda11_6 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_8-cuda11_6-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_8-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_8-cuda11_6 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_8-cuda11_6-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_8-cuda11_6-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.8" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_8-cuda11_6 + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + wheel-py3_9-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: wheel-py3_9-cpu + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_9-cpu-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_9-cpu-build + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_9-cpu + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_9-cpu-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_9-cpu-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | + retry () { + "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") + } + retry docker pull "${ALPINE_IMAGE}" + # Ensure the working directory gets chowned back to the current user + docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Clean workspace + run: | + rm -rf "${GITHUB_WORKSPACE}" + mkdir "${GITHUB_WORKSPACE}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Clone pytorch/pytorch + uses: actions/checkout@v2 + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_9-cpu + path: "${{ runner.temp }}/artifacts/" + - name: Set DRY_RUN (only for tagged pushes) + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + run: | + echo "DRY_RUN=disabled" >> "$GITHUB_ENV" + - name: Set UPLOAD_CHANNEL (only for tagged pushes) + if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + run: | + # reference ends with an RC suffix + if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then + echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV" + fi + - name: Upload binaries + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + UPLOAD_SUBFOLDER: "${{ env.DESIRED_CUDA }}" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} + ANACONDA_API_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} + run: | + docker run --rm -i \ + -e ANACONDA_API_TOKEN \ + -e AWS_ACCESS_KEY_ID \ + -e AWS_SECRET_ACCESS_KEY \ + -e DRY_RUN \ + -e PACKAGE_TYPE \ + -e PKG_DIR=/artifacts \ + -e UPLOAD_CHANNEL \ + -e UPLOAD_SUBFOLDER \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -v "${GITHUB_WORKSPACE}:/v" \ + -w /v \ + 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ + bash -c '.circleci/scripts/binary_upload.sh' + - name: Hold runner for 2 hours or until ssh sessions have drained + # Always hold for active ssh sessions + if: always() + run: .github/scripts/wait_for_ssh_to_drain.sh + - name: Chown workspace + if: always() + run: | + # Ensure the working directory gets chowned back to the current user + docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
+ - name: Kill containers, clean up images + if: always() + run: | + # ignore expansion of "docker ps -q" since it could be empty + # shellcheck disable=SC2046 + docker stop $(docker ps -q) || true + # Prune all of the docker images + docker system prune -af + wheel-py3_9-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} + runs-on: windows.4xlarge + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. 
+ - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Build PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" + - uses: seemethere/upload-artifact-s3@v4 + if: always() + with: + name: wheel-py3_9-cuda11_3 + retention-days: 14 + if-no-files-found: error + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_9-cuda11_3-test: # Testing + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_9-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu + timeout-minutes: 240 + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Display EC2 information + shell: bash + run: | + set -euo pipefail + function get_ec2_metadata() { + # Pulled from instance metadata endpoint for EC2 + # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html + category=$1 + curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" + } + echo "ami-id: $(get_ec2_metadata ami-id)" + echo "instance-id: $(get_ec2_metadata instance-id)" + echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: seemethere/add-github-ssh-key@v1 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore + # NOTE: These environment variables are put here so that they can be applied on every job equally + # They are also here because setting them at a workflow level doesn't give us access to the + # runner.temp variable, which we need. + - name: Populate binary env + shell: bash + run: | + echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" + echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" + echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" + - uses: seemethere/download-artifact-s3@v3 + name: Download Build Artifacts + with: + name: wheel-py3_9-cuda11_3 + path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" + - name: Checkout PyTorch + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} + submodules: recursive + path: pytorch + - name: Clean PyTorch checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: pytorch + - name: Checkout pytorch/builder + uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + with: + ref: main + submodules: recursive + repository: pytorch/builder + path: builder + - name: Clean pytorch/builder checkout + run: | + # Remove any artifacts from the previous checkouts + git clean -fxd + working-directory: builder + - name: Populate binary env + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" + - name: Test PyTorch binary + shell: bash + run: | + "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" + - name: Wait until all sessions have drained + shell: powershell + working-directory: pytorch + if: always() + timeout-minutes: 120 + run: | + .github\scripts\wait_for_ssh_to_drain.ps1 + - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) + shell: powershell + working-directory: pytorch + if: always() + run: | + .github\scripts\kill_active_ssh_sessions.ps1 + wheel-py3_9-cuda11_3-upload: # Uploading + runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts + if: ${{ github.repository_owner == 'pytorch' }} + needs: wheel-py3_9-cuda11_3-test + env: + PYTORCH_ROOT: ${{ github.workspace }}/pytorch + BUILDER_ROOT: ${{ github.workspace }}/builder + PACKAGE_TYPE: wheel + # TODO: This is a legacy variable that we eventually want to get rid of in + # favor of GPU_ARCH_VERSION + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda + SKIP_ALL_TESTS: 1 + DESIRED_PYTHON: "3.9" + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Chown workspace + run: | retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } @@ -1775,15 +2975,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_5 + name: wheel-py3_9-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && 
(github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -1836,7 +3033,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_9-cpu-build: + wheel-py3_9-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -1845,8 +3043,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: @@ -1863,10 +3062,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1907,10 +3117,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_9-cpu + name: wheel-py3_9-cuda11_5 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1927,10 +3137,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cpu-test: # Testing + wheel-py3_9-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cpu-build - runs-on: windows.4xlarge + needs: wheel-py3_9-cuda11_5-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1938,8 +3148,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: @@ -1956,10 +3167,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
@@ -1969,10 +3191,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cpu + name: wheel-py3_9-cuda11_5 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2018,45 +3240,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cpu-upload: # Uploading + wheel-py3_9-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cpu-test + needs: wheel-py3_9-cuda11_5-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2073,15 +3276,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cpu + name: wheel-py3_9-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2134,7 +3334,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_9-cuda11_3-build: + wheel-py3_9-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2143,8 +3344,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that 
we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" @@ -2162,10 +3363,20 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2206,10 +3417,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_9-cuda11_3 + name: wheel-py3_9-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -2226,9 +3437,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_3-test: # Testing + wheel-py3_9-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_3-build + needs: wheel-py3_9-cuda11_6-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -2237,8 +3448,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" @@ -2256,10 +3467,20 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
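The rename from a CPU to a CUDA configuration has to be applied consistently in three places: the upload step of the build job, the download step of the test job, and the `needs:` edge between them. A minimal sketch of that hand-off for the configuration in this hunk, assuming only what is visible above (the workflow name and trigger are illustrative; the action versions are the ones the hunks pin):

```yaml
name: wheel-artifact-handoff-sketch   # illustrative
on: workflow_dispatch
jobs:
  wheel-py3_9-cuda11_6-build:
    runs-on: windows.4xlarge
    steps:
      - uses: seemethere/upload-artifact-s3@v4
        if: always()
        with:
          name: wheel-py3_9-cuda11_6              # one artifact per binary configuration
          retention-days: 14
          if-no-files-found: error
          path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}"   # set earlier in the job via GITHUB_ENV
  wheel-py3_9-cuda11_6-test:
    needs: wheel-py3_9-cuda11_6-build
    runs-on: windows.8xlarge.nvidia.gpu           # CUDA configs test on GPU runners
    steps:
      - uses: seemethere/download-artifact-s3@v3
        name: Download Build Artifacts
        with:
          name: wheel-py3_9-cuda11_6               # must match the upload name exactly
          path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}"
```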
@@ -2269,10 +3490,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cuda11_3 + name: wheel-py3_9-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2318,46 +3539,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_3-upload: # Uploading + wheel-py3_9-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_3-test + needs: wheel-py3_9-cuda11_6-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2374,15 +3575,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cuda11_3 + name: wheel-py3_9-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -2435,7 +3633,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_9-cuda11_5-build: + wheel-py3_10-cpu-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2444,11 +3643,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a 
legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - name: Display EC2 information shell: bash @@ -2463,10 +3661,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2507,10 +3716,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_9-cuda11_5 + name: wheel-py3_10-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -2527,10 +3736,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_5-test: # Testing + wheel-py3_10-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_5-build - runs-on: windows.8xlarge.nvidia.gpu + needs: wheel-py3_10-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -2538,11 +3747,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - name: Display EC2 information shell: bash @@ -2557,10 +3765,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
+ shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2570,10 +3789,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cuda11_5 + name: wheel-py3_10-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2619,46 +3838,25 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_5-upload: # Uploading + wheel-py3_10-cpu-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_5-test + needs: wheel-py3_10-cpu-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2675,15 +3873,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cuda11_5 + name: wheel-py3_10-cpu path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} 
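Note how the upload jobs shrink in these hunks: the inline "Display EC2 information" and "Log in to ECR" scripts are dropped in favor of the shared checkout-pytorch and setup-linux actions. A sketch of the resulting prologue, assuming nothing beyond what the hunk above shows (the remaining steps of the real job are omitted):

```yaml
jobs:
  wheel-py3_10-cpu-upload:   # excerpt; the real job also carries the full env block and later upload steps
    runs-on: linux.2xlarge   # self-hosted runner used to download EC2 artifacts
    if: ${{ github.repository_owner == 'pytorch' }}
    needs: wheel-py3_10-cpu-test
    steps:
      - name: Checkout PyTorch
        uses: pytorch/pytorch/.github/actions/checkout-pytorch@master
      - name: Setup Linux
        uses: ./.github/actions/setup-linux
      - uses: seemethere/download-artifact-s3@v3
        name: Download Build Artifacts
        with:
          name: wheel-py3_10-cpu
          path: "${{ runner.temp }}/artifacts/"
```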
@@ -2736,7 +3931,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_10-cpu-build: + wheel-py3_10-cuda11_3-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -2745,8 +3941,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: @@ -2763,10 +3960,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2807,10 +4015,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_10-cpu + name: wheel-py3_10-cuda11_3 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -2827,10 +4035,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cpu-test: # Testing + wheel-py3_10-cuda11_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cpu-build - runs-on: windows.4xlarge + needs: wheel-py3_10-cuda11_3-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -2838,8 +4046,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: @@ -2856,10 +4065,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive 
command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -2869,10 +4089,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_10-cpu + name: wheel-py3_10-cuda11_3 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2918,45 +4138,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cpu-upload: # Uploading + wheel-py3_10-cuda11_3-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cpu-test + needs: wheel-py3_10-cuda11_3-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu113 + GPU_ARCH_VERSION: 11.3 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -2973,15 +4174,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_10-cpu + name: wheel-py3_10-cuda11_3 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && 
(github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -3034,7 +4232,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_10-cuda11_3-build: + wheel-py3_10-cuda11_5-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -3043,8 +4242,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" @@ -3062,10 +4261,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
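The only substantive change per configuration is the env block: moving a job from CUDA 11.3 to 11.5 means renaming the job and updating DESIRED_CUDA and GPU_ARCH_VERSION in lock-step. An excerpt showing how one configuration is spelled out, with the values copied from the hunk above; CPU configurations drop GPU_ARCH_VERSION and set GPU_ARCH_TYPE: cpu instead:

```yaml
wheel-py3_10-cuda11_5-build:   # job name encodes PACKAGE_TYPE, DESIRED_PYTHON and DESIRED_CUDA
  runs-on: windows.4xlarge
  timeout-minutes: 240
  env:
    PYTORCH_ROOT: ${{ github.workspace }}/pytorch
    BUILDER_ROOT: ${{ github.workspace }}/builder
    PACKAGE_TYPE: wheel
    DESIRED_CUDA: cu115        # legacy spelling of the CUDA version
    GPU_ARCH_VERSION: 11.5
    GPU_ARCH_TYPE: cuda
    SKIP_ALL_TESTS: 1
    DESIRED_PYTHON: "3.10"
```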
@@ -3106,10 +4316,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_10-cuda11_3 + name: wheel-py3_10-cuda11_5 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -3126,9 +4336,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cuda11_3-test: # Testing + wheel-py3_10-cuda11_5-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cuda11_3-build + needs: wheel-py3_10-cuda11_5-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -3137,8 +4347,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" @@ -3156,10 +4366,21 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + # Since it's just a defensive command, the workflow should continue even the command fails + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
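The NOTE above explains why these paths are exported from a step rather than at the workflow level: workflow-level env cannot see runner.temp. The export itself is the short bash step that opens the next hunk; a consolidated sketch of it follows (the step name is illustrative, and the third line is redirected to $GITHUB_ENV here for consistency, which the raw hunk omits):

```yaml
- name: Populate binary env   # illustrative step name
  shell: bash
  run: |
    # Appending to $GITHUB_ENV makes these variables visible to every later step in the job.
    echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
    echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
    echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" >> "${GITHUB_ENV}"
```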
@@ -3169,10 +4390,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_10-cuda11_3 + name: wheel-py3_10-cuda11_5 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -3218,46 +4439,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cuda11_3-upload: # Uploading + wheel-py3_10-cuda11_5-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cuda11_3-test + needs: wheel-py3_10-cuda11_5-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu115 + GPU_ARCH_VERSION: 11.5 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3274,15 +4475,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_10-cuda11_3 + name: wheel-py3_10-cuda11_5 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} @@ -3335,7 +4533,8 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - wheel-py3_10-cuda11_5-build: + wheel-py3_10-cuda11_6-build: + if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 env: @@ -3344,8 +4543,8 @@ jobs: PACKAGE_TYPE: wheel # 
TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" @@ -3363,10 +4562,20 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. @@ -3407,10 +4616,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v3 + - uses: seemethere/upload-artifact-s3@v4 if: always() with: - name: wheel-py3_10-cuda11_5 + name: wheel-py3_10-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -3427,9 +4636,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cuda11_5-test: # Testing + wheel-py3_10-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cuda11_5-build + needs: wheel-py3_10-cuda11_6-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -3438,8 +4647,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" @@ -3457,10 +4666,20 @@ jobs: echo "ami-id: $(get_ec2_metadata ami-id)" echo "instance-id: $(get_ec2_metadata instance-id)" echo "instance-type: $(get_ec2_metadata instance-type)" + echo "system info $(uname -a)" - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 + - name: Enable long paths on Windows + shell: powershell + run: | + Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 + - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. + shell: powershell + run: | + Set-MpPreference -ExclusionPath $(Get-Location).tostring() # NOTE: These environment variables are put here so that they can be applied on every job equally # They are also here because setting them at a workflow level doesn't give us access to the # runner.temp variable, which we need. 
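For reference, here is the "Display EC2 information" step as it reads once these hunks land, reassembled from the context lines plus the added `uname -a` echo; only this step changes, the surrounding job is untouched:

```yaml
- name: Display EC2 information
  shell: bash
  run: |
    set -euo pipefail
    function get_ec2_metadata() {
      # Pulled from the EC2 instance metadata endpoint, see
      # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
      category=$1
      curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
    }
    echo "ami-id: $(get_ec2_metadata ami-id)"
    echo "instance-id: $(get_ec2_metadata instance-id)"
    echo "instance-type: $(get_ec2_metadata instance-type)"
    echo "system info $(uname -a)"   # the line added by this change
```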
@@ -3470,10 +4689,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_10-cuda11_5 + name: wheel-py3_10-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -3519,46 +4738,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cuda11_5-upload: # Uploading + wheel-py3_10-cuda11_6-upload: # Uploading runs-on: linux.2xlarge # self hosted runner to download ec2 artifacts if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cuda11_5-test + needs: wheel-py3_10-cuda11_6-test env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu115 - GPU_ARCH_VERSION: 11.5 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.10" steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - - name: Log in to ECR - env: - AWS_RETRY_MODE: standard - AWS_MAX_ATTEMPTS: 5 - run: | - AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ - --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Setup Linux + uses: ./.github/actions/setup-linux - name: Chown workspace run: | retry () { @@ -3575,15 +4774,12 @@ jobs: uses: seemethere/add-github-ssh-key@v1 with: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - - name: Preserve github env variables for use in docker - run: | - env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" - name: Clone pytorch/pytorch uses: actions/checkout@v2 - - uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b + - uses: seemethere/download-artifact-s3@v3 name: Download Build Artifacts with: - name: wheel-py3_10-cuda11_5 + name: wheel-py3_10-cuda11_6 path: "${{ runner.temp }}/artifacts/" - name: Set DRY_RUN (only for tagged pushes) if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index d98a81da5e9b13..05317c5a92875c 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -8,6 +8,7 @@ on: jobs: quick-checks: + name: quick-checks runs-on: ubuntu-18.04 steps: - name: Setup Python @@ -15,8 +16,9 @@ jobs: with: python-version: 
3.x architecture: x64 + # [see note: pytorch repo ref] - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Clean PyTorch checkout run: | # Remove any artifacts from the previous checkouts @@ -113,6 +115,7 @@ jobs: .github/scripts/lint_test_ownership.py clang-format: + name: clang-format runs-on: ubuntu-18.04 if: ${{ github.event_name == 'pull_request' }} steps: @@ -121,10 +124,10 @@ jobs: with: python-version: 3.x architecture: x64 - - name: Fetch PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - fetch-depth: 0 # deep clone, to allow us to use git merge-base + # [see note: pytorch repo ref] + # deep clone (fetch-depth 0 required to use git merge-base) + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Run clang-format env: BASE_SHA: ${{ github.event.pull_request.base.sha }} @@ -153,6 +156,7 @@ jobs: exit 1 py2-setup-validate-errormsg: + name: py2-setup-validate-errormsg runs-on: ubuntu-18.04 steps: - name: Setup Python @@ -160,8 +164,9 @@ jobs: with: python-version: 2.x architecture: x64 + # [see note: pytorch repo ref] - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Attempt to run setup.py run: | if ! python2 setup.py | grep -q "Python 2 has reached end-of-life and is no longer supported by PyTorch."; then @@ -172,6 +177,7 @@ jobs: run: python2 -m py_compile torch/utils/collect_env.py shellcheck: + name: shellcheck runs-on: ubuntu-18.04 steps: - name: Setup Python @@ -179,8 +185,9 @@ jobs: with: python-version: 3.x architecture: x64 + # [see note: pytorch repo ref] - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Install requirements id: requirements run: | @@ -188,8 +195,9 @@ jobs: - name: Install Jinja2 run: | pip3 install Jinja2==3.0.1 --user + # [see note: pytorch repo ref] - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Regenerate workflows id: generate_workflows run: .github/scripts/generate_ci_workflows.py @@ -251,6 +259,7 @@ jobs: rm actionlint toc: + name: toc runs-on: ubuntu-18.04 # https://github.com/actions/virtual-environments/issues/599#issuecomment-602754687 env: @@ -258,8 +267,9 @@ jobs: steps: - name: Setup Node uses: actions/setup-node@v2 + # [see note: pytorch repo ref] - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Install markdown-toc run: npm install -g markdown-toc - name: Regenerate ToCs and check that they didn't change @@ -287,6 +297,7 @@ jobs: fi flake8-py3: + name: flake8-py3 runs-on: ubuntu-18.04 steps: - name: Setup Python @@ -294,10 +305,10 @@ jobs: with: python-version: 3.x architecture: x64 - - name: Fetch PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - fetch-depth: 2 # to allow us to use github.event.pull_request.head.sha + # [see note: pytorch repo ref] + # fetch-depth 2 required to allow us to use github.event.pull_request.head.sha + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Prepare output dir 
with HEAD commit SHA env: HEAD_SHA: ${{ github.event.pull_request.head.sha }} @@ -347,7 +358,8 @@ jobs: mode: json clang-tidy: - runs-on: linux.2xlarge + name: clang-tidy + runs-on: [self-hosted, linux.2xlarge] container: # ubuntu20.04-cuda11.2-py3.8-tidy11 image: ghcr.io/pytorch/cilint-clang-tidy:d8f0c777964d0dd8a147360de80aed1a13eb613a @@ -356,10 +368,12 @@ jobs: run: | rm -rf "${GITHUB_WORKSPACE}" mkdir "${GITHUB_WORKSPACE}" + # [see note: pytorch repo ref] + # deep clone (fetch-depth 0) to allow tools/linter/clang_tidy.py to do its thing - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master with: - fetch-depth: 0 # to allow tools/linter/clang_tidy.py to do its thing + no-sudo: true - name: Prepare output dir with HEAD commit SHA env: HEAD_SHA: ${{ github.event.pull_request.head.sha }} @@ -398,10 +412,12 @@ jobs: python3 -m tools.linter.clang_tidy \ --paths \ + torch/csrc/cuda \ torch/csrc/fx \ torch/csrc/utils \ torch/csrc/generic \ torch/csrc/deploy \ + torch/csrc/onnx \ torch/csrc/tensor \ --clang-tidy-exe "$(which clang-tidy)" \ --disable-progress-bar 2>&1 | tee -a "${GITHUB_WORKSPACE}"/clang-tidy-output.txt @@ -440,6 +456,7 @@ jobs: mode: json cmakelint: + name: cmakelint runs-on: ubuntu-18.04 steps: - name: Setup Python @@ -447,8 +464,9 @@ jobs: with: python-version: 3.x architecture: x64 - - name: Fetch PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Install dependencies run: | set -eux @@ -462,6 +480,7 @@ jobs: xargs -0 cmakelint --config=.cmakelintrc --spaces=2 --quiet mypy: + name: mypy runs-on: ubuntu-18.04 steps: - name: Setup Python @@ -469,8 +488,9 @@ jobs: with: python-version: 3.8 architecture: x64 - - name: Fetch PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Install dependencies run: | set -eux @@ -502,6 +522,64 @@ jobs: false fi + test-tools: + name: Test tools + if: ${{ github.repository == 'pytorch/pytorch' }} + runs-on: ubuntu-18.04 + steps: + - name: Setup Python + uses: actions/setup-python@v2 + with: + python-version: 3.8 + architecture: x64 + # [see note: pytorch repo ref] + # deep clone (fetch-depth 0) required, to allow us to use git log + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Install dependencies + # mypy and boto3 versions copied from + # .circleci/docker/common/install_conda.sh + run: | + set -eux + python3 -mpip install -r requirements.txt + python3 -mpip install boto3==1.16.34 + pip3 install typing-extensions==3.10 --user + pip3 install -r requirements-flake8.txt --user + python3 -mpip install -r requirements.txt --user + python3 -mpip install mypy==0.960 --user + make setup_lint + - name: Test tools + run: | + python3 -m unittest discover -vs tools/test -p 'test_*.py' + python3 -m unittest discover -vs .github/scripts -p 'test_*.py' + + test_collect_env: + if: ${{ github.repository == 'pytorch/pytorch' }} + name: Test collect_env + runs-on: ubuntu-18.04 + strategy: + matrix: + with_torch: [with_torch, without_torch] + steps: + - name: Setup Python + uses: actions/setup-python@v2 + with: + python-version: 3.8 + architecture: x64 + # [see note: pytorch repo ref] + # deep 
clone (fetch-depth 0) required, to allow us to use git log + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + - name: Install torch + if: matrix.with_torch == 'with_torch' + run: | + # Doesn't really matter what torch version, we just need ANY torch installed + pip install 'torch==1.*' + - name: Run collect_env.py + run: | + # All we need to see is that it passes + python3 torch/utils/collect_env.py + concurrency: - group: lint-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} cancel-in-progress: true diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml new file mode 100644 index 00000000000000..3322b2097a17dd --- /dev/null +++ b/.github/workflows/nightly.yml @@ -0,0 +1,33 @@ +name: nightly + +on: + schedule: + - cron: 0 0 * * * + push: + tags: + - ciflow/nightly/* + workflow_dispatch: + + +concurrency: + group: ${{ github.workflow }}--${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + docs-build: + name: docs build + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-gcc5.4 + docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4 + + docs-push: + name: docs push + uses: ./.github/workflows/_docs.yml + needs: docs-build + with: + build-environment: linux-xenial-py3.7-gcc5.4 + docker-image: ${{ needs.docs-build.outputs.docker-image }} + push: true + secrets: + GH_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} diff --git a/.github/workflows/periodic.yml b/.github/workflows/periodic.yml new file mode 100644 index 00000000000000..972041d24c13da --- /dev/null +++ b/.github/workflows/periodic.yml @@ -0,0 +1,206 @@ +name: periodic + +on: + schedule: + - cron: 45 0,4,8,12,16,20 * * * + push: + tags: + - ciflow/periodic/* + - ciflow/all/* + workflow_dispatch: + +concurrency: + group: ${{ github.workflow }}--${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + linux-bionic-cuda11_5-py3_7-gcc7-build: + name: linux-bionic-cuda11.5-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-cuda11.5-py3.7-gcc7 + docker-image-name: pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7 + + linux-bionic-cuda11_5-py3_7-gcc7-test: + name: linux-bionic-cuda11.5-py3.7-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_5-py3_7-gcc7-build + with: + build-environment: linux-bionic-cuda11.5-py3.7-gcc7 + docker-image: ${{ needs.linux-bionic-cuda11_5-py3_7-gcc7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + ]} + + linux-bionic-cuda11_6-py3_7-gcc7-build: + name: linux-bionic-cuda11.6-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-cuda11.6-py3.7-gcc7 + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 + + linux-bionic-cuda11_6-py3_7-gcc7-test: + name: linux-bionic-cuda11.6-py3.7-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_6-py3_7-gcc7-build + with: + build-environment: linux-bionic-cuda11.6-py3.7-gcc7 + docker-image: 
${{ needs.linux-bionic-cuda11_6-py3_7-gcc7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + ]} + + libtorch-linux-bionic-cuda11_5-py3_7-gcc7-build: + name: libtorch-linux-bionic-cuda11.5-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: libtorch-linux-bionic-cuda11.5-py3.7-gcc7 + docker-image-name: pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7 + build-generates-artifacts: false + + libtorch-linux-bionic-cuda11_6-py3_7-gcc7-build: + name: libtorch-linux-bionic-cuda11.6-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: libtorch-linux-bionic-cuda11.6-py3.7-gcc7 + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 + build-generates-artifacts: false + + linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-build: + name: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck + docker-image-name: pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 + + linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-test: + name: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-build + with: + build-environment: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck + docker-image: ${{ needs.linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + ]} + + linux-xenial-cuda11_3-py3_7-gcc7-debug-build: + name: linux-xenial-cuda11.3-py3.7-gcc7-debug + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-cuda11.3-py3.7-gcc7-debug + docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + build-with-debug: true + + linux-xenial-cuda11_3-py3_7-gcc7-debug-test: + name: linux-xenial-cuda11.3-py3.7-gcc7-debug + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-cuda11_3-py3_7-gcc7-debug-build + with: + build-environment: linux-xenial-cuda11.3-py3.7-gcc7-debug + docker-image: ${{ needs.linux-xenial-cuda11_3-py3_7-gcc7-debug-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + ]} + + win-vs2019-cuda11_5-py3-build: + name: win-vs2019-cuda11.5-py3 + uses: ./.github/workflows/_win-build.yml + with: + build-environment: win-vs2019-cuda11.5-py3 + cuda-version: "11.5" + + win-vs2019-cuda11_5-py3-test: + name: win-vs2019-cuda11.5-py3 + uses: ./.github/workflows/_win-test.yml + needs: win-vs2019-cuda11_5-py3-build + with: + build-environment: win-vs2019-cuda11.5-py3 + cuda-version: "11.5" + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, + ]} + + win-vs2019-cuda11_6-py3-build: + name: win-vs2019-cuda11.6-py3 + uses: ./.github/workflows/_win-build.yml + with: + build-environment: 
win-vs2019-cuda11.6-py3 + cuda-version: "11.6" + + win-vs2019-cuda11_6-py3-test: + name: win-vs2019-cuda11.6-py3 + uses: ./.github/workflows/_win-test.yml + needs: win-vs2019-cuda11_6-py3-build + with: + build-environment: win-vs2019-cuda11.6-py3 + cuda-version: "11.6" + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, + ]} + + ios-12-5-1-arm64: + name: ios-12-5-1-arm64 + uses: ./.github/workflows/_ios-build-test.yml + with: + build-environment: ios-12-5-1-arm64 + ios-platform: OS + ios-arch: arm64 + secrets: + IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} + IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} + IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} + IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} + + ios-12-5-1-arm64-coreml: + name: ios-12-5-1-arm64-coreml + uses: ./.github/workflows/_ios-build-test.yml + with: + build-environment: ios-12-5-1-arm64-coreml + ios-platform: OS + ios-arch: arm64 + secrets: + IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} + IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} + IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} + IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} + + ios-12-5-1-arm64-custom-ops: + name: ios-12-5-1-arm64-custom-ops + uses: ./.github/workflows/_ios-build-test.yml + with: + build-environment: ios-12-5-1-arm64-custom-ops + ios-platform: OS + ios-arch: arm64 + secrets: + IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} + IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} + IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} + IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} + + ios-12-5-1-arm64-metal: + name: ios-12-5-1-arm64-metal + uses: ./.github/workflows/_ios-build-test.yml + with: + build-environment: ios-12-5-1-arm64-metal + ios-platform: OS + ios-arch: arm64 + secrets: + IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} + IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} + IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} + IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} diff --git a/.github/workflows/pull.yml b/.github/workflows/pull.yml new file mode 100644 index 00000000000000..fb2c96fa56efbe --- /dev/null +++ b/.github/workflows/pull.yml @@ -0,0 +1,320 @@ +name: pull + +on: + pull_request: + push: + branches: + - master + - main + - release/* + workflow_dispatch: + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + linux-xenial-py3_7-gcc5_4-build: + name: linux-xenial-py3.7-gcc5.4 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-gcc5.4 + docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4 + + linux-xenial-py3_7-gcc5_4-test: + name: linux-xenial-py3.7-gcc5.4 + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-py3_7-gcc5_4-build + with: + build-environment: linux-xenial-py3.7-gcc5.4 + docker-image: ${{ needs.linux-xenial-py3_7-gcc5_4-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + { config: "distributed", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + { config: "docs_test", shard: 1, num_shards: 1, runner: 
"linux.2xlarge" }, + { config: "backwards_compat", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + { config: "jit_legacy", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} + + linux-docs: + name: linux-docs + uses: ./.github/workflows/_docs.yml + needs: linux-xenial-py3_7-gcc5_4-build + with: + build-environment: linux-xenial-py3.7-gcc5.4 + docker-image: ${{ needs.linux-xenial-py3_7-gcc5_4-build.outputs.docker-image }} + + linux-xenial-py3_7-gcc7-build: + name: linux-xenial-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-gcc7 + docker-image-name: pytorch-linux-xenial-py3.7-gcc7 + + linux-xenial-py3_7-gcc7-test: + name: linux-xenial-py3.7-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-py3_7-gcc7-build + with: + build-environment: linux-xenial-py3.7-gcc7 + docker-image: ${{ needs.linux-xenial-py3_7-gcc7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + ]} + + linux-xenial-py3_7-clang7-asan-build: + name: linux-xenial-py3.7-clang7-asan + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-clang7-asan + docker-image-name: pytorch-linux-xenial-py3-clang7-asan + + linux-xenial-py3_7-clang7-asan-test: + name: linux-xenial-py3.7-clang7-asan + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-py3_7-clang7-asan-build + with: + build-environment: linux-xenial-py3.7-clang7-asan + docker-image: ${{ needs.linux-xenial-py3_7-clang7-asan-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 3, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 3, runner: "linux.2xlarge" }, + { config: "default", shard: 3, num_shards: 3, runner: "linux.2xlarge" }, + ]} + + linux-xenial-py3_7-gcc7-no-ops: + name: linux-xenial-py3.7-gcc7-no-ops + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-gcc7-no-ops + docker-image-name: pytorch-linux-xenial-py3.7-gcc7 + + linux-xenial-py3_7-clang7-onnx-build: + name: linux-xenial-py3.7-clang7-onnx + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-clang7-onnx + docker-image-name: pytorch-linux-xenial-py3-clang7-onnx + + linux-xenial-py3_7-clang7-onnx-test: + name: linux-xenial-py3.7-clang7-onnx + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-py3_7-clang7-onnx-build + with: + build-environment: linux-xenial-py3.7-clang7-onnx + docker-image: ${{ needs.linux-xenial-py3_7-clang7-onnx-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + ]} + + linux-bionic-py3_7-clang9-build: + name: linux-bionic-py3.7-clang9 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-py3.7-clang9 + docker-image-name: pytorch-linux-bionic-py3.7-clang9 + + linux-bionic-py3_7-clang9-test: + name: linux-bionic-py3.7-clang9 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-py3_7-clang9-build + with: + build-environment: linux-bionic-py3.7-clang9 + docker-image: ${{ needs.linux-bionic-py3_7-clang9-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + 
{ config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + { config: "noarch", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} + + linux-vulkan-bionic-py3_7-clang9-build: + name: linux-vulkan-bionic-py3.7-clang9 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-vulkan-bionic-py3.7-clang9 + docker-image-name: pytorch-linux-bionic-py3.7-clang9 + + linux-vulkan-bionic-py3_7-clang9-test: + name: linux-vulkan-bionic-py3.7-clang9 + uses: ./.github/workflows/_linux-test.yml + needs: linux-vulkan-bionic-py3_7-clang9-build + with: + build-environment: linux-vulkan-bionic-py3.7-clang9 + docker-image: ${{ needs.linux-vulkan-bionic-py3_7-clang9-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} + + linux-xenial-cuda11_3-py3_7-gcc7-build: + name: linux-xenial-cuda11.3-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-cuda11.3-py3.7-gcc7 + docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + + linux-xenial-cuda11_3-py3_7-gcc7-test: + name: linux-xenial-cuda11.3-py3.7-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-xenial-cuda11_3-py3_7-gcc7-build + with: + build-environment: linux-xenial-cuda11.3-py3.7-gcc7 + docker-image: ${{ needs.linux-xenial-cuda11_3-py3_7-gcc7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "distributed", shard: 1, num_shards: 1, runner: "linux.8xlarge.nvidia.gpu" }, + ]} + + linux-bionic-rocm5_0-py3_7-build: + name: linux-bionic-rocm5.0-py3.7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-rocm5.0-py3.7 + docker-image-name: pytorch-linux-bionic-rocm5.0-py3.7 + + linux-bionic-rocm5_0-py3_7-test: + name: linux-bionic-rocm5.0-py3.7 + uses: ./.github/workflows/_rocm-test.yml + needs: linux-bionic-rocm5_0-py3_7-build + with: + build-environment: linux-bionic-rocm5.0-py3.7 + docker-image: ${{ needs.linux-bionic-rocm5_0-py3_7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.rocm.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.rocm.gpu" }, + ]} + + linux-xenial-py3-clang5-mobile-build: + name: linux-xenial-py3-clang5-mobile-build + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3-clang5-mobile-build + docker-image-name: pytorch-linux-xenial-py3-clang5-asan + build-generates-artifacts: false + + linux-xenial-py3-clang5-mobile-custom-build-static: + name: linux-xenial-py3-clang5-mobile-custom-build-static + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3-clang5-mobile-custom-build-static + docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + build-generates-artifacts: false + + pytorch-xla-linux-bionic-py3_7-clang8-build: + name: pytorch-xla-linux-bionic-py3.7-clang8 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: pytorch-xla-linux-bionic-py3.7-clang8 + docker-image-name: xla_base + + pytorch-xla-linux-bionic-py3_7-clang8-test: + name: pytorch-xla-linux-bionic-py3.7-clang8 + uses: ./.github/workflows/_linux-test.yml + needs: pytorch-xla-linux-bionic-py3_7-clang8-build + with: + build-environment: 
pytorch-xla-linux-bionic-py3.7-clang8 + docker-image: ${{ needs.pytorch-xla-linux-bionic-py3_7-clang8-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "xla", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} + + win-vs2019-cpu-py3-build: + name: win-vs2019-cpu-py3 + uses: ./.github/workflows/_win-build.yml + with: + build-environment: win-vs2019-cpu-py3 + cuda-version: cpu + + win-vs2019-cpu-py3-test: + name: win-vs2019-cpu-py3 + uses: ./.github/workflows/_win-test.yml + needs: win-vs2019-cpu-py3-build + with: + build-environment: win-vs2019-cpu-py3 + cuda-version: cpu + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "windows.4xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "windows.4xlarge" }, + ]} + + win-vs2019-cuda11_3-py3-build: + name: win-vs2019-cuda11.3-py3 + uses: ./.github/workflows/_win-build.yml + with: + build-environment: win-vs2019-cuda11.3-py3 + cuda-version: "11.3" + + win-vs2019-cuda11_3-py3-test: + name: win-vs2019-cuda11.3-py3 + uses: ./.github/workflows/_win-test.yml + needs: win-vs2019-cuda11_3-py3-build + with: + build-environment: win-vs2019-cuda11.3-py3 + cuda-version: "11.3" + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, + ]} + + linux-xenial-cuda11_3-py3_7-gcc7-bazel-test: + name: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test + uses: ./.github/workflows/_bazel-build-test.yml + with: + build-environment: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test + docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + + pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single: + name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single + uses: ./.github/workflows/_android-build-test.yml + with: + build-environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single + docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + + pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit: + name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit + uses: ./.github/workflows/_android-build-test.yml + with: + build-environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit + docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + + linux-xenial-py3_7-gcc5_4-mobile-lightweight-dispatch-build: + name: linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build + docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4 + build-generates-artifacts: false + + deploy-linux-xenial-cuda11_3-py3_7-gcc7-build: + name: deploy-linux-xenial-cuda11.3-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: deploy-linux-xenial-cuda11.3-py3.7-gcc7 + docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + + deploy-linux-xenial-cuda11_3-py3_7-gcc7-test: + name: linux-xenial-cuda11.3-py3.7-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: deploy-linux-xenial-cuda11_3-py3_7-gcc7-build + with: + build-environment: deploy-linux-xenial-cuda11.3-py3.7-gcc7 + docker-image: ${{ 
needs.deploy-linux-xenial-cuda11_3-py3_7-gcc7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "deploy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, + ]} diff --git a/.github/workflows/push_nightly_docker_ghcr.yml b/.github/workflows/push_nightly_docker_ghcr.yml index 3a2ce8d6bcde20..ca30c9651ff8f3 100644 --- a/.github/workflows/push_nightly_docker_ghcr.yml +++ b/.github/workflows/push_nightly_docker_ghcr.yml @@ -1,22 +1,30 @@ -name: Build PyTorch nightly Docker image and push to GitHub Container Registry +name: docker-release-builds on: schedule: # Push the nightly docker daily at 1 PM UTC - cron: '0 13 * * *' + # Trigger when we modify something related to these images + pull_request: + paths: + - .github/scripts/build_publish_nightly_docker.sh + - .github/workflows/push_nightly_docker_ghcr.yml + - Dockerfile + - docker.Makefile # Have the ability to trigger this job manually using the API as well workflow_dispatch: jobs: - build-publish-docker: + docker-release-build: if: ${{ github.repository == 'pytorch/pytorch' }} runs-on: linux.2xlarge env: GHCR_PAT: ${{ secrets.GHCR_PAT }} + WITH_PUSH: ${{ github.event_name == 'schedule' }} steps: - - name: Checkout + - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: - ref: master + ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a name: Build and upload nightly docker with: @@ -25,3 +33,7 @@ jobs: command: | set -ex bash .github/scripts/build_publish_nightly_docker.sh + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true diff --git a/.github/workflows/revert.yml b/.github/workflows/revert.yml index fa5451d9695119..22e0508d88b8f8 100644 --- a/.github/workflows/revert.yml +++ b/.github/workflows/revert.yml @@ -27,6 +27,12 @@ jobs: env: GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} PR_NUM: ${{ github.event.client_payload.pr_num }} + COMMENT_ID: ${{ github.event.client_payload.comment_id }} GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} run: | - python3 .github/scripts/trymerge.py --revert "${PR_NUM}" + set -ex + if [ -n "${COMMENT_ID}" ]; then + python3 .github/scripts/trymerge.py --revert --comment-id "${COMMENT_ID}" "${PR_NUM}" + else + python3 .github/scripts/trymerge.py --revert "${PR_NUM}" + fi diff --git a/.github/workflows/run_android_tests.yml b/.github/workflows/run_android_tests.yml new file mode 100644 index 00000000000000..85cef5623d7ed9 --- /dev/null +++ b/.github/workflows/run_android_tests.yml @@ -0,0 +1,67 @@ +name: android-tests + +on: + push: + tags: + # Trigger on release candidate builds + # Release candidate tags look like: v1.11.0-rc1 + - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ + - 'ciflow/trunk/*' + - 'ciflow/android/*' + branches: + - master + - main + - release/* + workflow_dispatch: + +concurrency: + group: run-android-tests-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +defaults: + run: + shell: bash -e -l {0} + +jobs: + + build-and-test: + runs-on: ubuntu-latest + env: + JOB_BASE_NAME: ubuntu-latest-android-tests + steps: + - name: Setup miniconda + uses: conda-incubator/setup-miniconda@v2 + with: + auto-update-conda: true + python-version: 3.8 + activate-environment: build + 
+ - name: Install dependencies + run: | + conda install -y \ + cffi \ + cmake \ + mkl \ + mkl-include \ + ninja \ + numpy \ + pyyaml \ + requests \ + setuptools \ + typing_extensions + + # [see note: pytorch repo ref] + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - name: Build PyTorch Android + run: | + export ANDROID_NDK="${ANDROID_SDK_ROOT}/ndk-bundle" + echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" + ./scripts/build_pytorch_android.sh x86 + + - name: Run tests + uses: reactivecircus/android-emulator-runner@v2 + with: + api-level: 25 + script: ./android/run_tests.sh diff --git a/.github/workflows/run_torchbench.yml b/.github/workflows/run_torchbench.yml index 5fe6cb772a6a58..d84a32ca318e1c 100644 --- a/.github/workflows/run_torchbench.yml +++ b/.github/workflows/run_torchbench.yml @@ -36,10 +36,15 @@ jobs: # shellcheck disable=SC1091 . "${HOME}"/anaconda3/etc/profile.d/conda.sh conda activate pr-ci - conda install -y numpy requests ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions \ + # pin cmake version to 3.22 since 3.23 breaks pytorch build + # see details at: https://github.com/pytorch/pytorch/issues/74985 + conda install -y numpy requests ninja pyyaml mkl mkl-include setuptools cmake=3.22 cffi typing_extensions \ future six dataclasses pillow pytest tabulate gitpython git-lfs tqdm psutil # install magma conda install -y -c pytorch "${MAGMA_VERSION}" + # install ffmpeg-4.4.1 + # torchvision doesn't compile on ffmpeg-5: https://github.com/pytorch/vision/issues/5616 + conda install -y ffmpeg=4.4.1 - name: Setup TorchBench branch run: | # shellcheck disable=SC1091 @@ -84,5 +89,5 @@ jobs: path: ~/.torchbench/bisection/pr${{ github.event.number }} concurrency: - group: run-torchbench-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} cancel-in-progress: true diff --git a/.github/workflows/test_tools.yml b/.github/workflows/test_tools.yml deleted file mode 100644 index 18e8339fb92b24..00000000000000 --- a/.github/workflows/test_tools.yml +++ /dev/null @@ -1,39 +0,0 @@ -name: Test tools - -on: - push: - branches: - - master - - main - pull_request: - -jobs: - test: - if: ${{ github.repository == 'pytorch/pytorch' }} - runs-on: ubuntu-18.04 - steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - fetch-depth: 0 # deep clone, to allow us to use git log - - name: Install dependencies - # mypy and boto3 versions copied from - # .circleci/docker/common/install_conda.sh - run: | - set -eux - python3 -mpip install -r requirements.txt - python3 -mpip install boto3==1.16.34 - make setup_lint - - name: Test tools - run: | - python3 -m unittest discover -vs tools/test -p 'test_*.py' - python3 -m unittest discover -vs .github/scripts -p 'test_*.py' - -concurrency: - group: test-tools-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/trunk.yml b/.github/workflows/trunk.yml new file mode 100644 index 00000000000000..e7f051effdd9db --- /dev/null +++ b/.github/workflows/trunk.yml @@ -0,0 +1,222 @@ +name: trunk + +on: + push: + branches: + - 
master + - main + - release/* + tags: + - ciflow/trunk/* + - ciflow/all/* + workflow_dispatch: + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + parallelnative-linux-xenial-py3_7-gcc5_4-build: + name: parallelnative-linux-xenial-py3.7-gcc5.4 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: parallelnative-linux-xenial-py3.7-gcc5.4 + docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4 + + parallelnative-linux-xenial-py3_7-gcc5_4-test: + name: parallelnative-linux-xenial-py3.7-gcc5.4 + uses: ./.github/workflows/_linux-test.yml + needs: parallelnative-linux-xenial-py3_7-gcc5_4-build + with: + build-environment: parallelnative-linux-xenial-py3.7-gcc5.4 + docker-image: ${{ needs.parallelnative-linux-xenial-py3_7-gcc5_4-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + ]} + + # Build PyTorch with BUILD_CAFFE2=ON + caffe2-linux-xenial-py3_7-gcc5_4-build: + name: caffe2-linux-xenial-py3.7-gcc5.4 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: caffe2-linux-xenial-py3.7-gcc5.4 + docker-image-name: pytorch-linux-xenial-py3.7-gcc5.4 + + linux-bionic-cuda10_2-py3_9-gcc7-build: + name: linux-bionic-cuda10.2-py3.9-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-cuda10.2-py3.9-gcc7 + docker-image-name: pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 + + linux-bionic-cuda10_2-py3_9-gcc7-test: + name: linux-bionic-cuda10.2-py3.9-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda10_2-py3_9-gcc7-build + with: + build-environment: linux-bionic-cuda10.2-py3.9-gcc7 + docker-image: ${{ needs.linux-bionic-cuda10_2-py3_9-gcc7-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "slow", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "nogpu_NO_AVX", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + { config: "nogpu_NO_AVX2", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + { config: "jit_legacy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "distributed", shard: 1, num_shards: 1, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "multigpu", shard: 1, num_shards: 1, runner: "linux.16xlarge.nvidia.gpu" }, + ]} + + libtorch-linux-xenial-cuda10_2-py3_7-gcc7-build: + name: libtorch-linux-xenial-cuda10.2-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: libtorch-linux-xenial-cuda10.2-py3.7-gcc7 + docker-image-name: pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 + build-generates-artifacts: false + + libtorch-linux-xenial-cuda11_3-py3_7-gcc7-build: + name: libtorch-linux-xenial-cuda11.3-py3.7-gcc7 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: libtorch-linux-xenial-cuda11.3-py3.7-gcc7 + docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + build-generates-artifacts: false + + # no-ops builds test USE_PER_OPERATOR_HEADERS=0 where ATen/ops is not generated + linux-xenial-cuda11_3-py3_7-gcc7-no-ops-build: + name: linux-xenial-cuda11.3-py3.7-gcc7-no-ops + uses: 
./.github/workflows/_linux-build.yml + with: + build-environment: linux-xenial-cuda11.3-py3.7-gcc7-no-ops + docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + + linux-bionic-rocm4_5-py3_7-distributed-build: + name: linux-bionic-rocm5.0-py3.7-distributed + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-rocm5.0-py3.7 + docker-image-name: pytorch-linux-bionic-rocm5.0-py3.7 + + linux-bionic-rocm4_5-py3_7-distributed-test: + name: linux-bionic-rocm5.0-py3.7-distributed + uses: ./.github/workflows/_rocm-test.yml + needs: linux-bionic-rocm4_5-py3_7-distributed-build + with: + build-environment: linux-bionic-rocm5.0-py3.7 + docker-image: ${{ needs.linux-bionic-rocm4_5-py3_7-distributed-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "distributed", shard: 1, num_shards: 1, runner: "linux.rocm.gpu" }, + ]} + + pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build: + name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build + uses: ./.github/workflows/_android-full-build-test.yml + with: + build-environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build + docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + secrets: + SONATYPE_NEXUS_USERNAME: ${{ secrets.SONATYPE_NEXUS_USERNAME }} + SONATYPE_NEXUS_PASSWORD: ${{ secrets.SONATYPE_NEXUS_PASSWORD }} + ANDROID_SIGN_KEY: ${{ secrets.ANDROID_SIGN_KEY }} + ANDROID_SIGN_PASS: ${{ secrets.ANDROID_SIGN_PASS }} + SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} + + linux-bionic-py3_7-clang9-slow-build: + name: linux-bionic-py3.7-clang9-slow + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-py3.7-clang9-slow + docker-image-name: pytorch-linux-bionic-py3.7-clang9 + + linux-bionic-py3_7-clang9-slow-test: + name: linux-bionic-py3.7-clang9-slow + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-py3_7-clang9-slow-build + with: + build-environment: linux-bionic-py3.7-clang9-slow + docker-image: ${{ needs.linux-bionic-py3_7-clang9-slow-build.outputs.docker-image }} + test-matrix: | + { include: [ + { config: "slow", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} + + ios-12-5-1-x86-64: + name: ios-12-5-1-x86-64 + uses: ./.github/workflows/_ios-build-test.yml + with: + build-environment: ios-12-5-1-x86-64 + ios-platform: SIMULATOR + ios-arch: x86_64 + secrets: + IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} + IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} + IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} + IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} + + ios-12-5-1-x86-64-coreml: + name: ios-12-5-1-x86-64-coreml + uses: ./.github/workflows/_ios-build-test.yml + with: + build-environment: ios-12-5-1-x86-64-coreml + ios-platform: SIMULATOR + ios-arch: x86_64 + secrets: + IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} + IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} + IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} + IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} + + macos-11-py3-x86-64-build: + name: macos-11-py3-x86-64 + uses: ./.github/workflows/_mac-build.yml + with: + build-environment: macos-11-py3-x86-64 + xcode-version: "12.4" + runner-type: macos-11 + build-generates-artifacts: true + secrets: + MACOS_SCCACHE_S3_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} + MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} + + macos-11-py3-x86-64-test: + name: macos-11-py3-x86-64 + uses: 
./.github/workflows/_mac-test.yml + needs: macos-11-py3-x86-64-build + with: + build-environment: macos-11-py3-x86-64 + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "macos-11" }, + { config: "default", shard: 2, num_shards: 2, runner: "macos-11" }, + ]} + secrets: + AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} + AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} + + macos-10-15-py3-lite-interpreter-x86-64: + name: macos-10-15-py3-lite-interpreter-x86-64 + uses: ./.github/workflows/_mac-build.yml + with: + build-environment: macos-10-15-py3-lite-interpreter-x86-64 + xcode-version: "12" + runner-type: macos-10.15 + build-generates-artifacts: false + secrets: + MACOS_SCCACHE_S3_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} + MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} + + macos-10-15-py3-arm64: + name: macos-10-15-py3-arm64 + uses: ./.github/workflows/_mac-build.yml + with: + build-environment: macos-10-15-py3-arm64 + runner-type: macos-10.15 + build-generates-artifacts: false + secrets: + MACOS_SCCACHE_S3_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} + MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} diff --git a/.github/workflows/trymerge.yml b/.github/workflows/trymerge.yml index ae29ab82462a65..6da9e872ce46e8 100644 --- a/.github/workflows/trymerge.yml +++ b/.github/workflows/trymerge.yml @@ -28,5 +28,10 @@ jobs: GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} PR_NUM: ${{ github.event.client_payload.pr_num }} GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} + FORCE: ${{ github.event.client_payload.force}} run: | - python3 .github/scripts/trymerge.py "${PR_NUM}" + if [ -n "${FORCE}" ]; then + python3 .github/scripts/trymerge.py --force "${PR_NUM}" + else + python3 .github/scripts/trymerge.py "${PR_NUM}" + fi diff --git a/.github/workflows/update_pytorch_labels.yml b/.github/workflows/update_pytorch_labels.yml index 82061efa3c3caf..f19347070ecef7 100644 --- a/.github/workflows/update_pytorch_labels.yml +++ b/.github/workflows/update_pytorch_labels.yml @@ -17,8 +17,8 @@ jobs: uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - name: Update PyTorch labels list in S3 env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }} + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} run: | python3 -m pip install boto3==1.19.12 .github/scripts/export_pytorch_labels.py diff --git a/.github/workflows/upload-test-stats.yml b/.github/workflows/upload-test-stats.yml new file mode 100644 index 00000000000000..bfed85e5131e19 --- /dev/null +++ b/.github/workflows/upload-test-stats.yml @@ -0,0 +1,35 @@ +name: Upload test stats + +on: + workflow_run: + workflows: [pull, trunk, periodic] + types: + - completed + +jobs: + upload-test-stats: + if: github.event.workflow_run.conclusion == 'success' || github.event.workflow_run.conclusion == 'failure' + runs-on: [self-hosted, linux.2xlarge] + + steps: + - name: Print workflow information + env: + TRIGGERING_WORKFLOW: ${{ toJSON(github.event.workflow_run) }} + run: echo "${TRIGGERING_WORKFLOW}" + + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + + - 
run: | + pip3 install requests==2.26 + pip3 install rockset==0.8.3 + pip3 install boto3==1.19.12 + pip3 install six==1.16.0 + + - name: Upload test stats + env: + ROCKSET_API_KEY: ${{ secrets.ROCKSET_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + WORKFLOW_RUN_ID: ${{ github.event.workflow_run.id }} + WORKFLOW_RUN_ATTEMPT: ${{ github.event.workflow_run.run_attempt }} + run: python3 tools/stats/upload_test_stats.py --workflow-run-id "${WORKFLOW_RUN_ID}" --workflow-run-attempt "${WORKFLOW_RUN_ATTEMPT}" diff --git a/.gitignore b/.gitignore index 4a332afb8d0e04..b95fc1a1d9dae6 100644 --- a/.gitignore +++ b/.gitignore @@ -35,6 +35,7 @@ aten/src/ATen/cuda/CUDAConfig.h benchmarks/.data caffe2/cpp_test/ dist/ +docs/build/ docs/cpp/src docs/src/**/* docs/cpp/build @@ -66,8 +67,11 @@ torch/_C/__init__.pyi torch/_C/_nn.pyi torch/_C/_VariableFunctions.pyi torch/_VF.pyi +torch/return_types.pyi torch/nn/functional.pyi +torch/utils/data/datapipes/datapipe.pyi torch/csrc/autograd/generated/* +torch/csrc/lazy/generated/* # Listed manually because some files in this directory are not generated torch/testing/_internal/generated/annotated_fn_args.py torch/testing/_internal/data/*.pt @@ -137,6 +141,7 @@ scripts/release_notes/*.json compile_commands.json *.egg-info/ docs/source/scripts/activation_images/ +docs/source/scripts/quantization_backend_configs/ ## General @@ -307,7 +312,7 @@ bazel-* *.zip # core dump files -core.* +**/core.[1-9]* # Generated if you use the pre-commit script for clang-tidy pr.diff diff --git a/.gitmodules b/.gitmodules index 9c9373ef7229ae..c3c93bb76584c8 100644 --- a/.gitmodules +++ b/.gitmodules @@ -9,7 +9,7 @@ [submodule "third_party/eigen"] ignore = dirty path = third_party/eigen - url = https://github.com/eigenteam/eigen-git-mirror.git + url = https://gitlab.com/libeigen/eigen.git [submodule "third_party/googletest"] ignore = dirty path = third_party/googletest diff --git a/.jenkins/caffe2/test.sh b/.jenkins/caffe2/test.sh index fd626d09c3e221..17a5cf796deb0b 100755 --- a/.jenkins/caffe2/test.sh +++ b/.jenkins/caffe2/test.sh @@ -134,19 +134,15 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then rocm_ignore_test+=("--ignore $caffe2_pypath/python/ideep/pool_op_test.py") fi -# NB: Warnings are disabled because they make it harder to see what -# the actual erroring test is echo "Running Python tests.." 
-if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then - # locale setting is required by click package with py3 - for loc in "en_US.utf8" "C.UTF-8"; do - if locale -a | grep "$loc" >/dev/null 2>&1; then - export LC_ALL="$loc" - export LANG="$loc" - break; - fi - done -fi +# locale setting is required by click package +for loc in "en_US.utf8" "C.UTF-8"; do + if locale -a | grep "$loc" >/dev/null 2>&1; then + export LC_ALL="$loc" + export LANG="$loc" + break; + fi +done # Some Caffe2 tests fail when run using AVX512 ISA, see https://github.com/pytorch/pytorch/issues/66111 export DNNL_MAX_CPU_ISA=AVX2 @@ -154,6 +150,8 @@ export DNNL_MAX_CPU_ISA=AVX2 # Should still run even in the absence of SHARD_NUMBER if [[ "${SHARD_NUMBER:-1}" == "1" ]]; then pip install --user pytest-sugar + # NB: Warnings are disabled because they make it harder to see what + # the actual erroring test is "$PYTHON" \ -m pytest \ -x \ @@ -170,18 +168,18 @@ if [[ "${SHARD_NUMBER:-1}" == "1" ]]; then "${EXTRA_TESTS[@]}" fi -##################### -# torchvision tests # -##################### +############## +# ONNX tests # +############## if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then # Check out torch/vision at 0.9.0-rc1 commit # This hash must match one in .jenkins/pytorch/test.sh pip install -q --user git+https://github.com/pytorch/vision.git@8a2dc6f22ac4389ccba8859aa1e1cb14f1ee53db - pip install -q --user ninja + pip install -q --user ninja flatbuffers==2.0 numpy==1.21.5 onnxruntime==1.11.0 + # numba requires numpy <= 1.20, onnxruntime requires numpy >= 1.21. + # We don't actually need it for our tests, but it's imported if it's present, so uninstall. + pip uninstall -q --yes numba # JIT C++ extensions require ninja, so put it into PATH. export PATH="/var/lib/jenkins/.local/bin:$PATH" - if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then - pip install -q --user flatbuffers==2.0 onnxruntime==1.9.0 - fi "$ROOT_DIR/scripts/onnx/test.sh" fi diff --git a/.jenkins/pytorch/build.sh b/.jenkins/pytorch/build.sh index 01faa947634d60..977b977609eff6 100755 --- a/.jenkins/pytorch/build.sh +++ b/.jenkins/pytorch/build.sh @@ -20,7 +20,7 @@ if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@" fi -if [[ "$BUILD_ENVIRONMENT" == *linux-xenial-cuda11.3* || "$BUILD_ENVIRONMENT" == *linux-bionic-cuda11.5* ]]; then +if [[ "$BUILD_ENVIRONMENT" == *linux-xenial-cuda11.3* || "$BUILD_ENVIRONMENT" == *linux-bionic-cuda11.5* || "$BUILD_ENVIRONMENT" == *linux-bionic-cuda11.6* ]]; then # Enabling DEPLOY build (embedded torch python interpreter, experimental) # only on one config for now, can expand later export USE_DEPLOY=ON diff --git a/.jenkins/pytorch/common.sh b/.jenkins/pytorch/common.sh index be5245bf19bc97..e8ce4b2ecb4d31 100644 --- a/.jenkins/pytorch/common.sh +++ b/.jenkins/pytorch/common.sh @@ -8,6 +8,13 @@ set -ex # Save the SCRIPT_DIR absolute path in case later we chdir (as occurs in the gpu perf test) SCRIPT_DIR="$( cd "$(dirname "${BASH_SOURCE[0]}")" ; pwd -P )" +if [[ "${BUILD_ENVIRONMENT}" == *linux* ]]; then + # TODO: Remove this once nvidia package repos are back online + # Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968 + # shellcheck disable=SC2046 + sudo sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list") +fi + # Required environment variables: # $BUILD_ENVIRONMENT (should be set by your Docker image) @@ -145,7 +152,8 @@ fi # export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which 
conda))/../"} if [[ "${TEST_CONFIG:-}" == *xla* ]] || \ [[ "$BUILD_ENVIRONMENT" == *centos* ]] || \ - [[ "$BUILD_ENVIRONMENT" == *linux-bionic* ]]; then + [[ "$BUILD_ENVIRONMENT" == *linux-bionic* ]] || \ + [[ "$BUILD_ENVIRONMENT" == *linux-focal* ]]; then if ! which conda; then echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty" exit 1 diff --git a/.jenkins/pytorch/common_utils.sh b/.jenkins/pytorch/common_utils.sh index 54bd44d3ccc6de..4169f6a2cb8c79 100644 --- a/.jenkins/pytorch/common_utils.sh +++ b/.jenkins/pytorch/common_utils.sh @@ -60,19 +60,18 @@ function get_pr_change_files() { set -e } -function file_diff_from_base() { - # The fetch may fail on Docker hosts, this fetch is necessary for GHA - set +e - git fetch origin master --quiet - set -e - git diff --name-only "$(git merge-base origin/master HEAD)" > "$1" -} - function get_bazel() { - # download bazel version - wget https://ossci-linux.s3.amazonaws.com/bazel-4.2.1-linux-x86_64 -O tools/bazel - # verify content - echo '1a4f3a3ce292307bceeb44f459883859c793436d564b95319aacb8af1f20557c tools/bazel' | sha256sum --quiet -c + if [[ $(uname) == "Darwin" ]]; then + # download bazel version + curl https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-darwin-x86_64 -Lo tools/bazel + # verify content + echo '74d93848f0c9d592e341e48341c53c87e3cb304a54a2a1ee9cff3df422f0b23c tools/bazel' | shasum -a 256 -c >/dev/null + else + # download bazel version + curl https://ossci-linux.s3.amazonaws.com/bazel-4.2.1-linux-x86_64 -o tools/bazel + # verify content + echo '1a4f3a3ce292307bceeb44f459883859c793436d564b95319aacb8af1f20557c tools/bazel' | shasum -a 256 -c >/dev/null + fi chmod +x tools/bazel } diff --git a/.jenkins/pytorch/macos-test.sh b/.jenkins/pytorch/macos-test.sh index 28f86c6e6e5dae..63e90c05bdd5eb 100755 --- a/.jenkins/pytorch/macos-test.sh +++ b/.jenkins/pytorch/macos-test.sh @@ -10,7 +10,9 @@ conda install -y six pip install -q hypothesis "expecttest==0.1.3" "librosa>=0.6.2" "numba<=0.49.1" psutil "scipy==1.6.3" # TODO move this to docker -pip install unittest-xml-reporting pytest +# Pin unittest-xml-reporting to freeze printing test summary logic, related: https://github.com/pytorch/pytorch/issues/69014 +pip install "unittest-xml-reporting<=3.2.0,>=2.0.0" \ + pytest if [ -z "${IN_CI}" ]; then rm -rf "${WORKSPACE_DIR}"/miniconda3/lib/python3.6/site-packages/torch* diff --git a/.jenkins/pytorch/multigpu-test.sh b/.jenkins/pytorch/multigpu-test.sh index 2d119d09a70c07..481619a8dc314d 100755 --- a/.jenkins/pytorch/multigpu-test.sh +++ b/.jenkins/pytorch/multigpu-test.sh @@ -13,7 +13,8 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh" echo "Testing pytorch (distributed only)" if [ -n "${IN_CI}" ]; then # TODO move this to docker - pip_install unittest-xml-reporting + # Pin unittest-xml-reporting to freeze printing test summary logic, related: https://github.com/pytorch/pytorch/issues/69014 + pip_install "unittest-xml-reporting<=3.2.0,>=2.0.0" fi # Disabling tests to see if they solve timeout issues; see https://github.com/pytorch/pytorch/issues/70015 diff --git a/.jenkins/pytorch/short-perf-test-cpu.sh b/.jenkins/pytorch/short-perf-test-cpu.sh index f2e02b52974c69..ff9ef7a84eee75 100755 --- a/.jenkins/pytorch/short-perf-test-cpu.sh +++ b/.jenkins/pytorch/short-perf-test-cpu.sh @@ -17,14 +17,15 @@ pip install -q awscli # Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read # More info at https://github.com/aws/aws-cli/issues/2321 aws configure 
set default.s3.multipart_threshold 5GB +UPSTREAM_DEFAULT_BRANCH="$(git remote show https://github.com/pytorch/pytorch.git | awk '/HEAD branch/ {print $NF}')" -if [[ "$COMMIT_SOURCE" == master ]]; then - # Get current master commit hash - MASTER_COMMIT_ID=$(git log --format="%H" -n 1) - export MASTER_COMMIT_ID +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then + # Get current default branch commit hash + DEFAULT_BRANCH_COMMIT_ID=$(git log --format="%H" -n 1) + export DEFAULT_BRANCH_COMMIT_ID fi -# Find the master commit to test against +# Find the default branch commit to test against git remote add upstream https://github.com/pytorch/pytorch.git git fetch upstream IFS=$'\n' @@ -33,13 +34,13 @@ while IFS='' read -r commit_id; do LATEST_TESTED_COMMIT=${commit_id} break fi -done < <(git rev-list upstream/master) +done < <(git rev-list upstream/"$UPSTREAM_DEFAULT_BRANCH") aws s3 cp s3://ossci-perf-test/pytorch/cpu_runtime/"${LATEST_TESTED_COMMIT}".json cpu_runtime.json -if [[ "$COMMIT_SOURCE" == master ]]; then +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then # Prepare new baseline file cp cpu_runtime.json new_cpu_runtime.json - python update_commit_hash.py new_cpu_runtime.json "${MASTER_COMMIT_ID}" + python update_commit_hash.py new_cpu_runtime.json "${DEFAULT_BRANCH_COMMIT_ID}" fi # Include tests @@ -54,7 +55,7 @@ fi # Run tests export TEST_MODE="compare_with_baseline" -if [[ "$COMMIT_SOURCE" == master ]]; then +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then export TEST_MODE="compare_and_update" fi @@ -66,8 +67,8 @@ run_test test_cpu_speed_torch_tensor ${TEST_MODE} run_test test_cpu_speed_mini_sequence_labeler 20 ${TEST_MODE} run_test test_cpu_speed_mnist 20 ${TEST_MODE} -if [[ "$COMMIT_SOURCE" == master ]]; then - # This could cause race condition if we are testing the same master commit twice, +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then + # This could cause race condition if we are testing the same default branch commit twice, # but the chance of them executing this line at the same time is low. 
- aws s3 cp new_cpu_runtime.json s3://ossci-perf-test/pytorch/cpu_runtime/"${MASTER_COMMIT_ID}".json --acl public-read + aws s3 cp new_cpu_runtime.json s3://ossci-perf-test/pytorch/cpu_runtime/"${DEFAULT_BRANCH_COMMIT_ID}".json --acl public-read fi diff --git a/.jenkins/pytorch/short-perf-test-gpu.sh b/.jenkins/pytorch/short-perf-test-gpu.sh index 4d8efee8dc2019..bde8ca5c9dd311 100755 --- a/.jenkins/pytorch/short-perf-test-gpu.sh +++ b/.jenkins/pytorch/short-perf-test-gpu.sh @@ -17,14 +17,15 @@ pip install -q awscli --ignore-installed PyYAML # Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read # More info at https://github.com/aws/aws-cli/issues/2321 aws configure set default.s3.multipart_threshold 5GB +UPSTREAM_DEFAULT_BRANCH="$(git remote show https://github.com/pytorch/pytorch.git | awk '/HEAD branch/ {print $NF}')" -if [[ "$COMMIT_SOURCE" == master ]]; then - # Get current master commit hash - MASTER_COMMIT_ID=$(git log --format="%H" -n 1) - export MASTER_COMMIT_ID +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then + # Get current default branch commit hash + DEFAULT_BRANCH_COMMIT_ID=$(git log --format="%H" -n 1) + export DEFAULT_BRANCH_COMMIT_ID fi -# Find the master commit to test against +# Find the default branch commit to test against git remote add upstream https://github.com/pytorch/pytorch.git git fetch upstream IFS=$'\n' @@ -33,13 +34,13 @@ while IFS='' read -r commit_id; do LATEST_TESTED_COMMIT=${commit_id} break fi -done < <(git rev-list upstream/master) +done < <(git rev-list upstream/"$UPSTREAM_DEFAULT_BRANCH") aws s3 cp s3://ossci-perf-test/pytorch/gpu_runtime/"${LATEST_TESTED_COMMIT}".json gpu_runtime.json -if [[ "$COMMIT_SOURCE" == master ]]; then +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then # Prepare new baseline file cp gpu_runtime.json new_gpu_runtime.json - python update_commit_hash.py new_gpu_runtime.json "${MASTER_COMMIT_ID}" + python update_commit_hash.py new_gpu_runtime.json "${DEFAULT_BRANCH_COMMIT_ID}" fi # Include tests @@ -55,7 +56,7 @@ fi . ./test_gpu_speed_mlstm.sh # Run tests -if [[ "$COMMIT_SOURCE" == master ]]; then +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then run_test test_gpu_speed_mnist 20 compare_and_update run_test test_gpu_speed_word_language_model 20 compare_and_update run_test test_gpu_speed_cudnn_lstm 20 compare_and_update @@ -69,10 +70,10 @@ else run_test test_gpu_speed_mlstm 20 compare_with_baseline fi -if [[ "$COMMIT_SOURCE" == master ]]; then - # This could cause race condition if we are testing the same master commit twice, +if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then + # This could cause race condition if we are testing the same default branch commit twice, # but the chance of them executing this line at the same time is low. 
- aws s3 cp new_gpu_runtime.json s3://ossci-perf-test/pytorch/gpu_runtime/"${MASTER_COMMIT_ID}".json --acl public-read + aws s3 cp new_gpu_runtime.json s3://ossci-perf-test/pytorch/gpu_runtime/"${DEFAULT_BRANCH_COMMIT_ID}".json --acl public-read fi popd diff --git a/.jenkins/pytorch/test.sh b/.jenkins/pytorch/test.sh index 4514aa86330522..b4353c55c10bc1 100755 --- a/.jenkins/pytorch/test.sh +++ b/.jenkins/pytorch/test.sh @@ -77,6 +77,7 @@ fi if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then # Print GPU info + rocminfo rocminfo | grep -E 'Name:.*\sgfx|Marketing' # Manually set NUM_TEST_SHARDS since Jenkins doesn't do it @@ -274,6 +275,14 @@ test_libtorch() { else "$TORCH_BIN_DIR"/test_jit --gtest_filter='-*CUDA' --gtest_output=xml:$TEST_REPORTS_DIR/test_jit.xml fi + + # Run Lazy Tensor cpp tests + if [[ "$BUILD_ENVIRONMENT" == *cuda* && "$BUILD_ENVIRONMENT" != *nogpu* ]]; then + LTC_TS_CUDA=1 "$TORCH_BIN_DIR"/test_lazy --gtest_output=xml:$TEST_REPORTS_DIR/test_lazy.xml + else + "$TORCH_BIN_DIR"/test_lazy --gtest_output=xml:$TEST_REPORTS_DIR/test_lazy.xml + fi + python test/cpp/jit/tests_setup.py shutdown # Wait for background download to finish wait @@ -518,7 +527,7 @@ test_torch_deploy() { ln -sf "$TORCH_LIB_DIR"/libshm* "$TORCH_BIN_DIR" ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR" "$TORCH_BIN_DIR"/test_deploy - "$TORCH_BIN_DIR"/test_api --gtest_filter='IMethodTest.*' + "$TORCH_BIN_DIR"/test_deploy_gpu assert_git_not_dirty } @@ -530,8 +539,9 @@ if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-baze (cd test && python -c "import torch; print(torch.__config__.show())") (cd test && python -c "import torch; print(torch.__config__.parallel_info())") fi - -if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then +if [[ "${BUILD_ENVIRONMENT}" == *deploy* ]]; then + test_torch_deploy +elif [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then test_forward_backward_compatibility # Do NOT add tests after bc check tests, see its comment. 
elif [[ "${TEST_CONFIG}" == *xla* ]]; then @@ -544,9 +554,6 @@ elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then # TODO: run some C++ tests echo "no-op at the moment" elif [[ "${BUILD_ENVIRONMENT}" == *-test1 || "${JOB_BASE_NAME}" == *-test1 || ("${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1) ]]; then - if [[ "${BUILD_ENVIRONMENT}" == *linux-xenial-cuda11.1*-test1* ]]; then - test_torch_deploy - fi test_without_numpy install_torchvision test_python_shard 1 diff --git a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_miniconda3.bat b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_miniconda3.bat index 20b3b4db4c0256..65784863124529 100644 --- a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_miniconda3.bat +++ b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_miniconda3.bat @@ -22,7 +22,7 @@ if "%INSTALL_FRESH_CONDA%"=="1" ( call conda install -y -q python=%PYTHON_VERSION% numpy cffi pyyaml boto3 libuv if errorlevel 1 exit /b if not errorlevel 0 exit /b - call conda install -y -q -c conda-forge cmake + call conda install -y -q -c conda-forge cmake=3.22.3 if errorlevel 1 exit /b if not errorlevel 0 exit /b ) diff --git a/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat b/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat index 0ad44db5b47dde..c7f3e1b6a6140c 100644 --- a/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat +++ b/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat @@ -34,7 +34,9 @@ popd :: The version is fixed to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136 ======= -pip install "ninja==1.10.0.post1" future "hypothesis==4.53.2" "expecttest==0.1.3" "librosa>=0.6.2" "scipy==1.6.3" psutil pillow unittest-xml-reporting pytest +:: Pin unittest-xml-reporting to freeze printing test summary logic, related: https://github.com/pytorch/pytorch/issues/69014 + +pip install "ninja==1.10.0.post1" future "hypothesis==4.53.2" "expecttest==0.1.3" "librosa>=0.6.2" "scipy==1.6.3" psutil pillow "unittest-xml-reporting<=3.2.0,>=2.0.0" pytest if errorlevel 1 exit /b if not errorlevel 0 exit /b diff --git a/BUILD.bazel b/BUILD.bazel index 686e798d9765cd..197592f81e0d14 100644 --- a/BUILD.bazel +++ b/BUILD.bazel @@ -3,7 +3,7 @@ load("@pybind11_bazel//:build_defs.bzl", "pybind_extension") load("@rules_proto//proto:defs.bzl", "proto_library") load("@rules_cc//cc:defs.bzl", "cc_binary", "cc_library", "cc_proto_library", "cc_test") load("//third_party:substitution.bzl", "header_template_rule") -load("//:tools/build_variables.bzl", "jit_core_sources", "libtorch_core_sources", "libtorch_cuda_sources", "libtorch_distributed_sources", "libtorch_extra_sources", "libtorch_nvfuser_generated_headers", "libtorch_nvfuser_runtime_sources", "libtorch_python_core_sources", "torch_cpp_srcs") +load("//:tools/build_variables.bzl", "jit_core_sources", "libtorch_core_sources", "libtorch_cuda_sources", "libtorch_distributed_sources", "libtorch_extra_sources", "libtorch_nvfuser_generated_headers", "libtorch_nvfuser_runtime_sources", "libtorch_python_core_sources", "torch_cpp_srcs", "lazy_tensor_ts_sources") load("//tools/rules:cu.bzl", "cu_library") load("//tools/config:defs.bzl", "if_cuda") load("//:aten.bzl", "intern_build_aten_ops", "generate_aten", "aten_ufunc_generated_cpu_sources", "aten_ufunc_generated_cpu_kernel_sources", "aten_ufunc_generated_cuda_sources") @@ -25,16 +25,6 @@ COMMON_COPTS = [ "-DUSE_CUDNN", ]) -# TODO: refactor this into its own library (but how to make -# a binary based off of a module in a 
library?) -py_binary( - name = "gen", - srcs = ["tools/setup_helpers/gen.py"], - deps = [ - ":tools_codegen" - ], -) - aten_generation_srcs = ["aten/src/ATen/native/native_functions.yaml"] + glob(["aten/src/ATen/templates/**"]) generated_cpu_cpp = [ @@ -102,37 +92,14 @@ generate_aten( aten_ufunc_generated_cuda_sources("aten/src/ATen/{}") + ["aten/src/ATen/Declarations.yaml"] ), - generator=":gen", -) - -py_library( - name = "tools_codegen", - srcs = glob(["tools/codegen/**/*.py"]), -) - -py_library( - name = "tools_autograd", - srcs = glob(["tools/autograd/*.py"]), - data = glob([ - "tools/autograd/*.yaml", - "tools/autograd/templates/*", - ]), - deps = [":tools_codegen"], + generator = "//tools/codegen:gen", ) py_library( name = "tools_jit", srcs = glob(["tools/jit/*.py"]), data = glob(["tools/jit/templates/*"]), -) - -py_binary( - name = "generate_code", - srcs = ["tools/setup_helpers/generate_code.py"], - deps = [ - ":tools_autograd", - ":tools_jit", - ], + visibility = ["//tools/setup_helpers:__pkg__"], ) libtorch_cpp_generated_sources = [ @@ -155,6 +122,11 @@ libtorch_cpp_generated_sources = [ "torch/csrc/autograd/generated/Functions.h", "torch/csrc/autograd/generated/Functions.cpp", "torch/csrc/autograd/generated/variable_factories.h", + "torch/csrc/lazy/generated/LazyIr.h", + "torch/csrc/lazy/generated/LazyNativeFunctions.h", + "torch/csrc/lazy/generated/LazyNativeFunctions.cpp", + "torch/csrc/lazy/generated/RegisterAutogradLazy.cpp", + "torch/csrc/lazy/generated/RegisterLazy.cpp", ] libtorch_python_generated_sources = [ @@ -180,10 +152,17 @@ genrule( name = "all_generated_code", srcs = [ "aten/src/ATen/native/native_functions.yaml", + "aten/src/ATen/native/ts_native_functions.yaml", + "torch/csrc/lazy/core/shape_inference.h", + "torch/csrc/lazy/ts_backend/ts_native_functions.cpp", + "aten/src/ATen/templates/DispatchKeyNativeFunctions.cpp", + "aten/src/ATen/templates/DispatchKeyNativeFunctions.h", + "aten/src/ATen/templates/RegisterDispatchKey.cpp", + "aten/src/ATen/templates/LazyIr.h", ], outs = libtorch_cpp_generated_sources + libtorch_python_generated_sources, - cmd = "$(location :generate_code) --install_dir `dirname $(location torch/csrc/autograd/generated/variable_factories.h)`/../.. --native-functions-path $(location aten/src/ATen/native/native_functions.yaml) --nn-path aten/src", - tools = [":generate_code"], + cmd = "$(location //tools/setup_helpers:generate_code) --install_dir `dirname $(location torch/csrc/autograd/generated/variable_factories.h)`/../.. 
--native-functions-path $(location aten/src/ATen/native/native_functions.yaml) --gen_lazy_ts_backend", + tools = ["//tools/setup_helpers:generate_code"], ) filegroup( @@ -1368,7 +1347,7 @@ cc_library( py_binary( name = "gen_op", srcs = ["caffe2/contrib/aten/gen_op.py"], - deps = [":tools_codegen"], + deps = ["//tools/codegen"], ) genrule( @@ -1636,17 +1615,12 @@ cc_library( ) # torch -py_binary( - name = "gen_version_header", - srcs = ["tools/setup_helpers/gen_version_header.py"], -) - genrule( name = "version_h", srcs = ["torch/csrc/api/include/torch/version.h.in", "version.txt"], outs = ["torch/csrc/api/include/torch/version.h"], - cmd = "$(location :gen_version_header) --template-path $(location torch/csrc/api/include/torch/version.h.in) --version-path $(location version.txt) --output-path $@", - tools = [':gen_version_header'], + cmd = "$(location //tools/setup_helpers:gen_version_header) --template-path $(location torch/csrc/api/include/torch/version.h.in) --version-path $(location version.txt) --output-path $@", + tools = ['//tools/setup_helpers:gen_version_header'], ) py_binary( @@ -1732,7 +1706,7 @@ cc_library( "torch/csrc/cuda/nccl.cpp", "torch/csrc/distributed/c10d/quantization/quantization_gpu.cu", ], - )) + libtorch_core_sources + libtorch_distributed_sources + torch_cpp_srcs + libtorch_extra_sources + jit_core_sources + [ + )) + libtorch_core_sources + libtorch_distributed_sources + torch_cpp_srcs + libtorch_extra_sources + jit_core_sources + lazy_tensor_ts_sources +[ ":cpp_generated_code", "torch/csrc/jit/serialization/flatbuffer_serializer.cpp", "torch/csrc/jit/mobile/flatbuffer_loader.cpp" @@ -1915,6 +1889,11 @@ cc_test( srcs = glob([ "test/cpp/lazy/*.cpp", "test/cpp/lazy/*.h", + ], exclude=[ + # skip these since they depend on generated LazyIr.h which isn't available in bazel yet + "test/cpp/lazy/test_ir.cpp", + "test/cpp/lazy/test_lazy_ops.cpp", + "test/cpp/lazy/test_lazy_ops_util.cpp", ]), linkstatic = True, tags = [ diff --git a/CMakeLists.txt b/CMakeLists.txt index 8b2e50ce52e7d5..c5c1aeb0b636ea 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -209,7 +209,7 @@ cmake_dependent_option( option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" ON) option(USE_KINETO "Use Kineto profiling library" ON) option(USE_BREAKPAD "Use breakpad crash dump library" ON) -option(USE_CUPTI_SO "Use CUPTI as a shared library" OFF) +option(USE_CUPTI_SO "Use CUPTI as a shared library" ON) option(USE_FAKELOWP "Use FakeLowp operators" OFF) option(USE_FFMPEG "Use ffmpeg" OFF) option(USE_GFLAGS "Use GFLAGS" OFF) @@ -304,6 +304,7 @@ set(MKLDNN_ENABLE_CONCURRENT_EXEC ${USE_MKLDNN}) cmake_dependent_option( USE_MKLDNN_CBLAS "Use CBLAS in MKLDNN" OFF "USE_MKLDNN" OFF) +option(USE_STATIC_MKL "Prefer to link with MKL statically (Unix only)" OFF) option(USE_DISTRIBUTED "Use distributed" ON) cmake_dependent_option( USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON @@ -312,12 +313,15 @@ cmake_dependent_option( USE_GLOO "Use Gloo. Only available if USE_DISTRIBUTED is on." ON "USE_DISTRIBUTED" OFF) cmake_dependent_option( - USE_GLOO_WITH_OPENSSL "Use Gloo with OpenSSL. Only available if USE_GLOO is on." OFF + USE_GLOO_WITH_OPENSSL "Use Gloo with OpenSSL. Only available if USE_GLOO is on." 
OFF "USE_GLOO AND LINUX AND NOT INTERN_BUILD_MOBILE" OFF) cmake_dependent_option( USE_C10D_GLOO "USE C10D GLOO" ON "USE_DISTRIBUTED;USE_GLOO" OFF) cmake_dependent_option( USE_C10D_NCCL "USE C10D NCCL" ON "USE_DISTRIBUTED;USE_NCCL" OFF) +cmake_dependent_option( + USE_NCCL_WITH_UCC "Enable UCC support for ProcessGroupNCCL. Only available if USE_C10D_NCCL is on." OFF + "USE_C10D_NCCL" OFF) cmake_dependent_option( USE_C10D_MPI "USE C10D MPI" ON "USE_DISTRIBUTED;USE_MPI" OFF) cmake_dependent_option( @@ -336,6 +340,9 @@ cmake_dependent_option(USE_CCACHE "Attempt using CCache to wrap the compilation" option(WERROR "Build with -Werror supported by the compiler" OFF) option(USE_COREML_DELEGATE "Use the CoreML backend through delegate APIs" OFF) option(USE_PER_OPERATOR_HEADERS "Whether ATen should generate separate headers for each operator" ON) +cmake_dependent_option( + BUILD_LAZY_TS_BACKEND "Build the lazy Torchscript backend, not compatible with mobile builds" ON + "NOT INTERN_BUILD_MOBILE" OFF) if(USE_CCACHE) @@ -550,6 +557,8 @@ endif(NOT MSVC) # purpose. if(ANDROID OR IOS OR DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN}) set(INTERN_BUILD_MOBILE ON) + message(WARNING "INTERN_BUILD_MOBILE is on, disabling BUILD_LAZY_TS_BACKEND") + set(BUILD_LAZY_TS_BACKEND OFF) if(DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN}) # C10_MOBILE is derived from Android/iOS toolchain macros in @@ -789,6 +798,8 @@ if(NOT MSVC) if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang") string(APPEND CMAKE_CXX_FLAGS " -Wno-range-loop-analysis") string(APPEND CMAKE_CXX_FLAGS " -Wno-pass-failed") + # sign-compare is not part of -Wall, see https://godbolt.org/z/s1YczM41T + string(APPEND CMAKE_CXX_FLAGS " -Wsign-compare") endif() if(CMAKE_COMPILER_IS_GNUCXX AND NOT (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.0.0)) string(APPEND CMAKE_CXX_FLAGS " -Wno-stringop-overflow") diff --git a/CODEOWNERS b/CODEOWNERS index e1d2bf0154b069..dd88eac8c2bb09 100644 --- a/CODEOWNERS +++ b/CODEOWNERS @@ -12,6 +12,7 @@ /torch/optim/ @albanD /test/test_public_bindings.py @albanD /docs/source/conf.py @albanD +/aten/src/ATen/native/native_functions.yaml @bdhirsh # Tensorpipe RPC Agent. /torch/csrc/distributed/rpc/tensorpipe_agent.cpp @jiayisuse @osalpekar @lw @beauby @@ -20,15 +21,15 @@ # Distributed package # This list is mostly if you'd like to be tagged as reviewer, feel free to add # or remove yourself from it. -/torch/csrc/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @bowangbj -/torch/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @bowangbj -/torch/nn/parallel/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @bowangbj +/torch/csrc/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @awgu +/torch/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @awgu +/torch/nn/parallel/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @awgu # Distributed tests # This list is mostly if you'd like to be tagged as reviewer, feel free to add # or remove yourself from it. 
-/test/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @bowangbj -/torch/testing/_internal/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @bowangbj +/test/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @awgu +/torch/testing/_internal/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @awgu # ONNX Export /torch/csrc/jit/passes/onnx.h @bowenbao @shubhambhokare1 @@ -46,9 +47,9 @@ /.github/ @seemethere @janeyx99 @atalman # Custom Test Infrastructure -/test/run_test.py @pytorch-dev-infra +/test/run_test.py @pytorch/pytorch-dev-infra /torch/testing/_internal/common_device_type.py @mruberry -/torch/testing/_internal/common_utils.py @pytorch-dev-infra +/torch/testing/_internal/common_utils.py @pytorch/pytorch-dev-infra # Parametrizations /torch/nn/utils/parametriz*.py @lezcano diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 59b7ae8a488f5e..b20ecd3ffcb9d9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -512,7 +512,7 @@ missing file warnings but will still complete. For example, to work on `jit.rst` ```bash cd docs/source -ls | grep rst | grep -v index | grep -v jit | xargs rm +find . -type f | grep rst | grep -v index | grep -v jit | xargs rm # Make your changes, build the docs, etc. @@ -1098,8 +1098,7 @@ This internally invokes our driver script and closely mimics how clang-tidy is r ## Pre-commit tidy/linting hook -We use clang-tidy and flake8 (installed with flake8-bugbear, -flake8-comprehensions, flake8-pyi, and others) to perform additional +We use clang-tidy to perform additional formatting and semantic checking of code. We provide a pre-commit git hook for performing these checks, before a commit is created: @@ -1107,18 +1106,18 @@ performing these checks, before a commit is created: ln -s ../../tools/git-pre-commit .git/hooks/pre-commit ``` -You'll need to install an appropriately configured flake8; see -[Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type) -for documentation on how to do this. - -If you haven't set up the pre-commit hook and have already committed files and +If you have already committed files and CI reports `flake8` errors, you can run the check locally in your PR branch with: ```bash flake8 $(git diff --name-only $(git merge-base --fork-point master)) ``` -fix the code so that no errors are reported when you re-run the above check again, +You'll need to install an appropriately configured flake8; see +[Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type) +for documentation on how to do this. + +Fix the code so that no errors are reported when you re-run the above check again, and then commit the fix. ## Building PyTorch with ASAN @@ -1245,39 +1244,17 @@ Once you submit a PR or push a new commit to a branch that is in an active PR, CI jobs will be run automatically. Some of these may fail and you will need to find out why, by looking at the logs. -Fairly often, a CI failure might be unrelated to your changes. In this case, you +Fairly often, a CI failure might be unrelated to your changes. You can +confirm by going to our [HUD](hud.pytorch.org) and seeing if the CI job +is failing upstream already. In this case, you can usually ignore the failure. See [the following subsection](#which-commit-is-used-in-ci) for more details. Some failures might be related to specific hardware or environment -configurations. 
In this case, if the job is run by CircleCI, you can -ssh into the job's session to perform manual debugging using the -following steps: - -1. In the CircleCI page for the failed job, make sure you are logged in - and then click the `Rerun` actions dropdown button on the top right. - Click `Rerun Job with SSH`. - -2. When the job reruns, a new step will be added in the `STEPS` tab - labelled `Set up SSH`. Inside that tab will be an ssh command that - you can execute in a shell. - -3. Once you are connected through ssh, you may need to enter a docker - container. Run `docker ps` to check if there are any docker - containers running. Note that your CI job might be in the process - of initiating a docker container, which means it will not show up - yet. It is best to wait until the CI job reaches a step where it is - building pytorch or running pytorch tests. If the job does have a - docker container, run `docker exec -it IMAGE_ID /bin/bash` to - connect to it. - -4. Now you can find the pytorch working directory, which could be - `~/workspace` or `~/project`, and run commands locally to debug - the failure. - -For certain Windows failures, it may be useful to have a full [Remote -Desktop](https://docs.microsoft.com/en-us/windows-server/remote/remote-desktop-services/clients/remote-desktop-clients) connection. See detailed instructions [here](https://github.com/pytorch/pytorch/wiki/Debugging-Windows-with-Remote-Desktop-or-CDB-(CLI-windbg)-on-CircleCI) -for how to set that up after rerunning the job. +configurations. In this case, if you're a Meta employee, you can ssh into +the job's session to perform manual debugging following the instructions in +our [CI wiki](https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions). + ### Which commit is used in CI? diff --git a/Dockerfile b/Dockerfile index e5065cd6524b09..a8dc7f141685d6 100644 --- a/Dockerfile +++ b/Dockerfile @@ -32,7 +32,7 @@ RUN curl -fsSL -v -o ~/miniconda.sh -O https://repo.anaconda.com/miniconda/Mini chmod +x ~/miniconda.sh && \ ~/miniconda.sh -b -p /opt/conda && \ rm ~/miniconda.sh && \ - /opt/conda/bin/conda install -y python=${PYTHON_VERSION} conda-build pyyaml numpy ipython&& \ + /opt/conda/bin/conda install -y python=${PYTHON_VERSION} conda-build pyyaml numpy ipython && \ /opt/conda/bin/conda clean -ya FROM dev-base as submodule-update diff --git a/README.md b/README.md index 88a77f04b34555..9105b1d35f3101 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,8 @@ PyTorch is a Python package that provides two high-level features: You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed. +Our trunk health (Continuous Integration signals) can be found at [hud.pytorch.org](https://hud.pytorch.org/ci/pytorch/pytorch/master). + - [More About PyTorch](#more-about-pytorch) @@ -39,18 +41,6 @@ You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to -| System | 3.7 | 3.8 | -| :---: | :---: | :--: | -| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| -| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| -| Windows CPU / GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/) |
| -| Linux (ppc64le) CPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/) |
| -| Linux (ppc64le) GPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le-gpu/) |
| -| Linux (aarch64) CPU | [![Build Status](http://openlabtesting.org:15000/badge?project=pytorch%2Fpytorch&job_name=pytorch-arm64-build-daily-master-py37)](https://status.openlabtesting.org/builds/builds?project=pytorch%2Fpytorch&job_name=pytorch-arm64-build-daily-master-py37) | [![Build Status](http://openlabtesting.org:15000/badge?project=pytorch%2Fpytorch&job_name=pytorch-arm64-build-daily-master-py38)](https://status.openlabtesting.org/builds/builds?project=pytorch%2Fpytorch&job_name=pytorch-arm64-build-daily-master-py38) | - -See also the [CI HUD at hud.pytorch.org](https://hud.pytorch.org/ci/pytorch/pytorch/master). - - ## More About PyTorch At a granular level, PyTorch is a library that consists of the following components: diff --git a/RELEASE.md b/RELEASE.md index 1c95ea1b5328c4..e84ccbc159627d 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -3,13 +3,25 @@ - [General Overview](#general-overview) + - [Cutting a release branch preparations](#cutting-a-release-branch-preparations) - [Cutting release branches](#cutting-release-branches) + - [`pytorch/pytorch`](#pytorchpytorch) + - [`pytorch/builder` / PyTorch domain libraries](#pytorchbuilder--pytorch-domain-libraries) - [Making release branch specific changes](#making-release-branch-specific-changes) - [Getting CI signal on release branches:](#getting-ci-signal-on-release-branches) - [Drafting RCs (Release Candidates)](#drafting-rcs-release-candidates) - [Release Candidate Storage](#release-candidate-storage) - [Cherry Picking Fixes](#cherry-picking-fixes) - [Promoting RCs to Stable](#promoting-rcs-to-stable) + - [Additional Steps to prepare for release day](#additional-steps-to-prepare-for-release-day) + - [Modify release matrix](#modify-release-matrix) + - [Open Google Colab issue](#open-google-colab-issue) +- [Patch Releases](#patch-releases) + - [Patch Release Criteria](#patch-release-criteria) + - [Patch Release Process](#patch-release-process) + - [Triage](#triage) + - [Building a release schedule / cherry picking](#building-a-release-schedule--cherry-picking) + - [Building Binaries / Promotion to Stable](#building-binaries--promotion-to-stable) - [Special Topics](#special-topics) - [Updating submodules for a release](#updating-submodules-for-a-release) @@ -19,32 +31,55 @@ Releasing a new version of PyTorch generally entails 3 major steps: +0. Cutting a release branch preparations 1. Cutting a release branch and making release branch specific changes 2. Drafting RCs (Release Candidates), and merging cherry picks -3. Promoting RCs to stable +3. Promoting RCs to stable and performing release day tasks + +## Cutting a release branch preparations + +The following requirements need to be met prior to the final RC cut: + +* Resolve all outstanding issues in the milestones (for example [1.11.0](https://github.com/pytorch/pytorch/milestone/28)) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo in order to validate the presence of the fixes in the release branch: +``` python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch ``` +* Validate that all new workflows have been created in the PyTorch and domain libraries included in the release. Validate it against all dimensions of the release matrix, including operating systems (Linux, MacOS, Windows), Python versions, as well as CPU architectures (x86 and arm) and accelerator versions (CUDA, ROCm). +* All the nightly jobs for PyTorch and domain libraries should be green. 
Validate this using the following HUD links: + * [PyTorch](https://hud.pytorch.org/hud/pytorch/pytorch/nightly) + * [TorchVision](https://hud.pytorch.org/hud/pytorch/vision/nightly) + * [TorchAudio](https://hud.pytorch.org/hud/pytorch/audio/nightly) + * [TorchText](https://hud.pytorch.org/hud/pytorch/text/nightly) ## Cutting release branches +### `pytorch/pytorch` + Release branches are typically cut from the branch [`viable/strict`](https://github.com/pytorch/pytorch/tree/viable/strict) so as to ensure that tests are passing on the release branch. -Release branches *should* be prefixed like so: -``` -release/{MAJOR}.{MINOR} -``` +There's a convenience script to create release branches from the current `viable/strict` (run from the root of `pytorch/pytorch`): -An example of this would look like: +```bash +DRY_RUN=disabled scripts/release/cut-release-branch.sh ``` -release/1.8 + +This script should create 2 branches: +* `release/{MAJOR}.{MINOR}` +* `orig/release/{MAJOR}.{MINOR}` + +### `pytorch/builder` / PyTorch domain libraries + +The convenience script can also be used for the domain libraries as well as `pytorch/builder` + +> NOTE: RELEASE_VERSION only needs to be specified if version.txt is not available in the root directory + +```bash +DRY_RUN=disabled GIT_BRANCH_TO_CUT_FROM=main RELEASE_VERSION=1.11 scripts/release/cut-release-branch.sh ``` -Please make sure to create branch that pins divergent point of release branch from the main branch, i.e. `orig/release/{MAJOR}.{MINOR}` ### Making release branch specific changes These are examples of changes that should be made to release branches so that CI / tooling can function normally on them: -* Update target determinator to use release branch: - * Example: https://github.com/pytorch/pytorch/pull/40712 * Update backwards compatibility tests to use RC binaries instead of nightlies * Example: https://github.com/pytorch/pytorch/pull/40706 * Release branches should also be created in the [`pytorch/xla`](https://github.com/pytorch/xla) and [`pytorch/builder`](https://github.com/pytorch/builder) repos and pinned in `pytorch/pytorch` @@ -57,6 +92,7 @@ These are examples of changes that should be made to the *default* branch after * Example: https://github.com/pytorch/pytorch/pull/65435 ### Getting CI signal on release branches: + Create a PR from `release/{MAJOR}.{MINOR}` to `orig/release/{MAJOR}.{MINOR}` in order to start CI testing for cherry-picks into the release branch. Example: @@ -99,8 +135,11 @@ For fixes that are to go into a release after the release branch has been cut we An example of this would look like: * https://github.com/pytorch/pytorch/issues/51886 +Please also make sure to add a milestone target to the PR/issue, especially if it needs to be considered for inclusion into the dot release. + **NOTE**: The cherry pick process is not an invitation to add new features, it is mainly there to fix regressions + ## Promoting RCs to Stable Promotion of RCs to stable is done with this script: @@ -114,6 +153,69 @@ Promotion should occur in two steps: **NOTE**: The promotion of wheels to PyPI can only be done once, so take caution when attempting to promote wheels to PyPI (see https://github.com/pypa/warehouse/issues/726 for a discussion on potential draft releases within PyPI) +## Additional Steps to prepare for release day + +The following should be prepared for release day: + +### Modify release matrix + +The release matrix for the get-started page needs to be modified. See the following [PR](https://github.com/pytorch/pytorch.github.io/pull/959) as a reference; a sketch of the local workflow is shown below. 
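A rough sketch of that local workflow, assuming a standard clone of `pytorch.github.io` (the branch name and commit message below are placeholders, not part of the official process):

```bash
# Hypothetical example: prepare the get-started release-matrix update locally.
git clone https://github.com/pytorch/pytorch.github.io
cd pytorch.github.io
git checkout -b update-release-matrix

# Edit published_versions.json to describe the new release, then regenerate the
# quick-start module that the get-started page reads (same command as below):
python3 scripts/gen_quick_start_module.py > assets/quick-start-module.js

git commit -am "Update release matrix"
```

The resulting PR is the one that must be kept free of failures and merged on release day, as described next.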
+ +After modifying published_versions.json you will need to regenerate the quick-start-module.js file. Run the following command: +``` +python3 scripts/gen_quick_start_module.py >assets/quick-start-module.js +``` +Please note: This PR needs to be merged on the release day and hence it should be absolutely free of any failures. To test this PR, open another test PR pointing to the release candidate location described above in [Release Candidate Storage](RELEASE.md#release-candidate-storage) + +### Open Google Colab issue + +This is normally done right after the release is completed. We would need to create a Google Colab issue; see the following [issue](https://github.com/googlecolab/colabtools/issues/2372) + +# Patch Releases + +A patch release is a maintenance release of PyTorch that includes fixes for regressions found in a previous minor release. Patch releases typically bump the `patch` version from semver (i.e. `[major].[minor].[patch]`, e.g. `1.9.0` to `1.9.1`) + +## Patch Release Criteria + +Patch releases should be considered if a regression meets the following criteria: + +1. Does the regression break core functionality (stable / beta features) including functionality in first party domain libraries? + * First party domain libraries: + * [pytorch/vision](https://github.com/pytorch/vision) + * [pytorch/audio](https://github.com/pytorch/audio) + * [pytorch/text](https://github.com/pytorch/text) +2. Is there not a viable workaround? + * Can the regression be solved simply, or is it impossible to work around? + +> *NOTE*: Patch releases should only be considered when functionality is broken; documentation issues do not typically fall within this category + +## Patch Release Process + +### Triage + +> Main POC: Triage Reviewers + +1. Tag issues / pull requests that are candidates for a potential patch release with `triage review` + * ![adding triage review label](https://user-images.githubusercontent.com/1700823/132589089-a9210a14-6159-409d-95e5-f79067f6fa38.png) +2. Triage reviewers will then check if the regression / fix identified fits within the above-mentioned [Patch Release Criteria](#patch-release-criteria) +3. Triage reviewers will then add the issue / pull request to the related milestone (e.g. `1.9.1`) if the regression is found to be within the [Patch Release Criteria](#patch-release-criteria) + * ![adding to milestone](https://user-images.githubusercontent.com/1700823/131175980-148ff38d-44c3-4611-8a1f-cd2fd1f4c49d.png) + +### Building a release schedule / cherry picking + +> Main POC: Patch Release Managers + +1. After regressions / fixes have been triaged, Patch Release Managers will work together to build / announce a schedule for the patch release + * *NOTE*: Ideally this should be ~2-3 weeks after a regression has been identified to allow other regressions to be identified +2. Patch Release Managers will work with the authors of the regressions / fixes to cherry pick their change into the related release branch (e.g. `release/1.9` for `1.9.1`) + +### Building Binaries / Promotion to Stable + +> Main POC: Patch Release Managers + +1. Patch Release Managers will follow the process of [Drafting RCs (Release Candidates)](#drafting-rcs-release-candidates) +2. 
Patch Release Managers will follow the process of [Promoting RCs to Stable](#promoting-rcs-to-stable) + # Special Topics ## Updating submodules for a release diff --git a/android/pytorch_android/src/androidTest/assets/activation_ops.ptl b/android/pytorch_android/src/androidTest/assets/activation_ops.ptl new file mode 100644 index 00000000000000..179f426ae7cdf6 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/activation_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/android_api_module.ptl b/android/pytorch_android/src/androidTest/assets/android_api_module.ptl new file mode 100644 index 00000000000000..df62dd86208811 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/android_api_module.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/blas_lapack_ops.ptl b/android/pytorch_android/src/androidTest/assets/blas_lapack_ops.ptl new file mode 100644 index 00000000000000..fea933ee644fd4 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/blas_lapack_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/comparison_ops.ptl b/android/pytorch_android/src/androidTest/assets/comparison_ops.ptl new file mode 100644 index 00000000000000..01b1c153e7515a Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/comparison_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/convolution_ops.ptl b/android/pytorch_android/src/androidTest/assets/convolution_ops.ptl new file mode 100644 index 00000000000000..db253a207a33d0 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/convolution_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/distance_function_ops.ptl b/android/pytorch_android/src/androidTest/assets/distance_function_ops.ptl new file mode 100644 index 00000000000000..cc4d994f440a4d Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/distance_function_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/dropout_ops.ptl b/android/pytorch_android/src/androidTest/assets/dropout_ops.ptl new file mode 100644 index 00000000000000..422c2f60e6be25 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/dropout_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/dynamic_quant_ops.ptl b/android/pytorch_android/src/androidTest/assets/dynamic_quant_ops.ptl new file mode 100644 index 00000000000000..0bbbce9671c3c4 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/dynamic_quant_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/fused_quant_ops.ptl b/android/pytorch_android/src/androidTest/assets/fused_quant_ops.ptl new file mode 100644 index 00000000000000..9d2b3f9dde1a71 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/fused_quant_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/general_quant_ops.ptl b/android/pytorch_android/src/androidTest/assets/general_quant_ops.ptl new file mode 100644 index 00000000000000..7d4888e0bc817e Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/general_quant_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/linear_ops.ptl b/android/pytorch_android/src/androidTest/assets/linear_ops.ptl new file mode 100644 index 00000000000000..ca9066c03dc4f3 Binary files /dev/null and 
b/android/pytorch_android/src/androidTest/assets/linear_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/loss_function_ops.ptl b/android/pytorch_android/src/androidTest/assets/loss_function_ops.ptl new file mode 100644 index 00000000000000..4c0592e5485afa Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/loss_function_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/mobilenet_v2.ptl b/android/pytorch_android/src/androidTest/assets/mobilenet_v2.ptl new file mode 100644 index 00000000000000..9b8297a250d35d Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/mobilenet_v2.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/nn_utils_ops.ptl b/android/pytorch_android/src/androidTest/assets/nn_utils_ops.ptl new file mode 100644 index 00000000000000..5d008eab03b9b8 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/nn_utils_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/normalization_ops.ptl b/android/pytorch_android/src/androidTest/assets/normalization_ops.ptl new file mode 100644 index 00000000000000..d85bd06c763bc7 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/normalization_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/other_math_ops.ptl b/android/pytorch_android/src/androidTest/assets/other_math_ops.ptl new file mode 100644 index 00000000000000..7209c3b3bd1fdd Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/other_math_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/padding_ops.ptl b/android/pytorch_android/src/androidTest/assets/padding_ops.ptl new file mode 100644 index 00000000000000..02e57ba207129c Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/padding_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/pointwise_ops.ptl b/android/pytorch_android/src/androidTest/assets/pointwise_ops.ptl new file mode 100644 index 00000000000000..948ed4832660ae Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/pointwise_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/pooling_ops.ptl b/android/pytorch_android/src/androidTest/assets/pooling_ops.ptl new file mode 100644 index 00000000000000..df051163413f5a Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/pooling_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/recurrent_ops.ptl b/android/pytorch_android/src/androidTest/assets/recurrent_ops.ptl new file mode 100644 index 00000000000000..245ceb454d5387 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/recurrent_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/reduction_ops.ptl b/android/pytorch_android/src/androidTest/assets/reduction_ops.ptl new file mode 100644 index 00000000000000..13771302c66802 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/reduction_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/sampling_ops.ptl b/android/pytorch_android/src/androidTest/assets/sampling_ops.ptl new file mode 100644 index 00000000000000..416be7cb127953 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/sampling_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/shuffle_ops.ptl 
b/android/pytorch_android/src/androidTest/assets/shuffle_ops.ptl new file mode 100644 index 00000000000000..5e5520118764ef Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/shuffle_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/sparse_ops.ptl b/android/pytorch_android/src/androidTest/assets/sparse_ops.ptl new file mode 100644 index 00000000000000..a16f68f8f95ff8 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/sparse_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/spectral_ops.ptl b/android/pytorch_android/src/androidTest/assets/spectral_ops.ptl new file mode 100644 index 00000000000000..9828dd2ba9013a Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/spectral_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/static_quant_ops.ptl b/android/pytorch_android/src/androidTest/assets/static_quant_ops.ptl new file mode 100644 index 00000000000000..d0a0a254d1efe1 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/static_quant_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/tensor_creation_ops.ptl b/android/pytorch_android/src/androidTest/assets/tensor_creation_ops.ptl new file mode 100644 index 00000000000000..d897b43cd36ca9 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/tensor_creation_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/tensor_general_ops.ptl b/android/pytorch_android/src/androidTest/assets/tensor_general_ops.ptl new file mode 100644 index 00000000000000..6f2855ea83eaa5 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/tensor_general_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/tensor_indexing_ops.ptl b/android/pytorch_android/src/androidTest/assets/tensor_indexing_ops.ptl new file mode 100644 index 00000000000000..ac9cb8c4b94add Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/tensor_indexing_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/tensor_typing_ops.ptl b/android/pytorch_android/src/androidTest/assets/tensor_typing_ops.ptl new file mode 100644 index 00000000000000..3e2f4d8cc68922 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/tensor_typing_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/tensor_view_ops.ptl b/android/pytorch_android/src/androidTest/assets/tensor_view_ops.ptl new file mode 100644 index 00000000000000..5e2dc829484265 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/tensor_view_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/torchscript_builtin_ops.ptl b/android/pytorch_android/src/androidTest/assets/torchscript_builtin_ops.ptl new file mode 100644 index 00000000000000..2d2532df2fd257 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/torchscript_builtin_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/torchscript_collection_ops.ptl b/android/pytorch_android/src/androidTest/assets/torchscript_collection_ops.ptl new file mode 100644 index 00000000000000..ce434b3b4210d5 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/torchscript_collection_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/transformer_ops.ptl b/android/pytorch_android/src/androidTest/assets/transformer_ops.ptl new file 
mode 100644 index 00000000000000..ebb2bd693604a7 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/transformer_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/assets/vision_function_ops.ptl b/android/pytorch_android/src/androidTest/assets/vision_function_ops.ptl new file mode 100644 index 00000000000000..c9c45655e2bca9 Binary files /dev/null and b/android/pytorch_android/src/androidTest/assets/vision_function_ops.ptl differ diff --git a/android/pytorch_android/src/androidTest/java/org/pytorch/PytorchTestBase.java b/android/pytorch_android/src/androidTest/java/org/pytorch/PytorchTestBase.java index 5a1405e679bcfb..9abcbcbda8a6ca 100644 --- a/android/pytorch_android/src/androidTest/java/org/pytorch/PytorchTestBase.java +++ b/android/pytorch_android/src/androidTest/java/org/pytorch/PytorchTestBase.java @@ -12,7 +12,7 @@ import org.junit.Test; public abstract class PytorchTestBase { - private static final String TEST_MODULE_ASSET_NAME = "test.pt"; + private static final String TEST_MODULE_ASSET_NAME = "android_api_module.ptl"; @Test public void testForwardNull() throws IOException { @@ -377,6 +377,186 @@ public void testChannelsLastConv2d() throws IOException { new long[] {2, 11, -101, 4, 12, -102, 6, 13, -103, 8, 14, -104}); } + @Test + public void testMobileNetV2() throws IOException { + try { + final Module module = loadModel("mobilenet_v2.ptl"); + final IValue inputs = module.runMethod("get_all_bundled_inputs"); + assertTrue(inputs.isList()); + final IValue input = inputs.toList()[0]; + assertTrue(input.isTuple()); + module.forward(input.toTuple()[0]); + assertTrue(true); + } catch (Exception ex) { + assertTrue("failed to run MobileNetV2 " + ex.getMessage(), false); + } + } + + @Test + public void testPointwiseOps() throws IOException { + runModel("pointwise_ops"); + } + + @Test + public void testReductionOps() throws IOException { + runModel("reduction_ops"); + } + + @Test + public void testComparisonOps() throws IOException { + runModel("comparison_ops"); + } + + @Test + public void testOtherMathOps() throws IOException { + runModel("other_math_ops"); + } + + @Test + public void testSpectralOps() throws IOException { + runModel("spectral_ops"); + } + + @Test + public void testBlasLapackOps() throws IOException { + runModel("blas_lapack_ops"); + } + + @Test + public void testSamplingOps() throws IOException { + runModel("sampling_ops"); + } + + @Test + public void testTensorOps() throws IOException { + runModel("tensor_general_ops"); + } + + @Test + public void testTensorCreationOps() throws IOException { + runModel("tensor_creation_ops"); + } + + @Test + public void testTensorIndexingOps() throws IOException { + runModel("tensor_indexing_ops"); + } + + @Test + public void testTensorTypingOps() throws IOException { + runModel("tensor_typing_ops"); + } + + @Test + public void testTensorViewOps() throws IOException { + runModel("tensor_view_ops"); + } + + @Test + public void testConvolutionOps() throws IOException { + runModel("convolution_ops"); + } + + @Test + public void testPoolingOps() throws IOException { + runModel("pooling_ops"); + } + + @Test + public void testPaddingOps() throws IOException { + runModel("padding_ops"); + } + + @Test + public void testActivationOps() throws IOException { + runModel("activation_ops"); + } + + @Test + public void testNormalizationOps() throws IOException { + runModel("normalization_ops"); + } + + @Test + public void testRecurrentOps() throws IOException { + runModel("recurrent_ops"); + } + + @Test + 
public void testTransformerOps() throws IOException { + runModel("transformer_ops"); + } + + @Test + public void testLinearOps() throws IOException { + runModel("linear_ops"); + } + + @Test + public void testDropoutOps() throws IOException { + runModel("dropout_ops"); + } + + @Test + public void testSparseOps() throws IOException { + runModel("sparse_ops"); + } + + @Test + public void testDistanceFunctionOps() throws IOException { + runModel("distance_function_ops"); + } + + @Test + public void testLossFunctionOps() throws IOException { + runModel("loss_function_ops"); + } + + @Test + public void testVisionFunctionOps() throws IOException { + runModel("vision_function_ops"); + } + + @Test + public void testShuffleOps() throws IOException { + runModel("shuffle_ops"); + } + + @Test + public void testNNUtilsOps() throws IOException { + runModel("nn_utils_ops"); + } + + @Test + public void testQuantOps() throws IOException { + runModel("general_quant_ops"); + } + + @Test + public void testDynamicQuantOps() throws IOException { + runModel("dynamic_quant_ops"); + } + + @Test + public void testStaticQuantOps() throws IOException { + runModel("static_quant_ops"); + } + + @Test + public void testFusedQuantOps() throws IOException { + runModel("fused_quant_ops"); + } + + @Test + public void testTorchScriptBuiltinQuantOps() throws IOException { + runModel("torchscript_builtin_ops"); + } + + @Test + public void testTorchScriptCollectionQuantOps() throws IOException { + runModel("torchscript_collection_ops"); + } + static void assertIValueTensor( final IValue ivalue, final MemoryFormat memoryFormat, @@ -389,5 +569,15 @@ static void assertIValueTensor( assertArrayEquals(expectedData, t.getDataAsLongArray()); } + void runModel(final String name) throws IOException { + final Module storage_module = loadModel(name + ".ptl"); + storage_module.forward(); + + // TODO enable this once the on-the-fly script is ready + // final Module on_the_fly_module = loadModel(name + "_temp.ptl"); + // on_the_fly_module.forward(); + assertTrue(true); + } + protected abstract Module loadModel(String assetName) throws IOException; } diff --git a/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp b/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp index 8094f7bdc97415..5ed0c9978e8346 100644 --- a/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp +++ b/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp @@ -223,7 +223,8 @@ class TensorHybrid : public facebook::jni::HybridClass { } else { facebook::jni::throwNewJavaException( facebook::jni::gJavaLangIllegalArgumentException, - "at::Tensor scalar type is not supported on java side"); + "at::Tensor scalar type %s is not supported on java side", + c10::toString(scalarType)); } const auto& tensorShape = tensor.sizes(); diff --git a/aten/src/ATen/BatchingRegistrations.cpp b/aten/src/ATen/BatchingRegistrations.cpp index 0eb0d697078ea1..c7c95cf92c9fcb 100644 --- a/aten/src/ATen/BatchingRegistrations.cpp +++ b/aten/src/ATen/BatchingRegistrations.cpp @@ -1105,6 +1105,7 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { m.impl("select.int", select_batching_rule); m.impl("slice.Tensor", slice_batching_rule); m.impl("split.Tensor", split_batching_rule); + m.impl("split.sizes", split_with_sizes_batching_rule); m.impl("split_with_sizes", split_with_sizes_batching_rule); m.impl("squeeze", squeeze_batching_rule); m.impl("squeeze.dim", squeeze_dim_batching_rule); diff --git a/aten/src/ATen/Context.cpp b/aten/src/ATen/Context.cpp index 98590b266be402..8712fe203d1e1e 
100644 --- a/aten/src/ATen/Context.cpp +++ b/aten/src/ATen/Context.cpp @@ -236,6 +236,10 @@ const std::vector& Context::supportedQEngines() { engines.push_back(at::kNoQEngine); #endif // C10_MOBILE +#if AT_MKLDNN_ENABLED() + engines.push_back(at::kONEDNN); +#endif + #ifdef USE_FBGEMM if (fbgemm::fbgemmSupportedCPU()) { engines.push_back(at::kFBGEMM); @@ -293,6 +297,20 @@ bool NoTF32Guard::should_disable_tf32() { return override_allow_tf32_flag; } +thread_local bool BackwardPassGuard::is_backward_pass_; + +BackwardPassGuard::BackwardPassGuard() { + is_backward_pass_ = true; +} + +BackwardPassGuard::~BackwardPassGuard() { + is_backward_pass_ = false; +} + +bool BackwardPassGuard::is_backward_pass() { + return is_backward_pass_; +} + bool Context::areVmapFallbackWarningsEnabled() const { return display_vmap_fallback_warnings_; } diff --git a/aten/src/ATen/Context.h b/aten/src/ATen/Context.h index 88cbc3ec0bb3a1..1a90a7e0f1047d 100644 --- a/aten/src/ATen/Context.h +++ b/aten/src/ATen/Context.h @@ -80,6 +80,9 @@ class TORCH_API Context { static bool hasHIP() { return detail::getHIPHooks().hasHIP(); } + static bool hasIPU() { + return c10::impl::hasDeviceGuardImpl(at::DeviceType::IPU); + } static bool hasXLA() { return c10::impl::hasDeviceGuardImpl(at::DeviceType::XLA); } @@ -295,6 +298,10 @@ static inline bool hasHIP() { return globalContext().hasHIP(); } +static inline bool hasIPU() { + return globalContext().hasIPU(); +} + static inline bool hasXLA() { return globalContext().hasXLA(); } @@ -387,4 +394,12 @@ struct TORCH_API NoTF32Guard { bool changed = false; }; +struct TORCH_API BackwardPassGuard { + BackwardPassGuard(); + ~BackwardPassGuard(); + static bool is_backward_pass(); +private: + static thread_local bool is_backward_pass_; +}; + } // namespace at diff --git a/aten/src/ATen/Dispatch.h b/aten/src/ATen/Dispatch.h index 1bd78db594e51e..e0d66934883fc7 100644 --- a/aten/src/ATen/Dispatch.h +++ b/aten/src/ATen/Dispatch.h @@ -513,6 +513,22 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {} } \ }() +#define AT_DISPATCH_QINT_BYTE_TYPES(TYPE, NAME, ...) \ + [&] { \ + const auto& the_type = TYPE; \ + /* don't use TYPE again in case it is an expensive or side-effect op */ \ + at::ScalarType _st = ::detail::scalar_type(the_type); \ + RECORD_KERNEL_FUNCTION_DTYPE(NAME, _st); \ + switch (_st) { \ + AT_QINT_PRIVATE_CASE_TYPE( \ + NAME, at::kQInt8, at::qint8, at::kChar, int8_t, __VA_ARGS__) \ + AT_QINT_PRIVATE_CASE_TYPE( \ + NAME, at::kQUInt8, at::quint8, at::kByte, uint8_t, __VA_ARGS__) \ + default: \ + AT_ERROR(#NAME, " not implemented for '", toString(TYPE), "'"); \ + } \ + }() + #define AT_DISPATCH_QINT_AND_SUB_BYTE_TYPES(TYPE, NAME, ...) \ [&] { \ const auto& the_type = TYPE; \ @@ -753,6 +769,56 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {} } \ }() +#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( \ + SCALARTYPE1, SCALARTYPE2, SCALARTYPE3, SCALARTYPE4, TYPE, NAME, ...) 
\ + [&] { \ + const auto& the_type = TYPE; \ + /* don't use TYPE again in case it is an expensive or side-effect op*/ \ + at::ScalarType _st = ::detail::scalar_type(the_type); \ + RECORD_KERNEL_FUNCTION_DTYPE(NAME, _st); \ + switch (_st) { \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Byte, uint8_t, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Char, int8_t, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Double, double, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Float, float, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Int, int32_t, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Long, int64_t, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE(NAME, at::ScalarType::Short, int16_t, __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE( \ + NAME, \ + at::ScalarType::ComplexFloat, \ + c10::complex, \ + __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE( \ + NAME, \ + at::ScalarType::ComplexDouble, \ + c10::complex, \ + __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE( \ + NAME, \ + SCALARTYPE1, \ + decltype(c10::impl::ScalarTypeToCPPType::t), \ + __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE( \ + NAME, \ + SCALARTYPE2, \ + decltype(c10::impl::ScalarTypeToCPPType::t), \ + __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE( \ + NAME, \ + SCALARTYPE3, \ + decltype(c10::impl::ScalarTypeToCPPType::t), \ + __VA_ARGS__) \ + AT_PRIVATE_CASE_TYPE( \ + NAME, \ + SCALARTYPE4, \ + decltype(c10::impl::ScalarTypeToCPPType::t), \ + __VA_ARGS__) \ + default: \ + AT_ERROR(#NAME, " not implemented for '", toString(_st), "'"); \ + } \ + }() + #define AT_DISPATCH_INDEX_TYPES(TYPE, NAME, ...) \ [&] { \ const auto& the_index_type = TYPE; \ diff --git a/aten/src/ATen/DynamicLibrary.cpp b/aten/src/ATen/DynamicLibrary.cpp index f380fb6c35dd6a..f3287121b2e267 100644 --- a/aten/src/ATen/DynamicLibrary.cpp +++ b/aten/src/ATen/DynamicLibrary.cpp @@ -20,7 +20,7 @@ namespace at { static void* checkDL(void* x) { if (!x) { - AT_ERROR("Error in dlopen or dlsym: ", dlerror()); + TORCH_CHECK_WITH(DynamicLibraryError, false, "Error in dlopen or dlsym: ", dlerror()); } return x; @@ -32,10 +32,10 @@ DynamicLibrary::DynamicLibrary(const char* name, const char* alt_name, bool leak if (alt_name) { handle = dlopen(alt_name, RTLD_LOCAL | RTLD_NOW); if (!handle) { - AT_ERROR("Error in dlopen for library ", name, "and ", alt_name); + TORCH_CHECK_WITH(DynamicLibraryError, false, "Error in dlopen for library ", name, "and ", alt_name); } } else { - AT_ERROR("Error in dlopen: ", dlerror()); + TORCH_CHECK_WITH(DynamicLibraryError, false, "Error in dlopen: ", dlerror()); } } } @@ -84,7 +84,7 @@ DynamicLibrary::DynamicLibrary(const char* name, const char* alt_name, bool leak FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, NULL, dw, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), buf, (sizeof(buf) / sizeof(char)), NULL); - AT_ERROR("error in LoadLibrary for ", name, ". WinError ", dw, ": ", buf); + TORCH_CHECK_WITH(DynamicLibraryError, false, "error in LoadLibrary for ", name, ". 
WinError ", dw, ": ", buf); } } @@ -92,7 +92,7 @@ void* DynamicLibrary::sym(const char* name) { AT_ASSERT(handle); FARPROC procAddress = GetProcAddress((HMODULE)handle, name); if (!procAddress) { - AT_ERROR("error in GetProcAddress"); + TORCH_CHECK_WITH(DynamicLibraryError, false, "error in GetProcAddress"); } return (void*)procAddress; } diff --git a/aten/src/ATen/DynamicLibrary.h b/aten/src/ATen/DynamicLibrary.h index 9e7ade53cf96f8..8f65dd5b494f76 100644 --- a/aten/src/ATen/DynamicLibrary.h +++ b/aten/src/ATen/DynamicLibrary.h @@ -1,8 +1,17 @@ #pragma once #include +#include #include +namespace c10 { + +class DynamicLibraryError : public Error { + using Error::Error; +}; + +} // namespace c10 + namespace at { struct DynamicLibrary { diff --git a/aten/src/ATen/EmptyTensor.cpp b/aten/src/ATen/EmptyTensor.cpp index 5e21a2f52d187f..5a72a09d1841c7 100644 --- a/aten/src/ATen/EmptyTensor.cpp +++ b/aten/src/ATen/EmptyTensor.cpp @@ -2,31 +2,93 @@ #include #include #include +#include + +#include namespace at { namespace detail { - -static c10::Allocator* GetCPUAllocatorMaybePinned(bool pin_memory) { +namespace { +c10::Allocator* GetCPUAllocatorMaybePinned(bool pin_memory) { if (pin_memory) { return at::detail::getCUDAHooks().getPinnedMemoryAllocator(); } return c10::GetCPUAllocator(); } +constexpr uint64_t storage_max() { + // int64_t and size_t are used somewhat inconsistently throughout ATen. + // To be safe, storage size calculations must fit in both types. + constexpr auto int64_max = static_cast( + std::numeric_limits::max()); + constexpr auto size_max = static_cast( + std::numeric_limits::max()); + return std::min(int64_max, size_max); +} + +} // namespace (anonymous) + +size_t computeStorageNbytesContiguous( + IntArrayRef sizes, + size_t itemsize_bytes, + size_t storage_offset + ) { + // Ignore overflow checks on mobile +#ifndef C10_MOBILE + uint64_t size = 1; + bool overflowed = c10::safe_multiplies_u64(sizes, &size); + overflowed |= c10::add_overflows(size, storage_offset, &size); + overflowed |= c10::mul_overflows(size, itemsize_bytes, &size); + overflowed |= size > storage_max(); + TORCH_CHECK(!overflowed, + "Storage size calculation overflowed with sizes=", sizes); + return static_cast(size); +#else + const auto numel = c10::multiply_integers(sizes); + return itemsize_bytes * (storage_offset + numel); +#endif +} + size_t computeStorageNbytes( IntArrayRef sizes, IntArrayRef strides, - size_t itemsize_bytes) { + size_t itemsize_bytes, + size_t storage_offset + ) { + // Ignore overflow checks on mobile +#ifndef C10_MOBILE // size of the underlying storage is 1 bigger than the offset // of the last element according to stride - size_t size = 1; + uint64_t size = storage_offset + 1; + bool overflowed = false; for (const auto i : c10::irange(sizes.size())) { - if(sizes[i] == 0) { + if (sizes[i] == 0) { return 0; } - size += strides[i]*(sizes[i]-1); + + uint64_t strided_size; + overflowed |= c10::mul_overflows(strides[i], sizes[i] - 1, &strided_size); + overflowed |= c10::add_overflows(size, strided_size, &size); } - return size * itemsize_bytes; + overflowed |= c10::mul_overflows(size, itemsize_bytes, &size); + overflowed |= size > storage_max(); + TORCH_CHECK(!overflowed, + "Storage size calculation overflowed with sizes=", + sizes, " and strides=", strides); + return static_cast(size); +#else + // size of the underlying storage is 1 bigger than the offset + // of the last element according to stride + uint64_t size = 1; + for (const auto i : c10::irange(sizes.size())) { + if (sizes[i] == 
0) { + return 0; + } + + size += strides[i] * (sizes[i] - 1); + } + return itemsize_bytes * (storage_offset + size); +#endif } TensorBase empty_generic( @@ -37,9 +99,8 @@ TensorBase empty_generic( c10::optional memory_format_opt) { at::detail::check_size_nonnegative(size); - int64_t nelements = c10::multiply_integers(size); caffe2::TypeMeta dtype = scalarTypeToTypeMeta(scalar_type); - int64_t size_bytes = nelements * dtype.itemsize(); + size_t size_bytes = computeStorageNbytesContiguous(size, dtype.itemsize()); auto storage_impl = c10::make_intrusive( c10::StorageImpl::use_byte_size_t(), size_bytes, @@ -73,7 +134,7 @@ TensorBase empty_strided_generic( at::detail::check_size_nonnegative(size); caffe2::TypeMeta dtype = scalarTypeToTypeMeta(scalar_type); - int64_t size_bytes = computeStorageNbytes(size, stride, dtype.itemsize()); + size_t size_bytes = computeStorageNbytes(size, stride, dtype.itemsize()); auto storage_impl = c10::make_intrusive( c10::StorageImpl::use_byte_size_t(), size_bytes, @@ -176,13 +237,11 @@ struct MetaAllocator final : public at::Allocator { static MetaAllocator g_meta_alloc; -at::Allocator* GetMetaAllocator() { - return &g_meta_alloc; -} +REGISTER_ALLOCATOR(kMeta, &g_meta_alloc); TensorBase empty_meta(IntArrayRef size, ScalarType dtype, c10::optional memory_format_opt) { - auto *allocator = GetMetaAllocator(); + auto *allocator = GetAllocator(kMeta); constexpr c10::DispatchKeySet meta_dks(c10::DispatchKey::Meta); return at::detail::empty_generic( size, allocator, meta_dks, dtype, memory_format_opt); @@ -222,7 +281,7 @@ TensorBase empty_meta( TensorBase empty_strided_meta(IntArrayRef size, IntArrayRef stride, ScalarType dtype) { - auto *allocator = GetMetaAllocator(); + auto *allocator = GetAllocator(kMeta); constexpr c10::DispatchKeySet meta_dks(c10::DispatchKey::Meta); return at::detail::empty_strided_generic( size, stride, allocator, meta_dks, dtype); diff --git a/aten/src/ATen/EmptyTensor.h b/aten/src/ATen/EmptyTensor.h index a49b3e909d6e80..895bcc8e177970 100644 --- a/aten/src/ATen/EmptyTensor.h +++ b/aten/src/ATen/EmptyTensor.h @@ -10,8 +10,11 @@ inline void check_size_nonnegative(IntArrayRef size) { } } +TORCH_API size_t computeStorageNbytesContiguous( + IntArrayRef sizes, size_t itemsize, size_t storage_offset=0); TORCH_API size_t computeStorageNbytes( - IntArrayRef sizes, IntArrayRef strides, size_t itemsize); + IntArrayRef sizes, IntArrayRef strides, + size_t itemsize, size_t storage_offset=0); TORCH_API TensorBase empty_generic( IntArrayRef size, diff --git a/aten/src/ATen/FunctionalTensorWrapper.cpp b/aten/src/ATen/FunctionalTensorWrapper.cpp index 5f99e377479866..13cc746246a774 100644 --- a/aten/src/ATen/FunctionalTensorWrapper.cpp +++ b/aten/src/ATen/FunctionalTensorWrapper.cpp @@ -322,6 +322,57 @@ void sync(const c10::List> t_list) { } } +bool isFunctionalTensor(const at::Tensor& tensor) { + return tensor.unsafeGetTensorImpl()->key_set().has(c10::DispatchKey::Functionalize); +} + +bool isFunctionalTensor(const c10::optional& t) { + if (t.has_value()) { + return isFunctionalTensor(*t); + } else { + return false; + } +} + +bool isFunctionalTensor(const c10::List& t_list) { + if (t_list.size() == 0) return false; + bool any_functional = isFunctionalTensor(t_list[0]); + for (const auto i : c10::irange(1, t_list.size())) { + auto curr_functional = isFunctionalTensor(t_list[i]); + TORCH_INTERNAL_ASSERT( + curr_functional == any_functional, + "Functionalization encountered a list of tensors where some are functional", + "and some are not, which is not currently 
unsupported."); + } + return any_functional; +} + +bool isFunctionalTensor(const c10::List>& t_list) { + if (t_list.size() == 0) return false; + bool any_functional = isFunctionalTensor(t_list[0]); + for (const auto i : c10::irange(1, t_list.size())) { + auto curr_functional = isFunctionalTensor(t_list[i]); + TORCH_INTERNAL_ASSERT( + curr_functional == any_functional, + "Functionalization encountered a list of tensors where some are functional", + "and some are not, which is not currently unsupported."); + } + return any_functional; +} + +bool isFunctionalTensor(const c10::ArrayRef t_list) { + if (t_list.size() == 0) return false; + bool any_functional = isFunctionalTensor(t_list[0]); + for (const auto i : c10::irange(1, t_list.size())) { + auto curr_functional = isFunctionalTensor(t_list[i]); + TORCH_INTERNAL_ASSERT( + curr_functional == any_functional, + "Functionalization encountered a list of tensors where some are functional", + "and some are not, which is not currently unsupported."); + } + return any_functional; +} + Tensor create_functional_tensor_with_view_meta(const at::Tensor& view_to_wrap, const at::Tensor& base, functionalization::ViewMeta meta, int64_t out_idx) { TORCH_INTERNAL_ASSERT(!at::functionalization::impl::isFunctionalTensor(view_to_wrap)); TORCH_INTERNAL_ASSERT(at::functionalization::impl::isFunctionalTensor(base)); diff --git a/aten/src/ATen/FunctionalTensorWrapper.h b/aten/src/ATen/FunctionalTensorWrapper.h index 1696b41f1543c7..1f0988c4a07b18 100644 --- a/aten/src/ATen/FunctionalTensorWrapper.h +++ b/aten/src/ATen/FunctionalTensorWrapper.h @@ -117,9 +117,11 @@ TORCH_API inline FunctionalTensorWrapper* unsafeGetFunctionalWrapper(const Tenso return functional_impl; } -TORCH_API inline bool isFunctionalTensor(const at::Tensor& tensor) { - return tensor.unsafeGetTensorImpl()->key_set().has(c10::DispatchKey::Functionalize); -} +TORCH_API bool isFunctionalTensor(const at::Tensor& tensor); +TORCH_API bool isFunctionalTensor(const c10::optional& t); +TORCH_API bool isFunctionalTensor(const c10::List& t_list); +TORCH_API bool isFunctionalTensor(const c10::List>& t_list); +TORCH_API bool isFunctionalTensor(const c10::ArrayRef t_list); TORCH_API Tensor to_functional_tensor(const Tensor& tensor); TORCH_API c10::List to_functional_tensor(const c10::List& t_list); diff --git a/aten/src/ATen/FunctionalizeFallbackKernel.cpp b/aten/src/ATen/FunctionalizeFallbackKernel.cpp index f130fc7cdbd4df..f63f4bdcd79912 100644 --- a/aten/src/ATen/FunctionalizeFallbackKernel.cpp +++ b/aten/src/ATen/FunctionalizeFallbackKernel.cpp @@ -12,23 +12,36 @@ namespace { const auto arguments_begin = stack->size() - num_arguments; auto arguments = torch::jit::last(stack, num_arguments); + auto any_functional_inputs = false; + auto any_tensor_inputs = false; for (uint64_t idx = 0; idx < num_arguments; ++idx) { const auto& ivalue = arguments[idx]; if (ivalue.isTensor()) { + any_tensor_inputs = true; auto t = ivalue.toTensor(); - at::functionalization::impl::sync(t); - auto t_new = c10::IValue(at::functionalization::impl::from_functional_tensor(t)); - (*stack)[arguments_begin + idx] = t_new; + if (at::functionalization::impl::isFunctionalTensor(t)) { + any_functional_inputs = true; + at::functionalization::impl::sync(t); + auto t_new = c10::IValue(at::functionalization::impl::from_functional_tensor(t)); + (*stack)[arguments_begin + idx] = t_new; + } } else if (ivalue.isTensorList()) { + any_tensor_inputs = true; auto tensors = ivalue.toTensorList(); - at::functionalization::impl::sync(tensors); - auto t_new 
= c10::IValue(at::functionalization::impl::from_functional_tensor(tensors)); - (*stack)[arguments_begin + idx] = t_new; + if (at::functionalization::impl::isFunctionalTensor(tensors)) { + any_functional_inputs = true; + at::functionalization::impl::sync(tensors); + auto t_new = c10::IValue(at::functionalization::impl::from_functional_tensor(tensors)); + (*stack)[arguments_begin + idx] = t_new; + } } } + // we should wrap the output if any inputs were wrapped, + // OR if we're hitting a factory function (with no tensor inputs) + auto should_wrap_outputs = !any_tensor_inputs || any_functional_inputs; { at::AutoDispatchSkipFunctionalize guard; - op.redispatchBoxed(dispatchKeySet & c10::after_func_keyset, stack); + op.callBoxed(stack); } const auto num_returns = schema.returns().size(); const auto returns_begin = stack->size() - num_returns; @@ -36,11 +49,11 @@ namespace { for (const auto idx : c10::irange(num_returns)) { const auto& ivalue = returns[idx]; - if (ivalue.isTensor()) { + if (ivalue.isTensor() && should_wrap_outputs) { auto t = ivalue.toTensor(); auto t_new = c10::IValue(at::functionalization::impl::to_functional_tensor(t)); (*stack)[returns_begin + idx] = t_new; - } else if (ivalue.isTensorList()) { + } else if (ivalue.isTensorList() && should_wrap_outputs) { auto tensors = ivalue.toTensorList(); auto t_new = c10::IValue(at::functionalization::impl::to_functional_tensor(tensors)); (*stack)[returns_begin + idx] = t_new; diff --git a/aten/src/ATen/NestedTensorImpl.cpp b/aten/src/ATen/NestedTensorImpl.cpp index 51e93fc86c5d19..a7b6d97b2cee31 100644 --- a/aten/src/ATen/NestedTensorImpl.cpp +++ b/aten/src/ATen/NestedTensorImpl.cpp @@ -30,6 +30,7 @@ NestedTensorImpl::NestedTensorImpl( key_set_ = key_set_ - c10::DispatchKeySet({c10::DispatchKey::ADInplaceOrView}); refresh_dim(); + set_sizes_customization_policy(CustomizableMethodPolicy::NotSupported); } void NestedTensorImpl::refresh_dim() { @@ -38,5 +39,8 @@ void NestedTensorImpl::refresh_dim() { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(dim() == my_dim); } +const char* NestedTensorImpl::tensorimpl_type_name() const { + return "NestedTensorImpl"; +} } // namespace native } // namespace at diff --git a/aten/src/ATen/NestedTensorImpl.h b/aten/src/ATen/NestedTensorImpl.h index 4598a45c3c44fe..5b7757e66dd363 100644 --- a/aten/src/ATen/NestedTensorImpl.h +++ b/aten/src/ATen/NestedTensorImpl.h @@ -29,7 +29,7 @@ struct NestedTensorImpl : public c10::TensorImpl { // TODO: don't expose private implementation details like this; in // particular, resizing this tensor will mess up our dim() and // callers cannot fix it. - const Tensor& get_nested_size_tensor() { + const Tensor& get_nested_size_tensor() const { return nested_size_tensor_; } #ifndef C10_DISABLE_TENSORIMPL_EXTENSIBILITY @@ -53,6 +53,9 @@ struct NestedTensorImpl : public c10::TensorImpl { return buffer_; } + protected: + const char* tensorimpl_type_name() const override; + private: // Must be called after any changes to our dim() to sync the state // to TensorImpl. 
@@ -62,5 +65,29 @@ struct NestedTensorImpl : public c10::TensorImpl { const at::Tensor nested_size_tensor_; }; +inline NestedTensorImpl* get_nested_tensor_impl_or_null(const at::Tensor& tensor) { + if (tensor.is_nested()) { + return static_cast(tensor.unsafeGetTensorImpl()); + } + return nullptr; +} + +inline NestedTensorImpl* get_nested_tensor_impl( + const at::Tensor& tensor) { + TORCH_CHECK( + tensor.is_nested(), + "get_nested_tensor_impl requires a NestedTensor."); + return static_cast( + tensor.unsafeGetTensorImpl()); +} + + +// TODO: real implementation once we support strides. +inline bool nested_tensor_impl_is_contiguous( + const NestedTensorImpl* nt, + at::MemoryFormat memory_format = MemoryFormat::Contiguous) { + return memory_format == MemoryFormat::Contiguous; +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/OpMathType.h b/aten/src/ATen/OpMathType.h index b58d4779ac7a47..7b8ad97d3150ab 100644 --- a/aten/src/ATen/OpMathType.h +++ b/aten/src/ATen/OpMathType.h @@ -1,7 +1,9 @@ #pragma once +#include #include #include +#include namespace at { @@ -13,4 +15,21 @@ template<> struct OpMathType { using type = float; }; template using opmath_type = typename OpMathType::type; +namespace { + +c10::ScalarType toOpMathType(const c10::ScalarType type) { + switch (type) { +#define DEFINE_CASE(scalar_t, TypeNum) \ + case ScalarType::TypeNum: \ + return CppTypeToScalarType>::value; + + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(DEFINE_CASE) +#undef DEFINE_CASE + + default: TORCH_INTERNAL_ASSERT(false, "Unrecognized ScalarType: ", type); + } +} + +} + } // namespace at diff --git a/aten/src/ATen/PythonTorchFunctionTLS.cpp b/aten/src/ATen/PythonTorchFunctionTLS.cpp new file mode 100644 index 00000000000000..ae9f722de60ac6 --- /dev/null +++ b/aten/src/ATen/PythonTorchFunctionTLS.cpp @@ -0,0 +1,38 @@ +#include +#include + +namespace at { +namespace impl { + +static thread_local PythonTorchFunctionTLS pythonTorchFunctionState; + +void PythonTorchFunctionTLS::set_mode(std::shared_ptr mode) { + pythonTorchFunctionState.mode_ = std::move(mode); +} + +const std::shared_ptr& PythonTorchFunctionTLS::get_mode() { + return pythonTorchFunctionState.mode_; +} + +void PythonTorchFunctionTLS::swap_mode(std::shared_ptr& mode) { + pythonTorchFunctionState.mode_.swap(mode); +} + +void PythonTorchFunctionTLS::set_disabled(bool disabled) { + pythonTorchFunctionState.disabled_ = disabled; +} + +bool PythonTorchFunctionTLS::is_disabled() { + return pythonTorchFunctionState.disabled_; +} + +void PythonTorchFunctionTLS::set_state(const PythonTorchFunctionTLS& state) { + pythonTorchFunctionState = state; +} + +const PythonTorchFunctionTLS& PythonTorchFunctionTLS::get_state() { + return pythonTorchFunctionState; +} + +} // namespace impl +} // namespace at diff --git a/aten/src/ATen/PythonTorchFunctionTLS.h b/aten/src/ATen/PythonTorchFunctionTLS.h new file mode 100644 index 00000000000000..64256d2f7c21d4 --- /dev/null +++ b/aten/src/ATen/PythonTorchFunctionTLS.h @@ -0,0 +1,26 @@ +#pragma once + +#include +#include + +namespace at { +namespace impl { + +struct TORCH_API PythonTorchFunctionTLS { + static void set_disabled(bool); + static bool is_disabled(); + + static void set_mode(std::shared_ptr); + static const std::shared_ptr& get_mode(); + static void swap_mode(std::shared_ptr&); + + static void set_state(const PythonTorchFunctionTLS& state); + static const PythonTorchFunctionTLS& get_state(); + +private: + bool disabled_; + std::shared_ptr mode_; +}; + +} // namespace impl +} // 
namespace at diff --git a/aten/src/ATen/ScalarOps.cpp b/aten/src/ATen/ScalarOps.cpp index 8eb10266d78fe7..98a38023f9b4f1 100644 --- a/aten/src/ATen/ScalarOps.cpp +++ b/aten/src/ATen/ScalarOps.cpp @@ -15,8 +15,8 @@ inline void fill_inplace(Tensor& self, const Scalar& value_scalar) { namespace detail { Tensor& scalar_fill(Tensor& self, const Scalar& value) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - kHalf, kBool, kBFloat16, self.scalar_type(), "fill_out", [&]() { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kComplexHalf, kHalf, kBool, kBFloat16, self.scalar_type(), "fill_out", [&]() { fill_inplace(self, value); }); return self; diff --git a/aten/src/ATen/SparseCsrTensorImpl.cpp b/aten/src/ATen/SparseCsrTensorImpl.cpp index 2029189912e6b2..e2f565b6efef12 100644 --- a/aten/src/ATen/SparseCsrTensorImpl.cpp +++ b/aten/src/ATen/SparseCsrTensorImpl.cpp @@ -57,20 +57,31 @@ SparseCsrTensorImpl::SparseCsrTensorImpl( col_indices_(std::move(col_indices)), values_(std::move(values)) { set_storage_access_should_throw(); + is_non_overlapping_and_dense_ = false; + set_has_contiguity_policy(HasContiguityPolicy::ContiguityNotSupported); +} + +const char* SparseCsrTensorImpl::tensorimpl_type_name() const { + return "SparseCsrTensorImpl"; } void SparseCsrTensorImpl::resize_(int64_t nnz, IntArrayRef size) { - auto rows = size[0]; - auto cols = size[1]; + auto rows = size[size.size() - 2]; + auto cols = size[size.size() - 1]; auto old_crow_indices_size = crow_indices_.size(-1); - crow_indices_.resize_({rows + 1}); + + auto new_crow_indices_size = DimVector(size.slice(0, size.size() - 2)); + new_crow_indices_size.push_back(rows + 1); + crow_indices_.resize_(new_crow_indices_size); if (rows + 1 >= old_crow_indices_size) { crow_indices_.narrow(-1, old_crow_indices_size, rows + 1 - old_crow_indices_size).fill_(nnz); } else { crow_indices_.narrow(-1, rows, 1).fill_(std::min(nnz, rows*cols)); } - col_indices_.resize_({std::min(nnz, rows*cols)}); - values_.resize_({std::min(nnz, rows*cols)}); + auto col_indices_values_size = DimVector(size.slice(0, size.size() - 2)); + col_indices_values_size.push_back(std::min(nnz, rows*cols)); + col_indices_.resize_(col_indices_values_size); + values_.resize_(col_indices_values_size); sizes_and_strides_.set_sizes(size); } @@ -113,4 +124,21 @@ void SparseCsrTensorImpl::set_member_tensors( sizes_and_strides_.set_sizes(size); refresh_numel(); } + +IntArrayRef SparseCsrTensorImpl::strides() const { + TORCH_CHECK(false, "Sparse CSR tensors do not have strides."); +} +int64_t SparseCsrTensorImpl::stride(int64_t d) const { + TORCH_CHECK(false, "Sparse CSR tensors do not have strides."); +} +void SparseCsrTensorImpl::set_size(int64_t dim, int64_t new_size) { + TORCH_CHECK(false, "Sparse CSR tensors do not have set_size."); +} +void SparseCsrTensorImpl::set_stride(int64_t dim, int64_t new_stride) { + TORCH_CHECK(false, "Sparse CSR tensors do not have set_stride."); +} +void SparseCsrTensorImpl::set_storage_offset(int64_t storage_offset) { + TORCH_CHECK(false, "Sparse CSR tensors do not have set_storage_offset."); +} + } // namespace at diff --git a/aten/src/ATen/SparseCsrTensorImpl.h b/aten/src/ATen/SparseCsrTensorImpl.h index 850e0a02a44857..ea308f6891a2c0 100644 --- a/aten/src/ATen/SparseCsrTensorImpl.h +++ b/aten/src/ATen/SparseCsrTensorImpl.h @@ -43,7 +43,13 @@ struct TORCH_API SparseCsrTensorImpl : public TensorImpl { const Tensor& crow_indices() const { return crow_indices_; } const Tensor& col_indices() const { return col_indices_; } const Tensor& values() const { return values_; } - 
int nnz() { return values_.size(0); } + int nnz() { return col_indices_.size(-1); } + + IntArrayRef strides() const override; + int64_t stride(int64_t d) const override; + void set_size(int64_t dim, int64_t new_size) override; + void set_stride(int64_t dim, int64_t new_stride) override; + void set_storage_offset(int64_t storage_offset) override; /** * Return a TensorImpl that is a shallow-copy of this TensorImpl. @@ -91,6 +97,8 @@ struct TORCH_API SparseCsrTensorImpl : public TensorImpl { at::Tensor col_indices, at::Tensor values); + const char* tensorimpl_type_name() const override; + /** * Copy the tensor metadata fields (e.g. sizes / strides / storage pointer / storage_offset) * from one TensorImpl to another TensorImpl. diff --git a/aten/src/ATen/SparseTensorUtils.cpp b/aten/src/ATen/SparseTensorUtils.cpp index d5811b933e7ca5..712e85e851be91 100644 --- a/aten/src/ATen/SparseTensorUtils.cpp +++ b/aten/src/ATen/SparseTensorUtils.cpp @@ -30,7 +30,7 @@ Tensor flatten_indices(const Tensor& indices, IntArrayRef full_size, bool force_ } } else { std::vector indices_mult_cpu_vec; - indices_mult_cpu_vec.reserve(sparse_dim); + indices_mult_cpu_vec.resize(sparse_dim); int64_t mult = 1; for (int64_t i = sparse_dim - 1; i >= 0; i--) { indices_mult_cpu_vec[i] = mult; diff --git a/aten/src/ATen/TensorIterator.cpp b/aten/src/ATen/TensorIterator.cpp index 6c9e03d044ef98..f79dd3066b78ee 100644 --- a/aten/src/ATen/TensorIterator.cpp +++ b/aten/src/ATen/TensorIterator.cpp @@ -745,7 +745,7 @@ void TensorIteratorBase::for_each(loop2d_t loop, int64_t grain_size) { int64_t numel = this->numel(); if (numel == 0) { return; - } else if (numel < grain_size || at::get_num_threads() == 1) { + } else if (numel < internal::GRAIN_SIZE || at::get_num_threads() == 1) { return serial_for_each(loop, {0, numel}); } else { at::parallel_for(0, numel, grain_size, [&](int64_t begin, int64_t end) { @@ -1493,8 +1493,10 @@ void TensorIteratorBase::build(TensorIteratorConfig& config) { // Nothing beyond this point is important for meta functions, so it's fine to exit early here. // Extend the condition to ORT tesnors as ORT tensors also don't have storage. 
if (common_device_.type() == DeviceType::XLA || + common_device_.type() == DeviceType::IPU || common_device_.type() == DeviceType::Lazy || - common_device_.type() == DeviceType::ORT) return; + common_device_.type() == DeviceType::ORT || + common_device_.type() == DeviceType::HPU) return; for (auto& op : operands_) { TORCH_INTERNAL_ASSERT(op.tensor_base().defined()); diff --git a/aten/src/ATen/TensorSubclassLikeUtils.h b/aten/src/ATen/TensorSubclassLikeUtils.h index 7f5517bc08114a..e9f5e7d26e112c 100644 --- a/aten/src/ATen/TensorSubclassLikeUtils.h +++ b/aten/src/ATen/TensorSubclassLikeUtils.h @@ -28,8 +28,7 @@ constexpr auto kFunctorchWrappedTensors = DispatchKeySet({ constexpr auto kTensorSubclassLike = kFunctorchWrappedTensors | DispatchKeySet({ DispatchKey::Batched, - DispatchKey::SparseCPU, - DispatchKey::SparseCUDA, + DispatchKey::Sparse, DispatchKey::SparseCsrCPU, DispatchKey::SparseCsrCUDA, DispatchKey::Meta, diff --git a/aten/src/ATen/ThreadLocalState.cpp b/aten/src/ATen/ThreadLocalState.cpp index 3e3d4d6a957371..fdbd8b1699ba6f 100644 --- a/aten/src/ATen/ThreadLocalState.cpp +++ b/aten/src/ATen/ThreadLocalState.cpp @@ -13,7 +13,8 @@ ThreadLocalState::ThreadLocalState() : dispatch_key_(c10::impl::tls_local_dispatch_key_set()), debug_info_(c10::ThreadLocalDebugInfo::current()), functorch_tls_(functorch::getCopyOfFuncTorchTLS()), - autograd_tls_(c10::AutogradState::get_tls_state()) { + autograd_tls_(c10::AutogradState::get_tls_state()), + python_torch_function_state_(at::impl::PythonTorchFunctionTLS::get_state()) { rf_tls_ = at::get_record_function_tls_(); saved_tensors_default_hooks_ = at::SavedTensorDefaultHooks::get_stack(); @@ -35,6 +36,8 @@ void ThreadLocalState::setThreadLocalState( at::impl::PythonModeTLS::set_state(state.python_mode_state_); + at::impl::PythonTorchFunctionTLS::set_state(state.python_torch_function_state_); + at::set_record_function_tls_(state.rf_tls_); at::SavedTensorDefaultHooks::set_stack(state.saved_tensors_default_hooks_); diff --git a/aten/src/ATen/ThreadLocalState.h b/aten/src/ATen/ThreadLocalState.h index c5f14518f42281..7599c16ad4c802 100644 --- a/aten/src/ATen/ThreadLocalState.h +++ b/aten/src/ATen/ThreadLocalState.h @@ -10,6 +10,7 @@ #include #include #include +#include namespace at { @@ -53,7 +54,11 @@ class TORCH_API ThreadLocalState { // TLS for AutogradModes AutogradState autograd_tls_; - std::shared_ptr python_mode_state_; + // TLS for enable_python_mode (__torch_dispatch__) + std::shared_ptr python_mode_state_; + + // TLS for __torch_function__ (mode and disable_torch_function) + at::impl::PythonTorchFunctionTLS python_torch_function_state_; // TLS for saved tensors default hooks std::stack> saved_tensors_default_hooks_; diff --git a/aten/src/ATen/autocast_mode.cpp b/aten/src/ATen/autocast_mode.cpp index bd9da6a4593502..d2c2232cc6d4be 100644 --- a/aten/src/ATen/autocast_mode.cpp +++ b/aten/src/ATen/autocast_mode.cpp @@ -325,6 +325,7 @@ TORCH_LIBRARY_IMPL(aten, Autocast, m) { KERNEL(ADD_NS(addmv), "addmv", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) KERNEL(ADD_NS(addr), "addr", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) KERNEL(ADD_NS(matmul), "matmul", Tensor (const Tensor &, const Tensor &), lower_precision_fp) + KERNEL(ADD_NS(einsum), "einsum", Tensor (c10::string_view, TensorList), lower_precision_fp) KERNEL(ADD_NS(mm), "mm", Tensor (const Tensor &, const Tensor &), lower_precision_fp) KERNEL(ADD_NS(mv), "mv", Tensor 
(const Tensor &, const Tensor &), lower_precision_fp) KERNEL(ADD_NS(linear), "linear", Tensor (const Tensor &, const Tensor &, const c10::optional&), lower_precision_fp) @@ -487,23 +488,23 @@ TORCH_LIBRARY_IMPL(aten, AutocastCPU, m) { KERNEL_CPU(ADD_NS(avg_pool3d), "avg_pool3d", Tensor (const Tensor &, IntArrayRef, IntArrayRef, IntArrayRef, bool, bool, c10::optional), fp32) KERNEL_CPU(ADD_NS(gelu), "gelu", Tensor (const Tensor &, c10::string_view), fp32) KERNEL_CPU(ADD_NS(upsample_nearest1d), "upsample_nearest1d", Tensor (const Tensor &, IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(upsample_nearest1d), "upsample_nearest1d.vec", Tensor (const Tensor &, c10::optional, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(upsample_nearest1d), "upsample_nearest1d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, c10::optional>), fp32) KERNEL_CPU(ADD_NS(_upsample_nearest_exact1d), "_upsample_nearest_exact1d", Tensor (const Tensor &, IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(_upsample_nearest_exact1d), "_upsample_nearest_exact1d.vec", Tensor (const Tensor &, c10::optional, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(_upsample_nearest_exact1d), "_upsample_nearest_exact1d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, c10::optional>), fp32) KERNEL_CPU(ADD_NS(upsample_nearest2d), "upsample_nearest2d", Tensor (const Tensor &, IntArrayRef, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(upsample_nearest2d), "upsample_nearest2d.vec", Tensor (const Tensor &, c10::optional, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(upsample_nearest2d), "upsample_nearest2d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, c10::optional>), fp32) KERNEL_CPU(ADD_NS(_upsample_nearest_exact2d), "_upsample_nearest_exact2d", Tensor (const Tensor &, IntArrayRef, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(_upsample_nearest_exact2d), "_upsample_nearest_exact2d.vec", Tensor (const Tensor &, c10::optional, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(_upsample_nearest_exact2d), "_upsample_nearest_exact2d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, c10::optional>), fp32) KERNEL_CPU(ADD_NS(upsample_nearest3d), "upsample_nearest3d", Tensor (const Tensor &, IntArrayRef, c10::optional, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(upsample_nearest3d), "upsample_nearest3d.vec", Tensor (const Tensor &, c10::optional, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(upsample_nearest3d), "upsample_nearest3d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, c10::optional>), fp32) KERNEL_CPU(ADD_NS(_upsample_nearest_exact3d), "_upsample_nearest_exact3d", Tensor (const Tensor &, IntArrayRef, c10::optional, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(_upsample_nearest_exact3d), "_upsample_nearest_exact3d.vec", Tensor (const Tensor &, c10::optional, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(_upsample_nearest_exact3d), "_upsample_nearest_exact3d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, c10::optional>), fp32) KERNEL_CPU(ADD_NS(upsample_linear1d), "upsample_linear1d", Tensor (const Tensor &, IntArrayRef, bool, c10::optional), fp32) - KERNEL_CPU(ADD_NS(upsample_linear1d), "upsample_linear1d.vec", Tensor (const Tensor &, c10::optional, bool, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(upsample_linear1d), "upsample_linear1d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, bool, c10::optional>), fp32) KERNEL_CPU(ADD_NS(upsample_bilinear2d), "upsample_bilinear2d", Tensor (const Tensor &, IntArrayRef, bool, c10::optional, c10::optional), fp32) - 
KERNEL_CPU(ADD_NS(upsample_bilinear2d), "upsample_bilinear2d.vec", Tensor (const Tensor &, c10::optional, bool, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(upsample_bilinear2d), "upsample_bilinear2d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, bool, c10::optional>), fp32) KERNEL_CPU(ADD_NS(upsample_trilinear3d), "upsample_trilinear3d", Tensor (const Tensor &, IntArrayRef, bool, c10::optional, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(upsample_trilinear3d), "upsample_trilinear3d.vec", Tensor (const Tensor &, c10::optional, bool, c10::optional>), fp32) + KERNEL_CPU(ADD_NS(upsample_trilinear3d), "upsample_trilinear3d.vec", Tensor (const Tensor &, at::OptionalIntArrayRef, bool, c10::optional>), fp32) KERNEL_CPU(ADD_NS(binary_cross_entropy), "binary_cross_entropy", Tensor (const Tensor &, const Tensor &, const c10::optional&, int64_t), fp32) KERNEL_CPU(ADD_NS(binary_cross_entropy_with_logits), "binary_cross_entropy_with_logits", Tensor (const Tensor &, const Tensor &, const c10::optional&, const c10::optional&, int64_t), fp32) @@ -522,6 +523,7 @@ TORCH_LIBRARY_IMPL(aten, AutocastCPU, m) { KERNEL_CPU(ADD_NS(nanquantile), "nanquantile", Tensor(const Tensor &, const Tensor &, c10::optional, bool, c10::string_view), fp32) KERNEL_CPU(ADD_NS(nanquantile), "nanquantile.scalar", Tensor(const Tensor &, double, c10::optional, bool, c10::string_view), fp32) KERNEL_CPU(ADD_NS(stft), "stft", Tensor(const Tensor &, int64_t, c10::optional, c10::optional, const c10::optional &, bool, c10::optional, c10::optional), fp32) + KERNEL_CPU(ADD_NS(stft), "stft.center", Tensor(const Tensor &, int64_t, c10::optional, c10::optional, const c10::optional &, bool, c10::string_view, bool, c10::optional, c10::optional), fp32) KERNEL_CPU(ADD_NS(cdist), "cdist", Tensor(const Tensor &, const Tensor &, double, c10::optional), fp32) KERNEL_CPU(ADD_NS(cross), "cross", Tensor(const Tensor &, const Tensor &, c10::optional), fp32) KERNEL_CPU(ADD_NS(cumprod), "cumprod", Tensor(const Tensor &, int64_t, c10::optional), fp32) @@ -580,16 +582,16 @@ TORCH_LIBRARY_IMPL(aten, AutocastCPU, m) { KERNEL_CPU(ADD_NS(multilabel_margin_loss), "multilabel_margin_loss", Tensor(const Tensor &, const Tensor &, int64_t), fp32) KERNEL_CPU(ADD_NS(fft_fft), "fft_fft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) KERNEL_CPU(ADD_NS(fft_ifft), "fft_ifft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_fft2), "fft_fft2", Tensor(const Tensor &, c10::optional, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_ifft2), "fft_ifft2", Tensor(const Tensor &, c10::optional, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_fftn), "fft_fftn", Tensor(const Tensor &, c10::optional, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_ifftn), "fft_ifftn", Tensor(const Tensor &, c10::optional, c10::optional, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_fft2), "fft_fft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_ifft2), "fft_ifft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_fftn), "fft_fftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_ifftn), "fft_ifftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) KERNEL_CPU(ADD_NS(fft_rfft), "fft_rfft", Tensor(const Tensor &, c10::optional, int64_t, 
c10::optional), fp32) KERNEL_CPU(ADD_NS(fft_irfft), "fft_irfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_rfft2), "fft_rfft2", Tensor(const Tensor &, c10::optional, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_irfft2), "fft_irfft2", Tensor(const Tensor &, c10::optional, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_rfftn), "fft_rfftn", Tensor(const Tensor &, c10::optional, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_irfftn), "fft_irfftn", Tensor(const Tensor &, c10::optional, c10::optional, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_rfft2), "fft_rfft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_irfft2), "fft_irfft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_rfftn), "fft_rfftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) + KERNEL_CPU(ADD_NS(fft_irfftn), "fft_irfftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) KERNEL_CPU(ADD_NS(fft_hfft), "fft_hfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) KERNEL_CPU(ADD_NS(fft_ihfft), "fft_ihfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) KERNEL_CPU(ADD_NS(conv_tbc), "conv_tbc", Tensor(const Tensor &, const Tensor &, const Tensor &, int64_t), fp32) @@ -607,7 +609,7 @@ TORCH_LIBRARY_IMPL(aten, AutocastCPU, m) { KERNEL_CPU(ADD_NS(linalg_inv), "linalg_inv", Tensor(const Tensor &), fp32) KERNEL_CPU(ADD_NS(linalg_householder_product), "linalg_householder_product", Tensor(const Tensor &, const Tensor &), fp32) KERNEL_CPU(ADD_NS(linalg_tensorinv), "linalg_tensorinv", Tensor(const Tensor &, int64_t), fp32) - KERNEL_CPU(ADD_NS(linalg_tensorsolve), "linalg_tensorsolve", Tensor(const Tensor &, const Tensor &, c10::optional), fp32) + KERNEL_CPU(ADD_NS(linalg_tensorsolve), "linalg_tensorsolve", Tensor(const Tensor &, const Tensor &, at::OptionalIntArrayRef), fp32) KERNEL_CPU(ADD_NS(fake_quantize_per_tensor_affine), "fake_quantize_per_tensor_affine", Tensor (const Tensor &, double, int64_t, int64_t, int64_t), fp32) KERNEL_CPU(ADD_NS(glu), "glu", Tensor (const Tensor &, int64_t), fp32) diff --git a/aten/src/ATen/core/Formatting.cpp b/aten/src/ATen/core/Formatting.cpp index f3122daf2cc6d6..832059ed198077 100644 --- a/aten/src/ATen/core/Formatting.cpp +++ b/aten/src/ATen/core/Formatting.cpp @@ -12,6 +12,28 @@ namespace c10 { std::ostream& operator<<(std::ostream & out, Backend b) { return out << toString(b); } + +std::ostream& operator<<(std::ostream & out, Scalar s) { + if (s.isFloatingPoint()) { + return out << s.toDouble(); + } + if (s.isComplex()) { + return out << s.toComplexDouble(); + } + if (s.isBoolean()) { + return out << (s.toBool() ? 
"true" : "false"); + } + if (s.isIntegral(false)) { + return out << s.toLong(); + } + throw std::logic_error("Unknown type in Scalar"); +} + +std::string toString(Scalar s) { + std::stringstream out; + out << s; + return out.str(); +} } namespace at { diff --git a/aten/src/ATen/core/Formatting.h b/aten/src/ATen/core/Formatting.h index 55cfe7b3bdf7e9..6dcfc6c7b3cd15 100644 --- a/aten/src/ATen/core/Formatting.h +++ b/aten/src/ATen/core/Formatting.h @@ -1,12 +1,15 @@ #pragma once -#include -#include #include +#include +#include +#include namespace c10 { TORCH_API std::ostream& operator<<(std::ostream& out, Backend b); +TORCH_API std::ostream& operator<<(std::ostream & out, Scalar s); +TORCH_API std::string toString(Scalar s); } namespace at { @@ -19,21 +22,4 @@ static inline std::ostream& operator<<(std::ostream & out, const Tensor & t) { return print(out,t,80); } TORCH_API void print(const Tensor & t, int64_t linesize=80); - -static inline std::ostream& operator<<(std::ostream & out, Scalar s) { - if (s.isFloatingPoint()) { - return out << s.toDouble(); - } - if (s.isComplex()) { - return out << s.toComplexDouble(); - } - if (s.isBoolean()) { - return out << (s.toBool() ? "true" : "false"); - } - if (s.isIntegral(false)) { - return out << s.toLong(); - } - throw std::logic_error("Unknown type in Scalar"); -} - } diff --git a/aten/src/ATen/core/ITensorListRef.h b/aten/src/ATen/core/ITensorListRef.h new file mode 100644 index 00000000000000..aaa128b7f2e5ee --- /dev/null +++ b/aten/src/ATen/core/ITensorListRef.h @@ -0,0 +1,445 @@ +#pragma once + +#include +#include +#include + +#include +#include +#include +#include + +namespace at { +class Tensor; +} + +namespace c10 { +class ITensorListRef; +class ITensorListRefIterator; + +// Applies arbitrary macros to each `ITensorListRefTag`. +#define TORCH_ITENSORLISTREF_FORALL_TAGS(_, ...) \ + _(Unboxed, ##__VA_ARGS__) \ + _(Boxed, ##__VA_ARGS__) + +// Builds the name of the implementation class for `TAG`. +#define TORCH_ITENSORLISTREF_IMPL(TAG) \ + c10::detail::ITensorListRefTagImpl + +// Defines a "switch-case" for `TAG`. Inside, it executes `BODY`, +// while bringing to scope: +// - `ImplT`: the implementation class for `TAG` +// - `this_`: the result of unwrapping `this` +#define TORCH_ITENSORLISTREF_UNWRAP_CASE(TAG, BODY) \ + case c10::ITensorListRefTag::TAG: { \ + using ImplT = TORCH_ITENSORLISTREF_IMPL(TAG); \ + auto& this_ = ImplT::unwrap(*this); \ + BODY \ + } break; + +// Dispatches the unwrap call, depending on `TAG`, followed by +// the execution of `BODY`. It aborts if `TAG` is not a `ITensorListRefTag`. +#define TORCH_ITENSORLISTREF_UNWRAP(TAG, BODY) \ + switch (TAG) { \ + TORCH_ITENSORLISTREF_FORALL_TAGS(TORCH_ITENSORLISTREF_UNWRAP_CASE, BODY) \ + default: \ + TORCH_INTERNAL_ASSERT(false, "invalid ITensorListRef tag."); \ + } + +enum class ITensorListRefTag { +#define DEFINE_TAG(tag, ...) tag, + TORCH_ITENSORLISTREF_FORALL_TAGS(DEFINE_TAG) +#undef DEFINE_TAG + None +}; + +namespace detail { +using ITensorListRefConstRef = + typename detail::ivalue_to_const_ref_overload_return::type; + +/* + * Interface that implements key functions for each `ITensorListRefTag` type. + * + * You should create an specialization of this class for each + * possible `ITensorListRefTag` type (except `None`). 
+ * + * Specializations of this class should, at least, define: + * - a type `list_type` + * - 1 function `unwrap` for getting the actual `list_type` + * - 2 functions `unwrap` (const and non-const overloads) for getting + * iterators of `list_type` + * - a function `iterator_get` + * + * See the examples below. + */ +template +class ITensorListRefTagImpl {}; + +template <> +class ITensorListRefTagImpl { + public: + using list_type = at::ArrayRef; + + // Unwraps an `ITensorListRef` into a const-ref of type `list_type`. + static const list_type& unwrap(const ITensorListRef& ilist); + + // Unwraps an `ITensorListRefIterator` into a (const) ref of type + // `list_type::const_iterator`. Has overload for const. + static list_type::const_iterator& unwrap(ITensorListRefIterator& it); + static const list_type::const_iterator& unwrap(const ITensorListRefIterator& it); + + // Accesses the element referenced by the unwrapped iterator `it`. + static ITensorListRefConstRef iterator_get(const list_type::const_iterator& it); +}; + +template <> +class ITensorListRefTagImpl { + public: + using list_type = List; + static const list_type& unwrap(const ITensorListRef& ilist); + static list_type::const_iterator& unwrap(ITensorListRefIterator& it); + static const list_type::const_iterator& unwrap(const ITensorListRefIterator& it); + static ITensorListRefConstRef iterator_get(const list_type::const_iterator& it); +}; +} // namespace detail + +/* + * Materialized list for `ITensorListRef`. + * + * Container that groups `Tensor` references together. This exchanges the + * overhead of every method call from `ITensorListRef` for a dynamic allocation. + * + * You should use this container instead of `ITensorListRef` if: + * + * - You are going to iterate the list of tensors more than once + * - You need to repeatedly access arbitrary elements (using `operator[]`) + */ +using MaterializedITensorListRef = + std::vector>; + +/* + * Wrapper around both boxed and unboxed iterators. + * + * Currently, a `std::bidirectional_iterator` that wraps those defined for + * each of the `ITensorListRefTag`. + * + * One should be able to use it as if it were the unwrapped iterators + * themselves. + * + * [Note: MSVC Iterator Debug] + * =========================== + * MSVC `vector::iterator` implementation (used in the boxed variant) + * makes it so this union's destructor, copy-constructor (assignment), and + * move-constructor (assignment) are implicitly deleted. + * + * Therefore, we need to define them explicitly as needed. Below is a list + * of the places where these are needed, and why: + * + * - `Payload` destructor: + * it is deleted only if the macro `_ITERATOR_DEBUG_LEVEL` is set to 2. + * + * - `ITensorListRefIterator` destructor: + * same as above. However, we also need to call the variant's + * destructor explicitly. + * + * - `ITensorListRefIterator` copy-constructor: + * it is deleted only if the macro `_ITERATOR_DEBUG_LEVEL` is different + * from 0. + */ +class ITensorListRefIterator + : public std::iterator< + std::bidirectional_iterator_tag, + detail::ITensorListRefConstRef, + ptrdiff_t, + std::add_pointer, + std::add_rvalue_reference> { + private: +#define DEFINE_FRIEND_CLASS(TAG, ...) 
friend class TORCH_ITENSORLISTREF_IMPL(TAG); + TORCH_ITENSORLISTREF_FORALL_TAGS(DEFINE_FRIEND_CLASS) +#undef DEFINE_FRIEND_CLASS + + using unboxed_iterator_type = + TORCH_ITENSORLISTREF_IMPL(Unboxed)::list_type::const_iterator; + using boxed_iterator_type = + TORCH_ITENSORLISTREF_IMPL(Boxed)::list_type::const_iterator; + + union Payload { + boxed_iterator_type boxed_iterator; + unboxed_iterator_type unboxed_iterator; + void* _init_ptr; + Payload() : _init_ptr(nullptr) {} +#if defined(_MSC_VER) && _ITERATOR_DEBUG_LEVEL == 2 + // See [Note: MSVC Iterator Debug] + ~Payload() {} +#endif + }; + + public: + ITensorListRefIterator() : tag_(ITensorListRefTag::None) {} + +#if defined(_MSC_VER) && _ITERATOR_DEBUG_LEVEL != 0 + // See [Note: MSVC Iterator Debug] + ITensorListRefIterator(const ITensorListRefIterator& iterator) + : tag_(iterator.tag_) { + switch (tag_) { + case ITensorListRefTag::Boxed: + payload_.boxed_iterator = iterator.payload_.boxed_iterator; + case ITensorListRefTag::Unboxed: + payload_.unboxed_iterator = iterator.payload_.unboxed_iterator; + default: + TORCH_INTERNAL_ASSERT(false, "invalid ITensorListRef tag."); + } + } +#endif + +#if defined(_MSC_VER) && _ITERATOR_DEBUG_LEVEL == 2 + // See [Note: MSVC Iterator Debug] + ~ITensorListRefIterator() { + switch (tag_) { + case ITensorListRefTag::Boxed: + payload_.boxed_iterator.~boxed_iterator_type(); + case ITensorListRefTag::Unboxed: + payload_.unboxed_iterator.~unboxed_iterator_type(); + default: + TORCH_INTERNAL_ASSERT(false, "invalid ITensorListRef tag."); + } + } +#endif + + ITensorListRefIterator(boxed_iterator_type boxed) : tag_(ITensorListRefTag::Boxed) { + payload_.boxed_iterator = boxed; + } + + ITensorListRefIterator(unboxed_iterator_type unboxed) + : tag_(ITensorListRefTag::Unboxed) { + payload_.unboxed_iterator = unboxed; + } + + detail::ITensorListRefConstRef operator*() const { + TORCH_ITENSORLISTREF_UNWRAP(tag_, { return ImplT::iterator_get(this_); }); + } + + ITensorListRefIterator& operator++() { + TORCH_ITENSORLISTREF_UNWRAP(tag_, { ++this_; }); + return *this; + } + + ITensorListRefIterator operator++(int) { + auto old = *this; + TORCH_ITENSORLISTREF_UNWRAP(tag_, { ++this_; }); + return old; + } + + ITensorListRefIterator& operator--() { + TORCH_ITENSORLISTREF_UNWRAP(tag_, { --this_; }); + return *this; + } + + ITensorListRefIterator operator--(int) { + auto old = *this; + TORCH_ITENSORLISTREF_UNWRAP(tag_, { --this_; }); + return old; + } + + bool operator==(const ITensorListRefIterator& rhs) const { + if (tag_ != rhs.tag_) { + return false; + } + TORCH_ITENSORLISTREF_UNWRAP(tag_, { + auto& rhs_it = ImplT::unwrap(rhs); + return this_ == rhs_it; + }); + } + + bool operator!=(const ITensorListRefIterator& rhs) const { + return !(*this == rhs); + } + + private: + Payload payload_; + ITensorListRefTag tag_; +}; + +/* + * [Note: ITensorListRef] + * Wrapper around boxed and unboxed API containers. + * + * Tagged union of both API containers: + * - `TensorList`, a.k.a. `ArrayRef` (the unboxed API container) + * - `List` (the boxed API container) + * + * This container wraps around these two, without incurring in extra overhead + * for converting from one to another. + * + * Note that `ITensorListRef` is a view type. Meaning that it won't own the + * tensors it holds. If you need it to last longer, make sure that there is + * actually a non-temporary list of tensors (e.g. `vector`) that owns + * them and outlives the `ITensorListRef` instance. 
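+ *
+ * A minimal illustrative sketch (hypothetical caller code, not taken from
+ * this patch): a function accepting an `ITensorListRef` can be handed either
+ * API container without a conversion copy, as long as the owning container
+ * outlives the call:
+ *
+ *   void consume(at::ITensorListRef tensors) {
+ *     for (const at::Tensor& t : tensors) {
+ *       // use `t` ...
+ *     }
+ *   }
+ *
+ *   std::vector<at::Tensor> owned = {at::zeros({2})};  // unboxed owner
+ *   consume(owned);                 // wrapped as an ArrayRef<Tensor> view
+ *   c10::List<at::Tensor> boxed(owned);                // boxed owner
+ *   consume(boxed);                 // wrapped as a reference to the List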
+ * + * (see https://github.com/pytorch/pytorch/issues/66328) + */ +class ITensorListRef { + private: +#define DEFINE_FRIEND_CLASS(TAG, ...) friend class TORCH_ITENSORLISTREF_IMPL(TAG); + TORCH_ITENSORLISTREF_FORALL_TAGS(DEFINE_FRIEND_CLASS) +#undef DEFINE_FRIEND_CLASS + + using unboxed_type = TORCH_ITENSORLISTREF_IMPL(Unboxed)::list_type; + using boxed_type = TORCH_ITENSORLISTREF_IMPL(Boxed)::list_type; + + union Payload { + const boxed_type* boxed; + unboxed_type unboxed; + Payload() : boxed(nullptr) {} + ~Payload() {}; + }; + + public: + using iterator = ITensorListRefIterator; + using const_iterator = ITensorListRefIterator; + using value_type = typename iterator::value_type; + + ITensorListRef() : tag_(ITensorListRefTag::None) {} + + ITensorListRef(const std::initializer_list& list) + : tag_(ITensorListRefTag::Unboxed) { + payload_.unboxed = at::ArrayRef(list); + } + + ITensorListRef(const boxed_type& boxed) : tag_(ITensorListRefTag::Boxed) { + payload_.boxed = &boxed; + } + + ITensorListRef(const unboxed_type& unboxed) : tag_(ITensorListRefTag::Unboxed) { + payload_.unboxed = unboxed; + } + + template < + typename... UnboxedConstructorArgs, + typename = std::enable_if_t< + std::is_constructible::value>> + ITensorListRef(UnboxedConstructorArgs&&... args) + : tag_(ITensorListRefTag::Unboxed) { + payload_.unboxed = unboxed_type(std::forward(args)...); + } + + size_t size() const { + TORCH_ITENSORLISTREF_UNWRAP(tag_, { return this_.size(); }); + } + + bool empty() const { + return size() == 0; + } + + iterator begin() const { + TORCH_ITENSORLISTREF_UNWRAP(tag_, { return this_.begin(); }); + } + + iterator end() const { + TORCH_ITENSORLISTREF_UNWRAP(tag_, { return this_.end(); }); + } + + MaterializedITensorListRef materialize() const { + MaterializedITensorListRef materialized; + materialized.reserve(size()); + for (const auto& t : *this) { + materialized.emplace_back(t); + } + return materialized; + } + +#define DEFINE_CHECK(TAG, ...) \ + bool is##TAG() const { \ + return tag_ == ITensorListRefTag::TAG; \ + } + TORCH_ITENSORLISTREF_FORALL_TAGS(DEFINE_CHECK); +#undef DEFINE_CHECK + + bool isNone() const { + return tag_ == ITensorListRefTag::None; + } + +#define DEFINE_CASTING(TAG, ...) 
\ + const typename TORCH_ITENSORLISTREF_IMPL(TAG)::list_type& to##TAG() const { \ + TORCH_INTERNAL_ASSERT(is##TAG()); \ + return TORCH_ITENSORLISTREF_IMPL(TAG)::unwrap(*this); \ + } + TORCH_ITENSORLISTREF_FORALL_TAGS(DEFINE_CASTING); +#undef DEFINE_CASTING + + private: + Payload payload_; + ITensorListRefTag tag_; +}; + +} // namespace c10 + +inline +const TORCH_ITENSORLISTREF_IMPL(Unboxed)::list_type& +TORCH_ITENSORLISTREF_IMPL(Unboxed)::unwrap( + const c10::ITensorListRef& ilist +) { + return ilist.payload_.unboxed; +} + +inline +TORCH_ITENSORLISTREF_IMPL(Unboxed)::list_type::const_iterator& +TORCH_ITENSORLISTREF_IMPL(Unboxed)::unwrap( + c10::ITensorListRefIterator& it +) { + return it.payload_.unboxed_iterator; +} + +inline +const TORCH_ITENSORLISTREF_IMPL(Unboxed)::list_type::const_iterator& +TORCH_ITENSORLISTREF_IMPL(Unboxed)::unwrap( + const c10::ITensorListRefIterator& it +) { + return it.payload_.unboxed_iterator; +} + +inline +c10::detail::ITensorListRefConstRef +TORCH_ITENSORLISTREF_IMPL(Unboxed)::iterator_get( + const list_type::const_iterator& it +) { + return *it; +} + +inline +const TORCH_ITENSORLISTREF_IMPL(Boxed)::list_type& +TORCH_ITENSORLISTREF_IMPL(Boxed)::unwrap( + const c10::ITensorListRef& ilist +) { + return *ilist.payload_.boxed; +} + +inline +TORCH_ITENSORLISTREF_IMPL(Boxed)::list_type::const_iterator& +TORCH_ITENSORLISTREF_IMPL(Boxed)::unwrap( + c10::ITensorListRefIterator& it +) { + return it.payload_.boxed_iterator; +} + +inline +const TORCH_ITENSORLISTREF_IMPL(Boxed)::list_type::const_iterator& +TORCH_ITENSORLISTREF_IMPL(Boxed)::unwrap( + const c10::ITensorListRefIterator& it +) { + return it.payload_.boxed_iterator; +} + +inline +c10::detail::ITensorListRefConstRef +TORCH_ITENSORLISTREF_IMPL(Boxed)::iterator_get( + const list_type::const_iterator& it +) { + return (*it).get().toTensor(); +} + +namespace at { +using ITensorListRef = c10::ITensorListRef; +using ITensorListRefIterator = c10::ITensorListRefIterator; +using MaterializedITensorListRef = c10::MaterializedITensorListRef; +} // namespace at diff --git a/aten/src/ATen/core/ITensorListRef_test.cpp b/aten/src/ATen/core/ITensorListRef_test.cpp new file mode 100644 index 00000000000000..679ccea5865ffa --- /dev/null +++ b/aten/src/ATen/core/ITensorListRef_test.cpp @@ -0,0 +1,188 @@ +#include +#include +#include + +using namespace c10; + +static std::vector get_tensor_vector() { + std::vector boxed; + const size_t SIZE = 5; + for (size_t i = 0; i < SIZE; i++) { + boxed.push_back(at::empty({0})); + } + return boxed; +} + +template +void check_elements_same(ITensorListRef list, const T& thing, int use_count) { + EXPECT_EQ(thing.size(), list.size()); + size_t i = 0; + for (const auto& t : list) { + const at::Tensor& other = thing[i]; + EXPECT_EQ(other.use_count(), use_count); + EXPECT_TRUE(other.is_same(t)); + i++; + } +} + +TEST(ITensorListRefTest, CtorEmpty_IsNone_Throws) { + ITensorListRef list; + EXPECT_TRUE(list.isNone()); + // NOLINTNEXTLINE(cppcoreguidelines-avoid-goto,hicpp-avoid-goto) + EXPECT_THROW(list.size(), c10::Error); +} + +TEST(ITensorListRefTest, CtorBoxed_IsBoxed) { + auto vec = get_tensor_vector(); + List boxed(vec); + ITensorListRef list(boxed); + EXPECT_TRUE(list.isBoxed()); +} + +TEST(ITensorListRefTest, CtorUnboxed_IsUnboxed) { + auto vec = get_tensor_vector(); + at::ArrayRef unboxed(vec); + ITensorListRef list(unboxed); + EXPECT_TRUE(list.isUnboxed()); +} + +TEST(ITensorListRefTest, CtorUnboxedIndirect_IsUnboxed) { + auto vec = get_tensor_vector(); + auto check_is_unboxed = 
[](ITensorListRef list) { + EXPECT_TRUE(list.isUnboxed()); + }; + check_is_unboxed(vec[0]); + check_is_unboxed({vec.data(), vec.size()}); + check_is_unboxed({&*vec.begin(), &*vec.end()}); + check_is_unboxed(vec); +} + +TEST(ITensorListRefTest, CtorTemp_IsUnboxed) { + auto check_is_unboxed = [](ITensorListRef list) { + EXPECT_TRUE(list.isUnboxed()); + }; + + auto vec = get_tensor_vector(); + check_is_unboxed({vec[0], vec[1]}); +} + +TEST(ITensorListRefTest, Boxed_GetConstRefTensor) { + auto vec = get_tensor_vector(); + // We need 'boxed' to be 'const' here (and some other tests below) + // because 'List::operator[]' returns a 'ListElementReference' + // instead of returning a 'Tensor'. On the other hand, + // 'List::operator[] const' returns a 'const Tensor &'. + const List boxed(vec); + ITensorListRef list(boxed); + static_assert( + std::is_same::value, + "Accessing elements from List through a ITensorListRef should be const references."); + EXPECT_TRUE(boxed[0].is_same(*list.begin())); + EXPECT_TRUE(boxed[1].is_same(*(++list.begin()))); +} + +TEST(ITensorListRefTest, Unboxed_GetConstRefTensor) { + auto vec = get_tensor_vector(); + ITensorListRef list(vec); + static_assert( + std::is_same::value, + "Accessing elements from ArrayRef through a ITensorListRef should be const references."); + EXPECT_TRUE(vec[0].is_same(*list.begin())); + EXPECT_TRUE(vec[1].is_same(*(++list.begin()))); +} + +TEST(ITensorListRefTest, Boxed_Equal) { + auto vec = get_tensor_vector(); + List boxed(vec); + check_elements_same(boxed, vec, /* use_count= */ 2); +} + +TEST(ITensorListRefTest, Unboxed_Equal) { + auto vec = get_tensor_vector(); + check_elements_same(at::ArrayRef(vec), vec, /* use_count= */ 1); +} + +TEST(ITensorListRefTest, UnboxedIndirect_Equal) { + auto vec = get_tensor_vector(); + check_elements_same(vec[0], std::vector{vec[0]}, /* use_count= */ 3); + check_elements_same({vec.data(), vec.size()}, vec, /* use_count= */ 1); + check_elements_same({&*vec.begin(), &*vec.end()}, vec, /* use_count= */ 1); + check_elements_same(vec, vec, /* use_count= */ 1); +} + +TEST(ITensorListRefTest, BoxedMaterialize_Equal) { + auto vec = get_tensor_vector(); + List boxed(vec); + ITensorListRef list(boxed); + auto materialized = list.materialize(); + check_elements_same(list, vec, 2); + check_elements_same(list, materialized, 2); +} + +TEST(ITensorListRefTest, UnboxedMaterialize_Equal) { + auto vec = get_tensor_vector(); + at::ArrayRef unboxed(vec); + ITensorListRef list(unboxed); + auto materialized = list.materialize(); + check_elements_same(list, vec, 1); + check_elements_same(list, materialized, 1); +} + +TEST(ITensorListRefIteratorTest, CtorEmpty_ThrowsError) { + ITensorListRefIterator it; + // NOLINTNEXTLINE(cppcoreguidelines-avoid-goto,hicpp-avoid-goto) + EXPECT_THROW(*it, c10::Error); +} + +TEST(ITensorListRefIteratorTest, Boxed_GetFirstElement) { + auto vec = get_tensor_vector(); + const List boxed(vec); + ITensorListRef list(boxed); + EXPECT_TRUE(boxed[0].is_same(*list.begin())); +} + +TEST(ITensorListRefIteratorTest, Unboxed_GetFirstElement) { + auto vec = get_tensor_vector(); + ITensorListRef list(vec); + EXPECT_TRUE(vec[0].is_same(*list.begin())); +} + +TEST(ITensorListRefIteratorTest, Boxed_Equality) { + auto vec = get_tensor_vector(); + List boxed(vec); + ITensorListRef list(boxed); + EXPECT_EQ(list.begin(), list.begin()); + EXPECT_NE(list.begin(), list.end()); + EXPECT_NE(list.end(), list.begin()); + EXPECT_EQ(list.end(), list.end()); +} + +TEST(ITensorListRefIteratorTest, Unboxed_Equality) { + auto vec = 
get_tensor_vector(); + ITensorListRef list(vec); + EXPECT_EQ(list.begin(), list.begin()); + EXPECT_NE(list.begin(), list.end()); + EXPECT_NE(list.end(), list.begin()); + EXPECT_EQ(list.end(), list.end()); +} + +TEST(ITensorListRefIteratorTest, Boxed_Iterate) { + auto vec = get_tensor_vector(); + const List boxed(vec); + ITensorListRef list(boxed); + size_t i = 0; + for (const auto& t : list) { + EXPECT_TRUE(boxed[i++].is_same(t)); + } + EXPECT_EQ(i, list.size()); +} + +TEST(ITensorListRefIteratorTest, Unboxed_Iterate) { + auto vec = get_tensor_vector(); + ITensorListRef list(vec); + size_t i = 0; + for (const auto& t : list) { + EXPECT_TRUE(vec[i++].is_same(t)); + } + EXPECT_EQ(i, list.size()); +} diff --git a/aten/src/ATen/core/List.h b/aten/src/ATen/core/List.h index b042fab24f7d8c..0785a6941affda 100644 --- a/aten/src/ATen/core/List.h +++ b/aten/src/ATen/core/List.h @@ -78,6 +78,10 @@ class ListElementReference final { // assigning another ref to this assigns the underlying value ListElementReference& operator=(ListElementReference&& rhs) &&; + const IValue& get() const& { + return *iterator_; + } + friend void swap(ListElementReference&& lhs, ListElementReference&& rhs); private: @@ -235,6 +239,7 @@ class List final { using value_type = T; using size_type = typename c10::detail::ListImpl::list_type::size_type; using iterator = impl::ListIterator; + using const_iterator = impl::ListIterator; using reverse_iterator = impl::ListIterator; /** diff --git a/aten/src/ATen/core/PythonFallbackKernel.cpp b/aten/src/ATen/core/PythonFallbackKernel.cpp index 37766077287b54..41becc56735496 100644 --- a/aten/src/ATen/core/PythonFallbackKernel.cpp +++ b/aten/src/ATen/core/PythonFallbackKernel.cpp @@ -1,28 +1,65 @@ -#include -#include #include +#include +#include +#include #include namespace { -// TLS saving the state of the include/exclude sets on entry to the dispatcher -// This is set in the pythonTLSSnapshot fallback and used by the Python fallback. -thread_local std::stack tls_on_entry; +// This TLS is used to track the state of the dispatcher to be able to restore +// it when calling back into python. +// It has the following invariant: +// - It must be empty while python code is executed. +// - It should only be set once even for multiple dispatcher calls that do not come +// back to python. +// To achieve this, we ensure that the tls is empty by default and emptied again both when +// we call into user torch_dispatch or returning back to python after this call. -struct StashTLSStateGuard { - public: - StashTLSStateGuard(const c10::impl::LocalDispatchKeySet& key_set) { - tls_on_entry.push(key_set); +thread_local c10::optional tls_on_entry; + +// RAII guard to make working with the above TLS safer. +struct MaybeSetTLSOnEntryGuard { +public: + MaybeSetTLSOnEntryGuard() { + if (tls_on_entry.has_value()) { + value_set_ = false; + } else { + value_set_ = true; + tls_on_entry = c10::impl::tls_local_dispatch_key_set(); + } } - ~StashTLSStateGuard() { - tls_on_entry.pop(); + ~MaybeSetTLSOnEntryGuard() { + if (value_set_) { + TORCH_INTERNAL_ASSERT(tls_on_entry.has_value()); + tls_on_entry = c10::nullopt; + } } + +private: + bool value_set_; +}; + +// This guard assumes that tls_on_entry has a value. 
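+// It is instantiated in `pythonFallback` (below) right before calling back
+// into Python, while `MaybeSetTLSOnEntryGuard` is instantiated in
+// `pythonTLSSnapshotFallback`, so the dispatcher TLS is captured at most once
+// per entry from non-Python code and is emptied again while user
+// __torch_dispatch__ code runs.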
+struct StashTLSOnEntryGuard { +public: + StashTLSOnEntryGuard(): saved_(tls_on_entry.value()) { + tls_on_entry = c10::nullopt; + } + + ~StashTLSOnEntryGuard() { + TORCH_INTERNAL_ASSERT(!tls_on_entry.has_value()); + tls_on_entry = saved_; + } + +private: + c10::impl::LocalDispatchKeySet saved_; }; void pythonFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { - TORCH_INTERNAL_ASSERT(tls_on_entry.size() > 0); - c10::impl::ForceDispatchKeyGuard guard(tls_on_entry.top()); + TORCH_INTERNAL_ASSERT(tls_on_entry.has_value()); + c10::impl::ForceDispatchKeyGuard dispatcher_guard(tls_on_entry.value()); + StashTLSOnEntryGuard stash_guard; // If Python Mode is active, use its PyInterpreter for dispatch const auto& maybe_python_mode_state = at::impl::PythonModeTLS::get_state(); @@ -63,10 +100,9 @@ void pythonFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { void pythonTLSSnapshotFallback(const c10::OperatorHandle& op, c10::DispatchKeySet dispatch_keys, torch::jit::Stack* stack) { // It is ok for the tls to be already set here. - // A CompositeImplicitAutograd function may have been called just before this and so the tls here were never cleared - // This is also why we don't need an RAII to ensure the tls is reset when exceptions happen - - StashTLSStateGuard guard(c10::impl::tls_local_dispatch_key_set()); + // It means that there are multiple calls into the dispatcher not originating from python code. + // The guard below will properly ignore such calls. + MaybeSetTLSOnEntryGuard guard; op.redispatchBoxed(dispatch_keys & c10::DispatchKeySet(c10::DispatchKeySet::FULL_AFTER, c10::DispatchKey::PythonTLSSnapshot), stack); } diff --git a/aten/src/ATen/core/PythonModeTLS.cpp b/aten/src/ATen/core/PythonModeTLS.cpp index 97892fcf5d3742..2382c77c220e40 100644 --- a/aten/src/ATen/core/PythonModeTLS.cpp +++ b/aten/src/ATen/core/PythonModeTLS.cpp @@ -1,25 +1,26 @@ #include +#include namespace at { namespace impl { -thread_local std::shared_ptr pythonModeState; +thread_local std::shared_ptr pythonModeState; -void PythonModeTLS::set_state(const std::shared_ptr& state) { - pythonModeState = state; +void PythonModeTLS::set_state(std::shared_ptr state) { if (state) { c10::impl::tls_set_dispatch_key_included(DispatchKey::Python, true); c10::impl::tls_set_dispatch_key_included(DispatchKey::PythonTLSSnapshot, true); } else { PythonModeTLS::reset_state(); } + pythonModeState = std::move(state); } -const std::shared_ptr& PythonModeTLS::get_state() { +const std::shared_ptr& PythonModeTLS::get_state() { return pythonModeState; } void PythonModeTLS::reset_state() { - pythonModeState.reset((TorchDispatchTypeObject*)nullptr); + pythonModeState.reset(); c10::impl::tls_set_dispatch_key_included(DispatchKey::Python, false); c10::impl::tls_set_dispatch_key_included(DispatchKey::PythonTLSSnapshot, false); } diff --git a/aten/src/ATen/core/PythonModeTLS.h b/aten/src/ATen/core/PythonModeTLS.h index be52b182c659b2..9794090de1715b 100644 --- a/aten/src/ATen/core/PythonModeTLS.h +++ b/aten/src/ATen/core/PythonModeTLS.h @@ -8,8 +8,8 @@ namespace at { namespace impl { struct TORCH_API PythonModeTLS { - static void set_state(const std::shared_ptr& state); - static const std::shared_ptr& get_state(); + static void set_state(std::shared_ptr state); + static const std::shared_ptr& get_state(); static void reset_state(); }; diff --git a/aten/src/ATen/core/QuantizerBase.h b/aten/src/ATen/core/QuantizerBase.h index e11d8d6e049c16..922ea8a38f50d0 100644 --- a/aten/src/ATen/core/QuantizerBase.h +++ 
b/aten/src/ATen/core/QuantizerBase.h @@ -55,7 +55,7 @@ struct TORCH_API Quantizer : public c10::intrusive_ptr_target { */ virtual QScheme qscheme() const = 0; - ScalarType scalar_type() { + ScalarType scalar_type() const { return scalar_type_; } @@ -77,7 +77,7 @@ struct TORCH_API Quantizer : public c10::intrusive_ptr_target { /** * Compare against `other` for equality. */ - virtual bool equalTo(QuantizerPtr other) = 0; + virtual bool equalTo(QuantizerPtr other) const = 0; }; } // namespace at diff --git a/aten/src/ATen/core/SymInt.h b/aten/src/ATen/core/SymInt.h new file mode 100644 index 00000000000000..5cebf357dbfd83 --- /dev/null +++ b/aten/src/ATen/core/SymInt.h @@ -0,0 +1,60 @@ +#pragma once + +#include +#include + +namespace c10 { + +// `SymInt` is a C++ wrapper class around int64_t data_ which is used to +// represent concrete dimension values. +// +// `SymInt` is also a data type in PyTorch that can be used in function schemas +// to enable tracing. +// +// `SymInt` is introduced to enable tracing arithmetic +// operations on symbolic integers (e.g. sizes). Tracing symbolic sizes will +// allow LTC and AOTAutograd to represent dynamic shapes in expression graphs +// faithfully without baking in concrete dimension values. +// +// To trace the operations, SymInt will overload arithmetic operators (e.g. +, -, *) +// and will provide overloads taking SymInt for commonly used math functions. +// +// SymInt will be extended to represent a union structure Union[int64_t, SymbolicIntNode*] +// which will be implemented as a single packed int64_t field named data_. +// +// data_ can be either a plain int64_t or (1 << 63 | `index`). `index` points to +// SymbolicIntNode* that will be responsible for constructing an IR node for +// a traced operation to represent it in LTC or Fx graphs. +class TORCH_API SymInt { + public: + SymInt(int64_t d): + data_(d) {}; + + int64_t expect_int() const { + // we are dealing with concrete ints only for now + return data_; + } + + bool is_symbolic() const { + return false; + } + + bool operator==(const SymInt& p2) const + { + return data_ == p2.data_; + } + + SymInt operator+(SymInt sci) const { + return data_ + sci.data_; + } + + int64_t data() const { + return data_; + } + + private: + int64_t data_; +}; + +TORCH_API std::ostream& operator<<(std::ostream& os, SymInt s); +} diff --git a/aten/src/ATen/core/TensorBase.h b/aten/src/ATen/core/TensorBase.h index 0af9513eaa57f7..0ba95383f4447b 100644 --- a/aten/src/ATen/core/TensorBase.h +++ b/aten/src/ATen/core/TensorBase.h @@ -156,15 +156,17 @@ class TORCH_API TensorBase { } int64_t size(int64_t dim) const { + const auto sizes = this->sizes(); + const auto ndim = static_cast(sizes.size()); // false is passed to maybe_wrap_dim so behavior is identical to array access (but with wrapping) - dim = c10::maybe_wrap_dim(dim, this->dim(), false); - return sizes()[dim]; + return sizes[c10::maybe_wrap_dim(dim, ndim, /*wrap_scalar=*/false)]; } int64_t stride(int64_t dim) const { + const auto strides = this->strides(); + const auto ndim = static_cast(strides.size()); // false is passed to maybe_wrap_dim so behavior is identical to array access (but with wrapping) - dim = c10::maybe_wrap_dim(dim, this->dim(), false); - return strides()[dim]; + return strides[c10::maybe_wrap_dim(dim, ndim, /*wrap_scalar=*/false)]; } TensorImpl * unsafeGetTensorImpl() const { @@ -370,6 +372,12 @@ class TORCH_API TensorBase { return impl_->is_cuda(); } + /// Returns if a `Tensor` has IPU backend. 
+ bool is_ipu() const { + // NB: this is not a native function to avoid dispatching overhead. + return impl_->is_ipu(); + } + /// Returns if a `Tensor` has XPU backend. bool is_xpu() const { // NB: this is not a native function to avoid dispatching overhead. @@ -462,6 +470,11 @@ class TORCH_API TensorBase { return impl_->is_inference(); } + // Returns if a `Tensor` is a NestedTensor. + bool is_nested() const { + return impl_->is_nested(); + } + /// If a tensor is a quantized tensor, returns its quantizer /// TODO: it's not in native_functions.yaml yet as it's not exposed to python QuantizerPtr quantizer() const; diff --git a/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h b/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h index f48246c02fd682..87c5c33bdeeabf 100644 --- a/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h +++ b/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h @@ -180,6 +180,13 @@ namespace impl { "You tried to register a kernel with an unsupported input type: ArrayRef. Please use List, List or Tensor instead."); }; + template + struct assert_is_valid_input_type, AllowDeprecatedTypes> + : assert_is_valid_input_type { + static_assert(!std::is_same::value, + "You tried to register a kernel with an unsupported input type: OptionalArrayRef. Please use List, List or Tensor instead."); + }; + template struct assert_is_valid_input_type, AllowDeprecatedTypes> : assert_is_valid_input_type { @@ -233,6 +240,10 @@ namespace impl { struct assert_is_valid_output_type, AllowDeprecatedTypes> : assert_is_valid_output_type {}; + template + struct assert_is_valid_output_type, AllowDeprecatedTypes> + : assert_is_valid_output_type {}; + template struct assert_is_valid_output_type, AllowDeprecatedTypes> : assert_is_valid_output_type { @@ -361,13 +372,24 @@ namespace impl { template struct ivalue_to_arg>, AllowDeprecatedTypes> final { // If an argument is optional>, convert the IValue to an optional> and pass that - // to the operator. OptionalArray is basically a optional> but impliticly convertible + // to the operator. OptionalArray is basically a optional> but implicitly convertible // to optional>. static OptionalArray call(IValue& v) { return ivalue_to_arg, AllowDeprecatedTypes>::call(v); } }; + template + struct ivalue_to_arg, AllowDeprecatedTypes> final { + // If an argument is OptionalArrayRef, convert the IValue to an + // optional> and pass that to the operator. 
OptionalArray + // is basically a optional> but implicitly convertible to + // OptionalArrayRef + static OptionalArray call(IValue& v) { + return ivalue_to_arg, AllowDeprecatedTypes>::call(v); + } + }; + // return_to_ivalue template struct return_to_ivalue final {}; diff --git a/aten/src/ATen/core/builtin_function.h b/aten/src/ATen/core/builtin_function.h index 3c6fd0c77cadf1..6f1e9e75ea3e29 100644 --- a/aten/src/ATen/core/builtin_function.h +++ b/aten/src/ATen/core/builtin_function.h @@ -62,7 +62,7 @@ struct BuiltinOpFunction : public Function { return *this; } - bool call(Stack& stack, size_t, c10::function_ref) override { + bool call(Stack& stack, c10::optional, c10::function_ref) override { run(stack); return false; } diff --git a/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp b/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp index a930edc2db6328..9180d0d19e6449 100644 --- a/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp +++ b/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp @@ -6,11 +6,52 @@ namespace c10 { void DispatchKeyExtractor::setOperatorHasFallthroughForKey(DispatchKey k, bool has_fallthrough) { + // (1) update nonFallthroughKeys_ if (has_fallthrough) { nonFallthroughKeys_ = nonFallthroughKeys_.remove(k); } else { nonFallthroughKeys_ = nonFallthroughKeys_.add(k); } + // (2) update nonFallthroughKeysPerBackend_ + if (isPerBackendFunctionalityKey(toFunctionalityKey(k))) { + // This is a per-backend functionality key. + // We need to figure out what the current backend is, + // and only update the bitset for that backend. + // subtracting 1 because the first backend should have index 0 (CPU), + // But the enum starts with BackendComponent::InvalidBit. + auto backend_idx = static_cast(toBackendComponent(k)) - 1; + TORCH_INTERNAL_ASSERT(backend_idx >= 0 && static_cast(backend_idx) < nonFallthroughKeysPerBackend_.size()); + if (has_fallthrough) { + nonFallthroughKeysPerBackend_[backend_idx] = nonFallthroughKeysPerBackend_[backend_idx].remove(k); + } else { + nonFallthroughKeysPerBackend_[backend_idx] = nonFallthroughKeysPerBackend_[backend_idx].add(k); + } + + // Set requiresBitsetPerBackend_ accordingly + for (const auto i : c10::irange(nonFallthroughKeysPerBackend_.size() - 1)) { + if (nonFallthroughKeysPerBackend_[i] != nonFallthroughKeysPerBackend_[i+1]) { + requiresBitsetPerBackend_ = true; + return; + } + } + requiresBitsetPerBackend_ = false; + return; + } else { + // Otherwise, if a fallthrough is set for a functionality that isn't per backend, + // Then we update the fallthrough bitset for EVERY backend. 
+ // TODO: we could probably optimize this by only lazily updating these values + // the first time that we see requiresBitsetPerBackend_ = true + // (which should almost never happen) + if (has_fallthrough) { + for (const auto i : c10::irange(nonFallthroughKeysPerBackend_.size())) { + nonFallthroughKeysPerBackend_[i] = nonFallthroughKeysPerBackend_[i].remove(k); + } + } else { + for (const auto i : c10::irange(nonFallthroughKeysPerBackend_.size())) { + nonFallthroughKeysPerBackend_[i] = nonFallthroughKeysPerBackend_[i].add(k); + } + } + } } std::string DispatchKeyExtractor::dumpState() const { diff --git a/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h b/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h index 53e348d6b99ea9..d5345b28e7149f 100644 --- a/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h +++ b/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h @@ -156,14 +156,24 @@ struct TORCH_API DispatchKeyExtractor final { } }); // Keys that are fallthrough should be skipped - return impl::computeDispatchKeySet(ks, nonFallthroughKeys_); + if (requiresBitsetPerBackend_) { + auto backend_idx = ks.getBackendIndex(); + return impl::computeDispatchKeySet(ks, nonFallthroughKeysPerBackend_[backend_idx]); + } else { + return impl::computeDispatchKeySet(ks, nonFallthroughKeys_); + } } template DispatchKeySet getDispatchKeySetUnboxed(const Args&... args) const { auto ks = detail::multi_dispatch_key_set(args...); // Keys that are fallthrough should be skipped - return impl::computeDispatchKeySet(ks, nonFallthroughKeys_); + if (requiresBitsetPerBackend_) { + auto backend_idx = ks.getBackendIndex(); + return impl::computeDispatchKeySet(ks, nonFallthroughKeysPerBackend_[backend_idx]); + } else { + return impl::computeDispatchKeySet(ks, nonFallthroughKeys_); + } } void setOperatorHasFallthroughForKey(DispatchKey k, bool has_fallthrough); @@ -193,7 +203,12 @@ struct TORCH_API DispatchKeyExtractor final { explicit DispatchKeyExtractor(c10::utils::bitset dispatch_arg_indices_reverse) : dispatch_arg_indices_reverse_(dispatch_arg_indices_reverse) - , nonFallthroughKeys_(DispatchKeySet::FULL) {} + , nonFallthroughKeys_(DispatchKeySet::FULL) + , requiresBitsetPerBackend_(false) { + for (const auto i : c10::irange(nonFallthroughKeysPerBackend_.size())) { + nonFallthroughKeysPerBackend_[i] = DispatchKeySet::FULL; + } + } // this is a bitset that has ones for each argument index which has to be // considered for dispatch. This avoids having to iterate over the stack @@ -205,8 +220,14 @@ struct TORCH_API DispatchKeyExtractor final { // fallthrough c10::utils::bitset dispatch_arg_indices_reverse_; - // Set of keys for which the operator does NOT have fallthrough kernel. + // Set of functionality keys for which the operator does NOT have fallthrough kernel. DispatchKeySet nonFallthroughKeys_; + // Set of functionality keys for which the operator does NOT have fallthrough kernel, defined PER BACKEND. + // This is only needed if we know that the operator has a different set of fallthroughs defined for some backends. 
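  // Illustrative example (not from the patch): if an operator registers a
  // fallthrough for DispatchKey::AutogradCUDA but a real kernel for
  // DispatchKey::AutogradCPU, the Autograd bit must be skipped for CUDA
  // arguments only; the per-backend bitsets then differ and
  // requiresBitsetPerBackend_ is set to true.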
+ std::array nonFallthroughKeysPerBackend_; + // Flag to tell us if we can use the single set of nonFallthroughKeys_ (fast path), + // or if we need to fall back to the slower path and check nonFallthroughKeysPerBackend_ + bool requiresBitsetPerBackend_; }; } diff --git a/aten/src/ATen/core/dispatch/Dispatcher.cpp b/aten/src/ATen/core/dispatch/Dispatcher.cpp index 3dccc4645a824c..86960634e46133 100644 --- a/aten/src/ATen/core/dispatch/Dispatcher.cpp +++ b/aten/src/ATen/core/dispatch/Dispatcher.cpp @@ -267,14 +267,16 @@ void Dispatcher::cleanup(const OperatorHandle& op, const OperatorName& op_name) RegistrationHandleRAII Dispatcher::registerFallback(DispatchKey dispatchKey, KernelFunction kernel, std::string debug) { std::lock_guard lock(mutex_); + auto idx = getDispatchTableIndexForDispatchKey(dispatchKey); + TORCH_CHECK(idx >= 0 && static_cast(idx) < backendFallbackKernels_.size(), "idx=", idx); TORCH_CHECK( - !backendFallbackKernels_[static_cast(dispatchKey)].kernel.isValid(), + !backendFallbackKernels_[idx].kernel.isValid(), "Tried to register multiple backend fallbacks for the same dispatch key ", dispatchKey, "; previous registration ", - backendFallbackKernels_[static_cast(dispatchKey)].debug, ", new registration ", debug + backendFallbackKernels_[idx].debug, ", new registration ", debug ); // NB: inferred function schema is always nullptr for fallbacks, as fallbacks // cannot be unobxed - backendFallbackKernels_[static_cast(dispatchKey)] = impl::AnnotatedKernel(std::move(kernel), nullptr, std::move(debug)); + backendFallbackKernels_[idx] = impl::AnnotatedKernel(std::move(kernel), nullptr, std::move(debug)); for (auto& op : operators_) { op.op.updateFallback(*this, dispatchKey); @@ -288,7 +290,8 @@ RegistrationHandleRAII Dispatcher::registerFallback(DispatchKey dispatchKey, Ker void Dispatcher::deregisterFallback_(DispatchKey dispatchKey) { std::lock_guard lock(mutex_); - backendFallbackKernels_[static_cast(dispatchKey)] = {}; + auto idx = getDispatchTableIndexForDispatchKey(dispatchKey); + backendFallbackKernels_[idx] = {}; for (auto& op : operators_) { op.op.updateFallback(*this, dispatchKey); diff --git a/aten/src/ATen/core/dispatch/Dispatcher.h b/aten/src/ATen/core/dispatch/Dispatcher.h index 14ffa2f94c9c8c..8108c3c1928b81 100644 --- a/aten/src/ATen/core/dispatch/Dispatcher.h +++ b/aten/src/ATen/core/dispatch/Dispatcher.h @@ -291,7 +291,7 @@ class TORCH_API Dispatcher final { // Map from namespace to debug string (saying, e.g., where the library was defined) ska::flat_hash_map libraries_; - std::array(DispatchKey::NumDispatchKeys)> backendFallbackKernels_; + std::array backendFallbackKernels_; std::unique_ptr listeners_; std::mutex mutex_; @@ -531,8 +531,7 @@ C10_DISPATCHER_INLINE_UNLESS_MOBILE Return Dispatcher::call(const TypedOperatorH detail::unused_arg_(args...); // workaround for a false-positive warning about unused parameters in gcc 5 auto dispatchKeySet = op.operatorDef_->op.dispatchKeyExtractor() .template getDispatchKeySetUnboxed(args...); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(!c10::isAliasDispatchKey(dispatchKeySet.highestPriorityTypeId())); - const KernelFunction& kernel = op.operatorDef_->op.lookup(dispatchKeySet.highestPriorityTypeId()); + const KernelFunction& kernel = op.operatorDef_->op.lookup(dispatchKeySet); #ifndef PYTORCH_DISABLE_PER_OP_PROFILING // By default, when there're no high-frequency or non-sampled callbacks, // RecordFunction is pre-sampled as a perf optimization; @@ -553,7 +552,7 @@ template inline Return Dispatcher::redispatch(const 
TypedOperatorHandle& op, DispatchKeySet currentDispatchKeySet, Args... args) const { detail::unused_arg_(args...); // workaround for a false-positive warning about unused parameters in gcc 5 // do not use RecordFunction on redispatch - const KernelFunction& kernel = op.operatorDef_->op.lookup(currentDispatchKeySet.highestPriorityTypeId()); + const KernelFunction& kernel = op.operatorDef_->op.lookup(currentDispatchKeySet); return kernel.template call(op, currentDispatchKeySet, std::forward(args)...); } @@ -561,7 +560,7 @@ inline void Dispatcher::callBoxed(const OperatorHandle& op, Stack* stack) const // note: this doesn't need the mutex because write operations on the list keep iterators intact. const auto& entry = op.operatorDef_->op; auto dispatchKeySet = entry.dispatchKeyExtractor().getDispatchKeySetBoxed(stack); - const auto& kernel = entry.lookup(dispatchKeySet.highestPriorityTypeId()); + const auto& kernel = entry.lookup(dispatchKeySet); #ifndef PYTORCH_DISABLE_PER_OP_PROFILING bool pre_sampled = false; if (C10_UNLIKELY(at::shouldRunRecordFunction(&pre_sampled))) { @@ -593,7 +592,7 @@ inline void Dispatcher::callBoxed(const OperatorHandle& op, Stack* stack) const inline void Dispatcher::redispatchBoxed(const OperatorHandle& op, DispatchKeySet dispatchKeySet, Stack* stack) const { // note: this doesn't need the mutex because write operations on the list keep iterators intact. const auto& entry = op.operatorDef_->op; - const auto& kernel = entry.lookup(dispatchKeySet.highestPriorityTypeId()); + const auto& kernel = entry.lookup(dispatchKeySet); return kernel.callBoxed(op, dispatchKeySet, stack); } diff --git a/aten/src/ATen/core/dispatch/ObservedOperators.cpp b/aten/src/ATen/core/dispatch/ObservedOperators.cpp index 1d1ed4c1926a48..65545a221f9cb8 100644 --- a/aten/src/ATen/core/dispatch/ObservedOperators.cpp +++ b/aten/src/ATen/core/dispatch/ObservedOperators.cpp @@ -15,6 +15,7 @@ std::unordered_set& ObservedOperators::getUnobservedOperatorList() "aten::_version", "aten::is_complex", "profiler::_record_function_enter", + "profiler::_record_function_enter_new", "profiler::_record_function_exit", }; return not_observed_ops; diff --git a/aten/src/ATen/core/dispatch/OperatorEntry.cpp b/aten/src/ATen/core/dispatch/OperatorEntry.cpp index d4d997fde69aef..d5cc6d45933fa2 100644 --- a/aten/src/ATen/core/dispatch/OperatorEntry.cpp +++ b/aten/src/ATen/core/dispatch/OperatorEntry.cpp @@ -283,7 +283,10 @@ std::pair OperatorEntry::computeDispatchTab } // 3. Backend fallback - auto dispatch_ix = static_cast(dispatch_key); + auto dispatch_ix = getDispatchTableIndexForDispatchKey(dispatch_key); + if (dispatch_ix < 0) { + return {missingKernel(), "backend fallback not registered on mobile"}; + } if (dispatcher.backendFallbackKernels_[dispatch_ix].kernel.isValid()) { return {dispatcher.backendFallbackKernels_[dispatch_ix], "backend fallback"}; } @@ -299,7 +302,7 @@ std::pair OperatorEntry::computeDispatchTab // or alias keys and their associated keysets). 
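Editor's note: several hunks above and below replace static_cast indexing with getDispatchTableIndexForDispatchKey, which returns -1 for keys that get no slot in this build (notably trimmed mobile builds). A rough standalone sketch of that sparse-key-to-dense-index pattern, with hypothetical key names rather than the real c10 mapping:

#include <array>

enum class Key { CPU, CUDA, XLA, Autograd };

// Only a subset of keys gets a slot in this (hypothetical) build's dispatch table.
inline int table_index_for(Key k) {
  switch (k) {
    case Key::CPU:      return 0;
    case Key::Autograd: return 1;
    default:            return -1;   // the key exists, but this build never dispatches on it
  }
}

template <typename Kernel>
const Kernel* lookup(const std::array<Kernel, 2>& table, Key k) {
  const int idx = table_index_for(k);
  if (idx < 0) {
    return nullptr;   // caller reports e.g. "backend fallback not registered on mobile"
  }
  return &table[idx];
}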
// This function should be considered a private helper for updateDispatchTable_() void OperatorEntry::updateDispatchTableEntry_(const c10::Dispatcher& dispatcher, DispatchKey dispatch_key) { - const auto dispatch_ix = c10::getDispatchTableIndexForDispatchKey(dispatch_key); + const auto dispatch_ix = getDispatchTableIndexForDispatchKey(dispatch_key); if (C10_UNLIKELY(dispatch_ix == -1)) { return; } @@ -329,8 +332,12 @@ void OperatorEntry::updateDispatchTable_(const c10::Dispatcher& dispatcher, Disp } // Note [Refresh Runtime Autograd entries in dispatchTable_] // Registering to backend key might affect computed entry at its Autograd backend key due to (2.1) & (2.3). + // In theory, we should only have to check if the given runtime key has "dense" functionality, + // e.g. DispatchKey::CPU (which is composed of DispatchKey::Dense and BackendComponent::CPUBit). + // However, there are some backends that should be included in this set that don't have the dense key set. + // E.g. DispatchKey::Meta, DispatchKey::ORT. if (c10::isBackendDispatchKey(dispatch_key)) { - DispatchKey autograd_key = getAutogradKeyFromBackend(dispatch_key); + DispatchKey autograd_key = getAutogradKeyFromBackend(toBackendComponent(dispatch_key)); updateDispatchTableEntry_(dispatcher, autograd_key); } } @@ -357,8 +364,9 @@ void OperatorEntry::updateDispatchTableFull_(const c10::Dispatcher& dispatcher) // catchAll. After catchAllKernel_ is removed, Undefined now can get a kernel from either CompositeExplicitAutograd // or CompositeImplicitAutograd alias key so that we don't break the support. Ideally isIncludedInAlias(Undefined, CompositeImplicitAutograd) // should return true, it returns false because Undefined cannot be represented in a DispatchKeySet. - for (uint8_t iter = 0; iter != static_cast(DispatchKey::NumDispatchKeys); ++iter) { - updateDispatchTable_(dispatcher, static_cast(iter)); + updateDispatchTable_(dispatcher, DispatchKey::Undefined); + for (auto k : DispatchKeySet(DispatchKeySet::FULL)) { + updateDispatchTable_(dispatcher, k); } } @@ -371,9 +379,13 @@ void OperatorEntry::checkInvariants() const { for (const auto& kv : kernels_) { TORCH_INTERNAL_ASSERT(kv.second.size() > 0, dumpState()); } - for (uint8_t iter = 0; iter != static_cast(DispatchKey::NumDispatchKeys); ++iter) { - auto expected_k = computeDispatchTableEntry(c10::Dispatcher::singleton(), static_cast(iter)); - TORCH_INTERNAL_ASSERT(expected_k._equalsBoxedAndUnboxed(dispatchTable_[iter]), + for (auto k : DispatchKeySet(DispatchKeySet::FULL)) { + auto expected_k = computeDispatchTableEntry(c10::Dispatcher::singleton(), k); + auto idx = getDispatchTableIndexForDispatchKey(k); + if (C10_UNLIKELY(idx == -1)) { + continue; + } + TORCH_INTERNAL_ASSERT(expected_k._equalsBoxedAndUnboxed(dispatchTable_[idx]), "Canonical state\n~~~~~~~~~~~\n", dumpState(), "\n\n" "Computed table:\n~~~~~~~~~~~\n", dumpComputedTable()); } @@ -384,8 +396,9 @@ std::string OperatorEntry::listAllDispatchKeys() const { str << "["; bool has_kernels = false; - for (uint8_t iter = 0; iter != static_cast(DispatchKey::NumDispatchKeys); ++iter) { - if (!dispatchTable_[iter].isValid()) { + for (auto k : DispatchKeySet(DispatchKeySet::FULL)) { + auto iter = getDispatchTableIndexForDispatchKey(k); + if (iter == -1 || !dispatchTable_[iter].isValid()) { continue; } if (has_kernels) { @@ -443,8 +456,12 @@ void OperatorEntry::reportError(DispatchKey dispatchKey) const { // updateDispatchTableFull_ would update the dispatch table to be) std::string OperatorEntry::dumpComputedTable() const { 
std::ostringstream oss; - for (uint8_t i = 0; i < static_cast(DispatchKey::NumDispatchKeys); i++) { - auto k = static_cast(i); + // Need to handle Undefined separately, because its a runtime key that can't be represented + // in a DispatchKeySet. + std::vector runtime_keys = {DispatchKey::Undefined}; + for (auto k : DispatchKeySet(DispatchKeySet::FULL)) runtime_keys.push_back(k); + + for (auto k : runtime_keys) { auto kernel_prov = computeDispatchTableEntryWithDebug(c10::Dispatcher::singleton(), k); if (kernel_prov.first.kernel.isValid()) { oss << toString(k) << ": " diff --git a/aten/src/ATen/core/dispatch/OperatorEntry.h b/aten/src/ATen/core/dispatch/OperatorEntry.h index d98bd6bc69041a..c0f90808280a8e 100644 --- a/aten/src/ATen/core/dispatch/OperatorEntry.h +++ b/aten/src/ATen/core/dispatch/OperatorEntry.h @@ -173,10 +173,10 @@ class TORCH_API OperatorEntry final { [[noreturn]] void reportError(DispatchKey dispatchKey) const; - const KernelFunction& lookup(DispatchKey k) const { - const auto idx = getDispatchTableIndexForDispatchKey(k); + const KernelFunction& lookup(DispatchKeySet ks) const { + const auto idx = ks.getDispatchTableIndexForDispatchKeySet(); if (C10_UNLIKELY(idx == -1)) { - reportError(k); + reportError(ks.highestPriorityTypeId()); } const auto& kernel = dispatchTable_[idx]; // A valid kernel *always* has a boxed kernel and *may* have an @@ -187,7 +187,7 @@ class TORCH_API OperatorEntry final { // in the common case. if (C10_UNLIKELY(!kernel.isValidUnboxed())) { if (!kernel.isValid()) { - reportError(k); + reportError(ks.highestPriorityTypeId()); } } return kernel; @@ -211,7 +211,7 @@ class TORCH_API OperatorEntry final { OperatorName name_; c10::optional schema_; - std::array dispatchTable_; + std::array dispatchTable_; DispatchKeyExtractor dispatchKeyExtractor_; // kernels_ stores all registered kernels for the corresponding dispatch key diff --git a/aten/src/ATen/core/dynamic_type.cpp b/aten/src/ATen/core/dynamic_type.cpp index 95050da593eb01..051b859d98158a 100644 --- a/aten/src/ATen/core/dynamic_type.cpp +++ b/aten/src/ATen/core/dynamic_type.cpp @@ -227,6 +227,8 @@ TypePtr DynamicType::fallback() const { return BoolType::get(); case Tag::Int: return IntType::get(); + case Tag::SymInt: + return SymIntType::get(); case Tag::Float: return FloatType::get(); case Tag::Complex: @@ -320,6 +322,8 @@ DynamicType::Ptr IValue::TagType::get(const c10::IValue& v) { return DynamicTypeTrait::getBaseType(); case Tag::Int: return DynamicTypeTrait::getBaseType(); + case Tag::SymInt: + return DynamicTypeTrait::getBaseType(); case Tag::Bool: return DynamicTypeTrait::getBaseType(); case Tag::String: diff --git a/aten/src/ATen/core/dynamic_type.h b/aten/src/ATen/core/dynamic_type.h index d5551c9a5e511c..7be10d810e42a1 100644 --- a/aten/src/ATen/core/dynamic_type.h +++ b/aten/src/ATen/core/dynamic_type.h @@ -16,6 +16,7 @@ constexpr DynamicTypeBits kDynamicAnyTypeBit = DYNAMIC_TYPE_BIT(30); constexpr DynamicTypeBits kDynamicNoneTypeBit = DYNAMIC_TYPE_BIT(1); constexpr DynamicTypeBits kDynamicIntTypeBit = DYNAMIC_TYPE_BIT(3); +constexpr DynamicTypeBits kDynamicSymIntTypeBit = DYNAMIC_TYPE_BIT(23); constexpr DynamicTypeBits kDynamicFloatTypeBit = DYNAMIC_TYPE_BIT(4); constexpr DynamicTypeBits kDynamicComplexTypeBit = DYNAMIC_TYPE_BIT(5); constexpr DynamicTypeBits kDynamicListTypeBit = DYNAMIC_TYPE_BIT(7); @@ -28,6 +29,7 @@ constexpr DynamicTypeBits kDynamicClassTypeBit = DYNAMIC_TYPE_BIT(10); _(Bool, DYNAMIC_TYPE_BIT(2), 1) \ _(Int, kDynamicIntTypeBit, 1) \ _(Float, kDynamicFloatTypeBit, 1) \ 
+ _(SymInt, kDynamicSymIntTypeBit, 1) \ _(Complex, kDynamicComplexTypeBit, 1) \ _(Number, \ (kDynamicIntTypeBit | kDynamicFloatTypeBit | kDynamicComplexTypeBit), \ @@ -159,7 +161,7 @@ class DynamicType : public SharedType { const Arguments& arguments() const { return arguments_; } - TypeKind dynamicKind() const; + TORCH_API TypeKind dynamicKind() const; // Should be used only on the server side to restore static type information. #ifndef C10_MOBILE diff --git a/aten/src/ATen/core/function.h b/aten/src/ATen/core/function.h index b0c02041affcbb..881efb1a4ff046 100644 --- a/aten/src/ATen/core/function.h +++ b/aten/src/ATen/core/function.h @@ -90,7 +90,7 @@ struct TORCH_API Function { // call() returns false. // Overload for server interpreter, a bailout size is needed for graph executor. - virtual bool call(Stack&, size_t, c10::function_ref) { + virtual bool call(Stack&, c10::optional, c10::function_ref) { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(false); return false; } diff --git a/aten/src/ATen/core/interned_strings.h b/aten/src/ATen/core/interned_strings.h index 88f275093d1e93..46b43aecce2850 100644 --- a/aten/src/ATen/core/interned_strings.h +++ b/aten/src/ATen/core/interned_strings.h @@ -64,6 +64,8 @@ namespace c10 { _(prim, PadPacked) /* onnx */ \ _(prim, Placeholder) /* debug */ \ _(prim, Print) \ + _(prim, EmptyListLiteral) \ + _(prim, LegacyTypedConstructor) \ _(prim, PythonOp) \ _(prim, IgnoredPythonOp) \ _(prim, Reverse) \ @@ -107,7 +109,6 @@ namespace c10 { _(aten, Complex) \ _(aten, str) \ _(aten, Delete) \ - _(aten, gelu_) \ _(prim, device) \ _(prim, dtype) \ _(prim, layout) \ @@ -302,6 +303,7 @@ namespace c10 { _(attr, transA) \ _(attr, transB) \ _(attr, name) \ + _(attr, module) \ _(attr, beg) \ _(attr, idx) \ _(attr, split) \ diff --git a/aten/src/ATen/core/ivalue.cpp b/aten/src/ATen/core/ivalue.cpp index 85117e345e30fa..cd980e84df1698 100644 --- a/aten/src/ATen/core/ivalue.cpp +++ b/aten/src/ATen/core/ivalue.cpp @@ -91,6 +91,8 @@ c10::TypePtr IValue::TagType::get(const IValue& v) { return ComplexType::get(); case Tag::Int: return IntType::get(); + case Tag::SymInt: + return c10::SymIntType::get(); case Tag::Bool: return BoolType::get(); case Tag::String: @@ -298,6 +300,8 @@ IValue IValue::equals(const IValue& rhs) const { return rhs.isComplexDouble() && lhs.toComplexDouble() == rhs.toComplexDouble(); case Tag::Int: return rhs.isInt() && lhs.toInt() == rhs.toInt(); + case Tag::SymInt: + return rhs.isSymInt() && lhs.toSymInt() == rhs.toSymInt(); case Tag::Bool: return rhs.isBool() && lhs.toBool() == rhs.toBool(); case Tag::String: @@ -349,6 +353,8 @@ size_t IValue::hash(const IValue& v) { return c10::get_hash(v.payload.u.as_int); case Tag::Int: return c10::get_hash(v.payload.u.as_int); + case Tag::SymInt: + return c10::get_hash(v.payload.u.as_int); case Tag::String: return c10::get_hash(v.toStringRef()); case Tag::Tuple: @@ -567,6 +573,8 @@ std::ostream& IValue::repr( } case IValue::Tag::Int: return out << v.toInt(); + case IValue::Tag::SymInt: + return out << v.toSymInt(); case IValue::Tag::Bool: return out << (v.toBool() ? "True" : "False"); case IValue::Tag::Tuple: { @@ -753,6 +761,8 @@ std::ostream& operator<<(std::ostream & out, const IValue & v) { return printComplex(out, v); } case IValue::Tag::Int: return out << v.toInt(); + case IValue::Tag::SymInt: + return out << v.toSymInt(); case IValue::Tag::Bool: return out << (v.toBool() ? 
"True" : "False"); case IValue::Tag::Tuple: { @@ -886,6 +896,7 @@ IValue IValue::deepcopy( case IValue::Tag::None: case IValue::Tag::Double: case IValue::Tag::Int: + case IValue::Tag::SymInt: case IValue::Tag::Bool: case IValue::Tag::Device: case IValue::Tag::Uninitialized: { diff --git a/aten/src/ATen/core/ivalue.h b/aten/src/ATen/core/ivalue.h index 81867348450d48..dbb4f08739ff8c 100644 --- a/aten/src/ATen/core/ivalue.h +++ b/aten/src/ATen/core/ivalue.h @@ -92,12 +92,29 @@ struct OptionalArray { return *this; } + // Used when saving an argument for the backwards pass. + OptionalArray& operator=(c10::OptionalArrayRef ref) { + if (ref) { + list = std::vector(ref->begin(), ref->end()); + } else { + list = nullopt; + } + return *this; + } + operator c10::optional>() { if (!list) { return nullopt; } return *list; } + + operator c10::OptionalArrayRef() { + if (!list) { + return nullopt; + } + return *list; + } }; // Capsule is an internal implementation detail of custom C++ classes. We @@ -127,6 +144,7 @@ struct Capsule { _(Double) \ _(ComplexDouble) \ _(Int) \ + _(SymInt) \ _(Bool) \ _(Tuple) \ _(String) \ @@ -543,6 +561,18 @@ struct TORCH_API IValue final { payload.u.as_int = i; } + IValue(c10::SymInt i) : tag(Tag::SymInt), is_intrusive_ptr(false) { + payload.u.as_int = i.data(); + } + + bool isSymInt() const { + return Tag::SymInt == tag; + } + + c10::SymInt toSymInt() const { + return c10::SymInt(payload.u.as_int); + } + // allow you to pass literals (3, 4) without ambiguity IValue(int32_t i) : IValue(static_cast(i)) {} @@ -666,6 +696,8 @@ struct TORCH_API IValue final { template = nullptr> IValue(c10::optional v); + template = nullptr> + IValue(c10::OptionalArrayRef v); IValue(c10::nullopt_t); // ClassType diff --git a/aten/src/ATen/core/ivalue_inl.h b/aten/src/ATen/core/ivalue_inl.h index 24e904e4444e52..57d9ed8d5ed330 100644 --- a/aten/src/ATen/core/ivalue_inl.h +++ b/aten/src/ATen/core/ivalue_inl.h @@ -1584,6 +1584,7 @@ DEFINE_TO(at::MemoryFormat, toMemoryFormat) DEFINE_TO(at::QScheme, toQScheme) DEFINE_TO(at::Dimname, toDimname) DEFINE_TO(at::Generator, toGenerator) +DEFINE_TO(c10::SymInt, toSymInt) template struct _fake_type {}; @@ -1981,6 +1982,13 @@ inline IValue::IValue(const std::vector& v) : IValue(c10::List()) { list.push_back(e); } } +template > +inline IValue::IValue(c10::OptionalArrayRef v) : IValue() { + if (v.has_value()) { + *this = IValue(std::move(*v)); + } +} + template inline IValue::IValue(std::array v) : IValue(c10::List()) { auto list = to>(); diff --git a/aten/src/ATen/core/jit_type.h b/aten/src/ATen/core/jit_type.h index cbeb8154774a72..4956ad426fb96c 100644 --- a/aten/src/ATen/core/jit_type.h +++ b/aten/src/ATen/core/jit_type.h @@ -435,6 +435,17 @@ struct TORCH_API SymbolicShape { return dims_; } + c10::optional> symbolicDims() const { + if (!dims_) { + return c10::nullopt; + } + auto symbolic_dims = std::vector(); + for (const ShapeSymbol& s : *dims_) { + symbolic_dims.push_back(!s.is_static()); + } + return symbolic_dims; + } + // Checks whether the shape is fully defined/complete, ie. rank and sizes // of every dimension are known. 
bool isComplete() const { @@ -866,7 +877,11 @@ struct TORCH_API DictType : public SharedType { static const TypeKind Kind = TypeKind::DictType; static DictTypePtr create(TypePtr key, TypePtr value) { - switch (key->kind()) { + auto kind = key->kind(); + if (auto dyn = key->castRaw()) { + kind = dyn->dynamicKind(); + } + switch (kind) { case TypeKind::AnyType: case TypeKind::IntType: case TypeKind::BoolType: @@ -1232,6 +1247,31 @@ struct TORCH_API ComplexType : public NumberType { } }; +// We need to introduce `SymIntType` to represent the `SymInt` type +// used in function schemas e.g. `aten::narrow_copy(... SymInt length) +// `SymInt` will be used to enable tracing arithmetic operations on +// dimension values. Please see [SymInt.h] for more information +struct SymIntType; +using SymIntTypePtr = SingletonTypePtr; +struct TORCH_API SymIntType : public Type { + bool equals(const Type& rhs) const override { + return rhs.kind() == kind(); + } + std::string str() const override { + return "SymInt"; + } + std::string annotation_str_impl(TypePrinter printer = nullptr) const override { + // TODO: will become a Union[SymbolicIntNode|int] in the near future + return "int"; + } + static const TypeKind Kind = TypeKind::SymIntType; + // global singleton + static SymIntTypePtr get(); + + private: + SymIntType() : Type(TypeKind::SymIntType) {} +}; + struct IntType; using IntTypePtr = SingletonTypePtr; // This type represents a Python int number @@ -1693,6 +1733,13 @@ struct getTypePtr_ final { return IntType::get(); } }; + +template <> +struct getTypePtr_ final { + static decltype(auto) call() { + return SymIntType::get(); + } +}; template <> struct getTypePtr_ final { static decltype(auto) call() { @@ -1812,6 +1859,15 @@ struct getTypePtr_> final { return type; } }; + +template<> +struct getTypePtr_ final { + static const auto& call() { + static auto type = OptionalType::create(getTypePtr_::call()); + return type; + } +}; + template struct getTypePtr_> final { static const auto& call() { diff --git a/aten/src/ATen/core/jit_type_base.h b/aten/src/ATen/core/jit_type_base.h index 21a17c9ec6693e..f7a95402ca39ee 100644 --- a/aten/src/ATen/core/jit_type_base.h +++ b/aten/src/ATen/core/jit_type_base.h @@ -6,6 +6,7 @@ #include #include +#include #include #include #include @@ -48,6 +49,7 @@ namespace c10 { _(AnyListType) \ _(AnyTupleType) \ _(AnyClassType) \ + _(SymIntType) \ _(UnionType) \ _(DynamicType) diff --git a/aten/src/ATen/core/library.cpp b/aten/src/ATen/core/library.cpp index ba16a5bf10c129..ba608e98ad53a8 100644 --- a/aten/src/ATen/core/library.cpp +++ b/aten/src/ATen/core/library.cpp @@ -235,6 +235,9 @@ Library& Library::_fallback(CppFunction&& f) & { // Note if dispatch_key is DispatchKey::Undefined, it'll be ignored here since Undefined // isn't a runtime key, you shouldn't register anything to it at all. for (auto k : c10::getRuntimeDispatchKeySet(*dispatch_key)) { + // mobile doesn't use all dispatch keys, so skip any fallback registrations for the unused keys. 
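Editor's note: the loop here expands an alias dispatch key into its runtime keys and now skips any key that gets no dispatch-table slot on this build. A small standalone sketch of that registration-time filtering, with hypothetical key names and a callback in place of the real dispatcher:

#include <vector>

enum class RtKey { AutogradCPU, AutogradCUDA, AutogradXLA };

// Hypothetical expansion of an "Autograd" alias key into runtime keys.
inline std::vector<RtKey> runtime_keys_for_autograd() {
  return {RtKey::AutogradCPU, RtKey::AutogradCUDA, RtKey::AutogradXLA};
}

// -1 models "no dispatch-table slot in this build" (e.g. trimmed out on mobile).
inline int table_index_for(RtKey k) {
  return k == RtKey::AutogradXLA ? -1 : static_cast<int>(k);
}

// Register a fallback only for keys this build can actually dispatch on.
template <typename RegisterFn>
void register_fallback_for_autograd_alias(RegisterFn do_register) {
  for (RtKey k : runtime_keys_for_autograd()) {
    if (table_index_for(k) < 0) {
      continue;   // unused key on this build: nothing to register
    }
    do_register(k);
  }
}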
+ auto idx = getDispatchTableIndexForDispatchKey(k); + if (idx < 0) continue; registrars_.emplace_back( c10::Dispatcher::singleton().registerFallback( k, diff --git a/aten/src/ATen/core/op_registration/op_registration_test.cpp b/aten/src/ATen/core/op_registration/op_registration_test.cpp index 0a3f9236b75522..05294c25548eb1 100644 --- a/aten/src/ATen/core/op_registration/op_registration_test.cpp +++ b/aten/src/ATen/core/op_registration/op_registration_test.cpp @@ -284,7 +284,8 @@ TEST(OperatorRegistrationTest, whenRegisteringMultipleKernelsInSameOpCallAndCall EXPECT_FALSE(called_kernel1); EXPECT_TRUE(called_kernel2); - for (c10::DispatchKey key : {c10::DispatchKey::XLA, c10::DispatchKey::Lazy}) { + // Test for out of tree lazy backends- ::Lazy key is now registered to TS backend in tree + for (c10::DispatchKey key : {c10::DispatchKey::XLA}) { std::string expectMessage = expectedMessageForBackend(key); expectThrows([&] { callOp(*op, dummyTensor(key)); @@ -591,7 +592,7 @@ TEST(OperatorRegistrationTest, AutogradBackendOverridesAutogradKernel) { void LazyBackendsAutogradOverridesAutogradKernel(DispatchKey key) { auto registrar = c10::RegisterOperators().op("_test::dummy(Tensor dummy) -> ()", c10::RegisterOperators::options() - .kernel(c10::getAutogradKeyFromBackend(key)) + .kernel(c10::getAutogradKeyFromBackend(toBackendComponent(key))) .kernel(DispatchKey::Autograd)); auto op = Dispatcher::singleton().findSchema({"_test::dummy", ""}); @@ -613,14 +614,13 @@ void LazyBackendsAutogradOverridesAutogradKernel(DispatchKey key) { EXPECT_FALSE(called_nonautograd); } +// no longer test ::Lazy key here +// since it is now registered to TS backend in-tree and thus behaves differently, +// does not throw the expected 'could not run..' messages TEST(OperatorRegistrationTest, AutogradXLAOverridesAutogradKernel) { LazyBackendsAutogradOverridesAutogradKernel(DispatchKey::XLA); } -TEST(OperatorRegistrationTest, AutogradLazyOverridesAutogradKernel) { - LazyBackendsAutogradOverridesAutogradKernel(DispatchKey::Lazy); -} - void whenRegisterWithLazyBackendsAndCatchAll_AutogradLazyBackendsIsNotFilled(DispatchKey key) { { auto registrar = c10::RegisterOperators().op("_test::dummy(Tensor dummy) -> ()", c10::RegisterOperators::options() @@ -1791,22 +1791,22 @@ TEST(NewOperatorRegistrationTest, dispatchAutogradPrecedence) { TEST(NewOperatorRegistrationTest, throwsWhenRegisterToBackendMapsToAutogradOther) { // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - bool sparsecpu_called, math_called = false; + bool fpga_called, math_called = false; auto m = MAKE_TORCH_LIBRARY(test); - m.def("fn", torch::dispatch(c10::DispatchKey::SparseCPU, [&](const Tensor& x) { sparsecpu_called = true; return x; })); + m.def("fn", torch::dispatch(c10::DispatchKey::FPGA, [&](const Tensor& x) { fpga_called = true; return x; })); m.impl("fn", c10::DispatchKey::CompositeImplicitAutograd, [&](const Tensor& x) { math_called = true; return x; }); auto op = Dispatcher::singleton().findSchema({"test::fn", ""}); ASSERT_TRUE(op.has_value()); { - callOp(*op, dummyTensor(c10::DispatchKey::SparseCPU)); - ASSERT_TRUE(sparsecpu_called); + callOp(*op, dummyTensor(c10::DispatchKey::FPGA)); + ASSERT_TRUE(fpga_called); } { expectThrows([&] { - callOp(*op, dummyTensor(c10::DispatchKey::SparseCPU, /*requires_grad=*/true)); + callOp(*op, dummyTensor(c10::DispatchKey::FPGA, /*requires_grad=*/true)); }, "test::fn has kernels registered to both CompositeImplicitAutograd and a backend mapped to AutogradOther."); } } @@ -1849,18 +1849,15 @@ 
TEST(NewOperatorRegistrationTest, dispatchMultipleTensors) { } { - // TODO(#43908): currently this will fallthrough AutogradPrivateUse1 then call catchall kernel - // at AutogradCPU, while backend extenders are indeed expecting to call PrivateUse1 kernel. - // This confusing behavior is caused by we registering fallthrough as backend fallback for - // Autograd keys. Note users could always work around this by registering the same kernel to - // AutogradPrivateUse1 as shown below until we support it. auto op = Dispatcher::singleton().findOp({"test::fn", ""}); ASSERT_TRUE(op.has_value()); catchall_called = false; + privateuse1_called = false; callOp(*op, dummyTensor(c10::DispatchKey::PrivateUse1, /*requires_grad=*/true), dummyTensor(c10::DispatchKey::CPU, /*requires_grad=*/true)); - ASSERT_TRUE(catchall_called); + ASSERT_FALSE(catchall_called); + ASSERT_TRUE(privateuse1_called); } m.impl("fn", c10::DispatchKey::AutogradPrivateUse1, [&](const Tensor& x, const Tensor& y) { privateuse1_called = true; return x; }); @@ -1876,6 +1873,27 @@ TEST(NewOperatorRegistrationTest, dispatchMultipleTensors) { } } +TEST(NewOperatorRegistrationTest, registerCompositeImplicitAutogradWithCPUKernel_andCallAutogradOtherKernel_callsComposite) { + bool math_called = false; + bool cpu_called = false; + auto m = MAKE_TORCH_LIBRARY(test); + m.def("fn(Tensor dummy) -> Tensor"); + m.impl("fn", c10::DispatchKey::CPU, [&](const Tensor& x) { cpu_called = true; return x; }); + m.impl("fn", c10::DispatchKey::CompositeImplicitAutograd, [&](const Tensor& x) { math_called = true; return x; }); + + auto op = Dispatcher::singleton().findSchema({"test::fn", ""}); + ASSERT_TRUE(op.has_value()); + + { + math_called = cpu_called = false; + // Meta should redispatch to the AutogradOther backend, + // which the composite kernel should be registered to. + callOp(*op, dummyTensor(c10::DispatchKey::Meta, /*requires_grad=*/true)); + ASSERT_TRUE(math_called); + ASSERT_FALSE(cpu_called); + } +} + TEST(NewOperatorRegistrationTest, dispatchMultiple) { bool cpu_called = false; bool cuda_called = false; diff --git a/aten/src/ATen/core/tensor_type.cpp b/aten/src/ATen/core/tensor_type.cpp index cb7b6cc2766753..664aa301f0a463 100644 --- a/aten/src/ATen/core/tensor_type.cpp +++ b/aten/src/ATen/core/tensor_type.cpp @@ -3,6 +3,40 @@ namespace c10 { +namespace { + +// The idea is to only mark possible overlap across dimensions. We want to +// return false for expanded tensors and permuted tensors, for which dimensional +// collapsing is safe. 
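// Editor's note (not part of the patch): two worked cases for the check defined below,
// tracing the same size/stride ordering it uses.
// - Expanded input, sizes {4, 4}, strides {0, 1}: the broadcast dimension sorts first
//   (stride 0), and the test 1 < 4 * 0 is false, so no cross-dimension overlap is
//   reported and dimension collapsing remains allowed.
// - Genuinely overlapping input, sizes {2, 3}, strides {1, 1}: both dimensions step
//   through the same addresses; the test 1 < 3 * 1 holds, so the function returns true
//   and contiguity is not assumed.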
+bool possible_cross_dimension_overlap(c10::IntArrayRef sizes, c10::IntArrayRef strides) { + int n_dim = static_cast(sizes.size()); + std::vector stride_indices(n_dim); + std::iota(stride_indices.rbegin(), stride_indices.rend(), 0); + + // sort indices going with ascending strides + for (int i = 1; i < n_dim; i++) { + auto c = i; + for (int j = i - 1; j >= 0; j--) { + if (strides[stride_indices[j]] > strides[stride_indices[c]]) { + std::swap(stride_indices[j], stride_indices[c]); + c = j; + } + } + } + + for (const auto i : c10::irange(1, n_dim)) { + if (i != 0) { + // we are being conservative on checking for memory overlap + if (sizes[stride_indices[i]] != 1 && strides[stride_indices[i]] < sizes[stride_indices[i-1]] * strides[stride_indices[i-1]]) { + return true; + } + } + } + return false; +} + +} + const TensorTypePtr& TensorType::get() { static auto value = TensorType::create( {}, {}, SymbolicShape(), VaryingShape{}, {}); @@ -115,6 +149,10 @@ VaryingShape TensorType::computeStrideProps( bool tensor_contiguity) { int n_dim = static_cast(sizes.size()); std::vector stride_indices(n_dim); + // default has_overlap to false as we only compute overlap when: + // 1. input sizes/strides fails format check; + // 2. tensor_contiguity are not set. + bool has_overlap = false; // Sorting strides in ascending order // Example: @@ -173,21 +211,35 @@ VaryingShape TensorType::computeStrideProps( } } } + // conveniently is_contiguous_strides/is_contiguous_strides only returns + // true when there's no memory overlap, so we only re-compute has_overlap + // in the last branch when both returns false + if (!tensor_contiguity) { + // trust tensor_contiguity and only computes overlap when it is not set + has_overlap = possible_cross_dimension_overlap(sizes, strides); + } } std::vector stride_properties; + + for (size_t i = 0; i < stride_indices.size(); i++) { bool contiguous_ = tensor_contiguity; if (!contiguous_) { - // innermost stride expected to be 1 - // TODO: turn contiguous_ into an enum CONTIGUOUS, NONCONTIGUOUS, - // BROADCASTED - if (i == 0) { - contiguous_ = strides[stride_indices[i]] == 1; + if (!has_overlap) { + // innermost stride expected to be 1 + // TODO: turn contiguous_ into an enum CONTIGUOUS, NONCONTIGUOUS, + // BROADCASTED + if (i == 0) { + contiguous_ = strides[stride_indices[i]] == 1; + } else { + contiguous_ = strides[stride_indices[i]] == 1 || + (strides[stride_indices[i]] != 0 && + strides[stride_indices[i]] == + strides[stride_indices[i - 1]] * sizes[stride_indices[i - 1]]); + } } else { - contiguous_ = strides[stride_indices[i]] == 1 || - (strides[stride_indices[i]] != 0 && - strides[stride_indices[i]] == - strides[stride_indices[i - 1]] * sizes[stride_indices[i - 1]]); + // leaving this assign statement for readability; + contiguous_ = false; } } stride_properties.emplace_back(stride_indices[i], contiguous_, strides[stride_indices[i]]); diff --git a/aten/src/ATen/core/type.cpp b/aten/src/ATen/core/type.cpp index a3f0451dc61cb9..5d981f31f8a5bb 100644 --- a/aten/src/ATen/core/type.cpp +++ b/aten/src/ATen/core/type.cpp @@ -143,6 +143,11 @@ std::ostream& operator<<(std::ostream & out, const Type & t) { return out; } +std::ostream& operator<<(std::ostream& os, SymInt s) { + os << "SymInt(" << s.data() << ")"; + return os; +} + AnyTypePtr AnyType::get() { static AnyTypePtr value(new AnyType()); return value; @@ -257,6 +262,11 @@ AnyEnumTypePtr AnyEnumType::get() { return value; } +SymIntTypePtr SymIntType::get() { + static SymIntTypePtr value(new SymIntType()); + return value; +} + 
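Editor's note: SymIntType::get() above follows the same pattern as the other singleton type getters in this file: a function-local static constructed on first use and intentionally never destroyed. A minimal sketch of that pattern with a hypothetical type:

struct MyType {
  static MyType* get() {
    // Constructed once on first call; deliberately leaked so the singleton
    // is immune to static destruction order issues at shutdown.
    static MyType* value = new MyType();
    return value;
  }
 private:
  MyType() = default;   // only get() can create the instance
};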
c10::optional unifyTypesImpl(const TypePtr& t1, const TypePtr& t2, bool default_to_union=false, TypePtr type_hint=nullptr) { // check direct subtyping relation if (t1->isSubtypeOf(*t2)) { diff --git a/aten/src/ATen/cuda/Atomic.cuh b/aten/src/ATen/cuda/Atomic.cuh index cd002414687a34..2bd8364ebf8a4a 100644 --- a/aten/src/ATen/cuda/Atomic.cuh +++ b/aten/src/ATen/cuda/Atomic.cuh @@ -298,7 +298,7 @@ static inline __device__ void gpuAtomicAddNoReturn(at::BFloat16 *address, at::BF static inline __device__ void gpuAtomicAddNoReturn(double *address, double val) { gpuAtomicAdd(address, val); } /* Special case fp32 atomic. */ -#if defined(USE_ROCM) && defined(__gfx908__) +#if defined(USE_ROCM) static inline __device__ void gpuAtomicAddNoReturn(float *address, float val) { atomicAddNoRet(address, val); } #else static inline __device__ void gpuAtomicAddNoReturn(float *address, float val) { gpuAtomicAdd(address, val); } @@ -344,3 +344,83 @@ inline __device__ float gpuAtomicMul (float * address, float val) { return __int_as_float(old); } + +// Atomic maximum implementation. + +inline __device__ at::Half gpuAtomicMax(at::Half * address, at::Half val) { + return AtomicFPOp()(address, val, + [](at::Half bsum, at::Half val) { + return max(bsum, val); + }); +} + +inline __device__ at::BFloat16 gpuAtomicMax(at::BFloat16 * address, at::BFloat16 val) { + return AtomicFPOp()(address, val, + [](at::BFloat16 bsum, at::BFloat16 val) { + return max(bsum, val); + }); +} + +inline __device__ double gpuAtomicMax(double * address, double val) { + return AtomicFPOp()(address, val, + [](double val, unsigned long long int assumed) { + return __double_as_longlong(max(val, __longlong_as_double(assumed))); + }); +} + +// Dont use a templated function for this since the addition function defaults to the CUDA built-in. +inline __device__ float gpuAtomicMax(float * address, float val) { + unsigned int* address_as_ull = (unsigned int*)address; + unsigned int old = *address_as_ull; + unsigned int assumed; + + do { + assumed = old; + old = atomicCAS(address_as_ull, assumed, + __float_as_int(max(val, __int_as_float(assumed)))); + + // Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN) + } while (assumed != old); + + return __int_as_float(old); +} + +// Atomic minimum implementation. + +inline __device__ at::Half gpuAtomicMin(at::Half * address, at::Half val) { + return AtomicFPOp()(address, val, + [](at::Half bsum, at::Half val) { + return min(bsum, val); + }); +} + +inline __device__ at::BFloat16 gpuAtomicMin(at::BFloat16 * address, at::BFloat16 val) { + return AtomicFPOp()(address, val, + [](at::BFloat16 bsum, at::BFloat16 val) { + return min(bsum, val); + }); +} + +inline __device__ double gpuAtomicMin(double * address, double val) { + return AtomicFPOp()(address, val, + [](double val, unsigned long long int assumed) { + return __double_as_longlong(min(val, __longlong_as_double(assumed))); + }); +} + +// Dont use a templated function for this since the addition function defaults to the CUDA built-in. 
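Editor's note: a sketch of how the new float gpuAtomicMax helper might be used from a kernel, assuming ATen/cuda/Atomic.cuh is on the include path; illustrative only, not part of the patch:

#include <ATen/cuda/Atomic.cuh>

// Each thread folds one element into a single global maximum.
// *result is assumed to be initialized to -INFINITY (or the first element) beforehand.
__global__ void global_max_kernel(const float* data, int n, float* result) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    gpuAtomicMax(result, data[i]);
  }
}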
+inline __device__ float gpuAtomicMin(float * address, float val) { + unsigned int* address_as_ull = (unsigned int*)address; + unsigned int old = *address_as_ull; + unsigned int assumed; + + do { + assumed = old; + old = atomicCAS(address_as_ull, assumed, + __float_as_int(min(val, __int_as_float(assumed)))); + + // Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN) + } while (assumed != old); + + return __int_as_float(old); +} diff --git a/aten/src/ATen/cuda/CUDABlas.cpp b/aten/src/ATen/cuda/CUDABlas.cpp index 5e795396d7dbe5..ec023f27e89d28 100644 --- a/aten/src/ATen/cuda/CUDABlas.cpp +++ b/aten/src/ATen/cuda/CUDABlas.cpp @@ -15,6 +15,11 @@ #include #endif +#ifdef USE_ROCM +#define PYTORCH_ROCBLAS_VERSION_DECIMAL (ROCBLAS_VERSION_MAJOR * 100 + ROCBLAS_VERSION_MINOR) +#define USE_GEMM_FLAGS_FP16_ALT_IMPL (PYTORCH_ROCBLAS_VERSION_DECIMAL >= 242) +#endif + #define CUDABLAS_POSINT_CHECK(FD, X) \ TORCH_CHECK( \ (X > 0 && X <= INT_MAX), \ @@ -246,13 +251,17 @@ void bgemm(CUDABLAS_BGEMM_ARGTYPES(at::Half)) { float falpha = alpha; float fbeta = beta; #ifdef USE_ROCM + int flag = 0; +#if USE_GEMM_FLAGS_FP16_ALT_IMPL + flag = at::BackwardPassGuard::is_backward_pass() ? rocblas_gemm_flags_fp16_alt_impl : 0; +#endif TORCH_CUDABLAS_CHECK(rocblas_gemm_strided_batched_ex(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, rocblas_datatype_f16_r, (int)lda, stridea, b, rocblas_datatype_f16_r, (int)ldb, strideb, (void*)&fbeta, c, rocblas_datatype_f16_r, (int)ldc, stridec, c, rocblas_datatype_f16_r, (int)ldc, stridec, (int) num_batches, rocblas_datatype_f32_r, rocblas_gemm_algo_standard, - 0, 0)); + 0, flag)); #else #if defined(CUDA_VERSION) && CUDA_VERSION < 11000 // On CUDA versions prior to 11, users are required to set the math mode to CUBLAS_TENSOR_OP_MATH @@ -392,6 +401,10 @@ void gemm(CUDABLAS_GEMM_ARGTYPES(at::Half)) { _cublasAdjustLdLevel3(transa, transb, m, n, k, &lda, &ldb, &ldc); GEMM_CHECK_ARGVALUES(at::Half); #ifdef USE_ROCM + int flag = 0; +#if USE_GEMM_FLAGS_FP16_ALT_IMPL + flag = at::BackwardPassGuard::is_backward_pass() ? 
rocblas_gemm_flags_fp16_alt_impl : 0; +#endif TORCH_CUDABLAS_CHECK(rocblas_gemm_ex( handle, opa, @@ -416,7 +429,7 @@ void gemm(CUDABLAS_GEMM_ARGTYPES(at::Half)) { rocblas_datatype_f32_r, rocblas_gemm_algo_standard, 0, - 0)); + flag)); #else cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties(); if (prop->major >= 5) { @@ -634,7 +647,8 @@ void gemm_and_bias( int64_t mat2_ld, const Dtype* bias, Dtype* result_ptr, - int64_t result_ld) { + int64_t result_ld, + GEMMAndBiasActivationEpilogue activation) { using opmath_t = at::opmath_type; opmath_t beta_val = 0; // bias is added in epilogue @@ -670,6 +684,13 @@ void gemm_and_bias( &transb, sizeof(transb))); cublasLtEpilogue_t epilogue = CUBLASLT_EPILOGUE_BIAS; + if (activation == GEMMAndBiasActivationEpilogue::RELU) { + epilogue = CUBLASLT_EPILOGUE_RELU_BIAS; + } else if (activation == GEMMAndBiasActivationEpilogue::GELU) { +#if CUDA_VERSION >= 11040 + epilogue = CUBLASLT_EPILOGUE_GELU_BIAS; +#endif + } TORCH_CUDABLAS_CHECK(cublasLtMatmulDescSetAttribute( computeDesc.descriptor(), CUBLASLT_MATMUL_DESC_EPILOGUE, @@ -752,7 +773,8 @@ template void gemm_and_bias( int64_t mat2_ld, const double* bias, double* result_ptr, - int64_t result_ld); + int64_t result_ld, + GEMMAndBiasActivationEpilogue activation); template void gemm_and_bias( bool transpose_mat1, @@ -767,7 +789,8 @@ template void gemm_and_bias( int64_t mat2_ld, const float* bias, float* result_ptr, - int64_t result_ld); + int64_t result_ld, + GEMMAndBiasActivationEpilogue activation); template void gemm_and_bias( bool transpose_mat1, @@ -782,7 +805,8 @@ template void gemm_and_bias( int64_t mat2_ld, const at::Half* bias, at::Half* result_ptr, - int64_t result_ld); + int64_t result_ld, + GEMMAndBiasActivationEpilogue activation); template void gemm_and_bias( bool transpose_mat1, @@ -797,7 +821,8 @@ template void gemm_and_bias( int64_t mat2_ld, const at::BFloat16* bias, at::BFloat16* result_ptr, - int64_t result_ld); + int64_t result_ld, + GEMMAndBiasActivationEpilogue activation); #endif // defined(CUDA_VERSION) && CUDA_VERSION >= 11000 && !defined(_MSC_VER) template <> diff --git a/aten/src/ATen/cuda/CUDABlas.h b/aten/src/ATen/cuda/CUDABlas.h index 72d0abe40ca49d..10e589ecd6c9d0 100644 --- a/aten/src/ATen/cuda/CUDABlas.h +++ b/aten/src/ATen/cuda/CUDABlas.h @@ -71,6 +71,14 @@ void gemm(CUDABLAS_GEMM_ARGTYPES(at::BFloat16)); #endif #if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 && !defined(_MSC_VER) +enum GEMMAndBiasActivationEpilogue { + None, + RELU, + GELU, +}; + +// NOTE: GELU activation is not supported prior to CUDA 11.4 and will +// do nothing if passed in that case. 
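Editor's note: the ROCm hunks above select rocblas_gemm_flags_fp16_alt_impl via at::BackwardPassGuard::is_backward_pass(). A guard like that is typically a thread-local flag flipped by an RAII object; a minimal sketch under that assumption (hypothetical class, not the actual at::BackwardPassGuard):

// Marks the current thread as "inside the backward pass" for the guard's lifetime.
struct BackwardPassGuardSketch {
  BackwardPassGuardSketch()  : prev_(tls_in_backward_) { tls_in_backward_ = true; }
  ~BackwardPassGuardSketch() { tls_in_backward_ = prev_; }
  static bool is_backward_pass() { return tls_in_backward_; }
 private:
  bool prev_;
  static thread_local bool tls_in_backward_;
};
thread_local bool BackwardPassGuardSketch::tls_in_backward_ = false;

// Usage when picking the rocBLAS flag:
//   int flag = BackwardPassGuardSketch::is_backward_pass() ? rocblas_gemm_flags_fp16_alt_impl : 0;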
template void gemm_and_bias( bool transpose_mat1, @@ -85,7 +93,8 @@ void gemm_and_bias( int64_t mat2_ld, const Dtype* bias, Dtype* result_ptr, - int64_t result_ld); + int64_t result_ld, + GEMMAndBiasActivationEpilogue activation = GEMMAndBiasActivationEpilogue::None); #endif #define CUDABLAS_BGEMM_ARGTYPES(Dtype) \ diff --git a/aten/src/ATen/cuda/CUDAEvent.h b/aten/src/ATen/cuda/CUDAEvent.h index deaebd3583d670..f07daeb979b9ea 100644 --- a/aten/src/ATen/cuda/CUDAEvent.h +++ b/aten/src/ATen/cuda/CUDAEvent.h @@ -32,15 +32,11 @@ struct TORCH_CUDA_CPP_API CUDAEvent { CUDAEvent( DeviceIndex device_index, const cudaIpcEventHandle_t* handle) { - #if !defined(USE_ROCM) device_index_ = device_index; CUDAGuard guard(device_index_); AT_CUDA_CHECK(cudaIpcOpenEventHandle(&event_, *handle)); is_created_ = true; - #else - AT_ERROR("cuIpcOpenEventHandle with HIP is not supported"); - #endif } // Note: event destruction done on creating device to avoid creating a @@ -148,7 +144,6 @@ struct TORCH_CUDA_CPP_API CUDAEvent { // Note: cudaIpcGetEventHandle must be called on the same device as the event void ipc_handle(cudaIpcEventHandle_t * handle) { - #if !defined(USE_ROCM) if (!is_created_) { // this CUDAEvent object was initially constructed from flags but event_ // is not created yet. @@ -156,9 +151,6 @@ struct TORCH_CUDA_CPP_API CUDAEvent { } CUDAGuard guard(device_index_); AT_CUDA_CHECK(cudaIpcGetEventHandle(handle, event_)); - #else - AT_ERROR("cuIpcGetEventHandle with HIP is not supported"); - #endif } private: diff --git a/aten/src/ATen/cuda/cub.cuh b/aten/src/ATen/cuda/cub.cuh index 2011ad097c4a72..abe2e9272014ff 100644 --- a/aten/src/ATen/cuda/cub.cuh +++ b/aten/src/ATen/cuda/cub.cuh @@ -6,6 +6,8 @@ #include #include +#include + #include #if USE_GLOBAL_CUB_WRAPPED_NAMESPACE() @@ -161,6 +163,34 @@ inline void segmented_sort_pairs( } } +#if CUB_SUPPORTS_UNIQUE_BY_KEY() +template +inline void unique_by_key( + KeysInputIteratorT keys_in, ValuesInputIteratorT values_in, + KeysOutputIteratorT keys_out, ValuesOutputIteratorT values_out, + NumSelectedIteratorT num_selected, int64_t num_input_items) +{ + // TODO: use thrust::discard_iterator to handle null keys_out when https://github.com/NVIDIA/cub/issues/406 is fixed. 
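Editor's note: the wrapper below uses c10::guts::if_constexpr to allocate a scratch key buffer only when the caller passes NullType::type for keys_out. With C++17 the same compile-time branch can be written as a plain if constexpr; a rough standalone sketch of that shape, with simplified types rather than the real iterator machinery:

#include <cstdint>
#include <memory>
#include <type_traits>

template <typename KeysOutT>
void unique_by_key_sketch(KeysOutT keys_out, int64_t n) {
  float* keys_out_ = nullptr;            // assume float keys for the sketch
  std::unique_ptr<float[]> scratch;
  if constexpr (std::is_same<KeysOutT, std::nullptr_t>::value) {
    scratch.reset(new float[n]);         // caller discards keys: give the routine a scratch buffer
    keys_out_ = scratch.get();
  } else {
    keys_out_ = keys_out;                // caller-provided output buffer
  }
  // ... hand keys_out_ to the device-side selection routine ...
  (void)keys_out_;
}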
+ constexpr bool null_keys_out = std::is_same::value; + using KeyT = typename std::iterator_traits::value_type; + using RealKeysOutputIteratorT = typename std::conditional::type; + RealKeysOutputIteratorT keys_out_; + auto allocator = c10::cuda::CUDACachingAllocator::get(); + c10::DataPtr keys_out_owner; + c10::guts::if_constexpr( + [&](auto _) { + keys_out_owner = allocator->allocate(num_input_items * sizeof(KeyT)); + keys_out_ = static_cast(keys_out_owner.get()); + }, + [&](auto _) { + keys_out_ = keys_out; + } + ); + CUB_WRAPPER(NO_ROCM(at_cuda_detail)::cub::DeviceSelect::UniqueByKey, + keys_in, values_in, keys_out_, values_out, num_selected, num_input_items, c10::cuda::getCurrentCUDAStream()); +} +#endif + namespace impl { template diff --git a/aten/src/ATen/cuda/cub_definitions.cuh b/aten/src/ATen/cuda/cub_definitions.cuh index e464b19e57d511..a3d551673558f7 100644 --- a/aten/src/ATen/cuda/cub_definitions.cuh +++ b/aten/src/ATen/cuda/cub_definitions.cuh @@ -18,7 +18,7 @@ #define CUB_SUPPORTS_NV_BFLOAT16() false #endif -// cub sort support for CUB_WRAPPED_NAMESPACE is added to cub 1.13.1 in: +// cub support for CUB_WRAPPED_NAMESPACE is added to cub 1.13.1 in: // https://github.com/NVIDIA/cub/pull/326 // CUB_WRAPPED_NAMESPACE is defined globally in cmake/Dependencies.cmake // starting from CUDA 11.5 @@ -28,6 +28,14 @@ #define USE_GLOBAL_CUB_WRAPPED_NAMESPACE() false #endif +// cub support for UniqueByKey is added to cub 1.16 in: +// https://github.com/NVIDIA/cub/pull/405 +#if CUB_VERSION >= 101600 +#define CUB_SUPPORTS_UNIQUE_BY_KEY() true +#else +#define CUB_SUPPORTS_UNIQUE_BY_KEY() false +#endif + // cub support for scan by key is added to cub 1.15 // in https://github.com/NVIDIA/cub/pull/376 #if CUB_VERSION >= 101500 diff --git a/aten/src/ATen/cuda/detail/CUDAHooks.cpp b/aten/src/ATen/cuda/detail/CUDAHooks.cpp index 4efe2ec4c33f36..5a444376cc8f66 100644 --- a/aten/src/ATen/cuda/detail/CUDAHooks.cpp +++ b/aten/src/ATen/cuda/detail/CUDAHooks.cpp @@ -139,16 +139,14 @@ bool CUDAHooks::hasCuSOLVER() const { #endif } -#if !defined(USE_ROCM) #if defined(USE_DIRECT_NVRTC) static std::pair, at::cuda::NVRTC*> load_nvrtc() { return std::make_pair(nullptr, at::cuda::load_nvrtc()); } -#else +#elif !defined(USE_ROCM) static std::pair, at::cuda::NVRTC*> load_nvrtc() { return std::make_pair(nullptr, &at::cuda::detail::lazyNVRTC); } -#endif #else static std::pair, at::cuda::NVRTC*> load_nvrtc() { #if defined(_WIN32) @@ -293,10 +291,22 @@ std::string CUDAHooks::showConfig() const { cudaRuntimeGetVersion(&runtimeVersion); auto printCudaStyleVersion = [&](int v) { +#ifdef USE_ROCM + // HIP_VERSION value format was changed after ROCm v4.2 to include the patch number + if(v < 500) { + // If major=xx, minor=yy then format -> xxyy + oss << (v / 100) << "." << (v % 10); + } + else { + // If major=xx, minor=yy & patch=zzzzz then format -> xxyyzzzzz + oss << (v / 10000000) << "." << (v / 100000 % 100) << "." << (v % 100000); + } +#else oss << (v / 1000) << "." << (v / 10 % 100); if (v % 10 != 0) { oss << "." 
<< (v % 10); } +#endif }; #if !defined(USE_ROCM) diff --git a/aten/src/ATen/cuda/llvm_complex.cpp b/aten/src/ATen/cuda/llvm_complex.cpp index 00339bdac0fb69..4cceb11b3eeda1 100644 --- a/aten/src/ATen/cuda/llvm_complex.cpp +++ b/aten/src/ATen/cuda/llvm_complex.cpp @@ -724,6 +724,16 @@ log10(const complex<_Tp>& __x) return log(__x) / log(_Tp(10)); } +// log2 + +template +inline +complex<_Tp> +log2(const complex<_Tp>& __x) +{ + return log(__x) / log(_Tp(2)); +} + // sqrt template diff --git a/aten/src/ATen/cudnn/Descriptors.cpp b/aten/src/ATen/cudnn/Descriptors.cpp index a5e8dc0a245315..f954bbf5623ad9 100644 --- a/aten/src/ATen/cudnn/Descriptors.cpp +++ b/aten/src/ATen/cudnn/Descriptors.cpp @@ -22,6 +22,8 @@ inline cudnnDataType_t getDataType(const at::Tensor& t) { #if defined(CUDNN_VERSION) && CUDNN_VERSION >= 8200 else if (scalar_type == at::kBFloat16) { return CUDNN_DATA_BFLOAT16; + } else if (scalar_type == at::kQInt8) { + return CUDNN_DATA_INT8; } #endif throw std::runtime_error("TensorDescriptor only supports double, float and half tensors"); diff --git a/aten/src/ATen/cudnn/Types.cpp b/aten/src/ATen/cudnn/Types.cpp index 4771f9bf2165b8..215d42fcd23f84 100644 --- a/aten/src/ATen/cudnn/Types.cpp +++ b/aten/src/ATen/cudnn/Types.cpp @@ -5,7 +5,9 @@ namespace at { namespace native { cudnnDataType_t getCudnnDataTypeFromScalarType(const at::ScalarType dtype) { - if (dtype == at::kFloat) { + if (dtype == c10::kQInt8) { + return CUDNN_DATA_INT8; + } else if (dtype == at::kFloat) { return CUDNN_DATA_FLOAT; } else if (dtype == at::kDouble) { return CUDNN_DATA_DOUBLE; diff --git a/aten/src/ATen/jiterator_macros.h b/aten/src/ATen/jiterator_macros.h new file mode 100644 index 00000000000000..2769537346c873 --- /dev/null +++ b/aten/src/ATen/jiterator_macros.h @@ -0,0 +1,38 @@ +#pragma once +#include +#include + +#define JITERATOR_HOST_DEVICE C10_HOST_DEVICE +#if defined(_MSC_VER) && defined(__CUDACC__) +// NVRTC on Windows errors if __host__ __device__ attribute is +// present on kernel. +// error: attribute "__host__" does not apply here +// error: attribute "__device__" does not apply here +#define JITERATOR_HOST_DEVICE +#endif + +// jiterator_also_stringify_as macro is used to define code (for CPU/ROCm) +// and generate code string for `jiterator` (only when compiling for CUDA). +// Usage : +// jiterator_also_stringify_as( +// jiterator_code(template T identity(T x) { return x; }), +// identity_string); +// This will define the template `identity` as present in code and +// also define `std::string identity_string` with the code as the string +// if this is being compiled for CUDA. + +// `jiterator_code` macro is to deal with `,` in the kernel code. +// These `,`s confuse the preprocessor into thinking we are passing +// multiple arguments to the macro. +#define jiterator_code(...) __VA_ARGS__ +#if defined(__CUDACC__) + // CPU and CUDA case + #define stringify_code(...) 
#__VA_ARGS__ + #define jiterator_also_stringify_as(code, str_name) \ + code /* define the function */ \ + const std::string str_name = std::string(stringify_code(code)); +#else + // CPU only or CPU and ROCm case + // Only needs the function + #define jiterator_also_stringify_as(code, str_name) code +#endif diff --git a/aten/src/ATen/mkl/SparseDescriptors.h b/aten/src/ATen/mkl/SparseDescriptors.h index 46d656898a8d0a..2c152e0b2b725c 100644 --- a/aten/src/ATen/mkl/SparseDescriptors.h +++ b/aten/src/ATen/mkl/SparseDescriptors.h @@ -101,7 +101,7 @@ class MklSparseCsrDescriptor sparse_matrix_t raw_descriptor; // Assuming that the last two dimensions are block elements of the matrix - if (values.dim() == 3) { + if (values.dim() == 3 && crow_indices.dim() == 1 && col_indices.dim() == 1) { TORCH_CHECK( values.size(-1) == values.size(-2), "MKL Sparse doesn't support matrices with non-square blocks."); diff --git a/aten/src/ATen/native/BatchLinearAlgebra.cpp b/aten/src/ATen/native/BatchLinearAlgebra.cpp index 5fc486c44f5c60..33f325267884a3 100644 --- a/aten/src/ATen/native/BatchLinearAlgebra.cpp +++ b/aten/src/ATen/native/BatchLinearAlgebra.cpp @@ -952,8 +952,8 @@ static Tensor& linalg_solve_out_info(Tensor& result, Tensor& infos, const Tensor // _linalg_broadcast_batch_dims also includes linearSolveCheckInputs // it checks for squareness of 'input' and 'shape' compatibility of 'other' and 'input' - Tensor other_broadcasted, input_broadcasted; - std::tie(other_broadcasted, input_broadcasted) = _linalg_broadcast_batch_dims(other_, input, "linalg.solve"); + Tensor other_broadcasted; + std::tie(other_broadcasted, std::ignore) = _linalg_broadcast_batch_dims(other_, input, "linalg.solve"); auto squeezed_other_broadcasted = at::squeeze(other_broadcasted, -1); auto squeezed_result_shape = squeezed_other_broadcasted.sizes(); @@ -989,18 +989,17 @@ static Tensor& linalg_solve_out_info(Tensor& result, Tensor& infos, const Tensor // lu_factor_stub+lu_solve_stub perform calculations in-place and 'result' must be a copy of 'other_broadcasted' result.copy_(other_broadcasted); - auto input_working_copy = cloneBatchedColumnMajor(input_broadcasted); - TORCH_INTERNAL_ASSERT(infos.scalar_type() == kInt); TORCH_INTERNAL_ASSERT(infos.device() == input.device()); - infos.resize_({std::max(1, batchCount(input_broadcasted))}); + infos.resize_({std::max(1, batchCount(input))}); // if input is empty infos might not get filled; make sure infos doesn't contain garbage then if (input.numel() == 0) { infos.fill_(0); } // compute the LU factorization of 'input_working_copy' - auto pivots_shape = IntArrayRef(input_broadcasted.sizes().data(), input_broadcasted.dim() - 2).vec(); // input_broadcasted.shape[:-2] + auto input_working_copy = cloneBatchedColumnMajor(input); + auto pivots_shape = IntArrayRef(input.sizes().data(), input.dim() - 2).vec(); // input.shape[:-2] pivots_shape.push_back(std::min(input.size(-2), input.size(-1))); Tensor pivots = at::empty(pivots_shape, input.options().dtype(kInt)); lu_factor_stub(input.device().type(), input_working_copy, pivots, infos, /*compute_pivots=*/true); @@ -1023,8 +1022,7 @@ Tensor& linalg_solve_out(const Tensor& input, const Tensor& other, Tensor& resul // Now check LAPACK/MAGMA error codes // _linalg_check_errors calls 'infos = infos.to(kCPU)' - bool vector_case = linalg_solve_is_vector_rhs(input, other); - at::_linalg_check_errors(infos, "linalg.solve", vector_case ? 
result.dim() == 1 : result.dim() == 2); + at::_linalg_check_errors(infos, "linalg.solve", input.dim() == 2); return result; } diff --git a/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp b/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp index 117bbdb90935d5..84759dce1acc99 100644 --- a/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp +++ b/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp @@ -908,8 +908,8 @@ void apply_lu_solve(const Tensor& b, const Tensor& lu, const Tensor& pivots, Tra const auto trans = to_blas(transpose); auto pivots_data = pivots.data_ptr(); auto b_stride = matrixStride(b); - auto lu_stride = matrixStride(lu); - auto pivots_stride = pivots.size(-1); + auto lu_stride = lu.dim() > 2 ? lu.stride(-3) : 0; + auto pivots_stride = pivots.dim() > 1 ? pivots.stride(-2) : 0; auto batch_size = batchCount(b); auto n = lu.size(-2); @@ -917,10 +917,19 @@ void apply_lu_solve(const Tensor& b, const Tensor& lu, const Tensor& pivots, Tra auto leading_dimension = std::max(1, n); int info = 0; + + // lu and pivots tensors can be broadcast to b + // here we construct a helper indexing tensor to linearly index into lu and pivots + IntArrayRef lu_batch_shape(lu.sizes().data(), lu.dim() - 2); + IntArrayRef b_batch_shape(b.sizes().data(), b.dim() - 2); + BroadcastLinearIndices lu_index( + batchCount(lu), lu_batch_shape, b_batch_shape); + for (const auto i : c10::irange(batch_size)) { + int64_t lu_index_i = lu_index(i); scalar_t* b_working_ptr = &b_data[i * b_stride]; - scalar_t* lu_working_ptr = &lu_data[i * lu_stride]; - int* pivots_working_ptr = &pivots_data[i * pivots_stride]; + scalar_t* lu_working_ptr = &lu_data[lu_index_i * lu_stride]; + int* pivots_working_ptr = &pivots_data[lu_index_i * pivots_stride]; lapackLuSolve(trans, n, nrhs, lu_working_ptr, leading_dimension, pivots_working_ptr, b_working_ptr, leading_dimension, &info); diff --git a/aten/src/ATen/native/BinaryOps.cpp b/aten/src/ATen/native/BinaryOps.cpp index 437835d7a86657..5b6ead4ff5a5a8 100644 --- a/aten/src/ATen/native/BinaryOps.cpp +++ b/aten/src/ATen/native/BinaryOps.cpp @@ -618,6 +618,11 @@ Tensor& mul_(Tensor& self, const Scalar& other) { return at::mul_out(self, wrapped_scalar_tensor(other), self); // redispatch! } +Tensor& mul__scalar_sparse_csr(Tensor& self, const Scalar& other) { + self.values().mul_(other); + return self; +} + Device correct_out_device(const Tensor& self, const Tensor& other) { if (self.device() == at::kCPU){ return other.device(); diff --git a/aten/src/ATen/native/ConstantPadNd.cpp b/aten/src/ATen/native/ConstantPadNd.cpp deleted file mode 100644 index f7a2d76ed52280..00000000000000 --- a/aten/src/ATen/native/ConstantPadNd.cpp +++ /dev/null @@ -1,87 +0,0 @@ -#include - -#include - -namespace at { namespace native { - -Tensor constant_pad_nd(const Tensor& self, IntArrayRef pad, const Scalar& value) { - TORCH_CHECK(pad.size() % 2 == 0, "Length of pad must be even but instead it equals ", - pad.size()); - - auto input_sizes = self.sizes(); - auto l_inp = self.dim(); - - auto l_pad = pad.size() / 2; - auto l_diff = l_inp - l_pad; - TORCH_CHECK(l_inp >= (int64_t)l_pad, "Length of pad should be no more than twice the number of " - "dimensions of the input. 
Pad length is ", pad.size(), "while the input has ", - l_inp, "dimensions."); - - std::vector new_shape; - - bool all_pads_non_positive = true; - - auto c_input = self; - for (const auto i : c10::irange(l_diff, l_inp)) { - auto pad_idx = 2 * (l_inp - i - 1); - if (pad[pad_idx] < 0) { - c_input = c_input.narrow(i, -pad[pad_idx], c_input.size(i) + pad[pad_idx]); - } else if (pad[pad_idx] != 0) { - all_pads_non_positive = false; - } - if (pad[pad_idx + 1] < 0) { - c_input = c_input.narrow(i, 0, c_input.size(i) + pad[pad_idx + 1]); - } else if (pad[pad_idx + 1] != 0) { - all_pads_non_positive = false; - } - } - - // if none of the pads are positive we can optimize and just return the result - // of calling .narrow() on the input - if (all_pads_non_positive) { - return c_input.clone(); - } - - - for (size_t i = 0; i < (size_t)l_diff; i ++) { - new_shape.emplace_back(input_sizes[i]); - } - - for (const auto i : c10::irange((size_t)l_pad)) { - auto pad_idx = pad.size() - ((i + 1) * 2); - auto new_dim = input_sizes[l_diff + i] + pad[pad_idx] + pad[pad_idx + 1]; - TORCH_CHECK(new_dim > 0, "The input size ", input_sizes[l_diff + i], ", plus negative padding ", - pad[pad_idx], " and ", pad[pad_idx + 1], " resulted in a negative output size, " - "which is invalid. Check dimension ", l_diff + i, " of your input."); - new_shape.emplace_back(new_dim); - } - - at::Tensor output; - const auto memory_format = self.suggest_memory_format(); - if (self.is_quantized()) { - const auto qscheme = self.qscheme(); - TORCH_CHECK(qscheme == kPerTensorAffine || qscheme == kPerTensorSymmetric, - "Only per-tensor padding is supported."); - output = at::_empty_affine_quantized( - new_shape, self.options().memory_format(memory_format), - self.q_scale(), self.q_zero_point(), c10::nullopt); - } else { - output = at::empty(new_shape, self.options().memory_format(memory_format)); - } - output.fill_(value); - - auto c_output = output; - for (const auto i : c10::irange(l_diff, l_inp)) { - auto pad_idx = 2 * (l_inp - i - 1); - if (pad[pad_idx] > 0) { - c_output = c_output.narrow(i, pad[pad_idx], c_output.size(i) - pad[pad_idx]); - } - if (pad[pad_idx + 1] > 0) { - c_output = c_output.narrow(i, 0, c_output.size(i) - pad[pad_idx + 1]); - } - } - c_output.copy_(c_input); - return output; -} - -}} // namespace at::native diff --git a/aten/src/ATen/native/ConvUtils.h b/aten/src/ATen/native/ConvUtils.h index f54103372e3a01..54a4b5d14a5ab5 100644 --- a/aten/src/ATen/native/ConvUtils.h +++ b/aten/src/ATen/native/ConvUtils.h @@ -104,7 +104,7 @@ struct ConvParams { bool use_mkldnn(const at::Tensor& input, const at::Tensor& weight) const; bool use_nnpack(const at::Tensor& input, const at::Tensor& weight) const; bool use_xnnpack(const at::Tensor& input, const at::Tensor& weight, - const c10::optional bias_sizes_opt) const; + const at::OptionalIntArrayRef bias_sizes_opt) const; bool is_depthwise(const at::Tensor& input, const at::Tensor& weight) const; }; @@ -139,7 +139,7 @@ enum class ConvBackend { TORCH_API ConvBackend select_conv_backend( const Tensor& input, const Tensor& weight, - const c10::optional bias_sizes_opt, + const at::OptionalIntArrayRef bias_sizes_opt, const bool need_backward, const ConvParams& params); diff --git a/aten/src/ATen/native/Convolution.cpp b/aten/src/ATen/native/Convolution.cpp index e4e051025239b4..02b179480cc5d5 100644 --- a/aten/src/ATen/native/Convolution.cpp +++ b/aten/src/ATen/native/Convolution.cpp @@ -267,7 +267,7 @@ auto ConvParams::use_nnpack(const at::Tensor& input, const at::Tensor& weight) c auto 
ConvParams::use_xnnpack( const at::Tensor& input, const at::Tensor& weight, - const c10::optional bias_sizes_opt) const -> bool { + const at::OptionalIntArrayRef bias_sizes_opt) const -> bool { #if defined(C10_MOBILE) if (!transposed) { return (input.size(1) == groups) && @@ -652,6 +652,88 @@ static at::Tensor subtensor(at::Tensor& tensor, int dim, int groups, int g) { return tensor.narrow(dim, n * g, n).contiguous(); } +namespace { + +std::pair complex_to_real(const Tensor& inp) { + auto inp_view_as_complex = at::view_as_real(inp); + auto dim_i = inp_view_as_complex.dim() - 1; + auto i_r = inp_view_as_complex.select(dim_i, 0); + auto i_i = inp_view_as_complex.select(dim_i, 1); + return std::make_pair(i_r, i_i); +} + +at::Tensor complex_convolution( + const Tensor& input, + const Tensor& weight, + const Tensor& bias, + IntArrayRef stride, + IntArrayRef padding, + IntArrayRef dilation, + IntArrayRef output_padding, + int64_t groups) { + check_input_same_type_as_parameters(input, weight, bias); + Tensor i_r, i_i, w_r, w_i; + std::tie(i_r, i_i) = complex_to_real(input.resolve_conj()); + std::tie(w_r, w_i) = complex_to_real(weight.resolve_conj()); + + // [NOTE] Complex Convolution + // conv(W, x, b) = conv(Wr, xr, br) - conv(Wi, xi, 0) + i(conv(Wi, xr, bi) + conv(Wr, xi, 0)) + // where W, x and b are all complex inputs. + // With Gauss Trick: + // a = conv(Wr, xr, br), + // b = conv(Wi, xi, 0), + // c = conv(Wr + Wi, xr + xi, bi + br) + // conv(W, x, b) = a - b + i(c - a - b) + Tensor a, b, c; + if (!bias.defined()) { + a = at::convolution(i_r, w_r, bias, stride, padding, dilation, false, output_padding, groups); + b = at::convolution(i_i, w_i, bias, stride, padding, dilation, false, output_padding, groups); + c = at::convolution(i_r + i_i, w_r + w_i, bias, stride, padding, dilation, false, output_padding, groups); + } else { + Tensor b_r, b_i; + std::tie(b_r, b_i) = complex_to_real(bias.resolve_conj()); + a = at::convolution(i_r, w_r, b_r, stride, padding, dilation, false, output_padding, groups); + b = at::convolution(i_i, w_i, Tensor(), stride, padding, dilation, false, output_padding, groups); + c = at::convolution(i_r + i_i, w_r + w_i, b_r + b_i, stride, padding, dilation, false, output_padding, groups); + } + + auto i = c10::Scalar(c10::complex(0, 1)); + return a - b + i * (c - a - b); +} + +at::Tensor complex_convolution_mode( + const at::Tensor& input, + const at::Tensor& weight, + const c10::optional& bias_opt, + at::IntArrayRef stride, + c10::string_view padding, + at::IntArrayRef dilation, + int64_t groups) { + auto bias = bias_opt.value_or(Tensor()); + check_input_same_type_as_parameters(input, weight, bias); + Tensor i_r, i_i, w_r, w_i; + std::tie(i_r, i_i) = complex_to_real(input.resolve_conj()); + std::tie(w_r, w_i) = complex_to_real(weight.resolve_conj()); + + // See [NOTE] Complex Convolution + Tensor a, b, c; + if (!bias.defined()) { + a = at::_convolution_mode(i_r, w_r, bias, stride, padding, dilation, groups); + b = at::_convolution_mode(i_i, w_i, bias, stride, padding, dilation, groups); + c = at::_convolution_mode(i_r + i_i, w_r + w_i, bias, stride, padding, dilation, groups); + } else { + Tensor b_r, b_i; + std::tie(b_r, b_i) = complex_to_real(bias.resolve_conj()); + a = at::_convolution_mode(i_r, w_r, b_r, stride, padding, dilation, groups); + b = at::_convolution_mode(i_i, w_i, Tensor(), stride, padding, dilation, groups); + c = at::_convolution_mode(i_r + i_i, w_r + w_i, b_r + b_i, stride, padding, dilation, groups); + } + + auto i = c10::Scalar(c10::complex(0, 1)); + 
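// Editor's note (not part of the patch): a quick check of the Gauss-trick algebra used
// here and in complex_convolution above. With W = Wr + i*Wi, x = xr + i*xi, bias = br + i*bi:
//   a = conv(Wr, xr) + br
//   b = conv(Wi, xi)
//   c = conv(Wr + Wi, xr + xi) + br + bi = a + b + conv(Wr, xi) + conv(Wi, xr) + bi
// so a - b = conv(Wr, xr) - conv(Wi, xi) + br is the real part, and
// c - a - b = conv(Wr, xi) + conv(Wi, xr) + bi is the imaginary part, i.e.
// a - b + i*(c - a - b) equals conv(W, x) + bias using three real convolutions instead of four.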
return a - b + i * (c - a - b); +} + +} // namespace at::Tensor conv1d( const Tensor& input_, const Tensor& weight, const c10::optional& bias_opt, @@ -663,7 +745,12 @@ at::Tensor conv1d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 1, "conv1d"); - auto output = at::convolution(input, weight, bias, stride, padding, dilation, false, {0}, groups); + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution(input, weight, bias, stride, padding, dilation, {0}, groups); + } else { + output = at::convolution(input, weight, bias, stride, padding, dilation, false, {0}, groups); + } return is_batched ? output : output.squeeze(0); } @@ -677,7 +764,12 @@ at::Tensor conv2d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 2, "conv2d"); - auto output = at::convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0}}, groups); + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution(input, weight, bias, stride, padding, dilation, {{0, 0}}, groups); + } else { + output = at::convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0}}, groups); + } return is_batched ? output : output.squeeze(0); } @@ -691,7 +783,12 @@ at::Tensor conv3d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 3, "conv3d"); - auto output = at::convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0, 0}}, groups); + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution(input, weight, bias, stride, padding, dilation, {{0, 0, 0}}, groups); + } else { + output = at::convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0, 0}}, groups); + } return is_batched ? output : output.squeeze(0); } @@ -787,8 +884,12 @@ at::Tensor conv1d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 1, "conv1d"); - auto output = at::_convolution_mode( - input, weight, bias, stride, std::move(padding), dilation, groups); + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution_mode(input, weight, bias, stride, std::move(padding), dilation, groups); + } else { + output = at::_convolution_mode(input, weight, bias, stride, std::move(padding), dilation, groups); + } return is_batched ? output : output.squeeze(0); } @@ -799,8 +900,12 @@ at::Tensor conv2d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 2, "conv2d"); - auto output = at::_convolution_mode( - input, weight, bias, stride, std::move(padding), dilation, groups); + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution_mode(input, weight, bias, stride, std::move(padding), dilation, groups); + } else { + output = at::_convolution_mode(input, weight, bias, stride, std::move(padding), dilation, groups); + } return is_batched ? 
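
The [NOTE] Complex Convolution comment above reduces one complex convolution to three real ones via Gauss's trick. Because convolution is bilinear in the weight and the input, the identity can be sanity-checked on plain complex scalars; below is a minimal standalone C++ sketch (not ATen code) verifying that (a - b) + i(c - a - b) reproduces the complex product. The bias-defined branch above follows the same algebra with br folded into a and br + bi into c.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <iostream>

// Gauss's trick: a complex product W * x (the scalar analogue of a complex
// convolution, which is bilinear in W and x) needs three real products
// instead of four:
//   a = Wr * xr,  b = Wi * xi,  c = (Wr + Wi) * (xr + xi)
//   W * x = (a - b) + i * (c - a - b)
int main() {
  std::complex<double> W{1.5, -2.0}, x{0.75, 3.25};

  double a = W.real() * x.real();
  double b = W.imag() * x.imag();
  double c = (W.real() + W.imag()) * (x.real() + x.imag());

  std::complex<double> via_gauss{a - b, c - a - b};
  std::complex<double> direct = W * x;

  assert(std::abs(via_gauss - direct) < 1e-12);
  std::cout << "Gauss trick: " << via_gauss << "  direct: " << direct << "\n";
  return 0;
}
```
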
output : output.squeeze(0); } @@ -811,8 +916,12 @@ at::Tensor conv3d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 3, "conv3d"); - auto output = at::_convolution_mode( - input, weight, bias, stride, std::move(padding), dilation, groups); + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution_mode(input, weight, bias, stride, std::move(padding), dilation, groups); + } else { + output = at::_convolution_mode(input, weight, bias, stride, std::move(padding), dilation, groups); + } return is_batched ? output : output.squeeze(0); } @@ -933,7 +1042,7 @@ ConvBackend select_conv_backend( ConvBackend select_conv_backend( const Tensor& input, const Tensor& weight, - const c10::optional bias_sizes_opt, + const at::OptionalIntArrayRef bias_sizes_opt, const bool need_backward, const ConvParams& params) { @@ -1565,7 +1674,7 @@ std::tuple _convolution_backward_nogroup_bac // output_mask: 3-dim boolean array specifying which gradients to compute in input, weight, bias order std::tuple convolution_backward( const Tensor& grad_output_, const Tensor& input_, const Tensor& weight_, - const c10::optional bias_sizes_opt, + const at::OptionalIntArrayRef bias_sizes_opt, IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, bool transposed, IntArrayRef output_padding, int64_t groups, std::array output_mask) { auto grad_output = grad_output_; diff --git a/aten/src/ATen/native/Copy.cpp b/aten/src/ATen/native/Copy.cpp index 5496facf847c7b..c93d517b7b78d2 100644 --- a/aten/src/ATen/native/Copy.cpp +++ b/aten/src/ATen/native/Copy.cpp @@ -52,7 +52,7 @@ void copy_same_type_transpose_(Tensor& self, const Tensor& src) { // The code below is implemented with the assumption that sizes are equal TORCH_INTERNAL_ASSERT_DEBUG_ONLY(self.sizes().equals(src.sizes())); - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kHalf, kBool, kBFloat16, self.scalar_type(), "copy_", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(kHalf, kBool, kBFloat16, kComplexHalf, self.scalar_type(), "copy_", [&] { scalar_t* sp = src.data_ptr(); scalar_t* rp = self.data_ptr(); scalar_t* bp = buf.data_ptr(); diff --git a/aten/src/ATen/native/DilatedConvolutionUtils.h b/aten/src/ATen/native/DilatedConvolutionUtils.h index 2d4815799b10f2..51b30a9bc77aed 100644 --- a/aten/src/ATen/native/DilatedConvolutionUtils.h +++ b/aten/src/ATen/native/DilatedConvolutionUtils.h @@ -4,7 +4,7 @@ #include #include -#include +#include #include #define TORCH_CHECK_DIM_SIZE(T, DIM, DIM_SIZE, SIZE) \ diff --git a/aten/src/ATen/native/EmbeddingBag.cpp b/aten/src/ATen/native/EmbeddingBag.cpp index e6f88f556c8258..32eb95d50fadc1 100644 --- a/aten/src/ATen/native/EmbeddingBag.cpp +++ b/aten/src/ATen/native/EmbeddingBag.cpp @@ -10,6 +10,7 @@ #ifdef USE_FBGEMM #include +#include #else #include #endif @@ -60,14 +61,14 @@ std::pair promoteIndicesAndOffsets( // is only applicable if special conditions are met template bool is_fast_path_index_select(const Tensor& src, Tensor& output, index_t padding_idx) { - return src.scalar_type() == kFloat && src.strides()[1] == 1 && output.strides()[1] == 1 && padding_idx < static_cast(0); + return (src.scalar_type() == kFloat || src.scalar_type() == kHalf) && src.strides()[1] == 1 && output.strides()[1] == 1 && padding_idx < static_cast(0); } // Determines if we can use a fast implementation for index_select_scale_add, // which is only applicable if special conditions are met template bool is_fast_path_index_select_scale(const Tensor& src, const Tensor& 
scale, Tensor& output, index_t padding_idx) { - return src.scalar_type() == kFloat && src.strides()[1] == 1 && output.strides()[1] == 1 && scale.strides()[0] == 1 && padding_idx < static_cast(0); + return (src.scalar_type() == kFloat || src.scalar_type() == kHalf) && src.strides()[1] == 1 && output.strides()[1] == 1 && scale.strides()[0] == 1 && padding_idx < static_cast(0); } template @@ -81,7 +82,7 @@ bool is_fast_path(const Tensor& src, const c10::optional& scale, Tensor& // index_add (using add_indices as the index), without creating an intermediary // tensor to hold the selected embeddings template -typename std::enable_if::value, void>::type +typename std::enable_if::value && !std::is_same::value, void>::type index_select_add(const Tensor &select_indices, const Tensor &add_indices, const Tensor &src, @@ -96,12 +97,12 @@ index_select_add(const Tensor &select_indices, auto* src_data = src.data_ptr(); auto* output_data = output.data_ptr(); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - index_t* bag_size_data; + index_t* bag_size_data = nullptr; if (bag_size.defined()) { bag_size_data = bag_size.data_ptr(); } auto numel = add_indices.numel(); - int64_t ddim = src.sizes()[1]; + int64_t ddim = src.size(1); auto vocab_size = src.size(0); auto src_stride0 = src.strides()[0]; auto src_stride1 = src.strides()[1]; @@ -157,6 +158,155 @@ void fbgemm_spmdm_report_error_( } } // namespace +template +typename std::enable_if::value, void>::type +index_select_add(const Tensor &select_indices, + const Tensor &add_indices, + const Tensor &src, + Tensor &output, + const Tensor& offsets, + bool include_last_offset, + Tensor &bag_size, + index_t padding_idx) { + int64_t ddim = src.size(1); + auto* select_indices_data = select_indices.data_ptr(); + auto* output_data = output.data_ptr(); + + if (is_fast_path_index_select(src, output, padding_idx)) { + auto src_contig = src.contiguous(); + auto* src_data = src_contig.data_ptr(); + int64_t output_size = offsets.numel() - 1; + auto* offsets_data = offsets.data_ptr(); + std::vector offsets_include_last; + + if (include_last_offset) { + output_size = offsets.numel() - 1; + } else { + output_size = offsets.numel(); + offsets_include_last.resize(offsets.numel() + 1); + if (offsets.numel() > 0) { + std::memcpy( + offsets_include_last.data(), + offsets.data_ptr(), + sizeof(index_t) * offsets.numel()); + } + offsets_include_last[offsets.numel()] = select_indices.numel(); + offsets_data = offsets_include_last.data(); + } + +#ifdef USE_FBGEMM + using float16 = uint16_t; + auto kernel_fp16_index_t = + fbgemm::GenerateEmbeddingSpMDM( + /* block_size */ddim, + /* has_weight */false, + /* normalize_by_lengths */false, + /* prefetch */16, + /* is_weight_positional */false, + /* use_offsets */true + ); +#else + // Initialize the intermediate output buffer to be 0. 
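
The widened is_fast_path_index_select / is_fast_path_index_select_scale predicates above now accept kHalf as well as kFloat, and still require unit stride along the embedding dimension plus a negative padding_idx (i.e. padding disabled), presumably because the fast FBGEMM / EmbeddingLookupIdx kernels do not perform padding handling. A standalone sketch of the same decision, with a hypothetical TensorMeta stand-in for the few tensor properties the check reads:

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical, simplified stand-in for the properties the predicate inspects;
// the real code reads them from at::Tensor.
enum class Dtype { Float, Half };

struct TensorMeta {
  Dtype dtype;
  int64_t inner_stride;  // stride of dim 1, the embedding dimension
};

// Mirrors the shape of is_fast_path_index_select: fp32/fp16 data, contiguous
// along the embedding dimension, and padding_idx < 0 (padding disabled).
bool fast_path_index_select(const TensorMeta& src, const TensorMeta& out,
                            int64_t padding_idx) {
  const bool dtype_ok = src.dtype == Dtype::Float || src.dtype == Dtype::Half;
  return dtype_ok && src.inner_stride == 1 && out.inner_stride == 1 &&
         padding_idx < 0;
}

int main() {
  TensorMeta src{Dtype::Half, 1}, out{Dtype::Half, 1};
  std::cout << std::boolalpha
            << fast_path_index_select(src, out, /*padding_idx=*/-1) << "\n"   // true
            << fast_path_index_select(src, out, /*padding_idx=*/0) << "\n";   // false
  return 0;
}
```
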
+ Tensor output_fp32 = at::zeros({output_size, ddim}, output.options().dtype(at::kFloat)); + auto* output_data_fp32 = output_fp32.data_ptr(); +#endif + at::parallel_for( + 0, output_size, 1, [&](index_t start_idx, index_t end_idx) { +#ifdef USE_FBGEMM + bool success = kernel_fp16_index_t( + /* output_size */end_idx - start_idx, + /* index_size */offsets_data[end_idx] - offsets_data[start_idx], + /* data_size */src.size(0), + /* input */reinterpret_cast(src_data), + /* indices */select_indices_data + offsets_data[start_idx], + /* offsets_or_lengths */offsets_data + start_idx, + /* weights */nullptr, + /* output */reinterpret_cast(output_data + start_idx * ddim)); + if (!success) { + fbgemm_spmdm_report_error_( + end_idx - start_idx, + offsets_data[end_idx] - offsets_data[start_idx], + src.size(0), + offsets_data + start_idx, + select_indices_data + offsets_data[start_idx]); + } +#else + caffe2::EmbeddingLookupIdx( + /*block_size=*/ddim, + /*output_size=*/end_idx - start_idx, + /*index_size=*/offsets_data[end_idx] - offsets_data[start_idx], + /*data_size=*/src.size(0), + /*input=*/src_data, + /*indices=*/select_indices_data + offsets_data[start_idx], + /*offsets=*/offsets_data + start_idx, + /*weights=*/nullptr, + /*scale_bias=*/nullptr, + /*normalize_by_lengths=*/false, + /*out=*/output_data_fp32 + start_idx * ddim); + for (const auto i : c10::irange(output_size)) { + // Convert FP32 intermediate buffer result back to FP16 for output dtype + for (const auto d : c10::irange(ddim)) { + (output_data + i * ddim)[d] = static_cast((output_data_fp32 + ddim * i)[d]); + } + } +#endif + }); + + } else { + TORCH_CHECK(select_indices.numel() == add_indices.numel()); + auto* src_data = src.data_ptr(); + auto* add_indices_data = add_indices.data_ptr(); + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + index_t* bag_size_data = nullptr; + if (bag_size.defined()) { + bag_size_data = bag_size.data_ptr(); + } + auto vocab_size = src.size(0); + auto src_stride0 = src.strides()[0]; + auto src_stride1 = src.strides()[1]; + auto output_stride0 = output.strides()[0]; + auto output_stride1 = output.strides()[1]; + auto numel = add_indices.numel(); + + Tensor src_fp32 = at::empty({ddim}, src.options().dtype(at::kFloat)); + auto* src_data_fp32 = src_fp32.data_ptr(); + + // Initialize the intermediate output buffer to be 0. 
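
When include_last_offset is false, the fast path above appends select_indices.numel() as a final boundary so that offsets[i+1] - offsets[i] is always the size of bag i. A small standalone sketch of that normalization (with_last_offset is a hypothetical helper name):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Normalize EmbeddingBag offsets so that offsets[i+1] - offsets[i] is always
// the length of bag i, by appending the total index count as a final boundary.
// This mirrors the offsets_include_last handling in the fast path above.
std::vector<int64_t> with_last_offset(const std::vector<int64_t>& offsets,
                                      int64_t num_indices) {
  std::vector<int64_t> out(offsets);
  out.push_back(num_indices);
  return out;
}

int main() {
  // Three bags over 7 indices: bag sizes 2, 3 and 2.
  std::vector<int64_t> offsets{0, 2, 5};
  auto padded = with_last_offset(offsets, /*num_indices=*/7);  // {0, 2, 5, 7}

  for (size_t b = 0; b + 1 < padded.size(); ++b) {
    std::cout << "bag " << b << " has " << padded[b + 1] - padded[b]
              << " indices\n";
  }
  return 0;
}
```
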
+ Tensor output_fp32 = at::zeros({output.size(0), ddim}, output.options().dtype(at::kFloat)); + auto* output_data_fp32 = output_fp32.data_ptr(); + + for (const auto i : c10::irange(numel)) { + // We can skip indices equal to padding_idx so they are not included in + // the reduction + auto idx = select_indices_data[i]; + TORCH_CHECK( + idx >= 0 && idx < vocab_size, + "embedding_bag: Expected idx >= 0 && idx < num_embeddings but found idx to be ", + idx); + if (idx != padding_idx) { + // Copy src_data + src_stride0 * idx to src_data_fp32 + for (const auto d : c10::irange(ddim)) { + src_data_fp32[d] = static_cast((src_data + src_stride0 * idx)[d * src_stride1]); + } + at::native::cpublas::axpy(ddim, 1, + src_data_fp32, 1, + output_data_fp32 + ddim * add_indices_data[i], 1); + + } else if (bag_size.defined()) { + // Decrement bag_size to reflect that the index is padded + // NOLINTNEXTLINE(clang-analyzer-core.NullDereference) + bag_size_data[add_indices_data[i]]--; + } + } + for (const auto i : c10::irange(output.size(0))) { + // Convert FP32 intermediate buffer result back to FP16 for output dtype + for (const auto d : c10::irange(ddim)) { + (output_data + output_stride0 * i)[d * output_stride1] = static_cast((output_data_fp32 + ddim * i)[d]); + } + } + } +} + template typename std::enable_if::value, void>::type index_select_add(const Tensor &select_indices, @@ -167,7 +317,7 @@ index_select_add(const Tensor &select_indices, bool include_last_offset, Tensor &bag_size, index_t padding_idx) { - int64_t ddim = src.sizes()[1]; + int64_t ddim = src.size(1); auto* select_indices_data = select_indices.data_ptr(); auto* output_data = output.data_ptr(); @@ -210,7 +360,7 @@ index_select_add(const Tensor &select_indices, bool success = kernel_fp32_index_t( /* output_size */end_idx - start_idx, /* index_size */offsets_data[end_idx] - offsets_data[start_idx], - /* data_size */src.sizes()[0], + /* data_size */src.size(0), /* input */src_data, /* indices */select_indices_data + offsets_data[start_idx], /* offsets_or_lengths */offsets_data + start_idx, @@ -220,7 +370,7 @@ index_select_add(const Tensor &select_indices, fbgemm_spmdm_report_error_( end_idx - start_idx, offsets_data[end_idx] - offsets_data[start_idx], - src.sizes()[0], + src.size(0), offsets_data + start_idx, select_indices_data + offsets_data[start_idx]); } @@ -229,7 +379,7 @@ index_select_add(const Tensor &select_indices, /*block_size=*/ddim, /*output_size=*/end_idx - start_idx, /*index_size=*/offsets_data[end_idx] - offsets_data[start_idx], - /*data_size=*/src.sizes()[0], + /*data_size=*/src.size(0), /*input=*/src_data, /*indices=*/select_indices_data + offsets_data[start_idx], /*offsets=*/offsets_data + start_idx, @@ -244,7 +394,7 @@ index_select_add(const Tensor &select_indices, auto* src_data = src.data_ptr(); auto* add_indices_data = add_indices.data_ptr(); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - index_t* bag_size_data; + index_t* bag_size_data = nullptr; if (bag_size.defined()) { bag_size_data = bag_size.data_ptr(); } @@ -284,7 +434,7 @@ index_select_add(const Tensor &select_indices, // mul (scaling by per_sample_weights) // index_add (using add_indices as the index) template -static typename std::enable_if::value, void>::type +static typename std::enable_if::value && !std::is_same::value, void>::type index_select_scale_add(const Tensor &select_indices, const Tensor &add_indices, const Tensor &scale, @@ -300,7 +450,7 @@ index_select_scale_add(const Tensor &select_indices, auto* src_data = src.data_ptr(); auto* output_data = 
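
On the non-FBGEMM slow path above, FP16 rows are widened to FP32, accumulated per bag with axpy, and only the finished bag sums are narrowed back to FP16, so rounding error does not compound across a large bag. A standalone sketch of that accumulation pattern over plain arrays, with float standing in for both the storage and accumulation types:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Sum-mode EmbeddingBag over plain arrays: accumulate each selected embedding
// row into an FP32 buffer for its bag, then write the bag sums out. In the
// real kernel the rows are at::Half and the narrowing happens at the end.
std::vector<float> embedding_bag_sum(
    const std::vector<float>& weight,          // [vocab, dim], row-major
    int64_t dim,
    const std::vector<int64_t>& indices,       // flattened bag contents
    const std::vector<int64_t>& offset2bag) {  // bag id for each index
  int64_t num_bags = 0;
  for (int64_t b : offset2bag) num_bags = std::max(num_bags, b + 1);

  std::vector<float> acc(num_bags * dim, 0.0f);  // FP32 accumulator
  for (size_t i = 0; i < indices.size(); ++i) {
    const float* row = weight.data() + indices[i] * dim;
    float* bag = acc.data() + offset2bag[i] * dim;
    for (int64_t d = 0; d < dim; ++d) bag[d] += row[d];  // axpy with alpha = 1
  }
  return acc;  // the real code would cast each element back to FP16 here
}

int main() {
  const int64_t dim = 2;
  std::vector<float> weight{1, 1, 2, 2, 3, 3};   // 3 embeddings of size 2
  std::vector<int64_t> indices{0, 2, 1};         // bag 0: rows 0 and 2; bag 1: row 1
  std::vector<int64_t> offset2bag{0, 0, 1};

  auto out = embedding_bag_sum(weight, dim, indices, offset2bag);
  std::cout << out[0] << " " << out[1] << " | " << out[2] << " " << out[3] << "\n";
  // prints: 4 4 | 2 2
  return 0;
}
```
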
output.data_ptr(); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - index_t* bag_size_data; + index_t* bag_size_data = nullptr; if (bag_size.defined()) { bag_size_data = bag_size.data_ptr(); } @@ -338,6 +488,158 @@ index_select_scale_add(const Tensor &select_indices, } } +template +typename std::enable_if::value, void>::type +index_select_scale_add(const Tensor &select_indices, + const Tensor &add_indices, + const Tensor &scale, + const Tensor &src, + Tensor &output, + const Tensor& offsets, + bool include_last_offset, + Tensor &bag_size, + index_t padding_idx) { + int64_t ddim = src.size(1); + auto* scale_data = scale.data_ptr(); + auto* select_indices_data = select_indices.data_ptr(); + auto* output_data = output.data_ptr(); + + if (is_fast_path_index_select_scale(src, scale, output, padding_idx)) { + auto src_contig = src.contiguous(); + auto* src_data = src_contig.data_ptr(); + int64_t output_size = offsets.numel() - 1; + auto* offsets_data = offsets.data_ptr(); + std::vector offsets_include_last; + + if (include_last_offset) { + output_size = offsets.numel() - 1; + } else { + output_size = offsets.numel(); + offsets_include_last.resize(offsets.numel() + 1); + std::memcpy( + offsets_include_last.data(), + offsets.data_ptr(), + sizeof(index_t) * offsets.numel()); + offsets_include_last[offsets.numel()] = select_indices.numel(); + offsets_data = offsets_include_last.data(); + } + + Tensor scale_fp32 = at::empty(scale.sizes(), scale.options().dtype(at::kFloat)); + auto* scale_data_fp32 = scale_fp32.data_ptr(); + +#ifdef USE_FBGEMM + using float16 = uint16_t; + fbgemm::Float16ToFloat_simd(reinterpret_cast(scale_data), scale_data_fp32, scale_fp32.numel()); + auto kernel_fp16_index_t = + fbgemm::GenerateEmbeddingSpMDM( + /* block_size */ddim, + /* has_weight */true, + /* normalize_by_lengths */false, + /* prefetch */16, + /* is_weight_positional */false, + /* use_offsets */true + ); +#else + // Initialize the intermediate output buffer to be 0. 
+ Tensor output_fp32 = at::zeros({output_size, ddim}, output.options().dtype(at::kFloat)); + auto* output_data_fp32 = output_fp32.data_ptr(); + for (const auto i : c10::irange(scale.numel())) { + scale_data_fp32[i] = static_cast(scale_data[i]); + } +#endif + at::parallel_for( + 0, output_size, 1, [&](index_t start_idx, index_t end_idx) { +#ifdef USE_FBGEMM + bool success = kernel_fp16_index_t( + /* output_size */end_idx - start_idx, + /* index_size */offsets_data[end_idx] - offsets_data[start_idx], + /* data_size */src.size(0), + /* input */reinterpret_cast(src_data), + /* indices */select_indices_data + offsets_data[start_idx], + /* offsets_or_lengths */offsets_data + start_idx, + /* weights */scale_data_fp32 + offsets_data[start_idx], + /* output */reinterpret_cast(output_data + start_idx * ddim)); + if (!success) { + fbgemm_spmdm_report_error_( + end_idx - start_idx, + offsets_data[end_idx] - offsets_data[start_idx], + src.size(0), + offsets_data + start_idx, + select_indices_data + offsets_data[start_idx]); + } +#else + caffe2::EmbeddingLookupIdx( + /*block_size=*/ddim, + /*output_size=*/end_idx - start_idx, + /*index_size=*/offsets_data[end_idx] - offsets_data[start_idx], + /*data_size=*/src.size(0), + /*input=*/src_data, + /*indices=*/select_indices_data + offsets_data[start_idx], + /*offsets=*/offsets_data + start_idx, + /*weights=*/scale_data_fp32 + offsets_data[start_idx], + /*scale_bias=*/nullptr, + /*normalize_by_lengths=*/false, + /*out=*/output_data_fp32 + start_idx * ddim); + for (const auto i : c10::irange(output_size)) { + // Convert FP32 intermediate buffer result back to FP16 for output dtype + for (const auto d : c10::irange(ddim)) { + (output_data + i * ddim)[d] = static_cast((output_data_fp32 + ddim * i)[d]); + } + } +#endif + }); + } else { + AT_ASSERT(select_indices.numel() == add_indices.numel()); + auto* src_data = src.data_ptr(); + auto* add_indices_data = add_indices.data_ptr(); + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + index_t* bag_size_data = nullptr; + if (bag_size.defined()) { + bag_size_data = bag_size.data_ptr(); + } + auto vocab_size = src.size(0); + auto src_stride0 = src.strides()[0]; + auto src_stride1 = src.strides()[1]; + auto output_stride0 = output.strides()[0]; + auto output_stride1 = output.strides()[1]; + auto scale_stride = scale.strides()[0]; + auto numel = add_indices.numel(); + + // Initialize the intermediate output buffer to be 0. 
+ Tensor output_fp32 = at::zeros({output.size(0), ddim}, output.options().dtype(at::kFloat)); + auto* output_data_fp32 = output_fp32.data_ptr(); + + for (const auto i : c10::irange(numel)) { + // We can skip indices equal to padding_idx so they are not included in + // the reduction + auto idx = select_indices_data[i]; + TORCH_CHECK( + idx >= 0 && idx < vocab_size, + "embedding_bag: Expected idx >= 0 && idx < num_embeddings but found idx to be ", + idx); + if (idx != padding_idx) { + + auto* src_base = src_data + src_stride0 * idx; + auto* output_base_fp32 = output_data_fp32 + ddim * add_indices_data[i]; + auto scale = scale_data[i * scale_stride]; + for (const auto j : c10::irange(ddim)) { + output_base_fp32[j] += static_cast(src_base[j * src_stride1]) * static_cast(scale); + } + } else if (bag_size.defined()) { + // Decrement bag_size to reflect that the index is padded + // NOLINTNEXTLINE(clang-analyzer-core.NullDereference) + bag_size_data[add_indices_data[i]]--; + } + } + for (const auto i : c10::irange(output.size(0))) { + // Convert FP32 intermediate buffer result back to FP16 for output dtype + for (const auto d : c10::irange(ddim)) { + (output_data + output_stride0 * i)[d * output_stride1] = static_cast((output_data_fp32 + ddim * i)[d]); + } + } + } +} + template typename std::enable_if::value, void>::type index_select_scale_add(const Tensor &select_indices, @@ -349,7 +651,7 @@ index_select_scale_add(const Tensor &select_indices, bool include_last_offset, Tensor &bag_size, index_t padding_idx) { - int64_t ddim = src.sizes()[1]; + int64_t ddim = src.size(1); auto* scale_data = scale.data_ptr(); auto* select_indices_data = select_indices.data_ptr(); auto* output_data = output.data_ptr(); @@ -391,7 +693,7 @@ index_select_scale_add(const Tensor &select_indices, bool success = kernel_fp32_index_t( /* output_size */end_idx - start_idx, /* index_size */offsets_data[end_idx] - offsets_data[start_idx], - /* data_size */src.sizes()[0], + /* data_size */src.size(0), /* input */src_data, /* indices */select_indices_data + offsets_data[start_idx], /* offsets_or_lengths */offsets_data + start_idx, @@ -401,7 +703,7 @@ index_select_scale_add(const Tensor &select_indices, fbgemm_spmdm_report_error_( end_idx - start_idx, offsets_data[end_idx] - offsets_data[start_idx], - src.sizes()[0], + src.size(0), offsets_data + start_idx, select_indices_data + offsets_data[start_idx]); } @@ -410,7 +712,7 @@ index_select_scale_add(const Tensor &select_indices, /*block_size=*/ddim, /*output_size=*/end_idx - start_idx, /*index_size=*/offsets_data[end_idx] - offsets_data[start_idx], - /*data_size=*/src.sizes()[0], + /*data_size=*/src.size(0), /*input=*/src_data, /*indices=*/select_indices_data + offsets_data[start_idx], /*offsets=*/offsets_data + start_idx, @@ -425,7 +727,7 @@ index_select_scale_add(const Tensor &select_indices, auto* src_data = src.data_ptr(); auto* add_indices_data = add_indices.data_ptr(); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - index_t* bag_size_data; + index_t* bag_size_data = nullptr; if (bag_size.defined()) { bag_size_data = bag_size.data_ptr(); } @@ -477,17 +779,17 @@ void check_arguments( checkScalarTypes("embedding_bag", offsets_arg, {kLong, kInt}); checkSameType("embedding_bag", indices_arg, offsets_arg); auto weight_arg = TensorArg(weight, "weight", 1); - checkScalarTypes("embedding_bag", weight_arg, {kFloat, kDouble}); + checkScalarTypes("embedding_bag", weight_arg, {kHalf, kFloat, kDouble}); AT_DISPATCH_INDEX_TYPES(offsets.scalar_type(), "_embedding_bag_cpu_impl", [&]() 
{ - if (offsets.sizes()[0] > 0) { + if (offsets.size(0) > 0) { index_t offset_0 = offsets.data_ptr()[0]; - index_t offset_n = offsets.data_ptr()[offsets.sizes()[0]-1]; + index_t offset_n = offsets.data_ptr()[offsets.size(0)-1]; TORCH_CHECK(offset_0 == 0, "offsets[0] has to be 0, i.e., the first sequence " "in the mini-batch has to start from position 0. " "However, got ", offsets[0]); - TORCH_CHECK(offset_n <= indices.sizes()[0], "offsets[-1] can not " - "be greater than input's length ", indices.sizes()[0], " but got offsets[-1] of ", + TORCH_CHECK(offset_n <= indices.size(0), "offsets[-1] can not " + "be greater than input's length ", indices.size(0), " but got offsets[-1] of ", offset_n); } }); @@ -504,7 +806,7 @@ void check_arguments( if (include_last_offset) { TORCH_CHECK( - offsets.sizes()[0] >= 1, + offsets.size(0) >= 1, "include_last_offset: number of offset should be at least 1"); } } @@ -517,16 +819,16 @@ void make_bag_size_out( const bool include_last_offset, const bool requires_grad) { if (requires_grad || mode == MODE_MEAN || mode == MODE_MAX) { - auto num_bags = offsets.sizes()[0] - (include_last_offset ? 1 : 0); + auto num_bags = offsets.size(0) - (include_last_offset ? 1 : 0); at::native::resize_(bag_size_out, {num_bags}, c10::nullopt); // Compute this for MODE_MEAN and MODE_MAX (latter needed for backwards) if (num_bags != 1) { - bag_size_out.slice(0, 0, bag_size_out.sizes()[0] - 1, 1) = + bag_size_out.slice(0, 0, bag_size_out.size(0) - 1, 1) = offsets.slice(0, 1, num_bags, 1) - offsets.slice(0, 0, num_bags - 1, 1); } if (num_bags > 0) { - bag_size_out[-1] = indices.sizes()[0] - offsets[num_bags - 1]; + bag_size_out[-1] = indices.size(0) - offsets[num_bags - 1]; } } else { at::native::resize_(bag_size_out, offsets.sizes(), c10::nullopt); @@ -541,7 +843,7 @@ void make_max_indices_out( const Tensor& bag_size, const int64_t mode, bool include_last_offset) { - int64_t numBags = offsets.sizes()[0]; + int64_t numBags = offsets.size(0); if (mode == MODE_MAX) { if (include_last_offset) { TORCH_CHECK( @@ -569,13 +871,11 @@ void make_offset2bag_out( bool fast_path_sum = is_fast_path(weight, per_sample_weights, output, padding_idx); if (mode == MODE_MEAN || mode == MODE_MAX || !fast_path_sum) { - at::native::resize_(offset2bag, {indices.sizes()[0] + 1}, c10::nullopt); + at::native::resize_(offset2bag, {indices.size(0) + 1}, c10::nullopt); at::native::zero_(offset2bag); - } - if (mode == MODE_MEAN || mode == MODE_MAX || !fast_path_sum) { make_offset2bag(offsets, offset2bag); - at::native::resize_(offset2bag, {indices.sizes()[0]}, c10::nullopt); + at::native::resize_(offset2bag, {indices.size(0)}, c10::nullopt); // only initialize output in slow path at::native::zero_(output); } @@ -711,7 +1011,7 @@ void _embedding_bag_cpu_impl_out(Tensor& output, Tensor& offset2bag, const c10::optional& per_sample_weights, bool include_last_offset, int64_t padding_idx) { if (mode == MODE_MEAN || mode == MODE_SUM) { - AT_DISPATCH_FLOATING_TYPES(weight.scalar_type(), "embedding_bag_no_grad_cpu_out", + AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, weight.scalar_type(), "embedding_bag_no_grad_cpu_out", [&indices, &offset2bag, &per_sample_weights, &weight, &output, &offsets, &include_last_offset, &mode, &bag_size, &padding_idx]() { AT_DISPATCH_INDEX_TYPES(indices.scalar_type(), "embedding_bag_no_grad_cpu_out", [&indices, &offset2bag, &per_sample_weights, &weight, &output, &offsets, &include_last_offset, &mode, &bag_size, &padding_idx]() { @@ -756,7 +1056,7 @@ std::tuple 
_embedding_bag_cpu_impl( check_arguments(weight, indices, offsets, mode, per_sample_weights, include_last_offset); Tensor output = at::empty( - {include_last_offset ? offsets.sizes()[0] - 1 : offsets.sizes()[0], + {include_last_offset ? offsets.size(0) - 1 : offsets.size(0), weight.sizes()[1]}, weight.options()); @@ -894,10 +1194,10 @@ Tensor _embedding_bag_backward(const Tensor &grad, const Tensor &indices_, Tensor offset2bag_; if (indices.numel() != 0 && offset2bag.numel() == 0) { offset2bag_ = at::zeros( - {indices.sizes()[0] + 1}, offsets.options()); // offset2bag = [0 0 0 0 0] + {indices.size(0) + 1}, offsets.options()); // offset2bag = [0 0 0 0 0] make_offset2bag(offsets, offset2bag_); - offset2bag_.resize_({indices.sizes()[0]}); + offset2bag_.resize_({indices.size(0)}); } else { auto offset2bag_arg = TensorArg(offset2bag, "offset2bag", 1); checkScalarTypes("embedding_bag", offset2bag_arg, {kLong, kInt}); @@ -1081,7 +1381,7 @@ Tensor _embedding_bag_dense_backward_cpu(const Tensor &grad_, const Tensor &indi // for more details. auto grad = grad_.contiguous(); auto grad_arg = TensorArg(grad, "grad_", 1); - checkScalarTypes("embedding_bag", grad_arg, {kFloat, kDouble}); + checkScalarTypes("embedding_bag", grad_arg, {kHalf, kFloat, kDouble}); if (mode == MODE_MAX) { return _embedding_bag_dense_backward_cpu_max( @@ -1092,12 +1392,24 @@ Tensor _embedding_bag_dense_backward_cpu(const Tensor &grad_, const Tensor &indi auto index_grad_weight = at::zeros({num_weights, grad.sizes()[1]}, grad.options()); - AT_DISPATCH_FLOATING_TYPES(grad.scalar_type(), "embedding_bag_backward", [&] { - _embedding_bag_dense_backward_cpu_sum_mean( - grad, indices_, offset2bag__, bag_size_, num_weights, - scale_grad_by_freq, mode, per_sample_weights_, index_grad_weight, - padding_idx); - }); + AT_DISPATCH_FLOATING_TYPES_AND2( + at::ScalarType::Half, + at::ScalarType::BFloat16, + grad.scalar_type(), + "embedding_bag_backward", + [&] { + _embedding_bag_dense_backward_cpu_sum_mean( + grad, + indices_, + offset2bag__, + bag_size_, + num_weights, + scale_grad_by_freq, + mode, + per_sample_weights_, + index_grad_weight, + padding_idx); + }); return index_grad_weight; } @@ -1120,7 +1432,7 @@ Tensor _embedding_bag_per_sample_weights_backward_cpu_template( Tensor indices, offsets; std::tie(indices, offsets) = promoteIndicesAndOffsets(indices_, offsets_); AT_ASSERT(indices.dim() == 1); - auto num_samples = indices.sizes()[0]; + auto num_samples = indices.size(0); AT_ASSERT(weight.dim() == 2); AT_ASSERT(weight.sizes()[1] == embedding_features); @@ -1134,11 +1446,11 @@ Tensor _embedding_bag_per_sample_weights_backward_cpu_template( Tensor offset2bag_; if (indices.numel() != 0 && offset2bag.numel() == 0) { offset2bag_ = at::zeros( - {indices.sizes()[0] + 1}, offset2bag.options()); // offset2bag = [0 0 0 0 0] + {indices.size(0) + 1}, offset2bag.options()); // offset2bag = [0 0 0 0 0] make_offset2bag(offsets, offset2bag_); - at::native::resize_(offset2bag_, {indices.sizes()[0]}, c10::nullopt); + at::native::resize_(offset2bag_, {indices.size(0)}, c10::nullopt); } else { auto offset2bag_arg = TensorArg(offset2bag, "offset2bag", 1); checkScalarTypes("embedding_bag", offset2bag_arg, {kLong, kInt}); @@ -1194,12 +1506,16 @@ Tensor _embedding_bag_per_sample_weights_backward_cpu( const Tensor& offset2bag, int64_t mode, int64_t padding_idx) { - return AT_DISPATCH_FLOATING_TYPES( - grad.scalar_type(), "_embedding_bag_per_sample_weights_backward_cpu", [&]() { - return _embedding_bag_per_sample_weights_backward_cpu_template( - grad, weight, 
indices, offsets, offset2bag, mode, padding_idx); - } - ); + return AT_DISPATCH_FLOATING_TYPES_AND2( + at::ScalarType::Half, + at::ScalarType::BFloat16, + grad.scalar_type(), + "_embedding_bag_per_sample_weights_backward_cpu", + [&]() { + return _embedding_bag_per_sample_weights_backward_cpu_template< + scalar_t>( + grad, weight, indices, offsets, offset2bag, mode, padding_idx); + }); } Tensor _embedding_bag_sparse_backward( @@ -1229,6 +1545,5 @@ Tensor _embedding_bag_sparse_backward( return native::embedding_backward(index_grad, indices, num_weights, padding_idx, scale_grad_by_freq, true); } - } } // namespace at::native diff --git a/aten/src/ATen/native/ForeachUtils.h b/aten/src/ATen/native/ForeachUtils.h index 8855fd313a5623..033052f401f6bd 100644 --- a/aten/src/ATen/native/ForeachUtils.h +++ b/aten/src/ATen/native/ForeachUtils.h @@ -126,19 +126,11 @@ bool check_fast_path_restrictions( bool can_use_fast_route(ArrayRef tensorLists, ArrayRef scalarList = {}, bool does_op_promote_integer_inputs_to_float = false) { -#if defined(USE_ROCM) - return false; -#else return check_fast_path_restrictions(tensorLists, scalarList, does_op_promote_integer_inputs_to_float); -#endif } bool can_use_fast_route(TensorList tensors1, TensorList tensors2, bool does_op_promote_integer_inputs_to_float = false) { -#if defined(USE_ROCM) - return false; -#else return can_use_fast_route({tensors1, tensors2}, {}, does_op_promote_integer_inputs_to_float); -#endif } } diff --git a/aten/src/ATen/native/GridSampler.cpp b/aten/src/ATen/native/GridSampler.cpp index 54002dbd8f8fec..8b044061022609 100644 --- a/aten/src/ATen/native/GridSampler.cpp +++ b/aten/src/ATen/native/GridSampler.cpp @@ -1,4 +1,5 @@ #include +#include #include #include #include @@ -23,6 +24,12 @@ namespace { GridSamplerInterpolation interpolation_mode, GridSamplerPadding padding_mode, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_3d( + input, grid, static_cast(interpolation_mode)); + int64_t N = input.size(0); int64_t C = input.size(1); int64_t inp_D = input.size(2); @@ -179,6 +186,12 @@ namespace { GridSamplerInterpolation interpolation_mode, GridSamplerPadding padding_mode, bool align_corners, std::array output_mask) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_3d( + input, grid, static_cast(interpolation_mode)); + auto input_requires_grad = output_mask[0]; Tensor grad_input = ([&]() { if (input_requires_grad) { @@ -411,6 +424,11 @@ Tensor _grid_sampler_2d_cpu_quantized( int64_t interpolation_mode_, int64_t padding_mode_, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); + auto interpolation_mode = static_cast(interpolation_mode_); /* Bilinear interpolation is supported using the fact that we can perform @@ -515,6 +533,11 @@ Tensor _grid_sampler_2d_cpu_fallback(const Tensor& input, const Tensor& grid, int64_t interpolation_mode_, int64_t padding_mode_, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. 
+ check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); + auto interpolation_mode = static_cast(interpolation_mode_); auto padding_mode = static_cast(padding_mode_); using scalar_t = float; @@ -663,6 +686,11 @@ _grid_sampler_2d_cpu_fallback_backward(const Tensor& grad_output, int64_t interpolation_mode_, int64_t padding_mode_, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); + const auto interpolation_mode = static_cast(interpolation_mode_); const auto padding_mode = static_cast(padding_mode_); using scalar_t = float; @@ -856,10 +884,14 @@ _grid_sampler_2d_cpu_fallback_backward(const Tensor& grad_output, return std::make_tuple(grad_input, grad_grid); } -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. Tensor grid_sampler_2d_cpu(const Tensor& input, const Tensor& grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); + if (input.scalar_type() == kQUInt8) { return native::_grid_sampler_2d_cpu_quantized( input, grid, interpolation_mode, padding_mode, align_corners); @@ -896,10 +928,14 @@ Tensor grid_sampler_2d_cpu(const Tensor& input, const Tensor& grid, DEFINE_DISPATCH(grid_sampler_2d_cpu_kernel); -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. Tensor grid_sampler_3d_cpu(const Tensor& input, const Tensor& grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_3d(input, grid, interpolation_mode); + return AT_DISPATCH_FLOATING_TYPES(input.scalar_type(), "grid_sampler3d_cpu", [&] { return grid_sampler_3d_cpu_impl( input, grid, static_cast(interpolation_mode), @@ -907,11 +943,14 @@ Tensor grid_sampler_3d_cpu(const Tensor& input, const Tensor& grid, }); } -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. std::tuple grid_sampler_2d_backward_cpu(const Tensor& grad_output, const Tensor& input, const Tensor& grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners, std::array output_mask) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); // AVX gather instructions use signed 32-bit offsets to gather float values. // Check for possible overflow and fallback to scalar implementation @@ -953,11 +992,14 @@ grid_sampler_2d_backward_cpu(const Tensor& grad_output, const Tensor& input, con DEFINE_DISPATCH(grid_sampler_2d_backward_cpu_kernel); -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. std::tuple grid_sampler_3d_backward_cpu(const Tensor& grad_output, const Tensor& input, const Tensor& grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners, std::array output_mask) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. 
+ check_grid_sampler_common(input, grid); + check_grid_sampler_3d(input, grid, interpolation_mode); return AT_DISPATCH_FLOATING_TYPES(input.scalar_type(), "grid_sampler_3d_backward_cpu", [&] { return grid_sampler_3d_backward_cpu_impl( @@ -968,62 +1010,29 @@ grid_sampler_3d_backward_cpu(const Tensor& grad_output, const Tensor& input, con }); } -Tensor grid_sampler(const Tensor& input, const Tensor& grid, - int64_t interpolation_mode, int64_t padding_mode, - bool align_corners) { - TORCH_CHECK( - input.defined() && grid.defined(), - "grid_sampler(): expected input and grid to not be undefined, but input " - "is ", input, " and grid is ", grid); - auto input_opt = input.options(); - auto grid_opt = grid.options(); - TORCH_CHECK( - input_opt.device() == grid_opt.device(), - "grid_sampler(): expected input and grid to be on same device, but input " - "is on ", input_opt.device(), " and grid is on ", grid_opt.device()); - TORCH_CHECK( - input_opt.layout() == kStrided && grid_opt.layout() == kStrided, - "grid_sampler(): expected input and grid to have torch.strided layout, but " - "input has ", input_opt.layout(), " and grid has ", grid_opt.layout()); - TORCH_CHECK( - (input.dim() == 4 || input.dim() == 5) && input.dim() == grid.dim(), - "grid_sampler(): expected 4D or 5D input and grid with same number of " - "dimensions, but got input with sizes ", input.sizes(), - " and grid with sizes ", grid.sizes()); - TORCH_CHECK( - input.size(0) == grid.size(0), - "grid_sampler(): expected grid and input to have same batch size, but got " - "input with sizes ", input.sizes(), " and grid with sizes ", grid.sizes()); - TORCH_CHECK( - grid.size(-1) == input.dim() - 2, - "grid_sampler(): expected grid to have size ", input.dim() - 2, " in last " - "dimension, but got grid with sizes ", grid.sizes()); - TORCH_CHECK( - !(input.dim() == 5 && static_cast(interpolation_mode) == GridSamplerInterpolation::Bicubic), - "grid_sampler(): bicubic interpolation only supports 4D input" - ); - for (const auto i : c10::irange(2, input.dim())) { - TORCH_CHECK(input.size(i) > 0, - "grid_sampler(): expected input to have non-empty spatial dimensions, " - "but input has sizes ", input.sizes(), " with dimension ", i, " being " - "empty"); - } - // cudnn does not support inputs larger than 1024 - if (at::native::cudnn_is_acceptable(input) && - at::native::cudnn_is_acceptable(grid) && - at::native::canUse32BitIndexMath(input) && - at::native::canUse32BitIndexMath(grid) && - static_cast(interpolation_mode) == GridSamplerInterpolation::Bilinear && - static_cast(padding_mode) == GridSamplerPadding::Zeros && - align_corners && - input.dim() == 4 && - input.size(1) <= 1024) { +// See NOTE [ grid_sampler Native Functions ]. 
+Tensor grid_sampler( + const Tensor& input, + const Tensor& grid, + int64_t interpolation_mode, + int64_t padding_mode, + bool align_corners +) { + if (cond_cudnn_grid_sampler(input, grid) && + static_cast(interpolation_mode) == + GridSamplerInterpolation::Bilinear && + static_cast(padding_mode) == + GridSamplerPadding::Zeros && + align_corners) { return cudnn_grid_sampler(input, grid); } + if (input.dim() == 4) { - return at::grid_sampler_2d(input, grid, interpolation_mode, padding_mode, align_corners); + return at::grid_sampler_2d( + input, grid, interpolation_mode, padding_mode, align_corners); } else { - return at::grid_sampler_3d(input, grid, interpolation_mode, padding_mode, align_corners); + return at::grid_sampler_3d( + input, grid, interpolation_mode, padding_mode, align_corners); } } diff --git a/aten/src/ATen/native/GridSampler.h b/aten/src/ATen/native/GridSampler.h index 412465937aa015..f4a735032430a1 100644 --- a/aten/src/ATen/native/GridSampler.h +++ b/aten/src/ATen/native/GridSampler.h @@ -5,14 +5,9 @@ #include #include -namespace at { namespace native { - -namespace detail { +#include - enum class GridSamplerInterpolation {Bilinear, Nearest, Bicubic}; - enum class GridSamplerPadding {Zeros, Border, Reflection}; - -} // namespace detail +namespace at { namespace native { using detail::GridSamplerInterpolation; using detail::GridSamplerPadding; diff --git a/aten/src/ATen/native/GridSamplerUtils.h b/aten/src/ATen/native/GridSamplerUtils.h new file mode 100644 index 00000000000000..0b6f29de8c4273 --- /dev/null +++ b/aten/src/ATen/native/GridSamplerUtils.h @@ -0,0 +1,109 @@ +#pragma once + +// See NOTE: [Tensor vs. TensorBase] +// https://github.com/pytorch/pytorch/pull/66979 +#include +#include +#include + +namespace at { namespace native { + +namespace detail { + +enum class GridSamplerInterpolation {Bilinear, Nearest, Bicubic}; +enum class GridSamplerPadding {Zeros, Border, Reflection}; + +} // namespace detail + +using detail::GridSamplerInterpolation; +using detail::GridSamplerPadding; + +namespace { + +// See NOTE [ grid_sampler Native Functions ]. 
+void check_grid_sampler_common( + const TensorBase& input, + const TensorBase& grid +) { + auto input_opt = input.options(); + auto grid_opt = grid.options(); + + TORCH_CHECK( + input.defined(), + "grid_sampler(): expected input to not be undefined"); + TORCH_CHECK( + grid.defined(), + "grid_sampler(): expected grid to not be undefined"); + TORCH_CHECK( + input_opt.device() == grid_opt.device(), + "grid_sampler(): expected input and grid to be on same device, but input " + "is on ", input_opt.device(), " and grid is on ", grid_opt.device()); + TORCH_CHECK( + input_opt.layout() == kStrided && grid_opt.layout() == kStrided, + "grid_sampler(): expected input and grid to have torch.strided layout, but " + "input has ", input_opt.layout(), " and grid has ", grid_opt.layout()); + TORCH_CHECK( + input.size(0) == grid.size(0), + "grid_sampler(): expected grid and input to have same batch size, but got " + "input with sizes ", input.sizes(), " and grid with sizes ", grid.sizes()); + TORCH_CHECK( + grid.size(-1) == input.dim() - 2, + "grid_sampler(): expected grid to have size ", input.dim() - 2, " in last " + "dimension, but got grid with sizes ", grid.sizes()); + + for (const auto i : c10::irange(2, input.dim())) { + TORCH_CHECK(input.size(i) > 0, + "grid_sampler(): expected input to have non-empty spatial dimensions, " + "but input has sizes ", input.sizes(), " with dimension ", i, " being " + "empty"); + } +} + +// See NOTE [ grid_sampler Native Functions ]. +void check_grid_sampler_2d( + const TensorBase& input, + const TensorBase& grid +) { + TORCH_CHECK( + input.dim() == 4 && input.dim() == grid.dim(), + "grid_sampler(): expected 4D input and grid with same number of " + "dimensions, but got input with sizes ", input.sizes(), + " and grid with sizes ", grid.sizes()); +} + +// See NOTE [ grid_sampler Native Functions ]. +void check_grid_sampler_3d( + const TensorBase& input, + const TensorBase& grid, + int64_t interpolation_mode +) { + TORCH_CHECK( + input.dim() == 5 && input.dim() == grid.dim(), + "grid_sampler(): expected 5D input and grid with same number of " + "dimensions, but got input with sizes ", input.sizes(), + " and grid with sizes ", grid.sizes()); + TORCH_CHECK( + !(input.dim() == 5 && + static_cast(interpolation_mode) == + GridSamplerInterpolation::Bicubic), + "grid_sampler(): bicubic interpolation only supports 4D input"); +} + +// See NOTE [ grid_sampler Native Functions ]. +// cudnn does not support inputs larger than 1024. 
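
The helpers above split the old monolithic validation into a common check (matching devices, strided layout, equal batch sizes, grid's last dimension equal to the number of spatial dimensions, non-empty spatial extents) plus a rank check specific to the 2-D and 3-D kernels. A shape-only sketch of which input/grid sizes pass, using plain size vectors instead of tensors:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Shape-only version of the grid_sampler checks: input is NCHW (4-D) or
// NCDHW (5-D), grid is N x (spatial dims...) x K where K = input.dim() - 2.
bool grid_shapes_ok(const std::vector<int64_t>& input,
                    const std::vector<int64_t>& grid) {
  if (input.size() != 4 && input.size() != 5) return false;
  if (grid.size() != input.size()) return false;
  if (input[0] != grid[0]) return false;                         // same batch size
  if (grid.back() != static_cast<int64_t>(input.size()) - 2) return false;
  for (size_t i = 2; i < input.size(); ++i)
    if (input[i] <= 0) return false;                             // non-empty spatial dims
  return true;
}

int main() {
  // 2-D case: input [N, C, H, W], grid [N, H_out, W_out, 2]
  assert(grid_shapes_ok({8, 3, 32, 32}, {8, 16, 16, 2}));
  // 3-D case: input [N, C, D, H, W], grid [N, D_out, H_out, W_out, 3]
  assert(grid_shapes_ok({8, 3, 8, 32, 32}, {8, 4, 16, 16, 3}));
  // Mismatched batch size fails the common check.
  assert(!grid_shapes_ok({8, 3, 32, 32}, {4, 16, 16, 2}));
  return 0;
}
```
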
+bool cond_cudnn_grid_sampler( + const TensorBase& input, + const TensorBase& grid +) { + return ( + at::native::cudnn_is_acceptable(input) && + at::native::cudnn_is_acceptable(grid) && + at::native::canUse32BitIndexMath(input) && + at::native::canUse32BitIndexMath(grid) && + input.dim() == 4 && + input.size(1) <= 1024); +} + +} // anonymous namespace + +}} // namespace at::native diff --git a/aten/src/ATen/native/Histogram.cpp b/aten/src/ATen/native/Histogram.cpp index abd1ae32ded110..c3a007f2c2dcba 100644 --- a/aten/src/ATen/native/Histogram.cpp +++ b/aten/src/ATen/native/Histogram.cpp @@ -407,4 +407,28 @@ Tensor histogram_histc_cpu(const Tensor& self, int64_t bin_ct, return histogram_histc_cpu_out(self, bin_ct, min, max, hist); } +std::tuple> histogramdd( + const Tensor &self, TensorList bins, c10::optional> /*range*/, + const c10::optional &weight, bool density) { + auto hist = at::_histogramdd_from_bin_tensors(self, bins, weight, density); + return std::tuple>{ + std::move(hist), bins.vec()}; +} + +std::tuple> histogramdd( + const Tensor &self, IntArrayRef bins, c10::optional> range, + const c10::optional &weight, bool density) { + auto bin_edges = at::_histogramdd_bin_edges(self, bins, range, weight, density); + auto hist = at::_histogramdd_from_bin_cts(self, bins, range, weight, density); + return std::tuple>{ + std::move(hist), std::move(bin_edges)}; +} + +std::tuple> histogramdd( + const Tensor &self, int64_t bins, c10::optional> range, + const c10::optional &weight, bool density) { + DimVector bins_v(self.size(-1), bins); + return at::native::histogramdd(self, bins_v, range, weight, density); +} + }} // namespace at::native diff --git a/aten/src/ATen/native/Itertools.cpp b/aten/src/ATen/native/Itertools.cpp index d1117b8c1d4d56..bd5fa0fa359549 100644 --- a/aten/src/ATen/native/Itertools.cpp +++ b/aten/src/ATen/native/Itertools.cpp @@ -46,7 +46,10 @@ Tensor cartesian_prod(TensorList tensors) { Tensor combinations(const Tensor& self, int64_t r, bool with_replacement) { TORCH_CHECK(self.dim() == 1, "Expect a 1D vector, but got shape ", self.sizes()); - TORCH_CHECK(r > 0, "Expect a positive number, but got ", r); + TORCH_CHECK(r >= 0, "Expect a non-negative number, but got ", r); + if (r == 0) { + return at::empty({0}, self.options()); + } int64_t num_elements = self.numel(); std::vector grids = at::meshgrid(std::vector(r, self)); Tensor mask = _triu_mask(num_elements, r, with_replacement, self.options()); diff --git a/aten/src/ATen/native/Linear.cpp b/aten/src/ATen/native/Linear.cpp index 3a4a8e1fd7f2d3..847a2dab5e838f 100644 --- a/aten/src/ATen/native/Linear.cpp +++ b/aten/src/ATen/native/Linear.cpp @@ -34,6 +34,12 @@ Tensor linear(const Tensor& input, const Tensor& weight, const c10::optionaldefined() && input.is_contiguous()) { + // Also hit the fused path for contiguous 3D input. 
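
The scalar-bins histogramdd overload above simply repeats the bin count once per sample dimension (self.size(-1) entries) and forwards to the per-dimension overload. A tiny sketch of that expansion (expand_bins is a hypothetical name):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// histogramdd(self, /*bins=*/10, ...) over D-dimensional samples behaves like
// histogramdd with a list of D identical bin counts; this mirrors
// DimVector bins_v(self.size(-1), bins) above.
std::vector<int64_t> expand_bins(int64_t bins, int64_t num_dims) {
  return std::vector<int64_t>(num_dims, bins);
}

int main() {
  // Samples of shape [N, 3] -> three per-dimension bin counts.
  for (int64_t b : expand_bins(/*bins=*/10, /*num_dims=*/3)) std::cout << b << " ";
  std::cout << "\n";  // prints: 10 10 10
  return 0;
}
```
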
+ const auto input_sizes = input.sizes(); + const auto result = at::addmm(*bias, input.view({input_sizes[0] * input_sizes[1], input_sizes[2]}), weight.t()); + return result.view({input_sizes[0], input_sizes[1], result.size(1)}); + } auto output = at::matmul(input, weight.t()); if (bias->defined()) { output.add_(*bias); diff --git a/aten/src/ATen/native/LinearAlgebra.cpp b/aten/src/ATen/native/LinearAlgebra.cpp index 926dfc04759e9b..e7a67822068a7f 100644 --- a/aten/src/ATen/native/LinearAlgebra.cpp +++ b/aten/src/ATen/native/LinearAlgebra.cpp @@ -29,15 +29,23 @@ namespace at { namespace meta { -TORCH_META_FUNC(addmm)(const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha) { - TORCH_CHECK(mat1.dim() == 2, "mat1 must be a matrix, got ", mat1.dim(), "-D tensor"); - TORCH_CHECK(mat2.dim() == 2, "mat2 must be a matrix, got ", mat2.dim(), "-D tensor"); - TORCH_CHECK( - mat1.sizes()[1] == mat2.sizes()[0], "mat1 and mat2 shapes cannot be multiplied (", - mat1.sizes()[0], "x", mat1.sizes()[1], " and ", mat2.sizes()[0], "x", mat2.sizes()[1], ")"); - auto names = at::namedinference::propagate_names_for_addmm(mat1, mat2, self); +#define ADDMM_META() \ + TORCH_CHECK(mat1.dim() == 2, "mat1 must be a matrix, got ", mat1.dim(), "-D tensor"); \ + TORCH_CHECK(mat2.dim() == 2, "mat2 must be a matrix, got ", mat2.dim(), "-D tensor"); \ + TORCH_CHECK( \ + mat1.sizes()[1] == mat2.sizes()[0], "mat1 and mat2 shapes cannot be multiplied (", \ + mat1.sizes()[0], "x", mat1.sizes()[1], " and ", mat2.sizes()[0], "x", mat2.sizes()[1], ")"); \ + \ + auto names = at::namedinference::propagate_names_for_addmm(mat1, mat2, self); \ set_output(0, {mat1.sizes()[0], mat2.sizes()[1]}, {}, self.options(), names); + +TORCH_META_FUNC(addmm)(const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha) { + ADDMM_META(); +} + +TORCH_META_FUNC(_addmm_activation)(const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha, bool use_gelu) { + ADDMM_META(); } TORCH_META_FUNC(mm)(const Tensor & self, const Tensor & mat2) { @@ -1126,6 +1134,19 @@ static void addmm_impl_cpu_( return; } + // Some paths in the code below do not handle multiplications of the form [a, 0] x [0, b] + if (m1_sizes[1] == 0) { + if (beta.toComplexDouble() == 0.0) { + result.zero_(); + } else { + if (!self.is_same(result)) { + result.copy_(self); + } + result.mul_(beta); + } + return; + } + if (beta.toComplexDouble() != 0.0 && !self.is_same(result)) { result.copy_(self); } @@ -1290,6 +1311,19 @@ TORCH_IMPL_FUNC(addmm_out_cpu)(const Tensor& self, const Tensor& mat1, const Ten } } +TORCH_IMPL_FUNC(addmm_activation_out_cpu)(const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha, bool use_gelu, const Tensor &result) { + auto b_self = expand_size(self, {mat1.sizes()[0], mat2.sizes()[1]}, "addmm_out"); + { + at::NoNamesGuard guard; + addmm_impl_cpu_(const_cast(result), *b_self, mat1, mat2, beta, alpha); + if (use_gelu) { + at::gelu_(const_cast(result)); + } else { + at::relu_(const_cast(result)); + } + } +} + TORCH_IMPL_FUNC(mm_out_cpu)(const Tensor & self, const Tensor & mat2, const Tensor & result) { { at::NoNamesGuard guard; @@ -2399,7 +2433,7 @@ static std::vector make_dim_list(int64_t ndim) { } // Checks for valid arguments to linalg_norm when type(ord) == str -static void check_str_ord_valid(const c10::string_view str_ord, optional opt_dim, int64_t ndim) { +static void check_str_ord_valid(const 
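
The new branch in linear() above folds a contiguous [B, T, C] input into [B*T, C], runs a single addmm (one GEMM with the bias fused in), and views the result back as [B, T, out_features], instead of matmul followed by a separate add_. A standalone sketch showing why the flattened GEMM computes the same thing for contiguous row-major data:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// linear(): output[b, t, o] = bias[o] + sum_c input[b, t, c] * weight[o, c].
// For contiguous input, viewing [B, T, C] as [B*T, C] reinterprets the same
// memory, so one matrix multiply covers every (b, t) row at once.
std::vector<float> linear_3d(const std::vector<float>& input, int64_t B, int64_t T,
                             int64_t C, const std::vector<float>& weight,  // [O, C]
                             const std::vector<float>& bias, int64_t O) {
  const int64_t rows = B * T;                 // the "view" step
  std::vector<float> out(rows * O);
  for (int64_t r = 0; r < rows; ++r) {        // one GEMM over all B*T rows
    for (int64_t o = 0; o < O; ++o) {
      float acc = bias[o];
      for (int64_t c = 0; c < C; ++c) acc += input[r * C + c] * weight[o * C + c];
      out[r * O + o] = acc;
    }
  }
  return out;  // logically reshaped back to [B, T, O]
}

int main() {
  // B=2, T=2, C=2, O=1: weight = [1, 1], bias = [0.5]
  std::vector<float> input{1, 2, 3, 4, 5, 6, 7, 8};
  auto out = linear_3d(input, 2, 2, 2, {1, 1}, {0.5f}, 1);
  assert(out == (std::vector<float>{3.5f, 7.5f, 11.5f, 15.5f}));
  return 0;
}
```
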
c10::string_view str_ord, OptionalIntArrayRef opt_dim, int64_t ndim) { TORCH_CHECK((str_ord == "nuc") || (str_ord == "fro"), "Invalid norm order: ", str_ord); bool dims_valid = (ndim == 2 && !opt_dim.has_value()) || (opt_dim.has_value() && opt_dim.value().size() == 2); TORCH_CHECK(dims_valid, "order \"", str_ord, @@ -2481,7 +2515,7 @@ static Tensor& _linalg_norm_matrix_out(Tensor& result, const Tensor &self, const return result; } -static Tensor& linalg_norm_out_impl(Tensor& result, const Tensor& self, const optional& opt_num_ord, optional opt_str_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +static Tensor& linalg_norm_out_impl(Tensor& result, const Tensor& self, const optional& opt_num_ord, optional opt_str_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { // Callers must give the ord argument as either a number, a string, or neither. // Since the user-facing API has no direct control over how this function is called, this is an internal assert. TORCH_INTERNAL_ASSERT(!(opt_num_ord.has_value() && opt_str_ord.has_value())); @@ -2525,7 +2559,7 @@ static Tensor& linalg_norm_out_impl(Tensor& result, const Tensor& self, const op return result; } -static Tensor& linalg_vector_norm_impl(const Tensor& self, const Scalar& scalar_ord, optional opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { +static Tensor& linalg_vector_norm_impl(const Tensor& self, const Scalar& scalar_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { // Casting a large integer to a double will introduce some error, but for // practical purposes, it won't matter since a large order will usually // give an infinite result @@ -2601,13 +2635,13 @@ static Tensor& linalg_vector_norm_impl(const Tensor& self, const Scalar& scalar_ return result; } -Tensor linalg_vector_norm(const Tensor& self, const Scalar& ord, optional opt_dim, bool keepdim, optional opt_dtype) { +Tensor linalg_vector_norm(const Tensor& self, const Scalar& ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { ScalarType out_dtype = opt_dtype.value_or(toRealValueType(self.scalar_type())); Tensor result = create_reduction_result(self, opt_dim.value_or(IntArrayRef{}), keepdim, out_dtype); return at::native::linalg_vector_norm_impl(self, ord, opt_dim, keepdim, opt_dtype, result); } -Tensor& linalg_vector_norm_out(const Tensor& self, const Scalar& ord, optional opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { +Tensor& linalg_vector_norm_out(const Tensor& self, const Scalar& ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { return at::native::linalg_vector_norm_impl(self, ord, opt_dim, keepdim, opt_dtype, result); } @@ -2676,7 +2710,7 @@ Tensor& linalg_matrix_norm_out( } // Numerical or None norms -Tensor linalg_norm(const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +Tensor linalg_norm(const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { auto options = TensorOptions().dtype(opt_dtype.has_value() ? 
opt_dtype.value() : toRealValueType(self.scalar_type())).device(self.device()); Tensor result = at::empty({0}, options); return at::native::linalg_norm_out( @@ -2684,7 +2718,7 @@ Tensor linalg_norm(const Tensor& self, const optional& opt_ord, optional } // Frobenius and nuclear norms -Tensor linalg_norm(const Tensor& self, c10::string_view ord, optional opt_dim, bool keepdim, optional opt_dtype) { +Tensor linalg_norm(const Tensor& self, c10::string_view ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { auto options = TensorOptions().dtype(opt_dtype.has_value() ? opt_dtype.value() : toRealValueType(self.scalar_type())).device(self.device()); Tensor result = at::empty({0}, options); return at::native::linalg_norm_out( @@ -2692,12 +2726,12 @@ Tensor linalg_norm(const Tensor& self, c10::string_view ord, optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { +Tensor& linalg_norm_out(const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { return linalg_norm_out_impl(result, self, opt_ord, c10::nullopt, opt_dim, keepdim, opt_dtype); } // Frobenius and nuclear norms -Tensor& linalg_norm_out(const Tensor& self, c10::string_view ord, optional opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { +Tensor& linalg_norm_out(const Tensor& self, c10::string_view ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype, Tensor& result) { return linalg_norm_out_impl(result, self, c10::nullopt, ord, opt_dim, keepdim, opt_dtype); } @@ -2876,7 +2910,7 @@ Tensor& linalg_tensorinv_out(const Tensor& self, int64_t ind, Tensor& result) { return result; } -Tensor linalg_tensorsolve(const Tensor& self, const Tensor& other, optional dims) { +Tensor linalg_tensorsolve(const Tensor& self, const Tensor& other, OptionalIntArrayRef dims) { /* The idea is to reduce the problem to 2D matrix solve. Step 1. (optional) `self` is permuted with `dims` such that dimensions from `dims` are moved to the right. @@ -2914,7 +2948,7 @@ Tensor linalg_tensorsolve(const Tensor& self, const Tensor& other, optional dims, Tensor& result) { +Tensor& linalg_tensorsolve_out(const Tensor& self, const Tensor& other, OptionalIntArrayRef dims, Tensor& result) { checkSameDevice("tensorsolve", result, self); checkLinalgCompatibleDtype("tensorsolve", result, self); diff --git a/aten/src/ATen/native/LinearAlgebraUtils.h b/aten/src/ATen/native/LinearAlgebraUtils.h index 2448c8db730cff..2c3dfbbf4f6ba2 100644 --- a/aten/src/ATen/native/LinearAlgebraUtils.h +++ b/aten/src/ATen/native/LinearAlgebraUtils.h @@ -114,7 +114,7 @@ static inline c10::MaybeOwned borrow_else_clone(const bool cond, const T * broadcasted shape. */ static inline Tensor copyBatchedColumnMajor(const Tensor& src, int64_t nrows = -1, - c10::optional desired_batch_sizes = c10::nullopt) { + at::OptionalIntArrayRef desired_batch_sizes = c10::nullopt) { nrows = (nrows == -1) ? src.size(-2) : nrows; auto copy_sizes = desired_batch_sizes.has_value() ? desired_batch_sizes.value().vec() @@ -606,6 +606,41 @@ static inline bool linalg_solve_is_vector_rhs(const Tensor& input, const Tensor& return vector_case; } +/* + Computes linear indices for a tensor with original_shape to access its elements like it was a materialized broadcast tensor. 
+*/ +static inline Tensor get_linear_indices(int64_t numel, IntArrayRef original_shape, IntArrayRef broadcast_shape) { + TensorOptions options = at::TensorOptions().dtype(at::kLong).device(at::kCPU); + return at::arange(numel, options).view(original_shape).broadcast_to(broadcast_shape).contiguous(); +} + +class BroadcastLinearIndices { + private: + Tensor linear_indices_; + bool is_broadcasting_; + + public: + BroadcastLinearIndices( + int64_t numel, + IntArrayRef original_shape, + IntArrayRef broadcast_shape) { + // The assumption is that the broadcast_shape is a materialized broadcast + // shape of the original_shape. We need to compute the linear indices + // compatible with the original_shape to access the elements in the original + // tensor corresponding to the broadcast tensor. + is_broadcasting_ = !original_shape.equals(broadcast_shape); + if (is_broadcasting_) { + linear_indices_ = + get_linear_indices(numel, original_shape, broadcast_shape); + } + } + int64_t operator()(int64_t broadcast_linear_index) { + return is_broadcasting_ + ? linear_indices_.data_ptr()[broadcast_linear_index] + : broadcast_linear_index; + } +}; + static inline bool is_blas_compatible_column_major_order(const Tensor& input) { IntArrayRef input_strides = input.strides(); IntArrayRef input_sizes = input.sizes(); diff --git a/aten/src/ATen/native/LossNLL.cpp b/aten/src/ATen/native/LossNLL.cpp index ed733411ff5376..6a04992e53e6b2 100644 --- a/aten/src/ATen/native/LossNLL.cpp +++ b/aten/src/ATen/native/LossNLL.cpp @@ -491,7 +491,11 @@ Tensor cross_entropy_loss_prob_target( switch (reduction) { case Reduction::Mean: - return -(input * target * weight_).sum() / (input.numel() / input.size(1)); + if (input.numel()==0){ + return -(input * target * weight_).sum().fill_(std::numeric_limits::quiet_NaN()); + } else { + return -(input * target * weight_).sum() / (input.numel() / input.size(1)); + } case Reduction::Sum: return -(input * target * weight_).sum(); case Reduction::None: @@ -502,7 +506,11 @@ Tensor cross_entropy_loss_prob_target( } else { switch (reduction) { case Reduction::Mean: - return -(input * target).sum() / (input.numel() / input.size(1)); + if (input.numel()==0){ + return -(input * target).sum().fill_(std::numeric_limits::quiet_NaN()); + } else { + return -(input * target).sum() / (input.numel()/ input.size(1)); + } case Reduction::Sum: return -(input * target).sum(); case Reduction::None: diff --git a/aten/src/ATen/native/Math.h b/aten/src/ATen/native/Math.h index 09255e065879fb..ee10d00f9b5cd3 100644 --- a/aten/src/ATen/native/Math.h +++ b/aten/src/ATen/native/Math.h @@ -12,6 +12,7 @@ #include #include #include +#include C10_CLANG_DIAGNOSTIC_PUSH() #if C10_CLANG_HAS_WARNING("-Wimplicit-float-conversion") @@ -67,6 +68,83 @@ Output was modified to be inf or -inf when input is 1 or -1. */ POSSIBILITY OF SUCH DAMAGE. */ +namespace { +/* + * This function is derived from the implementation of the i0e function in the + * Cephes Math Library. See note [3-Clause BSD License for the Cephes Math + * Library]. + * + * Computes an approximation of the exponentially scaled zeroth order modified + * Bessel function of the first kind. The approximation is actually two + * (sub)approximations, both using a Chebyshev polynomial expansion. One + * approximates the function over [0, 8], and the other over (8, infinity). This + * function takes the absolute value of all inputs to convert them into the + * domain of the approximation. 
+ */ +jiterator_also_stringify_as(jiterator_code( + template + JITERATOR_HOST_DEVICE T chbevl(T x, const T array[], const int len) { + T b0, b1, b2; + + b0 = array[0]; + b1 = 0; + + for (int i = 1; i < len; ++i) { + b2 = b1; + b1 = b0; + b0 = x * b1 - b2 + array[i]; + } + + return T{0.5} * (b0 - b2); + } + + template + JITERATOR_HOST_DEVICE T calc_i0e(T _x) { + T x = fabs(_x); + + if (x <= T{8.0}) { + static const T coefficients[] = { + -4.41534164647933937950E-18, 3.33079451882223809783E-17, + -2.43127984654795469359E-16, 1.71539128555513303061E-15, + -1.16853328779934516808E-14, 7.67618549860493561688E-14, + -4.85644678311192946090E-13, 2.95505266312963983461E-12, + -1.72682629144155570723E-11, 9.67580903537323691224E-11, + -5.18979560163526290666E-10, 2.65982372468238665035E-9, + -1.30002500998624804212E-8, 6.04699502254191894932E-8, + -2.67079385394061173391E-7, 1.11738753912010371815E-6, + -4.41673835845875056359E-6, 1.64484480707288970893E-5, + -5.75419501008210370398E-5, 1.88502885095841655729E-4, + -5.76375574538582365885E-4, 1.63947561694133579842E-3, + -4.32430999505057594430E-3, 1.05464603945949983183E-2, + -2.37374148058994688156E-2, 4.93052842396707084878E-2, + -9.49010970480476444210E-2, 1.71620901522208775349E-1, + -3.04682672343198398683E-1, 6.76795274409476084995E-1}; + + T y = (x / T{2.0}) - T{2.0}; + return chbevl(y, coefficients, int{30}); + } + + // x > 8 + static const T coefficients[] = { + -7.23318048787475395456E-18, -4.83050448594418207126E-18, + 4.46562142029675999901E-17, 3.46122286769746109310E-17, + -2.82762398051658348494E-16, -3.42548561967721913462E-16, + 1.77256013305652638360E-15, 3.81168066935262242075E-15, + -9.55484669882830764870E-15, -4.15056934728722208663E-14, + 1.54008621752140982691E-14, 3.85277838274214270114E-13, + 7.18012445138366623367E-13, -1.79417853150680611778E-12, + -1.32158118404477131188E-11, -3.14991652796324136454E-11, + 1.18891471078464383424E-11, 4.94060238822496958910E-10, + 3.39623202570838634515E-9, 2.26666899049817806459E-8, + 2.04891858946906374183E-7, 2.89137052083475648297E-6, + 6.88975834691682398426E-5, 3.36911647825569408990E-3, + 8.04490411014108831608E-1}; + + return chbevl(T{32.0} / x - T{2.0}, coefficients, int{25}) / sqrt(x); + }), + i0e_string); // i0e_string +} + #define CENTRAL_RANGE 0.7 template @@ -1385,37 +1463,6 @@ calc_i0(T _x) { // Upcast bfloat16 input to float for numerical accuracy purposes static inline c10::BFloat16 calc_i0(c10::BFloat16 a) { return calc_i0(static_cast(a)); } -/* - * This function is derived from the implementation of the i0e function in the Cephes Math Library. - * See note [3-Clause BSD License for the Cephes Math Library]. - * - * Computes an approximation of the exponentially scaled zeroth order modified Bessel function of the first kind. - * The approximation is actually two (sub)approximations, both using a Chebyshev polynomial expansion. - * One approximates the function over [0, 8], and the other over (8, infinity). This function takes the absolute value - * of all inputs to convert them into the domain of the approximation. 
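For reference, a plain-C++ sketch of the Clenshaw-style recurrence that the `chbevl` helper above (inside the jiterator string) implements. The coefficient table here is made up for illustration; it is not the Cephes i0e table.

```cpp
// Sketch of the Chebyshev series evaluation used by calc_i0e above.
// The recurrence is b0 = x*b1 - b2 + c[i]; the result is 0.5*(b0 - b2).
#include <cstdio>

template <typename T>
T chbevl_sketch(T x, const T coeffs[], int len) {
  T b0 = coeffs[0];
  T b1 = T{0};
  T b2 = T{0};
  for (int i = 1; i < len; ++i) {
    b2 = b1;
    b1 = b0;
    b0 = x * b1 - b2 + coeffs[i];
  }
  return T{0.5} * (b0 - b2);
}

int main() {
  // Hypothetical 3-term coefficient array evaluated at x = 0.25.
  const double coeffs[] = {1.0, 0.5, 0.25};
  std::printf("%f\n", chbevl_sketch(0.25, coeffs, 3));
}
```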
- */ -template -static inline typename std::enable_if::value, T>::type -calc_i0e(T _x) { - T x = std::abs(_x); - - if (x <= T{8.0}) { - auto coeff_pair = chebyshev_coefficients_i0e_A(); - auto A = std::get<0>(coeff_pair); - auto len = std::get<1>(coeff_pair); - T y = (x / T{2.0}) - T{2.0}; - return chbevl(y, A, len); - } - - auto coeff_pair = chebyshev_coefficients_i0e_B(); - auto B = std::get<0>(coeff_pair); - auto len = std::get<1>(coeff_pair); - return chbevl(T{32.0} / x - T{2.0}, B, len) / std::sqrt(x); -} - -// Upcast bfloat16 input to float for numerical accuracy purposes -static inline c10::BFloat16 calc_i0e(c10::BFloat16 a) { return calc_i0e(static_cast(a)); } - /* * This function is derived from the implementation of the i1 function in the Cephes Math Library. * See note [3-Clause BSD License for the Cephes Math Library]. @@ -2113,4 +2160,21 @@ calc_erfcx(T x) } } +/* + * Logarithm of Gaussian cumulative distribution function. + + * This implementation of log_ndtr and its helper functions + * follow SciPy's implementation + * See NOTICE for the licenses. + */ +template +static inline C10_HOST_DEVICE T calc_log_ndtr(T x) { + T t = x * M_SQRT1_2; + if (x < T{-1.0}) { + return std::log(calc_erfcx(-t) / 2) - t * t; + } else { + return std::log1p(-std::erfc(t) / 2); + } +} + C10_CLANG_DIAGNOSTIC_POP() diff --git a/aten/src/ATen/native/Normalization.cpp b/aten/src/ATen/native/Normalization.cpp index 981e568b6b9756..1b6ab5d981f31c 100644 --- a/aten/src/ATen/native/Normalization.cpp +++ b/aten/src/ATen/native/Normalization.cpp @@ -26,7 +26,7 @@ TORCH_META_FUNC(renorm)(const Tensor& self, const Scalar& p, int64_t dim, const TORCH_CHECK(maxnorm.toDouble() >= 0.0, "renorm: expected maxnorm to be >= 0 but got ", maxnorm.toDouble()); const auto ndim = self.dim(); - TORCH_CHECK(ndim > 1, "renorm: input needs at least 2 dimensions, got ", ndim, "dimensions"); + TORCH_CHECK(ndim > 1, "renorm: input needs at least 2 dimensions, got ", ndim, " dimensions"); set_output(self.sizes(), self.options()); } diff --git a/aten/src/ATen/native/PadNd.cpp b/aten/src/ATen/native/PadNd.cpp new file mode 100644 index 00000000000000..bdeb351a80dd04 --- /dev/null +++ b/aten/src/ATen/native/PadNd.cpp @@ -0,0 +1,214 @@ +#include +#include + +#include + +namespace at { namespace native { + +Tensor constant_pad_nd(const Tensor& self, IntArrayRef pad, const Scalar& value) { + TORCH_CHECK(pad.size() % 2 == 0, "Length of pad must be even but instead it equals ", + pad.size()); + + auto input_sizes = self.sizes(); + auto l_inp = self.dim(); + + auto l_pad = pad.size() / 2; + auto l_diff = l_inp - l_pad; + TORCH_CHECK(l_inp >= (int64_t)l_pad, "Length of pad should be no more than twice the number of " + "dimensions of the input. 
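Stepping back to the `calc_log_ndtr` helper added to Math.h above: a minimal sketch of the identity it evaluates, i.e. log of the standard normal CDF. This illustration uses only `<cmath>` and plain `erfc`, so it shows just the well-conditioned branch; the production code switches to `erfcx` for `x < -1` to avoid cancellation.

```cpp
// Sketch: log(Phi(x)) via log1p(-erfc(x/sqrt(2))/2). Accurate for moderate x only;
// the erfcx-based branch in Math.h handles very negative x.
#include <cmath>
#include <cstdio>

double log_ndtr_sketch(double x) {
  const double t = x * 0.7071067811865476;   // x / sqrt(2), i.e. x * M_SQRT1_2
  return std::log1p(-0.5 * std::erfc(t));    // log(1 - erfc(x/sqrt(2))/2)
}

int main() {
  for (double x : {-3.0, -1.0, 0.0, 1.0, 3.0}) {
    std::printf("log_ndtr(% .1f) = % .6f\n", x, log_ndtr_sketch(x));
  }
}
```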
Pad length is ", pad.size(), "while the input has ", + l_inp, "dimensions."); + + std::vector new_shape; + + bool all_pads_non_positive = true; + + auto c_input = self; + for (const auto i : c10::irange(l_diff, l_inp)) { + auto pad_idx = 2 * (l_inp - i - 1); + if (pad[pad_idx] < 0) { + c_input = c_input.narrow(i, -pad[pad_idx], c_input.size(i) + pad[pad_idx]); + } else if (pad[pad_idx] != 0) { + all_pads_non_positive = false; + } + if (pad[pad_idx + 1] < 0) { + c_input = c_input.narrow(i, 0, c_input.size(i) + pad[pad_idx + 1]); + } else if (pad[pad_idx + 1] != 0) { + all_pads_non_positive = false; + } + } + + // if none of the pads are positive we can optimize and just return the result + // of calling .narrow() on the input + if (all_pads_non_positive) { + return c_input.clone(); + } + + + for (size_t i = 0; i < (size_t)l_diff; i ++) { + new_shape.emplace_back(input_sizes[i]); + } + + for (const auto i : c10::irange((size_t)l_pad)) { + auto pad_idx = pad.size() - ((i + 1) * 2); + auto new_dim = input_sizes[l_diff + i] + pad[pad_idx] + pad[pad_idx + 1]; + TORCH_CHECK(new_dim > 0, "The input size ", input_sizes[l_diff + i], ", plus negative padding ", + pad[pad_idx], " and ", pad[pad_idx + 1], " resulted in a negative output size, " + "which is invalid. Check dimension ", l_diff + i, " of your input."); + new_shape.emplace_back(new_dim); + } + + at::Tensor output; + const auto memory_format = self.suggest_memory_format(); + if (self.is_quantized()) { + const auto qscheme = self.qscheme(); + TORCH_CHECK(qscheme == kPerTensorAffine || qscheme == kPerTensorSymmetric, + "Only per-tensor padding is supported."); + output = at::_empty_affine_quantized( + new_shape, self.options().memory_format(memory_format), + self.q_scale(), self.q_zero_point(), c10::nullopt); + } else { + output = at::empty(new_shape, self.options().memory_format(memory_format)); + } + output.fill_(value); + + auto c_output = output; + for (const auto i : c10::irange(l_diff, l_inp)) { + auto pad_idx = 2 * (l_inp - i - 1); + if (pad[pad_idx] > 0) { + c_output = c_output.narrow(i, pad[pad_idx], c_output.size(i) - pad[pad_idx]); + } + if (pad[pad_idx + 1] > 0) { + c_output = c_output.narrow(i, 0, c_output.size(i) - pad[pad_idx + 1]); + } + } + c_output.copy_(c_input); + return output; +} + +Tensor _pad_circular(const Tensor &self, IntArrayRef padding) { + const auto in_shape = self.sizes(); + const auto ndim = static_cast(in_shape.size()) - 2; + TORCH_CHECK(padding.size() + 4 == in_shape.size() * 2, + "Invalid padding size, expected ", ndim * 2, " but got ", padding.size()); + + DimVector out_shape(in_shape.size()); + out_shape[0] = in_shape[0]; + out_shape[1] = in_shape[1]; + + // Get shape of padded tensor + for (const auto i : c10::irange(ndim)) { + const auto pad_l = padding[2 * (ndim - i - 1) + 0]; + const auto pad_r = padding[2 * (ndim - i - 1) + 1]; + const auto size = in_shape[2 + i]; + out_shape[2 + i] = size + pad_l + pad_r; + + TORCH_CHECK( + pad_l <= size && pad_r <= size, + "Padding value causes wrapping around more than once."); + TORCH_CHECK( + out_shape[2 + i] >= 0, + "Negative padding value is resulting in an empty dimension"); + } + + auto out = self.new_empty(out_shape, self.options()); + + // Put original array into the padded array + Tensor out_slice = out; + Tensor in_slice = self; + constexpr int64_t zero = 0; + for (const auto i : c10::irange(ndim)) { + const auto dim = ndim - i + 1; + const auto pad_l = padding[2*i + 0]; + const auto pad_r = padding[2*i + 1]; + out_slice = out_slice.slice(dim, 
std::max(pad_l, zero), out_shape[dim] - std::max(pad_r, zero)); + in_slice = in_slice.slice(dim, std::max(-pad_l, zero), in_shape[dim] - std::max(-pad_r, zero)); + } + out_slice.copy_(in_slice); + + // The following steps first pad the beginning of the tensor (left side), + // and then pad the end of the tensor (right side). + // Note: Corners will be written more than once when ndim > 1. + // + // Only in cases where padding values are > 0 are when additional copying + // is required. + for (const auto i : c10::irange(ndim)) { + const auto dim = ndim - i + 1; + const auto pad_l = padding[2*i + 0]; + const auto pad_r = padding[2*i + 1]; + + if (pad_l > 0) { + out_slice = out.slice(dim, 0, pad_l); + in_slice = out.slice(dim, + out_shape[dim] - pad_l - std::max(pad_r, zero), + out_shape[dim] - std::max(pad_r, zero)); + out_slice.copy_(in_slice); + } + + if (pad_r > 0) { + out_slice = out.slice(dim, out_shape[dim] - pad_r, out_shape[dim]); + in_slice = out.slice(dim, std::max(pad_l, zero), std::max(pad_l, zero) + pad_r); + out_slice.copy_(in_slice); + } + } + + return out; +} + +Tensor _pad_enum(const Tensor &self, IntArrayRef pad, int64_t mode_int, c10::optional value) { + const auto input_dim = self.dim(); + TORCH_CHECK(pad.size() % 2 == 0, "Padding length must be divisible by 2"); + TORCH_CHECK(static_cast(pad.size()) <= input_dim * 2, "Padding length too large"); + auto mode = static_cast(mode_int); + + if (mode == at::padding_mode::constant) { + return at::constant_pad_nd(self, pad, value.value_or(0.0)); + } + TORCH_CHECK( + !value.has_value(), "Padding mode \"", + padding_mode_string(mode), + "\" doesn't take in value argument"); + + if (pad.size() == 2 && (input_dim == 2 || input_dim == 3)) { + switch (mode) { + case at::padding_mode::reflect: return at::reflection_pad1d(self, pad); + case at::padding_mode::replicate: return at::replication_pad1d(self, pad); + case at::padding_mode::circular: return at::_pad_circular(self, pad); + default: {} + } + } else if(pad.size() == 4 && (input_dim == 3 || input_dim == 4)) { + switch (mode) { + case at::padding_mode::reflect: return at::reflection_pad2d(self, pad); + case at::padding_mode::replicate: return at::replication_pad2d(self, pad); + case at::padding_mode::circular: return at::_pad_circular(self, pad); + default: {} + } + } else if (pad.size() == 6 && (input_dim == 4 || input_dim == 5)) { + switch (mode) { + case at::padding_mode::reflect: return at::reflection_pad3d(self, pad); + case at::padding_mode::replicate: return at::replication_pad3d(self, pad); + case at::padding_mode::circular: return at::_pad_circular(self, pad); + default: {} + } + } + C10_THROW_ERROR(NotImplementedError, + "Only 2D, 3D, 4D, 5D padding with non-constant padding are supported for now"); +} + +Tensor pad(const Tensor &self, IntArrayRef pad, c10::string_view mode, c10::optional value) { + const auto mode_enum = [&] { + if (mode == "reflect") { + return at::padding_mode::reflect; + } else if (mode == "constant") { + return at::padding_mode::constant; + } else if (mode == "replicate") { + return at::padding_mode::replicate; + } else if (mode == "circular") { + return at::padding_mode::circular; + } + C10_THROW_ERROR(NotImplementedError, + c10::str("Unrecognised padding mode ", mode)); + }(); + return at::native::_pad_enum(self, pad, static_cast(mode_enum), value); +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/PadNd.h b/aten/src/ATen/native/PadNd.h new file mode 100644 index 00000000000000..37f59acb8a4ce0 --- /dev/null +++ 
b/aten/src/ATen/native/PadNd.h @@ -0,0 +1,22 @@ +#pragma once + +namespace at { + +enum class padding_mode { + reflect, + replicate, + circular, + constant, +}; + +static inline c10::string_view padding_mode_string(padding_mode m) { + switch (m) { + case padding_mode::reflect: return "reflect"; + case padding_mode::replicate: return "replicate"; + case padding_mode::circular: return "circular"; + case padding_mode::constant: return "constant"; + } + TORCH_CHECK(false, "Invalid padding mode (", static_cast(m), ")"); +} + +} // namespace at diff --git a/aten/src/ATen/native/QuantizedLinear.cpp b/aten/src/ATen/native/QuantizedLinear.cpp index 88513f34b9fb47..fcd8f6335b581d 100644 --- a/aten/src/ATen/native/QuantizedLinear.cpp +++ b/aten/src/ATen/native/QuantizedLinear.cpp @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include diff --git a/aten/src/ATen/native/RNN.cpp b/aten/src/ATen/native/RNN.cpp index af387e3c43f978..f8db0ba311ad89 100644 --- a/aten/src/ATen/native/RNN.cpp +++ b/aten/src/ATen/native/RNN.cpp @@ -3,7 +3,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/aten/src/ATen/native/ReduceOps.cpp b/aten/src/ATen/native/ReduceOps.cpp index cce0f1a3d3b89d..e5d40fcad40125 100644 --- a/aten/src/ATen/native/ReduceOps.cpp +++ b/aten/src/ATen/native/ReduceOps.cpp @@ -267,17 +267,31 @@ TORCH_META_FUNC(aminmax) } TORCH_META_FUNC(amax) -(const Tensor& self, IntArrayRef dims, bool keepdim) { +(const Tensor& self, IntArrayRef dim, bool keepdim) { auto maybe_result = maybe_get_output(); if (maybe_result.defined()) { TORCH_CHECK(self.scalar_type() == maybe_result.scalar_type(), "Expected the dtype for input and out to match, but got ", self.scalar_type(), " for input's dtype and ", maybe_result.scalar_type(), " for out's dtype."); } if (self.numel() == 0) { - at::native::zero_numel_check_dims(self, dims, "amax()"); + at::native::zero_numel_check_dims(self, dim, "amax()"); } const ScalarType& out_dtype = maybe_result.defined() ? maybe_result.scalar_type() : self.scalar_type(); - resize_reduction(*this, self, dims, keepdim, out_dtype); + resize_reduction(*this, self, dim, keepdim, out_dtype); +} + +TORCH_META_FUNC(amin) +(const Tensor& self, IntArrayRef dim, bool keepdim) { + auto maybe_result = maybe_get_output(); + if (maybe_result.defined()) { + TORCH_CHECK(self.scalar_type() == maybe_result.scalar_type(), "Expected the dtype for input and out to match, but got ", + self.scalar_type(), " for input's dtype and ", maybe_result.scalar_type(), " for out's dtype."); + } + if (self.numel() == 0) { + at::native::zero_numel_check_dims(self, dim, "amin()"); + } + const ScalarType& out_dtype = maybe_result.defined() ? 
maybe_result.scalar_type() : self.scalar_type(); + resize_reduction(*this, self, dim, keepdim, out_dtype); } } // namespace meta @@ -844,7 +858,7 @@ Tensor& diff_out(const Tensor& self, int64_t n, int64_t dim, const c10::optional } } -void pre_check_gradient(const Tensor& self, c10::optional spacing_size, c10::optional dim, int64_t edge_order) { +void pre_check_gradient(const Tensor& self, c10::optional spacing_size, at::OptionalIntArrayRef dim, int64_t edge_order) { // Helper for gradient function to make sure input data satisfies prerequisites TORCH_CHECK(self.scalar_type() != ScalarType::Byte, "torch.gradient does not support uint8 input."); if (spacing_size.has_value() && !dim.has_value()) { @@ -946,7 +960,7 @@ std::vector gradient_dim_preprocess(const Tensor& self, c10::optional gradient(const Tensor& self, TensorList coordinates, IntArrayRef dim, int64_t edge_order) { pre_check_gradient(self, c10::optional(coordinates.size()), - c10::optional(dim), + at::OptionalIntArrayRef(dim), edge_order); return gradient_helper(self, coordinates, dim, edge_order); } @@ -955,7 +969,7 @@ std::vector gradient(const Tensor& self, TensorList coordinates, c10::op const auto processed_dim = gradient_dim_preprocess(self, dim); pre_check_gradient(self, c10::optional(coordinates.size()), - dim.has_value() ? c10::optional(processed_dim) : c10::nullopt, + dim.has_value() ? at::OptionalIntArrayRef(processed_dim) : c10::nullopt, edge_order); return gradient_helper(self, coordinates, processed_dim, edge_order); } @@ -963,7 +977,7 @@ std::vector gradient(const Tensor& self, TensorList coordinates, c10::op std::vector gradient(const Tensor& self, c10::ArrayRef spacing, IntArrayRef dim, int64_t edge_order) { pre_check_gradient(self, c10::optional(spacing.size()), - c10::optional(dim), + at::OptionalIntArrayRef(dim), edge_order); return gradient_helper_float(self, spacing, dim, edge_order); } @@ -972,7 +986,7 @@ std::vector gradient(const Tensor& self, ArrayRef spacing, c10:: const auto processed_dim = gradient_dim_preprocess(self, dim); pre_check_gradient(self, c10::optional(spacing.size()), - dim.has_value() ? c10::optional(processed_dim) : c10::nullopt, + dim.has_value() ? at::OptionalIntArrayRef(processed_dim) : c10::nullopt, edge_order); return gradient_helper_float(self, spacing, processed_dim, edge_order); } @@ -983,7 +997,7 @@ std::vector gradient(const Tensor& self, const Scalar& unit_size, IntArr std::vector spacing(dim.size(), unit_size); pre_check_gradient(self, c10::optional(spacing.size()), - c10::optional(dim), + at::OptionalIntArrayRef(dim), edge_order); return gradient_helper_float(self, spacing, dim, edge_order); } @@ -997,7 +1011,7 @@ std::vector gradient(const Tensor& self, const c10::optional& un unit_size.has_value() ? unit_size.value() : 1.0) ; pre_check_gradient(self, unit_size.has_value() ? c10::optional(spacing.size()) : c10::nullopt, - dim.has_value() ? c10::optional(processed_dim) : c10::nullopt, + dim.has_value() ? 
at::OptionalIntArrayRef(processed_dim) : c10::nullopt, edge_order); return gradient_helper_float(self, spacing, processed_dim, edge_order); } @@ -1006,7 +1020,7 @@ std::vector gradient(const Tensor& self, IntArrayRef dim, int64_t edge_o std::vector spacing(dim.size(), 1.0) ; pre_check_gradient(self, c10::optional(spacing.size()), - c10::optional(dim), + at::OptionalIntArrayRef(dim), edge_order); return gradient_helper_float(self, spacing, dim, edge_order); } @@ -1429,29 +1443,17 @@ TORCH_IMPL_FUNC(any_all_out)(const Tensor& self, const Tensor& result) { allany_impl<0>(self, result, {}, false, or_stub); } -Tensor &amin_out(const Tensor& self, IntArrayRef dim, bool keepdim, Tensor& result) { - TORCH_CHECK(self.scalar_type() == result.scalar_type(), "Expected the dtype for input and out to match, but got ", - self.scalar_type(), " for input's dtype and ", result.scalar_type(), " for out's dtype."); - if (self.numel() == 0) { - zero_numel_check_dims(self, dim, "amin()"); - } - - auto iter = make_reduction("amin", result, self, dim, keepdim, self.scalar_type()); +TORCH_IMPL_FUNC(amin_out) (const Tensor& self, IntArrayRef dim, bool keepdim, const Tensor& result) { + auto iter = + meta::make_reduction(self, result, dim, keepdim, self.scalar_type()); if (iter.numel() != 0) { min_values_stub(iter.device_type(), iter); } - return result; -} - -Tensor amin(const Tensor& self, IntArrayRef dim, bool keepdim) { - Tensor result = at::empty({0}, self.options()); - return at::amin_out(result, self, dim, keepdim); } TORCH_IMPL_FUNC(amax_out) (const Tensor& self, IntArrayRef dim, bool keepdim, const Tensor& result) { - c10::MaybeOwned in = c10::MaybeOwned::borrowed(self); auto iter = - meta::make_reduction(*in, result, dim, keepdim, self.scalar_type()); + meta::make_reduction(self, result, dim, keepdim, self.scalar_type()); if (iter.numel() != 0) { max_values_stub(iter.device_type(), iter); } @@ -1560,7 +1562,7 @@ static double std_var_all_cpu(const Tensor& self, int64_t correction, bool take_ static Tensor& std_var_out( const char* fname, Tensor& result, const Tensor& self, - c10::optional dim, c10::optional correction_opt, + at::OptionalIntArrayRef dim, c10::optional correction_opt, bool keepdim, bool take_sqrt) { TORCH_CHECK(self.device().is_cpu() || self.device().is_cuda(), "std and var only supports tensors on a CPU or CUDA device, but got: ", @@ -1628,7 +1630,7 @@ static Tensor& std_var_out( static std::tuple std_var_mean_out( const char* fname, Tensor& result1, Tensor& result2, const Tensor& self, - c10::optional dim, c10::optional correction_opt, + at::OptionalIntArrayRef dim, c10::optional correction_opt, bool keepdim, bool take_sqrt) { AT_ASSERT(result1.defined() && result2.defined()); TORCH_CHECK(self.device().is_cpu() || self.is_cuda(), @@ -1699,13 +1701,13 @@ static std::tuple std_var_mean_out( std::tuple var_mean( const Tensor& self, IntArrayRef dim, bool unbiased, bool keepdim) { - return at::var_mean(self, /*dim=*/c10::optional(dim), + return at::var_mean(self, /*dim=*/at::OptionalIntArrayRef(dim), /*correction=*/int64_t{unbiased ? 1 : 0}, keepdim); } std::tuple std_mean( const Tensor& self, IntArrayRef dim, bool unbiased, bool keepdim) { - return at::std_mean(self, /*dim=*/c10::optional(dim), + return at::std_mean(self, /*dim=*/at::OptionalIntArrayRef(dim), /*correction=*/int64_t{unbiased ? 
1 : 0}, keepdim); } @@ -1732,7 +1734,7 @@ static TensorOptions options_to_value_type(TensorOptions opts) { } std::tuple var_mean( - const Tensor& self, c10::optional dim, + const Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim) { Tensor result1 = at::empty({0}, options_to_value_type(self.options())); Tensor result2 = at::empty({0}, self.options()); @@ -1741,7 +1743,7 @@ std::tuple var_mean( } std::tuple std_mean( - const Tensor& self, c10::optional dim, + const Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim) { Tensor result1 = at::empty({0}, options_to_value_type(self.options())); Tensor result2 = at::empty({0}, self.options()); @@ -1755,12 +1757,12 @@ Tensor var(const Tensor& self, bool unbiased) { } Tensor var(const Tensor& self, IntArrayRef dim, bool unbiased, bool keepdim) { - return at::var(self, /*dim=*/c10::optional(dim), + return at::var(self, /*dim=*/at::OptionalIntArrayRef(dim), /*correction=*/int64_t{unbiased ? 1 : 0}, keepdim); } Tensor& var_out(const Tensor& self, IntArrayRef dim, bool unbiased, bool keepdim, Tensor& result) { - return at::var_out(result, self, /*dim=*/c10::optional(dim), + return at::var_out(result, self, /*dim=*/at::OptionalIntArrayRef(dim), /*correction=*/int64_t{unbiased ? 1 : 0}, keepdim); } @@ -1770,35 +1772,35 @@ Tensor std(const Tensor& self, bool unbiased) { } Tensor std(const Tensor& self, IntArrayRef dim, bool unbiased, bool keepdim) { - return at::std(self, /*dim=*/c10::optional(dim), + return at::std(self, /*dim=*/at::OptionalIntArrayRef(dim), /*correction=*/int64_t{unbiased ? 1 : 0}, keepdim); } Tensor& std_out(const Tensor& self, IntArrayRef dim, bool unbiased, bool keepdim, Tensor& result) { - return at::std_out(result, self, /*dim=*/c10::optional(dim), + return at::std_out(result, self, /*dim=*/at::OptionalIntArrayRef(dim), /*correction=*/int64_t{unbiased ? 
1 : 0}, keepdim); } -Tensor std(const Tensor& self, c10::optional dim, +Tensor std(const Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim) { Tensor result = at::empty({0}, options_to_value_type(self.options())); return std_var_out("std", result, self, dim, correction, keepdim, true); } Tensor& std_out( - const Tensor& self, c10::optional dim, + const Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim, Tensor& result) { return std_var_out("std", result, self, dim, correction, keepdim, true); } Tensor& var_out( - const Tensor& self, c10::optional dim, + const Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim, Tensor& result) { return std_var_out("var", result, self, dim, correction, keepdim, false); } Tensor var( - const Tensor& self, c10::optional dim, + const Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim) { Tensor result = at::empty({0}, options_to_value_type(self.options())); return std_var_out("var", result, self, dim, correction, keepdim, false); @@ -1983,5 +1985,9 @@ Tensor value_selecting_reduction_backward(const Tensor& grad, int64_t dim, const return at::zeros(sizes, grad.options()).scatter_(dim, indices, grad); } +Tensor sum_csr(const Tensor &self, c10::optional dtype) { + return self.values().sum(dtype); +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/ReduceOpsUtils.h b/aten/src/ATen/native/ReduceOpsUtils.h index fa93faa782d757..7951d7eda4e178 100644 --- a/aten/src/ATen/native/ReduceOpsUtils.h +++ b/aten/src/ATen/native/ReduceOpsUtils.h @@ -167,7 +167,7 @@ static Tensor review_reduce_result(const Tensor& result, int ndim, DimMask mask, static TensorIterator make_reduction( const char* name, Tensor& result, const Tensor& self, - c10::optional dim_opt, + at::OptionalIntArrayRef dim_opt, bool keepdim, ScalarType in_dtype, ScalarType out_dtype) { // check that result type and dtype match if provided TORCH_CHECK( @@ -192,7 +192,7 @@ static TensorIterator make_reduction( static C10_UNUSED TensorIterator make_reduction( const char* name, Tensor& result, const Tensor& self, - c10::optional dim, bool keepdim, ScalarType out_dtype) { + at::OptionalIntArrayRef dim, bool keepdim, ScalarType out_dtype) { // special case for type promotion in mixed precision, improves computational // efficiency. 
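Regarding the `unbiased ? 1 : 0` forwarding in the var/std wrappers above: the legacy boolean is mapped onto the newer `correction` argument. A small numeric sketch of what that correction means:

```cpp
// Sketch: sample variance with a "correction" term, matching the wrappers above
// where unbiased=true corresponds to correction=1 (Bessel's correction).
#include <cstdint>
#include <cstdio>
#include <vector>

double var_with_correction(const std::vector<double>& xs, int64_t correction) {
  const double n = static_cast<double>(xs.size());
  double mean = 0.0;
  for (double x : xs) mean += x;
  mean /= n;
  double sq = 0.0;
  for (double x : xs) sq += (x - mean) * (x - mean);
  return sq / (n - static_cast<double>(correction));  // correction=0: biased, 1: unbiased
}

int main() {
  std::vector<double> xs = {1.0, 2.0, 3.0, 4.0};
  std::printf("correction=0: %f\n", var_with_correction(xs, 0));  // 1.25
  std::printf("correction=1: %f\n", var_with_correction(xs, 1));  // ~1.666667
}
```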
// not generalize this to common mismatched input/output types to avoid cross @@ -205,7 +205,7 @@ static C10_UNUSED TensorIterator make_reduction( static TensorIterator make_reduction( const char* name, Tensor& result1, Tensor& result2, const Tensor& self, - c10::optional dim_opt, bool keepdim, ScalarType dtype1, + at::OptionalIntArrayRef dim_opt, bool keepdim, ScalarType dtype1, ScalarType dtype2) { // check that result type and dtype match if provided TORCH_CHECK( @@ -242,7 +242,7 @@ static TensorIterator make_reduction( static C10_UNUSED TensorIterator make_reduction( const char* name, Tensor& result1, Tensor& result2, const Tensor& self, - c10::optional dim, bool keepdim, ScalarType dtype) { + at::OptionalIntArrayRef dim, bool keepdim, ScalarType dtype) { return make_reduction(name, result1, result2, self, dim, keepdim, dtype, dtype); } @@ -257,7 +257,11 @@ static void zero_numel_check_dims(const Tensor& self, const int64_t dim, const c } } -static C10_UNUSED void zero_numel_check_dims(const Tensor& self, const IntArrayRef dim, const char *fn_name) { +static void zero_numel_check_dims(const Tensor& self, const IntArrayRef dim, const char *fn_name) { + TORCH_CHECK( + !dim.empty(), + fn_name, ": Expected reduction dim to be specified for input.numel() == 0. ", + "Specify the reduction dim with the 'dim' argument."); for (const int64_t d : dim) { zero_numel_check_dims(self, d, fn_name); } diff --git a/aten/src/ATen/native/ReflectionPad.cpp b/aten/src/ATen/native/ReflectionPad.cpp index f6a1bc43aba76a..fab267ef43a06d 100644 --- a/aten/src/ATen/native/ReflectionPad.cpp +++ b/aten/src/ATen/native/ReflectionPad.cpp @@ -1,6 +1,7 @@ #include #include #include +#include #include namespace at { @@ -266,76 +267,43 @@ inline void reflection_pad1d_out_loop( void reflection_pad1d_out_template( const Tensor& output, const Tensor& input_, IntArrayRef padding) { - int64_t dim_plane = 0; - int64_t dim_w = 1; - int64_t nbatch = 1; - // allow dim=0 only in the batch dimension. - TORCH_CHECK( - (input_.ndimension() == 2 && input_.size(1) != 0) || - (input_.ndimension() == 3 && input_.size(1) != 0 && input_.size(2) != 0), - "2D or 3D (batch mode) tensor expected for input, but got: ", input_); - - if (input_.ndimension() == 3) { - nbatch = input_.size(0); - dim_w++; - dim_plane++; - } - - /* sizes */ - auto pad_l = padding[0]; - auto pad_r = padding[1]; - - int64_t nplane = input_.size(dim_plane); - int64_t input_w = input_.size(dim_w); - int64_t output_w = input_w + pad_l + pad_r; - - TORCH_CHECK(pad_l < input_w && pad_r < input_w, "Argument #4: Padding size " - "should be less than the corresponding input dimension, but got: padding (", - pad_l, ", ", pad_r, ") at dimension ", dim_w, " of input ", input_.sizes()); - - TORCH_CHECK(output_w >= 1 , 2, - "input (W: ", input_w, ")is too small. 
Calculated output W: ", output_w); - /* get contiguous input */ Tensor input = input_.contiguous(); - /* resize output */ if (input.ndimension() == 2) { - output.resize_({nplane, output_w}); if (input.is_quantized()) { AT_DISPATCH_QINT_TYPES(input.scalar_type(), "qreflection_pad1d", [&]() { reflection_pad1d_out_frame( input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), - nplane, - input_w, output_w, - pad_l); + input.size(0), + input.size(1), output.size(-1), + padding[0]); }); } else { AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(input.scalar_type(), "reflection_pad1d", [&] { reflection_pad1d_out_frame( input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), - nplane, - input_w, output_w, - pad_l); + input.size(0), + input.size(1), output.size(-1), + padding[0]); }); } } else { - output.resize_({nbatch, nplane, output_w}); if (input.is_quantized()) { AT_DISPATCH_QINT_TYPES(input.scalar_type(), "qreflection_pad1d", [&]() { reflection_pad1d_out_loop( input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), - nbatch, nplane, - input_w, output_w, - pad_l); + output.size(0), input.size(1), + input.size(2), output.size(-1), + padding[0]); }); } else { AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(input.scalar_type(), "reflection_pad1d", [&] { reflection_pad1d_out_loop( input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), - nbatch, nplane, - input_w, output_w, - pad_l); + output.size(0), input.size(1), + input.size(2), output.size(-1), + padding[0]); }); } } @@ -854,20 +822,18 @@ static void reflection_pad3d_backward_out_loop( } // namespace +// TODO: I think this function should be removed since we implement it with +// TORCH_IMPL_FUNC below Tensor& reflection_pad1d_out_cpu(const Tensor& input, IntArrayRef padding, Tensor& output) { reflection_pad1d_out_template(output, input, padding); return output; } -// This function is needed because structured_delegate currently does not -// support quantized backends. 
This function may be able to be omitted in the -// future if support for quantized backends is enabled for structured_delegate -Tensor reflection_pad1d_quantized_cpu(const Tensor& input, IntArrayRef padding) { +Tensor& reflection_pad1d_out_quantized_cpu(const Tensor& input, IntArrayRef padding, + Tensor& output) { TORCH_CHECK(input.qscheme() == kPerTensorAffine, "Only per tensor quantization is supported"); - Tensor output = at::_empty_affine_quantized({0}, input.options(), - input.q_scale(), - input.q_zero_point()); + set_quantizer_(output, make_per_tensor_affine_quantizer(input.q_scale(), input.q_zero_point(), input.scalar_type())); reflection_pad1d_out_template(output, input, padding); return output; } diff --git a/aten/src/ATen/native/Repeat.h b/aten/src/ATen/native/Repeat.h index 9751f2ec8be7a4..dadbfb0c2374bb 100644 --- a/aten/src/ATen/native/Repeat.h +++ b/aten/src/ATen/native/Repeat.h @@ -1,6 +1,14 @@ #pragma once -#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/Resize.h b/aten/src/ATen/native/Resize.h index 3540ef8b21ac4d..c6fe2b3d214670 100644 --- a/aten/src/ATen/native/Resize.h +++ b/aten/src/ATen/native/Resize.h @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -30,22 +31,16 @@ TORCH_API bool resize_output_check(const Tensor& output, IntArrayRef shape); TORCH_API void resize_bytes_cpu(StorageImpl* storage, size_t size_bytes); -static inline void maybe_resize_storage_cpu(TensorImpl* self, uint64_t new_size) { +static inline void maybe_resize_storage_cpu(TensorImpl* self, size_t new_size_bytes) { // It does not make sense to try to resize a storage // to hold 0 elements, and this can break // if storage_offset is positive but // new_size is 0, so just bail in that case // (same comment is in cuda/Resize.h) - if (new_size == 0) { + if (self->numel() == 0) { return; } - const auto new_size_bytes_i = - (new_size + self->storage_offset()) * self->dtype().itemsize(); - TORCH_CHECK(!overflows(new_size_bytes_i), "Requested storage size (", - new_size_bytes_i, ") cannot be represented as a size_t"); - const auto new_size_bytes = static_cast(new_size_bytes_i); - const Storage& storage = self->unsafe_storage(); if (!storage) { auto new_storage = c10::make_intrusive( @@ -62,21 +57,25 @@ static inline void maybe_resize_storage_cpu(TensorImpl* self, uint64_t new_size) inline TensorImpl* resize_impl_cpu_( TensorImpl* self, IntArrayRef size, - c10::optional stride, + at::OptionalIntArrayRef stride, bool resize_storage = true) { - if (self->sizes() == size && (!stride || self->strides() == stride)) { + if (self->sizes() == size && (!stride || self->strides() == stride.value())) { return self; } - int64_t storage_size = 1; + const auto itemsize = self->dtype().itemsize(); + const auto storage_offset = self->storage_offset(); + size_t storage_size = 1; if (stride) { self->set_sizes_and_strides(size, *stride); - // NB: storage size can be different from numel. 
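To illustrate the note above (and the switch to `at::detail::computeStorageNbytes`), a standalone sketch of the byte count a strided view actually needs: the largest reachable element decides the storage size, not `numel()`. Assumes non-negative strides, as enforced elsewhere in this file.

```cpp
// Sketch: bytes required to back a tensor given sizes, strides, itemsize and
// storage_offset. The max reachable linear offset is
// storage_offset + sum((size_i - 1) * stride_i), plus one element.
#include <cstdint>
#include <cstdio>
#include <vector>

size_t storage_nbytes(const std::vector<int64_t>& sizes,
                      const std::vector<int64_t>& strides,
                      size_t itemsize,
                      int64_t storage_offset) {
  // An empty tensor is treated as needing no storage here (mirrors the
  // numel()==0 early-return in maybe_resize_storage_cpu).
  for (int64_t s : sizes) {
    if (s == 0) return 0;
  }
  int64_t last = storage_offset;
  for (size_t d = 0; d < sizes.size(); ++d) {
    last += (sizes[d] - 1) * strides[d];
  }
  return static_cast<size_t>(last + 1) * itemsize;
}

int main() {
  // Contiguous 2x3 float tensor: 6 reachable elements -> 24 bytes.
  std::printf("%zu\n", storage_nbytes({2, 3}, {3, 1}, sizeof(float), 0));  // 24
  // A 2x3 view whose first dim is broadcast (stride 0): only 3 elements reachable.
  std::printf("%zu\n", storage_nbytes({2, 3}, {0, 1}, sizeof(float), 0));  // 12
}
```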
- storage_size = storage_size_for(size, *stride); + storage_size = at::detail::computeStorageNbytes( + size, *stride, itemsize, storage_offset); } else { self->set_sizes_contiguous(size); - storage_size = self->numel(); + storage_size = at::detail::computeStorageNbytesContiguous( + size, itemsize, storage_offset); } + if (resize_storage) { maybe_resize_storage_cpu(self, storage_size); } @@ -158,6 +157,12 @@ inline void setStrided( IntArrayRef stride, int64_t storage_offset) { TORCH_CHECK(size.size() == stride.size(), "mismatch in length of strides and shape"); + for (auto val : stride) { + TORCH_CHECK(val >= 0, + "as_strided: Negative strides are not supported at the moment, " + "got strides: ", stride); + } + auto* self_ = self.unsafeGetTensorImpl(); checkInBoundsForStorage( size, stride, storage_offset, self_->dtype(), self_->storage()); @@ -170,11 +175,6 @@ inline void setStrided( if (self_->sizes() == size && self_->strides() == stride) { return; } - for (auto val : stride) { - TORCH_CHECK(val >= 0, - "as_strided: Negative strides are not supported at the moment, " - "got strides: ", stride); - } self_->set_sizes_and_strides(size, stride); } diff --git a/aten/src/ATen/native/Scalar.cpp b/aten/src/ATen/native/Scalar.cpp index aecfffadb02025..7342c4806d44c5 100644 --- a/aten/src/ATen/native/Scalar.cpp +++ b/aten/src/ATen/native/Scalar.cpp @@ -20,8 +20,8 @@ Scalar item(const Tensor& self) { Scalar _local_scalar_dense_cpu(const Tensor& self) { Scalar r; - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, self.scalar_type(), "_local_scalar_dense_cpu", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kComplexHalf, kHalf, kBool, kBFloat16, self.scalar_type(), "_local_scalar_dense_cpu", [&] { scalar_t value = *self.data_ptr(); r = Scalar(value); }); diff --git a/aten/src/ATen/native/ScatterGatherChecks.h b/aten/src/ATen/native/ScatterGatherChecks.h index 1b71eb40975db7..92e1edeb5fe029 100644 --- a/aten/src/ATen/native/ScatterGatherChecks.h +++ b/aten/src/ATen/native/ScatterGatherChecks.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include diff --git a/aten/src/ATen/native/SegmentReduce.h b/aten/src/ATen/native/SegmentReduce.h index 11a399ae77a1a0..1e5b87eefb6ddb 100644 --- a/aten/src/ATen/native/SegmentReduce.h +++ b/aten/src/ATen/native/SegmentReduce.h @@ -1,10 +1,12 @@ #pragma once -#include #include +#include #include namespace at { +class Tensor; + namespace native { enum SegmentReductionType { MAX, MEAN, MIN, SUM }; diff --git a/aten/src/ATen/native/SoftMax.cpp b/aten/src/ATen/native/SoftMax.cpp index b4635365e43224..0ef278a12a6c4d 100644 --- a/aten/src/ATen/native/SoftMax.cpp +++ b/aten/src/ATen/native/SoftMax.cpp @@ -170,7 +170,7 @@ void host_softmax( } } else { for (const auto d : c10::irange(0, dim_size)) { - if (mask_data[d * dim_stride]) { + if (!mask_data[d * dim_stride]) { max_input = is_meaningful_max ? 
std::max(max_input, input_data[d * dim_stride]) : input_data[d * dim_stride]; @@ -183,7 +183,7 @@ void host_softmax( acc_type tmpsum = 0; for (const auto d : c10::irange(dim_size)) { scalar_t z{}; - if (!MaskedSoftMax || mask_data[d * dim_stride]) { + if (!MaskedSoftMax || !mask_data[d * dim_stride]) { z = std::exp(input_data[d * dim_stride] - max_input); } else { z = 0; diff --git a/aten/src/ATen/native/SpectralOps.cpp b/aten/src/ATen/native/SpectralOps.cpp index 41a182bd29042e..af000cc70d9fe6 100644 --- a/aten/src/ATen/native/SpectralOps.cpp +++ b/aten/src/ATen/native/SpectralOps.cpp @@ -219,7 +219,7 @@ struct ShapeAndDims { // Wraps dimensions and applies defaulting behavior. // Also checks transform dims are unique and transform shape is non-empty. ShapeAndDims canonicalize_fft_shape_and_dim_args( - Tensor input, c10::optional shape, c10::optional dim) { + Tensor input, at::OptionalIntArrayRef shape, at::OptionalIntArrayRef dim) { const int64_t input_dim = input.dim(); const IntArrayRef input_sizes = input.sizes(); ShapeAndDims ret; @@ -372,8 +372,8 @@ Tensor& fft_ihfft_out(const Tensor& self, c10::optional n, return out; } -Tensor fft_fftn(const Tensor& self, c10::optional s, - c10::optional dim, +Tensor fft_fftn(const Tensor& self, at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm) { auto desc = canonicalize_fft_shape_and_dim_args(self, s, dim); // TODO: For real input, perform rfftn then mirror with conjugate symmetry @@ -382,8 +382,8 @@ Tensor fft_fftn(const Tensor& self, c10::optional s, } Tensor& fft_fftn_out(const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm, Tensor& out) { auto desc = canonicalize_fft_shape_and_dim_args(self, s, dim); // TODO: For real input, perform rfftn then mirror with conjugate symmetry @@ -392,8 +392,8 @@ Tensor& fft_fftn_out(const Tensor& self, return out; } -Tensor fft_ifftn(const Tensor& self, c10::optional s, - c10::optional dim, +Tensor fft_ifftn(const Tensor& self, at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm) { auto desc = canonicalize_fft_shape_and_dim_args(self, s, dim); Tensor input = promote_tensor_fft(self, /*require_complex=*/true); @@ -401,8 +401,8 @@ Tensor fft_ifftn(const Tensor& self, c10::optional s, } Tensor& fft_ifftn_out(const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm, Tensor& out) { auto desc = canonicalize_fft_shape_and_dim_args(self, s, dim); Tensor input = promote_tensor_fft(self, /*require_complex=*/true); @@ -411,8 +411,8 @@ Tensor& fft_ifftn_out(const Tensor& self, } static Tensor fft_rfftn_impl(Tensor out, const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, const c10::optional& norm_str) { TORCH_CHECK(!self.is_complex(), "rfftn expects a real-valued input tensor, but got ", self.scalar_type()); auto desc = canonicalize_fft_shape_and_dim_args(self, s, dim); @@ -424,15 +424,15 @@ static Tensor fft_rfftn_impl(Tensor out, const Tensor& self, return fft_r2c_maybe_out(fname, out, x, desc.dim, norm, /*onesided=*/true); } -Tensor fft_rfftn(const Tensor& self, c10::optional s, - c10::optional dim, +Tensor fft_rfftn(const Tensor& self, at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm_str) { return fft_rfftn_impl({}, self, s, dim, norm_str); } Tensor& fft_rfftn_out(const Tensor& self, - c10::optional s, - 
c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm_str, Tensor& out) { fft_rfftn_impl(out, self, s, dim, norm_str); return out; @@ -440,8 +440,8 @@ Tensor& fft_rfftn_out(const Tensor& self, ShapeAndDims canonicalize_fft_c2r_shape_and_dim_args( c10::string_view fname, const Tensor& self, - const c10::optional& s, - const c10::optional& dims, + const at::OptionalIntArrayRef& s, + const at::OptionalIntArrayRef& dims, int64_t& last_dim_size) { auto desc = canonicalize_fft_shape_and_dim_args(self, s, dims); TORCH_CHECK(desc.shape.size() > 0, fname, " must transform at least one axis"); @@ -463,8 +463,8 @@ ShapeAndDims canonicalize_fft_c2r_shape_and_dim_args( } static Tensor fft_irfftn_impl(Tensor out, const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, const c10::optional& norm_str) { int64_t last_dim_size = 0; auto desc = canonicalize_fft_c2r_shape_and_dim_args( @@ -477,15 +477,15 @@ static Tensor fft_irfftn_impl(Tensor out, const Tensor& self, } Tensor fft_irfftn(const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm_str) { return fft_irfftn_impl({}, self, s, dim, norm_str); } Tensor& fft_irfftn_out(const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm_str, Tensor& out) { fft_irfftn_impl(out, self, s, dim, norm_str); return out; @@ -493,8 +493,8 @@ Tensor& fft_irfftn_out(const Tensor& self, static Tensor fft_hfftn_impl( const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm_str, const Tensor& out) { constexpr c10::string_view fname = "hfftn"; @@ -521,16 +521,16 @@ static Tensor fft_hfftn_impl( Tensor fft_hfftn( const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm) { return fft_hfftn_impl(self, s, dim, norm, {}); } const Tensor& fft_hfftn_out( const Tensor& self, - c10::optional s, - c10::optional dim, c10::optional norm, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm, const Tensor& out) { fft_hfftn_impl(self, s, dim, norm, out); return out; @@ -538,8 +538,8 @@ const Tensor& fft_hfftn_out( static Tensor fft_ihfftn_impl( const Tensor& self, - const c10::optional& s, - const c10::optional& dim, + const at::OptionalIntArrayRef& s, + const at::OptionalIntArrayRef& dim, const c10::optional& norm_str, const Tensor& out) { constexpr c10::string_view fname = "ihfftn"; @@ -563,80 +563,80 @@ static Tensor fft_ihfftn_impl( Tensor fft_ihfftn( const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm) { return fft_ihfftn_impl(self, s, dim, norm, {}); } const Tensor& fft_ihfftn_out( const Tensor& self, - c10::optional s, - c10::optional dim, + at::OptionalIntArrayRef s, + at::OptionalIntArrayRef dim, c10::optional norm, const Tensor& out) { fft_ihfftn_impl(self, s, dim, norm, out); return out; } -Tensor fft_fft2(const Tensor& self, c10::optional s, +Tensor fft_fft2(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm) { return native::fft_fftn(self, s, dim, std::move(norm)); } -Tensor& fft_fft2_out(const Tensor& self, c10::optional s, +Tensor& fft_fft2_out(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, 
c10::optional norm, Tensor& out) { return native::fft_fftn_out(self, s, dim, std::move(norm), out); } -Tensor fft_ifft2(const Tensor& self, c10::optional s, +Tensor fft_ifft2(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm) { return native::fft_ifftn(self, s, dim, std::move(norm)); } -Tensor& fft_ifft2_out(const Tensor& self, c10::optional s, +Tensor& fft_ifft2_out(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm, Tensor& out) { return native::fft_ifftn_out(self, s, dim, std::move(norm), out); } -Tensor fft_rfft2(const Tensor& self, c10::optional s, +Tensor fft_rfft2(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm) { return native::fft_rfftn(self, s, dim, std::move(norm)); } -Tensor& fft_rfft2_out(const Tensor& self, c10::optional s, +Tensor& fft_rfft2_out(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm, Tensor& out) { return native::fft_rfftn_out(self, s, dim, std::move(norm), out); } -Tensor fft_irfft2(const Tensor& self, c10::optional s, +Tensor fft_irfft2(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm) { return native::fft_irfftn(self, s, dim, std::move(norm)); } -Tensor& fft_irfft2_out(const Tensor& self, c10::optional s, +Tensor& fft_irfft2_out(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm, Tensor& out) { return native::fft_irfftn_out(self, s, dim, std::move(norm), out); } const Tensor& fft_hfft2_out( - const Tensor& self, c10::optional s, IntArrayRef dim, + const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm, const Tensor& out) { return native::fft_hfftn_out(self, s, dim, std::move(norm), out); } -Tensor fft_hfft2(const Tensor& self, c10::optional s, +Tensor fft_hfft2(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm) { return native::fft_hfftn(self, s, dim, std::move(norm)); } const Tensor& fft_ihfft2_out( - const Tensor& self, c10::optional s, IntArrayRef dim, + const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm, const Tensor& out) { return native::fft_ihfftn_out(self, s, dim, std::move(norm), out); } -Tensor fft_ihfft2(const Tensor& self, c10::optional s, +Tensor fft_ihfft2(const Tensor& self, at::OptionalIntArrayRef s, IntArrayRef dim, c10::optional norm) { return native::fft_ihfftn(self, s, dim, std::move(norm)); } @@ -687,7 +687,7 @@ Tensor fft_rfftfreq(int64_t n, double d, // If an array dim is specified, wraps them according to self.dim(). // Otherwise returns a vector of all dims. 
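A standalone sketch of that defaulting/wrapping rule (a hypothetical helper, not the ATen one): negative dims are wrapped by the tensor rank, and an absent `dim` argument means every dimension.

```cpp
// Sketch: wrap negative dims and default to "all dims" when none are given,
// mirroring the behaviour described for default_alldims above.
#include <cstdint>
#include <cstdio>
#include <optional>
#include <vector>

std::vector<int64_t> default_alldims_sketch(
    int64_t ndim, std::optional<std::vector<int64_t>> dim_opt) {
  std::vector<int64_t> dim;
  if (dim_opt.has_value()) {
    for (int64_t d : *dim_opt) {
      dim.push_back(d < 0 ? d + ndim : d);  // wrap negative dims
    }
  } else {
    for (int64_t d = 0; d < ndim; ++d) {
      dim.push_back(d);                     // no dim given: use every dimension
    }
  }
  return dim;
}

int main() {
  for (int64_t d : default_alldims_sketch(4, std::vector<int64_t>{-1, 1})) {
    std::printf("%lld ", static_cast<long long>(d));  // prints 3 1
  }
  std::printf("\n");
  for (int64_t d : default_alldims_sketch(3, std::nullopt)) {
    std::printf("%lld ", static_cast<long long>(d));  // prints 0 1 2
  }
  std::printf("\n");
}
```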
-DimVector default_alldims(const Tensor& self, c10::optional dim_opt) { +DimVector default_alldims(const Tensor& self, at::OptionalIntArrayRef dim_opt) { DimVector dim; if (dim_opt) { IntArrayRef dim_unwrapped = *dim_opt; @@ -702,7 +702,7 @@ DimVector default_alldims(const Tensor& self, c10::optional dim_opt return dim; } -Tensor fft_fftshift(const Tensor& x, c10::optional dim_opt) { +Tensor fft_fftshift(const Tensor& x, at::OptionalIntArrayRef dim_opt) { auto dim = default_alldims(x, dim_opt); IntArrayRef x_sizes = x.sizes(); @@ -714,7 +714,7 @@ Tensor fft_fftshift(const Tensor& x, c10::optional dim_opt) { return at::roll(x, shift, dim); } -Tensor fft_ifftshift(const Tensor& x, c10::optional dim_opt) { +Tensor fft_ifftshift(const Tensor& x, at::OptionalIntArrayRef dim_opt) { auto dim = default_alldims(x, dim_opt); IntArrayRef x_sizes = x.sizes(); @@ -759,14 +759,11 @@ static Stream& write_opt(Stream& SS, const optional& value) { * * This is modeled after librosa but with support for complex time-domain * signals and complex windows. - * - * NOTE: librosa's center and pad_mode arguments are currently only implemented - * in python because it uses torch.nn.functional.pad which is python-only. */ Tensor stft(const Tensor& self, const int64_t n_fft, const optional hop_lengthOpt, const optional win_lengthOpt, const c10::optional& window_opt, - const bool normalized, const optional onesidedOpt, - const optional return_complexOpt) { + const bool center, c10::string_view mode, const bool normalized, + const optional onesidedOpt, const optional return_complexOpt) { // See [Note: hacky wrapper removal for optional tensor] c10::MaybeOwned window_maybe_owned = at::borrow_from_optional_tensor(window_opt); const Tensor& window = *window_maybe_owned; @@ -824,6 +821,19 @@ Tensor stft(const Tensor& self, const int64_t n_fft, const optional hop if (self.dim() == 1) { input = input.unsqueeze(0); } + + if (center) { + const auto input_shape = input.sizes(); + const auto input_dim = input_shape.size(); + const auto extra_dims = std::max(size_t{3}, input_dim) - input_dim; + const auto pad_amount = n_fft / 2; + + DimVector extended_shape(extra_dims, 1); + extended_shape.append(input_shape.begin(), input_shape.end()); + input = at::pad(input.view(extended_shape), {pad_amount, pad_amount}, mode); + input = input.view(IntArrayRef(input.sizes()).slice(extra_dims)); + } + int64_t batch = input.size(0); int64_t len = input.size(1); if (n_fft <= 0 || n_fft > len) { @@ -897,6 +907,17 @@ Tensor stft(const Tensor& self, const int64_t n_fft, const optional hop } } +Tensor stft( + const Tensor& self, const int64_t n_fft, const optional hop_lengthOpt, + const optional win_lengthOpt, const c10::optional& window_opt, + const bool normalized, + const optional onesidedOpt, const optional return_complexOpt) { + return at::stft( + self, n_fft, hop_lengthOpt, win_lengthOpt, window_opt, + /*center=*/false, /*mode=*/"constant", normalized, onesidedOpt, + return_complexOpt); +} + // Create complex tensor from the old style of real tensor with size=(..., 2) // This is to support istft in the transition to requiring complex input. 
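Back to the new `center`/`pad_mode` handling added to `stft` above: the signal is padded by `n_fft / 2` on both sides (with the requested mode) before framing, so each frame is centred on its hop position. A rough sketch of how that changes the frame count, assuming the usual `1 + (len - n_fft) / hop` framing rule and hypothetical numbers:

```cpp
// Sketch: how centre-padding by n_fft/2 on each side affects the number of
// STFT frames. Assumes frames = 1 + (padded_len - n_fft) / hop.
#include <cstdint>
#include <cstdio>

int64_t num_frames(int64_t len, int64_t n_fft, int64_t hop, bool center) {
  const int64_t pad = center ? n_fft / 2 : 0;   // padding added on each side
  const int64_t padded_len = len + 2 * pad;
  return 1 + (padded_len - n_fft) / hop;
}

int main() {
  // Hypothetical signal: 16000 samples, n_fft = 400, hop = 160.
  std::printf("center=false: %lld frames\n",
              static_cast<long long>(num_frames(16000, 400, 160, false)));  // 98
  std::printf("center=true:  %lld frames\n",
              static_cast<long long>(num_frames(16000, 400, 160, true)));   // 101
}
```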
// NOTE: This may return a view of the input tensor, or might clone if necessary @@ -1090,14 +1111,6 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho #undef REPR } -Tensor stft(const Tensor& self, const int64_t n_fft, const optional hop_lengthOpt, - const optional win_lengthOpt, const Tensor& window, - const bool normalized, const optional onesidedOpt) { - return at::native::stft( - self, n_fft, hop_lengthOpt, win_lengthOpt, window, normalized, onesidedOpt, - /*return_complex=*/c10::nullopt); -} - Tensor istft(const Tensor& self, const int64_t n_fft, const optional hop_lengthOpt, const optional win_lengthOpt, const Tensor& window, const bool center, const bool normalized, const optional onesidedOpt, diff --git a/aten/src/ATen/native/TensorAdvancedIndexing.cpp b/aten/src/ATen/native/TensorAdvancedIndexing.cpp index 340bc5a822ad0a..9492e2c02b43f8 100644 --- a/aten/src/ATen/native/TensorAdvancedIndexing.cpp +++ b/aten/src/ATen/native/TensorAdvancedIndexing.cpp @@ -74,13 +74,29 @@ namespace at { namespace meta { -native::SCATTER_GATHER_OP get_operator_enum(const c10::string_view reduce) { - if (reduce == "add") { - return native::SCATTER_GATHER_OP::REDUCE_ADD; - } else if (reduce == "multiply") { - return native::SCATTER_GATHER_OP::REDUCE_MULTIPLY; +native::SCATTER_GATHER_OP get_operator_enum(const c10::string_view reduce, bool use_new_options = false) { + if (use_new_options) { + if (reduce == "sum") { + return native::SCATTER_GATHER_OP::REDUCE_ADD; + } else if (reduce == "prod") { + return native::SCATTER_GATHER_OP::REDUCE_MULTIPLY; + } else if (reduce == "mean") { + return native::SCATTER_GATHER_OP::REDUCE_MEAN; + } else if (reduce == "amax") { + return native::SCATTER_GATHER_OP::REDUCE_MAXIMUM; + } else if (reduce == "amin") { + return native::SCATTER_GATHER_OP::REDUCE_MINIMUM; + } else { + TORCH_CHECK(false, "reduce argument must be either sum, prod, mean, amax or amin."); + } } else { - TORCH_CHECK(false, "reduce argument must be either add or multiply."); + if (reduce == "add") { + return native::SCATTER_GATHER_OP::REDUCE_ADD; + } else if (reduce == "multiply") { + return native::SCATTER_GATHER_OP::REDUCE_MULTIPLY; + } else { + TORCH_CHECK(false, "reduce argument must be either add or multiply.") + } } } @@ -113,7 +129,7 @@ TORCH_META_FUNC(gather) at::native::gather_shape_check(self, wrapped_dim, index); } -template +template void scatter_meta_impl( Meta& meta, const Tensor& self, @@ -137,7 +153,7 @@ void scatter_meta_impl( meta.set_output(self.sizes(), self.options()); if (reduce.has_value()) { // Check if we have a valid reduce operator. 
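For the new reduce vocabulary accepted above ("sum", "prod", "mean", "amax", "amin"), a plain-C++ 1-D sketch of the scatter-reduce semantics, including the mean's count handling that the implementation later performs with `scatter_add_` and `masked_fill_`. This is a standalone illustration with hypothetical inputs, not the ATen kernel, and it only spells out "sum", "amax" and "mean".

```cpp
// Sketch: 1-D scatter_reduce honouring include_self. out[index[k]] is reduced
// with src[k]; with include_self=false the destination's original value does
// not participate, and untouched slots keep their value.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

std::vector<double> scatter_reduce_1d(std::vector<double> out,
                                      const std::vector<int64_t>& index,
                                      const std::vector<double>& src,
                                      const std::string& reduce,
                                      bool include_self) {
  std::vector<int64_t> count(out.size(), include_self ? 1 : 0);
  if (!include_self) {
    // Re-initialize only the slots that will be scattered to.
    for (int64_t i : index) {
      if (reduce == "sum" || reduce == "mean") out[i] = 0.0;
      if (reduce == "amax") out[i] = -1e300;   // stand-in for numeric lowest()
    }
  }
  for (size_t k = 0; k < src.size(); ++k) {
    const int64_t i = index[k];
    if (reduce == "sum" || reduce == "mean") out[i] += src[k];
    if (reduce == "amax") out[i] = std::max(out[i], src[k]);
    ++count[i];
  }
  if (reduce == "mean") {
    for (size_t i = 0; i < out.size(); ++i) {
      out[i] /= std::max<int64_t>(count[i], 1);  // untouched slots divide by 1
    }
  }
  return out;
}

int main() {
  auto r = scatter_reduce_1d({10.0, 10.0, 10.0}, {0, 0, 2}, {1.0, 2.0, 3.0},
                             "mean", /*include_self=*/false);
  std::printf("%g %g %g\n", r[0], r[1], r[2]);  // 1.5 10 3
}
```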
- get_operator_enum(reduce.value()); + get_operator_enum(reduce.value(), use_new_options); } } @@ -174,6 +190,17 @@ TORCH_META_FUNC(scatter_add) scatter_meta_impl(*this, self, dim, index, src, "add"); } +TORCH_META_FUNC2(scatter_reduce, two) +(const Tensor& self, + int64_t dim, + const Tensor& index, + const Tensor& src, + const c10::string_view reduce, + bool include_self) { + (void) include_self; + scatter_meta_impl(*this, self, dim, index, src, reduce); +} + TORCH_PRECOMPUTE_META_FUNC(index_copy) (const Tensor& self, int64_t dim, const Tensor& index, const Tensor& source) { dim = maybe_wrap_dim(dim, self.dim()); @@ -296,6 +323,7 @@ DEFINE_DISPATCH(scatter_fill_stub); DEFINE_DISPATCH(scatter_add_stub); DEFINE_DISPATCH(scatter_reduce_stub); DEFINE_DISPATCH(scatter_scalar_reduce_stub); +DEFINE_DISPATCH(scatter_reduce_two_stub); static bool all_strides_match(TensorList tensors) { TORCH_CHECK(tensors.size() >= 1); @@ -880,9 +908,6 @@ Tensor & index_select_out_cpu_dim1_( for (const auto i : c10::irange(N)) { auto idx = idxs[i]; - if (idx < 0) { - idx = idx + src_indexing_axis_dim; - } dst_floats[i] = src_floats[idx]; } } @@ -892,10 +917,6 @@ Tensor & index_select_out_cpu_dim1_( for (const auto batch : c10::irange(outer_dims_product)) { for (const auto i : c10::irange(N)) { auto idx = idxs[i]; - if (idx < 0) { - idx = idx + src_indexing_axis_dim; - } - auto src = src_base + batch * src_batch_bytesize + idx * block_bytesize; auto dst = out + batch * gathered_batch_bytesize + i * block_bytesize; memcpy(dst, src, block_bytesize); @@ -1176,7 +1197,37 @@ Tensor gather_backward(const Tensor& grad, const Tensor& self, int64_t dim, cons return grad.new_zeros(self.sizes()).scatter_add_(dim, index, grad); } -template +static void scatter_reduce_exclude_self_helper( + const Tensor& self, + int64_t dim, + const Tensor& index, + const SCATTER_GATHER_OP& op) { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( + at::ScalarType::Half, at::ScalarType::BFloat16, at::ScalarType::Bool, + self.scalar_type(), "scatter_reduce_exclude_input_init", [&] { + scalar_t init_val; + switch (op) { + case SCATTER_GATHER_OP::REDUCE_ADD: + init_val = (scalar_t)0; + break; + case SCATTER_GATHER_OP::REDUCE_MULTIPLY: + init_val = (scalar_t)1; + break; + case SCATTER_GATHER_OP::REDUCE_MAXIMUM: + init_val = std::numeric_limits::lowest(); + break; + case SCATTER_GATHER_OP::REDUCE_MINIMUM: + init_val = std::numeric_limits::max(); + break; + case SCATTER_GATHER_OP::REDUCE_MEAN: + init_val = (scalar_t)0; + break; + } + self.scatter_(dim, index, init_val); + }); +} + +template void scatter_impl( const Tensor& self, int64_t dim, @@ -1185,7 +1236,8 @@ void scatter_impl( const Tensor& out, ReduceStub& reduce_stub, FillStub& fill_stub, - const c10::optional reduce = nullopt) { + const c10::optional reduce = nullopt, + bool reduce_includes_self = true) { dim = at::maybe_wrap_dim(dim, self.dim()); auto mut_out = const_cast(out); @@ -1197,7 +1249,11 @@ void scatter_impl( if (index.numel() == 0) return; if (reduce.has_value()) { - auto op = meta::get_operator_enum(reduce.value()); + auto op = meta::get_operator_enum(reduce.value(), use_new_options); + if (!reduce_includes_self) { + // scatter inits for reduction to appropriate indices (used by scatter_reduce.two) + scatter_reduce_exclude_self_helper(mut_out, dim, index, op); + } reduce_stub(self.device().type(), mut_out, dim, index, src, op); } else { fill_stub(self.device().type(), mut_out, dim, index, src); @@ -1282,113 +1338,35 @@ TORCH_IMPL_FUNC(scatter_add) } } -Tensor scatter_reduce_two_cpu(const 
Tensor& self, - int64_t dim, - const Tensor& index, - const c10::string_view reduce, - const c10::optional output_size) { - - // TODO: Add documentation. - - - TORCH_CHECK(dim >= -self.dim() && dim < self.dim(), - "Expected `dim` to be in range ", -self.dim(), " to ", self.dim() - 1, " (got ", dim, ")"); - - dim = dim < 0 ? dim + self.dim() : dim; - - auto sizes = self.sizes().vec(); - if (output_size.has_value()) { - sizes[dim] = output_size.value(); - } else { - sizes[dim] = index.numel() > 0 ? index.max().item() + 1: 0; - } - Tensor out = at::empty(sizes, self.options()); - - TORCH_CHECK(self.dim() == index.dim(), - "Shape mismatch between `self` (got ", self.sizes(), ") and `index` (got ", index.sizes(), ")"); - for (const auto i : c10::irange(self.dim())) { - TORCH_CHECK(self.size(i) == index.size(i), - "Shape mismatch between `self` (got ", self.sizes(), ") and `index` (got ", index.sizes(), ")"); - } - - TORCH_CHECK(reduce == "sum" || reduce == "prod" || reduce == "mean" || reduce == "amax" || reduce =="amin", - "`reduce` argument must be one of ('sum', 'prod', 'mean', 'amax', 'amin'"); - - if (self.numel() == 0) { - return out.zero_(); - } - - AT_DISPATCH_ALL_TYPES_AND2(kHalf, kBFloat16, self.scalar_type(), "scatter_reduce", [&] { - if (reduce == "prod") { - out.fill_((scalar_t)1); - } else if (reduce == "amax") { - out.fill_(std::numeric_limits::lowest()); - } else if (reduce == "amin") { - out.fill_(std::numeric_limits::max()); +TORCH_IMPL_FUNC(scatter_reduce_two) +(const Tensor& self, + int64_t dim, + const Tensor& index, + const Tensor& src, + const c10::string_view reduce, + bool include_self, + const Tensor& out) { + // See issue https://github.com/pytorch/pytorch/issues/74770 + TORCH_WARN_ONCE("scatter_reduce() is in beta and the API may change at any time."); + + scatter_impl(self, dim, index, src, out, + scatter_reduce_two_stub, + scatter_stub, + reduce, + include_self); + + if (meta::get_operator_enum(reduce, true) == SCATTER_GATHER_OP::REDUCE_MEAN) { + auto ones = at::ones_like(src); + auto count = include_self ? 
at::ones_like(out) : at::zeros_like(out); + count.scatter_add_(dim, index, ones); + count.masked_fill_(count == 0, 1); + + if (out.is_floating_point() || out.is_complex()) { + out.div_(count); } else { - out.fill_((scalar_t)0); - } - - - auto self_cont = self.contiguous(); - auto index_cont = index.contiguous(); - auto self_data = self_cont.data_ptr(); - auto index_data = index_cont.data_ptr(); - bool out_is_contiguous = out.is_contiguous(); - auto out_cont = out.contiguous(); - auto out_cont_data = out_cont.data_ptr(); - - auto counts = at::zeros_like(out_cont); - auto counts_data = counts.data_ptr(); - - - int64_t offset1 = 1, offset2 = 1; - for (const auto d : c10::irange(dim)) { - offset1 *= self.size(d); - } - for (int64_t d = dim + 1; d < self.dim(); d++) { - offset2 *= self.size(d); - } - - scalar_t value; - int64_t dim_index; - for (const auto i : c10::irange(offset1)) { - for (const auto j : c10::irange(self.size(dim))) { - for (const auto k : c10::irange(offset2)) { - value = self_data[i * self_cont.stride(dim) * self_cont.size(dim) + j * self_cont.stride(dim) + k]; - dim_index = index_data[i * index_cont.stride(dim) * index_cont.size(dim) + j * index_cont.stride(dim) + k]; - TORCH_CHECK(dim_index >= 0 && dim_index < out.size(dim), - "Expected `index` values to be in range ", 0, " to ", out.size(dim), " (got ", dim_index, ")"); - int64_t ind = i * out_cont.stride(dim) * out_cont.size(dim) + dim_index * out_cont.stride(dim) + k; - if (reduce == "sum") { - out_cont_data[ind] += value; - } else if (reduce == "prod") { - out_cont_data[ind] *= value; - } else if (reduce == "mean") { - auto n = counts_data[ind]; - out_cont_data[ind] = (out_cont_data[ind] * n + value) / (n + 1); - counts_data[ind] += 1; - } else if (reduce == "amax") { - out_cont_data[ind] = std::max(out_cont_data[ind], value); - } else { - out_cont_data[ind] = std::min(out_cont_data[ind], value); - } - } - } - } - - if (reduce == "amin" || reduce == "amax") { - auto val = (reduce == "amin") ? 
std::numeric_limits::max() : std::numeric_limits::lowest(); - out_cont.masked_fill_(out_cont == val, (scalar_t)0); - } - - if (!out_is_contiguous) { - out.copy_(out_cont); + out.div_(count, "floor"); } - - }); - - return out; + } } Tensor masked_scatter(const Tensor & self, const Tensor & mask, const Tensor & source) { diff --git a/aten/src/ATen/native/TensorAdvancedIndexing.h b/aten/src/ATen/native/TensorAdvancedIndexing.h index 689ff5178d550c..a0c282d550e407 100644 --- a/aten/src/ATen/native/TensorAdvancedIndexing.h +++ b/aten/src/ATen/native/TensorAdvancedIndexing.h @@ -12,7 +12,7 @@ struct TensorIterator; namespace at { namespace native { -enum class SCATTER_GATHER_OP: uint8_t {REDUCE_ADD, REDUCE_MULTIPLY}; +enum class SCATTER_GATHER_OP: uint8_t {REDUCE_ADD, REDUCE_MULTIPLY, REDUCE_MAXIMUM, REDUCE_MINIMUM, REDUCE_MEAN}; using index_put_with_sort_fn = void(*)(Tensor &, const c10::List> &, const Tensor &, bool accumulate, bool unsafe); @@ -24,6 +24,8 @@ using scatter_reduce_fn = void(*)(const Tensor& self, const int64_t dim, const T const Tensor& src, const SCATTER_GATHER_OP& reduce); using scatter_scalar_reduce_fn = void(*)(const Tensor& self, const int64_t dim, const Tensor& index, const Scalar& value, const SCATTER_GATHER_OP& reduce); +using scatter_reduce_two_fn = void(*)(const Tensor& self, const int64_t dim, const Tensor& index, + const Tensor& src, const SCATTER_GATHER_OP& reduce); DECLARE_DISPATCH(index_put_with_sort_fn, index_put_with_sort_stub); @@ -33,6 +35,7 @@ DECLARE_DISPATCH(scatter_fill_fn, scatter_fill_stub); DECLARE_DISPATCH(scatter_add_fn, scatter_add_stub); DECLARE_DISPATCH(scatter_reduce_fn, scatter_reduce_stub); DECLARE_DISPATCH(scatter_scalar_reduce_fn, scatter_scalar_reduce_stub); +DECLARE_DISPATCH(scatter_reduce_two_fn, scatter_reduce_two_stub); TORCH_API Tensor& index_out(Tensor& result, const Tensor & self, const c10::List>& indices); diff --git a/aten/src/ATen/native/TensorCompare.cpp b/aten/src/ATen/native/TensorCompare.cpp index 0114deb943b35f..5054a57ae9a5b8 100644 --- a/aten/src/ATen/native/TensorCompare.cpp +++ b/aten/src/ATen/native/TensorCompare.cpp @@ -323,21 +323,30 @@ static void isin_sorting( } } -Tensor where(const Tensor& condition, const Tensor& self, const Tensor& other) { - TORCH_CHECK(condition.device() == self.device() && self.device() == other.device(), - "Expected condition, x and y to be on the same device, but condition is on ", - condition.device(), " and x and y are on ", self.device(), " and ", other.device(), - " respectively"); +Tensor& where_self_out(const Tensor& condition, const Tensor& self, const Tensor& other, Tensor& out) { + TORCH_CHECK(self.dtype() == other.dtype(), "expected scalar type ", self.dtype(), " but found ", other.dtype()); if (condition.scalar_type() == ScalarType::Byte) { TORCH_WARN_ONCE("where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead."); -} else { + } else { TORCH_CHECK(condition.scalar_type() == ScalarType::Bool, "where expected condition to be a boolean tensor, but got a tensor with dtype ", condition.scalar_type()); + } + Tensor cond_bool = condition.scalar_type() == ScalarType::Byte ? 
condition.to(ScalarType::Bool) : condition; + auto iter = at::TensorIteratorConfig() + .check_all_same_dtype(false) + .add_output(out) + .add_input(cond_bool) + .add_input(self) + .add_input(other) + .build(); + where_kernel(iter.device_type(), iter); + return out; } - c10::MaybeOwned b_condition, b_self, b_other; - std::tie(b_condition, b_self, b_other) = expand_outplace(condition, self, other, "where"); - return at::_s_where(*b_condition, *b_self, *b_other); +Tensor where(const Tensor& condition, const Tensor& self, const Tensor& other) { + Tensor ret = at::empty({0}, self.options()); + at::native::where_self_out(condition, self, other, ret); + return ret; } Tensor where(const Tensor& condition, const Scalar& self, const Tensor& other) { @@ -359,22 +368,6 @@ std::vector where(const Tensor& condition) { return condition.nonzero_numpy(); } -Tensor _s_where(const Tensor& condition, const Tensor& self, const Tensor& other) { - TORCH_CHECK(self.dtype() == other.dtype(), "expected scalar type ", self.dtype(), " but found ", other.dtype()); - Tensor ret = at::empty(self.sizes(), self.options()); - // - Tensor cond_bool = condition.scalar_type() == ScalarType::Byte ? condition.to(ScalarType::Bool) : condition; - auto iter = at::TensorIteratorConfig() - .check_all_same_dtype(false) - .add_output(ret) - .add_input(cond_bool) - .add_input(self) - .add_input(other) - .build(); - where_kernel(iter.device_type(), iter); - return ret; -} - std::tuple mode(const Tensor& self, int64_t dim, bool keepdim) { Tensor values = at::empty({0}, self.options()); Tensor indices = at::empty({0}, self.options().dtype(kLong)); diff --git a/aten/src/ATen/native/TensorConversions.cpp b/aten/src/ATen/native/TensorConversions.cpp index 71690c4bf2d17b..d79b2929e471cf 100644 --- a/aten/src/ATen/native/TensorConversions.cpp +++ b/aten/src/ATen/native/TensorConversions.cpp @@ -240,11 +240,14 @@ Tensor to_dense_backward(const Tensor& grad, const Tensor& input_) { if (input_.layout() == c10::kSparse) { auto input = input_.coalesce(); return grad.sparse_mask(input); - } else if (input_.layout() == c10::kMkldnn) { + } + if (input_.layout() == c10::kMkldnn) { return grad.to_mkldnn(input_.scalar_type()); - } else { - AT_ERROR("Unsupported input layout: ", input_.layout()); } + if (input_.layout() == c10::kStrided) { + return grad.to_dense(); + } + AT_ERROR("Unsupported input layout: ", input_.layout()); } Tensor to_mkldnn_backward(const Tensor& grad, const Tensor& input_) { @@ -252,6 +255,41 @@ Tensor to_mkldnn_backward(const Tensor& grad, const Tensor& input_) { return grad.to_dense(input_.scalar_type()); } +Tensor to_dense(const Tensor& tensor, c10::optional dtype) { + if (tensor.layout() == c10::kSparse) { + return tensor._to_dense(dtype); + } + if (tensor.layout() == c10::kSparseCsr) { + return tensor._to_dense(dtype); + } + if (tensor.layout() == c10::kMkldnn) { + return tensor._to_dense(dtype); + } + TORCH_CHECK(tensor.layout() == c10::kStrided, "to_dense does not support layout ", tensor.layout()); + if (dtype) { + return tensor.to(*dtype); + } + return tensor; +} + +Tensor sparse_to_dense( + const Tensor& self, + c10::optional dtype) { + TORCH_CHECK( + !dtype.has_value(), "dtype argument is not supported by sparse_to_dense"); + Tensor dst = at::zeros(self.sizes(), self.options().layout(kStrided)); + return dst.add_(self); +} + +Tensor sparse_csr_to_dense( + const Tensor& self, + c10::optional dtype) { + TORCH_CHECK( + !dtype.has_value(), "dtype argument is not supported by sparse_csr_to_dense"); + Tensor dst = 
at::zeros(self.sizes(), self.options().layout(kStrided)); + return dst.add_(self); +} + // Computes the strides for view_dtype output when the view dtype is // smaller than the original dtype inline DimVector compute_strides_for_view_dtype_downsize(IntArrayRef old_strides, int64_t size_ratio, ScalarType old_dtype, ScalarType new_dtype) { @@ -371,4 +409,32 @@ Tensor view_dtype(const Tensor& self, ScalarType dtype) { return new_tensor; } +Tensor dense_to_sparse_csr(const Tensor& self) { + return self.to_sparse().to_sparse_csr(); +} + +Tensor csr_to_sparse_csr(const Tensor& self) { + return self; +} + +Tensor coo_to_sparse_csr(const Tensor& self) { + TORCH_CHECK( + self.dim() == 2, + "Only 2D tensors can be converted to the CSR format but got shape: ", + self.sizes()); + auto coalesced_self = self.coalesce(); + auto row_indices = coalesced_self.indices()[0]; + bool out_int32 = (row_indices.scalar_type() == at::kInt); + auto crow_indices = at::_convert_indices_from_coo_to_csr( + row_indices, self.size(0), out_int32); + return at::native::_sparse_csr_tensor_unsafe( + crow_indices, + coalesced_self.indices()[1].contiguous(), + coalesced_self.values(), + coalesced_self.sizes(), + coalesced_self.scalar_type(), + c10::kSparseCsr, + coalesced_self.device()); +} + }} // namespace at::native diff --git a/aten/src/ATen/native/TensorFactories.cpp b/aten/src/ATen/native/TensorFactories.cpp index 458a694411e4bc..5cba59058beb66 100644 --- a/aten/src/ATen/native/TensorFactories.cpp +++ b/aten/src/ATen/native/TensorFactories.cpp @@ -110,9 +110,9 @@ Tensor _dim_arange(const Tensor& like, int64_t dim) { // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ complex / polar ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ void complex_check_floating(const Tensor& a, const Tensor& b) { - TORCH_CHECK((a.scalar_type() == kFloat || a.scalar_type() == kDouble) && - (b.scalar_type() == kFloat || b.scalar_type() == kDouble), - "Expected both inputs to be Float or Double tensors but got ", + TORCH_CHECK((a.scalar_type() == kFloat || a.scalar_type() == kDouble || a.scalar_type() == kHalf) && + (b.scalar_type() == kFloat || b.scalar_type() == kDouble || b.scalar_type() == kHalf), + "Expected both inputs to be Half, Float or Double tensors but got ", a.scalar_type(), " and ", b.scalar_type()); } @@ -1344,6 +1344,11 @@ Tensor kaiser_window( TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); window_function_checks("kaiser_window", options, window_length); + // short-circuit for `meta`. + if (device == kMeta) { + return at::empty({window_length}, options); + } + if (window_length == 0) { return at::empty({0}, options); } diff --git a/aten/src/ATen/native/TensorFactories.h b/aten/src/ATen/native/TensorFactories.h index 2d4a306f094875..35e058df4b3ab7 100644 --- a/aten/src/ATen/native/TensorFactories.h +++ b/aten/src/ATen/native/TensorFactories.h @@ -35,6 +35,10 @@ namespace at { namespace native { // In this case, we first calculate the size of top trapezoid, and then // calculate the size of the bottom rectangle. inline int64_t get_tril_size(int64_t row, int64_t col, int64_t offset) { + // If either dimension is 0 then the there is no tril + if (row == 0 || col == 0) { + return 0; + } // number of elements in the first row of the tril auto m_first_row = offset > 0 ? 
std::min(col, 1 + offset) : // upper bounded by col diff --git a/aten/src/ATen/native/TensorProperties.cpp b/aten/src/ATen/native/TensorProperties.cpp index 63d928749e0910..fd72abc580b4ca 100644 --- a/aten/src/ATen/native/TensorProperties.cpp +++ b/aten/src/ATen/native/TensorProperties.cpp @@ -1,6 +1,6 @@ #include #include -#include +#include #include #include @@ -31,7 +31,7 @@ int64_t stride(const Tensor& self, Dimname dim) { return self.strides()[pos_dim]; } -bool cudnn_is_acceptable(const Tensor& self) { +bool cudnn_is_acceptable(const TensorBase& self) { if (!globalContext().userEnabledCuDNN()) return false; if (!self.is_cuda()) return false; auto st = self.scalar_type(); @@ -48,6 +48,10 @@ bool cudnn_is_acceptable(const Tensor& self) { return true; } +bool cudnn_is_acceptable(const Tensor& self) { + return cudnn_is_acceptable(static_cast(self)); +} + Tensor & detach_(Tensor & self) { // this just exists to give us a hook in VariableType and an entry in Declarations.yaml //AT_ERROR("detach_ is not implemented for Tensor"); diff --git a/aten/src/ATen/native/TensorProperties.h b/aten/src/ATen/native/TensorProperties.h new file mode 100644 index 00000000000000..fe6e8395c178e9 --- /dev/null +++ b/aten/src/ATen/native/TensorProperties.h @@ -0,0 +1,12 @@ +#pragma once + +// See NOTE: [Tensor vs. TensorBase] +namespace at { +class TensorBase; +} + +namespace at { namespace native { + +TORCH_API bool cudnn_is_acceptable(const TensorBase& self); + +}} // namespace at::native diff --git a/aten/src/ATen/native/TensorShape.cpp b/aten/src/ATen/native/TensorShape.cpp index 21233a13c3b7a4..28a79421675247 100644 --- a/aten/src/ATen/native/TensorShape.cpp +++ b/aten/src/ATen/native/TensorShape.cpp @@ -59,9 +59,11 @@ Tensor& set_storage_cpu_(Tensor& result, Storage storage, int64_t storage_offset checkSetStorage(result, storage, storage_offset, size, stride); result.unsafeGetTensorImpl()->set_storage_offset(storage_offset); - c10::optional stride_opt = stride.data() != nullptr ? - c10::optional(stride) : c10::nullopt; - at::native::resize_impl_cpu_(result.unsafeGetTensorImpl(), size, stride_opt); + at::OptionalIntArrayRef stride_opt = stride.data() != nullptr ? + at::OptionalIntArrayRef(stride) : c10::nullopt; + // We can re-use this kernel for the meta device. + // We just need to make sure we don't actually try to resize the (null) storage. + at::native::resize_impl_cpu_(result.unsafeGetTensorImpl(), size, stride_opt, /*resize_storage=*/!result.is_meta()); return result; } @@ -87,6 +89,19 @@ Tensor& set_cpu_(Tensor& result) { return result; } +// We can't re-use the cpu kernel here because we don't want to use the cpu allocator. 
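// Editor's note (illustrative sketch, not part of this diff): set_meta_ below resets a
// meta tensor to an empty 1-D view over a zero-byte meta-allocated storage, so a hedged
// usage example (names and expectations assumed, not taken from this PR) would be:
//   at::Tensor t = at::empty({2, 3}, at::TensorOptions().device(at::kMeta));
//   t.set_();                                 // would dispatch to set_meta_ for meta tensors
//   // expected: t.is_meta() && t.dim() == 1 && t.numel() == 0, dtype preserved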
+Tensor& set_meta_(Tensor& result) { + caffe2::TypeMeta dtype = result.dtype(); + Storage storage( + Storage::use_byte_size_t(), + 0, + c10::GetAllocator(kMeta), + true); + result.set_(storage, 0, {0}, {}); + TORCH_INTERNAL_ASSERT(dtype == result.dtype()); + return result; +} + Tensor sparse_broadcast_to(const Tensor& self, IntArrayRef size) { TORCH_CHECK(self.is_sparse(), "input must be sparse tensor"); int64_t sparse_extra_ndim = size.size() - self.dim(); @@ -877,6 +892,19 @@ const Tensor &as_strided_(const Tensor& self, IntArrayRef size, IntArrayRef stri return self; } +Tensor narrow_copy_symint(const Tensor& self, int64_t dim, int64_t start, SymInt sym_length) { + return narrow_copy(self, dim, start, sym_length.expect_int()); +} + +Tensor narrow_copy_dense(const Tensor& self, int64_t dim, int64_t start, int64_t length) { + return self.narrow(dim, start, length).clone(at::MemoryFormat::Contiguous); +} + +Tensor narrow_copy_dense_cpu(const Tensor& self, int64_t dim, int64_t start, int64_t length){ + auto output = at::empty_like(self); + return narrow_copy_dense_cpu_out(self, dim, start, length, output); +} + Tensor narrow_copy_sparse(const Tensor& self, int64_t dim, int64_t start, int64_t length) { int64_t allDim = self.dim(); int64_t end = start+length; @@ -914,6 +942,7 @@ Tensor narrow_copy_sparse(const Tensor& self, int64_t dim, int64_t start, int64_ Tensor& narrow_copy_dense_cpu_out( const Tensor& self, int64_t dim, int64_t start, int64_t length, Tensor& output ) { + TORCH_CHECK(self.dim() > 0, "narrow() cannot be applied to a 0-dim tensor."); TORCH_CHECK(self.dtype() == output.dtype()); @@ -991,15 +1020,6 @@ Tensor& narrow_copy_dense_cpu_out( return output; } -Tensor narrow_copy_dense(const Tensor& self, int64_t dim, int64_t start, int64_t length){ - return self.narrow(dim, start, length).clone(at::MemoryFormat::Contiguous); -} - -Tensor narrow_copy_dense_cpu(const Tensor& self, int64_t dim, int64_t start, int64_t length){ - auto output = at::empty_like(self); - return narrow_copy_dense_cpu_out(self, dim, start, length, output); -} - Tensor narrow(const Tensor& self, int64_t dim, int64_t start, int64_t length) { TORCH_CHECK(self.dim() > 0, "narrow() cannot be applied to a 0-dim tensor."); auto cur_size = self.size(dim); @@ -1159,7 +1179,7 @@ Tensor reshape(const Tensor& self, IntArrayRef proposed_shape) { // // We need to do the checks here instead of in `native_functions.yaml` // to preserve backwards compatibility. 
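// Editor's note (illustrative sketch, not part of this diff): on backends that take the
// _reshape_alias path below, reshaping an already-viewable tensor aliases the original
// storage rather than copying, e.g. (hedged example, not from this PR):
//   auto base = at::arange(6);
//   auto r = base.reshape({2, 3});            // contiguous input -> view via _reshape_alias
//   // expected: r.is_alias_of(base)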
- if (!self.is_xla() && !self.is_lazy()) { + if (!self.is_xla() && !self.is_lazy() && !self.is_ipu()) { return self._reshape_alias(shape, stride.value()); } else { return self.view(shape); @@ -1464,6 +1484,10 @@ std::vector split(const Tensor& self, int64_t split_size, int64_t dim) { return splits; } +std::vector split(const Tensor& self, IntArrayRef sizes, int64_t dim) { + return at::split_with_sizes(self, sizes, dim); +} + std::vector unsafe_split(const Tensor& self, int64_t split_size, int64_t dim) { auto result = at::native::split(self, split_size, dim); for (auto& t : result) { @@ -2206,7 +2230,7 @@ Tensor flatten(const Tensor& self, DimnameList dims, Dimname out_dim) { } Tensor ravel(const Tensor& self) { - return self.reshape(-1); + return self.contiguous().view(-1); } static inline void handle_unflatten_exception(const std::runtime_error &e, diff --git a/aten/src/ATen/native/TensorShape.h b/aten/src/ATen/native/TensorShape.h index 69eb749ea48483..c9fd4d8ad61757 100644 --- a/aten/src/ATen/native/TensorShape.h +++ b/aten/src/ATen/native/TensorShape.h @@ -1,4 +1,5 @@ -#include +#pragma once +#include #include namespace at { @@ -47,4 +48,11 @@ inline int64_t get_num_splits(const Tensor& self, int64_t split_size, int64_t di return num_splits; } +/// +/// For more information, see +/// https://pytorch.org/docs/master/generated/torch.Tensor.unfold.html#torch.Tensor.unfold +/// + +Tensor unfold(const Tensor& self, int64_t dimension, int64_t size, int64_t step); + }} // namespace at::native diff --git a/aten/src/ATen/native/TensorTransformations.cpp b/aten/src/ATen/native/TensorTransformations.cpp index 5e5f9c91179e42..e555fc1db3a3b9 100644 --- a/aten/src/ATen/native/TensorTransformations.cpp +++ b/aten/src/ATen/native/TensorTransformations.cpp @@ -1,6 +1,7 @@ #include #include // for flip_stub +#include #include #include #include diff --git a/aten/src/ATen/native/TensorTransformations.h b/aten/src/ATen/native/TensorTransformations.h index 03ee31e696aada..4909ebe84bb03e 100644 --- a/aten/src/ATen/native/TensorTransformations.h +++ b/aten/src/ATen/native/TensorTransformations.h @@ -1,4 +1,10 @@ -#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif #include diff --git a/aten/src/ATen/native/TestOps.cpp b/aten/src/ATen/native/TestOps.cpp index 0658502619209a..9a3a5b10cb2693 100644 --- a/aten/src/ATen/native/TestOps.cpp +++ b/aten/src/ATen/native/TestOps.cpp @@ -13,7 +13,7 @@ namespace native { /// Else, return a new tensor containing the elementwise sums. 
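/// Editor's note (illustrative sketch, not part of this diff): switching the parameter to
/// at::OptionalIntArrayRef is assumed to keep existing call sites working, since it converts
/// implicitly from an int64_t container or from c10::nullopt, e.g.:
///   std::vector<int64_t> addends = {1, 2};
///   auto summed = at::_test_optional_intlist(values, addends);          // elementwise sums, per the doc above
///   auto passthrough = at::_test_optional_intlist(values, c10::nullopt); // returns `values` unchanged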
Tensor _test_optional_intlist( const Tensor& values, - c10::optional addends) { + at::OptionalIntArrayRef addends) { if (!addends) { return values; } diff --git a/aten/src/ATen/native/UnaryOps.cpp b/aten/src/ATen/native/UnaryOps.cpp index 64e17dd9dd0413..8577ca8c1c079a 100644 --- a/aten/src/ATen/native/UnaryOps.cpp +++ b/aten/src/ATen/native/UnaryOps.cpp @@ -67,6 +67,7 @@ CREATE_UNARY_FLOAT_META_FUNC(special_i0e) CREATE_UNARY_FLOAT_META_FUNC(special_i1) CREATE_UNARY_FLOAT_META_FUNC(special_i1e) CREATE_UNARY_FLOAT_META_FUNC(special_ndtri) +CREATE_UNARY_FLOAT_META_FUNC(special_log_ndtr) CREATE_UNARY_FLOAT_META_FUNC(sqrt) CREATE_UNARY_FLOAT_META_FUNC(tan) CREATE_UNARY_FLOAT_META_FUNC(tanh) @@ -184,6 +185,7 @@ CREATE_UNARY_TORCH_IMPL_FUNC(special_i0e_out, special_i0e_stub) CREATE_UNARY_TORCH_IMPL_FUNC(special_i1e_out, special_i1e_stub) CREATE_UNARY_TORCH_IMPL_FUNC(special_i1_out, special_i1_stub) CREATE_UNARY_TORCH_IMPL_FUNC(special_ndtri_out, special_ndtri_stub) +CREATE_UNARY_TORCH_IMPL_FUNC(special_log_ndtr_out, special_log_ndtr_stub) CREATE_UNARY_TORCH_IMPL_FUNC(sqrt_out, sqrt_stub) CREATE_UNARY_TORCH_IMPL_FUNC(tan_out, tan_stub) CREATE_UNARY_TORCH_IMPL_FUNC(tanh_out, tanh_stub) @@ -538,7 +540,7 @@ Tensor special_sinc(const Tensor& self) { return self.sinc(); } namespace { inline Tensor calc_ndtr(const Tensor& self) { - auto x_sqrt_2 = self / std::sqrt(2.); + auto x_sqrt_2 = self * M_SQRT1_2; return (1 + at::erf(x_sqrt_2)) * 0.5; } @@ -841,6 +843,7 @@ DEFINE_DISPATCH(log1p_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global- DEFINE_DISPATCH(log2_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) DEFINE_DISPATCH(logical_not_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) DEFINE_DISPATCH(special_ndtri_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) +DEFINE_DISPATCH(special_log_ndtr_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) DEFINE_DISPATCH(neg_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) DEFINE_DISPATCH(nan_to_num_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) DEFINE_DISPATCH(polygamma_stub); // NOLINT(cppcoreguidelines-avoid-non-const-global-variables) diff --git a/aten/src/ATen/native/UnaryOps.h b/aten/src/ATen/native/UnaryOps.h index 0a9afd9cd4dbd0..c0fb139c0b1594 100644 --- a/aten/src/ATen/native/UnaryOps.h +++ b/aten/src/ATen/native/UnaryOps.h @@ -52,6 +52,7 @@ DECLARE_DISPATCH(unary_fn, log10_stub); DECLARE_DISPATCH(unary_fn, log1p_stub); DECLARE_DISPATCH(unary_fn, log2_stub); DECLARE_DISPATCH(unary_fn, special_ndtri_stub); +DECLARE_DISPATCH(unary_fn, special_log_ndtr_stub); DECLARE_DISPATCH(unary_fn, neg_stub); DECLARE_DISPATCH(unary_fn, reciprocal_stub); diff --git a/aten/src/ATen/native/UpSample.cpp b/aten/src/ATen/native/UpSample.cpp index bcc8891de8dcd7..db75b7e99fdb1a 100644 --- a/aten/src/ATen/native/UpSample.cpp +++ b/aten/src/ATen/native/UpSample.cpp @@ -9,7 +9,7 @@ namespace upsample { TORCH_API c10::SmallVector compute_output_size( c10::IntArrayRef input_size, // Full input tensor size. 
- c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { const auto spatial_dimensions = static_cast(input_size.size()) - 2; if (output_size) { diff --git a/aten/src/ATen/native/UpSample.h b/aten/src/ATen/native/UpSample.h index 743188a623b49f..8cc476ab445cb2 100644 --- a/aten/src/ATen/native/UpSample.h +++ b/aten/src/ATen/native/UpSample.h @@ -51,7 +51,7 @@ namespace upsample { TORCH_API c10::SmallVector compute_output_size( c10::IntArrayRef input_size, // Full input tensor size. - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors); inline c10::optional get_scale_value(c10::optional> scales, int idx) { diff --git a/aten/src/ATen/native/UpSampleBicubic2d.cpp b/aten/src/ATen/native/UpSampleBicubic2d.cpp index 95d9f91bcb8036..a23019ecc0eb60 100644 --- a/aten/src/ATen/native/UpSampleBicubic2d.cpp +++ b/aten/src/ATen/native/UpSampleBicubic2d.cpp @@ -264,7 +264,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_bicubic2d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); @@ -275,7 +275,7 @@ Tensor upsample_bicubic2d( Tensor upsample_bicubic2d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, bool align_corners, c10::optional> scale_factors) { @@ -287,7 +287,7 @@ Tensor upsample_bicubic2d_backward( Tensor _upsample_bicubic2d_aa( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); @@ -298,7 +298,7 @@ Tensor _upsample_bicubic2d_aa( Tensor _upsample_bicubic2d_aa_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, bool align_corners, c10::optional> scale_factors) { diff --git a/aten/src/ATen/native/UpSampleBilinear2d.cpp b/aten/src/ATen/native/UpSampleBilinear2d.cpp index f73bb50c9ff426..2a228a86ac71d7 100644 --- a/aten/src/ATen/native/UpSampleBilinear2d.cpp +++ b/aten/src/ATen/native/UpSampleBilinear2d.cpp @@ -145,7 +145,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_bilinear2d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); @@ -156,7 +156,7 @@ Tensor upsample_bilinear2d( Tensor upsample_bilinear2d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, bool align_corners, c10::optional> scale_factors) { @@ -168,7 +168,7 @@ Tensor upsample_bilinear2d_backward( Tensor _upsample_bilinear2d_aa( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); @@ -179,7 +179,7 @@ Tensor _upsample_bilinear2d_aa( Tensor _upsample_bilinear2d_aa_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, bool align_corners, c10::optional> scale_factors) { diff --git a/aten/src/ATen/native/UpSampleLinear1d.cpp 
b/aten/src/ATen/native/UpSampleLinear1d.cpp index 371a53dc890028..687cad5c879bf8 100644 --- a/aten/src/ATen/native/UpSampleLinear1d.cpp +++ b/aten/src/ATen/native/UpSampleLinear1d.cpp @@ -79,7 +79,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_linear1d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); @@ -89,7 +89,7 @@ Tensor upsample_linear1d( Tensor upsample_linear1d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, bool align_corners, c10::optional> scale_factors) { diff --git a/aten/src/ATen/native/UpSampleNearest1d.cpp b/aten/src/ATen/native/UpSampleNearest1d.cpp index 52fa7bcc5c9a5e..b9bc5b3c5b9682 100644 --- a/aten/src/ATen/native/UpSampleNearest1d.cpp +++ b/aten/src/ATen/native/UpSampleNearest1d.cpp @@ -109,7 +109,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_nearest1d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_w = get_scale_value(scale_factors, 0); @@ -118,7 +118,7 @@ Tensor upsample_nearest1d( Tensor _upsample_nearest_exact1d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_w = get_scale_value(scale_factors, 0); @@ -127,7 +127,7 @@ Tensor _upsample_nearest_exact1d( Tensor upsample_nearest1d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); @@ -137,7 +137,7 @@ Tensor upsample_nearest1d_backward( Tensor _upsample_nearest_exact1d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); diff --git a/aten/src/ATen/native/UpSampleNearest2d.cpp b/aten/src/ATen/native/UpSampleNearest2d.cpp index 864121fb0afa0d..1f9a9eafd4f6db 100644 --- a/aten/src/ATen/native/UpSampleNearest2d.cpp +++ b/aten/src/ATen/native/UpSampleNearest2d.cpp @@ -134,7 +134,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_nearest2d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_h = get_scale_value(scale_factors, 0); @@ -144,7 +144,7 @@ Tensor upsample_nearest2d( Tensor _upsample_nearest_exact2d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_h = get_scale_value(scale_factors, 0); @@ -154,7 +154,7 @@ Tensor _upsample_nearest_exact2d( Tensor upsample_nearest2d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); @@ -165,7 +165,7 @@ Tensor 
upsample_nearest2d_backward( Tensor _upsample_nearest_exact2d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); diff --git a/aten/src/ATen/native/UpSampleNearest3d.cpp b/aten/src/ATen/native/UpSampleNearest3d.cpp index c659a86cd81f39..ff559f3e09c07b 100644 --- a/aten/src/ATen/native/UpSampleNearest3d.cpp +++ b/aten/src/ATen/native/UpSampleNearest3d.cpp @@ -149,7 +149,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_nearest3d_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_d = get_scale_value(scale_factors, 0); @@ -160,7 +160,7 @@ Tensor upsample_nearest3d_cpu( Tensor _upsample_nearest_exact3d_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_d = get_scale_value(scale_factors, 0); @@ -172,7 +172,7 @@ Tensor _upsample_nearest_exact3d_cpu( // when structured kernels can handle QuantizedCPU, update these overloads to be CompositeExplicitAutograd Tensor upsample_nearest3d_backward_cpu( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); @@ -184,7 +184,7 @@ Tensor upsample_nearest3d_backward_cpu( Tensor _upsample_nearest_exact3d_backward_cpu( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); diff --git a/aten/src/ATen/native/UpSampleTrilinear3d.cpp b/aten/src/ATen/native/UpSampleTrilinear3d.cpp index 75a77a76c623d2..256e5e235b461a 100644 --- a/aten/src/ATen/native/UpSampleTrilinear3d.cpp +++ b/aten/src/ATen/native/UpSampleTrilinear3d.cpp @@ -90,7 +90,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_trilinear3d( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); @@ -102,7 +102,7 @@ Tensor upsample_trilinear3d( Tensor upsample_trilinear3d_backward( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, bool align_corners, c10::optional> scale_factors) { diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp index e0fb55427a77f3..187ed4fd1404ab 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp @@ -2,7 +2,6 @@ #include #include -#include #include #include #include diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp index a0a389f818c480..ec6e160b16c3e6 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp @@ -1,7 +1,6 @@ 
#include #include -#include #include #include #include diff --git a/aten/src/ATen/native/cpu/Activation.cpp b/aten/src/ATen/native/cpu/Activation.cpp index 1eebcde30c9edf..637972e5ff6198 100644 --- a/aten/src/ATen/native/cpu/Activation.cpp +++ b/aten/src/ATen/native/cpu/Activation.cpp @@ -24,41 +24,106 @@ namespace { template inline void _vec_log_sigmoid(TensorBase &output, TensorBase &buffer, const TensorBase &input) { - using Vec = Vectorized; - scalar_t* output_data = output.data_ptr(); - scalar_t* buffer_data = buffer.data_ptr(); - scalar_t* input_data = input.data_ptr(); - parallel_for(0, input.numel(), 1, [&] (int64_t begin, int64_t end) { - int64_t size = end - begin; - int64_t d = 0; - for (; d < size - (size % Vec::size()); d += Vec::size()) { - Vec data_vec = Vec::loadu(input_data + begin+ d); - Vec min_vec = vec::minimum(data_vec, Vec(scalar_t(0))); - Vec buffer_vec = data_vec.abs().neg().exp(); - Vec output_vec = min_vec - buffer_vec.log1p(); - buffer_vec.store(buffer_data + begin + d); - output_vec.store(output_data + begin + d); - } - if (size - d > 0) { - Vec data_vec = Vec::loadu(input_data + begin + d, size - d); - Vec min_vec = vec::minimum(data_vec, Vec(scalar_t(0))); - Vec buffer_vec = data_vec.abs().neg().exp(); - Vec output_vec = min_vec - buffer_vec.log1p(); - buffer_vec.store(buffer_data + begin + d, size - d); - output_vec.store(output_data + begin + d, size - d); - } - }); + if (input.scalar_type() == kBFloat16) { + using Vec = Vectorized; + BFloat16* output_data = output.data_ptr(); + BFloat16* buffer_data = buffer.data_ptr(); + BFloat16* input_data = input.data_ptr(); + parallel_for(0, input.numel(), 1, [&] (int64_t begin, int64_t end) { + int64_t size = end - begin; + int64_t d = 0; + for (; d < size - (size % Vec::size()); d += Vec::size()) { + Vec data_vec = Vec::loadu(input_data + begin+ d); + Vectorized data_vec0, data_vec1; + std::tie(data_vec0, data_vec1) = convert_bfloat16_float(data_vec); + Vectorized min_vec = minimum(data_vec0, Vectorized(float(0))); + Vectorized buffer_vec0 = data_vec0.abs().neg().exp(); + Vectorized output_vec0 = min_vec - buffer_vec0.log1p(); + min_vec = minimum(data_vec1, Vectorized(float(0))); + Vectorized buffer_vec1 = data_vec1.abs().neg().exp(); + Vectorized output_vec1 = min_vec - buffer_vec1.log1p(); + convert_float_bfloat16(buffer_vec0, buffer_vec1).store(buffer_data + begin + d); + convert_float_bfloat16(output_vec0, output_vec1).store(output_data + begin + d); + } + if (size - d > 0) { + Vec data_vec = Vec::loadu(input_data + begin + d, size - d); + Vectorized data_vec0, data_vec1; + std::tie(data_vec0, data_vec1) = convert_bfloat16_float(data_vec); + Vectorized min_vec = minimum(data_vec0, Vectorized(float(0))); + Vectorized buffer_vec0 = data_vec0.abs().neg().exp(); + Vectorized output_vec0 = min_vec - buffer_vec0.log1p(); + min_vec = minimum(data_vec1, Vectorized(float(0))); + Vectorized buffer_vec1 = data_vec1.abs().neg().exp(); + Vectorized output_vec1 = min_vec - buffer_vec1.log1p(); + convert_float_bfloat16(buffer_vec0, buffer_vec1).store(buffer_data + begin + d, size - d); + convert_float_bfloat16(output_vec0, output_vec1).store(output_data + begin + d, size - d); + } + }); + } else { + using Vec = Vectorized; + scalar_t* output_data = output.data_ptr(); + scalar_t* buffer_data = buffer.data_ptr(); + scalar_t* input_data = input.data_ptr(); + parallel_for(0, input.numel(), 1, [&] (int64_t begin, int64_t end) { + int64_t size = end - begin; + int64_t d = 0; + for (; d < size - (size % Vec::size()); d += 
Vec::size()) { + Vec data_vec = Vec::loadu(input_data + begin+ d); + Vec min_vec = vec::minimum(data_vec, Vec(scalar_t(0))); + Vec buffer_vec = data_vec.abs().neg().exp(); + Vec output_vec = min_vec - buffer_vec.log1p(); + buffer_vec.store(buffer_data + begin + d); + output_vec.store(output_data + begin + d); + } + if (size - d > 0) { + Vec data_vec = Vec::loadu(input_data + begin + d, size - d); + Vec min_vec = vec::minimum(data_vec, Vec(scalar_t(0))); + Vec buffer_vec = data_vec.abs().neg().exp(); + Vec output_vec = min_vec - buffer_vec.log1p(); + buffer_vec.store(buffer_data + begin + d, size - d); + output_vec.store(output_data + begin + d, size - d); + } + }); + } } -static void log_sigmoid_cpu_kernel( - TensorBase &output, TensorBase &buffer, const TensorBase &input) { - AT_DISPATCH_FLOATING_TYPES(input.scalar_type(), "log_sigmoid_cpu", [&] { +static void log_sigmoid_cpu_kernel(TensorBase &output, TensorBase &buffer, const TensorBase &input) { + AT_DISPATCH_FLOATING_TYPES_AND(kBFloat16, input.scalar_type(), "log_sigmoid_cpu", [&] { _vec_log_sigmoid(output, buffer, input); }); } static void log_sigmoid_backward_cpu_kernel(TensorIterator& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "log_sigmoid_backward_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + using Vec = Vectorized; + auto zero_val = float(0); + auto zero_vec = Vectorized(zero_val); + auto one_val = float(1); + auto one_vec = Vectorized(one_val); + cpu_kernel_vec(iter, + [=](BFloat16 a, BFloat16 b, BFloat16 c) -> BFloat16 { + auto in_negative = float(a) < float(0); + auto max_deriv = in_negative ? float(1) : float(0); + auto sign = in_negative ? float(1) : -float(1); + return (max_deriv - sign * (float(b) / (float(1) + b))) * float(c); + }, + [=](Vec a, Vec b, Vec c) -> Vec { + Vectorized a0, a1, b0, b1, c0, c1; + std::tie(a0, a1) = convert_bfloat16_float(a); + std::tie(b0, b1) = convert_bfloat16_float(b); + std::tie(c0, c1) = convert_bfloat16_float(c); + auto mask = a0 < zero_vec; + auto max_deriv_vec = Vectorized::blendv(zero_vec, one_vec, mask); + auto sign_vec = Vectorized::blendv(one_vec.neg(), one_vec, mask); + a0 = (max_deriv_vec - sign_vec * (b0 / (one_vec + b0))) * c0; + mask = a1 < zero_vec; + max_deriv_vec = Vectorized::blendv(zero_vec, one_vec, mask); + sign_vec = Vectorized::blendv(one_vec.neg(), one_vec, mask); + a1 = (max_deriv_vec - sign_vec * (b1 / (one_vec + b1))) * c1; + return convert_float_bfloat16(a0, a1); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "log_sigmoid_backward_cpu", [&]() { using Vec = Vectorized; auto zero_val = scalar_t(0); auto zero_vec = Vec(zero_val); @@ -78,6 +143,7 @@ static void log_sigmoid_backward_cpu_kernel(TensorIterator& iter) { return (max_deriv_vec - sign_vec * (b / (one_vec + b))) * c; }); }); + } } static void threshold_kernel( @@ -318,7 +384,34 @@ void GeluBackwardKernelImpl(TensorIteratorBase& it, GeluType approximate) { } void hardsigmoid_kernel(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardsigmoid_cpu", [&] { + if (iter.dtype() == kBFloat16) { + const float zero(0.0f); + const float three(3.0f); + const float six(6.0f); + using Vec = vec::Vectorized; + const Vec kZeroVec(zero); + const Vec kThreeVec(three); + const Vec kSixVec(six); + cpu_kernel_vec( + iter, + [&](BFloat16 self_val) -> BFloat16 { + return std::min(std::max(float(self_val) + three, zero), six) / six; + }, + [&](vec::Vectorized self_val) -> vec::Vectorized { + Vectorized self_val0, self_val1; + std::tie(self_val0, self_val1) = 
convert_bfloat16_float(self_val); + self_val0 = minimum( + maximum(self_val0 + kThreeVec, kZeroVec), + kSixVec + ) / kSixVec; + self_val1 = minimum( + maximum(self_val1 + kThreeVec, kZeroVec), + kSixVec + ) / kSixVec; + return convert_float_bfloat16(self_val0, self_val1); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardsigmoid_cpu", [&] { const scalar_t zero(0.0f); const scalar_t three(3.0f); const scalar_t six(6.0f); @@ -338,10 +431,37 @@ void hardsigmoid_kernel(TensorIteratorBase& iter) { ) / kSixVec; }); }); + } } void hardsigmoid_backward_kernel(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardsigmoid_backward", [&] { + if (iter.dtype() == kBFloat16) { + const float zero(0.0f); + const float three(3.0f); + const float neg_three(-3.0f); + const float one_sixth(1.0f / 6.0f); + using Vec = Vectorized; + Vec kZeroVec(0.0f); + Vec kOneSixthVec(1.0f / 6.0f); + cpu_kernel_vec( + iter, + [=](BFloat16 grad_val, BFloat16 self_val) -> BFloat16 { + return (float(self_val) > neg_three && float(self_val) < three) + ? float(grad_val) * one_sixth + : zero; + }, + [=](Vectorized grad_val, Vectorized self_val) -> Vectorized { + Vec self_val0, self_val1, grad_val0, grad_val1; + std::tie(self_val0, self_val1) = convert_bfloat16_float(self_val); + std::tie(grad_val0, grad_val1) = convert_bfloat16_float(grad_val); + Vec gradNonZeroMask = (self_val0 > neg_three) & (self_val0 < three); + self_val0 = Vec::blendv(kZeroVec, grad_val0 * kOneSixthVec, gradNonZeroMask); + gradNonZeroMask = (self_val1 > neg_three) & (self_val1 < three); + self_val1 = Vec::blendv(kZeroVec, grad_val1 * kOneSixthVec, gradNonZeroMask); + return convert_float_bfloat16(self_val0, self_val1); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardsigmoid_backward", [&] { const scalar_t zero(0.0f); const scalar_t three(3.0f); const scalar_t neg_three(-3.0f); @@ -361,10 +481,11 @@ void hardsigmoid_backward_kernel(TensorIteratorBase& iter) { return Vec::blendv(kZeroVec, grad_val * kOneSixthVec, gradNonZeroMask); }); }); + } } void hardshrink_kernel(TensorIteratorBase& iter, const Scalar& lambd) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardshrink_cpu", [&] { + AT_DISPATCH_FLOATING_TYPES_AND(kBFloat16, iter.dtype(), "hardshrink_cpu", [&] { auto lambd_val = lambd.to(); cpu_kernel_vec( iter, @@ -379,16 +500,43 @@ void hardshrink_kernel(TensorIteratorBase& iter, const Scalar& lambd) { } void softshrink_kernel(TensorIteratorBase& iter, const Scalar& lambd) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "softshrink_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + auto lambd_val = lambd.to(); + auto lambdVec = Vectorized(lambd_val); + cpu_kernel_vec( + iter, + [=](BFloat16 a) -> BFloat16 { + return float(a) > lambd_val ? a - lambd_val : (float(a) < -lambd_val ? a + lambd_val : float(0)); + }, + [=](Vectorized self_val) { + Vectorized self_val0, self_val1; + Vectorized self_val_t0, self_val_t1; + std::tie(self_val0, self_val1) = convert_bfloat16_float(self_val); + self_val_t0 = convert_float_bfloat16((self_val0 > lambdVec) & (self_val0 - lambdVec), (self_val1 > lambdVec) & (self_val1 - lambdVec)); + self_val_t1 = convert_float_bfloat16((self_val0 < -lambd_val) & (self_val0 + lambdVec), (self_val1 < -lambd_val) & (self_val1 + lambdVec)); + return (self_val_t0 | self_val_t1); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "softshrink_cpu", [&]() { auto lambd_val = lambd.to(); - cpu_kernel(iter, [=](scalar_t a) -> scalar_t { - return a > lambd_val ? a - lambd_val : (a < -lambd_val ? 
a + lambd_val : scalar_t(0)); - }); + auto lambdVec = Vectorized(lambd_val); + cpu_kernel_vec( + iter, + [=](scalar_t a) -> scalar_t { + return a > lambd_val ? a - lambd_val : (a < -lambd_val ? a + lambd_val : scalar_t(0)); + }, + [=](Vectorized self_val) { + Vectorized self_val_t0, self_val_t1; + self_val_t0 = (self_val > lambdVec) & (self_val - lambdVec); + self_val_t1 = (self_val < -lambd_val) & (self_val + lambdVec); + return (self_val_t0 | self_val_t1); + }); }); + } } void shrink_backward_kernel(TensorIteratorBase& iter, const Scalar& lambd) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "shrink_backward_cpu", [&] { + AT_DISPATCH_FLOATING_TYPES_AND(kBFloat16, iter.dtype(), "shrink_backward_cpu", [&] { auto lambd_val = lambd.to(); cpu_kernel_vec( iter, @@ -418,7 +566,35 @@ void hardtanh_backward_kernel(TensorIterator& iter, const Scalar& min, const Sca } void hardswish_kernel(TensorIterator& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardswish_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + const float zero(0.0f); + const float three(3.0f); + const float six(6.0f); + using Vec = vec::Vectorized; + const Vec kZeroVec(zero); + const Vec kThreeVec(three); + const Vec kSixVec(six); + cpu_kernel_vec( + iter, + [&](BFloat16 x) -> BFloat16 { + return float(x) * std::min(std::max(float(x) + three, zero), six) / six; + }, + [&](vec::Vectorized x_vec) { + Vectorized x_vec0, x_vec1; + std::tie(x_vec0, x_vec1) = convert_bfloat16_float(x_vec); + x_vec0 = x_vec0 * minimum( + maximum(x_vec0 + kThreeVec, kZeroVec), + kSixVec + ) / kSixVec; + x_vec1 = x_vec1 * minimum( + maximum(x_vec1 + kThreeVec, kZeroVec), + kSixVec + ) / kSixVec; + return convert_float_bfloat16(x_vec0, x_vec1); + } + ); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardswish_cpu", [&]() { const scalar_t zero(0.0f); const scalar_t three(3.0f); const scalar_t six(6.0f); @@ -439,10 +615,58 @@ void hardswish_kernel(TensorIterator& iter) { } ); }); + } } void hardswish_backward_kernel(TensorIterator& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardswish_backward_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + const float zero(0.0f); + const float three(3.0f); + const float neg_three(-3.0f); + const float one_half(0.5f); + using Vec = vec::Vectorized; + const Vec kZeroVec(zero); + const Vec kThreeVec(three); + const Vec kNegThreeVec(neg_three); + const Vec kOneHalfVec(one_half); + cpu_kernel_vec( + iter, + [&](BFloat16 grad_val, BFloat16 self_val) -> BFloat16 { + if (float(self_val) < neg_three) { + return zero; + } else if (float(self_val) <= three) { + return float(grad_val) * ((float(self_val) / three) + one_half); + } else { + return grad_val; + } + }, + [&](vec::Vectorized grad_val, vec::Vectorized self_val) { + Vectorized self_val0, self_val1, grad_val0, grad_val1; + std::tie(self_val0, self_val1) = convert_bfloat16_float(self_val); + std::tie(grad_val0, grad_val1) = convert_bfloat16_float(grad_val); + self_val0 = Vec::blendv( + Vec::blendv( + grad_val0 * ((self_val0 / kThreeVec) + kOneHalfVec), + grad_val0, + self_val0 >= kThreeVec + ), + kZeroVec, + self_val0 < kNegThreeVec + ); + self_val1 = Vec::blendv( + Vec::blendv( + grad_val1 * ((self_val1 / kThreeVec) + kOneHalfVec), + grad_val1, + self_val1 >= kThreeVec + ), + kZeroVec, + self_val1 < kNegThreeVec + ); + return convert_float_bfloat16(self_val0, self_val1); + } + ); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardswish_backward_cpu", [&]() { const scalar_t zero(0.0f); const scalar_t three(3.0f); const scalar_t neg_three(-3.0f); @@ 
-476,6 +700,7 @@ void hardswish_backward_kernel(TensorIterator& iter) { } ); }); + } } static void leaky_relu_kernel(TensorIteratorBase& iter, const Scalar& negval_) { @@ -556,7 +781,28 @@ static void leaky_relu_backward_kernel(TensorIteratorBase& iter, const Scalar& n } void softplus_kernel(TensorIteratorBase& iter, const Scalar& beta_, const Scalar& threshold_) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "softplus_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + using Vec = Vectorized; + auto beta = beta_.to(); + auto threshold = threshold_.to(); + const Vec beta_vec(beta); + const Vec threshold_vec(threshold); + cpu_kernel_vec( + iter, + [beta, threshold](BFloat16 a) -> BFloat16 { + return (float(a) * beta) > threshold ? a + : static_cast((std::log1p(std::exp(float(a) * beta))) / beta); + }, + [beta_vec, threshold_vec](Vectorized a) -> Vectorized { + Vectorized a0, a1; + std::tie(a0, a1) = convert_bfloat16_float(a); + a0 = Vec::blendv((a0 * beta_vec).exp().log1p() / beta_vec, a0, (a0 * beta_vec) > threshold_vec); + a1 = Vec::blendv((a1 * beta_vec).exp().log1p() / beta_vec, a1, (a1 * beta_vec) > threshold_vec); + return convert_float_bfloat16(a0, a1); + } + ); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "softplus_cpu", [&]() { using Vec = Vectorized; auto beta = beta_.to(); auto threshold = threshold_.to(); @@ -573,10 +819,36 @@ void softplus_kernel(TensorIteratorBase& iter, const Scalar& beta_, const Scalar } ); }); + } } void softplus_backward_kernel(TensorIteratorBase& iter, const Scalar& beta_, const Scalar& threshold_) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "softplus_backward_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + using Vec = Vectorized; + auto beta = beta_.to(); + auto threshold = threshold_.to(); + const Vec beta_vec(beta); + const Vec threshold_vec(threshold); + const Vec one_vec(static_cast(1.0)); + cpu_kernel_vec( + iter, + [beta, threshold](BFloat16 a, BFloat16 b) -> BFloat16 { + float z = std::exp(float(b) * beta); + return (float(b) * beta) > threshold ? 
a : static_cast(float(a) * z / (z + float(1.))); + }, + [beta_vec, one_vec, threshold_vec](Vectorized a, Vectorized b) -> Vectorized { + Vectorized a0, a1, b0, b1; + std::tie(a0, a1) = convert_bfloat16_float(a); + std::tie(b0, b1) = convert_bfloat16_float(b); + Vec z = (b0 * beta_vec).exp(); + a0 = Vec::blendv(a0 * z / (z + one_vec), a0, (b0 * beta_vec) > threshold_vec); + z = (b1 * beta_vec).exp(); + a1 = Vec::blendv(a1 * z / (z + one_vec), a1, (b1 * beta_vec) > threshold_vec); + return convert_float_bfloat16(a0, a1); + } + ); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "softplus_backward_cpu", [&]() { using Vec = Vectorized; auto beta = beta_.to(); auto threshold = threshold_.to(); @@ -595,6 +867,7 @@ void softplus_backward_kernel(TensorIteratorBase& iter, const Scalar& beta_, con } ); }); + } } void glu_kernel(TensorIteratorBase& iter) { diff --git a/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp b/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp index 0e5db26b069dce..1f39aeb3256c90 100644 --- a/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp @@ -625,8 +625,33 @@ void fmin_kernel(TensorIteratorBase& iter) { } void smooth_l1_kernel(TensorIteratorBase& iter, double beta) { - AT_DISPATCH_FLOATING_TYPES_AND2( - kBFloat16, kHalf, iter.dtype(), "smooth_l1_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + const float beta_val(beta); + const Vectorized beta_val_vec(beta_val); + const Vectorized point_five_vec(static_cast(0.5)); + cpu_kernel_vec( + iter, + [&beta_val](BFloat16 a, BFloat16 b) -> BFloat16 { + auto z = std::abs(float(a) - float(b)); + return z < beta_val + ? static_cast(0.5) * z * z / beta_val + : z - static_cast(0.5) * beta_val; + }, + [&beta_val_vec, &point_five_vec](Vectorized a, Vectorized b) { + Vectorized a0, a1, b0, b1; + std::tie(a0, a1) = convert_bfloat16_float(a); + std::tie(b0, b1) = convert_bfloat16_float(b); + auto z = (a0 - b0).abs(); + a0 = Vectorized::blendv( + point_five_vec * z * z / beta_val_vec, z - point_five_vec * beta_val_vec, z >= beta_val_vec); + z = (a1 - b1).abs(); + a1 = Vectorized::blendv( + point_five_vec * z * z / beta_val_vec, z - point_five_vec * beta_val_vec, z >= beta_val_vec); + return convert_float_bfloat16(a0, a1); + }); + } else { + AT_DISPATCH_FLOATING_TYPES_AND( + kHalf, iter.dtype(), "smooth_l1_cpu", [&]() { using Vec = Vectorized; const scalar_t beta_val(beta); const Vec beta_val_vec(beta_val); @@ -645,6 +670,7 @@ void smooth_l1_kernel(TensorIteratorBase& iter, double beta) { point_five_vec * z * z / beta_val_vec, z - point_five_vec * beta_val_vec, z >= beta_val_vec); }); }); + } } void huber_kernel(TensorIterator& iter, double delta) { diff --git a/aten/src/ATen/native/cpu/BlasKernel.cpp b/aten/src/ATen/native/cpu/BlasKernel.cpp index 68bb78c0003bbd..7b60e9a45cbac4 100644 --- a/aten/src/ATen/native/cpu/BlasKernel.cpp +++ b/aten/src/ATen/native/cpu/BlasKernel.cpp @@ -191,19 +191,28 @@ void cpublas_gemm_impl( } void cpublas_axpy_impl(at::ScalarType type, int64_t n, const Scalar& _a, const void *_x, int64_t incx, void *_y, int64_t incy){ - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(at::kHalf, at::kBFloat16, type, "cpublas_axpy_impl", - [&] { - auto a = _a.to(); - auto x = static_cast(_x); - auto y = static_cast(_y); + if (type == at::kBool) { + auto a = _a.to(); + auto x = static_cast(_x); + auto y = static_cast(_y); int64_t i; for(i = 0; i < n; i++) - y[i*incy] += a*x[i*incx]; - }); + y[i*incy] |= a & x[i*incx]; + } else { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(at::kHalf, at::kBFloat16, 
type, "cpublas_axpy_impl", + [&] { + auto a = _a.to(); + auto x = static_cast(_x); + auto y = static_cast(_y); + int64_t i; + for(i = 0; i < n; i++) + y[i*incy] += a*x[i*incx]; + }); + } } void cpublas_copy_impl(at::ScalarType type, int64_t n, const void *_x, int64_t incx, void *_y, int64_t incy){ - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(at::kHalf, at::kBFloat16, type, "cpublas_copy_impl", + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::kHalf, at::kBFloat16, at::kBool, type, "cpublas_copy_impl", [&] { auto x = static_cast(_x); auto y = static_cast(_y); diff --git a/aten/src/ATen/native/cpu/ComplexKernel.cpp b/aten/src/ATen/native/cpu/ComplexKernel.cpp index 56d8fc80ae00e9..99dc6134537ea3 100644 --- a/aten/src/ATen/native/cpu/ComplexKernel.cpp +++ b/aten/src/ATen/native/cpu/ComplexKernel.cpp @@ -9,7 +9,7 @@ namespace native { namespace { void complex_kernel(TensorIterator& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.input_dtype(), "complex_cpu", [&]() { + AT_DISPATCH_FLOATING_TYPES_AND(kHalf, iter.input_dtype(), "complex_cpu", [&]() { cpu_kernel(iter, [=](scalar_t a, scalar_t b) -> c10::complex { return c10::complex(a, b); }); diff --git a/aten/src/ATen/native/cpu/CopyKernel.cpp b/aten/src/ATen/native/cpu/CopyKernel.cpp index 6e1d134c3e47bd..40a0c20b5ca8de 100644 --- a/aten/src/ATen/native/cpu/CopyKernel.cpp +++ b/aten/src/ATen/native/cpu/CopyKernel.cpp @@ -81,9 +81,9 @@ void copy_kernel(TensorIterator& iter, bool /*non_blocking*/) { if (dtype == iter.dtype(1)) { copy_same_dtype(iter, requires_conj, requires_neg); } else { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(ScalarType::Half, ScalarType::Bool, ScalarType::BFloat16, dtype, "copy_", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(ScalarType::ComplexHalf, ScalarType::Half, ScalarType::Bool, ScalarType::BFloat16, dtype, "copy_", [&] { using dest_t = scalar_t; - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(ScalarType::Half, ScalarType::Bool, ScalarType::BFloat16, iter.dtype(1), "copy_", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(ScalarType::ComplexHalf, ScalarType::Half, ScalarType::Bool, ScalarType::BFloat16, iter.dtype(1), "copy_", [&] { // Note (@zasdfgbnm): // // The code below can not be simplified as diff --git a/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp b/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp index 549384055f2058..d3be310e280244 100644 --- a/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp @@ -92,7 +92,55 @@ static void addcdiv_cpu_kernel(TensorIteratorBase& iter, const Scalar& value) { static void smooth_l1_backward_cpu_kernel(TensorIterator& iter, const Scalar& norm, double beta) { ScalarType dtype = iter.dtype(0); - AT_DISPATCH_ALL_TYPES(dtype, "smooth_l1_backward_cpu_out", [&] { + if (dtype == kBFloat16) { + auto norm_val = norm.to(); + float beta_val(beta); + auto norm_val_vec = Vectorized(norm_val); + auto beta_val_vec = Vectorized(beta_val); + const auto neg_1_vec = Vectorized(-1); + const auto zero_vec = Vectorized(0); + const auto pos_1_vec = Vectorized(1); + cpu_kernel_vec(iter, + [=](BFloat16 input, BFloat16 target, BFloat16 grad_output) -> BFloat16 { + const auto x = float(input) - float(target); + if (x <= -beta){ + return -norm_val * float(grad_output); + }else if (x >= beta){ + return norm_val * float(grad_output); + }else{ + return norm_val * x * float(grad_output) / beta; + } + }, + [norm_val_vec, beta_val_vec, neg_1_vec, zero_vec, pos_1_vec]( + Vectorized input, Vectorized target, Vectorized grad_output) -> Vectorized { + // using two blendv calls to 
simulate the 3 cases + // 1 if x >= beta + // -1 if x <= -beta + // x / beta if |x| < beta + Vectorized input0, input1, target0, target1, grad_output0, grad_output1; + std::tie(input0, input1) = convert_bfloat16_float(input); + std::tie(target0, target1) = convert_bfloat16_float(target); + std::tie(grad_output0, grad_output1) = convert_bfloat16_float(grad_output); + auto x = input0 - target0; + auto pos_or_neg_1_vec = Vectorized::blendv( + neg_1_vec, pos_1_vec, x > zero_vec); + auto x_abs = x.abs(); + auto output = Vectorized::blendv( + x / beta_val_vec, pos_or_neg_1_vec, x_abs >= beta_val_vec); + input0 = norm_val_vec * output * grad_output0; + + x = input1 - target1; + pos_or_neg_1_vec = Vectorized::blendv( + neg_1_vec, pos_1_vec, x > zero_vec); + x_abs = x.abs(); + output = Vectorized::blendv( + x / beta_val_vec, pos_or_neg_1_vec, x_abs >= beta_val_vec); + input1 = norm_val_vec * output * grad_output1; + return convert_float_bfloat16(input0, input1); + } + ); + } else { + AT_DISPATCH_ALL_TYPES(dtype, "smooth_l1_backward_cpu_out", [&] { auto norm_val = norm.to(); scalar_t beta_val(beta); auto norm_val_vec = Vectorized(norm_val); @@ -126,6 +174,7 @@ static void smooth_l1_backward_cpu_kernel(TensorIterator& iter, const Scalar& no } ); }); + } } static void huber_backward_cpu_kernel(TensorIterator& iter, const Scalar& norm, double delta) { diff --git a/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp b/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp index ee0d457ed2c951..c3ad085e03f3d4 100644 --- a/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp +++ b/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp @@ -35,6 +35,33 @@ class ReduceAdd { }; static ReduceAdd reduce_add; +class ReduceMean { +public: + template + constexpr void operator() (scalar_t * self_data, scalar_t * src_data) const { + *self_data += *src_data; + } +}; +static ReduceMean reduce_mean; + +class ReduceMaximum { +public: + template + constexpr void operator() (scalar_t * self_data, scalar_t * src_data) const { + *self_data = std::max(*self_data, *src_data); + } +}; +static ReduceMaximum reduce_maximum; + +class ReduceMinimum { +public: + template + constexpr void operator() (scalar_t * self_data, scalar_t * src_data) const { + *self_data = std::min(*self_data, *src_data); + } +}; +static ReduceMinimum reduce_minimum; + class TensorAssign { public: template @@ -283,6 +310,273 @@ struct cpu_scatter_gather_base_kernel { } ); } + + void operator()(const Tensor& self, int64_t dim, + const Tensor& index, const Tensor& src, + const std::string& method_name, ReduceMean& kernel_func) { + + auto iter = TensorIteratorConfig() + .check_all_same_dtype(false) + .resize_outputs(false) + // NOLINTNEXTLINE(bugprone-argument-comment) + .declare_static_shape(index.sizes(), /*squash_dim=*/dim) + .add_output(self) + .add_input(src) + .add_input(index) + .build(); + + auto self_dim_stride = ensure_nonempty_stride(self, dim); + auto self_dim_size = ensure_nonempty_size(self, dim); + + auto index_dim_stride = ensure_nonempty_stride(index, dim); + auto index_dim_size = ensure_nonempty_size(index, dim); + + auto src_dim_stride = ensure_nonempty_stride(src, dim); + auto src_dim_size = ensure_nonempty_size(src, dim); + + auto index_upper_bound = is_scatter_like ? 
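// Scalar reference for the three-case smooth L1 gradient that the blendv pairs
// in smooth_l1_backward_cpu_kernel above emulate branchlessly: pick +/-1 from
// the sign of x, then override with x/beta when |x| < beta.  Illustrative
// sketch only; the name is made up.
float smooth_l1_grad_reference(float input, float target, float grad_output,
                               float norm, float beta) {
  float x = input - target;
  if (x <= -beta) return -norm * grad_output;
  if (x >=  beta) return  norm * grad_output;
  return norm * (x / beta) * grad_output;   // |x| < beta
}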
self_dim_size : src_dim_size; + + int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / index_dim_size); + + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( + ScalarType::Half, ScalarType::BFloat16, iter.dtype(), + "scatter_gather_tensor_cpu_reduce_mean", [&] { + constexpr auto SELF_ITER_STRIDE_IDX = 0; + constexpr auto INDEX_ITER_STRIDE_IDX = 2; + constexpr auto SRC_ITER_STRIDE_IDX = 1; + auto loop = [&](char** data, const int64_t* strides, int64_t n) { + auto* self_data_bytes = data[SELF_ITER_STRIDE_IDX]; + auto* index_data_bytes = data[INDEX_ITER_STRIDE_IDX]; + auto* src_data_bytes = data[SRC_ITER_STRIDE_IDX]; + // we change the order of TensorIterator-dim loop + // vs dim-TensorIterator loop order depending on + // whether dim is the last dimension + if (dim== self.dim() - 1) { + for (const auto nelem : c10::irange(n)) { + (void)nelem; //Suppress unused variable warning + // dim loop is a separate code block + // for better performance + _cpu_scatter_gather_dim_loop()( + (scalar_t*)self_data_bytes, self_dim_stride, + (int64_t*)index_data_bytes, index_dim_stride, + (scalar_t*)src_data_bytes, src_dim_stride, + dim, index_dim_size, index_upper_bound, + kernel_func + ); + + self_data_bytes += strides[SELF_ITER_STRIDE_IDX]; + index_data_bytes += strides[INDEX_ITER_STRIDE_IDX]; + src_data_bytes += strides[SRC_ITER_STRIDE_IDX]; + } + } + else { + for (const auto i : c10::irange(index_dim_size)) { + auto* self_data = self_data_bytes; + auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); + auto* src_data = src_data_bytes; + for (const auto nelem : c10::irange(n)) { + (void)nelem; //Suppress unused variable warning + int64_t idx_dim = *(int64_t*)index_data; + // we are not putting idx_dim in the error message because it disables + // loop optimization in clang-7 + TORCH_CHECK(idx_dim >= 0 && idx_dim < index_upper_bound, + "index ", *(int64_t*)index_data, + " is out of bounds for dimension ", dim, + " with size ", index_upper_bound); + + kernel_func( + (scalar_t*)self_data + (is_scatter_like ? idx_dim : i) * self_dim_stride, + (scalar_t*)src_data + (is_scatter_like ? i : idx_dim) * src_dim_stride); + + self_data += strides[SELF_ITER_STRIDE_IDX]; + index_data += strides[INDEX_ITER_STRIDE_IDX]; + src_data += strides[SRC_ITER_STRIDE_IDX]; + } + } + } + }; + iter.for_each(loop, grain_size); + } + ); + } + + void operator()(const Tensor& self, int64_t dim, + const Tensor& index, const Tensor& src, + const std::string& method_name, ReduceMaximum& kernel_func) { + + auto iter = TensorIteratorConfig() + .check_all_same_dtype(false) + .resize_outputs(false) + // NOLINTNEXTLINE(bugprone-argument-comment) + .declare_static_shape(index.sizes(), /*squash_dim=*/dim) + .add_output(self) + .add_input(src) + .add_input(index) + .build(); + + auto self_dim_stride = ensure_nonempty_stride(self, dim); + auto self_dim_size = ensure_nonempty_size(self, dim); + + auto index_dim_stride = ensure_nonempty_stride(index, dim); + auto index_dim_size = ensure_nonempty_size(index, dim); + + auto src_dim_stride = ensure_nonempty_stride(src, dim); + auto src_dim_size = ensure_nonempty_size(src, dim); + + auto index_upper_bound = is_scatter_like ? 
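// One-dimensional illustration of the indexing used by the scatter/gather
// loops above: for a scatter-style reduce, `index` selects the destination row
// in self while the source walks linearly, and every index is bounds-checked
// before use.  The helper and its signature are hypothetical; the reduce
// functor matches the ReduceAdd/ReduceMean/ReduceMaximum/ReduceMinimum shape.
#include <cstdint>
#include <stdexcept>

template <typename scalar_t, typename ReduceOp>
void scatter_reduce_1d(scalar_t* self, int64_t self_size,
                       const int64_t* index, const scalar_t* src,
                       int64_t n, ReduceOp reduce) {
  for (int64_t i = 0; i < n; ++i) {
    int64_t idx = index[i];
    if (idx < 0 || idx >= self_size) {      // mirrors the TORCH_CHECK above
      throw std::out_of_range("index out of bounds");
    }
    reduce(self + idx, src + i);            // e.g. *self += *src for "sum"
  }
}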
self_dim_size : src_dim_size; + + int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / index_dim_size); + + AT_DISPATCH_ALL_TYPES_AND3( + ScalarType::Bool, ScalarType::Half, ScalarType::BFloat16, iter.dtype(), + "scatter_gather_tensor_cpu_reduce_amax", [&] { + constexpr auto SELF_ITER_STRIDE_IDX = 0; + constexpr auto INDEX_ITER_STRIDE_IDX = 2; + constexpr auto SRC_ITER_STRIDE_IDX = 1; + auto loop = [&](char** data, const int64_t* strides, int64_t n) { + auto* self_data_bytes = data[SELF_ITER_STRIDE_IDX]; + auto* index_data_bytes = data[INDEX_ITER_STRIDE_IDX]; + auto* src_data_bytes = data[SRC_ITER_STRIDE_IDX]; + // we change the order of TensorIterator-dim loop + // vs dim-TensorIterator loop order depending on + // whether dim is the last dimension + if (dim== self.dim() - 1) { + for (const auto nelem : c10::irange(n)) { + (void)nelem; //Suppress unused variable warning + // dim loop is a separate code block + // for better performance + _cpu_scatter_gather_dim_loop()( + (scalar_t*)self_data_bytes, self_dim_stride, + (int64_t*)index_data_bytes, index_dim_stride, + (scalar_t*)src_data_bytes, src_dim_stride, + dim, index_dim_size, index_upper_bound, + kernel_func + ); + + self_data_bytes += strides[SELF_ITER_STRIDE_IDX]; + index_data_bytes += strides[INDEX_ITER_STRIDE_IDX]; + src_data_bytes += strides[SRC_ITER_STRIDE_IDX]; + } + } + else { + for (const auto i : c10::irange(index_dim_size)) { + auto* self_data = self_data_bytes; + auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); + auto* src_data = src_data_bytes; + for (const auto nelem : c10::irange(n)) { + (void)nelem; //Suppress unused variable warning + int64_t idx_dim = *(int64_t*)index_data; + // we are not putting idx_dim in the error message because it disables + // loop optimization in clang-7 + TORCH_CHECK(idx_dim >= 0 && idx_dim < index_upper_bound, + "index ", *(int64_t*)index_data, + " is out of bounds for dimension ", dim, + " with size ", index_upper_bound); + + kernel_func( + (scalar_t*)self_data + (is_scatter_like ? idx_dim : i) * self_dim_stride, + (scalar_t*)src_data + (is_scatter_like ? i : idx_dim) * src_dim_stride); + + self_data += strides[SELF_ITER_STRIDE_IDX]; + index_data += strides[INDEX_ITER_STRIDE_IDX]; + src_data += strides[SRC_ITER_STRIDE_IDX]; + } + } + } + }; + iter.for_each(loop, grain_size); + } + ); + } + + void operator()(const Tensor& self, int64_t dim, + const Tensor& index, const Tensor& src, + const std::string& method_name, ReduceMinimum& kernel_func) { + + auto iter = TensorIteratorConfig() + .check_all_same_dtype(false) + .resize_outputs(false) + // NOLINTNEXTLINE(bugprone-argument-comment) + .declare_static_shape(index.sizes(), /*squash_dim=*/dim) + .add_output(self) + .add_input(src) + .add_input(index) + .build(); + + auto self_dim_stride = ensure_nonempty_stride(self, dim); + auto self_dim_size = ensure_nonempty_size(self, dim); + + auto index_dim_stride = ensure_nonempty_stride(index, dim); + auto index_dim_size = ensure_nonempty_size(index, dim); + + auto src_dim_stride = ensure_nonempty_stride(src, dim); + auto src_dim_size = ensure_nonempty_size(src, dim); + + auto index_upper_bound = is_scatter_like ? 
self_dim_size : src_dim_size; + + int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / index_dim_size); + + AT_DISPATCH_ALL_TYPES_AND3( + ScalarType::Bool, ScalarType::Half, ScalarType::BFloat16, iter.dtype(), + "scatter_gather_tensor_cpu_reduce_amin", [&] { + constexpr auto SELF_ITER_STRIDE_IDX = 0; + constexpr auto INDEX_ITER_STRIDE_IDX = 2; + constexpr auto SRC_ITER_STRIDE_IDX = 1; + auto loop = [&](char** data, const int64_t* strides, int64_t n) { + auto* self_data_bytes = data[SELF_ITER_STRIDE_IDX]; + auto* index_data_bytes = data[INDEX_ITER_STRIDE_IDX]; + auto* src_data_bytes = data[SRC_ITER_STRIDE_IDX]; + // we change the order of TensorIterator-dim loop + // vs dim-TensorIterator loop order depending on + // whether dim is the last dimension + if (dim== self.dim() - 1) { + for (const auto nelem : c10::irange(n)) { + (void)nelem; //Suppress unused variable warning + // dim loop is a separate code block + // for better performance + _cpu_scatter_gather_dim_loop()( + (scalar_t*)self_data_bytes, self_dim_stride, + (int64_t*)index_data_bytes, index_dim_stride, + (scalar_t*)src_data_bytes, src_dim_stride, + dim, index_dim_size, index_upper_bound, + kernel_func + ); + + self_data_bytes += strides[SELF_ITER_STRIDE_IDX]; + index_data_bytes += strides[INDEX_ITER_STRIDE_IDX]; + src_data_bytes += strides[SRC_ITER_STRIDE_IDX]; + } + } + else { + for (const auto i : c10::irange(index_dim_size)) { + auto* self_data = self_data_bytes; + auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); + auto* src_data = src_data_bytes; + for (const auto nelem : c10::irange(n)) { + (void)nelem; //Suppress unused variable warning + int64_t idx_dim = *(int64_t*)index_data; + // we are not putting idx_dim in the error message because it disables + // loop optimization in clang-7 + TORCH_CHECK(idx_dim >= 0 && idx_dim < index_upper_bound, + "index ", *(int64_t*)index_data, + " is out of bounds for dimension ", dim, + " with size ", index_upper_bound); + + kernel_func( + (scalar_t*)self_data + (is_scatter_like ? idx_dim : i) * self_dim_stride, + (scalar_t*)src_data + (is_scatter_like ? 
i : idx_dim) * src_dim_stride); + + self_data += strides[SELF_ITER_STRIDE_IDX]; + index_data += strides[INDEX_ITER_STRIDE_IDX]; + src_data += strides[SRC_ITER_STRIDE_IDX]; + } + } + } + }; + iter.for_each(loop, grain_size); + } + ); + } }; void gather_cpu_kernel(const Tensor& result, const Tensor& self, int64_t dim, const Tensor& index) { @@ -319,6 +613,34 @@ void scatter_reduce_cpu_kernel(const Tensor& self, const int64_t dim, const Tens cpu_scatter_gather_base_kernel<>()(self, dim, index, src, "scatter_reduce_multiply_", reduce_multiply); break; + default : + break; + } +} + +void scatter_reduce_two_cpu_kernel(const Tensor& self, const int64_t dim, const Tensor& index, + const Tensor& src, const SCATTER_GATHER_OP& reduce) { + switch (reduce) { + case SCATTER_GATHER_OP::REDUCE_ADD : + cpu_scatter_gather_base_kernel<>()(self, dim, index, src, + "scatter_reduce_sum_", reduce_add); + break; + case SCATTER_GATHER_OP::REDUCE_MULTIPLY : + cpu_scatter_gather_base_kernel<>()(self, dim, index, src, + "scatter_reduce_prod_", reduce_multiply); + break; + case SCATTER_GATHER_OP::REDUCE_MAXIMUM : + cpu_scatter_gather_base_kernel<>()(self, dim, index, src, + "scatter_reduce_amax_", reduce_maximum); + break; + case SCATTER_GATHER_OP::REDUCE_MINIMUM : + cpu_scatter_gather_base_kernel<>()(self, dim, index, src, + "scatter_reduce_amin_", reduce_minimum); + break; + case SCATTER_GATHER_OP::REDUCE_MEAN : + cpu_scatter_gather_base_kernel<>()(self, dim, index, src, + "scatter_reduce_mean_", reduce_mean); + break; } } @@ -333,6 +655,8 @@ void scatter_scalar_reduce_cpu_kernel(const Tensor& self, const int64_t dim, con cpu_scatter_gather_base_kernel<>()(self, dim, index, value, "scatter_scalar_reduce_multiply_", reduce_multiply); break; + default: + break; } } @@ -344,5 +668,6 @@ REGISTER_DISPATCH(scatter_fill_stub, &scatter_fill_cpu_kernel); REGISTER_DISPATCH(scatter_add_stub, &scatter_add_cpu_kernel); REGISTER_DISPATCH(scatter_reduce_stub, &scatter_reduce_cpu_kernel); REGISTER_DISPATCH(scatter_scalar_reduce_stub, &scatter_scalar_reduce_cpu_kernel); +REGISTER_DISPATCH(scatter_reduce_two_stub, &scatter_reduce_two_cpu_kernel); }} // namespace at::native diff --git a/aten/src/ATen/native/cpu/SortingKernel.cpp b/aten/src/ATen/native/cpu/SortingKernel.cpp index 829cfd87acfc97..715e7f1d605cd5 100644 --- a/aten/src/ATen/native/cpu/SortingKernel.cpp +++ b/aten/src/ATen/native/cpu/SortingKernel.cpp @@ -47,6 +47,10 @@ void _dim_apply( auto* values_data_bytes = data[0]; auto* indices_data_bytes = data[1]; + if(values_data_bytes==nullptr || indices_data_bytes==nullptr){ + return; + } + for (const auto i : c10::irange(n)) { (void)i; //Suppress unused variable warning f( diff --git a/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp b/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp index 8d862615cc5d1c..11661982d279d2 100644 --- a/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp @@ -504,6 +504,13 @@ static void ndtri_kernel(TensorIteratorBase& iter) { }); } +static void log_ndtr_kernel(TensorIteratorBase& iter) { + TORCH_INTERNAL_ASSERT(iter.ntensors() == 2); + AT_DISPATCH_FLOATING_TYPES(iter.common_dtype(), "log_ndtr_cpu", [&]() { + cpu_kernel(iter, [](scalar_t x) { return calc_log_ndtr(x); }); + }); +} + static void i0e_kernel(TensorIteratorBase& iter) { TORCH_INTERNAL_ASSERT(iter.ntensors() == 2); AT_DISPATCH_FLOATING_TYPES_AND( @@ -641,6 +648,7 @@ REGISTER_DISPATCH(special_entr_stub, &CPU_CAPABILITY::entr_kernel); REGISTER_DISPATCH(frexp_stub, &CPU_CAPABILITY::frexp_kernel); 
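// Reference for the math behind log_ndtr_kernel above: the log of the standard
// normal CDF, using Phi(x) = erfc(-x/sqrt(2)) / 2 and a log1p form on the
// right so values near 1 stay accurate.  Double-precision sketch; the CUDA
// string further below additionally uses erfcx to keep the far left tail
// representable, which plain erfc cannot do.
#include <cmath>

double log_ndtr_reference(double x) {
  const double t = x * 0.7071067811865476;  // x / sqrt(2)
  return x < -1.0 ? std::log(0.5 * std::erfc(-t))
                  : std::log1p(-0.5 * std::erfc(t));
}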
REGISTER_DISPATCH(special_i0e_stub, &CPU_CAPABILITY::i0e_kernel); REGISTER_DISPATCH(special_ndtri_stub, &CPU_CAPABILITY::ndtri_kernel); +REGISTER_DISPATCH(special_log_ndtr_stub, &CPU_CAPABILITY::log_ndtr_kernel); REGISTER_DISPATCH(special_i1_stub, &CPU_CAPABILITY::i1_kernel); REGISTER_DISPATCH(special_i1e_stub, &CPU_CAPABILITY::i1e_kernel); REGISTER_DISPATCH(special_erfcx_stub, &CPU_CAPABILITY::erfcx_kernel); diff --git a/aten/src/ATen/native/cpu/layer_norm_kernel.cpp b/aten/src/ATen/native/cpu/layer_norm_kernel.cpp index bd3cfc564c531a..e1af1658d1a345 100644 --- a/aten/src/ATen/native/cpu/layer_norm_kernel.cpp +++ b/aten/src/ATen/native/cpu/layer_norm_kernel.cpp @@ -42,10 +42,13 @@ void LayerNormKernelImplInternal( const T* gamma_data = gamma.defined() ? gamma.data_ptr() : nullptr; const T* beta_data = beta.defined() ? beta.data_ptr() : nullptr; T* Y_data = Y->data_ptr(); - T* mean_data = mean->data_ptr(); - T* rstd_data = rstd->data_ptr(); + T* mean_data = mean ? mean->data_ptr() : nullptr; + T* rstd_data = rstd ? rstd->data_ptr() : nullptr; + const bool gamma_null = gamma_data == nullptr; const bool beta_null = beta_data == nullptr; + const bool mean_null = mean_data == nullptr; + const bool rstd_null = rstd_data == nullptr; at::parallel_for(0, M, 1, [&](int64_t start, int64_t end) { for (const auto i : c10::irange(start, end)) { const T* X_ptr = X_data + i * N; @@ -73,8 +76,12 @@ void LayerNormKernelImplInternal( beta_data, N); } - mean_data[i] = mean_val; - rstd_data[i] = rstd_val; + if (!mean_null) { + mean_data[i] = mean_val; + } + if (!rstd_null) { + rstd_data[i] = rstd_val; + } } }); } diff --git a/aten/src/ATen/native/cuda/AbsKernel.cu b/aten/src/ATen/native/cuda/AbsKernel.cu index 3bfc2621d9305f..ad9b0380f6f26b 100644 --- a/aten/src/ATen/native/cuda/AbsKernel.cu +++ b/aten/src/ATen/native/cuda/AbsKernel.cu @@ -1,6 +1,7 @@ #define TORCH_ASSERT_NO_OPERATORS #include #include +#include #include #include #include @@ -14,12 +15,36 @@ struct AbsFunctor { } }; +const char abs_name[] = "abs_kernel"; void abs_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(ScalarType::Half, ScalarType::BFloat16, ScalarType::Bool, iter.dtype(), "abs_cuda", [&]() { - gpu_kernel(iter, AbsFunctor()); - }); + auto dtype = iter.dtype(); + if (at::isComplexType(dtype)) { +#if AT_USE_JITERATOR() + static const auto abs_string = jiterator_stringify( + template T abs_kernel(T x) { return std::abs(x); }); + AT_DISPATCH_COMPLEX_TYPES(dtype, "abs_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/abs_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, abs_string); + }); +#else + AT_DISPATCH_COMPLEX_TYPES(dtype, "abs_cuda", [&]() { + gpu_kernel(iter, AbsFunctor()); + }); +#endif + } else { + AT_DISPATCH_ALL_TYPES_AND3( + ScalarType::Half, + ScalarType::BFloat16, + ScalarType::Bool, + iter.dtype(), + "abs_cuda", + [&]() { gpu_kernel(iter, AbsFunctor()); }); + } } -REGISTER_DISPATCH(abs_stub, &abs_kernel_cuda); + REGISTER_DISPATCH(abs_stub, &abs_kernel_cuda); }} // namespace at::native diff --git a/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu b/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu index 80c74e4e8d9f0c..4ff6d882c85692 100644 --- a/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu +++ b/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu @@ -8,6 +8,7 @@ #include #include #include +#include // NOTE: CUDA on Windows requires that the enclosing function // of a __device__ lambda not have internal linkage. 
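// Shape of the change in LayerNormKernelImplInternal above: the per-row mean
// and rstd are always computed for the normalization itself, but are only
// stored when the caller actually passed destinations.  Plain-float sketch for
// a single row; the function name is made up.
#include <cmath>
#include <cstdint>

void layer_norm_row(const float* x, int64_t n, float eps,
                    const float* gamma, const float* beta,   // may be nullptr
                    float* y, float* mean_out, float* rstd_out) {
  float mean = 0.f, var = 0.f;
  for (int64_t i = 0; i < n; ++i) mean += x[i];
  mean /= n;
  for (int64_t i = 0; i < n; ++i) var += (x[i] - mean) * (x[i] - mean);
  const float rstd = 1.f / std::sqrt(var / n + eps);
  for (int64_t i = 0; i < n; ++i) {
    const float g = gamma ? gamma[i] : 1.f;
    const float b = beta ? beta[i] : 0.f;
    y[i] = (x[i] - mean) * rstd * g + b;
  }
  if (mean_out) *mean_out = mean;   // written only when requested
  if (rstd_out) *rstd_out = rstd;
}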
@@ -15,15 +16,33 @@ namespace at { namespace native { +const char sigmoid_backward_name[] = "sigmoid_backward"; void sigmoid_backward_kernel_cuda(TensorIteratorBase& iter) { - if(isComplexType(iter.dtype())) { - AT_DISPATCH_COMPLEX_TYPES(iter.dtype(), "sigmoid_backward_cuda", [&]() { + auto dtype = iter.dtype(); + if(isComplexType(dtype)) { +#if AT_USE_JITERATOR() + static const auto sigmoid_backward_string = jiterator_stringify( + template + T sigmoid_backward(T a, T b) { + return a * std::conj((T{1.} - b) * b); + } + ); // sigmoid_backward_string + AT_DISPATCH_COMPLEX_TYPES(dtype, "sigmoid_backward_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/ sigmoid_backward_name, + /*return_dtype=*/ scalar_t, + /*common_dtype=*/ scalar_t, + /*arity=*/ 2>(iter, sigmoid_backward_string); + }); +#else + AT_DISPATCH_COMPLEX_TYPES(dtype, "sigmoid_backward_cuda", [&]() { gpu_kernel(iter, [] GPU_LAMBDA(scalar_t a, scalar_t b) -> scalar_t { return a * std::conj((scalar_t{1.} - b) * b); }); }); +#endif } else { - AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "sigmoid_backward_cuda", [&]() { + AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, dtype, "sigmoid_backward_cuda", [&]() { gpu_kernel(iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> scalar_t { return a * (scalar_t(1.) - b) * b; }); diff --git a/aten/src/ATen/native/cuda/Blas.cpp b/aten/src/ATen/native/cuda/Blas.cpp index ec50994fb12809..07ce6dca45e7de 100644 --- a/aten/src/ATen/native/cuda/Blas.cpp +++ b/aten/src/ATen/native/cuda/Blas.cpp @@ -13,6 +13,7 @@ #include #include #else +#include #include #include #include @@ -21,8 +22,10 @@ #include #include #include +#include #include #include +#include #include #include #endif @@ -113,7 +116,29 @@ c10::MaybeOwned prepare_batch_matrix_for_cublas(const Tensor& tensor, bo namespace { -Tensor& addmm_out_cuda_impl(Tensor& result, const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha) { +enum class Activation { + None, + RELU, + GELU, +}; + +#if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 && !defined(_MSC_VER) +cuda::blas::GEMMAndBiasActivationEpilogue activation_to_gemm_and_blas_arg(Activation a) { + switch (a) { + case Activation::None: + return cuda::blas::GEMMAndBiasActivationEpilogue::None; + case Activation::RELU: + return cuda::blas::GEMMAndBiasActivationEpilogue::RELU; + case Activation::GELU: + return cuda::blas::GEMMAndBiasActivationEpilogue::GELU; + default: + TORCH_CHECK(false); + return cuda::blas::GEMMAndBiasActivationEpilogue::None; + } +} +#endif + +Tensor& addmm_out_cuda_impl(Tensor& result, const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha, Activation activation=Activation::None) { // Make sure to keep addmm_cuda below in sync with this code; it // preflights a check to try to avoid actually needing to call // expand(). @@ -129,7 +154,7 @@ Tensor& addmm_out_cuda_impl(Tensor& result, const Tensor& self, const Tensor& ma at::ScalarType scalar_type = self.scalar_type(); c10::MaybeOwned self_; if (&result != &self) { -#if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 && !defined(_MSC_VER) +#if defined(CUDA_VERSION) && CUDA_VERSION >= 11040 && !defined(_MSC_VER) // Strangely, if mat2 has only 1 row or column, we get // CUBLAS_STATUS_INVALID_VALUE error from cublasLtMatmulAlgoGetHeuristic. 
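// Host-side reference for the expression the sigmoid_backward jiterator string
// above encodes for complex dtypes: grad_input = grad_output * conj((1 - b) * b),
// where b is the saved sigmoid output.  Illustrative only.
#include <complex>

std::complex<float> sigmoid_backward_reference(std::complex<float> grad_output,
                                               std::complex<float> b) {
  return grad_output * std::conj((std::complex<float>(1.f) - b) * b);
}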
// self.dim() == 1 && result.dim() == 2 && self.sizes()[0] == mat2_sizes[1] @@ -142,12 +167,6 @@ Tensor& addmm_out_cuda_impl(Tensor& result, const Tensor& self, const Tensor& ma scalar_type == at::ScalarType::Half || scalar_type == at::ScalarType::BFloat16) && mat2_sizes[0] > 1 && mat2_sizes[1] > 1; - - // https://docs.nvidia.com/cuda/cublas/index.html#cublasLt-general-description - // Batch size > 65535 does not work in most cases. - if (mat1_sizes[0] > 65535) { - useLtInterface = false; - } #endif if (!useLtInterface) { self_ = expand_size(self, {mat1_sizes[0], mat2_sizes[1]}, "addmm"); @@ -237,7 +256,19 @@ Tensor& addmm_out_cuda_impl(Tensor& result, const Tensor& self, const Tensor& ma mat2_ld, self.data_ptr(), result_->data_ptr(), - result_ld); + result_ld, +#if 0 + activation_to_gemm_and_blas_arg(activation) +#else + // GELU is not supported (and does not compile!) prior + // to CUDA 11.4. Have observed accuracy issues with + // GELU epilogue in 11.4; disabling the GELU epilogue + // path until we confirm which version it's working in. + activation != Activation::GELU + ? activation_to_gemm_and_blas_arg(activation) + : cuda::blas::GEMMAndBiasActivationEpilogue::None +#endif + ); }); } else #endif @@ -269,8 +300,27 @@ Tensor& addmm_out_cuda_impl(Tensor& result, const Tensor& self, const Tensor& ma result_ptr, result_ld); }); + switch (activation) { + case Activation::RELU: + at::relu_(const_cast(*result_)); + break; + case Activation::GELU: + at::gelu_(const_cast(*result_)); + break; + default: break; + } } +// Preprocessor gate here needs to match the inverse of the check +// gating activation_to_gemm_and_blas_arg above; here we are manually +// performing a post-GELU because we weren't able to use the GELU +// epilogue above. +#if !0 + if (useLtInterface && activation == Activation::GELU) { + at::gelu_(const_cast(*result_)); + } +#endif + if (!result.is_same(*result_)) { result.copy_(*result_); } @@ -354,6 +404,10 @@ TORCH_IMPL_FUNC(addmm_out_cuda)(const Tensor& self, const Tensor& mat1, const Te addmm_out_cuda_impl(const_cast(result), self, mat1, mat2, beta, alpha); } +TORCH_IMPL_FUNC(addmm_activation_out_cuda)(const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha, bool use_gelu, const Tensor& result) { + addmm_out_cuda_impl(const_cast(result), self, mat1, mat2, beta, alpha, use_gelu ? 
Activation::GELU : Activation::RELU); +} + TORCH_IMPL_FUNC(mm_out_cuda)(const Tensor& self, const Tensor& mat2, const Tensor& result) { addmm_out_cuda_impl(const_cast(result), result, self, mat2, 0, 1); } diff --git a/aten/src/ATen/native/cuda/CUDAJitLoops.cuh b/aten/src/ATen/native/cuda/CUDAJitLoops.cuh index b5b1cd5c63bcf9..52274e043038e3 100644 --- a/aten/src/ATen/native/cuda/CUDAJitLoops.cuh +++ b/aten/src/ATen/native/cuda/CUDAJitLoops.cuh @@ -71,7 +71,8 @@ static inline void launch_jitted_unrolled_kernel( std::tuple extra_args) { TORCH_INTERNAL_ASSERT(N > 0 && N <= std::numeric_limits::max()); - const int64_t grid = (N + block_work_size() - 1) / block_work_size(); + //casting result to int is always safe, intermediate is int64 and won't overflow + const uint32_t grid = (N + block_work_size() - 1) / block_work_size(); static std::mutex _jiterator_mutex; static std::vector fns(c10::cuda::device_count()); @@ -114,9 +115,8 @@ static inline void launch_jitted_unrolled_kernel( // since 7 slots are already filled in `args` args[i + 7] = extra_args_array[i]; } - - at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, grid, num_threads()); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, {grid, 1u, 1u}, + {num_threads(), 1u, 1u}); } template< @@ -129,7 +129,8 @@ template< static inline void launch_jitted_vectorized_kernel(DeviceIndex dev_idx, int64_t N, const std::string& f, array_t data, at::opmath_type scalar_val, std::tuple extra_args) { TORCH_INTERNAL_ASSERT(N > 0 && N <= std::numeric_limits::max()); - const int64_t grid = (N + block_work_size() - 1) / block_work_size(); + // N is still int64_t for the computation, but it's always safe to cast result to int + const uint32_t grid = (N + block_work_size() - 1) / block_work_size(); const int vec_size = memory::jitted_can_vectorize_up_to(data); // Different kernels are compiled depending on what we're vectorizing up to (1, 2 or 4 elements) @@ -195,9 +196,7 @@ at::opmath_type scalar_val, std::tuple extra_args) { // since 3 slots are already filled in `args` args[i + 3] = extra_args_array[i]; } - - at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, grid, num_threads()); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, {grid, 1u, 1u}, {num_threads(), 1u, 1u}); } else { auto ic = TrivialOffsetCalculator(); auto oc = TrivialOffsetCalculator<1>(); @@ -219,8 +218,8 @@ at::opmath_type scalar_val, std::tuple extra_args) { // since 7 slots are already filled in `args` args[i + 7] = extra_args_array[i]; } - at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, grid, num_threads()); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + + at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, {grid, 1u, 1u}, {num_threads(), 1u, 1u}); } } diff --git a/aten/src/ATen/native/cuda/CUDAScalar.cu b/aten/src/ATen/native/cuda/CUDAScalar.cu index 637dd6514f409f..4f2b092573e3fa 100644 --- a/aten/src/ATen/native/cuda/CUDAScalar.cu +++ b/aten/src/ATen/native/cuda/CUDAScalar.cu @@ -15,8 +15,8 @@ namespace native { Scalar _local_scalar_dense_cuda(const Tensor& self) { Scalar r; - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, self.scalar_type(), "_local_scalar_dense_cuda", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kComplexHalf, kHalf, kBool, kBFloat16, self.scalar_type(), "_local_scalar_dense_cuda", [&] { scalar_t value; cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 
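// The grid computation changed in CUDAJitLoops.cuh above: a ceil-division done
// in 64-bit arithmetic and then held in a 32-bit value, which is safe because
// N is asserted to fit a signed 32-bit range before the division.  Standalone
// sketch of the same arithmetic with an assumed block work size.
#include <cstdint>

uint32_t launch_grid_size(int64_t n, int64_t block_work_size) {
  // n > 0 and bounded, so the rounded-up quotient fits in uint32_t
  return static_cast<uint32_t>((n + block_work_size - 1) / block_work_size);
}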
at::cuda::memcpy_and_sync(&value, self.data_ptr(), sizeof(scalar_t), cudaMemcpyDeviceToHost, stream); diff --git a/aten/src/ATen/native/cuda/ComplexKernel.cu b/aten/src/ATen/native/cuda/ComplexKernel.cu index 6420279704e027..8738c0ab4c8ec4 100644 --- a/aten/src/ATen/native/cuda/ComplexKernel.cu +++ b/aten/src/ATen/native/cuda/ComplexKernel.cu @@ -12,7 +12,7 @@ namespace native { namespace { void complex_kernel_cuda(TensorIterator& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.input_dtype(0), "complex_cuda", [&]() { + AT_DISPATCH_FLOATING_TYPES_AND(kHalf, iter.input_dtype(0), "complex_cuda", [&]() { gpu_kernel( iter, [] GPU_LAMBDA(scalar_t a, scalar_t b) -> c10::complex { return c10::complex(a, b); diff --git a/aten/src/ATen/native/cuda/Copy.cu b/aten/src/ATen/native/cuda/Copy.cu index a42a90cbe29306..57f04d481fc5c2 100644 --- a/aten/src/ATen/native/cuda/Copy.cu +++ b/aten/src/ATen/native/cuda/Copy.cu @@ -31,8 +31,8 @@ void direct_copy_kernel_cuda(TensorIteratorBase &iter) { gpu_kernel(iter, [] GPU_LAMBDA(scalar_t x) { return x; }); }); } else { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - kHalf, kBool, kBFloat16, dtype, "copy_", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kHalf, kBool, kBFloat16, kComplexHalf, dtype, "copy_", [&] { gpu_kernel(iter, [] GPU_LAMBDA(scalar_t x) { return x; }); }); } diff --git a/aten/src/ATen/native/cuda/Dropout.cu b/aten/src/ATen/native/cuda/Dropout.cu index 528a43646b9b15..6ec054aa60504f 100644 --- a/aten/src/ATen/native/cuda/Dropout.cu +++ b/aten/src/ATen/native/cuda/Dropout.cu @@ -1,6 +1,9 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#include +#include #include #include #include @@ -11,6 +14,17 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at{ namespace native{ diff --git a/aten/src/ATen/native/cuda/Embedding.cu b/aten/src/ATen/native/cuda/Embedding.cu index f4b5f160b5256d..8a241cabcd2d36 100644 --- a/aten/src/ATen/native/cuda/Embedding.cu +++ b/aten/src/ATen/native/cuda/Embedding.cu @@ -1,5 +1,7 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include @@ -17,6 +19,18 @@ #include #endif +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu b/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu index ef7eb942f26e05..1a2c7627fc730b 100644 --- a/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu +++ b/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu @@ -1,15 +1,26 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include +#include +#include +#include #include -#include #include -#include - #include +#if CUB_SUPPORTS_UNIQUE_BY_KEY() +#include +#endif + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace at { namespace native { @@ -35,7 +46,8 @@ int64_t ceil_div(int64_t x, int64_t y) { template __global__ void krn_partials_per_segment(index_t *ret, const index_t *segment_offsets, - int64_t num_of_segments, int64_t numel) { + int64_t *num_of_segments_ptr, int64_t numel) { + int64_t num_of_segments = *num_of_segments_ptr; const int id = blockIdx.x * blockDim.x + threadIdx.x; if(id < num_of_segments) { const int64_t idx_start = segment_offsets[id]; @@ -52,7 +64,8 @@ void krn_partial_segment_offset( 
const index_t *partials_per_segment, const index_t *partials_per_segment_offset, const index_t *segment_offsets, - int64_t num_of_segments) { + int64_t *num_of_segments_ptr) { + int64_t num_of_segments = *num_of_segments_ptr; const int id = blockIdx.x * blockDim.x + threadIdx.x; if(id < num_of_segments) { index_t idx = partials_per_segment_offset[id]; @@ -71,10 +84,11 @@ __global__ void compute_grad_weight_bags( index_t *offset2bag, index_t *count, ptrdiff_t numel, int64_t stride, int mode_mean, const index_t *bag_size, scalar_t* per_sample_weights, int64_t per_sample_weights_stride, - index_t* segment_offsets, int64_t num_of_segments, + index_t* segment_offsets, int64_t *num_of_segments_ptr, acc_type *grad_weight_per_segment, const int64_t stride_warped) { + int64_t num_of_segments = *num_of_segments_ptr; const int gid = blockIdx.x * blockDim.x + threadIdx.x; const int id = gid / stride_warped; const int startFeature = gid % stride_warped; @@ -115,10 +129,11 @@ __global__ void compute_grad_weight( ptrdiff_t numel, int64_t stride, index_t* segment_offsets, - int64_t num_of_segments, + int64_t *num_of_segments_ptr, acc_type *grad_weight_per_segment, const int64_t stride_warped) { + int64_t num_of_segments = *num_of_segments_ptr; using accscalar_t = acc_type; const int gid = blockIdx.x * blockDim.x + threadIdx.x; const int id = gid / stride_warped; @@ -145,12 +160,14 @@ __global__ void compute_grad_weight( template __global__ void sum_and_scatter( index_t *input, scalar_t *gradWeight, int64_t stride, - index_t* segment_offsets, int64_t num_of_segments, + index_t* segment_offsets, int64_t *num_of_segments_ptr, const acc_type *grad_weight_per_segment, - const index_t *segment_sizes_offsets, int64_t num_of_partial_segments, + const index_t *segment_sizes_offsets, int64_t *num_of_partial_segments_ptr, const int64_t padding_idx, const int64_t stride_warped) { + int64_t num_of_segments = *num_of_segments_ptr; + int64_t num_of_partial_segments = *num_of_partial_segments_ptr; const int gid = blockIdx.x * blockDim.x + threadIdx.x; const int id = gid / stride_warped; const int startFeature = gid % stride_warped; @@ -173,10 +190,23 @@ __global__ void sum_and_scatter( } } +template +__global__ void compute_num_of_partial_segments(index_t *partials_per_segment, index_t *partials_per_segment_offset, int64_t *num_of_segments_ptr, int64_t *output) { + int64_t num_of_segments = *num_of_segments_ptr; + *output = partials_per_segment[num_of_segments-1] + + partials_per_segment_offset[num_of_segments-1]; +} + +__global__ void write_num_of_segments_for_legacy_thrust_path(int64_t *num_of_segments_ptr, int64_t num_of_segments) { + *num_of_segments_ptr = num_of_segments; +} + } // anon namespace +#if !CUB_SUPPORTS_UNIQUE_BY_KEY() template int64_t embedding_backward_cuda_kernel_unique_by_key(const Tensor &sorted_indices, Tensor &segment_offsets); +#endif Tensor embedding_backward_cuda_kernel( const Tensor &grad, @@ -200,19 +230,35 @@ Tensor embedding_backward_cuda_kernel( // spawn a warp per index. In this context, a segment is a number of rows that should // be summarized. 
// Unit: index in `sorted_indices` and `orig_indices` + auto segment_offsets = at::empty({numel}, orig_indices.options()); + auto num_of_segments_tensor = at::empty({}, grad.options().dtype(kLong)); + int64_t *num_of_segments_ptr = num_of_segments_tensor.data_ptr(); +#if !CUB_SUPPORTS_UNIQUE_BY_KEY() AT_DISPATCH_INDEX_TYPES(orig_indices.scalar_type(), "embedding_backward_cuda_kernel", [&] () { - auto segment_offsets = at::empty({numel}, orig_indices.options()); int64_t num_of_segments = embedding_backward_cuda_kernel_unique_by_key(sorted_indices, segment_offsets); + write_num_of_segments_for_legacy_thrust_path<<<1, 1, 0, c10::cuda::getCurrentCUDAStream()>>>(num_of_segments_ptr, num_of_segments); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + }); +#else + AT_DISPATCH_INDEX_TYPES(orig_indices.scalar_type(), "embedding_backward_cuda_kernel", [&] () { + auto num_of_segments_tensor = at::empty({}, grad.options().dtype(kLong)); + cuda::cub::unique_by_key( + sorted_indices.data_ptr(), thrust::make_counting_iterator(0), + nullptr, segment_offsets.data_ptr(), + num_of_segments_ptr, sorted_indices.numel()); + }); +#endif + AT_DISPATCH_INDEX_TYPES(orig_indices.scalar_type(), "embedding_backward_cuda_kernel", [&] () { // We split the segments up into sizes of `NROWS_PER_THREAD` // Compute the number partial-segments per segment (some partial-segments // may not be the full `NROWS_PER_THREAD` number of rows) - auto partials_per_segment = at::empty({num_of_segments}, orig_indices.options()); + auto partials_per_segment = at::empty({numel}, orig_indices.options()); { - krn_partials_per_segment<<>> ( + krn_partials_per_segment<<>> ( partials_per_segment.data_ptr(), segment_offsets.data_ptr(), - num_of_segments, + num_of_segments_ptr, numel); C10_CUDA_KERNEL_LAUNCH_CHECK(); } @@ -221,33 +267,38 @@ Tensor embedding_backward_cuda_kernel( // of each partial-segment in `sorted_indices`, we need to compute the // start position of each _segment_ in `partial_segment_offset`. 
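// What the segment_offsets / num_of_segments step above produces, written out
// in plain C++: given the sorted indices, record the first position of each
// run of equal values and count the runs.  Illustrative reference for the
// unique-by-key result, not the CUB or thrust call itself.
#include <cstdint>
#include <vector>

int64_t build_segment_offsets(const std::vector<int64_t>& sorted_indices,
                              std::vector<int64_t>& segment_offsets) {
  segment_offsets.clear();
  for (size_t i = 0; i < sorted_indices.size(); ++i) {
    if (i == 0 || sorted_indices[i] != sorted_indices[i - 1]) {
      segment_offsets.push_back(static_cast<int64_t>(i));  // a new segment starts here
    }
  }
  return static_cast<int64_t>(segment_offsets.size());     // num_of_segments
}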
// Unit: index in `partial_segment_offset` - auto partials_per_segment_offset = at::empty({num_of_segments}, orig_indices.options()); + auto partials_per_segment_offset = at::empty({numel}, orig_indices.options()); cuda::cub::exclusive_sum( partials_per_segment.data_ptr(), partials_per_segment_offset.data_ptr(), - num_of_segments); + numel); // The total number of partial-segments is the sum of `partials_per_segment_offset` - const int num_of_partial_segments = partials_per_segment[num_of_segments-1].item() + - partials_per_segment_offset[num_of_segments-1].item(); + auto num_of_partial_segments_tensor = at::empty({}, grad.options().dtype(kLong)); + int64_t *num_of_partial_segments_ptr = num_of_partial_segments_tensor.data_ptr(); + compute_num_of_partial_segments<<<1, 1, 0, c10::cuda::getCurrentCUDAStream()>>>( + partials_per_segment.data_ptr(), + partials_per_segment_offset.data_ptr(), + num_of_segments_ptr, num_of_partial_segments_ptr); + C10_CUDA_KERNEL_LAUNCH_CHECK(); // Now we can compute the start position of each partial-segment // Unit: index in `sorted_indices` and `orig_indices` - auto partial_segment_offset = at::empty({num_of_partial_segments}, orig_indices.options()); + auto partial_segment_offset = at::empty({numel}, orig_indices.options()); { - krn_partial_segment_offset<<>> ( + krn_partial_segment_offset<<>> ( partial_segment_offset.data_ptr(), partials_per_segment.data_ptr(), partials_per_segment_offset.data_ptr(), segment_offsets.data_ptr(), - num_of_segments); + num_of_segments_ptr); C10_CUDA_KERNEL_LAUNCH_CHECK(); } const int warp_size = at::cuda::warp_size(); const int stride_warped = ceil_div(stride, warp_size)*warp_size; const int block = std::min(stride_warped, MAX_BLOCK_SIZE); - const int grid = ceil_div(num_of_partial_segments*stride_warped, block); + const int grid = ceil_div(numel*stride_warped, block); AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, grad.scalar_type(), "embedding_bag_backward_cuda_compute_grad_weight", [&] { @@ -260,7 +311,7 @@ Tensor embedding_backward_cuda_kernel( } else { op = grad.options(); } - auto grad_weight_per_segment = at::empty({num_of_partial_segments, stride}, op); + auto grad_weight_per_segment = at::empty({numel, stride}, op); // Compute the sum of each partial-segment and handle bags if (offset2bag.defined()) { compute_grad_weight_bags<<>>( @@ -272,7 +323,7 @@ Tensor embedding_backward_cuda_kernel( per_sample_weights.defined() ? per_sample_weights.data_ptr() : NULL, per_sample_weights.defined() ? per_sample_weights.stride(0) : 0, partial_segment_offset.data_ptr(), - num_of_partial_segments, grad_weight_per_segment.data_ptr(), + num_of_partial_segments_ptr, grad_weight_per_segment.data_ptr(), stride_warped); C10_CUDA_KERNEL_LAUNCH_CHECK(); } else { @@ -282,7 +333,7 @@ Tensor embedding_backward_cuda_kernel( count.defined() ? count.data_ptr() : nullptr, numel, stride, partial_segment_offset.data_ptr(), - num_of_partial_segments, + num_of_partial_segments_ptr, grad_weight_per_segment.data_ptr(), stride_warped); C10_CUDA_KERNEL_LAUNCH_CHECK(); @@ -290,15 +341,15 @@ Tensor embedding_backward_cuda_kernel( // Finally, we sum all the partial-sums and scatter them // into `grad_weight`. 
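// Arithmetic behind compute_num_of_partial_segments above: after the exclusive
// prefix sum of partials_per_segment, the total number of partial segments is
// the last count plus the last exclusive offset, i.e. the full sum.  Plain C++
// reference with a made-up function name.
#include <cstdint>
#include <vector>

int64_t total_partial_segments(const std::vector<int64_t>& partials_per_segment) {
  std::vector<int64_t> offsets(partials_per_segment.size());
  int64_t running = 0;
  for (size_t i = 0; i < partials_per_segment.size(); ++i) {
    offsets[i] = running;                  // exclusive prefix sum
    running += partials_per_segment[i];
  }
  return partials_per_segment.empty()
      ? 0
      : partials_per_segment.back() + offsets.back();   // equals `running`
}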
- const int grid2 = ceil_div(num_of_segments*stride_warped, block); + const int grid2 = ceil_div(numel*stride_warped, block); sum_and_scatter<<>>( sorted_indices.data_ptr(), grad_weight.data_ptr(), stride, segment_offsets.data_ptr(), - num_of_segments, grad_weight_per_segment.data_ptr(), + num_of_segments_ptr, grad_weight_per_segment.data_ptr(), partials_per_segment_offset.data_ptr(), - num_of_partial_segments, + num_of_partial_segments_ptr, padding_idx, stride_warped); C10_CUDA_KERNEL_LAUNCH_CHECK(); diff --git a/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cuh b/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cuh index 7b8fc9576e2178..0d8d45c1defb90 100644 --- a/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cuh +++ b/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cuh @@ -1,10 +1,8 @@ -#include +#pragma once +#include #include #include #include -#include - -#pragma once namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/EmbeddingBag.cu b/aten/src/ATen/native/cuda/EmbeddingBag.cu index c6701aba07b5c8..7ac3a7151b79c5 100644 --- a/aten/src/ATen/native/cuda/EmbeddingBag.cu +++ b/aten/src/ATen/native/cuda/EmbeddingBag.cu @@ -1,12 +1,26 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include #include #include #include -#include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif #include #include @@ -53,7 +67,7 @@ __global__ void EmbeddingBag_updateOutputKernel_max( index_t *offset2bag, int64_t numIndices, int64_t numBags, int64_t featureSize, int64_t weight_stride0, int64_t weight_stride1, index_t *bag_size, index_t *max_indices, - index_t padding_idx, int64_t vocab_size) { + index_t padding_idx) { // the strategy here is that each bag x feature is handled by a single thread @@ -74,7 +88,6 @@ __global__ void EmbeddingBag_updateOutputKernel_max( int64_t bag_size_ = 0; int64_t maxWord = -1; for (int64_t emb = begin; emb < end; emb++) { - CUDA_KERNEL_ASSERT(input[emb] >= 0 && input[emb] < vocab_size); bool pad = (input[emb] == padding_idx); const int64_t weightRow = input[emb] * weight_stride0; scalar_t weightValue = weightFeat[weightRow]; @@ -104,7 +117,7 @@ __global__ void EmbeddingBag_updateOutputKernel_sum_mean( int64_t featureSize, int64_t weight_stride0, int64_t weight_stride1, int mode, index_t *bag_size, scalar_t* per_sample_weights, int64_t per_sample_weights_stride, - index_t padding_idx, int64_t vocab_size) { + index_t padding_idx) { // the strategy here is that each bag x feature is handled by a single thread @@ -125,7 +138,6 @@ __global__ void EmbeddingBag_updateOutputKernel_sum_mean( accscalar_t weightFeatSum = 0; int64_t bag_size_ = 0; for (int64_t emb = begin; emb < end; emb++) { - CUDA_KERNEL_ASSERT(input[emb] >= 0 && input[emb] < vocab_size); bool pad = (input[emb] == padding_idx); const int64_t weightRow = input[emb] * weight_stride0; scalar_t weightValue = weightFeat[weightRow]; @@ -350,7 +362,6 @@ _embedding_bag_cuda(const Tensor &weight, const Tensor &indices_, numBags -= 1; } int64_t featureSize = weight.size(1); - int64_t vocabSize = weight.size(0); auto bag_size = at::empty(offsets.sizes(), indices.options()); auto offset2bag = @@ -384,7 +395,7 @@ _embedding_bag_cuda(const Tensor &weight, const Tensor &indices_, offset2bag.data_ptr(), numIndices, numBags, featureSize, weight.stride(0), weight.stride(1), bag_size.data_ptr(), max_indices.data_ptr(), - padding_idx, vocabSize); + 
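// CPU reference for the per-bag reduction done by the EmbeddingBag
// updateOutputKernel variants above: one (bag, feature) pair accumulates over
// the embedding rows selected by the indices in that bag, skipping
// padding_idx.  Sum mode only, assuming the feature dimension is contiguous;
// the helper name is made up.
#include <cstdint>

float embedding_bag_sum_feature(const float* weight, int64_t weight_stride0,
                                const int64_t* indices, int64_t begin, int64_t end,
                                int64_t feature, int64_t padding_idx) {
  float acc = 0.f;
  for (int64_t emb = begin; emb < end; ++emb) {
    if (indices[emb] == padding_idx) continue;    // padded entries contribute nothing
    acc += weight[indices[emb] * weight_stride0 + feature];
  }
  return acc;
}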
padding_idx); C10_CUDA_KERNEL_LAUNCH_CHECK(); } else { EmbeddingBag_updateOutputKernel_sum_mean<<>>( @@ -394,7 +405,7 @@ _embedding_bag_cuda(const Tensor &weight, const Tensor &indices_, weight.stride(0), weight.stride(1), mode, bag_size.data_ptr(), per_sample_weights.defined() ? per_sample_weights.data_ptr() : NULL, per_sample_weights.defined() ? per_sample_weights.stride(0) : 0, - padding_idx, vocabSize); + padding_idx); C10_CUDA_KERNEL_LAUNCH_CHECK(); } }); diff --git a/aten/src/ATen/native/cuda/Equal.cpp b/aten/src/ATen/native/cuda/Equal.cpp index 401571b2f1f26c..ab8c9adef4e436 100644 --- a/aten/src/ATen/native/cuda/Equal.cpp +++ b/aten/src/ATen/native/cuda/Equal.cpp @@ -1,6 +1,14 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS #include #include -#include +#else +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/FillKernel.cu b/aten/src/ATen/native/cuda/FillKernel.cu index 82813338946285..facceccf8028fc 100644 --- a/aten/src/ATen/native/cuda/FillKernel.cu +++ b/aten/src/ATen/native/cuda/FillKernel.cu @@ -19,7 +19,7 @@ struct FillFunctor { }; void fill_kernel_cuda(TensorIterator& iter, const Scalar& value) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::ScalarType::Bool, at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fill_cuda", [&]() { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(kComplexHalf, kBool, kHalf, kBFloat16, iter.dtype(), "fill_cuda", [&]() { gpu_kernel(iter, FillFunctor(value.to())); }); } diff --git a/aten/src/ATen/native/cuda/ForeachReduceOp.cu b/aten/src/ATen/native/cuda/ForeachReduceOp.cu index 0d6848324252d8..05fb1f6a087d14 100644 --- a/aten/src/ATen/native/cuda/ForeachReduceOp.cu +++ b/aten/src/ATen/native/cuda/ForeachReduceOp.cu @@ -1,6 +1,7 @@ #define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include +#include #include #include #include @@ -24,13 +25,13 @@ namespace native { template struct LpNormFunctor { static_assert(NormType == 1 || NormType == 2, "foreach_norm supports only L1 and L2 norm"); + using opmath_t = typename at::opmath_type; __device__ __forceinline__ void operator() ( int chunk_size, TensorListMetadata& tl, - T* output_per_tensor, + opmath_t* output_per_tensor, const int max_chunks_per_tensor ) { - using opmath_t = typename at::opmath_type; int tensor_loc = tl.block_to_tensor[blockIdx.x]; int chunk_idx = tl.block_to_chunk[blockIdx.x]; int n = tl.numel_for_tensor[tensor_loc]; @@ -82,16 +83,15 @@ struct LpNormFunctor { } }; -template +template> __global__ void lpnorm_cleanup( - T* output_per_tensor, + opmath_t* output_per_tensor, T* ret_per_tensor, int max_chunks_per_tensor) { - using opmath_t = typename at::opmath_type; __shared__ opmath_t vals[512]; - T* output_this_tensor = output_per_tensor + blockIdx.x*max_chunks_per_tensor; - T val = 0; + opmath_t* output_this_tensor = output_per_tensor + blockIdx.x*max_chunks_per_tensor; + opmath_t val = 0; for (int i = threadIdx.x; i < max_chunks_per_tensor; i += blockDim.x) { val += output_this_tensor[i]; } @@ -134,7 +134,7 @@ std::vector foreach_tensor_norm_cuda(TensorList tensors, const Scalar& o } } const auto options = tensors[0].options(); - auto output_per_tensor = at::zeros({ntensors*max_chunks_per_tensor}, options); + auto output_per_tensor = at::zeros({ntensors*max_chunks_per_tensor}, options.dtype(toOpMathType(tensors[0].scalar_type()))); auto ret_per_tensor = at::empty({ntensors}, options); auto tensor_lists = std::vector>{tensors.vec()}; @@ -145,13 +145,13 @@ std::vector 
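// Why output_per_tensor above moves to the opmath dtype: partial sums for a
// norm of Half/BFloat16 tensors are accumulated in float so precision is not
// lost chunk by chunk, and only the final per-tensor result is cast back.
// Scalar sketch of the same idea for an L2 norm; scalar_t stands for a 16-bit
// float type.
#include <cmath>
#include <cstdint>

template <typename scalar_t>
float l2_norm_accumulated_in_float(const scalar_t* data, int64_t n) {
  float acc = 0.f;                          // "opmath" accumulator
  for (int64_t i = 0; i < n; ++i) {
    const float v = static_cast<float>(data[i]);
    acc += v * v;
  }
  return std::sqrt(acc);                    // caller may cast back to scalar_t
}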
foreach_tensor_norm_cuda(TensorList tensors, const Scalar& o multi_tensor_apply<1>( tensor_lists, LpNormFunctor(), - output_per_tensor.data_ptr(), + output_per_tensor.data_ptr(), max_chunks_per_tensor); C10_CUDA_KERNEL_LAUNCH_CHECK(); const at::cuda::OptionalCUDAGuard device_guard(device_of(output_per_tensor)); auto stream = at::cuda::getCurrentCUDAStream(); lpnorm_cleanup<<>>( - output_per_tensor.data_ptr(), + output_per_tensor.data_ptr(), ret_per_tensor.data_ptr(), max_chunks_per_tensor); C10_CUDA_KERNEL_LAUNCH_CHECK(); @@ -163,13 +163,13 @@ std::vector foreach_tensor_norm_cuda(TensorList tensors, const Scalar& o multi_tensor_apply<1>( tensor_lists, LpNormFunctor(), - output_per_tensor.data_ptr(), + output_per_tensor.data_ptr(), max_chunks_per_tensor); C10_CUDA_KERNEL_LAUNCH_CHECK(); const at::cuda::OptionalCUDAGuard device_guard(device_of(output_per_tensor)); auto stream = at::cuda::getCurrentCUDAStream(); lpnorm_cleanup<<>>( - output_per_tensor.data_ptr(), + output_per_tensor.data_ptr(), ret_per_tensor.data_ptr(), max_chunks_per_tensor); C10_CUDA_KERNEL_LAUNCH_CHECK(); diff --git a/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu b/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu index aa898d50a2ce06..46ea4eadf1febe 100644 --- a/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu +++ b/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu @@ -1,16 +1,24 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include #include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu b/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu index 34b238410bb5f2..92a77dc00af539 100644 --- a/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu +++ b/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu @@ -1,17 +1,27 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include #include #include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu b/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu index e2f51503133e52..7c04ce4da351d4 100644 --- a/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu +++ b/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_NO_OPERATORS #include #include diff --git a/aten/src/ATen/native/cuda/GridSampler.cu b/aten/src/ATen/native/cuda/GridSampler.cu index 153b779bdf37c6..bfc3d86b8ab9ed 100644 --- a/aten/src/ATen/native/cuda/GridSampler.cu +++ b/aten/src/ATen/native/cuda/GridSampler.cu @@ -1,5 +1,6 @@ #define TORCH_ASSERT_NO_OPERATORS #include +#include #include #include #include @@ -739,10 +740,14 @@ namespace { } } // namespace -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. void launch_grid_sampler_2d_forward_kernel( const TensorBase &output, const TensorBase &input, const TensorBase &grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. 
+ check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); + auto N = input.size(0); auto H = grid.size(1); auto W = grid.size(2); @@ -777,10 +782,14 @@ void launch_grid_sampler_2d_forward_kernel( } } -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. void launch_grid_sampler_3d_forward_kernel( const TensorBase &output, const TensorBase &input, const TensorBase &grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_3d(input, grid, interpolation_mode); + auto N = input.size(0); auto D = grid.size(1); auto H = grid.size(2); @@ -816,12 +825,16 @@ void launch_grid_sampler_3d_forward_kernel( } } -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. void launch_grid_sampler_2d_backward_kernel( const TensorBase &grad_input, const TensorBase &grad_grid, const TensorBase &grad_output, const TensorBase &input, const TensorBase &grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners, std::array output_mask) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input, grid); + check_grid_sampler_2d(input, grid); + // See Note [Writing Nondeterministic Operations] // Nondeterministic because of atomicAdd usage globalContext().alertNotDeterministic("grid_sampler_2d_backward_cuda"); @@ -873,12 +886,16 @@ void launch_grid_sampler_2d_backward_kernel( } } -// No shape checking needed here. See # NOTE [ grid_sampler Native Functions ]. void launch_grid_sampler_3d_backward_kernel( const TensorBase &grad_input, const TensorBase &grad_grid, const TensorBase& grad_output, const TensorBase& input, const TensorBase& grid, int64_t interpolation_mode, int64_t padding_mode, bool align_corners, std::array output_mask) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. 
+ check_grid_sampler_common(input, grid); + check_grid_sampler_3d(input, grid, interpolation_mode); + // See Note [Writing Nondeterministic Operations] // Nondeterministic because of atomicAdd usage globalContext().alertNotDeterministic("grid_sampler_3d_backward_cuda"); diff --git a/aten/src/ATen/native/cuda/GridSampler.cuh b/aten/src/ATen/native/cuda/GridSampler.cuh index abc86f21749745..a0e3b16c3a43ac 100644 --- a/aten/src/ATen/native/cuda/GridSampler.cuh +++ b/aten/src/ATen/native/cuda/GridSampler.cuh @@ -1,14 +1,9 @@ +#pragma once #include +#include namespace at { namespace native { -namespace detail { - - enum class GridSamplerInterpolation {Bilinear, Nearest, Bicubic}; - enum class GridSamplerPadding {Zeros, Border, Reflection}; - -} // namespace detail - using detail::GridSamplerInterpolation; using detail::GridSamplerPadding; diff --git a/aten/src/ATen/native/cuda/Im2Col.cu b/aten/src/ATen/native/cuda/Im2Col.cu index 053418423adfcc..89b2a1879b4b71 100644 --- a/aten/src/ATen/native/cuda/Im2Col.cu +++ b/aten/src/ATen/native/cuda/Im2Col.cu @@ -1,6 +1,7 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include #include #include #include @@ -10,6 +11,16 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/IndexKernel.cpp b/aten/src/ATen/native/cuda/IndexKernel.cpp index b85baf097559d8..478c96fa6084c3 100644 --- a/aten/src/ATen/native/cuda/IndexKernel.cpp +++ b/aten/src/ATen/native/cuda/IndexKernel.cpp @@ -1,10 +1,21 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include // For at::native::index_out +#include +#include #include -#include #include #include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/Indexing.cu b/aten/src/ATen/native/cuda/Indexing.cu index b215968fea5035..85183274ebfcc3 100644 --- a/aten/src/ATen/native/cuda/Indexing.cu +++ b/aten/src/ATen/native/cuda/Indexing.cu @@ -1,11 +1,13 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include +#include #include -#include +#include #include #include +#include #include #include #include @@ -14,6 +16,18 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + #include #include #include @@ -268,10 +282,11 @@ void index_put_with_sort_kernel(Tensor & self, const c10::List(at::cuda::getCurrentDeviceProperties()->maxGridSize[1], ceil_div(sliceSize, (int64_t) (C10_WARP_SIZE*UNROLL))), + std::min(at::cuda::getCurrentDeviceProperties()->maxGridSize[1], ceil_div(sliceSize, (int64_t) (warp_size*UNROLL))), std::min(std::max(1,nElemBefore), at::cuda::getCurrentDeviceProperties()->maxGridSize[2])); - dim3 block(C10_WARP_SIZE, indices_per_block); + dim3 block(warp_size, indices_per_block); AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, expandedValue.scalar_type(), "indexing_backward", [&] { diff --git a/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu b/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu index f8ac9d3ed8f695..b080a6e5eac2ce 100644 --- a/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu +++ b/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu @@ -1,7 +1,14 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include 
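// The Indexing.cu launch configuration above switches from the compile-time
// C10_WARP_SIZE to a warp size queried from the device, so the same sizing
// works on 32- and 64-lane hardware.  Sketch of the sizing arithmetic only;
// the struct and function are hypothetical and the device query itself is
// omitted.
#include <algorithm>
#include <cstdint>

struct LaunchShape { int64_t grid_y; int64_t block_x; };

LaunchShape index_put_launch_shape(int64_t slice_size, int64_t unroll,
                                   int64_t warp_size, int64_t max_grid_y) {
  const int64_t per_block = warp_size * unroll;
  const int64_t grid_y = std::min(max_grid_y,
                                  (slice_size + per_block - 1) / per_block);
  return {std::max<int64_t>(1, grid_y), warp_size};
}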
#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/cuda/Loss.cu b/aten/src/ATen/native/cuda/Loss.cu index 6afc89592799bd..1f885ff6fe0b5b 100644 --- a/aten/src/ATen/native/cuda/Loss.cu +++ b/aten/src/ATen/native/cuda/Loss.cu @@ -1,14 +1,28 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include #include -#include +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + constexpr float EPSILON = 1e-12; namespace { diff --git a/aten/src/ATen/native/cuda/LossCTC.cu b/aten/src/ATen/native/cuda/LossCTC.cu index 65508b1a956b0f..4e406f7cd4de9b 100644 --- a/aten/src/ATen/native/cuda/LossCTC.cu +++ b/aten/src/ATen/native/cuda/LossCTC.cu @@ -7,15 +7,32 @@ // Graves et al call the probabilities y, we use log_probs (also calling them inputs) // A few optimizations (similar to those here, but also some I didn't take) are described in // 2. Minmin Sun: http://on-demand.gputechconf.com/gtc/2016/presentation/s6383-minmin-sun-speech-recognition.pdf - +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include -#include +#include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/cuda/Math.cuh b/aten/src/ATen/native/cuda/Math.cuh index e063ec7f42fbe3..cbd562f542c546 100644 --- a/aten/src/ATen/native/cuda/Math.cuh +++ b/aten/src/ATen/native/cuda/Math.cuh @@ -7,108 +7,6 @@ namespace at { namespace native { - -// TODO: these functions are unconditionally available because kaiser window depends on them -// TODO: jiterate kaiser window and make them only available when not jiterating -// NOTE: jiterating kaiser window requires extending the jiterator's scalar support -/* - * For licensing information and documentation, please refer to the the cpu implementation located in "ATen/native/Math.h". - */ -template -static inline C10_HOST_DEVICE scalar_t -chbevl(scalar_t _x, const scalar_t array[], size_t len) { - static_assert(!std::is_same() && !std::is_same(), "don't instantiate with low precision type"); - - scalar_t b0, b1, b2; - - b0 = array[0]; - b1 = 0; - - for (size_t i = 1; i < len; ++i) { - b2 = b1; - b1 = b0; - b0 = _x * b1 - b2 + array[i]; - } - - return (0.5 * (b0 - b2)); -} - -/* - * For licensing information and documentation, please refer to the the cpu implementation located in "ATen/native/Math.h". - */ -template -C10_HOST_DEVICE inline std::tuple chebyshev_coefficients_i0e_A() { - /* Chebyshev coefficients for exp(-x) I0(x) - * in the interval [0,8]. - * - * lim(x->0){ exp(-x) I0(x) } = 1. 
- */ - static const T coefficients[] = { - -4.41534164647933937950E-18, 3.33079451882223809783E-17, - -2.43127984654795469359E-16, 1.71539128555513303061E-15, - -1.16853328779934516808E-14, 7.67618549860493561688E-14, - -4.85644678311192946090E-13, 2.95505266312963983461E-12, - -1.72682629144155570723E-11, 9.67580903537323691224E-11, - -5.18979560163526290666E-10, 2.65982372468238665035E-9, - -1.30002500998624804212E-8, 6.04699502254191894932E-8, - -2.67079385394061173391E-7, 1.11738753912010371815E-6, - -4.41673835845875056359E-6, 1.64484480707288970893E-5, - -5.75419501008210370398E-5, 1.88502885095841655729E-4, - -5.76375574538582365885E-4, 1.63947561694133579842E-3, - -4.32430999505057594430E-3, 1.05464603945949983183E-2, - -2.37374148058994688156E-2, 4.93052842396707084878E-2, - -9.49010970480476444210E-2, 1.71620901522208775349E-1, - -3.04682672343198398683E-1, 6.76795274409476084995E-1}; - - return std::make_tuple(coefficients, 30); -} - -template -C10_HOST_DEVICE inline std::tuple chebyshev_coefficients_i0e_B() { - /* Chebyshev coefficients for exp(-x) sqrt(x) I0(x) - * in the inverted interval [8,infinity]. - * - * lim(x->inf){ exp(-x) sqrt(x) I0(x) } = 1/sqrt(2pi). - */ - static const T coefficients[] = { - -7.23318048787475395456E-18, -4.83050448594418207126E-18, - 4.46562142029675999901E-17, 3.46122286769746109310E-17, - -2.82762398051658348494E-16, -3.42548561967721913462E-16, - 1.77256013305652638360E-15, 3.81168066935262242075E-15, - -9.55484669882830764870E-15, -4.15056934728722208663E-14, - 1.54008621752140982691E-14, 3.85277838274214270114E-13, - 7.18012445138366623367E-13, -1.79417853150680611778E-12, - -1.32158118404477131188E-11, -3.14991652796324136454E-11, - 1.18891471078464383424E-11, 4.94060238822496958910E-10, - 3.39623202570838634515E-9, 2.26666899049817806459E-8, - 2.04891858946906374183E-7, 2.89137052083475648297E-6, - 6.88975834691682398426E-5, 3.36911647825569408990E-3, - 8.04490411014108831608E-1}; - - return std::make_tuple(coefficients, 25); -} - -template -static inline C10_HOST_DEVICE scalar_t calc_i0(scalar_t _x) { - static_assert(!std::is_same() && !std::is_same(), "don't instantiate with low precision type"); - // Upcast input for numerical accuracy purposes - // Needed for accurate results if input is bfloat16 or float16 - scalar_t x = ::abs(_x); - - if (x <= scalar_t{8.0}) { - auto coeff_pair = chebyshev_coefficients_i0e_A(); - auto A = std::get<0>(coeff_pair); - auto len = std::get<1>(coeff_pair); - scalar_t y = (x / scalar_t{2.0}) - scalar_t{2.0}; - return (::exp(x) * chbevl(y, A, len)); - } - - auto coeff_pair = chebyshev_coefficients_i0e_B(); - auto B = std::get<0>(coeff_pair); - auto len = std::get<1>(coeff_pair); - return (::exp(x) * chbevl(scalar_t{32.0} / x - scalar_t{2.0}, B, len) / ::sqrt(x)); -} - // See note [Jiterator] // TODO: elaborate in this comment on the structure of math.cuh #if AT_USE_JITERATOR() @@ -276,6 +174,19 @@ const auto ndtri_string = jiterator_stringify( } ); // ndtri_string +const auto log_ndtr_string = jiterator_stringify( + template + T log_ndtr(T x) { + constexpr T SQRT1_2{0.707106781186547524400844362104849039}; // 1/sqrt(2) + T t = x * SQRT1_2; + if (x < T{-1.0}) { + return log(erfcx(-t) / 2) - t * t; + } else { + return log1p(-erfc(t) / 2); + } + } +); // log_ndtr_string + const auto gcd_string = jiterator_stringify( template T gcd(const T a_in, const T b_in) { @@ -555,6 +466,8 @@ const auto entr_string = jiterator_stringify( } ); // entr_string +// NOTE: `kaiser_window_string` depends on `i0_string` +// for its 
implementation. const auto i0_string = jiterator_stringify( template T chbevl(T x, const T array[], const int len) { @@ -629,69 +542,6 @@ const auto i0_string = jiterator_stringify( } ); // i0_string -const auto i0e_string = jiterator_stringify( - template - T chbevl(T x, const T array[], const int len) { - T b0, b1, b2; - - b0 = array[0]; - b1 = 0; - - for (int i = 1; i < len; ++i) { - b2 = b1; - b1 = b0; - b0 = x * b1 - b2 + array[i]; - } - - return T{0.5} * (b0 - b2); - } - - template - T i0e(T _x) { - T x = fabs(_x); - - if (x <= T{8.0}) { - T coefficients[] = { - -4.41534164647933937950E-18, 3.33079451882223809783E-17, - -2.43127984654795469359E-16, 1.71539128555513303061E-15, - -1.16853328779934516808E-14, 7.67618549860493561688E-14, - -4.85644678311192946090E-13, 2.95505266312963983461E-12, - -1.72682629144155570723E-11, 9.67580903537323691224E-11, - -5.18979560163526290666E-10, 2.65982372468238665035E-9, - -1.30002500998624804212E-8, 6.04699502254191894932E-8, - -2.67079385394061173391E-7, 1.11738753912010371815E-6, - -4.41673835845875056359E-6, 1.64484480707288970893E-5, - -5.75419501008210370398E-5, 1.88502885095841655729E-4, - -5.76375574538582365885E-4, 1.63947561694133579842E-3, - -4.32430999505057594430E-3, 1.05464603945949983183E-2, - -2.37374148058994688156E-2, 4.93052842396707084878E-2, - -9.49010970480476444210E-2, 1.71620901522208775349E-1, - -3.04682672343198398683E-1, 6.76795274409476084995E-1}; - - T y = (x / T{2.0}) - T{2.0}; - return chbevl(y, coefficients, int{30}); - } - - // x > 8 - T coefficients[] = { - -7.23318048787475395456E-18, -4.83050448594418207126E-18, - 4.46562142029675999901E-17, 3.46122286769746109310E-17, - -2.82762398051658348494E-16, -3.42548561967721913462E-16, - 1.77256013305652638360E-15, 3.81168066935262242075E-15, - -9.55484669882830764870E-15, -4.15056934728722208663E-14, - 1.54008621752140982691E-14, 3.85277838274214270114E-13, - 7.18012445138366623367E-13, -1.79417853150680611778E-12, - -1.32158118404477131188E-11, -3.14991652796324136454E-11, - 1.18891471078464383424E-11, 4.94060238822496958910E-10, - 3.39623202570838634515E-9, 2.26666899049817806459E-8, - 2.04891858946906374183E-7, 2.89137052083475648297E-6, - 6.88975834691682398426E-5, 3.36911647825569408990E-3, - 8.04490411014108831608E-1}; - - return chbevl(T{32.0} / x - T{2.0}, coefficients, int{25}) / sqrt(x); - } -); // i0e_string - const auto i1_string = jiterator_stringify( template T chbevl(const T x, const T array[], const int len) { @@ -881,6 +731,15 @@ const auto i1e_string = jiterator_stringify( } ); // i1e_string +const auto kaiser_window_string = i0_string + jiterator_stringify( + template + T kaiser_window(T a, T inv_alpha, T beta, T inv_i0_beta) { + T x = a * inv_alpha - T{1}; + T y = max(T{0}, T{1} - x * x); + return i0(beta * sqrt(y)) * inv_i0_beta; + } +); // kaiser_window_string + const auto sinc_string = jiterator_stringify( template T sinc(T a) { @@ -1509,22 +1368,102 @@ static inline C10_HOST_DEVICE scalar_t calc_trigamma(scalar_t in) { return static_cast(sign * result); } +/* + * For licensing information and documentation, please refer to the the cpu implementation located in "ATen/native/Math.h". 
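// Illustrative sketch, not part of the patch: kaiser_window_string above is built by
// concatenating i0_string because the Kaiser window is defined through the modified
// Bessel function I0; in the usual symmetric form
//   w[n] = I0(beta * sqrt(1 - ((n - alpha)/alpha)^2)) / I0(beta),  alpha = (N-1)/2,
// which matches the jitted kernel's x = a*inv_alpha - 1 with precomputed 1/alpha and
// 1/I0(beta). Host-side version, assuming a C++17 standard library that provides
// std::cyl_bessel_i (libstdc++ does, libc++ currently does not):
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static std::vector<double> kaiser_window(int n, double beta) {
  std::vector<double> w(n);
  const double alpha = 0.5 * (n - 1);
  const double inv_i0_beta = 1.0 / std::cyl_bessel_i(0.0, beta);  // precomputed once
  for (int i = 0; i < n; ++i) {
    const double x = (n == 1) ? 0.0 : i / alpha - 1.0;  // maps i into [-1, 1]
    const double y = std::max(0.0, 1.0 - x * x);
    w[i] = std::cyl_bessel_i(0.0, beta * std::sqrt(y)) * inv_i0_beta;
  }
  return w;
}

int main() {
  for (double v : kaiser_window(9, /*beta=*/12.0)) std::printf("%.6f ", v);
  std::printf("\n");
  return 0;
}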
+ */ template -static inline C10_HOST_DEVICE scalar_t calc_i0e(scalar_t _x) { +static inline C10_HOST_DEVICE scalar_t +chbevl(scalar_t _x, const scalar_t array[], size_t len) { static_assert(!std::is_same() && !std::is_same(), "don't instantiate with low precision type"); + + scalar_t b0, b1, b2; + + b0 = array[0]; + b1 = 0; + + for (size_t i = 1; i < len; ++i) { + b2 = b1; + b1 = b0; + b0 = _x * b1 - b2 + array[i]; + } + + return (0.5 * (b0 - b2)); +} + +/* + * For licensing information and documentation, please refer to the the cpu implementation located in "ATen/native/Math.h". + */ +template +C10_HOST_DEVICE inline std::tuple chebyshev_coefficients_i0e_A() { + /* Chebyshev coefficients for exp(-x) I0(x) + * in the interval [0,8]. + * + * lim(x->0){ exp(-x) I0(x) } = 1. + */ + static const T coefficients[] = { + -4.41534164647933937950E-18, 3.33079451882223809783E-17, + -2.43127984654795469359E-16, 1.71539128555513303061E-15, + -1.16853328779934516808E-14, 7.67618549860493561688E-14, + -4.85644678311192946090E-13, 2.95505266312963983461E-12, + -1.72682629144155570723E-11, 9.67580903537323691224E-11, + -5.18979560163526290666E-10, 2.65982372468238665035E-9, + -1.30002500998624804212E-8, 6.04699502254191894932E-8, + -2.67079385394061173391E-7, 1.11738753912010371815E-6, + -4.41673835845875056359E-6, 1.64484480707288970893E-5, + -5.75419501008210370398E-5, 1.88502885095841655729E-4, + -5.76375574538582365885E-4, 1.63947561694133579842E-3, + -4.32430999505057594430E-3, 1.05464603945949983183E-2, + -2.37374148058994688156E-2, 4.93052842396707084878E-2, + -9.49010970480476444210E-2, 1.71620901522208775349E-1, + -3.04682672343198398683E-1, 6.76795274409476084995E-1}; + + return std::make_tuple(coefficients, 30); +} + +template +C10_HOST_DEVICE inline std::tuple chebyshev_coefficients_i0e_B() { + /* Chebyshev coefficients for exp(-x) sqrt(x) I0(x) + * in the inverted interval [8,infinity]. + * + * lim(x->inf){ exp(-x) sqrt(x) I0(x) } = 1/sqrt(2pi). 
+ */ + static const T coefficients[] = { + -7.23318048787475395456E-18, -4.83050448594418207126E-18, + 4.46562142029675999901E-17, 3.46122286769746109310E-17, + -2.82762398051658348494E-16, -3.42548561967721913462E-16, + 1.77256013305652638360E-15, 3.81168066935262242075E-15, + -9.55484669882830764870E-15, -4.15056934728722208663E-14, + 1.54008621752140982691E-14, 3.85277838274214270114E-13, + 7.18012445138366623367E-13, -1.79417853150680611778E-12, + -1.32158118404477131188E-11, -3.14991652796324136454E-11, + 1.18891471078464383424E-11, 4.94060238822496958910E-10, + 3.39623202570838634515E-9, 2.26666899049817806459E-8, + 2.04891858946906374183E-7, 2.89137052083475648297E-6, + 6.88975834691682398426E-5, 3.36911647825569408990E-3, + 8.04490411014108831608E-1}; + + return std::make_tuple(coefficients, 25); +} + +template +static inline C10_HOST_DEVICE scalar_t calc_i0(scalar_t _x) { + static_assert(!std::is_same() && !std::is_same(), "don't instantiate with low precision type"); + // Upcast input for numerical accuracy purposes + // Needed for accurate results if input is bfloat16 or float16 scalar_t x = ::abs(_x); + if (x <= scalar_t{8.0}) { auto coeff_pair = chebyshev_coefficients_i0e_A(); auto A = std::get<0>(coeff_pair); auto len = std::get<1>(coeff_pair); scalar_t y = (x / scalar_t{2.0}) - scalar_t{2.0}; - return (chbevl(y, A, len)); + return (::exp(x) * chbevl(y, A, len)); } auto coeff_pair = chebyshev_coefficients_i0e_B(); auto B = std::get<0>(coeff_pair); auto len = std::get<1>(coeff_pair); - return (chbevl(scalar_t{32.0} / x - scalar_t{2.0}, B, len) / ::sqrt(x)); + return (::exp(x) * chbevl(scalar_t{32.0} / x - scalar_t{2.0}, B, len) / ::sqrt(x)); } template diff --git a/aten/src/ATen/native/cuda/MaxUnpooling.cu b/aten/src/ATen/native/cuda/MaxUnpooling.cu index 73db29deb4aa09..085f0d9f37b37f 100644 --- a/aten/src/ATen/native/cuda/MaxUnpooling.cu +++ b/aten/src/ATen/native/cuda/MaxUnpooling.cu @@ -1,11 +1,25 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include + +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu b/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu index 88c88ce0ad8076..7f61d9a0b5b03f 100644 --- a/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu +++ b/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu @@ -1,12 +1,22 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/MultiMarginLoss.cu b/aten/src/ATen/native/cuda/MultiMarginLoss.cu index fcf0a6a2356a3e..15e6d1e9dc0c33 100644 --- a/aten/src/ATen/native/cuda/MultiMarginLoss.cu +++ b/aten/src/ATen/native/cuda/MultiMarginLoss.cu @@ -1,9 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { @@ -114,7 +126,7 @@ __global__ void MultiMarginLoss_backward_kernel( } } -void multi_margin_loss_shape_check( +void multi_margin_loss_shape_check(int &nframe, const Tensor &input, const 
Tensor &target) { auto in_sizes = input.sizes(); auto dims = in_sizes.size(); @@ -124,7 +136,7 @@ void multi_margin_loss_shape_check( "Expected non-empty vector or matrix with optional 0-dim batch size, but got: ", in_sizes); - int64_t nframe = dims <= 1 ? 1 : in_sizes[0]; + nframe = dims <= 1 ? 1 : in_sizes[0]; TORCH_CHECK( target.dim() <= 1 && target.numel() == nframe, "inconsistent target size, expected ", nframe, " but got ", @@ -138,16 +150,16 @@ Tensor& multi_margin_loss_cuda_out( const c10::optional &weights_, int64_t reduction, Tensor& out_) { auto p = p_.toLong(); TORCH_CHECK(p == 1 || p == 2, "multi_margin_loss: Invalid p, expected 1 or 2 but got ", p); - multi_margin_loss_shape_check(input_, target_); - if (reduction == at::Reduction::None) { - resize_output(out_, target_.sizes()); - } else if (input_.dim() == 2) { - resize_output(out_, {input_.sizes()[0]}); + int nframe; + multi_margin_loss_shape_check(nframe, input_, target_); + + // produce a scalar output for 1d input + if (reduction == Reduction::None && target_.dim() > 0) { + resize_output(out_, {nframe}); } else { resize_output(out_, {}); } - if (input_.numel() == 0) { return out_; } @@ -166,7 +178,6 @@ Tensor& multi_margin_loss_cuda_out( AT_DISPATCH_FLOATING_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "multi_margin_loss_cuda", [&] { const scalar_t margin = margin_.to(); if (input.dim() <= 1) { - int nframe = 1; TORCH_CHECK(target.dim() <= 1 && target.numel() == nframe, "inconsistent target size"); dim3 blocks(1); dim3 threads(MULTIMARGIN_THREADS); @@ -196,7 +207,6 @@ Tensor& multi_margin_loss_cuda_out( } else { auto in_sizes = input.sizes(); TORCH_INTERNAL_ASSERT(in_sizes.size() == 2); - int nframe = in_sizes[0]; // allow zero-dim target for 2D input. TORCH_CHECK(in_sizes[1] != 0 && target.dim() <= 1 && target.numel() == nframe, "inconsistent target size"); @@ -248,7 +258,7 @@ Tensor& multi_margin_loss_cuda_out( margin); C10_CUDA_KERNEL_LAUNCH_CHECK(); } - at::sum_out(out, tmp_output, /*dims=*/IntArrayRef{}); + at::sum_out(out, tmp_output, IntArrayRef{}); } } }); @@ -262,7 +272,7 @@ Tensor& multi_margin_loss_cuda_out( Tensor multi_margin_loss_cuda( const Tensor &input, const Tensor &target, const Scalar &p, const Scalar &margin, const c10::optional &weights, int64_t reduction) { - auto out = at::empty({}, input.options()); + auto out = at::empty({0}, input.options()); multi_margin_loss_cuda_out(input, target, p, margin, weights, reduction, out); return out; } @@ -274,7 +284,8 @@ Tensor& multi_margin_loss_cuda_backward_out( auto p = p_.toLong(); TORCH_CHECK(p == 1 || p == 2, "multi_margin_loss_backward: Invalid p, expected 1 or 2 but got ", p); - multi_margin_loss_shape_check(input_, target_); + int nframe; + multi_margin_loss_shape_check(nframe, input_, target_); resize_output(grad_input_, input_.sizes()); if (input_.numel() == 0) { @@ -331,7 +342,6 @@ Tensor& multi_margin_loss_cuda_backward_out( } else { auto in_sizes = input.sizes(); TORCH_INTERNAL_ASSERT(in_sizes.size() == 2); - int nframe = in_sizes[0]; TORCH_CHECK((in_sizes[1] != 0) && (target.dim() <= 1) && (target.numel() == nframe), "inconsistent target size"); dim3 blocks(in_sizes[0]); diff --git a/aten/src/ATen/native/cuda/MultinomialKernel.cu b/aten/src/ATen/native/cuda/MultinomialKernel.cu index f9404fab0193fc..de8e8404ac2ddc 100644 --- a/aten/src/ATen/native/cuda/MultinomialKernel.cu +++ b/aten/src/ATen/native/cuda/MultinomialKernel.cu @@ -1,8 +1,9 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include -#include 
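// Illustrative sketch, not part of the patch: the MultiMarginLoss hunk above changes
// how the output is shaped (and switches the empty allocation from at::empty({}) to
// at::empty({0})). The post-patch shape rule, written as a tiny pure function; the
// helper name and Reduction enum here are stand-ins for illustration only:
#include <cstdint>
#include <cstdio>
#include <vector>

enum class Reduction { None, Mean, Sum };

// in_sizes: sizes of `input` (1-D [C] or 2-D [N, C]); target_dim: target.dim().
static std::vector<int64_t> multi_margin_loss_out_shape(
    const std::vector<int64_t>& in_sizes, int64_t target_dim, Reduction r) {
  const int64_t nframe = in_sizes.size() <= 1 ? 1 : in_sizes[0];
  if (r == Reduction::None && target_dim > 0) {
    return {nframe};  // one loss value per row
  }
  return {};          // 0-dim (scalar) output, including the 1-D input case
}

int main() {
  auto s1 = multi_margin_loss_out_shape({4, 10}, 1, Reduction::None);  // -> [4]
  auto s2 = multi_margin_loss_out_shape({4, 10}, 1, Reduction::Mean);  // -> []
  auto s3 = multi_margin_loss_out_shape({10}, 0, Reduction::None);     // -> []
  std::printf("%zu %zu %zu\n", s1.size(), s2.size(), s3.size());
  return 0;
}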
+#include +#include #include #include #include @@ -11,6 +12,16 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + #include #include #include @@ -74,12 +85,13 @@ void renormRows(Tensor& t) { const int64_t maxThreads = std::min( props->maxThreadsPerBlock, cuda_utils::kCUDABlockReduceMaxThreads); + int warp_size = at::cuda::warp_size(); dim3 grid(rows < numSM * 4 ? rows : numSM * 4); - dim3 block(std::min(maxThreads, C10_WARP_SIZE * ceil_div(cols, int64_t{C10_WARP_SIZE}))); + dim3 block(std::min(maxThreads, warp_size * ceil_div(cols, int64_t{warp_size}))); AT_DISPATCH_FLOATING_TYPES_AND_HALF(t.scalar_type(), "renormRows_cuda", [&] { renormRowsL1 - <<>>(t.data_ptr(), rows, cols); C10_CUDA_KERNEL_LAUNCH_CHECK(); @@ -335,8 +347,9 @@ void multinomial_with_replacement_kernel_impl( int maxThreads = props->maxThreadsPerBlock; int maxShared = props->sharedMemPerBlock; - int requiredWarps = at::ceil_div(numCategories, C10_WARP_SIZE); - int requiredThreads = std::min(maxThreads, requiredWarps * C10_WARP_SIZE); + int warp_size = at::cuda::warp_size(); + int requiredWarps = at::ceil_div(numCategories, warp_size); + int requiredThreads = std::min(maxThreads, requiredWarps * warp_size); int requiredShared = requiredThreads * sizeof(accscalar_t); if (n_sample == 1 && maxShared >= requiredShared) { diff --git a/aten/src/ATen/native/cuda/NLLLoss2d.cu b/aten/src/ATen/native/cuda/NLLLoss2d.cu index 79cec9f8da3ed0..2246c836f3dcad 100644 --- a/aten/src/ATen/native/cuda/NLLLoss2d.cu +++ b/aten/src/ATen/native/cuda/NLLLoss2d.cu @@ -1,7 +1,7 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include #include #include #include @@ -12,6 +12,16 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu b/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu index a04d118b750247..75b4e335754053 100644 --- a/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu +++ b/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu @@ -1,6 +1,9 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#include #include -#include +#include #include #include #include @@ -9,7 +12,16 @@ #include #include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu b/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu index 1198555d144ec4..d34de0f156bd67 100644 --- a/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu +++ b/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu @@ -1,6 +1,7 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include #include #include @@ -10,6 +11,17 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu b/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu index 2c2c11f2246720..6c2942b05de39f 100644 --- a/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu +++ b/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu @@ -1,12 +1,25 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include 
+#include +#include #include #include #include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/cuda/Nonzero.cu b/aten/src/ATen/native/cuda/Nonzero.cu index dcacf98a80070b..0e524b7b81fd72 100644 --- a/aten/src/ATen/native/cuda/Nonzero.cu +++ b/aten/src/ATen/native/cuda/Nonzero.cu @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -6,6 +8,13 @@ #include //for MAX_DIMS #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/Normalization.cu b/aten/src/ATen/native/cuda/Normalization.cu index 2f9484770ad44d..e7b2372a18dad2 100644 --- a/aten/src/ATen/native/cuda/Normalization.cu +++ b/aten/src/ATen/native/cuda/Normalization.cu @@ -1,3 +1,4 @@ +// #define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include @@ -7,6 +8,30 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + +// TODO: Doesn't exist in this branch +#if 0 +#include +#else +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/Normalization.cuh b/aten/src/ATen/native/cuda/Normalization.cuh index 6d2c806ea3771b..a9b11e76db680b 100644 --- a/aten/src/ATen/native/cuda/Normalization.cuh +++ b/aten/src/ATen/native/cuda/Normalization.cuh @@ -1,6 +1,7 @@ #pragma once -#include +#include +#include #include #include #include @@ -9,6 +10,14 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#endif + namespace at { namespace native { // The maximum number of threads in a block diff --git a/aten/src/ATen/native/cuda/PersistentSoftmax.cuh b/aten/src/ATen/native/cuda/PersistentSoftmax.cuh index 6fbbe1f3be472e..4ad544aeb47df8 100644 --- a/aten/src/ATen/native/cuda/PersistentSoftmax.cuh +++ b/aten/src/ATen/native/cuda/PersistentSoftmax.cuh @@ -126,7 +126,7 @@ __global__ void softmax_warp_forward(output_t *dst, const input_t *src, int batc if (!is_transformer_mask) { idx += i*element_count; } - if (mask[idx]) { + if (!mask[idx]) { max_value[i] = (is_meaningful_max && max_value[i] > elements[i][it]) ? max_value[i] : elements[i][it]; is_meaningful_max = true; } @@ -160,7 +160,7 @@ __global__ void softmax_warp_forward(output_t *dst, const input_t *src, int batc idx += i*element_count; } - if (mask[idx]) { + if (!mask[idx]) { if (is_log_softmax) { sum[i] += std::exp(elements[i][it] - max_value[i]); } else { @@ -188,7 +188,7 @@ __global__ void softmax_warp_forward(output_t *dst, const input_t *src, int batc if (!is_transformer_mask) { idx += i*element_count; } - if (!mask[idx]) { + if (mask[idx]) { dst[i*element_count+it*WARP_SIZE] = 0; continue; } @@ -297,7 +297,8 @@ void dispatch_softmax_forward(output_t *dst, const input_t *src, int softmax_ele const int next_power_of_two = 1 << log2_elements; // This value must match the WARP_SIZE constexpr value computed inside softmax_warp_forward. - int warp_size = (next_power_of_two < C10_WARP_SIZE) ? next_power_of_two : C10_WARP_SIZE; + int warp_size = at::cuda::warp_size(); + warp_size = (next_power_of_two < warp_size) ? 
next_power_of_two : warp_size; // This value must match the WARP_BATCH constexpr value computed inside softmax_warp_forward. int batches_per_warp = (next_power_of_two <= 128) ? 2 : 1; @@ -346,7 +347,8 @@ void dispatch_softmax_backward(output_t *grad_input, const input_t *grad, const const int next_power_of_two = 1 << log2_elements; // This value must match the WARP_SIZE constexpr value computed inside softmax_warp_backward. - int warp_size = (next_power_of_two < C10_WARP_SIZE) ? next_power_of_two : C10_WARP_SIZE; + int warp_size = at::cuda::warp_size(); + warp_size = (next_power_of_two < warp_size) ? next_power_of_two : warp_size; // This value must match the WARP_BATCH constexpr value computed inside softmax_warp_backward. int batches_per_warp = (next_power_of_two <= 128) ? 2 : 1; diff --git a/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu b/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu index 5e42326056c194..b1c4a2ae4b411b 100644 --- a/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu +++ b/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -10,28 +11,88 @@ namespace at { namespace native { +const char addcmul_name[] = "addcmul"; void addcmul_cuda_kernel(TensorIteratorBase& iter, const Scalar& value) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(kHalf, kBFloat16, iter.dtype(), "addcmul_cuda", [&]() { - // note(mkozuki): If scalar_t is fp16 or bfloat16, cast scalar to float - // and do math in fp32 for better accuracy. - using accscalar_t = at::acc_type; - auto alpha = value.to(); - gpu_kernel(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b, scalar_t c) -> scalar_t { - return a + alpha * (static_cast(b) * static_cast(c)); + auto dtype = iter.dtype(); + if (at::isComplexType(dtype)) { + #if AT_USE_JITERATOR() + AT_DISPATCH_COMPLEX_TYPES(dtype, "addcmul_cuda", [&]() { + auto alpha = value.to(); + static const auto addcmul_string = jiterator_stringify( + template T addcmul(T a, T b, T c, T alpha) { return a + alpha * (b * c); }); + jitted_gpu_kernel< + /*name=*/addcmul_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/3>( + iter, + addcmul_string, + /*scalar_pos=*/at::cuda::jit::BinaryFuncVariant::NoScalar, + /*scalar_val=*/0, + /*extra_args=*/std::make_tuple(alpha)); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(dtype, "addcmul_cuda", [&]() { + auto alpha = value.to(); + gpu_kernel(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b, scalar_t c) -> scalar_t { + return a + alpha * b * c; + }); + }); + #endif + } else { + AT_DISPATCH_ALL_TYPES_AND2(kHalf, kBFloat16, dtype, "addcmul_cuda", [&]() { + // note(mkozuki): If scalar_t is fp16 or bfloat16, cast scalar to float + // and do math in fp32 for better accuracy. + using accscalar_t = at::acc_type; + auto alpha = value.to(); + gpu_kernel(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b, scalar_t c) -> scalar_t { + return a + alpha * (static_cast(b) * static_cast(c)); + }); }); - }); + } } +// return a + alpha * (b / static_cast(c)); +const char addcdiv_name[] = "addcdiv"; void addcdiv_cuda_kernel(TensorIteratorBase& iter, const Scalar& value) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(kHalf, kBFloat16, iter.dtype(), "addcdiv_cuda", [&]() { - // note(mkozuki): If scalar_t is fp16 or bfloat16, cast scalar to float - // and do math in fp32 for better accuracy. 
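// Illustrative sketch, not part of the patch: the note(mkozuki) comment above describes
// the usual mixed-precision trick kept by this hunk: store in the narrow type but do the
// arithmetic in a wider accumulation type, then cast back once. Generic host version of
// the addcmul/addcdiv lambdas, with float standing in for the narrow type and double for
// the accumulation type (on the GPU the roles are fp16/bf16 and float):
#include <cstdio>

template <typename scalar_t, typename acc_t>
scalar_t addcmul_one(scalar_t a, scalar_t b, scalar_t c, acc_t alpha) {
  // promote before the multiply so the intermediate product cannot overflow or
  // round in the narrow type; narrow only the final result
  return static_cast<scalar_t>(static_cast<acc_t>(a) +
                               alpha * (static_cast<acc_t>(b) * static_cast<acc_t>(c)));
}

template <typename scalar_t, typename acc_t>
scalar_t addcdiv_one(scalar_t a, scalar_t b, scalar_t c, acc_t alpha) {
  return static_cast<scalar_t>(static_cast<acc_t>(a) +
                               alpha * (static_cast<acc_t>(b) / static_cast<acc_t>(c)));
}

int main() {
  // In fp16 the intermediate 300*300 = 90000 would already be inf (max finite fp16 is
  // 65504), while alpha*(b*c) = 90 is representable; promoting the operands avoids that.
  std::printf("%g\n", addcmul_one<float, double>(1.0f, 300.0f, 300.0f, 1e-3));
  std::printf("%g\n", addcdiv_one<float, double>(1.0f, 1.0f, 3.0f, 0.5));
  return 0;
}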
- using accscalar_t = at::acc_type; - auto alpha = value.to(); - gpu_kernel(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b, scalar_t c) -> scalar_t { - return a + alpha * (b / static_cast(c)); + auto dtype = iter.dtype(); + if (at::isComplexType(dtype)) { + #if AT_USE_JITERATOR() + AT_DISPATCH_COMPLEX_TYPES(dtype, "addcdiv_cuda", [&]() { + auto alpha = value.to(); + static const auto addcdiv_string = + jiterator_stringify(template T addcdiv( + T a, T b, T c, T alpha) { return a + alpha * (b / c); }); + jitted_gpu_kernel< + /*name=*/addcdiv_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/3>( + iter, + addcdiv_string, + /*scalar_pos=*/at::cuda::jit::BinaryFuncVariant::NoScalar, + /*scalar_val=*/0, + /*extra_args=*/std::make_tuple(alpha)); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(dtype, "addcdiv_cuda", [&]() { + auto alpha = value.to(); + gpu_kernel(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b, scalar_t c) -> scalar_t { + return a + alpha * (b / c); + }); + }); + #endif + } else { + AT_DISPATCH_ALL_TYPES_AND2(kHalf, kBFloat16, dtype, "addcdiv_cuda", [&]() { + // note(mkozuki): If scalar_t is fp16 or bfloat16, cast scalar to float + // and do math in fp32 for better accuracy. + using accscalar_t = at::acc_type; + auto alpha = value.to(); + gpu_kernel(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b, scalar_t c) -> scalar_t { + return a + alpha * (b / static_cast(c)); + }); }); - }); + } } void smooth_l1_backward_cuda_kernel(TensorIterator& iter, const Scalar& norm, double beta) { diff --git a/aten/src/ATen/native/cuda/RNN.cu b/aten/src/ATen/native/cuda/RNN.cu index 659ddc28c4979d..046bbe4a5c0421 100644 --- a/aten/src/ATen/native/cuda/RNN.cu +++ b/aten/src/ATen/native/cuda/RNN.cu @@ -1,11 +1,24 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/Randperm.cu b/aten/src/ATen/native/cuda/Randperm.cu index f0c41f5be444fb..b3c679f7772449 100644 --- a/aten/src/ATen/native/cuda/Randperm.cu +++ b/aten/src/ATen/native/cuda/Randperm.cu @@ -1,9 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/cuda/RangeFactories.cu b/aten/src/ATen/native/cuda/RangeFactories.cu index 027806ed421617..55981ac1ad8e36 100644 --- a/aten/src/ATen/native/cuda/RangeFactories.cu +++ b/aten/src/ATen/native/cuda/RangeFactories.cu @@ -1,6 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include #include @@ -8,20 +8,39 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #define GPU_LAMBDA __device__ __host__ namespace { -constexpr int num_threads = C10_WARP_SIZE * 2; +#if defined(USE_ROCM) +constexpr int num_threads() { + return 128; +} +#else +constexpr int num_threads() { + return C10_WARP_SIZE * 2; +} +#endif constexpr int thread_work_size = 1; -constexpr int block_work_size = thread_work_size * num_threads; +constexpr int block_work_size = thread_work_size * num_threads(); template -C10_LAUNCH_BOUNDS_1(num_threads) 
+C10_LAUNCH_BOUNDS_1(num_threads()) __global__ void elementwise_kernel_with_index(index_t N, func_t f, typename function_traits::result_type *data) { #pragma unroll for (int i = 0; i < thread_work_size; i++) { - index_t idx = block_work_size * blockIdx.x + num_threads * i + threadIdx.x; + index_t idx = block_work_size * blockIdx.x + num_threads() * i + threadIdx.x; if (idx < N) { data[idx] = f(idx); } @@ -38,10 +57,10 @@ void gpu_kernel_with_index(at::Tensor &output, func_t f) { auto stream = at::cuda::getCurrentCUDAStream(); using scalar_t = typename function_traits::result_type; if (N <= std::numeric_limits::max()) { - elementwise_kernel_with_index<<>>(N, f, output.data_ptr()); + elementwise_kernel_with_index<<>>(N, f, output.data_ptr()); C10_CUDA_KERNEL_LAUNCH_CHECK(); } else { - elementwise_kernel_with_index<<>>(N, f, output.data_ptr()); + elementwise_kernel_with_index<<>>(N, f, output.data_ptr()); C10_CUDA_KERNEL_LAUNCH_CHECK(); } } diff --git a/aten/src/ATen/native/cuda/RecordStream.cu b/aten/src/ATen/native/cuda/RecordStream.cu index d48561df00e5c5..c4cb74bdc68ffd 100644 --- a/aten/src/ATen/native/cuda/RecordStream.cu +++ b/aten/src/ATen/native/cuda/RecordStream.cu @@ -1,5 +1,13 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { void record_stream_cuda(Tensor& self, c10::Stream stream) { c10::cuda::CUDACachingAllocator::recordStream(self.storage().data_ptr(), at::cuda::CUDAStream::unpack(stream.pack())); diff --git a/aten/src/ATen/native/cuda/Reduce.cu b/aten/src/ATen/native/cuda/Reduce.cu index 103a386ff0c99c..2de32f6d4a35e0 100644 --- a/aten/src/ATen/native/cuda/Reduce.cu +++ b/aten/src/ATen/native/cuda/Reduce.cu @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_NO_OPERATORS #include #include diff --git a/aten/src/ATen/native/cuda/Reduce.cuh b/aten/src/ATen/native/cuda/Reduce.cuh index 5ee3757d5937ca..57fa55fbec7d5c 100644 --- a/aten/src/ATen/native/cuda/Reduce.cuh +++ b/aten/src/ATen/native/cuda/Reduce.cuh @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -17,6 +18,9 @@ #include #include +#include +#include + namespace at { namespace native { using at::detail::Array; @@ -272,6 +276,65 @@ func_wrapper_t func_wrapper(const func_t& op) { return func_wrapper_t { op }; } +template +struct ReduceJitOp { +//ReduceJitOp is almost like ReduceOp, but it doesn't have ops functor that specifies reduction operations +//Maybe we can find a way to unify ReduceOp and ReduceJitOp + using InputCalculator = OffsetCalculator<1, uint32_t>; + using OutputCalculator = OffsetCalculator<2, uint32_t>; + //TODO for now arg_t is always opmath_t of the input, later we'll need to change it + using arg_t = at::opmath_type; + + static constexpr int input_vec_size = ReduceConfig::input_vec_size; + //TODO - ReduceJitOp will probably need to be changed for reductions that need full functor, + //not just wrapper + arg_t ident; + ReduceConfig config; + InputCalculator input_calc; + OutputCalculator output_calc; + const void* src; + const char* dst[2]; //it accepts at most two destinations + // acc_buf used for accumulation among sub Tensor Iterator when accumulation on + // output is not permissible + void* acc_buf; + // cta_buf used for accumulation between blocks during global reduction + void* cta_buf; + int* semaphores; + int64_t base_idx; + bool accumulate; + bool final_output; + int noutputs; + + ReduceJitOp( + ReduceConfig config, + InputCalculator input_calc, 
+ OutputCalculator output_calc, + const void* src, + char* dst0, + optional dst1, + void* acc_buf, + void* cta_buf, + int* semaphores, + arg_t ident, + int noutputs, + int64_t base_idx) + : ident(ident), + config(config), + input_calc(input_calc), + output_calc(output_calc), + src(src), + acc_buf(acc_buf), + cta_buf(cta_buf), + semaphores(semaphores), + base_idx(base_idx), + noutputs(noutputs) { + dst[0] = dst0; + if (dst1.has_value()) { + dst[1] = dst1.value(); + } + } +}; + template struct ReduceOp { using traits = function_traits; @@ -284,8 +347,6 @@ struct ReduceOp { std::is_convertible::value && std::is_convertible::value; - static constexpr float acc_buffer_multiplier = (float)sizeof(arg_t) / sizeof(out_scalar_t); - static constexpr int input_vec_size = ReduceConfig::input_vec_size; ops_t ops; @@ -837,6 +898,47 @@ static void launch_reduce_kernel(const ReduceConfig& config, const R& reduction) } } +template +static void launch_jitted_reduce_kernel(DeviceIndex idx, const ReduceConfig& config, +R& reduction, const std::string& func) { + constexpr int max_threads = mnt_wrapper::MAX_NUM_THREADS; + dim3 block = config.block(); + dim3 grid = config.grid(); + + static std::mutex _jiterator_mutex; + static std::vector> fns(c10::cuda::device_count()); + int shared_memory = config.shared_memory_size(); + at::cuda::jit::NvrtcFunction* fn_ptr; + switch(config.output_vec_size) { + case 4: + fn_ptr = &fns[idx][0]; + break; + case 2: + fn_ptr = &fns[idx][1]; + break; + default: + fn_ptr = &fns[idx][2]; + } + if (!fn_ptr->function) { + std::string f_inputs_type_str = at::cuda::jit::typeName(); + std::string accum_type_str = at::cuda::jit::typeName>(); + std::string result_type_str = at::cuda::jit::typeName(); + int max_threads_codegen = max_threads/config.output_vec_size; + auto code = at::cuda::jit::generate_reduction_code(1, func, name, vt0, + f_inputs_type_str, accum_type_str, result_type_str, + true, false, config.output_vec_size, max_threads_codegen); + + *fn_ptr = at::cuda::jit::jit_pwise_function(code, "reduction_"+std::string(name)); + + } + constexpr int kernel_args = 1; + void* args[kernel_args]; + args[0] = static_cast(&reduction); + at::cuda::jit::launch_jitted_pwise_function(*fn_ptr, args, grid, block, shared_memory); +} + + class AccumulationBuffer { public: AccumulationBuffer() {} @@ -874,7 +976,7 @@ class AccumulationBuffer { }; template -int get_output_vec_size(TensorIterator &iter) { +int get_output_vec_size(const TensorIterator &iter) { int vec_size = 4; auto update_vec_size = [&vec_size](uint64_t n) { while(n % vec_size != 0) { @@ -898,61 +1000,8 @@ int get_output_vec_size(TensorIterator &iter) { return vec_size; } -template -inline void gpu_reduce_kernel(TensorIterator& iter, const ops_t& ops, ident_t ident=0, - AccumulationBuffer* acc_buf_ptr=nullptr, int64_t base_idx=0) { - AT_ASSERT(iter.numel() > 0 && iter.ntensors() - iter.noutputs() == 1 && iter.noutputs() >= 1); - - using traits = function_traits; - using arg_t = typename traits::template arg<0>::type; - static constexpr bool can_accumulate_in_output = - std::is_convertible::value; - - bool can_use_32bit_indexing = iter.can_use_32bit_indexing(); - std::unique_ptr owned_buf_ptr; - - // The acc_buf_ptr is a shared pointer. It is create at the first entrance and - // reused by all recursive function calls. - if (acc_buf_ptr == NULL) { - // acc_buf_ptr holds buffer used for accumulation among multiple sub_iter - // when accumulation in output is not possible. 
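// Illustrative sketch, not part of the patch: launch_jitted_reduce_kernel above keeps one
// lazily-compiled kernel per (device, output vector width) in a static table, so the NVRTC
// compilation cost is paid at most once per variant. The same memoization pattern with a
// stand-in Compiled type and compile() function (both hypothetical; the real code stores
// at::cuda::jit::NvrtcFunction and maps output_vec_size 4/2/other to slots 0/1/2):
#include <array>
#include <mutex>
#include <string>
#include <vector>

struct Compiled { void* function = nullptr; };

Compiled compile(const std::string& /*src*/) {  // stand-in for the real JIT step
  static int token;
  return Compiled{&token};
}

Compiled& get_or_compile(int device, int slot, const std::string& src) {
  static std::mutex mtx;
  static std::vector<std::array<Compiled, 3>> cache;  // [device][vec-size slot]
  std::lock_guard<std::mutex> guard(mtx);
  if (cache.size() <= static_cast<size_t>(device)) cache.resize(device + 1);
  Compiled& entry = cache[static_cast<size_t>(device)][slot];
  if (entry.function == nullptr) entry = compile(src);  // compile only on first use
  return entry;
}

int main() {
  get_or_compile(0, 2, "/* kernel source */");
  get_or_compile(0, 2, "/* kernel source */");  // second call hits the cache
  return 0;
}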
- if (!can_accumulate_in_output && !can_use_32bit_indexing) { - int64_t output_memory_size = iter.element_size(0); - for (int dim = 0; dim < iter.ndim(); dim++) { - output_memory_size = std::max(output_memory_size, iter.shape()[dim] * iter.strides(0)[dim]); - } - output_memory_size /= iter.element_size(0); //iter.strides is in bytes - owned_buf_ptr.reset(new AccumulationBuffer(sizeof(arg_t), - sizeof(out_scalar_t), - (char*) iter.data_ptr(0), - output_memory_size * sizeof(arg_t))); - } else { - owned_buf_ptr.reset(new AccumulationBuffer()); - } - acc_buf_ptr = owned_buf_ptr.get(); - } - - if (!can_use_32bit_indexing) { - for (auto& sub_iter : iter.with_32bit_indexing()) { - int64_t sub_iter_base_idx = sub_iter.view_offsets()[0]; - - gpu_reduce_kernel(sub_iter, ops, ident, - acc_buf_ptr, sub_iter_base_idx); - } - return; - } - - const char* in_data = (char*)iter.data_ptr(iter.ntensors() - 1); - char* out_data = (char*)iter.data_ptr(0); - const auto noutputs = iter.noutputs(); - optional out_data_extra; - if (noutputs > 1) { - out_data_extra = (char*)iter.data_ptr(1); - } else { - out_data_extra = nullopt; - } - char* acc_data = acc_buf_ptr->get_acc_slice(out_data); - +template +ReduceConfig setReduceConfig(const TensorIterator& iter){ // Start by assuming that each thread handles a single output and all // the inputs for that output. int64_t num_outputs = iter.num_output_elements(); @@ -1080,7 +1129,64 @@ inline void gpu_reduce_kernel(TensorIterator& iter, const ops_t& ops, ident_t id config.input_mult[2] = config.split_input(config.ctas_per_output); } } + return config; +}; + +template +inline void gpu_reduce_kernel(TensorIterator& iter, const ops_t& ops, ident_t ident=0, + AccumulationBuffer* acc_buf_ptr=nullptr, int64_t base_idx=0) { + AT_ASSERT(iter.numel() > 0 && iter.ntensors() - iter.noutputs() == 1 && iter.noutputs() >= 1); + + using traits = function_traits; + using arg_t = typename traits::template arg<0>::type; + static constexpr bool can_accumulate_in_output = + std::is_convertible::value; + + bool can_use_32bit_indexing = iter.can_use_32bit_indexing(); + std::unique_ptr owned_buf_ptr; + // The acc_buf_ptr is a shared pointer. It is create at the first entrance and + // reused by all recursive function calls. + if (acc_buf_ptr == NULL) { + // acc_buf_ptr holds buffer used for accumulation among multiple sub_iter + // when accumulation in output is not possible. 
+ if (!can_accumulate_in_output && !can_use_32bit_indexing) { + int64_t output_memory_size = iter.element_size(0); + for (int dim = 0; dim < iter.ndim(); dim++) { + output_memory_size = std::max(output_memory_size, iter.shape()[dim] * iter.strides(0)[dim]); + } + output_memory_size /= iter.element_size(0); //iter.strides is in bytes + owned_buf_ptr.reset(new AccumulationBuffer(sizeof(arg_t), + sizeof(out_scalar_t), + (char*) iter.data_ptr(0), + output_memory_size * sizeof(arg_t))); + } else { + owned_buf_ptr.reset(new AccumulationBuffer()); + } + acc_buf_ptr = owned_buf_ptr.get(); + } + + if (!can_use_32bit_indexing) { + for (auto& sub_iter : iter.with_32bit_indexing()) { + int64_t sub_iter_base_idx = sub_iter.view_offsets()[0]; + + gpu_reduce_kernel(sub_iter, ops, ident, + acc_buf_ptr, sub_iter_base_idx); + } + return; + } + + const char* in_data = (char*)iter.data_ptr(iter.ntensors() - 1); + char* out_data = (char*)iter.data_ptr(0); + const auto noutputs = iter.noutputs(); + optional out_data_extra; + if (noutputs > 1) { + out_data_extra = (char*)iter.data_ptr(1); + } else { + out_data_extra = nullopt; + } + char* acc_data = acc_buf_ptr->get_acc_slice(out_data); + ReduceConfig config = setReduceConfig(iter); at::DataPtr buffer; at::DataPtr semaphores; if (config.should_global_reduce()) { @@ -1115,4 +1221,101 @@ inline void gpu_reduce_kernel(TensorIterator& iter, const ops_t& ops, ident_t id launch_reduce_kernel::MAX_NUM_THREADS>(config, reduce); } +//TODO this is 100 lines of almost-copy-paste, because we have to have different template args for this function +//try unifying with gpu_reduce_kernel +template +inline void jitted_gpu_reduce_kernel(TensorIterator& iter, const std::string& func, ident_t ident=0, + AccumulationBuffer* acc_buf_ptr=nullptr, int64_t base_idx=0) { + AT_ASSERT(iter.numel() > 0 && iter.ntensors() - iter.noutputs() == 1 && iter.noutputs() >= 1); + + //TODO - this will be different for more complicated reductions, but for now reductions using + //func_wrapper all have arg_t = opmath + using arg_t = at::opmath_type; + static constexpr bool can_accumulate_in_output = + std::is_convertible::value; + static_assert(can_accumulate_in_output == true, "unsupported arg_t for jitted reduction"); + + bool can_use_32bit_indexing = iter.can_use_32bit_indexing(); + std::unique_ptr owned_buf_ptr; + + // The acc_buf_ptr is a shared pointer. It is create at the first entrance and + // reused by all recursive function calls. + if (acc_buf_ptr == NULL) { + // acc_buf_ptr holds buffer used for accumulation among multiple sub_iter + // when accumulation in output is not possible. 
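// Illustrative sketch, not part of the patch: the buffer sizing computed by the lines just
// below, pulled out as a standalone helper. When partial results cannot be accumulated in
// the output tensor itself and the iterator has to be split for 32-bit indexing, the
// temporary buffer must span the widest byte extent of the (possibly strided) output in any
// dimension, re-expressed as a count of the wider arg_t elements.
#include <algorithm>
#include <cstdint>
#include <vector>

int64_t acc_buffer_bytes(const std::vector<int64_t>& shape,
                         const std::vector<int64_t>& out_strides_bytes,
                         int64_t out_elem_size,    // sizeof(out_scalar_t)
                         int64_t acc_elem_size) {  // sizeof(arg_t)
  int64_t extent_bytes = out_elem_size;            // at least one output element
  for (size_t d = 0; d < shape.size(); ++d) {
    extent_bytes = std::max(extent_bytes, shape[d] * out_strides_bytes[d]);
  }
  return (extent_bytes / out_elem_size) * acc_elem_size;  // strides were in bytes
}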
+ if (!can_accumulate_in_output && !can_use_32bit_indexing) { + int64_t output_memory_size = iter.element_size(0); + for (int dim = 0; dim < iter.ndim(); dim++) { + output_memory_size = std::max(output_memory_size, iter.shape()[dim] * iter.strides(0)[dim]); + } + output_memory_size /= iter.element_size(0); //iter.strides is in bytes + owned_buf_ptr.reset(new AccumulationBuffer(sizeof(out_scalar_t), //TODO + sizeof(out_scalar_t), + (char*) iter.data_ptr(0), + output_memory_size * sizeof(out_scalar_t))); //TODO + } else { + owned_buf_ptr.reset(new AccumulationBuffer()); + } + acc_buf_ptr = owned_buf_ptr.get(); + } + + if (!can_use_32bit_indexing) { + for (auto& sub_iter : iter.with_32bit_indexing()) { + int64_t sub_iter_base_idx = sub_iter.view_offsets()[0]; + + jitted_gpu_reduce_kernel(sub_iter, func, ident, + acc_buf_ptr, sub_iter_base_idx); + } + return; + } + + //TODO - for now we support a single input, we may be able to relax this constraint + const char* in_data = (char*)iter.data_ptr(iter.ntensors() - 1); + char* out_data = (char*)iter.data_ptr(0); + const auto noutputs = iter.noutputs(); + optional out_data_extra; + if (noutputs > 1) { + out_data_extra = (char*)iter.data_ptr(1); + } else { + out_data_extra = nullopt; + } + char* acc_data = acc_buf_ptr->get_acc_slice(out_data); + + ReduceConfig config = setReduceConfig(iter); + + at::DataPtr buffer; + at::DataPtr semaphores; + if (config.should_global_reduce()) { + auto& allocator = *c10::cuda::CUDACachingAllocator::get(); + buffer = allocator.allocate(config.global_memory_size()); + semaphores = allocator.allocate(config.semaphore_size()); + + auto stream = at::cuda::getCurrentCUDAStream(); + AT_CUDA_CHECK(cudaMemsetAsync(semaphores.get(), 0, config.semaphore_size(), stream)); + } + + AT_ASSERT(can_use_32bit_indexing); + auto output_calc = make_output_calculator(iter); + auto input_calc = make_input_calculator(iter); + auto reduce = ReduceJitOp( + config, + input_calc, + output_calc, + in_data, + out_data, + out_data_extra, + acc_data, + buffer.get(), + (int*)semaphores.get(), + ident, + noutputs, + base_idx); + reduce.accumulate = iter.should_accumulate(); + reduce.final_output = iter.is_final_output(); + + launch_jitted_reduce_kernel(iter.device().index(), + config, reduce, func); +} + }} // namespace at::native diff --git a/aten/src/ATen/native/cuda/ReduceOps.cpp b/aten/src/ATen/native/cuda/ReduceOps.cpp index 472b26cbd872c7..52bc562f0612c5 100644 --- a/aten/src/ATen/native/cuda/ReduceOps.cpp +++ b/aten/src/ATen/native/cuda/ReduceOps.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include @@ -6,9 +7,24 @@ #include #include -#include +#include +#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu b/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu index bf81ed5b794026..9faeae965feac9 100644 --- a/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu +++ b/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu @@ -5,6 +5,7 @@ #include #include #include +#include namespace at { namespace native { @@ -26,14 +27,28 @@ struct nansum_functor { } }; +const char op_name[] = "prod"; + template struct prod_functor { + #if AT_USE_JITERATOR() + void operator()(TensorIterator& iter) { + std::string func = jiterator_stringify( + arg_t combine(arg_t a, arg_t b) { + return a * b; + } + ); + 
jitted_gpu_reduce_kernel( + iter, func, 1.); + } + #else void operator()(TensorIterator& iter) { gpu_reduce_kernel( iter, func_wrapper([] GPU_LAMBDA(acc_t a, acc_t b) -> acc_t { return a * b; - }), 1); + }), 1.); } + #endif }; // Workaround for the error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] diff --git a/aten/src/ATen/native/cuda/ReflectionPad.cu b/aten/src/ATen/native/cuda/ReflectionPad.cu index e497bae885f0f2..33f71368ca10bc 100644 --- a/aten/src/ATen/native/cuda/ReflectionPad.cu +++ b/aten/src/ATen/native/cuda/ReflectionPad.cu @@ -1,12 +1,27 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/cuda/Repeat.cu b/aten/src/ATen/native/cuda/Repeat.cu index 43d6602ea8e2ae..1b29dac6690f39 100644 --- a/aten/src/ATen/native/cuda/Repeat.cu +++ b/aten/src/ATen/native/cuda/Repeat.cu @@ -1,7 +1,15 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + template __global__ static void compute_cuda_kernel( index_t* repeat_ptr, @@ -33,7 +41,7 @@ static void compute_cuda( int64_t size, int64_t result_size) { int64_t block = 512; - int64_t warps_per_block = block / C10_WARP_SIZE; + int64_t warps_per_block = block / at::cuda::warp_size(); int64_t grid = std::min((size + warps_per_block - 1) / warps_per_block, 2048L); diff --git a/aten/src/ATen/native/cuda/ReplicationPadding.cu b/aten/src/ATen/native/cuda/ReplicationPadding.cu index 754161c62097cd..d967ffd0354df6 100644 --- a/aten/src/ATen/native/cuda/ReplicationPadding.cu +++ b/aten/src/ATen/native/cuda/ReplicationPadding.cu @@ -1,13 +1,26 @@ -#include +#include #include +#include #include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/cuda/Resize.cpp b/aten/src/ATen/native/cuda/Resize.cpp index c4167ec56e67a1..43e1cb95157402 100644 --- a/aten/src/ATen/native/cuda/Resize.cpp +++ b/aten/src/ATen/native/cuda/Resize.cpp @@ -1,10 +1,16 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include -#include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/Resize.h b/aten/src/ATen/native/cuda/Resize.h index 33ab263693dc5f..569b145fa61d99 100644 --- a/aten/src/ATen/native/cuda/Resize.h +++ b/aten/src/ATen/native/cuda/Resize.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include @@ -9,19 +9,15 @@ namespace at { namespace native { TORCH_CUDA_CPP_API void resize_bytes_cuda(StorageImpl* storage, size_t size_bytes); -static inline void maybe_resize_storage_cuda(TensorImpl* self, uint64_t new_size) { +static inline void maybe_resize_storage_cuda(TensorImpl* self, size_t new_size_bytes) { // It does not make sense to try to resize a storage // to hold 0 elements, and this can break // if storage_offset is positive but // new_size is 0, so just bail in that case // (same comment is in Resize.h) - if (new_size == 0) { + if (self->numel() == 0) { return; } - auto 
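// Illustrative sketch, not part of the patch: the Resize.h hunk above stops reasoning in
// element counts and instead computes how many bytes the storage must hold so that every
// addressable element of a possibly strided, possibly offset view fits; that is what the
// at::detail::computeStorageNbytes call is used for. The underlying arithmetic is just
// "largest reachable offset plus one element" (assuming non-negative strides):
#include <cstdint>
#include <cstdio>
#include <vector>

int64_t storage_bytes_needed(const std::vector<int64_t>& sizes,
                             const std::vector<int64_t>& strides,  // in elements
                             int64_t itemsize,
                             int64_t storage_offset) {
  for (int64_t s : sizes) {
    if (s == 0) return 0;                      // empty tensor needs no storage
  }
  int64_t max_index = storage_offset;          // offset of element (0, ..., 0)
  for (size_t d = 0; d < sizes.size(); ++d) {
    max_index += (sizes[d] - 1) * strides[d];  // walk to the last element
  }
  return (max_index + 1) * itemsize;
}

int main() {
  // A 2x3 float view with strides {3, 1} starting at offset 4:
  // last element sits at index 4 + 1*3 + 2*1 = 9, so 10 floats = 40 bytes.
  std::printf("%lld\n",
              (long long)storage_bytes_needed({2, 3}, {3, 1}, sizeof(float), 4));
  return 0;
}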
new_size_bytes_i = (new_size + self->storage_offset()) * self->dtype().itemsize(); - TORCH_CHECK(!overflows(new_size_bytes_i), "Requested storage size (", - new_size_bytes_i, ") cannot be represented as a size_t"); - const auto new_size_bytes = static_cast(new_size_bytes_i); const Storage &storage = self->unsafe_storage(); TORCH_CHECK(storage, "Tensor: invalid null storage"); @@ -33,7 +29,7 @@ static inline void maybe_resize_storage_cuda(TensorImpl* self, uint64_t new_size inline TensorImpl* resize_impl_cuda_( TensorImpl* self, IntArrayRef size, - c10::optional stride, + at::OptionalIntArrayRef stride, bool device_guard = true) { if (self->sizes() == size && (!stride || self->strides() == stride)) { return self; @@ -45,14 +41,17 @@ inline TensorImpl* resize_impl_cuda_( guard.set_index(self->storage().device().index()); } - int64_t storage_size = 1; + const auto itemsize = self->dtype().itemsize(); + const auto storage_offset = self->storage_offset(); + size_t storage_size = 1; if (stride) { self->set_sizes_and_strides(size, *stride); - // NB: storage size can be different from numel. - storage_size = storage_size_for(size, *stride); + storage_size = at::detail::computeStorageNbytes( + size, *stride, itemsize, storage_offset); } else { self->set_sizes_contiguous(size); - storage_size = self->numel(); + storage_size = at::detail::computeStorageNbytesContiguous( + size, itemsize, storage_offset); } maybe_resize_storage_cuda(self, storage_size); diff --git a/aten/src/ATen/native/cuda/RreluWithNoise.cu b/aten/src/ATen/native/cuda/RreluWithNoise.cu index b73097758fd75f..3b2435d3dae420 100644 --- a/aten/src/ATen/native/cuda/RreluWithNoise.cu +++ b/aten/src/ATen/native/cuda/RreluWithNoise.cu @@ -1,8 +1,19 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + + namespace at { namespace native { template diff --git a/aten/src/ATen/native/cuda/ScanKernels.cpp b/aten/src/ATen/native/cuda/ScanKernels.cpp index f88faa1fcac9e3..8ba8b742af7714 100644 --- a/aten/src/ATen/native/cuda/ScanKernels.cpp +++ b/aten/src/ATen/native/cuda/ScanKernels.cpp @@ -1,10 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { static c10::MaybeOwned contiguous_out_arg(const Tensor &tensor) { diff --git a/aten/src/ATen/native/cuda/ScanKernels.h b/aten/src/ATen/native/cuda/ScanKernels.h index a502847f63075c..28e65372511bc7 100644 --- a/aten/src/ATen/native/cuda/ScanKernels.h +++ b/aten/src/ATen/native/cuda/ScanKernels.h @@ -1,3 +1,4 @@ +#pragma once #include namespace at { diff --git a/aten/src/ATen/native/cuda/ScatterGatherKernel.cu b/aten/src/ATen/native/cuda/ScatterGatherKernel.cu index 4ec12e166634a3..e80ec7def9611f 100644 --- a/aten/src/ATen/native/cuda/ScatterGatherKernel.cu +++ b/aten/src/ATen/native/cuda/ScatterGatherKernel.cu @@ -1,6 +1,7 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include #include #include @@ -34,6 +35,33 @@ public: }; static ReduceAdd reduce_add; +class ReduceMean { +public: + template + constexpr C10_DEVICE void operator() (scalar_t * self_data, const scalar_t * src_data) const { + gpuAtomicAddNoReturn(self_data, *src_data); + } +}; +static ReduceMean reduce_mean; + +class ReduceMinimum { +public: + template + constexpr C10_DEVICE 
void operator() (scalar_t * self_data, const scalar_t * src_data) const { + gpuAtomicMin(self_data, *src_data); + } +}; +static ReduceMinimum reduce_minimum; + +class ReduceMaximum { +public: + template + constexpr C10_DEVICE void operator() (scalar_t * self_data, const scalar_t * src_data) const { + gpuAtomicMax(self_data, *src_data); + } +}; +static ReduceMaximum reduce_maximum; + class TensorAssign { public: template @@ -126,12 +154,11 @@ struct _cuda_scatter_gather_internal_kernel { template struct cuda_scatter_gather_base_kernel { - template void operator()( const Tensor& self, int64_t dim, const Tensor& index, const Tensor& src, const std::string& method_name, - const func_t& f + const ReduceAdd& f ) { at::assert_no_internal_overlap(self); @@ -189,7 +216,66 @@ struct cuda_scatter_gather_base_kernel { const Tensor& self, int64_t dim, const Tensor& index, const Tensor& src, const std::string& method_name, - const ReduceMultiply& f + const TensorAssign& f + ) { + at::assert_no_internal_overlap(self); + + auto index_sizes = ensure_nonempty_vec(index.sizes().vec()); + auto self_strides = ensure_nonempty_vec(self.strides().vec()); + auto src_strides = ensure_nonempty_vec(src.strides().vec()); + + // restride self and src such that + // self.shape = src.shape = index.shape + // + // restride stride[dim] such that + // if (is_scatter_like) self.stride[dim] = 0 + // else src.stride[dim] = 0 + auto self_restrided = is_scatter_like ? + restride_dim(self, dim, index_sizes) + : self.as_strided(index_sizes, self_strides); + auto src_restrided = is_scatter_like ? + src.as_strided(index_sizes, src_strides) + : restride_dim(src, dim, index_sizes); + + auto iter = TensorIteratorConfig() + .set_check_mem_overlap(false) + .check_all_same_dtype(false) + .resize_outputs(false) + .add_output(self_restrided) + .add_input(src_restrided) + .add_input(index) + .build(); + + auto self_dim_stride = ensure_nonempty_stride(self, dim); + auto self_dim_size = ensure_nonempty_size(self, dim); + + auto src_dim_stride = ensure_nonempty_stride(src, dim); + auto src_dim_size = ensure_nonempty_size(src, dim); + + auto index_size = is_scatter_like ? self_dim_size : src_dim_size; + auto index_stride = is_scatter_like ? 
self_dim_stride : src_dim_stride; + + + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( + at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, + iter.dtype(), + "cuda_scatter_gather_base_kernel_func", [&] { + using dtype = typename std::conditional, scalar_t>::type; + + _cuda_scatter_gather_internal_kernel()( + iter, index_size, index_stride, f + ); + } + ); + } + + template + void operator()( + const Tensor& self, int64_t dim, + const Tensor& index, const Tensor& src, + const std::string& method_name, + const func_t& f ) { at::assert_no_internal_overlap(self); @@ -232,7 +318,7 @@ struct cuda_scatter_gather_base_kernel { AT_DISPATCH_FLOATING_TYPES_AND2( at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), - "cuda_scatter_gather_base_kernel_reduce_multiply", [&] { + "cuda_scatter_gather_base_kernel_func", [&] { using dtype = typename std::conditional, scalar_t>::type; @@ -416,6 +502,34 @@ void scatter_reduce_cuda_kernel(const Tensor& self, const int64_t dim, const Ten cuda_scatter_gather_base_kernel()(self, dim, index, src, "scatter_reduce_cuda_multiply_", reduce_multiply); break; + default : + break; + } +} + +void scatter_reduce_two_cuda_kernel(const Tensor& self, const int64_t dim, const Tensor& index, + const Tensor& src, const SCATTER_GATHER_OP& reduce) { + switch (reduce) { + case SCATTER_GATHER_OP::REDUCE_ADD : + cuda_scatter_gather_base_kernel()(self, dim, index, src, + "scatter_reduce_cuda_sum_", reduce_add); + break; + case SCATTER_GATHER_OP::REDUCE_MULTIPLY : + cuda_scatter_gather_base_kernel()(self, dim, index, src, + "scatter_reduce_cuda_prod_", reduce_multiply); + break; + case SCATTER_GATHER_OP::REDUCE_MAXIMUM : + cuda_scatter_gather_base_kernel()(self, dim, index, src, + "scatter_reduce_cuda_amax_", reduce_maximum); + break; + case SCATTER_GATHER_OP::REDUCE_MINIMUM : + cuda_scatter_gather_base_kernel()(self, dim, index, src, + "scatter_reduce_cuda_amin_", reduce_minimum); + break; + case SCATTER_GATHER_OP::REDUCE_MEAN : + cuda_scatter_gather_base_kernel()(self, dim, index, src, + "scatter_reduce_cuda_mean_", reduce_mean); + break; } } @@ -430,6 +544,8 @@ void scatter_scalar_reduce_cuda_kernel(const Tensor& self, const int64_t dim, co cuda_scatter_fill_base_kernel()(self, dim, index, value, "scatter_fill_cuda_multiply_", reduce_multiply); break; + default : + break; } } @@ -440,5 +556,6 @@ REGISTER_DISPATCH(scatter_fill_stub, &scatter_fill_cuda_kernel); REGISTER_DISPATCH(scatter_add_stub, &scatter_add_cuda_kernel); REGISTER_DISPATCH(scatter_reduce_stub, &scatter_reduce_cuda_kernel); REGISTER_DISPATCH(scatter_scalar_reduce_stub, &scatter_scalar_reduce_cuda_kernel); +REGISTER_DISPATCH(scatter_reduce_two_stub, &scatter_reduce_two_cuda_kernel); }} // namespace at::native diff --git a/aten/src/ATen/native/cuda/SegmentReduce.cu b/aten/src/ATen/native/cuda/SegmentReduce.cu index 6a5a768ae0d89d..862de29c76cbbd 100644 --- a/aten/src/ATen/native/cuda/SegmentReduce.cu +++ b/aten/src/ATen/native/cuda/SegmentReduce.cu @@ -1,12 +1,20 @@ - +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/Shape.cu b/aten/src/ATen/native/cuda/Shape.cu index 17eb9197307595..590761ad690483 100644 --- a/aten/src/ATen/native/cuda/Shape.cu +++ b/aten/src/ATen/native/cuda/Shape.cu @@ -1,4 +1,5 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include 
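// Illustrative sketch, not part of the patch: a serial CPU reference for what the new
// scatter_reduce_two_cuda_kernel dispatch above does per element, restricted to 1-D
// tensors and dim 0. Each src value is folded into self[index[i]] with the chosen
// reduction; the CUDA path uses atomics (gpuAtomicAdd/Min/Max) instead of this loop, and
// "mean" only accumulates sums here (the divide-by-count step presumably happens outside
// this kernel).
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

enum class Reduce { Sum, Prod, Amax, Amin, Mean };

void scatter_reduce_1d(std::vector<float>& self,
                       const std::vector<int64_t>& index,
                       const std::vector<float>& src,
                       Reduce op) {
  for (size_t i = 0; i < src.size(); ++i) {
    float& slot = self[index[i]];
    switch (op) {
      case Reduce::Sum:
      case Reduce::Mean: slot += src[i]; break;               // mean = sum now, divide later
      case Reduce::Prod: slot *= src[i]; break;
      case Reduce::Amax: slot = std::max(slot, src[i]); break;
      case Reduce::Amin: slot = std::min(slot, src[i]); break;
    }
  }
}

int main() {
  std::vector<float> self = {0, 0, 0};
  scatter_reduce_1d(self, {0, 0, 2, 1}, {1, 2, 3, 4}, Reduce::Sum);
  std::printf("%g %g %g\n", self[0], self[1], self[2]);  // 3 4 3
  return 0;
}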
#include @@ -9,14 +10,22 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { -#if defined(USE_ROCM) -constexpr int CAT_ARRAY_BATCH_SIZE = 1024; -#else constexpr int CAT_ARRAY_BATCH_SIZE = 128; -#endif constexpr int CAT_ARRAY_MAX_INPUT_DIMS = 4; namespace { @@ -83,45 +92,6 @@ struct TensorSizeStride { */ -// Use pinned memory and and pass the struct by pointer on ROCm -template -struct CatArrInputTensor { - T* input; - IndexType offset; - IndexType dimSize; - IndexType nElements; -}; - -template -C10_LAUNCH_BOUNDS_1(512) -__global__ void HIP_CatArrayBatchedCopy( - T* output, - CatArrInputTensor* inputs, - TensorSizeStride os, - const int concatDim, - IndexType dimStride) { - - IndexType tid = blockIdx.x * blockDim.x + threadIdx.x; - IndexType nElements = inputs[blockIdx.y].nElements; - - if(tid >= nElements) return; - - T* data = inputs[blockIdx.y].input; - IndexType offset = inputs[blockIdx.y].offset; - IndexType dimSize = inputs[blockIdx.y].dimSize; - IndexType dataOffset = offset * dimStride; - - IndexType stride = gridDim.x * blockDim.x; - - while( tid < nElements){ - IndexType elementOffset = CatArrIndexToOffset::compute( - os.tensorSize, os.tensorStride, dimSize, concatDim, tid); - output[dataOffset + elementOffset] = data[tid]; - - tid += stride; - } -} - // pass meta data directly through kernel argument instead of pin memory // In contiguous case, we will not need stride_size, setting it as 1 as placeholder // to pass compile. @@ -171,127 +141,6 @@ __global__ void CatArrayBatchedCopy( } } -template -void hip_parallel_cat(Tensor &out, const TensorList &inputs, int64_t dimension, - int nDims, c10::MemoryFormat memory_format) { - // First, let's set up our kernel parameters. We start with a raw pointer to - // the storage for the output Tensor. - scalar_t *data = out.data_ptr(); - - // Kernel Parameter - long tensorMetadataSize = - sizeof(CatArrInputTensor) * CAT_ARRAY_BATCH_SIZE; - auto d_inputs_storage = at::empty( - {tensorMetadataSize}, out.options().dtype(at::kByte)); - auto d_inputs = static_cast *>( - d_inputs_storage.data_ptr()); - - TensorSizeStride outputParam; - - // Next, let's initialize the size, stride arrays for the output Tensor. 
- if (memory_format == c10::MemoryFormat::Contiguous) { - for (int i = 0; i < nDims; ++i) { - outputParam.tensorSize[i] = at::native::size(out, i); - outputParam.tensorStride[i] = out.stride(i); - } - } else if (memory_format == c10::MemoryFormat::ChannelsLast || memory_format == c10::MemoryFormat::ChannelsLast3d) { - // permute the semantics of dims from NCHW to NHWC so that the input - // tensor is now contiguous - outputParam.tensorSize[0] = at::native::size(out, 0); - outputParam.tensorStride[0] = out.stride(0); - for (int i = 1; i < nDims - 1; ++i) { - outputParam.tensorSize[i] = at::native::size(out, i + 1); - outputParam.tensorStride[i] = out.stride(i + 1); - } - outputParam.tensorSize[nDims - 1] = at::native::size(out, 1); - outputParam.tensorStride[nDims - 1] = out.stride(1); - } else { - TORCH_CHECK(false, "unsupported memory format"); - } - - at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream(); - - // Now we loop - int batchCounter = 0; - int64_t offset = 0; - for (int i = 0; i < inputs.size() ; i += CAT_ARRAY_BATCH_SIZE) { - // Re-allocate stackInputs every iteration to avoid read-after-write hazard - { - auto stackInputs_storage = at::empty({tensorMetadataSize}, - out.options().dtype(at::kByte).device(at::kCPU).pinned_memory(true)); - auto stackInputs = - static_cast *>( - stackInputs_storage.data_ptr()); - for (batchCounter = 0; - batchCounter < CAT_ARRAY_BATCH_SIZE && - (i+batchCounter) < inputs.size(); - ++batchCounter) { - int64_t dimSize = 0; - // There is a legacy case where a 1-D empty tensor can be concat with - // high-dimensional tensor - if (inputs[i+batchCounter].numel() > 0) { - dimSize = at::native::size(inputs[i+batchCounter], dimension); - } - - stackInputs[batchCounter].input = - inputs[i+batchCounter].data_ptr(); - stackInputs[batchCounter].offset = offset; - stackInputs[batchCounter].dimSize = dimSize; - stackInputs[batchCounter].nElements = inputs[i+batchCounter].numel(); - - // update offset - offset += dimSize; - } - at::native::copy_(d_inputs_storage, stackInputs_storage, - /* non_blocking= */ true); - } - - // Next, let's consider how we set our kernel launch parameters. - // We borrow from THCApply, which the kernel's internal indexing - // is based on. - dim3 applyBlock = dim3(32*16); - - //Get grid where x dim fills half gpu and y dim is number of tensors. - //This will have cating two tensors fill the entire grid, but prevent - //many threads from needlessly load meta data if their sizes is small. - dim3 catGrid; - getCatGrid(batchCounter, catGrid); - - if (memory_format != c10::MemoryFormat::Contiguous) { - switch (dimension) { - case 0: - break; - case 1: - dimension = nDims - dimension; - break; - default: - dimension--; - } - } - // Template Declarations for dim = 1, 2, 3, 4 -#define HANDLE_CASE(DIMS) \ - HIP_CatArrayBatchedCopy<<<\ - catGrid, applyBlock, 0, stream.stream()>>>(\ - data, d_inputs, outputParam, dimension, outputParam.tensorStride[dimension]); \ - C10_CUDA_KERNEL_LAUNCH_CHECK(); - switch (nDims) { - case 1: - HANDLE_CASE(1); - break; - case 2: - HANDLE_CASE(2); - break; - case 3: - HANDLE_CASE(3); - break; - case 4: - HANDLE_CASE(4); - break; - } -#undef HANDLE_CASE - } -} - template void parallel_cat(Tensor &out, const TensorList &inputs, int64_t dimension, int nDims, c10::MemoryFormat memory_format) { @@ -304,19 +153,19 @@ void parallel_cat(Tensor &out, const TensorList &inputs, int64_t dimension, // Next, let's initialize the size, stride arrays for the output Tensor. 
if (memory_format == c10::MemoryFormat::Contiguous) { for (int i = 0; i < nDims; ++i) { - outputParam.tensorSize[i] = at::native::size(out, i); + outputParam.tensorSize[i] = out.size(i); outputParam.tensorStride[i] = out.stride(i); } } else if (memory_format == c10::MemoryFormat::ChannelsLast || memory_format == c10::MemoryFormat::ChannelsLast3d) { // permute the semantics of dims from NCHW to NHWC so that the input // tensor is now contiguous - outputParam.tensorSize[0] = at::native::size(out, 0); + outputParam.tensorSize[0] = out.size(0); outputParam.tensorStride[0] = out.stride(0); for (int i = 1; i < nDims - 1; ++i) { - outputParam.tensorSize[i] = at::native::size(out, i + 1); + outputParam.tensorSize[i] = out.size(i + 1); outputParam.tensorStride[i] = out.stride(i + 1); } - outputParam.tensorSize[nDims - 1] = at::native::size(out, 1); + outputParam.tensorSize[nDims - 1] = out.size(1); outputParam.tensorStride[nDims - 1] = out.stride(1); } else { TORCH_CHECK(false, "unsupported memory format"); @@ -336,7 +185,7 @@ void parallel_cat(Tensor &out, const TensorList &inputs, int64_t dimension, // There is a legacy case where a 1-D empty tensor can be concat with // high-dimensional tensor if (inputs[i+batchCounter].numel() > 0) { - dimSize = at::native::size(inputs[i+batchCounter], dimension); + dimSize = inputs[i+batchCounter].size(dimension); } catMetaData.input[batchCounter] = inputs[i+batchCounter].data_ptr(); catMetaData.offset[batchCounter] = offset; @@ -440,7 +289,7 @@ Tensor& cat_out_cuda(TensorList inputs, int64_t dimension, Tensor& out) { // (i.e. other empty sizes are not skipped). // FIXME: warn if this is the case auto should_skip = [](const Tensor &t) { - return t.dim() == 1 && at::native::size(t, 0) == 0; + return t.dim() == 1 && t.size(0) == 0; }; const Tensor *notSkippedTensor = NULL; // non-owning reference @@ -502,7 +351,7 @@ Tensor& cat_out_cuda(TensorList inputs, int64_t dimension, Tensor& out) { continue; } check_cat_shape_except_dim(*notSkippedTensor, tensor, dimension, i); - cat_dim_size += at::native::size(tensor, dimension); + cat_dim_size += tensor.size(dimension); } // Compute the size of the result @@ -546,19 +395,6 @@ Tensor& cat_out_cuda(TensorList inputs, int64_t dimension, Tensor& out) { }); allSameType = allSameType && (out.scalar_type() == firstType); -#if defined(USE_ROCM) - if (inputs.size() > 1 && - out.dim() <= CAT_ARRAY_MAX_INPUT_DIMS && - at::cuda::detail::canUse32BitIndexMath(out) && - allContiguous && - all32BitIndexable && - allSameType) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, - out.scalar_type(), "cat_cuda", [&]() { - hip_parallel_cat(out, inputs, dimension, nDims, memory_format); - }); -#else // We support the contiguous inputs and non-contiguous input (<=4 dims) in different ways // For contiguous input, we don't need to pass stride meta data to cuda kernel through constant // memory. Therefore, we could pass more inputs to cuda threads. 
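The comment above summarizes the strategy of the batched copy path kept in this file: per-input pointers and offsets are packed into a plain struct and passed by value as a kernel argument, so contiguous inputs need no pinned-memory staging and no constant-memory stride tables. A minimal sketch of that idea follows; InputMeta, BATCH, and copy_batch are illustrative names, not the kernel in this patch.

// Illustrative sketch only: metadata for a batch of inputs travels in the
// kernel-argument buffer rather than via pinned host memory.
constexpr int BATCH = 128;  // mirrors CAT_ARRAY_BATCH_SIZE above

template <typename T, typename IndexType>
struct InputMeta {
  const T* input[BATCH];
  IndexType offset[BATCH];     // start offset along the cat dimension
  IndexType nElements[BATCH];  // numel of each input
};

template <typename T, typename IndexType>
__global__ void copy_batch(T* out, InputMeta<T, IndexType> meta, IndexType dimStride) {
  const T* in = meta.input[blockIdx.y];
  IndexType n = meta.nElements[blockIdx.y];
  IndexType base = meta.offset[blockIdx.y] * dimStride;
  IndexType stride = gridDim.x * blockDim.x;
  for (IndexType i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
    out[base + i] = in[i];  // contiguous case: plain grid-stride copy at an offset
  }
}

Because the struct is a kernel argument, its size is bounded by the kernel-parameter limit, which is one reason the inputs are processed in fixed-size chunks of BATCH rather than all at once.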
@@ -570,8 +406,8 @@ Tensor& cat_out_cuda(TensorList inputs, int64_t dimension, Tensor& out) { allContiguous && all32BitIndexable && allSameType) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kComplexHalf, kHalf, kBool, kBFloat16, out.scalar_type(), "cat_cuda", [&]() { parallel_cat(out, inputs, dimension, nDims, memory_format); }); @@ -582,18 +418,17 @@ Tensor& cat_out_cuda(TensorList inputs, int64_t dimension, Tensor& out) { all32BitIndexable && allSameType && memory_format == c10::MemoryFormat::Contiguous) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kComplexHalf, kHalf, kBool, kBFloat16, out.scalar_type(), "cat_cuda", [&]() { parallel_cat(out, inputs, dimension, nDims, memory_format); }); -#endif } else { int64_t offset = 0; for (int j = 0; j < inputs.size(); j++) { if (should_skip(inputs[j])) continue; - int64_t dimSize = at::native::size(inputs[j], dimension); + int64_t dimSize = inputs[j].size(dimension); Tensor nt = at::narrow(out, dimension, offset, dimSize); copy_(nt, inputs[j]); offset += dimSize; diff --git a/aten/src/ATen/native/cuda/SoftMax.cu b/aten/src/ATen/native/cuda/SoftMax.cu index 181fbb994c3fda..8c12e034ba48e6 100644 --- a/aten/src/ATen/native/cuda/SoftMax.cu +++ b/aten/src/ATen/native/cuda/SoftMax.cu @@ -1,7 +1,9 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include -#include +#include #include #include @@ -13,6 +15,18 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { @@ -153,7 +167,7 @@ inline dim3 SoftMax_getBlockSize(int ILP, uint64_t dim_size) { while (block_size < (max_block_size)) block_size *= 2; // Launch at least a single warp - the kernel assumes that. 
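Several hunks in this patch make the same substitution as the two lines below: the compile-time C10_WARP_SIZE macro is replaced by a runtime at::cuda::warp_size() query, because the warp/wavefront width depends on the device (32 on NVIDIA GPUs, 64 on many AMD GPUs). A small helper in that style might look like the following sketch, assuming warp_size() is available from ATen/cuda/CUDAContext.h:

#include <ATen/cuda/CUDAContext.h>

// Round a thread count up to a whole number of warps, using the runtime
// warp size of the current device instead of a compile-time constant.
inline int64_t round_up_to_warp(int64_t threads) {
  const int64_t warp = at::cuda::warp_size();  // queried per device
  return ((threads + warp - 1) / warp) * warp;
}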
- block_size = std::max(block_size, static_cast(C10_WARP_SIZE)); + block_size = std::max(block_size, static_cast(at::cuda::warp_size())); return dim3(block_size); } @@ -959,8 +973,7 @@ Tensor masked_softmax_cuda(const Tensor& input, const Tensor& mask) { input.scalar_type(), "masked_softmax", [&] { - Tensor mask_not = mask.logical_not(); - output = at::softmax(input.masked_fill(mask_not, -std::numeric_limits::infinity()), -1); + output = at::softmax(input.masked_fill(mask, -std::numeric_limits::infinity()), -1); }); return output; } diff --git a/aten/src/ATen/native/cuda/Sort.cpp b/aten/src/ATen/native/cuda/Sort.cpp index 8bb7d93bfdb551..21f77f7050649b 100644 --- a/aten/src/ATen/native/cuda/Sort.cpp +++ b/aten/src/ATen/native/cuda/Sort.cpp @@ -1,11 +1,23 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include #include -#include -#include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/SortImpl.cu b/aten/src/ATen/native/cuda/SortImpl.cu index a806c4a138746d..c6e29262046e8e 100644 --- a/aten/src/ATen/native/cuda/SortImpl.cu +++ b/aten/src/ATen/native/cuda/SortImpl.cu @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/Sorting.cpp b/aten/src/ATen/native/cuda/Sorting.cpp index f92c4778051837..97b8df55416e23 100644 --- a/aten/src/ATen/native/cuda/Sorting.cpp +++ b/aten/src/ATen/native/cuda/Sorting.cpp @@ -1,13 +1,27 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include -#include +#include +#include +#include +#include #include +#include #include #include + #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/Sorting.cu b/aten/src/ATen/native/cuda/Sorting.cu index d72788c1b97c79..52fa2710596d4b 100644 --- a/aten/src/ATen/native/cuda/Sorting.cu +++ b/aten/src/ATen/native/cuda/Sorting.cu @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -189,7 +190,7 @@ struct KthValueLauncher { } dim3 block(std::min( - round_up(slice_size, (int64_t)C10_WARP_SIZE), (int64_t)1024)); + round_up(slice_size, (int64_t)at::cuda::warp_size()), (int64_t)1024)); auto stream = at::cuda::getCurrentCUDAStream(); gatherKthValue<<>>( self_info, @@ -228,7 +229,7 @@ struct MedianLauncher { } dim3 block(std::min( - round_up(slice_size, (int64_t)C10_WARP_SIZE), (int64_t)1024)); + round_up(slice_size, (int64_t)at::cuda::warp_size()), (int64_t)1024)); auto stream = at::cuda::getCurrentCUDAStream(); gatherMedian<<>>( values_info, diff --git a/aten/src/ATen/native/cuda/SparseMM.cu b/aten/src/ATen/native/cuda/SparseMM.cu index 0cc3fe3806a072..922efa5f4fcb5d 100644 --- a/aten/src/ATen/native/cuda/SparseMM.cu +++ b/aten/src/ATen/native/cuda/SparseMM.cu @@ -1,7 +1,13 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { // sparse, sparse, sparse, dense, real, real -> sparse Tensor& _sspaddmm_out_only_sparse_cuda(const Tensor& self, diff --git a/aten/src/ATen/native/cuda/SpectralOps.cpp b/aten/src/ATen/native/cuda/SpectralOps.cpp index f431e1e31cb47a..b418e8ffc8abb2 100644 --- 
a/aten/src/ATen/native/cuda/SpectralOps.cpp +++ b/aten/src/ATen/native/cuda/SpectralOps.cpp @@ -1,19 +1,28 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include -#include -#include -#include -#include +#include +#include #include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/cuda/SpectralOps.cu b/aten/src/ATen/native/cuda/SpectralOps.cu index 4a91f58e61ec43..df51fe46afea68 100644 --- a/aten/src/ATen/native/cuda/SpectralOps.cu +++ b/aten/src/ATen/native/cuda/SpectralOps.cu @@ -1,19 +1,11 @@ -#include +#define TORCH_ASSERT_NO_OPERATORS #include #include #include -#include -#include #include #include #include -#include -#include #include -#include -#include -#include - #include #include @@ -21,8 +13,6 @@ namespace at { namespace native { -using namespace at::native::detail; - // Offset calculator for indexing in Hermitian mirrored order. // In mirrored dims, maps linear index i to (n - i) % n template diff --git a/aten/src/ATen/native/cuda/SummaryOps.cu b/aten/src/ATen/native/cuda/SummaryOps.cu index 4b47d0c9cd90a3..9877e8cf7c3c7e 100644 --- a/aten/src/ATen/native/cuda/SummaryOps.cu +++ b/aten/src/ATen/native/cuda/SummaryOps.cu @@ -1,10 +1,22 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace cuda { #define THRESH_NUMBER_BINS_FOR_MULTI_BLOCK_MEM 100 diff --git a/aten/src/ATen/native/cuda/TensorCompare.cpp b/aten/src/ATen/native/cuda/TensorCompare.cpp index 5d2c84fdaca5a9..b99df69f3b2aa2 100644 --- a/aten/src/ATen/native/cuda/TensorCompare.cpp +++ b/aten/src/ATen/native/cuda/TensorCompare.cpp @@ -1,4 +1,5 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/TensorFactories.cu b/aten/src/ATen/native/cuda/TensorFactories.cu index 29bd7adce5a0f0..f442c9c9f4e16f 100644 --- a/aten/src/ATen/native/cuda/TensorFactories.cu +++ b/aten/src/ATen/native/cuda/TensorFactories.cu @@ -1,14 +1,29 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include +#include #include #include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/cuda/TensorModeKernel.cpp b/aten/src/ATen/native/cuda/TensorModeKernel.cpp index 73ae5f3199b9ab..c04693bb72e215 100644 --- a/aten/src/ATen/native/cuda/TensorModeKernel.cpp +++ b/aten/src/ATen/native/cuda/TensorModeKernel.cpp @@ -1,5 +1,5 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include #include #include #include diff --git a/aten/src/ATen/native/cuda/TensorModeKernel.cu b/aten/src/ATen/native/cuda/TensorModeKernel.cu index 40a8e19eb44502..c62e68a9041675 100644 --- a/aten/src/ATen/native/cuda/TensorModeKernel.cu +++ b/aten/src/ATen/native/cuda/TensorModeKernel.cu @@ -142,7 +142,8 @@ void handle_fused_mode( int64_t slice_size, int64_t slices) { constexpr int num_threads = size / 2; - static_assert(num_threads % C10_WARP_SIZE == 0 && + int warp_size = at::cuda::warp_size(); + TORCH_INTERNAL_ASSERT(num_threads % 
warp_size == 0 && num_threads <= cuda_utils::kCUDABlockReduceMaxThreads, ""); const auto memsize = (sizeof(scalar_t) * size) + (2 * size * sizeof(unsigned int)); @@ -191,15 +192,9 @@ void fused_mode( case 16: case 8: case 4: - case 2: { - if (ceilPowerOf2 > 2 * C10_WARP_SIZE) { - handle_fused_mode<128, scalar_t>( - grid, self, ti_values, ti_indices, slice_size, slices); - } else { - handle_fused_mode<2 * C10_WARP_SIZE, scalar_t>( - grid, self, ti_values, ti_indices, slice_size, slices); - } - } + case 2: + handle_fused_mode<128, scalar_t>( + grid, self, ti_values, ti_indices, slice_size, slices); break; case 1: default: diff --git a/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp b/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp index cc1c523dc1a341..0bb7eb410acf3a 100644 --- a/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp +++ b/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp @@ -1,9 +1,15 @@ - -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { @@ -27,8 +33,8 @@ Tensor& set_storage_cuda_(Tensor& result, Storage storage, int64_t storage_offse checkSetStorage(result, storage, storage_offset, size, stride); result.unsafeGetTensorImpl()->set_storage_offset(storage_offset); - c10::optional stride_opt = stride.data() != nullptr ? - c10::optional(stride) : c10::nullopt; + at::OptionalIntArrayRef stride_opt = stride.data() != nullptr ? + at::OptionalIntArrayRef(stride) : c10::nullopt; at::native::resize_impl_cuda_(result.unsafeGetTensorImpl(), size, stride_opt); return result; } diff --git a/aten/src/ATen/native/cuda/TensorTopK.cpp b/aten/src/ATen/native/cuda/TensorTopK.cpp index 392b3ce25ce2d5..fcd155c2c7fe3e 100644 --- a/aten/src/ATen/native/cuda/TensorTopK.cpp +++ b/aten/src/ATen/native/cuda/TensorTopK.cpp @@ -1,9 +1,21 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include + +#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/TensorTopK.cu b/aten/src/ATen/native/cuda/TensorTopK.cu index 7980619a786471..9e1e717903dac4 100644 --- a/aten/src/ATen/native/cuda/TensorTopK.cu +++ b/aten/src/ATen/native/cuda/TensorTopK.cu @@ -189,7 +189,8 @@ void launch( dim3 grid; TORCH_INTERNAL_ASSERT(getGridFromTiles(numInputSlices, grid), "Too many slices for topk"); - dim3 block(std::min(at::ceil_div((int64_t)inputSliceSize, (int64_t)C10_WARP_SIZE) * (int64_t)C10_WARP_SIZE, (int64_t)1024)); + int warp_size = at::cuda::warp_size(); + dim3 block(std::min(at::ceil_div((int64_t)inputSliceSize, (int64_t)warp_size) * (int64_t)warp_size, (int64_t)1024)); gatherTopK<<>>( input, inputSliceSize, @@ -472,7 +473,8 @@ void launch( { dim3 grid; TORCH_INTERNAL_ASSERT(getGridFromTiles(numInputSlices, grid), "Too many slices for topk"); - dim3 block(std::min(at::ceil_div((int64_t)inputSliceSize, (int64_t)C10_WARP_SIZE) * (int64_t)C10_WARP_SIZE, (int64_t)1024)); + int warp_size = at::cuda::warp_size(); + dim3 block(std::min(at::ceil_div((int64_t)inputSliceSize, (int64_t)warp_size) * (int64_t)warp_size, (int64_t)1024)); sbtopk::gatherTopK<<>>( input, inputSliceSize, diff --git a/aten/src/ATen/native/cuda/TensorTransformations.cu b/aten/src/ATen/native/cuda/TensorTransformations.cu index d46a5613df78cb..335d746294d0df 100644 --- a/aten/src/ATen/native/cuda/TensorTransformations.cu +++ 
b/aten/src/ATen/native/cuda/TensorTransformations.cu @@ -1,11 +1,20 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/cuda/TriangularOps.cu b/aten/src/ATen/native/cuda/TriangularOps.cu index 1e264a0890787e..2d7bf30309dc86 100644 --- a/aten/src/ATen/native/cuda/TriangularOps.cu +++ b/aten/src/ATen/native/cuda/TriangularOps.cu @@ -11,6 +11,7 @@ #include #else #include +#include #include #include #include diff --git a/aten/src/ATen/native/cuda/UnaryLogKernels.cu b/aten/src/ATen/native/cuda/UnaryLogKernels.cu index 47f88383de428a..0f9eb26aba2d16 100644 --- a/aten/src/ATen/native/cuda/UnaryLogKernels.cu +++ b/aten/src/ATen/native/cuda/UnaryLogKernels.cu @@ -4,26 +4,70 @@ #include #include #include +#include +#include #include #include #include namespace at { namespace native { +const char log_name[] = "log_kernel"; void log_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "log_cuda", [&]() { - gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return ::log(a); + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { +#if AT_USE_JITERATOR() + static const auto log_string = jiterator_stringify( + template T log_kernel(T x) { return std::log(x); }); + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "log_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/log_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, log_string); }); - }); +#else + AT_DISPATCH_COMPLEX_TYPES(iter.common_dtype(), "log_cuda", [&]() { + gpu_kernel( + iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { return ::log(a); }); + }); +#endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "log_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return ::log(a); + }); + }); + } } +const char log10_name[] = "log10_kernel"; void log10_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "log10_cuda", [&]() { - gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return ::log10(a); + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { +#if AT_USE_JITERATOR() + static const auto log10_string = jiterator_stringify( + template T log10_kernel(T x) { return std::log10(x); }); + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "log10_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/log10_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, log10_string); }); - }); +#else + AT_DISPATCH_COMPLEX_TYPES(iter.common_dtype(), "log10_cuda", [&]() { + gpu_kernel( + iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { return ::log10(a); }); + }); +#endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "log10_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return ::log10(a); + }); + }); + } } void log1p_kernel_cuda(TensorIteratorBase& iter) { @@ -34,12 +78,33 @@ void log1p_kernel_cuda(TensorIteratorBase& iter) { }); } +const char log2_name[] = "log2_kernel"; void log2_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, 
iter.common_dtype(), "log2_cuda", [&]() { - gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return ::log2(a); + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { +#if AT_USE_JITERATOR() + static const auto log2_string = jiterator_stringify( + template T log2_kernel(T x) { return std::log2(x); }); + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "log2_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/log2_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, log2_string); }); - }); +#else + AT_DISPATCH_COMPLEX_TYPES(iter.common_dtype(), "log2_cuda", [&]() { + gpu_kernel( + iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { return ::log2(a); }); + }); +#endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "log2_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return ::log2(a); + }); + }); + } } REGISTER_DISPATCH(log_stub, &log_kernel_cuda); diff --git a/aten/src/ATen/native/cuda/UnaryOpsKernel.cu b/aten/src/ATen/native/cuda/UnaryOpsKernel.cu index 671ce1d6cbcdfc..303170690b423c 100644 --- a/aten/src/ATen/native/cuda/UnaryOpsKernel.cu +++ b/aten/src/ATen/native/cuda/UnaryOpsKernel.cu @@ -8,6 +8,8 @@ #include #include #include +#include +#include #include #include #include @@ -32,12 +34,37 @@ void bitwise_not_kernel_cuda(TensorIteratorBase& iter) { } } +const char exp_name[] = "exp_kernel"; void exp_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.common_dtype(), "exp_cuda", [&]() { - gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return std::exp(a); + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { + #if AT_USE_JITERATOR() + static const auto exp_string = jiterator_stringify( + template + T exp_kernel(T x) { + return std::exp(x); + }); // exp_string + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "exp_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/exp_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, exp_string); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "exp_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return std::exp(a); + }); + }); + #endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, common_dtype, "exp_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return std::exp(a); + }); }); - }); + } } void expm1_kernel_cuda(TensorIteratorBase& iter) { @@ -53,19 +80,45 @@ void expm1_kernel_cuda(TensorIteratorBase& iter) { // We manually overload rsqrt because std::rsqrt does not work with complex types. 
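The wrapper below expresses rsqrt for complex inputs as 1/sqrt(z), relying on the complex sqrt overloads from c10/util/complex_math.h. A self-contained illustration of the same identity, written against std::complex rather than c10::complex, could look like this (complex_rsqrt is an illustrative name):

#include <complex>

// There is no std::rsqrt overload for complex numbers, so express it via sqrt:
// rsqrt(z) = 1 / sqrt(z).
template <typename T>
std::complex<T> complex_rsqrt(const std::complex<T>& z) {
  return std::complex<T>(T(1), T(0)) / std::sqrt(z);
}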
template -__host__ __device__ static inline scalar_t rsqrt_wrapper(scalar_t v) { +C10_HOST_DEVICE static inline scalar_t rsqrt_wrapper(scalar_t v) { return ::rsqrt(v); } template -__host__ __device__ static inline c10::complex rsqrt_wrapper(c10::complex v) { +C10_HOST_DEVICE static inline c10::complex rsqrt_wrapper(c10::complex v) { const c10::complex one = c10::complex(1.0, 0); // std::sqrt for c10::complex is overloaded in c10/util/complex_math.h return one / ::sqrt(v); } +const char rsqrt_name[] = "rsqrt_kernel"; void rsqrt_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2( + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { + #if AT_USE_JITERATOR() + static const auto rsqrt_string = jiterator_stringify( + template + T rsqrt_kernel(T x) { + const T one = T{1}; + return one / std::sqrt(x); + }); // rsqrt_string + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "rsqrt_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/rsqrt_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, rsqrt_string); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "rsqrt_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + // In CUDA, ::rsqrt is overloaded for float and at::Half here is implicitly cast to float. + return rsqrt_wrapper(a); + }); + }); + #endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2( ScalarType::BFloat16, ScalarType::Half, iter.common_dtype(), "rsqrt_cuda", [&]() { @@ -74,14 +127,40 @@ void rsqrt_kernel_cuda(TensorIteratorBase& iter) { return rsqrt_wrapper(a); }); }); + } } +const char sqrt_name[] = "sqrt_kernel"; void sqrt_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "sqrt_cuda", [&]() { - gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return ::sqrt(a); + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { + #if AT_USE_JITERATOR() + static const auto sqrt_string = jiterator_stringify( + template + T sqrt_kernel(T x) { + return std::sqrt(x); + }); // sqrt_string + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "sqrt_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/sqrt_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, sqrt_string); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "sqrt_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return std::sqrt(a); + }); + }); + #endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, common_dtype, "sqrt_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return std::sqrt(a); + }); }); - }); + } } void clamp_kernel_cuda(TensorIteratorBase& iter, const Scalar& min_value, const Scalar& max_value) { diff --git a/aten/src/ATen/native/cuda/UnarySignKernels.cu b/aten/src/ATen/native/cuda/UnarySignKernels.cu index b88dc6597bdd3d..a41a59f4e95a07 100644 --- a/aten/src/ATen/native/cuda/UnarySignKernels.cu +++ b/aten/src/ATen/native/cuda/UnarySignKernels.cu @@ -1,6 +1,7 @@ #define TORCH_ASSERT_NO_OPERATORS #include #include +#include #include #include #include @@ -23,12 +24,38 @@ void logical_not_kernel_cuda(TensorIteratorBase& iter) { } // NB: Ignores the negative bit on tensors +const char neg_name[] = "neg_kernel"; void neg_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "neg_cuda", [&]() { + auto dtype 
= iter.dtype(); + if (at::isComplexType(dtype)) { +#if AT_USE_JITERATOR() + static const auto neg_string = jiterator_stringify( + template + T neg_kernel(T a) { + return -a; + } + ); // neg_string + AT_DISPATCH_COMPLEX_TYPES(dtype, "neg_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/ neg_name, + /*return_dtype=*/ scalar_t, + /*common_dtype=*/ scalar_t, + /*arity=*/ 1>(iter, neg_string); + }); +#else + AT_DISPATCH_COMPLEX_TYPES(dtype, "neg_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return -a; + }); + }); +#endif + } else { + AT_DISPATCH_ALL_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, dtype, "neg_cuda", [&]() { gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { return -a; }); }); + } } void sign_kernel_cuda(TensorIteratorBase& iter){ @@ -52,7 +79,7 @@ void signbit_kernel_cuda(TensorIteratorBase& iter){ } template -__host__ __device__ static inline c10::complex sgn_wrapper(c10::complex z) { +C10_HOST_DEVICE static inline c10::complex sgn_wrapper(c10::complex z) { if (z == c10::complex(0, 0)) { return c10::complex(0, 0); } else { @@ -60,13 +87,37 @@ __host__ __device__ static inline c10::complex sgn_wrapper(c10::complex z) } } +const char sgn_name[] = "sgn_kernel"; void sgn_kernel_cuda(TensorIteratorBase& iter){ - AT_DISPATCH_COMPLEX_TYPES(iter.dtype(), "sgn_cuda", [&]() { + auto dtype = iter.dtype(); + #if AT_USE_JITERATOR() + static const auto sgn_string = jiterator_stringify( + template + T sgn_kernel(T z) { + const T zero = T(0); + if (z == zero) { + return zero; + } else { + return z / std::abs(z); + } + } + ); // sgn_string + AT_DISPATCH_COMPLEX_TYPES(dtype, "sgn_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/ sgn_name, + /*return_dtype=*/ scalar_t, + /*common_dtype=*/ scalar_t, + /*arity=*/ 1>(iter, sgn_string); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(dtype, "sgn_cuda", [&]() { gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { return sgn_wrapper(a); }); }); + #endif } + REGISTER_DISPATCH(logical_not_stub, &logical_not_kernel_cuda); REGISTER_DISPATCH(neg_stub, &neg_kernel_cuda); REGISTER_DISPATCH(sign_stub, &sign_kernel_cuda); diff --git a/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu b/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu index 71a35534702252..84a45a9ec78151 100644 --- a/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu +++ b/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu @@ -63,7 +63,7 @@ void i0_kernel_cuda(TensorIteratorBase& iter) { } // See note [Jiterator] -const char i0e_name[] = "i0e"; +const char i0e_name[] = "calc_i0e"; void i0e_kernel_cuda(TensorIteratorBase& iter) { #if AT_USE_JITERATOR() AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.common_dtype(), "i0e_cuda", [&]() { @@ -120,12 +120,39 @@ void i1e_kernel_cuda(TensorIteratorBase& iter) { #endif } +const char sigmoid_name[] = "sigmoid"; void sigmoid_kernel_cuda(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.common_dtype(), "sigmoid_cuda", [&]() { - gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return static_cast(1) / (static_cast(1) + std::exp(-a)); + auto common_dtype = iter.common_dtype(); + if (at::isComplexType(common_dtype)) { + // only jiterate for complex-dtype + #if AT_USE_JITERATOR() + static const auto sigmoid_string = jiterator_stringify( + template + T sigmoid(T x) { + return T{1} / (T{1} + std::exp(-x)); + } + ); // sigmoid_string + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "sigmoid_cuda", [&]() { + jitted_gpu_kernel< 
+ /*name=*/sigmoid_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>(iter, sigmoid_string); + }); + #else + AT_DISPATCH_COMPLEX_TYPES(common_dtype, "sigmoid_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return scalar_t{1} / (scalar_t{1} + std::exp(-a)); + }); + }); + #endif + } else { + AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, common_dtype, "sigmoid_cuda", [&]() { + gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { + return scalar_t{1} / (scalar_t{1} + std::exp(-a)); + }); }); - }); + } } const char sinc_name[] = "sinc"; @@ -202,6 +229,23 @@ void ndtri_kernel_cuda(TensorIteratorBase& iter) { #endif } +const char log_ndtr_name[] = "log_ndtr"; +void log_ndtr_kernel_cuda(TensorIteratorBase& iter) { + #if AT_USE_JITERATOR() + AT_DISPATCH_FLOATING_TYPES(iter.common_dtype(), "log_ndtr_cuda", [&]() { + jitted_gpu_kernel(iter, log_ndtr_string); + }); + #else + AT_DISPATCH_FLOATING_TYPES(iter.common_dtype(), "log_ndtr_cuda", [&]() { + gpu_kernel( + iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { return calc_log_ndtr(a); }); + }); + #endif +} + void erf_kernel_cuda(TensorIteratorBase& iter) { AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.common_dtype(), "erf_cuda", [&]() { gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { @@ -264,18 +308,38 @@ void erfcx_kernel_cuda(TensorIteratorBase& iter) { #endif } +const char kaiser_window_name[] = "kaiser_window"; void kaiser_window_kernel_cuda(TensorIteratorBase& iter, int64_t window_length, double beta_){ - AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.dtype(), "kaiser_window_cuda", [&](){ - using opmath_t = at::opmath_type; - const opmath_t inv_alpha = static_cast(2.0 / (window_length - 1)); - const opmath_t beta = static_cast(beta_); - const opmath_t inv_i0_beta = 1.0 / calc_i0(beta); - gpu_kernel(iter, [=]GPU_LAMBDA(scalar_t a) -> scalar_t { - opmath_t x = static_cast(a) * inv_alpha - 1; - opmath_t y = std::max(0, 1 - x * x); - return calc_i0(beta * ::sqrt(y)) * inv_i0_beta; + #if AT_USE_JITERATOR() + AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.dtype(), "kaiser_window_cuda", [&](){ + using opmath_t = at::opmath_type; + const opmath_t inv_alpha = static_cast(2.0 / (window_length - 1)); + const opmath_t beta = static_cast(beta_); + const opmath_t inv_i0_beta = 1.0 / calc_i0(beta); + jitted_gpu_kernel< + /*name=*/kaiser_window_name, + /*return_dtype=*/scalar_t, + /*common_dtype=*/scalar_t, + /*arity=*/1>( + iter, + kaiser_window_string, + /*scalar_pos=*/at::cuda::jit::BinaryFuncVariant::NoScalar, + /*scalar_val=*/0, + /*extra_args=*/std::make_tuple(inv_alpha, beta, inv_i0_beta)); }); - }); + #else + AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.dtype(), "kaiser_window_cuda", [&](){ + using opmath_t = at::opmath_type; + const opmath_t inv_alpha = static_cast(2.0 / (window_length - 1)); + const opmath_t beta = static_cast(beta_); + const opmath_t inv_i0_beta = 1.0 / calc_i0(beta); + gpu_kernel(iter, [=]GPU_LAMBDA(scalar_t a) -> scalar_t { + opmath_t x = static_cast(a) * inv_alpha - 1; + opmath_t y = std::max(0, 1 - x * x); + return calc_i0(beta * ::sqrt(y)) * inv_i0_beta; + }); + }); + #endif } const char entr_name[] = "entr"; @@ -322,6 +386,7 @@ REGISTER_DISPATCH(erfinv_stub, &erfinv_kernel_cuda); REGISTER_DISPATCH(kaiser_window_stub, &kaiser_window_kernel_cuda); REGISTER_DISPATCH(special_entr_stub, &entr_kernel_cuda); 
REGISTER_DISPATCH(special_ndtri_stub, &ndtri_kernel_cuda); +REGISTER_DISPATCH(special_log_ndtr_stub, &log_ndtr_kernel_cuda); REGISTER_DISPATCH(special_erfcx_stub, &erfcx_kernel_cuda); } // namespace native diff --git a/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu b/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu index 8b43900e92716c..90f5238d0180da 100644 --- a/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu +++ b/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include diff --git a/aten/src/ATen/native/cuda/Unique.cu b/aten/src/ATen/native/cuda/Unique.cu index d268ca1c490389..e25acb8e06efa0 100644 --- a/aten/src/ATen/native/cuda/Unique.cu +++ b/aten/src/ATen/native/cuda/Unique.cu @@ -1,8 +1,22 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/cuda/UniqueCub.cu b/aten/src/ATen/native/cuda/UniqueCub.cu index bda84bdda4e12d..cc19b96a779714 100644 --- a/aten/src/ATen/native/cuda/UniqueCub.cu +++ b/aten/src/ATen/native/cuda/UniqueCub.cu @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include @@ -5,6 +6,13 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace at { namespace native { namespace internal { diff --git a/aten/src/ATen/native/cuda/UniqueCub.cuh b/aten/src/ATen/native/cuda/UniqueCub.cuh index 1bb96e3f5ebdf9..6e1cccc2e175cb 100644 --- a/aten/src/ATen/native/cuda/UniqueCub.cuh +++ b/aten/src/ATen/native/cuda/UniqueCub.cuh @@ -1,4 +1,4 @@ -#include +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/UpSample.cuh b/aten/src/ATen/native/cuda/UpSample.cuh index f4d85512ba7242..09e460640df8de 100644 --- a/aten/src/ATen/native/cuda/UpSample.cuh +++ b/aten/src/ATen/native/cuda/UpSample.cuh @@ -1,9 +1,11 @@ +#pragma once #include #include #include #include #include +#include #include @@ -14,7 +16,7 @@ namespace upsample { // TODO: Remove duplicate declaration. TORCH_API c10::SmallVector compute_output_size( c10::IntArrayRef input_size, // Full input tensor size. 
- c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors); } // namespace upsample diff --git a/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu b/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu index 29dec1735f2383..1214955b06d441 100644 --- a/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu +++ b/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu @@ -1,12 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu b/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu index 09ec9528ead149..d76e2783207f19 100644 --- a/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu +++ b/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu @@ -1,9 +1,10 @@ // Adapted from interp.cpp from Caffe util by Pauline Luc // Originally developed by George Papandreou -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include +#include #include #include #include @@ -12,6 +13,20 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/UpSampleLinear1d.cu b/aten/src/ATen/native/cuda/UpSampleLinear1d.cu index c23887cb79a6b7..af9edca2280e6f 100644 --- a/aten/src/ATen/native/cuda/UpSampleLinear1d.cu +++ b/aten/src/ATen/native/cuda/UpSampleLinear1d.cu @@ -1,15 +1,24 @@ // Adapted from interp.cpp from Caffe util by Pauline Luc // Originally developed by George Papandreou -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include +#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/UpSampleNearest1d.cu b/aten/src/ATen/native/cuda/UpSampleNearest1d.cu index 52b7b1d70947b1..decdfca30d7838 100644 --- a/aten/src/ATen/native/cuda/UpSampleNearest1d.cu +++ b/aten/src/ATen/native/cuda/UpSampleNearest1d.cu @@ -1,12 +1,23 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/UpSampleNearest2d.cu b/aten/src/ATen/native/cuda/UpSampleNearest2d.cu index 7b2a58c764bb46..8aa4f68aeda64c 100644 --- a/aten/src/ATen/native/cuda/UpSampleNearest2d.cu +++ b/aten/src/ATen/native/cuda/UpSampleNearest2d.cu @@ -1,7 +1,8 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include +#include #include #include #include @@ -10,6 +11,17 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/UpSampleNearest3d.cu b/aten/src/ATen/native/cuda/UpSampleNearest3d.cu index 3b12614c10d5e4..1a4afa012d780e 100644 --- a/aten/src/ATen/native/cuda/UpSampleNearest3d.cu +++ b/aten/src/ATen/native/cuda/UpSampleNearest3d.cu @@ -1,11 +1,28 @@ -#include 
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#include #include #include -#include +#include #include #include #include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif namespace at { namespace native { @@ -322,7 +339,7 @@ using at::native::upsample_cuda::get_scale_value; Tensor upsample_nearest3d_cuda( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_d = get_scale_value(scale_factors, 0); @@ -333,7 +350,7 @@ Tensor upsample_nearest3d_cuda( Tensor _upsample_nearest_exact3d_cuda( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_d = get_scale_value(scale_factors, 0); @@ -345,7 +362,7 @@ Tensor _upsample_nearest_exact3d_cuda( // when structured kernels can handle QuantizedCPU, update these overloads to be CompositeExplicitAutograd Tensor upsample_nearest3d_backward_cuda( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); @@ -357,7 +374,7 @@ Tensor upsample_nearest3d_backward_cuda( Tensor _upsample_nearest_exact3d_backward_cuda( const Tensor& grad_output, - c10::optional output_size, + at::OptionalIntArrayRef output_size, IntArrayRef input_size, c10::optional> scale_factors) { auto osize = compute_output_size(input_size, output_size, scale_factors); diff --git a/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu b/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu index a3623d2eb0f8b2..b19bf4858ac629 100644 --- a/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu +++ b/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu @@ -1,9 +1,10 @@ // Adapted from interp.cpp from Caffe util by Pauline Luc // Originally developed by George Papandreou -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include +#include #include #include #include @@ -12,6 +13,14 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/WeightNorm.cu b/aten/src/ATen/native/cuda/WeightNorm.cu index e9136ca61388bd..c451bc55349a8e 100644 --- a/aten/src/ATen/native/cuda/WeightNorm.cu +++ b/aten/src/ATen/native/cuda/WeightNorm.cu @@ -1,11 +1,24 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cuda/group_norm_kernel.cu b/aten/src/ATen/native/cuda/group_norm_kernel.cu index f05f6e390edab5..53ce77fa37b113 100644 --- a/aten/src/ATen/native/cuda/group_norm_kernel.cu +++ b/aten/src/ATen/native/cuda/group_norm_kernel.cu @@ -1,13 +1,13 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include -#include +#include #include #include -#include #include #include #include @@ -15,6 +15,12 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + 
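Most files touched by this patch receive the same include treatment shown above: TORCH_ASSERT_ONLY_METHOD_OPERATORS is defined at the top of the translation unit and operator declarations are pulled in through an AT_PER_OPERATOR_HEADERS guard instead of the umbrella ATen header. The usual shape of that change, with the per-op header names standing in for whatever operators the file actually calls, is roughly:

#define TORCH_ASSERT_ONLY_METHOD_OPERATORS  // forbid at::op(...) free functions in this TU
#include <ATen/core/Tensor.h>               // Tensor type only, not the full ATen umbrella

#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Functions.h>          // monolithic build: all operator declarations
#include <ATen/NativeFunctions.h>
#else
#include <ATen/ops/empty.h>          // per-operator build: only what this file uses
#include <ATen/ops/zeros.h>          // (placeholder examples)
#endif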
namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/jit_utils.cpp b/aten/src/ATen/native/cuda/jit_utils.cpp index c8010a6e9b0afa..e7798d69fafb0d 100644 --- a/aten/src/ATen/native/cuda/jit_utils.cpp +++ b/aten/src/ATen/native/cuda/jit_utils.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_NO_OPERATORS #include #include #include @@ -10,6 +11,7 @@ #include #include #include +#include #include #include @@ -118,6 +120,11 @@ const std::string jit_common_types = R"ESCAPE( Array() = default; Array(const Array&) = default; Array& operator=(const Array&) = default; + __device__ Array(T x) { + for (int i = 0; i < size; i++) { + data[i] = x; + } + } }; ${half_string} @@ -322,10 +329,7 @@ const std::string no_dynamic_cast_support_literal = R"ESCAPE( )ESCAPE"; -const std::string jit_code_template = R"ESCAPE( - - ${dynamic_casting_string} - +const std::string offset_calc_template = R"ESCAPE( template struct DivMod { T div; @@ -409,6 +413,14 @@ const std::string jit_code_template = R"ESCAPE( ${index_type} strides_[25][NARGS]; }; + +)ESCAPE"; + +const std::string jit_code_template = R"ESCAPE( + + ${dynamic_casting_string} + + ${functor} // TODO: setup grid-stride loop @@ -769,7 +781,7 @@ std::string generate_code( << ">(out[j], data[0], output_offsets[0]);\n"; env.s("store_outputs", store_outputs.str()); - static auto cuda_template = at::jit::CodeTemplate(jit_common_types + jit_code_template); + static auto cuda_template = at::jit::CodeTemplate(jit_common_types + offset_calc_template + jit_code_template); const auto code = cuda_template.format(env); return code; } @@ -808,6 +820,126 @@ std::string generate_code( return code; } +// Creates directories recursively +bool _r_mkdir(const std::string& dir) { + // Check if current dir exists + const char* p_dir = dir.c_str(); + const bool dir_exists = (access(p_dir, F_OK) == 0); + if (dir_exists) { + return true; + } + + // Try to create current directory +#ifdef _WIN32 + int ret = _mkdir(dir.c_str()); +#else + int ret = mkdir(dir.c_str(), S_IRWXU | S_IRWXG | S_IRWXO); +#endif + // Success + if (ret == 0) { + return true; + } + + // Find folder separator and check if we are at the top + auto pos = dir.find_last_of("/\\"); + if (pos == std::string::npos) { + return false; + } + + // Try to create parent directory + if (!(_r_mkdir(dir.substr(0, pos)))) { + return false; + } + + // Try to create complete path again +#ifdef _WIN32 + ret = _mkdir(dir.c_str()); +#else + ret = mkdir(dir.c_str(), S_IRWXU | S_IRWXG | S_IRWXO); +#endif + return ret == 0; +} + +// Creates directories recursively assuming that base exists +bool r_mkdir_with_base(std::string& base, std::string& dir){ + const char* p_base = base.c_str(); + const bool base_exists = (access(p_base, F_OK) == 0); + if (!base_exists) { + return false; + } + + // remove trailing '/' or '\\' + if ((base[base.size()-1]=='/') || base[base.size()-1]=='\\') { + base.pop_back(); + } + if ((dir[dir.size()-1]=='/') || dir[dir.size()-1]=='\\') { + dir.pop_back(); + } + + return _r_mkdir(base+dir); + +} + +std::string load_code_template(const std::string& path) { + std::ifstream ifs{path}; + std::string s{ + std::istreambuf_iterator(ifs), + std::istreambuf_iterator()}; + return s; +} + +std::string generate_reduction_code( + int nOutputs, + const std::string& func, + const std::string& name, + const int vt0, + const std::string& f_inputs_type, + const std::string& reduction_accum_type, + const std::string& result_type, + bool contiguous, + bool vectorized, + int vec_size, + int max_threads_codegen) { + 
at::jit::TemplateEnv env; + env.s("index_type", "unsigned int"); + env.s("scalar_type", f_inputs_type); + env.s("result_type", result_type); + env.s("reduction_accum_type", reduction_accum_type); + env.s("vt0", std::to_string(vt0)); + env.s("name", name); + env.s("max_threads_lb", std::to_string(max_threads_codegen)); + // reductions don't support dynamic casting, so the only way to get nonstandard types + // is through input + if (f_inputs_type == "at::Half") { + env.s("half_string", jiterator_half_support_literal); + } else { + env.s("half_string", ""); + } + if (f_inputs_type == "at::BFloat16") { + env.s("bfloat16_string", jiterator_bfloat16_support_literal); + } else { + env.s("bfloat16_string", ""); + } + if (f_inputs_type == "std::complex" || + f_inputs_type == "std::complex" ) { + env.s("traits_string", get_traits_string()); + env.s("complex_body_string", get_complex_body_string()); + env.s("complex_math_string", get_complex_math_string()); + env.s("complex", std::to_string(1)); + } else { + env.s("traits_string", ""); + env.s("complex_body_string", ""); + env.s("complex_math_string", ""); + env.s("complex", std::to_string(0)); + } + env.s("cmath_string", get_cmath_string()); + env.s("functor", func); + env.s("output_vec_size", std::to_string(vec_size)); + static auto cuda_template = at::jit::CodeTemplate( + jit_common_types + offset_calc_template + get_reduction_template()); + const auto code = cuda_template.format(env); + return code; +} // Acquires (possibly creating) the kernel cache directory c10::optional get_cache_dir() { @@ -822,6 +954,8 @@ c10::optional get_cache_dir() { // Cache path comes from PYTORCH_KERNEL_CACHE_PATH, then TEMP (Windows) or XDG_CACHE_HOME (Linux), then HOME environment variables std::string cache_dir; char* ptkcp = std::getenv("PYTORCH_KERNEL_CACHE_PATH"); + // Create kernel_cache_dir if needed as we do not want to create the base directory passed by the user + std::string kernels_cache_dir = ""; if (ptkcp != nullptr) { cache_dir = std::string(ptkcp); } else { @@ -832,7 +966,8 @@ c10::optional get_cache_dir() { ptkcp = std::getenv("XDG_CACHE_HOME"); #endif if (ptkcp != nullptr) { - cache_dir = std::string(ptkcp) + "/torch/kernels"; + kernels_cache_dir = "/torch/kernels"; + cache_dir = std::string(ptkcp) + kernels_cache_dir; } else { // Falls back to HOME/.cache ptkcp = std::getenv("HOME"); @@ -841,7 +976,8 @@ c10::optional get_cache_dir() { " This disables kernel caching."); return {}; } else { - cache_dir = std::string(ptkcp) + "/.cache/torch/kernels"; + kernels_cache_dir = "/.cache/torch/kernels"; + cache_dir = std::string(ptkcp) + kernels_cache_dir; } } } @@ -850,11 +986,8 @@ c10::optional get_cache_dir() { const char* p_cache_dir = cache_dir.c_str(); const bool cache_dir_exists = (access(p_cache_dir, F_OK) == 0); if (!cache_dir_exists) { -#ifdef _WIN32 - if (_mkdir(p_cache_dir) != 0) { -#else - if (mkdir(p_cache_dir, S_IRWXU | S_IRWXG | S_IRWXO) != 0) { -#endif + std::string s_ptkcp = std::string(ptkcp); + if (!r_mkdir_with_base(s_ptkcp, kernels_cache_dir)) { TORCH_WARN_ONCE("Specified kernel cache directory could not be created! 
This disables kernel caching.", " Specified directory is ", cache_dir, ".", " This warning will appear only once per process."); @@ -886,9 +1019,7 @@ c10::optional get_cache_dir() { NvrtcFunction jit_pwise_function( const std::string& code, const std::string& kernel_name) { - initializeCudaContext(); - // Acquires CUDA and nvrtc versions and whether we're compiling to ptx or SASS const cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties(); int cuda_major = 0, cuda_minor = 0, nvrtc_major = 0, nvrtc_minor = 0; @@ -983,7 +1114,7 @@ NvrtcFunction jit_pwise_function( AT_CUDA_NVRTC_CHECK(nvrtc.nvrtcGetProgramLog(program, log.data())); std::stringstream cu; cu << log.data(); - throw std::runtime_error(cu.str() + code); + throw std::runtime_error(code + cu.str()); } size_t ptx_size = 0; @@ -1049,24 +1180,26 @@ NvrtcFunction jit_pwise_function( void launch_jitted_pwise_function( NvrtcFunction function, void* args[], - const int nBlocks, - const int kBlockSize) { + const dim3 nBlocks, + const dim3 kBlockSize, + const int smem) { initializeCudaContext(); const auto& nvrtc = at::globalContext().getNVRTC(); // Launches kernel on current stream auto stream = at::cuda::getCurrentCUDAStream(); AT_CUDA_DRIVER_CHECK(nvrtc.cuLaunchKernel( function.function, - nBlocks, - 1, - 1, - kBlockSize, - 1, - 1, - 0, + nBlocks.x, + nBlocks.y, + nBlocks.z, + kBlockSize.x, + kBlockSize.y, + kBlockSize.z, + smem, stream, args, nullptr)); } + }}} // at::cuda::jit diff --git a/aten/src/ATen/native/cuda/jit_utils.h b/aten/src/ATen/native/cuda/jit_utils.h index 1f0f9c491b17a8..1ff6de701fc34c 100644 --- a/aten/src/ATen/native/cuda/jit_utils.h +++ b/aten/src/ATen/native/cuda/jit_utils.h @@ -32,6 +32,19 @@ std::string generate_code( bool vectorized=false, int vec_size=0); +std::string generate_reduction_code( + int nOutputs, + const std::string& func, + const std::string& name, + const int vt0, + const std::string& f_inputs_type, + const std::string& reduction_accum_type, + const std::string& result_type, + bool contiguous, + bool vectorized, + int vec_size, + int max_threads_codegen); + NvrtcFunction jit_pwise_function( const std::string& code, const std::string& kernel_name); @@ -39,8 +52,9 @@ NvrtcFunction jit_pwise_function( void launch_jitted_pwise_function( NvrtcFunction function, void* args[], - const int nBlocks, - const int kBlockSize); + const dim3 nBlocks, + const dim3 kBlockSize, + const int smem=0); template struct delayed_false : std::false_type { diff --git a/aten/src/ATen/native/cuda/layer_norm_kernel.cu b/aten/src/ATen/native/cuda/layer_norm_kernel.cu index 9fc2d02067092f..faa0fd2d4b9811 100644 --- a/aten/src/ATen/native/cuda/layer_norm_kernel.cu +++ b/aten/src/ATen/native/cuda/layer_norm_kernel.cu @@ -1,18 +1,29 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include -#include +#include #include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include namespace at { @@ -934,6 +945,7 @@ std::tuple layer_norm_backward_cuda( return std::make_tuple(std::move(dX), std::move(dgamma), std::move(dbeta)); } +REGISTER_DISPATCH(LayerNormKernel, &LayerNormKernelImpl); } // namespace native } // namespace at diff --git a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp index 1099ba88cb4897..de4f222b362604 100644 --- a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp +++ 
b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp @@ -2882,19 +2882,27 @@ static void apply_lu_solve_looped_magma(const Tensor& b, const Tensor& lu, const auto pivots_data = pivots_cpu.data_ptr(); auto b_stride = matrixStride(b); - auto lu_stride = matrixStride(lu); - auto pivots_stride = pivots_cpu.size(-1); + auto lu_stride = lu.dim() > 2 ? lu.stride(-3) : 0; + auto pivots_stride = pivots_cpu.dim() > 1 ? pivots_cpu.stride(-2) : 0; auto batch_size = batchCount(b); magma_int_t n = magma_int_cast(lu.size(-2), "n"); magma_int_t nrhs = magma_int_cast(b.size(-1), "nrhs"); auto leading_dimension = std::max(1, n); + // lu and pivots tensors can be broadcast to b + // here we construct a helper indexing tensor to linearly index into lu and pivots + IntArrayRef lu_batch_shape(lu.sizes().data(), lu.dim() - 2); + IntArrayRef b_batch_shape(b.sizes().data(), b.dim() - 2); + BroadcastLinearIndices lu_index( + batchCount(lu), lu_batch_shape, b_batch_shape); + int info = 0; for (decltype(batch_size) i = 0; i < batch_size; i++) { + int64_t lu_index_i = lu_index(i); scalar_t* b_working_ptr = &b_data[i * b_stride]; - scalar_t* lu_working_ptr = &lu_data[i * lu_stride]; - int* pivots_working_ptr = &pivots_data[i * pivots_stride]; + scalar_t* lu_working_ptr = &lu_data[lu_index_i * lu_stride]; + int* pivots_working_ptr = &pivots_data[lu_index_i * pivots_stride]; magmaLuSolve(n, nrhs, lu_working_ptr, leading_dimension, pivots_working_ptr, b_working_ptr, leading_dimension, &info, trans); @@ -2927,6 +2935,8 @@ static void apply_lu_solve_batched_magma(const Tensor& b, const Tensor& lu, cons "Calling torch.lu_solve on a CUDA tensor requires compiling ", "PyTorch with MAGMA. Please rebuild with MAGMA."); #else + TORCH_INTERNAL_ASSERT(batchCount(b) == batchCount(lu), "batch_size of b and lu must be the same"); + TORCH_INTERNAL_ASSERT(batchCount(lu) == batchCount(pivots.unsqueeze(-1)), "batch_size of lu and pivots must be the same"); auto trans = to_magma(transpose); auto b_data = b.data_ptr(); auto lu_data = lu.data_ptr(); @@ -2993,9 +3003,36 @@ static void lu_solve_looped_magma(const Tensor& b, const Tensor& lu, const Tenso }); } +namespace { + +c10::MaybeOwned maybe_expand_lu(const Tensor& b, const Tensor& lu) { + if (batchCount(b) != batchCount(lu)) { + IntArrayRef b_batch_size(b.sizes().data(), b.dim() - 2); + DimVector expand_size(b_batch_size); + expand_size.insert(expand_size.end(), {lu.size(-2), lu.size(-1)}); + return c10::MaybeOwned::owned( + cloneBatchedColumnMajor(lu.expand(expand_size))); + } else { + return c10::MaybeOwned::borrowed(lu); + } +} + +c10::MaybeOwned maybe_expand_pivots(const Tensor& b,const Tensor& pivots) { + if (batchCount(b) != batchCount(pivots.unsqueeze(-1))) { + IntArrayRef b_batch_size(b.sizes().data(), b.dim() - 2); + DimVector expand_size(b_batch_size); + expand_size.insert(expand_size.end(), {pivots.size(-1)}); + return c10::MaybeOwned::owned( + pivots.expand(expand_size).clone(at::MemoryFormat::Contiguous)); + } else { + return c10::MaybeOwned::borrowed(pivots); + } +} + +} // anonymous namespace static void lu_solve_trans_dispatch(const Tensor& b, const Tensor& lu, const Tensor& pivots, TransposeType trans) { - auto batch_size = batchCount(lu); + auto batch_size = batchCount(b); auto m = lu.size(-2); auto b2 = b.size(-1); bool over_magma_dim_limit = b2 > 1024; // magma implementation of LU solve cannot handle a b tensor with last dim > 1024 (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for) @@ -3011,11 +3048,15 @@ static void 
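// [Illustrative sketch, not part of the diff] the BroadcastLinearIndices helper
// used above maps a linear batch index over b onto the corresponding linear
// batch index over lu, collapsing the dimensions in which lu was broadcast.
// A standalone version of that mapping (the helper name is hypothetical and it
// assumes lu's batch shape has already been padded to b's rank):
int64_t broadcast_linear_index(
    int64_t i, c10::IntArrayRef b_batch, c10::IntArrayRef lu_batch) {
  int64_t out = 0;
  int64_t stride = 1;
  for (int64_t d = static_cast<int64_t>(b_batch.size()) - 1; d >= 0; --d) {
    const int64_t coord = i % b_batch[d];
    i /= b_batch[d];
    out += (lu_batch[d] == 1 ? 0 : coord) * stride;  // broadcast dims always read index 0
    stride *= lu_batch[d];
  }
  return out;
}
// e.g. b batch shape {2, 3} and lu batch shape {1, 3}: batch indices 0..5 of b
// read lu batches 0, 1, 2, 0, 1, 2.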
lu_solve_trans_dispatch(const Tensor& b, const Tensor& lu, const Ten #endif // ifdef USE_CUSOLVER #ifdef CUDART_VERSION else if ((batch_size > 2 && m <= 128) || (batch_size > 8 && over_magma_dim_limit)) { - lu_solve_batched_cublas(b, lu, pivots, trans); + c10::MaybeOwned lu_ = maybe_expand_lu(b, lu); + c10::MaybeOwned pivots_ = maybe_expand_pivots(b, pivots); + lu_solve_batched_cublas(b, *lu_, *pivots_, trans); } #endif // ifdef CUDART_VERSION else { - lu_solve_batched_magma(b, lu, pivots, trans); + c10::MaybeOwned lu_ = maybe_expand_lu(b, lu); + c10::MaybeOwned pivots_ = maybe_expand_pivots(b, pivots); + lu_solve_batched_magma(b, *lu_, *pivots_, trans); } } @@ -3190,27 +3231,20 @@ void lstsq_kernel(const Tensor& a, Tensor& b, Tensor& /*rank*/, Tensor& /*singul "Please rebuild with cuSOLVER."); #endif } else { // m >= n -#if !AT_MAGMA_ENABLED() - // MAGMA is not available we can either use cuBLAS or cuSOLVER here +#if !AT_ROCM_ENABLED() + // On CUDA platform we use either cuBLAS or cuSOLVER here // the batched vs looped dispatch is implemented based on the following performance results // https://github.com/pytorch/pytorch/pull/54725#issuecomment-832234456 if (m <= 256 && batchCount(b) >= std::max(2, m / 16)) { - // if CUDART_VERSION is defined then cuBLAS is available - #ifdef CUDART_VERSION gels_batched_cublas(a, b, infos); - #else - // this would either call cuSOLVER or MAGMA, - // if MAGMA is called a runtime error is thrown about not finding MAGMA in compilation - gels_looped(a, b, infos); - #endif // CUDART_VERSION } else { gels_looped(a, b, infos); } #else - // if both MAGMA and cuSOLVER are available this would call cuSOLVER - // MAGMA is called if cuSOLVER is not available - gels_looped(a, b, infos); -#endif // AT_MAGMA_ENABLED() + // On ROCm platform we can only use MAGMA here + // If MAGMA is not available, an error will be thrown + gels_magma(a, b, infos); +#endif // !AT_ROCM_ENABLED() } } diff --git a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp index 279c289e9e54ba..5b582a2fd2fb16 100644 --- a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp +++ b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp @@ -96,6 +96,8 @@ static void apply_lu_solve_batched_cublas(const Tensor& b, const Tensor& lu, con #ifndef CUDART_VERSION TORCH_CHECK(false, "lu_solve: cuBLAS backend for lu_solve is not available.") #else + TORCH_INTERNAL_ASSERT(batchCount(b) == batchCount(lu), "batch_size of b and lu must be the same"); + TORCH_INTERNAL_ASSERT(batchCount(lu) == batchCount(pivots.unsqueeze(-1)), "batch_size of lu and pivots must be the same"); const auto trans = to_cublas(transpose); auto pivots_data = pivots.data_ptr(); @@ -1446,26 +1448,34 @@ void lu_solve_looped_cusolver(const Tensor& b, const Tensor& lu, const Tensor& p const auto trans = to_cublas(transpose); int n = cuda_int_cast(lu.size(-2), "n"); int nrhs = cuda_int_cast(b.size(-1), "nrhs"); - auto batch_size = batchCount(lu); + auto batch_size = batchCount(b); auto info = at::zeros({1}, lu.options().dtype(kInt)); auto info_data = info.data_ptr(); auto b_data = b.data_ptr(); auto lu_data = lu.data_ptr(); auto pivots_data = pivots.data_ptr(); - auto pivots_stride = pivots.size(-1); - auto lu_stride = matrixStride(lu); + auto pivots_stride = pivots.dim() > 1 ? pivots.stride(-2) : 0; + auto lu_stride = lu.dim() > 2 ? 
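// [Illustrative note, not part of the diff] maybe_expand_lu/maybe_expand_pivots
// return c10::MaybeOwned<Tensor> so the common non-broadcast case stays
// copy-free: borrowed() simply aliases the argument, owned() holds the
// materialized broadcast copy, and the caller dereferences both the same way.
void dispatch_example(const Tensor& b, const Tensor& lu, const Tensor& pivots,
                      TransposeType trans) {
  c10::MaybeOwned<Tensor> lu_ = maybe_expand_lu(b, lu);
  c10::MaybeOwned<Tensor> pivots_ = maybe_expand_pivots(b, pivots);
  lu_solve_batched_magma(b, *lu_, *pivots_, trans);  // batch counts now match b
}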
lu.stride(-3) : 0; auto b_stride = matrixStride(b); int leading_dimension = cuda_int_cast(std::max(1, n), "leading_dimension"); + // lu and pivots tensors can be broadcast to b + // here we construct a helper indexing tensor to linearly index into lu and pivots + IntArrayRef lu_batch_shape(lu.sizes().data(), lu.dim() - 2); + IntArrayRef b_batch_shape(b.sizes().data(), b.dim() - 2); + BroadcastLinearIndices lu_index( + batchCount(lu), lu_batch_shape, b_batch_shape); + auto handle = at::cuda::getCurrentCUDASolverDnHandle(); for (auto batch = decltype(batch_size){0}; batch < batch_size; ++batch) { + int64_t lu_index_i = lu_index(batch); at::cuda::solver::getrs( handle, n, nrhs, - lu_data + batch * lu_stride, + lu_data + lu_index_i * lu_stride, leading_dimension, - pivots_data + batch * pivots_stride, + pivots_data + lu_index_i * pivots_stride, b_data + batch * b_stride, leading_dimension, info_data, diff --git a/aten/src/ATen/native/cuda/reduction_template.cuh b/aten/src/ATen/native/cuda/reduction_template.cuh new file mode 100644 index 00000000000000..4d9d559d8ec8a6 --- /dev/null +++ b/aten/src/ATen/native/cuda/reduction_template.cuh @@ -0,0 +1,664 @@ +namespace at { +namespace cuda { +//windows doesn't like large string literals, so split in two +const std::string reduction_template_0 = R"ESCAPE( + #define C10_HOST_DEVICE __host__ __device__ + #define C10_DEVICE __device__ + + template + __device__ __forceinline__ T WARP_SHFL_DOWN(T value, unsigned int delta, int width = warpSize, unsigned int mask = 0xffffffff) + { + return __shfl_down_sync(mask, value, delta, width); + } + + + #if ${complex} + template + __device__ __forceinline__ std::complex WARP_SHFL_DOWN(std::complex value, unsigned int delta, int width = warpSize, unsigned int mask = 0xffffffff) + { + return std::complex( + __shfl_down_sync(mask, value.real(), delta, width), + __shfl_down_sync(mask, value.imag(), delta, width)); + } + #endif + + // aligned vector generates vectorized load/store on CUDA + template + struct alignas(sizeof(scalar_t) * vec_size) aligned_vector { + scalar_t val[vec_size]; + }; + + + C10_HOST_DEVICE static void reduce_fraction(size_t &numerator, size_t &denominator) { + // get GCD of num and denom using Euclid's algorithm. + // Can replace this with std::gcd if we ever support c++17. 
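// [Illustrative worked example, not part of the diff] ReduceJitOp::run() calls
// reduce_fraction with numerator = sizeof(arg_t) and denominator =
// sizeof(out_scalar_t) to turn a byte offset into dst into the matching offset
// into acc_buf. For example, with arg_t = double (8 bytes) and
// out_scalar_t = float (4 bytes):
//   a = 4, b = 8  ->  a %= b gives 4, swap -> a = 8, b = 4
//   a %= b gives 0, swap -> a = 4, b = 0        (gcd == 4)
//   numerator   = 8 / 4 = 2
//   denominator = 4 / 4 = 1
// so an output byte offset is scaled by 2/1 when indexing the accumulation buffer.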
+ size_t a = denominator; + size_t b = numerator; + while (b != 0) { + a %= b; + // swap(a,b) + size_t tmp = a; + a = b; + b = tmp; + } + + // a is now the GCD + numerator /= a; + denominator /= a; + } + + + + + struct ReduceConfig { + //has to match host-side ReduceConfig in the eager code + static constexpr int BLOCK_X = 0; + static constexpr int BLOCK_Y = 1; + static constexpr int CTA = 2; + + static constexpr int input_vec_size = 4; + int element_size_bytes; + int num_inputs; + int num_outputs; + int step_input = 1; + int step_output = 1; + int ctas_per_output = 1; + int input_mult[3] = {0, 0, 0}; + int output_mult[2] = {0, 0}; + + int block_width; + int block_height; + int num_threads; + + bool vectorize_input = false; + int output_vec_size = 1; + + C10_HOST_DEVICE bool should_block_x_reduce() const { + return input_mult[BLOCK_X] != 0; + } + + C10_HOST_DEVICE bool should_block_y_reduce() const { + return input_mult[BLOCK_Y] != 0; + } + + C10_HOST_DEVICE bool should_global_reduce() const { + return input_mult[CTA] != 0; + } + + C10_DEVICE bool should_store(int output_idx) const { + return output_idx < num_outputs && + (!should_block_x_reduce() || threadIdx.x == 0) && + (!should_block_y_reduce() || threadIdx.y == 0); + } + + C10_DEVICE bool should_reduce_tail() const { + return (!should_block_y_reduce() || threadIdx.y == 0) && + (!should_global_reduce() || blockIdx.y == 0); + } + + C10_HOST_DEVICE int input_idx() const { + int lane = threadIdx.x; + int warp = threadIdx.y; + int cta2 = blockIdx.y; + return (lane * input_mult[BLOCK_X] + + warp * input_mult[BLOCK_Y] + + cta2 * input_mult[CTA]); + } + + template + C10_HOST_DEVICE int output_idx() const { + int lane = threadIdx.x; + int warp = threadIdx.y; + int cta1 = blockIdx.x; + return (lane * output_mult[BLOCK_X] + + warp * output_mult[BLOCK_Y] + + cta1 * step_output) * output_vec_size; + } + + C10_DEVICE int shared_memory_offset(int offset) const { + return threadIdx.x + (threadIdx.y + offset) * blockDim.x; + } + + C10_DEVICE int staging_memory_offset(int cta2) const { + int offset = cta2 + blockIdx.x * gridDim.y; + if (!should_block_x_reduce()) { + offset = threadIdx.x + offset * blockDim.x; + } + return offset; + } + + + }; + + +//TODO this will need to be different for more generic reduction functions +namespace reducer { + + using scalar_t = ${scalar_type}; + using arg_t = ${reduction_accum_type}; + using out_scalar_t = ${result_type}; + + + inline __device__ ${functor} + + inline __device__ out_scalar_t project(arg_t arg) { + return (out_scalar_t) arg; + } + + inline __device__ arg_t warp_shfl_down(arg_t arg, int offset) { + return WARP_SHFL_DOWN(arg, offset); + } + + inline __device__ arg_t translate_idx(arg_t acc, int64_t /*idx*/) { + return acc; + } + + // wrap a normal reduction that ignores the index + inline __device__ arg_t reduce(arg_t acc, arg_t val, int64_t idx) { + return combine(acc, val); + } +} + + +struct ReduceJitOp { + using scalar_t = ${scalar_type}; + using arg_t = ${reduction_accum_type}; + using out_scalar_t = ${result_type}; + + using InputCalculator = OffsetCalculator<1>; + using OutputCalculator = OffsetCalculator<2>; + +// static constexpr bool can_accumulate_in_output = +// std::is_convertible::value +// && std::is_convertible::value; + + static constexpr int input_vec_size = ReduceConfig::input_vec_size; + + arg_t ident; + ReduceConfig config; + InputCalculator input_calc; + OutputCalculator output_calc; + const void* src; + const char* dst[2]; //it accepts at most two destinations + // acc_buf used for 
accumulation among sub Tensor Iterator when accumulation on + // output is not permissible + void* acc_buf; + // cta_buf used for accumulation between blocks during global reduction + void* cta_buf; + int* semaphores; + int64_t base_idx; + bool accumulate; + bool final_output; + int noutputs; + + + C10_DEVICE void run() const { + extern __shared__ char shared_memory[]; + uint32_t output_idx = config.output_idx<${output_vec_size}>(); + uint32_t input_idx = config.input_idx(); + auto base_offsets1 = output_calc.get(output_idx)[1]; + + using arg_vec_t = Array; + arg_vec_t value; + + if (output_idx < config.num_outputs && input_idx < config.num_inputs) { + const scalar_t* input_slice = (const scalar_t*)((const char*)src + base_offsets1); + + value = thread_reduce<${output_vec_size}>(input_slice); + } + + if (config.should_block_y_reduce()) { + value = block_y_reduce<${output_vec_size}>(value, shared_memory); + } + if (config.should_block_x_reduce()) { + value = block_x_reduce<${output_vec_size}>(value, shared_memory); + } + + using out_ptr_vec_t = Array; + using offset_vec_t = Array; + offset_vec_t base_offsets; + out_ptr_vec_t out; + + #pragma unroll + for (int i = 0; i < ${output_vec_size}; i++) { + base_offsets[i] = output_calc.get(output_idx + i)[0]; + out[i] = (out_scalar_t*)((char*)dst[0] + base_offsets[i]); + } + + arg_vec_t* acc = nullptr; + if (acc_buf != nullptr) { + size_t numerator = sizeof(arg_t); + size_t denominator = sizeof(out_scalar_t); + reduce_fraction(numerator, denominator); + acc = (arg_vec_t*)((char*)acc_buf + (base_offsets[0] * numerator / denominator)); + } + + if (config.should_global_reduce()) { + value = global_reduce<${output_vec_size}>(value, acc, shared_memory); + } else if (config.should_store(output_idx)) { + if (accumulate) { + #pragma unroll + for (int i = 0; i < ${output_vec_size}; i++) { + value[i] = reducer::translate_idx(value[i], base_idx); + } + } + + if (acc == nullptr) { + if (accumulate) { + value = accumulate_in_output<${output_vec_size}>(out, value); + } + if (final_output) { + set_results_to_output<${output_vec_size}>(value, base_offsets); + } else { + #pragma unroll + for (int i = 0; i < ${output_vec_size}; i++) { + *(out[i]) = get_accumulated_output(out[i], value[i]); + } + } + } else { + if (accumulate) { + #pragma unroll + for (int i = 0; i < ${output_vec_size}; i++) { + value[i] = reducer::combine((*acc)[i], value[i]); + } + } + if (final_output) { + set_results_to_output<${output_vec_size}>(value, base_offsets); + } else { + *acc = value; + } + } + } + } + + template + C10_DEVICE Array thread_reduce(const scalar_t* data) const { + if (config.vectorize_input) { + assert(output_vec_size == 1); + // reduce at the header of input_slice where memory is not aligned, + // so that thread_reduce will have an aligned memory to work on. 
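// [Illustrative worked example, not part of the diff] for scalar_t = float and
// input_vec_size = 4 the vector type is 16 bytes wide, so align_bytes = 16 and
// align_elements = 4. If the input slice starts 8 bytes past a 16-byte
// boundary, shift == 2 in input_vectorized_thread_reduce_impl below: the data
// pointer is moved back 2 elements onto the boundary, threads with
// threadIdx.x in [2, 4) reduce the 2 unaligned head elements, the main loop
// then streams aligned 4-element vectors, and any remaining (< 4) tail
// elements are picked up by the scalar tail loop at the end.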
+ return {input_vectorized_thread_reduce_impl(data)}; + } else { + uint32_t element_stride = input_calc.strides_[0][0] / sizeof(scalar_t); + bool is_contiguous = (input_calc.dims == 1 && element_stride == 1); + if (is_contiguous) { + return thread_reduce_impl(data, [](uint32_t idx) { return idx; }); + } else if (input_calc.dims == 1) { + return thread_reduce_impl(data, [&](uint32_t idx) { return idx * element_stride; }); + } else { + return thread_reduce_impl(data, [&](uint32_t idx) { return input_calc.get(idx)[0] / sizeof(scalar_t); }); + } + } + } + + C10_DEVICE arg_t input_vectorized_thread_reduce_impl(const scalar_t* data) const { + uint32_t end = config.num_inputs; + + // Handle the head of input slice where data is not aligned + arg_t value = ident; + constexpr int align_bytes = alignof(aligned_vector); + constexpr int align_elements = align_bytes / sizeof(scalar_t); + int shift = ((int64_t)data) % align_bytes / sizeof(scalar_t); + if (shift > 0) { + data -= shift; + end += shift; + if(threadIdx.x >= shift && threadIdx.x < align_elements && config.should_reduce_tail()){ + value = reducer::reduce(value, data[threadIdx.x], threadIdx.x - shift); + } + end -= align_elements; + data += align_elements; + shift = align_elements - shift; + } + + // Do the vectorized reduction + using load_t = aligned_vector; + + uint32_t idx = config.input_idx(); + const uint32_t stride = config.step_input; + + // Multiple accumulators to remove dependency between unrolled loops. + arg_t value_list[input_vec_size]; + value_list[0] = value; + + #pragma unroll + for (int i = 1; i < input_vec_size; i++) { + value_list[i] = ident; + } + + scalar_t values[input_vec_size]; + + load_t *values_vector = reinterpret_cast(&values[0]); + + while (idx * input_vec_size + input_vec_size - 1 < end) { + *values_vector = reinterpret_cast(data)[idx]; + #pragma unroll + for (uint32_t i = 0; i < input_vec_size; i++) { + value_list[i] = reducer::reduce(value_list[i], values[i], shift + idx * input_vec_size + i); + } + idx += stride; + } + + // tail + uint32_t tail_start = end - end % input_vec_size; + if (config.should_reduce_tail()) { + int idx = tail_start + threadIdx.x; + if (idx < end) { + value_list[0] = reducer::reduce(value_list[0], data[idx], idx + shift); + } + } + + // combine accumulators + #pragma unroll + for (int i = 1; i < input_vec_size; i++) { + value_list[0] = reducer::combine(value_list[0], value_list[i]); + } + return value_list[0]; + } + + template + C10_DEVICE Array thread_reduce_impl(const scalar_t* data_, offset_calc_t calc) const { + uint32_t idx = config.input_idx(); + const uint32_t end = config.num_inputs; + const uint32_t stride = config.step_input; + const int vt0=${vt0}; + + using arg_vec_t = Array; + using load_t = aligned_vector; + const load_t* data = reinterpret_cast(data_); + + // Multiple accumulators to remove dependency between unrolled loops. 
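// [Illustrative note, not part of the diff] keeping vt0 independent partial
// accumulators breaks the serial dependence of a single
//   acc = combine(acc, x)
// chain: the vt0 loads and combines issued per main-loop iteration do not have
// to wait on one another, and the partials are only merged once in the
// "combine accumulators" epilogue below. With vt0 = 4 and stride = step_input,
// accumulator i sees elements idx + i*stride, idx + i*stride + 4*stride, ...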
+ arg_vec_t value_list[vt0]; + + #pragma unroll + for (int i = 0; i < vt0; i++) { + #pragma unroll + for (int j = 0; j < output_vec_size; j++) { + value_list[i][j] = ident; + } + } + + load_t values[vt0]; + + while (idx + (vt0 - 1) * stride < end) { + #pragma unroll + for (uint32_t i = 0; i < vt0; i++) { + values[i] = data[calc(idx + i * stride) / output_vec_size]; + } + #pragma unroll + for (uint32_t i = 0; i < vt0; i++) { + #pragma unroll + for (uint32_t j = 0; j < output_vec_size; j++) { + value_list[i][j] = reducer::reduce(value_list[i][j], values[i].val[j], idx + i * stride); + } + } + idx += stride * vt0; + } + + // tail + int idx_ = idx; + #pragma unroll + for (uint32_t i = 0; i < vt0; i++) { + if (idx >= end) { + break; + } + values[i] = data[calc(idx) / output_vec_size]; + idx += stride; + } + idx = idx_; + #pragma unroll + for (uint32_t i = 0; i < vt0; i++) { + if (idx >= end) { + break; + } + #pragma unroll + for (uint32_t j = 0; j < output_vec_size; j++) { + value_list[i][j] = reducer::reduce(value_list[i][j], values[i].val[j], idx); + } + idx += stride; + } + + // combine accumulators + #pragma unroll + for (int i = 1; i < vt0; i++) { + #pragma unroll + for (uint32_t j = 0; j < output_vec_size; j++) { + value_list[0][j] = reducer::combine(value_list[0][j], value_list[i][j]); + } + } + return value_list[0]; + } + template + C10_DEVICE Array block_x_reduce(Array value, char* shared_memory) const { + using args_vec_t = Array; + int dim_x = blockDim.x; + args_vec_t* shared = (args_vec_t*)shared_memory; + if (dim_x > warpSize) { + int address_base = threadIdx.x + threadIdx.y*blockDim.x; + shared[address_base] = value; + for (int offset = dim_x/2; offset >= warpSize; offset >>= 1) { + __syncthreads(); + if (threadIdx.x < offset && threadIdx.x + offset < blockDim.x) { + args_vec_t other = shared[address_base + offset]; + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + value[i] = reducer::combine(value[i], other[i]); + } + shared[address_base] = value; + } + } + dim_x = warpSize; + } + + __syncthreads(); + + for (int offset = 1; offset < dim_x; offset <<= 1) { + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + arg_t other = reducer::warp_shfl_down(value[i], offset); + value[i] = reducer::combine(value[i], other); + } + } + return value; + } + + template + C10_DEVICE Array block_y_reduce(Array value, char* shared_memory) const { + using args_vec_t = Array; + args_vec_t* shared = (args_vec_t*)shared_memory; + shared[config.shared_memory_offset(0)] = value; + for (int offset = blockDim.y / 2; offset > 0; offset >>= 1) { + __syncthreads(); + if (threadIdx.y < offset && threadIdx.y + offset < blockDim.y) { + args_vec_t other = shared[config.shared_memory_offset(offset)]; + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + value[i] = reducer::combine(value[i], other[i]); + } + shared[config.shared_memory_offset(0)] = value; + } + } + return value; + } + )ESCAPE"; + + const std::string reduction_template_1 = R"ESCAPE( + + C10_DEVICE bool mark_block_finished() const { + __shared__ bool is_last_block_done_shared; + + __syncthreads(); + if (threadIdx.x == 0 && threadIdx.y == 0) { + int prev_blocks_finished = atomicAdd(&semaphores[blockIdx.x], 1); + is_last_block_done_shared = (prev_blocks_finished == gridDim.y - 1); + } + + __syncthreads(); + + return is_last_block_done_shared; + } + + template + C10_DEVICE Array accumulate_in_output( + Array out, + Array value + ) const { + Array ret; + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + 
ret[i] = reducer::combine(*(out[i]), value[i]); + } + return ret; + } + + + C10_DEVICE out_scalar_t get_accumulated_output( + out_scalar_t* out, arg_t value + ) const { + assert(!final_output); + return (out_scalar_t)value; + } + + template + C10_DEVICE void set_results(const T x, const uint32_t base_offset) const { + assert(noutputs == 1); + auto res = (out_scalar_t*)((char*)dst[0] + base_offset); + *res = x; + } + +//TODO - multi-output reduction - we won't be able to use thrust::pair +//just explicitly specify typed output reads/writes +//Currently implemented for max of two outputs +// template +// C10_DEVICE void set_results(const thrust::pair x, const index_t base_offset) const { +// if (noutputs >= 1) { +// auto res0 = (T1*)((char*)dst[0] + base_offset); +// *res0 = x.first; +// } +// if (noutputs >= 2) { +// // base offset is computed assuming element size being sizeof(T1), so we need to make a +// // correction to obtain the correct base offset +// auto res1 = (T2*) ((char *) dst[1] + base_offset / sizeof(T1) * sizeof(T2)); +// *res1 = x.second; +// } +// } + + template + C10_DEVICE void set_results_to_output(Array value, Array base_offset) const { + assert(final_output); + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + set_results(reducer::project(value[i]), base_offset[i]); + } + } + + template + C10_DEVICE Array global_reduce(Array value, Array *acc, char* shared_memory) const { + using arg_vec_t = Array; + using out_ptr_vec_t = Array; + using offset_vec_t = Array; + + arg_vec_t* reduce_buffer = (arg_vec_t*)cta_buf; + uint32_t output_idx = config.output_idx(); + offset_vec_t base_offsets; + out_ptr_vec_t out; + + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + base_offsets[i] = output_calc.get(output_idx + i)[0]; + out[i] = (out_scalar_t*)((char*)dst[0] + base_offsets[i]); + } + + bool should_store = config.should_store(output_idx); + if (should_store) { + uint32_t offset = config.staging_memory_offset(blockIdx.y); + reduce_buffer[offset] = value; + } + + __threadfence(); // make sure writes are globally visible + __syncthreads(); // if multiple warps in this block wrote to staging, make sure they're all done + bool is_last_block_done = mark_block_finished(); + + if (is_last_block_done) { + value = ident; + if (config.should_block_x_reduce()) { + uint32_t input_offset = threadIdx.x + threadIdx.y * blockDim.x; + uint32_t step = blockDim.x * blockDim.y; + for (; input_offset < config.ctas_per_output; input_offset += step) { + uint32_t idx = config.staging_memory_offset(input_offset); + arg_vec_t next = reduce_buffer[idx]; + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + value[i] = reducer::combine(value[i], next[i]); + } + } + } else { + uint32_t input_offset = threadIdx.y; + uint32_t step = blockDim.y; + for (; input_offset < config.ctas_per_output; input_offset += step) { + uint32_t idx = config.staging_memory_offset(input_offset); + arg_vec_t next = reduce_buffer[idx]; + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + value[i] = reducer::combine(value[i], next[i]); + } + } + } + value = block_y_reduce(value, shared_memory); + if (config.should_block_x_reduce()) { + value = block_x_reduce(value, shared_memory); + } + if (should_store) { + if (accumulate) { + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + value[i] = reducer::translate_idx(value[i], base_idx); + } + } + + if (acc == nullptr) { + if (accumulate) { + value = accumulate_in_output(out, value); + } + if (final_output) { + 
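// [Illustrative note, not part of the diff] global_reduce below implements a
// two-phase scheme for reductions split across gridDim.y blocks: every block
// first writes its partial result into cta_buf at
// staging_memory_offset(blockIdx.y) and bumps the per-output semaphore; the
// block that observes prev_blocks_finished == gridDim.y - 1 in
// mark_block_finished() is the last one, re-reduces all staged partials
// through block_y_reduce/block_x_reduce, and is the only block that writes the
// final (or accumulated) output.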
set_results_to_output(value, base_offsets); + } else { + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + *(out[i]) = get_accumulated_output(out[i], value[i]); + } + } + } else { + if (accumulate) { + #pragma unroll + for (int i = 0; i < output_vec_size; i++) { + value[i] = reducer::combine((*acc)[i], value[i]); + } + } + if (final_output) { + set_results_to_output(value, base_offsets); + } else { + *acc = value; + } + } + } + } + + return value; + } +}; + +extern "C" +__launch_bounds__(${max_threads_lb}, 4) +__global__ void reduction_${name}_kernel(ReduceJitOp r){ + r.run(); +} +)ESCAPE"; + +const std::string reduction_template = reduction_template_0 + reduction_template_1; + + +const std::string &get_reduction_template() { + return reduction_template; +} + +}} diff --git a/aten/src/ATen/native/cuda/thread_constants.h b/aten/src/ATen/native/cuda/thread_constants.h index 464c6fe9fe2e1d..651053d663e4c2 100644 --- a/aten/src/ATen/native/cuda/thread_constants.h +++ b/aten/src/ATen/native/cuda/thread_constants.h @@ -13,7 +13,7 @@ constexpr int num_threads() { return 256; } #else -constexpr int num_threads() { +constexpr uint32_t num_threads() { return C10_WARP_SIZE * 4; } #endif diff --git a/aten/src/ATen/native/cuda/vol2col.cuh b/aten/src/ATen/native/cuda/vol2col.cuh index 17459f382816c6..7ab719bc819ebf 100644 --- a/aten/src/ATen/native/cuda/vol2col.cuh +++ b/aten/src/ATen/native/cuda/vol2col.cuh @@ -1,9 +1,5 @@ #pragma once -#include -#include -#include - #include #include #include diff --git a/aten/src/ATen/native/cudnn/GridSampler.cpp b/aten/src/ATen/native/cudnn/GridSampler.cpp index 38bde06aa6cc0c..b22d25cbff977a 100644 --- a/aten/src/ATen/native/cudnn/GridSampler.cpp +++ b/aten/src/ATen/native/cudnn/GridSampler.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #if !AT_CUDNN_ENABLED() @@ -67,6 +68,13 @@ void checkGridSize(CheckedFrom c, TensorArg grid, TensorArg input) Tensor cudnn_grid_sampler_forward( const Tensor& input_t, const Tensor& grid_t) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. + check_grid_sampler_common(input_t, grid_t); + TORCH_CHECK( + cond_cudnn_grid_sampler(input_t, grid_t), + "Invalid arguments to cudnn_grid_sampler_forward"); + auto input_contig = contiguousIfZeroInStrides(input_t); auto grid_contig = grid_t.contiguous(); TensorArg input{ input_contig, "input", 1 }, @@ -106,6 +114,13 @@ std::tuple cudnn_grid_sampler_backward( const Tensor& input_t, const Tensor& grid_t, const Tensor& grad_output_t) { + // See NOTE [ grid_sampler Native Functions ]. + // Add checks here in case this is called instead of grid_sampler. 
+ check_grid_sampler_common(input_t, grid_t); + TORCH_CHECK( + cond_cudnn_grid_sampler(input_t, grid_t), + "Invalid arguments to cudnn_grid_sampler_backward"); + auto input_contig = contiguousIfZeroInStrides(input_t); auto grid_contig = grid_t.contiguous(); auto grad_output_contig = contiguousIfZeroInStrides(grad_output_t); diff --git a/aten/src/ATen/native/cudnn/RNN.cpp b/aten/src/ATen/native/cudnn/RNN.cpp index a80fc4fe033595..29430b38e74ea4 100644 --- a/aten/src/ATen/native/cudnn/RNN.cpp +++ b/aten/src/ATen/native/cudnn/RNN.cpp @@ -753,19 +753,61 @@ namespace { } } - cudnnRNNAlgo_t get_algo(const RNNDescriptorParams& rnn, const TensorDescriptorListParams& tensors, const Tensor input) { + inline bool use_rnn_persist_small_h(const RNNDescriptorParams& rnn, + const TensorDescriptorListParams& tensors, + bool forward) { +#if CUDNN_VERSION >= 8201 // 8.2.1 + cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties(); + if (prop->major < 6) return false; + + if (forward) { + if (rnn.mode == CUDNN_RNN_RELU || rnn.mode == CUDNN_RNN_TANH) { + return rnn.hidden_size <= 384; + } + if (rnn.mode == CUDNN_LSTM || rnn.mode == CUDNN_GRU) { + return rnn.hidden_size <= 192; + } + } else /* backward */ { + if (rnn.mode == CUDNN_RNN_RELU || rnn.mode == CUDNN_RNN_TANH) { + return rnn.hidden_size <= 256; + } + if (rnn.mode == CUDNN_LSTM || rnn.mode == CUDNN_GRU) { + return rnn.hidden_size <= 128; + } + } + + return false; +#else + return false; +#endif + } + + cudnnRNNAlgo_t get_algo(const RNNDescriptorParams& rnn, const TensorDescriptorListParams& tensors, const Tensor input, bool forward) { // LSTM with projections only works with standard algorithm if (rnn.proj_size != 0) { return CUDNN_RNN_ALGO_STANDARD; } - if (getCudnnDataType(input) == CUDNN_DATA_HALF && - !tensors.is_input_packed()) { - if (use_persist_common_heuristics(rnn, tensors) && - use_persist_device_heuristics(rnn, tensors)) { - return CUDNN_RNN_ALGO_PERSIST_STATIC; + // Persistent algos typically don't work for packed inputs with sequence lengths that vary + // across batch elements, and will return CUDNN_STATUS_NOT_SUPPORTED if attempted. 
See + // https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#features-of-rnn-functions + if (!tensors.is_input_packed()) { + auto cudnnDataType = getCudnnDataType(input); +#if CUDNN_VERSION >= 8201 // 8.2.1 + if (cudnnDataType != CUDNN_DATA_DOUBLE) { + if (use_rnn_persist_small_h(rnn, tensors, forward)) { + return CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H; + } + } +#endif + if (cudnnDataType == CUDNN_DATA_HALF) { + if (use_persist_common_heuristics(rnn, tensors) && + use_persist_device_heuristics(rnn, tensors)) { + return CUDNN_RNN_ALGO_PERSIST_STATIC; + } } } + return CUDNN_RNN_ALGO_STANDARD; } @@ -970,7 +1012,7 @@ std::tuple _cudnn_rnn( auto y = output; auto handle = getCudnnHandle(); - cudnnRNNAlgo_t algo = get_algo(fn.rnn, fn.tensors, input); + cudnnRNNAlgo_t algo = get_algo(fn.rnn, fn.tensors, input, true); fn.rnn.set_algo(algo); RNNDescriptors descs(fn, handle, x, y, hx, cx); @@ -1131,7 +1173,7 @@ std::tuple _cudnn_rnn_backward_input( TORCH_CHECK(dhy.is_cuda() && dy.is_cuda() && (!dcy.defined() || dcy.is_cuda()), "Gradients aren't CUDA tensors"); - cudnnRNNAlgo_t algo = get_algo(fn.rnn, fn.tensors, input); + cudnnRNNAlgo_t algo = get_algo(fn.rnn, fn.tensors, input, false); fn.rnn.set_algo(algo); RNNDescriptors descs(fn, handle, x, y, hx, cx); @@ -1234,7 +1276,7 @@ std::vector _cudnn_rnn_backward_weight( const auto& y = output; auto dw = at::zeros(weight_buf.sizes(), weight_buf.options()); - cudnnRNNAlgo_t algo = get_algo(fn.rnn, fn.tensors, input); + cudnnRNNAlgo_t algo = get_algo(fn.rnn, fn.tensors, input, false); fn.rnn.set_algo(algo); RNNDescriptors descs(fn, handle, x, y, hx, cx); diff --git a/aten/src/ATen/native/group_norm.cpp b/aten/src/ATen/native/group_norm.cpp index 5533780a4547e1..db1d82f84fef03 100644 --- a/aten/src/ATen/native/group_norm.cpp +++ b/aten/src/ATen/native/group_norm.cpp @@ -16,6 +16,39 @@ namespace at { namespace native { +void check_group_norm_inputs( + const Tensor& input, + const Tensor& weight, + const Tensor& bias, + int64_t C, + int64_t num_groups) { + TORCH_CHECK( + num_groups > 0, + "Expected num groups to be greater than 0, got ", num_groups); + TORCH_CHECK( + C % num_groups == 0, + "Expected number of channels in input to be divisible by ", + "num_groups, but got input of shape ", + input.sizes(), + " and " + "num_groups=", + num_groups); + TORCH_CHECK( + !weight.defined() || (weight.dim() == 1 && weight.numel() == C), + "Expected weight to be a vector of size equal to the number of ", + "channels in input, but got weight of shape ", + weight.sizes(), + " and input of shape ", + input.sizes()); + TORCH_CHECK( + !bias.defined() || (bias.dim() == 1 && bias.numel() == C), + "Expected bias to be a vector of size equal to the number of ", + "channels in input, but got bias of shape ", + weight.sizes(), + " and input of shape ", + input.sizes()); +} + std::tuple native_group_norm( const Tensor& X, const c10::optional& gamma_opt /* optional */, @@ -31,6 +64,9 @@ std::tuple native_group_norm( const Tensor& gamma = *gamma_maybe_owned; const Tensor& beta = c10::value_or_else(beta_opt, [] { return Tensor(); }); + // repeated check so expanded weights can call native_group_norm directly but + // save mean and variance from forward + check_group_norm_inputs(X, gamma, beta, C, group); auto memory_format = X.device().is_cpu() ? 
X.suggest_memory_format() : at::MemoryFormat::Contiguous; @@ -128,28 +164,7 @@ Tensor group_norm( const int64_t N = input.size(0); const int64_t C = input.size(1); - TORCH_CHECK( - C % num_groups == 0, - "Expected number of channels in input to be divisible by ", - "num_groups, but got input of shape ", - input.sizes(), - " and " - "num_groups=", - num_groups); - TORCH_CHECK( - !weight.defined() || (weight.dim() == 1 && weight.numel() == C), - "Expected weight to be a vector of size equal to the number of ", - "channels in input, but got weight of shape ", - weight.sizes(), - " and input of shape ", - input.sizes()); - TORCH_CHECK( - !bias.defined() || (bias.dim() == 1 && bias.numel() == C), - "Expected bias to be a vector of size equal to the number of ", - "channels in input, but got bias of shape ", - weight.sizes(), - " and input of shape ", - input.sizes()); + check_group_norm_inputs(input, weight, bias, C, num_groups); const auto input_shape = input.sizes(); const int64_t HxW = diff --git a/aten/src/ATen/native/layer_norm.cpp b/aten/src/ATen/native/layer_norm.cpp index c6b9b6d5c26ab1..fc5a37bc03ae0b 100644 --- a/aten/src/ATen/native/layer_norm.cpp +++ b/aten/src/ATen/native/layer_norm.cpp @@ -18,7 +18,7 @@ namespace at { namespace native { -void layer_norm_cpu_out( +void layer_norm_with_mean_rstd_out( at::Tensor& out, at::Tensor& mean, at::Tensor& rstd, @@ -50,6 +50,20 @@ void layer_norm_cpu_out( rstd = rstd.view(stat_shape); } +void layer_norm_cpu_out( + at::Tensor& out, + const at::Tensor& input, + const Tensor& gamma, + const Tensor& beta, + double eps, + int64_t M, + int64_t N) { + if (M <= 0) { + return; + } + LayerNormKernel(kCPU, input, gamma, beta, M, N, eps, &out, /*mean=*/nullptr, /*rstd=*/nullptr); +} + std::tuple layer_norm_cpu( const Tensor& input, IntArrayRef normalized_shape, const c10::optional& weight_opt /* optional */, const c10::optional& bias_opt /* optional */, @@ -78,7 +92,7 @@ std::tuple layer_norm_cpu( Tensor mean = at::empty({M}, X->options()); Tensor rstd = at::empty({M}, X->options()); - layer_norm_cpu_out(Y, mean, rstd, *X, normalized_shape, *gamma, *beta, eps, M, N); + layer_norm_with_mean_rstd_out(Y, mean, rstd, *X, normalized_shape, *gamma, *beta, eps, M, N); return std::make_tuple(std::move(Y), std::move(mean), std::move(rstd)); } diff --git a/aten/src/ATen/native/layer_norm.h b/aten/src/ATen/native/layer_norm.h index e1bf789dcd81d5..629bc9ab3906b9 100644 --- a/aten/src/ATen/native/layer_norm.h +++ b/aten/src/ATen/native/layer_norm.h @@ -65,10 +65,7 @@ C10_ALWAYS_INLINE std::pair _check_layer_norm_inputs( void layer_norm_cpu_out( at::Tensor& out, - at::Tensor& mean, - at::Tensor& rstd, const at::Tensor& input, - IntArrayRef normalized_shape, const Tensor& gamma, const Tensor& beta, double eps, diff --git a/aten/src/ATen/native/metal/ops/MetalUpsamplingNearest.mm b/aten/src/ATen/native/metal/ops/MetalUpsamplingNearest.mm index 300cddba006a40..39524569bae5fa 100644 --- a/aten/src/ATen/native/metal/ops/MetalUpsamplingNearest.mm +++ b/aten/src/ATen/native/metal/ops/MetalUpsamplingNearest.mm @@ -17,7 +17,7 @@ Tensor upsample_nearest2d_vec( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { TORCH_CHECK(input.is_metal()); auto osize = diff --git a/aten/src/ATen/native/mkl/SparseBlasImpl.cpp b/aten/src/ATen/native/mkl/SparseBlasImpl.cpp index 3485dc1c5fb21d..3d49554ce29a63 100644 --- a/aten/src/ATen/native/mkl/SparseBlasImpl.cpp +++ b/aten/src/ATen/native/mkl/SparseBlasImpl.cpp @@ -340,18 
+340,21 @@ void addmm_out_sparse_csr( const Scalar& alpha, const Tensor& result) { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(mat1.dim() == 2 && mat2.dim() == 2 && result.dim() == 2); - if (mat2.layout() == kStrided && result.layout() == kStrided) { + if (mat1.is_sparse_csr() && mat2.layout() == kStrided && result.layout() == kStrided) { return addmm_dense_result(mat1, mat2, beta, alpha, result); - } else if ( - mat1.is_sparse_csr() && mat2.is_sparse_csr() && - result.layout() == kStrided) { + } + if (mat1.layout() == kStrided && mat2.is_sparse_csr() && result.layout() == kStrided) { + // TODO: We can use MKL's transposition flags once we have CSC support. + return addmm_dense_result(mat2.transpose(0, 1), mat1.transpose(0, 1), beta, alpha, result.transpose(0, 1)); + } + if (mat1.is_sparse_csr() && mat2.is_sparse_csr() && result.layout() == kStrided) { return addmm_sparse_input_dense_result(mat1, mat2, beta, alpha, result); - } else if (mat2.is_sparse_csr() && result.is_sparse_csr()) { + } + if (mat1.is_sparse_csr() && mat2.is_sparse_csr() && result.is_sparse_csr()) { return addmm_sparse_result(mat1, mat2, beta, alpha, result); - } else { - TORCH_CHECK(false, "addmm: computation on CPU is not implemented for ", - result.layout(), " + ", mat1.layout(), " @ ", mat2.layout()); } + TORCH_CHECK(false, "addmm: computation on CPU is not implemented for ", + result.layout(), " + ", mat1.layout(), " @ ", mat2.layout()); } /* diff --git a/aten/src/ATen/native/mkldnn/Conv.cpp b/aten/src/ATen/native/mkldnn/Conv.cpp index fb41dcdd6215dc..50b366e6ee51bd 100644 --- a/aten/src/ATen/native/mkldnn/Conv.cpp +++ b/aten/src/ATen/native/mkldnn/Conv.cpp @@ -199,9 +199,9 @@ std::tuple mkldnn_convolution_backward_weights( mkldnn_to_dense(new_with_itensor_mkldnn(std::move(mkldnn_grad_weight), optTypeMetaToScalarType(grad_output.options().dtype_opt()), grad_output.options().device_opt())), - mkldnn_to_dense(new_with_itensor_mkldnn(std::move(mkldnn_grad_bias), + bias_defined ? mkldnn_to_dense(new_with_itensor_mkldnn(std::move(mkldnn_grad_bias), optTypeMetaToScalarType(grad_output.options().dtype_opt()), - grad_output.options().device_opt()))); + grad_output.options().device_opt())) : Tensor()); } std::tuple mkldnn_convolution_backward( diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml index baad10f9d26e62..3ef3291274a405 100644 --- a/aten/src/ATen/native/native_functions.yaml +++ b/aten/src/ATen/native/native_functions.yaml @@ -1878,6 +1878,7 @@ MkldnnCPU: empty_mkldnn SparseCPU, SparseCUDA: empty_sparse SparseCsrCPU, SparseCsrCUDA: empty_sparse_csr + QuantizedCPU, QuantizedCUDA: empty_unknown_quantized # We do not make new_empty a composite that calls into new_empty_strided, as the strided version # is significantly more difficult to implement by different backends @@ -1949,6 +1950,7 @@ CPU: empty_strided_cpu CUDA: empty_strided_cuda Meta: empty_strided_meta + QuantizedCPU, QuantizedCUDA: empty_strided_unknown_quantized - func: erf(Tensor self) -> Tensor device_check: NoCheck # TensorIterator @@ -2223,10 +2225,12 @@ variants: function, method # NOTE [ grid_sampler Native Functions ] -# `grid_sampler` does all the shape checking and then dispatches to one of -# `cudnn_grid_sampler`, `grid_sampler_2d`, or `grid_sampler_3d`, each of which -# has the corresponding backward defined as native functions as well. Therefore, -# in these functions and their backwards, no more shape checking is done. 
+# `grid_sampler` is _supposed to_ do all the shape checking and then dispatch to +# one of `cudnn_grid_sampler`, `grid_sampler_2d`, or `grid_sampler_3d`, each of +# which has the corresponding backward defined as native functions as well. +# However, we do shape checking everywhere for now since each of the mentioned +# functions can be called directly, which will lead to crashes otherwise. +# See https://github.com/pytorch/pytorch/issues/73187 for more information. # # There is also _grid_sampler_2d_backward_cpu_fallback which is an # implementation detail of grid_sampler_2d and is only exposed here for testing @@ -3086,10 +3090,10 @@ - func: amin(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor variants: function, method - dispatch: - CompositeExplicitAutograd: amin + structured_delegate: amin.out - func: amin.out(Tensor self, int[1] dim=[], bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + structured: True dispatch: CPU, CUDA: amin_out @@ -3173,6 +3177,7 @@ variants: function, method dispatch: SparseCPU, SparseCUDA: mul_sparse + SparseCsrCPU, SparseCsrCUDA: mul_sparse_csr MkldnnCPU: mkldnn_mul ZeroTensor: mul_zerotensor @@ -3182,6 +3187,7 @@ variants: method dispatch: SparseCPU, SparseCUDA: mul_sparse_ + SparseCsrCPU, SparseCsrCUDA: mul_sparse_csr_ MkldnnCPU: mkldnn_mul_ - func: mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) @@ -3192,6 +3198,7 @@ CPU, CUDA: mul_out SparseCPU: mul_out_sparse_cpu SparseCUDA: mul_out_sparse_cuda + SparseCsrCPU, SparseCsrCUDA: mul_out_sparse_csr MkldnnCPU: mkldnn_mul_out # For C++ only, until we have conversion from C++ numbers to Tensor @@ -3206,6 +3213,7 @@ variants: method dispatch: CompositeExplicitAutograd: mul_ + SparseCsrCPU, SparseCsrCUDA: mul__scalar_sparse_csr # multiply, alias for mul - func: multiply.Tensor(Tensor self, Tensor other) -> Tensor @@ -3255,6 +3263,11 @@ SparseCPU, SparseCUDA: narrow_copy_sparse CompositeExplicitAutograd: narrow_copy_dense +- func: narrow_copy.SymInt(Tensor self, int dim, int start, SymInt length) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: narrow_copy_symint + - func: narrow_copy.out(Tensor self, int dim, int start, int length, *, Tensor(a!) out) -> Tensor(a!) dispatch: CPU: narrow_copy_dense_cpu_out @@ -3710,6 +3723,7 @@ CPU, CUDA: relu MkldnnCPU: mkldnn_relu QuantizedCPU: relu_quantized_cpu + NestedTensor: NestedTensor_relu - func: relu_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -3718,6 +3732,7 @@ CPU, CUDA: relu_ MkldnnCPU: mkldnn_relu_ QuantizedCPU: relu_quantized_cpu_ + NestedTensor: NestedTensor_relu_ - func: relu6(Tensor self) -> Tensor python_module: nn @@ -3746,6 +3761,13 @@ CPU: gelu_out_cpu CUDA: gelu_out_cuda +- func: gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!) + structured_delegate: gelu.out + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + NestedTensor: NestedTensor_gelu_ + - func: gelu(Tensor self, *, str approximate='none') -> Tensor structured_delegate: gelu.out device_check: NoCheck # TensorIterator @@ -3753,6 +3775,7 @@ dispatch: MkldnnCPU: mkldnn_gelu QuantizedCPU: gelu_quantized_cpu + NestedTensor: NestedTensor_gelu - func: gelu_backward.grad_input(Tensor grad_output, Tensor self, *, str approximate='none', Tensor(a!) grad_input) -> Tensor(a!) 
structured: True @@ -4125,6 +4148,10 @@ dispatch: CompositeExplicitAutograd: split +- func: split.sizes(Tensor(a -> *) self, int[] split_size, int dim=0) -> Tensor(a)[] + variants: function, method + device_guard: False + - func: unsafe_split_with_sizes(Tensor self, int[] split_sizes, int dim=0) -> Tensor[] variants: function, method device_check: NoCheck @@ -4162,7 +4189,7 @@ device_check: NoCheck device_guard: False dispatch: - CPU, CUDA: squeeze + CompositeExplicitAutograd: squeeze QuantizedCPU, QuantizedCUDA: squeeze_quantized - func: squeeze.dim(Tensor(a) self, int dim) -> Tensor(a) @@ -4170,7 +4197,7 @@ device_check: NoCheck device_guard: False dispatch: - CPU, CUDA: squeeze + CompositeExplicitAutograd: squeeze QuantizedCPU, QuantizedCUDA: squeeze_quantized - func: squeeze.dimname(Tensor(a) self, Dimname dim) -> Tensor(a) @@ -4240,12 +4267,13 @@ - func: dstack.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) -# The signature is designed to be consistent with librosa except that it is -# missing the `pad_mode` and `center` arguments, which are taken care of at -# `torch.functional.py`. They shall be moved here once we have mapping between -# Python strings and C++ Enum in codegen. +# Overload without center & pad mode, needed for forward-compatibility - func: stft(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool normalized=False, bool? onesided=None, bool? return_complex=None) -> Tensor variants: function, method + cpp_no_default_args: ['hop_length', 'win_length', 'window', 'normalized'] + +- func: stft.center(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool center=True, str pad_mode="reflect", bool normalized=False, bool? onesided=None, bool? return_complex=None) -> Tensor + variants: function, method - func: istft(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool center=True, bool normalized=False, bool? onesided=None, int? length=None, bool return_complex=False) -> Tensor variants: function, method @@ -4266,6 +4294,7 @@ variants: function, method dispatch: CompositeExplicitAutograd: sum + SparseCsrCPU, SparseCsrCUDA: sum_csr - func: sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor structured_delegate: sum.IntList_out @@ -4694,7 +4723,7 @@ device_check: NoCheck device_guard: False dispatch: - CPU, CUDA: unsqueeze + CompositeExplicitAutograd: unsqueeze SparseCPU, SparseCUDA: unsqueeze_sparse QuantizedCPU, QuantizedCUDA: unsqueeze_quantized @@ -4772,12 +4801,16 @@ device_check: NoCheck device_guard: False -# we define both of these because 'where' does the broadcast and '_s_where' doesn't; -# this allows us to implicitly calculate the broadcast derivative, while only dealing with the -# _s_where derivative. - func: where.self(Tensor condition, Tensor self, Tensor other) -> Tensor device_check: NoCheck # TensorIterator variants: function, method + dispatch: + CPU, CUDA: where + +- func: where.self_out(Tensor condition, Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: where_self_out - func: where.ScalarSelf(Tensor condition, Scalar self, Tensor other) -> Tensor variants: function @@ -4792,11 +4825,6 @@ device_check: NoCheck # TensorIterator variants: function -- func: _s_where(Tensor condition, Tensor self, Tensor other) -> Tensor - variants: function - dispatch: - CPU, CUDA: _s_where - - func: norm_except_dim(Tensor v, int pow=2, int dim=0) -> Tensor variants: function @@ -4895,6 +4923,11 @@ SparseCPU: _sparse_sum_backward_cpu SparseCUDA: _sparse_sum_backward_cuda +- func: _sparse_csr_sum.dim_dtype(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + dispatch: + SparseCsrCPU: _sparse_csr_sum_cpu + SparseCsrCUDA: _sparse_csr_sum_cuda + - func: _sparse_softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor python_module: sparse variants: function @@ -5036,7 +5069,7 @@ - func: resize_as_sparse_(Tensor(a!) self, Tensor the_template) -> Tensor(a!) use_const_ref_for_mutable_tensors: True - variants: function + variants: function, method dispatch: SparseCPU, SparseCUDA: resize_as_sparse_ SparseCsrCPU, SparseCsrCUDA: resize_as_sparse_csr_ @@ -5176,6 +5209,16 @@ SparseCPU: s_addmm_sparse_dense_cpu_ SparseCUDA: s_addmm_sparse_dense_cuda_ +- func: _addmm_activation.out(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, bool use_gelu=False, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: addmm_activation_out_cpu + CUDA: addmm_activation_out_cuda + +- func: _addmm_activation(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, bool use_gelu=False) -> Tensor + structured_delegate: _addmm_activation.out + variants: function, method + # NOTE [ Sparse: autograd and API ] # # @@ -5336,8 +5379,13 @@ - func: to_dense(Tensor self, ScalarType? dtype=None) -> Tensor variants: method + +# Special case of to_dense with custom derivative +- func: _to_dense(Tensor self, ScalarType? dtype=None) -> Tensor + variants: method dispatch: - SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA: sparse_to_dense + SparseCPU, SparseCUDA: sparse_to_dense + SparseCsrCPU, SparseCsrCUDA: sparse_csr_to_dense MkldnnCPU: mkldnn_to_dense - func: to_dense_backward(Tensor grad, Tensor input) -> Tensor @@ -5490,6 +5538,13 @@ CPU, CUDA: dense_to_sparse SparseCsrCPU, SparseCsrCUDA: sparse_csr_to_sparse +- func: to_sparse_csr(Tensor self) -> Tensor + variants: method + dispatch: + CPU, CUDA: dense_to_sparse_csr + SparseCPU, SparseCUDA: coo_to_sparse_csr + SparseCsrCPU, SparseCsrCUDA: csr_to_sparse_csr + - func: to_mkldnn(Tensor self, ScalarType? dtype=None) -> Tensor variants: method dispatch: @@ -5824,14 +5879,14 @@ device_check: NoCheck device_guard: False dispatch: - CPU, CUDA: set_ + CPU, CUDA, Meta: set_ - func: set_.source_Storage_storage_offset(Tensor(a!) self, Storage source, int storage_offset, int[] size, int[] stride=[]) -> Tensor(a!) variants: method device_check: NoCheck device_guard: False dispatch: - CPU: set_storage_cpu_ + CPU, Meta: set_storage_cpu_ CUDA: set_storage_cuda_ QuantizedCPU, QuantizedCUDA: set_storage_quantized_ @@ -5840,13 +5895,14 @@ device_check: NoCheck device_guard: False dispatch: - CPU, CUDA: set_tensor_ + CPU, CUDA, Meta: set_tensor_ - func: set_(Tensor(a!) self) -> Tensor(a!) 
variants: method dispatch: CPU: set_cpu_ CUDA: set_cuda_ + Meta: set_meta_ - func: is_set_to(Tensor self, Tensor tensor) -> bool variants: method @@ -6066,10 +6122,19 @@ - func: scatter_add.dimname(Tensor self, Dimname dim, Tensor index, Tensor src) -> Tensor variants: function, method -- func: scatter_reduce.two(Tensor self, int dim, Tensor index, str reduce, *, int? output_size=None) -> Tensor +- func: scatter_reduce.two(Tensor self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True) -> Tensor + structured_delegate: scatter_reduce.two_out variants: function, method + +- func: scatter_reduce_.two(Tensor(a!) self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True) -> Tensor(a!) + structured_delegate: scatter_reduce.two_out + variants: method + +- func: scatter_reduce.two_out(Tensor self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function dispatch: - CPU: scatter_reduce_two_cpu + CPU, CUDA: scatter_reduce_two - func: eq_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) structured_delegate: eq.Scalar_out @@ -6276,25 +6341,25 @@ device_check: NoCheck # TensorIterator variants: method, function dispatch: - CPU, CUDA: bitwise_left_shift + CompositeExplicitAutograd: bitwise_left_shift - func: bitwise_left_shift_.Tensor_Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: method dispatch: - CPU, CUDA: bitwise_left_shift_ + CompositeExplicitAutograd: bitwise_left_shift_ - func: bitwise_left_shift.Tensor_Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: function dispatch: - CPU, CUDA: bitwise_left_shift_out + CompositeExplicitAutograd: bitwise_left_shift_out - func: bitwise_left_shift.Scalar_Tensor(Scalar self, Tensor other) -> Tensor device_check: NoCheck # TensorIterator variants: function dispatch: - CPU, CUDA: bitwise_left_shift + CompositeExplicitAutograd: bitwise_left_shift - func: __rshift__.Scalar(Tensor self, Scalar other) -> Tensor device_check: NoCheck # TensorIterator @@ -6341,25 +6406,25 @@ device_check: NoCheck # TensorIterator variants: method, function dispatch: - CPU, CUDA: bitwise_right_shift + CompositeExplicitAutograd: bitwise_right_shift - func: bitwise_right_shift_.Tensor_Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: method dispatch: - CPU, CUDA: bitwise_right_shift_ + CompositeExplicitAutograd: bitwise_right_shift_ - func: bitwise_right_shift.Tensor_Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: function dispatch: - CPU, CUDA: bitwise_right_shift_out + CompositeExplicitAutograd: bitwise_right_shift_out - func: bitwise_right_shift.Scalar_Tensor(Scalar self, Tensor other) -> Tensor device_check: NoCheck # TensorIterator variants: function dispatch: - CPU, CUDA: bitwise_right_shift + CompositeExplicitAutograd: bitwise_right_shift - func: tril_(Tensor(a!) self, int diagonal=0) -> Tensor(a!) structured_delegate: tril.out @@ -7011,7 +7076,7 @@ - func: linalg_solve_triangular(Tensor self, Tensor B, *, bool upper, bool left=True, bool unitriangular=False) -> Tensor python_module: linalg - variants: method, function + variants: function dispatch: CPU, CUDA: linalg_solve_triangular @@ -7404,6 +7469,12 @@ dispatch: CPU: histogramdd_cpu +- func: histogramdd(Tensor self, int[] bins, float[]? 
range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges) + +- func: histogramdd.int_bins(Tensor self, int bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges) + +- func: histogramdd.TensorList_bins(Tensor self, Tensor[] bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges) + - func: fmod.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator dispatch: @@ -8594,6 +8665,9 @@ CPU: _convert_indices_from_csr_to_coo_structured_cpu CUDA: _convert_indices_from_csr_to_coo_structured_cuda +- func: _csr_to_block_csr(Tensor self, int[2] block_size) -> Tensor + python_module: sparse + ## NN wrappers - func: mse_loss.out(Tensor self, Tensor target, int reduction=Mean, *, Tensor(a!) out) -> Tensor(a!) @@ -9421,14 +9495,13 @@ python_module: nn structured: True dispatch: - CPU, QuantizedCPU: reflection_pad1d_out_cpu + CPU: reflection_pad1d_out_cpu + QuantizedCPU: reflection_pad1d_out_quantized_cpu CUDA: reflection_pad1d_out_cuda - func: reflection_pad1d(Tensor self, int[2] padding) -> Tensor python_module: nn structured_delegate: reflection_pad1d.out - dispatch: - QuantizedCPU: reflection_pad1d_quantized_cpu - func: reflection_pad1d_backward.grad_input(Tensor grad_output, Tensor self, int[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn @@ -9556,6 +9629,15 @@ CPU: replication_pad3d_backward_cpu CUDA: replication_pad3d_backward_cuda +- func: _pad_circular(Tensor self, int[] pad) -> Tensor + python_module: nn + +- func: _pad_enum(Tensor self, int[] pad, int mode, float? value=None) -> Tensor + python_module: nn + +- func: pad(Tensor self, int[] pad, str mode="constant", float? value=None) -> Tensor + python_module: nn + - func: upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor python_module: nn dispatch: @@ -10250,6 +10332,19 @@ dispatch: CPU, CUDA: special_ndtri_out +- func: special_log_ndtr(Tensor self) -> Tensor + structured_delegate: special_log_ndtr.out + python_module: special + variants: function + +- func: special_log_ndtr.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: special + variants: function + dispatch: + CPU, CUDA: special_log_ndtr_out + - func: special_expm1(Tensor self) -> Tensor python_module: special variants: function @@ -10503,7 +10598,7 @@ - func: special_polygamma(int n, Tensor self) -> Tensor python_module: special - variants: function, method + variants: function - func: special_polygamma.out(int n, Tensor self, *, Tensor(a!) out) -> Tensor(a!) python_module: special @@ -11252,5 +11347,5 @@ variants: function python_module: nn -- func: _nested_tensor(Tensor[] list, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: nested_tensor(Tensor[] list, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor variants: function diff --git a/aten/src/ATen/native/nested/NestedTensorMath.cpp b/aten/src/ATen/native/nested/NestedTensorMath.cpp index d41243503275f8..83e2e5428b1517 100644 --- a/aten/src/ATen/native/nested/NestedTensorMath.cpp +++ b/aten/src/ATen/native/nested/NestedTensorMath.cpp @@ -8,6 +8,14 @@ namespace at { namespace native { +namespace { +template +Tensor map_nt(const Tensor& nt, Func f) { + auto* nt_impl = get_nested_tensor_impl(nt); + const auto& sizes = nt_impl->get_nested_size_tensor(); + return at::detail::make_tensor(f(nt_impl->get_buffer()), sizes); +} +} // namespace at::Tensor wrap_buffer(at::Tensor buffer, at::Tensor nested_size_tensor) { TORCH_CHECK(buffer.is_contiguous(), "Given buffer must be contiguous."); @@ -15,20 +23,6 @@ at::Tensor wrap_buffer(at::Tensor buffer, at::Tensor nested_size_tensor) { std::move(buffer), std::move(nested_size_tensor)); } -bool is_nested_tensor_impl(const at::Tensor& tensor) { - return tensor.unsafeGetTensorImpl()->key_set().has( - c10::DispatchKey::NestedTensor); -} - -inline at::native::NestedTensorImpl* get_nested_tensor_impl( - const at::Tensor& tensor) { - TORCH_CHECK( - is_nested_tensor_impl(tensor), - "get_nested_tensor_impl requires a NestedTensor."); - return static_cast( - tensor.unsafeGetTensorImpl()); -} - inline const at::Tensor& get_buffer(const at::Tensor& tensor) { return get_nested_tensor_impl(tensor)->get_buffer(); } @@ -69,11 +63,29 @@ std::vector NestedTensor_unbind( return result_tensors; } -/* - * This result of this function cannot be used by itself. The result needs to - * be wrapped in torch.nested.NestedTensor. - */ -Tensor _nested_tensor( +Tensor& NestedTensor_relu_(Tensor& self) { + at::relu_(const_cast(get_nested_tensor_impl(self)->get_buffer())); + return self; +} + +Tensor NestedTensor_relu(const Tensor& self) { + return map_nt(self, at::relu); +} + +Tensor& NestedTensor_gelu_(Tensor& self, c10::string_view approximate) { + at::gelu_(const_cast(get_nested_tensor_impl(self)->get_buffer()), approximate); + return self; +} + +Tensor NestedTensor_gelu(const Tensor& self, c10::string_view approximate) { + return map_nt( + self, + [approximate](const Tensor& buffer) { + return at::gelu(buffer, approximate); + }); +} + +Tensor nested_tensor( TensorList list, c10::optional dtype, c10::optional layout, diff --git a/aten/src/ATen/native/quantized/QTensor.cpp b/aten/src/ATen/native/quantized/QTensor.cpp index 5fefa3557f4b6c..6e858a3b5c2537 100644 --- a/aten/src/ATen/native/quantized/QTensor.cpp +++ b/aten/src/ATen/native/quantized/QTensor.cpp @@ -15,8 +15,11 @@ Tensor quantize_per_tensor_dynamic( const Tensor& self, ScalarType dtype, bool reduce_range) { - TORCH_CHECK( (dtype == ScalarType::QInt8 || dtype == ScalarType::QUInt8), "dtype ", dtype, "not supported"); + TORCH_CHECK( (dtype == ScalarType::QInt8 || dtype == ScalarType::QUInt8 || dtype == ScalarType::Half), "dtype ", dtype, "not supported"); auto input_contig = self.contiguous(); + if (dtype == ScalarType::Half) { + return input_contig.to(ScalarType::Half); + } float x_min = input_contig.min().item(); float x_max = input_contig.max().item(); diff --git a/aten/src/ATen/native/quantized/TensorFactories.cpp b/aten/src/ATen/native/quantized/TensorFactories.cpp index 08a972eacc3831..aa0fef5df9dc02 100644 --- a/aten/src/ATen/native/quantized/TensorFactories.cpp +++ b/aten/src/ATen/native/quantized/TensorFactories.cpp @@ -66,6 +66,40 @@ Tensor empty_per_channel_affine_quantized( quantizer); } +Tensor empty_unknown_quantized( + IntArrayRef 
size, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory, + c10::optional optional_memory_format) { + // See [Note: hacky wrapper removal for TensorOptions] + TensorOptions options_ = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); + + TORCH_CHECK( + !(options_.has_memory_format() && optional_memory_format.has_value()), + "Cannot set memory_format both in TensorOptions and explicit argument; please delete " + "the redundant setter."); + auto options = options_.merge_memory_format(optional_memory_format); + TORCH_CHECK( + options.has_dtype(), + "Must provide data type for Tensor creation functions."); + QuantizerPtr quantizer = make_unknown_quantizer(typeMetaToScalarType(options.dtype())); + return new_qtensor(size, options, quantizer); +} + +Tensor empty_strided_unknown_quantized( + IntArrayRef size, + IntArrayRef strided, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory) { + + TORCH_CHECK(false, "empty_strided not supported on quantized tensors yet see https://github.com/pytorch/pytorch/issues/74540") + +} + // Provide better error message if dtype is wrong Tensor empty_affine_quantized_other_backends_stub( IntArrayRef, diff --git a/aten/src/ATen/native/quantized/cpu/conv_packed_params.h b/aten/src/ATen/native/quantized/cpu/conv_packed_params.h deleted file mode 100644 index 130be6a0724dd5..00000000000000 --- a/aten/src/ATen/native/quantized/cpu/conv_packed_params.h +++ /dev/null @@ -1,28 +0,0 @@ -#pragma once - -#include -#include - -template -struct ConvPackedParamsBase : public torch::jit::CustomClassHolder { - virtual at::Tensor apply( - const at::Tensor& input, - double output_scale, - int64_t output_zero_point) = 0; - virtual at::Tensor apply_relu( - const at::Tensor& input, - double output_scale, - int64_t output_zero_point) = 0; - virtual at::Tensor apply_dynamic( - const at::Tensor& input, - bool reduce_range) = 0; - - virtual std::tuple> unpack() = 0; - - virtual torch::List stride() const = 0; - virtual torch::List padding() const = 0; - virtual torch::List output_padding() const = 0; - virtual torch::List dilation() const = 0; - virtual int64_t groups() const = 0; - virtual bool transpose() const = 0; -}; diff --git a/aten/src/ATen/native/quantized/cpu/conv_serialization.h b/aten/src/ATen/native/quantized/cpu/conv_serialization.h index cf5c04977b6a13..369f54b4396147 100644 --- a/aten/src/ATen/native/quantized/cpu/conv_serialization.h +++ b/aten/src/ATen/native/quantized/cpu/conv_serialization.h @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -358,6 +359,20 @@ c10::intrusive_ptr> deserialize_conv( ); } #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + return PackedConvWeightsOnednn::prepack( + weight.value(), + bias, + stride, + padding, + output_padding, + dilation, + groups, + transpose + ); + } +#endif // AT_MKLDNN_ENABLED() TORCH_CHECK( false, "Didn't find engine for when deserializing ConvPackedParams: ", diff --git a/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp b/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp index ab6df06f7b73c3..0a8334b96f7071 100644 --- a/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp +++ b/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp @@ -1,10 +1,10 @@ #include -#include +#include #include #include #include -#include #include +#include #include #include #include @@ -160,9 +160,10 @@ Tensor MakeStridedQTensorCPU( 
allocator->allocate(size_bytes), allocator, /* resizable = */ true); + constexpr auto quantized_cpu_ks = at::DispatchKeySet(at::DispatchKey::QuantizedCPU); auto tensor = detail::make_tensor( storage, - at::DispatchKeySet(at::DispatchKey::QuantizedCPU), + quantized_cpu_ks, dtype, quantizer); get_qtensorimpl(tensor)->set_sizes_and_strides(sizes, strides); @@ -471,6 +472,16 @@ int register_linear_params() { std::move(weight), std::move(bias)); } #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() + if (at::globalContext().qEngine() == at::QEngine::ONEDNN) { + TORCH_CHECK( + weight.scalar_type() == at::kQInt8, + "ONEDNN only supports INT8 bit width currently. Got ", + c10::toString(weight.scalar_type())); + return PackedLinearWeightsOnednn::prepack( + std::move(weight), std::move(bias)); + } +#endif // #if AT_MKLDNN_ENABLED() TORCH_CHECK(false, "Unknown qengine"); }) .def("bias", [](const c10::intrusive_ptr& self) { diff --git a/aten/src/ATen/native/quantized/cpu/fbgemm_utils.h b/aten/src/ATen/native/quantized/cpu/fbgemm_utils.h index 43768658af7e00..c98ef18ec85c60 100644 --- a/aten/src/ATen/native/quantized/cpu/fbgemm_utils.h +++ b/aten/src/ATen/native/quantized/cpu/fbgemm_utils.h @@ -1,9 +1,8 @@ #pragma once #include -#include +#include #include -#include #include #include diff --git a/aten/src/ATen/native/quantized/cpu/onednn_utils.h b/aten/src/ATen/native/quantized/cpu/onednn_utils.h new file mode 100644 index 00000000000000..4ee8e8737fb220 --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/onednn_utils.h @@ -0,0 +1,151 @@ +#pragma once + +#include +#if AT_MKLDNN_ENABLED() +#include +#include +#include +#include + +struct PackedLinearWeightsOnednn : public LinearPackedParamsBase { + PackedLinearWeightsOnednn( + std::unique_ptr weight, + c10::optional bias, + at::Tensor orig_weight, + c10::optional orig_bias) + : weight_(std::move(weight)), + bias_(std::move(bias)), + orig_weight_(std::move(orig_weight)), + orig_bias_(std::move(orig_bias)) {} + std::unique_ptr weight_; + c10::optional bias_; + at::Tensor orig_weight_; + c10::optional orig_bias_; + + at::Tensor apply( + at::Tensor input, + double output_scale, + int64_t output_zero_point) override; + at::Tensor apply_relu( + at::Tensor input, + double output_scale, + int64_t output_zero_point) override; + + at::Tensor apply_dynamic(at::Tensor input, bool reduce_range=false) override; + at::Tensor apply_dynamic_relu(at::Tensor input, bool reduce_range=false) override; + + std::tuple> unpack() override; + + c10::optional bias() override { + return orig_bias_; + } + + static c10::intrusive_ptr prepack( + at::Tensor weight, + c10::optional bias); + + private: + template + at::Tensor apply_impl( + at::Tensor input, + double output_scale, + int64_t output_zero_point); + + template + at::Tensor apply_dynamic_impl(at::Tensor input, bool reduce_range=false); +}; + +template +struct PackedConvWeightsOnednn : public ConvPackedParamsBase { + PackedConvWeightsOnednn( + std::unique_ptr weight, + c10::optional bias, + at::Tensor orig_weight, + c10::optional orig_bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + uint8_t transpose) + : weight_(std::move(weight)), + bias_(std::move(bias)), + orig_weight_(std::move(orig_weight)), + orig_bias_(std::move(orig_bias)), + stride_(std::move(stride)), + padding_(std::move(padding)), + output_padding_(std::move(output_padding)), + dilation_(std::move(dilation)), + groups_(groups), + transpose_(transpose) {} + + std::unique_ptr weight_; + 
c10::optional bias_; + at::Tensor orig_weight_; + c10::optional orig_bias_; + torch::List stride_; + torch::List padding_; + torch::List output_padding_; + torch::List dilation_; + int64_t groups_; + uint8_t transpose_; + + at::Tensor apply( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) override; + + at::Tensor apply_relu( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) override; + + at::Tensor apply_dynamic( + const at::Tensor& input, + bool reduce_range) override; + + std::tuple> unpack() override; + + static c10::intrusive_ptr> prepack( + at::Tensor weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose); + + torch::List stride() const override { + return stride_; + } + + torch::List padding() const override { + return padding_; + } + + torch::List output_padding() const override { + return output_padding_; + } + + torch::List dilation() const override { + return dilation_; + } + + int64_t groups() const override { + return groups_; + } + + bool transpose() const override { + return (bool)transpose_; + } + + private: + template + at::Tensor apply_impl( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point); +}; + +#endif // #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/quantized/cpu/qadd.cpp b/aten/src/ATen/native/quantized/cpu/qadd.cpp index 6aaffff79a22cd..cbca3ba58ef7ef 100644 --- a/aten/src/ATen/native/quantized/cpu/qadd.cpp +++ b/aten/src/ATen/native/quantized/cpu/qadd.cpp @@ -7,10 +7,9 @@ #include #include #include +#include #include -#include - namespace at { namespace native { @@ -217,18 +216,170 @@ Tensor qnnpack_add(Tensor qa, Tensor qb, double scale, int64_t zero_point) { return qy; } -#endif +#endif // USE_PYTORCH_QNNPACK + +#ifdef USE_XNNPACK +C10_ALWAYS_INLINE +enum xnn_status xnnp_create_add_nd( + int8_t azp, + float ascale, + int8_t bzp, + float bscale, + int8_t czp, + float cscale, + int8_t output_min, + int8_t output_max, + uint32_t flags, + xnn_operator_t* op) { + return xnn_create_add_nd_qs8( + azp, /* int8_t input1_zero_point */ + ascale, /* float input1_scale */ + bzp, /* int8_t input2_zero_point */ + bscale, /* float input2_scale */ + czp, /* int8_t output_zero_point */ + cscale, /* float output_scale */ + output_min, /* int8_t output_min */ + output_max, /* int8_t output_max */ + flags, /* uint32_t flags */ + op); /* xnn_operator_t* add_op_out */ +} + +C10_ALWAYS_INLINE +enum xnn_status xnnp_setup_add_nd( + xnn_operator_t op, + const std::vector& a_shape, + const std::vector& b_shape, + const int8_t* da, + const int8_t* db, + int8_t* dc, + pthreadpool_t pt_pool) { + return xnn_setup_add_nd_qs8( + op, /* xnn_operator_t add_op */ + a_shape.size(), /* size_t num_input1_dims */ + a_shape.data(), /* const size_t* input1_shape */ + b_shape.size(), /* size_t num_input2_dims */ + b_shape.data(), /* const size_t* input2_shape */ + da, /* const int8_t* input1 */ + db, /* const int8_t* input2 */ + dc, /* int8_t* output */ + pt_pool); /* pthreadpool_t threadpool */ +} + +template +Tensor xnnp_add(Tensor qa, Tensor qb, double scale, int64_t zero_point) { + using underlying_t = typename scalar_t::underlying; + const string func_name = "xnnp_add()"; + TORCH_CHECK(qa.ndimension() > 0, func_name, ": Got empty input tensor."); + TORCH_CHECK(at::native::xnnpack::available(), func_name, ": XNNPACK is not available") + + // using qa memory format for qb to allow xnnpack kernel 
to flatten all the + // dims + auto qa_mem_format = qa.suggest_memory_format(); + Tensor qa_contig = qa.contiguous(qa_mem_format); + Tensor qb_contig = qb.contiguous(qa_mem_format); + + const auto a_zero_point = qa_contig.q_zero_point(); + const auto b_zero_point = qb_contig.q_zero_point(); + const auto a_scale = qa_contig.q_scale(); + const auto b_scale = qb_contig.q_scale(); + + Tensor qy = at::native::empty_affine_quantized( + at::infer_size_dimvector(qa_contig.sizes(), qb_contig.sizes()), + qa.scalar_type(), + c10::nullopt /* layout */, + kCPU, + c10::nullopt /* pin_memory */, + scale, + zero_point, + qa_mem_format); + + if (qa_contig.size(0) == 0) { + return qy; + } + + xnn_operator_t xnnp_op = nullptr; + xnnpack_operator xnnp_add_operator; + + auto output_max = std::numeric_limits::max(); + auto output_min = std::numeric_limits::min(); + if (ReLUFused) { + /* + * FIXME: use acticationLimits() + * With , MSVC runs into "error C3862: indetifier activationLimits not found". + */ + constexpr int64_t qmin = std::numeric_limits::min(); + constexpr int64_t qmax = std::numeric_limits::max(); + int64_t qvalue = static_cast(zero_point); + qvalue = std::max(qvalue, qmin); + output_min = static_cast(std::min(qvalue, qmax)); + } + + // Create an operator + auto status = xnnp_create_add_nd( + a_zero_point, + a_scale, + b_zero_point, + b_scale, + static_cast(zero_point), + static_cast(scale), + output_min, + output_max, + 0, + &xnnp_op); + xnnp_add_operator = xnnpack_operator(xnnp_op); + TORCH_CHECK( + status == xnn_status_success, + func_name, ": xnn create operator failed(", status,")!"); + + const auto qa_shape = xnnp_utils::get_mem_format_aware_shape(qa_contig); + const auto qb_shape = xnnp_utils::get_mem_format_aware_shape(qb_contig); + + // Setup the operator + status = xnnp_setup_add_nd( + xnnp_add_operator.get(), + qa_shape, + qb_shape, + reinterpret_cast(qa_contig.data_ptr()), + reinterpret_cast(qb_contig.data_ptr()), + reinterpret_cast(qy.data_ptr()), + caffe2::pthreadpool_()); + TORCH_CHECK( + status == xnn_status_success, + func_name, ": xnn setup operator failed(", status,")!"); + + // Run the operator + status = xnn_run_operator( + xnnp_add_operator.get(), /* xnn_operator_t op */ + caffe2::pthreadpool_()); /* pthreadpool_t threadpool */ + TORCH_CHECK( + status == xnn_status_success, + func_name, ": xnn run operator failed(", status,")"); + return qy; +} +#endif // USE_XNNPACK template Tensor qadd(Tensor qa, Tensor qb, double scale, int64_t zero_point) { check_inputs(qa, qb); + + if (at::globalContext().qEngine() == at::QEngine::QNNPACK) { + TORCH_CHECK( + qa.scalar_type() == qb.scalar_type(), + "Both inputs to qadd must have same type"); + +#ifdef USE_XNNPACK + if (qa.scalar_type() == kQInt8) { + return xnnp_add(qa, qb, scale, zero_point); + } +#endif // USE_XNNPACK + #ifdef USE_PYTORCH_QNNPACK - if (at::globalContext().qEngine() == at::QEngine::QNNPACK && - qa.sizes() == qb.sizes() && /* qnnpack does not support boradcasting */ - qa.scalar_type() == kQUInt8 && qb.scalar_type() == kQUInt8) { + if(qa.sizes() == qb.sizes() && /* qnnpack does not support boradcasting */ + qa.scalar_type() == kQUInt8) { return qnnpack_add(qa, qb, scale, zero_point); + } +#endif // USE_PYTORCH_QNNPACK } -#endif auto qc = at::_empty_affine_quantized( qa.sizes(), at::device(kCPU) diff --git a/aten/src/ATen/native/quantized/cpu/qconv.cpp b/aten/src/ATen/native/quantized/cpu/qconv.cpp index 4f8bcd257d5c39..aa77489f74195e 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv.cpp +++ 
b/aten/src/ATen/native/quantized/cpu/qconv.cpp @@ -5,9 +5,12 @@ #include #include #include -#include +#include #include #include +#include +#include +#include #include #include #include @@ -588,22 +591,262 @@ template at::Tensor PackedConvWeight<3>::apply_impl( #ifdef USE_PYTORCH_QNNPACK +#ifdef USE_XNNPACK template -at::Tensor PackedConvWeightsQnnp::apply( - const at::Tensor& input, - double output_scale, - int64_t output_zero_point) { - return apply_impl(input, output_scale, output_zero_point); -} +template +at::Tensor PackedConvWeightsQnnp::apply_impl_xnnp( + const at::Tensor& act, double output_scale, int64_t output_zero_point) { + using underlying_t = typename scalar_t::underlying; -template -at::Tensor PackedConvWeightsQnnp::apply_relu( - const at::Tensor& input, - double output_scale, - int64_t output_zero_point) { - return apply_impl(input, output_scale, output_zero_point); + std::lock_guard lock(qnnp_mutex_); + + const std::string func_name = transpose() + ? "quantized::conv_transpose (xnnpack)" + : "quantized::conv (xnnpack)"; + TORCH_CHECK( + kSpatialDim == 2, + func_name, ": xnnpack does not currently support 3d convolution."); + + /* + * NB: + * [de]conv_prepack prepares weights (values, scale, and zero_points) ahead of + * time during prepack() call assuming the activation will be uint8_t. But it + * may not always be the case. A solution may involve making prepack routine + * aware of the input qdtype. But currently all the pieces are not ready to + * pass that model level info to the prepack function. So, for now, here in + * this function we have to massage weights if we learn the input qdtype is + * not uint8_t. This involves copying and converting uint8_t to int8_t + * whenever necessary. To add to that, since XNNPACK, as of writing this, + * doesn't support per_channel weights for quint8_t, we add following assert + * makes sure we don't run into that case. Also take shortcuts when processing + * weights, which means we have to revisit and fix some weight massging logic + * when we enable the missing feature in XNNPACK. + * + * Table below summarizes how the weights are handled, + * + * .-------------------------------------------------------------------------. + * | input_qdtype | uint8_t | int8_t | + * | per_channel | yes | no | yes | no | + * |-------------------------------------------------------------------------| + * | zero_points | at::zeros()* | orig_zp + 128 | at:zeros()** | orig_zp | + * | scale | dtype = float, no changes needed | + * | values | always processed before passing to XNNPACK | + * .-------------------------------------------------------------------------. + * + * Notes: * - zero_points for uint8_t + per_channel: no support in xnnpack, need + * to fix when support is added. ** - zero_points for int8_t: symmetric + * quantization means XNNPACK will ignore kernel zero point(s). + */ + + if ((std::is_same::value )) { + TORCH_CHECK(!per_channel(), + func_name, ": xnnpack does not currently have per_channel support with activation dtype of c10::quint8." 
+ ); + } + + // More checks + ConvDimChecks( + act.ndimension(), + stride().size(), + padding().size(), + output_padding().size(), + dilation().size(), + func_name, + transpose()); + + const int64_t N = act.size(0); + const int64_t H = act.size(2); + const int64_t W = act.size(3); + const int64_t D = 1; + const int64_t M = bias.size(0); + + const auto act_nhwc = act.contiguous(c10::MemoryFormat::ChannelsLast); + const auto act_input_scale = act_nhwc.q_scale(); + + auto status = xnn_status_invalid_state; + + // Create an operator iff necessary + if (!xnnp_convolution_op || + (!input_scale.has_value() || input_scale.value() != act_input_scale)) { + xnn_operator_t xnnp_op = nullptr; + + // Update the input scale so we may cache the op + input_scale = act_input_scale; + + // create an empty tensor for packing the weights + const at::Tensor weight_contig = + orig_weight.contiguous(c10::MemoryFormat::ChannelsLast); + const float* w_scales_data = w_scales.data_ptr(); + underlying_t w_zp = 0; + at::Tensor weight_tensor; + + if (!per_channel()) { + w_zp = static_cast( + weight_contig.q_zero_point() + + (std::is_same::value ? 128 : 0)); + + weight_tensor = at::native::empty_affine_quantized( + weight_contig.sizes(), + c10::CppTypeToScalarType::value, + c10::nullopt /* layout */, + c10::kCPU, + c10::nullopt /* pin_memory */, + w_scales_data[0], + w_zp, + c10::MemoryFormat::ChannelsLast); + } else { /* per_channel */ + weight_tensor = at::native::empty_per_channel_affine_quantized( + weight_contig.sizes(), + w_scales, + at::zeros(w_scales.sizes(), at::kInt), /* see comment above about w_zp */ + weight_contig.q_per_channel_axis(), + c10::CppTypeToScalarType::value, + c10::nullopt /* layout */, + c10::kCPU, + c10::nullopt /* pin_memory */, + c10::MemoryFormat::ChannelsLast); + } + + // copy from the original weight and take care of dtype change if necessary + at::native::xnnp_utils::q8_copy_int8_weight_and_add_offset( + weight_contig, weight_tensor); + const at::Tensor xnnp_weight = + at::native::xnnp_utils::convert_conv_weights_to_channel_last_tensor< + kSpatialDim>(weight_tensor, groups(), transpose()); + + auto output_min = kReluFused + // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) + ? activationLimits(output_scale, output_zero_point, Activation::RELU).first + : std::numeric_limits::min(); + auto output_max = kReluFused + // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) + ? activationLimits(output_scale, output_zero_point, Activation::RELU).second + : std::numeric_limits::max(); + + + // Original bias was float, so we requantize it here. + at::Tensor qbias; + if (per_channel()) { + auto bias_quant_scales = + weight_contig.q_per_channel_scales() * act_input_scale; + auto bias_zp = at::zeros(bias_quant_scales.sizes(), c10::kInt); + qbias = at::native::quantize_per_channel( + bias, bias_quant_scales, bias_zp, 0, c10::kQInt32); + } else { + qbias = at::native::quantize_per_tensor( + bias, weight_contig.q_scale() * act_input_scale, 0, c10::kQInt32); + } + + status = at::native::xnnp_utils::xnnp_create_convolution2d_nhwc( + padding()[0], + padding()[1], + padding()[0], + padding()[1], + kernel_[0], + kernel_[1], + stride()[0], + stride()[1], + dilation()[0], + dilation()[1], + groups(), + !transpose() ? orig_weight.size(1) : orig_weight.size(0) / groups(), + !transpose() ? orig_weight.size(0) / groups() : orig_weight.size(1), + !transpose() ? orig_weight.size(1) * groups() : orig_weight.size(0), + !transpose() ? 
orig_weight.size(0) : orig_weight.size(1) * groups(), + act_nhwc.q_zero_point(), + act_input_scale, + w_zp, /* will be ignored for Q[SC]8, see comment + above about w_zp*/ + w_scales_data, + reinterpret_cast( + xnnp_weight.template data_ptr()), + reinterpret_cast(qbias.template data_ptr()), + output_zero_point, + output_scale, + output_min, + output_max, + 0, + &xnnp_op, + per_channel(), + transpose()); + + xnnp_convolution_op = xnnpack_operator(xnnp_op); + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn create operator failed(", + status, + ")"); + } + + at::SmallVector output_shape; + const auto input_shape = MakeInputShape(D, H, W); + if (transpose()) { + output_shape = MakeDeConvOutputShape( + N, M, {H, W}, kernel_, stride(), padding(), output_padding(), dilation()); + } else { + output_shape = MakeConvOutputShape( + N, M, input_shape, kernel_, stride(), padding(), dilation()); + } + + if (act_nhwc.numel() > 0) { + TORCH_CHECK( + std::all_of( + output_shape.begin(), + output_shape.end(), + [](int64_t i) { return i > 0; }), + func_name, ": ", kSpatialDim, "d (xnnpack): each dimension of output tensor should be greater than 0.") + } + + // Allocate output Tensor and a buffer for XNNPACK to use + at::Tensor output = at::native::empty_affine_quantized( + output_shape, + c10::CppTypeToScalarType::value, + c10::nullopt /* layout */, + c10::kCPU, + c10::nullopt /* pin_memory */, + output_scale, + output_zero_point, + c10::MemoryFormat::ChannelsLast); + + // Setup the operator + status = at::native::xnnp_utils::xnnp_setup_convolution2d_nhwc( + xnnp_convolution_op.get(), + N, + H, + W, + reinterpret_cast(act_nhwc.template data_ptr()), + reinterpret_cast(output.template data_ptr()), + caffe2::pthreadpool_(), + per_channel(), + transpose(), + output_padding()[0], + output_padding()[1]); + + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn setup operator failed(", + status, + ")"); + + // Run the operator + status = xnn_run_operator( + xnnp_convolution_op.get(), /* xnn_operator_t op */ + caffe2::pthreadpool_()); /* pthreadpool_t threadpool */ + + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn run operator failed(", + status, + ")"); + + return output; } +#endif // USE_XNNPACK + template template at::Tensor PackedConvWeightsQnnp::apply_impl( @@ -622,7 +865,7 @@ at::Tensor PackedConvWeightsQnnp::apply_impl( func_name, "(qnnpack): Expected activation data type ", toString(c10::kQUInt8), - "but got ", + " but got ", toString(act.scalar_type())); ConvDimChecks( act.ndimension(), stride().size(), padding().size(), @@ -820,6 +1063,61 @@ at::Tensor PackedConvWeightsQnnp::apply_impl( return output; } +#ifdef USE_XNNPACK +bool can_use_xnnp( + c10::ScalarType dtype, + int kSpatialDim, + bool per_channel, + bool transpose) { + if (!at::native::xnnpack::available()) { + return false; + } + bool supported_dtypes = dtype == c10::kQInt8; + bool invalid_config = + (kSpatialDim != 2 /* No support for 3d convolution */ + || (dtype == c10::kQInt8 && transpose && + per_channel)); /* int8_t deconv does not support per-channel */ + if (supported_dtypes && invalid_config) { + /* don't want this to fall through to QNNPACK */ + const std::string func_name = + transpose ? 
"quantized::conv_transpose" : "quantized::conv"; + TORCH_CHECK( + false, + func_name, + " (xnnpack): Unsupported conv config for dtype KQInt8"); + } + return supported_dtypes && !invalid_config; +} +#endif // USE_XNNPACK + +template +at::Tensor PackedConvWeightsQnnp::apply( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { +#ifdef USE_XNNPACK + if (can_use_xnnp(input.scalar_type(), kSpatialDim, per_channel(), transpose())) { + return apply_impl_xnnp( + input, output_scale, output_zero_point); + } /* fall through for unsupported types, configs, or shapes */ +#endif // USE_XNNPACK + return apply_impl(input, output_scale, output_zero_point); +} + +template +at::Tensor PackedConvWeightsQnnp::apply_relu( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { +#ifdef USE_XNNPACK + if (can_use_xnnp(input.scalar_type(), kSpatialDim, per_channel(), transpose())) { + return apply_impl_xnnp( + input, output_scale, output_zero_point); + } /* fall through for unsupported types, configs, or shapes */ +#endif // USE_XNNPACK + return apply_impl(input, output_scale, output_zero_point); +} + template at::Tensor PackedConvWeightsQnnp<2>::apply( const at::Tensor& act, double output_scale, @@ -852,6 +1150,177 @@ template at::Tensor PackedConvWeightsQnnp<3>::apply_impl( #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() +template +at::Tensor PackedConvWeightsOnednn::apply( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(input, output_scale, output_zero_point); +} + +template +at::Tensor PackedConvWeightsOnednn::apply_relu( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(input, output_scale, output_zero_point); +} + +template +template +at::Tensor PackedConvWeightsOnednn::apply_impl( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point) { + std::string func_name = "quantized::conv"; + if (transpose()) { + func_name += "_transpose"; + } + func_name += std::to_string(kSpatialDim) + "d"; + if (kReluFused) { + func_name += "_relu"; + } + ConvDimChecks( + act.ndimension(), stride().size(), padding().size(), + output_padding().size(), dilation().size(), func_name, transpose()); + TORCH_CHECK(act.scalar_type() == c10::ScalarType::QUInt8, + func_name, " (ONEDNN): data type of input should be QUint8."); + + // src + auto act_contig = act.contiguous(kSpatialDim == 2 ? c10::MemoryFormat::ChannelsLast : c10::MemoryFormat::ChannelsLast3d); + auto src_dims = act_contig.sizes().vec(); + auto src_data_type = dnnl::memory::data_type::u8; + auto src_desc = ideep::tensor::desc(src_dims, src_data_type, + kSpatialDim == 2 ? ideep::format_tag::nhwc : ideep::format_tag::ndhwc); + ideep::tensor src; + src.init(src_desc, act_contig.data_ptr()); + // weights & bias + ideep::tensor& weights = *(weight_.get()); + bool with_bias = bias_.has_value(); + const auto& kernel_size = weights.get_dims(); + // dst + const std::vector& input_size = src.get_dims(); + std::vector output_sizes; + if (transpose()) { + // Prepacked weight format: [o, i, ...] + const int N = act.size(0); // batch size + const int C = act.size(1); // input channels + const int M = weights.get_dim(0); // output channels + const int D = kSpatialDim == 2 ? 
1 : act.size(2); // input depth + const int H = act.size(kSpatialDim); // input height + const int W = act.size(kSpatialDim + 1); // input width + const int KH = weights.get_dim(kSpatialDim); // kernel height + const int KW = weights.get_dim(kSpatialDim + 1); // kernel width + const int KD = kSpatialDim == 2 ? 1 : weights.get_dim(2); // kernel depth + TORCH_CHECK(C == groups() * weights.get_dim(1), // weight: [o, i, ...] + func_name, " (ONEDNN): input channel number should be ", + groups() * weights.get_dim(1), ", but got ", C); + auto output_shape = MakeDeConvOutputShape( + N, + M, + kSpatialDim == 2 ? std::vector{H, W} : std::vector{D, H, W}, + kSpatialDim == 2 ? std::vector{KH, KW} : std::vector{KD, KH, KW}, + stride(), + padding(), + output_padding(), + dilation()); + output_sizes = c10::IntArrayRef(output_shape).vec(); + } else { + output_sizes = at::native::conv_output_size(input_size, kernel_size, padding().vec(), stride().vec(), dilation().vec()); + } + ideep::dims dst_dims = ideep::dims({output_sizes.cbegin(), output_sizes.cend()}); + at::Tensor output = at::_empty_affine_quantized( + dst_dims, + device(c10::kCPU) + .dtype(c10::kQUInt8) + .memory_format(kSpatialDim == 2 ? + c10::MemoryFormat::ChannelsLast : + c10::MemoryFormat::ChannelsLast3d), + output_scale, + output_zero_point, + c10::nullopt); + if (output.numel() == 0) { + return output; + } + ideep::tensor dst({dst_dims, ideep::tensor::data_type::u8, {output.strides().cbegin(), output.strides().cend()}}, + output.data_ptr()); + // Parameters + const ideep::dims& strides = stride().vec(); + const ideep::dims& dilates = dilation().vec(); + const ideep::dims& padding_l = padding().vec(); + const ideep::dims& padding_r = padding().vec(); + const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/act.q_scale()); // Scales of ONEDNN and PyTorch are reciprocal + const ideep::scale_t& weights_scales = weights.get_scale(); + const ideep::scale_t& dst_scales = ideep::scale_t(weights_scales.size(), 1.0/output_scale); // Scales of ONEDNN and PyTorch are reciprocal + const ideep::zero_point_t src_zero_points = ideep::zero_point_t(1, act.q_zero_point()); + const ideep::zero_point_t dst_zero_points = ideep::zero_point_t(1, output_zero_point); + ideep::attr_t op_attr = kReluFused ? ideep::attr_t::fuse_relu() : ideep::attr_t(); + op_attr.set_zero_points(DNNL_ARG_SRC, ideep::utils::tensor_zp_mask(1), {DNNL_RUNTIME_S32_VAL}); // runtime src zero point + if (with_bias) { + // Bias might be modified outside (e.g. by quantization bias correction). + // If so, update the prepacked bias as well. 
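+      // Comparing the raw data handle with the original bias' data_ptr() detects
+      // an externally swapped bias buffer; if they differ, re-point the prepacked
+      // ideep tensor at the new storage before running the convolution below.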
+ if (bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { + bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); + } + const auto& b = bias_.value(); + if (transpose()) { + ideep::convolution_transpose_forward::compute_v2( + src, weights, b, dst_dims, dst, + strides, padding_l, padding_r, dilates, + groups(), src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, + op_attr, dnnl::algorithm::deconvolution_direct, dnnl::prop_kind::forward_inference, + ideep::u8s8, ideep::engine::cpu_engine()); + } else { + ideep::convolution_forward::compute_v2( + src, weights, b, dst_dims, dst, + strides, dilates, padding_l, padding_r, groups(), + src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, + op_attr, dnnl::algorithm::convolution_direct, dnnl::prop_kind::forward_inference, + ideep::u8s8, ideep::engine::cpu_engine()); + } + } else { + if (transpose()) { + ideep::convolution_transpose_forward::compute_v2( + src, weights, dst_dims, dst, + strides, padding_l, padding_r, dilates, + groups(), src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, + op_attr, dnnl::algorithm::deconvolution_direct, dnnl::prop_kind::forward_inference, + ideep::u8s8, ideep::engine::cpu_engine()); + } else { + ideep::convolution_forward::compute_v2( + src, weights, dst_dims, dst, + strides, dilates, padding_l, padding_r, groups(), + src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, + op_attr, dnnl::algorithm::convolution_direct, dnnl::prop_kind::forward_inference, + ideep::u8s8, ideep::engine::cpu_engine()); + } + } + return output; +} + +template at::Tensor PackedConvWeightsOnednn<2>::apply( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point); + +template at::Tensor PackedConvWeightsOnednn<2>::apply_relu( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point); + +template at::Tensor PackedConvWeightsOnednn<3>::apply( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point); + +template at::Tensor PackedConvWeightsOnednn<3>::apply_relu( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point); + +#endif // #if AT_MKLDNN_ENABLED() + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp b/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp index ec95748cd42ba1..2f3a6ed8f3cdb0 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp +++ b/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp @@ -5,9 +5,10 @@ #include #include #include -#include +#include #include #include +#include #include #include #include @@ -118,6 +119,57 @@ template at::Tensor PackedConvWeightsQnnp<3>::apply_dynamic( #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() + +template +at::Tensor PackedConvWeightsOnednn::apply_dynamic( + const at::Tensor& input, + bool reduce_range) { + + // Find min/max of input + float x_max = 0, x_min = 0; + if (input.numel() > 0) { + x_min = input.min().item(); + x_max = input.max().item(); + } + + // Input tensor is quantized as 8-bit unsigned values + static constexpr int precision = 8; + static constexpr bool is_signed = false; + + // Calculate scale and zero point for quantization of input tensor + auto q_params = quant_utils::ChooseQuantizationParams( + /*min=*/x_min, + /*max=*/x_max, + /*qmin=*/is_signed ? -(1 << (precision - 1)) : 0, + /*qmax=*/ + is_signed ? 
((1 << (precision - 1)) - 1) : (1 << precision) - 1, + /*preserve_sparsity=*/false, + /*force_scale_power_of_two=*/false, + /*reduce_range=*/reduce_range); + + // Quantize input + at::Tensor q_input = at::quantize_per_tensor( + input, q_params.scale, q_params.zero_point, c10::kQUInt8); + + at::Tensor out = + apply_impl(q_input, q_params.scale, q_params.zero_point); + + // TODO: Modify ideep to allow fp32 input & output + // to avoid explicit `quantize - dequantize` + return at::dequantize(out); +} + +template at::Tensor PackedConvWeightsOnednn<2>::apply_dynamic( + const at::Tensor& input, + bool reduce_range); + +template at::Tensor PackedConvWeightsOnednn<3>::apply_dynamic( + const at::Tensor& input, + bool reduce_range); + +#endif // AT_MKLDNN_ENABLED() + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp b/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp index 3cb5d9ef1a18cc..85edffef25b982 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp @@ -2,10 +2,11 @@ #include #include -#include +#include #include #include #include +#include #include #include #include @@ -314,6 +315,165 @@ c10::intrusive_ptr> PackedConvWeightsQnnp< bool transpose); #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() +template +c10::intrusive_ptr> PackedConvWeightsOnednn< + kSpatialDim>:: + prepack( + at::Tensor weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose) { + TORCH_CHECK( + weight.ndimension() == kSpatialDim + 2, + "Weights are expected to have ", kSpatialDim + 2, " dimensions"); + TORCH_CHECK( + stride.size() == kSpatialDim, + "stride should contain ", kSpatialDim, " elements for ", + kSpatialDim, "D convolution."); + TORCH_CHECK( + padding.size() == kSpatialDim, + "Specify front/top/left padding only. " + "end/bottom/right padding assumed to be equal to front/top/left"); + TORCH_CHECK( + !transpose || output_padding.size() == kSpatialDim, + "quantized::conv_prepack: Specify top/left output padding " + "only. bottom/right padding assumed to be equal to top/left"); + TORCH_CHECK( + dilation.size() == kSpatialDim, + "dilation should contain ", kSpatialDim, " elements for ", + kSpatialDim, "D convolution."); + TORCH_CHECK( + !transpose || std::all_of(output_padding.begin(), output_padding.end(), [](int i) { return i==0; }), + "quantized::conv_prepack: ONEDNN only supports zero output_padding."); + + // Weight + // Format: [OC IC//group KH KW] for conv; [IC OC//group KH KW] for deconv + auto dims = weight.sizes().vec(); + auto strides = stride.vec(); + auto padding_l = padding.vec(); + auto padding_r = padding.vec(); + auto dilates = dilation.vec(); + auto op_attr = ideep::attr_t(); + std::vector wgt_zero_points; + ideep::scale_t wgt_scales; + const int output_channels = transpose ? 
weight.size(1) * groups + : weight.size(0); + const auto qtype = weight.qscheme(); + if (qtype == c10::kPerTensorAffine) { + TORCH_CHECK( + weight.q_zero_point()==0, + "quantized::qconv_prepack: ONEDNN only supports symmetric quantization of weight," + " whose zero point must be 0."); + wgt_zero_points = std::vector(1, weight.q_zero_point()); + wgt_scales = ideep::scale_t(1, 1.0/weight.q_scale()); // Scales of ONEDNN and PyTorch are reciprocal + } else if (qtype == c10::kPerChannelAffine) { + TORCH_CHECK( + !transpose, + "Per Channel Quantization is currently disabled for transposed conv"); + wgt_zero_points.resize(output_channels); + wgt_scales.resize(output_channels); + for (int i = 0; i < output_channels; ++i) { + wgt_zero_points[i] = weight.q_per_channel_zero_points()[i].item(); + TORCH_CHECK( + wgt_zero_points[i]==0, + "quantized::qconv_prepack: ONEDNN only supports symmetric quantization of weight," + " whose zero point must be 0."); + wgt_scales[i] = 1.0f / weight.q_per_channel_scales()[i].item(); // Scales of ONEDNN and PyTorch are reciprocal + } + } else { + TORCH_CHECK(false, "Unsupported qscheme: ", toString(qtype)); + } + + // Set runtime src zero point + auto src_zero_point = {DNNL_RUNTIME_S32_VAL}; + op_attr.set_zero_points(DNNL_ARG_SRC, + ideep::utils::tensor_zp_mask(src_zero_point.size()), + src_zero_point); + at::Tensor weight_copy; + ideep::tensor::desc w_desc; + ideep::dims dims_iohw, dims_giohw; + ideep::tag w_tag = ideep::tag::any; + const bool with_groups = groups > 1; + if (transpose) { + w_desc = ideep::convolution_transpose_forward::expected_weights_desc( + dims, dnnl::memory::data_type::s8, + strides, padding_l, padding_r, dilates, groups, + dnnl::algorithm::deconvolution_direct, dnnl::prop_kind::forward_inference, + ideep::dims(), op_attr); + // convolution_transpose_forward::expected_weights_desc() gives format [i, o, ...], + // but ONEDNN requires [o, i, ...] for computation + dims_iohw = w_desc.get_dims(); + dims_giohw = with_groups ? ideep::utils::group_dims(dims_iohw, groups) : dims_iohw; + std::vector perms(dims_giohw.size(), 0); // for permutation of weight + std::iota(perms.begin(), perms.end(), 0); + w_desc = w_desc.transpose(with_groups, with_groups + 1); + std::swap(perms[with_groups], perms[with_groups + 1]); + weight_copy = weight.reshape(dims_giohw).permute(c10::IntArrayRef(perms)).clone(); + } else { + w_desc = ideep::convolution_forward::expected_weights_desc( + dims, dnnl::memory::data_type::s8, + strides, padding_l, padding_r, dilates, groups, + dnnl::algorithm::convolution_direct, dnnl::prop_kind::forward_inference, + dnnl::memory::data_type::u8, ideep::dims(), op_attr); + weight_copy = weight.clone(); + } + if (with_groups) { + w_tag = kSpatialDim == 2 ? ideep::tag::goihw : ideep::tag::goidhw; + } else { + w_tag = kSpatialDim == 2 ? ideep::tag::oihw : ideep::tag::oidhw; + } + ideep::dims w_dims = with_groups ? ideep::utils::group_dims(w_desc.get_dims(), groups) + : w_desc.get_dims(); + ideep::tensor wgt = ideep::tensor( + ideep::tensor::desc({w_dims, dnnl::memory::data_type::s8, w_tag}, groups), + weight_copy.data_ptr()); + wgt.set_scale(wgt_scales); // Scales are needed for feed_from(). 
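+  // Reorder the plain [OC IC KH KW] weight into the blocked layout chosen by
+  // expected_weights_desc(); feed_from() performs the reorder (and handles the
+  // transposed/deconv case). The reordered tensor is what gets cached below.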
+ ideep::tensor exp_wgt; + exp_wgt.init(w_desc); + exp_wgt.set_scale(wgt_scales); // Also for feed_from() + exp_wgt.feed_from(wgt, transpose); // expect wgt to be in [OC IC KH KW] format + ideep::tensor * packed_weight_p = new ideep::tensor(exp_wgt); + packed_weight_p->set_scale(wgt_scales); + packed_weight_p->set_zero_point(wgt_zero_points); + std::unique_ptr weight_ptr(packed_weight_p); + // Bias + c10::optional onednn_bias{c10::nullopt}; + if (bias.has_value()) { + at::Tensor bias_vec = bias.value(); + TORCH_CHECK(bias_vec.dim() == 1, "bias should be a vector (1D Tensor)"); + TORCH_CHECK( + bias_vec.size(0) == output_channels, + "bias should have K elements: " + std::to_string(output_channels)); + auto bias_desc = ideep::tensor::desc(bias.value().sizes().vec(), dnnl::memory::data_type::f32); + ideep::tensor packed_bias; + packed_bias.init(bias_desc, bias.value().data_ptr()); + onednn_bias = c10::optional(packed_bias); + } + auto ret_ptr = c10::make_intrusive>( + PackedConvWeightsOnednn{ + std::move(weight_ptr), + onednn_bias, + weight, + bias, + stride, + padding, + output_padding, + dilation, + groups, + transpose + }); + return ret_ptr; +} + +template struct PackedConvWeightsOnednn<2>; +template struct PackedConvWeightsOnednn<3>; +#endif // #if AT_MKLDNN_ENABLED() + namespace at { namespace native { namespace { @@ -377,6 +537,14 @@ class QConvPackWeightInt8 final { } #endif +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + return PackedConvWeightsOnednn::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, + transpose); + } +#endif + TORCH_CHECK( false, "Didn't find engine for operation quantized::conv2d_prepack ", @@ -438,8 +606,6 @@ class QConv1dPackWeightInt8 final { } #endif - - #ifdef USE_PYTORCH_QNNPACK if (ctx.qEngine() == at::QEngine::QNNPACK) { return PackedConvWeightsQnnp<2>::prepack( @@ -447,6 +613,15 @@ class QConv1dPackWeightInt8 final { transpose); } #endif + +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + return PackedConvWeightsOnednn<2>::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, + transpose); + } +#endif + TORCH_CHECK( false, "Didn't find engine for operation quantized::conv1d_prepack ", diff --git a/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp b/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp new file mode 100644 index 00000000000000..693e093b120949 --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp @@ -0,0 +1,136 @@ +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#ifdef USE_FBGEMM +template +std::tuple> PackedConvWeight< + kSpatialDim>::unpack() { + auto* packed_weights_p = w.get(); + // output channels + const int output_channels = packed_weights_p->outputChannels(); + const int input_channels = packed_weights_p->inputChannels(); + const int groups = packed_weights_p->groups(); + + const int kernel_d = kSpatialDim == 2 ? 1 : kernel[0]; + // R (kernel height) + const int kernel_h = kernel[kSpatialDim - 2]; + // S (kernel width) + const int kernel_w = kernel[kSpatialDim - 1]; + + const int C_per_G = input_channels / groups; + + // Tensor for unpacked weights + // Unpacked format would be physical KRS(C/G) but logical KCRS (channels + // first) because that's how + // ChannelsLast3d is not available now.FBGEMM stores the weights + // TODO: Unify 2d and 3d when ChannelsLast3d is ready. 
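+  // Allocate an empty quantized tensor that matches the weight's qscheme
+  // (per-tensor or per-channel affine); FBGEMM then writes the unpacked
+  // values into it via packed_weights_p->unpack() further down.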
+ at::Tensor unpacked_weights; + if (q_scheme == c10::kPerTensorAffine) { + unpacked_weights = kSpatialDim == 2 + ? at::_empty_affine_quantized( + {output_channels, C_per_G, kernel_h, kernel_w}, + device(c10::kCPU) + .dtype(c10::kQInt8) + .memory_format(c10::MemoryFormat::ChannelsLast), + w_scale[0], + w_zp[0], + c10::nullopt) + : at::native::fbgemm_utils:: + MakeEmptyAffineQuantizedChannelsLast3dTensor( + output_channels, + C_per_G, + kernel_d, + kernel_h, + kernel_w, + device(c10::kCPU).dtype(c10::kQInt8), + w_scale[0], + w_zp[0]); + } else if (q_scheme == c10::kPerChannelAffine) { + TORCH_CHECK( + !transpose(), + "Per Channel Quantization is currently disabled for transposed conv"); + auto scales = at::from_blob( + w_scale.data(), w_scale.size(), device(c10::kCPU).dtype(c10::kFloat)); + auto zero_points = at::from_blob( + w_zp.data(), w_zp.size(), device(c10::kCPU).dtype(c10::kInt)); + unpacked_weights = kSpatialDim == 2 + ? at::_empty_per_channel_affine_quantized( + {output_channels, C_per_G, kernel_h, kernel_w}, + scales.toType(c10::kDouble), + zero_points.toType(c10::kLong), + 0, /* The output channel axis is 0 */ + device(c10::kCPU).dtype(c10::kQInt8), + c10::MemoryFormat::ChannelsLast) + : at::native::fbgemm_utils:: + MakeEmptyPerChannelAffineQuantizedChannelsLast3dTensor( + output_channels, + C_per_G, + kernel_d, + kernel_h, + kernel_w, + device(c10::kCPU).dtype(c10::kQInt8), + scales.toType(c10::kDouble), + zero_points.toType(c10::kLong)); + } else { + TORCH_CHECK(false, "Unsupported qscheme: ", toString(q_scheme)); + } + int8_t* unpacked_weights_p = + reinterpret_cast(unpacked_weights.data_ptr()); + packed_weights_p->unpack(unpacked_weights_p); + if(transpose()){ + unpacked_weights = + at::native::fbgemm_utils::TransposeConvTensorUnpackConversion< + kSpatialDim>(unpacked_weights, groups); + } + return std::tuple>( + unpacked_weights, bias); +} + +template std::tuple> PackedConvWeight< + 2>::unpack(); +template std::tuple> PackedConvWeight< + 3>::unpack(); +#endif // USE_FBGEMM + +#ifdef USE_PYTORCH_QNNPACK +template +std::tuple> PackedConvWeightsQnnp< + kSpatialDim>::unpack() { + TORCH_CHECK( + kSpatialDim == 2, + "QNNPACK only supports conv2d_unpack right " + "now."); + TORCH_CHECK( + orig_weight.defined(), + "Cannot unpack weights. 
" + "Call at::globalContext()::setReleaseOriginalWeights(false) before packing or loading to enable unpacking."); + return std::tuple>(orig_weight, bias); +} + +template std::tuple> PackedConvWeightsQnnp< + 2>::unpack(); +template std::tuple> PackedConvWeightsQnnp< + 3>::unpack(); +#endif // USE_PYTORCH_QNNPACK + +#if AT_MKLDNN_ENABLED() +template +std::tuple> PackedConvWeightsOnednn< + kSpatialDim>::unpack() { + return std::tuple>( + orig_weight_, orig_bias_); +} + +template std::tuple> PackedConvWeightsOnednn< + 2>::unpack(); +template std::tuple> PackedConvWeightsOnednn< + 3>::unpack(); +#endif // #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/quantized/cpu/qlinear.cpp b/aten/src/ATen/native/quantized/cpu/qlinear.cpp index ac055bf74a6e38..d358f23c6af362 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear.cpp @@ -2,8 +2,10 @@ #include #include #include -#include +#include #include +#include +#include #include #include #include @@ -270,6 +272,161 @@ at::Tensor& PackedLinearWeight::apply_relu_out( #endif // USE_FBGEMM #ifdef USE_PYTORCH_QNNPACK + +#ifdef USE_XNNPACK +// TODO: add per_channel support in the future when xnnp supports it +template +at::Tensor PackedLinearWeightsQnnp::apply_impl_xnnp( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { + using underlying_t = typename scalar_t::underlying; + + std::lock_guard lock(qnnp_mutex_); + + const std::string func_name = kReluFused ? "quantized::linear_relu (xnnpack)" + : "quantized::linear (xnnpack)"; + TORCH_CHECK( + input.dim() >= 2, func_name, ": Input tensor rank should be >= 2."); + TORCH_CHECK( + !per_channel(), + func_name, + ": xnnpack does not currently have per_channel support."); + + const auto input_contig = input.contiguous(); + const auto input_scale = input_contig.q_scale(); + + const size_t rows_w = bias_.size(0); + const size_t cols_w = input_contig.size(input_contig.dim() - 1); + + auto status = xnn_status_invalid_state; + + // Create an operator iff not already created + if (!xnnp_linear_op || + (!this->input_scale.has_value() || + this->input_scale.value() != input_scale)) { + // Update the input scale so we may cache the op + this->input_scale = input_scale; + + xnn_operator_t xnnp_op = nullptr; + + const float* weight_scales_data = w_scales.data_ptr(); + + // prepare weights + underlying_t w_zp = static_cast( + orig_weight.q_zero_point() + + (std::is_same::value ? 128 : 0)); + + at::Tensor xnnp_weight = at::_empty_affine_quantized( + orig_weight.sizes(), + c10::CppTypeToScalarType::value, + weight_scales_data[0], + w_zp); + + // copy from the original weight and take care of dtype change if necessary + at::native::xnnp_utils::q8_copy_int8_weight_and_add_offset( + orig_weight, xnnp_weight); + + // Original bias was float, so we requantize it here. + at::Tensor qbias = at::native::quantize_per_tensor( + bias_, orig_weight.q_scale() * input_scale, 0, c10::kQInt32); + + // output limits + auto output_min = kReluFused + // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) + ? activationLimits(output_scale, output_zero_point, Activation::RELU).first + : std::numeric_limits::min(); + auto output_max = kReluFused + // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) + ? 
activationLimits(output_scale, output_zero_point, Activation::RELU).second + : std::numeric_limits::max(); + + // Create an operator + status = at::native::xnnp_utils::xnnp_create_fully_connected_nc( + cols_w, /* input_channels */ + rows_w, /* output_channels */ + cols_w, /* input_stride */ + rows_w, /* output_stride */ + input_contig.q_zero_point(), + input_contig.q_scale(), + w_zp, + weight_scales_data[0], + reinterpret_cast( + xnnp_weight.template data_ptr()), + reinterpret_cast(qbias.data_ptr()), + output_zero_point, + output_scale, + output_min, + output_max, + 0, /* flags */ + &xnnp_op); + xnnp_linear_op = xnnpack_operator(xnnp_op); + + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn create operator failed(", + status, + ")"); + } + + /* + * Allocate output Tensor and a buffer for XNNPACK to use + * The resulting matrix here is 2-D, let's view it with the original + * left hand dimensions of the input. Here are two examples: + * 1. If the input tensor is {M, K}, the output tensor is {M, N}. + * 2. If the input tensor is {b, M, K}, the output tensor is {b, M, N}. + */ + std::vector out_sizes = input.sizes().vec(); + out_sizes.back() = static_cast(rows_w); + at::Tensor output = at::native::empty_affine_quantized( + out_sizes, + c10::CppTypeToScalarType::value, + c10::nullopt /* layout */, + c10::kCPU, + c10::nullopt /* pin_memory */, + output_scale, + output_zero_point, + input.suggest_memory_format()); + + // calculate batch_size + size_t rows_input = 1; + for (const auto i : c10::irange(input_contig.dim() - 1)) { + rows_input *= input_contig.size(i); + } + + // Setup the operator + status = at::native::xnnp_utils::xnnp_setup_fully_connected_nc( + xnnp_linear_op.get(), + rows_input, /* batch_size */ + reinterpret_cast( + input_contig.template data_ptr()), + reinterpret_cast(output.template data_ptr()), + caffe2::pthreadpool_()); + + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn setup operator failed(", + status, + ")"); + + // Run the opeator + status = xnn_run_operator( + xnnp_linear_op.get(), // Linear op + caffe2::pthreadpool_() // threadpool + ); + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn run operator failed(", + status, + ")"); + + return output; +} +#endif // USE_XNNPACK + template at::Tensor PackedLinearWeightsQnnp::apply_impl( at::Tensor input, @@ -414,10 +571,35 @@ at::Tensor PackedLinearWeightsQnnp::apply_impl( return output; } +#ifdef USE_XNNPACK +bool can_use_xnnp(c10::ScalarType dtype, bool per_channel) { + if(!at::native::xnnpack::available()) { + return false; + } + + bool supported_dtypes = dtype == c10::kQInt8; + bool invalid_config = per_channel; /* xnnp does not currently support + per-channel fully connected op */ + if (supported_dtypes && invalid_config) { + /* don't want this to fall through to QNNPACK */ + TORCH_CHECK( + false, + "quantized::linear (xnnpack): Unsupported config for dtype KQInt8"); + } + return supported_dtypes && !invalid_config; +} +#endif // USE_XNNPACK + at::Tensor PackedLinearWeightsQnnp::apply( at::Tensor input, double output_scale, int64_t output_zero_point) { +#ifdef USE_XNNPACK + if (can_use_xnnp(input.scalar_type(), per_channel())) { + return apply_impl_xnnp( + input, output_scale, output_zero_point); + } /* fall through for unsupported types, configs, or shapes */ +#endif // USE_XNNPACK return apply_impl(std::move(input), output_scale, output_zero_point); } @@ -425,11 +607,92 @@ at::Tensor PackedLinearWeightsQnnp::apply_relu( at::Tensor input, double output_scale, 
int64_t output_zero_point) { +#ifdef USE_XNNPACK + if (can_use_xnnp(input.scalar_type(), per_channel())) { + return apply_impl_xnnp( + input, output_scale, output_zero_point); + } /* fall through for unsupported types, configs, or shapes */ +#endif // USE_XNNPACK return apply_impl(std::move(input), output_scale, output_zero_point); } #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() +template +at::Tensor PackedLinearWeightsOnednn::apply_impl( + at::Tensor input, + double output_scale, + int64_t output_zero_point) { + const int64_t dim = input.dim(); + TORCH_CHECK( + dim != 0, + "qlinear (ONEDNN): input dim should be at least 1, but got 0"); + TORCH_CHECK(input.scalar_type() == c10::ScalarType::QUInt8, + "qlinear (ONEDNN): data type of input should be QUint8."); + + auto input_contig = input.expect_contiguous(); + auto& w = *(weight_.get()); + auto K = input.size(dim - 1), M = input.numel() / K, N = w.get_dim(1); + auto input_dims = {M, K}; + auto input_data_type = dnnl::memory::data_type::u8; + auto input_desc = ideep::tensor::desc(input_dims, input_data_type); + ideep::attr_t op_attr = ReluFused ? ideep::attr_t::fuse_relu() : ideep::attr_t(); + ideep::tensor x(input_desc, input_contig->data_ptr()); + auto dst_dims = {M, N}; + const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/input.q_scale()); + const ideep::scale_t& weights_scales = w.get_scale(); + const ideep::scale_t& dst_scales = ideep::scale_t(1, 1.0/output_scale); // Scales of ONEDNN and PyTorch are reciprocal + const ideep::zero_point_t& src_zero_point = ideep::zero_point_t(1, input.q_zero_point()); + const ideep::zero_point_t& dst_zero_point = ideep::zero_point_t(1, output_zero_point); + // Compute: Use ideep::matmul_forward to support asymmetric quantization + // Allocate output Tensor + at::Tensor output = at::_empty_affine_quantized( + dst_dims, + at::device(c10::kCPU).dtype(c10::kQUInt8), + output_scale, + output_zero_point); + if (output.numel() == 0) { + return output; + } + ideep::tensor y({dst_dims, ideep::tensor::data_type::u8, {output.strides().cbegin(), output.strides().cend()}}, + output.data_ptr()); + if (bias_.has_value()) { + // Bias might be modified outside (e.g. by quantization bias correction). + // If so, update the prepacked bias as well. 
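+    // Same data-handle check as in the conv path: refresh the prepacked bias
+    // if its underlying storage was replaced, then pass it to matmul_forward.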
+ if (bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { + bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); + } + const auto& b = bias_.value(); + ideep::matmul_forward::compute_v2(x, w, b, y, 1.0f, 1.0f, src_scales, weights_scales, dst_scales, + src_zero_point, dst_zero_point, op_attr); + } else { + ideep::matmul_forward::compute_v2(x, w, y, 1.0f, 1.0f, src_scales, weights_scales, dst_scales, + src_zero_point, dst_zero_point, op_attr); + } + auto out_sizes = input.sizes().vec(); + out_sizes.back() = N; + if (output.sizes().vec() == out_sizes) + return output; + return output.reshape(out_sizes); +} + +at::Tensor PackedLinearWeightsOnednn::apply( + at::Tensor input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(std::move(input), output_scale, output_zero_point); +} + +at::Tensor PackedLinearWeightsOnednn::apply_relu( + at::Tensor input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(std::move(input), output_scale, output_zero_point); +} + +#endif // #if AT_MKLDNN_ENABLED() + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp b/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp index 676b2f1ce64983..111255726dcf8c 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp @@ -2,8 +2,9 @@ #include #include #include -#include +#include #include +#include #include #include #include @@ -463,6 +464,99 @@ void PackedLinearWeightFp16::set_bias(c10::optional bias) { #endif // USE_FBGEMM +#if AT_MKLDNN_ENABLED() +template +at::Tensor PackedLinearWeightsOnednn::apply_dynamic_impl( + at::Tensor input, + bool reduce_range) { + // Dynamic: fp32 * int8 -> fp32 + using at::Tensor; + + TORCH_CHECK( + input.dim() >= 2, + "The dimension of input tensor should be larger than or equal to 2"); + TORCH_CHECK(input.scalar_type() == c10::ScalarType::Float, + "qlinear_dynamic (ONEDNN): data type of input should be float."); + + // Input -> uint8 + auto input_contig = input.contiguous(); + const int64_t dim = input.dim(); + auto input_reshaped = + dim == 2 ? input : input.reshape({-1, input.size(input.dim() - 1)}); + auto input_dims = input_reshaped.sizes().vec(); + auto input_data_type = dnnl::memory::data_type::f32; + auto input_desc = ideep::tensor::desc(input_dims, input_data_type); + ideep::attr_t op_attr = ReluFused ? 
ideep::attr_t::fuse_relu() : ideep::attr_t(); + ideep::tensor x; + x.init(input_desc, input_contig.data_ptr()); + // Find quantization parameters + float x_max = 0, x_min = 0; + if (input.numel() > 0) { + x_min = input_contig.min().item(); + x_max = input_contig.max().item(); + } + const int precision = 8; + auto q_params = quant_utils::ChooseQuantizationParams( + /*min=*/x_min, + /*max=*/x_max, + /*qmin=*/0, + /*qmax=*/(1 << precision) - 1, + /*preserve_sparsity=*/false, + /*force_scale_power_of_two=*/false, + /*reduce_range=*/reduce_range); + const std::vector& src_zero_point = std::vector(1, q_params.zero_point); + // weights, dst + auto w = *(weight_.get()); + auto dst_dims = {x.get_dim(0), w.get_dim(1)}; + const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/q_params.scale); + const ideep::scale_t& weights_scales = w.get_scale(); + // Compute -> f32 + // Use ideep::matmul_forward instead of ideep::inner_product_forward, + // since the latter does not support asymmetric quantization + // Allocate output Tensor + at::Tensor output = at::empty(dst_dims, input.options().dtype(at::kFloat)); + if (output.numel() == 0) return output; + ideep::tensor y({dst_dims, ideep::tensor::data_type::f32, + {output.strides().cbegin(), output.strides().cend()}}, + output.data_ptr()); + if (bias_.has_value()) { + // Bias might be modified outside (e.g. by quantization bias correction). + // If so, update the prepacked bias as well. + if (bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { + bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); + } + const ideep::tensor b = bias_.value(); + ideep::matmul_forward::compute_v2(x, w, b, y, 1.0f, 1.0f, + src_scales, weights_scales, ideep::scale_t(), + src_zero_point, ideep::zero_point_t(), op_attr); + } else { + ideep::matmul_forward::compute_v2(x, w, y, 1.0f, 1.0f, + src_scales, weights_scales, ideep::scale_t(), + src_zero_point, ideep::zero_point_t(), op_attr); + } + auto out_sizes = input.sizes().vec(); + out_sizes.back() = w.get_dim(1); + if (output.sizes().vec() == out_sizes) + return output; + return output.reshape(out_sizes); +} + +at::Tensor PackedLinearWeightsOnednn::apply_dynamic( + at::Tensor input, + bool reduce_range) { + return apply_dynamic_impl( + std::move(input), reduce_range); +} + +at::Tensor PackedLinearWeightsOnednn::apply_dynamic_relu( + at::Tensor input, + bool reduce_range) { + return apply_dynamic_impl( + std::move(input), reduce_range); +} + +#endif // #if AT_MKLDNN_ENABLED() + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp b/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp index 93c54dc1088904..6ca6905119f49e 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp @@ -1,9 +1,9 @@ #include -#include #include #include -#include +#include #include +#include #include #include #include @@ -194,6 +194,80 @@ c10::intrusive_ptr PackedLinearWeightFp16::prepack( } #endif // USE_FBGEMM +#if AT_MKLDNN_ENABLED() +c10::intrusive_ptr PackedLinearWeightsOnednn::prepack( + at::Tensor weight, + c10::optional bias) { + TORCH_CHECK( + weight.dim() == 2, + "The weight tensor for quantized::linear_prepack (onednn) should" + " be 2-dimensional."); + // Weight + std::vector dims = weight.sizes().vec(); + auto N = weight.size(0); + std::vector wgt_zero_points; + ideep::scale_t wgt_scales; + const auto qtype = weight.qscheme(); + if (qtype == c10::kPerTensorAffine) { + 
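// Simplified sketch (not part of the diff) of what the ChooseQuantizationParams
// call above computes for uint8 dynamic quantization. The real helper also
// handles preserve_sparsity, power-of-two scales and reduce_range; this version
// shows only the core affine mapping from an observed [min, max] range.
#include <algorithm>
#include <cmath>
#include <cstdint>

struct QParams { double scale; int32_t zero_point; };

QParams choose_qparams_u8(float x_min, float x_max) {
  const int32_t qmin = 0, qmax = 255;
  // The representable range must contain 0 so that zero stays exactly representable.
  x_min = std::min(x_min, 0.f);
  x_max = std::max(x_max, 0.f);
  double scale = (static_cast<double>(x_max) - x_min) / (qmax - qmin);
  if (scale == 0.0) scale = 1.0; // degenerate all-zero input
  // Pick the zero point so that real 0.0 maps onto an integer value.
  auto zero_point = static_cast<int32_t>(std::nearbyint(qmin - x_min / scale));
  zero_point = std::min(std::max(zero_point, qmin), qmax);
  return {scale, zero_point};
}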
TORCH_CHECK( + weight.q_zero_point() == 0, + "quantized::linear_prepack: ONEDNN only supports symmetric quantization of weight," + " whose zero point must be 0, but got ", weight.q_zero_point()); + wgt_zero_points = std::vector(1, weight.q_zero_point()); + wgt_scales = ideep::scale_t(1, 1.0/weight.q_scale()); // Scales of ONEDNN and PyTorch are reciprocal + } else if (qtype == c10::kPerChannelAffine) { + wgt_zero_points.resize(N); + wgt_scales.resize(N); + for (int i = 0; i < N; ++i) { + wgt_zero_points[i] = weight.q_per_channel_zero_points()[i].item(); + TORCH_CHECK( + wgt_zero_points[i] == 0, + "quantized::linear_prepack: ONEDNN only supports symmetric quantization of weight," + " whose zero point must be 0, but got ", wgt_zero_points[i], ", at index ", i); + wgt_scales[i] = 1.0f / weight.q_per_channel_scales()[i].item(); // Scales of ONEDNN and PyTorch are reciprocal + } + } else { + TORCH_CHECK(false, "Unsupported qscheme: ", toString(qtype)); + } + + // Prepack weight + auto weight_copy = weight.clone(); + ideep::tensor wgt = ideep::tensor({dims, dnnl::memory::data_type::s8}, weight_copy.data_ptr()); + wgt.transpose_(0, 1); // ONEDNN requires transposed weight + auto w_desc = ideep::matmul_forward::expected_weights_desc(wgt.get_dims(), dnnl::memory::data_type::s8, + dnnl::memory::data_type::u8); + ideep::tensor exp_wgt(w_desc); + exp_wgt.feed_from(wgt); + ideep::tensor * packed_weight_p = new ideep::tensor(exp_wgt); + packed_weight_p->set_scale(wgt_scales); + packed_weight_p->set_zero_point(wgt_zero_points); + std::unique_ptr weight_ptr(packed_weight_p); + // Bias + c10::optional onednn_bias{c10::nullopt}; + if (bias.has_value()) { + auto& b = bias.value(); + auto bias_size = b.sizes().vec(); + bias_size.insert(bias_size.begin(), 1); + TORCH_CHECK( + bias_size[1] == weight_ptr->get_dim(1), + "bias should have N elements: ", + std::to_string(weight_ptr->get_dim(1)), + ", but got ", bias_size[1]); + auto bias_desc = ideep::tensor::desc(bias_size, dnnl::memory::data_type::f32); + ideep::tensor packed_bias; + packed_bias.init(bias_desc, b.data_ptr()); + onednn_bias = c10::optional(packed_bias); + } + auto ret_ptr = c10::make_intrusive( + PackedLinearWeightsOnednn{ + std::move(weight_ptr), + onednn_bias, + weight, + bias}); + return ret_ptr; +} +#endif // #if AT_MKLDNN_ENABLED() + namespace at { namespace native { @@ -224,6 +298,11 @@ class QLinearPackWeightInt8 final { std::move(weight), std::move(bias)); } #endif +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + return PackedLinearWeightsOnednn::prepack(std::move(weight), std::move(bias)); + } +#endif // #if AT_MKLDNN_ENABLED() TORCH_CHECK( false, "Didn't find engine for operation quantized::linear_prepack ", @@ -238,6 +317,9 @@ class QLinearPackWeightFp16 final { c10::optional bias) { auto& ctx = at::globalContext(); #ifdef USE_FBGEMM + // temporarily convert weight back to fp32, needs to be fixed + // after fbgemm fixes the interface for their prepacking op (take fp16 input0 + weight = weight.to(ScalarType::Float); if (ctx.qEngine() == at::QEngine::FBGEMM) { return PackedLinearWeightFp16::prepack( std::move(weight), std::move(bias)); @@ -251,6 +333,14 @@ class QLinearPackWeightFp16 final { "not supported by QNNPACK"); } #endif // USE_PYTORCH_QNNPACK +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + TORCH_CHECK( + false, + "quantized::linear_prepack_fp16 is currently " + "not supported by ONEDNN"); + } +#endif // #if AT_MKLDNN_ENABLED() TORCH_CHECK( false, "Didn't find engine for operation 
quantized::linear_prepack_fp16 ", @@ -261,63 +351,18 @@ class QLinearPackWeightFp16 final { class QLinearPackWeightInt8Legacy final { public: static Tensor run(at::Tensor weight, c10::optional bias) { - auto& ctx = at::globalContext(); - auto options = weight.options(); - -#ifdef USE_FBGEMM - if (ctx.qEngine() == at::QEngine::FBGEMM) { - auto prepacked = - PackedLinearWeight::prepack(std::move(weight), std::move(bias)); - auto wrapped = - std::make_unique>( - std::move(prepacked)); - return cpp_custom_type_hack::create(std::move(wrapped), options); - } -#endif // USE_FBGEMM -#ifdef USE_PYTORCH_QNNPACK - if (ctx.qEngine() == at::QEngine::QNNPACK) { - auto prepacked = - PackedLinearWeightsQnnp::prepack(std::move(weight), std::move(bias)); - auto wrapped = - std::make_unique>( - std::move(prepacked)); - return cpp_custom_type_hack::create(std::move(wrapped), options); - } -#endif // USE_PYTORCH_QNNPACK - TORCH_CHECK( - false, - "Didn't find engine for operation quantized::linear_prepack ", - toString(ctx.qEngine())); + TORCH_CHECK(false, + "This model uses an outdated version of quantized.linear_prepack. " + "Please re-export your model using the newer definitions in torch.jit.quantized"); } }; class QLinearPackWeightFp16Legacy final { public: static Tensor run(at::Tensor weight, c10::optional bias) { - auto& ctx = at::globalContext(); -#ifdef USE_FBGEMM - auto options = weight.options(); - if (ctx.qEngine() == at::QEngine::FBGEMM) { - auto prepacked = - PackedLinearWeightFp16::prepack(std::move(weight), std::move(bias)); - auto wrapped = - std::make_unique>( - std::move(prepacked)); - return cpp_custom_type_hack::create(std::move(wrapped), options); - } -#endif // USE_FBGEMM -#ifdef USE_PYTORCH_QNNPACK - if (ctx.qEngine() == at::QEngine::QNNPACK) { - TORCH_CHECK( - false, - "quantized::linear_prepack_fp16 is currently " - "not supported by QNNPACK"); - } -#endif // USE_PYTORCH_QNNPACK - TORCH_CHECK( - false, - "Didn't find engine for operation quantized::linear_prepack_fp16 ", - toString(ctx.qEngine())); + TORCH_CHECK(false, + "This model uses an outdated version of quantized.linear_prepack_fp16. 
" + "Please re-export your model using the newer definitions in torch.jit.quantized"); } }; diff --git a/aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp b/aten/src/ATen/native/quantized/cpu/qlinear_unpack_impl.cpp similarity index 50% rename from aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp rename to aten/src/ATen/native/quantized/cpu/qlinear_unpack_impl.cpp index 2a34e6748eb433..b7182bf0fa4724 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear_unpack_impl.cpp @@ -1,7 +1,8 @@ #include #include #include -#include +#include +#include #include #include #include @@ -74,78 +75,9 @@ std::tuple> PackedLinearWeightFp16:: } #endif // USE_FBGEMM -namespace at { -namespace native { -namespace { - -class QLinearUnpackWeightInt8 final { - public: - static std::tuple> run( - const c10::intrusive_ptr& packed_weight) { - return packed_weight->unpack(); - } -}; - -class QLinearUnpackWeightFp16 final { - public: - static std::tuple> run( - const c10::intrusive_ptr& packed_weight) { - auto& ctx = at::globalContext(); - - TORCH_CHECK( - ctx.qEngine() != at::QEngine::QNNPACK, - "quantized::linear_unpack_fp16 is currently " - "not supported by QNNPACK"); - - return packed_weight->unpack(); - } -}; - -class QLinearUnpackWeightInt8Legacy final { - public: - static std::tuple> run( - const at::Tensor& packed_weight) { - TORCH_WARN_ONCE( - "quantized.linear_unpack(Tensor) is deprecated! Please " - "upgrade your model to use the newer quantized.linear_" - "unpack(LinearPackedParamsBase) overload"); - return cpp_custom_type_hack::cast< - c10::intrusive_ptr>(packed_weight) - ->unpack(); - } -}; - -class QLinearUnpackWeightFp16Legacy final { - public: - static std::tuple> run( - const at::Tensor& packed_weight) { - TORCH_WARN_ONCE( - "quantized.linear_unpack(Tensor) is deprecated! 
Please " - "upgrade your model to use the newer quantized.linear_" - "unpack(LinearPackedParamsBase) overload"); - auto& ctx = at::globalContext(); - - TORCH_CHECK( - ctx.qEngine() != at::QEngine::QNNPACK, - "quantized::linear_unpack_fp16 is currently " - "not supported by QNNPACK"); - - return cpp_custom_type_hack::cast< - c10::intrusive_ptr>(packed_weight) - ->unpack(); - } -}; - -TORCH_LIBRARY_IMPL(quantized, CPU, m) { - m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack.legacy"), TORCH_FN(QLinearUnpackWeightInt8Legacy::run)); - m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack_fp16.legacy"), TORCH_FN(QLinearUnpackWeightFp16Legacy::run)); -} - -TORCH_LIBRARY_IMPL(quantized, CatchAll, m) { - m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack"), TORCH_FN(QLinearUnpackWeightInt8::run)); - m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack_fp16"), TORCH_FN(QLinearUnpackWeightFp16::run)); +#if AT_MKLDNN_ENABLED() +std::tuple> PackedLinearWeightsOnednn::unpack() { + return std::tuple>( + orig_weight_, orig_bias_); } - -} // namespace -} // namespace native -} // namespace at +#endif // #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/quantized/cpu/qmatmul.cpp b/aten/src/ATen/native/quantized/cpu/qmatmul.cpp index 013966a525103e..e42941fd0a35db 100644 --- a/aten/src/ATen/native/quantized/cpu/qmatmul.cpp +++ b/aten/src/ATen/native/quantized/cpu/qmatmul.cpp @@ -1,6 +1,12 @@ #include #include +#ifdef USE_RUY_QMATMUL +#include +#include +#include +#endif + namespace at { namespace native { @@ -21,6 +27,142 @@ inline void check_inputs(const Tensor& qa, const Tensor& qb) { "Both inputs to Matmul must have the same quantization scheme."); } +#ifdef USE_RUY_QMATMUL + +Tensor qmatmul( + const Tensor& qa, + const Tensor& qb, + const double output_scale, + const int64_t output_zero_point) { + check_inputs(qa, qb); + + const int64_t num_dims = qa.dim(); + const int64_t b_num_dims = qb.dim(); + + TORCH_CHECK( + num_dims == b_num_dims, + "MatMul operands should have the same dimensionality. (", num_dims, + " and ", b_num_dims, " provided)"); + TORCH_CHECK( + num_dims >= 2, + "Quantized Matmul currently only suports operands which are at least 2-dimensional. 
(", + num_dims, " provided)"); + + const int64_t m = qa.size(num_dims - 2); + const int64_t k = qa.size(num_dims - 1); + const int64_t b_k = qb.size(num_dims - 2); + const int64_t n = qb.size(num_dims - 1); + + TORCH_CHECK( + b_k == k, + "For Quantized Matmul, the size of tensor a (", k, + ") at dimension ", num_dims - 1, " must match the size of tensor b (", + b_k, ") at dimension ", num_dims - 2, "."); + + std::vector out_size_vec(num_dims); + size_t num_matmuls = 1; + for (int64_t i = 0; i < num_dims - 2; i++) { + const int64_t dim = qa.size(i); + const int64_t qb_dim = qb.size(i); + + TORCH_CHECK( + dim == qb_dim, + "For Quantized Matmul, the size of tensor a (", dim, + ") must match the size of tensor b (", qb_dim, + ") at dimension ", i); + + out_size_vec[i] = dim; + num_matmuls *= dim; + } + out_size_vec[num_dims - 2] = m; + out_size_vec[num_dims - 1] = n; + + Tensor out = at::_empty_affine_quantized( + IntArrayRef(out_size_vec), + at::device(kCPU) + .dtype(qa.scalar_type()) + .memory_format(qa.suggest_memory_format()), + output_scale, + output_zero_point, + c10::nullopt); + + const Tensor& qa_contig = qa.contiguous(); + const Tensor& qb_contig = qb.contiguous(); + + AT_DISPATCH_QINT_BYTE_TYPES(qa.scalar_type(), "qmatmul", [&] { + using underlying_t = typename scalar_t::underlying; + + const underlying_t* qa_data = reinterpret_cast( + qa_contig.data_ptr()); + const underlying_t* qb_data = reinterpret_cast( + qb_contig.data_ptr()); + underlying_t* out_data = + reinterpret_cast(out.data_ptr()); + + const size_t qa_stride = m * k; + const size_t qb_stride = k * n; + const size_t out_stride = m * n; + + auto matmuls = [&](int64_t begin, int64_t end) { + + ruy::Matrix qa_matrix; + ruy::MakeSimpleLayout( + m, k, ruy::Order::kRowMajor, qa_matrix.mutable_layout()); + qa_matrix.set_zero_point(qa.q_zero_point()); + + ruy::Matrix qb_matrix; + ruy::MakeSimpleLayout( + k, n, ruy::Order::kRowMajor, qb_matrix.mutable_layout()); + qb_matrix.set_zero_point(qb.q_zero_point()); + + ruy::Matrix out_matrix; + ruy::MakeSimpleLayout( + m, n, ruy::Order::kRowMajor, out_matrix.mutable_layout()); + out_matrix.set_zero_point(output_zero_point); + + // Requantization explanation: + // https://github.com/google/gemmlowp/blob/e844ffd17118c1e17d94e1ba4354c075a4577b88/doc/quantization.md + const double requantization_scale_inv = + (qa.q_scale() * qb.q_scale()) / output_scale; + + ruy::MulParams mul_params; + + int multiplier_fixedpoint; + int multiplier_exponent; + ruy_utils::quantize_multiplier(requantization_scale_inv, + &multiplier_fixedpoint, + &multiplier_exponent); + mul_params.set_multiplier_fixedpoint(multiplier_fixedpoint); + mul_params.set_multiplier_exponent(multiplier_exponent); + + const underlying_t* qa_subtensor = qa_data + begin * qa_stride; + const underlying_t* qb_subtensor = qb_data + begin * qb_stride; + underlying_t* out_subtensor = out_data + begin * out_stride; + + for (int64_t i = begin; i < end; i++) { + qa_matrix.set_data(qa_subtensor); + qb_matrix.set_data(qb_subtensor); + out_matrix.set_data(out_subtensor); + ruy::Mul(qa_matrix, + qb_matrix, + mul_params, + ruy_utils::get_ruy_context(), + &out_matrix); + + qa_subtensor += qa_stride; + qb_subtensor += qb_stride; + out_subtensor += out_stride; + } + }; + + at::parallel_for(0, num_matmuls, 1, matmuls); + }); + + return out; +} + +#else // ifdef USE_RUY_QMATMUL + Tensor qmatmul( const Tensor& qa, const Tensor& qb, @@ -34,6 +176,8 @@ Tensor qmatmul( rc, output_scale, output_zero_point, qa.scalar_type()); } +#endif // ifdef USE_RUY_QMATMUL + 
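// Reference sketch (not part of the diff): the per-slice arithmetic the ruy-backed
// qmatmul above performs. ruy stays in integer arithmetic and requantizes the
// int32 accumulator with (qa_scale * qb_scale) / output_scale; this plain scalar
// version shows the same math without the fixed-point machinery. Names are hypothetical.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

std::vector<int8_t> qmatmul_reference(
    const std::vector<int8_t>& a, int32_t a_zp, double a_scale,
    const std::vector<int8_t>& b, int32_t b_zp, double b_scale,
    int64_t m, int64_t k, int64_t n,
    double out_scale, int32_t out_zp) {
  const double requant = (a_scale * b_scale) / out_scale;
  std::vector<int8_t> out(static_cast<size_t>(m * n));
  for (int64_t i = 0; i < m; ++i) {
    for (int64_t j = 0; j < n; ++j) {
      int32_t acc = 0; // int32 accumulation over zero-point-adjusted operands
      for (int64_t p = 0; p < k; ++p) {
        acc += (static_cast<int32_t>(a[i * k + p]) - a_zp) *
               (static_cast<int32_t>(b[p * n + j]) - b_zp);
      }
      int32_t q = static_cast<int32_t>(std::nearbyint(acc * requant)) + out_zp;
      out[i * n + j] = static_cast<int8_t>(std::min(127, std::max(-128, q)));
    }
  }
  return out;
}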
TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) { m.impl(TORCH_SELECTIVE_NAME("quantized::matmul"), TORCH_FN(qmatmul)); } diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake b/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake index 30cc61dc17fb76..4a86d641e41237 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake @@ -10,7 +10,7 @@ project(googletest-download NONE) include(ExternalProject) ExternalProject_Add(googletest - URL https://github.com/google/googletest/archive/release-1.8.0.zip + URL https://github.com/google/googletest/archive/release-1.10.0.zip URL_HASH SHA256=f3ed3b58511efd272eb074a3a6d6fb79d7c2e6a0e374323d1e6bcbcc1ef141bf SOURCE_DIR "${CONFU_DEPENDENCIES_SOURCE_DIR}/googletest" BINARY_DIR "${CONFU_DEPENDENCIES_BINARY_DIR}/googletest" diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake b/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake index 30cc61dc17fb76..4a86d641e41237 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake @@ -10,7 +10,7 @@ project(googletest-download NONE) include(ExternalProject) ExternalProject_Add(googletest - URL https://github.com/google/googletest/archive/release-1.8.0.zip + URL https://github.com/google/googletest/archive/release-1.10.0.zip URL_HASH SHA256=f3ed3b58511efd272eb074a3a6d6fb79d7c2e6a0e374323d1e6bcbcc1ef141bf SOURCE_DIR "${CONFU_DEPENDENCIES_SOURCE_DIR}/googletest" BINARY_DIR "${CONFU_DEPENDENCIES_BINARY_DIR}/googletest" diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack_utils.h b/aten/src/ATen/native/quantized/cpu/qnnpack_utils.h index 1f6d6f1d910561..60ea7822a76056 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack_utils.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack_utils.h @@ -6,8 +6,8 @@ #include #include -#include -#include +#include +#include #include #include @@ -40,6 +40,7 @@ struct PackedLinearWeightsQnnp : public LinearPackedParamsBase { orig_weight(std::move(orig_weight)), bias_(at::native::mobile::allocate_padded_contiguous_if_needed( bias, bias.suggest_memory_format())), + per_channel_(this->orig_weight.qscheme() == at::kPerChannelAffine), input_scale(std::move(input_scale)), w_scales(w_scales), w_zero_points(std::move(w_zps)) {} @@ -47,6 +48,7 @@ struct PackedLinearWeightsQnnp : public LinearPackedParamsBase { std::unique_ptr w; at::Tensor orig_weight; at::Tensor bias_; + bool per_channel_; c10::optional input_scale; at::Tensor w_scales; std::vector w_zero_points; @@ -74,8 +76,23 @@ struct PackedLinearWeightsQnnp : public LinearPackedParamsBase { at::Tensor weight, c10::optional bias); + bool per_channel() const { + return per_channel_; + } + private: std::mutex qnnp_mutex_; + +#ifdef USE_XNNPACK + xnnpack_operator xnnp_linear_op; + + template + at::Tensor apply_impl_xnnp( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point); +#endif // USE_XNNPACK + template at::Tensor apply_impl( at::Tensor input, @@ -112,6 +129,7 @@ struct PackedConvWeightsQnnp : public ConvPackedParamsBase { dilation_(std::move(dilation)), groups_(groups), transpose_(transpose), + is_per_channel_(is_per_channel), input_scale(input_scale), kernel_(std::move(kernel)), w_scales(w_scale), @@ -200,7 +218,7 @@ struct PackedConvWeightsQnnp : 
public ConvPackedParamsBase { convolution->input_padding_height = padding_[kSpatialDim - 2]; convolution->input_padding_width = padding_[kSpatialDim - 1]; convolution->input_padding_depth = kSpatialDim == 3 ? padding_[0] : 0; - convolution->per_channel = is_per_channel; + convolution->per_channel = is_per_channel_; convolution->transpose = transpose_; const uint32_t kr = pytorch_qnnp_params.q8conv.kr; @@ -260,6 +278,9 @@ struct PackedConvWeightsQnnp : public ConvPackedParamsBase { } std::unique_ptr convolution_op; + #ifdef USE_XNNPACK + xnnpack_operator xnnp_convolution_op; + #endif // USE_XNNPACK std::unique_ptr w; at::Tensor orig_weight; at::Tensor bias; @@ -269,6 +290,7 @@ struct PackedConvWeightsQnnp : public ConvPackedParamsBase { torch::List dilation_; int64_t groups_; bool transpose_; + bool is_per_channel_; c10::optional input_scale; std::vector kernel_; at::Tensor w_scales; @@ -326,6 +348,10 @@ struct PackedConvWeightsQnnp : public ConvPackedParamsBase { return transpose_; } + bool per_channel() const { + return is_per_channel_; + } + private: std::mutex qnnp_mutex_; template @@ -333,6 +359,14 @@ struct PackedConvWeightsQnnp : public ConvPackedParamsBase { const at::Tensor& input, double output_scale, int64_t output_zero_point); + +#ifdef USE_XNNPACK + template + at::Tensor apply_impl_xnnp( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point); +#endif // USE_XNNPACK }; enum class Activation : uint8_t { NONE = 0, RELU = 1 }; diff --git a/aten/src/ATen/native/quantized/cpu/qupsample_bilinear2d.cpp b/aten/src/ATen/native/quantized/cpu/qupsample_bilinear2d.cpp index ab30cd7d381010..d9a871a591bfde 100644 --- a/aten/src/ATen/native/quantized/cpu/qupsample_bilinear2d.cpp +++ b/aten/src/ATen/native/quantized/cpu/qupsample_bilinear2d.cpp @@ -178,7 +178,7 @@ using at::native::upsample::get_scale_value; Tensor upsample_bilinear2d_quantized_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, bool align_corners, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); diff --git a/aten/src/ATen/native/quantized/cpu/qupsample_nearest2d.cpp b/aten/src/ATen/native/quantized/cpu/qupsample_nearest2d.cpp index 377ef15790b137..a8cd6abec7e44a 100644 --- a/aten/src/ATen/native/quantized/cpu/qupsample_nearest2d.cpp +++ b/aten/src/ATen/native/quantized/cpu/qupsample_nearest2d.cpp @@ -202,7 +202,7 @@ Tensor _upsample_nearest_exact2d_quantized_cpu( Tensor upsample_nearest2d_quantized_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_h = get_scale_value(scale_factors, 0); @@ -212,7 +212,7 @@ Tensor upsample_nearest2d_quantized_cpu( Tensor _upsample_nearest_exact2d_quantized_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_h = get_scale_value(scale_factors, 0); diff --git a/aten/src/ATen/native/quantized/cpu/qupsample_nearest3d.cpp b/aten/src/ATen/native/quantized/cpu/qupsample_nearest3d.cpp index db4077ef432887..d2e83542133674 100644 --- a/aten/src/ATen/native/quantized/cpu/qupsample_nearest3d.cpp +++ b/aten/src/ATen/native/quantized/cpu/qupsample_nearest3d.cpp @@ -232,7 +232,7 @@ Tensor _upsample_nearest_exact3d_quantized_cpu( Tensor 
upsample_nearest3d_quantized_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_d = get_scale_value(scale_factors, 0); @@ -243,7 +243,7 @@ Tensor upsample_nearest3d_quantized_cpu( Tensor _upsample_nearest_exact3d_quantized_cpu( const Tensor& input, - c10::optional output_size, + at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { auto osize = compute_output_size(input.sizes(), output_size, scale_factors); auto scale_d = get_scale_value(scale_factors, 0); diff --git a/aten/src/ATen/native/quantized/cpu/ruy_utils.cpp b/aten/src/ATen/native/quantized/cpu/ruy_utils.cpp new file mode 100644 index 00000000000000..d0164f7363524e --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/ruy_utils.cpp @@ -0,0 +1,37 @@ +#ifdef USE_RUY_QMATMUL + +#include +#include + +namespace at { +namespace native { +namespace ruy_utils { + +static thread_local ruy::Context context; + +ruy::Context* get_ruy_context() { + return &context; +} + +// Adopted from Ruy: +// https://github.com/google/ruy/blob/2d950b3bfa7ebfbe7a97ecb44b1cc4da5ac1d6f0/ruy/test.h#L1602 +void quantize_multiplier(double scale, + int* multiplier_fixedpoint, + int* multiplier_exponent) { + TORCH_CHECK(scale > 0, "Quantization scale (", scale, ") must be positive."); + const double q = std::frexp(scale, multiplier_exponent); + auto q_fixed = static_cast(std::round(q * (1ll << 31))); + TORCH_CHECK(q_fixed <= (1ll << 31)); + if (q_fixed == (1ll << 31)) { + q_fixed /= 2; + ++*multiplier_exponent; + } + TORCH_CHECK(q_fixed <= std::numeric_limits::max()); + *multiplier_fixedpoint = static_cast(q_fixed); +} + +} // namespace ruy_utils +} // namespace native +} // namesplace + +#endif // USE_RUY_QMATMUL diff --git a/aten/src/ATen/native/quantized/cpu/ruy_utils.h b/aten/src/ATen/native/quantized/cpu/ruy_utils.h new file mode 100644 index 00000000000000..aeb332af4ecae3 --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/ruy_utils.h @@ -0,0 +1,21 @@ +#pragma once + +#ifdef USE_RUY_QMATMUL + +#include + +namespace at { +namespace native { +namespace ruy_utils { + +ruy::Context* get_ruy_context(); + +void quantize_multiplier(double scale, + int* multiplier_fixedpoint, + int* multiplier_exponent); + +} // namespace ruy_utils +} // namespace native +} // namesplace + +#endif // USE_RUY_QMATMUL diff --git a/aten/src/ATen/native/quantized/cpu/xnnpack_utils.cpp b/aten/src/ATen/native/quantized/cpu/xnnpack_utils.cpp new file mode 100644 index 00000000000000..8f81c8ea8d5e81 --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/xnnpack_utils.cpp @@ -0,0 +1,89 @@ +#ifdef USE_XNNPACK + +#include +#include +#include +#include + +namespace at { +namespace native { +namespace xnnp_utils { + +std::vector get_mem_format_aware_shape(const at::Tensor& in) { + const auto mem_format = in.suggest_memory_format(); + const auto& sizes = in.sizes(); + std::vector ret(sizes.begin(), sizes.end()); + if (mem_format == c10::MemoryFormat::ChannelsLast) { + // NCHW -> NHWC + // 0123 -> 0231 + ret[1] = sizes[2]; /* H */ + ret[2] = sizes[3]; /* W */ + ret[3] = sizes[1]; /* C */ + } else if (mem_format == c10::MemoryFormat::ChannelsLast3d) { + // NCDHW -> NDHWC + // 01234 -> 02341 + ret[1] = sizes[2]; /* D */ + ret[2] = sizes[3]; /* H */ + ret[3] = sizes[4]; /* W */ + ret[4] = sizes[1]; /* C */ + } + return ret; +} + +template +void q8_copy_int8_weight_and_add_offset(const at::Tensor& in, at::Tensor& out) { + 
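// Illustrative sketch (not part of the diff): what the quantize_multiplier helper
// above produces and how such a fixed-point multiplier is applied. std::frexp
// splits the requantization scale into a Q0.31 mantissa and a power-of-two
// exponent, so x * scale can be evaluated as (x * fixedpoint) * 2^(exponent - 31)
// in integer-friendly form. This demo only checks the decomposition numerically.
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  const double scale = 0.00437;   // e.g. (qa_scale * qb_scale) / output_scale
  int exponent = 0;
  const double q = std::frexp(scale, &exponent);  // scale = q * 2^exponent, q in [0.5, 1)
  const auto fixedpoint = static_cast<std::int64_t>(std::round(q * (1ll << 31)));
  const std::int64_t x = 12345;                   // an int32 accumulator value
  const double approx =
      static_cast<double>(x * fixedpoint) * std::ldexp(1.0, exponent - 31);
  std::printf("exact %.9f approx %.9f\n", x * scale, approx);
  return 0;
}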
using T = typename PT::underlying; + static constexpr auto offset = std::is_same::value ? 128 : 0; + TORCH_CHECK( + in.scalar_type() == c10::kQInt8, + "q8_copy_int8_weight_and_add_offset: Expected input weight data type ", + toString(c10::kQInt8), + " but got ", + toString(in.scalar_type())) + const int8_t* in_ptr = + reinterpret_cast(in.data_ptr()); + T* out_ptr = reinterpret_cast(out.data_ptr()); + + for (const auto i : c10::irange(in.numel())) { + out_ptr[i] = static_cast(static_cast(in_ptr[i]) + offset); + } +} + +template void q8_copy_int8_weight_and_add_offset( + const at::Tensor& in, + at::Tensor& out); +template void q8_copy_int8_weight_and_add_offset( + const at::Tensor& in, + at::Tensor& out); + +/* + * Stolen from fbgemm_utils::ConvertConvWeightsToChannelLastTensor to avoid + * dependence on USE_FBGEMM. Reorder weights to the format xnnpack expects. + * TODO: add a 3d variant. + */ +template <> +Tensor convert_conv_weights_to_channel_last_tensor<2>( + const at::Tensor& src, + int groups, + bool transpose) { + return transpose ? + // 2D conv transpose weight transform + // IC OC/G KH KW -> G OC/G KH KW IC/G + [&]() { + auto ic_g_oc_g_hw_tensors = src.chunk(groups); + for (auto& tensor : ic_g_oc_g_hw_tensors) { + tensor = tensor.unsqueeze(0); + } + auto fused_tensor = at::cat(ic_g_oc_g_hw_tensors); + set_quantizer_(fused_tensor, src.quantizer()); + return fused_tensor.permute({0, 2, 3, 4, 1}) + .contiguous(c10::MemoryFormat::Contiguous); + }() + // 2d conv weight transform + : src.contiguous(c10::MemoryFormat::ChannelsLast); +} +} // namespace xnnp_utils +} // namespace native +} // namespace at + +#endif // USE_XNNPACK diff --git a/aten/src/ATen/native/quantized/cpu/xnnpack_utils.h b/aten/src/ATen/native/quantized/cpu/xnnpack_utils.h new file mode 100644 index 00000000000000..78f325263f4fc0 --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/xnnpack_utils.h @@ -0,0 +1,279 @@ +#pragma once + +#ifdef USE_XNNPACK +#include + +#include +#include + +using xnnpack_operator = at::native::xnnpack::Operator; + +namespace at { +namespace native { +namespace xnnp_utils { + +/* + * Return shape in the same order as the memory format + * e.g. channels_last will return NHWC instead of NCHW + */ +std::vector get_mem_format_aware_shape(const at::Tensor& in); + +/* + * Input is always int8_t, output can be [int8_t, uint8_t]. + * input + offset = output + * int8_t + 128 = uint8_t + * int8_t + 0 = int8_t + */ +template +void q8_copy_int8_weight_and_add_offset(const at::Tensor& in, at::Tensor& out); + +template +Tensor convert_conv_weights_to_channel_last_tensor( + const at::Tensor& src, + int groups, + bool transpose); + +/* + * Series of create wrapper functions to call xnn_create_[de]conv* functions. 
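// Tiny sketch (not part of the diff): the offset copy q8_copy_int8_weight_and_add_offset
// performs. When the destination weight type is uint8, every signed int8 value is
// shifted by +128 into the unsigned range; when the destination stays int8 the offset is 0.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> int8_weights_to_uint8(const std::vector<int8_t>& w) {
  std::vector<uint8_t> out(w.size());
  for (size_t i = 0; i < w.size(); ++i) {
    out[i] = static_cast<uint8_t>(static_cast<int16_t>(w[i]) + 128); // -128..127 -> 0..255
  }
  return out;
}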
+ */ +C10_ALWAYS_INLINE +enum xnn_status xnnp_create_convolution2d_nhwc( + uint32_t pad_top, + uint32_t pad_right, + uint32_t pad_bottom, + uint32_t pad_left, + uint32_t kernel_h, + uint32_t kernel_w, + uint32_t stride_h, + uint32_t stride_w, + uint32_t dilation_h, + uint32_t dilation_w, + uint32_t groups, + size_t group_input_channels, + size_t group_output_channels, + size_t ip_chan_stride, + size_t op_chan_stride, + int8_t izp, + float ip_scale, + int8_t kzp, + const float* k_scales, + const int8_t* kernel, + const int32_t* bias, + int8_t ozp, + float op_scale, + int8_t op_min, + int8_t op_max, + uint32_t flags, + xnn_operator_t* op, + bool per_channel, + bool transpose) { + /* Symmetric quantization forces kzp = 0 */ + TORCH_CHECK(!kzp, "XNNPACK Q[SC]8 conv kernels expects kernel zero point to be zero." + "But got: ", kzp); + + if (transpose) { + TORCH_CHECK(!per_channel, "XNNPACK Q[SC]8 does not have a per channel deconvolution!"); + return xnn_create_deconvolution2d_nhwc_qs8( + pad_top, /* uint32_t output_padding_top */ + pad_right, /* uint32_t output_padding_right */ + pad_bottom, /* uint32_t output_padding_bottom */ + pad_left, /* uint32_t output_padding_left */ + kernel_h, /* uint32_t kernel_height */ + kernel_w, /* uint32_t kernel_width */ + stride_h, /* uint32_t stride_height */ + stride_w, /* uint32_t stride_width */ + dilation_h, /* uint32_t dilation_height */ + dilation_w, /* uint32_t dilation_width */ + groups, /* uint32_t groups */ + group_input_channels, /* size_t group_input_channels */ + group_output_channels, /* size_t group_output_channels */ + ip_chan_stride, /* size_t input_pixel_stride */ + op_chan_stride, /* size_t output_pixel_stride */ + izp, /* int8_t input_zero_point */ + ip_scale, /* float input_scale */ + k_scales[0], /* float kernel_scale */ + kernel, /* const int8_t* kernel */ + bias, /* const int32_t* bias */ + ozp, /* int8_t output_zero_point */ + op_scale, /* float output_scale */ + op_min, /* int8_t output_min */ + op_max, /* int8_t output_max */ + flags, /* uint32_t flags */ + op); /* xnn_operator_t* deconvolution_op_out */ + + } + + if (!per_channel) { + return xnn_create_convolution2d_nhwc_qs8( + pad_top, /* uint32_t input_padding_top */ + pad_right, /* uint32_t input_padding_right */ + pad_bottom, /* uint32_t input_padding_bottom */ + pad_left, /* uint32_t input_padding_left */ + kernel_h, /* uint32_t kernel_height */ + kernel_w, /* uint32_t kernel_width */ + stride_h, /* uint32_t subsampling_height */ + stride_w, /* uint32_t subsampling_width */ + dilation_h, /* uint32_t dilation_height */ + dilation_w, /* uint32_t dilation_width */ + groups, /* uint32_t groups */ + group_input_channels, /* size_t group_input_channels */ + group_output_channels, /* size_t group_output_channels*/ + ip_chan_stride, /* size_t input_channel_stride */ + op_chan_stride, /* size_t output_channel_stride */ + izp, /* int8_t input_zero_point */ + ip_scale, /* float input_scale */ + k_scales[0], /* float kernel_scale */ + kernel, /* const int8_t* kernel */ + bias, /* const int32_t* bias */ + ozp, /* int8_t output_zero_point */ + op_scale, /* float output_scale */ + op_min, /* int8_t output_min */ + op_max, /* int8_t output_max */ + flags, /* uint32_t flags */ + op); /* xnn_operator_t* convolution_op_out */ + } else { /* per_channel */ + return xnn_create_convolution2d_nhwc_qc8( + pad_top, /* uint32_t input_padding_top */ + pad_right, /* uint32_t input_padding_right */ + pad_bottom, /* uint32_t input_padding_bottom */ + pad_left, /* uint32_t input_padding_left */ + kernel_h, 
/* uint32_t kernel_height */ + kernel_w, /* uint32_t kernel_width */ + stride_h, /* uint32_t subsampling_height */ + stride_w, /* uint32_t subsampling_width */ + dilation_h, /* uint32_t dilation_height */ + dilation_w, /* uint32_t dilation_width */ + groups, /* uint32_t groups */ + group_input_channels, /* size_t group_input_channels */ + group_output_channels, /* size_t group_output_channels*/ + ip_chan_stride, /* size_t input_channel_stride */ + op_chan_stride, /* size_t output_channel_stride */ + izp, /* int8_t input_zero_point */ + ip_scale, /* float input_scale */ + k_scales, /* const float* kernel_scale */ + kernel, /* const int8_t* kernel */ + bias, /* const int32_t* bias */ + ozp, /* int8_t output_zero_point */ + op_scale, /* float output_scale */ + op_min, /* int8_t output_min */ + op_max, /* int8_t output_max */ + flags, /* uint32_t flags */ + op); /* xnn_operator_t* convolution_op_out */ + } +} + +/* + * Series of setup wrapper functions to call xnn_setup_[de]conv* functions. + */ +C10_ALWAYS_INLINE +enum xnn_status xnnp_setup_convolution2d_nhwc( + xnn_operator_t op, + size_t batch, + size_t in_h, + size_t in_w, + const int8_t* inp, + int8_t* outp, + pthreadpool_t pt_pool, + bool per_channel = false, + bool transpose = false, + uint32_t adj_h = 0, + uint32_t adj_w = 0) { + if(transpose) { + TORCH_CHECK(!per_channel, "XNNPACK Q[SC]8 does not have a per channel deconvolution!"); + return xnn_setup_deconvolution2d_nhwc_qs8( + op, /* xnn_operator_t deconvolution_op */ + batch, /* size_t batch_size */ + in_h, /* size_t input_height */ + in_w, /* size_t input_width */ + adj_h, /* uint32_t adjustment_height */ + adj_w, /* uint32_t adjustment_width */ + inp, /* const int8_t* input */ + outp, /* int8_t* output */ + pt_pool); /* pthreadpool_t threadpool */ + } + + if (!per_channel) { + return xnn_setup_convolution2d_nhwc_qs8( + op, /* xnn_operator_t convolution_op */ + batch, /* size_t batch_size */ + in_h, /* size_t input_height */ + in_w, /* size_t input_width */ + inp, /* const int8_t* input */ + outp, /* int8_t* output */ + pt_pool); /* pthreadpool_t threadpool */ + } else { /* per_channel */ + return xnn_setup_convolution2d_nhwc_qc8( + op, /* xnn_operator_t convolution_op */ + batch, /* size_t batch_size */ + in_h, /* size_t input_height */ + in_w, /* size_t input_width */ + inp, /* const int8_t* input */ + outp, /* int8_t* output */ + pt_pool); /* pthreadpool_t threadpool */ + } +} + + +/* + * Series of wrapper functions to call xnn_create* and xnn_setup* + * functions for linear + */ +C10_ALWAYS_INLINE +enum xnn_status xnnp_create_fully_connected_nc( + size_t input_channels, + size_t output_channels, + size_t input_stride, + size_t output_stride, + int8_t input_zero_point, + float input_scale, + int8_t kernel_zero_point, + float kernel_scale, + const int8_t* kernel, + const int32_t* bias, + int8_t output_zero_point, + float output_scale, + int8_t output_min, + int8_t output_max, + uint32_t flags, + xnn_operator_t* fully_connected_op_out) { + /* Symmetric quantization forces kzp = 0 */ + TORCH_CHECK(!kernel_zero_point, "XNNPACK QS8 linear kernel expects kernel zero point to be zero." 
+ "But got: ", kernel_zero_point); + return xnn_create_fully_connected_nc_qs8( + input_channels, /* size_t input_channels */ + output_channels, /* size_t output_channels */ + input_stride, /* size_t input_stride */ + output_stride, /* size_t output_stride */ + input_zero_point, /* int8_t input_zero_point */ + input_scale, /* float input_scale */ + kernel_scale, /* float kernel_scale */ + kernel, /* const int8_t* kernel */ + bias, /* const int32_t* bias */ + output_zero_point, /* int8_t output_zero_point */ + output_scale, /* float output_scale */ + output_min, /* int8_t output_min */ + output_max, /* int8_t output_max */ + flags, /* uint32_t flags */ + fully_connected_op_out); /* xnn_operator_t* fully_connected_op_out */ +} + +C10_ALWAYS_INLINE +enum xnn_status xnnp_setup_fully_connected_nc( + xnn_operator_t fully_connected_op, + size_t batch_size, + const int8_t* input, + int8_t* output, + pthreadpool_t threadpool) { + return xnn_setup_fully_connected_nc_qs8( + fully_connected_op, /* xnn_operator_t fully_connected_op */ + batch_size, /* size_t batch_size */ + input, /* const int8_t* input */ + output, /* int8_t* output */ + threadpool); /* pthreadpool_t threadpool */ +} + +} // namespace xnnp_utils +} // namespace native +} // namespace at + +#endif // USE_XNNPACK diff --git a/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp b/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp new file mode 100644 index 00000000000000..e81814d28e1581 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp @@ -0,0 +1,224 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() +#include +#if HAS_CUDNN_V8() + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace at { +namespace native { +namespace { +// FIXME: make this thread-safe by reusing the benchmark cache in Conv_v7.cpp +namespace { +struct CacheKey { + uint8_t input_a_alignment; + uint8_t input_b_alignment; + uint8_t output_alignment; + bool kReluFused; +}; +std::unordered_map, at::native::ParamsEqual> execution_plan_cache; +} + +// TODO: this is also in qadd.cpp and some other cpp files in quantized/cpu/. I think we should +// move everything into a utilities file in quantized/ directory later. +inline void check_inputs(const Tensor& qa, const Tensor& qb) { + TORCH_CHECK( + qa.qscheme() == kPerTensorAffine, + "Only per tensor quantization is suported in Add."); + TORCH_CHECK( + qa.qscheme() == qb.qscheme(), + "Both inputs to Add must have the same quantization shceme."); + TORCH_CHECK( + qa.scalar_type() == qb.scalar_type(), + "Add operands should have same data type."); +} + +// currently we only support int8 symmetric (zero_point = 0 for inputs and output) quantized add +// We implement relu ( (a_int8 + b_int8 * ( b_scale/a_scale) ) ) * ( a_scale / out_scale ) +// which requires 4 cudnn ops (2 multiplication, 1 addition, and 1 relu ops) +// Multiplication ops: rhs_mult_op, requant_op +// Addition op: add_op +// Relu op: relu_op +template +Tensor add(Tensor qa, Tensor qb, double output_scale, int64_t output_zero_point) { + if (qa.numel() == 0) { + return Tensor{}; + } + // TODO: add shape checking when broadcasted add is supported. For now we assume the input tensors are the same shape + TORCH_CHECK(qa.sizes() == qb.sizes(), "Quantized cudnn add currently expects both input tensors to be the same shape"); + + check_inputs(qa, qb); + + // cudnn expects tensors to be at least 3D. 
So we will prepend dummy dimensions if the input tensors are not at least 3D + auto orig_sizes = qa.sizes().vec(); + if (qa.dim() < 3) { + std::vector new_sizes(3, 1); + // cudnn expects leading dimensions to be the dummy dimensions + new_sizes.back() = qa.sizes().back(); + if (qa.dim() == 2) { + new_sizes[1] = qa.size(0); + } + qa = qa.view(new_sizes); + qb = qb.view(new_sizes); + } + + at::Tensor add_output = at::empty(qa.sizes(), at::device(at::kCUDA).dtype(at::kFloat)); + at::Tensor quantized_output = at::_empty_affine_quantized( + qa.sizes(), + at::device(at::kCUDA).dtype(at::ScalarType::QInt8), + output_scale, + output_zero_point); + // TODO: When cudnn enables support for broadcasting, we can remove this tensor + at::Tensor requantize_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat)); + requantize_multiplier_tensor.fill_(qa.q_scale() / output_scale); + at::Tensor rhs_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat)); + rhs_multiplier_tensor.fill_(qb.q_scale() / qa.q_scale()); + + cudnnHandle_t handle = at::native::getCudnnHandle(); + CacheKey key; + bool deterministic{true}; + bool allow_tf32{false}; + key.kReluFused = kReluFused; + key.input_a_alignment = cudnn_utils::getAlignment(qa); + key.input_b_alignment = cudnn_utils::getAlignment(qb); + key.output_alignment = cudnn_utils::getAlignment(add_output); + + auto run = [&](cudnn_frontend::ManagedOpaqueDescriptor plan_desc) { + auto workspace_size = 0; + auto workspace = at::empty({workspace_size}, qa.options().dtype(at::kByte)); + std::vector data_ptrs; + std::vector uids; + data_ptrs.reserve(8); + uids.reserve(8); + data_ptrs = {reinterpret_cast(qb.data_ptr()), rhs_multiplier_tensor.data_ptr(), add_output.data_ptr(), + reinterpret_cast(qa.data_ptr()), add_output.data_ptr(), requantize_multiplier_tensor.data_ptr(), + reinterpret_cast(quantized_output.data_ptr())}; + uids = {'b', 'm', 'c', 'a', 'p', 'r', 'q'}; + if (kReluFused) { + data_ptrs.emplace_back(add_output.data_ptr()), + uids.emplace_back('f'); + } + + auto variantPack = cudnn_frontend::VariantPackBuilder() + .setWorkspacePointer(workspace.data_ptr()) + .setDataPointers(uids.size(), data_ptrs.data()) + .setUids(uids.size(), uids.data()) + .build(); + auto variant_pack_desc = variantPack.get_raw_desc(); + AT_CUDNN_CHECK(cudnnBackendExecute(handle, plan_desc->get_backend_descriptor(), variant_pack_desc)); + }; + + auto search = execution_plan_cache.find(key); + if (search != execution_plan_cache.end()) { + cudnn_frontend::ManagedOpaqueDescriptor plan_desc = search->second; + run(plan_desc); + return quantized_output.view(orig_sizes); + } + + // computes qb_int8 * ( qb_scale/qa_scale ) + auto rhs_mult_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(cudnn_utils::getTensorDescriptor(qb.sizes(), qb.strides(), CUDNN_DATA_INT8, 'b', key.input_b_alignment)) + .setbDesc(cudnn_utils::getTensorDescriptor(rhs_multiplier_tensor, 'm', cudnn_utils::getAlignment(rhs_multiplier_tensor))) + .setyDesc(cudnn_utils::getTensorDescriptor(add_output, 'c', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseMulDescriptor(at::native::getCudnnDataType(add_output))) + .build(); + + // add_op computes (qa_int8 + qb_int8 * ( qb_scale/qa_scale ) ) + // add_output is a fp32 tensor for accumulation purposes + auto add_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(rhs_mult_op.getOutputTensor()) + 
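// Reference sketch (not part of the diff): the per-element math that the cuDNN
// graph built here (rhs_mult_op -> add_op -> optional relu_op -> requant_op)
// evaluates for this symmetric int8 add, where all zero points are 0:
//   out = clamp(round(relu(a + b * (b_scale / a_scale)) * (a_scale / out_scale)))
// Names are hypothetical.
#include <algorithm>
#include <cmath>
#include <cstdint>

int8_t qadd_reference(int8_t a, double a_scale,
                      int8_t b, double b_scale,
                      double out_scale, bool relu_fused) {
  double acc = a + b * (b_scale / a_scale);   // accumulate in "a" units (the fp32 tensor in the graph)
  if (relu_fused) acc = std::max(acc, 0.0);
  double q = std::nearbyint(acc * (a_scale / out_scale)); // requantize to the output scale
  return static_cast<int8_t>(std::min(127.0, std::max(-128.0, q)));
}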
.setbDesc(cudnn_utils::getTensorDescriptor(qa.sizes(), qa.strides(), CUDNN_DATA_INT8, 'a', key.input_a_alignment)) + .setyDesc(cudnn_utils::getTensorDescriptor(add_output, 'p', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseAddDescriptor(at::native::getCudnnDataType(add_output))) + .build(); + + // relu_op computes + // relu( (qa_int8 + qb_int8 * ( qb_scale/qa_scale ) ) ) + // output is a fp32 tensor + c10::optional relu_op; + if (kReluFused) { + // we use inplace operation here where the output is assigned to the input + relu_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(add_op.getOutputTensor()) + .setyDesc(cudnn_utils::getTensorDescriptor(add_output, 'f', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseReluDescriptor(at::native::getCudnnDataType(add_output))) + .build()); + } + + // requant_op computes + // (a_int8 + b_int8 * ( b_scale/a_scale) ) * a_scale / out_scale + auto requant_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(kReluFused ? relu_op.value().getOutputTensor() : add_op.getOutputTensor()) + .setbDesc(cudnn_utils::getTensorDescriptor(requantize_multiplier_tensor, 'r', cudnn_utils::getAlignment(requantize_multiplier_tensor))) + .setyDesc(cudnn_utils::getTensorDescriptor(quantized_output.sizes(), quantized_output.strides(), CUDNN_DATA_INT8, 'q', cudnn_utils::getAlignment(quantized_output))) + .setpwDesc(cudnn_utils::getPointWiseMulDescriptor(at::native::getCudnnDataType(requantize_multiplier_tensor))) + .build(); + + std::vector ops{&rhs_mult_op, &add_op}; + if (kReluFused) { + ops.emplace_back(&(relu_op.value())); + } + ops.emplace_back(&requant_op); + + auto opGraph = cudnn_frontend::OperationGraphBuilder() + .setHandle(handle) + .setOperationGraph(ops.size(), ops.data()) + .build(); + // std::cout << "opGraph: " << opGraph.describe() << std::endl; + + auto heuristics = cudnn_frontend::EngineHeuristicsBuilder() + .setOperationGraph(opGraph) + .setHeurMode(CUDNN_HEUR_MODE_INSTANT) + .build(); + auto fallback = cudnn_frontend::EngineFallbackListBuilder() + .setOperationGraph(opGraph) + .setOperation(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .build(); + + auto& engine_configs = heuristics.getEngineConfig(heuristics.getEngineConfigCount()); + auto& fallback_list = fallback.getFallbackList(); + + cudnn_frontend::EngineConfigList filtered_configs; + cudnn_utils::filterEngineConfigs(engine_configs, filtered_configs, deterministic, allow_tf32, at::kChar); + cudnn_utils::filterEngineConfigs(fallback_list, filtered_configs, deterministic, allow_tf32, at::kChar); + for (auto &cfg : engine_configs) { + try { + auto plan = cudnn_frontend::ExecutionPlanBuilder() + .setHandle(handle) + .setEngineConfig(cfg) + .build(); + auto plan_desc = plan.get_desc(); + run(plan_desc); + execution_plan_cache[key] = plan_desc; + return quantized_output.view(orig_sizes); + } catch (cudnn_frontend::cudnnException &e) {std::cout << "cudnn error:" << e.what() << std::endl;} catch(c10::CuDNNError &e) { std::cout << "other error" << e.what() << std::endl;} + } + + TORCH_CHECK(false, "Unable to find an engine to execute this computation"); +} + +TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::add"), TORCH_FN(add)); + m.impl(TORCH_SELECTIVE_NAME("quantized::add_relu"), TORCH_FN(add)); +} + +} // namespace +} // namespace native +} // namespace at + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git 
a/aten/src/ATen/native/quantized/cudnn/Conv.cpp b/aten/src/ATen/native/quantized/cudnn/Conv.cpp index a96e6d571261ce..abd555557ffe60 100644 --- a/aten/src/ATen/native/quantized/cudnn/Conv.cpp +++ b/aten/src/ATen/native/quantized/cudnn/Conv.cpp @@ -8,57 +8,25 @@ #if HAS_CUDNN_V8() -#include #include -#include #include +#include #include #include +#include +#include #include -#include #include +#include #include -#include #include +#include +#include -namespace at { namespace native{ - -namespace { - -uint8_t getAlignment(const Tensor &t) { - // alignment are in bytes - uint8_t alignment = 1; - uintptr_t address = reinterpret_cast(t.data_ptr()); - while (address % alignment == 0 && alignment < 16) alignment *= 2; - return alignment; -} - -cudnn_frontend::Tensor getTensorDescriptor(const Tensor &t, int64_t id, uint8_t alignment) { - auto shape = t.sizes(); - auto strides = t.strides(); - return cudnn_frontend::TensorBuilder() - .setDim(shape.size(), shape.data()) - .setStrides(strides.size(), strides.data()) - .setId(id) - .setAlignment(alignment) - .setDataType(getCudnnDataType(t)) - .build(); -} - -cudnn_frontend::Tensor getTensorDescriptor(const IntArrayRef& shape, const IntArrayRef& strides, cudnnDataType_t cudnn_dtype, int64_t id, uint8_t alignment) { - return cudnn_frontend::TensorBuilder() - .setDim(shape.size(), shape.data()) - .setStrides(strides.size(), strides.data()) - .setId(id) - .setAlignment(alignment) - .setDataType(cudnn_dtype) - .build(); -} - -// TODO: there is a table from input dtype and weight dtype to operator dtype, +// TODO: there is a table from input dtype and weight dtype to operator qdtype, // we can derive the operator dtype based on input dtype -cudnn_frontend::ConvDesc_v8 getConvDescriptor(cudnnDataType_t dataType, IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation) { +cudnn_frontend::ConvDesc_v8 getConvDescriptor(cudnnDataType_t dataType, c10::IntArrayRef padding, c10::IntArrayRef stride, c10::IntArrayRef dilation) { uint64_t convDim = stride.size(); return cudnn_frontend::ConvDescBuilder() .setDataType(dataType) @@ -71,99 +39,18 @@ cudnn_frontend::ConvDesc_v8 getConvDescriptor(cudnnDataType_t dataType, IntArray .build(); } -// TODO: there is a table from input dtype to operator dtype, we can derive -// the operator dtype based on input dtype -cudnn_frontend::PointWiseDesc_v8 getPointWiseMulDescriptor(cudnnDataType_t dataType) { - return cudnn_frontend::PointWiseDescBuilder() - .setMode(cudnnPointwiseMode_t::CUDNN_POINTWISE_MUL) - .setMathPrecision(dataType) - .build(); -} - -// TODO: there is a table from input dtype to operator dtype, we can derive -// the operator dtype based on input dtype -cudnn_frontend::PointWiseDesc_v8 getPointWiseAddDescriptor(cudnnDataType_t dataType) { - return cudnn_frontend::PointWiseDescBuilder() - .setMode(cudnnPointwiseMode_t::CUDNN_POINTWISE_ADD) - .setMathPrecision(dataType) - .build(); -} - -// TODO: there is a table from input dtype to operator dtype, we can derive -// the operator dtype based on input dtype -cudnn_frontend::PointWiseDesc_v8 getPointWiseReluDescriptor(cudnnDataType_t dataType) { - return cudnn_frontend::PointWiseDescBuilder() - .setMode(cudnnPointwiseMode_t::CUDNN_POINTWISE_RELU_FWD) - .setMathPrecision(dataType) - .build(); -} - -void filterEngineConfigs( - cudnn_frontend::EngineConfigList &from, - cudnn_frontend::EngineConfigList &to, - bool deterministic, bool allow_tf32, c10::ScalarType scalar_type) -{ - auto filter = [=](cudnnBackendDescriptor_t c) { - if (deterministic) { - if 
(cudnn_frontend::hasNumericalNote(c)) return true; - } - if (scalar_type == kFloat || scalar_type == kChar || !allow_tf32) { - if (cudnn_frontend::hasNumericalNote(c)) return true; - if (cudnn_frontend::hasNumericalNote(c)) return true; - } - return false; - }; - cudnn_frontend::filter(from, to, filter); -} - -cudnn_frontend::ExecutionPlan -get_execplan_from_heuristics_else_fall_back(cudnn_frontend::OperationGraph&& opGraph, cudnnHandle_t handle_) { - auto heuristics = cudnn_frontend::EngineHeuristicsBuilder() - .setOperationGraph(opGraph) - .setHeurMode(CUDNN_HEUR_MODE_INSTANT) - .build(); - - // std::cout << "Heuristic has " << heuristics.getEngineConfigCount() << " configurations " << std::endl; - auto& engine_config = heuristics.getEngineConfig(heuristics.getEngineConfigCount()); - - // Try engine configs returned by the heuristics and pick up the first one that works. - for (auto& ecfg : engine_config) { - try { - auto plan = cudnn_frontend::ExecutionPlanBuilder() - .setHandle(handle_) - .setEngineConfig(ecfg, opGraph.getTag()) - .build(); - return plan; - } catch (cudnn_frontend::cudnnException& e) { - continue; - } - } - - { - auto total_engines = opGraph.getEngineCount(); - // std::cout << opGraph.describe() << " has " << total_engines << " engines." << std::endl; - auto engine = cudnn_frontend::EngineBuilder().setGlobalEngineIdx(0).setOperationGraph(opGraph).build(); - // std::cout << engine.describe() << std::endl; - - auto engine_config = cudnn_frontend::EngineConfigBuilder().setEngine(engine).build(); - // std::cout << engine_config.describe() << std::endl; - - return cudnn_frontend::ExecutionPlanBuilder().setHandle(handle_).setEngineConfig(engine_config).build(); - } -} - +// FIXME: make this thread-safe by reusing the benchmark cache in Conv_v7.cpp +namespace { struct CacheKey { - ConvolutionParams params; + at::native::ConvolutionParams params; uint8_t input_alignment; uint8_t weight_alignment; uint8_t output_alignment; // default to -1 when no bias int8_t bias_alignment; }; - -// FIXME: make this thread-safe by reusing the benchmark cache in Conv_v7.cpp -std::unordered_map, ParamsEqual> execution_plan_cache; - +std::unordered_map, at::native::ParamsEqual> execution_plan_cache; +} // TODO: we can use cudnn_frontend::ExecutionPlanCache when it supports caching // multiple operators // reference: https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/conv_sample.cpp#L293 @@ -175,9 +62,9 @@ at::SmallVector MakeConvOutputShape( int M, // output channels const std::array& input_image_shape, const std::vector& kernel, - IntArrayRef stride, - IntArrayRef padding, - IntArrayRef dilation); + const torch::List& stride, + const torch::List& padding, + const torch::List& dilation); template <> at::SmallVector MakeConvOutputShape<2>( @@ -185,9 +72,9 @@ at::SmallVector MakeConvOutputShape<2>( int M, // output channels const std::array& input_image_shape, const std::vector& kernel, - IntArrayRef stride, - IntArrayRef padding, - IntArrayRef dilation) { + const torch::List& stride, + const torch::List& padding, + const torch::List& dilation) { const int H = input_image_shape[0]; const int W = input_image_shape[1]; const int64_t Y_H = @@ -197,94 +84,82 @@ at::SmallVector MakeConvOutputShape<2>( return {N, M, Y_H, Y_W}; } + // the parameter quantized_output is a quantized tensor +template template -void raw_cudnn_convolution_forward_out( - const Tensor& quantized_output, - const Tensor& input, - const Tensor& weight, - const c10::optional &bias, - IntArrayRef padding, - IntArrayRef stride, 
- IntArrayRef dilation, - int64_t groups, - bool benchmark, - bool deterministic, - bool allow_tf32, - float bias_multiplier, - float requantize_multiplier -) { - TORCH_CHECK(!benchmark, "not supported yet"); +void PackedConvWeightCudnn::apply_impl_helper(const at::Tensor& quantized_output, const at::Tensor& input, double output_scale) { if (quantized_output.numel() == 0) { return; } - - Tensor conv_output = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); + at::Tensor conv_output = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); // TODO: combine empty & fill_ using full_like or full - Tensor requantize_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); + at::Tensor requantize_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); + auto act_scale = input.q_scale(); + auto weight_scale = orig_weight_.q_scale(); + auto requantize_multiplier = act_scale * weight_scale / output_scale; requantize_multiplier_tensor.fill_(requantize_multiplier); c10::optional bias_multiplier_tensor; - c10::optional after_scales_bias; - c10::optional after_add; c10::optional broadcasted_bias; - c10::optional after_relu; - if (bias.has_value()) { + if (bias_.has_value()) { // the input bias is a 1-D tensor whose size is the same as the size of the second dimension of quantized_output. // we need to add trailing dimensions in order to properly broadcast bias, otherwise broadcast_to will fail. // the number of trailling dimensions is quantized_output.dim() - 2, so the new size of the broadcast_bias // becomes quantized_output.dim() - 2 + 1. 
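A minimal plain-ATen sketch of the reshape-and-broadcast step described in the comment above (illustrative only; it assumes a 4-D ChannelsLast output of shape [N, M, H, W] and a 1-D bias of length M, and the helper name is hypothetical):

#include <ATen/ATen.h>
#include <vector>

// Mirrors the broadcast logic described above: a 1-D bias [M] is reshaped to
// [M, 1, 1] (dim() - 1 entries) and then broadcast against [N, M, H, W];
// trailing-dimension alignment makes the two shapes compatible.
at::Tensor broadcast_bias_like_output(const at::Tensor& bias,
                                      const at::Tensor& quantized_output) {
  std::vector<int64_t> new_size(quantized_output.dim() - 1, 1);
  new_size[0] = bias.size(0);
  return bias.reshape(new_size)
             .broadcast_to(quantized_output.sizes())
             .contiguous(c10::MemoryFormat::ChannelsLast);
}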
nothing needs to be done for the leading dimensions std::vector new_size(quantized_output.dim() - 1, 1); - new_size[0] = bias.value().size(0); - broadcasted_bias = bias.value().reshape(new_size); + new_size[0] = bias_.value().size(0); + broadcasted_bias = bias_.value().reshape(new_size); broadcasted_bias.value() = broadcasted_bias.value().broadcast_to(quantized_output.sizes()); broadcasted_bias.value() = broadcasted_bias.value().contiguous(c10::MemoryFormat::ChannelsLast); bias_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); + auto bias_multiplier = 1.0 / (act_scale * weight_scale); bias_multiplier_tensor.value().fill_(bias_multiplier); - after_scales_bias = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); - after_add = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); - } - if (kReluFused) { - after_relu = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), at::MemoryFormat::ChannelsLast); } - cudnnHandle_t handle = getCudnnHandle(); + cudnnHandle_t handle = at::native::getCudnnHandle(); CacheKey key; - setConvolutionParams(&key.params, input, weight, padding, stride, dilation, groups, deterministic, allow_tf32); + bool deterministic{true}; + bool allow_tf32{false}; + auto padding_vec = padding_.vec(); + auto stride_vec = stride_.vec(); + auto dilation_vec = dilation_.vec(); + setConvolutionParams(&key.params, input, orig_weight_, padding_vec, stride_vec, dilation_vec, groups_, deterministic, allow_tf32); + // operator datatype needs to be int32 for int8 convolution, but we can // set the datatype for output tensor to int32 or fp32 key.params.dataType = CUDNN_DATA_INT32; - key.input_alignment = getAlignment(input); - key.output_alignment = getAlignment(conv_output); - key.weight_alignment = getAlignment(weight); - if (bias.has_value()) { - key.bias_alignment = getAlignment(broadcasted_bias.value()); + key.input_alignment = cudnn_utils::getAlignment(input); + key.output_alignment = cudnn_utils::getAlignment(conv_output); + key.weight_alignment = cudnn_utils::getAlignment(orig_weight_); + if (bias_.has_value()) { + key.bias_alignment = cudnn_utils::getAlignment(broadcasted_bias.value()); } else { key.bias_alignment = -1; } auto run = [&](cudnn_frontend::ManagedOpaqueDescriptor plan_desc) { auto workspace_size = 0; - auto workspace = at::empty({workspace_size}, input.options().dtype(kByte)); + auto workspace = at::empty({workspace_size}, input.options().dtype(at::kByte)); std::vector data_ptrs; std::vector uids; data_ptrs.reserve(10); uids.reserve(10); data_ptrs = {reinterpret_cast(input.data_ptr()), conv_output.data_ptr(), - reinterpret_cast(weight.data_ptr()), + reinterpret_cast(orig_weight_.data_ptr()), requantize_multiplier_tensor.data_ptr(), reinterpret_cast(quantized_output.data_ptr())}; uids = {'x', 'y', 'w', 's', 'r'}; - if (bias.has_value()) { + if (bias_.has_value()) { data_ptrs.insert(data_ptrs.end(), {broadcasted_bias.value().data_ptr(), bias_multiplier_tensor.value().data_ptr(), - after_scales_bias.value().data_ptr(), after_add.value().data_ptr()}); + broadcasted_bias.value().data_ptr(), conv_output.data_ptr()}); uids.insert(uids.end(), {'b', 'c', 'd', 'e'}); if (kReluFused) { - data_ptrs.emplace_back(after_relu.value().data_ptr()), + data_ptrs.emplace_back(conv_output.data_ptr()), uids.emplace_back('f'); } } else { if (kReluFused) { - 
data_ptrs.emplace_back(after_relu.value().data_ptr()); + data_ptrs.emplace_back(conv_output.data_ptr()); uids.emplace_back('f'); } } @@ -307,41 +182,40 @@ void raw_cudnn_convolution_forward_out( // where act_fp32 and w_fp32 are the input and weight variables, resp. // output is a fp32 tensor auto conv_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_CONVOLUTION_FORWARD_DESCRIPTOR) - .setxDesc(getTensorDescriptor(input, 'x', key.input_alignment)) - .setyDesc(getTensorDescriptor(conv_output, 'y', key.output_alignment)) - .setwDesc(getTensorDescriptor(weight, 'w', key.weight_alignment)) - .setcDesc(getConvDescriptor(key.params.dataType, padding, stride, dilation)) + .setxDesc(cudnn_utils::getTensorDescriptor(input.sizes(), input.strides(), CUDNN_DATA_INT8, 'x', key.input_alignment)) + .setyDesc(cudnn_utils::getTensorDescriptor(conv_output, 'y', key.output_alignment)) + .setwDesc(cudnn_utils::getTensorDescriptor(orig_weight_.sizes(), orig_weight_.strides(), CUDNN_DATA_INT8, 'w', key.weight_alignment)) + .setcDesc(getConvDescriptor(key.params.dataType, padding_vec, stride_vec, dilation_vec)) .build(); // std::cout << "operator:" << conv_op.describe() << std::endl; c10::optional bias_mult_op; c10::optional sum_conv_bias_op; - if (bias.has_value()) { + if (bias_.has_value()) { // we can't directly assign bias_mult_op becauase operator= is deleted for cudnn_frontend::Operation; // alternatively, I think we can use std::unique_ptr and dynamically allocate these builder ops // but here, we chose to do it statically. c10::optional::emplace() enables this approach - // TODO: can we assign the result back into bias and get rid of after_scales_bias? pending NVIDIA response // bias_mult_op computes bias_fp32 / (act_scale * w_scale) or bias_fp32 * (1 / (act_scale * w_scale)) // where bias_multiplier = (1 / (act_scale * w_scale)) // output is a fp32 tensor + // we use inplace operation here where the output is assigned to the input bias_mult_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) - .setxDesc(getTensorDescriptor(broadcasted_bias.value(), 'b', getAlignment(broadcasted_bias.value()))) - .setbDesc(getTensorDescriptor(bias_multiplier_tensor.value(), 'c', getAlignment(bias_multiplier_tensor.value()))) - .setyDesc(getTensorDescriptor(after_scales_bias.value(), 'd', getAlignment(after_scales_bias.value()))) - .setpwDesc(getPointWiseMulDescriptor(getCudnnDataType(bias_multiplier_tensor.value()))) + .setxDesc(cudnn_utils::getTensorDescriptor(broadcasted_bias.value(), 'b', cudnn_utils::getAlignment(broadcasted_bias.value()))) + .setbDesc(cudnn_utils::getTensorDescriptor(bias_multiplier_tensor.value(), 'c', cudnn_utils::getAlignment(bias_multiplier_tensor.value()))) + .setyDesc(cudnn_utils::getTensorDescriptor(broadcasted_bias.value(), 'd', cudnn_utils::getAlignment(broadcasted_bias.value()))) + .setpwDesc(cudnn_utils::getPointWiseMulDescriptor(at::native::getCudnnDataType(bias_multiplier_tensor.value()))) .build()); - // TODO: can we assign the result back into conv_output and get rid of after_add? - // computes (act_int8 * w_int8 + [bias_fp32/(act_scale * w_scale)]) - // where the 1st and 2nd summands is conv_output and after_scales_bias, resp. + // where the 1st and 2nd summands is conv_output and broadcasted_bias, resp. 
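The pointwise ops assembled below (bias multiply, add, optional relu, requantize) implement the scale algebra spelled out later in this file; as an unfused, plain-ATen reference of the same arithmetic (a sketch only: unit stride/padding/groups, symmetric quantization with zero zero-points, and illustrative names):

#include <ATen/ATen.h>

// Eager-mode reference for int8 conv + bias + (relu) + requantize.
// act_int8 / w_int8 are int8 integer representations; bias_fp32 is fp32.
at::Tensor requantize_reference(const at::Tensor& act_int8, double act_scale,
                                const at::Tensor& w_int8, double w_scale,
                                const at::Tensor& bias_fp32,
                                double out_scale, bool relu_fused) {
  // integer-domain convolution, accumulated in fp32
  at::Tensor acc = at::conv2d(act_int8.to(at::kFloat), w_int8.to(at::kFloat));
  // fold the fp32 bias into the integer domain: bias_fp32 / (act_scale * w_scale)
  acc = acc + bias_fp32.reshape({1, -1, 1, 1}) / (act_scale * w_scale);
  if (relu_fused) {
    acc = acc.relu();
  }
  // requantize_multiplier = act_scale * w_scale / out_scale
  return (acc * (act_scale * w_scale / out_scale)).round().clamp(-128, 127).to(at::kChar);
}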
// output is a fp32 tensor + // we use inplace operation here where the output is assigned to the input sum_conv_bias_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) .setxDesc(conv_op.getOutputTensor()) - .setbDesc(getTensorDescriptor(after_scales_bias.value(), 'd', getAlignment(after_scales_bias.value()))) - .setyDesc(getTensorDescriptor(after_add.value(), 'e', getAlignment(after_add.value()))) - .setpwDesc(getPointWiseAddDescriptor(getCudnnDataType(after_scales_bias.value()))) + .setbDesc(cudnn_utils::getTensorDescriptor(broadcasted_bias.value(), 'd', cudnn_utils::getAlignment(broadcasted_bias.value()))) + .setyDesc(cudnn_utils::getTensorDescriptor(conv_output, 'e', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseAddDescriptor(at::native::getCudnnDataType(broadcasted_bias.value()))) .build()); } @@ -349,13 +223,13 @@ void raw_cudnn_convolution_forward_out( // or relu(act_int8 * w_int8) if bias is not present. // output is a fp32 tensor c10::optional relu_op; - std::shared_ptr tensor2requant_ptr = bias.has_value() ? sum_conv_bias_op.value().getOutputTensor() : conv_op.getOutputTensor(); + std::shared_ptr tensor2requant_ptr = bias_.has_value() ? sum_conv_bias_op.value().getOutputTensor() : conv_op.getOutputTensor(); if (kReluFused) { - // TODO: can we assign the result back into conv_output and get rid of after_relu? + // we use inplace operation here where the output is assigned to the input relu_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) .setxDesc(tensor2requant_ptr) - .setyDesc(getTensorDescriptor(after_relu.value(), 'f', getAlignment(after_relu.value()))) - .setpwDesc(getPointWiseReluDescriptor(getCudnnDataType(after_relu.value()))) + .setyDesc(cudnn_utils::getTensorDescriptor(conv_output, 'f', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseReluDescriptor(at::native::getCudnnDataType(conv_output))) .build()); } @@ -364,14 +238,14 @@ void raw_cudnn_convolution_forward_out( // output is a fp32 tensor auto requant_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) .setxDesc(kReluFused ? 
relu_op.value().getOutputTensor() : tensor2requant_ptr) - .setbDesc(getTensorDescriptor(requantize_multiplier_tensor, 's', getAlignment(requantize_multiplier_tensor))) - .setyDesc(getTensorDescriptor(quantized_output.sizes(), quantized_output.strides(), CUDNN_DATA_INT8, 'r', getAlignment(quantized_output))) - .setpwDesc(getPointWiseMulDescriptor(getCudnnDataType(requantize_multiplier_tensor))) + .setbDesc(cudnn_utils::getTensorDescriptor(requantize_multiplier_tensor, 's', cudnn_utils::getAlignment(requantize_multiplier_tensor))) + .setyDesc(cudnn_utils::getTensorDescriptor(quantized_output.sizes(), quantized_output.strides(), CUDNN_DATA_INT8, 'r', cudnn_utils::getAlignment(quantized_output))) + .setpwDesc(cudnn_utils::getPointWiseMulDescriptor(at::native::getCudnnDataType(requantize_multiplier_tensor))) .build(); // std::cout << "operator:" << requant_op.describe() << std::endl; std::vector ops{&conv_op}; - if (bias.has_value()) { + if (bias_.has_value()) { ops.emplace_back(&(bias_mult_op.value())); ops.emplace_back(&(sum_conv_bias_op.value())); } @@ -399,8 +273,8 @@ void raw_cudnn_convolution_forward_out( auto& fallback_list = fallback.getFallbackList(); cudnn_frontend::EngineConfigList filtered_configs; - filterEngineConfigs(engine_configs, filtered_configs, deterministic, allow_tf32, input.scalar_type()); - filterEngineConfigs(fallback_list, filtered_configs, deterministic, allow_tf32, input.scalar_type()); + cudnn_utils::filterEngineConfigs(engine_configs, filtered_configs, deterministic, allow_tf32, at::kChar); + cudnn_utils::filterEngineConfigs(fallback_list, filtered_configs, deterministic, allow_tf32, at::kChar); for (auto &cfg : engine_configs) { try { @@ -412,7 +286,7 @@ void raw_cudnn_convolution_forward_out( run(plan_desc); execution_plan_cache[key] = plan_desc; return; - } catch (cudnn_frontend::cudnnException &e) {std::cout << "cudnn error:" << e.what() << std::endl;} catch(CuDNNError &e) { std::cout << "other error" << e.what() << std::endl;} + } catch (cudnn_frontend::cudnnException &e) {std::cout << "cudnn error:" << e.what() << std::endl;} catch(c10::CuDNNError &e) { std::cout << "other error" << e.what() << std::endl;} } TORCH_CHECK(false, "Unable to find an engine to execute this computation"); @@ -436,94 +310,90 @@ out_int8 = (act_fp32 * w_fp32 + [bias_fp32]) / out_scale + out_zero_point = (act_int8 * w_int8 + [bias_fp32/(act_scale * w_scale)]) / (out_scale / (act_scale * w_scale)) = requantize((act_int8 * w_int8 + [bias_fp32/(act_scale * w_scale)]), out_scale / (act_scale * w_scale)) */ -template -Tensor raw_cudnn_convolution_forward( - const Tensor& act, - const Tensor& weight, - c10::optional bias, - IntArrayRef padding, - IntArrayRef stride, - IntArrayRef dilation, - int64_t groups, - bool benchmark, - bool deterministic, - bool allow_tf32, - float bias_multiplier, - float requantize_multiplier, +template +template +at::Tensor PackedConvWeightCudnn::apply_impl( + const at::Tensor& act, double output_scale, int64_t output_zero_point) { - // TODO: add dimension validations for input/weight/bias const int N = act.size(0); const int D = kSpatialDim == 3 ? 
act.size(2) : 1; const int H = act.size(kSpatialDim); const int W = act.size(kSpatialDim + 1); - const int M = weight.size(0); // output channels - std::vector kernel_size = {weight.size(2), weight.size(3)}; - at::SmallVector output_shape{MakeConvOutputShape(N, M, {H, W}, - kernel_size, stride, padding, dilation)}; - Tensor quantized_output = at::_empty_affine_quantized( + const int M = orig_weight_.size(0); // output channels + std::vector kernel_size = {orig_weight_.size(2), orig_weight_.size(3)}; + at::SmallVector output_shape = MakeConvOutputShape(N, M, {H, W}, + kernel_size, stride_, padding_, dilation_); + at::Tensor quantized_output = at::_empty_affine_quantized( output_shape, - at::device(at::kCUDA).dtype(ScalarType::QInt8), + at::device(at::kCUDA).dtype(at::ScalarType::QInt8), output_scale, output_zero_point, at::MemoryFormat::ChannelsLast); - raw_cudnn_convolution_forward_out( - quantized_output, act, weight, bias, - padding, stride, dilation, groups, - benchmark, - deterministic, - allow_tf32, - bias_multiplier, - requantize_multiplier); - + // requantization + // out_int8 = act_int8 * weight_int8 * act_scale * w_scale / output_scale + apply_impl_helper( + quantized_output, act, output_scale); return quantized_output; } +template +at::Tensor PackedConvWeightCudnn::apply( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(input, output_scale, output_zero_point); +} + +template +at::Tensor PackedConvWeightCudnn::apply_relu( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(input, output_scale, output_zero_point); +} + +template at::Tensor PackedConvWeightCudnn<2>::apply( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point); + +template at::Tensor PackedConvWeightCudnn<2>::apply_relu( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point); + +namespace at { +namespace native { +namespace { template class QConvInt8 final { public: - static Tensor run( - Tensor act, - Tensor weight, - c10::optional bias, - torch::List stride, - torch::List padding, - torch::List dilation, - int64_t groups, + static at::Tensor run( + at::Tensor act, + const c10::intrusive_ptr>& packed_weight, double output_scale, int64_t output_zero_point) { act = act.contiguous(c10::MemoryFormat::ChannelsLast); - weight = weight.contiguous(c10::MemoryFormat::ChannelsLast); - // requantization - // out_int8 = act_int8 * weight_int8 * act_scale * w_scale / output_scale - auto act_scale = act.q_scale(); - auto weight_scale = weight.q_scale(); - auto requantize_multiplier = act_scale * weight_scale / output_scale; - auto bias_multiplier = 1.0 / (act_scale * weight_scale); - // TODO: check all zero_points are zero/all tensors are symmetrically quantized - return raw_cudnn_convolution_forward( - act.int_repr(), weight.int_repr(), bias, - IntArrayRef(padding.vec()), IntArrayRef(stride.vec()), IntArrayRef(dilation.vec()), groups, - false /* benchmark */, - true /* deterministic */, - false /* allow_tf32 */, - bias_multiplier, - requantize_multiplier, - output_scale, - output_zero_point - ); + if (kReluFused) { + return packed_weight->apply_relu(act, output_scale, output_zero_point); + } else { + return packed_weight->apply(act, output_scale, output_zero_point); + } } }; TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) { - m.impl(TORCH_SELECTIVE_NAME("quantized::conv2d_cudnn"), QConvInt8<2, false>::run); - m.impl(TORCH_SELECTIVE_NAME("quantized::conv2d_relu_cudnn"), QConvInt8<2, 
true>::run); + m.impl(TORCH_SELECTIVE_NAME("quantized::conv2d.new"), QConvInt8<2, false>::run); + m.impl(TORCH_SELECTIVE_NAME("quantized::conv2d_relu.new"), QConvInt8<2, true>::run); } } // namespace -}} // at::native +} // namespace native +} // namespace at + #endif // HAS_CUDNN_V8 #endif // AT_CUDNN_ENABLED diff --git a/aten/src/ATen/native/quantized/cudnn/Linear.cpp b/aten/src/ATen/native/quantized/cudnn/Linear.cpp new file mode 100644 index 00000000000000..e4579bfc826bcf --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/Linear.cpp @@ -0,0 +1,345 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +// TODO: there is a table from input dtype and weight dtype to operator dtype, +// we can derive the operator dtype based on input dtype +cudnn_frontend::MatMulDesc_v8 getLinearDescriptor(cudnnDataType_t dataType) { + return cudnn_frontend::MatMulDescBuilder() + .setMathPrecision(dataType) + .build(); +} + +struct CacheKey { + uint8_t input_alignment; + uint8_t weight_alignment; + uint8_t output_alignment; + // default to -1 when no bias + int8_t bias_alignment; +}; + +// FIXME: make this thread-safe by reusing the benchmark cache in Conv_v7.cpp +namespace { +std::unordered_map, at::native::ParamsEqual> execution_plan_cache; +} +// TODO: we can use cudnn_frontend::ExecutionPlanCache when it supports caching +// multiple operators +// reference: https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/conv_sample.cpp#L293 +//static cudnn_frontend::ExecutionPlanCache plan_cache("sample_cache"); + +// currently we only support int8 symmetric (zero_point = 0 for inputs and output) quantized linear op +// We implement relu(act_int8 * transpose(w_int8) + [bias_fp32/(act_scale * w_scale] ) * ( act_scale * w_scale / out_scale ) +// which requires 5 cudnn ops (1 matmul, 2 multiplication, 1 add, and 1 relu ops) +// matmul op: linear_op +// Multiplication ops: rhs_mult_op, requant_op +// Addition op: add_op +// Relu op: relu_op +template +void PackedLinearWeightCudnn::apply_impl_helper(const at::Tensor& quantized_output, const at::Tensor& input, double output_scale) { + if (quantized_output.numel() == 0) { + return; + } + at::Tensor linear_output = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat)); + auto act_scale = input.q_scale(); + auto weight_scale = orig_weight.q_scale(); + auto requantize_multiplier = act_scale * weight_scale / output_scale; + at::Tensor requantize_multiplier_tensor = at::full(quantized_output.sizes(), requantize_multiplier, at::device(at::kCUDA).dtype(at::kFloat)); + requantize_multiplier_tensor.fill_(requantize_multiplier); + c10::optional bias_multiplier_tensor; + c10::optional broadcasted_bias; + if (bias_.has_value()) { + // the input bias is a 1-D tensor whose size is the same as the size of the second dimension of quantized_output. + // we need to add trailing dimensions in order to properly broadcast bias, otherwise broadcast_to will fail. + // the number of trailling dimensions is quantized_output.dim() - 2. 
We also prepend a leading dimension for clarity + std::vector new_size(quantized_output.dim(), 1); + new_size[1] = bias_.value().size(0); + broadcasted_bias = bias_.value().reshape(new_size); + broadcasted_bias.value() = broadcasted_bias.value().broadcast_to(quantized_output.sizes()); + bias_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat)); + auto bias_multiplier = 1.0 / (act_scale * weight_scale); + bias_multiplier_tensor.value().fill_(bias_multiplier); + } + + cudnnHandle_t handle = at::native::getCudnnHandle(); + CacheKey key; + bool deterministic{true}; + bool allow_tf32{false}; + + key.input_alignment = cudnn_utils::getAlignment(input); + key.output_alignment = cudnn_utils::getAlignment(linear_output); + key.weight_alignment = cudnn_utils::getAlignment(orig_weight); + if (bias_.has_value()) { + key.bias_alignment = cudnn_utils::getAlignment(broadcasted_bias.value()); + } else { + key.bias_alignment = -1; + } + // the matmul operation is input * transpose(weight), so we will work with the transposed weight + auto weight_transposed = transpose(orig_weight, 0, 1); + // cudnn expects tensors to be at least 3D. weight_transposed is currently 2D. we will create a 3D view + // by prepending a leading dummy dimension (cudnn expects leading dimensions to be the dummy dimensions) + std::vector new_sizes(3, 1); + new_sizes.back() = weight_transposed.size(1); + new_sizes[1] = weight_transposed.size(0); + weight_transposed = weight_transposed.view(new_sizes); + // TODO: remove this with int8 matmul is supported + auto input_fp = input.int_repr().to(at::kFloat); + auto weight_fp = weight_transposed.int_repr().to(at::kFloat); + + auto run = [&](cudnn_frontend::ManagedOpaqueDescriptor plan_desc) { + auto workspace_size = 0; + auto workspace = at::empty({workspace_size}, input.options().dtype(at::kByte)); + std::vector data_ptrs; + std::vector uids; + data_ptrs.reserve(10); + uids.reserve(10); + data_ptrs = {input_fp.data_ptr(), linear_output.data_ptr(), + weight_fp.data_ptr(), + requantize_multiplier_tensor.data_ptr(), + reinterpret_cast(quantized_output.data_ptr())}; + uids = {'x', 'y', 'w', 's', 'r'}; + if (bias_.has_value()) { + data_ptrs.insert(data_ptrs.end(), {broadcasted_bias.value().data_ptr(), bias_multiplier_tensor.value().data_ptr(), + broadcasted_bias.value().data_ptr(), linear_output.data_ptr()}); + uids.insert(uids.end(), {'b', 'c', 'd', 'e'}); + if (kReluFused) { + data_ptrs.emplace_back(linear_output.data_ptr()), + uids.emplace_back('f'); + } + } else { + if (kReluFused) { + data_ptrs.emplace_back(linear_output.data_ptr()); + uids.emplace_back('f'); + } + } + auto variantPack = cudnn_frontend::VariantPackBuilder() + .setWorkspacePointer(workspace.data_ptr()) + .setDataPointers(uids.size(), data_ptrs.data()) + .setUids(uids.size(), uids.data()) + .build(); + auto variant_pack_desc = variantPack.get_raw_desc(); + AT_CUDNN_CHECK(cudnnBackendExecute(handle, plan_desc->get_backend_descriptor(), variant_pack_desc)); + }; + + auto search = execution_plan_cache.find(key); + if (search != execution_plan_cache.end()) { + cudnn_frontend::ManagedOpaqueDescriptor plan_desc = search->second; + run(plan_desc); + return; + } + + // linear_op computes act_int8 * tranpose(w_int8) (matrix multiplication) + // where act_int8 and w_int8 are the input and weight variables, resp. 
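The find-or-build flow around execution_plan_cache (used a few lines above and again at the end of this function) is a plain memoization pattern; a generic sketch, with PlanDesc standing in for cudnn_frontend::ManagedOpaqueDescriptor and the hash/equality functors playing the role of at::native::ParamsHash / ParamsEqual:

#include <unordered_map>

// Build a plan once per (shape, alignment, ...) configuration, then reuse it.
// Not thread-safe, mirroring the FIXME on the cache declared above.
template <class Key, class PlanDesc, class Hash, class Equal,
          class BuildFn, class RunFn>
void run_with_plan_cache(std::unordered_map<Key, PlanDesc, Hash, Equal>& cache,
                         const Key& key, BuildFn build, RunFn run) {
  auto it = cache.find(key);
  if (it != cache.end()) {
    run(it->second);         // fast path: reuse the previously built plan
    return;
  }
  PlanDesc plan = build();   // slow path: try engine configs until one builds
  run(plan);
  cache.emplace(key, plan);
}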
+ // output is a fp32 tensor + auto linear_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_MATMUL_DESCRIPTOR) + // TODO: make these 2 CUDNN_DATA_INT8 when cudnn enables int8 matmul + // .setaMatDesc(cudnn_utils::getTensorDescriptor(input.sizes(), input.strides(), CUDNN_DATA_FLOAT, 'x', key.input_alignment)) + .setaMatDesc(cudnn_utils::getTensorDescriptor(input_fp.sizes(), input_fp.strides(), CUDNN_DATA_FLOAT, 'x', key.input_alignment)) + // .setbMatDesc(cudnn_utils::getTensorDescriptor(orig_weight.sizes(), orig_weight.strides(), CUDNN_DATA_FLOAT, 'w', key.weight_alignment)) + .setbMatDesc(cudnn_utils::getTensorDescriptor(weight_fp.sizes(), weight_fp.strides(), CUDNN_DATA_FLOAT, 'w', key.weight_alignment)) + .setcMatDesc(cudnn_utils::getTensorDescriptor(linear_output, 'y', key.output_alignment)) + .setmatmulDesc(getLinearDescriptor(CUDNN_DATA_FLOAT)) // is this right? should it be float? + .build(); + // std::cout << "operator:" << linear_op.describe() << std::endl; + + c10::optional bias_mult_op; + c10::optional sum_linear_bias_op; + if (bias_.has_value()) { + // we can't directly assign bias_mult_op becauase operator= is deleted for cudnn_frontend::Operation; + // alternatively, I think we can use std::unique_ptr and dynamically allocate these builder ops + // but here, we chose to do it statically. c10::optional::emplace() enables this approach + + // bias_mult_op computes bias_fp32 / (act_scale * w_scale) or bias_fp32 * (1 / (act_scale * w_scale)) + // where bias_multiplier = (1 / (act_scale * w_scale)) + // output is a fp32 tensor + // we use inplace operation here where the output is assigned to the input + bias_mult_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(cudnn_utils::getTensorDescriptor(broadcasted_bias.value(), 'b', cudnn_utils::getAlignment(broadcasted_bias.value()))) + .setbDesc(cudnn_utils::getTensorDescriptor(bias_multiplier_tensor.value(), 'c', cudnn_utils::getAlignment(bias_multiplier_tensor.value()))) + .setyDesc(cudnn_utils::getTensorDescriptor(broadcasted_bias.value(), 'd', cudnn_utils::getAlignment(broadcasted_bias.value()))) + .setpwDesc(cudnn_utils::getPointWiseMulDescriptor(at::native::getCudnnDataType(bias_multiplier_tensor.value()))) + .build()); + + // computes (act_int8 * w_int8 + [bias_fp32/(act_scale * w_scale)]) + // where the 1st and 2nd summands is linear_output and broadcasted_bias, resp. + // output is a fp32 tensor + // we use inplace operation here where the output is assigned to the input + sum_linear_bias_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(linear_op.getOutputTensor()) + .setbDesc(cudnn_utils::getTensorDescriptor(broadcasted_bias.value(), 'd', cudnn_utils::getAlignment(broadcasted_bias.value()))) + .setyDesc(cudnn_utils::getTensorDescriptor(linear_output, 'e', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseAddDescriptor(at::native::getCudnnDataType(broadcasted_bias.value()))) + .build()); + } + + // relu_op computes relu(act_int8 * w_int8 + [bias_fp32/(act_scale * w_scale)] + // or relu(act_int8 * w_int8) if bias is not present. + // output is a fp32 tensor + c10::optional relu_op; + std::shared_ptr tensor2requant_ptr = bias_.has_value() ? 
sum_linear_bias_op.value().getOutputTensor() : linear_op.getOutputTensor(); + if (kReluFused) { + // we use inplace operation here where the output is assigned to the input + relu_op.emplace(cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(tensor2requant_ptr) + .setyDesc(cudnn_utils::getTensorDescriptor(linear_output, 'f', key.output_alignment)) + .setpwDesc(cudnn_utils::getPointWiseReluDescriptor(at::native::getCudnnDataType(linear_output))) + .build()); + } + + // requant_op computes relu(act_int8 * w_int8 + [bias_fp32/(act_scale * w_scale)]) / (out_scale / (act_scale * w_scale)) + // or relu(act_int8 * w_int8) / (out_scale / (act_scale * w_scale))) if bias is not present. + // output is a fp32 tensor + auto requant_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) + .setxDesc(kReluFused ? relu_op.value().getOutputTensor() : tensor2requant_ptr) + .setbDesc(cudnn_utils::getTensorDescriptor(requantize_multiplier_tensor, 's', cudnn_utils::getAlignment(requantize_multiplier_tensor))) + .setyDesc(cudnn_utils::getTensorDescriptor(quantized_output.sizes(), quantized_output.strides(), CUDNN_DATA_INT8, 'r', cudnn_utils::getAlignment(quantized_output))) + .setpwDesc(cudnn_utils::getPointWiseMulDescriptor(at::native::getCudnnDataType(requantize_multiplier_tensor))) + .build(); + // // std::cout << "operator:" << requant_op.describe() << std::endl; + + std::vector ops{&linear_op}; + if (bias_.has_value()) { + ops.emplace_back(&(bias_mult_op.value())); + ops.emplace_back(&(sum_linear_bias_op.value())); + } + if (kReluFused) { + ops.emplace_back(&(relu_op.value())); + } + ops.emplace_back(&requant_op); + + auto opGraph = cudnn_frontend::OperationGraphBuilder() + .setHandle(handle) + .setOperationGraph(ops.size(), ops.data()) + .build(); + // std::cout << "opGraph: " << opGraph.describe() << std::endl; + + auto heuristics = cudnn_frontend::EngineHeuristicsBuilder() + .setOperationGraph(opGraph) + .setHeurMode(CUDNN_HEUR_MODE_INSTANT) + .build(); + auto fallback = cudnn_frontend::EngineFallbackListBuilder() + .setOperationGraph(opGraph) + .setOperation(CUDNN_BACKEND_OPERATION_MATMUL_DESCRIPTOR) + .build(); + + auto& engine_configs = heuristics.getEngineConfig(heuristics.getEngineConfigCount()); + auto& fallback_list = fallback.getFallbackList(); + + cudnn_frontend::EngineConfigList filtered_configs; + cudnn_utils::filterEngineConfigs(engine_configs, filtered_configs, deterministic, allow_tf32, at::kChar); + cudnn_utils::filterEngineConfigs(fallback_list, filtered_configs, deterministic, allow_tf32, at::kChar); + + for (auto &cfg : engine_configs) { + try { + auto plan = cudnn_frontend::ExecutionPlanBuilder() + .setHandle(handle) + .setEngineConfig(cfg) + .build(); + auto plan_desc = plan.get_desc(); + run(plan_desc); + execution_plan_cache[key] = plan_desc; + return; + } catch (cudnn_frontend::cudnnException &e) {std::cout << "cudnn error:" << e.what() << std::endl;} catch(c10::CuDNNError &e) { std::cout << "other error" << e.what() << std::endl;} + } + + TORCH_CHECK(false, "Unable to find an engine to execute this computation"); +} + +// output Tensor will be a clampped int8 Tensor +// both act and weight will be int8 Tensor +// Numerics are the same as conv (see aten/src/ATen/native/quantized/Conv.cpp): +template +at::Tensor PackedLinearWeightCudnn::apply_impl( + const at::Tensor& act, + double output_scale, + int64_t output_zero_point) { + std::vector original_output_shape{act.sizes().vec()}; // 2D + 
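cudnn's matmul expects operands that are at least 3-D (leading size-1 dimensions act as batch dimensions), which is why apply_impl views its 2-D tensors up to 3-D and views the result back down, as the statements below do. A compact plain-ATen restatement of the shapes involved (illustrative; fp32 at::matmul stands in for the cudnn graph and the names are not the diff's):

#include <ATen/ATen.h>

// act: [B, K], weight: [N, K] (out_features x in_features)
at::Tensor linear_3d_view_demo(const at::Tensor& act, const at::Tensor& weight) {
  auto act3d = act.view({1, act.size(0), act.size(1)});                            // [1, B, K]
  auto w_t3d = weight.t().contiguous().view({1, weight.size(1), weight.size(0)});  // [1, K, N]
  auto out3d = at::matmul(act3d, w_t3d);                                           // [1, B, N]
  return out3d.view({act.size(0), weight.size(0)});                                // back to [B, N]
}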
original_output_shape.back() = orig_weight.size(0); // output channels + // cudnn expects tensors to be at least 3D. we will prepend a dummy dimension for quantized_output + std::vector output_shape(3, 1); + output_shape[1] = original_output_shape[0]; + output_shape[2] = original_output_shape[1]; + at::Tensor quantized_output = at::_empty_affine_quantized( + output_shape, + at::device(at::kCUDA).dtype(at::ScalarType::QInt8), + output_scale, + output_zero_point); + // cudnn expects tensors to be at least 3D. act is currently 2D. we will create a 3D view + std::vector new_sizes(3, 1); + // cudnn expects leading dimensions to be the dummy dimensions + new_sizes.back() = act.sizes().back(); + new_sizes[1] = act.size(0); + apply_impl_helper( + quantized_output, act.view(new_sizes), output_scale); + return quantized_output.view(original_output_shape); +} + +at::Tensor PackedLinearWeightCudnn::apply( + at::Tensor input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(input, output_scale, output_zero_point); +} + +at::Tensor PackedLinearWeightCudnn::apply_relu( + at::Tensor input, + double output_scale, + int64_t output_zero_point) { + return apply_impl(input, output_scale, output_zero_point); +} + +namespace at { +namespace native { +namespace { + +template +class QLinearInt8 final { + public: + static at::Tensor run( + at::Tensor act, + const c10::intrusive_ptr& packed_weight, + double output_scale, + int64_t output_zero_point) { + // TODO: if act is more than 2D, I think we should flatten the first n-1 dimensions? + // TODO: check all zero_points are zero/all tensors are symmetrically quantized + if (kReluFused) { + return packed_weight->apply_relu(act, output_scale, output_zero_point); + } else { + return packed_weight->apply(act, output_scale, output_zero_point); + } + } +}; + +TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::linear"), QLinearInt8::run); + m.impl(TORCH_SELECTIVE_NAME("quantized::linear_relu"), QLinearInt8::run); +} + +} // namespace +} // namespace native +} // namespace at + + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/cudnn/Pooling.cpp b/aten/src/ATen/native/quantized/cudnn/Pooling.cpp new file mode 100644 index 00000000000000..747be7a831d895 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/Pooling.cpp @@ -0,0 +1,212 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace at { +namespace native { +namespace { +// TODO: This function is the same as that of qpool.cpp. 
We should refactor this into quantized directory +// so that we don't need to duplicate the function +void check_maxpool2d_params( + IntArrayRef kernel_size, + IntArrayRef stride, + IntArrayRef padding, + IntArrayRef dilation) { + TORCH_CHECK(kernel_size.size() == 1 || kernel_size.size() == 2, + "Expected 1d or 2d kernel size, got ", kernel_size.size()); + TORCH_CHECK(stride.empty() || stride.size() == 2, + "Expected no strides or 2d strides, got", stride.size()); + TORCH_CHECK(padding.size() == 1 || padding.size() == 2, + "Expected 1d or 2d padding, got ", padding.size()); + TORCH_CHECK(dilation.size() == 1 || dilation.size() == 2, + "Expected 1d or 2d dilation, got ", dilation.size()); +} +} + +// Currently we support 4D and 3D input (qx) tensors, the latter of which is supported for +// legacy reasons. The first dimension of a 4D input tensor is the batch size. +// For a 3D tensor, there is no batch size dimension -- it can be viewed as a single batch. +// cudnn's 2D pooling operation requires the input and output to be 4D tensors, so we must cast +// any 3D tensors to 4D prior to using cudnn +// This implementation currently uses the v7 cudnn APIs as v8 cudnn APIs are not yet available for +// pooling operations. +// Consult https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnPoolingForward for +// documentation on the APIs +// Currently, it appears there is no cudnn support for dilated pooling -- we will +// submit a feature request for this with cudnn +// TODO: ideally, we would like to use structured kernel support here so we do not have to repeat +// the input checks, however, that would require us to implement max_pool2d_with_indices_out_quantized_cuda +// based on how the dispatch table is currently constructed in native_functions.yaml. currently, +// there is no support for producing indices with cudnn max pooling, so until that becomes available, this cannot be done. +Tensor quantized_max_pool2d_cudnn( + const Tensor& qx, + IntArrayRef kernel_size, + IntArrayRef stride, + IntArrayRef padding, + IntArrayRef dilation, + bool ceil_mode) { + check_maxpool2d_params( + kernel_size, + stride, + padding, + dilation); + if (stride.empty()) { + stride = kernel_size; + } + auto ndim = qx.dim(); + TORCH_CHECK( + ndim == 3 || ndim == 4, "Expecting the input tensor of rank 3 or 4."); + TORCH_CHECK( + kernel_size.size() == 2, + "quantized_max_pool2d_cudnn(): Expected kernel_size to be 2-dimensional: got ", + kernel_size.size()); + TORCH_CHECK( + stride.size() == 2, + "quantized_max_pool2d_cudnn(): Expected stride to be 2-dimensional: got ", + stride.size()); + TORCH_CHECK( + dilation.size() == 2, + "quantized_max_pool2d_cudnn(): Expected dilation to be 2-dimensional: got ", + dilation.size()); + TORCH_CHECK( + dilation[0] == 1 && dilation[1] == 1, + "quantized_max_pool2d_cudnn(): Expected dilation=[1, 1] (cudnn does not currently support dilation[i] != 1), got", + dilation); + TORCH_CHECK( + padding.size() == 2, + "quantized_max_pool2d_cudnn(): Expected padding to be 2-dimensional: got ", + padding.size()); + + auto input = qx; + if (ndim == 4) { + input = qx.contiguous(MemoryFormat::ChannelsLast); + } else { // 3D + std::vector new_sizes{1, qx.size(0), qx.size(1), qx.size(2)}; + input = qx.view(new_sizes); + } + int batch_size = input.size(0); + int64_t inC = input.size(1); + int64_t inH = input.size(2); + int64_t inW = input.size(3); + // Check output dimensions. 
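The outH/outW values computed just below come from pooling_output_shape; for reference, a standalone sketch of the usual formula (assuming symmetric padding and non-negative sizes; this mirrors the standard convention rather than quoting the helper verbatim):

#include <cstdint>

int64_t pooling_output_size(int64_t in, int64_t kernel, int64_t pad,
                            int64_t stride, int64_t dilation, bool ceil_mode) {
  int64_t span = in + 2 * pad - dilation * (kernel - 1) - 1;
  int64_t out = (ceil_mode ? (span + stride - 1) / stride : span / stride) + 1;
  // with ceil_mode, make sure the last window starts inside the (left-)padded input
  if (ceil_mode && (out - 1) * stride >= in + pad) {
    --out;
  }
  return out;
}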
+ int64_t padH = padding[0]; + int64_t padW = padding[1]; + int64_t kH = kernel_size[0]; + int64_t kW = kernel_size[1]; + int64_t strideH = stride[0]; + int64_t strideW = stride[1]; + TORCH_CHECK( + kH > 0 && kW > 0, + "qnnpack_maxpool2d(): kernel_size should be greater than zero."); + TORCH_CHECK( + strideH > 0 && strideW > 0, + "qnnpack_maxpool2d(): strides should be greater than zero."); + int64_t dilationH = dilation[0]; + int64_t dilationW = dilation[1]; + int64_t outC = inC; + int64_t outH = pooling_output_shape(inH, kH, padH, strideH, dilationH, ceil_mode); + int64_t outW = pooling_output_shape(inW, kW, padW, strideW, dilationW, ceil_mode); + TORCH_CHECK(outH > 0 && outW > 0, + "Given input size: (", + inC, "x", inH, "x", inW, + "). Calculated output size: (", + outC, "x", outH, "x", outW, + "). Output size is too small."); + + std::vector output_shape; + if (ndim == 3) { + // cudnn requires 4D input and output for 2D pooling, so we prepend a dummy dimension + // whose size represents the batch size (1) + output_shape = {1, outC, outH, outW}; + } else { + output_shape = {batch_size, outC, outH, outW}; + } + auto qy = at::_empty_affine_quantized( + output_shape, + at::device(at::kCUDA).dtype(at::ScalarType::QInt8), + input.q_scale(), + input.q_zero_point(), + (ndim == 4 ? MemoryFormat::ChannelsLast : MemoryFormat::Contiguous)); + + cudnnHandle_t handle = getCudnnHandle(); + cudnnPoolingDescriptor_t poolingDesc; + AT_CUDNN_CHECK_WITH_SHAPES(cudnnCreatePoolingDescriptor(&poolingDesc)); + AT_CUDNN_CHECK_WITH_SHAPES(cudnnSetPooling2dDescriptor( + poolingDesc, + CUDNN_POOLING_MAX_DETERMINISTIC, + CUDNN_NOT_PROPAGATE_NAN, + kernel_size[0], // kernel height + kernel_size[1], // kernel width + padding[0], // vertical padding + padding[1], // horizontal padding + stride[0], // vertical stride + stride[1])); // horizontal stride + + auto dataType = getCudnnDataType(input); + float one{1}; + float zero{0.0}; + TensorDescriptor xDesc; + at::MemoryFormat memory_format = (ndim == 4 ? at::MemoryFormat::ChannelsLast : at::MemoryFormat::Contiguous); + xDesc.set(input, memory_format); + TensorDescriptor yDesc; + yDesc.set(qy, memory_format); + cudnnPoolingForward(handle, + poolingDesc, + &one, + xDesc.desc(), + reinterpret_cast(input.data_ptr()), + &zero, + yDesc.desc(), + reinterpret_cast(qy.data_ptr())); + + // recall we casted our input and output to 4D if qx was 3D, so we recast it back to 3D prior to returning + return (ndim == 3 ? qy.view(std::vector(output_shape.begin() + 1, output_shape.end())) : qy); +} + +// Keep the registry in the anonymous namespace. 
+namespace { +template +class QMaxPool_arr_args final { + public: + static Tensor run( + Tensor qx, + std::vector kernel_size, + std::vector stride, + std::vector padding, + std::vector dilation, + bool ceil_mode) { + TORCH_CHECK(kSpatialDim == 2, "quantized max pool is only valid for 2D") + return quantized_max_pool2d_cudnn(qx, kernel_size, stride, padding, + dilation, ceil_mode); + } +}; + +TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::max_pool2d"), TORCH_FN(QMaxPool_arr_args<2>::run)); +} + +} // namespace +} // namespace native +} // namespace at + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/cudnn/conv_prepack.cpp b/aten/src/ATen/native/quantized/cudnn/conv_prepack.cpp new file mode 100644 index 00000000000000..70c05f33cc1aa8 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/conv_prepack.cpp @@ -0,0 +1,151 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +template +c10::intrusive_ptr> PackedConvWeightCudnn< + kSpatialDim>:: + prepack( + at::Tensor weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose) { + TORCH_CHECK(weight.qscheme() == c10::kPerTensorAffine, "Unsupported qscheme: ", toString(weight.qscheme())); + TORCH_CHECK( + weight.ndimension() == kSpatialDim + 2, + "Weights are expected to have ", + kSpatialDim + 2, + " dimensions"); + TORCH_CHECK( + stride.size() == kSpatialDim, + "stride should contain ", + kSpatialDim, + " elements for ", + kSpatialDim, + "D convolution."); + TORCH_CHECK( + padding.size() == kSpatialDim, + "quantized::conv_prepack (cudnn): Specify front/top/left padding only. " + "end/bottom/right padding assumed to be equal to front/top/left"); + TORCH_CHECK( + !transpose || output_padding.size() == kSpatialDim, + "quantized::conv_prepack: Specify top/left output padding " + "only. bottom/right padding assumed to be equal to top/left"); + TORCH_CHECK( + dilation.size() == kSpatialDim, + "quantized::conv_prepack (cudnn): dilation should contain ", + kSpatialDim, + " elements for ", + kSpatialDim, + "D convolution."); + const int output_channels = transpose ? weight.size(1) * groups + : weight.size(0); + const auto qtype = weight.qscheme(); + if (bias.has_value()) { + TORCH_CHECK(bias.value().dim() == 1, "bias should be a vector (1D Tensor)"); + TORCH_CHECK( + bias.value().size(0) == output_channels, + "bias should have K elements: " + std::to_string(output_channels)); + // TODO: we create a broadcasted_bias tensor later so I think we don't need to make this contiguous here. + // we will revisit this when nvidia adds proper support for broadcasting + // bias_contig = bias->contiguous(); + } + + auto ret_ptr = c10::make_intrusive>( + weight.contiguous(c10::MemoryFormat::ChannelsLast), // TODO: this assumes 2D I think. make it more general? 
+ bias, + stride, + padding, + output_padding, + dilation, + groups, + transpose, + qtype); + return ret_ptr; +} + +template +c10::intrusive_ptr> PackedConvWeightCudnn< + 2>:: + prepack( + at::Tensor weight, + c10::optional bias_in, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose); + +namespace at { +namespace native { +namespace { + +template +class QConvPackWeightInt8Cudnn final { + public: + static c10::intrusive_ptr> run_conv( + Tensor weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List dilation, + int64_t groups) { + torch::List output_padding; + output_padding.reserve(kSpatialDim); + for (const auto idx : c10::irange(kSpatialDim)) { + (void)idx; //Suppress unused variable warning + output_padding.push_back((int64_t)0); + } + return _run(weight, bias, stride, padding, output_padding, dilation, groups, + /*transpose=*/false); + } + + private: + static c10::intrusive_ptr> _run( + Tensor weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose) { + return PackedConvWeightCudnn::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, + transpose); + } +}; + +TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::conv2d_prepack"), TORCH_FN(QConvPackWeightInt8Cudnn<2>::run_conv)); +} + +} // namespace +} // namespace native +} // namespace at + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/cudnn/conv_unpack_impl.cpp b/aten/src/ATen/native/quantized/cudnn/conv_unpack_impl.cpp new file mode 100644 index 00000000000000..ca9611dca89066 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/conv_unpack_impl.cpp @@ -0,0 +1,28 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include + +#include + +template +std::tuple> PackedConvWeightCudnn< + kSpatialDim>::unpack() { + return std::tuple>{orig_weight_, bias_}; +} + +template std::tuple> PackedConvWeightCudnn< + 2>::unpack(); + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/cudnn/linear_prepack.cpp b/aten/src/ATen/native/quantized/cudnn/linear_prepack.cpp new file mode 100644 index 00000000000000..3541ce9b7d80a5 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/linear_prepack.cpp @@ -0,0 +1,63 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include +#include +#include +#include +#include + +c10::intrusive_ptr PackedLinearWeightCudnn::prepack( + at::Tensor weight, + c10::optional bias) { + TORCH_CHECK(weight.qscheme() == c10::kPerTensorAffine, "Unsupported qscheme: ", toString(weight.qscheme())); + const int output_channels = weight.size(0); + const auto qtype = weight.qscheme(); + if (bias.has_value()) { + TORCH_CHECK(bias.value().dim() == 1, "bias should be a vector (1D Tensor)"); + TORCH_CHECK( + bias.value().size(0) == output_channels, + "bias should have K elements: " + std::to_string(output_channels)); + } + + auto ret_ptr = c10::make_intrusive( + weight, + bias, + qtype); + return ret_ptr; +} + +namespace at { +namespace native { +namespace { + +class QLinearPackWeightInt8Cudnn final { 
+ public: + static c10::intrusive_ptr run( + at::Tensor weight, + c10::optional bias) { + return PackedLinearWeightCudnn::prepack(std::move(weight), std::move(bias)); + } +}; + +TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::linear_prepack"), TORCH_FN(QLinearPackWeightInt8Cudnn::run)); +} + + +} // namespace +} // namespace native +} // namespace at + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/cudnn/linear_unpack_impl.cpp b/aten/src/ATen/native/quantized/cudnn/linear_unpack_impl.cpp new file mode 100644 index 00000000000000..ebf77b0294d872 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/linear_unpack_impl.cpp @@ -0,0 +1,23 @@ +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include + +#include + +std::tuple> PackedLinearWeightCudnn::unpack() { + return std::tuple>{orig_weight, bias_}; +} + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/cudnn/utils.h b/aten/src/ATen/native/quantized/cudnn/utils.h new file mode 100644 index 00000000000000..c5fdcd99f122d7 --- /dev/null +++ b/aten/src/ATen/native/quantized/cudnn/utils.h @@ -0,0 +1,304 @@ +#pragma once +/* +This file contains some of the auxiliary functions used by both Conv.cpp & Linear.cpp (introduced in a later PR) +*/ + +#ifdef USE_CUDA +#include // for the definition of AT_CUDNN_ENABLED + +#if AT_CUDNN_ENABLED() + +#include + +#if HAS_CUDNN_V8() + +#include +#include +#include +#include +#include +#include + +struct TORCH_API PackedLinearWeightCudnn : public LinearPackedParamsBase { + PackedLinearWeightCudnn( + at::Tensor orig_weight, + c10::optional bias, + c10::QScheme q_scheme) + : orig_weight(std::move(orig_weight)), + bias_(std::move(bias)), + q_scheme(std::move(q_scheme)) {} + + at::Tensor apply( + at::Tensor input, + double output_scale, + int64_t output_zero_point) override; + at::Tensor apply_relu( + at::Tensor input, + double output_scale, + int64_t output_zero_point) override; + + at::Tensor apply_dynamic(at::Tensor input, bool reduce_range = false) override { + throw std::runtime_error( + "apply_relu_out is not implemented for this packed " + "parameter type"); + } + at::Tensor apply_dynamic_relu(at::Tensor input, bool reduce_range = false) override { + throw std::runtime_error( + "apply_relu_out is not implemented for this packed " + "parameter type"); + } + + std::tuple> unpack() override; + + c10::optional bias() override { + return bias_; + } + + static c10::intrusive_ptr prepack( + at::Tensor weight, + c10::optional bias); + + private: + at::Tensor orig_weight; + c10::optional bias_; + c10::QScheme q_scheme; + + template + at::Tensor apply_impl( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point); + + template + void apply_impl_helper( + const at::Tensor& quantized_output, + const at::Tensor& input, + double output_scale); +}; + +template +struct TORCH_API PackedConvWeightCudnn : public ConvPackedParamsBase { + PackedConvWeightCudnn( + at::Tensor orig_weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose, + c10::QScheme q_scheme) + : orig_weight_(std::move(orig_weight)), + bias_(std::move(bias)), + stride_(std::move(stride)), + padding_(std::move(padding)), + 
output_padding_(std::move(output_padding)), + dilation_(std::move(dilation)), + groups_(groups), + transpose_(transpose), + q_scheme_(q_scheme) {} + + at::Tensor apply( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) override; + + at::Tensor apply_relu( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) override; + + at::Tensor apply_dynamic( + const at::Tensor& input, + bool reduce_range) { + TORCH_CHECK(false, "apply_dynamic is currently not reported"); + } + + at::Tensor apply_dynamic_relu( + const at::Tensor& input, + bool reduce_range) { + TORCH_CHECK(false, "apply_dynamic_relu is currently not reported"); + } + + std::tuple> unpack() override; + + static c10::intrusive_ptr> prepack( + at::Tensor weight, + c10::optional bias, + torch::List stride, + torch::List padding, + torch::List output_padding, + torch::List dilation, + int64_t groups, + bool transpose); + + const float* GetBiasData(at::Tensor* bias); + + torch::List stride() const override { + return stride_; + } + + torch::List padding() const override { + return padding_; + } + + torch::List output_padding() const override { + return output_padding_; + } + + torch::List dilation() const override { + return dilation_; + } + + int64_t groups() const override { + return groups_; + } + + bool transpose() const override { + return transpose_; + } + + private: + at::Tensor orig_weight_; + c10::optional bias_; + torch::List stride_; + torch::List padding_; + torch::List output_padding_; + torch::List dilation_; + int64_t groups_; + bool transpose_; + c10::QScheme q_scheme_; + + template + at::Tensor apply_impl( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point); + + template + void apply_impl_helper( + const at::Tensor& quantized_output, + const at::Tensor& input, + double output_scale); +}; + +namespace cudnn_utils { +namespace { + +uint8_t getAlignment(const at::Tensor &t) { + // alignment are in bytes + uint8_t alignment = 1; + uintptr_t address = reinterpret_cast(t.data_ptr()); + while (address % alignment == 0 && alignment < 16) alignment *= 2; + return alignment; +} + +cudnn_frontend::Tensor getTensorDescriptor(const at::Tensor &t, int64_t id, uint8_t alignment) { + auto shape = t.sizes(); + auto strides = t.strides(); + return cudnn_frontend::TensorBuilder() + .setDim(shape.size(), shape.data()) + .setStrides(strides.size(), strides.data()) + .setId(id) + .setAlignment(alignment) + .setDataType(at::native::getCudnnDataType(t)) + .build(); +} + +cudnn_frontend::Tensor getTensorDescriptor(const c10::IntArrayRef& shape, const c10::IntArrayRef& strides, cudnnDataType_t cudnn_dtype, int64_t id, uint8_t alignment) { + return cudnn_frontend::TensorBuilder() + .setDim(shape.size(), shape.data()) + .setStrides(strides.size(), strides.data()) + .setId(id) + .setAlignment(alignment) + .setDataType(cudnn_dtype) + .build(); +} + +// TODO: there is a table from input dtype to operator dtype, we can derive +// the operator dtype based on input dtype +cudnn_frontend::PointWiseDesc_v8 getPointWiseMulDescriptor(cudnnDataType_t dataType) { + return cudnn_frontend::PointWiseDescBuilder() + .setMode(cudnnPointwiseMode_t::CUDNN_POINTWISE_MUL) + .setMathPrecision(dataType) + .build(); +} + +// TODO: there is a table from input dtype to operator dtype, we can derive +// the operator dtype based on input dtype +cudnn_frontend::PointWiseDesc_v8 getPointWiseAddDescriptor(cudnnDataType_t dataType) { + return cudnn_frontend::PointWiseDescBuilder() + 
.setMode(cudnnPointwiseMode_t::CUDNN_POINTWISE_ADD) + .setMathPrecision(dataType) + .build(); +} + +// TODO: there is a table from input dtype to operator dtype, we can derive +// the operator dtype based on input dtype +cudnn_frontend::PointWiseDesc_v8 getPointWiseReluDescriptor(cudnnDataType_t dataType) { + return cudnn_frontend::PointWiseDescBuilder() + .setMode(cudnnPointwiseMode_t::CUDNN_POINTWISE_RELU_FWD) + .setMathPrecision(dataType) + .build(); +} + + +void filterEngineConfigs( + cudnn_frontend::EngineConfigList &from, + cudnn_frontend::EngineConfigList &to, + bool deterministic, bool allow_tf32, c10::ScalarType scalar_type) +{ + auto filter = [=](cudnnBackendDescriptor_t c) { + if (deterministic) { + if (cudnn_frontend::hasNumericalNote(c)) return true; + } + if (scalar_type == at::kFloat || scalar_type == at::kChar || !allow_tf32) { + if (cudnn_frontend::hasNumericalNote(c)) return true; + if (cudnn_frontend::hasNumericalNote(c)) return true; + } + return false; + }; + cudnn_frontend::filter(from, to, filter); +} + + +cudnn_frontend::ExecutionPlan get_execplan_from_heuristics_else_fall_back(cudnn_frontend::OperationGraph&& opGraph, cudnnHandle_t handle_) { + auto heuristics = cudnn_frontend::EngineHeuristicsBuilder() + .setOperationGraph(opGraph) + .setHeurMode(CUDNN_HEUR_MODE_INSTANT) + .build(); + + // std::cout << "Heuristic has " << heuristics.getEngineConfigCount() << " configurations " << std::endl; + auto& engine_config = heuristics.getEngineConfig(heuristics.getEngineConfigCount()); + + // Try engine configs returned by the heuristics and pick up the first one that works. + for (auto& ecfg : engine_config) { + try { + auto plan = cudnn_frontend::ExecutionPlanBuilder() + .setHandle(handle_) + .setEngineConfig(ecfg, opGraph.getTag()) + .build(); + return plan; + } catch (cudnn_frontend::cudnnException& e) { + continue; + } + } + + { + auto total_engines = opGraph.getEngineCount(); + // std::cout << opGraph.describe() << " has " << total_engines << " engines." << std::endl; + auto engine = cudnn_frontend::EngineBuilder().setGlobalEngineIdx(0).setOperationGraph(opGraph).build(); + // std::cout << engine.describe() << std::endl; + + auto engine_config = cudnn_frontend::EngineConfigBuilder().setEngine(engine).build(); + // std::cout << engine_config.describe() << std::endl; + + return cudnn_frontend::ExecutionPlanBuilder().setHandle(handle_).setEngineConfig(engine_config).build(); + } +} +} // anonymous +} // cudnn_utils + +#endif // HAS_CUDNN_V8 +#endif // AT_CUDNN_ENABLED +#endif // USE_CUDA diff --git a/aten/src/ATen/native/quantized/library.cpp b/aten/src/ATen/native/quantized/library.cpp index 74486fc7ee0c5d..b1106bc1f616d6 100644 --- a/aten/src/ATen/native/quantized/library.cpp +++ b/aten/src/ATen/native/quantized/library.cpp @@ -1,7 +1,6 @@ #include -#include -#include +#include #include #include @@ -189,11 +188,6 @@ TORCH_LIBRARY(quantized, m) { m.def(TORCH_SELECTIVE_SCHEMA("quantized::relu6(Tensor qx, bool inplace=False) -> Tensor")); m.def(TORCH_SELECTIVE_SCHEMA("quantized::leaky_relu(Tensor qx, Scalar negative_slope, bool inplace, float output_scale, int output_zero_point) -> Tensor")); m.def(TORCH_SELECTIVE_SCHEMA("quantized::sigmoid(Tensor qx, float output_scale, int output_zero_point) -> Tensor")); - - // quantized ops implemented in cudnn, with QuantizedCUDA dispatch - // TODO: use the same signature as quantized::conv2d - m.def(TORCH_SELECTIVE_SCHEMA("quantized::conv2d_cudnn(Tensor act, Tensor weight, Tensor? 
bias, int[] stride, int[] padding, int[] dilation, int groups, float output_scale, int output_zero_point) -> Tensor")); - m.def(TORCH_SELECTIVE_SCHEMA("quantized::conv2d_relu_cudnn(Tensor act, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, int groups, float output_scale, int output_zero_point) -> Tensor")); } // According to #33294: The "_" prefix registration will be diff --git a/aten/src/ATen/native/quantized/cpu/packed_params.h b/aten/src/ATen/native/quantized/packed_params.h similarity index 71% rename from aten/src/ATen/native/quantized/cpu/packed_params.h rename to aten/src/ATen/native/quantized/packed_params.h index 85d6ffcde17e1c..64d8ec840c4646 100644 --- a/aten/src/ATen/native/quantized/cpu/packed_params.h +++ b/aten/src/ATen/native/quantized/packed_params.h @@ -1,5 +1,6 @@ #pragma once +#include #include struct LinearPackedParamsBase : public torch::jit::CustomClassHolder { @@ -71,3 +72,27 @@ struct LinearPackedParamsBase : public torch::jit::CustomClassHolder { "parameter type"); } }; + +template +struct ConvPackedParamsBase : public torch::jit::CustomClassHolder { + virtual at::Tensor apply( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) = 0; + virtual at::Tensor apply_relu( + const at::Tensor& input, + double output_scale, + int64_t output_zero_point) = 0; + virtual at::Tensor apply_dynamic( + const at::Tensor& input, + bool reduce_range) = 0; + + virtual std::tuple> unpack() = 0; + + virtual torch::List stride() const = 0; + virtual torch::List padding() const = 0; + virtual torch::List output_padding() const = 0; + virtual torch::List dilation() const = 0; + virtual int64_t groups() const = 0; + virtual bool transpose() const = 0; +}; diff --git a/aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp b/aten/src/ATen/native/quantized/qconv_unpack.cpp similarity index 63% rename from aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp rename to aten/src/ATen/native/quantized/qconv_unpack.cpp index e4855062e360d9..062fc8a0522aca 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp +++ b/aten/src/ATen/native/quantized/qconv_unpack.cpp @@ -1,124 +1,21 @@ +/* +The dispatch registrations at the end of this file applies to fbgemm, qnnpack, and cudnn backends. +The correct unpack backend function is determined using runtime polymorphism through the packed_weight pointer, +which is of type intrusive_ptr> and points to either a PackedConvWeightsQnnp, +PackedConvWeights (Fbgemm), or PackedConvWeightsCudnn at runtime, which all inherit from ConvPackedParamsBase. +The implementations for the unpack functions can be found in /cpu/qconv_unpack_impl.cpp, for fbgemm&qnnpack +and /cudnn/conv_unpack_impl.cpp, for cudnn. +*/ + #include -#include #include #include #include #include +#include #include -#include - -#ifdef USE_FBGEMM -template -std::tuple> PackedConvWeight< - kSpatialDim>::unpack() { - auto* packed_weights_p = w.get(); - // output channels - const int output_channels = packed_weights_p->outputChannels(); - const int input_channels = packed_weights_p->inputChannels(); - const int groups = packed_weights_p->groups(); - - const int kernel_d = kSpatialDim == 2 ? 
1 : kernel[0]; - // R (kernel height) - const int kernel_h = kernel[kSpatialDim - 2]; - // S (kernel width) - const int kernel_w = kernel[kSpatialDim - 1]; - - const int C_per_G = input_channels / groups; - - // Tensor for unpacked weights - // Unpacked format would be physical KRS(C/G) but logical KCRS (channels - // first) because that's how - // ChannelsLast3d is not available now.FBGEMM stores the weights - // TODO: Unify 2d and 3d when ChannelsLast3d is ready. - at::Tensor unpacked_weights; - if (q_scheme == c10::kPerTensorAffine) { - unpacked_weights = kSpatialDim == 2 - ? at::_empty_affine_quantized( - {output_channels, C_per_G, kernel_h, kernel_w}, - device(c10::kCPU) - .dtype(c10::kQInt8) - .memory_format(c10::MemoryFormat::ChannelsLast), - w_scale[0], - w_zp[0], - c10::nullopt) - : at::native::fbgemm_utils:: - MakeEmptyAffineQuantizedChannelsLast3dTensor( - output_channels, - C_per_G, - kernel_d, - kernel_h, - kernel_w, - device(c10::kCPU).dtype(c10::kQInt8), - w_scale[0], - w_zp[0]); - } else if (q_scheme == c10::kPerChannelAffine) { - TORCH_CHECK( - !transpose(), - "Per Channel Quantization is currently disabled for transposed conv"); - auto scales = at::from_blob( - w_scale.data(), w_scale.size(), device(c10::kCPU).dtype(c10::kFloat)); - auto zero_points = at::from_blob( - w_zp.data(), w_zp.size(), device(c10::kCPU).dtype(c10::kInt)); - unpacked_weights = kSpatialDim == 2 - ? at::_empty_per_channel_affine_quantized( - {output_channels, C_per_G, kernel_h, kernel_w}, - scales.toType(c10::kDouble), - zero_points.toType(c10::kLong), - 0, /* The output channel axis is 0 */ - device(c10::kCPU).dtype(c10::kQInt8), - c10::MemoryFormat::ChannelsLast) - : at::native::fbgemm_utils:: - MakeEmptyPerChannelAffineQuantizedChannelsLast3dTensor( - output_channels, - C_per_G, - kernel_d, - kernel_h, - kernel_w, - device(c10::kCPU).dtype(c10::kQInt8), - scales.toType(c10::kDouble), - zero_points.toType(c10::kLong)); - } else { - TORCH_CHECK(false, "Unsupported qscheme: ", toString(q_scheme)); - } - int8_t* unpacked_weights_p = - reinterpret_cast(unpacked_weights.data_ptr()); - packed_weights_p->unpack(unpacked_weights_p); - if(transpose()){ - unpacked_weights = - at::native::fbgemm_utils::TransposeConvTensorUnpackConversion< - kSpatialDim>(unpacked_weights, groups); - } - return std::tuple>( - unpacked_weights, bias); -} - -template std::tuple> PackedConvWeight< - 2>::unpack(); -template std::tuple> PackedConvWeight< - 3>::unpack(); -#endif // USE_FBGEMM - -#ifdef USE_PYTORCH_QNNPACK -template -std::tuple> PackedConvWeightsQnnp< - kSpatialDim>::unpack() { - TORCH_CHECK( - kSpatialDim == 2, - "QNNPACK only supports conv2d_unpack right " - "now."); - TORCH_CHECK( - orig_weight.defined(), - "Cannot unpack weights. 
" - "Call at::globalContext()::setReleaseOriginalWeights(false) before packing or loading to enable unpacking."); - return std::tuple>(orig_weight, bias); -} - -template std::tuple> PackedConvWeightsQnnp< - 2>::unpack(); -template std::tuple> PackedConvWeightsQnnp< - 3>::unpack(); -#endif // USE_PYTORCH_QNNPACK +#include namespace at { namespace native { @@ -154,6 +51,12 @@ class QConvUnpackWeightsInt8 final { } #endif +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + return packed_weight->unpack(); + } +#endif + TORCH_CHECK( false, "Didn't find engine for operation quantized::conv2d_unpack ", @@ -185,6 +88,15 @@ class QConv1dUnpackWeightsInt8 final { } #endif +#if AT_MKLDNN_ENABLED() + if (ctx.qEngine() == at::QEngine::ONEDNN) { + std::tie(weight, bias) = packed_weight->unpack(); + at::Tensor new_weight = weight.clone(); + new_weight.squeeze_(quant_utils::kConv1dSqueezeDim + 2); + return std::tuple>(new_weight, bias); + } +#endif + TORCH_CHECK( false, "Didn't find engine for operation quantized::conv1d_unpack ", @@ -252,7 +164,7 @@ unpack_quantized_prepacked_sizes_conv2d(const IValue& ivalue) { at::Tensor weight; c10::optional bias; std::tie(weight, bias) = params->unpack(); - c10::optional bias_sizes = c10::nullopt; + at::OptionalIntArrayRef bias_sizes = c10::nullopt; if (bias && bias->defined()) { bias_sizes = bias->sizes(); } diff --git a/aten/src/ATen/native/quantized/qlinear_unpack.cpp b/aten/src/ATen/native/quantized/qlinear_unpack.cpp new file mode 100644 index 00000000000000..cfcd0589f03cec --- /dev/null +++ b/aten/src/ATen/native/quantized/qlinear_unpack.cpp @@ -0,0 +1,77 @@ +/* +The dispatch registrations at the end of this file applies to fbgemm, qnnpack, and cudnn backends. +The correct unpack backend function is determined using runtime polymorphism through the packed_weight pointer, +which is of type intrusive_ptr and points to either a PackedLinearWeightsQnnp, +PackedLinearWeights (Fbgemm), or PackedLinearWeightsCudnn at runtime, which all inherit from LinearPackedParamsBase. +The implementations for the unpack functions can be found in /cpu/qlinear_unpack_impl.cpp, for fbgemm&qnnpack +and /cudnn/linear_unpack_impl.cpp, for cudnn. +*/ +#include +#include +#include +#include +#include +#include + +namespace at { +namespace native { +namespace { + +class QLinearUnpackWeightInt8 final { + public: + static std::tuple> run( + const c10::intrusive_ptr& packed_weight) { + return packed_weight->unpack(); + } +}; + +class QLinearUnpackWeightFp16 final { + public: + static std::tuple> run( + const c10::intrusive_ptr& packed_weight) { + auto& ctx = at::globalContext(); + + TORCH_CHECK( + ctx.qEngine() != at::QEngine::QNNPACK, + "quantized::linear_unpack_fp16 is currently " + "not supported by QNNPACK"); + + return packed_weight->unpack(); + } +}; + +class QLinearUnpackWeightInt8Legacy final { + public: + static std::tuple> run( + const at::Tensor& packed_weight) { + TORCH_CHECK(false, + "quantized.linear_unpack(Tensor) is unsupported! Please " + "upgrade your model to use the newer quantized.linear_" + "unpack(LinearPackedParamsBase) overload"); + } +}; + +class QLinearUnpackWeightFp16Legacy final { + public: + static std::tuple> run( + const at::Tensor& packed_weight) { + TORCH_CHECK(false, + "quantized.linear_unpack(Tensor) is unsupported! 
Please " + "upgrade your model to use the newer quantized.linear_" + "unpack(LinearPackedParamsBase) overload"); + } +}; + +TORCH_LIBRARY_IMPL(quantized, CPU, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack.legacy"), TORCH_FN(QLinearUnpackWeightInt8Legacy::run)); + m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack_fp16.legacy"), TORCH_FN(QLinearUnpackWeightFp16Legacy::run)); +} + +TORCH_LIBRARY_IMPL(quantized, CatchAll, m) { + m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack"), TORCH_FN(QLinearUnpackWeightInt8::run)); + m.impl(TORCH_SELECTIVE_NAME("quantized::linear_unpack_fp16"), TORCH_FN(QLinearUnpackWeightFp16::run)); +} + +} // namespace +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/sparse/SparseCsrTensor.cpp b/aten/src/ATen/native/sparse/SparseCsrTensor.cpp index f91d9648e7db9d..24a90826bc1a7e 100644 --- a/aten/src/ATen/native/sparse/SparseCsrTensor.cpp +++ b/aten/src/ATen/native/sparse/SparseCsrTensor.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #ifndef AT_PER_OPERATOR_HEADERS #include @@ -56,29 +57,51 @@ void _validate_sparse_csr_tensor_args(const Tensor& crow_indices, const Tensor& // Shape and Strides invariants TORCH_CHECK( - size.size() == 2, - "size of a CSR tensor must be of length 2, but got: ", + size.size() >= 2, + "size of a batched CSR tensor must have length >= 2, but got: ", size.size()); TORCH_CHECK( - crow_indices.dim() == 1, - "crow_indices must have dim=1 but got crow_indices.dim()=", + crow_indices.dim() >= 1, + "crow_indices must have dim >= 1 but got crow_indices.dim() = ", crow_indices.dim()); TORCH_CHECK( - col_indices.dim() == 1, - "col_indices must have dim=1 but got col_indices.dim()=", + col_indices.dim() >= 1, + "col_indices must have dim >= 1 but got col_indices.dim() = ", col_indices.dim()); TORCH_CHECK( - values.dim() == 1, - "values must have dim=1 but got values.dim()=", + values.dim() >= 1, + "values must have dim >= 1 but got values.dim() = ", values.dim()); - // Note, this check also enforces `crow_indices.numel() >= 1` + + TORCH_CHECK( + crow_indices.dim() == col_indices.dim(), + "Number of dimensions of crow_indices and col_indices must be the same."); + TORCH_CHECK( + crow_indices.dim() == values.dim(), + "Number of dimensions of indices and values must be the same."); + TORCH_CHECK( + static_cast(crow_indices.dim()) == size.size() - 1, + "Number of dimensions of indices must be one less than the number of dimensions of the provided size."); + + // All batch sizes must be the same + auto batch_size = size.slice(0, size.size() - 2); + auto crow_indices_batch_size = crow_indices.sizes().slice(0, crow_indices.dim() - 1); + auto col_indices_batch_size = col_indices.sizes().slice(0, col_indices.dim() - 1); + auto values_batch_size = values.sizes().slice(0, values.dim() - 1); + TORCH_CHECK( + batch_size == crow_indices_batch_size && + batch_size == col_indices_batch_size && + batch_size == values_batch_size, + "All batch dimensions of the provided size, indices, and values must be the same."); + + // Note, this check also enforces `crow_indices.size(-1) >= 1` TORCH_CHECK( - crow_indices.numel() == (size[0] + 1), - "crow_indices.numel() must be size(0) + 1, but got: ", - crow_indices.numel()); + crow_indices.size(-1) == (size[size.size() - 2] + 1), + "crow_indices.size(-1) must be equal to size[-2] + 1 (that is ", size[size.size() - 2] + 1, "), but got: ", + crow_indices.size(-1)); TORCH_CHECK( col_indices.numel() == values.numel(), - "col_indices and values must have equal sizes, but got 
col_indices.numel(): ", + "col_indices and values must have the same number of elements, but got col_indices.numel(): ", col_indices.numel(), ", values.numel(): ", values.numel()); @@ -86,22 +109,28 @@ void _validate_sparse_csr_tensor_args(const Tensor& crow_indices, const Tensor& // Indices invariants AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "csr_construct_check", [&] { Tensor crow_indices_cpu = crow_indices.to(kCPU); - auto crow_indices_accessor = crow_indices_cpu.accessor(); - TORCH_CHECK( - crow_indices_accessor[0] == 0, "0th value of crow_indices must be 0."); - - TORCH_CHECK( - crow_indices_accessor[crow_indices.numel() - 1] == col_indices.numel(), - "last value of crow_indices should be equal to the length of col_indices."); - - for (int i = 1; i <= size[0]; i++) { + auto crow_indices_data_ptr = crow_indices_cpu.data_ptr(); + auto batch_stride = crow_indices_cpu.dim() >= 2 ? crow_indices_cpu.stride(-2) : 0; + for (const auto batch_id : c10::irange(batchCount(crow_indices_cpu))) { + TORCH_CHECK( + crow_indices_data_ptr[batch_id*batch_stride] == 0, + "(Batch element ", batch_id, ") ", + ": 0th value of crow_indices must be 0, but it is ", crow_indices_data_ptr[batch_id*batch_stride]); TORCH_CHECK( - crow_indices_accessor[i - 1] <= crow_indices_accessor[i], - "at position i = ", i, ", this condition crow_indices[i - 1] <= crow_indices[i] fails"); + crow_indices_data_ptr[batch_id*batch_stride + crow_indices.size(-1) - 1] == col_indices.size(-1), + "(Batch element ", batch_id, ") ", + "last value of crow_indices should be equal to the length of col_indices."); + + for (int i = 1; i <= size[size.size() - 2]; i++) { + TORCH_CHECK( + crow_indices_data_ptr[batch_id*batch_stride + i - 1] <= crow_indices_data_ptr[batch_id*batch_stride + i], + "(Batch element ", batch_id, ") ", + "at position i = ", i, ", the condition crow_indices[i - 1] <= crow_indices[i] fails"); + } } if (col_indices.numel() > 0) { TORCH_CHECK(0 <= col_indices.min().item(), "col_indices.min() should be greater or equal to zero"); - TORCH_CHECK(size[1] > col_indices.max().item(), "size(1) should be greater than col_indices.max()"); + TORCH_CHECK(size[size.size() - 1] > col_indices.max().item(), "size[-1] should be greater than col_indices.max()"); } }); @@ -213,13 +242,10 @@ Tensor sparse_csr_tensor( c10::optional pin_memory) { // See [Note: hacky wrapper removal for TensorOptions] TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); - std::array size = {0, 0}; - if (col_indices.numel() > 0) { - AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "csr_construct_check", [&] { - size[0] = crow_indices.numel() - 1; - size[1] = col_indices.max().item() + 1; - }); - } + // std::array size = {0, 0}; + auto size = DimVector(IntArrayRef(col_indices.sizes().data(), col_indices.dim() - 1)); + size.push_back(crow_indices.size(-1) - 1); + size.push_back(col_indices.max().item() + 1); at::native::_validate_sparse_csr_tensor_args(crow_indices, col_indices, values, size); @@ -243,16 +269,21 @@ Tensor empty_sparse_csr( c10::optional optional_memory_format) { check_size_nonnegative(size); - TORCH_CHECK(size.size() == 2, "torch.empty: Only 2D sparse CSR tensors are supported."); + TORCH_CHECK(size.size() >= 2, "torch.empty: Only batched sparse CSR matrices are supported, but got size ", size); TORCH_INTERNAL_ASSERT_DEBUG_ONLY(layout == Layout::SparseCsr); - auto rows = size[0]; + auto rows = size[size.size() - 2]; int64_t nnz = 0; + auto crow_indices_size = 
DimVector(size.slice(0, size.size() - 2)); + crow_indices_size.push_back(rows + 1); + auto col_indices_values_size = DimVector(size.slice(0, size.size() - 2)); + col_indices_values_size.push_back(nnz); + TensorOptions options = TensorOptions().dtype(ScalarType::Long).layout(Layout::Strided).device(device).pinned_memory(pin_memory); - auto crow_indices = at::empty({rows + 1}, options); - auto col_indices = at::empty({nnz}, options); - auto values = at::empty({nnz}, options.dtype(dtype)); + auto crow_indices = at::empty(crow_indices_size, options); + auto col_indices = at::empty(col_indices_values_size, options); + auto values = at::empty(col_indices_values_size, options.dtype(dtype)); return at::native::_sparse_csr_tensor_unsafe( crow_indices, @@ -270,13 +301,13 @@ const Tensor& resize_sparse_csr_( IntArrayRef size, c10::optional optional_memory_format) { check_size_nonnegative(size); - TORCH_CHECK(size.size() == 2, "torch.resize_: Only 2D sparse CSR tensors are supported."); + TORCH_CHECK(size.size() >= 2, "torch.resize_: Only batched sparse CSR matrices are supported, but got size ", size); TORCH_CHECK( - self.size(1) <= size[1], + self.size(-1) <= size[size.size() - 1], "torch.resize_: Resizing columns of sparse CSR tensors to a smaller value is not supported. ", "The original number of columns is ", - self.size(1), - " while the requested new number of columns is ", size[1], "."); + self.size(-1), + " while the requested new number of columns is ", size[size.size() - 1], "."); get_sparse_csr_impl(self)->resize_(self._nnz(), size); return self; } diff --git a/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp b/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp index d5d9ead612edee..6cccaf098d4434 100644 --- a/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp +++ b/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp @@ -1,16 +1,17 @@ #define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include #include #include #include #include #include +#include #include #include #include #include #include #include +#include #include #ifndef AT_PER_OPERATOR_HEADERS @@ -22,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -89,10 +91,11 @@ #include #include #include +#include #include #include -#include #include +#include #endif #include @@ -100,19 +103,22 @@ namespace at { namespace meta { -TORCH_META_FUNC(_convert_indices_from_coo_to_csr) ( - const Tensor& self, const int64_t size, const bool out_int32 -) { +TORCH_META_FUNC(_convert_indices_from_coo_to_csr) +(const Tensor& self, const int64_t size, const bool out_int32) { TORCH_CHECK(self.dim() <= 1, "Input is supposed to be a vector"); ScalarType scalar_type = out_int32 ? ScalarType::Int : ScalarType::Long; - c10::TensorOptions options = TensorOptions().device(self.options().device()).dtype(scalar_type); + c10::TensorOptions options = + TensorOptions().device(self.options().device()).dtype(scalar_type); set_output(size + 1, options); } -TORCH_META_FUNC(_convert_indices_from_csr_to_coo) ( - const Tensor& crow_indices, const Tensor& col_indices, const bool out_int32, const bool transpose -) { - TORCH_CHECK(crow_indices.dim() == 1, "crow_indices is supposed to be a vector"); +TORCH_META_FUNC(_convert_indices_from_csr_to_coo) +(const Tensor& crow_indices, + const Tensor& col_indices, + const bool out_int32, + const bool transpose) { + TORCH_CHECK( + crow_indices.dim() == 1, "crow_indices is supposed to be a vector"); TORCH_CHECK(col_indices.dim() == 1, "col_indices is supposed to be a vector"); ScalarType scalar_type = out_int32 ? 
ScalarType::Int : ScalarType::Long; c10::TensorOptions options = crow_indices.options().dtype(scalar_type); @@ -126,7 +132,10 @@ namespace { constexpr int64_t GRAIN_SIZE = at::internal::GRAIN_SIZE; template -void convert_indices_from_coo_to_csr_cpu(const Tensor& result, const Tensor& input, const int64_t size) { +void convert_indices_from_coo_to_csr_cpu( + const Tensor& result, + const Tensor& input, + const int64_t size) { int64_t numel = input.numel(); const input_t* data_in = input.data_ptr(); output_t* data_out = result.data_ptr(); @@ -175,7 +184,7 @@ Tensor& unary_op_out(F op_out, const Tensor& self, Tensor& result) { return result; } -template +template Tensor& unary_op_inplace(Tensor& self, const F& op_inplace, Args&&... args) { TORCH_INTERNAL_ASSERT(self.is_sparse_csr()); @@ -185,7 +194,11 @@ Tensor& unary_op_inplace(Tensor& self, const F& op_inplace, Args&&... args) { } template -void convert_indices_from_csr_to_coo_cpu(const Tensor& indices, const Tensor& crow_indices, const Tensor& col_indices, const bool transpose=false) { +void convert_indices_from_csr_to_coo_cpu( + const Tensor& indices, + const Tensor& crow_indices, + const Tensor& col_indices, + const bool transpose = false) { int64_t nrows = crow_indices.numel() - 1; if (nrows == 0) { indices.zero_(); @@ -194,16 +207,18 @@ void convert_indices_from_csr_to_coo_cpu(const Tensor& indices, const Tensor& cr auto crow_indices_ = crow_indices.expect_contiguous(); const input_t* crow_indices_data_in = crow_indices_->data_ptr(); TORCH_INTERNAL_ASSERT(indices.is_contiguous()); - auto row0 = indices.select(0, transpose?1:0); - auto row1 = indices.select(0, transpose?0:1); + auto row0 = indices.select(0, transpose ? 1 : 0); + auto row1 = indices.select(0, transpose ? 0 : 1); output_t* data_out = row0.data_ptr(); row1.copy_(*col_indices.expect_contiguous()); at::parallel_for(0, nrows, GRAIN_SIZE, [&](int64_t start, int64_t end) { for (const auto i : c10::irange(start, end)) { - std::fill(&data_out[crow_indices_data_in[i]], &data_out[crow_indices_data_in[i + 1]], static_cast(i)); + std::fill( + &data_out[crow_indices_data_in[i]], + &data_out[crow_indices_data_in[i + 1]], + static_cast(i)); } }); - } } // end anonymous namespace @@ -222,26 +237,27 @@ inline Tensor get_result_tensor_for_unary_op(F op, const Tensor& input) { // To handle type promotion for inputs to unary ops, // we first get the result from the underlined op, and use the result - // to create a sparse CSR tensor, which is used as the input to the out= variant + // to create a sparse CSR tensor, which is used as the input to the out= + // variant auto result_values = op(values); auto result = at::native::_sparse_csr_tensor_unsafe( - input.crow_indices().clone(), - input.col_indices().clone(), - result_values, - input.sizes(), - result_values.scalar_type(), - input.layout(), - result_values.device()); + input.crow_indices().clone(), + input.col_indices().clone(), + result_values, + input.sizes(), + result_values.scalar_type(), + input.layout(), + result_values.device()); return result; } -} +} // namespace static constexpr bool is_mkl_supported() { #ifdef _MSC_VER return false; -#elif __APPLE__ || __MACH__ +#elif __APPLE__ || __MACH__ return false; #else return true; @@ -249,41 +265,46 @@ static constexpr bool is_mkl_supported() { } // Only accept squares sparse matrices or dense input as a vector -// TODO: Check what happens with MKL, the output error reported with non square matrices tends to be high -// See: https://github.com/pytorch/pytorch/issues/58770 +// TODO: Check 
what happens with MKL, the output error reported with non square +// matrices tends to be high See: +// https://github.com/pytorch/pytorch/issues/58770 bool is_square_or_vec(int64_t dim_i, int64_t dim_j, int64_t dim_k) { - return (dim_i == dim_k && dim_k == dim_j) || (dim_i == dim_j && dim_k == 1); + return (dim_i == dim_k && dim_k == dim_j) || (dim_i == dim_j && dim_k == 1); } -Tensor& normal_sparse_csr_(Tensor& self, double mean, double std, c10::optional gen) { +Tensor& normal_sparse_csr_( + Tensor& self, + double mean, + double std, + c10::optional gen) { return unary_op_inplace(self, &Tensor::normal_, mean, std, gen); } /* Implementation of Unary Ufuncs, those supported for Sparse CSR Layout * Only simple funcs, with 0->0 correspondence are currently supported. */ -#define CREATE_UNARY_UFUNC_OUT(op_name) \ - Tensor& op_name##_sparse_csr_out(const Tensor& self, Tensor& result) { \ - return unary_op_out(&at::op_name##_outf, self, result); \ +#define CREATE_UNARY_UFUNC_OUT(op_name) \ + Tensor& op_name##_sparse_csr_out(const Tensor& self, Tensor& result) { \ + return unary_op_out(&at::op_name##_outf, self, result); \ } -#define CREATE_UNARY_UFUNC_FUNCTIONAL(op_name) \ - Tensor op_name##_sparse_csr(const Tensor& self) { \ - return get_result_tensor_for_unary_op(&at::op_name, self); \ +#define CREATE_UNARY_UFUNC_FUNCTIONAL(op_name) \ + Tensor op_name##_sparse_csr(const Tensor& self) { \ + return get_result_tensor_for_unary_op(&at::op_name, self); \ } -#define CREATE_UNARY_UFUNC_INPLACE(op_name) \ - Tensor& op_name##_sparse_csr_(Tensor& self) { \ - return unary_op_inplace(self, &Tensor::op_name##_); \ +#define CREATE_UNARY_UFUNC_INPLACE(op_name) \ + Tensor& op_name##_sparse_csr_(Tensor& self) { \ + return unary_op_inplace(self, &Tensor::op_name##_); \ } -#define CREATE_UNARY_UFUNC(op_name) \ - CREATE_UNARY_UFUNC_OUT(op_name); \ - CREATE_UNARY_UFUNC_FUNCTIONAL(op_name); \ +#define CREATE_UNARY_UFUNC(op_name) \ + CREATE_UNARY_UFUNC_OUT(op_name); \ + CREATE_UNARY_UFUNC_FUNCTIONAL(op_name); \ CREATE_UNARY_UFUNC_INPLACE(op_name); -#define CREATE_UNARY_UFUNC_NO_INPLACE(op_name) \ - CREATE_UNARY_UFUNC_OUT(op_name); \ +#define CREATE_UNARY_UFUNC_NO_INPLACE(op_name) \ + CREATE_UNARY_UFUNC_OUT(op_name); \ CREATE_UNARY_UFUNC_FUNCTIONAL(op_name); // Exhaustive list of the unary ufuncs supported by sparse CSR @@ -339,8 +360,12 @@ CREATE_UNARY_UFUNC_FUNCTIONAL(isnan); CREATE_UNARY_UFUNC_FUNCTIONAL(isinf); template -void addmm_out_sparse_csr_native_cpu(const Tensor& sparse, const Tensor& dense, const Tensor& r, Scalar alpha, Scalar beta) { - +void addmm_out_sparse_csr_native_cpu( + const Tensor& sparse, + const Tensor& dense, + const Tensor& r, + Scalar alpha, + Scalar beta) { auto dim_i = sparse.size(0); auto dim_k = dense.size(1); @@ -350,41 +375,46 @@ void addmm_out_sparse_csr_native_cpu(const Tensor& sparse, const Tensor& dense, scalar_t cast_alpha = alpha.to(); r.mul_(beta); - AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "csr_mm_crow_indices", [&]() { - auto csr_accessor = csr.accessor(); - auto col_indices_accessor = col_indices.accessor(); - - auto values_accessor = values.accessor(); - scalar_t* dense_ptr = dense.data_ptr(); - scalar_t* r_ptr = r.data_ptr(); - - int64_t dense_stride0 = dense.stride(0); - int64_t dense_stride1 = dense.stride(1); - int64_t r_stride0 = r.stride(0); - int64_t r_stride1 = r.stride(1); - - at::parallel_for( - 0, - dim_i, - internal::GRAIN_SIZE, - [&](int64_t irow_start, int64_t irow_end) { - for (index_t h = irow_start; h < irow_end; ++h) { - index_t i_start = 
csr_accessor[h]; - index_t i_end = csr_accessor[h+1]; - for (index_t i = i_start; i < i_end; i++) { - scalar_t val = values_accessor[i]; - index_t col = col_indices_accessor[i]; - at::native::cpublas::axpy(dim_k, - cast_alpha * val, - dense_ptr + col * dense_stride0, dense_stride1, - r_ptr + h * r_stride0, r_stride1); + AT_DISPATCH_INDEX_TYPES( + col_indices.scalar_type(), "csr_mm_crow_indices", [&]() { + auto csr_accessor = csr.accessor(); + auto col_indices_accessor = col_indices.accessor(); + + auto values_accessor = values.accessor(); + scalar_t* dense_ptr = dense.data_ptr(); + scalar_t* r_ptr = r.data_ptr(); + + int64_t dense_stride0 = dense.stride(0); + int64_t dense_stride1 = dense.stride(1); + int64_t r_stride0 = r.stride(0); + int64_t r_stride1 = r.stride(1); + + at::parallel_for( + 0, + dim_i, + internal::GRAIN_SIZE, + [&](int64_t irow_start, int64_t irow_end) { + for (index_t h = irow_start; h < irow_end; ++h) { + index_t i_start = csr_accessor[h]; + index_t i_end = csr_accessor[h + 1]; + for (index_t i = i_start; i < i_end; i++) { + scalar_t val = values_accessor[i]; + index_t col = col_indices_accessor[i]; + at::native::cpublas::axpy( + dim_k, + cast_alpha * val, + dense_ptr + col * dense_stride0, + dense_stride1, + r_ptr + h * r_stride0, + r_stride1); + } } - } - }); - }); + }); + }); } // Functions for matrix multiplication. +// result = beta * self + alpha (mat1 @ mat2) Tensor& addmm_out_sparse_csr_cpu( const Tensor& self, const Tensor& mat1, @@ -392,62 +422,61 @@ Tensor& addmm_out_sparse_csr_cpu( const Scalar& beta, const Scalar& alpha, Tensor& result) { - TORCH_INTERNAL_ASSERT(mat1.is_sparse_csr()); - // TODO: remove this, there are no codegenerated checks for devices yet - TORCH_CHECK( - !self.is_cuda(), - "Expected all tensors to be on the same device. addmm expected 't' to be CPU tensor, but got CUDA tensor"); - TORCH_CHECK( - !result.is_cuda(), - "Expected all tensors to be on the same device. addmm: expected 'out' to be CPU tensor, but got CUDA tensor"); - TORCH_CHECK( - !mat1.is_cuda(), - "Expected all tensors to be on the same device. addmm: expected 'mat1' to be a CPU tensor, but got a CUDA tensor"); - TORCH_CHECK( - !mat2.is_cuda(), - "Expected all tensors to be on the same device. 
addmm: expected 'mat2' to be a CPU tensor, but got a CUDA tensor"); + sparse::impl::_check_is_cpu(self, "self"); + sparse::impl::_check_is_cpu(mat1, "mat1"); + sparse::impl::_check_is_cpu(mat2, "mat2"); + sparse::impl::_check_is_cpu(result, "result"); - // All the checks are from addmm_out_cuda_impl (ATen/native/cuda/Blas.cpp) and TORCH_META_FUNC(addmm) (ATen/native/LinearAlgebra.cpp) + // All the checks are from addmm_out_cuda_impl (ATen/native/cuda/Blas.cpp) and + // TORCH_META_FUNC(addmm) (ATen/native/LinearAlgebra.cpp) // TODO: remove code duplication and unify code - TORCH_CHECK(mat1.dim() == 2, "mat1 must be a matrix, got ", mat1.dim(), "-D tensor"); - TORCH_CHECK(mat2.dim() == 2, "mat2 must be a matrix, got ", mat2.dim(), "-D tensor"); + sparse::impl::_check_dim(mat1, 2, "mat1"); + sparse::impl::_check_dim(mat2, 2, "mat2"); + TORCH_CHECK( - mat1.sizes()[1] == mat2.sizes()[0], "mat1 and mat2 shapes cannot be multiplied (", - mat1.sizes()[0], "x", mat1.sizes()[1], " and ", mat2.sizes()[0], "x", mat2.sizes()[1], ")"); - - IntArrayRef mat1_sizes = mat1.sizes(); - IntArrayRef mat2_sizes = mat2.sizes(); - IntArrayRef self__sizes; - c10::MaybeOwned self_; - if (&result != &self && self.layout() == kStrided) { - self_ = expand_size(self, {mat1_sizes[0], mat2_sizes[1]}, "addmm"); - self__sizes = self_->sizes(); + mat1.size(1) == mat2.size(0), "mat1 and mat2 shapes cannot be multiplied (", + mat1.size(0), "x", mat1.size(1), " and ", mat2.sizes()[0], "x", mat2.sizes()[1], ")"); + + c10::MaybeOwned self_; + // Don't expand self if this is an in-place operation + if (&result == &self) { + self_ = c10::MaybeOwned::borrowed(self); } else { - self_ = c10::MaybeOwned::borrowed(self); - self__sizes = self_->sizes(); + self_ = expand_size(self, {mat1.size(0), mat2.size(1)}, "addmm"); } - TORCH_CHECK(((self_->dim() == 2) && (self_->sizes()[0] == mat1.sizes()[0]) && (self_->sizes()[1] == mat2.sizes()[1])), - "The input tensor must be a matrix with size ", mat1.sizes()[0], "x", mat2.sizes()[1], ", but got a ", self_->dim(), - "-D tensor with size ", self__sizes[0], "x", self__sizes[1]); + + TORCH_CHECK(((self_->dim() == 2) && + (self_->size(0) == mat1.size(0)) && + (self_->size(1) == mat2.size(1))), + "The input tensor must be a matrix with size ", + mat1.size(0), + "x", + mat2.size(1), + ", but got a ", + self_->dim(), + "-D tensor with size ", + self_->size(0), + "x", + self_->size(1)); if (&result != &self) { if (result.layout() == kStrided) { - at::native::resize_output(result, self__sizes); + at::native::resize_output(result, self_->sizes()); } else { - at::native::resize_as_sparse_csr_(result, *self_); + result.resize_as_sparse_(*self_); } result.copy_(*self_); } - IntArrayRef result_sizes = result.sizes(); - if ((result_sizes[0] == 0) || (result_sizes[1] == 0)) { + if (result.numel() == 0) { return result; } - if (mat1._nnz() == 0 && mat2.layout() == kStrided) { - // According to docs, when beta==0 values in self should be ignored. nans and infs should not propagate + if (sparse::impl::_is_sparse_and_zero(mat1) || sparse::impl::_is_sparse_and_zero(mat2)) { + // According to docs, when beta==0 values in self should be ignored. + // nans and infs should not propagate if (beta.toComplexDouble() == 0.) { result.zero_(); } else { @@ -456,26 +485,19 @@ Tensor& addmm_out_sparse_csr_cpu( return result; } - if (mat2.is_sparse_csr() && (mat1._nnz() == 0 || mat2._nnz() == 0)) { - if (beta.toComplexDouble() == 0.) 
{ - result.values().zero_(); - } else { - result.values().mul_(beta); - } - return result; - } - #if !AT_USE_MKL_SPARSE() - if (mat2.is_sparse_csr() && result.is_sparse_csr()) { - TORCH_CHECK( - false, - "Calling addmm on sparse CPU tensors requires Linux platform. ", - "Please use PyTorch built with MKL on Linux."); - } - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.layout() == kStrided); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(result.scalar_type(), "addmm_sparse_dense", [&] { - addmm_out_sparse_csr_native_cpu(mat1, mat2, result, alpha, beta); - }); + TORCH_CHECK( + (mat1.is_sparse_csr() || + (mat2.is_sparse_csr() && result.is_sparse_csr())), + false, + "Calling addmm on sparse CPU tensors requires Linux platform. ", + "Please use PyTorch built with MKL on Linux."); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.layout() == kStrided); + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES( + result.scalar_type(), "addmm_sparse_dense", [&] { + addmm_out_sparse_csr_native_cpu( + mat1, mat2, result, alpha, beta); + }); #else sparse::impl::mkl::addmm_out_sparse_csr(mat1, mat2, beta, alpha, result); #endif @@ -507,17 +529,36 @@ Tensor& _sparse_csr_mm_out( return at::addmm_out(result, zero, mat1, mat2, 0.0, 1.0); } -Tensor _sparse_csr_mm( - const Tensor& mat1, - const Tensor& mat2) { - Tensor zero; +Tensor _sparse_csr_mm(const Tensor& mat1, const Tensor& mat2) { if (mat1.is_sparse_csr() && mat2.is_sparse_csr()) { + // Return sparse // TODO: replace with at::zeros when it's implemented for sparse csr - zero = at::empty({mat1.size(0), mat2.size(1)}, mat2.options()); - } else { - zero = at::zeros({mat1.size(0), mat2.size(1)}, mat2.options()); + return at::addmm( + at::empty({mat1.size(0), mat2.size(1)}, mat2.options()), + mat1, + mat2, + 0.0, + 1.0); + } + if (mat1.is_sparse_csr() && mat2.layout() == c10::kStrided) { + // Return dense + return at::addmm( + at::zeros({mat1.size(0), mat2.size(1)}, mat2.options()), + mat1, + mat2, + 0.0, + 1.0); } - return at::addmm(zero, mat1, mat2, 0.0, 1.0); + if (mat1.layout() == c10::kStrided && mat2.is_sparse_csr()) { + // Return dense + return at::addmm( + at::zeros({mat1.size(0), mat2.size(1)}, mat1.options()), + mat1, + mat2, + 0.0, + 1.0); + } + TORCH_INTERNAL_ASSERT(false, "Shouldn't get here. Please open an issue."); } Tensor _sparse_csr_addmm( @@ -533,14 +574,20 @@ Tensor _sparse_csr_addmm( } // Functions for element-wise addition. -Tensor add_sparse_csr(const Tensor& self, const Tensor& other, const Scalar& alpha) { +Tensor add_sparse_csr( + const Tensor& self, + const Tensor& other, + const Scalar& alpha) { auto commonDtype = at::result_type(self, other); alpha_check(commonDtype, alpha); Tensor result = at::empty({0, 0}, self.options().dtype(commonDtype)); return at::add_out(result, self, other, alpha); // redispatch! } -Tensor& add_sparse_csr_(Tensor& self, const Tensor& other, const Scalar& alpha) { +Tensor& add_sparse_csr_( + Tensor& self, + const Tensor& other, + const Scalar& alpha) { return at::add_out(self, self, other, alpha); // redispatch! 
} @@ -584,13 +631,10 @@ void add_out_dense_sparse_csr_cpu( " in add operation"); auto src_values = src.values(); - auto src_crow_indices = src.crow_indices(); - auto src_col_indices = src.col_indices(); resize_output(out, dense.sizes()); Tensor resultBuffer = out; - Tensor valuesBuffer = src_values.to(commonDtype); if (out.scalar_type() != commonDtype) { resultBuffer = dense.to(commonDtype); @@ -598,36 +642,54 @@ void add_out_dense_sparse_csr_cpu( resultBuffer.copy_(dense); } + if (src._nnz() == 0) { + return; + } + + auto valuesBuffer = src_values.to(commonDtype).view({-1, src_values.size(-1)}); + resultBuffer = resultBuffer.view({-1, out.size(-2), out.size(-1)}); + auto src_crow_indices = src.crow_indices().view({-1, src.crow_indices().size(-1)}); + auto src_col_indices = src.col_indices().view({-1, src.col_indices().size(-1)}); + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - kHalf, kBool, kBFloat16, + kHalf, + kBool, + kBFloat16, commonDtype, "add_out_op2_sparse_csr", - [&valuesBuffer, &resultBuffer, &alpha, &src_crow_indices, &src_col_indices]() { + [&valuesBuffer, + &resultBuffer, + &alpha, + &src_crow_indices, + &src_col_indices]() { AT_DISPATCH_INDEX_TYPES( src_crow_indices.scalar_type(), "csr_add_out_crow_indices", - [&valuesBuffer, &resultBuffer, &alpha, &src_crow_indices, &src_col_indices]() { - auto values_accessor = valuesBuffer.accessor(); + [&valuesBuffer, + &resultBuffer, + &alpha, + &src_crow_indices, + &src_col_indices]() { + auto batch_count = resultBuffer.dim() > 2 ? resultBuffer.size(-3) : 1; + auto values_accessor = valuesBuffer.accessor(); scalar_t* out_ptr = resultBuffer.data_ptr(); scalar_t cast_value = alpha.to(); auto crow_indices_accessor = - src_crow_indices.accessor(); + src_crow_indices.accessor(); auto col_indices_accessor = - src_col_indices.accessor(); - auto out_strides0 = resultBuffer.strides()[0]; - auto out_strides1 = resultBuffer.strides()[1]; - - for (index_t irow = 0; irow < src_crow_indices.size(0) - 1; - ++irow) { - index_t start_index = crow_indices_accessor[irow]; - index_t end_index = crow_indices_accessor[irow + 1]; - - for (index_t i = start_index; i < end_index; ++i) { - auto icol = col_indices_accessor[i]; - auto index = resultBuffer.storage_offset() + irow * out_strides0 + - icol * out_strides1; - out_ptr[index] += cast_value * values_accessor[i]; + src_col_indices.accessor(); + auto out_strides = resultBuffer.strides(); + + for (const auto batch_idx : c10::irange(batch_count)) { + for (const auto irow : c10::irange(src_crow_indices.size(-1) - 1)) { + index_t start_index = crow_indices_accessor[batch_idx][irow]; + index_t end_index = crow_indices_accessor[batch_idx][irow + 1]; + for (const auto i : c10::irange(start_index, end_index)) { + auto icol = col_indices_accessor[batch_idx][i]; + auto index = batch_idx * out_strides[0] + irow * out_strides[1] + icol * out_strides[2]; + out_ptr[index] += cast_value * values_accessor[batch_idx][i]; + } } } }); @@ -657,32 +719,583 @@ Tensor& add_out_sparse_csr_cpu( return out; } -TORCH_IMPL_FUNC(_convert_indices_from_coo_to_csr_structured_cpu) ( - const Tensor& input, const int64_t size, const bool out_int32, const Tensor& result -) { +TORCH_IMPL_FUNC(_convert_indices_from_coo_to_csr_structured_cpu) +(const Tensor& input, + const int64_t size, + const bool out_int32, + const Tensor& result) { if (out_int32) { - AT_DISPATCH_INTEGRAL_TYPES(input.scalar_type(), "convert_indices_from_coo_to_csr_cpu", [&] { - convert_indices_from_coo_to_csr_cpu(result, input, size); - }); + AT_DISPATCH_INTEGRAL_TYPES( + 
input.scalar_type(), "convert_indices_from_coo_to_csr_cpu", [&] { + convert_indices_from_coo_to_csr_cpu( + result, input, size); + }); } else { - AT_DISPATCH_INTEGRAL_TYPES(input.scalar_type(), "convert_indices_from_coo_to_csr_cpu", [&] { - convert_indices_from_coo_to_csr_cpu(result, input, size); - }); + AT_DISPATCH_INTEGRAL_TYPES( + input.scalar_type(), "convert_indices_from_coo_to_csr_cpu", [&] { + convert_indices_from_coo_to_csr_cpu( + result, input, size); + }); } } -TORCH_IMPL_FUNC(_convert_indices_from_csr_to_coo_structured_cpu) ( - const Tensor& crow_indices, const Tensor& col_indices, const bool out_int32, const bool transpose, const Tensor& result -) { +TORCH_IMPL_FUNC(_convert_indices_from_csr_to_coo_structured_cpu) +(const Tensor& crow_indices, + const Tensor& col_indices, + const bool out_int32, + const bool transpose, + const Tensor& result) { if (out_int32) { - AT_DISPATCH_INTEGRAL_TYPES(crow_indices.scalar_type(), "convert_indices_from_csr_to_coo_cpu", [&] { - convert_indices_from_csr_to_coo_cpu(result, crow_indices, col_indices, transpose); - }); + AT_DISPATCH_INTEGRAL_TYPES( + crow_indices.scalar_type(), "convert_indices_from_csr_to_coo_cpu", [&] { + convert_indices_from_csr_to_coo_cpu( + result, crow_indices, col_indices, transpose); + }); } else { - AT_DISPATCH_INTEGRAL_TYPES(crow_indices.scalar_type(), "convert_indices_from_csr_to_coo_cpu", [&] { - convert_indices_from_csr_to_coo_cpu(result, crow_indices, col_indices, transpose); - }); + AT_DISPATCH_INTEGRAL_TYPES( + crow_indices.scalar_type(), "convert_indices_from_csr_to_coo_cpu", [&] { + convert_indices_from_csr_to_coo_cpu( + result, crow_indices, col_indices, transpose); + }); + } +} + +/* + * Based on + * https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/sparse/sparsetools/csr.h + */ +template +void _csr_to_block_csr_cpu_kernel( + const I n_row, + const I n_col, + const I R, + const I C, + const I* input_crow_indices, + const I* input_col_indices, + const T* input_values, + I* result_crow_indices, + I* result_col_indices, + T* result_values) { + // All blocks are possible, that is, may be allocated if a single non-zero + // value lives within them. Otherwise they're not. + + // Allocate pointers for all possible column blocks plus 1 + std::vector blocks(n_col / C + 1, (T*)0); + + assert(n_row % R == 0); + assert(n_col % C == 0); + + // Major assumptions + // 1. Blocks must be square + + // Number of blocks along rows + I n_brow = n_row / R; + // Number of blocks along columns + // I n_bcol = n_col / C; + + // Number of elements per block + I RC = R * C; + // Number of blocks overall + I n_blks = 0; + + result_crow_indices[0] = 0; + + // Iterate over blocks along rows + for (I block_i = 0; block_i < n_brow; block_i++) { + // Iterate over rows within block + for (I r = 0; r < R; r++) { + I i = R * block_i + r; // row index + for (I jj = input_crow_indices[i]; jj < input_crow_indices[i + 1]; jj++) { + I j = input_col_indices[jj]; // column index + + // Block corresponding to column index + I block_j = j / C; + // Column within block + I c = j % C; + + if (blocks[block_j] == 0) { + blocks[block_j] = result_values + RC * n_blks; + result_col_indices[n_blks] = block_j; + n_blks++; + } + + // Specific blocks entries should not be visited more than once. + // Scipy code does an addition here. Why? 
+ *(blocks[block_j] + C * r + c) = input_values[jj]; + } + } + + for (I jj = input_crow_indices[R * block_i]; + jj < input_crow_indices[R * (block_i + 1)]; + jj++) { + blocks[input_col_indices[jj] / C] = 0; + } + + result_crow_indices[block_i + 1] = n_blks; + } +} + +/* + * Based on + * https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/sparse/sparsetools/csr.h + */ +template +I csr_count_blocks( + const I n_row, + const I n_col, + const I R, + const I C, + const I Ap[], + const I Aj[]) { + std::vector mask(n_col / C + 1, -1); + I n_blks = 0; + for (I i = 0; i < n_row; i++) { + I bi = i / R; + for (I jj = Ap[i]; jj < Ap[i + 1]; jj++) { + I bj = Aj[jj] / C; + if (mask[bj] != bi) { + mask[bj] = bi; + n_blks++; + } + } } + return n_blks; +} + +Tensor _csr_to_block_csr_cpu(const Tensor& self, IntArrayRef blocksize) { + TORCH_CHECK( + blocksize[0] == blocksize[1], + "blocks must be square. ", + "Got (", + blocksize[0], + ", ", + blocksize[1], + ") instead."); + TORCH_CHECK( + self.size(0) % blocksize[0] == 0 && self.size(1) % blocksize[1] == 0, + "Block sparse CSR Tensors must have a size that is an ", + "integral multiple of their block size. ", + "Got Tensor of size (", + self.size(0), + ", ", + self.size(1), + ") with block size (", + blocksize[0], + ", ", + blocksize[1], + ") instead."); + Tensor input_values = self.values().contiguous(); + Tensor input_crow_indices = self.crow_indices().contiguous(); + Tensor input_col_indices = self.col_indices().contiguous(); + + // First we determine the number of blocks needed. For each given block, if it + // contains a non-zero element we will allocate values and indices for it. + int64_t num_blocks; + int64_t n_row = self.size(0); + int64_t n_col = self.size(1); + AT_DISPATCH_INDEX_TYPES( + input_crow_indices.scalar_type(), "_csr_to_block_csr_cpu", [&] { + num_blocks = csr_count_blocks( + n_row, + n_col, + blocksize[0], + blocksize[1], + input_crow_indices.data_ptr(), + input_col_indices.data_ptr()); + }); + + Tensor result_values = + input_values.new_zeros({num_blocks, blocksize[0], blocksize[1]}); + Tensor result_crow_indices = + input_crow_indices.new_empty({(n_row / blocksize[0]) + 1}); + Tensor result_col_indices = input_col_indices.new_empty({num_blocks}); + + // Next we copy over non-zero elements into the allocated blocks. 
+ AT_DISPATCH_INDEX_TYPES( + input_crow_indices.scalar_type(), "_csr_to_block_csr_cpu", [&] { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES( + input_values.scalar_type(), "_csr_to_block_csr_cpu", [&] { + _csr_to_block_csr_cpu_kernel( + n_row, + n_col, + blocksize[0], + blocksize[1], + input_crow_indices.data_ptr(), + input_col_indices.data_ptr(), + input_values.data_ptr(), + result_crow_indices.data_ptr(), + result_col_indices.data_ptr(), + result_values.data_ptr()); + }); + }); + return at::native::_sparse_csr_tensor_unsafe( + result_crow_indices, + result_col_indices, + result_values, + self.sizes(), + result_values.scalar_type(), + self.layout(), + result_values.device()); +} + +Tensor _csr_to_block_csr(const Tensor& self, IntArrayRef blocksize) { + Tensor self_values = self.values(); + Tensor self_crow_indices = self.crow_indices(); + Tensor self_col_indices = self.col_indices(); + Tensor cpu_result = _csr_to_block_csr_cpu( + _sparse_csr_tensor_unsafe(self_crow_indices.cpu(), + self_col_indices.cpu(), + self_values.cpu(), + self.sizes(), + self_values.scalar_type(), + self.layout(), + self_values.device()), + blocksize); + Tensor result_values = cpu_result.values().to(self_values.options()); + Tensor result_crow_indices = cpu_result.crow_indices().to(self_crow_indices.options()); + Tensor result_col_indices = cpu_result.col_indices().to(self_col_indices.options()); + return at::native::_sparse_csr_tensor_unsafe( + result_crow_indices, + result_col_indices, + result_values, + self.sizes(), + result_values.scalar_type(), + self.layout(), + result_values.device()); +} + +/* + Reductions on sparse CSR tensors using masked semantics. + + - A CSR tensor is a 2D tensor that is specified by a 3-tuple + (crow_indices, col_indices, values). + + - To support a reduction operator on a CSR tensor, define: + +template +struct Reduction...Op { + inline scalar_t operator()(const scalar_t& a, const scalar_t& b) const { + return a ... b; + } + inline scalar_t identity() const { return ...; } +}; + +Tensor _sparse_csr_..._cpu(const Tensor& input, IntArrayRef dims_to_sum, bool keepdim, c10::optional dtype) { + ... + result = reduce_sparse_csr_cpu_template(input_, dims_to_sum, keepdim, Reduction...Op()); + ... + return result; +} + + and add the following + + - func: _sparse_csr_op.dim_dtype(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + dispatch: + SparseCsrCUDA: _sparse_csr_..._cpu + + to native_functions.yaml + + Use ReductionAddOp and _sparse_csr_sum implementation as an example. + + - Since a CSR tensor dimensionality is always 2, only reductions + with keepdim=True can be supported. 
+ +*/ + +namespace { + +template +Tensor reduce_sparse_csr_dim0_cpu_template(const Tensor& sparse, ReductionOp rop) { + /* + Consider the following sparse tensor: + + 1 * * * * + * * * 2 * + * * 3 * * + * * * * * + 4 * 5 * * + + that has CSR representation + + crow_indices = [0, 1, 2, 3, 3, 5] + col_indices = [0, 3, 2, 0, 2] + values = [1, 2, 3, 4, 5] + + Reduction with dim=0 results: + + rop(1,4) * rop(3,5) 2 * + + that has CSR representation + + new_crow_indices = [0, 3] + new_col_indices = [0, 2, 3] + new_values = [rop(1, 4], rop(3, 5), 2] + + In general, the CSR representation data can be computed as follows: + + new_col_indices, col_map = col_indices.unique(sorted=True, return_inverse=True) + nnz = new_col_indices.numel() + new_crow_indices = [0, nnz] + new_values.resize(nnz); new_values.fill_(identity) + for i in range(col_indices.numel()): + new_values[col_map[i]] = rop(new_values[col_map[i], values[i]) + */ + + Tensor col_indices = sparse.col_indices(); + Tensor values = sparse.values(); + auto numel = values.numel(); + Tensor new_col_indices; + Tensor columns_map; + + /* + Calling at::_unique constitutes the main bottleneck of this + function. However, it is still about 5x faster than using the + invariant: + csr.sum(dim=0) == csr.transpose(0, 1).sum(dim=1) + */ + std::tie(new_col_indices, columns_map) = at::_unique(col_indices, true, true); + auto nnz = new_col_indices.numel(); + + Tensor new_crow_indices = at::empty({2}, col_indices.options()); + new_crow_indices[0] = 0; + new_crow_indices[1] = nnz; + + Tensor new_values = at::empty({nnz}, values.options()); + new_values.fill_(rop.identity()); + + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "reduce_sparse_csr_dim0_cpu_indices", + [&]() { + index_t* columns_map_ptr = columns_map.data_ptr(); + scalar_t* values_ptr = values.data_ptr(); + scalar_t* new_values_ptr = new_values.data_ptr(); + + // There is no point in parallelizing the following for-loop + // because about 99.3% of the computation time is spent in the + // at::_unique call above. 
+ for (int64_t i=0; i +Tensor reduce_sparse_csr_dim1_cpu_template(const Tensor& sparse, ReductionOp rop) { + /* + Consider the following sparse tensor: + + 1 * * * * + * * * 2 * + * * 3 * * + * * * * * + 4 * 5 * * + + that has CSR representation + + crow_indices = [0, 1, 2, 3, 3, 5] + col_indices = [0, 3, 2, 0, 2] + values = [1, 2, 3, 4, 5] + + Reduction with dim=1 results: + + 1 + 2 + 3 + * + rop(4, 5) + + that has CSR representation + + new_crow_indices = [0, 1, 2, 3, 3, 4] + new_col_indices = [0, 0, 0, 0] + new_values = [1, 2, 3, rop(4, 5)] + + In general, the result CSR data can be computed as follows: + + new_crow_indices = [0] + for i in range(1, nrows+1): + new_crow_indices[i] = new_crow_indices[i-1] + (crow_indices[i] == crow_indices[i-1]) + nnz = new_crow_indices[-1] + new_col_indices = zeros(nnz) + new_values.resize(nnz) + j = -1 + for i in range(1, nrows+1): + if crow_indices[i] == crow_indices[i-1]: + continue + j += 1 + new_values[j] = rop(values[crow_indices[i] : crow_indices[i-1]]) + */ + + Tensor crow_indices = sparse.crow_indices(); + auto ioptions = crow_indices.options(); + Tensor values = sparse.values(); + auto nrows = sparse.size(0); + + Tensor new_crow_indices = at::empty({crow_indices.numel()}, ioptions); + Tensor new_col_indices = at::empty({}, ioptions); + Tensor new_values = at::empty({}, values.options()); + Tensor row_map = at::empty({nrows}, ioptions); + + AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "reduce_sparse_csr_dim1_cpu_indices", + [&]() { + index_t* crow_indices_ptr = crow_indices.data_ptr(); + index_t* new_crow_indices_ptr = new_crow_indices.data_ptr(); + index_t* row_map_ptr = row_map.data_ptr(); + int64_t nnz = 0; + new_crow_indices_ptr[0] = 0; + for(int64_t i=0; i(); + scalar_t* new_values_ptr = new_values.data_ptr(); + + at::parallel_for( + 0, + nrows, + internal::GRAIN_SIZE, + [&](int64_t irow_start, int64_t irow_end) { + index_t i_end = crow_indices_ptr[irow_start]; + for (index_t h = irow_start; h < irow_end; ++h) { + index_t i_start = i_end; + i_end = crow_indices_ptr[h+1]; + if (i_start != i_end) { + scalar_t res = values_ptr[i_start]; + for (index_t i = i_start + 1; i < i_end; i++) { + res = rop(res, values_ptr[i]); + } + new_values_ptr[row_map_ptr[h]] = res; + } + } + }); + }); + + return at::native::_sparse_csr_tensor_unsafe(new_crow_indices, new_col_indices, new_values, + {sparse.size(0), 1}, + new_values.scalar_type(), + sparse.layout(), + new_values.device()); +} + +template +Tensor reduce_sparse_csr_dim01_cpu_template(const Tensor& sparse, ReductionOp rop) { + + auto ioptions = sparse.col_indices().options(); + Tensor values = sparse.values(); + auto numel = values.numel(); + auto nnz = std::min(1, numel); + + /* TODO: we can likely do about 3x better than parallel_reduce: + +In [2]: t=torch.randn(5000, 5000).to_sparse_csr() + +In [3]: %timeit torch._sparse_csr_sum(t, dim=(0, 1), keepdim=True) +3.39 ms ± 898 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) + +In [4]: %timeit torch.sum(t.values()) +1.07 ms ± 291 ns per loop (mean ± std. dev. 
of 7 runs, 1000 loops each) + */ + scalar_t* values_ptr = values.data_ptr(); + scalar_t value = at::parallel_reduce( + 0, + numel, + internal::GRAIN_SIZE, + rop.identity(), + [&](int64_t i_start, int64_t i_end, scalar_t identity) { + scalar_t res = identity; + for (int64_t i=i_start; i{0, nnz}, ioptions); + Tensor new_values; + if (numel > 0) { + new_values = at::empty({1}, values.options()); + new_values.fill_(value); + } else { + new_values = at::empty({}, values.options()); + } + return at::native::_sparse_csr_tensor_unsafe(new_crow_indices, new_col_indices, new_values, + {1, std::min(1, sparse.size(1))}, + new_values.scalar_type(), + sparse.layout(), + new_values.device()); +} + +template +Tensor reduce_sparse_csr_cpu_template(const Tensor& sparse, std::vector dims, ReductionOp rop) { + if (dims.size() == 1) { + if (dims[0] == 0) { + return reduce_sparse_csr_dim0_cpu_template(sparse, rop); + } else { + TORCH_INTERNAL_ASSERT(dims[0] == 1); + return reduce_sparse_csr_dim1_cpu_template(sparse, rop); + } + } else if (dims.size() == 2) { + TORCH_INTERNAL_ASSERT(((dims[0] == 0 && dims[1] == 1) || (dims[0] == 1 && dims[1] == 0))); + return reduce_sparse_csr_dim01_cpu_template(sparse, rop); + } + TORCH_INTERNAL_ASSERT(dims.size() == 0); + // effective after gh-29137 has been resolved + return sparse.clone(); +} + +template +Tensor reduce_sparse_csr_cpu_template(const Tensor& sparse, IntArrayRef dims_to_sum, bool keepdim, ReductionOp rop) { + TORCH_INTERNAL_ASSERT(sparse.is_sparse_csr()); + TORCH_CHECK(keepdim, "reduction operations on CSR tensors with keepdim=False is unsupported"); + TORCH_INTERNAL_ASSERT(sparse.device() == kCPU); + + const int64_t input_dim = sparse.dim(); + TORCH_INTERNAL_ASSERT(input_dim == 2); + auto dims = dims_to_sum.vec(); + maybe_wrap_dims(dims, input_dim); + if (dims.size() == 0) { + // after gh-29137 is resolved, delete this if-block + dims.emplace_back(0); + dims.emplace_back(1); + } + return reduce_sparse_csr_cpu_template(sparse, dims, rop); +} + +template +struct ReductionAddOp { + inline scalar_t operator()(const scalar_t& a, const scalar_t& b) const { + return a + b; + } + inline scalar_t identity() const { return 0; } +}; + +} // namespace + +Tensor _sparse_csr_sum_cpu(const Tensor& input, IntArrayRef dims_to_sum, bool keepdim, c10::optional dtype) { + ScalarType dtype_ = dtype.value_or(input.scalar_type()); + Tensor input_ = input.to(dtype_); + Tensor result; + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( + kHalf, kBFloat16, input_.scalar_type(), "_sparse_csr_sum_cpu", + [&] { + result = reduce_sparse_csr_cpu_template(input_, dims_to_sum, keepdim, ReductionAddOp()); + }); + return result; } } // namespace native diff --git a/aten/src/ATen/native/sparse/SparseCsrTensorMath.h b/aten/src/ATen/native/sparse/SparseCsrTensorMath.h new file mode 100644 index 00000000000000..f23b1ff18e884c --- /dev/null +++ b/aten/src/ATen/native/sparse/SparseCsrTensorMath.h @@ -0,0 +1,63 @@ +#pragma once + +#include +#include + +namespace at { +namespace native { +namespace sparse { +namespace impl { + +// Returns true if all entries of self are zero +// TODO: This has potential to be a generic helper +inline bool _is_sparse_and_zero(const Tensor& self) { + if (self.is_sparse_csr() || self.is_sparse()) { + if (self._nnz() == 0) { + return true; + } + } + return false; +} + +inline void _check_is_cpu(const Tensor& self, c10::string_view name) { + TORCH_CHECK( + self.is_cpu(), + "Expected all tensors to be on the same device. 
addmm expected '", + name, + "' to be CPU tensor, but got ", + self.device(), + " tensor"); +} + +inline void _check_is_cuda(const Tensor& self, c10::string_view name) { + TORCH_CHECK( + self.is_cuda(), + "Expected all tensors to be on the same device. addmm expected '", + name, + "' to be CUDA tensor, but got ", + self.device(), + " tensor"); +} + +inline void _check_dim(const Tensor& self, int64_t target_dim, c10::string_view name) { + if (target_dim == 2) { + TORCH_CHECK( + self.dim() == target_dim, + name, " must be a matrix, ", + "got ", self.dim(), "-D tensor"); + } + TORCH_CHECK( + self.dim() == target_dim, + "Expected ", + name, + " to be of dimension ", + target_dim, + " but got ", + self.dim(), + " instead."); +} + +} +} +} +} diff --git a/aten/src/ATen/native/sparse/SparseTensor.cpp b/aten/src/ATen/native/sparse/SparseTensor.cpp index 5f2edff7db40a4..256a17f22c23c4 100644 --- a/aten/src/ATen/native/sparse/SparseTensor.cpp +++ b/aten/src/ATen/native/sparse/SparseTensor.cpp @@ -569,15 +569,6 @@ SparseTensor sparse_csr_to_sparse(const Tensor& self) { // NB: Dropped the resizeNd variants -Tensor sparse_to_dense( - const SparseTensor& self, - c10::optional dtype) { - TORCH_CHECK( - !dtype.has_value(), "dtype argument is not supported by sparse_to_dense"); - Tensor dst = at::zeros(self.sizes(), self.options().layout(kStrided)); - return dst.add_(self); -} - SparseTensor& copy_sparse_wrapper_( Tensor& self, const Tensor& src, @@ -664,8 +655,8 @@ SparseTensor _coalesce_sparse_cpu(const SparseTensor& self) { auto indicesBufferAccessor = indicesBuffer.accessor(); int64_t i = -1; - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(at::ScalarType::BFloat16, at::ScalarType::Half, values.scalar_type(), - "coalesce", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::ScalarType::BFloat16, at::ScalarType::Half, at::ScalarType::Bool, values.scalar_type(), + "coalesce", [&] { int64_t prev = -1; int64_t blockSize = values.stride(0); scalar_t* values_ptr = values.data_ptr(); diff --git a/aten/src/ATen/native/sparse/SparseTensorMath.cpp b/aten/src/ATen/native/sparse/SparseTensorMath.cpp index f98ab775926bbe..0c45074f8a1eb8 100644 --- a/aten/src/ATen/native/sparse/SparseTensorMath.cpp +++ b/aten/src/ATen/native/sparse/SparseTensorMath.cpp @@ -707,6 +707,34 @@ Tensor& mul_sparse_(Tensor& self, const Tensor& other) { return at::mul_out(self, self, other); // redispatch! } +Tensor& mul_out_sparse_csr(const Tensor& t_, const Tensor& src_, Tensor& r) { + // // TODO: Use a specialized CSR kernel for performance if needed + TORCH_CHECK(t_.is_sparse_csr() || (t_.layout() == c10::kStrided && t_.dim() == 0), "mul(dense, sparse_csr) is not supported"); + TORCH_CHECK(src_.is_sparse_csr() || (src_.layout() == c10::kStrided && src_.dim() == 0), "mul(sparse_csr, dense) is not supported"); + TORCH_CHECK(r.is_sparse_csr(), "Expected result Tensor to be of format CSR"); + Tensor t = t_.to_sparse(); + Tensor src = src_.to_sparse(); + Tensor tmp_result = t.mul(src); + auto r_sparse_csr = tmp_result.to_sparse_csr(); + r.resize_as_sparse_(r_sparse_csr); + r.copy_(r_sparse_csr); + return r; +} + +Tensor mul_sparse_csr(const Tensor& self, const Tensor& other) { + auto commonDtype = at::result_type(self, other); + TORCH_CHECK(self.is_sparse_csr(), "mul(dense, sparse_csr) is not supported"); + TORCH_CHECK(other.is_sparse_csr(), "mul(sparse_csr, dense) is not supported"); + auto result_options = self.options().dtype(commonDtype); + // CSR is 2d! 
+ Tensor result = at::empty({0, 0}, result_options); + return at::mul_out(result, self, other); // redispatch! +} + +Tensor& mul_sparse_csr_(Tensor& self, const Tensor& other) { + return at::mul_out(self, self, other); // redispatch! +} + SparseTensor& mul_out_sparse_cpu(const Tensor& t_, const Tensor& src_, SparseTensor& r) { if (src_.dim() == 0) { return mul_out_sparse_zerodim(r, t_, src_); diff --git a/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp b/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp index 6a8b7253fbfc62..722582c3cbdbb7 100644 --- a/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp +++ b/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp @@ -3,6 +3,7 @@ #include #include #include +#include #ifndef AT_PER_OPERATOR_HEADERS #include @@ -103,6 +104,7 @@ Tensor sparse_sampled_addmm_sparse_csr_cuda( return result; } +// result = beta * self + alpha * (mat1 @ mat2) Tensor& addmm_out_sparse_csr_cuda( const Tensor& self, const Tensor& mat1, @@ -110,65 +112,63 @@ Tensor& addmm_out_sparse_csr_cuda( const Scalar& beta, const Scalar& alpha, Tensor& result) { - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(mat1.is_sparse_csr()); + sparse::impl::_check_is_cuda(self, "self"); + sparse::impl::_check_is_cuda(mat1, "mat1"); + sparse::impl::_check_is_cuda(mat2, "mat2"); + sparse::impl::_check_is_cuda(result, "result"); // Same checks as in TORCH_META_FUNC(addmm) at // aten/src/ATen/native/LinearAlgebra.cpp - TORCH_CHECK( - mat1.dim() == 2, "mat1 must be a matrix, got ", mat1.dim(), "-D tensor"); - TORCH_CHECK( - mat2.dim() == 2, "mat2 must be a matrix, got ", mat2.dim(), "-D tensor"); + sparse::impl::_check_dim(mat1, 2, "mat1"); + sparse::impl::_check_dim(mat2, 2, "mat2"); - IntArrayRef mat1_sizes = mat1.sizes(); - IntArrayRef mat2_sizes = mat2.sizes(); TORCH_CHECK( - mat1_sizes[1] == mat2_sizes[0], - "mat1 and mat2 shapes cannot be multiplied (", - mat1_sizes[0], - "x", - mat1_sizes[1], - " and ", - mat2_sizes[0], - "x", - mat2_sizes[1], - ")"); + mat1.size(1) == mat2.size(0), "mat1 and mat2 shapes cannot be multiplied (", + mat1.size(0), "x", mat1.size(1), " and ", mat2.sizes()[0], "x", mat2.sizes()[1], ")"); // From addmm_out_cuda_impl at ATen/native/cuda/Blas.cpp // TODO: remove code duplication and unify code // There were undefined symbol problems, // when using the same function for CUDA and SparseCsrCUDA dispatch keys // Also structured kernels do not support sparse output - IntArrayRef self__sizes; - c10::MaybeOwned self_; - if (&result != &self && self.layout() == kStrided) { - self_ = expand_size(self, {mat1_sizes[0], mat2_sizes[1]}, "addmm"); - self__sizes = self_->sizes(); + c10::MaybeOwned self_; + // Don't expand self if this is an in-place operation + if (&result == &self) { + self_ = c10::MaybeOwned::borrowed(self); } else { - self_ = c10::MaybeOwned::borrowed(self); - self__sizes = self_->sizes(); - TORCH_CHECK(result.dim() == 2, "tensors must be 2-D"); - TORCH_CHECK( - self__sizes[0] == mat1_sizes[0], "self_ dim 0 must match mat1 dim 0"); - TORCH_CHECK( - self__sizes[1] == mat2_sizes[1], "self_ dim 1 must match mat2 dim 1"); + self_ = expand_size(self, {mat1.size(0), mat2.size(1)}, "addmm"); } + sparse::impl::_check_dim(*self_, 2, "self"); + TORCH_CHECK(((self_->dim() == 2) && + (self_->size(0) == mat1.size(0)) && + (self_->size(1) == mat2.size(1))), + "The input tensor must be a matrix with size ", + mat1.size(0), + "x", + mat2.size(1), + ", but got a ", + self_->dim(), + "-D tensor with size ", + self_->size(0), + "x", + self_->size(1)); + if (&result != &self) { if (result.layout() == 
kStrided) { - at::native::resize_output(result, self__sizes); + at::native::resize_output(result, self_->sizes()); } else { - at::native::resize_as_sparse_csr_(result, *self_); + result.resize_as_sparse_(*self_); } result.copy_(*self_); } - IntArrayRef result_sizes = result.sizes(); - if ((result_sizes[0] == 0) || (result_sizes[1] == 0)) { + if (result.numel() == 0) { return result; } - if (mat1._nnz() == 0 && mat2.layout() == kStrided) { - // According to docs, when beta==0 values in self should be ignored + if (sparse::impl::_is_sparse_and_zero(mat1) || sparse::impl::_is_sparse_and_zero(mat2)) { + // According to docs, when beta==0 values in self should be ignored. // nans and infs should not propagate if (beta.toComplexDouble() == 0.) { result.zero_(); @@ -178,15 +178,6 @@ Tensor& addmm_out_sparse_csr_cuda( return result; } - if (mat2.is_sparse_csr() && (mat1._nnz() == 0 || mat2._nnz() == 0)) { - if (beta.toComplexDouble() == 0.) { - result.values().zero_(); - } else { - result.values().mul_(beta); - } - return result; - } - sparse::impl::cuda::addmm_out_sparse_csr(mat1, mat2, beta, alpha, result); return result; } diff --git a/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp b/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp index f5396757ab7ce9..7cfe1248fb6243 100644 --- a/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp +++ b/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp @@ -120,6 +120,15 @@ void inline col_indices_and_values_resize_(const Tensor& input, int64_t nnz) { input.sizes()); } +void inline bsrsv2_bsrsm2_may_need_to_sync() { +#if defined(CUSPARSE_VERSION) && CUSPARSE_VERSION < 11703 + // cusparse bsrsv2 and bsrsm2 have a synchronization issue that may cause illegal memory access in cuda <= 11.6.x + // See https://github.com/pytorch/pytorch/issues/71297 + ::c10::cuda::device_synchronize(); +#endif + // else: do nothing! +} + void block_sparse_triangular_solve_vec( const at::sparse_csr::SparseCsrTensor& A, const Tensor& B, @@ -230,6 +239,8 @@ void block_sparse_triangular_solve_vec( X_->data_ptr(), CUSPARSE_SOLVE_POLICY_NO_LEVEL, work_data.get()); + + bsrsv2_bsrsm2_may_need_to_sync(); }); if (!X.is_same(*X_)) { X.copy_(*X_); @@ -360,6 +371,8 @@ void block_sparse_triangular_solve_mat( ldx, CUSPARSE_SOLVE_POLICY_NO_LEVEL, work_data.get()); + + bsrsv2_bsrsm2_may_need_to_sync(); }); if (!X.is_same(*X_)) { X.copy_(*X_); @@ -793,19 +806,23 @@ void spgemm( } // anonymous namespace void addmm_out_sparse_csr( - const at::sparse_csr::SparseCsrTensor& mat1, + const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha, const Tensor& result) { - if (mat2.layout() == kStrided && result.layout() == kStrided) { + if (mat1.is_sparse_csr() && mat2.layout() == kStrided && result.layout() == kStrided) { return spmm(mat1, mat2, beta, alpha, result); - } else if (mat2.is_sparse_csr() && result.is_sparse_csr()) { + } + if (mat1.layout() == kStrided && mat2.is_sparse_csr() && result.layout() == kStrided) { + // TODO: We can use cuSPARSE's transposition flags once we have CSC support. 
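+    // Until then, use (mat1 @ mat2)^T == mat2^T @ mat1^T: with the CSR mat2
+    // transposed into the sparse operand and the strided mat1 as the dense
+    // operand, the existing spmm path can write the product into result^T.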
+ return spmm(mat2.transpose(0, 1), mat1.transpose(0, 1), beta, alpha, result.transpose(0, 1)); + } + if (mat1.is_sparse_csr() && mat2.is_sparse_csr() && result.is_sparse_csr()) { return spgemm(mat1, mat2, beta, alpha, result); - } else { - TORCH_CHECK(false, "addmm: computation on CUDA is not implemented for ", - result.layout(), " + ", mat1.layout(), " @ ", mat2.layout()); } + TORCH_CHECK(false, "addmm: computation on CUDA is not implemented for ", + result.layout(), " + ", mat1.layout(), " @ ", mat2.layout()); } /* @@ -965,6 +982,24 @@ void add_out_sparse_csr( auto B_col_indices_ptr = B_col_indices.data_ptr(); auto C_col_indices_ptr = C_col_indices.data_ptr(); + // Windows compilers don't support nested macros + // so we need this lambda outside of the + // AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES + auto fix_nnz = [ +#if AT_ROCM_ENABLED() + &C_crow_indices, + &m +#endif + ](int nnz) -> int { +// For some reason POINTER_MODE_HOST is not working here +// Let's extract manually the nnz from the C_crow_indices +#if AT_ROCM_ENABLED() + return std::max({nnz, C_crow_indices.narrow(-1, m, 1).item()}); +#else + return nnz; +#endif + }; + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES( C.scalar_type(), "add_out_sparse_csr_cuda_impl", [&] { auto beta_ = beta.to(); @@ -1025,6 +1060,8 @@ void add_out_sparse_csr( &nnzC, work_data.get()); + nnzC = fix_nnz(nnzC); + // Resize result using nnz information from cusparse col_indices_and_values_resize_(C, nnzC); C_col_indices = C.col_indices(); diff --git a/aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh b/aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh index a8c622639ee6e3..2a266319212a79 100644 --- a/aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh +++ b/aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh @@ -2,6 +2,7 @@ #include #include +#include #include namespace at { namespace native { @@ -304,7 +305,7 @@ __global__ void indexSparseIntersectionKernel( // } template -C10_LAUNCH_BOUNDS_1(C10_WARP_SIZE*4) +C10_LAUNCH_BOUNDS_1(num_threads()) __global__ void coalesceValuesKernel( int64_t *segment_offsets, int64_t *value_indices, Dtype *values, Dtype *newValues, @@ -328,7 +329,6 @@ __global__ void coalesceValuesKernel( for (int row = begin; row < end; row++) { const int valueRow = ((int) value_indices[row]) * stride; - #pragma unroll for (int ii = 0; ii < SZ; ii++) { @@ -351,6 +351,56 @@ __global__ void coalesceValuesKernel( } } +// coalesceValuesKernel when Dtype/Acctype is bool. Can be eliminated using +// `if constexpr` when CUDA codes will be compiled under C++-17, see +// gh-56055 for blockers. +template +C10_LAUNCH_BOUNDS_1(C10_WARP_SIZE*4) +__global__ void coalesceValuesKernel( + int64_t *segment_offsets, int64_t *value_indices, + bool *values, bool *newValues, + int64_t nnz, int64_t newNnz, int64_t stride) { + + int seg = blockIdx.x * 4 + threadIdx.y; + + // Number of values processed by each thread (grain size) + const int SZ = 4; + + if (seg < newNnz) { + const int newValueRow = seg * stride; + const int begin = segment_offsets[seg]; + const int end = (seg < newNnz - 1) ? 
segment_offsets[seg + 1] : nnz; + const int startFeature = threadIdx.x + blockIdx.y * blockDim.x * SZ; + bool tmp[SZ]; + #pragma unroll + for (int ii = 0; ii < SZ; ii++) { + tmp[ii] = 0; + } + for (int row = begin; row < end; row++) { + const int valueRow = ((int) value_indices[row]) * stride; + + #pragma unroll + for (int ii = 0; ii < SZ; ii++) + { + int featureDim = startFeature + ii * C10_WARP_SIZE; + if (featureDim < stride) + { + tmp[ii] |= values[valueRow + featureDim]; + } + } + } + #pragma unroll + for (int ii = 0; ii < SZ; ii++) + { + int featureDim = startFeature + ii * C10_WARP_SIZE; + if (featureDim < stride) + { + newValues[newValueRow + featureDim] = tmp[ii]; + } + } + } +} + } // namespace apply }} // namespace at::native diff --git a/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu b/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu index 30e7d873b39cf8..dc5a2acf2da1a5 100644 --- a/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu +++ b/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu @@ -142,10 +142,11 @@ SparseTensor _coalesce_sparse_cuda(const SparseTensor& self) { const int SZ = 4; values = values.contiguous(); int64_t stride = c10::multiply_integers(values.sizes().slice(1)); - dim3 grid(ceil_div(newNnz, (int64_t) SZ), ceil_div(stride, (int64_t) C10_WARP_SIZE*SZ)); - dim3 block(C10_WARP_SIZE, SZ); - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( - at::ScalarType::Half, at::ScalarType::BFloat16, values.scalar_type(), "coalesce_sparse_cuda", [&] { + int warp_size = at::cuda::warp_size(); + dim3 grid(ceil_div(newNnz, (int64_t) SZ), ceil_div(stride, (int64_t) warp_size*SZ)); + dim3 block(warp_size, SZ); + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( + at::ScalarType::Half, at::ScalarType::BFloat16, at::ScalarType::Bool, values.scalar_type(), "coalesce_sparse_cuda", [&] { using cuda_accscalar_t = acc_type; apply::coalesceValuesKernel<<>>( uniqueOffsets.data_ptr(), diff --git a/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu b/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu index c13984f2d92ff6..09663a8c0768d2 100644 --- a/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu +++ b/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu @@ -16,8 +16,12 @@ #else #include #include +#include +#include #include #include +#include +#include #endif #include @@ -29,6 +33,7 @@ #include #include +#include #include #include #include @@ -159,18 +164,26 @@ Tensor& add_out_dense_sparse_csr_cuda( " in add operation"); Tensor src_values = src.values(); - Tensor src_crow_indices = src.crow_indices(); - Tensor src_col_indices = src.col_indices(); resize_output(output, dense.sizes()); Tensor resultBuffer = output; - Tensor valuesBuffer = src_values.to(commonDtype); + if (output.scalar_type() != commonDtype) { resultBuffer = dense.to(commonDtype); } else if (!is_same_tensor(output, dense)) { resultBuffer.copy_(dense); } + + if (src._nnz() == 0) { + return output; + } + + auto valuesBuffer = src_values.to(commonDtype).view({-1, src_values.size(-1)}); + resultBuffer = resultBuffer.view({-1, output.size(-2), output.size(-1)}); + auto src_crow_indices = src.crow_indices().view({-1, src.crow_indices().size(-1)}); + auto src_col_indices = src.col_indices().view({-1, src.col_indices().size(-1)}); + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( kHalf, kBool, kBFloat16, commonDtype, @@ -180,6 +193,7 @@ Tensor& add_out_dense_sparse_csr_cuda( src_crow_indices.scalar_type(), "csr_add_out_crow_indices", [&valuesBuffer, &resultBuffer, &alpha, &src_crow_indices, &src_col_indices]() { + auto 
batch_count = resultBuffer.dim() > 2 ? resultBuffer.size(-3) : 1; scalar_t* values_accessor = valuesBuffer.data_ptr(); scalar_t* out_ptr = resultBuffer.data_ptr(); scalar_t cast_value = alpha.to(); @@ -189,8 +203,11 @@ Tensor& add_out_dense_sparse_csr_cuda( int64_t out_storage_offset = resultBuffer.storage_offset(); auto out_strides = resultBuffer.strides(); - int64_t out_strides0 = out_strides[0]; - int64_t out_strides1 = out_strides[1]; + auto out_strides0 = out_strides[0]; + auto out_strides1 = out_strides[1]; + auto crow_stride0 = src_crow_indices.stride(0); + auto col_stride0 = src_col_indices.stride(0); + auto val_stride0 = valuesBuffer.stride(0); cudaStream_t stream = at::cuda::getCurrentCUDAStream(); at::cuda::ThrustAllocator allocator; @@ -200,24 +217,29 @@ Tensor& add_out_dense_sparse_csr_cuda( thrust::for_each( policy, thrust::make_counting_iterator(int64_t(0)), - thrust::make_counting_iterator(int64_t(src_crow_indices.size(0) - 1)), + thrust::make_counting_iterator(int64_t(src_crow_indices.size(-1) - 1)), [values_accessor, crow_indices_accessor, col_indices_accessor, out_ptr, - out_storage_offset, - out_strides0, cast_value, - out_strides1 + out_strides0, + out_strides1, + crow_stride0, + col_stride0, + val_stride0, + batch_count ]__device__(int64_t irow) { - index_t start_index = crow_indices_accessor[irow]; - index_t end_index = crow_indices_accessor[irow + 1]; + for (index_t batch_idx = 0; batch_idx < batch_count; batch_idx++) { + index_t start_index = crow_indices_accessor[batch_idx*crow_stride0 + irow]; + index_t end_index = crow_indices_accessor[batch_idx*crow_stride0 + irow + 1]; for (index_t i = start_index; i < end_index; ++i) { - auto icol = col_indices_accessor[i]; - auto index = out_storage_offset + irow * out_strides0 + icol * out_strides1; - out_ptr[index] += cast_value * values_accessor[i]; + auto icol = col_indices_accessor[batch_idx*col_stride0 + i]; + auto index = batch_idx * out_strides0 + irow * out_strides1 + icol; + out_ptr[index] += cast_value * values_accessor[batch_idx*val_stride0 + i]; } + } }); }); }); @@ -275,5 +297,321 @@ TORCH_IMPL_FUNC(_convert_indices_from_csr_to_coo_structured_cuda) ( } } + /* + Reductions on sparse CSR tensors using masked semantics. + + - To support a reduction operator on a CSR tensor with CUDA storage, define + +template +struct Reduction...Op { + __device__ __forceinline__ scalar_t operator()(const scalar_t a, const scalar_t b) const { + return a ... b; + } + __device__ __forceinline__ scalar_t identity() const { return ...; } + __forceinline__ scalar_t identity_cpu() const { return ...; } +}; + + +Tensor _sparse_csr_..._cuda(const Tensor& input, IntArrayRef dims_to_sum, bool keepdim, c10::optional dtype) { + ... + result = reduce_sparse_csr_cuda_template(input_, dims_to_sum, keepdim, Reduction...Op()); + ... + return result; +} + + and add the following + + - func: _sparse_csr_op.dim_dtype(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? 
dtype=None) -> Tensor + dispatch: + SparseCsrCUDA: _sparse_csr_..._cuda + + to native_functions.yaml + */ + +namespace { + +template +__global__ void reduce_sparse_csr_dim0_cuda_kernel(scalar_t* new_values, + const index_t* new_col_indices, + const int64_t new_nnz, + const scalar_t* values, + const index_t* col_indices, + const int64_t nnz, + ReductionOp rop + ) { + int64_t tid = blockDim.x * blockIdx.x + threadIdx.x; + if (tid < new_nnz) { + index_t col = new_col_indices[tid]; + scalar_t v = rop.identity(); + for (int64_t j=0; j < nnz; j++) { + if (col == col_indices[j]) { + v = rop(v, values[j]); + } + } + new_values[tid] = v; + } +} + +template +Tensor reduce_sparse_csr_dim0_cuda_template(const Tensor& sparse, ReductionOp rop) { + /* + Consider the following sparse tensor: + + 1 * * * * + * * * 2 * + * * 3 * * + * * * * * + 4 * 5 * * + + that has CSR representation + + crow_indices = [0, 1, 2, 3, 3, 5] + col_indices = [0, 3, 2, 0, 2] + values = [1, 2, 3, 4, 5] + + Reduction with dim=0 results: + + rop(1,4) * rop(3,5) 2 * + + that has CSR representation + + new_crow_indices = [0, 3] + new_col_indices = [0, 2, 3] + new_values = [rop(1, 4], rop(3, 5), 2] + + In general, the CSR representation data can be computed as follows: + + nnz = col_indices.numel() + new_col_indices = col_indices.unique(sorted=True, return_inverse=False) + new_nnz = new_col_indices.numel() + new_crow_indices = [0, new_nnz] + new_values.resize(new_nnz) + + for i in range(new_nnz): + v = identity + col = new_col_indices[i] + for j in range(nnz): + if col == col_indices[j]: + v = rop(v, values[j]) + new_values[i] = v + + Notice this algorithm is different from the one used on CPU data. + */ + + Tensor col_indices = sparse.col_indices(); + Tensor values = sparse.values(); + auto ncols = sparse.size(1); + auto nnz = col_indices.numel(); + Tensor new_col_indices; + + std::tie(new_col_indices, std::ignore) = at::_unique(col_indices, true, false); + auto new_nnz = new_col_indices.numel(); + Tensor new_crow_indices = at::tensor(ArrayRef{0, new_nnz}, col_indices.options()); + Tensor new_values = at::empty({new_nnz}, values.options()); + + scalar_t* values_ptr = values.data_ptr(); + scalar_t* new_values_ptr = new_values.data_ptr(); + int64_t THREADS = at::cuda::getCurrentDeviceProperties()->maxThreadsPerBlock; + int64_t BLOCKS = (new_nnz + THREADS) / THREADS; + at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "reduce_sparse_csr_dim0_cuda_indices", + [&]() { + index_t* col_indices_ptr = col_indices.data_ptr(); + index_t* new_col_indices_ptr = new_col_indices.data_ptr(); + reduce_sparse_csr_dim0_cuda_kernel<<>>(new_values_ptr, + new_col_indices_ptr, + new_nnz, + values_ptr, + col_indices_ptr, + nnz, + rop + ); + }); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + return at::native::_sparse_csr_tensor_unsafe(new_crow_indices, new_col_indices, new_values, + {1, ncols}, + new_values.scalar_type(), + sparse.layout(), + new_values.device()); +} + +template +__global__ void reduce_crow_indices_dim1_cuda_kernel(index_t* new_crow_indices, + index_t* row_map, + const index_t* crow_indices, + const int64_t nrows + ) { + int64_t nnz = 0; + new_crow_indices[0] = 0; + for(int64_t i=0; i +__global__ void reduce_sparse_csr_dim1_cuda_kernel(scalar_t* new_values, + const scalar_t* values, + const index_t* crow_indices, + const index_t* row_map, + const int64_t nrows, + ReductionOp rop + ) { + int64_t tid = blockDim.x * blockIdx.x + threadIdx.x; + if (tid < nrows) { + index_t i_start = 
crow_indices[tid]; + index_t i_end = crow_indices[tid+1]; + if (i_start != i_end) { + scalar_t acc = rop.identity(); + for (index_t i = i_start; i < i_end; i++) { + acc = rop(acc, values[i]); + } + new_values[row_map[tid]] = acc; + } + } +} + +template +Tensor reduce_sparse_csr_dim1_cuda_template(const Tensor& sparse, ReductionOp rop) { + /* + The algorithm of computing reduce of a CSR tensor along the last + dimension is explained in the comment of the + reduce_sparse_csr_dim1_cpu_template function. + */ + Tensor crow_indices = sparse.crow_indices(); + auto ioptions = crow_indices.options(); + Tensor values = sparse.values(); + auto nrows = sparse.size(0); + auto numel = values.numel(); + + Tensor new_crow_indices = at::empty({crow_indices.numel()}, ioptions); + Tensor new_col_indices = at::empty({}, ioptions); + Tensor new_values = at::empty({}, values.options()); + Tensor row_map = at::empty({nrows}, ioptions); + + at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream(); + int64_t THREADS = at::cuda::getCurrentDeviceProperties()->maxThreadsPerBlock; + int64_t BLOCKS = (nrows + THREADS) / THREADS; + + AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "reduce_sparse_csr_dim1_cuda_indices", + [&]() { + index_t* crow_indices_ptr = crow_indices.data_ptr(); + index_t* new_crow_indices_ptr = new_crow_indices.data_ptr(); + index_t* row_map_ptr = row_map.data_ptr(); + reduce_crow_indices_dim1_cuda_kernel<<<1, 1, 0, stream>>>(new_crow_indices_ptr, + row_map_ptr, + crow_indices_ptr, + nrows); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + index_t new_nnz = new_crow_indices[-1].item(); + new_col_indices.resize_(new_nnz); + new_col_indices.fill_(index_t(0)); + new_values.resize_(new_nnz); + + scalar_t* values_ptr = values.data_ptr(); + scalar_t* new_values_ptr = new_values.data_ptr(); + reduce_sparse_csr_dim1_cuda_kernel<<>>(new_values_ptr, + values_ptr, + crow_indices_ptr, + row_map_ptr, + nrows, + rop); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + }); + + return at::native::_sparse_csr_tensor_unsafe(new_crow_indices, new_col_indices, new_values, + {sparse.size(0), 1}, + new_values.scalar_type(), + sparse.layout(), + new_values.device()); +} + +template +Tensor reduce_sparse_csr_dim01_cuda_template(const Tensor& sparse, ReductionOp rop) { + + auto ioptions = sparse.col_indices().options(); + Tensor values = sparse.values(); + auto numel = values.numel(); + auto nnz = std::min(1, numel); + + Tensor new_values; + if (numel > 0) { + new_values = at::empty({1}, values.options()); + auto iter = TensorIterator::reduce_op(new_values, values); + gpu_reduce_kernel(iter, func_wrapper(rop), rop.identity_cpu()); + } else { + new_values = at::empty({}, values.options()); + } + Tensor new_col_indices = at::zeros({nnz}, ioptions); + Tensor new_crow_indices = at::tensor(ArrayRef{0, nnz}, ioptions); + return at::native::_sparse_csr_tensor_unsafe(new_crow_indices, new_col_indices, new_values, + {1, std::min(1, sparse.size(1))}, + new_values.scalar_type(), + sparse.layout(), + new_values.device()); +} + +template +Tensor reduce_sparse_csr_cuda_template(const Tensor& sparse, std::vector dims, ReductionOp rop) { + if (dims.size() == 1) { + if (dims[0] == 0) { + return reduce_sparse_csr_dim0_cuda_template(sparse, rop); + } else { + TORCH_INTERNAL_ASSERT(dims[0] == 1); + return reduce_sparse_csr_dim1_cuda_template(sparse, rop); + } + } else if (dims.size() == 2) { + TORCH_INTERNAL_ASSERT(((dims[0] == 0 && dims[1] == 1) || (dims[0] == 1 && dims[1] == 0))); + return reduce_sparse_csr_dim01_cuda_template(sparse, rop); + } + 
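+  // Remaining case: an empty dim list. The IntArrayRef overload below
+  // currently expands an empty list to {0, 1}, so this branch only becomes
+  // reachable once gh-29137 is resolved.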
TORCH_INTERNAL_ASSERT(dims.size() == 0); + // effective after gh-29137 has been resolved + return sparse.clone(); +} + +template +Tensor reduce_sparse_csr_cuda_template(const Tensor& sparse, IntArrayRef dims_to_sum, bool keepdim, ReductionOp rop) { + TORCH_INTERNAL_ASSERT(sparse.is_sparse_csr()); + TORCH_CHECK(keepdim, "reduction operations on CSR tensors with keepdim=False is unsupported"); + TORCH_INTERNAL_ASSERT(sparse.is_cuda()); + + const int64_t input_dim = sparse.dim(); + TORCH_INTERNAL_ASSERT(input_dim == 2); + auto dims = dims_to_sum.vec(); + maybe_wrap_dims(dims, input_dim); + if (dims.size() == 0) { + // after gh-29137 is resolved, delete this if-block + dims.emplace_back(0); + dims.emplace_back(1); + } + return reduce_sparse_csr_cuda_template(sparse, dims, rop); +} + +template +struct ReductionAddOp { + __device__ __forceinline__ scalar_t operator()(const scalar_t a, const scalar_t b) const { + return a + b; + } + __device__ __forceinline__ scalar_t identity() const { return 0; } + __forceinline__ scalar_t identity_cpu() const { return 0; } +}; + +} // namespace + +Tensor _sparse_csr_sum_cuda(const Tensor& input, IntArrayRef dims_to_sum, bool keepdim, c10::optional dtype) { + ScalarType dtype_ = dtype.value_or(input.scalar_type()); + Tensor input_ = input.to(dtype_); + Tensor result; + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( + kHalf, kBFloat16, input_.scalar_type(), "_sparse_csr_sum_cuda", + [&] { + result = reduce_sparse_csr_cuda_template(input_, dims_to_sum, keepdim, ReductionAddOp()); + }); + return result; +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/ts_native_functions.yaml b/aten/src/ATen/native/ts_native_functions.yaml new file mode 100644 index 00000000000000..6650387757d394 --- /dev/null +++ b/aten/src/ATen/native/ts_native_functions.yaml @@ -0,0 +1,177 @@ +backend: Lazy +cpp_namespace: torch::lazy +full_codegen: + - _adaptive_avg_pool2d + - _adaptive_avg_pool2d_backward + - _log_softmax + - _log_softmax_backward_data + - _softmax + - _softmax_backward_data + - abs + - add.Tensor + - addcdiv + - addcmul + - addmm + - arange.start_out + - all + - any + - avg_pool2d + - avg_pool2d_backward + - baddbmm + - bernoulli + - bernoulli_.float + - binary_cross_entropy + - binary_cross_entropy_backward + - bitwise_and.Tensor + - bitwise_or.Tensor + - bmm + - cat + - clamp + - clamp_min + - constant_pad_nd + - convolution + - convolution_backward + - cos + - cumsum + - div.Tensor + - div.Tensor_mode + - elu + - elu_backward + - embedding + - embedding_dense_backward + - eq.Scalar + - eq.Tensor + - exp + - flip + - floor + - frac + - gather + - ge.Scalar + - ge.Tensor + - gelu + - gelu_backward + - glu + - glu_backward + - grid_sampler_2d + - grid_sampler_2d_backward + - gt.Scalar + - gt.Tensor + - hardsigmoid + - index_select + - kl_div_backward + - l1_loss_backward + - le.Scalar + - le.Tensor + - leaky_relu + - leaky_relu_backward + - log + - log2 + - logdet + - log_sigmoid_backward + - log_sigmoid_forward + - lt.Scalar + - lt.Tensor + - masked_fill_.Scalar + - masked_fill_.Tensor + - max + - max.dim + - max_pool2d_with_indices + - max_pool2d_with_indices_backward + - maximum + - mean + - mean.dim + - min + - minimum + - mm + - mul.Tensor + - mv + - native_dropout + - native_dropout_backward + - native_layer_norm + - native_layer_norm_backward + - ne.Scalar + - ne.Tensor + - neg + - nll_loss_backward + - nll_loss_forward + - nll_loss2d_backward + - nll_loss2d_forward + - norm.ScalarOpt_dim + - pow.Tensor_Scalar + - pow.Tensor_Tensor + - random_ + 
- random_.from + - random_.to + - reciprocal + - relu + - relu_ + - remainder.Tensor + - repeat + - rsqrt + - scatter_add + - sgn + - sigmoid + - sigmoid_backward + - silu + - smooth_l1_loss + - smooth_l1_loss_backward + - softplus + - softplus_backward + - sort + - sqrt + - stack + - std + - std.dim + - std.correction + - sub.Tensor + - sum + - sum.dim_IntList + - tanh + - tanh_backward + - threshold + - threshold_backward + - topk + - trace + - tril + - triu + - trunc + - upsample_bilinear2d + - upsample_bilinear2d_backward + - upsample_nearest2d + - upsample_nearest2d_backward + - zero_ +supported: + - as_strided + - as_strided_ + - clone + - _copy_from + - _copy_from_and_resize + - diagonal + - empty.memory_format + - empty_strided + - expand + - fill_.Scalar + - native_batch_norm + - native_batch_norm_backward + - normal_ + - max_pool3d_with_indices + - max_pool3d_with_indices_backward + - permute + - select.int + - slice.Tensor + - squeeze + - squeeze.dim + - squeeze_ + - squeeze_.dim + - t + - t_ + - _to_copy + - transpose.int + - transpose_ + - unsqueeze + - unsqueeze_ + - view + - alias + - _unsafe_view +autograd: + - max_pool3d diff --git a/aten/src/ATen/native/vulkan/ops/Gru.cpp b/aten/src/ATen/native/vulkan/ops/Gru.cpp index 9052b43189d00c..8b0e99ab00bd7b 100644 --- a/aten/src/ATen/native/vulkan/ops/Gru.cpp +++ b/aten/src/ATen/native/vulkan/ops/Gru.cpp @@ -1,5 +1,5 @@ -#include -#include +#include +#include namespace at { namespace native { @@ -95,6 +95,151 @@ TORCH_LIBRARY_IMPL(aten, Vulkan, m) { #endif /* USE_VULKAN_API */ } // namespace + +std::vector pack_linear_op_contexts( + const std::vector& params_cpu, + int64_t num_layers) { + TORCH_CHECK(params_cpu.size() == 4 * num_layers, "Vulkan gru expects 'params_cpu' size to be 4 * 'num_layers'."); + std::vector linear_op_contexts; + for (int64_t i = 0; i < num_layers; ++i) { + const auto& w_ih = params_cpu.at(i * 4); + const auto& w_hh = params_cpu.at(i * 4 + 1); + const auto& b_ih = params_cpu.at(i * 4 + 2); + const auto& b_hh = params_cpu.at(i * 4 + 3); + const auto& h_in = w_ih.size(0) / 3; + + const auto& w_i_rzn = w_ih.split(h_in); + const auto& w_h_rzn = w_hh.split(h_in); + const auto& b_i_rzn = b_ih.split(h_in); + const auto& b_h_rzn = b_hh.split(h_in); + + const auto& w_ir = w_i_rzn[0]; + const auto& w_iz = w_i_rzn[1]; + const auto& w_in = w_i_rzn[2]; + const auto& w_hr = w_h_rzn[0]; + const auto& w_hz = w_h_rzn[1]; + const auto& w_hn = w_h_rzn[2]; + const auto& b_ir = b_i_rzn[0]; + const auto& b_iz = b_i_rzn[1]; + const auto& b_in = b_i_rzn[2]; + const auto& b_hr = b_h_rzn[0]; + const auto& b_hz = b_h_rzn[1]; + const auto& b_hn = b_h_rzn[2]; + + linear_op_contexts.emplace_back(LinearOpContext::create(w_ir.t(), b_ir)); + linear_op_contexts.emplace_back(LinearOpContext::create(w_hr.t(), b_hr)); + linear_op_contexts.emplace_back(LinearOpContext::create(w_iz.t(), b_iz)); + linear_op_contexts.emplace_back(LinearOpContext::create(w_hz.t(), b_hz)); + linear_op_contexts.emplace_back(LinearOpContext::create(w_in.t(), b_in)); + linear_op_contexts.emplace_back(LinearOpContext::create(w_hn.t(), b_hn)); + } + return linear_op_contexts; +} + +GruOpContext::GruOpContext( + const std::vector& params_cpu, + bool has_biases, + int64_t num_layers, + double dropout, + bool train, + bool bidirectional, + bool batch_first) + : packed_{pack_linear_op_contexts(params_cpu, num_layers), has_biases, num_layers, dropout, train, bidirectional, batch_first}, + unpacked_{params_cpu, has_biases, num_layers, dropout, train, bidirectional, 
batch_first} { + TORCH_INTERNAL_ASSERT(packed_.has_biases, "Vulkan gru expects 'has_biases' to be true."); + TORCH_INTERNAL_ASSERT(!packed_.train, "Vulkan gru expects 'train' to be false."); + TORCH_INTERNAL_ASSERT(!packed_.bidirectional, "Vulkan gru expects 'bidirectional' to be false."); + TORCH_INTERNAL_ASSERT(packed_.batch_first, "Vulkan gru expects 'batch_first' to be true."); + TORCH_INTERNAL_ASSERT(packed_.dropout < std::numeric_limits::epsilon()*1000, "Vulkan gru expects 'dropout' to be 0.0."); +} + +GruOpContext GruOpContext::create( + const std::vector& params_cpu, // weights/biases (cpu) + bool has_biases, + int64_t num_layers, + double dropout, + bool train, + bool bidirectional, + bool batch_first) { + return GruOpContext{ + params_cpu, + has_biases, + num_layers, + dropout, + train, + bidirectional, + batch_first + }; +} + +std::tuple GruOpContext::run( + const Tensor & input_vk, // input sequence (vulkan) + const Tensor & hx_vk) const { // initial hidden state (vulkan) + TORCH_INTERNAL_ASSERT(input_vk.sizes().size() == 3, "Vulkan gru expects 'input_vk' dims to be 3."); + TORCH_INTERNAL_ASSERT(hx_vk.sizes().size() == 3, "Vulkan gru expects 'hx_vk' dims to be 3."); + + const int64_t linear_op_contexts_per_layer = 6; // (b_ir, w_ir), (b_hr, w_hr), (b_iz, w_iz), (b_hz, w_hz), (b_in, w_in), (b_hn, w_hn) + std::vector h_n_list; // hidden output + + // reshape to 2D due to Vulkan at::mm op accepts only 2D + auto x = input_vk.reshape({input_vk.size(0) * input_vk.size(1), input_vk.size(2)}); + + for (int64_t i = 0; i < packed_.num_layers; ++i) { + // extract each hidden state and squeeze into 2D dim + auto h = at::slice(hx_vk, 0, i, i + 1, 1); + h = h.reshape({h.size(0) * h.size(1), h.size(2)}); + + const auto& cxt_ir = packed_.linear_op_contexts[i * linear_op_contexts_per_layer + 0]; + const auto& cxt_hr = packed_.linear_op_contexts[i * linear_op_contexts_per_layer + 1]; + const auto& cxt_iz = packed_.linear_op_contexts[i * linear_op_contexts_per_layer + 2]; + const auto& cxt_hz = packed_.linear_op_contexts[i * linear_op_contexts_per_layer + 3]; + const auto& cxt_in = packed_.linear_op_contexts[i * linear_op_contexts_per_layer + 4]; + const auto& cxt_hn = packed_.linear_op_contexts[i * linear_op_contexts_per_layer + 5]; + + const auto& r = at::sigmoid(cxt_ir.run(x, 1.0f, 1.0f) + cxt_hr.run(h, 1.0f, 1.0f)); + const auto& z = at::sigmoid(cxt_iz.run(x, 1.0f, 1.0f) + cxt_hz.run(h, 1.0f, 1.0f)); + const auto& n = at::tanh(cxt_in.run(x, 1.0f, 1.0f) + r * (cxt_hn.run(h, 1.0f, 1.0f))); + h = (z * (-1) + 1) * n + z * h; + x = h; // next input + h_n_list.emplace_back(h.reshape({1, 1, h.size(0), h.size(1)})); // 2D to 4D for cat op + } + + auto h_n = at::cat(h_n_list, 1); + h_n = h_n.reshape({h_n.size(0) * h_n.size(1), h_n.size(2), h_n.size(3)}); + return std::tuple(x, h_n); +} + +GruOpContext::State GruOpContext::unpack() const { + return GruOpContext::State{ + unpacked_.params_cpu, + unpacked_.has_biases, + unpacked_.num_layers, + unpacked_.dropout, + unpacked_.train, + unpacked_.bidirectional, + unpacked_.batch_first, + }; +} + +c10::intrusive_ptr gru_prepack( + std::vector&& params_cpu, + bool has_biases, + int64_t num_layers, + double dropout, + bool train, + bool bidirectional, + bool batch_first) { + return c10::make_intrusive(GruOpContext::create( + params_cpu, has_biases, num_layers, dropout, train, bidirectional, batch_first)); +} + +std::tuple gru_run( + const Tensor& input_vk, + const Tensor& hx_vk, + const c10::intrusive_ptr& context) { + return context->run(input_vk, hx_vk); +} + 
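+// Typical usage of the custom ops registered in Register.cpp (mirroring the
+// vulkan_api_test coverage added in this change); the tensor names here are
+// illustrative only:
+//
+//   auto ctx = gru_prepack({w_ih_l0, w_hh_l0, b_ih_l0, b_hh_l0},
+//       /*has_biases=*/true, /*num_layers=*/1, /*dropout=*/0.0,
+//       /*train=*/false, /*bidirectional=*/false, /*batch_first=*/true);
+//   std::tuple<Tensor, Tensor> out = gru_run(input_vk, hx_vk, ctx);
+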
} // namespace ops } // namespace vulkan } // namespace native diff --git a/aten/src/ATen/native/vulkan/ops/Gru.h b/aten/src/ATen/native/vulkan/ops/Gru.h new file mode 100644 index 00000000000000..8000aa449ca4f2 --- /dev/null +++ b/aten/src/ATen/native/vulkan/ops/Gru.h @@ -0,0 +1,85 @@ +#pragma once + +#ifdef USE_VULKAN_API + +#include +#include +#include + +namespace at { +namespace native { +namespace vulkan { +namespace ops { + +class GruOpContext final : public torch::jit::CustomClassHolder { + public: + static GruOpContext create( + const std::vector& params_cpu, // weights/biases (cpu) + bool has_biases, + int64_t num_layers, + double dropout, + bool train, + bool bidirectional, + bool batch_first); + + using State = std::tuple, bool, int64_t, double, bool, bool, bool>; + + std::tuple run( + const Tensor& input_vk, + const Tensor & hx_vk) const; + State unpack() const; + + private: + GruOpContext( + const std::vector& params_cpu, // weights/biases (cpu) + bool has_biases, + int64_t num_layers, + double dropout, + bool train, + bool bidirectional, + bool batch_first); + + private: + struct { + std::vector linear_op_contexts; // {{ op context for b_ir, w_ir, op context for b_hr, w_hr, + // op context for b_iz, w_iz, op context for b_hz, w_hz, + // op context for b_in, w_in, op context for b_hn, w_hn,}, ...} + bool has_biases{}; + int64_t num_layers{}; + double dropout{}; + bool train{}; + bool bidirectional{}; + bool batch_first{}; + } packed_; + + struct { + std::vector params_cpu; // weights/biases (cpu) + bool has_biases{}; + int64_t num_layers{}; + double dropout{}; + bool train{}; + bool bidirectional{}; + bool batch_first{}; + } unpacked_; +}; + +c10::intrusive_ptr gru_prepack( + std::vector&& params_cpu, // weights/biases (cpu) + bool has_biases, + int64_t num_layers, + double dropout, + bool train, + bool bidirectional, + bool batch_first); + +std::tuple gru_run( + const Tensor& input_vk, + const Tensor & hx_vk, + const c10::intrusive_ptr& context); + +} // namespace ops +} // namespace vulkan +} // namespace native +} // namespace at + +#endif /* USE_VULKAN_API */ diff --git a/aten/src/ATen/native/vulkan/ops/Register.cpp b/aten/src/ATen/native/vulkan/ops/Register.cpp index 4b90fc8696e1ff..942836cf6838a4 100644 --- a/aten/src/ATen/native/vulkan/ops/Register.cpp +++ b/aten/src/ATen/native/vulkan/ops/Register.cpp @@ -2,6 +2,7 @@ #include #include +#include #include #include #include @@ -28,9 +29,9 @@ TORCH_LIBRARY(vulkan, m) { std::move(std::get<2>(state)), std::move(std::get<3>(state)), std::move(std::get<4>(state)), - std::move(std::get<5>(state)), - std::move(std::get<6>(state)), - std::move(std::get<7>(state))); + std::get<5>(state), + std::get<6>(state), + std::get<7>(state)); }); m.class_("TransposeConv2dOpContext") .def_pickle( @@ -47,9 +48,9 @@ TORCH_LIBRARY(vulkan, m) { std::move(std::get<3>(state)), std::move(std::get<4>(state)), std::move(std::get<5>(state)), - std::move(std::get<6>(state)), - std::move(std::get<7>(state)), - std::move(std::get<8>(state))); + std::get<6>(state), + std::get<7>(state), + std::get<8>(state)); }); m.class_("LinearOpContext") .def_pickle( @@ -62,6 +63,23 @@ TORCH_LIBRARY(vulkan, m) { return linear_prepack( std::move(std::get<0>(state)), std::move(std::get<1>(state))); }); + m.class_("GruOpContext") + .def_pickle( + // __getstate__ + [](const c10::intrusive_ptr& context) { + return context->unpack(); + }, + // __setstate__ + [](GruOpContext::State state) { + return gru_prepack( + std::move(std::get<0>(state)), + std::get<1>(state), + 
std::get<2>(state), + std::get<3>(state), + std::get<4>(state), + std::get<5>(state), + std::get<6>(state)); + }); } TORCH_LIBRARY(vulkan_prepack, m) { @@ -87,18 +105,33 @@ TORCH_LIBRARY(vulkan_prepack, m) { m.def(TORCH_SELECTIVE_SCHEMA( "vulkan_prepack::linear_run(Tensor X, " "__torch__.torch.classes.vulkan.LinearOpContext BW_prepack) -> Tensor Y")); + m.def(TORCH_SELECTIVE_SCHEMA( + "vulkan_prepack::gru_prepack(Tensor[] params_cpu, " + "bool has_biases, " + "int num_layers, " + "float dropout, " + "bool train, " + "bool bidirectional, " + "bool batch_first) " + "-> __torch__.torch.classes.vulkan.GruOpContext")); + m.def(TORCH_SELECTIVE_SCHEMA( + "vulkan_prepack::gru_run(Tensor input_vk, " + "Tensor hx_vk, " + "__torch__.torch.classes.vulkan.GruOpContext G_prepack) -> (Tensor next_input, Tensor hidden_layer)")); } TORCH_LIBRARY_IMPL(vulkan_prepack, CPU, m) { m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::conv2d_clamp_prepack"), TORCH_FN(conv2d_clamp_prepack)); m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::conv2d_transpose_clamp_prepack"), TORCH_FN(conv2d_transpose_clamp_prepack)); m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::linear_prepack"), TORCH_FN(linear_prepack)); + m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::gru_prepack"), TORCH_FN(gru_prepack)); } TORCH_LIBRARY_IMPL(vulkan_prepack, Vulkan, m) { m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::conv2d_clamp_run"), TORCH_FN(conv2d_clamp_run)); m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::conv2d_transpose_clamp_run"), TORCH_FN(conv2d_transpose_clamp_run)); m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::linear_run"), TORCH_FN(linear_run)); + m.impl(TORCH_SELECTIVE_NAME("vulkan_prepack::gru_run"), TORCH_FN(gru_run)); } Tensor convolution( diff --git a/aten/src/ATen/native/xnnpack/Convolution.cpp b/aten/src/ATen/native/xnnpack/Convolution.cpp index 3deb352d76ecd9..278e35280c4020 100644 --- a/aten/src/ATen/native/xnnpack/Convolution.cpp +++ b/aten/src/ATen/native/xnnpack/Convolution.cpp @@ -27,7 +27,7 @@ namespace { // TODO: Decouple and improve error handling and messages. bool available( const Tensor& weight, - const c10::optional bias_sizes_opt, + const at::OptionalIntArrayRef bias_sizes_opt, const IntArrayRef padding, const IntArrayRef stride, const IntArrayRef dilation, @@ -189,7 +189,7 @@ ContextConv2D create( TORCH_CHECK( available( weight_nhwc, - (bias.has_value() && bias->defined()) ? c10::optional(bias->sizes()) : c10::nullopt, + (bias.has_value() && bias->defined()) ? at::OptionalIntArrayRef(bias->sizes()) : c10::nullopt, padding_expanded, stride_expanded, dilation_expanded, @@ -433,7 +433,7 @@ unpack_prepacked_sizes_conv2d(const IValue& ivalue) { const auto& bias = std::get<1>(tuple); return IValue(std::make_tuple( std::get<0>(tuple).sizes(), - (bias && bias->defined()) ? c10::optional(bias->sizes()) : c10::nullopt, + (bias && bias->defined()) ? 
at::OptionalIntArrayRef(bias->sizes()) : c10::nullopt, std::get<2>(tuple), std::get<3>(tuple), std::get<4>(tuple), @@ -452,7 +452,7 @@ Tensor conv2d_transpose_clamp_run( bool use_convolution2d( const Tensor& input, const Tensor& weight, - const c10::optional bias_sizes_opt, + const at::OptionalIntArrayRef bias_sizes_opt, const IntArrayRef padding, const IntArrayRef stride, const IntArrayRef dilation, diff --git a/aten/src/ATen/native/xnnpack/Engine.h b/aten/src/ATen/native/xnnpack/Engine.h index 71ed262297b310..9d5c0e4594acfe 100644 --- a/aten/src/ATen/native/xnnpack/Engine.h +++ b/aten/src/ATen/native/xnnpack/Engine.h @@ -13,7 +13,7 @@ namespace xnnpack { bool use_convolution2d( const Tensor& input, const Tensor& weight, - const c10::optional bias_sizes_opt, + const at::OptionalIntArrayRef bias_sizes_opt, const IntArrayRef padding, const IntArrayRef stride, const IntArrayRef dilation, diff --git a/aten/src/ATen/native/xnnpack/Linear.cpp b/aten/src/ATen/native/xnnpack/Linear.cpp index 13fd04aad5a6a9..3f7ae681f95501 100644 --- a/aten/src/ATen/native/xnnpack/Linear.cpp +++ b/aten/src/ATen/native/xnnpack/Linear.cpp @@ -187,7 +187,7 @@ unpack_prepacked_sizes_linear(const IValue& ivalue) { const auto& bias = std::get<1>(tuple); return IValue(std::make_tuple( std::get<0>(tuple).sizes(), - (bias && bias->defined()) ? c10::optional(bias->sizes()) : c10::nullopt)); + (bias && bias->defined()) ? at::OptionalIntArrayRef(bias->sizes()) : c10::nullopt)); } } // namespace linear diff --git a/aten/src/ATen/native/xnnpack/Shim.cpp b/aten/src/ATen/native/xnnpack/Shim.cpp index 89fffa024aeff7..32ddfb4b852557 100644 --- a/aten/src/ATen/native/xnnpack/Shim.cpp +++ b/aten/src/ATen/native/xnnpack/Shim.cpp @@ -31,7 +31,7 @@ bool available() { bool use_convolution2d( const Tensor&, const Tensor&, - const c10::optional, + const at::OptionalIntArrayRef, const IntArrayRef, const IntArrayRef, const IntArrayRef, diff --git a/aten/src/ATen/ops/from_blob.h b/aten/src/ATen/ops/from_blob.h index 558ab57e900fba..f7599e70ea0558 100644 --- a/aten/src/ATen/ops/from_blob.h +++ b/aten/src/ATen/ops/from_blob.h @@ -26,7 +26,7 @@ class TORCH_API TensorMaker { public: using ContextDeleter = DeleterFnPtr; - TensorMaker& strides(optional value) noexcept { + TensorMaker& strides(OptionalIntArrayRef value) noexcept { strides_ = value; return *this; @@ -79,7 +79,7 @@ class TORCH_API TensorMaker { void* data_; IntArrayRef sizes_; - optional strides_{}; + OptionalIntArrayRef strides_{}; optional storage_offset_{}; std::function deleter_{}; std::unique_ptr ctx_{nullptr, detail::noopDelete}; diff --git a/aten/src/ATen/ops/tensor.h b/aten/src/ATen/ops/tensor.h index 3369eaf2502caa..2f72b7ef026379 100644 --- a/aten/src/ATen/ops/tensor.h +++ b/aten/src/ATen/ops/tensor.h @@ -1,6 +1,6 @@ #pragma once #include -#include +#include namespace at { diff --git a/aten/src/ATen/quantized/Quantizer.cpp b/aten/src/ATen/quantized/Quantizer.cpp index aa589819435693..4a1bac8bc4c161 100644 --- a/aten/src/ATen/quantized/Quantizer.cpp +++ b/aten/src/ATen/quantized/Quantizer.cpp @@ -417,4 +417,23 @@ Tensor from_blob_quantized_per_channel_affine( return qtensor; } +Tensor UnknownQuantizer::quantize(const Tensor& tensor) { + TORCH_INTERNAL_ASSERT(false, "cannot call quantize on UnknownQuantizer"); +} +Tensor UnknownQuantizer::dequantize(const Tensor& qtensor) { + TORCH_INTERNAL_ASSERT(false, "cannot call dequantize on UnknownQuantizer"); +} +Tensor& UnknownQuantizer::dequantize_out(Tensor& rtensor, const Tensor& qtensor) { + TORCH_INTERNAL_ASSERT(false, "cannot 
call dequantize_out on UnknownQuantizer"); +} +QScheme UnknownQuantizer::qscheme() const { + TORCH_INTERNAL_ASSERT(false, "cannot call qscheme on UnknownQuantizer"); +} +bool UnknownQuantizer::equalTo(QuantizerPtr other) const{ + TORCH_INTERNAL_ASSERT(false, "cannot call equalTo on UnknownQuantizer"); +} +QuantizerPtr make_unknown_quantizer(ScalarType scalar_type) { + return c10::make_intrusive(scalar_type); +} + } // namespace at diff --git a/aten/src/ATen/quantized/Quantizer.h b/aten/src/ATen/quantized/Quantizer.h index 5d9c7111f19eb0..05bd39b71223a0 100644 --- a/aten/src/ATen/quantized/Quantizer.h +++ b/aten/src/ATen/quantized/Quantizer.h @@ -18,6 +18,23 @@ namespace at { +/** + * UnknownQuantizer is a placeholder quantizer for functions that implement + * quantization in a two step process. First a tensor is allocated but with + * unknown quantizer, and then the quantization kernel decides what the final + * quantizer will be. + */ +struct TORCH_API UnknownQuantizer : public Quantizer { + explicit UnknownQuantizer(ScalarType scalar_type) + : Quantizer(scalar_type) {} + + Tensor quantize(const Tensor& tensor) override; + Tensor dequantize(const Tensor& qtensor) override; + Tensor& dequantize_out(Tensor& rtensor, const Tensor& qtensor) override; + QScheme qscheme() const override; + bool equalTo(QuantizerPtr other) const override; +}; + /** * UniformQuantizer is the parent class for all uniform quantizers. * These quantization scheme will map float value uniformly to @@ -80,7 +97,7 @@ struct TORCH_API PerTensorAffineQuantizer : public AffineQuantizer { return zero_point_; } - bool equalTo(QuantizerPtr other) override { + bool equalTo(QuantizerPtr other) const override { if (!other.get() || other->qscheme() != kPerTensorAffine) { return false; } @@ -139,7 +156,7 @@ struct TORCH_API PerChannelAffineQuantizer : public AffineQuantizer { Tensor dequantize(const Tensor& qtensor) override; Tensor& dequantize_out(Tensor& rtensor, const Tensor& qtensor) override; - bool equalTo(QuantizerPtr other) override { + bool equalTo(QuantizerPtr other) const override { if (!other.get() || other->qscheme() != kPerChannelAffine) { return false; } @@ -190,7 +207,7 @@ struct TORCH_API PerChannelAffineFloatQParamsQuantizer : public PerChannelAffine Tensor dequantize(const Tensor& qtensor) override; Tensor& dequantize_out(Tensor& rtensor, const Tensor& qtensor) override; - bool equalTo(QuantizerPtr other) override { + bool equalTo(QuantizerPtr other) const override { if (!other.get() || other->qscheme() != kPerChannelAffineFloatQParams) { return false; } @@ -222,6 +239,8 @@ TORCH_API QuantizerPtr make_per_channel_affine_quantizer( int64_t axis, ScalarType scalar_type); +TORCH_API QuantizerPtr make_unknown_quantizer(ScalarType scalar_type); + // Create a Quantized Tensor given arguments for normal Tensor and a quantizer TORCH_API Tensor new_qtensor( IntArrayRef sizes, diff --git a/aten/src/ATen/templates/DispatchKeyNativeFunctions.cpp b/aten/src/ATen/templates/DispatchKeyNativeFunctions.cpp new file mode 100644 index 00000000000000..1a5b4a452592d9 --- /dev/null +++ b/aten/src/ATen/templates/DispatchKeyNativeFunctions.cpp @@ -0,0 +1,9 @@ +// ${generated_comment} +${includes} +${native_functions_include} + +${namespace_prologue} + +${native_function_definitions} + +${namespace_epilogue} diff --git a/aten/src/ATen/templates/Functions.h b/aten/src/ATen/templates/Functions.h index 3313b90d51b035..7ff718892d669a 100644 --- a/aten/src/ATen/templates/Functions.h +++ b/aten/src/ATen/templates/Functions.h @@ -62,6 +62,7 @@ 
#include #include #include +#include #include #include #include diff --git a/aten/src/ATen/templates/LazyIr.h b/aten/src/ATen/templates/LazyIr.h new file mode 100644 index 00000000000000..1ee90e66cc6ced --- /dev/null +++ b/aten/src/ATen/templates/LazyIr.h @@ -0,0 +1,19 @@ +#pragma once + +// This file contains autogenerated LazyTensor IR nodes +${lazy_ir_sysinc} +${lazy_ir_inc} + +${namespace_prologue} +using at::operator<<; + +// kNullValue is used to contribute a static hash value any time +// a node has an Optional input that is nullopt. It is important +// to differentiate between HASH(nullopt, something) and HASH(something, nullopt), +// and using kNullValue in the hash function in the order of arguments +// serves this purpose. +static const torch::lazy::Value kNullValue = torch::lazy::Value(); + +${ir_declarations} + +${namespace_epilogue} diff --git a/aten/src/ATen/templates/NativeMetaFunctions.h b/aten/src/ATen/templates/NativeMetaFunctions.h index c83830f1eb1087..8e5d165fb70aa1 100644 --- a/aten/src/ATen/templates/NativeMetaFunctions.h +++ b/aten/src/ATen/templates/NativeMetaFunctions.h @@ -3,6 +3,7 @@ // ${generated_comment} #include +#include #include #include diff --git a/aten/src/ATen/templates/Operator.h b/aten/src/ATen/templates/Operator.h index 15434af15bae33..ee51847a4369fb 100644 --- a/aten/src/ATen/templates/Operator.h +++ b/aten/src/ATen/templates/Operator.h @@ -3,6 +3,7 @@ // ${generated_comment} #include +#include #include #include @@ -16,6 +17,7 @@ template class optional; template class List; +class ITensorListRef; class Stream; class Scalar; struct Storage; @@ -29,6 +31,7 @@ class Tensor; struct Dimname; struct Generator; using TensorList = c10::ArrayRef; +using ITensorListRef = c10::ITensorListRef; using DimnameList = c10::ArrayRef; using c10::Stream; using c10::Storage; diff --git a/aten/src/ATen/templates/Operators.h b/aten/src/ATen/templates/Operators.h index 3dc55a677106e3..a5a52ed1896d6b 100644 --- a/aten/src/ATen/templates/Operators.h +++ b/aten/src/ATen/templates/Operators.h @@ -17,6 +17,7 @@ and see NOTE [TORCH_ASSERT_ONLY_METHOD_OPERATORS]. 
#endif +#include #include #include #include diff --git a/aten/src/ATen/templates/RegisterDispatchKey.cpp b/aten/src/ATen/templates/RegisterDispatchKey.cpp index f9fa6ab244022d..df00c0d0e4a321 100644 --- a/aten/src/ATen/templates/RegisterDispatchKey.cpp +++ b/aten/src/ATen/templates/RegisterDispatchKey.cpp @@ -62,12 +62,12 @@ namespace { ${dispatch_anonymous_definitions} -TORCH_LIBRARY_IMPL(aten, ${DispatchKey}, m) { - ${dispatch_registrations} -} +${static_init_dispatch_registrations} } // anonymous namespace +${deferred_dispatch_registrations} + namespace ${dispatch_namespace} { ${dispatch_namespaced_definitions} diff --git a/aten/src/ATen/templates/TensorBody.h b/aten/src/ATen/templates/TensorBody.h index aa85ac1d30496f..c1f377b09b248c 100644 --- a/aten/src/ATen/templates/TensorBody.h +++ b/aten/src/ATen/templates/TensorBody.h @@ -32,8 +32,10 @@ #include #include #include +#include #include + #include namespace c10{ @@ -340,6 +342,10 @@ class TORCH_API Tensor: public TensorBase { return to(options().device(DeviceType::Metal), /*non_blocking*/ false, /*copy*/ false); } + Tensor meta() const { + return to(options().device(DeviceType::Meta), /*non_blocking*/ false, /*copy*/ false); + } + // ~~~~~ Autograd API ~~~~~ /// \fn bool is_leaf() const; diff --git a/aten/src/ATen/templates/TensorMethods.cpp b/aten/src/ATen/templates/TensorMethods.cpp index 29a43a657bb325..be9d94406e74cb 100644 --- a/aten/src/ATen/templates/TensorMethods.cpp +++ b/aten/src/ATen/templates/TensorMethods.cpp @@ -15,7 +15,7 @@ namespace at { return this->unsafeGetTensorImpl()->data_ptr_impl(); \ } - AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(DEFINE_CAST) + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(DEFINE_CAST) AT_FORALL_QINT_TYPES(DEFINE_CAST) #undef DEFINE_CAST @@ -25,7 +25,7 @@ namespace at { return item().to##name(); \ } - AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(DEFINE_ITEM) + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(DEFINE_ITEM) #undef DEFINE_ITEM } //namespace at diff --git a/aten/src/ATen/test/cuda_atomic_ops_test.cu b/aten/src/ATen/test/cuda_atomic_ops_test.cu index 54d43ffec019cf..d5d261440064bf 100644 --- a/aten/src/ATen/test/cuda_atomic_ops_test.cu +++ b/aten/src/ATen/test/cuda_atomic_ops_test.cu @@ -1,6 +1,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,24 @@ __global__ void mul_test_kernel(T * a, T * sum) { gpuAtomicMul(&sum[idx], a[idx]); } +template +__global__ void max_test_kernel(T * a, T * max) { + int tid = blockIdx.x * blockDim.x + threadIdx.x; + int a_idx = (tid) % (arraysize * factor); + int idx = a_idx / factor; + + gpuAtomicMax(&max[idx], a[a_idx]); +} + +template +__global__ void min_test_kernel(T * a, T * min) { + int tid = blockIdx.x * blockDim.x + threadIdx.x; + int a_idx = (tid) % (arraysize * factor); + int idx = a_idx / factor; + + gpuAtomicMin(&min[idx], a[a_idx]); +} + template void test_atomic_add() { dim3 dimBlock(blocksize, 1); @@ -75,7 +94,7 @@ void test_atomic_mul() { for (int i = 0; i < arraysize; ++i) { a[i] = 2; sum[i] = 2; - answer[i] = pow(sum[i], static_cast(factor)); + answer[i] = pow(sum[i], static_cast(factor + 1)); } cudaMalloc((void**)&ad, arraysize * sizeof(T)); @@ -97,7 +116,88 @@ void test_atomic_mul() { cudaFree(sumd); } +template +void test_atomic_max() { + dim3 dimBlock(blocksize, 1); + dim3 dimGrid(1, 1); + + T *ad, *sumd; + + std::vector a(arraysize * factor); + std::vector sum(arraysize); + std::vector answer(arraysize); + + int j; + for (int i = 0; i < arraysize * factor; ++i) { + a[i] = i; + if (i % factor == 0) { + j = 
i / factor; + sum[j] = std::numeric_limits::lowest(); + answer[j] = (j + 1) * factor - 1; + } + } + + cudaMalloc((void**)&ad, arraysize * factor * sizeof(T)); + cudaMalloc((void**)&sumd, arraysize * sizeof(T)); + + cudaMemcpy(ad, a.data(), arraysize * factor * sizeof(T), cudaMemcpyHostToDevice); + cudaMemcpy(sumd, sum.data(), arraysize * sizeof(T), cudaMemcpyHostToDevice); + + max_test_kernel<<>>(ad, sumd); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + + cudaMemcpy(sum.data(), sumd, arraysize * sizeof(T), cudaMemcpyDeviceToHost); + + for (int i = 0; i < arraysize; ++i) { + ASSERT_EQ(sum[i], answer[i]) << typeid(T).name(); + } + + cudaFree(ad); + cudaFree(sumd); +} + +template +void test_atomic_min() { + dim3 dimBlock(blocksize, 1); + dim3 dimGrid(1, 1); + + T *ad, *sumd; + + std::vector a(arraysize * factor); + std::vector sum(arraysize); + std::vector answer(arraysize); + + int j; + for (int i = 0; i < arraysize * factor; ++i) { + a[i] = i; + if (i % factor == 0) { + j = i / factor; + sum[j] = std::numeric_limits::max(); + answer[j] = j * factor; + } + } + + cudaMalloc((void**)&ad, arraysize * factor * sizeof(T)); + cudaMalloc((void**)&sumd, arraysize * sizeof(T)); + + cudaMemcpy(ad, a.data(), arraysize * factor * sizeof(T), cudaMemcpyHostToDevice); + cudaMemcpy(sumd, sum.data(), arraysize * sizeof(T), cudaMemcpyHostToDevice); + + min_test_kernel<<>>(ad, sumd); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + + cudaMemcpy(sum.data(), sumd, arraysize * sizeof(T), cudaMemcpyDeviceToHost); + + for (int i = 0; i < arraysize; ++i) { + ASSERT_EQ(sum[i], answer[i]) << typeid(T).name(); + } + + cudaFree(ad); + cudaFree(sumd); +} + TEST(TestAtomicOps, TestAtomicAdd) { + if (!at::cuda::is_available()) return; test_atomic_add(); test_atomic_add(); test_atomic_add(); @@ -113,8 +213,25 @@ TEST(TestAtomicOps, TestAtomicAdd) { } TEST(TestAtomicOps, DISABLED_ON_WINDOWS(TestAtomicMul)) { + if (!at::cuda::is_available()) return; test_atomic_mul(); test_atomic_mul(); test_atomic_mul(); test_atomic_mul(); } + +TEST(TestAtomicOps, DISABLED_ON_WINDOWS(TestAtomicMax)) { + if (!at::cuda::is_available()) return; + test_atomic_max(); + test_atomic_max(); + test_atomic_max(); + test_atomic_max(); +} + +TEST(TestAtomicOps, DISABLED_ON_WINDOWS(TestAtomicMin)) { + if (!at::cuda::is_available()) return; + test_atomic_min(); + test_atomic_min(); + test_atomic_min(); + test_atomic_min(); +} diff --git a/aten/src/ATen/test/cuda_half_test.cu b/aten/src/ATen/test/cuda_half_test.cu index a55d9458e85131..aa1644c94b764f 100644 --- a/aten/src/ATen/test/cuda_half_test.cu +++ b/aten/src/ATen/test/cuda_half_test.cu @@ -76,6 +76,13 @@ __device__ void test(){ assert(::abs(::isnan(Half(0.0)) - ::isnan(0.0f)) <= threshold); assert(::abs(::isinf(Half(0.0)) - ::isinf(0.0f)) <= threshold); #endif + + // test complex<32> + Half real = 3.0f; + Half imag = -10.0f; + auto complex = c10::complex(real, imag); + assert(complex.real() == real); + assert(complex.imag() == imag); } __global__ void kernel(){ diff --git a/aten/src/ATen/test/half_test.cpp b/aten/src/ATen/test/half_test.cpp index 652823e8e9b1e2..02ccb8b6ce5dc3 100644 --- a/aten/src/ATen/test/half_test.cpp +++ b/aten/src/ATen/test/half_test.cpp @@ -164,3 +164,11 @@ TEST(TestHalf, CommonMath) { assert(std::abs(std::isinf(Half(0.0)) - std::isinf(0.0f)) <= threshold); #endif } + +TEST(TestHalf, ComplexHalf) { + Half real = 3.0f; + Half imag = -10.0f; + auto complex = c10::complex(real, imag); + assert(complex.real() == real); + assert(complex.imag() == imag); +} diff --git 
a/aten/src/ATen/test/vulkan_api_test.cpp b/aten/src/ATen/test/vulkan_api_test.cpp index 7001677d8dd5e2..d792613411eaf7 100644 --- a/aten/src/ATen/test/vulkan_api_test.cpp +++ b/aten/src/ATen/test/vulkan_api_test.cpp @@ -2,6 +2,7 @@ #include #include +#include #include // TODO: These functions should move to a common place. @@ -64,7 +65,7 @@ void showRtol(const at::Tensor& a, const at::Tensor& b) { } -static void gen_allpermutations(std::vector>& out, std::vector in, int i) { +static void gen_allpermutations(std::vector>& out, std::vector in, unsigned i) { // generate all permutations of a given dims if (i == in.size()) { out.push_back(in); @@ -137,6 +138,31 @@ static void clone_test(const std::vector& size, c10::optional +inline std::vector makeStack(Inputs&&... inputs) { + return {std::forward(inputs)...}; +} + +template +inline std::vector callOpByHandle( + const c10::OperatorHandle& op, + Args... args) { + auto stack = makeStack(std::forward(args)...); + c10::Dispatcher::singleton().callBoxed(op, &stack); + return stack; +} + +template +inline std::vector callOpByName( + const char* func_name, + const char* overload_name, + Args... args) { + const c10::optional op_handle = + c10::Dispatcher::singleton().findSchema({func_name, overload_name}); + assert(op_handle.has_value()); + return callOpByHandle(op_handle.value(), std::forward(args)...); +} + } // namespace namespace { @@ -2962,6 +2988,203 @@ TEST(VulkanAPITest, gru_invalidinputs_exceptions) { has_biases, num_layers, 1.0, train, bidirectional, batch_first); }, ::c10::Error); } + +TEST(VulkanAPITest, gru_prepack_success) { + // Guard + if (!at::is_vulkan_available()) { + return; + } + + // Arrange + const int H_in = 384; // input_size + const int H_out = 384; // hidden_size + const int num_layers = 2; + const double gru_dropout = .0; + const bool has_biases = true; + const bool train = false; + const bool bidirectional = false; + const bool batch_first = true; + const auto in_cpu = at::rand({1, 1, H_in}, at::device(at::kCPU).dtype(at::kFloat)); + const auto h0_cpu = at::rand({num_layers, 1, H_out}, at::device(at::kCPU).dtype(at::kFloat)); + + c10::List weight_ih_l; // shape (3 * hidden_size, input_size) + c10::List weight_hh_l; // shape (3 * hidden_size, hidden_size) + c10::List bias_ih_l; // shape (3 * hidden_size) + c10::List bias_hh_l; // shape (3 * hidden_size) + for (int i = 0; i < num_layers; ++i) { + weight_ih_l.emplace_back(at::rand({3 * H_out, H_in}, at::device(at::kCPU).dtype(at::kFloat))); + weight_hh_l.emplace_back(at::rand({3 * H_out, H_out}, at::device(at::kCPU).dtype(at::kFloat))); + bias_ih_l.emplace_back(at::rand({3 * H_out}, at::device(at::kCPU).dtype(at::kFloat))); + bias_hh_l.emplace_back(at::rand({3 * H_out}, at::device(at::kCPU).dtype(at::kFloat))); + } + + // put this guard here to run inference inststead of training + // to avoid the following error: + // C++ exception with description "0INTERNAL ASSERT FAILED at "xplat/caffe2/aten/src/ATen/core/boxing/KernelFunction.cpp":31, please report a bug to PyTorch. aten::gru.input has kernels registered to both CompositeImplicitAutograd and a backend mapped to AutogradOther. This makes the backend kernel unreachable; the dispatcher will always prefer the CompositeImplicitAutograd lowering (see Note [Ambiguity in AutogradOther kernel]). If you want to override CompositeImplicitAutograd, please open an issue to request a dedicated Autograd dispatch key for the backend. 
+ // If you only want to run inference instead of training, add `c10::InferenceMode mode;` before model.forward(). Note this guard is only available in C++ but not Python at present. + c10::InferenceMode mode; + + // Act + const auto out_cpu = at::gru(in_cpu, h0_cpu, + { weight_ih_l[0], weight_hh_l[0], bias_ih_l[0], bias_hh_l[0], weight_ih_l[1], weight_hh_l[1], bias_ih_l[1], bias_hh_l[1] }, + has_biases, num_layers, gru_dropout, train, bidirectional, batch_first); + + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, gru_dropout, train, bidirectional, batch_first); + auto out_vulkan = callOpByName( + "vulkan_prepack::gru_run", + "", + in_cpu.vulkan(), h0_cpu.vulkan(), prepack[0]); + + auto cpu_output = std::get<0>(out_cpu); + auto cpu_hidden = std::get<1>(out_cpu); + auto vulkan_output = out_vulkan[0].toTensor(); + auto vulkan_hidden = out_vulkan[1].toTensor(); + + // Assert + const auto check_output = almostEqual(cpu_output, vulkan_output.cpu()); + if (!check_output) { + showRtol(cpu_output, vulkan_output.cpu()); + } + ASSERT_TRUE(check_output); + + const auto check_hidden = almostEqual(cpu_hidden, vulkan_hidden.cpu()); + if (!check_hidden) { + showRtol(cpu_hidden, vulkan_hidden.cpu()); + } + ASSERT_TRUE(check_hidden); +} + +TEST(VulkanAPITest, gru_prepack_invalidinputs_exceptions) { + // Guard + if (!at::is_vulkan_available()) { + return; + } + + // Arrange + const int H_in = 384; // input_size + const int H_out = 384; // hidden_size + const int num_layers = 2; + const double gru_dropout = .0; + const bool has_biases = true; + const bool train = false; + const bool bidirectional = false; + const bool batch_first = true; + const auto in_cpu = at::rand({1, 1, H_in}, at::device(at::kCPU).dtype(at::kFloat)); + const auto h0_cpu = at::rand({num_layers, 1, H_out}, at::device(at::kCPU).dtype(at::kFloat)); + + c10::List weight_ih_l; // shape (3 * hidden_size, input_size) + c10::List weight_hh_l; // shape (3 * hidden_size, hidden_size) + c10::List bias_ih_l; // shape (3 * hidden_size) + c10::List bias_hh_l; // shape (3 * hidden_size) + for (int i = 0; i < num_layers; ++i) { + weight_ih_l.emplace_back(at::rand({3 * H_out, H_in}, at::device(at::kCPU).dtype(at::kFloat))); + weight_hh_l.emplace_back(at::rand({3 * H_out, H_out}, at::device(at::kCPU).dtype(at::kFloat))); + bias_ih_l.emplace_back(at::rand({3 * H_out}, at::device(at::kCPU).dtype(at::kFloat))); + bias_hh_l.emplace_back(at::rand({3 * H_out}, at::device(at::kCPU).dtype(at::kFloat))); + } + + // put this guard here to run inference inststead of training + // to avoid the following error: + // C++ exception with description "0INTERNAL ASSERT FAILED at "xplat/caffe2/aten/src/ATen/core/boxing/KernelFunction.cpp":31, please report a bug to PyTorch. aten::gru.input has kernels registered to both CompositeImplicitAutograd and a backend mapped to AutogradOther. This makes the backend kernel unreachable; the dispatcher will always prefer the CompositeImplicitAutograd lowering (see Note [Ambiguity in AutogradOther kernel]). If you want to override CompositeImplicitAutograd, please open an issue to request a dedicated Autograd dispatch key for the backend. + // If you only want to run inference instead of training, add `c10::InferenceMode mode;` before model.forward(). Note this guard is only available in C++ but not Python at present. 
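(Aside on the guard used next: `c10::InferenceMode` is an RAII object, so enabling it only for the scope that runs the model is enough to route dispatch away from the ambiguous Autograd path described in the comment above. A minimal sketch of that pattern, using a hypothetical `run_inference` helper rather than anything from this test, might look like:

    #include <ATen/ATen.h>
    #include <c10/core/InferenceMode.h>

    // While `guard` is alive, autograd bookkeeping is skipped, so ops are
    // dispatched to their backend kernels instead of the Autograd lowering
    // that triggers the error quoted above.
    at::Tensor run_inference(const at::Tensor& input, const at::Tensor& weight) {
      c10::InferenceMode guard; // enabled for this scope only
      return at::relu(at::matmul(input, weight));
    }
)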
+ c10::InferenceMode mode; + + // Act: incorrect # of weights/biases + EXPECT_THROW({ + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1) }), + has_biases, num_layers, gru_dropout, train, bidirectional, batch_first); + }, ::c10::Error); + + // Act: non-3D input tensor + EXPECT_THROW({ + const auto in_cpu_2d = at::rand({1, H_in}, at::device(at::kCPU).dtype(at::kFloat)); + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, gru_dropout, train, bidirectional, batch_first); + auto out_vulkan = callOpByName( + "vulkan_prepack::gru_run", + "", + in_cpu_2d.vulkan(), h0_cpu.vulkan(), prepack[0]); + }, ::c10::Error); + + // Act: non-3D hidden tensor + EXPECT_THROW({ + const auto h0_cpu_2d = at::rand({num_layers, H_out}, at::device(at::kCPU).dtype(at::kFloat)); + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, gru_dropout, train, bidirectional, batch_first); + auto out_vulkan = callOpByName( + "vulkan_prepack::gru_run", + "", + in_cpu.vulkan(), h0_cpu_2d.vulkan(), prepack[0]); + }, ::c10::Error); + + // Act: has_biases should be true + EXPECT_THROW({ + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + false, num_layers, gru_dropout, train, bidirectional, batch_first); + }, ::c10::Error); + + // Act: train should be false + EXPECT_THROW({ + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, gru_dropout, true, bidirectional, batch_first); + }, ::c10::Error); + + // Act: bidirectional should be false + EXPECT_THROW({ + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, gru_dropout, train, true, batch_first); + }, ::c10::Error); + + // Act: batch_first should be true + EXPECT_THROW({ + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, gru_dropout, train, bidirectional, false); + }, ::c10::Error); + + // Act: dropout should be 0.0 + EXPECT_THROW({ + auto prepack = callOpByName( + "vulkan_prepack::gru_prepack", + "", + std::vector({ weight_ih_l.get(0), weight_hh_l.get(0), bias_ih_l.get(0), bias_hh_l.get(0), + weight_ih_l.get(1), weight_hh_l.get(1), bias_ih_l.get(1), bias_hh_l.get(1) }), + has_biases, num_layers, 1.0, train, bidirectional, batch_first); + }, ::c10::Error); +} + } // 
namespace #endif /* USE_VULKAN_API */ diff --git a/aten/tools/run_tests.sh b/aten/tools/run_tests.sh index 4a724fa9400856..5b0c02c2846a46 100755 --- a/aten/tools/run_tests.sh +++ b/aten/tools/run_tests.sh @@ -64,6 +64,9 @@ fi if [[ -x ./cuda_cub_test ]]; then ./cuda_cub_test fi +if [[ -x ./cuda_atomic_ops_test ]]; then + ./cuda_atomic_ops_test +fi if [ "$VALGRIND" == "ON" ]; then valgrind --suppressions="$VALGRIND_SUP" --error-exitcode=1 ./basic --gtest_filter='-*CUDA' if [[ -x ./tensor_interop_test ]]; then diff --git a/benchmarks/cpp/nvfuser/CMakeLists.txt b/benchmarks/cpp/nvfuser/CMakeLists.txt index b566e6a359e907..3779616ee969f2 100644 --- a/benchmarks/cpp/nvfuser/CMakeLists.txt +++ b/benchmarks/cpp/nvfuser/CMakeLists.txt @@ -10,6 +10,8 @@ if(USE_CUDA) instance_norm.cpp layer_norm.cpp layer_norm_backward.cpp + rms_norm.cpp + rms_norm_backward.cpp lstm_cell.cpp reduction.cpp softmax.cpp diff --git a/benchmarks/cpp/nvfuser/instance_norm.cpp b/benchmarks/cpp/nvfuser/instance_norm.cpp index 007291d75f5f13..2c0cee0b06c75c 100644 --- a/benchmarks/cpp/nvfuser/instance_norm.cpp +++ b/benchmarks/cpp/nvfuser/instance_norm.cpp @@ -14,12 +14,18 @@ using namespace torch::jit::fuser::cuda; -static void setupInstanceNorm(Fusion* fusion, DataType dtype) { +static void setupInstanceNorm( + Fusion* fusion, + DataType dtype, + bool channels_last_3d = false) { TORCH_INTERNAL_ASSERT(dtype == DataType::Float || dtype == DataType::Half); FusionGuard fg(fusion); auto input = makeContigTensor(4, dtype); + if (channels_last_3d) { + input = makeContigTensor(5, dtype); + } auto weight = makeContigTensor(1, dtype); auto bias = makeContigTensor(1, dtype); auto running_mean = makeContigTensor(1, DataType::Float); @@ -51,7 +57,8 @@ static void setupInstanceNorm(Fusion* fusion, DataType dtype) { running_var, kTraining, momentum_ptr, - eps_ptr); + eps_ptr, + channels_last_3d); auto output = unaryOp(UnaryOpType::Relu, norm.output); @@ -67,7 +74,8 @@ static void setupInstanceNorm(Fusion* fusion, DataType dtype) { static void NvFuserScheduler_InstanceNorm( benchmark::State& benchmark_state, FusionExecutorCache* fusion_executor_cache, - DataType dtype) { + DataType dtype, + bool channels_last_3d = false) { TORCH_INTERNAL_ASSERT(dtype == DataType::Float || dtype == DataType::Half); std::vector input_shape{ @@ -76,17 +84,25 @@ static void NvFuserScheduler_InstanceNorm( benchmark_state.range(1), benchmark_state.range(1)}; + std::vector input_shape_3d{ + benchmark_state.range(0), + benchmark_state.range(1), + benchmark_state.range(1), + benchmark_state.range(1), + benchmark_state.range(2)}; + // inputs at::manual_seed(0); auto options = at::TensorOptions().dtype(data_type_to_aten(dtype)).device(at::kCUDA, 0); auto fp32_options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0); - at::Tensor at_x = at::randn(input_shape, options); - at::Tensor at_weight = at::ones({input_shape[1]}, options); - at::Tensor at_bias = at::zeros({input_shape[1]}, options); - at::Tensor at_mean = at::zeros({input_shape[1]}, fp32_options); - at::Tensor at_var = at::ones({input_shape[1]}, fp32_options); + at::Tensor at_x = + at::randn(channels_last_3d ? 
input_shape_3d : input_shape, options); + at::Tensor at_weight = at::ones({benchmark_state.range(2)}, options); + at::Tensor at_bias = at::zeros({benchmark_state.range(2)}, options); + at::Tensor at_mean = at::zeros({benchmark_state.range(2)}, fp32_options); + at::Tensor at_var = at::ones({benchmark_state.range(2)}, fp32_options); std::vector aten_inputs = { at_x, at_weight, at_bias, at_mean, at_var}; @@ -94,9 +110,11 @@ static void NvFuserScheduler_InstanceNorm( runBenchmarkIterations(benchmark_state, fusion_executor_cache, aten_inputs); - const size_t kSize = - input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]; - const size_t kChannels = input_shape[1]; + const size_t kSize = channels_last_3d + ? input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3] * + input_shape[4] + : input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]; + const size_t kChannels = benchmark_state.range(2); // Read: x, weight, bias, running_mean, running_var // Write: y, running_mean, running_var @@ -108,7 +126,8 @@ static void NvFuserScheduler_InstanceNorm( static void Baseline_InstanceNorm( benchmark::State& benchmark_state, - DataType dtype) { + DataType dtype, + bool channels_last_3d = false) { TORCH_INTERNAL_ASSERT(dtype == DataType::Float || dtype == DataType::Half); std::vector input_shape{ @@ -116,6 +135,14 @@ static void Baseline_InstanceNorm( benchmark_state.range(2), benchmark_state.range(1), benchmark_state.range(1)}; + std::vector input_shape_3d{ + benchmark_state.range(0), + benchmark_state.range(2), + benchmark_state.range(1), + benchmark_state.range(1), + benchmark_state.range(1), + }; + const float kMomentum = 0.1; const float kEps = 1e-5; const auto aten_dtype = data_type_to_aten(dtype); @@ -126,10 +153,15 @@ static void Baseline_InstanceNorm( at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0); at::Tensor at_x = at::randn(input_shape, options); - at::Tensor at_weight = at::ones({input_shape[1]}, options); - at::Tensor at_bias = at::zeros({input_shape[1]}, options); - at::Tensor at_mean = at::zeros({input_shape[1]}, fp32_options); - at::Tensor at_var = at::ones({input_shape[1]}, fp32_options); + if (channels_last_3d) { + at_x = at::randn( + input_shape_3d, + options.memory_format(c10::MemoryFormat::ChannelsLast3d)); + } + at::Tensor at_weight = at::ones({benchmark_state.range(2)}, options); + at::Tensor at_bias = at::zeros({benchmark_state.range(2)}, options); + at::Tensor at_mean = at::zeros({benchmark_state.range(2)}, fp32_options); + at::Tensor at_var = at::ones({benchmark_state.range(2)}, fp32_options); auto ato_weight = c10::optional(at_weight); auto ato_bias = c10::optional(at_bias); @@ -159,9 +191,11 @@ static void Baseline_InstanceNorm( cudaDeviceSynchronize(); } - const size_t kSize = - input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]; - const size_t kChannels = input_shape[1]; + const size_t kSize = channels_last_3d + ? 
input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3] * + input_shape[4] + : input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]; + const size_t kChannels = benchmark_state.range(2); // Read: x, weight, bias, running_mean, running_var // Write: y, running_mean, running_var @@ -181,6 +215,11 @@ static void Baseline_InstanceNorm_fp16(benchmark::State& benchmark_state) { Baseline_InstanceNorm(benchmark_state, DataType::Half); } +static void Baseline_InstanceNorm_fp32_channels_last_3d( + benchmark::State& benchmark_state) { + Baseline_InstanceNorm(benchmark_state, DataType::Float, true); +} + //------------------------------------------------------------------------------ NVFUSER_BENCHMARK_DEFINE( @@ -195,6 +234,43 @@ NVFUSER_BENCHMARK_RUN(NvFuserScheduler_InstanceNorm_fp32) ->Unit(benchmark::kMicrosecond) ->UseManualTime(); +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_InstanceNorm3d_channels_last_fp32, + setupInstanceNorm, + NvFuserScheduler_InstanceNorm, + DataType::Float, + true); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_InstanceNorm3d_channels_last_fp32) + ->RangeMultiplier(2) + ->Ranges({{1, 8}, {128, 128}, {32, 32}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_InstanceNorm3d_channels_last_fp32) + ->RangeMultiplier(2) + ->Ranges({{1, 8}, {64, 64}, {64, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_InstanceNorm3d_channels_last_fp32) + ->RangeMultiplier(2) + ->Ranges({{1, 8}, {32, 32}, {128, 128}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_InstanceNorm3d_channels_last_fp32) + ->RangeMultiplier(2) + ->Ranges({{1, 8}, {16, 16}, {256, 256}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_InstanceNorm3d_channels_last_fp32) + ->RangeMultiplier(2) + ->Ranges({{1, 8}, {4, 8}, {320, 320}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + NVFUSER_BENCHMARK_DEFINE( NvFuserScheduler_InstanceNorm_fp16, setupInstanceNorm, @@ -220,4 +296,28 @@ BENCHMARK(Baseline_InstanceNorm_fp16) ->Unit(benchmark::kMicrosecond) ->UseManualTime(); +BENCHMARK(Baseline_InstanceNorm_fp32_channels_last_3d) + ->RangeMultiplier(2) + ->Ranges({{2, 8}, {128, 128}, {32, 32}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +BENCHMARK(Baseline_InstanceNorm_fp32_channels_last_3d) + ->RangeMultiplier(2) + ->Ranges({{2, 8}, {64, 64}, {64, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +BENCHMARK(Baseline_InstanceNorm_fp32_channels_last_3d) + ->RangeMultiplier(2) + ->Ranges({{2, 8}, {16, 16}, {256, 256}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +BENCHMARK(Baseline_InstanceNorm_fp32_channels_last_3d) + ->RangeMultiplier(2) + ->Ranges({{2, 8}, {4, 8}, {320, 320}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + //------------------------------------------------------------------------------ diff --git a/benchmarks/cpp/nvfuser/layer_norm.cpp b/benchmarks/cpp/nvfuser/layer_norm.cpp index 7500ac8525b6b5..bdbc7ec6ac0a8b 100644 --- a/benchmarks/cpp/nvfuser/layer_norm.cpp +++ b/benchmarks/cpp/nvfuser/layer_norm.cpp @@ -46,8 +46,8 @@ static void setupLayerNorm(Fusion* fusion, DataType dtype) { auto output = layer_norm_results.output; - if (dtype == DataType::Half) { - output = castOp(DataType::Half, output); + if (dtype != DataType::Float) { + output = castOp(dtype, output); } fusion->addOutput(output); diff --git 
a/benchmarks/cpp/nvfuser/layer_norm_backward.cpp b/benchmarks/cpp/nvfuser/layer_norm_backward.cpp index 045465e712539f..fe95c01048f2b4 100644 --- a/benchmarks/cpp/nvfuser/layer_norm_backward.cpp +++ b/benchmarks/cpp/nvfuser/layer_norm_backward.cpp @@ -61,13 +61,12 @@ static void setupLayerNorm_BWD(Fusion* fusion, DataType dtype) { auto layer_norm_results = layer_norm_backward( grad_out, input, {1}, mean, rstd, weight, bias, {true, true, true}); - if (dtype == DataType::Half) { + if (dtype != DataType::Float) { layer_norm_results.grad_input = - castOp(DataType::Half, layer_norm_results.grad_input); - layer_norm_results.grad_bias = - castOp(DataType::Half, layer_norm_results.grad_bias); + castOp(dtype, layer_norm_results.grad_input); + layer_norm_results.grad_bias = castOp(dtype, layer_norm_results.grad_bias); layer_norm_results.grad_weight = - castOp(DataType::Half, layer_norm_results.grad_weight); + castOp(dtype, layer_norm_results.grad_weight); } fusion->addOutput(layer_norm_results.grad_input); diff --git a/benchmarks/cpp/nvfuser/rms_norm.cpp b/benchmarks/cpp/nvfuser/rms_norm.cpp new file mode 100644 index 00000000000000..9c46896366ccf0 --- /dev/null +++ b/benchmarks/cpp/nvfuser/rms_norm.cpp @@ -0,0 +1,171 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include "utils.h" + +using namespace torch::jit::fuser::cuda; + +//------------------------------------------------------------------------------ + +static void setupRMSNorm(Fusion* fusion, DataType dtype) { + TORCH_INTERNAL_ASSERT( + dtype == DataType::Float || dtype == DataType::Half || + dtype == DataType::BFloat16); + + FusionGuard fg(fusion); + + const int kReductionAxis = 2; + const float kEps = 1e-6; + + Double* eps_ptr = IrBuilder::create(kEps); + + // setup fusion + auto input = makeContigTensor(3, dtype); + auto weight = makeContigTensor(1, dtype); + + fusion->addInput(input); + fusion->addInput(weight); + + if (dtype == DataType::Half) { + input = castOp(DataType::Float, input); + weight = castOp(DataType::Float, weight); + } + + auto rms_norm_results = rms_norm(input, 1, weight, eps_ptr); + + auto output = rms_norm_results.output; + + if (dtype != DataType::Float) { + output = castOp(dtype, output); + } + + fusion->addOutput(output); +} + +static void NvFuserScheduler_RMSNorm( + benchmark::State& benchmark_state, + FusionExecutorCache* fusion_executor_cache, + DataType dtype) { + TORCH_INTERNAL_ASSERT( + dtype == DataType::Float || dtype == DataType::Half || + dtype == DataType::BFloat16); + + std::vector input_shape{8, benchmark_state.range(0), 1024}; + const float kEps = 1e-6; + + // inputs + at::manual_seed(0); + auto options = + at::TensorOptions().dtype(data_type_to_aten(dtype)).device(at::kCUDA, 0); + at::Tensor input = at::randn(input_shape, options); + at::Tensor weight = at::randn({input_shape[2]}, options); + + std::vector aten_inputs({input, weight}); + + runBenchmarkIterations(benchmark_state, fusion_executor_cache, aten_inputs); + + benchmark_state.SetBytesProcessed( + int64_t(benchmark_state.iterations()) * + (2 * input.numel() + weight.numel()) * int64_t(dataTypeSize(dtype))); +} + +//------------------------------------------------------------------------------ + +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_RMSNorm_fp32, + setupRMSNorm, + NvFuserScheduler_RMSNorm, + DataType::Float); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp32) + ->RangeMultiplier(2) + ->Ranges({{16, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + 
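For context on what the benchmarks above measure: RMS norm scales every element of a reduction slice by the reciprocal root-mean-square of that slice, y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]. A standalone reference sketch over a single row (an illustration of the math only, not the nvfuser implementation) could be:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Reference RMS norm for one row: rms(x) = sqrt(mean(x^2) + eps),
    // y[i] = x[i] / rms(x) * w[i]. The fusions above compute this per
    // slice of the reduction axis on the GPU.
    std::vector<float> rms_norm_row(
        const std::vector<float>& x,
        const std::vector<float>& w,
        float eps = 1e-6f) {
      double sum_sq = 0.0;
      for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
      }
      const float rrms =
          1.0f / std::sqrt(static_cast<float>(sum_sq / x.size()) + eps);
      std::vector<float> y(x.size());
      for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * rrms * w[i];
      }
      return y;
    }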
+NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp32) + ->RangeMultiplier(2) + ->Ranges({{18, 56}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp32) + ->RangeMultiplier(2) + ->Ranges({{22, 44}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp32) + ->RangeMultiplier(2) + ->Ranges({{24, 48}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_RMSNorm_fp16, + setupRMSNorm, + NvFuserScheduler_RMSNorm, + DataType::Half); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp16) + ->RangeMultiplier(2) + ->Ranges({{16, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp16) + ->RangeMultiplier(2) + ->Ranges({{18, 56}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp16) + ->RangeMultiplier(2) + ->Ranges({{22, 44}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_fp16) + ->RangeMultiplier(2) + ->Ranges({{24, 48}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_RMSNorm_bf16, + setupRMSNorm, + NvFuserScheduler_RMSNorm, + DataType::BFloat16); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_bf16) + ->RangeMultiplier(2) + ->Ranges({{16, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_bf16) + ->RangeMultiplier(2) + ->Ranges({{18, 56}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_bf16) + ->RangeMultiplier(2) + ->Ranges({{22, 44}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_bf16) + ->RangeMultiplier(2) + ->Ranges({{24, 48}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); diff --git a/benchmarks/cpp/nvfuser/rms_norm_backward.cpp b/benchmarks/cpp/nvfuser/rms_norm_backward.cpp new file mode 100644 index 00000000000000..3bd66b412b97ea --- /dev/null +++ b/benchmarks/cpp/nvfuser/rms_norm_backward.cpp @@ -0,0 +1,165 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include "utils.h" + +using namespace torch::jit::fuser::cuda; + +//------------------------------------------------------------------------------ + +static void setupRMSNorm_BWD(Fusion* fusion, DataType dtype) { + FusionGuard fg(fusion); + + TORCH_INTERNAL_ASSERT( + dtype == DataType::Float || dtype == DataType::Half || + dtype == DataType::BFloat16); + + const int kReductionAxis = 2; + Double* eps_ptr = IrBuilder::create(1e-6); + + // setup fusion + auto grad_out = makeContigTensor(3, dtype); + auto input = makeContigTensor(3, dtype); + auto weight = makeContigTensor(1, dtype); + auto rstd = TensorViewBuilder() + .contiguity({false, false, false}) + .shape({-1, -1, 1}) + .dtype(dtype) + .build(); + + fusion->addInput(grad_out); + fusion->addInput(input); + fusion->addInput(weight); + fusion->addInput(rstd); + + if (dtype == DataType::Half) { + grad_out = castOp(DataType::Float, grad_out); + input = castOp(DataType::Float, input); + weight = castOp(DataType::Float, weight); + rstd = castOp(DataType::Float, rstd); + } + + auto rms_norm_results = + rms_norm_backward(grad_out, input, {1}, rstd, weight, {true, true, true}); + + if (dtype != DataType::Float) { + rms_norm_results.grad_input = castOp(dtype, 
rms_norm_results.grad_input); + rms_norm_results.grad_weight = castOp(dtype, rms_norm_results.grad_weight); + } + + fusion->addOutput(rms_norm_results.grad_input); + fusion->addOutput(rms_norm_results.grad_weight); +} + +static void NvFuserScheduler_RMSNorm_BWD( + benchmark::State& benchmark_state, + FusionExecutorCache* fusion_executor_cache, + DataType dtype) { + TORCH_INTERNAL_ASSERT( + dtype == DataType::Float || dtype == DataType::Half || + dtype == DataType::BFloat16); + + std::vector input_shape{8, benchmark_state.range(0), 1024}; + + // inputs + at::manual_seed(0); + auto options = + at::TensorOptions().dtype(data_type_to_aten(dtype)).device(at::kCUDA, 0); + at::Tensor grad_out = at::randn(input_shape, options); + at::Tensor input = at::randn(input_shape, options); + at::Tensor weight = at::randn({input_shape[2]}, options); + at::Tensor rstd = at::randn({input_shape[0], input_shape[1], 1}, options); + + std::vector aten_inputs({grad_out, input, weight, rstd}); + + runBenchmarkIterations(benchmark_state, fusion_executor_cache, aten_inputs); + + benchmark_state.SetBytesProcessed( + int64_t(benchmark_state.iterations()) * + (3 * input.numel() + weight.numel() + rstd.numel()) * + int64_t(dataTypeSize(dtype))); +} + +//------------------------------------------------------------------------------ + +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_RMSNorm_BWD_fp32, + setupRMSNorm_BWD, + NvFuserScheduler_RMSNorm_BWD, + DataType::Float); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_fp32) + ->RangeMultiplier(2) + ->Ranges({{16, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_fp32) + ->RangeMultiplier(2) + ->Ranges({{28, 56}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_fp32) + ->RangeMultiplier(2) + ->Ranges({{24, 48}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_RMSNorm_BWD_fp16, + setupRMSNorm_BWD, + NvFuserScheduler_RMSNorm_BWD, + DataType::Half); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_fp16) + ->RangeMultiplier(2) + ->Ranges({{16, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_fp16) + ->RangeMultiplier(2) + ->Ranges({{28, 56}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_fp16) + ->RangeMultiplier(2) + ->Ranges({{24, 48}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_DEFINE( + NvFuserScheduler_RMSNorm_BWD_bf16, + setupRMSNorm_BWD, + NvFuserScheduler_RMSNorm_BWD, + DataType::BFloat16); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_bf16) + ->RangeMultiplier(2) + ->Ranges({{16, 64}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_bf16) + ->RangeMultiplier(2) + ->Ranges({{28, 56}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); + +NVFUSER_BENCHMARK_RUN(NvFuserScheduler_RMSNorm_BWD_bf16) + ->RangeMultiplier(2) + ->Ranges({{24, 48}}) + ->Unit(benchmark::kMicrosecond) + ->UseManualTime(); diff --git a/benchmarks/fastrnns/fuser.py b/benchmarks/fastrnns/fuser.py index e1daab594c5083..29d395055296b5 100644 --- a/benchmarks/fastrnns/fuser.py +++ b/benchmarks/fastrnns/fuser.py @@ -4,18 +4,18 @@ def set_fuser(fuser_name, executor_name): assert fuser_name in ['te', 'old', 'none', 'default'] if fuser_name == 'te': torch._C._jit_set_profiling_executor(True) - 
torch._C._jit_set_profiling_mode(True) + torch._C._get_graph_executor_optimize(True) torch._C._jit_override_can_fuse_on_cpu(False) torch._C._jit_override_can_fuse_on_gpu(True) torch._C._jit_set_texpr_fuser_enabled(True) elif fuser_name == 'old': torch._C._jit_set_profiling_executor(False) - torch._C._jit_set_profiling_mode(False) + torch._C._get_graph_executor_optimize(False) torch._C._jit_override_can_fuse_on_gpu(True) torch._C._jit_set_texpr_fuser_enabled(False) elif fuser_name == 'none': torch._C._jit_set_profiling_executor(False) - torch._C._jit_set_profiling_mode(False) + torch._C._get_graph_executor_optimize(False) torch._C._jit_override_can_fuse_on_gpu(False) torch._C._jit_override_can_fuse_on_cpu(False) torch._C._jit_set_texpr_fuser_enabled(False) @@ -25,12 +25,11 @@ def set_fuser(fuser_name, executor_name): # --executor overrides settings of --fuser if executor_name == 'profiling': torch._C._jit_set_profiling_executor(True) - torch._C._jit_set_profiling_mode(True) + torch._C._get_graph_executor_optimize(True) elif executor_name == 'simple': - torch._C._jit_set_profiling_executor(True) - torch._C._jit_set_profiling_mode(False) + torch._C._get_graph_executor_optimize(False) elif executor_name == 'legacy': torch._C._jit_set_profiling_executor(False) - torch._C._jit_set_profiling_mode(False) + torch._C._get_graph_executor_optimize(True) elif executor_name == 'default': pass diff --git a/benchmarks/operator_benchmark/benchmark_core.py b/benchmarks/operator_benchmark/benchmark_core.py index 4248e4776f22bd..16a66d5cf92be5 100644 --- a/benchmarks/operator_benchmark/benchmark_core.py +++ b/benchmarks/operator_benchmark/benchmark_core.py @@ -200,8 +200,8 @@ def _print_header(self): print("# {}".format(self.args.operators)) def _print_perf_result(self, reported_run_time_us, test_case): - if self.args.ai_pep_format: - # Output for AI-PEP + if self.args.report_aibench: + # Output for AIBench # Print out per iteration execution time instead of avg time return test_name = '_'.join([test_case.framework, test_case.test_config.test_name]) @@ -288,7 +288,7 @@ def _measure_time(self, launch_test, test_case, iters, print_per_iter): report_run_time = 1e6 * run_time_sec / iters time_trace.append(report_run_time) # Print out the time spent in each epoch in ms - if self.args.ai_pep_format: + if self.args.report_aibench: mode = "JIT" if self.use_jit else "Eager" test_name = '_'.join([test_case.framework, test_case.test_config.test_name, mode]) print("PyTorchObserver " + json.dumps( diff --git a/benchmarks/operator_benchmark/benchmark_runner.py b/benchmarks/operator_benchmark/benchmark_runner.py index b9347364428eac..3e998e6ceb4ea2 100644 --- a/benchmarks/operator_benchmark/benchmark_runner.py +++ b/benchmarks/operator_benchmark/benchmark_runner.py @@ -89,12 +89,12 @@ def parse_args(): ) parser.add_argument( - "--ai_pep_format", + "--report_aibench", type=benchmark_utils.str2bool, nargs='?', const=True, default=False, - help="Print result when running on AI-PEP" + help="Print result when running on AIBench" ) parser.add_argument( diff --git a/benchmarks/operator_benchmark/pt/qinterpolate_test.py b/benchmarks/operator_benchmark/pt/qinterpolate_test.py index ec58e6e6a7dd5f..764274f925810e 100644 --- a/benchmarks/operator_benchmark/pt/qinterpolate_test.py +++ b/benchmarks/operator_benchmark/pt/qinterpolate_test.py @@ -44,7 +44,7 @@ def init(self, M, N, K, dtype, mode, scale, contig): zero_point=zero_point, dtype=dtype) if not contig: - permute_dims = list(range(q_input.ndim))[::-1] + permute_dims = 
list(range(self.q_input.ndim))[::-1] self.q_input = self.q_input.permute(permute_dims) self.inputs = { diff --git a/benchmarks/static_runtime/test_static_module.cc b/benchmarks/static_runtime/test_static_module.cc index be634a48def71d..85c5e7832735b6 100644 --- a/benchmarks/static_runtime/test_static_module.cc +++ b/benchmarks/static_runtime/test_static_module.cc @@ -1529,3 +1529,82 @@ TEST(ForceNonEmptyOutputs, TwoSubBlocks) { } } } + +TEST(EliminateExtraPermuteOps, FusesCorrectly) { + const auto src = R"JIT( + def forward(self, x): + y = torch.permute(x, (0, 2, 1)) + z = torch.sum(y, dim=-1) + return z + )JIT"; + torch::jit::Module mod("m"); + mod.define(src); + + auto graph = mod.get_method("forward").graph(); + // turn the ListConstruct(%constant) into proper constant lists + ConstantPropagation(graph); + EliminateExtraPermuteOps(graph); + + EXPECT_FALSE(hasNodeWithKind(graph, "aten::permute")); + auto* sum = getNodeWithKind(graph, "aten::sum"); + ASSERT_NE(sum, nullptr); + auto dim = toIValue(sum->input(1)); + ASSERT_TRUE(dim.has_value() && dim->isIntList()); + EXPECT_EQ(dim->toIntList(), c10::List{1}); +} + +TEST(EliminateExtraPermuteOps, DoesNotFuseWrongDim) { + const auto src = R"JIT( + def forward(self, x): + y = torch.permute(x, (0, 2, 1)) + z = torch.sum(y, dim=1) + return z + )JIT"; + torch::jit::Module mod("m"); + mod.define(src); + + auto graph = mod.get_method("forward").graph(); + // turn the ListConstruct(%constant) into proper constant lists + ConstantPropagation(graph); + EliminateExtraPermuteOps(graph); + + EXPECT_TRUE(hasNodeWithKind(graph, "aten::permute")); +} + +TEST(EliminateExtraPermuteOps, DoesNotFuseNonConstantDim) { + const auto src = R"JIT( + def forward(self, x, dim: int): + y = torch.permute(x, (0, 2, 1)) + z = torch.sum(y, dim=dim) + return z + )JIT"; + torch::jit::Module mod("m"); + mod.define(src); + + auto graph = mod.get_method("forward").graph(); + // turn the ListConstruct(%constant) into proper constant lists + ConstantPropagation(graph); + EliminateExtraPermuteOps(graph); + + EXPECT_TRUE(hasNodeWithKind(graph, "aten::permute")); +} + +TEST(UseSplitAndSqueeze, Fusion) { + const auto src = R"IR( + graph(%x: Tensor): + %dim: int = prim::Constant[value=1]() + %split_size: int = prim::Constant[value=1]() + %split: Tensor[] = aten::split(%x, %split_size, %dim) + %a: Tensor, %b: Tensor = prim::ListUnpack(%split) + %c: Tensor = aten::squeeze(%a, %dim) + %d: Tensor = aten::squeeze(%b, %dim) + return (%c, %d) + )IR"; + auto graph = getGraphFromIR(src); + UseSplitAndSqueeze(graph); + EXPECT_TRUE( + hasNodeWithKind(graph, "static_runtime::fused_split_and_squeeze")); + EXPECT_FALSE(hasNodeWithKind(graph, "aten::split")); + EXPECT_FALSE(hasNodeWithKind(graph, "aten::squeeze")); + EXPECT_FALSE(hasNodeWithKind(graph, "prim::ListUnpack")); +} diff --git a/benchmarks/static_runtime/test_static_runtime.cc b/benchmarks/static_runtime/test_static_runtime.cc index b64e3d8d0d6f5f..7ef02659cc8bfc 100644 --- a/benchmarks/static_runtime/test_static_runtime.cc +++ b/benchmarks/static_runtime/test_static_runtime.cc @@ -172,6 +172,108 @@ TEST(StaticRuntime, Clamp) { testStaticRuntime(clamp_script_2, {a, min_t, max_t}, {b, max_t1, min_t1}); } +TEST(StaticRuntime, LenWithTuple) { + const auto src = R"IR( + graph(%input : int[]): + %res : int = aten::len(%input) + return (%res) + )IR"; + + testStaticRuntime(src, {c10::List(4)}); +} + +TEST(StaticRuntime, LenWithTensor) { + const auto src = R"IR( + graph(%input : Tensor): + %res : int = aten::len(%input) + return (%res) + )IR"; + + 
testStaticRuntime(src, {at::randn({2, 2, 2})}); +} + +TEST(StaticRuntime, LenWithStr) { + const auto src = R"IR( + graph(%input : str): + %res : int = aten::len(%input) + return (%res) + )IR"; + + testStaticRuntime(src, {"static_runtime"}); +} + +TEST(StaticRuntime, LenWithDict_str) { + const auto script = R"JIT( + def forward(self, input: Dict[str, str]): + return len(input) + )JIT"; + + c10::Dict dict; + dict.insert("abc", "123"); + dict.insert("def", "456"); + testStaticRuntime(script, {dict}); +} + +TEST(StaticRuntime, LenWithDict_int) { + const auto script = R"JIT( + def forward(self, input: Dict[int, int]): + return len(input) + )JIT"; + + c10::Dict dict; + dict.insert(0, 1); + dict.insert(2, 3); + testStaticRuntime(script, {dict}); +} + +TEST(StaticRuntime, LenWithDict_bool) { + const auto script = R"JIT( + def forward(self, input: Dict[bool, bool]): + return len(input) + )JIT"; + + c10::Dict dict; + dict.insert(true, false); + dict.insert(false, true); + testStaticRuntime(script, {dict}); +} + +TEST(StaticRuntime, LenWithDict_float) { + const auto script = R"JIT( + def forward(self, input: Dict[float, float]): + return len(input) + )JIT"; + + c10::Dict dict; + dict.insert(0.1, 0.9); + dict.insert(0.8, 0.18); + testStaticRuntime(script, {dict}); +} + +TEST(StaticRuntime, LenWithDict_complex) { + const auto script = R"JIT( + def forward(self, input: Dict[complex, complex]): + return len(input) + )JIT"; + + c10::Dict, c10::complex> dict; + dict.insert(0.1, 0.4); + dict.insert(0.9, 0.45); + testStaticRuntime(script, {dict}); +} + +TEST(StaticRuntime, LenWithDict_Tensor) { + const auto script = R"JIT( + def forward(self, input: Dict[Tensor, Tensor]): + return len(input) + )JIT"; + + c10::Dict dict; + dict.insert(at::randn({1, 2}), at::randn({1, 2})); + dict.insert(at::randn({1, 2}), at::randn({1, 2})); + testStaticRuntime(script, {dict}); +} + TEST(StaticRuntime, Logit) { // no nnc const auto logit_script_1 = R"JIT( @@ -304,13 +406,6 @@ TEST(StaticRuntime, LayerNorm) { return torch.layer_norm(input, normalized_shape, None, None, 1e-05, False).clone() )JIT"; -#ifdef FBCODE_CAFFE2 - script::Module module("module"); - module.define(layer_norm_with_weights); - torch::jit::StaticModule smodule(module); - ASSERT_EQ(getNodeWithKind(smodule, "aten::layer_norm"), nullptr); - ASSERT_NE(getNodeWithKind(smodule, "static_runtime::layer_norm"), nullptr); -#endif const auto a = torch::rand({1, 2, 2, 2}); const auto b = torch::rand({3, 2, 2, 2}); for (int normalized_size : {2, 3}) { @@ -1170,13 +1265,23 @@ TEST(StaticRuntime, Full) { return (a.clone()) )JIT"; - auto dtype = at::ScalarType::Int; auto cpu = at::Device(DeviceType::CPU); c10::List size0{2, 5}; - std::vector args{size0, 4, dtype, at::kStrided, cpu, false}; + std::vector args{ + size0, 4, at::ScalarType::Int, at::kStrided, cpu, false}; + std::vector args1{ + size0, 4, at::ScalarType::Float, at::kStrided, cpu, false}; c10::List size1{5, 6}; - std::vector args2{size1, 5, dtype, at::kStrided, cpu, false}; + std::vector args2{ + size1, 5, at::ScalarType::Float, at::kStrided, cpu, false}; testStaticRuntime(full_script, args); + testStaticRuntime( + full_script, + args, + args1, + /*use_allclose=*/false, + /*use_equalnan=*/false, + /*check_resize=*/false); testStaticRuntime(full_script, args, args2); } @@ -1202,16 +1307,157 @@ TEST(StaticRuntime, FullLike) { auto a = at::randn({2, 3}); auto b = at::randn({3, 4, 2}); - auto dtype = at::ScalarType::Int; auto cpu = at::Device(DeviceType::CPU); std::vector args{ - a, 4, dtype, at::kStrided, cpu, 
false, c10::MemoryFormat::Contiguous}; + a, + 4, + at::ScalarType::Int, + at::kStrided, + cpu, + false, + c10::MemoryFormat::Contiguous}; + std::vector args1{ + a, + 4, + at::ScalarType::Float, + at::kStrided, + cpu, + false, + c10::MemoryFormat::Contiguous}; std::vector args2{ - b, 4, dtype, at::kStrided, cpu, false, c10::MemoryFormat::Contiguous}; + b, + 4, + at::ScalarType::Float, + at::kStrided, + cpu, + false, + c10::MemoryFormat::Contiguous}; testStaticRuntime(full_like_script, args); + testStaticRuntime( + full_like_script, + args, + args1, + /*use_allclose=*/false, + /*use_equalnan=*/false, + /*check_resize=*/false); testStaticRuntime(full_like_script, args, args2); } +TEST(StaticRuntime, Ones) { + const auto script = R"JIT( + def forward(self, + size: List[int], + dtype: Optional[int], + layout: Optional[int], + device: Optional[Device], + pin_memory: Optional[bool]): + a = torch.ones(size, + dtype=dtype, + layout=layout, + device=device, + pin_memory=pin_memory) + return (a.clone()) + )JIT"; + + auto dtype = at::ScalarType::Int; + auto cpu = at::Device(DeviceType::CPU); + c10::List size0{2, 5}; + std::vector args{size0, dtype, at::kStrided, cpu, false}; + c10::List size1{5, 6}; + std::vector args2{size1, dtype, at::kStrided, cpu, false}; + testStaticRuntime(script, args); + testStaticRuntime(script, args, args2); +} + +TEST(StaticRuntime, OnesLike) { + const auto script = R"JIT( + def forward(self, + input: Tensor, + dtype: Optional[int], + layout: Optional[int], + device: Optional[Device], + pin_memory: Optional[bool], + memory_format: Optional[int]): + a = torch.ones_like(input, + dtype=dtype, + layout=layout, + device=device, + pin_memory=pin_memory, + memory_format=memory_format) + return (a.clone()) + )JIT"; + + auto cpu = at::Device(DeviceType::CPU); + auto input0 = at::randn({2, 5}); + std::vector args{ + input0, + at::ScalarType::Int, + at::kStrided, + cpu, + false, + c10::MemoryFormat::Contiguous}; + std::vector args1{ + input0, + at::ScalarType::Float, + at::kStrided, + cpu, + false, + c10::MemoryFormat::Contiguous}; + auto input1 = at::randn({5, 6}); + std::vector args2{ + input1, + at::ScalarType::Float, + at::kStrided, + cpu, + false, + c10::MemoryFormat::Contiguous}; + testStaticRuntime(script, args); + testStaticRuntime( + script, + args, + args1, + /*use_allclose=*/false, + /*use_equalnan=*/false, + /*check_resize=*/false); + testStaticRuntime(script, args, args2); +} + +TEST(StaticRuntime, Zeros) { + const auto script = R"JIT( + def forward(self, + size: List[int], + dtype: Optional[int], + layout: Optional[int], + device: Optional[Device], + pin_memory: Optional[bool]): + a = torch.zeros(size, + dtype=dtype, + layout=layout, + device=device, + pin_memory=pin_memory) + return (a.clone()) + )JIT"; + + auto cpu = at::Device(DeviceType::CPU); + c10::List size0{2, 5}; + std::vector args{ + size0, at::ScalarType::Int, at::kStrided, cpu, false}; + std::vector args1{ + size0, at::ScalarType::Float, at::kStrided, cpu, false}; + c10::List size1{5, 6}; + std::vector args2{ + size1, at::ScalarType::Float, at::kStrided, cpu, false}; + testStaticRuntime(script, args); + testStaticRuntime( + script, + args, + args1, + /*use_allclose=*/false, + /*use_equalnan=*/false, + /*check_resize=*/false); + testStaticRuntime(script, args, args2); +} + TEST(StaticRuntime, Linear) { const auto linear_script = R"JIT( def forward(self, inp: Tensor, weights: Tensor, bias: Optional[Tensor]) -> Tensor: @@ -1442,6 +1688,28 @@ TEST(StaticRuntime, Index) { 
testStaticRuntime(index_with_two_tensors_script, args_c, args_d); } +TEST(StaticRuntime, IndexSelect) { + const std::string script = R"IR( + graph(%self: Tensor, %dim: int, %index: Tensor): + %bias: None = prim::Constant() + %ret = aten::index_select(%self, %dim, %index) + %cloned = aten::clone(%ret, %bias) + return (%cloned) + )IR"; + + auto self0 = at::rand({6}); + auto dim0 = 0; + auto index0 = at::randint(0, 5, {6}, torch::kInt32); + std::vector args{self0, dim0, index0}; + testStaticRuntime(script, args); + + auto self1 = at::rand({128}); + auto dim1 = 0; + auto index1 = at::randint(0, 127, {127}, torch::kInt32); + std::vector args2{self1, dim1, index1}; + testStaticRuntime(script, args, args2); +} + TEST(StaticRuntime, ClampMin) { const auto clamp_min_int_script = R"JIT( def forward(self, a: Tensor, b: int): @@ -1784,6 +2052,27 @@ TEST(StaticRuntime, QuantizedLinearDynamicFp16) { {input_2, weight_2}); } +TEST(StaticRuntime, QuantizedLinearReluDynamicFp16) { + const std::string quantized_linear_relu_dynamic_fp16_script = R"IR( + graph(%input: Tensor, %weights: Tensor): + %bias: None = prim::Constant() + %packed_params = quantized::linear_prepack_fp16(%weights, %bias) + %output = quantized::linear_relu_dynamic_fp16(%input, %packed_params) + %ret = aten::clone(%output, %bias) + return (%output) + )IR"; + at::Tensor weight = torch::randn({3, 2}, torch::kFloat); + at::Tensor input = torch::randn({3, 2}, torch::kFloat); + + at::Tensor weight_2 = torch::randn({4, 3}, torch::kFloat); + at::Tensor input_2 = torch::randn({5, 3}, torch::kFloat); + + testStaticRuntime( + quantized_linear_relu_dynamic_fp16_script, + {input, weight}, + {input_2, weight_2}); +} + TEST(StaticRuntime, VarStack) { const auto var_stack_script = R"JIT( def forward(self, inp1: Tensor, inp2: Tensor, dim: int): @@ -2745,3 +3034,148 @@ TEST(StaticRuntime, IfThenElse) { testStaticRuntime(src, args1); testStaticRuntime(src, args2); } + +TEST(StaticRuntime, EmptyIfBlock) { + const auto src = + R"JIT( + def forward(self, cond: bool, a: Tensor, b: Tensor): + l = [] + if cond: + l.append((a + b).clone()) + return l + )JIT"; + + testStaticRuntime(src, {true, at::rand(1), at::rand({1, 2})}); + testStaticRuntime(src, {false, at::rand(1), at::rand({1, 2})}); +} + +TEST(StaticRuntime, EmptyNestedIfBlock) { + const auto src = + R"JIT( + def forward(self, cond: bool, a: Tensor, b: Tensor): + l = [] + if cond: + if cond: + l.append((a + b).clone()) + return l + )JIT"; + + testStaticRuntime(src, {true, at::rand(1), at::rand({1, 2})}); + testStaticRuntime(src, {false, at::rand(1), at::rand({1, 2})}); +} + +TEST(StaticRuntime, StackEmpty) { + const auto src = R"JIT( + def forward(self): + x = torch.stack([]) + return x + )JIT"; + + torch::jit::Module mod("mod"); + mod.define(src); + + torch::jit::StaticModule smod(mod); + EXPECT_THROW(smod({}), c10::Error); +} + +TEST(StaticRuntime, ConcatEmpty) { + const auto src = R"JIT( + def forward(self): + x = torch.concat([]) + return x + )JIT"; + + torch::jit::Module mod("mod"); + mod.define(src); + + torch::jit::StaticModule smod(mod); + EXPECT_THROW(smod({}), c10::Error); +} + +TEST(StaticRuntime, IntImplicit) { + const auto src = R"IR( + graph(%a: Tensor): + %y: int = aten::IntImplicit(%a) + return (%y) + )IR"; + testStaticRuntime(src, {at::tensor({1}, at::kInt).squeeze()}); +} + +TEST(StaticRuntime, IntImplicit_ThrowOnBadInputs) { + const auto src = R"IR( + graph(%a: Tensor): + %y: int = aten::IntImplicit(%a) + return (%y) + )IR"; + auto graph = getGraphFromIR(src); + torch::jit::StaticModule 
smod(graph); + // Not 0D tensor + EXPECT_THROW(smod({at::tensor({1, 2}, at::kInt)}), std::runtime_error); + // Wrong dtype + EXPECT_THROW( + smod({at::tensor({1}, at::kFloat).squeeze()}), std::runtime_error); +} + +TEST(StaticRuntime, Select) { + const auto src = R"IR( + graph(%a: Tensor, %dim: int, %index: int): + %none: NoneType = prim::Constant() + %b: Tensor = aten::select(%a, %dim, %index) + %c: Tensor = aten::clone(%b, %none) + return (%c) + )IR"; + testStaticRuntime(src, {at::randn({2, 2}), 0, 1}); +} + +TEST(StaticRuntime, ReshapeAs) { + const auto src = R"JIT( + def forward(self, a, b): + return a.reshape_as(b).clone() + )JIT"; + testStaticRuntime(src, {at::randn({2, 2}), at::randn({4})}); +} + +TEST(StaticRuntime, MoveCtor) { + auto mod = getDeepAndWideSciptModel(); + std::vector args{ + at::randn({1, 1, 32}), at::randn({1, 1, 32}), at::randn({1, 50})}; + + torch::jit::StaticModule smod(mod); + + torch::jit::StaticRuntime runtime(smod); + auto expected = runtime(args); + + torch::jit::StaticRuntime new_runtime(std::move(runtime)); + auto actual = new_runtime(args); + compareResults(expected, actual); +} + +TEST(StaticRuntime, SingleBlockIfReturnList) { + const auto src = R"JIT( + def forward(self, a, b, cond: bool): + lst = [] + if cond: + lst.append(a + b) + return lst + )JIT"; + std::vector args1{at::randn({1}), at::randn({1}), true}; + std::vector args2{at::randn({42, 42}), at::randn({42, 42}), false}; + testStaticRuntime(src, args1, args2); +} + +TEST(StaticRuntime, NestedBlockIfReturnList) { + const auto src = R"JIT( + def forward(self, a, b, cond1: bool, cond2: bool): + if cond1: + lst = [] + if cond2: + lst.append(a + b) + lst.append(a * b) + return lst + return [] + )JIT"; + std::vector args1{at::randn({1}), at::randn({1}), true, true}; + std::vector args2{ + at::randn({42, 42}), at::randn({42, 42}), true, false}; + testStaticRuntime(src, args1, args2); +} diff --git a/benchmarks/static_runtime/test_utils.cc b/benchmarks/static_runtime/test_utils.cc index 6b0794d4ab9292..7e0733fbc8af43 100644 --- a/benchmarks/static_runtime/test_utils.cc +++ b/benchmarks/static_runtime/test_utils.cc @@ -146,11 +146,13 @@ void compareTensorLists( } } +} // namespace + void compareResults( const IValue& expect, const IValue& actual, - const bool use_allclose = false, - const bool use_equalnan = false) { + const bool use_allclose, + const bool use_equalnan) { if (expect.isTensor()) { VLOG(2) << "expect " << expect.toTensor() << std::endl; VLOG(2) << "output " << actual.toTensor() << std::endl; @@ -198,8 +200,6 @@ void compareResults( } } -} // namespace - at::Tensor getTensor(const at::IValue& ival) { if (ival.isTensor()) { return ival.toTensor(); diff --git a/benchmarks/static_runtime/test_utils.h b/benchmarks/static_runtime/test_utils.h index cb0a5a4a8c2ed9..27efd4d7d42efc 100644 --- a/benchmarks/static_runtime/test_utils.h +++ b/benchmarks/static_runtime/test_utils.h @@ -53,6 +53,12 @@ void compareResultsWithJIT( const bool use_allclose = false, const bool use_equalnan = false); +void compareResults( + const IValue& expect, + const IValue& actual, + const bool use_allclose = false, + const bool use_equalnan = false); + } // namespace test } // namespace jit } // namespace torch diff --git a/benchmarks/tensorexpr/__main__.py b/benchmarks/tensorexpr/__main__.py index f243ff5b61051e..63a1462d33d14f 100644 --- a/benchmarks/tensorexpr/__main__.py +++ b/benchmarks/tensorexpr/__main__.py @@ -137,7 +137,7 @@ def main(): torch._C._jit_set_profiling_executor(True) 
torch._C._jit_set_texpr_fuser_enabled(True) torch._C._jit_override_can_fuse_on_gpu(True) - torch._C._jit_set_profiling_mode(True) + torch._C._get_graph_executor_optimize(True) elif args.cuda_fuser == "old": import torch torch._C._jit_set_profiling_executor(False) @@ -148,7 +148,7 @@ def main(): torch._C._jit_set_profiling_executor(True) torch._C._jit_set_texpr_fuser_enabled(False) torch._C._jit_set_nvfuser_enabled(True) - torch._C._jit_set_profiling_mode(True) + torch._C._get_graph_executor_optimize(True) else : raise ValueError("Undefined fuser: {}".format(args.cuda_fuser)) diff --git a/binaries/CMakeLists.txt b/binaries/CMakeLists.txt index a98754eea2c390..b683ee002280c9 100644 --- a/binaries/CMakeLists.txt +++ b/binaries/CMakeLists.txt @@ -4,6 +4,7 @@ if(INTERN_BUILD_MOBILE) caffe2_binary_target("speed_benchmark.cc") else() caffe2_binary_target("speed_benchmark_torch.cc") + caffe2_binary_target("load_benchmark_torch.cc") if(NOT BUILD_LITE_INTERPRETER) caffe2_binary_target("compare_models_torch.cc") endif() diff --git a/binaries/load_benchmark_torch.cc b/binaries/load_benchmark_torch.cc new file mode 100644 index 00000000000000..330955657ece6e --- /dev/null +++ b/binaries/load_benchmark_torch.cc @@ -0,0 +1,93 @@ +/** + * Copyright (c) 2016-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include +#include "caffe2/core/timer.h" +#include "caffe2/utils/string_utils.h" +#include +#include +#include +#include +#include + +#include + +#include +using namespace std::chrono; + +C10_DEFINE_string(model, "", "The given torch script model to benchmark."); +C10_DEFINE_int(iter, 10, "The number of iterations to run."); +C10_DEFINE_bool( + report_pep, + true, + "Whether to print performance stats for AI-PEP."); + +int main(int argc, char** argv) { + c10::SetUsageMessage( + "Run model load time benchmark for pytorch model.\n" + "Example usage:\n" + "./load_benchmark_torch" + " --model=" + " --iter=20"); + if (!c10::ParseCommandLineFlags(&argc, &argv)) { + std::cerr << "Failed to parse command line flags!" << std::endl; + return 1; + } + + std::cout << "Starting benchmark." 
<< std::endl; + CAFFE_ENFORCE( + FLAGS_iter >= 0, + "Number of main runs should be non negative, provided ", + FLAGS_iter, + "."); + + caffe2::Timer timer; + std::vector times; + + for (int i = 0; i < FLAGS_iter; ++i) { + auto start = high_resolution_clock::now(); + +#if BUILD_LITE_INTERPRETER + auto module = torch::jit::_load_for_mobile(FLAGS_model); +#else + auto module = torch::jit::load(FLAGS_model); +#endif + + auto stop = high_resolution_clock::now(); + auto duration = duration_cast(stop - start); + times.push_back(duration.count()); + } + + const double micros = static_cast(timer.MicroSeconds()); + if (FLAGS_report_pep) { + for (auto t : times) { + std::cout << R"(PyTorchObserver {"type": "NET", "unit": "us", )" + << R"("metric": "latency", "value": ")" + << t << R"("})" << std::endl; + } + } + + const double iters = static_cast(FLAGS_iter); + std::cout << "Main run finished. Microseconds per iter: " + << micros / iters + << ". Iters per second: " << 1000.0 * 1000 * iters / micros + << std::endl; + + return 0; +} diff --git a/c10/core/Backend.h b/c10/core/Backend.h index e17a1bc4226c69..59805f4a7ab1ae 100644 --- a/c10/core/Backend.h +++ b/c10/core/Backend.h @@ -32,6 +32,7 @@ enum class Backend { HIP, VE, FPGA, + IPU, XPU, SparseCPU, SparseCUDA, @@ -96,6 +97,8 @@ static inline Backend dispatchKeyToBackend(DispatchKey t) { return Backend::QuantizedCPU; } else if (t == DispatchKey::QuantizedCUDA) { return Backend::QuantizedCUDA; + } else if (t == DispatchKey::IPU || t == DispatchKey::AutogradIPU) { + return Backend::IPU; } else if (t == DispatchKey::XPU || t == DispatchKey::AutogradXPU) { return Backend::XPU; } else if (t == DispatchKey::SparseXPU) { @@ -129,6 +132,8 @@ static inline DispatchKey backendToDispatchKey(Backend b) { return DispatchKey::XLA; case Backend::Lazy: return DispatchKey::Lazy; + case Backend::IPU: + return DispatchKey::IPU; case Backend::XPU: return DispatchKey::XPU; case Backend::SparseXPU: @@ -196,6 +201,8 @@ static inline DeviceType backendToDeviceType(Backend b) { return DeviceType::CPU; case Backend::SparseCsrCUDA: return DeviceType::CUDA; + case Backend::IPU: + return DeviceType::IPU; case Backend::XPU: case Backend::SparseXPU: case Backend::QuantizedXPU: @@ -235,6 +242,8 @@ static inline const char* toString(Backend b) { return "FPGA"; case Backend::XPU: return "XPU"; + case Backend::IPU: + return "IPU"; case Backend::ORT: return "ORT"; case Backend::XLA: diff --git a/c10/core/Device.cpp b/c10/core/Device.cpp index 2531e3942271ad..1e0e4104144dc6 100644 --- a/c10/core/Device.cpp +++ b/c10/core/Device.cpp @@ -20,6 +20,7 @@ DeviceType parse_type(const std::string& device_string) { types = {{ {"cpu", DeviceType::CPU}, {"cuda", DeviceType::CUDA}, + {"ipu", DeviceType::IPU}, {"xpu", DeviceType::XPU}, {"mkldnn", DeviceType::MKLDNN}, {"opengl", DeviceType::OPENGL}, @@ -47,7 +48,7 @@ DeviceType parse_type(const std::string& device_string) { } TORCH_CHECK( false, - "Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: ", + "Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: ", device_string); } enum DeviceStringParsingState { START, INDEX_START, INDEX_REST, ERROR }; diff --git a/c10/core/Device.h b/c10/core/Device.h index b935eed6a65659..92ba0fc44d6707 100644 --- a/c10/core/Device.h +++ b/c10/core/Device.h @@ -96,11 +96,21 @@ struct C10_API Device final { return type_ 
== DeviceType::XPU; } + /// Return true if the device is of IPU type. + bool is_ipu() const noexcept { + return type_ == DeviceType::IPU; + } + /// Return true if the device is of HPU type. bool is_hpu() const noexcept { return type_ == DeviceType::HPU; } + /// Return true if the device is of META type. + bool is_meta() const noexcept { + return type_ == DeviceType::Meta; + } + /// Return true if the device is of CPU type. bool is_cpu() const noexcept { return type_ == DeviceType::CPU; diff --git a/c10/core/DeviceType.cpp b/c10/core/DeviceType.cpp index 4635acdb148c22..a076c5a5b0245c 100644 --- a/c10/core/DeviceType.cpp +++ b/c10/core/DeviceType.cpp @@ -43,6 +43,8 @@ std::string DeviceTypeName(DeviceType d, bool lower_case) { return lower_case ? "meta" : "META"; case DeviceType::HPU: return lower_case ? "hpu" : "HPU"; + case DeviceType::IPU: + return lower_case ? "ipu" : "IPU"; default: TORCH_CHECK( false, @@ -84,6 +86,7 @@ bool isValidDeviceType(DeviceType d) { case DeviceType::XPU: case DeviceType::Meta: case DeviceType::HPU: + case DeviceType::IPU: return true; default: return false; diff --git a/c10/core/DeviceType.h b/c10/core/DeviceType.h index c6bd56914d6d18..c2264532555134 100644 --- a/c10/core/DeviceType.h +++ b/c10/core/DeviceType.h @@ -31,11 +31,12 @@ enum class DeviceType : int8_t { HPU = 15, // HPU / HABANA VE = 16, // SX-Aurora / NEC Lazy = 17, // Lazy Tensors + IPU = 18, // Graphcore IPU // NB: If you add more devices: // - Change the implementations of DeviceTypeName and isValidDeviceType // in DeviceType.cpp // - Change the number below - COMPILE_TIME_MAX_DEVICE_TYPES = 18, + COMPILE_TIME_MAX_DEVICE_TYPES = 19, }; constexpr DeviceType kCPU = DeviceType::CPU; @@ -52,18 +53,19 @@ constexpr DeviceType kXPU = DeviceType::XPU; constexpr DeviceType kHPU = DeviceType::HPU; constexpr DeviceType kVE = DeviceType::VE; constexpr DeviceType kLazy = DeviceType::Lazy; +constexpr DeviceType kIPU = DeviceType::IPU; // define explicit int constant constexpr int COMPILE_TIME_MAX_DEVICE_TYPES = static_cast(DeviceType::COMPILE_TIME_MAX_DEVICE_TYPES); static_assert( - COMPILE_TIME_MAX_DEVICE_TYPES <= 18, + COMPILE_TIME_MAX_DEVICE_TYPES <= 19, "Hey! You seem to be adding a lot of new DeviceTypes. The intent was " "for this constant to reflect the actual number of DeviceTypes we support " "in PyTorch; it's important that this number is not too large as we " "use this to allocate stack arrays in some places in our code. If you " - "are indeed just adding the 18th device type, feel free to change " + "are indeed just adding the 19th device type, feel free to change " "the check to 32; but if you are adding some sort of extensible device " "types registration, please be aware that you are affecting code that " "this number is small. 
Try auditing uses of this constant."); diff --git a/c10/core/DispatchKey.cpp b/c10/core/DispatchKey.cpp index 6dbcaf88d5db78..14f501a6ddd02f 100644 --- a/c10/core/DispatchKey.cpp +++ b/c10/core/DispatchKey.cpp @@ -1,14 +1,49 @@ #include +#include #include namespace c10 { +const char* toString(BackendComponent t) { + switch (t) { + case BackendComponent::CPUBit: + return "CPUBit"; + case BackendComponent::CUDABit: + return "CUDABit"; + case BackendComponent::HIPBit: + return "HIPBit"; + case BackendComponent::XLABit: + return "XLABit"; + case BackendComponent::LazyBit: + return "LazyBit"; + case BackendComponent::XPUBit: + return "XPUBit"; + case BackendComponent::IPUBit: + return "IPUBit"; + case BackendComponent::MLCBit: + return "MLCBit"; + case BackendComponent::HPUBit: + return "HPUBit"; + case BackendComponent::VEBit: + return "VEBit"; + case BackendComponent::PrivateUse1Bit: + return "PrivateUse1Bit"; + case BackendComponent::PrivateUse2Bit: + return "PrivateUse2Bit"; + case BackendComponent::PrivateUse3Bit: + return "PrivateUse3Bit"; + case BackendComponent::InvalidBit: + return "InvalidBit"; + default: + return "UNKNOWN_BACKEND_BIT"; + } +} + const char* toString(DispatchKey t) { switch (t) { case DispatchKey::Undefined: return "Undefined"; - case DispatchKey::CPU: return "CPU"; case DispatchKey::CUDA: @@ -21,6 +56,8 @@ const char* toString(DispatchKey t) { return "FPGA"; case DispatchKey::XPU: return "XPU"; + case DispatchKey::IPU: + return "IPU"; case DispatchKey::ORT: return "ORT"; case DispatchKey::XLA: @@ -91,6 +128,8 @@ const char* toString(DispatchKey t) { return "Autograd"; case DispatchKey::AutogradCPU: return "AutogradCPU"; + case DispatchKey::AutogradIPU: + return "AutogradIPU"; case DispatchKey::AutogradXPU: return "AutogradXPU"; case DispatchKey::AutogradCUDA: @@ -103,8 +142,6 @@ const char* toString(DispatchKey t) { return "AutogradMLC"; case DispatchKey::AutogradHPU: return "AutogradHPU"; - case DispatchKey::AutogradNestedTensor: - return "AutogradNestedTensor"; case DispatchKey::AutogradPrivateUse1: return "AutogradPrivateUse1"; case DispatchKey::AutogradPrivateUse2: @@ -113,6 +150,8 @@ const char* toString(DispatchKey t) { return "AutogradPrivateUse3"; case DispatchKey::AutogradOther: return "AutogradOther"; + case DispatchKey::AutogradNestedTensor: + return "AutogradNestedTensor"; case DispatchKey::ZeroTensor: return "ZeroTensor"; @@ -170,6 +209,15 @@ const char* toString(DispatchKey t) { case DispatchKey::FuncTorchBatched: return "FuncTorchBatched"; + case DispatchKey::Dense: + return "Dense"; + case DispatchKey::Quantized: + return "Quantized"; + case DispatchKey::Sparse: + return "Sparse"; + case DispatchKey::AutogradFunctionality: + return "AutogradFunctionality"; + default: return "UNKNOWN_TENSOR_TYPE_ID"; } @@ -178,76 +226,37 @@ const char* toString(DispatchKey t) { std::ostream& operator<<(std::ostream& str, DispatchKey rhs) { return str << toString(rhs); } +std::ostream& operator<<(std::ostream& str, BackendComponent rhs) { + return str << toString(rhs); +} -// for a given backend key, return the associated autograd key. -// for non-backend keys, return AutogradOther as a default. -// Note: it's convenient and fast to return a default here rather than (say) -// returning an optional, or throwing. But it makes callers -// responsible for either a) enforcing the invariant that only backend keys -// be passed as arguments, or b) interpreting our return value carefully. 
-// -DispatchKey getAutogradKeyFromBackend(DispatchKey t) { - switch (t) { - case DispatchKey::CPU: - return DispatchKey::AutogradCPU; - case DispatchKey::XPU: - return DispatchKey::AutogradXPU; - case DispatchKey::CUDA: - return DispatchKey::AutogradCUDA; - case DispatchKey::XLA: - return DispatchKey::AutogradXLA; - case DispatchKey::Lazy: - return DispatchKey::AutogradLazy; - case DispatchKey::MLC: - return DispatchKey::AutogradMLC; - case DispatchKey::HPU: - return DispatchKey::AutogradHPU; - case DispatchKey::NestedTensor: - return DispatchKey::AutogradNestedTensor; - case DispatchKey::PrivateUse1: - return DispatchKey::AutogradPrivateUse1; - case DispatchKey::PrivateUse2: - return DispatchKey::AutogradPrivateUse2; - case DispatchKey::PrivateUse3: - return DispatchKey::AutogradPrivateUse3; - default: - return DispatchKey::AutogradOther; - } +DispatchKey getAutogradKeyFromBackend(BackendComponent k) { + // We want this to return an autograd key. We're relying on the fact that + // getAutogradRelatedKeySetFromBackend returns an autograd key + + // ADInplaceOrView, and autograd has higher precedence. The core mapping from + // backend -> autograd key lives in `getAutogradRelatedKeySetFromBackend` + // instead of here for performance. `getAutogradRelatedKeySetFromBackend` is a + // hotpath function, and we want to make sure that it doesn't have to + // construct any DispatchKeySets at runtime. + return getAutogradRelatedKeySetFromBackend(k).highestPriorityTypeId(); } c10::DispatchKey parseDispatchKey(const std::string& k) { static std::unordered_map key_map = { {"Undefined", c10::DispatchKey::Undefined}, - {"CPU", c10::DispatchKey::CPU}, - {"CUDA", c10::DispatchKey::CUDA}, - {"HIP", c10::DispatchKey::HIP}, + {"Dense", c10::DispatchKey::Dense}, {"FPGA", c10::DispatchKey::FPGA}, {"ORT", c10::DispatchKey::ORT}, - {"XLA", c10::DispatchKey::XLA}, - {"MLC", c10::DispatchKey::MLC}, {"Vulkan", c10::DispatchKey::Vulkan}, {"Metal", c10::DispatchKey::Metal}, - {"XPU", c10::DispatchKey::XPU}, - {"HPU", c10::DispatchKey::HPU}, {"VE", c10::DispatchKey::VE}, - {"Lazy", c10::DispatchKey::Lazy}, {"Meta", c10::DispatchKey::Meta}, - {"QuantizedCPU", c10::DispatchKey::QuantizedCPU}, - {"QuantizedCUDA", c10::DispatchKey::QuantizedCUDA}, - {"QuantizedXPU", c10::DispatchKey::QuantizedXPU}, + {"Quantized", c10::DispatchKey::Quantized}, {"CustomRNGKeyId", c10::DispatchKey::CustomRNGKeyId}, {"MkldnnCPU", c10::DispatchKey::MkldnnCPU}, - {"SparseCPU", c10::DispatchKey::SparseCPU}, - {"SparseCUDA", c10::DispatchKey::SparseCUDA}, - {"SparseHIP", c10::DispatchKey::SparseHIP}, - {"SparseXPU", c10::DispatchKey::SparseXPU}, - {"SparseVE", c10::DispatchKey::SparseVE}, + {"Sparse", c10::DispatchKey::Sparse}, {"SparseCsrCPU", c10::DispatchKey::SparseCsrCPU}, {"SparseCsrCUDA", c10::DispatchKey::SparseCsrCUDA}, - {"NestedTensor", c10::DispatchKey::NestedTensor}, - {"PrivateUse1", c10::DispatchKey::PrivateUse1}, - {"PrivateUse2", c10::DispatchKey::PrivateUse2}, - {"PrivateUse3", c10::DispatchKey::PrivateUse3}, {"BackendSelect", c10::DispatchKey::BackendSelect}, {"Python", c10::DispatchKey::Python}, {"PythonTLSSnapshot", c10::DispatchKey::PythonTLSSnapshot}, @@ -259,17 +268,8 @@ c10::DispatchKey parseDispatchKey(const std::string& k) { c10::DispatchKey::FuncTorchDynamicLayerBackMode}, {"ADInplaceOrView", c10::DispatchKey::ADInplaceOrView}, {"AutogradOther", c10::DispatchKey::AutogradOther}, - {"AutogradCPU", c10::DispatchKey::AutogradCPU}, - {"AutogradCUDA", c10::DispatchKey::AutogradCUDA}, - {"AutogradXLA", 
c10::DispatchKey::AutogradXLA}, - {"AutogradLazy", c10::DispatchKey::AutogradLazy}, - {"AutogradXPU", c10::DispatchKey::AutogradXPU}, - {"AutogradMLC", c10::DispatchKey::AutogradMLC}, - {"AutogradHPU", c10::DispatchKey::AutogradHPU}, + {"AutogradFunctionality", c10::DispatchKey::AutogradFunctionality}, {"AutogradNestedTensor", c10::DispatchKey::AutogradNestedTensor}, - {"AutogradPrivateUse1", c10::DispatchKey::AutogradPrivateUse1}, - {"AutogradPrivateUse2", c10::DispatchKey::AutogradPrivateUse2}, - {"AutogradPrivateUse3", c10::DispatchKey::AutogradPrivateUse3}, {"Tracer", c10::DispatchKey::Tracer}, {"AutocastCPU", c10::DispatchKey::AutocastCPU}, {"AutocastCUDA", c10::DispatchKey::AutocastCUDA}, @@ -283,6 +283,43 @@ c10::DispatchKey parseDispatchKey(const std::string& k) { {"TESTING_ONLY_GenericWrapper", c10::DispatchKey::TESTING_ONLY_GenericWrapper}, {"TESTING_ONLY_GenericMode", c10::DispatchKey::TESTING_ONLY_GenericMode}, + + {"CPU", c10::DispatchKey::CPU}, + {"CUDA", c10::DispatchKey::CUDA}, + {"HIP", c10::DispatchKey::HIP}, + {"XLA", c10::DispatchKey::XLA}, + {"MLC", c10::DispatchKey::MLC}, + {"XPU", c10::DispatchKey::XPU}, + {"IPU", c10::DispatchKey::IPU}, + {"HPU", c10::DispatchKey::HPU}, + {"Lazy", c10::DispatchKey::Lazy}, + {"NestedTensor", c10::DispatchKey::NestedTensor}, + {"PrivateUse1", c10::DispatchKey::PrivateUse1}, + {"PrivateUse2", c10::DispatchKey::PrivateUse2}, + {"PrivateUse3", c10::DispatchKey::PrivateUse3}, + + {"QuantizedCPU", c10::DispatchKey::QuantizedCPU}, + {"QuantizedCUDA", c10::DispatchKey::QuantizedCUDA}, + {"QuantizedXPU", c10::DispatchKey::QuantizedXPU}, + + {"SparseCPU", c10::DispatchKey::SparseCPU}, + {"SparseCUDA", c10::DispatchKey::SparseCUDA}, + {"SparseHIP", c10::DispatchKey::SparseHIP}, + {"SparseXPU", c10::DispatchKey::SparseXPU}, + {"SparseVE", c10::DispatchKey::SparseVE}, + + {"AutogradCPU", c10::DispatchKey::AutogradCPU}, + {"AutogradCUDA", c10::DispatchKey::AutogradCUDA}, + {"AutogradXLA", c10::DispatchKey::AutogradXLA}, + {"AutogradLazy", c10::DispatchKey::AutogradLazy}, + {"AutogradIPU", c10::DispatchKey::AutogradIPU}, + {"AutogradXPU", c10::DispatchKey::AutogradXPU}, + {"AutogradMLC", c10::DispatchKey::AutogradMLC}, + {"AutogradHPU", c10::DispatchKey::AutogradHPU}, + {"AutogradPrivateUse1", c10::DispatchKey::AutogradPrivateUse1}, + {"AutogradPrivateUse2", c10::DispatchKey::AutogradPrivateUse2}, + {"AutogradPrivateUse3", c10::DispatchKey::AutogradPrivateUse3}, + {"Autograd", c10::DispatchKey::Autograd}, {"CompositeImplicitAutograd", c10::DispatchKey::CompositeImplicitAutograd}, diff --git a/c10/core/DispatchKey.h b/c10/core/DispatchKey.h index 29315051b4177e..9ea1a36c2bb700 100644 --- a/c10/core/DispatchKey.h +++ b/c10/core/DispatchKey.h @@ -9,20 +9,99 @@ namespace c10 { +// Semantically, each value of BackendComponent identifies a "backend" for our +// dispatch. Some functionalities that we may dispatch to are allowed to +// register different handlers for each backend. The BackendComponent is then +// used to figure out which backend implementation to dispatch to. + +// In implementation terms, the backend component identifies a specific "bit" in +// a DispatchKeySet. The bits in the DispatchKeySet are split between the bottom +// ~12 "BackendComponent" bits, while the remaining upper bits are assigned to +// functionalities. 
When we encounter a functionality bit that is known to be +// customizeable per-backend, then we also look at the lower BackendComponent +// bits and take the highest bit to determine which backend's implementation to +// use. + +enum class BackendComponent : uint8_t { + + // A "backend" is colloquially used to refer to handlers for dispatch + // which actually implement the numerics of an operation in question. + // + // Due to the nature of the enum, these backends are specified in + // an ordered way, but for most backends this order is not semantically + // meaningful (e.g., it's valid to reorder these backends without changing + // semantics). The only situation when backend ordering is meaningful + // is when the backend participates in multiple dispatch with another + // backend; e.g., CPU and CUDA (cuda must have higher priority). + + // These keys don't correspond to individual kernels. + // Instead, they represent the backends that are allowed to override specific + // pieces of functionality: + // - dense kernels (e.g. DispatchKey::CPU) + // - sparse kernels (e.g. DispatchKey::SparseCPU) + // - quantized kernels (e.g. DispatchKey::QuantizedCPU) + // - autograd kernels (e.g. DispatchKey::AutogradCPU) + // We reserve space in the runtime operator table for this full cross product + // of + // [backends in this enum] x [keys below that are explicitly marked as having + // per-backend functionality] + + InvalidBit = 0, + CPUBit, + CUDABit, + HIPBit, + XLABit, + MLCBit, + IPUBit, + XPUBit, + HPUBit, + VEBit, + LazyBit, + PrivateUse1Bit, + PrivateUse2Bit, + PrivateUse3Bit, + // Define an alias to represent end of backend dispatch keys. + // If you add new backend keys after PrivateUse3, please also update it here. + // (But you shouldn't: private use keys should have higher precedence than + // all built-in keys) + EndOfBackendKeys = PrivateUse3Bit, +}; + // Semantically, a dispatch key identifies a possible "level" in our -// dispatch, for which a handler may be registered. Traditional -// backends like CPU and CUDA get dispatch keys; however, so do -// "wrapping" layers like Variable (for autograd handling). +// dispatch, for which a handler may be registered. Each handler corresponds +// to a type of functionality. // // In implementation terms, the dispatch key identifies a specific "bit" in a // DispatchKeySet. Higher bit indexes get handled by dispatching first (because // we "count leading zeros" when we extract the highest priority dispatch // key.) // +// Note [DispatchKey Classification] +// This enum actually contains several types of keys, which are explained +// in more detail further down: +// (1) non-customizable backends (e.g. FPGA) +// (2) non-customizable functionalities (e.g. Functionalize) +// (3) functionalized that are customizable per backend (e.g. Dense, Sparse, +// AutogradFunctionality) (4) per-backend instances of customizable +// functionalities (e.g. CPU, SparseCPU, AutogradCPU) (5) alias keys (e.g. +// CompositeImplicitAutograd) +// +// Of the categories above, it's important to note: +// (a) which keys are assigned individual bits in a DispatchKeySet +// (b) which keys are assigned individual slots in the runtime operator table +// ("Runtime keys") +// +// (1), (2) and (3) all get their own dedicated bits in the DispatchKeySet. +// (1), (2) and (4) all get their own dedicated slots in the runtime operator +// table. + +// See Note [DispatchKeySet Internal Representation] for more details. 
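
To make the key classification above concrete, here is a minimal sketch (illustrative only, not part of the diff; it assumes the `DispatchKeySet` API introduced in this patch, in particular `has()` and `has_backend()`) of how a per-backend runtime key decomposes into its building-block bits:

```cpp
// Illustrative sketch: a runtime key such as DispatchKey::CPU is represented
// in a DispatchKeySet as a functionality bit (Dense) plus a backend bit
// (CPUBit), rather than as its own dedicated bit.
#include <c10/core/DispatchKeySet.h>

using namespace c10;

void classification_example() {
  // Category (4): a runtime, per-backend instance of a customizable functionality.
  DispatchKeySet dense_cpu_ks(DispatchKey::CPU);

  // The set reports both building-block keys (category 1) ...
  bool has_backend_bit = dense_cpu_ks.has_backend(BackendComponent::CPUBit);
  bool has_functionality_bit = dense_cpu_ks.has(DispatchKey::Dense);
  // ... and the runtime key they combine into.
  bool has_runtime_key = dense_cpu_ks.has(DispatchKey::CPU);

  (void)has_backend_bit;
  (void)has_functionality_bit;
  (void)has_runtime_key;
}
```

Because CPU is just Dense + CPUBit in this scheme, adding a new customizable backend costs one backend bit rather than one new bit per functionality.
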
+// // NOTE: Keep the list in sync with `DispatchKey` in tools/codegen/model.py -enum class DispatchKey : uint8_t { +enum class DispatchKey : uint16_t { + // ~~~~~~~~~~~~~~~~~~~~~~~~~~ UNDEFINED ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // - // This is not a "real" tensor id, but it exists to give us a "nullopt" + // This is not a "real" functionality, but it exists to give us a "nullopt" // element we can return for cases when a DispatchKeySet contains no elements. // You can think a more semantically accurate definition of DispatchKey is: // @@ -38,24 +117,31 @@ enum class DispatchKey : uint8_t { // this will get eliminated, but for now it's convenient) CatchAll = Undefined, - // ~~~~~~~~~~~~~~~~~~~~~~~~~~ BACKENDS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // - // A "backend" is colloquially used to refer to handlers for dispatch - // which actually implement the numerics of an operation in question. + // ~~~~~~~~~~~~~~~~~~~~~~~~~~ Functionality Keys ~~~~~~~~~~~~~~~~~~~~~~ // + // Every value in the enum (up to EndOfFunctionalityKeys) + // corresponds to an individual "functionality" that can be dispatched to. + // This is represented in the DispatchKeySet by assigning each of these enum + // values + // to each of the remaining (64 - len(BackendComponent)) bits. // - // Due to the nature of the enum, these backends are specified in - // an ordered way, but for most backends this order is not semantically - // meaningful (e.g., it's valid to reorder these backends without changing - // semantics). The only situation when backend ordering is meaningful - // is when the backend participates in multiple dispatch with another - // backend; e.g., CPU and SparseCPU (sparse must have - // higher priority). + // Most of these functionalities have a single handler assigned to them, + // making them "runtime keys". + // That map to a single slot in the runtime operator table. + // + // A few functionalities are allowed to be customizable per backend. + // See [Note: Per-Backend Functionality Dispatch Keys] for details. + + // See [Note: Per-Backend Functionality Dispatch Keys] + Dense, + + // Below are non-extensible backends. + // These are backends that currently don't have their own overrides for + // Autograd/Sparse/Quantized kernels, + // and we therefore don't waste space in the runtime operator table allocating + // space for them. + // If any of these backends ever need to customize, e.g., Autograd, then we'll + // need to add a DispatchKey::*Bit for them. - // Here are backends which you think of as traditionally specifying - // how to implement operations on some device. - CPU, // registered at build/aten/src/ATen/RegisterCPU.cpp - CUDA, // registered at build/aten/src/ATen/RegisterCUDA.cpp - HIP, // NB: I think this is not actually used, due to Note [Masquerading as - // CUDA] FPGA, // Xilinx support lives out of tree at // https://gitlab.com/pytorch-complex/vitis_kernels @@ -67,14 +153,8 @@ enum class DispatchKey : uint8_t { // - aten/src/ATen/test/extension_backend_test.cpp ORT, - XLA, // lives out of tree at https://github.com/pytorch/xla - MLC, // lives out of tree at https://github.com/pytorch/MLCompute Vulkan, Metal, - XPU, // For out of tree Intel's heterogeneous computing plug-in - HPU, // For out of tree & closed source integration of HPU / Habana - VE, // For out of tree & closed source integration of SX-Aurora / NEC - Lazy, // For lazy tensor backends // A meta tensor is a tensor without any data associated with it. 
(They // have also colloquially been referred to as tensors on the "null" device). @@ -83,11 +163,8 @@ enum class DispatchKey : uint8_t { // tensor with the output shape and dtype, but wouldn't actually add anything. Meta, - // Here are backends which specify more specialized operators - // based on the dtype of the tensor. - QuantizedCPU, // registered at build/aten/src/ATen/RegisterQuantizedCPU.cpp - QuantizedCUDA, // registered at build/aten/src/ATen/RegisterQuantizedCUDA.cpp - QuantizedXPU, // For out of tree Intel's heterogeneous computing plug-in + // See [Note: Per-Backend Functionality Dispatch Keys] + Quantized, // This backend is to support custom RNGs; it lets you go // to a different kernel if you pass in a generator that is not a @@ -106,30 +183,28 @@ enum class DispatchKey : uint8_t { // the corresponding dense tensors, and must be handled before them. MkldnnCPU, // registered at build/aten/src/ATen/RegisterMkldnnCPU.cpp // NB: not to be confused with MKLDNN, which is Caffe2 only - SparseCPU, // registered at build/aten/src/ATen/RegisterSparseCPU.cpp - SparseCUDA, // registered at build/aten/src/ATen/RegisterSparseCUDA.cpp - SparseHIP, // TODO: I think this is not actually used, due to Note - // [Masquerading as CUDA] - SparseXPU, // For out of tree Intel's heterogeneous computing plug-in - SparseVE, // For out of tree & closed source integration of SX-Aurora / NEC + + // See [Note: Per-Backend Functionality Dispatch Keys] + Sparse, SparseCsrCPU, SparseCsrCUDA, - NestedTensor, // lives out of tree at https://github.com/pytorch/nestedtensor - - // Here are reserved backends for user-defined backends, see Note [Private use - // DispatchKey] - // To see some example about how to use this, check out ORT - PrivateUse1, - PrivateUse2, - PrivateUse3, + // Note [Non-Customizable Backend Keys] + // Every key above here is considered a "non-customizable backend". + // These are backends that will work correctly with autograd, but + // but currently don't require separate implementations + // for autograd sparse or quantized kernels. + // Any new backends that don't need to be customized should go above here. + // If an existing backend needs to e.g. override autograd, then we can + // consider promoting it into the "BackendComponent" enum + // + // For all intents and purposes from the perspective of DispatchKeySet, + // "non-customizable backend" keys are treated the same way + // as other functionality keys + EndOfNonCustomizableBackends = SparseCsrCUDA, - // Define an alias key to represent end of backend dispatch keys. - // If you add new backend keys after PrivateUse3, please also update it here. - // (But you shouldn't: private use keys should have higher precedence than - // all built-in keys) - EndOfBackendKeys = PrivateUse3, + NestedTensor, // lives out of tree at https://github.com/pytorch/nestedtensor // In some situations, it is not immediately obvious what the correct // backend for function is, because the function in question doesn't @@ -233,20 +308,18 @@ enum class DispatchKey : uint8_t { // AutogradOther key. We can add specific autograd key for those backends // upon request. 
AutogradOther, - AutogradCPU, - AutogradCUDA, - AutogradXLA, - AutogradLazy, - AutogradXPU, - AutogradMLC, - AutogradHPU, - AutogradNestedTensor, // lives out of tree at + + // See [Note: Per-Backend Functionality Dispatch Keys] + AutogradFunctionality, + + // NestedTensor is an example of something that isn't a "real backend" + // (because it mostly consists of redispatching kernels) + // but it would like to override autograd functionality in C++. + // We can handle cases like this by adding an extra functionality key + // exclusively for handling autograd for NestedTensor. + // lives out of tree at // https://github.com/pytorch/nestedtensor - // Here are some reserved pre-autograd keys for user-defined backends, see - // Note [Private use DispatchKey] - AutogradPrivateUse1, - AutogradPrivateUse2, - AutogradPrivateUse3, + AutogradNestedTensor, Tracer, @@ -280,13 +353,16 @@ enum class DispatchKey : uint8_t { // we can consider adding separate keys dedicated to those individual passes. // See Note [Functionalization Pass In Core] for details. Functionalize, - FuncTorchDynamicLayerFrontMode, // See Note [Out-of-tree vmap+grad prototype] // Used by Python key logic to know the set of tls on entry to the dispatcher - // This kernel assumes it is at the very top of the dispatcher. If you add - // a key above, make sure to update the fallback implementation for this. + // This kernel assumes it is the top-most non-functorch-related DispatchKey. + // If you add a key above, make sure to update the fallback implementation for + // this. PythonTLSSnapshot, + // This key should be at the very top of the dispatcher + FuncTorchDynamicLayerFrontMode, // See Note [Out-of-tree vmap+grad prototype] + // TESTING: This is intended to be a generic testing tensor type id. // Don't use it for anything real; its only acceptable use is within a single // process test. Use it by creating a TensorImpl with this DispatchKey, and @@ -304,9 +380,104 @@ enum class DispatchKey : uint8_t { TESTING_ONLY_GenericMode, // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FIN ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // - NumDispatchKeys, // Sentinel, end of runtime keys. + EndOfFunctionalityKeys, // End of functionality keys. + + // ~~~~~~~~~~~~~~ "Dense" Per-Backend Dispatch keys ~~~~~~~~~~~~~~~~~~~~ // + // Here are backends which you think of as traditionally specifying + // how to implement operations on some device. + + // See Note [The Ordering of Per-Backend Dispatch Keys Matters!] 
+ StartOfDenseBackends, + CPU, // registered at build/aten/src/ATen/RegisterCPU.cpp + CUDA, // registered at build/aten/src/ATen/RegisterCUDA.cpp + HIP, // NB: I think this is not actually used, due to Note [Masquerading as + // CUDA] + XLA, // lives out of tree at https://github.com/pytorch/xla + MLC, // lives out of tree at https://github.com/pytorch/MLCompute + IPU, // lives out of tree at https://github.com/graphcore/poptorch + XPU, // For out of tree Intel's heterogeneous computing plug-in + HPU, // For out of tree & closed source integration of HPU / Habana + VE, // For out of tree & closed source integration of SX-Aurora / NEC + Lazy, // For lazy tensor backends + // Here are reserved backends for user-defined backends, see Note [Private use + // DispatchKey] + // To see some example about how to use this, check out ORT + PrivateUse1, + PrivateUse2, + PrivateUse3, + EndOfDenseBackends = PrivateUse3, + + // ~~~~~~~~~~~~~~ "Quantized" Per-Backend Dispatch keys ~~~~~~~~~~~~~~~~ // + // keys starting with an _ are not currently used, + // but are needed to ensure that every backend is indexed correctly. + + // See Note [The Ordering of Per-Backend Dispatch Keys Matters!] + StartOfQuantizedBackends, + QuantizedCPU, // registered at build/aten/src/ATen/RegisterQuantizedCPU.cpp + QuantizedCUDA, // registered at build/aten/src/ATen/RegisterQuantizedCUDA.cpp + _QuantizedHIP, + _QuantizedXLA, + _QuantizedMLC, + _QuantizedIPU, + QuantizedXPU, // For out of tree Intel's heterogeneous computing plug-in + _QuantizedHPU, + _QuantizedVE, + _QuantizedLazy, + _QuantizedPrivateUse1, + _QuantizedPrivateUse2, + _QuantizedPrivateUse3, + EndOfQuantizedBackends = _QuantizedPrivateUse3, + + // ~~~~~~~~~~~~~~ "Sparse" Per-Backend Dispatch keys ~~~~~~~~~~~~~~~~~~~ // + // keys starting with an _ are not currently used, + // but are needed to ensure that every backend is indexed correctly. + + // See Note [The Ordering of Per-Backend Dispatch Keys Matters!] + StartOfSparseBackends, + SparseCPU, // registered at build/aten/src/ATen/RegisterSparseCPU.cpp + SparseCUDA, // registered at build/aten/src/ATen/RegisterSparseCUDA.cpp + SparseHIP, // TODO: I think this is not actually used, due to Note + // [Masquerading as CUDA] + _SparseXLA, + _SparseMLC, + _SparseIPU, + SparseXPU, // For out of tree Intel's heterogeneous computing plug-in + _SparseHPU, + SparseVE, // For out of tree & closed source integration of SX-Aurora / NEC + _SparseLazy, + _SparsePrivateUse1, + _SparsePrivateUse2, + _SparsePrivateUse3, + EndOfSparseBackends = _SparsePrivateUse3, + + // ~~~~~~~~~~~~~~ "Autograd" Per-Backend Dispatch keys ~~~~~~~~~~~~~~~~~ // + // keys starting with an _ are not currently used, + // but are needed to ensure that every backend is indexed correctly. + + // See Note [The Ordering of Per-Backend Dispatch Keys Matters!] + StartOfAutogradBackends, + AutogradCPU, + AutogradCUDA, + _AutogradHIP, + AutogradXLA, + AutogradMLC, + AutogradIPU, + AutogradXPU, + AutogradHPU, + _AutogradVE, + AutogradLazy, + // Here are some reserved pre-autograd keys for user-defined backends, see + // Note [Private use DispatchKey] + AutogradPrivateUse1, + AutogradPrivateUse2, + AutogradPrivateUse3, + EndOfAutogradBackends = AutogradPrivateUse3, + // If we add a new per-backend functionality key that has higher priority + // than Autograd, then this key should be updated. 
+ EndOfRuntimeBackendKeys = EndOfAutogradBackends, // ~~~~~~~~~~~~~~~~~~~~~~ Alias Dispatch Keys ~~~~~~~~~~~~~~~~~~~~~~~~~~ // + // Note [Alias Dispatch Keys] // Alias dispatch keys are synthetic dispatch keys which map to multiple // runtime dispatch keys. Alisa keys have precedence, but they are always // lower precedence than runtime keys. You can register a kernel to an @@ -326,6 +497,7 @@ enum class DispatchKey : uint8_t { // Define an alias key to represent end of alias dispatch keys. // If you add new alias keys after Autograd, please also update it here. + StartOfAliasKeys = Autograd, EndOfAliasKeys = CompositeExplicitAutograd, // // ~~~~~~~~~~~~~~~~~~~~~~~~~ BC ALIASES ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // @@ -365,54 +537,83 @@ enum class DispatchKey : uint8_t { // built-in autograd formulas for operators are not appropriate. static_assert( - static_cast(DispatchKey::NumDispatchKeys) <= 64, - "DispatchKey is used as index into 64-bit bitmask; you must have less than 64 entries"); + (static_cast(BackendComponent::EndOfBackendKeys) + + static_cast(DispatchKey::EndOfFunctionalityKeys)) <= 64, + "The BackendComponent and DispatchKey enums (below EndOfFunctionalityKeys)" + " both map to backend and functionality bits" + " into a 64-bit bitmask; you must have less than 64 total entries between them"); -#if defined(C10_MOBILE_TRIM_DISPATCH_KEYS) -/** - * The method below maps the dispatch key in the enum DispatchKey to an - * integer index in the dispatchTable_ array in OperatorEntry. The array - * is trimmed for mobile to reduce peak memory usage since it's - * unnecessary to reserve additional space for dispatch keys that will - * never be used on mobile. - */ -C10_API constexpr int getDispatchTableIndexForDispatchKey(DispatchKey dk) { - switch (dk) { - case DispatchKey::Undefined: - return 0; - case DispatchKey::CPU: - return 1; - case DispatchKey::QuantizedCPU: - return 2; - case DispatchKey::SparseCPU: - return 3; - case DispatchKey::BackendSelect: - return 4; - case DispatchKey::ADInplaceOrView: - return 5; - case DispatchKey::AutogradOther: - return 6; - case DispatchKey::AutogradCPU: - return 7; - case DispatchKey::NumDispatchKeys: // Sentinel, end of runtime keys. - return 8; - default: - return -1; +// Check if a DispatchKey is an alias mapping to other runtime keys. +constexpr bool isAliasDispatchKey(DispatchKey k) { + return k >= DispatchKey::StartOfAliasKeys && k <= DispatchKey::EndOfAliasKeys; +} + +// [Note: Per-Backend Functionality Dispatch Keys] +// Check if a DispatchKey is a per-backend functionality key +// Any functionalities that can be customized per-backend should be added here. +// These keys correspond to functionalities that can be customized indivually +// per backend. While they only take up one bit in the `DispatchKeySet` bitset, +// they map to (# backends) slots in the operator table. +// Each of these keys also has a separate set of "runtime keys" in the dispatch +// key enum, per backend, which *do* map to the individual operator table slots. +// For example, the "Sparse" key maps to an individual bit in the +// DispatchKeySet, while `SparseCPU`, `SparseCUDA`, etc all map to individual +// slots in the runtime operator table. + +constexpr bool isPerBackendFunctionalityKey(DispatchKey k) { + if (k == DispatchKey::Dense || k == DispatchKey::Quantized || + k == DispatchKey::Sparse || k == DispatchKey::AutogradFunctionality) { + return true; + } else { + return false; } } -#else -/** - * For the server use-case, make this a simple pass-through. 
- */ -C10_API constexpr int getDispatchTableIndexForDispatchKey(DispatchKey dk) { - return static_cast(dk); + +// Note that this includes Undefined in the total count. +// BUT EndOfFunctionalityKeys is its own (placeholder) key. +// e.g. Undefined=0, Dense=1, Sparse=2, EndOfFunctionalityKeys=3. +// In the above example, there are 3 total functionality keys. +constexpr uint8_t num_functionality_keys = + static_cast(DispatchKey::EndOfFunctionalityKeys); + +constexpr uint8_t num_backends = + static_cast(BackendComponent::EndOfBackendKeys); + +// Note [No More Than 16 Backends] +// Search for this note to find places in the code where the "no more than 16 +// backends" invariant is baked in. +static_assert( + static_cast(BackendComponent::EndOfBackendKeys) <= 16, + "BackendComponent currently only supports <= 16 backends. If we really need to extend this, \ +there are a few places where this invariant is baked in"); + +constexpr uint8_t numPerBackendFunctionalityKeys() { + uint8_t count = 0; + for (uint8_t k = 0; k <= num_functionality_keys; ++k) { + if (isPerBackendFunctionalityKey(static_cast(k))) + ++count; + } + return count; } + +#if defined(C10_MOBILE_TRIM_DISPATCH_KEYS) +// See [Note: Trimmed Mobile Dispatch Keys] +constexpr uint16_t num_runtime_entries = 8; +#else +constexpr uint16_t num_runtime_entries = num_functionality_keys + + (numPerBackendFunctionalityKeys() * (num_backends - 1)); #endif +// See Note [No More Than 16 Backends] +constexpr uint16_t full_backend_mask = + (static_cast(1) << num_backends) - 1; + C10_API const char* toString(DispatchKey); +C10_API const char* toString(BackendComponent); C10_API std::ostream& operator<<(std::ostream&, DispatchKey); +C10_API std::ostream& operator<<(std::ostream&, BackendComponent); -C10_API DispatchKey getAutogradKeyFromBackend(DispatchKey t); +C10_API DispatchKey getAutogradKeyFromBackend(BackendComponent k); // Parses a string into a dispatch key. // If the string cannot be correctly parsed, throws an exception. @@ -425,10 +626,86 @@ C10_API c10::DispatchKey parseDispatchKey(const std::string& k); // torch::dispatch(torch::kCPU, ...) is also valid. constexpr DispatchKey kAutograd = DispatchKey::Autograd; -// Check if a DispatchKey is an alias mapping to other runtime keys. -inline bool isAliasDispatchKey(DispatchKey k) { - return k > DispatchKey::NumDispatchKeys && k <= DispatchKey::EndOfAliasKeys; +// See Note [The Ordering of Per-Backend Dispatch Keys Matters!] +// This function relies on the invariant that the dispatch keys between +// StartOfDenseBackends and EndOfRuntimeBackendKeys are ordered by backend +// in the same order as `BackendComponent`. 
+constexpr BackendComponent toBackendComponent(DispatchKey k) { + if (k >= DispatchKey::StartOfDenseBackends && + k <= DispatchKey::EndOfDenseBackends) { + return static_cast( + static_cast(k) - + static_cast(DispatchKey::StartOfDenseBackends)); + } else if ( + k >= DispatchKey::StartOfQuantizedBackends && + k <= DispatchKey::EndOfQuantizedBackends) { + return static_cast( + static_cast(k) - + static_cast(DispatchKey::StartOfQuantizedBackends)); + } else if ( + k >= DispatchKey::StartOfSparseBackends && + k <= DispatchKey::EndOfSparseBackends) { + return static_cast( + static_cast(k) - + static_cast(DispatchKey::StartOfSparseBackends)); + } else if ( + k >= DispatchKey::StartOfAutogradBackends && + k <= DispatchKey::EndOfAutogradBackends) { + return static_cast( + static_cast(k) - + static_cast(DispatchKey::StartOfAutogradBackends)); + } else { + return BackendComponent::InvalidBit; + } } + +constexpr DispatchKey toFunctionalityKey(DispatchKey k) { + if (k <= DispatchKey::EndOfFunctionalityKeys) { + return k; + } else if (k <= DispatchKey::EndOfDenseBackends) { + return DispatchKey::Dense; + } else if (k <= DispatchKey::EndOfQuantizedBackends) { + return DispatchKey::Quantized; + } else if (k <= DispatchKey::EndOfSparseBackends) { + return DispatchKey::Sparse; + } else if (k <= DispatchKey::EndOfAutogradBackends) { + return DispatchKey::AutogradFunctionality; + } else { + return DispatchKey::Undefined; + } +} + +// Given (DispatchKey::Dense, DispatchKey::CUDABit), returns DispatchKey::CUDA +// See Note [The Ordering of Per-Backend Dispatch Keys Matters!] +// This function relies on the invariant that the dispatch keys between +// StartOfDenseBackends and EndOfRuntimeBackendKeys are ordered by backend +// in the same order as `BackendComponent`. +constexpr DispatchKey toRuntimePerBackendFunctionalityKey( + DispatchKey functionality_k, + BackendComponent backend_k) { + if (functionality_k == DispatchKey::Dense) { + return static_cast( + static_cast(DispatchKey::StartOfDenseBackends) + + static_cast(backend_k)); + } + if (functionality_k == DispatchKey::Sparse) { + return static_cast( + static_cast(DispatchKey::StartOfSparseBackends) + + static_cast(backend_k)); + } + if (functionality_k == DispatchKey::Quantized) { + return static_cast( + static_cast(DispatchKey::StartOfQuantizedBackends) + + static_cast(backend_k)); + } + if (functionality_k == DispatchKey::AutogradFunctionality) { + return static_cast( + static_cast(DispatchKey::StartOfAutogradBackends) + + static_cast(backend_k)); + } + return DispatchKey::Undefined; +} + } // namespace c10 namespace torch { diff --git a/c10/core/DispatchKeySet.cpp b/c10/core/DispatchKeySet.cpp index 7f85567f886f6b..d36e43513d4783 100644 --- a/c10/core/DispatchKeySet.cpp +++ b/c10/core/DispatchKeySet.cpp @@ -1,37 +1,30 @@ #include +#include +#include namespace c10 { -// backend_dispatch_keyset should include all runtime backend keys. +// backend_dispatch_keyset includes all dispatch keys that map to backends. // Alias key DispatchKey::CompositeExplicitAutograd maps to -// backend_dispatch_keyset NestedTensor has been explicitly removed due to -// incompatibility with some kernels, such as structured kernels, that use the -// DefaultBackend key. 
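
As a hedged illustration of the ordering invariant described above (assuming the `constexpr` helpers `toBackendComponent`, `toFunctionalityKey`, and `toRuntimePerBackendFunctionalityKey` introduced in this hunk), a runtime per-backend key can be split into and re-assembled from its two building blocks:

```cpp
// Illustrative sketch only; relies on the helpers defined above.
#include <c10/core/DispatchKey.h>

using c10::BackendComponent;
using c10::DispatchKey;

// SparseCUDA is the CUDA "instance" of the per-backend Sparse functionality:
static_assert(
    c10::toBackendComponent(DispatchKey::SparseCUDA) == BackendComponent::CUDABit,
    "");
static_assert(
    c10::toFunctionalityKey(DispatchKey::SparseCUDA) == DispatchKey::Sparse,
    "");
// Recombining the two building blocks yields the runtime key again:
static_assert(
    c10::toRuntimePerBackendFunctionalityKey(
        DispatchKey::Sparse, BackendComponent::CUDABit) == DispatchKey::SparseCUDA,
    "");
```

This round-trip only works because each per-backend block of the DispatchKey enum (Dense, Quantized, Sparse, Autograd) lists its backends in exactly the same order as the BackendComponent enum.
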
-constexpr DispatchKeySet backend_dispatch_keyset = autogradother_backends | - DispatchKeySet({ - DispatchKey::CPU, - DispatchKey::CUDA, - DispatchKey::XLA, - DispatchKey::Lazy, - DispatchKey::XPU, - DispatchKey::PrivateUse1, - DispatchKey::PrivateUse2, - DispatchKey::PrivateUse3, - DispatchKey::MLC, - DispatchKey::HPU, - DispatchKey::ORT, - DispatchKey::Meta, - }); +// backend_dispatch_keyset +constexpr DispatchKeySet backend_dispatch_keyset = + autogradother_backends | DispatchKeySet(DispatchKey::Dense); bool isBackendDispatchKey(DispatchKey t) { return t != DispatchKey::Undefined // See Note [No Alias Keys in DispatchKeySet] - && !isAliasDispatchKey(t) && backend_dispatch_keyset.has(t); + && !isAliasDispatchKey(t) + // Note [NestedTensor Not Included in Backend Keys] + // NestedTensor has been explicitly removed from the "backend keyset" due + // to incompatibility with some kernels, so we don't want it to be + // included in CompositeImplicitAutograd or CompositeExplicitAutograd + // kernels. + && t != DispatchKey::NestedTensor && backend_dispatch_keyset.has(t); } // math_dispatch_keyset contains all keys in backend_dispatch_keyset and // autograd_dispatch_keyset Alias key DispatchKey::CompositeImplicitAutograd -// maps to math_dispatch_keyset. +// maps to [math_dispatch_keyset x full_backend_mask] constexpr DispatchKeySet math_dispatch_keyset = backend_dispatch_keyset | autograd_dispatch_keyset; @@ -39,7 +32,12 @@ DispatchKeySet getRuntimeDispatchKeySet(DispatchKey t) { TORCH_INTERNAL_ASSERT(t != DispatchKey::Undefined); switch (t) { case DispatchKey::Autograd: - return autograd_dispatch_keyset; + // See Note [autograd_dispatch_keyset Does Not Include Backend Bits] + // That's why we OR it with a mask of the backend bits here. + // getRuntimeDispatchKeySet() expects to return a keyset of runtime + // dispatch keys, like AutogradCPU, but that requires having backend bits. 
+ return autograd_dispatch_keyset | + DispatchKeySet(DispatchKeySet::RAW, full_backend_mask); case DispatchKey::CompositeImplicitAutograd: return math_dispatch_keyset; case DispatchKey::CompositeExplicitAutograd: @@ -53,11 +51,13 @@ bool runtimeDispatchKeySetHas(DispatchKey t, DispatchKey k) { TORCH_INTERNAL_ASSERT(t != DispatchKey::Undefined); switch (t) { case DispatchKey::Autograd: - return autograd_dispatch_keyset.has(k); + return autograd_dispatch_keyset.has(toFunctionalityKey(k)); case DispatchKey::CompositeImplicitAutograd: - return math_dispatch_keyset.has(k); + // See Note [NestedTensor Not Included in Backend Keys] + return k != DispatchKey::NestedTensor && math_dispatch_keyset.has(k); case DispatchKey::CompositeExplicitAutograd: - return backend_dispatch_keyset.has(k); + // See Note [NestedTensor Not Included in Backend Keys] + return k != DispatchKey::NestedTensor && backend_dispatch_keyset.has(k); default: return t == k; } @@ -79,8 +79,8 @@ DispatchKeySet getBackendKeySetFromAutograd(DispatchKey t) { return DispatchKeySet(DispatchKey::MLC); case DispatchKey::AutogradHPU: return DispatchKeySet(DispatchKey::HPU); - case DispatchKey::AutogradNestedTensor: - return DispatchKeySet(DispatchKey::NestedTensor); + case DispatchKey::AutogradIPU: + return DispatchKeySet(DispatchKey::IPU); case DispatchKey::AutogradXPU: return DispatchKeySet(DispatchKey::XPU); case DispatchKey::AutogradPrivateUse1: @@ -96,23 +96,6 @@ DispatchKeySet getBackendKeySetFromAutograd(DispatchKey t) { } } -DispatchKeySet getAutocastRelatedKeySetFromBackend(DispatchKey t) { - switch (t) { - case DispatchKey::CPU: - return DispatchKeySet(DispatchKey::AutocastCPU); - case DispatchKey::CUDA: - case DispatchKey::XLA: - return DispatchKeySet(DispatchKey::AutocastCUDA); - default: - return DispatchKeySet(); - } -} - -DispatchKeySet getAutogradRelatedKeySetFromBackend(DispatchKey t) { - return DispatchKeySet( - {DispatchKey::ADInplaceOrView, getAutogradKeyFromBackend(t)}); -} - bool isIncludedInAlias(DispatchKey k, DispatchKey alias) { return k != DispatchKey::Undefined && runtimeDispatchKeySetHas(alias, k); } @@ -129,18 +112,135 @@ std::ostream& operator<<(std::ostream& os, DispatchKeySet ts) { return os; } os << "DispatchKeySet("; - DispatchKey tid; bool first = true; - while ((tid = ts.highestPriorityTypeId()) != DispatchKey::Undefined) { + for (auto k : ts) { if (!first) { os << ", "; } - os << tid; - ts = ts.remove(tid); + os << k; first = false; } os << ")"; return os; } +DispatchKeySet::iterator& DispatchKeySet::iterator::operator++() { + TORCH_INTERNAL_ASSERT(next_functionality_ <= iterator::end_iter_mask_val); + TORCH_INTERNAL_ASSERT(next_backend_ <= num_backends, next_backend_); + + // Create a masked version of the set representation to ignore previous + // keys that we've iterated through. 
+ uint64_t masked_functionality_bits = + llvm::maskTrailingZeros(next_functionality_) & *data_ptr_; + uint64_t masked_backend_bits = + llvm::maskTrailingZeros(next_backend_) & full_backend_mask & + *data_ptr_; + + uint64_t first_functionality_idx = + llvm::findFirstSet(masked_functionality_bits); + uint64_t first_backendcomponent_idx = llvm::findFirstSet(masked_backend_bits); + + // If there are no keys, set to end iterator value + if (first_functionality_idx == std::numeric_limits::max() || + next_functionality_ == iterator::end_iter_mask_val) { + // Set up state to be the same as end() + next_functionality_ = iterator::end_iter_mask_val; + current_dispatchkey_idx_ = iterator::end_iter_key_val; + next_backend_ = 0; + current_backendcomponent_idx_ = iterator::end_iter_key_val; + return *this; + } + + // The +1 is because of DispatchKey::Undefined and + // BackendComponent::InvalidBit + auto new_next_functionality = first_functionality_idx + 1; + auto new_backendcomponent_idx = first_backendcomponent_idx + 1; + // and the -num_backends is because the first bits in the + // keyset are not Dispatch Keys. + auto next_dispatchkey_idx = new_next_functionality - num_backends; + + // If the current functionality bit is a per-backend bit, we need special + // handling + if (isPerBackendFunctionalityKey( + static_cast(next_dispatchkey_idx))) { + // case 1: if the current backend is undefined, then there is no valid + // backend instance of this functionality key so we can skip it. + if (first_backendcomponent_idx == std::numeric_limits::max()) { + // increment the functionality mask so we skip the current functionality + // bit on the next increment. + next_functionality_ = new_next_functionality; + ++(*this); + return *this; + } + + // Otherwise, at this point we know what the current backend and + // functionality bits are. + current_dispatchkey_idx_ = next_dispatchkey_idx; + current_backendcomponent_idx_ = new_backendcomponent_idx; + + // Next, we need to set up the masks for the next increment. + uint64_t next_backendcomponent_bits = + llvm::maskTrailingZeros(first_backendcomponent_idx + 1) & + full_backend_mask & *data_ptr_; + uint64_t next_backendcomponent_idx = + llvm::findFirstSet(next_backendcomponent_bits); + if (next_backendcomponent_idx == std::numeric_limits::max()) { + // case 2: the current backend is valid, but there is not another backend + // in the keyset. In this case, we need to bump the functionality mask and + // reset the backend mask for the next increment + next_functionality_ = new_next_functionality; + next_backend_ = 0; + } else { + // case 3: we have another backend to iterate over. We want to iterate + // over the same functionality bit next time, but a different backend bit. + next_backend_ = first_backendcomponent_idx + 1; + } + } else { + // Functionality bits that aren't per backend are simpler to handle. We can + // ignore the backend bits. + TORCH_INTERNAL_ASSERT(next_backend_ == 0); + current_dispatchkey_idx_ = next_dispatchkey_idx; + next_functionality_ = new_next_functionality; + } + return *this; +} + +std::array +initializeFunctionalityOffsetsAndMasks() { + std::array + offsets_and_masks; + // manualy set the first entry, which corresponds to Undefined. + offsets_and_masks[0] = FunctionalityOffsetAndMask(0, 0); + // loop through every functionality key (aside from Undefined). + for (const auto functionality_idx : c10::irange(1, num_functionality_keys)) { + // functionality_idx should be Dense -> 1, ... 
+ auto prev_offset_and_mask = offsets_and_masks[functionality_idx - 1]; + auto k = static_cast(functionality_idx); + + // If the previous functionality was not per-backend, then we can just + // increment the previous offset. Otherwise, the next offset = + // previous_offset + num_backends. + auto next_offset = prev_offset_and_mask.offset + + (prev_offset_and_mask.mask == 0 ? 1 : num_backends); + // the mask is used in the runtime index calculation to find the offset of + // the backend. For non-per-backend functionalities, this offset should + // always be 0. Otherwise, we need to get the index of the backend (which we + // can do using a backend mask). + auto next_mask = isPerBackendFunctionalityKey(k) ? full_backend_mask : 0; + offsets_and_masks[functionality_idx] = + FunctionalityOffsetAndMask(next_offset, next_mask); + } + // Sanity check that the computed offset index of the last functionality key + // is correct. This assumes that the highest priority functionality key is not + // per backend. + TORCH_INTERNAL_ASSERT( + offsets_and_masks[num_functionality_keys - 1].offset == + (num_runtime_entries - 1), + "num_runtime_entries: ", + num_runtime_entries, + "last_offset: ", + offsets_and_masks[num_functionality_keys - 1].offset); + return offsets_and_masks; +} + } // namespace c10 diff --git a/c10/core/DispatchKeySet.h b/c10/core/DispatchKeySet.h index 79d39652219b51..0e631061411dd0 100644 --- a/c10/core/DispatchKeySet.h +++ b/c10/core/DispatchKeySet.h @@ -1,5 +1,4 @@ #pragma once - #include #include #include @@ -8,29 +7,147 @@ namespace c10 { +struct FunctionalityOffsetAndMask { + // empty constructor shouldn't be used; only needed to initialize + // the array before populating it. + FunctionalityOffsetAndMask() {} + FunctionalityOffsetAndMask(uint16_t offset, uint16_t mask) + : offset(offset), mask(mask) {} + // This needs to big enough to cover the size of the operator table. + uint16_t offset; + // See Note [No More Than 16 Backends] + // This mask needs to be big enough to mask all of the backend bits. + // We probably don't ever want to have more than 16 backend bits, so uint16_t + // should be enough. + uint16_t mask; +}; +static_assert( + c10::num_runtime_entries < 65536, + "The dispatcher currently only supports up to 2^16 runtime entries"); + +C10_API std::array +initializeFunctionalityOffsetsAndMasks(); + +C10_ALWAYS_INLINE static const std:: + array& + offsetsAndMasks() { + static auto offsets_and_masks_ = initializeFunctionalityOffsetsAndMasks(); + return offsets_and_masks_; +} + +// A representation of a set of DispatchKeys. A DispatchKeySet contains both +// "functionality" bits and "backend bits", and every tensor holds its own +// DispatchKeySet. The Dispatcher implements multiple dispatch by grabbing the +// keyset on every input tensor, or’ing them together, and dispatching to a +// specific piece of functionality. The functionality bits are *ordered*. When +// multiple functionality bits are set, we use the highest priority +// functionality. Similarly, multiple backend bits can theoretically be set if +// you call an operator with multiple tensors from difference devices (e.g. CPU +// and CUDA), although support for mixed device dispatch is limited (the only +// kernels that gracefully handle mixed device inputs for now are cuda kernels +// that take in a scalar cpu tensor). + // A representation of a set of DispatchKeys. 
A tensor may have multiple // tensor type ids, e.g., a Variable tensor can also be a CPU tensor; the // DispatchKeySet specifies what type ids apply. The internal representation is // as a 64-bit bit set (this means only 64 tensor type ids are supported). // -// Note that DispatchKeys are ordered; thus, we can ask questions like "what is -// the highest priority DispatchKey in the set"? (The set itself is not -// ordered; two sets with the same ids will always have the ids ordered in the -// same way.) +// As mentioned above, DispatchKeys are ordered; thus, we can ask questions like +// "what is the highest priority DispatchKey in the set"? (The set itself is +// not ordered; two sets with the same ids will always have the ids ordered in +// the same way.) +// +// Note [DispatchKeySet Internal Representation] +// Internally, dispatch keys are packed into 64-bit DispatchKeySet objects +// that get passed around at runtime. +// However, there isn't necessarily a 1-to-1 mapping between bits in the keyset +// and individual dispatch keys. +// +// First: why do we have this distinction, and why not map every dispatch key +// directly to a bit? This is mostly because we have several types of +// functionalities that different backends would like to customize. For example, +// we have: +// - "Dense": CPU, CUDA, XLA, ... (~12 keys) +// - "Sparse": SparseCPU, SparseCUDA, ... +// - "Quantized": QuantizedCPU, QuantizedCUDA, QuantizedXLA, ... +// - "Autograd": AutogradCPU, AutogradCUDA, Autograd XLA, ... +// The problem is that total number of keys grows quadratically with [# +// backends] x [# functionalities], making it very difficult to map each key +// directly to a bit in a bitset without dramatically increasing the size of the +// bitset over time. +// +// The two enums (BackendComponent and DispatchKey) can be divided roughly into +// 5 categories. +// +// (1) "Building block" keys +// (a) backends: jEverything in the BackendComponent enum (e.g. CPUBit, +// CUDABIt) (b) functionalities: (per-backend) functionality-bit DispatchKeys +// (e.g. AutogradFunctionality, Sparse, Dense) +// (2) "Runtime" keys +// (a) "non-customizable backends" (e.g. FPGA) +// (b) "non-customizable functionalities" (e.g. Functionalize) +// (c) "per-backend instances of customizable functionalities" (e.g. CPU, +// SparseCPU, AutogradCPU) +// (3) "Alias" DispatchKeys (see Note [Alias Dispatch Keys]) +// +// (1) Building block keys always correspond to individual bits in a +// DispatchKeySet. They can also be combined in a DispatchKeySet to form actual +// runtime keys. e.g. +// auto dense_cpu_ks = DispatchKeySet({DispatchKey::CPUBit, +// DispatchKey::Dense}); +// // The keyset has the runtime dense-cpu key. +// dense_cpu_ks.has(DispatchKey::CPU); +// // And it contains the building block keys too. +// dense_cpu_ks.has(DispatchKey::CPUBit); +// dense_cpu_ks.has(DispatchKey::Dense); +// +// Not every backend and not every functionality counts as a "building block +// key". This is mostly to give us more levers to pull in the design space. +// Backend keys and functionality keys that count as "building blocks" will +// contribute to a full cross product of functionality that can be overriden. // -// At the moment, there are no nontrivial uses of this set; tensors are always -// singletons. In the near future, this set will represent variable? + tensor -// type id. In the far future, it will be requires grad? + profiling? + -// tracing? + lazy? + tensor type id. 
+// For example, right now we have at least 12 "backend" building blocks (CPU, +// CUDA, XLA, ...) and at least 4 "functionality" building blocks (Dense, +// Sparse, Quantized, AutogradFunctionality, ...). These keys together allow +// every dispatcher operator to be customized in up to 12*4 different ways. Each +// of those requires a slot in the operator table of every dispatcher operator. +// Not every piece of functionality necessarily needs to be customizeable +// per-backend, and not every backend necessarily needs to be able to customize +// every type of functionality. // -// (The difference between variable and requires grad, is that -// there are currently three states a tensor can be: -// 1. Not a variable -// 2. Variable with requires_grad=False -// 3. Variable with requires_grad=True -// Eventually, we want to kill state (1), and only dispatch to autograd -// handling code if one of the inputs requires grad.) // +// (2) Every runtime key corresponds directly to a slot in an operator's runtime +// dispatch table, and you can directly register kernels to a runtime dispatch +// key. +// +// For per-backend functionalities like "Dense" or "AutogradFunctionality", +// you can think of the corresponding runtime dispatch keys as "instances" of +// that functionality, per backend. E.g. "CPU", "CUDA", "XLA", etc. are all +// runtime instances of the "Dense" building block key. + +// (2a) and (2b) are represented identically in the DispatchKeySet logic: +// - backend-agnostic functionalities (e.g. FuncTorchBatched) are NOT +// customizeable per backend. +// In order to do so, we'd need to promote it to a per-backend functionality +// "building block" key. +// - non-customizeable backends (e.g. FPGA) can NOT customize existing +// functionality like Sparse, Autograd, etc. +// In order to do so, we'd need to promote it to a backend "building block" +// key. +// +// In both cases, these keys directly correspond to runtime slots in the +// operator table. +// +// +// (3) "Alias" keys +// See Note [Alias Dispatch Keys] +// +// Final note: for anyone making future changes to the Dispatcher + +// DispatchKeySet internals, there's a closed PR with a basic +// python-implementation of the Dispatcher that might be useful in quickly +// testing out and validating changes. See it at +// https://github.com/pytorch/pytorch/pull/68743 + // An undefined tensor is one with an empty tensor type set. class DispatchKeySet final { public: @@ -41,29 +158,146 @@ class DispatchKeySet final { // NB: default constructor representation as zero is MANDATORY as // use of DispatchKeySet in TLS requires this. constexpr DispatchKeySet() : repr_(0) {} + constexpr DispatchKeySet(Full) - : repr_(std::numeric_limits::max()) {} + : repr_((1ULL << (num_backends + num_functionality_keys - 1)) - 1) {} + constexpr DispatchKeySet(FullAfter, DispatchKey t) // LSB after t are OK, but not t itself. - : repr_((1ULL << (static_cast(t) - 1)) - 1) {} + // "functionalities" have a notion of ordering (e.g. Autograd > Sparse > + // Quantized > Dense). But backends don't really have an ordering. + // Therefore, we're enforcing that FullAfter can only be used on + // "functionality" keys. + : repr_( + (1ULL + << (num_backends + static_cast(toFunctionalityKey(t)) - + 1)) - + 1) {} + // Public version of DispatchKeySet(uint64_t) API; external users // must be explicit when they do this! constexpr DispatchKeySet(Raw, uint64_t x) : repr_(x) {} - explicit constexpr DispatchKeySet(DispatchKey t) - : repr_( - t == DispatchKey::Undefined - ? 
0 - : 1ULL << (static_cast(t) - 1)) {} - explicit constexpr DispatchKeySet(std::initializer_list ks) - : repr_(0) { + + constexpr explicit DispatchKeySet(BackendComponent k) { + if (k == BackendComponent::InvalidBit) { + repr_ = 0; + } else { + repr_ = 1ULL << (static_cast(k) - 1); + } + } + + constexpr explicit DispatchKeySet(DispatchKey k) { + if (k == DispatchKey::Undefined) { + // Case 1: handle Undefined specifically + repr_ = 0; + } else if (k <= DispatchKey::EndOfFunctionalityKeys) { + // Case 2: handle "functionality-only" keys + // These keys have a functionality bit set, but no backend bits + // These can technically be either: + // - valid runtime keys (e.g. DispatchKey::AutogradOther, + // DispatchKey::FuncTorchBatched, etc) + // - "building block" keys that aren't actual runtime keys (e.g. + // DispatchKey::Dense or Sparse) + uint64_t functionality_val = 1ULL + << (num_backends + static_cast(k) - 1); + repr_ = functionality_val; + } else if (k <= DispatchKey::EndOfRuntimeBackendKeys) { + // Case 3: "runtime" keys that have a functionality bit AND a backend bit. + // First compute which bit to flip for the functionality. + auto functionality_k = toFunctionalityKey(k); + // The - 1 is because Undefined is technically a "functionality" that + // doesn't show up in the bitset. So e.g. Dense is technically the second + // functionality, but the lowest functionality bit. + uint64_t functionality_val = 1ULL + << (num_backends + static_cast(functionality_k) - 1); + + // then compute which bit to flip for the backend + // Case 4a: handle the runtime instances of "per-backend functionality" + // keys For example, given DispatchKey::CPU, we should set: + // - the Dense functionality bit + // - the CPUBit backend bit + // first compute which bit to flip for the backend + auto backend_k = toBackendComponent(k); + uint64_t backend_val = backend_k == BackendComponent::InvalidBit + ? 0 + : 1ULL << (static_cast(backend_k) - 1); + repr_ = functionality_val + backend_val; + } else { + // At this point, we should have covered every case except for alias keys. + // Technically it would be possible to add alias dispatch keys to a + // DispatchKeySet, but the semantics are a little confusing and this + // currently isn't needed anywhere. + repr_ = 0; + } + } + + constexpr uint64_t keys_to_repr(std::initializer_list ks) { + uint64_t repr = 0; for (auto k : ks) { - repr_ |= DispatchKeySet(k).repr_; + repr |= DispatchKeySet(k).repr_; } + return repr; } + + constexpr uint64_t backend_bits_to_repr( + std::initializer_list ks) { + uint64_t repr = 0; + for (auto k : ks) { + repr |= DispatchKeySet(k).repr_; + } + return repr; + } + + explicit constexpr DispatchKeySet(std::initializer_list ks) + : repr_(keys_to_repr(ks)) {} + + explicit constexpr DispatchKeySet(std::initializer_list ks) + // Note: for some reason, putting this logic directly in the constructor + // appears to fail to compile on CUDA 10.1. 
+ // See an example internal failure at + // https://www.internalfb.com/intern/skycastle/run/76561193669136035/artifact/actionlog.76561193742069401.stderr + : repr_(backend_bits_to_repr(ks)) {} + // Test if a DispatchKey is in the set - bool inline has(DispatchKey t) const { + inline bool has(DispatchKey t) const { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(t != DispatchKey::Undefined); - return static_cast(repr_ & DispatchKeySet(t).repr_); + return has_all(DispatchKeySet(t)); + } + constexpr bool has_backend(BackendComponent t) const { + return has_all(DispatchKeySet(t)); + } + + // Test if a DispatchKey is in the set + // Given a DispatchKeySet of functionality keys and (potentially) backend + // keys, tests if all of them are in the current set. + constexpr bool has_all(DispatchKeySet ks) const { + return static_cast((repr_ & ks.repr_) == ks.repr_); + } + + // Given a DispatchKeySet of functionality keys and (potentially) backend + // keys, tests if any of them are in the current set. This could technically + // be pretty easily implemented using has(). It is strictly a perf + // optimization though. There are many places in the code base where we want + // to test for multiple functionality keys together. HOWEVER, runtime + // per-backend functionality keys aren't allowed to be used with this + // function, because you can end up with weird results. e.g. + // DispatchKeySet(DispatchKey::AutogradCPU).has_any(DispatchKeySet(DispatchKey::CPU)) + // would return true. + inline bool has_any(DispatchKeySet ks) const { + TORCH_INTERNAL_ASSERT_DEBUG_ONLY( + // Either there are no backend bits in the input keyset + ((ks.repr_ & full_backend_mask) == 0) || + // or there are no per-backend-functionality bits + // See [Note: Per-Backend Functionality Dispatch Keys] + ((ks & + DispatchKeySet({ + DispatchKey::Dense, + DispatchKey::Quantized, + DispatchKey::Sparse, + DispatchKey::AutogradFunctionality, + }) + .repr_) == 0)); + return static_cast((repr_ & ks.repr_) != 0); } // Test if DispatchKeySet is a superset of ks. bool isSupersetOf(DispatchKeySet ks) const { @@ -74,31 +308,64 @@ class DispatchKeySet final { return DispatchKeySet(repr_ | other.repr_); } // Perform set intersection - DispatchKeySet operator&(DispatchKeySet other) const { + constexpr DispatchKeySet operator&(DispatchKeySet other) const { return DispatchKeySet(repr_ & other.repr_); } - // Compute the set difference self - other + // Compute the set difference self - other, + // but ONLY for the functionality keys. + // Any backend bits set on self will remain unchanged. + // See Note [Removing keys from DispatchKeySet Only Affects Functionality + // Keys] DispatchKeySet operator-(DispatchKeySet other) const { - return DispatchKeySet(repr_ & ~other.repr_); + return DispatchKeySet(repr_ & (full_backend_mask | ~other.repr_)); } + // Compute self ^ other constexpr DispatchKeySet operator^(DispatchKeySet other) const { return DispatchKeySet(repr_ ^ other.repr_); } - // Perform set equality bool operator==(DispatchKeySet other) const { return repr_ == other.repr_; } + bool operator!=(DispatchKeySet other) const { + return repr_ != other.repr_; + } // Add a DispatchKey to the DispatchKey set. Does NOT mutate, // returns the extended DispatchKeySet! C10_NODISCARD DispatchKeySet add(DispatchKey t) const { return *this | DispatchKeySet(t); } - // Remove a DispatchKey from the DispatchKey set. 
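// Illustrative sketch of the has_all / has_any semantics above (assumes the
// c10 headers in this patch): has_all needs every queried bit, has_any needs
// one. Runtime per-backend keys such as DispatchKey::CPU must not be passed
// to has_any, because a backend bit shared with another key (the documented
// AutogradCPU / CPU case) would give a misleading answer.
#include <c10/core/DispatchKeySet.h>
#include <cassert>

void membership_example() {
  c10::DispatchKeySet ks(
      {c10::DispatchKey::CPU, c10::DispatchKey::AutogradCUDA});

  // has_all: every bit of the argument must be present.
  assert(ks.has_all(c10::DispatchKeySet(c10::DispatchKey::Dense)));
  assert(!ks.has_all(c10::DispatchKeySet(
      {c10::DispatchKey::Dense, c10::DispatchKey::Sparse})));

  // has_any: one matching bit suffices. Functionality building blocks are
  // fine to query with, since they carry no backend bits.
  assert(ks.has_any(c10::DispatchKeySet(
      {c10::DispatchKey::Sparse, c10::DispatchKey::AutogradFunctionality})));
}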
This is - // generally not an operation you should be doing (it's - // used to implement operator<<) - C10_NODISCARD constexpr DispatchKeySet remove(DispatchKey t) const { - return DispatchKeySet(repr_ & ~DispatchKeySet(t).repr_); + C10_NODISCARD DispatchKeySet add(DispatchKeySet ks) const { + return *this | ks; + } + + // Remove a DispatchKey from the DispatchKey set. + // This is generally not an operation you should be doing + // (it's used to implement the printing overload, operator<<) + // + // Note [Removing keys from DispatchKeySet Only Affects Functionality Keys] + // Only functionality bits are allowed to be removed from a keyset. + // For now, we're only allowing removal of "functionality bits" from the + // keyset, which is specifically needed by the fallthrough key calculation + // logic. Why is removing backend bits problematic? Consider this example: + // + // DispatchKeySet([DispatchKey.CPU, DispatchKey.AutogradCUDA, + // DispatchKey.CUDA]).remove(DispatchKey.AutogradCUDA) + // DispatchKeySet([DispatchKey.CPU, + // DispatchKey.AutogradCUDA]).remove(DispatchKey.AutogradCUDA) + // + // What do we want to happen? + // Technically, we'd like it to be true that after removal, + // the first keyset still has the CUDA dispatch key while the second doesn't. + // Unfortunately there's no way to represent that, because the two keysets are + // represented the same way internally: functionality bits: Autograd, Dense + // backend bits: CPU, CUDA + // + // Instead, remove(DispatchKey.AutogradCPU) will only remove the "Autograd" + // bit from the bitset. + constexpr DispatchKeySet remove(DispatchKey t) const { + return DispatchKeySet( + repr_ & ~(DispatchKeySet(t).repr_ & ~full_backend_mask)); } // Is the set empty? (AKA undefined tensor) bool empty() const { @@ -107,22 +374,112 @@ class DispatchKeySet final { uint64_t raw_repr() { return repr_; } - // Return the type id in this set with the highest priority (i.e., - // is the largest in the DispatchKey enum). Intuitively, this - // type id is the one that should handle dispatch (assuming there - // aren't any further exclusions or inclusions). + + DispatchKey highestFunctionalityKey() const { + auto functionality_idx = indexOfHighestBit(); + // This means that none of the functionality bits were set. + if (functionality_idx < num_backends) + return DispatchKey::Undefined; + // The first num_backend bits in the keyset don't correspond to real + // dispatch keys. + return static_cast(functionality_idx - num_backends); + } + + // This is similar like toBackendComponent(DispatchKey), but less restrictive. + // toBackendComponent() errors out if the key that it was passed has no + // backend bits, which is useful for error checking. We need a version of that + // here that can also handle "fake" backends like FPGA, because they need to + // map to the AutogradOther key. For those backends, we return + // BackendComponent::InvalidBit. + BackendComponent highestBackendKey() const { + // mask to mask out functionality bits + auto backend_idx = + DispatchKeySet(repr_ & full_backend_mask).indexOfHighestBit(); + // all zeros across the backend bits means that no backend bits are set. + if (backend_idx == 0) + return BackendComponent::InvalidBit; + return static_cast(backend_idx); + } + + // returns the DispatchKey of highest priority in the set. DispatchKey highestPriorityTypeId() const { - // TODO: If I put Undefined as entry 64 and then adjust the - // singleton constructor to shift from the right, we can get rid of the - // subtraction here. 
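// Illustrative sketch of the Note above (assumes the c10 headers in this
// patch): remove() and operator- clear only functionality bits, so the
// backend bits of the original set survive the subtraction.
#include <c10/core/DispatchKeySet.h>
#include <cassert>

void remove_example() {
  c10::DispatchKeySet ks(
      {c10::DispatchKey::CPU, c10::DispatchKey::AutogradCUDA});

  auto after = ks.remove(c10::DispatchKey::AutogradCUDA);
  // The Autograd functionality bit is gone ...
  assert(!after.has(c10::DispatchKey::AutogradCUDA));
  // ... but both backend bits (CPU and CUDA) are still present.
  assert(after.has(c10::DispatchKey::CPU));
  assert(after.has_backend(c10::BackendComponent::CUDABit));
}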
It's modestly more complicated to get right so I - // didn't do it for now. - return static_cast(64 - llvm::countLeadingZeros(repr_)); + auto functionality_k = highestFunctionalityKey(); + if (isPerBackendFunctionalityKey(functionality_k)) { + return toRuntimePerBackendFunctionalityKey( + functionality_k, highestBackendKey()); + } + return functionality_k; + } + + // Returns the index of the most-significant bit in the keyset. + // This is used to as part of the calculation into the operator table to get: + // - the highest "functionality" bit in the keyset. + // - the highest "backend" bit in the keyset. + uint8_t indexOfHighestBit() const { + return 64 - llvm::countLeadingZeros(repr_); } - DispatchKey highestPriorityBackendTypeId() const { - return (*this & - ((1ULL << static_cast(DispatchKey::EndOfBackendKeys)) - 1)) - .highestPriorityTypeId(); +#if defined(C10_MOBILE_TRIM_DISPATCH_KEYS) + // [Note: Trimmed Mobile Dispatch Keys] + /** + * The method below maps the dispatch key in the enum DispatchKey to an + * integer index in the dispatchTable_ array in OperatorEntry. The array + * is trimmed for mobile to reduce peak memory usage since it's + * unnecessary to reserve additional space for dispatch keys that will + * never be used on mobile. + */ + int getDispatchTableIndexForDispatchKeySet() const { + auto dk = highestPriorityTypeId(); + switch (dk) { + case DispatchKey::Undefined: + return 0; + case DispatchKey::CPU: + return 1; + case DispatchKey::QuantizedCPU: + return 2; + case DispatchKey::SparseCPU: + return 3; + case DispatchKey::BackendSelect: + return 4; + case DispatchKey::ADInplaceOrView: + return 5; + case DispatchKey::AutogradOther: + return 6; + case DispatchKey::AutogradCPU: + return 7; + default: + return -1; + } + } +#else + // returns the index in the operator table of highest priority key in the the + // keyset Note that we could in theory implement this using + // highestPriorityTypeId(), but this code is very hotpath and we can do it + // faster without it. + int getDispatchTableIndexForDispatchKeySet() const { + auto functionality_idx = + DispatchKeySet(repr_ >> num_backends).indexOfHighestBit(); + auto offset_and_mask = offsetsAndMasks()[functionality_idx]; + // Mask the functionality bits out first, then right-shift by 1. + // right-shifting by 1 because everything is zero-indexed. + // E.g. 000001 (CPU) should give us an offset of 0, 000010 (CUDA) should + // give us an offset of 1, etc. + auto backend_idx = + DispatchKeySet((repr_ & offset_and_mask.mask) >> 1).indexOfHighestBit(); + return offset_and_mask.offset + backend_idx; + } +#endif + + // returns the "index" of the highest priority backend in the keyset. + // This is pretty similar to getBackendKey(), but: + // - It's hotpath code (part of the runtime bitset calculation) + // - I's returns an integer index, not an enum value + // - Everything is shifted to the right by 1. + // BackendComponent::InvalidBit is technically the lowest enum value, + // but it isn't included in the runtime table. So CPUBit = 1, CUDABit = 2, + // etc. + uint64_t getBackendIndex() const { + return DispatchKeySet((repr_ & full_backend_mask) >> 1).indexOfHighestBit(); } private: @@ -130,42 +487,53 @@ class DispatchKeySet final { uint64_t repr_ = 0; public: - // STL iterator for DispatchKeySet. Iterates through all DispatchKeys in the - // set. The iterator is only invalidated by the destruction of the underlying - // DispatchKeySet as the iterator stores a pointer to the raw representation - // of the DispatchKeySet. 
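// Illustrative sketch of the offset-and-mask indexing used by
// getDispatchTableIndexForDispatchKeySet (all counts, names and the layout
// below are assumptions): per-backend functionalities own a contiguous block
// of operator-table slots, one per backend, and backend-agnostic
// functionalities own a single slot.
#include <cstdint>
#include <cassert>

struct OffsetAndMask {
  int offset;     // first operator-table slot for this functionality
  uint64_t mask;  // backend bits that select a slot within the block
};

// Assumed layout with 3 backends: Dense -> slots 0..2, Sparse -> slots 3..5,
// a backend-agnostic functionality -> single slot 6.
constexpr OffsetAndMask kTable[] = {
    {0, 0b111},  // Dense (per-backend)
    {3, 0b111},  // Sparse (per-backend)
    {6, 0b000},  // backend-agnostic
};

int tableIndex(int functionality_idx, uint64_t backend_bits) {
  OffsetAndMask om = kTable[functionality_idx];
  uint64_t masked = backend_bits & om.mask;
  int backend_idx = 0;  // index of the highest backend bit that survives
  while (masked >>= 1) {
    ++backend_idx;
  }
  return om.offset + backend_idx;
}

int main() {
  assert(tableIndex(/*Dense*/ 0, /*2nd backend*/ 0b010) == 1);
  assert(tableIndex(/*Sparse*/ 1, /*3rd backend*/ 0b100) == 5);
  assert(tableIndex(/*agnostic*/ 2, /*no backends*/ 0) == 6);
}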
+ // STL iterator for DispatchKeySet. Iterates through all runtime DispatchKeys + // in the set. The iterator is only invalidated by the destruction of the + // underlying DispatchKeySet as the iterator stores a pointer to the raw + // representation of the DispatchKeySet. Note: When we encounter a per-backend + // functionality (e.g. Dense or Sparse), we will iterate through EVERY backend + // in the keyset, for that functionality. For example, if the next + // functionality key to iterate over is Autograd, and the backend bits in the + // keyset correspond to [BackendComponent::CPUBit, BackendComponent::CUDABit], + // then the next two keys we return will be DispatchKey::AutogradCPU, + // DispatchKey::AutogradCUDA (CPU first because it has lower precedence than + // CUDA in DispatchKey.h). class iterator { public: using self_type = iterator; using iterator_category = std::input_iterator_tag; using value_type = DispatchKey; using difference_type = ptrdiff_t; - - explicit iterator(const uint64_t* data_ptr, uint8_t i = 0) - : data_ptr_(data_ptr), i_(i) { + // final mask value should mask out the entire keyset + static const uint8_t end_iter_mask_val = + num_backends + num_functionality_keys; + // final key value should be the last DispatchKey + static const uint8_t end_iter_key_val = num_functionality_keys; + + // current_dispatchkey_idx_ will iterate through all functionality bits. + // current_backendcomponent_idx_ will iterate through all backend bits. + explicit iterator( + const uint64_t* data_ptr, + uint8_t next_functionality = num_backends, + uint8_t next_backend = 0) + : data_ptr_(data_ptr), + next_functionality_(next_functionality), + next_backend_(next_backend), + // These are in an invalid state at construction time, and set by the + // first increment call + current_dispatchkey_idx_(end_iter_key_val), + current_backendcomponent_idx_(end_iter_key_val) { // Go to the first key in the set + TORCH_INTERNAL_ASSERT( + next_functionality_ >= num_backends, + "num_backends=", + static_cast(num_backends), + "next_functionality_=", + static_cast(next_functionality_)); ++(*this); } - self_type& operator++() { - TORCH_INTERNAL_ASSERT( - i_ <= static_cast(DispatchKey::NumDispatchKeys)); - - // Create a masked version of the set representation to ignore previous - // keys that we've iterated through. 
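// Illustrative sketch of the iteration behavior described above (assumes the
// c10 headers in this patch): a per-backend functionality expands into one
// runtime key per backend bit, lower-precedence backends first, while
// backend-agnostic keys are produced once.
#include <c10/core/DispatchKeySet.h>
#include <iostream>

void iteration_example() {
  c10::DispatchKeySet ks({c10::DispatchKey::CPU,
                          c10::DispatchKey::CUDA,
                          c10::DispatchKey::FuncTorchBatched});
  // Expected output:
  //   CPU
  //   CUDA
  //   FuncTorchBatched
  for (c10::DispatchKey k : ks) {
    std::cout << c10::toString(k) << "\n";
  }
}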
- uint64_t masked_data = llvm::maskTrailingZeros(i_) & *data_ptr_; - uint64_t firstKeyIndex = llvm::findFirstSet(masked_data); - - // If there are no keys, set to end iterator value - if (firstKeyIndex == std::numeric_limits::max() || - i_ == static_cast(DispatchKey::NumDispatchKeys)) { - i_ = static_cast(DispatchKey::NumDispatchKeys); - return *this; - } - - i_ = static_cast(firstKeyIndex) + 1; - return *this; - } + C10_API self_type& operator++(); self_type operator++(int) { self_type previous_iterator = *this; @@ -174,18 +542,50 @@ class DispatchKeySet final { } bool operator==(const self_type& rhs) const { - return i_ == rhs.i_; + return next_functionality_ == rhs.next_functionality_ && + current_dispatchkey_idx_ == rhs.current_dispatchkey_idx_ && + next_backend_ == rhs.next_backend_ && + current_backendcomponent_idx_ == rhs.current_backendcomponent_idx_; } bool operator!=(const self_type& rhs) const { - return i_ != rhs.i_; + return next_functionality_ != rhs.next_functionality_ || + current_dispatchkey_idx_ != rhs.current_dispatchkey_idx_ || + next_backend_ != rhs.next_backend_ || + current_backendcomponent_idx_ != rhs.current_backendcomponent_idx_; } DispatchKey operator*() const { - return static_cast(i_); + auto functionality_key = + static_cast(current_dispatchkey_idx_); + if (isPerBackendFunctionalityKey(functionality_key)) { + auto next_key = toRuntimePerBackendFunctionalityKey( + functionality_key, + static_cast(current_backendcomponent_idx_)); + // We expect all of the Dense, Sparse, Quantized, and Autograd keys to + // be ordered the same way with respect to their backends + TORCH_INTERNAL_ASSERT( + toBackendComponent(next_key) == + static_cast(current_backendcomponent_idx_), + "Tried to map functionality key ", + toString(functionality_key), + " and backend bit ", + toString( + static_cast(current_backendcomponent_idx_)), + " to a runtime key, but ended up with ", + toString(next_key), + ". This can happen if the order of the backend dispatch keys in DispatchKey.h isn't consistent.", + " Please double check that enum for inconsistencies."); + return next_key; + } else { + return functionality_key; + } } private: const uint64_t* data_ptr_; - uint8_t i_; + uint8_t next_functionality_; + uint8_t next_backend_; + uint8_t current_dispatchkey_idx_; + uint8_t current_backendcomponent_idx_; }; public: @@ -195,31 +595,35 @@ class DispatchKeySet final { return iterator(&repr_); } - // We do not need to iterate beyond NumDispatchKeys so we will treat this as - // the end iterator. NumDispatchKeys will always be strictly less than 64. + // We do not need to iterate beyond EndOfFunctionalityKeys so we will treat + // this as the end iterator. iterator end() const { - return iterator(&repr_, static_cast(DispatchKey::NumDispatchKeys)); + return iterator(&repr_, iterator::end_iter_mask_val); } }; C10_API std::string toString(DispatchKeySet); C10_API std::ostream& operator<<(std::ostream&, DispatchKeySet); -// autograd_dispatch_keyset should include all runtime autograd keys. -// Alias key DispatchKey::Autograd maps to autograd_dispatch_keyset. 
+C10_API inline int getDispatchTableIndexForDispatchKey(DispatchKey k) { + return DispatchKeySet(k).getDispatchTableIndexForDispatchKeySet(); +} + +// Alias key DispatchKey::Autograd maps to +// (autograd_dispatch_keyset x full_backend_mask) // NB: keys in this set also get associated with CompositeImplicitAutograd +// +// Note [autograd_dispatch_keyset Does Not Include Backend Bits] +// We don't want to include any backend bits (BackendComponent::CPUBit, etc) +// directly in autograd_dispatch_keyset. +// Why? keysets like autograd_dispatch_keyset are commonly used to remove +// autograd keys from a DispatchKeySet throughout the code base. However, you +// are only allowed to remove functionality bits from a keyset, not backend +// bits. See Note [Removing keys from DispatchKeySet Only Affects Functionality +// Keys] for details. To be consistent and avoid confusion, we're explicitly +// setting up autograd_dispatch_keyset to not have any backend bits. constexpr DispatchKeySet autograd_dispatch_keyset = DispatchKeySet({ - DispatchKey::AutogradCPU, - DispatchKey::AutogradCUDA, - DispatchKey::AutogradXLA, - DispatchKey::AutogradLazy, - DispatchKey::AutogradNestedTensor, - DispatchKey::AutogradMLC, - DispatchKey::AutogradHPU, - DispatchKey::AutogradXPU, - DispatchKey::AutogradPrivateUse1, - DispatchKey::AutogradPrivateUse2, - DispatchKey::AutogradPrivateUse3, + DispatchKey::AutogradFunctionality, DispatchKey::AutogradOther, }); @@ -242,27 +646,42 @@ constexpr DispatchKeySet default_excluded_set = DispatchKeySet({ constexpr DispatchKeySet autograd_dispatch_keyset_with_ADInplaceOrView = autograd_dispatch_keyset | DispatchKeySet(DispatchKey::ADInplaceOrView); +constexpr DispatchKeySet python_ks = DispatchKeySet({ + DispatchKey::Python, + DispatchKey::PythonTLSSnapshot, +}); + +constexpr DispatchKeySet sparse_ks = DispatchKeySet(DispatchKey::Sparse); + +constexpr DispatchKeySet sparse_csr_ks = + DispatchKeySet({DispatchKey::SparseCsrCPU, DispatchKey::SparseCsrCUDA}); + +constexpr DispatchKeySet mkldnn_ks = DispatchKeySet(DispatchKey::MkldnnCPU); + // backend dispatch keys that map to DispatchKey::AutogradOther // NB: keys in this set also get associated with CompositeImplicitAutograd -constexpr DispatchKeySet autogradother_backends = DispatchKeySet( - {DispatchKey::HIP, - DispatchKey::VE, - DispatchKey::FPGA, - DispatchKey::ORT, - DispatchKey::Vulkan, - DispatchKey::Metal, - DispatchKey::QuantizedCPU, - DispatchKey::QuantizedCUDA, - DispatchKey::CustomRNGKeyId, - DispatchKey::MkldnnCPU, - DispatchKey::SparseCPU, - DispatchKey::SparseCUDA, - DispatchKey::SparseHIP, - DispatchKey::SparseVE, - DispatchKey::SparseXPU, - DispatchKey::SparseCsrCPU, - DispatchKey::SparseCsrCUDA, - DispatchKey::Meta}); +constexpr DispatchKeySet autogradother_backends = + DispatchKeySet( + // HIP and VE aren't in this list: they now have their own backend bits + // which means that they can now have their own Autograd keys. + // Technically, HIP will now redispatch to its own custom AutogradHIP + // slot in the runtime table. + {DispatchKey::FPGA, + DispatchKey::ORT, + DispatchKey::Vulkan, + DispatchKey::Metal, + DispatchKey::SparseCsrCPU, + DispatchKey::SparseCsrCUDA, + DispatchKey::CustomRNGKeyId, + DispatchKey::MkldnnCPU, + DispatchKey::Meta, + // Sparse and Quantized backends also live here. + DispatchKey::Sparse, + DispatchKey::Quantized}) + // Including the backend bits because this keyset is used during op + // registration, which requires looping over all runtime autogradother + // backend keys. 
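// Illustrative sketch of the Note above (assumes the c10 headers in this
// patch): because autograd_dispatch_keyset holds only functionality bits,
// subtracting it strips autograd handling without touching the backend bits.
#include <c10/core/DispatchKeySet.h>
#include <cassert>

void strip_autograd_example() {
  c10::DispatchKeySet ks(
      {c10::DispatchKey::CPU, c10::DispatchKey::AutogradCPU});
  auto no_autograd = ks - c10::autograd_dispatch_keyset;
  assert(no_autograd.has(c10::DispatchKey::CPU));           // backend intact
  assert(!no_autograd.has(c10::DispatchKey::AutogradCPU));  // autograd gone
}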
+ | DispatchKeySet(DispatchKeySet::RAW, full_backend_mask); // The set of dispatch keys that come after autograd // n.b. this relies on the fact that AutogradOther is currently the lowest @@ -292,6 +711,48 @@ constexpr DispatchKeySet after_func_keyset = // away with it by explicitly removing the key here. c10::DispatchKey::ADInplaceOrView); +constexpr DispatchKeySet backend_bitset_mask = + DispatchKeySet(DispatchKeySet::RAW, (1ULL << num_backends) - 1); + +constexpr auto inplace_or_view_ks = + DispatchKeySet(DispatchKey::ADInplaceOrView); +constexpr auto autograd_cpu_ks = DispatchKeySet(DispatchKey::AutogradCPU); +constexpr auto autograd_ipu_ks = DispatchKeySet(DispatchKey::AutogradIPU); +constexpr auto autograd_xpu_ks = DispatchKeySet(DispatchKey::AutogradXPU); +constexpr auto autograd_cuda_ks = DispatchKeySet(DispatchKey::AutogradCUDA); +constexpr auto autograd_xla_ks = DispatchKeySet(DispatchKey::AutogradXLA); +constexpr auto autograd_lazy_ks = DispatchKeySet(DispatchKey::AutogradLazy); +constexpr auto autograd_mlc_ks = DispatchKeySet(DispatchKey::AutogradMLC); +constexpr auto autograd_hpu_ks = DispatchKeySet(DispatchKey::AutogradHPU); +constexpr auto autograd_privateuse1_ks = + DispatchKeySet(DispatchKey::AutogradPrivateUse1); +constexpr auto autograd_privateuse2_ks = + DispatchKeySet(DispatchKey::AutogradPrivateUse2); +constexpr auto autograd_privateuse3_ks = + DispatchKeySet(DispatchKey::AutogradPrivateUse3); +constexpr auto autograd_other_ks = DispatchKeySet(DispatchKey::AutogradOther); + +// This keyset has: +// (1) the functionality bits corresponding to backends (dense, sparse, +// quantized) (2) all of the backend bits set +constexpr DispatchKeySet backend_functionality_keys = + DispatchKeySet({ + DispatchKey::Dense, + DispatchKey::Quantized, + DispatchKey::Sparse, + }) | + DispatchKeySet(DispatchKeySet::RAW, full_backend_mask); + +struct OpTableOffsetAndMask { + uint16_t offset; + uint16_t backend_mask; +}; + +static_assert( + num_backends <= 16, + "Right now we expect the number of backends not to exceed 16. In the (unlikely) event" + " that this changes, the size of OpTableOffsetAndMask::backend_mask needs to be increased too."); + // true if t is a backend dispatch key C10_API bool isBackendDispatchKey(DispatchKey t); @@ -307,10 +768,62 @@ C10_API bool runtimeDispatchKeySetHas(DispatchKey t, DispatchKey k); C10_API DispatchKeySet getBackendKeySetFromAutograd(DispatchKey t); // Returns a DispatchKeySet of autograd related keys mapped to backend. -C10_API DispatchKeySet getAutogradRelatedKeySetFromBackend(DispatchKey t); +// for a given backend key, use the associated autograd key. +// for non-backend keys, use AutogradOther as a default. +// Note: it's convenient and fast to return a default here rather than (say) +// returning an optional, or throwing. But it makes callers +// responsible for either a) enforcing the invariant that only backend keys +// be passed as arguments, or b) interpreting our return value carefully. 
+inline DispatchKeySet getAutogradRelatedKeySetFromBackend(BackendComponent t) { + switch (t) { + case BackendComponent::CPUBit: + return inplace_or_view_ks | autograd_cpu_ks; + case BackendComponent::IPUBit: + return inplace_or_view_ks | autograd_ipu_ks; + case BackendComponent::XPUBit: + return inplace_or_view_ks | autograd_xpu_ks; + case BackendComponent::CUDABit: + return inplace_or_view_ks | autograd_cuda_ks; + case BackendComponent::XLABit: + return inplace_or_view_ks | autograd_xla_ks; + case BackendComponent::LazyBit: + return inplace_or_view_ks | autograd_lazy_ks; + case BackendComponent::MLCBit: + return inplace_or_view_ks | autograd_mlc_ks; + case BackendComponent::HPUBit: + return inplace_or_view_ks | autograd_hpu_ks; + case BackendComponent::PrivateUse1Bit: + return inplace_or_view_ks | autograd_privateuse1_ks; + case BackendComponent::PrivateUse2Bit: + return inplace_or_view_ks | autograd_privateuse2_ks; + case BackendComponent::PrivateUse3Bit: + return inplace_or_view_ks | autograd_privateuse3_ks; + default: + return inplace_or_view_ks | autograd_other_ks; + } +} // Returns a DispatchKeySet of autocast related keys mapped to backend. -C10_API DispatchKeySet getAutocastRelatedKeySetFromBackend(DispatchKey t); +inline DispatchKeySet getAutocastRelatedKeySetFromBackend(BackendComponent t) { + constexpr auto autocast_cpu_ks = DispatchKeySet(DispatchKey::AutocastCPU); + constexpr auto autocast_cuda_ks = DispatchKeySet(DispatchKey::AutocastCUDA); + switch (t) { + case BackendComponent::CPUBit: + return autocast_cpu_ks; + case BackendComponent::CUDABit: + case BackendComponent::XLABit: + return autocast_cuda_ks; + default: + return DispatchKeySet(); + } +} + +// returns the "backend" DispatchKey of highest priority in the set. +// This is basically like highestBackendKey(), except that we have some +// "functionality" bits that correspond to backends (Sparse, Quantized) +inline DispatchKey highestPriorityBackendTypeId(DispatchKeySet ks) { + return (ks & backend_functionality_keys).highestPriorityTypeId(); +} // This API exists because we have a use case for checking // getRuntimeDispatchKeySet(alias).has(DispatchKey::Undefined) @@ -329,7 +842,8 @@ static inline DispatchKey legacyExtractDispatchKey(DispatchKeySet s) { // here. 
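// Illustrative usage sketch (assumes the c10 headers in this patch): how the
// two helpers above are used when assembling a tensor's key set from its
// backend, and how a backend without its own backend bit (e.g. FPGA) falls
// back to AutogradOther.
#include <c10/core/DispatchKeySet.h>
#include <cassert>

void backend_helpers_example() {
  c10::DispatchKeySet ks(c10::DispatchKey::CPU);
  auto backend = ks.highestBackendKey();  // BackendComponent::CPUBit

  auto with_autograd = ks | c10::getAutogradRelatedKeySetFromBackend(backend);
  assert(with_autograd.has(c10::DispatchKey::AutogradCPU));
  assert(with_autograd.has(c10::DispatchKey::ADInplaceOrView));

  assert(c10::getAutocastRelatedKeySetFromBackend(backend)
             .has(c10::DispatchKey::AutocastCPU));

  // FPGA carries no backend bit, so its autograd handling is AutogradOther.
  c10::DispatchKeySet fpga(c10::DispatchKey::FPGA);
  assert(c10::getAutogradRelatedKeySetFromBackend(fpga.highestBackendKey())
             .has(c10::DispatchKey::AutogradOther));
}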
At the moment, autograd keys and ADInplaceOrView key need this // treatment; return (s - autograd_dispatch_keyset_with_ADInplaceOrView - - autocast_dispatch_keyset) + autocast_dispatch_keyset - + DispatchKeySet({DispatchKey::PythonTLSSnapshot, DispatchKey::Python})) .highestPriorityTypeId(); } diff --git a/c10/core/QEngine.h b/c10/core/QEngine.h index ac092193d92136..60c21361f15f0d 100644 --- a/c10/core/QEngine.h +++ b/c10/core/QEngine.h @@ -15,11 +15,13 @@ enum class QEngine : uint8_t { NoQEngine = 0, FBGEMM = 1, QNNPACK = 2, + ONEDNN = 3, }; constexpr auto kNoQEngine = QEngine::NoQEngine; constexpr auto kFBGEMM = QEngine::FBGEMM; constexpr auto kQNNPACK = QEngine::QNNPACK; +constexpr auto kONEDNN = QEngine::ONEDNN; inline std::string toString(QEngine qengine) { switch (qengine) { @@ -29,6 +31,8 @@ inline std::string toString(QEngine qengine) { return "FBGEMM"; case kQNNPACK: return "QNNPACK"; + case kONEDNN: + return "ONEDNN"; default: TORCH_CHECK( false, "Unrecognized Quantized Engine: ", static_cast(qengine)); diff --git a/c10/core/SafePyObject.cpp b/c10/core/SafePyObject.cpp new file mode 100644 index 00000000000000..d8c3da49ffb121 --- /dev/null +++ b/c10/core/SafePyObject.cpp @@ -0,0 +1,11 @@ +#include +#include + +namespace c10 { + +PyObject* SafePyObject::ptr(const c10::impl::PyInterpreter* interpreter) const { + TORCH_INTERNAL_ASSERT(interpreter == pyinterpreter_); + return data_; +} + +} // namespace c10 diff --git a/c10/core/SafePyObject.h b/c10/core/SafePyObject.h new file mode 100644 index 00000000000000..13e32da3dc1dfe --- /dev/null +++ b/c10/core/SafePyObject.h @@ -0,0 +1,45 @@ +#pragma once + +#include +#include +#include + +namespace c10 { + +// This is an safe owning holder for a PyObject, akin to pybind11's +// py::object, with two major differences: +// +// - It is in c10/core; i.e., you can use this type in contexts where +// you do not have a libpython dependency +// +// - It is multi-interpreter safe (ala torchdeploy); when you fetch +// the underlying PyObject* you are required to specify what the current +// interpreter context is and we will check that you match it. +// +// It is INVALID to store a reference to a Tensor object in this way; +// you should just use TensorImpl directly in that case! +struct C10_API SafePyObject { + // Steals a reference to data + SafePyObject(PyObject* data, c10::impl::PyInterpreter* pyinterpreter) + : data_(data), pyinterpreter_(pyinterpreter) {} + + // In principle this could be copyable if we add an incref to PyInterpreter + // but for now it's easier to just disallow it. 
+ SafePyObject(SafePyObject const&) = delete; + SafePyObject& operator=(SafePyObject const&) = delete; + + ~SafePyObject() { + pyinterpreter_->decref(data_, /*is_tensor*/ false); + } + + c10::impl::PyInterpreter* pyinterpreter() const { + return pyinterpreter_; + } + PyObject* ptr(const c10::impl::PyInterpreter*) const; + + private: + PyObject* data_; + c10::impl::PyInterpreter* pyinterpreter_; +}; + +} // namespace c10 diff --git a/c10/core/Scalar.h b/c10/core/Scalar.h index 08bf95e1875dab..66d96f69af7782 100644 --- a/c10/core/Scalar.h +++ b/c10/core/Scalar.h @@ -67,7 +67,7 @@ class C10_API Scalar { } // TODO: Support ComplexHalf accessor - AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(DEFINE_ACCESSOR) + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(DEFINE_ACCESSOR) // also support scalar.to(); // Deleted for unsupported types, but specialized below for supported types @@ -201,7 +201,7 @@ using OptionalScalarRef = c10::OptionalRef; inline T Scalar::to() const { \ return to##name(); \ } -AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(DEFINE_TO) +AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(DEFINE_TO) #undef DEFINE_TO } // namespace c10 diff --git a/c10/core/ScalarType.h b/c10/core/ScalarType.h index d805623efe6c14..16553cf0230ace 100644 --- a/c10/core/ScalarType.h +++ b/c10/core/ScalarType.h @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -63,6 +64,21 @@ namespace c10 { _(bool, Bool) \ _(at::BFloat16, BFloat16) +#define AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(_) \ + _(uint8_t, Byte) \ + _(int8_t, Char) \ + _(int16_t, Short) \ + _(int, Int) \ + _(int64_t, Long) \ + _(at::Half, Half) \ + _(float, Float) \ + _(double, Double) \ + _(c10::complex, ComplexHalf) \ + _(c10::complex, ComplexFloat) \ + _(c10::complex, ComplexDouble) \ + _(bool, Bool) \ + _(at::BFloat16, BFloat16) + enum class ScalarType : int8_t { #define DEFINE_ENUM(_1, n) n, AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS(DEFINE_ENUM) diff --git a/c10/core/TensorImpl.cpp b/c10/core/TensorImpl.cpp index fad9dcb6fc3a69..75d0e03255b145 100644 --- a/c10/core/TensorImpl.cpp +++ b/c10/core/TensorImpl.cpp @@ -20,43 +20,6 @@ C10_DEFINE_int64( namespace c10 { -namespace impl { - -static std::string noop_name_fn(const PyInterpreter*) { - return ""; -} - -static void noop_decref_fn(const PyInterpreter*, PyObject*, bool) { - // no-op -} - -static c10::intrusive_ptr noop_detach_fn( - const PyInterpreter*, - const TensorImpl*) { - TORCH_INTERNAL_ASSERT( - 0, - "attempted to detach (shallow_copy_and_detach) Tensor with nontrivial PyObject after corresponding interpreter died"); -} - -static void noop_dispatch_fn( - const PyInterpreter*, - const c10::OperatorHandle& op, - torch::jit::Stack* stack, - const std::shared_ptr& type) { - TORCH_INTERNAL_ASSERT( - 0, - "attempted to dispatch (__torch_dispatch__) an operator on Tensor with nontrivial PyObject after corresponding interpreter died"); -} - -void PyInterpreter::disarm() noexcept { - name_fn_ = &noop_name_fn; - decref_fn_ = &noop_decref_fn; - detach_fn_ = &noop_detach_fn; - dispatch_fn_ = &noop_dispatch_fn; -} - -} // namespace impl - const char* const TensorImpl::err_msg_tensor_metadata_change_not_allowed = "is not allowed on a Tensor created from .data or .detach().\n" "If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)\n" @@ -148,10 +111,7 @@ TensorImpl::TensorImpl( numel_(0), data_type_(data_type), device_opt_(storage_.device()), - key_set_( - key_set.remove(DispatchKey::Python) - 
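// Illustrative sketch of the X-macro pattern used by
// AT_FORALL_SCALAR_TYPES_WITH_COMPLEX above (toy names, not the real list):
// a caller passes a two-argument macro and the list stamps it out once per
// (C++ type, enum name) pair, so adding an entry such as ComplexHalf updates
// every expansion site at once.
#include <cstdint>
#include <string>

#define TOY_FORALL_TYPES(_) \
  _(uint8_t, Byte)          \
  _(float, Float)           \
  _(double, Double)

enum class ToyScalarType : int8_t {
#define TOY_DEFINE_ENUM(cpp_type, name) name,
  TOY_FORALL_TYPES(TOY_DEFINE_ENUM)
#undef TOY_DEFINE_ENUM
};

inline std::string toyToString(ToyScalarType t) {
  switch (t) {
#define TOY_CASE(cpp_type, name) \
  case ToyScalarType::name:      \
    return #name;
    TOY_FORALL_TYPES(TOY_CASE)
#undef TOY_CASE
  }
  return "Unknown";
}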
.remove(DispatchKey::PythonTLSSnapshot)) { // See [Note: Python - // key removal] + key_set_(key_set - c10::python_ks) { // See [Note: Python key removal] init_bitfields(); // Inference tensor doesn't have version counter. if (!is_inference()) { @@ -192,14 +152,12 @@ TensorImpl::TensorImpl( // TODO: be more explicit about the full key set at call sites so we // don't have to keep recomputing it here - DispatchKey k = key_set.highestPriorityBackendTypeId(); + auto k = key_set.highestBackendKey(); key_set = key_set | getAutocastRelatedKeySetFromBackend(k); - key_set = - key_set.remove(DispatchKey::Python) - .remove( - DispatchKey::PythonTLSSnapshot); // See [Note: Python key removal] + // See [Note: Python key removal] + key_set = key_set - c10::python_ks; // Inference tensor doesn't have autograd related keys. if (inference_mode) { @@ -420,7 +378,7 @@ void TensorImpl::throw_storage_access_error() const { bool TensorImpl::is_contiguous_nondefault_policy_impl( at::MemoryFormat memory_format) const { if (has_contiguity_ == - static_cast(HasContiguityPolicy::ContiguityNotSupported)) { + static_cast(CustomizableMethodPolicy::ContiguityNotSupported)) { TORCH_CHECK_NOT_IMPLEMENTED( false, "Tensors of type ", @@ -429,7 +387,7 @@ bool TensorImpl::is_contiguous_nondefault_policy_impl( } else { TORCH_INTERNAL_ASSERT_DEBUG_ONLY( has_contiguity_ == - static_cast(HasContiguityPolicy::CustomBehavior)); + static_cast(CustomizableMethodPolicy::CustomBehavior)); return is_contiguous_custom(memory_format); } } @@ -441,6 +399,22 @@ bool TensorImpl::is_contiguous_custom(at::MemoryFormat memory_format) const { "set_has_contiguity_policy and forget to override is_contiguous_custom?"); } +IntArrayRef TensorImpl::sizes_nondefault_policy_impl() const { + if (sizes_customization_policy_ == + static_cast(CustomizableMethodPolicy::NotSupported)) { + TORCH_CHECK_NOT_IMPLEMENTED( + false, + "Tensors of type ", + tensorimpl_type_name(), + " do not have sizes"); + } else { + TORCH_CHECK_NOT_IMPLEMENTED( + false, + "custom behavior for sizes() is not supported; please add it or file " + "an issue.") + } +} + static void deletePlacementDeleteContext(void* ptr) { delete static_cast(ptr); } @@ -572,6 +546,8 @@ void TensorImpl::copy_tensor_metadata_except_version_counter( dest_impl->is_wrapped_number_ = src_impl->is_wrapped_number_; dest_impl->reserved_ = src_impl->reserved_; dest_impl->set_allow_tensor_metadata_change(allow_tensor_metadata_change); + dest_impl->sizes_customization_policy_ = + src_impl->sizes_customization_policy_; dest_impl->storage_access_should_throw_ = src_impl->storage_access_should_throw_; if (src_impl->named_tensor_meta_ != nullptr) { @@ -606,23 +582,6 @@ void TensorImpl::copy_tensor_metadata( } } -TorchDispatchTypeObject::TorchDispatchTypeObject( - PyObject* type_object, - c10::impl::PyInterpreter* pyinterpreter) - : data_(type_object), pyinterpreter_(pyinterpreter) {} - -TorchDispatchTypeObject::~TorchDispatchTypeObject() { - pyinterpreter_->decref(data_, /*is_tensor*/ false); -} - -c10::impl::PyInterpreter* TorchDispatchTypeObject::pyinterpreter() const { - return pyinterpreter_; -} - -PyObject* TorchDispatchTypeObject::ptr() const { - return data_; -} - namespace impl { namespace { diff --git a/c10/core/TensorImpl.h b/c10/core/TensorImpl.h index 4f6019a5ec3c6b..1fdfad185c86ea 100644 --- a/c10/core/TensorImpl.h +++ b/c10/core/TensorImpl.h @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -16,9 +17,11 @@ #include #include #include +#include #include #include +#include 
#include #include @@ -49,17 +52,9 @@ class TensorBase; namespace c10 { class Scalar; -struct IValue; struct Storage; -class OperatorHandle; } // namespace c10 -namespace torch { -namespace jit { -using Stack = std::vector; -} -} // namespace torch - namespace c10 { /** @@ -168,9 +163,6 @@ struct C10_API AutogradMetaInterface { virtual ~AutogradMetaInterface(); }; -// forward declared -struct TorchDispatchTypeObject; - namespace impl { // Unfortunately, the definition of AutogradMeta lives in a separate @@ -196,137 +188,6 @@ struct C10_API AutogradMetaFactoryRegisterer { } }; -// Note [Python interpreter tag] -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -// We store a PyObject on TensorImpl so that we can efficiently translate -// tensors into the Python representations. However, in some situations -// (torchdeploy) there may be multiple Python interpreters in a single process -// and we must take care not to accidentally mix up PyObjects with the wrong -// interpreters. Thus, we also tag every TensorImpl with the Python interpreter -// it corresponds to. -// -// With torchdeploy, we have these invariants: -// - Any given TensorImpl can be associated with AT MOST one Python -// interpreter. -// We represent the interpreter tag as a memory address to an instance of -// a virtual class that is allocated once per interpreter (this is so that -// we can request the interpreter to perform operations for us, if -// necessary). -// - A given TensorImpl's interpreter tag can only go from uninitialized to -// tagged; once tagged, this is a quiescent state (once tagged to an -// interpreter, ALWAYS tagged to that interpreter) -// - A thread may mutate the PyObject field of a TensorImpl if and only if it -// holds the GIL for the interpreter tagged on the TensorImpl. (If the -// TensorImpl is not tagged, it must first atomically claim its tag before it -// can validly write) - -// The PyInterpreter object itself is a class that contains some function -// pointers for interacting with the interpreter. For now this is just for -// debugging, but if a Tensor can own a PyObject, the interpreter can be used to -// free it. -// -// WARNING: This class has to be written very carefully, because it may be -// possible for a Tensor to have a reference an interpreter corresponding to -// a shared library that has ALREADY BEEN UNLOADED. This makes blindly calling -// virtual methods very dangerous, because the vtable may be garbage at that -// point (on a good day, you might get "pure virtual method called"). -// -// The idea to solve this problem is we always leak PyInterpreters (so they -// always stay live even after dlclose), and disarm the "virtual methods" by -// replacing them with function pointers that just no-op. This can't be done -// with a traditional C++ vtable, so we have to roll our own. -// -// NB: The downside with representing PyInterpreter tags as full objects is that -// it takes an extra word on TensorImpl. If tags were instead just integer -// indices, on 64-bit architectures we could pack the tag and PyObject together -// into a single atomic word. On 32-bit architectures we could simply say that -// only one Python interpreter is supported (erroring if a nontrivial -// interpreter tag is attempted to be set). -// -// The difficulty with this scheme is we need to maintain an out-of-line table -// to get at the PyInterpreters so that we can do virtual method calls on them, -// and registration/deregistration to this table must be done in a thread safe -// manner. 
This can be easily done if the number of possible PyInterpreters is -// small enough (e.g., 8-bit integer) by simply preallocating an array of -// sufficient size to hold all possible interpreters. Surely 128 threads is -// more than enough for anyone! -// -// I didn't decide to do this technique at the moment, because the extra word -// added by the PyInterpreter tag takes us to 24 words, which means that we -// still fit inside three eight word cache lines. If you need to penny pinch -// another word consider doing this! - -struct PyInterpreter; -struct C10_API PyInterpreter { - using name_sig = std::string(const PyInterpreter*); - using decref_sig = void(const PyInterpreter*, PyObject*, bool); - using detach_sig = - c10::intrusive_ptr(const PyInterpreter*, const TensorImpl*); - using dispatch_sig = void( - const PyInterpreter*, - const c10::OperatorHandle&, - torch::jit::Stack* stack, - const std::shared_ptr& type); - - PyInterpreter( - name_sig* name_fn, - decref_sig* decref_fn, - detach_sig* detach, - dispatch_sig* dispatch) - : name_fn_(name_fn), - decref_fn_(decref_fn), - detach_fn_(detach), - dispatch_fn_(dispatch) {} - - name_sig* name_fn_; - decref_sig* decref_fn_; - detach_sig* detach_fn_; - dispatch_sig* dispatch_fn_; - - // UBSAN suppression fixes: "call to function - // (anonymous namespace)::concrete_decref_fn(c10::impl::PyInterpreter const*, - // _object*) through pointer to incorrect function type 'void (*)(const - // c10::impl::PyInterpreter *, _object *)'" See - // https://github.com/google/sanitizers/issues/911 - - // Report the name of this interpreter - __ubsan_ignore_function__ std::string name() const { - return (*name_fn_)(this); - } - - // Run Py_DECREF on a PyObject. We DO NOT assume the GIL is held on call - // See NOTE [PyInterpreter::decref takes an `is_tensor` arg] - __ubsan_ignore_function__ void decref(PyObject* pyobj, bool is_tensor) const { - return (*decref_fn_)(this, pyobj, is_tensor); - } - - // Perform a detach by deferring to the __torch_dispatch__ implementation of - // detach, which will also arrange for the PyObject to get copied in this - // situation - __ubsan_ignore_function__ c10::intrusive_ptr detach( - const TensorImpl* self) const { - return (*detach_fn_)(this, self); - } - - // Invoke the Python boxed fallback dispatch to go back into Python - __ubsan_ignore_function__ void dispatch( - const c10::OperatorHandle& op, - torch::jit::Stack* stack, - const std::shared_ptr& type) const { - return (*dispatch_fn_)(this, op, stack, type); - } - - // Disarm this PyInterpreter, making all of its methods noops. - // Because the function pointers are raw pointers (not atomics), - // a disarm() invocation that is concurrent with active destructors - // is not thread safe and will trigger TSAN. My hope is that this - // situations doesn't ever actually happen; tensor destruction should - // quiesce when a dlclose happens, and any long lived tensors whose - // destructors would be disarmed here only begin the destruction process - // on process shutdown (long after the dlclose has occurred). - void disarm() noexcept; -}; - // PyInterpreterStatus describes what the state of its interpreter tag // is, relative to the thread currently holding the GIL. enum class PyInterpreterStatus { @@ -361,30 +222,6 @@ struct C10_API NamedTensorMetaInterface { }; }; -// NOTE [What is TorchDispatchTypeObject?] -// A TorchDispatchTypeObject represents the type of a Tensor subclass that has -// a __torch_dispatch__ classmethod. 
Concretely, it holds the class as a -// PyObject* and a PyInterpreter* that says which python interpreter the class -// came from. -// -// See NOTE [dispatch_fn's type argument] for more details -struct C10_API TorchDispatchTypeObject { - // Steals a reference to type_object - TorchDispatchTypeObject( - PyObject* type_object, - c10::impl::PyInterpreter* pyinterpreter); - - // Releases the stolen reference to type_object - ~TorchDispatchTypeObject(); - - c10::impl::PyInterpreter* pyinterpreter() const; - PyObject* ptr() const; - - private: - PyObject* data_; - c10::impl::PyInterpreter* pyinterpreter_; -}; - // NOTE [ Version Counter Sharing ] // // Every Tensor has a version counter. Version counters are incremented whenever @@ -699,16 +536,32 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { /** * Return a reference to the sizes of this tensor. This reference remains * valid as long as the tensor is live and not resized. + * + * NOTE: sizes() is only `TENSORIMPL_MAYBE_VIRTUAL` for backward + * compatibility. See `set_sizes_customization_policy` for the + * encouraged customization point. + * + * NOTE: Currently, CustomizableMethodPolicy::CustomBehavior is not + * supported due to a lack of use case, but it can easily be added. */ TENSORIMPL_MAYBE_VIRTUAL IntArrayRef sizes() const #ifdef C10_DISABLE_TENSORIMPL_EXTENSIBILITY { + if (C10_UNLIKELY( + sizes_customization_policy_ != + static_cast(CustomizableMethodPolicy::Default))) { + return sizes_nondefault_policy_impl(); + } return sizes_and_strides_.sizes_arrayref(); } #else ; #endif + private: + IntArrayRef sizes_nondefault_policy_impl() const; + + public: /** * Return a reference to the strides of this tensor. This reference remains * valid as long as the tensor is live and not restrided. @@ -838,103 +691,112 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { bool is_sparse() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. - return key_set_.has(DispatchKey::SparseCPU) || - key_set_.has(DispatchKey::SparseCUDA) || - key_set_.has(DispatchKey::SparseHIP) || - key_set_.has(DispatchKey::SparseXPU); + return key_set_.has_all(c10::sparse_ks); } // Whether a tensor is sparse COO or not. Use is_sparse_csr for checking CSR // format. bool is_sparse_csr() const { - return key_set_.has(DispatchKey::SparseCsrCPU) || - key_set_.has(DispatchKey::SparseCsrCUDA); + return key_set_.has_any(c10::sparse_csr_ks); } bool is_quantized() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. - return key_set_.has(DispatchKey::QuantizedCPU) || - key_set_.has(DispatchKey::QuantizedCUDA) || - key_set_.has(DispatchKey::QuantizedXPU); + constexpr auto quantized_ks = DispatchKeySet(DispatchKey::Quantized); + return key_set_.has_all(quantized_ks); } bool is_meta() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. - return key_set_.has(DispatchKey::Meta); + constexpr auto meta_ks = DispatchKeySet(DispatchKey::Meta); + return key_set_.has_all(meta_ks); } bool is_cpu() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. 
- return key_set_.has(DispatchKey::CPU) || - key_set_.has(DispatchKey::SparseCPU) || - key_set_.has(DispatchKey::SparseCsrCPU) || - key_set_.has(DispatchKey::QuantizedCPU) || - key_set_.has(DispatchKey::MkldnnCPU); + constexpr auto cpu_bits_ks = DispatchKeySet(BackendComponent::CPUBit) | + DispatchKeySet({DispatchKey::SparseCsrCPU, DispatchKey::MkldnnCPU}); + return key_set_.has_any(cpu_bits_ks); } bool is_cuda() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. - return key_set_.has(DispatchKey::CUDA) || - key_set_.has(DispatchKey::SparseCUDA) || - key_set_.has(DispatchKey::SparseCsrCUDA) || - key_set_.has(DispatchKey::QuantizedCUDA); + constexpr auto cuda_bits_ks = DispatchKeySet(BackendComponent::CUDABit) | + DispatchKeySet(DispatchKey::SparseCsrCUDA); + return key_set_.has_any(cuda_bits_ks); } bool is_xpu() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. - return key_set_.has(DispatchKey::XPU) || - key_set_.has(DispatchKey::SparseXPU) || - key_set_.has(DispatchKey::QuantizedXPU); + constexpr auto xpu_ks = DispatchKeySet(BackendComponent::XPUBit); + return key_set_.has_all(xpu_ks); + } + + bool is_ipu() const { + constexpr auto ipu_ks = DispatchKeySet(BackendComponent::IPUBit); + return key_set_.has_all(ipu_ks); } bool is_xla() const { - return key_set_.has(DispatchKey::XLA); + constexpr auto xla_ks = DispatchKeySet(BackendComponent::XLABit); + return key_set_.has_all(xla_ks); } bool is_hpu() const { - return key_set_.has(DispatchKey::HPU); + constexpr auto hpu_ks = DispatchKeySet(BackendComponent::HPUBit); + return key_set_.has_all(hpu_ks); } bool is_lazy() const { - return key_set_.has(DispatchKey::Lazy); + constexpr auto lazy_ks = DispatchKeySet(BackendComponent::LazyBit); + return key_set_.has_all(lazy_ks); } bool is_hip() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. - return key_set_.has(DispatchKey::HIP) || - key_set_.has(DispatchKey::SparseHIP); + constexpr auto hip_ks = DispatchKeySet(BackendComponent::HIPBit); + return key_set_.has_all(hip_ks); } bool is_ve() const { // NB: This method is not virtual and avoid dispatches for performance // reasons. 
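// Illustrative sketch of the reworked is_cpu() check above (assumes the c10
// headers in this patch): one CPU backend bit now covers dense, sparse and
// quantized CPU tensors, so only the CPU-specific functionality keys that
// are not per-backend need to be listed separately.
#include <c10/core/DispatchKeySet.h>
#include <cassert>

void is_cpu_example() {
  constexpr auto cpu_bits_ks =
      c10::DispatchKeySet(c10::BackendComponent::CPUBit) |
      c10::DispatchKeySet(
          {c10::DispatchKey::SparseCsrCPU, c10::DispatchKey::MkldnnCPU});

  assert(c10::DispatchKeySet(c10::DispatchKey::SparseCPU).has_any(cpu_bits_ks));
  assert(
      c10::DispatchKeySet(c10::DispatchKey::QuantizedCPU).has_any(cpu_bits_ks));
  assert(c10::DispatchKeySet(c10::DispatchKey::MkldnnCPU).has_any(cpu_bits_ks));
  // A CUDA tensor matches neither the backend bit nor the CPU-only keys.
  assert(!c10::DispatchKeySet(c10::DispatchKey::CUDA).has_any(cpu_bits_ks));
}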
- return key_set_.has(DispatchKey::VE) || key_set_.has(DispatchKey::SparseVE); + constexpr auto ve_ks = DispatchKeySet(BackendComponent::VEBit); + return key_set_.has_all(ve_ks); } bool is_mkldnn() const { - return key_set_.has(DispatchKey::MkldnnCPU); + return key_set_.has_all(c10::mkldnn_ks); } bool is_vulkan() const { - return key_set_.has(DispatchKey::Vulkan); + constexpr auto vulkan_ks = DispatchKeySet(DispatchKey::Vulkan); + return key_set_.has_all(vulkan_ks); } bool is_metal() const { - return key_set_.has(DispatchKey::Metal); + constexpr auto metal_ks = DispatchKeySet(DispatchKey::Metal); + return key_set_.has_all(metal_ks); } bool is_mlc() const { - return key_set_.has(DispatchKey::MLC); + constexpr auto mls_ks = DispatchKeySet(DispatchKey::MLC); + return key_set_.has_all(mls_ks); } bool is_ort() const { - return key_set_.has(DispatchKey::ORT); + constexpr auto ort_ks = DispatchKeySet(DispatchKey::ORT); + return key_set_.has_all(ort_ks); + } + + bool is_nested() const { + return key_set_.has(DispatchKey::NestedTensor); } // TODO: remove this once we don't automatically enabled Autograd dispatch @@ -950,8 +812,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { // Invariant: // Inference tensor has version_counter_.enabled() == false bool is_inference() { - bool no_ADInplaceOrView = !key_set_.has(c10::DispatchKey::ADInplaceOrView); - bool no_Autograd = (key_set_ & c10::autograd_dispatch_keyset).empty(); + bool no_ADInplaceOrView = !key_set_.has_any(c10::inplace_or_view_ks); + bool no_Autograd = !key_set_.has_any(c10::autograd_dispatch_keyset); TORCH_INTERNAL_ASSERT_DEBUG_ONLY( no_ADInplaceOrView == no_Autograd, "ADInplaceOrView and Autograd keys must be on/off at the same time."); @@ -972,14 +834,22 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { Layout layout() const { // NB: This method is not virtual and avoid dispatches for perf. - if (is_sparse()) { + // strided is also the most common layout type, so we check for + // strided case first. 
+ // This keyset must also be kept in sync with the logic in + // is_sparse() / is_sparse_csr() / is_mkldnn() + constexpr auto sparse_and_sparsecsr_and_mkldnn_ks = + c10::sparse_ks | c10::sparse_csr_ks | c10::mkldnn_ks; + if (!key_set_.has_any(sparse_and_sparsecsr_and_mkldnn_ks)) { + return kStrided; + } else if (is_sparse()) { return kSparse; } else if (is_sparse_csr()) { return kSparseCsr; - } else if (is_mkldnn()) { - return kMkldnn; } else { - return kStrided; + TORCH_INTERNAL_ASSERT( + is_mkldnn(), "There is an error in the layout calculation logic."); + return kMkldnn; } } @@ -1065,7 +935,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { * Whether or not the imaginary part of the tensor should be negated */ inline bool is_conj() const { - return key_set_.has(DispatchKey::Conjugate); + constexpr auto conjugate_ks = DispatchKeySet(DispatchKey::Conjugate); + return key_set_.has_all(conjugate_ks); } /** @@ -1085,7 +956,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { * Whether or not the tensor is a zerotensor */ inline bool _is_zerotensor() const { - return key_set_.has(DispatchKey::ZeroTensor); + constexpr auto zerotensor_ks = DispatchKeySet(DispatchKey::ZeroTensor); + return key_set_.has_all(zerotensor_ks); } /** @@ -1105,7 +977,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { * Whether or not the tensor should be negated */ inline bool is_neg() const { - return key_set_.has(DispatchKey::Negative); + constexpr auto negative_ks = DispatchKeySet(DispatchKey::Negative); + return key_set_.has_all(negative_ks); } /** @@ -1476,16 +1349,14 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { void set_python_dispatch(bool k) { if (k) { - key_set_ = - key_set_.add(DispatchKey::Python).add(DispatchKey::PythonTLSSnapshot); + key_set_ = key_set_.add(c10::python_ks); } else { - key_set_ = key_set_.remove(DispatchKey::Python) - .remove(DispatchKey::PythonTLSSnapshot); + key_set_ = key_set_ - c10::python_ks; } } bool is_python_dispatch() const { - return key_set_.has(DispatchKey::Python); + return key_set_.has_all(c10::python_ks); } /** @@ -1550,13 +1421,22 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { */ inline bool has_compatible_shallow_copy_type(DispatchKeySet from) { auto is_dense = [](DispatchKeySet ts) { - return ts.has(DispatchKey::CPU) || ts.has(DispatchKey::CUDA) || - ts.has(DispatchKey::HIP) || ts.has(DispatchKey::XPU); + constexpr auto dense_backends = DispatchKeySet( + {BackendComponent::CPUBit, + BackendComponent::CUDABit, + BackendComponent::HIPBit, + BackendComponent::XPUBit}); + constexpr auto dense_k = DispatchKeySet(DispatchKey::Dense); + return ts.has_any(dense_k) && ts.has_any(dense_backends); }; auto is_sparse = [](DispatchKeySet ts) { - return ts.has(DispatchKey::SparseCPU) || - ts.has(DispatchKey::SparseCUDA) || ts.has(DispatchKey::SparseHIP) || - ts.has(DispatchKey::SparseXPU); + constexpr auto sparse_backends = DispatchKeySet( + {BackendComponent::CPUBit, + BackendComponent::CUDABit, + BackendComponent::HIPBit, + BackendComponent::XPUBit}); + constexpr auto sparse_k = DispatchKeySet(DispatchKey::Sparse); + return ts.has_any(sparse_k) && ts.has_any(sparse_backends); }; return (key_set_ == from) || (is_dense(key_set_) && is_dense(from)) || (is_sparse(key_set_) && is_sparse(from)); @@ -2246,11 +2126,12 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { * Compute the number of elements based on the sizes of a tensor. 
*/ int64_t compute_numel() const { - int64_t n = 1; - for (auto s : sizes()) { - n *= s; - } - return n; +#if C10_HAS_BUILTIN_OVERFLOW() && !defined(C10_MOBILE) + // Use overflow checks if supported by the compiler + return safe_compute_numel(); +#else + return c10::multiply_integers(sizes()); +#endif } /** @@ -2259,14 +2140,15 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { * using a sparse layout has multiple dimensions with large sizes. */ int64_t safe_compute_numel() const { - int64_t n = 1; - for (auto s : sizes()) { - TORCH_CHECK( - s == 0 || n <= std::numeric_limits::max() / s, - "numel: integer multiplication overflow"); - n *= s; - } - return n; + uint64_t n = 1; + bool overflows = c10::safe_multiplies_u64(sizes(), &n); + constexpr auto numel_max = std::min( + static_cast(std::numeric_limits::max()), + static_cast(std::numeric_limits::max())); + + overflows |= (n > numel_max); + TORCH_CHECK(!overflows, "numel: integer multiplication overflow"); + return static_cast(n); } /** @@ -2408,24 +2290,33 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { } protected: - // Policy for adjusting the behavior of is_contiguous(). Allows - // subclass customization while still being able to inline - // is_contiguous() in the common case. - enum class HasContiguityPolicy : uint8_t { - // Default behavior: check is_contiguous_ and similar bitflags. + // Policy for adjusting the behavior of customizable methods like + // is_contiguous() and sizes(). Allows subclass customization while + // still being able to inline the methods in the common case. + enum class CustomizableMethodPolicy : uint8_t { + // Default behavior. Default, // Throw a generic error message that this tensor type does not - // support is_contiguous. - ContiguityNotSupported, - // Call virtual is_contiguous_custom method to implement custom - // is_contiguous behavior. + // support the method in question. + NotSupported, + // For backward compatibility. + ContiguityNotSupported = NotSupported, + // Call virtual foo_custom method to implement custom foo + // behavior. CustomBehavior, }; - void set_has_contiguity_policy(HasContiguityPolicy p) { + // For backward compatibility. + using HasContiguityPolicy = CustomizableMethodPolicy; + + void set_has_contiguity_policy(CustomizableMethodPolicy p) { has_contiguity_ = static_cast(p); } + void set_sizes_customization_policy(CustomizableMethodPolicy p) { + sizes_customization_policy_ = static_cast(p); + } + Storage storage_; private: @@ -2536,7 +2427,7 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { // or -std=gnu++2a inline void init_bitfields() { is_contiguous_ = true; - has_contiguity_ = static_cast(HasContiguityPolicy::Default); + has_contiguity_ = static_cast(CustomizableMethodPolicy::Default); is_channels_last_ = false; is_channels_last_contiguous_ = false; @@ -2547,6 +2438,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { allow_tensor_metadata_change_ = true; reserved_ = false; owns_pyobj_ = false; + sizes_customization_policy_ = + static_cast(CustomizableMethodPolicy::Default); storage_access_should_throw_ = false; } @@ -2607,6 +2500,9 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { // direction (to make sure the pyobj stays live). bool owns_pyobj_ : 1; + // Customization policy for the sizes() virtual method. + /* CustomizableMethodPolicy */ uint8_t sizes_customization_policy_ : 2; + // The set of DispatchKeys which describe this tensor. 
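// Illustrative sketch of the overflow-checked product behind
// safe_compute_numel() (names are placeholders; the real code goes through
// c10::safe_multiplies_u64 and C10_HAS_BUILTIN_OVERFLOW()). This version
// leans directly on the GCC/Clang __builtin_mul_overflow intrinsic.
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

int64_t checked_numel(const std::vector<int64_t>& sizes) {
  uint64_t n = 1;
  bool overflow = false;
  for (int64_t s : sizes) {
    overflow |= __builtin_mul_overflow(n, static_cast<uint64_t>(s), &n);
  }
  // Also reject products that fit in uint64_t but not in int64_t.
  overflow |= n > static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
  if (overflow) {
    throw std::overflow_error("numel: integer multiplication overflow");
  }
  return static_cast<int64_t>(n);
}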
NB: this // does NOT include Autograd (historically, it did, but // not anymore!) diff --git a/c10/core/TensorOptions.h b/c10/core/TensorOptions.h index f7619db0d60f3c..ea903fdce2d008 100644 --- a/c10/core/TensorOptions.h +++ b/c10/core/TensorOptions.h @@ -643,6 +643,9 @@ inline DispatchKey computeDispatchKey( } return DispatchKey::CUDA; } + case DeviceType::IPU: { + return DispatchKey::IPU; + } case DeviceType::XPU: { if (isQIntType(dtype_)) { return DispatchKey::QuantizedXPU; @@ -780,6 +783,9 @@ inline DeviceType dispatchKeyToDeviceType(DispatchKey dispatch_key) { return DeviceType::Meta; // stuff that people are actively developing + case DispatchKey::IPU: + case DispatchKey::AutogradIPU: + return DeviceType::IPU; case DispatchKey::XPU: case DispatchKey::SparseXPU: case DispatchKey::QuantizedXPU: diff --git a/c10/core/WrapDimMinimal.cpp b/c10/core/WrapDimMinimal.cpp new file mode 100644 index 00000000000000..2dc359fc5d4fdd --- /dev/null +++ b/c10/core/WrapDimMinimal.cpp @@ -0,0 +1,36 @@ +#include + +namespace c10 { +namespace detail { + +int64_t maybe_wrap_dim_slow( + int64_t dim, + int64_t dim_post_expr, + bool wrap_scalar) { + if (dim_post_expr <= 0) { + TORCH_CHECK_INDEX( + wrap_scalar, + "dimension specified as ", + dim, + " but tensor has no dimensions"); + return c10::maybe_wrap_dim(dim, /*dim_post_expr=*/1, /*wrap_scalar=*/false); + } + + int64_t min = -dim_post_expr; + int64_t max = dim_post_expr - 1; + TORCH_CHECK_INDEX( + min <= dim && dim <= max, + "Dimension out of range (expected to be in range of [", + min, + ", ", + max, + "], but got ", + dim, + ")"); + + TORCH_INTERNAL_ASSERT( + false, "should never reach here as dim should be out-of-bounds"); +} + +} // namespace detail +} // namespace c10 diff --git a/c10/core/WrapDimMinimal.h b/c10/core/WrapDimMinimal.h index 01cb1c641a14b3..4a6f375147491a 100644 --- a/c10/core/WrapDimMinimal.h +++ b/c10/core/WrapDimMinimal.h @@ -4,37 +4,22 @@ namespace c10 { +namespace detail { +C10_API int64_t +maybe_wrap_dim_slow(int64_t dim, int64_t dim_post_expr, bool wrap_scalar); +} + static inline int64_t maybe_wrap_dim( int64_t dim, int64_t dim_post_expr, bool wrap_scalar = true) { - if (dim_post_expr <= 0) { - if (!wrap_scalar) { - TORCH_CHECK_INDEX( - false, - "dimension specified as ", - dim, - " but tensor has no dimensions"); - } - dim_post_expr = 1; // this will make range [-1, 0] - } - - int64_t min = -dim_post_expr; - int64_t max = dim_post_expr - 1; - if (dim < min || dim > max) { - TORCH_CHECK_INDEX( - false, - "Dimension out of range (expected to be in range of [", - min, - ", ", - max, - "], but got ", - dim, - ")"); + // Inline the fast paths + if (C10_LIKELY(-dim_post_expr <= dim && dim < dim_post_expr)) { + // Branch-less version of dim + (dim < 0 ? dim_post_expr : 0) + return dim + dim_post_expr * (dim < 0); } - if (dim < 0) - dim += dim_post_expr; - return dim; + // Check edge-cases out-of-line (wrapping scalars and out-of-bounds errors) + return c10::detail::maybe_wrap_dim_slow(dim, dim_post_expr, wrap_scalar); } } // namespace c10 diff --git a/c10/core/impl/FakeGuardImpl.h b/c10/core/impl/FakeGuardImpl.h index 2d47db0fdb1847..c86255220c1c1f 100644 --- a/c10/core/impl/FakeGuardImpl.h +++ b/c10/core/impl/FakeGuardImpl.h @@ -9,7 +9,7 @@ namespace impl { // FakeGuardImpl is hardcoded to have eight devices. Not for // any good reason, just to simplify code. 
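// Illustrative usage sketch of maybe_wrap_dim after the split above (assumes
// the c10 headers in this patch): valid dims hit the branch-free fast path,
// and everything else goes through maybe_wrap_dim_slow out of line.
#include <c10/core/WrapDimMinimal.h>
#include <cassert>

void wrap_dim_example() {
  // A 4-d tensor: valid dims are [-4, 3]; negatives count from the back.
  assert(c10::maybe_wrap_dim(2, 4) == 2);
  assert(c10::maybe_wrap_dim(-1, 4) == 3);
  assert(c10::maybe_wrap_dim(-4, 4) == 0);
  // dim = 4 or dim = -5 would throw an index error in the slow path.
  // A 0-d tensor with wrap_scalar=true is treated like a 1-d tensor:
  assert(c10::maybe_wrap_dim(-1, 0) == 0);
}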
-constexpr size_t kFakeGuardImplMaxDevices = 8; +constexpr DeviceIndex kFakeGuardImplMaxDevices = 8; /** * A fake implementation of DeviceGuardImplInterface suitable for testing. @@ -21,7 +21,7 @@ struct FakeGuardImpl final : public DeviceGuardImplInterface { static constexpr DeviceType static_type = T; // Runtime device type is not used FakeGuardImpl(DeviceType) {} - FakeGuardImpl() {} + FakeGuardImpl() = default; DeviceType type() const override { return T; } diff --git a/c10/core/impl/PyInterpreter.cpp b/c10/core/impl/PyInterpreter.cpp new file mode 100644 index 00000000000000..4367c7b7530e2d --- /dev/null +++ b/c10/core/impl/PyInterpreter.cpp @@ -0,0 +1,41 @@ +#include +#include + +namespace c10 { +namespace impl { + +static std::string noop_name_fn(const PyInterpreter*) { + return ""; +} + +static void noop_decref_fn(const PyInterpreter*, PyObject*, bool) { + // no-op +} + +static c10::intrusive_ptr noop_detach_fn( + const PyInterpreter*, + const TensorImpl*) { + TORCH_INTERNAL_ASSERT( + 0, + "attempted to detach (shallow_copy_and_detach) Tensor with nontrivial PyObject after corresponding interpreter died"); +} + +static void noop_dispatch_fn( + const PyInterpreter*, + const c10::OperatorHandle& op, + torch::jit::Stack* stack, + const std::shared_ptr& type) { + TORCH_INTERNAL_ASSERT( + 0, + "attempted to dispatch (__torch_dispatch__) an operator on Tensor with nontrivial PyObject after corresponding interpreter died"); +} + +void PyInterpreter::disarm() noexcept { + name_fn_ = &noop_name_fn; + decref_fn_ = &noop_decref_fn; + detach_fn_ = &noop_detach_fn; + dispatch_fn_ = &noop_dispatch_fn; +} + +} // namespace impl +} // namespace c10 diff --git a/c10/core/impl/PyInterpreter.h b/c10/core/impl/PyInterpreter.h new file mode 100644 index 00000000000000..a78ba2d83e728c --- /dev/null +++ b/c10/core/impl/PyInterpreter.h @@ -0,0 +1,190 @@ +#pragma once + +#include +#include +#include +#include +#include + +// Forward declarations + +namespace c10 { +struct IValue; +class OperatorHandle; +struct TensorImpl; +struct SafePyObject; +} // namespace c10 + +namespace torch { +namespace jit { +using Stack = std::vector; +} +} // namespace torch + +// Actual implementation + +namespace c10 { +namespace impl { + +// Note [Python interpreter tag] +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +// Traditionally, PyTorch is layered such that our Python library +// (libtorch_python) references our pure C++ library (libtorch) as the +// natural order of things. However, sometimes this natural order is +// subverted: C++ objects refer to Python objects (for example, we +// store a PyObject* pointer on TensorImpl so that converting from a +// C++ Tensor to a Python Tensor is just a memory dereference). +// +// These unusual orderings must be treated with care. To start, you need to +// virtualize the destructor so that the PyObject can be decref'ed on +// destruction (because the C++ object itself doesn't know anything about +// Python--remember, layering!). This process itself is fraught, since +// acquiring the GIL could lead to deadlocks if someone is blocking on you +// while holding the GIL. Furthermore, if the C++ objects outlive the +// interpreter (which can happen if you stash them in a static global +// variable defined in libtorch), you may attempt to decref the object when +// the Python interpreter has already been shutdown. +// +// BUT WAIT, IT GETS WORSE. With torchdeploy, there may be multiple Python +// interpreters in a single process. 
If a C++ object is accessible from +// multiple interpreters, we must take care not to accidentally use a +// PyObject from one interpreter with another interpreter. +// +// To prevent these mixups, we introduce a PyInterpreter "tag" (object with +// a vtable), which specifies a specific Python interpreter. +// +// - Any given object can be associated with AT MOST one Python interpreter. +// We represent the interpreter tag as a memory address to an instance of +// a virtual class that is allocated once per interpreter (this is so that +// we can request the interpreter to perform operations for us, if +// necessary). +// +// - It can be recorded with a PyObject (PyInterpreterObject) so that +// we know what interpreter the object is associated with, and we can +// raise an error if you try to use the PyObject from the wrong +// interpreter context. +// +// - It contains a vtable that can be used to perform various Python +// operations from ordinary C++ code that ordinarily wouldn't be accessible +// from libtorch. +// +// A simple use case is when a C++ object must be associated with a PyObject. +// However, for TensorImpl, we lazily allocate a PyObject the first time the +// object passes into Python. The invariants for this situation are more +// subtle: +// +// - A given TensorImpl's interpreter tag can only go from uninitialized to +// tagged; once tagged, this is a quiescent state (once tagged to an +// interpreter, ALWAYS tagged to that interpreter) +// +// - A thread may mutate the PyObject field of a TensorImpl if and only if it +// holds the GIL for the interpreter tagged on the TensorImpl. (If the +// TensorImpl is not tagged, it must first atomically claim its tag before it +// can validly write) +// +// WARNING: This class has to be written very carefully, because it may be +// possible for a Tensor to have a reference to an interpreter corresponding to +// a shared library that has ALREADY BEEN UNLOADED. This makes blindly calling +// virtual methods very dangerous, because the vtable may be garbage at that +// point (on a good day, you might get "pure virtual method called"). +// +// The idea to solve this problem is that we always leak PyInterpreters (so they +// always stay live even after dlclose), and disarm the "virtual methods" by +// replacing them with function pointers that just no-op. This can't be done +// with a traditional C++ vtable, so we have to roll our own. +// +// NB: The downside with representing PyInterpreter tags as full objects is that +// it takes an extra word on TensorImpl. If tags were instead just integer +// indices, on 64-bit architectures we could pack the tag and PyObject together +// into a single atomic word. On 32-bit architectures we could simply say that +// only one Python interpreter is supported (erroring if a nontrivial +// interpreter tag is attempted to be set). +// +// The difficulty with this scheme is we need to maintain an out-of-line table +// to get at the PyInterpreters so that we can do virtual method calls on them, +// and registration/deregistration to this table must be done in a thread safe +// manner. This can be easily done if the number of possible PyInterpreters is +// small enough (e.g., 8-bit integer) by simply preallocating an array of +// sufficient size to hold all possible interpreters. Surely 128 threads is +// more than enough for anyone!
+// +// I didn't decide to do this technique at the moment, because the extra word +// added by the PyInterpreter tag takes us to 24 words, which means that we +// still fit inside three eight word cache lines. If you need to penny pinch +// another word consider doing this! + +struct C10_API PyInterpreter { + // Feel free to add as much random crap here as you need; each of these + // can be thought of as a "C++ to Python" hook. + using name_sig = std::string(const PyInterpreter*); + using decref_sig = void(const PyInterpreter*, PyObject*, bool); + using detach_sig = + c10::intrusive_ptr(const PyInterpreter*, const TensorImpl*); + using dispatch_sig = void( + const PyInterpreter*, + const c10::OperatorHandle&, + torch::jit::Stack* stack, + // This is a Tensor subclass type object + const std::shared_ptr& type); + + PyInterpreter( + name_sig* name_fn, + decref_sig* decref_fn, + detach_sig* detach, + dispatch_sig* dispatch) + : name_fn_(name_fn), + decref_fn_(decref_fn), + detach_fn_(detach), + dispatch_fn_(dispatch) {} + + name_sig* name_fn_; + decref_sig* decref_fn_; + detach_sig* detach_fn_; + dispatch_sig* dispatch_fn_; + + // UBSAN suppression fixes: "call to function + // (anonymous namespace)::concrete_decref_fn(c10::impl::PyInterpreter const*, + // _object*) through pointer to incorrect function type 'void (*)(const + // c10::impl::PyInterpreter *, _object *)'" See + // https://github.com/google/sanitizers/issues/911 + + // Report the name of this interpreter + __ubsan_ignore_function__ std::string name() const { + return (*name_fn_)(this); + } + + // Run Py_DECREF on a PyObject. We DO NOT assume the GIL is held on call + // See NOTE [PyInterpreter::decref takes an `is_tensor` arg] + __ubsan_ignore_function__ void decref(PyObject* pyobj, bool is_tensor) const { + return (*decref_fn_)(this, pyobj, is_tensor); + } + + // Perform a detach by deferring to the __torch_dispatch__ implementation of + // detach, which will also arrange for the PyObject to get copied in this + // situation + __ubsan_ignore_function__ c10::intrusive_ptr detach( + const TensorImpl* self) const { + return (*detach_fn_)(this, self); + } + + // Invoke the Python boxed fallback dispatch to go back into Python + __ubsan_ignore_function__ void dispatch( + const c10::OperatorHandle& op, + torch::jit::Stack* stack, + const std::shared_ptr& type) const { + return (*dispatch_fn_)(this, op, stack, type); + } + + // Disarm this PyInterpreter, making all of its methods noops. + // Because the function pointers are raw pointers (not atomics), + // a disarm() invocation that is concurrent with active destructors + // is not thread safe and will trigger TSAN. My hope is that this + // situations doesn't ever actually happen; tensor destruction should + // quiesce when a dlclose happens, and any long lived tensors whose + // destructors would be disarmed here only begin the destruction process + // on process shutdown (long after the dlclose has occurred). 
+ void disarm() noexcept; +}; + +} // namespace impl +} // namespace c10 diff --git a/c10/cuda/CUDACachingAllocator.cpp b/c10/cuda/CUDACachingAllocator.cpp index c1ac4bd0ed0c88..49e7f3c3d137c5 100644 --- a/c10/cuda/CUDACachingAllocator.cpp +++ b/c10/cuda/CUDACachingAllocator.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -177,6 +178,8 @@ struct Block { Block* prev; // prev block if split from a larger allocation Block* next; // next block if split from a larger allocation int event_count; // number of outstanding CUDA events + int gc_count; // counter for prioritizing older / less useful blocks for + // garbage collection Block( int device, @@ -193,7 +196,8 @@ struct Block { allocated(0), prev(nullptr), next(nullptr), - event_count(0) {} + event_count(0), + gc_count(0) {} // constructor for search key Block(int device, cudaStream_t stream, size_t size) @@ -206,7 +210,8 @@ struct Block { allocated(0), prev(nullptr), next(nullptr), - event_count(0) {} + event_count(0), + gc_count(0) {} bool is_split() const { return (prev != nullptr) || (next != nullptr); @@ -310,7 +315,7 @@ cudaError_t cudaMallocMaybeCapturing(void** p, size_t size) { if (at::cuda::currentStreamCaptureStatusMayInitCtx() == at::cuda::CaptureStatus::None) { #endif - return cudaMalloc(p, size); + return C10_CUDA_ERROR_HANDLED(cudaMalloc(p, size)); #if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 } else { // It's ok to capture cudaMallocs, as long as we never cudaFree those @@ -318,7 +323,7 @@ cudaError_t cudaMallocMaybeCapturing(void** p, size_t size) { // Capturing cudaMalloc behaves nicely: it gives the graph new VA, // but is ignored (won't leakily allocate new memory) in replays. at::cuda::CUDAStreamCaptureModeGuard g{cudaStreamCaptureModeRelaxed}; - return cudaMalloc(p, size); + return C10_CUDA_ERROR_HANDLED(cudaMalloc(p, size)); } #endif } @@ -330,6 +335,17 @@ class CachingAllocatorConfig { static size_t max_split_size() { return instance().m_max_split_size; } + static double garbage_collection_threshold() { + return instance().m_garbage_collection_threshold; + } + + // This is used to round-up allocation size to nearest power of 2 divisions. 
+ // More description below in function roundup_power2_next_division + // As an example, if we want 4 divisions between consecutive powers of 2, this can be done + // using the env variable: PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4 + static size_t roundup_power2_divisions() { + return instance().m_roundup_power2_divisions; + } private: static CachingAllocatorConfig& instance() { @@ -342,8 +358,12 @@ class CachingAllocatorConfig { } CachingAllocatorConfig() - : m_max_split_size(std::numeric_limits::max()) {} + : m_max_split_size(std::numeric_limits::max()), + m_roundup_power2_divisions(0), + m_garbage_collection_threshold(0) {} size_t m_max_split_size; + size_t m_roundup_power2_divisions; + double m_garbage_collection_threshold; void parseArgs() { const char* val = getenv("PYTORCH_CUDA_ALLOC_CONF"); @@ -373,6 +393,32 @@ class CachingAllocatorConfig { val2 = std::min( val2, (std::numeric_limits::max() / (1024 * 1024))); m_max_split_size = val2 * 1024 * 1024; + } else if (kv[0].compare("roundup_power2_divisions") == 0) { + size_t val2 = stoi(kv[1]); + TORCH_CHECK( + llvm::isPowerOf2_64(val2), + "For roundups, the divisions have to be a power of 2 ", + ""); + m_roundup_power2_divisions = val2; + } else if (kv[0].compare("garbage_collection_threshold") == 0) { + /* + * Perform garbage collection of GPU memory blocks to avoid + * triggering expensive sync-and-reclaim-all operation. Upon setting + * the threshold (e.g., 0.8), the allocator will start reclaiming + * blocks if GPU memory capacity usage exceeds the threshold (i.e., + * 80% of total memory). + * Values 0.0 and 1.0 are not allowed as they are less meaningful. + */ + double val2 = stod(kv[1]); + TORCH_CHECK( + val2 > 0, + "garbage_collection_threshold too small, set it 0.0~1.0", + ""); + TORCH_CHECK( + val2 < 1.0, + "garbage_collection_threshold too big, set it 0.0~1.0", + ""); + m_garbage_collection_threshold = val2; } else { TORCH_CHECK(false, "Unrecognized CachingAllocator option: ", kv[0]); } @@ -469,18 +515,29 @@ class DeviceCachingAllocator { params.stat_types[static_cast(StatType::AGGREGATE)] = true; params.stat_types[static_cast(get_stat_type_for_pool(pool))] = true; + // First, try to get a block from the existing pool. bool block_found = // Search pool get_free_block(params) // Trigger callbacks and retry search - || (trigger_free_memory_callbacks(params) && get_free_block(params)) - // Attempt allocate - || alloc_block(params, false) - // Free enough available cached blocks to satisfy alloc and retry alloc. - || - (release_available_cached_blocks(params) && alloc_block(params, false)) - // Free all non-split cached blocks and retry alloc. - || (release_cached_blocks() && alloc_block(params, true)); + || (trigger_free_memory_callbacks(params) && get_free_block(params)); + + // Can't reuse an existing block; try to get a new one. + if (!block_found) { + // Do garbage collection if the flag is set. + if (C10_UNLIKELY( + CachingAllocatorConfig::garbage_collection_threshold() > 0.0)) { + garbage_collect_cached_blocks(); + } + // Attempt allocate + block_found = alloc_block(params, false) + // Free enough available cached blocks to satisfy alloc and retry + // alloc. + || (release_available_cached_blocks(params) && + alloc_block(params, false)) + // Free all non-split cached blocks and retry alloc.
+ || (release_cached_blocks() && alloc_block(params, true)); + } if (!block_found) { // For any error code other than cudaErrorMemoryAllocation, @@ -699,9 +756,9 @@ class DeviceCachingAllocator { if (*largest == 0) { // make an initial guess if a zero *largest is passed in size_t tmp_bytes; - cudaMemGetInfo( + C10_CUDA_CHECK(cudaMemGetInfo( largest, // Use free memory as an optimistic initial guess of *largest - &tmp_bytes); + &tmp_bytes)); } cache_info_aux(large_blocks, total, largest); cache_info_aux(small_blocks, total, largest); @@ -808,11 +865,43 @@ class DeviceCachingAllocator { return result; } + // This function takes the size and number of divisions arguments and rounds + // up the size argument to the nearest power-of-2 division. + // For example, if we need to round up 1200 and the number of divisions is 4, + // the size 1200 lies between 1024 and 2048 and if we do 4 divisions between + // them, the values are 1024, 1280, 1536, and 1792. So the function will + // return 1280 as the nearest ceiling of the power-of-2 division. + static size_t roundup_power2_next_division(size_t size, size_t divisions) { + if (C10_UNLIKELY(size <= 4 || divisions <= 1)) { + return size; + } + if (llvm::isPowerOf2_64(size)) { + return size; + } + + // Divide the space between these two powers of 2 into equal divisions. + // If the division is zero, return the power-of-2 ceiling. + size_t power2_floor = llvm::PowerOf2Floor(size); + size_t power2_divison = + power2_floor >> (63 - llvm::countLeadingZeros(divisions)); + if (C10_UNLIKELY(power2_divison == 0)) { + return (power2_floor << 1); + } + size_t round_size_floor = size & (~(power2_divison - 1)); + return (round_size_floor == size) ? size + : round_size_floor + power2_divison; + } + static size_t round_size(size_t size) { if (size < kMinBlockSize) { return kMinBlockSize; } else { - return kMinBlockSize * ((size + kMinBlockSize - 1) / kMinBlockSize); + auto divisions = CachingAllocatorConfig::roundup_power2_divisions(); + if (divisions > 0 && size > (kMinBlockSize * divisions)) { + return roundup_power2_next_division(size, divisions); + } else { + return kMinBlockSize * ((size + kMinBlockSize - 1) / kMinBlockSize); + } } } @@ -1037,6 +1126,14 @@ class DeviceCachingAllocator { bool get_free_block(AllocParams& p) { BlockPool& pool = *p.pool; + + if (C10_UNLIKELY( + CachingAllocatorConfig::garbage_collection_threshold() > 0.0)) { + // Track block reuse interval only when garbage collection is enabled. + for (auto& b : pool.blocks) { + ++b->gc_count; + } + } auto it = pool.blocks.lower_bound(&p.search_key); if (it == pool.blocks.end() || (*it)->stream != p.stream()) return false; @@ -1049,6 +1146,7 @@ class DeviceCachingAllocator { ((*it)->size >= p.size() + kLargeBuffer)) return false; p.block = *it; + (*it)->gc_count = 0; // Denote this block has been used pool.blocks.erase(it); return true; } @@ -1062,6 +1160,62 @@ class DeviceCachingAllocator { return freed_memory; } + void garbage_collect_cached_blocks() { + // Free unused cached blocks to reclaim GPU memory. + // Unlike release_cached_blocks(), this does not enforce synchronization and + // therefore should have less overhead. + + size_t gc_threshold = static_cast( + CachingAllocatorConfig::garbage_collection_threshold() * + allowed_memory_maximum); + // No need to trigger GC yet + if (total_allocated_memory <= gc_threshold) { + return; + } + const auto target_size = total_allocated_memory - gc_threshold; + size_t gc_reclaimed = 0; + + // Calculate the total age of the free-able blocks.
We'll use it later to + // get "avg age" threshold. + double total_age = 0.0; + int freeable_block_count = 0; + for (auto& b : large_blocks.blocks) { + if (!b->is_split()) { + total_age += b->gc_count; + ++freeable_block_count; + } + } + // No free-able blocks? + if (freeable_block_count == 0) { + return; + } + + // Repeat GC until we reach reclaim > target size. + bool block_freed = true; + while (gc_reclaimed < target_size && block_freed == true && + freeable_block_count > 0) { + // Free blocks exceeding this age threshold first. + double age_threshold = total_age / freeable_block_count; + // Stop iteration if we can no longer free a block. + block_freed = false; + + // Free blocks of > avg age. Don't stop upon reaching the target_size, + // we don't want this GC to be triggered frequently. + auto it = large_blocks.blocks.begin(); + while (it != large_blocks.blocks.end()) { + Block* block = *it; + ++it; + if (!block->is_split() && block->gc_count >= age_threshold) { + block_freed = true; + gc_reclaimed += block->size; + total_age -= block->gc_count; // Decrement the age + freeable_block_count--; // One less block that can be freed + release_block(block); + } + } + } + } + bool alloc_block(AllocParams& p, bool isRetry) { // Defensively checks for preexisting CUDA error state. C10_CUDA_CHECK(cudaGetLastError()); @@ -1304,7 +1458,7 @@ class DeviceCachingAllocator { cudaEvent_t event = e.first; Block* block = e.second; - cudaError_t err = cudaEventQuery(event); + cudaError_t err = C10_CUDA_ERROR_HANDLED(cudaEventQuery(event)); if (err == cudaErrorNotReady) { // ignore and clear the error if not ready cudaGetLastError(); @@ -1422,9 +1576,9 @@ class THCCachingAllocator { fraction, ". Please set within (0, 1)."); int activated_device; - cudaGetDevice(&activated_device); + C10_CUDA_CHECK(cudaGetDevice(&activated_device)); if (activated_device != device) { - cudaSetDevice(device); + C10_CUDA_CHECK(cudaSetDevice(device)); } device_allocator[device]->setMemoryFraction(fraction); } diff --git a/c10/cuda/CUDACachingAllocator.h b/c10/cuda/CUDACachingAllocator.h index d3a73943f7bbd0..9b1a6ecf159035 100644 --- a/c10/cuda/CUDACachingAllocator.h +++ b/c10/cuda/CUDACachingAllocator.h @@ -102,6 +102,7 @@ struct DeviceStats { // cudaMalloc).. struct BlockInfo { int64_t size = 0; + int32_t gc_counter = 0; bool allocated = false; bool active = false; }; diff --git a/c10/cuda/CUDAException.h b/c10/cuda/CUDAException.h index 77d0d07ac95e86..ca441711cbd679 100644 --- a/c10/cuda/CUDAException.h +++ b/c10/cuda/CUDAException.h @@ -63,6 +63,26 @@ class C10_CUDA_API CUDAError : public c10::Error { } \ } while (0) +// Indicates that a CUDA error is handled in a non-standard way +#define C10_CUDA_ERROR_HANDLED(EXPR) EXPR + +// Intentionally ignore a CUDA error +#define C10_CUDA_IGNORE_ERROR(EXPR) \ + do { \ + cudaError_t __err = EXPR; \ + if (__err != cudaSuccess) { \ + cudaError_t error_unused C10_UNUSED = cudaGetLastError(); \ + (void)error_unused; \ + } \ + } while (0) + +// Clear the last CUDA error +#define C10_CUDA_CLEAR_ERROR() \ + do { \ + cudaError_t error_unused C10_UNUSED = cudaGetLastError(); \ + (void)error_unused; \ + } while (0) + // This should be used directly after every kernel launch to ensure // the launch happened correctly and provide an early, close-to-source // diagnostic if it didn't. 
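The hunks that follow adopt these macros across c10/cuda. As a condensed sketch of the intended division of labor (not from the patch, assuming a CUDA translation unit; the function name is made up): C10_CUDA_ERROR_HANDLED currently just expands to its argument and marks call sites whose error code is handled explicitly, while C10_CUDA_IGNORE_ERROR and C10_CUDA_CLEAR_ERROR swallow and clear CUDA's sticky error state.

    #include <c10/cuda/CUDAException.h>
    #include <cuda_runtime.h>

    int device_count_or_zero() {
      int count = 0;
      // The error code is inspected here rather than thrown via C10_CUDA_CHECK.
      cudaError_t err = C10_CUDA_ERROR_HANDLED(cudaGetDeviceCount(&count));
      if (err != cudaSuccess) {
        // Swallow the failure and clear the sticky error so later calls start clean.
        C10_CUDA_CLEAR_ERROR();
        return 0;
      }
      // Best-effort call whose failure we deliberately ignore.
      C10_CUDA_IGNORE_ERROR(cudaDeviceSynchronize());
      return count;
    }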
diff --git a/c10/cuda/CUDAFunctions.cpp b/c10/cuda/CUDAFunctions.cpp index 255d798d13fb91..9ab61aa1f38125 100644 --- a/c10/cuda/CUDAFunctions.cpp +++ b/c10/cuda/CUDAFunctions.cpp @@ -10,16 +10,13 @@ namespace { // returns -1 on failure int32_t driver_version() { int driver_version = -1; - cudaError_t err = cudaDriverGetVersion(&driver_version); - if (err != cudaSuccess) { - cudaError_t last_err C10_UNUSED = cudaGetLastError(); - } + C10_CUDA_IGNORE_ERROR(cudaDriverGetVersion(&driver_version)); return driver_version; } int device_count_impl(bool fail_if_no_driver) { int count; - auto err = cudaGetDeviceCount(&count); + auto err = C10_CUDA_ERROR_HANDLED(cudaGetDeviceCount(&count)); if (err == cudaSuccess) { return count; } diff --git a/c10/cuda/CUDAStream.h b/c10/cuda/CUDAStream.h index 7bb97e88b991e6..6d17136341c6ec 100644 --- a/c10/cuda/CUDAStream.h +++ b/c10/cuda/CUDAStream.h @@ -111,7 +111,7 @@ class C10_CUDA_API CUDAStream { bool query() const { DeviceGuard guard{stream_.device()}; - cudaError_t err = cudaStreamQuery(stream()); + cudaError_t err = C10_CUDA_ERROR_HANDLED(cudaStreamQuery(stream())); if (err == cudaSuccess) { return true; diff --git a/c10/cuda/impl/CUDAGuardImpl.h b/c10/cuda/impl/CUDAGuardImpl.h index 8f5cfdc259d3bc..583feeec26000a 100644 --- a/c10/cuda/impl/CUDAGuardImpl.h +++ b/c10/cuda/impl/CUDAGuardImpl.h @@ -41,7 +41,7 @@ struct CUDAGuardImpl final : public c10::impl::DeviceGuardImplInterface { } c10::optional uncheckedGetDevice() const noexcept { int device; - auto err = cudaGetDevice(&device); + const auto err = C10_CUDA_ERROR_HANDLED(cudaGetDevice(&device)); C10_CUDA_CHECK_WARN(err); if (err != cudaSuccess) { return c10::nullopt; @@ -164,7 +164,7 @@ struct CUDAGuardImpl final : public c10::impl::DeviceGuardImplInterface { if (!event) return true; cudaEvent_t cuda_event = static_cast(event); - const cudaError_t err = cudaEventQuery(cuda_event); + const cudaError_t err = C10_CUDA_ERROR_HANDLED(cudaEventQuery(cuda_event)); if (err != cudaErrorNotReady) { C10_CUDA_CHECK(err); } else { diff --git a/c10/test/core/DispatchKeySet_test.cpp b/c10/test/core/DispatchKeySet_test.cpp index 43b06c110e5bac..db6a2cf721c903 100644 --- a/c10/test/core/DispatchKeySet_test.cpp +++ b/c10/test/core/DispatchKeySet_test.cpp @@ -3,25 +3,163 @@ #include #include +#include using namespace c10; +// This test exists not to be comprehensive, but to more clearly show +// what the semantics of DispatchKeySet are. +TEST(DispatchKeySet, ShowSemantics) { + // the "CPU" dispatch key is an instance of a per-backend-functionality key. + // It corresponds to "dense" functionality, "CPU" backend. + // This means that it gets a dense functionality bit, and a cpu backend bit + // set. + auto undefined_set = DispatchKeySet(); + auto dense_cpu_set = DispatchKeySet(DispatchKey::CPU); + ASSERT_TRUE(dense_cpu_set.has(DispatchKey::Dense)); + ASSERT_TRUE(dense_cpu_set.has_backend(BackendComponent::CPUBit)); + ASSERT_TRUE(dense_cpu_set.has(DispatchKey::CPU)); + + auto dense_lazy_set = DispatchKeySet(DispatchKey::Lazy); + ASSERT_TRUE(dense_lazy_set.has(DispatchKey::Dense)); + ASSERT_TRUE(dense_lazy_set.has_backend(BackendComponent::LazyBit)); + ASSERT_TRUE(dense_lazy_set.has(DispatchKey::Lazy)); + + // You can think of "Dense/Sparse", and "CPUBit/CUDABit", as "building block" + // dispatch keys. You are allowed to directly create keysets out of them! 
+ auto dense_cpu_set_from_building_blocks = DispatchKeySet(DispatchKey::Dense) | + DispatchKeySet(BackendComponent::CPUBit); + ASSERT_TRUE(dense_cpu_set.has(DispatchKey::Dense)); + ASSERT_TRUE(dense_cpu_set.has_backend(BackendComponent::CPUBit)); + ASSERT_TRUE(dense_cpu_set.has(DispatchKey::CPU)); + ASSERT_EQ(dense_cpu_set, dense_cpu_set_from_building_blocks); + + // Similarly, the AutogradCUDA key gets 2 bits in the keyset: + // The "Autograd" functionality bit, and the "CUDA" backend bit + auto autograd_cuda = DispatchKeySet(DispatchKey::AutogradCUDA); + ASSERT_TRUE(autograd_cuda.has(DispatchKey::AutogradFunctionality)); + ASSERT_TRUE(autograd_cuda.has_backend(BackendComponent::CUDABit)); + + // Because DispatchKeySet uses a condensed internal representation, you cannot + // use it to represent the FULL cross product of backends and functionalities + // for example: + auto autograd_dense_cpu_cuda = DispatchKeySet( + {DispatchKey::AutogradFunctionality, + DispatchKey::Dense, + DispatchKey::CUDA, + DispatchKey::CPU}); + auto fpga = DispatchKeySet(DispatchKey::FPGA); + auto fpga_and_cpu = DispatchKeySet({DispatchKey::FPGA, DispatchKey::CPU}); + // this keyset has all of the building block keys: + ASSERT_TRUE(autograd_dense_cpu_cuda.has(DispatchKey::AutogradFunctionality)); + ASSERT_TRUE(autograd_dense_cpu_cuda.has(DispatchKey::Dense)); + ASSERT_TRUE(autograd_dense_cpu_cuda.has_backend(BackendComponent::CUDABit)); + ASSERT_TRUE(autograd_dense_cpu_cuda.has_backend(BackendComponent::CPUBit)); + + // and it also has the "runtime" keys that correspond to the full + // cross-product of functionality + ASSERT_TRUE(autograd_dense_cpu_cuda.has(DispatchKey::AutogradCPU)); + ASSERT_TRUE(autograd_dense_cpu_cuda.has(DispatchKey::AutogradCPU)); + ASSERT_TRUE(autograd_dense_cpu_cuda.has(DispatchKey::CPU)); + ASSERT_TRUE(autograd_dense_cpu_cuda.has(DispatchKey::CUDA)); + + // This means that there's no way to represent a keyset with, say, only + // Autograd CUDA + Dense CPU. Instead, you should think of a keyset as + // inheriting the full set of functionalities + backends of its keys. This + // means that the below keysets are all indistinguishable from each other. + ASSERT_EQ( + autograd_dense_cpu_cuda, + DispatchKeySet( + {DispatchKey::AutogradCUDA, + DispatchKey::AutogradCPU, + DispatchKey::CUDA, + DispatchKey::CPU})); + ASSERT_EQ( + autograd_dense_cpu_cuda, + DispatchKeySet({DispatchKey::AutogradCUDA, DispatchKey::CPU})); + ASSERT_EQ( + autograd_dense_cpu_cuda, + DispatchKeySet({DispatchKey::CUDA, DispatchKey::AutogradCPU})); + + // ~~~~~~~~~~ DispatchKeySet iterators ~~~~~~~~~~~ + + // Iterators allow you to iterate individually through the DispatchKey's in a + // DispatchKeySet + auto empty_set = DispatchKeySet(); + auto t1 = empty_set.begin(); + auto t2 = empty_set.end(); + ASSERT_EQ(*empty_set.begin(), *empty_set.end()); + + // However, only keys that correspond to actual runtime indices of kernels in + // the operator table show up when you iterate through a keyset. i.e. + // DispatchKey::Dense, and BackendComponent::CPUBit won't show up in an + // iterator. 
+ auto dense_cpu_iter = dense_cpu_set.begin(); + ASSERT_EQ(*dense_cpu_iter++, DispatchKey::CPU); + ASSERT_EQ(*dense_cpu_iter, *dense_cpu_set.end()); + + auto autograd_dense_cpu_cuda_iter = autograd_dense_cpu_cuda.begin(); + ASSERT_EQ(*autograd_dense_cpu_cuda_iter++, DispatchKey::CPU); + ASSERT_EQ(*autograd_dense_cpu_cuda_iter++, DispatchKey::CUDA); + ASSERT_EQ(*autograd_dense_cpu_cuda_iter++, DispatchKey::AutogradCPU); + ASSERT_EQ(*autograd_dense_cpu_cuda_iter++, DispatchKey::AutogradCUDA); + ASSERT_EQ(*autograd_dense_cpu_cuda_iter, *autograd_dense_cpu_cuda.end()); + + // But other "functionality bits" that are not defined per-backend DO get + // their own slots in the operator table. + auto mixed_keyset = DispatchKeySet(BackendComponent::CPUBit) | + DispatchKeySet( + {DispatchKey::FPGA, // runtime key + DispatchKey::Functionalize, // runtime key + DispatchKey::Dense}); // NOT a runtime key + auto mixed_iter = mixed_keyset.begin(); + ASSERT_EQ(*mixed_iter++, DispatchKey::CPU); + ASSERT_EQ(*mixed_iter++, DispatchKey::FPGA); + ASSERT_EQ(*mixed_iter++, DispatchKey::Functionalize); + ASSERT_EQ(*mixed_iter, *mixed_keyset.end()); +} + TEST(DispatchKeySet, Empty) { DispatchKeySet empty_set; - for (uint8_t i = 1; i < static_cast(DispatchKey::NumDispatchKeys); + for (uint8_t i = 0; + i <= static_cast(DispatchKey::EndOfRuntimeBackendKeys); i++) { auto tid = static_cast(i); + if (tid == DispatchKey::Undefined) + continue; ASSERT_FALSE(empty_set.has(tid)); } ASSERT_TRUE(empty_set.empty()); DispatchKeySet empty_set2; ASSERT_TRUE(empty_set == empty_set2); - ASSERT_EQ(empty_set.highestPriorityTypeId(), DispatchKey::Undefined); } -TEST(DispatchKeySet, Singleton) { - for (uint8_t i = 1; i < static_cast(DispatchKey::NumDispatchKeys); - i++) { +// This covers all keys that correspond to a single backend bit, e.g. +// BackendComponent::CPUBit. Even though these are NOT runtime keys, we still +// allow adding them directly to a keyset +TEST(DispatchKeySet, SingletonBackendComponent) { + for (const auto i : c10::irange(1, num_backends)) { + auto tid = static_cast(i); + DispatchKeySet sing(tid); + ASSERT_EQ(sing, sing); + ASSERT_EQ(sing, DispatchKeySet().add(tid)); + ASSERT_EQ(sing, sing.add(tid)); + ASSERT_EQ(sing, sing | sing); + ASSERT_FALSE(sing.empty()); + ASSERT_TRUE(sing.has(tid)); + } +} + +// This covers all keys that correspond to a single functionality bit: +// - runtime, not-per-backend functionality keys, e.g. +// DispatchKey::FuncTorchBatched +// - runtime, "fake backend" keys, e.g. DispatchKey::FPGA +// - NOT-runtime, per-backend functionality keys, e.g. DispatchKey::Dense +// Even though it's not a runtime key, we still allow adding it directly to a +// keyset. +// DispatchKey:: +TEST(DispatchKeySet, SingletonFunctionalityKeys) { + for (const auto i : c10::irange(1, num_functionality_keys)) { auto tid = static_cast(i); DispatchKeySet sing(tid); ASSERT_EQ(sing, sing); @@ -30,47 +168,145 @@ TEST(DispatchKeySet, Singleton) { ASSERT_EQ(sing, sing | sing); ASSERT_FALSE(sing.empty()); ASSERT_TRUE(sing.has(tid)); - ASSERT_EQ(sing.highestPriorityTypeId(), tid); ASSERT_EQ(sing.remove(tid), DispatchKeySet()); } } -TEST(DispatchKeySet, Doubleton) { - for (uint8_t i = 1; i < static_cast(DispatchKey::NumDispatchKeys); +// This covers runtime keys that are per-backend, +// and take up more than one bit in a DispatchKeySet. They take up one +// functionality bit + one backend bit. e.g. 
CPU, CUDA, SparseCPU, SparseCUDA, +// AutogradCPU, AutogradCUDA +TEST(DispatchKeySet, SingletonPerBackendFunctionalityKeys) { + for (uint8_t i = static_cast(DispatchKey::StartOfDenseBackends); + i <= static_cast(DispatchKey::EndOfRuntimeBackendKeys); + i++) { + auto tid = static_cast(i); + // Skip these because they aren't real keys. + if (tid == DispatchKey::StartOfDenseBackends || + tid == DispatchKey::StartOfSparseBackends || + tid == DispatchKey::StartOfQuantizedBackends || + tid == DispatchKey::StartOfAutogradBackends) { + continue; + } + DispatchKeySet sing(tid); + ASSERT_EQ(sing, sing); + ASSERT_EQ(sing, DispatchKeySet().add(tid)); + ASSERT_EQ(sing, sing.add(tid)); + ASSERT_EQ(sing, sing | sing); + ASSERT_FALSE(sing.empty()); + ASSERT_TRUE(sing.has(tid)); + + auto functionality_key = toFunctionalityKey(tid); + auto backend_key = toBackendComponent(tid); + // These two sets should be equivalent: + // DispatchKeySet(DispatchKey::CPU) + // DispatchKeySet({DispatchKey::Dense, BackendComponent::CPUBit}) + auto expected_ks = + DispatchKeySet(functionality_key) | DispatchKeySet(backend_key); + ASSERT_EQ(sing, expected_ks); + // These two sets should be equivalent: + // DispatchKeySet(DispatchKey::CPU).remove(DispatchKey::Dense) + // DispatchKeySet(BackendComponent::CPUBit) + expected_ks = DispatchKeySet(toBackendComponent(tid)); + ASSERT_EQ(sing.remove(tid), expected_ks); + } +} + +TEST(DispatchKeySet, DoubletonPerBackend) { + for (uint8_t i = static_cast(DispatchKey::StartOfDenseBackends); + i <= static_cast(DispatchKey::EndOfRuntimeBackendKeys); i++) { for (uint8_t j = i + 1; - j < static_cast(DispatchKey::NumDispatchKeys); + j <= static_cast(DispatchKey::EndOfRuntimeBackendKeys); j++) { ASSERT_LT(i, j); auto tid1 = static_cast(i); auto tid2 = static_cast(j); - auto doub = DispatchKeySet(tid1).add(tid2); - ASSERT_EQ(doub, DispatchKeySet(tid1) | DispatchKeySet(tid2)); - ASSERT_TRUE(doub.has(tid1)); - ASSERT_TRUE(doub.has(tid2)); - ASSERT_EQ(doub.highestPriorityTypeId(), tid2); // relies on i < j + + // Skip these because they aren't real keys. 
+ if (tid1 == DispatchKey::StartOfDenseBackends || + tid1 == DispatchKey::StartOfSparseBackends || + tid1 == DispatchKey::StartOfQuantizedBackends || + tid1 == DispatchKey::StartOfAutogradBackends) + continue; + if (tid2 == DispatchKey::StartOfDenseBackends || + tid2 == DispatchKey::StartOfSparseBackends || + tid2 == DispatchKey::StartOfQuantizedBackends || + tid2 == DispatchKey::StartOfAutogradBackends) + continue; + + auto backend1 = toBackendComponent(tid1); + auto backend2 = toBackendComponent(tid2); + auto functionality1 = toFunctionalityKey(tid1); + auto functionality2 = toFunctionalityKey(tid2); + + auto combined = DispatchKeySet({tid1, tid2}); + // The combined set has the backend bits + ASSERT_TRUE(combined.has_backend(backend1)); + ASSERT_TRUE(combined.has_backend(backend2)); + // and it has the functionality bits + ASSERT_TRUE(combined.has(functionality1)); + ASSERT_TRUE(combined.has(functionality2)); + // and it has the original two runtime keys + ASSERT_TRUE(combined.has(tid1)); + ASSERT_TRUE(combined.has(tid2)); + + // Add all of the keys in the keyset to a real set + std::unordered_set visited_keys; + auto iter = combined.begin(); + while (*iter != *combined.end()) { + visited_keys.insert(*iter); + ++iter; + } + std::unordered_set expected_keys; + expected_keys.insert( + toRuntimePerBackendFunctionalityKey(functionality1, backend1)); + expected_keys.insert( + toRuntimePerBackendFunctionalityKey(functionality1, backend2)); + expected_keys.insert( + toRuntimePerBackendFunctionalityKey(functionality2, backend1)); + expected_keys.insert( + toRuntimePerBackendFunctionalityKey(functionality2, backend2)); + ASSERT_EQ(expected_keys, visited_keys); + + if (backend1 == backend2 || functionality1 == functionality2) { + // We have two runtime keys, with either the same backend or the same + // per-backend functionalities. E.g. {AutogradCUDA, CUDA} or + // {AutogradCPU, AutogradCUDA}. There should be 2 total runtime keys in + // this set. + ASSERT_EQ(2, visited_keys.size()); + } else { + // since i and j are different keys, they should not have the same + // functionality and backend + ASSERT_TRUE(backend1 != backend2 && functionality1 != functionality2); + // We have two runtime keys that have different backends + per-backend + // functionalities. So we should expect the full cross product of + // runtime keys to be in the set. e.g.
if i = AutogradCUDA, and j = CPU, + // then combined = {AutogradCUDA, AutogradCPU, CUDA, CPU} + ASSERT_EQ(4, visited_keys.size()); + } } } } TEST(DispatchKeySet, Full) { DispatchKeySet full(DispatchKeySet::FULL); - for (uint8_t i = 1; i < static_cast(DispatchKey::NumDispatchKeys); - i++) { + for (const auto i : c10::irange(1, num_functionality_keys)) { auto tid = static_cast(i); ASSERT_TRUE(full.has(tid)); } + ASSERT_FALSE(full.has(DispatchKey::EndOfFunctionalityKeys)); } TEST(DispatchKeySet, IteratorBasicOps) { DispatchKeySet empty_set; DispatchKeySet full_set(DispatchKeySet::FULL); - DispatchKeySet mutated_set = empty_set.add(static_cast(1)); + DispatchKeySet mutated_set = empty_set.add(DispatchKey::CPU); // Constructor + Comparison - ASSERT_EQ(*empty_set.begin(), DispatchKey::NumDispatchKeys); - ASSERT_EQ(*empty_set.end(), DispatchKey::NumDispatchKeys); - ASSERT_EQ(*mutated_set.begin(), static_cast(1)); + ASSERT_EQ(*empty_set.begin(), DispatchKey::EndOfFunctionalityKeys); + ASSERT_EQ(*empty_set.end(), DispatchKey::EndOfFunctionalityKeys); + ASSERT_EQ(*mutated_set.begin(), DispatchKey::CPU); ASSERT_TRUE(empty_set.begin() == empty_set.end()); ASSERT_TRUE(full_set.begin() != full_set.end()); @@ -80,6 +316,25 @@ TEST(DispatchKeySet, IteratorBasicOps) { ASSERT_TRUE(full_set.begin() != ++full_set.begin()); } +TEST(DispatchKeySet, getHighestPriorityBackendTypeId) { + // AutogradCPU isn't a backend key so it is ignored + DispatchKeySet dense_cpu({DispatchKey::AutogradCPU, DispatchKey::CPU}); + ASSERT_EQ(DispatchKey::CPU, c10::highestPriorityBackendTypeId(dense_cpu)); + + // Functionalize isn't a backend key so it is ignored + DispatchKeySet sparse_cuda( + {DispatchKey::Functionalize, DispatchKey::SparseCUDA}); + ASSERT_EQ( + DispatchKey::SparseCUDA, c10::highestPriorityBackendTypeId(sparse_cuda)); + + // quantizedCUDA has higher priority than CUDA + DispatchKeySet quantized_cuda( + {DispatchKey::CUDA, DispatchKey::QuantizedCUDA}); + ASSERT_EQ( + DispatchKey::QuantizedCUDA, + c10::highestPriorityBackendTypeId(quantized_cuda)); +} + TEST(DispatchKeySet, IteratorEmpty) { DispatchKeySet empty_set; uint8_t i = 0; @@ -90,16 +345,37 @@ TEST(DispatchKeySet, IteratorEmpty) { ASSERT_EQ(i, 0); } +TEST(DispatchKeySet, IteratorCrossProduct) { + // The iterator should return all runtime keys in the set, + // including the cross product of {backends} x {functionalities} + auto ks = + DispatchKeySet({BackendComponent::CPUBit, BackendComponent::CUDABit}) | + DispatchKeySet( + {DispatchKey::Dense, + DispatchKey::FPGA, + DispatchKey::AutogradFunctionality}); + + auto iter = ks.begin(); + // iterate through dense backends first. + ASSERT_EQ(DispatchKey::CPU, *(iter++)); + ASSERT_EQ(DispatchKey::CUDA, *(iter++)); + // FPGA doesn't have a backend bit, so it isn't included in the cross product. + ASSERT_EQ(DispatchKey::FPGA, *(iter++)); + // iterate through the autograd keys laster. + ASSERT_EQ(DispatchKey::AutogradCPU, *(iter++)); + ASSERT_EQ(DispatchKey::AutogradCUDA, *(iter++)); +} + TEST(DispatchKeySet, IteratorFull) { DispatchKeySet full_set(DispatchKeySet::FULL); uint8_t i = 0; for (const auto& it : full_set) { i++; - ASSERT_TRUE(it == static_cast(i)); - ASSERT_TRUE(it != DispatchKey::NumDispatchKeys); } - ASSERT_EQ(i, static_cast(DispatchKey::NumDispatchKeys) - 1); + // Total # of runtime entries includes an entry for DispatchKey::Undefined, + // which is not included when iterating through the DispatchKeySet. 
+ ASSERT_EQ(i, num_runtime_entries - 1); } TEST(DispatchKeySet, IteratorRangeFull) { @@ -108,41 +384,61 @@ TEST(DispatchKeySet, IteratorRangeFull) { for (DispatchKey dispatch_key : full_set) { i++; - ASSERT_TRUE(dispatch_key == static_cast(i)); } - ASSERT_EQ(i, static_cast(DispatchKey::NumDispatchKeys) - 1); -} - -TEST(DispatchKeySet, SpecificKeys) { - DispatchKeySet keyset({ - static_cast(0), // Undefined should be ignored - static_cast(4), - static_cast(10), - static_cast(15), - }); - std::unordered_set visited_keys; - - for (DispatchKey key : keyset) { - visited_keys.insert(key); - } - - ASSERT_EQ(visited_keys.size(), 3); - ASSERT_TRUE( - visited_keys.find(static_cast(4)) != visited_keys.end()); - ASSERT_TRUE( - visited_keys.find(static_cast(10)) != visited_keys.end()); - ASSERT_TRUE( - visited_keys.find(static_cast(15)) != visited_keys.end()); + // Total # of runtime entries includes an entry for DispatchKey::Undefined, + // which is not included when iterating through the DispatchKeySet. + ASSERT_EQ(i, num_runtime_entries - 1); } TEST(DispatchKeySet, FailAtEndIterator) { DispatchKeySet full_set(DispatchKeySet::FULL); uint64_t raw_repr = full_set.raw_repr(); + // doesn't throw + DispatchKeySet::iterator(&raw_repr, num_backends + num_functionality_keys); // NOLINTNEXTLINE(cppcoreguidelines-avoid-goto,hicpp-avoid-goto) EXPECT_THROW( DispatchKeySet::iterator( - &raw_repr, static_cast(DispatchKey::NumDispatchKeys) + 1), + &raw_repr, num_backends + num_functionality_keys + 1), c10::Error); } + +TEST(DispatchKeySet, TestKeyOrderingInvariants) { + for (uint8_t i = static_cast(DispatchKey::StartOfDenseBackends); + i <= static_cast(DispatchKey::EndOfRuntimeBackendKeys); + i++) { + auto k = static_cast(i); + // Note [The Ordering of Per-Backend Dispatch Keys Matters!] + // The DispatchKey enum includes all of the runtime keys for + // Dense/Sparse/Quantized/Autograd, (e.g. CPU, CUDA, SparseCPU, SparseCUDA, + // AutogradCPU, AutogradCUDA, etc). And we expect the ordering of those keys + // to be the same as the ordering of the backends in the `BackendComponent` + // enum. This makes several utilities in `DispatchKey.h` and + // `DispatchKeySet.h` significantly easier to implement. The purpose of the + // test is to assert (through CI) that this invariant is maintained. + // + // The only way that we can really check this invariant is by + // comparing the string names of each enum. + // We only really care about the ordering for "real" keys that are actually + // used, which we expect to be able to print properly. This saves us from + // having to enumerate the full set of possible runtime keys in + // DispatchKey::toString(). It also relies on toString() being implemented + // correctly. + auto functionality_str = std::string(toString(k)); + if (functionality_str == "UNKNOWN_TENSOR_TYPE_ID") + continue; + + auto computed_backend_k = toBackendComponent(k); + auto computed_backend_str = std::string(toString(computed_backend_k)); + // Skip, e.g., the "Bit" from "CPUBit" + computed_backend_str = + computed_backend_str.substr(0, computed_backend_str.size() - 3); + + ASSERT_TRUE( + functionality_str.find(computed_backend_str) != std::string::npos) + << "DispatchKey invariant broken! Found a key that is not ordered correctly" + << " with its backend bit. 
key = " << toString(k) << ", " << k + << ", computed backend = " << toString(computed_backend_k); + } +} diff --git a/c10/test/util/Synchronized_test.cpp b/c10/test/util/Synchronized_test.cpp new file mode 100644 index 00000000000000..ce781a10cadb4c --- /dev/null +++ b/c10/test/util/Synchronized_test.cpp @@ -0,0 +1,43 @@ +#include +#include + +#include +#include + +namespace { + +TEST(Synchronized, TestSingleThreadExecution) { + c10::Synchronized iv(0); + const int kMaxValue = 100; + for (int i = 0; i < kMaxValue; ++i) { + auto ret = iv.withLock([](int& iv) { return ++iv; }); + EXPECT_EQ(ret, i + 1); + } + + iv.withLock([kMaxValue](int& iv) { EXPECT_EQ(iv, kMaxValue); }); +} + +TEST(Synchronized, TestMultiThreadedExecution) { + c10::Synchronized iv(0); +#define NUM_LOOP_INCREMENTS 10000 + + auto thread_cb = [&iv]() { + for (int i = 0; i < NUM_LOOP_INCREMENTS; ++i) { + iv.withLock([](int& iv) { ++iv; }); + } + }; + + std::array threads; + for (auto& t : threads) { + t = std::thread(thread_cb); + } + + for (auto& t : threads) { + t.join(); + } + + iv.withLock([](int& iv) { EXPECT_EQ(iv, NUM_LOOP_INCREMENTS * 10); }); +#undef NUM_LOOP_INCREMENTS +} + +} // namespace diff --git a/c10/test/util/ordered_preserving_dict_test.cpp b/c10/test/util/ordered_preserving_dict_test.cpp index 773b2e7a2a35b3..aa1d7f0f986eda 100644 --- a/c10/test/util/ordered_preserving_dict_test.cpp +++ b/c10/test/util/ordered_preserving_dict_test.cpp @@ -48,7 +48,7 @@ dict_int_int test_dict(dict_int_int& dict) { } dict.erase(begin, end); - std::vector order; + std::vector order; for (const auto i : c10::irange(100)) { if (!erase_set.count(i)) { order.push_back(i); @@ -211,12 +211,12 @@ TEST(OrderedPreservingDictTest, test_range_erase) { using HMap = ska_ordered::order_preserving_flat_hash_map; - const std::size_t nb_values = 1000; + const int64_t nb_values = 1000; HMap map; for (const auto i : c10::irange(nb_values)) { map[c10::guts::to_string(i)] = i; auto begin = map.begin(); - for (size_t j = 0; j <= i; ++j, begin++) { + for (int64_t j = 0; j <= i; ++j, begin++) { TORCH_INTERNAL_ASSERT(begin->second == j); } } diff --git a/c10/util/Half.h b/c10/util/Half.h index f74dc89bb0ef7d..a877efe9d2ca30 100644 --- a/c10/util/Half.h +++ b/c10/util/Half.h @@ -392,28 +392,32 @@ struct alignas(2) Half { #endif }; -// This is just a placeholder for whatever complex representation we -// end up deciding to use for half-precision complex numbers. 
+// TODO : move to complex.h template <> struct alignas(4) complex { - using value_type = Half; Half real_; Half imag_; + + // Constructors complex() = default; - Half real() const { + // Half constructor is not constexpr so the following constructor can't + // be constexpr + C10_HOST_DEVICE explicit inline complex(const Half& real, const Half& imag) + : real_(real), imag_(imag) {} + C10_HOST_DEVICE explicit inline complex(const c10::complex& value) + : real_(value.real()), imag_(value.imag()) {} + + // Conversion operator + inline C10_HOST_DEVICE operator c10::complex() const { + return {real_, imag_}; + } + + constexpr C10_HOST_DEVICE Half real() const { return real_; } - Half imag() const { + constexpr C10_HOST_DEVICE Half imag() const { return imag_; } - explicit inline complex(c10::complex value) - : real_(value.real()), imag_(value.imag()) {} - explicit inline complex(c10::complex value) - : real_(static_cast(value.real())), - imag_(static_cast(value.imag())) {} - inline operator c10::complex() const { - return {real_, imag_}; - } }; // In some versions of MSVC, there will be a compiler error when building. diff --git a/c10/util/LeftRight.h b/c10/util/LeftRight.h index 13529f2ea0c780..e45267cb8f7e36 100644 --- a/c10/util/LeftRight.h +++ b/c10/util/LeftRight.h @@ -1,4 +1,5 @@ #include +#include #include #include #include @@ -192,13 +193,9 @@ class LeftRight final { // read-write lock to protect T (data). template class RWSafeLeftRightWrapper final { - using mutexType = std::mutex; - using rLockType = std::unique_lock; - using wLockType = std::unique_lock; - public: template - explicit RWSafeLeftRightWrapper(const Args&... args) : _data{args...} {} + explicit RWSafeLeftRightWrapper(const Args&... args) : data_{args...} {} // RWSafeLeftRightWrapper is not copyable or moveable since LeftRight // is not copyable or moveable. @@ -209,19 +206,17 @@ class RWSafeLeftRightWrapper final { template auto read(F&& readFunc) const -> typename std::result_of::type { - rLockType lock(mutex_); - return readFunc(_data); + return data_.withLock( + [&readFunc](T const& data) { return readFunc(data); }); } template auto write(F&& writeFunc) -> typename std::result_of::type { - wLockType lock(mutex_); - return writeFunc(_data); + return data_.withLock([&writeFunc](T& data) { return writeFunc(data); }); } private: - T _data; - mutable mutexType mutex_; + c10::Synchronized data_; }; } // namespace c10 diff --git a/c10/util/OptionalArrayRef.h b/c10/util/OptionalArrayRef.h new file mode 100644 index 00000000000000..7ca375d7cb785e --- /dev/null +++ b/c10/util/OptionalArrayRef.h @@ -0,0 +1,228 @@ +// This file defines OptionalArrayRef, a class that has almost the same +// exact functionality as c10::optional>, except that its +// converting constructor fixes a dangling pointer issue. +// +// The implicit converting constructor of both c10::optional> and +// std::optional> can cause the underlying ArrayRef to store +// a dangling pointer. OptionalArrayRef prevents this by wrapping +// a c10::optional> and fixing the constructor implementation. +// +// See https://github.com/pytorch/pytorch/issues/63645 for more on this. 
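As an illustrative sketch of the hazard described above (not from the patch; the function name is made up), the converting constructor of c10::optional can silently capture a temporary through ArrayRef's non-owning view:

    #include <c10/util/ArrayRef.h>
    #include <c10/util/Optional.h>
    #include <cstdint>
    #include <vector>

    c10::optional<c10::IntArrayRef> sizes_or_nullopt(bool have_sizes) {
      if (!have_sizes) {
        return c10::nullopt;
      }
      // BUG: the ArrayRef stored in the optional points into a temporary vector
      // that is destroyed at the end of this return statement, so the caller
      // receives a dangling view. OptionalArrayRef exists to make this mistake
      // harder to write.
      return std::vector<int64_t>{1, 2, 3};
    }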
+ +#pragma once + +#include +#include + +namespace c10 { + +template +class OptionalArrayRef final { + public: + // Constructors + + constexpr OptionalArrayRef() noexcept {} + + constexpr OptionalArrayRef(nullopt_t) noexcept {} + + OptionalArrayRef(const OptionalArrayRef& other) = default; + + OptionalArrayRef(OptionalArrayRef&& other) = default; + + constexpr OptionalArrayRef(const optional>& other) noexcept + : wrapped_opt_array_ref(other) {} + + constexpr OptionalArrayRef(optional>&& other) noexcept + : wrapped_opt_array_ref(other) {} + + constexpr OptionalArrayRef(const T& value) noexcept + : wrapped_opt_array_ref(value) {} + + template < + typename U = ArrayRef, + std::enable_if_t< + !std::is_same, OptionalArrayRef>::value && + !std::is_same, in_place_t>::value && + std::is_constructible, U&&>::value && + std::is_convertible>::value && + !std::is_convertible::value, + bool> = false> + constexpr OptionalArrayRef(U&& value) noexcept( + std::is_nothrow_constructible, U&&>::value) + : wrapped_opt_array_ref(value) {} + + template < + typename U = ArrayRef, + std::enable_if_t< + !std::is_same, OptionalArrayRef>::value && + !std::is_same, in_place_t>::value && + std::is_constructible, U&&>::value && + !std::is_convertible>::value, + bool> = false> + constexpr explicit OptionalArrayRef(U&& value) noexcept( + std::is_nothrow_constructible, U&&>::value) + : wrapped_opt_array_ref(value) {} + + template + constexpr explicit OptionalArrayRef(in_place_t ip, Args&&... args) noexcept + : wrapped_opt_array_ref(ip, args...) {} + + template + constexpr explicit OptionalArrayRef( + in_place_t ip, + std::initializer_list il, + Args&&... args) + : wrapped_opt_array_ref(ip, il, args...) {} + + // Destructor + + ~OptionalArrayRef() = default; + + // Assignment + + constexpr OptionalArrayRef& operator=(nullopt_t) noexcept { + wrapped_opt_array_ref = c10::nullopt; + return *this; + } + + OptionalArrayRef& operator=(const OptionalArrayRef& other) = default; + + OptionalArrayRef& operator=(OptionalArrayRef&& other) = default; + + constexpr OptionalArrayRef& operator=( + const optional>& other) noexcept { + wrapped_opt_array_ref = other; + return *this; + } + + constexpr OptionalArrayRef& operator=( + optional>&& other) noexcept { + wrapped_opt_array_ref = other; + return *this; + } + + template > + constexpr std::enable_if_t< + !std::is_same, OptionalArrayRef>::value && + std::is_constructible, U&&>::value && + std::is_assignable&, U&&>::value, + OptionalArrayRef&> + operator=(U&& value) noexcept( + std::is_nothrow_constructible, U&&>::value&& + std::is_nothrow_assignable&, U&&>::value) { + wrapped_opt_array_ref = value; + return *this; + } + + // Observers + + constexpr ArrayRef* operator->() noexcept { + return &wrapped_opt_array_ref.value(); + } + + constexpr const ArrayRef* operator->() const noexcept { + return &wrapped_opt_array_ref.value(); + } + + constexpr ArrayRef& operator*() & noexcept { + return wrapped_opt_array_ref.value(); + } + + constexpr const ArrayRef& operator*() const& noexcept { + return wrapped_opt_array_ref.value(); + } + + constexpr ArrayRef&& operator*() && noexcept { + return std::move(wrapped_opt_array_ref.value()); + } + + constexpr const ArrayRef&& operator*() const&& noexcept { + return std::move(wrapped_opt_array_ref.value()); + } + + constexpr explicit operator bool() const noexcept { + return wrapped_opt_array_ref.has_value(); + } + + constexpr bool has_value() const noexcept { + return wrapped_opt_array_ref.has_value(); + } + + constexpr ArrayRef& value() & { + return 
wrapped_opt_array_ref.value(); + } + + constexpr const ArrayRef& value() const& { + return wrapped_opt_array_ref.value(); + } + + constexpr ArrayRef&& value() && { + return std::move(wrapped_opt_array_ref.value()); + } + + constexpr const ArrayRef&& value() const&& { + return std::move(wrapped_opt_array_ref.value()); + } + + template + constexpr std:: + enable_if_t>::value, ArrayRef> + value_or(U&& default_value) const& { + return wrapped_opt_array_ref.value_or(default_value); + } + + template + constexpr std:: + enable_if_t>::value, ArrayRef> + value_or(U&& default_value) && { + return wrapped_opt_array_ref.value_or(default_value); + } + + // Modifiers + + constexpr void swap(OptionalArrayRef& other) noexcept { + std::swap(wrapped_opt_array_ref, other.wrapped_opt_array_ref); + } + + constexpr void reset() noexcept { + wrapped_opt_array_ref.reset(); + } + + template + constexpr std::enable_if_t< + std::is_constructible, Args&&...>::value, + ArrayRef&> + emplace(Args&&... args) noexcept( + std::is_nothrow_constructible, Args&&...>::value) { + return wrapped_opt_array_ref.emplace(args...); + } + + template + constexpr ArrayRef& emplace( + std::initializer_list il, + Args&&... args) noexcept { + return wrapped_opt_array_ref.emplace(il, args...); + } + + private: + optional> wrapped_opt_array_ref; +}; + +using OptionalIntArrayRef = OptionalArrayRef; + +inline bool operator==( + const OptionalIntArrayRef& a1, + const IntArrayRef& other) { + if (!a1.has_value()) { + return false; + } + return a1.value() == other; +} + +inline bool operator==( + const c10::IntArrayRef& a1, + const c10::OptionalIntArrayRef& a2) { + return a2 == a1; +} + +} // namespace c10 diff --git a/c10/util/Synchronized.h b/c10/util/Synchronized.h index 205ded5a5e1f13..1679d7060fe05c 100644 --- a/c10/util/Synchronized.h +++ b/c10/util/Synchronized.h @@ -42,9 +42,9 @@ class Synchronized final { * provided callback safely. */ template - void withLock(CB cb) { + typename std::result_of::type withLock(CB cb) { std::lock_guard guard(this->mutex_); - cb(this->data_); + return cb(this->data_); } /** @@ -53,9 +53,9 @@ class Synchronized final { * the provided callback safely. 
*/ template - void withLock(CB cb) const { + typename std::result_of::type withLock(CB cb) const { std::lock_guard guard(this->mutex_); - cb(this->data_); + return cb(this->data_); } }; } // end namespace c10 diff --git a/c10/util/TypeCast.h b/c10/util/TypeCast.h index 86c5c9f62231c4..1c6a72bab4926f 100644 --- a/c10/util/TypeCast.h +++ b/c10/util/TypeCast.h @@ -45,7 +45,8 @@ struct static_cast_with_inter_type { C10_HOST_DEVICE __ubsan_ignore_undefined__ static inline dest_t apply( src_t src) { constexpr bool real = needs_real::value; - return static_cast(maybe_real::apply(src)); + auto r = maybe_real::apply(src); + return static_cast(r); } }; @@ -68,6 +69,36 @@ struct static_cast_with_inter_type { } }; +template <> +struct static_cast_with_inter_type, c10::BFloat16> { + C10_HOST_DEVICE __ubsan_ignore_undefined__ static inline c10::complex< + c10::Half> + apply(c10::BFloat16 src) { + return static_cast>(c10::complex{src}); + } +}; + +template <> +struct static_cast_with_inter_type, c10::Half> { + C10_HOST_DEVICE __ubsan_ignore_undefined__ static inline c10::complex< + c10::Half> + apply(c10::Half src) { + return static_cast>(c10::complex{src}); + } +}; + +template <> +struct static_cast_with_inter_type< + c10::complex, + c10::complex> { + C10_HOST_DEVICE __ubsan_ignore_undefined__ static inline c10::complex< + c10::Half> + apply(c10::complex src) { + return static_cast>( + static_cast>(src)); + } +}; + // Dynamic type casting utils: // - fetch_and_cast // - cast_and_store @@ -130,7 +161,7 @@ C10_HOST_DEVICE inline dest_t fetch_and_cast( const ScalarType src_type, const void* ptr) { switch (src_type) { - AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(FETCH_AND_CAST_CASE) + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(FETCH_AND_CAST_CASE) default: ERROR_UNSUPPORTED_CAST } @@ -149,7 +180,7 @@ C10_HOST_DEVICE inline void cast_and_store( void* ptr, src_t value) { switch (dest_type) { - AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(CAST_AND_STORE_CASE) + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(CAST_AND_STORE_CASE) default:; } ERROR_UNSUPPORTED_CAST diff --git a/c10/util/accumulate.h b/c10/util/accumulate.h index 086a7977401c52..8d0cc49c8ecbd6 100644 --- a/c10/util/accumulate.h +++ b/c10/util/accumulate.h @@ -82,7 +82,7 @@ template < inline int64_t numelements_from_dim(const int k, const C& dims) { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(k >= 0); - if (k > dims.size()) { + if (k > static_cast(dims.size())) { return 1; } else { auto cbegin = dims.cbegin(); diff --git a/c10/util/int128.cpp b/c10/util/int128.cpp index a080e73430b365..f83dba49983363 100644 --- a/c10/util/int128.cpp +++ b/c10/util/int128.cpp @@ -171,7 +171,7 @@ std::ostream& operator<<(std::ostream& o, const uint128& b) { // Add the requisite padding. 
std::streamsize width = o.width(0); - if (width > rep.size()) { + if (width > static_cast(rep.size())) { if ((flags & std::ios::adjustfield) == std::ios::left) { rep.append(width - rep.size(), o.fill()); } else { diff --git a/c10/util/safe_numerics.h b/c10/util/safe_numerics.h new file mode 100644 index 00000000000000..7eb9ed39395d86 --- /dev/null +++ b/c10/util/safe_numerics.h @@ -0,0 +1,74 @@ +#pragma once +#include +#include + +#include +#include +#include + +// GCC has __builtin_mul_overflow from before it supported __has_builtin +#ifdef _MSC_VER +#define C10_HAS_BUILTIN_OVERFLOW() (0) +#include +#include +#else +#define C10_HAS_BUILTIN_OVERFLOW() (1) +#endif + +namespace c10 { + +C10_ALWAYS_INLINE bool add_overflows(uint64_t a, uint64_t b, uint64_t* out) { +#if C10_HAS_BUILTIN_OVERFLOW() + return __builtin_add_overflow(a, b, out); +#else + unsigned long long tmp; + auto carry = _addcarry_u64(0, a, b, &tmp); + *out = tmp; + return carry; +#endif +} + +C10_ALWAYS_INLINE bool mul_overflows(uint64_t a, uint64_t b, uint64_t* out) { +#if C10_HAS_BUILTIN_OVERFLOW() + return __builtin_mul_overflow(a, b, out); +#else + *out = a * b; + // This test isnt exact, but avoids doing integer division + return ( + (c10::llvm::countLeadingZeros(a) + c10::llvm::countLeadingZeros(b)) < 64); +#endif +} + +template +bool safe_multiplies_u64(It first, It last, uint64_t* out) { +#if C10_HAS_BUILTIN_OVERFLOW() + uint64_t prod = 1; + bool overflow = false; + for (; first != last; ++first) { + overflow |= c10::mul_overflows(prod, *first, &prod); + } + *out = prod; + return overflow; +#else + uint64_t prod = 1; + uint64_t prod_log2 = 0; + bool is_zero = false; + for (; first != last; ++first) { + auto x = static_cast(*first); + prod *= x; + // log2(0) isn't valid, so need to track it specially + is_zero |= (x == 0); + prod_log2 += c10::llvm::Log2_64_Ceil(x); + } + *out = prod; + // This test isnt exact, but avoids doing integer division + return !is_zero && (prod_log2 >= 64); +#endif +} + +template +bool safe_multiplies_u64(const Container& c, uint64_t* out) { + return safe_multiplies_u64(c.begin(), c.end(), out); +} + +} // namespace c10 diff --git a/caffe2/CMakeLists.txt b/caffe2/CMakeLists.txt index c636cd18c0a5a4..b44ea8150f6eeb 100644 --- a/caffe2/CMakeLists.txt +++ b/caffe2/CMakeLists.txt @@ -350,6 +350,13 @@ if(NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE) "${TORCH_SRC_DIR}/csrc/autograd/generated/ADInplaceOrViewType_0.cpp" "${TORCH_SRC_DIR}/csrc/autograd/generated/ADInplaceOrViewType_1.cpp" ) + if(BUILD_LAZY_TS_BACKEND) + list(APPEND GENERATED_CXX_TORCH + "${TORCH_SRC_DIR}/csrc/lazy/generated/LazyNativeFunctions.cpp" + "${TORCH_SRC_DIR}/csrc/lazy/generated/RegisterAutogradLazy.cpp" + "${TORCH_SRC_DIR}/csrc/lazy/generated/RegisterLazy.cpp" + ) + endif() endif() set(GENERATED_H_TORCH @@ -360,6 +367,8 @@ if(NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE) if(NOT INTERN_DISABLE_AUTOGRAD) list(APPEND GENERATED_H_TORCH "${TORCH_SRC_DIR}/csrc/autograd/generated/VariableType.h" + "${TORCH_SRC_DIR}/csrc/lazy/generated/LazyIr.h" + "${TORCH_SRC_DIR}/csrc/lazy/generated/LazyNativeFunctions.h" ) endif() @@ -397,18 +406,31 @@ if(NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE) ${GENERATED_TESTING_PYTHON} ) + set(GEN_PER_OPERATOR_FLAG) + if(USE_PER_OPERATOR_HEADERS) + list(APPEND GEN_PER_OPERATOR_FLAG "--per_operator_headers") + endif() + add_custom_command( OUTPUT ${TORCH_GENERATED_CODE} COMMAND "${PYTHON_EXECUTABLE}" tools/setup_helpers/generate_code.py --native-functions-path 
"aten/src/ATen/native/native_functions.yaml" - --nn-path "aten/src" $<$:--disable-autograd> $<$:--selected-op-list-path="${SELECTED_OP_LIST}"> --force_schema_registration + --gen_lazy_ts_backend + ${GEN_PER_OPERATOR_FLAG} DEPENDS "${TORCH_ROOT}/aten/src/ATen/native/native_functions.yaml" + "${TORCH_ROOT}/aten/src/ATen/native/ts_native_functions.yaml" + "${TORCH_ROOT}/torch/csrc/lazy/core/shape_inference.h" + "${TORCH_ROOT}/torch/csrc/lazy/ts_backend/ts_native_functions.cpp" + "${TORCH_ROOT}/aten/src/ATen/templates/DispatchKeyNativeFunctions.h" + "${TORCH_ROOT}/aten/src/ATen/templates/DispatchKeyNativeFunctions.cpp" + "${TORCH_ROOT}/aten/src/ATen/templates/LazyIr.h" + "${TORCH_ROOT}/aten/src/ATen/templates/RegisterDispatchKey.cpp" "${TOOLS_PATH}/autograd/templates/VariableType.h" "${TOOLS_PATH}/autograd/templates/VariableType.cpp" "${TOOLS_PATH}/autograd/templates/ADInplaceOrViewType.cpp" @@ -436,6 +458,10 @@ if(NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE) "${TOOLS_PATH}/autograd/gen_variable_type.py" "${TOOLS_PATH}/autograd/gen_inplace_or_view_type.py" "${TOOLS_PATH}/autograd/load_derivatives.py" + "${TOOLS_PATH}/codegen/gen_backend_stubs.py" + "${TOOLS_PATH}/codegen/gen_lazy_tensor.py" + "${TOOLS_PATH}/codegen/api/lazy.py" + "${TOOLS_PATH}/codegen/dest/lazy_ir.py" WORKING_DIRECTORY "${TORCH_ROOT}") @@ -475,7 +501,9 @@ if(NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE) set(CMAKE_POSITION_INDEPENDENT_CODE TRUE) else() append_filelist("libtorch_cmake_sources" LIBTORCH_CMAKE_SRCS) - + if(BUILD_LAZY_TS_BACKEND) + append_filelist("lazy_tensor_ts_sources" LIBTORCH_CMAKE_SRCS) + endif() if(CMAKE_CXX_COMPILER_ID MATCHES "Clang" OR CMAKE_CXX_COMPILER_ID STREQUAL "GNU") # TODO: Delete this line once https://github.com/pytorch/pytorch/pull/55889 lands set_source_files_properties(../torch/csrc/jit/serialization/export.cpp PROPERTIES COMPILE_FLAGS -Wno-deprecated-declarations) @@ -904,15 +932,26 @@ elseif(USE_CUDA) if(BUILD_LAZY_CUDA_LINALG) add_library(torch_cuda_linalg ${ATen_CUDA_LINALG_SRCS}) target_compile_definitions(torch_cuda_linalg PRIVATE USE_CUDA BUILD_LAZY_CUDA_LINALG) + # Library order is important during static linking + # `torch::magma` should be mentioned before other CUDA + # to transitively include all symbols present in torch_cuda/torch_cpu + if(USE_MAGMA) + target_link_libraries(torch_cuda_linalg PRIVATE torch::magma) + # CUDAHooks reports version of MAGMA PyTorch was compiled against, i.e. needs to be able to include magma headers + get_target_property(HOOKS_INCLUDE_DIRECTORIES torch_cuda INCLUDE_DIRECTORIES) + if(NOT "${MAGMA_INCLUDE_DIR}" IN_LIST HOOKS_INCLUDE_DIRECTORIES) + set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/cuda/detail/CUDAHooks.cpp PROPERTIES INCLUDE_DIRECTORIES "${MAGMA_INCLUDE_DIR}") + endif() + endif() target_link_libraries(torch_cuda_linalg PRIVATE torch_cpu torch_cuda ${CUDA_cusolver_LIBRARY} ) - if(USE_MAGMA) - target_link_libraries(torch_cuda_linalg PRIVATE torch::magma) - # CUDAHooks reports version of MAGMA PyTorch was compiled against, i.e. needs to be able to include magma headers - set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/cuda/detail/CUDAHooks.cpp PROPERTIES INCLUDE_DIRECTORIES "${MAGMA_INCLUDE_DIR}") + # NS: TODO, is this really necessary? 
+ if(USE_MAGMA AND CAFFE2_STATIC_LINK_CUDA) + target_link_libraries(torch_cuda_linalg PRIVATE + "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos.a" dl) endif() set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp PROPERTIES COMPILE_FLAGS "-DBUILD_LAZY_CUDA_LINALG") install(TARGETS torch_cuda_linalg DESTINATION "${TORCH_INSTALL_LIB_DIR}") @@ -930,59 +969,7 @@ elseif(USE_CUDA) endif() if(USE_CUDA OR USE_ROCM) - if(BUILD_SPLIT_CUDA) - set(TORCHLIB_FLAVOR torch_cuda_cu) # chose torch_cuda_cu here since JIT is in torch_cuda_cpp - elseif(USE_CUDA) - set(TORCHLIB_FLAVOR torch_cuda) - elseif(USE_ROCM) - set(TORCHLIB_FLAVOR torch_hip) - endif() - - # The list of NVFUSER runtime files - list(APPEND NVFUSER_RUNTIME_FILES - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/block_reduction.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/block_sync_atomic.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/block_sync_default.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/broadcast.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/fp16_support.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/bf16_support.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/grid_broadcast.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/grid_reduction.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/grid_sync.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/helpers.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/index_utils.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/random_numbers.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/tensor.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/welford.cu - ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/warp.cu - ${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh - ${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/cuda/detail/UnpackRaw.cuh - ) - - file(MAKE_DIRECTORY "${CMAKE_BINARY_DIR}/include/nvfuser_resources") - - # "stringify" NVFUSER runtime sources - # (generate C++ header files embedding the original input as a string literal) - set(NVFUSER_STRINGIFY_TOOL "${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/tools/stringify_file.py") - foreach(src ${NVFUSER_RUNTIME_FILES}) - get_filename_component(filename ${src} NAME_WE) - set(dst "${CMAKE_BINARY_DIR}/include/nvfuser_resources/${filename}.h") - add_custom_command( - COMMENT "Stringify NVFUSER runtime source file" - OUTPUT ${dst} - DEPENDS ${src} - COMMAND ${PYTHON_EXECUTABLE} ${NVFUSER_STRINGIFY_TOOL} -i ${src} -o ${dst} - ) - add_custom_target(nvfuser_rt_${filename} DEPENDS ${dst}) - add_dependencies(${TORCHLIB_FLAVOR} nvfuser_rt_${filename}) - - # also generate the resource headers during the configuration step - # (so tools like clang-tidy can run w/o requiring a real build) - execute_process(COMMAND - ${PYTHON_EXECUTABLE} ${NVFUSER_STRINGIFY_TOOL} -i ${src} -o ${dst}) - endforeach() - - target_include_directories(${TORCHLIB_FLAVOR} PRIVATE "${CMAKE_BINARY_DIR}/include") + include(${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/nvfuser.cmake) endif() if(NOT MSVC AND USE_XNNPACK) @@ -1077,7 +1064,7 @@ if(NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE) set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/native/QuantizedLinear.cpp PROPERTIES COMPILE_FLAGS -Wno-deprecated-declarations) set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/native/RNN.cpp PROPERTIES COMPILE_FLAGS -Wno-deprecated-declarations) 
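Returning to the new c10/util/safe_numerics.h helpers added earlier in this diff: on compilers that provide __builtin_mul_overflow, the product of a list of sizes is accumulated with per-step overflow detection, while the fallback path only approximates the check with leading-zero counts. A self-contained sketch of the builtin path (assuming GCC or Clang; the function names are simplified from the c10 versions):

```cpp
#include <cstdint>
#include <initializer_list>
#include <iostream>

// Builtin-based overflow helper, assuming GCC or Clang; simplified from
// the c10::mul_overflows / c10::safe_multiplies_u64 helpers above.
bool mul_overflows(uint64_t a, uint64_t b, uint64_t* out) {
  return __builtin_mul_overflow(a, b, out);
}

// Returns true if the running product of xs does not fit in 64 bits.
bool safe_multiplies_u64(std::initializer_list<uint64_t> xs, uint64_t* out) {
  uint64_t prod = 1;
  bool overflow = false;
  for (uint64_t x : xs) {
    overflow |= mul_overflows(prod, x, &prod);
  }
  *out = prod;
  return overflow;
}

int main() {
  uint64_t n = 0;
  std::cout << safe_multiplies_u64({1ull << 32, 1ull << 31}, &n) << "\n";  // 0: 2^63 fits
  std::cout << safe_multiplies_u64({1ull << 32, 1ull << 33}, &n) << "\n";  // 1: 2^65 overflows
}
```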
set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp PROPERTIES COMPILE_FLAGS -Wno-deprecated-declarations) - set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp PROPERTIES COMPILE_FLAGS -Wno-deprecated-declarations) + set_source_files_properties(${CMAKE_CURRENT_SOURCE_DIR}/../aten/src/ATen/native/quantized/qlinear_unpack.cpp PROPERTIES COMPILE_FLAGS -Wno-deprecated-declarations) endif() if(USE_TBB) @@ -1107,7 +1094,7 @@ endif() install(DIRECTORY "${TORCH_SRC_DIR}/csrc" DESTINATION ${TORCH_INSTALL_INCLUDE_DIR}/torch - FILES_MATCHING PATTERN "*.h") + FILES_MATCHING PATTERN "*.h" PATTERN "*.hpp") install(DIRECTORY "${TORCH_SRC_DIR}/csrc/distributed/c10d" DESTINATION ${TORCH_INSTALL_INCLUDE_DIR} FILES_MATCHING PATTERN "*.h" PATTERN "*.hpp") @@ -1315,8 +1302,14 @@ if(USE_DISTRIBUTED) else() if(BUILD_SPLIT_CUDA) target_compile_definitions(torch_cuda_cpp PUBLIC USE_C10D_NCCL) + if(USE_NCCL_WITH_UCC) + target_compile_definitions(torch_cuda_cpp PUBLIC USE_NCCL_WITH_UCC) + endif() else() target_compile_definitions(torch_cuda PUBLIC USE_C10D_NCCL) + if(USE_NCCL_WITH_UCC) + target_compile_definitions(torch_cuda PUBLIC USE_NCCL_WITH_UCC) + endif() endif() endif() endif() diff --git a/caffe2/contrib/aten/aten_op_template.h b/caffe2/contrib/aten/aten_op_template.h index a5d1ea40e27a8b..97c64631921ad8 100644 --- a/caffe2/contrib/aten/aten_op_template.h +++ b/caffe2/contrib/aten/aten_op_template.h @@ -179,8 +179,9 @@ class ATenOp : public Operator { std::vector attrs; for (const auto i : c10::irange(operator_def.arg_size())) { auto & attr = operator_def.arg(i); - if(attr.name() == "operator" || attr.name() == "type" ) + if(attr.name() == "operator" || attr.name() == "type" || attr.name() == "overload_name" ) { continue; + } attrs.push_back(attr.name()); } std::sort(attrs.begin(), attrs.end()); diff --git a/caffe2/core/blob_test.cc b/caffe2/core/blob_test.cc index 2249c3bcbf2ab3..a7e3a8d27e23ac 100644 --- a/caffe2/core/blob_test.cc +++ b/caffe2/core/blob_test.cc @@ -1264,7 +1264,7 @@ void TestDataType( std::string dataTypeName) { LOG(INFO) << dataTypeName; FLAGS_caffe2_serialize_using_bytes_as_holder = true; - size_t numEl = 1000; + int numEl = 1000; // Proto with int32 auto protoInt32 = CreateProtoWithInt32Data(dataType, numEl, false); caffe2::Blob blobInt32; diff --git a/caffe2/core/export_caffe2_op_to_c10.h b/caffe2/core/export_caffe2_op_to_c10.h index 66ffdf21a1085c..82da29a44f4b4d 100644 --- a/caffe2/core/export_caffe2_op_to_c10.h +++ b/caffe2/core/export_caffe2_op_to_c10.h @@ -4,12 +4,13 @@ #if defined(EXPOSE_C2_OPS) || \ !defined(CAFFE2_IS_XPLAT_BUILD) && !defined(C10_MOBILE) +#include #include #include #include -#include #include #include +#include #include #include @@ -113,7 +114,9 @@ void call_caffe2_op_from_c10( _call_caffe2_op_from_c10(stack, Schema(), &_call_caffe2_op); } -inline FunctionSchema make_function_schema_for_c10(const char* schema_str) { +inline FunctionSchema make_function_schema_for_c10( + const char* schema_str, + c10::optional optional_alias_analysis_kind) { #if !defined(EXPOSE_C2_OPS) && \ (defined(CAFFE2_IS_XPLAT_BUILD) || defined(C10_MOBILE)) throw std::logic_error( @@ -127,13 +130,17 @@ inline FunctionSchema make_function_schema_for_c10(const char* schema_str) { nullopt, IValue()); - return FunctionSchema( + auto schema = FunctionSchema( parsed_schema.name(), parsed_schema.overload_name(), std::move(arguments), parsed_schema.returns(), 
parsed_schema.is_vararg(), parsed_schema.is_varret()); + if (optional_alias_analysis_kind) { + schema.setAliasAnalysis(*optional_alias_analysis_kind); + } + return schema; #endif } @@ -169,7 +176,7 @@ inline FunctionSchema make_function_schema_for_c10(const char* schema_str) { * caffe2. * - all operators must call C10_DECLARE_EXPORT_CAFFE2_OP_TO_C10 and * C10_EXPORT_CAFFE2_OP_TO_C10_CPU . - * - calling C10_EXPORT_CAFFE2_OP_TO_C10_CUDA is optional and can be omitted i f + * - calling C10_EXPORT_CAFFE2_OP_TO_C10_CUDA is optional and can be omitted if * you don't want to expose the operator for CUDA operations. * - caffe2 arguments must come after caffe2 inputs, in other words, any tensor * inputs must precede any non-tensor inputs. @@ -178,73 +185,85 @@ inline FunctionSchema make_function_schema_for_c10(const char* schema_str) { * - If your operator has a variable number of input tensors, make the first (!) * input an input of type TensorList. There must be no other tensor inputs. */ -#define C10_DECLARE_EXPORT_CAFFE2_OP_TO_C10(OperatorName) \ - namespace caffe2 { \ - namespace _c10_ops { \ +#define C10_DECLARE_EXPORT_CAFFE2_OP_TO_C10(OperatorName) \ + namespace caffe2 { \ + namespace _c10_ops { \ TORCH_API const FunctionSchema& schema_##OperatorName(); \ - } \ + } \ } -#define C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY(OperatorName, OperatorSchema) \ - /* Register the op schema with the c10 dispatcher */ \ - namespace caffe2 { \ - namespace _c10_ops { \ - C10_EXPORT const FunctionSchema& schema_##OperatorName() { \ - static const FunctionSchema schema = \ - ::caffe2::detail::make_function_schema_for_c10(OperatorSchema); \ - return schema; \ - } \ - TORCH_LIBRARY_FRAGMENT(_caffe2, m) { \ - m.def(::caffe2::detail::make_function_schema_for_c10(OperatorSchema)); \ - } \ - } \ +#define C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY( \ + OperatorName, OperatorSchema, OptionalAliasAnalysisKind) \ + /* Register the op schema with the c10 dispatcher */ \ + namespace caffe2 { \ + namespace _c10_ops { \ + C10_EXPORT const FunctionSchema& schema_##OperatorName() { \ + static const FunctionSchema schema = \ + ::caffe2::detail::make_function_schema_for_c10( \ + OperatorSchema, OptionalAliasAnalysisKind); \ + return schema; \ + } \ + TORCH_LIBRARY_FRAGMENT(_caffe2, m) { \ + m.def(::caffe2::detail::make_function_schema_for_c10( \ + OperatorSchema, OptionalAliasAnalysisKind)); \ + } \ + } \ } #define C10_EXPORT_CAFFE2_OP_TO_C10_CPU_KERNEL_ONLY( \ OperatorName, OperatorClass) \ /* Register call_caffe2_op_from_c10 as a kernel with the c10 dispatcher */ \ - TORCH_LIBRARY_IMPL(_caffe2, CPU, m) { \ - m.impl("_caffe2::" #OperatorName, \ - torch::CppFunction::makeFromBoxedFunction< \ - ::caffe2::detail::call_caffe2_op_from_c10< \ - ::caffe2::_c10_ops::schema_##OperatorName, \ - OperatorClass>>()); \ - } + TORCH_LIBRARY_IMPL(_caffe2, CPU, m) { \ + m.impl( \ + "_caffe2::" #OperatorName, \ + torch::CppFunction::makeFromBoxedFunction< \ + ::caffe2::detail::call_caffe2_op_from_c10< \ + ::caffe2::_c10_ops::schema_##OperatorName, \ + OperatorClass>>()); \ + } + +#define C10_EXPORT_CAFFE2_OP_TO_C10_CPU( \ + OperatorName, OperatorSchema, OperatorClass) \ + C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY( \ + OperatorName, OperatorSchema, c10::nullopt) \ + C10_EXPORT_CAFFE2_OP_TO_C10_CPU_KERNEL_ONLY(OperatorName, OperatorClass) -#define C10_EXPORT_CAFFE2_OP_TO_C10_CPU( \ - OperatorName, OperatorSchema, OperatorClass) \ - C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY(OperatorName, OperatorSchema) \ +#define 
C10_EXPORT_CAFFE2_OP_TO_C10_CPU_WITH_ALIAS_ANALYSIS( \ + OperatorName, OperatorSchema, OperatorClass, OptionalAliasAnalysisKind) \ + C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY( \ + OperatorName, OperatorSchema, OptionalAliasAnalysisKind) \ C10_EXPORT_CAFFE2_OP_TO_C10_CPU_KERNEL_ONLY(OperatorName, OperatorClass) #define C10_EXPORT_CAFFE2_OP_TO_C10_CUDA(OperatorName, OperatorClass) \ /* Register call_caffe2_op_from_c10 as a kernel with the c10 dispatcher */ \ - TORCH_LIBRARY_IMPL(_caffe2, CUDA, m) { \ - m.impl("_caffe2::" #OperatorName, \ - torch::CppFunction::makeFromBoxedFunction< \ - ::caffe2::detail::call_caffe2_op_from_c10< \ - ::caffe2::_c10_ops::schema_##OperatorName, \ - OperatorClass>>()); \ - } - + TORCH_LIBRARY_IMPL(_caffe2, CUDA, m) { \ + m.impl( \ + "_caffe2::" #OperatorName, \ + torch::CppFunction::makeFromBoxedFunction< \ + ::caffe2::detail::call_caffe2_op_from_c10< \ + ::caffe2::_c10_ops::schema_##OperatorName, \ + OperatorClass>>()); \ + } // You should never manually call the C10_EXPORT_CAFFE2_OP_TO_C10_HIP macro . // The C10_EXPORT_CAFFE2_OP_TO_C10_CUDA macro from above will be automatically // rewritten to C10_EXPORT_CAFFE2_OP_TO_C10_HIP by hipify . #define C10_EXPORT_CAFFE2_OP_TO_C10_HIP(OperatorName, OperatorClass) \ /* Register call_caffe2_op_from_c10 as a kernel with the c10 dispatcher */ \ - TORCH_LIBRARY_IMPL(_caffe2, HIP, m) { \ - m.impl("_caffe2::" #OperatorName, \ - torch::CppFunction::makeFromBoxedFunction< \ - ::caffe2::detail::call_caffe2_op_from_c10< \ - ::caffe2::_c10_ops::schema_##OperatorName, \ - OperatorClass>>()); \ - } - + TORCH_LIBRARY_IMPL(_caffe2, HIP, m) { \ + m.impl( \ + "_caffe2::" #OperatorName, \ + torch::CppFunction::makeFromBoxedFunction< \ + ::caffe2::detail::call_caffe2_op_from_c10< \ + ::caffe2::_c10_ops::schema_##OperatorName, \ + OperatorClass>>()); \ + } #else // Don't use c10 dispatcher on mobile because of binary size #define C10_DECLARE_EXPORT_CAFFE2_OP_TO_C10(OperatorName) -#define C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY(OperatorName, OperatorSchema) +#define C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY( \ + OperatorName, OperatorSchema, OptionalAliasAnalysisKind) #define C10_EXPORT_CAFFE2_OP_TO_C10_CPU_KERNEL_ONLY(OperatorName, OperatorClass) #define C10_EXPORT_CAFFE2_OP_TO_C10_CPU( \ OperatorName, OperatorSchema, OperatorClass) diff --git a/caffe2/core/qtensor.h b/caffe2/core/qtensor.h index a34da6918bcd2f..f94863a09782ac 100644 --- a/caffe2/core/qtensor.h +++ b/caffe2/core/qtensor.h @@ -60,8 +60,7 @@ class C10_EXPORT QTensor { void Resize(at::ArrayRef dim_source) { if (dims_ != dim_source) { const auto source_size = c10::multiply_integers(dim_source); - // NOLINTNEXTLINE(clang-diagnostic-sign-compare) - if ((source_size * (precision_ + signed_)) > capacity_) { + if (static_cast(source_size * (precision_ + signed_)) > capacity_) { data_ptr_.clear(); capacity_ = 0; } diff --git a/caffe2/core/serialization_test.cc b/caffe2/core/serialization_test.cc index 1912802d2ac8fd..902a3e01e6773c 100644 --- a/caffe2/core/serialization_test.cc +++ b/caffe2/core/serialization_test.cc @@ -69,7 +69,7 @@ TEST(TensorSerialization, TestUnknownDType) { auto* blobTensor = BlobGetMutableTensor(&blob, CPU); blobTensor->Resize(kTestTensorSize, 1); auto *tensorData = blobTensor->mutable_data(); - for (int n = 0; n < kTestTensorSize; ++n) { + for (unsigned n = 0; n < kTestTensorSize; ++n) { tensorData[n] = n; } auto data = SerializeBlob(blob, "test_blob"); @@ -85,7 +85,7 @@ TEST(TensorSerialization, TestUnknownDType) { EXPECT_EQ(kTestTensorSize, tensor.numel()); 
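The export_caffe2_op_to_c10.h macro rewrite above boils down to one behavioral change: make_function_schema_for_c10 now accepts an optional alias-analysis kind and applies it only when one is supplied, with the plain CPU export macro passing c10::nullopt. A hypothetical standalone sketch of that pattern; FunctionSchema and AliasAnalysisKind here are simplified stand-ins, not the real torch types:

```cpp
#include <iostream>
#include <optional>
#include <string>

// Simplified stand-ins, used only to show the shape of the change:
// the schema factory threads an optional alias-analysis kind through.
enum class AliasAnalysisKind { FROM_SCHEMA, CONSERVATIVE };

struct FunctionSchema {
  std::string schema_str;
  std::optional<AliasAnalysisKind> alias_analysis;
  void setAliasAnalysis(AliasAnalysisKind k) { alias_analysis = k; }
};

FunctionSchema make_schema(const std::string& schema_str,
                           std::optional<AliasAnalysisKind> kind) {
  FunctionSchema schema{schema_str, std::nullopt};
  if (kind) {
    // Only override the alias analysis when the caller supplied a kind;
    // the plain CPU export macro passes nullopt and keeps the default.
    schema.setAliasAnalysis(*kind);
  }
  return schema;
}

int main() {
  auto s = make_schema("_caffe2::CopyGPUToCPU(Tensor input) -> Tensor", std::nullopt);
  std::cout << s.alias_analysis.has_value() << "\n";  // 0: default kept
}
```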
EXPECT_EQ(TypeMeta::Make(), tensor.dtype()); const auto* tensor_data = tensor.template data(); - for (int i = 0; i < kTestTensorSize; ++i) { + for (unsigned i = 0; i < kTestTensorSize; ++i) { EXPECT_EQ(static_cast(i), tensor_data[i]); } diff --git a/caffe2/core/transform_test.cc b/caffe2/core/transform_test.cc index adb7ecae050be6..0dc6ba92c7f9e9 100644 --- a/caffe2/core/transform_test.cc +++ b/caffe2/core/transform_test.cc @@ -55,7 +55,7 @@ class DummyTransform : public Transform { return false; } // which index are we trying to append the new node to? - int pattern_idx = subgraph.size(); + auto pattern_idx = subgraph.size(); // type doesn't match if (g.node(idx).op.type() != pattern_chain[pattern_idx]) { return false; diff --git a/caffe2/operators/copy_op.cc b/caffe2/operators/copy_op.cc index f2323bbaf06f7e..c0efef07eeb6a6 100644 --- a/caffe2/operators/copy_op.cc +++ b/caffe2/operators/copy_op.cc @@ -200,8 +200,10 @@ REGISTER_GRADIENT(CopyCPUToGPU, GetCPUToGPUGradient); C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY( CopyGPUToCPU, - "_caffe2::CopyGPUToCPU(Tensor input) -> Tensor"); + "_caffe2::CopyGPUToCPU(Tensor input) -> Tensor", + /*optional_alias_analysis_kind=*/c10::nullopt); C10_EXPORT_CAFFE2_OP_TO_C10_SCHEMA_ONLY( CopyCPUToGPU, - "_caffe2::CopyCPUToGPU(Tensor input) -> Tensor"); + "_caffe2::CopyCPUToGPU(Tensor input) -> Tensor", + /*optional_alias_analysis_kind=*/c10::nullopt); diff --git a/caffe2/operators/generate_proposals_op_util_nms.h b/caffe2/operators/generate_proposals_op_util_nms.h index 09b10c8e192aa4..a74d04f217fdb6 100644 --- a/caffe2/operators/generate_proposals_op_util_nms.h +++ b/caffe2/operators/generate_proposals_op_util_nms.h @@ -50,8 +50,7 @@ std::vector nms_cpu_upright( std::vector keep; while (order.size() > 0) { // exit if already enough proposals - // NOLINTNEXTLINE(clang-diagnostic-sign-compare) - if (topN >= 0 && keep.size() >= topN) { + if (topN >= 0 && keep.size() >= static_cast(topN)) { break; } @@ -127,7 +126,7 @@ std::vector soft_nms_cpu_upright( EArrXi pending = AsEArrXt(indices); while (pending.size() > 0) { // Exit if already enough proposals - if (topN >= 0 && keep.size() >= topN) { + if (topN >= 0 && keep.size() >= static_cast(topN)) { break; } @@ -560,8 +559,7 @@ std::vector nms_cpu_rotated( std::vector keep; while (order.size() > 0) { // exit if already enough proposals - // NOLINTNEXTLINE(clang-diagnostic-sign-compare) - if (topN >= 0 && keep.size() >= topN) { + if (topN >= 0 && keep.size() >= static_cast(topN)) { break; } @@ -626,7 +624,7 @@ std::vector soft_nms_cpu_rotated( EArrXi pending = AsEArrXt(indices); while (pending.size() > 0) { // Exit if already enough proposals - if (topN >= 0 && keep.size() >= topN) { + if (topN >= 0 && keep.size() >= static_cast(topN)) { break; } diff --git a/caffe2/operators/quantized/int8_test.cc b/caffe2/operators/quantized/int8_test.cc index b6d9719d522303..9b14d3eaec1dae 100644 --- a/caffe2/operators/quantized/int8_test.cc +++ b/caffe2/operators/quantized/int8_test.cc @@ -341,8 +341,8 @@ TEST(Int8, SumRelu) { } void setq(int8::Int8TensorCPU* dst, const std::vector& vs) { - CHECK_EQ(vs.size(), dst->t.numel()); - for (auto i = 0; i < vs.size(); ++i) { + CHECK_EQ(vs.size(), static_cast(dst->t.numel())); + for (auto i = 0U; i < vs.size(); ++i) { uint8_t vq = std::max( std::numeric_limits::min(), std::min( @@ -354,8 +354,8 @@ void setq(int8::Int8TensorCPU* dst, const std::vector& vs) { } void biassetq(int8::Int8TensorCPU* dst, const std::vector& vs) { - CHECK_EQ(vs.size(), dst->t.numel()); - for (auto i = 0; i < 
vs.size(); ++i) { + CHECK_EQ(vs.size(), static_cast(dst->t.numel())); + for (auto i = 0U; i < vs.size(); ++i) { int32_t vq = std::max( std::numeric_limits::min(), std::min( diff --git a/caffe2/operators/text_file_reader_utils.h b/caffe2/operators/text_file_reader_utils.h index 01b4743a91c145..a4f2d6189860e7 100644 --- a/caffe2/operators/text_file_reader_utils.h +++ b/caffe2/operators/text_file_reader_utils.h @@ -56,7 +56,7 @@ struct TORCH_API CharRange { struct TORCH_API StringProvider { virtual void operator()(CharRange&) = 0; virtual void reset() = 0; - virtual ~StringProvider() {} + virtual ~StringProvider() = default; }; class TORCH_API BufferedTokenizer { @@ -99,7 +99,7 @@ class TORCH_API BufferedTokenizer { StringProvider* provider_; Tokenizer tokenizer_; TokenizedString tokenized_; - int tokenIndex_; + unsigned tokenIndex_; int numPasses_; int pass_{0}; }; diff --git a/caffe2/opt/bound_shape_inference_test.cc b/caffe2/opt/bound_shape_inference_test.cc index 867142746d82ad..8224281124e1f4 100644 --- a/caffe2/opt/bound_shape_inference_test.cc +++ b/caffe2/opt/bound_shape_inference_test.cc @@ -45,7 +45,7 @@ void verifyShapeInfo( EXPECT_EQ(shape_info.getDimType(), t); const auto& shape = shape_info.shape; ASSERT_EQ(shape.dims_size(), dims.size()); - for (int i = 0; i < dims.size(); ++i) { + for (unsigned i = 0; i < dims.size(); ++i) { EXPECT_EQ(dims[i], shape.dims(i)); } EXPECT_EQ(shape.data_type(), dtype); diff --git a/caffe2/perfkernels/adagrad_avx2.cc b/caffe2/perfkernels/adagrad_avx2.cc index 0039afa942f1de..08c9fd00d9a089 100644 --- a/caffe2/perfkernels/adagrad_avx2.cc +++ b/caffe2/perfkernels/adagrad_avx2.cc @@ -18,7 +18,7 @@ void adagrad_update__avx2_fma( float decay, float lr, float weight_decay = 0.f) { - constexpr size_t kSize = 8; + constexpr int kSize = 8; auto i = 0; for (; i + kSize <= N; i += kSize) { __m256 gi = _mm256_loadu_ps(g + i); diff --git a/caffe2/python/memonger.py b/caffe2/python/memonger.py index 6225781bc429a9..178ebd8cd30248 100644 --- a/caffe2/python/memonger.py +++ b/caffe2/python/memonger.py @@ -798,15 +798,29 @@ def canonical_name(blob): op.output[i] = canonical_name(output) - def apply_recurrent_blob_assignments(op, blob_assignments, canonical_name): log.debug("Applying assignments to recurrent op: {}".format(op.type)) + + # Apply on alias_dst + alias_dst_args = [a for a in op.arg if a.name.endswith("alias_dst")] + for alias_dst in alias_dst_args: + for i, blob in enumerate(alias_dst.strings): + alias_dst.strings[i] = canonical_name(blob.decode()).encode() + + # Apply on link_external + link_external_args = [a for a in op.arg if a.name.endswith("link_external")] + for link_external in link_external_args: + for i, blob in enumerate(link_external.strings): + link_external.strings[i] = canonical_name(blob.decode()).encode() + + # Recurse into step nets step_args = [a for a in op.arg if a.name.endswith("step_net")] for step_arg in step_args: apply_assignments(step_arg.n, blob_assignments) for i, einp in enumerate(step_arg.n.external_input): if einp in blob_assignments: step_arg.n.external_input[i] = canonical_name(einp) + # Store renamings for blob, renamed in viewitems(blob_assignments): if blob in list(op.input) + list(op.output): diff --git a/caffe2/python/pybind_state.cc b/caffe2/python/pybind_state.cc index ad04cab82d5aa0..ccaa0afb6ac91e 100644 --- a/caffe2/python/pybind_state.cc +++ b/caffe2/python/pybind_state.cc @@ -300,7 +300,7 @@ class GetPythonGradient : public GradientMakerBase { } if (gradOutputIndices.size() > 0) { // 
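A recurring theme in this part of the diff is replacing int loop indices and size() comparisons with unsigned types or explicit static_casts to silence -Wsign-compare. The guard-then-cast form used in the NMS code matters because of how a negative sentinel converts; a small self-contained illustration:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
  std::vector<int> keep(2);  // 2 proposals kept so far
  int topN = -1;             // negative sentinel meaning "no limit"

  // size() is unsigned, so comparing against topN converts it: -1 becomes
  // SIZE_MAX and "keep.size() >= topN" silently reads as 2 >= SIZE_MAX.
  std::cout << (keep.size() >= static_cast<size_t>(topN)) << "\n";  // 0

  // The fixed code guards the sign first and only then casts, so the
  // comparison is only performed for meaningful (non-negative) limits.
  bool enough = topN >= 0 && keep.size() >= static_cast<size_t>(topN);
  std::cout << enough << "\n";  // 0: no limit, so never "enough"
}
```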
NOLINTNEXTLINE(modernize-loop-convert) - for (int i = 0; i < gradOutputIndices.size(); ++i) { + for (unsigned i = 0; i < gradOutputIndices.size(); ++i) { int GO_i = gradOutputIndices[i]; gradientInputs.push_back(GO(GO_i)); } @@ -312,7 +312,7 @@ class GetPythonGradient : public GradientMakerBase { std::vector gradientOutputs; if (gradInputIndices.size() > 0) { // NOLINTNEXTLINE(modernize-loop-convert) - for (int i = 0; i < gradInputIndices.size(); ++i) { + for (unsigned i = 0; i < gradInputIndices.size(); ++i) { int GI_i = gradInputIndices[i]; gradientOutputs.push_back(GI(GI_i)); } @@ -877,7 +877,7 @@ void addObjectMethods(py::module& m) { std::vector tensors_data; #ifdef USE_NUMPY // NOLINTNEXTLINE(modernize-loop-convert) - for (auto i = 0; i < inputs.size(); ++i) { + for (auto i = 0U; i < inputs.size(); ++i) { auto input = inputs[i]; CAFFE_ENFORCE( PyArray_Check(input.ptr()), @@ -988,7 +988,7 @@ void addObjectMethods(py::module& m) { std::vector tensors_data; #ifdef USE_NUMPY // NOLINTNEXTLINE(modernize-loop-convert) - for (auto i = 0; i < inputs.size(); ++i) { + for (auto i = 0U; i < inputs.size(); ++i) { auto input = inputs[i]; CAFFE_ENFORCE( PyArray_Check(input.ptr()), @@ -1201,7 +1201,7 @@ void addGlobalMethods(py::module& m) { }); m.def("nearby_opnames", [](const std::string& name) { std::vector alternatives; - int editTolerance = 3; + unsigned editTolerance = 3; // NOLINTNEXTLINE(performance-for-range-copy) for (auto it : caffe2::CPUOperatorRegistry()->Keys()) { if (editDistance(it, name, editTolerance) < editTolerance + 1) { diff --git a/caffe2/serialize/inline_container.cc b/caffe2/serialize/inline_container.cc index 9f0e9ce6194ef9..92632fc7928b32 100644 --- a/caffe2/serialize/inline_container.cc +++ b/caffe2/serialize/inline_container.cc @@ -129,22 +129,27 @@ void PyTorchStreamReader::init() { } std::string version(static_cast(version_ptr.get()), version_size); version_ = caffe2::stoull(version); - AT_ASSERTM( - // NOLINTNEXTLINE(clang-diagnostic-sign-compare) - version_ >= kMinSupportedFileFormatVersion, - "Attempted to read a PyTorch file with version ", - c10::to_string(version_), - ", but the minimum supported version for reading is ", - c10::to_string(kMinSupportedFileFormatVersion), - ". Your PyTorch script module file is too old. Please re-export it again."); - AT_ASSERTM( - // NOLINTNEXTLINE(clang-diagnostic-sign-compare) - version_ <= kMaxSupportedFileFormatVersion, - "Attempted to read a PyTorch file with version ", - version_, - ", but the maximum supported version for reading is ", - kMaxSupportedFileFormatVersion, - ". Your PyTorch installation may be too old."); + // NOLINTNEXTLINE(clang-diagnostic-sign-compare) + if (version_ < kMinSupportedFileFormatVersion) { + CAFFE_THROW( + "Attempted to read a PyTorch file with version ", + c10::to_string(version_), + ", but the minimum supported version for reading is ", + c10::to_string(kMinSupportedFileFormatVersion), + ". Your PyTorch script module file is too old. Please regenerate it", + " with latest version of PyTorch to mitigate this issue."); + } + + // NOLINTNEXTLINE(clang-diagnostic-sign-compare) + if (version_ > kMaxSupportedFileFormatVersion) { + CAFFE_THROW( + "Attempted to read a PyTorch file with version ", + version_, + ", but the maximum supported version for reading is ", + kMaxSupportedFileFormatVersion, + ". 
The version of your PyTorch installation may be too old, ", + "please upgrade PyTorch to latest version to mitigate this issue."); + } } void PyTorchStreamReader::valid(const char* what, const char* info) { diff --git a/caffe2/serialize/inline_container_test.cc b/caffe2/serialize/inline_container_test.cc index 5ceb7274b771f2..18f75dddfaa5f5 100644 --- a/caffe2/serialize/inline_container_test.cc +++ b/caffe2/serialize/inline_container_test.cc @@ -5,6 +5,7 @@ #include #include "caffe2/serialize/inline_container.h" +#include "c10/util/irange.h" namespace caffe2 { namespace serialize { @@ -22,14 +23,14 @@ TEST(PyTorchStreamWriterAndReader, SaveAndLoad) { // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init,cppcoreguidelines-avoid-magic-numbers) std::array data1; - for (int i = 0; i < data1.size(); ++i) { + for (auto i: c10::irange( data1.size())) { data1[i] = data1.size() - i; } writer.writeRecord("key1", data1.data(), data1.size()); // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init,cppcoreguidelines-avoid-magic-numbers) std::array data2; - for (int i = 0; i < data2.size(); ++i) { + for (auto i: c10::irange(data2.size())) { data2[i] = data2.size() - i; } writer.writeRecord("key2", data2.data(), data2.size()); @@ -83,14 +84,14 @@ TEST(PytorchStreamWriterAndReader, GetNonexistentRecordThrows) { // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init,cppcoreguidelines-avoid-magic-numbers) std::array data1; - for (int i = 0; i < data1.size(); ++i) { + for (auto i: c10::irange(data1.size())) { data1[i] = data1.size() - i; } writer.writeRecord("key1", data1.data(), data1.size()); // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init,cppcoreguidelines-avoid-magic-numbers) std::array data2; - for (int i = 0; i < data2.size(); ++i) { + for (auto i: c10::irange(data2.size())) { data2[i] = data2.size() - i; } writer.writeRecord("key2", data2.data(), data2.size()); diff --git a/caffe2/serialize/versions.h b/caffe2/serialize/versions.h index 40d2cd0145fd71..78a91c64fe84fd 100644 --- a/caffe2/serialize/versions.h +++ b/caffe2/serialize/versions.h @@ -117,11 +117,16 @@ constexpr uint64_t kMinProducedFileFormatVersion = 0x3L; // {the_pointer_value_the_tensor.storage}, for example: // `140245072983168.storage` Forward-compatibility change. // 0x6L: Implicit opereator versioning using number of specified argument. -// Refer to the summary of https://github.com/pytorch/pytorch/pull/56845 for details. -// 0x7L: Enable support for operators with default arguments plus out arguments. -// Refer. See https://github.com/pytorch/pytorch/pull/63651 for details -// 0x8L: Emit promoted operators as instructions. -// See https://github.com/pytorch/pytorch/pull/71662 for details +// Refer to the summary of https://github.com/pytorch/pytorch/pull/56845 for +// details. +// 0x7L: Enable support for operators with default arguments plus out +// arguments. Refer. See https://github.com/pytorch/pytorch/pull/63651 for +// details. +// 0x8L: Emit promoted operators as instructions. See +// https://github.com/pytorch/pytorch/pull/71662 for details. +// 0x9L: Change serialization format from pickle to format This version is to +// serve migration. v8 pickle and v9 flatbuffer are the same. Refer to the +// summary of https://github.com/pytorch/pytorch/pull/75201 for more details. 
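The inline_container.cc change above swaps AT_ASSERTM-based version checks for explicit if/CAFFE_THROW branches with actionable messages. A rough standalone sketch of the same gate using std::runtime_error; the min/max constants below are placeholders, not PyTorch's actual file-format bounds:

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>
#include <stdexcept>

// Placeholder bounds; the real values live in caffe2/serialize/versions.h.
constexpr uint64_t kMinSupportedFileFormatVersion = 0x1;
constexpr uint64_t kMaxSupportedFileFormatVersion = 0x6;

// Sketch of the explicit gate that replaces the old assertion macros.
void check_version(uint64_t version) {
  if (version < kMinSupportedFileFormatVersion) {
    std::ostringstream msg;
    msg << "File format version " << version << " is older than the minimum supported "
        << kMinSupportedFileFormatVersion << "; please re-export the module.";
    throw std::runtime_error(msg.str());
  }
  if (version > kMaxSupportedFileFormatVersion) {
    std::ostringstream msg;
    msg << "File format version " << version << " is newer than the maximum supported "
        << kMaxSupportedFileFormatVersion << "; please upgrade PyTorch.";
    throw std::runtime_error(msg.str());
  }
}

int main() {
  try {
    check_version(0x9);  // too new for the placeholder bounds above
  } catch (const std::exception& e) {
    std::cout << e.what() << "\n";
  }
}
```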
constexpr uint64_t kProducedBytecodeVersion = 0x8L; // static_assert( @@ -134,8 +139,8 @@ constexpr uint64_t kProducedBytecodeVersion = 0x8L; // kMinSupportedBytecodeVersion <= model_version <= kMaxSupportedBytecodeVersion // (in loader), we should support this model_version. For example, we provide a // wrapper to handle an updated operator. -constexpr uint64_t kMinSupportedBytecodeVersion = 0x3L; -constexpr uint64_t kMaxSupportedBytecodeVersion = 0x8L; +constexpr uint64_t kMinSupportedBytecodeVersion = 0x4L; +constexpr uint64_t kMaxSupportedBytecodeVersion = 0x9L; } // namespace serialize } // namespace caffe2 diff --git a/caffe2/share/contrib/depthwise/depthwise3x3_conv_op_test.cc b/caffe2/share/contrib/depthwise/depthwise3x3_conv_op_test.cc index 879f0d25068b7a..0f7e90e55b53ff 100644 --- a/caffe2/share/contrib/depthwise/depthwise3x3_conv_op_test.cc +++ b/caffe2/share/contrib/depthwise/depthwise3x3_conv_op_test.cc @@ -199,7 +199,7 @@ void runConv( } // unnamed namespace -constexpr size_t kIters = 20; +constexpr int kIters = 20; TEST(DEPTHWISE3x3, Conv) { for (int i = 0; i < kIters; ++i) { diff --git a/caffe2/share/contrib/nnpack/nnpack_test.cc b/caffe2/share/contrib/nnpack/nnpack_test.cc index 398be235f7f13f..fe653c4d91abd0 100644 --- a/caffe2/share/contrib/nnpack/nnpack_test.cc +++ b/caffe2/share/contrib/nnpack/nnpack_test.cc @@ -236,7 +236,7 @@ void runConv( } // unnamed namespace -constexpr size_t kIters = 20; +constexpr int kIters = 20; TEST(NNPACK, Conv_3x3s1) { for (int i = 0; i < kIters; ++i) { diff --git a/cmake/Dependencies.cmake b/cmake/Dependencies.cmake index a818c21eb5ea4f..f8d1ae74eaebad 100644 --- a/cmake/Dependencies.cmake +++ b/cmake/Dependencies.cmake @@ -816,6 +816,10 @@ if(USE_FBGEMM) set_property(TARGET fbgemm_avx2 PROPERTY POSITION_INDEPENDENT_CODE ON) set_property(TARGET fbgemm_avx512 PROPERTY POSITION_INDEPENDENT_CODE ON) set_property(TARGET fbgemm PROPERTY POSITION_INDEPENDENT_CODE ON) + if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang" AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 13.0.0) + # See https://github.com/pytorch/pytorch/issues/74352 + target_compile_options(asmjit PRIVATE -Wno-deprecated-copy -Wno-unused-but-set-variable) + endif() endif() if(USE_FBGEMM) @@ -1936,6 +1940,32 @@ if(USE_KINETO) message(STATUS " CUDA_cupti_LIBRARY = ${CUDA_cupti_LIBRARY}") message(STATUS "Found CUPTI") set(LIBKINETO_NOCUPTI OFF CACHE STRING "" FORCE) + + # I've only tested this sanity check on Linux; if someone + # runs into this bug on another platform feel free to + # generalize it accordingly + if(NOT USE_CUPTI_SO AND UNIX) + include(CheckCXXSourceRuns) + # rt is handled by the CMAKE_REQUIRED_LIBRARIES set above + if(NOT APPLE) + set(CMAKE_REQUIRED_LIBRARIES ${CMAKE_REQUIRED_LIBRARIES} "dl") + endif() + set(CMAKE_REQUIRED_LINK_OPTIONS "-Wl,--whole-archive,${CUPTI_LIBRARY_PATH},--no-whole-archive") + check_cxx_source_runs("#include + int main() { + try { + throw std::runtime_error(\"error\"); + } catch (...) { + return 0; + } + return 1; + }" EXCEPTIONS_WORK) + set(CMAKE_REQUIRED_LINK_OPTIONS "") + if(NOT EXCEPTIONS_WORK) + message(FATAL_ERROR "Detected that statically linking against CUPTI causes exceptions to stop working. See https://github.com/pytorch/pytorch/issues/57744 for more details. 
Perhaps try: USE_CUPTI_SO=1 python setup.py develop --cmake") + endif() + endif() + else() message(STATUS "Could not find CUPTI library, using CPU-only Kineto build") set(LIBKINETO_NOCUPTI ON CACHE STRING "" FORCE) diff --git a/cmake/Modules/FindMKL.cmake b/cmake/Modules/FindMKL.cmake index b79a87466252c3..01594a5b66e056 100644 --- a/cmake/Modules/FindMKL.cmake +++ b/cmake/Modules/FindMKL.cmake @@ -168,6 +168,26 @@ IF (EXISTS ${INTEL_OMP_DIR}) ENDIF() ENDIF() +MACRO(GET_MKL_LIB_NAMES LIBRARIES INTERFACE MKL64) + cmake_parse_arguments("" "" "THREAD" "" ${ARGN}) + SET(${LIBRARIES} mkl_${INTERFACE}${MKL64} mkl_core) + IF(_THREAD) + LIST(INSERT ${LIBRARIES} 1 ${_THREAD}) + IF(UNIX AND ${USE_STATIC_MKL}) + # The thread library defines symbols required by the other MKL libraries so also add it last + LIST(APPEND ${LIBRARIES} ${_THREAD}) + ENDIF() + ENDIF() + IF(${USE_STATIC_MKL}) + IF(UNIX) + list(TRANSFORM ${LIBRARIES} PREPEND "lib") + list(TRANSFORM ${LIBRARIES} APPEND ".a") + ELSE() + message(WARNING "Ignoring USE_STATIC_MKL") + ENDIF() + ENDIF() +ENDMACRO() + # Try linking multiple libs MACRO(CHECK_ALL_LIBRARIES LIBRARIES OPENMP_TYPE OPENMP_LIBRARY _name _list _flags) # This macro checks for the existence of the combination of libraries given by _list. @@ -304,8 +324,9 @@ IF (NOT "${MKL_THREADING}" STREQUAL "SEQ") FOREACH(mkl64 ${mkl64s} "") FOREACH(mklthread ${mklthreads}) IF (NOT MKL_LIBRARIES) + GET_MKL_LIB_NAMES(mkl_lib_names "${mkliface}" "${mkl64}" THREAD "${mklthread}") CHECK_ALL_LIBRARIES(MKL_LIBRARIES MKL_OPENMP_TYPE MKL_OPENMP_LIBRARY cblas_sgemm - "mkl_${mkliface}${mkl64};${mklthread};mkl_core;${mklrtl};${mkl_pthread};${mkl_m};${mkl_dl}" "") + "${mkl_lib_names};${mklrtl};${mkl_pthread};${mkl_m};${mkl_dl}" "") ENDIF (NOT MKL_LIBRARIES) ENDFOREACH(mklthread) ENDFOREACH(mkl64) @@ -317,8 +338,9 @@ ENDIF (NOT "${MKL_THREADING}" STREQUAL "SEQ") FOREACH(mkliface ${mklifaces}) FOREACH(mkl64 ${mkl64s} "") IF (NOT MKL_LIBRARIES) + GET_MKL_LIB_NAMES(mkl_lib_names "${mkliface}" "${mkl64}" THREAD "mkl_sequential") CHECK_ALL_LIBRARIES(MKL_LIBRARIES MKL_OPENMP_TYPE MKL_OPENMP_LIBRARY cblas_sgemm - "mkl_${mkliface}${mkl64};mkl_sequential;mkl_core;${mkl_m};${mkl_dl}" "") + "${mkl_lib_names};${mkl_m};${mkl_dl}" "") IF (MKL_LIBRARIES) SET(mklseq "_sequential") ENDIF (MKL_LIBRARIES) @@ -331,8 +353,9 @@ FOREACH(mklrtl ${mklrtls} "") FOREACH(mkliface ${mklifaces}) FOREACH(mkl64 ${mkl64s} "") IF (NOT MKL_LIBRARIES) + GET_MKL_LIB_NAMES(mkl_lib_names "${mkliface}" "${mkl64}" THREAD "${mklthread}") CHECK_ALL_LIBRARIES(MKL_LIBRARIES MKL_OPENMP_TYPE MKL_OPENMP_LIBRARY cblas_sgemm - "mkl_${mkliface}${mkl64};${mklthread};mkl_core;${mklrtl};pthread;${mkl_m};${mkl_dl}" "") + "${mkl_lib_names};${mklrtl};pthread;${mkl_m};${mkl_dl}" "") ENDIF (NOT MKL_LIBRARIES) ENDFOREACH(mkl64) ENDFOREACH(mkliface) @@ -341,6 +364,9 @@ ENDFOREACH(mklrtl) # Check for older versions IF (NOT MKL_LIBRARIES) SET(MKL_VERSION 900) + if (USE_STATIC_MKL) + message(WARNING "Ignoring USE_STATIC_MKL") + endif() CHECK_ALL_LIBRARIES(MKL_LIBRARIES MKL_OPENMP_TYPE MKL_OPENMP_LIBRARY cblas_sgemm "mkl;guide;pthread;m" "") ENDIF (NOT MKL_LIBRARIES) diff --git a/cmake/Summary.cmake b/cmake/Summary.cmake index 9203e72b3bda3d..cd0b330ab0e53c 100644 --- a/cmake/Summary.cmake +++ b/cmake/Summary.cmake @@ -148,6 +148,7 @@ function(caffe2_print_configuration_summary) message(STATUS " USE_NCCL : ${USE_NCCL}") if(${USE_NCCL}) message(STATUS " USE_SYSTEM_NCCL : ${USE_SYSTEM_NCCL}") + message(STATUS " USE_NCCL_WITH_UCC : ${USE_NCCL_WITH_UCC}") endif() 
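For reference, the configure-time probe that the Dependencies.cmake change a little above compiles with check_cxx_source_runs is essentially the following program: it exits 0 when a thrown exception is caught normally and 1 when exception handling has been broken by statically linked CUPTI, which in turn triggers the FATAL_ERROR with the USE_CUPTI_SO=1 hint:

```cpp
#include <stdexcept>

// Mirrors the check_cxx_source_runs() snippet: exit 0 when a thrown
// exception is caught normally, exit 1 when exception handling is broken.
int main() {
  try {
    throw std::runtime_error("error");
  } catch (...) {
    return 0;  // exceptions work; configuration proceeds
  }
  return 1;    // only reached if the catch handler never runs
}
```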
message(STATUS " USE_NNPACK : ${USE_NNPACK}") message(STATUS " USE_NUMPY : ${USE_NUMPY}") @@ -191,4 +192,5 @@ function(caffe2_print_configuration_summary) message(STATUS " Private Dependencies : ${Caffe2_DEPENDENCY_LIBS}") # coreml message(STATUS " USE_COREML_DELEGATE : ${USE_COREML_DELEGATE}") + message(STATUS " BUILD_LAZY_TS_BACKEND : ${BUILD_LAZY_TS_BACKEND}") endfunction() diff --git a/docs/Makefile b/docs/Makefile index 28d910a89b4986..b9719df7ade5c3 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -15,6 +15,10 @@ help: figures: @$(PYCMD) source/scripts/build_activation_images.py + @$(PYCMD) source/scripts/build_quantization_configs.py + +onnx_supported_aten_ops: + @$(PYCMD) source/scripts/build_onnx_supported_aten_op_csv_table.py docset: html doc2dash --name $(SPHINXPROJ) --icon $(SOURCEDIR)/_static/img/pytorch-logo-flame.png --enable-js --online-redirect-url https://pytorch.org/docs/ --force $(BUILDDIR)/html/ @@ -30,13 +34,13 @@ html-stable: # See conf.py for more details. RELEASE=1 make html -.PHONY: help Makefile docset +.PHONY: help Makefile docset onnx_supported_aten_ops # Catch-all target: route all unknown targets to Sphinx using the new # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). -%: Makefile figures +%: Makefile figures onnx_supported_aten_ops @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) clean: @echo "Removing everything under 'build' and 'source/generated'.." - @rm -rf $(BUILDDIR)/html/ $(BUILDDIR)/doctrees $(SOURCEDIR)/generated + @rm -rf $(BUILDDIR)/html/ $(BUILDDIR)/doctrees $(SOURCEDIR)/generated $(BUILDDIR)/auto_gen_aten_op_list.csv diff --git a/docs/cpp/requirements.txt b/docs/cpp/requirements.txt index f5d49d2ebe910d..ca3eb7da6846bf 100644 --- a/docs/cpp/requirements.txt +++ b/docs/cpp/requirements.txt @@ -1,4 +1,5 @@ sphinx==3.1.2 +Jinja2==3.0.* breathe==4.25.0 exhale==0.2.3 docutils==0.16 diff --git a/docs/cpp/source/Doxyfile b/docs/cpp/source/Doxyfile index 7785239d1539eb..a17d742a461efa 100644 --- a/docs/cpp/source/Doxyfile +++ b/docs/cpp/source/Doxyfile @@ -44,12 +44,14 @@ INPUT = ../../../aten/src/ATen/ATen.h \ ../../../aten/src/ATen/Scalar.h \ ../../../aten/src/ATen/TensorOptions.h \ ../../../aten/src/ATen/core/Tensor.h \ + ../../../aten/src/ATen/native/TensorShape.h \ ../../../build/aten/src/ATen/Functions.h \ ../../../build/aten/src/ATen/core/TensorBody.h \ ../../../c10/core/Device.h \ ../../../c10/core/DeviceType.h \ ../../../c10/util/Half.h \ ../../../c10/util/ArrayRef.h \ + ../../../c10/util/OptionalArrayRef.h \ ../../../c10/util/Exception.h \ ../../../c10/util/Optional.h \ ../../../c10/cuda/CUDAGuard.h \ diff --git a/docs/cpp/source/check-doxygen.sh b/docs/cpp/source/check-doxygen.sh index 6ff6832cd056c4..28c7e5b81ace98 100755 --- a/docs/cpp/source/check-doxygen.sh +++ b/docs/cpp/source/check-doxygen.sh @@ -19,8 +19,7 @@ cp torch/_utils_internal.py tools/shared python -m tools.codegen.gen python tools/setup_helpers/generate_code.py \ - --native-functions-path aten/src/ATen/native/native_functions.yaml \ - --nn-path aten/src + --native-functions-path aten/src/ATen/native/native_functions.yaml popd diff --git a/docs/requirements.txt b/docs/requirements.txt index 34ec6078225bdf..57bee508f61b40 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -1,4 +1,5 @@ sphinx==3.5.4 +Jinja2==3.0.* docutils==0.16 -e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme sphinxcontrib.katex @@ -7,3 +8,4 @@ tensorboard # required to build torch.distributed.elastic.rendezvous.etcd* 
docs python-etcd>=0.4.5 sphinx_copybutton +sphinx-panels diff --git a/docs/source/amp.rst b/docs/source/amp.rst index 1f70f2c6982e63..e5d2a10585627d 100644 --- a/docs/source/amp.rst +++ b/docs/source/amp.rst @@ -1,22 +1,33 @@ .. role:: hidden :class: hidden-section -Automatic Mixed Precision package - torch.cuda.amp -================================================== +Automatic Mixed Precision package - torch.amp +============================================= -.. automodule:: torch.cuda.amp -.. currentmodule:: torch.cuda.amp +.. Both modules below are missing doc entry. Adding them here for now. +.. This does not add anything to the rendered page +.. py:module:: torch.cpu +.. py:module:: torch.cpu.amp +.. py:module:: torch.cuda.amp + +.. automodule:: torch.amp +.. currentmodule:: torch.amp -:class:`torch.cuda.amp` and :class:`torch` provide convenience methods for mixed precision, +:class:`torch.amp` provides convenience methods for mixed precision, where some operations use the ``torch.float32`` (``float``) datatype and other operations -use ``torch.float16`` (``half``). Some ops, like linear layers and convolutions, -are much faster in ``float16``. Other ops, like reductions, often require the dynamic +use lower precision floating point datatype (``lower_precision_fp``): ``torch.float16`` (``half``) or ``torch.bfloat16``. Some ops, like linear layers and convolutions, +are much faster in ``lower_precision_fp``. Other ops, like reductions, often require the dynamic range of ``float32``. Mixed precision tries to match each op to its appropriate datatype. -Ordinarily, "automatic mixed precision training" uses :class:`torch.autocast` and -:class:`torch.cuda.amp.GradScaler` together, as shown in the :ref:`Automatic Mixed Precision examples` -and `Automatic Mixed Precision recipe `_. -However, :class:`torch.autocast` and :class:`GradScaler` are modular, and may be used separately if desired. +Ordinarily, "automatic mixed precision training" with datatype of ``torch.float16`` uses :class:`torch.autocast` and +:class:`torch.cuda.amp.GradScaler` together, as shown in the :ref:`CUDA Automatic Mixed Precision examples` +and `CUDA Automatic Mixed Precision recipe `_. +However, :class:`torch.autocast` and :class:`torch.cuda.amp.GradScaler` are modular, and may be used separately if desired. + +For CUDA and CPU, APIs are also provided seperately: + +* ``torch.autocast("cuda", args...)`` is equivalent to ``torch.cuda.amp.autocast(args...)``. +* ``torch.autocast("cpu", args...)`` is equivalent to ``torch.cpu.amp.autocast(args...)``. For CPU, only lower precision floating point datatype of ``torch.bfloat16`` is supported for now. .. contents:: :local: @@ -38,6 +49,11 @@ Autocasting .. autofunction:: custom_bwd +.. currentmodule:: torch.cpu.amp + +.. autoclass:: autocast + :members: + .. _gradient-scaling: Gradient Scaling @@ -56,6 +72,8 @@ so they don't flush to zero. Each parameter's gradient (``.grad`` attribute) should be unscaled before the optimizer updates the parameters, so the scale factor does not interfere with the learning rate. +.. currentmodule:: torch.cuda.amp + .. autoclass:: GradScaler :members: @@ -68,8 +86,6 @@ Autocast Op Reference Op Eligibility -------------- -Only CUDA ops are eligible for autocasting. - Ops that run in ``float64`` or non-floating-point dtypes are not eligible, and will run in these types whether or not autocast is enabled. @@ -84,8 +100,10 @@ regions. 
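One sentence in the amp.rst rewrite above says that reductions often need the dynamic range of float32. A small standalone illustration of how a low-precision accumulator goes wrong in a long reduction, using float vs. double as a stand-in for lower_precision_fp vs. float32 (the exact cutoffs differ for float16 and bfloat16, but it is the same kind of failure):

```cpp
#include <iostream>

// Accumulating many small terms in a low-precision type stalls once the
// running sum is large relative to each term: here the float accumulator
// saturates at 2^24 while the double accumulator keeps counting.
int main() {
  float  sum_lo = 0.0f;
  double sum_hi = 0.0;
  for (int i = 0; i < 20'000'000; ++i) {
    sum_lo += 1.0f;  // stops increasing once 1.0f falls below float's ulp here
    sum_hi += 1.0;
  }
  std::cout << sum_lo << " vs " << sum_hi << "\n";
  // prints 1.67772e+07 vs 2e+07: the float accumulator stalled at 16777216
}
```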
Ops called with an explicit ``dtype=...`` argument are not eligible, and will produce output that respects the ``dtype`` argument. -Op-Specific Behavior --------------------- +.. _autocast-cuda-op-reference: + +CUDA Op-Specific Behavior +------------------------- The following lists describe the behavior of eligible ops in autocast-enabled regions. These ops always go through autocasting whether they are invoked as part of a :class:`torch.nn.Module`, as a function, or as a :class:`torch.Tensor` method. If functions are exposed in multiple namespaces, @@ -99,8 +117,8 @@ If an op is unlisted, we assume it's numerically stable in ``float16``. If you believe an unlisted op is numerically unstable in ``float16``, please file an issue. -Ops that can autocast to ``float16`` -"""""""""""""""""""""""""""""""""""" +CUDA Ops that can autocast to ``float16`` +""""""""""""""""""""""""""""""""""""""""" ``__matmul__``, ``addbmm``, @@ -126,8 +144,8 @@ Ops that can autocast to ``float16`` ``prelu``, ``RNNCell`` -Ops that can autocast to ``float32`` -"""""""""""""""""""""""""""""""""""" +CUDA Ops that can autocast to ``float32`` +""""""""""""""""""""""""""""""""""""""""" ``__pow__``, ``__rdiv__``, @@ -181,8 +199,8 @@ Ops that can autocast to ``float32`` ``tan``, ``triplet_margin_loss`` -Ops that promote to the widest input type -""""""""""""""""""""""""""""""""""""""""" +CUDA Ops that promote to the widest input type +"""""""""""""""""""""""""""""""""""""""""""""" These ops don't require a particular dtype for stability, but take multiple inputs and require that the inputs' dtypes match. If all of the inputs are ``float16``, the op runs in ``float16``. If any of the inputs is ``float32``, @@ -216,3 +234,191 @@ Many models use a sigmoid layer right before the binary cross entropy layer. In this case, combine the two layers using :func:`torch.nn.functional.binary_cross_entropy_with_logits` or :mod:`torch.nn.BCEWithLogitsLoss`. ``binary_cross_entropy_with_logits`` and ``BCEWithLogits`` are safe to autocast. + +.. _autocast-cpu-op-reference: + +CPU Op-Specific Behavior +------------------------ +The following lists describe the behavior of eligible ops in autocast-enabled regions. +These ops always go through autocasting whether they are invoked as part of a :class:`torch.nn.Module`, +as a function, or as a :class:`torch.Tensor` method. If functions are exposed in multiple namespaces, +they go through autocasting regardless of the namespace. + +Ops not listed below do not go through autocasting. They run in the type +defined by their inputs. However, autocasting may still change the type +in which unlisted ops run if they're downstream from autocasted ops. + +If an op is unlisted, we assume it's numerically stable in ``bfloat16``. +If you believe an unlisted op is numerically unstable in ``bfloat16``, +please file an issue. 
+ +CPU Ops that can autocast to ``bfloat16`` +""""""""""""""""""""""""""""""""""""""""" + +``conv1d``, +``conv2d``, +``conv3d``, +``bmm``, +``mm``, +``baddbmm``, +``addmm``, +``addbmm``, +``linear``, +``_convolution`` + +CPU Ops that can autocast to ``float32`` +"""""""""""""""""""""""""""""""""""""""" + +``conv_transpose1d``, +``conv_transpose2d``, +``conv_transpose3d``, +``batch_norm``, +``dropout``, +``avg_pool1d``, +``avg_pool2d``, +``avg_pool3d``, +``gelu``, +``upsample_nearest1d``, +``_upsample_nearest_exact1d``, +``upsample_nearest2d``, +``_upsample_nearest_exact2d``, +``upsample_nearest3d``, +``_upsample_nearest_exact3d``, +``upsample_linear1d``, +``upsample_bilinear2d``, +``upsample_trilinear3d``, +``binary_cross_entropy``, +``binary_cross_entropy_with_logits``, +``instance_norm``, +``grid_sampler``, +``polar``, +``multinomial``, +``poisson``, +``fmod``, +``prod``, +``quantile``, +``nanquantile``, +``stft``, +``cdist``, +``cross``, +``cumprod``, +``cumsum``, +``diag``, +``diagflat``, +``histc``, +``logcumsumexp``, +``searchsorted``, +``trace``, +``tril``, +``triu``, +``vander``, +``view_as_complex``, +``cholesky``, +``cholesky_inverse``, +``cholesky_solve``, +``dot``, +``inverse``, +``lu_solve``, +``matrix_rank``, +``orgqr``, +``inverse``, +``ormqr``, +``pinverse``, +``vdot``, +``im2col``, +``col2im``, +``max_pool3d``, +``max_unpool2d``, +``max_unpool3d``, +``adaptive_avg_pool3d``, +``reflection_pad1d``, +``reflection_pad2d``, +``replication_pad1d``, +``replication_pad2d``, +``replication_pad3d``, +``elu``, +``hardshrink``, +``hardsigmoid``, +``hardswish``, +``log_sigmoid``, +``prelu``, +``selu``, +``celu``, +``softplus``, +``softshrink``, +``group_norm``, +``smooth_l1_loss``, +``mse_loss``, +``ctc_loss``, +``kl_div``, +``multilabel_margin_loss``, +``fft_fft``, +``fft_ifft``, +``fft_fft2``, +``fft_ifft2``, +``fft_fftn``, +``fft_ifftn``, +``fft_rfft``, +``fft_irfft``, +``fft_rfft2``, +``fft_irfft2``, +``fft_rfftn``, +``fft_irfftn``, +``fft_hfft``, +``fft_ihfft``, +``conv_tbc``, +``linalg_matrix_norm``, +``linalg_cond``, +``linalg_matrix_rank``, +``linalg_solve``, +``linalg_cholesky``, +``linalg_svdvals``, +``linalg_eigvals``, +``linalg_eigvalsh``, +``linalg_inv``, +``linalg_householder_product``, +``linalg_tensorinv``, +``linalg_tensorsolve``, +``fake_quantize_per_tensor_affine``, +``glu``, +``cummax``, +``cummin``, +``eig``, +``geqrf``, +``lstsq``, +``_lu_with_info``, +``lu_unpack``, +``qr``, +``solve``, +``svd``, +``symeig``, +``triangular_solve``, +``fractional_max_pool2d``, +``fractional_max_pool3d``, +``adaptive_max_pool1d``, +``adaptive_max_pool2d``, +``adaptive_max_pool3d``, +``multilabel_margin_loss_forward``, +``linalg_qr``, +``linalg_cholesky_ex``, +``linalg_svd``, +``linalg_eig``, +``linalg_eigh``, +``linalg_lstsq``, +``linalg_inv_ex`` + +CPU Ops that promote to the widest input type +""""""""""""""""""""""""""""""""""""""""""""" +These ops don't require a particular dtype for stability, but take multiple inputs +and require that the inputs' dtypes match. If all of the inputs are +``bfloat16``, the op runs in ``bfloat16``. If any of the inputs is ``float32``, +autocast casts all inputs to ``float32`` and runs the op in ``float32``. + +``cat``, +``stack``, +``index_copy`` + +Some ops not listed here (e.g., binary ops like ``add``) natively promote +inputs without autocasting's intervention. If inputs are a mixture of ``bfloat16`` +and ``float32``, these ops run in ``float32`` and produce ``float32`` output, +regardless of whether autocast is enabled. 
diff --git a/docs/source/backends.rst b/docs/source/backends.rst index 45d6fdf2add2a8..2b49e4c9341692 100644 --- a/docs/source/backends.rst +++ b/docs/source/backends.rst @@ -3,6 +3,7 @@ torch.backends ============== +.. automodule:: torch.backends `torch.backends` controls the behavior of various backends that PyTorch supports. @@ -17,6 +18,7 @@ These backends include: torch.backends.cuda ^^^^^^^^^^^^^^^^^^^ +.. automodule:: torch.backends.cuda .. autofunction:: torch.backends.cuda.is_built @@ -50,6 +52,7 @@ torch.backends.cuda torch.backends.cudnn ^^^^^^^^^^^^^^^^^^^^ +.. automodule:: torch.backends.cudnn .. autofunction:: torch.backends.cudnn.version @@ -78,17 +81,26 @@ torch.backends.cudnn torch.backends.mkl ^^^^^^^^^^^^^^^^^^ +.. automodule:: torch.backends.mkl .. autofunction:: torch.backends.mkl.is_available torch.backends.mkldnn ^^^^^^^^^^^^^^^^^^^^^ +.. automodule:: torch.backends.mkldnn .. autofunction:: torch.backends.mkldnn.is_available torch.backends.openmp ^^^^^^^^^^^^^^^^^^^^^ +.. automodule:: torch.backends.openmp .. autofunction:: torch.backends.openmp.is_available + +.. Docs for other backends need to be added here. +.. Automodules are just here to ensure checks run but they don't actually +.. add anything to the rendered page for now. +.. py:module:: torch.backends.quantized +.. py:module:: torch.backends.xnnpack diff --git a/docs/source/benchmark_utils.rst b/docs/source/benchmark_utils.rst index c211dcb7b58003..c93fbfd66c3d9a 100644 --- a/docs/source/benchmark_utils.rst +++ b/docs/source/benchmark_utils.rst @@ -18,3 +18,10 @@ Benchmark Utils - torch.utils.benchmark .. autoclass:: FunctionCounts :members: + +.. These are missing documentation. Adding them here until a better place +.. is made in this file. +.. py:module:: torch.utils.benchmark.examples +.. py:module:: torch.utils.benchmark.op_fuzzers +.. py:module:: torch.utils.benchmark.utils +.. py:module:: torch.utils.benchmark.utils.valgrind_wrapper diff --git a/docs/source/bottleneck.rst b/docs/source/bottleneck.rst index d6ce122234fb11..3fa1c99b506171 100644 --- a/docs/source/bottleneck.rst +++ b/docs/source/bottleneck.rst @@ -1,6 +1,7 @@ torch.utils.bottleneck ====================== +.. automodule:: torch.utils.bottleneck .. currentmodule:: torch.utils.bottleneck `torch.utils.bottleneck` is a tool that can be used as an initial step for diff --git a/docs/source/conf.py b/docs/source/conf.py index de66776b85cbae..d36deda65a19ab 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -57,12 +57,16 @@ 'sphinxcontrib.katex', 'sphinx.ext.autosectionlabel', 'sphinx_copybutton', + 'sphinx_panels' ] # build the templated autosummary files autosummary_generate = True numpydoc_show_class_members = False +# Theme has bootstrap already +panels_add_bootstrap_css = False + # autosectionlabel throws warnings if section names are duplicated. # The following tells autosectionlabel to not throw a warning for # duplicated section names that are in different documents. @@ -82,6 +86,8 @@ # TODO: document these and remove them from here. 
coverage_ignore_functions = [ + # torch + "typename", # torch.autograd "register_py_tensor_class_for_device", "variable", @@ -125,9 +131,41 @@ "execWrapper", # torch.onnx "unregister_custom_op_symbolic", + # torch.ao.quantization + "default_eval_fn", + # torch.ao.quantization.fx.backend_config + "validate_backend_config_dict", + # torch.backends + "disable_global_flags", + "flags_frozen", + # torch.distributed.algorithms.ddp_comm_hooks + "register_ddp_comm_hook", + # torch.nn + "factory_kwargs", + # torch.nn.parallel + "DistributedDataParallelCPU", + # torch.utils + "set_module", + # torch.utils.model_dump + "burn_in_info", + "get_info_and_burn_skeleton", + "get_inline_skeleton", + "get_model_info", + "get_storage_info", + "hierarchical_pickle", ] coverage_ignore_classes = [ + # torch + "FatalError", + "QUInt2x4Storage", + "Size", + "Storage", + "Stream", + "Tensor", + "finfo", + "iinfo", + "qscheme", # torch.cuda "BFloat16Storage", "BFloat16Tensor", @@ -193,109 +231,25 @@ # torch.onnx "CheckerError", "ExportTypes", + # torch.backends + "ContextProp", + "PropModule", + # torch.backends.cuda + "cuBLASModule", + "cuFFTPlanCache", + "cuFFTPlanCacheAttrContextProp", + "cuFFTPlanCacheManager", + # torch.distributed.algorithms.ddp_comm_hooks + "DDPCommHookType", + # torch.jit.mobile + "LiteScriptModule", + # torch.nn.quantized.modules + "DeQuantize", + "Quantize", + # torch.utils.backcompat + "Warning", ] -# List of modules that do not have automodule/py:module in the doc yet -# We should NOT add anything to this list, see the CI failure message -# on how to solve missing automodule issues -coverage_missing_automodule = [ - "torch", - "torch.ao", - "torch.ao.nn", - "torch.ao.nn.sparse", - "torch.ao.nn.sparse.quantized", - "torch.ao.nn.sparse.quantized.dynamic", - "torch.ao.ns", - "torch.ao.ns.fx", - "torch.ao.quantization", - "torch.ao.quantization.fx", - "torch.ao.quantization.fx.backend_config", - "torch.ao.sparsity", - "torch.ao.sparsity.experimental", - "torch.ao.sparsity.experimental.pruner", - "torch.ao.sparsity.scheduler", - "torch.ao.sparsity.sparsifier", - "torch.backends", - "torch.backends.cuda", - "torch.backends.cudnn", - "torch.backends.mkl", - "torch.backends.mkldnn", - "torch.backends.openmp", - "torch.backends.quantized", - "torch.backends.xnnpack", - "torch.contrib", - "torch.cpu", - "torch.cpu.amp", - "torch.distributed.algorithms", - "torch.distributed.algorithms.ddp_comm_hooks", - "torch.distributed.algorithms.model_averaging", - "torch.distributed.elastic", - "torch.distributed.elastic.utils", - "torch.distributed.elastic.utils.data", - "torch.distributed.launcher", - "torch.distributed.nn", - "torch.distributed.nn.api", - "torch.distributed.nn.jit", - "torch.distributed.nn.jit.templates", - "torch.distributed.pipeline", - "torch.distributed.pipeline.sync", - "torch.distributed.pipeline.sync.skip", - "torch.fft", - "torch.for_onnx", - "torch.fx.experimental", - "torch.fx.experimental.unification", - "torch.fx.experimental.unification.multipledispatch", - "torch.fx.passes", - "torch.jit.mobile", - "torch.nn", - "torch.nn.backends", - "torch.nn.intrinsic", - "torch.nn.intrinsic.modules", - "torch.nn.intrinsic.qat", - "torch.nn.intrinsic.qat.modules", - "torch.nn.intrinsic.quantized", - "torch.nn.intrinsic.quantized.dynamic", - "torch.nn.intrinsic.quantized.dynamic.modules", - "torch.nn.intrinsic.quantized.modules", - "torch.nn.modules", - "torch.nn.parallel", - "torch.nn.qat", - "torch.nn.qat.modules", - "torch.nn.qat.dynamic", - "torch.nn.qat.dynamic.modules", - 
"torch.nn.quantizable", - "torch.nn.quantizable.modules", - "torch.nn.quantized", - "torch.nn.quantized.dynamic", - "torch.nn.quantized.dynamic.modules", - "torch.nn.quantized.modules", - "torch.nn.utils", - "torch.package", - "torch.package.analyze", - "torch.quantization", - "torch.quantization.fx", - "torch.sparse", - "torch.special", - "torch.utils", - "torch.utils.backcompat", - "torch.utils.benchmark.examples", - "torch.utils.benchmark.op_fuzzers", - "torch.utils.benchmark.utils", - "torch.utils.benchmark.utils.valgrind_wrapper", - "torch.utils.bottleneck", - "torch.utils.data.communication", - "torch.utils.data.datapipes", - "torch.utils.data.datapipes.dataframe", - "torch.utils.data.datapipes.iter", - "torch.utils.data.datapipes.map", - "torch.utils.data.datapipes.utils", - "torch.utils.ffi", - "torch.utils.hipify", - "torch.utils.model_dump", - "torch.utils.tensorboard", -] - - # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # @@ -413,6 +367,11 @@ def coverage_post_process(app, exception): if not isinstance(app.builder, CoverageBuilder): return + if not torch.distributed.is_available(): + raise RuntimeError("The coverage tool cannot run with a version " + "of PyTorch that was built with USE_DISTRIBUTED=0 " + "as this module's API changes.") + # These are all the modules that have "automodule" in an rst file # These modules are the ones for which coverage is checked # Here, we make sure that no module is missing from that list @@ -439,26 +398,16 @@ def is_not_internal(modname): if modname not in modules: missing.add(modname) - expected = set(coverage_missing_automodule) - output = [] - unexpected_missing = missing - expected - if unexpected_missing: - mods = ", ".join(unexpected_missing) + if missing: + mods = ", ".join(missing) output.append(f"\nYou added the following module(s) to the PyTorch namespace '{mods}' " "but they have no corresponding entry in a doc .rst file. You should " "either make sure that the .rst file that contains the module's documentation " "properly contains either '.. automodule:: mod_name' (if you do not want " - "the paragraph added by the automodule, you can simply use py:module) or " - "make the module private (by appending an '_' at the beginning of its name.") - - unexpected_not_missing = expected - missing - if unexpected_not_missing: - mods = ", ".join(unexpected_not_missing) - output.append(f"\nThank you for adding the missing .rst entries for '{mods}', please update " - "the 'coverage_missing_automodule' in 'torch/docs/source/conf.py' to remove " - "the module(s) you fixed and make sure we do not regress on this in the future.") + "the paragraph added by the automodule, you can simply use '.. py:module:: mod_name') " + " or make the module private (by appending an '_' at the beginning of its name).") # The output file is hard-coded by the coverage tool # Our CI is setup to fail if any line is added to this file diff --git a/docs/source/__config__.rst b/docs/source/config_mod.rst similarity index 100% rename from docs/source/__config__.rst rename to docs/source/config_mod.rst diff --git a/docs/source/data.rst b/docs/source/data.rst index 322de88e27d939..646f41436caf61 100644 --- a/docs/source/data.rst +++ b/docs/source/data.rst @@ -432,3 +432,15 @@ Example:: .. autoclass:: torch.utils.data.WeightedRandomSampler .. autoclass:: torch.utils.data.BatchSampler .. autoclass:: torch.utils.data.distributed.DistributedSampler + + +.. This module is experimental and should be private, adding it here for now +.. 
py:module:: torch.utils.data.communication + +.. These modules are documented as part of torch/data listing them here for +.. now until we have a clearer fix +.. py:module:: torch.utils.data.datapipes +.. py:module:: torch.utils.data.datapipes.dataframe +.. py:module:: torch.utils.data.datapipes.iter +.. py:module:: torch.utils.data.datapipes.map +.. py:module:: torch.utils.data.datapipes.utils diff --git a/docs/source/distributed.rst b/docs/source/distributed.rst index 6c956c68422258..0eb143ca49a5a4 100644 --- a/docs/source/distributed.rst +++ b/docs/source/distributed.rst @@ -123,14 +123,24 @@ It is imperative that all processes specify the same number of interfaces in thi Other NCCL environment variables """""""""""""""""""""""""""""""" -NCCL has also provided a number of environment variables for fine-tuning purposes. - -Commonly used ones include the following for debugging purposes: - -- ``export NCCL_DEBUG=INFO`` -- ``export NCCL_DEBUG_SUBSYS=ALL`` - -For the full list of NCCL environment variables, please refer to +**Debugging** - in case of NCCL failure, you can set ``NCCL_DEBUG=INFO`` to print an explicit +warning message as well as basic NCCL initialization information. + +You may also use ``NCCL_DEBUG_SUBSYS`` to get more details about a specific +aspect of NCCL. For example, ``NCCL_DEBUG_SUBSYS=COLL`` would print logs of +collective calls, which may be helpful when debugging hangs, especially those +caused by collective type or message size mismatch. In case of topology +detection failure, it would be helpful to set ``NCCL_DEBUG_SUBSYS=GRAPH`` +to inspect the detailed detection result and save as reference if further help +from NCCL team is needed. + +**Performance tuning** - NCCL performs automatic tuning based on its topology detection to save users' +tuning effort. On some socket-based systems, users may still try tuning +``NCCL_SOCKET_NTHREADS`` and ``NCCL_NSOCKS_PERTHREAD`` to increase socket +network bandwidth. These two environment variables have been pre-tuned by NCCL +for some cloud providers, such as AWS or GCP. + +For a full list of NCCL environment variables, please refer to `NVIDIA NCCL's official documentation `_ @@ -808,3 +818,21 @@ following matrix shows how the log level can be adjusted via the combination of +-------------------------+-----------------------------+------------------------+ | ``INFO`` | ``DETAIL`` | Trace (a.k.a. All) | +-------------------------+-----------------------------+------------------------+ + + +.. Distributed modules that are missing specific entries. +.. Adding them here for tracking purposes until they are more permanently fixed. +.. py:module:: torch.distributed.algorithms +.. py:module:: torch.distributed.algorithms.ddp_comm_hooks +.. py:module:: torch.distributed.algorithms.model_averaging +.. py:module:: torch.distributed.elastic +.. py:module:: torch.distributed.elastic.utils +.. py:module:: torch.distributed.elastic.utils.data +.. py:module:: torch.distributed.launcher +.. py:module:: torch.distributed.nn +.. py:module:: torch.distributed.nn.api +.. py:module:: torch.distributed.nn.jit +.. py:module:: torch.distributed.nn.jit.templates +.. py:module:: torch.distributed.pipeline +.. py:module:: torch.distributed.pipeline.sync +.. py:module:: torch.distributed.pipeline.sync.skip diff --git a/docs/source/fft.rst b/docs/source/fft.rst index 05f6215af513d5..5406b6610a602b 100644 --- a/docs/source/fft.rst +++ b/docs/source/fft.rst @@ -7,8 +7,6 @@ torch.fft Discrete Fourier transforms and related functions. .. 
automodule:: torch.fft - :noindex: - .. currentmodule:: torch.fft Fast Fourier Transforms diff --git a/docs/source/fx.rst b/docs/source/fx.rst index 65689930743da9..de1e1b88f93e21 100644 --- a/docs/source/fx.rst +++ b/docs/source/fx.rst @@ -1109,3 +1109,12 @@ API Reference :members: .. autofunction:: torch.fx.replace_pattern + + +.. The experimental and passes submodules are missing docs. +.. Adding it here for coverage but this doesn't add anything to the +.. rendered doc. +.. py:module:: torch.fx.passes +.. py:module:: torch.fx.experimental +.. py:module:: torch.fx.experimental.unification +.. py:module:: torch.fx.experimental.unification.multipledispatch diff --git a/docs/source/index.rst b/docs/source/index.rst index 24aa75476b044e..e64f7425c56d2a 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -54,9 +54,9 @@ Features described in this documentation are classified by release status: tensors tensor_attributes tensor_view + torch.amp torch.autograd cuda - torch.cuda.amp torch.backends torch.distributed torch.distributed.algorithms.join @@ -100,7 +100,7 @@ Features described in this documentation are classified by release status: type_info named_tensor name_inference - torch.__config__ <__config__> + torch.__config__ .. toctree:: :maxdepth: 1 diff --git a/docs/source/jit.rst b/docs/source/jit.rst index 23426fb3d9ea00..d2d55215aa3f10 100644 --- a/docs/source/jit.rst +++ b/docs/source/jit.rst @@ -878,3 +878,7 @@ References jit_python_reference jit_unsupported + +.. This package is missing doc. Adding it here for coverage +.. This does not add anything to the rendered page. +.. py:module:: torch.jit.mobile diff --git a/docs/source/nn.rst b/docs/source/nn.rst index 6eca9d4b16b6a4..0e9d161c014bc1 100644 --- a/docs/source/nn.rst +++ b/docs/source/nn.rst @@ -3,6 +3,8 @@ torch.nn =================================== +.. automodule:: torch.nn +.. automodule:: torch.nn.modules These are the basic building blocks for graphs: @@ -331,6 +333,8 @@ Shuffle Layers DataParallel Layers (multi-GPU, distributed) -------------------------------------------- +.. automodule:: torch.nn.parallel +.. currentmodule:: torch .. autosummary:: :toctree: generated @@ -342,6 +346,7 @@ DataParallel Layers (multi-GPU, distributed) Utilities --------- +.. automodule:: torch.nn.utils From the ``torch.nn.utils`` module @@ -453,3 +458,7 @@ Lazy Modules Initialization :template: classtemplate.rst nn.modules.lazy.LazyModuleMixin + + +.. This module is kept only for backward compatibility +.. py:module:: torch.nn.backends diff --git a/docs/source/notes/amp_examples.rst b/docs/source/notes/amp_examples.rst index 90cda473cb2926..b6bcc38bc0f300 100644 --- a/docs/source/notes/amp_examples.rst +++ b/docs/source/notes/amp_examples.rst @@ -1,7 +1,7 @@ .. _amp-examples: -Automatic Mixed Precision examples -================================== +CUDA Automatic Mixed Precision examples +======================================= .. currentmodule:: torch.cuda.amp diff --git a/docs/source/notes/autograd.rst b/docs/source/notes/autograd.rst index af8922ddfce4b8..216bb8cfb2510a 100644 --- a/docs/source/notes/autograd.rst +++ b/docs/source/notes/autograd.rst @@ -222,7 +222,7 @@ Evaluation Mode (``nn.Module.eval()``) Evaluation mode is not actually a mechanism to locally disable gradient computation. It is included here anyway because it is sometimes confused to be such a mechanism. 
-Functionally, ``module.eval()`` (or equivalently ``module.train()``) are completely +Functionally, ``module.eval()`` (or equivalently ``module.train(False)``) are completely orthogonal to no-grad mode and inference mode. How ``model.eval()`` affects your model depends entirely on the specific modules used in your model and whether they define any training-mode specific behavior. diff --git a/docs/source/notes/cuda.rst b/docs/source/notes/cuda.rst index b2901a6fe33658..59eb7d4c72b69f 100644 --- a/docs/source/notes/cuda.rst +++ b/docs/source/notes/cuda.rst @@ -364,6 +364,26 @@ Available options: :meth:`~torch.cuda.memory_summary` methods are useful for tuning. This option should be used as a last resort for a workload that is aborting due to 'out of memory' and showing a large amount of inactive split blocks. +* ``roundup_power2_divisions`` helps with rounding the requested allocation + size to nearest power-2 division and making better use of the blocks. In + the current CUDACachingAllocator, the sizes are rounded up in multiple + of blocks size of 512, so this works fine for smaller sizes. However, this + can be inefficient for large near-by allocations as each will go to different + size of blocks and re-use of those blocks are minimized. This might create + lots of unused blocks and will waste GPU memory capacity. This option enables + the rounding of allocation size to nearest power-2 division. For example, if + we need to round-up size of 1200 and if number of divisions is 4, + the size 1200 lies between 1024 and 2048 and if we do 4 divisions between + them, the values are 1024, 1280, 1536, and 1792. So, allocation size of 1200 + will be rounded to 1280 as the nearest ceiling of power-2 division. +* ``garbage_collection_threshold`` helps actively reclaiming unused GPU memory to + avoid triggering expensive sync-and-reclaim-all operation (release_cached_blocks), + which can be unfavorable to latency-critical GPU applications (e.g., servers). + Upon setting this threshold (e.g., 0.8), the allocator will start reclaiming + GPU memory blocks if the GPU memory capacity usage exceeds the threshold (i.e., + 80% of the total memory allocated to the GPU application). The algorithm prefers + to free old & unused blocks first to avoid freeing blocks that are actively being + reused. The threshold value should be between greater than 0.0 and less than 1.0. .. _cufft-plan-cache: diff --git a/docs/source/onnx.rst b/docs/source/onnx.rst index 78458c1d71053e..5ed8d2aebd0bf0 100644 --- a/docs/source/onnx.rst +++ b/docs/source/onnx.rst @@ -130,9 +130,9 @@ a :class:`torch.nn.Module`. If the passed-in model is not already a ``ScriptModu of different sizes. To use scripting: * Use :func:`torch.jit.script` to produce a ``ScriptModule``. - * Call ``torch.onnx.export()`` with the ``ScriptModule`` as the model, and set the - ``example_outputs`` arg. This is required so that the types and shapes of the outputs can be - captured without executing the model. + * Call ``torch.onnx.export()`` with the ``ScriptModule`` as the model. The ``args`` are still required, + but they will be used internally only to produce example outputs, so that the types and shapes of the + outputs can be captured. No tracing will be performed. See `Introduction to TorchScript `_ and `TorchScript `_ for more details, including how to compose tracing and scripting to suit the @@ -332,10 +332,20 @@ The process for adding a symbolic function depends on the type of operator. 
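The two caching-allocator options documented above are configured through the same environment variable as the other options in that section. A minimal sketch, assuming the usual comma-separated ``key:value`` syntax of ``PYTORCH_CUDA_ALLOC_CONF`` and purely illustrative values::

    import os

    # Must be set before the first CUDA allocation is made.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = (
        "garbage_collection_threshold:0.8,roundup_power2_divisions:4"
    )

    import torch

    # Subsequent allocations use the rounded power-of-two size divisions, and the
    # allocator starts reclaiming unused blocks once usage crosses 80% of capacity.
    x = torch.randn(1024, 1024, device="cuda")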
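Similarly, the scripting-based ONNX export flow described above no longer takes an ``example_outputs`` argument; ``args`` is only run to capture output types and shapes. A hedged sketch with a hypothetical module (whether a given control-flow pattern exports cleanly still depends on the opset and exporter version)::

    import torch

    class TinyModel(torch.nn.Module):
        def forward(self, x):
            # data-dependent control flow is the usual reason to script rather than trace
            if x.sum() > 0:
                return x.relu()
            return -x

    scripted = torch.jit.script(TinyModel())
    # args are not traced; they are used internally only to produce example outputs
    torch.onnx.export(scripted, (torch.randn(2, 3),), "tiny.onnx", opset_version=11)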
ATen operators ^^^^^^^^^^^^^^ - `ATen `_ is PyTorch’s built-in tensor library. If the operator is an ATen operator (shows up in the TorchScript graph with the prefix -``aten::``): +``aten::``), make sure it is not supported already. + +List of supported operators +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Visit the auto generated :doc:`list of supported ATen operators <../onnx_supported_aten_ops>` +for details on which operator are supported in each ``opset_version``. + +Adding support for an operator +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If the operator is not in the list above: * Define the symbolic function in ``torch/onnx/symbolic_opset.py``, for example `torch/onnx/symbolic_opset9.py `_. @@ -598,6 +608,11 @@ Functions .. autofunction:: register_custom_op_symbolic .. autofunction:: select_model_mode_for_export .. autofunction:: is_in_onnx_export +.. autofunction:: is_onnx_log_enabled +.. autofunction:: enable_log +.. autofunction:: disable_log +.. autofunction:: set_log_stream +.. autofunction:: log Classes ------- diff --git a/docs/source/onnx_supported_aten_ops.rst b/docs/source/onnx_supported_aten_ops.rst new file mode 100644 index 00000000000000..d6bf535e2e7ec3 --- /dev/null +++ b/docs/source/onnx_supported_aten_ops.rst @@ -0,0 +1,14 @@ +:orphan: + +ONNX supported ATen operators +============================= + +This file is automatically generated during the documentation build +by cross referencing ONNX operator symbolics with Torch JIT operators via +``docs/source/scripts/build_onnx_supported_aten_op_csv_table.py``. +Do not modify directly and instead `rebuild the docs `_. + +.. csv-table:: Supported ATen operators + :file: ../build/auto_gen_aten_op_list.csv + :widths: 30, 70 + :header-rows: 1 diff --git a/docs/source/package.rst b/docs/source/package.rst index c7881f1961406f..b72112ffed31fb 100644 --- a/docs/source/package.rst +++ b/docs/source/package.rst @@ -1,3 +1,6 @@ +.. automodule:: torch.package +.. py:module:: torch.package.analyze + .. currentmodule:: torch.package torch.package diff --git a/docs/source/quantization-backend-configuration.rst b/docs/source/quantization-backend-configuration.rst new file mode 100644 index 00000000000000..07fd875fa9b34a --- /dev/null +++ b/docs/source/quantization-backend-configuration.rst @@ -0,0 +1,20 @@ +Quantization Backend Configuration +---------------------------------- + +FX Graph Mode Quantization allows the user to configure various +quantization behaviors of an op in order to match the expectation +of their backend. + +In the future, this document will contain a detailed spec of +these configurations. + + +Default values for native configurations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Below is the output of the configuration for quantization of ops +in fbgemm and qnnpack (PyTorch's default quantized backends). + +Results: + +.. literalinclude:: scripts/quantization_backend_configs/default_backend_config.txt diff --git a/docs/source/quantization-support.rst b/docs/source/quantization-support.rst index 78c5ea247c482b..da6649a2fee3d7 100644 --- a/docs/source/quantization-support.rst +++ b/docs/source/quantization-support.rst @@ -217,6 +217,8 @@ to configure quantization settings for individual ops. torch.nn.intrinsic ~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.intrinsic +.. automodule:: torch.nn.intrinsic.modules This module implements the combined (fused) modules conv + relu which can then be quantized. @@ -243,6 +245,9 @@ then be quantized. torch.nn.intrinsic.qat ~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.intrinsic.qat +.. 
automodule:: torch.nn.intrinsic.qat.modules + This module implements the versions of those fused operations needed for quantization aware training. @@ -268,6 +273,9 @@ quantization aware training. torch.nn.intrinsic.quantized ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.intrinsic.quantized +.. automodule:: torch.nn.intrinsic.quantized.modules + This module implements the quantized implementations of fused operations like conv + relu. No BatchNorm variants as it's usually folded into convolution @@ -289,6 +297,8 @@ for inference. torch.nn.intrinsic.quantized.dynamic ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.intrinsic.quantized.dynamic +.. automodule:: torch.nn.intrinsic.quantized.dynamic.modules This module implements the quantized dynamic implementations of fused operations like linear + relu. @@ -304,6 +314,8 @@ like linear + relu. torch.nn.qat ~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.qat +.. automodule:: torch.nn.qat.modules This module implements versions of the key nn modules **Conv2d()** and **Linear()** which run in FP32 but with rounding applied to simulate the @@ -322,6 +334,8 @@ effect of INT8 quantization. torch.nn.qat.dynamic ~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.qat.dynamic +.. automodule:: torch.nn.qat.dynamic.modules This module implements versions of the key nn modules such as **Linear()** which run in FP32 but with rounding applied to simulate the effect of INT8 @@ -338,6 +352,8 @@ quantization and will be dynamically quantized during inference. torch.nn.quantized ~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.quantized +.. automodule:: torch.nn.quantized.modules This module implements the quantized versions of the nn layers such as ~`torch.nn.Conv2d` and `torch.nn.ReLU`. @@ -376,6 +392,7 @@ This module implements the quantized versions of the nn layers such as torch.nn.quantized.functional ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.quantized.functional This module implements the quantized versions of the functional layers such as ~`torch.nn.functional.conv2d` and `torch.nn.functional.relu`. Note: @@ -413,6 +430,8 @@ This module implements the quantized versions of the functional layers such as torch.nn.quantized.dynamic ~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. automodule:: torch.nn.quantized.dynamic +.. automodule:: torch.nn.quantized.dynamic.modules Dynamically quantized :class:`~torch.nn.Linear`, :class:`~torch.nn.LSTM`, :class:`~torch.nn.LSTMCell`, :class:`~torch.nn.GRUCell`, and @@ -492,3 +511,8 @@ the `custom operator mechanism ` contains documentation +on how to configure the quantization workflows for various backends. + +.. toctree:: + :hidden: + + quantization-backend-configuration + Quantized Tensors --------------------------------------- @@ -883,3 +897,22 @@ Numerical Debugging (prototype) Eager mode numeric suite * :ref:`torch_ao_ns_numeric_suite_fx` FX numeric suite + + +.. torch.ao is missing documentation. Since part of it is mentioned here, adding them here for now. +.. They are here for tracking purposes until they are more permanently fixed. +.. py:module:: torch.ao +.. py:module:: torch.ao.nn +.. py:module:: torch.ao.nn.sparse +.. py:module:: torch.ao.nn.sparse.quantized +.. py:module:: torch.ao.nn.sparse.quantized.dynamic +.. py:module:: torch.ao.ns +.. py:module:: torch.ao.ns.fx +.. py:module:: torch.ao.quantization +.. py:module:: torch.ao.quantization.fx +.. py:module:: torch.ao.quantization.fx.backend_config +.. py:module:: torch.ao.sparsity +.. 
py:module:: torch.ao.sparsity.experimental +.. py:module:: torch.ao.sparsity.experimental.pruner +.. py:module:: torch.ao.sparsity.scheduler +.. py:module:: torch.ao.sparsity.sparsifier diff --git a/docs/source/scripts/build_onnx_supported_aten_op_csv_table.py b/docs/source/scripts/build_onnx_supported_aten_op_csv_table.py new file mode 100644 index 00000000000000..7d12a441c4409b --- /dev/null +++ b/docs/source/scripts/build_onnx_supported_aten_op_csv_table.py @@ -0,0 +1,21 @@ +""" +This script generates a CSV table with all ATen operators +supported by `torch.onnx.export`. The generated table is included by +docs/source/onnx_supported_aten_list.rst. +""" + +import os +from torch.onnx import onnx_supported_ops + +# Constants +BUILD_DIR = 'build' +AUTO_GEN_ATEN_OPS_CSV_FILE = 'auto_gen_aten_op_list.csv' + +os.makedirs(BUILD_DIR, exist_ok=True) + +aten_list = onnx_supported_ops.onnx_supported_ops() + +with open(os.path.join(BUILD_DIR, AUTO_GEN_ATEN_OPS_CSV_FILE), 'w') as f: + f.write('Operator,opset_version(s)\n') + for name, opset_version in aten_list: + f.write(f'"``{name}``","{opset_version}"\n') diff --git a/docs/source/scripts/build_quantization_configs.py b/docs/source/scripts/build_quantization_configs.py new file mode 100644 index 00000000000000..7e9a011e12ba3e --- /dev/null +++ b/docs/source/scripts/build_quantization_configs.py @@ -0,0 +1,23 @@ +""" +This script will generate default values of quantization configs. +These are for use in the documentation. +""" + +from torch.ao.quantization.fx.backend_config import get_native_backend_config_dict +import os.path +from pprint import pprint + + +# Create a directory for the images, if it doesn't exist +QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH = os.path.join( + os.path.realpath(os.path.join(__file__, "..")), + "quantization_backend_configs" +) + +if not os.path.exists(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH): + os.mkdir(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH) + +output_path = os.path.join(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH, "default_backend_config.txt") + +with open(output_path, "w") as f: + pprint(get_native_backend_config_dict(), stream=f) diff --git a/docs/source/sparse.rst b/docs/source/sparse.rst index 178e4cb186030a..564df4ef432311 100644 --- a/docs/source/sparse.rst +++ b/docs/source/sparse.rst @@ -1,3 +1,5 @@ +.. automodule:: torch.sparse + .. currentmodule:: torch .. _sparse-docs: diff --git a/docs/source/special.rst b/docs/source/special.rst index 1aa24242fad9a3..42acd2148a6a9b 100644 --- a/docs/source/special.rst +++ b/docs/source/special.rst @@ -7,8 +7,6 @@ torch.special The torch.special module, modeled after SciPy's `special `_ module. .. automodule:: torch.special - :noindex: - .. currentmodule:: torch.special Functions @@ -39,6 +37,7 @@ Functions .. autofunction:: multigammaln .. autofunction:: ndtr .. autofunction:: ndtri +.. autofunction:: log_ndtr .. autofunction:: round .. autofunction:: sinc .. autofunction:: softmax diff --git a/docs/source/storage.rst b/docs/source/storage.rst index 3aeec082b607b9..747acf11ed36b8 100644 --- a/docs/source/storage.rst +++ b/docs/source/storage.rst @@ -1,87 +1,96 @@ torch.Storage =================================== -A :class:`torch.Storage` is a contiguous, one-dimensional array of a single -data type. +A :class:`torch._TypedStorage` is a contiguous, one-dimensional array of +elements of a particular :class:`torch.dtype`. It can be given any +:class:`torch.dtype`, and the internal data will be interpretted appropriately. 
-Every :class:`torch.Tensor` has a corresponding storage of the same data type. +Every strided :class:`torch.Tensor` contains a :class:`torch._TypedStorage`, +which stores all of the data that the :class:`torch.Tensor` views. -.. autoclass:: torch.DoubleStorage +For backward compatibility, there are also :class:`torch.Storage` classes +(like :class:`torch.FloatStorage`, :class:`torch.IntStorage`, etc). These +classes are not actually instantiated, and calling their constructors creates +a :class:`torch._TypedStorage` with the appropriate :class:`torch.dtype`. +:class:`torch.Storage` classes have all of the same class methods that +:class:`torch._TypedStorage` has. + +Also for backward compatibility, :class:`torch.Storage` is an alias for the +storage class that corresponds with the default data type +(:func:`torch.get_default_dtype()`). For instance, if the default data type is +:attr:`torch.float`, :class:`torch.Storage` resolves to +:class:`torch.FloatStorage`. + + +.. autoclass:: torch._TypedStorage :members: :undoc-members: :inherited-members: +.. autoclass:: torch.DoubleStorage + :members: + :undoc-members: + .. autoclass:: torch.FloatStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.HalfStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.LongStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.IntStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.ShortStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.CharStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.ByteStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.BoolStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.BFloat16Storage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.ComplexDoubleStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.ComplexFloatStorage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.QUInt8Storage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.QInt8Storage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.QInt32Storage :members: :undoc-members: - :inherited-members: .. autoclass:: torch.QUInt4x2Storage :members: :undoc-members: - :inherited-members: + +.. autoclass:: torch.QUInt2x4Storage + :members: + :undoc-members: diff --git a/docs/source/tensorboard.rst b/docs/source/tensorboard.rst index d3205e3ba58925..8cd13836928819 100644 --- a/docs/source/tensorboard.rst +++ b/docs/source/tensorboard.rst @@ -1,5 +1,6 @@ torch.utils.tensorboard =================================== +.. automodule:: torch.utils.tensorboard Before going further, more details on TensorBoard can be found at https://www.tensorflow.org/tensorboard/ diff --git a/docs/source/tensors.rst b/docs/source/tensors.rst index fe9467dd4a694a..161a17f4a6da41 100644 --- a/docs/source/tensors.rst +++ b/docs/source/tensors.rst @@ -593,6 +593,7 @@ Tensor class reference Tensor.scatter_ Tensor.scatter_add_ Tensor.scatter_add + Tensor.scatter_reduce_ Tensor.scatter_reduce Tensor.select Tensor.select_scatter diff --git a/docs/source/torch.rst b/docs/source/torch.rst index e09675af82a1a3..e4062b6096f0ee 100644 --- a/docs/source/torch.rst +++ b/docs/source/torch.rst @@ -1,13 +1,6 @@ torch ===== -The torch package contains data structures for multi-dimensional -tensors and defines mathematical operations over these tensors. 
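To make the backward-compatibility story in the ``torch.Storage`` changes above concrete, a small sketch (printed attributes follow the new documentation; exact class names may differ between releases)::

    import torch

    s = torch.FloatStorage(4)    # legacy constructor, backed by the typed-storage machinery
    print(type(s), s.dtype)      # dtype is expected to be torch.float32

    t = torch.tensor([1.0, 2.0, 3.0])
    print(t.storage().dtype, len(t.storage()))   # torch.float32 3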
-Additionally, it provides many utilities for efficient serializing of -Tensors and arbitrary types, and other useful utilities. - -It has a CUDA counterpart, that enables you to run your tensor computations -on an NVIDIA GPU with compute capability >= 3.0 - +.. automodule:: torch .. currentmodule:: torch Tensors @@ -615,3 +608,18 @@ Utilities is_warn_always_enabled vmap _assert + + +.. Empty submodules added only for tracking. +.. py:module:: torch.contrib +.. py:module:: torch.utils.backcompat + +.. This submodule is split manually without a top level page. +.. py:module:: torch.utils + +.. This module is only used internally for ROCm builds. +.. py:module:: torch.utils.hipify + +.. This module needs to be documented. Adding here in the meantime +.. for tracking purposes +.. py:module:: torch.utils.model_dump diff --git a/ios/LibTorch-Lite.podspec b/ios/LibTorch-Lite.podspec index f3ccaa43e93220..d2d9264e0a622d 100644 --- a/ios/LibTorch-Lite.podspec +++ b/ios/LibTorch-Lite.podspec @@ -1,6 +1,6 @@ Pod::Spec.new do |s| s.name = 'LibTorch-Lite' - s.version = '1.10.0' + s.version = '1.11.0' s.authors = 'PyTorch Team' s.license = { :type => 'BSD' } s.homepage = 'https://github.com/pytorch/pytorch' diff --git a/ios/LibTorch.podspec b/ios/LibTorch.podspec index 22aaafac9d12c4..77bc0537e89edc 100644 --- a/ios/LibTorch.podspec +++ b/ios/LibTorch.podspec @@ -1,6 +1,6 @@ Pod::Spec.new do |s| s.name = 'LibTorch' - s.version = '1.10.0' + s.version = '1.11.0' s.authors = 'PyTorch Team' s.license = { :type => 'BSD' } s.homepage = 'https://github.com/pytorch/pytorch' diff --git a/ios/TestApp/TestApp/Base.lproj/Main.storyboard b/ios/TestApp/TestApp/Base.lproj/Main.storyboard index ad8e8f7c874cf1..86c53ddccf2244 100644 --- a/ios/TestApp/TestApp/Base.lproj/Main.storyboard +++ b/ios/TestApp/TestApp/Base.lproj/Main.storyboard @@ -1,38 +1,22 @@ - + - + - - + - - - - - - - - - - - - - - - @@ -59,12 +43,4 @@ - - - - - - - - diff --git a/ios/TestApp/TestApp/ViewController.mm b/ios/TestApp/TestApp/ViewController.mm index 38404ddac3b9f6..d8ecacda3c830b 100644 --- a/ios/TestApp/TestApp/ViewController.mm +++ b/ios/TestApp/TestApp/ViewController.mm @@ -4,4 +4,9 @@ @interface ViewController () @end @implementation ViewController + +- (void)viewDidLoad { + [super viewDidLoad]; +} + @end diff --git a/ios/TestApp/TestAppTests/TestLiteInterpreter.mm b/ios/TestApp/TestAppTests/TestLiteInterpreter.mm index f35642a148e3b3..37c8692b9980ae 100644 --- a/ios/TestApp/TestAppTests/TestLiteInterpreter.mm +++ b/ios/TestApp/TestAppTests/TestLiteInterpreter.mm @@ -11,8 +11,8 @@ @interface TestAppTests : XCTestCase @implementation TestAppTests { } -- (void)testLiteInterpreter { - NSString* modelPath = [[NSBundle bundleForClass:[self class]] pathForResource:@"model_lite" +- (void)testCoreML { + NSString* modelPath = [[NSBundle bundleForClass:[self class]] pathForResource:@"model_coreml" ofType:@"ptl"]; auto module = torch::jit::_load_for_mobile(modelPath.UTF8String); c10::InferenceMode mode; @@ -21,14 +21,173 @@ - (void)testLiteInterpreter { XCTAssertTrue(outputTensor.numel() == 1000); } -- (void)testCoreML { - NSString* modelPath = [[NSBundle bundleForClass:[self class]] pathForResource:@"model_coreml" +- (void)testModel:(NSString*)filename { + // model generated using the current pytorch revision + [self runModel:[NSString stringWithFormat:@"%@_temp", filename]]; + // model generated using older pyotrch revision + [self runModel:filename]; +} + +- (void)runModel:(NSString*)filename { + NSString* modelPath = [[NSBundle 
bundleForClass:[self class]] pathForResource:filename ofType:@"ptl"]; - auto module = torch::jit::_load_for_mobile(modelPath.UTF8String); + XCTAssertNotNil(modelPath); c10::InferenceMode mode; - auto input = torch::ones({1, 3, 224, 224}, at::kFloat); - auto outputTensor = module.forward({input}).toTensor(); - XCTAssertTrue(outputTensor.numel() == 1000); + auto module = torch::jit::_load_for_mobile(modelPath.UTF8String); + auto has_bundled_input = module.find_method("get_all_bundled_inputs"); + if (has_bundled_input) { + c10::IValue bundled_inputs = module.run_method("get_all_bundled_inputs"); + c10::List all_inputs = bundled_inputs.toList(); + std::vector> inputs; + for (at::IValue input : all_inputs) { + inputs.push_back(input.toTupleRef().elements()); + } + // run with the first bundled input + XCTAssertNoThrow(module.forward(inputs[0])); + } else { + XCTAssertNoThrow(module.forward({})); + } +} + +// TODO remove this once updated test script +- (void)testLiteInterpreter { + XCTAssertTrue(true); +} + +- (void)testMobileNetV2 { + [self testModel:@"mobilenet_v2"]; +} + +- (void)testPointwiseOps { + [self testModel:@"pointwise_ops"]; +} + +- (void)testReductionOps { + [self testModel:@"reduction_ops"]; +} + +- (void)testComparisonOps { + [self testModel:@"comparison_ops"]; +} + +- (void)testOtherMathOps { + [self testModel:@"other_math_ops"]; +} + +- (void)testSpectralOps { + [self testModel:@"spectral_ops"]; +} + +- (void)testBlasLapackOps { + [self testModel:@"blas_lapack_ops"]; +} + +- (void)testSamplingOps { + [self testModel:@"sampling_ops"]; +} + +- (void)testTensorOps { + [self testModel:@"tensor_general_ops"]; +} + +- (void)testTensorCreationOps { + [self testModel:@"tensor_creation_ops"]; +} + +- (void)testTensorIndexingOps { + [self testModel:@"tensor_indexing_ops"]; +} + +- (void)testTensorTypingOps { + [self testModel:@"tensor_typing_ops"]; +} + +- (void)testTensorViewOps { + [self testModel:@"tensor_view_ops"]; +} + +- (void)testConvolutionOps { + [self testModel:@"convolution_ops"]; +} + +- (void)testPoolingOps { + [self testModel:@"pooling_ops"]; +} + +- (void)testPaddingOps { + [self testModel:@"padding_ops"]; +} + +- (void)testActivationOps { + [self testModel:@"activation_ops"]; +} + +- (void)testNormalizationOps { + [self testModel:@"normalization_ops"]; +} + +- (void)testRecurrentOps { + [self testModel:@"recurrent_ops"]; +} + +- (void)testTransformerOps { + [self testModel:@"transformer_ops"]; +} + +- (void)testLinearOps { + [self testModel:@"linear_ops"]; +} + +- (void)testDropoutOps { + [self testModel:@"dropout_ops"]; +} + +- (void)testSparseOps { + [self testModel:@"sparse_ops"]; +} + +- (void)testDistanceFunctionOps { + [self testModel:@"distance_function_ops"]; +} + +- (void)testLossFunctionOps { + [self testModel:@"loss_function_ops"]; +} + +- (void)testVisionFunctionOps { + [self testModel:@"vision_function_ops"]; +} + +- (void)testShuffleOps { + [self testModel:@"shuffle_ops"]; +} + +- (void)testNNUtilsOps { + [self testModel:@"nn_utils_ops"]; +} + +- (void)testQuantOps { + [self testModel:@"general_quant_ops"]; +} + +- (void)testDynamicQuantOps { + [self testModel:@"dynamic_quant_ops"]; +} + +- (void)testStaticQuantOps { + [self testModel:@"static_quant_ops"]; +} + +- (void)testFusedQuantOps { + [self testModel:@"fused_quant_ops"]; +} + +- (void)testTorchScriptBuiltinQuantOps { + [self testModel:@"torchscript_builtin_ops"]; +} + +- (void)testTorchScriptCollectionQuantOps { + [self testModel:@"torchscript_collection_ops"]; } @end diff --git 
a/ios/TestApp/models/activation_ops.ptl b/ios/TestApp/models/activation_ops.ptl new file mode 100644 index 00000000000000..44673efd446e98 Binary files /dev/null and b/ios/TestApp/models/activation_ops.ptl differ diff --git a/ios/TestApp/models/android_api_module.ptl b/ios/TestApp/models/android_api_module.ptl new file mode 100644 index 00000000000000..df62dd86208811 Binary files /dev/null and b/ios/TestApp/models/android_api_module.ptl differ diff --git a/ios/TestApp/models/blas_lapack_ops.ptl b/ios/TestApp/models/blas_lapack_ops.ptl new file mode 100644 index 00000000000000..fea933ee644fd4 Binary files /dev/null and b/ios/TestApp/models/blas_lapack_ops.ptl differ diff --git a/ios/TestApp/models/comparison_ops.ptl b/ios/TestApp/models/comparison_ops.ptl new file mode 100644 index 00000000000000..01b1c153e7515a Binary files /dev/null and b/ios/TestApp/models/comparison_ops.ptl differ diff --git a/ios/TestApp/models/convolution_ops.ptl b/ios/TestApp/models/convolution_ops.ptl new file mode 100644 index 00000000000000..de776834eb7704 Binary files /dev/null and b/ios/TestApp/models/convolution_ops.ptl differ diff --git a/ios/TestApp/models/distance_function_ops.ptl b/ios/TestApp/models/distance_function_ops.ptl new file mode 100644 index 00000000000000..cc4d994f440a4d Binary files /dev/null and b/ios/TestApp/models/distance_function_ops.ptl differ diff --git a/ios/TestApp/models/dropout_ops.ptl b/ios/TestApp/models/dropout_ops.ptl new file mode 100644 index 00000000000000..422c2f60e6be25 Binary files /dev/null and b/ios/TestApp/models/dropout_ops.ptl differ diff --git a/ios/TestApp/models/dynamic_quant_ops.ptl b/ios/TestApp/models/dynamic_quant_ops.ptl new file mode 100644 index 00000000000000..573dee91f07b20 Binary files /dev/null and b/ios/TestApp/models/dynamic_quant_ops.ptl differ diff --git a/ios/TestApp/models/fused_quant_ops.ptl b/ios/TestApp/models/fused_quant_ops.ptl new file mode 100644 index 00000000000000..d24e3d8d4caa3f Binary files /dev/null and b/ios/TestApp/models/fused_quant_ops.ptl differ diff --git a/ios/TestApp/models/general_quant_ops.ptl b/ios/TestApp/models/general_quant_ops.ptl new file mode 100644 index 00000000000000..5254d33b4794d9 Binary files /dev/null and b/ios/TestApp/models/general_quant_ops.ptl differ diff --git a/ios/TestApp/models/linear_ops.ptl b/ios/TestApp/models/linear_ops.ptl new file mode 100644 index 00000000000000..36915823843cf9 Binary files /dev/null and b/ios/TestApp/models/linear_ops.ptl differ diff --git a/ios/TestApp/models/loss_function_ops.ptl b/ios/TestApp/models/loss_function_ops.ptl new file mode 100644 index 00000000000000..4c0592e5485afa Binary files /dev/null and b/ios/TestApp/models/loss_function_ops.ptl differ diff --git a/ios/TestApp/models/mobilenet_v2.ptl b/ios/TestApp/models/mobilenet_v2.ptl new file mode 100644 index 00000000000000..b034aaf8c8020e Binary files /dev/null and b/ios/TestApp/models/mobilenet_v2.ptl differ diff --git a/ios/TestApp/models/model_coreml.ptl b/ios/TestApp/models/model_coreml.ptl new file mode 100644 index 00000000000000..1f2271b365f3c0 Binary files /dev/null and b/ios/TestApp/models/model_coreml.ptl differ diff --git a/ios/TestApp/models/model_lite.ptl b/ios/TestApp/models/model_lite.ptl new file mode 100644 index 00000000000000..9aef3bd6b54663 Binary files /dev/null and b/ios/TestApp/models/model_lite.ptl differ diff --git a/ios/TestApp/models/nn_utils_ops.ptl b/ios/TestApp/models/nn_utils_ops.ptl new file mode 100644 index 00000000000000..726b200a67d161 Binary files /dev/null and 
b/ios/TestApp/models/nn_utils_ops.ptl differ diff --git a/ios/TestApp/models/normalization_ops.ptl b/ios/TestApp/models/normalization_ops.ptl new file mode 100644 index 00000000000000..1846009a3b7239 Binary files /dev/null and b/ios/TestApp/models/normalization_ops.ptl differ diff --git a/ios/TestApp/models/other_math_ops.ptl b/ios/TestApp/models/other_math_ops.ptl new file mode 100644 index 00000000000000..7209c3b3bd1fdd Binary files /dev/null and b/ios/TestApp/models/other_math_ops.ptl differ diff --git a/ios/TestApp/models/padding_ops.ptl b/ios/TestApp/models/padding_ops.ptl new file mode 100644 index 00000000000000..4af0418f11a679 Binary files /dev/null and b/ios/TestApp/models/padding_ops.ptl differ diff --git a/ios/TestApp/models/pointwise_ops.ptl b/ios/TestApp/models/pointwise_ops.ptl new file mode 100644 index 00000000000000..948ed4832660ae Binary files /dev/null and b/ios/TestApp/models/pointwise_ops.ptl differ diff --git a/ios/TestApp/models/pooling_ops.ptl b/ios/TestApp/models/pooling_ops.ptl new file mode 100644 index 00000000000000..4b98f1971ee54c Binary files /dev/null and b/ios/TestApp/models/pooling_ops.ptl differ diff --git a/ios/TestApp/models/recurrent_ops.ptl b/ios/TestApp/models/recurrent_ops.ptl new file mode 100644 index 00000000000000..10804040be8479 Binary files /dev/null and b/ios/TestApp/models/recurrent_ops.ptl differ diff --git a/ios/TestApp/models/reduction_ops.ptl b/ios/TestApp/models/reduction_ops.ptl new file mode 100644 index 00000000000000..13771302c66802 Binary files /dev/null and b/ios/TestApp/models/reduction_ops.ptl differ diff --git a/ios/TestApp/models/sampling_ops.ptl b/ios/TestApp/models/sampling_ops.ptl new file mode 100644 index 00000000000000..416be7cb127953 Binary files /dev/null and b/ios/TestApp/models/sampling_ops.ptl differ diff --git a/ios/TestApp/models/shuffle_ops.ptl b/ios/TestApp/models/shuffle_ops.ptl new file mode 100644 index 00000000000000..5e5520118764ef Binary files /dev/null and b/ios/TestApp/models/shuffle_ops.ptl differ diff --git a/ios/TestApp/models/sparse_ops.ptl b/ios/TestApp/models/sparse_ops.ptl new file mode 100644 index 00000000000000..a16f68f8f95ff8 Binary files /dev/null and b/ios/TestApp/models/sparse_ops.ptl differ diff --git a/ios/TestApp/models/spectral_ops.ptl b/ios/TestApp/models/spectral_ops.ptl new file mode 100644 index 00000000000000..9828dd2ba9013a Binary files /dev/null and b/ios/TestApp/models/spectral_ops.ptl differ diff --git a/ios/TestApp/models/static_quant_ops.ptl b/ios/TestApp/models/static_quant_ops.ptl new file mode 100644 index 00000000000000..f0f0a09b832db2 Binary files /dev/null and b/ios/TestApp/models/static_quant_ops.ptl differ diff --git a/ios/TestApp/models/tensor_creation_ops.ptl b/ios/TestApp/models/tensor_creation_ops.ptl new file mode 100644 index 00000000000000..d897b43cd36ca9 Binary files /dev/null and b/ios/TestApp/models/tensor_creation_ops.ptl differ diff --git a/ios/TestApp/models/tensor_general_ops.ptl b/ios/TestApp/models/tensor_general_ops.ptl new file mode 100644 index 00000000000000..6f2855ea83eaa5 Binary files /dev/null and b/ios/TestApp/models/tensor_general_ops.ptl differ diff --git a/ios/TestApp/models/tensor_indexing_ops.ptl b/ios/TestApp/models/tensor_indexing_ops.ptl new file mode 100644 index 00000000000000..ac9cb8c4b94add Binary files /dev/null and b/ios/TestApp/models/tensor_indexing_ops.ptl differ diff --git a/ios/TestApp/models/tensor_typing_ops.ptl b/ios/TestApp/models/tensor_typing_ops.ptl new file mode 100644 index 00000000000000..3e2f4d8cc68922 Binary files 
/dev/null and b/ios/TestApp/models/tensor_typing_ops.ptl differ diff --git a/ios/TestApp/models/tensor_view_ops.ptl b/ios/TestApp/models/tensor_view_ops.ptl new file mode 100644 index 00000000000000..5e2dc829484265 Binary files /dev/null and b/ios/TestApp/models/tensor_view_ops.ptl differ diff --git a/ios/TestApp/models/torchscript_builtin_ops.ptl b/ios/TestApp/models/torchscript_builtin_ops.ptl new file mode 100644 index 00000000000000..2d2532df2fd257 Binary files /dev/null and b/ios/TestApp/models/torchscript_builtin_ops.ptl differ diff --git a/ios/TestApp/models/torchscript_collection_ops.ptl b/ios/TestApp/models/torchscript_collection_ops.ptl new file mode 100644 index 00000000000000..ce434b3b4210d5 Binary files /dev/null and b/ios/TestApp/models/torchscript_collection_ops.ptl differ diff --git a/ios/TestApp/models/transformer_ops.ptl b/ios/TestApp/models/transformer_ops.ptl new file mode 100644 index 00000000000000..4546569cd7fd99 Binary files /dev/null and b/ios/TestApp/models/transformer_ops.ptl differ diff --git a/ios/TestApp/models/vision_function_ops.ptl b/ios/TestApp/models/vision_function_ops.ptl new file mode 100644 index 00000000000000..e1f8c39c78abd9 Binary files /dev/null and b/ios/TestApp/models/vision_function_ops.ptl differ diff --git a/modules/observers/perf_observer.cc b/modules/observers/perf_observer.cc index bdee55daf1792e..cfd6130f7255e3 100644 --- a/modules/observers/perf_observer.cc +++ b/modules/observers/perf_observer.cc @@ -195,7 +195,7 @@ void PerfNetObserver::Start() { int skipIters = ObserverConfig::getSkipIters(); int sampleRate = visitCount > 0 ? netFollowupSampleRate : netInitSampleRate; // NOLINTNEXTLINE(clang-analyzer-security.insecureAPI.rand) - if (skipIters <= numRuns_ && sampleRate > 0 && rand() % sampleRate == 0) { + if (skipIters <= static_cast(numRuns_) && sampleRate > 0 && rand() % sampleRate == 0) { visitCount++; if (visitCount == netFollowupSampleCount) { visitCount = 0; @@ -238,9 +238,9 @@ void PerfNetObserver::Stop() { if (logType_ == PerfNetObserver::OPERATOR_DELAY) { const auto& operators = subject_->GetOperators(); - for (int idx = 0; idx < operators.size(); ++idx) { + for (unsigned idx = 0; idx < operators.size(); ++idx) { const auto* op = operators[idx]; - auto name = getObserverName(op, idx); + auto name = getObserverName(op, static_cast(idx)); PerformanceInformation p; const PerfOperatorObserver* opObserver = static_cast(observerMap_[op]); diff --git a/mypy.ini b/mypy.ini index a3ec144806e48e..61442c1a7d697a 100644 --- a/mypy.ini +++ b/mypy.ini @@ -41,7 +41,7 @@ files = # # `exclude` is a regex, not a list of paths like `files` (sigh) # -exclude = torch/include/|torch/csrc/|torch/distributed/elastic/agent/server/api.py|torch/testing/_internal +exclude = torch/include/|torch/csrc/|torch/distributed/elastic/agent/server/api.py|torch/testing/_internal|torch/distributed/fsdp/fully_sharded_data_parallel.py # Minimum version supported - variable annotations were introduced # in Python 3.7 diff --git a/mypy_plugins/check_mypy_version.py b/mypy_plugins/check_mypy_version.py index 02a02a60b9501d..a34b8683c989e0 100644 --- a/mypy_plugins/check_mypy_version.py +++ b/mypy_plugins/check_mypy_version.py @@ -9,7 +9,7 @@ def get_correct_mypy_version(): # there's probably a more elegant way to do this match, = re.finditer( r'mypy==(\d+(?:\.\d+)*)', - Path('.circleci/docker/common/install_conda.sh').read_text(), + Path('.circleci/docker/requirements-ci.txt').read_text(), ) version, = match.groups() return version diff --git a/related_commits 
b/related_commits index 203ce97c0eb4be..32d7bc42104d2c 100644 --- a/related_commits +++ b/related_commits @@ -1,4 +1,4 @@ ubuntu|pytorch|apex|master|none|https://github.com/ROCmSoftwarePlatform/apex centos|pytorch|apex|master|none|https://github.com/ROCmSoftwarePlatform/apex -ubuntu|pytorch|torchvision|main|d8654bb0d84fd2ba8b42cd58d881523821a6214c|https://github.com/pytorch/vision -centos|pytorch|torchvision|main|d8654bb0d84fd2ba8b42cd58d881523821a6214c|https://github.com/pytorch/vision +ubuntu|pytorch|torchvision|main|f5afae50bc8e99b873e2345bcda2dedfc863a737|https://github.com/pytorch/vision +centos|pytorch|torchvision|main|f5afae50bc8e99b873e2345bcda2dedfc863a737|https://github.com/pytorch/vision diff --git a/scripts/jit/log_extract.py b/scripts/jit/log_extract.py index de9f983745c542..61e3172fe0b360 100644 --- a/scripts/jit/log_extract.py +++ b/scripts/jit/log_extract.py @@ -1,132 +1,45 @@ -from contextlib import contextmanager -from torch.testing import make_tensor -from typing import Any, List, Tuple import argparse -import torch +import functools +import traceback +from torch.utils.jit.log_extract import extract_ir, load_graph_and_inputs, run_baseline_no_fusion, run_nnc, run_nvfuser +from typing import List, Tuple, Callable, Optional ''' Usage: 1. Run your script and pipe into a log file PYTORCH_JIT_LOG_LEVEL=">>graph_fuser" python3 my_test.py &> log.txt 2. Run log_extract: - log_extract.py log.txt --nvfuser + log_extract.py log.txt --nvfuser --nnc-dynamic --nnc-static You can also extract the list of extracted IR: log_extract.py log.txt --output + +Passing in --graphs 0 2 will only run graphs 0 and 2 ''' -def extract_ir(filename: str) -> List[str]: - BEGIN = "" - END = "" - pfx = None - current = "" - graphs = [] - with open(filename, "r") as f: - split_strs = f.read().split(BEGIN) - for i, split_str in enumerate(split_strs): - if i == 0: - continue - end_loc = split_str.find(END) - if end_loc == -1: - continue - s = split_str[:end_loc] - pfx = split_strs[i - 1].splitlines()[-1] - lines = [x[len(pfx):] for x in s.splitlines(keepends=True)] - graphs.append(''.join(lines)) - - return graphs - - -def make_tensor_from_type(inp_type: torch._C.TensorType): - if inp_type.requires_grad() is not False: - raise NotImplementedError("Tensors with requires_grad are not implemented") - return make_tensor( - inp_type.sizes(), - dtype=inp_type.dtype(), - device=inp_type.device()) - - -def load_graph_and_inputs(ir: str) -> Tuple[Any, List[Any]]: - graph = torch._C.parse_ir(ir) - graph.makeMultiOutputIntoTuple() - inputs = [] - for inp in graph.inputs(): - if isinstance(inp.type(), torch._C.FloatType): - inputs.append(.5) - elif isinstance(inp.type(), torch._C.IntType): - inputs.append(2) - elif isinstance(inp.type(), torch._C.TensorType): - inputs.append(make_tensor_from_type(inp.type())) - else: - raise NotImplementedError(f"A default value is not implemented for type {inp.type()}") - - func = torch._C._create_function_from_graph("forward", graph) - torch._C._jit_pass_erase_shape_information(func.graph) - return (func, inputs) - - -# TODO add support for timing on CPU -def run_test(ir, inputs, *, warmup_runs=10, test_runs=20) -> float: - graph, _ = load_graph_and_inputs(ir) - for _ in range(warmup_runs): - graph(*inputs) - - start_event = torch.cuda.Event(enable_timing=True) - end_event = torch.cuda.Event(enable_timing=True) - torch.cuda.synchronize() - start_event.record() - torch.cuda.synchronize() - for i in range(test_runs): - graph(*inputs) - torch.cuda.synchronize() - end_event.record() - 
torch.cuda.synchronize() - return start_event.elapsed_time(end_event) / test_runs - - -@contextmanager -def no_fuser(*args, **kwargs): - old_cpu_fuse = torch._C._jit_can_fuse_on_cpu() - old_gpu_fuse = torch._C._jit_can_fuse_on_gpu() - old_texpr_fuser_state = torch._C._jit_texpr_fuser_enabled() - old_nvfuser_state = torch._C._jit_nvfuser_enabled() - - torch._C._jit_override_can_fuse_on_cpu(False) - torch._C._jit_override_can_fuse_on_gpu(False) - torch._C._jit_set_texpr_fuser_enabled(False) - torch._C._jit_set_nvfuser_enabled(False) - - try: - yield - finally: - torch._C._jit_override_can_fuse_on_cpu(old_cpu_fuse) - torch._C._jit_override_can_fuse_on_gpu(old_gpu_fuse) - torch._C._jit_set_texpr_fuser_enabled(old_texpr_fuser_state) - torch._C._jit_set_nvfuser_enabled(old_nvfuser_state) - - -def run_baseline_no_fusion(ir, inputs) -> float: - with no_fuser(): - return run_test(ir, inputs) - - -def run_nnc(ir, inputs) -> float: - with torch.jit.fuser("fuser1"): - return run_test(ir, inputs) - - -def run_nvfuser(ir, inputs) -> float: - with torch.jit.fuser("fuser2"): - return run_test(ir, inputs) - - -def test_nvfuser(graphs: List[str], baseline_fn, nvfuser_fn): + +def test_runners(graphs: List[str], runners: List[Tuple[str, Callable]], graph_set: Optional[List[int]]): for i, ir in enumerate(graphs): _, inputs = load_graph_and_inputs(ir) - baseline = baseline_fn(ir, inputs) - nvfuser = nvfuser_fn(ir, inputs) - improvement = (baseline / nvfuser - 1) * 100 - print(f" Graph {i}; baseline: {baseline:.2f} ms; nvfuser: {nvfuser:.2f} ms; improvement: {improvement:.2f}%") + if graph_set and i not in graph_set: + continue + + print(f"Running Graph {i}") + prev_result = None + prev_runner_name = None + for runner in runners: + runner_name, runner_fn = runner + try: + result = runner_fn(ir, inputs) + if prev_result: + improvement = (prev_result / result - 1) * 100 + print(f"{runner_name} : {result:.6f} ms improvement over {prev_runner_name}: improvement: {improvement:.2f}%") + else: + print(f"{runner_name} : {result:.6f} ms") + prev_result = result + prev_runner_name = runner_name + except RuntimeError: + print(f" Graph {i} failed for {runner_name} :", traceback.format_exc()) def run(): @@ -134,30 +47,56 @@ def run(): description="Extracts torchscript IR from log files and, optionally, benchmarks it or outputs the IR" ) parser.add_argument("filename", help="Filename of log file") - parser.add_argument("--nvfuser", dest="nvfuser", action="store_true", help="benchmark nvfuser against no fusion") - parser.add_argument("--no-nvfuser", dest="nvfuser", action="store_false", help="DON'T benchmark nvfuser against no fusion") + parser.add_argument("--nvfuser", dest="nvfuser", action="store_true", help="benchmark nvfuser") + parser.add_argument("--no-nvfuser", dest="nvfuser", action="store_false", help="DON'T benchmark nvfuser") parser.set_defaults(nvfuser=False) - parser.add_argument("--nvfuser-nnc", dest="nvfuser_nnc", action="store_true", help="benchmark nvfuser against nnc") - parser.add_argument("--no-nvfuser-nnc", dest="nvfuser_nnc", action="store_false", help="DON'T benchmark nvfuser against nnc") - parser.set_defaults(nvfuser_nnc=False) + parser.add_argument("--nnc-static", dest="nnc_static", action="store_true", help="benchmark nnc static") + parser.add_argument("--no-nnc-static", dest="nnc_static", action="store_false", help="DON'T benchmark nnc static") + parser.set_defaults(nnc_static=False) + + parser.add_argument("--nnc-dynamic", dest="nnc_dynamic", action="store_true", help="nnc with dynamic shapes") + 
parser.add_argument( + "--no-nnc-dynamic", + dest="nnc_dynamic", + action="store_false", + help="DONT't benchmark nnc with dynamic shapes") + parser.set_defaults(nnc_dynamic=False) + + + parser.add_argument("--baseline", dest="baseline", action="store_true", help="benchmark baseline") + parser.add_argument("--no-baseline", dest="baseline", action="store_false", help="DON'T benchmark baseline") + parser.set_defaults(baseline=False) + parser.add_argument("--output", dest="output", action="store_true", help="Output graph IR") parser.add_argument("--no-output", dest="output", action="store_false", help="DON'T output graph IR") parser.set_defaults(output=False) + parser.add_argument('--graphs', nargs="+", type=int, help="Run only specified graph indices") + + args = parser.parse_args() graphs = extract_ir(args.filename) + graph_set = args.graphs + graph_set = graph_set if graph_set else None + + options = [] + if args.baseline: + options.append(("Baseline no fusion", run_baseline_no_fusion)) + if args.nnc_dynamic: + options.append(("NNC Dynamic", functools.partial(run_nnc, dynamic=True))) + if args.nnc_static: + options.append(("NNC Static", functools.partial(run_nnc, dynamic=False))) if args.nvfuser: - print("NVFuser vs no fusion:") - test_nvfuser(graphs, run_baseline_no_fusion, run_nvfuser) + options.append(("NVFuser", run_nvfuser)) - if args.nvfuser_nnc: - print("NVFuser vs NNC:") - test_nvfuser(graphs, run_nnc, run_nvfuser) + test_runners(graphs, options, graph_set) if args.output: quoted = [] - for ir in graphs: + for i, ir in enumerate(graphs): + if graph_set and i not in graph_set: + continue quoted.append("\"\"\"" + ir + "\"\"\"") print("[" + ", ".join(quoted) + "]") diff --git a/scripts/onnx/test.sh b/scripts/onnx/test.sh index 3b39f600587668..dbeb6b2b27f5f3 100755 --- a/scripts/onnx/test.sh +++ b/scripts/onnx/test.sh @@ -69,7 +69,7 @@ if [[ "$BUILD_ENVIRONMENT" == *ort_test1* || "${SHARD_NUMBER}" == "1" ]]; then pytest "${args[@]}" \ "$top_dir/test/onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset7" \ "$top_dir/test/onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset8" \ - "$top_dir/test/onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime" \ + "$top_dir/test/onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset9" \ "$top_dir/test/onnx/test_custom_ops.py" \ "$top_dir/test/onnx/test_models_onnxruntime.py" \ "$top_dir/test/onnx/test_utility_funs.py" \ diff --git a/scripts/release/cut-release-branch.sh b/scripts/release/cut-release-branch.sh new file mode 100644 index 00000000000000..468dbfb184d941 --- /dev/null +++ b/scripts/release/cut-release-branch.sh @@ -0,0 +1,49 @@ +#!/usr/bin/env bash + +: ' +So you are looking to cut a release branch? Well you came +to the right script. + +This script can be used to cut any branch on any repository + +For `pytorch/pytorch` usage would be like: +> DRY_RUN=disabled cut-release-branch.sh + +For `pytorch/builder` or domains usage would be like: +> DRY_RUN=disabled GIT_BRANCH_TO_CUT_FROM=main RELEASE_VERSION=1.11 cut-release-branch.sh +' + +set -eou pipefail + +GIT_TOP_DIR=$(git rev-parse --show-toplevel) +GIT_REMOTE=${GIT_REMOTE:-origin} +GIT_BRANCH_TO_CUT_FROM=${GIT_BRANCH_TO_CUT_FROM:-viable/strict} + +# should output something like 1.11 +RELEASE_VERSION=${RELEASE_VERSION:-$(cut -d'.' 
-f1-2 "${GIT_TOP_DIR}/version.txt")} + +DRY_RUN_FLAG="--dry-run" +if [[ ${DRY_RUN:-enabled} == "disabled" ]]; then + DRY_RUN_FLAG="" +fi + + +( + set -x + git fetch --all + git checkout "${GIT_REMOTE}/${GIT_BRANCH_TO_CUT_FROM}" +) + +for branch in "release/${RELEASE_VERSION}" "orig/release/${RELEASE_VERSION}"; do + if git rev-parse --verify "${branch}" >/dev/null 2>/dev/null; then + echo "+ Branch ${branch} already exists, skipping..." + continue + else + ( + set -x + git checkout "${GIT_REMOTE}/${GIT_BRANCH_TO_CUT_FROM}" + git checkout -b "${branch}" + git push "${GIT_REMOTE}" "${branch}" + ) + fi +done diff --git a/setup.py b/setup.py index 8024cb53b63cb0..dee5b369dc5ad1 100644 --- a/setup.py +++ b/setup.py @@ -50,6 +50,9 @@ # MKLDNN_CPU_RUNTIME # MKL-DNN threading mode: TBB or OMP (default) # +# USE_STATIC_MKL +# Prefer to link with MKL statically - Unix only +# # USE_NNPACK=0 # disables NNPACK build # @@ -821,7 +824,16 @@ def make_relative_rpath_args(path): include_dirs=[], library_dirs=library_dirs, extra_link_args=extra_link_args + main_link_args + make_relative_rpath_args('lib')) + C_flatbuffer = Extension("torch._C_flatbuffer", + libraries=main_libraries, + sources=["torch/csrc/stub_with_flatbuffer.c"], + language='c', + extra_compile_args=main_compile_args + extra_compile_args, + include_dirs=[], + library_dirs=library_dirs, + extra_link_args=extra_link_args + main_link_args + make_relative_rpath_args('lib')) extensions.append(C) + extensions.append(C_flatbuffer) if not IS_WINDOWS: DL = Extension("torch._dl", @@ -929,6 +941,7 @@ def print_box(msg): 'bin/*', 'test/*', '_C/*.pyi', + '_C_flatbuffer/*.pyi', 'cuda/*.pyi', 'optim/*.pyi', 'autograd/*.pyi', @@ -936,6 +949,7 @@ def print_box(msg): 'nn/*.pyi', 'nn/modules/*.pyi', 'nn/parallel/*.pyi', + 'utils/data/*.pyi', 'lib/*.so*', 'lib/*.dylib*', 'lib/*.dll', @@ -1015,7 +1029,8 @@ def print_box(msg): 'include/torch/csrc/autograd/utils/*.h', 'include/torch/csrc/cuda/*.h', 'include/torch/csrc/deploy/*.h', - 'include/torch/csrc/deploy/interpreter/interpreter_impl.h', + 'include/torch/csrc/deploy/interpreter/*.h', + 'include/torch/csrc/deploy/interpreter/*.hpp', 'include/torch/csrc/distributed/c10d/exception.h', 'include/torch/csrc/jit/*.h', 'include/torch/csrc/jit/backends/*.h', @@ -1036,6 +1051,7 @@ def print_box(msg): 'include/torch/csrc/profiler/*.h', 'include/torch/csrc/utils/*.h', 'include/torch/csrc/tensor/*.h', + 'include/torch/csrc/lazy/backend/*.h', 'include/torch/csrc/lazy/core/*.h', 'include/pybind11/*.h', 'include/pybind11/detail/*.h', diff --git a/test/ao/sparsity/test_composability.py b/test/ao/sparsity/test_composability.py new file mode 100644 index 00000000000000..b44c885507740e --- /dev/null +++ b/test/ao/sparsity/test_composability.py @@ -0,0 +1,304 @@ +# -*- coding: utf-8 -*- +# Owner(s): ["module: unknown"] + + +import logging + +import torch +import torch.ao.quantization as tq +from torch import nn +from torch.ao import sparsity +from torch.testing._internal.common_utils import TestCase + +logging.basicConfig( + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO +) + +sparse_defaults = { + "sparsity_level": 0.8, + "sparse_block_shape": (1, 4), + "zeros_per_block": 4, +} + +# This series of tests are to check the composability goals for sparsity and quantization. 
Namely +# that performing quantization and sparsity model manipulations in various orderings +# does not cause problems +class TestComposability(TestCase): + def _get_model_and_sparsifier_and_sparse_config(self, qconfig=None): + model = nn.Sequential( + nn.Linear(4, 4), # 0 + nn.ReLU(), + nn.Linear(4, 4), # 2 + nn.ReLU(), + tq.QuantStub(), + nn.Linear(4, 4), # 5 + nn.ReLU(), + tq.DeQuantStub(), + ) + if qconfig is None: + model[4].qconfig = tq.get_default_qconfig("fbgemm") + model[5].qconfig = tq.get_default_qconfig("fbgemm") + else: + model[4].qconfig = qconfig + model[5].qconfig = qconfig + + sparsifier = sparsity.WeightNormSparsifier(**sparse_defaults) + + sparse_config = [ + { + "module": model[5], + "sparsity_level": 0.7, + "sparse_block_shape": (1, 4), + "zeros_per_block": 4, + }, + model[0], + ] + return model, sparsifier, sparse_config + + def _squash_mask_calibrate_and_convert(self, model, sparsifier, input): + sparsifier.step() + sparsifier.squash_mask() + model(input) + tq.convert(model, inplace=True) + + def _calculate_sparsity(self, tensor): + return ((tensor == 0).sum() / tensor.numel()).item() + + # This test checks whether performing quantization prepare before sparse prepare + # causes any issues and verifies that the correct observers are inserted and that + # the quantized model works as expected + def test_q_prep_before_s_prep(self): + ( + mod, + sparsifier, + sparse_config, + ) = self._get_model_and_sparsifier_and_sparse_config() + + tq.prepare(mod, inplace=True) + sparsifier.prepare(mod, config=sparse_config) + + # check that correct modules had parametrizations added + self.assertTrue(hasattr(mod[0], "parametrizations")) + self.assertTrue(hasattr(mod[5], "parametrizations")) + # check that correct observers were inserted + self.assertTrue(hasattr(mod[5], "activation_post_process")) + + self._squash_mask_calibrate_and_convert( + mod, sparsifier, torch.randn(1, 4, 4, 4) + ) + + # check that final module is the expected quantized module and that the model runs + self.assertTrue(isinstance(mod[5], torch.nn.quantized.Linear)) + self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4])) + + # This test checks whether performing sparsity prepare before quantization prepare + # causes any issues. In particular, previous quantization flow was unable to match + # the post sparse prepare module names (adding parametrizations changes the module class names) + # which would result in those parametrized modules not being quantized. This test verifies that + # the fix for this was successful. 
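+    # Roughly, the ordering exercised below is (all names defined in this file;
+    # `mod` is the small nn.Sequential built by the helper above):
+    #
+    #     sparsifier.prepare(mod, config=sparse_config)   # add weight parametrizations
+    #     tq.prepare(mod, inplace=True)                    # insert observers
+    #     sparsifier.step(); sparsifier.squash_mask()      # apply and fold the sparse mask
+    #     mod(torch.randn(1, 4, 4, 4))                     # calibrate observers
+    #     tq.convert(mod, inplace=True)                    # swap in quantized modules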
+    def test_s_prep_before_q_prep(self):
+        (
+            mod,
+            sparsifier,
+            sparse_config,
+        ) = self._get_model_and_sparsifier_and_sparse_config()
+
+        sparsifier.prepare(mod, config=sparse_config)
+        tq.prepare(mod, inplace=True)
+
+        # check that correct modules had parametrizations added and
+        # that none were lost during prepare
+        self.assertTrue(hasattr(mod[0], "parametrizations"))
+        self.assertTrue(hasattr(mod[5], "parametrizations"))
+
+        # check that correct observers were inserted and that matching
+        # occurred successfully
+        self.assertTrue(hasattr(mod[5], "activation_post_process"))
+
+        self._squash_mask_calibrate_and_convert(
+            mod, sparsifier, torch.randn(1, 4, 4, 4)
+        )
+
+        # check that final module is the expected quantized module and that the model runs
+        self.assertTrue(isinstance(mod[5], torch.nn.quantized.Linear))
+        self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4]))
+
+    # if the sparsified modules have not undergone the final squash mask operation, it's possible
+    # that the problem outlined in test_s_prep_before_q_prep would occur. This test verifies
+    # both that the fix to the convert flow avoids this issue and that the resulting quantized
+    # module uses the sparse version of the weight value.
+    def test_convert_without_squash_mask(self):
+        (
+            mod,
+            sparsifier,
+            sparse_config,
+        ) = self._get_model_and_sparsifier_and_sparse_config()
+
+        sparsifier.prepare(mod, config=sparse_config)
+        tq.prepare(mod, inplace=True)
+
+        # check that correct modules had parametrizations added and
+        # that none were lost during prepare
+        self.assertTrue(hasattr(mod[0], "parametrizations"))
+        self.assertTrue(hasattr(mod[5], "parametrizations"))
+
+        # check that correct observers were inserted and that matching
+        # occurred successfully
+        self.assertTrue(hasattr(mod[5], "activation_post_process"))
+        sparsifier.step()
+        sparsity_level = self._calculate_sparsity(mod[5].weight)
+        mod(torch.randn(1, 4, 4, 4))
+        tq.convert(mod, inplace=True)
+
+        # check that final module is the expected quantized module and that the model runs
+        self.assertTrue(isinstance(mod[5], torch.nn.quantized.Linear))
+        self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4]))
+
+        # check that module was actually sparsified
+        cur_sparsity = self._calculate_sparsity(mod[5]._weight_bias()[0])
+        self.assertGreaterAlmostEqual(cur_sparsity, sparsity_level)
+        self.assertGreaterAlmostEqual(
+            sparsity_level, sparse_config[0]["sparsity_level"]
+        )
+        self.assertGreaterAlmostEqual(cur_sparsity, sparse_config[0]["sparsity_level"])
+
+    # This tests whether performing sparse prepare before fusion causes any issues. The
+    # worry was that the link created between the sparsifier and the modules that need to
+    # be sparsified would be broken.
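+    # For reference, the fusion step used below folds the Linear/ReLU pair at indices
+    # 5 and 6 into a single module, roughly:
+    #
+    #     tq.fuse_modules(mod, [["5", "6"]], inplace=True)
+    #     # mod[5] is now a fused Linear+ReLU; the original Linear is reachable as mod[5][0]
+    #
+    # so the references held by the sparsifier / sparse config must still resolve to the
+    # wrapped Linear afterwards.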
+    def test_s_prep_before_fusion(self):
+        (
+            mod,
+            sparsifier,
+            sparse_config,
+        ) = self._get_model_and_sparsifier_and_sparse_config()
+        sparsifier.prepare(mod, config=sparse_config)
+        tq.fuse_modules(mod, [["5", "6"]], inplace=True)
+        mod[5].qconfig = tq.get_default_qconfig("fbgemm")
+        tq.prepare(mod, inplace=True)
+
+        # check that correct modules had parametrizations added and
+        # that none were lost during prepare or fusion
+        self.assertTrue(hasattr(mod[0], "parametrizations"))
+        self.assertTrue(hasattr(mod[5][0], "parametrizations"))
+
+        # check that correct observers were inserted and that matching
+        # occurred successfully
+        self.assertTrue(hasattr(mod[5], "activation_post_process"))
+        self._squash_mask_calibrate_and_convert(
+            mod, sparsifier, torch.randn(1, 4, 4, 4)
+        )
+
+        # check that final module is the expected quantized module and that the model runs
+        self.assertTrue(isinstance(mod[5], torch.nn.intrinsic.quantized.LinearReLU))
+        self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4]))
+
+    # This tests whether performing fusion before sparse prepare causes any issues. The
+    # main worry was that the links to the modules in the sparse config would be broken by fusion.
+    def test_fusion_before_s_prep(self):
+        (
+            mod,
+            sparsifier,
+            sparse_config,
+        ) = self._get_model_and_sparsifier_and_sparse_config()
+        tq.fuse_modules(mod, [["5", "6"]], inplace=True)
+        sparsifier.prepare(mod, config=sparse_config)
+        mod[5].qconfig = tq.get_default_qconfig("fbgemm")
+        tq.prepare(mod, inplace=True)
+
+        # check that correct modules had parametrizations added and
+        # that none were lost during prepare
+        self.assertTrue(hasattr(mod[0], "parametrizations"))
+        self.assertTrue(hasattr(mod[5][0], "parametrizations"))
+
+        # check that correct observers were inserted and that matching
+        # occurred successfully
+        self.assertTrue(hasattr(mod[5], "activation_post_process"))
+        sparsifier.step()
+        sparsity_level = self._calculate_sparsity(mod[5][0].weight)
+        mod(torch.randn(1, 4, 4, 4))
+        tq.convert(mod, inplace=True)
+
+        # check that final module is the expected quantized module and that the model runs
+        self.assertTrue(isinstance(mod[5], torch.nn.intrinsic.quantized.LinearReLU))
+        self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4]))
+
+        # check that module was actually sparsified
+        cur_sparsity = self._calculate_sparsity(mod[5]._weight_bias()[0])
+        self.assertGreaterAlmostEqual(cur_sparsity, sparsity_level)
+        self.assertGreaterAlmostEqual(
+            sparsity_level, sparse_config[0]["sparsity_level"]
+        )
+        self.assertGreaterAlmostEqual(cur_sparsity, sparse_config[0]["sparsity_level"])
+
+    # This tests whether performing sparse prepare before qat prepare causes issues.
+    # The primary worries were that qat_prep wouldn't recognize the parametrized
+    # modules and that the convert step for qat would remove the parametrizations
+    # from the modules.
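+    # Roughly, the QAT ordering exercised below is:
+    #
+    #     sparsifier.prepare(mod, config=sparse_config)   # parametrize weights
+    #     tq.prepare_qat(mod, inplace=True)               # swap in qat modules (torch.nn.qat.Linear)
+    #     ...calibrate...
+    #     tq.convert(mod, inplace=True)                   # produce torch.nn.quantized.Linear
+    #
+    # and the parametrizations must survive both the qat module swap and the convert step.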
+    def test_s_prep_before_qat_prep(self):
+        (
+            mod,
+            sparsifier,
+            sparse_config,
+        ) = self._get_model_and_sparsifier_and_sparse_config(
+            tq.get_default_qat_qconfig("fbgemm")
+        )
+        sparsifier.prepare(mod, config=sparse_config)
+        tq.prepare_qat(mod, inplace=True)
+        self.assertTrue(hasattr(mod[0], "parametrizations"))
+        self.assertTrue(hasattr(mod[5], "parametrizations"))
+
+        # check that correct observers were inserted and that matching
+        # occurred successfully
+        self.assertTrue(hasattr(mod[5], "activation_post_process"))
+        self.assertTrue(isinstance(mod[5], torch.nn.qat.Linear))
+        self._squash_mask_calibrate_and_convert(
+            mod, sparsifier, torch.randn(1, 4, 4, 4)
+        )
+        # check that final module is the expected quantized module and that the model runs
+        self.assertTrue(isinstance(mod[5], torch.nn.quantized.Linear))
+        self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4]))
+
+        # check that module was actually sparsified
+        cur_sparsity = self._calculate_sparsity(mod[5]._weight_bias()[0])
+        self.assertGreaterAlmostEqual(cur_sparsity, sparse_config[0]["sparsity_level"])
+
+    # This tests whether performing qat prepare before sparse prepare causes issues.
+    def test_qat_prep_before_s_prep(self):
+        mod, sparsifier, _ = self._get_model_and_sparsifier_and_sparse_config(
+            tq.get_default_qat_qconfig("fbgemm")
+        )
+        tq.prepare_qat(mod, inplace=True)
+
+        # need to set up sparse_config on new modules
+        sparse_config = [
+            {
+                "module": mod[5],
+                "sparsity_level": 0.7,
+                "sparse_block_shape": (1, 4),
+                "zeros_per_block": 4,
+            },
+            mod[0],
+        ]
+        sparsifier.prepare(mod, config=sparse_config)
+
+        # check that correct modules had parametrizations added and
+        # that none were lost during qat prepare
+        self.assertTrue(hasattr(mod[0], "parametrizations"))
+        self.assertTrue(hasattr(mod[5], "parametrizations"))
+
+        # check that correct observers were inserted and that matching
+        # occurred successfully
+        self.assertTrue(hasattr(mod[5], "activation_post_process"))
+        self.assertTrue(isinstance(mod[5], torch.nn.qat.Linear))
+
+        self._squash_mask_calibrate_and_convert(
+            mod, sparsifier, torch.randn(1, 4, 4, 4)
+        )
+
+        # check that final module is the expected quantized module and that the model runs
+        self.assertTrue(isinstance(mod[5], torch.nn.quantized.Linear))
+        self.assertEqual(mod(torch.randn(1, 4, 4, 4)).shape, torch.Size([1, 4, 4, 4]))
+
+        # check that module was actually sparsified
+        cur_sparsity = self._calculate_sparsity(mod[5]._weight_bias()[0])
+        self.assertGreaterAlmostEqual(cur_sparsity, sparse_config[0]["sparsity_level"])
diff --git a/test/ao/sparsity/test_kernels.py b/test/ao/sparsity/test_kernels.py
index 8deec46b4188c1..04a93434599974 100644
--- a/test/ao/sparsity/test_kernels.py
+++ b/test/ao/sparsity/test_kernels.py
@@ -22,6 +22,7 @@
     override_qengines,
     qengine_is_qnnpack,
     qengine_is_fbgemm,
+    qengine_is_onednn,
 )
 
 # TODO: Once more test files are created, move the contents to a ao folder.
@@ -48,6 +49,9 @@ def test_sparse_qlinear(self):
         # to other higher priority works.
if qengine_is_qnnpack() and not (row_block_size == 1 and col_block_size == 4): return + # ONEDNN does not support this yet + if qengine_is_onednn(): + return dense_prepack = torch.ops.quantized.linear_prepack dense_qlinear = torch.ops.quantized.linear @@ -215,6 +219,10 @@ def test_sparse_qlinear(self): Y_hat = sqmodel(X_fp32) self.assertEqual(Y_ref, Y_hat) + # ONEDNN does not support this yet + elif qengine_is_onednn(): + return + row_block_size, col_block_size = sqmodel.linear._packed_params._weight_bias()[2:] assert row_block_size == 1 and col_block_size == 4 diff --git a/test/autograd/test_functional.py b/test/autograd/test_functional.py new file mode 100644 index 00000000000000..3b21be748d8d4e --- /dev/null +++ b/test/autograd/test_functional.py @@ -0,0 +1,1409 @@ +# Owner(s): ["module: autograd"] + +import types +import unittest +import warnings + +import torch +import torch.autograd.functional as autogradF + +from torch.testing._internal.common_cuda import TEST_CUDA +from torch.testing._internal.common_utils import ( + TestCase, run_tests, subtest, gradcheck, gradgradcheck, parametrize, instantiate_parametrized_tests) +from torch.testing._internal.logging_tensor import LoggingTensor + +# Utilities for parametrizing the tensor constructors used in autograd tests +# +# TODO: maybe move somewhere so other tests can also use +# +# NB: Not all factory functions included. A complete(?) list can be found here: +# https://pytorch.org/cppdocs/notes/tensor_creation.html +base_ctors_dict = { + "ones": torch.ones, + "zeros": torch.zeros, + "randn": torch.randn, + "rand": torch.rand, + "tensor": torch.tensor, +} +base_ctors = types.SimpleNamespace(**base_ctors_dict) + +def wrap_with_logging_tensor(ctor): + return lambda *args, **kwargs: LoggingTensor(ctor(*args, **kwargs)) + +logging_tensor_ctors_dict = {k: wrap_with_logging_tensor(ctor) for (k, ctor) in base_ctors_dict.items()} +logging_tensor_ctors = types.SimpleNamespace(**logging_tensor_ctors_dict) + +base_and_logging_tensor = parametrize("ctors", [subtest(base_ctors, name="base_tensor"), + subtest(logging_tensor_ctors, name="logging_tensor")]) + +FIXME_base_and_xfail_logging_tensor = parametrize("ctors", [subtest(base_ctors, name="base_tensor"), + subtest(logging_tensor_ctors, name="logging_tensor", + decorators=[unittest.expectedFailure])]) + +# NB: This is equivalent to having both @parmetrize("vectorized", [True, False]) and +# FIXME_base_and_xfail_logging_tensor, except the non-vectorized logging_tensor case is +# actually expected to succeed +FIXME_xfail_vectorized_logging_tensor = ( + parametrize("vectorize,ctors", [subtest((True, base_ctors), name="vectorized_base_tensor"), + subtest((False, base_ctors), name="base_tensor"), + subtest((True, logging_tensor_ctors), name="vectorized_logging_tensor", + decorators=[unittest.expectedFailure]), + subtest((False, logging_tensor_ctors), name="logging_tensor")])) + + +class TestAutogradFunctional(TestCase): + def _assert_same_struct(self, res, base): + # base and res should be Tensors or tuple of Tensors with the same size + if isinstance(base, torch.Tensor): + self.assertTrue(isinstance(res, torch.Tensor)) + self.assertEqual(base.size(), res.size()) + elif isinstance(base, tuple): + self.assertTrue(isinstance(res, tuple)) + self.assertEqual(len(base), len(res)) + for el_base, el_res in zip(base, res): + self.assertTrue(isinstance(el_base, torch.Tensor)) + self.assertTrue(isinstance(el_res, torch.Tensor)) + self.assertEqual(el_base.size(), el_res.size()) + else: + # Wrong base + raise 
RuntimeError("The base given to `_assert_same_struct` doesn't have" + " the right structure.") + + def _assert_interleaved_struct(self, res, base1, base2): + # base1 and base2 can be Tensors or tuples of Tensors. + # If they are tuples, res should be a tuple as well. + # The indexing works as follows for base1, base2 being + # - tuple, tuple: res[i][j][k][l] = (base1[i][k], base2[j][l]) + # - tuple, Tensor: res[i][k][l] = (base1[i][k], base2[l]) + # - Tensor, tuple: res[i][j][l] = (base1[i], base2[j][l]) + # - Tensor, Tensor: res[k][l] = (base1[k], base2[l]) + if isinstance(base1, torch.Tensor) and isinstance(base2, torch.Tensor): + self.assertTrue(isinstance(res, torch.Tensor)) + self.assertEqual(res.size(), base1.size() + base2.size()) + elif isinstance(base1, tuple) and isinstance(base2, torch.Tensor): + self.assertTrue(isinstance(res, tuple)) + self.assertEqual(len(res), len(base1)) + for el_res, el_base1 in zip(res, base1): + self.assertTrue(isinstance(el_res, torch.Tensor)) + self.assertTrue(isinstance(el_base1, torch.Tensor)) + self.assertEqual(el_res.size(), el_base1.size() + base2.size()) + elif isinstance(base1, torch.Tensor) and isinstance(base2, tuple): + self.assertTrue(isinstance(res, tuple)) + self.assertEqual(len(res), len(base2)) + for el_res, el_base2 in zip(res, base2): + self.assertTrue(isinstance(el_res, torch.Tensor)) + self.assertTrue(isinstance(el_base2, torch.Tensor)) + self.assertEqual(el_res.size(), base1.size() + el_base2.size()) + elif isinstance(base1, tuple) and isinstance(base2, tuple): + self.assertTrue(isinstance(res, tuple)) + self.assertEqual(len(res), len(base1)) + for el_res, el_base1 in zip(res, base1): + self.assertTrue(isinstance(el_res, tuple)) + self.assertEqual(len(res), len(base2)) + for el_el_res, el_base2 in zip(el_res, base2): + self.assertTrue(isinstance(el_el_res, torch.Tensor)) + self.assertTrue(isinstance(el_base2, torch.Tensor)) + self.assertEqual(el_el_res.size(), el_base1.size() + el_base2.size()) + else: + # Wrong bases + raise RuntimeError("The bases given to `_assert_interleaved_struct` don't have" + " the right structure.") + + @base_and_logging_tensor + def test_vjp_err_check(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3) + + def bar(a): + return 3 * a.narrow(0, 0, 3), "bar" + + inp = ctors.rand(4) + v = ctors.ones(3) + with self.assertRaisesRegex(TypeError, "The inputs given to vjp must be either a Tensor"): + res = autogradF.vjp(foo, (inp, 2), v) + + with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to vjp must"): + res = autogradF.vjp(bar, inp, v) + + with self.assertRaisesRegex(RuntimeError, "The vector v can only be None if the user-provided function returns"): + res = autogradF.vjp(foo, inp) + + with self.assertRaisesRegex(RuntimeError, "The given v should contain a single Tensor."): + res = autogradF.vjp(foo, inp, (torch.ones_like(inp), torch.ones_like(inp))) + + with self.assertRaisesRegex(RuntimeError, "v has invalid size: should be torch.Size"): + res = autogradF.vjp(foo, inp, v[:2]) + + res = autogradF.vjp(foo, inp, v)[1] + self._assert_same_struct(res, inp) + + @base_and_logging_tensor + def test_vjp_err_check_strict(self, ctors): + def foo(a): + return a.detach() + + def bar(a): + # Make a non-leaf Tensor that requires_grad but that is not connected to the input + return a.long().float().requires_grad_().clone() + + inp = ctors.rand(4) + v = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): 
+ res = autogradF.vjp(foo, inp, v, strict=True) + res = autogradF.vjp(foo, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): + res = autogradF.vjp(bar, inp, v, strict=True) + res = autogradF.vjp(bar, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) + + # The Jacobian does not depend on the input + def foo(a): + return a.clone() + + inp.requires_grad_() + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function is independent of input 0."): + res = autogradF.vjp(foo, inp, v, create_graph=True, strict=True) + res = autogradF.vjp(foo, inp, v, create_graph=True, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1], v) + + @base_and_logging_tensor + def test_vjp_no_grad(self, ctors): + def reducer(x): + return x.sum(dim=1) + inputs = ctors.rand(4, 4) + v = ctors.ones(4) + with torch.no_grad(): + res = autogradF.vjp(reducer, inputs, v) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + inputs.requires_grad_() + v.requires_grad_() + with torch.no_grad(): + res = autogradF.vjp(reducer, inputs, v, create_graph=True) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + @base_and_logging_tensor + def test_vjp_output(self, ctors): + def reducer(x): + return x.sum(dim=1) + inputs = ctors.rand(4, 4) + v = ctors.ones(4) + res = autogradF.vjp(reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + + def adder(x, y): + return 2 * x + 3 * y + + inputs = (ctors.rand(2), ctors.rand(2)) + v = ctors.ones(2) + out, vjp_val = autogradF.vjp(adder, inputs, v) + self._assert_same_struct(vjp_val, inputs) + self.assertIsNone(out.grad_fn) + self.assertIsNone(vjp_val[0].grad_fn) + self.assertIsNone(vjp_val[1].grad_fn) + + def adder(x, y): + return 2 * x + 3 * y, x + y + + inputs = (ctors.rand(2), ctors.rand(2)) + v = (ctors.tensor([1., 0.]), ctors.tensor([1., 0.])) + out, vjp_val = autogradF.vjp(adder, inputs, v) + self._assert_same_struct(vjp_val, inputs) + self.assertIsNone(out[0].grad_fn) + self.assertIsNone(out[1].grad_fn) + self.assertIsNone(vjp_val[0].grad_fn) + self.assertIsNone(vjp_val[1].grad_fn) + + @base_and_logging_tensor + def test_vjp_scalar(self, ctors): + def reducer(x): + return x.sum() + inputs = ctors.rand(4, 4) + v = ctors.ones([]) + res = autogradF.vjp(reducer, inputs, v) + self._assert_same_struct(res[0], v) + self._assert_same_struct(res[1], inputs) + + res = autogradF.vjp(reducer, inputs) + self._assert_same_struct(res[0], v) + self._assert_same_struct(res[1], inputs) + + def expander(x): + return x.unsqueeze(0).repeat(4) + inputs = ctors.rand([]) + v = ctors.ones(4) + res = autogradF.vjp(expander, inputs, v) + self._assert_same_struct(res[0], v) + self._assert_same_struct(res[1], inputs) + + @FIXME_base_and_xfail_logging_tensor + def test_vjp_create_graph(self, ctors): + def reducer(x): + return x.sum(dim=1) + inputs = ctors.rand(2, 2, dtype=torch.double) + v = ctors.ones(2, dtype=torch.double) + + inputs.requires_grad_() + v.requires_grad_() + res = autogradF.vjp(reducer, inputs, v, create_graph=True) + self._assert_same_struct(res[1], inputs) + self.assertIsNotNone(res[0].grad_fn) + 
self.assertIsNotNone(res[1].grad_fn) + + gradcheck(lambda inp, v: autogradF.vjp(reducer, inputs, v, create_graph=True), (inputs, v)) + gradgradcheck(lambda inp, v: autogradF.vjp(reducer, inputs, v, create_graph=True), (inputs, v)) + + def adder(x, y): + return 2 * x + 3 * y, x * y + + inputs = (ctors.rand(2, dtype=torch.double, requires_grad=True), + ctors.rand(2, dtype=torch.double, requires_grad=True)) + v = (ctors.tensor([1., 0.], dtype=torch.double, requires_grad=True), + ctors.tensor([1., 0.], dtype=torch.double, requires_grad=True)) + + gradcheck(lambda *args: autogradF.vjp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) + gradgradcheck(lambda *args: autogradF.vjp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) + + def foo(*args): + x, y = args[:2] + v = args[2:] + + x = x.cos() + val, grad = autogradF.vjp(adder, (x, y), v, create_graph=True) + + return val[0].exp() + val[1].exp() + grad[0].exp() + grad[1].exp() + x.exp() + y.exp() + + gradcheck(foo, inputs + v) + gradgradcheck(foo, inputs + v) + + @base_and_logging_tensor + def test_jvp_err_check(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3) + + def bar(a): + return 3 * a.narrow(0, 0, 3), "bar" + + inp = ctors.rand(4) + v = ctors.rand(4) + with self.assertRaisesRegex(TypeError, "The inputs given to jvp must be either a Tensor"): + res = autogradF.jvp(foo, (inp, 2), v) + + with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to jvp must"): + res = autogradF.jvp(bar, inp, v) + + with self.assertRaisesRegex(RuntimeError, "The vector v can only be None if the input to the user-provided function"): + res = autogradF.jvp(foo, inp) + + with self.assertRaisesRegex(RuntimeError, "The given v should contain a single Tensor."): + res = autogradF.jvp(foo, inp, (v, v)) + + with self.assertRaisesRegex(RuntimeError, "v has invalid size: should be torch.Size"): + res = autogradF.jvp(foo, inp, v[:2]) + + res = autogradF.jvp(foo, inp, v)[1] + self._assert_same_struct(res, foo(inp)) + + @base_and_logging_tensor + def test_jvp_err_check_strict(self, ctors): + def foo(a): + return a.detach() + + def bar(a): + # Make a non-leaf Tensor that requires_grad but that is not connected to the input + return a.long().float().requires_grad_().clone() + + inp = ctors.rand(4) + v = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): + res = autogradF.jvp(foo, inp, v, strict=True) + res = autogradF.jvp(foo, inp, v, strict=False) + self._assert_same_struct(res[1], res[0]) + self.assertEqual(res[1].abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): + res = autogradF.jvp(bar, inp, v, strict=True) + res = autogradF.jvp(bar, inp, v, strict=False) + self._assert_same_struct(res[1], res[0]) + self.assertEqual(res[1].abs().sum(), 0.) 
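+        # (with strict=False the error above is suppressed and the jvp is returned
+        # as zeros, which is why its absolute sum is checked against 0.)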
+ + # The Jacobian does not depend on the input + def foo(a): + return a.clone() + + inp.requires_grad_() + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function is independent of input 0."): + res = autogradF.jvp(foo, inp, v, create_graph=True, strict=True) + res = autogradF.jvp(foo, inp, v, create_graph=True, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1], v) + + @base_and_logging_tensor + def test_jvp_no_grad(self, ctors): + def reducer(x): + return x.sum(dim=1) + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + with torch.no_grad(): + res = autogradF.jvp(reducer, inputs, v) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + inputs.requires_grad_() + v.requires_grad_() + with torch.no_grad(): + res = autogradF.jvp(reducer, inputs, v, create_graph=True) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + @base_and_logging_tensor + def test_jvp_output(self, ctors): + def reducer(x): + return x.sum(dim=1) + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + res = autogradF.jvp(reducer, inputs, v) + self._assert_same_struct(res[1], res[0]) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + + def adder(x, y): + return 2 * x + 3 * y + + inputs = (ctors.rand(2), ctors.rand(2)) + v = (ctors.ones(2), ctors.ones(2)) + out, jvp_val = autogradF.jvp(adder, inputs, v) + self._assert_same_struct(jvp_val, out) + self.assertIsNone(out.grad_fn) + self.assertIsNone(jvp_val[0].grad_fn) + self.assertIsNone(jvp_val[1].grad_fn) + + def adder(x, y): + return 2 * x + 3 * y, x + y + + inputs = (ctors.rand(2), ctors.rand(2)) + v = (ctors.tensor([1., 0.]), ctors.tensor([1., 0.])) + out, jvp_val = autogradF.jvp(adder, inputs, v) + self._assert_same_struct(jvp_val, out) + self.assertIsNone(out[0].grad_fn) + self.assertIsNone(out[1].grad_fn) + self.assertIsNone(jvp_val[0].grad_fn) + self.assertIsNone(jvp_val[1].grad_fn) + + @base_and_logging_tensor + def test_jvp_scalar(self, ctors): + def reducer(x): + return x.sum() + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + res = autogradF.jvp(reducer, inputs, v) + self._assert_same_struct(res[0], ctors.zeros([])) + self._assert_same_struct(res[1], res[0]) + + def expander(x): + return x.unsqueeze(0).repeat(4) + inputs = ctors.rand([]) + v = ctors.ones([]) + res = autogradF.jvp(expander, inputs, v) + self._assert_same_struct(res[0], ctors.zeros(4)) + self._assert_same_struct(res[1], res[0]) + + res = autogradF.jvp(expander, inputs) + self._assert_same_struct(res[0], ctors.zeros(4)) + self._assert_same_struct(res[1], res[0]) + + @FIXME_base_and_xfail_logging_tensor + def test_jvp_create_graph(self, ctors): + def reducer(x): + return x.sum(dim=1) + inputs = ctors.rand(2, 2, dtype=torch.double) + v = ctors.ones(2, 2, dtype=torch.double) + + inputs.requires_grad_() + v.requires_grad_() + res = autogradF.jvp(reducer, inputs, v, create_graph=True) + self._assert_same_struct(res[1], res[0]) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + + gradcheck(lambda inp, v: autogradF.jvp(reducer, inp, v, create_graph=True), (inputs, v)) + gradgradcheck(lambda inp, v: autogradF.jvp(reducer, inp, v, create_graph=True), (inputs, v)) + + def adder(x, y): + return 2 * x + 3 * y, x * y + + inputs = (ctors.rand(2, dtype=torch.double, requires_grad=True), + ctors.rand(2, dtype=torch.double, requires_grad=True)) + v = 
(ctors.tensor([1., 0.], dtype=torch.double, requires_grad=True), + ctors.tensor([1., 0.], dtype=torch.double, requires_grad=True)) + + gradcheck(lambda *args: autogradF.jvp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) + gradgradcheck(lambda *args: autogradF.jvp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) + + def foo(*args): + x, y = args[:2] + v = args[2:] + + x = x.cos() + val, grad = autogradF.jvp(adder, (x, y), v, create_graph=True) + + return val[0].exp() + val[1].exp() + grad[0].exp() + grad[1].exp() + x.exp() + y.exp() + + gradcheck(foo, inputs + v) + gradgradcheck(foo, inputs + v) + + def _test_construct_standard_basis_for(self, inputs): + numels = tuple(tensor.numel() for tensor in inputs) + results = autogradF._construct_standard_basis_for(inputs, numels) + for result, inp in zip(results, inputs): + self.assertEqual(result.dtype, inp.dtype) + self.assertEqual(result.device, inp.device) + results = torch.cat([result.to(device='cpu', dtype=torch.float) + for result in results], dim=1) + expected = torch.eye(results[0].shape[0], dtype=torch.float) + self.assertEqual(results, expected) + + @base_and_logging_tensor + def test_construct_standard_basis_for(self, ctors): + test_cases = [ + (ctors.randn(2, 3),), + (ctors.randn(1),), + (ctors.randn([]),), + (ctors.randn(1), ctors.randn([]), ctors.randn([])), + (ctors.randn(2), ctors.randn(3), ctors.randn([])), + (ctors.randn(2), ctors.randn([]), ctors.randn(3)), + (ctors.randn(2, 3), ctors.randn(3), ctors.randn(3, 4, 2)), + (ctors.randn(2, dtype=torch.float64), ctors.randn(3, dtype=torch.float32)), + ] + + for inputs in test_cases: + self._test_construct_standard_basis_for(inputs) + + @unittest.skipIf(not TEST_CUDA, "test requires CUDA") + @base_and_logging_tensor + def test_construct_standard_basis_for_cuda(self, ctors): + test_cases = [ + (ctors.randn(2), ctors.randn(3, device='cuda')), + (ctors.randn(3, device='cuda'), ctors.randn(2)), + ] + + for inputs in test_cases: + self._test_construct_standard_basis_for(inputs) + + def _test_vectorize_raises_no_warnings(self, api, ctors): + # vmap is an experimental prototype. When someone calls torch.vmap, + # it raises a python warning. This test checks that + # autogradF.{jacobian, hessian} don't raise that experimental prototype + # warning; it is not nice for a public-facing API to raise a warning + # no matter how it is called. 
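+        # (warnings.catch_warnings(record=True) below collects every warning raised
+        # inside the block into `wa`, so `len(wa) == 0` is the "no experimental vmap
+        # warning leaked to the user" assertion.)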
+ def foo(a): + return (a ** 2).sum() + + x = ctors.randn(3) + with warnings.catch_warnings(record=True) as wa: + result = api(foo, x, vectorize=True) + self.assertEqual(len(wa), 0) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_raises_no_warnings(self, ctors): + return self._test_vectorize_raises_no_warnings(autogradF.jacobian, ctors) + + @FIXME_base_and_xfail_logging_tensor + def test_hessian_vectorize_raises_no_warnings(self, ctors): + return self._test_vectorize_raises_no_warnings(autogradF.hessian, ctors) + + @FIXME_xfail_vectorized_logging_tensor + def test_jacobian_err_check(self, vectorize, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3) + + def bar(a): + return 3 * a.narrow(0, 0, 3), "bar" + + inp = ctors.rand(4) + with self.assertRaisesRegex(TypeError, "The inputs given to jacobian must be either a Tensor"): + res = autogradF.jacobian(foo, (inp, 2), vectorize=vectorize) + + with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to jacobian must"): + res = autogradF.jacobian(bar, inp, vectorize=vectorize) + + res = autogradF.jacobian(foo, inp, vectorize=vectorize) + self._assert_interleaved_struct(res, foo(inp), inp) + + def foo(a, b): + return b, 3 * a.narrow(0, 0, 3) + + inp = (ctors.rand(4), ctors.rand(5)) + + res = autogradF.jacobian(foo, inp, vectorize=vectorize) + self._assert_interleaved_struct(res, foo(*inp), inp) + + @base_and_logging_tensor + def test_jacobian_err_check_strict(self, ctors): + def foo(a): + return a.detach() + + def bar(a): + # Make a non-leaf Tensor that requires_grad but that is not connected to the input + return a.long().float().requires_grad_().clone() + + inp = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): + res = autogradF.jacobian(foo, inp, strict=True) + res = autogradF.jacobian(foo, inp, strict=False) + self._assert_interleaved_struct(res, foo(inp), inp) + self.assertEqual(res.abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function is independent of input 0."): + res = autogradF.jacobian(bar, inp, strict=True) + res = autogradF.jacobian(bar, inp, strict=False) + self._assert_interleaved_struct(res, foo(inp), inp) + self.assertEqual(res.abs().sum(), 0.) 
+ + # The Jacobian does not depend on the input + def foo(a): + return a.clone() + + inp.requires_grad_() + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function is independent of input 0."): + res = autogradF.jacobian(foo, inp, create_graph=True, strict=True) + res = autogradF.jacobian(foo, inp, create_graph=True, strict=False) + self._assert_interleaved_struct(res, inp, inp) + self.assertEqual(res, torch.eye(4)) + + @base_and_logging_tensor + def test_jacobian_err_check_strict_vectorize(self, ctors): + def foo(x): + return x + + inp = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "not supported together"): + res = autogradF.jacobian(foo, inp, strict=True, vectorize=True) + + @base_and_logging_tensor + def test_jacobian_no_grad(self, ctors): + def exp_reducer(x): + return x.exp().sum(dim=1) + + inputs = ctors.rand(4, 4) + with torch.no_grad(): + res = autogradF.jacobian(exp_reducer, inputs) + self.assertIsNone(res.grad_fn) + self.assertNotEqual(res, ctors.zeros(4, 4)) + + with torch.no_grad(): + res = autogradF.jacobian(exp_reducer, inputs, create_graph=True) + self.assertIsNotNone(res.grad_fn) + self.assertNotEqual(res, ctors.zeros(4, 4)) + + @FIXME_xfail_vectorized_logging_tensor + def test_jacobian_output(self, vectorize, ctors): + def exp_reducer(x): + return x.exp().sum(dim=1) + + inputs = ctors.rand(4, 4) + res = autogradF.jacobian(exp_reducer, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, exp_reducer(inputs), inputs) + self.assertIsNone(res.grad_fn) + + def identity(x): + return x.clone() + + inputs = ctors.rand(4) + res = autogradF.jacobian(identity, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, identity(inputs), inputs) + self.assertIsNone(res.grad_fn) + self.assertEqual(res, torch.eye(4)) + + def add_exp_reducer(x, y): + return (x + y.exp()).sum(dim=1) + + inputs = (ctors.rand(4, 4), ctors.rand(4, 4)) + res = autogradF.jacobian(add_exp_reducer, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, add_exp_reducer(*inputs), inputs) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + + @FIXME_xfail_vectorized_logging_tensor + def test_jacobian_scalar(self, vectorize, ctors): + def reducer(x): + return x.sum() + inputs = ctors.rand(4, 4) + res = autogradF.jacobian(reducer, inputs, vectorize=vectorize) + self._assert_same_struct(res, inputs) + + def expander(x): + return x.unsqueeze(0).repeat(4) + inputs = ctors.rand([]) + res = autogradF.jacobian(expander, inputs, vectorize=vectorize) + self._assert_same_struct(res, ctors.zeros(4)) + + @parametrize("vectorize", [True, False]) + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_create_graph(self, vectorize, ctors): + def exp_reducer(x): + return x.exp().sum(dim=1) + + inputs = ctors.rand(4, 4, dtype=torch.double, requires_grad=True) + res = autogradF.jacobian(exp_reducer, inputs, create_graph=True, vectorize=vectorize) + self._assert_interleaved_struct(res, exp_reducer(inputs), inputs) + self.assertIsNotNone(res.grad_fn) + + gradcheck(lambda inp: autogradF.jacobian(exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) + gradgradcheck(lambda inp: autogradF.jacobian(exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) + + def add_exp_reducer(x, y): + return (x + y).exp().sum(dim=1) + + inputs = (ctors.rand(4, 4, dtype=torch.double, requires_grad=True), + ctors.rand(4, 4, dtype=torch.double, requires_grad=True)) + res = autogradF.jacobian(add_exp_reducer, inputs, create_graph=True, 
vectorize=vectorize) + self._assert_interleaved_struct(res, add_exp_reducer(*inputs), inputs) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + + gradcheck(lambda *inp: autogradF.jacobian(add_exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) + gradgradcheck(lambda *inp: autogradF.jacobian(add_exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) + + def foo(x, y): + x = x.cos() + val, jac = autogradF.jacobian(add_exp_reducer, (x, y), create_graph=True, vectorize=vectorize) + + res = val[0].exp().sum() + val[1].exp().sum() + jac[0].exp().sum() + res = res + jac[1].exp().sum() + x.exp().sum() + y.exp().sum() + return res + + gradcheck(foo, inputs) + gradgradcheck(foo, inputs) + + def _check_jacobian_vectorize_correctness(self, f, inputs, test_forward_ad=True): + expected = autogradF.jacobian(f, inputs, vectorize=False) + result_backward_mode = autogradF.jacobian(f, inputs, vectorize=True) + self.assertEqual(result_backward_mode, expected) + + if test_forward_ad: + result_forward_mode = autogradF.jacobian(f, inputs, strategy="forward-mode", vectorize=True) + self.assertEqual(result_forward_mode, expected) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_simple(self, ctors): + def f(x): + return 3 * x ** 2 + + x = ctors.randn(2, 3, 5) + self._check_jacobian_vectorize_correctness(f, x) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_multi_input(self, ctors): + def f(x, y): + return (x.cos() * x) @ y.sin() + + x = ctors.randn(2, 3) + y = ctors.randn(3, 5) + self._check_jacobian_vectorize_correctness(f, (x, y)) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_multi_input_multi_output(self, ctors): + def f(x, y): + return (x * x) @ y, x @ (x.sum(1) * y), y.sum() + + x = ctors.randn(5, 3) + y = ctors.randn(3, 5) + self._check_jacobian_vectorize_correctness(f, (x, y)) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_unrelated_outputs(self, ctors): + def f(x, y): + return x, y, x, y + + x = ctors.randn(2) + y = ctors.randn(3) + self._check_jacobian_vectorize_correctness(f, (x, y)) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_zero_dim(self, ctors): + # zero-dim output + def f(x, y): + return x.sum(), y.sum(), x * y + + x = ctors.randn(3) + y = ctors.randn(3) + self._check_jacobian_vectorize_correctness(f, (x, y)) + + # zero-dim input + def g(x): + return torch.stack([x, x, x]) + + x = ctors.randn([]) + self._check_jacobian_vectorize_correctness(g, x) + + # Mixed zero-dim input / zero-dim output + def h(x, y): + return y.sum(), x * y + + x = ctors.randn([]) + y = ctors.randn(1) + self._check_jacobian_vectorize_correctness(h, (x, y)) + + @unittest.skipIf(not TEST_CUDA, "test requires CUDA") + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_different_devices(self, ctors): + def f(x, y): + return x * y, (x * y).cuda() + + x = ctors.randn(3) + y = ctors.randn(3) + self._check_jacobian_vectorize_correctness(f, (x, y)) + + @FIXME_base_and_xfail_logging_tensor + def test_jacobian_vectorize_correctness_different_dtype(self, ctors): + def f(x, y): + return (x * y).float(), (x * y).double() + + x = ctors.randn(3) + y = ctors.randn(3) + # The Jacobian computed using forward AD has the dtype of the output + # but the Jacobian computed with reverse AD has dtype of input + self._check_jacobian_vectorize_correctness(f, (x, y), test_forward_ad=False) + + 
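+    # Rough sketch of the pattern shared by the *_vectorize_correctness helpers in
+    # this class: the vectorized (vmap-based) path must agree with the non-vectorized
+    # reference path, e.g. for the Hessian:
+    #
+    #     expected = autogradF.hessian(f, inputs, vectorize=False)
+    #     result = autogradF.hessian(f, inputs, vectorize=True)
+    #     torch.testing.assert_close(result, expected)  # illustrative; the tests use assertEqual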
def _check_hessian_vectorize_correctness(self, f, inputs): + expected = autogradF.hessian(f, inputs, vectorize=False) + result = autogradF.hessian(f, inputs, vectorize=True) + self.assertEqual(result, expected) + + result_forward_mode = autogradF.hessian(f, inputs, outer_jacobian_strategy="forward-mode", vectorize=True) + self.assertEqual(result_forward_mode, expected) + + @FIXME_base_and_xfail_logging_tensor + def test_hessian_vectorize_correctness_simple(self, ctors): + def f(x): + return (3 * x ** 2).sum() + + x = ctors.randn(2, 3, 5) + self._check_hessian_vectorize_correctness(f, x) + + @FIXME_base_and_xfail_logging_tensor + def test_hessian_vectorize_correctness_multi_input(self, ctors): + def f(x, y, z): + return ((x.relu() * x) @ y.sin() @ z).sum() + + x = ctors.randn(2, 3) + y = ctors.randn(3, 5) + z = ctors.randn(5, 5) + self._check_hessian_vectorize_correctness(f, (x, y, z)) + + @FIXME_base_and_xfail_logging_tensor + def test_hessian_vectorize_correctness_unrelated_outputs(self, ctors): + # output unrelated to one input + def f(x, y): + return (x ** 2).sum() + + x = ctors.randn(2) + y = ctors.randn(3) + self._check_hessian_vectorize_correctness(f, (x, y)) + + # output unrelated to all inputs + def f(x, y): + return ctors.ones([]) + + x = ctors.randn(2) + y = ctors.randn(3) + self._check_hessian_vectorize_correctness(f, (x, y)) + + @FIXME_xfail_vectorized_logging_tensor + def test_hessian_err_check(self, vectorize, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + def bar(a): + return 3 * a.narrow(0, 0, 3), "bar" + + def bar2(a): + return 3 * a.narrow(0, 0, 3) + + def bar3(a): + return 3 * a.narrow(0, 0, 3), 3 * a.narrow(0, 0, 3) + + inp = ctors.rand(4) + with self.assertRaisesRegex(TypeError, "The inputs given to hessian must be either a Tensor"): + res = autogradF.hessian(foo, (inp, 2), vectorize=vectorize) + + with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to hessian must"): + res = autogradF.hessian(bar, inp, vectorize=vectorize) + + err_msg_out = "The Tensor returned by the function given to hessian should contain a single element" + with self.assertRaisesRegex(RuntimeError, err_msg_out): + res = autogradF.hessian(bar2, inp, vectorize=vectorize) + + with self.assertRaisesRegex(RuntimeError, "The function given to hessian should return a single Tensor"): + res = autogradF.hessian(bar3, inp, vectorize=vectorize) + + res = autogradF.hessian(foo, inp, vectorize=vectorize) + self._assert_interleaved_struct(res, inp, inp) + + def foo(a, b): + return (3 * b.narrow(0, 0, 3) * a.narrow(0, 0, 3)).sum() + + inp = (ctors.rand(4), ctors.rand(5)) + + res = autogradF.hessian(foo, inp, vectorize=vectorize) + self._assert_interleaved_struct(res, inp, inp) + + @base_and_logging_tensor + def test_hessian_err_check_strict(self, ctors): + def foo(a): + return a.detach().sum() + + def bar(a): + # Make a non-leaf Tensor that requires_grad but that is not connected to the input + return a.long().float().requires_grad_().clone().sum() + + def bar2(a): + # A Linear function for which the jacobian is independent of the input + return (3 * a).sum() + + inp = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): + res = autogradF.hessian(foo, inp, strict=True) + res = autogradF.hessian(foo, inp, strict=False) + self._assert_interleaved_struct(res, inp, inp) + self.assertEqual(res.abs().sum(), 0.) 
+ + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0"): + res = autogradF.hessian(bar, inp, strict=True) + res = autogradF.hessian(bar, inp, strict=False) + self._assert_interleaved_struct(res, inp, inp) + self.assertEqual(res.abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0 is"): + res = autogradF.hessian(bar2, inp, strict=True) + res = autogradF.hessian(bar2, inp, strict=False) + self._assert_interleaved_struct(res, inp, inp) + self.assertEqual(res.abs().sum(), 0.) + + @base_and_logging_tensor + def test_hessian_err_check_strict_vectorize(self, ctors): + def foo(x): + return (x ** 3).sum() + + inp = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "not supported together"): + res = autogradF.hessian(foo, inp, strict=True, vectorize=True) + + @base_and_logging_tensor + def test_hessian_no_grad(self, ctors): + def pow_reducer(x): + return x.pow(3).sum() + + inputs = ctors.rand(2, 2) + with torch.no_grad(): + res = autogradF.hessian(pow_reducer, inputs) + self.assertIsNone(res[0][0].grad_fn) + self.assertIsNone(res[0][1].grad_fn) + self.assertIsNone(res[1][0].grad_fn) + self.assertIsNone(res[1][1].grad_fn) + self.assertNotEqual(res, ctors.zeros(2, 2, 2)) + + with torch.no_grad(): + res = autogradF.hessian(pow_reducer, inputs, create_graph=True) + self.assertIsNotNone(res[0][0].grad_fn) + self.assertIsNotNone(res[0][1].grad_fn) + self.assertIsNotNone(res[1][0].grad_fn) + self.assertIsNotNone(res[1][1].grad_fn) + self.assertNotEqual(res, ctors.zeros(2, 2, 2)) + + @FIXME_xfail_vectorized_logging_tensor + def test_hessian_output(self, vectorize, ctors): + def pow_reducer(x): + return x.pow(3).sum() + + inputs = ctors.rand(2, 2) + res = autogradF.hessian(pow_reducer, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, inputs, inputs) + self.assertIsNone(res.grad_fn) + + def add_pow_reducer(x, y): + return (x + y).pow(3).sum() + + inputs = (ctors.rand(2, 2), ctors.rand(2, 2)) + res = autogradF.hessian(add_pow_reducer, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, inputs, inputs) + self.assertIsNone(res[0][0].grad_fn) + self.assertIsNone(res[0][1].grad_fn) + self.assertIsNone(res[1][0].grad_fn) + self.assertIsNone(res[1][1].grad_fn) + + @parametrize("vectorize", [True, False]) + @base_and_logging_tensor + def test_hessian_scalar(self, vectorize, ctors): + def reducer(x): + return x.sum() + inputs = ctors.rand(4, 4) + res = autogradF.hessian(reducer, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, inputs, inputs) + + inputs = ctors.rand([]) + res = autogradF.hessian(reducer, inputs, vectorize=vectorize) + self._assert_same_struct(res, inputs) + + def bad_reducer(x): + return x.sum().view(1, 1, 1) + inputs = ctors.rand(4, 4) + res = autogradF.hessian(bad_reducer, inputs, vectorize=vectorize) + self._assert_interleaved_struct(res, inputs, inputs) + + @parametrize("vectorize", [True, False]) + @FIXME_base_and_xfail_logging_tensor + def test_hessian_create_graph(self, vectorize, ctors): + def pow_reducer(x): + return x.pow(3).sum() + + inputs = ctors.rand(2, 2, dtype=torch.double, requires_grad=True) + res = autogradF.hessian(pow_reducer, inputs, create_graph=True, vectorize=vectorize) + self._assert_interleaved_struct(res, inputs, inputs) + self.assertIsNotNone(res.grad_fn) + + gradcheck(lambda inp: autogradF.hessian(pow_reducer, inp, create_graph=True, vectorize=vectorize), inputs) + 
gradgradcheck(lambda inp: autogradF.hessian(pow_reducer, inp, create_graph=True, vectorize=vectorize), inputs) + + def add_pow_reducer(x, y): + return (x + y).pow(3).sum() + + inputs = (ctors.rand(2, 2, dtype=torch.double, requires_grad=True), + ctors.rand(2, 2, dtype=torch.double, requires_grad=True)) + res = autogradF.hessian(add_pow_reducer, inputs, create_graph=True, vectorize=vectorize) + self._assert_interleaved_struct(res, inputs, inputs) + self.assertIsNotNone(res[0][0].grad_fn) + self.assertIsNotNone(res[0][1].grad_fn) + self.assertIsNotNone(res[1][0].grad_fn) + self.assertIsNotNone(res[1][1].grad_fn) + + def flatten(inp): + return tuple(el_lvl2 for el_lvl1 in inp for el_lvl2 in el_lvl1) + + gradcheck(lambda *inp: flatten(autogradF.hessian(add_pow_reducer, inp, create_graph=True, vectorize=vectorize)), inputs) + gradgradcheck(lambda *inp: flatten(autogradF.hessian(add_pow_reducer, inp, create_graph=True, vectorize=vectorize)), inputs) + + def foo(x, y): + x = x.cos() + val, hess = autogradF.hessian(add_pow_reducer, (x, y), create_graph=True, vectorize=vectorize) + + res = val[0].cos().sum() + val[1].cos().sum() + hess[0].cos().sum() + res = res + hess[1].cos().sum() + x.cos().sum() + y.cos().sum() + return res + + gradcheck(foo, inputs) + gradgradcheck(foo, inputs) + + @base_and_logging_tensor + def test_vhp_err_check(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + def bar(a): + return 3 * a.narrow(0, 0, 3), "bar" + + def bar2(a): + return 3 * a.narrow(0, 0, 3) + + inp = ctors.rand(4) + v = ctors.rand(4) + with self.assertRaisesRegex(TypeError, "The inputs given to vhp must be either a Tensor"): + res = autogradF.vhp(foo, (inp, 2), v) + + with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to vhp must"): + res = autogradF.vhp(bar, inp, v) + + err_msg_out = "The Tensor returned by the function given to vhp should contain a single element" + with self.assertRaisesRegex(RuntimeError, err_msg_out): + res = autogradF.vhp(bar2, inp, v) + + with self.assertRaisesRegex(RuntimeError, "v has invalid size:"): + res = autogradF.vhp(foo, inp, ctors.rand(5)) + + with self.assertRaisesRegex(TypeError, "The v given to vhp must be either a Tensor or a tuple of Tensors"): + res = autogradF.vhp(foo, inp, (v, 2)) + + res = autogradF.vhp(foo, inp, v) + self._assert_same_struct(res[1], inp) + + def foo(a, b): + return (3 * b.narrow(0, 0, 3) * a.narrow(0, 0, 3)).sum() + + inp = (ctors.rand(4), ctors.rand(5)) + v = (ctors.rand(4), ctors.rand(5)) + + res = autogradF.vhp(foo, inp, v) + self._assert_same_struct(res[1], inp) + + @base_and_logging_tensor + def test_vhp_err_check_strict(self, ctors): + def foo(a): + return a.detach().sum() + + def bar(a): + # Make a non-leaf Tensor that requires_grad but that is not connected to the input + return a.long().float().requires_grad_().clone().sum() + + def bar2(a): + # A Linear function for which the jacobian is independent of the input + return (3 * a).sum() + + inp = ctors.rand(4) + v = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): + res = autogradF.vhp(foo, inp, v, strict=True) + res = autogradF.vhp(foo, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) 
+ + with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): + res = autogradF.vhp(bar, inp, v, strict=True) + res = autogradF.vhp(bar, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0 is"): + res = autogradF.vhp(bar2, inp, v, strict=True) + res = autogradF.vhp(bar2, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) + + @base_and_logging_tensor + def test_vhp_no_grad(self, ctors): + def reducer(x): + return x.exp().sum() + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + with torch.no_grad(): + res = autogradF.vhp(reducer, inputs, v) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + with torch.no_grad(): + res = autogradF.vhp(reducer, inputs, v, create_graph=True) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + @base_and_logging_tensor + def test_vhp_output(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + res = autogradF.vhp(foo, inputs, v) + self._assert_same_struct(res[1], inputs) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + + def bar(a, b): + return (a + 3 * b.narrow(0, 0, 3)).exp().sum() + + inputs = (ctors.rand(3), ctors.rand(4)) + v = (ctors.ones(3), ctors.ones(4)) + out, vhp_val = autogradF.vhp(bar, inputs, v) + self._assert_same_struct(vhp_val, inputs) + self.assertIsNone(out.grad_fn) + self.assertIsNone(vhp_val[0].grad_fn) + self.assertIsNone(vhp_val[1].grad_fn) + + @base_and_logging_tensor + def test_vhp_scalar(self, ctors): + def reducer(x): + return x.sum() + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + res = autogradF.vhp(reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + + inputs = ctors.rand([]) + v = ctors.rand([]) + res = autogradF.vhp(reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + + res = autogradF.vhp(reducer, inputs) + self._assert_same_struct(res[1], inputs) + + def bad_reducer(x): + return x.sum().view(1, 1, 1) + inputs = ctors.rand(4, 4) + v = ctors.rand(4, 4) + res = autogradF.vhp(bad_reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + + @FIXME_base_and_xfail_logging_tensor + def test_vhp_create_graph(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + inputs = ctors.rand(4, 4, dtype=torch.double, requires_grad=True) + v = ctors.ones(4, 4, dtype=torch.double, requires_grad=True) + res = autogradF.vhp(foo, inputs, v, create_graph=True) + self._assert_same_struct(res[1], inputs) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + + gradcheck(lambda inp, v: autogradF.vhp(foo, inp, v, create_graph=True), (inputs, v)) + gradgradcheck(lambda inp, v: autogradF.vhp(foo, inp, v, create_graph=True), (inputs, v)) + + def bar(a, b): + return (a + 3 * b.narrow(0, 0, 3)).exp().sum() + + inputs = (ctors.rand(3, dtype=torch.double, requires_grad=True), + ctors.rand(4, dtype=torch.double, requires_grad=True)) + v = (ctors.ones(3, dtype=torch.double, requires_grad=True), + ctors.ones(4, dtype=torch.double, requires_grad=True)) + out, vhp_val = autogradF.vhp(bar, inputs, v, create_graph=True) + self._assert_same_struct(vhp_val, inputs) + 
self.assertIsNotNone(out.grad_fn) + self.assertIsNotNone(vhp_val[0].grad_fn) + self.assertIsNotNone(vhp_val[1].grad_fn) + + gradcheck(lambda *args: autogradF.vhp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) + gradgradcheck(lambda *args: autogradF.vhp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) + + def foo(*args): + x, y = args[:2] + v = args[2:] + + x = x.cos() + val, grad = autogradF.vhp(bar, (x, y), v, create_graph=True) + + return val.cos() + grad[0].cos().sum() + grad[1].cos() + x.cos().sum() + y.cos() + + gradcheck(foo, inputs + v) + gradgradcheck(foo, inputs + v) + + @base_and_logging_tensor + def test_hvp_err_check(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + def bar(a): + return 3 * a.narrow(0, 0, 3), "bar" + + def bar2(a): + return 3 * a.narrow(0, 0, 3) + + inp = ctors.rand(4) + v = ctors.rand(4) + res = autogradF.hvp(foo, inp, v) + with self.assertRaisesRegex(TypeError, "The inputs given to hvp must be either a Tensor"): + res = autogradF.hvp(foo, (inp, 2), v) + + with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to hvp must"): + res = autogradF.hvp(bar, inp, v) + + err_msg_out = "The Tensor returned by the function given to hvp should contain a single element" + with self.assertRaisesRegex(RuntimeError, err_msg_out): + res = autogradF.hvp(bar2, inp, v) + + with self.assertRaisesRegex(RuntimeError, "v has invalid size:"): + res = autogradF.hvp(foo, inp, ctors.rand(5)) + + with self.assertRaisesRegex(TypeError, "The v given to hvp must be either a Tensor or a tuple of Tensors"): + res = autogradF.hvp(foo, inp, (v, 2)) + + res = autogradF.hvp(foo, inp, v) + self._assert_same_struct(res[1], inp) + + def foo(a, b): + return (3 * b.narrow(0, 0, 3) * a.narrow(0, 0, 3)).sum() + + inp = (ctors.rand(4), ctors.rand(5)) + v = (ctors.rand(4), ctors.rand(5)) + + res = autogradF.hvp(foo, inp, v) + self._assert_same_struct(res[1], inp) + + @base_and_logging_tensor + def test_hvp_err_check_strict(self, ctors): + def foo(a): + return a.detach().sum() + + def bar(a): + # Make a non-leaf Tensor that requires_grad but that is not connected to the input + return a.long().float().requires_grad_().clone().sum() + + def bar2(a): + # A Linear function for which the jacobian is independent of the input + return (3 * a).sum() + + inp = ctors.rand(4) + v = ctors.rand(4) + with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): + res = autogradF.hvp(foo, inp, v, strict=True) + res = autogradF.hvp(foo, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): + res = autogradF.hvp(bar, inp, v, strict=True) + res = autogradF.hvp(bar, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) + + with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0 is"): + res = autogradF.hvp(bar2, inp, v, strict=True) + res = autogradF.hvp(bar2, inp, v, strict=False) + self._assert_same_struct(res[1], inp) + self.assertEqual(res[1].abs().sum(), 0.) 
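# Editorial sketch: the hvp/vhp quantities exercised above relate to the full Hessian
# in the obvious way -- for a scalar-valued f, hvp(f, x, v) is H @ v and vhp(f, x, v)
# is v @ H, which is exactly what test_hessian_match_vhp_hvp below verifies. A minimal
# standalone check, assuming only the public torch.autograd.functional API:
import torch
from torch.autograd import functional as autogradF

def f(x):
    return (x ** 3).sum()

x = torch.rand(4, dtype=torch.double)
v = torch.rand(4, dtype=torch.double)
hess = autogradF.hessian(f, x)
_, hvp_val = autogradF.hvp(f, x, v)
_, vhp_val = autogradF.vhp(f, x, v)
assert torch.allclose(hvp_val, hess @ v)
assert torch.allclose(vhp_val, v @ hess)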
+ + @base_and_logging_tensor + def test_hvp_no_grad(self, ctors): + def reducer(x): + return x.exp().sum() + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + with torch.no_grad(): + res = autogradF.hvp(reducer, inputs, v) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + with torch.no_grad(): + res = autogradF.hvp(reducer, inputs, v, create_graph=True) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + self.assertNotEqual(res[1], ctors.zeros(4, 4)) + + @base_and_logging_tensor + def test_hvp_output(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + res = autogradF.hvp(foo, inputs, v) + self._assert_same_struct(res[1], inputs) + self.assertIsNone(res[0].grad_fn) + self.assertIsNone(res[1].grad_fn) + + def bar(a, b): + return (a + 3 * b.narrow(0, 0, 3)).exp().sum() + + inputs = (ctors.rand(3), ctors.rand(4)) + v = (ctors.ones(3), ctors.ones(4)) + out, hvp_val = autogradF.hvp(bar, inputs, v) + self._assert_same_struct(hvp_val, inputs) + self.assertIsNone(out.grad_fn) + self.assertIsNone(hvp_val[0].grad_fn) + self.assertIsNone(hvp_val[1].grad_fn) + + @base_and_logging_tensor + def test_hvp_scalar(self, ctors): + def reducer(x): + return x.exp().sum() + inputs = ctors.rand(4, 4) + v = ctors.ones(4, 4) + res = autogradF.hvp(reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + + inputs = ctors.rand([]) + v = ctors.rand([]) + res = autogradF.hvp(reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + + res = autogradF.hvp(reducer, inputs) + self._assert_same_struct(res[1], inputs) + + def bad_reducer(x): + return x.exp().sum().view(1, 1, 1) + inputs = ctors.rand(4, 4) + v = ctors.rand(4, 4) + res = autogradF.hvp(bad_reducer, inputs, v) + self._assert_same_struct(res[1], inputs) + + @FIXME_base_and_xfail_logging_tensor + def test_hvp_create_graph(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + inputs = ctors.rand(4, 4, dtype=torch.double, requires_grad=True) + v = ctors.ones(4, 4, dtype=torch.double, requires_grad=True) + res = autogradF.hvp(foo, inputs, v, create_graph=True) + self._assert_same_struct(res[1], inputs) + self.assertIsNotNone(res[0].grad_fn) + self.assertIsNotNone(res[1].grad_fn) + + gradcheck(lambda inp, v: autogradF.hvp(foo, inp, v, create_graph=True), (inputs, v)) + gradgradcheck(lambda inp, v: autogradF.hvp(foo, inp, v, create_graph=True), (inputs, v)) + + def bar(a, b): + return (a + 3 * b.narrow(0, 0, 3)).exp().sum() + + inputs = (ctors.rand(3, dtype=torch.double, requires_grad=True), + ctors.rand(4, dtype=torch.double, requires_grad=True)) + v = (ctors.ones(3, dtype=torch.double, requires_grad=True), + ctors.ones(4, dtype=torch.double, requires_grad=True)) + out, hvp_val = autogradF.hvp(bar, inputs, v, create_graph=True) + self._assert_same_struct(hvp_val, inputs) + self.assertIsNotNone(out.grad_fn) + self.assertIsNotNone(hvp_val[0].grad_fn) + self.assertIsNotNone(hvp_val[1].grad_fn) + + gradcheck(lambda *args: autogradF.hvp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) + gradgradcheck(lambda *args: autogradF.hvp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) + + def foo(*args): + x, y = args[:2] + v = args[2:] + + x = x.cos() + val, grad = autogradF.hvp(bar, (x, y), v, create_graph=True) + + return val.cos() + grad[0].cos().sum() + grad[1].cos() + x.cos().sum() + y.cos() + + gradcheck(foo, inputs + v) + gradgradcheck(foo, 
inputs + v) + + @base_and_logging_tensor + def test_jacobian_match_vjp_jvp(self, ctors): + def foo(x): + return x ** 3 + x.sum() + + inputs = ctors.rand(4) + v = ctors.rand(4) + + jac = autogradF.jacobian(foo, inputs) + jvp = autogradF.jvp(foo, inputs, v)[1] + vjp = autogradF.vjp(foo, inputs, v)[1] + + self.assertEqual(jvp, torch.mm(jac, v.unsqueeze(1)).squeeze(1)) + self.assertEqual(vjp, torch.mm(v.unsqueeze(0), jac).squeeze(0)) + + @base_and_logging_tensor + def test_hessian_match_vhp_hvp(self, ctors): + def foo(a): + return 3 * a.narrow(0, 0, 3).exp().sum() + + inputs = ctors.rand(4) + v = ctors.rand(4) + + hes = autogradF.hessian(foo, inputs) + hvp = autogradF.hvp(foo, inputs, v)[1] + vhp = autogradF.vhp(foo, inputs, v)[1] + + self.assertEqual(hvp, torch.mm(hes, v.unsqueeze(1)).squeeze(1)) + self.assertEqual(vhp, torch.mm(v.unsqueeze(0), hes).squeeze(0)) + +instantiate_parametrized_tests(TestAutogradFunctional) + +if __name__ == '__main__': + run_tests() diff --git a/test/benchmark_utils/test_benchmark_utils.py b/test/benchmark_utils/test_benchmark_utils.py index a98c0ac97b4c92..a1e2adaacfa913 100644 --- a/test/benchmark_utils/test_benchmark_utils.py +++ b/test/benchmark_utils/test_benchmark_utils.py @@ -170,6 +170,7 @@ def test_timer(self): @slowTest @unittest.skipIf(IS_SANDCASTLE, "C++ timing is OSS only.") + @unittest.skipIf(True, "Failing on clang, see 74398") def test_timer_tiny_fast_snippet(self): timer = benchmark_utils.Timer( 'auto x = 1;(void)x;', @@ -181,6 +182,7 @@ def test_timer_tiny_fast_snippet(self): @slowTest @unittest.skipIf(IS_SANDCASTLE, "C++ timing is OSS only.") + @unittest.skipIf(True, "Failing on clang, see 74398") def test_cpp_timer(self): timer = benchmark_utils.Timer( """ @@ -547,6 +549,7 @@ def add_one(x): @slowTest @unittest.skipIf(IS_WINDOWS, "Valgrind is not supported on Windows.") @unittest.skipIf(IS_SANDCASTLE, "Valgrind is OSS only.") + @unittest.skipIf(True, "Failing on clang, see 74398") def test_collect_cpp_callgrind(self): timer = benchmark_utils.Timer( "x += 1;", diff --git a/test/cpp/api/dataloader.cpp b/test/cpp/api/dataloader.cpp index c0622ba41cbd16..9b71b721b3db93 100644 --- a/test/cpp/api/dataloader.cpp +++ b/test/cpp/api/dataloader.cpp @@ -1982,7 +1982,7 @@ TEST(DataLoaderTest, ChunkDatasetSave) { for (const auto epoch_index : c10::irange(epoch_count)) { (void)epoch_index; // Suppress unused variable warning - int iteration_count = 0; + unsigned iteration_count = 0; for (auto iterator = data_loader->begin(); iterator != data_loader->end(); ++iterator, ++iteration_count) { if ((iteration_count + 1) % save_interval == 0) { @@ -2316,7 +2316,7 @@ TEST(DataLoaderTest, CustomPreprocessPolicy) { ++iterator) { auto batch_result = *iterator; if (batch_result.size() > chunk_size * cross_chunk_shuffle_count) { - for (int i = 0; i < batch_result.size(); i += chunk_size) { + for (unsigned i = 0; i < batch_result.size(); i += chunk_size) { ASSERT_TRUE(std::is_sorted( batch_result.begin() + i, batch_result.begin() + i + chunk_size)); diff --git a/test/cpp/api/init.cpp b/test/cpp/api/init.cpp index 9e2ed422e28beb..222d4f1171c4d1 100644 --- a/test/cpp/api/init.cpp +++ b/test/cpp/api/init.cpp @@ -19,7 +19,7 @@ void check_exact_values( auto layerParameters = parameters[i]; auto expectedLayerParameters = expected_parameters[i]; - if (layerParameters.size(0) != expectedLayerParameters.size()) { + if (static_cast(layerParameters.size(0)) != expectedLayerParameters.size()) { std::cout << "layer #" << i << " layerParameters size: " << layerParameters.size(0) << " 
!= " diff --git a/test/cpp/api/misc.cpp b/test/cpp/api/misc.cpp index a8d6320e9533d5..734cea27e5cca7 100644 --- a/test/cpp/api/misc.cpp +++ b/test/cpp/api/misc.cpp @@ -90,3 +90,14 @@ TEST(UtilsTest, AmbiguousOperatorDefaults) { at::_test_ambiguous_defaults(tmp, 1, 1); at::_test_ambiguous_defaults(tmp, 2, "2"); } + +int64_t get_first_element(c10::OptionalIntArrayRef arr) { + return arr.value()[0]; +} + +TEST(OptionalArrayRefTest, DanglingPointerFix) { + // Ensure that the converting constructor of `OptionalArrayRef` does not + // create a dangling pointer when given a single value + ASSERT_TRUE(get_first_element(300) == 300); + ASSERT_TRUE(get_first_element({400}) == 400); +} diff --git a/test/cpp/api/nn_utils.cpp b/test/cpp/api/nn_utils.cpp index 451c72e9d7762a..be371b1ae6d49a 100644 --- a/test/cpp/api/nn_utils.cpp +++ b/test/cpp/api/nn_utils.cpp @@ -615,7 +615,7 @@ TEST_F(NNUtilsTest, PackPaddedSequence) { } int64_t offset = 0; std::vector tensors_to_be_cat; - for (int64_t i = 1; i < sorted_lengths.size() + 1; i++) { + for (int64_t i = 1; i < static_cast(sorted_lengths.size() + 1); i++) { int64_t l = sorted_lengths.at(i-1); tensors_to_be_cat.emplace_back(pad(i * 100 + torch::arange(1., 5 * l + 1).view({l, 1, 5}), max_length)); } diff --git a/test/cpp/api/parameterdict.cpp b/test/cpp/api/parameterdict.cpp index 5f2eab5d6b289e..21dd1b31d5a88c 100644 --- a/test/cpp/api/parameterdict.cpp +++ b/test/cpp/api/parameterdict.cpp @@ -105,7 +105,7 @@ TEST_F(ParameterDictTest, Values) { auto dict = torch::nn::ParameterDict(params); std::vector values = dict->values(); std::vector true_values{ta, tb, tc}; - for (auto i = 0; i < values.size(); i += 1) { + for (auto i = 0U; i < values.size(); i += 1) { ASSERT_TRUE(torch::all(torch::eq(values[i], true_values[i])).item()); } } diff --git a/test/cpp/api/serialize.cpp b/test/cpp/api/serialize.cpp index b422662aa3623f..ecad2348674b79 100644 --- a/test/cpp/api/serialize.cpp +++ b/test/cpp/api/serialize.cpp @@ -129,7 +129,7 @@ void test_serialize_optimizer(DerivedOptimizerOptions options, bool only_has_glo // optim3_2 and optim1 should have param_groups and state of size 1 and state_size respectively ASSERT_TRUE(optim3_2_param_groups.size() == 1); // state_size = 2 for all optimizers except LBFGS as LBFGS only maintains one global state - int state_size = only_has_global_state ? 1 : 2; + unsigned state_size = only_has_global_state ? 1 : 2; ASSERT_TRUE(optim3_2_state.size() == state_size); // optim3_2 and optim1 should have param_groups and state of same size @@ -355,6 +355,7 @@ TEST(SerializeTest, ErrorOnMissingKey) { // We want the errors to contain hierarchy information, too. 
ASSERT_THROWS_WITH( torch::load(model2, stream), "No such serialized tensor 'a.b.x'"); + stream.seekg(0, stream.beg); ASSERT_THROWS_WITH( torch::load(model3, stream), "No such serialized submodule: 'a.x'"); } diff --git a/test/cpp/jit/CMakeLists.txt b/test/cpp/jit/CMakeLists.txt index 7e591925f19443..0c36d22c8dd956 100644 --- a/test/cpp/jit/CMakeLists.txt +++ b/test/cpp/jit/CMakeLists.txt @@ -95,8 +95,11 @@ set(JIT_TEST_SRCS ) if(USE_CUDA) - list(APPEND JIT_TEST_SRCS ${JIT_TEST_ROOT}/test_gpu.cpp) - list(APPEND JIT_TEST_SRCS ${JIT_TEST_ROOT}/test_gpu_shift.cpp) + list(APPEND JIT_TEST_SRCS ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/test/test_gpu.cpp) + list(APPEND JIT_TEST_SRCS ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/test/test_gpu_fused_reduction.cpp) + list(APPEND JIT_TEST_SRCS ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/test/test_gpu_shift.cpp) + list(APPEND JIT_TEST_SRCS ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/test/test_gpu_tensorcore.cpp) + list(APPEND JIT_TEST_SRCS ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/test/test_gpu_view.cpp) endif() add_executable(test_jit diff --git a/test/cpp/jit/source_range_test.cpp b/test/cpp/jit/source_range_test.cpp deleted file mode 100644 index 244db9c0085cd3..00000000000000 --- a/test/cpp/jit/source_range_test.cpp +++ /dev/null @@ -1,32 +0,0 @@ -#include -#include - -using namespace ::testing; -using namespace ::torch::jit; - -TEST(SourceRangeTest, test_find) { - std::vector> strings; - strings.push_back(std::make_shared("hello world")); - strings.push_back(std::make_shared("nihaoma")); - - std::vector pieces{*strings[0], *strings[1]}; - - StringCordView view(pieces, strings); - - auto x = view.find("rldni", 0); - EXPECT_EQ(x, 8); -} - -TEST(SourceRangeTest, test_substr) { - std::vector> strings; - strings.push_back(std::make_shared("hello world")); - strings.push_back(std::make_shared("nihaoma")); - - std::vector pieces{*strings[0], *strings[1]}; - - StringCordView view(pieces, strings); - - auto x = view.substr(4, 10).str(); - EXPECT_EQ(x, view.str().substr(4, 10)); - EXPECT_EQ(view.substr(0, view.size()).str(), view.str()); -} diff --git a/test/cpp/jit/test_autodiff.cpp b/test/cpp/jit/test_autodiff.cpp index e8bfefe642630d..6a087adb63c851 100644 --- a/test/cpp/jit/test_autodiff.cpp +++ b/test/cpp/jit/test_autodiff.cpp @@ -289,14 +289,11 @@ class AutodiffRemoveUnusedGradientsTest : public ::testing::Test { void SetUp() override { prev_exec = getExecutorMode(); getExecutorMode() = true; - prev_profiling = getProfilingMode(); - getProfilingMode() = true; prev_inline_autodiff = getAutodiffSubgraphInlining(); debugSetAutodiffSubgraphInlining(false); } void TearDown() override { getExecutorMode() = prev_exec; - getProfilingMode() = prev_profiling; debugSetAutodiffSubgraphInlining(prev_inline_autodiff); } diff --git a/test/cpp/jit/test_backend.cpp b/test/cpp/jit/test_backend.cpp index 2b5de4a146e89a..978daa08d94ddb 100644 --- a/test/cpp/jit/test_backend.cpp +++ b/test/cpp/jit/test_backend.cpp @@ -143,38 +143,6 @@ TEST(BackendTest, TestCompiler) { AT_ASSERT(mres.toTensor().equal(ref.toTensor())); } -TEST(BackendTest, TestCompilerWithStringTable) { - setShouldUseFormatWithStringTable(true); - Module m("m"); - m.define(R"( - def forward(self, x, h): - return x + h - )"); - - std::vector inputs; - inputs.emplace_back(2.0 * torch::ones({})); - inputs.emplace_back(1.0 * torch::ones({})); - auto ref = m.forward(inputs); - - c10::Dict compile_spec(StringType::get(), AnyType::get()); - c10::Dict fake_dict(StringType::get(), AnyType::get()); - fake_dict.insert("", ""); - 
compile_spec.insert("forward", fake_dict); - auto any_dict_ty = DictType::create(StringType::get(), AnyType::get()); - // lowered module - auto lm = torch::jit::detail::codegen_backend_module( - "backend_with_compiler_demo", m, compile_spec, any_dict_ty); - auto res = lm.forward(inputs); - AT_ASSERT(res.toTensor().equal(ref.toTensor())); - - std::stringstream ss; - lm._save_for_mobile(ss); - auto mlm = _load_for_mobile(ss); - auto mres = mlm.forward(inputs); - setShouldUseFormatWithStringTable(false); - AT_ASSERT(mres.toTensor().equal(ref.toTensor())); -} - TEST(BackendTest, TestComposite) { c10::Dict compile_spec(StringType::get(), AnyType::get()); c10::Dict fake_dict(StringType::get(), AnyType::get()); @@ -308,6 +276,7 @@ TEST(BackendTest, TestConsistencyOfCompositeWithSetStates) { c._save_for_mobile(ss); auto mc = _load_for_mobile(ss); auto res_mobile = mc.forward(inputs); + ss.seekg(0, ss.beg); // check if the methods names are always the same // by reloading the script module and saving it back as mobile @@ -415,56 +384,6 @@ Traceback of TorchScript (most recent call last): ASSERT_THROWS_WITH_MESSAGE(mlm.forward(inputs), error_pattern); } -TEST(BackendTestDebugInfo, TestCompilerWithStringTable) { - setShouldUseFormatWithStringTable(true); - Module m("m"); - m.define(R"( - def forward(self, x, h): - return x + h - )"); - - std::vector inputs; - inputs.emplace_back(torch::rand({2, 4})); - inputs.emplace_back(torch::rand({13, 9})); - - c10::Dict compile_spec(StringType::get(), AnyType::get()); - c10::Dict fake_dict(StringType::get(), AnyType::get()); - fake_dict.insert("", ""); - compile_spec.insert("forward", fake_dict); - auto any_dict_ty = DictType::create(StringType::get(), AnyType::get()); - // lowered module - auto lm = torch::jit::detail::codegen_backend_module( - "backend_with_compiler_demo", m, compile_spec, any_dict_ty); - - std::stringstream ss; - lm._save_for_mobile(ss, ExtraFilesMap(), true); - auto mlm = _load_for_mobile(ss); - std::string error_pattern = R"( - Module hierarchy:top(m)::.__loweredModule__(m)::forward.aten::add -Traceback of TorchScript (most recent call last): - File "", line 3, in - - def forward(self, x: Tensor, h: Tensor): - return self.__loweredModule__.forward(x, h) - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE - - File "", line 5, in forward - typed_inputs: List[Any] = [x, h, ] - if self.__backend.is_available() : - _0, = self.__backend.execute(self.__handles["forward"], typed_inputs) - ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE - assert isinstance(_0, Tensor) - return _0 - File "", line 3, in - - def forward(self, x, h): - return x + h - ~~~~~ <--- HERE - )"; - setShouldUseFormatWithStringTable(false); - ASSERT_THROWS_WITH_MESSAGE(mlm.forward(inputs), error_pattern); -} - TEST(BackendTestDebugInfo, TestExceptionStackForCompilerWithModuleHierarchy) { Module a("A"); a.define(R"( diff --git a/test/cpp/jit/test_flatbuffer.cpp b/test/cpp/jit/test_flatbuffer.cpp index 76c34389488aa7..0abb84c1268ea6 100644 --- a/test/cpp/jit/test_flatbuffer.cpp +++ b/test/cpp/jit/test_flatbuffer.cpp @@ -23,6 +23,7 @@ #include #include +#include #include #include // Tests go in torch::jit @@ -137,6 +138,22 @@ TEST(FlatbufferTest, MethodInvocation) { // NOLINT (use =delete in gtest) } } +#if defined(ENABLE_FLATBUFFER) && !defined(FB_XPLAT_BUILD) +TEST(FlatbufferTest, FlatbufferBackPortTest) { + Module m("m"); + m.define(R"( + def forward(self, input: Tensor, scale:float): + return torch.upsample_nearest2d(input, [1, 1], float(scale), float(scale)) + )"); + std::stringstream ss; + 
m._save_for_mobile(ss, {}, false, true); + + std::stringstream oss; + bool backPortSuccess = _backport_for_mobile(ss, oss, 5); + ASSERT_TRUE(backPortSuccess); +} +#endif // defined(ENABLE_FLATBUFFER) && !defined(FB_XPLAT_BUILD) + TEST(FlatbufferTest, ExtraFiles) { const auto script = R"JIT( def forward(self): @@ -153,16 +170,30 @@ TEST(FlatbufferTest, ExtraFiles) { extra_files["metadata.json"] = "abc"; extra_files["mobile_info.json"] = "{\"key\": 23}"; + std::unordered_map loaded_extra_files; +#if defined ENABLE_FLATBUFFER + std::stringstream ss; + module->_save_for_mobile(ss, extra_files, true, /*use_flatbuffer=*/true); + + loaded_extra_files["metadata.json"] = ""; + auto mobile_module = _load_for_mobile(ss, c10::nullopt, loaded_extra_files); + + ASSERT_EQ(loaded_extra_files["metadata.json"], "abc"); + ASSERT_EQ(loaded_extra_files["mobile_info.json"], "{\"key\": 23}"); + + // load it twice using the same stream + auto mobile_module2 = _load_for_mobile(ss, c10::nullopt, loaded_extra_files); +#else CompilationOptions options; mobile::Module bc = jitModuleToMobile(*module, options); auto buff = save_mobile_module_to_bytes(bc, extra_files); - std::unordered_map loaded_extra_files; loaded_extra_files["metadata.json"] = ""; auto* flatbuffer_module = mobile::serialization::GetMutableModule(buff.data()); parseExtraFiles(flatbuffer_module, loaded_extra_files); +#endif ASSERT_EQ(loaded_extra_files["metadata.json"], "abc"); ASSERT_EQ(loaded_extra_files["mobile_info.json"], "{\"key\": 23}"); @@ -235,6 +266,23 @@ TEST(FlatbufferTest, Inline) { AT_ASSERT(output.toTensor().item() == 7.0); } +#if defined ENABLE_FLATBUFFER +TEST(FlatbufferTest, GetByteCodeVersion) { + Module m("m"); + m.define(R"( + def forward(self, input: Tensor): + return input + 1 + )"); + std::stringstream ss; + m._save_for_mobile(ss, {}, false, /*use_flatbuffer=*/true); + auto version = _get_model_bytecode_version(ss); + AT_ASSERT(version == caffe2::serialize::kProducedBytecodeVersion); + ss.seekg(0, ss.beg); + auto version_again = _get_model_bytecode_version(ss); + AT_ASSERT(version == version_again); +} +#endif + TEST(FlatbufferTest, Tuple) { Module m("m"); m.define(R"JIT( @@ -1135,5 +1183,110 @@ TEST(FlatbufferTest, OperatorTest2) { // NOLINT (use =delete in gtest) } } +Module jitModuleFromBuffer(void* data) { + auto* flatbuffer_module = mobile::serialization::GetMutableModule(data); + FlatbufferLoader loader; + mobile::Module mobilem = loader.parseModule(flatbuffer_module); + ExtraFilesMap files; + std::vector constants; + loader.extractJitSourceAndConstants(&files, &constants); + return jitModuleFromSourceAndConstants( + mobilem._ivalue(), files, constants, 8); +} + +#if defined(ENABLE_FLATBUFFER) +TEST(TestSourceFlatbuffer, UpsampleNearest2d) { + Module m("m"); + m.define(R"( + def forward(self, input: Tensor, scale:float): + return torch.upsample_nearest2d(input, [1, 1], float(scale), float(scale)) + )"); + + std::vector inputs; + inputs.emplace_back(torch::rand({1, 3, 128, 128})); + inputs.emplace_back(at::Scalar(2.0)); + auto ref = m.forward(inputs); + + std::stringstream ss; + m._save_for_mobile(ss, {}, false, /*use_fatbuffer=*/true); + auto mm = _load_for_mobile(ss); + auto m2 = load(ss); + + auto res = m2.forward(inputs); + auto resm = mm.forward(inputs); + + auto resd = res.toTensor(); + auto refd = ref.toTensor(); + auto resmd = resm.toTensor(); + ASSERT_TRUE(resd.equal(refd)); + ASSERT_TRUE(resmd.equal(refd)); +} +#endif + +TEST(TestSourceFlatbuffer, CheckAttrAccess) { + Module m("m"); + 
m.register_attribute("mobile_optimized", BoolType::get(), true); + auto data = save_jit_module_to_bytes(m); + Module m2 = jitModuleFromBuffer(data.data()); + bool mobile_optimized = m2.attr("mobile_optimized", false).toBool(); + AT_ASSERT(mobile_optimized); + mobile::Module m3 = parse_mobile_module(data.data(), data.size()); + mobile_optimized = m3.attr("mobile_optimized", false).toBool(); + AT_ASSERT(mobile_optimized); +} + +TEST(TestSourceFlatbuffer, + MethodInvocation) { // NOLINT (use =delete in gtest) + const std::vector test_programs{ + // test invoking a method with default parameter + R"( + def test_func(self, x, b : int = 4): + return self.foo + x + b + )", + // inner method call with default parameter (gets inlined) + R"( + def add_with_default_arg(self, x, b : int = 4): + return self.foo + x + b + def test_func(self, x): + return self.add_with_default_arg(x) # invoke method w/ default arg + )", + // simple method call + R"( + def test_func(self, x): + b = 4 + return self.foo + x + b + )", + }; + for (const auto& test_program : test_programs) { + Module m("m"); + m.register_parameter("foo", torch::ones({}), false); + m.define(test_program); + + const int fortyTwo = 42; // (keep linter happy) + auto minput = fortyTwo * torch::ones({}); + auto ref = m.run_method("test_func", minput); + + auto data = save_jit_module_to_bytes(m); + Module m2 = jitModuleFromBuffer(data.data()); + const auto& test_func = m2.get_method("test_func"); + IValue res; + for (int i = 0; i < 3; ++i) { + res = test_func({minput}); + } + auto resd = res.toTensor().item(); + auto refd = ref.toTensor().item(); + AT_ASSERT(resd == refd); + + mobile::Module m3 = parse_mobile_module(data.data(), data.size()); + const auto& test_func3 = m3.get_method("test_func"); + for (int i = 0; i < 3; ++i) { + res = test_func3({minput}); + } + resd = res.toTensor().item(); + refd = ref.toTensor().item(); + AT_ASSERT(resd == refd); + } +} + } // namespace jit } // namespace torch diff --git a/test/cpp/jit/test_graph_iterator.cpp b/test/cpp/jit/test_graph_iterator.cpp index 75edac875b190f..00d1f9a6a28c88 100644 --- a/test/cpp/jit/test_graph_iterator.cpp +++ b/test/cpp/jit/test_graph_iterator.cpp @@ -62,7 +62,7 @@ void assert_ordering( ASSERT_EQ(expected.size(), actual.size()) << "Got " << actual.size() << " elements (" << actual << ")" << " expected " << expected.size() << " elements (" << expected << ")"; - for (int i = 0; i < expected.size(); i++) { + for (unsigned i = 0; i < expected.size(); i++) { ASSERT_EQ(expected[i], actual[i]) << "Difference at index " << i << " in " << actual << " (expected " << actual << ")"; diff --git a/test/cpp/jit/test_lite_interpreter.cpp b/test/cpp/jit/test_lite_interpreter.cpp index 5e00eafa7382cb..a07cc8af5aa707 100644 --- a/test/cpp/jit/test_lite_interpreter.cpp +++ b/test/cpp/jit/test_lite_interpreter.cpp @@ -599,7 +599,7 @@ void runAndCheckTorchScriptModel( std::stringstream& input_model_stream, const std::vector& input_data, const std::vector& expect_result_list, - const int64_t expect_version) { + const uint64_t expect_version) { auto actual_version = _get_model_bytecode_version(input_model_stream); AT_ASSERT(actual_version == expect_version); @@ -616,7 +616,7 @@ void runAndCheckBytecodeModel( std::stringstream& input_model_stream, const std::vector& input_data, const std::vector& expect_result_list, - const int64_t expect_version) { + const uint64_t expect_version) { auto actual_version = _get_model_bytecode_version(input_model_stream); AT_ASSERT(actual_version == expect_version); @@ -634,13 
+634,14 @@ void backportAllVersionCheck( std::stringstream& test_model_file_stream, std::vector& input_data, std::vector& expect_result_list, - const int64_t expect_from_version) { + const uint64_t expect_from_version) { auto from_version = _get_model_bytecode_version(test_model_file_stream); AT_ASSERT(from_version == expect_from_version); + AT_ASSERT(from_version > 0); // Backport script_module_v5.ptl to an older version constexpr int64_t minimum_to_version = 4; - int64_t current_to_version = from_version - 1; + auto current_to_version = from_version - 1; // Verify all candidate to_version work as expected. All backport to version // larger than minimum_to_version should success. @@ -656,12 +657,14 @@ void backportAllVersionCheck( // Check backport model version auto backport_version = _get_model_bytecode_version(oss); + backport_version = _get_model_bytecode_version(oss); AT_ASSERT(backport_version == current_to_version); // Load and run the backport model, then compare the result with expect // result runAndCheckBytecodeModel( oss, input_data, expect_result_list, current_to_version); + oss.seekg(0, oss.beg); runAndCheckTorchScriptModel( oss, input_data, expect_result_list, current_to_version); @@ -715,7 +718,15 @@ TEST(LiteInterpreterTest, BackPortByteCodeModelAllVersions) { torch::jit::Module module_freeze = freeze(module); std::stringstream input_model_stream; +#if defined(ENABLE_FLATBUFFER) + module_freeze._save_for_mobile( + input_model_stream, + /*extra_files=*/{}, + /*save_mobile_debug_info=*/false, + /*use_flatbuffer=*/true); +#else module_freeze._save_for_mobile(input_model_stream); +#endif std::vector input_data = std::vector({torch::ones({1, 1, 28, 28})}); std::vector expect_result_list; @@ -991,7 +1002,6 @@ TEST(LiteInterpreterTest, ExtraFiles) { module->_save_for_mobile(oss, extra_files); std::istringstream iss(oss.str()); - caffe2::serialize::IStreamAdapter adapter{&iss}; std::unordered_map loaded_extra_files; loaded_extra_files["metadata.json"] = ""; torch::jit::_load_for_mobile(iss, torch::kCPU, loaded_extra_files); @@ -1006,7 +1016,7 @@ TEST(LiteInterpreterTest, ExtraFiles) { loaded_extra_files[file_name.substr(6)] = ""; } } - + iss.seekg(0, iss.beg); torch::jit::_load_for_mobile(iss, torch::kCPU, loaded_extra_files); ASSERT_EQ(loaded_extra_files["metadata.json"], "abc"); ASSERT_EQ(loaded_extra_files["mobile_info.json"], "{\"key\": 23}"); @@ -1186,7 +1196,6 @@ TEST(RunTimeTest, ParseOperator) { function.get()); parseOperators( std::move(*c10::ivalue::Tuple::create(operators)).elements(), - model_version, 1, function.get()); const size_t rsize = 5; @@ -1569,7 +1578,6 @@ TEST(RunTimeTest, RuntimeCall) { foo.get()); parseOperators( std::move(*c10::ivalue::Tuple::create(operatorsFoo)).elements(), - model_version, 1, foo.get()); parseConstants( @@ -1586,7 +1594,6 @@ TEST(RunTimeTest, RuntimeCall) { call.get()); parseOperators( std::move(*c10::ivalue::Tuple::create(operatorsCall)).elements(), - model_version, 1, call.get()); parseConstants( @@ -2090,10 +2097,7 @@ TEST(LiteInterpreterUpgraderTest, Upgrader) { if (byteCodeFunctionWithOperator.function.get_code().operators_.empty()) { for (const auto& op : byteCodeFunctionWithOperator.operators) { byteCodeFunctionWithOperator.function.append_operator( - op.name, - op.overload_name, - op.num_specified_args, - caffe2::serialize::kMaxSupportedFileFormatVersion); + op.name, op.overload_name, op.num_specified_args); } } upgrader_functions.push_back(byteCodeFunctionWithOperator.function); diff --git a/test/cpp/jit/test_lite_trainer.cpp 
b/test/cpp/jit/test_lite_trainer.cpp index cf3040f4fba46c..ede1c3a8355b48 100644 --- a/test/cpp/jit/test_lite_trainer.cpp +++ b/test/cpp/jit/test_lite_trainer.cpp @@ -158,6 +158,139 @@ TEST(MobileTest, SaveLoadParametersEmpty) { AT_ASSERT(mobile_params.size() == 0); } +TEST(MobileTest, SaveParametersDefaultsToZip) { + // Save some empty parameters. + std::map empty_parameters; + std::stringstream ss_data; + _save_parameters(empty_parameters, ss_data); + + // Verify that parameters were serialized to a ZIP container. + EXPECT_GE(ss_data.str().size(), 4); + EXPECT_EQ(ss_data.str()[0], 'P'); + EXPECT_EQ(ss_data.str()[1], 'K'); + EXPECT_EQ(ss_data.str()[2], '\x03'); + EXPECT_EQ(ss_data.str()[3], '\x04'); +} + +#if defined(ENABLE_FLATBUFFER) +TEST(MobileTest, SaveParametersCanUseFlatbuffer) { + // Save some empty parameters using flatbuffer. + std::map empty_parameters; + std::stringstream ss_data; + _save_parameters(empty_parameters, ss_data, /*use_flatbuffer=*/true); + + // Verify that parameters were serialized to a flatbuffer. The flatbuffer + // magic bytes should be at offsets 4..7. The first four bytes contain an + // offset to the actual flatbuffer data. + EXPECT_GE(ss_data.str().size(), 8); + EXPECT_EQ(ss_data.str()[4], 'P'); + EXPECT_EQ(ss_data.str()[5], 'T'); + EXPECT_EQ(ss_data.str()[6], 'M'); + EXPECT_EQ(ss_data.str()[7], 'F'); +} +#else // !defined(ENABLE_FLATBUFFER) +TEST(MobileTest, SaveParametersThrowsWithoutFlatbufferSupport) { + // Some empty parameters to try saving. + std::map empty_parameters; + std::stringstream ss_data; + + // Save using flatbuffers should fail when support isn't compiled in. Make + // sure we get the exception that explicitly mentions the lack of flatbuffer + // support. + try { + _save_parameters(empty_parameters, ss_data, /*use_flatbuffer=*/true); + FAIL() << "_save_parameters should have thrown"; + } catch (const ::c10::Error& e) { + static const std::string kExpectedSubstring = + "build hasn't enabled flatbuffer"; + EXPECT_TRUE( + std::string(e.msg()).find(kExpectedSubstring) != std::string::npos) + << "Exception message does not contain expected substring \"" + << kExpectedSubstring << "\": actual message \"" << e.msg() << "\""; + } catch (...) { + FAIL() << "Unexpected exception type"; + } +} +#endif // !defined(ENABLE_FLATBUFFER) + +#if defined(ENABLE_FLATBUFFER) +TEST(MobileTest, SaveLoadParametersUsingFlatbuffers) { + // Create some simple parameters to save. + std::map input_params; + input_params["four_by_ones"] = 4 * torch::ones({}); + input_params["three_by_ones"] = 3 * torch::ones({}); + + // Serialize them using flatbuffers. + std::stringstream data; + _save_parameters(input_params, data, /*use_flatbuffer=*/true); + + // The flatbuffer magic bytes should be at offsets 4..7. + EXPECT_EQ(data.str()[4], 'P'); + EXPECT_EQ(data.str()[5], 'T'); + EXPECT_EQ(data.str()[6], 'M'); + EXPECT_EQ(data.str()[7], 'F'); + + // Read them back and check that they survived the trip. + auto output_params = _load_parameters(data); + EXPECT_EQ(output_params.size(), 2); + { + auto four_by_ones = 4 * torch::ones({}); + EXPECT_EQ( + output_params["four_by_ones"].item(), four_by_ones.item()); + } + { + auto three_by_ones = 3 * torch::ones({}); + EXPECT_EQ( + output_params["three_by_ones"].item(), three_by_ones.item()); + } +} +#else // !defined(ENABLE_FLATBUFFER) +TEST(MobileTest, LoadParametersFailsWithoutFlatbufferSupport) { + // Create some data that looks like a flatbuffer header. 
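// Editorial sketch of the container sniffing these parameter tests rely on: ZIP
// archives begin with the local-file-header magic "PK\x03\x04", while the flatbuffer
// mobile format carries its "PTMF" file identifier at byte offsets 4..7 (the first
// four bytes hold an offset to the actual flatbuffer data, as the tests above note).
// The helpers below are illustrative only, not PyTorch APIs.
#include <string>

inline bool looks_like_zip(const std::string& data) {
  return data.size() >= 4 && data.compare(0, 4, "PK\x03\x04") == 0;
}

inline bool looks_like_flatbuffer(const std::string& data) {
  return data.size() >= 8 && data.compare(4, 4, "PTMF") == 0;
}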
+ std::stringstream data; + data << "abcd" + << "PTMF" // Flatbuffer magic + << "ijkl"; + + // Loading the "flatbuffer" data should fail. Make sure we see the expected + // exception, not just any exception; since this isn't properly-formed + // flatbuffer data, any attempt to parse it might throw a different error type + // or message, but we don't expect anyone to try parsing it. + try { + _load_parameters(data); + FAIL() << "_load_parameters should have thrown"; + } catch (const ::c10::Error& e) { + static const std::string kExpectedSubstring = + "build hasn't enabled flatbuffer"; + EXPECT_TRUE( + std::string(e.msg()).find(kExpectedSubstring) != std::string::npos) + << "Exception message does not contain expected substring \"" + << kExpectedSubstring << "\": actual message \"" << e.msg() << "\""; + } catch (...) { + FAIL() << "Unexpected exception type"; + } +} +#endif // !defined(ENABLE_FLATBUFFER) + +TEST(MobileTest, LoadParametersUnexpectedFormatShouldThrow) { + // Manually create some data that doesn't look like a ZIP or Flatbuffer file. + // Make sure it's longer than 8 bytes, since getFileFormat() needs that much + // data to detect the type. + std::stringstream bad_data; + bad_data << "abcd" + << "efgh" + << "ijkl"; + + // Loading parameters from it should throw an exception. + EXPECT_ANY_THROW(_load_parameters(bad_data)); +} + +TEST(MobileTest, LoadParametersEmptyDataShouldThrow) { + // Loading parameters from an empty data stream should throw an exception. + std::stringstream empty; + EXPECT_ANY_THROW(_load_parameters(empty)); +} + TEST(LiteTrainerTest, SGD) { Module m("m"); m.register_parameter("foo", torch::ones({1}, at::requires_grad()), false); diff --git a/test/cpp/jit/test_misc.cpp b/test/cpp/jit/test_misc.cpp index 9ccbb77ff71531..244ea96bf3e99f 100644 --- a/test/cpp/jit/test_misc.cpp +++ b/test/cpp/jit/test_misc.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -47,6 +48,7 @@ #include #include #include +#include #include #include #include @@ -1381,6 +1383,29 @@ TEST(ThreadLocalDebugInfoTest, Basic) { } } +TEST(TestSymInt, NarrowCopyWithSymbolicInt) { + static const size_t LENGTH = 5; + auto a = at::randn({10}, at::kCPU); + c10::SymInt si(LENGTH); + auto b = a.narrow_copy(0, 0, si); + auto c = a.narrow(0, 0, LENGTH); + ASSERT_TRUE(torch::allclose(b, c)); +} + +TEST(TestSymInt, NarrowCopy) { + static const size_t LENGTH = 5; + auto a = at::randn({10}, at::kCPU); + auto b = a.narrow_copy(0, 0, LENGTH); + auto c = a.narrow(0, 0, LENGTH); + ASSERT_TRUE(torch::allclose(b, c)); +} + +TEST(TestSymInt, AddSymbolicInt) { + c10::SymInt a(5); + c10::SymInt b(3); + ASSERT_TRUE((a + b).expect_int() == 8); +} + TEST(FallbackGraphsTest, Basic) { static const auto nestGraphIntoFallbackGraph = [](const std::shared_ptr& graph) { @@ -2913,6 +2938,74 @@ graph(%x.1 : Tensor): testing::FileCheck().check_not("aten::relu(")->run(*graph); } +TEST(TestFunctionExecutor, SimpleExecutorTest) { + auto graph = std::make_shared(); + parseIR( + R"IR( +graph(%x.1 : Tensor): + %2 : int = prim::Constant[value=1]() + %x.3 : Tensor = aten::add(%x.1, %2, %2) + %y : Tensor = aten::relu(%x.3) + return (%y))IR", + &*graph); + { + auto func = torch::make_unique( + "name", graph, [](GraphFunction&) {}, ExecutorExecutionMode::PROFILING); + auto a = at::rand({2, 2, 2}, TensorOptions(kCPU).dtype(at::kFloat)); + Stack stack = {a}; + func->run(stack); + auto g = lastExecutedOptimizedGraph(); + testing::FileCheck() + .check("prim::profile") + ->check("aten::add") + 
->check("aten::relu") + ->run(*g); + } + { + auto func = torch::make_unique( + "name", graph, [](GraphFunction&) {}, ExecutorExecutionMode::SIMPLE); + auto a = at::rand({2, 2, 2}, TensorOptions(kCPU).dtype(at::kFloat)); + Stack stack = {a}; + func->run(stack); + auto g = func->getDebugState().graph; + testing::FileCheck() + .check_not("prim::profile") + ->check("aten::add") + ->check("aten::relu") + ->run(*g); + } +} + +TEST(TestFunctionExecutor, RunDecompositionTest) { + GraphFunction* func; + std::once_flag flag1; + for (bool unbiased : {true, false}) { + std::call_once(flag1, [&]() { + // NB: take reference to schema here, `auto schema =` will not work + auto& schema = getOperatorForLiteral( + "aten::var(Tensor self, bool unbiased=True) -> Tensor") + ->schema(); + auto maybe_func = GetDecompositionFunction(schema); + TORCH_INTERNAL_ASSERT(maybe_func); + func = *maybe_func; + }); + auto input = at::rand({4, 4}); + Stack stack = {input, unbiased}; + func->run(stack); + at::Tensor out = pop(stack).toTensor(); + ASSERT_TRUE(at::allclose(out, input.var(unbiased))); + } +} + +TEST(TestShapeGraphLinting, Basic) { + auto schemas = RegisteredShapeComputeSchemas(); + for (const auto& schema : schemas) { + auto g = shapeComputeGraphForSchema(*schema); + TORCH_INTERNAL_ASSERT(g); + LintShapeComputeGraph(schema, *g); + } +} + // TODO: move to test_kernel when global settings are explicit // fusion parameters class Composed : public ::testing::Test { diff --git a/test/cpp/jit/test_save_load.cpp b/test/cpp/jit/test_save_load.cpp index 88bff7ea93e885..c7d631baa7a232 100644 --- a/test/cpp/jit/test_save_load.cpp +++ b/test/cpp/jit/test_save_load.cpp @@ -3,7 +3,9 @@ #include #include +#include #include +#include #include #include #include @@ -13,6 +15,20 @@ namespace torch { namespace jit { +namespace { + +Module roundtripThroughMobile(const Module& m) { + ExtraFilesMap files; + std::vector constants; + jitModuleToPythonCodeAndConstants(m, &files, &constants); + CompilationOptions options; + mobile::Module mobilem = jitModuleToMobile(m, options); + return jitModuleFromSourceAndConstants( + mobilem._ivalue(), files, constants, 8); +} + +} // namespace + TEST(SerializationTest, ExtraFilesHookPreference) { // Tests that an extra file written explicitly has precedence over // extra files written by a hook @@ -149,5 +165,78 @@ TEST(SerializationTest, TestJitStream_CUDA) { // Check if both the output tensors are equal ASSERT_TRUE(op.equal(c)); } + +TEST(TestSourceRoundTrip, UpsampleNearest2d) { + Module m("m"); + m.define(R"( + def forward(self, input: Tensor, scale:float): + return torch.upsample_nearest2d(input, [1, 1], float(scale), float(scale)) + )"); + + std::vector inputs; + inputs.emplace_back(torch::rand({1, 3, 128, 128})); + inputs.emplace_back(at::Scalar(2.0)); + auto ref = m.forward(inputs); + + Module m2 = roundtripThroughMobile(m); + auto res = m2.forward(inputs); + + auto resd = res.toTensor(); + auto refd = ref.toTensor(); + ASSERT_TRUE(resd.equal(refd)); +} + +TEST(TestSourceRoundTrip, CheckAttrAccess) { + Module m("m"); + m.register_attribute("mobile_optimized", BoolType::get(), true); + Module m2 = roundtripThroughMobile(m); + bool mobile_optimized = m2.attr("mobile_optimized", false).toBool(); + AT_ASSERT(mobile_optimized); +} + +TEST(TestSourceRoundTrip, + MethodInvocation) { // NOLINT (use =delete in gtest) + const std::vector test_programs{ + // test invoking a method with default parameter + R"( + def test_func(self, x, b : int = 4): + return self.foo + x + b + )", + // inner method call 
with default parameter (gets inlined) + R"( + def add_with_default_arg(self, x, b : int = 4): + return self.foo + x + b + def test_func(self, x): + return self.add_with_default_arg(x) # invoke method w/ default arg + )", + // simple method call + R"( + def test_func(self, x): + b = 4 + return self.foo + x + b + )", + }; + for (const auto& test_program : test_programs) { + Module m("m"); + m.register_parameter("foo", torch::ones({}), false); + m.define(test_program); + + const int fortyTwo = 42; // (keep linter happy) + auto minput = fortyTwo * torch::ones({}); + auto ref = m.run_method("test_func", minput); + + Module m2 = roundtripThroughMobile(m); + const auto& test_func = m2.get_method("test_func"); + IValue res; + for (int i = 0; i < 3; ++i) { + res = test_func({minput}); + } + + auto resd = res.toTensor().item(); + auto refd = ref.toTensor().item(); + AT_ASSERT(resd == refd); + } +} + } // namespace jit } // namespace torch diff --git a/test/cpp/jit/test_shape_analysis.cpp b/test/cpp/jit/test_shape_analysis.cpp index baf9f16e6e79dd..b3157c09e8bee4 100644 --- a/test/cpp/jit/test_shape_analysis.cpp +++ b/test/cpp/jit/test_shape_analysis.cpp @@ -30,7 +30,6 @@ Node* findNode(std::shared_ptr& g, Symbol k) { } TORCH_INTERNAL_ASSERT(false, "Couldn't find node"); } - } // namespace TEST(ShapeAnalysisTest, DynamicShapesFusion) { @@ -292,5 +291,66 @@ TEST(ShapeAnalysisTest, MovingConstantOutOfFusionGroups) { ->run(*g); } +namespace { + +void assertShapeEqual( + c10::optional>& actual, + std::vector> expected) { + ASSERT_TRUE(actual.has_value()); + ASSERT_EQ(actual->size(), 1); + auto a_canonical = CanonicalizedSymbolicShape(actual->at(0)); + + auto symb_expected = c10::SymbolicShape(expected); + auto b_canonical = CanonicalizedSymbolicShape(symb_expected); + ASSERT_EQ(a_canonical, b_canonical); +} + +} // namespace + +TEST(ShapeAnalysisTest, SymbolicShapeAPI) { + // Figure out how to fetch a function schema + + // Ask someone else how to create a function schema / operator in C++ + std::shared_ptr op = getOperatorForLiteral( + "aten::sub.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"); + const FunctionSchema* schema = &(op->schema()); + + c10::IValue const_size_1 = std::vector{64, 56, 56}; + c10::IValue const_size_2 = std::vector{1, 56, 56}; + + // Check vector initializer list syntax + c10::optional sym_dim = c10::nullopt; + c10::SymbolicShape ss_concrete = + std::vector>{1, 56, 56}; + c10::SymbolicShape ss1 = std::vector>{sym_dim, 56, 56}; + c10::SymbolicShape ss2 = + std::vector>{64, sym_dim, sym_dim}; + c10::SymbolicShape ss3 = + std::vector>{sym_dim, sym_dim, sym_dim, sym_dim}; + + auto res = calculateSymbolicShapesOnOp( + schema, std::vector{const_size_1, const_size_1}); + assertShapeEqual(res, {64, 56, 56}); + + res = calculateSymbolicShapesOnOp( + schema, std::vector{const_size_1, const_size_2}); + assertShapeEqual(res, {64, 56, 56}); + + res = calculateSymbolicShapesOnOp( + schema, std::vector{const_size_1, ss1}); + assertShapeEqual(res, {64, 56, 56}); + + res = calculateSymbolicShapesOnOp( + schema, std::vector{const_size_2, ss1}); + assertShapeEqual(res, {sym_dim, 56, 56}); + + res = calculateSymbolicShapesOnOp( + schema, std::vector{ss_concrete, ss2}); + assertShapeEqual(res, {64, 56, 56}); + + res = calculateSymbolicShapesOnOp(schema, std::vector{ss2, ss3}); + assertShapeEqual(res, {sym_dim, 64, sym_dim, sym_dim}); +} + } // namespace jit } // namespace torch diff --git a/test/cpp/jit/test_utils.h b/test/cpp/jit/test_utils.h index 3d2ff4b159ca24..89a8959c424ff5 
100644 --- a/test/cpp/jit/test_utils.h +++ b/test/cpp/jit/test_utils.h @@ -17,39 +17,31 @@ static inline void trim(std::string& s) { [](unsigned char ch) { return !std::isspace(ch); }) .base(), s.end()); - for (int64_t i = 0; i < s.size(); ++i) { - if (s[i] == '\n') { + for (size_t i = 0; i < s.size(); ++i) { + while (i < s.size() && s[i] == '\n') { s.erase(i, 1); - i--; } } - for (int64_t i = 0; i < s.size(); ++i) { + for (size_t i = 0; i < s.size(); ++i) { if (s[i] == ' ') { - for (int64_t j = i + 1; j < s.size(); j++) { - if (s[j] == ' ') { - s.erase(j, 1); - j--; - } else { - break; - } + while (i + 1 < s.size() && s[i + 1] == ' ') { + s.erase(i + 1, 1); } } } } } // namespace -#define ASSERT_THROWS_WITH_MESSAGE(statement, substring) \ - try { \ - (void)statement; \ - FAIL(); \ - } catch (const std::exception& e) { \ - std::string substring_s(substring); \ - trim(substring_s); \ - auto exception_string = std::string(e.what()); \ - trim(exception_string); \ - ASSERT_NE(exception_string.find(substring_s), std::string::npos) \ - << " Error was: \n" \ - << exception_string; \ +#define ASSERT_THROWS_WITH_MESSAGE(statement, substring) \ + try { \ + (void)statement; \ + FAIL(); \ + } catch (const std::exception& e) { \ + std::string substring_s(substring); \ + trim(substring_s); \ + auto exception_string = std::string(e.what()); \ + trim(exception_string); \ + ASSERT_NE(exception_string.find(substring_s), std::string::npos); \ } namespace torch { diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_float_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_float_v2.ptl index be67cecf970508..ddee6be4c35afb 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_float_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_float_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_float_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_float_v2.ptl index e5663224ac7603..cb36f9aeba8bc6 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_float_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_float_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_int_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_int_v2.ptl index 8698001427a93f..443074fe7130cd 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_int_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_int_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_int_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_int_v2.ptl index c52d92b29f44cc..ac8b1b918de7c5 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_int_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_int_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_float_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_float_v2.ptl index 749614fa53097d..323aa42dde4ec2 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_float_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_float_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_int_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_int_v2.ptl index 
b20c456058be63..6d06dea6b5896d 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_int_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_int_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_scalar_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_scalar_v2.ptl index f33f3a8cf8de35..4fd551d073aebf 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_scalar_scalar_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_scalar_scalar_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_tensor_inplace_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_tensor_inplace_v2.ptl index ac7cc7479e7988..9680713a83e280 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_tensor_inplace_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_tensor_inplace_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_tensor_out_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_tensor_out_v2.ptl index 0b70614b09366b..0381636677b52b 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_tensor_out_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_tensor_out_v2.ptl differ diff --git a/test/cpp/jit/upgrader_models/test_versioned_div_tensor_v2.ptl b/test/cpp/jit/upgrader_models/test_versioned_div_tensor_v2.ptl index 5f6ae1a90b1e7b..21792d35b8924f 100644 Binary files a/test/cpp/jit/upgrader_models/test_versioned_div_tensor_v2.ptl and b/test/cpp/jit/upgrader_models/test_versioned_div_tensor_v2.ptl differ diff --git a/test/cpp/lazy/CMakeLists.txt b/test/cpp/lazy/CMakeLists.txt index ede4308816cfeb..9360247a4d3126 100644 --- a/test/cpp/lazy/CMakeLists.txt +++ b/test/cpp/lazy/CMakeLists.txt @@ -9,9 +9,15 @@ set(LAZY_TEST_SRCS ${LAZY_TEST_ROOT}/test_misc.cpp ${LAZY_TEST_ROOT}/test_permutation_util.cpp ${LAZY_TEST_ROOT}/test_shape.cpp - ${LAZY_TEST_ROOT}/test_tensor_impl.cpp + ${LAZY_TEST_ROOT}/test_symbolic_shape.cpp ${LAZY_TEST_ROOT}/test_util.cpp ) +if(BUILD_LAZY_TS_BACKEND) + list(APPEND LAZY_TEST_SRCS + ${LAZY_TEST_ROOT}/test_lazy_ops.cpp + ${LAZY_TEST_ROOT}/test_lazy_ops_util.cpp + ) +endif() add_executable(test_lazy ${TORCH_ROOT}/test/cpp/common/main.cpp diff --git a/test/cpp/lazy/test_backend_device.cpp b/test/cpp/lazy/test_backend_device.cpp index b75f0512d38787..f8ce49b9e287dd 100644 --- a/test/cpp/lazy/test_backend_device.cpp +++ b/test/cpp/lazy/test_backend_device.cpp @@ -74,9 +74,13 @@ TEST(BackendDeviceTest, FromAten) { auto device = c10::Device(c10::kCPU); EXPECT_THROW(atenDeviceToBackendDevice(device), c10::Error); - // TODO(alanwaketan): Update the following test once we have TorchScript backend upstreamed. device = c10::Device(c10::kLazy); +#ifndef FBCODE_CAFFE2 + auto backend_device = atenDeviceToBackendDevice(device); +#else + // Lazy Tensor is disabled in FBCODE until addressing non-virtual methods (e.g. 
sizes) in TensorImpl EXPECT_THROW(atenDeviceToBackendDevice(device), c10::Error); +#endif // FBCODE_CAFFE2 } TEST(BackendDeviceTest, ToAten) { diff --git a/test/cpp/lazy/test_cache.cpp b/test/cpp/lazy/test_cache.cpp index a6da9bccbd25e0..53bd6af147ebaf 100644 --- a/test/cpp/lazy/test_cache.cpp +++ b/test/cpp/lazy/test_cache.cpp @@ -4,6 +4,8 @@ #include #include #include +#include +#include namespace torch { namespace lazy { @@ -22,7 +24,6 @@ class CacheNode : public Node { const Output& operand(size_t i) const override { TORCH_INTERNAL_ASSERT(false, "Can't access operand[i] of test node"); } - private: std::string str_; }; @@ -59,5 +60,32 @@ TEST(CacheTest, BasicTest) { EXPECT_EQ(cache.Get(c->node_hash()), nullptr); } +class CacheNodeWithShape : public TsNode { + public: + explicit CacheNodeWithShape(const Shape& shape) + : TsNode(OpKind(), shape, /* num_outputs */ 1, /* seed */ 0){} +}; + +TEST(CacheTest, ShapeCacheTestForDynamicShape) { + // enable dynamic shape + FLAGS_ltc_enable_dynamic_shapes = true; + + CacheNodeWithShape nodes[] = { + CacheNodeWithShape(Shape(c10::kFloat, {2, 4})), + CacheNodeWithShape(Shape(c10::kFloat, {4, 2})) }; + + /* + * Make sure the cached shape for node (2, 4) is not used for node (4, 2) + */ + for (auto& node : nodes) { + EXPECT_EQ(node.shape(), node.GetOpShape([&]() { + return node.shape(); + })); + } + + // reset the flag + FLAGS_ltc_enable_dynamic_shapes = false; +} + } // namespace lazy } // namespace torch diff --git a/test/cpp/lazy/test_ir.cpp b/test/cpp/lazy/test_ir.cpp index 326f7a9092c00c..d07530b29e7dfd 100644 --- a/test/cpp/lazy/test_ir.cpp +++ b/test/cpp/lazy/test_ir.cpp @@ -1,10 +1,15 @@ #include +#include #include #include #include +#include #include #include +#include +#include +#include namespace torch { namespace lazy { @@ -23,7 +28,6 @@ class TestLeafNode : public Node { const Output& operand(size_t i) const override { TORCH_INTERNAL_ASSERT(false, "Can't access operand[i] of leaf node"); } - private: size_t param_; }; @@ -51,22 +55,22 @@ TEST(IrTest, MetaDataTest) { node = MakeNode(1); auto metaWithEmptyDebug = node->metadata(); EXPECT_EQ(metaWithEmptyDebug.scope.size(), 0); - EXPECT_EQ(metaWithEmptyDebug.frame_info.size(), 0); + EXPECT_EQ(metaWithEmptyDebug.frame_info.size(), 1); { ScopePusher scope("TestScope"); node = MakeNode(1); auto metaWithScope = node->metadata(); EXPECT_EQ(metaWithScope.scope, "TestScope.1"); - EXPECT_EQ(metaWithScope.frame_info.size(), 0); + EXPECT_EQ(metaWithScope.frame_info.size(), 1); } SourceLocation dummySourceLocation; dummySourceLocation.file = "file"; dummySourceLocation.function = "function"; dummySourceLocation.line = 10; - RegisterGetFrameInfo( - [&]() -> std::vector { return {dummySourceLocation}; }); + GetPythonFramesFunction() = + [&]() -> std::vector { return {dummySourceLocation}; }; node = MakeNode(1); auto metaWithSourceLoc = node->metadata(); EXPECT_EQ(metaWithSourceLoc.scope.size(), 0); @@ -77,7 +81,7 @@ TEST(IrTest, MetaDataTest) { FLAGS_torch_lazy_ir_debug = restore_FLAGS_torch_lazy_ir_debug; } -TEST(IrTest, TsNode) { +TEST(IrTest, TsNodeTest) { NodePtr node1 = MakeNode( OpKind(at::aten::view), Shape(), @@ -96,5 +100,28 @@ TEST(IrTest, TsNode) { EXPECT_TRUE(leafptr != nullptr); } +TEST(IrTest, DimensionNodeTest) { + + const size_t DIM0 = 5; + const size_t DIM1 = 8; + NodePtr node1 = MakeNode( + OpKind(at::aten::view), + Shape(c10::kFloat, {DIM0, DIM1}), + /*num_outputs*/ 1, + /*hash_seed*/ kHashSeed); + + auto size0 = std::dynamic_pointer_cast(MakeNode(Value{node1}, 0)); + auto size1 = 
std::dynamic_pointer_cast(MakeNode(Value{node1}, 1)); + + ASSERT_EQ(DIM0, size0->getStaticValue()); + ASSERT_EQ(DIM1, size1->getStaticValue()); + + auto add_dim = std::dynamic_pointer_cast(MakeNode(Value{size0}, Value{size1})); + ASSERT_EQ(DIM0 + DIM1, add_dim->getStaticValue()); + + auto mul_dim = std::dynamic_pointer_cast(MakeNode(Value{size0}, Value{size1})); + ASSERT_EQ(DIM0 * DIM1, mul_dim->getStaticValue()); +} + } // namespace lazy } // namespace torch diff --git a/test/cpp/lazy/test_ir_util.cpp b/test/cpp/lazy/test_ir_util.cpp index bb29cff6f6b316..6c85c0184323c1 100644 --- a/test/cpp/lazy/test_ir_util.cpp +++ b/test/cpp/lazy/test_ir_util.cpp @@ -22,18 +22,6 @@ class IrUtilNode : public Node { operands_as_outputs_.emplace_back(v.node.get(), v.index); operands_.push_back(std::move(v.node)); } - - const std::vector& operands() const override { - return operands_as_outputs_; - } - - const Output& operand(size_t i) const override { - return operands_as_outputs_.at(i); - } - - private: - std::vector operands_; - std::vector operands_as_outputs_; }; /* a diff --git a/test/cpp/lazy/test_lazy_ops.cpp b/test/cpp/lazy/test_lazy_ops.cpp new file mode 100644 index 00000000000000..c1319429c1811c --- /dev/null +++ b/test/cpp/lazy/test_lazy_ops.cpp @@ -0,0 +1,10727 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace torch { +namespace lazy { + + +// Lazy Tensor is disabled in FBCODE until addressing non-virtual methods (e.g. sizes) in TensorImpl +#ifndef FBCODE_CAFFE2 + +namespace { + // This registers the torchscript backend, without which lazy device won't work +static bool inline init_backend(){ + torch::lazy::InitTorchScriptBackend(); + return true; +} +static const bool backend_initialized = init_backend(); + +} + +class LazyTsTest : public ::testing::Test { + protected: + void SetUp() override; + + void TearDown() override; + + static void CommonSetup() {} + + void ExpectCounterNotChanged( + const std::string& counter_regex, + const std::unordered_set* ignore_set) {} + + void ExpectCounterChanged(const std::string& counter_regex, + const std::unordered_set* ignore_set) { + } + + void ResetCounters() {} + + private: + void MakeEndSnapshot() {} +}; + +class LazyOpsTestBase : public LazyTsTest { + protected: + static void SetUpTestCase() {} +}; + +void LazyTsTest::SetUp() { + (void)backend_initialized; // avoid unused parameter warning + at::manual_seed(42); + torch::lazy::LazyGraphExecutor::Get()->SetRngSeed(torch::lazy::BackendDevice(), 42); +} + +void LazyTsTest::TearDown() {} + +namespace { +using torch::lazy::DebugUtil; + +class LazyOpsTest : public LazyOpsTestBase {}; + +static inline bool IsCuda() { + return torch::lazy::getBackend()->EagerFallbackDeviceType() == at::kCUDA; +} + +static inline at::DeviceType DefaultDevice() { + return torch::lazy::getBackend()->EagerFallbackDeviceType(); +} + + +} // namespace + +TEST_F(LazyOpsTest, TestScalarTensor) { + torch::Tensor scalar_tensor = torch::scalar_tensor( + 1., torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_scalar_tensor = torch::scalar_tensor( + 1., torch::TensorOptions(torch::kFloat).device(torch::kLazy)); + AllClose(scalar_tensor, lazy_scalar_tensor); + }); +} + +TEST_F(LazyOpsTest, TestClone) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = 
CopyToDevice(a, device); + torch::Tensor lazy_b = lazy_a.clone(); + AllClose(a, lazy_b); + lazy_a.add_(1.0); + AllClose(a, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestTo) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestIsFloatingPoint) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + bool is_float = torch::is_floating_point(a); + bool lazy_is_float = torch::is_floating_point(lazy_a); + EXPECT_EQ(is_float, lazy_is_float); + }); +} + +TEST_F(LazyOpsTest, TestIsSigned) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + bool is_signed = torch::is_signed(a); + bool lazy_is_signed = torch::is_signed(lazy_a); + EXPECT_EQ(is_signed, lazy_is_signed); + }); +} + +TEST_F(LazyOpsTest, TestCastByte) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::_cast_Byte(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::_cast_Byte(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestCastChar) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::_cast_Char(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::_cast_Char(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestCastShort) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::_cast_Short(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::_cast_Short(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestCastInt) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::_cast_Int(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::_cast_Int(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestCastLong) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::_cast_Long(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::_cast_Long(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestCastFloat) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::_cast_Float(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::_cast_Float(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestRetainType) { + torch::Tensor 
lazy_a = torch::zeros( + {2, 2}, torch::TensorOptions(torch::kByte).device(torch::kLazy)); + torch::Tensor lazy_b = torch::ones( + {2, 2}, torch::TensorOptions(torch::kByte).device(torch::kLazy)); + torch::Tensor lazy_c = lazy_a + lazy_b; + EXPECT_EQ(lazy_c.scalar_type(), torch::ScalarType::Byte); +} + +TEST_F(LazyOpsTest, TestLogicalTypeWithInterop) { + torch::Tensor query = + torch::rand({2, 12, 20, 64}, + torch::TensorOptions(torch::kFloat).device(torch::kLazy)); + torch::Tensor key = + torch::rand({2, 12, 64, 20}, + torch::TensorOptions(torch::kFloat).device(torch::kLazy)); + torch::Tensor scores = + torch::matmul(query, key) / + torch::scalar_tensor( + 8, torch::TensorOptions(torch::kDouble).device(torch::kLazy)); + torch::Tensor p_attn = torch::softmax(scores, /*dim=*/-1); + EXPECT_EQ(p_attn.scalar_type(), torch::ScalarType::Float); +} + +TEST_F(LazyOpsTest, TestAdd) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::add(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::add(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestAddHalf) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kHalf).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kHalf).device(DefaultDevice())); + torch::Tensor c = torch::add(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::add(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestAddMixedPrecision) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kHalf).device(DefaultDevice())); + torch::Tensor c = torch::add(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::add(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestAddInPlace) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor c = a.add_(b); + torch::Tensor lazy_c = lazy_a.add_(lazy_b); + AllClose(a, lazy_a); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestAddScalar) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar b(1); + torch::Tensor c = torch::add(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_c = torch::add(lazy_a, b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestAddScalarInPlace) { + torch::Scalar b(1); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.add_(b); + torch::Tensor lazy_c = lazy_a.add_(b); + AllClose(a, lazy_a); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestAddZeroSizeDim) { + torch::Tensor a = torch::rand( + {0, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {1, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::add(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::add(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestSub) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::sub(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::sub(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestSubInPlace) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor c = a.sub_(b); + torch::Tensor lazy_c = lazy_a.sub_(lazy_b); + AllClose(a, lazy_a); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestSubScalar) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar b(1); + torch::Tensor c = torch::sub(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_c = torch::sub(lazy_a, b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestSubScalarInPlace) { + torch::Scalar b(1); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.sub_(b); + torch::Tensor lazy_c = lazy_a.sub_(b); + AllClose(a, lazy_a); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMul) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::mul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::mul(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMulInPlace) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor c = 
a.mul_(b); + torch::Tensor lazy_c = lazy_a.mul_(lazy_b); + AllClose(a, lazy_a); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMulScalar) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar b(3); + torch::Tensor c = torch::mul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_c = torch::mul(lazy_a, b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMulScalarInPlace) { + torch::Scalar b(3); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.mul_(b); + torch::Tensor lazy_c = lazy_a.mul_(b); + AllClose(a, lazy_a); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestDiv) { + for (torch::ScalarType scalar_type1 : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor a = + isFloatingType(scalar_type1) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type1)) + : torch::randint(0, 100, {3, 4}, + torch::TensorOptions(scalar_type1)); + for (torch::ScalarType scalar_type2 : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor b = + isFloatingType(scalar_type2) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type2)) + : torch::randint(1, 100, {3, 4}, + torch::TensorOptions(scalar_type2)); + torch::Tensor c = torch::div(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::div(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); + } + } +} + +TEST_F(LazyOpsTest, TestDivWithRoundingMode) { + c10::optional rounding_modes[] = {"trunc", "floor", + c10::nullopt}; + for (const auto& rounding_mode : rounding_modes) { + for (torch::ScalarType scalar_type1 : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + int lower_bound = (scalar_type1 == torch::kByte) ? 0 : -100; + torch::Tensor a = + isFloatingType(scalar_type1) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type1)) + : torch::randint(lower_bound, 50, {3, 4}, + torch::TensorOptions(scalar_type1)); + for (torch::ScalarType scalar_type2 : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, + torch::kInt, torch::kLong}) { + torch::Tensor b = + isFloatingType(scalar_type2) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type2)) + : torch::randint(51, 100, {3, 4}, + torch::TensorOptions(scalar_type2)); + torch::Tensor c = torch::div(a, b, rounding_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::div(lazy_a, lazy_b, rounding_mode); + AllClose(c, lazy_c); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestDivInPlace) { + for (torch::ScalarType scalar_type1 : {torch::kFloat}) { + torch::Tensor a = + isFloatingType(scalar_type1) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type1)) + : torch::randint(0, 100, {3, 4}, + torch::TensorOptions(scalar_type1)); + for (torch::ScalarType scalar_type2 : {torch::kFloat}) { + torch::Tensor b = + isFloatingType(scalar_type2) + ? 
torch::rand({3, 4}, torch::TensorOptions(scalar_type2)) + : torch::randint(1, 100, {3, 4}, + torch::TensorOptions(scalar_type2)); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.div_(b); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.div_(lazy_b); + ; + AllClose(c, lazy_c); + }); + } + } +} + +TEST_F(LazyOpsTest, TestDivInPlaceWithRoundingMode) { + c10::optional rounding_modes[] = {"trunc", "floor", + c10::nullopt}; + for (const auto& rounding_mode : rounding_modes) { + for (torch::ScalarType scalar_type1 : {torch::kFloat}) { + torch::Tensor a = + isFloatingType(scalar_type1) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type1)) + : torch::randint(-100, 100, {3, 4}, + torch::TensorOptions(scalar_type1)); + for (torch::ScalarType scalar_type2 : {torch::kFloat}) { + torch::Tensor b = + isFloatingType(scalar_type2) + ? torch::rand({3, 4}, torch::TensorOptions(scalar_type2)) + : torch::randint(1, 100, {3, 4}, + torch::TensorOptions(scalar_type2)); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.div_(b, rounding_mode); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.div_(lazy_b, rounding_mode); + AllClose(c, lazy_c); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestDivScalar) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor a = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 1, 100, {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool is_float : {true, false}) { + torch::Scalar b = is_float ? torch::Scalar(3.0) : torch::Scalar(3); + torch::Tensor c = torch::div(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_c = torch::div(lazy_a, b); + AllClose(c, lazy_c); + }); + } + } +} + +TEST_F(LazyOpsTest, TestDivScalarInPlace) { + for (torch::ScalarType scalar_type : {torch::kFloat}) { + torch::Tensor a = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 1, 100, {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool is_float : {true, false}) { + torch::Scalar b = is_float ? 
torch::Scalar(3.0) : torch::Scalar(3); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.div_(b); + torch::Tensor lazy_c = lazy_a.div_(b); + AllClose(c, lazy_c); + }); + } + } +} + +TEST_F(LazyOpsTest, TestDivOut) { + for (torch::ScalarType scalar_type : {torch::kFloat, torch::kDouble}) { + torch::Tensor a = torch::rand( + {3, 4}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 4}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 4}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::div_out(c, a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::empty({3, 4}, lazy_b.options()); + torch::div_out(lazy_c, lazy_a, lazy_b); + AllClose(c, lazy_c); + }); + } +} + +TEST_F(LazyOpsTest, TestRsubScalar) { + torch::Tensor input = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar other(1.5); + torch::Scalar alpha(2.5); + torch::Tensor result = torch::rsub(input, other, alpha); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::rsub(lazy_input, other, alpha); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestNe) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::ne(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::ne(lazy_a, lazy_b); + AllEqual(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestNeInplace) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a_copy = a.clone(); + torch::Tensor b = a.clone(); + b[0] += 1; + a.ne_(b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + lazy_a.ne_(lazy_b); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestEq) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + torch::Tensor c = torch::eq(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::eq(lazy_a, lazy_b); + AllEqual(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestEqInplace) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + b[0] += 1; + torch::Tensor a_copy = a.clone(); + a.eq_(b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + lazy_a.eq_(lazy_b); + AllClose(lazy_a, a); + }); +} + +TEST_F(LazyOpsTest, TestGe) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + torch::Tensor c = torch::ge(a, b); + 
ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::ge(lazy_a, lazy_b); + AllEqual(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestGeInplace) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + b[0] += 1; + b[1] -= 1; + torch::Tensor a_copy = a.clone(); + a.ge_(b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + lazy_a.ge_(lazy_b); + AllClose(lazy_a, a); + }); +} + +TEST_F(LazyOpsTest, TestLe) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + torch::Tensor c = torch::le(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::le(lazy_a, lazy_b); + AllEqual(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestLeInplace) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + b[0] += 1; + b[1] -= 1; + torch::Tensor a_copy = a.clone(); + a.le_(b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + lazy_a.le_(lazy_b); + AllClose(lazy_a, a); + }); +} + +TEST_F(LazyOpsTest, TestGt) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::add(a.clone(), torch::ones_like(a)); + torch::Tensor c = torch::gt(b, a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::gt(lazy_b, lazy_a); + AllEqual(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestGtInplace) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + b[0] += 1; + b[1] -= 1; + torch::Tensor a_copy = a.clone(); + a.gt_(b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + lazy_a.gt_(lazy_b); + AllClose(lazy_a, a); + }); +} + +TEST_F(LazyOpsTest, TestLt) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::add(a.clone(), torch::ones_like(a)); + torch::Tensor c = torch::lt(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::lt(lazy_a, lazy_b); + AllEqual(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestLtInplace) { + torch::Tensor a = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.clone(); + b[0] += 1; + b[1] -= 1; + torch::Tensor a_copy = a.clone(); + a.lt_(b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + lazy_a.lt_(lazy_b); + AllClose(lazy_a, a); + }); +} + +TEST_F(LazyOpsTest, TestNeScalar) { + torch::Tensor input = torch::ones({2, 3}); + 
torch::Scalar other(float(0)); + torch::Tensor result = torch::ne(input, other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::ne(lazy_input, other); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestEqScalar) { + torch::Tensor input = torch::ones({2, 3}); + torch::Scalar other(float(1)); + torch::Tensor result = torch::eq(input, other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::eq(lazy_input, other); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestGeScalar) { + torch::Tensor input = torch::ones({2, 3}); + torch::Scalar other(float(1)); + torch::Tensor result = torch::ge(input, other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::ge(lazy_input, other); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestGeScalarInplace) { + torch::Tensor input = torch::arange( + -1., 1.5, 0.5, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar other(float(0)); + torch::Tensor input_copy = input.clone(); + input.ge_(other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_copy, device); + lazy_input.ge_(other); + AllClose(lazy_input, input); + }); +} + +TEST_F(LazyOpsTest, TestLeScalar) { + torch::Tensor input = torch::ones({2, 3}); + torch::Scalar other(float(1)); + torch::Tensor result = torch::le(input, other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::le(lazy_input, other); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestLeScalarInplace) { + torch::Tensor input = torch::arange( + -1., 1.5, 0.5, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar other(float(0)); + torch::Tensor input_copy = input.clone(); + input.le_(other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_copy, device); + lazy_input.le_(other); + AllClose(lazy_input, input); + }); +} + +TEST_F(LazyOpsTest, TestGtScalar) { + torch::Tensor input = torch::ones({2, 3}); + torch::Scalar other(float(0.5)); + torch::Tensor result = torch::gt(input, other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::gt(lazy_input, other); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestGtScalarInplace) { + torch::Tensor input = torch::arange( + -1., 1.5, 0.5, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar other(float(0)); + torch::Tensor input_copy = input.clone(); + input.gt_(other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_copy, device); + lazy_input.gt_(other); + AllClose(lazy_input, input); + }); +} + +TEST_F(LazyOpsTest, TestLtScalar) { + torch::Tensor input = torch::ones({2, 3}); + torch::Scalar other(float(1.5)); + torch::Tensor result = torch::lt(input, other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::lt(lazy_input, other); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, 
TestLtScalarInplace) { + torch::Tensor input = torch::arange( + -1., 1.5, 0.5, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar other(float(0)); + torch::Tensor input_copy = input.clone(); + input.lt_(other); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_copy, device); + lazy_input.lt_(other); + AllClose(lazy_input, input); + }); +} + +TEST_F(LazyOpsTest, TestIntegerAdd) { + std::vector types( + {torch::kByte, torch::kChar, torch::kShort, torch::kInt, torch::kLong}); + + ForEachDevice([&](const torch::Device& device) { + for (auto type : types) { + torch::Tensor a = + torch::randint(0, 63, {2, 2}, torch::TensorOptions(type)); + torch::Tensor b = + torch::randint(0, 63, {2, 2}, torch::TensorOptions(type)); + torch::Scalar one = + isIntegralType(type) ? torch::Scalar(1) : torch::Scalar(1.0); + torch::Tensor c = torch::add(b, one); + + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::add(lazy_b, one); + + AllEqual(c, lazy_c); + } + }); +} + +TEST_F(LazyOpsTest, TestSVD) { + static const int dims[] = {4, 7}; + for (auto m : dims) { + for (auto n : dims) { + torch::Tensor a = torch::rand( + {m, n}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + auto b = torch::svd(a, /*some=*/true, /*compute_uv=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::svd(lazy_a, /*some=*/true, /*compute_uv=*/true); + // The U and V matrices might have different sign for column vectors, so + // cannot be compared if not by absolute value. + AllClose(std::get<0>(b).abs(), std::get<0>(lazy_b).abs(), /*rtol=*/1e-3, + /*atol=*/1e-4); + torch::Tensor diag = std::get<1>(b); + torch::Tensor lazy_diag = std::get<1>(lazy_b); + ASSERT_EQ(diag.sizes(), lazy_diag.sizes()); + AllClose(diag, lazy_diag, /*rtol=*/1e-3, + /*atol=*/1e-4); + AllClose(std::get<2>(b).abs(), std::get<2>(lazy_b).abs(), /*rtol=*/1e-3, + /*atol=*/1e-4); + }); + } + } +} + +TEST_F(LazyOpsTest, TestQR) { + static const int dims[] = {4, 7}; + for (auto m : dims) { + for (auto n : dims) { + torch::Tensor a = torch::rand( + {m, n}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + auto b = torch::qr(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::qr(lazy_a); + AllClose(std::get<0>(b).abs(), std::get<0>(lazy_b).abs(), /*rtol=*/1e-3, + /*atol=*/1e-4); + AllClose(std::get<1>(b).abs(), std::get<1>(lazy_b).abs(), /*rtol=*/1e-3, + /*atol=*/1e-4); + }); + } + } +} + +TEST_F(LazyOpsTest, TestSymEig) { + static const int dims[] = {4, 7}; + for (auto m : dims) { + for (bool eigenvectors : {true, false}) { + for (bool upper : {true, false}) { + torch::Tensor a = torch::rand( + {m, m}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor sym_a = a.mm(a.t()); + auto b = torch::symeig(sym_a, eigenvectors, upper); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(sym_a, device); + auto lazy_b = torch::symeig(lazy_a, eigenvectors, upper); + AllClose(std::get<0>(b), std::get<0>(lazy_b), /*rtol=*/3e-2, + /*atol=*/1e-2); + if (eigenvectors) { + AllClose(std::get<1>(b).abs(), std::get<1>(lazy_b).abs(), + /*rtol=*/3e-2, + /*atol=*/1e-2); + } else { + EXPECT_EQ(std::get<1>(b).sizes(), std::get<1>(lazy_b).sizes()); + } + }); + } + } + } +} + +TEST_F(LazyOpsTest, 
TestCholesky) { + static const int dims[] = {4, 7}; + for (auto m : dims) { + for (bool upper : {true, false}) { + torch::Tensor a = torch::rand( + {3, m, m}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor pd_a = + torch::matmul(a, torch::transpose(a, 1, 2)) + + torch::eye( + m, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + auto b = torch::cholesky(pd_a, upper); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(pd_a, device); + auto lazy_b = torch::cholesky(lazy_a, upper); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-4); + }); + } + } +} + +TEST_F(LazyOpsTest, TestLogDet) { + static const int dims[] = {4, 7}; + for (auto m : dims) { + torch::Tensor a = torch::rand( + {3, m, m}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor pd_a = + torch::matmul(a, torch::transpose(a, 1, 2)) + + torch::eye(m, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::logdet(pd_a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(pd_a, device); + torch::Tensor lazy_b = torch::logdet(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-4); + }); + } +} + +TEST_F(LazyOpsTest, TestTriangularSolve) { + static const int dims[] = {4, 7}; + for (bool batched_a : {true, false}) { + for (bool batched_b : {true, false}) { + for (auto m : dims) { + for (auto n : dims) { + for (bool upper : {true, false}) { + for (bool transpose : {true, false}) { + for (bool unitriangular : {true, false}) { + torch::Tensor a = + torch::randn({m, m}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice())); + torch::Tensor b = + torch::randn({m, n}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice())); + a = batched_a ? a.expand({3, m, m}).clone() : a; + b = batched_b ? 
b.expand({3, m, n}).clone() : b; + auto result = torch::triangular_solve( + b, a, /*upper=*/upper, /*transpose=*/transpose, + /*unitriangular=*/unitriangular); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + auto lazy_result = torch::triangular_solve( + lazy_b, lazy_a, /*upper=*/upper, /*transpose=*/transpose, + /*unitriangular=*/unitriangular); + AllClose(std::get<0>(result), std::get<0>(lazy_result), + /*rtol=*/1e-3, /*atol=*/1e-4); + AllClose(std::get<1>(result), std::get<1>(lazy_result), + /*rtol=*/1e-3, /*atol=*/1e-4); + }); + } + } + } + } + } + } + } +} + +TEST_F(LazyOpsTest, TestKthValue) { + torch::Tensor a = torch::rand( + {4, 5, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int k = 1; k <= 3; ++k) { + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + for (bool keepdim : {false, true}) { + auto b = torch::kthvalue(a, k, dim, keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::kthvalue(lazy_a, k, dim, keepdim); + AllClose(std::get<0>(b), std::get<0>(lazy_b)); + AllEqual(std::get<1>(b), std::get<1>(lazy_b)); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestTopK) { + torch::Tensor a = torch::rand( + {4, 5, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int k = 1; k <= 3; ++k) { + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + for (bool largest : {false, true}) { + auto b = torch::topk(a, k, dim, largest, /*sorted=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::topk(lazy_a, k, dim, largest, /*sorted=*/true); + AllClose(std::get<0>(b), std::get<0>(lazy_b)); + AllEqual(std::get<1>(b), std::get<1>(lazy_b)); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestSort) { + torch::Tensor a = torch::rand( + {4, 5, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int k = 1; k <= 3; ++k) { + for (int dim = 0; dim < 3; ++dim) { + for (bool descending : {false, true}) { + auto b = torch::sort(a, dim, descending); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::sort(lazy_a, dim, descending); + AllClose(std::get<0>(b), std::get<0>(lazy_b)); + AllEqual(std::get<1>(b), std::get<1>(lazy_b)); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestSortDescWithMinValue) { + std::vector values{-128, 100}; + torch::Tensor input = + torch::tensor(values, torch::TensorOptions(torch::kChar)); + auto output = torch::sort(input, /*dim=*/0, /*descending=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + auto lazy_output = torch::sort(lazy_input, /*dim=*/0, /*descending=*/true); + AllEqual(std::get<0>(output), std::get<0>(lazy_output)); + AllEqual(std::get<1>(output), std::get<1>(lazy_output)); + }); +} + +TEST_F(LazyOpsTest, TestArgSort) { + torch::Tensor a = torch::rand( + {4, 5, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int k = 1; k <= 3; ++k) { + for (int dim = 0; dim < 3; ++dim) { + for (bool descending : {false, true}) { + torch::Tensor b = torch::argsort(a, dim, descending); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argsort(lazy_a, dim, descending); + AllEqual(b, lazy_b); + }); 
+ } + } + } +} + +TEST_F(LazyOpsTest, TestMin) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::min(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::min(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMax) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::max(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::max(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestUnaryMin) { + torch::Tensor input = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::min(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::min(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestUnaryMax) { + torch::Tensor input = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::max(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::max(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestAll) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor a = + isFloatingType(scalar_type) + ? 
torch::rand( + {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor b = torch::all(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::all(lazy_a); + EqualValues(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestAllDim) { + torch::Tensor a = torch::randint( + 0, 5, {2, 3, 4}, + torch::TensorOptions(torch::kByte).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::all(a, dim, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::all(lazy_a, dim, /*keepdim=*/false); + EqualValues(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestAllDimKeep) { + torch::Tensor a = torch::randint( + 0, 5, {2, 3, 4}, + torch::TensorOptions(torch::kByte).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::all(a, dim, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::all(lazy_a, dim, /*keepdim=*/true); + EqualValues(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestAmax) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (bool keepdim : {false, true}) { + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor values = torch::amax(input, {dim}, /*keepdim=*/keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_values = + torch::amax(lazy_input, {dim}, /*keepdim=*/keepdim); + AllClose(values, lazy_values); + }); + } + for (int dim1 = -rank; dim1 < rank; ++dim1) { + for (int dim2 = -rank; dim2 < rank; ++dim2) { + if ((dim1 == dim2) || (dim1 == rank + dim2) || (dim2 == rank + dim1)) + continue; + torch::Tensor values = + torch::amax(input, {dim1, dim2}, /*keepdim=*/keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_values = + torch::amax(lazy_input, {dim1, dim2}, /*keepdim=*/keepdim); + AllClose(values, lazy_values); + }); + } + } + } + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("xla::amax", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestAmin) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (bool keepdim : {false, true}) { + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor values = torch::amin(input, {dim}, /*keepdim=*/keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_values = + torch::amin(lazy_input, {dim}, /*keepdim=*/keepdim); + AllClose(values, lazy_values); + }); + } + for (int dim1 = -rank; dim1 < rank; ++dim1) { + for (int dim2 = -rank; dim2 < rank; ++dim2) { + if ((dim1 == dim2) || (dim1 == rank + dim2) || (dim2 == rank + dim1)) + continue; + torch::Tensor values = + torch::amin(input, {dim1, dim2}, /*keepdim=*/keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_values = 
+ torch::amin(lazy_input, {dim1, dim2}, /*keepdim=*/keepdim); + AllClose(values, lazy_values); + }); + } + } + } + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("xla::amin", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestAny) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor a = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor b = torch::any(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::any(lazy_a); + EqualValues(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestAnyDim) { + torch::Tensor a = torch::randint( + 0, 5, {2, 3, 4}, + torch::TensorOptions(torch::kByte).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::any(a, dim, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::any(lazy_a, dim, /*keepdim=*/false); + EqualValues(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestAnyDimKeep) { + torch::Tensor a = torch::randint( + 0, 5, {2, 3, 4}, + torch::TensorOptions(torch::kByte).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::any(a, dim, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::any(lazy_a, dim, /*keepdim=*/true); + EqualValues(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestMean) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::mean(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::mean(lazy_a); + ASSERT_EQ(b.sizes(), lazy_b.sizes()); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestMeanCast) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::mean(a, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::mean(lazy_a, torch::kDouble); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestMeanInDim) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::mean(a, {dim}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::mean(lazy_a, {dim}); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestMeanInDims) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + torch::Tensor b = torch::mean(a, dims); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::mean(lazy_a, dims); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, 
TestMeanInDimsKeepCast) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + torch::Tensor b = torch::mean(a, dims, true, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::mean(lazy_a, dims, true, torch::kDouble); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestMeanInDimOut) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::empty( + {4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::mean_out(b, a, {dim}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::empty({4, 4}, lazy_a.options()); + torch::mean_out(lazy_b, lazy_a, {dim}); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestStd) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto unbiased : {true, false}) { + torch::Tensor b = torch::std(a, unbiased); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::std(lazy_a, unbiased); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestStdInDim) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (auto unbiased : {true, false}) { + for (auto keepdim : {true, false}) { + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::std(a, {dim}, unbiased, keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::std(lazy_a, {dim}, unbiased, keepdim); + AllClose(b, lazy_b); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestStdWithCorrection) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // int rank = a.dim(); + c10::optional corrections[] = {1, 2, c10::nullopt}; + for (const auto& correction : corrections) { + for (auto keepdim : {true, false}) { + for (const auto& dim : + std::vector>{{0, 1}, {-3, -2}}) { + torch::Tensor b = torch::std(a, dim, correction, keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::std(lazy_a, dim, correction, keepdim); + AllClose(b, lazy_b); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestStdMeanWithCorrection) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // int rank = a.dim(); + c10::optional corrections[] = {1, 2, c10::nullopt}; + for (const auto& correction : corrections) { + for (auto keepdim : {true, false}) { + for (const auto& dim : + std::vector>{{0, 1}, {-3, -2}}) { + auto b = torch::std_mean(a, dim, correction, keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::std_mean(lazy_a, dim, correction, keepdim); + AllClose(std::get<0>(b), std::get<0>(lazy_b)); + AllClose(std::get<1>(b), std::get<1>(lazy_b)); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestSum) { + torch::Tensor a = torch::rand( + {4, 3, 4}, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::sum(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sum(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestSumCast) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::sum(a, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sum(lazy_a, torch::kDouble); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestSumU8) { + torch::Tensor a = torch::ones( + {256}, torch::TensorOptions(torch::kByte).device(DefaultDevice())); + torch::Tensor b = torch::sum(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sum(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestSumInDim) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::sum(a, {dim}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sum(lazy_a, {dim}); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestSumInDims) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + torch::Tensor b = torch::sum(a, dims); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sum(lazy_a, dims); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestSumInDimsKeep) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + torch::Tensor b = torch::sum(a, dims, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sum(lazy_a, dims, /*keepdim=*/true); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestSumInDimsKeepCast) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + torch::Tensor b = torch::sum(a, dims, /*keepdim=*/true, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = + torch::sum(lazy_a, dims, /*keepdim=*/true, torch::kDouble); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestVar) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (bool unbiased : {true, false}) { + torch::Tensor b = torch::var(a, unbiased); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::var(lazy_a, unbiased); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestVarWithDim) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + for (bool keepDim : {true, false}) { + for (bool unbiased : 
{true, false}) { + torch::Tensor b = torch::var(a, dims, unbiased, keepDim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::var(lazy_a, dims, unbiased, keepDim); + AllClose(b, lazy_b); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestVarWithCorrection) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + c10::optional corrections[] = {1, 2, c10::nullopt}; + for (const auto& dim : std::vector>{{0, 1}, {-3, -2}}) { + for (bool keepDim : {true, false}) { + for (const auto& correction : corrections) { + torch::Tensor b = torch::var(a, dim, correction, keepDim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::var(lazy_a, dim, correction, keepDim); + AllClose(b, lazy_b); + }); + } + } + } + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::var", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestVarMeanWithCorrection) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + c10::optional corrections[] = {1, 2, c10::nullopt}; + for (const auto& dim : std::vector>{{0, 1}, {-3, -2}}) { + for (const auto& correction : corrections) { + for (auto keepdim : {true, false}) { + auto b = torch::var_mean(a, dim, correction, keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_b = torch::var_mean(lazy_a, dim, correction, keepdim); + AllClose(std::get<0>(b), std::get<0>(lazy_b)); + AllClose(std::get<1>(b), std::get<1>(lazy_b)); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxInDim) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + for (bool keepdim : {false, true}) { + auto values_indices = torch::max(input, dim, /*keepdim=*/keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + auto lazy_values_indices = + torch::max(lazy_input, dim, /*keepdim=*/keepdim); + AllClose(std::get<0>(values_indices), std::get<0>(lazy_values_indices)); + AllEqual(std::get<1>(values_indices), std::get<1>(lazy_values_indices)); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMinInDim) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + for (bool keepdim : {false, true}) { + auto values_indices = torch::min(input, dim, /*keepdim=*/keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + auto lazy_values_indices = + torch::min(lazy_input, dim, /*keepdim=*/keepdim); + AllClose(std::get<0>(values_indices), std::get<0>(lazy_values_indices)); + AllEqual(std::get<1>(values_indices), std::get<1>(lazy_values_indices)); + }); + } + } +} + +TEST_F(LazyOpsTest, TestNorm) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::norm(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::norm(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestNormInDim) { + torch::Tensor a = torch::rand( + 
{4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::norm(a, 2, {dim}, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::norm(lazy_a, 2, {dim}, /*keepdim=*/false); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestNormInDims) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{1, 2}, {-2, -1}}) { + torch::Tensor b = torch::norm(a, 2, dims, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::norm(lazy_a, 2, dims, /*keepdim=*/false); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestNormInDimsKeep) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{1, 2}, {-2, -1}}) { + torch::Tensor b = torch::norm(a, 2, dims, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::norm(lazy_a, 2, dims, /*keepdim=*/true); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestNormalTwoTensor) { + at::Tensor mean = at::zeros({10, 10, 10}, at::dtype(at::kFloat)); + at::Tensor std = at::ones({10, 10, 10}, at::dtype(at::kFloat)); + ForEachDevice([&](const torch::Device& device) { + at::Tensor lazy_mean = CopyToDevice(mean, device); + at::Tensor lazy_std = CopyToDevice(std, device); + at::Tensor lazy_normal = at::normal(lazy_mean, lazy_std); + double res_mean = lazy_normal.mean().item().toDouble(); + double res_std = lazy_normal.std().item().toDouble(); + EXPECT_GT(res_mean, -0.06); + EXPECT_LT(res_mean, 0.06); + EXPECT_GT(res_std, 0.94); + EXPECT_LT(res_std, 1.06); + }); +} + +TEST_F(LazyOpsTest, TestNormalDoubleMean) { + at::Tensor std = at::ones({10, 10, 10}, at::dtype(at::kFloat)); + ForEachDevice([&](const torch::Device& device) { + at::Tensor lazy_std = CopyToDevice(std, device); + at::Tensor lazy_normal = at::normal(0, lazy_std); + double res_mean = lazy_normal.mean().item().toDouble(); + double res_std = lazy_normal.std().item().toDouble(); + EXPECT_GT(res_mean, -0.06); + EXPECT_LT(res_mean, 0.06); + EXPECT_GT(res_std, 0.94); + EXPECT_LT(res_std, 1.06); + }); +} + +TEST_F(LazyOpsTest, TestNormalDoubleStd) { + at::Tensor mean = at::zeros({10, 10, 10}, at::dtype(at::kFloat)); + ForEachDevice([&](const torch::Device& device) { + at::Tensor lazy_mean = CopyToDevice(mean, device); + at::Tensor lazy_normal = at::normal(lazy_mean, 1); + double res_mean = lazy_normal.mean().item().toDouble(); + double res_std = lazy_normal.std().item().toDouble(); + EXPECT_GT(res_mean, -0.06); + EXPECT_LT(res_mean, 0.06); + EXPECT_GT(res_std, 0.94); + EXPECT_LT(res_std, 1.06); + }); +} + +TEST_F(LazyOpsTest, TestNormalInPlace) { + at::Tensor a = at::zeros({10, 10, 10}, at::dtype(at::kFloat)); + ForEachDevice([&](const torch::Device& device) { + at::Tensor lazy_a = CopyToDevice(a, device); + lazy_a.normal_(/*mean=*/0, /*std=*/1); + double res_mean = lazy_a.mean().item().toDouble(); + double res_std = lazy_a.std().item().toDouble(); + EXPECT_GT(res_mean, -0.06); + EXPECT_LT(res_mean, 0.06); + EXPECT_GT(res_std, 0.94); + EXPECT_LT(res_std, 1.06); + }); +} + +TEST_F(LazyOpsTest, TestUniformInPlace) { + const double eps = 1e-3; + at::Tensor a = 
at::zeros({10, 10, 10}, at::dtype(at::kFloat)); + ForEachDevice([&](const torch::Device& device) { + at::Tensor lazy_a = CopyToDevice(a, device); + lazy_a.uniform_(/*from=*/0, /*to=*/1); + at::Tensor cpu_a = ToCpuTensor(lazy_a); + double res_min = cpu_a.min().item().toDouble(); + double res_max = cpu_a.max().item().toDouble(); + EXPECT_GT(res_min, 0.0 - eps); + EXPECT_LT(res_max, 1.0 + eps); + }); +} + +TEST_F(LazyOpsTest, TestRandomInPlace) { + for (auto dtype : {torch::kFloat, torch::kDouble, torch::kByte, torch::kChar, + torch::kShort, torch::kInt, torch::kLong}) { + const double eps = 0.2; + torch::Tensor a = torch::zeros({10, 10, 10}, torch::TensorOptions(dtype)); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + lazy_a.random_(/*from=*/0, /*to=*/10); + double res_mean = lazy_a.sum().item().toDouble() / a.numel(); + double res_min = lazy_a.min().item().toDouble(); + double res_max = lazy_a.max().item().toDouble(); + EXPECT_GT(res_mean, 4.5 - eps); + EXPECT_LT(res_mean, 4.5 + eps); + EXPECT_EQ(res_min, 0.0); + EXPECT_EQ(res_max, 9.0); + }); + } +} + +TEST_F(LazyOpsTest, TestRandomInPlaceDefaultFrom) { + for (auto dtype : {torch::kFloat, torch::kDouble, torch::kByte, torch::kChar, + torch::kShort, torch::kInt, torch::kLong}) { + const double eps = 0.2; + torch::Tensor a = torch::zeros({10, 10, 10}, torch::TensorOptions(dtype)); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + lazy_a.random_(/*to=*/10); + double res_mean = lazy_a.sum().item().toDouble() / a.numel(); + double res_min = lazy_a.min().item().toDouble(); + double res_max = lazy_a.max().item().toDouble(); + EXPECT_GT(res_mean, 4.5 - eps); + EXPECT_LT(res_mean, 4.5 + eps); + EXPECT_EQ(res_min, 0.0); + EXPECT_EQ(res_max, 9.0); + }); + } +} + +TEST_F(LazyOpsTest, TestRandomInPlaceDefault) { + for (auto dtype : {torch::kFloat, torch::kDouble, torch::kByte, torch::kChar, + torch::kShort, torch::kInt, torch::kLong}) { + auto input = torch::zeros({10}, torch::TensorOptions(dtype)); + ForEachDevice([&](const torch::Device& device) { + auto lazyInput = CopyToDevice(input, device); + lazyInput.random_(); + auto output = ToCpuTensor(lazyInput); + EXPECT_TRUE(torch::all(output.ne(input)).item()); + }); + } +} + +TEST_F(LazyOpsTest, TestNormGeneral) { + torch::Tensor a = torch::randn( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::norm(a, 3.5); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::norm(lazy_a, 3.5); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestNormNuclear) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::norm(a, 1); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::norm(lazy_a, 1); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestFrobeniusNorm) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::frobenius_norm(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::frobenius_norm(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestFrobeniusNormInDim) { + torch::Tensor a = torch::rand( + {4, 3, 4}, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+ for (int dim : {1, -2}) {
+ torch::Tensor b = torch::frobenius_norm(a, {dim}, /*keepdim=*/false);
+ ForEachDevice([&](const torch::Device& device) {
+ torch::Tensor lazy_a = CopyToDevice(a, device);
+ torch::Tensor lazy_b =
+ torch::frobenius_norm(lazy_a, {dim}, /*keepdim=*/false);
+ AllClose(b, lazy_b);
+ });
+ }
+}
+
+TEST_F(LazyOpsTest, TestFrobeniusNormInDims) {
+ torch::Tensor a = torch::rand(
+ {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+ for (auto dims : std::vector<std::vector<int64_t>>{{1, 2}, {-2, -1}}) {
+ torch::Tensor b = torch::frobenius_norm(a, dims, /*keepdim=*/false);
+ ForEachDevice([&](const torch::Device& device) {
+ torch::Tensor lazy_a = CopyToDevice(a, device);
+ torch::Tensor lazy_b =
+ torch::frobenius_norm(lazy_a, dims, /*keepdim=*/false);
+ AllClose(b, lazy_b);
+ });
+ }
+}
+
+TEST_F(LazyOpsTest, TestGroupNorm) {
+ int num_channels = 6;
+ torch::Tensor input =
+ torch::rand({20, num_channels, 10, 10},
+ torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+ torch::Tensor weight =
+ torch::rand({num_channels},
+ torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+ torch::Tensor bias =
+ torch::rand({num_channels},
+ torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+ double eps = 1e-05;
+ for (int num_groups : {3, 6, 1}) {
+ torch::Tensor output =
+ torch::group_norm(input, num_groups, weight, bias, eps,
+ /*cudnn_enabled=*/false);
+ ForEachDevice([&](const torch::Device& device) {
+ torch::Tensor lazy_input = CopyToDevice(input, device);
+ torch::Tensor lazy_weight = CopyToDevice(weight, device);
+ torch::Tensor lazy_bias = CopyToDevice(bias, device);
+ torch::Tensor lazy_output =
+ torch::group_norm(lazy_input, num_groups, lazy_weight, lazy_bias, eps,
+ /*cudnn_enabled=*/false);
+ AllClose(output, lazy_output, /*rtol=*/1e-3, /*atol=*/1e-5);
+ });
+ }
+}
+
+TEST_F(LazyOpsTest, TestGroupNormBackward) {
+ int num_channels = 6;
+ torch::Tensor input =
+ torch::rand({2, num_channels, 5, 5}, torch::TensorOptions(torch::kFloat)
+ .device(DefaultDevice())
+ .requires_grad(true));
+ torch::Tensor weight =
+ torch::rand({num_channels}, torch::TensorOptions(torch::kFloat)
+ .device(DefaultDevice())
+ .requires_grad(true));
+ torch::Tensor bias =
+ torch::rand({num_channels}, torch::TensorOptions(torch::kFloat)
+ .device(DefaultDevice())
+ .requires_grad(true));
+ double eps = 1e-05;
+ for (bool undef_weight : {true, false}) {
+ for (int num_groups : {3, 6, 1}) {
+ auto testfn =
+ [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor {
+ return torch::group_norm(
+ /*input=*/inputs[0], num_groups, inputs[1], inputs[2],
+ /*eps=*/eps,
+ /*cudnn_enabled=*/false);
+ };
+ torch::Tensor undef;
+ ForEachDevice([&](const torch::Device& device) {
+ TestBackward(
+ {input, undef_weight ? undef : weight, undef_weight ?
undef : bias}, + device, testfn, + /*rtol=*/1e-3, /*atol=*/1e-3, + /*derivative_level=*/2); + }); + } + } +} + +TEST_F(LazyOpsTest, TestInstanceNorm) { + int batch = 5; + int num_channels = 20; + torch::Tensor input = + torch::rand({batch, num_channels, 10, 10}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = + torch::rand({num_channels}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = + torch::rand({num_channels}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_mean = + torch::zeros({num_channels}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_var = + torch::ones({num_channels}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + double momentum = 0.1; + double eps = 1e-05; + torch::Tensor output = torch::instance_norm( + input, weight, bias, running_mean, running_var, + /*use_input_stats=*/true, momentum, eps, /*cudnn_enabled=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_bias = CopyToDevice(bias, device); + torch::Tensor lazy_running_mean = CopyToDevice(running_mean, device); + torch::Tensor lazy_running_var = CopyToDevice(running_var, device); + torch::Tensor lazy_output = torch::instance_norm( + lazy_input, lazy_weight, lazy_bias, lazy_running_mean, lazy_running_var, + /*use_input_stats=*/true, momentum, eps, /*cudnn_enabled=*/false); + AllClose(output, lazy_output, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLayerNorm) { + torch::Tensor input = + torch::rand({20, 10, 10, 10}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + double eps = 1e-05; + torch::Tensor undef; + for (bool undef_weight : {true, false}) { + for (int64_t normalized_size : {2, 3}) { + std::vector normalized_shape(normalized_size, 10); + torch::Tensor weight = torch::rand( + normalized_shape, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = torch::rand( + normalized_shape, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::layer_norm(input, normalized_shape, + undef_weight ? undef : weight, + undef_weight ? undef : bias, eps, + /*cudnn_enabled=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = + undef_weight ? undef : CopyToDevice(weight, device); + torch::Tensor lazy_bias = + undef_weight ? 
undef : CopyToDevice(bias, device); + torch::Tensor lazy_output = torch::layer_norm( + lazy_input, normalized_shape, lazy_weight, lazy_bias, eps, + /*cudnn_enabled=*/false); + AllClose(output, lazy_output, /*rtol=*/1e-3, /*atol=*/1e-5); + }); + } + } +} + +TEST_F(LazyOpsTest, TestLayerNormBackward) { + torch::Tensor input = + torch::rand({2, 3, 3, 3}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + double eps = 1e-05; + for (bool undef_weight : {true, false}) { + for (int64_t normalized_size : {2, 3}) { + std::vector normalized_shape(normalized_size, 3); + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::layer_norm( + /*input=*/inputs[0], normalized_shape, inputs[1], inputs[2], + /*eps=*/eps, + /*cudnn_enabled=*/false); + }; + torch::Tensor weight = + torch::rand(normalized_shape, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor bias = + torch::rand(normalized_shape, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor undef; + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {input, undef_weight ? undef : weight, undef_weight ? undef : bias}, + device, testfn, + /*rtol=*/1e-3, /*atol=*/1e-4, /*derivative_level=*/2); + }); + } + } +} + +TEST_F(LazyOpsTest, TestNuclearNorm) { + torch::Tensor a = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::nuclear_norm(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::nuclear_norm(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestPairwiseDistance) { + torch::Tensor x1 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor x2 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + double eps = 1e-6; + for (bool keepdim : {false, true}) { + for (double p : {1, 2, 3, 4}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = + torch::pairwise_distance(x1, x2, p, eps, keepdim); + torch::Tensor lazy_x1 = CopyToDevice(x1, device); + torch::Tensor lazy_x2 = CopyToDevice(x2, device); + torch::Tensor lazy_output = + torch::pairwise_distance(lazy_x1, lazy_x2, p, eps, keepdim); + AllClose(output, lazy_output, /*rtol=*/1e-5, /*atol=*/1e-5); + }); + } + } +} + +TEST_F(LazyOpsTest, TestCosineSimilarity) { + torch::Tensor x1 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor x2 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + double eps = 1e-8; + int rank = x1.dim(); + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::cosine_similarity(x1, x2, dim, eps); + torch::Tensor lazy_x1 = CopyToDevice(x1, device); + torch::Tensor lazy_x2 = CopyToDevice(x2, device); + torch::Tensor lazy_output = + torch::cosine_similarity(lazy_x1, lazy_x2, dim, eps); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestCosineEmbeddingLoss) { + torch::Tensor input1 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor input2 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::rand( + {4}, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum}) { + for (double margin : {0., 0.2}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::cosine_embedding_loss( + input1, input2, target, margin, reduction); + torch::Tensor lazy_input1 = CopyToDevice(input1, device); + torch::Tensor lazy_input2 = CopyToDevice(input2, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = torch::cosine_embedding_loss( + lazy_input1, lazy_input2, lazy_target, margin, reduction); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestHingeEmbeddingLoss) { + torch::Tensor input = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum}) { + for (double margin : {0., 0.2}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = + torch::hinge_embedding_loss(input, target, margin, reduction); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = torch::hinge_embedding_loss( + lazy_input, lazy_target, margin, reduction); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestTripletMarginLoss) { + torch::Tensor anchor = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor positive = torch::abs(torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Tensor negative = torch::neg(torch::abs(torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())))); + double eps = 1e-6; + for (double margin : {0., 0.2}) { + for (double p : {1, 2, 3, 4}) { + for (bool swap : {false, true}) { + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::triplet_margin_loss( + anchor, positive, negative, margin, p, eps, swap, reduction); + torch::Tensor lazy_anchor = CopyToDevice(anchor, device); + torch::Tensor lazy_positive = CopyToDevice(positive, device); + torch::Tensor lazy_negative = CopyToDevice(negative, device); + torch::Tensor lazy_output = torch::triplet_margin_loss( + lazy_anchor, lazy_positive, lazy_negative, margin, p, eps, swap, + reduction); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestBinaryCrossEntropy) { + int batch = 10; + int classes = 5; + torch::Tensor input = + torch::rand({batch, classes}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = + torch::rand({batch, classes}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = + torch::rand({batch, classes}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor undef; + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + for (bool undef_weight : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::binary_cross_entropy( + input, target, undef_weight ? 
undef : weight, reduction); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_weight = + undef_weight ? undef : CopyToDevice(weight, device); + torch::Tensor lazy_output = torch::binary_cross_entropy( + lazy_input, lazy_target, lazy_weight, reduction); + AllClose(output, lazy_output, /*rtol=*/1e-4, /*atol=*/1e-5); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMarginRankingLoss) { + torch::Tensor input1 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor input2 = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum}) { + for (double margin : {0., 0.2}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::margin_ranking_loss( + input1, input2, target, margin, reduction); + torch::Tensor lazy_input1 = CopyToDevice(input1, device); + torch::Tensor lazy_input2 = CopyToDevice(input2, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = torch::margin_ranking_loss( + lazy_input1, lazy_input2, lazy_target, margin, reduction); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestBCEWithLogits) { + int batch = 10; + int classes = 5; + torch::Tensor input = + torch::rand({batch, classes}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = + torch::rand({batch, classes}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = torch::rand( + {classes}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor pos_weight = torch::rand( + {classes}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor undef; + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum}) { + for (bool undef_weight : {false, true}) { + for (bool undef_pos_weight : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::binary_cross_entropy_with_logits( + input, target, undef_weight ? undef : weight, + undef_pos_weight ? undef : pos_weight, reduction); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_weight = + undef_weight ? undef : CopyToDevice(weight, device); + torch::Tensor lazy_pos_weight = + undef_pos_weight ? 
undef : CopyToDevice(pos_weight, device); + torch::Tensor lazy_output = torch::binary_cross_entropy_with_logits( + lazy_input, lazy_target, lazy_weight, lazy_pos_weight, reduction); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestKlDiv) { + torch::Tensor input = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (bool log_target : {true, false}) { + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = + torch::kl_div(input, target, reduction, log_target); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = + torch::kl_div(lazy_input, lazy_target, reduction, log_target); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestProd) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::prod(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::prod(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestProdCast) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::prod(a, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::prod(lazy_a, torch::kDouble); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestProdInDim) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::prod(a, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::prod(lazy_a, dim); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestProdInDimKeepCast) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::prod(a, dim, /*keepdim=*/true, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = + torch::prod(lazy_a, dim, /*keepdim=*/true, torch::kDouble); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestProdInDimKeep) { + torch::Tensor a = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor b = torch::prod(a, dim, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::prod(lazy_a, dim, /*keepdim=*/true); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestCumSum) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumsum(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = 
CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumsum(lazy_input, dim); + AllClose(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumSumCast) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumsum(input, dim, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumsum(lazy_input, dim, torch::kDouble); + AllClose(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumSumLong) { + torch::Tensor input = torch::randint( + 1000, {4, 3, 4}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumsum(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumsum(lazy_input, dim); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumSumCastLong) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumsum(input, dim, torch::kLong); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumsum(lazy_input, dim, torch::kLong); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumProd) { + torch::Tensor input = torch::rand( + {4, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumprod(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumprod(lazy_input, dim); + AllClose(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumProdCast) { + torch::Tensor input = torch::mul( + torch::rand({4, 3, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())), + 10); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumprod(input, dim, torch::kDouble); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumprod(lazy_input, dim, torch::kDouble); + AllClose(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumProdLong) { + torch::Tensor input = torch::randint( + 7, {2, 3}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumsum(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumsum(lazy_input, dim); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestCumProdCastLong) { + torch::Tensor input = + torch::rand({2, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 7; + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::cumsum(input, dim, torch::kLong); + ForEachDevice([&](const 
torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::cumsum(lazy_input, dim, torch::kLong); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestArgMin) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::argmin(a, c10::nullopt, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmin(lazy_a, c10::nullopt, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestArgMinDim) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::argmin(a, dim, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmin(lazy_a, dim, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestArgMinDimKeep) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::argmin(a, dim, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmin(lazy_a, dim, /*keepdim=*/true); + AllEqual(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestArgMinSameValue) { + torch::Tensor a = torch::ones( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::argmin(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmin(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestArgMinWrapper) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::argmin(a, dim, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmin(lazy_a, dim, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestArgMax) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::argmax(a, c10::nullopt, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmax(lazy_a, c10::nullopt, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestArgMaxDim) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::argmax(a, dim, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmax(lazy_a, dim, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestArgMaxDimKeep) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::argmax(a, dim, /*keepdim=*/true); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = 
CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmax(lazy_a, dim, /*keepdim=*/true); + AllEqual(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestArgMaxSameValue) { + torch::Tensor a = torch::ones( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::argmax(a, c10::nullopt, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmax(lazy_a, c10::nullopt, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestArgMaxWrapper) { + torch::Tensor a = torch::rand( + {4, 4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor b = torch::argmax(a, dim, /*keepdim=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::argmax(lazy_a, dim, /*keepdim=*/false); + AllEqual(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestAsin) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::asin(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::asin(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAsinh) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::asinh(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::asinh(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAsinhInPlace) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::asinh_(a); + torch::Tensor lazy_b = torch::asinh_(lazy_a); + AllClose(a, lazy_a, /*rtol=*/1e-3, /*atol=*/1e-5); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestSin) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::sin(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sin(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestSinh) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::sinh(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sinh(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAcos) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::acos(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::acos(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAcosh) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100; + 
torch::Tensor b = torch::acosh(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::acosh(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAcoshInPlace) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::acosh_(a); + torch::Tensor lazy_b = torch::acosh_(lazy_a); + AllClose(a, lazy_a, /*rtol=*/1e-3, /*atol=*/1e-5); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestCos) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::cos(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::cos(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestCosh) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::cosh(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::cosh(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAtan) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::atan(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::atan(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAtanh) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::atanh(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::atanh(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAtanhInPlace) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::atanh_(a); + torch::Tensor lazy_b = torch::atanh_(lazy_a); + AllClose(a, lazy_a, /*rtol=*/1e-3, /*atol=*/1e-5); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestAtan2) { + torch::Tensor a = torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::atan2(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::atan2(lazy_a, lazy_b); + AllClose(c, lazy_c, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestTan) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::tan(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = 
torch::tan(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestTanh) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::tanh(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::tanh(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestClampMinMax) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar min_val(0.311); + torch::Scalar max_val(0.409); + torch::Tensor b = torch::clamp(a, min_val, max_val); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::clamp(lazy_a, min_val, max_val); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestClampMin) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar min_val(0.311); + torch::Tensor b = torch::clamp(a, min_val, c10::nullopt); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::clamp(lazy_a, min_val, c10::nullopt); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestClampMax) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar max_val(0.409); + torch::Tensor b = torch::clamp(a, c10::nullopt, max_val); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::clamp(lazy_a, c10::nullopt, max_val); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestClampMinExplicit) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar min_val(0.311); + torch::Tensor b = torch::clamp_min(a, min_val); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::clamp_min(lazy_a, min_val); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestClampMaxExplicit) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar max_val(0.409); + torch::Tensor b = torch::clamp_max(a, max_val); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::clamp_max(lazy_a, max_val); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestClampMinExplicitInPlace) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar min_val(0.311); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::clamp_min_(a, min_val); + torch::Tensor lazy_b = torch::clamp_min_(lazy_a, min_val); + AllClose(a, lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestClampMaxExplicitInPlace) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar max_val(0.409); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = torch::clamp_max_(a, max_val); + torch::Tensor lazy_b = torch::clamp_max_(lazy_a, max_val); + AllClose(a, lazy_a); + 
AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestCeil) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::ceil(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::ceil(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestFloor) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::floor(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::floor(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestRound) { + torch::Tensor a = torch::cat( + {torch::randn( + {8}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0, + // Special case: 0.5, -0.5. lazy::Round impl rounds to -1/1 whereas + // lazy::RoundToEven properly implements bankers rounding. + torch::tensor( + {-0.5, 0.5}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice()))}, + 0); + torch::Tensor b = torch::round(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::round(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestTrunc) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::trunc(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::trunc(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestFrac) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = torch::frac(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::frac(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestNeg) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::neg(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::neg(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseNot) { + std::vector types( + {torch::kByte, torch::kChar, torch::kShort, torch::kInt, torch::kLong}); + + ForEachDevice([&](const torch::Device& device) { + for (auto type : types) { + torch::Tensor a = + torch::randint(0, 63, {2, 2}, torch::TensorOptions(type)); + torch::Tensor b = torch::bitwise_not(a); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::bitwise_not(lazy_a); + AllEqual(b, lazy_b); + } + }); +} + +TEST_F(LazyOpsTest, TestBitwiseNotInPlace) { + std::vector types( + {torch::kByte, torch::kChar, torch::kShort, torch::kInt, torch::kLong}); + + ForEachDevice([&](const torch::Device& device) { + for (auto type : types) { + torch::Tensor a = + torch::randint(0, 63, {2, 2}, torch::TensorOptions(type)); + torch::Tensor lazy_a = CopyToDevice(a, device); + a.bitwise_not_(); + lazy_a.bitwise_not_(); + AllEqual(a, lazy_a); + } + }); +} + +TEST_F(LazyOpsTest, TestSign) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + 
torch::Tensor b = torch::sign(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sign(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestSignByte) { + torch::Tensor a = torch::randint( + 256, {2, 2}, torch::TensorOptions(torch::kByte).device(DefaultDevice())); + torch::Tensor b = torch::sign(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sign(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestAbs) { + torch::Tensor a = torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::abs(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::abs(lazy_a); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestAbsByte) { + torch::Tensor a = torch::randint( + 256, {2, 2}, torch::TensorOptions(torch::kByte).device(DefaultDevice())); + torch::Tensor b = torch::abs(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::abs(lazy_a); + AllEqual(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestEmptyLike) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::empty_like(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::empty_like(lazy_a); + EXPECT_EQ(b.sizes(), lazy_b.sizes()); + }); +} + +TEST_F(LazyOpsTest, TestEmptyLikeOptions) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::empty_like( + a, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::empty_like( + lazy_a, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + EXPECT_EQ(b.sizes(), lazy_b.sizes()); + }); +} + +TEST_F(LazyOpsTest, TestEmpty) { + torch::Tensor a = torch::zeros( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = torch::empty( + {2, 2}, torch::TensorOptions(torch::kFloat).device(device)); + EXPECT_EQ(a.sizes(), lazy_a.sizes()); + }); +} + +TEST_F(LazyOpsTest, TestZeroInPlace) { + torch::Tensor input = torch::ones( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazyInput = CopyToDevice(input, device); + auto& output = torch::zero_(input); + auto& lazyOutput = torch::zero_(lazyInput); + AllClose(output, lazyOutput); + }); +} + +TEST_F(LazyOpsTest, TestZerosLike) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::zeros_like(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::zeros_like(lazy_a); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestZerosLikeOptions) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::zeros_like( + a, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::zeros_like( + lazy_a, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestZeros) { + torch::Tensor a = torch::zeros( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = torch::zeros( + {2, 2}, torch::TensorOptions(torch::kFloat).device(device)); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestOnes) { + torch::Tensor a = torch::ones( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = + torch::ones({2, 2}, torch::TensorOptions(torch::kFloat).device(device)); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestOnesLike) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::ones_like(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::ones_like(lazy_a); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestOnesLikeOptions) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::ones_like( + a, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::ones_like( + lazy_a, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestFull) { + torch::Tensor a = + torch::full({2, 2}, 3.1165, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = torch::full( + {2, 2}, 3.1165, torch::TensorOptions(torch::kFloat).device(device)); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestFullLike) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::full_like(a, 3.1165); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::full_like(lazy_a, 3.1165); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestFullLikeOptions) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::full_like( + a, 3.1165, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::full_like( + lazy_a, 3.1165, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestARange) { + for (auto& ranges : std::vector>{{0.0, 100.0, 0.5}, + {0.0, -100.0, -0.5}}) { + torch::Tensor a = torch::arange( + ranges[0], ranges[1], ranges[2], + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = + torch::arange(ranges[0], ranges[1], ranges[2], + torch::TensorOptions(torch::kFloat).device(device)); + AllClose(a, lazy_a); + }); + } +} + 
+TEST_F(LazyOpsTest, TestARangeOut) { + torch::Tensor a = torch::randn( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto& ranges : std::vector>{{0.0, 100.0, 0.5}, + {0.0, -100.0, -0.5}}) { + torch::Tensor b = torch::arange_out(a, ranges[0], ranges[1], ranges[2]); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = + torch::arange_out(lazy_a, ranges[0], ranges[1], ranges[2]); + AllClose(b, lazy_b); + }); + } +} + +TEST_F(LazyOpsTest, TestDimARange) { + torch::Tensor like = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a = torch::_dim_arange(like, 1); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_like = CopyToDevice(like, device); + torch::Tensor lazy_a = torch::_dim_arange(lazy_like, 1); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestBartlettWindow) { + int window_length = 10; + for (bool periodic : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::bartlett_window( + window_length, periodic, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + + torch::Tensor lazy_output = torch::bartlett_window( + window_length, periodic, + torch::TensorOptions(torch::kFloat).device(device)); + AllClose(output, lazy_output, /*rtol=*/1e-5, /*atol=*/1e-7); + }); + } +} + +TEST_F(LazyOpsTest, TestBlackmanWindow) { + int window_length = 10; + for (bool periodic : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::blackman_window( + window_length, periodic, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_output = torch::blackman_window( + window_length, periodic, + torch::TensorOptions(torch::kFloat).device(device)); + AllClose(output, lazy_output, /*rtol=*/1e-5, /*atol=*/1e-7); + }); + } +} + +TEST_F(LazyOpsTest, TestHammingWindow) { + double alpha = 0.54; + double beta = 0.46; + int window_length = 10; + for (bool periodic : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::hamming_window( + window_length, periodic, alpha, beta, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_output = torch::hamming_window( + window_length, periodic, alpha, beta, + torch::TensorOptions(torch::kFloat).device(device)); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestHannWindow) { + int window_length = 10; + for (bool periodic : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor output = torch::hann_window( + window_length, periodic, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_output = torch::hann_window( + window_length, periodic, + torch::TensorOptions(torch::kFloat).device(device)); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestLogSigmoid) { + torch::Tensor a = torch::empty( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + a.uniform_(-1.0, 1.0); + torch::Tensor b = torch::log_sigmoid(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::log_sigmoid(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLogSigmoidForward) { + torch::Tensor a = torch::empty( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); 
+ a.uniform_(-1.0, 1.0); + auto tuple = torch::log_sigmoid_forward(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + auto lazy_tuple = torch::log_sigmoid_forward(lazy_a); + AllClose(std::get<0>(tuple), std::get<0>(lazy_tuple), + /*rtol=*/1e-3, /*atol=*/1e-5); + AllClose(std::get<1>(tuple), std::get<1>(lazy_tuple), + /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLogsumexp) { + torch::Tensor a = torch::rand( + {3, 4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (auto dims : std::vector>{{0, 1}, {-3, -2}}) { + for (bool keepdim : {false, true}) { + torch::Tensor b = torch::logsumexp(a, dims, keepdim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::logsumexp(lazy_a, dims, keepdim); + AllClose(b, lazy_b); + }); + } + } +} + +TEST_F(LazyOpsTest, TestSiLU) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::silu(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::silu(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); + ExpectCounterChanged("lazy::silu_out", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestSigmoid) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::sigmoid(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sigmoid(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestMatmul_1x1) { + torch::Tensor a = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::matmul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::matmul(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMatmul_2x1) { + torch::Tensor a = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::matmul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::matmul(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMatmul_1x2) { + torch::Tensor a = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::matmul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::matmul(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMatmul_2x2) { + torch::Tensor a = torch::rand( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4, 3}, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::matmul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::matmul(lazy_a, lazy_b); + AllClose(c, lazy_c, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestMatmulBcast) { + torch::Tensor a = + torch::rand({4, 2, 3, 2, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = + torch::rand({2, 1, 4, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::matmul(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::matmul(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestDot) { + torch::Tensor a = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::dot(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::dot(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestTensorDot) { + torch::Tensor a = torch::rand( + {6, 4, 8}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4, 7, 8}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector<int64_t> dims_a = {1, 2}; + std::vector<int64_t> dims_b = {0, 2}; + torch::Tensor c = torch::tensordot(a, b, dims_a, dims_b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::tensordot(lazy_a, lazy_b, dims_a, dims_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestGer) { + torch::Tensor a = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::ger(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::ger(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMv) { + torch::Tensor a = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::mv(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::mv(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestMvOut) { + torch::Tensor a = torch::rand( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::mv_out(c, a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = 
CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::empty({4}, lazy_b.options()); + torch::mv_out(lazy_c, lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestBatchAddBatchMatMul) { + torch::Tensor a = torch::rand( + {3, 6, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 6, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {3, 4, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar alpha = 0.5; + torch::Scalar beta = 1.5; + torch::Tensor d = torch::baddbmm(a, b, c, beta, alpha); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::baddbmm(lazy_a, lazy_b, lazy_c, beta, alpha); + AllClose(d, lazy_d, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestBatchAddBatchMatMulInPlace) { + torch::Tensor a = torch::rand( + {3, 6, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 6, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {3, 4, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar alpha = 0.5; + torch::Scalar beta = 1.5; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor d = a.baddbmm_(b, c, beta, alpha); + torch::Tensor lazy_d = lazy_a.baddbmm_(lazy_b, lazy_c, beta, alpha); + AllClose(d, lazy_d, /*rtol=*/1e-3, /*atol=*/1e-4); + AllClose(a, lazy_a, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestBatchMatMul) { + torch::Tensor a = torch::rand( + {3, 6, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 4, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::bmm(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::bmm(lazy_a, lazy_b); + AllClose(c, lazy_c, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestChainMatMul) { + torch::Tensor a = torch::rand( + {5, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {4, 6}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {6, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor d = torch::rand( + {2, 7}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor result = torch::chain_matmul({a, b, c, d}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = CopyToDevice(d, device); + torch::Tensor lazy_result = + torch::chain_matmul({lazy_a, lazy_b, lazy_c, lazy_d}); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestLinear) { + torch::Tensor input = torch::rand( + {2, 4}, 
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = torch::rand( + {3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor result = torch::linear(input, weight); + torch::Tensor result_with_bias = torch::linear(input, weight, bias); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_bias = CopyToDevice(bias, device); + torch::Tensor lazy_result = torch::linear(lazy_input, lazy_weight); + torch::Tensor lazy_result_with_bias = + torch::linear(lazy_input, lazy_weight, lazy_bias); + AllClose(result, lazy_result, /*rtol=*/1e-2, /*atol=*/1e-4); + AllClose(result_with_bias, lazy_result_with_bias, /*rtol=*/1e-2, + /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestPinverse) { + torch::Tensor input = torch::rand( + {4, 6}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor result = torch::pinverse(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::pinverse(lazy_input); + AllClose(result, lazy_result, /*rtol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestEinsumOuter) { + torch::Tensor a = torch::rand( + {5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = "i,j->ij"; + torch::Tensor c = torch::einsum(equation, {a, b}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::einsum(equation, {lazy_a, lazy_b}); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestEinsumOuterBackward) { + torch::Tensor a = torch::rand({5}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor b = torch::rand({5}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + std::string equation = "i,j->ij"; + auto testfn = [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::einsum(equation, inputs); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({a, b}, device, testfn, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestEinsumBatchMatMul) { + torch::Tensor a = torch::rand( + {3, 2, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 5, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = "bij,bjk->bik"; + torch::Tensor c = torch::einsum(equation, {a, b}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::einsum(equation, {lazy_a, lazy_b}); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestEinsumPyTorchLowerBilinear) { + torch::Tensor a = torch::rand( + {3, 5, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor l = torch::rand( + {2, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor r = torch::rand( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = 
"bn,anm,bm->ba"; + torch::Tensor c = torch::einsum(equation, {l, a, r}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_l = CopyToDevice(l, device); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_r = CopyToDevice(r, device); + torch::Tensor lazy_c = torch::einsum(equation, {lazy_l, lazy_a, lazy_r}); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestEinsumPyTorchLowerDiagonal) { + torch::Tensor input = torch::rand( + {3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = "ii->i"; + torch::Tensor result = torch::einsum(equation, {input}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::einsum(equation, {lazy_input}); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestEinsumPyTorchLowerBatchDiagonal) { + torch::Tensor input = torch::rand( + {4, 3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = "...ii->...i"; + torch::Tensor result = torch::einsum(equation, {input}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::einsum(equation, {lazy_input}); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestEinsumPyTorchLowerBatchPermute) { + torch::Tensor input = + torch::rand({2, 3, 4, 5}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = "...ij->...ji"; + torch::Tensor result = torch::einsum(equation, {input}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::einsum(equation, {lazy_input}); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestEinsumPyTorchLowerRepeatedAxis) { + torch::Tensor x = torch::rand( + {2, 3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor y = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::string equation = "ijj,k->ik"; + torch::Tensor result = torch::einsum(equation, {x, y}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_x = CopyToDevice(x, device); + torch::Tensor lazy_y = CopyToDevice(y, device); + torch::Tensor lazy_result = torch::einsum(equation, {lazy_x, lazy_y}); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestBilinear) { + int batch_size = 16; + int in1_features = 4; + int in2_features = 6; + int out_features = 8; + torch::Tensor input1 = + torch::rand({batch_size, in1_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor input2 = + torch::rand({batch_size, in2_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = + torch::rand({out_features, in1_features, in2_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = + torch::rand({out_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input1 = CopyToDevice(input1, device); + torch::Tensor lazy_input2 = CopyToDevice(input2, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_bias = CopyToDevice(bias, device); + torch::Tensor result = torch::bilinear(input1, input2, weight, bias); + torch::Tensor lazy_result = + 
torch::bilinear(lazy_input1, lazy_input2, lazy_weight, lazy_bias); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestUpsampleNearest2D) { + int batch_size = 2; + int h = 5; + int w = 5; + int uh = 8; + int uw = 8; + int chans = 2; + torch::Tensor input = + torch::rand({batch_size, chans, h, w}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = torch::upsample_nearest2d(input, {uh, uw}); + torch::Tensor lazy_result = torch::upsample_nearest2d(lazy_input, {uh, uw}); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestUpsampleNearest2DBackward) { + int batch_size = 2; + int h = 5; + int w = 5; + int uh = 8; + int uw = 8; + int chans = 2; + auto testfn = [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::upsample_nearest2d(inputs[0], {uh, uw}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({batch_size, chans, h, w}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestUpsampleNearest2DWithScale) { + int batch_size = 2; + int h = 5; + int w = 5; + int chans = 2; + double scale_h = 2.5; + double scale_w = 3.4; + torch::Tensor input = + torch::rand({batch_size, chans, h, w}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = torch::upsample_nearest2d( + input, c10::nullopt, at::ArrayRef<double>{scale_h, scale_w}); + torch::Tensor lazy_result = torch::upsample_nearest2d( + lazy_input, c10::nullopt, at::ArrayRef<double>{scale_h, scale_w}); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestUpsampleNearest2DBackwardWithScale) { + int batch_size = 2; + int h = 5; + int w = 5; + int chans = 2; + double scale_h = 2.5; + double scale_w = 3.4; + auto testfn = [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::upsample_nearest2d(inputs[0], c10::nullopt, + at::ArrayRef<double>{scale_h, scale_w}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({batch_size, chans, h, w}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestUpsampleBilinear2D) { + int batch_size = 2; + int h = 5; + int w = 5; + int uh = 8; + int uw = 8; + int chans = 2; + for (bool align_corners : {true, false}) { + torch::Tensor input = torch::rand( + {batch_size, chans, h, w}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = + torch::upsample_bilinear2d(input, {uh, uw}, align_corners); + torch::Tensor lazy_result = + torch::upsample_bilinear2d(lazy_input, {uh, uw}, align_corners); + AllClose(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestUpsampleBilinear2DBackward) { + int batch_size = 2; + int h = 5; + int w = 5; + int uh = 8; + int uw = 8; + int chans = 2; + for (bool align_corners : {true, false}) { + auto testfn = + [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::upsample_bilinear2d(inputs[0], {uh, uw}, align_corners); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({batch_size, chans, h, w}, + 
torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestAddCMul) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor d = torch::addcmul(a, b, c, 3.1165); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::addcmul(lazy_a, lazy_b, lazy_c, 3.1165); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestAddCDiv) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = + torch::abs(torch::rand( + {2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice()))) + + 1.0; + torch::Tensor d = torch::addcdiv(a, b, c, 3.1165); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::addcdiv(lazy_a, lazy_b, lazy_c, 3.1165); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestAddCDivWithBroadcast) { + torch::Tensor a = torch::rand( + {1, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 1}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = + torch::abs(torch::rand( + {1, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice()))) + + 1.0; + torch::Tensor d = torch::addcdiv(a, b, c, 3.1165); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::addcdiv(lazy_a, lazy_b, lazy_c, 3.1165); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestSize) { + torch::Tensor input = + torch::rand({2, 1, 4, 6}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + for (int dim = -rank; dim < rank; ++dim) { + EXPECT_EQ(torch::size(input, dim), torch::size(lazy_input, dim)); + } + }); +} + +TEST_F(LazyOpsTest, TestSelect) { + std::vector<int64_t> input_sizes = {14, 24, 8}; + int rank = input_sizes.size(); + for (int dim = -rank; dim < rank; ++dim) { + auto testfn = + [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::select(inputs[0], dim, 0); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand(input_sizes, torch::TensorOptions(torch::kFloat) + .requires_grad(true))}, + device, testfn); + }); + }; +} + +TEST_F(LazyOpsTest, TestBernoulliScalarProb) { + torch::Tensor input = torch::zeros( + 1000, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::bernoulli(lazy_input, 0.1); + double frac = 
lazy_output.sum().item().toDouble() / input.numel(); + EXPECT_GT(frac, 0.06); + EXPECT_LT(frac, 0.14); + }); +} + +TEST_F(LazyOpsTest, TestBernoulliTensorProb) { + std::vector<float> prob_values(1000, 0.1); + torch::Tensor input = torch::tensor( + prob_values, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::bernoulli(lazy_input); + double frac = lazy_output.sum().item().toDouble() / input.numel(); + EXPECT_GT(frac, 0.06); + EXPECT_LT(frac, 0.14); + }); +} + +TEST_F(LazyOpsTest, TestBernoulliScalarProbInPlace) { + torch::Tensor input = torch::zeros( + 1000, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + lazy_input.bernoulli_(0.1); + double frac = lazy_input.sum().item().toDouble() / input.numel(); + EXPECT_GT(frac, 0.06); + EXPECT_LT(frac, 0.14); + }); +} + +TEST_F(LazyOpsTest, TestBernoulliTensorProbInPlace) { + torch::Tensor input = torch::zeros( + 1000, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor prob = torch::scalar_tensor( + 0.1, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_prob = CopyToDevice(prob, device); + lazy_input.bernoulli_(lazy_prob); + double frac = lazy_input.sum().item().toDouble() / input.numel(); + EXPECT_GT(frac, 0.06); + EXPECT_LT(frac, 0.14); + }); +} + +TEST_F(LazyOpsTest, TestDropout) { + torch::Tensor a = torch::rand( + {17, 21}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::dropout(lazy_a, 0.1, /*train=*/true); + double prob = + static_cast<double>(lazy_b.cpu().ne(0.0f).sum().item().toDouble()) / + a.numel(); + EXPECT_GT(prob, 0.86); + EXPECT_LT(prob, 0.94); + }); +} + +TEST_F(LazyOpsTest, TestDropoutInPlace) { + torch::Tensor a = torch::rand( + {17, 21}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::dropout_(lazy_a, 0.1, /*train=*/true); + double prob = + static_cast<double>(lazy_a.cpu().ne(0.0f).sum().item().toDouble()) / + a.numel(); + EXPECT_GT(prob, 0.85); + EXPECT_LT(prob, 0.94); + }); +} + +TEST_F(LazyOpsTest, TestRandperm) { + unsigned n = 5; + torch::Tensor shuffle = torch::randperm( + n, torch::TensorOptions(torch::kLong).device(torch::kLazy)); + torch::Tensor shuffle_cpu = CopyToDevice(shuffle, torch::kCPU); + std::vector<int64_t> shuffle_data(shuffle_cpu.data_ptr<int64_t>(), + shuffle_cpu.data_ptr<int64_t>() + n); + EXPECT_TRUE(shuffle_data.size() == n && + torch::lazy::IsPermutation(shuffle_data)); +} + +TEST_F(LazyOpsTest, TestSlice) { + torch::Tensor a = + torch::rand({32, 24, 16}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::slice(a, 1, 0, 16, 1); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::slice(lazy_a, 1, 0, 16, 1); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestTake) { + torch::Tensor a = torch::rand( + {4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::randint( + 16, {5}, 
torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor c = torch::take(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::take(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestTakeBackward) { + auto testfn = [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::take(inputs[0], inputs[1]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({4, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)), + torch::randint( + 16, {5}, + torch::TensorOptions(torch::kLong).device(DefaultDevice()))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestStack) { + torch::Tensor a = torch::rand( + {2, 4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {2, 4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = a.dim() + 1; + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor d = torch::stack({a, b, c}, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::stack({lazy_a, lazy_b, lazy_c}, dim); + AllClose(d, lazy_d); + }); + } +} + +TEST_F(LazyOpsTest, TestCat) { + torch::Tensor a = torch::rand( + {2, 1, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {2, 3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int dim : {1, -2}) { + torch::Tensor d = torch::cat({a, b, c}, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::cat({lazy_a, lazy_b, lazy_c}, dim); + EXPECT_TRUE(d.sizes() == lazy_d.sizes() && d.dtype() == lazy_d.dtype()); + AllClose(d, lazy_d); + }); + } +} + +TEST_F(LazyOpsTest, TestUnbind) { + torch::Tensor input = torch::rand( + {4, 3, 7}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + std::vector<torch::Tensor> output = torch::unbind(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + std::vector<torch::Tensor> lazy_output = torch::unbind(lazy_input, dim); + ASSERT_EQ(output.size(), lazy_output.size()); + for (size_t i = 0; i < output.size(); ++i) { + AllClose(output[i], lazy_output[i]); + } + }); + } +} + +TEST_F(LazyOpsTest, TestRepeat) { + std::vector<std::vector<int64_t>> repeats_list = {{4, 2}, {4, 2, 3}}; + std::vector<std::vector<int64_t>> input_size_list = {{3}, {2, 4}}; + for (const auto& repeats : repeats_list) { + for (const auto& input_size : input_size_list) { + torch::Tensor input = torch::rand( + input_size, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = input.repeat(repeats); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = lazy_input.repeat(repeats); + 
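// The eager `output` computed above is the reference; the lazy repeat result is checked against it on every device for each (repeats, input_size) combination. + 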
AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestGather) { + torch::Tensor a = torch::rand( + {3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::empty( + {3, 3}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int i = 0; i < 3; i++) { + for (int j = 0; j < 3; j++) { + b[i][j] = (i + j) % 3; + } + } + for (bool sparse_grad : {false, true}) { + torch::Tensor c = torch::gather(a, 1, b, sparse_grad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::gather(lazy_a, 1, lazy_b, sparse_grad); + AllClose(c, lazy_c); + }); + } +} + +TEST_F(LazyOpsTest, TestScatter) { + torch::Tensor a = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 5}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int dim = 0; dim < 2; ++dim) { + for (int i = 0; i < 3; i++) { + for (int j = 0; j < 5; j++) { + c[i][j] = (i + j) % c.sizes()[dim]; + } + } + torch::Tensor d = torch::scatter(a, dim, c, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter(lazy_a, dim, lazy_c, lazy_b); + AllClose(d, lazy_d); + }); + } +} + +TEST_F(LazyOpsTest, TestScatterR1) { + torch::Tensor a = torch::rand( + {5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {2}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + c[0] = 1; + c[1] = 3; + torch::Tensor d = torch::scatter(a, 0, c, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter(lazy_a, 0, lazy_c, lazy_b); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestScatterR3) { + torch::Tensor a = torch::rand( + {3, 5, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 4, 2}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int i = 0; i < 3; i++) { + for (int j = 0; j < 4; j++) { + for (int k = 0; k < 2; k++) { + c[i][j][k] = (i + j + k) % 4; + } + } + } + torch::Tensor d = torch::scatter(a, 1, c, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter(lazy_a, 1, lazy_c, lazy_b); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestScatterBiggerSource) { + torch::Tensor a = torch::rand( + {4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {8, 8}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {4, 4}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for 
(int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + c[i][j] = (i + j) % 4; + } + } + for (int dim = 0; dim < 2; ++dim) { + torch::Tensor d = torch::scatter(a, dim, c, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter(lazy_a, dim, lazy_c, lazy_b); + AllClose(d, lazy_d); + }); + } +} + +TEST_F(LazyOpsTest, TestScatterScalar) { + torch::Tensor a = torch::rand( + {4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar b = 1.0f; + torch::Tensor c = torch::empty( + {4, 4}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + c[i][j] = (i + j) % 4; + } + } + for (int dim = 0; dim < 2; ++dim) { + torch::Tensor d = torch::scatter(a, dim, c, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter(lazy_a, dim, lazy_c, b); + AllClose(d, lazy_d); + }); + } +} + +TEST_F(LazyOpsTest, TestScatterReduceAdd) { + torch::Tensor a = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 5}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int dim = 0; dim < 2; ++dim) { + for (int i = 0; i < 3; i++) { + for (int j = 0; j < 5; j++) { + c[i][j] = (i + j) % c.sizes()[dim]; + } + } + torch::Tensor d = torch::scatter(a, dim, c, b, "add"); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter(lazy_a, dim, lazy_c, lazy_b, "add"); + AllClose(d, lazy_d); + }); + } + + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::scatter_out", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestScatterAdd) { + torch::Tensor a = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 5}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int dim = 0; dim < 2; ++dim) { + for (int i = 0; i < 3; i++) { + for (int j = 0; j < 5; j++) { + c[i][j] = (i + j) % c.sizes()[dim]; + } + } + torch::Tensor d = torch::scatter_add(a, dim, c, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::scatter_add(lazy_a, dim, lazy_c, lazy_b); + AllClose(d, lazy_d); + }); + } +} + +TEST_F(LazyOpsTest, TestScatterAddInPlace) { + torch::Tensor b = torch::rand( + {4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {4, 4}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + c[i][j] = (i + j) % 4; + } + } + for (int dim = 0; dim < 2; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = 
torch::rand( + {4, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor d = a.scatter_add_(dim, c, b); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = lazy_a.scatter_add_(dim, lazy_c, lazy_b); + AllClose(d, lazy_d); + AllClose(a, lazy_a); + }); + } +} + +TEST_F(LazyOpsTest, TestIndexSelect) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor a = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (torch::ScalarType index_scalar_type : {torch::kInt, torch::kLong}) { + torch::Tensor b = torch::empty( + {2}, torch::TensorOptions(index_scalar_type).device(DefaultDevice())); + b[0] = 0; + b[1] = 2; + for (auto offset : {-2, 0}) { + torch::Tensor c0 = torch::index_select(a, 0 + offset, b); + torch::Tensor c1 = torch::index_select(a, 1 + offset, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c0 = torch::index_select(lazy_a, 0 + offset, lazy_b); + torch::Tensor lazy_c1 = torch::index_select(lazy_a, 1 + offset, lazy_b); + AllEqual(c0, lazy_c0); + AllEqual(c1, lazy_c1); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestIndexSelectRank0) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor a = + isFloatingType(scalar_type) + ? 
torch::rand( + {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor b = torch::scalar_tensor( + 2, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor c0 = torch::index_select(a, 0, b); + torch::Tensor c1 = torch::index_select(a, 1, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c0 = torch::index_select(lazy_a, 0, lazy_b); + torch::Tensor lazy_c1 = torch::index_select(lazy_a, 1, lazy_b); + AllEqual(c0, lazy_c0); + AllEqual(c1, lazy_c1); + }); + } +} + +TEST_F(LazyOpsTest, TestInverse) { + if (IsCuda()) { + // TODO(whc) debug failure on cuda, lazy_b comes back transposed + GTEST_SKIP(); + } + torch::Tensor a = torch::randn( + {5, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::inverse(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::inverse(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestIsnan) { + torch::Tensor a = torch::tensor( + {1.0, 2.0, std::nan("1"), 4.0}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::isnan(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::isnan(lazy_a); + AllEqual(b, lazy_b); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::isnan", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestExpand) { + torch::Tensor a = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.expand({2, 3, 4}, /*implicit=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = lazy_a.expand({2, 3, 4}, /*implicit=*/false); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestExpandBack) { + torch::Tensor a = torch::rand( + {3, 1}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = a.expand({3, 4}, /*implicit=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = lazy_a.expand({3, 4}, /*implicit=*/false); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestExpandAs) { + torch::Tensor a = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::native::expand_as(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::native::expand_as(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestEye) { + int n = 5; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor out = torch::eye( + n, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_out = + torch::eye(n, torch::TensorOptions(torch::kFloat).device(device)); + AllClose(out, lazy_out); + }); +} + +TEST_F(LazyOpsTest, TestEyeWide) { + int lines = 3; + int cols = 5; + ForEachDevice([&](const torch::Device& 
device) { + torch::Tensor out = + torch::eye(lines, cols, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_out = torch::eye( + lines, cols, torch::TensorOptions(torch::kFloat).device(device)); + AllClose(out, lazy_out); + }); +} + +TEST_F(LazyOpsTest, TestEyeNarrow) { + int lines = 5; + int cols = 3; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor out = + torch::eye(lines, cols, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_out = torch::eye( + lines, cols, torch::TensorOptions(torch::kFloat).device(device)); + AllClose(out, lazy_out); + }); +} + +TEST_F(LazyOpsTest, TestBroadcastTensors) { + torch::Tensor a = torch::rand( + {2, 1, 1}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2, 1}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector<torch::Tensor> c = torch::broadcast_tensors({a, b}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + std::vector<torch::Tensor> lazy_c = torch::broadcast_tensors({lazy_a, lazy_b}); + ASSERT_EQ(c.size(), lazy_c.size()); + for (size_t i = 0; i < c.size(); ++i) { + AllClose(c[i], lazy_c[i]); + } + }); +} + +TEST_F(LazyOpsTest, TestOneIndex) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = torch::index(params, {indices}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + torch::Tensor lazy_result = torch::index(lazy_params, {lazy_indices}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestOneIndexTransfer) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = torch::index(params, {indices}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_result = torch::index(lazy_params, {indices}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestNonzero) { + torch::Tensor a = torch::zeros( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + a[0][1] = 1.0; + a[1][0] = 2.0; + a[3][1] = 3.0; + torch::Tensor b = torch::nonzero(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::nonzero(lazy_a); + AllClose(b, lazy_b); + + if (DebugUtil::ExperimentEnabled("nonzero")) { + // If the nonzero support is enabled, we must not see any aten:: calls. + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + } + ResetCounters(); + }); +} + +TEST_F(LazyOpsTest, TestMaskedSelect) { + torch::Tensor a = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::randint( + 0, 2, {5}, torch::TensorOptions(torch::kBool).device(DefaultDevice())); + torch::Tensor c = torch::masked_select(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::masked_select(lazy_a, lazy_b); + AllClose(c, lazy_c); + + if (DebugUtil::ExperimentEnabled("masked_select")) { + // If the masked_select support is enabled, we must not see any aten:: + // calls. + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + } + ResetCounters(); + }); +} + +TEST_F(LazyOpsTest, TestMaskedScatter) { + torch::Tensor a = torch::rand( + {3, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::randint( + 0, 2, {3, 5}, torch::TensorOptions(torch::kBool).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {15}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor d = torch::masked_scatter(a, b, c); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::masked_scatter(lazy_a, lazy_b, lazy_c); + AllClose(d, lazy_d); + + if (DebugUtil::ExperimentEnabled("masked_scatter")) { + // If the masked_select support is enabled, we must not see any aten:: + // calls. + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + } + ResetCounters(); + }); +} + +TEST_F(LazyOpsTest, TestMultiIndexHeadNull) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices_null; + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = + torch::index(params, {indices_null, indices_0, indices_1}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_result = torch::index( + lazy_params, {indices_null, lazy_indices_0, lazy_indices_1}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestMultiIndexMiddleNull) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_null; + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = + torch::index(params, {indices_0, indices_null, indices_1}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_result = torch::index( + lazy_params, {lazy_indices_0, indices_null, lazy_indices_1}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestMultiIndexTailNull) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_null; + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = + torch::index(params, {indices_0, indices_1, indices_null}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_result = torch::index( + lazy_params, {lazy_indices_0, lazy_indices_1, indices_null}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestMultiIndexMiddleBroadcast) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 1, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = torch::index(params, {indices_0, indices_1}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_result = + torch::index(lazy_params, {lazy_indices_0, lazy_indices_1}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestMultiIndexTailBroadcast) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 1, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 1}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor result = torch::index(params, {indices_0, indices_1}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_result = + torch::index(lazy_params, {lazy_indices_0, lazy_indices_1}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestMaskIndex) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {2, 2}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {2, 2}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices = torch::randint( + 0, 2, {2, 2}, + torch::TensorOptions(torch::kBool).device(DefaultDevice())); + torch::Tensor result = torch::index(params, {indices}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + torch::Tensor lazy_result = torch::index(lazy_params, {lazy_indices}); + AllEqual(result, lazy_result); + }); + } +} + +TEST_F(LazyOpsTest, TestOneIndexPut) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor indices = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor values = + isFloatingType(scalar_type) + ? torch::rand( + {3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = + torch::index_put(params, {indices}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = + torch::index_put(lazy_params, {lazy_indices}, lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestOneIndexPutInPlace) { + torch::Tensor indices = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor values = + torch::ones({3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + ForEachDevice([&](const torch::Device& device) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor lazy_params = CopyToDevice(params.clone(), device); + torch::Tensor result = + torch::index_put_(params, {indices}, values, accumulate); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put_(lazy_params, {lazy_indices}, + lazy_values, accumulate); + AllEqual(result, lazy_result); + AllEqual(params, lazy_params); + }); + } + } +} + +TEST_F(LazyOpsTest, TestOneIndexPutTransfer) { + torch::Tensor indices = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = + torch::ones({3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = + torch::index_put(params, {indices}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = + torch::index_put(lazy_params, {indices}, lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMultiIndexPut) { + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {5, 6, 7}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = + torch::index_put(params, {indices_0, indices_1}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put( + lazy_params, {lazy_indices_0, lazy_indices_1}, lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMultiIndexPutHeadNull) { + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_null; + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 3, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 3, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {3, 6, 7}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = torch::index_put( + params, {indices_null, indices_0, indices_1}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put( + lazy_params, {indices_null, lazy_indices_0, lazy_indices_1}, + lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMultiIndexPutMiddleNull) { + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_null; + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 3, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 3, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {3, 6, 7}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = torch::index_put( + params, {indices_0, indices_null, indices_1}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put( + lazy_params, {lazy_indices_0, indices_null, lazy_indices_1}, + lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMultiIndexPutTailNull) { + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_null; + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 3, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 3, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {3, 6, 7}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = torch::index_put( + params, {indices_0, indices_1, indices_null}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put( + lazy_params, {lazy_indices_0, lazy_indices_1, indices_null}, + lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMultiIndexPutMiddleBroadcast) { + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 1, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {5, 6, 7}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = + torch::index_put(params, {indices_0, indices_1}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put( + lazy_params, {lazy_indices_0, lazy_indices_1}, lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMultiIndexPutTailBroadcast) { + torch::Tensor indices_0 = torch::randint( + -3, 3, {2, 1, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor indices_1 = torch::randint( + -3, 3, {2, 1}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {5, 6, 7}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + torch::Tensor result = + torch::index_put(params, {indices_0, indices_1}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices_0 = CopyToDevice(indices_0, device); + torch::Tensor lazy_indices_1 = CopyToDevice(indices_1, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::index_put( + lazy_params, {lazy_indices_0, lazy_indices_1}, lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestMaskIndexPut) { + torch::Tensor indices = + torch::tensor({0, 1}, + torch::TensorOptions(torch::kByte).device(DefaultDevice())) + .to(torch::kBool); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor params = + isFloatingType(scalar_type) + ? 
torch::rand( + {2, 2}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {2, 2}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor values = torch::ones( + {2}, torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + torch::Tensor result = + torch::index_put(params, {indices}, values, accumulate); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_params = CopyToDevice(params, device); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = + torch::index_put(lazy_params, {lazy_indices}, lazy_values, accumulate); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexPutImpl) { + torch::Tensor indices = torch::randint( + -3, 3, {2, 4, 3}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor values = + torch::ones({3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + for (bool accumulate : {false, true}) { + if (accumulate && IsCuda()) { + GTEST_SKIP(); + } + ForEachDevice([&](const torch::Device& device) { + torch::Tensor params = + isFloatingType(scalar_type) + ? torch::rand( + {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, {4, 3, 5, 6, 7}, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor lazy_params = CopyToDevice(params.clone(), device); + torch::Tensor result = torch::_index_put_impl_( + params, {indices}, values, accumulate, /*unsafe=*/true); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + torch::Tensor lazy_values = CopyToDevice(values, device); + torch::Tensor lazy_result = torch::_index_put_impl_( + lazy_params, {lazy_indices}, lazy_values, accumulate, /*unsafe=*/true); + AllEqual(result, lazy_result); + AllEqual(params, lazy_params); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexFillWithScalar) { + torch::Tensor index = torch::tensor( + {0, 2}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Scalar value = 42; + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? 
torch::rand( + {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::index_fill(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_result = + torch::index_fill(lazy_base, dim, lazy_index, value); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexFillWithScalarInPlace) { + torch::Tensor index = torch::tensor( + {0, 2}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Scalar value = 42; + int rank = 3; + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, {3, 4, 5}, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor lazy_base = CopyToDevice(base.clone(), device); + torch::Tensor result = base.index_fill_(dim, index, value); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_result = lazy_base.index_fill_(dim, lazy_index, value); + AllEqual(result, lazy_result); + AllEqual(base, lazy_base); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexFillWithTensor) { + torch::Tensor index = torch::tensor( + {0, 2}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor value = torch::scalar_tensor( + 42, torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::index_fill(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + torch::index_fill(lazy_base, dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexFillWithTensorInPlace) { + torch::Tensor index = torch::tensor( + {0, 2}, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor value = torch::scalar_tensor( + 42, torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = 3; + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor base = + isFloatingType(scalar_type) + ? 
torch::rand( + {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, {3, 4, 5}, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor lazy_base = CopyToDevice(base.clone(), device); + torch::Tensor result = base.index_fill_(dim, index, value); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + lazy_base.index_fill_(dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + AllEqual(base, lazy_base); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexFillRank0) { + torch::Tensor index = torch::scalar_tensor( + 2, torch::TensorOptions(torch::kLong).device(DefaultDevice())); + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {3, 4, 5}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor value = torch::scalar_tensor( + 42, torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor result = torch::index_fill(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + torch::index_fill(lazy_base, dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexAdd) { + int index_size = 10; + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + for (torch::ScalarType index_scalar_type : {torch::kInt, torch::kLong}) { + torch::Tensor index = torch::randint( + 0, base.size(dim), {index_size}, + torch::TensorOptions(index_scalar_type).device(DefaultDevice())); + std::vector value_sizes(base.sizes().begin(), + base.sizes().end()); + int canonical_dim = dim < 0 ? dim + rank : dim; + value_sizes[canonical_dim] = index_size; + torch::Tensor value = + isFloatingType(scalar_type) + ? 
torch::rand( + value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, value_sizes, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor result = torch::index_add(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + torch::index_add(lazy_base, dim, lazy_index, lazy_value); + AllClose(result, lazy_result); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestIndexAddInPlace) { + int index_size = 10; + int rank = 3; + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, {5, 3, 7}, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor index = torch::randint( + 0, base.size(dim), {index_size}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + std::vector value_sizes(base.sizes().begin(), + base.sizes().end()); + int canonical_dim = dim < 0 ? dim + rank : dim; + value_sizes[canonical_dim] = index_size; + torch::Tensor value = + isFloatingType(scalar_type) + ? torch::rand( + value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, value_sizes, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor lazy_base = CopyToDevice(base.clone(), device); + torch::Tensor result = base.index_add_(dim, index, value); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + lazy_base.index_add_(dim, lazy_index, lazy_value); + AllClose(result, lazy_result); + AllClose(base, lazy_base); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexAddRank0) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor index = torch::randint( + 0, base.size(dim), at::IntArrayRef{}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + std::vector value_sizes(base.sizes().begin(), + base.sizes().end()); + int canonical_dim = dim < 0 ? dim + rank : dim; + value_sizes[canonical_dim] = 1; + torch::Tensor value = + isFloatingType(scalar_type) + ? 
torch::rand( + value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor result = torch::index_add(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + torch::index_add(lazy_base, dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexCopy) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor index = torch::randperm( + base.size(dim), + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor value = + isFloatingType(scalar_type) + ? torch::rand( + base.sizes(), + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, base.sizes(), + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor result = torch::index_copy(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + torch::index_copy(lazy_base, dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexCopyInPlace) { + if (IsCuda()) { + GTEST_SKIP(); + } + int index_size = 10; + int rank = 3; + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, {5, 3, 7}, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor index = torch::randint( + 0, base.size(dim), {index_size}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + std::vector value_sizes(base.sizes().begin(), + base.sizes().end()); + int canonical_dim = dim < 0 ? dim + rank : dim; + value_sizes[canonical_dim] = index_size; + torch::Tensor value = + isFloatingType(scalar_type) + ? 
torch::rand( + value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint(100, value_sizes, + torch::TensorOptions(scalar_type) + .device(DefaultDevice())); + torch::Tensor lazy_base = CopyToDevice(base.clone(), device); + torch::Tensor result = base.index_copy_(dim, index, value); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + lazy_base.index_copy_(dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + AllEqual(base, lazy_base); + }); + } + } +} + +TEST_F(LazyOpsTest, TestIndexCopyRank0) { + for (torch::ScalarType scalar_type : + {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt, + torch::kLong}) { + torch::Tensor base = + isFloatingType(scalar_type) + ? torch::rand( + {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, {5, 3, 7}, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + int rank = base.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor index = torch::randint( + 0, base.size(dim), at::IntArrayRef{}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + std::vector value_sizes(base.sizes().begin(), + base.sizes().end()); + int canonical_dim = dim < 0 ? dim + rank : dim; + value_sizes[canonical_dim] = 1; + torch::Tensor value = + isFloatingType(scalar_type) + ? torch::rand( + value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())) + : torch::randint( + 100, value_sizes, + torch::TensorOptions(scalar_type).device(DefaultDevice())); + torch::Tensor result = torch::index_copy(base, dim, index, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_index = CopyToDevice(index, device); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = + torch::index_copy(lazy_base, dim, lazy_index, lazy_value); + AllEqual(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestRelu) { + torch::Tensor input = + torch::rand({2, 1, 4, 6}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::relu(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::relu(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReluInPlace) { + torch::Tensor input = + torch::rand({2, 1, 4, 6}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = torch::relu_(input); + torch::Tensor lazy_output = torch::relu_(lazy_input); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestHardshrink) { + torch::Tensor input = torch::randn( + {10}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::hardshrink(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::hardshrink(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestHardSigmoid) { + torch::Tensor input = torch::randn( + {10}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::hardsigmoid(input); + ForEachDevice([&](const 
torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::hardsigmoid(lazy_input);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestHardSigmoidInPlace) {
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor input = torch::randn(
        {10}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor output = torch::hardsigmoid_(input);
    torch::Tensor lazy_output = torch::hardsigmoid_(lazy_input);
    AllClose(input, lazy_input);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestHardSigmoidBackward) {
  auto testfn = [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor {
    return torch::hardsigmoid(inputs[0]);
  };
  ForEachDevice([&](const torch::Device& device) {
    TestBackward({torch::randn({10}, torch::TensorOptions(torch::kFloat)
                                         .device(DefaultDevice())
                                         .requires_grad(true))},
                 device, testfn);
  });
}

TEST_F(LazyOpsTest, TestSoftshrink) {
  torch::Tensor input = torch::randn(
      {10}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor output = torch::softshrink(input);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::softshrink(lazy_input);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestHardtanh) {
  torch::Tensor input = torch::randn(
      {10}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor output = torch::hardtanh(input);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::hardtanh(lazy_input);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestHardtanhInPlace) {
  torch::Tensor input = torch::randn(
      {10}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor output = torch::hardtanh_(input);
    torch::Tensor lazy_output = torch::hardtanh_(lazy_input);
    AllClose(output, lazy_output);
    AllClose(input, lazy_input);
  });
}

TEST_F(LazyOpsTest, TestLeakyRelu) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  double negative_slope = 0.01;
  torch::Tensor output = torch::leaky_relu(input, negative_slope);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::leaky_relu(lazy_input, negative_slope);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestLeakyReluInPlace) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  double negative_slope = 0.01;
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor output = torch::leaky_relu_(input, negative_slope);
    torch::Tensor lazy_output = torch::leaky_relu_(lazy_input, negative_slope);
    AllClose(output, lazy_output);
    AllClose(input, lazy_input);
  });
}

TEST_F(LazyOpsTest, TestExp) {
  torch::Tensor a = torch::rand(
      {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor b = torch::exp(a);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_a = CopyToDevice(a, device);
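    // The unary math tests that follow (exp, expm1, log, log2, log10, log1p,
    // erf, erfc, erfinv, sqrt, rsqrt, reciprocal) share this pattern: build an
    // eager reference, replay the op on the lazy tensor, and compare with
    // AllClose at rtol=1e-3 / atol=1e-5.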
torch::Tensor lazy_b = torch::exp(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestExpm1) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::expm1(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::expm1(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLog) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::log(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::log(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLog2) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::log2(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::log2(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLog10) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::log10(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::log10(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLog1p) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::log1p(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::log1p(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestErf) { + torch::Tensor a = torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::erf(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::erf(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestErfc) { + torch::Tensor a = torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::erfc(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::erfc(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestErfinv) { + torch::Tensor a = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::erfinv(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::erfinv(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestSqrt) { + torch::Tensor a = torch::abs(torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Tensor b = torch::sqrt(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::sqrt(lazy_a); + AllClose(b, lazy_b, 
/*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestRsqrt) { + torch::Tensor a = torch::abs(torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Tensor b = torch::rsqrt(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::rsqrt(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestReciprocal) { + torch::Tensor a = torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::reciprocal(a); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::reciprocal(lazy_a); + AllClose(b, lazy_b, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowTensorScalar) { + torch::Tensor base = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar exponent = 4.09; + torch::Tensor result = torch::pow(base, exponent); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_result = torch::pow(lazy_base, exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowTensorScalarInPlace) { + torch::Tensor base = torch::rand( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar exponent = 4.09; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base.clone(), device); + torch::Tensor result = base.pow_(exponent); + torch::Tensor lazy_result = lazy_base.pow_(exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + AllClose(base, lazy_base, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowTensorTensor) { + torch::Tensor base = torch::abs(torch::rand( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Tensor exponent = torch::rand( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor result = torch::pow(base, exponent); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_exponent = CopyToDevice(exponent, device); + torch::Tensor lazy_result = torch::pow(lazy_base, lazy_exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowTensorTensorInPlace) { + torch::Tensor base = torch::abs(torch::rand( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Tensor exponent = torch::rand( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base.clone(), device); + torch::Tensor result = base.pow_(exponent); + torch::Tensor lazy_exponent = CopyToDevice(exponent, device); + torch::Tensor lazy_result = lazy_base.pow_(lazy_exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + AllClose(base, lazy_base, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowTensorTensorBroadcast) { + torch::Tensor base = torch::abs(torch::rand( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Tensor exponent = torch::rand( + {4, 1}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor result = torch::pow(base, exponent); + 
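  // The {4, 1} exponent broadcasts across the {4, 2} base, so this checks that
  // the lazy backend applies the same broadcasting rules as eager pow.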
ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_exponent = CopyToDevice(exponent, device); + torch::Tensor lazy_result = torch::pow(lazy_base, lazy_exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowScalarTensor) { + torch::Scalar base = 3.5; + torch::Tensor exponent = torch::rand({4, 2}); + torch::Tensor result = torch::pow(base, exponent); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_exponent = CopyToDevice(exponent, device); + torch::Tensor lazy_result = torch::pow(base, lazy_exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestPowIntExponent) { + torch::Tensor base = torch::abs(torch::rand( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()))); + torch::Scalar exponent = 3; + torch::Tensor result = torch::pow(base, exponent); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_base = CopyToDevice(base, device); + torch::Tensor lazy_result = torch::pow(lazy_base, exponent); + AllClose(result, lazy_result, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestFmodScalar) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Scalar divisor = 2.0; + torch::Tensor b = torch::fmod(a, divisor); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = torch::fmod(lazy_a, divisor); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestFmodScalarInPlace) { + torch::Scalar divisor = 2.0; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = + torch::rand( + {2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = a.fmod_(divisor); + torch::Tensor lazy_b = lazy_a.fmod_(divisor); + AllClose(b, lazy_b); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestFmodTensor) { + torch::Tensor a = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 10.0; + torch::Tensor c = torch::fmod(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::fmod(lazy_a, lazy_b); + AllClose(c, lazy_c); + }); +} + +TEST_F(LazyOpsTest, TestFmodTensorInPlace) { + torch::Tensor b = + torch::rand({2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 10.0; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = + torch::rand( + {2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.fmod_(b); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.fmod_(lazy_b); + AllClose(c, lazy_c); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestRemainderScalar) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Scalar divisor = -2.0; + torch::Tensor b = torch::remainder(a, divisor); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + 
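    // remainder follows the sign of the divisor (-2.0 here), unlike fmod
    // above, which follows the dividend; this exercises negative results.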
torch::Tensor lazy_b = torch::remainder(lazy_a, divisor); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestRemainderScalarInPlace) { + torch::Scalar divisor = -2.0; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = + torch::randn( + {2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor b = a.remainder_(divisor); + torch::Tensor lazy_b = lazy_a.remainder_(divisor); + AllClose(b, lazy_b); + AllClose(a, lazy_a); + }); +} + +TEST_F(LazyOpsTest, TestRemainderTensor) { + torch::Tensor a = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor b = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 10.0; + torch::Tensor c = torch::remainder(a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = torch::remainder(lazy_a, lazy_b); + AllClose(c, lazy_c, /*rtol=*/1e-4, /*atol=*/1e-6); + }); +} + +TEST_F(LazyOpsTest, TestRemainderTensorInPlace) { + torch::Tensor b = + torch::randn( + {2, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 10.0; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor a = + torch::randn( + {2, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())) * + 100.0; + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor c = a.remainder_(b); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.remainder_(lazy_b); + AllClose(c, lazy_c, /*rtol=*/1e-4, /*atol=*/1e-6); + AllClose(a, lazy_a, /*rtol=*/1e-4, /*atol=*/1e-6); + }); +} + +TEST_F(LazyOpsTest, TestWhere) { + torch::Tensor a = torch::rand( + {3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 3}, torch::TensorOptions(torch::kByte).device(DefaultDevice())); + for (int i = 0; i < 3; ++i) { + for (int j = 0; j < 3; ++j) { + c[i][j] = i == j; + } + } + torch::Tensor d = torch::where(c, a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::where(lazy_c, lazy_a, lazy_b); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestWhereBroadcast) { + torch::Tensor a = torch::rand( + {3, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::zeros( + {}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::empty( + {3, 3}, torch::TensorOptions(torch::kByte).device(DefaultDevice())); + for (int i = 0; i < 3; ++i) { + for (int j = 0; j < 3; ++j) { + c[i][j] = i == j; + } + } + torch::Tensor d = torch::where(c, a, b); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = torch::where(lazy_c, lazy_a, lazy_b); + AllClose(d, lazy_d); + }); +} + +TEST_F(LazyOpsTest, TestThreshold) { + torch::Tensor input = + torch::rand({2, 1, 4, 6}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + float threshold = 0.4; + float value 
= 20;
  torch::Tensor output = torch::threshold(input, threshold, value);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::threshold(lazy_input, threshold, value);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestThresholdBackward) {
  float threshold = 0.4;
  float value = 20;

  auto testFunction = [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor {
    return torch::threshold(inputs[0], threshold, value);
  };

  ForEachDevice([&](const torch::Device& device) {
    TestBackward({torch::rand({2, 1, 4, 6}, torch::TensorOptions(torch::kFloat)
                                                .device(DefaultDevice())
                                                .requires_grad(true))},
                 device, testFunction);
  });
}

TEST_F(LazyOpsTest, TestThresholdInPlace) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor output = input.clone();
  float threshold = 0.4;
  float value = 20;
  torch::threshold_(output, threshold, value);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_output = CopyToDevice(input, device);
    torch::threshold_(lazy_output, threshold, value);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestElu) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Scalar alpha = 0.5;
  torch::Scalar scale = 2.5;
  torch::Scalar input_scale = 1.5;
  torch::Tensor output = torch::elu(input, alpha, scale, input_scale);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::elu(lazy_input, alpha, scale, input_scale);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestEluInPlace) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Scalar alpha = 0.5;
  torch::Scalar scale = 2.5;
  torch::Scalar input_scale = 1.5;
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor output = torch::elu_(input, alpha, scale, input_scale);
    torch::Tensor lazy_output =
        torch::elu_(lazy_input, alpha, scale, input_scale);
    AllClose(output, lazy_output);
    AllClose(input, lazy_input);
  });
}

TEST_F(LazyOpsTest, TestSelu) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor output = torch::selu(input);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::selu(lazy_input);
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestSeluInPlace) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor output = torch::selu_(input);
    torch::Tensor lazy_output = torch::selu_(lazy_input);
    AllClose(output, lazy_output);
    AllClose(input, lazy_input);
  });
}

TEST_F(LazyOpsTest, TestCelu) {
  torch::Tensor input =
      torch::rand({2, 1, 4, 6},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Scalar alpha = 2.5;
  torch::Tensor output = torch::celu(input, alpha);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input =
CopyToDevice(input, device); + torch::Tensor lazy_output = torch::celu(lazy_input, alpha); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestCeluInPlace) { + torch::Tensor input = + torch::rand({2, 1, 4, 6}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar alpha = 2.5; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = torch::celu_(input, alpha); + torch::Tensor lazy_output = torch::celu_(lazy_input, alpha); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestGelu) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::gelu(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::gelu(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestAddMatMul) { + int in_channels = 32; + int out_channels = 320; + int labels = 50; + torch::Tensor input = + torch::rand({in_channels, out_channels}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = + torch::rand({out_channels, labels}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = torch::rand( + {labels}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test beta != 1. through the CPU interop. + for (double beta : {1., 2.}) { + torch::Tensor output = torch::addmm(bias, input, weight, /*beta=*/beta); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_bias = CopyToDevice(bias, device); + torch::Tensor lazy_output = + torch::addmm(lazy_bias, lazy_input, lazy_weight, /*beta=*/beta); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestEmbedding) { + torch::Tensor a = torch::rand( + {32, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor i = torch::randint( + 0, 31, {3, 4}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor b = + torch::embedding(a, i, /*padding_idx=*/0, /*scale_grad_by_freq=*/false, + /*sparse=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_i = CopyToDevice(i, device); + torch::Tensor lazy_b = torch::embedding(lazy_a, lazy_i, /*padding_idx=*/0, + /*scale_grad_by_freq=*/false, + /*sparse=*/false); + AllClose(b, lazy_b); + }); +} + +TEST_F(LazyOpsTest, TestOneHot) { + int num_classes = 5; + torch::Tensor input = torch::randint( + 0, num_classes, {10}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor output = torch::one_hot(input, num_classes); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::one_hot(lazy_input, num_classes); + AllEqual(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestTranspose) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::t(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::t(lazy_input); + AllClose(output, 
lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestTransposeInPlace) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.t_(); + torch::Tensor lazy_output = lazy_input.t_(); + EXPECT_EQ(lazy_output.sizes(), output.sizes()); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestReshape) { + torch::Tensor input = + torch::rand({32, 20, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::reshape(input, {-1, 320}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::reshape(lazy_input, {-1, 320}); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestResize) { + // Testing a resize_() with target size bigger than original size is not + // possible, as we fill with zeros, while pytorch fills with random garbage. + torch::Tensor input = torch::rand( + {2, 2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor saved_input = input.clone(); + input.resize_({3, 3}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(saved_input, device); + lazy_input.resize_({3, 3}); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestViewResize) { + torch::Tensor input = torch::zeros( + {8, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor saved_input = input.clone(); + torch::Tensor output = input.view({4, 4}); + output.resize_({3, 3}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(saved_input, device); + torch::Tensor lazy_output = lazy_input.view({4, 4}); + lazy_output.resize_({3, 3}); + AllClose(input, lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestView) { + torch::Tensor input = + torch::rand({32, 20, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = input.view({-1, 320}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = lazy_input.view({-1, 320}); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestViewMod) { + torch::Tensor input = + torch::zeros({32, 20, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor one = torch::tensor( + 1.0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = input.view({-1, 320}); + output.add_(one, 1.0); + input.add_(one, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor xinput = torch::zeros( + {32, 20, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(xinput, device); + torch::Tensor lazy_one = CopyToDevice(one, device); + torch::Tensor lazy_output = lazy_input.view({-1, 320}); + lazy_output.add_(lazy_one, 1.0); + lazy_input.add_(lazy_one, 1.0); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestViewModComplex) { + torch::Tensor input = + torch::zeros({32, 20, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor one = torch::tensor( + 1.0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); 
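  // Both views below alias the same storage as `input`; an in-place add
  // through either view must also be observable through the base tensor and
  // the other view, on the lazy device as well as in eager mode.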
  torch::Tensor output1 = input.view({-1, 320});
  output1.add_(one, 1.0);
  torch::Tensor output2 = input.view({-1, 160});
  output2.add_(one, 1.0);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor xinput = torch::zeros(
        {32, 20, 4, 4},
        torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
    torch::Tensor lazy_input = CopyToDevice(xinput, device);
    torch::Tensor lazy_one = CopyToDevice(one, device);
    torch::Tensor lazy_output1 = lazy_input.view({-1, 320});
    lazy_output1.add_(lazy_one, 1.0);
    torch::Tensor lazy_output2 = lazy_input.view({-1, 160});
    lazy_output2.add_(lazy_one, 1.0);
    AllClose(output1, lazy_output1);
    AllClose(output2, lazy_output2);
  });
}

TEST_F(LazyOpsTest, TestViewOfViewMod) {
  torch::Tensor input =
      torch::zeros({32, 20, 4, 4},
                   torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor one = torch::tensor(
      1.0, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor output1 = input.view({-1, 320});
  output1.add_(one, 1.0);
  torch::Tensor output2 = output1.view({-1, 160});
  output2.add_(one, 1.0);
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor xinput = torch::zeros(
        {32, 20, 4, 4},
        torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
    torch::Tensor lazy_input = CopyToDevice(xinput, device);
    torch::Tensor lazy_one = CopyToDevice(one, device);
    torch::Tensor lazy_output1 = lazy_input.view({-1, 320});
    lazy_output1.add_(lazy_one, 1.0);
    torch::Tensor lazy_output2 = lazy_output1.view({-1, 160});
    lazy_output2.add_(lazy_one, 1.0);
    AllClose(output1, lazy_output1);
    AllClose(output2, lazy_output2);
  });
}

TEST_F(LazyOpsTest, TestViewSqueezeAddInPlace) {
  torch::Tensor input = torch::zeros(
      {2, 3, 1}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  std::vector<int64_t> view_size = {2, 3, 1, 1};
  int squeeze_dim = 2;
  torch::Tensor one = torch::tensor(
      1.0, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor output = input.view(view_size);
    output.squeeze_(squeeze_dim);
    output.add_(one, 1.0);
    torch::Tensor lazy_one = CopyToDevice(one, device);
    torch::Tensor lazy_output = lazy_input.view(view_size);
    lazy_output.squeeze_(squeeze_dim);
    lazy_output.add_(lazy_one, 1.0);
    AllClose(output, lazy_output);
    AllClose(input, lazy_input);
  });
}

TEST_F(LazyOpsTest, TestUnsafeView) {
  torch::Tensor input =
      torch::rand({32, 20, 4, 4},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  torch::Tensor output = torch::_unsafe_view(input, {-1, 320});
  ForEachDevice([&](const torch::Device& device) {
    torch::Tensor lazy_input = CopyToDevice(input, device);
    torch::Tensor lazy_output = torch::_unsafe_view(lazy_input, {-1, 320});
    AllClose(output, lazy_output);
  });
}

TEST_F(LazyOpsTest, TestNarrow) {
  torch::Tensor a =
      torch::rand({8, 10, 4, 4},
                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
  for (int64_t dim : {1, -3}) {
    for (int64_t start : {2, -8}) {
      torch::Tensor b = a.narrow(dim, start, 6);
      ForEachDevice([&](const torch::Device& device) {
        torch::Tensor lazy_a = CopyToDevice(a, device);
        torch::Tensor lazy_b = lazy_a.narrow(dim, start, 6);
        AllClose(b, lazy_b);
      });
    }
  }
}

TEST_F(LazyOpsTest, TestNarrowUpdate) {
  for (int64_t dim : {1, -2}) {
    for (int64_t start : {2, -6}) {
      torch::Tensor a = torch::rand(
          {3, 8, 3},
torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a_copy = a.clone(); + torch::Tensor b = torch::rand( + {3, 4, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = a.narrow(dim, start, 4); + c.add_(b, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.narrow(dim, start, 4); + lazy_c.add_(lazy_b, 1.0); + AllClose(c, lazy_c); + }); + } + } +} + +TEST_F(LazyOpsTest, TestNarrowUpdateBaseCheck) { + for (int64_t dim : {0, -2}) { + for (int64_t start : {2, -6}) { + torch::Tensor a = torch::zeros( + {8, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a_copy = a.clone(); + torch::Tensor b = torch::ones( + {4, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = a.narrow(dim, start, 4); + c.add_(b, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.narrow(dim, start, 4); + lazy_c.add_(lazy_b, 1.0); + AllClose(a, lazy_a); + }); + } + } +} + +TEST_F(LazyOpsTest, TestNarrowUpdateTwoSlices) { + for (int64_t dim : {0, -2}) { + for (int64_t start0 : {2, -6}) { + for (int64_t start1 : {6, -2}) { + torch::Tensor a = torch::zeros( + {8, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a_copy = a.clone(); + torch::Tensor b = torch::ones( + {2, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = b + 1; + torch::Tensor d = a.narrow(dim, start0, 2); + torch::Tensor e = a.narrow(dim, start1, 2); + d.add_(b, 1.0); + e.add_(c, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + torch::Tensor lazy_d = lazy_a.narrow(dim, start0, 2); + torch::Tensor lazy_e = lazy_a.narrow(dim, start1, 2); + lazy_d.add_(lazy_b, 1.0); + lazy_e.add_(lazy_c, 1.0); + AllClose(d, lazy_d); + AllClose(e, lazy_e); + AllClose(a, lazy_a); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestNarrowUpdateView) { + for (int64_t dim : {0, -3}) { + for (int64_t start : {2, -6}) { + torch::Tensor a = torch::rand( + {8, 2, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a_copy = a.clone(); + torch::Tensor b = torch::rand( + {4, 6}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = a.narrow(dim, start, 4); + torch::Tensor d = c.view({4, 6}); + d.add_(b, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.narrow(dim, start, 4); + torch::Tensor lazy_d = lazy_c.view({4, 6}); + lazy_d.add_(lazy_b, 1.0); + AllClose(d, lazy_d); + }); + } + } +} + +TEST_F(LazyOpsTest, TestNarrowInNarrowUpdate) { + for (int64_t dim : {1, -2}) { + for (int64_t start0 : {1, -7}) { + for (int64_t start1 : {1, -5}) { + torch::Tensor a = torch::rand( + {3, 8, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor a_copy = a.clone(); + torch::Tensor b = torch::rand( + {3, 2, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = a.narrow(dim, start0, 6); + torch::Tensor d = 
c.narrow(dim, start1, 2); + d.add_(b, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a_copy, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = lazy_a.narrow(dim, start0, 6); + torch::Tensor lazy_d = lazy_c.narrow(dim, start1, 2); + lazy_d.add_(lazy_b, 1.0); + AllClose(a, lazy_a); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestNarrowCopy) { + for (int64_t dim : {1, -3}) { + for (int64_t start : {2, -8}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {8, 10, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = input.narrow_copy(dim, start, 6); + input.add_(1); + torch::Tensor lazy_result = lazy_input.narrow_copy(dim, start, 6); + lazy_input.add_(1); + AllClose(result, lazy_result); + }); + } + } +} + +TEST_F(LazyOpsTest, TestViewAs) { + torch::Tensor input = + torch::rand({32, 20, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor empty = torch::empty({32, 320}); + torch::Tensor output = input.view_as(empty); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_empty = CopyToDevice(empty, device); + torch::Tensor lazy_output = lazy_input.view_as(lazy_empty); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestLogSoftmax) { + torch::Tensor input = + torch::rand({5, 3, 4, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = torch::log_softmax(input, dim); + torch::Tensor lazy_output = torch::log_softmax(lazy_input, dim); + AllClose(output, lazy_output, /*rtol=*/1e-3); + } + }); +} + +TEST_F(LazyOpsTest, TestLogSoftmaxCast) { + torch::Tensor input = + torch::rand({5, 3, 4, 2}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = torch::log_softmax(input, dim, torch::kDouble); + torch::Tensor lazy_output = + torch::log_softmax(lazy_input, dim, torch::kDouble); + AllClose(output, lazy_output, /*rtol=*/1e-3); + } + }); +} + +TEST_F(LazyOpsTest, TestLogSoftmaxWrapper) { + torch::Tensor input = + torch::rand({10, 2, 6, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = + torch::_log_softmax(input, dim, /*half_to_float=*/false); + torch::Tensor lazy_output = + torch::_log_softmax(lazy_input, dim, /*half_to_float=*/false); + AllClose(output, lazy_output, /*rtol=*/1e-3); + } + }); +} + +TEST_F(LazyOpsTest, TestSoftmax) { + torch::Tensor input = + torch::rand({10, 2, 6, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = torch::softmax(input, dim); + torch::Tensor lazy_output = 
torch::softmax(lazy_input, dim); + AllClose(output, lazy_output, /*rtol=*/1e-3); + } + }); +} + +TEST_F(LazyOpsTest, TestSoftmaxCast) { + torch::Tensor input = + torch::rand({10, 2, 6, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = torch::softmax(input, dim, torch::kDouble); + torch::Tensor lazy_output = torch::softmax(lazy_input, dim, torch::kDouble); + AllClose(output, lazy_output, /*rtol=*/1e-3); + } + }); +} + +TEST_F(LazyOpsTest, TestSoftmaxWrapper) { + torch::Tensor input = + torch::rand({10, 2, 6, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = + torch::_softmax(input, dim, /*half_to_float=*/false); + torch::Tensor lazy_output = + torch::_softmax(lazy_input, dim, /*half_to_float=*/false); + AllClose(output, lazy_output, /*rtol=*/1e-3); + } + }); +} + +TEST_F(LazyOpsTest, TestSoftplus) { + torch::Tensor input = + torch::rand({2, 1, 4, 6}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::softplus(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::softplus(lazy_input); + AllClose(output, lazy_output, /*rtol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestMaxPool1D) { + torch::Tensor input = torch::rand( + {1, 16, 56}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = + torch::max_pool1d(input, /*kernel_size=*/{kernel_size}, + /*stride=*/{stride}, + /*padding=*/{padding}, /*dilation=*/{dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::max_pool1d(lazy_input, + /*kernel_size=*/{kernel_size}, + /*stride=*/{stride}, + /*padding=*/{padding}, + /*dilation=*/{dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool2D) { + torch::Tensor input = + torch::rand({1, 4, 14, 14}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. 
+ for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool2d( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::max_pool2d(lazy_input, + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool2DWithIndices) { + torch::Tensor input = + torch::rand({1, 4, 14, 14}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + auto outputs = torch::max_pool2d_with_indices( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + auto lazy_outputs = torch::max_pool2d_with_indices( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(std::get<0>(outputs), std::get<0>(lazy_outputs)); + AllClose(std::get<1>(outputs), std::get<1>(lazy_outputs)); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool2DNonSquare) { + torch::Tensor input = + torch::rand({1, 4, 14, 14}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 4; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool2d( + input, /*kernel_size=*/{kernel_size, kernel_size + 1}, + /*stride=*/{stride, stride + 1}, + /*padding=*/{padding, padding + 1}, + /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::max_pool2d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size + 1}, + /*stride=*/{stride, stride + 1}, + /*padding=*/{padding, padding + 1}, + /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3D) { + torch::Tensor input = + torch::rand({1, 1, 8, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. 
+ for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool3d( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::max_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3DWithIndices) { + torch::Tensor input = + torch::rand({1, 1, 8, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + auto outputs = torch::max_pool3d_with_indices( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + auto lazy_outputs = torch::max_pool3d_with_indices( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + + AllClose(std::get<0>(outputs), std::get<0>(lazy_outputs)); + AllClose(std::get<1>(outputs), std::get<1>(lazy_outputs)); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3DIncompleteAttributes) { + torch::Tensor input = + torch::rand({1, 1, 8, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool3d( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{}, + /*padding=*/{padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::max_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{}, + /*padding=*/{padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3DNonSquare) { + torch::Tensor input = + torch::rand({1, 1, 8, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 4; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool3d( + input, + /*kernel_size=*/{kernel_size, kernel_size + 1, kernel_size}, + /*stride=*/{stride, stride + 1, stride}, + /*padding=*/{padding, padding + 1, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::max_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size + 1, kernel_size}, + /*stride=*/{stride, stride + 1, stride}, + /*padding=*/{padding, padding + 1, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool2DNoBatch) { + torch::Tensor input = torch::rand( + {4, 14, 14}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool2d( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::max_pool2d(lazy_input, + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3DNoBatch) { + torch::Tensor input = + torch::rand({1, 8, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output = torch::max_pool3d( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::max_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool1D) { + torch::Tensor input = torch::rand( + {4, 1, 28}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + torch::Tensor output = + torch::avg_pool1d(input, /*kernel_size=*/{kernel_size}, + /*stride=*/{stride}, + /*padding=*/{padding}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::avg_pool1d(lazy_input, + /*kernel_size=*/{kernel_size}, + /*stride=*/{stride}, + /*padding=*/{padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool2D) { + torch::Tensor input = + torch::rand({2, 1, 14, 14}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool2d( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + // torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::avg_pool2d(lazy_input, + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output.to(torch::kCPU)); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool2DNonSquare) { + torch::Tensor input = + torch::rand({2, 1, 14, 14}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 4; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool2d( + input, /*kernel_size=*/{kernel_size, kernel_size + 1}, + /*stride=*/{stride, stride + 1}, + /*padding=*/{padding, padding + 1}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::avg_pool2d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size + 1}, + /*stride=*/{stride, stride + 1}, + /*padding=*/{padding, padding + 1}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool3D) { + torch::Tensor input = + torch::rand({1, 1, 7, 7, 7}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool3d( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::avg_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool3DIncompleteAttributes) { + torch::Tensor input = + torch::rand({1, 1, 7, 7, 7}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool3d( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{}, + /*padding=*/{padding, padding, padding}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::avg_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{}, + /*padding=*/{padding, padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool3DNonSquare) { + torch::Tensor input = + torch::rand({1, 1, 7, 7, 7}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 4; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool3d( + input, + /*kernel_size=*/{kernel_size, kernel_size + 1, kernel_size}, + /*stride=*/{stride, stride + 1, stride}, + /*padding=*/{padding, padding + 1, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::avg_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size + 1, kernel_size}, + /*stride=*/{stride, stride + 1, stride}, + /*padding=*/{padding, padding + 1, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool2DNoBatch) { + torch::Tensor input = torch::rand( + {1, 7, 7}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool2d( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::avg_pool2d(lazy_input, + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool3DNoBatch) { + torch::Tensor input = + torch::rand({1, 7, 7, 7}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + torch::Tensor output = torch::avg_pool3d( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::avg_pool3d( + lazy_input, + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool2D) { + torch::Tensor input = + torch::rand({4, 1, 28, 28}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int64_t output_size : {7, 4}) { + torch::Tensor output = + torch::adaptive_avg_pool2d(input, {output_size, output_size}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::adaptive_avg_pool2d(lazy_input, {output_size, output_size}); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool3D) { + torch::Tensor input = + torch::rand({9, 4, 56, 28, 28}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int64_t output_size : {7, 4}) { + torch::Tensor output = torch::adaptive_avg_pool3d( + input, {output_size, output_size, output_size}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::adaptive_avg_pool3d( + lazy_input, {output_size, output_size, output_size}); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool3DNoBatch) { + torch::Tensor input = + torch::rand({3, 56, 28, 28}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int64_t output_size : {7, 4}) { + torch::Tensor output = torch::adaptive_avg_pool3d( + input, {output_size, output_size, output_size}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::adaptive_avg_pool3d( + lazy_input, {output_size, output_size, output_size}); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, 
TestAdaptiveAvgPool2DNoBatch) { + torch::Tensor input = torch::rand( + {1, 56, 56}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int64_t output_size : {7, 8}) { + torch::Tensor output = + torch::adaptive_avg_pool2d(input, {output_size, output_size}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::adaptive_avg_pool2d(lazy_input, {output_size, output_size}); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestMaxUnpool2D) { + int kernel_size = 2; + torch::Tensor input = + torch::rand({2, 2, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output; + torch::Tensor indices; + std::tie(output, indices) = torch::max_pool2d_with_indices( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + + std::vector<int64_t> output_size({input.size(2), input.size(3)}); + at::Tensor utensor = + torch::max_unpool2d(output, indices, output_size); + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_output = CopyToDevice(output, device); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + at::Tensor lazy_utensor = + torch::max_unpool2d(lazy_output, lazy_indices, output_size); + AllClose(utensor, lazy_utensor); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxUnpool3D) { + int kernel_size = 2; + torch::Tensor input = + torch::rand({1, 1, 4, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + // Test dilation through the CPU interop. + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output; + torch::Tensor indices; + std::tie(output, indices) = torch::max_pool3d_with_indices( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + + std::vector<int64_t> output_size( + {input.size(2), input.size(3), input.size(4)}); + at::Tensor utensor = torch::max_unpool3d( + output, indices, output_size, /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}); + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_output = CopyToDevice(output, device); + torch::Tensor lazy_indices = CopyToDevice(indices, device); + at::Tensor lazy_utensor = + torch::max_unpool3d(lazy_output, lazy_indices, output_size, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}); + AllClose(utensor, lazy_utensor); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestNllLoss) { + + // TODO(whc) debug divide-by-zero failure under ASAN + GTEST_SKIP(); + + int batch = 6; + int classes = 2; + // TODO(asuhan): Fix the torch::kDouble case.
+ for (auto dtype : {torch::kFloat}) { + for (int ignore_index : {-1, 0, 1, 5}) { + for (bool def_weight : {false, true}) { + torch::Tensor input = + torch::rand({batch, classes}, + torch::TensorOptions(dtype).device(DefaultDevice())); + torch::Tensor target = torch::randint( + std::min(ignore_index, 0), classes, {batch}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor weight; + if (def_weight) { + weight = torch::rand( + {classes}, torch::TensorOptions(dtype).device(DefaultDevice())); + } + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + torch::Tensor output = + torch::nll_loss(/*self=*/input, /*target=*/target, + /*weight=*/weight, + /*reduction=*/reduction, + /*ignore_index=*/ignore_index); + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_weight = + def_weight ? CopyToDevice(weight, device) : torch::Tensor(); + torch::Tensor lazy_output = torch::nll_loss( + /*self=*/lazy_input, /*target=*/lazy_target, + /*weight=*/lazy_weight, + /*reduction=*/reduction, /*ignore_index=*/ignore_index); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestNllLoss2d) { + int batch = 6; + int classes = 2; + int height = 3; + int width = 3; + // TODO(asuhan): Fix the torch::kDouble case. + for (auto dtype : {torch::kFloat}) { + for (int ignore_index : {-1, 0, 1, 5}) { + for (bool def_weight : {false, true}) { + torch::Tensor input = + torch::rand({batch, classes, height, width}, + torch::TensorOptions(dtype).device(DefaultDevice())); + torch::Tensor target = torch::randint( + std::min(ignore_index, 0), classes, {batch, height, width}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor weight; + if (def_weight) { + weight = torch::rand( + {classes}, torch::TensorOptions(dtype).device(DefaultDevice())); + } + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + torch::Tensor output = + torch::nll_loss2d(/*self=*/input, /*target=*/target, + /*weight=*/weight, + /*reduction=*/reduction, + /*ignore_index=*/ignore_index); + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_weight = + def_weight ? 
CopyToDevice(weight, device) : torch::Tensor(); + torch::Tensor lazy_output = torch::nll_loss2d( + /*self=*/lazy_input, /*target=*/lazy_target, + /*weight=*/lazy_weight, + /*reduction=*/reduction, /*ignore_index=*/ignore_index); + AllClose(output, lazy_output); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestSmoothL1Loss) { + torch::Tensor input = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + for (double beta : {0.25, 1.}) { + torch::Tensor output = + torch::smooth_l1_loss(input, target, reduction, beta); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = + torch::smooth_l1_loss(lazy_input, lazy_target, reduction, beta); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestL1Loss) { + torch::Tensor input = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + torch::Tensor output = torch::l1_loss(input, target, reduction); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = + torch::l1_loss(lazy_input, lazy_target, reduction); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestL1LossBackward) { + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + auto testfn = + [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::l1_loss(inputs[0], inputs[1], reduction); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)), + torch::rand({2, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestMseLoss) { + torch::Tensor input = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor target = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + torch::Tensor output = torch::mse_loss(input, target, reduction); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_target = CopyToDevice(target, device); + torch::Tensor lazy_output = + torch::mse_loss(lazy_input, lazy_target, reduction); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestMseLossBackward) { + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + auto testfn = + [&](const std::vector<torch::Tensor>& inputs) -> torch::Tensor { + return torch::mse_loss(inputs[0], inputs[1], reduction); + }; + ForEachDevice([&](const torch::Device&
device) { + TestBackward({torch::rand({2, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)), + torch::rand({2, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestBatchNorm1D) { + int num_features = 3; + torch::Tensor input = + torch::rand({2, num_features, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = + torch::rand({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = + torch::rand({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_mean = + torch::zeros({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_var = + torch::ones({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + double momentum = 0.1; + double eps = 0.5; + torch::Tensor undef; + for (bool training : {true, false}) { + for (bool undef_weight_bias : {false, true}) { + torch::Tensor output = torch::batch_norm( + /*input=*/input, /*weight=*/undef_weight_bias ? undef : weight, + /*bias=*/undef_weight_bias ? undef : bias, + /*running_mean=*/running_mean, /*running_var=*/running_var, + /*training=*/training, /*momentum=*/momentum, /*eps=*/eps, + /*cudnn_enabled=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = + undef_weight_bias ? undef : CopyToDevice(weight, device); + torch::Tensor lazy_bias = + undef_weight_bias ? undef : CopyToDevice(bias, device); + torch::Tensor lazy_running_mean = CopyToDevice(running_mean, device); + torch::Tensor lazy_running_var = CopyToDevice(running_var, device); + torch::Tensor lazy_output = torch::batch_norm( + /*input=*/lazy_input, /*weight=*/lazy_weight, /*bias=*/lazy_bias, + /*running_mean=*/lazy_running_mean, /*running_var=*/lazy_running_var, + /*training=*/training, /*momentum=*/momentum, /*eps=*/eps, + /*cudnn_enabled=*/false); + AllClose(output, lazy_output, /*rtol=*/1e-3, /*atol=*/1e-5); + }); + } + } +} + +TEST_F(LazyOpsTest, TestBatchNorm2D) { + int num_features = 3; + torch::Tensor input = + torch::rand({2, num_features, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = + torch::rand({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor bias = + torch::rand({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_mean = + torch::zeros({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_var = + torch::ones({num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + double momentum = 0.1; + double eps = 0.5; + torch::Tensor undef; + for (bool training : {true, false}) { + for (bool undef_weight_bias : {false, true}) { + torch::Tensor output = torch::batch_norm( + /*input=*/input, /*weight=*/undef_weight_bias ? undef : weight, + /*bias=*/undef_weight_bias ? undef : bias, + /*running_mean=*/running_mean, /*running_var=*/running_var, + /*training=*/training, /*momentum=*/momentum, /*eps=*/eps, + /*cudnn_enabled=*/false); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = + undef_weight_bias ? 
undef : CopyToDevice(weight, device); + torch::Tensor lazy_bias = + undef_weight_bias ? undef : CopyToDevice(bias, device); + torch::Tensor lazy_running_mean = CopyToDevice(running_mean, device); + torch::Tensor lazy_running_var = CopyToDevice(running_var, device); + torch::Tensor lazy_output = torch::batch_norm( + /*input=*/lazy_input, /*weight=*/lazy_weight, /*bias=*/lazy_bias, + /*running_mean=*/lazy_running_mean, /*running_var=*/lazy_running_var, + /*training=*/training, /*momentum=*/momentum, /*eps=*/eps, + /*cudnn_enabled=*/false); + AllClose(output, lazy_output, /*rtol=*/1e-3, /*atol=*/1e-5); + }); + } + } +} + +TEST_F(LazyOpsTest, TestDim) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + EXPECT_EQ(input.dim(), lazy_input.dim()); + }); +} + +TEST_F(LazyOpsTest, TestContiguous) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::native::contiguous(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::native::contiguous(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestSqueezeAll) { + torch::Tensor input = + torch::rand({2, 1, 3, 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::squeeze(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::squeeze(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestSqueezeAllInPlace) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {2, 1, 3, 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.squeeze_(); + torch::Tensor lazy_output = lazy_input.squeeze_(); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + ASSERT_EQ(input.dim(), lazy_input.dim()); + for (int64_t dim_idx = 0; dim_idx < input.dim(); ++dim_idx) { + ASSERT_EQ(input.size(dim_idx), lazy_input.size(dim_idx)); + } + }); +} + +TEST_F(LazyOpsTest, TestSqueezeOne) { + torch::Tensor input = + torch::rand({2, 1, 3, 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = torch::squeeze(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::squeeze(lazy_input, dim); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestSqueezeOneInPlace) { + int rank = 4; + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {2, 1, 3, 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.squeeze_(dim); + torch::Tensor lazy_output = lazy_input.squeeze_(dim); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + ASSERT_EQ(input.dim(), lazy_input.dim()); + for (int64_t dim_idx = 0; dim_idx < input.dim(); ++dim_idx) { + ASSERT_EQ(input.size(dim_idx), lazy_input.size(dim_idx)); 
+ } + }); + } +} + +TEST_F(LazyOpsTest, TestUnsqueeze) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim() + 1; + for (int dim = -rank; dim < rank; ++dim) { + torch::Tensor output = torch::unsqueeze(input, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::unsqueeze(lazy_input, dim); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestUnsqueezeInPlace) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim() + 1; + for (int dim = -rank; dim < rank; ++dim) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.unsqueeze_(dim); + torch::Tensor lazy_output = lazy_input.unsqueeze_(dim); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + ASSERT_EQ(input.dim(), lazy_input.dim()); + for (int64_t dim_idx = 0; dim_idx < input.dim(); ++dim_idx) { + ASSERT_EQ(input.size(dim_idx), lazy_input.size(dim_idx)); + } + }); + } +} + +TEST_F(LazyOpsTest, TestMaskedFill) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor mask = torch::randint( + 0, 2, {2, 3}, torch::TensorOptions(torch::kBool).device(DefaultDevice())); + torch::Scalar value(42); + torch::Tensor result = torch::masked_fill(input, mask, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_mask = CopyToDevice(mask, device); + torch::Tensor lazy_result = torch::masked_fill(lazy_input, lazy_mask, value); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestMaskedFillInPlace) { + torch::Scalar value(42); + torch::Tensor mask = torch::randint( + 0, 2, {2, 3}, torch::TensorOptions(torch::kBool).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_mask = CopyToDevice(mask, device); + torch::Tensor result = input.masked_fill_(mask, value); + torch::Tensor lazy_result = lazy_input.masked_fill_(lazy_mask, value); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestMaskedFillBroadcast) { + torch::Tensor input = + torch::rand({2, 5, 4, 3}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor mask = torch::randint( + 0, 2, {4, 1}, torch::TensorOptions(torch::kBool).device(DefaultDevice())); + torch::Scalar value(42); + torch::Tensor result = torch::masked_fill(input, mask, value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_mask = CopyToDevice(mask, device); + torch::Tensor lazy_result = torch::masked_fill(lazy_input, lazy_mask, value); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestFill) { + torch::Scalar value(42); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::empty( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = torch::fill_(input, value); + torch::Tensor lazy_result = 
torch::fill_(lazy_input, value); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestFillWithRank0) { + torch::Tensor value = torch::scalar_tensor(42); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::empty( + {2, 3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = torch::fill_(input, value); + torch::Tensor lazy_value = CopyToDevice(value, device); + torch::Tensor lazy_result = torch::fill_(lazy_input, value); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestPermute) { + torch::Tensor input = torch::rand( + {2, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector<std::vector<int64_t>> dims_permutations = { + {0, 1, 2}, {0, 2, 1}, {1, 0, 2}, {1, 2, 0}, {2, 0, 1}, {2, 1, 0}}; + int rank = input.dim(); + for (std::vector<int64_t> dims_permutation : dims_permutations) { + for (bool negative_dims : {false, true}) { + if (negative_dims) { + std::for_each(dims_permutation.begin(), dims_permutation.end(), + [rank](int64_t& dim) { dim -= rank; }); + } + torch::Tensor output = input.permute(dims_permutation); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = lazy_input.permute(dims_permutation); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestPermuteMod) { + std::vector<std::vector<int64_t>> dims_permutations = { + {0, 1, 2}, {0, 2, 1}, {1, 0, 2}, {1, 2, 0}, {2, 0, 1}, {2, 1, 0}}; + std::vector<int64_t> input_sizes = {2, 3, 4}; + int rank = input_sizes.size(); + for (std::vector<int64_t> dims_permutation : dims_permutations) { + for (bool negative_dims : {false, true}) { + if (negative_dims) { + std::for_each(dims_permutation.begin(), dims_permutation.end(), + [rank](int64_t& dim) { dim -= rank; }); + } + torch::Tensor input = torch::zeros( + input_sizes, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor one = torch::tensor( + 1.0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = input.permute(dims_permutation); + output.add_(one, 1.0); + input.add_(one, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor xinput = torch::zeros( + input_sizes, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(xinput, device); + torch::Tensor lazy_one = CopyToDevice(one, device); + torch::Tensor lazy_output = lazy_input.permute(dims_permutation); + lazy_output.add_(lazy_one, 1.0); + lazy_input.add_(lazy_one, 1.0); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); + } + } +} + +TEST_F(LazyOpsTest, TestFlip) { + torch::Tensor input = torch::rand( + {2, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector<std::vector<int64_t>> dim_powerset = { + {0}, {1}, {2}, {0, 1}, {1, 2}, {2, 0}, {0, 1, 2}}; + for (std::vector<int64_t> flip_dims : dim_powerset) { + for (bool negative_dims : {false, true}) { + if (negative_dims) { + std::for_each(flip_dims.begin(), flip_dims.end(), + [](int64_t& dim) { dim -= 3; }); + } + torch::Tensor output = torch::flip(input, flip_dims); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::flip(lazy_input, flip_dims); + AllClose(output, lazy_output); + }); + } + } +} + +TEST_F(LazyOpsTest, TestPixelShuffle) { + torch::Tensor input = +
torch::rand({5, 18, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int upscale_factor = 3; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = torch::pixel_shuffle(input, upscale_factor); + torch::Tensor lazy_output = torch::pixel_shuffle(lazy_input, upscale_factor); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestSumToSize) { + torch::Tensor input = + torch::rand({4, 6, 3, 7}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector<int64_t> out_size = {4, 1, 1, 7}; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.sum_to_size(out_size); + torch::Tensor lazy_output = lazy_input.sum_to_size(out_size); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestTransposeDims) { + torch::Tensor input = torch::rand( + {2, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int dim0 = 0; + int dim1 = 2; + torch::Tensor output = torch::transpose(input, dim0, dim1); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::transpose(lazy_input, dim0, dim1); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestTransposeDimsMod) { + std::vector<int64_t> input_sizes = {2, 3, 4}; + int dim0 = 0; + int dim1 = 2; + torch::Tensor input = torch::zeros( + input_sizes, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor one = torch::tensor( + 1.0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::transpose(input, dim0, dim1); + output.add_(one, 1.0); + input.add_(one, 1.0); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor xinput = torch::zeros( + input_sizes, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(xinput, device); + torch::Tensor lazy_one = CopyToDevice(one, device); + torch::Tensor lazy_output = torch::transpose(lazy_input, dim0, dim1); + lazy_output.add_(lazy_one, 1.0); + lazy_input.add_(lazy_one, 1.0); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestTransposeDimsInPlace) { + torch::Tensor input = torch::rand( + {2, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int dim0 = 0; + int dim1 = 2; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.transpose_(dim0, dim1); + torch::Tensor lazy_output = lazy_input.transpose_(dim0, dim1); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestSplit) { + torch::Tensor input = torch::rand( + {7, 8, 9}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + int rank = input.dim(); + for (int split_size : {2, 3}) { + for (int dim = -rank; dim < rank; ++dim) { + std::vector<torch::Tensor> outputs = torch::split(input, split_size, dim); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + std::vector<torch::Tensor> lazy_outputs = + torch::split(lazy_input, split_size, dim); + ASSERT_EQ(outputs.size(), lazy_outputs.size()); + for (size_t i = 0; i < outputs.size(); ++i) { + AllClose(outputs[i], lazy_outputs[i]); + } + }); + } + } +} + +TEST_F(LazyOpsTest, TestSplitEmpty) { + torch::Tensor input = torch::rand( + {0},
torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  int split_size = 0;
+  int dim = 0;
+  std::vector<torch::Tensor> outputs = torch::split(input, split_size, dim);
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_input = CopyToDevice(input, device);
+    std::vector<torch::Tensor> lazy_outputs =
+        torch::split(lazy_input, split_size, dim);
+    ASSERT_EQ(outputs.size(), lazy_outputs.size());
+    for (size_t i = 0; i < outputs.size(); ++i) {
+      AllClose(outputs[i], lazy_outputs[i]);
+    }
+  });
+}
+
+TEST_F(LazyOpsTest, TestSplitWithSizes) {
+  torch::Tensor input =
+      torch::rand({15, 15, 15},
+                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  int rank = input.dim();
+  for (int dim = -rank; dim < rank; ++dim) {
+    std::vector<torch::Tensor> outputs =
+        torch::split_with_sizes(input, {4, 5, 6}, dim);
+    ForEachDevice([&](const torch::Device& device) {
+      torch::Tensor lazy_input = CopyToDevice(input, device);
+      std::vector<torch::Tensor> lazy_outputs =
+          torch::split_with_sizes(lazy_input, {4, 5, 6}, dim);
+      ASSERT_EQ(outputs.size(), lazy_outputs.size());
+      for (size_t i = 0; i < outputs.size(); ++i) {
+        AllClose(outputs[i], lazy_outputs[i]);
+      }
+    });
+  }
+}
+
+TEST_F(LazyOpsTest, TestCrossImplicitDim) {
+  std::vector<std::vector<int64_t>> dim_sizes = {
+      {4, 5, 3}, {4, 3, 5}, {3, 4, 5}};
+  for (auto dim_size : dim_sizes) {
+    torch::Tensor input = torch::rand(
+        dim_size, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+    torch::Tensor other = torch::rand(
+        dim_size, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+    torch::Tensor result = torch::cross(input, other);
+    ForEachDevice([&](const torch::Device& device) {
+      torch::Tensor lazy_input = CopyToDevice(input, device);
+      torch::Tensor lazy_other = CopyToDevice(other, device);
+      torch::Tensor lazy_result = torch::cross(lazy_input, lazy_other);
+      AllClose(result, lazy_result);
+    });
+  }
+}
+
+TEST_F(LazyOpsTest, TestCrossExplicitDim) {
+  std::vector<int64_t> dim_size = {3, 3};
+  torch::Tensor input = torch::rand(
+      dim_size, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  torch::Tensor other = torch::rand(
+      dim_size, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  int rank = dim_size.size();
+  for (int dim = -rank; dim < rank; ++dim) {
+    torch::Tensor result = torch::cross(input, other, dim);
+    ForEachDevice([&](const torch::Device& device) {
+      torch::Tensor lazy_input = CopyToDevice(input, device);
+      torch::Tensor lazy_other = CopyToDevice(other, device);
+      torch::Tensor lazy_result = torch::cross(lazy_input, lazy_other, dim);
+      AllClose(result, lazy_result);
+    });
+  }
+}
+
+TEST_F(LazyOpsTest, TestCrossZeroDim) {
+  torch::Tensor input =
+      torch::rand({0, 1, 3, 0},
+                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  torch::Tensor result = torch::cross(input, input);
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_input = CopyToDevice(input, device);
+    torch::Tensor lazy_result = torch::cross(lazy_input, lazy_input);
+    AllClose(result, lazy_result);
+  });
+}
+
+TEST_F(LazyOpsTest, TestTriu) {
+  int size = 5;
+  torch::Tensor input =
+      torch::rand({size, size},
+                  torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  // Test all diagonals and out of bounds (must be no-op).
+ for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::triu(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::triu(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestTriuNonSquare) { + int size = 5; + torch::Tensor input = + torch::rand({size, size + 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::triu(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::triu(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestTriuBatch) { + int size = 5; + int batch_size = 3; + torch::Tensor input = + torch::rand({batch_size, size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::triu(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::triu(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestTril) { + int size = 5; + torch::Tensor input = + torch::rand({size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::tril(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::tril(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestTrilNonSquare) { + int size = 5; + torch::Tensor input = + torch::rand({size, size + 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::tril(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::tril(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestTrilBatch) { + int size = 5; + int batch_size = 3; + torch::Tensor input = + torch::rand({batch_size, size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::tril(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::tril(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestTriuInPlace) { + int size = 5; + // Test all diagonals and out of bounds (must be no-op). 
+ for (int diagonal = -size; diagonal <= size; ++diagonal) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.triu_(diagonal); + torch::Tensor lazy_output = lazy_input.triu_(diagonal); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); + } +} + +TEST_F(LazyOpsTest, TestTrilInPlace) { + int size = 5; + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = input.tril_(diagonal); + torch::Tensor lazy_output = lazy_input.tril_(diagonal); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); + } +} + +TEST_F(LazyOpsTest, TestTrace) { + int n = 5; + torch::Tensor input = torch::rand( + {n, n}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::trace(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::trace(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestTraceWide) { + int lines = 3; + int cols = 5; + torch::Tensor input = + torch::rand({lines, cols}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::trace(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::trace(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestTraceNarrow) { + int lines = 5; + int cols = 3; + torch::Tensor input = + torch::rand({lines, cols}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor output = torch::trace(input); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::trace(lazy_input); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestDiagRank1) { + int size = 7; + torch::Tensor input = torch::rand( + {size}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -2 * size; diagonal <= 2 * size; ++diagonal) { + torch::Tensor output = torch::diag(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::diag(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestDiagRank2) { + int size = 7; + torch::Tensor input = + torch::rand({size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). 
+ for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::diag(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::diag(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestDiagFlat) { + torch::Tensor input = + torch::rand({4, 3, 6, 7}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int diagonal = -10; diagonal < 10; ++diagonal) { + torch::Tensor output = torch::diagflat(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::diagflat(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestDiagonal) { + int size = 5; + torch::Tensor input = + torch::rand({size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::diagonal(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::diagonal(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestDiagonalUpdate) { + int size = 5; + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + auto input = torch::rand({size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + auto input_clone = input.clone(); + auto output = torch::diagonal(input, diagonal); + output.add_(1); + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_clone, device); + torch::Tensor lazy_output = torch::diagonal(lazy_input, diagonal); + lazy_output.add_(1); + + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); + } +} + +TEST_F(LazyOpsTest, TestDiagonalNonSquare) { + int size = 5; + torch::Tensor input = + torch::rand({size, size + 1}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). + for (int diagonal = -size; diagonal <= size; ++diagonal) { + torch::Tensor output = torch::diagonal(input, diagonal); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::diagonal(lazy_input, diagonal); + AllClose(output, lazy_output); + }); + } +} + +TEST_F(LazyOpsTest, TestDiagonalBatch) { + int size = 5; + int batch_size = 3; + int dim1 = 1; + int dim2 = 2; + torch::Tensor input = + torch::rand({batch_size, size, size}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + // Test all diagonals and out of bounds (must be no-op). 
+  for (int diagonal = -size; diagonal <= size; ++diagonal) {
+    torch::Tensor output =
+        torch::diagonal(input, diagonal, /*dim1=*/dim1, /*dim2=*/dim2);
+    ForEachDevice([&](const torch::Device& device) {
+      torch::Tensor lazy_input = CopyToDevice(input, device);
+      torch::Tensor lazy_output =
+          torch::diagonal(lazy_input, diagonal, /*dim1=*/dim1, /*dim2=*/dim2);
+      AllClose(output, lazy_output);
+    });
+  }
+}
+
+TEST_F(LazyOpsTest, TestFlatten) {
+  torch::Tensor input = torch::rand({4, 7, 5, 3});
+  int rank = input.dim();
+  for (int pos_start_dim = 0; pos_start_dim < rank; ++pos_start_dim) {
+    for (int pos_end_dim = pos_start_dim; pos_end_dim < rank; ++pos_end_dim) {
+      for (bool negative_start_dim : {false, true}) {
+        for (bool negative_end_dim : {false, true}) {
+          int start_dim =
+              negative_start_dim ? pos_start_dim - rank : pos_start_dim;
+          int end_dim = negative_end_dim ? pos_end_dim - rank : pos_end_dim;
+          torch::Tensor output = torch::flatten(input, start_dim, end_dim);
+          ForEachDevice([&](const torch::Device& device) {
+            torch::Tensor lazy_input = CopyToDevice(input, device);
+            torch::Tensor lazy_output =
+                torch::flatten(lazy_input, start_dim, end_dim);
+            AllClose(output, lazy_output);
+          });
+        }
+      }
+    }
+  }
+}
+
+TEST_F(LazyOpsTest, TestLogicalAnd) {
+  for (torch::ScalarType scalar_type1 :
+       {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt,
+        torch::kLong}) {
+    torch::Tensor lhs =
+        isFloatingType(scalar_type1)
+            ? torch::rand({3, 4}, torch::TensorOptions(scalar_type1))
+            : torch::randint(0, 100, {3, 4},
+                             torch::TensorOptions(scalar_type1));
+    for (torch::ScalarType scalar_type2 :
+         {torch::kFloat, torch::kByte, torch::kChar, torch::kShort, torch::kInt,
+          torch::kLong}) {
+      torch::Tensor rhs =
+          isFloatingType(scalar_type2)
+              ? torch::rand({3, 4}, torch::TensorOptions(scalar_type2))
+              : torch::randint(1, 100, {3, 4},
+                               torch::TensorOptions(scalar_type2));
+      torch::Tensor result = torch::logical_and(lhs, rhs);
+      ForEachDevice([&](const torch::Device& device) {
+        torch::Tensor lazy_lhs = CopyToDevice(lhs, device);
+        torch::Tensor lazy_rhs = CopyToDevice(rhs, device);
+        torch::Tensor lazy_result = torch::logical_and(lazy_lhs, lazy_rhs);
+        AllEqual(result, lazy_result);
+      });
+    }
+  }
+
+  ExpectCounterNotChanged("aten::.*", GetIgnoredCounters());
+  ExpectCounterChanged("xla::logical_and_out", GetIgnoredCounters());
+}
+
+TEST_F(LazyOpsTest, TestBitwiseAnd) {
+  torch::Tensor lhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Tensor rhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Tensor result = lhs.__and__(rhs);
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_lhs = CopyToDevice(lhs, device);
+    torch::Tensor lazy_rhs = CopyToDevice(rhs, device);
+    torch::Tensor lazy_result = lazy_lhs.__and__(lazy_rhs);
+    AllEqual(result, lazy_result);
+  });
+}
+
+TEST_F(LazyOpsTest, TestBitwiseAndInPlace) {
+  torch::Tensor lhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Tensor rhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_lhs = CopyToDevice(lhs, device);
+    torch::Tensor result = lhs.__iand__(rhs);
+    torch::Tensor lazy_rhs = CopyToDevice(rhs, device);
+    torch::Tensor lazy_result = lazy_lhs.__iand__(lazy_rhs);
+    AllEqual(result, lazy_result);
+    AllEqual(lhs, lazy_lhs);
+  });
+}
+
+TEST_F(LazyOpsTest, TestBitwiseAndScalar) {
+  torch::Tensor lhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Scalar rhs(123456789);
+  torch::Tensor result = lhs.__and__(rhs);
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_lhs = CopyToDevice(lhs, device);
+    torch::Tensor lazy_result = lazy_lhs.__and__(rhs);
+    AllEqual(result, lazy_result);
+  });
+}
+
+TEST_F(LazyOpsTest, TestBitwiseAndScalarInPlace) {
+  torch::Tensor lhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Scalar rhs(123456789);
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_lhs = CopyToDevice(lhs, device);
+    torch::Tensor result = lhs.__iand__(rhs);
+    torch::Tensor lazy_result = lazy_lhs.__iand__(rhs);
+    AllEqual(result, lazy_result);
+    AllEqual(lhs, lazy_lhs);
+  });
+}
+
+TEST_F(LazyOpsTest, TestBitwiseAndPromotion) {
+  torch::Tensor input = torch::rand(
+      {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice()));
+  torch::Tensor view = input.reshape(-1);
+  torch::Tensor result = torch::__and__(view.gt(0), view.ne(0));
+  ForEachDevice([&](const torch::Device& device) {
+    torch::Tensor lazy_input = CopyToDevice(input, device);
+    torch::Tensor lazy_view = lazy_input.reshape(-1);
+    torch::Tensor lazy_result = torch::__and__(lazy_view.gt(0), lazy_view.ne(0));
+    AllEqual(result, lazy_result);
+  });
+}
+
+TEST_F(LazyOpsTest, TestBitwiseOr) {
+  torch::Tensor lhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Tensor rhs = torch::randint(0, std::numeric_limits<int32_t>::max(),
+                                     {4, 2}, torch::TensorOptions(torch::kInt));
+  torch::Tensor result = lhs.__or__(rhs);
+
ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor lazy_rhs = CopyToDevice(rhs, device); + torch::Tensor lazy_result = lazy_lhs.__or__(lazy_rhs); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseOrInPlace) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Tensor rhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor result = lhs.__ior__(rhs); + torch::Tensor lazy_rhs = CopyToDevice(rhs, device); + torch::Tensor lazy_result = lazy_lhs.__ior__(lazy_rhs); + AllEqual(result, lazy_result); + AllEqual(lhs, lazy_lhs); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseOrScalar) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Scalar rhs(123456789); + torch::Tensor result = lhs.__or__(rhs); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor lazy_result = lazy_lhs.__or__(rhs); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseOrScalarInPlace) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Scalar rhs(123456789); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor result = lhs.__ior__(rhs); + torch::Tensor lazy_result = lazy_lhs.__ior__(rhs); + AllEqual(result, lazy_result); + AllEqual(lhs, lazy_lhs); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseXor) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Tensor rhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Tensor result = lhs.__xor__(rhs); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor lazy_rhs = CopyToDevice(rhs, device); + torch::Tensor lazy_result = lazy_lhs.__xor__(lazy_rhs); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseXorInPlace) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Tensor rhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor result = lhs.__ixor__(rhs); + torch::Tensor lazy_rhs = CopyToDevice(rhs, device); + torch::Tensor lazy_result = lazy_lhs.__ixor__(lazy_rhs); + AllEqual(result, lazy_result); + AllEqual(lhs, lazy_lhs); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseXorScalar) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + torch::Scalar rhs(123456789); + torch::Tensor result = lhs.__xor__(rhs); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor lazy_result = lazy_lhs.__xor__(rhs); + AllEqual(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestBitwiseXorScalarInPlace) { + torch::Tensor lhs = torch::randint(0, std::numeric_limits::max(), + {4, 2}, torch::TensorOptions(torch::kInt)); + 
torch::Scalar rhs(123456789); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_lhs = CopyToDevice(lhs, device); + torch::Tensor result = lhs.__ixor__(rhs); + torch::Tensor lazy_result = lazy_lhs.__ixor__(rhs); + AllEqual(result, lazy_result); + AllEqual(lhs, lazy_lhs); + }); +} + +TEST_F(LazyOpsTest, TestLshift) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor shift_amount = torch::randint( + 16, input.sizes(), torch::TensorOptions().device(DefaultDevice())); + torch::Tensor result = torch::__lshift__(input, shift_amount); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_shift_amount = CopyToDevice(shift_amount, device); + torch::Tensor lazy_result = torch::__lshift__(lazy_input, lazy_shift_amount); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestLshiftInPlace) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor shift_amount = torch::randint( + 16, input.sizes(), torch::TensorOptions().device(DefaultDevice())); + torch::Tensor result = input.__ilshift__(shift_amount); + torch::Tensor lazy_shift_amount = CopyToDevice(shift_amount, device); + torch::Tensor lazy_result = lazy_input.__ilshift__(lazy_shift_amount); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestLshiftScalar) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar shift_amount = 3; + torch::Tensor result = torch::__lshift__(input, shift_amount); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::__lshift__(lazy_input, shift_amount); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestLshiftScalarInPlace) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar shift_amount = 3; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = input.__ilshift__(shift_amount); + torch::Tensor lazy_result = lazy_input.__ilshift__(shift_amount); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestRshift) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor shift_amount = torch::randint( + 16, input.sizes(), torch::TensorOptions().device(DefaultDevice())); + torch::Tensor result = torch::__rshift__(input, shift_amount); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_shift_amount = CopyToDevice(shift_amount, device); + torch::Tensor lazy_result = torch::__rshift__(lazy_input, lazy_shift_amount); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestRshiftInPlace) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor shift_amount = torch::randint( + 16, input.sizes(), 
torch::TensorOptions().device(DefaultDevice())); + torch::Tensor result = input.__irshift__(shift_amount); + torch::Tensor lazy_shift_amount = CopyToDevice(shift_amount, device); + torch::Tensor lazy_result = lazy_input.__irshift__(lazy_shift_amount); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestRshiftScalar) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar shift_amount = 3; + torch::Tensor result = torch::__rshift__(input, shift_amount); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_result = torch::__rshift__(lazy_input, shift_amount); + AllClose(result, lazy_result); + }); +} + +TEST_F(LazyOpsTest, TestRshiftScalarInPlace) { + torch::Tensor input = torch::ones( + {4, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar shift_amount = 3; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor result = input.__irshift__(shift_amount); + torch::Tensor lazy_result = lazy_input.__irshift__(shift_amount); + AllClose(result, lazy_result); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestMeshgrid) { + torch::Tensor a = torch::rand( + {3}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor b = torch::rand( + {2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor c = torch::rand( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + auto d = torch::meshgrid({a, b, c}); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_a = CopyToDevice(a, device); + torch::Tensor lazy_b = CopyToDevice(b, device); + torch::Tensor lazy_c = CopyToDevice(c, device); + auto lazy_d = torch::meshgrid({lazy_a, lazy_b, lazy_c}); + EXPECT_EQ(d.size(), lazy_d.size()); + for (size_t i = 0; i < d.size(); ++i) { + AllClose(d[i], lazy_d[i]); + } + }); +} + +TEST_F(LazyOpsTest, TestConstantPad) { + torch::Tensor input = torch::rand( + {4, 2, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{1, 2, 3, 4, 5, 6}; + float pad_value = 5; + torch::Tensor output = torch::constant_pad_nd(input, pad, pad_value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::constant_pad_nd(lazy_input, pad, pad_value); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestConstantPadIncomplete) { + torch::Tensor input = torch::rand( + {4, 2, 5}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{1, 2}; + float pad_value = 5; + torch::Tensor output = torch::constant_pad_nd(input, pad, pad_value); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::constant_pad_nd(lazy_input, pad, pad_value); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReflectionPad2dRank3) { + torch::Tensor input = torch::rand( + {2, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{2, 2, 2, 2}; + torch::Tensor output = torch::reflection_pad2d(input, pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::reflection_pad2d(lazy_input, pad); + 
AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReflectionPad2dRank4) { + torch::Tensor input = + torch::rand({2, 2, 3, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{2, 2, 2, 2}; + torch::Tensor output = torch::reflection_pad2d(input, pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::reflection_pad2d(lazy_input, pad); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReflectionPad2dBackward) { + std::vector pad{2, 3, 1, 2}; + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::reflection_pad2d(inputs[0], pad); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({1, 2, 4, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestReplicationPad1d) { + torch::Tensor input = torch::rand( + {1, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{1, 2}; + torch::Tensor output = torch::replication_pad1d(input, pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::replication_pad1d(lazy_input, pad); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReplicationPad1dZeroPad) { + torch::Tensor input = torch::rand( + {1, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{1, 0}; + torch::Tensor output = torch::replication_pad1d(input, pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::replication_pad1d(lazy_input, pad); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReplicationPad1dBackward) { + std::vector pad{2, 3}; + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::replication_pad1d(inputs[0], pad); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestReplicationPad2d) { + torch::Tensor input = torch::rand( + {1, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{1, 2, 2, 1}; + torch::Tensor output = torch::replication_pad2d(input, pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::replication_pad2d(lazy_input, pad); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReplicationPad2dZeroPad) { + torch::Tensor input = torch::rand( + {1, 3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector pad{1, 0, 0, 1}; + torch::Tensor output = torch::replication_pad2d(input, pad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = torch::replication_pad2d(lazy_input, pad); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestReplicationPad2dBackward) { + std::vector pad{2, 3, 1, 1}; + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::replication_pad2d(inputs[0], pad); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 3, 4}, 
torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestAsStrided) { + torch::Tensor input = torch::rand( + {128, 320}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector size = {128, 20, 4, 4}; + std::vector stride = {320, 16, 4, 1}; + torch::Tensor output = + torch::as_strided(input, /*size=*/size, /*stride=*/stride); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::as_strided(lazy_input, /*size=*/size, /*stride=*/stride); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestAsStridedInPlace) { + torch::Tensor input = torch::rand( + {128, 320}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector size = {128, 20, 4, 4}; + std::vector stride = {320, 16, 4, 1}; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor output = + torch::as_strided_(input, /*size=*/size, /*stride=*/stride); + torch::Tensor lazy_output = + torch::as_strided_(lazy_input, /*size=*/size, /*stride=*/stride); + AllClose(output, lazy_output); + AllClose(input, lazy_input); + }); +} + +TEST_F(LazyOpsTest, TestAsStridedWithOffset) { + torch::Tensor input = torch::rand( + {4, 8, 2}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector size = {4, 4, 2}; + std::vector stride = {8, 2, 1}; + int64_t storage_offset = 4; + torch::Tensor output = + torch::as_strided(input, /*size=*/size, /*stride=*/stride, + /*storage_offset=*/storage_offset); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_output = + torch::as_strided(lazy_input, /*size=*/size, /*stride=*/stride, + /*storage_offset=*/storage_offset); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestAsStridedWithInplaceCopy) { + torch::Tensor grad = torch::ones( + {4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + std::vector size = {4}; + std::vector stride = {1}; + torch::Tensor output = torch::zeros({4}, grad.options()); + output.as_strided(size, stride).copy_(grad); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_grad = CopyToDevice(grad, device); + torch::Tensor lazy_output = torch::zeros({4}, lazy_grad.options()); + lazy_output.as_strided(size, stride).copy_(lazy_grad); + AllClose(output, lazy_output); + }); +} + +TEST_F(LazyOpsTest, TestEmptyStrided) { + std::vector size = {4, 4, 2}; + std::vector stride = {8, 2, 1}; + torch::Tensor output = torch::empty_strided(/*size=*/size, /*stride=*/stride); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_output = + torch::empty_strided(/*size=*/size, /*stride=*/stride); + EXPECT_EQ(output.sizes(), lazy_output.sizes()); + EXPECT_EQ(output.strides(), lazy_output.strides()); + }); +} + +TEST_F(LazyOpsTest, TestAvgPool2DBackward) { + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::avg_pool2d(inputs[0], + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({1, 1, 7, 7}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool3DBackward) { + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::avg_pool3d( + inputs[0], + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({1, 1, 7, 7, 7}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool2DNoBatchBackward) { + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::avg_pool2d(inputs[0], + /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({1, 7, 7}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAvgPool3DNoBatchBackward) { + int kernel_size = 2; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (bool count_include_pad : {true, false}) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::avg_pool3d( + inputs[0], + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*ceil_mode=*/ceil_mode, + /*count_include_pad=*/count_include_pad); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({1, 7, 7, 7}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool3DNoBatchBackward) { + if (IsCuda()) { + GTEST_SKIP(); + } + for (int64_t output_size : {7, 4}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::adaptive_avg_pool3d( + inputs[0], {output_size, output_size, output_size}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({1, 56, 28, 28}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool3DBackward) { + if (IsCuda()) { + GTEST_SKIP(); + } + for (int64_t output_size : {7, 4}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::adaptive_avg_pool3d( + inputs[0], {output_size, output_size, output_size}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({4, 1, 56, 28, 28}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool2DBackward) { + for (int64_t output_size : {7, 8}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::adaptive_avg_pool2d(inputs[0], {output_size, output_size}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({4, 1, 56, 56}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestAdaptiveAvgPool2DNoBatchBackward) { + for (int64_t output_size : {7, 8}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::adaptive_avg_pool2d(inputs[0], {output_size, output_size}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({1, 56, 56}, torch::TensorOptions(torch::kFloat) + .requires_grad(true))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestConv2D) { + int in_channels = 4; + int out_channels = 4; + int kernel_size = 3; + for (int stride = 1; stride <= 3; ++stride) { + for (int padding = 0; padding <= 2; ++padding) { + for (bool with_bias : {true, false}) { + for (int dilation = 1; dilation <= 3; ++dilation) { + for (int groups : + {1, 2, 4}) { // covers normal, grouped, depthwise conv. + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {1, in_channels, 7, 7}, + torch::TensorOptions(torch::kDouble).device(DefaultDevice())); + torch::Tensor weight = torch::rand( + {out_channels, in_channels / groups, kernel_size, + kernel_size}, + torch::TensorOptions(torch::kDouble).device(DefaultDevice())); + torch::Tensor bias = + with_bias ? 
torch::rand({out_channels}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice())) + : torch::Tensor(); + + torch::Tensor lazy_input = CopyToDevice(input, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_bias = + with_bias ? CopyToDevice(bias, device) : torch::Tensor(); + + torch::Tensor output = + torch::conv2d(input, weight, bias, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*dilation=*/{dilation, dilation}, groups); + torch::Tensor lazy_output = + torch::conv2d(lazy_input, lazy_weight, lazy_bias, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*dilation=*/{dilation, dilation}, groups); + AllClose(output, lazy_output); + }); + } + } + } + } + } +} + +TEST_F(LazyOpsTest, TestConv2DBackward) { + int in_channels = 4; + int out_channels = 4; + int kernel_size = 3; + for (int stride = 1; stride <= 3; ++stride) { + for (int padding = 0; padding <= 2; ++padding) { + for (bool with_bias : {true, false}) { + for (int dilation = 1; dilation <= 3; ++dilation) { + for (int groups : + {1, 2, 4}) { // covers normal, grouped, depthwise conv. + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::conv2d(inputs[0], inputs[1], inputs[2], + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, + /*dilation=*/{dilation, dilation}, groups); + }; + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor bias = + with_bias ? torch::rand({out_channels}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice())) + : torch::Tensor(); + TestBackward({torch::rand({1, in_channels, 7, 7}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)), + torch::rand({out_channels, in_channels / groups, + kernel_size, kernel_size}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)), + bias}, + device, testfn); + }); + } + }; + } + } + } +} + +TEST_F(LazyOpsTest, TestTransposedConv2DBackward) { + int in_channels = 4; + int out_channels = 4; + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (int dilation = 1; dilation <= 2; ++dilation) { + for (int output_padding = 0; + output_padding < std::max(stride, dilation); ++output_padding) { + for (bool with_bias : {true, false}) { + for (int groups : + {1, 2, 4}) { // covers normal, grouped, depthwise conv. + auto testfn = [&](const std::vector& inputs) + -> torch::Tensor { + return torch::conv_transpose2d( + inputs[0], inputs[1], inputs[2], + /*stride=*/{stride, stride + 1}, + /*padding=*/{padding, padding + 1}, + /*output_padding=*/output_padding, + /*groups=*/groups, + /*dilation=*/{dilation, dilation + 1}); + }; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand( + {4, out_channels, 7, 7}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor weight = + torch::rand({out_channels, in_channels / groups, + kernel_size, kernel_size}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor bias = + with_bias ? 
torch::rand({in_channels}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)) + : torch::Tensor(); + TestBackward({input, weight, bias}, device, testfn, + /*rtol=*/1e-5, /*atol=*/1e-5); + }); + } + }; + } + } + } + } +} + +TEST_F(LazyOpsTest, TestConv3DBackward) { + int in_channels = 4; + int out_channels = 4; + int kernel_size = 3; + for (int stride = 1; stride <= 3; ++stride) { + for (int padding = 1; padding <= 2; ++padding) { + for (bool with_bias : {true, false}) { + for (int dilation = 1; dilation <= 2; ++dilation) { + for (int groups : + {1, 2, 4}) { // covers normal, grouped, depthwise conv. + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::conv3d(inputs[0], inputs[1], inputs[2], + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + groups); + }; + + ForEachDevice([&](const torch::Device& device) { + torch::Tensor bias = + with_bias ? torch::rand({out_channels}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice())) + : torch::Tensor(); + TestBackward({torch::rand({4, in_channels, 7, 7, 7}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)), + torch::rand({out_channels, in_channels / groups, + kernel_size, kernel_size, kernel_size}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)), + bias}, + device, testfn); + }); + } + }; + } + } + } +} + +TEST_F(LazyOpsTest, TestTransposedConv3DBackward) { + int in_channels = 4; + int out_channels = 4; + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + for (int dilation = 1; dilation <= 2; ++dilation) { + for (int output_padding = 0; + output_padding < std::max(stride, dilation); ++output_padding) { + for (bool with_bias : {true, false}) { + for (int groups : + {1, 2, 4}) { // covers normal, grouped, depthwise conv. + auto testfn = [&](const std::vector& inputs) + -> torch::Tensor { + return torch::conv_transpose3d( + inputs[0], inputs[1], inputs[2], + /*stride=*/{stride, stride + 1, stride}, + /*padding=*/{padding, padding + 1, stride}, + /*output_padding=*/output_padding, + /*groups=*/groups, + /*dilation=*/{dilation, dilation + 1, dilation}); + }; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = + torch::rand({4, out_channels, 7, 7, 7}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor weight = + torch::rand({out_channels, in_channels / groups, + kernel_size, kernel_size, kernel_size}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor bias = + with_bias ? torch::rand({in_channels}, + torch::TensorOptions(torch::kDouble) + .device(DefaultDevice()) + .requires_grad(true)) + : torch::Tensor(); + TestBackward({input, weight, bias}, device, testfn); + }); + } + }; + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool2DBackward) { + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::max_pool2d( + inputs[0], /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{1, 1}, + /*ceil_mode=*/ceil_mode); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({1, 2, 8, 8}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3DBackward) { + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::max_pool3d( + inputs[0], + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, /*dilation=*/{1, 1, 1}, + /*ceil_mode=*/ceil_mode); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({1, 2, 4, 4, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool2DNoBatchBackward) { + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::max_pool2d( + inputs[0], /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{1, 1}, + /*ceil_mode=*/ceil_mode); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({2, 8, 8}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxPool3DNoBatchBackward) { + int kernel_size = 3; + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::max_pool3d( + inputs[0], + /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, /*dilation=*/{1, 1, 1}, + /*ceil_mode=*/ceil_mode); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({2, 4, 4, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxUnpool2DBackward) { + int kernel_size = 2; + torch::Tensor input = + torch::rand({2, 2, 8, 8}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. 
+ for (bool ceil_mode : {false, true}) { + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output; + torch::Tensor indices; + std::tie(output, indices) = torch::max_pool2d_with_indices( + input, /*kernel_size=*/{kernel_size, kernel_size}, + /*stride=*/{stride, stride}, + /*padding=*/{padding, padding}, /*dilation=*/{dilation, dilation}, + /*ceil_mode=*/ceil_mode); + + std::vector output_size({input.size(2), input.size(3)}); + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::max_unpool2d(inputs[0], inputs[1], output_size); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward({output.requires_grad_(true), indices}, device, + testfn); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestMaxUnpool3DBackward) { + int kernel_size = 2; + torch::Tensor input = + torch::rand({1, 1, 4, 4, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (int stride = 1; stride <= 2; ++stride) { + for (int padding = 0; padding <= 1; ++padding) { + // Test ceil_mode=true through the CPU interop. + for (bool ceil_mode : {false, true}) { + for (int dilation = 1; dilation <= 2; ++dilation) { + torch::Tensor output; + torch::Tensor indices; + std::tie(output, indices) = torch::max_pool3d_with_indices( + input, /*kernel_size=*/{kernel_size, kernel_size, kernel_size}, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}, + /*dilation=*/{dilation, dilation, dilation}, + /*ceil_mode=*/ceil_mode); + + std::vector output_size( + {input.size(2), input.size(3), input.size(4)}); + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::max_unpool3d(inputs[0], inputs[1], output_size, + /*stride=*/{stride, stride, stride}, + /*padding=*/{padding, padding, padding}); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward({output.requires_grad_(true), indices}, device, + testfn); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestTanhBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::tanh(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 2}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestSigmoidBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::sigmoid(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 2}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestLogSigmoidBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::log_sigmoid(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 2}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn, /*rtol=*/1e-3, /*atol=*/1e-5); + }); +} + +TEST_F(LazyOpsTest, TestLogSoftmaxBackward) { + for (int dim = -4; dim < 4; ++dim) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::log_softmax(inputs[0], dim); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({5, 3, 4, 2}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn, /*rtol=*/1e-3, /*atol=*/1e-4); + }); + } +} + 
+TEST_F(LazyOpsTest, TestSoftmaxBackward) { + for (int dim = -4; dim < 4; ++dim) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::softmax(inputs[0], dim); + }; + + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({5, 3, 4, 2}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn, /*rtol=*/1e-3, /*atol=*/1e-4); + }); + } +} + +TEST_F(LazyOpsTest, TestSoftplusBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::softplus(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 1, 4, 6}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn, /*rtol=*/1e-4); + }); +} + +TEST_F(LazyOpsTest, TestReluBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::relu(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 1, 4, 6}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestRreluBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::rrelu(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 1, 4, 6}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestHardshrinkBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::hardshrink(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::randn({100}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestSoftshrinkBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::softshrink(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::randn({100}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestHardtanhBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::hardtanh(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::randn({100}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestEluBackward) { + torch::Scalar alpha = 0.5; + torch::Scalar scale = 2.5; + torch::Scalar input_scale = 1.5; + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::elu(inputs[0], alpha, scale, input_scale); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 1, 4, 6}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestGeluBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::gelu(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 3}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + ExpectCounterChanged("lazy::gelu_backward", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, 
TestLeakyReluBackward) { + double negative_slope = 0.01; + auto testfn = [=](const std::vector& inputs) -> torch::Tensor { + return torch::leaky_relu(inputs[0], negative_slope); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 1, 4, 6}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestTransposeBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::t(inputs[0]); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({2, 3}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestAddMatMulBackward) { + int in_channels = 32; + int out_channels = 320; + int labels = 50; + // Test beta != 1. through the CPU interop. + for (double beta : {1., 2.}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::addmm(inputs[0], inputs[1], inputs[2], /*beta=*/beta); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({torch::rand({labels}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)), + torch::rand({in_channels, out_channels}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)), + torch::rand({out_channels, labels}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); + } +} + +TEST_F(LazyOpsTest, TestBinaryCrossEntropyBackward) { + int batch = 6; + int classes = 2; + // TODO(asuhan): Fix the torch::kDouble case. + for (auto dtype : {torch::kFloat}) { + for (bool def_weight : {false, true}) { + torch::Tensor input = torch::rand( + {batch, classes}, torch::TensorOptions(dtype).requires_grad(true)); + torch::Tensor target = + torch::rand({batch, classes}, torch::TensorOptions(dtype)); + torch::Tensor weight; + if (def_weight) { + weight = torch::rand({batch, classes}, torch::TensorOptions(dtype)); + } + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::binary_cross_entropy( + /*self=*/inputs[0], /*target=*/inputs[1], + /*weight=*/inputs[2], + /*reduction=*/reduction); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({input, target, weight}, device, testfn, /*rtol=*/1e-4, + /*atol=*/1e-7); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestNllLossBackward) { + // TODO(whc) debug divide-by-zero failure under ASAN + GTEST_SKIP(); + + int batch = 6; + int classes = 2; + // TODO(asuhan): Fix the torch::kDouble case. 
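+  // Targets are drawn from [min(ignore_index, 0), classes) so that a negative
+  // ignore_index can actually appear in the target tensor; entries equal to
+  // ignore_index are excluded from both the loss and its gradient.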
+ for (auto dtype : {torch::kFloat}) { + for (int ignore_index : {-1, 0, 1, 5}) { + for (bool def_weight : {false, true}) { + torch::Tensor input = + torch::rand({batch, classes}, torch::TensorOptions(dtype) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor target = torch::randint( + std::min(ignore_index, 0), classes, {batch}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor weight; + if (def_weight) { + weight = torch::rand( + {classes}, torch::TensorOptions(dtype).device(DefaultDevice())); + } + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::nll_loss( + /*self=*/inputs[0], /*target=*/inputs[1], + /*weight=*/inputs[2], + /*reduction=*/reduction, /*ignore_index=*/ignore_index); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({input, target, weight}, device, testfn, /*rtol=*/1e-5, + /*atol=*/1e-8); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestNllLoss2dBackward) { + int batch = 6; + int classes = 2; + int height = 3; + int width = 3; + // TODO(asuhan): Fix the torch::kDouble case. + for (auto dtype : {torch::kFloat}) { + for (int ignore_index : {-1, 0, 1, 5}) { + for (bool def_weight : {false, true}) { + torch::Tensor input = torch::rand({batch, classes, height, width}, + torch::TensorOptions(dtype) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor target = torch::randint( + std::min(ignore_index, 0), classes, {batch, height, width}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + torch::Tensor weight; + if (def_weight) { + weight = torch::rand( + {classes}, torch::TensorOptions(dtype).device(DefaultDevice())); + } + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::nll_loss2d( + /*self=*/inputs[0], /*target=*/inputs[1], + /*weight=*/inputs[2], + /*reduction=*/reduction, /*ignore_index=*/ignore_index); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({input, target, weight}, device, testfn, /*rtol=*/1e-5, + /*atol=*/1e-8); + }); + } + } + } + } +} + +TEST_F(LazyOpsTest, TestSmoothL1LossBackward) { + torch::Tensor input = torch::randn({2, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor target = torch::randn( + {2, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + for (double beta : {0.25, 1.}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::smooth_l1_loss(/*input=*/inputs[0], /*target=*/inputs[1], + /*reduction=*/reduction, /*beta=*/beta); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({input, target}, device, testfn, /*rtol=*/1e-5, + /*atol=*/1e-8); + }); + } + } +} + +TEST_F(LazyOpsTest, TestViewBackward) { + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return inputs[0].view({-1, 320}); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward( + {torch::rand({32, 20, 4, 4}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true))}, + device, testfn); + }); +} + +TEST_F(LazyOpsTest, TestBatchNorm2DBackward) { + double 
momentum = 0.1; + double eps = 0.5; + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::batch_norm( + /*input=*/inputs[0], /*weight=*/inputs[1], /*bias=*/inputs[2], + /*running_mean=*/inputs[3], /*running_var=*/inputs[4], + /*training=*/true, /*momentum=*/momentum, /*eps=*/eps, + /*cudnn_enabled=*/false); + }; + int num_features = 3; + torch::Tensor undef; + for (bool undef_weight_bias : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand({2, num_features, 4, 4}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor weight = + undef_weight_bias + ? undef + : torch::rand({num_features}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor bias = + undef_weight_bias + ? undef + : torch::rand({num_features}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor running_mean = torch::zeros( + {num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_var = torch::ones( + {num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + TestBackward({input, weight, bias, running_mean, running_var}, device, + testfn, + /*rtol=*/1e-3, /*atol=*/1e-4); + }); + } +} + +TEST_F(LazyOpsTest, TestBatchNorm3DBackward) { + double momentum = 0.1; + double eps = 0.5; + auto testfn = [&](const std::vector& inputs) -> torch::Tensor { + return torch::batch_norm( + /*input=*/inputs[0], /*weight=*/inputs[1], /*bias=*/inputs[2], + /*running_mean=*/inputs[3], /*running_var=*/inputs[4], + /*training=*/true, /*momentum=*/momentum, /*eps=*/eps, + /*cudnn_enabled=*/false); + }; + int num_features = 3; + torch::Tensor undef; + for (bool undef_weight_bias : {false, true}) { + ForEachDevice([&](const torch::Device& device) { + torch::Tensor input = torch::rand({2, num_features, 4, 4, 2}, + torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor weight = + undef_weight_bias + ? undef + : torch::rand({num_features}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor bias = + undef_weight_bias + ? 
undef + : torch::rand({num_features}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor running_mean = torch::zeros( + {num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor running_var = torch::ones( + {num_features}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + TestBackward({input, weight, bias, running_mean, running_var}, device, + testfn, + /*rtol=*/1e-3, /*atol=*/1e-3); + }); + } +} + +TEST_F(LazyOpsTest, TestBCEWithLogitsBackward) { + int batch = 10; + int classes = 5; + torch::Tensor undef; + for (torch::Reduction::Reduction reduction : + {torch::Reduction::None, torch::Reduction::Mean, + torch::Reduction::Sum}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::binary_cross_entropy_with_logits( + /*input=*/inputs[0], /*target=*/inputs[1], /*weight=*/inputs[2], + /*pos_weight=*/inputs[3], + /*reduction=*/reduction); + }; + for (bool undef_weight : {false, true}) { + for (bool undef_pos_weight : {false, true}) { + torch::Tensor input = + torch::rand({batch, classes}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor target = + torch::rand({batch, classes}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor weight = + undef_weight + ? undef + : torch::rand({classes}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice())); + torch::Tensor pos_weight = + undef_pos_weight + ? undef + : torch::rand({classes}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + TestBackward({input, target, weight, pos_weight}, device, testfn, + /*rtol=*/1e-3, /*atol=*/1e-5); + }); + } + } + } +} + +TEST_F(LazyOpsTest, TestKlDivBackward) { + torch::Tensor input = torch::rand({4, 3}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor target = torch::rand({4, 3}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + for (torch::Reduction::Reduction reduction : + {torch::Reduction::Mean, torch::Reduction::Sum, + torch::Reduction::None}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::kl_div(/*self=*/inputs[0], /*target=*/inputs[1], reduction); + }; + ForEachDevice([&](const torch::Device& device) { + TestBackward({input, target}, device, testfn, /*rtol=*/1e-4, + /*atol=*/1e-5); + }); + } +} + +TEST_F(LazyOpsTest, TestEmbeddingBackward) { + int num_weights = 32; + for (int padding_idx = -1; padding_idx < num_weights; ++padding_idx) { + for (bool scale_grad_by_freq : {false, true}) { + auto testfn = + [&](const std::vector& inputs) -> torch::Tensor { + return torch::embedding(inputs[0], inputs[1], + /*padding_idx=*/padding_idx, + /*scale_grad_by_freq=*/scale_grad_by_freq, + /*sparse=*/false); + }; + ForEachDevice([&](const torch::Device& device) { + torch::Tensor weight = + torch::rand({num_weights, 7}, torch::TensorOptions(torch::kFloat) + .device(DefaultDevice()) + .requires_grad(true)); + torch::Tensor indices = torch::randint( + num_weights, {3, 9, 4}, + torch::TensorOptions(torch::kLong).device(DefaultDevice())); + TestBackward({weight, indices}, device, testfn, /*rtol=*/1e-5, + /*atol=*/1e-8); + }); + } + } +} + +TEST_F(LazyOpsTest, TestAmpForeachNonFiniteCheckAndUnscale) { + if (IsCuda()) { + // TODO(whc) debug failure on cuda + GTEST_SKIP(); + } + + 
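+  // _amp_foreach_non_finite_check_and_unscale_ multiplies each grad tensor by
+  // inv_scale in place and sets found_inf to 1 if any element is non-finite.
+  // grads0 (all finite) should come back as grads0 * inv_scale with found_inf
+  // still 0; grads1 contains a NaN, so found_inf should flip to 1.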
torch::Tensor grads0 = torch::tensor( + {1, 2, 3, 4}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor grads1 = torch::tensor( + {1.0, 2.0, std::nan("1"), 4.0}, + torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor inv_scale = torch::scalar_tensor( + 0.2, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor found_inf = torch::scalar_tensor( + 0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor grads_output0 = grads0 * inv_scale; + torch::Tensor found_inf_output0 = torch::scalar_tensor( + 0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor found_inf_output1 = torch::scalar_tensor( + 1, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ForEachDevice([&](const torch::Device& device) { + if (grads0.device() == at::kCPU) { + GTEST_SKIP(); + } + torch::Tensor lazy_grads0 = CopyToDevice(grads0, device); + torch::Tensor lazy_inv_scale = CopyToDevice(inv_scale, device); + torch::Tensor lazy_found_inf = CopyToDevice(found_inf, device); + torch::_amp_foreach_non_finite_check_and_unscale_(lazy_grads0, lazy_found_inf, + lazy_inv_scale); + AllClose(grads_output0, lazy_grads0, /*rtol=*/1e-2, /*atol=*/1e-4); + AllEqual(found_inf_output0, lazy_found_inf); + + torch::Tensor lazy_grads1 = CopyToDevice(grads1, device); + torch::_amp_foreach_non_finite_check_and_unscale_(lazy_grads1, lazy_found_inf, + lazy_inv_scale); + AllEqual(found_inf_output1, lazy_found_inf); + }); +} + +TEST_F(LazyOpsTest, TestAmpUpdateScale) { + torch::Tensor growth_tracker = torch::scalar_tensor( + 0, torch::TensorOptions(torch::kInt32).device(DefaultDevice())); + torch::Tensor current_scale = torch::scalar_tensor( + 4, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor found_inf = torch::scalar_tensor( + 1, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor not_found_inf = torch::scalar_tensor( + 0, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + float scale_growth_factor = 2.0; + float scale_backoff_factor = 0.5; + int growth_interval = 3; + + torch::Tensor growth_tracker_result0 = torch::scalar_tensor( + 1, torch::TensorOptions(torch::kInt32).device(DefaultDevice())); + torch::Tensor current_scale_result0 = torch::scalar_tensor( + 4, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor growth_tracker_result1 = torch::scalar_tensor( + 2, torch::TensorOptions(torch::kInt32).device(DefaultDevice())); + torch::Tensor current_scale_result1 = torch::scalar_tensor( + 4, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor growth_tracker_result2 = torch::scalar_tensor( + 0, torch::TensorOptions(torch::kInt32).device(DefaultDevice())); + torch::Tensor current_scale_result2 = torch::scalar_tensor( + 8, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor growth_tracker_result3 = torch::scalar_tensor( + 0, torch::TensorOptions(torch::kInt32).device(DefaultDevice())); + torch::Tensor current_scale_result3 = torch::scalar_tensor( + 4, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + + ForEachDevice([&](const torch::Device& device) { + if (growth_tracker.device() == at::kCPU) { + GTEST_SKIP(); + } + torch::Tensor lazy_growth_tracker = CopyToDevice(growth_tracker, device); + torch::Tensor lazy_current_scale = CopyToDevice(current_scale, device); + torch::Tensor lazy_found_inf = CopyToDevice(found_inf, device); + torch::Tensor 
lazy_not_found_inf = CopyToDevice(not_found_inf, device); + + torch::_amp_update_scale_(lazy_current_scale, lazy_growth_tracker, + lazy_not_found_inf, scale_growth_factor, + scale_backoff_factor, growth_interval); + AllClose(current_scale_result0, lazy_current_scale, /*rtol=*/1e-2, + /*atol=*/1e-4); + AllEqual(growth_tracker_result0, lazy_growth_tracker); + + torch::_amp_update_scale_(lazy_current_scale, lazy_growth_tracker, + lazy_not_found_inf, scale_growth_factor, + scale_backoff_factor, growth_interval); + AllClose(current_scale_result1, lazy_current_scale, /*rtol=*/1e-2, + /*atol=*/1e-4); + AllEqual(growth_tracker_result1, lazy_growth_tracker); + + // torch::_amp_update_scale_ returns the reference of current_scale + lazy_current_scale = torch::_amp_update_scale_( + lazy_current_scale, lazy_growth_tracker, lazy_not_found_inf, + scale_growth_factor, scale_backoff_factor, growth_interval); + AllClose(current_scale_result2, lazy_current_scale, /*rtol=*/1e-2, + /*atol=*/1e-4); + AllEqual(growth_tracker_result2, lazy_growth_tracker); + + lazy_current_scale = torch::_amp_update_scale_( + lazy_current_scale, lazy_growth_tracker, lazy_found_inf, + scale_growth_factor, scale_backoff_factor, growth_interval); + AllClose(current_scale_result3, lazy_current_scale, /*rtol=*/1e-2, + /*atol=*/1e-4); + AllEqual(growth_tracker_result3, lazy_growth_tracker); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::_amp_update_scale_", + GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestEarlySyncLiveTensors) { + torch::Tensor scalar_tensor = torch::scalar_tensor( + 1., torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar scalar1 = scalar_tensor.item(); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_scalar_tensor = CopyToDevice(scalar_tensor, device); + torch::Scalar scalar2 = lazy_scalar_tensor.item(); + ASSERT_EQ(scalar1.to(), scalar2.to()); + }); + if (DebugUtil::ExperimentEnabled("early_sync")) { + ExpectCounterChanged("EarlySyncLiveTensorsCount", + GetIgnoredCounters()); + } else { + ExpectCounterNotChanged("EarlySyncLiveTensorsCount", + GetIgnoredCounters()); + } + ExpectCounterChanged("aten::_local_scalar_dense", + GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestLerp) { + torch::Tensor start = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor end = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor res = torch::lerp(start, end, weight); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_start = CopyToDevice(start, device); + torch::Tensor lazy_end = CopyToDevice(end, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_res = torch::lerp(lazy_start, lazy_end, lazy_weight); + AllClose(res, lazy_res); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::lerp", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestLerpScalar) { + torch::Tensor start = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor end = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar weight = torch::Scalar(3.0); + torch::Tensor res = torch::lerp(start, end, weight); + ForEachDevice([&](const torch::Device& device) { + 
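+    // As in TestLerp above, the reference is the eager torch::lerp result,
+    // i.e. start + weight * (end - start) elementwise; here the weight is the
+    // scalar 3.0 rather than a tensor.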
torch::Tensor lazy_start = CopyToDevice(start, device); + torch::Tensor lazy_end = CopyToDevice(end, device); + torch::Tensor lazy_res = torch::lerp(lazy_start, lazy_end, weight); + AllClose(res, lazy_res); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::lerp", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestLerpInplace) { + torch::Tensor input = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor end = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor input_copy = input.clone(); + input.lerp_(end, weight); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_copy, device); + torch::Tensor lazy_end = CopyToDevice(end, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + lazy_input.lerp_(lazy_end, lazy_weight); + AllClose(lazy_input, input); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::lerp", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestLerpScalarInplace) { + torch::Tensor input = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor end = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar weight = torch::Scalar(3.0); + torch::Tensor input_copy = input.clone(); + input.lerp_(end, weight); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_input = CopyToDevice(input_copy, device); + torch::Tensor lazy_end = CopyToDevice(end, device); + lazy_input.lerp_(lazy_end, weight); + AllClose(lazy_input, input); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::lerp", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestLerpOut) { + torch::Tensor start = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor end = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor weight = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor res = torch::empty( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + ; + torch::lerp_out(res, start, end, weight); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_start = CopyToDevice(start, device); + torch::Tensor lazy_end = CopyToDevice(end, device); + torch::Tensor lazy_weight = CopyToDevice(weight, device); + torch::Tensor lazy_res = torch::empty({3, 4}, lazy_start.options()); + torch::lerp_out(lazy_res, lazy_start, lazy_end, lazy_weight); + AllClose(res, lazy_res); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::lerp", GetIgnoredCounters()); +} + +TEST_F(LazyOpsTest, TestLerpScalarOut) { + torch::Tensor start = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Tensor end = torch::rand( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::Scalar weight = torch::Scalar(3.0); + torch::Tensor res = torch::empty( + {3, 4}, torch::TensorOptions(torch::kFloat).device(DefaultDevice())); + torch::lerp_out(res, start, end, weight); + ForEachDevice([&](const torch::Device& device) { + torch::Tensor lazy_start = CopyToDevice(start, 
device); + torch::Tensor lazy_end = CopyToDevice(end, device); + torch::Tensor lazy_res = torch::empty({3, 4}, lazy_start.options()); + torch::lerp_out(lazy_res, lazy_start, lazy_end, weight); + AllClose(res, lazy_res); + }); + ExpectCounterNotChanged("aten::.*", GetIgnoredCounters()); + ExpectCounterChanged("lazy::lerp", GetIgnoredCounters()); +} + +#endif // FBCODE_CAFFE2 + +} // namespace lazy +} // namespace torch diff --git a/test/cpp/lazy/test_lazy_ops_util.cpp b/test/cpp/lazy/test_lazy_ops_util.cpp new file mode 100644 index 00000000000000..6f12f960e7af2a --- /dev/null +++ b/test/cpp/lazy/test_lazy_ops_util.cpp @@ -0,0 +1,194 @@ +#include + +#include +#include +#include +#include + +#include +#include + + +namespace torch { +namespace lazy { +namespace { + +bool IsLtcTensor(const at::Tensor& tensor) { + return dynamic_cast(tensor.unsafeGetTensorImpl()); +} + +std::unordered_set* CreateIgnoredCounters() { + std::unordered_set* icounters = + new std::unordered_set(); + // Add below the counters whose name need to be ignored when doing + // is-any-counter-changed assertins. + icounters->insert("aten::rand"); + return icounters; +} + +} // namespace + +const std::unordered_set* GetIgnoredCounters() { + static const std::unordered_set* icounters = + CreateIgnoredCounters(); + return icounters; +} + +at::Tensor ToCpuTensor(const at::Tensor& tensor) { + // tensor.to() implicitly triggers a sync if t.device=torch::kLazy. + return tensor.to(torch::kCPU); +} + +torch::Tensor CopyToDevice(const torch::Tensor& tensor, + const torch::Device& device) { + return tensor.clone().to(device, /*non_blocking=*/false, /*copy=*/true); +} + +bool EqualValues(at::Tensor tensor1, at::Tensor tensor2) { + tensor1 = ToCpuTensor(tensor1); + tensor2 = ToCpuTensor(tensor2); + if (torch::isnan(tensor1).any().item()) { + EXPECT_TRUE(EqualValues(torch::isnan(tensor1), torch::isnan(tensor2))); + tensor1.nan_to_num_(); + tensor2.nan_to_num_(); + } + if (tensor1.sizes() != tensor2.sizes() || + tensor1.dtype() != tensor2.dtype()) { + std::cerr << "Different shape:\n" + << tensor1.dtype() << " " << tensor1.sizes() << "\n-vs-\n" + << tensor2.dtype() << " " << tensor2.sizes() << "\n"; + return false; + } + at::ScalarType type1 = tensor1.scalar_type(); + at::ScalarType type2 = tensor2.scalar_type(); + if (type1 != type2) { + tensor1 = tensor1.toType(type2); + } + bool equal = tensor1.equal(tensor2); + return equal; +} + +bool EqualValuesNoElementTypeCheck(at::Tensor tensor1, at::Tensor tensor2) { + tensor1 = ToCpuTensor(tensor1); + tensor2 = ToCpuTensor(tensor2); + if (tensor1.sizes() != tensor2.sizes()) { + std::cerr << "Different shape:\n" + << tensor1.dtype() << " " << tensor1.sizes() << "\n-vs-\n" + << tensor2.dtype() << " " << tensor2.sizes() << "\n"; + return false; + } + at::ScalarType type1 = tensor1.scalar_type(); + at::ScalarType type2 = tensor2.scalar_type(); + if (type1 != type2) { + tensor1 = tensor1.toType(type2); + } + bool equal = tensor1.equal(tensor2); + return equal; +} + +void ForEachDevice(const std::function& devfn) { + // Currently TorchScript backend only supports one type of hardware per process, + // which is set by env. And the ordinal is always 0 given distributed training/ + // multi-device is not supported yet. 
+ auto device = torch::lazy::BackendDevice(); + torch::Device torch_device = torch::lazy::backendDeviceToAtenDevice(device); + devfn(torch_device); +} + +bool CloseValues(at::Tensor tensor1, at::Tensor tensor2, double rtol, + double atol) { + tensor1 = ToCpuTensor(tensor1); + tensor2 = ToCpuTensor(tensor2); + if (torch::isnan(tensor1).any().item()) { + EXPECT_TRUE(EqualValues(torch::isnan(tensor1), torch::isnan(tensor2))); + tensor1.nan_to_num_(); + tensor2.nan_to_num_(); + } + if (tensor1.sizes() != tensor2.sizes() || + tensor1.dtype() != tensor2.dtype()) { + std::cerr << "Different shape:\n" + << tensor1.dtype() << " " << tensor1.sizes() << "\n-vs-\n" + << tensor2.dtype() << " " << tensor2.sizes() << "\n"; + return false; + } + bool equal = tensor1.allclose(tensor2, rtol, atol); + return equal; +} + +std::string GetTensorTextGraph(at::Tensor tensor) { + torch::lazy::LazyTensorPtr lazy_tensor = torch::lazy::TryGetLtcTensor(tensor); + return torch::lazy::DumpUtil::ToText({lazy_tensor->GetIrValue().node.get()}); +} + +std::string GetTensorDotGraph(at::Tensor tensor) { + torch::lazy::LazyTensorPtr lazy_tensor = torch::lazy::TryGetLtcTensor(tensor); + return torch::lazy::DumpUtil::ToDot({lazy_tensor->GetIrValue().node.get()}); +} + +void TestBackward( + const std::vector& inputs, const torch::Device& device, + const std::function&)>& + testfn, + double rtol, double atol, int derivative_level) { + std::vector input_vars; + std::vector xinput_vars; + std::vector inputs_w_grad; + std::vector xinputs_w_grad; + for (size_t i = 0; i < inputs.size(); ++i) { + const torch::Tensor& input = inputs[i]; + if (input.defined()) { + torch::Tensor oinput = + input.clone().detach().set_requires_grad(input.requires_grad()); + input_vars.push_back(oinput); + + torch::Tensor xinput = CopyToDevice(input, device) + .detach() + .set_requires_grad(input.requires_grad()); + xinput_vars.push_back(xinput); + if (input.requires_grad()) { + inputs_w_grad.push_back(oinput); + xinputs_w_grad.push_back(xinput); + } + } else { + input_vars.emplace_back(); + xinput_vars.emplace_back(); + } + } + + torch::Tensor output = testfn(input_vars); + torch::Tensor xoutput = testfn(xinput_vars); + torch::lazy::AllClose(output, xoutput, rtol, atol); + + std::vector outs = {output}; + std::vector xouts = {xoutput}; + for (int d = 1; d <= derivative_level; ++d) { + // Check grad of sum(outs) w.r.t inputs_w_grad. 
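+    // Summing every output that requires grad yields a scalar, so a single
+    // torch::autograd::grad call returns d(sum)/d(input) for all inputs at
+    // once; the eager and lazy gradients are then compared pairwise below.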
+ torch::Tensor sum = torch::zeros_like(outs[0]).sum(); + torch::Tensor xsum = torch::zeros_like(xouts[0]).sum(); + for (size_t i = 0; i < outs.size(); ++i) { + if (outs[i].requires_grad()) { + sum += outs[i].sum(); + xsum += xouts[i].sum(); + } + } + // Calculating higher order derivative requires create_graph=true + bool create_graph = d != derivative_level; + outs = torch::autograd::grad({sum}, inputs_w_grad, /*grad_outputs=*/{}, + /*retain_graph=*/c10::nullopt, + /*create_graph=*/create_graph, + /*allow_unused=*/true); + xouts = torch::autograd::grad({xsum}, xinputs_w_grad, /*grad_outputs=*/{}, + /*retain_graph=*/c10::nullopt, + /*create_graph=*/create_graph, + /*allow_unused=*/true); + for (size_t i = 0; i < outs.size(); ++i) { + ASSERT_EQ(outs[i].defined(), xouts[i].defined()); + if (outs[i].defined()) { + AllClose(outs[i], xouts[i], rtol, atol); + } + } + } +} + +} // namespace lazy +} // namespace torch diff --git a/test/cpp/lazy/test_lazy_ops_util.h b/test/cpp/lazy/test_lazy_ops_util.h new file mode 100644 index 00000000000000..6dc26b48be9518 --- /dev/null +++ b/test/cpp/lazy/test_lazy_ops_util.h @@ -0,0 +1,68 @@ +#pragma once + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +namespace torch { +namespace lazy { + +const std::unordered_set* GetIgnoredCounters(); + +// Converts an at::Tensor(device=torch::kLazy) to at::Tensor(device=torch::kCPU) +// This at::Tensor can be torch::Tensor which is a Variable, or at::Tensor which +// know nothing about autograd. If the input tensor is already a CPU tensor, it +// will be returned. Needed because EqualValues and AllClose require CPU tensors +// on both sides. +at::Tensor ToCpuTensor(const at::Tensor& tensor); + +// Helper function to copy a tensor to device. 
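+// The result is always a fresh copy (clone() followed by to(device, copy=true)
+// in the .cpp above), so in-place ops on the device copy never modify the
+// original reference tensor.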
+torch::Tensor CopyToDevice(const torch::Tensor& tensor, + const torch::Device& device); + +bool EqualValues(at::Tensor tensor1, at::Tensor tensor2); + +bool EqualValuesNoElementTypeCheck(at::Tensor tensor1, at::Tensor tensor2); + +bool CloseValues(at::Tensor tensor1, at::Tensor tensor2, double rtol = 1e-5, + double atol = 1e-8); + +static inline void AllClose(at::Tensor tensor, at::Tensor xla_tensor, + double rtol = 1e-5, double atol = 1e-8) { + EXPECT_TRUE(CloseValues(tensor, xla_tensor, rtol, atol)); +} + +static inline void AllClose(at::Tensor tensor, torch::lazy::LazyTensor& xla_tensor, + double rtol = 1e-5, double atol = 1e-8) { + EXPECT_TRUE( + CloseValues(tensor, xla_tensor.ToTensor(/*detached=*/false), rtol, atol)); +} + +static inline void AllEqual(at::Tensor tensor, at::Tensor xla_tensor) { + EXPECT_TRUE(EqualValues(tensor, xla_tensor)); +} + +void ForEachDevice(const std::function& devfn); + +std::string GetTensorTextGraph(at::Tensor tensor); + +std::string GetTensorDotGraph(at::Tensor tensor); + +std::string GetTensorHloGraph(at::Tensor tensor); + +void TestBackward( + const std::vector& inputs, const torch::Device& device, + const std::function&)>& + testfn, + double rtol = 1e-5, double atol = 1e-8, int derivative_level = 1); + +} // namespace lazy +} // namespace torch diff --git a/test/cpp/lazy/test_misc.cpp b/test/cpp/lazy/test_misc.cpp index 45b54fd2824b73..b2f941c42dd6ba 100644 --- a/test/cpp/lazy/test_misc.cpp +++ b/test/cpp/lazy/test_misc.cpp @@ -71,6 +71,11 @@ TEST(HashTest, Sanity) { auto b = std::vector({1, 1, 2, 3, 5, 8, 12}); test_hash_repeatable_sensitive(a, b); test_hash_repeatable_sensitive(c10::ArrayRef(a), c10::ArrayRef(b)); + + // vector is a special case bc it is implemented as vector + auto bool_a = std::vector({true, false, false, true}); + auto bool_b = std::vector({true, true, false, true}); + test_hash_repeatable_sensitive(bool_a, bool_b); } } // namespace lazy diff --git a/test/cpp/lazy/test_symbolic_shape.cpp b/test/cpp/lazy/test_symbolic_shape.cpp new file mode 100644 index 00000000000000..7fac64f44839f2 --- /dev/null +++ b/test/cpp/lazy/test_symbolic_shape.cpp @@ -0,0 +1,132 @@ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace torch { +namespace lazy { + +// Lazy Tensor is disabled in FBCODE until addressing non-virtual methods (e.g. 
+// sizes) in TensorImpl +#ifndef FBCODE_CAFFE2 + +namespace { +// This registers the torchscript backend, without which lazy device won't work +torch::lazy::BackendRegistrar g_registrar(GetTSBackendImpl()); + +static inline at::DeviceType DefaultDevice() { + return torch::lazy::getBackend()->EagerFallbackDeviceType(); +} + +std::vector getIsSymbolic(at::Tensor& lazy_tensor) { + auto ltc_tensor = GetLtcTensor(lazy_tensor); + Value ir_val = ltc_tensor->GetIrValue(); + const Shape& shape = ir_val->shape(); + return shape.is_symbolic().value(); +} + +class LazyShapeTest : public ::testing::Test { + protected: + static void SetUpTestCase() {} + void SetUp() override { + at::manual_seed(42); + torch::lazy::LazyGraphExecutor::Get()->SetRngSeed( + torch::lazy::BackendDevice(), 42); + FLAGS_ltc_enable_symbolic_shapes = true; + } + void TearDown() override { + FLAGS_ltc_enable_symbolic_shapes = false; + } +}; + +class DynamicInputShapeNode : public Node { + public: + explicit DynamicInputShapeNode(Shape& shape) + : Node( + OpKind(), + /* num_outputs */ 1, + /* hash_func */ + [&](bool /*bakeInSizes*/) -> hash_t { return 0; }), + shape_(shape) {} + ~DynamicInputShapeNode() override = default; + + const std::vector& operands() const override { + TORCH_INTERNAL_ASSERT(false, "Can't access operands of test node"); + } + + const Output& operand(size_t i) const override { + TORCH_INTERNAL_ASSERT(false, "Can't access operand[i] of test node"); + } + const Shape& shape(size_t i) const override { + return shape_; + } + c10::ArrayRef shapes() const override { + return {shape_}; + } + + private: + Shape shape_; +}; + +} // namespace + +Tensor tensorWithSymbolicShape( + const std::vector& sizes, + const std::vector& is_symbolic) { + Shape shape = Shape(torch::kFloat32, sizes); + Shape shape_with_symbolic = shape.with_symbolic_dims(is_symbolic); + auto n = torch::lazy::MakeNode(shape_with_symbolic); + auto device = BackendDevice(); + auto lt = torch::lazy::LazyTensor::Create(n, device); + return torch::lazy::CreateAtenFromLtcTensor(lt); +} + +TEST_F(LazyShapeTest, TestMulBasic) { + // Basic propagation + torch::Tensor a = tensorWithSymbolicShape({2, 2}, {true, false}); + torch::Tensor b = tensorWithSymbolicShape({2, 2}, {true, false}); + torch::Tensor res = torch::mul(a, b); + + std::vector expected = {true, false}; + EXPECT_EQ(getIsSymbolic(res), expected); + + // Test when some inputs are symbolic + a = tensorWithSymbolicShape({2, 2}, {true, true}); + b = tensorWithSymbolicShape({2, 2}, {true, false}); + res = torch::mul(a, b); + + // This is not {true, false}, as the SSA shape propagation + // is not able to simplify + // expandedSizes.append(sizeB if sizeA == 1 else sizeA) + // in broadcast() in shape_functions_1.h + // due to sizeA being symbolic + expected = {true, true}; + EXPECT_EQ(getIsSymbolic(res), expected); + + // Test correct handling of broadcasting dim + a = tensorWithSymbolicShape({2, 2}, {false, true}); + b = tensorWithSymbolicShape({2, 1}, {true, false}); + res = torch::mul(a, b); + + expected = {false, true}; + EXPECT_EQ(getIsSymbolic(res), expected); + + // Test correct handling of scalar values + a = tensorWithSymbolicShape({2, 2}, {false, true}); + res = torch::mul(a, 3); + expected = {false, true}; + EXPECT_EQ(getIsSymbolic(res), expected); +}; +#endif // FBCODE_CAFFE2 + +} // namespace lazy +} // namespace torch diff --git a/test/cpp/lazy/test_tensor_impl.cpp b/test/cpp/lazy/test_tensor_impl.cpp index 2a7f2893c72496..8d968f620b6b24 100644 --- a/test/cpp/lazy/test_tensor_impl.cpp +++ 
b/test/cpp/lazy/test_tensor_impl.cpp @@ -6,12 +6,14 @@ namespace torch { namespace lazy { -// TODO(alanwaketan): Update the following unit tests once the TorchScript backend is merged. +#ifdef FBCODE_CAFFE2 +// Lazy Tensor is disabled in FBCODE until addressing non-virtual methods (e.g. sizes) in TensorImpl TEST(LazyTensorImplTest, BasicThrow) { EXPECT_THROW({ auto input = torch::rand({0, 1, 3, 0}, torch::TensorOptions(torch::kFloat).device("lazy")); }, ::c10::Error); } +#endif // FBCODE_CAFFE2 } // namespace lazy } // namespace torch diff --git a/test/cpp/profiler/containers.cpp b/test/cpp/profiler/containers.cpp new file mode 100644 index 00000000000000..60e6d0f238b185 --- /dev/null +++ b/test/cpp/profiler/containers.cpp @@ -0,0 +1,76 @@ +#include +#include +#include +#include + +#include + +#include +#include +#include + +TEST(ProfilerTest, AppendOnlyList) { + const int n = 4096; + torch::profiler::impl::AppendOnlyList list; + for (const auto i : c10::irange(n)) { + list.emplace_back(i); + ASSERT_EQ(list.size(), i + 1); + } + + int expected = 0; + for (const auto i : list) { + ASSERT_EQ(i, expected++); + } + ASSERT_EQ(expected, n); + + list.clear(); + ASSERT_EQ(list.size(), 0); +} + +TEST(ProfilerTest, AppendOnlyList_ref) { + const int n = 512; + torch::profiler::impl::AppendOnlyList, 64> list; + std::vector*> refs; + for (const auto _ : c10::irange(n)) { + refs.push_back(list.emplace_back()); + } + + for (const auto i : c10::irange(n)) { + *refs.at(i) = {i, 0}; + } + + int expected = 0; + for (const auto& i : list) { + ASSERT_EQ(i.first, expected++); + } +} + +// Test that we can convert TSC measurements back to wall clock time. +TEST(ProfilerTest, clock_converter) { + const int n = 10001; + torch::profiler::impl::ApproximateClockToUnixTimeConverter converter; + std::vector pairs; + for (const auto i : c10::irange(n)) { + pairs.push_back(torch::profiler::impl::ApproximateClockToUnixTimeConverter::measurePair()); + } + auto count_to_ns = converter.makeConverter(); + std::vector deltas; + for (const auto& i : pairs) { + deltas.push_back(i.t_ - count_to_ns(i.approx_t_)); + } + std::sort(deltas.begin(), deltas.end()); + + // In general it's not a good idea to put clocks in unit tests as it leads + // to flakiness. We mitigate this by: + // 1) Testing the clock itself. While the time to complete a task may + // vary, two clocks measuring the same time should be much more + // consistent. + // 2) Only testing the interquartile range. Context switches between + // calls to the two timers do occur and can result in hundreds of + // nanoseconds of noise, but such switches are only a few percent + // of cases. + // 3) We're willing to accept a somewhat large bias which can emerge from + // differences in the cost of calling each clock. 
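+  // deltas is sorted, so deltas[n / 2] is the median bias between the two
+  // clocks and deltas[n * 3 / 4] - deltas[n / 4] is the interquartile range;
+  // both bounds below are nanosecond-scale.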
+ EXPECT_LT(std::abs(deltas[n / 2]), 200); + EXPECT_LT(deltas[n * 3 / 4] - deltas[n / 4], 50); +} diff --git a/test/cpp/tensorexpr/test_base.h b/test/cpp/tensorexpr/test_base.h index 4a8e667de3acfd..510cad45001281 100644 --- a/test/cpp/tensorexpr/test_base.h +++ b/test/cpp/tensorexpr/test_base.h @@ -78,7 +78,7 @@ static void assertAllEqual(const std::vector& vec, const T& val) { template static void assertAllEqual(const std::vector& v1, const std::vector& v2) { ASSERT_EQ(v1.size(), v2.size()); - for (int i = 0; i < v1.size(); i++) { + for (size_t i = 0; i < v1.size(); ++i) { ASSERT_EQ(v1[i], v2[i]); } } diff --git a/test/cpp/tensorexpr/test_external_calls.cpp b/test/cpp/tensorexpr/test_external_calls.cpp index 60335ed55b494b..76fba75444318f 100644 --- a/test/cpp/tensorexpr/test_external_calls.cpp +++ b/test/cpp/tensorexpr/test_external_calls.cpp @@ -2,8 +2,17 @@ #include +#include +#include +#include +#include +#include + #include +#include +#include #include +#include #include #include #include @@ -11,6 +20,9 @@ #include #include +#include +#include + #include #include #include @@ -884,7 +896,7 @@ TEST(ExternalCall, Inlining) { return MatmulResult.load(i, j) + FloatImm::make(3.0f); }); - StmtPtr root_stmt = alloc(std::vector( + StmtPtr root_stmt = alloc(std::vector( {A.stmt(), B.stmt(), MatmulResult.stmt(), Result.stmt()})); LoopNest l(root_stmt, {Result.buf()}); @@ -923,5 +935,130 @@ TEST(ExternalCall, Inlining) { ASSERT_TRUE(at::allclose(nnc_result, ref)); } +TEST(ExternalCall, JitCustomFusionOp) { + const char* custom_op_schema_literal = + "nnc_custom::add_mul(Tensor a, Tensor b, Tensor c) -> Tensor"; + const char* external_func_name = "nnc_add_mul"; + + auto add_mul_lowering_func = + [external_func_name]( + const std::vector& inputs, + const std::vector& output_shape, + const c10::optional& output_type, + at::Device device) { + auto output_dtype = Dtype(*output_type); + torch::jit::tensorexpr::BufHandle result_buf( + "nnc_add_mul_res_buf", output_shape, output_dtype); + const torch::jit::tensorexpr::BufHandle& a = + c10::get(inputs[0]); + const torch::jit::tensorexpr::BufHandle& b = + c10::get(inputs[1]); + const torch::jit::tensorexpr::BufHandle& c = + c10::get(inputs[1]); + torch::jit::tensorexpr::StmtPtr s = + torch::jit::tensorexpr::ExternalCall::make( + result_buf, external_func_name, {a, b, c}, {}); + return Tensor(result_buf.node(), s); + }; + + auto add_mul_external_func = [](int64_t bufs_num, + void** buf_data, + int64_t* buf_ranks, + int64_t* buf_dims, + int64_t* buf_strides, + int8_t* buf_dtypes, + int64_t args_num, + int64_t* extra_args) {}; + + torch::jit::RegisterOperators reg({Operator( + custom_op_schema_literal, + [](const Node* node) -> Operation { + return [](Stack& _stack) { + auto a = std::move(peek(_stack, 0, 3)).toTensor(); + auto b = std::move(peek(_stack, 1, 3)).toTensor(); + auto c = std::move(peek(_stack, 2, 3)).toTensor(); + drop(_stack, 3); + auto result = (a + b) * c; + pack(_stack, std::move(result)); + return 0; + }; + }, + c10::AliasAnalysisKind::FROM_SCHEMA)}); + + auto& custom_operator_set = torch::jit::tensorexpr::getCustomOperatorSet(); + custom_operator_set.insert({custom_op_schema_literal}); + + auto& te_lowering_registry = torch::jit::tensorexpr::getNNCLoweringRegistry(); + te_lowering_registry.insert( + parseSchema(custom_op_schema_literal), add_mul_lowering_func); + + auto& te_nnc_func_registry = torch::jit::tensorexpr::getNNCFunctionRegistry(); + te_nnc_func_registry[external_func_name] = add_mul_external_func; + + std::string graph_string = 
R"IR( + graph(%a : Float(10, 20, strides=[20, 1], device=cpu), + %b : Float(10, 20, strides=[20, 1], device=cpu), + %c : Float(10, 20, strides=[20, 1], device=cpu)): + %res : Float(10, 20, strides=[20, 1], device=cpu) = nnc_custom::add_mul(%a, %b, %c) + return (%res))IR"; + + auto graph = std::make_shared(); + torch::jit::parseIR(graph_string, graph.get()); + + std::string shape_compute_python_string = R"PY( + def computOutput(a: List[int], b: List[int], c: List[int]): + expandedSizes: List[int] = [] + dimsA = len(a) + dimsB = len(b) + dimsC = len(c) + ndim = max(dimsA, dimsB, dimsC) + for i in range(ndim): + offset = ndim - 1 - i + dimA = dimsA - 1 - offset + dimB = dimsB - 1 - offset + dimC = dimsC - 1 - offset + sizeA = a[dimA] if (dimA >= 0) else 1 + sizeB = b[dimB] if (dimB >= 0) else 1 + sizeC = a[dimC] if (dimC >= 0) else 1 + + if sizeA != sizeB and sizeB != sizeC and sizeA != 1 and sizeB != 1 and sizeC != 1: + # TODO: only assertion error is bound in C++ compilation right now + raise AssertionError( + "The size of tensor a {} must match the size of tensor b (" + "{} and c {}) at non-singleton dimension {}".format(sizeA, sizeB, sizeC, i) + ) + + expandedSizes.append(max(sizeA, sizeB, sizeC)) + + return expandedSizes + )PY"; + auto cu_ptr = torch::jit::compile(shape_compute_python_string); + torch::jit::GraphFunction* gf = + (torch::jit::GraphFunction*)&cu_ptr->get_function("computOutput"); + ASSERT_TRUE(gf); + +#ifdef TORCH_ENABLE_LLVM + auto static_graph_case = graph->copy(); + FuseTensorExprs(static_graph_case, 1); + torch::jit::testing::FileCheck() + .check("prim::TensorExprGroup_") + ->check("nnc_custom::add_mul") + ->run(*static_graph_case); + + auto dynamic_graph_case = graph->copy(); + auto custom_op = torch::jit::getOperatorForLiteral(custom_op_schema_literal); + ASSERT_TRUE(custom_op); + torch::jit::RegisterShapeComputeGraphForSchema( + custom_op->schema(), gf->graph()); + FuseTensorExprs(dynamic_graph_case, 1, false, true); + torch::jit::testing::FileCheck() + .check("prim::TensorExprGroup_") + ->check("nnc_custom::add_mul") + ->run(*dynamic_graph_case); +#else + torch::jit::testing::FileCheck().check("nnc_custom::add_mul")->run(*graph); +#endif +} + } // namespace jit } // namespace torch diff --git a/test/cpp/tensorexpr/test_memdependency.cpp b/test/cpp/tensorexpr/test_memdependency.cpp index 535ef439deaf7d..2e4c5bdb737ae0 100644 --- a/test/cpp/tensorexpr/test_memdependency.cpp +++ b/test/cpp/tensorexpr/test_memdependency.cpp @@ -274,7 +274,7 @@ TEST(MemDependency, BoundSubtractMultiDim) { if (x.size() != y.size()) { return false; } - for (auto i = 0; i < x.size(); ++i) { + for (auto i = 0U; i < x.size(); ++i) { if (!indexBoundsEquals(x[i], y[i])) { return false; } @@ -338,7 +338,7 @@ TEST(MemDependency, BoundSubtractMultiDimSymbolic) { if (x.size() != y.size()) { return false; } - for (auto i = 0; i < x.size(); ++i) { + for (auto i = 0U; i < x.size(); ++i) { if (!indexBoundsEquals(x[i], y[i])) { return false; } diff --git a/test/cpp/tensorexpr/test_ops.cpp b/test/cpp/tensorexpr/test_ops.cpp index e4c9155ff60c05..6ad9cb2a54a32f 100644 --- a/test/cpp/tensorexpr/test_ops.cpp +++ b/test/cpp/tensorexpr/test_ops.cpp @@ -24,7 +24,7 @@ TEST(Ops, Sum) { constexpr int N = 16; std::vector testDims = {{0}, {1}, {0, 1}}; std::vector> outputShapes = {{N}, {M}, {}}; - for (int idx = 0; idx < testDims.size(); idx++) { + for (unsigned idx = 0; idx < testDims.size(); idx++) { const auto& dims = testDims[idx]; const auto& outShape = outputShapes[idx]; diff --git 
a/test/cpp/tensorexpr/test_quantization.cpp b/test/cpp/tensorexpr/test_quantization.cpp index 9df2503a608ca9..82eb8573cff500 100644 --- a/test/cpp/tensorexpr/test_quantization.cpp +++ b/test/cpp/tensorexpr/test_quantization.cpp @@ -1,6 +1,6 @@ #include -#include +#include #include #include #include @@ -221,7 +221,99 @@ TEST_F(Quantization, QuantAddDequantUInt8) { CHECK_EQ(check, 1); } -TEST_F(Quantization, QuantUpsampleNearest2dDequantUInt8) { +TEST_F(Quantization, QuantSigmoidDequantUInt8) { + const auto graph_string = R"IR( + graph(%x1 : Float(2, 2, strides=[2, 1], device=cpu)): + %2 : int = prim::Constant[value=13]() + %qz1 : int = prim::Constant[value=13]() + %qs1 : float = prim::Constant[value=0.1]() + %q1 : QUInt8(2, 2) = aten::quantize_per_tensor(%x1, %qs1, %qz1, %2) + %qa : QUInt8(2, 2) = aten::sigmoid(%q1) + %6 : Float(2, 2) = aten::dequantize(%qa) + return (%6))IR"; + auto graph = std::make_shared(); + parseIR(graph_string, &*graph); + + auto x1 = at::rand({2, 2}, TensorOptions(kCPU).dtype(at::kFloat)); + auto q1 = at::quantize_per_tensor(x1, 0.1f, 13, at::kQUInt8); + auto qs = at::sigmoid(q1); + auto y_expected = at::dequantize(qs); + + TensorExprKernel k(graph); + std::vector inputs = {x1}; + StmtPtr s = k.getCodeGenStmt(); + + std::vector stack = fmap(inputs); + k.run(stack); + auto y = stack[0].toTensor(); + bool check = at::allclose(y_expected, y); + if (!check) { + std::cout << "x1:\n" << x1 << std::endl; + std::cout << "q1:\n" << q1 << std::endl; + std::cout << "qs:\n" << qs << std::endl; + std::cout << "y_expected:\n" << y_expected << std::endl; + std::cout << "y:\n" << y << std::endl; + } + CHECK_EQ(check, 1); +} + +at::Tensor quantized_mul( + at::Tensor x1, + at::Tensor x2, + double scale, + int64_t zero) { + const auto op = + c10::Dispatcher::singleton() + .findSchemaOrThrow("quantized::mul", "") + .typed(); + return op.call(x1, x2, scale, zero); +} + +TEST_F(Quantization, QuantMulDequantUInt8) { + const auto graph_string = R"IR( + graph(%x1 : Float(2, 2, strides=[2, 1], device=cpu), %x2 : Float(2, 2, strides=[2, 1], device=cpu)): + %2 : int = prim::Constant[value=13]() + %qz1 : int = prim::Constant[value=13]() + %qs1 : float = prim::Constant[value=0.1]() + %qz2 : int = prim::Constant[value=13]() + %qs2 : float = prim::Constant[value=0.1]() + %qza : int = prim::Constant[value=13]() + %qsa : float = prim::Constant[value=0.1]() + %q1 : QUInt8(2, 2) = aten::quantize_per_tensor(%x1, %qs1, %qz1, %2) + %q2 : QUInt8(2, 2) = aten::quantize_per_tensor(%x2, %qs2, %qz2, %2) + %qa : QUInt8(2, 2) = quantized::mul(%q1, %q2, %qsa, %qza) + %6 : Float(2, 2) = aten::dequantize(%qa) + return (%6))IR"; + auto graph = std::make_shared(); + parseIR(graph_string, &*graph); + + auto x1 = at::rand({2, 2}, TensorOptions(kCPU).dtype(at::kFloat)); + auto x2 = at::rand({2, 2}, TensorOptions(kCPU).dtype(at::kFloat)); + auto q1 = at::quantize_per_tensor(x1, 0.1f, 13, at::kQUInt8); + auto q2 = at::quantize_per_tensor(x2, 0.1f, 13, at::kQUInt8); + auto qa = quantized_mul(q1, q2, 0.1f, 13); + auto y_expected = at::dequantize(qa); + + TensorExprKernel k(graph); + std::vector inputs = {x1, x2}; + StmtPtr s = k.getCodeGenStmt(); + + std::vector stack = fmap(inputs); + k.run(stack); + auto y = stack[0].toTensor(); + bool check = at::allclose(y_expected, y); + if (!check) { + std::cout << "x1:\n" << x1 << std::endl; + std::cout << "q1:\n" << q1 << std::endl; + std::cout << "x2:\n" << x2 << std::endl; + std::cout << "q2:\n" << q2 << std::endl; + std::cout << "y_expected:\n" << y_expected << std::endl; + 
std::cout << "y:\n" << y << std::endl; + } + CHECK_EQ(check, 1); +} + +TEST_F(Quantization, QuantUpsampleNearst2dDequantUInt8) { const auto graph_string = R"IR( graph(%x : Float(1, 1, 4, 4, strides=[16, 16, 4, 1], device=cpu)): %2 : int = prim::Constant[value=13]() diff --git a/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py b/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py index f5ba770898665f..807d27a20230dd 100644 --- a/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py +++ b/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py @@ -9,6 +9,7 @@ import torch import torch.distributed as dist from torch.distributed import rpc +from torch.distributed import distributed_c10d from torch.distributed._shard import ( shard_parameter, sharded_tensor, @@ -1464,6 +1465,92 @@ def test_gather_uneven(self) -> None: else: self.assertIsNone(full_tensor) + @with_comms + @skip_if_lt_x_gpu(4) + @requires_nccl() + def test_sharded_tensor_to_cpu(self): + cpu_spec = ChunkShardingSpec( + dim=0, + placements=[ + "rank:0/cpu", + "rank:1/cpu", + "rank:2/cpu", + "rank:3/cpu", + ], + ) + spec = ChunkShardingSpec( + dim=0, + placements=[ + "rank:0/cuda:0", + "rank:1/cuda:1", + "rank:2/cuda:2", + "rank:3/cuda:3", + ], + ) + h, w = 10, 20 + gloo_pg = dist.new_group(backend="gloo") + + # CPU sharded tensor should return the same instance (no copy) + st_cpu = sharded_tensor.zeros(cpu_spec, h, w, process_group=gloo_pg) + new_st_cpu = st_cpu.cpu() + self.assertEqual(st_cpu, new_st_cpu) + + # GPU sharded tensor to cpu + st = sharded_tensor.zeros(spec, h, w) + # test ability to move st to CPU + spec_before_move = st.sharding_spec() + new_st = st.cpu(process_group=gloo_pg) + # return a copy of orginal st + self.assertNotEqual(st, new_st) + # check the spec is still ChunkShardingSpec + spec_after_move = new_st.sharding_spec() + self.assertIsInstance(spec_after_move, ChunkShardingSpec) + # now it should be ProcessGroupGloo since it's on CPU + self.assertIsInstance(new_st._process_group, distributed_c10d.ProcessGroupGloo) + # test specs before and after the move almost the same except placement device + self.assertEqual(spec_before_move.dim, spec_after_move.dim) + self.assertEqual(len(spec_before_move.placements), len(spec_after_move.placements)) + for i, remote_device_after in enumerate(spec_after_move.placements): + remote_device_before = spec_before_move.placements[i] + self.assertEqual(remote_device_before.rank(), remote_device_after.rank()) + self.assertEqual(str(remote_device_after.device()), "cpu") + + # ensure metdata also get changed to CPU + metas = new_st.metadata().shards_metadata + for meta in metas: + self.assertEqual(str(meta.placement.device()), "cpu") + + # Test if a mixed sharded tensor (ShardedTensor with different devices) to cpu + mixed_spec = ChunkShardingSpec( + dim=0, + placements=[ + "rank:0/cpu", + "rank:1/cpu", + "rank:2/cuda:2", + "rank:3/cuda:3", + ], + ) + + st = sharded_tensor.zeros(mixed_spec, h, w, process_group=gloo_pg) + new_st = st.cpu() + # return a copy of orginal st + self.assertNotEqual(st, new_st) + # check the spec is still ChunkShardingSpec + spec_after_move = new_st.sharding_spec() + self.assertIsInstance(spec_after_move, ChunkShardingSpec) + # test specs before and after the move almost the same except placement device + self.assertEqual(mixed_spec.dim, spec_after_move.dim) + self.assertEqual(len(mixed_spec.placements), len(spec_after_move.placements)) + for i, remote_device_after in enumerate(spec_after_move.placements): + 
remote_device_before = mixed_spec.placements[i] + self.assertEqual(remote_device_before.rank(), remote_device_after.rank()) + self.assertEqual(str(remote_device_after.device()), "cpu") + + # ensure metdata also get changed to CPU + metas = new_st.metadata().shards_metadata + for meta in metas: + self.assertEqual(str(meta.placement.device()), "cpu") + @skip_if_lt_x_gpu(4) @requires_nccl() def test_uneven_shards(self): diff --git a/test/distributed/_shard/sharding_spec/test_sharding_spec.py b/test/distributed/_shard/sharding_spec/test_sharding_spec.py index 30aa3a12609794..a0e13d80d93e18 100644 --- a/test/distributed/_shard/sharding_spec/test_sharding_spec.py +++ b/test/distributed/_shard/sharding_spec/test_sharding_spec.py @@ -318,6 +318,22 @@ def _infer_enum_sharding_spec_case(self): self.assertTrue(isinstance(spec, EnumerableShardingSpec)) self.assertEqual(spec.shards, shards_metadata) + shards_metadata = [ + ShardMetadata( + shard_offsets=[0], + shard_sizes=[16], + placement="cuda:0", + ), + ShardMetadata( + shard_offsets=[16], + shard_sizes=[9], + placement="cuda:1", + ) + ] + spec = _infer_sharding_spec_from_shards_metadata(shards_metadata) + self.assertTrue(isinstance(spec, EnumerableShardingSpec)) + self.assertEqual(spec.shards, shards_metadata) + shards_metadata = [ ShardMetadata( shard_offsets=[0, 0], diff --git a/test/distributed/_shard/test_replicated_tensor.py b/test/distributed/_shard/test_replicated_tensor.py new file mode 100644 index 00000000000000..474fbfb90aaa37 --- /dev/null +++ b/test/distributed/_shard/test_replicated_tensor.py @@ -0,0 +1,76 @@ +# Owner(s): ["oncall: distributed"] + +import torch + +import torch.distributed as dist + +from torch.testing._internal.common_distributed import ( + requires_nccl, + skip_if_lt_x_gpu, +) + +from torch.testing._internal.distributed._shard.sharded_tensor import ( + ShardedTensorTestBase, + with_comms, +) +from torch.distributed._shard.replicated_tensor import ReplicatedTensor + + +class TestReplicatedTensor(ShardedTensorTestBase): + + @with_comms(init_rpc=False) + @skip_if_lt_x_gpu(4) + @requires_nccl() + def test_replicated_tensor_basics(self): + local_tensor = torch.ones(3, 3, device=f"cuda:{self.rank}") * 4 + replica_tensor = ReplicatedTensor(local_tensor) + print(replica_tensor.process_group) + # validate it's a replicated tensor by checking values on all rank + validated = replica_tensor.validate() + self.assertEqual(validated, True) + res = replica_tensor + 2 + self.assertIsInstance(res, torch.Tensor) + self.assertNotIsInstance(res, ReplicatedTensor) + self.assertEqual(res, torch.ones(3, 3) * 6) + + # modify local tensor on certain rank, and test if validation raise + if self.rank == 2: + local_tensor += 3 + + with self.assertRaisesRegex(ValueError, 'have different values'): + replica_tensor.validate() + + @with_comms(init_rpc=False) + @skip_if_lt_x_gpu(4) + @requires_nccl() + def test_replicated_tensor_inter_op_replicated_tensor(self): + local_tensor = torch.ones(3, 3, device=f"cuda:{self.rank}") + replica_tensor1 = ReplicatedTensor(local_tensor * 4) + replica_tensor2 = ReplicatedTensor(local_tensor * 6) + + new_tensor = replica_tensor1 * replica_tensor2 + self.assertIsInstance(new_tensor, ReplicatedTensor) + self.assertEqual(new_tensor, torch.ones(3, 3) * 24) + + # test replicated tensor inter-op with different pgs + new_pg = dist.new_group(ranks=[1, 2, 3]) + replica_tensor_new_group = ReplicatedTensor(local_tensor * 3, process_group=new_pg) + + with self.assertRaisesRegex(RuntimeError, 'must be in the same'): + 
replica_tensor_new_group * replica_tensor1 + + + @with_comms(init_rpc=False) + @skip_if_lt_x_gpu(4) + @requires_nccl() + def test_replicated_tensor_inter_op_tensor(self): + local_tensor = torch.ones(3, 3, device=f"cuda:{self.rank}") * 4 + replica_tensor = ReplicatedTensor(local_tensor) + + local_rand_tensor = torch.randn(3, 3, device=f"cuda:{self.rank}") + + new_tensor = replica_tensor + local_rand_tensor + self.assertIsInstance(new_tensor, torch.Tensor) + self.assertNotIsInstance(new_tensor, ReplicatedTensor) + + self.assertEqual(new_tensor, local_tensor + local_rand_tensor) diff --git a/test/distributed/elastic/agent/server/test/local_elastic_agent_test.py b/test/distributed/elastic/agent/server/test/local_elastic_agent_test.py index a931f3ef1d4e29..9c5a395054900b 100644 --- a/test/distributed/elastic/agent/server/test/local_elastic_agent_test.py +++ b/test/distributed/elastic/agent/server/test/local_elastic_agent_test.py @@ -38,8 +38,8 @@ from torch.distributed.rpc.backend_registry import BackendType from torch.testing._internal.common_utils import ( TEST_WITH_DEV_DBG_ASAN, - sandcastle_skip_if, TEST_WITH_TSAN, + sandcastle_skip_if, ) @@ -170,11 +170,26 @@ def _check_env_function(): "TORCHELASTIC_MAX_RESTARTS", "TORCHELASTIC_RUN_ID", "TORCHELASTIC_USE_AGENT_STORE", + "NCCL_ASYNC_ERROR_HANDLING", ] for var in env_vars: _ = os.environ[var] +def _check_env_value(key: str, expected: str): + # checks if the env var ``key`` matches ``value`` + # this function is intended to be used as the entrypoint to the elastic run + if key not in os.environ: + raise RuntimeError(f"Environment variable {key} not found in os.environ") + else: + actual = os.getenv(key) + if expected != actual: + raise RuntimeError( + f"os.environ['{key}']={actual}" + f" does not equal the expected value: {expected}" + ) + + def acquire_available_port(): """ Uses sockets to acquire an available port from the os for use. @@ -184,10 +199,7 @@ def acquire_available_port(): the port as quickly as possible. 
""" addrs = socket.getaddrinfo( - host="localhost", - port=None, - family=socket.AF_UNSPEC, - type=socket.SOCK_STREAM + host="localhost", port=None, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM ) for addr in addrs: @@ -398,7 +410,6 @@ def run_test_with_backend(self, backend: str, test_to_run: Callable): test_to_run() - def dummy_compute(self): res = self.run_agent(Conf(entrypoint=dummy_compute, local_world_size=2)) self.assertFalse(res.is_failed()) @@ -406,21 +417,15 @@ def dummy_compute(self): self.assertIsInstance(return_value, torch.Tensor) self.assertEqual((100, 100), return_value.shape) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_dummy_compute_c10d(self): self.run_test_with_backend(backend="c10d", test_to_run=self.dummy_compute) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_dummy_compute_etcd(self): self.run_test_with_backend(backend="etcd", test_to_run=self.dummy_compute) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_dummy_compute_etcd_v2(self): self.run_test_with_backend(backend="etcd-v2", test_to_run=self.dummy_compute) @@ -430,23 +435,19 @@ def run_happy_function(self): self.assertIsNone(res.return_values[0]) self.assertIsNone(res.return_values[1]) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_happy_function_c10d(self): self.run_test_with_backend(backend="c10d", test_to_run=self.run_happy_function) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_happy_function_etcd(self): self.run_test_with_backend(backend="etcd", test_to_run=self.run_happy_function) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_happy_function_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.run_happy_function) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.run_happy_function + ) def check_master_addr_port_override(self): master_addr = "test_host" @@ -463,17 +464,17 @@ def check_master_addr_port_override(self): self.assertFalse(res.is_failed()) self.assertIsNone(res.return_values[0]) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_check_master_addr_port_override_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.check_master_addr_port_override) + self.run_test_with_backend( + backend="etcd", test_to_run=self.check_master_addr_port_override + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_check_master_addr_port_override_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.check_master_addr_port_override) + 
self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.check_master_addr_port_override + ) def run_check_env_function(self): # just checks that all env vars that we need to set on the user script @@ -481,11 +482,47 @@ def run_check_env_function(self): res = self.run_agent(Conf(entrypoint=_check_env_function, local_world_size=1)) self.assertFalse(res.is_failed()) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + def run_check_nccl_async_error_handling_env(self): + # make sure NCCL_ASYNC_ERROR_HANDLING set in os.environ is honored + with patch.dict(os.environ, {"NCCL_ASYNC_ERROR_HANDLING": "0"}): + res = self.run_agent( + Conf( + entrypoint=_check_env_value, + local_world_size=1, + args=("NCCL_ASYNC_ERROR_HANDLING", "0"), + ) + ) + self.assertFalse(res.is_failed()) + + def run_check_nccl_async_error_handling_env_default(self): + # if not present in env var it should default to 1 + res = self.run_agent( + Conf( + entrypoint=_check_env_value, + local_world_size=1, + args=("NCCL_ASYNC_ERROR_HANDLING", "1"), + ) + ) + self.assertFalse(res.is_failed()) + + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_check_env_function_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.run_check_env_function) + self.run_test_with_backend( + backend="etcd", test_to_run=self.run_check_env_function + ) + + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") + def test_run_check_nccl_async_error_handling_env_c10d(self): + self.run_test_with_backend( + backend="c10d", test_to_run=self.run_check_nccl_async_error_handling_env + ) + + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") + def test_run_check_nccl_async_error_handling_env_default_c10d(self): + self.run_test_with_backend( + backend="c10d", + test_to_run=self.run_check_nccl_async_error_handling_env_default, + ) def run_function_with_return_value(self): res = self.run_agent(Conf(entrypoint=_echo, args=("foo",), local_world_size=2)) @@ -493,44 +530,38 @@ def run_function_with_return_value(self): self.assertEqual("foo", res.return_values[0]) self.assertEqual("foo", res.return_values[1]) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_function_with_return_value_c10d(self): - self.run_test_with_backend(backend="c10d", test_to_run=self.run_function_with_return_value) + self.run_test_with_backend( + backend="c10d", test_to_run=self.run_function_with_return_value + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_function_with_return_value_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.run_function_with_return_value) + self.run_test_with_backend( + backend="etcd", test_to_run=self.run_function_with_return_value + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_function_with_return_value_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.run_function_with_return_value) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.run_function_with_return_value + ) def simple_dist_sum(self): res = 
self.run_agent(Conf(entrypoint=_dist_sum, local_world_size=2)) self.assertFalse(res.is_failed()) # _dist_sum internally checks that the sum computed is valid - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_simple_dist_sum_c10d(self): self.run_test_with_backend(backend="c10d", test_to_run=self.simple_dist_sum) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_simple_dist_sum_etcd(self): self.run_test_with_backend(backend="etcd", test_to_run=self.simple_dist_sum) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_simple_dist_sum_etcd_v2(self): self.run_test_with_backend(backend="etcd-v2", test_to_run=self.simple_dist_sum) @@ -556,21 +587,27 @@ def run_distributed_sum_homogeneous(self): "test incompatible with dev/dbg asan or tsan", ) def test_run_distributed_sum_homogeneous_c10d(self): - self.run_test_with_backend(backend="c10d", test_to_run=self.run_distributed_sum_homogeneous) + self.run_test_with_backend( + backend="c10d", test_to_run=self.run_distributed_sum_homogeneous + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_run_distributed_sum_homogeneous_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.run_distributed_sum_homogeneous) + self.run_test_with_backend( + backend="etcd", test_to_run=self.run_distributed_sum_homogeneous + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_run_distributed_sum_homogeneous_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.run_distributed_sum_homogeneous) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.run_distributed_sum_homogeneous + ) def run_distributed_sum_heterogeneous(self): # sums all ranks on 3 agents; each running 1, 2, 3 workers respectively @@ -593,23 +630,23 @@ def run_distributed_sum_heterogeneous(self): ranks.update(run_results.return_values.keys()) self.assertSetEqual(set(range(1 + 2 + 3)), ranks) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_distributed_sum_heterogeneous_c10d(self): - self.run_test_with_backend(backend="c10d", test_to_run=self.run_distributed_sum_heterogeneous) + self.run_test_with_backend( + backend="c10d", test_to_run=self.run_distributed_sum_heterogeneous + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_distributed_sum_heterogeneous_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.run_distributed_sum_heterogeneous) + self.run_test_with_backend( + backend="etcd", test_to_run=self.run_distributed_sum_heterogeneous + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_distributed_sum_heterogeneous_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", 
test_to_run=self.run_distributed_sum_heterogeneous) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.run_distributed_sum_heterogeneous + ) def run_sad_function(self): """ @@ -632,21 +669,15 @@ def run_sad_function(self): self.assertEqual(data["message"], failure_data["message"]) self.assertEqual(int(data["extraInfo"]["timestamp"]), failure.timestamp) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_sad_function_c10d(self): self.run_test_with_backend(backend="c10d", test_to_run=self.run_sad_function) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_sad_function_etcd(self): self.run_test_with_backend(backend="etcd", test_to_run=self.run_sad_function) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_sad_function_etcd_v2(self): self.run_test_with_backend(backend="etcd-v2", test_to_run=self.run_sad_function) @@ -663,23 +694,23 @@ def run_bipolar_function(self): self.assertEqual(WorkerState.FAILED, agent.get_worker_group().state) self.assertTrue(agent._total_execution_time > 0) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_bipolar_function_c10d(self): - self.run_test_with_backend(backend="c10d", test_to_run=self.run_bipolar_function) + self.run_test_with_backend( + backend="c10d", test_to_run=self.run_bipolar_function + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_bipolar_function_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.run_bipolar_function) + self.run_test_with_backend( + backend="etcd", test_to_run=self.run_bipolar_function + ) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_run_bipolar_function_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.run_bipolar_function) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.run_bipolar_function + ) def correct_rank_assignment_heterogeneous(self): node_configs = [ @@ -710,14 +741,18 @@ def correct_rank_assignment_heterogeneous(self): "test incompatible with dev/dbg asan or tsan", ) def test_correct_rank_assignment_heterogeneous_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.correct_rank_assignment_heterogeneous) + self.run_test_with_backend( + backend="etcd", test_to_run=self.correct_rank_assignment_heterogeneous + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_correct_rank_assignment_heterogeneous_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.correct_rank_assignment_heterogeneous) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.correct_rank_assignment_heterogeneous + ) def correct_rank_assignment_homogeneous(self): node_configs = [ @@ -744,14 +779,18 @@ def 
correct_rank_assignment_homogeneous(self): "test incompatible with dev/dbg asan or tsan", ) def test_correct_rank_assignment_homogeneous_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.correct_rank_assignment_homogeneous) + self.run_test_with_backend( + backend="etcd", test_to_run=self.correct_rank_assignment_homogeneous + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_correct_rank_assignment_homogeneous_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.correct_rank_assignment_homogeneous) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.correct_rank_assignment_homogeneous + ) def assert_rank_consistency( self, @@ -853,14 +892,18 @@ def double_agent_fault_tolerance(self): "test incompatible with dev/dbg asan or tsan", ) def test_double_agent_fault_tolerance_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.double_agent_fault_tolerance) + self.run_test_with_backend( + backend="etcd", test_to_run=self.double_agent_fault_tolerance + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_double_agent_fault_tolerance_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.double_agent_fault_tolerance) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.double_agent_fault_tolerance + ) def double_agent_elastic(self): """ @@ -907,21 +950,27 @@ def double_agent_elastic(self): "test incompatible with dev/dbg asan or tsan", ) def test_double_agent_elastic_c10d(self): - self.run_test_with_backend(backend="c10d", test_to_run=self.double_agent_elastic) + self.run_test_with_backend( + backend="c10d", test_to_run=self.double_agent_elastic + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_double_agent_elastic_etcd(self): - self.run_test_with_backend(backend="etcd", test_to_run=self.double_agent_elastic) + self.run_test_with_backend( + backend="etcd", test_to_run=self.double_agent_elastic + ) @unittest.skipIf( TEST_WITH_DEV_DBG_ASAN or TEST_WITH_TSAN, "test incompatible with dev/dbg asan or tsan", ) def test_double_agent_elastic_etcd_v2(self): - self.run_test_with_backend(backend="etcd-v2", test_to_run=self.double_agent_elastic) + self.run_test_with_backend( + backend="etcd-v2", test_to_run=self.double_agent_elastic + ) def torch_rpc(self): """ @@ -1056,21 +1105,15 @@ def barrier_failed(self, barrier_mock): self.assertFalse(res.is_failed()) barrier_mock.assert_called_once() - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_barrier_failed_c10d(self): self.run_test_with_backend(backend="c10d", test_to_run=self.barrier_failed) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_barrier_failed_etcd(self): self.run_test_with_backend(backend="etcd", test_to_run=self.barrier_failed) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_barrier_failed_etcd_v2(self): self.run_test_with_backend(backend="etcd-v2", test_to_run=self.barrier_failed) @@ -1089,20 +1132,14 @@ def 
shutdown_called(self, start_processes_mock): agent.run("worker") pcontext_mock.close.assert_called_once() - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_shutdown_called_c10d(self): self.run_test_with_backend(backend="c10d", test_to_run=self.shutdown_called) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_shutdown_called_etcd(self): self.run_test_with_backend(backend="etcd", test_to_run=self.shutdown_called) - @sandcastle_skip_if( - TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan" - ) + @sandcastle_skip_if(TEST_WITH_DEV_DBG_ASAN, "test incompatible with dev/dbg asan") def test_shutdown_called_etcd_v2(self): self.run_test_with_backend(backend="etcd-v2", test_to_run=self.shutdown_called) diff --git a/test/distributed/fsdp/test_fsdp_clip_grad_norm.py b/test/distributed/fsdp/test_fsdp_clip_grad_norm.py new file mode 100644 index 00000000000000..a88eb2deeb5378 --- /dev/null +++ b/test/distributed/fsdp/test_fsdp_clip_grad_norm.py @@ -0,0 +1,117 @@ +# Owner(s): ["oncall: distributed"] + +import sys +from math import inf + +import torch +from torch import distributed as dist +from torch.distributed.fsdp.fully_sharded_data_parallel import ( + FullyShardedDataParallel as FSDP, + CPUOffload, + _calc_grad_norm, +) +from torch.nn import utils as nn_utils +from torch.testing._internal.common_distributed import skip_if_lt_x_gpu +from torch.testing._internal.common_fsdp import ( + DeterministicModel, + FSDPTest, + _collect_total_grad_norm_fsdp, + _collect_total_grad_norm_local, +) +from torch.testing._internal.common_utils import ( + TEST_WITH_DEV_DBG_ASAN, + run_tests, + parametrize, + instantiate_parametrized_tests, +) + + +if not dist.is_available(): + print("Distributed not available, skipping tests", file=sys.stderr) + sys.exit(0) + +if TEST_WITH_DEV_DBG_ASAN: + print( + "Skip dev-asan as torch + multiprocessing spawn have known issues", + file=sys.stderr, + ) + sys.exit(0) + + +class TestClipGradNorm(FSDPTest): + def _run_fsdp_one_iteration(self, norm_type, nested_fsdp, cpu_offload): + """Test FSDP with clip grad norm.""" + fsdp_model = DeterministicModel(nested_fsdp, cpu_offload=cpu_offload) + local_model = DeterministicModel(False) + input = torch.rand(14, 2, device=self.rank) + fsdp_model = FSDP(fsdp_model, cpu_offload=cpu_offload) + self.assertTrue(len(input) >= self.world_size) + out = local_model(input[: self.world_size]) + out.sum().backward() + in_data = torch.tensor(input[self.rank], device=self.rank) + out_fsdp = fsdp_model(in_data) + out_fsdp.sum().backward() + total_norms_fsdp = _collect_total_grad_norm_fsdp( + fsdp_model, norm_type, self.rank + ) + total_norms_local = _collect_total_grad_norm_local(local_model, norm_type) + total_norms_local /= self.world_size + norm_cap = total_norms_fsdp / 2.0 + self.assertEqual(total_norms_local, total_norms_fsdp) + fsdp_model.clip_grad_norm_(norm_cap, norm_type=norm_type) + nn_utils.clip_grad_norm_( + local_model.parameters(), norm_cap, norm_type=norm_type + ) + total_norms_after_clip_fsdp = _collect_total_grad_norm_fsdp( + fsdp_model, norm_type, self.rank + ) + total_norms_after_clip_local = _collect_total_grad_norm_local( + local_model, norm_type + ) + self.assertTrue(total_norms_after_clip_fsdp <= norm_cap) + self.assertEqual(total_norms_after_clip_local, 
total_norms_after_clip_fsdp) + + @skip_if_lt_x_gpu(2) + @parametrize("norm_type", [2.0, inf]) + @parametrize("nested_fsdp", [True, False]) + @parametrize( + "cpu_offload", + [CPUOffload(offload_params=True), CPUOffload(offload_params=False)], + ) + def test_fsdp_clip_grad_norm(self, norm_type, nested_fsdp, cpu_offload): + """Test FSDP with clip grad norm.""" + self._run_fsdp_one_iteration(norm_type, nested_fsdp, cpu_offload) + + +class TestCalcuGradNorm(FSDPTest): + @skip_if_lt_x_gpu(2) + @parametrize("norm_type", [2.0, inf]) + @parametrize("nested_fsdp", [True, False]) + def test_fsdp_calc_grad_norm(self, norm_type, nested_fsdp): + """Test grad norm cal API.""" + model = FSDP(DeterministicModel(nested_fsdp)) + input = torch.rand(15, 2, device=self.rank) + out = model(input) + out.sum().backward() + total_norm = _calc_grad_norm(model.params_with_grad, norm_type) + total_norm_expected = _collect_total_grad_norm_local(model, norm_type) + self.assertEqual(total_norm, total_norm_expected) + + @skip_if_lt_x_gpu(2) + @parametrize("norm_type", [1.3, 2.5]) + def test_fsdp_calc_grad_norm_error(self, norm_type): + """Test the abnormal cases of grad norm cal API.""" + model = DeterministicModel(False) + input = torch.rand(12, 2, device=self.rank) + out = model(input) + out.sum().backward() + error_msg = f"Order {norm_type} not supported for matrix norm" + with self.assertRaisesRegex(RuntimeError, error_msg): + total_norm = _calc_grad_norm(model.parameters(), norm_type) + + +instantiate_parametrized_tests(TestClipGradNorm) +instantiate_parametrized_tests(TestCalcuGradNorm) + +if __name__ == "__main__": + run_tests() diff --git a/test/distributed/fsdp/test_fsdp_comm.py b/test/distributed/fsdp/test_fsdp_comm.py index 86cbaebb086327..f86880ff21e24d 100644 --- a/test/distributed/fsdp/test_fsdp_comm.py +++ b/test/distributed/fsdp/test_fsdp_comm.py @@ -6,6 +6,7 @@ import torch from torch import distributed as dist from torch.distributed.fsdp import FullyShardedDataParallel as FSDP +from torch.distributed.fsdp.fully_sharded_data_parallel import ShardingStrategy from torch.testing._internal.common_distributed import skip_if_lt_x_gpu from torch.testing._internal.common_fsdp import FSDPTest, NestedWrappedModule from torch.testing._internal.common_utils import ( @@ -38,10 +39,15 @@ class TestCommunication(FSDPTest): "use_no_sync", [False, True], ) + @parametrize( + "sharding_strategy", + [ShardingStrategy.SHARD_GRAD_OP, None], + ) def test_communication( self, nested_model: bool, use_no_sync: bool, + sharding_strategy: ShardingStrategy, ): """ Tests FSDP's communication cost in terms of calls to collective @@ -60,10 +66,14 @@ def test_communication( group = dist.distributed_c10d._get_default_group() device = torch.device("cuda") if nested_model: - model = NestedWrappedModule(group, wrap_fsdp=True) - fsdp_model: FSDP = FSDP(model, group).to(device) + model = NestedWrappedModule(group, wrap_fsdp=True, sharding_strategy=sharding_strategy) + fsdp_model: FSDP = FSDP(model, group, sharding_strategy=sharding_strategy).to(device) else: - fsdp_model: FSDP = self._get_wrapped_model(group, cuda_first=False) + fsdp_model: FSDP = self._get_wrapped_model( + group, + cuda_first=False, + config={"sharding_strategy": sharding_strategy}, + ) batch = fsdp_model.module.get_input(device) # Count the number of FSDP instances @@ -74,10 +84,16 @@ def test_communication( # Count the number of all-gathers and reduce-scatters by mocking # `_all_gather_base()` and `_reducer_scatter_base()` - # Both with and without `no_sync()`: - # 
Forward: `num_fsdp` all-gathers + # + # with `no_sync()`: + # Forward: when no_sync mode, root will not free full parameters, + # thus there will be `num_fsdp-1` all-gathers. + # Backward: `num_fsdp` - 1 all-gathers (only excluding the root) + # without `no_sync()`: + # Forward: all instances free full parameters, thus there will be `` + # `num_fsdp` all-gathers. # Backward: `num_fsdp` - 1 all-gathers (only excluding the root) - expected_num_all_gather_no_sync = num_fsdp + (num_fsdp - 1) + expected_num_all_gather_no_sync = (num_fsdp - 1) + (num_fsdp - 1) expected_num_all_gather_sync = num_fsdp + (num_fsdp - 1) expected_num_reduce_scatter_no_sync = 0 expected_num_reduce_scatter_sync = num_fsdp @@ -92,7 +108,7 @@ def reset_mocks(): if use_no_sync: # Check the communication cost when using `no_sync()` - for _ in range(num_no_sync_iters): + for i in range(num_no_sync_iters): reset_mocks() with fsdp_model.no_sync(): output = fsdp_model(*batch) @@ -100,33 +116,69 @@ def reset_mocks(): loss.backward() num_all_gather = mock_all_gather.call_count num_reduce_scatter = mock_reduce_scatter.call_count - assert num_all_gather == expected_num_all_gather_no_sync, \ - f"Expected {expected_num_all_gather_no_sync} " \ - f"all-gathers but saw {num_all_gather} all-gathers " \ + # in the first iteration, all fsdp instances including root + # need to all_gather shards in the forward pass. + if i == 0: + expected_num_all_gather_no_sync_updated = expected_num_all_gather_no_sync + 1 + # in the first iteration, all fsdp instances need to all_gather shards + # in the forward pass + if sharding_strategy == ShardingStrategy.SHARD_GRAD_OP: + expected_num_all_gather_no_sync_updated = num_fsdp + else: + expected_num_all_gather_no_sync_updated = expected_num_all_gather_no_sync + # full parameters are not freed after first iteration in the no_sync mode + if sharding_strategy == ShardingStrategy.SHARD_GRAD_OP: + expected_num_all_gather_no_sync_updated = 0 + self.assertEqual( + num_all_gather, + expected_num_all_gather_no_sync_updated, + f"Expected {expected_num_all_gather_no_sync_updated} " + f"all-gathers but saw {num_all_gather} all-gathers " f"when using `no_sync()`" - assert num_reduce_scatter == \ - expected_num_reduce_scatter_no_sync, \ - f"Expected {expected_num_reduce_scatter_no_sync} " \ - f"reduce-scatters but saw {num_reduce_scatter} " \ + ) + self.assertEqual( + num_reduce_scatter, + expected_num_reduce_scatter_no_sync, + f"Expected {expected_num_reduce_scatter_no_sync} " + f"reduce-scatters but saw {num_reduce_scatter} " "reduce-scatters when using `no_sync()`" + ) # Check the normal communication cost (when not using `no_sync()`) - for _ in range(num_sync_iters): + for i in range(num_sync_iters): reset_mocks() output = fsdp_model(*batch) loss = fsdp_model.module.get_loss(batch, output) loss.backward() num_all_gather = mock_all_gather.call_count num_reduce_scatter = mock_reduce_scatter.call_count - assert num_all_gather == expected_num_all_gather_sync, \ - f"Expected {expected_num_all_gather_sync} all-gathers " \ - f"but saw {num_all_gather} all-gathers when not using " \ + # previous non-sync iteration does not free full parameters for + # the root instance. 
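# The communication-cost assertions above rely on a simple counting pattern:
# temporarily replace the two collectives with MagicMocks, run an iteration,
# and compare call_count with the expected number of all-gathers and
# reduce-scatters. A stripped-down sketch of that pattern (the private names
# mirror the ones mocked in the test; no real process group is touched since
# the mocks are never forwarded to NCCL):
from unittest import mock

import torch.distributed as dist

with mock.patch.object(dist, "_all_gather_base") as mock_all_gather, \
        mock.patch.object(dist, "_reduce_scatter_base") as mock_reduce_scatter:
    # Stand-in for one forward/backward of the wrapped model: each FSDP unit
    # would gather its full parameters and reduce-scatter its gradients.
    dist._all_gather_base(None, None)
    dist._reduce_scatter_base(None, None)
    assert mock_all_gather.call_count == 1
    assert mock_reduce_scatter.call_count == 1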
+ if use_no_sync and i == 0: + expected_num_all_gather_sync_updated = expected_num_all_gather_sync - 1 + # previous non-sync iteration does not free full parameters + if sharding_strategy == ShardingStrategy.SHARD_GRAD_OP: + expected_num_all_gather_sync_updated = 0 + else: + expected_num_all_gather_sync_updated = expected_num_all_gather_sync + # no need to all_gather shards in the backward pass when in + # SHARD_GRAD_OP mode + if sharding_strategy == ShardingStrategy.SHARD_GRAD_OP: + expected_num_all_gather_sync_updated = num_fsdp + self.assertEqual( + num_all_gather, + expected_num_all_gather_sync_updated, + f"Expected {expected_num_all_gather_sync_updated} all-gathers " + f"but saw {num_all_gather} all-gathers when not using " "`no_sync()`" - assert num_reduce_scatter == \ - expected_num_reduce_scatter_sync, \ - f"Expected {expected_num_reduce_scatter_sync} reduce-" \ - f"scatters but saw {num_reduce_scatter} reduce-scatters " \ + ) + self.assertEqual( + num_reduce_scatter, + expected_num_reduce_scatter_sync, + f"Expected {expected_num_reduce_scatter_sync} reduce-" + f"scatters but saw {num_reduce_scatter} reduce-scatters " "when not using `no_sync()`" + ) instantiate_parametrized_tests(TestCommunication) diff --git a/test/distributed/fsdp/test_fsdp_core.py b/test/distributed/fsdp/test_fsdp_core.py index ef91d4db083603..7ea54f27ce6c8f 100644 --- a/test/distributed/fsdp/test_fsdp_core.py +++ b/test/distributed/fsdp/test_fsdp_core.py @@ -1,6 +1,7 @@ # Owner(s): ["oncall: distributed"] import functools +import itertools import sys from unittest import mock @@ -18,6 +19,7 @@ NestedWrappedModule, NestedWrappedModuleWithDelay, TransformerWithSharedParams, + subtest_name ) from torch.testing._internal.common_utils import ( TEST_WITH_DEV_DBG_ASAN, @@ -26,8 +28,8 @@ run_tests, ) -from torch.distributed.fsdp import CPUOffload -from torch.distributed.fsdp.fully_sharded_data_parallel import BackwardPrefetch +from torch.distributed.fsdp import CPUOffload, MixedPrecision +from torch.distributed.fsdp.fully_sharded_data_parallel import BackwardPrefetch, ShardingStrategy if not dist.is_available(): @@ -41,6 +43,23 @@ ) sys.exit(0) +params = "cpu_offload,backward_prefetch,sharding_strategy" +cpu_offload_config = [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] +backward_prefetch_config = [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] +sharding_strategy_config = [ShardingStrategy.SHARD_GRAD_OP, None] +configs = list(itertools.product(cpu_offload_config, + backward_prefetch_config, + sharding_strategy_config)) +test_name_mapping = { + str(CPUOffload(offload_params=True)): "offload_true", + str(CPUOffload(offload_params=False)): "offload_false", + str(BackwardPrefetch.BACKWARD_PRE): "prefetch_pre", + str(BackwardPrefetch.BACKWARD_POST): "prefetch_post", + str(ShardingStrategy.SHARD_GRAD_OP): "shard_grad_op", +} + +subtest_name = functools.partial(subtest_name, test_name_mapping) + class TestParityWithDDP(FSDPTest): """ @@ -63,15 +82,8 @@ def _get_init_modes_for_test(self, cpu_offload): return modes @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_nested_wrapped_model(self, cpu_offload, backward_prefetch): + @parametrize(params, configs, subtest_name) + def test_nested_wrapped_model(self, cpu_offload, backward_prefetch, sharding_strategy): init_modes = 
self._get_init_modes_for_test(cpu_offload) for fsdp_init_mode in init_modes: with self.subTest(fsdp_init_mode=fsdp_init_mode): @@ -80,18 +92,39 @@ def test_nested_wrapped_model(self, cpu_offload, backward_prefetch): fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + sharding_strategy=sharding_strategy, ) @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_nested_all_wrapped_model(self, cpu_offload, backward_prefetch): + @parametrize("cpu_offload", cpu_offload_config) + @parametrize("sharding_strategy", sharding_strategy_config) + @parametrize("mixed_precision", [True, False]) + def test_nested_wrapped_model_single_iteration_mixed_precision( + self, + cpu_offload, + sharding_strategy, + mixed_precision + ): + init_modes = self._get_init_modes_for_test(cpu_offload) + mixed_precision = MixedPrecision() if mixed_precision else None + for fsdp_init_mode in init_modes: + with self.subTest(fsdp_init_mode=fsdp_init_mode): + self._test_identical_outputs( + NestedWrappedModule, + # Only run one step for comparison, as usually grad scaler + # is needed to avoid NaN after first step. + num_steps=1, + fsdp_init_mode=fsdp_init_mode, + cpu_offload=cpu_offload, + sharding_strategy=sharding_strategy, + mixed_precision=mixed_precision, + ) + + + @skip_if_lt_x_gpu(2) + @parametrize(params, configs, subtest_name) + @parametrize("clip_norm_type", [2.0, None]) + def test_nested_all_wrapped_model(self, cpu_offload, backward_prefetch, sharding_strategy, clip_norm_type): init_modes = self._get_init_modes_for_test(cpu_offload) for fsdp_init_mode in init_modes: with self.subTest(fsdp_init_mode=fsdp_init_mode): @@ -101,18 +134,14 @@ def test_nested_all_wrapped_model(self, cpu_offload, backward_prefetch): fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + norm_type=clip_norm_type, + sharding_strategy=sharding_strategy, ) @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_transformer_parameterized(self, cpu_offload, backward_prefetch): + @parametrize(params, configs, subtest_name) + @parametrize("clip_norm_type", [2.0, None]) + def test_transformer_parameterized(self, cpu_offload, backward_prefetch, sharding_strategy, clip_norm_type): init_modes = self._get_init_modes_for_test(cpu_offload) for fsdp_init_mode in init_modes: with self.subTest(fsdp_init_mode=fsdp_init_mode): @@ -121,18 +150,13 @@ def test_transformer_parameterized(self, cpu_offload, backward_prefetch): fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + norm_type=clip_norm_type, + sharding_strategy=sharding_strategy, ) @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_delayed_optim_step(self, cpu_offload, backward_prefetch): + @parametrize(params, configs, subtest_name) + def test_delayed_optim_step(self, cpu_offload, backward_prefetch, sharding_strategy): # We use a model with a long CUDA delay right before the optimizer step. 
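# The @parametrize(params, configs, subtest_name) pattern used throughout this
# file expands each test over the cross product of config values and gives
# every combination a readable suffix through a name mapping. A minimal sketch
# of the expansion and naming (hypothetical labels; the real ones come from
# test_name_mapping above):
import itertools

offload = ["offload_true", "offload_false"]
prefetch = ["prefetch_pre", "prefetch_post", "no_prefetch"]
sharding = ["shard_grad_op", "full_shard"]

subtests = ["_".join(combo) for combo in itertools.product(offload, prefetch, sharding)]
assert len(subtests) == 2 * 3 * 2                       # 12 generated subtests
assert subtests[0] == "offload_true_prefetch_pre_shard_grad_op"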
# This tests our streams logic, and that we don't start the allgather # until after the optimization step completes. @@ -147,18 +171,12 @@ def test_delayed_optim_step(self, cpu_offload, backward_prefetch): fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + sharding_strategy=sharding_strategy, ) @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_delayed_reduce_scatter(self, cpu_offload, backward_prefetch): + @parametrize(params, configs, subtest_name) + def test_delayed_reduce_scatter(self, cpu_offload, backward_prefetch, sharding_strategy): # We insert a delay in the torch.distributed._reduce_scatter_base op, so that # the post_backward_stream takes much longer than the backward pass. # This tests that we properly block at the end of the backward pass for @@ -174,21 +192,16 @@ def test_delayed_reduce_scatter(self, cpu_offload, backward_prefetch): fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + sharding_strategy=sharding_strategy, ) def _dummy_ddp_fn(self, model): return DummyDDP(model) @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_mixture_of_experts(self, cpu_offload, backward_prefetch): + @parametrize(params, configs, subtest_name) + @parametrize("clip_norm_type", [2.0, None]) + def test_mixture_of_experts(self, cpu_offload, backward_prefetch, sharding_strategy, clip_norm_type): init_modes = self._get_init_modes_for_test(cpu_offload) for fsdp_init_mode in init_modes: with self.subTest(fsdp_init_mode=fsdp_init_mode): @@ -200,18 +213,13 @@ def test_mixture_of_experts(self, cpu_offload, backward_prefetch): fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + norm_type=clip_norm_type, + sharding_strategy=sharding_strategy, ) @skip_if_lt_x_gpu(2) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)] - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_mixture_of_experts_with_delay_before_free(self, cpu_offload, backward_prefetch): + @parametrize(params, configs, subtest_name) + def test_mixture_of_experts_with_delay_before_free(self, cpu_offload, backward_prefetch, sharding_strategy): init_modes = self._get_init_modes_for_test(cpu_offload) for fsdp_init_mode in init_modes: with self.subTest(fsdp_init_mode=fsdp_init_mode): @@ -222,15 +230,21 @@ def test_mixture_of_experts_with_delay_before_free(self, cpu_offload, backward_p fsdp_init_mode=fsdp_init_mode, cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, + sharding_strategy=sharding_strategy, ) class TestParamInit(FSDPTest): @skip_if_lt_x_gpu(2) - def test_param_change_after_init(self): + @parametrize("mixed_precision", [True, False]) + def test_param_change_after_init(self, mixed_precision): group = dist.distributed_c10d._get_default_group() # Establish reference behavior. 
- model = self._get_wrapped_model(group, cuda_first=False) + mixed_precision = MixedPrecision() if mixed_precision else None + config = {"mixed_precision": mixed_precision} + model = self._get_wrapped_model( + group, mixed_precision=mixed_precision, cuda_first=False + ) model.eval() # no dropout for this test input = model.module.get_input(torch.device("cuda")) ref_output = model(*input) @@ -284,10 +298,15 @@ def _test_output_backward_hooks(self, model): @skip_if_lt_x_gpu(2) @parametrize("cuda_first", [False, True]) - def test_register_functions_called(self, cuda_first): + @parametrize("mixed_precision", [True, False]) + def test_register_functions_called(self, cuda_first, mixed_precision): """Tests that _register_{pre|post}_backward_hooks called during forward.""" group = dist.distributed_c10d._get_default_group() - model = self._get_wrapped_model(group, cuda_first=cuda_first) + mixed_precision = MixedPrecision() if mixed_precision else None + config = {"mixed_precision": mixed_precision} + model = self._get_wrapped_model( + group, mixed_precision=mixed_precision, cuda_first=cuda_first + ) input = model.module.get_input(torch.device("cuda")) model._register_post_backward_hooks = mock.MagicMock(return_value=None) model._register_pre_backward_hooks = mock.MagicMock(return_value=None) @@ -300,11 +319,19 @@ def test_register_functions_called(self, cuda_first): class TestNoGrad(FSDPTest): @skip_if_lt_x_gpu(2) - def test_transformer_no_grad(self): + @parametrize("mixed_precision", [True, False]) + def test_transformer_no_grad(self, mixed_precision): group = dist.distributed_c10d._get_default_group() - model = self._get_wrapped_model(group, cuda_first=False) + mixed_precision = MixedPrecision() if mixed_precision else None + config = {"mixed_precision": mixed_precision} + model = self._get_wrapped_model(group, config=config, cuda_first=False) # Train model for a step - self._train_for_several_steps(model, num_steps=1, autocast=False) + self._train_for_several_steps( + model, + num_steps=1, + autocast=False, + mixed_precision=config["mixed_precision"] + ) model.eval() # no dropout for this test @@ -321,6 +348,8 @@ def test_transformer_no_grad(self): instantiate_parametrized_tests(TestHooks) instantiate_parametrized_tests(TestParityWithDDP) +instantiate_parametrized_tests(TestNoGrad) +instantiate_parametrized_tests(TestParamInit) if __name__ == "__main__": run_tests() diff --git a/test/distributed/fsdp/test_fsdp_grad_acc.py b/test/distributed/fsdp/test_fsdp_grad_acc.py new file mode 100644 index 00000000000000..f2569266c34711 --- /dev/null +++ b/test/distributed/fsdp/test_fsdp_grad_acc.py @@ -0,0 +1,261 @@ +# Owner(s): ["oncall: distributed"] + +import contextlib +import itertools +import sys +from dataclasses import dataclass +from typing import List, Optional, Tuple + +import torch +from torch import distributed as dist +from torch.distributed.fsdp import CPUOffload +from torch.distributed.fsdp import FullyShardedDataParallel as FSDP +from torch.distributed.fsdp.fully_sharded_data_parallel import BackwardPrefetch +from torch.testing._internal.common_distributed import skip_if_lt_x_gpu +from torch.testing._internal.common_fsdp import FSDPTest +from torch.testing._internal.common_utils import ( + TEST_WITH_DEV_DBG_ASAN, + instantiate_parametrized_tests, + parametrize, + run_tests, +) + +if not dist.is_available(): + print("Distributed not available, skipping tests", file=sys.stderr) + sys.exit(0) + +if TEST_WITH_DEV_DBG_ASAN: + print( + "Skip dev-asan as torch + multiprocessing spawn have known 
issues", + file=sys.stderr, + ) + sys.exit(0) + + +@dataclass +class _GradAccConfig: + """ + This configures how gradients are accumulated in :meth:`_test_grad_acc`. + Each instance of this class represents ``num_iters``-many consecutive + iterations, where the ``no_sync()`` context manager is used or not as given + by ``use_no_sync``. + + Attributes: + use_no_sync (bool): Indicates whether to use the ``no_sync()`` context + manager as the way to accumulate gradients. + num_iters (int): Number of iterations to accumulate gradients. + """ + use_no_sync: bool + num_iters: int + + def __repr__(self) -> str: + # Override to remove any spaces in the string to appease the internal + # build's test name parser + return ( + f"(use_no_sync={self.use_no_sync}," + f"num_iters={self.num_iters})" + ) + + +@dataclass +class _GradAccConfigs: + """ + This wraps a :class:`list` of :class:`_GradAccConfig` instances with the + sole purpose of overriding :meth:`__repr__` to remove spaces. + """ + configs: List[_GradAccConfig] + + def __repr__(self) -> str: + # Override to remove any spaces in the string to appease the internal + # build's test name parser + return ( + "[" + ",".join(config.__repr__() for config in self.configs) + "]" + ) + + +class TestGradAcc(FSDPTest): + """Tests ``FullyShardedDataParallel``'s gradient accumulation via both its + ``no_sync()`` context manager and without the context manager.""" + + def _test_grad_acc( + self, + batch_dim: int, + configs: List[_GradAccConfig], + cpu_offload: CPUOffload, + backward_prefetch: Optional[BackwardPrefetch], + ): + """ + Tests gradient accumulation by comparing a run that trains sequentially + through some batches while accumulating gradients with a run that + trains on the concatenation of those batches in a single iteration. + + The last iteration always synchronizes gradients regardless of what is + specified by the last element of ``configs``. + + Arguments: + batch_dim (int): Batch dimension in the input tensor to be passed + into the model for the forward pass. + configs (List[_GradAccConfig]): :class:`list` of configurations + specifying how gradients are accumulated; for example, a list + corresponding to [(False, 2), (True, 2), (False, 2)] indicates + to accumulate over 2 + 2 + 2 = 6 total iterations, where the + first two do not use ``no_sync()``, the middle two do use + ``no_sync()``, and the final two again do not use + ``no_sync()``. + cpu_offload (CPUOffload): Configures CPU offloading. + backward_prefetch (Optional[BackwardPrefetch]): Specifies at which + point to prefetch the next layer's full parameters during the + backward pass, if at all. 
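# The comparison described above can be reproduced locally without FSDP: for a
# loss that sums over samples, accumulating .backward() over several batches
# yields the same gradients as one backward over their concatenation. A
# minimal sketch of that reference behaviour (plain single-process torch):
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2, bias=False)
batches = [torch.randn(3, 4) for _ in range(3)]

# accumulate gradients over the individual batches
model.zero_grad()
for b in batches:
    model(b).sum().backward()
acc_grad = model.weight.grad.clone()

# single backward over the concatenated batch
model.zero_grad()
model(torch.cat(batches, dim=0)).sum().backward()
ref_grad = model.weight.grad.clone()

torch.testing.assert_close(acc_grad, ref_grad)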
+ """ + # Gradient accumulation outside `no_sync()` is not currently compatible + # with CPU offloading + if cpu_offload.offload_params and \ + any(not config.use_no_sync for config in configs): + return + old_allow_tf32 = torch.backends.cuda.matmul.allow_tf32 + try: + # Disable TF32 to prevent floating point drift + torch.backends.cuda.matmul.allow_tf32 = False + + # Initialize the FSDP model and optimizer + group = dist.distributed_c10d._get_default_group() + fsdp_model: FSDP = self._get_wrapped_model( + group, cuda_first=False, add_bn=False, + config={ + "cpu_offload": cpu_offload, + "backward_prefetch": backward_prefetch, + }, + ) # disable BN since the test uses varying batch sizes + fsdp_model.eval() # disable dropout + device = torch.device("cuda") + optim = torch.optim.SGD( + fsdp_model.parameters(), lr=0.01, momentum=0.9, + ) + + # Generate the sequence of batches, each containing the same data + # but permuted + def permute_tensor(x: torch.Tensor): + return x.view(-1)[torch.randperm(x.numel())].view_as(x) + + batch: Tuple[torch.Tensor, ...] = \ + fsdp_model.module.get_input(device) + batches: List[Tuple[torch.Tensor, ...]] = [batch] + num_iters_to_acc = sum(config.num_iters for config in configs) + for _ in range(num_iters_to_acc - 1): + batches.append(tuple(permute_tensor(t) for t in batch)) + for (batch1, batch2) in itertools.combinations(batches, r=2): + for t1, t2 in zip(batch1, batch2): + assert not torch.all(t1 == t2), \ + "Check the test to make sure that batches are distinct" + + # Concatenate the batches along the given batch dimension + concat_batch: Tuple[torch.Tensor, ...] = tuple( + torch.cat(ts, dim=batch_dim) for ts in zip(*batches) + ) + + # Establish reference gradients using the concatenated batch + fsdp_model.zero_grad() + output = fsdp_model(*concat_batch) + ref_loss = fsdp_model.module.get_loss(concat_batch, output) + ref_loss.backward() + ref_grads = [ + p.grad.detach().clone() for p in fsdp_model.parameters() + ] + + # Compute and accumulate the gradients + fsdp_model.zero_grad() + losses = [] + batch_idx = 0 + for config in configs: + sync_context = fsdp_model.no_sync() if config.use_no_sync \ + else contextlib.suppress() + with sync_context: + for _ in range(config.num_iters): + if batch_idx == num_iters_to_acc - 1: + break # always sync on the last iteration + batch = batches[batch_idx] + batch_idx += 1 + output = fsdp_model(*batch) + loss = fsdp_model.module.get_loss(batch, output) + loss.backward() + losses.append(loss) + output = fsdp_model(*batches[-1]) + loss = fsdp_model.module.get_loss(batches[-1], output) + loss.backward() + losses.append(loss) + acc_loss = sum(losses) + acc_grads = [ + p.grad.detach().clone() for p in fsdp_model.parameters() + ] + + # Compare the losses and gradients + torch.testing.assert_close(ref_loss, acc_loss) + self.assertEqual(len(ref_grads), len(acc_grads)) + for ref_grad, acc_grad in zip(ref_grads, acc_grads): + self.assertEqual(ref_grad.device, acc_grad.device) + self.assertEqual(ref_grad.size(), acc_grad.size()) + self.assertEqual(ref_grad.dtype, acc_grad.dtype) + torch.testing.assert_close(ref_grad, acc_grad) + + # Check that the optimizer step does not error + optim.step() + finally: + torch.backends.cuda.matmul.allow_tf32 = old_allow_tf32 + + @skip_if_lt_x_gpu(2) + @parametrize( + "configs", + [ + _GradAccConfigs([ + _GradAccConfig(use_no_sync=True, num_iters=3), + _GradAccConfig(use_no_sync=False, num_iters=3), + _GradAccConfig(use_no_sync=True, num_iters=3), + ]), + _GradAccConfigs([ + 
_GradAccConfig(use_no_sync=False, num_iters=3), + _GradAccConfig(use_no_sync=True, num_iters=3), + _GradAccConfig(use_no_sync=False, num_iters=3), + ]), + ] + ) + @parametrize( + "cpu_offload", + [CPUOffload(offload_params=False), CPUOffload(offload_params=True)], + ) + @parametrize( + "backward_prefetch", + [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None], + ) + def test_grad_acc( + self, + configs: _GradAccConfigs, + cpu_offload: CPUOffload, + backward_prefetch: Optional[BackwardPrefetch], + ): + """ + Tests gradient accumulation. + + This exercises gradient accumulation inside and outside the + ``no_sync()`` context manager, in particular by interleaving the two. + It tests both interleaving starting with (and ending with, resp.) + inside versus outside ``no_sync()`` to ensure that initial conditions + (and final conditions, resp.) do not affect the correctness. This test + also checks for compatibility with the CPU offload and backward + prefetch options. + + NOTE: Gradient accumulation without using the ``no_sync()`` context + manager is not currently compatible with CPU offloading, so those tests + are vacuous. + """ + self._test_grad_acc( + batch_dim=1, + configs=configs.configs, + cpu_offload=cpu_offload, + backward_prefetch=backward_prefetch, + ) + + +instantiate_parametrized_tests(TestGradAcc) + +if __name__ == "__main__": + run_tests() diff --git a/test/distributed/fsdp/test_fsdp_mixed_precision.py b/test/distributed/fsdp/test_fsdp_mixed_precision.py new file mode 100644 index 00000000000000..d2295a93f1c9d1 --- /dev/null +++ b/test/distributed/fsdp/test_fsdp_mixed_precision.py @@ -0,0 +1,426 @@ +# Owner(s): ["oncall: distributed"] + +import sys +import contextlib +from functools import partial +from itertools import product + +import torch +import torch.cuda.nccl as nccl +import torch.nn as nn +from torch import distributed as dist +from torch.distributed.fsdp import ( + FullyShardedDataParallel as FSDP, + CPUOffload, + MixedPrecision, + BackwardPrefetch, + ShardingStrategy, +) +from torch.testing._internal.common_distributed import skip_if_lt_x_gpu +from torch.testing._internal.common_fsdp import ( + FSDPTest, + subtest_name, +) +from torch.testing._internal.common_utils import ( + instantiate_parametrized_tests, + parametrize, + run_tests, + TEST_WITH_DEV_DBG_ASAN, +) +from torch.testing._internal.common_cuda import CUDA11OrLater + + +if not dist.is_available(): + print("Distributed not available, skipping tests", file=sys.stderr) + sys.exit(0) + +if TEST_WITH_DEV_DBG_ASAN: + print( + "Skip dev-asan as torch + multiprocessing spawn have known issues", + file=sys.stderr, + ) + sys.exit(0) + +# Various mixed precision configs to test under. +default_mp = MixedPrecision() + +nccl_supports_bf16 = ( + CUDA11OrLater and dist.is_nccl_available() and nccl.version() >= (2, 10) +) + +mp_configs = [default_mp] + +if nccl_supports_bf16: + mp_diff_reduce = MixedPrecision(reduce_dtype=torch.bfloat16) + mp_diff_buffer = MixedPrecision(buffer_dtype=torch.bfloat16) + mp_diff_buffer_and_reduce = MixedPrecision(buffer_dtype=torch.bfloat16, reduce_dtype=torch.float32) + mp_configs.extend([ + mp_diff_reduce, mp_diff_buffer, mp_diff_buffer_and_reduce, + ]) + +# Buffer original dtype, which can differ from model params. 
+buffer_orig_dtype = torch.float64 + +params = "mp_config,cpu_offload,backward_prefetch,full_precision_param_dtype" +cpu_offload_config = [ + CPUOffload(offload_params=True), CPUOffload(offload_params=False) +] +backward_prefetch_config = [ + BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST +] +full_precision_param_dtype_config = [torch.float32, torch.float64] +configs = list(product( + mp_configs, + cpu_offload_config, + backward_prefetch_config, + full_precision_param_dtype_config, +)) + +test_name_mapping = { + str(CPUOffload(offload_params=True)): "offload_true", + str(CPUOffload(offload_params=False)): "offload_false", + str(BackwardPrefetch.BACKWARD_PRE): "prefetch_pre", + str(BackwardPrefetch.BACKWARD_POST): "prefetch_post", + str(default_mp): "mp_fp16", + str(torch.float32): "fp32", + str(torch.float64): "fp64", +} + +if nccl_supports_bf16: + test_name_mapping.update({ + str(mp_diff_reduce): "mp_diff_reduce", + str(mp_diff_buffer): "mp_diff_buffer", + str(mp_diff_buffer_and_reduce): "mp_diff_buffer_reduce", + }) + +subtest_name = partial(subtest_name, test_name_mapping) + +@contextlib.contextmanager +def patch_reduce_scatter(new_reduce_scatter): + """ + Patches dist._reduce_scatter_base with a new reduce_scatter_base and + restores upon exiting. Used for validation of mixed precision + """ + orig_reduce_scatter = dist._reduce_scatter_base + dist._reduce_scatter_base = new_reduce_scatter + try: + yield + finally: + dist._reduce_scatter_base = orig_reduce_scatter + +class LinearMixedPrecision(nn.Module): + """ + A linear module with extra checks for mixed precision training. + """ + def __init__(self, param_dtype): + super().__init__() + self.lin = nn.Linear(10, 10, bias=False).to(param_dtype) + self.register_buffer('buffer', torch.randn((1, 2), dtype=buffer_orig_dtype)) + + def forward(self, tup): + # Param and input should be the mixed precision type + inp, cls, fsdp, mp_config, full_precision_param_dtype = tup + expected_param_type = mp_config.param_dtype + expected_buffer_type = mp_config.buffer_dtype + cls.assertEqual(inp.dtype, expected_param_type) + # Buffer should be in specified precision as well. + cls.assertEqual(self.buffer.dtype, expected_buffer_type) + + # In FSDP, self.params should point to the right type. + num_active_fsdp = 0 + for fsdp_module in FSDP.fsdp_modules(fsdp): + fsdp_managed_params = fsdp_module.params + # Single param assumption + cls.assertEqual(1, len(fsdp_managed_params)) + for param in fsdp_managed_params: + # FSDP unit is currently active if it is not using the param + # local shard. This supports both FULL_SHARD and SHARD_GRAD_OP + # cases. In FULL_SHARD, we have the additional property that + # param._full_param_padded has not been freed. + is_fsdp_unit_active = ( + param._is_sharded and + (param.data.data_ptr() != param._local_shard.data_ptr()) + ) + if is_fsdp_unit_active: + num_active_fsdp += 1 + # This FSDP unit is active, verify param points to mixed + cls.assertEqual(param.dtype, expected_param_type) + # _rebuild_full_param should have also freed the fp16 shard. + cls.assertEqual(0, param._mp_shard.storage().size()) + elif param._is_sharded: + # This FSDP unit is not active as full param has been + # freed or not yet allocated. Ensure param points to full + # precision param. + cls.assertEqual(param.dtype, full_precision_param_dtype) + # We should have gotten at least one active FSDP unit for sharded + # (world size > 1) cases. 
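# The "is this FSDP unit active" test above reduces to a storage-aliasing
# check: a parameter that still points at its local shard has been resharded,
# while one pointing at newly gathered memory is in use. A plain-tensor
# illustration of that data_ptr() comparison (no FSDP involved):
import torch

local_shard = torch.zeros(4)
resharded_view = local_shard      # same storage -> "not active"
gathered_full = torch.zeros(8)    # fresh storage -> "active"

assert resharded_view.data_ptr() == local_shard.data_ptr()
assert gathered_full.data_ptr() != local_shard.data_ptr()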
For cases where param is not sharded + # (ie world_size == 1) it is a bit hard to check if FSDP unit is active + # as we'd always point to the local shard, so we rely on the forward + # pass self.lin(inp) working well and inp being reduced precision to + # implicitly validate that the param is indeed in the reduced precision. + if cls.world_size > 1: + cls.assertGreater(num_active_fsdp, 0) + + return (self.lin(inp), cls, fsdp, mp_config, full_precision_param_dtype) + + +class TestFSDPMixedPrecision(FSDPTest): + @property + def world_size(self): + raise ValueError("To be implemented by child classes") + + def _get_simple_nested_model(self, param_dtype, *fsdp_args, **fsdp_kwargs): + model = FSDP( + nn.Sequential( + FSDP(LinearMixedPrecision(param_dtype).cuda(), *fsdp_args, **fsdp_kwargs), + LinearMixedPrecision(param_dtype).cuda(), + ), + *fsdp_args, + **fsdp_kwargs, + ) + return model + + def _get_simple_model(self, param_dtype, *fsdp_args, **fsdp_kwargs): + model = FSDP(LinearMixedPrecision(param_dtype).cuda(), *fsdp_args, **fsdp_kwargs) + return model + + def _validate_mp_shard_freed(self, fsdp_model): + """ + Ensures that the mixed precision shard is greed for all FSDP units. + """ + fsdp_units = FSDP.fsdp_modules(fsdp_model) + for fsdp in fsdp_units: + for param in fsdp.params: + self.assertEqual(0, param._mp_shard.storage().size()) + + def _reduce_scatter_base_validate_mp( + self, + orig_reduce_scatter, + mp_config, + *args, + **kwargs + ): + """ + Performs dist._reduce_scatter_base but verifies mixed precision settings + before. This is to test mixed precision is working as expected during + backward pass. + """ + tensors = [] + for x in args: + if isinstance(x, torch.Tensor): + tensors.append(x) + for _, x in kwargs.items(): + if isinstance(x, torch.Tensor): + tensors.append(x) + + # reduce_dtype has higher priority than param_dtype, because mixed_precision + # supports overriding param_dtype with reduce_dtype to control the + # reduction precision. In the case where reduce_dtype == param_dtype + # this tests that gradients are in the expected precision as well. + expected_dtype = mp_config.reduce_dtype + for t in tensors: + self.assertEqual(expected_dtype, t.dtype) + + return orig_reduce_scatter(*args, **kwargs) + + def _run_test_mixed_precision_e2e( + self, + mp_config, + cpu_offload, + backward_prefetch, + full_precision_param_dtype, + sharding_strategy, + ): + torch.cuda.set_device(self.rank) + fsdp_models = [ + self._get_simple_model( + param_dtype=full_precision_param_dtype, + sharding_strategy=sharding_strategy, + cpu_offload=cpu_offload, + mixed_precision=mp_config, + backward_prefetch=backward_prefetch + ), + self._get_simple_nested_model( + param_dtype=full_precision_param_dtype, + sharding_strategy=sharding_strategy, + cpu_offload=cpu_offload, + mixed_precision=mp_config, + backward_prefetch=backward_prefetch + ), + ] + for model in fsdp_models: + if not cpu_offload.offload_params: + model.cuda() + + # Patch reduce_scatter to add validation for mixed precision types. + orig_reduce_scatter = dist._reduce_scatter_base + test_reduce_scatter = partial( + self._reduce_scatter_base_validate_mp, orig_reduce_scatter, mp_config, + ) + with patch_reduce_scatter(test_reduce_scatter): + optim = torch.optim.Adam(model.parameters()) + + for _ in range(3): + inp = torch.randn(3, 10).cuda() + # Forward pass of LinearMixedPrecision check casting of + # inputs, params, buffers. + act, *_ = model( + (inp, self, model, mp_config, full_precision_param_dtype) + ) + # Buffers should be casted. 
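# _reduce_scatter_base_validate_mp above follows a generic "check, then
# delegate" pattern: inspect every tensor argument's dtype before handing the
# call to the original collective. A self-contained sketch of that wrapper
# shape using functools.partial (torch.add stands in for the collective here):
from functools import partial

import torch

def validate_dtype_then_call(orig_fn, expected_dtype, *args, **kwargs):
    for x in list(args) + list(kwargs.values()):
        if isinstance(x, torch.Tensor):
            assert x.dtype == expected_dtype, f"unexpected dtype {x.dtype}"
    return orig_fn(*args, **kwargs)

checked_add = partial(validate_dtype_then_call, torch.add, torch.float16)
out = checked_add(torch.ones(2, dtype=torch.float16),
                  torch.ones(2, dtype=torch.float16))
assert out.dtype == torch.float16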
+ for buf in model.buffers(): + self.assertEqual(buf.dtype, mp_config.buffer_dtype) + # p._mp_shard should be freed. + if model.params[0]._is_sharded: # i.e. world_size > 1 + # TODO: free the mixed precision shard after forward + # when world_size == 1 as well, currently when + # world_size == 1 it is only freed after backward. + self._validate_mp_shard_freed(model) + + loss = act.sum() + self.assertEqual(loss.dtype, mp_config.param_dtype) + # Will run patched reduce scatter that validates mixed_precision + # types in backward. + loss.backward() + # Buffers stay casted even after backwards. + for buf in model.buffers(): + self.assertEqual(buf.dtype, mp_config.buffer_dtype) + # p._mp_shard should be freed. + self._validate_mp_shard_freed(model) + + # Ensure params and grads are in full precision + for param in model.parameters(): + self.assertEqual(param.dtype, full_precision_param_dtype) + if param.grad is not None: + self.assertEqual(param.grad.dtype, full_precision_param_dtype) + + optim.step() + + # Summon full params should be in full precision + with model.summon_full_params(): + # It is not expected for summon_full_params to allocate + # a mixed precision shard. + self._validate_mp_shard_freed(model) + params = list(model.parameters()) + for p in params: + self.assertEqual(p.dtype, full_precision_param_dtype) + + # Note that buffers are cast only once and only restored + # to the original buffer dtype in state_dict, so + # summon_full_params is not expected to restore buffer + # types to their original. + named_buffers = dict(model.named_buffers()) + for k, v in named_buffers.items(): + self.assertEqual(v.dtype, mp_config.buffer_dtype) + + # state_dict should be in full precision + state_dict = {k: v.clone() for k, v in model.state_dict().items()} + for name, tensor in state_dict.items(): + # Parameters and buffers are checkpointed in their + # original dtypes, which may be different. + if name in named_buffers.keys(): + self.assertEqual(tensor.dtype, buffer_orig_dtype) + else: + self.assertEqual( + tensor.dtype, full_precision_param_dtype, + f"{name}: {tensor.dtype} vs {full_precision_param_dtype}" + ) + + # After state_dict, buffer's dtype should have been restored + # to the mixed precision one. + for buf in model.buffers(): + self.assertEqual(buf.dtype, mp_config.buffer_dtype) + + +class TestFSDPMixedPrecisionSharded(TestFSDPMixedPrecision): + + @property + def world_size(self): + return 2 + + @skip_if_lt_x_gpu(2) + def test_mixed_precision_no_reshard_after_forward(self): + # Note that we don't exercise all possible different configs so as to + # not increase test TTS too much. + mp = default_mp if not nccl_supports_bf16 else mp_diff_buffer_and_reduce + self._run_test_mixed_precision_e2e( + mp_config=mp, + cpu_offload=CPUOffload(offload_params=True), + backward_prefetch=None, + full_precision_param_dtype=torch.float64, + sharding_strategy=ShardingStrategy.SHARD_GRAD_OP, + ) + + @skip_if_lt_x_gpu(2) + @parametrize(params, configs, subtest_name) + def test_mixed_precision_e2e_full_shard( + self, + mp_config, + cpu_offload, + backward_prefetch, + full_precision_param_dtype + ): + self._run_test_mixed_precision_e2e( + mp_config, + cpu_offload, + backward_prefetch, + full_precision_param_dtype, + ShardingStrategy.FULL_SHARD, + ) + + @skip_if_lt_x_gpu(2) + def test_mixed_precision_embedding_table(self): + # Basic test to ensure int inputs are not casted which would break + # modules such as embedding tables. 
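# Illustrative sketch, not part of the diff: why integer inputs must be left
# uncast. nn.Embedding indexes with integer tensors, so casting every input to
# the low-precision param_dtype would break embedding lookups.
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=100, embedding_dim=8)
idx = torch.randint(0, 100, (4,))   # int64 indices, as an embedding expects
out = emb(idx)                      # works: indices keep their integer dtype
# emb(idx.to(torch.float16))        # would raise: embedding indices must be integral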
+        mp_config = MixedPrecision()
+        model = self._get_wrapped_model(
+            group=torch.distributed.distributed_c10d._get_default_group(),
+            config={"mixed_precision": mp_config}
+        )
+        optim = torch.optim.SGD(model.parameters(), lr=0.1)
+        for _ in range(6):
+            inp = model.module.get_input(torch.device("cuda"))
+            # This would fail if we cast integer module inputs such as for
+            # embedding tables.
+            output = model(*inp)
+            loss = model.module.get_loss(inp, output).cuda()
+            self.assertEqual(loss.dtype, mp_config.param_dtype)
+            model.module.run_backward(loss)
+            optim.step()
+
+class TestFSDPMixedPrecisionUnsharded(TestFSDPMixedPrecision):
+    """
+    Smaller test suite for unsharded param (i.e. world_size == 1) case.
+    """
+    @property
+    def world_size(self):
+        return 1
+
+    @skip_if_lt_x_gpu(1)
+    def test_mixed_precision_no_reshard_after_forward(self):
+        # Note that we don't exercise all possible different configs so as to
+        # not increase test TTS too much.
+        mp = default_mp if not nccl_supports_bf16 else mp_diff_buffer_and_reduce
+        self._run_test_mixed_precision_e2e(
+            mp_config=mp,
+            cpu_offload=CPUOffload(offload_params=True),
+            backward_prefetch=None,
+            full_precision_param_dtype=torch.float64,
+            sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
+        )
+
+    @skip_if_lt_x_gpu(1)
+    def test_mixed_precision_e2e_full_shard(self):
+        mp = default_mp if not nccl_supports_bf16 else mp_diff_buffer_and_reduce
+        self._run_test_mixed_precision_e2e(
+            mp_config=mp,
+            cpu_offload=CPUOffload(offload_params=True),
+            backward_prefetch=None,
+            full_precision_param_dtype=torch.float64,
+            sharding_strategy=ShardingStrategy.FULL_SHARD,
+        )
+
+instantiate_parametrized_tests(TestFSDPMixedPrecisionSharded)
+
+if __name__ == "__main__":
+    run_tests()
diff --git a/test/distributed/fsdp/test_fsdp_no_sync.py b/test/distributed/fsdp/test_fsdp_no_sync.py
deleted file mode 100644
index 1016de3fe6af0c..00000000000000
--- a/test/distributed/fsdp/test_fsdp_no_sync.py
+++ /dev/null
@@ -1,166 +0,0 @@
-# Owner(s): ["oncall: distributed"]
-
-import itertools
-import sys
-from typing import List, Optional, Tuple
-
-import torch
-from torch import distributed as dist
-from torch.distributed.fsdp import CPUOffload
-from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
-from torch.distributed.fsdp.fully_sharded_data_parallel import BackwardPrefetch
-from torch.testing._internal.common_distributed import skip_if_lt_x_gpu
-from torch.testing._internal.common_fsdp import FSDPTest
-from torch.testing._internal.common_utils import (
-    TEST_WITH_DEV_DBG_ASAN,
-    instantiate_parametrized_tests,
-    parametrize,
-    run_tests,
-)
-
-if not dist.is_available():
-    print("Distributed not available, skipping tests", file=sys.stderr)
-    sys.exit(0)
-
-if TEST_WITH_DEV_DBG_ASAN:
-    print(
-        "Skip dev-asan as torch + multiprocessing spawn have known issues",
-        file=sys.stderr,
-    )
-    sys.exit(0)
-
-
-class TestNoSync(FSDPTest):
-    """Tests ``FullyShardedDataParallel``'s gradient accumulation via its
-    ``no_sync()`` context manager."""
-
-    def _test_no_sync(
-        self,
-        batch_dim: int,
-        num_iters_to_acc: int,
-        cpu_offload: CPUOffload,
-        backward_prefetch: Optional[BackwardPrefetch],
-    ):
-        """
-        Tests ``no_sync()`` by comparing a run that trains sequentially through
-        some batches while accumulating gradients with a run that trains on the
-        concatenation of those batches in a single iteration. The number of
-        batches, i.e. the number of iterations for which to accumulate
-        gradients, is given by ``num_iters_to_acc``.
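# Illustrative sketch, not part of the diff: the gradient-accumulation pattern
# this test compares against a single concatenated batch. All but the last
# microbatch run under no_sync(), so gradients accumulate locally; the final
# backward outside the context triggers the usual gradient synchronization.
# The loss computation here is a placeholder.
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def accumulate_and_step(fsdp_model: FSDP, optim, microbatches):
    optim.zero_grad()
    with fsdp_model.no_sync():
        for batch in microbatches[:-1]:
            fsdp_model(batch).sum().backward()     # accumulate, no communication
    fsdp_model(microbatches[-1]).sum().backward()  # gradients synchronized here
    optim.step()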
- - Arguments: - batch_dim (int): Batch dimension in the input tensor to be passed - into the model for the forward pass. - num_iters_to_acc (int): Number of iterations for which to - accumulate gradients; all but the last iteration are run using - the ``no_sync()`` context manager so that gradients are not - synchronized until the final iteration. - cpu_offload (CPUOffload): Configures CPU offloading. - backward_prefetch (Optional[BackwardPrefetch]): Specifies at which - point to prefetch the next layer's full parameters during the - backward pass, if at all. - """ - old_allow_tf32 = torch.backends.cuda.matmul.allow_tf32 - try: - # Disable TF32 to prevent floating point drift - torch.backends.cuda.matmul.allow_tf32 = False - - # Initialize the FSDP model and optimizer - group = dist.distributed_c10d._get_default_group() - fsdp_model: FSDP = self._get_wrapped_model( - group, cuda_first=False, add_bn=False, - cpu_offload=cpu_offload, backward_prefetch=backward_prefetch, - ) # disable BN since the test uses varying batch sizes - fsdp_model.eval() # disable dropout - device = torch.device("cuda") - optim = torch.optim.SGD(fsdp_model.parameters(), lr=0.01, momentum=0.9) - - # Generate the sequence of batches, each containing the same data but - # permuted - def permute_tensor(x: torch.Tensor): - return x.view(-1)[torch.randperm(x.numel())].view_as(x) - - batch: Tuple[torch.Tensor, ...] = fsdp_model.module.get_input(device) - batches: List[Tuple[torch.Tensor, ...]] = [batch] - for _ in range(num_iters_to_acc - 1): - batches.append(tuple(permute_tensor(t) for t in batch)) - for (batch1, batch2) in itertools.combinations(batches, r=2): - for t1, t2 in zip(batch1, batch2): - assert not torch.all(t1 == t2) - - # Concatenate the batches along the given batch dimension - concat_batch: Tuple[torch.Tensor, ...] 
= tuple( - torch.cat(ts, dim=batch_dim) for ts in zip(*batches) - ) - - # Establish reference gradients using the concatenated batch - fsdp_model.zero_grad() - output = fsdp_model(*concat_batch) - ref_loss = fsdp_model.module.get_loss(concat_batch, output) - ref_loss.backward() - ref_grads = [p.grad.detach().clone() for p in fsdp_model.parameters()] - - # Compute the gradients by accumulating via `no_sync()` - fsdp_model.zero_grad() - losses = [] - with fsdp_model.no_sync(): - for batch in batches[:-1]: # accumulate for all but the last batch - output = fsdp_model(*batch) - loss = fsdp_model.module.get_loss(batch, output) - loss.backward() - losses.append(loss) - output = fsdp_model(*batches[-1]) - loss = fsdp_model.module.get_loss(batches[-1], output) - loss.backward() - losses.append(loss) - acc_loss = sum(losses) - acc_grads = [p.grad.detach().clone() for p in fsdp_model.parameters()] - - # Compare the losses and gradients - torch.testing.assert_allclose(ref_loss, acc_loss) - assert len(ref_grads) == len(acc_grads) - for ref_grad, acc_grad in zip(ref_grads, acc_grads): - assert ref_grad.device == acc_grad.device - assert ref_grad.size() == acc_grad.size() - assert ref_grad.dtype == acc_grad.dtype - torch.testing.assert_allclose(ref_grad, acc_grad) - - # Check that the optimizer step does not error - optim.step() - finally: - torch.backends.cuda.matmul.allow_tf32 = old_allow_tf32 - - @skip_if_lt_x_gpu(2) - @parametrize( - "num_iters_to_acc", - [2, 4], - ) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=False), CPUOffload(offload_params=True)], - ) - @parametrize( - "backward_prefetch", - [BackwardPrefetch.BACKWARD_PRE, BackwardPrefetch.BACKWARD_POST, None] - ) - def test_no_sync( - self, - num_iters_to_acc: int, - cpu_offload: CPUOffload, - backward_prefetch: Optional[BackwardPrefetch], - ): - """Tests the ``no_sync()`` context manager.""" - assert num_iters_to_acc >= 2, \ - "Accumulate for at least 2 iterations to be nontrivial" - self._test_no_sync( - batch_dim=1, - num_iters_to_acc=num_iters_to_acc, - cpu_offload=cpu_offload, - backward_prefetch=backward_prefetch, - ) - - -instantiate_parametrized_tests(TestNoSync) - -if __name__ == "__main__": - run_tests() diff --git a/test/distributed/fsdp/test_fsdp_optim_state.py b/test/distributed/fsdp/test_fsdp_optim_state.py new file mode 100644 index 00000000000000..cfe22062d356e5 --- /dev/null +++ b/test/distributed/fsdp/test_fsdp_optim_state.py @@ -0,0 +1,591 @@ +# Owner(s): ["oncall: distributed"] + +import sys +from typing import Any, Dict, List, Type + +import torch +from torch import distributed as dist +from torch.distributed.fsdp import FullyShardedDataParallel as FSDP +from torch.distributed.fsdp.fully_sharded_data_parallel import ( + OptimStateKeyType, +) +from torch.testing._internal.common_distributed import skip_if_lt_x_gpu +from torch.testing._internal.common_fsdp import FSDPTest +from torch.testing._internal.common_utils import ( + TEST_WITH_DEV_DBG_ASAN, + instantiate_parametrized_tests, + parametrize, + run_tests, +) + +if not dist.is_available(): + print("Distributed not available, skipping tests", file=sys.stderr) + sys.exit(0) + +if TEST_WITH_DEV_DBG_ASAN: + print( + "Skip dev-asan as torch + multiprocessing spawn have known issues", + file=sys.stderr, + ) + sys.exit(0) + + +class Bias(torch.nn.Module): + """This module applies a 1D additive bias with dimension ``dim``.""" + def __init__(self, dim: int) -> None: + super().__init__() + assert dim > 0 + torch.manual_seed(0) + self.bias = 
torch.nn.Parameter(torch.randn((dim,))) + + def forward(self, x): + return x + self.bias + + +class BlockA(torch.nn.Module): + """ + Used to define interesting nested structure for FSDP wrapping. + BlockA + Bias0 + bias + weight + Bias1 + bias + """ + def __init__(self, in_dim: int, out_dim: int) -> None: + super().__init__() + assert all(v > 0 for v in (in_dim, out_dim)) + torch.manual_seed(0) + self.bias_module0 = Bias(out_dim) + self.weight = torch.nn.Parameter(torch.randn((in_dim, out_dim))) + self.bias_module1 = Bias(out_dim) + self.relu = torch.nn.ReLU() + + def forward(self, x): + x = x @ self.weight + x = self.bias_module0(x) + x = self.relu(x) # ensure biases have different gradients + x = self.bias_module1(x) + return x + +class BlockB(torch.nn.Module): + """ + Used to define interesting nested structure for FSDP wrapping. + BlockB + weight + Bias + bias + Bias + bias + """ + def __init__(self, in_dim: int, out_dim: int) -> None: + super().__init__() + assert all(v > 0 for v in (in_dim, out_dim)) + torch.manual_seed(0) + self.weight = torch.nn.Parameter(torch.randn((in_dim, out_dim))) + self.bias_module0 = Bias(out_dim) + self.bias_module1 = Bias(out_dim) + self.relu = torch.nn.ReLU() + + def forward(self, x): + x = x @ self.weight + x = self.bias_module0(x) + x = self.relu(x) # ensure biases have different gradients + x = self.bias_module1(x) + return x + + +class NestedModel(torch.nn.Module): + def __init__(self) -> None: + super().__init__() + self.block0 = BlockB(5, 7) + self.block1 = BlockB(7, 7) + self.bias = torch.nn.Parameter(torch.randn((5,))) + self.block2 = torch.nn.Sequential( + BlockA(7, 9), + BlockA(9, 9), + BlockB(9, 5), + ) + self.relu = torch.nn.ReLU() + + def forward(self, x) -> torch.Tensor: + x = self.relu(self.block0(x)) + x = self.relu(self.block1(x)) + x = self.relu(self.block2(x)) + x = x + self.bias + return x + + def get_input(self, device): + BATCH_SIZE = 8 + return (torch.randn((BATCH_SIZE, 5)).to(device),) + + def get_loss(self, inp, output): + return output.sum() + + def run_backward(self, loss): + loss.backward() + + @staticmethod + def wrap(model, group=None) -> torch.nn.Module: + # Flatten Bias0; then flatten weight and Bias1 together into `block1` + model.block1.bias_module0 = FSDP( + model.block1.bias_module0, process_group=group, + ) + model.block1 = FSDP(model.block1, process_group=group) + # Flatten Bias0; flatten Bias1; then flatten weight into `block2[1]` + model.block2[1].bias_module0 = FSDP( + model.block2[1].bias_module0, process_group=group, + ) + model.block2[1].bias_module1 = FSDP( + model.block2[1].bias_module1, process_group=group, + ) + model.block2[1] = FSDP(model.block2[1], process_group=group) + # Flatten weight, Bias, bias into `block2[2]` + model.block2[2] = FSDP(model.block2[2], process_group=group) + return model + + @staticmethod + def wrap_alt(model, group=None) -> torch.nn.Module: + model.block0.bias_module0 = FSDP( + model.block0.bias_module0, process_group=group, + ) + model.block0 = FSDP(model.block0, process_group=group) + return model + + # NOTE: We exclude `self.bias` from either parameter group to test the + # case where the optimizer input does not include all model parameters + def param_group0(self) -> List[torch.nn.Parameter]: + # Use `block1`'s parameters for the first parameter group to deviate + # from the `model.parameters()` order + return list(self.block1.parameters()) + + def param_group1(self) -> List[torch.nn.Parameter]: + # Deviate from the `model.parameters()` order further by rearranging + # 
`block2`'s parameters to be before `block0`'s parameters + return list(self.block2.parameters()) + \ + list(self.block0.parameters()) + + +class TestFSDPOptimState(FSDPTest): + def _init_nested_model( + self, + wrap: bool, + wrap_alt: bool = False, # ignored if `wrap=False` + device: torch.device = torch.device("cuda"), + group=None, + optim_class: Type[torch.optim.Optimizer] = torch.optim.Adam, + use_multiple_param_groups: bool = False, + ): + model = NestedModel().to(device) + if wrap: + model = NestedModel.wrap_alt(model, group) if wrap_alt \ + else NestedModel.wrap(model, group) + if not use_multiple_param_groups: + optim_input = list(model.parameters()) + else: + optim_input = [ + {"params": model.param_group0()}, + {"params": model.param_group1(), "weight_decay": 0.9} + ] + optim = optim_class(optim_input, lr=0.01) + return model, optim, optim_input + + def _init_transformer_model( + self, + wrap: bool, + device: torch.device = torch.device("cuda"), + group=None, + optim_class: Type[torch.optim.Optimizer] = torch.optim.Adam, + use_multiple_param_groups: bool = False, + ): + assert not use_multiple_param_groups, \ + "Multiple parameter groups for the transformer is not implemented" + if group is None: + group = dist.distributed_c10d._get_default_group() + model = self._get_wrapped_model(group=group).to(device) if wrap \ + else self._get_nonwrapped_model(group=group).to(device) + model.eval() # disable dropout for determinism + optim = optim_class(model.parameters(), lr=0.01) + return model, optim, None + + def _step_model( + self, + model: torch.nn.Module, + optim: torch.optim.Optimizer, + device: torch.device = torch.device("cuda"), + num_iters: int = 1, + ) -> List[float]: + """Performs a forward pass, backward pass, and optimizer step + ``num_iters``-many times, and returns the per-iteration losses.""" + torch.manual_seed(0) # set seed for determinism + losses = [] + module = model.module if hasattr(model, "module") else model + for _ in range(num_iters): + inp = module.get_input(device) + output = model(*inp) + loss = module.get_loss(inp, output).to(device) + losses.append(loss.item()) + module.run_backward(loss) + optim.step() + return losses + + def _broadcast_full_osd(self, full_osd: Dict[str, Any], group=None): + """Broadcasts the full optimizer state dict in place of using + ``torch.save()`` and ``torch.load()`` so that all ranks can have it.""" + obj_list = [full_osd] + dist.broadcast_object_list( + obj_list, src=0, group=group, + ) + full_osd = obj_list[0] + return full_osd + + def _are_equal_states( + self, + state1: Dict[str, Any], + state2: Dict[str, Any], + ) -> bool: + """Checks if ``state1`` and ``state2`` contain the same mappings.""" + if set(state1.keys()) != set(state2.keys()): + return False + for state_name, value1 in state1.items(): + value2 = state2[state_name] + if type(value1) != type(value2): + return False + if torch.is_tensor(value1): # tensor state + assert torch.is_tensor(value2) + # Check the values on CPU to be device-agnostic + value1 = value1.cpu() + value2 = value2.cpu() + if value1.shape != value2.shape or \ + not torch.all(torch.isclose(value1, value2)): + return False + else: # non-tensor state + if value1 != value2: + return False + return True + + def _check_same_state( + self, + full_osd, + ref_osd, + check_same_param_keys: bool, + ): + """Checks that ``full_osd`` and ``ref_osd`` have the same "state" part. + If ``check_same_param_keys=True``, then checks that the parameter keys + match (e.g. 
when both should be parameter names), and does not check + the parameter keys otherwise.""" + assert "state" in ref_osd + self.assertTrue("state" in full_osd) + ref_osd_state = ref_osd["state"] + full_osd_state = full_osd["state"] + if check_same_param_keys: + # Check parameter keys are the same + ref_osd_param_ids = set(ref_osd_state.keys()) + full_osd_param_ids = set(full_osd_state.keys()) + self.assertTrue(ref_osd_param_ids == full_osd_param_ids) + for param_id, param_state in full_osd_state.items(): + for state_name, value in param_state.items(): + ref_value = ref_osd_state[param_id][state_name] + self.assertEqual(value, ref_value) + return + # Otherwise, only require the parameter keys to be isomorphic (e.g. + # between IDs and names) + ref_osd_states = list(ref_osd["state"].values()) + full_osd_states = list(full_osd["state"].values()) + assert len(ref_osd_states) == len(full_osd_states) + # Use brute-force quadratic-time comparison since it is hard to + # hash a tensor by value instead of by object + for full_osd_state in full_osd_states: + # Check for at least one match (may be > 1 in toy edge cases, e.g. + # multiple biases); nonetheless, each having >= 1 match and the two + # lists having equal length imply that the list contents are equal + self.assertTrue(any( + self._are_equal_states(full_osd_state, ref_osd_state) + for ref_osd_state in ref_osd_states + )) + + def _check_same_param_groups( + self, + full_osd, + ref_osd, + check_same_param_keys: bool, + ): + """Checks that ``full_osd`` and ``ref_osd`` have the same + "param_groups" part. If ``check_same_param_keys=True`, then checks that + the parameter keys match (e.g. when both should be parameter names), + and does not check the parameter keys otherwise.""" + assert "param_groups" in ref_osd + self.assertTrue("param_groups" in full_osd) + ref_osd_param_groups = ref_osd["param_groups"] + full_osd_param_groups = full_osd["param_groups"] + self.assertTrue(len(full_osd_param_groups), len(ref_osd_param_groups)) + if self.rank == 0: + for full_osd_pg, ref_osd_pg in zip( + full_osd_param_groups, ref_osd_param_groups, + ): + self.assertEqual( + set(full_osd_pg.keys()), set(ref_osd_pg.keys()), + ) + for name, full_osd_value in full_osd_pg.items(): + if name == "params" and not check_same_param_keys: + continue + self.assertEqual(full_osd_value, ref_osd_pg[name]) + + @skip_if_lt_x_gpu(2) + @parametrize("use_multiple_param_groups", [False, True]) + @parametrize("rank0_only", [False, True]) + def test_full_optim_state_dict_nested( + self, + use_multiple_param_groups: bool, + rank0_only: bool, + ) -> None: + """ + Tests :meth:`full_optim_state_dict` by comparing the returned dict for + an FSDP-wrapped model with that of an equivalent non-wrapped model. + + The parameter groups in the "param_groups" part and the values in the + "state" part should be the same, but the parameter keys may be + different (e.g. the full optimizer state dict uses parameter names + while the non-wrapped equivalent uses parameter IDs). 
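# Illustrative sketch, not part of the diff: the save side of the flow this test
# exercises. full_optim_state_dict() consolidates the sharded optimizer state
# into a single name-keyed dict; the checkpoint path is a placeholder.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def save_optim_checkpoint(model: FSDP, optim: torch.optim.Optimizer, rank: int) -> None:
    full_osd = FSDP.full_optim_state_dict(model, optim)  # gathered across ranks
    if rank == 0:
        torch.save(full_osd, "optim_ckpt.pt")  # placeholder path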
+ """ + NUM_ITERS = 3 + model1, optim1, optim_input = self._init_nested_model( + wrap=True, use_multiple_param_groups=use_multiple_param_groups, + ) + losses1 = self._step_model(model1, optim1, num_iters=NUM_ITERS) + full_osd = FSDP.full_optim_state_dict( + model1, optim1, optim_input, rank0_only=rank0_only, + ) + # Non-target ranks get an empty state dict + if rank0_only and self.rank != 0: + self.assertEqual(len(full_osd), 0) + return + model2, optim2, _ = self._init_nested_model( + wrap=False, use_multiple_param_groups=use_multiple_param_groups, + ) + losses2 = self._step_model(model2, optim2, num_iters=NUM_ITERS) + ref_osd = optim2.state_dict() + # Check the losses to eliminate model drift as a source of error + for i, (l1, l2) in enumerate(zip(losses1, losses2)): + assert l1 == l2, f"Losses differ on iter {i}: {l1:.5f} {l2:.5f}" + # Do not check the parameter keys since the full optimizer state dict + # uses parameter names, while the non-wrapped equivalent uses parameter + # IDs + check_same_param_keys = False + self._check_same_param_groups( + full_osd, ref_osd, check_same_param_keys=check_same_param_keys, + ) + self._check_same_state( + full_osd, ref_osd, check_same_param_keys=check_same_param_keys, + ) + + # Require 4 GPUs since we test halving the world size + @skip_if_lt_x_gpu(4) + @parametrize("use_multiple_param_groups", [False, True]) + @parametrize("wrap_alt", [False, True]) + @parametrize("halve_world_size", [False, True]) + def test_shard_full_optim_state_dict_nested( + self, + use_multiple_param_groups: bool, + wrap_alt: bool, + halve_world_size: bool, + ): + """Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model + with nested FSDP instances.""" + self._test_shard_full_optim_state( + model_class="nested", + use_multiple_param_groups=use_multiple_param_groups, + halve_world_size=halve_world_size, + wrap_alt=wrap_alt, + ) + + # Require 4 GPUs since we test halving the world size + @skip_if_lt_x_gpu(4) + def test_shard_full_optim_state_dict_transformer(self) -> None: + """Tests :meth:`shard_full_optim_state_dict` for an FSDP-root + transformer model with shared parameters.""" + self._test_shard_full_optim_state( + model_class="transformer", use_multiple_param_groups=False, + halve_world_size=True, + ) + + def _test_shard_full_optim_state( + self, + model_class: str, + use_multiple_param_groups: bool, + halve_world_size: bool, + **new_model_kwargs, + ): + """ + (1) Runs a model with full world size for K iterations to generate a + full optimizer state dict; + (2) initializes a model with halved world size and possibly different + FSDP wrapping scheme (based on ``new_model_kwargs``); + (3) shards the full optimizer state dict from (1) according to the + halved-world-size model; + (4) runs the halved-world-size model for K iterations; and + (5) checks that the sharded optimizer state dict from (3) matches the + halved-world-size model's local optimizer state dict, meaning that the + former could have equivalently been loaded into the local optimizer. 
+ """ + NUM_ITERS = 3 + initializer = self._init_nested_model if model_class == "nested" \ + else self._init_transformer_model if model_class == "transformer" \ + else None + assert initializer is not None, f"Unsupported model: {model_class}" + # Run a wrapped model with full world size for a few iterations + model1, optim1, optim_input1 = initializer( + wrap=True, use_multiple_param_groups=use_multiple_param_groups, + ) + self._step_model(model1, optim1, num_iters=NUM_ITERS) + full_osd1 = FSDP.full_optim_state_dict(model1, optim1, optim_input1) + # Broadcast instead of `torch.save()`/`torch.load()` so that all ranks + # have the full state dict + full_osd1 = self._broadcast_full_osd(full_osd1) + if halve_world_size: + # Create a new process group with halved world size + new_group_ranks = [r for r in range(self.world_size) if r % 2 == 0] + new_group = dist.new_group(ranks=new_group_ranks) + if self.rank not in new_group_ranks: + return + else: + new_group = dist.distributed_c10d._get_default_group() + # Run a wrapped model with halved world size (from scratch) + model2, optim2, optim_input2 = initializer( + wrap=True, group=new_group, + use_multiple_param_groups=use_multiple_param_groups, + **new_model_kwargs, # specify `wrap_alt` to change wrapping + ) + self._step_model(model2, optim2, num_iters=NUM_ITERS) + full_osd2 = FSDP.full_optim_state_dict(model2, optim2, optim_input2) + full_osd2 = self._broadcast_full_osd(full_osd2, group=new_group) + # As a sanity check, check that sharding the halved-world-size model's + # full optimizer state dict according to itself is equivalent to its + # local optimizer's state dict + local_osd2 = optim2.state_dict() + sharded_osd2 = FSDP.shard_full_optim_state_dict( + full_osd2, model2, optim_input2, + ) + check_same_param_keys = True # should all have matching parameter IDs + self._check_same_param_groups( + sharded_osd2, local_osd2, + check_same_param_keys=check_same_param_keys, + ) + self._check_same_state( + sharded_osd2, local_osd2, + check_same_param_keys=check_same_param_keys, + ) + # Check that sharding the full-world-size model's full optimizer state + # dict according to the halved-world-size model is equivalent to the + # halved-world-size model's local optimizer state dict + sharded_osd1 = FSDP.shard_full_optim_state_dict( + full_osd1, model2, optim_input2, + ) + self._check_same_param_groups( + sharded_osd1, local_osd2, + check_same_param_keys=check_same_param_keys, + ) + self._check_same_state( + sharded_osd1, local_osd2, + check_same_param_keys=check_same_param_keys, + ) + # As a sanity check, check that we can load and run a few iterations + optim2.load_state_dict(sharded_osd1) + self._step_model(model2, optim2, num_iters=NUM_ITERS) + + @skip_if_lt_x_gpu(2) + @parametrize("use_multiple_param_groups", [False, True]) + def test_rekey_optim_state_dict_to_ids( + self, + use_multiple_param_groups: bool, + ): + """Tests :meth:`rekey_optim_state_dict` with the new keys being + parameter IDs by checking that a wrapped model (i.e. with FSDP modules) + can rekey its optimizer state dict to match that of an equivalent + non-wrapped model (i.e. 
without FSDP modules).""" + NUM_ITERS = 3 + # Run a wrapped model for a few iterations + model1, optim1, optim_input1 = self._init_nested_model( + wrap=True, use_multiple_param_groups=use_multiple_param_groups, + ) + self._step_model(model1, optim1, num_iters=NUM_ITERS) + full_osd = FSDP.full_optim_state_dict(model1, optim1, optim_input1) + # Broadcast instead of `torch.save()`/`torch.load()` so that all ranks + # have the full state dict + full_osd = self._broadcast_full_osd(full_osd) + # Run a non-wrapped model for a few iterations + model2, optim2, optim_input2 = self._init_nested_model( + wrap=False, use_multiple_param_groups=use_multiple_param_groups, + ) + self._step_model(model2, optim2, num_iters=NUM_ITERS) + # Re-key the wrapped model's optimizer state dict using parameter IDs + # according to the non-wrapped model + rekeyed_osd = FSDP.rekey_optim_state_dict( + full_osd, OptimStateKeyType.PARAM_ID, model2, optim_input2, + ) + # Check that the re-keyed dict and actual dict are the same + osd = optim2.state_dict() + check_same_param_keys = True + self._check_same_param_groups( + rekeyed_osd, osd, check_same_param_keys=check_same_param_keys, + ) + self._check_same_state( + rekeyed_osd, osd, check_same_param_keys=check_same_param_keys, + ) + # As a sanity check, check that we can load and run a few iterations + optim2.load_state_dict(rekeyed_osd) + self._step_model(model2, optim2, num_iters=NUM_ITERS) + + @skip_if_lt_x_gpu(2) + @parametrize("use_multiple_param_groups", [False]) + def test_rekey_optim_state_dict_to_names( + self, + use_multiple_param_groups: bool, + ): + """Tests :meth:`rekey_optim_state_dict` with the new keys being + parameter names by checking that a non-wrapped model (i.e. without FSDP + modules) can rekey its optimizer state dict to match the expected + output of :meth:`full_optim_state_dict`, hence be sharded using + :meth:`shard_full_optim_state_dict`, and finally match the per-rank + optimizer state dict of a wrapped model (i.e. 
with FSDP modules).""" + NUM_ITERS = 3 + # Run a wrapped model for a few iterations + model1, optim1, optim_input1 = self._init_nested_model( + wrap=True, use_multiple_param_groups=use_multiple_param_groups, + ) + self._step_model(model1, optim1, num_iters=NUM_ITERS) + # Run a non-wrapped model for a few iterations + model2, optim2, optim_input2 = self._init_nested_model( + wrap=False, use_multiple_param_groups=use_multiple_param_groups, + ) + self._step_model(model2, optim2, num_iters=NUM_ITERS) + # Re-key the non-wrapped model's optimizer state dict using parameter + # names (still according to itself) + osd2 = optim2.state_dict() + rekeyed_osd = FSDP.rekey_optim_state_dict( + osd2, OptimStateKeyType.PARAM_NAME, model2, optim_input2, + ) + # Shard the non-wrapped model's re-keyed optimizer state dict, which + # maps back to (flattened) parameter IDs + sharded_osd = FSDP.shard_full_optim_state_dict( + rekeyed_osd, model1, optim_input1, + ) + # Check that this sharded optimizer state dict matches the wrapped + # model's per-rank optimizer state dict + osd1 = optim1.state_dict() + check_same_param_keys = True + self._check_same_param_groups( + sharded_osd, osd1, check_same_param_keys=check_same_param_keys, + ) + self._check_same_state( + sharded_osd, osd1, check_same_param_keys=check_same_param_keys, + ) + # As a sanity check, check that we can load and run a few iterations + optim1.load_state_dict(sharded_osd) + self._step_model(model1, optim1, num_iters=NUM_ITERS) + + +instantiate_parametrized_tests(TestFSDPOptimState) + +if __name__ == "__main__": + run_tests() diff --git a/test/distributed/fsdp/test_fsdp_state_dict.py b/test/distributed/fsdp/test_fsdp_state_dict.py index 86734a1c794754..bd854155620b2e 100644 --- a/test/distributed/fsdp/test_fsdp_state_dict.py +++ b/test/distributed/fsdp/test_fsdp_state_dict.py @@ -1,6 +1,7 @@ # Owner(s): ["oncall: distributed"] import sys +from contextlib import suppress from copy import deepcopy from functools import partial from typing import Any, Dict @@ -10,8 +11,10 @@ from torch.distributed.fsdp import ( FullyShardedDataParallel as FSDP, StateDictType, - CPUOffload + CPUOffload, + MixedPrecision, ) +from torch.distributed.fsdp.wrap import enable_wrap, wrap from torch.nn import Linear, Module import torch.nn as nn from torch.nn.parallel import DistributedDataParallel @@ -21,8 +24,9 @@ FSDPTest, get_full_params, _get_full_detached_param, - _zero_model, _get_state_dict, + SkipModel, + _zero_model, ) from torch.testing._internal.common_utils import ( instantiate_parametrized_tests, @@ -78,8 +82,8 @@ def world_size(self): def _get_simple_nested_model(self, *fsdp_args, **fsdp_kwargs): model = FSDP( nn.Sequential( - FSDP(nn.Linear(10, 10, bias=False), *fsdp_args, **fsdp_kwargs), - nn.Linear(10, 10, bias=False), + FSDP(nn.Linear(10, 10, bias=False).cuda(), *fsdp_args, **fsdp_kwargs), + nn.Linear(10, 10, bias=False).cuda(), ), *fsdp_args, **fsdp_kwargs, @@ -87,7 +91,7 @@ def _get_simple_nested_model(self, *fsdp_args, **fsdp_kwargs): return model def _get_simple_model(self, *fsdp_args, **fsdp_kwargs): - model = FSDP(nn.Linear(10, 10, bias=False), *fsdp_args, **fsdp_kwargs) + model = FSDP(nn.Linear(10, 10, bias=False).cuda(), *fsdp_args, **fsdp_kwargs) return model @skip_if_lt_x_gpu(2) @@ -139,20 +143,24 @@ def test_basic_save_and_load_state_dict(self, cpu_offload, fp16): self.assertEqual(tensor.dtype, torch.float16) @skip_if_lt_x_gpu(2) - def test_save_and_load_after_forward_state_dict(self): + @parametrize("mixed_precision", [True, False]) + def 
test_save_and_load_after_forward_state_dict(self, mixed_precision): """ Test that saving after some training results in params being updated as expected. """ torch.cuda.set_device(self.rank) - model = self._get_wrapped_model(group=torch.distributed.distributed_c10d._get_default_group()) + mixed_precision = MixedPrecision() if mixed_precision else None + model = self._get_simple_nested_model(mixed_precision=mixed_precision) optim = torch.optim.SGD(model.parameters(), lr=0.1) initial_params = _get_full_detached_param(model) for _ in range(6): - inp = model.module.get_input(torch.device("cuda")) + inp = torch.randn(1, 10, device=torch.cuda.current_device()) output = model(*inp) - loss = model.module.get_loss(inp, output).cuda() - model.module.run_backward(loss) + loss = output.sum() + expected_dtype = torch.float32 if mixed_precision is None else torch.float16 + self.assertEqual(expected_dtype, loss.dtype) + loss.backward() optim.step() trained_params = _get_full_detached_param(model) @@ -162,6 +170,10 @@ def test_save_and_load_after_forward_state_dict(self): state_dict = {k: v.clone() for k, v in model.state_dict().items()} _zero_model(model) + # Ensure checkpointed params have the full param dtype + for tensor in state_dict.values(): + self.assertEqual(tensor.dtype, torch.float32) + # Load state_dict into zeroed model model.load_state_dict(state_dict) loaded_params = _get_full_detached_param(model) @@ -185,7 +197,7 @@ def _state_dict(model: Module, state_dict_type: str): except KeyError: raise ValueError(f"No state_dict type for {state_dict_type}") - with model.state_dict_type(enum_val): + with FSDP.state_dict_type(model, enum_val): return model.state_dict() @staticmethod @@ -197,7 +209,7 @@ def _load_state_dict( except KeyError: raise ValueError(f"No state_dict for {state_dict_type}") - with model.state_dict_type(enum_val): + with FSDP.state_dict_type(model, enum_val): return model.load_state_dict(state_dict) def _dist_train(self, wrap_fsdp: bool, state_dict_type: str = ""): @@ -274,6 +286,70 @@ def test_state_dict_load_into_local_module(self): for fsdp_param, local_param in zip(fsdp_params, local_params): self.assertEqual(fsdp_param, local_param) + @skip_if_lt_x_gpu(2) + @parametrize("double_nest", [True]) + def test_state_dict_skip_module(self, double_nest): + torch.cuda.set_device(self.rank) + + def _create_module(wrap_fsdp=True): + LINEAR_SKIP = "linear_skip" + ctx = enable_wrap(wrapper_cls=FSDP) if wrap_fsdp else suppress() + with ctx: + module = SkipModel(double_nest=double_nest) + # Full name of linear_skip param tensors in SkipModel, as would be + # stored in checkpoint. + linear_skip_tensor_names = [ + k for k in dict(module.named_parameters()).keys() + if LINEAR_SKIP in k + ] + # skip SkipModule + linear_skip = getattr(module, LINEAR_SKIP) + delattr(module, LINEAR_SKIP) + # Wrap FSDP + fsdp = wrap(module) + # reattach + setattr(module, LINEAR_SKIP, linear_skip) + return fsdp, linear_skip_tensor_names + + fsdp, linear_skip_tensor_names = _create_module() + # Run a forward pass + inp = torch.randn((1, 10), device=torch.cuda.current_device()) + loss = fsdp(inp) + loss.sum().backward() + + state_dict = fsdp.state_dict() + if self.rank == 0: + sd_keys = list(state_dict.keys()) + expected = list(SkipModel(double_nest=False).state_dict().keys()) + self.assertEqual(sorted(sd_keys), sorted(expected)) + # TODO: parameters in linear_skip_tensor_names should not be handled + # by FSDP.state_dict(). Have a check once this is implemented in + # FSDP.state_dict(). 
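# Illustrative sketch, not part of the diff: the save/zero/load round trip these
# state_dict tests revolve around, using the static FSDP.state_dict_type()
# context manager that the hunks above switch to.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

def state_dict_roundtrip(model: FSDP) -> None:
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
        state = {k: v.clone() for k, v in model.state_dict().items()}
    with torch.no_grad():
        for p in model.parameters():
            p.zero_()
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
        model.load_state_dict(state)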
+ + # Check that it can be loaded into FSDP. + new_fsdp, _ = _create_module() + _zero_model(new_fsdp) + for (p1, p2) in zip(fsdp.parameters(), new_fsdp.parameters()): + self.assertNotEqual(p1, p2) + new_fsdp.load_state_dict(deepcopy(state_dict)) + for (p1, p2) in zip(fsdp.parameters(), new_fsdp.parameters()): + self.assertEqual(p1, p2) + + # Test that the checkpoint can be loaded into a local model. + local, _ = _create_module(wrap_fsdp=False) + for param in local.parameters(): + with torch.no_grad(): + param.zero_() + + with fsdp.summon_full_params(): + for (p1, p2) in zip(fsdp.parameters(), local.parameters()): + self.assertNotEqual(p1, p2) + + local.load_state_dict(deepcopy(state_dict)) + with fsdp.summon_full_params(): + for (p1, p2) in zip(fsdp.parameters(), local.parameters()): + self.assertEqual(p1, p2) + instantiate_parametrized_tests(TestFSDPStateDict) diff --git a/test/distributed/fsdp/test_fsdp_summon_full_params.py b/test/distributed/fsdp/test_fsdp_summon_full_params.py index f0632e64cf4bab..42ad9354ba3b68 100644 --- a/test/distributed/fsdp/test_fsdp_summon_full_params.py +++ b/test/distributed/fsdp/test_fsdp_summon_full_params.py @@ -7,8 +7,9 @@ import torch import torch.nn as nn from torch import distributed as dist -from torch.distributed.fsdp import CPUOffload +from torch.distributed.fsdp import CPUOffload, MixedPrecision from torch.distributed.fsdp import FlatParameter +from torch.distributed.fsdp.wrap import wrap, enable_wrap from torch.distributed.fsdp import FullyShardedDataParallel as FSDP from torch.testing._internal.common_distributed import skip_if_lt_x_gpu from torch.testing._internal.common_fsdp import ( @@ -37,10 +38,12 @@ sys.exit(0) -def _run_test_summon_full_param_writeback(cls, writeback, cpu_offload, modify_outer): - model = FSDP( - nn.Sequential(FSDP(nn.Linear(5, 5, bias=False)), nn.Linear(5, 3, bias=False)) - ).cuda(cls.rank) +def _run_test_summon_full_param_writeback(cls, writeback, modify_outer, *fsdp_args, **fsdp_kwargs): + with enable_wrap(wrapper_cls=FSDP, *fsdp_args, **fsdp_kwargs): + lin1 = wrap(nn.Linear(5, 5, bias=False).cuda(cls.rank)) + lin2 = nn.Linear(5, 3, bias=False).cuda(cls.rank) + model = wrap(nn.Sequential(lin1, lin2)) + # set the value outer_param = model.get_parameter("_fsdp_wrapped_module.flat_param") @@ -72,17 +75,19 @@ def world_size(self): @skip_if_lt_x_gpu(2) @parametrize("writeback", [True, False]) - @parametrize( - "cpu_offload", - [CPUOffload(offload_params=True), CPUOffload(offload_params=False)], - ) @parametrize("modify_outer", [True, False]) - def test_summon_full_param_writeback(self, writeback, cpu_offload, modify_outer): + @parametrize("mixed_precision", [True, False]) + # TODO: CPUOffload summon + writeback does not + # work when param is not sharded + # (currently when world_size == 1) + def test_summon_full_param_writeback(self, writeback, modify_outer, mixed_precision): + mixed_precision = MixedPrecision() if mixed_precision else None return _run_test_summon_full_param_writeback( self, writeback, - cpu_offload, - modify_outer, + modify_outer=modify_outer, + cpu_offload=CPUOffload(offload_params=False), + mixed_precision=mixed_precision, ) @@ -104,20 +109,27 @@ def get_expected_sharded_size(self, global_size): "cpu_offload", [CPUOffload(offload_params=True), CPUOffload(offload_params=False)], ) + @parametrize("mixed_precision", [True, False]) @parametrize("modify_outer", [True, False]) - def test_summon_full_param_writeback(self, writeback, cpu_offload, modify_outer): + def test_summon_full_param_writeback(self, 
writeback, cpu_offload, mixed_precision, modify_outer): + mixed_precision = MixedPrecision() if mixed_precision else None return _run_test_summon_full_param_writeback( - self, writeback, cpu_offload, modify_outer + self, + writeback, + modify_outer, + cpu_offload=cpu_offload, + mixed_precision=mixed_precision, ) @skip_if_lt_x_gpu(2) - def test_summon_full_param_shard_value(self): - + @parametrize("mixed_precision", [True, False]) + def test_summon_full_param_shard_value(self, mixed_precision): + mixed_precision = MixedPrecision() if mixed_precision else None raw_model = nn.Linear(10, 11) raw_model_size = self.get_model_param_count(raw_model) expected_shard_size = self.get_expected_sharded_size(raw_model_size) - model = FSDP(raw_model.cuda(self.rank)) + model = FSDP(raw_model.cuda(self.rank), mixed_precision=mixed_precision) self.assertEqual(expected_shard_size, self.get_model_param_count(model)) # we're assuming a single flatenned param @@ -140,11 +152,15 @@ def test_summon_full_param_shard_value(self): @skip_if_lt_x_gpu(2) @parametrize("recurse", [True, False]) @parametrize("summon_outer", [True, False]) - def test_summon_full_param_recursive(self, recurse, summon_outer): + @parametrize("mixed_precision", [True, False]) + def test_summon_full_param_recursive(self, recurse, summon_outer, mixed_precision): + mixed_precision = MixedPrecision() if mixed_precision else None model = FSDP( nn.Sequential( - FSDP(nn.Linear(5, 5, bias=False)), nn.Linear(5, 3, bias=False) - ) + FSDP(nn.Linear(5, 5, bias=False), mixed_precision=mixed_precision), + nn.Linear(5, 3, bias=False) + ), + mixed_precision=mixed_precision, ).cuda(self.rank) global_inner_numel = self.get_model_param_count(nn.Linear(5, 5, bias=False)) @@ -210,11 +226,15 @@ def bad_backwards_hook(tensor): output.backward() @skip_if_lt_x_gpu(2) - def test_summon_full_params_respects_reshard_after_forward(self): + @parametrize("mixed_precision", [True, False]) + def test_summon_full_params_respects_reshard_after_forward(self, mixed_precision): + mixed_precision = MixedPrecision() if mixed_precision else None model = FSDP( nn.Sequential( - FSDP(nn.Linear(5, 5, bias=False)), nn.Linear(5, 3, bias=False) - ) + FSDP(nn.Linear(5, 5, bias=False), mixed_precision=mixed_precision), + nn.Linear(5, 3, bias=False) + ), + mixed_precision=mixed_precision, ).cuda(self.rank) outer_param = model.get_parameter("_fsdp_wrapped_module.flat_param") @@ -225,7 +245,6 @@ def test_summon_full_params_respects_reshard_after_forward(self): # trigger lazy init model(torch.zeros(5).cuda(self.rank)) - # the root FSDP module keeps all params around self.assertEqual( outer_full_param_size, outer_param._full_param_padded.storage().size() @@ -263,7 +282,9 @@ def test_summon_single_param(self): self.assertEqual(self.rank + 2, p[0]) @skip_if_lt_x_gpu(2) - def test_summon_full_params_equivalence(self): + @parametrize("rank0_only", [True, False]) + @parametrize("offload_to_cpu", [True, False]) + def test_summon_full_params_equivalence(self, rank0_only, offload_to_cpu): offload = CPUOffload(offload_params=True) model = FSDP( DeterministicModel(wrap_fsdp=True, cpu_offload=offload), @@ -271,20 +292,34 @@ def test_summon_full_params_equivalence(self): ) local_model = DeterministicModel(wrap_fsdp=False) - with model.summon_full_params(recurse=True): + dev = torch.device("cpu") if offload_to_cpu else torch.device("cuda", torch.cuda.current_device()) + + params_to_compare = ( + [p.clone() for p in model.parameters()] if rank0_only and self.rank != 0 + else list(local_model.parameters()) + ) + 
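# Illustrative sketch, not part of the diff, of the summon_full_params() options
# used just below: rank0_only gathers the full parameters only on rank 0,
# offload_to_cpu keeps the gathered copy on CPU, and writeback (unsupported
# together with rank0_only=True) controls whether edits are written back to the shards.
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

fsdp_model = FSDP(nn.Linear(5, 5).cuda())
with fsdp_model.summon_full_params(
    recurse=True, rank0_only=True, writeback=False, offload_to_cpu=True,
):
    if dist.get_rank() == 0:
        print(fsdp_model.weight.shape)  # full, unflattened parameter view on rank 0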
+ with model.summon_full_params(recurse=True, rank0_only=rank0_only, writeback=not rank0_only, offload_to_cpu=offload_to_cpu): # Below sleep causes failures without stream synchronization in # summon_full_params fix. torch.cuda._sleep(1000000) - fsdp_params = deepcopy(list(model.parameters())) + # FSDP param deepcopy() of params has issues + fsdp_params = [p.clone() for p in model.parameters()] - self.assertEqual(fsdp_params, list(local_model.parameters())) + self.assertEqual(fsdp_params, params_to_compare) @skip_if_lt_x_gpu(2) - def test_reshard_outside_forward_backward_iteration(self): + @parametrize("rank0_only", [True, False]) + @parametrize("offload_to_cpu", [True, False]) + @parametrize("mixed_precision", [True, False]) + def test_reshard_outside_forward_backward_iteration(self, rank0_only, offload_to_cpu, mixed_precision): + mixed_precision = MixedPrecision() if mixed_precision else None model = FSDP( nn.Sequential( - FSDP(nn.Linear(5, 5, bias=False)), nn.Linear(5, 1, bias=False) - ) + FSDP(nn.Linear(5, 5, bias=False), mixed_precision=mixed_precision), + nn.Linear(5, 1, bias=False) + ), + mixed_precision=mixed_precision, ).cuda(self.rank) outer_param = model.get_parameter("_fsdp_wrapped_module.flat_param") @@ -310,7 +345,11 @@ def test_reshard_outside_forward_backward_iteration(self): # now lets repeat it with summon done in between output = model(torch.zeros(5).cuda(self.rank)) - with model.summon_full_params(): + self.assertEqual( + outer_full_param_size, outer_param._full_param_padded.storage().size() + ) + self.assertEqual(0, inner_param._full_param_padded.storage().size()) + with model.summon_full_params(rank0_only=rank0_only, writeback=not rank0_only, offload_to_cpu=offload_to_cpu): pass self.assertEqual( outer_full_param_size, outer_param._full_param_padded.storage().size() @@ -318,43 +357,128 @@ def test_reshard_outside_forward_backward_iteration(self): self.assertEqual(0, inner_param._full_param_padded.storage().size()) output.backward() - with model.summon_full_params(): + with model.summon_full_params(rank0_only=rank0_only, writeback=not rank0_only, offload_to_cpu=offload_to_cpu): pass self.assertEqual(0, outer_param._full_param_padded.storage().size()) self.assertEqual(0, inner_param._full_param_padded.storage().size()) @skip_if_lt_x_gpu(2) - def test_params_are_unflattenned(self): + @parametrize("rank0_only", [True, False]) + @parametrize("offload_to_cpu", [True, False]) + @parametrize("mixed_precision", [True, False]) + def test_params_are_unflattenned(self, rank0_only, offload_to_cpu, mixed_precision): layer_shape = (10, 12) model = nn.Linear(*layer_shape, bias=False).cuda(self.rank) - fsdp_model = FSDP(deepcopy(model)).cuda(self.rank) + mixed_precision = MixedPrecision() if mixed_precision else None + fsdp_model = FSDP(deepcopy(model), mixed_precision=mixed_precision).cuda(self.rank) - flattened_param = fsdp_model.get_parameter("_fsdp_wrapped_module.flat_param") + def _get_flat_param(): + return fsdp_model.get_parameter("_fsdp_wrapped_module.flat_param") + + flattened_param = _get_flat_param() self.assertEqual(layer_shape[0] * layer_shape[1] / 2, flattened_param.numel()) - with fsdp_model.summon_full_params(): - self.assertEqual(fsdp_model.weight.shape, model.weight.shape) + with fsdp_model.summon_full_params(rank0_only=rank0_only, writeback=not rank0_only, offload_to_cpu=offload_to_cpu): + if self.rank == 0 or not rank0_only: + self.assertEqual(fsdp_model.weight.shape, model.weight.shape) + expected_device = ( + torch.device("cpu") if offload_to_cpu else 
torch.device("cuda", torch.cuda.current_device()) + ) + self.assertTrue(expected_device == fsdp_model.weight.device) + else: + # Nonzero rank with rank0_only maintains original params. + flat_within_ctx = _get_flat_param() + self.assertEqual(flat_within_ctx, flattened_param) + self.assertEqual(flat_within_ctx.device, torch.device(torch.cuda.current_device())) + + # CPU offload should restore the param device + param = next(fsdp_model.parameters()) + self.assertTrue(param.device == torch.device("cuda", torch.cuda.current_device())) @skip_if_lt_x_gpu(2) - def test_params_count_and_value(self): + @parametrize("rank0_only", [True, False]) + @parametrize("offload_to_cpu", [True, False]) + @parametrize("mixed_precision", [True, False]) + def test_params_count_and_value(self, rank0_only, offload_to_cpu, mixed_precision): + mixed_precision = MixedPrecision() if mixed_precision else None fsdp_model = FSDP( NestedWrappedModule( group=dist.distributed_c10d._get_default_group(), wrap_fsdp=True, fsdp_init_mode=FSDPInitMode.CUDA_BEFORE, - ) + mixed_precision=mixed_precision, + ), + mixed_precision=mixed_precision, ) model = NestedWrappedModule( group=dist.distributed_c10d._get_default_group(), wrap_fsdp=False, fsdp_init_mode=FSDPInitMode.CUDA_BEFORE, ) - with fsdp_model.summon_full_params(): + + dev = ( + torch.device("cpu") if offload_to_cpu + else torch.device("cuda", torch.cuda.current_device()) + ) + + params_to_compare = ( + [p.to(dev) for p in model.module.parameters()] + if not rank0_only or self.rank == 0 else + list(p.clone() for p in fsdp_model.parameters()) + ) + with fsdp_model.summon_full_params(rank0_only=rank0_only, writeback=not rank0_only): for p1, p2 in itertools.zip_longest( - fsdp_model.parameters(), model.module.parameters() + fsdp_model.parameters(), params_to_compare ): self.assertEqual(p1, p2) + # CPU offload should restore the param device + param = next(fsdp_model.parameters()) + self.assertTrue( + param.device == torch.device("cuda", torch.cuda.current_device()) + ) + + @skip_if_lt_x_gpu(2) + def test_raises_rank0_with_writeback(self): + fsdp_model = FSDP( + NestedWrappedModule( + group=dist.distributed_c10d._get_default_group(), + wrap_fsdp=True, + fsdp_init_mode=FSDPInitMode.CUDA_BEFORE, + ) + ) + + with self.assertRaisesRegex(ValueError, "is not supported"): + with fsdp_model.summon_full_params(rank0_only=True, writeback=True): + pass + + @skip_if_lt_x_gpu(2) + @parametrize("prefix", ["", "test_prefix"]) + @parametrize("recurse", [False, True]) + def test_named_parameters_buffers(self, prefix: str, recurse: bool): + fsdp_model = FSDP( + NestedWrappedModule( + group=dist.distributed_c10d._get_default_group(), + wrap_fsdp=True, + fsdp_init_mode=FSDPInitMode.CUDA_BEFORE, + ) + ) + fsdp_model.register_buffer("buffer", torch.ones(1)) + model = NestedWrappedModule( + group=dist.distributed_c10d._get_default_group(), + wrap_fsdp=False, + fsdp_init_mode=FSDPInitMode.CUDA_BEFORE, + ) + model.register_buffer("buffer", torch.ones(1)) + with fsdp_model.summon_full_params(): + for call in ["named_parameters", "named_buffers"]: + for (n1, p1), (n2, p2) in itertools.zip_longest( + getattr(fsdp_model, call)(prefix=prefix, recurse=recurse), + getattr(model, call)(prefix=prefix, recurse=recurse), + ): + self.assertEqual(n1, n2) + self.assertEqual(p1, p2) + instantiate_parametrized_tests(TestSummonFullParams) instantiate_parametrized_tests(TestSummonFullParamsNoShard) diff --git a/test/distributed/fsdp/test_fsdp_traversal.py b/test/distributed/fsdp/test_fsdp_traversal.py new file mode 100644 
index 00000000000000..69ceca082441bf --- /dev/null +++ b/test/distributed/fsdp/test_fsdp_traversal.py @@ -0,0 +1,57 @@ +# Owner(s): ["oncall: distributed"] + +import sys + +from torch import distributed as dist +from torch.distributed.fsdp import FullyShardedDataParallel as FSDP +from torch.testing._internal.common_distributed import skip_if_lt_x_gpu +from torch.testing._internal.common_fsdp import ( + FSDPTest, + NestedWrappedModule, +) +from torch.testing._internal.common_utils import ( + TEST_WITH_DEV_DBG_ASAN, + run_tests, +) + + +if not dist.is_available(): + print("Distributed not available, skipping tests", file=sys.stderr) + sys.exit(0) + +if TEST_WITH_DEV_DBG_ASAN: + print( + "Skip dev-asan as torch + multiprocessing spawn have known issues", + file=sys.stderr, + ) + sys.exit(0) + + +class TestTraversal(FSDPTest): + @property + def world_size(self): + return 2 + + @skip_if_lt_x_gpu(2) + def test_fsdp_modules(self): + group = dist.distributed_c10d._get_default_group() + model = NestedWrappedModule(group, wrap_fsdp=True) + modules = FSDP.fsdp_modules(model) + self.assertEquals( + modules, [ + model.module.get_submodule("1"), + model.module.get_submodule("1").get_submodule("0"), + model.module.get_submodule("2"), + ] + ) + modules = FSDP.fsdp_modules(model, root_only=True) + self.assertEqual( + modules, [ + model.module.get_submodule("1"), + model.module.get_submodule("2"), + ] + ) + + +if __name__ == "__main__": + run_tests() diff --git a/test/distributed/fsdp/test_utils.py b/test/distributed/fsdp/test_utils.py index 5ac13eefa7e970..99a17a26d2d586 100644 --- a/test/distributed/fsdp/test_utils.py +++ b/test/distributed/fsdp/test_utils.py @@ -1,5 +1,6 @@ # Owner(s): ["oncall: distributed"] +from collections import OrderedDict import random import sys import unittest @@ -58,7 +59,7 @@ def get_a_tensor(): data.append({"key1": get_a_tensor(), "key2": {1: get_a_tensor()}, "key3": 3}) data.insert(0, set(["x", get_a_tensor(), get_a_tensor()])) data.append(([1], get_a_tensor(), (1), [get_a_tensor()], set((1, 2)))) - od = dict() + od = OrderedDict() od["k"] = "value" data.append(od) diff --git a/test/distributed/fsdp/test_wrap.py b/test/distributed/fsdp/test_wrap.py index 0b4c1f8acc6cc7..d181ca23235aef 100644 --- a/test/distributed/fsdp/test_wrap.py +++ b/test/distributed/fsdp/test_wrap.py @@ -5,7 +5,6 @@ import os import tempfile import unittest - import torch import torch.nn as nn import torch.nn.functional as F @@ -15,6 +14,7 @@ BackwardPrefetch, ) from torch.distributed.fsdp.wrap import ( + always_wrap_policy, default_auto_wrap_policy, enable_wrap, wrap, @@ -67,6 +67,15 @@ def get_model(cuda=True): sequential = sequential.cuda() return sequential + @staticmethod + def verify_model_all_wrapped(cls, model): + cls.assertTrue(isinstance(model, FSDP)) + cls.assertTrue(isinstance(model.module[0], FSDP)) + cls.assertTrue(isinstance(model.module[1], FSDP)) + cls.assertTrue(isinstance(model.module[2], FSDP)) + cls.assertTrue(isinstance(model.module[2].module[0], FSDP)) + cls.assertTrue(isinstance(model.module[2].module[1], FSDP)) + @staticmethod def verify_model(cls, model): cls.assertTrue(isinstance(model, FSDP)) @@ -123,7 +132,7 @@ def test_error_already_wrapped(self, nested, fsdp_init_mode): wrapped_fsdp = wrapped_fsdp.cuda() with self.assertRaisesRegex(ValueError, "to NOT be FullyShardedDataParallel"): - mod = FSDP(wrapped_fsdp, fsdp_auto_wrap_policy=default_auto_wrap_policy) + mod = FSDP(wrapped_fsdp, auto_wrap_policy=default_auto_wrap_policy) @skip_if_lt_x_gpu(2) @parametrize( @@ -168,7 
+177,7 @@ def forward(self, input): model = MyModel() wrapped_model = FSDP( model, - fsdp_auto_wrap_policy=functools.partial( + auto_wrap_policy=functools.partial( default_auto_wrap_policy, min_num_params=0, # wrap all modules ), @@ -226,7 +235,7 @@ def test_wrap(self, wrap_method): layer = FSDP( nn.Linear(5, 5), process_group=self.process_group, - fsdp_auto_wrap_policy=functools.partial(default_auto_wrap_policy, min_num_params=1) + auto_wrap_policy=functools.partial(default_auto_wrap_policy, min_num_params=1) ) self.assertTrue(isinstance(layer, FSDP)) self.assertEqual(layer.rank, self.process_group.rank()) @@ -257,6 +266,16 @@ def test_wrap_override_defaults(self): self.assertEqual(layer.rank, 0) self.assertEqual(layer.world_size, 2) + @unittest.skipIf(not torch.cuda.is_available(), "Test Requires CUDA") + def test_always_wrap(self): + """ + Test to ensure that if `always_wrap_policy` is + passed into FSDP, all submodules are wrapped. + """ + seq = TestFSDPWrap.NestedSequentialModel.get_model(cuda=True) + model = FSDP(seq, process_group=self.process_group, auto_wrap_policy=always_wrap_policy) + TestFSDPWrap.NestedSequentialModel.verify_model_all_wrapped(self, model) + def test_auto_wrap_api(self): """ Test to ensure with auto wrap, we wrap child modules correctly based on the min_num_params. @@ -269,7 +288,7 @@ def test_auto_wrap_api(self): model = FSDP( sequential, process_group=self.process_group, - fsdp_auto_wrap_policy=my_auto_wrap_policy + auto_wrap_policy=my_auto_wrap_policy ) TestFSDPWrap.NestedSequentialModel.verify_model(self, model) @@ -288,7 +307,7 @@ def test_auto_wrap_preset_exclude_wrap(self): model = FSDP( sequential, process_group=self.process_group, - fsdp_auto_wrap_policy=my_auto_wrap_policy + auto_wrap_policy=my_auto_wrap_policy ) self.assertTrue(isinstance(model, FSDP)) @@ -304,7 +323,7 @@ def test_auto_wrap_preset_exclude_wrap_include_children(self): my_auto_wrap_policy = functools.partial( default_auto_wrap_policy, min_num_params=40 ) - model = FSDP(sequential, process_group=self.process_group, fsdp_auto_wrap_policy=my_auto_wrap_policy) + model = FSDP(sequential, process_group=self.process_group, auto_wrap_policy=my_auto_wrap_policy) self.assertTrue(isinstance(model, FSDP)) self.assertTrue(isinstance(model[0], FSDP)) @@ -318,7 +337,7 @@ def test_auto_wrap_preset_force_leaf(self): my_auto_wrap_policy = functools.partial( default_auto_wrap_policy, min_num_params=40 ) - model = FSDP(sequential, process_group=self.process_group, fsdp_auto_wrap_policy=my_auto_wrap_policy) + model = FSDP(sequential, process_group=self.process_group, auto_wrap_policy=my_auto_wrap_policy) self.assertTrue(isinstance(model.module[0], FSDP)) # Assert children of multihead attention are not wrapped self.assertTrue(isinstance(model.module[1], nn.MultiheadAttention)) @@ -338,7 +357,7 @@ def test_auto_wrap_preset_force_leaf_custom(self): sequential = nn.Sequential( nn.Linear(10, 10), nn.ModuleList([nn.Linear(10, 10)]) ) - model = FSDP(sequential, process_group=self.process_group, fsdp_auto_wrap_policy=my_auto_wrap_policy) + model = FSDP(sequential, process_group=self.process_group, auto_wrap_policy=my_auto_wrap_policy) # Model was wrapped in FSDP as no inner modules were wrapped. 
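# Illustrative sketch, not part of the diff, of the renamed argument these hunks
# switch to: auto_wrap_policy (formerly fsdp_auto_wrap_policy). A size-based
# policy wraps each submodule holding at least min_num_params parameters in its
# own FSDP unit; always_wrap_policy wraps every submodule unconditionally.
import functools
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import always_wrap_policy, default_auto_wrap_policy

seq = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 10)).cuda()
size_based_model = FSDP(
    seq, auto_wrap_policy=functools.partial(default_auto_wrap_policy, min_num_params=40),
)
# Alternatively: FSDP(seq, auto_wrap_policy=always_wrap_policy)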
self.assertTrue(isinstance(model, FSDP)) self.assertTrue(isinstance(model.module[0], nn.Linear)) @@ -380,7 +399,7 @@ def test_auto_wrap_smoke_test(self, fsdp_init_mode, cpu_offload): my_auto_wrap_policy = functools.partial( default_auto_wrap_policy, min_num_params=40 ) - model = FSDP(sequential, cpu_offload=cpu_offload, fsdp_auto_wrap_policy=my_auto_wrap_policy) + model = FSDP(sequential, cpu_offload=cpu_offload, auto_wrap_policy=my_auto_wrap_policy) TestFSDPWrap.NestedSequentialModel.verify_model(self, model) if cuda_after_init: model = model.cuda() diff --git a/test/distributed/optim/test_zero_redundancy_optimizer.py b/test/distributed/optim/test_zero_redundancy_optimizer.py index 67c274575d4468..6f8639395a8c5b 100644 --- a/test/distributed/optim/test_zero_redundancy_optimizer.py +++ b/test/distributed/optim/test_zero_redundancy_optimizer.py @@ -6,19 +6,17 @@ # LICENSE file in the root directory of this source tree. import copy -import itertools import os import sys +import unittest from contextlib import suppress -from typing import Any, List, Type, cast +from typing import Any, List, cast import numpy as np import torch import torch.distributed as dist -import unittest - if not dist.is_available(): print("Distributed not available, skipping tests", file=sys.stderr) sys.exit(0) @@ -34,15 +32,16 @@ from torch.distributed.optim.zero_redundancy_optimizer import _broadcast_object from torch.nn.parallel import DistributedDataParallel as DDP from torch.optim import SGD, AdamW -from torch.testing._internal import common_distributed, common_utils +from torch.testing._internal import common_distributed from torch.testing._internal.common_utils import ( + IS_WINDOWS, TEST_WITH_ASAN, TEST_WITH_DEV_DBG_ASAN, - sandcastle_skip_if, + instantiate_parametrized_tests, + parametrize, + run_tests, ) -from torch.testing._internal.common_utils import IS_WINDOWS - try: import torchvision HAS_TORCHVISION = True @@ -60,30 +59,19 @@ def _get_backend_for_tests(): BACKEND = _get_backend_for_tests() -DEVICE = "cuda" if torch.cuda.is_available() else "cpu" - - -def check_same_model_params(model_a: torch.nn.Module, model_b: torch.nn.Module, message: str = "") -> None: - for p_a, p_b in zip(model_a.parameters(), model_b.parameters()): - assert torch.allclose(p_a, p_b, atol=1e-3), f"Model parameters differ\n{p_a} {p_b}\n" + message - - for b_a, b_b in zip(model_a.buffers(), model_b.buffers()): - assert torch.allclose(b_a, b_b), f"Model buffers differ {b_a} - {b_b}\n" + message - - @unittest.skipIf( - TEST_WITH_ASAN or TEST_WITH_DEV_DBG_ASAN, "CUDA + ASAN doesnt work." + TEST_WITH_ASAN or TEST_WITH_DEV_DBG_ASAN, "CUDA + ASAN does not work." 
) class TestZeroRedundancyOptimizer(common_distributed.MultiProcessTestCase): def setUp(self): super(TestZeroRedundancyOptimizer, self).setUp() os.environ["WORLD_SIZE"] = str(self.world_size) - self._spawn_processes() @property def device(self): - return torch.device(self.rank) if torch.cuda.is_available() else torch.device("cpu") + return torch.device("cuda") if torch.cuda.is_available() \ + else torch.device("cpu") @property def world_size(self): @@ -94,7 +82,6 @@ def tearDown(self): torch.distributed.destroy_process_group() except AssertionError: pass - try: os.remove(self.file_name) except OSError: @@ -104,75 +91,94 @@ def dist_init(self, rank, world_size=-1, backend=BACKEND): if (world_size < 1): world_size = self.world_size store = dist.FileStore(self.file_name, world_size) - return dist.init_process_group(backend=backend, store=store, rank=rank, world_size=world_size) + return dist.init_process_group( + backend=backend, store=store, rank=rank, world_size=world_size, + ) # TODO: sandcastle_skip_if does not work here. @unittest.skipIf( - TEST_WITH_ASAN or TEST_WITH_DEV_DBG_ASAN, "CUDA + ASAN doesnt work." + TEST_WITH_ASAN or TEST_WITH_DEV_DBG_ASAN, "CUDA + ASAN does not work." ) class TestZeroRedundancyOptimizerSingleRank(TestZeroRedundancyOptimizer): def test_state_dict(self): - """Check that the ZeroRedundancyOptimizer exposes the expected state dict interface, - irrespective of the sharding. - """ + """Check that ZeroRedundancyOptimizer exposes the expected state dict + interface, irrespective of the sharding.""" self.dist_init(self.rank) - x = torch.tensor([1.0], device=DEVICE, requires_grad=True) - o = ZeroRedundancyOptimizer([x], optimizer_class=SGD, lr=0.1, momentum=0.9) + LR1 = 0.1 + LR2 = 0.01 + MOMENTUM = 0.9 + RECIPIENT_RANK = 0 # rank 0 is the only rank since the world size is 1 + x = torch.tensor([1.0], device=self.device, requires_grad=True) + o = ZeroRedundancyOptimizer( + [x], optimizer_class=SGD, lr=LR1, momentum=MOMENTUM, + ) x.backward() o.step() - self.assertEqual(x, torch.tensor([0.9], device=DEVICE)) - self.assertEqual(o.optim.state[x]["momentum_buffer"], torch.tensor([1.0], device=DEVICE)) + self.assertEqual(x, torch.tensor([0.9], device=self.device)) + self.assertEqual( + o.optim.state[x]["momentum_buffer"], + torch.tensor([1.0], device=self.device), + ) o.zero_grad() - o.consolidate_state_dict() # Sync state dict in between replicas - even if there are none + o.consolidate_state_dict(to=RECIPIENT_RANK) state_dict = o.state_dict() - # Check that the state dict is pytorch-compliant key wise + # Check that the state dict has keys compliant with PyTorch self.assertIn("param_groups", state_dict.keys()) self.assertIn("state", state_dict.keys()) - # Check that the pulled state is what we expect, and that we have all the expected keys + # Check that the state has the expected keys self.assertEqual(state_dict["param_groups"][0]["lr"], 0.1) self.assertEqual(state_dict["param_groups"][0]["momentum"], 0.9) self.assertFalse(state_dict["param_groups"][0]["nesterov"]) self.assertEqual(state_dict["param_groups"][0]["weight_decay"], 0.0) self.assertEqual(state_dict["param_groups"][0]["dampening"], 0.0) - # Check that the pulled state and the .param_groups attribute are in sync - for k in state_dict["param_groups"][0].keys(): + # Check that the state and the `param_groups` attribute are in sync + for k in state_dict["param_groups"][0]: if k != "params": - self.assertEqual(state_dict["param_groups"][0][k], o.param_groups[0][k]) + self.assertEqual( + 
state_dict["param_groups"][0][k], + o.param_groups[0][k], + ) - # Check that it's correctly loaded - o = ZeroRedundancyOptimizer([x], optimizer_class=SGD, lr=0.01) + # Check that the state is reloaded with the correct values and device + o = ZeroRedundancyOptimizer([x], optimizer_class=SGD, lr=LR2) o.load_state_dict(state_dict) + self.assertEqual( + o.optim.state[x]["momentum_buffer"], + torch.tensor([1.0], device=self.device), + ) - # Check that state is correct and on proper device - self.assertEqual(o.optim.state[x]["momentum_buffer"], torch.tensor([1.0], device=DEVICE)) - - # We should now be using a lr of 0.1, both within the optimizer - # and as exposed by the .param_groups attribute - assert o.param_groups[0]["lr"] == 0.1 + # We should we using `LR1` and not `LR2` after reloading, both within + # the optimizer and as exposed by the `param_groups` attribute + self.assertEqual(o.param_groups[0]["lr"], LR1) x.backward() o.step() - self.assertEqual(x, torch.tensor([0.71], device=DEVICE)) - self.assertEqual(o.optim.state[x]["momentum_buffer"], torch.tensor([1.9], device=DEVICE)) + self.assertEqual(x, torch.tensor([0.71], device=self.device)) + self.assertEqual( + o.optim.state[x]["momentum_buffer"], + torch.tensor([1.9], device=self.device), + ) - # Check that the exposed param_groups are on the proper device + # Check that the exposed `param_groups`` are on the proper device self.assertEqual(o.param_groups[0]["params"][0].device, x.device) def test_lr_scheduler(self): - """ Check that a normal torch lr_scheduler is usable with ZeroRedundancyOptimizer""" - + """Check that a normal PyTorch ``lr_scheduler`` is usable with + ZeroRedundancyOptimizer.""" self.dist_init(self.rank) - x = torch.tensor([1.0], device=DEVICE, requires_grad=True) - x2 = torch.tensor([1.0], device=DEVICE, requires_grad=True) - o = ZeroRedundancyOptimizer([x], optimizer_class=SGD, lr=0.01) - o2 = torch.optim.SGD([x2], lr=0.01) + NUM_ITERS = 5 + LR = 0.01 + x = torch.tensor([1.0], device=self.device, requires_grad=True) + x2 = torch.tensor([1.0], device=self.device, requires_grad=True) + o = ZeroRedundancyOptimizer([x], optimizer_class=SGD, lr=LR) + o2 = torch.optim.SGD([x2], lr=LR) s = torch.optim.lr_scheduler.StepLR(o, 1) s2 = torch.optim.lr_scheduler.StepLR(o2, 1) - for _ in range(5): + for _ in range(NUM_ITERS): x.backward() o.zero_grad() o.step() @@ -184,8 +190,9 @@ def test_lr_scheduler(self): self.assertEqual(x, x2) def test_step_with_kwargs(self): - """ Check that the `step(**kwargs)` interface is properly exposed""" + """Check that the ``step(**kwargs)`` interface is properly exposed.""" self.dist_init(self.rank) + LR = 0.1 class SGDWithStepKWArg(torch.optim.SGD): def step(self, closure=None, kwarg=None): @@ -193,18 +200,21 @@ def step(self, closure=None, kwarg=None): kwarg.append(5) kwarg: List[Any] = [] - x = torch.tensor([1.0], device=DEVICE, requires_grad=True) - o = ZeroRedundancyOptimizer([x], optimizer_class=SGDWithStepKWArg, lr=0.1) + x = torch.tensor([1.0], device=self.device, requires_grad=True) + o = ZeroRedundancyOptimizer( + [x], optimizer_class=SGDWithStepKWArg, lr=LR, + ) x.backward() o.step(0, kwarg=kwarg) self.assertEqual(kwarg, [5]) - self.assertEqual(x, torch.tensor([0.9], device=DEVICE)) + self.assertEqual(x, torch.tensor([0.9], device=self.device)) def test_step_with_extra_inner_key(self): - """Check that an optimizer adding extra keys to the param_groups - is properly handled, in that the new key is exposed to the user - """ + """Check that ZeroRedundancyOptimizer wrapping an optimizer 
that adds + extra keys to ``param_groups`` exposes those keys through ZeRO's own + ``param_groups``.""" self.dist_init(self.rank) + LR = 0.1 class SGDWithNewKey(torch.optim.SGD): # Dummy optimizer which adds a new key to the param groups @@ -212,33 +222,38 @@ def step(self, closure=None): super().step() self.param_groups[0]["new_key"] = 0.1 - x = torch.tensor([1.0], device=DEVICE, requires_grad=True) - o = ZeroRedundancyOptimizer([x], optimizer_class=SGDWithNewKey, lr=0.1) + x = torch.tensor([1.0], device=self.device, requires_grad=True) + o = ZeroRedundancyOptimizer([x], optimizer_class=SGDWithNewKey, lr=LR) x.backward() o.step() self.assertEqual(o.param_groups[0]["new_key"], 0.1) - self.assertEqual(x, torch.tensor([0.9], device=DEVICE)) + self.assertEqual(x, torch.tensor([0.9], device=self.device)) def test_step_without_closure(self): - """Check that the step() method (without closure) is handlded as expected""" + """Check that the ``step()`` method (without closure) is handled as + expected.""" self.dist_init(self.rank) + LR = 0.1 class SGDWithoutClosure(torch.optim.SGD): def step(self): return super().step() - x = torch.tensor([1.0], device=DEVICE, requires_grad=True) - o = ZeroRedundancyOptimizer([x], optimizer_class=SGDWithoutClosure, lr=0.1) + x = torch.tensor([1.0], device=self.device, requires_grad=True) + o = ZeroRedundancyOptimizer( + [x], optimizer_class=SGDWithoutClosure, lr=LR, + ) x.backward() o.step() - self.assertEqual(x, torch.tensor([0.9], device=DEVICE)) + self.assertEqual(x, torch.tensor([0.9], device=self.device)) def test_zero_grad(self): - """Check that the zero_grad attribute is properly handled""" + """Check that the ``zero_grad`` method is properly handled.""" self.dist_init(self.rank) + LR = 0.01 x = torch.rand(1) m = torch.nn.Linear(1, 1) - o = ZeroRedundancyOptimizer(m.parameters(), optimizer_class=SGD, lr=0.1) + o = ZeroRedundancyOptimizer(m.parameters(), optimizer_class=SGD, lr=LR) y = m(x) y.backward(x) self.assertNotEqual(m.weight.grad, torch.zeros_like(m.weight)) @@ -251,46 +266,43 @@ def test_constructor(self): """Check the robustness of the ZeroRedundancyOptimizer constructor by passing different values for the ``params`` argument.""" self.dist_init(self.rank) - + LR = 0.01 m = torch.nn.Sequential( torch.nn.Linear(5, 10), torch.nn.Linear(10, 10), torch.nn.Linear(10, 10), ) - # Test various constructor inputs in the form: (input, expected error) ctor_inputs = [ - ([], ValueError), # empty parameter list - (torch.randn(1), TypeError), # non-iterable: `torch.Tensor` - (1.2, TypeError), # non-iterable: `float` + ([], ValueError), # empty parameter list + (torch.randn(1), TypeError), # non-iterable: `torch.Tensor` + (1.2, TypeError), # non-iterable: `float` ([ {"params": [l.weight for l in m]}, {"params": [l.bias for l in m]}, - ], None), # iterable of dict - (list(m.parameters()) + [42], TypeError), # iterable containing invalid type - (m.parameters(), None), # `params` as a generator - (list(m.parameters()), None) # `params` as a list + ], None), # iterable of dict + (list(m.parameters()) + [42], TypeError), # iterable containing invalid type + (m.parameters(), None), # `params` as a generator + (list(m.parameters()), None) # `params` as a list ] - for ctor_input, error in ctor_inputs: - if error: - with self.assertRaises(error): - ZeroRedundancyOptimizer(ctor_input, optimizer_class=SGD, lr=0.01) - else: - ZeroRedundancyOptimizer(ctor_input, optimizer_class=SGD, lr=0.01) + context = self.assertRaises(error) if error else suppress() + with context: + 
ZeroRedundancyOptimizer( + ctor_input, optimizer_class=SGD, lr=LR, + ) # Test constructing with multiple parameter groups more thoroughly - weight_decay = 0.01 - lr = 0.01 - betas = (0.9, 0.999) - eps = 1e-8 + WD = 0.01 + BETAS = (0.9, 0.999) + EPS = 1e-8 params = [ {"params": [l.weight for l in m], "weight_decay": 0.}, - {"params": [l.bias for l in m], "weight_decay": weight_decay}, + {"params": [l.bias for l in m], "weight_decay": WD}, ] o = ZeroRedundancyOptimizer( params, optimizer_class=AdamW, - lr=lr, betas=betas, eps=eps, + lr=LR, betas=BETAS, eps=EPS, ) assert len(o.param_groups) == 2, \ f"Expected 2 ZeRO param groups, but got {len(o.param_groups)}" @@ -306,7 +318,7 @@ def test_same_dense_param_type(self): and varying parameter types is added. """ self.dist_init(self.rank) - + LR = 0.01 inputs = [ [torch.sparse_coo_tensor(size=(2, 3))], [torch.FloatTensor(1), torch.DoubleTensor(1)], @@ -315,37 +327,63 @@ def test_same_dense_param_type(self): ] for input in inputs: with self.assertRaises(ValueError): - ZeroRedundancyOptimizer(input, optimizer_class=SGD, lr=0.1) + ZeroRedundancyOptimizer(input, optimizer_class=SGD, lr=LR) class TestZeroRedundancyOptimizerDistributed(TestZeroRedundancyOptimizer): + @property + def device(self): + return torch.device(self.rank) if torch.cuda.is_available() \ + else torch.device("cpu") + @property def world_size(self): return min(4, max(2, torch.cuda.device_count())) - @common_distributed.skip_if_rocm - def test_step(self): - """ Check that the ZeroRedundancyOptimizer wrapper properly exposes the `.step()` interface""" + @property + def context(self): + return suppress() if not torch.cuda.is_available() \ + else torch.cuda.device(self.rank) - if self.rank >= self.world_size or (torch.cuda.is_available() and torch.cuda.device_count() < 2): - return + def _check_same_model_params( + self, + model_a: torch.nn.Module, + model_b: torch.nn.Module, + message: str = "", + ) -> None: + # Check that model parameters match + for p_a, p_b in zip(model_a.parameters(), model_b.parameters()): + torch.testing.assert_close( + p_a, p_b, atol=1e-3, rtol=1e-5, + msg=f"Model parameters differ:\n{p_a} {p_b}\n" + message, + ) + # Check that model buffers match + for b_a, b_b in zip(model_a.buffers(), model_b.buffers()): + torch.testing.assert_close( + b_a, b_b, + msg=f"Model buffers differ:\n{b_a} {b_b}\n" + message, + ) + @common_distributed.skip_if_no_gpu + @common_distributed.skip_if_rocm + def test_step(self): + """Check that ZeroRedundancyOptimizer properly exposes the ``step()`` + interface.""" self.dist_init(self.rank, world_size=self.world_size) + LR = 0.01 - context = suppress() if not torch.cuda.is_available() else torch.cuda.device(self.rank) - - with context: + with self.context: x = torch.tensor([float(self.rank + 1)], device=self.device) m = torch.nn.Linear(1, 1) m.weight.data = torch.tensor([[1.0]]) m.bias.data = torch.tensor([2.0]) - m_zero = copy.deepcopy(m) - m.to(self.device) - m_zero.to(self.device) + m = m.to(self.device) + m_zero = copy.deepcopy(m).to(self.device) - lr = 0.1 - o = SGD(m.parameters(), lr=lr) - o_zero = ZeroRedundancyOptimizer(m_zero.parameters(), optimizer_class=SGD, lr=lr) + o = SGD(m.parameters(), lr=LR) + o_zero = ZeroRedundancyOptimizer( + m_zero.parameters(), optimizer_class=SGD, lr=LR, + ) y = m(x) y.backward(x) @@ -364,24 +402,23 @@ def test_step(self): self.assertEqual(m.weight, m_zero.weight) self.assertEqual(m.bias, m_zero.bias) + @common_distributed.skip_if_no_gpu @common_distributed.skip_if_rocm def 
test_step_with_closure(self): - """ Check that the ZeroRedundancyOptimizer wrapper properly exposes the `.step(closure)` interface""" - - if self.rank >= self.world_size or (torch.cuda.is_available() and torch.cuda.device_count() < 2): - return - + """Check that ZeroRedundancyOptimizer properly exposes the + ``step(closure)`` interface.""" self.dist_init(self.rank, world_size=self.world_size) - context = suppress() if not torch.cuda.is_available() else torch.cuda.device(self.rank) - - with context: + with self.context: for bucket_view in [False, True]: x_val = self.rank + 1 weight = 1.0 bias = 2.0 error = 1.0 - target = torch.tensor([x_val * weight + bias + error], device=self.device) + target = torch.tensor( + [x_val * weight + bias + error], + device=self.device, + ) loss_fn = torch.nn.L1Loss() x = torch.tensor([float(x_val)], device=self.device) @@ -416,32 +453,62 @@ def closure(): self.assertEqual(m.weight, torch.tensor([[1.1]])) self.assertEqual(m.bias, torch.tensor([2.1])) + @common_distributed.skip_if_no_gpu + def test_lr_scheduler(self): + """Check that a normal PyTorch ``lr_scheduler`` is usable with + ZeroRedundancyOptimizer.""" + self.dist_init(self.rank) + x = torch.tensor([1.0], device=self.device, requires_grad=True) + x2 = torch.tensor([1.0], device=self.device, requires_grad=True) + o = ZeroRedundancyOptimizer([x], optimizer_class=SGD, lr=0.01) + o2 = torch.optim.SGD([x2], lr=0.01) + s = torch.optim.lr_scheduler.StepLR(o, 1) + s2 = torch.optim.lr_scheduler.StepLR(o2, 1) + for _ in range(5): + x.backward() + o.zero_grad() + o.step() + s.step() + x2.backward() + o2.zero_grad() + o2.step() + s2.step() + self.assertEqual(x, x2) + def test_sharding(self): - """ Check the sharding at construction time + """ + Check ZeroRedundancyOptimizer's parameter sharding at construction + time. NOTE: The correctness of this test depends on the ZeRO implementation using the sorted-greedy partitioning algorithm. For details, see - `ZeroRedundancyOptimizer._partition_parameters()` in - `zero_redundancy_optimizer.py`. + ``ZeroRedundancyOptimizer._partition_parameters()`` in + zero_redundancy_optimizer.py. """ self.dist_init(self.rank) + LR = 0.01 sizes = [9, 7, 5, 3] params = [] for size in sizes * self.world_size: params.append(torch.rand(size, 1)) - o = ZeroRedundancyOptimizer(params, optimizer_class=SGD, lr=0.1) - self.assertEqual(sum([x.numel() for x in o.optim.param_groups[0]["params"]]), sum(sizes)) + o = ZeroRedundancyOptimizer(params, optimizer_class=SGD, lr=LR) + self.assertEqual( + sum([x.numel() for x in o.optim.param_groups[0]["params"]]), + sum(sizes), + ) def test_add_param_group(self): - """Check that ZeroRedundancyOptimizer properly handles adding a new param_group a posteriori, - and that all ranks get a shard + """Check that ZeroRedundancyOptimizer properly handles adding a new + parameter group a posteriori and that all ranks get a shard of the + contained parameters. NOTE: The correctness of this test depends on the ZeRO implementation using the sorted-greedy partitioning algorithm. For details, see - `ZeroRedundancyOptimizer._partition_parameters()` in - `zero_redundancy_optimizer.py`. + ``ZeroRedundancyOptimizer._partition_parameters()`` in + zero_redundancy_optimizer.py. 
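# The "sorted-greedy" partitioning referenced in the docstrings above can be
# summarized by the following sketch (an illustration only, not the actual
# implementation in zero_redundancy_optimizer.py): sort the parameters by
# size, largest first, and repeatedly assign the next one to the rank with the
# smallest total so far.
def _sorted_greedy_partition(param_sizes, world_size):
    totals = [0] * world_size
    partition = [[] for _ in range(world_size)]
    for size in sorted(param_sizes, reverse=True):
        rank = totals.index(min(totals))
        partition[rank].append(size)
        totals[rank] += size
    return partition
# With `sizes = [9, 7, 5, 3]` replicated `world_size` times, as in
# `test_sharding` above, this assigns exactly sum(sizes) = 24 elements to
# every rank, which is what that test asserts for its own shard.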
""" self.dist_init(self.rank) + LR = 0.01 # Test with all parameters trainable to begin with def all_trainable(): @@ -451,19 +518,26 @@ def all_trainable(): for size in sizes_world[:-1]: params.append(torch.rand(size, 1)) - # Make sure that the params are trainable, enforces size-based partitioning + # Make sure that the params are trainable so that they are factored + # into the size-based parameter partitioning for p in params: p.requires_grad = True - o = ZeroRedundancyOptimizer(params, optimizer_class=SGD, lr=0.1) - - assert len(o.param_groups) == 1 + o = ZeroRedundancyOptimizer(params, optimizer_class=SGD, lr=LR) + self.assertEqual(len(o.param_groups), 1) o.add_param_group({"params": [torch.rand(3, 1)]}) - - assert len(o.param_groups) == 2 - # Verify that added group is added to the correct partition making all have the same elements. - assert sum([x.numel() for g in o.optim.param_groups for x in g["params"]]) == sum(sizes) - assert len(o.optim.param_groups) == 2 + # Verify that new group is added to the correct partition, making + # all partitions have the same elements + self.assertEqual(len(o.param_groups), 2) + self.assertEqual( + sum([ + x.numel() + for g in o.optim.param_groups + for x in g["params"] + ]), + sum(sizes), + ) + self.assertEqual(len(o.optim.param_groups), 2) # Test a pathological config with a first big non-trainable param def some_trainable(): @@ -471,17 +545,16 @@ def some_trainable(): for size in [100, 3, 5, 2, 6, 4]: params.append(torch.rand(size, 1)) - # Make sure that the params are trainable, enforces size-based partitioning + # Make sure that all but the first param are trainable so that they + # are factored into the size-based parameter partitioning for p in params[1:]: p.requires_grad = True - o = ZeroRedundancyOptimizer(params, optimizer_class=SGD, lr=0.1) - - assert len(o.param_groups) == 1 + o = ZeroRedundancyOptimizer(params, optimizer_class=SGD, lr=LR) + self.assertEqual(len(o.param_groups), 1) o.add_param_group({"params": [torch.rand(3, 1)]}) - - assert len(o.param_groups) == 2 - assert len(o.optim.param_groups) == 2 + self.assertEqual(len(o.param_groups), 2) + self.assertEqual(len(o.optim.param_groups), 2) all_trainable() some_trainable() @@ -489,91 +562,91 @@ def some_trainable(): @common_distributed.skip_if_no_gpu def test_multiple_param_groups(self): """ - Tests parity between constructing ZeRO with multiple parameter groups + Check parity between constructing ZeRO with multiple parameter groups upfront versus adding parameter groups to ZeRO after construction versus a non-sharded optimizer. 
""" self.dist_init(self.rank) - + BATCH_SIZE, NUM_ITERS = 8, 3 + INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM = 5, 10, 5 + WD, LR = 0.01, 0.01 model1 = torch.nn.Sequential( - torch.nn.Linear(5, 10), - torch.nn.Linear(10, 10), - torch.nn.Linear(10, 5), + torch.nn.Linear(INPUT_DIM, HIDDEN_DIM), + torch.nn.Linear(HIDDEN_DIM, HIDDEN_DIM), + torch.nn.Linear(HIDDEN_DIM, OUTPUT_DIM), ) model2 = copy.deepcopy(model1) model3 = copy.deepcopy(model1) model1 = model1.to(self.device) model2 = model2.to(self.device) model3 = model3.to(self.device) - - batch_size = 8 - num_iters = 3 inputs = [ - torch.randn(batch_size, 5).to(self.device) for _ in range(num_iters) + torch.randn(BATCH_SIZE, INPUT_DIM).to(self.device) + for _ in range(NUM_ITERS) ] - wd = 0.01 - lr = 0.01 # Construct `optim1` with both parameter groups upfront optim1 = ZeroRedundancyOptimizer( [ {"params": [l.weight for l in model1], "weight_decay": 0.}, - {"params": [l.bias for l in model1], "weight_decay": wd}, + {"params": [l.bias for l in model1], "weight_decay": WD}, ], - optimizer_class=AdamW, lr=lr, + optimizer_class=AdamW, lr=LR, ) # Construct `optim2` by adding the second parameter after optim2 = ZeroRedundancyOptimizer( [l.weight for l in model2], - optimizer_class=AdamW, lr=lr, weight_decay=0., + optimizer_class=AdamW, lr=LR, weight_decay=0., ) optim2.add_param_group( - {"params": [l.bias for l in model2], "weight_decay": wd} + {"params": [l.bias for l in model2], "weight_decay": WD} ) # Construct `optim3` as a non-sharded optimizer optim3 = AdamW( [ {"params": [l.weight for l in model3], "weight_decay": 0.}, - {"params": [l.bias for l in model3], "weight_decay": wd}, - ], lr=lr, + {"params": [l.bias for l in model3], "weight_decay": WD}, + ], lr=LR, ) - # Check parity over a few iterations - for iter in range(num_iters): + for input in inputs: for model, optim in ( (model1, optim1), (model2, optim2), (model3, optim3), ): optim.zero_grad() - out = model(inputs[iter]) + out = model(input) loss = out.sum() loss.backward() optim.step() - for layer1, layer2, layer3 in zip(model1, model2, model3): - assert torch.allclose(layer1.weight, layer2.weight) - assert torch.allclose(layer1.weight, layer3.weight) - assert torch.allclose(layer1.bias, layer2.bias) - assert torch.allclose(layer1.bias, layer3.bias) + torch.testing.assert_close(layer1.weight, layer2.weight) + torch.testing.assert_close(layer1.weight, layer3.weight) + torch.testing.assert_close(layer1.bias, layer2.bias) + torch.testing.assert_close(layer1.bias, layer3.bias) - @common_distributed.skip_if_lt_x_gpu(2) + @common_distributed.skip_if_no_gpu @common_distributed.skip_if_rocm def test_collect_shards(self): - """ Check the state consolidation mechanism, and the state dict exposed by ZeroRedundancyOptimizer""" + """Check the state consolidation mechanism and the state dict exposed + by ZeroRedundancyOptimizer.""" self.dist_init(self.rank) - RECIPIENT_RANK = 0 - - # Run a dummy step so that the optimizer state dict exists - batch, input_width, hidden, target_width = 3, 20, 10, 5 - target = torch.rand((batch, target_width), device=self.device) - inputs = torch.rand((batch, input_width), device=self.device) - - model = torch.nn.Sequential(torch.nn.Linear(input_width, hidden), torch.nn.Linear(hidden, target_width)) - model.to(self.device) - + LR = 1e-3 + MOMENTUM = 0.99 + BATCH_SIZE, INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM = 3, 20, 10, 5 + REFERENCE_RANK = 0 + target = torch.rand((BATCH_SIZE, OUTPUT_DIM), device=self.device) + inputs = torch.rand((BATCH_SIZE, INPUT_DIM), device=self.device) + model 
= torch.nn.Sequential( + torch.nn.Linear(INPUT_DIM, HIDDEN_DIM), + torch.nn.Linear(HIDDEN_DIM, OUTPUT_DIM), + ).to(self.device) loss_fn = torch.nn.L1Loss() loss_fn.to(self.device) - - # With SGD, Momentum is required to get a state to shard - optimizer = ZeroRedundancyOptimizer(model.parameters(), optimizer_class=SGD, lr=0.1, momentum=0.99) + optimizer = ZeroRedundancyOptimizer( + model.parameters(), + optimizer_class=SGD, + lr=LR, + momentum=MOMENTUM, # ensure there exists state to shard + ) def closure(): optimizer.zero_grad() @@ -582,56 +655,78 @@ def closure(): loss.backward() return loss + # Run a dummy step so that the optimizer state dict exists _ = optimizer.step(closure=closure) - # Update the optimizer state on the reference rank - optimizer.consolidate_state_dict(to=RECIPIENT_RANK) - - # Fetch the state on the reference rank - # - check that it has the correct size - # - load it again - if self.rank == RECIPIENT_RANK: + # Get the optimizer state on the reference rank + optimizer.consolidate_state_dict(to=REFERENCE_RANK) + if self.rank == REFERENCE_RANK: + # Check that the state has the correct size optimizer_state_dict = optimizer.state_dict() - self.assertEqual(len(optimizer_state_dict["state"]), len(list(model.parameters()))) + self.assertEqual( + len(optimizer_state_dict["state"]), + len(list(model.parameters())), + ) else: optimizer_state_dict = {} + # Load the optimizer state on all ranks without any exceptions optimizer_state_dict = _broadcast_object( optimizer_state_dict, - src_rank=RECIPIENT_RANK, + src_rank=REFERENCE_RANK, group=dist.group.WORLD, device=self.device, ) - - # Load the optimizer state dict, check that no exception is raised optimizer.load_state_dict(optimizer_state_dict) - @sandcastle_skip_if( - IS_WINDOWS, - "Test is flaky on windows: https://github.com/pytorch/pytorch/issues/66059" - ) - def test_multiple_groups(self): - """ Check that the ZeroRedundancyOptimizer handles working with multiple process groups""" - self.dist_init(self.rank, self.world_size, dist.Backend.GLOO) - - # Only work with the even ranks, to check that the global_rank indexing is properly used - sub_group_ranks = list(filter(lambda x: x % 2 == 0, range(self.world_size))) - process_group = torch.distributed.new_group(ranks=sub_group_ranks, backend="gloo") + def test_nondefault_process_group(self): + """Check that ZeroRedundancyOptimizer works with a non-default process + group consisting only of even ranks.""" + # Skip the test if below the minimum world size since then the test is + # trivial + MIN_WORLD_SIZE = 4 + if self.world_size < MIN_WORLD_SIZE: + common_distributed.logger.info( + "Skipping `test_nondefault_process_group()` since world size " + f"of {self.world_size} is less than {MIN_WORLD_SIZE}" + ) + return + BACKEND = dist.Backend.GLOO + self.dist_init(self.rank, self.world_size, BACKEND) + # Use GPU if enough are available, or fall back to CPU otherwise, which + # is fine since Gloo backend supports both + if torch.cuda.is_available() and \ + torch.cuda.device_count() >= self.world_size: + device = torch.device(self.rank) + else: + device = torch.device("cpu") + # Create a new process group consisting of the even ranks to exercise + # the case where the global and local ranks do not necessarily match + subgroup_ranks = [r for r in range(self.world_size) if r % 2 == 0] + process_group = dist.new_group( + ranks=subgroup_ranks, backend=BACKEND, + ) + # Ranks not participating in the new process group are no longer needed + if self.rank not in subgroup_ranks: + return - # Make 
sure that all the ranks get different training data - # So that the sync check in between their models is meaningful + # Set different seeds across ranks so that each rank gets different + # training data and hence the model sync check is meaningful torch.manual_seed(self.rank) np.random.seed(self.rank) - # Standard deep learning setup - epochs, batch, input_width, hidden, target_width = 5, 3, 20, 10, 5 - loss_fn = torch.nn.L1Loss().to(self.device) + EPOCHS, BATCH_SIZE, INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM = 5, 3, 20, 10, 5 + LR = 1e-3 + MOMENTUM = 0.99 + REFERENCE_RANK = 0 + assert REFERENCE_RANK in subgroup_ranks, \ + "Reference rank must be in the new process group" + loss_fn = torch.nn.L1Loss().to(device) def check(optimizer): - # Just run a couple of epochs, check that the model is properly updated - for _ in range(epochs): - target = torch.rand((batch, target_width), device=self.device) - inputs = torch.rand((batch, input_width), device=self.device) + for _ in range(EPOCHS): + target = torch.rand((BATCH_SIZE, OUTPUT_DIM), device=device) + inputs = torch.rand((BATCH_SIZE, INPUT_DIM), device=device) def closure(): optimizer.zero_grad() @@ -639,167 +734,189 @@ def closure(): loss = loss_fn(output, target) loss /= self.world_size loss.backward() - dist.all_reduce(loss, group=process_group) # Not strictly needed for the test below - + dist.all_reduce(loss, group=process_group) return loss _ = optimizer.step(closure=closure) - # Check that all the params are the same on all ranks + # Check that the parameters match across ranks after a step for pg in optimizer.param_groups: for p in pg["params"]: - receptacle = [p.clone() for _ in sub_group_ranks] if self.rank == 0 else [] - dist.gather(p, receptacle, dst=0, group=process_group) - if self.rank == 0: - for sync_p in receptacle[1:]: - assert torch.all(torch.eq(receptacle[0], sync_p)), "Models differ in between ranks" - - if self.rank in sub_group_ranks: - # Model fitting in the broadcast bucket - model = torch.nn.Sequential( - torch.nn.Linear(input_width, hidden), - torch.nn.Linear(hidden, target_width), - ).to(self.device) + receptacle = [ + p.clone() for _ in subgroup_ranks + ] if self.rank == REFERENCE_RANK else [] + dist.gather( + p, receptacle, dst=REFERENCE_RANK, + group=process_group, + ) + if self.rank == REFERENCE_RANK: + reference_param = receptacle[0] + for param in receptacle[1:]: + torch.testing.assert_close( + reference_param, + param, + msg="Models differ between ranks", + ) - # With SGD, Momentum is required to get a state to shard - optimizer = ZeroRedundancyOptimizer( - model.parameters(), optimizer_class=SGD, lr=0.1, momentum=0.99, process_group=process_group - ) - check(optimizer) + model = torch.nn.Sequential( + torch.nn.Linear(INPUT_DIM, HIDDEN_DIM), + torch.nn.Linear(HIDDEN_DIM, OUTPUT_DIM), + ).to(device) + optimizer = ZeroRedundancyOptimizer( + model.parameters(), + optimizer_class=SGD, + lr=LR, + momentum=MOMENTUM, # ensure there exists state to shard + process_group=process_group, + ) + check(optimizer) - # Model not-fitting in the broadcast bucket + @common_distributed.skip_if_no_gpu + @parametrize( + "optimizer_class_str", + ["Adam", "AdamW", "SGD"], + # Use string to appease the internal test name parser + ) + @parametrize( + "maximize", + [False, True], + ) + def test_local_optimizer_parity( + self, + optimizer_class_str: str, + maximize: bool, + ): + """When combined with DDP, check that a local optimizer gives the same + results as wrapping that optimizer with ZeroRedundancyOptimizer.""" + 
self.dist_init(self.rank) + BATCHES = 20 + BATCH_SIZE = 64 + LR = 1e-3 + INPUT_DIM = 2 + HIDDEN_DIM = 3 + OUTPUT_DIM = 3 + torch.manual_seed(self.rank) + np.random.seed(self.rank) + if optimizer_class_str == "Adam": + optimizer_class = torch.optim.Adam + elif optimizer_class_str == "AdamW": + optimizer_class = torch.optim.AdamW + elif optimizer_class_str == "SGD": + optimizer_class = torch.optim.SGD + else: + assert 0, f"Unsupported optimizer class: {optimizer_class_str}" + + with self.context: + # Define a base model with a different buffer for each rank model = torch.nn.Sequential( - torch.nn.Linear(input_width, hidden), - torch.nn.Linear(hidden, target_width), + torch.nn.Linear(INPUT_DIM, HIDDEN_DIM), + torch.nn.Linear(HIDDEN_DIM, HIDDEN_DIM), + torch.nn.Linear(HIDDEN_DIM, OUTPUT_DIM), ).to(self.device) - - # With SGD, Momentum is required to get a state to shard - optimizer = ZeroRedundancyOptimizer( - model.parameters(), - optimizer_class=SGD, - lr=0.1, - momentum=0.99, - process_group=process_group, + model.register_buffer( + "test_buffer", torch.ones((1), device=self.device) * self.rank, + ) + # Define models/optimizers for DDP with ZeRO and DDP with local + # optimizer + defaults = {"maximize": True} if maximize else {} + sharded_optimizer = ZeroRedundancyOptimizer( + params=model.parameters(), optimizer_class=optimizer_class, + lr=LR, **defaults, + ) + sharded_ddp_model = DDP( + module=model, device_ids=[self.rank], + broadcast_buffers=True, find_unused_parameters=True, + ) + local_model = copy.deepcopy(model).to(self.device) + ddp_optimizer = optimizer_class( + local_model.parameters(), lr=LR, **defaults, + ) + ddp_model = DDP( + local_model, device_ids=[self.rank], + broadcast_buffers=True, find_unused_parameters=True, + ) + # Check that the model is properly synchronized between ranks + # at construction time + self._check_same_model_params( + sharded_ddp_model, ddp_model, + "Models differ from the start", ) - check(optimizer) - - @common_distributed.skip_if_no_gpu - def test_local_optimizer_parity(self): - """When combined with DDP, check that ZeroRedundancyOptimizer(optimizer) and the same monolithic optimizer - give the exact same results - """ - self.dist_init(self.rank) - BATCHS = 20 - - with torch.cuda.device(self.rank): - torch.manual_seed(self.rank) - np.random.seed(self.rank) - - def check_optimizer_equivalence(optimizer: Type[torch.optim.Optimizer], maximize: bool = False): - # Any model works. 
Add one different buffer per rank - model = torch.nn.Sequential( - torch.nn.Linear(2, 3), - torch.nn.Linear(3, 3), - torch.nn.Linear(3, 3), - ) - model.register_buffer("test_buffer", torch.ones((1)) * self.rank) - model.to(self.device) + def check_step(): + input_tensor = torch.rand((BATCH_SIZE, INPUT_DIM)) - defaults = dict() + def closure_ddp(input_tensor=input_tensor): + ddp_optimizer.zero_grad() + ddp_loss = ddp_model(input_tensor).abs().sum() + ddp_loss.backward() + return ddp_loss - if maximize: - defaults['maximize'] = True + def closure_sharded(input_tensor=input_tensor): + sharded_optimizer.zero_grad() + sharded_loss = sharded_ddp_model(input_tensor).abs().sum() + sharded_loss.backward() + return sharded_loss - sharded_optimizer = ZeroRedundancyOptimizer( - params=model.parameters(), optimizer_class=optimizer, lr=1e-3, **defaults + loss_ddp = cast( + torch.Tensor, ddp_optimizer.step(closure=closure_ddp), ) - sharded_ddp_model = DDP( - module=model, device_ids=[self.rank], broadcast_buffers=True, find_unused_parameters=True + loss_sharded_optim = cast( + torch.Tensor, + sharded_optimizer.step(closure=closure_sharded), ) - - ddp_model_single = copy.deepcopy(model) - ddp_model_single.to(self.device) - - ddp_optimizer = optimizer(ddp_model_single.parameters(), lr=1e-3, **defaults) - ddp_model = DDP( - ddp_model_single, device_ids=[self.rank], broadcast_buffers=True, find_unused_parameters=True + torch.testing.assert_close( + loss_ddp, loss_sharded_optim, + msg="Losses differ between local optimizer and ZeRO", + ) + self._check_same_model_params( + sharded_ddp_model, ddp_model, + "Models differ after a step", ) - # The model should be synchronized in between the ranks at construction time, check that - check_same_model_params(sharded_ddp_model, ddp_model, "Models differ from the start") - - def check_step(): - input_tensor = torch.rand((64, 2)) - - def closure_ddp(input_tensor=input_tensor): - ddp_optimizer.zero_grad() - ddp_loss = ddp_model(input_tensor).abs().sum() - ddp_loss.backward() - return ddp_loss - - def closure_sharded(input_tensor=input_tensor): - sharded_optimizer.zero_grad() - sharded_loss = sharded_ddp_model(input_tensor).abs().sum() - sharded_loss.backward() - return sharded_loss - - loss_ddp = cast(torch.Tensor, ddp_optimizer.step(closure=closure_ddp)) - loss_sharded_optim = cast(torch.Tensor, sharded_optimizer.step(closure=closure_sharded)) - - assert torch.allclose( - loss_ddp, loss_sharded_optim - ), "Losses differ in between Pytorch optim and ZeroRedundancyOptimizer" - - check_same_model_params(sharded_ddp_model, ddp_model, "Models differ after a step") - - # The models should stay the same in between the ranks - for i in range(BATCHS): - check_step() - - # Change the models trainability, check that parity is maintained - # only check after a couple of constant batchs to go through both regimes - if i > BATCHS // 2: - next(ddp_model.parameters()).requires_grad = bool(i % 2) - next(sharded_ddp_model.parameters()).requires_grad = bool(i % 2) - - # Check that the checkpoints are compatible - reference_rank = 0 - # - get states - ddp_state_dict = ddp_optimizer.state_dict() - sharded_optimizer.consolidate_state_dict(to=reference_rank) - sharded_optim_state_dict = [sharded_optimizer.state_dict() if self.rank == reference_rank else {}] - dist.broadcast_object_list(sharded_optim_state_dict, src=reference_rank, group=dist.group.WORLD) - sharded_optim_state_dict = sharded_optim_state_dict[0] - - # - cross load the states - # run one step and check that the models are still 
the same - ddp_state_dict_ref = copy.deepcopy(ddp_state_dict) # OSS will remove some states - ddp_optimizer.load_state_dict(sharded_optim_state_dict) # mixup on purpose ! - sharded_optimizer.load_state_dict(ddp_state_dict) - check_step() - - # - self load, rewind, check no problem - # run one step and check that the models are still the same - ddp_optimizer.load_state_dict(ddp_state_dict_ref) - sharded_optimizer.load_state_dict(sharded_optim_state_dict) + # Check that parity is maintained + for i in range(BATCHES): check_step() + # For the second half of batches, change the parameter + # trainability to further test parity + if i > BATCHES // 2: + next(ddp_model.parameters()).requires_grad = bool(i % 2) + next(sharded_ddp_model.parameters()).requires_grad = bool(i % 2) + + # Check that the `state_dict` checkpoints are compatible between + # the local optimizer and ZeRO + REFERENCE_RANK = 0 + # - Get states + ddp_state_dict = ddp_optimizer.state_dict() + sharded_optimizer.consolidate_state_dict(to=REFERENCE_RANK) + sharded_optim_state_dict = [ + sharded_optimizer.state_dict() + if self.rank == REFERENCE_RANK else {} + ] + dist.broadcast_object_list( + sharded_optim_state_dict, src=REFERENCE_RANK, + group=dist.group.WORLD, + ) + sharded_optim_state_dict = sharded_optim_state_dict[0] - for opt in [torch.optim.Adam, torch.optim.AdamW, torch.optim.SGD]: - for maximize in (True, False): - check_optimizer_equivalence(opt, maximize=maximize) + # - Cross-load the states + # Run one step and check that the models are still the same + ddp_state_dict_ref = copy.deepcopy(ddp_state_dict) + ddp_optimizer.load_state_dict(sharded_optim_state_dict) + sharded_optimizer.load_state_dict(ddp_state_dict) + check_step() + # - Reload their respective states + # Run one step and check that the models are still the same + ddp_optimizer.load_state_dict(ddp_state_dict_ref) + sharded_optimizer.load_state_dict(sharded_optim_state_dict) + check_step() def _test_zero_join(self, device): - r""" - Check that the ZeRO join hook allows training with uneven inputs when using the given device. - - Arguments: - device (torch.device): device used to store parameters and perform - collective communications. 
- """ + """Check that the ZeRO join hook allows training with uneven inputs + when using the given device.""" NUM_INPUTS = 3 NUM_EPOCHS = 2 + LR = 0.01 torch.manual_seed(0) torch.cuda.manual_seed(0) @@ -808,8 +925,6 @@ def _test_zero_join(self, device): is_gpu = device.type == "cuda" backend = _get_backend_for_tests() if is_gpu else dist.Backend.GLOO self.dist_init(rank, world_size, backend) - if is_gpu: - torch.cuda.set_device(self.device) model = torch.nn.Sequential( torch.nn.Linear(2, 3), @@ -822,14 +937,18 @@ def _test_zero_join(self, device): # local optimizers on uneven inputs should be equivalent to ZeRO on # uneven inputs with gradients being manually set ddp_model = DDP(model, device_ids=[rank]) if is_gpu else DDP(model) - local_optim = torch.optim.Adam(ddp_model.parameters(), lr=0.01) + local_optim = torch.optim.Adam(ddp_model.parameters(), lr=LR) zero_model = copy.deepcopy(model) zero_model.to(device) - zero_optim = ZeroRedundancyOptimizer(zero_model.parameters(), torch.optim.Adam, lr=0.01) + zero_optim = ZeroRedundancyOptimizer( + zero_model.parameters(), torch.optim.Adam, lr=LR, + ) loss_fn = torch.nn.MSELoss() # Use uneven inputs: rank i has i extra inputs - inputs = [torch.randn(20, 2).to(device) for _ in range(NUM_INPUTS + rank)] + inputs = [ + torch.randn(20, 2).to(device) for _ in range(NUM_INPUTS + rank) + ] labels = torch.randn(20, 3).to(device) # Save the gradients and parameters from DDP as the ground truth; do @@ -856,7 +975,10 @@ def _test_zero_join(self, device): # Broadcast the saved gradients and parameters to all of the other # ranks (which joined early) grads_and_params = [grads_at_each_iter, params_at_each_iter] - grads_and_params = _broadcast_object(grads_and_params, src_rank=world_size - 1, group=dist.group.WORLD, device=device) + grads_and_params = _broadcast_object( + grads_and_params, src_rank=world_size - 1, group=dist.group.WORLD, + device=device, + ) grads_at_each_iter = grads_and_params[0] params_at_each_iter = grads_and_params[1] # TODO: Replace this `_broadcast_object` with `broadcast_object_list` @@ -877,8 +999,9 @@ def __init__(self, zero_optim, grads): super().__init__() def main_hook(self): - grads = self.zero._join_grad_info.grads[self.zero._join_grad_info.index] - self.zero._join_grad_info.index += 1 + join_grad_info = self.zero._join_grad_info + grads = self.zero._join_grad_info.grads[join_grad_info.index] + join_grad_info.index += 1 for p, grad in zip(self.zero._all_params, grads): p.grad = grad.detach().clone().to(device) @@ -905,39 +1028,48 @@ def join_process_group(self): grads = grads_at_each_iter[-num_grads_after_joining:] gradient_setter = _GradientSetter() iter = 0 - with Join([gradient_setter, zero_optim], zero_optim=zero_optim, grads=grads): + with Join( + [gradient_setter, zero_optim], zero_optim=zero_optim, grads=grads, + ): for _ in range(NUM_EPOCHS): for input in inputs: # Notify join context that this process has not joined Join.notify_join_context(gradient_setter) - # Set gradients manually - for p, grad in zip(zero_model.parameters(), grads_at_each_iter[iter]): + for p, grad in zip( + zero_model.parameters(), grads_at_each_iter[iter], + ): p.grad = grad.detach().clone().to(device) - # Perform optimizer step and check parity zero_optim.step() - for p, ddp_p in zip(zero_model.parameters(), params_at_each_iter[iter]): - assert torch.allclose(p, ddp_p), \ - "Parameters differ between using ZeRO and local optimizer" + for p, ddp_p in zip( + zero_model.parameters(), params_at_each_iter[iter], + ): + torch.testing.assert_close( + p, 
ddp_p, + msg="Parameters differ between using ZeRO and " + "local optimizer", + ) iter += 1 @common_distributed.requires_nccl() - @common_distributed.skip_if_lt_x_gpu(2) + @common_distributed.skip_if_no_gpu def test_zero_join_gpu(self): - """Check that the ZeRO join hook allows training with uneven inputs on GPU.""" + """Check that the ZeRO join hook allows training with uneven inputs + on GPU.""" self._test_zero_join(self.device) @common_distributed.requires_gloo() def test_zero_join_cpu(self): - """Check that the ZeRO join hook allows training with uneven inputs on CPU.""" + """Check that the ZeRO join hook allows training with uneven inputs + on CPU.""" self._test_zero_join(torch.device("cpu")) def _test_zero_model_parallel(self, parameters_as_bucket_view: bool): # Use two processes each with two GPUs assert self.rank < 2 - NUM_EPOCHS = 3 - NUM_INPUTS = 5 + NUM_EPOCHS = 2 + NUM_INPUTS = 4 LR = 0.01 torch.manual_seed(0) torch.cuda.manual_seed(0) @@ -967,17 +1099,20 @@ def __init__(self): def forward(self, x): return self.net1(self.relu(self.net0(x))) - dev0 = 2 * self.rank - dev1 = 2 * self.rank + 1 + dev0 = torch.device(2 * self.rank) + dev1 = torch.device(2 * self.rank + 1) mp_model = ModelParallelModel(dev0, dev1) ddp_model = DDP(mp_model) - local_model = LocalModel() - cpu_device = torch.device("cpu") + local_model = LocalModel().to(dev0) + # Ensure the parameters are the same across the two models - local_model.net0.weight = torch.nn.Parameter(mp_model.net0.weight.detach().clone().to(cpu_device)) - local_model.net0.bias = torch.nn.Parameter(mp_model.net0.bias.detach().clone().to(cpu_device)) - local_model.net1.weight = torch.nn.Parameter(mp_model.net1.weight.detach().clone().to(cpu_device)) - local_model.net1.bias = torch.nn.Parameter(mp_model.net1.bias.detach().clone().to(cpu_device)) + def copy_param(p): + return torch.nn.Parameter(p.detach().clone().to(dev0)) + + local_model.net0.weight = copy_param(mp_model.net0.weight) + local_model.net0.bias = copy_param(mp_model.net0.bias) + local_model.net1.weight = copy_param(mp_model.net1.weight) + local_model.net1.bias = copy_param(mp_model.net1.bias) # Compare parity between DDP with model parallelism using ZeRO and # a local model using a local optimizer @@ -985,10 +1120,10 @@ def forward(self, x): ddp_model.parameters(), optimizer_class=torch.optim.Adam, parameters_as_bucket_view=parameters_as_bucket_view, - lr=LR + lr=LR, ) local_optim = torch.optim.Adam(local_model.parameters(), lr=LR) - inputs = [torch.randn(20, 10) for _ in range(NUM_INPUTS)] + inputs = [torch.randn(20, 10).to(dev0) for _ in range(NUM_INPUTS)] for _ in range(NUM_EPOCHS): for input in inputs: @@ -1004,40 +1139,42 @@ def closure_ddp(): ddp_loss.backward() return ddp_loss - local_loss = cast(torch.Tensor, local_optim.step(closure=closure_local)) - ddp_loss = cast(torch.Tensor, zero_optim.step(closure=closure_ddp)).to(cpu_device) - - # Increased tolerances are needed to pass test when using TensorFloat32 - # see https://github.com/pytorch/pytorch/issues/67764 - assert torch.allclose( - local_loss, ddp_loss, rtol=1e-03 - ), "Losses differ between local optim and ZeroRedundancyOptimizer" + local_loss = cast( + torch.Tensor, local_optim.step(closure=closure_local) + ) + ddp_loss = cast( + torch.Tensor, zero_optim.step(closure=closure_ddp) + ) - for local_p, ddp_p in zip(local_model.parameters(), ddp_model.parameters()): - ddp_p = ddp_p.to(cpu_device) - assert torch.allclose(local_p, ddp_p, rtol=1e-03, atol=1e-04), "Models differ after a step" + # Increased tolerances are 
needed to pass when using TF32 + # See: https://github.com/pytorch/pytorch/issues/67764 + torch.testing.assert_close( + local_loss.cpu(), ddp_loss.cpu(), rtol=1e-03, atol=1e-08, + ), "Losses differ between local optimizer and ZeRO" - @common_distributed.skip_if_lt_x_gpu(4) - def test_zero_model_parallel_with_bucket_view(self): - """ - Check that ZeRO works with model parallelism where layers are sharded - across devices when ``parameters_as_bucket_view=True``. - """ - if self.rank >= 2: - return - self.dist_init(self.rank, world_size=2) - self._test_zero_model_parallel(parameters_as_bucket_view=True) + for local_p, ddp_p in zip( + local_model.parameters(), + ddp_model.parameters() + ): + torch.testing.assert_close( + local_p.cpu(), ddp_p.cpu(), rtol=1e-03, atol=1e-04, + ), "Models differ after a step" @common_distributed.skip_if_lt_x_gpu(4) - def test_zero_model_parallel_without_bucket_view(self): - """ - Check that ZeRO works with model parallelism where layers are sharded - across devices when ``parameters_as_bucket_view=False``. - """ + @parametrize( + "parameters_as_bucket_view", + [False, True], + ) + def test_zero_model_parallel( + self, + parameters_as_bucket_view: bool, + ): + """Check that ZeRO works with model parallelism where the model's + layers are assigned to different devices.""" if self.rank >= 2: return self.dist_init(self.rank, world_size=2) - self._test_zero_model_parallel(parameters_as_bucket_view=False) + self._test_zero_model_parallel(parameters_as_bucket_view) def _test_ddp_zero_overlap( self, @@ -1058,22 +1195,21 @@ def _test_ddp_zero_overlap( is_gpu = device.type == "cuda" if is_gpu: torch.cuda.set_device(device) - models_to_test = [ - ( - torch.nn.Sequential( - torch.nn.Linear(1000, 2000), - torch.nn.Linear(2000, 500) - ), - [torch.randn(1, 1000).to(device) for _ in range(NUM_INPUTS)] + models_to_test = [( + torch.nn.Sequential( + torch.nn.Linear(1000, 2000), + torch.nn.Linear(2000, 500), ), - ] + [torch.randn(1, 1000).to(device) for _ in range(NUM_INPUTS)], + )] if HAS_TORCHVISION: - models_to_test.append( - ( - torchvision.models.resnet50(), - [torch.randn(1, 3, 3, 1000).to(device) for _ in range(NUM_INPUTS)] - ) - ) + models_to_test.append(( + torchvision.models.resnet50(), + [ + torch.randn(1, 3, 3, 1000).to(device) + for _ in range(NUM_INPUTS) + ] + )) for (model, inputs) in models_to_test: # Enable determinism in cudnn operators with torch.backends.cudnn.flags( @@ -1098,7 +1234,10 @@ def _test_ddp_zero_overlap( ) ddp_model_overlap.register_comm_hook( None, - hook_constructor(allreduce_hook, ddp_model_overlap, zero_optim, **kwargs) + hook_constructor( + allreduce_hook, ddp_model_overlap, zero_optim, + **kwargs, + ) ) # Set up the DDP model with local optimizer @@ -1163,120 +1302,68 @@ def _test_ddp_zero_overlap( self.assertEqual(p1, p2) # Check that the parameters were updated - self.assertNotEqual(init_params_overlap, list(ddp_model_overlap.parameters())) + self.assertNotEqual( + init_params_overlap, list(ddp_model_overlap.parameters()), + ) # Ensure that this test runs independently dist.barrier() + # NOTE: The test is skipped if using Windows since functional optimizers + # are not currently supported. @common_distributed.skip_if_win32() @common_distributed.requires_nccl() @common_distributed.skip_if_no_gpu @common_distributed.skip_if_rocm - def test_ddp_with_zero_step_parity_gpu(self): - r""" - Check that overlapping DDP with ZeRO using ``hook_with_zero_step()`` - achieves parity with DDP using a local optimizer when running on GPU. 
- - NOTE: The test is skipped if using Windows since functional optimizers - are not currently supported. + @parametrize( + "use_gpu", + [True], + # Add `False` once the Gloo sync issue causing hangs is fixed + # See: https://github.com/pytorch/pytorch/issues/62300 + ) + @parametrize( + "use_interleaved_hook", + [False, True], + ) + @parametrize( + "gradient_as_bucket_view", + [False, True], + ) + @parametrize( + "static_graph", + [False, True], + ) + @parametrize( + "shard_buckets", + [False, True], + ) + def test_ddp_zero_overlap( + self, + use_gpu: bool, + use_interleaved_hook: bool, + gradient_as_bucket_view: bool, + static_graph: bool, + shard_buckets: bool, + ): """ - self.dist_init(self.rank, self.world_size, dist.Backend.NCCL) - for gradient_as_bucket_view, static_graph in itertools.product( - [True, False], - [True, False] - ): - self._test_ddp_zero_overlap( - torch.device(self.rank), - hook_with_zero_step, - gradient_as_bucket_view, - static_graph - ) - # TODO: Add `test_ddp_with_zero_step_parity_cpu()` once the Gloo - # synchronization issue causing hangs is fixed. - - @common_distributed.skip_if_win32() - @common_distributed.requires_nccl() - @common_distributed.skip_if_no_gpu - @common_distributed.skip_if_rocm - def test_ddp_with_zero_step_interleaved_parity_gpu(self): - r""" - Check that overlapping DDP with ZeRO using - ``hook_with_zero_step_interleaved()`` achieves parity with DDP using a - local optimizer when running on GPU. - - NOTE: The test is skipped if using Windows since functional optimizers - are not currently supported. + Check that overlapping DDP with ZeRO using the given method determined + by ``hook_constructor`` and ``shard_buckets`` and using the given ZeRO + and DDP arguments achieves parity with DDP using a local optimizer. """ - self.dist_init(self.rank, self.world_size, dist.Backend.NCCL) - for gradient_as_bucket_view, static_graph in itertools.product( - [True, False], - [True, False] - ): - self._test_ddp_zero_overlap( - torch.device(self.rank), - hook_with_zero_step_interleaved, - gradient_as_bucket_view, - static_graph - ) - # TODO: Add `test_ddp_with_zero_step_interleaved_parity_cpu()` once the - # Gloo synchronization issue causing hangs is fixed. + device = torch.device(self.rank) if use_gpu else torch.device("cpu") + backend = _get_backend_for_tests() + self.dist_init(self.rank, self.world_size, backend) + hook_constructor = hook_with_zero_step if not use_interleaved_hook \ + else hook_with_zero_step_interleaved + self._test_ddp_zero_overlap( + device, hook_constructor, gradient_as_bucket_view, static_graph, + shard_buckets=shard_buckets, + ) - @common_distributed.skip_if_win32() - @common_distributed.requires_nccl() - @common_distributed.skip_if_no_gpu - @common_distributed.skip_if_rocm - def test_ddp_with_zero_step_uniform_parity_gpu(self): - r""" - Check that overlapping DDP with ZeRO using - ``hook_with_zero_step()`` with ``shard_buckets=True`` - achieves parity with DDP using a local optimizer when running on GPU. - - NOTE: The test is skipped if using Windows since functional optimizers - are not currently supported. 
- """ - self.dist_init(self.rank, self.world_size, dist.Backend.NCCL) - for gradient_as_bucket_view, static_graph in itertools.product( - [True, False], - [True, False] - ): - self._test_ddp_zero_overlap( - torch.device(self.rank), - hook_with_zero_step, - gradient_as_bucket_view, - static_graph, - shard_buckets=True, - ) - # TODO: Add `test_ddp_with_zero_step_uniform_parity_cpu()` once the Gloo - # synchronization issue causing hangs is fixed. - @common_distributed.skip_if_win32() - @common_distributed.requires_nccl() - @common_distributed.skip_if_no_gpu - @common_distributed.skip_if_rocm - def test_ddp_with_zero_step_interleaved_uniform_parity_gpu(self): - r""" - Check that overlapping DDP with ZeRO using - ``hook_with_zero_step()`` with ``shard_buckets=True`` - achieves parity with DDP using a local optimizer when running on GPU. - - NOTE: The test is skipped if using Windows since functional optimizers - are not currently supported. - """ - self.dist_init(self.rank, self.world_size, dist.Backend.NCCL) - for gradient_as_bucket_view, static_graph in itertools.product( - [True, False], - [True, False] - ): - self._test_ddp_zero_overlap( - torch.device(self.rank), - hook_with_zero_step_interleaved, - gradient_as_bucket_view, - static_graph, - shard_buckets=True, - ) - # TODO: Add `test_ddp_with_zero_step_interleaved_uniform_parity_cpu()` once - # the Gloo synchronization issue causing hangs is fixed. +instantiate_parametrized_tests(TestZeroRedundancyOptimizerSingleRank) +instantiate_parametrized_tests(TestZeroRedundancyOptimizerDistributed) if __name__ == "__main__": # ! unittest should not be used here, else the tests are not properly registered - common_utils.run_tests() + run_tests() diff --git a/test/distributed/test_c10d_common.py b/test/distributed/test_c10d_common.py index 3bdb0fc15e0411..5c29f1fd448d84 100644 --- a/test/distributed/test_c10d_common.py +++ b/test/distributed/test_c10d_common.py @@ -9,6 +9,7 @@ from datetime import timedelta from itertools import product from sys import platform +from contextlib import suppress import torch import torch.distributed as dist @@ -18,6 +19,7 @@ sys.exit(0) import torch.distributed.distributed_c10d as c10d +from torch.utils.checkpoint import checkpoint import torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook as powerSGD import torch.nn.functional as F import torch.testing._internal.common_utils as common @@ -25,12 +27,16 @@ from torch.nn.parallel import DistributedDataParallel from torch.testing._internal.common_distributed import ( MultiProcessTestCase, + skip_if_lt_x_gpu, ) + from torch.testing._internal.common_utils import ( TestCase, load_tests, run_tests, TEST_WITH_DEV_DBG_ASAN, + instantiate_parametrized_tests, + parametrize ) if TEST_WITH_DEV_DBG_ASAN: @@ -238,7 +244,7 @@ def forward(self, x): return F.softmax(self.embedding(x), dim=1) -class AbstractDistributedDataParallelTest(object): +class CommonDistributedDataParallelTest(object): def tearDown(self): # DistributedDataParallel test doesn't seem to call FileStore destructor # TODO: investigate this test and the test is known to have issues @@ -307,6 +313,363 @@ def _prepare_multi_device_module( return model, ddp_model, input, target + def _get_store(self): + return dist.FileStore(self.file_name, self.world_size) + + def _get_process_group(self): + raise NotImplementedError("To be implemented by child class") + + def _train_model(self, model, input_var, target, loss, run_checkpoint=False, use_reentrant=True): + model.train() + if run_checkpoint: + output = 
checkpoint(model, input_var, use_reentrant=use_reentrant) + else: + output = model(input_var) + l = loss(output, target) + l.backward() + + def _test_ddp_checkpointing( + self, + input_model, + process_group, + use_bucket_view, + find_unused_parameters=False, + static_graph=False, + run_checkpoint=False, + use_reentrant=True, + allow_none_grads=False, + ): + # to reproduce the same training results + torch.cuda.set_device(self.rank) + torch.manual_seed(31415) + model = copy.deepcopy(input_model).cuda() + ddp_model = copy.deepcopy(input_model).cuda() + ddp_model = nn.parallel.DistributedDataParallel( + ddp_model, + bucket_cap_mb=1, + gradient_as_bucket_view=use_bucket_view, + device_ids=[self.rank], + process_group=process_group, + find_unused_parameters=find_unused_parameters, + static_graph=static_graph, + ) + self.assertEqual( + ddp_model._get_ddp_logging_data().get("static_graph", 0), static_graph + ) + input, ddp_input, target, ddp_target = self._prepare_dummy_data() + loss = nn.MSELoss() + n_iters = 5 + for i in range(n_iters): + model.zero_grad(set_to_none=False) + ddp_model.zero_grad(set_to_none=False) + self._train_model(model, input, target, loss, run_checkpoint=run_checkpoint, use_reentrant=use_reentrant) + self._train_model( + ddp_model, ddp_input, ddp_target, loss, run_checkpoint=run_checkpoint, use_reentrant=use_reentrant + ) + for i, j in zip(model.parameters(), ddp_model.parameters()): + if not allow_none_grads: + self.assertTrue(i.grad is not None) + self.assertTrue(j.grad is not None) + self.assertEqual(i.grad, j.grad, rtol=1.3e-06, atol=5e-5) + + # A list of tests for ddp with activation checkpointing + # when gradient_as_bucket_view=True, False. + # Most of the tests are referred to + # https://github.com/facebookresearch/fairscale/blob/main/tests/nn/pipe/test_checkpoint_ddp.py + class CheckpointOnceModule(nn.Module): + """ + Runs checkpoint for a single layer in the model. + """ + def __init__(self, use_reentrant=True): + super().__init__() + self.l1 = nn.Linear(20, 20) + self.l2 = nn.Linear(20, 20) + self.use_reentrant = use_reentrant + + def forward(self, inp): + x = self.l1(inp) + x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) + return x + + class CheckpointTwiceModule(CheckpointOnceModule): + """ + Runs checkpoint for the same layer twice in a model. This simulates use + cases such as pipeline parallel where the same layer can be checkpointed + more than one time. + """ + def __init__(self, use_reentrant=True): + super().__init__(use_reentrant=use_reentrant) + + def forward(self, inp): + x = self.l1(inp) + x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) + x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) + return x + + class CheckpointTwiceModuleWeightSharing(CheckpointTwiceModule): + """ + Similar to CheckpointTwiceModule but the weights are shared. 
+ """ + def __init__(self, use_reentrant=True): + super().__init__(use_reentrant=use_reentrant) + # Share weights + self.l1.weight = self.l2.weight + + def forward(self, inp): + x = self.l1(inp) + x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) + x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) + return x + + + class DynamicCheckpointTwiceModule(CheckpointTwiceModule): + def __init__(self, use_reentrant=True): + super().__init__(use_reentrant=use_reentrant) + self.count = 0 + + def forward(self, inp): + if self.count % 2: + x = checkpoint(self.l1, inp, use_reentrant=self.use_reentrant) + else: + x = checkpoint(self.l2, inp, use_reentrant=self.use_reentrant) + + self.count += 1 + return x + + class DynamicCheckpointTwiceModuleWeightSharing(DynamicCheckpointTwiceModule): + def __init__(self, use_reentrant=True): + super().__init__(use_reentrant=use_reentrant) + # Share weights + self.l1.weight = self.l2.weight + + + def _prepare_dummy_data(self): + ddp_bs = 16 + bs = ddp_bs * self.world_size + input = torch.rand((bs, 20), device="cuda", requires_grad=True) + target = torch.randn((bs, 20), device="cuda") + offset = self.rank * ddp_bs + ddp_input = input[offset : offset + ddp_bs] + ddp_target = target[offset : offset + ddp_bs] + return input, ddp_input, target, ddp_target + + + @skip_if_lt_x_gpu(2) + @parametrize("use_reentrant", [True, False]) + def test_ddp_checkpointing_once(self, use_reentrant): + """ + DDP works as expected when layer is checkpointed only once. + """ + process_group = self._get_process_group() + for use_bucket_view, static_graph in product((False, True), (False, True)): + self._test_ddp_checkpointing( + self.CheckpointOnceModule(use_reentrant=use_reentrant), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=static_graph, + ) + if static_graph: + # find_unused_parameters does not make a difference, since it is + # ignored for static graph. + self._test_ddp_checkpointing( + self.CheckpointOnceModule(), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=static_graph, + find_unused_parameters=True, + ) + + @skip_if_lt_x_gpu(2) + @parametrize("use_reentrant", [True, False]) + def test_ddp_checkpointing_unused_params(self, use_reentrant): + """ + With reentrant autograd checkpointing impl, DDP will fail when there are + unused params in the model and no static graph training. With + non-reentrant checkpointing implementation, this works as expected. + """ + process_group = self._get_process_group() + for use_bucket_view in (True, False): + err_ctx = ( + suppress() if not use_reentrant else + self.assertRaisesRegex( + RuntimeError, + "Expected to mark a variable ready only once." + ) + ) + with err_ctx: + model = self._test_ddp_checkpointing( + self.CheckpointOnceModule(use_reentrant=use_reentrant), + process_group=process_group, + use_bucket_view=use_bucket_view, + find_unused_parameters=True, + ) + # test passes when static_graph is true + model = self._test_ddp_checkpointing( + self.CheckpointOnceModule(use_reentrant=use_reentrant), + process_group=process_group, + use_bucket_view=use_bucket_view, + find_unused_parameters=True, + static_graph=True, + ) + + @skip_if_lt_x_gpu(2) + @parametrize("use_reentrant", [True, False]) + def test_ddp_checkpointing_twice(self, use_reentrant): + """ + Checkpoitning twice fails for non-static graph with reentrant checkpoint + implementation, succeeds with non-reentrant checkpoint implementation. 
+ """ + process_group = self._get_process_group() + for use_bucket_view in (True, False): + err_ctx = ( + suppress() if not use_reentrant else + self.assertRaisesRegex( + RuntimeError, + "Expected to mark a variable ready only once." + ) + ) + with err_ctx: + model = self._test_ddp_checkpointing( + self.CheckpointTwiceModule(use_reentrant=use_reentrant), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=False, + ) + + with err_ctx: + model = self._test_ddp_checkpointing( + self.CheckpointTwiceModule(use_reentrant=use_reentrant), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=False, + find_unused_parameters=True, + ) + + @skip_if_lt_x_gpu(2) + @parametrize("use_reentrant", [True, False]) + def test_ddp_checkpointing_twice_static_graph(self, use_reentrant): + """ + Regardless of reentrant or non-reentrant checkpointing impl, + checkpointing twice works with static graph enabled. + """ + process_group = self._get_process_group() + for use_bucket_view in (True, False): + # Test passes when static_graph=True. + model = self._test_ddp_checkpointing( + self.CheckpointTwiceModule(use_reentrant=use_reentrant), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=True, + ) + + @skip_if_lt_x_gpu(2) + def test_ddp_checkpointing_dynamic_module(self): + """ + Dynamic module can be checkpointed, multiple times, with non-reentrant + checkpointing implementation. + """ + process_group = self._get_process_group() + for use_bucket_view in (True, False): + model = self._test_ddp_checkpointing( + self.DynamicCheckpointTwiceModule(use_reentrant=False), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=False, + find_unused_parameters=True, + # Grads can be none sometimes due to dynamic module not using + # all params. + allow_none_grads=True + ) + + @skip_if_lt_x_gpu(2) + def test_ddp_checkpointing_dynamic_weight_sharing(self): + """ + Dynamic module can be checkpointed multiple times with weight sharing + using non-reentrant checkpointing implementation. + """ + process_group = self._get_process_group() + for use_bucket_view in (True, False): + model = self._test_ddp_checkpointing( + self.DynamicCheckpointTwiceModuleWeightSharing(use_reentrant=False), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=False, + find_unused_parameters=True, + # Grads can be none sometimes due to dynamic module not using + # all params. + allow_none_grads=True + ) + + # DDP works as expected if there is weight sharing among layers + @skip_if_lt_x_gpu(2) + @parametrize("use_reentrant", [True, False]) + def test_ddp_checkpointing_weight_sharing(self, use_reentrant): + """ + Test that checkpointing with weight sharing works. + """ + process_group = self._get_process_group() + torch.cuda.set_device(self.rank) + for use_bucket_view, static_graph in product((False, True), (False, True)): + torch.manual_seed(31415) + l1 = nn.Linear(20, 20) + l2 = nn.Linear(20, 20) + l1.weight = l2.weight + model = nn.Sequential(l1, l2) + # TODO: non-reentrant based checkpointing of DDP module with + # static_graph runs into the below issue, see + # https://github.com/pytorch/pytorch/issues/70865 and + # https://github.com/pytorch/pytorch/issues/58111 for details. 
+ err_ctx = ( + self.assertRaisesRegex( + RuntimeError, + "Your training graph has changed in this iteration" + ) if static_graph and not use_reentrant else suppress() + ) + with err_ctx: + self._test_ddp_checkpointing( + model, + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=static_graph, + run_checkpoint=True, + use_reentrant=use_reentrant, + ) + + @skip_if_lt_x_gpu(2) + def test_ddp_checkpointing_twice_weight_sharing(self): + """ + Checkpointing should work with static graph in the case of checkpointing + same layer twice and having weights shared acrosss layers. + """ + process_group = self._get_process_group() + torch.cuda.set_device(self.rank) + for use_bucket_view in (True, False): + model = self._test_ddp_checkpointing( + self.CheckpointTwiceModuleWeightSharing(), + process_group=process_group, + use_bucket_view=use_bucket_view, + static_graph=True, + ) + + def test_invalid_powerSGD_state(self): + for start_powerSGD_iter, use_error_feedback, warm_start in product( + [0, 1], [True, False], [True, False] + ): + if not use_error_feedback and not warm_start: + continue + with self.assertRaisesRegex( + ValueError, + "Expect `start_powerSGD_iter` > 1 if `use_error_feedback` or `warm_start` is enabled, " + "because PowerSGD can only be applied after the first two iterations in DDP.", + ): + state = powerSGD.PowerSGDState( + process_group=None, + matrix_approximation_rank=1, + start_powerSGD_iter=start_powerSGD_iter, + use_error_feedback=use_error_feedback, + warm_start=warm_start, + ) + def _test_ddp_with_process_group( self, process_group, @@ -443,33 +806,101 @@ def fut_then(fut): return fut.then(fut_then) + def _test_not_nan(self, model, x): + y = model(x) + self.assertFalse(y.isnan().any().item()) + y.sum().backward() + for p in model.parameters(): + self.assertFalse(p.grad.isnan().any().item()) + + @skip_if_lt_x_gpu(2) + def test_sync_batch_norm_only_empty_input(self): + pg = self._get_process_group() + + model = torch.nn.Sequential( + nn.BatchNorm2d(2), + ).to(device=self.rank) + model = DistributedDataParallel( + model, + device_ids=[self.rank], + process_group=pg, + ) + model = nn.SyncBatchNorm.convert_sync_batchnorm( + model, + process_group=pg, + ) -class DistributedDataParallelTest( - AbstractDistributedDataParallelTest, MultiProcessTestCase -): - def setUp(self): - super(DistributedDataParallelTest, self).setUp() - self._spawn_processes() + model.train() - def test_invalid_powerSGD_state(self): - for start_powerSGD_iter, use_error_feedback, warm_start in product( - [0, 1], [True, False], [True, False] - ): - if not use_error_feedback and not warm_start: - continue - with self.assertRaisesRegex( - ValueError, - "Expect `start_powerSGD_iter` > 1 if `use_error_feedback` or `warm_start` is enabled, " - "because PowerSGD can only be applied after the first two iterations in DDP.", - ): - state = powerSGD.PowerSGDState( - process_group=None, - matrix_approximation_rank=1, - start_powerSGD_iter=start_powerSGD_iter, - use_error_feedback=use_error_feedback, - warm_start=warm_start, - ) + # only rank 0 receives empty inputs + x = torch.zeros( + (1 if self.rank != 0 else 0, 2, 11, 13), + dtype=torch.float32, + device=self.rank + ) + + # input requires grad, this will trigger the collective communication + # in the backward pass + x.requires_grad = True + self._test_not_nan(model, x) + + # input does not requires grad + x.requires_grad = False + self._test_not_nan(model, x) + + # all ranks receive empty inputs + x = torch.zeros( + (0, 2, 11, 13), + 
dtype=torch.float32, + device=self.rank + ) + + # input requires grad, this will trigger the collective communication + # in the backward pass + x.requires_grad = True + self._test_not_nan(model, x) + + # input does not requires grad + x.requires_grad = False + self._test_not_nan(model, x) + + @skip_if_lt_x_gpu(2) + def test_sync_batch_norm_empty_input(self): + pg = self._get_process_group() + + model = torch.nn.Sequential( + nn.Conv2d(2, 2, 3), + nn.BatchNorm2d(2), + nn.Linear(28, 2), + ).to(device=self.rank) + model = DistributedDataParallel( + model, + device_ids=[self.rank], + process_group=pg, + ) + model = nn.SyncBatchNorm.convert_sync_batchnorm( + model, + process_group=pg, + ) + + model.train() + # only rank 0 receives empty inputs + x = torch.zeros( + (3 if self.rank != 0 else 0, 2, 30, 30), + dtype=torch.float32, + device=self.rank + ) + self._test_not_nan(model, x) + + # all ranks receive empty inputs + x = torch.zeros( + (0, 2, 30, 30), + dtype=torch.float32, + device=self.rank + ) + + self._test_not_nan(model, x) class ComputeBucketAssignmentTest(TestCase): def test_single_limit_single_dtype(self): @@ -892,6 +1323,8 @@ def test_send_recv(self): # user applications would explicitly that. +instantiate_parametrized_tests(CommonDistributedDataParallelTest) + if __name__ == "__main__": assert ( diff --git a/test/distributed/test_c10d_gloo.py b/test/distributed/test_c10d_gloo.py index 9cd515fb05cbf7..22b5d7a98f6cf7 100644 --- a/test/distributed/test_c10d_gloo.py +++ b/test/distributed/test_c10d_gloo.py @@ -1136,8 +1136,14 @@ def _test_allgather_stress(self, inputs, fn): [[torch.tensor([i + j]) for j in range(self.world_size)]] for i in range(len(inputs)) ] + input_holder = {} for i in range(len(inputs)): - fut = pg.allgather(outputs[i], [fn(inputs[i])]).get_future() + # Note that this works around the data race discussed in + # https://github.com/pytorch/pytorch/issues/75529, but we should + # actually be able to pass the list directly into allgather when + # that race is fixed. 
+ input_holder[i] = [fn(inputs[i])] + fut = pg.allgather(outputs[i], input_holder[i]).get_future() future_handles.append(fut) for i, future_handle in enumerate(future_handles): @@ -1457,12 +1463,16 @@ def create(num, prefix): class DistributedDataParallelTest( - test_c10d_common.AbstractDistributedDataParallelTest, MultiProcessTestCase + test_c10d_common.CommonDistributedDataParallelTest, MultiProcessTestCase ): def setUp(self): super(DistributedDataParallelTest, self).setUp() self._spawn_processes() + def _get_process_group(self): + store = self._get_store() + return c10d.ProcessGroupGloo(store, self.rank, self.world_size) + def _test_gloo_backend( self, devices, device_ids, multi_device=False, gradient_as_bucket_view=False ): diff --git a/test/distributed/test_c10d_nccl.py b/test/distributed/test_c10d_nccl.py index afe3a7cc19a374..e9eca078960aaa 100644 --- a/test/distributed/test_c10d_nccl.py +++ b/test/distributed/test_c10d_nccl.py @@ -9,7 +9,7 @@ import tempfile import threading import time -from contextlib import contextmanager, suppress +from contextlib import contextmanager from datetime import timedelta from itertools import product from unittest import mock @@ -49,11 +49,8 @@ TEST_WITH_DEV_DBG_ASAN, TEST_WITH_ROCM, sandcastle_skip, - instantiate_parametrized_tests, - parametrize, sandcastle_skip_if, ) -from torch.utils.checkpoint import checkpoint if TEST_WITH_DEV_DBG_ASAN: print( @@ -949,7 +946,7 @@ def allreduce(tensors): class DistributedDataParallelTest( - test_c10d_common.AbstractDistributedDataParallelTest, MultiProcessTestCase + test_c10d_common.CommonDistributedDataParallelTest, MultiProcessTestCase ): def setUp(self): super(DistributedDataParallelTest, self).setUp() @@ -958,6 +955,10 @@ def setUp(self): os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1" self._spawn_processes() + def _get_process_group(self): + store = self._get_store() + return c10d.ProcessGroupNCCL(store, self.rank, self.world_size) + def _test_nccl_backend( self, devices, device_ids, multi_device=False, gradient_as_bucket_view=False ): @@ -2216,349 +2217,6 @@ def test_ddp_weight_sharing(self): ), ) - # A list of tests for ddp with activation checkpointing - # when gradient_as_bucket_view=True, False. - # Most of the tests are referred to - # https://github.com/facebookresearch/fairscale/blob/main/tests/nn/pipe/test_checkpoint_ddp.py - class CheckpointOnceModule(nn.Module): - """ - Runs checkpoint for a single layer in the model. - """ - def __init__(self, use_reentrant=True): - super().__init__() - self.l1 = nn.Linear(20, 20) - self.l2 = nn.Linear(20, 20) - self.use_reentrant = use_reentrant - - def forward(self, inp): - x = self.l1(inp) - x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) - return x - - class CheckpointTwiceModule(CheckpointOnceModule): - """ - Runs checkpoint for the same layer twice in a model. This simulates use - cases such as pipeline parallel where the same layer can be checkpointed - more than one time. - """ - def __init__(self, use_reentrant=True): - super().__init__(use_reentrant=use_reentrant) - - def forward(self, inp): - x = self.l1(inp) - x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) - x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) - return x - - class CheckpointTwiceModuleWeightSharing(CheckpointTwiceModule): - """ - Similar to CheckpointTwiceModule but the weights are shared. 
- """ - def __init__(self, use_reentrant=True): - super().__init__(use_reentrant=use_reentrant) - - def forward(self, inp): - x = self.l1(inp) - x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) - x = checkpoint(self.l2, x, use_reentrant=self.use_reentrant) - return x - - - class DynamicCheckpointTwiceModule(CheckpointTwiceModule): - def __init__(self, use_reentrant=True): - super().__init__(use_reentrant=use_reentrant) - self.count = 0 - - def forward(self, inp): - if self.count % 2: - x = checkpoint(self.l1, inp, use_reentrant=self.use_reentrant) - else: - x = checkpoint(self.l2, inp, use_reentrant=self.use_reentrant) - - self.count += 1 - return x - - class DynamicCheckpointTwiceModuleWeightSharing(DynamicCheckpointTwiceModule): - def __init__(self, use_reentrant=True): - super().__init__(use_reentrant=use_reentrant) - self.l1.weight = self.l2.weight - - - def _prepare_dummy_data(self): - ddp_bs = 16 - bs = ddp_bs * self.world_size - input = torch.rand((bs, 20), device="cuda", requires_grad=True) - target = torch.randn((bs, 20), device="cuda") - offset = self.rank * ddp_bs - ddp_input = input[offset : offset + ddp_bs] - ddp_target = target[offset : offset + ddp_bs] - return input, ddp_input, target, ddp_target - - def _train_model(self, model, input_var, target, loss, run_checkpoint=False, use_reentrant=True): - model.train() - if run_checkpoint: - output = checkpoint(model, input_var, use_reentrant=use_reentrant) - else: - output = model(input_var) - l = loss(output, target) - l.backward() - - def _test_ddp_checkpointing( - self, - input_model, - process_group, - use_bucket_view, - find_unused_parameters=False, - static_graph=False, - run_checkpoint=False, - use_reentrant=True, - allow_none_grads=False, - ): - # to reproduce the same training results - torch.cuda.set_device(self.rank) - torch.manual_seed(31415) - model = copy.deepcopy(input_model).cuda() - ddp_model = copy.deepcopy(input_model).cuda() - ddp_model = nn.parallel.DistributedDataParallel( - ddp_model, - bucket_cap_mb=1, - gradient_as_bucket_view=use_bucket_view, - device_ids=[self.rank], - process_group=process_group, - find_unused_parameters=find_unused_parameters, - static_graph=static_graph, - ) - self.assertEqual( - ddp_model._get_ddp_logging_data().get("static_graph", 0), static_graph - ) - input, ddp_input, target, ddp_target = self._prepare_dummy_data() - loss = nn.MSELoss() - n_iters = 5 - for i in range(n_iters): - model.zero_grad(set_to_none=False) - ddp_model.zero_grad(set_to_none=False) - self._train_model(model, input, target, loss, run_checkpoint=run_checkpoint, use_reentrant=use_reentrant) - self._train_model( - ddp_model, ddp_input, ddp_target, loss, run_checkpoint=run_checkpoint, use_reentrant=use_reentrant - ) - for i, j in zip(model.parameters(), ddp_model.parameters()): - if not allow_none_grads: - self.assertTrue(i.grad is not None) - self.assertTrue(j.grad is not None) - self.assertEqual(i.grad, j.grad, rtol=1.3e-06, atol=5e-5) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - @parametrize("use_reentrant", [True, False]) - def test_ddp_checkpointing_once(self, use_reentrant): - """ - DDP works as expected when layer is checkpointed only once. 
- """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - for use_bucket_view, static_graph in product((False, True), (False, True)): - self._test_ddp_checkpointing( - self.CheckpointOnceModule(use_reentrant=use_reentrant), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=static_graph, - ) - if static_graph: - # find_unused_parameters does not make a difference, since it is - # ignored for static graph. - self._test_ddp_checkpointing( - self.CheckpointOnceModule(), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=static_graph, - find_unused_parameters=True, - ) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - @parametrize("use_reentrant", [True, False]) - def test_ddp_checkpointing_unused_params(self, use_reentrant): - """ - With reentrant autograd checkpointing impl, DDP will fail when there are - unused params in the model and no static graph training. With - non-reentrant checkpointing implementation, this works as expected. - """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - for use_bucket_view in (True, False): - err_ctx = ( - suppress() if not use_reentrant else - self.assertRaisesRegex( - RuntimeError, - "Expected to mark a variable ready only once." - ) - ) - with err_ctx: - model = self._test_ddp_checkpointing( - self.CheckpointOnceModule(use_reentrant=use_reentrant), - process_group=process_group, - use_bucket_view=use_bucket_view, - find_unused_parameters=True, - ) - # test passes when static_graph is true - model = self._test_ddp_checkpointing( - self.CheckpointOnceModule(use_reentrant=use_reentrant), - process_group=process_group, - use_bucket_view=use_bucket_view, - find_unused_parameters=True, - static_graph=True, - ) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - @parametrize("use_reentrant", [True, False]) - def test_ddp_checkpointing_twice(self, use_reentrant): - """ - Checkpoitning twice fails for non-static graph with reentrant checkpoint - implementation, succeeds with non-reentrant checkpoint implementation. - """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - for use_bucket_view in (True, False): - err_ctx = ( - suppress() if not use_reentrant else - self.assertRaisesRegex( - RuntimeError, - "Expected to mark a variable ready only once." - ) - ) - with err_ctx: - model = self._test_ddp_checkpointing( - self.CheckpointTwiceModule(use_reentrant=use_reentrant), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=False, - ) - - with err_ctx: - model = self._test_ddp_checkpointing( - self.CheckpointTwiceModule(use_reentrant=use_reentrant), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=False, - find_unused_parameters=True, - ) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - @parametrize("use_reentrant", [True, False]) - def test_ddp_checkpointing_twice_static_graph(self, use_reentrant): - """ - Regardless of reentrant or non-reentrant checkpointing impl, - checkpointing twice works with static graph enabled. - """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - for use_bucket_view in (True, False): - # Test passes when static_graph=True. 
- model = self._test_ddp_checkpointing( - self.CheckpointTwiceModule(use_reentrant=use_reentrant), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=True, - ) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - def test_ddp_checkpointing_dynamic_module(self): - """ - Dynamic module can be checkpointed, multiple times, with non-reentrant - checkpointing implementation. - """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - for use_bucket_view in (True, False): - model = self._test_ddp_checkpointing( - self.DynamicCheckpointTwiceModule(use_reentrant=False), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=False, - find_unused_parameters=True, - # Grads can be none sometimes due to dynamic module not using - # all params. - allow_none_grads=True - ) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - def test_ddp_checkpointing_dynamic_weight_sharing(self): - """ - Dynamic module can be checkpointed multiple times with weight sharing - using non-reentrant checkpointing implementation. - """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - for use_bucket_view in (True, False): - model = self._test_ddp_checkpointing( - self.DynamicCheckpointTwiceModuleWeightSharing(use_reentrant=False), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=False, - find_unused_parameters=True, - # Grads can be none sometimes due to dynamic module not using - # all params. - allow_none_grads=True - ) - - # DDP works as expected if there is weight sharing among layers - @requires_nccl() - @skip_if_lt_x_gpu(2) - @parametrize("use_reentrant", [True, False]) - def test_ddp_checkpointing_weight_sharing(self, use_reentrant): - """ - Test that checkpointing with weight sharing works. - """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - torch.cuda.set_device(self.rank) - for use_bucket_view, static_graph in product((False, True), (False, True)): - torch.manual_seed(31415) - l1 = nn.Linear(20, 20) - l2 = nn.Linear(20, 20) - l1.weight = l2.weight - model = nn.Sequential(l1, l2) - # TODO: non-reentrant based checkpointing of DDP module with - # static_graph runs into the below issue, see - # https://github.com/pytorch/pytorch/issues/70865 and - # https://github.com/pytorch/pytorch/issues/58111 for details. - err_ctx = ( - self.assertRaisesRegex( - RuntimeError, - "Your training graph has changed in this iteration" - ) if static_graph and not use_reentrant else suppress() - ) - with err_ctx: - self._test_ddp_checkpointing( - model, - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=static_graph, - run_checkpoint=True, - use_reentrant=use_reentrant, - ) - - @requires_nccl() - @skip_if_lt_x_gpu(2) - def test_ddp_checkpointing_twice_weight_sharing(self): - """ - Checkpointing should work with static graph in the case of checkpointing - same layer twice and having weights shared acrosss layers. 
- """ - store = c10d.FileStore(self.file_name, self.world_size) - process_group = c10d.ProcessGroupNCCL(store, self.rank, self.world_size) - torch.cuda.set_device(self.rank) - for use_bucket_view in (True, False): - model = self._test_ddp_checkpointing( - self.CheckpointTwiceModuleWeightSharing(), - process_group=process_group, - use_bucket_view=use_bucket_view, - static_graph=True, - ) class NcclErrorHandlingTest(MultiProcessTestCase): @@ -3053,8 +2711,6 @@ def test_nccl_warn_not_in_group_debug_info(self): def test_nccl_warn_not_in_group_debug_off(self): self._test_warn_not_in_group(backend="nccl") -instantiate_parametrized_tests(DistributedDataParallelTest) - if __name__ == "__main__": assert ( not torch.cuda._initialized diff --git a/test/distributed/test_data_parallel.py b/test/distributed/test_data_parallel.py index c1720344e49dc3..3aeff9062909ae 100644 --- a/test/distributed/test_data_parallel.py +++ b/test/distributed/test_data_parallel.py @@ -17,6 +17,7 @@ from torch.testing._internal.common_utils import _assertGradAndGradgradChecks, gradcheck from torch.testing._internal.common_utils import dtype2prec_DONTUSE from torch.testing._internal.common_utils import sandcastle_skip_if +from torch.testing._internal.common_utils import TEST_WITH_ROCM import torch.nn.functional as F torch.set_default_dtype(torch.double) @@ -784,6 +785,7 @@ class TestDataParallelDeviceType(TestCase): @onlyCUDA @skipMeta + @sandcastle_skip_if(TEST_WITH_ROCM, "Failing on few archs, temporarily skipped") @dtypes(torch.float, torch.double, torch.half) def test_data_parallel_module(self, device, dtype): l = nn.Linear(10, 5).to(device, dtype) @@ -796,6 +798,7 @@ def test_data_parallel_module(self, device, dtype): @onlyCUDA @skipMeta + @sandcastle_skip_if(TEST_WITH_ROCM, "Failing on few archs, temporarily skipped") @dtypes(torch.float, torch.double, torch.half) def test_data_parallel_module_kwargs_only(self, device, dtype): class Net(nn.Module): @@ -816,6 +819,7 @@ def forward(self, input): @onlyCUDA @skipMeta + @sandcastle_skip_if(TEST_WITH_ROCM, "Failing on few archs, temporarily skipped") @dtypes(torch.float, torch.double, torch.half) def test_data_parallel_module_kwargs_only_empty_list(self, device, dtype): class Net(nn.Module): @@ -836,6 +840,7 @@ def forward(self, input): @onlyCUDA @skipMeta + @sandcastle_skip_if(TEST_WITH_ROCM, "Failing on few archs, temporarily skipped") @dtypes(torch.float, torch.double, torch.half) def test_data_parallel_module_kwargs_only_empty_dict(self, device, dtype): class Net(nn.Module): @@ -856,6 +861,7 @@ def forward(self, input): @onlyCUDA @skipMeta + @sandcastle_skip_if(TEST_WITH_ROCM, "Failing on few archs, temporarily skipped") @dtypes(torch.float, torch.double, torch.half) def test_data_parallel_module_kwargs_only_empty_tuple(self, device, dtype): class Net(nn.Module): diff --git a/test/distributed/test_store.py b/test/distributed/test_store.py index 02484585c68e2e..6744ab16995d24 100644 --- a/test/distributed/test_store.py +++ b/test/distributed/test_store.py @@ -404,6 +404,14 @@ def test_common_errors(self): gen = dist.rendezvous("tcp://127.0.0.1:23456?rank=0") next(gen) + def test_dns_timeout(self): + with self.assertRaisesRegex(TimeoutError, "client socket has timed out after.*dnsnotexist"): + gen = dist.rendezvous( + "tcp://dnsnotexist:23456?world_size=2&rank=0", + timeout=timedelta(seconds=1), + ) + next(gen) + @retry_on_connect_failures def test_nominal(self): url = self.create_tcp_url() diff --git a/test/distributions/test_distributions.py 
b/test/distributions/test_distributions.py index 37128792ae28c3..55a227d0327054 100644 --- a/test/distributions/test_distributions.py +++ b/test/distributions/test_distributions.py @@ -34,6 +34,7 @@ from collections import namedtuple from itertools import product from random import shuffle +from packaging import version import torch @@ -2220,39 +2221,41 @@ def test_multivariate_normal_moments(self): # We applied same tests in Multivariate Normal distribution for Wishart distribution def test_wishart_shape(self): - df = (torch.rand(5, requires_grad=True) + 1) * 10 - df_no_batch = (torch.rand([], requires_grad=True) + 1) * 10 - df_multi_batch = (torch.rand(6, 5, requires_grad=True) + 1) * 10 + ndim = 3 + + df = torch.rand(5, requires_grad=True) + ndim + df_no_batch = torch.rand([], requires_grad=True) + ndim + df_multi_batch = torch.rand(6, 5, requires_grad=True) + ndim # construct PSD covariance - tmp = torch.randn(3, 10) + tmp = torch.randn(ndim, 10) cov = (torch.matmul(tmp, tmp.t()) / tmp.size(-1)).requires_grad_() prec = cov.inverse().requires_grad_() scale_tril = torch.linalg.cholesky(cov).requires_grad_() # construct batch of PSD covariances - tmp = torch.randn(6, 5, 3, 10) + tmp = torch.randn(6, 5, ndim, 10) cov_batched = (tmp.unsqueeze(-2) * tmp.unsqueeze(-3)).mean(-1).requires_grad_() prec_batched = cov_batched.inverse() scale_tril_batched = torch.linalg.cholesky(cov_batched) # ensure that sample, batch, event shapes all handled correctly - self.assertEqual(Wishart(df, cov).sample().size(), (5, 3, 3)) - self.assertEqual(Wishart(df_no_batch, cov).sample().size(), (3, 3)) - self.assertEqual(Wishart(df_multi_batch, cov).sample().size(), (6, 5, 3, 3)) - self.assertEqual(Wishart(df, cov).sample((2,)).size(), (2, 5, 3, 3)) - self.assertEqual(Wishart(df_no_batch, cov).sample((2,)).size(), (2, 3, 3)) - self.assertEqual(Wishart(df_multi_batch, cov).sample((2,)).size(), (2, 6, 5, 3, 3)) - self.assertEqual(Wishart(df, cov).sample((2, 7)).size(), (2, 7, 5, 3, 3)) - self.assertEqual(Wishart(df_no_batch, cov).sample((2, 7)).size(), (2, 7, 3, 3)) - self.assertEqual(Wishart(df_multi_batch, cov).sample((2, 7)).size(), (2, 7, 6, 5, 3, 3)) - self.assertEqual(Wishart(df, cov_batched).sample((2, 7)).size(), (2, 7, 6, 5, 3, 3)) - self.assertEqual(Wishart(df_no_batch, cov_batched).sample((2, 7)).size(), (2, 7, 6, 5, 3, 3)) - self.assertEqual(Wishart(df_multi_batch, cov_batched).sample((2, 7)).size(), (2, 7, 6, 5, 3, 3)) - self.assertEqual(Wishart(df, precision_matrix=prec).sample((2, 7)).size(), (2, 7, 5, 3, 3)) - self.assertEqual(Wishart(df, precision_matrix=prec_batched).sample((2, 7)).size(), (2, 7, 6, 5, 3, 3)) - self.assertEqual(Wishart(df, scale_tril=scale_tril).sample((2, 7)).size(), (2, 7, 5, 3, 3)) - self.assertEqual(Wishart(df, scale_tril=scale_tril_batched).sample((2, 7)).size(), (2, 7, 6, 5, 3, 3)) + self.assertEqual(Wishart(df, cov).sample().size(), (5, ndim, ndim)) + self.assertEqual(Wishart(df_no_batch, cov).sample().size(), (ndim, ndim)) + self.assertEqual(Wishart(df_multi_batch, cov).sample().size(), (6, 5, ndim, ndim)) + self.assertEqual(Wishart(df, cov).sample((2,)).size(), (2, 5, ndim, ndim)) + self.assertEqual(Wishart(df_no_batch, cov).sample((2,)).size(), (2, ndim, ndim)) + self.assertEqual(Wishart(df_multi_batch, cov).sample((2,)).size(), (2, 6, 5, ndim, ndim)) + self.assertEqual(Wishart(df, cov).sample((2, 7)).size(), (2, 7, 5, ndim, ndim)) + self.assertEqual(Wishart(df_no_batch, cov).sample((2, 7)).size(), (2, 7, ndim, ndim)) + self.assertEqual(Wishart(df_multi_batch, 
cov).sample((2, 7)).size(), (2, 7, 6, 5, ndim, ndim)) + self.assertEqual(Wishart(df, cov_batched).sample((2, 7)).size(), (2, 7, 6, 5, ndim, ndim)) + self.assertEqual(Wishart(df_no_batch, cov_batched).sample((2, 7)).size(), (2, 7, 6, 5, ndim, ndim)) + self.assertEqual(Wishart(df_multi_batch, cov_batched).sample((2, 7)).size(), (2, 7, 6, 5, ndim, ndim)) + self.assertEqual(Wishart(df, precision_matrix=prec).sample((2, 7)).size(), (2, 7, 5, ndim, ndim)) + self.assertEqual(Wishart(df, precision_matrix=prec_batched).sample((2, 7)).size(), (2, 7, 6, 5, ndim, ndim)) + self.assertEqual(Wishart(df, scale_tril=scale_tril).sample((2, 7)).size(), (2, 7, 5, ndim, ndim)) + self.assertEqual(Wishart(df, scale_tril=scale_tril_batched).sample((2, 7)).size(), (2, 7, 6, 5, ndim, ndim)) # check gradients # Modified and applied the same tests for multivariate_normal @@ -2278,14 +2281,19 @@ def gradcheck_func(samples, nu, sigma, prec, scale_tril): wishart_log_prob_gradcheck(df_no_batch, None, None, scale_tril_batched) def test_wishart_stable_with_precision_matrix(self): - x = torch.randn(10) + ndim = 10 + x = torch.randn(ndim) P = torch.exp(-(x - x.unsqueeze(-1)) ** 2) # RBF kernel - Wishart(torch.tensor(10), precision_matrix=P) + Wishart(torch.tensor(ndim), precision_matrix=P) @unittest.skipIf(not TEST_NUMPY, "Numpy not found") def test_wishart_log_prob(self): - df = (torch.rand([], requires_grad=True) + 1) * 10 - tmp = torch.randn(3, 10) + ndim = 3 + df = torch.rand([], requires_grad=True) + ndim - 1 + # SciPy allowed ndim -1 < df < ndim for Wishar distribution after version 1.7.0 + if version.parse(scipy.__version__) < version.parse("1.7.0"): + df += 1. + tmp = torch.randn(ndim, 10) cov = (torch.matmul(tmp, tmp.t()) / tmp.size(-1)).requires_grad_() prec = cov.inverse().requires_grad_() scale_tril = torch.linalg.cholesky(cov).requires_grad_() @@ -2297,7 +2305,7 @@ def test_wishart_log_prob(self): dist3 = Wishart(df, scale_tril=scale_tril) ref_dist = scipy.stats.wishart(df.item(), cov.detach().numpy()) - x = dist1.sample((10,)) + x = dist1.sample((1000,)) expected = ref_dist.logpdf(x.transpose(0, 2).numpy()) self.assertEqual(0.0, np.mean((dist1.log_prob(x).detach().numpy() - expected)**2), atol=1e-3, rtol=0) @@ -2305,14 +2313,17 @@ def test_wishart_log_prob(self): self.assertEqual(0.0, np.mean((dist3.log_prob(x).detach().numpy() - expected)**2), atol=1e-3, rtol=0) # Double-check that batched versions behave the same as unbatched - df = (torch.rand(5, requires_grad=True) + 1) * 3 - tmp = torch.randn(5, 3, 10) + df = torch.rand(5, requires_grad=True) + ndim - 1 + # SciPy allowed ndim -1 < df < ndim for Wishar distribution after version 1.7.0 + if version.parse(scipy.__version__) < version.parse("1.7.0"): + df += 1. 
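# Hypothetical standalone sketch, not part of the test file above: it only
# restates the df guard used there. torch.distributions.Wishart requires
# df > ndim - 1, while, per the comment above, SciPy accepts
# ndim - 1 < df < ndim for the Wishart distribution only from 1.7.0 onward,
# so older SciPy releases get df bumped past ndim before building the
# reference distribution.
import scipy
import scipy.stats
import torch
from packaging import version

ndim = 3
df = torch.rand([]) + ndim - 1                       # 0-dim tensor in (ndim - 1, ndim)
if version.parse(scipy.__version__) < version.parse("1.7.0"):
    df = df + 1.0                                    # pre-1.7.0 SciPy rejects df < ndim

tmp = torch.randn(ndim, 10)
cov = torch.matmul(tmp, tmp.t()) / tmp.size(-1)      # PSD covariance matrix
d = torch.distributions.Wishart(df, cov)
ref = scipy.stats.wishart(df.item(), cov.numpy())

x = d.sample()                                       # a single (ndim, ndim) draw
print(d.log_prob(x).item(), ref.logpdf(x.numpy()))   # the two values should roughly agree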
+ tmp = torch.randn(5, ndim, 10) cov = (tmp.unsqueeze(-2) * tmp.unsqueeze(-3)).mean(-1).requires_grad_() dist_batched = Wishart(df, cov) dist_unbatched = [Wishart(df[i], cov[i]) for i in range(df.size(0))] - x = dist_batched.sample((10,)) + x = dist_batched.sample((1000,)) batched_prob = dist_batched.log_prob(x) unbatched_prob = torch.stack([dist_unbatched[i].log_prob(x[:, i]) for i in range(5)]).t() @@ -2322,28 +2333,35 @@ def test_wishart_log_prob(self): @unittest.skipIf(not TEST_NUMPY, "NumPy not found") def test_wishart_sample(self): set_rng_seed(0) # see Note [Randomized statistical tests] - df = (torch.rand([], requires_grad=True) + 1) * 3 - tmp = torch.randn(3, 10) + ndim = 3 + df = torch.rand([], requires_grad=True) + ndim - 1 + # SciPy allowed ndim -1 < df < ndim for Wishar distribution after version 1.7.0 + if version.parse(scipy.__version__) < version.parse("1.7.0"): + df += 1. + tmp = torch.randn(ndim, 10) cov = (torch.matmul(tmp, tmp.t()) / tmp.size(-1)).requires_grad_() prec = cov.inverse().requires_grad_() scale_tril = torch.linalg.cholesky(cov).requires_grad_() + ref_dist = scipy.stats.wishart(df.item(), cov.detach().numpy()) + self._check_sampler_sampler(Wishart(df, cov), - scipy.stats.wishart(df.item(), cov.detach().numpy()), + ref_dist, 'Wishart(df={}, covariance_matrix={})'.format(df, cov), multivariate=True) self._check_sampler_sampler(Wishart(df, precision_matrix=prec), - scipy.stats.wishart(df.item(), cov.detach().numpy()), + ref_dist, 'Wishart(df={}, precision_matrix={})'.format(df, prec), multivariate=True) self._check_sampler_sampler(Wishart(df, scale_tril=scale_tril), - scipy.stats.wishart(df.item(), cov.detach().numpy()), + ref_dist, 'Wishart(df={}, scale_tril={})'.format(df, scale_tril), multivariate=True) def test_wishart_properties(self): - df = (torch.rand([]) + 1) * 5 - scale_tril = transform_to(constraints.lower_cholesky)(torch.randn(5, 5)) + ndim = 5 + df = torch.rand([]) + ndim - 1 + scale_tril = transform_to(constraints.lower_cholesky)(torch.randn(ndim, ndim)) m = Wishart(df=df, scale_tril=scale_tril) self.assertEqual(m.covariance_matrix, m.scale_tril.mm(m.scale_tril.t())) self.assertEqual(m.covariance_matrix.mm(m.precision_matrix), torch.eye(m.event_shape[0])) @@ -2351,14 +2369,15 @@ def test_wishart_properties(self): def test_wishart_moments(self): set_rng_seed(0) # see Note [Randomized statistical tests] - df = (torch.rand([]) + 1) * 3 - scale_tril = transform_to(constraints.lower_cholesky)(torch.randn(3, 3)) + ndim = 3 + df = torch.rand([]) + ndim - 1 + scale_tril = transform_to(constraints.lower_cholesky)(torch.randn(ndim, ndim)) d = Wishart(df=df, scale_tril=scale_tril) - samples = d.rsample((100000,)) + samples = d.rsample((ndim * ndim * 100000,)) empirical_mean = samples.mean(0) - self.assertEqual(d.mean, empirical_mean, atol=5, rtol=0) + self.assertEqual(d.mean, empirical_mean, atol=0.5, rtol=0) empirical_var = samples.var(0) - self.assertEqual(d.variance, empirical_var, atol=5, rtol=0) + self.assertEqual(d.variance, empirical_var, atol=0.5, rtol=0) def test_exponential(self): rate = torch.randn(5, 5).abs().requires_grad_() @@ -3111,12 +3130,12 @@ def test_invalid_parameter_broadcasting(self): 'alpha': torch.tensor([1, 1, 1]) }), (StudentT, { - 'df': torch.tensor([1, 1]), - 'scale': torch.tensor([1, 1, 1]) + 'df': torch.tensor([1., 1.]), + 'scale': torch.tensor([1., 1., 1.]) }), (StudentT, { - 'df': torch.tensor([1, 1]), - 'loc': torch.tensor([1, 1, 1]) + 'df': torch.tensor([1., 1.]), + 'loc': torch.tensor([1., 1., 1.]) }) ] @@ -4623,8 +4642,16 
@@ def setUp(self): scipy.stats.weibull_min(c=positive_var2[0], scale=positive_var[0]) ), ( - Wishart(20 + positive_var[0], cov_tensor), # scipy var for Wishart only supports scalars - scipy.stats.wishart(20 + positive_var[0].item(), cov_tensor), + # scipy var for Wishart only supports scalars + # SciPy allowed ndim -1 < df < ndim for Wishar distribution after version 1.7.0 + Wishart( + (20 if version.parse(scipy.__version__) < version.parse("1.7.0") else 19) + positive_var[0], + cov_tensor, + ), + scipy.stats.wishart( + (20 if version.parse(scipy.__version__) < version.parse("1.7.0") else 19) + positive_var[0].item(), + cov_tensor, + ), ), ] diff --git a/test/expect/TestFXAPIBackwardCompatibility.test_function_back_compat-fx_backcompat_function_signatures.expect b/test/expect/TestFXAPIBackwardCompatibility.test_function_back_compat-fx_backcompat_function_signatures.expect index 17e38e6c9fcd44..fcbf9ec18deb16 100644 --- a/test/expect/TestFXAPIBackwardCompatibility.test_function_back_compat-fx_backcompat_function_signatures.expect +++ b/test/expect/TestFXAPIBackwardCompatibility.test_function_back_compat-fx_backcompat_function_signatures.expect @@ -41,7 +41,7 @@ torch.fx.interpreter.Interpreter.get_attr(self, target: 'Target', args: Tuple[to torch.fx.interpreter.Interpreter.map_nodes_to_values(self, args: torch.fx.node.Argument, n: torch.fx.node.Node) -> torch.fx.node.Argument torch.fx.interpreter.Interpreter.output(self, target: 'Target', args: Tuple[torch.fx.node.Argument, ...], kwargs: Dict[str, Any]) -> Any torch.fx.interpreter.Interpreter.placeholder(self, target: 'Target', args: Tuple[torch.fx.node.Argument, ...], kwargs: Dict[str, Any]) -> Any -torch.fx.interpreter.Interpreter.run(self, *args, initial_env: Optional[Dict[torch.fx.node.Node, Any]] = None) -> Any +torch.fx.interpreter.Interpreter.run(self, *args, initial_env: Optional[Dict[torch.fx.node.Node, Any]] = None, enable_io_processing: bool = True) -> Any torch.fx.interpreter.Interpreter.run_node(self, n: torch.fx.node.Node) -> Any torch.fx.interpreter.Transformer.__init__(self, module) torch.fx.interpreter.Transformer.call_function(self, target: 'Target', args: Tuple[torch.fx.node.Argument, ...], kwargs: Dict[str, Any]) -> Any diff --git a/test/expect/TestPytorchExportModes.test_aten_fallback.expect b/test/expect/TestPytorchExportModes.test_aten_fallback.expect index 41059587af0b37..d5cfb31cfeefc8 100644 --- a/test/expect/TestPytorchExportModes.test_aten_fallback.expect +++ b/test/expect/TestPytorchExportModes.test_aten_fallback.expect @@ -11,7 +11,7 @@ ModelProto { nodes: [ Node {type: "Add", inputs: [0,1], outputs: [2], attributes: []}, Node {type: "Constant", inputs: [], outputs: [3], attributes: [{ name: 'value', type: tensor, value:TensorProto shape: []}]}, - Node {type: "ATen", inputs: [2,3], outputs: [4,5], attributes: [{ name: 'operator', type: string, value: 'qr'}]} + Node {type: "ATen", inputs: [2,3], outputs: [4,5], attributes: [{ name: 'operator', type: string, value: 'qr'}, { name: 'overload_name', type: string, value: ''}]} ] } opset_import: [OperatorSetIdProto { domain: }OperatorSetIdProto { domain: org.pytorch.aten}], diff --git a/test/expect/TestPytorchExportModes.test_onnx_aten.expect b/test/expect/TestPytorchExportModes.test_onnx_aten.expect index 22f1c57f95706a..85f4f8573d1c44 100644 --- a/test/expect/TestPytorchExportModes.test_onnx_aten.expect +++ b/test/expect/TestPytorchExportModes.test_onnx_aten.expect @@ -9,7 +9,7 @@ ModelProto { outputs: [{name: "2", type:Tensor dims: 3 4}] initializers: [] nodes: [ 
- Node {type: "ATen", inputs: [0,1], outputs: [2], attributes: [{ name: 'operator', type: string, value: 'fmod'}]} + Node {type: "ATen", inputs: [0,1], outputs: [2], attributes: [{ name: 'operator', type: string, value: 'fmod'}, { name: 'overload_name', type: string, value: ''}]} ] } opset_import: [OperatorSetIdProto { domain: }OperatorSetIdProto { domain: org.pytorch.aten}], diff --git a/test/expect/TestScript.test_listconstruct_erasure.expect b/test/expect/TestScript.test_listconstruct_erasure.expect index 0f7d470b0709e1..7d4bb8d97fc0f1 100644 --- a/test/expect/TestScript.test_listconstruct_erasure.expect +++ b/test/expect/TestScript.test_listconstruct_erasure.expect @@ -13,7 +13,7 @@ ModelProto { Node {type: "Less", inputs: [0,1], outputs: [2], attributes: []}, Node {type: "Cast", inputs: [2], outputs: [3], attributes: [{ name: 'to', type: int, value: 2}]}, Node {type: "Cast", inputs: [3], outputs: [4], attributes: [{ name: 'to', type: int, value: 9}]}, - Node {type: "ATen", inputs: [0,4], outputs: [5], attributes: [{ name: 'operator', type: string, value: 'index'}]} + Node {type: "ATen", inputs: [0,4], outputs: [5], attributes: [{ name: 'operator', type: string, value: 'index'}, { name: 'overload_name', type: string, value: ''}]} ] } opset_import: [OperatorSetIdProto { domain: }OperatorSetIdProto { domain: org.pytorch.aten}], diff --git a/test/forward_backward_compatibility/check_forward_backward_compatibility.py b/test/forward_backward_compatibility/check_forward_backward_compatibility.py index b7dc0d579c3467..9317e238244b7b 100644 --- a/test/forward_backward_compatibility/check_forward_backward_compatibility.py +++ b/test/forward_backward_compatibility/check_forward_backward_compatibility.py @@ -92,6 +92,7 @@ ("aten::miopen_depthwise_convolution_backward", datetime.date(9999, 1, 1)), ("aten::miopen_depthwise_convolution_backward_input", datetime.date(9999, 1, 1)), ("aten::miopen_depthwise_convolution_backward_weight", datetime.date(9999, 1, 1)), + ("aten::_nested_tensor", datetime.date(9999, 1, 1)), ("caffe2::", datetime.date(2021, 10, 23)), ("prepacked::unpack_prepacked_sizes_conv2d", datetime.date(9999, 1, 1)), ("prepacked::unpack_prepacked_sizes_linear", datetime.date(9999, 1, 1)), @@ -106,10 +107,15 @@ ("aten::_scatter_reduce", datetime.date(2022, 1, 31)), ("aten::native_multi_head_self_attention", datetime.date(9999, 1, 1)), ("aten::_native_multi_head_self_attention", datetime.date(9999, 1, 1)), - ("aten::scatter_reduce.two", datetime.date(2022, 3, 15)), ("aten::grid_sampler_3d_backward", datetime.date(9999, 1, 1)), ("aten::_transform_bias_rescale_qkv", datetime.date(9999, 1, 1)), - ("aten::_scatter_reduce.two", datetime.date(9999, 1, 1)), + ("aten::scatter_reduce.two", datetime.date(2022, 4, 15)), + ("aten::_s_where", datetime.date(2022, 9, 30)), + ("quantized::conv2d_cudnn", datetime.date(2022, 3, 22)), + ("quantized::conv2d_relu_cudnn", datetime.date(2022, 3, 22)), + ("quantized::softmax", datetime.date(2022, 4, 15)), + ("prim::infer_squeeze_size.dim", datetime.date(9999, 1, 1)), + ("prim::infer_squeeze_size", datetime.date(9999, 1, 1)), ] ALLOW_LIST_COMPILED = [ diff --git a/test/jit/test_autodiff.py b/test/jit/test_autodiff.py new file mode 100644 index 00000000000000..518826f602e1ab --- /dev/null +++ b/test/jit/test_autodiff.py @@ -0,0 +1,51 @@ +# Owner(s): ["oncall: jit"] + +import torch + +from torch.testing._internal.jit_utils import JitTestCase +from typing import List + +class TestAutodiffJit(JitTestCase): + def test_undefined_tensor_lists(self): + def fn(tensor_list: 
List[torch.Tensor], add_tensor): + cat = torch.cat(tensor_list, dim=1) + r = torch.sin(cat + add_tensor) + return r + + fn_s = torch.jit.script(fn) + + a = torch.rand((3, 6), requires_grad=True) + b = torch.rand((3, 10), requires_grad=True) + x = [a, b] + y = torch.rand((3, 16), requires_grad=True) + + ret = fn_s(x, y) + ret.sum().backward() + ret = fn_s(x, y) + ret.sum().backward() + + ret = fn_s(x, y) + s = ret.sum() + + # backward_fn expects 2 inputs: (grad_output, current_grad_r) + # current_grad_r is provided because we need to add this contribution + # to grad_r when we return it. + backward_fn = s.grad_fn.next_functions[0][0] + + # check behavior with defined tensor + grad_out = torch.rand((3, 16)) + grad_inputs = backward_fn(grad_out, None) + + # expect 3 tensors: grad_y, grad_a, grad_b + self.assertEqual(3, len(grad_inputs)) + for x in grad_inputs: + self.assertTrue(isinstance(x, torch.Tensor)) + + # now test with undefined grad_out + grad_inputs = backward_fn(None, None) + + # expect all of them to be None + self.assertEqual(3, len(grad_inputs)) + for x in grad_inputs: + if x is not None: + self.assertEqual(0, torch.max(torch.abs(x)).item()) diff --git a/test/jit/test_export_modes.py b/test/jit/test_export_modes.py index 70d2193201a3c8..300be7d9dd6908 100644 --- a/test/jit/test_export_modes.py +++ b/test/jit/test_export_modes.py @@ -82,7 +82,9 @@ def forward(self, x, y): ModelWithAtenNotONNXOp(), (x, y), add_node_names=False, do_constant_folding=False, - operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK) + operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK, + # support for linalg.qr was added in later op set versions. + opset_version=9) # torch.fmod is using to test ONNX_ATEN. # If you plan to remove fmod from aten, or found this test failed. 
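The opset_version=9 pin added above works together with ONNX_ATEN_FALLBACK: at that opset the exporter has no symbolic for the qr-style op that the test's ModelWithAtenNotONNXOp calls, so the node is emitted as an ATen fallback op (as the updated expect files show) rather than failing the export. A rough standalone sketch of that export mode follows; the QRModel module and the use of torch.linalg.qr are illustrative stand-ins rather than the test's actual helper, and it assumes a build in which the ATen fallback export path is available.

import io
import torch

# Illustrative module standing in for the test's ModelWithAtenNotONNXOp.
class QRModel(torch.nn.Module):
    def forward(self, x, y):
        return torch.linalg.qr(x + y)

x = torch.randn(3, 4)
y = torch.randn(3, 4)
buf = io.BytesIO()
torch.onnx.export(
    QRModel(), (x, y), buf,
    do_constant_folding=False,
    # Ops with no ONNX symbolic at this opset are exported as ATen nodes
    # instead of raising, which is what the expect files above check for.
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
    opset_version=9,
)
print(len(buf.getvalue()), "bytes of ONNX written")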
diff --git a/test/jit/test_if_hoisting.py b/test/jit/test_if_hoisting.py index 939ceda3c56cfd..bda285e6e43bcb 100644 --- a/test/jit/test_if_hoisting.py +++ b/test/jit/test_if_hoisting.py @@ -3,6 +3,7 @@ import torch from torch.testing import FileCheck from torch.testing._internal.jit_utils import JitTestCase +from typing import Dict if __name__ == "__main__": raise RuntimeError( @@ -149,13 +150,13 @@ def fn(x: bool, y: torch.Tensor): self.run_pass("dce", op_graph) FileCheck().check_count("prim::If", 1, exactly=True).run(op_graph) - FileCheck().check_count("aten::add", 2, exactly=True).run(op_graph) + FileCheck().check_count("aten::add(", 2, exactly=True).run(op_graph) FileCheck().check_count("aten::add_", 1, exactly=True).run(op_graph) t1 = torch.Tensor([1]) t2 = torch.Tensor([5, 6]) - self.assertEqual(fn(True, t1), fn_script(True, t1)) - self.assertEqual(fn(False, t2), fn_script(False, t2)) + self.assertEqual(fn(True, t1.clone()), fn_script(True, t1.clone())) + self.assertEqual(fn(False, t2.clone()), fn_script(False, t2.clone())) def test_mutate_after(self): """ @@ -180,7 +181,6 @@ def fn(x: bool, y: torch.Tensor): FileCheck().check_count("prim::If", 1, exactly=True).run(op_graph) FileCheck().check_count("aten::add", 2, exactly=True).run(op_graph) - t1 = torch.Tensor([1]) t2 = torch.Tensor([5, 6]) self.assertEqual(fn(True, t1.clone()), fn_script(True, t1.clone())) @@ -212,3 +212,26 @@ def fn(x: bool, y: torch.Tensor): t2 = torch.Tensor([5, 6]) self.assertEqual(fn(True, t1), fn_script(True, t1)) self.assertEqual(fn(False, t2), fn_script(False, t2)) + + def test_hoist_mutation_2(self): + def fn(x, y, cond: bool, d: Dict[str, torch.Tensor]): + if cond: + m = x.relu() + f1 = torch.rand((2, 2)) + d["test"] = f1 + z = d["test"] + else: + m = y.gelu() + f2 = torch.rand((3, 2)) + d["test"] = f2 + z = d["test"] + return m, z + + fn_s = torch.jit.script(fn) + op_graph = fn_s.graph + self.run_pass("common_expression_hoisting", op_graph) + self.run_pass("dce", op_graph) + FileCheck().check_count("aten::__getitem__", 2, exactly=True).run(op_graph) + FileCheck().check_count("aten::_set_item", 2, exactly=True).run(op_graph) + FileCheck().check_count("aten::relu", 1, exactly=True).run(op_graph) + FileCheck().check_count("aten::gelu", 1, exactly=True).run(op_graph) diff --git a/test/jit/test_misc.py b/test/jit/test_misc.py index bf3c3c3e71c11b..20120ff8f96070 100644 --- a/test/jit/test_misc.py +++ b/test/jit/test_misc.py @@ -228,6 +228,91 @@ def use_module_interface(mod_list: List[OneTwoModule], x: torch.Tensor): self.assertTrue(set(['aten::add.Tensor', 'aten::mul.Scalar']).issubset( set(torch.jit.export_opnames(scripted_M_mod)))) + def test_math_inf(self): + from math import inf + + def foo(): + return inf + + self.checkScript(foo, ()) + + def test_list_literal_infer(self): + def expects_intlist(x: List[int]): + x.append(3) + return x + + def foo(): + return expects_intlist([]) + + self.checkScript(foo, ()) + + def annotated_list_fail(): + return expects_intlist(torch.jit.annotate([], List[Tensor])) + + with self.assertRaises(RuntimeError): + torch.jit.script(annotated_list_fail) + + def non_temporary_fail(): + a = [] + return expects_intlist(a) + + with self.assertRaises(RuntimeError): + torch.jit.script(non_temporary_fail) + + + @torch.jit.script + def test_return(): + return [] + + FileCheck().check("Tensor[] = prim::ListConstruct").run(test_return.graph) + + def test_legacy_tensor_constructor(self): + # testing PyObject overload + def test_all_dtypes(): + return ( + torch.BoolTensor([2]), + 
torch.LongTensor([3]), + torch.ByteTensor([4]), + torch.CharTensor([5]), + torch.DoubleTensor([6]), + torch.FloatTensor([7]), + torch.IntTensor([8]), + torch.ShortTensor([1]), + torch.HalfTensor([1]), + ) + + self.checkScript(test_all_dtypes, ()) + + # now test empty overload + def empty_overload(): + return torch.LongTensor(2, 3, 4) + + eager = empty_overload() + jit = torch.jit.script(empty_overload)() + eager[:] = 1 + jit[:] = 1 + self.assertEqual(eager, jit) + + def no_inputs(): + return torch.DoubleTensor() + + self.checkScript(no_inputs, ()) + + # bad schema + def multiple_args(): + return torch.LongTensor(1, [2]) + + with self.assertRaisesRegex(RuntimeError, "multiple positional arguments that were not all integers"): + torch.jit.script(multiple_args) + + # kwarg bad schema + def bad_kwarg(): + return torch.LongTensor(hello="1") + + with self.assertRaisesRegex(RuntimeError, "hello"): + torch.jit.script(bad_kwarg) + + def test_broadcasting_list(self): """ Test BroadcastingList and torch.nn._size_N_t alias diff --git a/test/jit/test_op_decompositions.py b/test/jit/test_op_decompositions.py new file mode 100644 index 00000000000000..bfd6edb2e6b824 --- /dev/null +++ b/test/jit/test_op_decompositions.py @@ -0,0 +1,23 @@ +# Owner(s): ["oncall: jit"] + +import torch +from torch.testing import FileCheck +from torch.testing._internal.jit_utils import JitTestCase + +if __name__ == '__main__': + raise RuntimeError("This test file is not meant to be run directly, use:\n\n" + "\tpython test/test_jit.py TESTNAME\n\n" + "instead.") + +class TestOpDecompositions(JitTestCase): + def test_op_decomposition(self): + def foo(x): + return torch.var(x, unbiased=True) + + # TODO: more robust testing + foo_s = torch.jit.script(foo) + FileCheck().check("aten::var").run(foo_s.graph) + torch._C._jit_pass_run_decompositions(foo_s.graph) + inp = torch.rand([10, 10]) + self.assertEqual(foo(inp), foo_s(inp)) + FileCheck().check_not("aten::var").run(foo_s.graph) diff --git a/test/jit/test_profiler.py b/test/jit/test_profiler.py index 4c9380f40471f5..81df055f55b7c8 100644 --- a/test/jit/test_profiler.py +++ b/test/jit/test_profiler.py @@ -18,7 +18,7 @@ class TestProfiler(JitTestCase): def setUp(self): self.prev_exec = torch._C._jit_set_profiling_executor(True) - self.prev_profiling = torch._C._jit_set_profiling_mode(True) + self.prev_profiling = torch._C._get_graph_executor_optimize(True) self.inline_autodiff = torch._C._debug_set_autodiff_subgraph_inlining(False) self.texpr_fuser_state = torch._C._jit_texpr_fuser_enabled() self.can_fuse_on_cpu = torch._C._jit_can_fuse_on_cpu() @@ -34,7 +34,7 @@ def setUp(self): def tearDown(self): torch._C._jit_set_profiling_executor(self.prev_exec) - torch._C._jit_set_profiling_mode(self.prev_profiling) + torch._C._get_graph_executor_optimize(self.prev_profiling) torch._C._debug_set_autodiff_subgraph_inlining(self.inline_autodiff) torch._C._jit_set_texpr_fuser_enabled(self.texpr_fuser_state) torch._C._jit_override_can_fuse_on_cpu(self.can_fuse_on_cpu) diff --git a/test/jit/test_python_bindings.py b/test/jit/test_python_bindings.py index 2f086feaa904e9..37c2ef7f85af74 100644 --- a/test/jit/test_python_bindings.py +++ b/test/jit/test_python_bindings.py @@ -1,6 +1,7 @@ # Owner(s): ["oncall: jit"] import torch +from torch.testing import FileCheck from torch.testing._internal.jit_utils import JitTestCase if __name__ == "__main__": @@ -82,3 +83,28 @@ def test_graph_create(self): gr = torch._C.Graph() with self.assertRaises(ValueError): gr.create("prim::Constant", [None]) + + def 
test_canonicalize(self): + ir = """ +graph(%p207 : Tensor, + %1 : Tensor, + %p407 : int): + %11 : Tensor = aten::view_expand_placeholder(%1) + %12 : Tensor = aten::pointwise_placeholder(%11, %p207, %p407) + %13 : Tensor = aten::view_expand_placeholder(%12) + %14 : Tensor = aten::pointwise_placeholder(%13) + return (%14) + """ + + graph1 = torch._C.parse_ir(ir) + graph1 = torch._C._jit_pass_canonicalize(graph1, True) + + graph2 = torch._C.parse_ir(ir) + graph2 = torch._C._jit_pass_canonicalize(graph2) + + self.assertEqual(str(graph1), str(graph2)) + FileCheck().check("%p207").check_not("%14").run(graph1) + + graph3 = torch._C.parse_ir(ir) + graph3 = torch._C._jit_pass_canonicalize(graph3, False) + FileCheck().check_not("%p207").run(graph3) diff --git a/test/jit/test_save_load.py b/test/jit/test_save_load.py index fbc1443024cb5a..bbe7e0a7016f6e 100644 --- a/test/jit/test_save_load.py +++ b/test/jit/test_save_load.py @@ -1,20 +1,22 @@ # Owner(s): ["oncall: jit"] -from typing import NamedTuple, Optional import io import os import pathlib import sys +import unittest +from typing import NamedTuple, Optional +import torch from torch import Tensor from torch.testing._internal.common_utils import TemporaryFileName -import torch # Make the helper files in test/ importable pytorch_test_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__))) sys.path.append(pytorch_test_dir) -from torch.testing._internal.jit_utils import (JitTestCase, - clear_class_registry) +from torch.testing._internal.jit_utils import JitTestCase, clear_class_registry + +ENABLE_FLATBUFFER = os.environ.get("ENABLE_FLATBUFFER", "0") == "1" if __name__ == "__main__": raise RuntimeError( @@ -23,12 +25,14 @@ "instead." ) + class TestSaveLoad(JitTestCase): def test_different_modules(self): """ Exercise the situation where we have the same qualified name in two different CompilationUnits on save/load. """ + class Foo(torch.nn.Module): def __init__(self): super(Foo, self).__init__() @@ -64,7 +68,8 @@ def forward(self, x): clear_class_registry() self.assertEqual( - first_script_module._c.qualified_name, second_script_module._c.qualified_name + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, ) class ContainsBoth(torch.nn.Module): @@ -89,6 +94,7 @@ def test_different_functions(self): Exercise the situation where we have the same qualified name in two different CompilationUnits on save/load. """ + def lol(x): return x @@ -118,7 +124,8 @@ def forward(self, x): clear_class_registry() self.assertEqual( - first_script_module._c.qualified_name, second_script_module._c.qualified_name + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, ) class ContainsBoth(torch.nn.Module): @@ -143,6 +150,7 @@ def test_different_interfaces(self): Exercise the situation where we have the same qualified name in two different CompilationUnits on save/load. 
""" + @torch.jit.interface class MyInterface(object): def bar(self, x: Tensor) -> Tensor: @@ -204,7 +212,8 @@ def forward(self, x): clear_class_registry() self.assertEqual( - first_script_module._c.qualified_name, second_script_module._c.qualified_name + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, ) class ContainsBoth(torch.nn.Module): @@ -261,7 +270,6 @@ def forward(self, x): return x, MyCoolNamedTuple(a=5) - first_script_module = torch.jit.script(Foo()) first_saved_module = io.BytesIO() torch.jit.save(first_script_module, first_saved_module) @@ -310,7 +318,8 @@ def forward(self, x): clear_class_registry() self.assertEqual( - first_script_module._c.qualified_name, second_script_module._c.qualified_name + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, ) class ContainsBoth(torch.nn.Module): @@ -340,44 +349,44 @@ def forward(self, a): value = b"bar\x00\xffbaz" expected_extra_files = {} - expected_extra_files['foo'] = value + expected_extra_files["foo"] = value # verify that str to bytes conversion also works - expected_extra_files['foo2'] = "bar" + expected_extra_files["foo2"] = "bar" m = MyMod() # Save to file. with TemporaryFileName() as fname: m.save(fname, _extra_files=expected_extra_files) # values don't matter - extra_files = {'foo': '', 'foo2': None} + extra_files = {"foo": "", "foo2": None} torch.jit.load(fname, _extra_files=extra_files) - self.assertEqual(value, extra_files['foo']) + self.assertEqual(value, extra_files["foo"]) # results come back always as bytes - self.assertEqual(b"bar", extra_files['foo2']) + self.assertEqual(b"bar", extra_files["foo2"]) # Use torch.jit API torch.jit.save(m, fname, _extra_files=expected_extra_files) - extra_files['foo'] = '' + extra_files["foo"] = "" torch.jit.load(fname, _extra_files=extra_files) - self.assertEqual(value, extra_files['foo']) + self.assertEqual(value, extra_files["foo"]) # Save to buffer. 
buffer = io.BytesIO(m.save_to_buffer(_extra_files=expected_extra_files)) - extra_files = {'foo': ''} + extra_files = {"foo": ""} torch.jit.load(buffer, _extra_files=extra_files) - self.assertEqual(value, extra_files['foo']) + self.assertEqual(value, extra_files["foo"]) # Use torch.jit API buffer = io.BytesIO() torch.jit.save(m, buffer, _extra_files=expected_extra_files) buffer.seek(0) - extra_files = {'foo': ''} + extra_files = {"foo": ""} torch.jit.load(buffer, _extra_files=extra_files) - self.assertEqual(value, extra_files['foo']) + self.assertEqual(value, extra_files["foo"]) # Non-existent file 'bar' with self.assertRaises(RuntimeError): - extra_files['bar'] = '' + extra_files["bar"] = "" torch.jit.load(buffer, _extra_files=extra_files) def test_save_load_using_pathlib(self): @@ -394,7 +403,7 @@ def forward(self, a): m.save(path) m2 = torch.jit.load(path) - x = torch.tensor([1., 2., 3., 4.]) + x = torch.tensor([1.0, 2.0, 3.0, 4.0]) self.assertTrue(torch.equal(m(x), m2(x))) def test_save_nonexit_file(self): @@ -455,7 +464,476 @@ class TestModule(torch.nn.Module): def __init__(self): super().__init__() self.add_module("submodule_a", Submodule()) - self.register_parameter("parameter_a", torch.nn.Parameter(torch.randn(4))) + self.register_parameter( + "parameter_a", torch.nn.Parameter(torch.randn(4)) + ) + self.register_buffer("buffer", torch.randn(4)) + self.t = torch.rand(4) # not buffer + + self.parameter_b = torch.nn.Parameter(torch.randn(4)) + self.submodule_b = Submodule() + + m = TestModule() + m_loaded = self.getExportImportCopy(torch.jit.script(m)) + + # Check submodules. + self.assertEqual( + len(list(m.named_modules())), len(list(m_loaded.named_modules())) + ) + for m_s, loaded_s in zip(m.named_modules(), m_loaded.named_modules()): + m_name, _ = m_s + loaded_name, _ = loaded_s + self.assertEqual(m_name, loaded_name) + + # Check parameters. + self.assertEqual(len(list(m.parameters())), len(list(m_loaded.parameters()))) + for m_p, loaded_p in zip(m.parameters(), m_loaded.parameters()): + self.assertEqual(m_p, loaded_p) + + # Check buffers. + self.assertEqual( + len(list(m.named_buffers())), len(list(m_loaded.named_buffers())) + ) + for m_b, loaded_b in zip(m.named_buffers(), m_loaded.named_buffers()): + m_name, m_buffer = m_b + loaded_name, loaded_buffer = loaded_b + self.assertEqual(m_name, loaded_name) + self.assertEqual(m_buffer, loaded_buffer) + + def test_save_load_meta_tensors(self): + """ + Check that parameters, buffers, and submodules are the same after loading + for a module with parameters and buffers that are meta tensors + """ + + class Foo(torch.nn.Module): + def __init__(self): + super(Foo, self).__init__() + self.foo = torch.nn.Linear(2, 3, device="meta") + self.bar = torch.nn.Linear(3, 4) + self.register_buffer("buffer", torch.randn(4, device="meta")) + + def forward(self, x): + x = self.foo(x) + x = self.bar(x) + return x + + m = Foo() + m_loaded = self.getExportImportCopy(torch.jit.script(m)) + # Check submodules. + self.assertEqual( + len(list(m.named_modules())), len(list(m_loaded.named_modules())) + ) + self.assertEqual( + set(name for name, _ in m.named_modules()), + set(name for name, _ in m_loaded.named_modules()), + ) + # Check parameters. + m_params = dict(m.named_parameters()) + m_loaded_params = dict(m_loaded.named_parameters()) + self.assertEqual(len(m_params), len(m_loaded_params)) + self.assertEqual(m_params, m_loaded_params) + # Check buffers. 
+ m_buffers = dict(m.named_buffers()) + m_loaded_buffers = dict(m_loaded.named_buffers()) + self.assertEqual(len(m_buffers), len(m_loaded_buffers)) + self.assertEqual(m_buffers, m_loaded_buffers) + # Check params and buffers that are/are not meta tensors + self.assertTrue(m_params["foo.weight"].is_meta) + self.assertTrue(m_loaded_params["foo.weight"].is_meta) + self.assertTrue(m_params["foo.bias"].is_meta) + self.assertTrue(m_loaded_params["foo.bias"].is_meta) + self.assertFalse(m_params["bar.weight"].is_meta) + self.assertFalse(m_loaded_params["bar.weight"].is_meta) + self.assertFalse(m_params["bar.bias"].is_meta) + self.assertFalse(m_loaded_params["bar.bias"].is_meta) + self.assertTrue(m_buffers["buffer"].is_meta) + self.assertTrue(m_loaded_buffers["buffer"].is_meta) + + +def script_module_to_buffer(script_module): + module_buffer = io.BytesIO( + script_module._save_to_buffer_for_lite_interpreter(_use_flatbuffer=True) + ) + module_buffer.seek(0) + return module_buffer + + +@unittest.skipIf( + not ENABLE_FLATBUFFER, "Need to enable flatbuffer to run the below tests" +) +class TestSaveLoadFlatbuffer(JitTestCase): + def test_different_modules(self): + """ + Exercise the situation where we have the same qualified name + in two different CompilationUnits on save/load. + """ + + class Foo(torch.nn.Module): + def __init__(self): + super(Foo, self).__init__() + self.foo = torch.nn.Linear(2, 2) + self.bar = torch.nn.Linear(2, 2) + + def forward(self, x): + x = self.foo(x) + x = self.bar(x) + return x + + first_script_module = torch.jit.script(Foo()) + first_saved_module = script_module_to_buffer(first_script_module) + + clear_class_registry() + + class Foo(torch.nn.Module): + def __init__(self): + super(Foo, self).__init__() + self.foo = torch.nn.Linear(2, 2) + + def forward(self, x): + x = self.foo(x) + return x + + second_script_module = torch.jit.script(Foo()) + second_saved_module = script_module_to_buffer(second_script_module) + + clear_class_registry() + + self.assertEqual( + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, + ) + + class ContainsBoth(torch.nn.Module): + def __init__(self): + super().__init__() + self.add_module( + "second", torch.jit.load(second_saved_module) + ) + self.add_module( + "first", torch.jit.load(first_saved_module) + ) + + def forward(self, x): + x = self.first(x) + x = self.second(x) + return x + + sm = torch.jit.script(ContainsBoth()) + contains_both = script_module_to_buffer(sm) + sm = torch.jit.load(contains_both) + + def test_different_functions(self): + """ + Exercise the situation where we have the same qualified name + in two different CompilationUnits on save/load. 
+ """ + + def lol(x): + return x + + class Foo(torch.nn.Module): + def forward(self, x): + return lol(x) + + first_script_module = torch.jit.script(Foo()) + first_saved_module = script_module_to_buffer(first_script_module) + clear_class_registry() + + def lol(x): # noqa: F811 + return "hello" + + class Foo(torch.nn.Module): + def forward(self, x): + return lol(x) + + second_script_module = torch.jit.script(Foo()) + second_saved_module = script_module_to_buffer(second_script_module) + + clear_class_registry() + + self.assertEqual( + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, + ) + + class ContainsBoth(torch.nn.Module): + def __init__(self): + super().__init__() + self.add_module( + "second", torch.jit.load(second_saved_module) + ) + self.add_module( + "first", torch.jit.load(first_saved_module) + ) + + def forward(self, x): + x = self.first(x) + x = self.second(x) + return x + + sm = torch.jit.script(ContainsBoth()) + contains_both = script_module_to_buffer(sm) + sm = torch.jit.load(contains_both) + + def test_different_interfaces(self): + """ + Exercise the situation where we have the same qualified name + in two different CompilationUnits on save/load. + """ + + @torch.jit.interface + class MyInterface(object): + def bar(self, x: Tensor) -> Tensor: + pass + + @torch.jit.script + class ImplementInterface(object): + def __init__(self): + pass + + def bar(self, x): + return x + + class Foo(torch.nn.Module): + __annotations__ = {"interface": MyInterface} + + def __init__(self): + super().__init__() + self.interface = ImplementInterface() + + def forward(self, x): + return self.interface.bar(x) + + first_script_module = torch.jit.script(Foo()) + first_saved_module = script_module_to_buffer(first_script_module) + clear_class_registry() + + @torch.jit.interface + class MyInterface(object): + def not_bar(self, x: Tensor) -> Tensor: + pass + + @torch.jit.script # noqa: F811 + class ImplementInterface(object): # noqa: F811 + def __init__(self): + pass + + def not_bar(self, x): + return x + + class Foo(torch.nn.Module): + __annotations__ = {"interface": MyInterface} + + def __init__(self): + super().__init__() + self.interface = ImplementInterface() + + def forward(self, x): + return self.interface.not_bar(x) + + second_script_module = torch.jit.script(Foo()) + second_saved_module = script_module_to_buffer(second_script_module) + + clear_class_registry() + + self.assertEqual( + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, + ) + + class ContainsBoth(torch.nn.Module): + def __init__(self): + super().__init__() + self.add_module( + "second", torch.jit.load(second_saved_module) + ) + self.add_module( + "first", torch.jit.load(first_saved_module) + ) + + def forward(self, x): + x = self.first(x) + x = self.second(x) + return x + + sm = torch.jit.script(ContainsBoth()) + contains_both = script_module_to_buffer(sm) + sm = torch.jit.load(contains_both) + + def test_many_collisions(self): + class MyCoolNamedTuple(NamedTuple): + a: int + + @torch.jit.interface + class MyInterface(object): + def bar(self, x: Tensor) -> Tensor: + pass + + @torch.jit.script + class ImplementInterface(object): + def __init__(self): + pass + + def bar(self, x): + return x + + def lol(x): + return x + + class Foo(torch.nn.Module): + interface: MyInterface + + def __init__(self): + super().__init__() + self.foo = torch.nn.Linear(2, 2) + self.bar = torch.nn.Linear(2, 2) + self.interface = ImplementInterface() + + def forward(self, x): + x = self.foo(x) + x = 
self.bar(x) + x = lol(x) + x = self.interface.bar(x) + + return x, MyCoolNamedTuple(a=5) + + first_script_module = torch.jit.script(Foo()) + first_saved_module = script_module_to_buffer(first_script_module) + + clear_class_registry() + + @torch.jit.interface + class MyInterface(object): + def not_bar(self, x: Tensor) -> Tensor: + pass + + @torch.jit.script # noqa: F811 + class ImplementInterface(object): # noqa: F811 + def __init__(self): + pass + + def not_bar(self, x): + return x + + def lol(x): # noqa: F811 + return "asdofij" + + class MyCoolNamedTuple(NamedTuple): # noqa: F811 + a: str + + class Foo(torch.nn.Module): + interface: MyInterface + + def __init__(self): + super().__init__() + self.foo = torch.nn.Linear(2, 2) + self.interface = ImplementInterface() + + def forward(self, x): + x = self.foo(x) + self.interface.not_bar(x) + x = lol(x) + return x, MyCoolNamedTuple(a="hello") + + second_script_module = torch.jit.script(Foo()) + second_saved_module = script_module_to_buffer(second_script_module) + + clear_class_registry() + + self.assertEqual( + first_script_module._c.qualified_name, + second_script_module._c.qualified_name, + ) + + class ContainsBoth(torch.nn.Module): + def __init__(self): + super().__init__() + self.add_module( + "second", torch.jit.load(second_saved_module) + ) + self.add_module( + "first", torch.jit.load(first_saved_module) + ) + + def forward(self, x): + x, named_tuple_1 = self.first(x) + x, named_tuple_2 = self.second(x) + return len(x + named_tuple_2.a) + named_tuple_1.a + + sm = torch.jit.script(ContainsBoth()) + contains_both = script_module_to_buffer(sm) + sm = torch.jit.load(contains_both) + + def test_save_load_using_pathlib(self): + class MyMod(torch.jit.ScriptModule): + @torch.jit.script_method + def forward(self, a): + return 2 * a + + m = MyMod() + + # Save then load. + with TemporaryFileName() as fname: + path = pathlib.Path(fname) + torch.jit.save_jit_module_to_flatbuffer(m, path) + m2 = torch.jit.load(path) + + x = torch.tensor([1.0, 2.0, 3.0, 4.0]) + self.assertTrue(torch.equal(m(x), m2(x))) + + def test_save_namedtuple_input_only(self): + """ + Even if a NamedTuple is only used as an input argument, saving and + loading should work correctly. + """ + global FooTuple # see [local resolution in python] + + class FooTuple(NamedTuple): + a: int + + class MyModule(torch.nn.Module): + def forward(self, x: FooTuple) -> torch.Tensor: + return torch.tensor(3) + + m_loaded = self.getExportImportCopy(torch.jit.script(MyModule())) + output = m_loaded(FooTuple(a=5)) + self.assertEqual(output, torch.tensor(3)) + + def test_save_namedtuple_output_only(self): + """ + Even if a NamedTuple is only used as an output argument, saving and + loading should work correctly. + """ + global FooTuple # see [local resolution in python] + + class FooTuple(NamedTuple): + a: int + + class MyModule(torch.nn.Module): + def forward(self) -> Optional[FooTuple]: + return None + + m_loaded = self.getExportImportCopy(torch.jit.script(MyModule())) + output = m_loaded() + self.assertEqual(output, None) + + def test_save_load_params_buffers_submodules(self): + """ + Check that parameters, buffers, and submodules are the same after loading. 
+ """ + + class Submodule(torch.nn.Module): + def __init__(self): + super().__init__() + + class TestModule(torch.nn.Module): + def __init__(self): + super().__init__() + self.add_module("submodule_a", Submodule()) + self.register_parameter( + "parameter_a", torch.nn.Parameter(torch.randn(4)) + ) self.register_buffer("buffer", torch.randn(4)) self.t = torch.rand(4) # not buffer @@ -466,7 +944,9 @@ def __init__(self): m_loaded = self.getExportImportCopy(torch.jit.script(m)) # Check submodules. - self.assertEqual(len(list(m.named_modules())), len(list(m_loaded.named_modules()))) + self.assertEqual( + len(list(m.named_modules())), len(list(m_loaded.named_modules())) + ) for m_s, loaded_s in zip(m.named_modules(), m_loaded.named_modules()): m_name, _ = m_s loaded_name, _ = loaded_s @@ -478,7 +958,9 @@ def __init__(self): self.assertEqual(m_p, loaded_p) # Check buffers. - self.assertEqual(len(list(m.named_buffers())), len(list(m_loaded.named_buffers()))) + self.assertEqual( + len(list(m.named_buffers())), len(list(m_loaded.named_buffers())) + ) for m_b, loaded_b in zip(m.named_buffers(), m_loaded.named_buffers()): m_name, m_buffer = m_b loaded_name, loaded_buffer = loaded_b diff --git a/test/jit/test_symbolic_shape_analysis.py b/test/jit/test_symbolic_shape_analysis.py index cd25caa92b2bbe..e756cdb6788982 100644 --- a/test/jit/test_symbolic_shape_analysis.py +++ b/test/jit/test_symbolic_shape_analysis.py @@ -12,6 +12,7 @@ ) from torch.testing._internal.common_utils import make_tensor from torch.testing._internal.jit_utils import JitTestCase, execWrapper +from typing import List, Any if __name__ == '__main__': raise RuntimeError("This test file is not meant to be run directly, use:\n\n" @@ -498,3 +499,37 @@ def test_shape_function_includes(self): m2_shape = [20, 10] res = torch.jit._shapes.matmul(m1_shape, m2_shape) self.assertEqual(res, [10, 10]) + + def test_register_function_error_checking(self): + # this will error before registering on global map, so + # no issue in overwriting schema mappings + @torch.jit.script + def foo(x, y): + return x + y + + node = foo.graph.findNode("aten::add") + + @torch.jit.script + def wrong_input_types(x, y): + x: List[int] = [] + return x + with self.assertRaisesRegex(RuntimeError, "Expected supertype of int"): + torch._C._jit_register_shape_compute_graph_for_node(node, wrong_input_types.graph) + + @torch.jit.script + def wrong_output_types(x: List[int], y: List[int]): + x: List[Tensor] = [] + return x + + with self.assertRaisesRegex(RuntimeError, "but got graph_type"): + torch._C._jit_register_shape_compute_graph_for_node(node, wrong_output_types.graph) + + @torch.jit.script + def too_many_inputs(x: List[int], y: List[int], z: Any, z2: Any): + x: List[int] = [] + return x + + with self.assertRaises(RuntimeError) as error: + torch._C._jit_register_shape_compute_graph_for_node(node, too_many_inputs.graph) + + self.assertTrue("fewer arguments than schema" in str(error.exception)) diff --git a/test/jit/test_tensor_methods.py b/test/jit/test_tensor_methods.py new file mode 100644 index 00000000000000..c761a3884c9238 --- /dev/null +++ b/test/jit/test_tensor_methods.py @@ -0,0 +1,39 @@ +# Owner(s): ["oncall: jit"] + +import os +import sys + +import torch + +# Make the helper files in test/ importable +pytorch_test_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__))) +sys.path.append(pytorch_test_dir) +from torch.testing._internal.jit_utils import JitTestCase +from torch.testing import FileCheck + +if __name__ == "__main__": + raise RuntimeError( + 
"This test file is not meant to be run directly, use:\n\n" + "\tpython test/test_jit.py TESTNAME\n\n" + "instead." + ) + +class TestTensorMethods(JitTestCase): + def test_getitem(self): + def tensor_getitem(inp: torch.Tensor): + indices = torch.tensor([0, 2], dtype=torch.long) + return inp.__getitem__(indices) + + inp = torch.rand(3, 4) + self.checkScript(tensor_getitem, (inp, )) + + scripted = torch.jit.script(tensor_getitem) + FileCheck().check("aten::index").run(scripted.graph) + + def test_getitem_invalid(self): + def tensor_getitem_invalid(inp: torch.Tensor): + return inp.__getitem__() + + with self.assertRaisesRegexWithHighlight( + RuntimeError, "expected exactly 1 argument", "inp.__getitem__"): + torch.jit.script(tensor_getitem_invalid) diff --git a/test/jit/test_types.py b/test/jit/test_types.py index 9fadbedb272bb5..ca3da3c17c8cd1 100644 --- a/test/jit/test_types.py +++ b/test/jit/test_types.py @@ -39,7 +39,7 @@ def fn(x: torch.Tensor) -> Tuple[Tuple[torch.Tensor], Dict[str, int]]: expected = fn(x) scripted = torch.jit.script(fn)(x) - self.assertEquals(expected, scripted) + self.assertEqual(expected, scripted) def test_types_as_values(self): def fn(m: torch.Tensor) -> torch.device: diff --git a/test/lazy/__init__.py b/test/lazy/__init__.py new file mode 100644 index 00000000000000..e69de29bb2d1d6 diff --git a/test/lazy/test_bindings.py b/test/lazy/test_bindings.py new file mode 100644 index 00000000000000..57151d4085602b --- /dev/null +++ b/test/lazy/test_bindings.py @@ -0,0 +1,7 @@ +# Owner(s): ["oncall: jit"] + +import torch._lazy.metrics + +def test_metrics(): + names = torch._lazy.metrics.counter_names() + assert len(names) == 0, f"Expected no counter names, but got {names}" diff --git a/test/lazy/test_extract_compiled_graph.py b/test/lazy/test_extract_compiled_graph.py new file mode 100644 index 00000000000000..f4152d0af68bf3 --- /dev/null +++ b/test/lazy/test_extract_compiled_graph.py @@ -0,0 +1,195 @@ +# Owner(s): ["oncall: jit"] + +import unittest + +from torch._lazy.ts_backend import init as init_ts_backend +init_ts_backend() +from torch._lazy import config +from torch._lazy.extract_compiled_graph import extract_compiled_graph +import torch +from torch import nn +import dis +import inspect +from torch import fx +import re +from contextlib import contextmanager +import copy + +class ModuleConstScale(nn.Module): + def __init__(self): + super(ModuleConstScale, self).__init__() + + def forward(self, a): + return a * 2 + +class ModuleSub(nn.Module): + def __init__(self): + super(ModuleSub, self).__init__() + + def forward(self, a, b): + return a - b + +class ModuleAddcmul(nn.Module): + """ + addcmul function takes a at::Scalar which results in a special TSData containing a Scalar rather than a Tensor. + """ + def __init__(self): + super(ModuleAddcmul, self).__init__() + + def forward(self, a, b, c): + return torch.addcmul(a, b, c, value=5) + +class ModuleReturnMulti(nn.Module): + def __init__(self): + super(ModuleReturnMulti, self).__init__() + + def forward(self, a, b): + return (b + 1, a - 1) + +# The default fx tracer will convert torch.randn to a constant.. We may need +# a custom tracer. +# class ModuleEagerTensor(nn.Module): +# def __init__(self): +# super(ModuleEagerTensor, self).__init__() +# +# def forward(self, a): +# b = torch.randn(2, 3, device="cpu") # eager device +# return a + b + +# The module was planned to cover the case that a Fx graph return an eager +# tensor on the default device. 
It's harder than ModuleEagerTensor because +# we cannot just override the device argument to Lazy since there is no +# explicit device argument. +# +# Unfortunately, the default fx tracer converts the return value of the forward +# method to a constant. Commented out for now. +# class ModuleReturnEagerTensorOnDefaultDevice(nn.Module): +# def __init__(self): +# super(ModuleReturnEagerTensorOnDefaultDevice, self).__init__() +# +# def forward(self): +# return torch.tensor((2, 3), dtype=torch.float32) + +class ModuleReturnDupTensor(nn.Module): + """ + Handle the corner case where the same tensor appears multiple times in the + returned tuple. torchbench models like drq hit this corner case when running + through torchdynamo. + """ + def __init__(self): + super(ModuleReturnDupTensor, self).__init__() + + def forward(self, a, b): + c = a + b + return a - b, c, a + 1, c + +class ModuleInplaceUpdate(nn.Module): + def __init__(self): + super(ModuleInplaceUpdate, self).__init__() + + def forward(self, a, b): + a.sub_(b) + return b - 1, b + 1 + +@contextmanager +def force_fallback_ctx_mgr(fallback_op): + oldconfig = config.get_force_fallback() + config.set_force_fallback(fallback_op) + try: + yield None + finally: + config.set_force_fallback(oldconfig) + +@contextmanager +def nop_ctx_mgr(): + try: + yield None + finally: + pass + +def gen_rand_args(mod): + args = [] + for _ in range(len(inspect.signature(mod.forward).parameters)): + args.append(torch.randn(2, 3)) + return args + +def allclose(expected, actual): + def unwrap(cont): + if isinstance(cont, (list, tuple)) and len(cont) == 1: + return cont[0] + return cont + expected = unwrap(expected) + actual = unwrap(actual) + + if isinstance(expected, torch.Tensor) and isinstance(actual, torch.Tensor): + return torch.allclose(expected, actual) + elif isinstance(expected, (tuple, list)) and isinstance(actual, (tuple, list)): + return len(expected) == len(actual) and all(torch.allclose(a, b) for a, b in zip(expected, actual)) + else: + raise RuntimeError("Unexpected types") + +def verify_reusing_compiled_graph(mod, exception_msg_pattern, ncase=10): + args = gen_rand_args(mod) + out = mod(*args) + + dis.dis(mod.forward) + + try: + optimized_mod = extract_compiled_graph(fx.symbolic_trace(mod), args) + except RuntimeError as e: + if exception_msg_pattern is None: + raise e # reraise the exception + exception_message = str(e) + if not re.search(exception_msg_pattern, exception_message): + raise RuntimeError(f"Exception message does not match the required pattern: {exception_message}") + else: + # We are done for the test case that expects an exception + return + + if exception_msg_pattern is not None: + raise RuntimeError(f"Expected an exception matching pattern {exception_msg_pattern}") + print("return value of optimized_mod", optimized_mod(*args)) + + # check correctness + failed_index = [] + for i in range(ncase): + rand_args = gen_rand_args(mod) + rand_args_copy = copy.deepcopy(rand_args) + expected = mod(*rand_args) + actual = optimized_mod(*rand_args_copy) + + if not allclose(expected, actual): + print(f"Incorrect results. expected {expected}, actual {actual}") + failed_index.append(i) + continue + + # make sure arguments match after calling the model forward method to handle inplace + # updates. + if not allclose(rand_args, rand_args_copy): + print(f"Incorrect updated arguments.
expected {rand_args}, actual {rand_args_copy}") + failed_index.append(i) + continue + + if len(failed_index) > 0: + raise RuntimeError(f"Failed {len(failed_index)}/{ncase} cases") + +def maketest(module_cls, exception_msg_pattern=None, ctxmgr=None): + def wrapper(self): + nonlocal ctxmgr + if not ctxmgr: + ctxmgr = nop_ctx_mgr() + with ctxmgr: + verify_reusing_compiled_graph(module_cls(), exception_msg_pattern) + + return wrapper + +class OptimizeTest(unittest.TestCase): + test_sub = maketest(ModuleSub) + # Same as test_sub but force aten::sub to fall back. + # We expect an exception to be raised because of the LTC fallback. + test_ltc_fallback = maketest(ModuleSub, exception_msg_pattern="fallback.*aten::sub", ctxmgr=force_fallback_ctx_mgr("aten::sub")) + test_const_scale = maketest(ModuleConstScale) + test_addcmul = maketest(ModuleAddcmul) + test_return_multi = maketest(ModuleReturnMulti) + test_return_dup_tensor = maketest(ModuleReturnDupTensor) + test_inplace_update = maketest(ModuleInplaceUpdate) diff --git a/test/lazy/test_ts_opinfo.py b/test/lazy/test_ts_opinfo.py new file mode 100644 index 00000000000000..87f007b93a1e26 --- /dev/null +++ b/test/lazy/test_ts_opinfo.py @@ -0,0 +1,160 @@ +# Owner(s): ["oncall: jit"] + +from typing import Sequence +import torch +import functools + +from torch.testing._internal.common_utils import run_tests, TestCase +from torch.testing._internal.jit_utils import JitTestCase +from torch.testing._internal.common_methods_invocations import op_db +from torch.testing._internal.common_device_type import ops, instantiate_device_type_tests +import torch._lazy +import torch._lazy.metrics +import torch._lazy.ts_backend +import itertools +import yaml +import os +import pathlib + +torch._lazy.ts_backend.init() + +def get_test_device(): + return 'cuda' if 'LTC_TS_CUDA' in os.environ else 'cpu' + +def remove_suffixes(l): + return [x.split(".")[0] for x in l] + +def init_lists(): + path_to_script = pathlib.Path(os.path.abspath(os.path.dirname(__file__))) + TS_NATIVE_FUNCTIONS_PATH = path_to_script.parent.parent / "aten/src/ATen/native/ts_native_functions.yaml" + with open(TS_NATIVE_FUNCTIONS_PATH) as f: + yaml_ts = yaml.load(f, yaml.Loader) + LAZY_OPS_LIST = set(remove_suffixes(itertools.chain(yaml_ts["full_codegen"], yaml_ts["supported"], yaml_ts["autograd"]))) + FALLBACK_LIST = set(["clamp"]) + SKIP_RUNTIME_ERROR_LIST = set([ + 'index_select', # Empty output_sizes is not supported + 'clone', # is clone decomposed? + 'all', # ASAN failure https://github.com/pytorch/pytorch/issues/74519 + 'any', # ASAN failure https://github.com/pytorch/pytorch/issues/74519 + 'logdet', # ASAN failure https://github.com/pytorch/pytorch/issues/74519 + ]) + SKIP_INCORRECT_RESULTS_LIST = set([ + 'squeeze', # Value out of range + 't', # Value out of range + 'transpose', # Value out of range + 'bernoulli', # incorrect results + 'pow', # incorrect results + 'addcdiv', # incorrect results (on CI not locally?)
+ ]) + + return (LAZY_OPS_LIST, FALLBACK_LIST, SKIP_RUNTIME_ERROR_LIST, SKIP_INCORRECT_RESULTS_LIST) + +(LAZY_OPS_LIST, FALLBACK_LIST, SKIP_RUNTIME_ERROR_LIST, SKIP_INCORRECT_RESULTS_LIST) = init_lists() + +torch.manual_seed(42) + +class TestLazyTensor(JitTestCase): + def testConvolutionBackward(self): + def clone_move(t): + dev = 'lazy' + copy_t = t.detach().clone().requires_grad_(True).to(device=dev) + return copy_t + + test_device = get_test_device() + inp = torch.rand(1, 3, 128, 128, device=test_device, requires_grad=True) + inp_copy = clone_move(inp) + grad = torch.rand(1, 32, 121, 121, device=test_device) # no requires_grad + grad_copy = clone_move(grad) + weight = torch.rand(32, 3, 8, 8, device=test_device, requires_grad=True) + weight_copy = clone_move(weight) + bias = torch.rand(32, device=test_device, requires_grad=True) + bias_copy = clone_move(bias) + + # run eager + conv_out = torch.nn.functional.conv2d(inp, weight, bias) + (inp_grad, weight_grad, bias_grad) = torch.autograd.grad([conv_out], [inp, weight, bias], [grad]) + + # run lazy + conv_copy_out = torch.nn.functional.conv2d(inp_copy, weight_copy, bias_copy) + (inp_copy_grad, weight_copy_grad, bias_copy_grad) = torch.autograd.grad( + [conv_copy_out], [inp_copy, weight_copy, bias_copy], [grad_copy]) + + # check numerics + torch.testing.assert_close(bias_copy_grad.cpu(), bias_grad.cpu()) + + torch.testing.assert_close(weight_copy_grad.cpu(), weight_grad.cpu()) + torch.testing.assert_close(inp_copy_grad.cpu(), inp_grad.cpu()) + +class TestLazyOpInfo(TestCase): + + @ops([op for op in op_db if op.name in LAZY_OPS_LIST and op.name not in SKIP_RUNTIME_ERROR_LIST], allowed_dtypes=(torch.float,)) + def test_dispatched_to_lazy(self, device, dtype, op): + def get_name(op): + l = [op.name] + if op.variant_test_name != '': + l.append(op.variant_test_name) + return '.'.join(l) + + global FALLBACK_LIST + samples = op.sample_inputs("lazy", dtype, requires_grad=False) + sample = list(samples)[0] + args = [sample.input] + list(sample.args) + kwargs = sample.kwargs + torch._lazy.mark_step() + torch._lazy.wait_device_ops() + torch._lazy.metrics.reset() + + r = op(*args, **kwargs) + torch._lazy.mark_step() + torch._lazy.wait_device_ops() + prefix = "aten" if op.name in FALLBACK_LIST else "lazy" + found = f"{prefix}::{op.name}" in remove_suffixes(torch._lazy.metrics.counter_names()) + # check aliases + if not found: + for alias in op.aliases: + alias_found = f"{prefix}::{alias.name}" in remove_suffixes(torch._lazy.metrics.counter_names()) + found = found or alias_found + if found: + break + self.assertTrue(found) + + + @ops([op for op in op_db if op.name in LAZY_OPS_LIST and op.name not in SKIP_RUNTIME_ERROR_LIST | SKIP_INCORRECT_RESULTS_LIST], allowed_dtypes=(torch.float,)) # noqa: B950 + def test_correctness(self, device, dtype, op): + + test_device = get_test_device() + + def clone_to_device(input, dev): + if isinstance(input, torch.Tensor): + return input.detach().clone().to(device=dev) + if isinstance(input, Sequence) and not isinstance(input, str): + return tuple(map(functools.partial(clone_to_device, dev=dev), input)) + return input + + def assert_allclose_rec(t): + a, b = t + self.assertEqual(type(a), type(b)) + if isinstance(a, torch.Tensor): + self.assertTrue(torch.allclose(clone_to_device(a, test_device), b, atol=1e-4)) + + if isinstance(a, Sequence): + # recurse eagerly; a bare map() would be lazy and never run the assertions + for pair in zip(a, b): + assert_allclose_rec(pair) + + samples = op.sample_inputs("lazy", dtype, requires_grad=False) + for sample in samples: + args = [sample.input] + list(sample.args) + kwargs =
sample.kwargs + copy_args = clone_to_device(args, test_device) + + r_exp = op(*copy_args, **kwargs) + r_actual = op(*args, **kwargs) + + assert_allclose_rec((r_actual, r_exp)) + +# TODO: after we move to master, add Lazy as a new Device here: +# https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_device_type.py#L532 +instantiate_device_type_tests(TestLazyOpInfo, globals(), only_for="cpu") + + +if __name__ == '__main__': + run_tests() diff --git a/test/mobile/lightweight_dispatch/test_codegen_unboxing.cpp b/test/mobile/lightweight_dispatch/test_codegen_unboxing.cpp index 2c0002505554b7..07a845d6008ba0 100644 --- a/test/mobile/lightweight_dispatch/test_codegen_unboxing.cpp +++ b/test/mobile/lightweight_dispatch/test_codegen_unboxing.cpp @@ -1,5 +1,6 @@ #include #include +#include #include #include #include @@ -190,6 +191,29 @@ TEST(LiteInterpreterTest, DivideTensor) { AT_ASSERT(result_1.toList().get(0).toTensor().equal(expected_1)); AT_ASSERT(result_1.toList().get(1).toTensor().equal(expected_2)); } + +TEST(LiteInterpreterTest, MultipleOps) { + // Load check in model: multiple_ops.ptl + auto testModelFile = "multiple_ops.ptl"; + + // class Model(torch.nn.Module): + // def __init__(self): + // super(Model, self).__init__() + // self.ops = torch.nn.Sequential( + // torch.nn.ReLU(), + // torch.nn.Flatten(), + // ) + // def forward(self, x): + // x[1] = -2 + // return self.ops(x) + + Module bc = _load_for_mobile(testModelFile); + auto b = at::ones({2, 2, 2, 2}); + const auto result = bc.forward({b}); + + at::Tensor expected = torch::tensor({{1, 1, 1, 1, 1, 1, 1, 1}, {0, 0, 0, 0, 0, 0, 0, 0}}, c10::TensorOptions(c10::ScalarType::Float)); + AT_ASSERT(result.toTensor().equal(expected)); +} } // namespace mobile } // namespace jit } // namespace torch diff --git a/test/mobile/lightweight_dispatch/tests_setup.py b/test/mobile/lightweight_dispatch/tests_setup.py index 8b1fd6f72998e2..91af29796b9d9e 100644 --- a/test/mobile/lightweight_dispatch/tests_setup.py +++ b/test/mobile/lightweight_dispatch/tests_setup.py @@ -150,6 +150,28 @@ def forward(self, b): script_model._save_for_lite_interpreter(self.path) +class ModelWithMultipleOps(FileSetup): + path = 'multiple_ops.ptl' + + def setup(self): + class Model(torch.nn.Module): + def __init__(self): + super(Model, self).__init__() + self.ops = torch.nn.Sequential( + torch.nn.ReLU(), + torch.nn.Flatten(), + ) + + def forward(self, x): + x[1] = -2 + return self.ops(x) + + model = Model() + # Script the model and save + script_model = torch.jit.script(model) + script_model._save_for_lite_interpreter(self.path) + + tests = [ ModelWithDTypeDeviceLayoutPinMemory(), ModelWithTensorOptional(), @@ -159,6 +181,7 @@ def forward(self, b): ModelWithArrayOfInt(), ModelWithTensors(), ModelWithStringOptional(), + ModelWithMultipleOps(), ] diff --git a/test/mobile/model_test/android_api_module.py b/test/mobile/model_test/android_api_module.py new file mode 100644 index 00000000000000..109e3aa963e8f4 --- /dev/null +++ b/test/mobile/model_test/android_api_module.py @@ -0,0 +1,128 @@ +from typing import Dict, List, Tuple, Optional + +import torch +from torch import Tensor + + +class AndroidAPIModule(torch.jit.ScriptModule): + def __init__(self): + super(AndroidAPIModule, self).__init__() + + @torch.jit.script_method + def forward(self, input): + return None + + @torch.jit.script_method + def eqBool(self, input: bool) -> bool: + return input + + @torch.jit.script_method + def eqInt(self, input: int) -> int: + return input + + @torch.jit.script_method + 
def eqFloat(self, input: float) -> float: + return input + + @torch.jit.script_method + def eqStr(self, input: str) -> str: + return input + + @torch.jit.script_method + def eqTensor(self, input: Tensor) -> Tensor: + return input + + @torch.jit.script_method + def eqDictStrKeyIntValue(self, input: Dict[str, int]) -> Dict[str, int]: + return input + + @torch.jit.script_method + def eqDictIntKeyIntValue(self, input: Dict[int, int]) -> Dict[int, int]: + return input + + @torch.jit.script_method + def eqDictFloatKeyIntValue(self, input: Dict[float, int]) -> Dict[float, int]: + return input + + @torch.jit.script_method + def listIntSumReturnTuple(self, input: List[int]) -> Tuple[List[int], int]: + sum = 0 + for x in input: + sum += x + return (input, sum) + + @torch.jit.script_method + def listBoolConjunction(self, input: List[bool]) -> bool: + res = True + for x in input: + res = res and x + return res + + @torch.jit.script_method + def listBoolDisjunction(self, input: List[bool]) -> bool: + res = False + for x in input: + res = res or x + return res + + @torch.jit.script_method + def tupleIntSumReturnTuple( + self, input: Tuple[int, int, int] + ) -> Tuple[Tuple[int, int, int], int]: + sum = 0 + for x in input: + sum += x + return (input, sum) + + @torch.jit.script_method + def optionalIntIsNone(self, input: Optional[int]) -> bool: + return input is None + + @torch.jit.script_method + def intEq0None(self, input: int) -> Optional[int]: + if input == 0: + return None + return input + + @torch.jit.script_method + def str3Concat(self, input: str) -> str: + return input + input + input + + @torch.jit.script_method + def newEmptyShapeWithItem(self, input): + return torch.tensor([int(input.item())])[0] + + @torch.jit.script_method + def testAliasWithOffset(self) -> List[Tensor]: + x = torch.tensor([100, 200]) + a = [x[0], x[1]] + return a + + @torch.jit.script_method + def testNonContiguous(self): + x = torch.tensor([100, 200, 300])[::2] + assert not x.is_contiguous() + assert x[0] == 100 + assert x[1] == 300 + return x + + @torch.jit.script_method + def conv2d(self, x: Tensor, w: Tensor, toChannelsLast: bool) -> Tensor: + r = torch.nn.functional.conv2d(x, w) + if toChannelsLast: + r = r.contiguous(memory_format=torch.channels_last) + else: + r = r.contiguous() + return r + + @torch.jit.script_method + def contiguous(self, x: Tensor) -> Tensor: + return x.contiguous() + + @torch.jit.script_method + def contiguousChannelsLast(self, x: Tensor) -> Tensor: + return x.contiguous(memory_format=torch.channels_last) + + @torch.jit.script_method + def contiguousChannelsLast3d(self, x: Tensor) -> Tensor: + return x.contiguous(memory_format=torch.channels_last_3d) diff --git a/test/mobile/model_test/builtin_ops.py b/test/mobile/model_test/builtin_ops.py new file mode 100644 index 00000000000000..75b57f7b0613d8 --- /dev/null +++ b/test/mobile/model_test/builtin_ops.py @@ -0,0 +1,125 @@ +import torch + + +# https://pytorch.org/docs/stable/jit_builtin_functions.html#builtin-functions + + +class TSBuiltinOpsModule(torch.nn.Module): + def __init__(self): + super(TSBuiltinOpsModule, self).__init__() + + def forward(self): + x = torch.tensor(1) + y = torch.tensor(0.5) + b = float(1) + s = "abcde" + l = ["1", "2", "test", "a{}b"] + d = {"key": 1} + d2 = {0: 100} + return len( + # type + bool(x), + bool(x.item()), + int(y), + int(y.item()), + float(x), + float(x.item()), + # math + x & x, + bool(x) & bool(x), + int(x) & int(x), + x | x, + bool(x) | bool(x), + int(x) | int(x), + x << x, + int(x) << int(x), + x >> x, + 
int(x) >> int(x), + x ^ x, + bool(x) ^ bool(x), + int(x) ^ int(x), + b * float(x), + b * int(x), + b + float(x), + b - float(x), + x.item() + y.item(), + x.item() - y.item(), + x.item() * y.item(), + x.item() / y.item(), + float(x) < float(y), + float(x) <= float(y), + float(x) > float(y), + float(x) > int(y), + float(x) >= float(y), + float(x) >= int(y), + float(x) == float(y), + float(x) == int(y), + float(x) != float(y), + int(x) != float(y), + float(x) / float(y), + int(x) / int(y), + max(x), + max(x.item(), y.item()), + max(int(x), int(y)), + max(float(x), float(y)), + min(x), + min(x.item(), y.item()), + min(int(x), int(y)), + min(float(x), float(y)), + int(l[0]), + float(l[0]), + # string + str(torch.tensor(1)), + l[2].find("t"), + l[2].replace("t", "x"), + l[2].lower(), + l[2].startswith("t"), + l[2].split("t"), + l[2].strip(), + l[2].rstrip(), + l[2].lstrip(), + l[2][slice(2)], + l[3].format("x"), + ord(l[2][0]), + len(torch.randn(3)), + len(l), + len(l[2]), + len(d), + len(d2), + ) + + +class TSCollectionOpsModule(torch.nn.Module): + def __init__(self): + super(TSCollectionOpsModule, self).__init__() + + def forward(self): + s = "abcde" + # list + l = ["1", "2", "test"] + l.reverse() + l.reverse() + l[1] = "3" + l.extend(["4"]) + # str dict + d = {"key": 1} + d.clear() + d.update({"key": 0}) + if "key" in d: + d["key"] = 2 + # int dict + d2 = {0: 100} + if 0 in d2: + d2.clear() + d2[0] = 100 + + return len( + s[torch.tensor(1)], + d["key"], + d2[0], + d.keys(), + d.items(), + d.values(), + d2.values(), + l.pop(), + ) diff --git a/test/mobile/model_test/coverage.yaml b/test/mobile/model_test/coverage.yaml new file mode 100644 index 00000000000000..5433fea4df1020 --- /dev/null +++ b/test/mobile/model_test/coverage.yaml @@ -0,0 +1,1094 @@ +_coverage: 87.53 +_covered_ops: 344 +_generated_ops: 693 +_production_ops: 393 +_uncovered_ops: 49 +all_generated_ops: +- aten::Bool.Tensor +- aten::Bool.int +- aten::Float.Scalar +- aten::Float.Tensor +- aten::Float.str +- aten::FloatImplicit +- aten::Int.Scalar +- aten::Int.Tensor +- aten::Int.float +- aten::Int.str +- aten::IntImplicit +- aten::ScalarImplicit +- aten::__and__.Tensor +- aten::__and__.bool +- aten::__and__.int +- aten::__contains__.int +- aten::__contains__.int_list +- aten::__contains__.str +- aten::__contains__.str_list +- aten::__derive_index +- aten::__getitem__.str +- aten::__getitem__.t +- aten::__lshift__.Tensor +- aten::__lshift__.int +- aten::__or__.Tensor +- aten::__or__.bool +- aten::__or__.int +- aten::__range_length +- aten::__rshift__.Tensor +- aten::__rshift__.int +- aten::__xor__.Tensor +- aten::__xor__.bool +- aten::__xor__.int +- aten::_infer_size +- aten::_set_item.int +- aten::_set_item.str +- aten::_set_item.t +- aten::_shape_as_tensor +- aten::_unique2 +- aten::abs +- aten::acos +- aten::acosh +- aten::adaptive_avg_pool1d +- aten::adaptive_avg_pool2d +- aten::adaptive_avg_pool3d +- aten::adaptive_max_pool1d +- aten::adaptive_max_pool2d +- aten::adaptive_max_pool3d +- aten::add +- aten::add.Scalar +- aten::add.Tensor +- aten::add.float +- aten::add.int +- aten::add.out +- aten::add.str +- aten::add.t +- aten::add_.Scalar +- aten::add_.Tensor +- aten::add_.t +- aten::addbmm +- aten::addcdiv +- aten::addcmul +- aten::addmm +- aten::addmv +- aten::addr +- aten::all +- aten::allclose +- aten::alpha_dropout +- aten::alpha_dropout_ +- aten::amax +- aten::amin +- aten::aminmax +- aten::angle +- aten::any +- aten::append.t +- aten::arange +- aten::arange.start +- aten::arange.start_step +- aten::argmax +- 
aten::argmin +- aten::argsort +- aten::as_strided +- aten::as_tensor.list +- aten::asin +- aten::asinh +- aten::atan +- aten::atan2 +- aten::atanh +- aten::atleast_1d +- aten::atleast_2d +- aten::atleast_3d +- aten::avg_pool1d +- aten::avg_pool2d +- aten::avg_pool3d +- aten::baddbmm +- aten::bartlett_window +- aten::batch_norm +- aten::bernoulli +- aten::bernoulli_.float +- aten::bilinear +- aten::binary_cross_entropy +- aten::binary_cross_entropy_with_logits +- aten::bincount +- aten::bitwise_and.Tensor +- aten::bitwise_not +- aten::bitwise_or.Tensor +- aten::bitwise_xor.Tensor +- aten::blackman_window +- aten::block_diag +- aten::bmm +- aten::broadcast_tensors +- aten::broadcast_to +- aten::bucketize.Tensor +- aten::cartesian_prod +- aten::cat +- aten::cauchy_ +- aten::cdist +- aten::ceil +- aten::ceil.Scalar +- aten::ceil.float +- aten::celu +- aten::chain_matmul +- aten::channel_shuffle +- aten::chunk +- aten::clamp +- aten::clamp_ +- aten::clamp_min +- aten::clear.int +- aten::clear.str +- aten::clone +- aten::coalesce +- aten::col2im +- aten::column_stack +- aten::combinations +- aten::complex +- aten::conj +- aten::constant_pad_nd +- aten::contiguous +- aten::conv1d +- aten::conv2d +- aten::conv3d +- aten::conv_transpose1d +- aten::conv_transpose2d.input +- aten::conv_transpose3d.input +- aten::copy_ +- aten::copy_.float +- aten::copy_.int +- aten::copysign.Scalar +- aten::copysign.Tensor +- aten::corrcoef +- aten::cos +- aten::cosh +- aten::cosine_embedding_loss +- aten::cosine_similarity +- aten::count_nonzero +- aten::cpu +- aten::cross +- aten::cross_entropy_loss +- aten::ctc_loss.Tensor +- aten::cummax +- aten::cummin +- aten::cumprod +- aten::cumsum +- aten::cumulative_trapezoid.x +- aten::deg2rad +- aten::dense_dim +- aten::dequantize.self +- aten::detach +- aten::detach_ +- aten::diag +- aten::diag_embed +- aten::diagflat +- aten::diagonal +- aten::diagonal_scatter +- aten::diff +- aten::digamma +- aten::dist +- aten::div +- aten::div.Scalar +- aten::div.Tensor +- aten::div.Tensor_mode +- aten::div.float +- aten::div.int +- aten::div_.Tensor +- aten::dot +- aten::dropout +- aten::dropout_ +- aten::dsplit.array +- aten::dstack +- aten::einsum +- aten::element_size +- aten::elu +- aten::embedding +- aten::embedding_bag.padding_idx +- aten::empty.memory_format +- aten::empty_like +- aten::empty_strided +- aten::eq.Scalar +- aten::eq.Tensor +- aten::eq.float +- aten::eq.float_int +- aten::eq.int +- aten::eq.int_list +- aten::eq.str +- aten::equal +- aten::erf +- aten::erfc +- aten::erfinv +- aten::exp +- aten::exp.float +- aten::exp2 +- aten::expand +- aten::expand_as +- aten::expm1 +- aten::exponential_ +- aten::extend.t +- aten::eye +- aten::fake_quantize_per_channel_affine +- aten::fake_quantize_per_tensor_affine +- aten::feature_alpha_dropout +- aten::feature_alpha_dropout_ +- aten::feature_dropout +- aten::feature_dropout_ +- aten::fill_.Scalar +- aten::fill_diagonal_ +- aten::find +- aten::flatten.using_ints +- aten::flip +- aten::fliplr +- aten::flipud +- aten::float_power.Tensor_Scalar +- aten::float_power.Tensor_Tensor +- aten::floor +- aten::floor.float +- aten::floor_divide +- aten::floor_divide.Scalar +- aten::floordiv.int +- aten::fmax +- aten::fmin +- aten::fmod.Scalar +- aten::frac +- aten::fractional_max_pool2d +- aten::fractional_max_pool3d +- aten::frobenius_norm.dim +- aten::frobenius_norm.out +- aten::full +- aten::full_like +- aten::gather +- aten::gcd +- aten::ge.Scalar +- aten::ge.Tensor +- aten::ge.float +- aten::ge.float_int +- aten::ge.int +- aten::gelu 
+- aten::geometric_ +- aten::glu +- aten::grid_sampler +- aten::group_norm +- aten::gru.input +- aten::gru_cell +- aten::gt.Scalar +- aten::gt.Tensor +- aten::gt.float +- aten::gt.float_int +- aten::gt.int +- aten::hamming_window +- aten::hann_window +- aten::hardshrink +- aten::hardsigmoid +- aten::hardsigmoid_ +- aten::hardswish +- aten::hardswish_ +- aten::hardtanh +- aten::hardtanh_ +- aten::heaviside +- aten::hinge_embedding_loss +- aten::histc +- aten::histogram.bin_ct +- aten::hsplit.array +- aten::hstack +- aten::huber_loss +- aten::hypot +- aten::i0 +- aten::igamma +- aten::igammac +- aten::im2col +- aten::imag +- aten::index.Tensor +- aten::index_fill.int_Scalar +- aten::index_put.hacked_twin +- aten::index_put_.hacked_twin +- aten::index_select +- aten::inner +- aten::instance_norm +- aten::is_coalesced +- aten::is_complex +- aten::is_conj +- aten::is_contiguous +- aten::is_floating_point +- aten::is_leaf +- aten::is_nonzero +- aten::is_pinned +- aten::is_set_to +- aten::is_signed +- aten::isclose +- aten::isfinite +- aten::isin.Tensor_Tensor +- aten::isinf +- aten::isnan +- aten::isneginf +- aten::isposinf +- aten::isreal +- aten::istft +- aten::item +- aten::items.str +- aten::kaiser_window +- aten::keys.str +- aten::kl_div +- aten::kron +- aten::kthvalue +- aten::l1_loss +- aten::layer_norm +- aten::lcm +- aten::ldexp.Tensor +- aten::le.Scalar +- aten::le.Tensor +- aten::le.float +- aten::le.int +- aten::leaky_relu +- aten::leaky_relu_ +- aten::len.Dict_int +- aten::len.Dict_str +- aten::len.Tensor +- aten::len.str +- aten::len.t +- aten::lerp.Scalar +- aten::lerp.Tensor +- aten::lgamma +- aten::linalg_matrix_exp +- aten::linalg_matrix_power +- aten::linear +- aten::linspace +- aten::list.t +- aten::log +- aten::log10 +- aten::log1p +- aten::log2 +- aten::log_normal_ +- aten::log_sigmoid +- aten::log_softmax.int +- aten::logaddexp +- aten::logaddexp2 +- aten::logcumsumexp +- aten::logical_and +- aten::logical_and.out +- aten::logical_not +- aten::logical_not.out +- aten::logical_or +- aten::logical_or.out +- aten::logical_xor +- aten::logical_xor.out +- aten::logit +- aten::logspace +- aten::logsumexp +- aten::lower +- aten::lstm.input +- aten::lstm_cell +- aten::lstrip +- aten::lt.Scalar +- aten::lt.Tensor +- aten::lt.float +- aten::lt.int +- aten::margin_ranking_loss +- aten::masked_fill.Scalar +- aten::masked_fill_.Scalar +- aten::masked_select +- aten::matmul +- aten::max +- aten::max.dim +- aten::max.other +- aten::max_pool1d +- aten::max_pool2d +- aten::max_pool3d +- aten::maximum +- aten::mean +- aten::mean.dim +- aten::median +- aten::meshgrid +- aten::meshgrid.indexing +- aten::min +- aten::min.dim +- aten::min.other +- aten::minimum +- aten::mish +- aten::mm +- aten::mode +- aten::movedim.int +- aten::mse_loss +- aten::msort +- aten::mul +- aten::mul.Scalar +- aten::mul.Tensor +- aten::mul.float +- aten::mul.float_int +- aten::mul.int +- aten::mul.int_float +- aten::mul.left_t +- aten::mul.out +- aten::mul_.Scalar +- aten::mul_.Tensor +- aten::multi_margin_loss +- aten::multilabel_margin_loss +- aten::multinomial +- aten::mv +- aten::mvlgamma +- aten::nan_to_num +- aten::nan_to_num_ +- aten::nanmean +- aten::nanmedian +- aten::nanquantile +- aten::nansum +- aten::narrow +- aten::ne.Scalar +- aten::ne.Tensor +- aten::ne.float +- aten::ne.int +- aten::ne.int_float +- aten::ne.int_list +- aten::ne.str +- aten::neg +- aten::neg.int +- aten::new_empty +- aten::new_full +- aten::new_ones +- aten::new_zeros +- aten::nll_loss_nd +- aten::nonzero +- aten::norm.Scalar +- 
aten::norm.ScalarOpt_dim +- aten::norm.ScalarOpt_dim_dtype +- aten::norm.dtype_out +- aten::norm.out +- aten::normal.float_float +- aten::normal_ +- aten::nuclear_norm +- aten::nuclear_norm.dim +- aten::nuclear_norm.dim_out +- aten::nuclear_norm.out +- aten::numel +- aten::one_hot +- aten::ones +- aten::ones_like +- aten::ord +- aten::outer +- aten::pad_sequence +- aten::pairwise_distance +- aten::pdist +- aten::permute +- aten::pixel_shuffle +- aten::pixel_unshuffle +- aten::poisson +- aten::poisson_nll_loss +- aten::polar +- aten::polygamma +- aten::pop.t +- aten::pow.Tensor_Scalar +- aten::pow.Tensor_Tensor +- aten::pow.int_float +- aten::prelu +- aten::prod +- aten::quantile +- aten::quantile.scalar +- aten::quantize_per_channel +- aten::quantize_per_tensor +- aten::quantize_per_tensor.tensor_qparams +- aten::quantized_gru.input +- aten::quantized_lstm.input +- aten::rad2deg +- aten::rand +- aten::rand_like +- aten::randint +- aten::randint.low +- aten::randint_like +- aten::randn +- aten::randn_like +- aten::random_ +- aten::randperm +- aten::range.step +- aten::ravel +- aten::real +- aten::reciprocal +- aten::reflection_pad1d +- aten::reflection_pad2d +- aten::reflection_pad3d +- aten::relu +- aten::relu_ +- aten::remainder.Scalar +- aten::remainder.int +- aten::renorm +- aten::repeat +- aten::repeat_interleave.Tensor +- aten::replace +- aten::replication_pad1d +- aten::replication_pad2d +- aten::replication_pad3d +- aten::requires_grad_ +- aten::reshape +- aten::resize_as_ +- aten::resolve_conj +- aten::resolve_neg +- aten::reverse.t +- aten::rnn_tanh.input +- aten::rnn_tanh_cell +- aten::roll +- aten::rot90 +- aten::round +- aten::round.Scalar +- aten::rrelu +- aten::rsqrt +- aten::rstrip +- aten::scatter.src +- aten::scatter_.src +- aten::scatter_add +- aten::scatter_add_ +- aten::searchsorted.Tensor +- aten::select.int +- aten::select_scatter +- aten::selu +- aten::sgn +- aten::sigmoid +- aten::sign +- aten::signbit +- aten::silu +- aten::sin +- aten::sinc +- aten::sinh +- aten::size +- aten::size.int +- aten::slice.Tensor +- aten::slice.str +- aten::slice.t +- aten::slice_scatter +- aten::smooth_l1_loss +- aten::soft_margin_loss +- aten::softmax.int +- aten::softplus +- aten::softshrink +- aten::sort +- aten::split +- aten::split.Tensor +- aten::split.str +- aten::sqrt +- aten::sqrt.int +- aten::square +- aten::squeeze.dim +- aten::squeeze_.dim +- aten::stack +- aten::startswith +- aten::std +- aten::std_mean +- aten::stft +- aten::str +- aten::strip +- aten::sub +- aten::sub.Scalar +- aten::sub.Tensor +- aten::sub.float +- aten::sub.int +- aten::sub_.Tensor +- aten::sum +- aten::sum.dim_IntList +- aten::sum.int +- aten::t +- aten::take +- aten::take_along_dim +- aten::tan +- aten::tanh +- aten::tensor +- aten::tensor.float +- aten::tensor.int +- aten::tensor_split.indices +- aten::tensor_split.sections +- aten::tensordot +- aten::tensordot.out +- aten::tile +- aten::to.device +- aten::to.dtype +- aten::to.dtype_layout +- aten::to.prim_Device +- aten::topk +- aten::trace +- aten::transpose.int +- aten::trapezoid.x +- aten::trapz.x +- aten::tril +- aten::tril_indices +- aten::triplet_margin_loss +- aten::triu +- aten::triu_indices +- aten::trunc +- aten::trunc_ +- aten::type_as +- aten::unbind.int +- aten::unflatten.int +- aten::unfold +- aten::uniform_ +- aten::unique_consecutive +- aten::unique_dim +- aten::unsqueeze +- aten::unsqueeze_ +- aten::update.str +- aten::upsample_bicubic2d.vec +- aten::upsample_bilinear2d.vec +- aten::upsample_linear1d.vec +- 
aten::upsample_nearest1d.vec +- aten::upsample_nearest2d.vec +- aten::upsample_nearest3d.vec +- aten::upsample_trilinear3d.vec +- aten::values.int +- aten::values.str +- aten::vander +- aten::var +- aten::var_mean +- aten::vdot +- aten::view +- aten::view_as +- aten::view_as_complex +- aten::view_as_real +- aten::vsplit.array +- aten::vstack +- aten::where +- aten::where.ScalarOther +- aten::where.self +- aten::xlogy.Scalar_Other +- aten::xlogy.Scalar_Self +- aten::xlogy.Tensor +- aten::zeros +- aten::zeros.out +- aten::zeros_like +- prepacked::conv2d_clamp_run +- prepacked::linear_clamp_run +- prim::TupleUnpack +- prim::is_meta +- prim::is_quantized +- prim::is_sparse +- prim::max +- prim::max.float +- prim::max.int +- prim::max.self_int +- prim::min +- prim::min.float +- prim::min.int +- prim::min.self_int +- prim::unchecked_cast +- quantized::add +- quantized::add_relu +- quantized::add_scalar +- quantized::batch_norm2d +- quantized::batch_norm3d +- quantized::cat +- quantized::conv1d +- quantized::conv1d_prepack +- quantized::conv1d_relu +- quantized::conv1d_unpack +- quantized::conv2d.new +- quantized::conv2d_prepack +- quantized::conv2d_relu.new +- quantized::conv2d_unpack +- quantized::conv3d.new +- quantized::conv3d_prepack +- quantized::conv3d_relu.new +- quantized::conv3d_unpack +- quantized::conv_transpose1d +- quantized::conv_transpose1d_prepack +- quantized::conv_transpose1d_unpack +- quantized::conv_transpose2d +- quantized::conv_transpose2d_prepack +- quantized::conv_transpose3d_prepack +- quantized::embedding_4bit +- quantized::embedding_byte +- quantized::hardswish +- quantized::instance_norm +- quantized::leaky_relu +- quantized::linear +- quantized::linear_dynamic +- quantized::linear_dynamic_fp16 +- quantized::linear_relu +- quantized::mul +- quantized::mul_scalar +- quantized::quantized_gru_cell_dynamic +- quantized::quantized_lstm_cell_dynamic +- quantized::quantized_rnn_tanh_cell_dynamic +covered_ops: + aten::Bool.Tensor: 19 + aten::Bool.int: 7 + aten::Float.Scalar: 18 + aten::Float.Tensor: 11 + aten::Float.str: 6 + aten::FloatImplicit: 2 + aten::Int.Scalar: 19 + aten::Int.Tensor: 35 + aten::Int.float: 6 + aten::Int.str: 12 + aten::IntImplicit: 11 + aten::ScalarImplicit: 3 + aten::__and__.Tensor: 13 + aten::__and__.bool: 11 + aten::__and__.int: 2 + aten::__contains__.int: 5 + aten::__contains__.int_list: 17 + aten::__contains__.str: 22 + aten::__contains__.str_list: 5 + aten::__derive_index: 24 + aten::__getitem__.str: 20 + aten::__getitem__.t: 178 + aten::__lshift__.int: 2 + aten::__range_length: 23 + aten::__rshift__.int: 2 + aten::__xor__.bool: 10 + aten::_infer_size: 7 + aten::_set_item.int: 7 + aten::_set_item.str: 163 + aten::_set_item.t: 8 + aten::_shape_as_tensor: 10 + aten::adaptive_avg_pool1d: 1 + aten::adaptive_avg_pool2d: 33 + aten::adaptive_avg_pool3d: 1 + aten::add.Scalar: 33 + aten::add.Tensor: 63 + aten::add.float: 5 + aten::add.int: 49 + aten::add.out: 2 + aten::add.str: 29 + aten::add.t: 11 + aten::add_.Scalar: 15 + aten::add_.Tensor: 29 + aten::addcmul: 2 + aten::addmm: 7 + aten::all: 6 + aten::allclose: 1 + aten::any: 14 + aten::append.t: 59 + aten::arange: 16 + aten::arange.start: 6 + aten::arange.start_step: 16 + aten::argmax: 2 + aten::as_strided: 10 + aten::as_tensor.list: 4 + aten::atan: 4 + aten::avg_pool1d: 6 + aten::avg_pool2d: 7 + aten::batch_norm: 15 + aten::binary_cross_entropy: 15 + aten::binary_cross_entropy_with_logits: 3 + aten::bitwise_not: 13 + aten::bmm: 16 + aten::broadcast_tensors: 1 + aten::cat: 90 + aten::ceil: 3 + 
aten::ceil.float: 7 + aten::chunk: 19 + aten::clamp: 36 + aten::clamp_: 12 + aten::clamp_min: 3 + aten::clear.str: 2 + aten::clone: 26 + aten::coalesce: 2 + aten::conj: 1 + aten::constant_pad_nd: 17 + aten::contiguous: 113 + aten::conv1d: 12 + aten::conv2d: 10 + aten::conv_transpose2d.input: 5 + aten::copy_: 15 + aten::copy_.int: 1 + aten::cos: 4 + aten::count_nonzero: 4 + aten::ctc_loss.Tensor: 1 + aten::cumsum: 13 + aten::dequantize.self: 30 + aten::detach: 34 + aten::div: 9 + aten::div.Scalar: 8 + aten::div.Tensor: 71 + aten::div.Tensor_mode: 7 + aten::div.float: 3 + aten::div.int: 7 + aten::div_.Tensor: 7 + aten::dropout: 41 + aten::embedding: 16 + aten::embedding_bag.padding_idx: 2 + aten::empty.memory_format: 11 + aten::empty_like: 11 + aten::empty_strided: 3 + aten::eq.Scalar: 24 + aten::eq.Tensor: 6 + aten::eq.int: 57 + aten::eq.int_list: 20 + aten::eq.str: 43 + aten::exp: 18 + aten::exp.float: 4 + aten::expand: 26 + aten::expand_as: 3 + aten::extend.t: 38 + aten::feature_dropout: 1 + aten::fill_.Scalar: 17 + aten::find: 3 + aten::flatten.using_ints: 45 + aten::flip: 1 + aten::floor: 5 + aten::floor.float: 2 + aten::floor_divide: 4 + aten::floor_divide.Scalar: 7 + aten::floordiv.int: 21 + aten::full: 10 + aten::full_like: 10 + aten::gather: 10 + aten::ge.Scalar: 4 + aten::ge.Tensor: 6 + aten::ge.int: 29 + aten::gelu: 12 + aten::glu: 18 + aten::grid_sampler: 3 + aten::gt.Scalar: 16 + aten::gt.float: 16 + aten::gt.float_int: 3 + aten::gt.int: 52 + aten::hardsigmoid: 3 + aten::hardsigmoid_: 2 + aten::hardswish_: 4 + aten::hardtanh: 3 + aten::hardtanh_: 3 + aten::hstack: 2 + aten::index.Tensor: 23 + aten::index_fill.int_Scalar: 15 + aten::index_select: 31 + aten::is_coalesced: 2 + aten::is_floating_point: 9 + aten::isnan: 1 + aten::item: 40 + aten::items.str: 3 + aten::keys.str: 15 + aten::layer_norm: 26 + aten::le.Scalar: 1 + aten::le.Tensor: 10 + aten::le.float: 2 + aten::le.int: 17 + aten::leaky_relu: 1 + aten::leaky_relu_: 5 + aten::len.Dict_int: 5 + aten::len.Tensor: 19 + aten::len.str: 23 + aten::len.t: 177 + aten::linear: 46 + aten::linspace: 3 + aten::list.t: 24 + aten::log: 18 + aten::log10: 4 + aten::log1p: 5 + aten::log_softmax.int: 31 + aten::logical_and: 1 + aten::logical_not: 10 + aten::logit: 7 + aten::lower: 10 + aten::lstm.input: 4 + aten::lt.Scalar: 8 + aten::lt.Tensor: 1 + aten::lt.float: 16 + aten::lt.int: 46 + aten::masked_fill.Scalar: 16 + aten::matmul: 12 + aten::max: 18 + aten::max.dim: 30 + aten::max.other: 7 + aten::max_pool2d: 10 + aten::maximum: 4 + aten::mean: 10 + aten::mean.dim: 16 + aten::meshgrid.indexing: 2 + aten::min: 2 + aten::min.dim: 4 + aten::min.other: 17 + aten::minimum: 4 + aten::mse_loss: 1 + aten::mul.Scalar: 26 + aten::mul.Tensor: 90 + aten::mul.float: 5 + aten::mul.float_int: 3 + aten::mul.int: 26 + aten::mul.int_float: 4 + aten::mul.left_t: 15 + aten::mul.out: 1 + aten::mul_.Scalar: 11 + aten::mul_.Tensor: 5 + aten::nan_to_num: 3 + aten::nan_to_num_: 10 + aten::narrow: 10 + aten::ne.Scalar: 14 + aten::ne.Tensor: 5 + aten::ne.int: 44 + aten::ne.int_float: 2 + aten::ne.int_list: 20 + aten::ne.str: 3 + aten::neg: 29 + aten::neg.int: 19 + aten::new_zeros: 6 + aten::nll_loss_nd: 3 + aten::nonzero: 4 + aten::norm.Scalar: 1 + aten::norm.ScalarOpt_dim: 4 + aten::numel: 8 + aten::one_hot: 2 + aten::ones: 38 + aten::ones_like: 16 + aten::ord: 20 + aten::permute: 43 + aten::pop.t: 7 + aten::pow.Tensor_Scalar: 3 + aten::pow.int_float: 2 + aten::quantile.scalar: 1 + aten::quantize_per_tensor: 66 + aten::quantize_per_tensor.tensor_qparams: 1 + 
aten::rand: 25 + aten::randint.low: 2 + aten::randn_like: 17 + aten::reciprocal: 1 + aten::reflection_pad2d: 1 + aten::relu: 82 + aten::relu_: 9 + aten::remainder.Scalar: 2 + aten::remainder.int: 22 + aten::repeat: 16 + aten::replace: 1 + aten::replication_pad1d: 1 + aten::replication_pad2d: 2 + aten::replication_pad3d: 1 + aten::requires_grad_: 4 + aten::reshape: 36 + aten::resize_as_: 1 + aten::resolve_conj: 1 + aten::resolve_neg: 1 + aten::reverse.t: 2 + aten::round.Scalar: 4 + aten::rstrip: 1 + aten::scatter_.src: 6 + aten::scatter_add_: 10 + aten::select.int: 57 + aten::selu: 2 + aten::sigmoid: 93 + aten::sin: 4 + aten::size: 66 + aten::size.int: 66 + aten::slice.Tensor: 75 + aten::slice.str: 12 + aten::slice.t: 43 + aten::softmax.int: 63 + aten::softplus: 2 + aten::sort: 18 + aten::split.str: 10 + aten::sqrt: 1 + aten::squeeze.dim: 26 + aten::stack: 30 + aten::startswith: 10 + aten::str: 16 + aten::strip: 3 + aten::sub: 8 + aten::sub.Scalar: 26 + aten::sub.Tensor: 94 + aten::sub.int: 52 + aten::sub_.Tensor: 4 + aten::sum: 17 + aten::sum.dim_IntList: 19 + aten::sum.int: 1 + aten::t: 3 + aten::tanh: 26 + aten::tensor: 51 + aten::tensor.float: 28 + aten::tensor.int: 34 + aten::tensor_split.indices: 4 + aten::to.device: 11 + aten::to.dtype: 23 + aten::to.dtype_layout: 27 + aten::to.prim_Device: 23 + aten::topk: 10 + aten::transpose.int: 33 + aten::triu: 10 + aten::trunc_: 3 + aten::type_as: 6 + aten::unbind.int: 24 + aten::unique_consecutive: 2 + aten::unsqueeze: 34 + aten::unsqueeze_: 6 + aten::update.str: 4 + aten::upsample_bicubic2d.vec: 1 + aten::upsample_bilinear2d.vec: 8 + aten::upsample_linear1d.vec: 1 + aten::upsample_nearest1d.vec: 2 + aten::upsample_nearest2d.vec: 30 + aten::upsample_nearest3d.vec: 2 + aten::upsample_trilinear3d.vec: 1 + aten::values.int: 3 + aten::view: 61 + aten::vstack: 1 + aten::where.ScalarOther: 4 + aten::where.self: 10 + aten::zeros: 75 + aten::zeros.out: 1 + aten::zeros_like: 7 + prepacked::conv2d_clamp_run: 32 + prepacked::linear_clamp_run: 26 + prim::TupleUnpack: 120 + prim::max.float: 7 + prim::max.int: 14 + prim::max.self_int: 17 + prim::min: 4 + prim::min.int: 35 + prim::min.self_int: 25 + prim::unchecked_cast: 100 + quantized::add: 58 + quantized::add_relu: 1 + quantized::batch_norm2d: 1 + quantized::cat: 4 + quantized::conv1d: 1 + quantized::conv2d.new: 55 + quantized::conv2d_prepack: 14 + quantized::conv2d_relu.new: 50 + quantized::conv_transpose2d: 2 + quantized::embedding_4bit: 1 + quantized::embedding_byte: 14 + quantized::hardswish: 1 + quantized::instance_norm: 1 + quantized::leaky_relu: 2 + quantized::linear: 27 + quantized::linear_dynamic: 21 + quantized::linear_dynamic_fp16: 18 + quantized::linear_relu: 2 + quantized::mul: 4 +uncovered_ops: + aten::__getitem__.Dict_int: 4 + aten::__getitem__.Dict_str: 39 + aten::__is__: 83 + aten::__isnot__: 81 + aten::__not__: 32 + aten::_aminmax: 4 + aten::_convolution: 12 + aten::_convolution.deprecated: 3 + aten::_make_per_tensor_quantized_tensor: 2 + aten::_pack_padded_sequence: 10 + aten::_pad_packed_sequence: 10 + aten::_reshape_from_tensor: 10 + aten::backward: 23 + aten::copy_.Tensor: 27 + aten::dequantize.list: 1 + aten::dequantize.tensor: 36 + aten::dim: 36 + aten::format: 58 + aten::get.default_str: 14 + aten::index_put_: 16 + aten::lstm.data: 8 + aten::nll_loss: 1 + aten::nll_loss2d: 1 + aten::quantized_lstm.data: 2 + aten::rsub.Scalar: 5 + aten::sparse_coo_tensor.indices: 1 + aten::sparse_resize_and_clear_: 1 + aten::to.prim_dtype: 38 + aten::true_divide.Tensor: 2 + 
aten::upsample_nearest2d: 7 + prepacked::conv2d_clamp_prepack: 2 + prepacked::conv2d_transpose_clamp_prepack: 1 + prepacked::conv2d_transpose_clamp_run: 1 + prim::ModuleContainerIndex.list: 2 + prim::NumToTensor.Scalar: 15 + prim::Print: 1 + prim::RaiseException: 103 + prim::TupleIndex: 157 + prim::Uninitialized: 80 + prim::device: 46 + prim::dtype: 45 + prim::is_cuda: 1 + quantized::conv2d: 4 + quantized::conv_prepack: 5 + quantized::linear_prepack: 29 + quantized::linear_prepack_fp16: 25 + quantized::linear_unpack: 4 + quantized::linear_unpack_fp16: 4 + quantized::mul.Scalar: 1 diff --git a/test/mobile/model_test/gen_test_model.py b/test/mobile/model_test/gen_test_model.py new file mode 100644 index 00000000000000..e9e3908630be40 --- /dev/null +++ b/test/mobile/model_test/gen_test_model.py @@ -0,0 +1,243 @@ +import io +import sys +import torch +import yaml +from android_api_module import AndroidAPIModule +from builtin_ops import ( + TSBuiltinOpsModule, + TSCollectionOpsModule, +) +from math_ops import ( + PointwiseOpsModule, + ReductionOpsModule, + ComparisonOpsModule, + OtherMathOpsModule, + SpectralOpsModule, + BlasLapackOpsModule, +) +from nn_ops import ( + NNConvolutionModule, + NNPoolingModule, + NNPaddingModule, + NNNormalizationModule, + NNActivationModule, + NNRecurrentModule, + NNTransformerModule, + NNLinearModule, + NNDropoutModule, + NNSparseModule, + NNDistanceModule, + NNLossFunctionModule, + NNVisionModule, + NNShuffleModule, + NNUtilsModule, +) +from quantization_ops import ( + GeneralQuantModule, + DynamicQuantModule, + StaticQuantModule, + FusedQuantModule, +) +from sampling_ops import SamplingOpsModule +from tensor_ops import ( + TensorOpsModule, + TensorCreationOpsModule, + TensorIndexingOpsModule, + TensorTypingOpsModule, + TensorViewOpsModule, +) +from torch.jit.mobile import _load_for_lite_interpreter +from torchvision_models import MobileNetV2Module + +test_path_ios = "ios/TestApp/models/" +test_path_android = "android/pytorch_android/src/androidTest/assets/" + +production_ops_path = "test/mobile/model_test/model_ops.yaml" +coverage_out_path = "test/mobile/model_test/coverage.yaml" + +all_modules = { + # math ops + "pointwise_ops": PointwiseOpsModule(), + "reduction_ops": ReductionOpsModule(), + "comparison_ops": ComparisonOpsModule(), + "spectral_ops": SpectralOpsModule(), + "other_math_ops": OtherMathOpsModule(), + "blas_lapack_ops": BlasLapackOpsModule(), + # sampling + "sampling_ops": SamplingOpsModule(), + # tensor ops + "tensor_general_ops": TensorOpsModule(), + "tensor_creation_ops": TensorCreationOpsModule(), + "tensor_indexing_ops": TensorIndexingOpsModule(), + "tensor_typing_ops": TensorTypingOpsModule(), + "tensor_view_ops": TensorViewOpsModule(), + # nn ops + "convolution_ops": NNConvolutionModule(), + "pooling_ops": NNPoolingModule(), + "padding_ops": NNPaddingModule(), + "activation_ops": NNActivationModule(), + "normalization_ops": NNNormalizationModule(), + "recurrent_ops": NNRecurrentModule(), + "transformer_ops": NNTransformerModule(), + "linear_ops": NNLinearModule(), + "dropout_ops": NNDropoutModule(), + "sparse_ops": NNSparseModule(), + "distance_function_ops": NNDistanceModule(), + "loss_function_ops": NNLossFunctionModule(), + "vision_function_ops": NNVisionModule(), + "shuffle_ops": NNShuffleModule(), + "nn_utils_ops": NNUtilsModule(), + # quantization ops + "general_quant_ops": GeneralQuantModule(), + "dynamic_quant_ops": DynamicQuantModule(), + "static_quant_ops": StaticQuantModule(), + "fused_quant_ops": FusedQuantModule(), + # 
TorchScript builtin ops + "torchscript_builtin_ops": TSBuiltinOpsModule(), + "torchscript_collection_ops": TSCollectionOpsModule(), + # vision + "mobilenet_v2": MobileNetV2Module(), + # android api module + "android_api_module": AndroidAPIModule(), +} + +models_need_trace = [ + "static_quant_ops", +] + + +def calcOpsCoverage(ops): + with open(production_ops_path) as input_yaml_file: + production_ops_dict = yaml.safe_load(input_yaml_file) + + production_ops = set(production_ops_dict["root_operators"].keys()) + all_generated_ops = set(ops) + covered_ops = production_ops.intersection(all_generated_ops) + uncovered_ops = production_ops - covered_ops + coverage = round(100 * len(covered_ops) / len(production_ops), 2) + + # weighted coverage (take op occurrences into account) + total_occurrences = sum(production_ops_dict["root_operators"].values()) + covered_ops_dict = {op: production_ops_dict["root_operators"][op] for op in covered_ops} + uncovered_ops_dict = {op: production_ops_dict["root_operators"][op] for op in uncovered_ops} + covered_occurrences = sum(covered_ops_dict.values()) + occurrences_coverage = round(100 * covered_occurrences / total_occurrences, 2) + + print(f"\n{len(uncovered_ops)} uncovered ops: {uncovered_ops}\n") + print(f"Generated {len(all_generated_ops)} ops") + print(f"Covered {len(covered_ops)}/{len(production_ops)} ({coverage}%) production ops") + print(f"Covered {covered_occurrences}/{total_occurrences} ({occurrences_coverage}%) occurrences") + print(f"pytorch ver {torch.__version__}\n") + + with open(coverage_out_path, "w") as f: + yaml.safe_dump( + { + "_covered_ops": len(covered_ops), + "_production_ops": len(production_ops), + "_generated_ops": len(all_generated_ops), + "_uncovered_ops": len(uncovered_ops), + "_coverage": round(coverage, 2), + "uncovered_ops": uncovered_ops_dict, + "covered_ops": covered_ops_dict, + "all_generated_ops": sorted(list(all_generated_ops)), + }, + f, + ) + + +def getModuleFromName(model_name): + if model_name not in all_modules: + print("Cannot find test model for " + model_name) + return None, [] + + module = all_modules[model_name] + if not isinstance(module, torch.nn.Module): + module = module.getModule() + + has_bundled_inputs = False # module.find_method("get_all_bundled_inputs") + + if model_name in models_need_trace: + module = torch.jit.trace(module, []) + else: + module = torch.jit.script(module) + + ops = torch.jit.export_opnames(module) + print(ops) + + # try to run the model + runModule(module) + + return module, ops + + +def runModule(module): + buffer = io.BytesIO(module._save_to_buffer_for_lite_interpreter()) + buffer.seek(0) + lite_module = _load_for_lite_interpreter(buffer) + if lite_module.find_method("get_all_bundled_inputs"): + # run with the first bundled input + input = lite_module.run_method("get_all_bundled_inputs")[0] + lite_module.forward(*input) + else: + # assuming model has no input + lite_module() + + +# generate all models in the given folder. +# If it's "on the fly" mode, add "_temp" suffix to the model file. 
+def generateAllModels(folder, on_the_fly=False): + all_ops = [] + for name in all_modules: + module, ops = getModuleFromName(name) + all_ops = all_ops + ops + path = folder + name + ("_temp.ptl" if on_the_fly else ".ptl") + module._save_for_lite_interpreter(path) + print("model saved to " + path) + calcOpsCoverage(all_ops) + + +# generate/update a given model for storage +def generateModel(name): + module, ops = getModuleFromName(name) + if module is None: + return + path_ios = test_path_ios + name + ".ptl" + path_android = test_path_android + name + ".ptl" + module._save_for_lite_interpreter(path_ios) + module._save_for_lite_interpreter(path_android) + print("model saved to " + path_ios + " and " + path_android) + + +def main(argv): + if argv is None or len(argv) != 1: + print( + """ +This script generates models for mobile tests. For each model we have a "storage" version +and an "on-the-fly" version. The "on-the-fly" version is generated during the test, and +should not be committed to the repo. +The "storage" version is for backward compatibility tests (a model generated today should +still run on the master branch in the next 6 months). We can use this script to update a model +that is no longer supported. +- use 'python gen_test_model.py android-test' to generate on-the-fly models for android +- use 'python gen_test_model.py ios-test' to generate on-the-fly models for ios +- use 'python gen_test_model.py android' to generate checked-in models for android +- use 'python gen_test_model.py ios' to generate checked-in models for ios +- use 'python gen_test_model.py <model_name>' to update the given storage model +""" + ) + return + + if argv[0] == "android": + generateAllModels(test_path_android, on_the_fly=False) + elif argv[0] == "ios": + generateAllModels(test_path_ios, on_the_fly=False) + elif argv[0] == "android-test": + generateAllModels(test_path_android, on_the_fly=True) + elif argv[0] == "ios-test": + generateAllModels(test_path_ios, on_the_fly=True) + else: + generateModel(argv[0]) + + +if __name__ == "__main__": + main(sys.argv[1:]) diff --git a/test/mobile/model_test/math_ops.py b/test/mobile/model_test/math_ops.py new file mode 100644 index 00000000000000..f89e3bca70d6d3 --- /dev/null +++ b/test/mobile/model_test/math_ops.py @@ -0,0 +1,469 @@ +# https://pytorch.org/docs/stable/torch.html#math-operations + +import math + +import torch + + +class PointwiseOpsModule(torch.nn.Module): + def __init__(self): + super(PointwiseOpsModule, self).__init__() + + def forward(self): + return self.pointwise_ops() + + def pointwise_ops(self): + a = torch.randn(4) + b = torch.randn(4) + t = torch.tensor([-1, -2, 3], dtype=torch.int8) + r = torch.tensor([0, 1, 10, 0], dtype=torch.int8) + t = torch.tensor([-1, -2, 3], dtype=torch.int8) + s = torch.tensor([4, 0, 1, 0], dtype=torch.int8) + f = torch.zeros(3) + g = torch.tensor([-1, 0, 1]) + w = torch.tensor([0.3810, 1.2774, -0.2972, -0.3719, 0.4637]) + return len( + torch.abs(torch.tensor([-1, -2, 3])), + torch.absolute(torch.tensor([-1, -2, 3])), + torch.acos(a), + torch.arccos(a), + torch.acosh(a.uniform_(1.0, 2.0)), + torch.add(a, 20), + torch.add(a, b, out=a), + b.add(a), + b.add(a, out=b), + b.add_(a), + b.add(1), + torch.add(a, torch.randn(4, 1), alpha=10), + torch.addcdiv( + torch.randn(1, 3), torch.randn(3, 1), torch.randn(1, 3), value=0.1 + ), + torch.addcmul( + torch.randn(1, 3), torch.randn(3, 1), torch.randn(1, 3), value=0.1 + ), + torch.angle(a), + torch.asin(a), + torch.arcsin(a), + torch.asinh(a), + torch.arcsinh(a), + torch.atan(a), + torch.arctan(a), + 
torch.atanh(a.uniform_(-1.0, 1.0)), + torch.arctanh(a.uniform_(-1.0, 1.0)), + torch.atan2(a, a), + torch.bitwise_not(t), + torch.bitwise_and(t, torch.tensor([1, 0, 3], dtype=torch.int8)), + torch.bitwise_or(t, torch.tensor([1, 0, 3], dtype=torch.int8)), + torch.bitwise_xor(t, torch.tensor([1, 0, 3], dtype=torch.int8)), + torch.ceil(a), + torch.ceil(float(torch.tensor(0.5))), + torch.ceil(torch.tensor(0.5).item()), + torch.clamp(a, min=-0.5, max=0.5), + torch.clamp(a, min=0.5), + torch.clamp(a, max=0.5), + torch.clip(a, min=-0.5, max=0.5), + torch.conj(a), + torch.copysign(a, 1), + torch.copysign(a, b), + torch.cos(a), + torch.cosh(a), + torch.deg2rad( + torch.tensor([[180.0, -180.0], [360.0, -360.0], [90.0, -90.0]]) + ), + torch.div(a, b), + a.div(b), + a.div(1), + a.div_(b), + torch.divide(a, b, rounding_mode="trunc"), + torch.divide(a, b, rounding_mode="floor"), + torch.digamma(torch.tensor([1.0, 0.5])), + torch.erf(torch.tensor([0.0, -1.0, 10.0])), + torch.erfc(torch.tensor([0.0, -1.0, 10.0])), + torch.erfinv(torch.tensor([0.0, 0.5, -1.0])), + torch.exp(torch.tensor([0.0, math.log(2.0)])), + torch.exp(float(torch.tensor(1))), + torch.exp2(torch.tensor([0.0, math.log(2.0), 3.0, 4.0])), + torch.expm1(torch.tensor([0.0, math.log(2.0)])), + torch.fake_quantize_per_channel_affine( + torch.randn(2, 2, 2), + (torch.randn(2) + 1) * 0.05, + torch.zeros(2), + 1, + 0, + 255, + ), + torch.fake_quantize_per_tensor_affine(a, 0.1, 0, 0, 255), + torch.float_power(torch.randint(10, (4,)), 2), + torch.float_power(torch.arange(1, 5), torch.tensor([2, -3, 4, -5])), + torch.floor(a), + torch.floor(float(torch.tensor(1))), + torch.floor_divide(torch.tensor([4.0, 3.0]), torch.tensor([2.0, 2.0])), + torch.floor_divide(torch.tensor([4.0, 3.0]), 1.4), + torch.fmod(torch.tensor([-3, -2, -1, 1, 2, 3]), 2), + torch.fmod(torch.tensor([1, 2, 3, 4, 5]), 1.5), + torch.frac(torch.tensor([1.0, 2.5, -3.2])), + torch.randn(4, dtype=torch.cfloat).imag, + torch.ldexp(torch.tensor([1.0]), torch.tensor([1])), + torch.ldexp(torch.tensor([1.0]), torch.tensor([1, 2, 3, 4])), + torch.lerp(torch.arange(1.0, 5.0), torch.empty(4).fill_(10), 0.5), + torch.lerp( + torch.arange(1.0, 5.0), + torch.empty(4).fill_(10), + torch.full_like(torch.arange(1.0, 5.0), 0.5), + ), + torch.lgamma(torch.arange(0.5, 2, 0.5)), + torch.log(torch.arange(5) + 10), + torch.log10(torch.rand(5)), + torch.log1p(torch.randn(5)), + torch.log2(torch.rand(5)), + torch.logaddexp(torch.tensor([-1.0]), torch.tensor([-1, -2, -3])), + torch.logaddexp( + torch.tensor([-100.0, -200.0, -300.0]), torch.tensor([-1, -2, -3]) + ), + torch.logaddexp( + torch.tensor([1.0, 2000.0, 30000.0]), torch.tensor([-1, -2, -3]) + ), + torch.logaddexp2(torch.tensor([-1.0]), torch.tensor([-1, -2, -3])), + torch.logaddexp2( + torch.tensor([-100.0, -200.0, -300.0]), torch.tensor([-1, -2, -3]) + ), + torch.logaddexp2( + torch.tensor([1.0, 2000.0, 30000.0]), torch.tensor([-1, -2, -3]) + ), + torch.logical_and(r, s), + torch.logical_and(r.double(), s.double()), + torch.logical_and(r.double(), s), + torch.logical_and(r, s, out=torch.empty(4, dtype=torch.bool)), + torch.logical_not(torch.tensor([0, 1, -10], dtype=torch.int8)), + torch.logical_not(torch.tensor([0.0, 1.5, -10.0], dtype=torch.double)), + torch.logical_not( + torch.tensor([0.0, 1.0, -10.0], dtype=torch.double), + out=torch.empty(3, dtype=torch.int16), + ), + torch.logical_or(r, s), + torch.logical_or(r.double(), s.double()), + torch.logical_or(r.double(), s), + torch.logical_or(r, s, out=torch.empty(4, dtype=torch.bool)), + 
torch.logical_xor(r, s), + torch.logical_xor(r.double(), s.double()), + torch.logical_xor(r.double(), s), + torch.logical_xor(r, s, out=torch.empty(4, dtype=torch.bool)), + torch.logit(torch.rand(5), eps=1e-6), + torch.hypot(torch.tensor([4.0]), torch.tensor([3.0, 4.0, 5.0])), + torch.i0(torch.arange(5, dtype=torch.float32)), + torch.igamma(a, b), + torch.igammac(a, b), + torch.mul(torch.randn(3), 100), + b.mul(a), + b.mul(5), + b.mul(a, out=b), + b.mul_(a), + b.mul_(5), + torch.multiply(torch.randn(4, 1), torch.randn(1, 4)), + torch.mvlgamma(torch.empty(2, 3).uniform_(1.0, 2.0), 2), + torch.tensor([float("nan"), float("inf"), -float("inf"), 3.14]), + torch.nan_to_num(w), + torch.nan_to_num_(w), + torch.nan_to_num(w, nan=2.0), + torch.nan_to_num(w, nan=2.0, posinf=1.0), + torch.neg(torch.randn(5)), + # torch.nextafter(torch.tensor([1, 2]), torch.tensor([2, 1])) == torch.tensor([eps + 1, 2 - eps]), + torch.polygamma(1, torch.tensor([1.0, 0.5])), + torch.polygamma(2, torch.tensor([1.0, 0.5])), + torch.polygamma(3, torch.tensor([1.0, 0.5])), + torch.polygamma(4, torch.tensor([1.0, 0.5])), + torch.pow(a, 2), + torch.pow(2, float(torch.tensor(0.5))), + torch.pow(torch.arange(1.0, 5.0), torch.arange(1.0, 5.0)), + torch.rad2deg( + torch.tensor([[3.142, -3.142], [6.283, -6.283], [1.570, -1.570]]) + ), + torch.randn(4, dtype=torch.cfloat).real, + torch.reciprocal(a), + torch.remainder(torch.tensor([-3.0, -2.0]), 2), + torch.remainder(torch.tensor([1, 2, 3, 4, 5]), 1.5), + torch.round(a), + torch.round(torch.tensor(0.5).item()), + torch.rsqrt(a), + torch.sigmoid(a), + torch.sign(torch.tensor([0.7, -1.2, 0.0, 2.3])), + torch.sgn(a), + torch.signbit(torch.tensor([0.7, -1.2, 0.0, 2.3])), + torch.sin(a), + torch.sinc(a), + torch.sinh(a), + torch.sqrt(a), + torch.square(a), + torch.sub(torch.tensor((1, 2)), torch.tensor((0, 1)), alpha=2), + b.sub(a), + b.sub_(a), + b.sub(5), + torch.sum(5), + torch.tan(a), + torch.tanh(a), + torch.true_divide(a, a), + torch.trunc(a), + torch.trunc_(a), + torch.xlogy(f, g), + torch.xlogy(f, g), + torch.xlogy(f, 4), + torch.xlogy(2, g), + ) + + +class ReductionOpsModule(torch.nn.Module): + def __init__(self): + super(ReductionOpsModule, self).__init__() + + def forward(self): + return self.reduction_ops() + + def reduction_ops(self): + a = torch.randn(4) + b = torch.randn(4) + c = torch.tensor(0.5) + return len( + torch.argmax(a), + torch.argmin(a), + torch.amax(a), + torch.amin(a), + torch.aminmax(a), + torch.all(a), + torch.any(a), + torch.max(a), + a.max(a), + torch.max(a, 0), + torch.min(a), + a.min(a), + torch.min(a, 0), + torch.dist(a, b), + torch.logsumexp(a, 0), + torch.mean(a), + torch.mean(a, 0), + torch.nanmean(a), + torch.median(a), + torch.nanmedian(a), + torch.mode(a), + torch.norm(a), + a.norm(2), + torch.norm(a, dim=0), + torch.norm(c, torch.tensor(2)), + torch.nansum(a), + torch.prod(a), + torch.quantile(a, torch.tensor([0.25, 0.5, 0.75])), + torch.quantile(a, 0.5), + torch.nanquantile(a, torch.tensor([0.25, 0.5, 0.75])), + torch.std(a), + torch.std_mean(a), + torch.sum(a), + torch.unique(a), + torch.unique_consecutive(a), + torch.var(a), + torch.var_mean(a), + torch.count_nonzero(a), + ) + + +class ComparisonOpsModule(torch.nn.Module): + def __init__(self): + super(ComparisonOpsModule, self).__init__() + + def forward(self): + a = torch.tensor(0) + b = torch.tensor(1) + return len( + torch.allclose(a, b), + torch.argsort(a), + torch.eq(a, b), + torch.eq(a, 1), + torch.equal(a, b), + torch.ge(a, b), + torch.ge(a, 1), + torch.greater_equal(a, b), + 
torch.greater_equal(a, 1), + torch.gt(a, b), + torch.gt(a, 1), + torch.greater(a, b), + torch.isclose(a, b), + torch.isfinite(a), + torch.isin(a, b), + torch.isinf(a), + torch.isposinf(a), + torch.isneginf(a), + torch.isnan(a), + torch.isreal(a), + torch.kthvalue(a, 1), + torch.le(a, b), + torch.le(a, 1), + torch.less_equal(a, b), + torch.lt(a, b), + torch.lt(a, 1), + torch.less(a, b), + torch.maximum(a, b), + torch.minimum(a, b), + torch.fmax(a, b), + torch.fmin(a, b), + torch.ne(a, b), + torch.ne(a, 1), + torch.not_equal(a, b), + torch.sort(a), + torch.topk(a, 1), + torch.msort(a), + ) + + +class OtherMathOpsModule(torch.nn.Module): + def __init__(self): + super(OtherMathOpsModule, self).__init__() + + def forward(self): + return self.other_ops() + + def other_ops(self): + a = torch.randn(4) + b = torch.randn(4) + c = torch.randint(0, 8, (5,), dtype=torch.int64) + e = torch.randn(4, 3) + f = torch.randn(4, 4, 4) + size = [0, 1] + dims = [0, 1] + return len( + torch.atleast_1d(a), + torch.atleast_2d(a), + torch.atleast_3d(a), + torch.bincount(c), + torch.block_diag(a), + torch.broadcast_tensors(a), + torch.broadcast_to(a, (4)), + # torch.broadcast_shapes(a), + torch.bucketize(a, b), + torch.cartesian_prod(a), + torch.cdist(e, e), + torch.clone(a), + torch.combinations(a), + torch.corrcoef(a), + # torch.cov(a), + torch.cross(e, e), + torch.cummax(a, 0), + torch.cummin(a, 0), + torch.cumprod(a, 0), + torch.cumsum(a, 0), + torch.diag(a), + torch.diag_embed(a), + torch.diagflat(a), + torch.diagonal(e), + torch.diff(a), + torch.einsum("iii", f), + torch.flatten(a), + torch.flip(e, dims), + torch.fliplr(e), + torch.flipud(e), + torch.kron(a, b), + torch.rot90(e), + torch.gcd(c, c), + torch.histc(a), + torch.histogram(a), + torch.meshgrid(a), + torch.meshgrid(a, indexing="xy"), + torch.lcm(c, c), + torch.logcumsumexp(a, 0), + torch.ravel(a), + torch.renorm(e, 1, 0, 5), + torch.repeat_interleave(c), + torch.roll(a, 1, 0), + torch.searchsorted(a, b), + torch.tensordot(e, e), + torch.trace(e), + torch.tril(e), + torch.tril_indices(3, 3), + torch.triu(e), + torch.triu_indices(3, 3), + torch.vander(a), + torch.view_as_real(torch.randn(4, dtype=torch.cfloat)), + torch.view_as_complex(torch.randn(4, 2)).real, + torch.resolve_conj(a), + torch.resolve_neg(a), + ) + + +class SpectralOpsModule(torch.nn.Module): + def __init__(self): + super(SpectralOpsModule, self).__init__() + + def forward(self): + return self.spectral_ops() + + def spectral_ops(self): + a = torch.randn(10) + b = torch.randn(10, 8, 4, 2) + return len( + torch.stft(a, 8), + torch.stft(a, torch.tensor(8)), + torch.istft(b, 8), + torch.bartlett_window(2, dtype=torch.float), + torch.blackman_window(2, dtype=torch.float), + torch.hamming_window(4, dtype=torch.float), + torch.hann_window(4, dtype=torch.float), + torch.kaiser_window(4, dtype=torch.float), + ) + + +class BlasLapackOpsModule(torch.nn.Module): + def __init__(self): + super(BlasLapackOpsModule, self).__init__() + + def forward(self): + return self.blas_lapack_ops() + + def blas_lapack_ops(self): + m = torch.randn(3, 3) + a = torch.randn(10, 3, 4) + b = torch.randn(10, 4, 3) + v = torch.randn(3) + return len( + torch.addbmm(m, a, b), + torch.addmm(torch.randn(2, 3), torch.randn(2, 3), torch.randn(3, 3)), + torch.addmv(torch.randn(2), torch.randn(2, 3), torch.randn(3)), + torch.addr(torch.zeros(3, 3), v, v), + torch.baddbmm(m, a, b), + torch.bmm(a, b), + torch.chain_matmul(torch.randn(3, 3), torch.randn(3, 3), torch.randn(3, 3)), + # torch.cholesky(a), # deprecated + # 
torch.cholesky_inverse(torch.randn(3, 3)), # had some error + # torch.cholesky_solve(torch.randn(3, 3), torch.randn(3, 3)), + torch.dot(v, v), + # torch.linalg.eig(m), # not build with lapack + # torch.geqrf(a), + torch.ger(v, v), + torch.inner(m, m), + # torch.inverse(m), + # torch.det(m), + # torch.logdet(m), + # torch.slogdet(m), + # torch.lstsq(m, m), + # torch.lu(m), + # torch.lu_solve(m, *torch.lu(m)), + # torch.lu_unpack(*torch.lu(m)), + torch.matmul(m, m), + torch.matrix_power(m, 2), + # torch.matrix_rank(m), + torch.matrix_exp(m), + torch.mm(m, m), + torch.mv(m, v), + # torch.orgqr(a, m), + # torch.ormqr(a, m, v), + torch.outer(v, v), + # torch.pinverse(m), + # torch.qr(a), + # torch.solve(m, m), + # torch.svd(a), + # torch.svd_lowrank(a), + # torch.pca_lowrank(a), + # torch.symeig(a), # deprecated + # torch.lobpcg(a, b), # not supported + torch.trapz(m, m), + torch.trapezoid(m, m), + torch.cumulative_trapezoid(m, m), + # torch.triangular_solve(m, m), + torch.vdot(v, v), + ) diff --git a/test/mobile/model_test/model_ops.yaml b/test/mobile/model_test/model_ops.yaml new file mode 100644 index 00000000000000..06a3640e4cbe79 --- /dev/null +++ b/test/mobile/model_test/model_ops.yaml @@ -0,0 +1,752 @@ +root_operators: + aten::Bool.Tensor: 19 + aten::Bool.int: 7 + aten::Float.Scalar: 18 + aten::Float.Tensor: 11 + aten::Float.str: 6 + aten::FloatImplicit: 2 + aten::Int.Scalar: 19 + aten::Int.Tensor: 35 + aten::Int.float: 6 + aten::Int.str: 12 + aten::IntImplicit: 11 + aten::ScalarImplicit: 3 + aten::__and__.Tensor: 13 + aten::__and__.bool: 11 + aten::__and__.int: 2 + aten::__contains__.int: 5 + aten::__contains__.int_list: 17 + aten::__contains__.str: 22 + aten::__contains__.str_list: 5 + aten::__derive_index: 24 + aten::__getitem__.Dict_int: 4 + aten::__getitem__.Dict_str: 39 + aten::__getitem__.str: 20 + aten::__getitem__.t: 178 + aten::__is__: 83 + aten::__isnot__: 81 + aten::__lshift__.int: 2 + aten::__not__: 32 + aten::__range_length: 23 + aten::__rshift__.int: 2 + aten::__xor__.bool: 10 + aten::_aminmax: 4 + aten::_convolution: 12 + aten::_convolution.deprecated: 3 + aten::_infer_size: 7 + aten::_make_per_tensor_quantized_tensor: 2 + aten::_pack_padded_sequence: 10 + aten::_pad_packed_sequence: 10 + aten::_reshape_from_tensor: 10 + aten::_set_item.int: 7 + aten::_set_item.str: 163 + aten::_set_item.t: 8 + aten::_shape_as_tensor: 10 + aten::adaptive_avg_pool1d: 1 + aten::adaptive_avg_pool2d: 33 + aten::adaptive_avg_pool3d: 1 + aten::add.Scalar: 33 + aten::add.Tensor: 63 + aten::add.float: 5 + aten::add.int: 49 + aten::add.out: 2 + aten::add.str: 29 + aten::add.t: 11 + aten::add_.Scalar: 15 + aten::add_.Tensor: 29 + aten::addcmul: 2 + aten::addmm: 7 + aten::all: 6 + aten::allclose: 1 + aten::any: 14 + aten::append.t: 59 + aten::arange: 16 + aten::arange.start: 6 + aten::arange.start_step: 16 + aten::argmax: 2 + aten::as_strided: 10 + aten::as_tensor.list: 4 + aten::atan: 4 + aten::avg_pool1d: 6 + aten::avg_pool2d: 7 + aten::backward: 23 + aten::batch_norm: 15 + aten::binary_cross_entropy: 15 + aten::binary_cross_entropy_with_logits: 3 + aten::bitwise_not: 13 + aten::bmm: 16 + aten::broadcast_tensors: 1 + aten::cat: 90 + aten::ceil: 3 + aten::ceil.float: 7 + aten::chunk: 19 + aten::clamp: 36 + aten::clamp_: 12 + aten::clamp_min: 3 + aten::clear.str: 2 + aten::clone: 26 + aten::coalesce: 2 + aten::conj: 1 + aten::constant_pad_nd: 17 + aten::contiguous: 113 + aten::conv1d: 12 + aten::conv2d: 10 + aten::conv_transpose2d.input: 5 + aten::copy_: 15 + aten::copy_.Tensor: 27 + 
aten::copy_.int: 1 + aten::cos: 4 + aten::count_nonzero: 4 + aten::ctc_loss.Tensor: 1 + aten::cumsum: 13 + aten::dequantize.list: 1 + aten::dequantize.self: 30 + aten::dequantize.tensor: 36 + aten::detach: 34 + aten::dim: 36 + aten::div: 9 + aten::div.Scalar: 8 + aten::div.Tensor: 71 + aten::div.Tensor_mode: 7 + aten::div.float: 3 + aten::div.int: 7 + aten::div_.Tensor: 7 + aten::dropout: 41 + aten::embedding: 16 + aten::embedding_bag.padding_idx: 2 + aten::empty.memory_format: 11 + aten::empty_like: 11 + aten::empty_strided: 3 + aten::eq.Scalar: 24 + aten::eq.Tensor: 6 + aten::eq.int: 57 + aten::eq.int_list: 20 + aten::eq.str: 43 + aten::exp: 18 + aten::exp.float: 4 + aten::expand: 26 + aten::expand_as: 3 + aten::extend.t: 38 + aten::feature_dropout: 1 + aten::fill_.Scalar: 17 + aten::find: 3 + aten::flatten.using_ints: 45 + aten::flip: 1 + aten::floor: 5 + aten::floor.float: 2 + aten::floor_divide: 4 + aten::floor_divide.Scalar: 7 + aten::floordiv.int: 21 + aten::format: 58 + aten::full: 10 + aten::full_like: 10 + aten::gather: 10 + aten::ge.Scalar: 4 + aten::ge.Tensor: 6 + aten::ge.int: 29 + aten::gelu: 12 + aten::get.default_str: 14 + aten::glu: 18 + aten::grid_sampler: 3 + aten::gt.Scalar: 16 + aten::gt.float: 16 + aten::gt.float_int: 3 + aten::gt.int: 52 + aten::hardsigmoid: 3 + aten::hardsigmoid_: 2 + aten::hardswish_: 4 + aten::hardtanh: 3 + aten::hardtanh_: 3 + aten::hstack: 2 + aten::index.Tensor: 23 + aten::index_fill.int_Scalar: 15 + aten::index_put_: 16 + aten::index_select: 31 + aten::is_coalesced: 2 + aten::is_floating_point: 9 + aten::isnan: 1 + aten::item: 40 + aten::items.str: 3 + aten::keys.str: 15 + aten::layer_norm: 26 + aten::le.Scalar: 1 + aten::le.Tensor: 10 + aten::le.float: 2 + aten::le.int: 17 + aten::leaky_relu: 1 + aten::leaky_relu_: 5 + aten::len.Dict_int: 5 + aten::len.Tensor: 19 + aten::len.str: 23 + aten::len.t: 177 + aten::linear: 46 + aten::linspace: 3 + aten::list.t: 24 + aten::log: 18 + aten::log10: 4 + aten::log1p: 5 + aten::log_softmax.int: 31 + aten::logical_and: 1 + aten::logical_not: 10 + aten::logit: 7 + aten::lower: 10 + aten::lstm.data: 8 + aten::lstm.input: 4 + aten::lt.Scalar: 8 + aten::lt.Tensor: 1 + aten::lt.float: 16 + aten::lt.int: 46 + aten::masked_fill.Scalar: 16 + aten::matmul: 12 + aten::max: 18 + aten::max.dim: 30 + aten::max.other: 7 + aten::max_pool2d: 10 + aten::maximum: 4 + aten::mean: 10 + aten::mean.dim: 16 + aten::meshgrid.indexing: 2 + aten::min: 2 + aten::min.dim: 4 + aten::min.other: 17 + aten::minimum: 4 + aten::mse_loss: 1 + aten::mul.Scalar: 26 + aten::mul.Tensor: 90 + aten::mul.float: 5 + aten::mul.float_int: 3 + aten::mul.int: 26 + aten::mul.int_float: 4 + aten::mul.left_t: 15 + aten::mul.out: 1 + aten::mul_.Scalar: 11 + aten::mul_.Tensor: 5 + aten::nan_to_num: 3 + aten::nan_to_num_: 10 + aten::narrow: 10 + aten::ne.Scalar: 14 + aten::ne.Tensor: 5 + aten::ne.int: 44 + aten::ne.int_float: 2 + aten::ne.int_list: 20 + aten::ne.str: 3 + aten::neg: 29 + aten::neg.int: 19 + aten::new_zeros: 6 + aten::nll_loss: 1 + aten::nll_loss2d: 1 + aten::nll_loss_nd: 3 + aten::nonzero: 4 + aten::norm.Scalar: 1 + aten::norm.ScalarOpt_dim: 4 + aten::numel: 8 + aten::one_hot: 2 + aten::ones: 38 + aten::ones_like: 16 + aten::ord: 20 + aten::permute: 43 + aten::pop.t: 7 + aten::pow.Tensor_Scalar: 3 + aten::pow.int_float: 2 + aten::quantile.scalar: 1 + aten::quantize_per_tensor: 66 + aten::quantize_per_tensor.tensor_qparams: 1 + aten::quantized_lstm.data: 2 + aten::rand: 25 + aten::randint.low: 2 + aten::randn_like: 17 + aten::reciprocal: 1 + 
aten::reflection_pad2d: 1 + aten::relu: 82 + aten::relu_: 9 + aten::remainder.Scalar: 2 + aten::remainder.int: 22 + aten::repeat: 16 + aten::replace: 1 + aten::replication_pad1d: 1 + aten::replication_pad2d: 2 + aten::replication_pad3d: 1 + aten::requires_grad_: 4 + aten::reshape: 36 + aten::resize_as_: 1 + aten::resolve_conj: 1 + aten::resolve_neg: 1 + aten::reverse.t: 2 + aten::round.Scalar: 4 + aten::rstrip: 1 + aten::rsub.Scalar: 5 + aten::scatter_.src: 6 + aten::scatter_add_: 10 + aten::select.int: 57 + aten::selu: 2 + aten::sigmoid: 93 + aten::sin: 4 + aten::size: 66 + aten::size.int: 66 + aten::slice.Tensor: 75 + aten::slice.str: 12 + aten::slice.t: 43 + aten::softmax.int: 63 + aten::softplus: 2 + aten::sort: 18 + aten::sparse_coo_tensor.indices: 1 + aten::sparse_resize_and_clear_: 1 + aten::split.str: 10 + aten::sqrt: 1 + aten::squeeze.dim: 26 + aten::stack: 30 + aten::startswith: 10 + aten::str: 16 + aten::strip: 3 + aten::sub: 8 + aten::sub.Scalar: 26 + aten::sub.Tensor: 94 + aten::sub.int: 52 + aten::sub_.Tensor: 4 + aten::sum: 17 + aten::sum.dim_IntList: 19 + aten::sum.int: 1 + aten::t: 3 + aten::tanh: 26 + aten::tensor: 51 + aten::tensor.float: 28 + aten::tensor.int: 34 + aten::tensor_split.indices: 4 + aten::to.device: 11 + aten::to.dtype: 23 + aten::to.dtype_layout: 27 + aten::to.prim_Device: 23 + aten::to.prim_dtype: 38 + aten::topk: 10 + aten::transpose.int: 33 + aten::triu: 10 + aten::true_divide.Tensor: 2 + aten::trunc_: 3 + aten::type_as: 6 + aten::unbind.int: 24 + aten::unique_consecutive: 2 + aten::unsqueeze: 34 + aten::unsqueeze_: 6 + aten::update.str: 4 + aten::upsample_bicubic2d.vec: 1 + aten::upsample_bilinear2d.vec: 8 + aten::upsample_linear1d.vec: 1 + aten::upsample_nearest1d.vec: 2 + aten::upsample_nearest2d: 7 + aten::upsample_nearest2d.vec: 30 + aten::upsample_nearest3d.vec: 2 + aten::upsample_trilinear3d.vec: 1 + aten::values.int: 3 + aten::view: 61 + aten::vstack: 1 + aten::where.ScalarOther: 4 + aten::where.self: 10 + aten::zeros: 75 + aten::zeros.out: 1 + aten::zeros_like: 7 + prepacked::conv2d_clamp_prepack: 2 + prepacked::conv2d_clamp_run: 32 + prepacked::conv2d_transpose_clamp_prepack: 1 + prepacked::conv2d_transpose_clamp_run: 1 + prepacked::linear_clamp_run: 26 + prim::ModuleContainerIndex.list: 2 + prim::NumToTensor.Scalar: 15 + prim::Print: 1 + prim::RaiseException: 103 + prim::TupleIndex: 157 + prim::TupleUnpack: 120 + prim::Uninitialized: 80 + prim::device: 46 + prim::dtype: 45 + prim::is_cuda: 1 + prim::max.float: 7 + prim::max.int: 14 + prim::max.self_int: 17 + prim::min: 4 + prim::min.int: 35 + prim::min.self_int: 25 + prim::unchecked_cast: 100 + quantized::add: 58 + quantized::add_relu: 1 + quantized::batch_norm2d: 1 + quantized::cat: 4 + quantized::conv1d: 1 + quantized::conv2d: 4 + quantized::conv2d.new: 55 + quantized::conv2d_prepack: 14 + quantized::conv2d_relu.new: 50 + quantized::conv_prepack: 5 + quantized::conv_transpose2d: 2 + quantized::embedding_4bit: 1 + quantized::embedding_byte: 14 + quantized::hardswish: 1 + quantized::instance_norm: 1 + quantized::leaky_relu: 2 + quantized::linear: 27 + quantized::linear_dynamic: 21 + quantized::linear_dynamic_fp16: 18 + quantized::linear_prepack: 29 + quantized::linear_prepack_fp16: 25 + quantized::linear_relu: 2 + quantized::linear_unpack: 4 + quantized::linear_unpack_fp16: 4 + quantized::mul: 4 + quantized::mul.Scalar: 1 +traced_operators: + aten::__and__.Tensor: 13 + aten::__iand__.Tensor: 1 + aten::__ior__.Tensor: 1 + aten::_adaptive_avg_pool2d: 23 + aten::_aminmax: 4 + 
aten::_batch_norm_impl_index: 15 + aten::_cat: 95 + aten::_coalesce: 2 + aten::_coalesced_: 3 + aten::_convolution: 34 + aten::_convolution.deprecated: 3 + aten::_ctc_loss: 1 + aten::_embedding_bag: 2 + aten::_embedding_bag_backward: 1 + aten::_embedding_bag_sparse_backward: 1 + aten::_empty_affine_quantized: 87 + aten::_empty_per_channel_affine_quantized: 28 + aten::_index_put_impl_: 16 + aten::_indices: 4 + aten::_local_scalar_dense: 188 + aten::_log_softmax: 28 + aten::_log_softmax_backward_data: 4 + aten::_make_per_tensor_quantized_tensor: 2 + aten::_nnz: 3 + aten::_pack_padded_sequence: 10 + aten::_pack_padded_sequence_backward: 3 + aten::_pad_packed_sequence: 10 + aten::_reshape_alias: 93 + aten::_reshape_from_tensor: 10 + aten::_s_where: 15 + aten::_shape_as_tensor: 10 + aten::_slow_conv2d_backward.output_mask: 3 + aten::_slow_conv2d_forward: 33 + aten::_softmax: 63 + aten::_sparse_coo_tensor_unsafe: 4 + aten::_sparse_coo_tensor_with_dims_and_tensors: 5 + aten::_to_copy: 188 + aten::_unsafe_view: 28 + aten::_values: 4 + aten::abs: 1 + aten::abs.out: 1 + aten::adaptive_avg_pool2d: 29 + aten::add.Scalar: 30 + aten::add.Tensor: 72 + aten::add.out: 2 + aten::add_.Scalar: 11 + aten::add_.Tensor: 48 + aten::addmm: 41 + aten::alias: 14 + aten::all: 8 + aten::allclose: 1 + aten::aminmax: 4 + aten::any: 14 + aten::any.dim: 1 + aten::arange: 10 + aten::arange.start: 26 + aten::arange.start_out: 28 + aten::arange.start_step: 8 + aten::argmax: 2 + aten::as_strided: 188 + aten::as_strided_: 39 + aten::atan: 4 + aten::atleast_1d.Sequence: 2 + aten::atleast_2d.Sequence: 1 + aten::avg_pool2d: 7 + aten::batch_norm: 15 + aten::bernoulli_.float: 2 + aten::binary_cross_entropy: 13 + aten::binary_cross_entropy_backward: 12 + aten::binary_cross_entropy_with_logits: 3 + aten::binary_cross_entropy_with_logits_backward: 2 + aten::bitwise_and.Tensor: 13 + aten::bitwise_and_.Tensor: 1 + aten::bitwise_not: 13 + aten::bitwise_or_.Tensor: 1 + aten::bmm: 18 + aten::broadcast_tensors: 1 + aten::cat: 95 + aten::ceil: 4 + aten::ceil_: 1 + aten::chunk: 20 + aten::clamp: 38 + aten::clamp_: 12 + aten::clamp_min: 73 + aten::clamp_min.out: 74 + aten::clamp_min_: 4 + aten::clone: 134 + aten::coalesce: 2 + aten::conj: 1 + aten::constant_pad_nd: 14 + aten::contiguous: 139 + aten::conv1d: 12 + aten::conv2d: 7 + aten::conv_transpose2d.input: 5 + aten::convolution: 19 + aten::convolution_backward: 3 + aten::copy_: 188 + aten::copy_sparse_to_sparse_: 3 + aten::cos: 4 + aten::count_nonzero: 4 + aten::count_nonzero.dim_IntList: 4 + aten::ctc_loss.Tensor: 1 + aten::cudnn_is_acceptable: 12 + aten::cumsum: 14 + aten::dense_dim: 3 + aten::dequantize.self: 63 + aten::dequantize.tensors: 1 + aten::detach: 49 + aten::div.Scalar: 188 + aten::div.Tensor: 188 + aten::div.Tensor_mode: 8 + aten::div_.Scalar: 27 + aten::div_.Tensor: 34 + aten::dropout: 41 + aten::elu: 2 + aten::embedding: 16 + aten::embedding_backward: 4 + aten::embedding_bag.padding_idx: 2 + aten::embedding_dense_backward: 4 + aten::embedding_sparse_backward: 1 + aten::empty.memory_format: 188 + aten::empty_like: 162 + aten::empty_strided: 188 + aten::eq.Scalar: 25 + aten::eq.Tensor: 188 + aten::exp: 15 + aten::exp_: 3 + aten::expand: 63 + aten::expand_as: 17 + aten::feature_dropout: 1 + aten::fill_.Scalar: 188 + aten::flatten.using_ints: 42 + aten::flip: 1 + aten::floor: 6 + aten::floor_divide: 7 + aten::floor_divide.Scalar: 7 + aten::full: 21 + aten::full_like: 10 + aten::gather: 11 + aten::ge.Scalar: 2 + aten::gelu: 12 + aten::glu: 18 + aten::grid_sampler: 3 + 
aten::grid_sampler_2d: 3 + aten::gt.Scalar: 16 + aten::hardsigmoid: 3 + aten::hardsigmoid_: 2 + aten::hardswish_: 4 + aten::hardtanh: 3 + aten::hstack: 2 + aten::index.Tensor: 20 + aten::index_add_: 4 + aten::index_fill.int_Scalar: 1 + aten::index_fill_.int_Scalar: 1 + aten::index_put_: 16 + aten::index_select: 28 + aten::index_select_backward: 3 + aten::is_coalesced: 3 + aten::is_floating_point: 8 + aten::isclose: 1 + aten::isfinite: 1 + aten::isnan: 1 + aten::item: 188 + aten::layer_norm: 26 + aten::le.Scalar: 2 + aten::le.Tensor: 1 + aten::leaky_relu: 1 + aten::leaky_relu_: 5 + aten::lerp_.Tensor: 1 + aten::linear: 51 + aten::linspace: 3 + aten::linspace.out: 3 + aten::log: 15 + aten::log10: 4 + aten::log1p: 5 + aten::log_: 3 + aten::log_softmax.int: 28 + aten::logical_and: 1 + aten::logical_and.out: 2 + aten::logical_and_: 1 + aten::logit: 7 + aten::lstm.data: 8 + aten::lstm.input: 4 + aten::lt.Scalar: 8 + aten::lt.Tensor: 1 + aten::masked_fill.Scalar: 3 + aten::masked_fill_.Scalar: 18 + aten::matmul: 31 + aten::max: 27 + aten::max.dim: 31 + aten::max.other: 4 + aten::max_pool2d: 7 + aten::maximum: 4 + aten::mean: 16 + aten::mean.dim: 26 + aten::meshgrid.indexing: 2 + aten::min: 25 + aten::min.dim: 5 + aten::min.other: 4 + aten::minimum: 5 + aten::mm: 40 + aten::mul.Scalar: 31 + aten::mul.Tensor: 103 + aten::mul.out: 12 + aten::mul_.Scalar: 11 + aten::mul_.Tensor: 7 + aten::nan_to_num: 3 + aten::nan_to_num.out: 13 + aten::nan_to_num_: 10 + aten::narrow: 188 + aten::native_batch_norm: 15 + aten::native_layer_norm: 26 + aten::native_layer_norm_backward: 1 + aten::ne.Scalar: 15 + aten::ne.Tensor: 6 + aten::neg: 29 + aten::new_empty_strided: 188 + aten::nll_loss: 4 + aten::nll_loss_backward: 4 + aten::nll_loss_forward: 4 + aten::nll_loss_nd: 3 + aten::nonzero: 16 + aten::norm.Scalar: 1 + aten::norm.ScalarOpt_dim: 5 + aten::normal_: 17 + aten::one_hot: 2 + aten::ones: 188 + aten::ones_like: 25 + aten::permute: 44 + aten::pow.Tensor_Scalar: 3 + aten::q_per_channel_scales: 28 + aten::q_per_channel_zero_points: 28 + aten::q_scale: 65 + aten::q_zero_point: 85 + aten::qscheme: 85 + aten::quantile.scalar: 1 + aten::quantize_per_tensor: 84 + aten::quantize_per_tensor.tensor_qparams: 1 + aten::quantized_lstm.data: 2 + aten::quantized_max_pool2d: 3 + aten::rand: 25 + aten::randint.low: 2 + aten::randn_like: 17 + aten::random_.from: 2 + aten::reciprocal: 1 + aten::reflection_pad2d: 1 + aten::relu: 79 + aten::relu_: 4 + aten::remainder.Scalar: 2 + aten::remainder.Tensor: 2 + aten::repeat: 14 + aten::replication_pad2d: 2 + aten::requires_grad_: 2 + aten::reshape: 69 + aten::resize_: 188 + aten::resize_as_: 18 + aten::resolve_conj: 70 + aten::resolve_neg: 1 + aten::result_type.Scalar: 3 + aten::rsub.Scalar: 5 + aten::scalar_tensor: 1 + aten::scatter_.src: 6 + aten::scatter_.value: 2 + aten::scatter_add_: 10 + aten::select.int: 77 + aten::select_backward: 1 + aten::selu: 2 + aten::set_.source_Storage: 186 + aten::set_.source_Storage_storage_offset: 186 + aten::sigmoid: 90 + aten::sigmoid_: 14 + aten::sigmoid_backward: 17 + aten::sin: 4 + aten::slice.Tensor: 188 + aten::slice_backward: 4 + aten::slow_conv_transpose2d: 6 + aten::softmax.int: 63 + aten::softplus: 2 + aten::sort: 20 + aten::sparse_coo_tensor.indices: 1 + aten::sparse_dim: 3 + aten::sparse_resize_and_clear_: 1 + aten::split.Tensor: 20 + aten::sqrt: 1 + aten::squeeze: 13 + aten::squeeze.dim: 38 + aten::squeeze_.dim: 36 + aten::stack: 39 + aten::sub.Scalar: 23 + aten::sub.Tensor: 105 + aten::sub_.Scalar: 1 + aten::sub_.Tensor: 7 + aten::sum: 18 
+ aten::sum.IntList_out: 29 + aten::sum.dim_IntList: 41 + aten::t: 49 + aten::tanh: 40 + aten::tanh_: 14 + aten::tanh_backward: 5 + aten::tensor_split.indices: 4 + aten::thnn_conv2d: 33 + aten::threshold_backward: 17 + aten::to.device: 35 + aten::to.dtype: 188 + aten::to.dtype_layout: 184 + aten::topk: 10 + aten::transpose.int: 73 + aten::triu: 10 + aten::true_divide.Tensor: 2 + aten::trunc_: 4 + aten::type_as: 6 + aten::unbind.int: 38 + aten::unfold: 14 + aten::uniform_: 25 + aten::unique_consecutive: 2 + aten::unsafe_chunk: 14 + aten::unsafe_split.Tensor: 14 + aten::unsqueeze: 56 + aten::unsqueeze_: 31 + aten::upsample_bilinear2d: 7 + aten::upsample_bilinear2d.vec: 7 + aten::upsample_nearest2d: 31 + aten::upsample_nearest2d.vec: 27 + aten::value_selecting_reduction_backward: 3 + aten::view: 95 + aten::vstack: 1 + aten::where.ScalarOther: 4 + aten::where.self: 15 + aten::zero_: 188 + aten::zeros: 188 + aten::zeros.out: 1 + aten::zeros_like: 6 + prepacked::conv2d_clamp_prepack: 1 + prepacked::conv2d_clamp_run: 32 + prepacked::conv2d_transpose_clamp_run: 1 + prepacked::linear_clamp_run: 26 + quantized::add: 58 + quantized::add_relu: 1 + quantized::batch_norm2d: 1 + quantized::cat: 4 + quantized::conv1d: 1 + quantized::conv2d: 4 + quantized::conv2d.new: 55 + quantized::conv2d_prepack: 14 + quantized::conv2d_relu.new: 50 + quantized::conv_prepack: 5 + quantized::conv_transpose2d: 2 + quantized::embedding_byte: 14 + quantized::hardswish: 1 + quantized::instance_norm: 1 + quantized::leaky_relu: 2 + quantized::linear: 27 + quantized::linear_dynamic: 21 + quantized::linear_prepack: 29 + quantized::linear_relu: 2 + quantized::mul: 4 + quantized::mul.Scalar: 1 diff --git a/test/mobile/model_test/nn_ops.py b/test/mobile/model_test/nn_ops.py new file mode 100644 index 00000000000000..338359c964084a --- /dev/null +++ b/test/mobile/model_test/nn_ops.py @@ -0,0 +1,427 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F + +# https://pytorch.org/docs/stable/nn.html +class NNConvolutionModule(torch.nn.Module): + def __init__(self): + super(NNConvolutionModule, self).__init__() + self.input1d = torch.randn(1, 4, 36) + self.input2d = torch.randn(1, 4, 30, 10) + self.input3d = torch.randn(1, 4, 10, 4, 4) + self.module1d = nn.ModuleList( + [ + nn.Conv1d(4, 33, 3), + nn.ConvTranspose1d(4, 33, 3), + nn.Fold(output_size=(5, 10), kernel_size=(2, 2)), + ] + ) + self.module2d = nn.ModuleList( + [ + nn.Conv2d(4, 33, 3), + nn.ConvTranspose2d(4, 33, 3), + nn.Unfold(kernel_size=3), + ] + ) + self.module3d = nn.ModuleList( + [ + nn.Conv3d(4, 33, 2), + nn.ConvTranspose3d(4, 33, 3), + ] + ) + + def forward(self): + return len(( + [module(self.input1d) for i, module in enumerate(self.module1d)], + [module(self.input2d) for i, module in enumerate(self.module2d)], + [module(self.input3d) for i, module in enumerate(self.module3d)], + )) + + +class NNPoolingModule(torch.nn.Module): + def __init__(self): + super(NNPoolingModule, self).__init__() + self.input1d = torch.randn(1, 16, 50) + self.module1d = nn.ModuleList( + [ + nn.MaxPool1d(3, stride=2), + nn.AvgPool1d(3, stride=2), + nn.LPPool1d(2, 3, stride=2), + nn.AdaptiveMaxPool1d(3), + nn.AdaptiveAvgPool1d(3), + ] + ) + + self.input2d = torch.randn(1, 16, 30, 10) + self.module2d = nn.ModuleList( + [ + nn.MaxPool2d((3, 2), stride=(2, 1)), + nn.AvgPool2d((3, 2), stride=(2, 1)), + nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5)), + nn.LPPool2d(2, 3, stride=(2, 1)), + nn.AdaptiveMaxPool2d((5, 7)), + nn.AdaptiveAvgPool2d((7)), + ] + ) + + self.input3d = 
torch.randn(1, 16, 20, 4, 4) + self.module3d = nn.ModuleList( + [ + nn.MaxPool3d(2), + nn.AvgPool3d(2), + nn.FractionalMaxPool3d(2, output_ratio=(0.5, 0.5, 0.5)), + nn.AdaptiveMaxPool3d((5, 7, 9)), + nn.AdaptiveAvgPool3d((5, 7, 9)), + ] + ) + # TODO max_unpool + + def forward(self): + return len(( + [module(self.input1d) for i, module in enumerate(self.module1d)], + [module(self.input2d) for i, module in enumerate(self.module2d)], + [module(self.input3d) for i, module in enumerate(self.module3d)], + )) + + +class NNPaddingModule(torch.nn.Module): + def __init__(self): + super(NNPaddingModule, self).__init__() + self.input1d = torch.randn(1, 4, 50) + self.module1d = nn.ModuleList( + [ + nn.ReflectionPad1d(2), + nn.ReplicationPad1d(2), + nn.ConstantPad1d(2, 3.5), + ] + ) + + self.input2d = torch.randn(1, 4, 30, 10) + self.module2d = nn.ModuleList( + [ + nn.ReflectionPad2d(2), + nn.ReplicationPad2d(2), + nn.ZeroPad2d(2), + nn.ConstantPad2d(2, 3.5), + ] + ) + + self.input3d = torch.randn(1, 4, 10, 4, 4) + self.module3d = nn.ModuleList( + [ + nn.ReflectionPad3d(1), + nn.ReplicationPad3d(3), + nn.ConstantPad3d(3, 3.5), + ] + ) + + def forward(self): + return len(( + [module(self.input1d) for i, module in enumerate(self.module1d)], + [module(self.input2d) for i, module in enumerate(self.module2d)], + [module(self.input3d) for i, module in enumerate(self.module3d)], + )) + + +class NNNormalizationModule(torch.nn.Module): + def __init__(self): + super(NNNormalizationModule, self).__init__() + self.input1d = torch.randn(1, 4, 50) + self.module1d = nn.ModuleList( + [ + nn.BatchNorm1d(4), + nn.InstanceNorm1d(4), + ] + ) + + self.input2d = torch.randn(1, 4, 30, 10) + self.module2d = nn.ModuleList( + [ + nn.BatchNorm2d(4), + nn.GroupNorm(4, 4), + nn.InstanceNorm2d(4), + nn.LayerNorm([4, 30, 10]), + nn.LocalResponseNorm(2), + ] + ) + + self.input3d = torch.randn(1, 4, 10, 4, 4) + self.module3d = nn.ModuleList( + [ + nn.BatchNorm3d(4), + nn.InstanceNorm3d(4), + nn.ChannelShuffle(2), + ] + ) + + def forward(self): + return len(( + [module(self.input1d) for i, module in enumerate(self.module1d)], + [module(self.input2d) for i, module in enumerate(self.module2d)], + [module(self.input3d) for i, module in enumerate(self.module3d)], + )) + + +class NNActivationModule(torch.nn.Module): + def __init__(self): + super(NNActivationModule, self).__init__() + self.activations = nn.ModuleList( + [ + nn.ELU(), + nn.Hardshrink(), + nn.Hardsigmoid(), + nn.Hardtanh(), + nn.Hardswish(), + nn.LeakyReLU(), + nn.LogSigmoid(), + # nn.MultiheadAttention(), + nn.PReLU(), + nn.ReLU(), + nn.ReLU6(), + nn.RReLU(), + nn.SELU(), + nn.CELU(), + nn.GELU(), + nn.Sigmoid(), + nn.SiLU(), + nn.Mish(), + nn.Softplus(), + nn.Softshrink(), + nn.Softsign(), + nn.Tanh(), + nn.Tanhshrink(), + # nn.Threshold(0.1, 20), + nn.GLU(), + nn.Softmin(), + nn.Softmax(), + nn.Softmax2d(), + nn.LogSoftmax(), + # nn.AdaptiveLogSoftmaxWithLoss(), + ] + ) + + def forward(self): + input = torch.randn(2, 3, 4) + return len(( + [module(input) for i, module in enumerate(self.activations)], + )) + + +class NNRecurrentModule(torch.nn.Module): + def __init__(self): + super(NNRecurrentModule, self).__init__() + self.rnn = nn.ModuleList( + [ + nn.RNN(4, 8, 2), + nn.RNNCell(4, 8), + ] + ) + self.gru = nn.ModuleList([nn.GRU(4, 8, 2), nn.GRUCell(4, 8)]) + self.lstm = nn.ModuleList( + [ + nn.LSTM(4, 8, 2), + nn.LSTMCell(4, 8), + ] + ) + + def forward(self): + input = torch.randn(5, 3, 4) + h = torch.randn(2, 3, 8) + c = torch.randn(2, 3, 8) + r = self.rnn[0](input, h) + r 
= self.rnn[1](input[0], h[0]) + r = self.gru[0](input, h) + r = self.gru[1](input[0], h[0]) + r = self.lstm[0](input, (h, c)) + r = self.lstm[1](input[0], (h[0], c[0])) + return len(r) + + +class NNTransformerModule(torch.nn.Module): + def __init__(self): + super(NNTransformerModule, self).__init__() + self.transformers = nn.ModuleList( + [ + nn.Transformer( + d_model=2, nhead=2, num_encoder_layers=1, num_decoder_layers=1 + ), + nn.TransformerEncoder( + nn.TransformerEncoderLayer(d_model=2, nhead=2), num_layers=1 + ), + nn.TransformerDecoder( + nn.TransformerDecoderLayer(d_model=2, nhead=2), num_layers=1 + ), + ] + ) + + def forward(self): + input = torch.rand(1, 16, 2) + tgt = torch.rand((1, 16, 2)) + r = self.transformers[0](input, tgt) + r = self.transformers[1](input) + r = self.transformers[2](input, tgt) + return len(r) + + +class NNLinearModule(torch.nn.Module): + def __init__(self): + super(NNLinearModule, self).__init__() + self.linears = nn.ModuleList( + [ + nn.Identity(54), + nn.Linear(20, 20), + nn.Bilinear(20, 20, 40), + # nn.LazyLinear(20, 30), + ] + ) + + def forward(self): + input = torch.randn(32, 20) + r = self.linears[0](input) + r = self.linears[1](input) + r = self.linears[2](input, input) + return len(r) + + +class NNDropoutModule(torch.nn.Module): + def __init__(self): + super(NNDropoutModule, self).__init__() + + def forward(self): + a = torch.randn(8, 4) + b = torch.randn(8, 4, 4, 4) + c = torch.randn(8, 4, 4, 4, 4) + return len( + F.dropout(a), + F.dropout2d(b), + F.dropout3d(c), + F.alpha_dropout(a), + F.feature_alpha_dropout(c), + ) + + +class NNSparseModule(torch.nn.Module): + def __init__(self): + super(NNSparseModule, self).__init__() + + def forward(self): + input = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]]) + input2 = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9]) + embedding_matrix = torch.rand(10, 3) + offsets = torch.tensor([0, 4]) + return len( + F.embedding(input, embedding_matrix), + F.embedding_bag(input2, embedding_matrix, offsets), + F.one_hot(torch.arange(0, 5) % 3, num_classes=5), + ) + + +class NNDistanceModule(torch.nn.Module): + def __init__(self): + super(NNDistanceModule, self).__init__() + + def forward(self): + a = torch.randn(8, 4) + b = torch.randn(8, 4) + return len( + F.pairwise_distance(a, b), + F.cosine_similarity(a, b), + F.pdist(a), + ) + + +class NNLossFunctionModule(torch.nn.Module): + def __init__(self): + super(NNLossFunctionModule, self).__init__() + self.x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]]) + self.y = torch.LongTensor([[3, 0, -1, 1]]) + + def forward(self): + a = torch.randn(3, 2) + b = torch.rand(3, 2) + c = torch.rand(3) + log_probs = torch.randn(50, 16, 20).log_softmax(2).detach() + targets = torch.randint(1, 20, (16, 30), dtype=torch.long) + input_lengths = torch.full((16,), 50, dtype=torch.long) + target_lengths = torch.randint(10, 30, (16,), dtype=torch.long) + return len( + F.binary_cross_entropy(torch.sigmoid(a), b), + F.binary_cross_entropy_with_logits(torch.sigmoid(a), b), + F.poisson_nll_loss(a, b), + F.cosine_embedding_loss(a, b, c), + F.cross_entropy(a, b), + F.ctc_loss(log_probs, targets, input_lengths, target_lengths), + # F.gaussian_nll_loss(a, b, torch.ones(5, 1)), # ENTER is not supported in mobile module + F.hinge_embedding_loss(a, b), + F.kl_div(a, b), + F.l1_loss(a, b), + F.mse_loss(a, b), + F.margin_ranking_loss(c, c, c), + F.multilabel_margin_loss(self.x, self.y), + F.multilabel_soft_margin_loss(self.x, self.y), + F.multi_margin_loss(self.x, torch.tensor([3])), + F.nll_loss(a, torch.tensor([1, 0, 1])), 
+ F.huber_loss(a, b), + F.smooth_l1_loss(a, b), + F.soft_margin_loss(a, b), + F.triplet_margin_loss(a, b, -b), + # F.triplet_margin_with_distance_loss(a, b, -b), # can't take variable number of arguments + ) + + +class NNVisionModule(torch.nn.Module): + def __init__(self): + super(NNVisionModule, self).__init__() + self.input = torch.randn(1, 4, 9, 9) + self.vision_modules = nn.ModuleList( + [ + nn.PixelShuffle(2), + nn.PixelUnshuffle(3), + nn.Upsample(scale_factor=2, mode="nearest"), + nn.Upsample(scale_factor=2, mode="bilinear"), + nn.Upsample(scale_factor=2, mode="bicubic"), + nn.UpsamplingNearest2d(scale_factor=2), + nn.UpsamplingBilinear2d(scale_factor=2), + ] + ) + self.linear_sample = nn.Upsample(scale_factor=2, mode="linear") + self.trilinear_sample = nn.Upsample(scale_factor=2, mode="trilinear") + + def forward(self): + input = torch.randn(1, 3, 16, 16) + for i, module in enumerate(self.vision_modules): + r = module(self.input) + return len( + r, + self.linear_sample(torch.randn(4, 9, 9)), + self.trilinear_sample(torch.randn(1, 3, 4, 9, 9)), + F.grid_sample(input, torch.ones(1, 4, 4, 2)), + ) + + +class NNShuffleModule(torch.nn.Module): + def __init__(self): + super(NNShuffleModule, self).__init__() + self.shuffle = nn.ChannelShuffle(2) + + def forward(self): + return len(self.shuffle(torch.randn(1, 4, 2, 2)),) + + +class NNUtilsModule(torch.nn.Module): + def __init__(self): + super(NNUtilsModule, self).__init__() + self.flatten = nn.Sequential( + nn.Linear(50, 50), + nn.Unflatten(1, (2, 5, 5)) + ) + + def forward(self): + a = [torch.tensor([1, 2, 3]), torch.tensor([3, 4])] + b = nn.utils.rnn.pad_sequence(a, batch_first=True) + # c = nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=torch.tensor([3, 2])) + input = torch.randn(2, 50) + return len( + self.flatten(input), + b, + ) diff --git a/test/mobile/model_test/quantization_ops.py b/test/mobile/model_test/quantization_ops.py new file mode 100644 index 00000000000000..d0fdb346545e7d --- /dev/null +++ b/test/mobile/model_test/quantization_ops.py @@ -0,0 +1,227 @@ +import torch +import torch.nn as nn + + +class GeneralQuantModule(torch.nn.Module): + def __init__(self): + super(GeneralQuantModule, self).__init__() + self.embedding = torch.nn.quantized.Embedding( + num_embeddings=10, embedding_dim=12 + ) + self.embedding_input = torch.tensor([9, 6, 5, 7, 8, 8, 9, 2, 8]) + self.func = torch.nn.quantized.QFunctional() + self.conv1 = torch.nn.quantized.ConvTranspose1d(16, 33, 3, stride=2) + self.conv2 = torch.nn.quantized.ConvTranspose2d(16, 33, 3, stride=2) + self.conv3 = torch.nn.quantized.ConvTranspose3d(16, 33, 3, stride=2) + + def forward(self): + a = torch.quantize_per_tensor(torch.tensor([3.0]), 1.0, 0, torch.qint32) + b = torch.quantize_per_tensor(torch.tensor(4.0), 1.0, 0, torch.qint32) + c = torch.quantize_per_tensor( + torch.tensor([3.0]), torch.tensor(1.0), torch.tensor(0), torch.qint32 + ) + input1 = torch.randn(1, 16, 4) + input2 = torch.randn(1, 16, 4, 4) + input3 = torch.randn(1, 16, 4, 4, 4) + return len( + self.func.add(a, b), + self.func.cat((a, a), 0), + self.func.mul(a, b), + self.func.add_relu(a, b), + self.func.add_scalar(a, b), + self.func.mul_scalar(a, b), + self.embedding(self.embedding_input), + self.conv1( + torch.quantize_per_tensor( + input1, scale=1.0, zero_point=0, dtype=torch.quint8 + ) + ), + self.conv2( + torch.quantize_per_tensor( + input2, scale=1.0, zero_point=0, dtype=torch.quint8 + ) + ), + c, + # self.conv3(torch.quantize_per_tensor(input3, scale=1.0, zero_point=0, 
dtype=torch.quint8)), # failed on iOS + ) + + +class DynamicQuantModule: + def __init__(self): + super(DynamicQuantModule, self).__init__() + self.module = self.M() + + def getModule(self): + return torch.quantization.quantize_dynamic(self.module, dtype=torch.qint8) + + class M(torch.nn.Module): + def __init__(self): + super(DynamicQuantModule.M, self).__init__() + self.rnn = nn.RNN(4, 8, 2) + self.rnncell = nn.RNNCell(4, 8) + self.gru = nn.GRU(4, 8, 2) + self.grucell = nn.GRUCell(4, 8) + self.lstm = nn.LSTM(4, 8, 2) + self.lstmcell = nn.LSTMCell(4, 8) + self.linears = nn.ModuleList( + [ + nn.Identity(54), + nn.Linear(20, 20), + nn.Bilinear(20, 20, 40), + ] + ) + self.transformers = nn.ModuleList( + [ + nn.Transformer( + d_model=2, nhead=2, num_encoder_layers=1, num_decoder_layers=1 + ), + nn.TransformerEncoder( + nn.TransformerEncoderLayer(d_model=2, nhead=2), num_layers=1 + ), + nn.TransformerDecoder( + nn.TransformerDecoderLayer(d_model=2, nhead=2), num_layers=1 + ), + ] + ) + # self.a = torch.nn.utils.rnn.pad_sequence([torch.tensor([1,2,3]), torch.tensor([3,4])], batch_first=True) + + def forward(self): + input = torch.randn(5, 3, 4) + h = torch.randn(2, 3, 8) + c = torch.randn(2, 3, 8) + linear_input = torch.randn(32, 20) + trans_input = torch.randn(1, 16, 2) + tgt = torch.rand(1, 16, 2) + + return len(( + self.rnn(input, h), + self.rnncell(input[0], h[0]), + self.gru(input, h), + self.grucell(input[0], h[0]), + self.lstm(input, (h, c)), + # self.lstm(torch.nn.utils.rnn.pack_padded_sequence(self.a, lengths=torch.tensor([3,2,1])), (h, c)), + self.lstmcell(input[0], (h[0], c[0])), + self.transformers[0](trans_input, tgt), + self.transformers[1](trans_input), + self.transformers[2](trans_input, tgt), + self.linears[0](linear_input), + self.linears[1](linear_input), + self.linears[2](linear_input, linear_input), + )) + + +class StaticQuantModule: + def __init__(self): + super(StaticQuantModule, self).__init__() + + def getModule(self): + model_fp32 = self.M() + model_fp32.eval() + model_fp32.qconfig = torch.quantization.get_default_qconfig("qnnpack") + model_fp32_prepared = torch.quantization.prepare(model_fp32) + model_int8 = torch.quantization.convert(model_fp32_prepared) + return model_int8 + + class M(torch.nn.Module): + def __init__(self): + super(StaticQuantModule.M, self).__init__() + self.quant = torch.quantization.QuantStub() + self.input1d = torch.randn(4, 2, 2) + self.input2d = torch.randn((4, 2, 4, 4)) + self.input3d = torch.randn(4, 2, 2, 4, 4) + self.linear_input = torch.randn(32, 20) + + self.layer1 = nn.Sequential( + nn.Conv1d(2, 2, 1), nn.InstanceNorm1d(1), nn.Hardswish() + ) + self.layer2 = nn.Sequential( + nn.Conv2d(2, 2, 1), + nn.BatchNorm2d(2), + nn.InstanceNorm2d(1), + nn.LeakyReLU(), + ) + self.layer3 = nn.Sequential( + nn.Conv3d(2, 2, 1), nn.BatchNorm3d(2), nn.InstanceNorm3d(1), nn.ReLU() + ) + self.layer4 = nn.Sequential(nn.Linear(4, 3)) + self.dequant = torch.quantization.DeQuantStub() + + def forward(self): + x = self.quant(self.input1d) + x = self.layer1(x) + x = self.dequant(x) + + y = self.input2d + y = self.quant(y) + y = self.layer2(y) + y = self.layer4(y) + y = self.dequant(y) + + z = self.quant(self.input3d) + z = self.layer3(z) + z = self.dequant(z) + + return (x, y, z) + + +class FusedQuantModule: + def __init__(self): + super(FusedQuantModule, self).__init__() + + def getModule(self): + model_fp32 = self.M() + model_fp32.eval() + model_fp32.qconfig = torch.quantization.get_default_qconfig("qnnpack") + model_fp32_fused = torch.quantization.fuse_modules( 
+ model_fp32, + [ + ["conv1d", "relu1"], + ["conv2d", "relu2"], + ["conv3d", "relu3"], + ["linear", "relu4"], + ], + ) + model_fp32_prepared = torch.quantization.prepare(model_fp32_fused) + model_int8 = torch.quantization.convert(model_fp32_prepared) + return model_int8 + + class M(torch.nn.Module): + def __init__(self): + super(FusedQuantModule.M, self).__init__() + self.quant = torch.quantization.QuantStub() + self.input1d = torch.randn(4, 2, 2) + self.input2d = torch.randn((4, 2, 4, 4)) + self.input3d = torch.randn(4, 2, 2, 4, 4) + self.conv1d = nn.Conv1d(2, 2, 1) + self.conv2d = nn.Conv2d(2, 2, 1) + self.conv3d = nn.Conv3d(2, 2, 1) + self.linear = nn.Linear(4, 2) + self.relu1 = nn.ReLU() + self.relu2 = nn.ReLU() + self.relu3 = nn.ReLU() + self.relu4 = nn.ReLU() + self.dequant = torch.quantization.DeQuantStub() + + def forward(self): + x = self.input1d + y = self.input2d + z = self.input3d + + x = self.quant(x) + x = self.conv1d(x) + x = self.relu1(x) + x = self.dequant(x) + + y = self.quant(y) + y = self.conv2d(y) + y = self.relu2(y) + y = self.dequant(y) + + z = self.quant(z) + z = self.conv3d(z) + z = self.relu3(z) + z = self.linear(z) + z = self.relu4(z) + z = self.dequant(z) + + return (x, y, z) diff --git a/test/mobile/model_test/sampling_ops.py b/test/mobile/model_test/sampling_ops.py new file mode 100644 index 00000000000000..a1ac71a3a31903 --- /dev/null +++ b/test/mobile/model_test/sampling_ops.py @@ -0,0 +1,37 @@ +import torch + + +# https://pytorch.org/docs/stable/torch.html#random-sampling + +class SamplingOpsModule(torch.nn.Module): + def __init__(self): + super(SamplingOpsModule, self).__init__() + + def forward(self): + a = torch.empty(3, 3).uniform_(0.0, 1.0) + size = (1, 4) + weights = torch.tensor([0, 10, 3, 0], dtype=torch.float) + return len( + # torch.seed(), + # torch.manual_seed(0), + torch.bernoulli(a), + # torch.initial_seed(), + torch.multinomial(weights, 2), + torch.normal(2.0, 3.0, size), + torch.poisson(a), + torch.rand(2, 3), + torch.rand_like(a), + torch.randint(10, size), + torch.randint_like(a, 4), + torch.rand(4), + torch.randn_like(a), + torch.randperm(4), + a.bernoulli_(), + a.cauchy_(), + a.exponential_(), + a.geometric_(0.5), + a.log_normal_(), + a.normal_(), + a.random_(), + a.uniform_(), + ) diff --git a/test/mobile/model_test/tensor_ops.py b/test/mobile/model_test/tensor_ops.py new file mode 100644 index 00000000000000..9e04c6703d27cd --- /dev/null +++ b/test/mobile/model_test/tensor_ops.py @@ -0,0 +1,279 @@ +import torch + + +class TensorOpsModule(torch.nn.Module): + def __init__(self): + super(TensorOpsModule, self).__init__() + + def forward(self): + return self.tensor_general_ops() + + def tensor_general_ops(self): + a = torch.randn(4) + b = torch.tensor([1.5]) + x = torch.ones((2,)) + c = torch.randn(4, dtype=torch.cfloat) + w = torch.rand(4, 4, 4, 4) + v = torch.rand(4, 4, 4, 4) + return len( + # torch.is_tensor(a), + # torch.is_storage(a), + torch.is_complex(a), + torch.is_conj(a), + torch.is_floating_point(a), + torch.is_nonzero(b), + # torch.set_default_dtype(torch.float32), + # torch.get_default_dtype(), + # torch.set_default_tensor_type(torch.DoubleTensor), + torch.numel(a), + # torch.set_printoptions(), + # torch.set_flush_denormal(False), + # https://pytorch.org/docs/stable/tensors.html#tensor-class-reference + # x.new_tensor([[0, 1], [2, 3]]), + x.new_full((3, 4), 3.141592), + x.new_empty((2, 3)), + x.new_ones((2, 3)), + x.new_zeros((2, 3)), + x.is_cuda, + x.is_quantized, + x.is_meta, + x.device, + x.dim(), + c.real, + c.imag, + # 
x.backward(), + x.clone(), + w.contiguous(), + w.contiguous(memory_format=torch.channels_last), + w.copy_(v), + w.copy_(1), + w.copy_(0.5), + x.cpu(), + # x.cuda(), + # x.data_ptr(), + x.dense_dim(), + w.fill_diagonal_(0), + w.element_size(), + w.exponential_(), + w.fill_(0), + w.geometric_(0.5), + a.index_fill(0, torch.tensor([0, 2]), 1), + a.index_put_([torch.argmax(a)], torch.tensor(1.0)), + a.index_put([torch.argmax(a)], torch.tensor(1.0)), + w.is_contiguous(), + c.is_complex(), + w.is_conj(), + w.is_floating_point(), + w.is_leaf, + w.is_pinned(), + w.is_set_to(w), + # w.is_shared, + w.is_coalesced(), + w.coalesce(), + w.is_signed(), + w.is_sparse, + torch.tensor([1]).item(), + x.log_normal_(), + # x.masked_scatter_(), + # x.masked_scatter(), + # w.normal(), + w.numel(), + # w.pin_memory(), + # w.put_(0, torch.tensor([0, 1], w)), + x.repeat(4, 2), + a.clamp_(0), + a.clamp(0), + a.clamp_min(0), + a.hardsigmoid_(), + a.hardsigmoid(), + a.hardswish_(), + a.hardswish(), + a.hardtanh_(), + a.hardtanh(), + a.leaky_relu_(), + a.leaky_relu(), + a.relu_(), + a.relu(), + a.resize_as_(a), + a.type_as(a), + a._shape_as_tensor(), + a.requires_grad_(False), + ) + + +class TensorCreationOpsModule(torch.nn.Module): + def __init__(self): + super(TensorCreationOpsModule, self).__init__() + + def forward(self): + return self.tensor_creation_ops() + + def tensor_creation_ops(self): + i = torch.tensor([[0, 1, 1], [2, 0, 2]]) + v = torch.tensor([3, 4, 5], dtype=torch.float32) + real = torch.tensor([1, 2], dtype=torch.float32) + imag = torch.tensor([3, 4], dtype=torch.float32) + inp = torch.tensor([-1.5, 0.0, 2.0]) + values = torch.tensor([0.5]) + quantized = torch.quantize_per_channel( + torch.tensor([[-1.0, 0.0], [1.0, 2.0]]), + torch.tensor([0.1, 0.01]), + torch.tensor([10, 0]), + 0, + torch.quint8, + ) + return len( + torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]]), + # torch.sparse_coo_tensor(i, v, [2, 3]), # not work for iOS + torch.as_tensor([1, 2, 3]), + torch.as_strided(torch.randn(3, 3), (2, 2), (1, 2)), + torch.zeros(2, 3), + torch.zeros((2, 3)), + torch.zeros([2, 3], out=i), + torch.zeros(5), + torch.zeros_like(torch.empty(2, 3)), + torch.ones(2, 3), + torch.ones((2, 3)), + torch.ones([2, 3]), + torch.ones(5), + torch.ones_like(torch.empty(2, 3)), + torch.arange(5), + torch.arange(1, 4), + torch.arange(1, 2.5, 0.5), + torch.range(1, 4), + torch.range(1, 4, 0.5), + torch.linspace(3.0, 3.0, steps=1), + torch.logspace(start=2, end=2, steps=1, base=2.0), + torch.eye(3), + torch.empty(2, 3), + torch.empty_like(torch.empty(2, 3), dtype=torch.int64), + torch.empty_strided((2, 3), (1, 2)), + torch.full((2, 3), 3.141592), + torch.full_like(torch.full((2, 3), 3.141592), 2.71828), + torch.quantize_per_tensor( + torch.tensor([-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8 + ), + torch.dequantize(quantized), + torch.complex(real, imag), + torch.polar(real, imag), + torch.heaviside(inp, values), + ) + + +class TensorIndexingOpsModule(torch.nn.Module): + def __init__(self): + super(TensorIndexingOpsModule, self).__init__() + + def forward(self): + return self.tensor_indexing_ops() + + def tensor_indexing_ops(self): + x = torch.randn(2, 4) + y = torch.randn(4, 4) + t = torch.tensor([[0, 0], [1, 0]]) + mask = x.ge(0.5) + i = [0, 1] + return len( + torch.cat((x, x, x), 0), + torch.concat((x, x, x), 0), + torch.conj(x), + torch.chunk(x, 2), + torch.dsplit(torch.randn(2, 2, 4), i), + torch.column_stack((x, x)), + torch.dstack((x, x)), + torch.gather(x, 0, t), + torch.hsplit(x, i), + torch.hstack((x, x)), + 
torch.index_select(x, 0, torch.tensor([0, 1])), + x.index(t), + torch.masked_select(x, mask), + torch.movedim(x, 1, 0), + torch.moveaxis(x, 1, 0), + torch.narrow(x, 0, 0, 2), + torch.nonzero(x), + torch.permute(x, (0, 1)), + torch.reshape(x, (-1,)), + torch.row_stack((x, x)), + torch.select(x, 0, 0), + torch.scatter(x, 0, t, x), + x.scatter(0, t, x.clone()), + torch.diagonal_scatter(y, torch.ones(4)), + torch.select_scatter(y, torch.ones(4), 0, 0), + torch.slice_scatter(x, x), + torch.scatter_add(x, 0, t, x), + x.scatter_(0, t, y), + x.scatter_add_(0, t, y), + # torch.scatter_reduce(x, 0, t, reduce="sum"), + torch.split(x, 1), + torch.squeeze(x, 0), + torch.stack([x, x]), + torch.swapaxes(x, 0, 1), + torch.swapdims(x, 0, 1), + torch.t(x), + torch.take(x, t), + torch.take_along_dim(x, torch.argmax(x)), + torch.tensor_split(x, 1), + torch.tensor_split(x, [0, 1]), + torch.tile(x, (2, 2)), + torch.transpose(x, 0, 1), + torch.unbind(x), + torch.unsqueeze(x, -1), + torch.vsplit(x, i), + torch.vstack((x, x)), + torch.where(x), + torch.where(t > 0, t, 0), + torch.where(t > 0, t, t), + ) + + +class TensorTypingOpsModule(torch.nn.Module): + def __init__(self): + super(TensorTypingOpsModule, self).__init__() + + def forward(self): + return self.tensor_typing_ops() + + def tensor_typing_ops(self): + x = torch.randn(1, 3, 4, 4) + return len( + x.to(torch.float), + x.to(torch.double), + x.to(torch.cfloat), + x.to(torch.cdouble), + x.to(torch.half), + x.to(torch.bfloat16), + x.to(torch.uint8), + x.to(torch.int8), + x.to(torch.short), + x.to(torch.int), + x.to(torch.long), + x.to(torch.bool), + x.to(torch.device("cpu")), + x.to(device="cpu", dtype=torch.float), + x.to(memory_format=torch.channels_last), + ) + + +class TensorViewOpsModule(torch.nn.Module): + def __init__(self): + super(TensorViewOpsModule, self).__init__() + + def forward(self): + return self.tensor_view_ops() + + def tensor_view_ops(self): + x = torch.randn(4, 4, 1) + y = torch.randn(4, 4, 2) + return len( + x[0, 2:], + x.detach(), + x.detach_(), + x.diagonal(), + x.expand(-1, -1, 3), + x.expand_as(y), + x.select(0, 1), + x.unflatten(1, (2, 2)), + x.unfold(1, 2, 2), + x.view(16), + x.view_as(torch.randn(16)), + ) diff --git a/test/mobile/model_test/torchvision_models.py b/test/mobile/model_test/torchvision_models.py new file mode 100644 index 00000000000000..232afbc54b1eee --- /dev/null +++ b/test/mobile/model_test/torchvision_models.py @@ -0,0 +1,24 @@ +import torch +import torchvision +from torch.utils.bundled_inputs import augment_model_with_bundled_inputs +from torch.utils.mobile_optimizer import optimize_for_mobile + + +class MobileNetV2Module: + def __init__(self): + super(MobileNetV2Module, self).__init__() + + def getModule(self): + model = torchvision.models.mobilenet_v2(pretrained=True) + model.eval() + example = torch.zeros(1, 3, 224, 224) + traced_script_module = torch.jit.trace(model, example) + optimized_module = optimize_for_mobile(traced_script_module) + augment_model_with_bundled_inputs( + optimized_module, + [ + (example, ), + ], + ) + optimized_module(example) + return optimized_module diff --git a/test/mobile/model_test/update_production_ops.py b/test/mobile/model_test/update_production_ops.py new file mode 100644 index 00000000000000..6bb685e6296d4f --- /dev/null +++ b/test/mobile/model_test/update_production_ops.py @@ -0,0 +1,35 @@ +""" +This is a script to aggregate production ops from xplat/pytorch_models/build/all_mobile_model_configs.yaml. +Specify the file path in the first argument. 
The results will be dumped to model_ops.yaml """ + +import sys +import yaml + +root_operators = {} +traced_operators = {} +kernel_metadata = {} + +with open(sys.argv[1]) as input_yaml_file: + model_infos = yaml.safe_load(input_yaml_file) + for info in model_infos: + for op in info["root_operators"]: + # aggregate occurrence per op + root_operators[op] = 1 + (root_operators[op] if op in root_operators else 0) + for op in info["traced_operators"]: + # aggregate occurrence per op + traced_operators[op] = 1 + (traced_operators[op] if op in traced_operators else 0) + # merge dtypes for each kernel + for kernel, dtypes in info["kernel_metadata"].items(): + new_dtypes = dtypes + (kernel_metadata[kernel] if kernel in kernel_metadata else []) + kernel_metadata[kernel] = list(set(new_dtypes)) + + +# Only test these built-in ops. No custom ops or non-CPU ops. +namespaces = ["aten", "prepacked", "prim", "quantized"] +root_operators = {x: root_operators[x] for x in root_operators if x.split("::")[0] in namespaces} +traced_operators = {x: traced_operators[x] for x in traced_operators if x.split("::")[0] in namespaces} + +out_path = "test/mobile/model_test/model_ops.yaml" +with open(out_path, "w") as f: + yaml.safe_dump({"root_operators": root_operators}, f) diff --git a/test/mobile/test_lite_script_module.py b/test/mobile/test_lite_script_module.py index 90abdab4ceea8a..638ac37eb88b39 100644 --- a/test/mobile/test_lite_script_module.py +++ b/test/mobile/test_lite_script_module.py @@ -522,6 +522,49 @@ def forward(self, x): input = torch.randn(4, 1, 4, 4) self._compare_script_and_mobile(model=model_int8, input=input) + def test_bundled_input_with_dynamic_type(self): + class Model(torch.nn.Module): + def __init__(self): + super(Model, self).__init__() + + def forward( + self, + x: Dict[int, torch.Tensor], + y: Dict[int, torch.Tensor], + z: Dict[int, torch.Tensor], + ): + return x + + model = Model() + script_module = torch.jit.script(model) + + sample_input = { + script_module.forward: [ + ( + {0: torch.ones(1)}, + {1: torch.ones(1)}, + {2: torch.ones(1)}, + ) + ] + } + + bundled_model = torch.utils.bundled_inputs.bundle_inputs( + script_module, sample_input + ) + + buf = bundled_model._save_to_buffer_for_lite_interpreter() + mobile_module = _load_for_lite_interpreter(io.BytesIO(buf)) + + i = mobile_module.run_method("get_all_bundled_inputs") + + self.assertEqual( + i[0], + ( + {0: torch.ones(1)}, + {1: torch.ones(1)}, + {2: torch.ones(1)}, + ), + ) if __name__ == '__main__': run_tests() diff --git a/test/onnx/autograd_helper.py b/test/onnx/autograd_helper.py new file mode 100644 index 00000000000000..a5c07bf1a26c58 --- /dev/null +++ b/test/onnx/autograd_helper.py @@ -0,0 +1,18 @@ +# Owner(s): ["module: onnx"] + +import torch + +# Autograd function that is a replica of the autograd function in +# test_utility_funs.py (test_autograd_module_name) +class CustomFunction(torch.autograd.Function): + @staticmethod + def forward(ctx, input): + ctx.save_for_backward(input) + return input.clamp(min=0) + + @staticmethod + def backward(ctx, grad_output): + input, = ctx.saved_tensors + grad_input = grad_output.clone() + grad_input[input < 0] = 0 + return grad_input diff --git a/test/onnx/expect/TestOperators.test_acos.expect b/test/onnx/expect/TestOperators.test_acos.expect index bcf9463956104c..40fc61e29b7f9a 100644 --- a/test/onnx/expect/TestOperators.test_acos.expect +++ b/test/onnx/expect/TestOperators.test_acos.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version:
"CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_add_broadcast.expect b/test/onnx/expect/TestOperators.test_add_broadcast.expect index 72d69469339fe3..569b2400df8819 100644 --- a/test/onnx/expect/TestOperators.test_add_broadcast.expect +++ b/test/onnx/expect/TestOperators.test_add_broadcast.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -57,5 +57,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_add_left_broadcast.expect b/test/onnx/expect/TestOperators.test_add_left_broadcast.expect index 81a0689a51bd82..ffa632ca475b8c 100644 --- a/test/onnx/expect/TestOperators.test_add_left_broadcast.expect +++ b/test/onnx/expect/TestOperators.test_add_left_broadcast.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -57,5 +57,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_add_size1_broadcast.expect b/test/onnx/expect/TestOperators.test_add_size1_broadcast.expect index ffdf6efd6228dc..9917880a8a228f 100644 --- a/test/onnx/expect/TestOperators.test_add_size1_broadcast.expect +++ b/test/onnx/expect/TestOperators.test_add_size1_broadcast.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect b/test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect index 72d69469339fe3..569b2400df8819 100644 --- a/test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect +++ b/test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -57,5 +57,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect b/test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect index a00ddab51e96ef..96d2dca593256a 100644 --- a/test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect +++ b/test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_addconstant.expect b/test/onnx/expect/TestOperators.test_addconstant.expect index 8494b62a3ed716..0e1570eb62da57 100644 --- a/test/onnx/expect/TestOperators.test_addconstant.expect +++ b/test/onnx/expect/TestOperators.test_addconstant.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -57,5 +57,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_addmm.expect b/test/onnx/expect/TestOperators.test_addmm.expect index ee46983d9e4129..1ef0a81e2a9054 100644 --- a/test/onnx/expect/TestOperators.test_addmm.expect +++ b/test/onnx/expect/TestOperators.test_addmm.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" 
graph { @@ -102,5 +102,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_argmax.expect b/test/onnx/expect/TestOperators.test_argmax.expect index a10b3bbe32899d..38add716ff3677 100644 --- a/test/onnx/expect/TestOperators.test_argmax.expect +++ b/test/onnx/expect/TestOperators.test_argmax.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -17,6 +17,11 @@ graph { i: 0 type: INT } + attribute { + name: "select_last_index" + i: 0 + type: INT + } } name: "torch_jit" input { @@ -50,5 +55,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_asin.expect b/test/onnx/expect/TestOperators.test_asin.expect index a6197dea5ffe6e..f5a44b850eb1c6 100644 --- a/test/onnx/expect/TestOperators.test_asin.expect +++ b/test/onnx/expect/TestOperators.test_asin.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_at_op.expect b/test/onnx/expect/TestOperators.test_at_op.expect index 46f9008a6ea5f6..8d4ba07ddcc854 100644 --- a/test/onnx/expect/TestOperators.test_at_op.expect +++ b/test/onnx/expect/TestOperators.test_at_op.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -13,6 +13,11 @@ graph { s: "add" type: STRING } + attribute { + name: "overload_name" + s: "" + type: STRING + } } name: "torch_jit" input { @@ -49,5 +54,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_atan.expect b/test/onnx/expect/TestOperators.test_atan.expect index d9a034a2504271..c8d189e1415ef2 100644 --- a/test/onnx/expect/TestOperators.test_atan.expect +++ b/test/onnx/expect/TestOperators.test_atan.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_avg_pool2d.expect b/test/onnx/expect/TestOperators.test_avg_pool2d.expect index cb5da7e05037aa..344022ec268877 100644 --- a/test/onnx/expect/TestOperators.test_avg_pool2d.expect +++ b/test/onnx/expect/TestOperators.test_avg_pool2d.expect @@ -1,40 +1,43 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { + node { + output: "onnx::Pad_1" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 8 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } node { input: "onnx::Pad_0" - output: "onnx::AveragePool_1" - name: "Pad_0" + input: "onnx::Pad_1" + output: "onnx::AveragePool_2" + name: "Pad_1" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } - attribute { - name: "pads" - ints: 0 - ints: 0 - ints: 0 - ints: 0 - ints: 0 - ints: 0 - ints: 0 - ints: 0 - type: INTS - } - attribute { - name: "value" - f: 0 - type: FLOAT - } } node { - input: "onnx::AveragePool_1" - output: "2" - name: "AveragePool_1" + input: "onnx::AveragePool_2" + output: "3" + name: 
"AveragePool_2" op_type: "AveragePool" + attribute { + name: "ceil_mode" + i: 0 + type: INT + } attribute { name: "kernel_shape" ints: 3 @@ -80,7 +83,7 @@ graph { } } output { - name: "2" + name: "3" type { tensor_type { elem_type: 1 @@ -103,5 +106,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_baddbmm.expect b/test/onnx/expect/TestOperators.test_baddbmm.expect index 66fe45123b9f18..fc7eb0f8295e64 100644 --- a/test/onnx/expect/TestOperators.test_baddbmm.expect +++ b/test/onnx/expect/TestOperators.test_baddbmm.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -119,5 +119,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_basic.expect b/test/onnx/expect/TestOperators.test_basic.expect index 88d53eb0ff75d5..3d151aefabdb13 100644 --- a/test/onnx/expect/TestOperators.test_basic.expect +++ b/test/onnx/expect/TestOperators.test_basic.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -76,5 +76,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_batchnorm.expect b/test/onnx/expect/TestOperators.test_batchnorm.expect index 1bd402f6533eb9..d9c9ec338c8cb4 100644 --- a/test/onnx/expect/TestOperators.test_batchnorm.expect +++ b/test/onnx/expect/TestOperators.test_batchnorm.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -145,5 +145,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_batchnorm_1d.expect b/test/onnx/expect/TestOperators.test_batchnorm_1d.expect index 426fb72af70207..a4d2e1f102498a 100644 --- a/test/onnx/expect/TestOperators.test_batchnorm_1d.expect +++ b/test/onnx/expect/TestOperators.test_batchnorm_1d.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -133,5 +133,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect b/test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect index 88f4fdc578f140..a421443cdcda51 100644 --- a/test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect +++ b/test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -135,5 +135,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect b/test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect index 0d80bedd8ab5c8..a556e38c7198a5 100644 --- a/test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect +++ b/test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -93,5 +93,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_batchnorm_training.expect b/test/onnx/expect/TestOperators.test_batchnorm_training.expect index 9090a8ff187779..5e8f2049e14337 100644 --- a/test/onnx/expect/TestOperators.test_batchnorm_training.expect +++ b/test/onnx/expect/TestOperators.test_batchnorm_training.expect @@ -1,4 +1,4 @@ 
-ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -149,5 +149,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_chunk.expect b/test/onnx/expect/TestOperators.test_chunk.expect index f4973676048086..575245c807eb63 100644 --- a/test/onnx/expect/TestOperators.test_chunk.expect +++ b/test/onnx/expect/TestOperators.test_chunk.expect @@ -1,28 +1,158 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Split_0" - output: "1" - output: "2" - name: "Split_0" - op_type: "Split" + input: "onnx::Shape_0" + output: "onnx::Gather_1" + name: "Shape_0" + op_type: "Shape" + } + node { + output: "onnx::Gather_2" + name: "Constant_1" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Gather_1" + input: "onnx::Gather_2" + output: "onnx::Add_3" + name: "Gather_2" + op_type: "Gather" attribute { name: "axis" i: 0 type: INT } + } + node { + output: "onnx::Slice_4" + name: "Constant_3" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Add_5" + name: "Constant_4" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Add_3" + input: "onnx::Add_5" + output: "onnx::Div_6" + name: "Add_5" + op_type: "Add" + } + node { + output: "onnx::Div_7" + name: "Constant_6" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Div_6" + input: "onnx::Div_7" + output: "onnx::Mul_8" + name: "Div_7" + op_type: "Div" + } + node { + output: "onnx::Mul_9" + name: "Constant_8" + op_type: "Constant" attribute { - name: "split" - ints: 2 - ints: 1 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR } } + node { + input: "onnx::Mul_8" + input: "onnx::Mul_9" + output: "onnx::Slice_10" + name: "Mul_9" + op_type: "Mul" + } + node { + input: "onnx::Shape_0" + input: "onnx::Slice_4" + input: "onnx::Slice_10" + input: "onnx::Gather_2" + output: "11" + name: "Slice_10" + op_type: "Slice" + } + node { + output: "onnx::Mul_12" + name: "Constant_11" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Mul_8" + input: "onnx::Mul_12" + output: "onnx::Slice_13" + name: "Mul_12" + op_type: "Mul" + } + node { + input: "onnx::Shape_0" + input: "onnx::Slice_10" + input: "onnx::Slice_13" + input: "onnx::Gather_2" + output: "14" + name: "Slice_13" + op_type: "Slice" + } name: "torch_jit" input { - name: "onnx::Split_0" + name: "onnx::Shape_0" type { tensor_type { elem_type: 1 @@ -35,7 +165,7 @@ graph { } } output { - name: "1" + name: "11" type { tensor_type { elem_type: 1 @@ -48,7 +178,7 @@ graph { } } output { - name: "2" + name: "14" type { tensor_type { elem_type: 1 @@ -62,5 +192,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_clip.expect b/test/onnx/expect/TestOperators.test_clip.expect index 
50293fd9cf9421..81606851e7851e 100644 --- a/test/onnx/expect/TestOperators.test_clip.expect +++ b/test/onnx/expect/TestOperators.test_clip.expect @@ -1,24 +1,26 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "onnx::Clip_0" - output: "1" + input: "onnx::Clip_6" + input: "onnx::Clip_7" + output: "5" name: "Clip_0" op_type: "Clip" - attribute { - name: "max" - f: 0.5 - type: FLOAT - } - attribute { - name: "min" - f: -0.5 - type: FLOAT - } } name: "torch_jit" + initializer { + data_type: 1 + name: "onnx::Clip_6" + raw_data: "\000\000\000\277" + } + initializer { + data_type: 1 + name: "onnx::Clip_7" + raw_data: "\000\000\000?" + } input { name: "onnx::Clip_0" type { @@ -36,7 +38,7 @@ graph { } } output { - name: "1" + name: "5" type { tensor_type { elem_type: 1 @@ -53,5 +55,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_clip_max.expect b/test/onnx/expect/TestOperators.test_clip_max.expect index bb7bd0fc6db2f1..ceb89b3048c670 100644 --- a/test/onnx/expect/TestOperators.test_clip_max.expect +++ b/test/onnx/expect/TestOperators.test_clip_max.expect @@ -1,19 +1,21 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "onnx::Clip_0" - output: "1" + input: "" + input: "onnx::Clip_7" + output: "5" name: "Clip_0" op_type: "Clip" - attribute { - name: "max" - f: 0.1 - type: FLOAT - } } name: "torch_jit" + initializer { + data_type: 1 + name: "onnx::Clip_7" + raw_data: "\315\314\314=" + } input { name: "onnx::Clip_0" type { @@ -37,22 +39,22 @@ graph { } } output { - name: "1" + name: "5" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Clip5_dim_0" } dim { - dim_value: 2 + dim_param: "Clip5_dim_1" } dim { - dim_value: 3 + dim_param: "Clip5_dim_2" } dim { - dim_value: 4 + dim_param: "Clip5_dim_3" } } } @@ -60,5 +62,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_clip_min.expect b/test/onnx/expect/TestOperators.test_clip_min.expect index cda3b105ccbaaf..22826be3fd5434 100644 --- a/test/onnx/expect/TestOperators.test_clip_min.expect +++ b/test/onnx/expect/TestOperators.test_clip_min.expect @@ -1,19 +1,21 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "onnx::Clip_0" - output: "1" + input: "onnx::Clip_7" + input: "" + output: "5" name: "Clip_0" op_type: "Clip" - attribute { - name: "min" - f: -0.1 - type: FLOAT - } } name: "torch_jit" + initializer { + data_type: 1 + name: "onnx::Clip_7" + raw_data: "\315\314\314\275" + } input { name: "onnx::Clip_0" type { @@ -37,22 +39,22 @@ graph { } } output { - name: "1" + name: "5" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Clip5_dim_0" } dim { - dim_value: 2 + dim_param: "Clip5_dim_1" } dim { - dim_value: 3 + dim_param: "Clip5_dim_2" } dim { - dim_value: 4 + dim_param: "Clip5_dim_3" } } } @@ -60,5 +62,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_concat2.expect b/test/onnx/expect/TestOperators.test_concat2.expect index b5102e0f86647d..f5b6aec0c2293e 100644 --- a/test/onnx/expect/TestOperators.test_concat2.expect +++ b/test/onnx/expect/TestOperators.test_concat2.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -65,5 +65,5 @@ graph { } } opset_import { 
- version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_conv.expect b/test/onnx/expect/TestOperators.test_conv.expect index 55fe131ef3ac10..f1078cef39c176 100644 --- a/test/onnx/expect/TestOperators.test_conv.expect +++ b/test/onnx/expect/TestOperators.test_conv.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -118,5 +118,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect b/test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect index 980f9fab61fcbb..18e3c683e9bc92 100644 --- a/test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect +++ b/test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -96,5 +96,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_convtranspose.expect b/test/onnx/expect/TestOperators.test_convtranspose.expect index 22241584dc6c21..0beedca2f2920e 100644 --- a/test/onnx/expect/TestOperators.test_convtranspose.expect +++ b/test/onnx/expect/TestOperators.test_convtranspose.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -124,5 +124,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_cos.expect b/test/onnx/expect/TestOperators.test_cos.expect index 7b08d883c7b802..1185bca62c5975 100644 --- a/test/onnx/expect/TestOperators.test_cos.expect +++ b/test/onnx/expect/TestOperators.test_cos.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_dict.expect b/test/onnx/expect/TestOperators.test_dict.expect index 42a4855c818c1b..e041d535d768b4 100644 --- a/test/onnx/expect/TestOperators.test_dict.expect +++ b/test/onnx/expect/TestOperators.test_dict.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_dict_str.expect b/test/onnx/expect/TestOperators.test_dict_str.expect index 3e72400d5f421d..eaab2752fb7dcd 100644 --- a/test/onnx/expect/TestOperators.test_dict_str.expect +++ b/test/onnx/expect/TestOperators.test_dict_str.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -63,5 +63,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_dim.expect b/test/onnx/expect/TestOperators.test_dim.expect index 77480e6173bb95..59e910a646ca99 100644 --- a/test/onnx/expect/TestOperators.test_dim.expect +++ b/test/onnx/expect/TestOperators.test_dim.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -28,5 +28,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_dropout.expect b/test/onnx/expect/TestOperators.test_dropout.expect index 407a1af477f7c8..27aab5c718211c 100644 --- a/test/onnx/expect/TestOperators.test_dropout.expect +++ 
b/test/onnx/expect/TestOperators.test_dropout.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -42,5 +42,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_dropout_default.expect b/test/onnx/expect/TestOperators.test_dropout_default.expect index 523ec6bf8e307b..89c0e988aacbcd 100644 --- a/test/onnx/expect/TestOperators.test_dropout_default.expect +++ b/test/onnx/expect/TestOperators.test_dropout_default.expect @@ -1,23 +1,46 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "x" - output: "onnx::ReduceMax_1" - output: "2" - name: "Dropout_0" - op_type: "Dropout" + output: "onnx::Dropout_1" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + data_type: 1 + raw_data: "\000\000\000?" + } + type: TENSOR + } + } + node { + output: "onnx::Dropout_2" + name: "Constant_1" + op_type: "Constant" attribute { - name: "ratio" - f: 0.5 - type: FLOAT + name: "value" + t { + data_type: 9 + raw_data: "\001" + } + type: TENSOR } } node { - input: "onnx::ReduceMax_1" - output: "3" - name: "ReduceMax_1" + input: "x" + input: "onnx::Dropout_1" + input: "onnx::Dropout_2" + output: "onnx::ReduceMax_3" + output: "4" + name: "Dropout_2" + op_type: "Dropout" + } + node { + input: "onnx::ReduceMax_3" + output: "5" + name: "ReduceMax_3" op_type: "ReduceMax" attribute { name: "keepdims" @@ -43,7 +66,7 @@ graph { } } output { - name: "3" + name: "5" type { tensor_type { elem_type: 1 @@ -54,5 +77,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_dropout_training.expect b/test/onnx/expect/TestOperators.test_dropout_training.expect index 523ec6bf8e307b..89c0e988aacbcd 100644 --- a/test/onnx/expect/TestOperators.test_dropout_training.expect +++ b/test/onnx/expect/TestOperators.test_dropout_training.expect @@ -1,23 +1,46 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "x" - output: "onnx::ReduceMax_1" - output: "2" - name: "Dropout_0" - op_type: "Dropout" + output: "onnx::Dropout_1" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + data_type: 1 + raw_data: "\000\000\000?" 
+ } + type: TENSOR + } + } + node { + output: "onnx::Dropout_2" + name: "Constant_1" + op_type: "Constant" attribute { - name: "ratio" - f: 0.5 - type: FLOAT + name: "value" + t { + data_type: 9 + raw_data: "\001" + } + type: TENSOR } } node { - input: "onnx::ReduceMax_1" - output: "3" - name: "ReduceMax_1" + input: "x" + input: "onnx::Dropout_1" + input: "onnx::Dropout_2" + output: "onnx::ReduceMax_3" + output: "4" + name: "Dropout_2" + op_type: "Dropout" + } + node { + input: "onnx::ReduceMax_3" + output: "5" + name: "ReduceMax_3" op_type: "ReduceMax" attribute { name: "keepdims" @@ -43,7 +66,7 @@ graph { } } output { - name: "3" + name: "5" type { tensor_type { elem_type: 1 @@ -54,5 +77,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_elu.expect b/test/onnx/expect/TestOperators.test_elu.expect index c43a3827fce974..9fc2d5aab1fed4 100644 --- a/test/onnx/expect/TestOperators.test_elu.expect +++ b/test/onnx/expect/TestOperators.test_elu.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_embedding_bags.expect b/test/onnx/expect/TestOperators.test_embedding_bags.expect index ee9be8e861fb9e..dfa1afddee3010 100644 --- a/test/onnx/expect/TestOperators.test_embedding_bags.expect +++ b/test/onnx/expect/TestOperators.test_embedding_bags.expect @@ -1,42 +1,359 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "weight" - input: "input" - input: "offsets" - output: "3" - output: "4" - output: "5" - output: "6" - op_type: "ATen" + output: "onnx::Cast_3" + op_type: "Constant" attribute { - name: "include_last_offset" - i: 0 + name: "value" + t { + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Cast_3" + output: "onnx::Loop_4" + op_type: "Cast" + attribute { + name: "to" + i: 9 type: INT } + } + node { + output: "5" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "input" + output: "onnx::Gather_6" + op_type: "Shape" + } + node { + output: "onnx::Gather_7" + op_type: "Constant" + attribute { + name: "value" + t { + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Gather_6" + input: "onnx::Gather_7" + output: "onnx::Unsqueeze_8" + op_type: "Gather" attribute { - name: "mode" - i: 1 + name: "axis" + i: 0 type: INT } + } + node { + output: "onnx::Unsqueeze_9" + op_type: "Constant" attribute { - name: "operator" - s: "embedding_bag" - type: STRING + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + input: "onnx::Unsqueeze_8" + input: "onnx::Unsqueeze_9" + output: "onnx::Concat_10" + op_type: "Unsqueeze" + } + node { + input: "offsets" + input: "onnx::Concat_10" + output: "onnx::Slice_11" + op_type: "Concat" attribute { - name: "scale_grad_by_freq" + name: "axis" i: 0 type: INT } + } + node { + output: "onnx::Slice_12" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_13" + op_type: "Constant" + attribute { + name: "value" + t { 
+ dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_14" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\377\377\377\377\377\377\377\177" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_15" + op_type: "Constant" attribute { - name: "sparse" + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Slice_11" + input: "onnx::Slice_13" + input: "onnx::Slice_14" + input: "onnx::Slice_12" + input: "onnx::Slice_15" + output: "onnx::Shape_16" + op_type: "Slice" + } + node { + input: "onnx::Shape_16" + output: "onnx::Gather_17" + op_type: "Shape" + } + node { + output: "onnx::Gather_18" + op_type: "Constant" + attribute { + name: "value" + t { + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Gather_17" + input: "onnx::Gather_18" + output: "onnx::Loop_19" + op_type: "Gather" + attribute { + name: "axis" i: 0 type: INT } } + node { + input: "onnx::Loop_19" + input: "onnx::Loop_4" + output: "20" + op_type: "Loop" + attribute { + name: "body" + g { + node { + input: "onnx::Slice_11" + input: "21" + output: "23" + name: "Gather_0" + op_type: "Gather" + attribute { + name: "axis" + i: 0 + type: INT + } + } + node { + input: "onnx::Shape_16" + input: "21" + output: "24" + name: "Gather_1" + op_type: "Gather" + attribute { + name: "axis" + i: 0 + type: INT + } + } + node { + output: "25" + name: "Constant_2" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "23" + input: "25" + output: "26" + name: "Unsqueeze_3" + op_type: "Unsqueeze" + } + node { + output: "27" + name: "Constant_4" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "24" + input: "27" + output: "28" + name: "Unsqueeze_5" + op_type: "Unsqueeze" + } + node { + input: "input" + input: "26" + input: "28" + input: "5" + output: "29" + name: "Slice_6" + op_type: "Slice" + } + node { + input: "weight" + input: "29" + output: "30" + name: "Gather_7" + op_type: "Gather" + attribute { + name: "axis" + i: 0 + type: INT + } + } + node { + input: "30" + output: "31" + name: "ReduceMean_8" + op_type: "ReduceMean" + attribute { + name: "axes" + ints: 0 + type: INTS + } + attribute { + name: "keepdims" + i: 0 + type: INT + } + } + node { + input: "onnx::Loop_4" + output: "32" + name: "Cast_9" + op_type: "Cast" + attribute { + name: "to" + i: 9 + type: INT + } + } + name: "torch_jit1" + input { + name: "21" + type { + tensor_type { + elem_type: 7 + shape { + } + } + } + } + input { + name: "22" + type { + tensor_type { + elem_type: 9 + shape { + } + } + } + } + output { + name: "32" + type { + tensor_type { + elem_type: 9 + shape { + } + } + } + } + output { + name: "31" + type { + tensor_type { + elem_type: 1 + shape { + dim { + dim_param: "Loop20_dim_1" + } + } + } + } + } + } + type: GRAPH + } + } name: "torch_jit" initializer { dims: 10 @@ -88,16 +405,16 @@ graph { } } output { - name: "3" + name: "20" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Loop20_dim_0" } dim { - dim_value: 8 + dim_param: "Loop20_dim_1" } } } @@ -105,5 +422,5 @@ graph { } } opset_import { - version: 9 + version: 13 } 
diff --git a/test/onnx/expect/TestOperators.test_empty_like.expect b/test/onnx/expect/TestOperators.test_empty_like.expect index 1293acb1e16fba..e4f6c6ede2cab1 100644 --- a/test/onnx/expect/TestOperators.test_empty_like.expect +++ b/test/onnx/expect/TestOperators.test_empty_like.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -36,5 +36,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_equal.expect b/test/onnx/expect/TestOperators.test_equal.expect index 21c1e7f3caed14..5a9877d484f895 100644 --- a/test/onnx/expect/TestOperators.test_equal.expect +++ b/test/onnx/expect/TestOperators.test_equal.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -72,5 +72,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_erf.expect b/test/onnx/expect/TestOperators.test_erf.expect index 6568ca8418d6a7..f8f70c37598dc8 100644 --- a/test/onnx/expect/TestOperators.test_erf.expect +++ b/test/onnx/expect/TestOperators.test_erf.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -55,5 +55,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_exp.expect b/test/onnx/expect/TestOperators.test_exp.expect index b270bab2097512..49d9f74cb20d98 100644 --- a/test/onnx/expect/TestOperators.test_exp.expect +++ b/test/onnx/expect/TestOperators.test_exp.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_expand.expect b/test/onnx/expect/TestOperators.test_expand.expect index 988830e43c83fd..6634173a0a63aa 100644 --- a/test/onnx/expect/TestOperators.test_expand.expect +++ b/test/onnx/expect/TestOperators.test_expand.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -131,5 +131,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_flatten.expect b/test/onnx/expect/TestOperators.test_flatten.expect index 48def60c9f25ba..12160e8b9e6640 100644 --- a/test/onnx/expect/TestOperators.test_flatten.expect +++ b/test/onnx/expect/TestOperators.test_flatten.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -9,29 +9,59 @@ graph { op_type: "Shape" } node { - input: "onnx::Slice_1" - output: "onnx::Concat_2" - name: "Slice_1" - op_type: "Slice" + output: "onnx::Slice_2" + name: "Constant_1" + op_type: "Constant" attribute { - name: "axes" - ints: 0 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_3" + name: "Constant_2" + op_type: "Constant" attribute { - name: "ends" - ints: 0 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_4" + name: "Constant_3" + op_type: "Constant" attribute { - name: "starts" - ints: 0 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } } 
node { - output: "onnx::Concat_3" - name: "Constant_2" + input: "onnx::Slice_1" + input: "onnx::Slice_3" + input: "onnx::Slice_4" + input: "onnx::Slice_2" + output: "onnx::Concat_5" + name: "Slice_4" + op_type: "Slice" + } + node { + output: "onnx::Concat_6" + name: "Constant_5" op_type: "Constant" attribute { name: "value" @@ -44,10 +74,10 @@ graph { } } node { - input: "onnx::Concat_2" - input: "onnx::Concat_3" - output: "onnx::Reshape_4" - name: "Concat_3" + input: "onnx::Concat_5" + input: "onnx::Concat_6" + output: "onnx::Reshape_7" + name: "Concat_6" op_type: "Concat" attribute { name: "axis" @@ -57,9 +87,9 @@ graph { } node { input: "onnx::Shape_0" - input: "onnx::Reshape_4" - output: "5" - name: "Reshape_4" + input: "onnx::Reshape_7" + output: "8" + name: "Reshape_7" op_type: "Reshape" } name: "torch_jit" @@ -86,7 +116,7 @@ graph { } } output { - name: "5" + name: "8" type { tensor_type { elem_type: 1 @@ -100,5 +130,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_flatten2D.expect b/test/onnx/expect/TestOperators.test_flatten2D.expect index 041886291c9b38..f60b1ba7066ffa 100644 --- a/test/onnx/expect/TestOperators.test_flatten2D.expect +++ b/test/onnx/expect/TestOperators.test_flatten2D.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -54,5 +54,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_frobenius_norm.expect b/test/onnx/expect/TestOperators.test_frobenius_norm.expect index b1af3261b0dd5c..fba4585b18b853 100644 --- a/test/onnx/expect/TestOperators.test_frobenius_norm.expect +++ b/test/onnx/expect/TestOperators.test_frobenius_norm.expect @@ -1,35 +1,49 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { + node { + output: "onnx::ReduceSum_1" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 2 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000" + } + type: TENSOR + } + } node { input: "x" input: "x" - output: "onnx::ReduceSum_1" - name: "Mul_0" + output: "onnx::ReduceSum_2" + name: "Mul_1" op_type: "Mul" } node { + input: "onnx::ReduceSum_2" input: "onnx::ReduceSum_1" - output: "onnx::Sqrt_2" - name: "ReduceSum_1" + output: "onnx::Sqrt_3" + name: "ReduceSum_2" op_type: "ReduceSum" - attribute { - name: "axes" - ints: 0 - ints: 1 - type: INTS - } attribute { name: "keepdims" i: 1 type: INT } + attribute { + name: "noop_with_empty_axes" + i: 0 + type: INT + } } node { - input: "onnx::Sqrt_2" - output: "3" - name: "Sqrt_2" + input: "onnx::Sqrt_3" + output: "4" + name: "Sqrt_3" op_type: "Sqrt" } name: "torch_jit" @@ -53,7 +67,7 @@ graph { } } output { - name: "3" + name: "4" type { tensor_type { elem_type: 1 @@ -73,5 +87,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_full.expect b/test/onnx/expect/TestOperators.test_full.expect index d3526e4b1c568e..fc8acf5ee80dce 100644 --- a/test/onnx/expect/TestOperators.test_full.expect +++ b/test/onnx/expect/TestOperators.test_full.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -36,5 +36,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_full_like.expect b/test/onnx/expect/TestOperators.test_full_like.expect index 
d3526e4b1c568e..fc8acf5ee80dce 100644 --- a/test/onnx/expect/TestOperators.test_full_like.expect +++ b/test/onnx/expect/TestOperators.test_full_like.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -36,5 +36,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_gather.expect b/test/onnx/expect/TestOperators.test_gather.expect index dde397e206cd89..609f89853ac694 100644 --- a/test/onnx/expect/TestOperators.test_gather.expect +++ b/test/onnx/expect/TestOperators.test_gather.expect @@ -1,114 +1,22 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - output: "onnx::OneHot_2" - name: "Constant_0" - op_type: "Constant" - attribute { - name: "value" - t { - dims: 2 - data_type: 7 - raw_data: "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000" - } - type: TENSOR - } - } - node { - output: "onnx::Gather_3" - name: "Constant_1" - op_type: "Constant" - attribute { - name: "value" - t { - dims: 1 - data_type: 7 - raw_data: "\001\000\000\000\000\000\000\000" - } - type: TENSOR - } - } - node { - input: "onnx::Shape_0" - output: "onnx::Gather_4" - name: "Shape_2" - op_type: "Shape" - } - node { - input: "onnx::Gather_4" - input: "onnx::Gather_3" - output: "onnx::OneHot_5" - name: "Gather_3" - op_type: "Gather" - attribute { - name: "axis" - i: 0 - type: INT - } - } - node { - input: "onnx::OneHot_1" - input: "onnx::OneHot_5" - input: "onnx::OneHot_2" - output: "onnx::Cast_6" - name: "OneHot_4" - op_type: "OneHot" + input: "onnx::GatherElements_0" + input: "onnx::GatherElements_1" + output: "2" + name: "GatherElements_0" + op_type: "GatherElements" attribute { name: "axis" i: 1 type: INT } } - node { - input: "onnx::Cast_6" - output: "onnx::Mul_7" - name: "Cast_5" - op_type: "Cast" - attribute { - name: "to" - i: 1 - type: INT - } - } - node { - input: "onnx::Shape_0" - output: "onnx::Mul_8" - name: "Unsqueeze_6" - op_type: "Unsqueeze" - attribute { - name: "axes" - ints: 2 - type: INTS - } - } - node { - input: "onnx::Mul_8" - input: "onnx::Mul_7" - output: "onnx::ReduceSum_9" - name: "Mul_7" - op_type: "Mul" - } - node { - input: "onnx::ReduceSum_9" - output: "10" - name: "ReduceSum_8" - op_type: "ReduceSum" - attribute { - name: "axes" - ints: 1 - type: INTS - } - attribute { - name: "keepdims" - i: 0 - type: INT - } - } name: "torch_jit" input { - name: "onnx::Shape_0" + name: "onnx::GatherElements_0" type { tensor_type { elem_type: 1 @@ -127,7 +35,7 @@ graph { } } input { - name: "onnx::OneHot_1" + name: "onnx::GatherElements_1" type { tensor_type { elem_type: 7 @@ -146,7 +54,7 @@ graph { } } output { - name: "10" + name: "2" type { tensor_type { elem_type: 1 @@ -166,5 +74,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_ge.expect b/test/onnx/expect/TestOperators.test_ge.expect index 5246ccf0eb767c..8d578a4d25bd0b 100644 --- a/test/onnx/expect/TestOperators.test_ge.expect +++ b/test/onnx/expect/TestOperators.test_ge.expect @@ -1,23 +1,17 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Less_0" - input: "onnx::Less_1" - output: "onnx::Not_2" - name: "Less_0" - op_type: "Less" - } - node { - input: "onnx::Not_2" - output: "3" - name: "Not_1" - op_type: "Not" + input: "onnx::GreaterOrEqual_0" + input: "onnx::GreaterOrEqual_1" + output: "2" + name: "GreaterOrEqual_0" 
+ op_type: "GreaterOrEqual" } name: "torch_jit" input { - name: "onnx::Less_0" + name: "onnx::GreaterOrEqual_0" type { tensor_type { elem_type: 6 @@ -33,7 +27,7 @@ graph { } } input { - name: "onnx::Less_1" + name: "onnx::GreaterOrEqual_1" type { tensor_type { elem_type: 6 @@ -49,7 +43,7 @@ graph { } } output { - name: "3" + name: "2" type { tensor_type { elem_type: 9 @@ -66,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_gelu.expect b/test/onnx/expect/TestOperators.test_gelu.expect index d59cafa2617837..dfc7d1d88468d1 100644 --- a/test/onnx/expect/TestOperators.test_gelu.expect +++ b/test/onnx/expect/TestOperators.test_gelu.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -122,5 +122,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_gt.expect b/test/onnx/expect/TestOperators.test_gt.expect index 903ea41b9051fb..5aab77798bf648 100644 --- a/test/onnx/expect/TestOperators.test_gt.expect +++ b/test/onnx/expect/TestOperators.test_gt.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -72,5 +72,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_hardtanh.expect b/test/onnx/expect/TestOperators.test_hardtanh.expect index 70a3732b20700c..1268a4c14cfd15 100644 --- a/test/onnx/expect/TestOperators.test_hardtanh.expect +++ b/test/onnx/expect/TestOperators.test_hardtanh.expect @@ -1,23 +1,41 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "input" - output: "1" - name: "Clip_0" - op_type: "Clip" + output: "onnx::Clip_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "max" - f: 0.5 - type: FLOAT + name: "value" + t { + data_type: 1 + raw_data: "\000\000\000\277" + } + type: TENSOR } + } + node { + output: "onnx::Clip_2" + name: "Constant_1" + op_type: "Constant" attribute { - name: "min" - f: -0.5 - type: FLOAT + name: "value" + t { + data_type: 1 + raw_data: "\000\000\000?" 
+ } + type: TENSOR } } + node { + input: "input" + input: "onnx::Clip_1" + input: "onnx::Clip_2" + output: "3" + name: "Clip_2" + op_type: "Clip" + } name: "torch_jit" input { name: "input" @@ -36,7 +54,7 @@ graph { } } output { - name: "1" + name: "3" type { tensor_type { elem_type: 1 @@ -53,5 +71,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_implicit_expand.expect b/test/onnx/expect/TestOperators.test_implicit_expand.expect index db37957247fb9f..3c94edc85b4b38 100644 --- a/test/onnx/expect/TestOperators.test_implicit_expand.expect +++ b/test/onnx/expect/TestOperators.test_implicit_expand.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -57,5 +57,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_index.expect b/test/onnx/expect/TestOperators.test_index.expect index 1ea803f067b7aa..330d2de0d7fca6 100644 --- a/test/onnx/expect/TestOperators.test_index.expect +++ b/test/onnx/expect/TestOperators.test_index.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -59,5 +59,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_isnan.expect b/test/onnx/expect/TestOperators.test_isnan.expect index db7a6831750001..198d3bdb238706 100644 --- a/test/onnx/expect/TestOperators.test_isnan.expect +++ b/test/onnx/expect/TestOperators.test_isnan.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -37,5 +37,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_layer_norm_aten.expect b/test/onnx/expect/TestOperators.test_layer_norm_aten.expect index dad821eb13e337..d7b7ac56113014 100644 --- a/test/onnx/expect/TestOperators.test_layer_norm_aten.expect +++ b/test/onnx/expect/TestOperators.test_layer_norm_aten.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -29,6 +29,11 @@ graph { s: "layer_norm" type: STRING } + attribute { + name: "overload_name" + s: "" + type: STRING + } } name: "torch_jit" initializer { @@ -123,5 +128,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_le.expect b/test/onnx/expect/TestOperators.test_le.expect index c0b0d67f5a931b..374a0d0e0d5212 100644 --- a/test/onnx/expect/TestOperators.test_le.expect +++ b/test/onnx/expect/TestOperators.test_le.expect @@ -1,23 +1,17 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Greater_0" - input: "onnx::Greater_1" - output: "onnx::Not_2" - name: "Greater_0" - op_type: "Greater" - } - node { - input: "onnx::Not_2" - output: "3" - name: "Not_1" - op_type: "Not" + input: "onnx::LessOrEqual_0" + input: "onnx::LessOrEqual_1" + output: "2" + name: "LessOrEqual_0" + op_type: "LessOrEqual" } name: "torch_jit" input { - name: "onnx::Greater_0" + name: "onnx::LessOrEqual_0" type { tensor_type { elem_type: 6 @@ -33,7 +27,7 @@ graph { } } input { - name: "onnx::Greater_1" + name: "onnx::LessOrEqual_1" type { tensor_type { elem_type: 6 @@ -49,7 +43,7 @@ graph { } } output { - name: "3" + name: "2" type { tensor_type { elem_type: 9 @@ -66,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff 
--git a/test/onnx/expect/TestOperators.test_linear.expect b/test/onnx/expect/TestOperators.test_linear.expect index 372c34223bafd4..71c64dfe5a5085 100644 --- a/test/onnx/expect/TestOperators.test_linear.expect +++ b/test/onnx/expect/TestOperators.test_linear.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -102,5 +102,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_log_sigmoid.expect b/test/onnx/expect/TestOperators.test_log_sigmoid.expect index 993490e9e1dd2a..2681f1193102c3 100644 --- a/test/onnx/expect/TestOperators.test_log_sigmoid.expect +++ b/test/onnx/expect/TestOperators.test_log_sigmoid.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -61,5 +61,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_logsoftmax.expect b/test/onnx/expect/TestOperators.test_logsoftmax.expect index d01223a4c57984..1c4de89b6402cd 100644 --- a/test/onnx/expect/TestOperators.test_logsoftmax.expect +++ b/test/onnx/expect/TestOperators.test_logsoftmax.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_lt.expect b/test/onnx/expect/TestOperators.test_lt.expect index 57b6366e7b2abb..2dbcc07cd9e17e 100644 --- a/test/onnx/expect/TestOperators.test_lt.expect +++ b/test/onnx/expect/TestOperators.test_lt.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -72,5 +72,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_max.expect b/test/onnx/expect/TestOperators.test_max.expect index 295f32c6f87b9c..d9fcc0fb5f7a36 100644 --- a/test/onnx/expect/TestOperators.test_max.expect +++ b/test/onnx/expect/TestOperators.test_max.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_maxpool.expect b/test/onnx/expect/TestOperators.test_maxpool.expect index 13dabcfd506e39..f43712bbfc58f3 100644 --- a/test/onnx/expect/TestOperators.test_maxpool.expect +++ b/test/onnx/expect/TestOperators.test_maxpool.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -7,6 +7,11 @@ graph { output: "1" name: "MaxPool_0" op_type: "MaxPool" + attribute { + name: "ceil_mode" + i: 0 + type: INT + } attribute { name: "kernel_shape" ints: 3 @@ -65,5 +70,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_maxpool_indices.expect b/test/onnx/expect/TestOperators.test_maxpool_indices.expect index 249112abedfac3..46c23e3a4caecd 100644 --- a/test/onnx/expect/TestOperators.test_maxpool_indices.expect +++ b/test/onnx/expect/TestOperators.test_maxpool_indices.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -8,6 +8,11 @@ graph { output: "onnx::Sub_2" name: "MaxPool_0" op_type: "MaxPool" + attribute { + name: "ceil_mode" + i: 0 + type: INT + } attribute { name: "kernel_shape" ints: 3 @@ -43,31 
+48,61 @@ graph { } } node { - input: "onnx::Slice_4" - output: "onnx::Sub_5" - name: "Slice_2" - op_type: "Slice" + output: "onnx::Slice_5" + name: "Constant_2" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_6" + name: "Constant_3" + op_type: "Constant" attribute { - name: "ends" - ints: 1 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_7" + name: "Constant_4" + op_type: "Constant" attribute { - name: "starts" - ints: 0 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR } } + node { + input: "onnx::Slice_4" + input: "onnx::Slice_6" + input: "onnx::Slice_7" + input: "onnx::Slice_5" + output: "onnx::Sub_8" + name: "Slice_5" + op_type: "Slice" + } node { input: "onnx::Sub_2" - input: "onnx::Sub_5" - output: "6" - name: "Sub_3" + input: "onnx::Sub_8" + output: "9" + name: "Sub_6" op_type: "Sub" } name: "torch_jit" @@ -110,7 +145,7 @@ graph { } } output { - name: "6" + name: "9" type { tensor_type { elem_type: 7 @@ -130,5 +165,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_mean.expect b/test/onnx/expect/TestOperators.test_mean.expect index 8148bfdb54b3b4..b53b8c2f1248fd 100644 --- a/test/onnx/expect/TestOperators.test_mean.expect +++ b/test/onnx/expect/TestOperators.test_mean.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -48,5 +48,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_mean_dtype.expect b/test/onnx/expect/TestOperators.test_mean_dtype.expect index dfda5eba27e0e7..92ce0ae3aa9925 100644 --- a/test/onnx/expect/TestOperators.test_mean_dtype.expect +++ b/test/onnx/expect/TestOperators.test_mean_dtype.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -59,5 +59,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_meshgrid.expect b/test/onnx/expect/TestOperators.test_meshgrid.expect index ba0edfb1c3985a..05b9de875d9413 100644 --- a/test/onnx/expect/TestOperators.test_meshgrid.expect +++ b/test/onnx/expect/TestOperators.test_meshgrid.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -318,5 +318,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_min.expect b/test/onnx/expect/TestOperators.test_min.expect index 12945fa60e9fab..28ca14779f71c8 100644 --- a/test/onnx/expect/TestOperators.test_min.expect +++ b/test/onnx/expect/TestOperators.test_min.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -60,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_mm.expect b/test/onnx/expect/TestOperators.test_mm.expect index 4b436e8ca2491c..9492d651fd9ece 100644 --- a/test/onnx/expect/TestOperators.test_mm.expect +++ b/test/onnx/expect/TestOperators.test_mm.expect @@ -1,27 +1,12 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" 
producer_version: "CURRENT_VERSION" graph { - node { - output: "onnx::Gemm_2" - name: "Constant_0" - op_type: "Constant" - attribute { - name: "value" - t { - dims: 1 - data_type: 1 - raw_data: "\000\000\200?" - } - type: TENSOR - } - } node { input: "onnx::Gemm_0" input: "onnx::Gemm_1" - input: "onnx::Gemm_2" - output: "3" - name: "Gemm_1" + output: "2" + name: "Gemm_0" op_type: "Gemm" attribute { name: "alpha" @@ -68,7 +53,7 @@ graph { } } output { - name: "3" + name: "2" type { tensor_type { elem_type: 1 @@ -85,5 +70,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_narrow.expect b/test/onnx/expect/TestOperators.test_narrow.expect index 52e3e9c8ebffce..a7b13c89a646c0 100644 --- a/test/onnx/expect/TestOperators.test_narrow.expect +++ b/test/onnx/expect/TestOperators.test_narrow.expect @@ -1,29 +1,35 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "onnx::Slice_0" - output: "1" + input: "onnx::Slice_14" + input: "onnx::Slice_15" + input: "onnx::Slice_16" + output: "12" name: "Slice_0" op_type: "Slice" - attribute { - name: "axes" - ints: 0 - type: INTS - } - attribute { - name: "ends" - ints: 2 - type: INTS - } - attribute { - name: "starts" - ints: 0 - type: INTS - } } name: "torch_jit" + initializer { + dims: 1 + data_type: 7 + name: "onnx::Slice_14" + raw_data: "\000\000\000\000\000\000\000\000" + } + initializer { + dims: 1 + data_type: 7 + name: "onnx::Slice_15" + raw_data: "\002\000\000\000\000\000\000\000" + } + initializer { + dims: 1 + data_type: 7 + name: "onnx::Slice_16" + raw_data: "\000\000\000\000\000\000\000\000" + } input { name: "onnx::Slice_0" type { @@ -41,7 +47,7 @@ graph { } } output { - name: "1" + name: "12" type { tensor_type { elem_type: 1 @@ -58,5 +64,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_ne.expect b/test/onnx/expect/TestOperators.test_ne.expect index 55d35128cb33c5..ab053fbcf67e19 100644 --- a/test/onnx/expect/TestOperators.test_ne.expect +++ b/test/onnx/expect/TestOperators.test_ne.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -78,5 +78,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_nonzero.expect b/test/onnx/expect/TestOperators.test_nonzero.expect index 9090e3959742c7..cfcb1f505f8789 100644 --- a/test/onnx/expect/TestOperators.test_nonzero.expect +++ b/test/onnx/expect/TestOperators.test_nonzero.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -58,5 +58,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_norm_p1.expect b/test/onnx/expect/TestOperators.test_norm_p1.expect index df15562f6072ea..ec5e12b90a1690 100644 --- a/test/onnx/expect/TestOperators.test_norm_p1.expect +++ b/test/onnx/expect/TestOperators.test_norm_p1.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -62,5 +62,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_norm_p2.expect b/test/onnx/expect/TestOperators.test_norm_p2.expect index 1fadd7a7706fac..0388ec620821e2 100644 --- a/test/onnx/expect/TestOperators.test_norm_p2.expect +++ b/test/onnx/expect/TestOperators.test_norm_p2.expect @@ -1,4 +1,4 
@@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -62,5 +62,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_ones_like.expect b/test/onnx/expect/TestOperators.test_ones_like.expect index 30e234cc935c26..fafec789b1741c 100644 --- a/test/onnx/expect/TestOperators.test_ones_like.expect +++ b/test/onnx/expect/TestOperators.test_ones_like.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -36,5 +36,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_pad.expect b/test/onnx/expect/TestOperators.test_pad.expect index ab8125247058ca..0a25fb0eaf8751 100644 --- a/test/onnx/expect/TestOperators.test_pad.expect +++ b/test/onnx/expect/TestOperators.test_pad.expect @@ -1,31 +1,190 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { + node { + input: "onnx::ConstantOfShape_27" + output: "onnx::Concat_10" + name: "ConstantOfShape_0" + op_type: "ConstantOfShape" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Concat_28" + input: "onnx::Concat_10" + output: "onnx::Reshape_11" + name: "Concat_1" + op_type: "Concat" + attribute { + name: "axis" + i: 0 + type: INT + } + } + node { + output: "onnx::Reshape_12" + name: "Constant_2" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 2 + data_type: 7 + raw_data: "\377\377\377\377\377\377\377\377\002\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Reshape_11" + input: "onnx::Reshape_12" + output: "onnx::Slice_13" + name: "Reshape_3" + op_type: "Reshape" + } + node { + output: "onnx::Slice_14" + name: "Constant_4" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_15" + name: "Constant_5" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\377\377\377\377\377\377\377\377" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_16" + name: "Constant_6" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\200" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_17" + name: "Constant_7" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\377\377\377\377\377\377\377\377" + } + type: TENSOR + } + } + node { + input: "onnx::Slice_13" + input: "onnx::Slice_15" + input: "onnx::Slice_16" + input: "onnx::Slice_14" + input: "onnx::Slice_17" + output: "onnx::Transpose_18" + name: "Slice_8" + op_type: "Slice" + } + node { + input: "onnx::Transpose_18" + output: "onnx::Reshape_19" + name: "Transpose_9" + op_type: "Transpose" + attribute { + name: "perm" + ints: 1 + ints: 0 + type: INTS + } + } + node { + output: "onnx::Reshape_20" + name: "Constant_10" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\377\377\377\377\377\377\377\377" + } + type: TENSOR + } + } + node { + input: "onnx::Reshape_19" + input: "onnx::Reshape_20" + output: "onnx::Cast_21" + name: "Reshape_11" + op_type: "Reshape" + } + node { + input: "onnx::Cast_21" + output: "onnx::Pad_22" + name: "Cast_12" + op_type: 
"Cast" + attribute { + name: "to" + i: 7 + type: INT + } + } node { input: "input" - output: "1" - name: "Pad_0" + input: "onnx::Pad_22" + output: "23" + name: "Pad_13" op_type: "Pad" attribute { name: "mode" s: "reflect" type: STRING } - attribute { - name: "pads" - ints: 0 - ints: 0 - ints: 0 - ints: 2 - ints: 0 - ints: 0 - ints: 1 - ints: 3 - type: INTS - } } name: "torch_jit" + initializer { + dims: 1 + data_type: 7 + name: "onnx::ConstantOfShape_27" + raw_data: "\004\000\000\000\000\000\000\000" + } + initializer { + dims: 4 + data_type: 7 + name: "onnx::Concat_28" + raw_data: "\002\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000" + } input { name: "input" type { @@ -49,22 +208,22 @@ graph { } } output { - name: "1" + name: "23" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Pad23_dim_0" } dim { - dim_value: 1 + dim_param: "Pad23_dim_1" } dim { - dim_value: 3 + dim_param: "Pad23_dim_2" } dim { - dim_value: 9 + dim_param: "Pad23_dim_3" } } } @@ -72,5 +231,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_params.expect b/test/onnx/expect/TestOperators.test_params.expect index 1d1bd7d4936e13..67064d8087ae46 100644 --- a/test/onnx/expect/TestOperators.test_params.expect +++ b/test/onnx/expect/TestOperators.test_params.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -92,5 +92,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_params_onnx_irv4.expect b/test/onnx/expect/TestOperators.test_params_onnx_irv4.expect index d6ddd543f354b2..8dbc34a20640bf 100644 --- a/test/onnx/expect/TestOperators.test_params_onnx_irv4.expect +++ b/test/onnx/expect/TestOperators.test_params_onnx_irv4.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -76,5 +76,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_permute2.expect b/test/onnx/expect/TestOperators.test_permute2.expect index 42310b8337109a..7f7b6afd9d2d9e 100644 --- a/test/onnx/expect/TestOperators.test_permute2.expect +++ b/test/onnx/expect/TestOperators.test_permute2.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -77,5 +77,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_pow.expect b/test/onnx/expect/TestOperators.test_pow.expect index 56dd281d9d2a6e..f20fd955509048 100644 --- a/test/onnx/expect/TestOperators.test_pow.expect +++ b/test/onnx/expect/TestOperators.test_pow.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -78,5 +78,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_prelu.expect b/test/onnx/expect/TestOperators.test_prelu.expect index f38134f579ddcd..f2bcb50ef77720 100644 --- a/test/onnx/expect/TestOperators.test_prelu.expect +++ b/test/onnx/expect/TestOperators.test_prelu.expect @@ -1,11 +1,11 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "onnx::PRelu_0" - input: "onnx::PRelu_4" - output: "3" + input: "onnx::PRelu_5" + output: "4" name: "PRelu_0" op_type: "PRelu" } @@ -15,7 
+15,7 @@ graph { dims: 1 dims: 1 data_type: 1 - name: "onnx::PRelu_4" + name: "onnx::PRelu_5" raw_data: "\000\000\200>\000\000\200>" } input { @@ -41,7 +41,7 @@ graph { } } input { - name: "onnx::PRelu_4" + name: "onnx::PRelu_5" type { tensor_type { elem_type: 1 @@ -60,7 +60,7 @@ graph { } } output { - name: "3" + name: "4" type { tensor_type { elem_type: 1 @@ -83,5 +83,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_prod.expect b/test/onnx/expect/TestOperators.test_prod.expect index 33b1f0e44f3ec0..0cfeafa4da32c8 100644 --- a/test/onnx/expect/TestOperators.test_prod.expect +++ b/test/onnx/expect/TestOperators.test_prod.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -48,5 +48,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_prod_dtype.expect b/test/onnx/expect/TestOperators.test_prod_dtype.expect index d9359ba40686fa..26a63ac840ad2e 100644 --- a/test/onnx/expect/TestOperators.test_prod_dtype.expect +++ b/test/onnx/expect/TestOperators.test_prod_dtype.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -59,5 +59,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_rand.expect b/test/onnx/expect/TestOperators.test_rand.expect index 02e239ba584dc4..b4d2dbd6cb1909 100644 --- a/test/onnx/expect/TestOperators.test_rand.expect +++ b/test/onnx/expect/TestOperators.test_rand.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -69,5 +69,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_randn.expect b/test/onnx/expect/TestOperators.test_randn.expect index ef8c51827d893c..bc2d0b23dd7b2d 100644 --- a/test/onnx/expect/TestOperators.test_randn.expect +++ b/test/onnx/expect/TestOperators.test_randn.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -69,5 +69,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect b/test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect index 044a4b47cdb8bf..7e5fefad2eb701 100644 --- a/test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect +++ b/test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect @@ -1,17 +1,27 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::ReduceSum_0" - output: "1" - name: "ReduceSum_0" - op_type: "ReduceSum" + output: "onnx::ReduceSum_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "axes" - ints: -1 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\377\377\377\377\377\377\377\377" + } + type: TENSOR } + } + node { + input: "onnx::ReduceSum_0" + input: "onnx::ReduceSum_1" + output: "2" + name: "ReduceSum_1" + op_type: "ReduceSum" attribute { name: "keepdims" i: 0 @@ -36,7 +46,7 @@ graph { } } output { - name: "1" + name: "2" type { tensor_type { elem_type: 1 @@ -50,5 +60,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_mean.expect b/test/onnx/expect/TestOperators.test_reduced_mean.expect index 
f5da3dc6d104f8..ce69ab65a6a6d4 100644 --- a/test/onnx/expect/TestOperators.test_reduced_mean.expect +++ b/test/onnx/expect/TestOperators.test_reduced_mean.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -62,5 +62,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect b/test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect index 231d847669e6a7..71d9d296aecd05 100644 --- a/test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect +++ b/test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -73,5 +73,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect b/test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect index 3ab9b2629d3d14..98bb26aaea36b2 100644 --- a/test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect +++ b/test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -66,5 +66,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_prod.expect b/test/onnx/expect/TestOperators.test_reduced_prod.expect index 2d281995da12df..cdfbc0f5fbb69c 100644 --- a/test/onnx/expect/TestOperators.test_reduced_prod.expect +++ b/test/onnx/expect/TestOperators.test_reduced_prod.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -62,5 +62,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect b/test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect index a6bcac89d3d0a8..641d21cb9c79a5 100644 --- a/test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect +++ b/test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -73,5 +73,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect b/test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect index edfe354880d3c5..62befc2cf1cff7 100644 --- a/test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect +++ b/test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -65,5 +65,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_sum.expect b/test/onnx/expect/TestOperators.test_reduced_sum.expect index 69f8abdc48ee63..e03a204a3f9987 100644 --- a/test/onnx/expect/TestOperators.test_reduced_sum.expect +++ b/test/onnx/expect/TestOperators.test_reduced_sum.expect @@ -1,18 +1,27 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::ReduceSum_0" - output: "1" - name: "ReduceSum_0" - op_type: "ReduceSum" + output: "onnx::ReduceSum_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "axes" - ints: 1 - ints: 2 - type: INTS + name: "value" + t { + dims: 2 + 
data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + input: "onnx::ReduceSum_0" + input: "onnx::ReduceSum_1" + output: "2" + name: "ReduceSum_1" + op_type: "ReduceSum" attribute { name: "keepdims" i: 0 @@ -43,7 +52,7 @@ graph { } } output { - name: "1" + name: "2" type { tensor_type { elem_type: 1 @@ -60,5 +69,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect b/test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect index 59fdbf24d48e0e..e8ffa49295a5ca 100644 --- a/test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect +++ b/test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect @@ -1,11 +1,25 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Cast_0" output: "onnx::ReduceSum_1" - name: "Cast_0" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + input: "onnx::Cast_0" + output: "onnx::ReduceSum_2" + name: "Cast_1" op_type: "Cast" attribute { name: "to" @@ -14,15 +28,11 @@ graph { } } node { + input: "onnx::ReduceSum_2" input: "onnx::ReduceSum_1" - output: "2" - name: "ReduceSum_1" + output: "3" + name: "ReduceSum_2" op_type: "ReduceSum" - attribute { - name: "axes" - ints: 0 - type: INTS - } attribute { name: "keepdims" i: 0 @@ -53,7 +63,7 @@ graph { } } output { - name: "2" + name: "3" type { tensor_type { elem_type: 11 @@ -73,5 +83,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect b/test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect index 6c3498d8978698..7d05fdc26041c7 100644 --- a/test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect +++ b/test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect @@ -1,17 +1,27 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::ReduceSum_0" - output: "1" - name: "ReduceSum_0" - op_type: "ReduceSum" + output: "onnx::ReduceSum_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + input: "onnx::ReduceSum_0" + input: "onnx::ReduceSum_1" + output: "2" + name: "ReduceSum_1" + op_type: "ReduceSum" attribute { name: "keepdims" i: 1 @@ -42,7 +52,7 @@ graph { } } output { - name: "1" + name: "2" type { tensor_type { elem_type: 1 @@ -65,5 +75,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reducemax.expect b/test/onnx/expect/TestOperators.test_reducemax.expect index 015621e36cc3fc..bbd770761f3a09 100644 --- a/test/onnx/expect/TestOperators.test_reducemax.expect +++ b/test/onnx/expect/TestOperators.test_reducemax.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -48,5 +48,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_reducemin.expect b/test/onnx/expect/TestOperators.test_reducemin.expect index ba713c955d5397..a555fac90f0a67 100644 --- a/test/onnx/expect/TestOperators.test_reducemin.expect +++ 
b/test/onnx/expect/TestOperators.test_reducemin.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -48,5 +48,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_remainder.expect b/test/onnx/expect/TestOperators.test_remainder.expect index 75799ad14ec68f..ecf44141260e57 100644 --- a/test/onnx/expect/TestOperators.test_remainder.expect +++ b/test/onnx/expect/TestOperators.test_remainder.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -89,5 +89,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_repeat.expect b/test/onnx/expect/TestOperators.test_repeat.expect index e87fce2e4792db..5206bce0d88ff9 100644 --- a/test/onnx/expect/TestOperators.test_repeat.expect +++ b/test/onnx/expect/TestOperators.test_repeat.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -98,5 +98,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect b/test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect index fb0730a99e5534..2dbb3a436d42b5 100644 --- a/test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect +++ b/test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -92,5 +92,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_rrelu.expect b/test/onnx/expect/TestOperators.test_rrelu.expect index 959a842d29b846..3fb75ab0bb4a93 100644 --- a/test/onnx/expect/TestOperators.test_rrelu.expect +++ b/test/onnx/expect/TestOperators.test_rrelu.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -72,5 +72,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_rsqrt.expect b/test/onnx/expect/TestOperators.test_rsqrt.expect index 3f0b2f654fc2d8..32e4df543ae9b7 100644 --- a/test/onnx/expect/TestOperators.test_rsqrt.expect +++ b/test/onnx/expect/TestOperators.test_rsqrt.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -63,5 +63,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_rsub.expect b/test/onnx/expect/TestOperators.test_rsub.expect index fcc5e3e46f9293..75344bfc68deeb 100644 --- a/test/onnx/expect/TestOperators.test_rsub.expect +++ b/test/onnx/expect/TestOperators.test_rsub.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -57,5 +57,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_scatter_add.expect b/test/onnx/expect/TestOperators.test_scatter_add.expect index 7e5604971ec6c8..fd7514e306303b 100644 --- a/test/onnx/expect/TestOperators.test_scatter_add.expect +++ b/test/onnx/expect/TestOperators.test_scatter_add.expect @@ -1,9 +1,9 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - output: "onnx::Scatter_3" + output: "onnx::ScatterElements_3" name: "Constant_0" 
op_type: "Constant" attribute { @@ -18,12 +18,12 @@ graph { } } node { - input: "onnx::Scatter_3" - input: "onnx::Scatter_1" - input: "onnx::Scatter_2" + input: "onnx::ScatterElements_3" + input: "onnx::ScatterElements_1" + input: "onnx::ScatterElements_2" output: "onnx::Add_4" - name: "Scatter_1" - op_type: "Scatter" + name: "ScatterElements_1" + op_type: "ScatterElements" attribute { name: "axis" i: 1 @@ -55,7 +55,7 @@ graph { } } input { - name: "onnx::Scatter_1" + name: "onnx::ScatterElements_1" type { tensor_type { elem_type: 7 @@ -71,7 +71,7 @@ graph { } } input { - name: "onnx::Scatter_2" + name: "onnx::ScatterElements_2" type { tensor_type { elem_type: 1 @@ -104,5 +104,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_selu.expect b/test/onnx/expect/TestOperators.test_selu.expect index 9469c9432c8042..7cdc4dc8bac4e2 100644 --- a/test/onnx/expect/TestOperators.test_selu.expect +++ b/test/onnx/expect/TestOperators.test_selu.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -55,5 +55,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_shape_value_map.expect b/test/onnx/expect/TestOperators.test_shape_value_map.expect index c0044e4f4cebd8..174551f9a7c5bd 100644 --- a/test/onnx/expect/TestOperators.test_shape_value_map.expect +++ b/test/onnx/expect/TestOperators.test_shape_value_map.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -34,23 +34,33 @@ graph { } } node { - input: "onnx::Unsqueeze_3" - output: "onnx::Concat_7" - name: "Unsqueeze_3" - op_type: "Unsqueeze" + output: "onnx::Unsqueeze_7" + name: "Constant_3" + op_type: "Constant" attribute { - name: "axes" - ints: 0 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } } node { - input: "onnx::Concat_7" - input: "onnx::Concat_21" - input: "onnx::Concat_22" - input: "onnx::Concat_23" - output: "onnx::Reshape_11" - name: "Concat_4" + input: "onnx::Unsqueeze_3" + input: "onnx::Unsqueeze_7" + output: "onnx::Concat_8" + name: "Unsqueeze_4" + op_type: "Unsqueeze" + } + node { + input: "onnx::Concat_8" + input: "onnx::Concat_26" + input: "onnx::Concat_27" + input: "onnx::Concat_28" + output: "onnx::Reshape_15" + name: "Concat_5" op_type: "Concat" attribute { name: "axis" @@ -60,66 +70,62 @@ graph { } node { input: "x" - input: "onnx::Reshape_11" - output: "onnx::Transpose_12" - name: "Reshape_5" + input: "onnx::Reshape_15" + output: "onnx::Transpose_16" + name: "Reshape_6" op_type: "Reshape" } node { - input: "onnx::Transpose_12" - output: "onnx::Softmax_13" - name: "Transpose_6" + input: "onnx::Transpose_16" + output: "x.1" + name: "Transpose_7" op_type: "Transpose" attribute { name: "perm" ints: 0 - ints: 3 - ints: 1 ints: 2 + ints: 1 + ints: 3 type: INTS } } node { - input: "onnx::Softmax_13" - output: "onnx::Transpose_14" - name: "Softmax_7" + input: "x.1" + output: "onnx::Reshape_18" + name: "Softmax_8" op_type: "Softmax" attribute { name: "axis" - i: 3 + i: 1 type: INT } } node { - input: "onnx::Transpose_14" - output: "onnx::Reshape_15" - name: "Transpose_8" - op_type: "Transpose" + output: "onnx::Unsqueeze_20" + name: "Constant_9" + op_type: "Constant" attribute { - name: "perm" - ints: 0 - ints: 3 - ints: 2 - ints: 1 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: 
"\000\000\000\000\000\000\000\000" + } + type: TENSOR } } node { input: "onnx::Unsqueeze_3" - output: "onnx::Concat_17" - name: "Unsqueeze_9" + input: "onnx::Unsqueeze_20" + output: "onnx::Concat_21" + name: "Unsqueeze_10" op_type: "Unsqueeze" - attribute { - name: "axes" - ints: 0 - type: INTS - } } node { - input: "onnx::Concat_17" - input: "onnx::Concat_24" - output: "onnx::Reshape_19" - name: "Concat_10" + input: "onnx::Concat_21" + input: "onnx::Concat_29" + output: "onnx::Reshape_24" + name: "Concat_11" op_type: "Concat" attribute { name: "axis" @@ -128,35 +134,35 @@ graph { } } node { - input: "onnx::Reshape_15" - input: "onnx::Reshape_19" - output: "20" - name: "Reshape_11" + input: "onnx::Reshape_18" + input: "onnx::Reshape_24" + output: "25" + name: "Reshape_12" op_type: "Reshape" } name: "torch_jit" initializer { dims: 1 data_type: 7 - name: "onnx::Concat_21" + name: "onnx::Concat_26" raw_data: "\001\000\000\000\000\000\000\000" } initializer { dims: 1 data_type: 7 - name: "onnx::Concat_22" + name: "onnx::Concat_27" raw_data: "\002\000\000\000\000\000\000\000" } initializer { dims: 1 data_type: 7 - name: "onnx::Concat_23" + name: "onnx::Concat_28" raw_data: "\377\377\377\377\377\377\377\377" } initializer { dims: 1 data_type: 7 - name: "onnx::Concat_24" + name: "onnx::Concat_29" raw_data: "\377\377\377\377\377\377\377\377" } input { @@ -182,7 +188,7 @@ graph { } } output { - name: "20" + name: "25" type { tensor_type { elem_type: 1 @@ -199,5 +205,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_sign.expect b/test/onnx/expect/TestOperators.test_sign.expect index 0cf0a0fa4417d0..6cb9200dc07357 100644 --- a/test/onnx/expect/TestOperators.test_sign.expect +++ b/test/onnx/expect/TestOperators.test_sign.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_sin.expect b/test/onnx/expect/TestOperators.test_sin.expect index 2e5710f70d4300..4ca6284c48d90c 100644 --- a/test/onnx/expect/TestOperators.test_sin.expect +++ b/test/onnx/expect/TestOperators.test_sin.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_slice.expect b/test/onnx/expect/TestOperators.test_slice.expect index 755625522ace89..15aa37bc2f7eb5 100644 --- a/test/onnx/expect/TestOperators.test_slice.expect +++ b/test/onnx/expect/TestOperators.test_slice.expect @@ -1,28 +1,73 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Slice_0" - output: "1" - name: "Slice_0" - op_type: "Slice" + output: "onnx::Slice_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "axes" - ints: 1 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_2" + name: "Constant_1" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_3" + name: "Constant_2" + op_type: "Constant" attribute { - name: "ends" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: 
"\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_4" + name: "Constant_3" + op_type: "Constant" attribute { - name: "starts" - ints: 1 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\001\000\000\000\000\000\000\000" + } + type: TENSOR } } + node { + input: "onnx::Slice_0" + input: "onnx::Slice_2" + input: "onnx::Slice_3" + input: "onnx::Slice_1" + input: "onnx::Slice_4" + output: "5" + name: "Slice_4" + op_type: "Slice" + } name: "torch_jit" input { name: "onnx::Slice_0" @@ -41,7 +86,7 @@ graph { } } output { - name: "1" + name: "5" type { tensor_type { elem_type: 1 @@ -58,5 +103,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_split.expect b/test/onnx/expect/TestOperators.test_split.expect index bd11058b1a5e13..e1616e4a52cdf5 100644 --- a/test/onnx/expect/TestOperators.test_split.expect +++ b/test/onnx/expect/TestOperators.test_split.expect @@ -1,26 +1,34 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { + node { + output: "onnx::Split_1" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 3 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000" + } + type: TENSOR + } + } node { input: "tensor" - output: "1" + input: "onnx::Split_1" output: "2" output: "3" - name: "Split_0" + output: "4" + name: "Split_1" op_type: "Split" attribute { name: "axis" i: 1 type: INT } - attribute { - name: "split" - ints: 2 - ints: 2 - ints: 2 - type: INTS - } } name: "torch_jit" input { @@ -40,7 +48,7 @@ graph { } } output { - name: "1" + name: "2" type { tensor_type { elem_type: 1 @@ -56,7 +64,7 @@ graph { } } output { - name: "2" + name: "3" type { tensor_type { elem_type: 1 @@ -72,7 +80,7 @@ graph { } } output { - name: "3" + name: "4" type { tensor_type { elem_type: 1 @@ -89,5 +97,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_split_with_sizes.expect b/test/onnx/expect/TestOperators.test_split_with_sizes.expect index 359135cdb01a78..964ba363a56e38 100644 --- a/test/onnx/expect/TestOperators.test_split_with_sizes.expect +++ b/test/onnx/expect/TestOperators.test_split_with_sizes.expect @@ -1,26 +1,34 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { + node { + output: "onnx::Split_1" + name: "Constant_0" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 3 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000" + } + type: TENSOR + } + } node { input: "tensor" - output: "1" + input: "onnx::Split_1" output: "2" output: "3" - name: "Split_0" + output: "4" + name: "Split_1" op_type: "Split" attribute { name: "axis" i: 1 type: INT } - attribute { - name: "split" - ints: 2 - ints: 1 - ints: 3 - type: INTS - } } name: "torch_jit" input { @@ -40,7 +48,7 @@ graph { } } output { - name: "1" + name: "2" type { tensor_type { elem_type: 1 @@ -56,7 +64,7 @@ graph { } } output { - name: "2" + name: "3" type { tensor_type { elem_type: 1 @@ -72,7 +80,7 @@ graph { } } output { - name: "3" + name: "4" type { tensor_type { elem_type: 1 @@ -89,5 +97,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_sqrt.expect b/test/onnx/expect/TestOperators.test_sqrt.expect index 
67e86c2836dd8b..91fc7bac0b7755 100644 --- a/test/onnx/expect/TestOperators.test_sqrt.expect +++ b/test/onnx/expect/TestOperators.test_sqrt.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_std.expect b/test/onnx/expect/TestOperators.test_std.expect index 957ac1937fb229..69df37b90452a5 100644 --- a/test/onnx/expect/TestOperators.test_std.expect +++ b/test/onnx/expect/TestOperators.test_std.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -185,5 +185,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_sum.expect b/test/onnx/expect/TestOperators.test_sum.expect index d923dab06db294..6722064ace203e 100644 --- a/test/onnx/expect/TestOperators.test_sum.expect +++ b/test/onnx/expect/TestOperators.test_sum.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -48,5 +48,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_sum_dtype.expect b/test/onnx/expect/TestOperators.test_sum_dtype.expect index 3457c4d7e88bb4..2b5f417b0eee71 100644 --- a/test/onnx/expect/TestOperators.test_sum_dtype.expect +++ b/test/onnx/expect/TestOperators.test_sum_dtype.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -59,5 +59,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_tan.expect b/test/onnx/expect/TestOperators.test_tan.expect index 1ff7b8ee19a030..84bc3e9420df1e 100644 --- a/test/onnx/expect/TestOperators.test_tan.expect +++ b/test/onnx/expect/TestOperators.test_tan.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_transpose.expect b/test/onnx/expect/TestOperators.test_transpose.expect index 41227d0b934a68..f1350a1b262334 100644 --- a/test/onnx/expect/TestOperators.test_transpose.expect +++ b/test/onnx/expect/TestOperators.test_transpose.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -43,5 +43,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_type_as.expect b/test/onnx/expect/TestOperators.test_type_as.expect index 2af30c6ebc31a4..31803483edbd72 100644 --- a/test/onnx/expect/TestOperators.test_type_as.expect +++ b/test/onnx/expect/TestOperators.test_type_as.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -37,5 +37,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_unfold.expect b/test/onnx/expect/TestOperators.test_unfold.expect index 58675ad825e7c5..9b5e20281d2015 100644 --- a/test/onnx/expect/TestOperators.test_unfold.expect +++ b/test/onnx/expect/TestOperators.test_unfold.expect @@ -1,76 +1,156 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Slice_0" - output: "onnx::Unsqueeze_1" - name: 
"Slice_0" - op_type: "Slice" + output: "onnx::Slice_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_2" + name: "Constant_1" + op_type: "Constant" attribute { - name: "ends" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_3" + name: "Constant_2" + op_type: "Constant" attribute { - name: "starts" - ints: 0 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } } node { input: "onnx::Slice_0" - output: "onnx::Unsqueeze_2" - name: "Slice_1" + input: "onnx::Slice_2" + input: "onnx::Slice_3" + input: "onnx::Slice_1" + output: "onnx::Unsqueeze_4" + name: "Slice_3" op_type: "Slice" + } + node { + output: "onnx::Slice_5" + name: "Constant_4" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_6" + name: "Constant_5" + op_type: "Constant" attribute { - name: "ends" - ints: 4 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } + } + node { + output: "onnx::Slice_7" + name: "Constant_6" + op_type: "Constant" attribute { - name: "starts" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\004\000\000\000\000\000\000\000" + } + type: TENSOR } } node { - input: "onnx::Unsqueeze_1" - output: "onnx::Concat_3" - name: "Unsqueeze_2" - op_type: "Unsqueeze" + input: "onnx::Slice_0" + input: "onnx::Slice_6" + input: "onnx::Slice_7" + input: "onnx::Slice_5" + output: "onnx::Unsqueeze_8" + name: "Slice_7" + op_type: "Slice" + } + node { + output: "onnx::Unsqueeze_9" + name: "Constant_8" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } } node { - input: "onnx::Unsqueeze_2" - output: "onnx::Concat_4" - name: "Unsqueeze_3" + input: "onnx::Unsqueeze_4" + input: "onnx::Unsqueeze_9" + output: "onnx::Concat_10" + name: "Unsqueeze_9" op_type: "Unsqueeze" + } + node { + output: "onnx::Unsqueeze_11" + name: "Constant_10" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } } node { - input: "onnx::Concat_3" - input: "onnx::Concat_4" - output: "5" - name: "Concat_4" + input: "onnx::Unsqueeze_8" + input: "onnx::Unsqueeze_11" + output: "onnx::Concat_12" + name: "Unsqueeze_11" + op_type: "Unsqueeze" + } + node { + input: "onnx::Concat_10" + input: "onnx::Concat_12" + output: "13" + name: "Concat_12" op_type: "Concat" attribute { name: "axis" @@ -99,7 +179,7 @@ graph { } } output { - name: "5" + name: "13" type { tensor_type { elem_type: 1 @@ -122,5 +202,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_unsqueeze.expect b/test/onnx/expect/TestOperators.test_unsqueeze.expect index 215b76683f3fbb..49a61c2b845151 100644 --- a/test/onnx/expect/TestOperators.test_unsqueeze.expect +++ b/test/onnx/expect/TestOperators.test_unsqueeze.expect @@ 
-1,18 +1,28 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - input: "onnx::Unsqueeze_0" - output: "1" - name: "Unsqueeze_0" - op_type: "Unsqueeze" + output: "onnx::Unsqueeze_1" + name: "Constant_0" + op_type: "Constant" attribute { - name: "axes" - ints: 2 - type: INTS + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" + } + type: TENSOR } } + node { + input: "onnx::Unsqueeze_0" + input: "onnx::Unsqueeze_1" + output: "2" + name: "Unsqueeze_1" + op_type: "Unsqueeze" + } name: "torch_jit" input { name: "onnx::Unsqueeze_0" @@ -31,7 +41,7 @@ graph { } } output { - name: "1" + name: "2" type { tensor_type { elem_type: 1 @@ -51,5 +61,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect b/test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect index a05dc823168696..e1f31dc406a0d1 100644 --- a/test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect +++ b/test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect @@ -1,24 +1,40 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "x" - input: "onnx::Upsample_5" - output: "4" - name: "Upsample_0" - op_type: "Upsample" + input: "" + input: "onnx::Resize_6" + output: "5" + name: "Resize_0" + op_type: "Resize" + attribute { + name: "coordinate_transformation_mode" + s: "asymmetric" + type: STRING + } + attribute { + name: "cubic_coeff_a" + f: -0.75 + type: FLOAT + } attribute { name: "mode" s: "nearest" type: STRING } + attribute { + name: "nearest_mode" + s: "floor" + type: STRING + } } name: "torch_jit" initializer { dims: 4 data_type: 1 - name: "onnx::Upsample_5" + name: "onnx::Resize_6" raw_data: "\000\000\200?\000\000\200?\000\000\000@\000\000\000@" } input { @@ -44,22 +60,22 @@ graph { } } output { - name: "4" + name: "5" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Resize5_dim_0" } dim { - dim_value: 2 + dim_param: "Resize5_dim_1" } dim { - dim_value: 6 + dim_param: "Resize5_dim_2" } dim { - dim_value: 8 + dim_param: "Resize5_dim_3" } } } @@ -67,5 +83,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect b/test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect index a05dc823168696..e1f31dc406a0d1 100644 --- a/test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect +++ b/test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect @@ -1,24 +1,40 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "x" - input: "onnx::Upsample_5" - output: "4" - name: "Upsample_0" - op_type: "Upsample" + input: "" + input: "onnx::Resize_6" + output: "5" + name: "Resize_0" + op_type: "Resize" + attribute { + name: "coordinate_transformation_mode" + s: "asymmetric" + type: STRING + } + attribute { + name: "cubic_coeff_a" + f: -0.75 + type: FLOAT + } attribute { name: "mode" s: "nearest" type: STRING } + attribute { + name: "nearest_mode" + s: "floor" + type: STRING + } } name: "torch_jit" initializer { dims: 4 data_type: 1 - name: "onnx::Upsample_5" + name: "onnx::Resize_6" raw_data: "\000\000\200?\000\000\200?\000\000\000@\000\000\000@" } input { @@ -44,22 +60,22 @@ graph { } } output { - name: "4" + 
name: "5" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Resize5_dim_0" } dim { - dim_value: 2 + dim_param: "Resize5_dim_1" } dim { - dim_value: 6 + dim_param: "Resize5_dim_2" } dim { - dim_value: 8 + dim_param: "Resize5_dim_3" } } } @@ -67,5 +83,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_upsample_nearest_size.expect b/test/onnx/expect/TestOperators.test_upsample_nearest_size.expect index e597ddfa5c5d30..cbd32608d2ae0e 100644 --- a/test/onnx/expect/TestOperators.test_upsample_nearest_size.expect +++ b/test/onnx/expect/TestOperators.test_upsample_nearest_size.expect @@ -1,34 +1,112 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { - output: "onnx::Upsample_1" - name: "Constant_0" + input: "x" + output: "onnx::Slice_2" + name: "Shape_0" + op_type: "Shape" + } + node { + output: "onnx::Slice_3" + name: "Constant_1" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_4" + name: "Constant_2" + op_type: "Constant" + attribute { + name: "value" + t { + dims: 1 + data_type: 7 + raw_data: "\000\000\000\000\000\000\000\000" + } + type: TENSOR + } + } + node { + output: "onnx::Slice_5" + name: "Constant_3" op_type: "Constant" attribute { name: "value" t { - dims: 4 - data_type: 1 - raw_data: "\000\000\200?\000\000\200?\253\252\252@\000\000\200@" + dims: 1 + data_type: 7 + raw_data: "\002\000\000\000\000\000\000\000" } type: TENSOR } } + node { + input: "onnx::Slice_2" + input: "onnx::Slice_4" + input: "onnx::Slice_5" + input: "onnx::Slice_3" + output: "onnx::Concat_6" + name: "Slice_4" + op_type: "Slice" + } + node { + input: "onnx::Concat_6" + input: "onnx::Concat_12" + output: "onnx::Resize_8" + name: "Concat_5" + op_type: "Concat" + attribute { + name: "axis" + i: 0 + type: INT + } + } node { input: "x" - input: "onnx::Upsample_1" - output: "2" - name: "Upsample_1" - op_type: "Upsample" + input: "" + input: "" + input: "onnx::Resize_8" + output: "11" + name: "Resize_6" + op_type: "Resize" + attribute { + name: "coordinate_transformation_mode" + s: "asymmetric" + type: STRING + } + attribute { + name: "cubic_coeff_a" + f: -0.75 + type: FLOAT + } attribute { name: "mode" s: "nearest" type: STRING } + attribute { + name: "nearest_mode" + s: "floor" + type: STRING + } } name: "torch_jit" + initializer { + dims: 2 + data_type: 7 + name: "onnx::Concat_12" + raw_data: "\020\000\000\000\000\000\000\000\020\000\000\000\000\000\000\000" + } input { name: "x" type { @@ -52,22 +130,22 @@ graph { } } output { - name: "2" + name: "11" type { tensor_type { elem_type: 1 shape { dim { - dim_value: 1 + dim_param: "Resize11_dim_0" } dim { - dim_value: 2 + dim_param: "Resize11_dim_1" } dim { - dim_value: 16 + dim_param: "Resize11_dim_2" } dim { - dim_value: 16 + dim_param: "Resize11_dim_3" } } } @@ -75,5 +153,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_view.expect b/test/onnx/expect/TestOperators.test_view.expect index cb79b41812229f..0976258229695a 100644 --- a/test/onnx/expect/TestOperators.test_view.expect +++ b/test/onnx/expect/TestOperators.test_view.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -55,5 +55,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git 
a/test/onnx/expect/TestOperators.test_view_flatten.expect b/test/onnx/expect/TestOperators.test_view_flatten.expect index ae9d957dd9fd8a..ac814160d5bd1a 100644 --- a/test/onnx/expect/TestOperators.test_view_flatten.expect +++ b/test/onnx/expect/TestOperators.test_view_flatten.expect @@ -1,11 +1,11 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { node { input: "onnx::Reshape_0" - input: "onnx::Reshape_9" - output: "6" + input: "onnx::Reshape_11" + output: "8" name: "Reshape_0" op_type: "Reshape" } @@ -13,7 +13,7 @@ graph { initializer { dims: 2 data_type: 7 - name: "onnx::Reshape_9" + name: "onnx::Reshape_11" raw_data: "\001\000\000\000\000\000\000\000\030\000\000\000\000\000\000\000" } input { @@ -39,7 +39,7 @@ graph { } } output { - name: "6" + name: "8" type { tensor_type { elem_type: 1 @@ -56,5 +56,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/expect/TestOperators.test_zeros_like.expect b/test/onnx/expect/TestOperators.test_zeros_like.expect index 1293acb1e16fba..e4f6c6ede2cab1 100644 --- a/test/onnx/expect/TestOperators.test_zeros_like.expect +++ b/test/onnx/expect/TestOperators.test_zeros_like.expect @@ -1,4 +1,4 @@ -ir_version: 4 +ir_version: 7 producer_name: "pytorch" producer_version: "CURRENT_VERSION" graph { @@ -36,5 +36,5 @@ graph { } } opset_import { - version: 9 + version: 13 } diff --git a/test/onnx/test_models.py b/test/onnx/test_models.py index 5d22c255f8320f..1f2e0258edc1e4 100644 --- a/test/onnx/test_models.py +++ b/test/onnx/test_models.py @@ -46,15 +46,15 @@ def toC(x): class TestModels(TestCase): + opset_version = 9 # Caffe2 doesn't support the default. keep_initializers_as_inputs = False - from torch.onnx.symbolic_helper import _export_onnx_opset_version - opset_version = _export_onnx_opset_version def exportTest(self, model, inputs, rtol=1e-2, atol=1e-7): with torch.onnx.select_model_mode_for_export(model, None): graph = torch.onnx.utils._trace(model, inputs, OperatorExportTypes.ONNX) torch._C._jit_pass_lint(graph) - verify(model, inputs, backend, rtol=rtol, atol=atol) + verify(model, inputs, backend, rtol=rtol, atol=atol, + opset_version=self.opset_version) def test_ops(self): x = Variable( @@ -245,12 +245,12 @@ def test_shufflenet(self): @skipIfUnsupportedMinOpsetVersion(11) def test_fcn(self): x = Variable(torch.randn(BATCH_SIZE, 3, 224, 224).fill_(1.0)) - self.exportTest(toC(fcn_resnet101()), toC(x), rtol=1e-3, atol=1e-5) + self.exportTest(toC(fcn_resnet101(pretrained=False, pretrained_backbone=False)), toC(x), rtol=1e-3, atol=1e-5) @skipIfUnsupportedMinOpsetVersion(11) def test_deeplab(self): x = Variable(torch.randn(BATCH_SIZE, 3, 224, 224).fill_(1.0)) - self.exportTest(toC(deeplabv3_resnet101()), toC(x), rtol=1e-3, atol=1e-5) + self.exportTest(toC(deeplabv3_resnet101(pretrained=False, pretrained_backbone=False)), toC(x), rtol=1e-3, atol=1e-5) def test_r3d_18_video(self): x = Variable(torch.randn(1, 3, 4, 112, 112).fill_(1.0)) diff --git a/test/onnx/test_pytorch_common.py b/test/onnx/test_pytorch_common.py index 13b4585a5def84..35a408eca244d5 100644 --- a/test/onnx/test_pytorch_common.py +++ b/test/onnx/test_pytorch_common.py @@ -50,17 +50,17 @@ def skipIfUnsupportedMinOpsetVersion(min_opset_version): def skip_dec(func): def wrapper(self): if self.opset_version < min_opset_version: - raise unittest.SkipTest("Skip verify test for unsupported opset_version") + raise unittest.SkipTest(f"Unsupported opset_version: {self.opset_version} < {min_opset_version}") return func(self) 
return wrapper return skip_dec -# skips tests for all versions above min_opset_version. -def skipIfUnsupportedMaxOpsetVersion(min_opset_version): +# skips tests for all versions above max_opset_version. +def skipIfUnsupportedMaxOpsetVersion(max_opset_version): def skip_dec(func): def wrapper(self): - if self.opset_version > min_opset_version: - raise unittest.SkipTest("Skip verify test for unsupported opset_version") + if self.opset_version > max_opset_version: + raise unittest.SkipTest(f"Unsupported opset_version: {self.opset_version} > {max_opset_version}") return func(self) return wrapper return skip_dec @@ -107,14 +107,5 @@ def wrapper(self): return wrapper return skip_dec -def skipIfONNXShapeInference(onnx_shape_inference): - def skip_dec(func): - def wrapper(self): - if self.onnx_shape_inference is onnx_shape_inference: - raise unittest.SkipTest("Skip verify test for unsupported opset_version") - return func(self) - return wrapper - return skip_dec - def flatten(x): return tuple(function._iter_filter(lambda o: isinstance(o, torch.Tensor))(x)) diff --git a/test/onnx/test_pytorch_onnx_caffe2.py b/test/onnx/test_pytorch_onnx_caffe2.py index 72ff9392254525..31c2287893a20d 100644 --- a/test/onnx/test_pytorch_onnx_caffe2.py +++ b/test/onnx/test_pytorch_onnx_caffe2.py @@ -117,8 +117,7 @@ def do_export(model, inputs, *args, **kwargs): class TestCaffe2Backend_opset9(unittest.TestCase): - from torch.onnx.symbolic_helper import _export_onnx_opset_version - opset_version = _export_onnx_opset_version + opset_version = 9 embed_params = False def setUp(self): diff --git a/test/onnx/test_pytorch_onnx_caffe2_quantized.py b/test/onnx/test_pytorch_onnx_caffe2_quantized.py index b427b85a2b56f6..bb84ab698a9dbd 100644 --- a/test/onnx/test_pytorch_onnx_caffe2_quantized.py +++ b/test/onnx/test_pytorch_onnx_caffe2_quantized.py @@ -31,7 +31,9 @@ def generic_test(self, model, sample_inputs, input_names=None, decimal=3, relaxe f = io.BytesIO() torch.onnx.export(q_model, pt_inputs, f, input_names=input_names, - operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK) + operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK, + # Caffe2 doesn't support newer opset versions + opset_version=9) f.seek(0) onnx_model = onnx.load(f) caffe_res = c2.run_model(onnx_model, dict(zip(input_names, sample_inputs)))[0] @@ -94,7 +96,9 @@ def export_to_onnx(self, model, input, input_names): model = torch.jit.load(buf) f = io.BytesIO() torch.onnx.export(model, input, f, input_names=input_names, - operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK) + operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK, + # Caffe2 doesn't support newer opset versions + opset_version=9) f.seek(0) onnx_model = onnx.load(f) diff --git a/test/onnx/test_pytorch_onnx_onnxruntime.py b/test/onnx/test_pytorch_onnx_onnxruntime.py index bdda7fb2ebc778..59ac7bb0e3a3dd 100644 --- a/test/onnx/test_pytorch_onnx_onnxruntime.py +++ b/test/onnx/test_pytorch_onnx_onnxruntime.py @@ -23,8 +23,7 @@ RnnModelWithPackedSequenceWithState, RnnModelWithPackedSequenceWithoutState) from test_pytorch_common import (skipIfUnsupportedMinOpsetVersion, skipIfUnsupportedOpsetVersion, - skipIfNoLapack, disableScriptTest, skipIfONNXShapeInference, - skipIfUnsupportedMaxOpsetVersion, skipForAllOpsetVersions) + skipIfNoLapack, disableScriptTest, skipIfUnsupportedMaxOpsetVersion) from test_pytorch_common import BATCH_SIZE from test_pytorch_common import RNN_BATCH_SIZE, RNN_SEQUENCE_LENGTH, RNN_INPUT_SIZE, RNN_HIDDEN_SIZE 
from typing import List, Tuple, Optional, Dict @@ -82,9 +81,7 @@ def to_numpy(elem): def convert_to_onnx(model, input=None, opset_version=9, do_constant_folding=True, keep_initializers_as_inputs=True, dynamic_axes=None, input_names=None, output_names=None, - fixed_batch_size=False, training=None, - onnx_shape_inference=True): - # export the model to ONNX + fixed_batch_size=False, training=None): f = io.BytesIO() input_copy = copy.deepcopy(input) torch.onnx._export(model, input_copy, f, @@ -93,8 +90,7 @@ def convert_to_onnx(model, input=None, opset_version=9, do_constant_folding=True keep_initializers_as_inputs=keep_initializers_as_inputs, dynamic_axes=dynamic_axes, input_names=input_names, output_names=output_names, - fixed_batch_size=fixed_batch_size, training=training, - onnx_shape_inference=onnx_shape_inference) + fixed_batch_size=fixed_batch_size, training=training) # compute onnxruntime output prediction so = onnxruntime.SessionOptions() @@ -177,8 +173,7 @@ def run_model_test(self, model, batch_size=2, state_dict=None, do_constant_folding=do_constant_folding, keep_initializers_as_inputs=self.keep_initializers_as_inputs, dynamic_axes=dynamic_axes, input_names=input_names, - output_names=output_names, fixed_batch_size=fixed_batch_size, training=training, - onnx_shape_inference=self.onnx_shape_inference) + output_names=output_names, fixed_batch_size=fixed_batch_size, training=training) # compute onnxruntime output prediction if remained_onnx_input_idx is not None: input_onnx = [] @@ -289,11 +284,15 @@ def set_rng_seed(seed): random.seed(seed) np.random.seed(seed) -class TestONNXRuntime(unittest.TestCase): - from torch.onnx.symbolic_helper import _export_onnx_opset_version - opset_version = _export_onnx_opset_version +class _TestONNXRuntime: + """Abstract base class for test cases. + + Intentionally not a sub-class of unittest.TestCase so that unittest / pytest + don't run it directly. unittest.TestCase is mixed in as another base class when + creating concrete sub-types. See MakeTestCase(). + """ + opset_version = -1 # Sub-classes must override keep_initializers_as_inputs = True # For IR version 3 type export.
- onnx_shape_inference = True def setUp(self): torch.manual_seed(0) @@ -617,8 +616,8 @@ def get_test_images(self) -> Tuple[List[torch.Tensor], List[torch.Tensor]]: @skipIfUnsupportedMinOpsetVersion(11) @disableScriptTest() # Faster RCNN model is not scriptable def test_faster_rcnn(self): - model = torchvision.models.detection.faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=False, min_size=200, - max_size=300) + model = torchvision.models.detection.faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=True, + min_size=200, max_size=300) model.eval() x1 = torch.randn(3, 200, 300, requires_grad=True) x2 = torch.randn(3, 200, 300, requires_grad=True) @@ -664,8 +663,8 @@ def test_paste_mask_in_image(self): @skipIfUnsupportedMinOpsetVersion(11) @disableScriptTest() def test_mask_rcnn(self): - model = torchvision.models.detection.mask_rcnn.maskrcnn_resnet50_fpn(pretrained=False, min_size=200, - max_size=300) + model = torchvision.models.detection.mask_rcnn.maskrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=True, + min_size=200, max_size=300) images, test_images = self.get_test_images() self.run_test(model, (images,), rtol=1e-3, atol=1e-5) self.run_test(model, (images,), input_names=["images_tensors"], output_names=["boxes", "labels", "scores", "masks"], @@ -705,8 +704,8 @@ def test_heatmaps_to_keypoints(self): @skipIfUnsupportedMinOpsetVersion(11) @disableScriptTest() def test_keypoint_rcnn(self): - model = torchvision.models.detection.keypoint_rcnn.keypointrcnn_resnet50_fpn(pretrained=False, min_size=200, - max_size=300) + model = torchvision.models.detection.keypoint_rcnn.keypointrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, + min_size=200, max_size=300) images, test_images = self.get_test_images() self.run_test(model, (images,), rtol=1e-3, atol=1e-5) self.run_test(model, (images,), input_names=["images_tensors"], @@ -1436,7 +1435,6 @@ def forward(self, input1, input2, input3): # Conversion of Transpose depends on input shape to be known. # The following test only works when onnx shape inference is enabled. - @skipIfONNXShapeInference(False) def test_transpose_infer_shape(self): class TransposeModule(torch.jit.ScriptModule): def __init__(self): @@ -1664,7 +1662,6 @@ def forward(self, x): # Operator rank mismatch between outputs of two branches for opsets below 11. @skipIfUnsupportedMinOpsetVersion(11) - @skipIfONNXShapeInference(False) def test_floating_point_infer_dtype(self): class FloatingPoint(torch.jit.ScriptModule): @torch.jit.script_method @@ -1775,7 +1772,6 @@ def forward(self, x): x = torch.randn(2, 3, 4) self.run_test(ArithmeticModule(), x, remained_onnx_input_idx=[]) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_arithmetic_prim_bool(self): class ArithmeticModule(torch.nn.Module): def forward(self, x, y: int, z: bool, t: float): @@ -1812,7 +1808,6 @@ def forward(self, x): # In scripting the first transpose node do not carry shape and dtype info. # The following test only works when onnx shape inference is enabled. - @skipIfONNXShapeInference(False) def test_arithmetic_infer_dtype(self): class ArithmeticModule(torch.jit.ScriptModule): @torch.jit.script_method @@ -1895,7 +1890,6 @@ def forward(self, x, y): # In scripting x, y do not carry shape and dtype info. # The following test only works when onnx shape inference is enabled. 
- @skipIfONNXShapeInference(False) def test_div_promotion_script(self): class DivModule(torch.nn.Module): def forward(self, x, y): @@ -2689,8 +2683,7 @@ def forward(self, x): x = torch.empty(2, 3, 3, dtype=torch.double).uniform_(0, 1) self.run_test(Bernoulli(), x) - # Enable test when fix for allowzero is in ORT - @skipForAllOpsetVersions() + @unittest.skip("Bug in ORT, skip test until rel-1.11.") @skipIfUnsupportedMinOpsetVersion(14) def test_reshape_allowzero(self): class ReshapeModel(torch.nn.Module): @@ -2858,7 +2851,6 @@ def test_interpolate_adaptive_pooling_error(self): with self.assertRaises(RuntimeError) as cm: self._interpolate(x, "area", False, True) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_groupnorm(self): model = torch.nn.GroupNorm(3, 6, 0.002) x = torch.randn(4, 6, 180, 180, 180) @@ -2872,7 +2864,6 @@ def test_groupnorm(self): x = torch.randn(4, 6, 180, 180) self.run_test(model, x) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_groupnorm_noaffine(self): model = torch.nn.GroupNorm(4, 8, 0.002, affine=False) x = torch.randn(3, 8, 224, 224) @@ -3474,6 +3465,16 @@ def forward(self, x): x = torch.arange(1., 6., requires_grad=True) self.run_test(MyModule(), x) + @skipIfUnsupportedMinOpsetVersion(10) + def test_topk_int32_k(self): + class Model(torch.nn.Module): + def forward(self, x, k): + return torch.topk(x, k) + + x = torch.arange(1., 6.) + k = torch.tensor(3, dtype=torch.int32) + self.run_test(Model(), (x, k)) + @skipIfUnsupportedMinOpsetVersion(11) def test_topk_smallest_unsorted(self): class MyModule(torch.nn.Module): @@ -3570,7 +3571,6 @@ def test_batchnorm1d_noaffine(self): x = torch.randn(10, 10, 128) self.run_test(model, x) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_batchnorm1d_norunningstats(self): x = torch.randn(10, 10) model = torch.nn.BatchNorm1d(10, track_running_stats=False) @@ -3589,7 +3589,6 @@ def test_batchnorm2d_noaffine(self): model = torch.nn.BatchNorm2d(3, affine=False) self.run_test(model, x) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_batchnorm2d_norunningstats(self): x = torch.randn(10, 3, 128, 128) model = torch.nn.BatchNorm2d(3, track_running_stats=False) @@ -3614,7 +3613,6 @@ def test_instancenorm1d_runningstats(self): model = torch.nn.InstanceNorm1d(5, affine=False, track_running_stats=True) self.run_test(model, x) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_instancenorm1d_norunningstats(self): x = torch.randn(10, 5, 128) model = torch.nn.InstanceNorm1d(5, affine=True, track_running_stats=False) @@ -3632,7 +3630,6 @@ def test_instancenorm2d_runningstats(self): model = torch.nn.InstanceNorm2d(3, affine=False, track_running_stats=True) self.run_test(model, x) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_instancenorm2d_norunningstats(self): x = torch.randn(10, 3, 128, 128) model = torch.nn.InstanceNorm2d(3, affine=True, track_running_stats=False) @@ -3650,7 +3647,6 @@ def test_instancenorm3d_runningstats(self): model = torch.nn.InstanceNorm3d(3, affine=False, track_running_stats=True) self.run_test(model, x) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_instancenorm3d_norunningstats(self): x = torch.randn(10, 3, 128, 128, 128) model = 
torch.nn.InstanceNorm3d(3, affine=True, track_running_stats=False) @@ -3736,6 +3732,17 @@ def forward(self, src, index): index = torch.tensor([[0, 1], [0, 1], [0, 1]], dtype=torch.int64) self.run_test(ScatterModel(), (src, index)) + @skipIfUnsupportedMinOpsetVersion(9) + def test_bucketize(self): + class BucketModel(torch.nn.Module): + def forward(self, input, boundaries): + return torch.bucketize(input, boundaries), \ + torch.bucketize(input, boundaries, right=True) + + input = torch.tensor([[2, 5, 10], [6, 8, 3]]) + boundaries = torch.tensor([1, 5, 7, 8, 10]) + self.run_test(BucketModel(), (input, boundaries)) + @skipIfUnsupportedMinOpsetVersion(9) def test_one_hot(self): class OneHot(torch.nn.Module): @@ -4327,6 +4334,15 @@ def forward(self, input, other): y = torch.randn(4, 1, requires_grad=True) self.run_test(model, (x, y)) + def test_amax_amin(self): + class Model(torch.nn.Module): + def forward(self, x): + return torch.amax(x, dim=0, keepdim=True), torch.amin(x, dim=[0, 1], keepdim=False) + + model = Model() + x = torch.randn(4, 4) + self.run_test(model, x) + @skipIfUnsupportedMinOpsetVersion(9) def test_arange_end(self): class ArangeScript(torch.jit.ScriptModule): @@ -4702,6 +4718,15 @@ def forward(self, x): x = torch.tensor([[1, 2], [3, 4]]) self.run_test(RepeatsDimsModel2(), (x,)) + @skipIfUnsupportedMinOpsetVersion(9) + def test_repeat_interleave_noop(self): + class Model(torch.nn.Module): + def forward(self, x): + return x.repeat_interleave(1, dim=1) + + x = torch.randn(4, 1, 8) + self.run_test(Model(), (x,)) + @skipIfUnsupportedMinOpsetVersion(13) def test_dynamic_repeat_interleave(self): class SingleDynamicModel(torch.nn.Module): @@ -4894,6 +4919,9 @@ def forward(self, input): x = torch.randint(10, (1, 2, 3, 4)) self.run_test(FlattenModel(), x) + x = torch.randn(4) + self.run_test(FlattenModel(), x) + def test_flatten2d(self): class FlattenModel(torch.nn.Module): def forward(self, input): @@ -5277,7 +5305,6 @@ def forward(self, x): inputs = torch.randn(16) self.run_test(model, inputs) - @skipIfONNXShapeInference(False) @skipIfUnsupportedMinOpsetVersion(11) def test_loop_transpose(self): class LoopModel(torch.nn.Module): @@ -5619,19 +5646,25 @@ def forward(self, x): self.run_test(OnesModel(), x, input_names=["x"], dynamic_axes={"x": [0, 1, 2]}) self.run_test(OnesModel(), x, remained_onnx_input_idx=[]) - @skipIfONNXShapeInference(True) + @skipIfUnsupportedMinOpsetVersion(9) + @disableScriptTest() # torch.zeros/torch.ones with size tensor of dim != 0 not scriptable. 
+ def test_zeros_ones_with_tensor_input(self): + class ZeroAndOnes(torch.nn.Module): + def forward(self, x): + return torch.zeros(x, 1), torch.ones(x, 1) + + x = torch.tensor([2]) + self.run_test(ZeroAndOnes(), (x, )) + @skipIfUnsupportedMinOpsetVersion(9) def test_tolist(self): class List(torch.jit.ScriptModule): @torch.jit.script_method def forward(self, input): - cur_shape = torch._shape_as_tensor(input) - final_shape: List[int] = cur_shape.tolist() - pad_tensor = torch.zeros([1, 2] + final_shape) - return pad_tensor + res: List[int] = input.tolist() + return res - x = torch.randn(2, 3) - self.run_test(List(), (x,)) + self.run_test(List(), (torch.randint(100, (1,)),)) @skipIfUnsupportedMinOpsetVersion(9) def test_list_pass(self): @@ -6124,7 +6157,6 @@ def forward(self, x): input_names=["x"], test_with_inputs=[y]) - @skipIfONNXShapeInference(False) def test_unfold_infer_shape(self): class UnfoldModule(torch.jit.ScriptModule): def __init__(self): @@ -6742,24 +6774,19 @@ def forward(self, x, pad: List[int]): @skipIfUnsupportedMaxOpsetVersion(10) + @disableScriptTest() # TODO: the logic in symbolic_opset9 doesn't handle script def test_unsupported_pad(self): class Pad(torch.nn.Module): - def forward(self, x, pad): + def forward(self, x, pad: List[int]): return torch.nn.functional.pad(x, pad) - def run(): - x = torch.randn(2, 2, 4, 4) - y = pad = (torch.tensor(2, dtype=torch.int32), torch.tensor(4, dtype=torch.int32)) - p = Pad() - f = io.BytesIO() - torch.onnx._export(p, (x, y), f) + x = torch.randn(2, 2, 4, 4) + y = [2, 4] - with self.assertRaises(RuntimeError) as cm: - run() + with self.assertRaisesRegex(RuntimeError, ("Unsupported: ONNX export of Pad.*" + + "The sizes of the padding must be constant")): + self.run_test(Pad(), (x, y)) - the_exception = cm.exception - self.assertEqual("Unsupported: ONNX export of Pad in opset 9. The sizes of the padding must be constant. 
" + - "Please try opset version 11.", the_exception.args[0]) @skipIfUnsupportedMinOpsetVersion(9) def test_if_fold(self): @@ -6880,7 +6907,6 @@ def forward(self, x, y): self.run_test(IfFoldModel(), (x, y)) @skipIfUnsupportedMinOpsetVersion(11) - @skipIfONNXShapeInference(False) def test_uninitialized(self): class UninitializedModel(torch.nn.Module): def forward(self, y): @@ -6895,7 +6921,6 @@ def forward(self, y): self.run_test(UninitializedModel(), x) @skipIfUnsupportedMinOpsetVersion(11) - @skipIfONNXShapeInference(False) def test_uninitialized_dynamic(self): class UninitializedModel(torch.nn.Module): def forward(self, y): @@ -6914,7 +6939,6 @@ def forward(self, y): # onnx::Identity of sequence supported for ONNX opset >= 14 @skipIfUnsupportedMinOpsetVersion(14) - @skipIfONNXShapeInference(False) def test_uninitialized_tensorList(self): class UninitializedTensorListModel(torch.nn.Module): def forward(self, x): @@ -6930,7 +6954,6 @@ def forward(self, x): # onnx::Identity of sequence supported for ONNX opset >= 14 @skipIfUnsupportedMinOpsetVersion(14) - @skipIfONNXShapeInference(False) def test_uninitialized_tensorList_dynamic(self): class UninitializedTensorListModel(torch.nn.Module): def forward(self, x): @@ -6947,7 +6970,6 @@ def forward(self, x): # onnx::Identity of sequence supported for ONNX opset >= 14 @skipIfUnsupportedMinOpsetVersion(14) - @skipIfONNXShapeInference(False) def test_uninitialized_intList(self): class UninitializedListModel(torch.nn.Module): def forward(self, x): @@ -6966,7 +6988,6 @@ def forward(self, x): # onnx::Identity of sequence supported for ONNX opset >= 14 @skipIfUnsupportedMinOpsetVersion(14) - @skipIfONNXShapeInference(False) def test_uninitialized_tensorList_shape(self): class UninitializedModel(torch.nn.Module): def forward(self, x): @@ -7270,6 +7291,12 @@ def forward(self, x): for x in [torch.randn(3, 4), torch.randn(3, 4).to(dtype=torch.bool)]: self.run_test(EinsumModelTranspose(), input=(x,)) + @skipIfUnsupportedMinOpsetVersion(9) + def test_cosine_similarity(self): + x = torch.randn(5, 3, 2) + y = torch.randn(5, 3, 2) + self.run_test(torch.nn.CosineSimilarity(dim=2), input=(x, y)) + @skipIfUnsupportedMinOpsetVersion(12) def test_crossentropyloss(self): for ignore_index in [-100, 1]: @@ -8135,7 +8162,6 @@ def forward(self, x: torch.Tensor): self.run_test(MyModule(), x) - @skipIfONNXShapeInference(False) @skipIfUnsupportedMinOpsetVersion(11) def test_if_transpose(self): class IfModel(torch.nn.Module): @@ -8151,7 +8177,6 @@ def forward(self, x): output_names=["output_1"], dynamic_axes={"output_1": [0, 1]}) - @skipIfONNXShapeInference(False) @skipIfUnsupportedMinOpsetVersion(13) def test_if_list(self): class IfModel(torch.nn.Module): @@ -8560,7 +8585,6 @@ def forward(self, input): x = torch.randn(6, 4, 3, 3) self.run_test(FakeQuantizePerChannelModel(), (x)) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_batchnorm_training(self): class MyModule(torch.nn.Module): def __init__(self): @@ -8585,7 +8609,6 @@ def forward(self, x): model_export.train() self.run_test(model_export, (x, ), training=torch.onnx.TrainingMode.PRESERVE, rtol=1e-3, atol=1e-5) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_batchnorm_training_mode_fix_layer(self): class MyModule(torch.nn.Module): def __init__(self): @@ -8636,7 +8659,6 @@ def forward(self, x): model_export.eval() self.run_test(model_export, (x,), training=torch.onnx.TrainingMode.PRESERVE, rtol=1e-3, atol=1e-5) - 
@skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_instancenorm_training(self): class MyModule(torch.nn.Module): def __init__(self): @@ -8661,7 +8683,6 @@ def forward(self, x): model_export.train() self.run_test(model_export, (x, ), training=torch.onnx.TrainingMode.PRESERVE, rtol=1e-3, atol=1e-5) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_instancenorm_training_mode_fix_layer(self): class MyModule(torch.nn.Module): def __init__(self): @@ -8687,7 +8708,6 @@ def forward(self, x): model_export.train() self.run_test(model_export, (x,), training=torch.onnx.TrainingMode.PRESERVE, rtol=1e-3, atol=1e-5) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_instancenorm_eval_mode_train_layer(self): class MyModule(torch.nn.Module): def __init__(self): @@ -8789,7 +8809,6 @@ def forward(self, x): np.testing.assert_allclose(ratio_pytorch, ratio_ort, rtol=0.01, atol=0.01) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_conv_bn(self): class MyModule(torch.nn.Module): def __init__(self): @@ -8807,7 +8826,6 @@ def forward(self, x): self.run_test(model_export, (x,), training=torch.onnx.TrainingMode.EVAL) self.run_test(model_export, (x,), training=torch.onnx.TrainingMode.TRAINING, rtol=1e-3, atol=1e-5) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_multiple_conv_bn(self): class MyModule(torch.nn.Module): def __init__(self): @@ -10486,7 +10504,6 @@ def symbolic_custom_invalid_add(g, input, other, alpha=None): loaded_model = onnx.load_from_string(f.getvalue()) - @skipIfUnsupportedMinOpsetVersion(9) # https://github.com/microsoft/onnxruntime/issues/9663 def test_tuple_output_from_if_with_raised_exception(self): class M(torch.nn.Module): def __init__(self): @@ -10527,10 +10544,16 @@ def forward(self, x): @skipIfUnsupportedMinOpsetVersion(10) def test_quantized_linear(self): model = torch.nn.quantized.Linear(4, 8) + # Set fixed weight to avoid flaky test. + weight = torch.quantize_per_tensor( + torch.arange(32, dtype=torch.float).view(8, 4), + 0.5, 0, torch.qint8) # Set non-zero bias. - bias = torch.arange(8).to(torch.float) - model.set_weight_bias(model.weight(), bias) + bias = torch.arange(8, dtype=torch.float) + model.set_weight_bias(weight, bias) + # Set fixed input to avoid flaky test. input = torch.randn(4, 4) + input = torch.arange(16, dtype=torch.float).view(4, 4) - 8 input_tensor = torch.quantize_per_tensor(input, 0.5, 128, torch.quint8) # Currently, we need convert the model to ScriptModule before export. # The reason is that PackedParams contains int (not tensor). 
@@ -10633,6 +10656,29 @@ def forward(self, x): x = torch.quantize_per_tensor(torch.randn(3, 4), 0.2, 0, torch.qint8) self.run_test(torch.jit.trace(Module(), x), x) + @skipIfUnsupportedMinOpsetVersion(9) + def test_convolution_allow_tf32(self): + class Module(torch.nn.Module): + def __init__(self, allow_tf32): + super().__init__() + + self.allow_tf32 = allow_tf32 + weight = torch.rand(32, 3, 3, 3) + self.weight = torch.nn.Parameter(weight) + + def forward(self, x): + if self.allow_tf32: + return torch._convolution(x, self.weight, None, [2, 2], [0, 0], [1, 1], False, [0, 0], + 1, False, False, True, True) + else: + return torch._convolution(x, self.weight, None, [2, 2], [0, 0], [1, 1], False, [0, 0], + 1, False, False, True) + + x = torch.randn(1, 3, 224, 224) + self.run_test(Module(False), x, rtol=1e-3, atol=1e-6) + self.run_test(Module(True), x, rtol=1e-3, atol=1e-6) + + def make_test(name, base, layer, bidirectional, initial_state, variable_length, dropout, script_test_min_opset_version, **extra_kwargs): @@ -10664,7 +10710,7 @@ def f(self): **extra_kwargs) f.__name__ = test_name - setattr(TestONNXRuntime, f.__name__, f) + setattr(_TestONNXRuntime, f.__name__, f) def setup_rnn_tests(): layers_opts = [ @@ -10722,7 +10768,7 @@ def setup_rnn_tests(): test_count += 1 # sanity check that a representative example does exist - TestONNXRuntime.test_gru_trilayer_forward_with_initial_state_without_sequence_lengths_with_dropout + _TestONNXRuntime.test_gru_trilayer_forward_with_initial_state_without_sequence_lengths_with_dropout # make sure no one accidentally disables all the tests without # noticing @@ -10732,82 +10778,42 @@ def setup_rnn_tests(): setup_rnn_tests() -# opset 7 tests -TestONNXRuntime_opset7 = type(str("TestONNXRuntime_opset7"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=7)) - -# opset 8 tests -TestONNXRuntime_opset8 = type(str("TestONNXRuntime_opset8"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=8)) - - -# opset 10 tests -TestONNXRuntime_opset10 = type(str("TestONNXRuntime_opset10"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=10)) - -# opset 11 tests -TestONNXRuntime_opset11 = type(str("TestONNXRuntime_opset11"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=11)) - -# opset 12 tests -TestONNXRuntime_opset12 = type(str("TestONNXRuntime_opset12"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=12)) - -# opset 9 tests, with keep_initializers_as_inputs=False for -# IR version 4 style export. -TestONNXRuntime_opset9_IRv4 = type(str("TestONNXRuntime_opset9_IRv4"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, - keep_initializers_as_inputs=False)) - - -# opset 10 tests, with keep_initializers_as_inputs=False for -# IR version 4 style export. -TestONNXRuntime_opset10_IRv4 = type(str("TestONNXRuntime_opset10_IRv4"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=10, - keep_initializers_as_inputs=False)) - - -# opset 11 tests, with keep_initializers_as_inputs=False for -# IR version 4 style export. -TestONNXRuntime_opset11_IRv4 = type(str("TestONNXRuntime_opset11_IRv4"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=11, - keep_initializers_as_inputs=False)) - -# opset 12 tests, with keep_initializers_as_inputs=False for -# IR version 4 style export. 
-TestONNXRuntime_opset12_IRv4 = type(str("TestONNXRuntime_opset12_IRv4"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=12, - keep_initializers_as_inputs=False)) - -# opset 13 tests -TestONNXRuntime_opset13 = type(str("TestONNXRuntime_opset13"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=13, - keep_initializers_as_inputs=False, - onnx_shape_inference=True)) - -# opset 14 tests -TestONNXRuntime_opset14 = type(str("TestONNXRuntime_opset14"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=14, - keep_initializers_as_inputs=False, - onnx_shape_inference=True)) - -# opset 15 tests -TestONNXRuntime_opset15 = type(str("TestONNXRuntime_opset15"), - (unittest.TestCase,), - dict(TestONNXRuntime.__dict__, opset_version=15, - keep_initializers_as_inputs=False, - onnx_shape_inference=True)) +def MakeTestCase(opset_version: int, keep_initializers_as_inputs: bool = True) -> type: + name = f"TestONNXRuntime_opset{opset_version}" + if not keep_initializers_as_inputs: + name += "_IRv4" + return type(str(name), + (unittest.TestCase,), + dict(_TestONNXRuntime.__dict__, + opset_version=opset_version, + keep_initializers_as_inputs=keep_initializers_as_inputs)) + + +TestONNXRuntime_opset7 = MakeTestCase(7) + +TestONNXRuntime_opset8 = MakeTestCase(8) + +TestONNXRuntime_opset9 = MakeTestCase(9) + +TestONNXRuntime_opset9_IRv4 = MakeTestCase(9, keep_initializers_as_inputs=False) + +TestONNXRuntime_opset10 = MakeTestCase(10) + +TestONNXRuntime_opset10_IRv4 = MakeTestCase(10, keep_initializers_as_inputs=False) + +TestONNXRuntime_opset11 = MakeTestCase(11) + +TestONNXRuntime_opset11_IRv4 = MakeTestCase(11, keep_initializers_as_inputs=False) + +TestONNXRuntime_opset12 = MakeTestCase(12) + +TestONNXRuntime_opset12_IRv4 = MakeTestCase(12, keep_initializers_as_inputs=False) + +TestONNXRuntime_opset13 = MakeTestCase(13, keep_initializers_as_inputs=False) + +TestONNXRuntime_opset14 = MakeTestCase(14, keep_initializers_as_inputs=False) + +TestONNXRuntime_opset15 = MakeTestCase(15, keep_initializers_as_inputs=False) if __name__ == "__main__": diff --git a/test/onnx/test_pytorch_onnx_onnxruntime_cuda.py b/test/onnx/test_pytorch_onnx_onnxruntime_cuda.py index 575d4caa16cebb..00a5b223bfa18e 100644 --- a/test/onnx/test_pytorch_onnx_onnxruntime_cuda.py +++ b/test/onnx/test_pytorch_onnx_onnxruntime_cuda.py @@ -99,6 +99,21 @@ def forward(self, x): x = torch.ones(3, 4, requires_grad=True, dtype=torch.float16, device=torch.device("cuda")) self.run_test(MyModule(), x, rtol=1e-3, atol=1e-5) + @skipIfNoCuda + def test_deduplicate_initializers_diff_devices(self): + class Model(torch.nn.Module): + def __init__(self): + super().__init__() + self.w = torch.nn.Parameter(torch.ones(2, 3, device=torch.device("cpu"))) + self.b = torch.nn.Parameter(torch.ones(3, device=torch.device("cuda"))) + + def forward(self, x, y): + return torch.matmul(self.w, x), y + self.b + + x = torch.randn(3, 3, device=torch.device("cpu")) + y = torch.randn(3, 3, device=torch.device("cuda")) + self.run_test(Model(), (x, y)) + TestONNXRuntime_cuda.setUp = TestONNXRuntime.setUp TestONNXRuntime_cuda.run_test = TestONNXRuntime.run_test diff --git a/test/onnx/test_pytorch_onnx_shape_inference.py b/test/onnx/test_pytorch_onnx_shape_inference.py index ecd3641c8fd796..7d636facaf67c5 100644 --- a/test/onnx/test_pytorch_onnx_shape_inference.py +++ b/test/onnx/test_pytorch_onnx_shape_inference.py @@ -6,6 +6,7 @@ from torch.onnx.symbolic_helper import (_set_onnx_shape_inference, _onnx_main_opset, 
_set_opset_version) +from test_pytorch_common import skipIfUnsupportedMinOpsetVersion def expect_tensor(scalar_type, shape=None): def verify(actual_type): @@ -75,6 +76,23 @@ def test_constant_of_shape_dynamic(self): constant_of_shape = g.op("ConstantOfShape", shape, value_t=torch.tensor([2.0])) self.run_test(g, constant_of_shape.node(), expect_tensor("Float", shape=(None, None, None, None))) + def test_gather_dynamic_index(self): + g = self.create_empty_graph() + input = g.addInput() + input.setType(input.type().with_dtype(torch.float).with_sizes([None, 3, 16, 16])) + indices = g.addInput() + indices.setType(indices.type().with_dtype(torch.int64).with_sizes([None])) + output = g.op("Gather", input, indices, axis_i=1) + self.run_test(g, output.node(), expect_tensor("Float", shape=([None, None, 16, 16]))) + + def test_gather_scalar_index(self): + g = self.create_empty_graph() + input = g.addInput() + input.setType(input.type().with_dtype(torch.float).with_sizes([None, 3, 16, 16])) + indices = self.insert_tensor_constant(g, torch.tensor(1)) + output = g.op("Gather", input, indices, axis_i=1) + self.run_test(g, output.node(), expect_tensor("Float", shape=([None, 16, 16]))) + def test_reshape(self): g = self.create_empty_graph() constant = self.insert_tensor_constant(g, torch.ones(2, 16, 5, 5)) @@ -102,6 +120,15 @@ def test_reshape_symbolic(self): output = g.op("Reshape", input, constant) self.run_test(g, output.node(), expect_tensor(None, shape=(None, None, 16))) + @skipIfUnsupportedMinOpsetVersion(14) + def test_reshape_allowzero(self): + g = self.create_empty_graph() + input = g.addInput() + input.setType(input.type().with_sizes([3, 4, 0])) + constant = self.insert_tensor_constant(g, torch.tensor([0, 4, 3])) + output = g.op("Reshape", input, constant, allowzero_i=1) + self.run_test(g, output.node(), expect_tensor(None, shape=(0, 4, 3))) + def test_slice(self): g = self.create_empty_graph() input = g.addInput() diff --git a/test/onnx/test_utility_funs.py b/test/onnx/test_utility_funs.py index 0f0c1e482a6603..22fe21e7291e84 100644 --- a/test/onnx/test_utility_funs.py +++ b/test/onnx/test_utility_funs.py @@ -15,8 +15,10 @@ _unpack_list, parse_args) import torch.utils.cpp_extension +from autograd_helper import CustomFunction as CustomFunction2 from test_pytorch_common import (skipIfUnsupportedMinOpsetVersion, - skipIfUnsupportedMaxOpsetVersion) + skipIfUnsupportedMaxOpsetVersion, + skipIfNoCuda) from verify import verify import torchvision @@ -956,7 +958,7 @@ def test_onnx_fallthrough(self): # Test aten export of op with symbolic for aten x = torch.randn(100, 128) y = torch.randn(100, 128) - model = torch.nn.CosineSimilarity(dim=1, eps=1e-6) + model = torch.nn.PairwiseDistance(p=2, eps=1e-6) graph, _, __ = self._model_to_graph(model, (x, y), operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, @@ -965,7 +967,8 @@ def test_onnx_fallthrough(self): iter = graph.nodes() self.assertEqual(next(iter).kind(), "onnx::Constant") self.assertEqual(next(iter).kind(), "onnx::Constant") - self.assertEqual(next(iter).kind(), "aten::cosine_similarity") + self.assertEqual(next(iter).kind(), "onnx::Constant") + self.assertEqual(next(iter).kind(), "aten::pairwise_distance") # prim::ListConstruct is exported as onnx::SequenceConstruct for opset >= 11 @skipIfUnsupportedMaxOpsetVersion(10) @@ -1038,6 +1041,36 @@ def forward(self, input): iter = graph.nodes() self.assertEqual(next(iter).kind(), "prim::PythonOp") + def test_autograd_module_name(self): + class CustomFunction(torch.autograd.Function): + 
@staticmethod + def forward(ctx, input): + ctx.save_for_backward(input) + return input.clamp(min=0) + + @staticmethod + def backward(ctx, grad_output): + input, = ctx.saved_tensors + grad_input = grad_output.clone() + grad_input[input < 0] = 0 + return grad_input + + class Custom(torch.nn.Module): + def forward(self, input): + return CustomFunction.apply(input) + CustomFunction2.apply(input) + + model = Custom() + batch = torch.FloatTensor(1, 3) + + graph, _, _ = self._model_to_graph(model, batch, + input_names=["batch"], dynamic_axes={"batch": [0, 1]}) + iter = graph.nodes() + autograd1 = next(iter) + autograd2 = next(iter) + self.assertEqual(autograd1.kind(), "prim::PythonOp") + self.assertEqual(autograd2.kind(), "prim::PythonOp") + self.assertNotEqual(autograd1.s("module"), autograd2.s("module")) + def test_unused_initializers(self): class Model(torch.nn.Module): def __init__(self): @@ -1252,6 +1285,13 @@ def forward(self, x): graph = onnx.load(io.BytesIO(f.getvalue())) self.assertSetEqual(set([i.name for i in graph.graph.initializer]), param_name_set) + model.train() + f = io.BytesIO() + torch.onnx.export(model, (x,), f, training=TrainingMode.PRESERVE, + opset_version=self.opset_version) + graph = onnx.load(io.BytesIO(f.getvalue())) + self.assertSetEqual(set([i.name for i in graph.graph.initializer]), param_name_set) + # Test eval mode. model.eval() f = io.BytesIO() @@ -1267,6 +1307,24 @@ def test_deduplicate_initializers(self): def test_deduplicate_initializers_torchscript(self): self._test_deduplicate_initializers(torchscript=True) + @skipIfNoCuda + def test_deduplicate_initializers_diff_devices(self): + class Model(torch.nn.Module): + def __init__(self): + super().__init__() + self.w_cpu = torch.nn.Parameter(torch.ones(3, device=torch.device("cpu"))) + self.w_cuda = torch.nn.Parameter(torch.ones(3, device=torch.device("cuda"))) + + def forward(self, x, y): + return x + self.w_cpu, y + self.w_cuda + + x = torch.randn(3, 3, device=torch.device("cpu")) + y = torch.randn(3, 3, device=torch.device("cuda")) + f = io.BytesIO() + torch.onnx.export(Model(), (x, y), f, opset_version=self.opset_version) + graph = onnx.load(io.BytesIO(f.getvalue())) + self.assertSetEqual(set([i.name for i in graph.graph.initializer]), {"w_cpu"}) + def test_duplicated_output_node(self): class DuplicatedOutputNet(torch.nn.Module): def __init__(self, input_size, num_classes): diff --git a/test/package/package_e/test_nn_module.pt b/test/package/package_e/test_nn_module.pt new file mode 100644 index 00000000000000..1c1a8964a8a42f Binary files /dev/null and b/test/package/package_e/test_nn_module.pt differ diff --git a/test/package/test_dependency_api.py b/test/package/test_dependency_api.py index be867528282dca..9f1a9c9899e8b3 100644 --- a/test/package/test_dependency_api.py +++ b/test/package/test_dependency_api.py @@ -182,7 +182,7 @@ def test_pickle_mocked(self): obj2 = package_a.PackageAObject(obj) buffer = BytesIO() - with self.assertRaises(NotImplementedError): + with self.assertRaises(PackagingError): with PackageExporter(buffer) as he: he.mock(include="package_a.subpackage") he.intern("**") diff --git a/test/package/test_misc.py b/test/package/test_misc.py index 659355b62e5988..480217b8feb3b9 100644 --- a/test/package/test_misc.py +++ b/test/package/test_misc.py @@ -2,12 +2,15 @@ # Owner(s): ["oncall: package/deploy"] import inspect +import platform from io import BytesIO +from pathlib import Path from textwrap import dedent +from unittest import skipIf from torch.package import PackageExporter, 
PackageImporter, is_from_package from torch.package.package_exporter import PackagingError -from torch.testing._internal.common_utils import run_tests +from torch.testing._internal.common_utils import IS_FBCODE, IS_SANDCASTLE, run_tests try: from .common import PackageTestCase @@ -31,6 +34,7 @@ def test_file_structure(self): """\ ├── .data │ ├── extern_modules + │ ├── python_version │ └── version ├── main │ └── main @@ -54,6 +58,7 @@ def test_file_structure(self): """\ ├── .data │ ├── extern_modules + │ ├── python_version │ └── version ├── main │ └── main @@ -99,6 +104,36 @@ def test_file_structure(self): import_exclude, ) + def test_python_version(self): + """ + Tests that the current python version is stored in the package and is available + via PackageImporter's python_version() method. + """ + buffer = BytesIO() + + with PackageExporter(buffer) as he: + from package_a.test_module import SimpleTest + + he.intern("**") + obj = SimpleTest() + he.save_pickle("obj", "obj.pkl", obj) + + buffer.seek(0) + hi = PackageImporter(buffer) + + self.assertEqual(hi.python_version(), platform.python_version()) + + @skipIf( + IS_FBCODE or IS_SANDCASTLE, + "Tests that use temporary files are disabled in fbcode", + ) + def test_load_python_version_from_package(self): + """Tests loading a package with a python version embedded""" + importer1 = PackageImporter( + f"{Path(__file__).parent}/package_e/test_nn_module.pt" + ) + self.assertEqual(importer1.python_version(), "3.9.7") + def test_file_structure_has_file(self): """ Test Directory's has_file() method. diff --git a/test/quantization/ao_migration/test_quantization_fx.py b/test/quantization/ao_migration/test_quantization_fx.py index b47ffbcf72871c..0728595dba8745 100644 --- a/test/quantization/ao_migration/test_quantization_fx.py +++ b/test/quantization/ao_migration/test_quantization_fx.py @@ -197,7 +197,7 @@ def test_function_import_fx_utils(self): 'create_qparam_nodes', 'all_node_args_have_no_tensors', 'node_return_type_is_int', - 'node_bool_tensor_arg_indexes', + 'get_non_observable_arg_indexes_and_types', 'is_get_tensor_info_node', 'maybe_get_next_module' ] diff --git a/test/quantization/core/test_quantized_module.py b/test/quantization/core/test_quantized_module.py index d001aad7242b5d..7cbab3be475e19 100644 --- a/test/quantization/core/test_quantized_module.py +++ b/test/quantization/core/test_quantized_module.py @@ -27,6 +27,7 @@ override_quantized_engine, override_qengines, qengine_is_qnnpack, + qengine_is_onednn, ) from hypothesis import assume, given from hypothesis import strategies as st @@ -99,7 +100,9 @@ def _test_linear_api_impl(self, batch_size, in_features, out_features, use_bias, zero_points=zero_point_tensor, axis=0, dtype=torch.qint8) else: - W_q = torch.quantize_per_tensor(W, 0.1, 4, torch.qint8) + # ONEDNN only supports symmetric quantization of weight + W_zp = 0 if qengine_is_onednn() else 4 + W_q = torch.quantize_per_tensor(W, 0.1, W_zp, torch.qint8) X = torch.rand(batch_size, in_features).float() X_q = torch.quantize_per_tensor(X, 0.2, 10, torch.quint8) @@ -434,7 +437,7 @@ def test_conv1d_api(self): X_scale = 1.3 X_zero_point = 2 W_scale = [0.5] - W_zero_point = [3] + W_zero_point = [0] if qengine_is_onednn() else [3] Y_scale = 5.0 Y_zero_point = 4 if torch.backends.quantized.engine == 'qnnpack': @@ -501,7 +504,7 @@ def test_conv2d_api(self): X_scale = 1.3 X_zero_point = 2 W_scale = [0.5] - W_zero_point = [3] + W_zero_point = [0] if qengine_is_onednn() else [3] Y_scale = 5.0 Y_zero_point = 4 # use_fused -> quantized class @@ -570,7
+573,7 @@ def test_conv3d_api(self): X_scale = 1.3 X_zero_point = 2 W_scale = [0.5] - W_zero_point = [3] + W_zero_point = [0] if qengine_is_onednn() else [3] Y_scale = 5.0 Y_zero_point = 4 # use_fused -> quantized class @@ -1200,7 +1203,8 @@ def test_dynamic_convtranspose3d(self): def test_linear_api(self, batch_size, in_features, out_features, use_bias, use_default_observer): """test API functionality for nn.quantized.dynamic.Linear""" W = torch.rand(out_features, in_features).float() - W_scale, W_zp = _calculate_dynamic_qparams(W, torch.qint8) + qscheme = torch.per_tensor_symmetric if qengine_is_onednn() else torch.per_tensor_affine + W_scale, W_zp = _calculate_dynamic_qparams(W, torch.qint8, qscheme=qscheme) W_q = torch.quantize_per_tensor(W, W_scale, W_zp, torch.qint8) X = torch.rand(batch_size, in_features).float() B = torch.rand(out_features).float() if use_bias else None @@ -1311,8 +1315,8 @@ def test_lstm_api(self, dtype, bidirectional): bias_keys.append(key_name1) bias_keys.append(key_name2) - if not (dtype == torch.float16 and torch.backends.quantized.engine == "qnnpack"): - # fp16 dynamic quant is not supported for qnnpack + if not (dtype == torch.float16 and torch.backends.quantized.engine in ("qnnpack", "onednn")): + # fp16 dynamic quant is not supported for qnnpack or onednn x = torch.randn(seq_len, batch, input_size) h = torch.randn(num_layers * (bidirectional + 1), batch, hidden_size) c = torch.randn(num_layers * (bidirectional + 1), batch, hidden_size) @@ -1362,8 +1366,8 @@ def test_gru_api(self): # instantiated for all engines and dtypes for dtype in [torch.qint8, torch.float16]: - if dtype == torch.float16 and torch.backends.quantized.engine == "qnnpack": - # fp16 dynamic quant is not supported for qnnpack + if dtype == torch.float16 and torch.backends.quantized.engine in ("qnnpack", "onednn"): + # fp16 dynamic quant is not supported for qnnpack or onednn continue # Test default instantiation seq_len = 4 @@ -1435,8 +1439,8 @@ def test_cell_api(self, dtype): 'RNNReLU': torch.ops.quantized.quantized_rnn_relu_cell_dynamic} for rnn_type in cell_dict.keys(): - if not (dtype == torch.float16 and torch.backends.quantized.engine == "qnnpack"): - # fp16 dynamic quant is not supported for qnnpack + if not (dtype == torch.float16 and torch.backends.quantized.engine in ("qnnpack", "onednn")): + # fp16 dynamic quant is not supported for qnnpack or onednn kwargs = {'input_size': input_size, 'hidden_size': hidden_size, 'bias': bias, 'dtype': dtype} if rnn_type == 'RNNReLU': kwargs['nonlinearity'] = "relu" @@ -1545,22 +1549,7 @@ def test_rnn(self): hidden_size = 7 num_layers = 2 bias = True - weight_keys = [] - bias_keys = [] for bidirectional in [True, False]: - num_directions = 2 if bidirectional else 1 - for layer in range(num_layers): - for direction in range(num_directions): - suffix = '_reverse' if direction == 1 else '' - key_name1 = 'weight_ih_l{layer_idx}{suffix}'.format(layer_idx=layer, suffix=suffix) - key_name2 = 'weight_hh_l{layer_idx}{suffix}'.format(layer_idx=layer, suffix=suffix) - weight_keys.append(key_name1) - weight_keys.append(key_name2) - key_name1 = 'bias_ih_l{layer_idx}{suffix}'.format(layer_idx=layer, suffix=suffix) - key_name2 = 'bias_hh_l{layer_idx}{suffix}'.format(layer_idx=layer, suffix=suffix) - bias_keys.append(key_name1) - bias_keys.append(key_name2) - x = torch.randn(seq_len, batch, input_size) h = torch.randn(num_layers * (bidirectional + 1), batch, hidden_size) c = torch.randn(num_layers * (bidirectional + 1), batch, hidden_size) @@ -1575,11 +1564,11 
@@ def test_rnn(self): # initialize ref rnn module weight_qparams = { 'qscheme': torch.per_tensor_affine, - 'dtype': torch.quint8, + 'dtype': torch.qint8, 'scale': 2.0, 'zero_point': 5 } - weight_qparams_dict = {key: weight_qparams for key in fp32_rnn._flat_weights_names} + weight_qparams_dict = {key: weight_qparams for key in fp32_rnn._flat_weights_names if key.startswith("weight")} ref_rnn = nnqr.LSTM( input_size=input_size, hidden_size=hidden_size, @@ -1589,10 +1578,20 @@ def test_rnn(self): dropout=0.0, bidirectional=bidirectional, weight_qparams_dict=weight_qparams_dict) - ref_rnn._flat_weights = fp32_rnn._flat_weights + for wn in fp32_rnn._flat_weights_names: + setattr(ref_rnn, wn, copy.deepcopy(getattr(fp32_rnn, wn))) + + ref_rnn._flat_weights = copy.deepcopy(fp32_rnn._flat_weights) # quantize and dequantize the weights for fp32_rnn module - fp32_rnn._flat_weights = [self._quant_dequant_weight(w, weight_qparams) for w in fp32_rnn._flat_weights] + flat_weights = [] + for wn in fp32_rnn._flat_weights_names: + if wn.startswith("weight"): + weight = self._quant_dequant_weight(getattr(fp32_rnn, wn), weight_qparams) + else: + weight = getattr(fp32_rnn, wn) + flat_weights.append(weight) + fp32_rnn._flat_weights = flat_weights fp32_res = fp32_rnn(x, (h, c)) ref_res = ref_rnn(x, (h, c)) diff --git a/test/quantization/core/test_quantized_op.py b/test/quantization/core/test_quantized_op.py index b6079b37b6a2fd..c1d1251e03e12c 100644 --- a/test/quantization/core/test_quantized_op.py +++ b/test/quantization/core/test_quantized_op.py @@ -26,9 +26,13 @@ from torch.testing._internal.common_quantization import skipIfNoFBGEMM, skipIfNoQNNPACK from torch.testing._internal.common_quantized import _quantize, _dequantize, _calculate_dynamic_qparams, \ override_quantized_engine, supported_qengines, override_qengines, _snr -from torch.testing._internal.common_quantized import qengine_is_qnnpack +from torch.testing._internal.common_quantized import ( + qengine_is_qnnpack, + qengine_is_onednn, +) from torch.ao.quantization import PerChannelMinMaxObserver from torch.testing._internal.common_cuda import TEST_CUDNN +import torch.backends.xnnpack from typing import Optional @@ -71,7 +75,7 @@ def avoid_vpmaddubsw_overflow_linear( # Reference quantized Linear operator -def qlinear_ref(X_q, X_scale, X_zp, W_q, W_scale, W_zp, b_q, Y_scale, Y_zp): +def qlinear_ref(X_q, X_scale, X_zp, W_q, W_scale, W_zp, b_q, Y_scale, Y_zp, dtype=np.uint8): X_q = np.reshape(X_q, (-1, X_q.shape[X_q.ndim - 1])) row_offsets_ref = X_q.sum(axis=1).astype(np.int32).reshape((-1, 1)) col_offsets_ref = W_q.sum(axis=1).astype(np.int32).reshape((1, -1)) @@ -85,7 +89,7 @@ def qlinear_ref(X_q, X_scale, X_zp, W_q, W_scale, W_zp, b_q, Y_scale, Y_zp): ) if b_q is not None: Prod_XqWq_ref += b_q - Y_q_ref = _quantize(Prod_XqWq_ref, Y_scale / (X_scale * W_scale), Y_zp) + Y_q_ref = _quantize(Prod_XqWq_ref, Y_scale / (X_scale * W_scale), Y_zp, dtype=dtype) return Y_q_ref """Computes the output shape given pooling parameters.""" @@ -825,6 +829,44 @@ def test_qadd_relu_same_qparams(self): self.assertEqual(qCrelu_hat, qCrelu_out_hat, msg="AddReLU.out failed") + """Tests the correctness of the cudnn add and add_relu op + (Similar to test_qadd_relu_different_qparams, will probably merge in the future)""" + @unittest.skipIf(not TEST_CUDNN, "cudnn is not enabled.") + @unittest.skip("Local only - currently the qconv2d_cudnn op is built " + "with USE_EXPERIMENTAL_CUDNN_V8_API, we can enable the test " + "after it is built by default") + def
test_qadd_relu_cudnn(self): + dtype = torch.qint8 + add_relu = torch.ops.quantized.add_relu + add = torch.ops.quantized.add + + # NB: This is a strange size so that we exercise both the vectorized + # implementation (64-element chunks at a time) as well as the scalar + # implementation + A = torch.arange(-128, 130, dtype=torch.float).to(torch.device("cuda")) + B = torch.arange(-128, 130, dtype=torch.float).to(torch.device("cuda")) + scale_A = 2.5 + scale_B = 6.3 + scale_C = 12.9 + zero_point = 0 + qA = torch.quantize_per_tensor(A, scale=scale_A, zero_point=zero_point, + dtype=dtype) + qB = torch.quantize_per_tensor(B, scale=scale_B, zero_point=zero_point, + dtype=dtype) + # Add ground truth + C = (qA.dequantize() + qB.dequantize()).to(device="cpu").numpy() + qC = _quantize(C, scale_C, zero_point, dtype=np_dtype[dtype]) + qC_hat = add(qA, qB, scale=scale_C, zero_point=zero_point).to(device="cpu") + np.testing.assert_equal(qC, qC_hat.int_repr(), + "Quantized addition failed.") + + # Add + ReLU ground truth + Crelu = C.copy() + Crelu[C < 0] = 0 + qCrelu = _quantize(Crelu, scale_C, zero_point, dtype=np_dtype[dtype]) + qCrelu_hat = add_relu(qA, qB, scale=scale_C, zero_point=zero_point).to(device="cpu") + np.testing.assert_equal(qCrelu, qCrelu_hat.int_repr(), + "Quantized addition with ReLU failed.") """Tests the correctness of the add and add_relu op.""" def test_qadd_relu_different_qparams(self): @@ -992,9 +1034,20 @@ def test_qmul_relu_different_qparams(self): msg="mulReLU.out failed") """Tests the correctness of the matmul op.""" - def test_qmatmul(self): - A = torch.randn(size=(3, 4), dtype=torch.float32) * 3 - B = torch.randn(size=(4, 5), dtype=torch.float32) * 3 + @given(num_dims=st.integers(2, 5), + outer_dims=st.lists(st.integers(2, 6), min_size=3, max_size=3), + m=st.integers(2, 6), + k=st.integers(2, 6), + n=st.integers(2, 6), + dtypes=st.sampled_from(((torch.qint8, np.int8), + (torch.quint8, np.uint8)))) + def test_qmatmul(self, num_dims, outer_dims, m, k, n, dtypes): + (torch_dtype, np_dtype) = dtypes + + size_a = outer_dims[:num_dims - 2] + [m, k] + size_b = outer_dims[:num_dims - 2] + [k, n] + A = torch.randn(size=size_a, dtype=torch.float32) * 3 + B = torch.randn(size=size_b, dtype=torch.float32) * 3 scale_A = 3.1 zero_point_A = 7 @@ -1004,15 +1057,22 @@ def test_qmatmul(self): scale_C = 1.3 zero_point_C = 5 - qA = torch.quantize_per_tensor(A, scale=scale_A, zero_point=zero_point_A, - dtype=torch.qint8) - qB = torch.quantize_per_tensor(B, scale=scale_B, zero_point=zero_point_B, - dtype=torch.qint8) + qA = torch.quantize_per_tensor(A, + scale=scale_A, + zero_point=zero_point_A, + dtype=torch_dtype) + qB = torch.quantize_per_tensor(B, + scale=scale_B, + zero_point=zero_point_B, + dtype=torch_dtype) # matmul ground truth C = torch.matmul(qA.dequantize(), qB.dequantize()).numpy() - qC = _quantize(C, scale_C, zero_point_C, dtype=np.int8) - qC_hat = torch.ops.quantized.matmul(qA, qB, scale=scale_C, zero_point=zero_point_C) + qC = _quantize(C, scale_C, zero_point_C, dtype=(np_dtype)) + qC_hat = torch.ops.quantized.matmul(qA, + qB, + scale=scale_C, + zero_point=zero_point_C) np.testing.assert_equal(qC, qC_hat.int_repr(), "Quantized multiplication failed.") @@ -1023,10 +1083,16 @@ def test_qmatmul(self): scales_B = torch.rand(size=(B.shape[axis],)) zero_points_B = torch.randint(low=0, high=5, size=(B.shape[axis],)) - qA = torch.quantize_per_channel(A, scales=scales_A, zero_points=zero_points_A, - axis=axis, dtype=torch.qint8) - qB = torch.quantize_per_channel(B, scales=scales_B,
zero_points=zero_points_B, - axis=axis, dtype=torch.qint8) + qA = torch.quantize_per_channel(A, + scales=scales_A, + zero_points=zero_points_A, + axis=axis, + dtype=torch.qint8) + qB = torch.quantize_per_channel(B, + scales=scales_B, + zero_points=zero_points_B, + axis=axis, + dtype=torch.qint8) np.testing.assert_raises_regex(RuntimeError, ".*per-tensor.*", torch.ops.quantized.matmul, @@ -1161,6 +1227,52 @@ def test_max_pool1d(self, X, kernel, stride, dilation, padding, ceil_mode): self.assertEqual(a_ref, a_hat.dequantize(), msg="ops.quantized.max_pool1d results are off") + # TODO: merge this test with test_max_pool2d when USE_EXPERIMENTAL_CUDNN_V8_API flag is enabled in CI + """Tests 2D cudnn max pool operation on quantized tensors.""" + @given(X=hu.tensor(shapes=hu.array_shapes(min_dims=3, max_dims=4, + min_side=1, max_side=10), + # cudnn's support for quantized pooling is limited to + # int8 currently + qparams=hu.qparams(dtypes=[torch.qint8])), + kernel=st.sampled_from((3, 5, 7)), + stride=st.sampled_from((None, 1, 2)), + # currently there is no support for dilation for cudnn + # pooling + dilation=st.integers(1, 1), + padding=st.integers(0, 2), + ceil_mode=st.booleans()) + @unittest.skipIf(not TEST_CUDNN, "cudnn is not enabled.") + @unittest.skip("Local only - currently the qconv2d_cudnn op is bulid " + "with USE_EXPERIMENTAL_CUDNN_V8_API, we can enable the test " + "after it is built by default") + def test_max_pool2d_cudnn(self, X, kernel, stride, dilation, padding, ceil_mode): + X, (scale, zero_point, torch_type) = X + assume(kernel // 2 >= padding) # Kernel cannot be overhanging! + iH, iW = X.shape[-2:] + oH = pool_output_shape(iH, kernel, padding, stride, dilation, ceil_mode) + assume(oH > 0) + oW = pool_output_shape(iW, kernel, padding, stride, dilation, ceil_mode) + assume(oW > 0) + + a = torch.from_numpy(X).to(device="cuda") + a_pool = torch.nn.functional.max_pool2d(a, kernel_size=kernel, + stride=stride, + padding=padding, dilation=dilation, + ceil_mode=ceil_mode) + a_ref = torch.quantize_per_tensor(a_pool, scale=scale, + zero_point=zero_point, dtype=torch_type) + a_ref = a_ref.dequantize() + qa = torch.quantize_per_tensor(a, scale=scale, zero_point=zero_point, + dtype=torch_type) + + # Test the ops.quantized separately, because None is not treated. + a_hat = torch.ops.quantized.max_pool2d( + qa, kernel_size=_pair(kernel), + stride=_pair(kernel if stride is None else stride), + padding=_pair(padding), dilation=_pair(dilation), ceil_mode=ceil_mode) + self.assertEqual(a_ref, a_hat.dequantize(), + msg="ops.quantized.max_pool2d results are off") + """Tests 2D max pool operation on quantized tensors.""" @given(X=hu.tensor(shapes=hu.array_shapes(min_dims=3, max_dims=4, min_side=1, max_side=10), @@ -2633,7 +2745,7 @@ def forward( ] q_data = [] - reduce_range = (qengine == 'fbgemm') + reduce_range = (qengine in ('fbgemm', 'onednn')) for idx, x in enumerate(fp_data): scale, zero_point = _calculate_dynamic_qparams( x, dtype=dtype, reduce_range=reduce_range) @@ -2654,7 +2766,13 @@ def forward( mha.eval() # Prepare - mha.qconfig = torch.ao.quantization.get_default_qconfig(qengine) + if qengine_is_onednn(): + # `reduce_range` is False by default for ONEDNN backend + # but the test fails on earlier CPUs without VNNI. 
+ # So we use a default qconfig with `reduce_range=True` here + mha.qconfig = torch.ao.quantization.get_default_qconfig() + else: + mha.qconfig = torch.ao.quantization.get_default_qconfig(qengine) mha_prepared = torch.ao.quantization.prepare( mha, prepare_custom_config_dict=custom_module_config) @@ -2747,7 +2865,7 @@ def test_qlinear(self, batch_size, input_channels, output_channels, (b_value_max - b_value_min) + b_value_min ).astype(np.int32) if use_bias else None - if torch.backends.quantized.engine == 'fbgemm': + if torch.backends.quantized.engine in ('fbgemm', 'onednn'): avoid_vpmaddubsw_overflow_linear( batch_size, input_channels, @@ -2880,6 +2998,19 @@ def test_qlinear_legacy(self, batch_size, input_channels, output_channels): self.assertEqual(Y_fp32, Y_fp32_ref, msg="torch.ops.quantized.fbgemm_linear_dynamic results are off") + @skipIfNoFBGEMM + @given( + input_channels=st.integers(16, 32), + output_channels=st.integers(4, 8), + exponent=st.integers(0, 8)) + def test_linear_prepack_fp16_numerics(self, input_channels, output_channels, exponent): + w = torch.randn(output_channels, input_channels) * 10**exponent + bias = None + w_packed_fp16 = torch.ops.quantized.linear_prepack_fp16(w, bias) + w_unpacked_fp16 = torch.ops.quantized.linear_unpack_fp16(w_packed_fp16) + w_fp16 = w.to(torch.float16).to(torch.float32) + self.assertTrue(torch.equal(w_fp16, w_unpacked_fp16[0])) + @skipIfNoFBGEMM def test_qlinear_dynamic_fp16(self): @@ -2971,8 +3102,8 @@ def test_qlstmGRU(self, num_batches, input_size, hidden_size, for rnn_type in ['LSTM', 'GRU']: for dtype in [torch.qint8, torch.float16]: - # Fp16 quantization is not supported for qnnpack - if torch.backends.quantized.engine == 'qnnpack' and dtype == torch.float16: + # Fp16 quantization is not supported for qnnpack or onednn + if torch.backends.quantized.engine in ('qnnpack', 'onednn') and dtype == torch.float16: continue if torch.backends.quantized.engine == 'qnnpack': @@ -3105,8 +3236,8 @@ def test_qrnncell(self, num_batches, input_size, hidden_size, per_channel_quant) for rnn_type in ['LSTMCell', 'GRUCell', 'RNNTanh', 'RNNReLU']: for dtype in [torch.qint8, torch.float16]: - # Fp16 quantization is not supported for qnnpack - if torch.backends.quantized.engine == 'qnnpack' and dtype == torch.float16: + # Fp16 quantization is not supported for qnnpack or onednn + if torch.backends.quantized.engine in ('qnnpack', 'onednn') and dtype == torch.float16: continue if torch.backends.quantized.engine == 'qnnpack': @@ -3247,6 +3378,7 @@ class TestQuantizedLinear(TestCase): def test_qlinear(self, batch_size, input_channels, output_channels, use_bias, use_relu, use_multi_dim_input, use_channelwise): decimal_val = 4 + dtypes = [torch.quint8] if torch.backends.quantized.engine == 'qnnpack': # QNNPACK supports uint8 in the kernels. In the op we shift the int8 # weight values to uint8 to be on par with fbgemm. However, this causes @@ -3254,24 +3386,165 @@ def test_qlinear(self, batch_size, input_channels, output_channels, use_bias, # off by one results. 
decimal_val = 0 + # only qnnpack qengine supports qint8 when xnnpack is available + if torch.backends.xnnpack.enabled: + dtypes.append(torch.qint8) + + for dtype in dtypes: + # No support for channelwise in xnnpack (int8) + # ONEDNN does not support qint8 + if dtype == torch.qint8 and (use_channelwise or qengine_is_onednn()): + return + + nptype = np_dtype[dtype] + qlinear_prepack = torch.ops.quantized.linear_prepack + if use_relu: + qlinear = torch.ops.quantized.linear_relu + else: + qlinear = torch.ops.quantized.linear + if use_multi_dim_input: + batch_size *= 3 # Test the multi-dim input tensor + X_scale = 1.5 + X_zp = 5 + X_value_min = -128 if dtype == torch.qint8 else 0 + X_value_max = 127 if dtype == torch.qint8 else 255 + X_q0 = np.round( + np.random.rand(batch_size, input_channels) * + (X_value_max - X_value_min) + + X_value_min + ).astype(nptype) + + W_scales = np.random.rand(output_channels) + # xnnpack forces W_zp to 0 when using symmetric quantization + # ONEDNN only supports symmetric quantization of weight + if dtype == torch.qint8 or qengine_is_onednn(): + W_zps = np.zeros(output_channels).astype(np.int) + else: + W_zps = np.round(np.random.rand(output_channels) * 100 - 50).astype(np.int) + # when using symmetric quantization + # special restriction for xnnpack fully connected op weight + # [-127, 127] instead of [-128, 127] + W_value_min = -127 if dtype == torch.qint8 else -128 + W_value_max = 127 + W_q0 = np.round( + np.random.rand(output_channels, input_channels) + * (W_value_max - W_value_min) + + W_value_min + ).astype(np.int8) # weight is always int8_t + b_value_min = -10 + b_value_max = 10 + b_q0 = np.round( + np.random.rand(output_channels) * + (b_value_max - b_value_min) + b_value_min + ).astype(np.int32) if use_bias else None + if torch.backends.quantized.engine in ('fbgemm', 'onednn'): + avoid_vpmaddubsw_overflow_linear( + batch_size, + input_channels, + output_channels, + X_q0, + X_value_min, + X_value_max, + W_q0, + W_value_min, + W_value_max, + ) + X = torch.from_numpy(_dequantize( + X_q0, X_scale, X_zp)).to(dtype=torch.float) + X_q = torch.quantize_per_tensor( + X, scale=X_scale, zero_point=X_zp, dtype=dtype) + if use_channelwise: + W = torch.from_numpy(_dequantize(W_q0, W_scales.reshape( + (-1, 1)), W_zps.reshape((-1, 1)))).to(dtype=torch.float) + W_q = torch.quantize_per_channel(W, scales=torch.from_numpy(W_scales), + zero_points=torch.from_numpy(W_zps), axis=0, dtype=torch.qint8) + b = torch.from_numpy(_dequantize( + b_q0, X_scale * W_scales, 0)).to(dtype=torch.float) if use_bias else None + b_q = torch.quantize_per_channel(b, scales=torch.from_numpy(X_scale * W_scales), + zero_points=torch.zeros(output_channels, dtype=torch.long), + axis=0, dtype=torch.qint32) if use_bias else None + else: + W = torch.from_numpy(_dequantize( + W_q0, W_scales[0], W_zps[0])).to(dtype=torch.float) + W_q = torch.quantize_per_tensor(W, scale=W_scales[0], zero_point=( + W_zps[0].astype(int).item()), dtype=torch.qint8) + b = torch.from_numpy(_dequantize( + b_q0, X_scale * (W_scales[0].item()), 0)).to(dtype=torch.float) if use_bias else None + b_q = torch.quantize_per_tensor( + b, scale=X_scale * (W_scales[0].item()), zero_point=0, dtype=torch.qint32) if use_bias else None + # Compare X_scale * W_scale * input_channels * X_value_max * W_value_max with + # Y_scale * 255 (max for uint8). 
+ Y_scale = 125.1234 + Y_zp = 5 + # Weight prepacking operator for quantized Linear + float_bias = b if use_bias else None + W_prepack = qlinear_prepack(W_q, float_bias) + if use_multi_dim_input: + X_q = X_q.view(3, int(batch_size / 3), input_channels) + # Quantized Linear operator with prepacked weight + Y_q = qlinear(X_q, W_prepack, Y_scale, Y_zp) + if not use_channelwise: + # Test the per-tensor quantization only + # Reference quantized Linear operator + Y_q_ref = qlinear_ref(X_q0, X_scale, X_zp, W_q0, + W_scales[0], W_zps[0], b_q0, Y_scale, Y_zp, dtype=nptype) + if use_relu: + Y_q_ref[Y_q_ref < Y_zp] = Y_zp + if use_multi_dim_input: + Y_q_ref = np.reshape( + Y_q_ref, (3, int(batch_size / 3), output_channels)) + # Assert equal + np.testing.assert_array_almost_equal(Y_q_ref, Y_q.int_repr().numpy(), decimal=decimal_val) + # Test both per-tensor and per-channel quantization + # Reference quantized result from PyTorch Linear operator + W_fp32 = W_q.dequantize().to(dtype=torch.float) + X_fp32 = X_q.dequantize().to(dtype=torch.float) + b_fp32 = b_q.dequantize().to(dtype=torch.float) if use_bias else None + Y_fp32_ref = F.linear(X_fp32, W_fp32, b_fp32) + if use_relu: + Y_fp32_ref[Y_fp32_ref < 0.0] = 0.0 + Y_q_ref2 = torch.quantize_per_tensor( + Y_fp32_ref, Y_scale, Y_zp, dtype) + # Assert equal + np.testing.assert_array_almost_equal( + Y_q_ref2.int_repr().numpy(), Y_q.int_repr().numpy(), decimal=decimal_val) + + @given(batch_size=st.integers(1, 4), + input_channels=st.integers(16, 32), + output_channels=st.integers(4, 8), + use_bias=st.sampled_from([False]), + use_relu=st.sampled_from([False]), + use_multi_dim_input=st.booleans(), + use_channelwise=st.sampled_from([False])) # channelwise currently not supported for qlinear cudnn + @skipIfNoFBGEMM + @unittest.skipIf(not TEST_CUDNN, "cudnn is not enabled.") + @unittest.skip("Local only - currently the qconv2d_cudnn op is bulid " + "with USE_EXPERIMENTAL_CUDNN_V8_API, we can enable the test " + "after it is built by default") + # TODO: check with yang regarding CUDNN flags + def test_qlinear_cudnn(self, batch_size, input_channels, output_channels, use_bias, + use_relu, use_multi_dim_input, use_channelwise): qlinear_prepack = torch.ops.quantized.linear_prepack + batch_size = 1 + input_channels = 10 + output_channels = 20 + use_bias = False + use_relu = False + use_channelwise = False if use_relu: - qlinear = torch.ops.quantized.linear_relu + qlinear_op = torch.ops.quantized.linear_relu else: - qlinear = torch.ops.quantized.linear - if use_multi_dim_input: - batch_size *= 3 # Test the multi-dim input tensor + qlinear_op = torch.ops.quantized.linear X_scale = 1.5 - X_zp = 5 - X_value_min = 0 - X_value_max = 225 + X_zp = 0 + X_value_min = -128 + X_value_max = 127 X_q0 = np.round( np.random.rand(batch_size, input_channels) * (X_value_max - X_value_min) - + X_value_min - ).astype(np.uint8) - W_scales = np.random.rand(output_channels) - W_zps = np.round(np.random.rand(output_channels) * 100 - 50).astype(np.int) + + X_value_min).astype(np.int8) + W_scale = 2.5 + W_zp = 0 W_value_min = -128 W_value_max = 127 W_q0 = np.round( @@ -3285,6 +3558,15 @@ def test_qlinear(self, batch_size, input_channels, output_channels, use_bias, np.random.rand(output_channels) * (b_value_max - b_value_min) + b_value_min ).astype(np.int32) if use_bias else None + if use_bias: + b_value_min = -10 + b_value_max = 10 + b_q0 = np.round( + np.random.rand(output_channels) * + (b_value_max - b_value_min) + b_value_min + ).astype(np.int32) + else: + bias = None 
avoid_vpmaddubsw_overflow_linear( batch_size, input_channels, @@ -3296,65 +3578,31 @@ def test_qlinear(self, batch_size, input_channels, output_channels, use_bias, W_value_min, W_value_max, ) + quant_dtype = torch.qint8 X = torch.from_numpy(_dequantize( - X_q0, X_scale, X_zp)).to(dtype=torch.float) + X_q0, X_scale, X_zp)).to(dtype=torch.float).to(device="cuda") X_q = torch.quantize_per_tensor( - X, scale=X_scale, zero_point=X_zp, dtype=torch.quint8) - if use_channelwise: - W = torch.from_numpy(_dequantize(W_q0, W_scales.reshape( - (-1, 1)), W_zps.reshape((-1, 1)))).to(dtype=torch.float) - W_q = torch.quantize_per_channel(W, scales=torch.from_numpy(W_scales), - zero_points=torch.from_numpy(W_zps), axis=0, dtype=torch.qint8) - b = torch.from_numpy(_dequantize( - b_q0, X_scale * W_scales, 0)).to(dtype=torch.float) if use_bias else None - b_q = torch.quantize_per_channel(b, scales=torch.from_numpy(X_scale * W_scales), - zero_points=torch.zeros(output_channels, dtype=torch.long), - axis=0, dtype=torch.qint32) if use_bias else None - else: - W = torch.from_numpy(_dequantize( - W_q0, W_scales[0], W_zps[0])).to(dtype=torch.float) - W_q = torch.quantize_per_tensor(W, scale=W_scales[0], zero_point=( - W_zps[0].astype(int).item()), dtype=torch.qint8) - b = torch.from_numpy(_dequantize( - b_q0, X_scale * (W_scales[0].item()), 0)).to(dtype=torch.float) if use_bias else None - b_q = torch.quantize_per_tensor( - b, scale=X_scale * (W_scales[0].item()), zero_point=0, dtype=torch.qint32) if use_bias else None - # Compare X_scale * W_scale * input_channels * X_value_max * W_value_max with - # Y_scale * 255 (max for uint8). - Y_scale = 125.1234 - Y_zp = 5 + X, scale=X_scale, zero_point=X_zp, dtype=quant_dtype) + W = torch.from_numpy(_dequantize( + W_q0, W_scale, W_zp)).to(dtype=torch.float).to(device="cuda") + W_q = torch.quantize_per_tensor(W, scale=W_scale, zero_point=W_zp, dtype=quant_dtype) + b = torch.from_numpy(_dequantize( + b_q0, X_scale * (W_zp), 0)).to(dtype=torch.float).to(device="cuda") if use_bias else None + b_q = torch.quantize_per_tensor( + b, scale=X_scale * W_scale, zero_point=0, dtype=quant_dtype) if use_bias else None + Y_scale = 0.5 + Y_zp = 0 # Weight prepacking operator for quantized Linear float_bias = b if use_bias else None - W_prepack = qlinear_prepack(W_q, float_bias) - if use_multi_dim_input: - X_q = X_q.view(3, int(batch_size / 3), input_channels) + W_prepack = qlinear_prepack(W_q, float_bias if use_bias else None) # Quantized Linear operator with prepacked weight - Y_q = qlinear(X_q, W_prepack, Y_scale, Y_zp) - if not use_channelwise: - # Test the per-tensor quantization only - # Reference quantized Linear operator - Y_q_ref = qlinear_ref(X_q0, X_scale, X_zp, W_q0, - W_scales[0], W_zps[0], b_q0, Y_scale, Y_zp) - if use_relu: - Y_q_ref[Y_q_ref < Y_zp] = Y_zp - if use_multi_dim_input: - Y_q_ref = np.reshape( - Y_q_ref, (3, int(batch_size / 3), output_channels)) - # Assert equal - np.testing.assert_array_almost_equal(Y_q_ref, Y_q.int_repr().numpy(), decimal=decimal_val) - # Test both per-tensor and per-channel quantization - # Reference quantized result from PyTorch Linear operator - W_fp32 = W_q.dequantize().to(dtype=torch.float) - X_fp32 = X_q.dequantize().to(dtype=torch.float) - b_fp32 = b_q.dequantize().to(dtype=torch.float) if use_bias else None - Y_fp32_ref = F.linear(X_fp32, W_fp32, b_fp32) + Y_q = qlinear_op(X_q, W_prepack, Y_scale, Y_zp).to(device="cpu") + Y_q_ref = qlinear_ref(X_q0, X_scale, X_zp, W_q0, + W_scale, W_zp, b_q0, Y_scale, Y_zp, dtype=np.int8) if use_relu: - 
Y_fp32_ref[Y_fp32_ref < 0.0] = 0.0 - Y_q_ref2 = torch.quantize_per_tensor( - Y_fp32_ref, Y_scale, Y_zp, torch.quint8) - # Assert equal - np.testing.assert_array_almost_equal( - Y_q_ref2.int_repr().numpy(), Y_q.int_repr().numpy(), decimal=decimal_val) + Y_q_ref[Y_q_ref < Y_zp] = Y_zp + decimal_val = 0 + np.testing.assert_array_almost_equal(Y_q_ref, Y_q.int_repr().numpy(), decimal=decimal_val) """Tests the correctness of the quantized::linear_unpack op.""" @given(W=hu.tensor(shapes=hu.array_shapes(2, 2,), @@ -3371,6 +3619,13 @@ def test_qlinear_unpack(self, W, use_channelwise): qlinear_prepack = torch.ops.quantized.linear_prepack qlinear_unpack = torch.ops.quantized.linear_unpack + # ONEDNN only supports symmetric quantization of weight + if qengine_is_onednn(): + if use_channelwise: + W_zps = torch.zeros(output_channels).to(torch.int64) + else: + W_zp = 0 + W = torch.from_numpy(W) if use_channelwise: W_q = torch.quantize_per_channel( @@ -3834,6 +4089,10 @@ def _test_qconv_unpack_impl(self, qconv_prepack_fn, qconv_unpack_fn, inputs, if channelwise and transposed: # currently transposed conv and per-channel per quantization does not work return + # ONEDNN only supports symmetric quantization of weight and zero output padding + if qengine_is_onednn(): + W_zero_point = 0 + o_pads = len(o_pads) * [0] if o_pads is not None else None if channelwise: if transposed: output_channels = W.shape[1] # IC OC/G @@ -3972,6 +4231,9 @@ def _test_qconv_impl( weight_dtype=torch.qint8, output_dtype=torch.quint8, ): + # ONEDNN only supports symmetric quantization of weight + if qengine_is_onednn() and W_zero_point is not None: + W_zero_point = len(W_zero_point) * [0] (X, W), (X_q, W_q), bias_float = self._make_qconv_tensors( batch_size, input_channels_per_group, input_feature_map_shape, output_channels_per_group, groups, kernels, @@ -4056,7 +4318,7 @@ def _test_qconv_impl( Y_scale=st.floats(4.2, 5.6), Y_zero_point=st.integers(0, 4), use_bias=st.booleans(), - use_relu=st.sampled_from([False]), + use_relu=st.booleans(), use_channelwise=st.booleans()) @override_qengines def test_qconv2d( @@ -4104,12 +4366,22 @@ def test_qconv2d( dilations, groups, ) - self._test_qconv_impl( - qconv, qconv_prepack, conv_op, batch_size, - input_channels_per_group, (height, width), - output_channels_per_group, groups, kernels, strides, pads, None, - dilations, X_scale, X_zero_point, W_scale, W_zero_point, - Y_scale, Y_zero_point, use_bias, use_relu, use_channelwise, False) + + act_qdtypes = [torch.quint8] + # Only qnnpack qengine supportes qint8 + if qengine_is_qnnpack() and torch.backends.xnnpack.enabled: + act_qdtypes.append(torch.qint8) + + for X_qdtype in act_qdtypes: + if X_qdtype == torch.qint8: + W_zero_point = [0 for i in range(len(W_zero_point))] + + self._test_qconv_impl( + qconv, qconv_prepack, conv_op, batch_size, + input_channels_per_group, (height, width), + output_channels_per_group, groups, kernels, strides, pads, None, + dilations, X_scale, X_zero_point, W_scale, W_zero_point, + Y_scale, Y_zero_point, use_bias, use_relu, use_channelwise, False, input_dtype=X_qdtype, output_dtype=X_qdtype) @given(batch_size=st.integers(1, 3), # only multiples of 16 are supported right now, might be fixed in @@ -4181,9 +4453,9 @@ def test_qconv2d_cudnn( dilations = (dilation, dilation) if use_relu: - qconv = torch.ops.quantized.conv2d_relu_cudnn + qconv = torch.ops.quantized.conv2d_relu else: - qconv = torch.ops.quantized.conv2d_cudnn + qconv = torch.ops.quantized.conv2d conv_op = torch.nn.Conv2d( input_channels, output_channels, @@ 
-4194,7 +4466,7 @@ def test_qconv2d_cudnn( groups, ).to(torch.device("cuda")) self._test_qconv_impl( - qconv, None, conv_op, batch_size, + qconv, torch.ops.quantized.conv2d_prepack, conv_op, batch_size, input_channels_per_group, (height, width), output_channels_per_group, groups, kernels, strides, pads, None, dilations, X_scale, X_zero_point, W_scale, W_zero_point, @@ -4270,13 +4542,14 @@ def trace_handler(p): weight_int8 = torch.quantize_per_tensor(weight, 1, 0, torch.qint8).contiguous(memory_format=torch.channels_last) scale = 1.0 zero_point = 0 - conv_op = torch.ops.quantized.conv2d_cudnn + conv_op = torch.ops.quantized.conv2d + weight_prepacked = torch.ops.quantized.conv2d_prepack(weight_int8, None, stride, padding, dilation, groups) with profile( activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], schedule=my_schedule, on_trace_ready=trace_handler) as prof: for i in range(30): - conv_op(input_int8, weight_int8, None, stride, padding, dilation, groups, scale, zero_point) + conv_op(input_int8, weight_prepacked, scale, zero_point) prof.step() print("int8 benchmark result:") @@ -4324,7 +4597,7 @@ def test_qconv_transpose1d( return # Currently only the QNNPACK is supported if qengine_is_qnnpack() and (IS_PPC or TEST_WITH_UBSAN): return # QNNPACK doesn't support these - assume(o_pad < stride or o_pad < dilation) + assume(o_pad < stride and o_pad < dilation) input_channels = input_channels_per_group * groups output_channels = output_channels_per_group * groups @@ -4347,40 +4620,51 @@ def test_qconv_transpose1d( dilation=dilations, bias=use_bias ) - X_q, W_q, bias_float = self._test_qconv_impl( - qconv, qconv_prepack, conv_op, batch_size, - input_channels_per_group, (width, ), - output_channels_per_group, groups, kernels, strides, pads, o_pads, - dilations, X_scale, X_zero_point, W_scale, W_zero_point, - Y_scale, Y_zero_point, use_bias, use_relu=False, - use_channelwise=False, use_transpose=True) - # check that this doesn't error - test_conv = torch.nn.quantized.ConvTranspose1d(input_channels, output_channels, 1) - test_conv(X_q) + act_qdtypes = [torch.quint8] + # Only qnnpack qengine supportes qint8 + if qengine_is_qnnpack() and torch.backends.xnnpack.enabled: + act_qdtypes.append(torch.qint8) - # Test the module implementation - qconv_op = torch.nn.quantized.ConvTranspose1d( - in_channels=input_channels, - out_channels=output_channels, - kernel_size=kernels, - stride=strides, - padding=pads, - output_padding=o_pads, - groups=groups, - dilation=dilations, - bias=use_bias - ) - qconv_op.scale = Y_scale - qconv_op.zero_point = Y_zero_point - qconv_op.set_weight_bias(W_q, bias_float) + for X_qdtype in act_qdtypes: + if X_qdtype == torch.qint8: + W_zero_point = [0 for i in range(len(W_zero_point))] - Y_dq_ref = conv_op(X_q.dequantize()) - Y_q_ref = torch.quantize_per_tensor(Y_dq_ref, scale=Y_scale, - zero_point=Y_zero_point, - dtype=torch.quint8) - Y_q = qconv_op(X_q) - self.assertEqual(Y_q_ref, Y_q) + X_q, W_q, bias_float = self._test_qconv_impl( + qconv, qconv_prepack, conv_op, batch_size, + input_channels_per_group, (width, ), + output_channels_per_group, groups, kernels, strides, pads, o_pads, + dilations, X_scale, X_zero_point, W_scale, W_zero_point, + Y_scale, Y_zero_point, use_bias, use_relu=False, + use_channelwise=False, use_transpose=True, input_dtype=X_qdtype, output_dtype=X_qdtype) + + # check that this doesn't error + test_conv = torch.nn.quantized.ConvTranspose1d(input_channels, output_channels, 1) + test_conv.scale = Y_scale + test_conv(X_q) + + # Test the module 
implementation + qconv_op = torch.nn.quantized.ConvTranspose1d( + in_channels=input_channels, + out_channels=output_channels, + kernel_size=kernels, + stride=strides, + padding=pads, + output_padding=o_pads, + groups=groups, + dilation=dilations, + bias=use_bias + ) + qconv_op.scale = Y_scale + qconv_op.zero_point = Y_zero_point + qconv_op.set_weight_bias(W_q, bias_float) + + Y_dq_ref = conv_op(X_q.dequantize()) + Y_q_ref = torch.quantize_per_tensor(Y_dq_ref, scale=Y_scale, + zero_point=Y_zero_point, + dtype=X_qdtype) + Y_q = qconv_op(X_q) + self.assertEqual(Y_q_ref, Y_q) """Tests the correctness of quantized convolution op.""" @@ -4433,8 +4717,11 @@ def test_qconv_transpose2d( use_bias): if qengine_is_qnnpack() and (IS_PPC or TEST_WITH_UBSAN): return # QNNPACK doesn't support these - assume(o_pad_h < stride_h or o_pad_h < dilation) - assume(o_pad_w < stride_w or o_pad_w < dilation) + # ONEDNN does not support output paddings + if qengine_is_onednn() and (o_pad_h, o_pad_w) != (0, 0): + return + assume(o_pad_h < stride_h and o_pad_h < dilation) + assume(o_pad_w < stride_w and o_pad_w < dilation) input_channels = input_channels_per_group * groups output_channels = output_channels_per_group * groups @@ -4457,40 +4744,50 @@ def test_qconv_transpose2d( dilation=dilations, bias=use_bias ) - X_q, W_q, bias_float = self._test_qconv_impl( - qconv, qconv_prepack, conv_op, batch_size, - input_channels_per_group, (height, width), - output_channels_per_group, groups, kernels, strides, pads, o_pads, - dilations, X_scale, X_zero_point, W_scale, W_zero_point, - Y_scale, Y_zero_point, use_bias, use_relu=False, - use_channelwise=False, use_transpose=True) + act_qdtypes = [torch.quint8] + # Only qnnpack qengine supportes qint8 + if qengine_is_qnnpack() and torch.backends.xnnpack.enabled: + act_qdtypes.append(torch.qint8) - # check that this doesn't error - test_conv = torch.nn.quantized.ConvTranspose2d(input_channels, output_channels, 1) - test_conv(X_q) + for X_qdtype in act_qdtypes: + if X_qdtype == torch.qint8: + W_zero_point = [0 for i in range(len(W_zero_point))] - # Test the module implementation - qconv_op = torch.nn.quantized.ConvTranspose2d( - in_channels=input_channels, - out_channels=output_channels, - kernel_size=kernels, - stride=strides, - padding=pads, - output_padding=o_pads, - groups=groups, - dilation=dilations, - bias=use_bias - ) - qconv_op.scale = Y_scale - qconv_op.zero_point = Y_zero_point - qconv_op.set_weight_bias(W_q, bias_float) + X_q, W_q, bias_float = self._test_qconv_impl( + qconv, qconv_prepack, conv_op, batch_size, + input_channels_per_group, (height, width), + output_channels_per_group, groups, kernels, strides, pads, o_pads, + dilations, X_scale, X_zero_point, W_scale, W_zero_point, + Y_scale, Y_zero_point, use_bias, use_relu=False, + use_channelwise=False, use_transpose=True, input_dtype=X_qdtype, output_dtype=X_qdtype) + + # check that this doesn't error + test_conv = torch.nn.quantized.ConvTranspose2d(input_channels, output_channels, 1) + test_conv.scale = Y_scale + test_conv(X_q) + + # Test the module implementation + qconv_op = torch.nn.quantized.ConvTranspose2d( + in_channels=input_channels, + out_channels=output_channels, + kernel_size=kernels, + stride=strides, + padding=pads, + output_padding=o_pads, + groups=groups, + dilation=dilations, + bias=use_bias + ) + qconv_op.scale = Y_scale + qconv_op.zero_point = Y_zero_point + qconv_op.set_weight_bias(W_q, bias_float) - Y_dq_ref = conv_op(X_q.dequantize()) - Y_q_ref = torch.quantize_per_tensor(Y_dq_ref, scale=Y_scale, - 
zero_point=Y_zero_point, - dtype=torch.quint8) - Y_q = qconv_op(X_q) - self.assertEqual(Y_q_ref, Y_q) + Y_dq_ref = conv_op(X_q.dequantize()) + Y_q_ref = torch.quantize_per_tensor(Y_dq_ref, scale=Y_scale, + zero_point=Y_zero_point, + dtype=X_qdtype) + Y_q = qconv_op(X_q) + self.assertEqual(Y_q_ref, Y_q) """Tests the correctness of quantized convolution op.""" @given(batch_size=st.integers(1, 3), @@ -4552,6 +4849,9 @@ def test_qconv_transpose3d( use_bias): if qengine_is_qnnpack(): return # QNNPACK doesn't support this + # ONEDNN doesn't support output paddings + if qengine_is_onednn() and (o_pad_t, o_pad_h, o_pad_w) != (0, 0, 0): + return assume(o_pad_t < stride_t or o_pad_t < dilation) assume(o_pad_h < stride_h or o_pad_h < dilation) assume(o_pad_w < stride_w or o_pad_w < dilation) @@ -4587,6 +4887,7 @@ def test_qconv_transpose3d( # check that this doesn't error test_conv = torch.nn.quantized.ConvTranspose3d(input_channels, output_channels, 1) + test_conv.scale = Y_scale test_conv(X_q) # Test the module implementation @@ -4736,7 +5037,7 @@ def test_qconv1d( output_channels = output_channels_per_group * groups if torch.backends.quantized.engine == 'qnnpack': use_channelwise = False - true_conv1d = torch.nn.Conv1d( + conv1d = torch.nn.Conv1d( input_channels, output_channels, kernel, @@ -4749,12 +5050,23 @@ def test_qconv1d( qconv = torch.ops.quantized.conv1d if use_relu: qconv = torch.ops.quantized.conv1d_relu - self._test_qconv_impl( - qconv, qconv_prepack, true_conv1d, batch_size, - input_channels_per_group, (length, ), - output_channels_per_group, groups, kernel, [stride], [pad], None, - [dilation], X_scale, X_zero_point, W_scale, W_zero_point, - Y_scale, Y_zero_point, use_bias, use_relu, use_channelwise, False) + + act_qdtypes = [torch.quint8] + # Only qnnpack qengine supportes qint8 + if qengine_is_qnnpack() and torch.backends.xnnpack.enabled: + act_qdtypes.append(torch.qint8) + + for X_qdtype in act_qdtypes: + if X_qdtype == torch.qint8: + W_zero_point = [0 for i in range(len(W_zero_point))] + + self._test_qconv_impl( + qconv, qconv_prepack, conv1d, batch_size, + input_channels_per_group, (length, ), + output_channels_per_group, groups, kernel, [stride], [pad], None, + [dilation], X_scale, X_zero_point, W_scale, W_zero_point, + Y_scale, Y_zero_point, use_bias, use_relu, use_channelwise, False, + input_dtype=X_qdtype, output_dtype=X_qdtype) @given(batch_size=st.integers(1, 4), input_channels_per_group=st.sampled_from([2, 4, 5, 8, 16]), @@ -5089,7 +5401,7 @@ def test_qnnpack_sigmoid_sweep(self): """Tests the correctness of the quantized::add (qnnpack) op.""" @settings(suppress_health_check=(HealthCheck.filter_too_much,)) @given(A=hu.tensor(shapes=hu.array_shapes(1, 5, 1, 5), - qparams=hu.qparams(dtypes=torch.quint8)), + qparams=hu.qparams(dtypes=[torch.quint8, torch.qint8])), zero_point=st.sampled_from([0, 2, 5, 15, 127]), scale_A=st.sampled_from([0.001, 0.057, 0.889, 12.3]), scale_B=st.sampled_from([0.008, 0.0821, 0.67, 7]), @@ -5097,39 +5409,96 @@ def test_qnnpack_sigmoid_sweep(self): def test_qnnpack_add(self, A, zero_point, scale_A, scale_B, scale_C): with override_quantized_engine('qnnpack'): A_temp = A - A, (scale_a, zero_point_A, torch_type) = A_temp - B, (scale_b, zero_point_B, torch_type) = A_temp - A = torch.from_numpy(A) - B = torch.from_numpy(B) - - assume(scale_A // scale_C >= 2**-14) - assume(scale_A // scale_C < 2**8) - assume(scale_B // scale_C >= 2**-14) - assume(scale_B // scale_C < 2**8) - - zero_point_C = 127 - qA = torch.quantize_per_tensor(A, scale=scale_A, 
zero_point=zero_point, - dtype=torch.quint8) - qB = torch.quantize_per_tensor(B, scale=scale_B, zero_point=zero_point, - dtype=torch.quint8) + for channels_last in [True, False]: + if channels_last and len(A_temp[0].shape) != 4: + continue + A, (scale_a, zero_point_A, torch_type) = A_temp + B, (scale_b, zero_point_B, torch_type) = A_temp + A = torch.from_numpy(A) + B = torch.from_numpy(B) - # Add ground truth - C = (qA.dequantize() + qB.dequantize()).numpy() + if torch_type == torch.qint8 and not torch.backends.xnnpack.enabled: + continue - qC = _quantize(C, scale_C, zero_point_C) + if channels_last: + A = A.to(memory_format=torch.channels_last) + B = B.to(memory_format=torch.channels_last) + assume(scale_A // scale_C >= 2**-14) + assume(scale_A // scale_C < 2**8) + assume(scale_B // scale_C >= 2**-14) + assume(scale_B // scale_C < 2**8) - qC_qnnp = torch.ops.quantized.add(qA, qB, scale_C, zero_point_C) + zero_point_C = 127 + np_dtype = np.uint8 - np.testing.assert_equal(qC, qC_qnnp.int_repr(), - "Quantized addition failed.") + if torch_type == torch.qint8: + zero_point_C = 0 + np_dtype = np.int8 - Crelu = C.copy() - Crelu[C < 0] = 0 - qCrelu = torch.quantize_per_tensor(torch.from_numpy(Crelu), scale_C, - zero_point_C, dtype=torch.quint8) - qCrelu_hat = torch.ops.quantized.add_relu(qA, qB, scale=scale_C, zero_point=zero_point_C) - np.testing.assert_equal(qCrelu.int_repr().numpy(), qCrelu_hat.int_repr(), - "Quantized addition with ReLU failed.") + qA = torch.quantize_per_tensor(A, scale=scale_A, zero_point=zero_point, + dtype=torch_type) + qB = torch.quantize_per_tensor(B, scale=scale_B, zero_point=zero_point, + dtype=torch_type) + + # Add ground truth + C = (qA.dequantize() + qB.dequantize()).numpy() + + qC = _quantize(C, scale_C, zero_point_C, dtype=np_dtype) + + qC_qnnp = torch.ops.quantized.add(qA, qB, scale_C, zero_point_C) + + np.testing.assert_equal(qC, qC_qnnp.int_repr(), + "Quantized addition failed.") + + Crelu = C.copy() + Crelu[C < 0] = 0 + qCrelu = torch.quantize_per_tensor(torch.from_numpy(Crelu), scale_C, + zero_point_C, dtype=torch_type) + qCrelu_hat = torch.ops.quantized.add_relu(qA, qB, scale=scale_C, zero_point=zero_point_C) + np.testing.assert_equal(qCrelu.int_repr().numpy(), qCrelu_hat.int_repr(), + "Quantized addition with ReLU failed.") + + """Tests that quantized add works with broadcasting """ + def test_qnnpack_add_broadcast(self): + def _run_test(A, B): + qA = torch.quantize_per_tensor(A, 0.02, 0, dtype) + qB = torch.quantize_per_tensor(B, 0.04, 2, dtype) + + output_scale = 0.01 + output_zp = 1 + + # ground truth + C = qA.dequantize() + qB.dequantize() + qC = torch.quantize_per_tensor(C, output_scale, output_zp, dtype) + + # quantized + qC_hat_1 = torch.ops.quantized.add(qA, qB, output_scale, output_zp) + qC_hat_2 = torch.ops.quantized.add(qB, qA, output_scale, output_zp) + + self.assertTrue(torch.allclose(qC.dequantize(), qC_hat_1.dequantize())) + self.assertTrue(torch.allclose(qC.dequantize(), qC_hat_2.dequantize())) + + with override_quantized_engine("qnnpack"): + for dtype in (torch.qint8, torch.quint8): + if dtype == torch.qint8 and not torch.backends.xnnpack.enabled: + continue + + for channels_last in [True, False]: + # 4d + A = torch.randn(1, 3, 4, 4) + B = torch.randn(1, 1, 1, 1) + if channels_last: + A = A.to(memory_format=torch.channels_last) + B = B.to(memory_format=torch.channels_last) + _run_test(A, B) + + # 5d + C = torch.randn(1, 3, 4, 4, 4) + D = torch.randn(1, 1, 1, 1, 1) + if channels_last: + C = C.to(memory_format=torch.channels_last_3d) + D = 
D.to(memory_format=torch.channels_last_3d) + _run_test(C, D) """Tests the correctness of quantized::qnnpack_maxpool2d op.""" @given(A=hu.tensor(shapes=hu.array_shapes(4, 4, 3, 5), diff --git a/test/quantization/core/test_workflow_module.py b/test/quantization/core/test_workflow_module.py index 77fb492984c8e1..98c3fa913d015f 100644 --- a/test/quantization/core/test_workflow_module.py +++ b/test/quantization/core/test_workflow_module.py @@ -400,7 +400,7 @@ def test_zero_numel(self): x = obs(x) def _test_memoryless(self, obs_class): - obs = obs_class(memoryless=True) + obs = obs_class(averaging_constant=1) x = torch.randn((3, 3)) obs(x) params = obs.calculate_qparams() @@ -411,10 +411,10 @@ def _test_memoryless(self, obs_class): self.assertEqual(params, obs.calculate_qparams()) def test_memoryless_minmaxobserver(self): - self._test_memoryless(MinMaxObserver) + self._test_memoryless(MovingAverageMinMaxObserver) def test_memoryless_perchannelminmaxobserver(self): - self._test_memoryless(PerChannelMinMaxObserver) + self._test_memoryless(MovingAveragePerChannelMinMaxObserver) # HistogramObserver that works like it does on master class _ReferenceHistogramObserver(HistogramObserver): @@ -758,6 +758,17 @@ def test_fq_serializable_per_channel(self): for key in state_dict: self.assertEqual(state_dict[key], loaded_dict[key]) + def test_quant_min_max_override(self): + observer = default_per_channel_weight_observer + # test no override + fq_module = FakeQuantize(observer) + self.assertEqual(fq_module.activation_post_process.quant_min, -128) + self.assertEqual(fq_module.activation_post_process.quant_max, 127) + # test quant_min/quant_max override + fq_module = FakeQuantize(observer, quant_min=0, quant_max=127) + self.assertEqual(fq_module.activation_post_process.quant_min, 0) + self.assertEqual(fq_module.activation_post_process.quant_max, 127) + def _get_buffer_ids(module): """ Object addresses stay constant if and only if all modifications are in-place diff --git a/test/quantization/eager/test_numeric_suite_eager.py b/test/quantization/eager/test_numeric_suite_eager.py index 3bf969395c517c..3714a1f28c67b4 100644 --- a/test/quantization/eager/test_numeric_suite_eager.py +++ b/test/quantization/eager/test_numeric_suite_eager.py @@ -19,6 +19,8 @@ compare_model_outputs, compare_model_stub, compare_weights, + prepare_model_outputs, + get_matching_activations, ) from torch.testing._internal.common_quantization import ( AnnotatedConvBnReLUModel, @@ -30,6 +32,7 @@ QuantizationTestCase, SingleLayerLinearDynamicModel, test_only_eval_fn, + skip_if_no_torchvision, ) from torch.testing._internal.common_quantized import override_qengines @@ -421,14 +424,12 @@ def test_compare_model_outputs_functional_static(self): q_model(self.img_data_2d[0][0]) q_model = convert(q_model) act_compare_dict = compare_model_outputs(model, q_model, self.img_data_2d[0][0]) - self.assertEqual(len(act_compare_dict), 7) + self.assertEqual(len(act_compare_dict), 5) expected_act_compare_dict_keys = { "mycat.stats", "myadd.stats", "mymul.stats", "myadd_relu.stats", - "my_scalar_add.stats", - "my_scalar_mul.stats", "quant.stats", } self.assertTrue(act_compare_dict.keys() == expected_act_compare_dict_keys) @@ -534,3 +535,50 @@ def test_shadow_logger(self): self.assertEqual(len(logger.stats["float"]), 2) self.assertEqual(len(logger.stats["quantized"]), 2) + + @skip_if_no_torchvision + def _test_vision_model(self, float_model): + float_model.to('cpu') + float_model.eval() + float_model.fuse_model() + float_model.qconfig = 
torch.quantization.default_qconfig + img_data = [(torch.rand(2, 3, 224, 224, dtype=torch.float), torch.randint(0, 1, (2,), dtype=torch.long)) for _ in range(2)] + qmodel = quantize(float_model, torch.quantization.default_eval_fn, [img_data], inplace=False) + + wt_compare_dict = compare_weights(float_model.state_dict(), qmodel.state_dict()) + + def compute_error(x, y): + Ps = torch.norm(x) + Pn = torch.norm(x - y) + return 20 * torch.log10(Ps / Pn) + + data = img_data[0][0] + # Take in floating point and quantized model as well as input data, and returns a dict, with keys + # corresponding to the quantized module names and each entry being a dictionary with two keys 'float' and + # 'quantized', containing the activations of floating point and quantized model at matching locations. + act_compare_dict = compare_model_outputs(float_model, qmodel, data) + + + for key in act_compare_dict: + compute_error(act_compare_dict[key]['float'][0], act_compare_dict[key]['quantized'][0].dequantize()) + + prepare_model_outputs(float_model, qmodel) + + for data in img_data: + float_model(data[0]) + qmodel(data[0]) + + # Find the matching activation between floating point and quantized modules, and return a dict with key + # corresponding to quantized module names and each entry being a dictionary with two keys 'float' + # and 'quantized', containing the matching floating point and quantized activations logged by the logger + act_compare_dict = get_matching_activations(float_model, qmodel) + + @skip_if_no_torchvision + def test_mobilenet_v2(self): + from torchvision.models.quantization import mobilenet_v2 + self._test_vision_model(mobilenet_v2(pretrained=True, quantize=False)) + + @skip_if_no_torchvision + def test_mobilenet_v3(self): + from torchvision.models.quantization import mobilenet_v3_large + self._test_vision_model(mobilenet_v3_large(pretrained=True, quantize=False)) diff --git a/test/quantization/eager/test_quantize_eager_ptq.py b/test/quantization/eager/test_quantize_eager_ptq.py index a8ca0eb3353e2c..ec287cd89fa111 100644 --- a/test/quantization/eager/test_quantize_eager_ptq.py +++ b/test/quantization/eager/test_quantize_eager_ptq.py @@ -3,7 +3,6 @@ import torch import torch.nn as nn import torch.nn.quantized as nnq -import torch.nn.quantized._reference as nnqr from torch.nn.utils.rnn import PackedSequence from torch.ao.quantization import ( quantize, @@ -140,17 +139,7 @@ def forward(self, x): ref_m = prepare(original_ref_m) ref_m(data) - reference_module_mapping = { - QuantStub: nnq.Quantize, - DeQuantStub: nnq.DeQuantize, - nn.Conv1d: nnqr.Conv1d, - nn.Conv2d: nnqr.Conv2d, - nn.Conv3d: nnqr.Conv3d, - nn.ConvTranspose1d: nnqr.ConvTranspose1d, - nn.ConvTranspose2d: nnqr.ConvTranspose2d, - nn.ConvTranspose3d: nnqr.ConvTranspose3d, - } - ref_m = convert(ref_m, mapping=reference_module_mapping) + ref_m = convert(ref_m, is_reference=True) ref_res = ref_m(data) self.assertEqual(res, ref_res) @@ -202,6 +191,85 @@ def test_conv_transpose_3d(self): (16, 1, 10, 10, 10) ) + def test_linear(self): + self._test_reference_module_impl( + nn.Linear, + nnq.Linear, + {'in_features': 5, 'out_features': 10}, + (16, 5) + ) + + @override_qengines + def test_int16_reference_module(self): + + class RefM(torch.nn.Module): + def __init__(self): + super().__init__() + self.conv = nn.ConvTranspose2d(1, 1, 1) + self.quant1 = QuantStub() + self.dequant1 = DeQuantStub() + self.quant2 = QuantStub() + self.dequant2 = DeQuantStub() + + def forward(self, x): + x = self.quant1(x) + x = self.dequant1(x) + x = self.conv(x) + x = 
self.quant2(x) + x = self.dequant2(x) + return x + + + input_size = (16, 1, 10, 10) + data = torch.randn(*input_size, dtype=torch.float) + + original_ref_m = RefM() + rand_w = torch.randn_like(original_ref_m.conv.weight) + rand_b = torch.randn_like(original_ref_m.conv.bias) + original_ref_m.conv.weight = torch.nn.Parameter(rand_w, requires_grad=False) + original_ref_m.conv.bias = torch.nn.Parameter(rand_b, requires_grad=False) + + qengine = torch.backends.quantized.engine + if qengine not in supported_qengines: + return + from torch.ao.quantization.observer import MovingAverageMinMaxObserver + + weight_obs = MovingAverageMinMaxObserver.with_args( + dtype=torch.qint32, + # set qmin and qmax to represent qint16 + quant_min=-1 * (2 ** 15), + quant_max=(2 ** 15) - 1, + qscheme=torch.per_tensor_symmetric, + ) + act_obs = MovingAverageMinMaxObserver.with_args( + dtype=torch.qint32, + quant_min=-1 * (2 ** 15), + quant_max=(2 ** 15) - 1, + ) + custom_qconfig = QConfig(activation=act_obs, weight=weight_obs) + + # quantize the reference model + original_ref_m.eval() + original_ref_m.qconfig = custom_qconfig + + ref_m = prepare(original_ref_m) + # calibration + ref_m(torch.randn(*input_size, dtype=torch.float)) + + ref_m = convert(ref_m, is_reference=True) + + myobs = MovingAverageMinMaxObserver(averaging_constant=0.5, + dtype=torch.qint32, + # set qmin and qmax to represent qint16 + quant_min=-1 * (2 ** 15), + quant_max=(2 ** 15) - 1, + qscheme=torch.per_tensor_symmetric, + ) + result = myobs(rand_w) + qparams = myobs.calculate_qparams() + self.assertEqual(ref_m.conv.weight_scale, qparams[0]) + + def _test_activation_op_impl( self, float_module_class, quantized_module_class, extra_module_kwargs): """ Implementation for testing common activation ops like leaky relu @@ -1391,7 +1459,8 @@ def export_to_onnx(model, input, input_names): model = torch.jit.load(buf) f = io.BytesIO() torch.onnx.export(model, input, f, input_names=input_names, - operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK) + operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK, + opset_version=9) onnx_model = export_to_onnx(model, data, input_names) @skipIfNoFBGEMM diff --git a/test/quantization/eager/test_quantize_eager_qat.py b/test/quantization/eager/test_quantize_eager_qat.py index 02a3175f4c80c7..984e87dacbbcd9 100644 --- a/test/quantization/eager/test_quantize_eager_qat.py +++ b/test/quantization/eager/test_quantize_eager_qat.py @@ -23,6 +23,7 @@ default_qconfig, default_qat_qconfig, default_embedding_qat_qconfig, + default_symmetric_qnnpack_qat_qconfig, get_default_qat_qconfig, FixedQParamsFakeQuantize, FusedMovingAvgObsFakeQuantize, @@ -39,6 +40,7 @@ ManualDropoutQATModel, ManualLinearDynamicQATModel, ManualConvLinearQATModel, + ManualConvLinearSymmQATModel, ManualEmbeddingBagLinear, TwoLayerLinearModel, test_only_eval_fn, @@ -51,6 +53,8 @@ override_qengines, ) +from torch.testing._internal.common_utils import skipIfNoXNNPACK + from hypothesis import given from hypothesis import strategies as st import torch.testing._internal.hypothesis_utils as hu @@ -340,11 +344,45 @@ def checkQuantized(model): model = quantize_qat(model, test_only_train_fn, [self.img_data_2d_train]) checkQuantized(model) + @skipIfNoXNNPACK + def test_conv_linear_symm(self): + r"""Same as test_conv_linear but with Symmetric quantization. 
+ Supported only with qengine=qnnpack, which uses symmetric + kernels from xnnpack library.""" + for qengine in supported_qengines: + if qengine != 'qnnpack': + continue + with override_quantized_engine(qengine): + model = ManualConvLinearSymmQATModel() + + model = prepare_qat(model) + self.checkObservers(model) + + test_only_train_fn(model, self.img_data_2d_train) + model = convert(model) + + def checkQuantized(model): + self.assertEqual(type(model.conv), nnq.Conv2d) + self.assertEqual(type(model.fc1), nnq.Linear) + self.assertEqual(type(model.fc2), nnq.Linear) + test_only_eval_fn(model, self.img_data_2d) + self.checkScriptable(model, self.img_data_2d) + self.checkNoQconfig(model) + + checkQuantized(model) + + model = ManualConvLinearSymmQATModel() + model = quantize_qat(model, test_only_train_fn, [self.img_data_2d_train]) + checkQuantized(model) + def test_dynamic_qat_linear(self): for qengine in supported_qengines: with override_quantized_engine(qengine): # Dynamic QAT without memoryless observers should fail - with self.assertRaisesRegex(ValueError, "Dynamic QAT requires a memoryless observer"): + with self.assertRaisesRegex(ValueError, + "Dynamic QAT requires a memoryless observer." + + "This means a MovingAverage observer with averaging constant equal to 1" + ): model = ManualLinearDynamicQATModel(default_qat_qconfig) model = prepare_qat(model, mapping={torch.nn.Linear: nnqatd.Linear}) @@ -1006,6 +1044,29 @@ def test_linear_bn_numerics(self): r2 = m(data) self.assertTrue(torch.allclose(r1, r2)) + @skipIfNoXNNPACK + @override_qengines + def test_linear_bn_symm_numerics(self): + qengine = torch.backends.quantized.engine + if qengine != "qnnpack": + return # Only qnnpack support symmetric quantization + m_ref = nn.Sequential( + nn.Linear(4, 4), + nn.BatchNorm1d(4), + ) + m_ref_copy = copy.deepcopy(m_ref) + m_ref_copy = torch.ao.quantization.fuse_modules_qat(m_ref_copy, [['0', '1']]) + qconfig = default_symmetric_qnnpack_qat_qconfig + m_ref_copy[0].qconfig = qconfig + m = nniqat.LinearBn1d.from_float(m_ref_copy[0]) + + # without fake_quants, fused QAT module should match fp32 module + m.apply(torch.quantization.disable_fake_quant) + data = torch.randn(4, 4) + r1 = m_ref(data) + r2 = m(data) + self.assertTrue(torch.allclose(r1, r2)) + @override_qengines def test_linear_bn_workflow(self): qengine = torch.backends.quantized.engine diff --git a/test/quantization/fx/test_numeric_suite_fx.py b/test/quantization/fx/test_numeric_suite_fx.py index fe1aad6c7771c3..37e737c94cfbf6 100644 --- a/test/quantization/fx/test_numeric_suite_fx.py +++ b/test/quantization/fx/test_numeric_suite_fx.py @@ -71,6 +71,8 @@ extract_shadow_logger_info, extend_logger_results_with_comparison, ) +from torch.ao.quantization.fx.backend_config import get_native_backend_config_dict +from torch.ao.quantization.fx.backend_config.utils import get_pattern_to_quantize_handlers # Note: these models are not for use outside of this file. 
While it's good @@ -274,7 +276,19 @@ def _wrapped_sigmoid(x): def _wrapped_linear(x, w, b): return F.linear(x, w, b) - +def get_all_quant_patterns(): + """ we are in the process to migrate the frontend of fx graph mode quant + to use backend_config_dict, so some of the patterns are moved to backend_config_dict + this function will include these patterns so that we can still have all the patterns + """ + # TODO: we can remove this call, and get all patterns from backend_config_dict in + # the future when the frontend refactor is done in fx graph mode quantization + all_quant_patterns = get_default_quant_patterns() + # some of the patterns are moved to (native) backend_config_dict so we need to + # add them back here + for pattern, quantize_handler in get_pattern_to_quantize_handlers(get_native_backend_config_dict()).items(): + all_quant_patterns[pattern] = quantize_handler + return all_quant_patterns class TestFXGraphMatcher(QuantizationTestCase): @@ -542,7 +556,6 @@ def forward(self, x): self.assert_types_for_matched_subgraph_pairs( results, expected_types, m1p, m2p) - def test_op_relationship_mapping(self): """ Tests that the mapping of op relationships is complete. @@ -620,7 +633,7 @@ def _op_is_unmatchable(op): op in METHS_UNMATCHABLE ) - default_quant_patterns = get_default_quant_patterns() + default_quant_patterns = get_all_quant_patterns() for pattern, qhandler_cls in default_quant_patterns.items(): base_op = None if isinstance(pattern, tuple): @@ -664,9 +677,6 @@ def _op_is_unmatchable(op): # RNNDynamicQuantizeHandler pass elif qhandler_cls == qp.DefaultNodeQuantizeHandler: - # torch.sum does not have quantized equivalents - if base_op == torch.sum: - continue self.assertTrue( _op_in_base_sets_of_related_ops(base_op), f"{base_op} not in sets of related ops") @@ -682,8 +692,23 @@ def _op_is_unmatchable(op): _op_in_base_sets_of_related_ops(base_op), f"{base_op} not in sets of related ops") else: - raise AssertionError( - f"handing for {qhandler_cls} not implemented") + # torch.sum does not have quantized equivalents + if base_op in [ + torch.sum, + nn.GRUCell, + nn.GRU, + nn.LSTMCell, + nn.RNNCell, + ]: + continue + if isinstance(base_op, tuple): + # skip fusion patterns + continue + # didn't match explicit quantize handler class, we can check if the + # operator is in the related op set directly + if not (_op_in_base_sets_of_related_ops(base_op) or _op_is_unmatchable(base_op)): + raise AssertionError( + f"handling for {qhandler_cls} for op {base_op} not implemented") @skipIfNoFBGEMM def test_user_defined_function(self): @@ -1534,7 +1559,7 @@ def test_op_io_dtype_coverage(self): # 4. go through the ops mapped to each QuantizeHandler type, and verify # correctness. 
- default_quant_patterns = get_default_quant_patterns() + default_quant_patterns = get_all_quant_patterns() for pattern, qhandler_cls in default_quant_patterns.items(): base_op = None if isinstance(pattern, tuple): @@ -1591,8 +1616,26 @@ def test_op_io_dtype_coverage(self): # embedding shadowing is not implemented, for now continue else: - raise AssertionError( - f"handing for {qhandler_cls} not implemented") + if ( + base_op in FUNS_UNMATCHABLE or + base_op in MODS_UNMATCHABLE or + base_op in METHS_UNMATCHABLE + ): + continue + if qhandler_cls(None, {}).is_general_tensor_value_op(): + self.assertTrue( + (base_op in FUNS_IO_TYPE_FP32_OR_INT8) or + (base_op in MODS_IO_TYPE_FP32_OR_INT8) or + (base_op in METHS_IO_TYPE_FP32_OR_INT8), + f"missing IO type handling for {base_op} using {qhandler_cls}") + else: + self.assertTrue( + (base_op in FUNS_IO_TYPE_FP32_OR_INT8) or + (base_op in MODS_IO_TYPE_FP32_OR_INT8) or + (base_op in METHS_IO_TYPE_FP32_OR_INT8) or + (base_op in FUNS_IO_TYPE_FP32) or + (base_op in MODS_IO_TYPE_FP32), + f"missing IO type handling for {base_op} using {qhandler_cls}") @skipIfNoFBGEMM def test_user_defined_function(self): diff --git a/test/quantization/fx/test_quantize_fx.py b/test/quantization/fx/test_quantize_fx.py index 484d53a146424b..56d2194bd0c7ca 100644 --- a/test/quantization/fx/test_quantize_fx.py +++ b/test/quantization/fx/test_quantize_fx.py @@ -80,6 +80,8 @@ get_default_output_activation_post_process_map ) +from torch.ao.quantization.fx.utils import NodeInfo + from torch.ao.quantization.fake_quantize import ( default_affine_fixed_qparams_fake_quant, default_symmetric_fixed_qparams_fake_quant, @@ -133,7 +135,9 @@ import operator import unittest import io -from typing import Callable, Optional +from typing import Callable, Optional, List + + TEST_WITH_ROCM = os.getenv('PYTORCH_TEST_WITH_ROCM', '0') == '1' @@ -596,6 +600,77 @@ def conv_bn_res_relu_extra_inputs_getter(pattern): if node.op == "call_module" and type(named_modules[node.target]) == torch.nn.Conv2d: self.assertTrue(len(node.args) == 2), "Expecting the fused op to have two arguments" + def test_fusion_pattern_with_matchallnode(self): + """This test checks that the node matched by MatchAllNode will be regarded as an input + instead of a module to be fused. For instance, we have two patterns: + (nn.ReLU, (torch.add, MatchAllNode, nn.Conv2d)) + (nn.ReLU, nn.Conv2d) + And we want to fuse the following model + Conv2d -> ReLU + + Conv2d ------ Add -> ReLU + ReLU in the first row is matched as MatchAllNode in the residual pattern. But it won't be + fused as part of that pattern. It needs to be properly fused with the upstream Conv2d. 
+ """ + + class M(torch.nn.Module): + def __init__(self): + super().__init__() + self.conv1 = torch.nn.Conv2d(3, 3, 3) + self.relu1 = torch.nn.ReLU() + self.conv2 = torch.nn.Conv2d(3, 3, 3) + self.relu2 = torch.nn.ReLU() + + def forward(self, x): + y = self.conv1(x) + y = self.relu1(y) + + x = self.conv2(x) + x = torch.add(x, y) + x = self.relu2(x) + return x + + m = M().eval() + + def fuse_conv_relu(is_qat, relu, conv): + return conv + + def fuse_conv_res_relu(is_qat, relu, add_pattern): + _, conv, _ = add_pattern + return conv + + def conv_res_relu_root_node_getter(pattern): + relu, (_, conv, _) = pattern + return conv + + def conv_res_relu_extra_inputs_getter(pattern): + relu, (_, _, extra_input) = pattern + return [extra_input] + + conv_relu_config = { + "pattern": (nn.ReLU, nn.Conv2d), + "fuser_method": fuse_conv_relu, + } + conv_res_relu_config = { + "pattern": (nn.ReLU, (torch.add, nn.Conv2d, MatchAllNode)), + "fuser_method": fuse_conv_res_relu, + "root_node_getter": conv_res_relu_root_node_getter, + "extra_inputs_getter": conv_res_relu_extra_inputs_getter, + } + + backend_config_dict = { + "configs": [ + conv_relu_config, + conv_res_relu_config, + ], + } + m = fuse_fx(m, backend_config_dict=backend_config_dict) + self.assertEqual(type(m.conv1), torch.nn.Conv2d) + self.assertEqual(type(m.conv2), torch.nn.Conv2d) + # check relu are gone since we replaced the both patterns to conv + self.assertFalse(hasattr(m, "relu1")) + self.assertFalse(hasattr(m, "relu2")) + + @skipIfNoFBGEMM class TestQuantizeFx(QuantizationTestCase): def test_pattern_match(self): @@ -947,7 +1022,7 @@ def forward(self, x): qconfig_dict = {'': qconfig} prepared = prepare_fx(m, qconfig_dict) quantized = convert_fx(prepared, is_reference=True) - qparams = (quantized._input_scale_0, quantized._input_zero_point_0) + qparams = (quantized._scale_0, quantized._zero_point_0) weight_obs = qconfig.weight() weight_obs(quantized.weight) # Get the actual value to avoid tensor size mismatch error, torch.Size([]) vs torch.Size([1]) @@ -955,6 +1030,8 @@ def forward(self, x): self.assertEqual(qparams, ref_qparams) def test_conv_bn_relu(self): + """ Tests fusion and quantization for "Conv - Bn" and "Conv - Bn - ReLU" + """ convs = { 1: nn.Conv1d, 2: nn.Conv2d, @@ -995,8 +1072,7 @@ def forward(self, x): x = self.dequant(x) return x - # TODO: add 1d support - options = itertools.product([2, 3], [True, False], self.static_quant_types) + options = itertools.product([1, 2, 3], [True, False], self.static_quant_types) for dim, has_relu, quant_type in options: expected_node = ns.call_module( quantized_conv_relus[dim] if has_relu @@ -1033,11 +1109,57 @@ def forward(self, x): fuse_modules(m_eager, fuse_list, inplace=True) m_eager.qconfig = qconfig m_eager = prepare_fn(m_eager) + prepared_fx = result_dict["prepared"] + m_eager(*self.img_data_dict[dim][0]) m_eager = convert(m_eager) result_eager = m_eager(*self.img_data_dict[dim][0]) self.assertEqual(result, result_eager) + def test_linear_bn(self): + class M(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(4, 4) + self.bn = nn.BatchNorm1d(4) + self.quant = QuantStub() + self.dequant = DeQuantStub() + + def forward(self, x): + x = self.quant(x) + x = self.linear(x) + x = self.bn(x) + x = self.dequant(x) + return x + + data = (torch.randn(4, 4),) + for quant_type in self.static_quant_types: + expected_node = ns.call_module(nnq.Linear) + m = M() + m_eager = copy.deepcopy(m) + result_dict = self.checkGraphModeFxOp(m, data, quant_type, 
expected_node=expected_node) + result = result_dict["quantized_output"] + + # check numerics vs eager mode + fuse_list = ["linear", "bn"] + qengine = torch.backends.quantized.engine + if quant_type == QuantType.STATIC: + m_eager.eval() + qconfig = get_default_qconfig(qengine) + prepare_fn = prepare + fuse_modules(m_eager, fuse_list, inplace=True) + else: + m_eager.train() + qconfig = get_default_qat_qconfig(qengine) + prepare_fn = prepare_qat + fuse_modules_qat(m_eager, fuse_list, inplace=True) + m_eager.qconfig = qconfig + m_eager = prepare_fn(m_eager) + m_eager(*data) + m_eager = convert(m_eager) + result_eager = m_eager(*data) + self.assertEqual(result, result_eager) + @skipIfNoFBGEMM def test_dynamic_quant_fp16(self): class Linear(torch.nn.Module): @@ -2124,6 +2246,88 @@ def forward(self, x): ref_res = ref_m(data) self.assertEqual(res, ref_res) + @skipIfNoFBGEMM + def test_custom_module_class_input_has_multiple_users(self): + """ Tests that the flow still works when the input of custom module + has multiple users + """ + class CustomModule(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear = torch.nn.Linear(3, 3) + + def forward(self, x): + return self.linear(x) + + class ObservedCustomModule(torch.nn.Module): + def __init__(self, linear): + super().__init__() + self.linear = linear + + def forward(self, x): + return self.linear(x) + + @classmethod + def from_float(cls, float_module): + assert hasattr(float_module, 'qconfig') + observed = cls(float_module.linear) + observed.qconfig = float_module.qconfig + return observed + + class StaticQuantCustomModule(torch.nn.Module): + def __init__(self, linear): + super().__init__() + self.linear = linear + + def forward(self, x): + return self.linear(x) + + @classmethod + def from_observed(cls, observed_module): + assert hasattr(observed_module, 'qconfig') + assert hasattr(observed_module, 'activation_post_process') + observed_module.linear.activation_post_process = \ + observed_module.activation_post_process + quantized = cls(nnq.Linear.from_float(observed_module.linear)) + return quantized + + class M(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear = torch.nn.Linear(3, 3) + self.custom = CustomModule() + + def forward(self, x0): + x1 = self.custom(x0) + x2 = self.linear(x0) + return x1 + x2 + + prepare_custom_config_dict = { + "float_to_observed_custom_module_class": { + "static": { + CustomModule: ObservedCustomModule + } + } + } + convert_custom_config_dict = { + "observed_to_quantized_custom_module_class": { + "static": { + ObservedCustomModule: StaticQuantCustomModule + } + } + } + m = M().eval() + m = prepare_fx( + m, + {"": default_qconfig}, + prepare_custom_config_dict=prepare_custom_config_dict) + # make sure it works + m = convert_fx( + m, + convert_custom_config_dict=convert_custom_config_dict) + # make sure it runs + m(torch.randn(3, 3)) + @skipIfNoFBGEMM def test_non_traceable_module(self): class NonTraceable(torch.nn.Module): @@ -2425,12 +2629,13 @@ def forward(self, x): self.assertTrue( set(scripted_keys) == set(non_packed_weight_keys), "Expected the scripted model to preserve the state_dict for non-packed weight attributes") + # TODO: probably don't want to hardcode the attribute names, since they are generated for attr_name in [ "mods1_0_input_scale_0", "mods1_0_input_zero_point_0", - "mods1_0_scale_0", "mods1_0_zero_point_0", - "mods1_1_scale_0", "mods1_1_zero_point_0", - "mods2_scale_0", "mods2_zero_point_0"]: - self.assertTrue(hasattr(m, attr_name)) + "mods1_0_scale_1", 
"mods1_0_zero_point_1", + "mods1_1_scale_1", "mods1_1_zero_point_1", + "mods2_scale_1", "mods2_zero_point_1"]: + self.assertTrue(hasattr(m, attr_name), attr_name + " not found.") @skipIfNoFBGEMM def test_packed_weight_fused_op(self): @@ -2543,6 +2748,234 @@ def forward(self, x): mp(torch.rand(4, 4, 4, 4)) mc = convert_fx(mp) + class _NonReferenceTestModel(nn.Module): + def __init__(self, func, lin_in, lin_out): + super().__init__() + self.conv1 = nn.Conv2d(3, 6, 5) + self.pool = nn.MaxPool2d(2, 2) + self.lin = nn.Linear(lin_in, lin_out) + self.func = func + + def forward(self, x, y, z): + x = self.pool(F.relu(self.conv1(x))) + x = torch.flatten(x, 1) + x = self.func(x, y, z) + x = self.lin(x) + return x + + # This function looks at the node specified by the NodeInfo in the key of + # node_info_to_non_tensor_args and checks that the args at specified indices + # are not observed (since they are non tensors). If the args at those indices + # are a tuple/list (which do not show up as nodes) the function checks the + # individual elements of the tuple/list recursively. + def _check_not_observed(self, model, node_info_to_non_tensor_args): + + # this is a helper function (for easier recursion) that checks whether + # arg_node is observed + def _check_node_not_observed(model, arg_node, node): + if isinstance(arg_node, tuple) or isinstance(arg_node, list): + for new_node in arg_node: + _check_node_not_observed(model, new_node, node) + elif arg_node.op == "call_module": + self.assertTrue( + not is_activation_post_process(getattr(model, arg_node.target)), + "Arg: {0} of node: {1} is observed but is not a float tensor".format( + arg_node, node + ), + ) + + for node in model.graph.nodes: + indices = node_info_to_non_tensor_args.get( + NodeInfo(node.op, node.target), [] + ) + for index in indices: + if index < len(node.args): + arg_node = node.args[index] + _check_node_not_observed(model, arg_node, node) + + # This test checks that the model gets prepared correct, doesn't have observers + # on specific ops (see _check_not_observed) and that the prepared model runs + def _test_dtype_propagation(self, model, node_info_to_non_tensor_args, *args): + model.eval() + qconfig_dict = {"": torch.ao.quantization.get_default_qconfig("fbgemm")} + prepared_model = prepare_fx(model, qconfig_dict) + self._check_not_observed(prepared_model, node_info_to_non_tensor_args) + prepared_model(*args) + + def test_masked_fill_nontensor_args_not_observed(self): + def func(x, y, z): + return x.masked_fill(y, z) + + model = self._NonReferenceTestModel(func, 1176, 1) + args = [torch.randn(5, 3, 32, 32), torch.randn(1176) > 0, 0.1] + node_info_to_non_tensor_args = {NodeInfo("call_method", "masked_fill"): [1, 2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_permute_nontensor_args_not_observed(self): + def func(x, y, z): + return x.permute(y, z) + + model = self._NonReferenceTestModel(func, 1176, 1) + args = [torch.randn(5, 3, 32, 32), 0, 1] + node_info_to_non_tensor_args = {NodeInfo("call_method", "permute"): [1, 2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_repeat_nontensor_args_not_observed(self): + def func(x, y, z): + return x.repeat(y, z) + + model = self._NonReferenceTestModel(func, 1176, 1) + args = [torch.randn(5, 3, 32, 32), 2, 1] + node_info_to_non_tensor_args = {NodeInfo("call_method", "repeat"): [1, 2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_reshape_nontensor_args_not_observed(self): + 
def func(x, y, z): + return x.reshape(-1, y) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), 5, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "reshape"): [2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_size_nontensor_args_not_observed(self): + def func(x, y, z): + return x.reshape((-1, x.size(y))) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), 0, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "size"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_transpose_nontensor_args_not_observed(self): + def func(x, y, z): + return x.transpose(y, z) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), 0, 1] + node_info_to_non_tensor_args = {NodeInfo("call_method", "transpose"): [1, 2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_torch_transpose_nontensor_args_not_observed(self): + # TODO: make torch.transpose traceable by fx when using + # variable nontensor arguments + # func = lambda x, y, z: torch.transpose(x, y, z) # error + def func(x, y, z): + return torch.transpose(x, 0, 1) + + model = self._NonReferenceTestModel(func, 5, 1) + node_info_to_non_tensor_args = { + NodeInfo("call_method", torch.transpose): [1, 2] + } + args = [torch.randn(5, 3, 32, 32), 0, 1] + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_unsqueeze_nontensor_args_not_observed(self): + def func(x, y, z): + return x.unsqueeze(y) + + model = self._NonReferenceTestModel(func, 1176, 1) + args = [torch.randn(5, 3, 32, 32), 1, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "unsqueeze"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_unsqueeze__nontensor_args_not_observed(self): + def func(x, y, z): + return x.unsqueeze_(y) + + model = self._NonReferenceTestModel(func, 1176, 1) + args = [torch.randn(5, 3, 32, 32), 1, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "unsqueeze_"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_torch_unsqueeze_nontensor_args_not_observed(self): + # TODO: make torch.unsqueeze scriptable by fx when using + # variable nontensor arguments + # func = lambda x, y, z: torch.unsqueeze(x, y) # error + def func(x, y, z): + return torch.unsqueeze(x, 1) + + model = self._NonReferenceTestModel(func, 1176, 1) + args = [torch.randn(5, 3, 32, 32), 1, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", torch.unsqueeze): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_view_nontensor_args_not_observed(self): + def func(x, y, z): + return x.view(-1, y) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), 5, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "view"): [2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_list_args(self): + def func(x, y, z): + return x.reshape(y) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), [-1, 5], None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "reshape"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_split_list_args(self): + def 
func(x, y, z): + return x.reshape([y, z]) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), -1, 5] + node_info_to_non_tensor_args = {NodeInfo("call_method", "reshape"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_tuple_args(self): + def func(x, y, z): + return x.reshape(y) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), (-1, 5), None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "reshape"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_split_tuple_args(self): + def func(x, y, z): + return x.reshape((y, z)) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), -1, 5] + node_info_to_non_tensor_args = {NodeInfo("call_method", "reshape"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_dict_args(self): + def func(x, y, z): + return x.transpose(y["first"], y["second"]) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), {"first": 0, "second": 1}, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "transpose"): [1, 2]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_dict_tuple_args(self): + class reshape_module(nn.Module): + def __init__(self): + super().__init__() + + def forward(self, x, y, z): + return x.reshape(y["shape"]) + + model = self._NonReferenceTestModel(reshape_module(), 5, 1) + args = [torch.randn(5, 3, 32, 32), {"shape": (-1, 5)}, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "reshape"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + + def test_propagate_dtypes_for_known_nodes_dict_split_tuple_args(self): + def func(x, y, z): + return x.reshape((y["first"], y["second"])) + + model = self._NonReferenceTestModel(func, 5, 1) + args = [torch.randn(5, 3, 32, 32), {"first": -1, "second": 5}, None] + node_info_to_non_tensor_args = {NodeInfo("call_method", "transpose"): [1]} + self._test_dtype_propagation(model, node_info_to_non_tensor_args, *args) + def test_assert_on_size_after_quant_layer(self): """ Verifies that calculating a size of a quantized tensor works @@ -2817,11 +3250,12 @@ def forward(self, x): m = convert_fx(m) keys = m.state_dict().keys() m(torch.randn(5, 5)) + # TODO: probably don't want to hardcode the attribute names, since they are generated for attr_name in [ "mods1_0_input_scale_0", "mods1_0_input_zero_point_0", "mods1_0_scale_0", "mods1_0_zero_point_0", "mods1_1_scale_0", "mods1_1_zero_point_0"]: - self.assertTrue(hasattr(m, attr_name)) + self.assertTrue(hasattr(m, attr_name), attr_name + " not found.") def test_no_obs_between_unmatched_node_and_copy_node(self): """ @@ -3153,7 +3587,6 @@ def forward(self, x): def test_preserve_tuple(self): """ Test tuple input type is preserved """ - from typing import List class LSTM(nn.Module): def __init__(self): @@ -3231,23 +3664,101 @@ def forward(self, x): x = self.relu(x) return x - model = M().eval() - dynamic_quantized_ops = { float16_dynamic_qconfig: torch.ops.quantized.linear_relu_dynamic_fp16, default_dynamic_qconfig: torch.ops.quantized.linear_relu_dynamic } - for config in [float16_dynamic_qconfig, default_dynamic_qconfig]: - qconfig = { - "": config + for qconfig in 
[float16_dynamic_qconfig, default_dynamic_qconfig]: + model = M().eval() + qconfig_dict = { + "": qconfig } - m = prepare_fx(model, qconfig) + m = prepare_fx(model, qconfig_dict) m = convert_fx(m) m(torch.rand(5, 5)) node_list = [ ns.call_module(nniqd.LinearReLU), ns.call_module(nniqd.LinearReLU), - ns.call_function(dynamic_quantized_ops[config]), + ns.call_function(dynamic_quantized_ops[qconfig]), + ] + self.checkGraphModuleNodes(m, expected_node_list=node_list) + + @skipIfNoFBGEMM + def test_dynamic_with_fusion_multiple_uses(self): + """ + Tests that dynamic quantization APIs work with Linear + Relu fusion + """ + class LinearRelu(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear = torch.nn.Linear(5, 5) + self.relu = torch.nn.ReLU() + + def forward(self, x): + x = self.linear(x) + return self.relu(x) + + class M(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear_relu = LinearRelu() + + def forward(self, x): + x = self.linear_relu(x) + x = self.linear_relu(x) + return x + + for qconfig in [float16_dynamic_qconfig, default_dynamic_qconfig]: + model = M().eval() + qconfig_dict = { + "": qconfig + } + m = prepare_fx(model, qconfig_dict) + m = convert_fx(m) + m(torch.rand(5, 5)) + node_list = [ + ns.call_module(nniqd.LinearReLU), + ns.call_module(nniqd.LinearReLU), + ] + self.checkGraphModuleNodes(m, expected_node_list=node_list) + + @skipIfNoFBGEMM + def test_dynamic_linear_input_multiple_use(self): + """ + Tests input for dynamic linear being used by multiple ops + """ + class LinearRelu(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear = torch.nn.Linear(5, 5) + self.relu = torch.nn.ReLU() + + def forward(self, x): + x = self.linear(x) + return self.relu(x) + + class M(torch.nn.Module): + def __init__(self): + super().__init__() + self.mod1 = LinearRelu() + self.mod2 = LinearRelu() + + def forward(self, x): + y1 = self.mod1(x) + y2 = self.mod2(x) + return y1 + y2 + + for qconfig in [float16_dynamic_qconfig, default_dynamic_qconfig]: + model = M().eval() + qconfig_dict = { + "": qconfig + } + m = prepare_fx(model, qconfig_dict) + m = convert_fx(m) + m(torch.rand(5, 5, 5)) + node_list = [ + ns.call_module(nniqd.LinearReLU), + ns.call_module(nniqd.LinearReLU), ] self.checkGraphModuleNodes(m, expected_node_list=node_list) @@ -3499,6 +4010,7 @@ def forward(self, x): ns.call_function(torch.quantize_per_tensor): 1, ns.call_function(torch.ops.quantized.linear): 2, ns.call_function(torch.ops.quantized.add): 1, + ns.call_function(torch.mul): 1, ns.call_method("dequantize"): 1 } order_check = [ @@ -3507,6 +4019,7 @@ def forward(self, x): ns.call_function(torch.ops.quantized.linear), ns.call_function(torch.ops.quantized.add), ns.call_method("dequantize"), + ns.call_function(torch.mul), ns.call_module(nn.Linear), ] @@ -3520,19 +4033,6 @@ def forward(self, x): def _assertFixedQParamsFakeQuantizeEqual(self, fq1, fq2): self.assertEqual(fq1()._observer_ctr, fq2()._observer_ctr) - def test_fixed_qparams_patterns(self): - hard_sigmoid_keys = [torch.nn.Hardsigmoid, torch.nn.functional.hardsigmoid, "hardsigmoid", "hardsigmoid_"] - sigmoid_keys = [torch.nn.Sigmoid, torch.sigmoid, "sigmoid", "sigmoid_"] - tanh_keys = [torch.nn.Tanh, torch.tanh, "tanh", "tanh_"] - for k in hard_sigmoid_keys + sigmoid_keys: - self.assertEqual(DEFAULT_OUTPUT_OBSERVER_MAP[k], default_affine_fixed_qparams_observer) - self._assertFixedQParamsFakeQuantizeEqual(DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP[k], - default_affine_fixed_qparams_fake_quant) - for k in tanh_keys: - 
self.assertEqual(DEFAULT_OUTPUT_OBSERVER_MAP[k], default_symmetric_fixed_qparams_observer) - self._assertFixedQParamsFakeQuantizeEqual(DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP[k], - default_symmetric_fixed_qparams_fake_quant) - def test_register_patterns(self): @register_fusion_pattern("dummy_fusion") class DummyFusion(): @@ -3560,10 +4060,13 @@ class DummyQuant3(): default_affine_fixed_qparams_fake_quant) self._assertFixedQParamsFakeQuantizeEqual(DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP["dummy_quant3"], default_symmetric_fixed_qparams_fake_quant) - self.assertTrue(get_default_output_activation_post_process_map(is_training=True) is - DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP) - self.assertTrue(get_default_output_activation_post_process_map(is_training=False) is - DEFAULT_OUTPUT_OBSERVER_MAP) + output_fake_quantize_map = get_default_output_activation_post_process_map(is_training=True) + output_observer_map = get_default_output_activation_post_process_map(is_training=False) + self.assertEqual(output_observer_map.get("dummy_quant3"), default_symmetric_fixed_qparams_observer) + self._assertFixedQParamsFakeQuantizeEqual(output_fake_quantize_map.get("dummy_quant3"), + default_symmetric_fixed_qparams_fake_quant) + + def test_reuse_input_qconfig(self): class M1(torch.nn.Module): @@ -3652,22 +4155,63 @@ def forward(self, x): break self.assertTrue(found_stack_trace, f"stack trace not found, node: {n.format_node()}, is_reference: False") - def test_stack_trace_preserved_subgraph_rewriter(self): - # a functional relu is taking the subgraph rewriter code path + def test_qat_skip_untraced(self): + class UnTraceableModuleClass(nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(2, 2) + + def forward(self, x): + return self.linear(x) + + class UnTraceableModuleName(nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(2, 2) + + def forward(self, x): + return self.linear(x) + class M(nn.Module): + def __init__(self): + super().__init__() + self.untraceable_module_class = UnTraceableModuleClass() + self.untraceable_module_name = UnTraceableModuleClass() + def forward(self, x): - x = F.relu(x) + x = self.untraceable_module_class(x) + x = self.untraceable_module_name(x) return x - m = M().eval() - mp = prepare_fx(m, get_default_qconfig_dict()) - mq = convert_fx(copy.deepcopy(mp), is_reference=False) - found_stack_trace = False - for n in mq.graph.nodes: - if n.op == 'call_function' and n.target == F.relu: - found_stack_trace = n.stack_trace is not None - break - self.assertTrue(found_stack_trace, f"stack trace not found, node: {n.format_node()}, is_reference: True") + mod = M() + + qconfig_dict = {"": torch.quantization.get_default_qat_qconfig()} + prepare_custom_config_dict = { + "non_traceable_module_class": [UnTraceableModuleClass], + "non_traceable_module_name": ["untraceable_module_name"], + } + mod_prep = torch.ao.quantization.quantize_fx.prepare_qat_fx( + mod.train(), qconfig_dict, prepare_custom_config_dict + ) + mod_prep = torch.ao.quantization.quantize_fx.prepare_qat_fx( + mod.train(), qconfig_dict, prepare_custom_config_dict + ) + self.assertTrue( + isinstance(mod_prep.untraceable_module_class.linear, torch.nn.Linear) + ) + self.assertTrue( + isinstance(mod_prep.untraceable_module_name.linear, torch.nn.Linear) + ) + self.assertTrue( + type(mod_prep.untraceable_module_class.linear) + is not torch.nn.qat.modules.linear.Linear, + "prepare_qat_fx shold not convert anything inside untraced module classes", + ) + self.assertTrue( + 
type(mod_prep.untraceable_module_name.linear) + is not torch.nn.qat.modules.linear.Linear, + "prepare_qat_fx shold not convert anything inside modules named in untraced_module_names", + ) def test_qconfig_dict_setup(self): class M(torch.nn.Module): @@ -3710,6 +4254,28 @@ def forward(self, x): self.assertEqual(mod.quant_min, 0) self.assertEqual(mod.quant_max, 255) + def test_prepare_mode(self): + class LinearModel(torch.nn.Module): + def __init__(self): + super().__init__() + self.linear = torch.nn.Linear(5, 10) + + def forward(self, x): + return self.linear(x) + + def _test(prepare_fn, qconfig_dict): + m = LinearModel() + m1 = copy.deepcopy(m) + m1.train() + prepare_fn(m1, qconfig_dict) + m2 = copy.deepcopy(m) + m2.eval() + prepare_fn(m2, qconfig_dict) + + # Ensure prepare_fx and prepare_qat_fx work in both training and eval modes + _test(prepare_fx, get_default_qconfig_dict()) + _test(prepare_qat_fx, get_default_qat_qconfig_dict()) + @skipIfNoFBGEMM class TestQuantizeFxOps(QuantizationTestCase): def setUp(self): @@ -3750,41 +4316,64 @@ def setUp(self): """ @skipIfNoFBGEMM def test_linear_module(self): - class ModuleLinear(torch.nn.Module): - def __init__(self, has_relu=False, f_relu=False): - super(ModuleLinear, self).__init__() + class LinearModel(torch.nn.Module): + def __init__(self): + super(LinearModel, self).__init__() self.linear = torch.nn.Linear(30, 4).float() - if has_relu: - if f_relu: - self.relu = F.relu - else: - self.relu = torch.nn.ReLU() + + def forward(self, x): + return self.linear(x) + + class LinearReLUModel(torch.nn.Module): + def __init__(self, f_relu=False): + super(LinearReLUModel, self).__init__() + self.linear = torch.nn.Linear(30, 4).float() + if f_relu: + self.relu = F.relu else: - self.relu = torch.nn.Identity() + self.relu = torch.nn.ReLU() def forward(self, x): - return self.relu(self.linear(x)) + x = self.linear(x) + x = self.relu(x) + return x + + class LinearBnModel(torch.nn.Module): + def __init__(self): + super(LinearBnModel, self).__init__() + self.linear = torch.nn.Linear(4, 4).float() + self.bn = torch.nn.BatchNorm1d(4) + + def forward(self, x): + x = self.linear(x) + x = self.bn(x) + return x + # Test linear data = (torch.rand((1, 30), dtype=torch.float),) - options = itertools.product( - [ModuleLinear(has_relu=False)], - self.all_quant_types) - quantized_nodes = { - # quant_type: - QuantType.DYNAMIC: ns.call_module(nnqd.Linear), - QuantType.STATIC: ns.call_module(nnq.Linear), - # note that we are checking the final result - QuantType.QAT: ns.call_module(nnq.Linear), - } - for model, quant_type in options: - self.checkGraphModeFxOp( - model, data, quant_type, quantized_nodes[quant_type]) + for quant_type in self.all_quant_types: + model = LinearModel() + quantized_module = nnqd.Linear if quant_type == QuantType.DYNAMIC else nnq.Linear + quantized_node = ns.call_module(quantized_module) + result_dict = self.checkGraphModeFxOp(model, data, quant_type, quantized_node) + if quant_type in self.static_quant_types: + self.assertEqual(result_dict["quantized_output"], result_dict["quantized_reference_output"]) + # TODO: enable test for dynamic quant + # Test linear-relu for f_relu, quant_type in itertools.product([True, False], [QuantType.STATIC, QuantType.QAT]): - for model, quantized_node in [ - (ModuleLinear(has_relu=True, f_relu=f_relu), ns.call_module(nniq.LinearReLU))]: - result_dict = self.checkGraphModeFxOp(model, data, quant_type, quantized_node) - self.assertEqual(result_dict["quantized_output"], result_dict["quantized_reference_output"]) + model 
= LinearReLUModel(f_relu) + quantized_node = ns.call_module(nniq.LinearReLU) + result_dict = self.checkGraphModeFxOp(model, data, quant_type, quantized_node) + self.assertEqual(result_dict["quantized_output"], result_dict["quantized_reference_output"]) + + # Test linear-bn + data = (torch.rand((4, 4), dtype=torch.float),) + for quant_type in self.static_quant_types: + model = LinearBnModel() + quantized_node = ns.call_module(nnq.Linear) + result_dict = self.checkGraphModeFxOp(model, data, quant_type, quantized_node) + self.assertEqual(result_dict["quantized_output"], result_dict["quantized_reference_output"]) @skipIfNoFBGEMM def test_functional_linear(self): @@ -3853,10 +4442,18 @@ def forward(self, x): else: qlinear_fun = quant_type_to_qlinear_fun[quant_type] + if quant_type != QuantType.DYNAMIC: + num_dequantize = 1 + else: + # we will have an extra quantize_per_tensor_dynamic + dequantize for + # nn.Identity right now, but it will be fixed after we use + # backend_config_dict to configure the default pt backend + num_dequantize = int(not has_relu) + convert_node_occurrence = { ns.call_function(torch.quantize_per_tensor): 1 if quant_type != QuantType.DYNAMIC else 0, qlinear_fun: 1, - ns.call_method("dequantize"): 1 if quant_type != QuantType.DYNAMIC else 0 + ns.call_method("dequantize"): num_dequantize, } prepare_expected_node_occurrence = \ quant_type_to_prepare_expected_node_occurrence[quant_type] @@ -3909,8 +4506,11 @@ def forward(self, x): else: qlinear_fun = ns.call_function(torch.ops.quantized.linear_dynamic_fp16) prepare_node_occurrence = { - # weight - ns.call_module(torch.ao.quantization.PlaceholderObserver): 1 + # activation and weight + # TODO: this is temporary behavior, should be fixed after we use + # backend_config_dict to configure default pt quantization behavior + # activation for nn.Identity (not has_relu) + ns.call_module(torch.ao.quantization.PlaceholderObserver): 2 + int(not has_relu) } convert_node_occurrence = { qlinear_fun: 1, @@ -4107,10 +4707,14 @@ def forward(self, x): } prepare_expected_node_occurrence = \ quant_type_to_prepare_expected_node_occurrence[quant_type] - self.checkGraphModeFxOp( + result_dict = self.checkGraphModeFxOp( model, data, quant_type, qconv_fun, prepare_expected_node_occurrence=prepare_expected_node_occurrence, expected_node_occurrence=convert_node_occurrence) + if quant_type != QuantType.DYNAMIC: + self.assertEqual(result_dict["quantized_output"], result_dict["quantized_reference_output"]) + # Ensure packed weights in lowered models are folded + self.assertIn("_packed_weight_0", result_dict["quantized"].state_dict().keys()) @skipIfNoFBGEMM def test_quantized_conv_relu(self): @@ -4260,10 +4864,12 @@ def test_add(self): self._test_binary_op_float16_impl( operator.add, operator.iadd) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_sub(self): self._test_binary_op_float16_impl(operator.sub, operator.isub) self._test_binary_op_float16_impl(torch.sub, None) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_div(self): self._test_binary_op_float16_impl(operator.truediv, operator.itruediv) self._test_binary_op_float16_impl(torch.div, None) @@ -4274,6 +4880,7 @@ def test_mul(self): operator.mul, operator.imul, torch.ops.quantized.mul) self._test_binary_op_float16_impl(operator.mul, operator.imul) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_sum(self): class Sum(torch.nn.Module): def forward(self, x): @@ 
-4297,6 +4904,7 @@ def forward(self, x): expected_node_occurrence=node_occurrence, custom_qconfig_dict=custom_qconfig_dict) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_bmm(self): class BMMMethod(torch.nn.Module): def __init__(self): @@ -4403,7 +5011,7 @@ def forward(self, x): m = M() expected_node_occurrence = { - ns.call_module(torch.ao.quantization.FusedMovingAvgObsFakeQuantize): 6, + ns.call_module(torch.ao.quantization.FusedMovingAvgObsFakeQuantize): 5, } self._test_quantized_add_mul_qat(m, expected_node_occurrence) @@ -4419,14 +5027,13 @@ def forward(self, x): x = torch.mul(x, 1.0) x = self.conv1(x) x = torch.mul(x, 1.0) - # TODO: add support for add + torch.relu? x = torch.relu(x) x = self.conv2(x) return x m = M() expected_node_occurrence = { - ns.call_module(torch.ao.quantization.FusedMovingAvgObsFakeQuantize): 6, + ns.call_module(torch.ao.quantization.FusedMovingAvgObsFakeQuantize): 5, } self._test_quantized_add_mul_qat(m, expected_node_occurrence) @@ -4846,6 +5453,7 @@ def test_softmax_normal(self): self._test_default_node_quant_handler_ops( module, functional, qconfig, is_reference, node_list) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_gelu_reference(self): module = torch.nn.GELU functional = torch.nn.functional.gelu @@ -4861,6 +5469,7 @@ def test_gelu_reference(self): ns.call_function(torch.quantize_per_tensor), ns.call_method('dequantize') ] + # TODO: change these to use backend_config_dict additional_patterns = {torch.nn.GELU: DefaultNodeQuantizeHandler, torch.nn.functional.gelu: DefaultNodeQuantizeHandler} self._test_default_node_quant_handler_ops( @@ -4869,6 +5478,7 @@ def test_gelu_reference(self): self._test_default_node_quant_handler_ops(module, functional, self.custom_qconfig, is_reference, node_list, additional_quant_pattern_dict=self.common_quant_patterns) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_softmax_reference(self): module = torch.nn.Softmax functional = torch.nn.functional.softmax @@ -4892,6 +5502,7 @@ def test_softmax_reference(self): self._test_default_node_quant_handler_ops(module, functional, self.custom_qconfig, is_reference, node_list, additional_quant_pattern_dict=self.common_quant_patterns) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_silu_reference(self): module = torch.nn.SiLU functional = torch.nn.functional.silu @@ -4923,6 +5534,7 @@ def test_silu_reference(self): self._test_default_node_quant_handler_ops(module, functional, self.custom_qconfig, is_reference, node_list, additional_quant_pattern_dict=self.common_quant_patterns) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_mish_reference(self): module = torch.nn.Mish functional = torch.nn.functional.mish @@ -5324,7 +5936,8 @@ def forward(self, x): m = M().eval() m = prepare_fx(m, {"": default_reuse_input_qconfig}) m = convert_fx(m) - print(m) + # make sure it runs + m(torch.rand(1)) def test_getitem(self): """ Make sure we only insert observer for getitem if the following node is matched @@ -5398,7 +6011,6 @@ def forward(self, x): x = self.sigmoid(x) x = torch.sigmoid(x) x = x.sigmoid() - x.sigmoid_() x = self.hardsigmoid(x) x = F.hardsigmoid(x) x = F.hardsigmoid(x, inplace=True) @@ -5406,7 +6018,6 @@ def forward(self, x): # F.tanh is deprecated x = torch.tanh(x) x = x.tanh() - x.tanh_() return x for eval_mode in [True, False]: @@ -5417,12 +6028,12 
@@ def forward(self, x): m.eval() qconfig = default_qconfig prepare = prepare_fx - fq_count = 11 + fq_count = 9 else: m.train() qconfig = default_qat_qconfig prepare = prepare_qat_fx - fq_count = 11 + fq_count = 9 # nothing to fuse so skipping the fuse step m_copy = copy.deepcopy(m) @@ -5465,7 +6076,7 @@ def forward(self, x): expected_node_list=order_check) reference_count_check = { - ns.call_function(torch.quantize_per_tensor) : 13, + ns.call_function(torch.quantize_per_tensor) : 11, ns.call_method('dequantize') : 11 } reference_order_check = [ @@ -5879,6 +6490,7 @@ def forward(self, x): m, expected_node_occurrence=expected_occurrence) + @unittest.skip("This is no longer needed right now, can enable later with new api") def test_qmatmul(self): class M(torch.nn.Module): def forward(self, x, y): @@ -6277,15 +6889,7 @@ def forward(self, input: torch.Tensor, offsets: Optional[torch.Tensor] = None, model = EmbeddingBagLinear().train() prepared_fx_model = prepare_qat_fx(model, qconfig_dict) test_only_train_fn(prepared_fx_model, train_indices) - convert_custom_config_dict = { - "additional_object_mapping": { - "static": { - torch.nn.qat.EmbeddingBag: nn.quantized.EmbeddingBag, - } - } - } quant_model = convert_fx(prepared_fx_model, - convert_custom_config_dict=convert_custom_config_dict, qconfig_dict=qconfig_dict) def checkQuantized(model): diff --git a/test/quantization/serialized/TestSerialization.test_linear_relu_package_quantization_transforms.get_attr_targets.pt b/test/quantization/serialized/TestSerialization.test_linear_relu_package_quantization_transforms.get_attr_targets.pt index bb34a57f962a4d..6887e8c614a52d 100644 Binary files a/test/quantization/serialized/TestSerialization.test_linear_relu_package_quantization_transforms.get_attr_targets.pt and b/test/quantization/serialized/TestSerialization.test_linear_relu_package_quantization_transforms.get_attr_targets.pt differ diff --git a/test/run_test.py b/test/run_test.py index 5b5ce3b8318a26..1b73f3bde01068 100644 --- a/test/run_test.py +++ b/test/run_test.py @@ -104,7 +104,6 @@ def skip_test_p(name: str) -> bool: 'test_kernel_launch_checks', 'test_metal', 'test_nnapi', - 'test_functionalization', 'test_segment_reductions', 'test_static_runtime', 'test_throughput_benchmark', @@ -133,6 +132,7 @@ def skip_test_p(name: str) -> bool: "distributed/elastic/utils/util_test", "distributed/elastic/utils/distributed_test", "distributed/elastic/multiprocessing/api_test", + "test_deploy", ] ) @@ -168,6 +168,7 @@ def skip_test_p(name: str) -> bool: "test_typing", "distributed/elastic/events/lib_test", "distributed/elastic/agent/server/test/api_test", + "test_deploy", ] WINDOWS_BLOCKLIST = [ @@ -210,7 +211,9 @@ def skip_test_p(name: str) -> bool: "distributed/_shard/sharded_tensor/ops/test_binary_cmp", "distributed/_shard/sharded_tensor/ops/test_init", "distributed/_shard/sharded_tensor/ops/test_linear", + "distributed/_shard/sharding_spec/test_sharding_spec", "distributed/_shard/sharded_optim/test_sharded_optim", + "distributed/_shard/test_replicated_tensor", ] + FSDP_TEST ROCM_BLOCKLIST = [ @@ -228,9 +231,10 @@ def skip_test_p(name: str) -> bool: "distributed/_shard/sharded_tensor/ops/test_binary_cmp", "distributed/_shard/sharded_tensor/ops/test_init", "distributed/_shard/sharded_tensor/ops/test_linear", + "distributed/_shard/sharding_spec/test_sharding_spec", "distributed/_shard/sharded_optim/test_sharded_optim", + "distributed/_shard/test_replicated_tensor", "test_determination", - "test_multiprocessing", "test_jit_legacy", "test_type_hints", 
"test_openmp", @@ -257,6 +261,8 @@ def skip_test_p(name: str) -> bool: "test_modules", "test_nn", "test_ops", + "test_ops_gradients", + "test_ops_jit", "test_torch" ] @@ -306,7 +312,6 @@ def skip_test_p(name: str) -> bool: ) JIT_EXECUTOR_TESTS = [ - "test_jit_cuda_fuser", "test_jit_profiling", "test_jit_legacy", "test_jit_fuser_legacy", @@ -867,6 +872,10 @@ def get_selected_tests(options): if options.exclude_distributed_tests: options.exclude.extend(DISTRIBUTED_TESTS) + # these tests failing in CUDA 11.6 temporary disabling. issue https://github.com/pytorch/pytorch/issues/75375 + if torch.version.cuda is not None and LooseVersion(torch.version.cuda) == "11.6": + options.exclude.extend(["distributions/test_constraints"]) + selected_tests = exclude_tests(options.exclude, selected_tests) if sys.platform == "win32" and not options.ignore_win_blocklist: diff --git a/test/test_ao_sparsity.py b/test/test_ao_sparsity.py index 32b95973928e31..6b5c8574c2e679 100644 --- a/test/test_ao_sparsity.py +++ b/test/test_ao_sparsity.py @@ -20,5 +20,8 @@ # Scheduler from ao.sparsity.test_scheduler import TestScheduler # noqa: F401 +# Composability +from ao.sparsity.test_composability import TestComposability # noqa: F401 + if __name__ == '__main__': run_tests() diff --git a/test/test_autograd.py b/test/test_autograd.py index 1d4ef2ce38424f..408c71af075a6f 100644 --- a/test/test_autograd.py +++ b/test/test_autograd.py @@ -14,6 +14,7 @@ import uuid import warnings import operator +import subprocess from copy import deepcopy from collections import OrderedDict from itertools import product @@ -26,7 +27,6 @@ from torch.autograd.function import once_differentiable from torch.autograd.profiler import (profile, record_function, emit_nvtx) from torch.autograd.profiler_util import (_format_time, EventList, FunctionEvent, FunctionEventAvg) -import torch.autograd.functional as autogradF from torch.utils.checkpoint import checkpoint from torch.testing import make_tensor from torch.testing._internal.common_cuda import TEST_CUDA @@ -40,7 +40,7 @@ from torch.testing._internal.common_device_type import (instantiate_device_type_tests, skipCUDAIfRocm, onlyCPU, onlyCUDA, dtypes, dtypesIfCUDA, deviceCountAtLeast, skipMeta) -from torch.testing._internal.common_dtype import get_all_dtypes +from torch.testing._internal.common_dtype import floating_types_and from torch.testing._internal.logging_tensor import no_dispatch import pickle @@ -389,8 +389,8 @@ def test_not_implemented_fwad(self): hint_msg = "Running forward AD for an OP that does not implement it should raise a NotImplementedError" with self.assertRaisesRegex(NotImplementedError, err_msg, msg=hint_msg): - # if forward AD ends up being implemented for torch.atan2, choose a different op - torch.atan2(dual_x, dual_x) + # if forward AD ends up being implemented for torch.igamma, choose a different op + torch.igamma(dual_x, dual_x) def test_accumulate_grad(self): grad_output = torch.ones(5, 5) @@ -2820,7 +2820,7 @@ def test_profiler(self): for evt in p.function_events: if evt.name in names: found_indices.add(names.index(evt.name)) - self.assertEquals(len(found_indices), len(names)) + self.assertEqual(len(found_indices), len(names)) def test_profiler_seq_nr(self): with profile(use_kineto=kineto_available()) as p: @@ -2931,6 +2931,21 @@ def test_record_function_callbacks(self): foo_event = [event for event in function_events if "foo" in event.name][0] self.assertEqual(foo_event.count, 1) + def test_record_function_legacy(self): + # Test the new _record_function ops work + # Note: 
Remove once record_function uses these directly + x = torch.randn(10, 10) + with profile(use_kineto=kineto_available()) as p: + handle = torch.ops.profiler._record_function_enter("bar", None) + try: + y = x * 2 + 4 + finally: + torch.ops.profiler._record_function_exit(handle) + + function_events = p.function_events + foo_event = [event for event in function_events if "bar" in event.name][0] + self.assertEqual(foo_event.count, 1) + def test_profiler_aggregation_fake(self): events = EventList() id = [0] @@ -4815,7 +4830,10 @@ def test_grad_fn_attr_bindings(self): self.assertIsInstance(out.grad_fn._saved_output_size[0], int) self.assertEqual(out.grad_fn._saved_align_corners, False) # bool -> bool self.assertIsInstance(out.grad_fn._saved_align_corners, bool) - self.assertIsNone(out.grad_fn._saved_scale_factors) # c10::optional> -> float[]? + if hasattr(out.grad_fn, '_saved_scale_factors'): + self.assertIsNone(out.grad_fn._saved_scale_factors) # c10::optional> -> float[]? + else: + self.assertIsNone(out.grad_fn._saved_scales) # c10::optional> -> float[]? out = torch.nn.functional.interpolate(a, scale_factor=0.5, mode="linear") self.assertIsNone(out.grad_fn._saved_output_size) @@ -6340,1361 +6358,76 @@ def f(x): memory_with_hooks = torch.cuda.memory_allocated() self.assertEqual(memory_with_hooks, memory_without_grad) + def test_pynode_destruction_deadlock(self): + script = """ +import torch -def index_perm_variable(shape, max_indices): - if not isinstance(shape, tuple): - shape = (shape,) - - index = torch.randperm(max_indices).narrow(0, 0, reduce(mul, shape)).view(shape) - return index - -def bernoulli_scalar(): - return torch.tensor(0, dtype=torch.uint8).bernoulli_() - - -class TestAutogradFunctional(TestCase): - def _assert_same_struct(self, res, base): - # base and res should be Tensors or tuple of Tensors with the same size - if isinstance(base, torch.Tensor): - self.assertTrue(isinstance(res, torch.Tensor)) - self.assertEqual(base.size(), res.size()) - elif isinstance(base, tuple): - self.assertTrue(isinstance(res, tuple)) - self.assertEqual(len(base), len(res)) - for el_base, el_res in zip(base, res): - self.assertTrue(isinstance(el_base, torch.Tensor)) - self.assertTrue(isinstance(el_res, torch.Tensor)) - self.assertEqual(el_base.size(), el_res.size()) - else: - # Wrong base - raise RuntimeError("The base given to `_assert_same_struct` doesn't have" - " the right structure.") - - def _assert_interleaved_struct(self, res, base1, base2): - # base1 and base2 can be Tensors or tuples of Tensors. - # If they are tuples, res should be a tuple as well. 
- # The indexing works as follows for base1, base2 being - # - tuple, tuple: res[i][j][k][l] = (base1[i][k], base2[j][l]) - # - tuple, Tensor: res[i][k][l] = (base1[i][k], base2[l]) - # - Tensor, tuple: res[i][j][l] = (base1[i], base2[j][l]) - # - Tensor, Tensor: res[k][l] = (base1[k], base2[l]) - if isinstance(base1, torch.Tensor) and isinstance(base2, torch.Tensor): - self.assertTrue(isinstance(res, torch.Tensor)) - self.assertEqual(res.size(), base1.size() + base2.size()) - elif isinstance(base1, tuple) and isinstance(base2, torch.Tensor): - self.assertTrue(isinstance(res, tuple)) - self.assertEqual(len(res), len(base1)) - for el_res, el_base1 in zip(res, base1): - self.assertTrue(isinstance(el_res, torch.Tensor)) - self.assertTrue(isinstance(el_base1, torch.Tensor)) - self.assertEqual(el_res.size(), el_base1.size() + base2.size()) - elif isinstance(base1, torch.Tensor) and isinstance(base2, tuple): - self.assertTrue(isinstance(res, tuple)) - self.assertEqual(len(res), len(base2)) - for el_res, el_base2 in zip(res, base2): - self.assertTrue(isinstance(el_res, torch.Tensor)) - self.assertTrue(isinstance(el_base2, torch.Tensor)) - self.assertEqual(el_res.size(), base1.size() + el_base2.size()) - elif isinstance(base1, tuple) and isinstance(base2, tuple): - self.assertTrue(isinstance(res, tuple)) - self.assertEqual(len(res), len(base1)) - for el_res, el_base1 in zip(res, base1): - self.assertTrue(isinstance(el_res, tuple)) - self.assertEqual(len(res), len(base2)) - for el_el_res, el_base2 in zip(el_res, base2): - self.assertTrue(isinstance(el_el_res, torch.Tensor)) - self.assertTrue(isinstance(el_base2, torch.Tensor)) - self.assertEqual(el_el_res.size(), el_base1.size() + el_base2.size()) - else: - # Wrong bases - raise RuntimeError("The bases given to `_assert_interleaved_struct` don't have" - " the right structure.") - - def test_vjp_err_check(self): - def foo(a): - return 3 * a.narrow(0, 0, 3) - - def bar(a): - return 3 * a.narrow(0, 0, 3), "bar" - - inp = torch.rand(4) - v = torch.ones(3) - with self.assertRaisesRegex(TypeError, "The inputs given to vjp must be either a Tensor"): - res = autogradF.vjp(foo, (inp, 2), v) - - with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to vjp must"): - res = autogradF.vjp(bar, inp, v) - - with self.assertRaisesRegex(RuntimeError, "The vector v can only be None if the user-provided function returns"): - res = autogradF.vjp(foo, inp) - - with self.assertRaisesRegex(RuntimeError, "The given v should contain a single Tensor."): - res = autogradF.vjp(foo, inp, (torch.ones_like(inp), torch.ones_like(inp))) - - with self.assertRaisesRegex(RuntimeError, "v has invalid size: should be torch.Size"): - res = autogradF.vjp(foo, inp, v[:2]) - - res = autogradF.vjp(foo, inp, v)[1] - self._assert_same_struct(res, inp) - - def test_vjp_err_check_strict(self): - def foo(a): - return a.detach() - - def bar(a): - # Make a non-leaf Tensor that requires_grad but that is not connected to the input - return a.long().float().requires_grad_().clone() - - inp = torch.rand(4) - v = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): - res = autogradF.vjp(foo, inp, v, strict=True) - res = autogradF.vjp(foo, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) 
- - with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): - res = autogradF.vjp(bar, inp, v, strict=True) - res = autogradF.vjp(bar, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) - - # The Jacobian does not depend on the input - def foo(a): - return a.clone() - - inp.requires_grad_() - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function is independent of input 0."): - res = autogradF.vjp(foo, inp, v, create_graph=True, strict=True) - res = autogradF.vjp(foo, inp, v, create_graph=True, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1], v) - - def test_vjp_no_grad(self): - def reducer(x): - return x.sum(dim=1) - inputs = torch.rand(4, 4) - v = torch.ones(4) - with torch.no_grad(): - res = autogradF.vjp(reducer, inputs, v) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) - - inputs.requires_grad_() - v.requires_grad_() - with torch.no_grad(): - res = autogradF.vjp(reducer, inputs, v, create_graph=True) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) - - def test_vjp_output(self): - def reducer(x): - return x.sum(dim=1) - inputs = torch.rand(4, 4) - v = torch.ones(4) - res = autogradF.vjp(reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - - def adder(x, y): - return 2 * x + 3 * y - - inputs = (torch.rand(2), torch.rand(2)) - v = torch.ones(2) - out, vjp_val = autogradF.vjp(adder, inputs, v) - self._assert_same_struct(vjp_val, inputs) - self.assertIsNone(out.grad_fn) - self.assertIsNone(vjp_val[0].grad_fn) - self.assertIsNone(vjp_val[1].grad_fn) - - def adder(x, y): - return 2 * x + 3 * y, x + y - - inputs = (torch.rand(2), torch.rand(2)) - v = (torch.tensor([1., 0.]), torch.tensor([1., 0.])) - out, vjp_val = autogradF.vjp(adder, inputs, v) - self._assert_same_struct(vjp_val, inputs) - self.assertIsNone(out[0].grad_fn) - self.assertIsNone(out[1].grad_fn) - self.assertIsNone(vjp_val[0].grad_fn) - self.assertIsNone(vjp_val[1].grad_fn) - - def test_vjp_scalar(self): - def reducer(x): - return x.sum() - inputs = torch.rand(4, 4) - v = torch.ones([]) - res = autogradF.vjp(reducer, inputs, v) - self._assert_same_struct(res[0], v) - self._assert_same_struct(res[1], inputs) - - res = autogradF.vjp(reducer, inputs) - self._assert_same_struct(res[0], v) - self._assert_same_struct(res[1], inputs) - - def expander(x): - return x.unsqueeze(0).repeat(4) - inputs = torch.rand([]) - v = torch.ones(4) - res = autogradF.vjp(expander, inputs, v) - self._assert_same_struct(res[0], v) - self._assert_same_struct(res[1], inputs) - - def test_vjp_create_graph(self): - def reducer(x): - return x.sum(dim=1) - inputs = torch.rand(2, 2, dtype=torch.double) - v = torch.ones(2, dtype=torch.double) - - inputs.requires_grad_() - v.requires_grad_() - res = autogradF.vjp(reducer, inputs, v, create_graph=True) - self._assert_same_struct(res[1], inputs) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - - gradcheck(lambda inp, v: autogradF.vjp(reducer, inputs, v, create_graph=True), (inputs, v)) - gradgradcheck(lambda inp, v: autogradF.vjp(reducer, inputs, v, create_graph=True), (inputs, v)) - - def adder(x, y): - return 2 * x + 3 * y, x * y - - inputs = (torch.rand(2, dtype=torch.double, 
requires_grad=True), - torch.rand(2, dtype=torch.double, requires_grad=True)) - v = (torch.tensor([1., 0.], dtype=torch.double, requires_grad=True), - torch.tensor([1., 0.], dtype=torch.double, requires_grad=True)) - - gradcheck(lambda *args: autogradF.vjp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) - gradgradcheck(lambda *args: autogradF.vjp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) - - def foo(*args): - x, y = args[:2] - v = args[2:] - - x = x.cos() - val, grad = autogradF.vjp(adder, (x, y), v, create_graph=True) - - return val[0].exp() + val[1].exp() + grad[0].exp() + grad[1].exp() + x.exp() + y.exp() - - gradcheck(foo, inputs + v) - gradgradcheck(foo, inputs + v) - - def test_jvp_err_check(self): - def foo(a): - return 3 * a.narrow(0, 0, 3) - - def bar(a): - return 3 * a.narrow(0, 0, 3), "bar" - - inp = torch.rand(4) - v = torch.rand(4) - with self.assertRaisesRegex(TypeError, "The inputs given to jvp must be either a Tensor"): - res = autogradF.jvp(foo, (inp, 2), v) - - with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to jvp must"): - res = autogradF.jvp(bar, inp, v) - - with self.assertRaisesRegex(RuntimeError, "The vector v can only be None if the input to the user-provided function"): - res = autogradF.jvp(foo, inp) - - with self.assertRaisesRegex(RuntimeError, "The given v should contain a single Tensor."): - res = autogradF.jvp(foo, inp, (v, v)) - - with self.assertRaisesRegex(RuntimeError, "v has invalid size: should be torch.Size"): - res = autogradF.jvp(foo, inp, v[:2]) - - res = autogradF.jvp(foo, inp, v)[1] - self._assert_same_struct(res, foo(inp)) - - def test_jvp_err_check_strict(self): - def foo(a): - return a.detach() - - def bar(a): - # Make a non-leaf Tensor that requires_grad but that is not connected to the input - return a.long().float().requires_grad_().clone() - - inp = torch.rand(4) - v = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): - res = autogradF.jvp(foo, inp, v, strict=True) - res = autogradF.jvp(foo, inp, v, strict=False) - self._assert_same_struct(res[1], res[0]) - self.assertEqual(res[1].abs().sum(), 0.) - - with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): - res = autogradF.jvp(bar, inp, v, strict=True) - res = autogradF.jvp(bar, inp, v, strict=False) - self._assert_same_struct(res[1], res[0]) - self.assertEqual(res[1].abs().sum(), 0.) 
- - # The Jacobian does not depend on the input - def foo(a): - return a.clone() - - inp.requires_grad_() - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function is independent of input 0."): - res = autogradF.jvp(foo, inp, v, create_graph=True, strict=True) - res = autogradF.jvp(foo, inp, v, create_graph=True, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1], v) - - def test_jvp_no_grad(self): - def reducer(x): - return x.sum(dim=1) - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - with torch.no_grad(): - res = autogradF.jvp(reducer, inputs, v) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) - - inputs.requires_grad_() - v.requires_grad_() - with torch.no_grad(): - res = autogradF.jvp(reducer, inputs, v, create_graph=True) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) - - def test_jvp_output(self): - def reducer(x): - return x.sum(dim=1) - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - res = autogradF.jvp(reducer, inputs, v) - self._assert_same_struct(res[1], res[0]) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - - def adder(x, y): - return 2 * x + 3 * y - - inputs = (torch.rand(2), torch.rand(2)) - v = (torch.ones(2), torch.ones(2)) - out, jvp_val = autogradF.jvp(adder, inputs, v) - self._assert_same_struct(jvp_val, out) - self.assertIsNone(out.grad_fn) - self.assertIsNone(jvp_val[0].grad_fn) - self.assertIsNone(jvp_val[1].grad_fn) - - def adder(x, y): - return 2 * x + 3 * y, x + y - - inputs = (torch.rand(2), torch.rand(2)) - v = (torch.tensor([1., 0.]), torch.tensor([1., 0.])) - out, jvp_val = autogradF.jvp(adder, inputs, v) - self._assert_same_struct(jvp_val, out) - self.assertIsNone(out[0].grad_fn) - self.assertIsNone(out[1].grad_fn) - self.assertIsNone(jvp_val[0].grad_fn) - self.assertIsNone(jvp_val[1].grad_fn) - - def test_jvp_scalar(self): - def reducer(x): - return x.sum() - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - res = autogradF.jvp(reducer, inputs, v) - self._assert_same_struct(res[0], torch.zeros([])) - self._assert_same_struct(res[1], res[0]) - - def expander(x): - return x.unsqueeze(0).repeat(4) - inputs = torch.rand([]) - v = torch.ones([]) - res = autogradF.jvp(expander, inputs, v) - self._assert_same_struct(res[0], torch.zeros(4)) - self._assert_same_struct(res[1], res[0]) - - res = autogradF.jvp(expander, inputs) - self._assert_same_struct(res[0], torch.zeros(4)) - self._assert_same_struct(res[1], res[0]) - - def test_jvp_create_graph(self): - def reducer(x): - return x.sum(dim=1) - inputs = torch.rand(2, 2, dtype=torch.double) - v = torch.ones(2, 2, dtype=torch.double) - - inputs.requires_grad_() - v.requires_grad_() - res = autogradF.jvp(reducer, inputs, v, create_graph=True) - self._assert_same_struct(res[1], res[0]) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - - gradcheck(lambda inp, v: autogradF.jvp(reducer, inp, v, create_graph=True), (inputs, v)) - gradgradcheck(lambda inp, v: autogradF.jvp(reducer, inp, v, create_graph=True), (inputs, v)) - - def adder(x, y): - return 2 * x + 3 * y, x * y - - inputs = (torch.rand(2, dtype=torch.double, requires_grad=True), - torch.rand(2, dtype=torch.double, requires_grad=True)) - v = (torch.tensor([1., 0.], dtype=torch.double, requires_grad=True), - torch.tensor([1., 0.], dtype=torch.double, requires_grad=True)) - - gradcheck(lambda *args: 
autogradF.jvp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) - gradgradcheck(lambda *args: autogradF.jvp(adder, args[:2], args[2:], create_graph=True)[1], inputs + v) - - def foo(*args): - x, y = args[:2] - v = args[2:] - - x = x.cos() - val, grad = autogradF.jvp(adder, (x, y), v, create_graph=True) - - return val[0].exp() + val[1].exp() + grad[0].exp() + grad[1].exp() + x.exp() + y.exp() - - gradcheck(foo, inputs + v) - gradgradcheck(foo, inputs + v) - - def _test_construct_standard_basis_for(self, inputs): - numels = tuple(tensor.numel() for tensor in inputs) - results = autogradF._construct_standard_basis_for(inputs, numels) - for result, inp in zip(results, inputs): - self.assertEqual(result.dtype, inp.dtype) - self.assertEqual(result.device, inp.device) - results = torch.cat([result.to(device='cpu', dtype=torch.float) - for result in results], dim=1) - expected = torch.eye(results[0].shape[0], dtype=torch.float) - self.assertEqual(results, expected) - - def test_construct_standard_basis_for(self): - test_cases = [ - (torch.randn(2, 3),), - (torch.randn(1),), - (torch.randn([]),), - (torch.randn(1), torch.randn([]), torch.randn([])), - (torch.randn(2), torch.randn(3), torch.randn([])), - (torch.randn(2), torch.randn([]), torch.randn(3)), - (torch.randn(2, 3), torch.randn(3), torch.randn(3, 4, 2)), - (torch.randn(2, dtype=torch.float64), torch.randn(3, dtype=torch.float32)), - ] - - for inputs in test_cases: - self._test_construct_standard_basis_for(inputs) - - @unittest.skipIf(not TEST_CUDA, "test requires CUDA") - def test_construct_standard_basis_for_cuda(self): - test_cases = [ - (torch.randn(2), torch.randn(3, device='cuda')), - (torch.randn(3, device='cuda'), torch.randn(2)), - ] - - for inputs in test_cases: - self._test_construct_standard_basis_for(inputs) - - def _test_vectorize_raises_no_warnings(self, api): - # vmap is an experimental prototype. When someone calls torch.vmap, - # it raises a python warning. This test checks that - # autogradF.{jacobian, hessian} don't raise that experimental prototype - # warning; it is not nice for a public-facing API to raise a warning - # no matter how it is called. 
- def foo(a): - return (a ** 2).sum() - - x = torch.randn(3) - with warnings.catch_warnings(record=True) as wa: - result = api(foo, x, vectorize=True) - self.assertEqual(len(wa), 0) - - def test_jacobian_vectorize_raises_no_warnings(self): - return self._test_vectorize_raises_no_warnings(autogradF.jacobian) - - def test_hessian_vectorize_raises_no_warnings(self): - return self._test_vectorize_raises_no_warnings(autogradF.hessian) - - def _test_jacobian_err_check(self, vectorize): - def foo(a): - return 3 * a.narrow(0, 0, 3) - - def bar(a): - return 3 * a.narrow(0, 0, 3), "bar" - - inp = torch.rand(4) - with self.assertRaisesRegex(TypeError, "The inputs given to jacobian must be either a Tensor"): - res = autogradF.jacobian(foo, (inp, 2), vectorize=vectorize) - - with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to jacobian must"): - res = autogradF.jacobian(bar, inp, vectorize=vectorize) - - res = autogradF.jacobian(foo, inp, vectorize=vectorize) - self._assert_interleaved_struct(res, foo(inp), inp) - - def foo(a, b): - return b, 3 * a.narrow(0, 0, 3) - - inp = (torch.rand(4), torch.rand(5)) - - res = autogradF.jacobian(foo, inp, vectorize=vectorize) - self._assert_interleaved_struct(res, foo(*inp), inp) - - def test_jacobian_err_check(self): - return self._test_jacobian_err_check(vectorize=False) - - def test_jacobian_err_check_vectorize(self): - return self._test_jacobian_err_check(vectorize=True) - - def test_jacobian_err_check_strict(self): - def foo(a): - return a.detach() - - def bar(a): - # Make a non-leaf Tensor that requires_grad but that is not connected to the input - return a.long().float().requires_grad_().clone() - - inp = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): - res = autogradF.jacobian(foo, inp, strict=True) - res = autogradF.jacobian(foo, inp, strict=False) - self._assert_interleaved_struct(res, foo(inp), inp) - self.assertEqual(res.abs().sum(), 0.) - - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function is independent of input 0."): - res = autogradF.jacobian(bar, inp, strict=True) - res = autogradF.jacobian(bar, inp, strict=False) - self._assert_interleaved_struct(res, foo(inp), inp) - self.assertEqual(res.abs().sum(), 0.) 
- - # The Jacobian does not depend on the input - def foo(a): - return a.clone() - - inp.requires_grad_() - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function is independent of input 0."): - res = autogradF.jacobian(foo, inp, create_graph=True, strict=True) - res = autogradF.jacobian(foo, inp, create_graph=True, strict=False) - self._assert_interleaved_struct(res, inp, inp) - self.assertEqual(res, torch.eye(4)) - - def test_jacobian_err_check_strict_vectorize(self): - def foo(x): - return x - - inp = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "not supported together"): - res = autogradF.jacobian(foo, inp, strict=True, vectorize=True) - - def test_jacobian_no_grad(self): - def exp_reducer(x): - return x.exp().sum(dim=1) - - inputs = torch.rand(4, 4) - with torch.no_grad(): - res = autogradF.jacobian(exp_reducer, inputs) - self.assertIsNone(res.grad_fn) - self.assertNotEqual(res, torch.zeros(4, 4)) - - with torch.no_grad(): - res = autogradF.jacobian(exp_reducer, inputs, create_graph=True) - self.assertIsNotNone(res.grad_fn) - self.assertNotEqual(res, torch.zeros(4, 4)) - - def _test_jacobian_output(self, vectorize): - def exp_reducer(x): - return x.exp().sum(dim=1) - - inputs = torch.rand(4, 4) - res = autogradF.jacobian(exp_reducer, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, exp_reducer(inputs), inputs) - self.assertIsNone(res.grad_fn) - - def identity(x): - return x.clone() - - inputs = torch.rand(4) - res = autogradF.jacobian(identity, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, identity(inputs), inputs) - self.assertIsNone(res.grad_fn) - self.assertEqual(res, torch.eye(4)) - - def add_exp_reducer(x, y): - return (x + y.exp()).sum(dim=1) - - inputs = (torch.rand(4, 4), torch.rand(4, 4)) - res = autogradF.jacobian(add_exp_reducer, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, add_exp_reducer(*inputs), inputs) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - - def test_jacobian_output(self): - self._test_jacobian_output(vectorize=False) - - def test_jacobian_output_vectorize(self): - self._test_jacobian_output(vectorize=True) - - def _test_jacobian_scalar(self, vectorize): - def reducer(x): - return x.sum() - inputs = torch.rand(4, 4) - res = autogradF.jacobian(reducer, inputs, vectorize=vectorize) - self._assert_same_struct(res, inputs) - - def expander(x): - return x.unsqueeze(0).repeat(4) - inputs = torch.rand([]) - res = autogradF.jacobian(expander, inputs, vectorize=vectorize) - self._assert_same_struct(res, torch.zeros(4)) - - def test_jacobian_scalar(self): - self._test_jacobian_scalar(vectorize=False) - - def test_jacobian_scalar_vectorize(self): - self._test_jacobian_scalar(vectorize=True) - - def _test_jacobian_create_graph(self, vectorize): - def exp_reducer(x): - return x.exp().sum(dim=1) - - inputs = torch.rand(4, 4, dtype=torch.double, requires_grad=True) - res = autogradF.jacobian(exp_reducer, inputs, create_graph=True, vectorize=vectorize) - self._assert_interleaved_struct(res, exp_reducer(inputs), inputs) - self.assertIsNotNone(res.grad_fn) - - gradcheck(lambda inp: autogradF.jacobian(exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) - gradgradcheck(lambda inp: autogradF.jacobian(exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) - - def add_exp_reducer(x, y): - return (x + y).exp().sum(dim=1) - - inputs = (torch.rand(4, 4, dtype=torch.double, requires_grad=True), - torch.rand(4, 4, 
dtype=torch.double, requires_grad=True)) - res = autogradF.jacobian(add_exp_reducer, inputs, create_graph=True, vectorize=vectorize) - self._assert_interleaved_struct(res, add_exp_reducer(*inputs), inputs) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - - gradcheck(lambda *inp: autogradF.jacobian(add_exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) - gradgradcheck(lambda *inp: autogradF.jacobian(add_exp_reducer, inp, create_graph=True, vectorize=vectorize), inputs) - - def foo(x, y): - x = x.cos() - val, jac = autogradF.jacobian(add_exp_reducer, (x, y), create_graph=True, vectorize=vectorize) - - res = val[0].exp().sum() + val[1].exp().sum() + jac[0].exp().sum() - res = res + jac[1].exp().sum() + x.exp().sum() + y.exp().sum() - return res - - gradcheck(foo, inputs) - gradgradcheck(foo, inputs) - - def test_jacobian_create_graph(self): - self._test_jacobian_create_graph(vectorize=False) - - def test_jacobian_create_graph_vectorize(self): - self._test_jacobian_create_graph(vectorize=True) - - def _check_jacobian_vectorize_correctness(self, f, inputs, test_forward_ad=True): - expected = autogradF.jacobian(f, inputs, vectorize=False) - result_backward_mode = autogradF.jacobian(f, inputs, vectorize=True) - self.assertEqual(result_backward_mode, expected) - - if test_forward_ad: - result_forward_mode = autogradF.jacobian(f, inputs, strategy="forward-mode", vectorize=True) - self.assertEqual(result_forward_mode, expected) - - def test_jacobian_vectorize_correctness_simple(self): - def f(x): - return 3 * x ** 2 - - x = torch.randn(2, 3, 5) - self._check_jacobian_vectorize_correctness(f, x) - - def test_jacobian_vectorize_correctness_multi_input(self): - def f(x, y): - return (x.cos() * x) @ y.sin() - - x = torch.randn(2, 3) - y = torch.randn(3, 5) - self._check_jacobian_vectorize_correctness(f, (x, y)) - - def test_jacobian_vectorize_correctness_multi_input_multi_output(self): - def f(x, y): - return (x * x) @ y, x @ (x.sum(1) * y), y.sum() - - x = torch.randn(5, 3) - y = torch.randn(3, 5) - self._check_jacobian_vectorize_correctness(f, (x, y)) - - def test_jacobian_vectorize_correctness_unrelated_outputs(self): - def f(x, y): - return x, y, x, y - - x = torch.randn(2) - y = torch.randn(3) - self._check_jacobian_vectorize_correctness(f, (x, y)) - - def test_jacobian_vectorize_correctness_zero_dim(self): - # zero-dim output - def f(x, y): - return x.sum(), y.sum(), x * y - - x = torch.randn(3) - y = torch.randn(3) - self._check_jacobian_vectorize_correctness(f, (x, y)) - - # zero-dim input - def g(x): - return torch.stack([x, x, x]) - - x = torch.randn([]) - self._check_jacobian_vectorize_correctness(g, x) - - # Mixed zero-dim input / zero-dim output - def h(x, y): - return y.sum(), x * y - - x = torch.randn([]) - y = torch.randn(1) - self._check_jacobian_vectorize_correctness(h, (x, y)) - - @unittest.skipIf(not TEST_CUDA, "test requires CUDA") - def test_jacobian_vectorize_correctness_different_devices(self): - def f(x, y): - return x * y, (x * y).cuda() - - x = torch.randn(3) - y = torch.randn(3) - self._check_jacobian_vectorize_correctness(f, (x, y)) - - def test_jacobian_vectorize_correctness_different_dtype(self): - def f(x, y): - return (x * y).float(), (x * y).double() - - x = torch.randn(3) - y = torch.randn(3) - # The Jacobian computed using forward AD has the dtype of the output - # but the Jacobian computed with reverse AD has dtype of input - self._check_jacobian_vectorize_correctness(f, (x, y), test_forward_ad=False) - - def 
_check_hessian_vectorize_correctness(self, f, inputs): - expected = autogradF.hessian(f, inputs, vectorize=False) - result = autogradF.hessian(f, inputs, vectorize=True) - self.assertEqual(result, expected) - - result_forward_mode = autogradF.hessian(f, inputs, outer_jacobian_strategy="forward-mode", vectorize=True) - self.assertEqual(result_forward_mode, expected) - - def test_hessian_vectorize_correctness_simple(self): - def f(x): - return (3 * x ** 2).sum() - - x = torch.randn(2, 3, 5) - self._check_hessian_vectorize_correctness(f, x) - - def test_hessian_vectorize_correctness_multi_input(self): - def f(x, y, z): - return ((x.relu() * x) @ y.sin() @ z).sum() - - x = torch.randn(2, 3) - y = torch.randn(3, 5) - z = torch.randn(5, 5) - self._check_hessian_vectorize_correctness(f, (x, y, z)) - - def test_hessian_vectorize_correctness_unrelated_outputs(self): - # output unrelated to one input - def f(x, y): - return (x ** 2).sum() - - x = torch.randn(2) - y = torch.randn(3) - self._check_hessian_vectorize_correctness(f, (x, y)) - - # output unrelated to all inputs - def f(x, y): - return torch.ones([]) - - x = torch.randn(2) - y = torch.randn(3) - self._check_hessian_vectorize_correctness(f, (x, y)) - - def _test_hessian_err_check(self, vectorize): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - def bar(a): - return 3 * a.narrow(0, 0, 3), "bar" - - def bar2(a): - return 3 * a.narrow(0, 0, 3) - - def bar3(a): - return 3 * a.narrow(0, 0, 3), 3 * a.narrow(0, 0, 3) - - inp = torch.rand(4) - with self.assertRaisesRegex(TypeError, "The inputs given to hessian must be either a Tensor"): - res = autogradF.hessian(foo, (inp, 2), vectorize=vectorize) - - with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to hessian must"): - res = autogradF.hessian(bar, inp, vectorize=vectorize) - - err_msg_out = "The Tensor returned by the function given to hessian should contain a single element" - with self.assertRaisesRegex(RuntimeError, err_msg_out): - res = autogradF.hessian(bar2, inp, vectorize=vectorize) - - with self.assertRaisesRegex(RuntimeError, "The function given to hessian should return a single Tensor"): - res = autogradF.hessian(bar3, inp, vectorize=vectorize) - - res = autogradF.hessian(foo, inp, vectorize=vectorize) - self._assert_interleaved_struct(res, inp, inp) - - def foo(a, b): - return (3 * b.narrow(0, 0, 3) * a.narrow(0, 0, 3)).sum() - - inp = (torch.rand(4), torch.rand(5)) - - res = autogradF.hessian(foo, inp, vectorize=vectorize) - self._assert_interleaved_struct(res, inp, inp) - - def test_hessian_err_check(self): - self._test_hessian_err_check(vectorize=False) - - def test_hessian_err_check_vectorize(self): - self._test_hessian_err_check(vectorize=True) - - def test_hessian_err_check_strict(self): - def foo(a): - return a.detach().sum() - - def bar(a): - # Make a non-leaf Tensor that requires_grad but that is not connected to the input - return a.long().float().requires_grad_().clone().sum() - - def bar2(a): - # A Linear function for which the jacobian is independent of the input - return (3 * a).sum() - - inp = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): - res = autogradF.hessian(foo, inp, strict=True) - res = autogradF.hessian(foo, inp, strict=False) - self._assert_interleaved_struct(res, inp, inp) - self.assertEqual(res.abs().sum(), 0.) 
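# A small standalone sketch (not from the patch) of what the hessian tests
# above verify: for f(x) = (x ** 3).sum() the Hessian is diag(6 * x), and the
# vectorized code path is expected to agree with the non-vectorized one.
import torch
import torch.autograd.functional as autogradF

def f(x):
    return (x ** 3).sum()

x = torch.rand(4)
hess = autogradF.hessian(f, x, vectorize=False)
assert torch.allclose(hess, torch.diag(6 * x))
assert torch.allclose(autogradF.hessian(f, x, vectorize=True), hess)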
- - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0"): - res = autogradF.hessian(bar, inp, strict=True) - res = autogradF.hessian(bar, inp, strict=False) - self._assert_interleaved_struct(res, inp, inp) - self.assertEqual(res.abs().sum(), 0.) - - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0 is"): - res = autogradF.hessian(bar2, inp, strict=True) - res = autogradF.hessian(bar2, inp, strict=False) - self._assert_interleaved_struct(res, inp, inp) - self.assertEqual(res.abs().sum(), 0.) - - def test_hessian_err_check_strict_vectorize(self): - def foo(x): - return (x ** 3).sum() - - inp = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "not supported together"): - res = autogradF.hessian(foo, inp, strict=True, vectorize=True) - - def test_hessian_no_grad(self): - def pow_reducer(x): - return x.pow(3).sum() - - inputs = torch.rand(2, 2) - with torch.no_grad(): - res = autogradF.hessian(pow_reducer, inputs) - self.assertIsNone(res[0][0].grad_fn) - self.assertIsNone(res[0][1].grad_fn) - self.assertIsNone(res[1][0].grad_fn) - self.assertIsNone(res[1][1].grad_fn) - self.assertNotEqual(res, torch.zeros(2, 2, 2)) - - with torch.no_grad(): - res = autogradF.hessian(pow_reducer, inputs, create_graph=True) - self.assertIsNotNone(res[0][0].grad_fn) - self.assertIsNotNone(res[0][1].grad_fn) - self.assertIsNotNone(res[1][0].grad_fn) - self.assertIsNotNone(res[1][1].grad_fn) - self.assertNotEqual(res, torch.zeros(2, 2, 2)) - - - def _test_hessian_output(self, vectorize): - def pow_reducer(x): - return x.pow(3).sum() - - inputs = torch.rand(2, 2) - res = autogradF.hessian(pow_reducer, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, inputs, inputs) - self.assertIsNone(res.grad_fn) - - def add_pow_reducer(x, y): - return (x + y).pow(3).sum() - - inputs = (torch.rand(2, 2), torch.rand(2, 2)) - res = autogradF.hessian(add_pow_reducer, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, inputs, inputs) - self.assertIsNone(res[0][0].grad_fn) - self.assertIsNone(res[0][1].grad_fn) - self.assertIsNone(res[1][0].grad_fn) - self.assertIsNone(res[1][1].grad_fn) - - def test_hessian_output(self): - self._test_hessian_output(vectorize=False) - - def test_hessian_output_vectorize(self): - self._test_hessian_output(vectorize=True) - - def _test_hessian_scalar(self, vectorize): - def reducer(x): - return x.sum() - inputs = torch.rand(4, 4) - res = autogradF.hessian(reducer, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, inputs, inputs) - - inputs = torch.rand([]) - res = autogradF.hessian(reducer, inputs, vectorize=vectorize) - self._assert_same_struct(res, inputs) - - def bad_reducer(x): - return x.sum().view(1, 1, 1) - inputs = torch.rand(4, 4) - res = autogradF.hessian(bad_reducer, inputs, vectorize=vectorize) - self._assert_interleaved_struct(res, inputs, inputs) - - def test_hessian_scalar(self): - return self._test_hessian_scalar(vectorize=False) - - def test_hessian_scalar_vectorize(self): - return self._test_hessian_scalar(vectorize=True) - - def _test_hessian_create_graph(self, vectorize): - def pow_reducer(x): - return x.pow(3).sum() - - inputs = torch.rand(2, 2, dtype=torch.double, requires_grad=True) - res = autogradF.hessian(pow_reducer, inputs, create_graph=True, vectorize=vectorize) - self._assert_interleaved_struct(res, inputs, inputs) - self.assertIsNotNone(res.grad_fn) - - gradcheck(lambda inp: autogradF.hessian(pow_reducer, 
inp, create_graph=True, vectorize=vectorize), inputs) - gradgradcheck(lambda inp: autogradF.hessian(pow_reducer, inp, create_graph=True, vectorize=vectorize), inputs) - - def add_pow_reducer(x, y): - return (x + y).pow(3).sum() - - inputs = (torch.rand(2, 2, dtype=torch.double, requires_grad=True), - torch.rand(2, 2, dtype=torch.double, requires_grad=True)) - res = autogradF.hessian(add_pow_reducer, inputs, create_graph=True, vectorize=vectorize) - self._assert_interleaved_struct(res, inputs, inputs) - self.assertIsNotNone(res[0][0].grad_fn) - self.assertIsNotNone(res[0][1].grad_fn) - self.assertIsNotNone(res[1][0].grad_fn) - self.assertIsNotNone(res[1][1].grad_fn) - - def flatten(inp): - return tuple(el_lvl2 for el_lvl1 in inp for el_lvl2 in el_lvl1) - - gradcheck(lambda *inp: flatten(autogradF.hessian(add_pow_reducer, inp, create_graph=True, vectorize=vectorize)), inputs) - gradgradcheck(lambda *inp: flatten(autogradF.hessian(add_pow_reducer, inp, create_graph=True, vectorize=vectorize)), inputs) - - def foo(x, y): - x = x.cos() - val, hess = autogradF.hessian(add_pow_reducer, (x, y), create_graph=True, vectorize=vectorize) - - res = val[0].cos().sum() + val[1].cos().sum() + hess[0].cos().sum() - res = res + hess[1].cos().sum() + x.cos().sum() + y.cos().sum() - return res - - gradcheck(foo, inputs) - gradgradcheck(foo, inputs) - - def test_hessian_create_graph(self): - self._test_hessian_create_graph(vectorize=False) - - def test_hessian_create_graph_vectorize(self): - self._test_hessian_create_graph(vectorize=True) - - def test_vhp_err_check(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - def bar(a): - return 3 * a.narrow(0, 0, 3), "bar" - - def bar2(a): - return 3 * a.narrow(0, 0, 3) - - inp = torch.rand(4) - v = torch.rand(4) - with self.assertRaisesRegex(TypeError, "The inputs given to vhp must be either a Tensor"): - res = autogradF.vhp(foo, (inp, 2), v) - - with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to vhp must"): - res = autogradF.vhp(bar, inp, v) - - err_msg_out = "The Tensor returned by the function given to vhp should contain a single element" - with self.assertRaisesRegex(RuntimeError, err_msg_out): - res = autogradF.vhp(bar2, inp, v) - - with self.assertRaisesRegex(RuntimeError, "v has invalid size:"): - res = autogradF.vhp(foo, inp, torch.rand(5)) - - with self.assertRaisesRegex(TypeError, "The v given to vhp must be either a Tensor or a tuple of Tensors"): - res = autogradF.vhp(foo, inp, (v, 2)) - - res = autogradF.vhp(foo, inp, v) - self._assert_same_struct(res[1], inp) - - def foo(a, b): - return (3 * b.narrow(0, 0, 3) * a.narrow(0, 0, 3)).sum() - - inp = (torch.rand(4), torch.rand(5)) - v = (torch.rand(4), torch.rand(5)) - - res = autogradF.vhp(foo, inp, v) - self._assert_same_struct(res[1], inp) - - def test_vhp_err_check_strict(self): - def foo(a): - return a.detach().sum() - - def bar(a): - # Make a non-leaf Tensor that requires_grad but that is not connected to the input - return a.long().float().requires_grad_().clone().sum() - - def bar2(a): - # A Linear function for which the jacobian is independent of the input - return (3 * a).sum() +class Foo(torch.autograd.Function): + @staticmethod + def forward(ctx, x): + return x.clone() - inp = torch.rand(4) - v = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): - res = autogradF.vhp(foo, inp, v, strict=True) - res = autogradF.vhp(foo, inp, v, strict=False) - 
self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) + @staticmethod + def forward(ctx, gO): + return gO.clone() - with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): - res = autogradF.vhp(bar, inp, v, strict=True) - res = autogradF.vhp(bar, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) +def get_out(): + inp = torch.rand(2, requires_grad=True) - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0 is"): - res = autogradF.vhp(bar2, inp, v, strict=True) - res = autogradF.vhp(bar2, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) + # The python function is first so that it runs + # last in the backward pass + right = Foo.apply(inp) - def test_vhp_no_grad(self): - def reducer(x): - return x.exp().sum() - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - with torch.no_grad(): - res = autogradF.vhp(reducer, inputs, v) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) + # An op that creates new memory + left1 = inp.clone() + # An op that saves its input + left2 = left1 ** 2 - with torch.no_grad(): - res = autogradF.vhp(reducer, inputs, v, create_graph=True) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) - - def test_vhp_output(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - res = autogradF.vhp(foo, inputs, v) - self._assert_same_struct(res[1], inputs) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - - def bar(a, b): - return (a + 3 * b.narrow(0, 0, 3)).exp().sum() - - inputs = (torch.rand(3), torch.rand(4)) - v = (torch.ones(3), torch.ones(4)) - out, vhp_val = autogradF.vhp(bar, inputs, v) - self._assert_same_struct(vhp_val, inputs) - self.assertIsNone(out.grad_fn) - self.assertIsNone(vhp_val[0].grad_fn) - self.assertIsNone(vhp_val[1].grad_fn) - - def test_vhp_scalar(self): - def reducer(x): - return x.sum() - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - res = autogradF.vhp(reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - - inputs = torch.rand([]) - v = torch.rand([]) - res = autogradF.vhp(reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - - res = autogradF.vhp(reducer, inputs) - self._assert_same_struct(res[1], inputs) - - def bad_reducer(x): - return x.sum().view(1, 1, 1) - inputs = torch.rand(4, 4) - v = torch.rand(4, 4) - res = autogradF.vhp(bad_reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - - def test_vhp_create_graph(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - inputs = torch.rand(4, 4, dtype=torch.double, requires_grad=True) - v = torch.ones(4, 4, dtype=torch.double, requires_grad=True) - res = autogradF.vhp(foo, inputs, v, create_graph=True) - self._assert_same_struct(res[1], inputs) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - - gradcheck(lambda inp, v: autogradF.vhp(foo, inp, v, create_graph=True), (inputs, v)) - gradgradcheck(lambda inp, v: autogradF.vhp(foo, inp, v, create_graph=True), (inputs, v)) - - def bar(a, b): - return (a + 3 * b.narrow(0, 0, 3)).exp().sum() - - inputs = (torch.rand(3, dtype=torch.double, requires_grad=True), - torch.rand(4, dtype=torch.double, 
requires_grad=True)) - v = (torch.ones(3, dtype=torch.double, requires_grad=True), - torch.ones(4, dtype=torch.double, requires_grad=True)) - out, vhp_val = autogradF.vhp(bar, inputs, v, create_graph=True) - self._assert_same_struct(vhp_val, inputs) - self.assertIsNotNone(out.grad_fn) - self.assertIsNotNone(vhp_val[0].grad_fn) - self.assertIsNotNone(vhp_val[1].grad_fn) - - gradcheck(lambda *args: autogradF.vhp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) - gradgradcheck(lambda *args: autogradF.vhp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) - - def foo(*args): - x, y = args[:2] - v = args[2:] - - x = x.cos() - val, grad = autogradF.vhp(bar, (x, y), v, create_graph=True) - - return val.cos() + grad[0].cos().sum() + grad[1].cos() + x.cos().sum() + y.cos() - - gradcheck(foo, inputs + v) - gradgradcheck(foo, inputs + v) - - def test_hvp_err_check(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - def bar(a): - return 3 * a.narrow(0, 0, 3), "bar" - - def bar2(a): - return 3 * a.narrow(0, 0, 3) - - inp = torch.rand(4) - v = torch.rand(4) - res = autogradF.hvp(foo, inp, v) - with self.assertRaisesRegex(TypeError, "The inputs given to hvp must be either a Tensor"): - res = autogradF.hvp(foo, (inp, 2), v) - - with self.assertRaisesRegex(TypeError, "The outputs of the user-provided function given to hvp must"): - res = autogradF.hvp(bar, inp, v) - - err_msg_out = "The Tensor returned by the function given to hvp should contain a single element" - with self.assertRaisesRegex(RuntimeError, err_msg_out): - res = autogradF.hvp(bar2, inp, v) - - with self.assertRaisesRegex(RuntimeError, "v has invalid size:"): - res = autogradF.hvp(foo, inp, torch.rand(5)) - - with self.assertRaisesRegex(TypeError, "The v given to hvp must be either a Tensor or a tuple of Tensors"): - res = autogradF.hvp(foo, inp, (v, 2)) - - res = autogradF.hvp(foo, inp, v) - self._assert_same_struct(res[1], inp) - - def foo(a, b): - return (3 * b.narrow(0, 0, 3) * a.narrow(0, 0, 3)).sum() - - inp = (torch.rand(4), torch.rand(5)) - v = (torch.rand(4), torch.rand(5)) - - res = autogradF.hvp(foo, inp, v) - self._assert_same_struct(res[1], inp) - - def test_hvp_err_check_strict(self): - def foo(a): - return a.detach().sum() - - def bar(a): - # Make a non-leaf Tensor that requires_grad but that is not connected to the input - return a.long().float().requires_grad_().clone().sum() - - def bar2(a): - # A Linear function for which the jacobian is independent of the input - return (3 * a).sum() - - inp = torch.rand(4) - v = torch.rand(4) - with self.assertRaisesRegex(RuntimeError, "Output 0 of the user-provided function does not require gradients."): - res = autogradF.hvp(foo, inp, v, strict=True) - res = autogradF.hvp(foo, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) - - with self.assertRaisesRegex(RuntimeError, "The output of the user-provided function is independent of input 0"): - res = autogradF.hvp(bar, inp, v, strict=True) - res = autogradF.hvp(bar, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) - - with self.assertRaisesRegex(RuntimeError, "jacobian of the user-provided function with respect to input 0 is"): - res = autogradF.hvp(bar2, inp, v, strict=True) - res = autogradF.hvp(bar2, inp, v, strict=False) - self._assert_same_struct(res[1], inp) - self.assertEqual(res[1].abs().sum(), 0.) 
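# A small standalone sketch (not from the patch) of the identities behind the
# vhp/hvp tests: for a scalar-valued function, hvp(f, x, v) computes H @ v and
# vhp(f, x, v) computes v @ H, so both can be checked against the full Hessian.
import torch
import torch.autograd.functional as autogradF

def f(x):
    return (x.exp() * x).sum()

x = torch.rand(4)
v = torch.rand(4)

H = autogradF.hessian(f, x)
_, hvp_val = autogradF.hvp(f, x, v)
_, vhp_val = autogradF.vhp(f, x, v)

assert torch.allclose(hvp_val, H @ v)
assert torch.allclose(vhp_val, v @ H)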
- - def test_hvp_no_grad(self): - def reducer(x): - return x.exp().sum() - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - with torch.no_grad(): - res = autogradF.hvp(reducer, inputs, v) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) + # Inplace modify so that the backward for + # left2 always raises an error + left1 += 1 - with torch.no_grad(): - res = autogradF.hvp(reducer, inputs, v, create_graph=True) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - self.assertNotEqual(res[1], torch.zeros(4, 4)) - - def test_hvp_output(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - res = autogradF.hvp(foo, inputs, v) - self._assert_same_struct(res[1], inputs) - self.assertIsNone(res[0].grad_fn) - self.assertIsNone(res[1].grad_fn) - - def bar(a, b): - return (a + 3 * b.narrow(0, 0, 3)).exp().sum() - - inputs = (torch.rand(3), torch.rand(4)) - v = (torch.ones(3), torch.ones(4)) - out, hvp_val = autogradF.hvp(bar, inputs, v) - self._assert_same_struct(hvp_val, inputs) - self.assertIsNone(out.grad_fn) - self.assertIsNone(hvp_val[0].grad_fn) - self.assertIsNone(hvp_val[1].grad_fn) - - def test_hvp_scalar(self): - def reducer(x): - return x.exp().sum() - inputs = torch.rand(4, 4) - v = torch.ones(4, 4) - res = autogradF.hvp(reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - - inputs = torch.rand([]) - v = torch.rand([]) - res = autogradF.hvp(reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - - res = autogradF.hvp(reducer, inputs) - self._assert_same_struct(res[1], inputs) - - def bad_reducer(x): - return x.exp().sum().view(1, 1, 1) - inputs = torch.rand(4, 4) - v = torch.rand(4, 4) - res = autogradF.hvp(bad_reducer, inputs, v) - self._assert_same_struct(res[1], inputs) - - def test_hvp_create_graph(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() - - inputs = torch.rand(4, 4, dtype=torch.double, requires_grad=True) - v = torch.ones(4, 4, dtype=torch.double, requires_grad=True) - res = autogradF.hvp(foo, inputs, v, create_graph=True) - self._assert_same_struct(res[1], inputs) - self.assertIsNotNone(res[0].grad_fn) - self.assertIsNotNone(res[1].grad_fn) - - gradcheck(lambda inp, v: autogradF.hvp(foo, inp, v, create_graph=True), (inputs, v)) - gradgradcheck(lambda inp, v: autogradF.hvp(foo, inp, v, create_graph=True), (inputs, v)) - - def bar(a, b): - return (a + 3 * b.narrow(0, 0, 3)).exp().sum() - - inputs = (torch.rand(3, dtype=torch.double, requires_grad=True), - torch.rand(4, dtype=torch.double, requires_grad=True)) - v = (torch.ones(3, dtype=torch.double, requires_grad=True), - torch.ones(4, dtype=torch.double, requires_grad=True)) - out, hvp_val = autogradF.hvp(bar, inputs, v, create_graph=True) - self._assert_same_struct(hvp_val, inputs) - self.assertIsNotNone(out.grad_fn) - self.assertIsNotNone(hvp_val[0].grad_fn) - self.assertIsNotNone(hvp_val[1].grad_fn) - - gradcheck(lambda *args: autogradF.hvp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) - gradgradcheck(lambda *args: autogradF.hvp(bar, args[:2], args[2:], create_graph=True)[1], inputs + v) - - def foo(*args): - x, y = args[:2] - v = args[2:] - - x = x.cos() - val, grad = autogradF.hvp(bar, (x, y), v, create_graph=True) - - return val.cos() + grad[0].cos().sum() + grad[1].cos() + x.cos().sum() + y.cos() - - gradcheck(foo, inputs + v) - gradgradcheck(foo, inputs + v) - - def test_jacobian_match_vjp_jvp(self): - def 
foo(x): - return x ** 3 + x.sum() + # An op that takes both side as input. + # After running, both side's last op will be in + # the ready queue + # And the op for left will run first as it was + # executed last during the forward + out = left2 + right - inputs = torch.rand(4) - v = torch.rand(4) + return out - jac = autogradF.jacobian(foo, inputs) - jvp = autogradF.jvp(foo, inputs, v)[1] - vjp = autogradF.vjp(foo, inputs, v)[1] +# Nothing should be global variables here as, from what +# I can see, python leaks all the global objects +get_out().sum().backward() - self.assertEqual(jvp, torch.mm(jac, v.unsqueeze(1)).squeeze(1)) - self.assertEqual(vjp, torch.mm(v.unsqueeze(0), jac).squeeze(0)) +# This used to deadlock when the PyNode is being destroyed after +# the error is raised. +""" + try: + subprocess.check_output( + [sys.executable, '-c', script], + stderr=subprocess.STDOUT, + # On Windows, opening the subprocess with the default CWD makes `import torch` + # fail, so just set CWD to this script's directory + cwd=os.path.dirname(os.path.realpath(__file__)), + # It is ok to have an extra long timeout here as a timeout means the test failed + timeout=20) + except subprocess.TimeoutExpired as e: + self.fail(msg="Example code timed out! See the code sample in the test for details.") + except subprocess.CalledProcessError as e: + err_msg = "RuntimeError: one of the variables needed for gradient computation" + self.assertTrue(err_msg in e.output.decode("utf-8")) - def test_hessian_match_vhp_hvp(self): - def foo(a): - return 3 * a.narrow(0, 0, 3).exp().sum() +def index_perm_variable(shape, max_indices): + if not isinstance(shape, tuple): + shape = (shape,) - inputs = torch.rand(4) - v = torch.rand(4) + index = torch.randperm(max_indices).narrow(0, 0, reduce(mul, shape)).view(shape) + return index - hes = autogradF.hessian(foo, inputs) - hvp = autogradF.hvp(foo, inputs, v)[1] - vhp = autogradF.vhp(foo, inputs, v)[1] +def bernoulli_scalar(): + return torch.tensor(0, dtype=torch.uint8).bernoulli_() - self.assertEqual(hvp, torch.mm(hes, v.unsqueeze(1)).squeeze(1)) - self.assertEqual(vhp, torch.mm(v.unsqueeze(0), hes).squeeze(0)) class TestAutogradForwardModeBatchedGrad(TestCase): def test_out_of_place_basic(self): @@ -7939,13 +6672,16 @@ class MySubclass(torch.Tensor): def __new__(cls, data=None): return torch.Tensor._make_subclass(cls, data) + __torch_function__ = torch._C._disabled_torch_function_impl + @classmethod def __torch_dispatch__(cls, func, types, args=(), kwargs=None): - if func == torch.ops.aten.alias: + if func.overloadpacket == torch.ops.aten.alias: counter[0] += 1 - with no_dispatch(): - return MySubclass(torch.ops.aten.alias(*args)) + # Make sure autograd is not disabled here + foo = torch.rand(1, requires_grad=True) + self.assertIsNotNone(foo.exp().grad_fn) with no_dispatch(): return func(*args, **kwargs) @@ -7954,10 +6690,11 @@ def __torch_dispatch__(cls, func, types, args=(), kwargs=None): s = MySubclass(a) with fwAD.dual_level(): + # Only the primal has "alias" called on it fwAD.make_dual(s, torch.rand_like(s)) self.assertEqual(counter[0], 1) fwAD.make_dual(torch.rand_like(s), s) - self.assertEqual(counter[0], 2) + self.assertEqual(counter[0], 1) def test_print(self): with fwAD.dual_level() as level: @@ -8760,7 +7497,7 @@ def test_copy_(self, device): # At the time of writing this test, copy_ is not generated from native_functions.yaml # there was a bug that bfloat16 was not recognized as floating. 
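# A brief standalone sketch (not from the patch) of the forward-mode AD API
# that the dual-tensor tests above rely on: make_dual attaches a tangent to a
# primal inside a dual_level context, and unpack_dual recovers the tangent of
# the output, i.e. the Jacobian-vector product computed alongside the primal.
import torch
import torch.autograd.forward_ad as fwAD

x = torch.randn(3)
tangent = torch.randn(3)

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, tangent)
    dual_out = dual_x.exp()
    primal_out, out_tangent = fwAD.unpack_dual(dual_out)

# d/dx exp(x) applied to the tangent is exp(x) * tangent
assert torch.allclose(out_tangent, x.exp() * tangent)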
x = torch.randn(10, device=device, requires_grad=True) - floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point] + floating_dt = floating_types_and(torch.half, torch.bfloat16) for dt in floating_dt: y = torch.empty(10, device=device, dtype=dt) y.copy_(x) @@ -9722,6 +8459,7 @@ def fn(x1, x2): # the suppressions. from autograd.test_complex import TestAutogradComplex # noqa: F401 +from autograd.test_functional import TestAutogradFunctional # noqa: F401 # e.g., TestAutogradDeviceTypeCPU and TestAutogradDeviceTypeCUDA instantiate_device_type_tests( diff --git a/test/test_binary_ufuncs.py b/test/test_binary_ufuncs.py index f51de26948862e..8407aee05e9505 100644 --- a/test/test_binary_ufuncs.py +++ b/test/test_binary_ufuncs.py @@ -13,6 +13,7 @@ import operator from functools import partial +import torch.autograd.forward_ad as fwAD from torch._six import inf, nan from torch.testing._internal.common_utils import ( TestCase, slowTest, iter_indices, TEST_WITH_ASAN, run_tests, gradcheck, @@ -23,216 +24,29 @@ skipCUDAIfRocm, skipIf, ops, OpDTypes, skipMeta) from torch.testing import make_tensor from torch.testing._internal.common_dtype import ( - all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes, - get_all_complex_dtypes, get_all_fp_dtypes, + all_types_and_complex_and, all_types_and, integral_types, complex_types, integral_types_and, + floating_types_and, floating_and_complex_types, get_all_math_dtypes, ) from torch.testing._internal.common_methods_invocations import ( - binary_ufuncs, _NOTHING) + binary_ufuncs, _NOTHING, + generate_elementwise_binary_tensors, + generate_elementwise_binary_small_value_tensors, + generate_elementwise_binary_large_value_tensors, + generate_elementwise_binary_extremal_value_tensors, + generate_elementwise_binary_broadcasting_tensors, + generate_elementwise_binary_with_scalar_samples +) if TEST_SCIPY: import scipy.special import scipy.integrate -# TODO: remove this -def _generate_input(shape, dtype, device, with_extremal): - if shape == (): - x = torch.tensor((), dtype=dtype, device=device) - else: - if dtype.is_floating_point or dtype.is_complex: - # work around torch.randn not being implemented for bfloat16 - if dtype == torch.bfloat16: - x = torch.randn(*shape, device=device) * random.randint(30, 100) - x = x.to(torch.bfloat16) - else: - x = torch.randn(*shape, dtype=dtype, device=device) * random.randint(30, 100) - x[torch.randn(*shape) > 0.5] = 0 - if with_extremal and dtype.is_floating_point: - # Use extremal values - x[torch.randn(*shape) > 0.5] = float('nan') - x[torch.randn(*shape) > 0.5] = float('inf') - x[torch.randn(*shape) > 0.5] = float('-inf') - elif with_extremal and dtype.is_complex: - x[torch.randn(*shape) > 0.5] = complex('nan') - x[torch.randn(*shape) > 0.5] = complex('inf') - x[torch.randn(*shape) > 0.5] = complex('-inf') - elif dtype == torch.bool: - x = torch.zeros(shape, dtype=dtype, device=device) - x[torch.randn(*shape) > 0.5] = True - else: - x = torch.randint(15, 100, shape, dtype=dtype, device=device) - - return x - -# TODO: refactor this out -# Converts half/bfloat16 dtype to float when device is cpu -def _convert_t(dtype, device): - if device == 'cpu' and dtype in {torch.half, torch.bfloat16}: - return torch.float - return dtype - -# TODO: revise the tests to use make_tensor in common_utils.py instead -# Returns a tensor of the requested shape, dtype, and device -# Requesting a half CPU tensor returns a float CPU tensor with -# values representable by a half. 
-# Initialization uses randint for non-float types and randn for float types. -def _make_tensor(shape, dtype, device, fill_ones=False) -> torch.Tensor: - # Returns a tensor filled with ones - if fill_ones: - return torch.ones(*shape, dtype=_convert_t(dtype, device), device=device) - - # Returns a tensor with random integer values - if not (dtype.is_floating_point or dtype.is_complex): - t = torch.randint(0, 10, shape, device=device) - if dtype != torch.uint8: - t = t - 5 # generate negative values also - return t.to(_convert_t(dtype, device)) - - # Populates the CPU tensor with floats representable as half/bfloat16 - if dtype == torch.half and device == 'cpu': - return torch.randn(*shape, dtype=torch.float, device=device).half().float() - if dtype == torch.bfloat16 and device == 'cpu': - return torch.randn(*shape, dtype=torch.float, device=device).bfloat16().float() - - # Default: returns a tensor with random float values - return torch.randn(shape, dtype=dtype, device=device).to(dtype=dtype) - # TODO: update to use opinfos consistently class TestBinaryUfuncs(TestCase): # Generic tests for elementwise binary (AKA binary universal (u) functions (funcs)) # TODO: below contiguous tensor results are compared with a variety of noncontiguous results. # It would be interesting to have the lhs and rhs have different discontiguities. - # Returns a pair of iterables of contiguous tensors on the requested device - # and with the requested dtype. - # - # This function is intended to test the non-vectorized and vectorized code - # paths of unary functions, as well as their handling of odd tensor - # sizes (like zero-dim tensors and tensors with zero elements). - # - # Each iterable will include an a tensor with no elements, - # zero dim (scalar) tensors, small 1D tensors, a medium 1D tensor, and - # a large 2D tensor. - def _generate_numeric_tensors(self, op, *, device, dtype, lhs_kwargs, rhs_kwargs): - lhs_tensors = [] - rhs_tensors = [] - - shapes = ((0,), # tensors with no elements - (1, 0, 3), - # zero dim (scalar) tensor - (), - # small 1D tensor - (20,), - # medium 1D tensor - (812,), - # large 2D tensor - (1029, 917)) - - for kwargs, tensors in ((lhs_kwargs, lhs_tensors), (rhs_kwargs, rhs_tensors)): - for shape in shapes: - tensors.append(make_tensor(shape, dtype=dtype, device=device, **kwargs)) - - return lhs_tensors, rhs_tensors - - # Returns a pair of iterables of contiguous tensors on the requested device and with - # the requested dtype. - # - # Unlike the previous function, the values in these tensors are specified manually. 
- def _generate_interesting_small_valued_tensors(self, device, dtype): - # defines interesting values - _unsigned_int_vals = (0, 1, 55, 127, 128, 190, 210, 220, 254, 255, 256) - _int_vals = (0, -1, 1, -55, 55, -127, 127, -128, 128) - _float_vals = (0., - -.001, .001, - -.25, .25, - -1., 1., - -math.pi / 2, math.pi / 2, - -math.pi + .00001, math.pi - .00001, - -math.pi, math.pi, - -math.pi - .00001, math.pi + .00001) - - l_vals = [] - r_vals = [] - - if dtype.is_floating_point: - prod = product(_float_vals, _float_vals) - elif dtype.is_complex: - complex_vals = product(_float_vals, _float_vals) - # Note the use of list is required here or the map generator will be - # emptied by the following product and it won't produce the desired cross-product - complex_vals = list(map(lambda x: complex(*x), complex_vals)) - prod = product(complex_vals, complex_vals) - elif dtype in (torch.int8, torch.int16, torch.int32, torch.int64): - prod = product(_int_vals, _int_vals) - elif dtype is torch.uint8: - prod = product(_unsigned_int_vals, _unsigned_int_vals) - else: - raise ValueError("Unsupported dtype!") - - for l, r in prod: - l_vals.append(l) - r_vals.append(r) - - lhs = torch.tensor(l_vals, device=device, dtype=dtype) - rhs = torch.tensor(r_vals, device=device, dtype=dtype) - - return lhs, rhs - - def _generate_interesting_large_valued_tensors(self, device, dtype): - _large_int_vals = (-1113, 1113, -10701, 10701) - _large_float16_vals = (-501, 501, -1001.2, 1001.2, -13437.7, 13437.7) - _large_float_vals = _large_float16_vals + (-4988429.2, 4988429.2, -1e20, 1e20) - - l_vals = [] - r_vals = [] - - if dtype == torch.float16: - prod = product(_large_float16_vals, _large_float16_vals) - elif dtype.is_floating_point: - prod = product(_large_float_vals, _large_float_vals) - elif dtype.is_complex: - complex_vals = product(_large_float_vals, _large_float_vals) - # Note the use of list is required here or the map generator will be - # emptied by the following product and it won't produce the desired cross-product - complex_vals = list(map(lambda x: complex(*x), complex_vals)) - prod = product(complex_vals, complex_vals) - elif dtype in (torch.int16, torch.int32, torch.int64): - prod = product(_large_int_vals, _large_int_vals) - else: - raise ValueError("Unsupported dtype!") - - for l, r in prod: - l_vals.append(l) - r_vals.append(r) - lhs = torch.tensor(l_vals, device=device, dtype=dtype) - rhs = torch.tensor(r_vals, device=device, dtype=dtype) - - return lhs, rhs - - def _generate_interesting_extremal_valued_tensors(self, device, dtype): - _float_extremals = (float('inf'), float('-inf'), float('nan')) - - l_vals = [] - r_vals = [] - - if dtype.is_floating_point: - prod = product(_float_extremals, _float_extremals) - elif dtype.is_complex: - complex_vals = product(_float_extremals, _float_extremals) - # Note the use of list is required here or the map generator will be - # emptied by the following product and it won't produce the desired cross-product - complex_vals = list(map(lambda x: complex(*x), complex_vals)) - prod = product(complex_vals, complex_vals) - else: - raise ValueError("Unsupported dtype!") - - for l, r in prod: - l_vals.append(l) - r_vals.append(r) - lhs = torch.tensor(l_vals, device=device, dtype=dtype) - rhs = torch.tensor(r_vals, device=device, dtype=dtype) - - return lhs, rhs - # Helper for comparing torch tensors and NumPy arrays # TODO: should this or assertEqual also validate that strides are equal? 
def assertEqualHelper(self, actual, expected, msg, *, dtype, exact_dtype=True, **kwargs): @@ -263,7 +77,7 @@ def assertEqualHelper(self, actual, expected, msg, *, dtype, exact_dtype=True, * # Tests that the function and its (array-accepting) reference produce the same # values on given tensors - def _test_reference_numerics(self, dtype, op, tensor_pairs, equal_nan=True): + def _test_reference_numerics(self, dtype, op, gen, equal_nan=True): def _helper_reference_numerics(expected, actual, msg, exact_dtype, equal_nan=True): if not torch.can_cast(numpy_to_torch_dtype_dict[expected.dtype.type], dtype): exact_dtype = False @@ -275,19 +89,27 @@ def _helper_reference_numerics(expected, actual, msg, exact_dtype, equal_nan=Tru else: self.assertEqualHelper(actual, expected, msg, dtype=dtype, equal_nan=equal_nan, exact_dtype=exact_dtype) - for l, r in tensor_pairs: - if dtype is torch.bfloat16: - l_numpy = l.cpu().to(torch.float32).numpy() - r_numpy = r.cpu().to(torch.float32).numpy() - else: - l_numpy = l.cpu().numpy() - r_numpy = r.cpu().numpy() + for sample in gen: + # Each sample input acquired from the generator is just one lhs tensor + # and one rhs tensor + l = sample.input + r = sample.args[0] + + np_input, np_args, np_kwargs = sample.numpy() + l_numpy = np_input + r_numpy = np_args[0] actual = op(l, r) expected = op.ref(l_numpy, r_numpy) # Crafts a custom error message for smaller, printable tensors - if l.numel() < 10 and r.numel() < 10: + def _numel(x): + if isinstance(x, torch.Tensor): + return x.numel() + # Assumes x is a scalar + return 1 + + if _numel(l) < 10 and _numel(r) < 10: msg = ("Failed to produce expected results! Input lhs tensor was" " {0}, rhs tensor was {1}, torch result is {2}, and reference result is" " {3}.").format(l, r, actual, expected) @@ -307,13 +129,8 @@ def _helper_reference_numerics(expected, actual, msg, exact_dtype, equal_nan=Tru @ops(binary_ufuncs_with_references) def test_reference_numerics(self, device, dtype, op): - lhs_tensors, rhs_tensors = self._generate_numeric_tensors(op, - device=device, - dtype=dtype, - lhs_kwargs=op.lhs_make_tensor_kwargs, - rhs_kwargs=op.rhs_make_tensor_kwargs) - - self._test_reference_numerics(dtype, op, zip(lhs_tensors, rhs_tensors), equal_nan=True) + gen = generate_elementwise_binary_tensors(op, device=device, dtype=dtype) + self._test_reference_numerics(dtype, op, gen, equal_nan=True) # runtime error: 128 is outside the range of representable values of type 'signed char' @unittest.skipIf(TEST_WITH_ASAN, "Skipped under ASAN") @@ -322,8 +139,8 @@ def test_reference_numerics_small_values(self, device, dtype, op): if dtype is torch.bool: self.skipTest("Doesn't support bool!") - lhs, rhs = self._generate_interesting_small_valued_tensors(device, dtype) - self._test_reference_numerics(dtype, op, ((lhs, rhs),), equal_nan=True) + gen = generate_elementwise_binary_small_value_tensors(op, device=device, dtype=dtype) + self._test_reference_numerics(dtype, op, gen, equal_nan=True) # TODO: review if this skip is necessary @unittest.skipIf(TEST_WITH_ASAN, "Skipped under ASAN") @@ -331,8 +148,8 @@ def test_reference_numerics_small_values(self, device, dtype, op): allowed_dtypes=(torch.int16, torch.int32, torch.int64, torch.float16, torch.bfloat16, torch.float32, torch.float64, torch.complex64, torch.complex128)) def test_reference_numerics_large_values(self, device, dtype, op): - lhs, rhs = self._generate_interesting_large_valued_tensors(device, dtype) - self._test_reference_numerics(dtype, op, ((lhs, rhs),), equal_nan=True) + gen = 
generate_elementwise_binary_large_value_tensors(op, device=device, dtype=dtype) + self._test_reference_numerics(dtype, op, gen, equal_nan=True) # TODO: review if this skip is necessary @unittest.skipIf(TEST_WITH_ASAN, "Skipped under ASAN") @@ -340,58 +157,19 @@ def test_reference_numerics_large_values(self, device, dtype, op): allowed_dtypes=(torch.float16, torch.bfloat16, torch.float32, torch.float64, torch.complex64, torch.complex128)) def test_reference_numerics_extremal_values(self, device, dtype, op): - lhs, rhs = self._generate_interesting_extremal_valued_tensors(device, dtype) - self._test_reference_numerics(dtype, op, ((lhs, rhs),), equal_nan=True) + gen = generate_elementwise_binary_extremal_value_tensors(op, device=device, dtype=dtype) + self._test_reference_numerics(dtype, op, gen, equal_nan=True) # tests broadcasting and noncontiguous broadcasting behavior @ops(binary_ufuncs_with_references, allowed_dtypes=(torch.long, torch.float32,)) def test_broadcasting(self, device, dtype, op): - shapes = ( - ((1,), ()), - ((2,), ()), - ((1,), (2,)), - ((2,), (2,)), - ((2, 1), (2,)), - ((1, 2), (2,)), - ((3, 2), (2,)), - ((3, 2), (3, 2)), - ((1, 3, 2), (2,)), - ((1, 3, 2), (3, 2)), - ((3, 1, 2), (3, 2)), - ((1, 3, 2), (1, 3, 2)), - ((2, 3, 2), ()), - ((2, 3, 2), (2, 3, 2)), - ((3, 1, 2), (1, 3, 2)), - ) - - for shape, noncontiguous in product(shapes, [True, False]): - shape_lhs, shape_rhs = shape - lhs = make_tensor(shape_lhs, device=device, dtype=dtype, - noncontiguous=noncontiguous, **op.lhs_make_tensor_kwargs) - rhs = make_tensor(shape_rhs, device=device, dtype=dtype, - noncontiguous=noncontiguous, **op.rhs_make_tensor_kwargs) - - actual = op(lhs, rhs) - expected = op.ref(lhs.cpu().numpy(), rhs.cpu().numpy()) - - self.assertEqual(actual, expected, exact_dtype=False) - - @ops(binary_ufuncs, allowed_dtypes=(torch.long, torch.float32,)) - def test_broadcast_python_scalar(self, device, dtype, op): - for shape_lhs in ((), (1,), (2,), (1, 2, 3),): - lhs = make_tensor(shape_lhs, device=device, dtype=dtype, **op.lhs_make_tensor_kwargs) + gen = generate_elementwise_binary_broadcasting_tensors(op, device=device, dtype=dtype) + self._test_reference_numerics(dtype, op, gen, equal_nan=True) - rhs_tensor = make_tensor((), device=device, dtype=dtype, **op.rhs_make_tensor_kwargs) - rhs_expanded = rhs_tensor.expand_as(lhs) - rhs_scalar = rhs_tensor.item() - - expected = op(lhs, rhs_expanded) - - actual_tensor = op(lhs, rhs_tensor) - actual_scalar = op(lhs, rhs_scalar) - - self.assertEqual(actual_tensor, expected) - self.assertEqual(actual_scalar, expected) + @ops(binary_ufuncs_with_references, allowed_dtypes=(torch.long, torch.float32, torch.complex64)) + def test_scalar_support(self, device, dtype, op): + gen = generate_elementwise_binary_with_scalar_samples(op, device=device, dtype=dtype) + self._test_reference_numerics(dtype, op, gen, equal_nan=True) @ops(binary_ufuncs) def test_contig_vs_every_other(self, device, dtype, op): @@ -932,7 +710,7 @@ def test_inplace_division(self, device): id_after = id(t) self.assertEqual(id_before, id_after) - @dtypes(*get_all_dtypes(include_bool=False, include_complex=False)) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_div_rounding_modes(self, device, dtype): if dtype.is_floating_point: low, high = -10.0, 10.0 @@ -1032,8 +810,7 @@ def test_divide_by_zero_rounding(self, device, dtype): actual = torch.divide(a, zero, rounding_mode=rounding_mode) self.assertEqual(actual, expect, exact_dtype=exact_dtype) - @dtypes(*get_all_dtypes( - include_bool=False, 
include_complex=False, include_bfloat16=False)) + @dtypes(*all_types_and(torch.half)) def test_div_rounding_numpy(self, device, dtype): info = (torch.finfo(dtype) if dtype.is_floating_point else torch.iinfo(dtype)) @@ -1485,7 +1262,7 @@ def test_pow_cuda_complex_extremal_failing(self, device, dtype): self.assertEqual(cpu_out, cuda_out) @onlyNativeDeviceTypes - @dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False))) + @dtypes(*all_types_and_complex_and(torch.half)) def test_complex_scalar_pow_tensor(self, device, dtype): complexes = [0.5j, 1. + 1.j, -1.5j, 2.2 - 1.6j, 1 + 0j] first_exp = make_tensor((100,), dtype=dtype, device=device, low=-2, high=2) @@ -1877,7 +1654,8 @@ def test_binary_ops_with_scalars(self, device): self.assertEqual(expected, python_op(first, second)) self.assertEqual(expected, torch_op(first, second)) - @dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False))) + @dtypes(*product(all_types_and(torch.half, torch.bfloat16, torch.bool), + all_types_and(torch.half, torch.bfloat16, torch.bool))) def test_maximum_minimum_type_promotion(self, device, dtypes): a = torch.tensor((0, 1), device=device, dtype=dtypes[0]) b = torch.tensor((1, 0), device=device, dtype=dtypes[1]) @@ -1885,7 +1663,7 @@ def test_maximum_minimum_type_promotion(self, device, dtypes): result = op(a, b) self.assertEqual(result.dtype, torch.result_type(a, b)) - @dtypes(*(get_all_int_dtypes() + [torch.bool])) + @dtypes(*integral_types_and(torch.bool)) def test_maximum_minimum_int_and_bool(self, device, dtype): ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum), (torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin)) @@ -1911,7 +1689,7 @@ def test_maximum_minimum_int_and_bool(self, device, dtype): self.assertEqual(out, numpy_result) @precisionOverride({torch.bfloat16: 1e-2}) - @dtypes(*(get_all_fp_dtypes())) + @dtypes(*(floating_types_and(torch.half, torch.bfloat16))) def test_maximum_minimum_float(self, device, dtype): ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum), (torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin)) @@ -1939,7 +1717,7 @@ def test_maximum_minimum_float(self, device, dtype): self.assertEqual(tensor_result, numpy_result, exact_dtype=False) self.assertEqual(out, numpy_result, exact_dtype=False) - @dtypes(*(get_all_fp_dtypes())) + @dtypes(*(floating_types_and(torch.half, torch.bfloat16))) def test_maximum_minimum_float_nan_and_inf(self, device, dtype): # np.maximum and np.minimum functions compare input arrays element-wisely. # if one of the elements being compared is a NaN, then that element is returned. 
@@ -1975,7 +1753,7 @@ def test_maximum_minimum_float_nan_and_inf(self, device, dtype): self.assertEqual(tensor_result, numpy_result) self.assertEqual(out, numpy_result) - @dtypes(*product(get_all_complex_dtypes(), get_all_dtypes())) + @dtypes(*product(complex_types(), all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) def test_maximum_minimum_complex(self, device, dtypes): for torch_op in (torch.maximum, torch.minimum, torch.max, torch.min, torch.fmax, torch.fmin): with self.assertRaisesRegex(RuntimeError, '.+not implemented for.+'): @@ -2017,7 +1795,8 @@ def test_maximum_minimum_cross_device(self, device): self.assertEqual(tensor_result_1, numpy_result_1) self.assertEqual(tensor_result_2, numpy_result_2) - @dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes())) + @dtypes(*product(floating_types_and(torch.half, torch.bfloat16), + floating_types_and(torch.half, torch.bfloat16))) def test_maximum_and_minimum_subgradient(self, device, dtypes): def run_test(f, a, b, expected_a_grad, expected_b_grad): a = torch.tensor(a, requires_grad=True, device=device, dtype=dtypes[0]) @@ -2030,6 +1809,33 @@ def run_test(f, a, b, expected_a_grad, expected_b_grad): run_test(torch.maximum, [0., 1., 2.], [1., 1., 1.], [0., 0.5, 1.], [1., 0.5, 0.]) run_test(torch.minimum, [0., 1., 2.], [1., 1., 1.], [1., 0.5, 0.], [0., 0.5, 1.]) + def test_maximum_minimum_forward_ad_float32(self, device): + # TODO: This should really be covered by OpInfo but it isn't. The problem + # is that our gradient tests test using float64 but it should also test + # float32 + x = torch.randn(3, device=device, dtype=torch.float32) + y = torch.randn(3, device=device, dtype=torch.float32) + tx = torch.randn(3, device=device, dtype=torch.float32) + ty = torch.randn(3, device=device, dtype=torch.float32) + + with fwAD.dual_level(): + x_dual = fwAD.make_dual(x, tx) + y_dual = fwAD.make_dual(y, ty) + result = torch.maximum(x_dual, y_dual) + _, result_tangent = fwAD.unpack_dual(result) + + expected = torch.where(x > y, tx, ty) + self.assertEqual(result_tangent, expected) + + with fwAD.dual_level(): + x_dual = fwAD.make_dual(x, tx) + y_dual = fwAD.make_dual(y, ty) + result = torch.minimum(x_dual, y_dual) + _, result_tangent = fwAD.unpack_dual(result) + + expected = torch.where(x < y, tx, ty) + self.assertEqual(result_tangent, expected) + # TODO: tests like this should be generic @dtypesIfCUDA(torch.half, torch.float, torch.double) @dtypes(torch.float, torch.double) @@ -2046,18 +1852,29 @@ def test_mul_intertype_scalar(self, device, dtype): self.assertEqual(x, 4.5) @onlyCPU - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_sub(self, device, dtype): - m1 = torch.tensor([2.34, 4.44], dtype=dtype, device=device) - m2 = torch.tensor([1.23, 2.33], dtype=dtype, device=device) + if dtype in integral_types(): + # Before Python 3.10, floats were implicitly converted to ints, but with + # DeprecationWarning: an integer is required (got type float). + # Implicit conversion to integers using __int__ is deprecated, + # and may be removed in a future version of Python. + # Since Python 3.10, that attempt gives an error. 
+ m1 = torch.tensor([2, 4], dtype=dtype, device=device) + m2 = torch.tensor([1, 2], dtype=dtype, device=device) + diff = torch.tensor([1, 2], dtype=dtype) + else: + m1 = torch.tensor([2.34, 4.44], dtype=dtype, device=device) + m2 = torch.tensor([1.23, 2.33], dtype=dtype, device=device) + diff = torch.tensor([1.11, 2.11], dtype=dtype) if dtype == torch.bool: self.assertRaises(RuntimeError, lambda: m1 - m2) elif (dtype == torch.bfloat16 or dtype == torch.half): # bfloat16 has a lower precision so we have to have a separate check for it - self.assertEqual(m1 - m2, torch.tensor([1.11, 2.11], dtype=dtype), atol=0.01, rtol=0) + self.assertEqual(m1 - m2, diff, atol=0.01, rtol=0) else: - self.assertEqual(m1 - m2, torch.tensor([1.11, 2.11], dtype=dtype)) + self.assertEqual(m1 - m2, diff) # TODO: what is this test testing? @onlyCPU @@ -2108,8 +1925,8 @@ def test_min_max_binary_op_nan(self, device, dtype): self.assertFalse(torch.isnan(ma[i]), "max(a, b): {}, a: {}, b: {}".format(ma[i], a[i], b[i])) self.assertFalse(torch.isnan(mi[i]), "min(a, b): {}, a: {}, b: {}".format(mi[i], a[i], b[i])) - @dtypes(*product(get_all_dtypes(include_complex=False), - get_all_dtypes(include_complex=False))) + @dtypes(*product(all_types_and(torch.half, torch.bfloat16, torch.bool), + all_types_and(torch.half, torch.bfloat16, torch.bool))) def test_copysign(self, device, dtypes): def _test_copysign_numpy(a, b): torch_result = torch.copysign(a, b) @@ -2126,7 +1943,7 @@ def _test_copysign_numpy(a, b): expected = torch.from_numpy(np.copysign(np_a, np_b)) # To handle inconsistencies of type promotion between PyTorch and Numpy # Applied for both arguments having integral precision and bfloat16 - types = [torch.bool, torch.bfloat16] + get_all_int_dtypes() + types = integral_types_and(torch.bool, torch.bfloat16) if a.dtype in types or b.dtype in types: promoted_type = torch.promote_types(torch_result.dtype, expected.dtype) torch_result = torch_result.to(promoted_type) @@ -2171,13 +1988,13 @@ def _test_copysign_numpy(a, b): for case in cases: _test_copysign_numpy(torch.tensor([case], device=device, dtype=dtypes[0]), b) - if dtypes[1] in get_all_fp_dtypes(): + if dtypes[1] in floating_types_and(torch.half, torch.bfloat16): a = make_tensor((10, 10), device=device, dtype=dtypes[0], low=-9, high=9) for case in cases: _test_copysign_numpy(a, torch.tensor([case], device=device, dtype=dtypes[1])) - @dtypes(*product(get_all_fp_dtypes(), - get_all_fp_dtypes())) + @dtypes(*product(floating_types_and(torch.half, torch.bfloat16), + floating_types_and(torch.half, torch.bfloat16))) def test_copysign_subgradient(self, device, dtypes): # Input is 0.0 x = torch.tensor([0.0, 0.0, 0.0], dtype=dtypes[0], device=device, requires_grad=True) @@ -2317,7 +2134,7 @@ def test_rdiv(self, device, dtype): z = torch.tensor([30 / v.item() for v in x], device=device) self.assertEqual(y, z, exact_dtype=False) - @dtypes(*get_all_fp_dtypes(include_bfloat16=False)) + @dtypes(*floating_types_and(torch.half)) def test_fmod_remainder_by_zero_float(self, device, dtype): fn_list = (torch.fmod, torch.remainder) for fn in fn_list: @@ -2329,7 +2146,7 @@ def test_fmod_remainder_by_zero_float(self, device, dtype): @onlyNativeDeviceTypes # Check Issue https://github.com/pytorch/pytorch/issues/48130 @skipCUDAIfRocm # Error happens on both ROCM and XLA - @dtypes(*get_all_int_dtypes()) + @dtypes(*integral_types()) def test_fmod_remainder_by_zero_integral(self, device, dtype): fn_list = (torch.fmod, torch.remainder) for fn in fn_list: @@ -2354,7 +2171,7 @@ def 
test_fmod_remainder_by_zero_integral(self, device, dtype): value = 255 if dtype == torch.uint8 else -1 self.assertTrue(torch.all(fn(x, zero) == value)) - @dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False)) + @dtypes(*all_types_and(torch.half)) def test_fmod_remainder(self, device, dtype): # Use numpy as reference def _helper(x, mod, fns_list): @@ -2391,7 +2208,7 @@ def _helper(x, mod, fns_list): # Mods: Integer, Float, Tensor, Non-contiguous Tensor mods = [3, 2.3, mod, mod.t()] # mod with floating-point dtype - if dtype in get_all_int_dtypes(): + if dtype in integral_types(): mod_float = make_tensor((10, 10), device=device, dtype=torch.float, low=-9, high=9) mod[mod == 0] = 1 mods.append(mod_float) @@ -2612,7 +2429,7 @@ def test_floor_divide_zero(self, device, dtype): a // b @unittest.skipIf(TEST_WITH_ASAN, "Integer overflows are not allowed under ASAN") - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_muldiv_scalar(self, device, dtype): x = make_tensor((10, 3), dtype=dtype, device=device, low=None, high=None) s = make_tensor((1,), dtype=dtype, device="cpu", low=None, high=None).item() @@ -2622,7 +2439,38 @@ def test_muldiv_scalar(self, device, dtype): self.assertEqual(x / s, x / y) self.assertEqual(s / x, y / x) - @dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2))) + # TODO: update make_tensor to support extremal additions and remove this in favor of make_tensor + def _generate_input(self, shape, dtype, device, with_extremal): + if shape == (): + x = torch.tensor((), dtype=dtype, device=device) + else: + if dtype.is_floating_point or dtype.is_complex: + # work around torch.randn not being implemented for bfloat16 + if dtype == torch.bfloat16: + x = torch.randn(*shape, device=device) * random.randint(30, 100) + x = x.to(torch.bfloat16) + else: + x = torch.randn(*shape, dtype=dtype, device=device) * random.randint(30, 100) + x[torch.randn(*shape) > 0.5] = 0 + if with_extremal and dtype.is_floating_point: + # Use extremal values + x[torch.randn(*shape) > 0.5] = float('nan') + x[torch.randn(*shape) > 0.5] = float('inf') + x[torch.randn(*shape) > 0.5] = float('-inf') + elif with_extremal and dtype.is_complex: + x[torch.randn(*shape) > 0.5] = complex('nan') + x[torch.randn(*shape) > 0.5] = complex('inf') + x[torch.randn(*shape) > 0.5] = complex('-inf') + elif dtype == torch.bool: + x = torch.zeros(shape, dtype=dtype, device=device) + x[torch.randn(*shape) > 0.5] = True + else: + x = torch.randint(15, 100, shape, dtype=dtype, device=device) + + return x + + @dtypes(*tuple(itertools.combinations_with_replacement(all_types_and_complex_and(torch.half, + torch.bfloat16, torch.bool), 2))) def test_comparison_ops_type_promotion_and_broadcasting(self, device, dtypes): # issue #42660 # testing all combinations of broadcasting and type promotion @@ -2658,8 +2506,8 @@ def compare_with_numpy_bin_op(torch_fn, np_fn, x, y, out=None): for size1 in input_sizes: size2 = (2,) + size1 # perform broadcasting for with_extremal in [False, True]: - a = _generate_input(size1, dtypes[0], device, with_extremal) - b = _generate_input(size2, dtypes[1], device, with_extremal) + a = self._generate_input(size1, dtypes[0], device, with_extremal) + b = self._generate_input(size2, dtypes[1], device, with_extremal) for torch_op, numpy_op in op_pairs: if (dtypes[0].is_complex or dtypes[1].is_complex) and torch_op in complex_op_denylist: continue @@ -2804,8 +2652,8 @@ def 
test_bitwise_shift_float(self, device): self.assertEqual(torch_op(a, 2.2), expected_op(a, 2.2)) @onlyNativeDeviceTypes - @dtypes(*list(product(get_all_dtypes(include_complex=False), - get_all_dtypes(include_complex=False)))) + @dtypes(*list(product(all_types_and(torch.half, torch.bfloat16, torch.bool), + all_types_and(torch.half, torch.bfloat16, torch.bool)))) def test_heaviside(self, device, dtypes): input_dtype = dtypes[0] values_dtype = dtypes[1] @@ -2864,8 +2712,7 @@ def test_heaviside_cross_device(self, device): with self.assertRaisesRegex(RuntimeError, 'Expected all tensors to be on the same device'): torch.heaviside(y, x) - @dtypes(*list(product(get_all_complex_dtypes(), - get_all_complex_dtypes()))) + @dtypes(*list(product(complex_types(), complex_types()))) def test_heaviside_complex(self, device, dtypes): input_dtype = dtypes[0] values_dtype = dtypes[1] @@ -2900,15 +2747,18 @@ def _test_logical(self, device, dtypes, op, a_, b_, expected_res_): getattr(a, op + '_')(b) self.assertEqual(expected_res, a) - @dtypes(*product(get_all_dtypes(), get_all_dtypes())) + @dtypes(*product(all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool), + all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) def test_logical_xor(self, device, dtypes): self._test_logical(device, dtypes, 'logical_xor', [10, 0, 1, 0], [1, 0, 0, 10], [0, 0, 1, 1]) - @dtypes(*product(get_all_dtypes(), get_all_dtypes())) + @dtypes(*product(all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool), + all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) def test_logical_and(self, device, dtypes): self._test_logical(device, dtypes, 'logical_and', [10, 0, 1, 0], [1, 0, 0, 10], [1, 0, 0, 0]) - @dtypes(*product(get_all_dtypes(), get_all_dtypes())) + @dtypes(*product(all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool), + all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) def test_logical_or(self, device, dtypes): self._test_logical(device, dtypes, 'logical_or', [10, 0, 1, 0], [1, 0, 0, 10], [1, 0, 1, 1]) @@ -3017,7 +2867,7 @@ def test_logaddexp2(self, device, dtype): self._test_logaddexp(device, dtype, base2=True) def test_add(self, device): - dtypes = [torch.float, torch.double] + get_all_complex_dtypes() + dtypes = floating_and_complex_types() for dtype in dtypes: # [res] torch.add([res,] tensor1, tensor2) m1 = torch.randn(100, 100, dtype=dtype, device=device) @@ -3219,7 +3069,7 @@ def test_bool_tensor_comparison_ops(self, device): torch.tensor([0, 1, 0, 1, 0, 1], dtype=torch.bool, device=device)) self.assertFalse(a.equal(b)) - @dtypes(*get_all_dtypes(include_complex=False)) + @dtypes(*all_types_and(torch.half, torch.bfloat16, torch.bool)) def test_logical(self, device, dtype): if dtype != torch.bool: x = torch.tensor([1, 2, 3, 4], device=device, dtype=dtype) @@ -3406,8 +3256,8 @@ def test_pow_scalar_overloads_mem_overlap(self, device, dtype): self.unary_check_input_output_mem_overlap( doubles, sz, lambda input, out: torch.pow(42, input, out=out)) - @dtypes(*list(product(get_all_dtypes(include_bool=False), - get_all_dtypes(include_bool=False)))) + @dtypes(*list(product(all_types_and_complex_and(torch.half, torch.bfloat16), + all_types_and_complex_and(torch.half, torch.bfloat16)))) def test_float_power(self, device, dtypes): def to_np(value): if isinstance(value, torch.Tensor) and value.dtype == torch.bfloat16: @@ -3503,8 +3353,8 @@ def _promo_helper(x, y): torch.Tensor.float_power_(base.clone(), exp) @skipIf(not TEST_SCIPY, "Scipy required for the 
test.") - @dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False), - get_all_dtypes(include_complex=False, include_bfloat16=False))) + @dtypes(*product(all_types_and(torch.half, torch.bool), + all_types_and(torch.half, torch.bool))) def test_xlogy_xlog1py(self, device, dtypes): x_dtype, y_dtype = dtypes @@ -3515,7 +3365,7 @@ def out_variant_helper(torch_fn, x, y): self.assertEqual(expected, out) def xlogy_inplace_variant_helper(x, y): - if x.dtype in get_all_int_dtypes() + [torch.bool]: + if x.dtype in integral_types_and(torch.bool): with self.assertRaisesRegex(RuntimeError, "can't be cast to the desired output type"): x.clone().xlogy_(y) @@ -3642,10 +3492,7 @@ def _compare_helper(x, y, torch_fn, reference_fn): _compare_helper(t, zeros, *xlog1py_fns) _compare_helper(t, 0., *xlog1py_fns) - @dtypes(*product(get_all_dtypes(include_complex=False, - include_half=False, include_bfloat16=False), - get_all_dtypes(include_complex=False, - include_half=False, include_bfloat16=False))) + @dtypes(*product(all_types_and(torch.bool), all_types_and(torch.bool))) @skipIf(not TEST_SCIPY, "Scipy required for the test.") @slowTest def test_zeta(self, device, dtypes): @@ -3733,20 +3580,11 @@ class UnknownType: torch.uint8 ] - # TODO: refactor to use make_tensor - def _small_2d(dtype, device, has_zeros=True, fill_ones=False, oneish=False): - t = _make_tensor((5, 5), dtype, device, fill_ones=fill_ones) - if oneish: - return t.clamp(min=_number(.99, 1, dtype), max=1.01) - if not has_zeros: - return t.clamp(min=(_number(_div_min, 1, dtype))) - return t - def create_test_func(op): @dtypes(*_types) def test(self, device, dtype): # Generate the inputs - tensor = _small_2d(dtype, device) + tensor = torch.empty((), device=device, dtype=dtype) # Runs the tensor op on the device result = getattr(tensor, op)(UnknownType()) diff --git a/test/test_complex.py b/test/test_complex.py index 9f2e0ad32401af..88404902631f7e 100644 --- a/test/test_complex.py +++ b/test/test_complex.py @@ -3,12 +3,12 @@ import torch from torch.testing._internal.common_device_type import instantiate_device_type_tests, dtypes from torch.testing._internal.common_utils import TestCase, run_tests -from torch.testing._internal.common_dtype import get_all_complex_dtypes +from torch.testing._internal.common_dtype import complex_types devices = (torch.device('cpu'), torch.device('cuda:0')) class TestComplexTensor(TestCase): - @dtypes(*get_all_complex_dtypes()) + @dtypes(*complex_types()) def test_to_list(self, device, dtype): # test that the complex float tensor has expected values and # there's no garbage value in the resultant list diff --git a/test/test_cuda.py b/test/test_cuda.py index f38101c5f0475f..c5c4c422486afb 100644 --- a/test/test_cuda.py +++ b/test/test_cuda.py @@ -3943,7 +3943,7 @@ def _test_reduce_add_coalesced(self, tensors, buffer_size): r_tensors = [comm.reduce_add(t) for t in zip(*dup_tensors)] for r, t in zip(r_tensors, tensors): self.assertEqualTypeString(r, t) - self.assertEqual(r, t * 2) + self.assertEqual(r.coalesce() if r.is_sparse else r, t * 2) rc_tensors = comm.reduce_add_coalesced(dup_tensors, buffer_size=buffer_size) self.assertEqual(r_tensors, rc_tensors) diff --git a/test/test_dataloader.py b/test/test_dataloader.py index 9a1e829bde4820..4900cd31516aa8 100644 --- a/test/test_dataloader.py +++ b/test/test_dataloader.py @@ -842,6 +842,21 @@ def __len__(self): return int(math.ceil(len(self.dataset) / float(self.batch_size))) +class TestMultiEpochDataset(IterableDataset): + def __init__(self, length): + 
self.length = length + + def __iter__(self): + worker_info = torch.utils.data.get_worker_info() + assert worker_info is not None + worker_id = worker_info.id + for idx in range(self.length // worker_info.num_workers): + yield worker_id + + def __len__(self): + return self.length + + class CustomList(list): pass @@ -1426,6 +1441,19 @@ def get_dataloader(): dataset = SynchronizedSeedDataset(num_workers, batch_size, num_workers) self.assertEqual(set(int(batch) for batch in get_dataloader()), set(int(batch) for batch in get_dataloader())) + def test_multi_epochs_reproducibility(self): + num_workers = 2 + batch_size = 10 + num_epochs = 3 + + dataset = TestMultiEpochDataset(batch_size * num_workers) + dataloader = self._get_data_loader(dataset, batch_size=batch_size, + shuffle=False, num_workers=num_workers) + + for ind in range(num_epochs): + for batch_idx, sample in enumerate(dataloader): + self.assertEqual(sample.tolist(), [batch_idx % num_workers] * batch_size) + def test_worker_init_fn(self): dataset = SeedDataset(4) dataloader = self._get_data_loader(dataset, batch_size=2, num_workers=2, @@ -2145,6 +2173,13 @@ def test_basics(self): self.assertEqual(list(dl), list(dl2)) self.assertEqual(list(dl), list(dl2_threading)) + class Sorter(IterDataPipe): + def __init__(self, datapipe): + self.datapipe = datapipe + + def __iter__(self): + return iter(sorted(self.datapipe)) + def test_shuffle(self): items = list(range(1000)) dp = IterableWrapper(items).sharding_filter().shuffle() @@ -2152,19 +2187,27 @@ def test_shuffle(self): dl = DataLoader2(dp, batch_size=None, num_workers=2, shuffle=False) self.assertEqual(items, list(dl)) - dl = DataLoader(dp, batch_size=None, num_workers=2, shuffle=False, - worker_init_fn=torch.utils.data.backward_compatibility.worker_init_fn) + dl = DataLoader2(dp, batch_size=None, num_workers=2, shuffle=False, + worker_init_fn=torch.utils.data.backward_compatibility.worker_init_fn) self.assertEqual(items, list(dl)) dl = DataLoader2(dp, batch_size=None, num_workers=2, shuffle=True) self.assertNotEqual(items, list(dl)) self.assertEqual(items, sorted(list(dl))) - dl = DataLoader(dp, batch_size=None, num_workers=2, shuffle=True, - worker_init_fn=torch.utils.data.backward_compatibility.worker_init_fn) + dl = DataLoader2(dp, batch_size=None, num_workers=2, shuffle=True, + worker_init_fn=torch.utils.data.backward_compatibility.worker_init_fn) self.assertNotEqual(items, list(dl)) self.assertEqual(items, sorted(list(dl))) + dl = DataLoader2(self.Sorter(dp), batch_size=None, num_workers=2, shuffle=True) + self.assertEqual(list(dl), items) + + dl = DataLoader2(self.Sorter(dp), batch_size=None, num_workers=2, shuffle=True, + worker_init_fn=torch.utils.data.backward_compatibility.worker_init_fn) + self.assertEqual(list(dl), items) + + @unittest.skipIf( TEST_WITH_TSAN, "Fails with TSAN with the following error: starting new threads after multi-threaded " diff --git a/test/test_datapipe.py b/test/test_datapipe.py index 25d8728be001b3..09900d21dafcc1 100644 --- a/test/test_datapipe.py +++ b/test/test_datapipe.py @@ -565,7 +565,11 @@ class TestFunctionalIterDataPipe(TestCase): def _serialization_test_helper(self, datapipe): serialized_dp = pickle.dumps(datapipe) deserialized_dp = pickle.loads(serialized_dp) - self.assertEqual(list(datapipe), list(deserialized_dp)) + try: + self.assertEqual(list(datapipe), list(deserialized_dp)) + except AssertionError as e: + print(f"{datapipe} is failing.") + raise e def _serialization_test_for_single_dp(self, dp): # 1. 
Testing for serialization before any iteration starts @@ -598,43 +602,44 @@ def _serialization_test_for_dp_with_children(self, dp1, dp2): self._serialization_test_helper(dp2) def test_serializable(self): - input_dp = dp.iter.IterableWrapper(range(10)) - picklable_datapipes: List[Tuple[Type[IterDataPipe], Tuple, Dict[str, Any]]] = [ - (dp.iter.Batcher, (3, True,), {}), - (dp.iter.Collator, (_fake_fn,), {}), - (dp.iter.Concater, (dp.iter.IterableWrapper(range(5)),), {}), - (dp.iter.Demultiplexer, (2, _fake_filter_fn), {}), - (dp.iter.FileLister, (), {}), - (dp.iter.FileOpener, (), {}), - (dp.iter.Filter, (_fake_filter_fn,), {}), - (dp.iter.Filter, (partial(_fake_filter_fn_constant, 5),), {}), - (dp.iter.Forker, (2,), {}), - (dp.iter.Grouper, (_fake_filter_fn,), {"group_size": 2}), - (dp.iter.IterableWrapper, (), {}), - (dp.iter.Mapper, (_fake_fn, ), {}), - (dp.iter.Mapper, (partial(_fake_add, 1), ), {}), - (dp.iter.Multiplexer, (input_dp,), {}), - (dp.iter.Sampler, (), {}), - (dp.iter.Shuffler, (), {}), - (dp.iter.StreamReader, (), {}), - (dp.iter.UnBatcher, (0,), {}), - (dp.iter.Zipper, (input_dp,), {}), + picklable_datapipes: List = [ + (dp.iter.Batcher, None, (3, True,), {}), + (dp.iter.Collator, None, (_fake_fn,), {}), + (dp.iter.Concater, None, (dp.iter.IterableWrapper(range(5)),), {}), + (dp.iter.Demultiplexer, None, (2, _fake_filter_fn), {}), + (dp.iter.FileLister, ".", (), {}), + (dp.iter.FileOpener, None, (), {}), + (dp.iter.Filter, None, (_fake_filter_fn,), {}), + (dp.iter.Filter, None, (partial(_fake_filter_fn_constant, 5),), {}), + (dp.iter.Forker, None, (2,), {}), + (dp.iter.Grouper, None, (_fake_filter_fn,), {"group_size": 2}), + (dp.iter.IterableWrapper, range(10), (), {}), + (dp.iter.Mapper, None, (_fake_fn, ), {}), + (dp.iter.Mapper, None, (partial(_fake_add, 1), ), {}), + (dp.iter.Multiplexer, None, (dp.iter.IterableWrapper(range(10)),), {}), + (dp.iter.Sampler, None, (), {}), + (dp.iter.Shuffler, dp.iter.IterableWrapper([0] * 10), (), {}), + (dp.iter.StreamReader, None, (), {}), + (dp.iter.UnBatcher, None, (0,), {}), + (dp.iter.Zipper, None, (dp.iter.IterableWrapper(range(10)),), {}), ] # Skipping comparison for these DataPipes - dp_skip_comparison = {dp.iter.FileLister, dp.iter.FileOpener, dp.iter.StreamReader, dp.iter.Shuffler} + dp_skip_comparison = {dp.iter.FileOpener, dp.iter.StreamReader} # These DataPipes produce multiple DataPipes as outputs and those should be compared dp_compare_children = {dp.iter.Demultiplexer, dp.iter.Forker} - for dpipe, dp_args, dp_kwargs in picklable_datapipes: + for dpipe, custom_input, dp_args, dp_kwargs in picklable_datapipes: + if custom_input is None: + custom_input = dp.iter.IterableWrapper(range(10)) if dpipe in dp_skip_comparison: # Merely make sure they are picklable and loadable (no value comparison) - datapipe = dpipe(input_dp, *dp_args, **dp_kwargs) # type: ignore[call-arg] + datapipe = dpipe(custom_input, *dp_args, **dp_kwargs) # type: ignore[call-arg] serialized_dp = pickle.dumps(datapipe) _ = pickle.loads(serialized_dp) elif dpipe in dp_compare_children: # DataPipes that have children - dp1, dp2 = dpipe(input_dp, *dp_args, **dp_kwargs) # type: ignore[call-arg] + dp1, dp2 = dpipe(custom_input, *dp_args, **dp_kwargs) # type: ignore[call-arg] self._serialization_test_for_dp_with_children(dp1, dp2) else: # Single DataPipe that requires comparison - datapipe = dpipe(input_dp, *dp_args, **dp_kwargs) # type: ignore[call-arg] + datapipe = dpipe(custom_input, *dp_args, **dp_kwargs) # type: ignore[call-arg] 
self._serialization_test_for_single_dp(datapipe) def test_serializable_with_dill(self): @@ -1402,6 +1407,10 @@ def test_shuffle_iterdatapipe(self): with self.assertRaisesRegex(TypeError, r"instance doesn't have valid length$"): len(shuffle_dp_nl) + # Test: deactivate shuffling via set_shuffle + unshuffled_dp = input_ds.shuffle().set_shuffle(False) + self.assertEqual(list(unshuffled_dp), list(input_ds)) + def test_zip_iterdatapipe(self): # Functional Test: raises TypeError when an input is not of type `IterDataPipe` @@ -1433,30 +1442,45 @@ def test_zip_iterdatapipe(self): class TestFunctionalMapDataPipe(TestCase): - def _serialization_test_helper(self, datapipe, has_two_children=False): + def _serialization_test_helper(self, datapipe): serialized_dp = pickle.dumps(datapipe) deserialized_dp = pickle.loads(serialized_dp) - if not has_two_children: + try: self.assertEqual(list(datapipe), list(deserialized_dp)) - else: - for c1, c2 in zip(list(datapipe), list(deserialized_dp)): - self.assertEqual(list(c1), list(c2)) + except AssertionError as e: + print(f"{datapipe} is failing.") + raise e + + def _serialization_test_for_single_dp(self, dp): + # 1. Testing for serialization before any iteration starts + self._serialization_test_helper(dp) + # 2. Testing for serialization after DataPipe is partially read + it = iter(dp) + _ = next(it) + self._serialization_test_helper(dp) + # 3. Testing for serialization after DataPipe is fully read + _ = list(it) + self._serialization_test_helper(dp) def test_serializable(self): - input_dp = dp.map.SequenceWrapper(range(10)) - picklable_datapipes: List[ - Tuple[Type[MapDataPipe], Tuple, Dict[str, Any]] - ] = [ - (dp.map.Mapper, (), {}), - (dp.map.Mapper, (_fake_fn, ), {}), - (dp.map.Mapper, (partial(_fake_add, 1), ), {}), + picklable_datapipes: List = [ + (dp.map.Batcher, None, (2,), {}), + (dp.map.Concater, None, (dp.map.SequenceWrapper(range(10)),), {}), + (dp.map.Mapper, None, (), {}), + (dp.map.Mapper, None, (_fake_fn, ), {}), + (dp.map.Mapper, None, (partial(_fake_add, 1), ), {}), + (dp.map.SequenceWrapper, range(10), (), {}), + (dp.map.Shuffler, dp.map.SequenceWrapper([0] * 5), (), {}), + (dp.map.Zipper, None, (dp.map.SequenceWrapper(range(10)),), {}), ] - for dpipe, dp_args, dp_kwargs in picklable_datapipes: - _ = pickle.dumps(dpipe(input_dp, *dp_args, **dp_kwargs)) # type: ignore[call-arg] - datapipe = dpipe(input_dp, *dp_args, **dp_kwargs) # type: ignore[call-arg] - self._serialization_test_helper(datapipe) + for dpipe, custom_input, dp_args, dp_kwargs in picklable_datapipes: + if custom_input is None: + custom_input = dp.map.SequenceWrapper(range(10)) + datapipe = dpipe(custom_input, *dp_args, **dp_kwargs) # type: ignore[call-arg] + self._serialization_test_for_single_dp(datapipe) def test_serializable_with_dill(self): + """Only for DataPipes that take in a function as argument""" input_dp = dp.map.SequenceWrapper(range(10)) unpicklable_datapipes: List[ Tuple[Type[MapDataPipe], Tuple, Dict[str, Any]] @@ -1655,7 +1679,7 @@ class A(IterDataPipe[P]): @skipTyping def test_subtype(self): - from torch.utils.data._typing import issubtype + from torch.utils.data.datapipes._typing import issubtype basic_type = (int, str, bool, float, complex, list, tuple, dict, set, T_co) @@ -1703,7 +1727,7 @@ def test_subtype(self): @skipTyping def test_issubinstance(self): - from torch.utils.data._typing import issubinstance + from torch.utils.data.datapipes._typing import issubinstance basic_data = (1, '1', True, 1., complex(1., 0.)) basic_type = (int, str, bool, float, 
complex) @@ -1773,7 +1797,7 @@ def __iter__(self) -> Iterator[Tuple[int, str]]: self.assertTrue(issubclass(DP1, IterDataPipe)) dp1 = DP1(10) - self.assertTrue(DP1.type.issubtype(dp1.type) and dp1.type.issubtype(DP1.type)) + self.assertTrue(DP1.type.issubtype(dp1.type) and dp1.type.issubtype(DP1.type)) # type: ignore[attr-defined] dp1_ = DP1(5) self.assertEqual(dp1.type, dp1_.type) @@ -1789,7 +1813,7 @@ def __iter__(self) -> Iterator[T_co]: self.assertTrue(issubclass(DP2, IterDataPipe)) dp2 = DP2() # type: ignore[var-annotated] - self.assertTrue(DP2.type.issubtype(dp2.type) and dp2.type.issubtype(DP2.type)) + self.assertTrue(DP2.type.issubtype(dp2.type) and dp2.type.issubtype(DP2.type)) # type: ignore[attr-defined] dp2_ = DP2() # type: ignore[var-annotated] self.assertEqual(dp2.type, dp2_.type) @@ -1805,7 +1829,7 @@ def __iter__(self) -> Iterator[Tuple[T_co, str]]: self.assertTrue(issubclass(DP3, IterDataPipe)) dp3 = DP3(range(10)) # type: ignore[var-annotated] - self.assertTrue(DP3.type.issubtype(dp3.type) and dp3.type.issubtype(DP3.type)) + self.assertTrue(DP3.type.issubtype(dp3.type) and dp3.type.issubtype(DP3.type)) # type: ignore[attr-defined] dp3_ = DP3(5) # type: ignore[var-annotated] self.assertEqual(dp3.type, dp3_.type) @@ -1827,7 +1851,7 @@ def __iter__(self) -> Iterator[str]: self.assertTrue(issubclass(DP5, IterDataPipe)) dp5 = DP5() - from torch.utils.data._typing import issubtype + from torch.utils.data.datapipes._typing import issubtype self.assertTrue(issubtype(dp5.type.param, Any) and issubtype(Any, dp5.type.param)) class DP6(IterDataPipe[int]): @@ -1844,13 +1868,13 @@ class DP7(IterDataPipe[Awaitable[T_co]]): r""" DataPipe with abstract base class""" self.assertTrue(issubclass(DP7, IterDataPipe)) - self.assertTrue(DP7.type.param == Awaitable[T_co]) + self.assertTrue(DP7.type.param == Awaitable[T_co]) # type: ignore[attr-defined] class DP8(DP7[str]): r""" DataPipe subclass from a DataPipe with abc type""" self.assertTrue(issubclass(DP8, IterDataPipe)) - self.assertTrue(DP8.type.param == Awaitable[str]) + self.assertTrue(DP8.type.param == Awaitable[str]) # type: ignore[attr-defined] @skipTyping def test_construct_time(self): @@ -1985,6 +2009,35 @@ def test_traverse_forked(self): self.assertEqual(expected, graph) +class TestCircularSerialization(TestCase): + + class CustomIterDataPipe(IterDataPipe): + def add_one(self, x): + return x + 1 + + def classify(self, x): + return 0 + + def __init__(self): + self._dp = dp.iter.IterableWrapper([1, 2, 4]).map(self.add_one).demux(2, self.classify)[0] + + def __iter__(self): + yield from self._dp + + def test_circular_reference(self): + self.assertEqual( + list(TestCircularSerialization.CustomIterDataPipe()), + list(pickle.loads(pickle.dumps(TestCircularSerialization.CustomIterDataPipe()))) + ) + _ = traverse(TestCircularSerialization.CustomIterDataPipe(), only_datapipe=True) + _ = traverse(TestCircularSerialization.CustomIterDataPipe(), only_datapipe=False) + + # TODO: Ensure this works with `dill` installed + # @skipIfNoDill + # def test_circular_serialization_with_dill(self): + # assert list(self._CustomIterDataPipe()) == list(dill.loads(dill.dumps(self._CustomIterDataPipe()))) + + class TestSharding(TestCase): def _get_pipeline(self): diff --git a/test/test_dispatch.py b/test/test_dispatch.py index 37a6054f9151e6..bf609cf50b3e3c 100644 --- a/test/test_dispatch.py +++ b/test/test_dispatch.py @@ -532,8 +532,8 @@ def test_computed_table_with_ambiguous_autogradother(self): lambda m: m.def_("foo(Tensor x) -> Tensor"), # m.impl("foo", 
torch::kCompositeImplicitAutograd, [](const Tensor & x) { return x }) lambda m: m.impl_t_t("foo", "CompositeImplicitAutograd", debug="fn_math"), - # m.impl("foo", torch::kQuantizedCPU, [](const Tensor & x) { return x }) - lambda m: m.impl_t_t("foo", "QuantizedCPU", debug="fn_quantizedcpu"), + # m.impl("foo", torch::kFPGA, [](const Tensor & x) { return x }) + lambda m: m.impl_t_t("foo", "FPGA", debug="fn_fpga"), ]) state, table = result.state, result.table self.assertExpectedInline(state, '''\ @@ -541,12 +541,12 @@ def test_computed_table_with_ambiguous_autogradother(self): schema: test::foo(Tensor x) -> (Tensor) debug: registered at /dev/null:0 alias analysis kind: FROM_SCHEMA -QuantizedCPU: fn_quantizedcpu :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ] +FPGA: fn_fpga :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ] CompositeImplicitAutograd[alias]: fn_math :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ] ''') # computed dispatch table is too big, so we only check on a few entries we're interested in. - extracted_table = extract_dispatch_table_with_keys(table, dispatch_keys_to_check + ('QuantizedCPU',)) + extracted_table = extract_dispatch_table_with_keys(table, dispatch_keys_to_check + ('FPGA',)) self.assertExpectedInline(extracted_table, '''\ Undefined: fn_math [math kernel] @@ -557,7 +557,7 @@ def test_computed_table_with_ambiguous_autogradother(self): AutogradCPU: fn_math [math kernel] AutogradCUDA: fn_math [math kernel] AutogradXLA: fn_math [math kernel] -QuantizedCPU: fn_quantizedcpu [kernel] +FPGA: fn_fpga [kernel] ''') def test_computed_table_with_cpu_defaultbackend(self): @@ -616,7 +616,7 @@ def test_computed_table_with_cpu_autograd_defaultbackend(self): ''') # computed dispatch table is too big, so we only check on a few entries we're interested in. 
- extracted_table = extract_dispatch_table_with_keys(table, dispatch_keys_to_check + ('QuantizedCPU',)) + extracted_table = extract_dispatch_table_with_keys(table, dispatch_keys_to_check + ('FPGA',)) self.assertExpectedInline(extracted_table, '''\ Undefined: fn_defaultbackend [default backend kernel] @@ -627,7 +627,7 @@ def test_computed_table_with_cpu_autograd_defaultbackend(self): AutogradCPU: fn_autograd [autograd kernel] AutogradCUDA: fn_autograd [autograd kernel] AutogradXLA: fn_autograd [autograd kernel] -QuantizedCPU: fn_defaultbackend [default backend kernel] +FPGA: fn_defaultbackend [default backend kernel] ''') def test_computed_table_with_cpu_autograd_math_defaultbackend(self): @@ -808,7 +808,7 @@ def test_basic(self): CPU fn_CPU [kernel] XLA fn_XLA [kernel] Lazy fn_Lazy [kernel] -QuantizedCPU fn_CompositeImplicitAutograd [math kernel] +FPGA fn_CompositeImplicitAutograd [math kernel] AutogradOther fn_CompositeImplicitAutograd [math kernel] AutogradCPU fallthrough [backend fallback] AutogradXLA fallthrough [backend fallback] @@ -829,7 +829,7 @@ def test_math_autogradcpu(self): CPU fn_CPU [kernel] XLA fn_XLA [kernel] Lazy fn_Lazy [kernel] -QuantizedCPU fn_CompositeImplicitAutograd [math kernel] +FPGA fn_CompositeImplicitAutograd [math kernel] AutogradOther fn_CompositeImplicitAutograd [math kernel] AutogradCPU fn_AutogradCPU [kernel] AutogradXLA fallthrough [backend fallback] @@ -864,7 +864,7 @@ def test_defaultbackend_autogradcpu(self): CPU fn_CPU [kernel] XLA fn_XLA [kernel] Lazy fn_Lazy [kernel] -QuantizedCPU fn_CompositeExplicitAutograd [default backend kernel] +FPGA fn_CompositeExplicitAutograd [default backend kernel] AutogradOther fallthrough [backend fallback] AutogradCPU fn_AutogradCPU [kernel] AutogradXLA fallthrough [backend fallback] @@ -889,7 +889,7 @@ def test_defaultbackend_autogradcpu(self): def test_autogradother(self): dispatcher = PythonDispatcher() - dispatcher.register(["CPU", "QuantizedCPU", "CompositeImplicitAutograd"]) + dispatcher.register(["CPU", "FPGA", "CompositeImplicitAutograd"]) self.assertExpectedInline( dispatcher.dispatchTable(), '''\ @@ -900,7 +900,7 @@ def test_autogradother(self): CPU fn_CPU [kernel] XLA fn_CompositeImplicitAutograd [math kernel] Lazy fn_CompositeImplicitAutograd [math kernel] -QuantizedCPU fn_QuantizedCPU [kernel] +FPGA fn_FPGA [kernel] AutogradOther ambiguous_autogradother [ambiguous autogradother] AutogradCPU fallthrough [backend fallback] AutogradXLA fn_CompositeImplicitAutograd [math kernel] @@ -915,8 +915,8 @@ def test_autogradother(self): Registered Kernels key kernel --------------------------- +FPGA fn_FPGA CPU fn_CPU -QuantizedCPU fn_QuantizedCPU CompositeImplicitAutograd[alias] fn_CompositeImplicitAutograd ''' ) @@ -935,5 +935,20 @@ def test_defaultbackend_math(self): r"Registration to both CompositeImplicitAutograd and CompositeExplicitAutograd is not allowed"): dispatcher.register(["CompositeExplicitAutograd", "CompositeImplicitAutograd"]) + def test_quantized_structured_not_implemented(self): + x = torch.zeros([1, 1, 1]) + y = torch.zeros([1, 1, 1]) + scale, zero_point = 1.0, 0 + dtype = torch.qint8 + qx = torch.quantize_per_tensor(x, scale, zero_point, dtype) + qy = torch.quantize_per_tensor(y, scale, zero_point, dtype) + # If bmm gets quantized support you need to update this to something + # else that is not implemented + self.assertRaisesRegex( + NotImplementedError, + "Could not run 'aten::bmm.out' with arguments from the 'QuantizedCPU' backend.", + lambda: torch.bmm(qx, qy) + ) + if __name__ == '__main__': 
run_tests() diff --git a/test/test_expanded_weights.py b/test/test_expanded_weights.py index 6c697b6c721bcb..63d08fa55a6255 100644 --- a/test/test_expanded_weights.py +++ b/test/test_expanded_weights.py @@ -1,12 +1,15 @@ # Owner(s): ["module: nn"] from functools import partial -from itertools import product +from itertools import product, chain import unittest import torch import torch.nn as nn +import torch.nn.functional as F +from torch.nn import CrossEntropyLoss from torch.nn.utils._per_sample_grad import call_for_per_sample_grads +from torch.testing._internal.common_cuda import TEST_CUDA from torch.testing._internal.common_device_type import OpDTypes, instantiate_device_type_tests, ops from torch.testing._internal.common_nn import TestBase, module_tests, new_module_tests from torch.testing._internal.common_utils import TestCase, freeze_rng_state, make_tensor, run_tests @@ -159,7 +162,7 @@ def test_expanded_weight_per_sample_grad(self, device, dtype, op): for (result_grad, expected_grad) in zip(expanded_weight_grad, per_sample_grad): if result_grad is None: result_grad = torch.zeros_like(expected_grad) - assert torch.allclose(result_grad, expected_grad), f"Got {result_grad}, expected {expected_grad}" + self.assertEqual(result_grad, expected_grad) @ops(filter(lambda op: op.supports_expanded_weight, op_db), dtypes=OpDTypes.supported, allowed_dtypes=(torch.double,)) def test_unsupported_expand_weights(self, device, dtype, op): @@ -185,10 +188,16 @@ def test_unsupported_expand_weights(self, device, dtype, op): def test_expanded_weight_forward(self, device, dtype, op): sample_inputs = op.sample_inputs(device, dtype) for sample_input in supported_inputs(op, sample_inputs): + if op.name == "nn.functional.embedding": # embedding flips its argument order for autograd tests + sample_input = SampleInput(sample_input.args[0].clone(), + args=(sample_input.input.clone(),), + kwargs=sample_input.kwargs) + if "cuda" in device and "max_norm" in sample_input.kwargs and "padding_idx" in sample_input.kwargs: + self.skipTest("embedding is non-determinstic in this case, see issue #74679") batch_size = sample_input.input.shape[0] if len(sample_input.input.shape) > 1 else 1 (ew_input, ew_args, ew_kwargs) = make_expanded_weight(sample_input, batch_size) - expanded_weight_result = op(ew_input, *ew_args, **ew_kwargs) - normal_result = op(sample_input.input, *sample_input.args, **sample_input.kwargs) + expanded_weight_result = run_op(op, ew_input, *ew_args, **ew_kwargs) + normal_result = run_op(op, sample_input.input, *sample_input.args, **sample_input.kwargs) self.assertEqual(expanded_weight_result, normal_result) def test_expanded_weight_error(self, device): @@ -198,10 +207,63 @@ def test_expanded_weight_error(self, device): with self.assertRaisesRegex(RuntimeError, r"Expanded Weights encountered but cannot handle function"): torch.add(sample_input, ExpandedWeight(sample_weight, batch_size)) + def test_small_model(self, device): + def convnet(num_classes): + return nn.Sequential( + nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1), + nn.ReLU(), + nn.AvgPool2d(kernel_size=2, stride=2), + nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1), + nn.ReLU(), + nn.AvgPool2d(kernel_size=2, stride=2), + nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1), + nn.ReLU(), + nn.AvgPool2d(kernel_size=2, stride=2), + nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1), + nn.ReLU(), + nn.AdaptiveAvgPool2d((1, 1)), + nn.Flatten(start_dim=1, end_dim=-1), + nn.Linear(128, num_classes, bias=True), + ) + + batch_size = 32 + 
model = convnet(10).to(device) + input = torch.randn([batch_size, 3, 28, 28], device=device) + targets = torch.randint(0, 10, (batch_size,), device=device) + criterion = CrossEntropyLoss(reduction='sum') # use a loss that doesn't average across the batch to test in a for loop + result = call_for_per_sample_grads(model, batch_size, input) + loss = criterion(result, targets) + loss.backward() + result = [] + for weight in model.parameters(): + result.append(weight.grad_sample) + del weight.grad_sample + + expected = [] + for i in range(batch_size): + loss = criterion(model(input[i].unsqueeze(0)), targets[i].unsqueeze(0)) + expected.append(torch.autograd.grad(loss, model.parameters(), torch.ones_like(loss))) + + expected = [torch.stack(grad) for grad in zip(*expected)] + for (res, exp) in zip(result, expected): + self.assertEqual(res, exp, atol=1e-4, rtol=5e-5) + + def test_group_norm_error(self, device): + # group norm has to call native_group_norm. This checks that it hits the same errors + # that normal group norm would + + N = 3 + C = 5 + inp = torch.randn(N, C) + with self.assertRaisesRegex(RuntimeError, r"Expected number of channels in input to be divisible"): + F.group_norm(inp, 2) # 5 is not divisible by 2 class TestExpandedWeightModule(TestCase): def _do_test(self, module, input): batch_size = input.shape[0] + diff_input = input.dtype == torch.float or input.dtype == torch.double + if diff_input: + input.requires_grad_() with freeze_rng_state(): # get per sample grads with ExpandedWeights context manager actual_res = call_for_per_sample_grads(module, batch_size, input).sum() @@ -210,17 +272,25 @@ def _do_test(self, module, input): for param in module.parameters(): actual_grads.append(param.grad_sample) del param.grad_sample + if diff_input: + actual_grads.append(input.grad.clone()) + input.grad = torch.zeros_like(input.grad) # get per sample grads with a for loop - expected_res = torch.tensor(0.) 
+ expected_res = torch.tensor(0., device=input.device, dtype=torch.double) expected_grads = [] for i in range(batch_size): - res = module(input[i].unsqueeze(0)).sum() - expected_grads.append(torch.autograd.grad(res, module.parameters(), torch.ones_like(res))) + input_slice = input[i] + diff_params = module.parameters() + if diff_input: + diff_params = chain(diff_params, (input_slice,)) + res = module(input_slice.unsqueeze(0)).sum() + out_grads = torch.autograd.grad(res, diff_params, torch.ones_like(res), allow_unused=True) + expected_grads.append(out_grads) expected_res += res expected_grads = tuple(torch.stack(grad) for grad in zip(*expected_grads)) self.assertEqual(actual_res, expected_res) - assert [torch.allclose(actual, expected) for (actual, expected) in zip(actual_grads, expected_grads)] + [self.assertEqual(actual, expected) for (actual, expected) in zip(actual_grads, expected_grads)] def _do_test_multi_input(self, module, input): class TestModule(nn.Module): @@ -232,6 +302,9 @@ def forward(self, input): return self.module(input) + self.module(input) batch_size = input.shape[0] + diff_input = input.dtype == torch.float or input.dtype == torch.double + if diff_input: + input.requires_grad_() with freeze_rng_state(): # get per sample grads with ExpandedWeights context manager, calling .backward() twice test_module = TestModule(module) @@ -241,14 +314,24 @@ def forward(self, input): for param in module.parameters(): actual_grads.append(param.grad_sample) del param.grad_sample + if diff_input: + actual_grads.append(input.grad.clone()) + input.grad = torch.zeros_like(input.grad) + # get per sample grads with a for loop, running over the input twice expected_grads = [] for i in range(batch_size): - res = module(input[i].unsqueeze(0)).sum() - expected_grads.append(torch.autograd.grad(res, module.parameters(), torch.ones_like(res))) - expected_grads = tuple(torch.stack(grad) for grad in zip(*expected_grads)) - assert [torch.allclose(actual, 2 * expected) for (actual, expected) in zip(actual_grads, expected_grads)] + input_slice = input[i] + diff_params = module.parameters() + if diff_input: + diff_params = chain(diff_params, (input_slice,)) + res = module(input_slice.unsqueeze(0)).sum() + out_grads = torch.autograd.grad(res, diff_params, torch.ones_like(res), allow_unused=True) + expected_grads.append(out_grads) + expected_grads = tuple(torch.stack(grad) for grad in zip(*expected_grads)) + expected_grads = tuple(expected_grad for expected_grad in expected_grads if expected_grad is not None) + assert [self.assertEqual(actual, 2 * expected) for (actual, expected) in zip(actual_grads, expected_grads)] def test_per_sample_api_failing(self): module = nn.Linear(10, 10) @@ -266,23 +349,28 @@ def test_per_sample_api_failing(self): class ContextManagerTests(TestBase): def __init__(self, *args, **kwargs): + self.test_cpu = kwargs.get('test_cpu', True) + self.test_cuda = kwargs.get('test_cuda', True) super().__init__(*args, **kwargs) @property def constructor_args(self): return self._get_arg('constructor_args', False) - def test_context_manager(self, test_case): - module = self.constructor(*self.constructor_args) - input = self._get_input() + def test_context_manager(self, test_case, device): + kwargs = {'device': device, 'dtype': torch.double} + module = self.constructor(*self.constructor_args).to(**kwargs) + if 'Embedding' in self.get_name(): + kwargs['dtype'] = torch.long + input = self._get_input().to(**kwargs) if len(input.shape) == 0 or input.shape[0] == 0: raise unittest.SkipTest("Can't get per 
sample gradients when no batch dim or batch dim is 0") if self.constructor == torch.nn.Linear and len(input.shape) == 1: raise unittest.SkipTest("Can't get per sample gradients for input of rank 1") test_case._do_test(module, input) - def test_context_manager_multiple_inputs(self, test_case): - module = self.constructor(*self.constructor_args) + def test_context_manager_multiple_inputs(self, test_case, device): + module = self.constructor(*self.constructor_args).to(device) input = self._get_input() if len(input.shape) == 0 or input.shape[0] == 0: raise unittest.SkipTest("Can't get per sample gradients when no batch dim or batch dim is 0") @@ -292,7 +380,7 @@ def test_context_manager_multiple_inputs(self, test_case): # TODO: Once all of these use ModuleInfo, replace with ModuleInfo tests # These currently use the legacy nn tests -supported_modules = ['Linear'] +supported_modules = ['Linear', 'Conv1d', 'Conv2d', 'Conv3d', 'Embedding', 'LayerNorm', 'GroupNorm'] supported_tests = [t for t in module_tests + new_module_tests if 'module_name' in t and t['module_name'] in supported_modules] for test_param in supported_tests: if 'constructor' not in test_param: @@ -308,9 +396,14 @@ def test_context_manager_multiple_inputs(self, test_case): raise RuntimeError('Found two tests with the same name: ' + test_name) if decorator is not None: fn = decorator(fn) - setattr(TestExpandedWeightModule, test_name, lambda self, test=test: test.test_context_manager(self)) - setattr(TestExpandedWeightModule, test_name_multi_input, - lambda self, test=test: test.test_context_manager_multiple_inputs(self)) + if test.test_cpu: + setattr(TestExpandedWeightModule, test_name, lambda self, test=test: test.test_context_manager(self, 'cpu')) + setattr(TestExpandedWeightModule, test_name_multi_input, + lambda self, test=test: test.test_context_manager_multiple_inputs(self, 'cpu')) + if TEST_CUDA and test.test_cuda: + # since this checks derivatives, only use double for precision + setattr(TestExpandedWeightModule, test_name + '_cuda_double', + lambda self, test=test: test.test_context_manager(self, 'cuda')) # ------------- HELPER FUNCTIONS ----------------- @@ -340,12 +433,13 @@ def supported_inputs(op, sample_inputs, supported_inputs=True): operations that would cause inter-batch operations. 
Removes all of the cases it cannot deal with """ def filter_fn(input): + convolutions = ["nn.functional.conv1d", "nn.functional.conv2d", "nn.functional.conv3d"] if op.name == "nn.functional.linear": is_supported_input = len(input.input.shape) > 1 # input of rank 1 means no batch dim elif op.name == "nn.functional.layer_norm": normalized_shape = input.args[0] is_supported_input = input.input.shape != normalized_shape # would cause inter-batch operations - elif op.name == "nn.functional.conv2d": + elif op.name in convolutions: # currently can't deal with padding computation on Python level is_supported_input = 'padding' not in input.kwargs or not isinstance(input.kwargs['padding'], str) elif op.name == "nn.functional.embedding": diff --git a/test/test_foreach.py b/test/test_foreach.py index a04ddcebbaaecd..4da23dc66fc3b7 100644 --- a/test/test_foreach.py +++ b/test/test_foreach.py @@ -11,12 +11,13 @@ from torch.testing._comparison import default_tolerances from torch.testing._internal.common_utils import TestCase, run_tests, TEST_WITH_ROCM, TEST_WITH_SLOW from torch.testing._internal.common_device_type import \ - (instantiate_device_type_tests, dtypes, onlyCUDA, skipCUDAIfRocm, skipMeta, ops) + (instantiate_device_type_tests, dtypes, onlyCUDA, skipMeta, ops) from torch.testing._internal.common_methods_invocations import ( foreach_unary_op_db, foreach_binary_op_db, foreach_pointwise_op_db, foreach_minmax_op_db, foreach_reduce_op_db) from torch.testing._internal.common_dtype import ( - get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, + all_types_and_complex_and, all_types_and, integral_types, complex_types, + floating_types_and, floating_types, integral_types_and, ) # Includes some values such that N * N won't be a multiple of 4, @@ -140,7 +141,7 @@ def _test_binary_op_tensorlists(self, device, dtype, opinfo, N, is_fastpath, dis self._binary_test(dtype, inplace_op, inplace_ref, inputs, is_fastpath, is_inplace=True) if opinfo.supports_alpha_param: alpha = None - if dtype in get_all_int_dtypes(): + if dtype in integral_types(): alpha = 3 elif dtype.is_complex: alpha = complex(3, 3) @@ -165,19 +166,11 @@ def _test_binary_op_tensorlists(self, device, dtype, opinfo, N, is_fastpath, dis self._binary_test( dtype, inplace_op, inplace_ref, inputs, is_fastpath and disable_fastpath, is_inplace=True) - # note(mkozuki): Why ROCm? - # ROCm is supposed to compile slow path as in - # https://github.com/pytorch/pytorch/blob/7e032f18cf1405804c4f787b05ea2de5e08a091e/aten/src/ATen/native/ForeachUtils.h#L148-L164, # noqa: E501 - # Therefore `[torch.add(*args, alpha=alpha) for args in zip(tensors1, tensors2)]` and - # `torch._foreach_add(tensors1, tensors2, alpha=alpha)` - # are expected to return the same outputs, however, the outputs look unstable for torch.bfloat16 and torch.half. 
- # log: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.2-py3.6-test1/2741/console - @skipCUDAIfRocm @skipMeta @ops(foreach_binary_op_db) def test_binary_op_tensorlists_fastpath(self, device, dtype, op): for N in N_values: - disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] + disable_fastpath = op.ref == torch.div and dtype in integral_types_and(torch.bool) if op.ref == torch.add and dtype == torch.bool: disable_fastpath = True self._test_binary_op_tensorlists(device, dtype, op, N, True, disable_fastpath) @@ -194,22 +187,21 @@ def _test_binary_op_scalar(self, device, dtype, opinfo, N, scalar, is_fastpath, self._binary_test(dtype, op, ref, inputs, is_fastpath, is_inplace=False) self._binary_test(dtype, inplace_op, inplace_ref, inputs, is_fastpath, is_inplace=True) - @skipCUDAIfRocm @skipMeta @ops(foreach_binary_op_db) def test_binary_op_scalar_fastpath(self, device, dtype, op): for N, scalar in itertools.product(N_values, Scalars): - disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] + disable_fastpath = op.ref == torch.div and dtype in integral_types_and(torch.bool) if isinstance(scalar, int): disable_fastpath |= dtype == torch.bool if isinstance(scalar, float): - disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool] + disable_fastpath |= dtype in integral_types_and(torch.bool) if isinstance(scalar, bool): disable_fastpath |= dtype == torch.bool if op.ref in (torch.add, torch.mul): disable_fastpath = False if isinstance(scalar, complex): - disable_fastpath |= dtype not in get_all_complex_dtypes() + disable_fastpath |= dtype not in complex_types() self._test_binary_op_scalar(device, dtype, op, N, scalar, True, disable_fastpath) @ops(foreach_binary_op_db) @@ -233,22 +225,21 @@ def _test_binary_op_scalarlist(self, device, dtype, opinfo, N, scalarlist, is_fa # errors depending on the order of scalarlist. To keep actual unit test impl simple, # separating mixed scalarlist tests. By setting the first element of scalarlist to bool, # they are expected to throw bool sub error even in inplace test. 
- @skipCUDAIfRocm @skipMeta @ops(foreach_binary_op_db) def test_binary_op_scalarlist_fastpath(self, device, dtype, op): for N in N_values: for type_str, scalarlist in getScalarLists(N): - bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] + bool_int_div = op.ref == torch.div and dtype in integral_types_and(torch.bool) disable_fastpath = bool_int_div if type_str == "int": disable_fastpath |= dtype == torch.bool if type_str == "float": - disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool] + disable_fastpath |= dtype in integral_types_and(torch.bool) if type_str == "complex": - disable_fastpath |= dtype not in get_all_complex_dtypes() + disable_fastpath |= dtype not in complex_types() if type_str == "mixed": - disable_fastpath |= True and dtype not in get_all_complex_dtypes() + disable_fastpath |= True and dtype not in complex_types() self._test_binary_op_scalarlist(device, dtype, op, N, scalarlist, True, disable_fastpath) @ops(foreach_binary_op_db) @@ -305,7 +296,7 @@ def _test_pointwise_op(self, device, dtype, opinfo, N, is_fastpath, disable_fast @skipMeta @ops(foreach_pointwise_op_db) def test_pointwise_op_fastpath(self, device, dtype, op): - disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool] + disable_fastpath = dtype in integral_types_and(torch.bool) # for N, scalar in itertools.product(N_values, Scalars): for N in N_values: self._test_pointwise_op(device, dtype, op, N, True, disable_fastpath) @@ -363,7 +354,7 @@ def _test_unary(self, device, dtype, opinfo, N, is_fastpath): op, ref, inplace_op, inplace_ref = self._get_funcs(opinfo, 1) inputs = opinfo.sample_inputs(device, dtype, N, noncontiguous=not is_fastpath), # note(mkozuki): Complex inputs for `_foreach_abs` go through slowpath. - if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes(): + if opinfo.name == "_foreach_abs" and dtype in complex_types(): is_fastpath = False self._regular_unary_test(dtype, op, ref, inputs, is_fastpath) self._inplace_unary_test(dtype, inplace_op, inplace_ref, inputs, is_fastpath) @@ -374,7 +365,7 @@ def test_unary_fastpath(self, device, dtype, op): for N in N_values: self._test_unary(device, dtype, op, N, is_fastpath=True) - @ops(foreach_unary_op_db, dtypes=get_all_dtypes()) + @ops(foreach_unary_op_db, dtypes=all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_unary_slowpath(self, device, dtype, op): for N in N_values: self._test_unary(device, dtype, op, N, is_fastpath=False) @@ -391,7 +382,7 @@ def test_minmax_fastpath(self, device, dtype, op): self._minmax_test(op, inputs, True, N if dtype == torch.bool else 1) @ops(foreach_minmax_op_db, - dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False)) + dtypes=all_types_and(torch.half, torch.bfloat16, torch.bool)) def test_minmax_slowpath(self, device, dtype, op): for N in N_values: inputs = tuple(op.sample_inputs(device, dtype, N, noncontiguous=True) for _ in range(2)) @@ -399,7 +390,7 @@ def test_minmax_slowpath(self, device, dtype, op): # note(mkozuki): ForeachFuncInfo's of both `_foreach_maximum` and `_foreach_minimum` include integer types. # so, manually limit dtypes to fp types for inf&nan tests. 
- @ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True)) + @ops(foreach_minmax_op_db, dtypes=floating_types_and(torch.half, torch.bfloat16)) def test_minmax_float_inf_nan(self, device, dtype, op): inputs = ( [ @@ -424,7 +415,7 @@ def _reduce_test(self, opinfo, inputs, ord, is_fastpath, n_expected_cudaLaunchKe @ops(foreach_reduce_op_db) def test_reduce_fastpath(self, device, dtype, op): for N, ord in itertools.product(N_values, (0, 1, 2, -1, -2)): - if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes(): + if ord in (1, 2) and dtype in floating_types_and(torch.half, torch.bfloat16): n_expected_cudaLaunchKernels = 3 else: n_expected_cudaLaunchKernels = N @@ -437,7 +428,7 @@ def test_reduce_slowpath(self, device, dtype, op): inputs = op.sample_inputs(device, dtype, N, noncontiguous=True), self._reduce_test(op, inputs, ord, False, 1) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_add_scalar_with_empty_list_and_empty_tensor(self, device, dtype): # TODO: enable empty list case for tensors in [[torch.randn([0])]]: @@ -447,7 +438,7 @@ def test_add_scalar_with_empty_list_and_empty_tensor(self, device, dtype): torch._foreach_add_(tensors, 1) self.assertEqual(res, tensors) - @ops(foreach_binary_op_db, dtypes=get_all_dtypes()) + @ops(foreach_binary_op_db, dtypes=all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_binary_op_scalar_with_overlapping_tensors(self, device, dtype, op): foreach_op, ref = op.method_variant, op.ref tensors = [torch.ones(1, 1, device=device, dtype=dtype).expand(2, 1, 3)] @@ -479,7 +470,7 @@ def test_binary_op_scalar_with_different_tensor_dtypes(self, device, dtype, op): runtime_error = e self.assertIsNone(runtime_error) - @ops(foreach_binary_op_db, dtypes=get_all_dtypes()) + @ops(foreach_binary_op_db, dtypes=all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_binary_op_list_error_cases(self, device, dtype, op): foreach_op, foreach_op_, ref, ref_ = op.method_variant, op.inplace_variant, op.ref, op.ref_inplace tensors1 = [] @@ -534,7 +525,7 @@ def test_binary_op_list_error_cases(self, device, dtype, op): return with self.assertRaisesRegex(RuntimeError, "Expected all tensors to be on the same device"): foreach_op([tensor1], [tensor2]) - if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div: + if dtype in integral_types_and(torch.bool) and foreach_op == torch._foreach_div: with self.assertRaisesRegex(RuntimeError, "result type"): foreach_op_([tensor1], [tensor2]) else: @@ -543,7 +534,7 @@ def test_binary_op_list_error_cases(self, device, dtype, op): @skipMeta @unittest.skipIf(not torch.cuda.is_available(), "CUDA not found") - @ops(foreach_binary_op_db, dtypes=get_all_dtypes()) + @ops(foreach_binary_op_db, dtypes=all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_binary_op_list_slow_path(self, device, dtype, op): # note(mkozuki): why `n_expected_cudaLaunchKernels=0`? 
# In this test, foreach functions don't go through fast path, @@ -635,7 +626,7 @@ def test_binary_op_tensors_on_different_devices(self, device, dtype, op): self.assertEqual(actual, tensors1) @onlyCUDA - @ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False)) + @ops(foreach_pointwise_op_db, allowed_dtypes=floating_types()) def test_pointwise_op_tensors_on_different_devices(self, device, dtype, op): # tensors1: ['cuda', 'cpu] # tensors2: ['cuda', 'cpu] @@ -653,6 +644,27 @@ def test_pointwise_op_tensors_on_different_devices(self, device, dtype, op): foreach_op_(tensors1, tensors2, tensors3) self.assertEqual(expected, tensors1) + # note: BFloat16 has the same number of exponent bits as FP32 + # so if squared L2 norm overflows in BF16, then it also overflows in FP32. + @onlyCUDA + @ops(foreach_reduce_op_db, allowed_dtypes=(torch.half, torch.bfloat16)) + def test_foreach_l2_large_value_input(self, device, dtype, op): + ord, N = 2, 10 + max_value = torch.finfo(dtype).max + scaler = torch.tensor([max_value]).sqrt().to(device=device, dtype=dtype) + inputs = [t * scaler for t in op.sample_inputs(device, dtype, N, noncontiguous=False, low=1)], + # make sure that the min. of squared L2 norm value per tensor is greater than the max value of `dtype`. + self.assertTrue(scaler * scaler * N > max_value) + fn, ref_fn, *_ = self._get_funcs(op, 3) + actual = fn(inputs, is_cuda=True, is_fastpath=True, ord=ord) + expect = ref_fn(inputs, ord=ord) + if dtype == torch.float16: + # making sure the reference L2 norm values are in the range of FP16. + self.assertFalse(any(torch.isinf(e) for e in expect)) + else: + self.assertTrue(all(torch.isinf(e) for e in expect)) + self.assertEqual(expect, actual, equal_nan=False) + instantiate_device_type_tests(TestForeach, globals()) diff --git a/test/test_functionalization.py b/test/test_functionalization.py index 28476ff259576f..1b6bb88acf24f9 100644 --- a/test/test_functionalization.py +++ b/test/test_functionalization.py @@ -3,6 +3,9 @@ import torch from torch.testing._internal.common_utils import TestCase, run_tests from torch.testing._internal.logging_tensor import LoggingTensor, capture_logs, log_input +from torch.utils._pytree import tree_map + +import logging def are_aliased(x, y): if x._base is None and y._base is None: @@ -13,6 +16,45 @@ def are_aliased(x, y): return y._base is x return x._base is y._base +# Just for testing: a logging tensor that also transforms out-of-place ops into inplace ops. +# That way even if the outer wrapper is functionalized, the inner wrapper will also need functionalization. +class InplaceLoggingTensor(LoggingTensor): + @staticmethod + def __new__(cls, e): + r = torch.Tensor._make_wrapper_subclass(cls, e.shape, dtype=e.dtype, requires_grad=False) + r.elem = e + return r + + __torch_function__ = torch._C._disabled_torch_function_impl + + def __str__(self): + return f'InplaceLoggingTensor({self.elem})' + + @classmethod + def __torch_dispatch__(cls, func, types, args=(), kwargs=None): + def unwrap(e): + if isinstance(e, InplaceLoggingTensor): + return e.elem + else: + return e + + def wrap(e): + if isinstance(e, torch.Tensor): + return InplaceLoggingTensor(e) + else: + return e + f = func + # this subclass converts all `add()` ops into `add_()` ops + if f is torch.ops.aten.add.Tensor: + f = torch.ops.aten.add_.Tensor + + rs = tree_map(wrap, f(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))) + # after running the (potentially transformed) op, + # log the original op that we saw. 
+ logging.getLogger("LoggingTensor").info(f"{func.__module__}.{func.__name__}", args, kwargs, rs) + return rs + + class TestFunctionalization(TestCase): @@ -61,13 +103,13 @@ def f(x): logs = self.get_logs(f, torch.ones(4, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.view($0, [4, 2]) -$2 = torch._ops.aten.add($1, tensor([[1., 1.], +$1 = torch._ops.aten.view.default($0, [4, 2]) +$2 = torch._ops.aten.add.Tensor($1, tensor([[1., 1.], [1., 1.], [1., 1.], [1., 1.]])) -$3 = torch._ops.aten.view($2, [4, 2]) -$4 = torch._ops.aten.mul($3, $3)""") +$3 = torch._ops.aten.view.default($2, [4, 2]) +$4 = torch._ops.aten.mul.Tensor($3, $3)""") def test_inplace_on_non_view(self): def f(x): @@ -81,8 +123,8 @@ def f(x): logs = self.get_logs(f, torch.ones(4, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.view($0, [4, 2]) -$2 = torch._ops.aten.add($0, tensor([[1., 1.], +$1 = torch._ops.aten.view.default($0, [4, 2]) +$2 = torch._ops.aten.add.Tensor($0, tensor([[1., 1.], [1., 1.], [1., 1.], [1., 1.]]))""") @@ -101,9 +143,9 @@ def f(x): # We can update the output of this test if/when these tests eventually use LoggingTensor with PythonMode self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.copy_(tensor([[1., 1.], +$1 = torch._ops.aten.copy_.default(tensor([[1., 1.], [1., 1.]]), $0) -$2 = torch._ops.aten.copy_(tensor([[1., 1.], +$2 = torch._ops.aten.copy_.default(tensor([[1., 1.], [1., 1.]]), $0)""") def test_diagonal(self): @@ -118,10 +160,10 @@ def f(x): logs = self.get_logs(f, torch.ones(2, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.diagonal($0) -$2 = torch._ops.aten.add($1, tensor([1., 1.])) -$3 = torch._ops.aten.diagonal_scatter($0, $2) -$4 = torch._ops.aten.mul($3, $3)""") +$1 = torch._ops.aten.diagonal.default($0) +$2 = torch._ops.aten.add.Tensor($1, tensor([1., 1.])) +$3 = torch._ops.aten.diagonal_scatter.default($0, $2) +$4 = torch._ops.aten.mul.Tensor($3, $3)""") def test_diagonal_mutated_input(self): def f(x): @@ -146,13 +188,13 @@ def f(x): logs = self.get_logs(f, torch.ones(4, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1, $2 = torch._ops.aten.split($0, 2) -$3 = torch._ops.aten.diagonal($2) -$4 = torch._ops.aten.add($3, tensor([1., 1.])) -$5, $6 = torch._ops.aten.split($0, 2) -$7 = torch._ops.aten.diagonal_scatter($6, $4) -$8 = torch._ops.aten.slice_scatter($0, $7, 0, 2, 4) -$9 = torch._ops.aten.mul($8, $8)""") +$1, $2 = torch._ops.aten.split.Tensor($0, 2) +$3 = torch._ops.aten.diagonal.default($2) +$4 = torch._ops.aten.add.Tensor($3, tensor([1., 1.])) +$5, $6 = torch._ops.aten.split.Tensor($0, 2) +$7 = torch._ops.aten.diagonal_scatter.default($6, $4) +$8 = torch._ops.aten.slice_scatter.default($0, $7, 0, 2, 4) +$9 = torch._ops.aten.mul.Tensor($8, $8)""") def test_view_inplace(self): def f(x): @@ -166,9 +208,9 @@ def f(x): logs = self.get_logs(f, torch.ones(4, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.transpose($0, 1, 0) -$2 = torch._ops.aten.select($1, 0, 0) -$3 = torch._ops.aten.add($2, tensor([1., 1., 1., 1.]))""") +$1 = torch._ops.aten.transpose.int($0, 1, 0) +$2 = torch._ops.aten.select.int($1, 0, 0) +$3 = torch._ops.aten.add.Tensor($2, tensor([1., 1., 1., 1.]))""") def test_scalars(self): def f(x): @@ -183,10 +225,10 @@ def f(x): logs = self.get_logs(f, torch.ones(4, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = 
input('input') -$1 = torch._ops.aten.view($0, [4, 2]) -$2 = torch._ops.aten.add($1, tensor(1)) -$3 = torch._ops.aten.mul($2, tensor(2)) -$4 = torch._ops.aten.div($3, tensor(1))""") +$1 = torch._ops.aten.view.default($0, [4, 2]) +$2 = torch._ops.aten.add.Tensor($1, tensor(1)) +$3 = torch._ops.aten.mul.Tensor($2, tensor(2)) +$4 = torch._ops.aten.div.Tensor($3, tensor(1))""") def test_everything(self): def f(x): @@ -205,39 +247,39 @@ def f(x): logs = self.get_logs(f, torch.ones(4, 2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.view($0, [8]) -$2 = torch._ops.aten._reshape_alias($1, [2, 4], [4, 1]) -$3 = torch._ops.aten.transpose($2, 1, 0) -$4 = torch._ops.aten.view($0, [8]) -$5 = torch._ops.aten._reshape_alias($4, [2, 4], [4, 1]) -$6 = torch._ops.aten.transpose($5, 1, 0) -$7 = torch._ops.aten.unsqueeze($6, 0) -$8 = torch._ops.aten.view($0, [8]) -$9 = torch._ops.aten._reshape_alias($8, [2, 4], [4, 1]) -$10 = torch._ops.aten.transpose($9, 1, 0) -$11 = torch._ops.aten.unsqueeze($10, 0) -$12 = torch._ops.aten.squeeze($11) -$13, $14 = torch._ops.aten.split($12, 2) -$15 = torch._ops.aten.add($13, tensor([[1., 1.], +$1 = torch._ops.aten.view.default($0, [8]) +$2 = torch._ops.aten._reshape_alias.default($1, [2, 4], [4, 1]) +$3 = torch._ops.aten.transpose.int($2, 1, 0) +$4 = torch._ops.aten.view.default($0, [8]) +$5 = torch._ops.aten._reshape_alias.default($4, [2, 4], [4, 1]) +$6 = torch._ops.aten.transpose.int($5, 1, 0) +$7 = torch._ops.aten.unsqueeze.default($6, 0) +$8 = torch._ops.aten.view.default($0, [8]) +$9 = torch._ops.aten._reshape_alias.default($8, [2, 4], [4, 1]) +$10 = torch._ops.aten.transpose.int($9, 1, 0) +$11 = torch._ops.aten.unsqueeze.default($10, 0) +$12 = torch._ops.aten.squeeze.default($11) +$13, $14 = torch._ops.aten.split.Tensor($12, 2) +$15 = torch._ops.aten.add.Tensor($13, tensor([[1., 1.], [1., 1.]])) -$16 = torch._ops.aten.select($2, 0, 0) -$17 = torch._ops.aten.clone($15, memory_format=0) -$18 = torch._ops.aten._unsafe_view($17, [4]) -$19 = torch._ops.aten.view($0, [8]) -$20 = torch._ops.aten._reshape_alias($19, [2, 4], [4, 1]) -$21 = torch._ops.aten.transpose($20, 1, 0) -$22 = torch._ops.aten.unsqueeze($21, 0) -$23 = torch._ops.aten.squeeze($22) -$24 = torch._ops.aten.slice_scatter($23, $15, 0, 0, 2) -$25 = torch._ops.aten.unsqueeze($24, 0) -$26 = torch._ops.aten.squeeze($25, 0) -$27 = torch._ops.aten.transpose($26, 1, 0) -$28 = torch._ops.aten._reshape_alias($27, [8], [1]) -$29 = torch._ops.aten.view($28, [4, 2]) -$30 = torch._ops.aten.view($29, [8]) -$31 = torch._ops.aten._reshape_alias($30, [2, 4], [4, 1]) -$32 = torch._ops.aten.select($31, 0, 0) -$33 = torch._ops.aten.add($32, $18)""") +$16 = torch._ops.aten.select.int($2, 0, 0) +$17 = torch._ops.aten.clone.default($15, memory_format=0) +$18 = torch._ops.aten._unsafe_view.default($17, [4]) +$19 = torch._ops.aten.view.default($0, [8]) +$20 = torch._ops.aten._reshape_alias.default($19, [2, 4], [4, 1]) +$21 = torch._ops.aten.transpose.int($20, 1, 0) +$22 = torch._ops.aten.unsqueeze.default($21, 0) +$23 = torch._ops.aten.squeeze.default($22) +$24 = torch._ops.aten.slice_scatter.default($23, $15, 0, 0, 2) +$25 = torch._ops.aten.unsqueeze.default($24, 0) +$26 = torch._ops.aten.squeeze.dim($25, 0) +$27 = torch._ops.aten.transpose.int($26, 1, 0) +$28 = torch._ops.aten._reshape_alias.default($27, [8], [1]) +$29 = torch._ops.aten.view.default($28, [4, 2]) +$30 = torch._ops.aten.view.default($29, [8]) +$31 = torch._ops.aten._reshape_alias.default($30, [2, 4], [4, 1]) +$32 = 
torch._ops.aten.select.int($31, 0, 0) +$33 = torch._ops.aten.add.Tensor($32, $18)""") def test_aliases_maintained_after_pass(self): def f(x): @@ -279,34 +321,34 @@ def f(x): logs = self.get_logs(f, torch.ones(2)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.expand($0, [2]) -$2 = torch._ops.aten.add($1, $0)""") +$1 = torch._ops.aten.expand.default($0, [2]) +$2 = torch._ops.aten.add.Tensor($1, $0)""") # Test 2: copy_() with same dtype, different shape self.assert_functionalization(f, torch.ones(1)) logs = self.get_logs(f, torch.ones(1)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten.expand($0, [2]) -$2 = torch._ops.aten.add($1, $0)""") +$1 = torch._ops.aten.expand.default($0, [2]) +$2 = torch._ops.aten.add.Tensor($1, $0)""") # Test 3: copy_() with different dtype, same shape self.assert_functionalization(f, torch.ones(2, dtype=torch.long)) logs = self.get_logs(f, torch.ones(2, dtype=torch.long)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten._to_copy($0, dtype=6, layout=0, device=device(type='cpu'), pin_memory=False) -$2 = torch._ops.aten.expand($1, [2]) -$3 = torch._ops.aten.add($2, $0)""") +$1 = torch._ops.aten._to_copy.default($0, dtype=6, layout=0, device=device(type='cpu'), pin_memory=False) +$2 = torch._ops.aten.expand.default($1, [2]) +$3 = torch._ops.aten.add.Tensor($2, $0)""") # Test 4: copy_() with different dtype, different shape self.assert_functionalization(f, torch.ones(1, dtype=torch.long)) logs = self.get_logs(f, torch.ones(1, dtype=torch.long)) self.assertExpectedInline('\n'.join(logs), """\ $0 = input('input') -$1 = torch._ops.aten._to_copy($0, dtype=6, layout=0, device=device(type='cpu'), pin_memory=False) -$2 = torch._ops.aten.expand($1, [2]) -$3 = torch._ops.aten.add($2, $0)""") +$1 = torch._ops.aten._to_copy.default($0, dtype=6, layout=0, device=device(type='cpu'), pin_memory=False) +$2 = torch._ops.aten.expand.default($1, [2]) +$3 = torch._ops.aten.add.Tensor($2, $0)""") def test_nested_functions_propagate_updates(self): def g(x): @@ -324,5 +366,77 @@ def f(x): self.assert_functionalization(f, torch.ones(2, 2)) + def test_mixed_wrappers_valid(self): + def f(x, y): + z = x + y + z.add_(1) + return z + + x1_not_functional = LoggingTensor(torch.ones(4)) + x2_functional = torch._to_functional_tensor(LoggingTensor(torch.ones(4))) + + with capture_logs() as logs: + y = f(x1_not_functional, x2_functional) + + # I think the alias trace is coming from the fact that x2 is technically *not* + # a LoggingTensor (instead it *contains* a LoggingTensor), but x1 *is* a LoggingTensor. + # The important thing here though is that functionalization ran the "+" kernel + # with a functional + non-functional tensor, and wrapped the output appropriately. + self.assertExpectedInline('\n'.join(logs), """\ +$2 = torch._ops.aten.add.Tensor($0, $1) +$3 = torch._ops.aten.alias.default($2) +$4 = torch._ops.aten.add.Tensor($3, tensor(1))""") + + def test_mixed_wrappers_invalid(self): + x1_not_functional = torch.ones(4) + x2_functional = torch._to_functional_tensor(torch.ones(4)) + + # When dealing with mixed functional + nonfunctional tensors, + # normal_tensor.add_(functional_tensor) is not valid + # because normal_tensor would need to be "promoted" to a functional tensor. + with self.assertRaises(RuntimeError): + x1_not_functional.add_(x2_functional) + + # This tests the behavior of functionalization with multiple layers of wrapped tensor subclasses. 
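# --- Illustrative aside (not part of the diff above): a minimal sketch of the wrapping rule that
# test_mixed_wrappers_valid / test_mixed_wrappers_invalid exercise, reusing the private
# torch._to_functional_tensor helper from this file. Out-of-place ops may mix functional and
# non-functional inputs (the output is wrapped appropriately), but an in-place op cannot mutate a
# plain tensor with a functional argument, since the plain tensor would have to be "promoted".
import torch

plain = torch.ones(4)
functional = torch._to_functional_tensor(torch.ones(4))
try:
    plain.add_(functional)  # invalid direction: mutating the non-functional side
except RuntimeError:
    pass  # expected; wrap `plain` with torch._to_functional_tensor first if it must be mutated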
+ def test_multiple_levels_of_wrapping(self): + def f(x): + # call an inplace op and have it get logged twice (by the outer + inner wrapper) + x.add_(1) + + # Test 1: both the inner and outer wrapper are "functionalized" + x_inner_and_outer_functional = torch._to_functional_tensor( + InplaceLoggingTensor(torch._to_functional_tensor(LoggingTensor(torch.ones(4))))) + + with capture_logs() as logs: + f(x_inner_and_outer_functional) + + # Since both wrappers were functionalized, they both log "add" + self.assertExpectedInline('\n'.join(logs), """\ +$1 = torch._ops.aten.add.Tensor($0, tensor(1)) +$3 = torch._ops.aten.add.Tensor($2, tensor(1))""") + + # Test 2: only the inner wrapper is "functionalized" + x_only_inner_functional = InplaceLoggingTensor(torch._to_functional_tensor(LoggingTensor(torch.ones(4)))) + + with capture_logs() as logs: + f(x_only_inner_functional) + + # Since only the inner wrapper is functionalized, only the inner (first) log is functionalized + self.assertExpectedInline('\n'.join(logs), """\ +$1 = torch._ops.aten.add.Tensor($0, tensor(1)) +$3 = torch._ops.aten.add_.Tensor($2, tensor(1))""") + + # Test 3: only the outer wrapper is "functionalized" + x_only_outer_functional = torch._to_functional_tensor(InplaceLoggingTensor(LoggingTensor(torch.ones(4)))) + + with capture_logs() as logs: + f(x_only_outer_functional) + + # Only the outer add_ is functionalized + # Since only the outer wrapper is functionalized, only the outer (second) log is functionalized + self.assertExpectedInline('\n'.join(logs), """\ +$1 = torch._ops.aten.add_.Tensor($0, tensor(1)) +$3 = torch._ops.aten.add.Tensor($2, tensor(1))""") + if __name__ == '__main__': run_tests() diff --git a/test/test_fx.py b/test/test_fx.py index f72dbd21266974..4a5436b7968339 100644 --- a/test/test_fx.py +++ b/test/test_fx.py @@ -7,6 +7,7 @@ import inspect import math import numbers +import io import operator import os import pickle @@ -17,6 +18,7 @@ import types import warnings import unittest +import torch.nn.utils._stateless as _stateless from math import sqrt from torch.multiprocessing import Process from torch.testing import FileCheck @@ -141,6 +143,7 @@ def __init__(self, a, b): class TestFX(JitTestCase): def setUp(self): + super().setUp() # Checking for mutable operations whil tracing is feature flagged # Enable it in testing but not by default self.orig_tracer_mutable_flag = torch.fx.proxy.TracerBase.check_mutable_operations @@ -151,6 +154,7 @@ def setUp(self): torch.ops.load_library(str(lib_file_path)) def tearDown(self): + super().tearDown() torch.fx.proxy.TracerBase.check_mutable_operations = self.orig_tracer_mutable_flag def checkGraphModule(self, m: torch.nn.Module, args, kwargs=None): @@ -457,6 +461,19 @@ def forward(self, a, b): gm.graph.lint() self.assertEqual(gm(3, 4), 14) + def test_concrete_arg_none_assert(self): + class Foo(torch.nn.Module): + def forward(self, x, val=None): + return x if val is None else x + val + + f = Foo() + traced = torch.fx.symbolic_trace(f, concrete_args={'val' : None}) + with self.assertRaisesRegex(AssertionError, 'val has been specialized to have value None'): + traced(torch.randn(5), torch.randn(5)) + + x = torch.randn(5) + torch.testing.assert_close(traced(x), f(x)) + def test_graph_unique_names(self): class M(torch.nn.Module): def forward(self, a, b): @@ -686,6 +703,7 @@ def forward(self, a): for node in m_g.graph.nodes: self.assertTrue(node.name != "getattr") + @unittest.skip("Hotfix for SEV remediation") def test_trace_buffer_slice(self): bs, d_hid = 10, 23 @@ -1026,6 
+1044,24 @@ def forward(self, x): traced_scripted = torch.jit.script(traced) self.assertEqual(traced_scripted(torch.rand(4)), 2) + def test_tuple_no_subscript(self): + def foo(x : Tuple): + return x[0] + + traced = torch.fx.symbolic_trace(foo) + x = (torch.randn(5, 3),) + torch.testing.assert_allclose(traced(x), x[0]) + + bio = io.BytesIO() + + torch.save(traced, bio) + + bio.seek(0) + + loaded = torch.load(bio) + + torch.testing.assert_allclose(loaded(x), x[0]) + def test_torch_fx_len(self): class FXLenTest(torch.nn.Module): def forward(self, x): @@ -1096,6 +1132,24 @@ def forward(self, a): out = gm(input) self.assertEqual(out, ref_out) + def test_torch_op_overloads(self): + class M(torch.nn.Module): + def forward(self, a): + b = torch.ops.aten.add.Tensor(a, a) + return b + m = M() + input = torch.randn(3) + ref_out = m(input) + gm = symbolic_trace(m) + gm.graph.lint() + out = gm(input) + self.assertEqual(out, ref_out) + + for node in gm.graph.nodes: + if node.op == 'call_function': + assert isinstance(node.target, torch._ops.OpOverload) + assert node.target.__name__ == 'add.Tensor' + def test_pickle_torch_custom_ops(self): class M(torch.nn.Module): def forward(self, a): @@ -2661,7 +2715,7 @@ def to_trace(y): def test_profiler_ranges_side_effect(self): g = torch.fx.Graph() - handle = g.call_function(torch.ops.profiler._record_function_enter, ('test_range',)) + handle = g.call_function(torch.ops.profiler._record_function_enter_new, ('test_range',)) g.call_function(torch.ops.profiler._record_function_exit, (handle,)) g.output(None) @@ -2671,7 +2725,7 @@ def test_profiler_ranges_side_effect(self): found_targets.setdefault(node.target) self.assertEqual( list(found_targets.keys()), - [torch.ops.profiler._record_function_enter, torch.ops.profiler._record_function_exit] + [torch.ops.profiler._record_function_enter_new, torch.ops.profiler._record_function_exit] ) g.eliminate_dead_code() @@ -2681,7 +2735,7 @@ def test_profiler_ranges_side_effect(self): found_targets.setdefault(node.target) self.assertEqual( list(found_targets.keys()), - [torch.ops.profiler._record_function_enter, torch.ops.profiler._record_function_exit] + [torch.ops.profiler._record_function_enter_new, torch.ops.profiler._record_function_exit] ) def test_ast_rewriter_wrapped_via_decorator(self): @@ -2917,6 +2971,35 @@ def is_leaf_module(self, m: torch.nn.Module, module_qualified_name : str) -> boo gm2.delete_all_unused_submodules() torch.testing.assert_allclose(gm2(inputs), model(inputs)) + def test_fx_stateless(self): + class MockModule(torch.nn.Module): + def __init__(self): + super().__init__() + self.l1 = torch.nn.Linear(1, 1) + self.register_buffer('buffer', torch.ones(1)) + + def forward(self, x): + return self.l1(x) + self.buffer + + module = MockModule() + x = torch.rand((1, 1)) + weight = torch.tensor([[1.0]], requires_grad=True) + bias = torch.tensor([0.0], requires_grad=True) + buffer = torch.tensor([0.0]) + parameters = {'l1.weight': weight, + 'l1.bias': bias, + 'buffer': buffer} + fx_module = torch.fx.symbolic_trace(module) + res = _stateless.functional_call(fx_module, parameters, x) + res.backward() + self.assertIsNotNone(weight.grad) + self.assertIsNotNone(bias.grad) + self.assertIsNone(buffer.grad) + # Gradients were not calculated for the module's own parameters and buffers + self.assertIsNone(module.l1.weight.grad) + self.assertIsNone(module.l1.bias.grad) + self.assertIsNone(module.buffer.grad) + def test_tracing_graphmodules_as_leaf_submodules(self): class A(torch.nn.Module): def forward(self, t): @@ -3310,6 
def f(a, b): ts_f = torch.jit.script(nf) self.assertEqual(nf(vals), ts_f(vals)) + def test_custom_codegen_with_transformer(self): + class ListCodeGen(CodeGen): + def gen_fn_def(self, free_vars, maybe_return_annotation): + lst_unpack = f""" +def forward(self, args_list: List[torch.Tensor]){maybe_return_annotation}: + {', '.join(free_vars)} = args_list""" + return lst_unpack + + def additional_globals(self): + return [('List', typing.List)] + + def process_inputs(self, *inputs): + assert(len(inputs) == 1) + return inputs[0] + + def f(a, b): + return a + b + + nf = symbolic_trace(f) + vals = [torch.randn(3), torch.randn(3)] + self.assertEqual(nf(*vals), f(*vals)) + + nf.graph.set_codegen(ListCodeGen()) + nf.recompile() + self.assertEqual(nf(vals), f(*vals)) + + transformed_gm = Transformer(nf).transform() + self.assertEqual(nf(vals), transformed_gm(vals)) + + def test_interpreter_with_codegen(self): + class ListCodeGen(CodeGen): + def gen_fn_def(self, free_vars, maybe_return_annotation): + lst_unpack = f""" +def forward(self, args_list: List[torch.Tensor]){maybe_return_annotation}: + {', '.join(free_vars)} = args_list""" + return lst_unpack + + def additional_globals(self): + return [('List', typing.List)] + + def process_inputs(self, *inputs): + assert(len(inputs) == 1) + return inputs[0] + + def generate_output(self, output_args): + return f'return list({repr(output_args)})' + + def process_outputs(self, outputs): + return list(outputs) + + def f(a, b): + a = a + b + b = a + b + return a, b + + nf = symbolic_trace(f) + vals = [torch.randn(3), torch.randn(3)] + nf.graph.set_codegen(ListCodeGen()) + nf.recompile() + self.assertEqual(Interpreter(nf).run(vals), nf(vals)) def test_imul_code_print(self): graph = torch.fx.Graph() @@ -3368,6 +3511,7 @@ def test_get_torch_func_signature_exhaustive(self, device, dtype, op): class TestFXAPIBackwardCompatibility(JitTestCase): def setUp(self): + super().setUp() self.maxDiff = None # Checking for mutable operations whil tracing is feature flagged @@ -3376,6 +3520,7 @@ def setUp(self): torch.fx.proxy.TracerBase.check_mutable_operations = True def tearDown(self): + super().tearDown() torch.fx.proxy.TracerBase.check_mutable_operations = self.orig_tracer_mutable_flag @@ -3614,12 +3759,14 @@ def check_symbols_have_bc_designation(m, prefix): class TestFunctionalTracing(JitTestCase): def setUp(self): + super().setUp() # Checking for mutable operations whil tracing is feature flagged # Enable it in testing but not by default self.orig_tracer_mutable_flag = torch.fx.proxy.TracerBase.check_mutable_operations torch.fx.proxy.TracerBase.check_mutable_operations = True def tearDown(self): + super().tearDown() torch.fx.proxy.TracerBase.check_mutable_operations = self.orig_tracer_mutable_flag IGNORE_FUNCS = ("has_torch_function", "has_torch_function_unary", diff --git a/test/test_fx_experimental.py b/test/test_fx_experimental.py index 37569198347844..53798776eb91f6 100644 --- a/test/test_fx_experimental.py +++ b/test/test_fx_experimental.py @@ -814,6 +814,29 @@ def mod_partition(node: Node): self.assertEqual(orig_out, submodules_out) + def test_split_module_kwargs_expansion(self): + class ModuleWithKwargsExpansion(torch.nn.Module): + def forward(self, x, **kwargs): + return x + kwargs['foo'] + + mod = ModuleWithKwargsExpansion() + traced = torch.fx.symbolic_trace(mod) + + seen_getitem = False + + def split_callback(n): + nonlocal seen_getitem + split_idx = int(seen_getitem) + if n.target == operator.getitem: + seen_getitem = True + return split_idx + + split = 
split_module(traced, mod, split_callback) + + x = torch.randn(5, 3) + foo = torch.randn(5, 3) + torch.testing.assert_allclose(split(x, foo=foo), traced(x, foo=foo)) + @skipIfNoTorchVision def test_subgraph_trivial_resnet(self): # Smoke test trivially splitting resnet into 1 partition works @@ -1516,6 +1539,7 @@ def test_normalize_operator_exhaustive(self, device, dtype, op): "igamma", "igammac", "index_put", + "linalg_pinv_singular", # Implemented with a lambda (only the singular variant) "nn.functional.conv2d", "nn.functional.dropout", "nn.functional.dropout2d", @@ -1587,6 +1611,9 @@ def test_normalize_operator_exhaustive(self, device, dtype, op): if op.name in op_skip: return + if op.formatted_name in op_skip: + return + if op.name.startswith('_masked.'): return diff --git a/test/test_hub.py b/test/test_hub.py new file mode 100644 index 00000000000000..662a2cf9771ee2 --- /dev/null +++ b/test/test_hub.py @@ -0,0 +1,256 @@ +# Owner(s): ["module: hub"] + +import unittest +from unittest.mock import patch +import os +import tempfile +import warnings + +import torch +import torch.hub as hub +from torch.testing._internal.common_utils import retry, IS_SANDCASTLE, TestCase + + +def sum_of_state_dict(state_dict): + s = 0 + for _, v in state_dict.items(): + s += v.sum() + return s + + +SUM_OF_HUB_EXAMPLE = 431080 +TORCHHUB_EXAMPLE_RELEASE_URL = 'https://github.com/ailzhang/torchhub_example/releases/download/0.1/mnist_init_ones' + + +@unittest.skipIf(IS_SANDCASTLE, 'Sandcastle cannot ping external') +class TestHub(TestCase): + + def setUp(self): + super().setUp() + self.previous_hub_dir = torch.hub.get_dir() + self.tmpdir = tempfile.TemporaryDirectory('hub_dir') + torch.hub.set_dir(self.tmpdir.name) + self.trusted_list_path = os.path.join(torch.hub.get_dir(), "trusted_list") + + def tearDown(self): + super().tearDown() + torch.hub.set_dir(self.previous_hub_dir) # probably not needed, but can't hurt + self.tmpdir.cleanup() + + def _assert_trusted_list_is_empty(self): + with open(self.trusted_list_path) as f: + assert not f.readlines() + + def _assert_in_trusted_list(self, line): + with open(self.trusted_list_path) as f: + assert line in (l.strip() for l in f.readlines()) + + @retry(Exception, tries=3) + def test_load_from_github(self): + hub_model = hub.load('ailzhang/torchhub_example', 'mnist', source='github', pretrained=True, verbose=False) + self.assertEqual(sum_of_state_dict(hub_model.state_dict()), SUM_OF_HUB_EXAMPLE) + + @retry(Exception, tries=3) + def test_load_from_local_dir(self): + local_dir = hub._get_cache_or_reload( + 'ailzhang/torchhub_example', + force_reload=False, + trust_repo=True, + calling_fn=None + ) + hub_model = hub.load(local_dir, 'mnist', source='local', pretrained=True, verbose=False) + self.assertEqual(sum_of_state_dict(hub_model.state_dict()), SUM_OF_HUB_EXAMPLE) + + @retry(Exception, tries=3) + def test_load_from_branch(self): + hub_model = hub.load('ailzhang/torchhub_example:ci/test_slash', 'mnist', pretrained=True, verbose=False) + self.assertEqual(sum_of_state_dict(hub_model.state_dict()), SUM_OF_HUB_EXAMPLE) + + @retry(Exception, tries=3) + def test_get_set_dir(self): + previous_hub_dir = torch.hub.get_dir() + with tempfile.TemporaryDirectory('hub_dir') as tmpdir: + torch.hub.set_dir(tmpdir) + self.assertEqual(torch.hub.get_dir(), tmpdir) + self.assertNotEqual(previous_hub_dir, tmpdir) + + hub_model = hub.load('ailzhang/torchhub_example', 'mnist', pretrained=True, verbose=False) + self.assertEqual(sum_of_state_dict(hub_model.state_dict()), SUM_OF_HUB_EXAMPLE) + 
assert os.path.exists(os.path.join(tmpdir, 'ailzhang_torchhub_example_master')) + + # Test that set_dir properly calls expanduser() + # non-regression test for https://github.com/pytorch/pytorch/issues/69761 + new_dir = os.path.join("~", "hub") + torch.hub.set_dir(new_dir) + self.assertEqual(torch.hub.get_dir(), os.path.expanduser(new_dir)) + + @retry(Exception, tries=3) + def test_list_entrypoints(self): + entry_lists = hub.list('ailzhang/torchhub_example', trust_repo=True) + self.assertObjectIn('mnist', entry_lists) + + @retry(Exception, tries=3) + def test_download_url_to_file(self): + with tempfile.TemporaryDirectory() as tmpdir: + f = os.path.join(tmpdir, 'temp') + hub.download_url_to_file(TORCHHUB_EXAMPLE_RELEASE_URL, f, progress=False) + loaded_state = torch.load(f) + self.assertEqual(sum_of_state_dict(loaded_state), SUM_OF_HUB_EXAMPLE) + + @retry(Exception, tries=3) + def test_load_state_dict_from_url(self): + loaded_state = hub.load_state_dict_from_url(TORCHHUB_EXAMPLE_RELEASE_URL) + self.assertEqual(sum_of_state_dict(loaded_state), SUM_OF_HUB_EXAMPLE) + + # with name + file_name = "the_file_name" + loaded_state = hub.load_state_dict_from_url(TORCHHUB_EXAMPLE_RELEASE_URL, file_name=file_name) + expected_file_path = os.path.join(torch.hub.get_dir(), 'checkpoints', file_name) + self.assertTrue(os.path.exists(expected_file_path)) + self.assertEqual(sum_of_state_dict(loaded_state), SUM_OF_HUB_EXAMPLE) + + @retry(Exception, tries=3) + def test_load_legacy_zip_checkpoint(self): + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + hub_model = hub.load('ailzhang/torchhub_example', 'mnist_zip', pretrained=True, verbose=False) + self.assertEqual(sum_of_state_dict(hub_model.state_dict()), SUM_OF_HUB_EXAMPLE) + assert any("will be deprecated in favor of default zipfile" in str(w) for w in ws) + + # Test the default zipfile serialization format produced by >=1.6 release. 
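# --- Illustrative aside (not part of the diff above): a hedged sketch of the public torch.hub
# calls these tests exercise. The repo/entrypoint names are the same example repo used throughout
# this file, and trust_repo is the argument the tests below cover; TORCHHUB_EXAMPLE_RELEASE_URL is
# the module-level constant defined near the top of test_hub.py.
import torch

torch.hub.set_dir('/tmp/hub_cache')                   # where repos and checkpoints are cached
model = torch.hub.load('ailzhang/torchhub_example',   # 'owner/repo', optionally 'owner/repo:branch'
                       'mnist',                       # entrypoint defined in the repo's hubconf.py
                       pretrained=True,               # extra kwargs are forwarded to the entrypoint
                       trust_repo=True)               # skip the interactive trust prompt
state = torch.hub.load_state_dict_from_url(TORCHHUB_EXAMPLE_RELEASE_URL,
                                           file_name='mnist_init_ones')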
+ @retry(Exception, tries=3) + def test_load_zip_1_6_checkpoint(self): + hub_model = hub.load( + 'ailzhang/torchhub_example', + 'mnist_zip_1_6', + pretrained=True, + verbose=False, + trust_repo=True + ) + self.assertEqual(sum_of_state_dict(hub_model.state_dict()), SUM_OF_HUB_EXAMPLE) + + @retry(Exception, tries=3) + def test_hub_parse_repo_info(self): + # If the branch is specified we just parse the input and return + self.assertEqual( + torch.hub._parse_repo_info('a/b:c'), + ('a', 'b', 'c') + ) + # For torchvision, the default branch is main + self.assertEqual( + torch.hub._parse_repo_info('pytorch/vision'), + ('pytorch', 'vision', 'main') + ) + # For the torchhub_example repo, the default branch is still master + self.assertEqual( + torch.hub._parse_repo_info('ailzhang/torchhub_example'), + ('ailzhang', 'torchhub_example', 'master') + ) + + @retry(Exception, tries=3) + def test_load_commit_from_forked_repo(self): + with self.assertRaisesRegex(ValueError, 'If it\'s a commit from a forked repo'): + torch.hub.load('pytorch/vision:4e2c216', 'resnet18') + + @retry(Exception, tries=3) + @patch('builtins.input', return_value='') + def test_trust_repo_false_emptystring(self, patched_input): + with self.assertRaisesRegex(Exception, 'Untrusted repository.'): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=False) + self._assert_trusted_list_is_empty() + patched_input.assert_called_once() + + patched_input.reset_mock() + with self.assertRaisesRegex(Exception, 'Untrusted repository.'): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=False) + self._assert_trusted_list_is_empty() + patched_input.assert_called_once() + + @retry(Exception, tries=3) + @patch('builtins.input', return_value='no') + def test_trust_repo_false_no(self, patched_input): + with self.assertRaisesRegex(Exception, 'Untrusted repository.'): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=False) + self._assert_trusted_list_is_empty() + patched_input.assert_called_once() + + patched_input.reset_mock() + with self.assertRaisesRegex(Exception, 'Untrusted repository.'): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=False) + self._assert_trusted_list_is_empty() + patched_input.assert_called_once() + + @retry(Exception, tries=3) + @patch('builtins.input', return_value='y') + def test_trusted_repo_false_yes(self, patched_input): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=False) + self._assert_in_trusted_list("ailzhang_torchhub_example") + patched_input.assert_called_once() + + # Loading a second time with "check", we don't ask for user input + patched_input.reset_mock() + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo="check") + patched_input.assert_not_called() + + # Loading again with False, we still ask for user input + patched_input.reset_mock() + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=False) + patched_input.assert_called_once() + + @retry(Exception, tries=3) + @patch('builtins.input', return_value='no') + def test_trust_repo_check_no(self, patched_input): + with self.assertRaisesRegex(Exception, 'Untrusted repository.'): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo="check") + self._assert_trusted_list_is_empty() + patched_input.assert_called_once() + + patched_input.reset_mock() + with self.assertRaisesRegex(Exception, 'Untrusted repository.'): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', 
trust_repo="check") + patched_input.assert_called_once() + + @retry(Exception, tries=3) + @patch('builtins.input', return_value='y') + def test_trust_repo_check_yes(self, patched_input): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo="check") + self._assert_in_trusted_list("ailzhang_torchhub_example") + patched_input.assert_called_once() + + # Loading a second time with "check", we don't ask for user input + patched_input.reset_mock() + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo="check") + patched_input.assert_not_called() + + @retry(Exception, tries=3) + def test_trust_repo_true(self): + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=True) + self._assert_in_trusted_list("ailzhang_torchhub_example") + + @retry(Exception, tries=3) + def test_trust_repo_builtin_trusted_owners(self): + torch.hub.load('pytorch/vision', 'resnet18', trust_repo="check") + self._assert_trusted_list_is_empty() + + @retry(Exception, tries=3) + def test_trust_repo_none(self): + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=None) + assert len(w) == 1 + assert issubclass(w[-1].category, UserWarning) + assert "You are about to download and run code from an untrusted repository" in str(w[-1].message) + + self._assert_trusted_list_is_empty() + + @retry(Exception, tries=3) + def test_trust_repo_legacy(self): + # We first download a repo and then delete the allowlist file + # Then we check that the repo is indeed trusted without a prompt, + # because it was already downloaded in the past. + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo=True) + os.remove(self.trusted_list_path) + + torch.hub.load('ailzhang/torchhub_example', 'mnist_zip_1_6', trust_repo="check") + + self._assert_trusted_list_is_empty() diff --git a/test/test_indexing.py b/test/test_indexing.py index 42ffa8ab24e8f2..4f0e7e4bf74bb3 100644 --- a/test/test_indexing.py +++ b/test/test_indexing.py @@ -692,7 +692,7 @@ def test_bool_indices(self, device): self.assertEqual(v[boolIndices].shape, v[uint8Indices].shape) self.assertEqual(v[boolIndices], v[uint8Indices]) self.assertEqual(v[boolIndices], tensor([True], dtype=torch.bool, device=device)) - self.assertEquals(len(w), 2) + self.assertEqual(len(w), 2) def test_bool_indices_accumulate(self, device): mask = torch.zeros(size=(10, ), dtype=torch.bool, device=device) @@ -713,7 +713,7 @@ def test_byte_mask(self, device): with warnings.catch_warnings(record=True) as w: self.assertEqual(v[mask].shape, (3, 7, 3)) self.assertEqual(v[mask], torch.stack([v[0], v[2], v[3]])) - self.assertEquals(len(w), 2) + self.assertEqual(len(w), 2) v = torch.tensor([1.], device=device) self.assertEqual(v[v == 0], torch.tensor([], device=device)) @@ -725,7 +725,7 @@ def test_byte_mask_accumulate(self, device): warnings.simplefilter("always") y.index_put_((mask, ), y[mask], accumulate=True) self.assertEqual(y, torch.ones(size=(10, 10), device=device)) - self.assertEquals(len(w), 2) + self.assertEqual(len(w), 2) def test_index_put_accumulate_large_tensor(self, device): # This test is for tensors with number of elements >= INT_MAX (2^31 - 1). 
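# --- Illustrative aside (not part of the diff above): why the hunks above swap assertEquals for
# assertEqual. unittest keeps assertEquals only as a deprecated alias, so it still performs the
# same comparison but also emits a DeprecationWarning, which breaks suites that treat warnings as
# errors. AliasDemo below is a hypothetical example, not part of the test suite.
import unittest

class AliasDemo(unittest.TestCase):
    def test_alias(self):
        self.assertEqual(1 + 1, 2)       # canonical spelling
        # self.assertEquals(1 + 1, 2)    # deprecated alias: same check, plus a DeprecationWarning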
@@ -876,7 +876,7 @@ def test_multiple_byte_mask(self, device): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always") self.assertEqual(v[mask1, :, mask2].shape, (3, 7)) - self.assertEquals(len(w), 2) + self.assertEqual(len(w), 2) def test_byte_mask2d(self, device): v = torch.randn(5, 7, 3, device=device) @@ -1130,7 +1130,7 @@ def test_byte_tensor_assignment(self, device): with warnings.catch_warnings(record=True) as w: x[b] = value - self.assertEquals(len(w), 1) + self.assertEqual(len(w), 1) self.assertEqual(x[0], value) self.assertEqual(x[1], torch.arange(4., 8, device=device)) diff --git a/test/test_jit.py b/test/test_jit.py index f60857058318da..876e6cbdf1e790 100644 --- a/test/test_jit.py +++ b/test/test_jit.py @@ -17,6 +17,7 @@ from jit.test_data_parallel import TestDataParallel # noqa: F401 from jit.test_models import TestModels # noqa: F401 from jit.test_modules import TestModules # noqa: F401 +from jit.test_autodiff import TestAutodiffJit # noqa: F401 from jit.test_autodiff_subgraph_slicing import TestAutodiffSubgraphSlicing # noqa: F401 from jit.test_custom_operators import TestCustomOperators # noqa: F401 from jit.test_export_modes import TestExportModes # noqa: F401 @@ -25,12 +26,13 @@ from jit.test_builtins import TestBuiltins, TestTensorBuiltins # noqa: F401 from jit.test_ignore_context_manager import TestIgnoreContextManager # noqa: F401 from jit.test_symbolic_shape_analysis import TestSymbolicShapeAnalysis # noqa: F401 +from jit.test_op_decompositions import TestOpDecompositions # noqa: F401 from jit.test_if_hoisting import TestIfHoisting # noqa: F401 from jit.test_unsupported_ops import TestUnsupportedOps # noqa: F401 from jit.test_freezing import TestFreezing, TestFrozenOptimizations, TestMKLDNNReinplacing # noqa: F401 from jit.test_peephole import TestPeephole # noqa: F401 from jit.test_alias_analysis import TestAliasAnalysis # noqa: F401 -from jit.test_save_load import TestSaveLoad # noqa: F401 +from jit.test_save_load import TestSaveLoad, TestSaveLoadFlatbuffer # noqa: F401 from jit.test_save_load_for_op_version import TestSaveLoadForOpVersion # noqa: F401 from jit.test_module_containers import TestModuleContainers # noqa: F401 from jit.test_python_bindings import TestPythonBindings # noqa: F401 @@ -76,6 +78,7 @@ from jit.test_device_analysis import TestDeviceAnalysis # noqa: F401 from jit.test_dce import TestDCE # noqa: F401 from jit.test_sparse import TestSparse # noqa: F401 +from jit.test_tensor_methods import TestTensorMethods # noqa: F401 # Torch from torch import Tensor @@ -203,11 +206,6 @@ def doAutodiffCheck(testname): # TODO: enable TE in PE when all tests are fixed torch._C._jit_set_texpr_fuser_enabled(GRAPH_EXECUTOR == ProfilingMode.PROFILING) torch._C._jit_set_profiling_executor(GRAPH_EXECUTOR != ProfilingMode.LEGACY) -# even though FULL_PROFILER should be our default -# we haven't tested every single test in this file -# but we enable FULL_PROFILER for a large subset -# of the tests with "with enable_profiling_mode_for_profiling_tests" -torch._C._jit_set_profiling_mode(False) def LSTMCell(input, hidden, w_ih, w_hh, b_ih=None, b_hh=None): hx, cx = hidden @@ -969,6 +967,56 @@ def forward(self, input): m_dropout.eval() self.assertEqual(dropout(input) + 1, m_dropout(input)) + def test_nn_lp_pool2d(self): + class Mod(torch.nn.Module): + def __init__(self): + super().__init__() + self.l = torch.nn.LPPool2d(2, 3) + self.n = torch.nn.LPPool2d(2, (7, 1)) + + def forward(self, x): + return (self.l(x), + self.n(x), + 
torch.nn.functional.lp_pool2d(x, float(2), 3), + torch.nn.functional.lp_pool2d(x, 2, 3), + torch.nn.functional.lp_pool2d(x, float(2), (7, 1))) + + self.checkModule(Mod(), (torch.rand(1, 3, 7, 7),)) + + def test_nn_lp_pool1d(self): + class Mod(torch.nn.Module): + def __init__(self): + super().__init__() + self.l = torch.nn.LPPool1d(2, 3) + self.n = torch.nn.LPPool1d(2, 7) + + def forward(self, x): + return (self.l(x), + self.n(x), + torch.nn.functional.lp_pool1d(x, float(2), 3), + torch.nn.functional.lp_pool1d(x, 2, 3), + torch.nn.functional.lp_pool1d(x, float(2), 7)) + + self.checkModule(Mod(), (torch.rand(1, 3, 7),)) + + def test_nn_padding_functional(self): + class Mod(nn.Module): + def __init__(self, *pad): + super().__init__() + self.pad = pad + + def forward(self, x): + return F.pad(x, self.pad, mode='constant', value=3.5) + + inputs = [ + (Mod(1, 2), torch.randn(1, 3, 4)), # 1D + (Mod(1, 2, 3, 4), torch.randn(1, 3, 4)), # 2D + (Mod(1, 2, 3, 4, 5, 6), torch.randn(1, 3, 4)), # 3D + ] + + for m, inp in inputs: + self.checkModule(m, (inp,)) + def test_nn_padding(self): class Mod(nn.Module): def __init__(self, padding): @@ -5715,12 +5763,7 @@ def test_fuser_double_float_codegen(self): 'frac'] def lookup_c_equivalent_fn(aten_fn): - if aten_fn == 'min': - return 'fmin' - elif aten_fn == 'max': - return 'fmax' - else: - return aten_fn + return aten_fn def test_dispatch(op, expects, dtype, binary=False): if dtype == torch.double: @@ -5754,7 +5797,9 @@ def test_dispatch(op, expects, dtype, binary=False): test_dispatch(fn, lookup_c_equivalent_fn(fn) + '(', torch.double) test_dispatch(fn, lookup_c_equivalent_fn(fn) + 'f(', torch.float) - binary_fns = ['min', 'max', 'pow'] + # 'min', 'max' were previously tested but are now replaced with ternary expressions + # instead of fmin() and fmax() + binary_fns = ['pow'] for fn in binary_fns: test_dispatch(fn, lookup_c_equivalent_fn(fn) + '(', torch.double, binary=True) test_dispatch(fn, lookup_c_equivalent_fn(fn) + 'f(', torch.float, binary=True) @@ -7312,7 +7357,7 @@ def test_as_tensor_tensor_input(input): g = test_as_tensor_tensor_input.graph_for(torch.ones(3, 4)) FileCheck().check("Tensor = aten::as_tensor").check("Float(*, *, requires_grad=0, device=cpu) = aten::as_tensor").run(g) - + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.LEGACY, "testing legacy behavior") def test_tensor_requires_grad(self): @torch.jit.script def test(b): @@ -8218,6 +8263,44 @@ def test_irparser(self): """ FileCheck().run(graph_str, parse_ir(graph_str)) + def test_parse_tensor_constants(self): + def foo(): + return torch.zeros([4, 4]) + + foo_s = torch.jit.script(foo) + torch._C._jit_pass_constant_propagation(foo_s.graph) + + g = str(foo_s.graph) + g_parsed = parse_ir(g, parse_tensor_constants=True) + self.assertEqual(str(canonical(g_parsed)), str(canonical(foo_s.graph))) + func = torch._C._create_function_from_graph("forward", g_parsed) + + out_parsed = func() + out_func = foo() + # not checking data, just dtype, size etc + out_parsed[:] = 0 + out_func[:] = 0 + self.assertEqual(out_func, out_parsed) + + with self.assertRaises(RuntimeError): + parse_ir(g, parse_tensor_constants=False) + + def test_parse_nested_names(self): + g_str = """ + graph(%x.1 : Tensor): + %3 : int = prim::Constant[value=1]() + %2 : int = prim::Constant[value=2]() + %hi.submod.value.5 : Tensor = aten::add(%x.1, %2, %3) + return (%hi.submod.value.5) + """ + g = parse_ir(g_str) + round_trip_g = parse_ir(str(g)) + self.assertEqual(canonical(g), canonical(round_trip_g)) + + func1 = 
torch._C._create_function_from_graph("forward", g) + func2 = torch._C._create_function_from_graph("forward", round_trip_g) + self.assertEqual(func1(torch.ones([2])), func2(torch.ones([2]))) + def test_is_after_use(self): def sorted_input_use(g): uses = list(next(g.inputs()).uses()) @@ -11047,6 +11130,26 @@ def randint(): FileCheck().check("Double(*, *, requires_grad=0, device=cpu)") \ .check_not("Float(*, *, requires_grad=0, device=cpu)").run(randint.graph_for()) + @unittest.skipIf(not RUN_CUDA, "no CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "skip if profiling isn't enabled") + def test_autodiff_complex(self): + def foo(x: torch.Tensor, y: torch.Tensor, W: torch.Tensor): + return torch.exp(torch.mm(torch.complex(x, y), W.cfloat())) + + @torch.jit.script + def jitted_foo(x: torch.Tensor, y: torch.Tensor, W: torch.Tensor): + return torch.exp(torch.mm(torch.complex(x, y), W.cfloat())) + + x = torch.randn(128, 16, dtype=torch.float32, device='cuda:0') + y = torch.randn(128, 16, dtype=torch.float32, device='cuda:0') + W = torch.randn(16, 1, dtype=torch.float32, device='cuda:0', requires_grad=True) + W.data /= 4 + + with enable_profiling_mode_for_profiling_tests(): + for i in range(4): + self.assertTrue((foo(x, y, W).grad_fn is None) == (jitted_foo(x, y, W).grad_fn is None)) + + def test_linear_grad(self): with enable_profiling_mode_for_profiling_tests(): def t(x: torch.Tensor, w: torch.Tensor, b: Optional[torch.Tensor]): @@ -14820,6 +14923,12 @@ def forward(self, x): with self.assertRaisesRegex(Exception, "Overloads are not useable when a module"): a = torch.jit.script(W2()) + def test_narrow_copy(self): + def foo(a): + return a.narrow_copy(0, 0, 5) + + self.checkScript(foo, [torch.rand(10)]) + def test_select_after_chunk(self): def foo(x): chunked = torch.chunk(x, 1) diff --git a/test/test_jit_autocast.py b/test/test_jit_autocast.py index cec8acfe7e8542..37acb003e94778 100644 --- a/test/test_jit_autocast.py +++ b/test/test_jit_autocast.py @@ -659,6 +659,55 @@ def forward(self, x, y): # isn't enabled self.assertRaises(RuntimeError, lambda: scripted_thing1.forward(x, y)) + @unittest.skipIf(not TEST_CUDA, "No cuda") + def test_jit_freeze_autocast_basic(self): + class TestModule(torch.nn.Module): + def __init__(self): + super(TestModule, self).__init__() + + def forward(self, x, y): + with torch.cuda.amp.autocast(): + return torch.mm(x, y) + + x = torch.rand((3, 4), dtype=torch.float).cuda() + y = torch.rand((4, 5), dtype=torch.float).cuda() + + mod = TestModule().eval() + + # sanity check + self._test_autocast(mod, "aten::_autocast_to_reduced_precision", x, y) + + frozen_mod = torch.jit.freeze(torch.jit.script(mod).eval()) + FileCheck().check_count("aten::_autocast_to_reduced_precision", 2, True).run(frozen_mod.graph) + + # make sure that the runtime pass doesn't duplicate autocast nodes + frozen_mod(x, y) + optimized_graph = frozen_mod.graph_for(x, y) + FileCheck().check_count("aten::_autocast_to_reduced_precision", 2, True).run(optimized_graph) + + @unittest.skipIf(not TEST_CUDA, "No cuda") + def test_jit_freeze_autocast_constants(self): + class TestModule(torch.nn.Module): + def __init__(self): + super(TestModule, self).__init__() + self.x = torch.rand((3, 4), dtype=torch.float).cuda() + + def forward(self, y): + with torch.cuda.amp.autocast(): + return torch.mm(self.x, y) + + y = torch.rand((4, 5), dtype=torch.float).cuda() + mod = TestModule().eval() + + frozen_mod = torch.jit.freeze(torch.jit.script(mod).eval()) + # freezing should pre-cast the constant self.x to 
remove one autocast call + FileCheck().check_count("aten::_autocast_to_reduced_precision", 1, True).run(frozen_mod.graph) + + # the runtime autocasting pass will re-insert the second autocast call, + # but constant propagation will merge it with the constant that it's casting. + frozen_mod(y) + optimized_graph = frozen_mod.graph_for(y) + FileCheck().check_count("aten::_autocast_to_reduced_precision", 1, True).run(optimized_graph) if __name__ == "__main__": run_tests() diff --git a/test/test_jit_cuda_fuser.py b/test/test_jit_cuda_fuser.py index 299c738c570ab0..734e0d238294a9 100644 --- a/test/test_jit_cuda_fuser.py +++ b/test/test_jit_cuda_fuser.py @@ -10,14 +10,18 @@ import torch from torch.nn import functional +from torch.profiler import profile, ProfilerActivity -from torch.testing._internal.common_utils import run_tests, ProfilingMode, GRAPH_EXECUTOR # TEST_WITH_ROCM -from torch.testing._internal.common_cuda import TEST_MULTIGPU from torch.testing._internal.codegen.random_topo_test import runDefaultTestWithSeed +from torch.testing._internal.common_cuda import TEST_MULTIGPU +from torch.testing._internal.common_device_type import instantiate_device_type_tests, ops, OpDTypes +from torch.testing._internal.common_jit import JitCommonTestCase +from torch.testing._internal.common_methods_invocations import op_db +from torch.testing._internal.common_utils import run_tests, ProfilingMode, GRAPH_EXECUTOR, TEST_WITH_ROCM, IS_WINDOWS, slowTest +from torch.testing._internal.jit_utils import clone_inputs, get_traced_sample_variant_pairs, JitTestCase, RUN_CUDA +from torch.testing._internal.jit_metaprogramming_utils import create_traced_fn from torch.testing import FileCheck -from test_jit import JitTestCase, RUN_CUDA - from jit.test_fuser_common import TestFuserCommon # noqa: F401 import itertools @@ -28,7 +32,11 @@ from typing import List -CUDA_MAJOR, CUDA_MINOR = (int(x) for x in torch.version.cuda.split('.')) +RUN_NVFUSER = RUN_CUDA and not TEST_WITH_ROCM and not IS_WINDOWS +CUDA_MAJOR, CUDA_MINOR = 0, 0 + +if RUN_NVFUSER and torch.version.cuda is not None: + CUDA_MAJOR, CUDA_MINOR = (int(x) for x in torch.version.cuda.split('.')) os.environ['PYTORCH_NVFUSER_DISABLE_FALLBACK'] = '1' os.environ['PYTORCH_NVFUSER_DISABLE_FMA'] = '1' @@ -63,38 +71,36 @@ def nvfuser_horizontal_fusion(flag): torch._C._jit_set_nvfuser_horizontal_mode(old_value) def is_pre_volta(): + if not RUN_NVFUSER: + return False prop = torch.cuda.get_device_properties(torch.cuda.current_device()) return prop.major < 7 -TEST_BF16 = torch.cuda.is_bf16_supported() +TEST_BF16 = RUN_NVFUSER and torch.cuda.is_bf16_supported() -class TestCudaFuser(JitTestCase): +class CudaFuserTestOptions(): + def __init__(self): + self.old_cpu_fuse = torch._C._jit_can_fuse_on_cpu() + self.old_gpu_fuse = torch._C._jit_can_fuse_on_gpu() + torch._C._jit_override_can_fuse_on_cpu(False) + torch._C._jit_override_can_fuse_on_gpu(False) + self.old_guard = torch._C._jit_set_nvfuser_guard_mode(False) + torch._C._debug_set_autodiff_subgraph_inlining(False) + self.old_value = torch._C._jit_set_autocast_mode(True) - special_values = torch.tensor( - [float("-inf"), -10, -math.pi, - -1, -0.5, 0, 1, 0.5, - math.pi, 10, float("inf"), - float("nan")], dtype=torch.float, device='cuda') - - int_types = [ - torch.int8, - torch.uint8, - torch.int16, - torch.int32, - torch.int64 - ] - - support_tensor_dtypes = [ - torch.int32, - torch.int64, - torch.float16, - torch.float32, - torch.float64, - torch.bool - ] - if TEST_BF16: - support_tensor_dtypes.append(torch.bfloat16) + 
if(RUN_CUDA): + self.old_nvfuser = torch._C._jit_set_nvfuser_enabled(True) + + def restore(self): + if(RUN_CUDA): + torch._C._jit_set_nvfuser_enabled(self.old_nvfuser) + torch._C._jit_override_can_fuse_on_cpu(self.old_cpu_fuse) + torch._C._jit_override_can_fuse_on_gpu(self.old_gpu_fuse) + torch._C._jit_set_nvfuser_guard_mode(self.old_guard) + torch._C._debug_set_autodiff_subgraph_inlining(True) + torch._C._jit_set_autocast_mode(self.old_value) +class TestCudaFuser(JitTestCase): def _getSubgraphInFusion(self, graph): num_node = 0 subgraph = None @@ -114,6 +120,34 @@ def count(block, ret): def setUp(self): super(TestCudaFuser, self).setUp() + + # cpu backup to avoid errors in case this is run on a CPU-only machine + dev = 'cuda' if RUN_NVFUSER else 'cpu' + self.special_values = torch.tensor( + [float("-inf"), -10, -math.pi, + -1, -0.5, 0, 1, 0.5, + math.pi, 10, float("inf"), + float("nan")], dtype=torch.float, device=dev) + + self.int_types = [ + torch.int8, + torch.uint8, + torch.int16, + torch.int32, + torch.int64 + ] + + self.support_tensor_dtypes = [ + torch.int32, + torch.int64, + torch.float16, + torch.float32, + torch.float64, + torch.bool + ] + if TEST_BF16: + self.support_tensor_dtypes.append(torch.bfloat16) + self.old_cpu_fuse = torch._C._jit_can_fuse_on_cpu() self.old_gpu_fuse = torch._C._jit_can_fuse_on_gpu() torch._C._jit_override_can_fuse_on_cpu(False) @@ -122,17 +156,12 @@ def setUp(self): torch._C._debug_set_autodiff_subgraph_inlining(False) self.old_value = torch._C._jit_set_autocast_mode(True) - if(RUN_CUDA): - self.old_nvfuser = torch._C._jit_set_nvfuser_enabled(True) + if(RUN_NVFUSER): + self.cuda_fuser_options = CudaFuserTestOptions() def tearDown(self): - if(RUN_CUDA): - torch._C._jit_set_nvfuser_enabled(self.old_nvfuser) - torch._C._jit_override_can_fuse_on_cpu(self.old_cpu_fuse) - torch._C._jit_override_can_fuse_on_gpu(self.old_gpu_fuse) - torch._C._jit_set_nvfuser_guard_mode(self.old_guard) - torch._C._debug_set_autodiff_subgraph_inlining(True) - torch._C._jit_set_autocast_mode(self.old_value) + if(RUN_NVFUSER): + self.cuda_fuser_options.restore() super(TestCudaFuser, self).tearDown() def _run_helper(self, jit_op, op, *args): @@ -168,7 +197,7 @@ def _run_training_helper(self, jit_op, op, grads, *args): )[0].graph self.assertGraphContainsExactly(bwd_graph, FUSION_GUARD, 1, consider_subgraphs=True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_half(self): @@ -194,7 +223,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor, alpha: float): self.assertGraphContains(t_jit.graph_for(x, y, z, alpha), FUSION_GUARD) @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_bfloat(self): @@ -219,7 +248,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor, alpha: float): self.assertEqual(oo, jit_oo) self.assertGraphContains(t_jit.graph_for(x, y, z, alpha), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_const(self): @@ -236,7 +265,7 @@ def t(x, y): self.assertEqual(o, 
jit_o) self.assertGraphContains(t_jit.graph_for(x, y), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_chunk(self): @@ -260,14 +289,14 @@ def t(x, y, z, q): self.assertGraphContains(t_jit.graph_for(x, y, z, q), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_reduction_dtypes_axis(self): - for op in [torch.sum, torch.mean, torch.amax]: + for op in [torch.sum, torch.mean, torch.amax, torch.var, torch.std]: for dtype in [torch.float16, torch.float32, torch.double]: - for axis in [-1, 2]: + for axis in [-1, 2, 0]: def make_func(op): def func(x: torch.Tensor): o = torch.mul(x, 2.0) @@ -285,7 +314,34 @@ def func(x: torch.Tensor): self.assertTrue(self._compare("comparing output failed", o, jit_o, 1e-4)) self.assertGraphContains(t_jit.graph_for(x), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_variance(self): + + for op in [torch.var, torch.std]: + for dtype in [torch.float16, torch.float32, torch.double]: + for axis in [-2, -1, 2, 1]: + for unbiased in [False, True]: + def make_func(op): + def func(x: torch.Tensor): + o = torch.mul(x, 2.0) + o = op(o, dim=[axis]) + return o + return func + + x = torch.randn(8, 4, 16, dtype=dtype, device="cuda") + t = make_func(op) + t_jit = torch.jit.trace(t, x) + jit_o = t_jit(x) + jit_o = t_jit(x) + o = t(x) + self.assertEqual(o.dtype, jit_o.dtype) + self.assertTrue(self._compare("comparing output failed", o, jit_o, 1e-4)) + self.assertGraphContains(t_jit.graph_for(x), FUSION_GUARD) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_scalar_input(self): @@ -303,7 +359,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: float): self.assertEqual(o, jit_o) self.assertGraphContains(t_jit.graph_for(x, y, 2.0), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_0(self): @@ -322,7 +378,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: float): subgraph = self._getSubgraphInFusion(t_jit.graph_for(x, y, 2.0)) self.assertGraphContainsExactly(subgraph, 'aten::add', 2, consider_subgraphs=False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_1(self): @@ -341,7 +397,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: float): subgraph = self._getSubgraphInFusion(t_jit.graph_for(x, y, 2.0)) self.assertGraphContainsExactly(subgraph, 'aten::add', 2, consider_subgraphs=False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not 
RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_2(self): @@ -360,7 +416,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: float): subgraph = self._getSubgraphInFusion(t_jit.graph_for(x, y, 2.0)) self.assertGraphContainsExactly(subgraph, 'aten::add', 2, consider_subgraphs=False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_3(self): @@ -382,7 +438,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: float): # test_broadcasting_partition_logic_X # Testing partition logic that is capable to avoid creating unsupported # broadcasting semantics in CudaFusionGroup - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_partition_logic_0(self): @@ -404,7 +460,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): subgraph = self._getSubgraphInFusion(t_jit.graph_for(x, y, z)) self.assertGraphContainsExactly(subgraph, 'aten::add', 4, consider_subgraphs=False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_partition_logic_1(self): @@ -427,7 +483,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): self.assertGraphContainsExactly(subgraph, 'aten::add', 4, consider_subgraphs=False) @unittest.skipIf(True, "Broadcast with different output not supported yet") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_multiple_output_shape(self): @@ -449,7 +505,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) @unittest.skipIf(True, "broadcast on branches can't be resolved yet") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_broadcasting_multiple_output(self): @@ -510,7 +566,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertEqual(o.dtype, jit_o.dtype) self.assertTrue(self._compare("failing case {}\n{}\n{}\n{}".format(dtype, operation, x, y), o, jit_o, 1e-2)) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_unary_ops(self): @@ -561,7 +617,7 @@ def test_unary_ops(self): self._unary_test_helper(op, dtype, False) # test special numbers self._unary_test_helper(op, dtype, True) # test random data - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_category_rule(self): @@ -621,7 +677,7 @@ def t(x: torch.Tensor, z: float): z = torch.tensor(3., 
dtype=torch.double) run_scalar(x, z) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_unary_bitwise(self): @@ -650,53 +706,173 @@ def bool_not(x: torch.Tensor, y: torch.Tensor): jitted.graph_for(x, y) # Shows up in second instance, not first self.assertGraphContains(jitted.graph_for(x, y), FUSION_GUARD) - def _binary_test_helper(self, operation, dtypes, random_data): - if isinstance(dtypes, tuple): - dtype_arg1, dtype_arg2 = dtypes - else: - dtype_arg1 = dtype_arg2 = dtypes + def _get_scalar_binary_test_fn(self, category_and_type1, category_and_type2, operation): + category1, dtype_arg1 = category_and_type1 + category2, dtype_arg2 = category_and_type2 - def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): + def t_intx_tensory(x: int, y: torch.Tensor): o = operation(x, y) - o = o + z + o = 2 + o return o - def t_int(x: torch.Tensor, y: torch.Tensor): + def t_doublex_tensory(x: float, y: torch.Tensor): o = operation(x, y) o = 2 + o return o + # Omit both scalar cases and swap cases + assert category1 == "scalar" and category2 != "scalar" + if dtype_arg1.is_floating_point: + return t_doublex_tensory + if dtype_arg1 == torch.int64 or dtype_arg1 == torch.int32: + return t_intx_tensory + raise NotImplementedError + + def _binary_test_helper(self, operation, dtypes, random_data, categories="ndim"): + if isinstance(dtypes, tuple): + dtype_arg1, dtype_arg2 = dtypes + else: + dtype_arg1 = dtype_arg2 = dtypes - def t_float(x: torch.Tensor, y: torch.Tensor): + if isinstance(categories, tuple) and random_data: + category1, category2 = categories + elif not random_data: + category1 = category2 = "ndim" + else: + category1 = category2 = categories + + def is_cpu_category(x): + return x == "0dimcpu" or x == "scalar" + + # skip unsupported cases + if is_cpu_category(category1) and is_cpu_category(category2): + return + + # only test cases with first operand as scalar + if category2 == "scalar": + return + + # skip ops that don't support scalar inputs in eager + if operation in [ + torch.atan2, + torch.max, + torch.min, + torch.remainder, # unsupported in nvfuser + ]: + if category1 == "scalar" or category2 == "scalar": + return + + if operation in [ + torch.fmod, + torch.eq, + torch.ne, + torch.ge, + torch.gt, + torch.le, + torch.lt + ]: + if category1 == "scalar": + return + + # operators that do not support bfloat16 + if operation in [torch.fmod]: + if dtype_arg1 == torch.bfloat16 or dtype_arg2 == torch.bfloat16: + return + + def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): o = operation(x, y) - o = 2. 
+ o + o = o + z return o shape = (4, 32, 32) + + shapex = shape if category1 == "ndim" else () + shapey = shape if category2 == "ndim" else () + if random_data: - x = (torch.randn(shape, dtype=torch.float, device="cuda") * 5).to(dtype_arg1) - y = (torch.randn(shape, dtype=torch.float, device="cuda") * 5).to(dtype_arg2) + x = (torch.randn(shapex, dtype=torch.float, device="cuda") * 5).to(dtype_arg1) + y = (torch.randn(shapey, dtype=torch.float, device="cuda") * 5).to(dtype_arg2) else: x = self.special_values.to(dtype=dtype_arg1) y = (torch.rand_like(self.special_values) * 5).to(dtype_arg2) + + r""" + Category conversion + """ + has_scalar = False + if category1 == "scalar": + has_scalar = True + x = x.item() + + if category1 == "0dimcpu": + x = x.to(device="cpu") + + if category2 == "scalar": + has_scalar = True + y = y.item() + + if category2 == "0dimcpu": + y = y.to(device="cpu") + z = torch.tensor([2], device="cuda").to(dtype_arg1) + is_dtype_arg1_int = dtype_arg1 == torch.int32 or dtype_arg1 == torch.int64 + is_dtype_arg2_int = dtype_arg2 == torch.int32 or dtype_arg2 == torch.int64 + + if operation in [torch.pow]: + if is_dtype_arg1_int and is_dtype_arg2_int: + if category2 == "scalar": + # RuntimeError: Integers to negative integer powers are not allowed + y = abs(y) + if category2 == "0dimcpu" and y == -1: + # https://github.com/pytorch/pytorch/issues/73196 + y = y - 1 + if category2 == "0dimcpu" and y == -2: + # avoid pow(0, -2), which gives inconsistent results on integer tensor + y = y - 1 # Avoid division by zero for integer tensors div_like = [torch.div, torch.fmod, torch.remainder] if operation in div_like and (dtype_arg2 == torch.int32 or dtype_arg2 == torch.int64): y[y == 0] = 1 - for test_fn in [t, t_int, t_float]: - o = t(x, y, z) - t_jit = torch.jit.script(t) - jit_o = t_jit(x, y, z) - jit_o = t_jit(x, y, z) - jit_o = t_jit(x, y, z) + test_value = True + if dtype_arg1 == torch.half or dtype_arg2 == torch.half: + test_value = False + if dtype_arg1 == torch.bfloat16 or dtype_arg2 == torch.bfloat16: + test_value = False - self.assertEqual(o.dtype, jit_o.dtype) - self.assertEqual(o, jit_o) - self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) + try: + if not has_scalar: + o = t(x, y, z) + t_jit = torch.jit.script(t) + jit_o = t_jit(x, y, z) + jit_o = t_jit(x, y, z) + jit_o = t_jit(x, y, z) + + self.assertEqual(o.dtype, jit_o.dtype) + if test_value: + self.assertEqual(o, jit_o) + self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) + + elif category2 != "scalar": # only test the case where first is scalar + test_fn = self._get_scalar_binary_test_fn((category1, dtype_arg1), (category2, dtype_arg2), operation) + o = test_fn(x, y) + t_jit = torch.jit.script(test_fn) + jit_o = t_jit(x, y) + jit_o = t_jit(x, y) + jit_o = t_jit(x, y) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + self.assertEqual(o.dtype, jit_o.dtype) + if test_value: + self.assertEqual(o, jit_o) + self.assertGraphContains(t_jit.graph_for(x, y), FUSION_GUARD) + except Exception as e: + print("failing test for op: ", operation.__name__) + print("with input\n\tx: ", x) + print("\ty: ", y) + print("\tz: ", z) + raise e + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_binary_ops(self): @@ -704,14 +880,12 @@ def test_binary_ops(self): data_types = [ torch.int32, torch.int64, - # torch.float16, + torch.float16, torch.float32, torch.float64 ] - ''' if TEST_BF16: 
data_types.append(torch.bfloat16) - ''' operations = [torch.mul, torch.div, torch.atan2, @@ -726,12 +900,24 @@ def test_binary_ops(self): torch.gt, torch.le, torch.lt] - binary_dtype_combinations = itertools.combinations(data_types, 2) + + category_types = [ + "scalar", + "0dim", + "0dimcpu", + "ndim" + ] + + binary_dtype_combinations = list(itertools.combinations(data_types, 2)) + category_combinations = list(itertools.combinations(category_types, 2)) + + for op, dtypes, categories in itertools.product(operations, binary_dtype_combinations, category_combinations): + self._binary_test_helper(op, dtypes, True, categories) # random data + for op, dtypes in itertools.product(operations, binary_dtype_combinations): - self._binary_test_helper(op, dtypes, True) # random data self._binary_test_helper(op, dtypes, False) # special numbers - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_binary_bitwise(self): @@ -778,7 +964,7 @@ def jit_xor(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): self.assertEqual(o, jit_o) self.assertGraphContains(jitted.graph_for(x, y, z), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_type_as_op(self): @@ -835,7 +1021,7 @@ def threshold(x: torch.Tensor, th: int, val: int): threshold_jit = torch.jit.script(threshold) self._run_helper(threshold_jit, threshold, x, arg2, arg3) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_ternary_ops_integer_compatibility(self): @@ -888,7 +1074,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor, alpha: torch.Tensor): self.assertEqual(o, jit_o) self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_ternary_ops_type_promotion(self): @@ -910,7 +1096,7 @@ def test_ternary_ops_type_promotion(self): self._ternary_test_helper(op, dtypes, False) # special numbers # We can't test the scalar version of rsub from python - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_rsub(self): x = torch.randn(4, 8, 32, 32, dtype=torch.float, device="cuda") @@ -924,7 +1110,7 @@ def rsub(x: torch.Tensor, y: torch.Tensor): rsub_jit = torch.jit.script(rsub) self._run_helper(rsub_jit, rsub, x, y) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") # legacy fuser does not work for rand_like, see issue #34361 @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_ternary_ops(self): @@ -976,7 +1162,7 @@ def lerp_scale(x: torch.Tensor, y: torch.Tensor, z: float): lerp_scale_jit = torch.jit.script(lerp_scale) self._run_helper(lerp_scale_jit, lerp_scale, x, y, 0.5) - @unittest.skipIf(not RUN_CUDA, "requires 
CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires profiling node to run cuda fuser") def test_addcmul_ops(self): x = torch.randn(4, 8, 32, 32, dtype=torch.float, device="cuda") @@ -1004,7 +1190,7 @@ def addcmul_const_alpha(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): addcmul_const_alpha_jit = torch.jit.script(addcmul_const_alpha) self._run_helper(addcmul_const_alpha_jit, addcmul_const_alpha, x, y, z) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_dynamic_size(self): @@ -1044,7 +1230,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: float): self.assertGraphContains(t_jit.graph_for(x, y, 2.0), FUSION_GUARD) torch._C._jit_set_nvfuser_guard_mode(old_guard) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_random_topo(self): @@ -1095,7 +1281,7 @@ def t(x: torch.Tensor, y: torch.Tensor): # we are testing inputs with all combination of permutation order, just to # ensure that integration would be able to generate functionally correct # kernels - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_binary_ops_permutation(self): @@ -1109,7 +1295,7 @@ def test_binary_ops_permutation(self): x = [7, 8, 12] self._permutation_helper(x, b_axis, torch.float32, "cuda", perm0, perm1) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_binary_ops_channels_last_with_bcast(self): @@ -1160,7 +1346,7 @@ def forward(self, x: torch.Tensor, y: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x, y), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_reduction(self): @@ -1210,7 +1396,7 @@ def _layer_norm_autodiff_helper(self, model, grad, shapes, args): FileCheck().check(FUSION_GUARD).run(v2.graph) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_layer_norm_autodiff(self): @@ -1252,7 +1438,7 @@ def t(shapes: List[int], x, eps: float, cudnn: bool): self._layer_norm_autodiff_helper(m, grad, shapes, args) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_layer_norm_parser(self): @@ -1312,7 +1498,7 @@ def forward(self, x: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x), 
FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_native_layer_norm(self): @@ -1326,7 +1512,7 @@ def test_native_layer_norm(self): self._native_layer_norm_helper(input_shape, norm_shape, torch.float32, "cuda", 1e-4, affine) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_native_layer_norm_half(self): @@ -1339,7 +1525,7 @@ def test_native_layer_norm_half(self): self._native_layer_norm_helper(input_shape, norm_shape, torch.float16, "cuda", 5e-3) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -1352,7 +1538,15 @@ def test_native_layer_norm_bfloat(self): norm_shape = [input_shape[idx] for idx in range(dims - offset, dims)] self._native_layer_norm_helper(input_shape, norm_shape, torch.bfloat16, "cuda", 1e-1) - def _norm_helper(self, shape, dtype, device, error, is_batch_norm_else_instance_norm, memory_format=torch.contiguous_format): + def _norm_helper(self, + shape, + dtype, + device, + error, + is_batch_norm_else_instance_norm, + memory_format=torch.contiguous_format, + *, + layer_dtype=torch.float32): class MyBatchNorm(torch.nn.Module): def __init__(self): super(MyBatchNorm, self).__init__() @@ -1374,8 +1568,8 @@ def forward(self, x: torch.Tensor, r_mean: torch.Tensor, r_var: torch.Tensor): t = MyBatchNorm() if is_batch_norm_else_instance_norm else MyInstanceNorm() x = torch.randn(shape, dtype=dtype, device=device).to(memory_format=memory_format) - running_mean = torch.zeros(shape[1], dtype=torch.float32, device=device) - running_var = torch.ones(shape[1], dtype=torch.float32, device=device) + running_mean = torch.zeros(shape[1], dtype=layer_dtype, device=device) + running_var = torch.ones(shape[1], dtype=layer_dtype, device=device) t_jit = torch.jit.script(t) eager_running_mean = running_mean.clone() @@ -1400,7 +1594,38 @@ def forward(self, x: torch.Tensor, r_mean: torch.Tensor, r_var: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x, running_mean, running_var), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_layer_norm_trivial_reduce_dim(self): + def t_wb(shapes: List[int], x, w, b, eps: float, cudnn: bool): + o = torch.layer_norm(x, shapes, w, b, eps, cudnn) + o = torch.relu(o) + return o + + batch = [1] + shapes = [2, 7, 3] + + grad = torch.randn(batch + shapes, dtype=torch.float32, device="cuda") + args = [torch.randn(batch + shapes, dtype=torch.float32, device="cuda").requires_grad_()] + args.append(torch.randn(shapes, dtype=torch.float32, device="cuda").requires_grad_()) + 
args.append(torch.randn(shapes, dtype=torch.float32, device="cuda").requires_grad_()) + self._layer_norm_autodiff_helper(t_wb, grad, shapes, args) + + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_norm_half_layer(self): + size = [2, 4, 2, 2] + + for is_batch_norm_else_instance_norm in [False, True]: + for mf in [torch.channels_last, torch.contiguous_format]: + self._norm_helper(size, torch.float16, "cuda", 1e-3, is_batch_norm_else_instance_norm, + memory_format=mf, layer_dtype=torch.float16) + + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_norm_channels_last(self): @@ -1412,7 +1637,7 @@ def test_norm_channels_last(self): self._norm_helper(size, torch.float32, "cuda", 1e-4, is_batch_norm_else_instance_norm, memory_format=mf) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_norm(self): @@ -1429,7 +1654,7 @@ def test_norm(self): self._norm_helper(x, torch.float32, "cuda", 1e-4, is_batch_norm_else_instance_norm) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_norm_large(self): @@ -1445,7 +1670,7 @@ def test_norm_large(self): self._norm_helper(x, torch.float32, "cuda", 1e-4, is_batch_norm_else_instance_norm) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_norm_half(self): @@ -1462,7 +1687,7 @@ def test_norm_half(self): self._norm_helper(x, torch.float16, "cuda", 5e-3, is_batch_norm_else_instance_norm) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -1479,7 +1704,7 @@ def test_norm_bfloat(self): x[1] = C self._norm_helper(x, torch.bfloat16, "cuda", 1e-1, is_batch_norm_else_instance_norm) - def _softmax_helper(self, shape, reduction_axis, dtype, device, error): + def _softmax_helper(self, shape, reduction_axis, is_log_softmax, dtype, device, error): class MySoftmax(torch.nn.Module): __constants__ = ['reduction_axis'] @@ -1492,22 +1717,40 @@ def forward(self, x: torch.Tensor, y: torch.Tensor): o = torch.nn.functional.softmax(o, dim=self.reduction_axis) return o - t = MySoftmax() + class MyLogSoftmax(torch.nn.Module): + __constants__ = ['reduction_axis'] - x = torch.randn(shape, dtype=dtype, device=device) - y = 
torch.randn(shape, dtype=dtype, device=device) + def __init__(self): + super(MyLogSoftmax, self).__init__() + self.reduction_axis = reduction_axis + + def forward(self, x: torch.Tensor, y: torch.Tensor): + o = torch.add(x, y) + o = torch.nn.functional.log_softmax(o, dim=self.reduction_axis) + return o + + gradient_check = (dtype == torch.float64) + t = MyLogSoftmax() if is_log_softmax else MySoftmax() + + x = torch.randn(shape, dtype=dtype, device=device, requires_grad=gradient_check) + y = torch.randn(shape, dtype=dtype, device=device, requires_grad=gradient_check) t_jit = torch.jit.script(t) jit_o = t_jit(x, y) jit_o = t_jit(x, y) - o = t(x, y) - self.assertEqual(o.dtype, jit_o.dtype) - # numerical issues here due to our scheduling. - # can't use `self.assertEqual(o, jit_o)` - self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) - self.assertGraphContains(t_jit.graph_for(x, y), FUSION_GUARD) + jit_o = t_jit(x, y) + + if gradient_check: + gradcheck(t_jit.forward, [x, y], nondet_tol=1e-5) + else: + o = t(x, y) + self.assertEqual(o.dtype, jit_o.dtype) + # numerical issues here due to our scheduling. + # can't use `self.assertEqual(o, jit_o)` + self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) + self.assertGraphContains(t_jit.graph_for(x, y), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_softmax_dtype(self): @@ -1549,7 +1792,7 @@ def t(x: torch.Tensor, y: torch.Tensor): FileCheck().check(FUSION_GUARD).run(bwd_graph) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test__softmax_function(self): @@ -1573,7 +1816,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x, y), FUSION_GUARD, 1, consider_subgraphs=True) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test__softmax_function_half_to_float(self): @@ -1597,7 +1840,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x, y), FUSION_GUARD, 1, consider_subgraphs=True) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_softmax(self): @@ -1606,14 +1849,21 @@ def test_softmax(self): output_size = int(pow(output_size, 1. 
/ dims)) reduction_sizes = [67, 256, 1024, 4096] + # gradient check + for reduction_dim in range(dims): + for is_log_softmax in [False, True]: + shape = [output_size for idx in range(dims)] + self._softmax_helper(shape, reduction_dim, is_log_softmax, torch.float64, "cuda", 1e-4) + for reduction_dim in range(dims): for reduction_size in reduction_sizes: x = [output_size for idx in range(dims)] x[reduction_dim] = reduction_size - self._softmax_helper(x, reduction_dim, torch.float32, "cuda", 1e-4) + for is_log_softmax in [False, True]: + self._softmax_helper(x, reduction_dim, is_log_softmax, torch.float32, "cuda", 1e-4) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_softmax_half(self): @@ -1626,10 +1876,11 @@ def test_softmax_half(self): for reduction_size in reduction_sizes: x = [output_size for idx in range(dims)] x[reduction_dim] = reduction_size - self._softmax_helper(x, reduction_dim, torch.float16, "cuda", 5e-3) + for is_log_softmax in [False, True]: + self._softmax_helper(x, reduction_dim, is_log_softmax, torch.float16, "cuda", 5e-3) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -1643,10 +1894,11 @@ def test_softmax_bfloat(self): for reduction_size in reduction_sizes: x = [output_size for idx in range(dims)] x[reduction_dim] = reduction_size - self._softmax_helper(x, reduction_dim, torch.bfloat16, "cuda", 1e-1) + for is_log_softmax in [False, True]: + self._softmax_helper(x, reduction_dim, is_log_softmax, torch.bfloat16, "cuda", 1e-1) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_reduction_permutation(self): @@ -1660,7 +1912,7 @@ def test_reduction_permutation(self): self._reduction_helper(x, axes, torch.float32, "cuda", perm0, perm1) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_reduction_multiple_output(self): @@ -1699,7 +1951,7 @@ def t(x: torch.Tensor, y: torch.Tensor, scale: float, z: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x, y, scale, z), FUSION_GUARD) torch._C._jit_set_nvfuser_guard_mode(old_guard) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_channels_last_with_broadcast(self): @@ -1805,7 +2057,7 @@ def t(x: torch.Tensor, y: torch.Tensor): ''' @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, 
"requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_pw_single_reduction_partition(self): @@ -1830,7 +2082,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_permutation_preservation(self): @@ -1868,7 +2120,7 @@ def t(x: torch.Tensor): self.assertTrue(jit_o.is_contiguous(memory_format=torch.channels_last)) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_normalization_partition(self): @@ -1896,7 +2148,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor, r_mean: torch.Tensor, r self.assertGraphContains(t_jit.graph_for(x, y, z, r_m, r_v), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_sum_to_one(self): @@ -1917,7 +2169,7 @@ def t(x: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_single_reduction_broadcast(self): @@ -1941,7 +2193,7 @@ def t(x: torch.Tensor, y: torch.Tensor, z: torch.Tensor): self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_trivial_reduction(self): @@ -1962,7 +2214,7 @@ def t(x: torch.Tensor): self.assertEqual(o, jit_o) self.assertGraphContains(t_jit.graph_for(x), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_profiling_node(self): @@ -1978,7 +2230,7 @@ def repro(x: torch.Tensor, alpha: float): self._run_helper(repro_jit, repro, x, 0.6) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_reduction_sizes_op(self): @@ -2002,7 +2254,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x, y), FUSION_GUARD, 0) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta 
device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_profile_ivalue(self): @@ -2025,7 +2277,7 @@ def t(x: torch.Tensor, y: torch.Tensor, dim: List[int], keepdim: bool): self.assertGraphContains(t_jit.graph_for(x, y, (0, 1), False), FUSION_GUARD) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_sum_to_size(self): @@ -2059,7 +2311,7 @@ def t(x: torch.Tensor, y: torch.Tensor, new_size: List[int]): self.assertEqual(o, jit_o) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_grad_sum_to_size(self): @@ -2118,7 +2370,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertEqual(x.grad, ref_x.grad) self.assertEqual(y.grad, ref_y.grad) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_dropout_inference_fusion(self): @@ -2135,7 +2387,7 @@ def t(x: torch.Tensor, p: float, train: bool): self._run_helper(t_jit, t, x, 0.15, False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_dropout_train_nograd_fusion(self): @@ -2152,7 +2404,7 @@ def t(x: torch.Tensor, p: float, train: bool): self._run_helper(t_jit, t, x, 0.0, True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_dropout_train_nograd_prob_check(self): @@ -2183,7 +2435,7 @@ def t(x: torch.Tensor, p: float, train: bool): self.assertGraphContainsExactly(t_jit.graph_for(x, prob, True), FUSION_GUARD, 1, consider_subgraphs=True) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_dropout_training_fusion(self): @@ -2214,7 +2466,7 @@ def t2(x: torch.Tensor, p: float, train: bool): # numbers between eager mode and the jit is different self._run_training_helper(t2_jit, t2, grads, x, 0.0, True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_gelu(self): @@ -2234,7 +2486,7 @@ def t(x: torch.Tensor, mode : str): self._run_training_helper(t_jit, t, grads, x, 'tanh') torch._C._jit_set_nvfuser_guard_mode(old_guard) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") 
@unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_dropout_training_prob_check(self): @@ -2267,13 +2519,15 @@ def t(x: torch.Tensor, p: float, train: bool): self.assertTrue((percent_zeros >= (prob - 0.01)) and (percent_zeros <= (prob + 0.01))) self.assertGraphContainsExactly(t_jit.graph_for(x, prob, True), FUSION_GUARD, 1, consider_subgraphs=True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_linear(self): in_feature = 2 out_feature = 8 - x = torch.randn(4, in_feature, dtype=torch.float32, device='cuda') + # Changing the input dims to be 3-D to avoid eager mode bias fusion + # The bias fusion causes some precision issues with TF-32 + x = torch.randn(2, 4, in_feature, dtype=torch.float32, device='cuda') weight = torch.randn(out_feature, in_feature, dtype=torch.float32, device='cuda') bias = torch.randn(out_feature, dtype=torch.float32, device='cuda') @@ -2292,7 +2546,7 @@ def t(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor): # have been optimized away self.assertGraphContainsExactly(t_jit.graph_for(x, weight, bias), FUSION_GUARD, 1) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_backward_type(self): @@ -2335,7 +2589,7 @@ def test1(x: torch.Tensor, y: torch.Tensor): self.assertEqual(y.grad.dtype, y.dtype) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_autocast_1(self): @@ -2372,7 +2626,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertEqual(y.grad.dtype, y.dtype) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_autocast_2(self): @@ -2408,7 +2662,7 @@ def t(x: torch.Tensor): self.assertEqual(x.grad.dtype, x.dtype) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -2446,7 +2700,7 @@ def t(x: torch.Tensor, y: torch.Tensor): self.assertEqual(y.grad.dtype, y.dtype) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -2482,7 +2736,7 @@ def t(x: torch.Tensor): self.assertEqual(jit_o.dtype, torch.float) self.assertEqual(x.grad.dtype, x.dtype) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires 
CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_to_dtype_fp32_to_fp16(self): @@ -2501,7 +2755,7 @@ def t(x: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x), FUSION_GUARD, 1) self.assertEqual(jit_o.dtype, torch.half) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_to_dtype_fp16_to_fp32(self): @@ -2520,7 +2774,7 @@ def t(x: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x), FUSION_GUARD, 1) self.assertEqual(jit_o.dtype, torch.float) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_to_dtype_fp16_to_fp16(self): @@ -2539,7 +2793,7 @@ def t(x: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x), FUSION_GUARD, 1) self.assertEqual(jit_o.dtype, torch.half) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -2559,7 +2813,7 @@ def t(x: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x), FUSION_GUARD, 1) self.assertEqual(jit_o.dtype, torch.bfloat16) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -2579,7 +2833,7 @@ def t(x: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x), FUSION_GUARD, 1) self.assertEqual(jit_o.dtype, torch.float) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @unittest.skipIf(not TEST_BF16, "device does not support BFloat16") @@ -2599,7 +2853,7 @@ def t(x: torch.Tensor): self.assertGraphContainsExactly(t_jit.graph_for(x), FUSION_GUARD, 1) self.assertEqual(jit_o.dtype, torch.bfloat16) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(not TEST_MULTIGPU, "requires multiple CUDA device") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @@ -2621,7 +2875,7 @@ def t(x): x = x.to("cuda:1") jit_o = t_jit(x) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_graph_for_with_missing_optimized_engine(self): @@ -2648,7 +2902,7 @@ def t(x: torch.Tensor, flag: bool): # have been optimized away self.assertGraphContainsExactly(t_jit.graph_for(x, True), FUSION_GUARD, 1, True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_branches(self): @@ -2678,7 +2932,7 @@ def t(x: torch.Tensor, weight: 
torch.Tensor, bias: torch.Tensor, flag: bool): # have been optimized away self.assertGraphContainsExactly(t_jit.graph_for(x, weight, bias, True), FUSION_GUARD, 1) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_scalar_tensor(self): @@ -2701,7 +2955,7 @@ def t(x: torch.Tensor): @unittest.skipIf(os.environ.get('PYTORCH_NO_CUDA_MEMORY_CACHING') is not None, "skipping graph_rng when caching allocator is disabled") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(CUDA_MAJOR < 11, "requires CUDA11 or above") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") @@ -2858,7 +3112,7 @@ def forward(self, x): e0)) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_batch_norm_half(self): @@ -2873,7 +3127,25 @@ def test_batch_norm_half(self): self._test_batch_norm_impl_index_helper(4, 8, 5, affine, track_running_stats, training, torch.half) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_batch_norm_impl_index_inner_bcast(self): + # the repro + self._test_batch_norm_impl_index_helper(2, 1, 1, False, True, True) + + # running the full set + setups = [ + [True, True], + [False, False], + [True, False], + [False, True]] + for training_and_track, affine in itertools.product(setups, [True, False]): + training, track_running_stats = training_and_track + self._test_batch_norm_impl_index_helper(2, 1, 1, affine, track_running_stats, training) + + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_batch_norm_impl_index_correctness(self): @@ -2897,7 +3169,7 @@ def test_batch_norm_impl_index_correctness(self): training, track_running_stats = training_and_track self._test_batch_norm_impl_index_helper(b, c, hw, affine, track_running_stats, training) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_softplus_fuser(self): @@ -2923,7 +3195,7 @@ def shifted_softplus(x: torch.Tensor, shift: float): assert torch.allclose(jit_grad, aten_grad) self.assertGraphContains(jitted.graph_for(inp, 0.693147), FUSION_GROUP, True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_inplace_removal(self): @@ -2943,7 +3215,7 @@ def t(x: torch.Tensor): self.assertGraphContains(graph, 'aten::add', True) self.assertGraphContains(graph, 'aten::relu', True) - @unittest.skipIf(not RUN_CUDA, 
"requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_conv2d_bias(self): @@ -2984,11 +3256,11 @@ def t_bias(x: torch.Tensor, w: torch.Tensor, bias: torch.Tensor): jit_o = jitted_bias(inp, weight, bias) graph = jitted_bias.graph_for(inp) - self.assertGraphContainsExactly(graph, FUSION_GROUP, 0) + self.assertGraphContains(graph, FUSION_GROUP, True) self.assertGraphContains(graph, 'prim::add_optional', True) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_remove_output_used_only_in_dtype(self): @@ -3021,7 +3293,7 @@ def forward(self, x, y): self.assertGraphContains(graph, FUSION_GROUP, True) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_fix_shape_expression_bn(self): @@ -3053,31 +3325,6 @@ def forward(self, x, y): graph = jitted.graph_for(x, y) self.assertGraphContains(graph, FUSION_GROUP, True) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") - @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, - "Requires fusion optimization pass to be effective") - def test_linear_1d_weight_mismatch_bias_dtype(self): - def t(x: torch.Tensor, w: torch.Tensor, b: torch.Tensor): - o = torch.nn.functional.linear(x, w, b) - return o.relu() - - device = "cuda" - jitted = torch.jit.script(t) - x = torch.randn(2, 5, 5, dtype=torch.half, device=device) - w = torch.randn(5, dtype=torch.half, device=device) - b = torch.randn(5, dtype=torch.float32, device=device) - - for i in range(3): - jit_o = jitted(x, w, b) - jit_o = jitted(x, w, b) - o = t(x, w, b) - self.assertEqual(o, jit_o) - self.assertEqual(o.dtype, jit_o.dtype) - self.assertEqual(o.size(), jit_o.size()) - graph = jitted.graph_for(x, w, b) - self.assertGraphContains(graph, FUSION_GROUP, True) - self.assertGraphContains(graph, 'aten::matmul', True) - def _run_fwd_helper(self, func, ops, *args): jitted = torch.jit.script(func) for i in range(3): @@ -3093,7 +3340,7 @@ def _run_fwd_helper(self, func, ops, *args): self.assertGraphContainsExactly(graph, op, 0) @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_sibling_fusion(self): @@ -3114,7 +3361,7 @@ def t2(x: torch.Tensor, y: torch.Tensor): return o1, o2 self._run_fwd_helper(t2, ['aten::sum', 'aten::mul'], x, y) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_clean_profile_ivalue(self): @@ -3136,7 +3383,7 @@ def t(x: torch.Tensor, flag: bool): graph = jit_t.graph_for(x, True) out = jit_t(x, False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != 
ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_sibling_fusion_no_scalar_inputs(self): @@ -3187,7 +3434,9 @@ def forward(self, inputs : torch.Tensor, view_shape : List[int]): self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) graph = t_jit.graph_for(x, output_shape) - has_inferred_dimension = any([dim == -1 for dim in output_shape]) + # TODO: revert disabled aten::view + # has_inferred_dimension = any([dim == -1 for dim in output_shape]) + has_inferred_dimension = True if has_inferred_dimension: # prohibit fusing when view_shape contains an inferred dimension self.assertGraphContainsExactly(graph, FUSION_GROUP, 0) @@ -3204,27 +3453,28 @@ def __init__(self): with torch.no_grad(): self.bias.fill_(10) - def forward(self, inputs : torch.Tensor, view_shape : List[int]): + def forward(self, inputs : torch.Tensor, bias : torch.Tensor, view_shape : List[int]): o = inputs.view(view_shape) - inputs = inputs * self.bias + inputs.add_(bias) return torch.relu(o) t = BiasViewRelu() x = torch.randn(shape, dtype=dtype, device=device, requires_grad=False) + bias = torch.randn(shape, dtype=dtype, device=device, requires_grad=False) t_jit = torch.jit.script(t) # profiling - jit_o = t_jit(x, output_shape) + jit_o = t_jit(x.clone(), bias, output_shape) # optimization - jit_o = t_jit(x, output_shape) + jit_o = t_jit(x.clone(), bias, output_shape) # final - jit_o = t_jit(x, output_shape) + jit_o = t_jit(x.clone(), bias, output_shape) # eager - baseline - o = t(x, output_shape) + o = t(x.clone(), bias, output_shape) self.assertEqual(o.dtype, jit_o.dtype) self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) - graph = t_jit.graph_for(x, output_shape) + graph = t_jit.graph_for(x, bias, output_shape) self.assertGraphContainsExactly(graph, FUSION_GUARD, 0) self.assertGraphContainsExactly(graph, 'prim::view_copy', 0) @@ -3334,7 +3584,7 @@ def _view_test_generator(self, ndims, test_fn): total += 1 test_fn(all_views[idx], all_views[jdx], torch.float, 'cuda', 1e-6) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_view(self): @@ -3344,6 +3594,47 @@ def test_view(self): self._view_test_generator(ndims, self._bias_view_relu_helper) self._alias_bias_view_relu_helper([2, 3, 4, 5], [1, 6, 1, 2, 2, 5, 1], torch.float, 'cuda', 1e-6) + def _ltc_helper(self, shape, dtype, device, error, approximate=True): + # modeled after LTC linear layer + class LTC(torch.nn.Module): + def __init__(self): + super(LTC, self).__init__() + self.weight = torch.nn.Parameter(torch.randn([1024, 1024], dtype=dtype, device=device), requires_grad=False) + self.bias = torch.nn.Parameter(torch.randn([1, 1024], dtype=dtype, device=device), requires_grad=False) + + def forward(self, inputs : torch.Tensor): + o = inputs.view([32768, 1024]) + o = torch.mm(o, self.weight) + o = o.view([256, 128, 1024]) + o = o + self.bias + o = o.view([32768, 1024]) + o = o.view([256, 128, 1024]) + return torch.nn.functional.gelu(o) + + t = LTC() + x = torch.randn(shape, dtype=dtype, device=device, requires_grad=False) + t_jit = torch.jit.script(t) + + # profile/optimization runs + for i in range(3): + jit_o = t_jit(x) + o = t(x) + + self.assertEqual(o.dtype, jit_o.dtype) + self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) + graph = t_jit.graph_for(x) + # TODO: revert disabled aten::view + # 
self.assertGraphContains(graph, FUSION_GUARD) + # self.assertGraphContains(graph, 'prim::view_copy', True) + self.assertGraphContainsExactly(graph, FUSION_GUARD, 0) + self.assertGraphContainsExactly(graph, 'prim::view_copy', 0, True) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_nested_view(self): + self._ltc_helper([256, 128, 1024], torch.float, 'cuda', 1e-6) + def _bias_squeeze_relu_helper(self, shape, dtype, device, error): class BiasSqueezeRelu(torch.nn.Module): def __init__(self): @@ -3366,7 +3657,7 @@ def forward(self, inputs : torch.Tensor, bias : torch.Tensor): self.assertEqual(o.dtype, jit_o.dtype) self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) - graph = t_jit.graph_for(x) + graph = t_jit.graph_for(x, bias) self.assertGraphContains(graph, FUSION_GUARD) self.assertGraphContains(graph, 'prim::squeeze_copy', True) @@ -3377,7 +3668,7 @@ def __init__(self): def forward(self, inputs : torch.Tensor, bias : torch.Tensor): o = torch.squeeze(inputs) - inputs = inputs * bias + inputs.add_(bias) return torch.relu(o) t = BiasSqueezeRelu() @@ -3385,10 +3676,10 @@ def forward(self, inputs : torch.Tensor, bias : torch.Tensor): bias = torch.randn(shape, dtype=dtype, device=device, requires_grad=False) t_jit = torch.jit.script(t) - jit_o = t_jit(x, bias) - jit_o = t_jit(x, bias) - jit_o = t_jit(x, bias) - o = t(x, bias) + jit_o = t_jit(x.clone(), bias) + jit_o = t_jit(x.clone(), bias) + jit_o = t_jit(x.clone(), bias) + o = t(x.clone(), bias) self.assertEqual(o.dtype, jit_o.dtype) self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) @@ -3396,13 +3687,37 @@ def forward(self, inputs : torch.Tensor, bias : torch.Tensor): self.assertGraphContainsExactly(graph, FUSION_GUARD, 0) self.assertGraphContainsExactly(graph, 'prim::squeeze_copy', 0) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_squeeze(self): self._bias_squeeze_relu_helper([1, 6, 1, 2, 2, 5, 1], torch.float, 'cuda', 1e-6) self._alias_bias_squeeze_relu_helper([1, 6, 1, 2, 2, 5, 1], torch.float, 'cuda', 1e-6) + # remove this after opinfo tests are enabled + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_squeeze_zero(self): + x = torch.tensor(1.0, dtype=torch.float, device="cuda") + + def squeeze_0(x: torch.Tensor): + o = x + 1. + o = torch.squeeze(o, 0) + o = o * 2. + return o + + def squeeze_1(x: torch.Tensor): + o = x + 1. 
+ o = torch.squeeze(o, -1) + o = o + .5 + return o + + squeeze_0_jit = torch.jit.script(squeeze_0) + self._run_helper(squeeze_0_jit, squeeze_0, x) + squeeze_1_jit = torch.jit.script(squeeze_1) + self._run_helper(squeeze_1_jit, squeeze_1, x) + def _bias_unsqueeze_relu_helper(self, shape, dtype, device, error): class BiasUnsqueezeRelu(torch.nn.Module): def __init__(self): @@ -3425,7 +3740,7 @@ def forward(self, inputs : torch.Tensor, bias : torch.Tensor): self.assertEqual(o.dtype, jit_o.dtype) self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) - graph = t_jit.graph_for(x) + graph = t_jit.graph_for(x, bias) self.assertGraphContains(graph, FUSION_GUARD) self.assertGraphContains(graph, 'prim::unsqueeze_copy', True) @@ -3435,9 +3750,8 @@ def __init__(self): super(BiasUnsqueezeRelu, self).__init__() def forward(self, inputs : torch.Tensor, bias : torch.Tensor): - o = torch.squeeze(inputs) o = torch.unsqueeze(inputs, 0) - inputs = inputs * bias + inputs.add_(bias) return torch.relu(o) t = BiasUnsqueezeRelu() @@ -3445,25 +3759,25 @@ def forward(self, inputs : torch.Tensor, bias : torch.Tensor): bias = torch.randn(shape, dtype=dtype, device=device, requires_grad=False) t_jit = torch.jit.script(t) - jit_o = t_jit(x, bias) - jit_o = t_jit(x, bias) - jit_o = t_jit(x, bias) - o = t(x, bias) + jit_o = t_jit(x.clone(), bias) + jit_o = t_jit(x.clone(), bias) + jit_o = t_jit(x.clone(), bias) + o = t(x.clone(), bias) self.assertEqual(o.dtype, jit_o.dtype) self.assertTrue(self._compare("comparing output failed", o, jit_o, error)) - graph = t_jit.graph_for(x) + graph = t_jit.graph_for(x, bias) self.assertGraphContainsExactly(graph, FUSION_GUARD, 0) self.assertGraphContainsExactly(graph, 'prim::unsqueeze_copy', 0) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_unsqueeze(self): self._bias_unsqueeze_relu_helper([2, 3, 4, 5], torch.float, 'cuda', 1e-6) self._alias_bias_unsqueeze_relu_helper([2, 3, 4, 5], torch.float, 'cuda', 1e-6) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_alias_pass_fix(self): @@ -3479,7 +3793,7 @@ def t(x, w, b): t_jit = torch.jit.script(t) self._run_helper(t_jit, t, x, w, b) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_squeeze_negative_dim(self): @@ -3494,7 +3808,7 @@ def t(x): t_jit = torch.jit.script(t) self._run_helper(t_jit, t, x) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_singleton_fusion(self): @@ -3507,7 +3821,32 @@ def t(x): t_jit = torch.jit.script(t) self._run_helper(t_jit, t, x) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_issue1445_fusion(self): + def f(t0, t1, t2, t3): + masked_input = torch.where(t1, t2, t3) + total = masked_input.sum([0, 1, 2, 3]) + sizes : 
List[int] = [] + t10 = torch.reshape(t0, sizes) + t7 = total / t10 + t4 = t7.to(dtype=torch.float) + return t4 + + x = torch.randn(1, 1, 1, 1, device='cuda').to(dtype=torch.long) + y = torch.randn(3, 2, 1, 1, device='cuda').to(dtype=torch.bool).expand([3, 2, 1, 2]) + z = torch.randn(3, 2, 1, 2, device='cuda') + w = torch.tensor(1.5, device='cuda') + + f_jit = torch.jit.script(f) + for i in range(5): + out_jit = f_jit(x, y, z, w) + out = f(x, y, z, w) + self.assertEqual(out, out_jit) + self.assertGraphContainsExactly(f_jit.graph_for(x, y, z, w), FUSION_GROUP, 1) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_disable_sibling_fuse(self): @@ -3528,7 +3867,7 @@ def t(x, y, s): # sibling fusion should be disabled with the flag self.assertGraphContainsExactly(t_jit.graph_for(x, y, s), FUSION_GUARD, 0) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_build_shape_expression_native_dropout(self): @@ -3550,7 +3889,7 @@ def t(x): self.assertEqual(oo, jit_oo) self.assertGraphContains(t_jit.graph_for(x), FUSION_GUARD) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_scalar_tensor_permuted(self): @@ -3564,7 +3903,7 @@ def t(x, y): t_jit = torch.jit.script(t) self._run_helper(t_jit, t, x, y) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_cpu_scalar(self): @@ -3609,7 +3948,7 @@ def t3(x, y, z): self.assertGraphContainsExactly(t3.graph_for(x, y, z), FUSION_GUARD, 1) self.assertGraphContainsExactly(t3.graph_for(x, y, z), 'aten::add', 1) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_shape_expression(self): @@ -3660,9 +3999,410 @@ def run(fn): for t in [t_unsqueeze, t_squeeze, t_squeeze_dim, t_squeeze_dim_no_op]: run(t) + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_scalar_cuda_tensor(self): + x = torch.tensor(2.0, device="cuda") + + with nvfuser_singleton_fusion(True): + def t(x): + return x + 1.0 + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x) + + @torch.jit.script + def t_jitted(x): + return x.sum(0) + + for i in range(5): + t_jitted(x) + self.assertGraphContainsExactly(t_jitted.graph_for(x), FUSION_GUARD, 0) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_overlapped_input(self): + x = torch.randn(8, device="cuda").as_strided((2, 4), (1, 1)) + + with nvfuser_singleton_fusion(True): + def t(x): + return x + 1.0 + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires 
fusion optimization pass to be effective") + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + def test_reduction_empty_axes(self): + x = torch.randn(4, 2, 3, device="cuda").permute([1, 2, 0]) + + with nvfuser_singleton_fusion(True): + def t(x): + sizes : List[int] = [] + return x.sum(sizes) + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + def test_int_tensor_input(self): + x = torch.randn(4, 2, device="cuda").to(dtype=torch.int) + + with nvfuser_singleton_fusion(True): + def t(x): + return x.amax(dim=0) + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_to_boolean(self): + x = torch.randn(4, 2, device="cuda") + + with nvfuser_singleton_fusion(True): + def t(x): + return x.to(dtype=torch.bool) + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_view_copy_graph_guard(self): + x = torch.randn(4, 2, 3, device="cuda").permute([1, 2, 0]) + y = [4, 6] + + with nvfuser_singleton_fusion(True): + def t(x, y : List[int]): + t1 = x + 1.0 + t2 = t1 * 1.0 + out = t2.reshape(y) + return out.relu() + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x, y) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_view_copy_graph_guard_double_fusion(self): + x = torch.randn(2, 2, 5, device="cuda") + w = torch.randn(5, 5, device="cuda") + + with nvfuser_singleton_fusion(True): + def t(x, w): + o = x.view([4, x.size()[-1]]) + o = torch.matmul(o, w) + o = o.view([2, 2, o.size()[1]]) + return o + + t_jit = torch.jit.script(t) + for i in range(3): + jit_o = t_jit(x, w) + o = t(x, w) + self.assertEqual(jit_o, o) + # TODO: revert disabled aten::view + # self.assertGraphContainsExactly(t_jit.graph_for(x, w), FUSION_GUARD, 2, consider_subgraphs=True) + self.assertGraphContainsExactly(t_jit.graph_for(x, w), FUSION_GUARD, 0, consider_subgraphs=True) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_input_output_passthrough(self): + def t(t0, t1, t2): + mask = t1.to(dtype=torch.bool) + masked_input = torch.where(t0, mask, t2) + return masked_input, mask + + t_jit = torch.jit.script(t) + # stick to integers, this avoid the numerical difference due to our + # promotion + x = torch.randn(4, 4, device='cuda').to(dtype=torch.bool) + y = torch.randn(4, 4, device='cuda').to(dtype=torch.bool) + z = torch.tensor(1.0, device='cuda').to(dtype=torch.bool) + jit_o = t_jit(x, y, z) + jit_o = t_jit(x, y, z) + o = t(x, y, z) + for oo, jit_oo in zip(o, jit_o): + self.assertEqual(oo.dtype, jit_oo.dtype) + self.assertEqual(oo, jit_oo) + self.assertGraphContains(t_jit.graph_for(x, y, z), FUSION_GUARD) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != 
ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_pointwise_reference_tensor(self): + def t(input1, input2, scalar): + _unsafe_view = torch.ops.aten._unsafe_view(input1, [2, 4, 16]) + add_ = torch.ops.aten.add_(_unsafe_view, input2) + gelu_ = torch.ops.aten.gelu(add_) + view_ = torch.ops.aten.view(gelu_, [8, 16]) + mul_ = torch.ops.aten.mul(add_, scalar) + return [view_, mul_] + + x = torch.randn(8, 16, device="cuda") + bias = torch.randn(16, device="cuda") + scalar = torch.ones(torch.Size([]), device="cuda") + + t_jit = torch.jit.script(t) + for i in range(3): + jit_o = t_jit(x, bias, scalar) + o = t(x, bias, scalar) + self.assertEqual(jit_o, o) + self.assertGraphContains(t_jit.graph_for(x, bias, scalar), FUSION_GUARD) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + @unittest.skipIf(is_pre_volta(), "reduction not supported in pre volta device") + def test_native_batch_norm_backward(self): + grad_output = torch.randn(4, 2, 3, device="cuda") + input = torch.randn(4, 2, 3, device="cuda") + weight = torch.randn(2, device="cuda") + + r_m = torch.randn(2, device="cuda") + r_v = torch.randn(2, device="cuda").abs() + + save_mean = torch.randn(2, device="cuda") + save_invstd = torch.randn(2, device="cuda").abs() + + with nvfuser_singleton_fusion(True): + def t(grad_out, input, weight, r_m, r_v, save_mean, save_invstd, train: bool, eps: float, mask: List[bool]): + return torch.ops.aten.native_batch_norm_backward(grad_out, input, weight, r_m, r_v, save_mean, + save_invstd, train, eps, mask) + + t_jit = torch.jit.script(t) + for i in range(4): + jit_o = t_jit(grad_output, input, weight, r_m.clone(), r_v.clone(), + save_mean, save_invstd, True, 1e-5, [True, True, True]) + + ref_m = r_m.clone() + ref_v = r_v.clone() + jit_o = t_jit(grad_output, input, weight, r_m, r_v, save_mean, save_invstd, True, 1e-5, [True, True, True]) + o = t(grad_output, input, weight, ref_m, ref_v, save_mean, save_invstd, True, 1e-5, [True, True, True]) + for oo, jit_oo in zip(o, jit_o): + self.assertEqual(oo.dtype, jit_oo.dtype) + self.assertEqual(oo, jit_oo) + self.assertEqual(ref_m.dtype, r_m.dtype) + self.assertEqual(ref_m, r_m) + self.assertEqual(ref_v.dtype, r_v.dtype) + self.assertEqual(ref_v, r_v) + self.assertGraphContains(t_jit.graph_for(grad_output, input, weight, r_m.clone(), r_v.clone, save_mean, + save_invstd, True, 1e-5, [True, True, True]), FUSION_GUARD) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_contiguous_on_broadcasted(self): + x = torch.randn(4, 1, device="cuda") + y = torch.randn(4, 128, device="cuda") + + with nvfuser_singleton_fusion(True): + def t(x, y): + t1 = x.expand([4, 128]) + t2 = t1 * y + return t2 + + t_jit = torch.jit.script(t) + self._run_helper(t_jit, t, x, y) + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_skip_parser(self): + x = torch.randn(4, 12, device="cuda") + + with nvfuser_singleton_fusion(True): + def fn(x): + t1 = x + 1.0 + return t1.relu() + + fn_jit = torch.jit.script(fn) + self._run_helper(fn_jit, fn, x) + + # add node should have been merged into fusion + self.assertGraphContains(fn_jit.graph_for(x), FUSION_GUARD) + 
self.assertGraphContainsExactly(fn_jit.graph_for(x), 'aten::add', 0)
+
+            # flips skip parse for `aten::add`, following fusion should skip the
+            # add node
+            self.assertFalse(torch._C._jit_set_nvfuser_skip_node_kind("aten::add", True))
+
+            def fn_1(x):
+                t1 = x + 2.0  # change const value so we'll not reuse plan
+                return t1.relu()
+
+            fn_1_jit = torch.jit.script(fn_1)
+            self._run_helper(fn_1_jit, fn_1, x)
+
+            # add node should not have been merged into fusion
+            self.assertGraphContains(fn_1_jit.graph_for(x), FUSION_GUARD)
+            self.assertGraphContainsExactly(fn_1_jit.graph_for(x), 'aten::add', 1)
+
+            # flips skip parse for `aten::add`, next fusion should fuse add node
+            self.assertTrue(torch._C._jit_set_nvfuser_skip_node_kind("aten::add", True))
+
+            def fn_2(x):
+                t1 = x + 2.0  # change const value so we'll not reuse plan
+                return t1.relu()
+
+            fn_2_jit = torch.jit.script(fn_2)
+            self._run_helper(fn_2_jit, fn_2, x)
+
+            # add node should have been merged into fusion
+            self.assertGraphContains(fn_2_jit.graph_for(x), FUSION_GUARD)
+            self.assertGraphContainsExactly(fn_2_jit.graph_for(x), 'aten::add', 0)
+
+    @unittest.skipIf(not RUN_NVFUSER, "requires CUDA")
+    @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING,
+                     "Requires fusion optimization pass to be effective")
+    def test_cuda_fusion_guard(self):
+        old_guard = torch._C._jit_set_nvfuser_guard_mode(True)
+
+        class ConvModule(torch.nn.Module):
+            def __init__(self):
+                super().__init__()
+
+            def forward(self, x):
+                return x.sin().sigmoid()
+
+        mod = ConvModule().to(device="cuda")
+
+        inputs = [torch.randn(20, 16, 50, 100, device="cuda", requires_grad=True)]
+
+        def reduce_scalar(temp):
+            return temp.sum()
+
+        scripted = torch.jit.script(mod)
+        with torch.no_grad():
+            scripted(*inputs)
+        res = scripted(*inputs)
+        reduce_scalar(res).backward()
+        torch._C._jit_set_nvfuser_guard_mode(old_guard)
+
+    @unittest.skipIf(not RUN_NVFUSER, "requires CUDA")
+    @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING,
+                     "Requires fusion optimization pass to be effective")
+    def test_nvfuser_comparison_callbacks_with_fallback(self):
+        try:
+            fused_result = None
+            unfused_result = None
+            graph_ir = None
+
+            def callback(fused_outputs, unfused_outputs, graph_str):
+                nonlocal unfused_result
+                nonlocal fused_result
+                nonlocal graph_ir
+                unfused_result = unfused_outputs[-1]
+                fused_result = fused_outputs[-1]
+                graph_ir = graph_str
+            torch._C._jit_nvfuser_set_comparison_callback(True, callback)
+
+            def fn(x, y):
+                z = torch.add(x, y)
+                return torch.relu(z)
+
+            x = torch.rand((4, 4)).cuda() - 0.5
+            y = torch.rand((4, 4)).cuda() - 0.5
+
+            fn_s = torch.jit.script(fn)
+            fn_s(x, y)
+            fn_s(x, y)
+            fn_s(x, y)
+
+            expected = fn(x, y)
+
+            self.assertEqual(expected, fused_result)
+            self.assertEqual(expected, unfused_result)
+            FileCheck().check("aten::add").run(graph_ir)
+        finally:
+            torch._C._jit_nvfuser_clear_comparison_callback()
+
+    @unittest.skipIf(not RUN_NVFUSER, "requires CUDA")
+    @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING,
+                     "Requires fusion optimization pass to be effective")
+    def test_nvfuser_comparison_callbacks_without_fallback(self):
+        try:
+            fused_result = None
+            unfused_result = None
+            graph_ir = None
+
+            def callback(fused_outputs, unfused_outputs, graph_str):
+                nonlocal unfused_result
+                nonlocal fused_result
+                nonlocal graph_ir
+                if len(unfused_outputs) > 0:
+                    unfused_result = unfused_outputs[-1]
+                fused_result = fused_outputs[-1]
+                graph_ir = graph_str
+            torch._C._jit_nvfuser_set_comparison_callback(False, callback)
+
+            def fn(x, y):
+                z =
torch.add(x, y) + return torch.relu(z) + + x = torch.rand((4, 4)).cuda() - 0.5 + y = torch.rand((4, 4)).cuda() - 0.5 + + fn_s = torch.jit.script(fn) + fn_s(x, y) + fn_s(x, y) + fn_s(x, y) + + expected = fn(x, y) + + self.assertEqual(expected, fused_result) + self.assertEqual(None, unfused_result) + FileCheck().check("aten::add").run(graph_ir) + finally: + torch._C._jit_nvfuser_clear_comparison_callback() + + @unittest.skipIf(not RUN_NVFUSER, "requires NVFuser") + @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, + "Requires fusion optimization pass to be effective") + def test_cuda_fusion_guard_backward(self): + old_guard = torch._C._jit_set_nvfuser_guard_mode(True) + + inp = torch.randn(10, device="cuda", requires_grad=True) + grad = torch.randn(10, device="cuda") + + def f(x): + a = x.cos().cos() + return a + scripted = torch.jit.script(f) + + with profile(activities=[ProfilerActivity.CPU]) as prof: + for _ in range(5): + inp.grad = None + out = scripted(inp) + out.backward(grad) + + # check that we do not have fallback triggered + self.assertEqual(prof.events().table().find("fallback"), -1) + torch._C._jit_set_nvfuser_guard_mode(old_guard) + class TestPassManagerCudaFuser(JitTestCase): + def setUp(self): + super().setUp() + if RUN_NVFUSER: + self.is_enabled = torch._C._jit_set_nvfuser_enabled(False) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + def tearDown(self): + if RUN_NVFUSER: + torch._C._jit_set_nvfuser_enabled(self.is_enabled) + super().tearDown() + + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") @unittest.skipIf(GRAPH_EXECUTOR != ProfilingMode.PROFILING, "Requires fusion optimization pass to be effective") def test_context_manager_test(self): @@ -3698,7 +4438,7 @@ def t3(x, y): t_jit_3(x, y) self.assertGraphContainsExactly(t_jit_3.graph_for(x, y), FUSION_GUARD, 0) - @unittest.skipIf(not RUN_CUDA, "requires CUDA") + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") def test_register_fuser(self): self.assertFalse(torch._C._jit_set_nvfuser_enabled(True)) self.assertTrue(torch._C._jit_nvfuser_enabled()) @@ -3708,5 +4448,41 @@ def test_register_fuser(self): self.assertFalse(torch._C._jit_nvfuser_enabled()) +class TestCudaFuserOpInfo(JitCommonTestCase): + def setUp(self): + if RUN_NVFUSER: + self.cuda_fuser_options = CudaFuserTestOptions() + self.nvfuser_single_node_mode = torch._C._jit_set_nvfuser_single_node_mode(True) + + def tearDown(self): + if RUN_NVFUSER: + self.cuda_fuser_options.restore() + torch._C._jit_set_nvfuser_single_node_mode(self.nvfuser_single_node_mode) + + @slowTest + @unittest.skipIf(not RUN_NVFUSER, "requires CUDA") + @ops(op_db, dtypes=OpDTypes.supported) + def test_nvfuser_correctness(self, device, dtype, op): + variant_sample_pairs = get_traced_sample_variant_pairs(device, dtype, op) + + for variant, sample in variant_sample_pairs: + trace = create_traced_fn(self, variant) + ref = variant(*clone_inputs((sample.input, *sample.args)), **sample.kwargs) + + trace(*clone_inputs((sample.input, *sample.args)), **sample.kwargs) + + val = trace(*clone_inputs((sample.input, *sample.args)), **sample.kwargs) + + self.assertEqual(ref, val) + + # https://github.com/pytorch/pytorch/issues/35600 + # each torch.jit.trace adds state to the _python_cu compilation unit + # since this test traces a lot of functions, out-of-memory can occur + # if the CU is not cleared. 
+ torch.jit._state._python_cu.drop_all_functions() + +instantiate_device_type_tests(TestCudaFuserOpInfo, globals(), only_for=("cuda")) + + if __name__ == '__main__': run_tests() diff --git a/test/test_jit_fuser_te.py b/test/test_jit_fuser_te.py index ab2b85c6bb3ba0..ac0718269c1907 100644 --- a/test/test_jit_fuser_te.py +++ b/test/test_jit_fuser_te.py @@ -18,7 +18,7 @@ # inferred erroneously runs or skips # some tests torch._C._jit_set_profiling_executor(True) -torch._C._jit_set_profiling_mode(True) +torch._C._get_graph_executor_optimize(True) from torch.testing._internal.common_utils import run_tests, ProfilingMode, GRAPH_EXECUTOR, \ enable_profiling_mode_for_profiling_tests, slowTest @@ -82,6 +82,7 @@ def inline_fusion_groups(): class TestTEFuser(JitTestCase): def setUp(self): + super().setUp() self.tensorexpr_options = TensorExprTestOptions() # note: `self.dynamic_shapes` instatiated in specialization of class @@ -109,6 +110,7 @@ def setUp(self): def tearDown(self): self.tensorexpr_options.restore() torch._C._jit_set_fusion_strategy(self.old_fusion_strategy) + super().tearDown() def assertAllFused(self, graph, except_for=None): except_for = except_for if except_for is not None else set() @@ -1353,79 +1355,80 @@ def apply(fn): ) def test_unary_ops(self): - def apply(fn): - return lambda x: fn(x) - - unary_ops = [ - torch.lgamma, - torch.sigmoid, - torch.reciprocal, - torch.neg, - torch.relu, - F.relu6, - torch.log, - torch.log10, - torch.log1p, - torch.log2, - torch.exp, - torch.expm1, - torch.erf, - torch.erfc, - torch.cos, - torch.sin, - torch.tan, - torch.acos, - torch.asin, - torch.cosh, - torch.sinh, - torch.atan, - torch.tanh, - F.hardtanh, - F.hardsigmoid, - F.hardswish, - F.softplus, - torch.sqrt, - torch.rsqrt, - torch.abs, - torch.ceil, - torch.floor, - torch.round, - torch.trunc, - torch.frac, - # TODO: broken on ROCm? - # F.hardshrink, - F.leaky_relu, - lambda x: torch.threshold(x, 0, -10), - lambda x: torch.clamp(x, -10, 10), - ] - gpu_only = {torch.erf, torch.erfc} - sizes = [(1,), (2,), (4, 4)] - for dtype, op, device, size in product(self.dtypes, unary_ops, self.devices, sizes): - # TODO: Add back when https://github.com/pytorch/pytorch/issues/55905 is closed - if dtype in [torch.float16, torch.bfloat16] and device == "cpu": - continue - # todo - re-enable. fails with .500 - if dtype == torch.bfloat16 and op == torch.round: - continue - if op in gpu_only and device == "cpu": - continue - try: - x = self.data_for(dtype, device, size=size) - fn = apply(op) - ref = fn(x) - except Exception: - # If eager mode doesn't support a dtype/op/device combo, - # neither does the fuser. Catch everything to avoid needing to - # guess what errors might be thrown by eager. 
- continue - try: - t = torch.jit.trace(fn, (x,)) - torch.testing.assert_close(ref, t(x)) - self.assertAllFused(t.graph_for(x)) - except Exception as e: - raise RuntimeError( - " ".join(["Failed:", str(dtype), op.__name__, device, str(size)]) - ) + with torch._jit_internal._disable_emit_hooks(): + def apply(fn): + return lambda x: fn(x) + + unary_ops = [ + torch.lgamma, + torch.sigmoid, + torch.reciprocal, + torch.neg, + torch.relu, + F.relu6, + torch.log, + torch.log10, + torch.log1p, + torch.log2, + torch.exp, + torch.expm1, + torch.erf, + torch.erfc, + torch.cos, + torch.sin, + torch.tan, + torch.acos, + torch.asin, + torch.cosh, + torch.sinh, + torch.atan, + torch.tanh, + F.hardtanh, + F.hardsigmoid, + F.hardswish, + F.softplus, + torch.sqrt, + torch.rsqrt, + torch.abs, + torch.ceil, + torch.floor, + torch.round, + torch.trunc, + torch.frac, + # TODO: broken on ROCm? + # F.hardshrink, + F.leaky_relu, + lambda x: torch.threshold(x, 0, -10), + lambda x: torch.clamp(x, -10, 10), + ] + gpu_only = {torch.erf, torch.erfc} + sizes = [(1,), (2,), (4, 4)] + for dtype, op, device, size in product(self.dtypes, unary_ops, self.devices, sizes): + # TODO: Add back when https://github.com/pytorch/pytorch/issues/55905 is closed + if dtype in [torch.float16, torch.bfloat16] and device == "cpu": + continue + # todo - re-enable. fails with .500 + if dtype == torch.bfloat16 and op == torch.round: + continue + if op in gpu_only and device == "cpu": + continue + try: + x = self.data_for(dtype, device, size=size) + fn = apply(op) + ref = fn(x) + except Exception: + # If eager mode doesn't support a dtype/op/device combo, + # neither does the fuser. Catch everything to avoid needing to + # guess what errors might be thrown by eager. + continue + try: + t = torch.jit.trace(fn, (x,)) + torch.testing.assert_close(ref, t(x)) + self.assertAllFused(t.graph_for(x)) + except Exception as e: + raise RuntimeError( + " ".join(["Failed:", str(dtype), op.__name__, device, str(size)]) + ) def test_binary_ops(self): def apply(fn): @@ -1592,47 +1595,48 @@ def fn(x, y): ) def test_binary_tensor_scalar_ops(self): - def apply_with_scalar(fn, scalar): - return lambda x: fn(x, scalar) - - # FIXME: Fails in IR Eval: torch.int64 and_ cpu - binary_ops = [ - operator.__and__, - operator.__or__, - operator.__xor__, - torch.add, - torch.sub, - torch.mul, - torch.eq, - torch.ne, - torch.ge, - torch.lt, - torch.gt, - ] - devices = self.devices - # Maybe we should split this into separate tests to speed it up by - # only using scalar values relevant to particular ops - scalars = [1.5, 3, 0, -2.0, -1] - for dtype, op, device, scalar in product(self.dtypes, binary_ops, devices, scalars): - if dtype in [torch.float16, torch.bfloat16] and device == "cpu": - continue - try: - x = self.data_for(dtype, device) - fn = apply_with_scalar(op, scalar) - ref = fn(x) - except Exception: - # If eager mode doesn't support a dtype/op/device combo, - # neither does the fuser. Catch everything to avoid needing to - # guess what errors might be thrown by eager. 
- continue - try: - t = torch.jit.trace(fn, (x)) - self.assertEqual(ref, t(x)) - self.assertAllFused(t.graph_for(x)) - except Exception as e: - raise RuntimeError( - " ".join(["Failed:", str(dtype), op.__name__, device]) - ) + with torch._jit_internal._disable_emit_hooks(): + def apply_with_scalar(fn, scalar): + return lambda x: fn(x, scalar) + + # FIXME: Fails in IR Eval: torch.int64 and_ cpu + binary_ops = [ + operator.__and__, + operator.__or__, + operator.__xor__, + torch.add, + torch.sub, + torch.mul, + torch.eq, + torch.ne, + torch.ge, + torch.lt, + torch.gt, + ] + devices = self.devices + # Maybe we should split this into separate tests to speed it up by + # only using scalar values relevant to particular ops + scalars = [1.5, 3, 0, -2.0, -1] + for dtype, op, device, scalar in product(self.dtypes, binary_ops, devices, scalars): + if dtype in [torch.float16, torch.bfloat16] and device == "cpu": + continue + try: + x = self.data_for(dtype, device) + fn = apply_with_scalar(op, scalar) + ref = fn(x) + except Exception: + # If eager mode doesn't support a dtype/op/device combo, + # neither does the fuser. Catch everything to avoid needing to + # guess what errors might be thrown by eager. + continue + try: + t = torch.jit.trace(fn, (x)) + self.assertEqual(ref, t(x)) + self.assertAllFused(t.graph_for(x)) + except Exception as e: + raise RuntimeError( + " ".join(["Failed:", str(dtype), op.__name__, device]) + ) def test_binary_div_ops(self): def apply_with_scalar(fn, scalar): @@ -2473,12 +2477,21 @@ def get_name(op): l.append(op.variant_test_name) return '.'.join(l) -class TestNNCOpInfo(JitCommonTestCase): +# Purpose of this class is to allow super() calls. +# super() [with no arguments] fails, presumably because of how instantiate_device_type_tests works. +# super(TestNNCOpInfo, self) fails because TestNNCOpInfo gets deleted from global scope. +# super(JitCommonTestCase, self).fn() would skip JitCommonTestCase.fn() implementation +class TestNNCOpInfoParent(JitCommonTestCase): + pass + +class TestNNCOpInfo(TestNNCOpInfoParent): def setUp(self): + super(TestNNCOpInfoParent, self).setUp() self.tensorexpr_options = TensorExprTestOptions() def tearDown(self): self.tensorexpr_options.restore() + super(TestNNCOpInfoParent, self).tearDown() def te_compile(self, device, dtype, op): if op.name in skip_ops: @@ -2578,9 +2591,13 @@ def test_nnc_correctness(self, device, dtype, op): only_for = ("cpu", "cuda") instantiate_device_type_tests(TestNNCOpInfo, globals(), only_for=only_for) +# Purpose of this class is to allow super() calls. 
(See TestNNCOpInfoParent) +class TestLoopnestRandomizationParent(JitTestCase): + pass -class TestLoopnestRandomization(JitTestCase): +class TestLoopnestRandomization(TestLoopnestRandomizationParent): def setUp(self): + super(TestLoopnestRandomizationParent, self).setUp() self.old_cpu_fuser_state = torch._C._jit_can_fuse_on_cpu() self.old_must_use_cpu_state = torch._C._jit_get_te_must_use_llvm_cpu() self.old_gpu_fuser_state = torch._C._jit_can_fuse_on_gpu() @@ -2591,7 +2608,7 @@ def setUp(self): torch._C._jit_override_can_fuse_on_gpu(True) self.old_profiling_executor = torch._C._jit_set_profiling_executor(True) - self.old_profiling_mode = torch._C._jit_set_profiling_mode(True) + self.old_profiling_mode = torch._C._get_graph_executor_optimize(True) self.old_fusion_inlining = torch._C._debug_get_fusion_group_inlining() torch._C._debug_set_fusion_group_inlining(False) @@ -2608,7 +2625,7 @@ def setUp(self): def tearDown(self): torch._C._jit_set_profiling_executor(self.old_profiling_executor) - torch._C._jit_set_profiling_mode(self.old_profiling_mode) + torch._C._get_graph_executor_optimize(self.old_profiling_mode) torch._C._jit_override_can_fuse_on_gpu(self.old_gpu_fuser_state) torch._C._jit_override_can_fuse_on_cpu(self.old_cpu_fuser_state) @@ -2620,6 +2637,7 @@ def tearDown(self): # Set it back to 0. os.environ["PYTORCH_TENSOREXPR_RANDOM_TRANSFORM_SEED"] = "0" + super(TestLoopnestRandomizationParent, self).tearDown() @onlyCPU @unittest.skipIf(not LLVM_ENABLED, "Compiles with TensorExprKernel") diff --git a/test/test_linalg.py b/test/test_linalg.py index a7e9ccc2bddfb2..0fcc3006b4715f 100644 --- a/test/test_linalg.py +++ b/test/test_linalg.py @@ -25,8 +25,8 @@ onlyCUDA, skipCUDAVersionIn, skipMeta, skipCUDAIfNoCusolver) from torch.testing import make_tensor from torch.testing._internal.common_dtype import ( - all_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, - get_all_fp_dtypes, + all_types, all_types_and_complex_and, floating_and_complex_types, integral_types, + floating_and_complex_types_and, floating_types_and, complex_types, ) from torch.testing._internal.common_cuda import SM53OrLater, tf32_on_and_off, CUDA11OrLater, CUDA9 from torch.distributions.binomial import Binomial @@ -101,7 +101,7 @@ def check(a_sizes_, b_sizes_): # Tests torch.outer, and its alias, torch.ger, vs. 
NumPy @precisionOverride({torch.bfloat16: 1e-1}) - @dtypes(*(get_all_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_outer(self, device, dtype): def run_test_case(a, b): if dtype == torch.bfloat16: @@ -264,7 +264,8 @@ def numpy_ref(a, b): else: # driver == 'gelsy' # QR based algorithm; setting the value too high might lead to non-unique solutions and flaky tests - rcond = 1e-4 + # so we skip this case + continue # specifying rcond value has no effect for gels driver so no need to run the tests again if driver == 'gels' and rcond is not None: @@ -744,7 +745,7 @@ def check(m, a, b, beta, alpha): check(m_scalar, a, b, beta, alpha) # test nans and infs are not propagated to the output when beta == 0 - float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes() + float_and_complex_dtypes = floating_and_complex_types_and(torch.half, torch.bfloat16) if beta == 0 and dtype in float_and_complex_dtypes: m[0][10] = m[10][10] = m[20][20] = float('inf') m[1][10] = m[11][10] = m[21][20] = float('nan') @@ -757,7 +758,7 @@ def test_addr_bool(self, device, dtype): self._test_addr_vs_numpy(device, dtype, beta=False, alpha=False) self._test_addr_vs_numpy(device, dtype, beta=True, alpha=True) - @dtypes(*(get_all_int_dtypes())) + @dtypes(*integral_types()) def test_addr_integral(self, device, dtype): with self.assertRaisesRegex(RuntimeError, 'argument beta must not be a floating point number.'): @@ -778,7 +779,7 @@ def test_addr_integral(self, device, dtype): self._test_addr_vs_numpy(device, dtype, beta=2, alpha=2) @precisionOverride({torch.bfloat16: 1e-1}) - @dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes())) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) def test_addr_float_and_complex(self, device, dtype): with self.assertRaisesRegex(RuntimeError, 'Boolean beta only supported for Boolean results.'): @@ -791,11 +792,11 @@ def test_addr_float_and_complex(self, device, dtype): self._test_addr_vs_numpy(device, dtype, beta=0., alpha=2) # when beta is not zero self._test_addr_vs_numpy(device, dtype, beta=0.5, alpha=2) - if dtype in get_all_complex_dtypes(): + if dtype in complex_types(): self._test_addr_vs_numpy(device, dtype, beta=(0 + 0.1j), alpha=(0.2 - 0.2j)) - @dtypes(*itertools.product(get_all_dtypes(), - get_all_dtypes())) + @dtypes(*itertools.product(all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool), + all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) def test_outer_type_promotion(self, device, dtypes): a = torch.randn(5).to(device=device, dtype=dtypes[0]) b = torch.randn(5).to(device=device, dtype=dtypes[1]) @@ -805,7 +806,7 @@ def test_outer_type_promotion(self, device, dtypes): # don't use @dtypes decorator to avoid generating ~1700 tests per device def test_addr_type_promotion(self, device): - for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3): + for dtypes0, dtypes1, dtypes2 in product(all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool), repeat=3): a = make_tensor((5,), device=device, dtype=dtypes0, low=-2, high=2) b = make_tensor((5,), device=device, dtype=dtypes1, low=-2, high=2) m = make_tensor((5, 5), device=device, dtype=dtypes2, low=-2, high=2) @@ -2936,7 +2937,7 @@ def run_test_singular_input(batch_dim, n): @skipCPUIfNoLapack @onlyNativeDeviceTypes # TODO: XLA doesn't raise exception @skipCUDAIfRocm - @skipCUDAVersionIn([(11, 3), (11, 5)]) # https://github.com/pytorch/pytorch/issues/57482 + @skipCUDAVersionIn([(11, 3), (11, 5), (11, 
6)]) # https://github.com/pytorch/pytorch/issues/57482 @dtypes(*floating_and_complex_types()) def test_inverse_errors_large(self, device, dtype): # Test batched inverse of singular matrices reports errors without crashing (gh-51930) @@ -3240,6 +3241,27 @@ def run_test_singular_input(batch_dim, n): with self.assertRaisesRegex(RuntimeError, "tensors to be on the same device"): torch.linalg.solve(a, b, out=out) + @skipCUDAIfNoMagma + @skipCPUIfNoLapack + @dtypes(*floating_and_complex_types()) + def test_solve_batched_broadcasting(self, device, dtype): + from numpy.linalg import solve + + def run_test(A_dims, B_dims): + A_matrix_size = A_dims[-1] + A_batch_dims = A_dims[:-2] + B, A = self.solve_test_helper(A_batch_dims + (A_matrix_size, A_matrix_size), B_dims, device, dtype) + actual = torch.linalg.solve(A, B) + expected = solve(A.cpu().numpy(), B.cpu().numpy()) + self.assertEqual(actual, expected) + + # test against numpy.linalg.solve + run_test((5, 5), (2, 0, 5, 3)) # broadcasting with 0 batch dim + run_test((2, 0, 5, 5), (5, 3)) # broadcasting with 0 batch dim + run_test((2, 1, 3, 4, 4), (4, 6)) # broadcasting B + run_test((4, 4), (2, 1, 3, 4, 2)) # broadcasting A + run_test((1, 3, 1, 4, 4), (2, 1, 3, 4, 5)) # broadcasting A & B + @skipCUDAIfNoMagma @skipCPUIfNoLapack @dtypes(*floating_and_complex_types()) @@ -3678,6 +3700,9 @@ def test_matrix_rank_atol_rtol(self, device, dtype): result = torch.linalg.matrix_rank(a, atol=tol_value, rtol=tol_value) self.assertEqual(result, 2) # there are 2 singular values above max(0.81, 1.5*0.81) + # CUDA 11.6 issue failure https://github.com/pytorch/pytorch/issues/75391 + @skipCUDAIf(torch.version.cuda is not None + and torch.version.cuda.split(".") == ["11", "6"], "There's a bug in CUDA 11.6") @skipCUDAIfNoMagma @skipCPUIfNoLapack @dtypes(*floating_and_complex_types()) @@ -4405,7 +4430,7 @@ def test_linalg_solve_triangular(self, device, dtype): @onlyCUDA @skipCUDAIfNoMagma # Magma needed for the PLU decomposition @skipCUDAIfRocm # There is a memory access bug in rocBLAS in the (non-batched) solve_triangular - @skipCUDAVersionIn([(11, 3), (11, 5)]) # Tracked in https://github.com/pytorch/pytorch/issues/70111 + @skipCUDAVersionIn([(11, 3), (11, 5), (11, 6)]) # Tracked in https://github.com/pytorch/pytorch/issues/70111 @dtypes(*floating_and_complex_types()) @precisionOverride({torch.float32: 1e-2, torch.complex64: 1e-2, torch.float64: 1e-8, torch.complex128: 1e-8}) @@ -5050,9 +5075,11 @@ def call_torch_fn(*args, **kwargs): A_LU, pivots = fn(torch.lu, (2, 0, 0)) self.assertEqual([(2, 0, 0), (2, 0)], [A_LU.shape, pivots.shape]) - @dtypesIfCUDA(torch.cfloat, torch.cdouble, - *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater))) - @dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool})) + @dtypesIfCUDA(*floating_and_complex_types_and( + *[torch.half] if not CUDA9 else [], + *[torch.bfloat16] if CUDA11OrLater and SM53OrLater else [] + )) + @dtypes(*all_types_and_complex_and(torch.bfloat16)) def test_blas_alpha_beta_empty(self, device, dtype): # This test is disabled on CUDA 9 due to: # See: https://github.com/pytorch/pytorch/issues/31006 @@ -5088,7 +5115,7 @@ def test_blas_alpha_beta_empty(self, device, dtype): self.assertEqual(torch.full((2, 3), beta * value, dtype=dtype, device=device), torch.addmm(input=input, mat1=mat, mat2=mat2, alpha=alpha, beta=beta, out=out)) - @dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes())) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) def 
test_blas_nan_out(self, device, dtype): # These functions should work correctly with NaN filled outputs, # but need special handling, see [NOTE: cpu_zero] @@ -5674,7 +5701,7 @@ def tracker(worker): ---(input size: {:4}, eigenpairs:{:2}, units: relative error, maxiter={:4})--- '''.format(tol, eq_err, eq_err_general, iters1, eq_err_scipy, eq_err_general_scipy, iters2, m, k, niter)) - def _test_addmm_addmv(self, f, t, m, v, *, alpha=None, beta=None, transpose_out=False): + def _test_addmm_addmv(self, f, t, m, v, *, alpha=None, beta=None, transpose_out=False, activation=None): dtype = t.dtype numpy_dtype = dtype if dtype in {torch.bfloat16}: @@ -5693,15 +5720,19 @@ def _test_addmm_addmv(self, f, t, m, v, *, alpha=None, beta=None, transpose_out= res3 = alpha * (m.to(numpy_dtype).cpu().numpy() @ v.to(numpy_dtype).cpu().numpy()) if beta != 0: res3 += (beta * t).to(numpy_dtype).cpu().numpy() + if activation == "relu": + res3 = res3 * (res3 > 0) + else: + assert activation is None, f"unsupported activation {activation}" res3 = torch.from_numpy(res3).to(dtype) self.assertEqual(res1, res2) self.assertEqual(res1, res3) @precisionOverride({torch.bfloat16: 1e-0, torch.half: 5e-4, torch.float: 1e-4, torch.double: 1e-8, torch.cfloat: 1e-4, torch.cdouble: 1e-8}) - @dtypesIfCUDA(*get_all_complex_dtypes(), - *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)), - include_half=(not TEST_WITH_ROCM))) + @dtypesIfCUDA(*floating_and_complex_types_and( + *[torch.bfloat16] if TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater) else [], + *[torch.half] if not TEST_WITH_ROCM else [])) @dtypes(torch.bfloat16, torch.float, torch.double, torch.cfloat, torch.cdouble) def test_addmv(self, device, dtype): # have to use torch.randn(...).to(bfloat16) instead of @@ -5736,7 +5767,8 @@ def test_addmv(self, device, dtype): for m, v in itertools.product(ms, vs): self._test_addmm_addmv(torch.addmv, t, m, v, beta=0) - @dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) + @dtypesIfCUDA(*floating_types_and(*[torch.bfloat16] if TEST_WITH_ROCM or (CUDA11OrLater and + SM53OrLater) else [])) @dtypes(torch.float, torch.double) def test_addmv_rowmajor_colmajor_incx_incy_lda(self, device, dtype): # tests (o, s)*(s). o is output size, s is summed size. 
@@ -5765,29 +5797,23 @@ def _test(row_major, incx, incy, lda_tail): for row_major, incx, incy, lda_tail in itertools.product((False, True), (1, 2), (1, 2), (0, 1)): _test(row_major, incx, incy, lda_tail) - @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6, - torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8}) - @dtypesIfCUDA(*get_all_complex_dtypes(), - *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) - @dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes()) - @tf32_on_and_off(0.05) - def test_addmm(self, device, dtype): + def _test_addmm_impl(self, func, activation, device, dtype): M = torch.randn(10, 25, device=device).to(dtype) m1 = torch.randn(10, 50, device=device).to(dtype) m2 = torch.randn(50, 25, device=device).to(dtype) - self._test_addmm_addmv(torch.addmm, M, m1, m2) + self._test_addmm_addmv(func, M, m1, m2, activation=activation) # Test 0-strided M = torch.randn(10, 1, device=device).to(dtype).expand(10, 25) m1 = torch.randn(10, 1, device=device).to(dtype).expand(10, 50) m2 = torch.randn(50, 25, device=device).to(dtype) - self._test_addmm_addmv(torch.addmm, M, m1, m2) + self._test_addmm_addmv(func, M, m1, m2, activation=activation) # Test beta=0, M=nan M = torch.full((10, 25), math.nan, device=device).to(dtype) m1 = torch.randn(10, 50, device=device).to(dtype) m2 = torch.randn(50, 25, device=device).to(dtype) - self._test_addmm_addmv(torch.addmm, M, m1, m2, beta=0) + self._test_addmm_addmv(func, M, m1, m2, beta=0, activation=activation) # Test transpose for t1, t2, t3, t4 in itertools.product([True, False], repeat=4): @@ -5799,10 +5825,28 @@ def maybe_transpose(cond, m): M = maybe_transpose(t1, torch.randn(10, 25, device=device).to(dtype)) m1 = maybe_transpose(t2, torch.randn(10, 50, device=device).to(dtype)) m2 = maybe_transpose(t3, torch.randn(50, 25, device=device).to(dtype)) - self._test_addmm_addmv(torch.addmm, M, m1, m2, transpose_out=t4) + self._test_addmm_addmv(func, M, m1, m2, transpose_out=t4, activation=activation) + + @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6, + torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8}) + @dtypesIfCUDA(*floating_and_complex_types_and( + *[torch.bfloat16] if TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater) else [])) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) + @tf32_on_and_off(0.05) + def test_addmm(self, device, dtype): + self._test_addmm_impl(torch.addmm, None, device, dtype) + + @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6, + torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8}) + @dtypesIfCUDA(*floating_types_and( + *[torch.bfloat16] if TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater) else [])) + @dtypes(*floating_types_and(torch.bfloat16)) + @tf32_on_and_off(0.05) + def test_addmm_activation(self, device, dtype): + self._test_addmm_impl(torch._addmm_activation, "relu", device, dtype) @dtypes(torch.float, torch.double) - @dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes())) + @dtypesIfCUDA(*floating_and_complex_types()) @tf32_on_and_off(0.005) def test_addmm_sizes(self, device, dtype): for m in [0, 1, 25]: @@ -5855,7 +5899,8 @@ def test_matmul_45724(self, device): @slowTest @onlyNativeDeviceTypes - @dtypes(torch.float32, torch.float64, torch.bfloat16, torch.int32, torch.int64, torch.cfloat, torch.cdouble) + # bfloat16 doesn't have sufficient precision to pass this test + @dtypes(torch.float32, torch.float64, torch.int32, 
torch.int64, torch.cfloat, torch.cdouble) @dtypesIfCUDA(torch.float32, torch.float64, torch.cfloat, torch.cdouble) @tf32_on_and_off(0.01) def test_mm(self, device, dtype): @@ -6000,7 +6045,7 @@ def test_strided_mm_bmm(self, device, dtype): @precisionOverride({torch.half: 0.05, torch.bfloat16: 0.05}) @skipCUDAIf(torch.version.cuda == "10.1", "flaky on CUDA 10.1") @onlyNativeDeviceTypes - @dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes()) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) @tf32_on_and_off(0.05) def test_bmm(self, device, dtype): if self.device_type == 'cuda' and dtype is torch.bfloat16 and CUDA11OrLater and not SM53OrLater: @@ -6112,7 +6157,7 @@ def _test_addbmm_baddbmm(self, func, b1, b2, ref, out_tensor): @precisionOverride({torch.half: 0.05, torch.bfloat16: 0.05}) @onlyNativeDeviceTypes - @dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes()) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) @tf32_on_and_off(0.05) def test_addbmm(self, device, dtype): if self.device_type == 'cuda' and dtype is torch.bfloat16 and CUDA11OrLater and not SM53OrLater: @@ -6185,7 +6230,7 @@ def generate_tensor(): @precisionOverride({torch.half: 0.1, torch.bfloat16: 0.5}) @onlyNativeDeviceTypes - @dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes()) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) @tf32_on_and_off(0.05) def test_baddbmm(self, device, dtype): if self.device_type == 'cuda' and dtype is torch.bfloat16 and CUDA11OrLater and not SM53OrLater: diff --git a/test/test_logging.py b/test/test_logging.py index 4bb057fd157a8b..01fdd3f8edd838 100644 --- a/test/test_logging.py +++ b/test/test_logging.py @@ -12,10 +12,10 @@ def testApiUsage(self): subprocess """ s = TestCase.runWithPytorchAPIUsageStderr("import torch") - self.assertRegexpMatches(s, "PYTORCH_API_USAGE.*import") + self.assertRegex(s, "PYTORCH_API_USAGE.*import") # import the shared library directly - it triggers static init but doesn't call anything s = TestCase.runWithPytorchAPIUsageStderr("from ctypes import CDLL; CDLL('{}')".format(torch._C.__file__)) - self.assertNotRegexpMatches(s, "PYTORCH_API_USAGE") + self.assertNotRegex(s, "PYTORCH_API_USAGE") if __name__ == '__main__': diff --git a/test/test_masked.py b/test/test_masked.py index fa192086c89a16..1b9b4b075f7c09 100644 --- a/test/test_masked.py +++ b/test/test_masked.py @@ -10,11 +10,11 @@ import unittest from torch.testing._internal.common_utils import \ - (TestCase, suppress_warnings, _TestParametrizer) + (TestCase, parametrize, suppress_warnings, _TestParametrizer) from torch.testing._internal.common_methods_invocations import \ (op_db, SampleInput) from torch.testing._internal.common_device_type import \ - (instantiate_device_type_tests, ops, onlyNativeDeviceTypes) + (instantiate_device_type_tests, ops, onlyNativeDeviceTypes, precisionOverride) def apply_masked_reduction_along_dim(op, input, *args, **kwargs): @@ -113,7 +113,10 @@ def apply_masked_reduction_along_dim(op, input, *args, **kwargs): output = input.new_full(shape, float('nan') if dtype.is_floating_point else 0, dtype=dtype) # apply op to all elementary slices: - inpmask = torch._masked._input_mask(input, mask=mask) + if mask is None: + inpmask = input.new_ones([], dtype=torch.bool).expand(input.shape) + else: + inpmask = torch._masked._input_mask(input, mask=mask) for s in itertools.product(*ranges): # data of an elementary slice is 1D sequence and has only # masked-in elements: @@ -142,7 +145,10 @@ def 
apply_masked_normalization_along_dim(op, input, *args, **kwargs): dim = args[dim_pos] args0 = args[:dim_pos] + (0,) + args[dim_pos + 1:] output = torch.zeros_like(input, dtype=dtype) - inpmask = torch._masked._input_mask(input, mask=mask) + if mask is None: + inpmask = input.new_ones([], dtype=torch.bool).expand(input.shape) + else: + inpmask = torch._masked._input_mask(input, mask=mask) dim_ = dim % input.ndim left_ranges = tuple(map(range, input.shape[:dim_])) right_ranges = tuple(map(range, input.shape[dim_ + 1:])) @@ -155,6 +161,7 @@ def apply_masked_normalization_along_dim(op, input, *args, **kwargs): reference_functions = dict( norm=lambda *args, **kwargs: apply_masked_reduction_along_dim(torch.linalg.vector_norm, *args, **dict(kwargs, dim_position=1)), var=lambda *args, **kwargs: apply_masked_reduction_along_dim(torch.var, *args, **dict(kwargs, dim_position=0)), + std=lambda *args, **kwargs: apply_masked_reduction_along_dim(torch.std, *args, **dict(kwargs, dim_position=0)), softmax=lambda *args, **kwargs: apply_masked_normalization_along_dim(torch.softmax, *args, **kwargs), log_softmax=lambda *args, **kwargs: apply_masked_normalization_along_dim(torch.log_softmax, *args, **kwargs), softmin=lambda *args, **kwargs: apply_masked_normalization_along_dim(torch.nn.functional.softmin, *args, **kwargs), @@ -236,14 +243,7 @@ def sample_inputs_generator(): kwargs=sample_input_kwargs) if layout != torch.sparse_coo and op.supports_sparse: sample_input_kwargs = sample_input.kwargs.copy() - if mask.layout == torch.sparse_csr: - # TODO: remove this if-block when sparse csr supports to_sparse - mask = torch.sparse_coo_tensor( - torch._convert_indices_from_csr_to_coo(mask.crow_indices(), mask.col_indices()), - mask.values(), mask.shape)._coalesced_(True) - sample_input_kwargs.update(mask=mask) - else: - sample_input_kwargs.update(mask=mask.to_sparse()) + sample_input_kwargs.update(mask=mask.to_sparse()) yield SampleInput(sample_input.input.clone(), args=sample_input.args, kwargs=sample_input_kwargs) @@ -264,31 +264,37 @@ class TestMasked(TestCase): def assertEqualMasked(self, actual, expected, mask): strided = to_strided(actual) - strided = torch.where(mask, strided, strided.new_zeros([])) - expected = torch.where(mask, expected, expected.new_zeros([])) + if mask is not None: + strided = torch.where(mask, strided, strided.new_zeros([])) + expected = torch.where(mask, expected, expected.new_zeros([])) self.assertEqual(strided, expected, exact_device=False) @onlyNativeDeviceTypes @suppress_warnings @ops(masked_ops_with_references) + @precisionOverride({torch.bfloat16: 5e-4, torch.float16: 5e-4}) def test_reference_masked(self, device, dtype, op): op_name = op.name.rsplit('.', 1)[-1] ref_op = reference_functions[op_name] sample_inputs = op.sample_inputs(device, dtype) for sample_input in sample_inputs: t_inp, t_args, t_kwargs = sample_input.input, sample_input.args, sample_input.kwargs - if op_name == 'var' and not (t_inp.dtype.is_floating_point or t_inp.dtype.is_complex): - # torch.var does not support integer inputs + if op_name in {'var', 'std'} and not (t_inp.dtype.is_floating_point or t_inp.dtype.is_complex): + # torch.var/torch.std does not support integer inputs continue actual = op.op(t_inp, *t_args, **t_kwargs) expected = ref_op(t_inp, *t_args, **t_kwargs) - outmask = torch._masked._output_mask(op.op, t_inp, *t_args, **t_kwargs) + if t_kwargs.get('mask') is None: + outmask = None + else: + outmask = torch._masked._output_mask(op.op, t_inp, *t_args, **t_kwargs) self.assertEqualMasked(actual, 
expected, outmask) @mask_layouts() @onlyNativeDeviceTypes @suppress_warnings @ops(masked_ops_with_non_strided_support) + @precisionOverride({torch.bfloat16: 5e-3, torch.float16: 5e-3}) def test_mask_layout(self, layout, device, dtype, op, sample_inputs): for sample in sample_inputs: t_inp, t_args, t_kwargs = sample.input, sample.args, sample.kwargs @@ -300,9 +306,124 @@ def test_mask_layout(self, layout, device, dtype, op, sample_inputs): # op(inp, mask).to_dense() == op(inp.to_dense(), mask.to_dense()) at outmask # r_inp, r_args, r_kwargs = to_strided((t_inp, t_args, t_kwargs)) - outmask = torch._masked._output_mask(op.op, r_inp, *r_args, **r_kwargs) + if r_kwargs.get('mask') is None: + outmask = None + else: + outmask = torch._masked._output_mask(op.op, r_inp, *r_args, **r_kwargs) expected = op.op(r_inp, *r_args, **r_kwargs) self.assertEqualMasked(actual, expected, outmask) + @parametrize("sparse_kind,fill_value", [('coo', 0), ('hybrid_coo', 0), + ('coo', 123), ('hybrid_coo', 123), + ('csr', 0), ('csr', 123)], + name_fn=lambda sparse_kind, fill_value: f'{sparse_kind}_fill_value_{fill_value}') + def test_where(self, sparse_kind, fill_value): + + is_hybrid = False + if sparse_kind == 'coo': + + def to_sparse(dense): + return dense.to_sparse(2) + + def set_values(sparse, index, value): + sparse._values()[index] = value + + elif sparse_kind == 'hybrid_coo': + is_hybrid = True + + def to_sparse(dense): + return dense.to_sparse(1) + + def set_values(sparse, index, value): + sparse._values()[index] = value + + elif sparse_kind == 'csr': + + def to_sparse(dense): + return dense.to_sparse_csr() + + def set_values(sparse, index, value): + sparse.values()[index] = value + + else: + assert 0, sparse_kind + + mask = torch.tensor([[1, 0, 1, 0, 0], + [1, 1, 1, 1, 0], + [0, 1, 0, 1, 0], + [0, 0, 0, 0, 0], + [0, 0, 1, 1, 0], + [1, 1, 0, 0, 0]]).to(dtype=bool) + mask = to_sparse(mask) + # make some specified mask elements as explicit masked-out masks: + if is_hybrid: + set_values(mask, (1, 1), False) + set_values(mask, (-2, -2), False) + else: + set_values(mask, 3, False) + set_values(mask, -3, False) + + input = torch.tensor([[1, 0, 0, 0, -1], + [2, 3, 0, 0, -2], + [0, 4, 5, 0, -3], + [0, 0, 6, 7, 0], + [0, 8, 9, 0, -3], + [10, 11, 0, 0, -5]]) + input = to_sparse(input) + # make specified input elements have zero values: + if is_hybrid: + set_values(input, (1, 1), 0) + set_values(input, (-1, 0), 0) + F = fill_value + else: + set_values(input, 3, 0) + set_values(input, -3, 0) + F = 0 + + # expected where result: + Z = 99 + # Z value corresponds to masked-in elements that are not + # specified in the input and it will be replaced with a zero + tmp = torch.tensor([[1, F, Z, F, F], + [2, F, Z, Z, F], + [F, 4, F, Z, F], + [0, 0, 0, 0, 0], + [F, F, 9, F, F], + [Z, 11, F, F, F]]) + tmp = to_sparse(tmp) + + + sparse = torch._masked._where(mask, input, + torch.tensor(fill_value, dtype=input.dtype, device=input.device)) + + if tmp.layout == torch.sparse_coo: + expected_sparse = torch.sparse_coo_tensor( + tmp.indices(), + torch.where(tmp.values() != Z, tmp.values(), tmp.values().new_full([], 0)), + input.shape) + outmask = torch.sparse_coo_tensor(sparse.indices(), + sparse.values().new_full(sparse.values().shape, 1).to(dtype=bool), + sparse.shape)._coalesced_(True) + elif tmp.layout == torch.sparse_csr: + expected_sparse = torch.sparse_csr_tensor( + tmp.crow_indices(), + tmp.col_indices(), + torch.where(tmp.values() != Z, tmp.values(), tmp.values().new_full([], 0)), + input.shape) + outmask = 
torch.sparse_csr_tensor(sparse.crow_indices(), sparse.col_indices(), + sparse.values().new_full(sparse.values().shape, 1).to(dtype=bool), + sparse.shape) + else: + assert 0 + + self.assertEqual(sparse, expected_sparse) + + # check invariance: + # torch.where(mask.to_dense(), input.to_dense(), fill_value) + # == where(mask, input, fill_value).to_dense(fill_value) + expected = torch.where(mask.to_dense(), input.to_dense(), torch.full(input.shape, F)) + dense = torch.where(outmask.to_dense(), sparse.to_dense(), torch.full(sparse.shape, F)) + self.assertEqual(dense, expected) + instantiate_device_type_tests(TestMasked, globals(), except_for='meta') diff --git a/test/test_module_init.py b/test/test_module_init.py index 589db4b71622e3..fa0ac8f79dcee1 100644 --- a/test/test_module_init.py +++ b/test/test_module_init.py @@ -166,6 +166,9 @@ def build_constructor_arg_db(): torch.nn.UpsamplingBilinear2d: ((), {}), torch.nn.UpsamplingNearest2d: ((), {}), torch.nn.ZeroPad2d: ((0,), {}), + torch.nn.qat.Conv1d: ((3, 3, 3), { + 'qconfig': torch.ao.quantization.default_qconfig, + }), torch.nn.qat.Conv2d: ((3, 3, 3), { 'qconfig': torch.ao.quantization.default_qconfig, }), @@ -206,7 +209,7 @@ def build_constructor_arg_db(): torch.nn.quantized.EmbeddingBag: ((10, 3), { 'factory_kwargs': {}, }), - torch.nn.quantized.GroupNorm: ((2, 3, torch.nn.Parameter(torch.tensor(2.)), + torch.nn.quantized.GroupNorm: ((2, 4, torch.nn.Parameter(torch.tensor(2.)), torch.nn.Parameter(torch.tensor(2.)), 0.1, 0), {}), torch.nn.quantized.Hardswish: ((0.1, 0,), {}), torch.nn.quantized.InstanceNorm1d: ((2, torch.nn.Parameter(torch.tensor(2.)), diff --git a/test/test_multiprocessing.py b/test/test_multiprocessing.py index cdadf6aec001b5..515121586fcfd5 100644 --- a/test/test_multiprocessing.py +++ b/test/test_multiprocessing.py @@ -258,7 +258,7 @@ def test_fill(): self.assertTrue(e.is_set()) self.assertTrue(data[0].eq(4).all()) self.assertTrue(data[1].eq(4).all()) - p.join(1) + p.join(100) self.assertFalse(p.is_alive()) def test_receive(): @@ -280,7 +280,7 @@ def test_receive(): # collect them properly del t1, t2 e.set() - p.join(1) + p.join(100) self.assertFalse(p.is_alive()) with leak_checker(self) as lc: @@ -587,6 +587,7 @@ def _test_event_multiprocess_child(event, p2c, c2p): event.synchronize() c2p.put(1) # notify parent synchronization is done + @unittest.skip("Skipped as this test fails on ROCm") @unittest.skipIf(NO_MULTIPROCESSING_SPAWN, "Disabled for environments that \ don't support multiprocessing with spawn start method") @unittest.skipIf(not TEST_CUDA_IPC, 'CUDA IPC not available') @@ -645,6 +646,7 @@ def _test_event_handle_importer_consumer(handle, p2c, c2p): c2p.put(1) # nofity synchronization is done in child p2c.get() # wait for parent to finish before destructing child event + @unittest.skip("Skipped as this test fails on ROCm") @unittest.skipIf(NO_MULTIPROCESSING_SPAWN, "Disabled for environments that \ don't support multiprocessing with spawn start method") @unittest.skipIf(not TEST_CUDA_IPC, 'CUDA IPC not available') @@ -684,6 +686,7 @@ def _test_event_handle_exporter_consumer(handle, p2c, c2p): # destructing e1 p2c.get() + @unittest.skip("Skipped as this test fails on ROCm") @unittest.skipIf(NO_MULTIPROCESSING_SPAWN, "Disabled for environments that \ don't support multiprocessing with spawn start method") @unittest.skipIf(not TEST_CUDA_IPC, 'CUDA IPC not available') @@ -753,7 +756,7 @@ def hook(*unused): self.assertEqual(var.data, torch.ones(5, 5, device=device)) self.assertEqual(var.grad.data, torch.ones(5, 5, 
device=device) * 4) - p.join(1) + p.join(100) self.assertFalse(p.is_alive()) # Check sharing a cudaMalloc allocation with different types of storage. diff --git a/test/test_namedtuple_return_api.py b/test/test_namedtuple_return_api.py index ddc23e45f276e5..c0d8c6489aa8f7 100644 --- a/test/test_namedtuple_return_api.py +++ b/test/test_namedtuple_return_api.py @@ -18,7 +18,8 @@ 'triangular_solve', 'cummax', 'cummin', 'linalg_eigh', "_unpack_dual", 'linalg_qr', 'linalg_svd', '_linalg_svd', 'linalg_slogdet', 'fake_quantize_per_tensor_affine_cachemask', 'fake_quantize_per_channel_affine_cachemask', 'linalg_lstsq', 'linalg_eig', 'linalg_cholesky_ex', - 'frexp', 'lu_unpack', 'histogram', '_fake_quantize_per_tensor_affine_cachemask_tensor_qparams', + 'frexp', 'lu_unpack', 'histogram', 'histogramdd', + '_fake_quantize_per_tensor_affine_cachemask_tensor_qparams', '_fused_moving_avg_obs_fq_helper', 'linalg_lu_factor', 'linalg_lu_factor_ex', '_det_lu_based_helper', '_lu_with_info', @@ -100,6 +101,7 @@ def test_namedtuple_return(self): input=(torch.tensor([3, 2, 1, 4, 5], dtype=torch.int32), True, True), names=('P', 'L', 'U'), hasout=True), op(operators=['histogram'], input=(1,), names=('hist', 'bin_edges'), hasout=True), + op(operators=['histogramdd'], input=(1,), names=('hist', 'bin_edges'), hasout=False), op(operators=['_fake_quantize_per_tensor_affine_cachemask_tensor_qparams'], input=(torch.tensor([1.0]), torch.tensor([0], dtype=torch.int), torch.tensor([1]), 0, 255), names=('output', 'mask',), hasout=False), diff --git a/test/test_nestedtensor.py b/test/test_nestedtensor.py index cf868f2761794c..eeaa51b24d66ce 100644 --- a/test/test_nestedtensor.py +++ b/test/test_nestedtensor.py @@ -133,14 +133,15 @@ def test_numel(self): RuntimeError, "numel is disabled", lambda: a1.numel(), ) - @unittest.skipIf(IS_FBCODE, "size is not virtual in fbcode.") @torch.inference_mode() def test_size(self): for constructor in _iter_constructors(): a1 = constructor([]) self.assertRaisesRegex( RuntimeError, - "NestedTensorImpl doesn't support sizes", + "Tensors of type NestedTensorImpl do not have sizes" + if IS_FBCODE + else "NestedTensorImpl doesn't support sizes", lambda: a1.size(), ) @@ -182,3 +183,12 @@ def test_repr_string(self): ) self.assertEqual(str(a), expected) self.assertEqual(repr(a), expected) + + @torch.inference_mode() + def test_activations(self): + for func in (torch.nn.functional.relu, torch.nn.functional.relu_, torch.nn.functional.gelu, torch._C._nn.gelu_): + t = torch.tensor([-1, 0, 1], dtype=torch.float) + nt = nested_tensor([t]) + nested_result = func(nt) + self.assertTrue(nested_result.is_nested) + self.assertEqual(func(t), nested_result.unbind()[0]) diff --git a/test/test_nn.py b/test/test_nn.py index 30c4d136e1b907..809cd8b455191d 100644 --- a/test/test_nn.py +++ b/test/test_nn.py @@ -35,12 +35,13 @@ from torch.nn import Parameter from torch.nn.parameter import UninitializedParameter, UninitializedBuffer from torch.nn.parallel._functions import Broadcast -from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes +from torch.testing._internal.common_dtype import integral_types, floating_types_and, get_all_math_dtypes, \ + floating_and_complex_types_and from torch.testing._internal.common_utils import freeze_rng_state, run_tests, TestCase, skipIfNoLapack, skipIfRocm, \ skipIfRocmVersionLessThan, skipIfNotMiopenSuggestNHWC, TEST_NUMPY, TEST_SCIPY, TEST_WITH_ROCM, download_file, \ get_function_arglist, load_tests, \ suppress_warnings, TemporaryFileName, 
TEST_WITH_UBSAN, IS_PPC, \ - parametrize as parametrize_test, subtest, instantiate_parametrized_tests + parametrize as parametrize_test, subtest, instantiate_parametrized_tests, set_default_dtype from torch.testing._internal.common_cuda import TEST_CUDA, TEST_MULTIGPU, TEST_CUDNN, TEST_CUDNN_VERSION from torch.testing._internal.common_nn import NNTestCase, NewModuleTest, CriterionTest, \ module_tests, criterion_tests, loss_reference_fns, \ @@ -53,6 +54,7 @@ from torch.nn import MultiheadAttention from hypothesis import given +from torch.testing import make_tensor import torch.testing._internal.hypothesis_utils as hu from torch.testing._internal.common_utils import _assertGradAndGradgradChecks, gradcheck, gradgradcheck, \ GRADCHECK_NONDET_TOL @@ -69,6 +71,7 @@ if TEST_SCIPY: from scipy import stats + import scipy.signal import scipy.ndimage if TEST_NUMPY: @@ -892,7 +895,7 @@ def test_no_grad(self): self.assertRaises(RuntimeError, lambda: output2.backward(torch.ones(1, 5, 10, 10))) def test_invalid_conv1d(self): - for dtype in [torch.bfloat16, torch.float, torch.double]: + for dtype in [torch.bfloat16, torch.float, torch.double, torch.cfloat, torch.cdouble]: module = nn.Conv1d(in_channels=3, out_channels=33, kernel_size=10, stride=1, bias=True).to(dtype) input = torch.randn(1, 3, 4).to(dtype) with self.assertRaisesRegex(RuntimeError, @@ -907,30 +910,32 @@ def test_invalid_conv1d(self): module(input) def test_mismatch_shape_conv2d(self): - x = torch.randn(1, 10, 1, 28, 28) - w = torch.randn(6, 1, 5, 5) + for dtype in (torch.float, torch.cfloat): + x = torch.randn(1, 10, 1, 28, 28, dtype=dtype) + w = torch.randn(6, 1, 5, 5, dtype=dtype) - with self.assertRaisesRegex(RuntimeError, - r'Expected 3D \(unbatched\) or 4D \(batched\) input to conv2d, but got ' + - r'input of size: \[1, 10, 1, 28, 28\]'): + with self.assertRaisesRegex(RuntimeError, + r'Expected 3D \(unbatched\) or 4D \(batched\) input to conv2d, but got ' + + r'input of size: \[1, 10, 1, 28, 28\]'): - F.conv2d(x, w) + F.conv2d(x, w) def test_conv2d_discontiguous_weight(self): - # Test for https://github.com/pytorch/pytorch/issues/55781 - x = torch.ones(64, 16, 16, 16) - weight = torch.arange(0, 1.0, 1 / 2.0 ** 10).reshape(32, 16, 1, 2)[:, :, :, ::2] - self.assertFalse(weight.is_contiguous()) - y = torch.nn.functional.conv2d(x, weight, None) - if torch.backends.mkldnn.is_available(): - # Disable MKLDNN explicitly, so that either NNPACK or THCNN will be used - with torch.backends.mkldnn.flags(enabled=False): - y_ = torch.nn.functional.conv2d(x, weight, None) - self.assertEqual(y, y_) - self.assertEqual(y.sum(), 4186112.) + for dtype in (torch.float, torch.cfloat): + # Test for https://github.com/pytorch/pytorch/issues/55781 + x = torch.ones(64, 16, 16, 16, dtype=dtype) + weight = torch.arange(0, 1.0, 1 / 2.0 ** 10).reshape(32, 16, 1, 2).to(dtype)[:, :, :, ::2] + self.assertFalse(weight.is_contiguous()) + y = torch.nn.functional.conv2d(x, weight, None) + if torch.backends.mkldnn.is_available(): + # Disable MKLDNN explicitly, so that either NNPACK or THCNN will be used + with torch.backends.mkldnn.flags(enabled=False): + y_ = torch.nn.functional.conv2d(x, weight, None) + self.assertEqual(y, y_) + self.assertEqual(y.sum(), 4186112.) 
def test_invalid_conv2d(self): - for dtype in [torch.bfloat16, torch.float, torch.double]: + for dtype in [torch.bfloat16, torch.float, torch.double, torch.cfloat, torch.cdouble]: module = torch.nn.Conv2d(1, 1, kernel_size=3, dilation=2, stride=2).to(dtype) input = torch.empty(1, 1, 4, 4).to(dtype) self.assertRaises(RuntimeError, lambda: module(input)) @@ -955,7 +960,7 @@ def test_invalid_conv2d(self): module(input) def test_invalid_conv3d(self): - for dtype in [torch.bfloat16, torch.float, torch.double]: + for dtype in [torch.bfloat16, torch.float, torch.double, torch.cfloat, torch.cdouble]: module = torch.nn.Conv3d(1, 1, kernel_size=3, dilation=2, stride=2).to(dtype) input = torch.empty(1, 1, 4, 4, 4).to(dtype) self.assertRaises(RuntimeError, lambda: module(input)) @@ -3169,6 +3174,40 @@ def forward(self, X): Y = model.weight self.assertEqual(id(X), id(Y)) + # FIXME: Rewrite this test using functions not depending on LAPACK + # and remove the `@skipIfNoLapack` (see #70995) + @skipIfNoLapack + def test_caching_parametrization_with_transfer_parametrizations_and_params(self): + r"""Test that transferring parametrizations doesn't cause issues with caching""" + class Skew(nn.Module): + def forward(self, X): + X = X.tril(-1) + return X - X.T + + class Orthogonal(nn.Module): + def forward(self, X): + Id = torch.eye(X.size(0), device=X.device) + return torch.linalg.solve(Id + X, Id - X) + + model = nn.Linear(5, 5) + parametrize.register_parametrization(model, "weight", Skew()) + parametrize.register_parametrization(model, "weight", Orthogonal()) + + to_model = nn.Linear(5, 5) + parametrize.transfer_parametrizations_and_params(model, to_model) + + with parametrize.cached(): + X = model.weight + Y = model.weight + self.assertEqual(id(X), id(Y)) + + A = to_model.weight + B = to_model.weight + self.assertEqual(id(A), id(B)) + + # test that the results are distinct objects for each module + self.assertNotEqual(id(A), id(X)) + def test_parametrization_same_training_mode(self): r"""Test training mode updated on parametrization registration""" class Identity(nn.Module): @@ -3184,6 +3223,220 @@ def forward(self, X): self.assertTrue(module.parametrizations.weight[0].training) self.assertTrue(module.parametrizations.weight[1].training) + def test_type_before_parametrizations(self): + r"""Test that type_before_parametrizations always retrieves original type""" + + class Identity(nn.Module): + def forward(self, X): + return X + + model = nn.Linear(5, 5) + original_type = type(model) + self.assertTrue( + parametrize.type_before_parametrizations(model) == original_type + ) + parametrize.register_parametrization(model, "weight", Identity()) + self.assertTrue( + parametrize.type_before_parametrizations(model) == original_type + ) + + def test_transfer_parametrizations_and_params(self): + r"""Test that all parametrizations and their associated parameters are transferred.""" + + class AddOne(nn.Module): + def forward(self, x): + return x + 1.0 + + class Double(nn.Module): + def forward(self, x): + return 2.0 * x + + def right_inverse(self, x): + return 0.5 * x + + class MinusOne(nn.Module): + def forward(self, x): + return x - 1.0 + + model = nn.Linear(5, 5) + parametrize.register_parametrization(model, "weight", AddOne()) + parametrize.register_parametrization(model, "weight", Double()) + parametrize.register_parametrization(model, "weight", MinusOne()) + hold_weight = model.weight + + to_model = nn.qat.Linear( + 5, 5, qconfig=torch.ao.quantization.get_default_qconfig() + ) + 
parametrize.transfer_parametrizations_and_params(model, to_model) + + # checks that final and original value are correct and the to_model is parametrized + self.assertTrue(torch.nn.utils.parametrize.is_parametrized(to_model, "weight")) + self.assertEqual(model.weight, to_model.weight) + self.assertEqual( + model.parametrizations.weight.original, + to_model.parametrizations.weight.original, + ) + + # check that the transfer didn't affect the original value + self.assertEqual(hold_weight, model.weight) + + # testing that changes to one set of parametrizations do not affect the other + parametrize.remove_parametrizations(to_model, "weight") + self.assertFalse(torch.nn.utils.parametrize.is_parametrized(to_model, "weight")) + self.assertTrue(torch.nn.utils.parametrize.is_parametrized(model, "weight")) + + # also test that parameters that don't exist in to_model get transferred + model.test_param = Parameter(torch.randn(5, 5)) + + self.assertTrue(not hasattr(to_model, "test_param")) + parametrize.register_parametrization(model, "test_param", Double()) + hold_test_param = model.test_param + parametrize.transfer_parametrizations_and_params(model, to_model, "test_param") + + # check that previously missing params got transferred correctly + self.assertEqual(model.test_param, to_model.test_param) + self.assertEqual( + model.parametrizations.test_param.original, + to_model.parametrizations.test_param.original, + ) + + # check that the new transfer didn't change the value for the from_module + self.assertEqual(hold_test_param, model.test_param) + + def test_transfer_parametrizations_and_params_right_inverse(self): + r"""Test that all parametrizations and their associated parameters are transferred.""" + + class Double(nn.Module): + def forward(self, x): + return 2.0 * x + + def right_inverse(self, x): + return 0.5 * x + + model = nn.Linear(5, 5) + parametrize.register_parametrization(model, "weight", Double()) + hold_weight = model.weight + + to_model = nn.qat.Linear( + 5, 5, qconfig=torch.ao.quantization.get_default_qconfig() + ) + parametrize.transfer_parametrizations_and_params(model, to_model) + + # check that transfer occurs successfully + self.assertEqual(model.weight, to_model.weight) + self.assertEqual( + model.parametrizations.weight.original, + to_model.parametrizations.weight.original, + ) + + # check that transfer doesn't affect the from_model weight + self.assertEqual(hold_weight, model.weight) + + def test_transfer_parametrizations_and_params_single_param(self): + r"""Test that all parametrizations and their associated parameters are transferred.""" + + class AddOne(nn.Module): + def forward(self, x): + return x + 1.0 + + class Double(nn.Module): + def forward(self, x): + return 2.0 * x + + class MinusOne(nn.Module): + def forward(self, x): + return x - 1.0 + + model = nn.Linear(5, 5, bias=True) + parametrize.register_parametrization(model, "weight", AddOne()) + parametrize.register_parametrization(model, "weight", Double()) + parametrize.register_parametrization(model, "weight", MinusOne()) + parametrize.register_parametrization(model, "bias", AddOne()) + parametrize.register_parametrization(model, "bias", Double()) + parametrize.register_parametrization(model, "bias", MinusOne()) + + to_model = nn.qat.Linear( + 5, 5, bias=True, qconfig=torch.ao.quantization.get_default_qconfig() + ) + parametrize.transfer_parametrizations_and_params(model, to_model, "weight") + + # check that weight and only weight was transferred + self.assertEqual(model.weight, to_model.weight) + self.assertEqual( + 
model.parametrizations.weight.original, + to_model.parametrizations.weight.original, + ) + self.assertTrue("bias" not in to_model.parametrizations) + + # FIXME: Rewrite this test using functions not depending on LAPACK + # and remove the `@skipIfNoLapack` (see #70995) + @skipIfNoLapack + def test_transfer_parametrizations_and_params_many_to_one(self): + # A parametrization with several outputs + class RankOne(nn.Module): + def forward(self, x, y): + # Form a rank-1 matrix from a pair of vectors + return x.unsqueeze(-1) @ y.unsqueeze(-2) + + def right_inverse(self, Y): + # We project the given matrix onto the rank 1 matrices + U, S, Vh = torch.linalg.svd(Y, full_matrices=False) + # S is ordered in a decreasing way. + s0_sqrt = S[0].sqrt().unsqueeze(-1) + return U[..., :, 0] * s0_sqrt, Vh[..., 0, :] * s0_sqrt + + class Double(nn.Module): + def forward(self, x): + return 2.0 * x + + model = nn.Linear(3, 3) + parametrize.register_parametrization(model, "weight", RankOne()) + parametrize.register_parametrization(model, "weight", Double()) + hold_weight = model.weight + + to_model = nn.qat.Linear( + 3, 3, qconfig=torch.ao.quantization.get_default_qconfig() + ) + + parametrize.transfer_parametrizations_and_params(model, to_model) + + # checks that final and original value are correct and the to_model is parametrized + self.assertTrue(torch.nn.utils.parametrize.is_parametrized(to_model, "weight")) + self.assertEqual(model.weight, to_model.weight) + self.assertEqual( + model.parametrizations.weight.original0, + to_model.parametrizations.weight.original0, + ) + self.assertEqual( + model.parametrizations.weight.original1, + to_model.parametrizations.weight.original1, + ) + + # check that the transfer didn't affect the original value + self.assertEqual(hold_weight, model.weight) + + # testing that changes to one set of parametrizations do not affect the other + model.test_param = Parameter(torch.randn(3, 3)) + + self.assertTrue(not hasattr(to_model, "test_param")) + parametrize.register_parametrization(model, "test_param", RankOne()) + hold_test_param = model.test_param + parametrize.transfer_parametrizations_and_params(model, to_model, "test_param") + + # also check that previously missing params got transferred correctly + self.assertEqual(model.test_param, to_model.test_param) + self.assertEqual( + model.parametrizations.test_param.original0, + to_model.parametrizations.test_param.original0, + ) + self.assertEqual( + model.parametrizations.test_param.original1, + to_model.parametrizations.test_param.original1, + ) + + # check that the new transfer didn't change the value for the from_module + self.assertEqual(hold_test_param, model.test_param) + # torch/nn/utils/prune.py @unittest.skipIf(not TEST_NUMPY, "numpy not found") def test_validate_pruning_amount_init(self): @@ -4823,7 +5076,7 @@ def assert_weight_allclose_Q(weight, W): (torch.float32, torch.complex64), (True, False)): # Conv2d does not support complex yet - if not use_linear and dtype.is_complex: + if not use_linear: continue if use_linear: @@ -6305,7 +6558,7 @@ def test(should_raise, module, input_size, dtype): # just run it to ensure no exception raised. 
module(input) - for dtype in [torch.bfloat16, torch.float, torch.double]: + for dtype in [torch.bfloat16, torch.float, torch.double, torch.cfloat, torch.cdouble]: # Conv1d test(True, nn.Conv1d(1, 1, 3).to(dtype), (1, 2), dtype) test(True, nn.Conv1d(1, 1, 3, stride=2).to(dtype), (1, 2), dtype) @@ -6381,8 +6634,6 @@ def test_ConvTranspose2d_half_cublas_gemm(self): output = deconv(inputs) output.mean().backward() - - @skipIfRocm # For https://github.com/pytorch/pytorch/pull/1273 # Almost identical to the above `test_Conv2d_naive_groups` def test_Conv2d_groups_nobias(self): @@ -6422,7 +6673,6 @@ def test_Conv2d_groups_nobias(self): # Covering special case when group > 1, input-channel / group < 16 and output-channel is multiple of 16 # See also https://github.com/pytorch/pytorch/pull/18463#issuecomment-476563686 # and https://github.com/pytorch/pytorch/pull/18463#issuecomment-477001024 - @skipIfRocm def test_Conv2d_groups_nobias_v2(self): torch.manual_seed(123) dev_dtypes = [("cpu", torch.float)] @@ -9978,10 +10228,10 @@ def test_grid_sample_error_checking(self): with self.assertRaisesRegex(ValueError, "but got: 'garbage'"): F.grid_sample(input, grid, padding_mode='garbage', align_corners=False) - with self.assertRaisesRegex(RuntimeError, "expected 4D or 5D input"): + with self.assertRaisesRegex(RuntimeError, "expected grid to have size 1 in last dimension"): F.grid_sample(input[0], grid, align_corners=False) - with self.assertRaisesRegex(RuntimeError, "grid with same number of dimensions"): + with self.assertRaisesRegex(RuntimeError, "expected grid to have size 2 in last dimension"): F.grid_sample(input, torch.empty(1, 1, 1, 1, 3), align_corners=False) with self.assertRaisesRegex(RuntimeError, "expected grid and input to have same batch size"): @@ -9997,7 +10247,7 @@ def test_grid_sample_error_checking(self): F.grid_sample(torch.empty(1, 1, 2, 2, 2), torch.empty(1, 1, 1, 1, 3), mode='bicubic') if TEST_CUDA: - with self.assertRaisesRegex(RuntimeError, "expected input and grid to be on same device"): + with self.assertRaisesRegex(RuntimeError, "Expected all tensors to be on the same device"): F.grid_sample(input.cuda(), grid, align_corners=False) def test_affine_grid_error_checking(self): @@ -11390,6 +11640,12 @@ def test_cross_entropy_loss_precision(self): outd = loss_cpu(inputd, target) self.assertEqual(outf, outd, exact_dtype=False) + def test_cross_entropy_loss_zero_div(self): + # Test for issue #73165 + input_1 = torch.rand([5, 0], dtype=torch.float32) + input_2 = torch.rand([5, 0], dtype=torch.float32) + torch.nn.CrossEntropyLoss()(input_1, input_2) + @unittest.skipIf(not torch.cuda.is_available(), "CUDA not available") def test_convert_sync_batchnorm(self): module = torch.nn.Sequential( @@ -12826,9 +13082,8 @@ def _test_GroupNorm_general(self, device, dtype=torch.float): (2, 6, 4, 2, 2): 4, } for shape, g in bad_shape_g.items(): - gn = nn.GroupNorm(g, shape[1]) - input = torch.empty(*shape, device=device, dtype=dtype).uniform_(0, 10) - self.assertRaises(RuntimeError, lambda: gn(input)) + with self.assertRaises(ValueError): + gn = nn.GroupNorm(g, shape[1]) def _test_GroupNorm_cuda_half(self): input = torch.zeros(2, 4, 3, 2, requires_grad=True).cuda().half().random_(1, 10) @@ -13099,7 +13354,7 @@ def test_affine_3d_rotateRandom(self, device): @onlyCUDA @skipCUDAIfNoCudnn - @dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) + @dtypes(*floating_and_complex_types_and(torch.half, *[torch.bfloat16] if AMPERE_OR_ROCM else [])) def test_Conv2d_deterministic_cudnn(self, device, dtype): 
inputs = torch.randn(2, 3, 5, 5, device=device, dtype=dtype, requires_grad=True) with cudnn.flags(enabled=True, benchmark=True, deterministic=True): @@ -13118,7 +13373,7 @@ def test_Conv2d_deterministic_cudnn(self, device, dtype): @onlyCUDA - @dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) + @dtypes(*floating_types_and(torch.half, *[torch.bfloat16] if AMPERE_OR_ROCM else [])) def test_Conv2d_large_workspace(self, device, dtype): # These sizes require huge cuDNN workspaces. Make sure we choose a # reasonable algorithm that does not run out of memory @@ -13243,7 +13498,7 @@ def test_Conv3d_depthwise_naive_groups(self, device, dtype): @onlyCUDA - @dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) + @dtypes(*floating_types_and(torch.half, *[torch.bfloat16] if AMPERE_OR_ROCM else [])) def test_noncontig_conv_grad(self, device, dtype): # FIXME: remove after adding non-contiguous grad tests for all modules module = nn.Conv2d(3, 5, kernel_size=3, padding=1).to(device, dtype) @@ -13359,8 +13614,8 @@ def test_conv_double_backward_stride(self): batch_size, inp_size, dilation, no_weight) - - def test_conv1d_same_padding(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv1d_same_padding(self, device, dtype): # Test padding='same' outputs the correct shape test_args = [ # in_size @@ -13373,22 +13628,22 @@ def test_conv1d_same_padding(self, device): [1], ] for in_size, k_size, dilation, stride in itertools.product(*test_args): - x = torch.rand(1, 1, in_size, device=device) - y = torch.rand(1, 1, k_size, device=device) + x = torch.rand(1, 1, in_size, device=device, dtype=dtype) + y = torch.rand(1, 1, k_size, device=device, dtype=dtype) z = F.conv1d(x, y, padding='same', dilation=dilation, stride=stride) self.assertEqual(z.size(2), int(math.ceil(in_size / stride))) # Compare F.conv1d padding='same' output against manual padding # Without strides/dilation - x = torch.rand(1, 1, 12, device=device) - y = torch.rand(1, 1, 3, device=device) + x = torch.rand(1, 1, 12, device=device, dtype=dtype) + y = torch.rand(1, 1, 3, device=device, dtype=dtype) expect = F.conv1d(x, y, padding=1) actual = F.conv1d(x, y, padding='same') self.assertEqual(expect, actual) # With dilation - x = torch.rand(1, 1, 12, device=device) - y = torch.rand(1, 1, 4, device=device) + x = torch.rand(1, 1, 12, device=device, dtype=dtype) + y = torch.rand(1, 1, 4, device=device, dtype=dtype) expect = F.conv1d(x, y, padding=3, dilation=2) actual = F.conv1d(x, y, padding='same', dilation=2) self.assertEqual(expect, actual) @@ -13398,76 +13653,89 @@ def test_conv1d_same_padding(self, device): actual = F.conv1d(x, y, padding='same', dilation=3) self.assertEqual(expect, actual) - - def test_conv2d_same_padding(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv2d_same_padding(self, device, dtype): + if dtype is torch.cfloat: + rtol, atol = 2e-6, 2e-6 + else: + rtol, atol = None, None # Compare F.conv2d padding='same' output against manual padding # Without strides/dilation - x = torch.rand(1, 1, 10, 11, device=device) - y = torch.rand(1, 1, 4, 5, device=device) + x = torch.rand(1, 1, 10, 11, device=device, dtype=dtype) + y = torch.rand(1, 1, 4, 5, device=device, dtype=dtype) expect = F.conv2d(x, y, padding=(2, 2))[..., 1:, :] actual = F.conv2d(x, y, padding='same') - self.assertEqual(expect, actual) + self.assertEqual(expect, actual, rtol=rtol, atol=atol) # With dilation - y = torch.rand(1, 1, 3, 4, device=device) + y = torch.rand(1, 1, 3, 4, device=device, dtype=dtype) expect = F.conv2d(x, y, 
padding=(2, 3), dilation=2) actual = F.conv2d(x, y, padding='same', dilation=2) - self.assertEqual(expect, actual) + self.assertEqual(expect, actual, rtol=rtol, atol=atol) # Dilation with asymmetric padding - y = torch.rand(1, 1, 4, 4, device=device) + y = torch.rand(1, 1, 4, 4, device=device, dtype=dtype) expect = F.conv2d(x, y, padding=5, dilation=3)[..., 1:, 1:] actual = F.conv2d(x, y, padding='same', dilation=3) - self.assertEqual(expect, actual) + self.assertEqual(expect, actual, rtol=rtol, atol=atol) - def test_conv3d_same_padding(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv3d_same_padding(self, device, dtype): + if dtype is torch.cfloat: + rtol, atol = 2e-6, 2e-6 + else: + rtol, atol = None, None # Compare F.conv3d padding='same' output against manual padding # Without strides/dilation - x = torch.rand(1, 1, 10, 11, 12, device=device) - y = torch.rand(1, 1, 1, 2, 5, device=device) + x = torch.rand(1, 1, 10, 11, 12, device=device, dtype=dtype) + y = torch.rand(1, 1, 1, 2, 5, device=device, dtype=dtype) expect = F.conv3d(x, y, padding=(0, 1, 2))[..., :, 1:, :] actual = F.conv3d(x, y, padding='same') - self.assertEqual(expect, actual) + self.assertEqual(expect, actual, rtol=rtol, atol=atol) # With dilation expect = F.conv3d(x, y, padding=(0, 1, 4), dilation=2) actual = F.conv3d(x, y, padding='same', dilation=2) - self.assertEqual(expect, actual) + self.assertEqual(expect, actual, rtol=rtol, atol=atol) # Dilation with asymmetric padding - y = torch.rand(1, 1, 4, 4, 4, device=device) + y = torch.rand(1, 1, 4, 4, 4, device=device, dtype=dtype) expect = F.conv3d(x, y, padding=5, dilation=3)[..., 1:, 1:, 1:] actual = F.conv3d(x, y, padding='same', dilation=3) - self.assertEqual(expect, actual) + self.assertEqual(expect, actual, rtol=rtol, atol=atol) - def test_conv1d_valid_padding(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv1d_valid_padding(self, device, dtype): # Test F.conv1d padding='valid' is the same as no padding - x = torch.rand(1, 1, 10, device=device) - y = torch.rand(1, 1, 4, device=device) + x = torch.rand(1, 1, 10, device=device, dtype=dtype) + y = torch.rand(1, 1, 4, device=device, dtype=dtype) expect = F.conv1d(x, y) actual = F.conv1d(x, y, padding='valid') self.assertEqual(expect, actual) - def test_conv2d_valid_padding(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv2d_valid_padding(self, device, dtype): # Test F.conv2d padding='valid' is the same as no padding - x = torch.rand(1, 1, 1, 10, device=device) - y = torch.rand(1, 1, 1, 4, device=device) + x = torch.rand(1, 1, 1, 10, device=device, dtype=dtype) + y = torch.rand(1, 1, 1, 4, device=device, dtype=dtype) expect = F.conv2d(x, y) actual = F.conv2d(x, y, padding='valid') self.assertEqual(expect, actual) - def test_conv3d_valid_padding(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv3d_valid_padding(self, device, dtype): # Test F.conv3d padding='valid' is the same as no padding - x = torch.rand(1, 1, 1, 1, 10, device=device) - y = torch.rand(1, 1, 1, 1, 4, device=device) + x = torch.rand(1, 1, 1, 1, 10, dtype=dtype, device=device) + y = torch.rand(1, 1, 1, 1, 4, dtype=dtype, device=device) expect = F.conv3d(x, y) actual = F.conv3d(x, y, padding='valid') self.assertEqual(expect, actual) - def test_conv1d_same_padding_backward(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv1d_same_padding_backward(self, device, dtype): # Test F.conv1d gradients work with padding='same' - x = torch.rand(1, 1, 12, device=device, 
requires_grad=True) - y = torch.rand(1, 1, 4, device=device, requires_grad=True) + x = torch.rand(1, 1, 12, dtype=dtype, device=device, requires_grad=True) + y = torch.rand(1, 1, 4, dtype=dtype, device=device, requires_grad=True) # Symmetric padding z = F.conv1d(x, y, padding=3, dilation=2) @@ -13492,10 +13760,11 @@ def test_conv1d_same_padding_backward(self, device): self.assertEqual(gx_expect, x.grad) self.assertEqual(gy_expect, y.grad) - def test_conv2d_same_padding_backward(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv2d_same_padding_backward(self, device, dtype): # Test F.conv2d gradients work with padding='same' - x = torch.rand(1, 1, 10, 11, device=device, requires_grad=True) - y = torch.rand(1, 1, 4, 5, device=device, requires_grad=True) + x = torch.rand(1, 1, 10, 11, device=device, dtype=dtype, requires_grad=True) + y = torch.rand(1, 1, 4, 5, device=device, dtype=dtype, requires_grad=True) # Symmetric padding z = F.conv2d(x, y, padding=(3, 4), dilation=2) @@ -13510,7 +13779,7 @@ def test_conv2d_same_padding_backward(self, device): x.grad, y.grad = None, None # Asymmetric padding - y = torch.rand(1, 1, 4, 4, device=device, requires_grad=True) + y = torch.rand(1, 1, 4, 4, device=device, dtype=dtype, requires_grad=True) z = F.conv2d(x, y, padding=2)[..., 1:, 1:] z.sum().backward() gx_expect, gy_expect = x.grad, y.grad @@ -13521,12 +13790,13 @@ def test_conv2d_same_padding_backward(self, device): self.assertEqual(gx_expect, x.grad) self.assertEqual(gy_expect, y.grad) - def test_conv3d_same_padding_backward(self, device): + @dtypes(torch.double, torch.cdouble) + def test_conv3d_same_padding_backward(self, device, dtype): check_forward_ad = torch.device(device).type != 'xla' # Test F.conv3d gradients work with padding='same' - x = torch.rand(1, 1, 1, 11, 12, device=device, requires_grad=True) - y = torch.rand(1, 1, 1, 2, 5, device=device, requires_grad=True) + x = torch.rand(1, 1, 1, 11, 12, dtype=dtype, device=device, requires_grad=True) + y = torch.rand(1, 1, 1, 2, 5, dtype=dtype, device=device, requires_grad=True) # Symmetric padding z = F.conv3d(x, y, padding=(0, 1, 4), dilation=2) @@ -13548,7 +13818,7 @@ def test_conv3d_same_padding_backward(self, device): check_fwd_over_rev=True) # Asymmetric padding - y = torch.rand(1, 1, 1, 4, 4, device=device, requires_grad=True) + y = torch.rand(1, 1, 1, 4, 4, dtype=dtype, device=device, requires_grad=True) z = F.conv3d(x, y, padding=2)[..., 1:, 1:] z.sum().backward() gx_expect, gy_expect = x.grad, y.grad @@ -13566,10 +13836,11 @@ def test_conv3d_same_padding_backward(self, device): gradgradcheck(lambda x, y: F.conv3d(x, y, padding='same'), (x, y), check_fwd_over_rev=True) - def test_conv1d_valid_padding_backward(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv1d_valid_padding_backward(self, device, dtype): # Test F.conv1d gradients work with padding='valid' - x = torch.rand(1, 1, 10, device=device, requires_grad=True) - y = torch.rand(1, 1, 4, device=device, requires_grad=True) + x = torch.rand(1, 1, 10, dtype=dtype, device=device, requires_grad=True) + y = torch.rand(1, 1, 4, dtype=dtype, device=device, requires_grad=True) F.conv1d(x, y, padding=0).sum().backward() gx_expect, gy_expect = x.grad, y.grad x.grad, y.grad = None, None @@ -13579,10 +13850,132 @@ def test_conv1d_valid_padding_backward(self, device): self.assertEqual(gx_expect, gx_actual) self.assertEqual(gy_expect, gy_actual) - def test_conv2d_valid_padding_backward(self, device): + @unittest.skipIf(not TEST_SCIPY, "Scipy required for the 
test.") + @dtypes(torch.float, torch.cfloat) + @parametrize_test("mode", ('valid', 'same')) + def test_conv1d_vs_scipy(self, device, dtype, mode): + t = make_tensor((1, 10), device=device, dtype=dtype) + feat_dim = t.shape[1] + weight_even = make_tensor((1, 1, 4), device=device, dtype=dtype) + weight_odd = make_tensor((1, 1, 5), device=device, dtype=dtype) + + def _test(t, weight, mode): + # SciPy expects two 1-D inputs. + t_a = t.view(-1).cpu().numpy() + w_a = weight.view(-1).cpu().numpy() + expected = scipy.signal.convolve(t_a, w_a, mode=mode) + + kwargs = {'padding': mode} + if mode == 'same': + # `same` padding in PyTorch conv1d is different + # from SciPy + p = weight.shape[2] // 2 + t = torch.nn.functional.pad(t, (p, p)) + # We have already taken care of padding + kwargs.pop("padding") + + # second input is flipped in SciPy's convolve + weight_flipped = torch.flip(weight, (2,)) + actual = torch.nn.functional.conv1d(t, weight_flipped, **kwargs).squeeze(0) + if mode == 'same': + actual = actual[:feat_dim] + + self.assertEqual(actual, expected) + + # Global dtype for this test suite is torch.double + # This leads to change in type-promotion + # and conv1d outputs `complex128` for `complex64` input. + with set_default_dtype(torch.float): + _test(t, weight_even, mode) + _test(t, weight_odd, mode) + + @unittest.skipIf(not TEST_SCIPY, "Scipy required for the test.") + @dtypes(torch.float, torch.cfloat) + @parametrize_test("mode", ('valid', 'same')) + def test_conv2d_vs_scipy(self, device, dtype, mode): + t = make_tensor((1, 5, 10), device=device, dtype=dtype) + weight_even = make_tensor((1, 1, 2, 4), device=device, dtype=dtype) + weight_odd = make_tensor((1, 1, 3, 5), device=device, dtype=dtype) + + def _test(t, weight, mode): + # SciPy expects two 2-D inputs. + t_a = t.squeeze(0).cpu().numpy() + w_a = weight.squeeze(0).squeeze(0).cpu().numpy() + expected = scipy.signal.convolve2d(t_a, w_a, mode=mode) + + kwargs = {'padding': mode} + if mode == 'same': + # `same` padding in PyTorch conv2d is different + # from SciPy + left_right_pad = weight.shape[3] // 2 + top_bottom_pad = weight.shape[2] // 2 + p = (left_right_pad, left_right_pad, top_bottom_pad, top_bottom_pad) + t = torch.nn.functional.pad(t, p) + # We have already taken care of padding + kwargs.pop("padding") + + # second input is flipped in SciPy's convolve2d + weight_flipped = torch.flip(weight, (2, 3)) + actual = torch.nn.functional.conv2d(t, weight_flipped, **kwargs).squeeze(0) + if mode == 'same': + actual = actual[:5, :10] + + self.assertEqual(actual, expected, rtol=2e-5, atol=5e-6) + + # Global dtype for this test suite is torch.double + # This leads to change in type-promotion + # and conv1d outputs `complex128` for `complex64` input. + with set_default_dtype(torch.float): + _test(t, weight_even, mode) + _test(t, weight_odd, mode) + + @unittest.skipIf(not TEST_SCIPY, "Scipy required for the test.") + @dtypes(torch.float, torch.cfloat) + @parametrize_test("mode", ('valid', 'same')) + def test_conv3d_vs_scipy(self, device, dtype, mode): + t = make_tensor((1, 5, 5, 10), device=device, dtype=dtype) + weight_even = make_tensor((1, 1, 2, 2, 4), device=device, dtype=dtype) + weight_odd = make_tensor((1, 1, 2, 3, 5), device=device, dtype=dtype) + + def _test(t, weight, mode): + # SciPy expects two 3-D inputs. 
+ t_a = t.squeeze(0).cpu().numpy() + w_a = weight.squeeze(0).squeeze(0).cpu().numpy() + expected = scipy.signal.convolve(t_a, w_a, mode=mode) + + kwargs = {'padding': mode} + if mode == 'same': + # `same` padding in PyTorch conv3d is different + # from SciPy + left_right_pad = weight.shape[4] // 2 + top_bottom_pad = weight.shape[3] // 2 + front_back_pad = weight.shape[2] // 2 + p = (left_right_pad, left_right_pad, top_bottom_pad, top_bottom_pad, + front_back_pad, front_back_pad) + t = torch.nn.functional.pad(t, p) + # We have already taken care of padding + kwargs.pop("padding") + + # second input is flipped in SciPy's convolve + weight_flipped = torch.flip(weight, (2, 3, 4)) + actual = torch.nn.functional.conv3d(t, weight_flipped, **kwargs).squeeze(0) + if mode == 'same': + actual = actual[:5, :5, :10] + + self.assertEqual(actual, expected, rtol=2e-5, atol=5e-6) + + # Global dtype for this test suite is torch.double + # This leads to change in type-promotion + # and conv1d outputs `complex128` for `complex64` input. + with set_default_dtype(torch.float): + _test(t, weight_even, mode) + _test(t, weight_odd, mode) + + @dtypes(torch.float, torch.complex64) + def test_conv2d_valid_padding_backward(self, device, dtype): # Test F.conv2d gradients work with padding='valid' - x = torch.rand(1, 1, 1, 10, device=device, requires_grad=True) - y = torch.rand(1, 1, 1, 4, device=device, requires_grad=True) + x = torch.rand(1, 1, 1, 10, device=device, dtype=dtype, requires_grad=True) + y = torch.rand(1, 1, 1, 4, device=device, dtype=dtype, requires_grad=True) F.conv2d(x, y, padding=0).sum().backward() gx_expect, gy_expect = x.grad, y.grad x.grad, y.grad = None, None @@ -13592,12 +13985,13 @@ def test_conv2d_valid_padding_backward(self, device): self.assertEqual(gx_expect, gx_actual) self.assertEqual(gy_expect, gy_actual) - def test_conv3d_valid_padding_backward(self, device): + @dtypes(torch.double, torch.cdouble) + def test_conv3d_valid_padding_backward(self, device, dtype): check_forward_ad = torch.device(device).type != 'xla' # Test F.conv3d gradients work with padding='valid' - x = torch.rand(1, 1, 1, 1, 10, device=device, requires_grad=True) - y = torch.rand(1, 1, 1, 1, 4, device=device, requires_grad=True) + x = torch.rand(1, 1, 1, 1, 10, dtype=dtype, device=device, requires_grad=True) + y = torch.rand(1, 1, 1, 1, 4, dtype=dtype, device=device, requires_grad=True) F.conv3d(x, y, padding=0).sum().backward() gx_expect, gy_expect = x.grad, y.grad x.grad, y.grad = None, None @@ -13800,6 +14194,20 @@ def _make_noncontiguous(inp): if layout is torch._mkldnn: return + if backend_actual != torch._C._ConvBackend.Empty: # FIXME: forward AD fails + # Forward AD and forward-over-reverse AD smoke test in float32 + # TODO: remove this if we introduce per-op gradient tests for float32 + with fwAD.dual_level(): + dual_inputs = [(fwAD.make_dual(i, torch.rand_like(i)) if isinstance(i, torch.Tensor) else i) for i in inputs] + # Forward AD + output = convolution(*dual_inputs) + # Forward over reverse AD + grad_output_d = fwAD.make_dual(torch.rand_like(output), torch.rand_like(output)) + if has_bias: + torch.autograd.grad(output, [x, weight, bias], grad_output_d) + else: + torch.autograd.grad(output, [x, weight], grad_output_d) + # Convert to float64 for gradcheck. 
x = x.to(torch.float64).detach().requires_grad_(True) weight = weight.to(torch.float64).detach().requires_grad_(True) @@ -14623,26 +15031,27 @@ def test_BatchNorm_empty(self, device): self.assertEqual(mod.weight.grad, torch.tensor([0., 0, 0], device=device)) self.assertEqual(mod.bias.grad, torch.tensor([0., 0, 0], device=device)) - def test_conv_empty_channel(self, device): + @dtypes(torch.float, torch.cfloat) + def test_conv_empty_channel(self, device, dtype): in_channels = 0 - mod = torch.nn.Conv1d(in_channels, 8, 2, stride=2).to(device) - inp = torch.randn(2, 0, 15, device=device) + mod = torch.nn.Conv1d(in_channels, 8, 2, stride=2, dtype=dtype).to(device) + inp = torch.randn(2, 0, 15, device=device, dtype=dtype) self._test_module_empty_input(mod, inp, check_size=False) with self.assertRaisesRegex(RuntimeError, "Given groups=1, weight"): inp = torch.randn(2, 1, 0, device=device) mod(inp) - mod = torch.nn.Conv2d(in_channels, 33, 3, stride=2).to(device) - inp = torch.randn(2, 0, 50, 100, device=device) + mod = torch.nn.Conv2d(in_channels, 33, 3, stride=2, dtype=dtype).to(device) + inp = torch.randn(2, 0, 50, 100, device=device, dtype=dtype) self._test_module_empty_input(mod, inp, check_size=False) with self.assertRaisesRegex(RuntimeError, "Given groups=1, weight"): inp = torch.randn(2, 1, 40, 0, device=device) mod(inp) - mod = torch.nn.Conv3d(in_channels, 33, 3, stride=2).to(device) - inp = torch.randn(2, 0, 50, 20, 40, device=device) + mod = torch.nn.Conv3d(in_channels, 33, 3, stride=2, dtype=dtype).to(device) + inp = torch.randn(2, 0, 50, 20, 40, device=device, dtype=dtype) self._test_module_empty_input(mod, inp, check_size=False) with self.assertRaisesRegex(RuntimeError, "Given groups=1, weight"): @@ -14918,6 +15327,31 @@ def test_unequal_when_beta_is_greater_than_one(): test_unequal_when_beta_is_less_than_one() test_unequal_when_beta_is_greater_than_one() + @onlyCPU + def test_smooth_l1_loss_bfloat16(self, device): + def test_dtype(fn, input, target, dtype): + input = input.detach().clone().to(dtype=dtype).requires_grad_(True) + input2 = input.detach().clone().float().requires_grad_(True) + target = target.detach().clone().to(dtype=dtype) + target2 = target.detach().clone().float() + out = fn(input, target) + out.sum().backward() + out2 = fn(input2, target2) + out2.sum().backward() + self.assertEqual(out.dtype, dtype) + self.assertEqual(input.grad.dtype, dtype) + self.assertEqual(out, out2, exact_dtype=False) + self.assertEqual(input.grad, input2.grad, exact_dtype=False) + + def func(device): + return nn.SmoothL1Loss().to(device=device) + + shapes = [[1, 3, 1, 6], [1, 3, 1, 128], [1, 3, 128, 128]] + for shape in shapes: + x = torch.randn(shape, device=device, requires_grad=True) + t = torch.randn(shape, device=device) + test_dtype(func(device), x, t, torch.bfloat16) + # We don't want to make propagating NaN a hard requirement on ops, but for # these easy ones, we should make them do so. 
def test_nonlinearity_propagate_nan(self, device): @@ -15693,9 +16127,7 @@ def test_upsamplingBicubic2d(self, device, antialias, align_corners): # for scale_factor in [0.5, 1, 1.5, 2]: for scale_factor in [2, ]: in_t = torch.ones(2, 3, 8, 8, device=device) - print("dtype: ", in_t.dtype) out_t = F.interpolate(in_t, scale_factor=scale_factor, **kwargs) - print(out_t) out_size = int(math.floor(in_t.shape[-1] * scale_factor)) expected_out = torch.ones(2, 3, out_size, out_size, device=device) self.assertEqual(expected_out, out_t, atol=1e-5, rtol=0) @@ -16153,6 +16585,7 @@ def test_masked_softmax(self, device): mask = mask.cuda() mask = mask.reshape(B, 1, 1, L).expand(B, num_heads, L, L).bool() native_res = torch._masked_softmax(input, mask) + mask = ~mask mask = mask.float() def slow_masked_softmax(input, mask): @@ -16176,6 +16609,7 @@ def test_masked_softmax_transformer_layout(self, device): mask = mask.bool() native_res = torch._masked_softmax(input, mask) mask = mask.reshape(B, 1, 1, L).expand(B, num_heads, L, L) + mask = ~mask mask = mask.float() def slow_masked_softmax(input, mask): @@ -17059,7 +17493,7 @@ def test_embedding_bag_empty_input(self, device, dtypes): output = Embed(input=x, offsets=torch.tensor([0, 0], device=device, dtype=dtypes[1])) self.assertEqual(output, torch.zeros_like(output)) - @skipCUDAIf(True, "cuda assert is not recovarable.") + @skipCUDAIf(True, "no out-of-bounds check on CUDA for perf.") @dtypes(*itertools.product((torch.float, torch.double), (torch.int, torch.long))) @parametrize_test("padding_idx", [None, 0]) @parametrize_test("mode", ["sum", "mean", "max"]) @@ -17178,15 +17612,15 @@ def _embedding_bag_reference_impl(self, input, weight, offsets=None, mode='sum', bags.append(embeddings.narrow(0, offset, length).max(0)[0]) return torch.stack(bags) - @dtypesIfCUDA(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) - @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double))) + @skipMeta + @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.half, torch.float, torch.double))) def test_EmbeddingBag_empty_per_sample_weights_and_offsets(self, device, dtypes): # Test empty input and per sample weight, and backward pass. 
There was a CUDA # invalid configuration bug (more context in #46572) def test_per_sample_weights(mode, trainable_scale): es = nn.EmbeddingBag(5, 2, mode=mode).to(dtype=dtypes[2], device=device) es.weight.data.copy_( - torch.arange(1, 11, device=device, dtype=dtypes[2]).view_as(es.weight)) + torch.arange(1, 11, device=device).view_as(es.weight).to(dtypes[2])) input = torch.tensor([], device=device, dtype=dtypes[0]) offsets = torch.tensor([0, 0, 0, 0, 0], device=device, dtype=dtypes[1]) per_sample_weights = torch.randn_like(input, dtype=dtypes[2]) \ @@ -17217,13 +17651,13 @@ def test_per_sample_weights(mode, trainable_scale): for mode, trainable in itertools.product(modes, trainable_scale): test_per_sample_weights(mode, trainable) - @dtypesIfCUDA(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) - @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double))) + @skipMeta + @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) def test_EmbeddingBag_per_sample_weights_and_offsets(self, device, dtypes): def test_per_sample_weights(mode, trainable_scale): es = nn.EmbeddingBag(5, 2, mode=mode).to(dtype=dtypes[2], device=device) es.weight.data.copy_( - torch.arange(1, 11, device=device, dtype=dtypes[2]).view_as(es.weight)) + torch.arange(1, 11, device=device).view_as(es.weight).to(dtypes[2])) input = torch.tensor([3, 1, 1, 1, 4, 0], device=device, dtype=dtypes[0]) offsets = torch.tensor([0, 0, 3, 3, 6], device=device, dtype=dtypes[1]) per_sample_weights = torch.randn_like(input, dtype=dtypes[2]) \ @@ -17251,13 +17685,13 @@ def test_per_sample_weights(mode, trainable_scale): for mode, trainable in itertools.product(modes, trainable_scale): test_per_sample_weights(mode, trainable) - @dtypesIfCUDA(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) - @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double))) + @skipMeta + @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) def test_EmbeddingBag_per_sample_weights_and_new_offsets(self, device, dtypes): def test_per_sample_weights_new_offsets(mode, trainable_scale, include_last_offset, has_weight=True): es = nn.EmbeddingBag(5, 2, mode=mode, include_last_offset=include_last_offset).to(dtype=dtypes[2], device=device) es.weight.data.copy_( - torch.arange(1, 11, device=device, dtype=dtypes[2]).view_as(es.weight)) + torch.arange(1, 11, device=device).view_as(es.weight).to(dtypes[2])) input = torch.tensor([3, 1, 1, 1, 4, 0], device=device, dtype=dtypes[0]) offsets = torch.tensor([0, 0, 3, 3, 6], device=device, dtype=dtypes[1]) @@ -17413,7 +17847,7 @@ def _test_EmbeddingBag( ): # check a known test example es = nn.EmbeddingBag(5, 2, mode=mode, sparse=sparse).to(device, wdtype) - es.weight.data.copy_(torch.arange(1, 11, device=device, dtype=wdtype).view_as(es.weight)) + es.weight.data.copy_(torch.arange(1, 11, device=device).view_as(es.weight).to(wdtype)) input = torch.tensor([3, 1, 1, 1, 4, 0], device=device, dtype=dtype) offsets = torch.tensor([0, 0, 3, 3, 6], device=device, dtype=odtype) @@ -17516,8 +17950,8 @@ def _test_EmbeddingBag( offset[-1] = 100 self.assertRaises(RuntimeError, lambda: es(input.view(-1), offset)) - @dtypesIfCUDA(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, 
torch.double, torch.half))) - @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double))) + @skipMeta + @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) def test_embedding_bag_device(self, device, dtypes): self._test_EmbeddingBag(device, 'sum', False, wdtype=dtypes[2], dtype=dtypes[0], odtype=dtypes[1]) self._test_EmbeddingBag(device, 'mean', False, wdtype=dtypes[2], dtype=dtypes[0], odtype=dtypes[1]) @@ -17530,7 +17964,7 @@ def test_embedding_bag_device(self, device, dtypes): elif self.device_type == 'cpu': # TODO: figure out why precision on sparse embeddings isn't the # same as for dense. - test_backward = dtypes[2] is not torch.float + test_backward = dtypes[2] is not torch.float and dtypes[2] is not torch.float16 self._test_EmbeddingBag( device, @@ -17551,8 +17985,8 @@ def test_embedding_bag_device(self, device, dtypes): test_backward=test_backward, ) - @dtypesIfCUDA(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) - @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double))) + @skipMeta + @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long), (torch.float, torch.double, torch.half))) def test_embedding_bag_non_contiguous_weight(self, device, dtypes): weight_tensor = torch.randn(3, 4, dtype=dtypes[2], device=device) @@ -17582,6 +18016,11 @@ def test_embedding_bag_bfloat16(self, device, dtypes): self._test_EmbeddingBag(device, 'sum', True, wdtype=torch.bfloat16, dtype=dtypes[0], odtype=dtypes[1], test_backward=True) self._test_EmbeddingBag(device, 'mean', True, wdtype=torch.bfloat16, dtype=dtypes[0], odtype=dtypes[1], test_backward=True) + @onlyNativeDeviceTypes # currently fails on XLA + @dtypes(*itertools.product((torch.int, torch.long), (torch.int, torch.long))) + def test_embedding_bag_half(self, device, dtypes): + self._test_EmbeddingBag(device, 'sum', True, wdtype=torch.float16, dtype=dtypes[0], odtype=dtypes[1], test_backward=True) + @onlyCUDA @dtypes(torch.half, torch.float, torch.double) def test_multihead_attention_dtype(self, device, dtype): @@ -17597,7 +18036,7 @@ def test_multihead_attention_dtype(self, device, dtype): self.assertEqual(q.size(), out[0].size()) self.assertEqual(dtype, out[0].dtype) - @dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) + @dtypesIfCUDA(*floating_types_and(torch.half, *[torch.bfloat16] if AMPERE_OR_ROCM else [])) @dtypes(torch.float) def test_Conv2d_naive_groups(self, device, dtype): # Check that grouped convolutions matches two half convolutions @@ -17632,7 +18071,7 @@ def test_Conv2d_naive_groups(self, device, dtype): torch.cat([m1.weight.grad.data, m2.weight.grad.data], 0), atol=dtype2prec_DONTUSE[dtype], rtol=0) - @dtypes(torch.double) + @dtypes(torch.double, torch.cdouble) def test_Conv2d_backward_depthwise(self, device, dtype): x = torch.randn(2, 2, 4, 20, device=device, dtype=dtype, requires_grad=True) weight = torch.randn(2, 1, 3, 5, device=device, dtype=dtype, requires_grad=True) @@ -17965,37 +18404,37 @@ def expected_output(dim): self.assertEqual(output[0, 0, 0, 0], float("-inf")) self.assertEqual(indices[0, 0, 0, 0], 0) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_MaxPool1d_indices(self, device, dtype): self._test_maxpool_indices(1, device=device, dtype=dtype) - 
@dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_MaxPool2d_indices(self, device, dtype): self._test_maxpool_indices(2, device=device, dtype=dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_MaxPool3d_indices(self, device, dtype): self._test_maxpool_indices(3, device=device, dtype=dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_AdaptiveMaxPool1d_indices(self, device, dtype): self._test_maxpool_indices(1, adaptive=True, device=device, dtype=dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_AdaptiveMaxPool2d_indices(self, device, dtype): self._test_maxpool_indices(2, adaptive=True, device=device, dtype=dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_AdaptiveMaxPool3d_indices(self, device, dtype): self._test_maxpool_indices(3, adaptive=True, device=device, dtype=dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_maxpool_indices_no_batch_dim(self, device, dtype): """Check that indices with no batch dim is consistent with a single batch.""" @@ -18160,7 +18599,7 @@ def test_pooling_zero_stride(self, device): self.assertRaisesRegex(RuntimeError, r"stride should not be zero|stride must be greater than zero", lambda: fn_module(x)) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_pool_large_size(self, device, dtype): for op in ('max', 'avg'): @@ -18174,7 +18613,7 @@ def test_pool_large_size(self, device, dtype): # check if the output shape was still computed correctly self.assertEqual(x.shape[2], res.shape[2]) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float) def test_pool_invalid_size(self, device, dtype): for op in ('max', 'avg'): @@ -18418,6 +18857,35 @@ def test_multi_margin_loss_errors(self, device): lambda: nn.functional.multi_margin_loss(torch.randn(5, device=device), torch.zeros(3, device=device))) + @onlyCPU + def test_activations_bfloat16_cpu(self, device): + def test_bfloat16(fn, device, inp_dims, prec): + # bfloat16 compute + input = torch.randn(inp_dims, dtype=torch.bfloat16, device=device, requires_grad=True) + out = fn(input) + grad_input = torch.randn_like(out, dtype=torch.bfloat16, device=device) + out.backward(grad_input) + + # fp32 compute + input2 = input.detach().clone().float().requires_grad_(True) + out2 = fn(input2) + grad_input2 = grad_input.detach().clone().float() + out2.backward(grad_input2) + + self.assertEqual(out.dtype, torch.bfloat16) + self.assertEqual(input.grad.dtype, torch.bfloat16) + self.assertEqual(out, out2, atol=prec, rtol=0, exact_dtype=False) + self.assertEqual(input.grad.data, input2.grad.data, atol=prec, rtol=0, exact_dtype=False) + + shapes = [[1, 3, 1, 6], [1, 3, 1, 128], [1, 3, 256, 256]] + for shape in shapes: + test_bfloat16(torch.nn.LogSigmoid(), device, shape, prec=2e-2) + test_bfloat16(torch.nn.Hardsigmoid(), device, shape, prec=1e-2) + test_bfloat16(torch.nn.Hardshrink(), device, shape, prec=1e-2) + test_bfloat16(torch.nn.Softshrink(), device, shape, prec=1e-2) + 
test_bfloat16(torch.nn.Hardswish(), device, shape, prec=2e-2) + test_bfloat16(torch.nn.Softplus(), device, shape, prec=1e-2) + def _test_bfloat16_ops(self, op, device, inp_dims=(), prec=1e-2, scale_factor=None): # fp32 compute input1 = torch.randn(inp_dims, dtype=torch.float32, device=device, requires_grad=True) @@ -18467,7 +18935,7 @@ def test_softmax_bfloat16(self, device): @skipCUDAIfRocmVersionLessThan((4, 3)) @skipCUDAIfNotMiopenSuggestNHWC @skipCUDAIfCudnnVersionLessThan(7603) - @dtypes(torch.half, torch.float) + @dtypes(torch.half, torch.float, torch.cfloat) def test_conv_cudnn_nhwc(self, device, dtype): def helper(n, c, h, w, out_channels, kernel_size, groups): input = torch.randint(-3, 3, (n, c, h, w), dtype=dtype, device=device)\ @@ -19350,6 +19818,32 @@ def test_leaky_relu_inplace_with_zero_slope(self, device): expected_bf16 = torch.tensor([0., 0., 1.], device=device, dtype=torch.bfloat16) self.assertEqual(a_bf16.grad, expected_bf16) + @onlyCPU + def test_softshrink(self, device): + x = torch.tensor([[1.21, 0.56, 0.5001, 0.4999, 1.2357, -0.4999, -0.5001, -1.154, + 0.254, -0.24, -0.225, 0.104, 0.002, -0.001, 0.0574, 1.2344, + 0.1748, -0.1797, -0.8125, 0.2051, -1.1328, 1.2344, -0.1562, 2.3554, + -0.1953, 0.0304, -0.3613, -1.3047, 1.0312, 0.1436, -0.6953, 0.5664, + -0.5820, -0.3301, 0.8203, 0.6133, 0.5938], + [-0.8203, -1.2344, -0.5234, 2.5312, -0.4551, -0.6875, -1.5547, -0.2217, + -0.3027, 2.6406, 1.3047, 0.2344, -1.6719, 0.2773, -1.3516, 3.4575, + 0.4414, 0.2656, 2.1094, -1.5156, 1.2344, -0.4336, 0.6797, -3.5486, + 0.9766, -0.4062, 1.4844, 0.7500, -1.7578, 0.7461, 1.6094, 8.5458, + 0.3730, -0.3477, -1.0625, 0.3848, 0.0557]], device=device) + expected = torch.tensor([[0.71, 0.06, 0.0001, 0., 0.7357, 0., -0.0001, -0.654, + 0., 0., 0., 0., 0., 0., 0., 0.7344, + 0., 0., -0.3125, 0., -0.6328, 0.7344, 0., 1.8554, + 0., 0., 0., -0.8047, 0.5312, 0., -0.1953, 0.0664, + -0.0820, 0.0, 0.3203, 0.1133, 0.0938], + [-0.3203, -0.7344, -0.0234, 2.0312, 0.0, -0.1875, -1.0547, 0., + 0.0, 2.1406, 0.8047, 0., -1.1719, 0., -0.8516, 2.9575, + 0., 0., 1.6094, -1.0156, 0.7344, 0., 0.1797, -3.0486, + 0.4766, 0., 0.9844, 0.2500, -1.2578, 0.2461, 1.1094, 8.0458, + 0., 0., -0.5625, 0., 0.]]) + softshrink = torch.nn.Softshrink() + out = softshrink(x) + self.assertEqual(out, expected, atol=1e-2, rtol=0) + def test_threshold_inplace_overlap(self, device): # Inplace threshold is okay, because it is idempotent x = torch.randn((1, 6), device=device).expand((6, 6)) diff --git a/test/test_numpy_interop.py b/test/test_numpy_interop.py index 2c1395a19ac8ea..96c1016c2dbb3d 100644 --- a/test/test_numpy_interop.py +++ b/test/test_numpy_interop.py @@ -9,7 +9,7 @@ (TestCase, run_tests) from torch.testing._internal.common_device_type import \ (instantiate_device_type_tests, onlyCPU, dtypes, skipMeta) -from torch.testing._internal.common_dtype import get_all_dtypes +from torch.testing._internal.common_dtype import all_types_and_complex_and # For testing handling NumPy objects and sending tensors to / accepting # arrays from NumPy. 
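
The import change above is part of a pattern repeated throughout this diff: the broad get_all_dtypes()/get_all_fp_dtypes() helpers are swapped for explicit constructors such as all_types_and_complex_and() and floating_types_and(), so each test spells out exactly which extra dtypes it opts into. A minimal usage sketch, assuming the torch.testing._internal.common_dtype module from this tree:

```python
import torch
from torch.testing._internal.common_dtype import (
    all_types_and_complex_and,
    floating_types_and,
)

# Extra dtypes (half, bfloat16, bool) are now opted into explicitly
# rather than being returned implicitly by a catch-all helper.
full_set = all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)
fp_set = floating_types_and(torch.half, torch.bfloat16)

print(torch.cfloat in full_set)   # True: complex types are part of the base set
print(torch.bfloat16 in fp_set)   # True: only because it was passed explicitly
```
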
@@ -234,6 +234,28 @@ def test_from_list_of_ndarray_warning(self, device): with self.assertWarnsOnceRegex(UserWarning, warning_msg): torch.tensor([np.array([0]), np.array([1])], device=device) + def test_ctor_with_invalid_numpy_array_sequence(self, device): + # Invalid list of numpy array + with self.assertRaisesRegex(ValueError, "expected sequence of length"): + torch.tensor([np.random.random(size=(3, 3)), np.random.random(size=(3, 0))], device=device) + + # Invalid list of list of numpy array + with self.assertRaisesRegex(ValueError, "expected sequence of length"): + torch.tensor([[np.random.random(size=(3, 3)), np.random.random(size=(3, 2))]], device=device) + + with self.assertRaisesRegex(ValueError, "expected sequence of length"): + torch.tensor([[np.random.random(size=(3, 3)), np.random.random(size=(3, 3))], + [np.random.random(size=(3, 3)), np.random.random(size=(3, 2))]], device=device) + + # expected shape is `[1, 2, 3]`, hence we try to iterate over 0-D array + # leading to type error : not a sequence. + with self.assertRaisesRegex(TypeError, "not a sequence"): + torch.tensor([[np.random.random(size=(3)), np.random.random()]], device=device) + + # list of list or numpy array. + with self.assertRaisesRegex(ValueError, "expected sequence of length"): + torch.tensor([[1, 2, 3], np.random.random(size=(2,)), ], device=device) + @onlyCPU def test_ctor_with_numpy_scalar_ctor(self, device) -> None: dtypes = [ @@ -397,7 +419,7 @@ def test_has_storage_numpy(self, device): self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.long).storage()) self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.uint8).storage()) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_numpy_scalar_cmp(self, device, dtype): if dtype.is_complex: tensors = (torch.tensor(complex(1, 3), dtype=dtype, device=device), diff --git a/test/test_ops.py b/test/test_ops.py index 5791f93fb53aca..cbf6629862547b 100644 --- a/test/test_ops.py +++ b/test/test_ops.py @@ -1,30 +1,29 @@ -# Owner(s): ["high priority"] +# Owner(s): ["module: unknown"] from collections.abc import Sequence -from functools import partial, wraps +from functools import partial import warnings import unittest import itertools - import torch -from torch.testing import FileCheck, make_tensor -from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes +from torch.testing import make_tensor +from torch.testing._internal.common_dtype import floating_and_complex_types_and, all_types_and_complex_and from torch.testing._internal.common_utils import \ (TestCase, is_iterable_of_tensors, run_tests, IS_SANDCASTLE, clone_input_helper, - gradcheck, gradgradcheck, IS_IN_CI, suppress_warnings, noncontiguous_like, + IS_IN_CI, suppress_warnings, noncontiguous_like, TEST_WITH_ASAN, IS_WINDOWS, IS_FBCODE, first_sample) from torch.testing._internal.common_methods_invocations import \ (op_db, _NOTHING, UnaryUfuncInfo, ReductionOpInfo, SpectralFuncInfo) from torch.testing._internal.common_device_type import \ (deviceCountAtLeast, instantiate_device_type_tests, ops, onlyCPU, onlyCUDA, onlyNativeDeviceTypes, OpDTypes, skipMeta) -from torch.testing._internal.common_jit import JitCommonTestCase, check_against_reference -from torch.testing._internal.jit_metaprogramming_utils import create_script_fn, create_traced_fn, \ - check_alias_annotation -from torch.testing._internal.jit_utils import disable_autodiff_subgraph_inlining, is_lambda + + import 
torch.testing._internal.opinfo_helper as opinfo_helper -from torch.testing._internal.composite_compliance import _check_composite_compliance +from torch.testing._internal import composite_compliance + +TEST_ROCM = torch.cuda.is_available() and torch.version.hip is not None # TODO: fixme https://github.com/pytorch/pytorch/issues/68972 torch.set_default_dtype(torch.float32) @@ -68,8 +67,13 @@ def tearDownClass(cls): @onlyNativeDeviceTypes @ops(op_db, dtypes=OpDTypes.none) def test_dtypes(self, device, op): + # Check complex32 support only if the op claims. + # TODO: Once the complex32 support is better, we should add check for complex32 unconditionally. + include_complex32 = ((torch.complex32,) if op.supports_dtype(torch.complex32, device) else ()) + # dtypes to try to backward in - allowed_backward_dtypes = floating_and_complex_types_and(torch.bfloat16, torch.float16) + allowed_backward_dtypes = floating_and_complex_types_and( + *((torch.half, torch.bfloat16) + include_complex32)) # lists for (un)supported dtypes supported_dtypes = [] @@ -82,7 +86,8 @@ def unsupported(dtype): if dtype in allowed_backward_dtypes: unsupported_backward_dtypes.append(dtype) - for dtype in get_all_dtypes(): + for dtype in all_types_and_complex_and( + *((torch.half, torch.bfloat16, torch.bool) + include_complex32)): # tries to acquire samples - failure indicates lack of support requires_grad = (dtype in allowed_backward_dtypes and op.supports_autograd) try: @@ -204,6 +209,7 @@ def test_multiple_devices(self, devices, dtype, op): # This test runs in double and complex double precision because # NumPy does computation internally using double precision for many functions # resulting in possible equality check failures. + @unittest.skipIf(TEST_WITH_ASAN, "Skipped under ASAN") @onlyNativeDeviceTypes @suppress_warnings @ops(_ref_test_ops, allowed_dtypes=(torch.float64, torch.long, torch.complex128)) @@ -212,8 +218,8 @@ def test_reference_testing(self, device, dtype, op): # Sets the default dtype to NumPy's default dtype of double cur_default = torch.get_default_dtype() torch.set_default_dtype(torch.double) - sample_inputs = op.sample_inputs(device, dtype) - for sample_input in sample_inputs: + reference_inputs = op.reference_inputs(device, dtype) + for sample_input in reference_inputs: self.compare_with_reference(op, op.ref, sample_input, exact_dtype=(dtype is not torch.long)) finally: torch.set_default_dtype(cur_default) @@ -680,24 +686,6 @@ def _test_inplace_preserve_storage(samples, variants): inplace_samples = list(filter(lambda sample: not sample.broadcasts_input, samples)) _test_inplace_preserve_storage(inplace_samples, inplace_variants) - # Checks if the operator (if it is composite) is written to support most - # backends and Tensor subclasses. See "CompositeImplicitAutograd Compliance" - # in aten/src/ATen/native/README.md for more details - # - # NB: onlyCPU because CompositeImplicitAutograd ops go through the same - # codepath on all devices. Ideally we'd use a meta device here but coverage - # for that is not good yet. 
- @unittest.skipIf(IS_FBCODE or IS_SANDCASTLE, '__torch_dispatch__ does not work in fbcode') - @onlyCPU - @ops(op_db, allowed_dtypes=(torch.float,)) - def test_composite_compliance(self, device, dtype, op): - samples = op.sample_inputs(device, dtype, requires_grad=False) - - for sample in samples: - args = [sample.input] + list(sample.args) - kwargs = sample.kwargs - _check_composite_compliance(op, args, kwargs) - @onlyCPU @ops(op_db, allowed_dtypes=(torch.float,)) def test_floating_inputs_are_differentiable(self, device, dtype, op): @@ -722,465 +710,48 @@ def check_tensor_floating_is_differentiable(t): for arg in sample.kwargs.values(): check_tensor_floating_is_differentiable(arg) + # Reference testing for operations in complex32 against complex64. + # NOTE: We test against complex64 as NumPy doesn't have a complex32 equivalent dtype. + @ops(op_db, allowed_dtypes=(torch.complex32,)) + def test_complex_half_reference_testing(self, device, dtype, op): + if not op.supports_dtype(torch.complex32, device): + unittest.skip("Does not support complex32") -# gradcheck requires double precision -_gradcheck_ops = partial(ops, dtypes=OpDTypes.supported, - allowed_dtypes=[torch.double, torch.cdouble]) - - -class TestGradients(TestCase): - exact_dtype = True - - # Copies inputs to inplace operations to avoid inplace modifications - # to leaves requiring gradient - def _get_safe_inplace(self, inplace_variant): - @wraps(inplace_variant) - def _fn(t, *args, **kwargs): - return inplace_variant(t.clone(), *args, **kwargs) - - return _fn - - def _check_helper(self, device, dtype, op, variant, check, *, check_forward_ad=False, check_backward_ad=True, - check_batched_grad=None, check_batched_forward_grad=False): - assert check in ('gradcheck', 'bwgrad_bwgrad', 'fwgrad_bwgrad') - # NB: check_backward_ad does not affect gradgradcheck (always True) - if variant is None: - self.skipTest("Skipped! Variant not implemented.") - if not op.supports_dtype(dtype, torch.device(device).type): - self.skipTest(f"Skipped! {op.name} does not support dtype {str(dtype)}") - - def is_inplace(variant): - if hasattr(variant, "__wrapped__"): - return variant.__wrapped__ is op.get_inplace() - return variant is op.get_inplace() + for sample in op.sample_inputs(device, dtype): + actual = op(sample.input, *sample.args, **sample.kwargs) + (inp, args, kwargs) = sample.transform(lambda x: x.to(torch.complex64)) + expected = op(inp, *args, **kwargs) + self.assertEqual(actual, expected, exact_dtype=False) - include_conjugated_inputs = op.test_conjugated_samples and dtype.is_complex - samples = op.sample_inputs(device, dtype, requires_grad=True, include_conjugated_inputs=include_conjugated_inputs) +class TestCompositeCompliance(TestCase): + # Checks if the operator (if it is composite) is written to support most + # backends and Tensor subclasses. 
See "CompositeImplicitAutograd Compliance" + # in aten/src/ATen/native/README.md for more details + @unittest.skipIf(IS_FBCODE or IS_SANDCASTLE, '__torch_dispatch__ does not work in fbcode') + @ops(op_db, allowed_dtypes=(torch.float,)) + def test_operator(self, device, dtype, op): + samples = op.sample_inputs(device, dtype, requires_grad=False) for sample in samples: - if sample.broadcasts_input and is_inplace(variant): - continue + args = [sample.input] + list(sample.args) + kwargs = sample.kwargs + composite_compliance.check_with_mode(op, args, kwargs) + composite_compliance.check_all_permutations(op, args, kwargs) - # Note on TensorList inputs - # - # gradcheck does not support TensorList inputs so here we pass TensorList - # inputs of size n as n single Tensor inputs to gradcheck and wrap the op - # in a function that puts the n Tensor inputs back into a TensorList - def fn(*inputs): - # Put tensors back into TensorList since we splat them when passing to gradcheck - if is_iterable_of_tensors(sample.input): - n = len(sample.input) - inputs = (inputs[:n], *inputs[n:]) - output = op.gradcheck_wrapper(variant, *inputs, **sample.kwargs) - if sample.output_process_fn_grad is not None: - return sample.output_process_fn_grad(output) - return output - - # Splat TensorList inputs into single Tensor inputs - gradcheck_args = (sample.input,) if isinstance(sample.input, torch.Tensor) else tuple(sample.input) - gradcheck_args += sample.args - - if check == 'gradcheck': - if check_batched_grad is None: - check_batched_grad = op.check_batched_grad - self.assertTrue(gradcheck(fn, gradcheck_args, - check_batched_grad=check_batched_grad, - check_grad_dtypes=True, - nondet_tol=op.gradcheck_nondet_tol, - fast_mode=op.gradcheck_fast_mode, - check_forward_ad=check_forward_ad, - check_backward_ad=check_backward_ad, - check_undefined_grad=True, - check_batched_forward_grad=check_batched_forward_grad)) - elif check in ('bwgrad_bwgrad', 'fwgrad_bwgrad'): # gradgrad check - self.assertFalse(check_forward_ad, msg="Cannot run forward AD check for gradgradcheck") - for gen_non_contig_grad_outputs in (False, True): - kwargs = { - "gen_non_contig_grad_outputs": gen_non_contig_grad_outputs, - "check_batched_grad": op.check_batched_gradgrad, - "check_grad_dtypes": True, - "nondet_tol": op.gradcheck_nondet_tol, - "fast_mode": op.gradcheck_fast_mode - } - if check == "fwgrad_bwgrad": - kwargs["check_fwd_over_rev"] = True - kwargs["check_rev_over_rev"] = False - kwargs["check_batched_grad"] = False - kwargs["check_undefined_grad"] = False - - self.assertTrue(gradgradcheck(fn, gradcheck_args, **kwargs)) - else: - self.assertTrue(False, msg="Unknown check requested!") - - def _grad_test_helper(self, device, dtype, op, variant, *, check_forward_ad=False, check_backward_ad=True, - check_batched_grad=None, check_batched_forward_grad=False): - return self._check_helper(device, dtype, op, variant, 'gradcheck', check_forward_ad=check_forward_ad, - check_backward_ad=check_backward_ad, check_batched_grad=check_batched_grad, - check_batched_forward_grad=check_batched_forward_grad) - - def _skip_helper(self, op, device, dtype): - if not op.supports_autograd and not op.supports_forward_ad: - self.skipTest("Skipped! autograd not supported.") - if not op.supports_complex_autograd(torch.device(device).type) and dtype.is_complex: - self.skipTest("Skipped! 
Complex autograd not supported.") - - # Tests that gradients are computed correctly - @_gradcheck_ops(op_db) - def test_fn_grad(self, device, dtype, op): - self._skip_helper(op, device, dtype) - self._grad_test_helper(device, dtype, op, op.get_op()) - - # Method grad (and gradgrad, see below) tests are disabled since they're - # costly and redundant with function grad (and gradgad) tests - # @_gradcheck_ops(op_db) - # def test_method_grad(self, device, dtype, op): - # self._skip_helper(op, device, dtype) - # self._grad_test_helper(device, dtype, op, op.get_method()) - - @_gradcheck_ops(op_db) - def test_inplace_grad(self, device, dtype, op): - self._skip_helper(op, device, dtype) - if not op.inplace_variant or not op.supports_inplace_autograd: - self.skipTest("Skipped! Operation does not support inplace autograd.") - self._grad_test_helper(device, dtype, op, self._get_safe_inplace(op.get_inplace())) - - # Test that gradients of gradients are computed correctly - @_gradcheck_ops(op_db) - def test_fn_gradgrad(self, device, dtype, op): - self._skip_helper(op, device, dtype) - if not op.supports_gradgrad: - self.skipTest("Skipped! Operation does not support gradgrad") - self._check_helper(device, dtype, op, op.get_op(), 'bwgrad_bwgrad') - - # Test that forward-over-reverse gradgrad is computed correctly - @_gradcheck_ops(op_db) - def test_fn_fwgrad_bwgrad(self, device, dtype, op): - self._skip_helper(op, device, dtype) - - if op.supports_fwgrad_bwgrad: - self._check_helper(device, dtype, op, op.get_op(), "fwgrad_bwgrad") - else: - err_msg = r"Trying to use forward AD with .* that does not support it\." - hint_msg = ("Running forward-over-backward gradgrad for an OP that has does not support it did not " - "raise any error. If your op supports forward AD, you should set supports_fwgrad_bwgrad=True.") - with self.assertRaisesRegex(NotImplementedError, err_msg, msg=hint_msg): - self._check_helper(device, dtype, op, op.get_op(), "fwgrad_bwgrad") - - # Test that gradients of gradients are properly raising - @_gradcheck_ops(op_db) - def test_fn_fail_gradgrad(self, device, dtype, op): - self._skip_helper(op, device, dtype) - if op.supports_gradgrad: - self.skipTest("Skipped! Operation does support gradgrad") - - err_msg = r"derivative for .* is not implemented" - with self.assertRaisesRegex(RuntimeError, err_msg): - self._check_helper(device, dtype, op, op.get_op(), 'bwgrad_bwgrad') - - # Method gradgrad (and grad, see above) tests are disabled since they're - # costly and redundant with function gradgrad (and grad) tests - # @_gradcheck_ops(op_db) - # def test_method_gradgrad(self, device, dtype, op): - # self._skip_helper(op, device, dtype) - # self._gradgrad_test_helper(device, dtype, op, op.get_method()) - - @_gradcheck_ops(op_db) - def test_inplace_gradgrad(self, device, dtype, op): - self._skip_helper(op, device, dtype) - if not op.inplace_variant or not op.supports_inplace_autograd: - self.skipTest("Skipped! 
Operation does not support inplace autograd.") - self._check_helper(device, dtype, op, self._get_safe_inplace(op.get_inplace()), "bwgrad_bwgrad") - - def _forward_grad_helper(self, device, dtype, op, variant, is_inplace): - # TODO: clean up how attributes are passed to gradcheck from OpInfos - def call_grad_test_helper(): - check_batched_forward_grad = ((op.check_batched_forward_grad and not is_inplace) or - (op.check_inplace_batched_forward_grad and is_inplace)) - self._grad_test_helper(device, dtype, op, variant, check_forward_ad=True, check_backward_ad=False, - check_batched_grad=False, check_batched_forward_grad=check_batched_forward_grad) - if op.supports_forward_ad: - call_grad_test_helper() - else: - err_msg = r"Trying to use forward AD with .* that does not support it\." - hint_msg = ("Running forward AD for an OP that has does not support it did not " - "raise any error. If your op supports forward AD, you should set supports_forward_ad=True") - with self.assertRaisesRegex(NotImplementedError, err_msg, msg=hint_msg): - call_grad_test_helper() - - @_gradcheck_ops(op_db) - def test_forward_mode_AD(self, device, dtype, op): - self._skip_helper(op, device, dtype) - - self._forward_grad_helper(device, dtype, op, op.get_op(), is_inplace=False) - - @_gradcheck_ops(op_db) - def test_inplace_forward_mode_AD(self, device, dtype, op): - self._skip_helper(op, device, dtype) - - if not op.inplace_variant or not op.supports_inplace_autograd: - self.skipTest("Skipped! Operation does not support inplace autograd.") - - self._forward_grad_helper(device, dtype, op, self._get_safe_inplace(op.get_inplace()), is_inplace=True) - - # Functions that do not support autograd should not fail in forward mode - # Inplace functions (such as "resize_") are expected to fail in forward mode and should be skipped - # Test only when supports_autograd=False and for double dtype - @ops(filter(lambda op: not op.supports_autograd, op_db), dtypes=OpDTypes.supported, allowed_dtypes=(torch.double,)) - def test_nondifferentiable(self, device, dtype, op): - # Expecting no errors + # There are some weird unexpected successe here that imply rocm goes down + # a different path than CUDA sometimes. There's not an easy way to describe + # this in OpInfo so we're just going to skip all ROCM tests... + @unittest.skipIf(TEST_ROCM, "The CUDA tests give sufficient signal") + @unittest.skipIf(IS_FBCODE or IS_SANDCASTLE, '__torch_dispatch__ does not work in fbcode') + @ops([op for op in op_db if op.supports_autograd], allowed_dtypes=(torch.float,)) + def test_backward(self, device, dtype, op): samples = op.sample_inputs(device, dtype, requires_grad=True) - sample = first_sample(self, samples) - result = op(sample.input, *sample.args, **sample.kwargs) - - -# Tests operators for consistency between JIT and eager, also checks -# correctness of JIT specific alias schemas and intended -# autodifferentiation behavior. -# Inherits from JitCommonTestCase instead of TestCase directly to share -# functionality with original test_jit.py method operator tests -class TestJit(JitCommonTestCase): - exact_dtype = True - # Tests that the forward and backward passes of operations produce the - # same values for the cross-product of op variants (function, method, inplace) - # and runtimes (eager, traced, scripted). 
- # TODO WARNING: inplace x {traced, scripted} not currently tested - @_variant_ops(op_db) - def test_variant_consistency_jit(self, device, dtype, op): - _requires_grad = op.supports_autograd and (dtype.is_floating_point or - op.supports_complex_autograd(torch.device(device).type)) - - include_conjugated_inputs = op.test_conjugated_samples and dtype.is_complex - samples = op.sample_inputs(device, dtype, requires_grad=_requires_grad, include_conjugated_inputs=include_conjugated_inputs) - - # Acquires variants to test - func = op.get_op() - method = op.get_method() - variants = { - # TODO: inplace tests currently fail, fix and add inplace variant - 'function': func, 'method': method, - } - - # TODO: find better way to standardize on op registration itself.. - has_fake_function = op.name in ["resize_", 'resize_as_'] - - if has_fake_function: - variants = {'method': getattr(torch.Tensor, op.name)} - samples = op.sample_inputs(device, dtype, requires_grad=False) - - support_script = op.supports_scripting - - tested = False for sample in samples: - # Test traced and scripted consistency - for func_type, variant in variants.items(): - if variant is None: - continue - - # scripting and check_alias_analysis do not work with lambdas - # lambdas are typically used as a way to simulate methods without - # functional variants, so rely on the other variant for testing - # for now - if is_lambda(variant): - continue - - tested = True - - # Create accessor for script function variant - name = op.name + '_' if func_type == 'inplace' else op.name - - # run with disable_autodiff_subgraph_inlining(True) to test - # autodiff support. Context manager forces the graph to contain - # DifferentiableGraph nodes if they are present - with disable_autodiff_subgraph_inlining(): - # Check scripted forward, grad, and grad grad - if support_script: - script_fn = create_script_fn(self, name, func_type) - - def out_fn(output): - # Processes the output for autograd - if sample.output_process_fn_grad is not None: - return sample.output_process_fn_grad(output) - return output - - def get_sample(): - return clone_input_helper(sample.input) if op.name[-1] == '_' else sample.input - - if support_script: - check_against_reference(self, - script_fn, - func, - out_fn, - (get_sample(),) + sample.args, - sample.kwargs, - no_grad=not _requires_grad, no_gradgrad=not op.supports_gradgrad) - - # Check traced forward, grad, and grad grad - # TODO: fix tracing here - supports_tracing = not has_fake_function - if op.assert_jit_shape_analysis: - self.assertTrue(supports_tracing) - - if supports_tracing: - traced_fn = create_traced_fn(self, variant) - check_against_reference(self, - traced_fn, - func, - out_fn, - (get_sample(),) + sample.args, - sample.kwargs, - no_grad=not _requires_grad, no_gradgrad=not op.supports_gradgrad) - - # Check alias annotation schema for correctness (make - # sure inputs that aren't supposed to be modified aren't) - # Note: only runs in float32 because schema isn't affected by dtype, - # so running it on all dtypes is would be excessive - if dtype == torch.float32: - # TODO: no reason why we cant run this with tracing graph - if support_script and op.name != "rsub": - check_alias_annotation(name, (get_sample(),) + sample.args, sample.kwargs, - func_type=func_type, aten_name=op.aten_name) - - # TODO: use script graph as well - checked_shape_analysis = False - if supports_tracing: - out = variant(get_sample(), *sample.args, **sample.kwargs) - - # right now, tuple of outputs and tensor output supported - # TODO: list 
of tensor outputs - tuple_of_tensors = isinstance(out, tuple) and all([isinstance(elem, torch.Tensor) for elem in out]) - - if isinstance(out, torch.Tensor) or tuple_of_tensors: - if tuple_of_tensors: - sizes = [elem.size() for elem in out] - else: - sizes = out.size() - self.checkShapeAnalysis(sizes, traced_fn.graph, op.assert_jit_shape_analysis) - checked_shape_analysis = True - if op.assert_jit_shape_analysis: - self.assertTrue(checked_shape_analysis) - - # Check autodifferentiation of nodes for traced and scripted graphs, only need to check once per sample - if dtype is torch.float32: - # Sandcastle doesn't fuse nodes - if IS_SANDCASTLE: - # fusible nodes are expected to be found in FusionGroups in the DifferentiableGraphs - nonfusible_nodes = op.autodiff_nonfusible_nodes + op.autodiff_fusible_nodes - fusible_nodes = [] - else: - nonfusible_nodes = op.autodiff_nonfusible_nodes - fusible_nodes = op.autodiff_fusible_nodes - - if supports_tracing: - self.assertAutodiffNode(traced_fn.last_graph, op.assert_autodiffed, nonfusible_nodes, fusible_nodes) - if support_script: - self.assertAutodiffNode(script_fn.last_graph, op.assert_autodiffed, nonfusible_nodes, fusible_nodes) - assert tested, "JIT Test does not execute any logic" - - # alias testing is only done with torch.float for the same reason - _alias_ops = partial(ops, dtypes=OpDTypes.supported, - allowed_dtypes=(torch.float,)) - - @_alias_ops((op for op in op_db if op.aliases)) - def test_jit_alias_remapping(self, device, dtype, op): - # Required to avoid undefined value: tensor error in JIT compilation of the function template - tensor = torch.tensor - - # NOTE: only tests on first sample - samples = op.sample_inputs(device, dtype, requires_grad=True) - sample = first_sample(self, samples) - - # [Scripting Data Preparation] - # Prepare data for test scripting - # Below we prepare strings of args/kwargs with and without type annotations. - # These strings are inserted into function template strings which is then torch scripted. 
- # - args string is ["t0"] corresponding to the "input" tensor required by the op - # - args_kw is the value of args and strings of kwargs used to call the op (without type annotations), for example, - # ["to", "1.0", "(1,)", "True", "tensor(1.0)"] -> def fn(t0): return variant(t0, 1.0, (1,), True, tensor(1.0)) - args = ["t0"] - - def quote_strs(v): - if isinstance(v, str): - return f"'{v}'" - - return str(v) - - args_kw = args + \ - [f"{v}" for v in sample.args] + \ - [f"{k}={quote_strs(v)}" for k, v in sample.kwargs.items()] - - # Prepare data for test tracing - sample_args_kwargs = () - if len(sample.args) > 0: - sample_args_kwargs += (sample.args, ) - if len(sample.kwargs) > 0: - sample_args_kwargs += (sample.kwargs, ) - - original_name = op.aten_name - original_name_inplace = original_name + "_" - expected_dtype = op(sample.input, *sample.args, **sample.kwargs).dtype + args = [sample.input] + list(sample.args) + kwargs = sample.kwargs + composite_compliance.check_backward_formula(op, args, kwargs) - for a_op in op.aliases: - inplace = a_op.inplace_variant - method_or_inplace = [a_op.inplace_variant, a_op.method_variant] - variants = (v for v in (a_op.op, a_op.method_variant, a_op.inplace_variant) if v is not None) - - # Test scripting: - for variant in variants: - variant_name = variant.__name__ - op_name = original_name_inplace if variant is inplace else original_name - - if variant in method_or_inplace: - fn_template = ''' - def _fn(t0{c}): - return t0.{alias_name}({args_kw}) - ''' - # remove the first input tensor - script = fn_template.format( - c=", " if len(args_kw[1:]) > 1 else "", - args_kw=", ".join(args_kw[1:]), - alias_name=variant_name, - ) - else: - fn_template = ''' - def _fn({args}): - return variant({args_kw}) - ''' - script = fn_template.format( - args=", ".join(args), - args_kw=", ".join(args_kw), - ) - scripted = torch.jit.CompilationUnit(script)._fn - - if (variant is inplace and not torch.can_cast(expected_dtype, dtype)): - try: - inp = clone_input_helper(sample.input) - scripted(inp) - except Exception as e: - continue - self.fail("Inplace operation on integer tensor that should be promoted to float didn't fail!") - - inp = clone_input_helper(sample.input) - scripted(inp) - inp = clone_input_helper(sample.input) - graph = scripted.graph_for(inp) - FileCheck().check(op.aten_name).check_not(variant_name).run(graph) - - # Test tracing: - for variant in variants: - variant_name = variant.__name__ - op_name = original_name_inplace if variant is inplace else original_name - - def _fn(*sample_args, **sample_kwargs): - return variant(*sample_args, **sample_kwargs) - - inp = (clone_input_helper(sample.input),) + sample_args_kwargs - traced = torch.jit.trace(_fn, *inp) - inp = (clone_input_helper(sample.input),) + sample_args_kwargs - traced(*inp) - inp = (clone_input_helper(sample.input),) + sample_args_kwargs - graph = traced.graph_for(*inp) - FileCheck().check(op_name).check_not(variant_name).run(graph) class TestMathBits(TestCase): # Tests that @@ -1313,8 +884,7 @@ def is_bit_set(x): instantiate_device_type_tests(TestCommon, globals()) -instantiate_device_type_tests(TestGradients, globals()) -instantiate_device_type_tests(TestJit, globals()) +instantiate_device_type_tests(TestCompositeCompliance, globals()) instantiate_device_type_tests(TestMathBits, globals()) if __name__ == '__main__': diff --git a/test/test_ops_gradients.py b/test/test_ops_gradients.py new file mode 100644 index 00000000000000..6d8037fb44d0fd --- /dev/null +++ b/test/test_ops_gradients.py @@ -0,0 
+1,228 @@ +# Owner(s): ["module: unknown"] + +from functools import partial, wraps +import torch + +from torch.testing._internal.common_utils import \ + (TestCase, is_iterable_of_tensors, run_tests, gradcheck, gradgradcheck, first_sample) +from torch.testing._internal.common_methods_invocations import op_db +from torch.testing._internal.common_device_type import \ + (instantiate_device_type_tests, ops, OpDTypes) + +# TODO: fixme https://github.com/pytorch/pytorch/issues/68972 +torch.set_default_dtype(torch.float32) + +# gradcheck requires double precision +_gradcheck_ops = partial(ops, dtypes=OpDTypes.supported, + allowed_dtypes=[torch.double, torch.cdouble]) + +class TestGradients(TestCase): + exact_dtype = True + + # Copies inputs to inplace operations to avoid inplace modifications + # to leaves requiring gradient + def _get_safe_inplace(self, inplace_variant): + @wraps(inplace_variant) + def _fn(t, *args, **kwargs): + return inplace_variant(t.clone(), *args, **kwargs) + + return _fn + + def _check_helper(self, device, dtype, op, variant, check, *, check_forward_ad=False, check_backward_ad=True, + check_batched_grad=None, check_batched_forward_grad=False): + assert check in ('gradcheck', 'bwgrad_bwgrad', 'fwgrad_bwgrad') + # NB: check_backward_ad does not affect gradgradcheck (always True) + if variant is None: + self.skipTest("Skipped! Variant not implemented.") + if not op.supports_dtype(dtype, torch.device(device).type): + self.skipTest(f"Skipped! {op.name} does not support dtype {str(dtype)}") + + def is_inplace(variant): + if hasattr(variant, "__wrapped__"): + return variant.__wrapped__ is op.get_inplace() + return variant is op.get_inplace() + + include_conjugated_inputs = op.test_conjugated_samples and dtype.is_complex + samples = op.sample_inputs(device, dtype, requires_grad=True, include_conjugated_inputs=include_conjugated_inputs) + + for sample in samples: + if sample.broadcasts_input and is_inplace(variant): + continue + + # Note on TensorList inputs + # + # gradcheck does not support TensorList inputs so here we pass TensorList + # inputs of size n as n single Tensor inputs to gradcheck and wrap the op + # in a function that puts the n Tensor inputs back into a TensorList + def fn(*inputs): + # Put tensors back into TensorList since we splat them when passing to gradcheck + if is_iterable_of_tensors(sample.input): + n = len(sample.input) + inputs = (inputs[:n], *inputs[n:]) + output = op.gradcheck_wrapper(variant, *inputs, **sample.kwargs) + if sample.output_process_fn_grad is not None: + return sample.output_process_fn_grad(output) + return output + + # Splat TensorList inputs into single Tensor inputs + gradcheck_args = (sample.input,) if isinstance(sample.input, torch.Tensor) else tuple(sample.input) + gradcheck_args += sample.args + + if check == 'gradcheck': + if check_batched_grad is None: + check_batched_grad = op.check_batched_grad + self.assertTrue(gradcheck(fn, gradcheck_args, + check_batched_grad=check_batched_grad, + check_grad_dtypes=True, + nondet_tol=op.gradcheck_nondet_tol, + fast_mode=op.gradcheck_fast_mode, + check_forward_ad=check_forward_ad, + check_backward_ad=check_backward_ad, + check_undefined_grad=True, + check_batched_forward_grad=check_batched_forward_grad)) + elif check in ('bwgrad_bwgrad', 'fwgrad_bwgrad'): # gradgrad check + self.assertFalse(check_forward_ad, msg="Cannot run forward AD check for gradgradcheck") + for gen_non_contig_grad_outputs in (False, True): + kwargs = { + "gen_non_contig_grad_outputs": gen_non_contig_grad_outputs, + 
"check_batched_grad": op.check_batched_gradgrad, + "check_grad_dtypes": True, + "nondet_tol": op.gradcheck_nondet_tol, + "fast_mode": op.gradcheck_fast_mode + } + if check == "fwgrad_bwgrad": + kwargs["check_fwd_over_rev"] = True + kwargs["check_rev_over_rev"] = False + kwargs["check_batched_grad"] = False + kwargs["check_undefined_grad"] = False + + self.assertTrue(gradgradcheck(fn, gradcheck_args, **kwargs)) + else: + self.assertTrue(False, msg="Unknown check requested!") + + def _grad_test_helper(self, device, dtype, op, variant, *, check_forward_ad=False, check_backward_ad=True, + check_batched_grad=None, check_batched_forward_grad=False): + return self._check_helper(device, dtype, op, variant, 'gradcheck', check_forward_ad=check_forward_ad, + check_backward_ad=check_backward_ad, check_batched_grad=check_batched_grad, + check_batched_forward_grad=check_batched_forward_grad) + + def _skip_helper(self, op, device, dtype): + if not op.supports_autograd and not op.supports_forward_ad: + self.skipTest("Skipped! autograd not supported.") + if not op.supports_complex_autograd(torch.device(device).type) and dtype.is_complex: + self.skipTest("Skipped! Complex autograd not supported.") + + # Tests that gradients are computed correctly + @_gradcheck_ops(op_db) + def test_fn_grad(self, device, dtype, op): + self._skip_helper(op, device, dtype) + self._grad_test_helper(device, dtype, op, op.get_op()) + + # Method grad (and gradgrad, see below) tests are disabled since they're + # costly and redundant with function grad (and gradgad) tests + # @_gradcheck_ops(op_db) + # def test_method_grad(self, device, dtype, op): + # self._skip_helper(op, device, dtype) + # self._grad_test_helper(device, dtype, op, op.get_method()) + + @_gradcheck_ops(op_db) + def test_inplace_grad(self, device, dtype, op): + self._skip_helper(op, device, dtype) + if not op.inplace_variant or not op.supports_inplace_autograd: + self.skipTest("Skipped! Operation does not support inplace autograd.") + self._grad_test_helper(device, dtype, op, self._get_safe_inplace(op.get_inplace())) + + # Test that gradients of gradients are computed correctly + @_gradcheck_ops(op_db) + def test_fn_gradgrad(self, device, dtype, op): + self._skip_helper(op, device, dtype) + if not op.supports_gradgrad: + self.skipTest("Skipped! Operation does not support gradgrad") + self._check_helper(device, dtype, op, op.get_op(), 'bwgrad_bwgrad') + + # Test that forward-over-reverse gradgrad is computed correctly + @_gradcheck_ops(op_db) + def test_fn_fwgrad_bwgrad(self, device, dtype, op): + self._skip_helper(op, device, dtype) + + if op.supports_fwgrad_bwgrad: + self._check_helper(device, dtype, op, op.get_op(), "fwgrad_bwgrad") + else: + err_msg = r"Trying to use forward AD with .* that does not support it" + hint_msg = ("Running forward-over-backward gradgrad for an OP that has does not support it did not " + "raise any error. If your op supports forward AD, you should set supports_fwgrad_bwgrad=True.") + with self.assertRaisesRegex(NotImplementedError, err_msg, msg=hint_msg): + self._check_helper(device, dtype, op, op.get_op(), "fwgrad_bwgrad") + + # Test that gradients of gradients are properly raising + @_gradcheck_ops(op_db) + def test_fn_fail_gradgrad(self, device, dtype, op): + self._skip_helper(op, device, dtype) + if op.supports_gradgrad: + self.skipTest("Skipped! 
Operation does support gradgrad") + + err_msg = r"derivative for .* is not implemented" + with self.assertRaisesRegex(RuntimeError, err_msg): + self._check_helper(device, dtype, op, op.get_op(), 'bwgrad_bwgrad') + + # Method gradgrad (and grad, see above) tests are disabled since they're + # costly and redundant with function gradgrad (and grad) tests + # @_gradcheck_ops(op_db) + # def test_method_gradgrad(self, device, dtype, op): + # self._skip_helper(op, device, dtype) + # self._gradgrad_test_helper(device, dtype, op, op.get_method()) + + @_gradcheck_ops(op_db) + def test_inplace_gradgrad(self, device, dtype, op): + self._skip_helper(op, device, dtype) + if not op.inplace_variant or not op.supports_inplace_autograd: + self.skipTest("Skipped! Operation does not support inplace autograd.") + self._check_helper(device, dtype, op, self._get_safe_inplace(op.get_inplace()), "bwgrad_bwgrad") + + def _forward_grad_helper(self, device, dtype, op, variant, is_inplace): + # TODO: clean up how attributes are passed to gradcheck from OpInfos + def call_grad_test_helper(): + check_batched_forward_grad = ((op.check_batched_forward_grad and not is_inplace) or + (op.check_inplace_batched_forward_grad and is_inplace)) + self._grad_test_helper(device, dtype, op, variant, check_forward_ad=True, check_backward_ad=False, + check_batched_grad=False, check_batched_forward_grad=check_batched_forward_grad) + if op.supports_forward_ad: + call_grad_test_helper() + else: + err_msg = r"Trying to use forward AD with .* that does not support it" + hint_msg = ("Running forward AD for an OP that has does not support it did not " + "raise any error. If your op supports forward AD, you should set supports_forward_ad=True") + with self.assertRaisesRegex(NotImplementedError, err_msg, msg=hint_msg): + call_grad_test_helper() + + @_gradcheck_ops(op_db) + def test_forward_mode_AD(self, device, dtype, op): + self._skip_helper(op, device, dtype) + + self._forward_grad_helper(device, dtype, op, op.get_op(), is_inplace=False) + + @_gradcheck_ops(op_db) + def test_inplace_forward_mode_AD(self, device, dtype, op): + self._skip_helper(op, device, dtype) + + if not op.inplace_variant or not op.supports_inplace_autograd: + self.skipTest("Skipped! 
Operation does not support inplace autograd.") + + self._forward_grad_helper(device, dtype, op, self._get_safe_inplace(op.get_inplace()), is_inplace=True) + + # Functions that do not support autograd should not fail in forward mode + # Inplace functions (such as "resize_") are expected to fail in forward mode and should be skipped + # Test only when supports_autograd=False and for double dtype + @ops(filter(lambda op: not op.supports_autograd, op_db), dtypes=OpDTypes.supported, allowed_dtypes=(torch.double,)) + def test_nondifferentiable(self, device, dtype, op): + # Expecting no errors + samples = op.sample_inputs(device, dtype, requires_grad=True) + sample = first_sample(self, samples) + result = op(sample.input, *sample.args, **sample.kwargs) + + + +instantiate_device_type_tests(TestGradients, globals()) + +if __name__ == '__main__': + run_tests() diff --git a/test/test_ops_jit.py b/test/test_ops_jit.py new file mode 100644 index 00000000000000..f74587955cf3c7 --- /dev/null +++ b/test/test_ops_jit.py @@ -0,0 +1,280 @@ +# Owner(s): ["module: unknown"] + +from functools import partial + +import torch + +from torch.testing import FileCheck +from torch.testing._internal.common_utils import \ + (run_tests, IS_SANDCASTLE, clone_input_helper, first_sample) +from torch.testing._internal.common_methods_invocations import op_db +from torch.testing._internal.common_device_type import instantiate_device_type_tests, ops, OpDTypes +from torch.testing._internal.common_jit import JitCommonTestCase, check_against_reference +from torch.testing._internal.jit_metaprogramming_utils import create_script_fn, create_traced_fn, check_alias_annotation +from torch.testing._internal.jit_utils import disable_autodiff_subgraph_inlining, is_lambda + + +# TODO: fixme https://github.com/pytorch/pytorch/issues/68972 +torch.set_default_dtype(torch.float32) + +# variant testing is only done with torch.float and torch.cfloat to avoid +# excessive test times and maximize signal to noise ratio +_variant_ops = partial(ops, dtypes=OpDTypes.supported, + allowed_dtypes=(torch.float, torch.cfloat)) + + + +# Tests operators for consistency between JIT and eager, also checks +# correctness of JIT specific alias schemas and intended +# autodifferentiation behavior. +# Inherits from JitCommonTestCase instead of TestCase directly to share +# functionality with original test_jit.py method operator tests +class TestJit(JitCommonTestCase): + exact_dtype = True + + # Tests that the forward and backward passes of operations produce the + # same values for the cross-product of op variants (function, method, inplace) + # and runtimes (eager, traced, scripted). + # TODO WARNING: inplace x {traced, scripted} not currently tested + @_variant_ops(op_db) + def test_variant_consistency_jit(self, device, dtype, op): + _requires_grad = op.supports_autograd and (dtype.is_floating_point or + op.supports_complex_autograd(torch.device(device).type)) + + include_conjugated_inputs = op.test_conjugated_samples and dtype.is_complex + samples = op.sample_inputs(device, dtype, requires_grad=_requires_grad, include_conjugated_inputs=include_conjugated_inputs) + + # Acquires variants to test + func = op.get_op() + method = op.get_method() + variants = { + # TODO: inplace tests currently fail, fix and add inplace variant + 'function': func, 'method': method, + } + + # TODO: find better way to standardize on op registration itself.. 
+ has_fake_function = op.name in ["resize_", 'resize_as_'] + + if has_fake_function: + variants = {'method': getattr(torch.Tensor, op.name)} + samples = op.sample_inputs(device, dtype, requires_grad=False) + + support_script = op.supports_scripting + + tested = False + for sample in samples: + # Test traced and scripted consistency + for func_type, variant in variants.items(): + if variant is None: + continue + + # scripting and check_alias_analysis do not work with lambdas + # lambdas are typically used as a way to simulate methods without + # functional variants, so rely on the other variant for testing + # for now + if is_lambda(variant): + continue + + tested = True + + # Create accessor for script function variant + name = op.name + '_' if func_type == 'inplace' else op.name + + # run with disable_autodiff_subgraph_inlining(True) to test + # autodiff support. Context manager forces the graph to contain + # DifferentiableGraph nodes if they are present + with disable_autodiff_subgraph_inlining(): + # Check scripted forward, grad, and grad grad + if support_script: + script_fn = create_script_fn(self, name, func_type) + + def out_fn(output): + # Processes the output for autograd + if sample.output_process_fn_grad is not None: + return sample.output_process_fn_grad(output) + return output + + def get_sample(): + return clone_input_helper(sample.input) if op.name[-1] == '_' else sample.input + + if support_script: + check_against_reference(self, + script_fn, + func, + out_fn, + (get_sample(),) + sample.args, + sample.kwargs, + no_grad=not _requires_grad, no_gradgrad=not op.supports_gradgrad) + + # Check traced forward, grad, and grad grad + # TODO: fix tracing here + supports_tracing = not has_fake_function + if op.assert_jit_shape_analysis: + self.assertTrue(supports_tracing) + + if supports_tracing: + traced_fn = create_traced_fn(self, variant) + check_against_reference(self, + traced_fn, + func, + out_fn, + (get_sample(),) + sample.args, + sample.kwargs, + no_grad=not _requires_grad, no_gradgrad=not op.supports_gradgrad) + + # Check alias annotation schema for correctness (make + # sure inputs that aren't supposed to be modified aren't) + # Note: only runs in float32 because schema isn't affected by dtype, + # so running it on all dtypes is would be excessive + if dtype == torch.float32: + # TODO: no reason why we cant run this with tracing graph + if support_script and op.name != "rsub": + check_alias_annotation(name, (get_sample(),) + sample.args, sample.kwargs, + func_type=func_type, aten_name=op.aten_name) + + # TODO: use script graph as well + checked_shape_analysis = False + if supports_tracing: + out = variant(get_sample(), *sample.args, **sample.kwargs) + + # right now, tuple of outputs and tensor output supported + # TODO: list of tensor outputs + tuple_of_tensors = isinstance(out, tuple) and all([isinstance(elem, torch.Tensor) for elem in out]) + + if isinstance(out, torch.Tensor) or tuple_of_tensors: + if tuple_of_tensors: + sizes = [elem.size() for elem in out] + else: + sizes = out.size() + self.checkShapeAnalysis(sizes, traced_fn.graph, op.assert_jit_shape_analysis) + checked_shape_analysis = True + if op.assert_jit_shape_analysis: + self.assertTrue(checked_shape_analysis) + + # Check autodifferentiation of nodes for traced and scripted graphs, only need to check once per sample + if dtype is torch.float32: + # Sandcastle doesn't fuse nodes + if IS_SANDCASTLE: + # fusible nodes are expected to be found in FusionGroups in the DifferentiableGraphs + nonfusible_nodes = 
op.autodiff_nonfusible_nodes + op.autodiff_fusible_nodes + fusible_nodes = [] + else: + nonfusible_nodes = op.autodiff_nonfusible_nodes + fusible_nodes = op.autodiff_fusible_nodes + + if supports_tracing: + self.assertAutodiffNode(traced_fn.last_graph, op.assert_autodiffed, nonfusible_nodes, fusible_nodes) + if support_script: + self.assertAutodiffNode(script_fn.last_graph, op.assert_autodiffed, nonfusible_nodes, fusible_nodes) + assert tested, "JIT Test does not execute any logic" + + # alias testing is only done with torch.float for the same reason + _alias_ops = partial(ops, dtypes=OpDTypes.supported, + allowed_dtypes=(torch.float,)) + + @_alias_ops((op for op in op_db if op.aliases)) + def test_jit_alias_remapping(self, device, dtype, op): + # Required to avoid undefined value: tensor error in JIT compilation of the function template + tensor = torch.tensor + + # NOTE: only tests on first sample + samples = op.sample_inputs(device, dtype, requires_grad=True) + sample = first_sample(self, samples) + + # [Scripting Data Preparation] + # Prepare data for test scripting + # Below we prepare strings of args/kwargs with and without type annotations. + # These strings are inserted into function template strings which is then torch scripted. + # - args string is ["t0"] corresponding to the "input" tensor required by the op + # - args_kw is the value of args and strings of kwargs used to call the op (without type annotations), for example, + # ["to", "1.0", "(1,)", "True", "tensor(1.0)"] -> def fn(t0): return variant(t0, 1.0, (1,), True, tensor(1.0)) + args = ["t0"] + + def quote_strs(v): + if isinstance(v, str): + return f"'{v}'" + + return str(v) + + args_kw = args + \ + [f"{v}" for v in sample.args] + \ + [f"{k}={quote_strs(v)}" for k, v in sample.kwargs.items()] + + # Prepare data for test tracing + sample_args_kwargs = () + if len(sample.args) > 0: + sample_args_kwargs += (sample.args, ) + if len(sample.kwargs) > 0: + sample_args_kwargs += (sample.kwargs, ) + + original_name = op.aten_name + original_name_inplace = original_name + "_" + expected_dtype = op(sample.input, *sample.args, **sample.kwargs).dtype + + for a_op in op.aliases: + inplace = a_op.inplace_variant + method_or_inplace = [a_op.inplace_variant, a_op.method_variant] + variants = (v for v in (a_op.op, a_op.method_variant, a_op.inplace_variant) if v is not None) + + # Test scripting: + for variant in variants: + variant_name = variant.__name__ + op_name = original_name_inplace if variant is inplace else original_name + + if variant in method_or_inplace: + fn_template = ''' + def _fn(t0{c}): + return t0.{alias_name}({args_kw}) + ''' + # remove the first input tensor + script = fn_template.format( + c=", " if len(args_kw[1:]) > 1 else "", + args_kw=", ".join(args_kw[1:]), + alias_name=variant_name, + ) + else: + fn_template = ''' + def _fn({args}): + return variant({args_kw}) + ''' + script = fn_template.format( + args=", ".join(args), + args_kw=", ".join(args_kw), + ) + scripted = torch.jit.CompilationUnit(script)._fn + + if (variant is inplace and not torch.can_cast(expected_dtype, dtype)): + try: + inp = clone_input_helper(sample.input) + scripted(inp) + except Exception as e: + continue + self.fail("Inplace operation on integer tensor that should be promoted to float didn't fail!") + + inp = clone_input_helper(sample.input) + scripted(inp) + inp = clone_input_helper(sample.input) + graph = scripted.graph_for(inp) + FileCheck().check(op.aten_name).check_not(variant_name).run(graph) + + # Test tracing: + for variant in 
variants: + variant_name = variant.__name__ + op_name = original_name_inplace if variant is inplace else original_name + + def _fn(*sample_args, **sample_kwargs): + return variant(*sample_args, **sample_kwargs) + + inp = (clone_input_helper(sample.input),) + sample_args_kwargs + traced = torch.jit.trace(_fn, *inp) + inp = (clone_input_helper(sample.input),) + sample_args_kwargs + traced(*inp) + inp = (clone_input_helper(sample.input),) + sample_args_kwargs + graph = traced.graph_for(*inp) + FileCheck().check(op_name).check_not(variant_name).run(graph) + + +instantiate_device_type_tests(TestJit, globals()) + +if __name__ == '__main__': + run_tests() diff --git a/test/test_optim.py b/test/test_optim.py index c59d6a49bb4918..7ec98caeebe484 100644 --- a/test/test_optim.py +++ b/test/test_optim.py @@ -20,7 +20,7 @@ _LRScheduler, CyclicLR, CosineAnnealingWarmRestarts, OneCycleLR, ChainedScheduler, \ EPOCH_DEPRECATION_WARNING from torch.optim.swa_utils import AveragedModel, SWALR, update_bn -from torch.testing._internal.common_utils import TestCase, run_tests, TEST_WITH_UBSAN, load_tests, \ +from torch.testing._internal.common_utils import TestCase, run_tests, TEST_WITH_ROCM, TEST_WITH_UBSAN, load_tests, \ skipIfRocm # load_tests from common_utils is used to automatically filter tests for # sharding on sandcastle. This line silences flake warnings @@ -228,6 +228,12 @@ def fn_base(optimizer, weight, bias): # Make sure state dict wasn't modified self.assertEqual(state_dict, state_dict_c) + # Make sure that device of state['step'] is still CPU + new_state_dict = optimizer_cuda.state_dict() + if 'step' in state_dict['state'][0] and torch.is_tensor(state_dict['state'][0]['step']): + for state in new_state_dict['state'].values(): + self.assertEqual(state['step'].device.type, 'cpu') + for _i in range(20): optimizer.step(fn) optimizer_cuda.step(fn_cuda) @@ -620,20 +626,24 @@ def test_adadelta(self): self.rel_tol = 4e-3 for optimizer in [optim.Adadelta, optim_mt.Adadelta]: self._test_basic_cases( - lambda weight, bias: optimizer([weight, bias]) + lambda weight, bias, maximize: optimizer([weight, bias], maximize=maximize), + constructor_accepts_maximize=True ) self._test_basic_cases( - lambda weight, bias: optimizer( - self._build_params_dict(weight, bias, rho=0.95)) + lambda weight, bias, maximize: optimizer( + self._build_params_dict(weight, bias, rho=0.95), maximize=maximize), + constructor_accepts_maximize=True ) self._test_basic_cases( - lambda weight, bias: optimizer( - self._build_params_dict(weight, bias, rho=0.95)), + lambda weight, bias, maximize: optimizer( + self._build_params_dict(weight, bias, rho=0.95), maximize=maximize), [lambda opt: StepLR(opt, gamma=0.9, step_size=10), - lambda opt: ReduceLROnPlateau(opt)] + lambda opt: ReduceLROnPlateau(opt)], + constructor_accepts_maximize=True ) self._test_basic_cases( - lambda weight, bias: optimizer([weight, bias], weight_decay=1) + lambda weight, bias, maximize: optimizer([weight, bias], weight_decay=1, maximize=maximize), + constructor_accepts_maximize=True ) with self.assertRaisesRegex(ValueError, "Invalid rho value: 1.1"): optimizer(None, lr=1e-2, rho=1.1) @@ -653,6 +663,8 @@ def test_adadelta_complex(self): ) def test_nadam(self): + if TEST_WITH_ROCM: + self.rel_tol = 1e-5 for optimizer in [optim.NAdam, optim_mt.NAdam]: self._test_basic_cases( lambda weight, bias: optimizer([weight, bias], lr=1e-3) diff --git a/test/test_overrides.py b/test/test_overrides.py index e3a7e2b13eed70..34eac8081db0af 100644 --- a/test/test_overrides.py +++ 
b/test/test_overrides.py @@ -1,4 +1,4 @@ -# Owner(s): ["high priority"] +# Owner(s): ["module: __torch_function__"] import torch import numpy as np @@ -7,6 +7,7 @@ import pprint import pickle import collections +import unittest from torch.testing._internal.common_utils import TestCase, run_tests from torch.overrides import ( @@ -14,8 +15,10 @@ has_torch_function, get_overridable_functions, get_testing_overrides, - is_tensor_method_or_property + is_tensor_method_or_property, + TorchFunctionMode ) +from functools import partial Tensor = torch.Tensor @@ -28,7 +31,7 @@ def foo(a, b, c=None): """A function multiple arguments and an optional argument""" - if any(type(t) is not Tensor for t in (a, b, c)) and has_torch_function((a, b, c)): + if has_torch_function((a, b, c)): return handle_torch_function(foo, (a, b, c), a, b, c=c) if c: return a + b + c @@ -36,19 +39,19 @@ def foo(a, b, c=None): def bar(a): """A function with one argument""" - if type(a) is not Tensor and has_torch_function((a,)): + if has_torch_function((a,)): return handle_torch_function(bar, (a,), a) return a def baz(a, b): """A function with multiple arguments""" - if type(a) is not Tensor or type(b) is not Tensor and has_torch_function((a, b)): + if has_torch_function((a, b)): return handle_torch_function(baz, (a, b), a, b) return a + b def quux(a): """Used to test that errors raised in user implementations get propagated""" - if type(a) is not Tensor and has_torch_function((a,)): + if has_torch_function((a,)): return handle_torch_function(quux, (a,), a) return a @@ -621,6 +624,9 @@ def instance_gen(): func_args.append(torch.float32) elif t == 'c10::string_view': func_args.append('') + elif t == 'SymInt': + # TODO: generate actual SymbolicInt + func_args.append(1) else: raise RuntimeError(f"Unsupported argument type {t} for {arg['name']} of function {func}") else: @@ -690,7 +696,10 @@ def test(self): test_method.__name__ = name setattr(cls, name, test_method) -# generate_tensor_like_override_tests(TestTorchFunctionOverride) +generate_tensor_like_override_tests(TestTorchFunctionOverride) +TestTorchFunctionOverride.test_torch_functional_histogramdd = unittest.skip( + "histogramdd is missing __torch_function__ support")( + TestTorchFunctionOverride.test_torch_functional_histogramdd) class Wrapper: "Basic data container that knows how to unwrap itself" @@ -1056,14 +1065,151 @@ def __torch_function__(self, *args, **kwargs): pass a = Bad1() - with self.assertWarnsRegex(DeprecationWarning, "as a plain method is deprecated"): - # This needs to be a function that handle torch_function on the python side - torch.split(a, (2)) - - a = Bad2() - with self.assertWarnsRegex(DeprecationWarning, "as a plain method is deprecated"): - # This needs to be a function that handle torch_function on the python side - torch.split(a, (2)) + for a in (Bad1(), Bad2()): + with self.assertWarnsRegex(DeprecationWarning, "as a plain method is deprecated"): + # Function that handles torch_function on the python side + torch.nn.functional.dropout(a) + + with self.assertWarnsRegex(UserWarning, "as a plain method is deprecated"): + # Function that handles torch_function in C++ + torch.abs(a) + +class TestTorchFunctionMode(TestCase): + def test_basic(self): + class A(TorchFunctionMode): + def __torch_function__(self, *args, **kwargs): + return -1 + # NB: factory functions get overridden too! 
+ x = torch.randn(1) + with torch.overrides.push_torch_function_mode(A): + self.assertEqual(torch.randn(3), -1) + self.assertEqual(torch.add(x, x), -1) + self.assertEqual(torch.split(None, [2]), -1) # python side + self.assertEqual(bar(x), -1) + + def test_enable_torch_function_mode_with_tensor_subclass(self): + x = torch.randn(1) + with torch.overrides.enable_torch_function_mode(SubTensor): + self.assertEqual(torch.mm(x, x), -1) + + def test_modes_handle_first(self): + class A(TorchFunctionMode): + def __torch_function__(self, *args, **kwargs): + return -40 + + x = SubTensor() + with torch.overrides.push_torch_function_mode(A): + self.assertEqual(torch.neg(x), -40) + self.assertEqual(torch.mean(x), -40) + self.assertEqual(torch.mm(x, x), -40) + self.assertEqual(bar(x), -40) + + def test_modes_return_notimplemented(self): + class MyMode(TorchFunctionMode): + def __torch_function__(self, *args, **kwargs): + return NotImplemented + + x = SubTensor() + with torch.overrides.push_torch_function_mode(MyMode): + self.assertEqual(torch.mean(x), 0) + self.assertEqual(torch.mm(x, x), -1) + self.assertEqual(bar(x), 1) + self.assertRaisesRegex( + TypeError, r'SubTensor.+MyMode', + lambda: self.assertEqual(torch.max(x, x))) + + def test_mode_stack(self): + logs = [] + + class Logger(TorchFunctionMode): + def __init__(self, name): + self.name = name + + def __torch_function__(self, func, types, args=(), kwargs=None): + if kwargs is None: + kwargs = {} + logs.append(self.name) + return func(*args, **kwargs) + + x = torch.randn(1) + with torch.overrides.push_torch_function_mode(partial(Logger, "A")): + with torch.overrides.push_torch_function_mode(partial(Logger, "B")): + torch.mean(x) + + self.assertEqual(logs, ["B", "A"]) + + def test_push_mode_instance_errors(self): + class A(TorchFunctionMode): + pass + with self.assertRaisesRegex(ValueError, 'instance of TorchFunctionMode'): + with torch.overrides.push_torch_function_mode(A(inner=None)): + pass + + def test_push_mode_returns_unrelated(self): + with self.assertRaisesRegex(ValueError, 'return a TorchFunctionMode'): + with torch.overrides.push_torch_function_mode(lambda *, inner: None): + pass + + def test_missing_inner_mode_ctor(self): + self.assertRaisesRegex(TypeError, 'push_torch_function_mode', lambda: TorchFunctionMode()) + + def test_enable_torch_function_mode_trivial(self): + class A(TorchFunctionMode): + def __torch_function__(self, *args, **kwargs): + return -40 + a = A(inner=None) + with torch.overrides.enable_torch_function_mode(a): + with torch.overrides.enable_torch_function_mode(a): + self.assertEqual(bar(None), -40) + + def test_enable_torch_function_mode_replace(self): + class A(TorchFunctionMode): + def __init__(self, val): + self.val = val + + def __torch_function__(self, *args, **kwargs): + return self.val + a1 = A(-40, inner=None) + a2 = A(-41, inner=None) + with torch.overrides.enable_torch_function_mode(a1): + with torch.overrides.enable_torch_function_mode(a2, replace=a1): + self.assertEqual(bar(None), -41) + + def test_enable_torch_function_mode_ignore_preexisting(self): + class A(TorchFunctionMode): + def __init__(self, val): + self.val = val + + def __torch_function__(self, *args, **kwargs): + return self.val + a1 = A(-40, inner=None) + a2 = A(-41, inner=None) + with torch.overrides.enable_torch_function_mode(a1): + with torch.overrides.enable_torch_function_mode(a2, ignore_preexisting=True): + self.assertEqual(bar(None), -41) + + def test_reentrant_mode_idiom(self): + log = [] + + class A(TorchFunctionMode): + def 
__torch_function__(self, func, types, args=(), kwargs=None): + if kwargs is None: + kwargs = {} + log.append(func) + if func is torch.sub: + with torch.overrides.enable_torch_function_mode(self, replace=self.inner): + input, other = args + assert not kwargs + return torch.add(input, other, alpha=-1) + return func(*args, **kwargs) + + x = torch.randn(1) + y = torch.randn(1) + with torch.overrides.push_torch_function_mode(A): + torch.sub(x, y) + # add hits the torch function again! + self.assertEqual(log, [torch.sub, torch.add]) + if __name__ == '__main__': run_tests() diff --git a/test/test_per_overload_api.py b/test/test_per_overload_api.py index cb949180320d4e..cdb2b79835121a 100644 --- a/test/test_per_overload_api.py +++ b/test/test_per_overload_api.py @@ -10,8 +10,8 @@ def test_basics_opoverloadpacket(self): add_packet = torch.ops.aten.add # class attributes - self.assertEqual(add_packet.op_name, 'add') - self.assertEqual(add_packet.qualified_op_name, 'aten.add') + self.assertEqual(add_packet.__name__, 'add') + self.assertEqual(str(add_packet), 'aten.add') # callable self.assertEqual(add_packet(torch.tensor(2), torch.tensor(3)), torch.tensor(5)) @@ -27,7 +27,7 @@ def test_basics_opoverloadpacket(self): self.assertEqual(id(add_packet), id(copy.deepcopy(add_packet))) # pretty print - self.assertEqual(str(add_packet), "OpOverloadPacket(op='aten.add')") + self.assertEqual(repr(add_packet), "") self.assertRaises(AttributeError, lambda: add_packet.foo) @@ -36,9 +36,9 @@ def test_basics_opoverload(self): add_tensoroverload = add_packet.Tensor # class attributes - self.assertEqual(add_tensoroverload.name, 'aten.add') - self.assertEqual(add_tensoroverload.overload_name, 'Tensor') - self.assertEqual(add_tensoroverload.overload_packet, add_packet) + self.assertEqual(str(add_tensoroverload), 'aten.add.Tensor') + self.assertEqual(add_tensoroverload.__name__, 'add.Tensor') + self.assertEqual(add_tensoroverload.overloadpacket, add_packet) # deepcopy is a no-op self.assertEqual(id(add_tensoroverload), id(copy.deepcopy(add_tensoroverload))) @@ -48,7 +48,7 @@ def test_basics_opoverload(self): self.assertEqual(id(add_tensoroverload), id(another_add_tensoroverload)) # pretty print - self.assertEqual(str(add_tensoroverload), "OpOverload(op='aten.add', overload='Tensor')") + self.assertEqual(repr(add_tensoroverload), "") # callable self.assertEqual(add_tensoroverload(torch.tensor(2), torch.tensor(3)), torch.tensor(5)) diff --git a/test/test_profiler.py b/test/test_profiler.py index 30a9452735cd35..cb2e4a0e5d3157 100644 --- a/test/test_profiler.py +++ b/test/test_profiler.py @@ -64,6 +64,31 @@ def test_mem_leak(self): self.assertTrue(not (is_increasing and max_diff > 100 * 1024), msg='memory usage is increasing, {}'.format(str(last_rss))) + def test_custom_module_input_op_ids(self): + class MyFunc(torch.autograd.Function): + @staticmethod + def forward(ctx, x): + ctx.save_for_backward(x) + return x + + @staticmethod + def backward(ctx, gO): + x, = ctx.saved_tensors + return x + + def custom_layer(input_ten): + return MyFunc.apply(input_ten) + + # Only testing that emit_nvtx runs when + # record_shapes option is enabled. 
+ with torch.autograd.profiler.emit_nvtx(record_shapes=True) as prof: + x = torch.randn(10, 10, requires_grad=True) + y = torch.randn(10, 10, requires_grad=True) + z = x + y + s = custom_layer(z) + q = s.sum() + q.backward() + class TestRecordFunction(TestCase): def _record_function_with_param(self): u = torch.randn(3, 4, 5, requires_grad=True) diff --git a/test/test_public_bindings.py b/test/test_public_bindings.py index 769e2315974732..260a3ac783cd72 100644 --- a/test/test_public_bindings.py +++ b/test/test_public_bindings.py @@ -138,6 +138,7 @@ def test_no_new_bindings(self): "InterfaceType", "IntStorageBase", "IntType", + "SymIntType", "IODescriptor", "is_anomaly_enabled", "is_autocast_cache_enabled", diff --git a/test/test_python_dispatch.py b/test/test_python_dispatch.py index 555e76965a8b7c..4a743d44f88ec5 100644 --- a/test/test_python_dispatch.py +++ b/test/test_python_dispatch.py @@ -1,4 +1,4 @@ -# Owner(s): ["high priority"] +# Owner(s): ["module: __torch_dispatch__"] import tempfile import torch @@ -31,11 +31,11 @@ def test_basic(self) -> None: # self.assertEqual(saved_x._version, x._version) self.assertExpectedInline('\n'.join(logs), '''\ $0 = input('x') -$1 = torch._ops.aten.mul($0, $0) +$1 = torch._ops.aten.mul.Tensor($0, $0) $2 = input('grad_y') -$3 = torch._ops.aten.mul($2, $0) -$4 = torch._ops.aten.mul($2, $0) -$5 = torch._ops.aten.add($4, $3)''') +$3 = torch._ops.aten.mul.Tensor($2, $0) +$4 = torch._ops.aten.mul.Tensor($2, $0) +$5 = torch._ops.aten.add.Tensor($4, $3)''') def test_out(self) -> None: with capture_logs() as logs: @@ -51,7 +51,7 @@ def test_out(self) -> None: self.assertExpectedInline('\n'.join(logs), '''\ $0 = input('x') $1 = input('y') -$2 = torch._ops.aten.abs($0, out=$1)''') +$2 = torch._ops.aten.abs.out($0, out=$1)''') def test_kwarg_only(self) -> None: @@ -74,11 +74,11 @@ def test_kwarg_only(self) -> None: $0 = input('x') $1 = input('y') $2 = input('z') -$3 = torch._ops.aten.addmv($0, $1, $2) -$4 = torch._ops.aten.addmv($0, $1, $2) -$5 = torch._ops.aten.addmv($0, $1, $2, beta=2) -$6 = torch._ops.aten.addmv($0, $1, $2, alpha=2) -$7 = torch._ops.aten.addmv($0, $1, $2, beta=2, alpha=2)''') +$3 = torch._ops.aten.addmv.default($0, $1, $2) +$4 = torch._ops.aten.addmv.default($0, $1, $2) +$5 = torch._ops.aten.addmv.default($0, $1, $2, beta=2) +$6 = torch._ops.aten.addmv.default($0, $1, $2, alpha=2) +$7 = torch._ops.aten.addmv.default($0, $1, $2, beta=2, alpha=2)''') def test_kwarg_only_and_positional_default(self) -> None: with capture_logs() as logs: @@ -96,10 +96,10 @@ def test_kwarg_only_and_positional_default(self) -> None: self.assertExpectedInline('\n'.join(logs), '''\ $0 = input('x') $1 = input('y') -$2 = torch._ops.aten.kl_div($0, $1) -$3 = torch._ops.aten.kl_div($0, $1, 2) -$4 = torch._ops.aten.kl_div($0, $1, log_target=True) -$5 = torch._ops.aten.kl_div($0, $1, 2, log_target=True)''') +$2 = torch._ops.aten.kl_div.default($0, $1) +$3 = torch._ops.aten.kl_div.default($0, $1, 2) +$4 = torch._ops.aten.kl_div.default($0, $1, log_target=True) +$5 = torch._ops.aten.kl_div.default($0, $1, 2, log_target=True)''') def test_list_ret(self) -> None: # test all sequence types are permissible returns @@ -111,7 +111,7 @@ def __new__(cls, elem): @classmethod def __torch_dispatch__(cls, func, types, args=(), kwargs=None): - if func == torch.ops.aten.split: + if func.overloadpacket == torch.ops.aten.split: with no_dispatch(): return list_type(torch.split(*args)) else: @@ -134,7 +134,7 @@ def __torch_dispatch__(cls, func, types, args=(), kwargs=None): return "arf" # 
Wobbles depending on NDEBUG mode of pybind11 - self.assertRaisesRegexp( + self.assertRaisesRegex( RuntimeError, "Unable to cast", lambda: A(torch.zeros(1)).neg(), ) self.assertRaisesRegexp( @@ -152,8 +152,8 @@ def test_detach_appears_twice_when_called_once(self) -> None: # would be bad if calling .detach() once emits 3+ detaches). self.assertExpectedInline('\n'.join(logs), '''\ $0 = input('x') -$1 = torch._ops.aten.detach($0) -$2 = torch._ops.aten.detach($1)''') +$1 = torch._ops.aten.detach.default($0) +$2 = torch._ops.aten.detach.default($1)''') def test_metadata_change_not_allowed(self) -> None: x = LoggingTensor(torch.ones(1)) @@ -264,11 +264,11 @@ def backward(ctx, grad_output): self.assertExpectedInline('\n'.join(logs), '''\ $0 = input('x') $1 = input('x.grad') -$2 = torch._ops.aten.pow($0, 2) +$2 = torch._ops.aten.pow.Tensor_Scalar($0, 2) $3 = input('grad_output') -$4 = torch._ops.aten.mul($3, tensor(2)) -$5 = torch._ops.aten.mul($4, $0) -$6 = torch._ops.aten.add_($1, $5)''') +$4 = torch._ops.aten.mul.Tensor($3, tensor(2)) +$5 = torch._ops.aten.mul.Tensor($4, $0) +$6 = torch._ops.aten.add_.Tensor($1, $5)''') def test_subclass_creation(self): # Make sure these statements runs without error @@ -376,7 +376,7 @@ def __new__(cls, elem, *args, **kwargs): @classmethod def __torch_dispatch__(cls, func, types, args=(), kwargs=None): - if func.__name__ == "clone": + if func.overloadpacket.__name__ == "clone": # Return a plain tensor from clone(). return args[0].elem.clone() raise RuntimeError("NYI") @@ -444,7 +444,7 @@ def __torch_dispatch__(cls, func, types, args=(), kwargs=None): idxs = (MyTensor(torch.tensor(0)),) v = torch.randn(1) res = x.index_put_(idxs, v) - self.assertEqual(called_funcs, [torch.ops.aten.index_put_]) + self.assertEqual(called_funcs, [torch.ops.aten.index_put_.default]) def test_enable_python_mode_error(self) -> None: with self.assertRaisesRegex(ValueError, "__torch_dispatch__"): @@ -594,7 +594,7 @@ def wrap(e): # It prevents infinite recursion. 
with no_dispatch(): rs = tree_map(wrap, func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))) - if func.__name__ == "add": + if func.overloadpacket.__name__ == "add": return None else: return rs @@ -659,7 +659,26 @@ def __torch_dispatch__(cls, func, types, args=(), kwargs=None): x = torch.randn(2) y = torch.randn(2) self.assertEqual(SubTensor(x) + SubTensor(y), x + y) - self.assertEqual(called, [torch.ops.aten.add]) + self.assertEqual(called, [torch.ops.aten.add.Tensor]) + + def test_dispatch_super_call_list_arg(self): + called = [] + + class SubTensorWithListArg(torch.Tensor): + @staticmethod + def __new__(cls, elem): + return torch.Tensor._make_subclass(cls, elem) + + __torch_function__ = torch._C._disabled_torch_function_impl + + @classmethod + def __torch_dispatch__(cls, func, types, args=(), kwargs=None): + called.append(func) + return super().__torch_dispatch__(func, types, list(args), kwargs) + + x = torch.randn(2) + self.assertEqual(SubTensorWithListArg(x).neg(), x.neg()) + self.assertEqual(called, [torch.ops.aten.neg.default]) def test_dispatch_super_dont_autograd(self): called = [] @@ -685,7 +704,13 @@ def __torch_dispatch__(cls, func, types, args=(), kwargs=None): x = SubTensor(torch.randn(2, requires_grad=True)) x.neg() - self.assertEqual(called, [torch.ops.aten.neg]) + self.assertEqual(called, [torch.ops.aten.neg.default]) + + def test_construct_int_tensor(self): + class SubTensor(torch.Tensor): + pass + # should not fail + SubTensor(torch.zeros(2, dtype=torch.int)) def test_multiple_ops_subclass(self): # This is a Direct Subclass, don't do that! diff --git a/test/test_pytree.py b/test/test_pytree.py index 81631c45c3fdd6..c39f5cb3a0a01d 100644 --- a/test/test_pytree.py +++ b/test/test_pytree.py @@ -1,4 +1,4 @@ -# Owner(s): ["high priority"] +# Owner(s): ["module: pytree"] import torch from torch.testing._internal.common_utils import TestCase, run_tests diff --git a/test/test_reductions.py b/test/test_reductions.py index 0def4b9b25253f..52c0a8a1d25785 100644 --- a/test/test_reductions.py +++ b/test/test_reductions.py @@ -13,8 +13,8 @@ from torch._six import inf, nan from torch.testing import make_tensor from torch.testing._internal.common_dtype import ( - get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, - integral_types_and, floating_and_complex_types_and + all_types_and_complex_and, get_all_math_dtypes, integral_types, complex_types, floating_types_and, + integral_types_and, floating_and_complex_types_and, all_types_and, ) from torch.testing._internal.common_utils import ( TestCase, run_tests, skipIfNoSciPy, slowTest, torch_to_numpy_dtype_dict, @@ -357,13 +357,13 @@ def _test_ref(self, op: ReductionOpInfo, t: torch.Tensor, **reduction_kwargs): self.assertEqual(result, expected, exact_dtype=False) @ops(filter(lambda op: op.ref is not None, reduction_ops), - allowed_dtypes=get_all_dtypes(include_bfloat16=False)) + allowed_dtypes=all_types_and_complex_and(torch.half, torch.bool)) def test_ref_scalar_input(self, device, dtype, op: ReductionOpInfo): """Compares op against reference for scalar input tensors""" self._test_ref(op, make_tensor([], dtype=dtype, device=device)) @ops(filter(lambda op: op.ref is not None, reduction_ops), - allowed_dtypes=get_all_dtypes(include_bfloat16=False)) + allowed_dtypes=all_types_and_complex_and(torch.half, torch.bool)) def test_ref_small_input(self, device, dtype, op: ReductionOpInfo): """Compares op against reference for small input tensors""" t = make_tensor((5, 3, 4, 2), dtype=dtype, 
device=device, low=-2, high=2, exclude_zero=True) @@ -391,7 +391,7 @@ def test_ref_large_input_64bit_indexing(self, device, dtype, op: ReductionOpInfo self._test_ref(op, make_tensor((275000000,), dtype=dtype, device=device, low=-1, high=1, exclude_zero=True)) @ops(filter(lambda op: op.ref is not None, reduction_ops), - allowed_dtypes=get_all_dtypes(include_bfloat16=False)) + allowed_dtypes=all_types_and_complex_and(torch.half, torch.bool)) def test_ref_duplicate_values(self, device, dtype, op: ReductionOpInfo): """Compares op against reference for input tensors with duplicate values""" t = make_tensor((4, 4), dtype=dtype, device=device, low=-2, high=2, exclude_zero=True) @@ -452,7 +452,7 @@ def test_dim_reduction_less_than_64(self, device): sizes = [1] * 65 x = torch.randn(sizes, device=device) ops = [torch.mean, torch.sum, torch.nansum, torch.std, torch.logsumexp, torch.std, torch.var, - torch.amin, torch.amax, torch.norm] + torch.norm] for op in ops: with self.assertRaisesRegex(RuntimeError, "only tensors with up to 64 dims are supported"): op(x, 64) @@ -1415,7 +1415,7 @@ def test_dtype_bfloat16(values_bf16=False, boundaries_bf16=False): test_dtype_bfloat16(False, True) test_dtype_bfloat16(True, True) - @dtypes(*get_all_dtypes(include_bool=False, include_complex=False)) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_nansum(self, device, dtype): args = product( (True, False), # noncontiguous @@ -1468,15 +1468,14 @@ def _test_reduction_function_with_numpy(self, torch_func, np_func, device, dtype self.compare_with_numpy(torch_func_partial, np_func_partial, x, device=None, dtype=None, atol=atol, rtol=rtol, exact_dtype=exact_dtype) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half)) def test_count_nonzero(self, device, dtype): self._test_reduction_function_with_numpy(torch.count_nonzero, np.count_nonzero, device, dtype) self._test_reduction_function_with_numpy(torch.count_nonzero, np.count_nonzero, device, dtype, True) def _test_sum_reduction_vs_numpy(self, torch_fn, np_fn, device, dtype, with_keepdim=False, with_extremal=False): def is_integral(dtype): - return dtype in get_all_int_dtypes() + return dtype in integral_types() # On Windows CI, the current version of `numpy` promotes all lower integers # dtypes to int32 while `torch` promotes them to int64. 
Hence we skip on checking @@ -1505,28 +1504,30 @@ def is_integral(dtype): with_keepdim=with_keepdim, with_extremal=with_extremal) @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) + @dtypes(*all_types_and(torch.half)) def test_sum_vs_numpy(self, device, dtype): self._test_sum_reduction_vs_numpy(torch.sum, np.sum, device, dtype) self._test_sum_reduction_vs_numpy(torch.sum, np.sum, device, dtype, with_extremal=True) self._test_sum_reduction_vs_numpy(torch.sum, np.sum, device, dtype, with_keepdim=True) @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) + @dtypes(*all_types_and(torch.half)) def test_nansum_vs_numpy(self, device, dtype): self._test_sum_reduction_vs_numpy(torch.nansum, np.nansum, device, dtype) self._test_sum_reduction_vs_numpy(torch.nansum, np.nansum, device, dtype, with_extremal=True) self._test_sum_reduction_vs_numpy(torch.nansum, np.nansum, device, dtype, with_keepdim=True) - @dtypes(*(get_all_complex_dtypes())) + @dtypes(*complex_types()) def test_nansum_complex(self, device, dtype): x = torch.randn((3, 3, 3), device=device, dtype=dtype) with self.assertRaisesRegex(RuntimeError, "nansum does not support complex inputs"): torch.nansum(x) - def test_nansum_out_dtype(self, device): - dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)) - for inp_dtype, out_dtype in combinations(dtypes, 2): + @dtypes(*all_types_and(torch.half)) + def test_nansum_out_dtype(self, device, dtype): + out_dtype = dtype + inp_dtypes = all_types_and(torch.half) if out_dtype.is_floating_point else integral_types() + for inp_dtype in inp_dtypes: shape = _rand_shape(random.randint(2, 5), min_size=5, max_size=10) x = _generate_input(shape, inp_dtype, device, with_extremal=False) torch_fn = partial(torch.nansum, dtype=out_dtype) @@ -1534,7 +1535,7 @@ def test_nansum_out_dtype(self, device): np_fn = partial(np.nansum, dtype=np_out_dtype) self.compare_with_numpy(torch_fn, np_fn, x, device=None, dtype=None) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) + @dtypes(*all_types_and(torch.half)) def test_argminmax_multiple(self, device, dtype): # Case: All Ones t = torch.ones(3, 3, device=device, dtype=dtype) @@ -1542,7 +1543,7 @@ def test_argminmax_multiple(self, device, dtype): self.compare_with_numpy(torch.argmin, np.argmin, t) # Case: With single `nan` present. - if dtype in get_all_fp_dtypes(): + if dtype in floating_types_and(torch.half, torch.bfloat16): t[2, 2] = float('nan') self.compare_with_numpy(torch.argmax, np.argmax, t) self.compare_with_numpy(torch.argmin, np.argmin, t) @@ -1619,8 +1620,7 @@ def verify_against_numpy(t): [0, 0]], device=device, dtype=dtype) verify_against_numpy(t) - @dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False, - include_bool=True, include_complex=True))) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool)) def test_all_any_vs_numpy(self, device, dtype): # Note [all, any uint8 compatibility]: However for compatibility reason, # for `uint8`, they return Tensor of same dtype `uint8`. 
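Editor's note on the test_reductions.py hunks above (not part of the patch): they swap the flag-based get_all_dtypes()/get_all_int_dtypes()/get_all_fp_dtypes() helpers for the compositional ones in torch.testing._internal.common_dtype. A minimal sketch of the intended equivalence follows; the exact expansion is an assumption based on the helper names and on the substitutions made in the hunks themselves.

import torch
from torch.testing._internal.common_dtype import all_types_and_complex_and

# all_types_and_complex_and(extra...) is expected to cover the standard integral,
# floating and complex dtypes plus whatever extras are listed explicitly,
# here torch.half and torch.bool, matching the old
# get_all_dtypes(include_bfloat16=False) spelling used before this patch.
expected = {
    torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64,  # integral
    torch.float32, torch.float64,                                    # floating
    torch.complex64, torch.complex128,                               # complex
    torch.half, torch.bool,                                          # explicit extras
}
assert set(all_types_and_complex_and(torch.half, torch.bool)) == expected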
@@ -1735,7 +1735,7 @@ def _test_output_dtype(x): @onlyNativeDeviceTypes def test_repeated_dim(self, device): ops = [torch.mean, torch.sum, torch.nansum, torch.std, torch.logsumexp, torch.std, torch.var, - torch.amin, torch.amax, torch.norm] + torch.norm] x = torch.randn(3, 3, 3, 3, device=device) error_msg = r'appears multiple times in the list of dims' @@ -1835,10 +1835,6 @@ def test_minmax_illegal_dtype(self, device): torch.max(x, dim=0, out=(illegal_values, valid_indices)) with self.assertRaisesRegex(RuntimeError, rmsg): torch.min(x, dim=0, out=(illegal_values, valid_indices)) - with self.assertRaisesRegex(RuntimeError, rmsg): - torch.amax(x, dim=0, out=illegal_values) - with self.assertRaisesRegex(RuntimeError, rmsg): - torch.amin(x, dim=0, out=illegal_values) with self.assertRaisesRegex(RuntimeError, rmsg): torch.max(x, dim=0, out=(valid_values, illegal_indices)) with self.assertRaisesRegex(RuntimeError, rmsg): @@ -1848,7 +1844,7 @@ def test_minmax_illegal_dtype(self, device): with self.assertRaisesRegex(RuntimeError, rmsg): torch.min(x, dim=0, out=(illegal_values, illegal_indices)) - @dtypes(*get_all_dtypes(include_bool=False, include_complex=False)) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_dim_arg_reduction_scalar(self, device, dtype): example = 4.0 @@ -1866,7 +1862,7 @@ def test_dim_arg_reduction_scalar(self, device, dtype): @precisionOverride({torch.float16: 1e-2, torch.bfloat16: 1e-2}) - @dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8})) + @dtypes(*set(all_types_and(torch.half, torch.bfloat16)) - {torch.uint8}) def test_dim_reduction(self, device, dtype): example = [[-1, 2, 1], [5, 3, 6]] @@ -3241,8 +3237,7 @@ def test_reduction_empty_any_all(self, device): shape = (2, 0, 4) x = torch.randn(shape, device=device) - for dtype in get_all_dtypes(include_half=True, include_bfloat16=False, - include_bool=True, include_complex=True): + for dtype in all_types_and_complex_and(torch.half, torch.bool): # Refer: [all, any uint8 compatibility] if dtype == torch.uint8: out_dtype = torch.uint8 diff --git a/test/test_scatter_gather_ops.py b/test/test_scatter_gather_ops.py index cd944da7366718..9ef198f7d93225 100644 --- a/test/test_scatter_gather_ops.py +++ b/test/test_scatter_gather_ops.py @@ -10,7 +10,9 @@ (run_tests, TestCase,) from torch.testing._internal.common_device_type import \ (instantiate_device_type_tests, dtypes, dtypesIfCUDA, - toleranceOverride, tol) + toleranceOverride, tol,) +from torch.testing._internal.common_dtype import \ + (get_all_dtypes, get_all_fp_dtypes,) # Protects against includes accidentally setting the default dtype assert torch.get_default_dtype() is torch.float32 @@ -22,13 +24,16 @@ class TestScatterGather(TestCase): # Fills an index tensor with valid indices - def _fill_indices(self, idx, dim, dim_size, elems_per_row, m, n, o): + def _fill_indices(self, idx, dim, dim_size, elems_per_row, m, n, o, unique_indices=True): for i in range(1 if dim == 0 else m): for j in range(1 if dim == 1 else n): for k in range(1 if dim == 2 else o): ii = [i, j, k] ii[dim] = slice(0, idx.size(dim) + 1) - idx[tuple(ii)] = torch.randperm(dim_size)[0:elems_per_row] + if unique_indices: + idx[tuple(ii)] = torch.randperm(dim_size)[0:elems_per_row] + else: + idx[tuple(ii)] = torch.randint(dim_size, (elems_per_row,)) @dtypes(torch.float32, torch.complex64) def test_gather(self, device, dtype): @@ -67,7 +72,8 @@ def test_gather_bool(self, device, dtype): expected = torch.tensor(((False, False), (True, True)), device=device, 
dtype=dtype) self.assertEqual(actual, expected, atol=0, rtol=0) - def _test_scatter_base(self, fn, *, device, dtype, is_scalar, reduction): + def _test_scatter_base(self, fn, *, device, dtype, is_scalar, reduction, + unique_indices=True, include_self=True): m, n, o = random.randint(10, 20), random.randint(10, 20), random.randint(10, 20) elems_per_row = random.randint(1, 10) dim = random.randrange(3) @@ -75,7 +81,7 @@ def _test_scatter_base(self, fn, *, device, dtype, is_scalar, reduction): idx_size = [m, n, o] idx_size[dim] = elems_per_row idx = torch.empty(tuple(idx_size), device=device, dtype=torch.long) - self._fill_indices(idx, dim, ([m, n, o])[dim], elems_per_row, m, n, o) + self._fill_indices(idx, dim, ([m, n, o])[dim], elems_per_row, m, n, o, unique_indices) if is_scalar: src = random.random() @@ -85,11 +91,15 @@ def _test_scatter_base(self, fn, *, device, dtype, is_scalar, reduction): base = make_tensor((m, n, o), device=device, dtype=dtype) if reduction is not None: - actual = fn(base.clone(), dim, idx, src, reduce=reduction) + if fn is torch.Tensor.scatter_reduce_: + actual = fn(base.clone(), dim, idx, src, reduce=reduction, include_self=include_self) + else: + actual = fn(base.clone(), dim, idx, src, reduce=reduction) else: actual = fn(base.clone(), dim, idx, src) expected = base.clone() + counts = torch.zeros(base.shape, dtype=torch.long, device=device) + include_self for i in range(idx_size[0]): for j in range(idx_size[1]): for k in range(idx_size[2]): @@ -98,16 +108,35 @@ def _test_scatter_base(self, fn, *, device, dtype, is_scalar, reduction): if fn is torch.Tensor.scatter_add_: expected[tuple(ii)] += src[i, j, k] else: - # method may be 'scatter_' or 'scatter' - # both might have a reduction argument + # method may be 'scatter_', 'scatter', 'scatter_reduce' + # or 'scatter_reduce_', the former two might have a reduction argument + # while the latter two always do value = src if is_scalar else src[i, j, k] - if reduction == "add": - expected[tuple(ii)] += value - elif reduction == "multiply": - expected[tuple(ii)] *= value - else: + if ((not include_self) and counts[tuple(ii)] == 0): expected[tuple(ii)] = value + else: + if reduction == "add" or reduction == "sum": + expected[tuple(ii)] += value + elif reduction == "multiply" or reduction == "prod": + expected[tuple(ii)] *= value + elif reduction == "amax": + expected[tuple(ii)] = max(expected[tuple(ii)], value) + elif reduction == "amin": + expected[tuple(ii)] = min(expected[tuple(ii)], value) + elif reduction == "mean": + expected[tuple(ii)] += value + else: + expected[tuple(ii)] = value + + counts[tuple(ii)] += 1 + + if (reduction == "mean"): + counts.masked_fill_(counts == 0, 1) + if (dtype.is_floating_point or dtype.is_complex): + expected /= counts + else: + expected.div_(counts, rounding_mode="floor") self.assertEqual(actual, expected, atol=0, rtol=0) @@ -158,6 +187,46 @@ def test_scatter_add_mult_index_base(self, device, dtype): self.assertEqual(res0[0, :], m * torch.ones(n, device=device, dtype=dtype), atol=0, rtol=0) self.assertEqual(res1[:, 0], n * torch.ones(m, device=device, dtype=dtype), atol=0, rtol=0) + # FIXME: discrepancy between bool ReduceAdd on CUDA and CPU (a + b on CPU and buggy a && b on CUDA) + @dtypes(*get_all_dtypes(include_half=True, include_bfloat16=True, include_bool=False)) + def test_scatter_reduce_sum(self, device, dtype): + for include_self in (True, False): + self._test_scatter_base(torch.Tensor.scatter_reduce_, device=device, dtype=dtype, + is_scalar=False, reduction='sum', 
unique_indices=False, + include_self=include_self) + + @dtypes(*get_all_dtypes(include_half=True, include_bfloat16=True)) + @dtypesIfCUDA(*get_all_fp_dtypes(include_half=True, include_bfloat16=True)) + def test_scatter_reduce_prod(self, device, dtype): + for include_self in (True, False): + self._test_scatter_base(torch.Tensor.scatter_reduce_, device=device, dtype=dtype, + is_scalar=False, reduction='prod', unique_indices=False, + include_self=include_self) + + @dtypes(*get_all_dtypes(include_half=True, include_bfloat16=True, include_bool=False)) + @dtypesIfCUDA(*get_all_fp_dtypes(include_half=True, include_bfloat16=True)) + def test_scatter_reduce_mean(self, device, dtype): + for include_self in (True, False): + self._test_scatter_base(torch.Tensor.scatter_reduce_, device=device, dtype=dtype, + is_scalar=False, reduction='mean', unique_indices=False, + include_self=include_self) + + @dtypes(*get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False)) + @dtypesIfCUDA(*get_all_fp_dtypes(include_half=True, include_bfloat16=True)) + def test_scatter_reduce_amax(self, device, dtype): + for include_self in (True, False): + self._test_scatter_base(torch.Tensor.scatter_reduce_, device=device, dtype=dtype, + is_scalar=False, reduction='amax', unique_indices=False, + include_self=include_self) + + @dtypes(*get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False)) + @dtypesIfCUDA(*get_all_fp_dtypes(include_half=True, include_bfloat16=True)) + def test_scatter_reduce_amin(self, device, dtype): + for include_self in (True, False): + self._test_scatter_base(torch.Tensor.scatter_reduce_, device=device, dtype=dtype, + is_scalar=False, reduction='amin', unique_indices=False, + include_self=include_self) + # Generic Device Test Framework instantation, see # https://github.com/pytorch/pytorch/wiki/Running-and-writing-tests diff --git a/test/test_serialization.py b/test/test_serialization.py index 878c602d5d64da..9204392683b6b7 100644 --- a/test/test_serialization.py +++ b/test/test_serialization.py @@ -23,7 +23,7 @@ from torch.testing._internal.common_utils import TestCase, IS_WINDOWS, \ TEST_DILL, run_tests, download_file, BytesIOContext, TemporaryFileName from torch.testing._internal.common_device_type import instantiate_device_type_tests -from torch.testing._internal.common_dtype import get_all_dtypes +from torch.testing._internal.common_dtype import all_types_and_complex_and # These tests were all copied from `test/test_torch.py` at some point, so see # the actual blame, see this revision @@ -414,7 +414,7 @@ def test_serialization_save_warnings(self): with warnings.catch_warnings(record=True) as warns: with tempfile.NamedTemporaryFile() as checkpoint: x = torch.save(torch.nn.Linear(2, 3), checkpoint) - self.assertEquals(len(warns), 0) + self.assertEqual(len(warns), 0) def test_serialization_map_location(self): test_file_path = download_file('https://download.pytorch.org/test_data/gpu_tensors.pt') @@ -616,10 +616,11 @@ def save_load_check(a, b): self.assertEqual(a, a_loaded) self.assertEqual(b, b_loaded) - for device, dtype in product(devices, get_all_dtypes()): + for device, dtype in product(devices, all_types_and_complex_and(torch.half, + torch.bfloat16, torch.bool)): a = torch.tensor([], dtype=dtype, device=device) - for other_dtype in get_all_dtypes(): + for other_dtype in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): s = torch._TypedStorage( wrap_storage=a.storage()._untyped(), dtype=other_dtype) @@ -726,7 +727,7 @@ def 
import_module(name, filename): loaded = torch.load(checkpoint) self.assertTrue(isinstance(loaded, module.Net)) if can_retrieve_source: - self.assertEquals(len(w), 0) + self.assertEqual(len(w), 0) # Replace the module with different source fname = get_file_path_2(os.path.dirname(os.path.dirname(torch.__file__)), 'torch', 'testing', @@ -737,7 +738,7 @@ def import_module(name, filename): loaded = torch.load(checkpoint) self.assertTrue(isinstance(loaded, module.Net)) if can_retrieve_source: - self.assertEquals(len(w), 1) + self.assertEqual(len(w), 1) self.assertTrue(w[0].category, 'SourceChangeWarning') def test_serialization_container(self): diff --git a/test/test_shape_ops.py b/test/test_shape_ops.py index 13c636d6563a4c..de709cc1ee627c 100644 --- a/test/test_shape_ops.py +++ b/test/test_shape_ops.py @@ -15,7 +15,7 @@ from torch.testing._internal.common_device_type import ( instantiate_device_type_tests, onlyCPU, onlyCUDA, dtypes, onlyNativeDeviceTypes, dtypesIfCUDA, largeTensorTest) -from torch.testing._internal.common_dtype import get_all_dtypes +from torch.testing._internal.common_dtype import all_types_and_complex_and, all_types, all_types_and # TODO: replace with make_tensor def _generate_input(shape, dtype, device, with_extremal): @@ -227,9 +227,8 @@ def test_diagonal_multidim(self, device, dtype): self.assertEqual(expected, result) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False, - include_bfloat16=False)) - @dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False)) + @dtypes(*all_types()) + @dtypesIfCUDA(*all_types_and(torch.half)) def test_trace(self, device, dtype): def test(shape): tensor = make_tensor(shape, dtype=dtype, device=device, low=-9, high=9) @@ -341,7 +340,7 @@ def test_clamp_raises_arg_errors(self, device): with self.assertRaisesRegex(RuntimeError, error_msg): torch.clamp(X) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_flip(self, device, dtype): make_from_data = partial(torch.tensor, device=device, dtype=dtype) make_from_size = partial(make_tensor, device=device, dtype=dtype) @@ -440,7 +439,7 @@ def gen_data(): for dims in test_dims: self.assertEqual(size, list(data.flip(dims).size())) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_flip_errors(self, device, dtype): make_arg = partial(make_tensor, dtype=dtype, device=device) data = make_arg((2, 2, 2)) @@ -458,7 +457,7 @@ def test_flip_errors(self, device, dtype): def _rand_shape(self, dim, min_size, max_size): return tuple(torch.randint(min_size, max_size + 1, (dim,))) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_flip_numpy(self, device, dtype): make_arg = partial(make_tensor, dtype=dtype, device=device) @@ -476,6 +475,7 @@ def test_flip_numpy(self, device, dtype): @onlyCUDA # CPU is too slow @largeTensorTest('17GB') # 4 tensors of 4GB (in, out) x (torch, numpy) + 1GB + @largeTensorTest("81GB", "cpu") # even for CUDA test, sufficient system memory is required def test_flip_large_tensor(self, device): t_in = torch.empty(2**32 + 1, dtype=torch.uint8).random_() torch_fn = partial(torch.flip, dims=(0,)) @@ -567,7 +567,7 @@ def test_nonzero_no_warning(self, device): t.nonzero() self.assertEqual(len(w), 0) - @dtypes(*get_all_dtypes(include_complex=False)) + @dtypes(*all_types_and(torch.half, torch.bool, 
torch.bfloat16)) def test_nonzero(self, device, dtype): shapes = [ diff --git a/test/test_sort_and_select.py b/test/test_sort_and_select.py index ab6c72285ce8f9..ba99d3ed7a0ffe 100644 --- a/test/test_sort_and_select.py +++ b/test/test_sort_and_select.py @@ -8,11 +8,9 @@ from itertools import permutations, product from torch.testing import make_tensor -from torch.testing._internal.common_dtype import ( - all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, -) +from torch.testing._internal.common_dtype import all_types, all_types_and, floating_types_and from torch.testing._internal.common_utils import \ - (TEST_WITH_ROCM, TestCase, run_tests, slowTest) + (TestCase, run_tests, slowTest) from torch.testing._internal.common_device_type import \ (instantiate_device_type_tests, dtypes, onlyNativeDeviceTypes, skipCUDAIfRocm, onlyCUDA, dtypesIfCUDA, dtypesIfCPU, onlyCPU, largeTensorTest) @@ -133,7 +131,7 @@ def test_sort(self, device): 'random with NaNs') # FIXME: remove torch.bool from unsupported types once support is added for cub sort - @dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_stable_sort(self, device, dtype): sizes = (100, 1000, 10000) for ncopies in sizes: @@ -226,7 +224,7 @@ def test_topk_1d_output_discontiguous(self, device, dtype): self.assertEqual(values, values_cont) # FIXME: remove torch.bool from unsupported types once support is added for cub sort - @dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_stable_sort_against_numpy(self, device, dtype): if dtype in floating_types_and(torch.float16, torch.bfloat16): inf = float('inf') @@ -289,7 +287,7 @@ def repeated_index_fill(t, dim, idxs, vals): idx_numpy = np.argsort(sample_numpy, axis=dim, kind='stable') self.assertEqual(idx_torch, idx_numpy) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_msort(self, device, dtype): def test(shape): tensor = make_tensor(shape, dtype=dtype, device=device, low=-9, high=9) @@ -678,7 +676,6 @@ def test_topk_integral(self, device, dtype): @onlyCUDA @dtypes(torch.bfloat16) - @skipCUDAIfRocm def test_topk_bfloat16(self, device, dtype): small = 10 @@ -687,12 +684,9 @@ def test_topk_bfloat16(self, device, dtype): for curr_size in (small, large, verylarge): self._test_topk_dtype(device, dtype, False, curr_size) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.float, torch.double, torch.bfloat16) def test_topk_nonfinite(self, device, dtype): - if TEST_WITH_ROCM and dtype == torch.bfloat16: - return - x = torch.tensor([float('nan'), float('inf'), 1e4, 0, -1e4, -float('inf')], device=device, dtype=dtype) val, idx = x.topk(4) expect = torch.tensor([float('nan'), float('inf'), 1e4, 0], device=device, dtype=dtype) @@ -721,15 +715,9 @@ def test_topk_4d(self, device): self.assertEqual(ind, expected_ind, atol=0, rtol=0) @onlyNativeDeviceTypes - @dtypesIfCUDA(*(get_all_dtypes(include_complex=False, - include_bool=False, - include_half=False, - include_bfloat16=True))) - @dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False))) + @dtypesIfCUDA(*all_types_and(torch.bfloat16)) + @dtypes(*all_types()) def test_topk_zero(self, device, dtype): - if TEST_WITH_ROCM and dtype == torch.bfloat16: - 
return - # https://github.com/pytorch/pytorch/issues/49205 t = torch.rand(2, 2, device=device).to(dtype=dtype) val, idx = torch.topk(t, k=0, largest=False) @@ -782,12 +770,9 @@ def ensure_tuple(x): self.assertEqual(expected_inverse.view(additional_shape), y_inverse) self.assertEqual(expected_counts, y_counts) - @dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128}) - @dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) + @dtypesIfCPU(*all_types_and(torch.bool, torch.bfloat16)) + @dtypes(*all_types_and(torch.half, torch.bool)) def test_unique(self, device, dtype): - if dtype is torch.half and self.device_type == 'cpu': - return # CPU does not have half support - def ensure_tuple(x): if isinstance(x, torch.Tensor): return (x,) @@ -842,12 +827,9 @@ def ensure_tuple(x): count += 1 self.assertEqual(j, count) - @dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128}) - @dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) + @dtypesIfCPU(*all_types_and(torch.bool, torch.bfloat16)) + @dtypes(*all_types_and(torch.half, torch.bool)) def test_unique_consecutive(self, device, dtype): - if dtype is torch.half and self.device_type == 'cpu': - return # CPU does not have half support - if dtype is torch.bool: x = torch.tensor([True, False, False, False, True, True, False, False, False], dtype=torch.bool, device=device) expected_unique = torch.tensor([True, False, True, False], dtype=torch.bool, device=device) diff --git a/test/test_sparse.py b/test/test_sparse.py index a50d493cdac635..86fef22ef49aef 100644 --- a/test/test_sparse.py +++ b/test/test_sparse.py @@ -7,9 +7,6 @@ import random import unittest from torch.testing import make_tensor -from torch.testing._internal.common_dtype import ( - all_types_and_complex, -) from torch.testing._internal.common_utils import TestCase, run_tests, skipIfRocm, do_test_dtypes, \ do_test_empty_full, load_tests, TEST_NUMPY, IS_WINDOWS, gradcheck, coalescedonoff, \ DeterministicGuard, first_sample, IS_LINUX @@ -17,16 +14,16 @@ from numbers import Number from typing import Dict, Any from distutils.version import LooseVersion -from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes from torch.testing._internal.common_cuda import \ (SM53OrLater, SM80OrLater, CUDA11OrLater) from torch.testing._internal.common_device_type import \ (instantiate_device_type_tests, ops, dtypes, dtypesIfCUDA, onlyCPU, onlyCUDA, precisionOverride, deviceCountAtLeast, OpDTypes) from torch.testing._internal.common_methods_invocations import \ - (sparse_unary_ufuncs) + (sparse_unary_ufuncs, sparse_masked_reduction_ops) from torch.testing._internal.common_dtype import ( - floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes, + all_types, all_types_and_complex, all_types_and_complex_and, floating_and_complex_types, + floating_and_complex_types_and, integral_types, floating_types_and, ) # load_tests from torch.testing._internal.common_utils is used to automatically filter tests for @@ -315,6 +312,10 @@ def test_tensor(x, res): self.assertEqual(res, dense_x) self.assertEqual(res, safe_dense_x) + # Only run autograd test for float64 + if x.dtype != torch.float64: + return + def fn(x): return x.to_dense() x.requires_grad_(True) @@ -346,6 +347,7 @@ def fn(x): ], dtype=dtype, device=device) test_tensor(x, res) + test_tensor(res, res) i = self.index_tensor([ [0, 1, 2, 2], @@ -1954,7 +1956,7 @@ def test_narrow(self, device, dtype, coalesced): def 
_test_log1p_tensor(self, sparse_tensor, coalesced): def is_integral(dtype): - return dtype in get_all_int_dtypes() + return dtype in integral_types() dense_tensor = sparse_tensor.to_dense() expected_output = dense_tensor.log1p() @@ -1985,7 +1987,7 @@ def is_integral(dtype): sparse_tensor.requires_grad_() @coalescedonoff - @dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False)) + @dtypes(*all_types()) def test_log1p(self, device, dtype, coalesced): if coalesced: input_coalesced = torch.sparse_coo_tensor( @@ -2093,7 +2095,7 @@ def test_neg_negative(self, device, dtype, coalesced): def _test_asin_arcsin(self, sparse_tensor, coalesced): def is_integral(dtype): - return dtype in get_all_int_dtypes() + return dtype in integral_types() is_integral_dtype = is_integral(sparse_tensor.dtype) dense_tensor = sparse_tensor.to_dense() @@ -2128,7 +2130,7 @@ def is_integral(dtype): op(sparse_tensor) @coalescedonoff - @dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False)) + @dtypes(*all_types()) def test_asin_arcsin(self, device, dtype, coalesced): if coalesced: input_coalesced = torch.sparse_coo_tensor( @@ -2615,14 +2617,14 @@ def test_legacy_new(self, device): @onlyCPU # not really, but we only really want to run this once def test_dtypes(self, device): - all_sparse_dtypes = get_all_dtypes(include_complex=True) + all_sparse_dtypes = all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16) do_test_dtypes(self, all_sparse_dtypes, torch.sparse_coo, torch.device('cpu')) if torch.cuda.is_available(): do_test_dtypes(self, all_sparse_dtypes, torch.sparse_coo, torch.device('cuda:0')) @onlyCPU # not really, but we only really want to run this once def test_empty_full(self, device): - all_sparse_dtypes = get_all_dtypes(include_complex=True) + all_sparse_dtypes = all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16) do_test_empty_full(self, all_sparse_dtypes, torch.sparse_coo, torch.device('cpu')) if torch.cuda.device_count() > 0: do_test_empty_full(self, all_sparse_dtypes, torch.sparse_coo, None) @@ -3219,14 +3221,12 @@ def sparse_log(x): # TODO: Check after why ROCm's cusparseXcsrgemm2Nnz function doesn't return the same nnz value as CUDA @skipIfRocm @coalescedonoff - @dtypes(*get_all_complex_dtypes(), - *get_all_fp_dtypes(include_half=False, include_bfloat16=False)) - @dtypesIfCUDA(*((torch.complex64,) if CUDA11OrLater else ()), - *((torch.complex128,) if CUSPARSE_SPMM_COMPLEX128_SUPPORTED else ()), - *get_all_fp_dtypes( - include_half=(CUDA11OrLater and SM53OrLater), - include_bfloat16=(CUDA11OrLater and SM80OrLater))) - @precisionOverride({torch.bfloat16: 2.5e-2, torch.float16: 2.5e-2, torch.complex64: 1e-2, torch.float32: 1e-2}) + @dtypes(*floating_and_complex_types()) + @dtypesIfCUDA(*floating_types_and(*[torch.half] if CUDA11OrLater and SM53OrLater else [], + *[torch.bfloat16] if CUDA11OrLater and SM80OrLater else [], + *[torch.complex64] if CUDA11OrLater else [], + *[torch.complex128] if CUSPARSE_SPMM_COMPLEX128_SUPPORTED else [])) + @precisionOverride({torch.bfloat16: 1e-2, torch.float16: 1e-2, torch.complex64: 1e-2, torch.float32: 1e-2}) def test_sparse_matmul(self, device, dtype, coalesced): """ This function test `torch.sparse.mm` when both the mat1 and mat2 are sparse tensors. 
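Editor's note (not part of the patch): the rewritten @dtypesIfCUDA decorators above rely on conditional list unpacking rather than the old include_half=/include_bfloat16= flags. A small sketch of that idiom, with made-up stand-ins for the capability flags:

import torch

SM53OrLater = True   # hypothetical stand-in for the real capability flag
SM80OrLater = False  # hypothetical stand-in for the real capability flag

# Each bracketed list is unpacked only when its guard is true, so dtypes the
# device cannot handle simply drop out of the decorator's argument tuple.
cuda_dtypes = (
    torch.float32, torch.float64,
    *([torch.half] if SM53OrLater else []),
    *([torch.bfloat16] if SM80OrLater else []),
)
assert cuda_dtypes == (torch.float32, torch.float64, torch.half)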
@@ -3402,21 +3402,21 @@ class TestSparseOneOff(TestCase): def test_cuda_from_cpu(self): with self.assertRaisesRegex( RuntimeError, - "backend of indices \\(CUDA\\) must match backend of values \\(CPU\\)"): + "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"): torch.sparse.FloatTensor(torch.zeros(1, 4).long().cuda(), torch.randn(4, 4, 4), [3, 4, 4]) with self.assertRaisesRegex( RuntimeError, - "backend of indices \\(CUDA\\) must match backend of values \\(CPU\\)"): + "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"): torch.sparse.FloatTensor(torch.zeros(1, 4).long().cuda(), torch.randn(4, 4, 4, 0), [3, 4, 4, 0]) with self.assertRaisesRegex( RuntimeError, - "backend of indices \\(CUDA\\) must match backend of values \\(CPU\\)"): + "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"): torch.sparse.FloatTensor(torch.LongTensor(1, 0).cuda(), torch.randn(0, 4, 4, 0), [0, 4, 4, 0]) @@ -3547,9 +3547,48 @@ def fn(x): fast_mode=op.gradcheck_fast_mode)) +class TestSparseMaskedReductions(TestCase): + exact_dtype = True + + @ops(sparse_masked_reduction_ops) + def test_future_empty_dim(self, device, dtype, op): + """Currently, `dim=()` in reduction operations means "reduce over + all dimensions", while in the future it will mean "no reduce". See + https://github.com/pytorch/pytorch/issues/29137 + + For sparse masked reductions, we'll implement the current behavior. + + For testing, we'll use samples with `dim=0` and map it to + `dim=()` until + torch.testing._internal.common_methods_invocations._generate_reduction_kwargs + is made to generate samples with `dim=()` for non-scalar + inputs. With this and after gh-29137 is resolved, this test + can be deleted. See also the `torch._masked._canonical_dim` + implementation about changing the `dim=()` behavior.
+ """ + + samples = op.sample_inputs_func(op, device, dtype, requires_grad=False) + for sample_input in samples: + if sample_input.kwargs.get('dim') != 0: + continue + sample_input_kwargs = dict(sample_input.kwargs) + sample_input_kwargs['dim'] = () # reduce over all dimensions + + t = sample_input.input + mask = sample_input_kwargs.get('mask') + sparse_op_kwargs = dict(sample_input_kwargs) + actual = op(t.to_sparse(), *sample_input.args, **sample_input_kwargs) + self.assertEqual(actual.layout, torch.sparse_coo) + + expected = op(t, *sample_input.args, **sample_input_kwargs).to_sparse() + self.assertEqual(actual, expected) + + # e.g., TestSparseUnaryUfuncsCPU and TestSparseUnaryUfuncsCUDA instantiate_device_type_tests(TestSparseUnaryUfuncs, globals(), except_for='meta') +instantiate_device_type_tests(TestSparseMaskedReductions, globals(), except_for='meta') + # e.g., TestSparseCPU and TestSparseCUDA instantiate_device_type_tests(TestSparse, globals(), except_for='meta') diff --git a/test/test_sparse_csr.py b/test/test_sparse_csr.py index 8c120376b118f0..a546bc26b329a8 100644 --- a/test/test_sparse_csr.py +++ b/test/test_sparse_csr.py @@ -4,17 +4,20 @@ import random import itertools import unittest -from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor +from torch.testing import make_tensor from torch.testing._internal.common_cuda import SM53OrLater, SM80OrLater, TEST_CUSPARSE_GENERIC from torch.testing._internal.common_utils import \ - (TEST_WITH_ROCM, TEST_SCIPY, TEST_MKL, IS_WINDOWS, TestCase, run_tests, load_tests, coalescedonoff) + (TEST_WITH_ROCM, TEST_SCIPY, TEST_MKL, IS_WINDOWS, TestCase, run_tests, load_tests, coalescedonoff, parametrize) from torch.testing._internal.common_device_type import \ (ops, instantiate_device_type_tests, dtypes, OpDTypes, dtypesIfCUDA, onlyCPU, onlyCUDA, skipCUDAIfNoCusparseGeneric, precisionOverride, skipMeta, skipCUDAIf, skipCUDAIfRocm, skipCPUIfNoMklSparse) from torch.testing._internal.common_methods_invocations import \ - (op_db, sparse_csr_unary_ufuncs, ) + (op_db, sparse_csr_unary_ufuncs, ReductionOpInfo) from torch.testing._internal.common_cuda import _get_torch_cuda_version, CUDA11OrLater -from torch.testing._internal.common_dtype import floating_types, get_all_dtypes +from torch.testing._internal.common_dtype import ( + floating_types, all_types_and_complex_and, floating_and_complex_types, floating_types_and, + all_types_and_complex, floating_and_complex_types_and +) from test_sparse import CUSPARSE_SPMM_COMPLEX128_SUPPORTED if TEST_SCIPY: @@ -135,7 +138,28 @@ def test_csr_layout(self): self.assertEqual(str(torch.sparse_csr), 'torch.sparse_csr') self.assertEqual(type(torch.sparse_csr), torch.layout) - @dtypes(*get_all_dtypes()) + def test_csr_stride(self): + a = self.genSparseCSRTensor((3, 3), 3, dtype=torch.float, device=self.device_type, index_dtype=torch.int64) + + with self.assertRaisesRegex(RuntimeError, "Sparse CSR tensors do not have strides"): + a.stride() + + with self.assertRaisesRegex(RuntimeError, "Sparse CSR tensors do not have strides"): + a.stride(-1) + + def test_csr_storage(self): + a = self.genSparseCSRTensor((3, 3), 3, dtype=torch.float, device=self.device_type, index_dtype=torch.int64) + + with self.assertRaisesRegex(RuntimeError, "Cannot access storage of SparseCsrTensorImpl"): + a.storage() + + def test_csr_is_contiguous(self): + a = self.genSparseCSRTensor((3, 3), 3, dtype=torch.float, device=self.device_type, index_dtype=torch.int64) + + with 
self.assertRaisesRegex(RuntimeError, "Tensors of type SparseCsrTensorImpl do not have is_contiguous"): + a.is_contiguous() + + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_sparse_csr_constructor_shape_inference(self, device, dtype): crow_indices = [0, 2, 4] col_indices = [0, 1, 0, 1] @@ -148,7 +172,7 @@ def test_sparse_csr_constructor_shape_inference(self, device, dtype): self.assertEqual(dtype, sparse.dtype) self.assertEqual(torch.device(device), sparse.device) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_sparse_csr_constructor(self, device, dtype): crow_indices = [0, 2, 4] col_indices = [0, 1, 0, 1] @@ -165,7 +189,34 @@ def test_sparse_csr_constructor(self, device, dtype): self.assertEqual(torch.tensor(col_indices, dtype=index_dtype), sparse.col_indices()) self.assertEqual(torch.tensor(values, dtype=dtype), sparse.values()) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) + def test_sparse_csr_batch_constructor(self, device, dtype): + batch_shape = (2, 3) + crow_indices = torch.tensor([0, 2, 4], device=device).repeat(6, 1).reshape(*batch_shape, -1) + col_indices = torch.tensor([0, 1, 0, 1], device=device).repeat(6, 1).reshape(*batch_shape, -1) + values = torch.tensor([1, 2, 3, 4], device=device, dtype=dtype).repeat(6, 1).reshape(*batch_shape, -1) + for index_dtype in [torch.int32, torch.int64]: + sparse = torch.sparse_csr_tensor(crow_indices.to(index_dtype), + col_indices.to(index_dtype), + values, + size=(*batch_shape, 2, 10), + dtype=dtype, + device=device) + self.assertEqual((*batch_shape, 2, 10), sparse.shape) + self.assertEqual(crow_indices.to(index_dtype), sparse.crow_indices()) + self.assertEqual(col_indices.to(index_dtype), sparse.col_indices()) + self.assertEqual(values, sparse.values()) + + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) + def test_sparse_csr_batch_constructor_shape_inference(self, device, dtype): + batch_shape = (2, 3) + crow_indices = torch.tensor([0, 2, 4], device=device).repeat(6, 1).reshape(*batch_shape, -1) + col_indices = torch.tensor([0, 1, 0, 1], device=device).repeat(6, 1).reshape(*batch_shape, -1) + values = torch.tensor([1, 2, 3, 4], device=device, dtype=dtype).repeat(6, 1).reshape(*batch_shape, -1) + sparse = torch.sparse_csr_tensor(crow_indices, col_indices, values, dtype=dtype, device=device) + self.assertEqual((*batch_shape, crow_indices.shape[-1] - 1, col_indices.max() + 1), sparse.shape) + + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_sparse_csr_constructor_from_lists(self, device, dtype): # without size sparse = torch.sparse_csr_tensor([0, 2, 4], @@ -195,18 +246,20 @@ def test_sparse_csr_constructor_from_lists(self, device, dtype): self.assertEqual(torch.tensor([1, 2, 3, 4], dtype=dtype, device=device), sparse.values()) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.bool, torch.bfloat16, torch.half)) def test_empty(self, device, dtype): ns = [5, 2, 0] - for shape in itertools.product(ns, ns): + batch_shapes = [(), (2,), (2, 3)] + for m, n, b in itertools.product(ns, ns, batch_shapes): + shape = (*b, m, n) result = torch.empty(shape, dtype=dtype, device=device, layout=torch.sparse_csr) self.assertEqual(result.shape, shape) self.assertEqual(result.dtype, dtype) self.assertEqual(result.device, torch.device(device)) self.assertEqual(result.layout, torch.sparse_csr) - 
self.assertEqual(result.crow_indices().shape, (shape[0] + 1,)) - self.assertEqual(result.col_indices().shape, (0,)) - self.assertEqual(result.values().shape, (0,)) + self.assertEqual(result.crow_indices().shape, (*b, shape[-2] + 1,)) + self.assertEqual(result.col_indices().shape, (*b, 0,)) + self.assertEqual(result.values().shape, (*b, 0,)) self.assertEqual(result._nnz(), 0) self.assertEqual(result.crow_indices().device, torch.device(device)) self.assertEqual(result.col_indices().device, torch.device(device)) @@ -216,31 +269,27 @@ def test_empty(self, device, dtype): self.assertEqual(result.values().dtype, dtype) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16)) def test_empty_errors(self, device, dtype): - with self.assertRaisesRegex(RuntimeError, "torch.empty: Only 2D sparse CSR tensors are supported."): + with self.assertRaisesRegex(RuntimeError, "torch.empty: Only batched sparse CSR matrices are supported, but got size"): torch.empty((5,), dtype=dtype, device=device, layout=torch.sparse_csr) - with self.assertRaisesRegex(RuntimeError, "torch.empty: Only 2D sparse CSR tensors are supported."): - torch.empty((2, 3, 4), dtype=dtype, device=device, layout=torch.sparse_csr) - @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16)) def test_clone(self, device, dtype): - x = torch.sparse_csr_tensor([0, 2, 4], - [0, 1, 0, 1], - [1, 2, 3, 4], - dtype=dtype, - device=device) - y = x.clone() - - self.assertEqual(x.shape, y.shape) - self.assertEqual(x.crow_indices(), y.crow_indices()) - self.assertEqual(x.col_indices(), y.col_indices()) - self.assertEqual(x.values(), y.values()) + from operator import mul + from functools import reduce + for batch_shape in ((), (2,), (2, 3)): + prod = reduce(mul, batch_shape, 1) + crow_indices = torch.tensor([0, 2, 4], device=device).repeat(prod, 1).reshape(*batch_shape, -1) + col_indices = torch.tensor([0, 1, 0, 1], device=device).repeat(prod, 1).reshape(*batch_shape, -1) + values = torch.tensor([1, 2, 3, 4], device=device, dtype=dtype).repeat(prod, 1).reshape(*batch_shape, -1) + sparse = torch.sparse_csr_tensor(crow_indices, col_indices, values, dtype=dtype, device=device) + cloned_sparse = sparse.clone() + self.assertEqual(sparse, cloned_sparse) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_copy(self, device, dtype): def run_test(shape, nnz, index_type): @@ -249,17 +298,16 @@ def run_test(shape, nnz, index_type): a.copy_(b) - self.assertEqual(a.crow_indices(), b.crow_indices()) - self.assertEqual(a.col_indices(), b.col_indices()) - self.assertEqual(a.values(), b.values()) + self.assertEqual(a, b) ns = [5, 2, 0] - for shape, index_dtype in zip(itertools.product(ns, ns), [torch.int32, torch.int64]): - run_test(shape, 0, index_dtype) - run_test(shape, shape[0] * shape[1], index_dtype) + batch_shapes = [(), (2,), (2, 3)] + for (m, n, b), index_dtype in zip(itertools.product(ns, ns, batch_shapes), [torch.int32, torch.int64]): + run_test((*b, m, n), 0, index_dtype) + run_test((*b, m, n), m * n, index_dtype) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_copy_errors(self, device, dtype): for index_dtype in [torch.int32, torch.int64]: shape1 = (2, 3) @@ -278,36 +326,42 @@ def test_copy_errors(self, device, dtype): a.copy_(b) @skipMeta - @dtypes(*get_all_dtypes()) + 
@dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_resize(self, device, dtype): - for index_dtype in [torch.int32, torch.int64]: - shape = (2, 3) + batch_shapes = [(), (2,), (2, 3)] + for index_dtype, b in zip([torch.int32, torch.int64], batch_shapes): + shape = (*b, 2, 3) nnz = 6 a = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=index_dtype) - new_shape = (4, 5) + new_shape = (*b, 4, 5) a.resize_(new_shape) self.assertEqual(a.shape, new_shape) # resize to larger shape doesn't add specified elements self.assertEqual(a._nnz(), nnz) - new_shape = (1, 5) + new_shape = (*b, 1, 5) a.resize_(new_shape) self.assertEqual(a.shape, new_shape) # resize to smaller shape trims specified elements self.assertEqual(a._nnz(), 5) + # trim batched dimensions + a.resize_(new_shape[-2], new_shape[-1]) + self.assertEqual(a.shape, (new_shape[-2], new_shape[-1])) + self.assertEqual(a._nnz(), 5) + @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_resize_errors(self, device, dtype): for index_dtype in [torch.int32, torch.int64]: shape = (2, 3) nnz = 6 a = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=index_dtype) - with self.assertRaisesRegex(RuntimeError, "torch.resize_: Only 2D sparse CSR tensors are supported."): + with self.assertRaisesRegex(RuntimeError, "torch.resize_: Only batched sparse CSR matrices are supported"): new_shape = (4,) a.resize_(new_shape) @@ -352,49 +406,62 @@ def test_factory_layout_invariants_check(self, device): torch.tensor([1, 2, 3, 4])) def test_factory_shape_invariants_check(self, device): - crow_indices = [0, 2, 4] - col_indices = [0, 1, 0, 1] - values = [1, 2, 3, 4] + crow_indices = torch.tensor([0, 2, 4], device=device) + col_indices = torch.tensor([0, 1, 0, 1], device=device) + values = torch.tensor([1, 2, 3, 4], device=device) size = (2, 10) - torch.sparse_csr_tensor(torch.tensor(crow_indices), torch.tensor(col_indices), torch.tensor(values), size, - device=device) + torch.sparse_csr_tensor(crow_indices, col_indices, values, size, device=device) - with self.assertRaisesRegex(RuntimeError, r"size of a CSR tensor must be of length 2, but got: 3"): - torch.sparse_csr_tensor(torch.tensor(crow_indices), torch.tensor(col_indices), torch.tensor(values), - size=(2, 10, 2), + with self.assertRaisesRegex(RuntimeError, r"size of a batched CSR tensor must have length >= 2, but got: 1"): + torch.sparse_csr_tensor(crow_indices, col_indices, values, + size=(2,), device=device) - with self.assertRaisesRegex(RuntimeError, r"crow_indices must have dim\=1 but got crow_indices\.dim\(\)\=2"): - torch.sparse_csr_tensor(torch.tensor(crow_indices).repeat(2, 1), - torch.tensor(col_indices), - torch.tensor(values), + with self.assertRaisesRegex(RuntimeError, r"crow_indices must have dim >= 1 but got crow_indices\.dim\(\)\ = 0"): + torch.sparse_csr_tensor(torch.zeros((), device=device, dtype=torch.int64), + col_indices, + values, size, device=device) - with self.assertRaisesRegex(RuntimeError, r"col_indices must have dim\=1 but got col_indices\.dim\(\)\=2"): - torch.sparse_csr_tensor(torch.tensor(crow_indices), - torch.tensor(col_indices).repeat(2, 1), - torch.tensor(values), + with self.assertRaisesRegex(RuntimeError, r"col_indices must have dim >= 1 but got col_indices\.dim\(\)\ = 0"): + torch.sparse_csr_tensor(crow_indices, + torch.zeros((), device=device, dtype=torch.int64), + values, size, device=device) - with 
self.assertRaisesRegex(RuntimeError, r"values must have dim\=1 but got values\.dim\(\)\=2"): - torch.sparse_csr_tensor(torch.tensor(crow_indices), - torch.tensor(col_indices), - torch.tensor(values).repeat(2, 1), + with self.assertRaisesRegex(RuntimeError, r"values must have dim >= 1 but got values\.dim\(\)\ = 0"): + torch.sparse_csr_tensor(crow_indices, + col_indices, + torch.zeros((), device=device, dtype=torch.int64), size, device=device) with self.assertRaisesRegex(RuntimeError, - r"crow_indices\.numel\(\) must be size\(0\) \+ 1, but got: 3"): - torch.sparse_csr_tensor(torch.tensor(crow_indices), torch.tensor(col_indices), torch.tensor(values), (1, 1), + r"crow_indices\.size\(-1\) must be equal to size\[-2\] \+ 1 \(that is 2\), but got: 3"): + torch.sparse_csr_tensor(crow_indices, col_indices, values, (1, 1), + device=device) + + + with self.assertRaisesRegex(RuntimeError, + r"Number of dimensions of crow_indices and col_indices must be the same"): + torch.sparse_csr_tensor(crow_indices, col_indices.repeat(2, 1), values, size, + device=device) + + with self.assertRaisesRegex(RuntimeError, + r"Number of dimensions of indices and values must be the same"): + torch.sparse_csr_tensor(crow_indices, col_indices, values.repeat(2, 1), size, device=device) + with self.assertRaisesRegex(RuntimeError, + r"Number of dimensions of indices must be one less"): + torch.sparse_csr_tensor(crow_indices.repeat(2, 1), col_indices.repeat(2, 1), values.repeat(2, 1), size, + device=device) with self.assertRaisesRegex(RuntimeError, - r"col_indices and values must have equal sizes, " + - r"but got col_indices\.numel\(\): 3, values\.numel\(\): 4"): - torch.sparse_csr_tensor(torch.tensor(crow_indices), torch.tensor([0, 1, 0]), torch.tensor(values), size, + r"All batch dimensions of the provided size, indices, and values must be the same"): + torch.sparse_csr_tensor(crow_indices.repeat(2, 1), col_indices.repeat(3, 1), values.repeat(4, 1), (2, 2, 10), device=device) def test_factory_indices_invariants_check(self, device): @@ -413,7 +480,7 @@ def test_factory_indices_invariants_check(self, device): with self.assertRaisesRegex(RuntimeError, r"at position i \= 2," + - r" this condition crow_indices\[i - 1\] <\= crow_indices\[i\] fails"): + r" the condition crow_indices\[i - 1\] <\= crow_indices\[i\] fails"): torch.sparse_csr_tensor(torch.tensor([0, 5, 4]), torch.tensor(col_indices), torch.tensor(values), size, device=device) @@ -421,12 +488,12 @@ def test_factory_indices_invariants_check(self, device): torch.sparse_csr_tensor(torch.tensor(crow_indices), torch.tensor([0, -1, 0, 1]), torch.tensor(values), size, device=device) - with self.assertRaisesRegex(RuntimeError, r"size\(1\) should be greater than col_indices\.max\(\)"): + with self.assertRaisesRegex(RuntimeError, r"size\[-1\] should be greater than col_indices\.max\(\)"): torch.sparse_csr_tensor(torch.tensor(crow_indices), torch.tensor([0, 11, 0, 1]), torch.tensor(values), size, device=device) @onlyCUDA - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_factory_device_type_inference(self, device, dtype): cpu_cuda = ('cpu', 'cuda') cpu_cuda_none = cpu_cuda + (None,) @@ -497,7 +564,7 @@ def test_sparse_csr_print(self, device): self.assertExpected('\n'.join(printed)) self.maxDiff = orig_maxDiff - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_sparse_csr_from_dense(self, device, dtype): dense = torch.tensor([[4, 5, 0], [0, 0, 0], [1, 0, 
0]], dtype=dtype, device=device) sparse = dense.to_sparse_csr() @@ -517,7 +584,7 @@ def test_sparse_csr_from_dense(self, device, dtype): self.assertEqual(torch.tensor([0, 1, 2] * 3, dtype=torch.int64), sparse.col_indices()) self.assertEqual(torch.tensor([2] * 9, dtype=dtype), sparse.values()) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_sparse_csr_to_dense(self, device, dtype): mn = [5, 2, 0] for (m, n) in itertools.product(mn, mn): @@ -526,12 +593,12 @@ def test_sparse_csr_to_dense(self, device, dtype): sparse = dense.to_sparse_csr() self.assertEqual(sparse.to_dense(), dense) - crow_indices = torch.tensor([0, 3, 5]) - col_indices = torch.tensor([0, 1, 2, 0, 1]) - values = torch.tensor([1, 2, 1, 3, 4], dtype=dtype) - csr = torch.sparse_csr_tensor(crow_indices, col_indices, - values, dtype=dtype, device=device) - dense = torch.tensor([[1, 2, 1], [3, 4, 0]], dtype=dtype, device=device) + batch_shape = (2, 3) + crow_indices = torch.tensor([0, 3, 5], device=device).repeat(6, 1).reshape(*batch_shape, -1) + col_indices = torch.tensor([0, 1, 2, 0, 1], device=device).repeat(6, 1).reshape(*batch_shape, -1) + values = torch.tensor([1, 2, 1, 3, 4], device=device, dtype=dtype).repeat(6, 1).reshape(*batch_shape, -1) + csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, dtype=dtype, device=device) + dense = torch.tensor([[1, 2, 1], [3, 4, 0]], dtype=dtype, device=device).repeat(6, 1).reshape(csr.shape) self.assertEqual(csr.to_dense(), dense) @skipCPUIfNoMklSparse @@ -577,7 +644,39 @@ def test_coo_to_csr_convert(self, device, dtype, coalesced): values = torch.tensor([2, 1, 6, 4, 10, 3, 5, 9, 8, 7], dtype=dtype, device=device) self.assertEqual(csr.values(), values) - @dtypes(*get_all_dtypes()) + @parametrize("blocksize", [2, 4]) + @parametrize("shape", [(24, 24), (12, 24)]) + @dtypes((torch.double, torch.int32), (torch.double, torch.int64)) + @unittest.skipIf(not TEST_SCIPY, "SciPy not found") + @skipMeta + def test_csr_to_block_csr(self, device, dtypes, shape, blocksize): + dtype, index_dtype = dtypes + m, k = shape + nnz = random.randint(0, m * k) + t = self.genSparseCSRTensor((m * blocksize, k * blocksize), nnz, dtype=dtype, + device=device, index_dtype=index_dtype) + st = sp.csr_matrix((t.values().cpu(), t.col_indices().cpu(), t.crow_indices().cpu()), shape=tuple(t.size())) + block_t = torch.sparse._csr_to_block_csr(t, (blocksize, blocksize)) + self.assertEqual(block_t.values().dim(), 3) + block_st = st.tobsr(blocksize=(blocksize, blocksize)) + self.assertEqual(block_t.values().cpu(), block_st.data) + self.assertEqual(block_t.col_indices().cpu(), torch.tensor(block_st.indices).to(index_dtype)) + self.assertEqual(block_t.crow_indices().cpu(), torch.tensor(block_st.indptr).to(index_dtype)) + + @dtypes(torch.double) + @unittest.skipIf(not TEST_SCIPY, "SciPy not found") + def test_csr_to_block_csr_errors(self, device, dtype): + for index_dtype in [torch.int32, torch.int64]: + nnz = 15 + t = self.genSparseCSRTensor((16, 16), nnz, dtype=dtype, + device=device, index_dtype=index_dtype) + with self.assertRaisesRegex(RuntimeError, "must be square."): + block_t = torch.sparse._csr_to_block_csr(t, (2, 3)) + + with self.assertRaisesRegex(RuntimeError, r"size \(16, 16\) with block size \(5, 5\)"): + block_t = torch.sparse._csr_to_block_csr(t, (5, 5)) + + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_sparse_csr_from_dense_convert_error(self, device, dtype): size = (4, 2, 4) dense = 
make_tensor(size, dtype=dtype, device=device) @@ -603,8 +702,9 @@ def test_matmul_device_mismatch(self, device, dtype): @skipCPUIfNoMklSparse @skipCUDAIfNoCusparseGeneric @dtypes(*floating_and_complex_types()) - @dtypesIfCUDA(*get_all_complex_dtypes(), - *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater)) + @dtypesIfCUDA(*floating_and_complex_types_and( + *[torch.half] if SM53OrLater else [], + *[torch.bfloat16] if SM80OrLater else [])) def test_csr_matvec(self, device, dtype): side = 100 for index_dtype in [torch.int32, torch.int64]: @@ -709,45 +809,61 @@ def run_test_block_addmm_addmv(self, addmv_addmm, c, a, b, op_b=False, op_out=Fa self.assertEqual(actual, out) self.assertEqual(actual, expected) + @parametrize("block_size", [1, 2, 3]) + @parametrize("index_dtype", [torch.int32, torch.int64]) @skipCPUIfNoMklSparse @unittest.skipIf(not TEST_SCIPY, "SciPy not found") @dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128) - def test_block_addmm(self, device, dtype): - for index_dtype in [torch.int32, torch.int64]: - for (m, n, k), block_size, noncontiguous in zip(itertools.product([1, 5], repeat=3), [1, 2, 3], [True, False]): - nnz = random.randint(0, m * k) + def test_block_addmm(self, device, dtype, index_dtype, block_size): + for (m, n, k), noncontiguous in zip(itertools.product([1, 5], repeat=3), [True, False]): + nnz = random.randint(0, m * k) + if not noncontiguous: + a = self.genSparseCSRTensor((m * block_size, k * block_size), nnz, + dtype=dtype, device=device, index_dtype=index_dtype) + a = torch.sparse._csr_to_block_csr(a, (block_size, block_size)) + else: a = self.genSparseCSRTensor((m, k), nnz, dtype=dtype, device=device, index_dtype=index_dtype) a_data = make_tensor((nnz, block_size, block_size), dtype=dtype, device=device) a_data = a_data.mT if noncontiguous else a_data # Test column-major blocks - a = torch._sparse_csr_tensor_unsafe(a.crow_indices(), a.col_indices(), a_data, (m * block_size, k * block_size)) - b = make_tensor((k * block_size, n * block_size), dtype=dtype, device=device, noncontiguous=noncontiguous) - c = make_tensor((m * block_size, n * block_size), dtype=dtype, device=device, noncontiguous=noncontiguous) - for op_b, op_out in itertools.product([True, False], repeat=2): - self.run_test_block_addmm_addmv(torch.addmm, c, a, b, op_b, op_out, dtype=dtype, device=device) - + a = torch._sparse_csr_tensor_unsafe(a.crow_indices(), a.col_indices(), + a_data, (m * block_size, k * block_size)) + b = make_tensor((k * block_size, n * block_size), dtype=dtype, device=device, noncontiguous=noncontiguous) + c = make_tensor((m * block_size, n * block_size), dtype=dtype, device=device, noncontiguous=noncontiguous) + for op_b, op_out in itertools.product([True, False], repeat=2): + self.run_test_block_addmm_addmv(torch.addmm, c, a, b, op_b, op_out, dtype=dtype, device=device) + + @parametrize("block_size", [2, 3]) + @parametrize("index_dtype", [torch.int32, torch.int64]) @skipCPUIfNoMklSparse @unittest.skipIf(not TEST_SCIPY, "SciPy not found") @dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128) - def test_block_addmv(self, device, dtype): - for index_dtype in [torch.int32, torch.int64]: - block_sizes = [1, 2, 3] - if TEST_WITH_ROCM or not TEST_CUSPARSE_GENERIC: - block_sizes = [2, 3] - for (m, k), block_size, noncontiguous in zip(itertools.product([1, 5], repeat=2), block_sizes, [True, False]): - nnz = random.randint(0, m * k) + def test_block_addmv(self, device, dtype, index_dtype, block_size): + # TODO: 
Explicitly disable block size 1 support + # if (TEST_WITH_ROCM or not TEST_CUSPARSE_GENERIC) and block_size == 1: + # return + for (m, k), noncontiguous in zip(itertools.product([1, 5], repeat=2), [True, False]): + nnz = random.randint(0, m * k) + if not noncontiguous: + a = self.genSparseCSRTensor((m * block_size, k * block_size), nnz, + dtype=dtype, device=device, index_dtype=index_dtype) + a = torch.sparse._csr_to_block_csr(a, (block_size, block_size)) + else: a = self.genSparseCSRTensor((m, k), nnz, dtype=dtype, device=device, index_dtype=index_dtype) a_data = make_tensor((nnz, block_size, block_size), dtype=dtype, device=device) - a_data = a_data.mT if noncontiguous else a_data # Test column-major blocks - a = torch._sparse_csr_tensor_unsafe(a.crow_indices(), a.col_indices(), a_data, (m * block_size, k * block_size)) - b = make_tensor((k * block_size,), dtype=dtype, device=device, noncontiguous=noncontiguous) - c = make_tensor((m * block_size,), dtype=dtype, device=device, noncontiguous=noncontiguous) - self.run_test_block_addmm_addmv(torch.addmv, c, a, b, dtype=dtype, device=device) - + a_data = a_data.mT if noncontiguous else a_data # Test column-major blocks + a = torch._sparse_csr_tensor_unsafe(a.crow_indices(), a.col_indices(), + a_data, (m * block_size, k * block_size)) + b = make_tensor((k * block_size,), dtype=dtype, device=device, noncontiguous=noncontiguous) + c = make_tensor((m * block_size,), dtype=dtype, device=device, noncontiguous=noncontiguous) + self.run_test_block_addmm_addmv(torch.addmv, c, a, b, dtype=dtype, device=device) + + @parametrize("block_size", [2, 3]) + @parametrize("index_dtype", [torch.int32, torch.int64]) @skipCPUIfNoMklSparse @skipCUDAIfRocm @unittest.skipIf(not TEST_SCIPY, "SciPy not found") @dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128) - def test_block_triangular_solve(self, device, dtype): + def test_block_triangular_solve(self, device, dtype, index_dtype, block_size): def run_test(a, b, upper, transpose, unitriangular, op_out): actual = torch.triangular_solve(b, a, upper=upper, unitriangular=unitriangular, transpose=transpose) actual_X = actual.solution @@ -782,53 +898,70 @@ def run_test(a, b, upper, transpose, unitriangular, op_out): self.assertEqual(out, actual_X) self.assertEqual(out, expected_X) - for index_dtype in [torch.int32, torch.int64]: - for (m, k), block_size, noncontiguous in zip(itertools.product([1, 5], repeat=2), [2, 3], [True, False]): - nnz = random.randint(0, m * m) + for (m, k), noncontiguous in zip(itertools.product([1, 5], repeat=2), [True, False]): + nnz = random.randint(0, m * m) + if not noncontiguous: + a = self.genSparseCSRTensor((m * block_size, m * block_size), nnz, + dtype=dtype, device=device, index_dtype=index_dtype) + a = torch.sparse._csr_to_block_csr(a, (block_size, block_size)) + else: a = self.genSparseCSRTensor((m, m), nnz, dtype=dtype, device=device, index_dtype=index_dtype) a_data = make_tensor((nnz, block_size, block_size), dtype=dtype, device=device) a_data = a_data.mT if noncontiguous else a_data # Test column-major blocks - a = torch._sparse_csr_tensor_unsafe(a.crow_indices(), a.col_indices(), a_data, (m * block_size, m * block_size)) - b = make_tensor((m * block_size, k), dtype=dtype, device=device, noncontiguous=noncontiguous) + a = torch._sparse_csr_tensor_unsafe(a.crow_indices(), a.col_indices(), + a_data, (m * block_size, m * block_size)) + b = make_tensor((m * block_size, k), dtype=dtype, device=device, noncontiguous=noncontiguous) - for (upper, unitriangular, transpose, 
op_out) in itertools.product([True, False], repeat=4): - run_test(a, b, upper, unitriangular, transpose, op_out) + for (upper, unitriangular, transpose, op_out) in itertools.product([True, False], repeat=4): + run_test(a, b, upper, unitriangular, transpose, op_out) @skipCPUIfNoMklSparse @dtypes(torch.double) def test_mm(self, device, dtype): - def test_shape(di, dj, dk, nnz): + def test_shape(di, dj, dk, nnz0=None, nnz1=None): for index_dtype in [torch.int32, torch.int64]: - x = self.genSparseCSRTensor((di, dj), nnz, device=device, dtype=dtype, index_dtype=index_dtype) - t = torch.randn(di, dk, dtype=dtype, device=device) - y = torch.randn(dj, dk, dtype=dtype, device=device) alpha = random.random() beta = random.random() - # res = beta * t + alpha * (x @ y) - res = torch.addmm(t, x, y, beta=beta, alpha=alpha) - expected = torch.addmm(t, x.to_dense(), y, beta=beta, alpha=alpha) - self.assertEqual(res, expected) - - res = torch.addmm(t, x, y) - expected = torch.addmm(t, x.to_dense(), y) - self.assertEqual(res, expected) - - res = torch.mm(x, y) - expected = torch.mm(x.to_dense(), y) - self.assertEqual(res, expected) + def _test(t, x, y): + # res = beta * t + alpha * (x @ y) + res = torch.addmm(t, x, y, beta=beta, alpha=alpha) + expected = torch.addmm(t, x.to_dense(), y.to_dense(), beta=beta, alpha=alpha) + self.assertEqual(res, expected) + + res = torch.addmm(t, x, y) + expected = torch.addmm(t, x.to_dense(), y.to_dense()) + self.assertEqual(res, expected) + + res = torch.mm(x, y) + expected = torch.mm(x.to_dense(), y.to_dense()) + self.assertEqual(res, expected) + + if nnz0 is None: + nnz0 = random.randint(di * dk // 2, di * dk) + t = torch.randn(di, dj, dtype=dtype, device=device) + x = self.genSparseCSRTensor((di, dk), nnz0, device=device, dtype=dtype, index_dtype=index_dtype) + y = torch.randn(dk, dj, dtype=dtype, device=device) + _test(t, x, y) + + if nnz1 is None: + nnz1 = random.randint(dk * dj // 2, dk * dj) + t = torch.randn(di, dj, dtype=dtype, device=device) + x = torch.randn(di, dk, dtype=dtype, device=device) + y = self.genSparseCSRTensor((dk, dj), nnz1, device=device, dtype=dtype, index_dtype=index_dtype) + _test(t, x, y) for i in range(2, 5): for j in range(2, 8): for k in range(2, 8): - test_shape(i, j, k, i * j // 2) - test_shape(4, 4, 4, 0) + test_shape(i, j, k) + test_shape(4, 4, 4, 0, 0) @skipCPUIfNoMklSparse @dtypes(*floating_and_complex_types()) - @dtypesIfCUDA(*get_all_complex_dtypes(), - *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, - include_bfloat16=SM80OrLater and TEST_CUSPARSE_GENERIC)) + @dtypesIfCUDA(*floating_and_complex_types_and( + *[torch.half] if SM53OrLater and TEST_CUSPARSE_GENERIC else [], + *[torch.bfloat16] if SM80OrLater and TEST_CUSPARSE_GENERIC else [])) @precisionOverride({torch.bfloat16: 1e-2, torch.float16: 1e-2}) def test_sparse_mm(self, device, dtype): def test_shape(d1, d2, d3, nnz, transposed, index_dtype): @@ -845,9 +978,9 @@ def test_shape(d1, d2, d3, nnz, transposed, index_dtype): test_shape(7, 8, 9, 20, True, index_dtype) @dtypes(*floating_and_complex_types()) - @dtypesIfCUDA(*get_all_complex_dtypes(), - *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, - include_bfloat16=SM80OrLater and TEST_CUSPARSE_GENERIC)) + @dtypesIfCUDA(*floating_and_complex_types_and( + *[torch.half] if SM53OrLater and TEST_CUSPARSE_GENERIC else [], + *[torch.bfloat16] if SM80OrLater and TEST_CUSPARSE_GENERIC else [])) @precisionOverride({torch.bfloat16: 1e-2, torch.float16: 1e-2}) def test_sparse_addmm(self, device, 
dtype): def test_shape(m, n, p, nnz, broadcast, index_dtype, alpha_beta=None): @@ -879,10 +1012,10 @@ def test_shape(m, n, p, nnz, broadcast, index_dtype, alpha_beta=None): @dtypes(*floating_and_complex_types()) @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6, torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8}) - @dtypesIfCUDA(torch.complex64, - *((torch.complex128,) if CUSPARSE_SPMM_COMPLEX128_SUPPORTED else ()), - *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, - include_half=SM53OrLater)) + @dtypesIfCUDA(*floating_types_and(torch.complex64, + *[torch.bfloat16] if SM80OrLater else [], + *[torch.half] if SM53OrLater else [], + *[torch.complex128] if CUSPARSE_SPMM_COMPLEX128_SUPPORTED else [])) @skipCUDAIf( not _check_cusparse_spgemm_available(), "cuSparse Generic API SpGEMM is not available" @@ -950,32 +1083,32 @@ def maybe_transpose(cond, m): m2 = maybe_transpose(t3, torch.randn(50, 25, device=device).to(dtype)) _test_addmm_addmv(self, torch.addmm, M, m1, m2, transpose_out=t4, layout=torch.sparse_csr, mode="dense_result") + @parametrize("k", [0, 1, 8]) + @parametrize("n", [0, 1, 10]) + @parametrize("m", [0, 1, 25]) @skipCPUIfNoMklSparse @dtypes(*floating_and_complex_types()) - @dtypesIfCUDA(torch.complex64, - *((torch.complex128,) if CUSPARSE_SPMM_COMPLEX128_SUPPORTED else ()), - *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, - include_half=SM53OrLater)) + @dtypesIfCUDA(*floating_types_and(torch.complex64, + *[torch.bfloat16] if SM80OrLater else [], + *[torch.half] if SM53OrLater else [], + *[torch.complex128] if CUSPARSE_SPMM_COMPLEX128_SUPPORTED else [])) @skipCUDAIf( not _check_cusparse_spgemm_available(), "cuSparse Generic API SpGEMM is not available" ) @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6, torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8}) - def test_addmm_sizes_all_sparse_csr(self, device, dtype): - for m in [0, 1, 25]: - for n in [0, 1, 10]: - for k in [0, 1, 8]: - M = torch.randn(n, m, device=device).to(dtype) - m1 = torch.randn(n, k, device=device).to(dtype) - m2 = torch.randn(k, m, device=device).to(dtype) - _test_addmm_addmv(self, torch.addmm, M, m1, m2, layout=torch.sparse_csr, mode="all_sparse") - - M = torch.randn(n, m, device=device).to(dtype).to_sparse_csr() - m1 = torch.randn(n, k + 1, device=device).to(dtype).to_sparse_csr() - m2 = torch.randn(k, m, device=device).to(dtype).to_sparse_csr() - self.assertRaisesRegex(RuntimeError, f"{n}x{k + 1}.*{k}x{m}", lambda: torch.addmm(M, m1, m2)) - self.assertRaisesRegex(RuntimeError, f"{n}x{k + 1}.*{k}x{m}", lambda: torch.mm(m1, m2)) + def test_addmm_sizes_all_sparse_csr(self, device, dtype, m, n, k): + M = torch.randn(n, m, device=device).to(dtype) + m1 = torch.randn(n, k, device=device).to(dtype) + m2 = torch.randn(k, m, device=device).to(dtype) + _test_addmm_addmv(self, torch.addmm, M, m1, m2, layout=torch.sparse_csr, mode="all_sparse") + + M = torch.randn(n, m, device=device).to(dtype).to_sparse_csr() + m1 = torch.randn(n, k + 1, device=device).to(dtype).to_sparse_csr() + m2 = torch.randn(k, m, device=device).to(dtype).to_sparse_csr() + self.assertRaisesRegex(RuntimeError, f"{n}x{k + 1}.*{k}x{m}", lambda: torch.addmm(M, m1, m2)) + self.assertRaisesRegex(RuntimeError, f"{n}x{k + 1}.*{k}x{m}", lambda: torch.mm(m1, m2)) @skipCPUIfNoMklSparse @dtypes(torch.float) @@ -1051,6 +1184,9 @@ def test2(*, is_sparse): @dtypes(torch.float, torch.double) def test_add(self, device, dtype): def _test_spadd_shape(nnz, shape): 
+ # sparse.to_dense() uses torch.add internally so if torch.add is wrong, + # the dense tensor will be wrong but this test would still pass + # there's a separate test that checks for the correctness of the .to_dense() call x = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=torch.int32) y = torch.randn(*shape, dtype=dtype, device=device) r = random.random() @@ -1072,10 +1208,42 @@ def _test_spadd_shape(nnz, shape): self.assertEqual(res, expected) - _test_spadd_shape(10, [100, 100]) - _test_spadd_shape(0, [100, 100]) - _test_spadd_shape(10, [100, 1]) - _test_spadd_shape(10, [1, 100]) + ns = [2, 5] + batch_shapes = [(), (2,), (2, 3)] + for b, m, n in itertools.product(batch_shapes, ns, ns): + _test_spadd_shape(0, (*b, m, n)) + _test_spadd_shape(m * n // 2, (*b, m, n)) + _test_spadd_shape(m * n, (*b, m, n)) + + @dtypes(torch.float, torch.double) + def test_mul(self, device, dtype): + def _test_spadd_shape(fn, nnz, shape): + x = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=torch.int32) + y = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=torch.int32) + + res = fn(y, x) + expected = fn(y.to_dense(), x.to_dense()).to_sparse_csr() + self.assertEqual(res, expected) + + _test_spadd_shape(torch.mul, 100, [100, 100]) + _test_spadd_shape(torch.mul, 0, [100, 100]) + _test_spadd_shape(torch.mul, 100, [100, 1]) + _test_spadd_shape(torch.mul, 100, [1, 100]) + + s = torch.sparse_coo_tensor([[0], [1]], [5.0], (2, 3), device=device) + s = s.to_sparse_csr() + t23 = s.to_dense() + + if device == 'cpu': + with self.assertRaisesRegex(RuntimeError, r"mul\(sparse_csr, dense\) is not supported"): + s * t23 + with self.assertRaisesRegex(RuntimeError, r"mul\(dense, sparse_csr\) is not supported"): + t23 * s + elif device == 'cuda': + with self.assertRaisesRegex(NotImplementedError, "CUDA"): + s * t23 + with self.assertRaisesRegex(NotImplementedError, "CUDA"): + t23 * s @skipCPUIfNoMklSparse @dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128) @@ -1297,7 +1465,7 @@ def test_sampled_addmm_errors(self, device, dtype): torch.sparse.sampled_addmm(a_sparse, a, a_sparse) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_coo_csr_conversion(self, device, dtype): for m, n in itertools.product([5, 2, 0], [5, 2, 0]): size = (m, n) @@ -1308,7 +1476,7 @@ def test_coo_csr_conversion(self, device, dtype): self.assertEqual(csr_sparse.to_dense(), dense) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_csr_coo_conversion(self, device, dtype): for m, n in itertools.product([5, 2, 0], [5, 2, 0]): size = (m, n) @@ -1332,7 +1500,9 @@ def test_sparse_csr_consistency(self, device, dtype, op): # Sparse CSR only supports 2D tensors as inputs if sample.input.ndim != 2: continue - + # Reductions on sparse CSR require keepdim=True + if isinstance(op, ReductionOpInfo): + continue expected = op(sample.input) assert torch.is_tensor(expected) output = op(sample.input.to_sparse_csr()) @@ -1389,10 +1559,7 @@ def test_sparse_csr_unary_out(self, device, dtype, op): index_dtype=sample.input.crow_indices().dtype) op(sample.input, *sample.args, **sample.kwargs, out=out) - self.assertEqual(out.values(), expect.values()) - self.assertEqual(out.crow_indices(), expect.crow_indices()) - self.assertEqual(out.col_indices(), expect.col_indices()) - self.assertEqual(out._nnz(), expect._nnz()) + 
self.assertEqual(out, expect) @ops(sparse_csr_unary_ufuncs) def test_sparse_csr_unary_inplace(self, device, dtype, op): @@ -1424,10 +1591,7 @@ def test_sparse_csr_unary_inplace(self, device, dtype, op): actual = op.inplace_variant(sample.input, *sample.args, **sample.kwargs) self.assertIs(actual, sample.input) - self.assertEqual(actual.values(), expect.values()) - self.assertEqual(actual.crow_indices(), expect.crow_indices()) - self.assertEqual(actual.col_indices(), expect.col_indices()) - self.assertEqual(actual._nnz(), expect._nnz()) + self.assertEqual(actual, expect) @unittest.expectedFailure @ops(sparse_csr_unary_ufuncs, dtypes=OpDTypes.supported, allowed_dtypes=[torch.double, torch.cdouble]) @@ -1469,7 +1633,8 @@ def test_autograd_dense_output_addmm(self, device, dtype): raise ValueError("Expected at least one 2D tensor in samples to convert to sparse.") for sample in samples: - a = sample.args[0].to_sparse_csr() + # TODO: Remove detach once we have autograd support for CSR input + a = sample.args[0].to_sparse_csr().detach() for addmm in [torch.addmm, torch.sparse.addmm]: @@ -1500,7 +1665,8 @@ def test_autograd_dense_output_addmv(self, device, dtype): raise ValueError("Expected at least one 2D tensor in samples to convert to sparse.") for sample in samples: - a = sample.args[0].to_sparse_csr() + # TODO: Remove detach once we have autograd support for CSR input + a = sample.args[0].to_sparse_csr().detach() def fn(c, b): output = torch.addmv(c, a, b, **sample.kwargs) @@ -1532,7 +1698,8 @@ def test_autograd_dense_output(self, device, dtype, op): # Here we assume that the signature is op(sparse_input, dense_input) -> dense_output for sample in samples: - sparse_input = sample.input.to_sparse_csr() + # TODO: Remove detach once we have autograd support for CSR input + sparse_input = sample.input.to_sparse_csr().detach() def fn(*args): output = op.gradcheck_wrapper(op.get_op(), sparse_input, *args, **sample.kwargs) @@ -1546,7 +1713,7 @@ def fn(*args): args = [make_tensor(a.shape, device=device, dtype=dtype, noncontiguous=True, requires_grad=True) for a in sample.args] self.assertTrue(torch.autograd.gradcheck(fn, args, fast_mode=True)) - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex()) def test_direct_coo_csr_conversion(self, device, dtype): for m, n in itertools.product([5, 2, 0], [5, 2, 0]): size = (m, n) @@ -1556,7 +1723,27 @@ def test_direct_coo_csr_conversion(self, device, dtype): self.assertEqual(coo_sparse.to_sparse_csr().to_sparse_coo(), coo_sparse) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) + def test_sum(self, device, dtype): + def run_test(shape, nnz, index_type): + a = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=index_dtype) + self.assertEqual(a.sum(), a.values().sum()) + if dtype in floating_types(): + a.requires_grad_(True) + with self.assertRaisesRegex(RuntimeError, + ("Function SumBackward0 returned an invalid gradient at " + + "index 0 - expected layout SparseCsr but got Strided")): + a.sum().backward() + for shape, index_dtype in itertools.product( + [(10, 5), (10, 10)], + [torch.int32, torch.int64]): + run_test(shape, 0, index_dtype) + run_test(shape, max(shape), index_dtype) + run_test(shape, shape[0] * shape[1], index_dtype) + + + @skipMeta + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_transpose(self, device, dtype): def run_test(shape, nnz, index_type, dim0, dim1): @@ -1577,16 +1764,14 @@ def 
run_test(shape, nnz, index_type, dim0, dim1): # TODO: This is a stopgap for a rigorous extension of our autograd tests # to test the functionality of detach @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_exercise_detach(self, device, dtype): shape = (3, 3) nnz = 4 for index_dtype in [torch.int32, torch.int64]: inp = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=index_dtype) detached_inp = inp.detach() - self.assertEqual(inp.values(), detached_inp.values()) - self.assertEqual(inp.crow_indices(), detached_inp.crow_indices()) - self.assertEqual(inp.col_indices(), detached_inp.col_indices()) + self.assertEqual(inp, detached_inp) diff --git a/test/test_spectral_ops.py b/test/test_spectral_ops.py index c11b87b507aec1..344c810bd4bd44 100644 --- a/test/test_spectral_ops.py +++ b/test/test_spectral_ops.py @@ -13,7 +13,7 @@ (TestCase, run_tests, TEST_NUMPY, TEST_LIBROSA, TEST_MKL) from torch.testing._internal.common_device_type import \ (instantiate_device_type_tests, ops, dtypes, onlyNativeDeviceTypes, - skipCPUIfNoFFT, deviceCountAtLeast, onlyCUDA, OpDTypes, skipIf) + skipCPUIfNoFFT, skipCUDAIfRocm, deviceCountAtLeast, onlyCUDA, OpDTypes, skipIf) from torch.testing._internal.common_methods_invocations import ( spectral_funcs, SpectralFuncInfo, SpectralFuncType) @@ -204,6 +204,7 @@ def get_op_name(op): else: return (input, s, dim, norm) + @skipCUDAIfRocm @onlyNativeDeviceTypes @ops([op for op in spectral_funcs if op.ndimensional == SpectralFuncType.OneD]) def test_reference_1d(self, device, dtype, op): @@ -367,6 +368,7 @@ def test_fft_half_and_bfloat16_errors(self, device, dtype, op): op(x) # nd-fft tests + @skipCUDAIfRocm @onlyNativeDeviceTypes @unittest.skipIf(not TEST_NUMPY, 'NumPy not found') @ops([op for op in spectral_funcs if op.ndimensional == SpectralFuncType.ND]) diff --git a/test/test_tensor_creation_ops.py b/test/test_tensor_creation_ops.py index abb9710363cfe5..27a91c398b2679 100644 --- a/test/test_tensor_creation_ops.py +++ b/test/test_tensor_creation_ops.py @@ -20,8 +20,10 @@ onlyCPU, largeTensorTest, precisionOverride, dtypes, onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types) from torch.testing._internal.common_dtype import ( - get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes + all_types_and_complex_and, get_all_math_dtypes, all_types_and, floating_and_complex_types, + floating_types, floating_and_complex_types_and, integral_types_and ) +from torch.testing._creation import float_to_corresponding_complex_type_map from torch.utils.dlpack import to_dlpack @@ -147,7 +149,7 @@ def test_vander_types(self, device, dtype): exact_dtype=False) def test_cat_all_dtypes_and_devices(self, device): - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): x = torch.tensor([[1, 2], [3, 4]], dtype=dt, device=device) expected1 = torch.tensor([[1, 2], [3, 4], [1, 2], [3, 4]], dtype=dt, device=device) @@ -157,7 +159,7 @@ def test_cat_all_dtypes_and_devices(self, device): self.assertEqual(torch.cat((x, x), 1), expected2) def test_fill_all_dtypes_and_devices(self, device): - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): for x in [torch.tensor((10, 10), dtype=dt, device=device), torch.empty(10000, dtype=dt, device=device)]: # large tensor numel = x.numel() @@ -311,7 +313,7 @@ def run_test(shape, device, diagonal, dtype): 
(3, 1), (5, 3, 1), (7, 5, 3, 1), # very fat matrices (1, 3), (5, 1, 3), (7, 5, 1, 3), # very thin matrices (1, 3, 3, 3), (3, 1, 3, 3, 3)] # unsqueezed batch dimensions - dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16] + dtypes = all_types_and_complex_and(torch.half, torch.bool) for s, d, dtype in product(shapes, diagonals, dtypes): run_test(s, device, d, dtype) @@ -508,12 +510,12 @@ def test_block_diag_scipy(self, device): self.assertEqual(torch_result, scipy_result) @onlyNativeDeviceTypes - @dtypes(torch.float32, torch.float64) + @dtypes(torch.half, torch.float32, torch.float64) def test_torch_complex(self, device, dtype): real = torch.tensor([1, 2], device=device, dtype=dtype) imag = torch.tensor([3, 4], device=device, dtype=dtype) z = torch.complex(real, imag) - complex_dtype = torch.complex64 if dtype == torch.float32 else torch.complex128 + complex_dtype = float_to_corresponding_complex_type_map[dtype] self.assertEqual(torch.tensor([1.0 + 3.0j, 2.0 + 4.0j], dtype=complex_dtype), z) @onlyNativeDeviceTypes @@ -531,12 +533,12 @@ def test_torch_polar(self, device, dtype): @onlyNativeDeviceTypes @dtypes(torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64, - torch.float16, torch.complex64, torch.complex128, torch.bool) + torch.complex64, torch.complex128, torch.bool) def test_torch_complex_floating_dtype_error(self, device, dtype): for op in (torch.complex, torch.polar): a = torch.tensor([1, 2], device=device, dtype=dtype) b = torch.tensor([3, 4], device=device, dtype=dtype) - error = r"Expected both inputs to be Float or Double tensors but " \ + error = r"Expected both inputs to be Half, Float or Double tensors but " \ r"got [A-Za-z]+ and [A-Za-z]+" with self.assertRaisesRegex(RuntimeError, error): op(a, b) @@ -1009,8 +1011,7 @@ def _test_special_stacks(self, dim, at_least_dim, torch_fn, np_fn, device, dtype np_fn(np_input) @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half)) def test_hstack_column_stack(self, device, dtype): ops = ((torch.hstack, np.hstack), (torch.column_stack, np.column_stack)) for torch_op, np_op in ops: @@ -1029,8 +1030,7 @@ def test_hstack_column_stack(self, device, dtype): torch_result) @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half)) def test_vstack_row_stack(self, device, dtype): ops = ((torch.vstack, np.vstack), (torch.row_stack, np.row_stack)) for torch_op, np_op in ops: @@ -1047,8 +1047,7 @@ def test_vstack_row_stack(self, device, dtype): self.assertEqual(actual, expected) @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half)) def test_dstack(self, device, dtype): self._test_special_stacks(2, 3, torch.dstack, np.dstack, device, dtype) for i in range(5): @@ -1600,6 +1599,10 @@ def test_cartesian_prod(self, device): def test_combinations(self, device): a = torch.tensor([1, 2, 3], device=device) + c = torch.combinations(a, r=0) + expected = torch.empty(0, dtype=a.dtype, device=device) + self.assertEqual(c, expected) + c = torch.combinations(a, r=1) expected = torch.tensor(list(combinations(a, r=1)), device=device) self.assertEqual(c, expected) @@ -1752,7 +1755,7 @@ def test_random_from_to_bool(self, device): lambda: t.random_(from_, to_) ) - 
@dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.bfloat16, torch.half)) def test_random_full_range(self, device, dtype): size = 2000 alpha = 0.1 @@ -1786,7 +1789,7 @@ def test_random_full_range(self, device, dtype): self.assertTrue(from_ <= t.to(torch.double).min() < (from_ + delta)) self.assertTrue((to_inc_ - delta) < t.to(torch.double).max() <= to_inc_) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.bfloat16, torch.half)) def test_random_from_to(self, device, dtype): size = 2000 alpha = 0.1 @@ -1875,7 +1878,7 @@ def test_random_from_to(self, device, dtype): lambda: t.random_(from_, to_) ) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.bfloat16, torch.half)) def test_random_to(self, device, dtype): size = 2000 alpha = 0.1 @@ -1933,7 +1936,7 @@ def test_random_to(self, device, dtype): lambda: t.random_(from_, to_) ) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.bfloat16, torch.half)) def test_random_default(self, device, dtype): size = 2000 alpha = 0.1 @@ -2124,13 +2127,7 @@ def test_constructor_dtypes(self, device): self.assertRaises(TypeError, lambda: torch.set_default_tensor_type(torch.float32)) # don't allow passing dtype to set_default_dtype - for t in get_all_dtypes( - include_half=True, - include_bfloat16=True, - include_bool=True, - include_complex=True, - include_complex32=True, - include_qint=True): + for t in all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16, torch.qint8): # only floating-point types are supported as the default type if t in ( torch.half, @@ -2668,8 +2665,17 @@ def test_empty_tensor_props(self, device): y = torch.empty(tuple(size_ones_instead_of_zeros), device=device) self.assertEqual(x.stride(), y.stride()) + @onlyNativeDeviceTypes + def test_empty_overflow(self, device): + with self.assertRaisesRegex(RuntimeError, 'Storage size calculation overflowed'): + torch.empty([2, 4, 2**29, 2**29], dtype=torch.float64) + with self.assertRaisesRegex(RuntimeError, 'Storage size calculation overflowed'): + torch.empty([8, 8, 2**29, 2**29], dtype=torch.float64) + with self.assertRaisesRegex(RuntimeError, 'Storage size calculation overflowed'): + torch.empty_strided([8, 8], [2**61, 1], dtype=torch.float64) + def test_eye(self, device): - for dtype in get_all_dtypes(): + for dtype in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): if dtype == torch.bfloat16: continue # Test the RuntimeError is raised when either m or n is a negative number @@ -2702,8 +2708,7 @@ def test_eye(self, device): self.assertEqual(res1, res2) @precisionOverride({torch.float: 1e-8, torch.double: 1e-10}) - @dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) + - get_all_complex_dtypes())) + @dtypes(*floating_and_complex_types()) def test_linspace_vs_numpy(self, device, dtype): start = -0.0316082797944545745849609375 + (0.8888888888j if dtype.is_complex else 0) end = .0315315723419189453125 + (0.444444444444j if dtype.is_complex else 0) @@ -2740,7 +2745,7 @@ def test_logspace_vs_numpy_complex(self, device, dtype): device, dtype) @precisionOverride({torch.float: 1e-6, torch.double: 1e-10}) - @dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False)) + @dtypes(*floating_types()) def test_logspace_vs_numpy(self, device, dtype): start = -0.0316082797944545745849609375 end = .0315315723419189453125 @@ -2832,8 +2837,6 @@ def test_signal_window_functions(self, device, dtype, window): 
self._test_signal_window_functions(window, dtype, device) @onlyNativeDeviceTypes - # See https://github.com/pytorch/pytorch/issues/72630 - @skipMeta @precisionOverride({torch.bfloat16: 5e-2, torch.half: 1e-3}) @unittest.skipIf(not TEST_SCIPY, "Scipy not found") @dtypesIfCUDA(torch.float, torch.double, torch.bfloat16, torch.half, torch.long) @@ -2847,7 +2850,7 @@ def test_tensor_factories_empty(self, device): shapes = [(5, 0, 1), (0,), (0, 0, 1, 0, 2, 0, 0)] for shape in shapes: - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): self.assertEqual(shape, torch.zeros(shape, device=device, dtype=dt).shape) self.assertEqual(shape, torch.zeros_like(torch.zeros(shape, device=device, dtype=dt)).shape) @@ -2933,8 +2936,8 @@ def test_arange_bfloat16(self, device): bfloat16_tensor = torch.arange(0, 6, step=2, dtype=torch.bfloat16, device=device) self.assertEqual(ref_tensor, bfloat16_tensor) - @dtypes(*get_all_dtypes(include_bool=False, include_half=False)) - @dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True)) + @dtypes(*all_types_and_complex_and(torch.bfloat16)) + @dtypesIfCUDA(*all_types_and_complex_and(torch.bfloat16)) def test_linspace(self, device, dtype): _from = random.random() to = _from + random.random() @@ -3051,12 +3054,12 @@ def _test_linspace(self, device, dtype, steps): # See NOTE [Linspace+Logspace precision override] @skipCPUIf(True, "compares with CPU") @precisionOverride({torch.half: 0.0039 + LINSPACE_LOGSPACE_EXTRA_EPS}) - @dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes())) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) def test_linspace_device_vs_cpu(self, device, dtype): self._test_linspace(device, dtype, steps=10) @skipCPUIf(True, "compares with CPU") - @dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes())) + @dtypes(*floating_and_complex_types_and(torch.half, torch.bfloat16)) def test_linspace_special_steps(self, device, dtype): for steps in self.LINSPACE_LOGSPACE_SPECIAL_STEPS: self._test_linspace(device, dtype, steps=steps) @@ -3097,10 +3100,9 @@ def test_logspace_special_steps(self, device, dtype): self._test_logspace(device, dtype, steps=steps) self._test_logspace_base2(device, dtype, steps=steps) - @dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False)) - @dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16]) - if TEST_WITH_ROCM - else get_all_dtypes(include_bool=False, include_half=True, include_complex=False))) + @dtypes(*all_types_and(torch.bfloat16)) + @dtypesIfCUDA(*integral_types_and(torch.half, torch.bfloat16, torch.float32) if TEST_WITH_ROCM else + all_types_and(torch.half, torch.bfloat16)) def test_logspace(self, device, dtype): _from = random.random() to = _from + random.random() @@ -3898,7 +3900,7 @@ def check(**kwargs): # data pointer (which is basically the point here), since they all # return 0. @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_alias_from_tensor(self, device, dtype): self._test_alias_with_cvt(identity, device, dtype) @@ -3909,7 +3911,7 @@ def test_alias_from_numpy(self, device, dtype): # Skipping 'meta', since 'to_dlpack' does not work for them. 
@skipMeta - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_alias_from_dlpack(self, device, dtype): self._test_alias_with_cvt(to_dlpack, device, dtype) @@ -3941,13 +3943,13 @@ def check(**kwargs): # Copy is forced because of different dtype if not only_with_dtype: - for other in get_all_dtypes(): + for other in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): if dtype != other: check(same_dtype=False, dtype=other) check(same_dtype=False, dtype=other, copy=True) @skipMeta - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_copy_tensor(self, device, dtype): self._test_copy_with_cvt(identity, device, dtype) @@ -3957,7 +3959,7 @@ def test_copy_from_numpy(self, device, dtype): self._test_copy_with_cvt(to_numpy, device, dtype) @skipMeta - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_copy_from_dlpack(self, device, dtype): self._test_copy_with_cvt(to_dlpack, device, dtype) @@ -3980,17 +3982,17 @@ def check(**kwargs): @onlyCUDA @deviceCountAtLeast(2) - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_copy_from_tensor_mult_devices(self, devices, dtype): self._test_copy_mult_devices(devices, dtype, identity) @onlyCUDA @deviceCountAtLeast(2) - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_copy_from_dlpack_mult_devices(self, devices, dtype): self._test_copy_mult_devices(devices, dtype, to_dlpack) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_copy_list(self, device, dtype): original = make_tensor((5, 5), dtype=dtype, device=torch.device("cpu")) @@ -4071,6 +4073,8 @@ def test_astensor_consistency(self, device): [0.0, True, False, 42], # With Complex [0.0, True, False, 42, 5j], + # With Range + range(5), ] for e in examples: diff --git a/test/test_tensorboard.py b/test/test_tensorboard.py index 4300e9a71006bf..7f34fd90dd5c14 100644 --- a/test/test_tensorboard.py +++ b/test/test_tensorboard.py @@ -562,15 +562,15 @@ def forward(self, x): expected_proto = GraphDef() text_format.Parse(expected_str, expected_proto) - self.assertEquals(len(expected_proto.node), len(actual_proto.node)) + self.assertEqual(len(expected_proto.node), len(actual_proto.node)) for i in range(len(expected_proto.node)): expected_node = expected_proto.node[i] actual_node = actual_proto.node[i] - self.assertEquals(expected_node.name, actual_node.name) - self.assertEquals(expected_node.op, actual_node.op) - self.assertEquals(expected_node.input, actual_node.input) - self.assertEquals(expected_node.device, actual_node.device) - self.assertEquals( + self.assertEqual(expected_node.name, actual_node.name) + self.assertEqual(expected_node.op, actual_node.op) + self.assertEqual(expected_node.input, actual_node.input) + self.assertEqual(expected_node.device, actual_node.device) + self.assertEqual( sorted(expected_node.attr.keys()), sorted(actual_node.attr.keys())) def test_nested_nn_squential(self): diff --git a/test/test_tensorexpr.py b/test/test_tensorexpr.py index 42ca49dc347574..8a5e918eda4b97 100644 --- a/test/test_tensorexpr.py +++ b/test/test_tensorexpr.py @@ -13,11 +13,13 @@ class BaseTestClass(JitTestCase): def setUp(self): + super(BaseTestClass, self).setUp() self.tensorexpr_options = 
TensorExprTestOptions()
         self.devices = ['cpu'] if not torch.cuda.is_available() else ['cpu', 'cuda']
 
     def tearDown(self):
         self.tensorexpr_options.restore()
+        super(BaseTestClass, self).tearDown()
 
     def assertLastGraphAllFused(self):
         self.assertAllFused(torch.jit.last_executed_optimized_graph())
diff --git a/test/test_testing.py b/test/test_testing.py
index 948890f87f5c6e..2ccb6ff3628237 100644
--- a/test/test_testing.py
+++ b/test/test_testing.py
@@ -22,14 +22,13 @@
     deviceCountAtLeast, ops, expectedFailureMeta)
 from torch.testing._internal.common_methods_invocations import op_db
 import torch.testing._internal.opinfo_helper as opinfo_helper
-from torch.testing._internal.common_dtype import get_all_dtypes
+from torch.testing._internal.common_dtype import all_types_and_complex_and
 from torch.testing._internal.common_modules import modules, module_db
 
 # For testing TestCase methods and torch.testing functions
 class TestTesting(TestCase):
     # Ensure that assertEqual handles numpy arrays properly
-    @dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
-                             include_bool=True, include_complex=True)))
+    @dtypes(*all_types_and_complex_and(torch.bool, torch.half))
     def test_assertEqual_numpy(self, device, dtype):
         S = 10
         test_sizes = [
@@ -279,6 +278,11 @@ def check(size, low, high, requires_grad, noncontiguous):
             check(size, None, None, False, False)
             check(size, 2, 4, True, True)
 
+    def test_make_tensor_complex32(self, device):
+        # verify that we can generate torch.complex32 tensor
+        t = make_tensor((1, 2, 3), dtype=torch.complex32, device=device)
+        self.assertEqual(t.dtype, torch.complex32)
+
     # The following tests (test_cuda_assert_*) are added to ensure test suite terminates early
     # when CUDA assert was thrown. Because all subsequent test will fail if that happens.
     # These tests are slow because it spawn another process to run test suite.
@@ -403,7 +407,7 @@ def test_get_supported_dtypes(self, device): ops_to_test = list(filter(lambda op: op.name in ['atan2', 'topk', 'xlogy'], op_db)) for op in ops_to_test: - dynamic_dtypes = opinfo_helper.get_supported_dtypes(op.op, op.sample_inputs_func, self.device_type) + dynamic_dtypes = opinfo_helper.get_supported_dtypes(op, op.sample_inputs_func, self.device_type) dynamic_dispatch = opinfo_helper.dtypes_dispatch_hint(dynamic_dtypes) if self.device_type == 'cpu': dtypes = op.dtypesIfCPU diff --git a/test/test_torch.py b/test/test_torch.py index 67f820457b7496..4f4e53f3e7487c 100644 --- a/test/test_torch.py +++ b/test/test_torch.py @@ -52,9 +52,11 @@ from typing import Tuple import torch.backends.quantized import torch.testing._internal.data -from torch.testing._internal.common_cuda import tf32_on_and_off, tf32_is_not_fp32 +from torch.testing._internal.common_cuda import ( + tf32_on_and_off, tf32_is_not_fp32, TEST_CUDNN) from torch.testing._internal.common_dtype import ( - get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes + floating_types_and, get_all_math_dtypes, all_types_and_complex_and, complex_types, + all_types_and, floating_types, floating_and_complex_types, integral_types, ) # Protects against includes accidentally setting the default dtype @@ -116,19 +118,6 @@ def test_cuda_vitals_gpu_only(self, device): class TestTorchDeviceType(TestCase): exact_dtype = True - # FIXME: Port this to ErrorInputs on where - @onlyCUDA - @dtypes(torch.float32) - def test_where_invalid_device(self, device, dtype): - for devices in [('cpu', device, device), (device, 'cpu', 'cpu'), - (device, 'cpu', device), ('cpu', device, 'cpu')]: - condition = make_tensor(16, device=devices[0], dtype=torch.float32) - x = make_tensor(16, device=devices[1], dtype=torch.float32) - y = make_tensor(16, device=devices[2], dtype=torch.float32) - with self.assertRaisesRegex(RuntimeError, - "Expected condition, x and y to be on the same device"): - torch.where(condition, x, y) - # TODO: move all tensor creation to common ops def _rand_shape(self, dim, min_size, max_size): shape = [] @@ -233,7 +222,17 @@ def test_storage_setitem(self, device, dtype): self.assertEqual(s, storage_type(l)) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) + def test_tensor_storage_type(self, device, dtype): + a = make_tensor((10,), dtype=dtype, device=device, low=-9, high=9) + + module = torch.cuda if (torch.device(device).type == 'cuda') else torch + expected_storage_type = getattr(module, torch.storage._dtype_to_storage_type_map()[dtype]) + + self.assertEqual(a.storage_type(), expected_storage_type) + + @onlyNativeDeviceTypes + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_tensor_from_storage(self, device, dtype): a = make_tensor((4, 5, 3), dtype=dtype, device=device, low=-9, high=9) a_s = a.storage() @@ -242,7 +241,7 @@ def test_tensor_from_storage(self, device, dtype): c = torch.tensor(a_s._untyped(), device=device, dtype=dtype).reshape(a.size()) self.assertEqual(a, c) - for error_dtype in get_all_dtypes(): + for error_dtype in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): if error_dtype == dtype: continue with self.assertRaisesRegex(RuntimeError, r'Expected a Storage of type'): @@ -250,7 +249,7 @@ def test_tensor_from_storage(self, device, dtype): torch.tensor(error_storage, device=device, dtype=dtype) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) 
+ @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_set_storage(self, device, dtype): a = make_tensor((4, 5, 3), dtype=dtype, device=device, low=-9, high=9) a_s = a.storage() @@ -259,7 +258,7 @@ def test_set_storage(self, device, dtype): c = torch.tensor([], device=device, dtype=dtype).set_(a_s._untyped()).reshape(a.size()) self.assertEqual(a, c) - for error_dtype in get_all_dtypes(): + for error_dtype in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): if error_dtype == dtype: continue with self.assertRaisesRegex(RuntimeError, r'Expected a Storage of type'): @@ -460,26 +459,12 @@ def test_scalar_check(self, device): self.assertEqual((), torch.cummax(zero_d, 0)[0].shape) self.assertEqual((), torch.cummin(zero_d, 0)[0].shape) - # renorm - self.assertRaises(RuntimeError, lambda: torch.renorm(zero_d, 0.5, 0, 1.0)) - # sort, topk self.assertEqual([(), ()], [x.shape for x in torch.sort(zero_d, 0, False)]) self.assertEqual([(), ()], [x.shape for x in torch.sort(zero_d, 0, True)]) self.assertEqual([(), ()], [x.shape for x in torch.topk(zero_d, 1, 0, False)]) self.assertEqual([(), ()], [x.shape for x in torch.topk(zero_d, 1, 0, True)]) - # lstsq (gels) - self.assertRaises(RuntimeError, lambda: torch.lstsq(zero_d, zero_d)) - - # eig - self.assertRaises(RuntimeError, lambda: torch.eig(zero_d, False)) - self.assertRaises(RuntimeError, lambda: torch.eig(zero_d, True)) - - # this is only implemented on cpu - if (torch.device(device).type == 'cpu'): - self.assertRaises(RuntimeError, lambda: torch.ormqr(zero_d, zero_d, zero_d)) - # max, min self.assertEqual((), torch.max(zero_d, zero_d).shape) self.assertEqual((1,), torch.max(one_d, zero_d).shape) @@ -488,9 +473,6 @@ def test_scalar_check(self, device): self.assertEqual((1,), torch.min(one_d, zero_d).shape) self.assertEqual((1,), torch.min(zero_d, one_d).shape) - # diag - self.assertRaises(RuntimeError, lambda: torch.diag(zero_d)) - zero_d_int = torch.tensor(1, device=device) one_d_int = torch.tensor([1], device=device) @@ -1415,15 +1397,55 @@ def backward_func(slf, device): backward_func(self, device) - def test_embedding_scalar_weight_error(self, device): - indices = torch.rand(2, 2, device=device).long() - weights = [ - torch.tensor(1.0, device=device), - torch.tensor(1.0, device=device).reshape(1, 1, 1), - ] - for weight in weights: - with self.assertRaisesRegex(RuntimeError, "'weight' must be 2-D"): - torch.embedding(weight, indices) + def test_invalid_shapes_grid_sampler(self, device): + make_arg = partial( + make_tensor, device=device, dtype=torch.float64, requires_grad=True) + + inputs = ( + # input, grid + ((5, 5, 5, 5, 5,), (1, 1, 1, 4, 4,)), # 3d + ((5, 5, 5, 5,), (1, 1, 4, 4,)), # 2d + ) + + interpolation_mode = 0 + padding_mode = 0 + align_corners = True + + err = "expected grid and input to have same batch size" + + for input, grid in inputs: + input = make_arg(input) + grid = make_arg(grid, low=-1, high=1) + + # Wrapper for the 2d, 3d, and cuDNN functions listed below. + with self.assertRaisesRegex(RuntimeError, err): + torch.grid_sampler( + input, grid, interpolation_mode, padding_mode, + align_corners) + + # Expects 2d input. + with self.assertRaisesRegex(RuntimeError, err): + torch.grid_sampler_2d( + input, grid, interpolation_mode, padding_mode, + align_corners) + + # Expects 3d input. + with self.assertRaisesRegex(RuntimeError, err): + torch.grid_sampler_3d( + input, grid, interpolation_mode, padding_mode, + align_corners) + + # Expects 2d input. 
+ with self.assertRaisesRegex(RuntimeError, err): + torch._grid_sampler_2d_cpu_fallback( + input, grid, interpolation_mode, padding_mode, + align_corners) + + # Expects 2d input, on CUDA. + # Doesn't work on CPU and ROCm. + if device != 'cpu' and TEST_CUDNN and not TEST_WITH_ROCM: + with self.assertRaisesRegex(RuntimeError, err): + torch.cudnn_grid_sampler(input, grid) def test_dist(self, device): def run_test(x, y): @@ -1592,13 +1614,13 @@ def _cond_fn(x): _sync_raises_helper(f, level) - @dtypes(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half, torch.bfloat16)) def test_log_normal(self, device, dtype): a = torch.tensor([10], dtype=dtype, device=device).log_normal_() self.assertEqual(a.dtype, dtype) self.assertEqual(a.size(), torch.Size([1])) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_geometric(self, device, dtype): a = torch.tensor([10], dtype=dtype, device=device).geometric_(0.5) self.assertEqual(a.dtype, dtype) @@ -1630,9 +1652,9 @@ def test_repeat_interleave(self, device): self.assertEqual(a_with_output.dtype, y.dtype) self.assertEqual(a_with_output.size(), torch.Size([3, 2])) - @dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False)) - @dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True))) - @dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False))) + @dtypes(*floating_types()) + @dtypesIfCPU(*floating_types_and(torch.bfloat16)) + @dtypesIfCUDA(*floating_types_and(torch.half)) def test_bernoulli_p(self, device, dtype): for trivial_p in ([0, 1], [1, 0, 1, 1, 0, 1]): x = torch.tensor(trivial_p, dtype=dtype, device=device) @@ -1652,9 +1674,9 @@ def isBinary(t): self.assertTrue(isBinary(p)) # RngUniform not implemented for Integral type in XLA test - @dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False))) - @dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False))) - @dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False))) + @dtypes(*floating_types()) + @dtypesIfCPU(*all_types_and(torch.bool)) + @dtypesIfCUDA(*all_types_and(torch.bool, torch.half)) def test_bernoulli_self(self, device, dtype): def isBinary(t): @@ -1666,7 +1688,7 @@ def isBinary(t): t.bernoulli_(0.5) self.assertTrue(isBinary(t)) - for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False): + for p_dtype in floating_types_and(*[torch.half] if device.startswith('cuda') else []): p = torch.rand(10, dtype=p_dtype, device=device).expand(10, 10) t.fill_(2) t.bernoulli_(p) @@ -1681,8 +1703,8 @@ def isBinary(t): self.assertTrue(isBinary(t)) @slowTest - @dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False))) - @dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False))) + @dtypes(*floating_types()) + @dtypesIfCUDA(*floating_types_and(torch.half)) def test_bernoulli_edge_cases(self, device, dtype): # Need to draw a lot of samples to cover every random floating point number. 
a = torch.zeros(10000, 10000, dtype=dtype, device=device) # probability of drawing "1" is 0 @@ -1693,7 +1715,7 @@ def test_bernoulli_edge_cases(self, device, dtype): num_zeros = (torch.bernoulli(b) == 0).sum() self.assertEqual(num_zeros, 0) - @dtypes(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half, torch.bfloat16)) def test_exponential(self, device, dtype): a = torch.tensor([10], dtype=dtype, device=device).exponential_(0.5) self.assertEqual(a.dtype, dtype) @@ -1759,25 +1781,8 @@ def check(t, correction=1, fweights=None, aweights=None): for correction, fw, aw in product([0, 1, 2], [None, fweights], [None, aweights]): check(x, correction, fweights, aweights) - # FIXME: port to ErrorInputs - def test_cov_error(self, device): - def check(msg, *args, **kwargs): - with self.assertRaisesRegex(RuntimeError, r'cov\(\):.*' + msg + r'.*'): - torch.cov(*args, **kwargs) - - a = torch.rand(2) - check(r'expected input to have two or fewer dimensions', torch.rand(2, 2, 2)) - check(r'expected fweights to have one or fewer dimensions', a, fweights=torch.rand(2, 2)) - check(r'expected aweights to have one or fewer dimensions', a, aweights=torch.rand(2, 2)) - check(r'expected fweights to have integral dtype', a, fweights=torch.rand(2)) - check(r'expected aweights to have floating point dtype', a, aweights=torch.tensor([1, 1])) - check(r'expected fweights to have the same numel', a, fweights=torch.tensor([1])) - check(r'expected aweights to have the same numel', a, aweights=torch.rand(1)) - check(r'fweights cannot be negative', a, fweights=torch.tensor([-1, -2])) - check(r'aweights cannot be negative', a, aweights=torch.tensor([-1., -2.])) - @skipIfNoSciPy - @dtypes(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half, torch.bfloat16)) def test_uniform_kstest(self, device, dtype): from scipy import stats size = 1000 @@ -1789,8 +1794,8 @@ def test_uniform_kstest(self, device, dtype): self.assertTrue(res.statistic < 0.1) @skipIfNoSciPy - @dtypes(*get_all_fp_dtypes(include_bfloat16=False)) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half)) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) def test_normal_kstest(self, device, dtype): from scipy import stats size = 1000 @@ -1801,7 +1806,7 @@ def test_normal_kstest(self, device, dtype): self.assertTrue(res.statistic < 0.1) @skipIfNoSciPy - @dtypes(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half, torch.bfloat16)) def test_lognormal_kstest(self, device, dtype): from scipy import stats size = 1000 @@ -1815,7 +1820,7 @@ def test_lognormal_kstest(self, device, dtype): self.assertTrue(res.statistic < 0.1) @skipIfNoSciPy - @dtypes(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half, torch.bfloat16)) def test_exponential_kstest(self, device, dtype): from scipy import stats size = 1000 @@ -1825,7 +1830,7 @@ def test_exponential_kstest(self, device, dtype): self.assertTrue(res.statistic < 0.1) @skipIfNoSciPy - @dtypes(*get_all_fp_dtypes()) + @dtypes(*floating_types_and(torch.half, torch.bfloat16)) def test_cauchy_kstest(self, device, dtype): from scipy import stats size = 1000 @@ -1846,7 +1851,7 @@ def test_cauchy_no_inf(self, device, dtype): self.assertFalse(x.isinf().sum()) @skipIfNoSciPy - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_geometric_kstest(self, device, dtype): from scipy import stats size = 1000 @@ -2087,37 +2092,6 @@ def test_cdist_same_inputs(self, device): # values such as nan or inf assert 
torch.isfinite(x.grad).all() - def test_multinomial_constraints(self, device): - x = torch.empty(1, 2, 3, dtype=torch.double, device=device) - self.assertRaisesRegex( - RuntimeError, "prob_dist must be 1 or 2 dim", - lambda: torch.multinomial(x, 2)) - x = torch.empty(1, 2, dtype=torch.long, device=device) - self.assertRaisesRegex( - RuntimeError, "multinomial only supports floating-point dtypes for input", - lambda: torch.multinomial(x, 2)) - x = torch.empty(1, 2, dtype=torch.double, device=device) - y = torch.empty(1, 2, dtype=torch.double, device=device) - self.assertRaisesRegex( - RuntimeError, "multinomial expects Long tensor out", - lambda: torch.multinomial(x, 2, out=y)) - x = torch.empty(2, dtype=torch.double, device=device) - self.assertRaisesRegex( - RuntimeError, "cannot sample n_sample <= 0 samples", - lambda: torch.multinomial(x, 0)) - x = torch.empty(2, dtype=torch.double, device=device) - self.assertRaisesRegex( - RuntimeError, "cannot sample n_sample <= 0 samples", - lambda: torch.multinomial(x, -1)) - x = torch.empty(2, dtype=torch.double, device=device) - self.assertRaisesRegex( - RuntimeError, "cannot sample n_sample > prob_dist", - lambda: torch.multinomial(x, 3, False)) - x = torch.empty(16777217, dtype=torch.double, device=device) - self.assertRaisesRegex( - RuntimeError, "number of categories cannot exceed", - lambda: torch.multinomial(x, 3)) - def test_cumsum(self, device): x = torch.rand(100, 100, device=device) res1 = torch.cumsum(x, 1) @@ -2357,7 +2331,7 @@ def to_np(t): # All tensors appear contiguous on XLA @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bfloat16=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool)) def test_diff_noncontig(self, device, dtype): shapes = ( (1,), @@ -2377,9 +2351,9 @@ def test_diff_noncontig(self, device, dtype): self._test_diff_numpy(non_contig) # RngNormal not implemented for type f16 for XLA - @dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False)) - @dtypesIfCPU(*get_all_dtypes(include_bfloat16=False)) - @dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False)) + @dtypes(*all_types_and_complex_and(torch.bool)) + @dtypesIfCPU(*all_types_and_complex_and(torch.half, torch.bool)) + @dtypesIfCUDA(*all_types_and_complex_and(torch.half, torch.bool)) def test_diff(self, device, dtype): shapes = ( (1,), @@ -2551,38 +2525,6 @@ def test_gradient_type_promotion(self, device): actual, expected = self._inf_nan_preprocess(list(actual), expected) self.assertEqual(actual, expected, equal_nan=True, exact_dtype=False) - # FIXME: port this to ErrorInputs - @onlyNativeDeviceTypes - @dtypes(torch.long, torch.float32, torch.complex64) - def test_error_gradient(self, device, dtype): - t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], device=device, dtype=dtype) - with self.assertRaisesRegex(RuntimeError, 'torch.gradient expected spacing to be unspecified, a scalar '): - dim = (1, 0) - spacing = [0.1] - torch.gradient(t, spacing=spacing, dim=dim, edge_order=1) - - with self.assertRaisesRegex(RuntimeError, 'torch.gradient only supports edge_order=1 and edge_order=2.'): - torch.gradient(t, edge_order=3) - - with self.assertRaisesRegex(RuntimeError, 'dim 1 appears multiple times in the list of dims'): - dim = (1, 1) - spacing = 0.1 - torch.gradient(t, spacing=spacing, dim=dim, edge_order=1) - - with self.assertRaisesRegex(RuntimeError, 'torch.gradient expected each tensor to be on the same device,'): - dim = (0, 1) - coordinates = [torch.tensor([1, 2, 4], device='cpu'), torch.tensor([1, 2, 4], device='meta')] - 
torch.gradient(t, spacing=coordinates, dim=dim, edge_order=1) - - with self.assertRaises(IndexError): - torch.gradient(t, dim=3) - - with self.assertRaisesRegex(RuntimeError, 'torch.gradient expected each dimension size to be at least'): - torch.gradient(torch.tensor([[1], [2], [3]]), edge_order=1) - - with self.assertRaisesRegex(RuntimeError, 'torch.gradient expected each dimension size to be at least'): - torch.gradient(torch.tensor([[1, 2], [3, 4]]), edge_order=2) - def _test_large_cum_fn_helper(self, x, fn): x_cpu = x.cpu().float() expected = fn(x_cpu) @@ -2602,6 +2544,7 @@ def test_large_cumsum(self, device, dtype): @onlyCUDA @dtypes(torch.half) # only small dtype not to get oom + @largeTensorTest("48GB", "cpu") def test_large_cumprod(self, device, dtype): # initialization to avoid overflow and half caveats x = torch.empty(2**30 + 200, device=device, dtype=dtype) @@ -2650,7 +2593,7 @@ def test_bool_tensor_value_change(self, device): # FIXME: move to shape ops test suite def test_unfold_all_devices_and_dtypes(self, device): - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): if dt == torch.bool: x = torch.empty((0, 1, 3, 0), dtype=dt, device=device) @@ -2672,7 +2615,7 @@ def test_unfold_scalars(self, device): # FIXME: move to data movement test suite def test_copy_all_dtypes_and_devices(self, device): from copy import copy - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): x = torch.tensor([1, 2, 3, 4], dtype=dt, device=device) x_clone = x.clone() y = copy(x) @@ -2741,7 +2684,7 @@ def test_copy_transpose_math_view(self, device, dtype): self.assertEqual(dst, src.conj_physical()) def test_clone_all_dtypes_and_devices(self, device): - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16): x = torch.tensor((1, 1), dtype=dt, device=device) y = x.clone() self.assertEqual(x, y) @@ -2812,7 +2755,7 @@ def test_narrow_empty(self, device): self.assertEqual(sz, y.size()) # FIXME: move to test indexing - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_index_copy(self, device, dtype): # We just test for num_copy <= num_dest, as otherwise there are repeated indices # and the behavior is undefined @@ -2847,7 +2790,7 @@ def ref_index_copy(tgt, dim, idx, src): # onlyNativeDeviceTypes due to an XLA error: # https://github.com/pytorch/pytorch/issues/53256 @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_index_copy_scalars(self, device, dtype): # Create the 8 possible combinations of scalar sizes for target / index / source scalars = ((make_tensor(size_t, dtype=dtype, device=device, low=None, high=None), @@ -2957,7 +2900,7 @@ def test_index_put_non_accumulate_deterministic(self, device) -> None: self.assertEqual(output, input_list) # FIXME: move to test indexing - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_index_fill(self, device, dtype): x = torch.tensor([[1, 2], [4, 5]], dtype=dtype, device=device) index = torch.tensor([0], device=device) @@ -2975,7 +2918,7 @@ def test_index_fill(self, device, dtype): # FIXME: move to test indexing # The test fails for zero-dimensional tensors on XLA @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def 
test_index_select(self, device, dtype): num_src, num_out = 3, 5 @@ -3021,7 +2964,7 @@ def ref_index_select(src, dim, idx): self.assertEqual(out.item(), source.item()) # FIXME: find a test suite for the take operator - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_take(self, device, dtype): idx_size = (4,) @@ -3056,7 +2999,7 @@ def ref_take(src, idx): # FIXME: find a test suite for the put operator # The bool instance does not work on GPU. See # https://github.com/pytorch/pytorch/issues/54317 - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_put(self, device, dtype): src_size = (4,) @@ -3127,7 +3070,7 @@ def ref_put(dst, idx, src, accumulate): # FIXME: find a test suite for the put operator # The bool instance does not work on GPU. See # https://github.com/pytorch/pytorch/issues/54317 - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_put_accumulate(self, device, dtype): # Test for parallel adds with accumulate == True low_precision = dtype == torch.half or dtype == torch.bfloat16 @@ -3171,13 +3114,9 @@ def scatter_allow_reduce(self, device, dtype, reduceop): device_type = torch.device(device).type return device_type != 'cuda' or (reduceop == 'multiply' and dtype.is_floating_point) - # FIXME: port to test_scatter_gather_ops.py - # torch.{zeros, ones} do not support ComplexHalf (torch.complex32) - # So, we are skipping it here. - @dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) + - get_all_complex_dtypes())) - @dtypesIfCPU(*get_all_dtypes()) - @dtypesIfCUDA(*get_all_dtypes()) + @dtypes(*floating_and_complex_types()) + @dtypesIfCPU(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) + @dtypesIfCUDA(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_scatter_reduce_operations_to_large_input(self, device, dtype): index = torch.tensor([[1], [2]], device=device, dtype=torch.long) test_data = [ @@ -3202,13 +3141,9 @@ def test_scatter_reduce_operations_to_large_input(self, device, dtype): input.scatter_(0, index, src, reduce=operation) self.assertEqual(input, result) - # FIXME: port to test_scatter_gather_ops.py - # torch.{zeros, ones} do not support ComplexHalf (torch.complex32) - # So, we are skipping it here. - @dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) + - get_all_complex_dtypes())) - @dtypesIfCPU(*get_all_dtypes()) - @dtypesIfCUDA(*get_all_dtypes()) + @dtypes(*floating_and_complex_types()) + @dtypesIfCPU(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) + @dtypesIfCUDA(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_scatter_reduce_scalar(self, device, dtype): index = torch.tensor([[1], [2]], device=device, dtype=torch.long) test_data = [ @@ -3245,13 +3180,9 @@ def test_scatter_add_non_unique_index(self, device): torch.tensor([[3], [1]], device=device, dtype=torch.float32).repeat(1, width)) - # FIXME: port to test_scatter_gather_ops.py - # torch.{zeros, ones} do not support ComplexHalf (torch.complex32) - # So, we are skipping it here. 
- @dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) + - get_all_complex_dtypes())) - @dtypesIfCPU(*get_all_dtypes()) - @dtypesIfCUDA(*get_all_dtypes()) + @dtypes(*floating_and_complex_types()) + @dtypesIfCPU(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) + @dtypesIfCUDA(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_scatter_reduce_non_unique_index(self, device, dtype): height = 2 width = 2 @@ -3272,12 +3203,8 @@ def test_scatter_reduce_non_unique_index(self, device, dtype): input.scatter_(0, index, src, reduce=operation) self.assertEqual(input, result, msg=f"result: {result} input: {input} method: {str(operation)}") - # FIXME: port to test_scatter_gather_ops.py - # torch.{zeros, ones} do not support ComplexHalf (torch.complex32) - # So, we are skipping it here. @onlyCUDA - @dtypes(*(get_all_complex_dtypes() + - get_all_int_dtypes())) + @dtypes(*integral_types(), *complex_types()) def test_scatter_reduce_multiply_unsupported_dtypes(self, device, dtype): height = 2 width = 2 @@ -3329,7 +3256,7 @@ def test_scatter_add_bool(self, device): # FIXME: find a test suite for the masked scatter operator @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_masked_scatter(self, device, dtype): dt = dtype with warnings.catch_warnings(record=True) as w: @@ -3406,8 +3333,6 @@ def test_masked_scatter_bool_tensor(self, device): # FIXME: find a test suite for the masked scatter operator # test_scatter_gather_ops or test_masked_ops? - # refer https://github.com/pytorch/pytorch/issues/60190 - @skipIfRocm @onlyCUDA @largeTensorTest('30GB') def test_masked_scatter_large_tensor(self, device): @@ -3418,7 +3343,7 @@ def test_masked_scatter_large_tensor(self, device): self.assertEqual(result, result_cpu) # FIXME: find a test suite for the masked select operator - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16)) def test_masked_select(self, device, dtype): if device == 'cpu': warn = 'masked_select received a mask with dtype torch.uint8,' @@ -3486,7 +3411,7 @@ def test_masked_select_discontiguous(self, device): self.assertEqual(out_dc, expected, atol=0, rtol=0) # FIXME: find a test suite for the masked fill operator - @dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool))) + @dtypes(*product(all_types_and_complex_and(torch.half, torch.bool, torch.bfloat16), (torch.uint8, torch.bool))) def test_masked_fill(self, device, dtypes): dtype = dtypes[0] mask_dtype = dtypes[1] @@ -3791,15 +3716,18 @@ def test_pdist_norm_backward(self, device): # FIXME: find a test suite for the pdist operator @unittest.skipIf(IS_FBCODE and IS_REMOTE_GPU, "sandcastle OOM with current tpx gpu/re configuration") @skipIfRocm + @onlyCUDA + @largeTensorTest('10GB', device='cpu') + @largeTensorTest('5GB', device='cuda') def test_pdist_norm_large(self, device): # use dim0>=46342 for forward, see: # https://github.com/pytorch/pytorch/issues/30583 # Compare output using GPU with the CPU implementation, as brute_pdist uses too much memory - if 'cuda' in device: - x = torch.randn(50000, 1, dtype=torch.float32) - expected_cpu = torch.pdist(x, p=2) - actual_gpu = torch.pdist(x.to(device), p=2) - self.assertEqual(expected_cpu, actual_gpu.cpu()) + x = torch.randn(50000, 1, dtype=torch.float32) # 50k * 4 bytes = 200 KB + # Will require 1249975000 float32s + expected_cpu = torch.pdist(x, p=2) # ~1250M * 4 bytes = 5 GB on CPU + actual_gpu = 
torch.pdist(x.to(device), p=2) # 5 GB on GPU + self.assertEqual(expected_cpu, actual_gpu.cpu()) # Another 5 GB on CPU # FIXME: move to elementwise ternary test suite @onlyNativeDeviceTypes @@ -4033,19 +3961,6 @@ def test_masked_fill_mem_overlap(self, device): with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): mask[1:].masked_fill_(mask[:-1], False) - # FIXME: convert to ErrorInputs - @onlyNativeDeviceTypes - def test_masked_select_mem_overlap(self, device): - x = torch.rand((1,), device=device).expand((3,)) - y = torch.rand((6,), device=device) - mask = torch.tensor([True, False, True, True, False, False], device=device) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.masked_select(y, mask, out=x) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.masked_select(y, mask, out=y) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.masked_select(mask.clone(), mask, out=mask) - # FIXME: convert to ErrorInputs @expectedFailureMeta # RuntimeError not raised @onlyNativeDeviceTypes @@ -4057,15 +3972,6 @@ def test_masked_scatter_mem_overlap(self, device): with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): x.masked_scatter_(mask, src) - # FIXME: convert to ErrorInputs - @onlyNativeDeviceTypes - def test_index_select_mem_overlap(self, device): - x = torch.rand((1, 6), device=device).expand((2, 6)) - y = torch.rand((3, 6), device=device) - ind = torch.tensor([0, 1], dtype=torch.int64, device=device) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.index_select(y, 1, ind, out=x) - # FIXME: convert to ErrorInputs @onlyNativeDeviceTypes def test_scatter_mem_overlap(self, device): @@ -4080,32 +3986,6 @@ def test_scatter_mem_overlap(self, device): with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): ind.scatter_(0, ind, ind.clone()) - # FIXME: convert to ErrorInputs - @onlyNativeDeviceTypes - def test_gather_mem_overlap(self, device): - x = torch.rand((1,), device=device).expand((3,)) - src = torch.rand((6,), device=device) - ind = torch.tensor([2, 1, 0], device=device, dtype=torch.int64) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.gather(src, 0, ind, out=x) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.gather(src, 0, ind, out=src) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.gather(ind.clone(), 0, ind[1:], out=ind[:1]) - - # FIXME: convert to ErrorInputs - @onlyNativeDeviceTypes - def test_take_mem_overlap(self, device): - x = torch.rand((1,), device=device).expand((3,)) - src = torch.rand((6,), device=device) - ind = torch.tensor([2, 1, 0], device=device, dtype=torch.int64) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.take(src, ind, out=x) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.take(src, ind, out=src) - with self.assertRaisesRegex(RuntimeError, 'unsupported operation'): - torch.take(ind.clone(), ind[1:], out=ind[:-1]) - # FIXME: move to test distributions @onlyCUDA def test_multinomial_device_constrain(self, device): @@ -4564,7 +4444,7 @@ def compare_strides(s1, s2, div): # FIXME: move dlpack tests to their own test class/suite @skipMeta @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_dlpack_capsule_conversion(self, device, dtype): # DLpack does not explicitly support bool (xref 
dmlc/dlpack#75) x = make_tensor((5,), dtype=dtype, device=device) @@ -4573,7 +4453,7 @@ def test_dlpack_capsule_conversion(self, device, dtype): @skipMeta @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_dlpack_protocol_conversion(self, device, dtype): x = make_tensor((5,), dtype=dtype, device=device) z = from_dlpack(x) @@ -4589,7 +4469,7 @@ def test_dlpack_shared_storage(self, device): @skipMeta @onlyCUDA - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_dlpack_conversion_with_streams(self, device, dtype): # Create a stream where the tensor will reside stream = torch.cuda.Stream() @@ -4608,7 +4488,7 @@ def test_dlpack_conversion_with_streams(self, device, dtype): @skipMeta @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_from_dlpack(self, device, dtype): x = make_tensor((5,), dtype=dtype, device=device) y = torch.from_dlpack(x) @@ -4616,7 +4496,7 @@ def test_from_dlpack(self, device, dtype): @skipMeta @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_from_dlpack_noncontinguous(self, device, dtype): x = make_tensor((25,), dtype=dtype, device=device).reshape(5, 5) @@ -4642,7 +4522,7 @@ def test_from_dlpack_noncontinguous(self, device, dtype): @skipMeta @onlyCUDA - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_dlpack_conversion_with_diff_streams(self, device, dtype): stream_a = torch.cuda.Stream() stream_b = torch.cuda.Stream() @@ -4659,7 +4539,7 @@ def test_dlpack_conversion_with_diff_streams(self, device, dtype): @skipMeta @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_from_dlpack_dtype(self, device, dtype): x = make_tensor((5,), dtype=dtype, device=device) y = torch.from_dlpack(x) @@ -4691,7 +4571,7 @@ def __dlpack__(self, stream=None): @skipMeta @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bool=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_dlpack_tensor_invalid_stream(self, device, dtype): with self.assertRaises(TypeError): x = make_tensor((5,), dtype=dtype, device=device) @@ -5201,8 +5081,7 @@ def _where_valid_scalar_tensor_combination(self, scalar_type, dtype): # FIXME: move to elementwise ternary test suite @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_where_scalar_invalid_combination_raises(self, device, dtype): def checkRaises(scalar_type, dtype, condition, x, scalar_1): @@ -5215,8 +5094,7 @@ def checkRaises(scalar_type, dtype, condition, x, scalar_1): # FIXME: move to elementwise ternary test suite @skipCUDAVersionIn([(11, 2)]) # test fails for 11.2, see https://github.com/pytorch/pytorch/issues/51980 - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_where_scalar_valid_combination(self, device, dtype): def checkResult(scalar_type, dtype, condition, x, scalar_1): @@ -5329,6 +5207,48 @@ def test_assertRaisesRegex_ignore_msg_non_native_device(self, device): with 
self.assertRaisesRegex(RuntimeError, msg): torch.nn.functional.nll_loss(x, t, weight=invalid_weight) + @dtypes(*all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16, torch.complex32)) + def test_copy_(self, device, dtype): + def can_cast(src_dtype, dst_dtype): + # torch.can_cast(torch.int16, torch.uint8) returns True + # which isn't actually safe-cast. + # This function returns False in this case. + def is_unsigned_int(dtype): + return dtype is torch.uint8 + + if is_unsigned_int(dst_dtype): + return is_unsigned_int(src_dtype) + return torch.can_cast(src_dtype, dst_dtype) + + def make_tensor_wrapper(shape, dtype): + if dtype is not torch.complex32: + # Make tensor does not support generating + # complex32 tensor + return make_tensor(shape, device=device, dtype=dtype) + return torch.randn(shape, device=device, dtype=dtype) + + t = make_tensor_wrapper((50,), dtype) + src_dtypes = all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16, torch.complex32) + for src_dtype in src_dtypes: + src = make_tensor_wrapper((50,), dtype=src_dtype) + t.copy_(src) + dst = make_tensor_wrapper((50, ), dtype=src_dtype) + if can_cast(src_dtype, dtype): + rtol = None + atol = None + if dtype in (torch.half, torch.complex32): + rtol = 1e-3 + atol = 1e-3 + if dtype in (torch.bfloat16,): + rtol = 1e-2 + atol = 1e-2 + self.assertEqual(src, dst.copy_(t), rtol=rtol, atol=atol) + + @dtypes(*all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16, torch.complex32)) + def test_item(self, device, dtype): + t = torch.ones((), device=device, dtype=dtype) + self.assertEqual(1, t.item()) + # Tests that compare a device's computation with the (gold-standard) CPU's. class TestDevicePrecision(TestCase): @@ -5757,69 +5677,6 @@ def test_unflatten(self): r"the unspecified dimension size -1 can be any value and is ambiguous"): torch.randn(2, 0).unflatten(1, (2, -1, 0)) - # FIXME: move to test_scatter_gather_ops.py - def test_scatter_reduce(self): - dtype = device = None - output_size = 10 - shape = [5, 10, 20] - reduces = ["sum", "prod", "mean", "amax", "amin"] - fills = {"sum": 0, "prod": 1, "mean": 0, "amax": -(2 ** 31), "amin": 2 ** 31 - 1} - fns = {"sum": lambda t, v: t.add_(v), - "prod": lambda t, v: t.mul_(v), - "mean": lambda t, v, n: t.mul_(n).add_(v).div_(n + 1), - "amax": lambda t, v: torch.max(t, v, out=t), - "amin": lambda t, v: torch.min(t, v, out=t)} - - index = torch.randint(0, output_size, shape, dtype=torch.long, device=device) - input = torch.randn(shape, dtype=dtype, device=device) - - for reduce in reduces: - for dim in range(len(shape)): - output = input.scatter_reduce(dim, index, reduce, output_size=output_size) - - # Check that output is of the correct size - output_shape = copy.copy(shape) - output_shape[dim] = output_size - self.assertEqual(output.shape, output_shape) - - expected = torch.zeros(output_shape, dtype=dtype, device=device) - expected.fill_(fills[reduce]) - counts = torch.zeros(output_shape, dtype=dtype, device=device) - for i, j, k in itertools.product(range(shape[0]), range(shape[1]), range(shape[2])): - v = input[i, j, k] - m = index[i, j, k] - - if dim == 0: - i = m - elif dim == 1: - j = m - else: - k = m - - op = fns[reduce] - if (reduce == "mean"): - op(expected[i, j, k], v, counts[i, j, k]) - else: - op(expected[i, j, k], v) - counts[i, j, k] += 1 - - if (reduce == "amin" or reduce == "amax"): - expected.masked_fill_(counts == 0, 0) - - self.assertTrue(torch.allclose(output, expected)) - - with self.assertRaisesRegex(RuntimeError, "Expected `dim` to be in 
range -3 to 2"): - torch.scatter_reduce(input, 4, index, "sum") - - with self.assertRaisesRegex(RuntimeError, "Shape mismatch"): - index2 = torch.randint(0, output_size, (10, ), dtype=torch.long, device=device) - torch.scatter_reduce(input, 0, index2, "sum") - - with self.assertRaisesRegex(RuntimeError, "Expected `index` values to be in range 0 to 2"): - input2 = torch.randn(10, dtype=dtype, device=device) - index2 = torch.tensor([0, 1, 0, 1, 2, 3, 3, 4, 4, 3]) - torch.scatter_reduce(input2, 0, index2, "sum", output_size=2) - def test_structseq_repr(self): a = torch.arange(250).reshape(5, 5, 10) expected = """ @@ -6339,6 +6196,7 @@ def test_from_buffer(self): self.assertEqual(bools.size(), 8) self.assertEqual(bools.tolist(), [False, True, True, True, True, True, True, True]) self.assertEqual(bools.type(), 'torch.BoolStorage') + self.assertTrue(isinstance(bools, torch.BoolStorage)) f = bytearray(b'\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9') bools = torch.BoolStorage.from_buffer(f, 'big') @@ -6351,6 +6209,122 @@ def test_from_buffer(self): bytes = torch.ByteStorage.from_buffer(a) self.assertEqual(bytes.nbytes(), 4) self.assertEqual(bytes.tolist(), [1, 2, 3, 4]) + self.assertTrue(isinstance(bytes, torch.ByteStorage)) + + def test_storage_error(self): + quantized_storages = [ + torch.QInt32Storage, + torch.QInt8Storage, + torch.QUInt2x4Storage, + torch.QUInt4x2Storage, + torch.QUInt8Storage, + ] + + with self.assertRaisesRegex(RuntimeError, r"Only child classes of _LegacyStorage can be instantiated"): + torch.storage._LegacyStorage() + + for storage_class in torch._storage_classes: + if storage_class in [torch._UntypedStorage, torch.cuda._UntypedStorage, torch._TypedStorage]: + continue + + device = 'cuda' if storage_class.__module__ == 'torch.cuda' else 'cpu' + dtype = storage_class.dtype + + if device == 'cuda' and not torch.cuda.is_available(): + continue + + # Legacy Storage constructor errors + with self.assertRaisesRegex(RuntimeError, r"'device' cannot be specified"): + storage_class(device='cpu') + + with self.assertRaisesRegex(RuntimeError, r"'dtype' cannot be specified"): + storage_class(dtype=torch.float) + + with self.assertRaisesRegex(TypeError, r"got an unexpected keyword"): + storage_class(sdlkjf=torch.float) + + with self.assertRaisesRegex(RuntimeError, r"Too many positional arguments"): + storage_class(0, 0) + + with self.assertRaisesRegex(TypeError, r"invalid data type"): + storage_class('string') + + with self.assertRaisesRegex(TypeError, r"Argument type not recognized"): + storage_class(torch.tensor([])) + + s = storage_class() + + with self.assertRaisesRegex(RuntimeError, r"No positional arguments"): + storage_class(0, wrap_storage=s._untyped()) + + with self.assertRaisesRegex(TypeError, r"must be _UntypedStorage"): + storage_class(wrap_storage=s) + + if torch.cuda.is_available(): + if storage_class in quantized_storages: + with self.assertRaisesRegex(RuntimeError, r"Cannot create CUDA storage with quantized dtype"): + s.cuda() + + else: + + if s.is_cuda: + s_other_device = s.cpu() + else: + s_other_device = s.cuda() + + with self.assertRaisesRegex(RuntimeError, r"Device of 'wrap_storage' must be"): + storage_class(wrap_storage=s_other_device._untyped()) + + # _TypedStorage constructor errors + with self.assertRaisesRegex(RuntimeError, r"No positional arguments"): + torch._TypedStorage(0, wrap_storage=s._untyped(), dtype=dtype) + + with self.assertRaisesRegex(RuntimeError, r"Argument 'dtype' must be specified"): + torch._TypedStorage(wrap_storage=s._untyped()) + 
+ with self.assertRaisesRegex(TypeError, r"Argument 'dtype' must be torch.dtype"): + torch._TypedStorage(wrap_storage=s._untyped(), dtype=0) + + with self.assertRaisesRegex(RuntimeError, r"Argument 'device' should not be specified"): + torch._TypedStorage(wrap_storage=s._untyped(), dtype=dtype, device=device) + + with self.assertRaisesRegex(TypeError, r"Argument 'wrap_storage' must be _UntypedStorage"): + torch._TypedStorage(wrap_storage=s, dtype=dtype) + + with self.assertRaisesRegex(RuntimeError, r"Storage device not recognized"): + torch._TypedStorage(dtype=dtype, device='xla') + + if torch.cuda.is_available(): + if storage_class in quantized_storages: + with self.assertRaisesRegex(RuntimeError, r"Cannot create CUDA storage with quantized dtype"): + torch._TypedStorage(dtype=dtype, device='cuda') + + with self.assertRaisesRegex(TypeError, r"Argument type not recognized"): + torch._TypedStorage(torch.tensor([]), dtype=dtype, device=device) + + with self.assertRaisesRegex(RuntimeError, r"Too many positional arguments"): + torch._TypedStorage(0, 0, dtype=dtype, device=device) + + def test_storage_error_no_attribute(self): + storage_classes = [ + torch.cuda.ByteStorage, + torch.cuda.FloatStorage, + torch.cuda._UntypedStorage, + ] + for storage_class in storage_classes: + with self.assertRaisesRegex(RuntimeError, r'Not available for CUDA storage'): + storage_class.from_buffer() + + if storage_class == torch.cuda._UntypedStorage: + with self.assertRaisesRegex(RuntimeError, r'Not available for CUDA storage'): + storage_class._new_with_weak_ptr() + + else: + with self.assertRaisesRegex(AttributeError, r'has no attribute'): + storage_class._new_with_weak_ptr() + + with self.assertRaisesRegex(RuntimeError, r'Not available for CUDA storage'): + storage_class._new_shared_filename(0, 0, 0) def test_storage_casts(self): storage = torch.IntStorage([-1, 0, 1, 2, 3, 4]) @@ -7109,6 +7083,14 @@ def test_fill_diagonal(self): e1.fill_diagonal_(v, wrap=True) self.assertEqual(e1, e2) + def test_setting_real_imag_to_a_number(self): + x = torch.randn(4, dtype=torch.cfloat) + x.real = 0 + x.imag = 0 + zeros = torch.zeros(4) + self.assertEqual(x.real, zeros) + self.assertEqual(x.imag, zeros) + def test_batch_norm_cpu_inference(self): # input nchw in (2,1,1,1), (2,2,2,2) inputs = [ @@ -7165,6 +7147,11 @@ def test_empty_meta(self): self.assertEqual(z.size(), (2 ** 20, 2 ** 20)) self.assertRaises(RuntimeError, lambda: z[0][0].item()) + @noarchTest + def test_format_scalar_meta(self): + x = torch.empty((), device='meta') + self.assertEqual(format(x), repr(x)) + @noarchTest def test_upsample_nearest1d_meta(self): # TODO: this test should be triggered by test_nn.py but right @@ -7408,12 +7395,12 @@ def test_numel(self): # Verifies that (deep)copies of dtypes are the same objects def test_copy_dtypes(self): - for dtype in get_all_dtypes(): + for dtype in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): copied_dtype = copy.deepcopy(dtype) self.assertIs(dtype, copied_dtype) def test_dtype_is_signed(self): - for dtype in get_all_dtypes(): + for dtype in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): self.assertEqual(dtype.is_signed, torch.is_signed(torch.tensor(0, dtype=dtype))) self.assertRaisesRegex(RuntimeError, 'not supported for quantized', lambda: torch.quint8.is_signed) @@ -7528,6 +7515,12 @@ def test_copy_transpose(self): self.assertEqual(y[:, 0], range(100)) self.assertEqual(y[:, 40], range(4000, 4100)) + x = torch.arange(100 * 100).reshape(100,
100).to(dtype=torch.complex32).t() + y = torch.empty(100, 100, dtype=torch.complex32) + y.copy_(x) + self.assertEqual(y[:, 0], range(100)) + self.assertEqual(y[:, 40], range(4000, 4100)) + # FIXME: Port to a more appropriate test suite def test_copy_broadcast(self): torch.zeros(5, 6).copy_(torch.zeros(6)) diff --git a/test/test_type_promotion.py b/test/test_type_promotion.py index f32a89933f0880..a157f49962d5c5 100644 --- a/test/test_type_promotion.py +++ b/test/test_type_promotion.py @@ -11,7 +11,7 @@ from torch.testing._internal.common_device_type import (instantiate_device_type_tests, onlyNativeDeviceTypes, dtypes, dtypesIfCUDA, onlyCPU, expectedFailureMeta, skipMeta) from torch.testing._internal.common_dtype import ( - get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes + all_types_and_complex_and, all_types_and, get_all_math_dtypes, integral_types_and, floating_types_and ) if TEST_NUMPY: @@ -184,7 +184,7 @@ def test_bfloat16(self, device): self.assertEqual(bf + scalar, scalar + bf) # with tensor - for dtype in get_all_dtypes(): + for dtype in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): t = torch.tensor(1, dtype=dtype, device=device) self.assertEqual(bf + t, t + bf) if dtype in (torch.float16, torch.float32, torch.float64, torch.cfloat, torch.cdouble): @@ -340,7 +340,8 @@ def test_create_bool_tensors(self, device): # this seems like odd behavior but ints also create float tensors, numpy doesn't have this function. self.assertEqual(torch.scalar_tensor(False, device=device), torch.tensor(0., device=device)) - @dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes())) + @dtypes(*itertools.product(all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool), + all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) def test_result_type(self, device, dtypes): "Test result_type for tensor vs tensor and scalar vs scalar." 
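# Illustrative sketch (not part of the patch; uses only public torch APIs): the
# promotion rules that test_result_type and test_promote_self exercise. The dtype
# pairs below are examples chosen for this note, not values taken from the tests.
import torch
# A Python float scalar combined with an integral tensor promotes to the default
# floating-point dtype (torch.float32 unless changed via torch.set_default_dtype).
assert torch.result_type(torch.tensor([1], dtype=torch.int32), 1.0) == torch.float32
# Mixing a floating dtype with any integral dtype keeps the floating dtype.
assert torch.promote_types(torch.float16, torch.int64) == torch.float16
# Promoting a dtype with itself is the identity, which is what test_promote_self checks.
assert torch.promote_types(torch.bfloat16, torch.bfloat16) == torch.bfloat16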
@@ -562,7 +563,7 @@ def test_promote_types(self, device): @float_double_default_dtype def test_promote_self(self, device): - for dtype in get_all_dtypes(): + for dtype in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): self.assertEqual(torch.promote_types(dtype, dtype), dtype) @expectedFailureMeta @@ -880,7 +881,7 @@ def test_numpy_array_binary_ufunc_promotion(self, device, dtypes): @onlyNativeDeviceTypes def test_cat_different_dtypes(self, device): - dtypes = get_all_dtypes(include_bfloat16=False) + dtypes = all_types_and_complex_and(torch.half, torch.bool) for x_dtype, y_dtype in itertools.product(dtypes, dtypes): x_vals, y_vals = [1, 2, 3], [4, 5, 6] @@ -899,7 +900,7 @@ def test_cat_different_dtypes(self, device): @onlyNativeDeviceTypes def test_cat_out_different_dtypes(self, device): - dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False) + dtypes = all_types_and_complex_and(torch.half) for x_dtype, y_dtype, out_dtype in itertools.product(dtypes, dtypes, dtypes): out = torch.zeros(6, device=device, dtype=out_dtype) x = torch.tensor([1, 2, 3], device=device, dtype=x_dtype) @@ -971,21 +972,19 @@ def test_computation_ignores_out(self, device): self.assertEqual(result, a - b, exact_dtype=False) self.assertNotEqual(result, a.double() - b, exact_dtype=False) - @dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False), - get_all_dtypes(include_bfloat16=False, include_complex=False))) - @dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False, - include_complex=False), - get_all_dtypes(include_half=False, include_bfloat16=False, - include_complex=False))) + @dtypesIfCUDA(*itertools.product(all_types_and(torch.half, torch.bool), + all_types_and(torch.half, torch.bool))) + @dtypes(*itertools.product(all_types_and(torch.bool), + all_types_and(torch.bool))) def test_atan2_type_promotion(self, device, dtypes): dtype1, dtype2 = dtypes default_float = torch.get_default_dtype() def is_int(dtype): - return dtype in get_all_int_dtypes() + [torch.bool] + return dtype in integral_types_and(torch.bool) def is_float(dtype): - return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False) + return dtype in floating_types_and(torch.half) def get_binary_float_result_type(x, y): dtype1 = x.dtype diff --git a/test/test_unary_ufuncs.py b/test/test_unary_ufuncs.py index 3cfcd4fa2e813e..c6ca4ffc81c8d6 100644 --- a/test/test_unary_ufuncs.py +++ b/test/test_unary_ufuncs.py @@ -21,8 +21,8 @@ OpDTypes) from torch.testing import make_tensor from torch.testing._internal.common_dtype import ( - floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes, - get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes + floating_types_and, all_types_and_complex_and, integral_types_and, get_all_math_dtypes, + complex_types, all_types_and, floating_and_complex_types_and ) if TEST_SCIPY: @@ -517,8 +517,7 @@ def test_out_arg_all_dtypes(self, device, dtype, op): out = torch.empty_like(input, dtype=out_dtype) self._test_out_arg(op, input, out, expected, **torch_kwargs) - @dtypes(*(get_all_int_dtypes() + [torch.bool] + - get_all_fp_dtypes(include_bfloat16=False))) + @dtypes(*all_types_and(torch.bool, torch.half)) def test_nan_to_num(self, device, dtype): for contiguous in [False, True]: x = make_tensor((64, 64), low=0., high=100., dtype=dtype, device=device) @@ -596,7 +595,7 @@ def test_digamma(self, device, dtype): self.compare_with_numpy(torch.digamma, 
scipy.special.digamma, tensor) @skipCUDAIfRocm - @dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False)) + @dtypes(*floating_types_and(torch.half)) def test_frexp(self, device, dtype): input = make_tensor((50, 50), dtype=dtype, device=device) mantissa, exponent = torch.frexp(input) @@ -611,15 +610,13 @@ def test_frexp(self, device, dtype): @skipCUDAIfRocm def test_frexp_assert_raises(self, device): - invalid_input_dtypes = get_all_int_dtypes() + \ - get_all_complex_dtypes() + \ - [torch.bool] + invalid_input_dtypes = integral_types_and(torch.bool) + complex_types() for dtype in invalid_input_dtypes: input = make_tensor((50, 50), dtype=dtype, device=device) with self.assertRaisesRegex(RuntimeError, r"torch\.frexp\(\) only supports floating-point dtypes"): torch.frexp(input) - for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False): + for dtype in floating_types_and(torch.half): input = make_tensor((50, 50), dtype=dtype, device=device) dtypes = list(all_types_and_complex_and(torch.bool, torch.half, torch.bfloat16)) @@ -872,7 +869,7 @@ def test_unary_out_op_mem_overlap(self, device, dtype): # TODO: opinfo hardshrink @onlyCPU - @dtypes(torch.float, torch.double) + @dtypes(torch.float, torch.double, torch.bfloat16) def test_hardshrink(self, device, dtype): data = torch.tensor([1, 0.5, 0.3, 0.6], dtype=dtype, device=device).view(2, 2) self.assertEqual(torch.tensor([1, 0.5, 0, 0.6], dtype=dtype, device=device).view(2, 2), @@ -888,7 +885,7 @@ def test_hardshrink(self, device, dtype): data.t().hardshrink(0.3)) @onlyCPU - @dtypes(torch.float, torch.double) + @dtypes(torch.float, torch.double, torch.bfloat16) def test_hardshrink_edge_cases(self, device, dtype) -> None: def h(values, l_expected): for l, expected in l_expected.items(): @@ -913,6 +910,7 @@ def test_helper(min, max): @onlyCPU @slowTest @dtypes(torch.float) + @unittest.skipIf(True, "Insufficient memory on linux.(2|4)xlarge") def test_exp_slow(self, device, dtype): # Test for https://github.com/pytorch/pytorch/issues/17271 # This is pretty slow on my Macbook but it only takes a few @@ -922,8 +920,7 @@ def test_exp_slow(self, device, dtype): self.assertEqual(a, b.expand(2 ** 31)) @precisionOverride({torch.bfloat16: 1e-2, torch.float: 0.0002, torch.double: 0.0002}) - @dtypesIfCUDA(torch.float, torch.double, torch.bfloat16) - @dtypes(torch.float, torch.double) + @dtypes(torch.float, torch.double, torch.bfloat16) def test_hardswish(self, device, dtype): inputValues = [-1000, -4, -3, -2, 0, 2, 3, 4, 1000] expectedOutput = np.multiply( @@ -944,8 +941,7 @@ def test_hardswish(self, device, dtype): self.assertEqual(inputTensorCpy, expectedOutputTensor) @precisionOverride({torch.bfloat16: 1e-2, torch.float: 0.0002, torch.double: 0.0002}) - @dtypesIfCUDA(torch.float, torch.double, torch.bfloat16) - @dtypes(torch.float, torch.double) + @dtypes(torch.float, torch.double, torch.bfloat16) def test_hardsigmoid(self, device, dtype): inputValues = [-1000, -4, -3, -2, 0, 2, 3, 4, 1000] expectedOutput = np.minimum(np.maximum((np.add(inputValues, 3)), 0), 6) / 6.0 @@ -962,8 +958,7 @@ def test_hardsigmoid(self, device, dtype): torch.tensor(expectedOutput, dtype=dtype, device=device)) @precisionOverride({torch.bfloat16: 1e-2, torch.float: 0.0002, torch.double: 0.0002}) - @dtypesIfCUDA(torch.float, torch.double, torch.bfloat16) - @dtypes(torch.float, torch.double) + @dtypes(torch.float, torch.double, torch.bfloat16) def test_hardsigmoid_backward(self, device, dtype): inputValues = [-3.0, 3.0, -2.0, 2.0, -6.0, 6.0] expectedValues = 
[0.0, 0.0, 1.0 / 6.0, 1.0 / 6.0, 0.0, 0.0] @@ -1182,7 +1177,7 @@ def _i0_range_helper(self, range, device, dtype): t = torch.rand(1000, device=device).to(dtype) * r self._i0_helper(t) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.bfloat16, torch.float32, torch.float64) @unittest.skipIf(not TEST_SCIPY, "SciPy not found") def test_i0_range1(self, device, dtype): @@ -1190,7 +1185,7 @@ def test_i0_range1(self, device, dtype): # The domain is (-13.25, 13.25) self._i0_range_helper(13.25, device, dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.bfloat16, torch.float32, torch.float64) @unittest.skipIf(not TEST_SCIPY, "SciPy not found") def test_i0_range2(self, device, dtype): @@ -1205,7 +1200,7 @@ def test_i0_range3(self, device, dtype): # The domain is (-709.75, 709.75) self._i0_range_helper(709.75, device, dtype) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.bfloat16, torch.float32, torch.float64) @unittest.skipIf(not TEST_SCIPY, "SciPy not found") def test_i0_special(self, device, dtype): @@ -1215,7 +1210,7 @@ def test_i0_special(self, device, dtype): t = torch.tensor([inf, -inf, nan], device=device, dtype=dtype) self.assertTrue(torch.i0(t).isnan().all()) - @dtypesIfCUDA(*get_all_fp_dtypes()) + @dtypesIfCUDA(*floating_types_and(torch.half, torch.bfloat16)) @dtypes(torch.bfloat16, torch.float32, torch.float64) @unittest.skipIf(not TEST_SCIPY, "SciPy not found") def test_special_i0_i1_vs_scipy(self, device, dtype): @@ -1269,11 +1264,25 @@ def check_equal(t): self.assertEqual(actual, expected) range = (-10, 10) + t = torch.linspace(*range, 1, device=device, dtype=dtype) + check_equal(t) - t = torch.linspace(*range, int(1e4), device=device, dtype=dtype) + # Skip testing NaN, inf, -inf since they are tested in reference_numerics tests. + info = torch.finfo(dtype) + min, max, eps, tiny = info.min, info.max, info.eps, info.tiny + t = torch.tensor([min, max, eps, tiny], dtype=dtype, device=device) check_equal(t) - # NaN, inf, -inf are tested in reference_numerics tests. + @dtypes(torch.float32, torch.float64) + @unittest.skipIf(not TEST_SCIPY, "SciPy not found") + def test_special_log_ndtr_vs_scipy(self, device, dtype): + def check_equal(t): + # Test by comparing with scipy + actual = torch.special.log_ndtr(t) + expected = scipy.special.log_ndtr(t.cpu().numpy()) + self.assertEqual(actual, expected) + + # Skip testing NaN, inf, -inf since they are tested in reference_numerics tests. 
info = torch.finfo(dtype) min, max, eps, tiny = info.min, info.max, info.eps, info.tiny t = torch.tensor([min, max, eps, tiny], dtype=dtype, device=device) @@ -1307,7 +1316,7 @@ def test_abs_zero(self, device, dtype): for num in abs_zeros: self.assertGreater(math.copysign(1.0, num), 0.0) - @dtypes(*(get_all_dtypes(include_bool=False))) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16)) def test_isposinf_isneginf_non_boolean_output(self, device, dtype): # test non-boolean tensors as the `out=` parameters # boolean outputs are tested in the above testcases @@ -1349,10 +1358,8 @@ def assert_tuple_empty(tup, dim): self.assertEqual(torch.empty(0, dtype=torch.long), z[0]) # TODO: rationalize with exp OpInfo - @dtypes(*(get_all_fp_dtypes(include_half=False) + - get_all_complex_dtypes())) - @dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) + - get_all_complex_dtypes())) + @dtypes(*floating_and_complex_types_and(torch.bfloat16)) + @dtypesIfCUDA(*floating_and_complex_types_and(torch.half, torch.bfloat16)) def test_exp(self, device, dtype): for v in (2, -2) + ((1j, 1 + 1j) if dtype.is_complex else ()): a = torch.tensor(v, dtype=dtype, device=device) * torch.arange(18, device=device) / 3 * math.pi diff --git a/test/test_utils.py b/test/test_utils.py index c8f4e3aa9453b7..6338b8d5d810a5 100644 --- a/test/test_utils.py +++ b/test/test_utils.py @@ -1,4 +1,4 @@ -# Owner(s): ["high priority"] +# Owner(s): ["module: unknown"] import sys import os @@ -18,10 +18,9 @@ import torch.cuda from torch.utils.checkpoint import checkpoint, checkpoint_sequential import torch.utils.cpp_extension -import torch.hub as hub from torch.autograd._functions.utils import check_onnx_broadcast from torch.onnx.symbolic_opset9 import _prepare_onnx_paddings -from torch.testing._internal.common_utils import has_breakpad, load_tests, retry, IS_SANDCASTLE, IS_WINDOWS, TEST_WITH_ASAN +from torch.testing._internal.common_utils import has_breakpad, load_tests, IS_SANDCASTLE, IS_WINDOWS, TEST_WITH_ASAN # load_tests from torch.testing._internal.common_utils is used to automatically filter tests for # sharding on sandcastle. This line silences flake warnings @@ -411,12 +410,6 @@ def test_multi_drop(self): test_dir = os.path.abspath(os.path.dirname(str(__file__))) -class TestFFI(TestCase): - def test_deprecated(self): - with self.assertRaisesRegex(ImportError, "torch.utils.ffi is deprecated. 
Please use cpp extensions instead."): - from torch.utils.ffi import create_extension # type: ignore[attr-defined] # noqa: F401 - - @unittest.skipIf('SKIP_TEST_BOTTLENECK' in os.environ.keys(), 'SKIP_TEST_BOTTLENECK is set') class TestBottleneck(TestCase): def _run(self, command, timeout=30): @@ -584,146 +577,6 @@ def try_check_onnx_broadcast(dims1, dims2, expect_broadcast, expect_fail): try_check_onnx_broadcast(dims1, dims2, True, False) -def sum_of_state_dict(state_dict): - s = 0 - for _, v in state_dict.items(): - s += v.sum() - return s - -SUM_OF_HUB_EXAMPLE = 431080 -TORCHHUB_EXAMPLE_RELEASE_URL = 'https://github.com/ailzhang/torchhub_example/releases/download/0.1/mnist_init_ones' - -@unittest.skipIf(IS_SANDCASTLE, 'Sandcastle cannot ping external') -class TestHub(TestCase): - @retry(Exception, tries=3) - def test_load_from_github(self): - hub_model = hub.load( - 'ailzhang/torchhub_example', - 'mnist', - source='github', - pretrained=True, - verbose=False) - self.assertEqual(sum_of_state_dict(hub_model.state_dict()), - SUM_OF_HUB_EXAMPLE) - - @retry(Exception, tries=3) - def test_load_from_local_dir(self): - local_dir = hub._get_cache_or_reload( - 'ailzhang/torchhub_example', force_reload=False) - hub_model = hub.load( - local_dir, - 'mnist', - source='local', - pretrained=True, - verbose=False) - self.assertEqual(sum_of_state_dict(hub_model.state_dict()), - SUM_OF_HUB_EXAMPLE) - - @retry(Exception, tries=3) - def test_load_from_branch(self): - hub_model = hub.load( - 'ailzhang/torchhub_example:ci/test_slash', - 'mnist', - pretrained=True, - verbose=False) - self.assertEqual(sum_of_state_dict(hub_model.state_dict()), - SUM_OF_HUB_EXAMPLE) - - @retry(Exception, tries=3) - def test_set_dir(self): - temp_dir = tempfile.gettempdir() - hub.set_dir(temp_dir) - hub_model = hub.load( - 'ailzhang/torchhub_example', - 'mnist', - pretrained=True, - verbose=False) - self.assertEqual(sum_of_state_dict(hub_model.state_dict()), - SUM_OF_HUB_EXAMPLE) - assert os.path.exists(temp_dir + '/ailzhang_torchhub_example_master') - shutil.rmtree(temp_dir + '/ailzhang_torchhub_example_master') - - @retry(Exception, tries=3) - def test_list_entrypoints(self): - entry_lists = hub.list('ailzhang/torchhub_example', force_reload=True) - self.assertObjectIn('mnist', entry_lists) - - @retry(Exception, tries=3) - def test_download_url_to_file(self): - temp_file = os.path.join(tempfile.gettempdir(), 'temp') - hub.download_url_to_file(TORCHHUB_EXAMPLE_RELEASE_URL, temp_file, progress=False) - loaded_state = torch.load(temp_file) - self.assertEqual(sum_of_state_dict(loaded_state), - SUM_OF_HUB_EXAMPLE) - - @retry(Exception, tries=3) - def test_load_state_dict_from_url(self): - loaded_state = hub.load_state_dict_from_url(TORCHHUB_EXAMPLE_RELEASE_URL) - self.assertEqual(sum_of_state_dict(loaded_state), - SUM_OF_HUB_EXAMPLE) - - @retry(Exception, tries=3) - def test_load_zip_checkpoint(self): - hub_model = hub.load( - 'ailzhang/torchhub_example', - 'mnist_zip', - pretrained=True, - verbose=False) - self.assertEqual(sum_of_state_dict(hub_model.state_dict()), - SUM_OF_HUB_EXAMPLE) - - # Test the default zipfile serialization format produced by >=1.6 release. 
- @retry(Exception, tries=3) - def test_load_zip_1_6_checkpoint(self): - hub_model = hub.load( - 'ailzhang/torchhub_example', - 'mnist_zip_1_6', - pretrained=True, - verbose=False) - self.assertEqual(sum_of_state_dict(hub_model.state_dict()), - SUM_OF_HUB_EXAMPLE) - - - def test_hub_dir(self): - with tempfile.TemporaryDirectory('hub_dir') as dirname: - torch.hub.set_dir(dirname) - self.assertEqual(torch.hub.get_dir(), dirname) - - @retry(Exception, tries=3) - def test_hub_parse_repo_info(self): - # If the branch is specified we just parse the input and return - self.assertEqual( - torch.hub._parse_repo_info('a/b:c'), - ('a', 'b', 'c') - ) - # For torchvision, the default branch is main - self.assertEqual( - torch.hub._parse_repo_info('pytorch/vision'), - ('pytorch', 'vision', 'main') - ) - # For the torchhub_example repo, the default branch is still master - self.assertEqual( - torch.hub._parse_repo_info('ailzhang/torchhub_example'), - ('ailzhang', 'torchhub_example', 'master') - ) - - @retry(Exception, tries=3) - def test_load_state_dict_from_url_with_name(self): - with tempfile.TemporaryDirectory('hub_dir') as dirname: - torch.hub.set_dir(dirname) - file_name = 'test_file' - loaded_state = hub.load_state_dict_from_url(TORCHHUB_EXAMPLE_RELEASE_URL, file_name=file_name) - self.assertTrue(os.path.exists(os.path.join(dirname, 'checkpoints', file_name))) - self.assertEqual(sum_of_state_dict(loaded_state), - SUM_OF_HUB_EXAMPLE) - - @retry(Exception, tries=3) - def test_load_commit_from_forked_repo(self): - with self.assertRaisesRegex( - ValueError, - 'If it\'s a commit from a forked repo'): - model = torch.hub.load('pytorch/vision:4e2c216', 'resnet18', force_reload=True) - class TestHipify(TestCase): def test_import_hipify(self): from torch.utils.hipify import hipify_python # noqa: F401 diff --git a/test/test_view_ops.py b/test/test_view_ops.py index d85d53e6991510..064d001727ab70 100644 --- a/test/test_view_ops.py +++ b/test/test_view_ops.py @@ -16,7 +16,7 @@ from torch.testing._internal.common_device_type import \ (instantiate_device_type_tests, onlyCPU, dtypes, onlyNativeDeviceTypes, skipMeta) from torch.testing._internal.common_dtype import ( - get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes + all_types_and_complex_and, complex_types, all_types_and, floating_and_complex_types_and, ) # TODO: replace this with make_tensor() in common_utils.py @@ -121,14 +121,14 @@ def _do_transpose(self, x, contiguous=False, dim0=0, dim1=1): else: return x.transpose(dim0, dim1) - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_conj_self(self, device, dtype): t = torch.ones(5, 5, device=device) s = t.conj() self.assertTrue(s is t) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes(include_bfloat16=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool)) def test_view_dtype_new(self, device, dtype): dtypes = torch_to_numpy_dtype_dict.copy() del dtypes[torch.bool] @@ -210,18 +210,18 @@ def calc_expected_size_and_stride(a, view_dtype): # because view(dtype) does not support backward yet # TODO: Remove this when autograd support is added if dtype.is_floating_point or dtype.is_complex: - for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]: + for view_dtype in floating_and_complex_types_and(torch.half, torch.bfloat16): t = make_tensor((5, 5, 64), dtype=dtype, device=device, low=-5, high=5, requires_grad=True) self.assertFalse(t.view(view_dtype).requires_grad) # Test the extra error checks 
that happen when the view dtype # has a greater element size than the original dtype @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_view_dtype_upsize_errors(self, device, dtype): dtype_size = torch._utils._element_size(dtype) - for view_dtype in get_all_dtypes(): + for view_dtype in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): view_dtype_size = torch._utils._element_size(view_dtype) if view_dtype_size <= dtype_size: continue @@ -302,7 +302,7 @@ def fn(contiguous_input=True, dim0=0, dim1=1): self.assertEqual(res.shape, torch.Size([0])) @onlyNativeDeviceTypes - @dtypes(*get_all_complex_dtypes(include_complex32=True)) + @dtypes(*complex_types(), torch.complex32) def test_view_as_real(self, device, dtype): def fn(contiguous_input=True): t = torch.randn(3, 4, dtype=dtype, device=device) @@ -340,7 +340,7 @@ def fn(contiguous_input=True): self.assertEqual(res.shape, torch.Size([2])) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_view_tensor_split(self, device, dtype): a = make_tensor((40, 30), dtype=dtype, device=device, low=-9, high=9) a_split_dim0 = a.tensor_split(7, 0) @@ -351,7 +351,7 @@ def test_view_tensor_split(self, device, dtype): self.assertTrue(self.is_view_of(a, a_split_dim1_tensor)) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_view_tensor_hsplit(self, device, dtype): t = make_tensor((4, 4, 4), dtype=dtype, device=device, low=-9, high=9) t_hsplit = torch.hsplit(t, 2) @@ -361,7 +361,7 @@ def test_view_tensor_hsplit(self, device, dtype): self.assertEqual(t_hsplit[1][2, 0, 2], t[2, 2, 2]) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_view_tensor_vsplit(self, device, dtype): t = make_tensor((4, 4, 4), dtype=dtype, device=device, low=-9, high=9) t_vsplit = torch.vsplit(t, 2) @@ -371,7 +371,7 @@ def test_view_tensor_vsplit(self, device, dtype): self.assertEqual(t_vsplit[1][0, 2, 2], t[2, 2, 2]) @onlyNativeDeviceTypes - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_view_tensor_dsplit(self, device, dtype): t = make_tensor((4, 4, 4), dtype=dtype, device=device, low=-9, high=9) t_dsplit = torch.dsplit(t, 2) @@ -381,7 +381,7 @@ def test_view_tensor_dsplit(self, device, dtype): self.assertEqual(t_dsplit[1][2, 2, 0], t[2, 2, 2]) @onlyNativeDeviceTypes - @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes())) + @dtypes(*all_types_and(torch.half, torch.bfloat16)) def test_imag_noncomplex(self, device, dtype): t = torch.ones((5, 5), dtype=dtype, device=device) @@ -389,7 +389,7 @@ def test_imag_noncomplex(self, device, dtype): torch.imag(t) @onlyNativeDeviceTypes - @dtypes(*get_all_complex_dtypes()) + @dtypes(*complex_types()) def test_real_imag_view(self, device, dtype): def compare_with_numpy(contiguous_input=True): t = torch.randn(3, 3, dtype=dtype, device=device) @@ -420,7 +420,7 @@ def compare_with_numpy(contiguous_input=True): self.assertEqual(a[5:].imag, a.imag[5:]) @onlyNativeDeviceTypes - @dtypes(*get_all_complex_dtypes()) + @dtypes(*complex_types()) def test_conj_imag_view(self, device, dtype) -> None: t = _make_tensor((4, 5,), dtype, device) t_numpy_conj = torch.from_numpy(t.cpu().numpy().conj()).to(device=device) @@ -445,7 +445,7 @@ def 
test_conj_view_with_shared_memory(self, device) -> None: self.assertEqual(torch.add(b, c), b.add_(c)) @onlyNativeDeviceTypes - @dtypes(*product(get_all_complex_dtypes(), get_all_dtypes())) + @dtypes(*product(complex_types(), all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))) @suppress_warnings def test_set_real_imag(self, device, dtypes): x = torch.randn(10, dtype=dtypes[0], device=device) @@ -917,29 +917,38 @@ def _test_ravel(tensors, size, nc=False): flat = src.ravel() self.assertEqual(flat.shape, torch.Size([size])) self.assertEqual(src.view(-1), flat) - self.assertEqual(flat._base, src) + self.assertIs(flat._base, src) + self.assertTrue(flat.is_contiguous()) # Non-continuous Tensor -> Copy if nc: nc_src = src.t() nc_flat = nc_src.ravel() self.assertEqual(nc_flat.shape, torch.Size([size])) - self.assertEqual(nc_src.reshape(-1), nc_flat) - self.assertTrue(nc_flat._base != nc_src) + self.assertEqual(nc_src.contiguous().view(-1), nc_flat) + self.assertIsNot(nc_flat._base, src) + self.assertTrue(nc_flat.is_contiguous()) # Test that flatten returns 1-dim tensor when given a 0-dim tensor zero_dim_tensor = torch.tensor(123, device=device) flat0 = zero_dim_tensor.ravel() one_dim_tensor = torch.tensor([123], device=device) flat1 = zero_dim_tensor.ravel() + nc_ones_tensor = torch.ones(10, device=device)[::2] + flat2 = nc_ones_tensor.ravel() self.assertEqual(zero_dim_tensor.shape, torch.Size([])) self.assertEqual(flat0.shape, torch.Size([1])) self.assertEqual(one_dim_tensor.shape, torch.Size([1])) self.assertEqual(flat1.shape, torch.Size([1])) + self.assertEqual(nc_ones_tensor.shape, torch.Size([5])) + self.assertEqual(flat2.shape, torch.Size([5])) self.assertEqual(flat0, one_dim_tensor) self.assertEqual(flat0, flat1) self.assertEqual(flat0.shape, flat1.shape) + self.assertTrue(flat0.is_contiguous()) + self.assertTrue(flat1.is_contiguous()) + self.assertTrue(flat2.is_contiguous()) # Test both float tensor and quantized tensor tensors = [torch.randn(5, 5, 5, 5, device=device), @@ -1255,7 +1264,7 @@ def test_T(self, device): scalar = torch.tensor(5, device=device) self.assertEqual(scalar, scalar.T) - @dtypes(*(torch.testing.get_all_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_transposes(self, device, dtype): for op in ("T", "H", "mT", "mH", "adjoint"): shapes = ((), (2, 3), (2, 3, 4)) if op[0] == "m" or op == "adjoint" else ((), (2, 3),) @@ -1271,7 +1280,7 @@ def test_transposes(self, device, dtype): t2 = t2.conj() self.assertEqual(t2, t1) - @dtypes(*(torch.testing.get_all_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_transposes_errors(self, device, dtype): for op in ("H", "mT", "mH", "adjoint"): shapes = ((2,), (2, 3, 4)) if op == "H" else ((2,),) @@ -1397,8 +1406,7 @@ def _test_atleast_dim(self, torch_fn, np_fn, device, dtype): self.assertEqual(np_res, torch_res) # TODO: are these view ops? 
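# Illustrative sketch (not part of the patch; uses only public torch APIs): the
# view-vs-copy behavior asserted by the ravel checks earlier in this file's changes.
import torch
src = torch.randn(2, 3)
flat = src.ravel()
# Contiguous input: ravel returns a view that shares storage with the source.
assert flat._base is src and flat.is_contiguous()
nc_flat = src.t().ravel()
# Non-contiguous (transposed) input: ravel has to copy, so no storage is shared.
assert nc_flat._base is not src
assert torch.equal(nc_flat, src.t().contiguous().view(-1))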
- @dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + - get_all_complex_dtypes())) + @dtypes(*all_types_and_complex_and(torch.half)) def test_atleast(self, device, dtype): self._test_atleast_dim(torch.atleast_1d, np.atleast_1d, device, dtype) self._test_atleast_dim(torch.atleast_2d, np.atleast_2d, device, dtype) @@ -1535,7 +1543,7 @@ def test_broadcast_shapes_numpy_ref(self, device): self.assertEqual(res1, res2_numpy) # Skip BFloat16 since numpy does not support it - @dtypes(*get_all_dtypes(include_bfloat16=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool)) def test_broadcast_to(self, device, dtype): def can_broadcast(s0, s1): # s0.dim() <= s1.dim(), reverse s0 and s1 to compare trailing dimension @@ -1638,7 +1646,7 @@ def test_view(self, device): self.assertEqual(tensor.view(6, 2, 1), contig_tensor.view(6, 2, 1)) self.assertEqual(tensor.view(1, 6, 2, 1), contig_tensor.view(1, 6, 2, 1)) - @dtypes(*get_all_dtypes()) + @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool)) def test_reshape_view_semantics(self, device, dtype): tensor = make_tensor((15, 4), dtype=dtype, device=device) target = (20, 3) @@ -1665,7 +1673,7 @@ def test_contiguous(self, device): @onlyNativeDeviceTypes # Skip BFloat16 since numpy does not support it - @dtypes(*get_all_dtypes(include_bfloat16=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool)) def test_tensor_split_sections(self, device, dtype): input_sizes = [ (0,), @@ -1696,7 +1704,7 @@ def test_tensor_split_sections(self, device, dtype): @onlyNativeDeviceTypes # Skip BFloat16 since numpy does not support it - @dtypes(*get_all_dtypes(include_bfloat16=False)) + @dtypes(*all_types_and_complex_and(torch.half, torch.bool)) def test_tensor_split_indices(self, device, dtype): input_sizes = [ (0,), @@ -1775,20 +1783,28 @@ def test_tensor_split_errors(self, device): def test_resize_all_dtypes_and_devices(self, device): shape = (2, 2) - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=dt, device=device) x.resize_(shape) self.assertEqual(shape, x.shape) def test_resize_as_all_dtypes_and_devices(self, device): - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=dt, device=device) y = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=dt, device=device) x.resize_as_(y) self.assertEqual(y.shape, x.shape) + @onlyNativeDeviceTypes + def test_resize_overflow(self, device): + x = torch.empty((), dtype=torch.float64) + with self.assertRaisesRegex(RuntimeError, 'Storage size calculation overflowed'): + x.resize_([2, 4, 2**29, 2**29]) + with self.assertRaisesRegex(RuntimeError, 'overflow'): + x.resize_([8, 8, 2**29, 2**29]) + def test_view_all_dtypes_and_devices(self, device): - for dt in get_all_dtypes(): + for dt in all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool): x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=dt, device=device) self.assertEqual(x.view(6).shape, [6]) diff --git a/test/typing/reveal/namedtuple.py b/test/typing/reveal/namedtuple.py index 8a0508b325c5a9..2e130338f0b976 100644 --- a/test/typing/reveal/namedtuple.py +++ b/test/typing/reveal/namedtuple.py @@ -7,9 +7,9 @@ t_sort[0][0, 0] == 1.5 # noqa: B015 t_sort.indices[0, 0] == 1 # noqa: B015 t_sort.values[0, 0] == 1.5 # noqa: B015 -reveal_type(t_sort) # E: Tuple[{Tensor}, {Tensor}, 
fallback=torch._C.namedtuple_values_indices] +reveal_type(t_sort) # E: Tuple[{Tensor}, {Tensor}, fallback=torch.return_types.sort] t_qr = torch.linalg.qr(t) t_qr[0].shape == [2, 2] # noqa: B015 t_qr.Q.shape == [2, 2] # noqa: B015 -reveal_type(t_qr) # E: Tuple[{Tensor}, {Tensor}, fallback=torch._C._VariableFunctions.namedtuple_Q_R] +reveal_type(t_qr) # E: Tuple[{Tensor}, {Tensor}, fallback=torch.return_types.qr] diff --git a/third_party/eigen b/third_party/eigen index d41dc4dd74acce..3147391d946bb4 160000 --- a/third_party/eigen +++ b/third_party/eigen @@ -1 +1 @@ -Subproject commit d41dc4dd74acce21fb210e7625d5d135751fa9e5 +Subproject commit 3147391d946bb4b6c68edd901f2add6ac1f31f8c diff --git a/third_party/fbgemm b/third_party/fbgemm index d399aee88df3ec..9cf1a9ffefbb43 160000 --- a/third_party/fbgemm +++ b/third_party/fbgemm @@ -1 +1 @@ -Subproject commit d399aee88df3ece31d2615a2938837b1e745f446 +Subproject commit 9cf1a9ffefbb439e823dd3340ab4967e0cfe23a6 diff --git a/third_party/kineto b/third_party/kineto index b5bb62d25be75c..b2b48c00c6e5bd 160000 --- a/third_party/kineto +++ b/third_party/kineto @@ -1 +1 @@ -Subproject commit b5bb62d25be75c381dbbd975276602f021982ef2 +Subproject commit b2b48c00c6e5bd8e807e2231adb229db6a1d1c22 diff --git a/tools/amd_build/build_amd.py b/tools/amd_build/build_amd.py index 38698631c03cf0..785f63085c2ea4 100755 --- a/tools/amd_build/build_amd.py +++ b/tools/amd_build/build_amd.py @@ -89,6 +89,8 @@ "tools/autograd/templates/python_variable_methods.cpp", ] +includes = [os.path.join(proj_dir, include) for include in includes] + for new_dir in args.extra_include_dir: abs_new_dir = os.path.join(proj_dir, new_dir) if os.path.exists(abs_new_dir): @@ -112,6 +114,8 @@ "torch/include/*", ] +ignores = [os.path.join(proj_dir, ignore) for ignore in ignores] + # Check if the compiler is hip-clang. def is_hip_clang() -> bool: try: diff --git a/tools/autograd/BUILD.bazel b/tools/autograd/BUILD.bazel new file mode 100644 index 00000000000000..2fd1043f2d408f --- /dev/null +++ b/tools/autograd/BUILD.bazel @@ -0,0 +1,10 @@ +py_library( + name = "autograd", + srcs = glob(["*.py"]), + data = glob([ + "*.yaml", + "templates/*", + ]), + visibility = ["//:__subpackages__"], + deps = ["//tools/codegen"], +) diff --git a/tools/autograd/derivatives.yaml b/tools/autograd/derivatives.yaml index c21e7222a854e7..0bbb57a4c49264 100644 --- a/tools/autograd/derivatives.yaml +++ b/tools/autograd/derivatives.yaml @@ -315,6 +315,7 @@ - name: atan2(Tensor self, Tensor other) -> Tensor self, other: atan2_backward(grad, self, other, grad_input_mask) + result: (-self_p * other_t + other_p * self_t) / (self_p.pow(2) + other_p.pow(2)) - name: baddbmm(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor self: maybe_multiply(grad, beta.conj()) @@ -365,12 +366,14 @@ - name: cholesky_inverse(Tensor self, bool upper=False) -> Tensor self: cholesky_inverse_backward(grad, self, upper, result) + result: cholesky_inverse_jvp(self_p, self_t, result, upper) # For clamp, gradient is not defined at the boundaries. But empirically it's helpful # to be able to get gradient on min and max, so we return the subgradient 1 for these cases. - name: clamp.Tensor(Tensor self, Tensor? min=None, Tensor? max=None) -> Tensor self: clamp_backward(grad, self, min, max) min, max: clamp_backward_min_max(grad, self, min, max, grad_input_mask) + result: clamp_jvp(self_p, self_t, min_p, min_t, max_p, max_t) - name: clamp(Tensor self, Scalar? min=None, Scalar? 
max=None) -> Tensor self: clamp_backward(grad, self, min, max) @@ -383,7 +386,7 @@ - name: clamp_min.Tensor(Tensor self, Tensor min) -> Tensor self: where(self >= min, grad, at::scalar_tensor(0., grad.options())) min: where(self < min, grad, at::scalar_tensor(0., grad.options())) - result: where(self_p >= min_p, self_t, at::scalar_tensor(0., self_p.options())) + where(self_p < min_p, min_t, at::scalar_tensor(0., self_p.options())) + result: where(self_p >= min_p, self_t, min_t) - name: clamp_max(Tensor self, Scalar max) -> Tensor self: where(self <= max, grad, at::scalar_tensor(0., grad.options())) @@ -392,7 +395,7 @@ - name: clamp_max.Tensor(Tensor self, Tensor max) -> Tensor self: where(self <= max, grad, at::scalar_tensor(0., grad.options())) max: where(self > max, grad, at::scalar_tensor(0., grad.options())) - result: where(self_p <= max_p, self_t, at::scalar_tensor(0., self_p.options())) + where(self_p > max_p, max_t, at::scalar_tensor(0., self_p.options())) + result: where(self_p <= max_p, self_t, max_t) - name: clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor self: grad @@ -415,6 +418,7 @@ - name: polar(Tensor abs, Tensor angle) -> Tensor abs, angle: polar_backward(grad, result) + result: at::complex(abs_t*angle_p.cos() - angle_t*abs_p*angle_p.sin(), abs_t*angle_p.sin() + angle_t*abs_p*angle_p.cos()) - name: _conj(Tensor(a) self) -> Tensor(a) self: grad.conj() @@ -549,6 +553,7 @@ - name: native_dropout(Tensor input, float p, bool? train) -> (Tensor, Tensor) input: "GradMode::is_enabled() ? infinitely_differentiable_native_dropout_backward(grad, result1, (!train.has_value() || !train.value() ? 1 : (p == 1 ? 0.0 : 1.0 / (1.0 - p)))) : native_dropout_backward(grad, result1, (!train.has_value() || !train.value() ? 1 : (p == 1 ? 0.0 : 1.0 / (1.0 - p))))" + result0: "(!train.has_value() || train.value()) ? (p == 1 ? 
0.0 : 1.0 / (1.0 - p)) * input_t * result1 : input_t" - name: native_dropout_backward(Tensor grad_output, Tensor mask, float scale) -> Tensor grad_output: "native_dropout_double_backward(grad, grad_output, mask, scale)" @@ -910,6 +915,7 @@ - name: logsumexp(Tensor self, int[1] dim, bool keepdim=False) -> Tensor self: logsumexp_backward(grad, self, result, dim, keepdim) + result: logsumexp_jvp(self_p, self_t, dim, keepdim) - name: lstsq(Tensor self, Tensor A) -> (Tensor solution, Tensor QR) self: not_implemented("lstsq") @@ -979,7 +985,7 @@ - name: maximum(Tensor self, Tensor other) -> Tensor self: at::where(self == other, grad / 2, grad).masked_fill_(self < other, 0) other: at::where(self == other, grad / 2, grad).masked_fill_(self > other, 0) - result: other_t + at::where(self_p == other_p, 0.5, (self_p > other_p).to(result.scalar_type())) * (self_t - other_t) + result: other_t + at::where(self_p == other_p, at::scalar_tensor(0.5, result.options()), (self_p > other_p).to(result.scalar_type())) * (self_t - other_t) - name: fmax(Tensor self, Tensor other) -> Tensor self: grad.masked_fill((self >= other).logical_or_(other.isnan()).logical_not_(), 0) @@ -1035,7 +1041,7 @@ - name: minimum(Tensor self, Tensor other) -> Tensor self: at::where(self == other, grad / 2, grad).masked_fill_(self > other, 0) other: at::where(self == other, grad / 2, grad).masked_fill_(self < other, 0) - result: other_t + at::where(self_p == other_p, 0.5, (self_p < other_p).to(result.scalar_type())) * (self_t - other_t) + result: other_t + at::where(self_p == other_p, at::scalar_tensor(0.5, result.options()), (self_p < other_p).to(result.scalar_type())) * (self_t - other_t) - name: fmin(Tensor self, Tensor other) -> Tensor self: grad.masked_fill((self <= other).logical_or_(other.isnan()).logical_not_(), 0) @@ -1266,6 +1272,15 @@ self: grad * std::sqrt(2 * M_PI) * (result.square() / 2).exp() result: auto_element_wise +- name: special_log_ndtr(Tensor self) -> Tensor + self: grad / std::sqrt(2 * M_PI) * (result + self.pow(2) / 2).neg().exp() + result: auto_element_wise + +# [Note: Sometimes view derivatives] +# The following situation applies to other operations as well. +# TODO: This note is only referenced once by to_dense. Make this +# more generic if it's been referenced more than once. +# # DO NOT define a backward for reshape! # reshape is special in that it sometimes returns a view, and sometimes not. # Defining a backward will make codegen spit out the forward call as @@ -1447,9 +1462,11 @@ - name: rsub.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor self: handle_r_to_c(self.scalar_type(), maybe_multiply(-grad, alpha.conj())) other: handle_r_to_c(other.scalar_type(), grad) + result: -maybe_multiply(self_t, alpha) + other_t - name: rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor self: handle_r_to_c(self.scalar_type(), maybe_multiply(-grad, alpha.conj())) + result: auto_element_wise - name: sum(Tensor self, *, ScalarType? dtype=None) -> Tensor self: grad.expand(self.sizes()) @@ -1564,7 +1581,11 @@ self: zeros_like(grad) result: auto_element_wise -- name: to_dense(Tensor self, ScalarType? dtype=None) -> Tensor +# DO NOT define a backward for to_dense +# See [Note: Sometimes view derivatives] +# - name: to_dense(Tensor self, ScalarType? dtype=None) -> Tensor +# +- name: _to_dense(Tensor self, ScalarType? 
dtype=None) -> Tensor self: to_dense_backward(grad, self) - name: to_sparse(Tensor self) -> Tensor @@ -1642,7 +1663,7 @@ self: at::view_as_real(grad.contiguous().resolve_conj()) # [gx, gy] result: at::view_as_complex(self_t) -- name: _s_where(Tensor condition, Tensor self, Tensor other) -> Tensor +- name: where.self(Tensor condition, Tensor self, Tensor other) -> Tensor condition: non_differentiable self: where(condition, grad, zeros_like(grad)) other: where(condition, zeros_like(grad), grad) @@ -1754,10 +1775,12 @@ - name: nll_loss_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight) self: nll_loss_backward(grad, self, target, weight, reduction, ignore_index, total_weight) target: non_differentiable + output: std::get<0>(nll_loss_forward(self_t, target, weight, reduction, ignore_index)) - name: nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight) self: nll_loss2d_backward(grad, self, target, weight, reduction, ignore_index, total_weight) target: non_differentiable + output: std::get<0>(nll_loss2d_forward(self_t, target, weight, reduction, ignore_index)) - name: smooth_l1_loss(Tensor self, Tensor target, int reduction=Mean, float beta=1.0) -> Tensor self: smooth_l1_loss_backward(grad, self, target, reduction, beta) @@ -1837,6 +1860,7 @@ - name: _log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor self: _log_softmax_backward_data(grad, result, dim, self.scalar_type()) + result: self_t - logsumexp_jvp(self_p, self_t, {dim}, true) - name: _sparse_log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor self: _sparse_log_softmax_backward_data(grad, result, dim, self) @@ -1855,6 +1879,7 @@ - name: _softmax(Tensor self, int dim, bool half_to_float) -> Tensor self: _softmax_backward_data(grad, result, dim, self.scalar_type()) + result: result * (self_t - logsumexp_jvp(self_p, self_t, {dim}, true)) - name: _sparse_softmax(Tensor self, int dim, bool half_to_float) -> Tensor self: _sparse_softmax_backward_data(grad, result, dim, self) @@ -1903,43 +1928,52 @@ self: replication_pad3d_backward(grad, self, padding) result: auto_linear - # NOTE: Not implementing forward AD formulas for non-vec upsample overloads because they are - # only kept for backward compatability - name: upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> Tensor self: upsample_linear1d_backward(grad, output_size, self.sizes(), align_corners, scales) + result: auto_linear - name: upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor self: upsample_bilinear2d_backward(grad, output_size, self.sizes(), align_corners, scales_h, scales_w) + result: auto_linear - name: _upsample_bilinear2d_aa(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor self: _upsample_bilinear2d_aa_backward(grad, output_size, self.sizes(), align_corners, scales_h, scales_w) + result: auto_linear - name: upsample_bicubic2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor self: upsample_bicubic2d_backward(grad, output_size, self.sizes(), align_corners, scales_h, scales_w) + result: auto_linear - name: _upsample_bicubic2d_aa(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? 
scales_w=None) -> Tensor self: _upsample_bicubic2d_aa_backward(grad, output_size, self.sizes(), align_corners, scales_h, scales_w) - name: upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor self: upsample_trilinear3d_backward(grad, output_size, self.sizes(), align_corners, scales_d, scales_h, scales_w) + result: auto_linear - name: upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> Tensor self: upsample_nearest1d_backward(grad, output_size, self.sizes(), scales) + result: auto_linear - name: _upsample_nearest_exact1d(Tensor self, int[1] output_size, float? scales=None) -> Tensor self: _upsample_nearest_exact1d_backward(grad, output_size, self.sizes(), scales) + result: auto_linear - name: upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor self: upsample_nearest2d_backward(grad, output_size, self.sizes(), scales_h, scales_w) + result: auto_linear - name: _upsample_nearest_exact2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor self: _upsample_nearest_exact2d_backward(grad, output_size, self.sizes(), scales_h, scales_w) + result: auto_linear - name: upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor self: upsample_nearest3d_backward(grad, output_size, self.sizes(), scales_d, scales_h, scales_w) + result: auto_linear - name: _upsample_nearest_exact3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor self: _upsample_nearest_exact3d_backward(grad, output_size, self.sizes(), scales_d, scales_h, scales_w) + result: auto_linear - name: upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor input: upsample_linear1d_backward(grad, output_size, input.sizes(), align_corners, scale_factors) @@ -2144,6 +2178,7 @@ - name: elu_backward(Tensor grad_output, Scalar alpha, Scalar scale, Scalar input_scale, bool is_result, Tensor self_or_result) -> Tensor grad_output: elu_backward(grad, alpha, scale, input_scale, is_result, self_or_result) self_or_result: elu_double_backward(grad, grad_output, alpha, scale, input_scale, is_result, self_or_result) + result: elu_backward(grad_output_t, alpha, scale, input_scale, is_result, self_or_result_p) + elu_double_backward(self_or_result_t, grad_output_p, alpha, scale, input_scale, is_result, self_or_result_p) - name: fractional_max_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] output_size, Tensor indices) -> Tensor grad_output: max_pool_double_backward(grad, indices, 2) @@ -2186,6 +2221,24 @@ # self_is_result is always false here since double backward call is an out-of-place call, self is input itself grad_output: leaky_relu_backward(grad, self, negative_slope, false) self: zeros_like(grad) + # leaky_relu_backward(grad_output, self, negative_slope, false) + # computes grad_output * at::where(self_p > 0, 1, negative_slope) + # so the jvp formula is the following: + # grad_output_t * at::where(self_p > 0, self_p.new_ones([]), negative_slope); + # + # leaky_relu_backward(grad_output, result, negative_slope, true) + # computes grad_output * at::where(result > 0, 1, negative_slope) + # under the assumption that `negative_slope` is positive (otherwise, + # it is not possible to compute the gradient). 
+ # + # so the jvp formula is the following: + # grad_output_t * at::where(result_p > 0, result_p.new_ones([]), negative_slope); + # with the assumption that negative_slope is positive. + # + # Combined together that results in the following optimized kernel which + # also checks the assumption that negative_slope is positive when self_is_result + # is True: + result: leaky_relu_backward(grad_output_t, self_p, negative_slope, self_is_result) - name: max_pool2d_with_indices_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, int[2] dilation, bool ceil_mode, Tensor indices) -> Tensor grad_output: max_pool_double_backward(grad, indices, 2) @@ -2286,43 +2339,52 @@ self: zeros_like(grad) result: zeros_like(self_t) + threshold_backward(grad_output_t, self_p, threshold) - # NOTE: Not implementing forward AD formulas for backwards of non-vec upsample overloads - # because they are only kept for backward compatability - name: upsample_linear1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, bool align_corners, float? scales=None) -> Tensor grad_output: upsample_linear1d(grad, output_size, align_corners, scales) + result: auto_linear - name: upsample_bilinear2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: upsample_bilinear2d(grad, output_size, align_corners, scales_h, scales_w) + result: auto_linear - name: _upsample_bilinear2d_aa_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: _upsample_bilinear2d_aa(grad, output_size, align_corners, scales_h, scales_w) + result: auto_linear - name: upsample_bicubic2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: upsample_bicubic2d(grad, output_size, align_corners, scales_h, scales_w) + result: auto_linear - name: _upsample_bicubic2d_aa_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: _upsample_bicubic2d_aa(grad, output_size, align_corners, scales_h, scales_w) - name: upsample_trilinear3d_backward(Tensor grad_output, int[3] output_size, int[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: upsample_trilinear3d(grad, output_size, align_corners, scales_d, scales_h, scales_w) + result: auto_linear - name: upsample_nearest1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None) -> Tensor grad_output: upsample_nearest1d(grad, output_size, scales) + result: auto_linear - name: _upsample_nearest_exact1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None) -> Tensor grad_output: _upsample_nearest_exact1d(grad, output_size, scales) + result: auto_linear - name: upsample_nearest2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: upsample_nearest2d(grad, output_size, scales_h, scales_w) + result: auto_linear - name: _upsample_nearest_exact2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, float? scales_h=None, float? 
scales_w=None) -> Tensor grad_output: _upsample_nearest_exact2d(grad, output_size, scales_h, scales_w) + result: auto_linear - name: upsample_nearest3d_backward(Tensor grad_output, int[3] output_size, int[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: upsample_nearest3d(grad, output_size, scales_d, scales_h, scales_w) + result: auto_linear - name: _upsample_nearest_exact3d_backward(Tensor grad_output, int[3] output_size, int[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor grad_output: _upsample_nearest_exact3d(grad, output_size, scales_d, scales_h, scales_w) + result: auto_linear - name: upsample_linear1d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor grad_output: upsample_linear1d(grad, output_size, align_corners, scale_factors) @@ -2490,12 +2552,15 @@ # fft - name: _fft_r2c(Tensor self, int[] dim, int normalization, bool onesided) -> Tensor self: fft_r2c_backward(grad, dim, normalization, onesided, self.size(dim.back())) + result: auto_linear - name: _fft_c2r(Tensor self, int[] dim, int normalization, int last_dim_size) -> Tensor self: fft_c2r_backward(grad, dim, normalization) + result: auto_linear - name: _fft_c2c(Tensor self, int[] dim, int normalization, bool forward) -> Tensor self: _fft_c2c(grad, dim, normalization, !forward) + result: auto_linear - name: unbind.int(Tensor(a -> *) self, int dim=0) -> Tensor(a)[] self: unbind_backward(grads, dim) @@ -2595,6 +2660,6 @@ - name: _efficientzerotensor(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor output_differentiability: [False] -- name: scatter_reduce.two(Tensor self, int dim, Tensor index, str reduce, *, int? 
output_size=None) -> Tensor - self: scatter_reduce_backward(grad, self, dim, index, reduce, result) +- name: scatter_reduce.two(Tensor self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True) -> Tensor + self, src: scatter_reduce_backward(grad, self, dim, index, src, reduce, include_self, result) index: non_differentiable diff --git a/tools/autograd/gen_autograd_functions.py b/tools/autograd/gen_autograd_functions.py index be7c7212db8dc6..fd9f50e8eb80b9 100644 --- a/tools/autograd/gen_autograd_functions.py +++ b/tools/autograd/gen_autograd_functions.py @@ -13,7 +13,8 @@ uses_single_grad) from tools.codegen.api.types import (Binding, BaseCType, OptionalCType, tensorT, longT, doubleT, scalarT, stringT, boolT, intArrayRefT, - tensorListT, MutRefCType, ListCType, ArrayRefCType) + tensorListT, MutRefCType, ListCType, ArrayRefCType, + optionalIntArrayRefT) from tools.codegen.code_template import CodeTemplate from tools.codegen.utils import FileManager from tools.codegen.model import Argument @@ -204,7 +205,7 @@ GETTER_BODY_VEC_SAVEDVAR = """\ PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); -for (int i = 0; i < prop.size(); i++) { +for (auto i: c10::irange(prop.size())) { PyTuple_SetItem(tup, (Py_ssize_t) i, THPVariable_Wrap(prop[i].unpack(self->cdata))); } return tup; @@ -212,7 +213,7 @@ GETTER_BODY_RAW_VEC_SAVEDVAR = """\ PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); -for (int i = 0; i < prop.size(); i++) { +for (auto i : c10::irange(prop.size())) { pybind11::object obj = pybind11::cast(prop[i], pybind11::return_value_policy::reference); PyTuple_SetItem(tup, (Py_ssize_t) i, obj.release().ptr()); } @@ -221,7 +222,7 @@ GETTER_BODY_ARRAYREF_LONG = """\ PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); -for (int i = 0; i < prop.size(); i++) { +for (auto i : c10::irange(prop.size())) { PyTuple_SetItem(tup, (Py_ssize_t) i, PyLong_FromUnsignedLong((uint64_t) prop[i])); } return tup; @@ -229,7 +230,7 @@ GETTER_BODY_ARRAYREF_DOUBLE = """\ PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); -for (int i = 0; i < prop.size(); i++) { +for (auto i : c10::irange(prop.size())) { PyTuple_SetItem(tup, (Py_ssize_t) i, PyFloat_FromDouble((double) prop[i])); } return tup; @@ -422,6 +423,10 @@ def save_var(var: SavedAttribute, is_output: bool) -> None: saved_variables.append(f'std::vector {name};') getter_definitions.append(GETTER_DEFINITION.substitute( op=info.op, name=name, body=GETTER_BODY_ARRAYREF_LONG)) + elif type == BaseCType(optionalIntArrayRefT): + saved_variables.append(f'c10::OptionalArray {name};') + getter_definitions.append(GETTER_DEFINITION_OPT_ARRAYREF.substitute( + op=info.op, name=name, body=GETTER_BODY_ARRAYREF_LONG)) elif type == OptionalCType(BaseCType(intArrayRefT)): saved_variables.append(f'c10::OptionalArray {name};') getter_definitions.append(GETTER_DEFINITION_OPT_ARRAYREF.substitute( diff --git a/tools/autograd/gen_variable_type.py b/tools/autograd/gen_variable_type.py index 4b634146dfedcd..62def9cc627371 100644 --- a/tools/autograd/gen_variable_type.py +++ b/tools/autograd/gen_variable_type.py @@ -91,7 +91,7 @@ 'triu', 'chunk', 'zero_', 'eq_', 'ne_', 'add', '__radd__', 'sum', '_conj', 'sin', 'cos', 'mul', 'sinc', 'sinh', 'cosh', '__rmul__', 'sgn', 'asin', 'acos', 'sub', 'div', 'cat', 'view_as_complex', 'index_put', - 'neg', 'complex', 'select', '_s_where', 'as_strided', 'slice', 'constant_pad_nd', + 'neg', 'complex', 'select', 'where', 'as_strided', 'slice', 'constant_pad_nd', 'unbind', 'split', 'split_with_sizes', 'unsafe_split', 
'split_with_sizes_backward', 'dot', 'vdot', 'cholesky', 'triangular_solve', 'mm', '_unsafe_view', 'mv', 'outer', 'bmm', 'diagonal', 'alias', 'atan', 'log', 'log10', 'log1p', 'log2', 'reciprocal', @@ -111,10 +111,11 @@ 'scatter', 'scatter_add', 'sigmoid', 'sigmoid_backward', 'trapezoid', 'cumulative_trapezoid', 'conj_physical_', '_neg_view', '_reshape_alias', '_det_lu_based_helper', 'lu_solve', 'linalg_solve_triangular', 'linalg_pinv', 'linalg_lstsq', 'col2im', 'col2im_backward', 'im2col', 'im2col_backward', + 'cholesky_inverse', } GRADIENT_IMPLEMENTED_FOR_SPARSE_COMPLEX = { - 'to_dense', '_coalesce', 'coalesce', 'values', '_sparse_coo_tensor_with_dims_and_tensors', + '_to_dense', '_coalesce', 'coalesce', 'values', '_sparse_coo_tensor_with_dims_and_tensors', 'sparse_mask_helper_cuda', '_sparse_addmm', } @@ -359,12 +360,12 @@ """) FW_DERIVATIVE_FORBID_TEMPLATE = CodeTemplate("""\ -TORCH_CHECK_NOT_IMPLEMENTED(!(${cond}), "Trying to use forward AD with ${msg} that does not support it."); +TORCH_CHECK_NOT_IMPLEMENTED(!(${cond}), "Trying to use forward AD with ${name} that does not support it ${msg}"); """) FW_DERIVATIVE_FORBID_LIST_TEMPLATE = CodeTemplate("""\ for (const auto& _t: ${arg}) { - TORCH_CHECK_NOT_IMPLEMENTED(!(${cond}), "Trying to use forward AD with ${msg} that does not support it."); + TORCH_CHECK_NOT_IMPLEMENTED(!(${cond}), "Trying to use forward AD with ${name} that does not support it ${msg}"); } """) @@ -952,9 +953,11 @@ def emit_fw_derivatives() -> List[str]: def emit_forbid_fw_derivatives(is_out_fn: bool = False) -> str: def get_msg() -> str: if is_out_fn: - msg = name + " (because it is an out= function)" + msg = "because it is an out= function" else: - msg = name + msg = ("because it has not been implemented yet.\\nPlease file an issue " + "to PyTorch at https://github.com/pytorch/pytorch/issues/new?template=feature-request.yml " + "so that we can prioritize its implementation.") return msg res = "" to_check: List[str] = [] @@ -964,13 +967,13 @@ def get_msg() -> str: to_check.append(FW_DERIVATIVE_CHECK_TEMPLATE.substitute(req_inp=inp.name)) elif is_tensor_list_type(inp.type): cond = FW_DERIVATIVE_CHECK_TEMPLATE.substitute(req_inp="_t") - res += FW_DERIVATIVE_FORBID_LIST_TEMPLATE.substitute(arg=inp.name, cond=cond, msg=get_msg()) + res += FW_DERIVATIVE_FORBID_LIST_TEMPLATE.substitute(arg=inp.name, cond=cond, name=name, msg=get_msg()) else: raise RuntimeError(f'Unsupported input type for "{name}" when forbidding forward AD usage.') if len(to_check) > 0: cond = " || ".join(to_check) - res += FW_DERIVATIVE_FORBID_TEMPLATE.substitute(cond=cond, msg=get_msg()) + res += FW_DERIVATIVE_FORBID_TEMPLATE.substitute(cond=cond, name=name, msg=get_msg()) return res body: List[str] = [] diff --git a/tools/autograd/templates/python_variable_methods.cpp b/tools/autograd/templates/python_variable_methods.cpp index c2e3c41746219c..95f8d3fafc119d 100644 --- a/tools/autograd/templates/python_variable_methods.cpp +++ b/tools/autograd/templates/python_variable_methods.cpp @@ -541,6 +541,28 @@ static PyObject * THPVariable_xpu(PyObject* self, PyObject* args, PyObject* kwar END_HANDLE_TH_ERRORS } +static PyObject * THPVariable_ipu(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "ipu(Device? device=None, bool non_blocking=False, *, MemoryFormat? memory_format=None)", + "ipu(Device? device=None, bool async=False, *, MemoryFormat? 
memory_format=None)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if (r.has_torch_function()) { + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto device = r.isNone(0) ? at::Device(at::DeviceType::IPU) : r.device(0); + auto opt_memory_format = r.memoryformatOptional(2); + TORCH_CHECK(device.is_ipu(), "Invalid device, must be ipu device"); + return THPVariable_Wrap(dispatch_to(self_, device, r.toBool(1), false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + static PyObject * THPVariable_to_type(PyObject* self, ScalarType scalarType, c10::optional optional_memory_format) { HANDLE_TH_ERRORS auto& self_ = THPVariable_Unpack(self); @@ -1205,6 +1227,7 @@ PyMethodDef variable_methods[] = { {"cpu", castPyCFunctionWithKeywords(THPVariable_cpu), METH_VARARGS | METH_KEYWORDS, NULL}, {"cuda", castPyCFunctionWithKeywords(THPVariable_cuda), METH_VARARGS | METH_KEYWORDS, NULL}, {"xpu", castPyCFunctionWithKeywords(THPVariable_xpu), METH_VARARGS | METH_KEYWORDS, NULL}, + {"ipu", castPyCFunctionWithKeywords(THPVariable_ipu), METH_VARARGS | METH_KEYWORDS, NULL}, {"data_ptr", THPVariable_data_ptr, METH_NOARGS, NULL}, {"dim", THPVariable_dim, METH_NOARGS, NULL}, {"has_names", THPVariable_has_names, METH_NOARGS, NULL}, diff --git a/tools/bazel.bzl b/tools/bazel.bzl index 3589d09df314d3..edb99f898d267b 100644 --- a/tools/bazel.bzl +++ b/tools/bazel.bzl @@ -3,6 +3,13 @@ load("@rules_cuda//cuda:defs.bzl", "requires_cuda_enabled") load("//c10/macros:cmake_configure_file.bzl", "cmake_configure_file") load("//tools/config:defs.bzl", "if_cuda") +def _py_library(name, **kwds): + deps = [dep for dep in kwds.pop("deps", []) if dep != None] + native.py_library(name = name, deps = deps, **kwds) + +def _requirement(_pypi_project): + return None + # Rules implementation for the Bazel build system. Since the common # build structure aims to replicate Bazel as much as possible, most of # the rules simply forward to the Bazel definitions. 
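The `tools/bazel.bzl` hunk above adds `_py_library` and `_requirement` shims so that shared `build.bzl` files can declare pip dependencies which simply disappear in the open-source Bazel build. Below is a minimal sketch, in plain Python standing in for Starlark and with a hypothetical `pyyaml` requirement, of how the two shims compose; the real shim forwards to `native.py_library` instead of returning a dict.

```python
# Sketch only: plain Python mimicking the Starlark shims added in tools/bazel.bzl.

def _requirement(_pypi_project):
    # Pip requirements are not resolved in the OSS Bazel build, so they map to None.
    return None

def _py_library(name, **kwds):
    # Drop the None placeholders produced by _requirement before forwarding.
    deps = [dep for dep in kwds.pop("deps", []) if dep is not None]
    return {"name": name, "deps": deps, **kwds}

# A shared build.bzl target can then mix pip and in-tree deps freely:
target = _py_library(
    "codegen",
    deps=[_requirement("pyyaml"), "//tools/autograd"],  # hypothetical deps
)
assert target["deps"] == ["//tools/autograd"]  # the pip dep was filtered out
```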
@@ -14,6 +21,9 @@ rules = struct( filegroup = native.filegroup, glob = native.glob, if_cuda = if_cuda, + py_binary = native.py_binary, + py_library = _py_library, + requirement = _requirement, requires_cuda_enabled = requires_cuda_enabled, select = select, test_suite = native.test_suite, diff --git a/tools/build_variables.bzl b/tools/build_variables.bzl index c957ec6cb17e51..81c09f23a9dd7d 100644 --- a/tools/build_variables.bzl +++ b/tools/build_variables.bzl @@ -42,21 +42,33 @@ GENERATED_CPP = [ "autograd/generated/python_variable_methods.cpp", ] +# This is duplicated in caffe2/CMakeLists.txt for now and not yet used in buck +GENERATED_LAZY_TS_CPP = [ + "lazy/generated/LazyNativeFunctions.cpp", + "lazy/generated/RegisterAutogradLazy.cpp", + "lazy/generated/RegisterLazy.cpp", +] + # NVFuser runtime library libtorch_nvfuser_runtime_sources = [ + "torch/csrc/jit/codegen/cuda/runtime/array.cu", "torch/csrc/jit/codegen/cuda/runtime/bf16_support.cu", "torch/csrc/jit/codegen/cuda/runtime/block_reduction.cu", "torch/csrc/jit/codegen/cuda/runtime/block_sync_atomic.cu", "torch/csrc/jit/codegen/cuda/runtime/block_sync_default.cu", "torch/csrc/jit/codegen/cuda/runtime/broadcast.cu", "torch/csrc/jit/codegen/cuda/runtime/fp16_support.cu", + "torch/csrc/jit/codegen/cuda/runtime/fused_reduction.cu", "torch/csrc/jit/codegen/cuda/runtime/grid_broadcast.cu", "torch/csrc/jit/codegen/cuda/runtime/grid_reduction.cu", "torch/csrc/jit/codegen/cuda/runtime/grid_sync.cu", "torch/csrc/jit/codegen/cuda/runtime/helpers.cu", "torch/csrc/jit/codegen/cuda/runtime/index_utils.cu", + "torch/csrc/jit/codegen/cuda/runtime/tensorcore.cu", "torch/csrc/jit/codegen/cuda/runtime/random_numbers.cu", "torch/csrc/jit/codegen/cuda/runtime/tensor.cu", + "torch/csrc/jit/codegen/cuda/runtime/tuple.cu", + "torch/csrc/jit/codegen/cuda/runtime/type_traits.cu", "torch/csrc/jit/codegen/cuda/runtime/welford.cu", "torch/csrc/jit/codegen/cuda/runtime/warp.cu", "aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh", @@ -148,6 +160,7 @@ libtorch_profiler_sources = [ "torch/csrc/autograd/profiler_legacy.cpp", "torch/csrc/autograd/profiler_kineto.cpp", "torch/csrc/profiler/api.cpp", + "torch/csrc/profiler/collection.cpp", "torch/csrc/profiler/kineto_shim.cpp", "torch/csrc/profiler/nvtx_observer.cpp", "torch/csrc/monitor/counters.cpp", @@ -239,6 +252,7 @@ core_sources_full_mobile_no_backend_interface = [ "torch/csrc/jit/passes/constant_propagation.cpp", "torch/csrc/jit/passes/restore_mutation.cpp", "torch/csrc/jit/passes/create_autodiff_subgraphs.cpp", + "torch/csrc/jit/passes/cuda_graph_fuser.cpp", "torch/csrc/jit/passes/dead_code_elimination.cpp", "torch/csrc/jit/passes/eliminate_no_ops.cpp", "torch/csrc/jit/passes/remove_redundant_profiles.cpp", @@ -320,11 +334,14 @@ core_sources_full_mobile_no_backend_interface = [ "torch/csrc/jit/runtime/interpreter/preprocess_graph.cpp", "torch/csrc/jit/runtime/interpreter.cpp", "torch/csrc/jit/runtime/logging.cpp", + "torch/csrc/jit/runtime/simple_graph_executor_impl.cpp", "torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp", "torch/csrc/jit/runtime/profiling_record.cpp", "torch/csrc/jit/runtime/script_profile.cpp", "torch/csrc/jit/runtime/symbolic_script.cpp", "torch/csrc/jit/runtime/symbolic_shape_registry.cpp", + "torch/csrc/jit/runtime/decomposition_registry.cpp", + "torch/csrc/jit/runtime/decomposition_registry_util.cpp", "torch/csrc/jit/runtime/symbolic_shape_registry_util.cpp", "torch/csrc/jit/runtime/jit_trace.cpp", "torch/csrc/jit/serialization/callstack_debug_info_serialization.cpp", @@ -341,6 
+358,7 @@ core_sources_full_mobile_no_backend_interface = [ "torch/csrc/jit/tensorexpr/cpp_codegen.cpp", "torch/csrc/jit/tensorexpr/eval.cpp", "torch/csrc/jit/tensorexpr/expr.cpp", + "torch/csrc/jit/tensorexpr/external_functions_core.cpp", "torch/csrc/jit/tensorexpr/external_functions_registry.cpp", "torch/csrc/jit/tensorexpr/graph_opt.cpp", "torch/csrc/jit/tensorexpr/hash_provider.cpp", @@ -402,6 +420,7 @@ lazy_tensor_core_sources = [ "torch/csrc/lazy/backend/lowering_context.cpp", "torch/csrc/lazy/core/config.cpp", "torch/csrc/lazy/core/debug_util.cpp", + "torch/csrc/lazy/core/dynamic_ir.cpp", "torch/csrc/lazy/core/hash.cpp", "torch/csrc/lazy/core/helpers.cpp", "torch/csrc/lazy/core/ir.cpp", @@ -432,6 +451,9 @@ lazy_tensor_core_sources = [ "torch/csrc/lazy/core/view_ops/unsqueeze.cpp", "torch/csrc/lazy/core/view_ops/select_view_update.cpp", "torch/csrc/lazy/core/view_ops/view.cpp", + # We should better segment the sources, but for now there are actually dependencies + # from some core files on some of these ts_backend files + # so we continue to build these parts of ts_backend in all build configs "torch/csrc/lazy/ts_backend/config.cpp", "torch/csrc/lazy/ts_backend/ops/arithmetic_ir_ops.cpp", "torch/csrc/lazy/ts_backend/ops/cast.cpp", @@ -442,6 +464,20 @@ lazy_tensor_core_sources = [ "torch/csrc/lazy/ts_backend/ts_node.cpp", ] +# We can't build all of the ts backend under certain build configurations, e.g. mobile, +# since it depends on things like autograd, meta functions, which may be disabled +lazy_tensor_ts_sources = [ + "torch/csrc/lazy/ts_backend/ops/batch_norm_ops.cpp", + "torch/csrc/lazy/ts_backend/ops/random_ops.cpp", + "torch/csrc/lazy/ts_backend/ts_autograd_functions.cpp", + "torch/csrc/lazy/ts_backend/ts_backend_impl.cpp", + "torch/csrc/lazy/ts_backend/ts_lowering_context.cpp", + "torch/csrc/lazy/ts_backend/ts_native_functions.cpp", + "torch/csrc/lazy/ts_backend/ts_node_lowering.cpp", + "torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp", + "torch/csrc/lazy/ts_backend/ts_eager_fallback.cpp", +] + lazy_tensor_core_python_sources = [ "torch/csrc/lazy/python/init.cpp", "torch/csrc/lazy/python/python_util.cpp", @@ -639,6 +675,7 @@ libtorch_cuda_core_sources = [ "torch/csrc/jit/codegen/cuda/compute_at.cpp", "torch/csrc/jit/codegen/cuda/compute_at_map.cpp", "torch/csrc/jit/codegen/cuda/codegen.cpp", + "torch/csrc/jit/codegen/cuda/contiguity.cpp", "torch/csrc/jit/codegen/cuda/dispatch.cpp", "torch/csrc/jit/codegen/cuda/expr_evaluator.cpp", "torch/csrc/jit/codegen/cuda/executor.cpp", @@ -669,8 +706,10 @@ libtorch_cuda_core_sources = [ "torch/csrc/jit/codegen/cuda/lower_allocation.cpp", "torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp", "torch/csrc/jit/codegen/cuda/lower_expr_sort.cpp", + "torch/csrc/jit/codegen/cuda/lower_fused_reduction.cpp", "torch/csrc/jit/codegen/cuda/lower_fusion_simplifier.cpp", "torch/csrc/jit/codegen/cuda/lower_index.cpp", + "torch/csrc/jit/codegen/cuda/lower_index_hoist.cpp", "torch/csrc/jit/codegen/cuda/lower_insert_syncs.cpp", "torch/csrc/jit/codegen/cuda/lower_loops.cpp", "torch/csrc/jit/codegen/cuda/lower_magic_zero.cpp", @@ -678,6 +717,7 @@ libtorch_cuda_core_sources = [ "torch/csrc/jit/codegen/cuda/lower_predicate.cpp", "torch/csrc/jit/codegen/cuda/lower_replace_size.cpp", "torch/csrc/jit/codegen/cuda/lower_shift.cpp", + "torch/csrc/jit/codegen/cuda/lower_sync_information.cpp", "torch/csrc/jit/codegen/cuda/lower_thread_predicate.cpp", "torch/csrc/jit/codegen/cuda/lower_trivial_broadcast.cpp", "torch/csrc/jit/codegen/cuda/lower_trivial_reductions.cpp", 
@@ -716,6 +756,8 @@ libtorch_cuda_core_sources = [ "torch/csrc/jit/codegen/cuda/transform_view.cpp", "torch/csrc/jit/codegen/cuda/type.cpp", "torch/csrc/jit/codegen/cuda/utils.cpp", + "torch/csrc/jit/codegen/cuda/mma_type.cpp", + "torch/csrc/jit/codegen/cuda/scheduler/mma_utils.cpp", "torch/csrc/jit/passes/frozen_conv_add_relu_fusion_cuda.cpp", "torch/csrc/jit/tensorexpr/cuda_codegen.cpp", "torch/csrc/jit/runtime/register_cuda_ops.cpp", @@ -873,6 +915,7 @@ libtorch_python_core_sources = [ "torch/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp", "torch/csrc/jit/passes/onnx/shape_type_inference.cpp", "torch/csrc/jit/passes/onnx/function_extraction.cpp", + "torch/csrc/jit/passes/onnx/onnx_log.cpp", "torch/csrc/jit/python/pybind_utils.cpp", "torch/csrc/jit/passes/onnx/pattern_conversion/common.cpp", "torch/csrc/jit/passes/onnx/pattern_conversion/pattern_encapsulation.cpp", @@ -981,6 +1024,7 @@ aten_cpu_source_non_codegen_list = [ "aten/src/ATen/ParallelNativeTBB.cpp", "aten/src/ATen/ParallelOpenMP.cpp", "aten/src/ATen/ParallelThreadPoolNative.cpp", + "aten/src/ATen/PythonTorchFunctionTLS.cpp", "aten/src/ATen/ScalarOps.cpp", "aten/src/ATen/SequenceNumber.cpp", "aten/src/ATen/SparseTensorImpl.cpp", @@ -1159,7 +1203,7 @@ aten_native_source_non_codegen_list = [ "aten/src/ATen/native/quantized/cpu/qconcat.cpp", "aten/src/ATen/native/quantized/cpu/qconv.cpp", "aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp", - "aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp", + "aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp", "aten/src/ATen/native/quantized/cpu/qelu.cpp", "aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp", "aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp", @@ -1171,7 +1215,7 @@ aten_native_source_non_codegen_list = [ "aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp", "aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp", "aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp", - "aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp", + "aten/src/ATen/native/quantized/cpu/qlinear_unpack_impl.cpp", "aten/src/ATen/native/quantized/cpu/qmatmul.cpp", "aten/src/ATen/native/quantized/cpu/qmul.cpp", "aten/src/ATen/native/quantized/cpu/qnormalization.cpp", @@ -1195,6 +1239,9 @@ aten_native_source_non_codegen_list = [ "aten/src/ATen/native/quantized/fake_quant_per_channel_affine.cpp", "aten/src/ATen/native/quantized/fake_quant_per_tensor_affine.cpp", "aten/src/ATen/native/quantized/library.cpp", + "aten/src/ATen/native/quantized/cpu/ruy_utils.cpp", + "aten/src/ATen/native/quantized/cpu/xnnpack_utils.cpp", + "aten/src/ATen/native/quantized/qlinear_unpack.cpp", "aten/src/ATen/quantized/QTensorImpl.cpp", "aten/src/ATen/quantized/Quantizer.cpp", "aten/src/ATen/native/Activation.cpp", @@ -1214,7 +1261,7 @@ aten_native_source_non_codegen_list = [ "aten/src/ATen/native/CPUBlas.cpp", "aten/src/ATen/native/ChanelShuffle.cpp", "aten/src/ATen/native/Col2Im.cpp", - "aten/src/ATen/native/ConstantPadNd.cpp", + "aten/src/ATen/native/PadNd.cpp", "aten/src/ATen/native/Convolution.cpp", "aten/src/ATen/native/ConvolutionMM2d.cpp", "aten/src/ATen/native/ConvolutionMM3d.cpp", diff --git a/tools/code_coverage/README.md b/tools/code_coverage/README.md index 6e83dc593ed155..67adb445d053d7 100644 --- a/tools/code_coverage/README.md +++ b/tools/code_coverage/README.md @@ -3,7 +3,7 @@ ## Overview This tool is designed for calculating code coverage for Pytorch project. -It’s an integrated tool. 
You can use this tool to run and generate both file-level and line-level report for C++ and Python tests. It will also be the tool we use in *CircleCI* to generate report for each master commit. +It’s an integrated tool. You can use this tool to run and generate both file-level and line-level report for C++ and Python tests. It will also be the tool we use in *CircleCI* to generate report for each main commit. ### Simple * *Simple command to run:* @@ -30,11 +30,11 @@ This part will introduce about the arguments you can use when run this tool. The We have two different compilers, `gcc` and `clang`, and this tool supports both. But it is recommended to use `gcc` because it's much faster and use less disk place. The examples will also be divided to two parts, for `gcc` and `clang`. ## Preparation -The first step is to [build *Pytorch* from source](https://github.com/pytorch/pytorch#from-source) with `CODE_COVERAGE` option `ON`. You may also want to set `BUILD_TEST` option `ON` to get the test binaries. Besides, if you are under `gcc` compiler, to get accurate result, it is recommended to also select `CMAKE_BUILD_CONFIG=Debug`. +The first step is to [build *Pytorch* from source](https://github.com/pytorch/pytorch#from-source) with `USE_CPP_CODE_COVERAGE` option `ON`. You may also want to set `BUILD_TEST` option `ON` to get the test binaries. Besides, if you are under `gcc` compiler, to get accurate result, it is recommended to also select `CMAKE_BUILD_TYPE=Debug`. See: [how to adjust build options](https://github.com/pytorch/pytorch#adjust-build-options-optional) for reference. Following is one way to adjust build option: ``` # in build/ folder (all build artifacts must in `build/` folder) -cmake .. -DCODE_COVERAGE=ON -DBUILD_TEST=ON -DCMAKE_BUILD_CONFIG=Debug +cmake .. -DUSE_CPP_CODE_COVERAGE=ON -DBUILD_TEST=ON -DCMAKE_BUILD_TYPE=Debug ``` @@ -53,7 +53,7 @@ python oss_coverage.py --run-only=atest ``` This command will run `atest` binary in `build/bin/` folder and generate reoports over the entire *Pytorch* folder. You can find the reports in `profile/summary`. But you may only be interested in the `aten` folder, in this case, try: ``` -python oss_coverage.py --run-only=atest --interested-only=aten +python oss_coverage.py --run-only=atest --interest-only=aten ``` In *Pytorch*, `c++` tests located in `build/bin/` and `python` tests located in `test/`. If you want to run `python` test, try: ``` @@ -62,7 +62,7 @@ python oss_coverage.py --run-only=test_complex.py You may also want to specify more than one test or interested folder, in this case, try: ``` -python oss_coverage.py --run-only=atest c10_logging_test --interested-only aten/src/Aten c10/core +python oss_coverage.py --run-only=atest c10_logging_test --interest-only aten/src/Aten c10/core ``` That it is! With these two simple options, you can customize many different functionality according to your need. By default, the tool will run all tests in `build/bin` folder (by running all executable binaries in it) and `test/` folder (by running `run_test.py`), and then collect coverage over the entire *Pytorch* folder. If this is what you want, try: @@ -84,9 +84,9 @@ By default all steps will be run, but you can specify only run one of them. Foll `—summary` is useful when you have different interested folder. 
For example, ```bash # after run this command -python oss_coverage.py --run-only=atest --interested-folder=aten +python oss_coverage.py --run-only=atest --interest-only=aten # you may then want to learn atest's coverage over c10, instead of running the test again, you can: -python oss_coverage.py --run-only=atest --interested-folder=c10 --summary +python oss_coverage.py --run-only=atest --interest-only=c10 --summary ``` diff --git a/tools/codegen/BUILD.bazel b/tools/codegen/BUILD.bazel new file mode 100644 index 00000000000000..d1a0db360d230f --- /dev/null +++ b/tools/codegen/BUILD.bazel @@ -0,0 +1,4 @@ +load("//:tools/bazel.bzl", "rules") +load(":build.bzl", "define_targets") + +define_targets(rules = rules) diff --git a/tools/codegen/api/autograd.py b/tools/codegen/api/autograd.py index 64b7547e78f0d3..635ad927e8a221 100644 --- a/tools/codegen/api/autograd.py +++ b/tools/codegen/api/autograd.py @@ -335,9 +335,44 @@ def repl(m: Match[str]) -> str: required_primals = required_primals + ("self",) if required_primals else ("self",) if not is_exact_match: - # Make sure that the forward grad is modified inplace when the original formula - # is out of place - formula = f"self_t_raw.defined() ? self_t_raw.copy_({formula}) : {formula}" + # NOTE [In-place forward AD formula Optimization] + # + # This optimization transforms the formula to directly do inplace, i.e. + # instead of self_t.copy_(self_t.op()) we do self_t.op_() when the following are met: + # + # 1) the formula satisfies the pattern: "self_t.op(*args)" + # 2) "op" in (1) needs to be the same as the op the derivative is for + # + # (2) may seem too strict, but currently the only ops that satisfy (1) also satisfy (2) + # If there is a need, we can relax (2) to allow any op that has an in-place variant + is_single_method_on_self_t = False + match = re.fullmatch(r'self_t.([\w]*)\((.*)\)', formula) + if match: + op_name, between_parens = match.group(1), match.group(2) + + # We want to... + # Match: self_t.op1(other_p.op2(arg)) + # Avoid: self_t.op1(args) + self_t.op2(args) + # Avoid: self_t.op1(other_p.op2(arg)) + self_t.op2(args) + def check_parens_nest_level_gt_zero(s: str) -> bool: + level = 1 + for ch in s: + if ch == ")": + level -= 1 + if level == 0: + return False + if ch == "(": + level += 1 + return True + is_single_method_on_self_t = check_parens_nest_level_gt_zero(between_parens) + directly_do_inplace = is_single_method_on_self_t and op_name == info.name + + if directly_do_inplace: + formula = f"self_t_raw.defined() ? self_t_raw.{op_name}_({between_parens}) : {formula}" + else: + # Make sure that the forward grad is modified inplace when the original formula + # is out of place + formula = f"self_t_raw.defined() ? 
self_t_raw.copy_({formula}) : {formula}" required_original_self_value = bool(re.search(IDENT_REGEX.format("original_self_p"), formula)) diff --git a/tools/codegen/api/cpp.py b/tools/codegen/api/cpp.py index a485fc17acf601..904ab1c486940c 100644 --- a/tools/codegen/api/cpp.py +++ b/tools/codegen/api/cpp.py @@ -6,7 +6,8 @@ MutRefCType, ArrayCType, ListCType, VectorCType, ArrayRefCType, OptionalCType, TupleCType, SpecialArgName, boolT, scalarT, tensorListT, dimnameListT, tensorT, voidT, longT, - BaseTypeToCppMapping, intArrayRefT, tensorOptionsT) + BaseTypeToCppMapping, intArrayRefT, optionalIntArrayRefT, + tensorOptionsT) from tools.codegen import local from tools.codegen.utils import assert_never from typing import Optional, Sequence, Union, List, Set @@ -92,6 +93,8 @@ def argumenttype_type(t: Type, *, mutable: bool, binds: ArgName, remove_non_owni return NamedCType(binds, ConstRefCType(OptionalCType(BaseCType(tensorT)))) elif str(t.elem) == 'Scalar': return NamedCType(binds, ConstRefCType(OptionalCType(BaseCType(scalarT)))) + elif isinstance(t.elem, ListType) and str(t.elem.elem) == 'int': + return NamedCType(binds, BaseCType(optionalIntArrayRefT)) elem = argumenttype_type(t.elem, mutable=mutable, binds=binds) return NamedCType(binds, OptionalCType(elem.type)) elif isinstance(t, ListType): diff --git a/tools/codegen/api/lazy.py b/tools/codegen/api/lazy.py index ebbc72eb1fc000..6c927e62aa9014 100644 --- a/tools/codegen/api/lazy.py +++ b/tools/codegen/api/lazy.py @@ -1,29 +1,29 @@ -from typing import List, Union, Tuple +from typing import List, Union, Tuple, Optional from tools.codegen.model import (Type, BaseTy, BaseType, OptionalType, ListType, OperatorName, FunctionSchema, - Return, TensorOptionsArguments) + Return, TensorOptionsArguments, Argument) from tools.codegen.api.types import (CType, BaseCppType, BaseCType, OptionalCType, NamedCType, deviceT, layoutT, VectorCType, boolT, longT, doubleT, ListCType, stringT, - scalarT, scalarTypeT) + scalarT, scalarTypeT, memoryFormatT) valueT = BaseCppType('torch::lazy', 'Value') - +# this is a bad hack. I need to refactor the data model to represent each arg in the schema as an object, +# making it easier to represent special properties of an arg. +tensorListValueT = BaseCppType('torch::lazy', 'Value') def process_ir_type(typ: Type) -> Union[BaseCType, VectorCType, OptionalCType, ListCType]: """ This function takes a type from NativeFunctions and converts it for use with - lazy tensor codegen. Currently its output is used in several places, and so far - it has been possible for them to all use the same conversions, but that may not be - optimal or possible in the finished system. + lazy tensor codegen. Type conversion for lazy currently consists of - (1) changing Tensor-like things into Value-like things + (1) changing at::Tensors into lazy::Values (2) wrapping everything in a BaseCType - (3) making reference types into values (e.g. vector instead of IntArrayRef) + (3) making cpp-reference types into cpp-value types (e.g. vector instead of IntArrayRef) - (1) converts Tensors to Values since Values are how Lazy IR represents tensors. There - is special handling for Optional[Tensor] or List[Tensor], etc- hence 'tensor-like' + (1) converts at::Tensors to lazy::Values (which wrap lazy::Nodes, with which Lazy IR represents tensors.) 
+ There is special handling for Optional[Tensor] or List[Tensor], etc- hence 'tensor-like' This is incomplete- there are assertions in places that it's expected to need to add more types as the codegen is used with more operators. @@ -33,7 +33,7 @@ def process_ir_type(typ: Type) -> Union[BaseCType, VectorCType, OptionalCType, L return BaseCType(valueT) elif typ.name == BaseTy.Scalar: # at::scalar has special handling, - # and is wrapped in an IR value just like at::tensor + # and is wrapped in an lazy::Value just like at::tensor return BaseCType(valueT) elif typ.name == BaseTy.ScalarType: return BaseCType(scalarTypeT) @@ -49,6 +49,8 @@ def process_ir_type(typ: Type) -> Union[BaseCType, VectorCType, OptionalCType, L return BaseCType(deviceT) elif typ.name == BaseTy.Layout: return BaseCType(layoutT) + elif typ.name == BaseTy.MemoryFormat: + return BaseCType(memoryFormatT) else: raise AssertionError(f"TODO add support for type {repr(typ)}") elif isinstance(typ, OptionalType): @@ -57,6 +59,9 @@ def process_ir_type(typ: Type) -> Union[BaseCType, VectorCType, OptionalCType, L if str(typ.elem) == 'Tensor?': # TODO(whc) is this actually correct? or should it use a Vector like above return ListCType(OptionalCType(BaseCType(valueT))) + elif str(typ.elem) == 'Tensor': + # this is a TensorList which comes in from GetTensorList as a Value + return BaseCType(tensorListValueT) else: return VectorCType(process_ir_type(typ.elem)) else: @@ -74,8 +79,7 @@ def isValueType(typ: CType) -> bool: return typ.type == valueT or typ.type == scalarT elif isinstance(typ, (OptionalCType, ListCType, VectorCType)): return isValueType(typ.elem) - else: - return False + return False def isWrappedScalarType(typ: Type) -> bool: """ @@ -89,45 +93,79 @@ def isWrappedScalarType(typ: Type) -> bool: return typ.name == BaseTy.Scalar elif isinstance(typ, (OptionalType, ListType)): return isWrappedScalarType(typ.elem) - else: - return False + return False + +def isGeneratorType(typ: Type) -> bool: + if isinstance(typ, BaseType): + return typ.name == BaseTy.Generator + elif isinstance(typ, (OptionalType)): + return isGeneratorType(typ.elem) + return False + +class LazyArgument: + name: str + orig_type: Type + lazy_type_: Optional[CType] + is_wrapped_scalar: bool + is_generator: bool + + # true if this argument is or contains a lazy IR value + is_lazy_value: bool + + def __init__(self, arg: Argument): + self.name = arg.name + self.orig_type = arg.type + self.is_generator = isGeneratorType(arg.type) + if self.is_generator: + assert isinstance(arg.type, OptionalType), "We expect all generators are optional since currently they are" + # there is no handling for generators in TorchScript IR (or XLA) + # so we fall back to eager if the (optional)generator has value, and otherwise + # its null and safe to exclude from lazy IR + self.lazy_type_ = None + else: + self.lazy_type_ = process_ir_type(arg.type) + self.is_wrapped_scalar = isWrappedScalarType(arg.type) + self.is_lazy_value = not self.is_generator and isValueType(self.lazy_type) + + @property + def lazy_type(self) -> CType: + assert self.lazy_type_ is not None, f"Attempted to access lazy_type for invalid argument {self.name}" + return self.lazy_type_ # Inspired by a FunctionSchema object, a LazyIrSchema holds the schema of a Lazy IR node. # Unlike a FunctionSchema, it has no round-trippable string form (relating to the YAML), # but carries type information from a native FunctionSchema modified for use with IR nodes, # and preserving original argument names. 
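The new `LazyArgument` wrapper above decides, per schema argument, whether it becomes a lazy IR value, stays a scalar-like C++ value, or is excluded entirely (generators), and `LazyIrSchema` (defined next) stores these wrappers instead of bare `NamedCType`s. The following is a rough, self-contained restatement of that decision table for a few representative argument types; it does not import the codegen modules, and the C++ type strings are illustrative only.

```python
# Illustrative restatement of LazyArgument's classification logic, not the real code.
def classify(schema_type: str) -> dict:
    if schema_type == "Generator?":
        # No lazy IR representation; kept only for fallback/shape inference.
        return {"is_generator": True, "is_lazy_value": False, "lazy_type": None}
    mapping = {
        "Tensor": "torch::lazy::Value",
        "Scalar": "torch::lazy::Value",   # wrapped scalar
        "Tensor[]": "torch::lazy::Value",  # the tensorListValueT hack noted above
        "Tensor?[]": "c10::List<c10::optional<torch::lazy::Value>>",
        "int[]": "std::vector<int64_t>",   # cpp-reference type made owning
        "MemoryFormat?": "c10::optional<at::MemoryFormat>",
    }
    lazy_type = mapping[schema_type]
    return {
        "is_generator": False,
        "is_lazy_value": "lazy::Value" in lazy_type,
        "lazy_type": lazy_type,
    }

assert classify("Generator?")["lazy_type"] is None
assert classify("Tensor")["is_lazy_value"]
assert not classify("int[]")["is_lazy_value"]
```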
- - class LazyIrSchema: # The name of the operator this function schema describes. name: 'OperatorName' - positional_arg_types: Tuple[NamedCType, ...] - keyword_arg_types: Tuple[NamedCType, ...] + positional_args: Tuple[LazyArgument, ...] + keyword_args: Tuple[LazyArgument, ...] # TODO: Need to handle collisions with argument names at some point returns: Tuple['Return', ...] - wrapped_scalar_names: List[str] + # if this schema has a Generator arg, list its orig ctype/name but don't + # build a LazyArgument since lazy IR doesn't support it + generator_arg: Optional[NamedCType] = None def __init__(self, func: FunctionSchema): - positional_arg_types = [] + positional_args = [] for arg_field in ["pre_self_positional", "self_arg", "post_self_positional"]: if arg_field == "self_arg" and func.arguments.self_arg is not None: arg = getattr(func.arguments, "self_arg").argument - positional_arg_types.append(NamedCType(arg.name, process_ir_type(arg.type))) + positional_args.append(LazyArgument(arg)) elif getattr(func.arguments, arg_field) is not None: - positional_arg_types.extend([ - NamedCType( - arg.name, - process_ir_type(arg.type)) for arg in getattr(func.arguments, arg_field)]) - self.positional_arg_types = tuple(positional_arg_types) + positional_args.extend([ + LazyArgument(arg) for arg in getattr(func.arguments, arg_field)]) + self.positional_args = tuple(positional_args) - keyword_arg_types = [] + keyword_args = [] for arg_field in ["pre_tensor_options_kwarg_only", "tensor_options", "post_tensor_options_kwarg_only", @@ -136,11 +174,14 @@ def __init__(self, func: FunctionSchema): if curr_args is not None: if isinstance(curr_args, TensorOptionsArguments): curr_args = curr_args.all() - keyword_arg_types.extend([NamedCType(arg.name, process_ir_type(arg.type)) for arg in curr_args]) - self.keyword_arg_types = tuple(keyword_arg_types) + for arg in curr_args: + if isGeneratorType(arg.type): + assert self.generator_arg is None, "We expect there is only one generator arg" + self.generator_arg = NamedCType(arg.name, arg.type) + keyword_args.extend([LazyArgument(arg) for arg in curr_args]) + self.keyword_args = tuple(keyword_args) self.name = func.name self.returns = func.returns - self.wrapped_scalar_names = [arg.name for arg in func.schema_order_arguments() if isWrappedScalarType(arg.type)] @property def node_name(self) -> str: @@ -162,36 +203,42 @@ def aten_name(self) -> str: def base_name(self) -> str: return f"{self.name.name.base}" - def filtered_types(self, positional: bool = True, keyword: bool = True, - values: bool = True, scalars: bool = True) -> List[NamedCType]: - types: List[NamedCType] = [] + def filtered_args(self, positional: bool = True, keyword: bool = True, + values: bool = True, scalars: bool = True, generator: bool = False) -> List[LazyArgument]: + # This function maintains the sorted order of arguments but provides different filtered views. + # Some parts of the code care about kwargs vs args (TS lowerings), + # other parts care about whether they need to wrap the arg in a lazy value or leave it alone. + # Generators are special cased, as they are needed for fallback/shape-inference but not supported + # in TS lowerings and therefore also omitted from lazy IR. 
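`filtered_args`, whose implementation follows below, replaces the old `filtered_types` and provides the different views the TS lowerings and IR codegen need over one ordered argument list. As a concrete illustration, for a hypothetical schema like `normal(Tensor mean, float std, *, Generator? generator=None)` the convenience properties would split the arguments roughly as in this toy sketch (not the real codegen objects):

```python
# Toy illustration of the views filtered_args provides.
from dataclasses import dataclass

@dataclass
class Arg:
    name: str
    keyword: bool
    is_lazy_value: bool
    is_generator: bool = False

args = [
    Arg("mean", keyword=False, is_lazy_value=True),   # Tensor -> lazy::Value
    Arg("std", keyword=False, is_lazy_value=False),   # float stays a scalar
    Arg("generator", keyword=True, is_lazy_value=False, is_generator=True),
]

positional_values  = [a.name for a in args if not a.keyword and a.is_lazy_value]
positional_scalars = [a.name for a in args if not a.keyword and not a.is_lazy_value and not a.is_generator]
keyword_scalars    = [a.name for a in args if a.keyword and not a.is_lazy_value and not a.is_generator]

assert positional_values == ["mean"]
assert positional_scalars == ["std"]
assert keyword_scalars == []  # generators are omitted unless generator=True is requested
```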
+ args: List[LazyArgument] = [] if positional: - types.extend(self.positional_arg_types) + args.extend(self.positional_args) if keyword: - types.extend(self.keyword_arg_types) - - if values and scalars: - return types - - if values: - return [t for t in types if isValueType(t.type)] + args.extend(self.keyword_args) + + if values and scalars and generator: + return args + elif values and scalars: + return [a for a in args if not a.is_generator] + elif values: + return [a for a in args if a.is_lazy_value] elif scalars: - return [t for t in types if not isValueType(t.type)] + return [a for a in args if not a.is_lazy_value and (generator or not a.is_generator)] return [] @property - def positional_values(self) -> List[NamedCType]: - return self.filtered_types(positional=True, keyword=False, values=True, scalars=False) + def positional_values(self) -> List[LazyArgument]: + return self.filtered_args(positional=True, keyword=False, values=True, scalars=False) @property - def positional_scalars(self) -> List[NamedCType]: - return self.filtered_types(positional=True, keyword=False, values=False, scalars=True) + def positional_scalars(self) -> List[LazyArgument]: + return self.filtered_args(positional=True, keyword=False, values=False, scalars=True) @property - def keyword_values(self) -> List[NamedCType]: - return self.filtered_types(positional=False, keyword=True, values=True, scalars=False) + def keyword_values(self) -> List[LazyArgument]: + return self.filtered_args(positional=False, keyword=True, values=True, scalars=False) @property - def keyword_scalars(self) -> List[NamedCType]: - return self.filtered_types(positional=False, keyword=True, values=False, scalars=True) + def keyword_scalars(self) -> List[LazyArgument]: + return self.filtered_args(positional=False, keyword=True, values=False, scalars=True) diff --git a/tools/codegen/api/python.py b/tools/codegen/api/python.py index 6c362cb87387b3..759f7e504aab3f 100644 --- a/tools/codegen/api/python.py +++ b/tools/codegen/api/python.py @@ -188,29 +188,6 @@ class PythonReturns: returns: Tuple[Return, ...] 
- def named_tuple_pyi(self) -> Optional[Tuple[str, str]]: - python_returns = [argument_type_str_pyi(r.type) for r in self.returns] - field_names = namedtuple_fieldnames(self.returns) - if field_names: - namedtuple_name = '_'.join(['namedtuple'] + field_names) - tuple_args = [f'("{name}", {typ})' for name, typ in zip(field_names, python_returns)] - namedtuple_def = f'NamedTuple("{namedtuple_name}", [{", ".join(tuple_args)}])' - return namedtuple_name, namedtuple_def - return None - - def returns_str_pyi(self) -> str: - named_tuple = self.named_tuple_pyi() - if named_tuple is not None: - namedtuple_name, _ = named_tuple - return namedtuple_name - - python_returns = [argument_type_str_pyi(r.type) for r in self.returns] - if len(python_returns) > 1: - return 'Tuple[' + ', '.join(python_returns) + ']' - if len(python_returns) == 1: - return python_returns[0] - return 'None' - @dataclass(frozen=True) class PythonArgument: @@ -399,7 +376,7 @@ def signature_str_pyi(self, *, skip_outputs: bool = False) -> str: schema_formals.insert(positional_argc, '*') # only pyi signatures include returns - returns_str = self.returns.returns_str_pyi() + returns_str = returns_str_pyi(self) # pyi also includes self (with no typing/defaults) for methods if self.method: schema_formals.insert(0, "self") @@ -425,7 +402,7 @@ def signature_str_pyi_vararg(self, *, skip_outputs: bool = False) -> Optional[st # vararg signatures also omit the asterix schema_formals[0] = '*' + args[0].name + ': _int' - returns_str = self.returns.returns_str_pyi() + returns_str = returns_str_pyi(self) # pyi also includes self (with no typing/defaults) for methods if self.method: schema_formals.insert(0, "self") @@ -465,7 +442,7 @@ def signature_str_pyi(self, *, skip_outputs: bool = False) -> str: if len(schema_formals) > positional_argc: schema_formals.insert(positional_argc, '*') - returns_str = self.returns.returns_str_pyi() + returns_str = returns_str_pyi(self) return f'def {self.name}({", ".join(schema_formals)}) -> {returns_str}: ...' 
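The `python.py` change above (continued below) moves return-type rendering for the `.pyi` stubs out of `PythonReturns`; with the new `returns_str_pyi`, named-tuple returns are annotated with the structseq types under `torch.return_types` rather than ad-hoc `namedtuple_*` names, matching the updated `reveal_type` expectations at the top of this section. A hedged sketch of the before/after stub output, plus a quick runtime check assuming a PyTorch build that exposes `torch.return_types`:

```python
# Illustrative only: the shape of the generated .pyi annotation, before vs. after.
#
#   before:  def sort(self, ...) -> namedtuple_values_indices: ...
#   after:   def sort(self, ...) -> torch.return_types.sort: ...
#
# Quick runtime sanity check:
import torch

out = torch.randn(4).sort()
print(type(out))                          # <class 'torch.return_types.sort'>
print(out.values.shape, out.indices.shape)
```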
def signature_str_pyi_vararg(self, *, skip_outputs: bool = False) -> Optional[str]: @@ -594,7 +571,7 @@ def argument_type_str(t: Type, *, simple_type: bool = False) -> str: elif t.name in [BaseTy.bool, BaseTy.QScheme, BaseTy.Scalar, BaseTy.ScalarType, BaseTy.Generator, BaseTy.Storage, BaseTy.Layout, BaseTy.Device, BaseTy.MemoryFormat, - BaseTy.Dimname, BaseTy.Stream, BaseTy.ConstQuantizerPtr]: + BaseTy.Dimname, BaseTy.Stream, BaseTy.ConstQuantizerPtr, BaseTy.SymInt]: # These python schema type names line up with their function schema names return t.name.name @@ -777,6 +754,8 @@ def argument_type_str_pyi(t: Type) -> str: if isinstance(t, BaseType): if t.name == BaseTy.int: ret = '_int' + if t.name == BaseTy.SymInt: + ret = 'SymInt' elif t.name == BaseTy.float: ret = '_float' elif t.name == BaseTy.str: @@ -826,6 +805,51 @@ def argument_type_str_pyi(t: Type) -> str: raise RuntimeError(f'unrecognized type {repr(t)}') +def return_type_str_pyi(t: Type) -> str: + # Where arguments are open to accepting Union, return types should return + # concrete types + + if isinstance(t, OptionalType): + inner = return_type_str_pyi(t.elem) + return f"Optional[{inner}]" + + if isinstance(t, BaseType): + if t.name == BaseTy.Device: + return '_device' + elif t.name == BaseTy.Dimname: + ret = 'Optional[str]' + else: + return argument_type_str_pyi(t) + + if isinstance(t, ListType): + inner = return_type_str_pyi(t.elem) + return f"List[{inner}]" + + return argument_type_str_pyi(t) + +def returns_named_tuple_pyi(signature: PythonSignature) -> Optional[Tuple[str, str]]: + python_returns = [return_type_str_pyi(r.type) for r in signature.returns.returns] + namedtuple_name = signature.name + field_names = namedtuple_fieldnames(signature.returns.returns) + if field_names: + tuple_args = [f'("{name}", {typ})' for name, typ in zip(field_names, python_returns)] + namedtuple_def = f'NamedTuple("{namedtuple_name}", [{", ".join(tuple_args)}])' + return namedtuple_name, namedtuple_def + return None + +def returns_str_pyi(signature: PythonSignature) -> str: + field_names = namedtuple_fieldnames(signature.returns.returns) + if field_names: + return f"torch.return_types.{signature.name}" + + python_returns = [return_type_str_pyi(r.type) for r in signature.returns.returns] + if len(python_returns) > 1: + return 'Tuple[' + ', '.join(python_returns) + ']' + if len(python_returns) == 1: + return python_returns[0] + return 'None' + + # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # # # C++ Function Dispatch @@ -919,6 +943,7 @@ def dispatch_lambda_arg(cpp_arg: Binding) -> DispatchLambdaArgument: '::std::tuple', '::std::tuple', '::std::tuple', + '::std::tuple>', '::std::vector', 'at::Scalar', 'bool', 'int64_t', 'void*', 'void', 'at::QScheme', 'double', @@ -1011,6 +1036,8 @@ def arg_parser_unpack_method(t: Type, has_default: bool) -> str: return 'deviceWithDefault' if has_default else 'device' elif t.name == BaseTy.int: return 'toInt64' + elif t.name == BaseTy.SymInt: + return 'toSymInt' elif t.name == BaseTy.bool: return 'toBool' elif t.name == BaseTy.float: diff --git a/tools/codegen/api/structured.py b/tools/codegen/api/structured.py index a8c714a293f44f..b12a092a49f0a5 100644 --- a/tools/codegen/api/structured.py +++ b/tools/codegen/api/structured.py @@ -5,7 +5,8 @@ from tools.codegen.api.types import (ArgName, BaseCType, Binding, ArrayRefCType, ConstRefCType, OptionalCType, NamedCType, tensorT, scalarT, intArrayRefT, dimnameListT, - optionalTensorRefT, optionalScalarRefT) + optionalTensorRefT, 
optionalScalarRefT, + optionalIntArrayRefT, iTensorListRefT) from tools.codegen.api import cpp from tools.codegen.utils import assert_never @@ -37,15 +38,13 @@ def argumenttype_type(t: Type, *, mutable: bool, binds: ArgName) -> NamedCType: return NamedCType(binds, BaseCType(optionalTensorRefT)) elif t.elem == BaseType(BaseTy.Scalar): return NamedCType(binds, BaseCType(optionalScalarRefT)) + elif isinstance(t.elem, ListType) and str(t.elem.elem) == 'int': + return NamedCType(binds, BaseCType(optionalIntArrayRefT)) elem = argumenttype_type(t.elem, mutable=mutable, binds=binds) return NamedCType(binds, OptionalCType(elem.type)) elif isinstance(t, ListType): if t.elem == BaseType(BaseTy.Tensor): - raise AssertionError( - "list of tensor not supported by structured yet; to implement this " - "resolve torch::List issue, see " - "https://fb.workplace.com/groups/894363187646754/permalink/1149276442155426" - ) + return NamedCType(binds, BaseCType(iTensorListRefT)) # TODO: delete these special cases; see tools.codegen.api.cpp--these # must be changed in tandem, but there are problems; see # https://github.com/pytorch/pytorch/pull/51485 diff --git a/tools/codegen/api/translate.py b/tools/codegen/api/translate.py index 53919136ba6bdc..aea12852ea22e7 100644 --- a/tools/codegen/api/translate.py +++ b/tools/codegen/api/translate.py @@ -1,12 +1,12 @@ from typing import Dict, Sequence, List, NoReturn, Union -from tools.codegen.api.types import (BaseCType, Binding, ConstRefCType, +from tools.codegen.api.types import (tensorListT, BaseCType, Binding, ConstRefCType, Expr, MutRefCType, OptionalCType, NamedCType, SpecialArgName, tensorT, memoryFormatT, tensorOptionsT, scalarTypeT, boolT, deviceT, layoutT, optionalTensorRefT, - scalarT, optionalScalarRefT, + iTensorListRefT, scalarT, optionalScalarRefT, VectorCType, longT, intArrayRefT, - scalar_t, opmath_t) + scalar_t, opmath_t, optionalIntArrayRefT) # This file implements a small program synthesis engine that implements # conversions between one API to another. 
@@ -39,6 +39,7 @@ options_ctype = NamedCType("options", ConstRefCType(BaseCType(tensorOptionsT))) longVec_ctype = VectorCType(BaseCType(longT)) +optionalLongVec_ctype = OptionalCType(VectorCType(BaseCType(longT))) optionalScalar_ctype = OptionalCType(BaseCType(scalarT)) optionalTensor_ctype = OptionalCType(BaseCType(tensorT)) @@ -141,6 +142,10 @@ def translate( if t.type == BaseCType(scalar_t): ctx[NamedCType(t.name, BaseCType(opmath_t))] = f'static_cast({b.expr})' + # [Note: ITensorListRef] + if t.type == BaseCType(tensorListT): + ctx[NamedCType(t.name, BaseCType(iTensorListRefT))] = f"at::ITensorListRef({b.expr})" + # Add implicit bindings if the generated code is inside a Tensor method if method: ctx[NamedCType("self", MutRefCType(BaseCType(tensorT)))] = "const_cast(*this)" @@ -235,6 +240,8 @@ def direct_solve(goal: NamedCType) -> str: # We can always do translations from value types to reference types, like vector -> IntArrayRef elif goal.type == BaseCType(intArrayRefT): return direct_solve(NamedCType(goal.name, longVec_ctype)) + elif goal.type == BaseCType(optionalIntArrayRefT): + return direct_solve(NamedCType(goal.name, optionalLongVec_ctype)) elif goal.type == BaseCType(optionalScalarRefT): return direct_solve(NamedCType(goal.name, optionalScalar_ctype)) elif goal.type == BaseCType(optionalTensorRefT): @@ -254,6 +261,10 @@ def direct_solve(goal: NamedCType) -> str: intArrayRef_ctype = NamedCType(goal.name, BaseCType(intArrayRefT)) argname = direct_solve(intArrayRef_ctype) return f'{argname}.vec()' + elif goal.type == OptionalCType(VectorCType(BaseCType(longT))): + optionalIntArrayRef_ctype = NamedCType(goal.name, BaseCType(optionalIntArrayRefT)) + argname = direct_solve(optionalIntArrayRef_ctype) + return f'{argname}.has_value() ? c10::make_optional({argname}->vec()) : c10::nullopt' elif goal.type == OptionalCType(BaseCType(scalarT)): optionalScalarRef_ctype = NamedCType(goal.name, BaseCType(optionalScalarRefT)) argname = direct_solve(optionalScalarRef_ctype) diff --git a/tools/codegen/api/types.py b/tools/codegen/api/types.py index 8a01b49bfb42fc..81a198a79e5240 100644 --- a/tools/codegen/api/types.py +++ b/tools/codegen/api/types.py @@ -53,6 +53,7 @@ def __str__(self) -> str: tensorT = BaseCppType('at', 'Tensor') optionalTensorRefT = BaseCppType('at', 'OptionalTensorRef') tensorListT = BaseCppType('at', 'TensorList') +iTensorListRefT = BaseCppType('at', 'ITensorListRef') dimnameT = BaseCppType('at', 'Dimname') dimnameListT = BaseCppType('at', 'DimnameList') layoutT = BaseCppType('at', 'Layout') @@ -64,9 +65,11 @@ def __str__(self) -> str: storageT = BaseCppType('at', 'Storage') streamT = BaseCppType('at', 'Stream') intArrayRefT = BaseCppType('at', 'IntArrayRef') +optionalIntArrayRefT = BaseCppType('at', 'OptionalIntArrayRef') tensorOptionsT = BaseCppType('at', 'TensorOptions') typeAndSizeT = BaseCppType('torch::autograd::generated', 'TypeAndSize') tensorGeometryT = BaseCppType('at', 'TensorGeometry') +SymIntT = BaseCppType('c10', 'SymInt') # Types representing template parameters. Technically, we probably shouldn't # represent them this way in codegen, but it was pretty convenient. @@ -105,6 +108,7 @@ def __str__(self) -> str: BaseTy.QScheme: qschemeT, BaseTy.Storage: storageT, BaseTy.Stream: streamT, + BaseTy.SymInt: SymIntT, } # CTypes encode C++ type structure as needed for translation. 
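The `translate.py` changes above teach the solver to convert between the new `at::OptionalIntArrayRef` binding and the owning `c10::optional<std::vector<int64_t>>` form in both directions. A rough sketch of the C++ expression it emits when it has to materialize the owning vector; `output_size` is a hypothetical argument name used only for illustration:

```python
# Sketch of the reference -> value direction added to translate(): given an
# at::OptionalIntArrayRef binding, build an optional<vector<int64_t>> rvalue.
def optional_int_array_ref_to_vec(argname: str) -> str:
    return (f"{argname}.has_value() "
            f"? c10::make_optional({argname}->vec()) "
            f": c10::nullopt")

print(optional_int_array_ref_to_vec("output_size"))
# output_size.has_value() ? c10::make_optional(output_size->vec()) : c10::nullopt
```

The value -> reference direction is free (the ref type can bind directly to the vector), which is why the diff only needs an extra `direct_solve` case plus this materializing expression.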
diff --git a/tools/codegen/build.bzl b/tools/codegen/build.bzl new file mode 100644 index 00000000000000..ed04e35a439133 --- /dev/null +++ b/tools/codegen/build.bzl @@ -0,0 +1,16 @@ +def define_targets(rules): + rules.py_library( + name = "codegen", + srcs = rules.glob(["**/*.py"]), + deps = [ + rules.requirement("PyYAML"), + rules.requirement("typing-extensions"), + ], + visibility = ["//visibility:public"], + ) + + rules.py_binary( + name = "gen", + srcs = [":codegen"], + visibility = ["//visibility:public"], + ) diff --git a/tools/codegen/decompositions/gen_jit_decompositions.py b/tools/codegen/decompositions/gen_jit_decompositions.py new file mode 100644 index 00000000000000..a934bb3ecc4d80 --- /dev/null +++ b/tools/codegen/decompositions/gen_jit_decompositions.py @@ -0,0 +1,84 @@ +#!/usr/bin/env python3 +import os +from pathlib import Path + +from torch.jit._decompositions import decomposition_table +# from tools.codegen.code_template import CodeTemplate + +DECOMP_HEADER = r""" +/** + * @generated + * This is an auto-generated file. Please do not modify it by hand. + * To re-generate, please run: + * cd ~/pytorch && python tools/codegen/decompositions/gen_jit_decompositions.py + */ +#include +#include +#include +#include + +namespace torch { +namespace jit { + + +const std::string decomp_funcs = +R"(""" + + + +DECOMP_CENTER = r""" +)"; + +const std::string& GetSerializedDecompositions() { + return decomp_funcs; +} + +const OperatorMap& GetDecompositionMapping() { + // clang-format off + static const OperatorMap decomposition_mapping { +""" + +DECOMP_END = r""" + }; + // clang-format on + + return decomposition_mapping; +} + +} // namespace jit +} // namespace torch +""" + + +DECOMPOSITION_UTIL_FILE_NAME = "decomposition_registry_util.cpp" + +def gen_serialized_decompisitions() -> str: + return "\n".join([scripted_func.code for scripted_func in decomposition_table.values()]) + +def gen_decomposition_mappings() -> str: + decomposition_mappings = [] + for schema, scripted_func in decomposition_table.items(): + decomposition_mappings.append( + ' {"' + schema + '", "' + scripted_func.name + '"},' + ) + return "\n".join(decomposition_mappings) + +def write_decomposition_util_file(path: str) -> None: + decomposition_str = gen_serialized_decompisitions() + decomposition_mappings = gen_decomposition_mappings() + file_components = [DECOMP_HEADER, decomposition_str, DECOMP_CENTER, decomposition_mappings, DECOMP_END] + print("writing file to : ", path + "/" + DECOMPOSITION_UTIL_FILE_NAME) + with open( + os.path.join(path, DECOMPOSITION_UTIL_FILE_NAME), "wb" + ) as out_file: + final_output = "".join(file_components) + out_file.write(final_output.encode("utf-8")) + +def main() -> None: + pytorch_dir = Path(__file__).resolve().parents[3] + upgrader_path = pytorch_dir / "torch" / "csrc" / "jit" / "runtime" + write_decomposition_util_file(str(upgrader_path)) + + +if __name__ == '__main__': + main() diff --git a/tools/codegen/dest/lazy_ir.py b/tools/codegen/dest/lazy_ir.py index c744c2b91d0087..acc86169d4dbfd 100644 --- a/tools/codegen/dest/lazy_ir.py +++ b/tools/codegen/dest/lazy_ir.py @@ -1,29 +1,31 @@ -from abc import ABC, abstractmethod +from abc import ABC from typing import List, Union from dataclasses import dataclass from tools.codegen.context import method_with_native_function from tools.codegen.model import (BackendIndex, NativeFunction, NativeFunctionsGroup) -from tools.codegen.api.types import (BaseCType, OptionalCType, NamedCType, +from tools.codegen.api.types import (BaseCType, 
OptionalCType, VectorCType, kernel_signature) import tools.codegen.api.dispatcher as dispatcher -from tools.codegen.api.lazy import LazyIrSchema, isValueType +from tools.codegen.api.lazy import LazyIrSchema, LazyArgument, isValueType, tensorListValueT from tools.codegen.dest.lazy_ts_lowering import ts_lowering_body -def node_ctor_arg_rvalue_string(arg: NamedCType, schema: LazyIrSchema) -> str: +def node_ctor_arg_rvalue_string(arg: LazyArgument) -> str: """ - Given a NamedCType from a lazy IR schema, + Given a LazyArgument, generate a c++ string for materializing an rvalue of that arg for passing into a lazy Node constructor. """ - if isValueType(arg.type): - if isinstance(arg.type, BaseCType): - if arg.name in schema.wrapped_scalar_names: + if isValueType(arg.lazy_type): + if isinstance(arg.lazy_type, BaseCType): + if arg.is_wrapped_scalar: return f"torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen({arg.name})" + elif arg.lazy_type.type is tensorListValueT: + return f"lazy_{arg.name}_tensorlist" return f"lazy_{arg.name}->GetIrValue()" - elif isinstance(arg.type, OptionalCType): - if arg.name in schema.wrapped_scalar_names: + elif isinstance(arg.lazy_type, OptionalCType): + if arg.is_wrapped_scalar: return f"{arg.name} ? " \ f"c10::make_optional(torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen(*{arg.name})) : " \ "c10::nullopt" @@ -31,14 +33,14 @@ def node_ctor_arg_rvalue_string(arg: NamedCType, schema: LazyIrSchema) -> str: f"c10::make_optional(lazy_{arg.name}->GetIrValue()) : " \ "c10::nullopt" else: - raise AssertionError("TODO not sure if there are other valid types to handle here") + raise AssertionError(f"TODO not sure if there are other valid types to handle here ({arg.lazy_type})") else: - if isinstance(arg.type, VectorCType) and isinstance(arg.type.elem, BaseCType): - return f"std::vector<{arg.type.elem.type}>({arg.name}.begin(), {arg.name}.end())" - elif (isinstance(arg.type, OptionalCType) and - isinstance(arg.type.elem, VectorCType) and - isinstance(arg.type.elem.elem, BaseCType)): - return f"torch::lazy::ToOptionalVector<{arg.type.elem.elem.type}>({arg.name})" + if isinstance(arg.lazy_type, VectorCType) and isinstance(arg.lazy_type.elem, BaseCType): + return f"std::vector<{arg.lazy_type.elem.type}>({arg.name}.begin(), {arg.name}.end())" + elif (isinstance(arg.lazy_type, OptionalCType) and + isinstance(arg.lazy_type.elem, VectorCType) and + isinstance(arg.lazy_type.elem.elem, BaseCType)): + return f"torch::lazy::ToOptionalVector<{arg.lazy_type.elem.elem.type}>({arg.name})" else: return f"{arg.name}" @@ -46,20 +48,24 @@ def node_ctor_inputs(schema: LazyIrSchema) -> str: """ Produce a formatted string with the arguments as passed into the constructor of a node class. 
""" - node_ctor_values = [node_ctor_arg_rvalue_string(arg, schema) for arg in schema.filtered_types()] + node_ctor_values = [node_ctor_arg_rvalue_string(arg) for arg in schema.filtered_args()] return ",\n ".join(node_ctor_values) def gen_fallback_code(schema: LazyIrSchema, overload_name: str) -> str: """ Generate code that falls back to eager conditioned on a predicate """ - fallback_args = ",\n ".join([str(arg.name) for arg in schema.filtered_types()]) + fallback_args = ",\n ".join([str(arg.name) for arg in schema.filtered_args(generator=True)]) if len(overload_name): aten_op_str = f"ATEN_OP2({schema.aten_name}, {overload_name})" else: aten_op_str = f"ATEN_OP({schema.aten_name})" + or_has_generator = "" + if schema.generator_arg: + # generators are always optional and there is never more than one, at least currently + or_has_generator = f" || ({schema.generator_arg.name}.has_value() && {schema.generator_arg.name}->defined())" return f""" - if (force_eager_fallback({aten_symbol(schema)})) {{ + if (force_eager_fallback({aten_symbol(schema)}){or_has_generator}) {{ return at::native::call_fallback_fn<<c_eager_fallback, {aten_op_str}>::call( {fallback_args} ); @@ -78,56 +84,56 @@ def aten_symbol(schema: LazyIrSchema) -> str: class LazyIR(ABC): backend_index: BackendIndex node_base: str - lowering_function_type: str = "" - lowering_context_type: str = "" - lowering_return_type: str = "" @method_with_native_function def __call__(self, f: Union[NativeFunctionsGroup, NativeFunction]) -> List[str]: func = f.functional.func if isinstance(f, NativeFunctionsGroup) else f.func return self.gen(f) - @abstractmethod - def lowering_body(self, f: Union[NativeFunctionsGroup, NativeFunction]) -> str: - pass + # there is no lowering functionality generated unless this IR base class is subclassed and + # implemented as a backend-specific node + def lowering_function(self, f: Union[NativeFunctionsGroup, NativeFunction]) -> str: + return "" def gen(self, f: Union[NativeFunctionsGroup, NativeFunction]) -> List[str]: # for now, we just want one IR class decl and soon after also the method defs # and we use the functional version not out/inplace. 
func = f.functional.func if isinstance(f, NativeFunctionsGroup) else f.func schema = LazyIrSchema(func) - all_types = schema.filtered_types() - value_types = schema.filtered_types(values=True, scalars=False) - scalar_types = schema.filtered_types(values=False, scalars=True) + all_args = schema.filtered_args() + value_args = schema.filtered_args(values=True, scalars=False) + scalar_args = schema.filtered_args(values=False, scalars=True) - node_ctor_args = ", ".join([f"const {i.cpp_type()}& {i.name}" for i in all_types]) - scalar_initializers = ",\n ".join([f"{t.name}({t.name})" for t in scalar_types]) + node_ctor_args = ", ".join([f"const {i.lazy_type.cpp_type()}& {i.name}" for i in all_args]) + scalar_initializers = ",\n ".join([f"{a.name}({a.name})" for a in scalar_args]) comma_if_scalar_initializers = ",\n" if len(scalar_initializers) else "" - scalar_decls = "\n ".join([f"{t.cpp_type()} {t.name};" for t in scalar_types]) - scalar_hashes = ", ".join([f"{f.name}" for f in scalar_types]) + scalar_decls = "\n ".join([f"std::string {a.name};" if a.lazy_type.cpp_type() == "c10::string_view" + else f"{a.lazy_type.cpp_type()} {a.name};" + for a in scalar_args]) + scalar_hashes = ", ".join([f"{a.name}" for a in scalar_args]) base_ctor_value_args_list = [] optional_values = [] - for t in value_types: - if isinstance(t.type, BaseCType): - base_ctor_value_args_list.append(f"{t.name}") - elif isinstance(t.type, OptionalCType): - base_ctor_value_args_list.append(f"{t.name}.value_or(kNullValue)") - optional_values.append(t.name) + for arg in value_args: + if isinstance(arg.lazy_type, BaseCType) or isinstance(arg.lazy_type, VectorCType): + base_ctor_value_args_list.append(f"{arg.name}") + elif isinstance(arg.lazy_type, OptionalCType): + base_ctor_value_args_list.append(f"{arg.name}.value_or(kNullValue)") + optional_values.append(arg.name) else: - raise AssertionError("TODO not sure if there are other valid types to handle here") + raise AssertionError(f"TODO not sure if there are other valid types to handle here ({arg.lazy_type})") base_ctor_value_args = ", ".join(base_ctor_value_args_list) has_optional_decls = "\n ".join([f"bool has_{value}: 1;" for value in optional_values]) has_optional_defs = "\n ".join([f"has_{value} = !!{value};" for value in optional_values]) members_to_string = [] - for t in scalar_types: - if isinstance(t.type, OptionalCType): - members_to_string.append(f"""if ({t.name}.has_value()) {{ - ss << ", {t.name}=" << {t.name}.value(); + for arg in scalar_args: + if isinstance(arg.lazy_type, OptionalCType): + members_to_string.append(f"""if ({arg.name}.has_value()) {{ + ss << ", {arg.name}=" << {arg.name}.value(); }} else {{ - ss << ", {t.name}=null"; + ss << ", {arg.name}=null"; }}""") else: - members_to_string.append(f'ss << ", {t.name}=" << {t.name};') + members_to_string.append(f'ss << ", {arg.name}=" << {arg.name};') members_to_string_str = "\n ".join(members_to_string) return [f"""\ @@ -151,10 +157,7 @@ class {schema.node_name} : public {self.node_base} {{ return ss.str(); }} - {self.lowering_return_type} Lower({self.lowering_function_type} function, - {self.lowering_context_type} loctx) const override {{ - {self.lowering_body(f)} - }} + {self.lowering_function(f)} {scalar_decls} {has_optional_decls} @@ -166,31 +169,34 @@ class {schema.node_name} : public {self.node_base} {{ @dataclass(frozen=True) class TSLazyIR(LazyIR): - lowering_function_type: str = "std::shared_ptr" - lowering_context_type: str = "torch::lazy::TSLoweringContext*" - lowering_return_type: str = 
"torch::lazy::TSOpVector" - - def lowering_body(self, f: Union[NativeFunctionsGroup, NativeFunction]) -> str: - return ts_lowering_body(f) + def lowering_function(self, f: Union[NativeFunctionsGroup, NativeFunction]) -> str: + return f"""torch::lazy::TSOpVector Lower(std::shared_ptr function, + torch::lazy::TSLoweringContext* loctx) const override {{ + {ts_lowering_body(f)} + }}""" -def lazy_tensor_decls(value_types: List[NamedCType], tensor_class: str, schema: LazyIrSchema) -> str: +def lazy_tensor_decls(value_args: List[LazyArgument], tensor_class: str) -> str: lazy_tensor_decls: List[str] = [] - for t in value_types: - if t.name in schema.wrapped_scalar_names: + for arg in value_args: + if arg.is_wrapped_scalar: # no lazy tensor wrapper for scalars that are promoted to IR values continue - if isinstance(t.type, BaseCType): - lazy_tensor_decls.append( - f"{tensor_class}Ptr lazy_{t.name} = " - f"torch::lazy::GetLtcTensorOrCreateForWrappedNumber({t.name}, *common_device);") - elif isinstance(t.type, OptionalCType): + elif isinstance(arg.lazy_type, BaseCType): + if arg.lazy_type.type is tensorListValueT: + lazy_tensor_decls.append( + f"auto lazy_{arg.name}_tensorlist = torch::lazy::GetTensorList({arg.name});") + else: + lazy_tensor_decls.append( + f"{tensor_class}Ptr lazy_{arg.name} = " + f"torch::lazy::GetLtcTensorOrCreateForWrappedNumber({arg.name}, *common_device);") + elif isinstance(arg.lazy_type, OptionalCType): # TODO(alanwaketan): Maybe we want to apply GetLtcTensorOrCreateForWrappedNumber here, but hold it # until we encounter a real world example. lazy_tensor_decls.append( - f" {tensor_class}Ptr lazy_{t.name} = torch::lazy::TryGetLtcTensor({t.name}.value_or(at::Tensor()));") + f" {tensor_class}Ptr lazy_{arg.name} = torch::lazy::TryGetLtcTensor({arg.name}.value_or(at::Tensor()));") else: - raise AssertionError("TODO not sure if there are other valid types to handle here") + raise AssertionError(f"TODO not sure if there are other valid types to handle here ({arg.lazy_type})") return ("\n ").join(lazy_tensor_decls) @dataclass(frozen=True) @@ -198,38 +204,27 @@ class GenLazyNativeFuncDefinition: class_method_name: str backend_index: BackendIndex tensor_class: str + gen_forced_fallback_code: bool - @method_with_native_function - def __call__(self, func: NativeFunction) -> List[str]: - sig = kernel_signature(func, self.backend_index) + def gen_shape_call(self, func: NativeFunction) -> str: metadata = self.backend_index.get_kernel(func) assert metadata is not None schema = LazyIrSchema(func.func) - all_types = schema.filtered_types() - value_types = schema.filtered_types(values=True, scalars=False) - scalar_types = schema.filtered_types(values=False, scalars=True) + all_args = schema.filtered_args() returns_length = len(schema.returns) - fallback_str = gen_fallback_code(schema, overload_name=func.func.name.overload_name) - value_types_names = [f"{t.name}" for t in value_types if t.name not in schema.wrapped_scalar_names] - assert len(value_types_names) > 0, "Code below assumes there is at least one tensor arg" - get_device_str = f"""auto common_device = torch::lazy::GetBackendDevice({', '.join(value_types_names)}); - TORCH_INTERNAL_ASSERT(common_device); - """ - - lazy_tensor_decls_str = lazy_tensor_decls(value_types, self.tensor_class, schema) - node_ctor_input_str = node_ctor_inputs(schema) - # call the meta kernel if it exists, to compute output shape/dtype for our IR if func.structured or func.structured_delegate is not None: meta_out = """std::vector 
shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())};""" if returns_length > 1: + def this_shape(i: int) -> str: return f"Shape(std::get<{i}>(out_meta).scalar_type(), std::get<{i}>(out_meta).sizes().vec())" - shapes_str = ','.join([this_shape(i) for i in range(returns_length)]) + + shapes_str = ",".join([this_shape(i) for i in range(returns_length)]) meta_out = "std::vector shapes{" + shapes_str + "};" - meta_str = f"""auto out_meta = at::meta::{schema.aten_name}({', '.join(str(t.name) for t in all_types)}); + meta_str = f"""auto out_meta = at::meta::{schema.aten_name}({', '.join(str(a.name) for a in all_args)}); {meta_out}""" else: shape_sig = ComputeShapeSignature(metadata.kernel, func) @@ -239,7 +234,43 @@ def this_shape(i: int) -> str: meta_str += f""" TORCH_INTERNAL_ASSERT(shapes.size() == {returns_length});""" - node_str = f"""auto node = torch::lazy::MakeNode({node_ctor_input_str}, + # Calculating which dimensions are symbolic + func_schema_str = "aten::" + str(func.func) + meta_str += f""" + if(symbolicShapeEnabled()){{ + std::vector inputs = {{ {', '.join(str(a.name) for a in all_args)} }}; + char* schema_str = "{func_schema_str}"; + applySymbolicShapesOnLT(schema_str, inputs, shapes); + }} + """ + return meta_str + + @method_with_native_function + def __call__(self, func: NativeFunction) -> List[str]: + sig = kernel_signature(func, self.backend_index) + metadata = self.backend_index.get_kernel(func) + assert metadata is not None + schema = LazyIrSchema(func.func) + value_args = schema.filtered_args(values=True, scalars=False) + returns_length = len(schema.returns) + + fallback_str = "" + if self.gen_forced_fallback_code: + fallback_str = gen_fallback_code(schema, overload_name=func.func.name.overload_name) + + value_types_names = [f"{a.name}" for a in value_args if not a.is_wrapped_scalar] + assert ( + len(value_types_names) > 0 + ), "Code below assumes there is at least one tensor arg" + get_device_str = f"""auto common_device = torch::lazy::GetBackendDevice({', '.join(value_types_names)}); + TORCH_INTERNAL_ASSERT(common_device); + """ + + lazy_tensor_decls_str = lazy_tensor_decls(value_args, self.tensor_class) + node_ctor_input_str = node_ctor_inputs(schema) + shape_str = self.gen_shape_call(func) + + node_str = f"""auto node = torch::lazy::MakeNode<{schema.node_name}>({node_ctor_input_str}, std::move(shapes));""" first_tensor_name = value_types_names[0] bridge_str = """auto result = torch::lazy::CreateAtenFromLtcTensor( @@ -253,24 +284,28 @@ def this_shape(i: int) -> str: auto result = torch::lazy::TupleAtenFromLtcTensors<{returns_length}>(lazy_tensors);""" if schema.name.name.inplace or func.func.is_out_fn(): - assert returns_length == 1, "We assumed there was no such case where an op is an in-place variant " \ - "and has tuple outputs." + assert returns_length == 1, ( + "We assumed there was no such case where an op is an in-place variant " + f"and has tuple outputs, but got tuple of len {returns_length}." 
+ ) bridge_str = f"""lazy_{first_tensor_name}->SetInPlaceIrValue(node); auto& result = {first_tensor_name};""" - - return [f"""\ + return [ + f"""\ {sig.decl(name=f"{self.class_method_name}::{metadata.kernel}")} {{ {fallback_str} TORCH_LAZY_FN_COUNTER("lazy::"); {get_device_str} {lazy_tensor_decls_str} - {meta_str} + {shape_str} {node_str} {bridge_str} return result; }};\n - """] + """ + ] + class ComputeShapeSignature: """ @@ -279,7 +314,7 @@ class ComputeShapeSignature: def __init__(self, kernel_name: str, f: NativeFunction): self.__schema = LazyIrSchema(f.func) self.__dispatch_args = ', '.join([a.decl() for a in dispatcher.arguments(f.func)]) - self.__call_args = ", ".join([f"{t.name}" for t in self.__schema.filtered_types()]) + self.__call_args = ", ".join([f"{arg.name}" for arg in self.__schema.filtered_args(generator=True)]) self.__kernel_name = kernel_name def __decl_suffix(self) -> str: @@ -309,8 +344,8 @@ def __call__(self, f: NativeFunction) -> List[str]: metadata = self.backend_index.get_kernel(f) assert metadata is not None schema = LazyIrSchema(f.func) - value_types = schema.filtered_types(values=True, scalars=False) - lazy_tensor_decls_str = lazy_tensor_decls(value_types, self.tensor_class, schema) + value_args = schema.filtered_args(values=True, scalars=False) + lazy_tensor_decls_str = lazy_tensor_decls(value_args, self.tensor_class) node_ctor_input_str = node_ctor_inputs(schema) # Only generate shape/dtype fn for non-structured kernels, diff --git a/tools/codegen/dest/lazy_ts_lowering.py b/tools/codegen/dest/lazy_ts_lowering.py index 3f7701d5587a9a..25d594aa459ff6 100644 --- a/tools/codegen/dest/lazy_ts_lowering.py +++ b/tools/codegen/dest/lazy_ts_lowering.py @@ -1,6 +1,6 @@ from typing import Union from tools.codegen.model import (NativeFunction, NativeFunctionsGroup) -from tools.codegen.api.lazy import LazyIrSchema, isValueType +from tools.codegen.api.lazy import LazyIrSchema from tools.codegen.api.types import OptionalCType @@ -11,19 +11,19 @@ def ts_lowering_body(f: Union[NativeFunctionsGroup, NativeFunction]) -> str: schema = LazyIrSchema(func) emplace_arguments = [] - for value in schema.positional_arg_types: - if isValueType(value.type): - if isinstance(value.type, OptionalCType): - emplace_arguments.append(f"has_{value.name} ? loctx->GetOutputOp(operand(i++)) : nullptr") + for arg in schema.positional_args: + if arg.is_lazy_value: + if isinstance(arg.lazy_type, OptionalCType): + emplace_arguments.append(f"has_{arg.name} ? 
loctx->GetOutputOp(operand(i++)) : nullptr") continue emplace_arguments.append('loctx->GetOutputOp(operand(i++))') continue - emplace_arguments.append(f'"{value.name}", {value.name}') + emplace_arguments.append(f'"{arg.name}", {arg.name}') emplace_arguments_str = "\n ".join( [f"arguments.emplace_back({a});" for a in emplace_arguments]) - emplace_kwarg_values = [f'"{t.name}", loctx->GetOutputOp(operand(i++))' for t in schema.keyword_values] - emplace_kwarg_scalars = [f'"{t.name}", {t.name}' for t in schema.keyword_scalars] + emplace_kwarg_values = [f'"{arg.name}", loctx->GetOutputOp(operand(i++))' for arg in schema.keyword_values] + emplace_kwarg_scalars = [f'"{arg.name}", {arg.name}' for arg in schema.keyword_scalars] emplace_kwarguments = "\n ".join( [f"kwarguments.emplace_back({a});" for a in emplace_kwarg_values + emplace_kwarg_scalars]) return f"""\ diff --git a/tools/codegen/dest/register_dispatch_key.py b/tools/codegen/dest/register_dispatch_key.py index c555768d08ce38..dee32075f0376a 100644 --- a/tools/codegen/dest/register_dispatch_key.py +++ b/tools/codegen/dest/register_dispatch_key.py @@ -43,7 +43,9 @@ def gen_registration_headers( elif per_operator_headers: headers += [ "#include ", - "#include "] + "#include ", + "#include ", + "#include "] else: headers.append("#include ") @@ -60,30 +62,15 @@ def gen_create_out_helper(backend_index: BackendIndex) -> List[str]: dispatch = str(backend_index.dispatch_key).lower() empty_impl = f"at::detail::empty_{dispatch}" empty_strided_impl = f"at::detail::empty_strided_{dispatch}" - runtime_empty_supported_check = "" - elif backend_index.dispatch_key == DispatchKey.CompositeExplicitAutograd: + elif backend_index.dispatch_key in ( + DispatchKey.CompositeExplicitAutograd, DispatchKey.QuantizedCPU, DispatchKey.QuantizedCUDA): empty_impl = "at::empty" empty_strided_impl = "at::empty_strided" - runtime_empty_supported_check = """\ - if (!c10::detail::backend_supports_empty_operator(options)) {{ - // The main purpose of this CompositeExplicitAutograd kernel is to provide - // a "free" implementation of out-of-place operators. - // If a backend hasn't implemented an out-of-place op but has implemented - // the out= variant, then this kernel will call their out= variant. - // It does that by using at::empty() to create the tensor to pass to the out= variant though, - // so this "default" kernel doesn't actually handle backends that don't support at::empty - // (e.g. quantized backends). - // Returning an undefined tensor here allows us to reach the out= kernel and give a better error. - // Longer term, this could be better fixed by https://github.com/pytorch/pytorch/issues/52680 - return at::Tensor(); - }} -""" else: return [] return [f""" Tensor create_out(IntArrayRef sizes, IntArrayRef strides, const TensorOptions &options) {{ - {runtime_empty_supported_check} if (strides.empty()) {{ return {empty_impl}(sizes, {empty_options}); }} else {{ @@ -191,6 +178,10 @@ class RegisterDispatchKey: # all of the existing kernel signatures scattered across aten/src/ATen/native. class_method_name: Optional[str] + # Only set to true in lightweight dispatch. If lightweight dispatch is enabled we are registering + # operators into JIT op registry, thus we need to avoid generating code to register into the dispatcher. 
+ skip_dispatcher_op_registration: bool + @staticmethod def gen_device_check(type: DeviceCheckType, args: List[Argument], method_name: str) -> str: if type == DeviceCheckType.NoCheck: @@ -282,6 +273,7 @@ def gen_structured(self, g: NativeFunctionsGroup) -> List[str]: self.rocm, self.cpp_namespace, self.class_method_name, + self.skip_dispatcher_op_registration, g ) return list(mapMaybe(structured_gen.gen_one, g.functions())) @@ -376,7 +368,7 @@ def generate_defn(cpp_sig: CppSignature) -> str: device_guard = "// DeviceGuard omitted" # default if f.device_guard and self.backend_index.device_guard: - has_tensor_options = any(isinstance(a.argument, TensorOptionsArguments) for a in args) + has_tensor_options = any(isinstance(a, TensorOptionsArguments) for a in f.func.arguments.non_out) if has_tensor_options: # kernel is creating a tensor device_guard = """ @@ -416,7 +408,7 @@ def generate_defn(cpp_sig: CppSignature) -> str: """ elif self.target is Target.REGISTRATION: - if f.manual_kernel_registration: + if f.manual_kernel_registration or self.skip_dispatcher_op_registration: return None else: payload = f"TORCH_FN({name})" diff --git a/tools/codegen/gen.py b/tools/codegen/gen.py index 4b35ee81f343ca..101f1fbe96aed6 100644 --- a/tools/codegen/gen.py +++ b/tools/codegen/gen.py @@ -26,6 +26,7 @@ import tools.codegen.api.meta as meta import tools.codegen.api.structured as structured from tools.codegen.api.translate import translate +from tools.codegen.code_template import CodeTemplate from tools.codegen.selective_build.selector import SelectiveBuilder from tools.codegen.utils import ( Target, concatMap, context, mapMaybe, YamlDumper, YamlLoader, FileManager, assert_never, make_file_manager @@ -1090,7 +1091,8 @@ def gen_aggregated_headers( selector, rocm=rocm, cpp_namespace='at::native', - class_method_name=None), + class_method_name=None, + skip_dispatcher_op_registration=False), grouped_native_functions )), }) @@ -1198,7 +1200,8 @@ def gen_per_operator_headers( selector, rocm=rocm, cpp_namespace='at::native', - class_method_name=None), + class_method_name=None, + skip_dispatcher_op_registration=False), grouped_functions )) @@ -1418,6 +1421,25 @@ def operator_headers() -> List[str]: return headers backend_index = backend_indices[dispatch_key] + dispatch_registrations_body = "" if skip_dispatcher_op_registration else "\n".join(list(concatMap( + dest.RegisterDispatchKey( + backend_index, + Target.REGISTRATION, + selector, + rocm=rocm, + cpp_namespace='at::native', + class_method_name=None, + skip_dispatcher_op_registration=skip_dispatcher_op_registration), + grouped_native_functions + ))) + static_template = CodeTemplate("""\ +TORCH_LIBRARY_IMPL(aten, $dispatch_key, m) { + $dispatch_registrations_body +};""") + static_init_dispatch_registrations = static_template.substitute( + dispatch_key=dispatch_key, + dispatch_registrations_body=dispatch_registrations_body + ) dispatch_namespace = str(dispatch_key).lower() fm.write_with_template(f'Register{dispatch_key}.cpp', 'RegisterDispatchKey.cpp', lambda: { 'extra_cuda_headers': extra_cuda_headers if is_cuda_dispatch_key(dispatch_key) else '', @@ -1434,7 +1456,8 @@ def operator_headers() -> List[str]: selector, rocm=rocm, cpp_namespace='at::native', - class_method_name=None), + class_method_name=None, + skip_dispatcher_op_registration=skip_dispatcher_op_registration), grouped_native_functions )), 'dispatch_anonymous_definitions': list(concatMap( @@ -1444,19 +1467,12 @@ def operator_headers() -> List[str]: selector, rocm=rocm, cpp_namespace='at::native', - 
class_method_name=None), - grouped_native_functions - )), - 'dispatch_registrations': [] if skip_dispatcher_op_registration else list(concatMap( - dest.RegisterDispatchKey( - backend_index, - Target.REGISTRATION, - selector, - rocm=rocm, - cpp_namespace='at::native', - class_method_name=None), + class_method_name=None, + skip_dispatcher_op_registration=skip_dispatcher_op_registration), grouped_native_functions )), + 'static_init_dispatch_registrations': static_init_dispatch_registrations, + 'deferred_dispatch_registrations': "", }) for g in structured_native_functions: diff --git a/tools/codegen/gen_backend_stubs.py b/tools/codegen/gen_backend_stubs.py index 5b703889ab85ef..587eea4d48c799 100644 --- a/tools/codegen/gen_backend_stubs.py +++ b/tools/codegen/gen_backend_stubs.py @@ -11,6 +11,7 @@ from tools.codegen.selective_build.selector import SelectiveBuilder from tools.codegen.utils import Target, concatMap, context, YamlLoader, FileManager from tools.codegen.context import native_function_manager +from tools.codegen.code_template import CodeTemplate import tools.codegen.dest as dest import tools.codegen.api.dispatcher as dispatcher from tools.codegen.api.types import DispatcherSignature @@ -19,7 +20,7 @@ # Parses the external backend's yaml, and adds a new BackendIndex for the backend's dispatch key. # Returns a Tuple of (backend_key, autograd_key, cpp_namespace, updated BackendIndex mapping) ParsedExternalYaml = namedtuple('ParsedExternalYaml', [ - 'backend_key', 'autograd_key', 'cpp_namespace', 'backend_indices']) + 'backend_key', 'autograd_key', 'class_name', 'cpp_namespace', 'backend_indices']) def parse_backend_yaml( backend_yaml_path: str, grouped_native_functions: Sequence[Union[NativeFunction, NativeFunctionsGroup]], @@ -35,11 +36,13 @@ def parse_backend_yaml( yaml_values = yaml.load(f, Loader=YamlLoader) assert isinstance(yaml_values, dict) - valid_keys = ['backend', 'cpp_namespace', 'extra_headers', 'supported', 'autograd', 'full_codegen'] + valid_keys = ['backend', 'class_name', 'cpp_namespace', 'extra_headers', 'supported', 'autograd', 'full_codegen'] backend = yaml_values.pop('backend', None) assert backend is not None, 'You must provide a value for "backend"' + class_name = yaml_values.pop('class_name', None) + cpp_namespace = yaml_values.pop('cpp_namespace', None) assert cpp_namespace is not None, 'You must provide a value for "cpp_namespace"' @@ -133,13 +136,14 @@ def create_backend_index( autograd key. They cannot be mix and matched. If this is something you need, feel free to create an issue! \ {forward_kernels[0].kernel} is listed under "supported", but {backward_kernels[0].kernel} is listed under "autograd".' 
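With the change above, `parse_backend_yaml` accepts an optional `class_name` key alongside the existing ones. A hedged example of a backend yaml using the keys listed in `valid_keys`; the backend name, namespace, and operator entries are invented for illustration only:

```python
# Illustrative only: key names match valid_keys in parse_backend_yaml, but the
# values (backend name, namespace, ops) are made up.
import yaml  # PyYAML, already a codegen dependency

example_backend_yaml = """
backend: MyBackend
class_name: MyBackendNativeFunctions
cpp_namespace: my_backend
supported:
  - add.Tensor
  - mul.Tensor
autograd:
  - max_pool2d
full_codegen:
  - abs
"""

parsed = yaml.safe_load(example_backend_yaml)
assert parsed.get("backend") is not None, 'You must provide a value for "backend"'
print(sorted(parsed.keys()))
# ['autograd', 'backend', 'class_name', 'cpp_namespace', 'full_codegen', 'supported']
```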
- return ParsedExternalYaml(backend_key, autograd_key, cpp_namespace, backend_indices) + return ParsedExternalYaml(backend_key, autograd_key, class_name, cpp_namespace, backend_indices) def error_on_missing_kernels( native_functions: Sequence[NativeFunction], backend_indices: Dict[DispatchKey, BackendIndex], backend_key: DispatchKey, autograd_key: Optional[DispatchKey], + class_name: str, kernel_defn_file_path: str, full_codegen: Optional[List[OperatorName]] = None, ) -> None: @@ -152,9 +156,6 @@ def error_on_missing_kernels( if full_codegen is None: full_codegen = [] - class_name: Optional[str] = backend_indices[backend_key].native_function_class_name() - assert class_name is not None - expected_backend_op_names: List[OperatorName] = \ list(backend_indices[backend_key].index.keys()) + \ [] if autograd_key is None else list(backend_indices[autograd_key].index.keys()) @@ -208,7 +209,8 @@ def gen_dispatchkey_nativefunc_headers( backend_indices: Dict[DispatchKey, BackendIndex], grouped_native_functions: Sequence[Union[NativeFunction, NativeFunctionsGroup]], backend_dispatch_key: DispatchKey, - autograd_dispatch_key: Optional[DispatchKey]) -> None: + autograd_dispatch_key: Optional[DispatchKey], + backend_name: str = "") -> None: assert class_name is not None generated_comment = 'Autogenerated file by gen_backend_stubs.py. Do not edit directly!' @@ -230,26 +232,81 @@ def gen_dispatchkey_nativefunc_headers( 'class_name': class_name, 'namespace_epilogue': ns_helper.epilogue, 'dispatch_declarations': backend_declarations + autograd_declarations, + 'BackendName': backend_name, + 'DispatchKey': backend_dispatch_key, + }) def gen_dispatcher_registrations( fm: FileManager, output_dir: str, + class_name: str, cpp_namespace: str, backend_indices: Dict[DispatchKey, BackendIndex], grouped_native_functions: Sequence[Union[NativeFunction, NativeFunctionsGroup]], backend_dispatch_key: DispatchKey, dispatch_key: DispatchKey, - selector: 'SelectiveBuilder') -> None: + selector: 'SelectiveBuilder', + # build_in_tree is true for lazy TS backend and affects include paths, not used for external backends + build_in_tree: bool = False, + per_operator_headers: bool = False, + backend_name: str = "", + eager_registration: bool = True) -> None: + headers = [ + f"{output_dir}/{backend_dispatch_key}NativeFunctions.h", + ] + if build_in_tree: + external_backend_headers_str = "\n".join(f'#include <{h}>' for h in headers) + else: + external_backend_headers_str = "\n".join(f'#include "{h}"' for h in headers) + + assert class_name is not None backend_index = backend_indices[dispatch_key] + + dispatch_registrations_body = list(concatMap( + dest.RegisterDispatchKey( + backend_index, + Target.REGISTRATION, + selector, + rocm=False, + cpp_namespace=cpp_namespace, + class_method_name=f'{class_name}', + skip_dispatcher_op_registration=False), + grouped_native_functions + )) + deferred_dispatch_registrations = "" + static_init_dispatch_registrations = "" + if eager_registration: + static_template = CodeTemplate("""\ +TORCH_LIBRARY_IMPL(aten, $dispatch_key, m) { + $dispatch_registrations_body +};""") + static_init_dispatch_registrations = static_template.substitute( + dispatch_key=dispatch_key, + dispatch_registrations_body=dispatch_registrations_body + ) + else: + deferred_template = CodeTemplate("""\ +TORCH_API void Register${backend_name}${dispatch_key}NativeFunctions() { + static auto m = MAKE_TORCH_LIBRARY_IMPL(aten, $dispatch_key); + $dispatch_registrations_body +}""") + deferred_dispatch_registrations = 
deferred_template.substitute( + backend_name=backend_name, + dispatch_key=dispatch_key, + dispatch_registrations_body=dispatch_registrations_body + ) + fm.write_with_template(f'Register{dispatch_key}.cpp', 'RegisterDispatchKey.cpp', lambda: { + 'static_init_dispatch_registrations': static_init_dispatch_registrations, + 'deferred_dispatch_registrations': deferred_dispatch_registrations, 'extra_cuda_headers': '', - 'external_backend_headers': f'#include "{output_dir}/{backend_dispatch_key}NativeFunctions.h"', - 'ops_headers': '#include ', + 'external_backend_headers': external_backend_headers_str, + 'ops_headers': '#include ' if not per_operator_headers else '', 'DispatchKey': dispatch_key, 'dispatch_namespace': dispatch_key.lower(), - 'dispatch_headers': dest.gen_registration_headers(backend_index, per_operator_headers=False, rocm=False), + 'dispatch_headers': dest.gen_registration_headers(backend_index, per_operator_headers=per_operator_headers, rocm=False), 'dispatch_helpers': dest.gen_registration_helpers(backend_index), 'dispatch_namespaced_definitions': '', 'dispatch_anonymous_definitions': list(concatMap( @@ -259,17 +316,8 @@ def gen_dispatcher_registrations( selector, rocm=False, cpp_namespace=cpp_namespace, - class_method_name=f'{backend_dispatch_key}NativeFunctions'), - grouped_native_functions - )), - 'dispatch_registrations': list(concatMap( - dest.RegisterDispatchKey( - backend_index, - Target.REGISTRATION, - selector, - rocm=False, - cpp_namespace=cpp_namespace, - class_method_name=f'{dispatch_key}NativeFunctions'), + class_method_name=f'{class_name}', + skip_dispatcher_op_registration=False), grouped_native_functions )), }) @@ -293,6 +341,7 @@ def make_file_manager(install_dir: str) -> FileManager: backend_key = parsed_backend_yaml.backend_key autograd_key = parsed_backend_yaml.autograd_key cpp_namespace = parsed_backend_yaml.cpp_namespace + class_name = parsed_backend_yaml.class_name backend_indices = parsed_backend_yaml.backend_indices selector = SelectiveBuilder.get_nop_selector() @@ -302,17 +351,24 @@ def make_file_manager(install_dir: str) -> FileManager: # This could be useful if a backend wants to quickly set up a noop yaml file but doesn't have any kernels ready yet. return - class_name = backend_indices[backend_key].native_function_class_name() + if class_name is None: + # class_name is an optional argument to backend yaml file. + # if specified it allows an external backend to override + # the name of the class that all generated kernel definitions live under. + # if not specified, its value is given as native_function_class_name. 
+ class_name = backend_indices[backend_key].native_function_class_name() + assert class_name is not None if impl_path is not None: - error_on_missing_kernels(native_functions, backend_indices, backend_key, autograd_key, impl_path) + error_on_missing_kernels(native_functions, backend_indices, backend_key, autograd_key, class_name, impl_path) + + gen_dispatchkey_nativefunc_headers(fm, class_name, cpp_namespace, backend_indices, + grouped_native_functions, backend_key, autograd_key) - gen_dispatchkey_nativefunc_headers(fm, class_name, cpp_namespace, backend_indices, - grouped_native_functions, backend_key, autograd_key) + for dispatch_key in [backend_key] if autograd_key is None else [backend_key, autograd_key]: + gen_dispatcher_registrations(fm, output_dir, class_name, cpp_namespace, backend_indices, + grouped_native_functions, backend_key, dispatch_key, selector) - for dispatch_key in [backend_key] if autograd_key is None else [backend_key, autograd_key]: - gen_dispatcher_registrations(fm, output_dir, cpp_namespace, backend_indices, grouped_native_functions, - backend_key, dispatch_key, selector) if __name__ == '__main__': main() diff --git a/tools/codegen/gen_functionalization_type.py b/tools/codegen/gen_functionalization_type.py index 6666a493be7423..06521836d733fc 100644 --- a/tools/codegen/gen_functionalization_type.py +++ b/tools/codegen/gen_functionalization_type.py @@ -10,7 +10,6 @@ ) from tools.codegen.selective_build.selector import SelectiveBuilder from typing import List, Optional, Union, Tuple -from tools.codegen.utils import mapMaybe def modifies_arguments(f: NativeFunction) -> bool: return f.func.kind() in [SchemaKind.inplace, SchemaKind.out] @@ -40,15 +39,26 @@ def is_tensor_like(a: Union[Argument, TensorOptionsArguments, SelfArgument]) -> # unwraps all tensor-like arguments, returning: # (1) a string containing all of the logic that does the unwrapping # (2) a context, to be used by translate(), with all of the relevant bindings. -def unwrap_tensor_args(sig: DispatcherSignature) -> Tuple[str, List[Binding]]: +def unwrap_tensor_args(sig: DispatcherSignature, *, is_view_op: bool) -> Tuple[str, List[Binding]]: context: List[Binding] = [] unwrapped_tensor_args: List[str] = [] for arg in sig.arguments(): if is_tensor_like(arg.argument): # for tensor inputs, we want to unwrap them before passing them into the redispatch calls. unwrapped_name = f'{arg.name}_' - unwrapped_tensor_args.append( - f'auto {unwrapped_name} = at::functionalization::impl::from_functional_tensor({arg.name});') + # For most ops, the functionalization needs to sync any pending updates on the input tensors + # before calling the operator, since otherwise the operator will act on stale data. + # For view ops though, we can continue to defer syncing until the tensor is used by + # a non-view operator. + maybe_sync_input = '' if is_view_op else f'at::functionalization::impl::sync({arg.name});' + unwrapped_tensor_args.append(f""" + {arg.nctype.remove_const_ref().cpp_type()} {unwrapped_name}; + if (at::functionalization::impl::isFunctionalTensor({arg.name})) {{ + {maybe_sync_input} + {unwrapped_name} = at::functionalization::impl::from_functional_tensor({arg.name}); + }} else {{ + {unwrapped_name} = {arg.name}; + }}""") context.append(arg.with_name(unwrapped_name)) else: # for non-tensor inputs, we want to pass them directly into the redispatch calls. 
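The new `unwrap_tensor_args` above only syncs pending updates for non-view ops, and passes the argument through untouched when it is not a `FunctionalTensorWrapper`. A sketch of the C++ block that template renders for a hypothetical Tensor argument named `self` (whitespace simplified relative to the real template):

```python
# Illustration of the generated unwrap block; "self" / at::Tensor are example
# values, and the indentation differs from the real codegen output.
def render_unwrap(name: str, cpp_type: str, is_view_op: bool) -> str:
    # view ops defer syncing until a non-view op consumes the tensor
    maybe_sync_input = "" if is_view_op else f"at::functionalization::impl::sync({name});"
    return f"""
  {cpp_type} {name}_;
  if (at::functionalization::impl::isFunctionalTensor({name})) {{
    {maybe_sync_input}
    {name}_ = at::functionalization::impl::from_functional_tensor({name});
  }} else {{
    {name}_ = {name};
  }}"""

print(render_unwrap("self", "at::Tensor", is_view_op=False))
```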
@@ -129,11 +139,10 @@ def emit_view_functionalization_body( assert_view_op_properties(f.func) view_tensor_name = dispatcher_sig.arguments()[0].name - keyset = 'dispatchKeySet & c10::after_func_keyset' return_type = dispatcher_sig.returns_type().remove_const_ref().cpp_type() - unwrap_tensor_args_str, unwrapped_args_ctx = unwrap_tensor_args(dispatcher_sig) - view_redispatch_args = [keyset] + [e.expr for e in translate(unwrapped_args_ctx, call_sig.arguments(), method=False)] + unwrap_tensor_args_str, unwrapped_args_ctx = unwrap_tensor_args(dispatcher_sig, is_view_op=True) + view_redispatch_args = [e.expr for e in translate(unwrapped_args_ctx, call_sig.arguments(), method=False)] forward_lambda = FunctionalizationLambda.from_func(f, functional_op=functional_op, is_reverse=False) reverse_lambda = FunctionalizationLambda.from_func(f, functional_op=functional_op, is_reverse=True) @@ -145,6 +154,12 @@ def emit_view_functionalization_body( if f.tag is Tag.inplace_view: # See Note [Functionalization Pass - Inplace View Ops] for more details return f""" + if (!at::functionalization::impl::isFunctionalTensor({view_tensor_name})) {{ + // functionalization is re-entrant, but will no-op if it wasn't passed a FunctionalTensorWrapper. + {unwrap_tensor_args_str} + at::AutoDispatchSkipFunctionalize guard; + return at::_ops::{f.func.name.unambiguous_name()}::call({', '.join(view_redispatch_args)}); + }} at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta( {forward_lambda.decl()} {{ return {forward_lambda.inner_call()} @@ -154,7 +169,6 @@ def emit_view_functionalization_body( }} ); at::functionalization::impl::mutate_view_meta({view_tensor_name}, view_meta); - {unwrap_tensor_args_str} {return_type} reference_tensor_output; {{ at::AutoDispatchSkipFunctionalize guard; @@ -169,13 +183,18 @@ def emit_view_functionalization_body( else: return f""" {unwrap_tensor_args_str} + if (!at::functionalization::impl::isFunctionalTensor({view_tensor_name})) {{ + // functionalization is re-entrant, but will no-op if it wasn't passed a FunctionalTensorWrapper. + at::AutoDispatchSkipFunctionalize guard; + return at::_ops::{api_name}::call({', '.join(view_redispatch_args)}); + }} {return_type} tmp_output; {return_type} reference_tensor_output; {{ at::AutoDispatchSkipFunctionalize guard; {meta_conversion_str} reference_tensor_output = at::_ops::{api_name}::call({', '.join(meta_call_args)}); - tmp_output = at::_ops::{api_name}::redispatch({', '.join(view_redispatch_args)}); + tmp_output = at::_ops::{api_name}::call({', '.join(view_redispatch_args)}); // I'm fusing the [alias removal], [mutation removal], [add views back] passes together. // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal). 
}} @@ -203,16 +222,23 @@ def emit_inplace_functionalization_body( dispatcher_sig = DispatcherSignature.from_schema(f.func) - keyset = 'dispatchKeySet & c10::after_func_keyset' return_type = dispatcher_sig.returns_type().remove_const_ref().cpp_type() - unwrap_tensor_args_str, unwrapped_args_ctx = unwrap_tensor_args(dispatcher_sig) + unwrap_tensor_args_str, unwrapped_args_ctx = unwrap_tensor_args(dispatcher_sig, is_view_op=False) maybe_return = '' if len(f.func.returns) == 0 else 'return ' - sync_tensor_args = '\n '.join(mapMaybe( - lambda arg: f'at::functionalization::impl::sync({arg.name});' - if arg.type.is_tensor_like() else None, - f.func.arguments.flat_all)) + + mutated_names = [a.name for a in f.func.arguments.flat_all if a.type.is_tensor_like() and a.annotation is not None] + non_mutated_names = [a.name for a in f.func.arguments.flat_all if a.type.is_tensor_like() and a.annotation is None] + # all mutable inputs must be functional tensors in order to participate in functionalization + check_all_mutated_args_are_functional = ' && '.join( + ['true'] + [f'at::functionalization::impl::isFunctionalTensor({a})' for a in mutated_names]) + check_any_non_mutated_args_are_functional = ' || '.join( + ['false'] + [f'at::functionalization::impl::isFunctionalTensor({a})' for a in non_mutated_names]) + # These are used in the cases where we don't functionalize and redispatch to the inplace op + # case 1: we hit an inplace op that doesn't have an out-of-place equivalent + # case 2: we hit an inplace ops but our inputs are not functional tensors (in which case our kernel just no-ops) + inplace_exprs = [e.expr for e in translate(unwrapped_args_ctx, dispatcher_sig.arguments(), method=False)] # Note [functionalizating copy_() and not preserving strides] # copy_() can't be functionalized, since there doesn't exist an out-of-place variant. @@ -225,34 +251,31 @@ def emit_inplace_functionalization_body( # - There are actually a few other places where the functionalization pass currently doesn't support strides: # calls to slice/diagonal_scatter don't currently preserve the strides of their inputs (but maybe we should fix this). if str(f.func.name) == 'copy_': - exprs = [keyset] + [a.name for a in unwrapped_args_ctx] - functional_call_str = f"""\ - auto tmp_intermediate = at::_ops::to_other::redispatch({keyset}, src_, self_, non_blocking, false, c10::nullopt); - tmp_output = at::_ops::expand_as::redispatch({keyset}, tmp_intermediate, self_);""" + functional_call_str = """\ + auto tmp_intermediate = at::_ops::to_other::call(src_, self_, non_blocking, false, c10::nullopt); + tmp_output = at::_ops::expand_as::call(tmp_intermediate, self_);""" elif functional_op is None: # We can't functionalize this inplace op, since we don't know what the corresponding functional op is. - inplace_exprs = [keyset] + [e.expr for e in translate(unwrapped_args_ctx, dispatcher_sig.arguments(), method=False)] - warn_str = "Note: the functionalization pass encountered an operator ({}) that it could not functionalize, \ + warn_str = "Note: the functionalization pass encountered an operator ({str(f.func.name)}) that it could not functionalize, \ because it couldn't find an out-of-place equivalent of the operator to call. \ Instead, it's calling the inplace/view operator directly. \ -If this causes problems in your program, consider upstreaming the out-of-place op to PyTorch.".format(str(f.func.name)) +If this causes problems in your program, consider upstreaming the out-of-place op to PyTorch." 
return f""" if (c10::impl::tls_local_dispatch_key_set().included_.has(c10::DispatchKey::Functionalize)) {{ TORCH_WARN("{warn_str}"); }} - {sync_tensor_args} {unwrap_tensor_args_str} at::AutoDispatchSkipFunctionalize guard; // Redispatch as normally otherwise, since XLA has its own lowerings for special inplace ops. - {maybe_return}at::_ops::{f.func.name.unambiguous_name()}::redispatch({', '.join(inplace_exprs)}); + {maybe_return}at::_ops::{f.func.name.unambiguous_name()}::call({', '.join(inplace_exprs)}); """ else: # call the out-of-place variant of the op functional_sig = DispatcherSignature.from_schema(functional_op.func) - functional_exprs = [keyset] + [e.expr for e in translate(unwrapped_args_ctx, functional_sig.arguments(), method=False)] + functional_exprs = [e.expr for e in translate(unwrapped_args_ctx, functional_sig.arguments(), method=False)] functional_call_str = \ - f"tmp_output = at::_ops::{functional_op.func.name.unambiguous_name()}::redispatch({', '.join(functional_exprs)});" + f"tmp_output = at::_ops::{functional_op.func.name.unambiguous_name()}::call({', '.join(functional_exprs)});" mutable_input_post_processing = '\n'.join([ f""" @@ -263,16 +286,29 @@ def emit_inplace_functionalization_body( if a.annotation and a.annotation.is_write and a.type.is_tensor_like()]) return f""" - {sync_tensor_args} {unwrap_tensor_args_str} - {return_type} tmp_output; - {{ + if (!({check_all_mutated_args_are_functional})) {{ + if (({check_any_non_mutated_args_are_functional})) {{ + // case 1: trying to mutate a non functional tensor with a functional tensor is an error + TORCH_INTERNAL_ASSERT(false, + "mutating a non-functional tensor with a functional tensor is not allowed.", + " Please ensure that all of your inputs are wrapped inside of a functionalize() call."); + }} else {{ + // case 2: arguments are not functional tensors, so we no-op and redispatch. 
+ at::AutoDispatchSkipFunctionalize guard; + at::_ops::{f.func.name.unambiguous_name()}::call({', '.join(inplace_exprs)}); + {return_str(f)}; + }} + }} else {{ + {return_type} tmp_output; + {{ at::AutoDispatchSkipFunctionalize guard; // The functionalization pass explicitly doesn't pass out= parameters to the redispatch {functional_call_str} - }} - {mutable_input_post_processing} - {return_str(f)};""" + }} + {mutable_input_post_processing} + {return_str(f)}; + }}""" def emit_declaration_for_noncomposite_views(f: NativeFunction) -> str: diff --git a/tools/codegen/gen_lazy_tensor.py b/tools/codegen/gen_lazy_tensor.py index 12a0dec9170e00..591abf3a479239 100644 --- a/tools/codegen/gen_lazy_tensor.py +++ b/tools/codegen/gen_lazy_tensor.py @@ -6,7 +6,7 @@ from collections import namedtuple, Counter from typing import List, Dict, Union, Sequence, Optional, Callable, Iterable, Iterator, Tuple, Type from tools.codegen.dest.lazy_ir import LazyIR, TSLazyIR -from tools.codegen.gen import get_grouped_native_functions, parse_native_yaml +from tools.codegen.gen import get_grouped_native_functions, parse_native_yaml, NamespaceHelper from tools.codegen.model import (FunctionSchema, NativeFunction, NativeFunctionsGroup, OperatorName) from tools.codegen.selective_build.selector import SelectiveBuilder @@ -16,6 +16,64 @@ gen_dispatchkey_nativefunc_headers, gen_dispatcher_registrations) +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Lazy Tensor Codegen +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# Overview +# ~~~~~~~~ +# +# This codegen script builds on existing data models and helpers used +# by all ATen backends, and adds new functionality specific to lazy +# tensor backends. +# +# Inputs: +# - _native_functions.yaml: controls which operators are +# supported by the backend. +# +# Outputs: +# (for all backends) +# Ir.h defines Lazy IR classes to be constructed during tracing +# - opt-in: also generate 'lowering' methods for the TorchScript backend only +# NativeFunctions.cpp defines implementations of native functions which perform lazy tracing +# - opt-in: 'full_codegen' section of backend yaml; 'supported' section omits these implementations +# NativeFunctions.h declares implementations of native functions for both 'supported' and 'full_codegen' +# ops +# +# Register.cpp registers all op implementations with the dispatcher +# RegisterAutograd.cpp registers all autograd implementations with the dispatcher +# +# Validation Helpers: +# - Shape Inference: errs if any ops in backend yaml require shape inference not provided by meta kernels or +# implementations in torch/csrc/lazy/core/shape_inference.* +# - native function impls: errs if any 'supported' ops do not have an implementation defined in the backend +# (non-codegen) implementation file +# +# +# About the Data Model +# ~~~~~~~~~~~~~~~~~~~~ +# +# Modeled after ATen codegen, the first step is to parse yaml and build a data model for the operators +# we care about. In this case, the _native_functions yaml defines a subset of the core operators +# (defined in more detail in the main native_functions.yaml), which will be supported by your backend. +# Backends can list ops in two categories: +# - `supported` ops require hand-implementations but still get codegenned declarations and registrations +# - `full_codegen` ops get implementations (and IR classes) generated too +# +# Each native function is modeled as an object with a schema, and each schema has objects representing their +# arguments. 
Much of the codegen is manipulation of the arguments and their types. For example, lazy tensor +# backends need to transform 'at::Tensor' arguments into 'lazy::Value' objects, as well as replacing reference +# types (stringref) with actual string objects, and this is done by manipulating the data model objects. +# - see api/lazy.py for the lazy data model +# +# Once the data model is set up, the rest of this script processes a number of templates for output CPP file +# and fills in the template values using helpers in `dest/lazy_ir.py` and `dest/lazy_ts_lowering.py`. These +# helpers mostly iterate over functions and their arguments, outputting different c++ snippets. +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + # Parses the external backend's yaml, and adds a new BackendIndex for the backend's dispatch key. # Returns a Tuple of (backend_key, autograd_key, cpp_namespace, updated BackendIndex mapping, full_codegen) ParsedExternalYaml = namedtuple('ParsedExternalYaml', [ @@ -62,6 +120,15 @@ def validate_shape_inference_header(shape_inference_hdr: str, expected_shape_inf and implement it in the the corresponding shape_inference.cpp file.\n {decl}""" +class default_args: + node_base: str = "Node" + node_base_hdr: Optional[str] = None + shape_inference_hdr: str = "torch/csrc/lazy/core/shape_inference.h" + tensor_class: str = "torch::lazy::LazyTensor" + tensor_class_hdr: str = "torch/csrc/lazy/core/tensor.h" + lazy_ir_cls: Type[LazyIR] = LazyIR + backend_name: str = "TorchScript" + def main() -> None: parser = argparse.ArgumentParser(description='Generate Lazy Tensor backend files') parser.add_argument( @@ -78,41 +145,62 @@ def main() -> None: '--gen_ts_lowerings', action="store_true", help='Generate TorchScript lowerings in addition to Lazy IR and NativeFunctions') parser.add_argument( - '--node_base', type=str, default="Node", help='Name of backend specific custom Lazy IR Node base class') + '--node_base', type=str, default=default_args.node_base, + help='Name of backend specific custom Lazy IR Node base class') parser.add_argument( - '--node_base_hdr', type=str, default=None, help='Path to header file defining custom Lazy IR Node base class') + '--node_base_hdr', type=str, default=default_args.node_base_hdr, + help='Path to header file defining custom Lazy IR Node base class') parser.add_argument( - '--shape_inference_hdr', type=str, default=None, + '--shape_inference_hdr', type=str, default=default_args.shape_inference_hdr, help='Path to header file defining custom Lazy shape inference functions') parser.add_argument( - '--tensor_class', type=str, default="torch::lazy::LazyTensor", + '--tensor_class', type=str, default=default_args.tensor_class, help='Name of backend specific custom Lazy Tensor class') parser.add_argument( - '--tensor_class_hdr', type=str, default="torch/csrc/lazy/core/tensor.h", + '--tensor_class_hdr', type=str, default=default_args.tensor_class_hdr, help='Path to header file defining custom Lazy Tensor class') + parser.add_argument( + '--backend_name', type=str, default=default_args.backend_name, + help='Name of the backend to generate') options = parser.parse_args() - run(options.source_yaml, options.output_dir, options.dry_run, options.impl_path, - options.gen_ts_lowerings, options.node_base, options.node_base_hdr, - options.tensor_class, options.tensor_class_hdr, options.shape_inference_hdr, - TSLazyIR) - - -def run(source_yaml: str, output_dir: str, dry_run: bool, impl_path: Optional[str], - gen_ts_lowerings: bool, node_base: str, 
node_base_hdr: Optional[str], - tensor_class: str, tensor_class_hdr: str, shape_inference_hdr: str, - lazy_ir_cls: Type[LazyIR]) -> None: - # Assumes that this file lives at PYTORCH_ROOT/tools/codegen/gen_backend_stubs.py - pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute() - template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates") + torch_root = pathlib.Path(__file__).parent.parent.parent.absolute() + aten_path = str(torch_root / "aten" / "src" / "ATen") + ir_gen_class: Type[LazyIR] = default_args.lazy_ir_cls + if options.gen_ts_lowerings: + ir_gen_class = TSLazyIR + + run_gen_lazy_tensor(aten_path, options.source_yaml, options.output_dir, options.dry_run, options.impl_path, + options.node_base, options.node_base_hdr, + options.tensor_class, options.tensor_class_hdr, options.shape_inference_hdr, + ir_gen_class, options.backend_name) + + +def run_gen_lazy_tensor(aten_path: str, source_yaml: str, output_dir: str, + dry_run: bool, impl_path: Optional[str], + node_base: str = default_args.node_base, + node_base_hdr: Optional[str] = default_args.node_base_hdr, + tensor_class: str = default_args.tensor_class, + tensor_class_hdr: str = default_args.tensor_class_hdr, + shape_inference_hdr: str = default_args.shape_inference_hdr, + lazy_ir_cls: Type[LazyIR] = default_args.lazy_ir_cls, + # build_in_tree is true for TS backend and affects include paths + build_in_tree: bool = False, + # per_operator_headers changes whether ATen/Functions.h or individual operator headers are used + # it must match how ATen was built + per_operator_headers: bool = False, + backend_name: str = default_args.backend_name, + gen_forced_fallback_code: bool = False) -> None: + + template_dir = os.path.join(aten_path, "templates") def make_file_manager(install_dir: str) -> FileManager: return FileManager(install_dir=install_dir, template_dir=template_dir, dry_run=dry_run) fm = make_file_manager(output_dir) - native_yaml_path = os.path.join(pytorch_root, 'aten/src/ATen/native/native_functions.yaml') + native_yaml_path = os.path.join(aten_path, 'native/native_functions.yaml') parsed_yaml = parse_native_yaml(native_yaml_path) native_functions, backend_indices = parsed_yaml.native_functions, parsed_yaml.backend_indices grouped_native_functions = get_grouped_native_functions(native_functions) @@ -171,7 +259,7 @@ def gen_key(func: FunctionSchema) -> Tuple[str, str]: if impl_path is not None: error_on_missing_kernels(native_functions, backend_indices, backend_key, - autograd_key, impl_path, full_codegen) + autograd_key, class_name, impl_path, full_codegen) """ Validate Shape Inference Definitions @@ -196,19 +284,29 @@ def gen_key(func: FunctionSchema) -> Tuple[str, str]: codegenInplaceVariant=True ) ) + validate_shape_inference_header(shape_inference_hdr, expected_shape_infr_decls) assert class_name is not None # Generate nativefunction declarations + # Note, eager registrations is set to False for the lazy TS backend as another LTC backend + # may want to register their own lazy kernels instead of registering the TS ones. + # The registration will lazily happen when init_ts_backend is called. 
gen_dispatchkey_nativefunc_headers(fm, class_name, cpp_namespace, backend_indices, - grouped_native_functions, backend_key, autograd_key) + grouped_native_functions, backend_key, autograd_key, + backend_name) # Generate Dispatcher registrations which hook up the nativefunctions for dispatch_key in [backend_key] if autograd_key is None else [backend_key, autograd_key]: - gen_dispatcher_registrations(fm, output_dir, cpp_namespace, backend_indices, grouped_native_functions, - backend_key, dispatch_key, selector) + gen_dispatcher_registrations(fm, output_dir, class_name, cpp_namespace, backend_indices, grouped_native_functions, + backend_key, dispatch_key, selector, + build_in_tree=build_in_tree, + per_operator_headers=per_operator_headers, + backend_name=backend_name, + eager_registration=False) # Generate native function impls that build IR nodes + ns_helper = NamespaceHelper(cpp_namespace) fm.write_with_template(f'{backend_key}NativeFunctions.cpp', 'DispatchKeyNativeFunctions.cpp', lambda: { 'includes': [f'#include <{path}>' for path in [ tensor_class_hdr, @@ -216,46 +314,46 @@ def gen_key(func: FunctionSchema) -> Tuple[str, str]: "ATen/Functions.h", "ATen/MetaFunctions.h", "ATen/Operators.h", + "ATen/native/CPUFallback.h", "torch/csrc/lazy/core/lazy_graph_executor.h", "torch/csrc/lazy/core/metrics.h", "torch/csrc/lazy/core/shape.h", - "lazy_tensor_core/csrc/ts_backend/aten_eager_fallback.h", f"{output_dir}/{backend_key}NativeFunctions.h", - f"{output_dir}/{backend_key}LazyIr.h", - ]], + f"{output_dir}/LazyIr.h", + ] + (["torch/csrc/lazy/ts_backend/ts_eager_fallback.h"] if gen_forced_fallback_code else [])], 'native_functions_include': '', - 'backend_namespace': 'torch_lazy_tensors', # this is wrong + 'namespace_prologue': ns_helper.prologue, + 'namespace_epilogue': ns_helper.epilogue, 'native_function_definitions': list(concat_map_codegen( dest.GenLazyNativeFuncDefinition(f'{backend_key}NativeFunctions', backend_indices[backend_key], - tensor_class), + tensor_class, + gen_forced_fallback_code), grouped_native_functions, codegenInplaceVariant=True )), }) - # Generate IR node classes - fm.write_with_template(f'{backend_key}LazyIr.h', 'LazyIr.h', lambda: { + fm.write_with_template('LazyIr.h', 'LazyIr.h', lambda: { 'lazy_ir_sysinc': [f'#include <{path}>' for path in [ "ATen/core/Formatting.h", "c10/core/ScalarType.h", "c10/util/Optional.h", "torch/csrc/lazy/core/hash.h", "torch/csrc/lazy/core/ir.h", + "torch/csrc/lazy/core/shape.h", "vector", ]], 'lazy_ir_inc': [f'#include "{path}"' for path in [ node_base_hdr if node_base_hdr is not None else None ] if path is not None], - 'external_backend_headers': f'#include "{output_dir}/{backend_key}NativeFunctions.h"', - 'namespaced_headers': '', - 'DispatchKey': backend_key, - 'dispatch_namespace': backend_key.lower(), 'ir_declarations': list(concat_map_codegen( lazy_ir_cls(backend_indices[backend_key], node_base), grouped_native_functions )), + 'namespace_prologue': ns_helper.prologue, + 'namespace_epilogue': ns_helper.epilogue, }) diff --git a/tools/codegen/model.py b/tools/codegen/model.py index 1c61517a3e52b2..8536588848c4f1 100644 --- a/tools/codegen/model.py +++ b/tools/codegen/model.py @@ -48,58 +48,68 @@ class DispatchKey(Enum): Undefined = 0 CatchAll = Undefined - CPU = auto() - CUDA = auto() - HIP = auto() + Dense = auto() FPGA = auto() ORT = auto() - XLA = auto() - Lazy = auto() Vulkan = auto() Metal = auto() - XPU = auto() MKLDNN = auto() OpenGL = auto() OpenCL = auto() IDEEP = auto() - QuantizedCPU = auto() - QuantizedCUDA = auto() - 
QuantizedXPU = auto() + Quantized = auto() CustomRNGKeyId = auto() MkldnnCPU = auto() - SparseCPU = auto() - SparseCUDA = auto() + Sparse = auto() SparseCsrCPU = auto() SparseCsrCUDA = auto() - SparseHIP = auto() - SparseXPU = auto() - NestedTensor = auto() - PrivateUse1 = auto() - PrivateUse2 = auto() - PrivateUse3 = auto() - EndOfBackendKeys = PrivateUse3 ZeroTensor = auto() Meta = auto() BackendSelect = auto() Named = auto() AutogradOther = auto() + AutogradFunctionality = auto() + AutogradNestedTensor = auto() + Tracer = auto() + Autocast = auto() + Batched = auto() + VmapMode = auto() + TESTING_ONLY_GenericWrapper = auto() + TESTING_ONLY_GenericMode = auto() + EndOfFunctionalityKeys = TESTING_ONLY_GenericMode + + CPU = auto() + CUDA = auto() + HIP = auto() + XLA = auto() + Lazy = auto() + IPU = auto() + XPU = auto() + NestedTensor = auto() + PrivateUse1 = auto() + PrivateUse2 = auto() + PrivateUse3 = auto() + + QuantizedCPU = auto() + QuantizedCUDA = auto() + QuantizedXPU = auto() + + SparseCPU = auto() + SparseCUDA = auto() + SparseHIP = auto() + SparseXPU = auto() + AutogradCPU = auto() AutogradCUDA = auto() AutogradXLA = auto() AutogradLazy = auto() - AutogradNestedTensor = auto() + AutogradIPU = auto() AutogradXPU = auto() AutogradPrivateUse1 = auto() AutogradPrivateUse2 = auto() AutogradPrivateUse3 = auto() - Tracer = auto() - Autocast = auto() - Batched = auto() - VmapMode = auto() - TESTING_ONLY_GenericWrapper = auto() - TESTING_ONLY_GenericMode = auto() - NumDispatchKeys = auto() + Autograd = auto() CompositeImplicitAutograd = auto() CompositeExplicitAutograd = auto() @@ -454,6 +464,7 @@ def from_yaml( python_module = e.pop('python_module', None) assert python_module is None or isinstance(python_module, str), f'not a str: {python_module}' + assert python_module is None or Variant.method not in variants, 'functions in modules cannot be methods' category_override = e.pop('category_override', None) assert category_override is None or isinstance(category_override, str), f'not a str: {category_override}' @@ -1181,6 +1192,7 @@ def is_list_like(self) -> Optional['ListType']: 'QScheme', 'Storage', 'Stream', + 'SymInt', 'ConstQuantizerPtr', # TODO: rename )) diff --git a/tools/codegen/operator_versions/gen_mobile_upgraders.py b/tools/codegen/operator_versions/gen_mobile_upgraders.py index 5721d7086f81a7..fbfb1b39c1d2cb 100644 --- a/tools/codegen/operator_versions/gen_mobile_upgraders.py +++ b/tools/codegen/operator_versions/gen_mobile_upgraders.py @@ -119,8 +119,7 @@ class ByteCode(Enum): upgrader_function.function.append_operator( op.name, op.overload_name, - op.num_specified_args, - caffe2::serialize::kMaxSupportedFileFormatVersion); + op.num_specified_args); } } return upgrader_function_list; diff --git a/tools/codegen/operator_versions/gen_mobile_upgraders_constant.py b/tools/codegen/operator_versions/gen_mobile_upgraders_constant.py index 2adf6e793eebef..f83e5d1f4c943b 100644 --- a/tools/codegen/operator_versions/gen_mobile_upgraders_constant.py +++ b/tools/codegen/operator_versions/gen_mobile_upgraders_constant.py @@ -2,6 +2,6 @@ * @generated * This is an auto-generated file. Please do not modify it by hand. 
* To re-generate, please run: - * cd ~/pytorch && python torch/csrc/jit/mobile/upgrader_mobile.cpp + * cd ~/pytorch && python tools/codegen/operator_versions/gen_mobile_upgraders.py */ """ diff --git a/tools/extract_scripts.py b/tools/extract_scripts.py index fd90b1b9f0e5eb..5312ed00da111a 100755 --- a/tools/extract_scripts.py +++ b/tools/extract_scripts.py @@ -63,6 +63,8 @@ def main() -> None: for job_name, job in workflow['jobs'].items(): job_dir = out / p / job_name + if "steps" not in job: + continue steps = job['steps'] index_chars = len(str(len(steps) - 1)) for i, step in enumerate(steps, start=1): diff --git a/tools/git-pre-commit b/tools/git-pre-commit index 1c4340c6b43486..a7b9e4562cd6f8 100755 --- a/tools/git-pre-commit +++ b/tools/git-pre-commit @@ -1,9 +1,6 @@ #!/bin/bash set -e -echo "Running pre-commit flake8" -python3 tools/linter/flake8_hook.py - echo "Running pre-commit clang-tidy" git diff HEAD > pr.diff python3 -m tools.linter.clang_tidy --diff-file "pr.diff" diff --git a/tools/jit/gen_unboxing.py b/tools/jit/gen_unboxing.py index 9171c56a2f5584..976cf3e676ab71 100644 --- a/tools/jit/gen_unboxing.py +++ b/tools/jit/gen_unboxing.py @@ -10,6 +10,7 @@ from tools.codegen.context import method_with_native_function from tools.codegen.gen import parse_native_yaml, cpp_string from tools.codegen.model import NativeFunction, NativeFunctionsGroup, Variant +from tools.codegen.selective_build.selector import SelectiveBuilder from tools.codegen.utils import Target, FileManager, mapMaybe, make_file_manager from typing import Union, Sequence from typing_extensions import Literal @@ -19,9 +20,12 @@ @dataclass(frozen=True) class ComputeUnboxingFunctions: target: Union[Literal[Target.DECLARATION], Literal[Target.DEFINITION]] + selector: SelectiveBuilder @method_with_native_function def __call__(self, f: NativeFunction) -> str: + if not self.selector.is_root_operator(f"aten::{f.func.name}"): + return "" if self.target is Target.DECLARATION: # Note [The ATen Codegen Unboxing API] @@ -78,11 +82,15 @@ def __call__(self, f: NativeFunction) -> str: # Generates RegisterCodegenUnboxedKernels.cpp. 
@dataclass(frozen=True) class ComputeCodegenUnboxedKernels: + selector: SelectiveBuilder + @method_with_native_function def __call__(self, f: NativeFunction) -> str: + if not self.selector.is_root_operator(f"aten::{f.func.name}"): + return "" # We unconditionally generate function wrappers, sig_group = CppSignatureGroup.from_native_function( - f, method=(Variant.method in f.variants) + f, method=False ) sig = sig_group.most_faithful_signature() @@ -90,9 +98,34 @@ def __call__(self, f: NativeFunction) -> str: # escape double quote in schema, get rid of extra double quotes schema = cpp_string(str(sig.func))[1:-1] + # arguments + args = sig.arguments() + connector = ",\n\t\t" + args_code = [] + for arg in args: + if not arg.default: + arg_cpp = "c10::IValue(c10::nullopt)" + elif arg.default.startswith('{'): + arg_cpp = f"c10::IntArrayRef({arg.default})" + else: + arg_cpp = f"c10::IValue({arg.default})" + args_code.append(f"""c10::Argument("{arg.name}", nullptr, c10::nullopt, {arg_cpp})""") + + returns = f.func.returns + returns_code = [] + for ret in returns: + returns_code.append(f"""c10::Argument("{ret.name if ret.name else ""}")""") return f""" +// aten::{schema} OperatorGenerator( - TORCH_SELECTIVE_SCHEMA("aten::{schema}"), + "aten::{f.func.name.name}", + "{f.func.name.overload_name}", + {{ + {connector.join(args_code)} + }}, + {{ + {connector.join(returns_code)} + }}, [](Stack & stack) {{ RECORD_FUNCTION("{sig.name()}", std::vector()); at::unboxing::{unboxing.name(f)}(stack); @@ -106,6 +139,7 @@ def gen_unboxing( *, native_functions: Sequence[NativeFunction], cpu_fm: FileManager, + selector: SelectiveBuilder, ) -> None: def key_func(fn: Union[NativeFunction, NativeFunctionsGroup]) -> str: return fn.root_name @@ -115,7 +149,7 @@ def key_func(fn: Union[NativeFunction, NativeFunctionsGroup]) -> str: native_functions, key_fn=key_func, env_callable=lambda fn: { - "definitions": [ComputeUnboxingFunctions(Target.DEFINITION)(fn)] + "definitions": [ComputeUnboxingFunctions(Target.DEFINITION, selector)(fn)] }, num_shards=5, sharded_keys={"definitions"}, @@ -124,7 +158,7 @@ def key_func(fn: Union[NativeFunction, NativeFunctionsGroup]) -> str: "UnboxingFunctions.h", lambda: { "declarations": list( - mapMaybe(ComputeUnboxingFunctions(Target.DECLARATION), native_functions) + mapMaybe(ComputeUnboxingFunctions(Target.DECLARATION, selector), native_functions) ), }, ) @@ -132,8 +166,8 @@ def key_func(fn: Union[NativeFunction, NativeFunctionsGroup]) -> str: "RegisterCodegenUnboxedKernels.cpp", native_functions, key_fn=key_func, - env_callable=lambda fn: {"unboxed_ops": [ComputeCodegenUnboxedKernels()(fn)]}, - num_shards=5, + env_callable=lambda fn: {"unboxed_ops": [ComputeCodegenUnboxedKernels(selector)(fn)]}, + num_shards=10, sharded_keys={"unboxed_ops"}, ) @@ -156,9 +190,21 @@ def main() -> None: parser.add_argument( '--dry-run', action='store_true', help='run without writing any files (still updates outputs)') + parser.add_argument( + '--op_selection_yaml_path', + help='Provide a path to the operator selection (for custom build) YAML ' + 'that contains the information about the set of selected operators ' + 'and their categories (training, ...). Each operator is either a ' + 'full operator name with overload or just a bare operator name. ' + 'The operator names also contain the namespace prefix (e.g. 
aten::)') options = parser.parse_args() + if options.op_selection_yaml_path is not None: + selector = SelectiveBuilder.from_yaml_path(options.op_selection_yaml_path) + else: + selector = SelectiveBuilder.get_nop_selector() + native_yaml_path = os.path.join(options.source_path, "native/native_functions.yaml") parsed_yaml = parse_native_yaml(native_yaml_path) native_functions, backend_indices = ( @@ -167,7 +213,7 @@ def main() -> None: ) cpu_fm = make_file_manager(options=options) - gen_unboxing(native_functions=native_functions, cpu_fm=cpu_fm) + gen_unboxing(native_functions=native_functions, cpu_fm=cpu_fm, selector=selector) if options.output_dependencies: depfile_path = pathlib.Path(options.output_dependencies).resolve() diff --git a/tools/linter/clang_format_all.py b/tools/linter/clang_format_all.py index 7792f15a77d126..2a5f9370e922f6 100755 --- a/tools/linter/clang_format_all.py +++ b/tools/linter/clang_format_all.py @@ -21,13 +21,21 @@ # If you edit this, please edit the allowlist in clang_format_ci.sh as well. CLANG_FORMAT_ALLOWLIST = [ "c10/", + "ios/", "torch/csrc/jit/", + "torch/csrc/deploy/", "test/cpp/jit/", "test/cpp/tensorexpr/" ] +CLANG_FORMAT_BLOCK_LIST = { + "torch/csrc/jit/serialization/mobile_bytecode_generated.h", +} + + # Only files with names matching this regex will be formatted. -CPP_FILE_REGEX = re.compile(".*\\.(h|cpp|cc|c|hpp)$") +CPP_FILE_REGEX = re.compile(".*\\.(h|cpp|cc|c|hpp|m|mm)$") + def get_allowlisted_files() -> Set[str]: @@ -39,6 +47,9 @@ def get_allowlisted_files() -> Set[str]: for dir in CLANG_FORMAT_ALLOWLIST: for root, dirnames, filenames in os.walk(dir): for filename in filenames: + fullpath = os.path.join(root, filename) + if fullpath in CLANG_FORMAT_BLOCK_LIST: + continue if CPP_FILE_REGEX.match(filename): matches.append(os.path.join(root, filename)) return set(matches) diff --git a/tools/linter/clang_format_ci.sh b/tools/linter/clang_format_ci.sh index 6f5220e516d19f..15c8d235fe91c8 100755 --- a/tools/linter/clang_format_ci.sh +++ b/tools/linter/clang_format_ci.sh @@ -7,7 +7,9 @@ set -eux # If you edit this allowlist, please edit the one in clang_format_all.py as well find . 
-type f \ -path './c10/*' -or \ - -path './torch/csrc/jit/*' -or \ + -path './ios/*' -or \ + -path './torch/csrc/jit/!(serialization/mobile_bytecode_generated.h)' -or \ + -path './torch/csrc/deploy/*' -or \ -path './test/cpp/jit/*' -or \ -path './test/cpp/tensorexpr/*' \ | xargs tools/linter/git-clang-format --verbose "$1" -- diff --git a/tools/linter/clang_tidy/__main__.py b/tools/linter/clang_tidy/__main__.py index fa6403a64bb664..18f2da24337fc6 100644 --- a/tools/linter/clang_tidy/__main__.py +++ b/tools/linter/clang_tidy/__main__.py @@ -5,6 +5,7 @@ import subprocess import re import sys +from sysconfig import get_paths as gp from typing import List @@ -13,6 +14,9 @@ from tools.linter.install.clang_tidy import INSTALLATION_PATH from tools.linter.install.download_bin import PYTORCH_ROOT +# Returns '/usr/local/include/python' +def get_python_include_dir() -> str: + return gp()['include'] def clang_search_dirs() -> List[str]: # Compilers are ordered based on fallback preference @@ -76,6 +80,9 @@ def clang_search_dirs() -> List[str]: "-torch/csrc/jit/serialization/export.cpp", "-torch/csrc/jit/serialization/import.cpp", "-torch/csrc/jit/serialization/import_legacy.cpp", + "-torch/csrc/jit/serialization/mobile_bytecode_generated.cpp", + "-torch/csrc/init_flatbuffer_module.cpp", + "-torch/csrc/stub_with_flatbuffer.c", "-torch/csrc/onnx/init.cpp", "-torch/csrc/cuda/nccl.*", "-torch/csrc/cuda/python_nccl.cpp", @@ -90,7 +97,11 @@ def clang_search_dirs() -> List[str]: "-torch/csrc/deploy/test_deploy_python_ext.cpp", ], "paths": ["torch/csrc/"], - "include-dir": ["/usr/lib/llvm-11/include/openmp"] + clang_search_dirs(), + "include-dir": [ + "/usr/lib/llvm-11/include/openmp", + get_python_include_dir(), + os.path.join(PYTORCH_ROOT, "third_party/pybind11/include") + ] + clang_search_dirs(), "clang-tidy-exe": INSTALLATION_PATH, "compile-commands-dir": "build", "config-file": ".clang-tidy", diff --git a/tools/linter/clang_tidy/generate_build_files.py b/tools/linter/clang_tidy/generate_build_files.py index 9e3db664ab0d9d..95ff98c30011b2 100644 --- a/tools/linter/clang_tidy/generate_build_files.py +++ b/tools/linter/clang_tidy/generate_build_files.py @@ -51,8 +51,7 @@ def run_autogen() -> None: "tools/setup_helpers/generate_code.py", "--native-functions-path", "aten/src/ATen/native/native_functions.yaml", - "--nn-path", - "aten/src", + "--gen_lazy_ts_backend", ] ) diff --git a/tools/linter/flake8_hook.py b/tools/linter/flake8_hook.py deleted file mode 100755 index b9ebd5b4793123..00000000000000 --- a/tools/linter/flake8_hook.py +++ /dev/null @@ -1,13 +0,0 @@ -#!/usr/bin/env python3 - -import sys - -from flake8.main import git # type: ignore[import] - -if __name__ == '__main__': - sys.exit( - git.hook( - strict=True, - lazy=git.config_for('lazy'), - ) - ) diff --git a/tools/onnx/update_default_opset_version.py b/tools/onnx/update_default_opset_version.py new file mode 100755 index 00000000000000..358bbfdfe39ce1 --- /dev/null +++ b/tools/onnx/update_default_opset_version.py @@ -0,0 +1,78 @@ +#!/usr/bin/env python3 + +"""Updates the default value of opset_version. + +The current policy is that the default should be set to the +latest released version as of 18 months ago. + +Usage: +Run with no arguments. 
+""" + +import datetime +import os +import pathlib +import re +import sys +import subprocess +from subprocess import DEVNULL + + +pytorch_dir = pathlib.Path(__file__).parent.parent.parent.resolve() +onnx_dir = pytorch_dir / "third_party" / "onnx" +os.chdir(onnx_dir) + +date = datetime.datetime.now() - datetime.timedelta(days=18 * 30) +onnx_commit = subprocess.check_output(("git", "log", f"--until={date}", "--max-count=1", "--format=%H"), + encoding="utf-8").strip() +onnx_tags = subprocess.check_output(("git", "tag", "--list", f"--contains={onnx_commit}"), encoding="utf-8") +tag_tups = [] +semver_pat = re.compile(r"v(\d+)\.(\d+)\.(\d+)") +for tag in onnx_tags.splitlines(): + match = semver_pat.match(tag) + if match: + tag_tups.append(tuple(int(x) for x in match.groups())) + +version_str = "{}.{}.{}".format(*min(tag_tups)) + +print("Using ONNX release", version_str) + +head_commit = subprocess.check_output(("git", "log", "--max-count=1", "--format=%H", "HEAD"), + encoding="utf-8").strip() + +new_default = None + +subprocess.check_call(("git", "checkout", f"v{version_str}"), stdout=DEVNULL, stderr=DEVNULL) +try: + from onnx import helper # type: ignore[import] + for version in helper.VERSION_TABLE: + if version[0] == version_str: + new_default = version[2] + print("found new default opset_version", new_default) + break + if not new_default: + sys.exit(f"failed to find version {version_str} in onnx.helper.VERSION_TABLE at commit {onnx_commit}") +finally: + subprocess.check_call(("git", "checkout", head_commit), stdout=DEVNULL, stderr=DEVNULL) + +os.chdir(pytorch_dir) + + +def read_sub_write(path: str, prefix_pat: str) -> None: + with open(path, encoding="utf-8") as f: + content_str = f.read() + content_str = re.sub(prefix_pat, r"\g<1>{}".format(new_default), content_str) + with open(path, "w", encoding="utf-8") as f: + f.write(content_str) + print("modified", path) + +read_sub_write(os.path.join("torch", "onnx", "symbolic_helper.py"), + r"(_default_onnx_opset_version = )\d+") +read_sub_write(os.path.join("torch", "onnx", "__init__.py"), + r"(opset_version \(int, default )\d+") + +print("Updating operator .expect files") +subprocess.check_call(("python", "setup.py", "develop"), + stdout=DEVNULL, stderr=DEVNULL) +subprocess.check_call(("python", os.path.join("test", "onnx", "test_operators.py"), "--accept"), + stdout=DEVNULL, stderr=DEVNULL) diff --git a/tools/pyi/gen_pyi.py b/tools/pyi/gen_pyi.py index 73cc5fb2cbdeb4..6325bedffaaf67 100644 --- a/tools/pyi/gen_pyi.py +++ b/tools/pyi/gen_pyi.py @@ -4,7 +4,8 @@ from tools.codegen.model import Variant from tools.codegen.api.python import (PythonSignatureGroup, - PythonSignatureNativeFunctionPair) + PythonSignatureNativeFunctionPair, + returns_named_tuple_pyi) from tools.codegen.gen import parse_native_yaml from tools.codegen.utils import FileManager from typing import Sequence, List, Dict @@ -77,6 +78,7 @@ def should_bind_method(python_func: PythonSignatureNativeFunctionPair) -> bool: 'range', # defined in functional 'einsum', + 'histogramdd', # reduction argument; these bindings don't make sense 'binary_cross_entropy_with_logits', 'ctc_loss', @@ -397,7 +399,7 @@ def gen_pyi(native_yaml_path: str, deprecated_yaml_path: str, fm: FileManager) - name = group.signature.name unsorted_function_hints[name] += generate_type_hints(group) - named_tuple = group.signature.returns.named_tuple_pyi() + named_tuple = returns_named_tuple_pyi(group.signature) if named_tuple is not None and not group.signature.deprecated: # deprecated namedtuples are currently not 
included for torch functions tuple_name, tuple_def = named_tuple @@ -468,6 +470,7 @@ def gen_pyi(native_yaml_path: str, deprecated_yaml_path: str, fm: FileManager) - '_is_view': ['def _is_view(self) -> _bool: ...'], 'is_cuda': ['is_cuda: _bool'], 'is_leaf': ['is_leaf: _bool'], + 'is_nested': ['is_nested: _bool'], 'is_sparse': ['is_sparse: _bool'], 'is_sparse_csr' : ['is_sparse_csr: _bool'], 'is_quantized': ['is_quantized: _bool'], @@ -475,6 +478,7 @@ def gen_pyi(native_yaml_path: str, deprecated_yaml_path: str, fm: FileManager) - 'is_ort': ['is_ort: _bool'], 'is_mkldnn': ['is_mkldnn: _bool'], 'is_vulkan': ['is_vulkan: _bool'], + 'is_ipu': ['is_ipu: _bool'], 'storage_offset': ['def storage_offset(self) -> _int: ...'], 'to': ['def to(self, dtype: _dtype, non_blocking: _bool=False, copy: _bool=False) -> Tensor: ...', 'def to(self, device: Optional[Union[_device, str]]=None, dtype: Optional[_dtype]=None, ' @@ -524,7 +528,7 @@ def gen_pyi(native_yaml_path: str, deprecated_yaml_path: str, fm: FileManager) - name = group.signature.name unsorted_tensor_method_hints[name] += generate_type_hints(group) - named_tuple = group.signature.returns.named_tuple_pyi() + named_tuple = returns_named_tuple_pyi(group.signature) if named_tuple is not None and not group.signature.deprecated: # deprecated namedtuples are currently not included for torch functions tuple_name, tuple_def = named_tuple @@ -615,6 +619,10 @@ def gen_pyi(native_yaml_path: str, deprecated_yaml_path: str, fm: FileManager) - 'generated_comment': '@' + 'generated from torch/_C/_VariableFunctions.pyi.in', **env, }) + fm.write_with_template('torch/return_types.pyi', 'torch/_C/return_types.pyi.in', lambda: { + 'generated_comment': '@' + 'generated from torch/_C/return_types.pyi', + **env, + }) gen_nn_functional(fm) diff --git a/tools/setup_helpers/BUILD.bazel b/tools/setup_helpers/BUILD.bazel new file mode 100644 index 00000000000000..f7239029a0911b --- /dev/null +++ b/tools/setup_helpers/BUILD.bazel @@ -0,0 +1,16 @@ +py_binary( + name = "generate_code", + srcs = ["generate_code.py"], + deps = [ + "//:tools_jit", + "//tools/autograd", + "//tools/codegen", + ], + visibility = ["//:__pkg__"], +) + +py_binary( + name = "gen_version_header", + srcs = ["gen_version_header.py"], + visibility = ["//:__pkg__"], +) diff --git a/tools/setup_helpers/generate_code.py b/tools/setup_helpers/generate_code.py index ef90acc3935a15..9d176e45c91065 100644 --- a/tools/setup_helpers/generate_code.py +++ b/tools/setup_helpers/generate_code.py @@ -27,7 +27,6 @@ def all_generator_source() -> List[str]: def generate_code(ninja_global: Optional[str] = None, - nn_path: Optional[str] = None, native_functions_path: Optional[str] = None, install_dir: Optional[str] = None, subset: Optional[str] = None, @@ -135,7 +134,6 @@ def get_selector( def main() -> None: parser = argparse.ArgumentParser(description='Autogenerate code') parser.add_argument('--native-functions-path') - parser.add_argument('--nn-path') parser.add_argument('--ninja-global') parser.add_argument('--install_dir') parser.add_argument( @@ -162,11 +160,20 @@ def main() -> None: help='force it to generate schema-only registrations for ops that are not' 'listed on --selected-op-list' ) + parser.add_argument( + '--gen_lazy_ts_backend', + action='store_true', + help='Enable generation of the torch::lazy TorchScript backend' + ) + parser.add_argument( + '--per_operator_headers', + action='store_true', + help='Build lazy tensor ts backend with per-operator ATen headers, must match how ATen was built' + ) options = 
parser.parse_args() generate_code( options.ninja_global, - options.nn_path, options.native_functions_path, options.install_dir, options.subset, @@ -176,6 +183,34 @@ def main() -> None: operator_selector=get_selector(options.selected_op_list_path, options.operators_yaml_path), ) + if options.gen_lazy_ts_backend: + aten_path = os.path.dirname(os.path.dirname(options.native_functions_path)) + ts_backend_yaml = os.path.join(aten_path, 'native/ts_native_functions.yaml') + ts_native_functions = "torch/csrc/lazy/ts_backend/ts_native_functions.cpp" + ts_node_base = "torch/csrc/lazy/ts_backend/ts_node.h" + if options.install_dir is None: + options.install_dir = "torch/csrc" + lazy_install_dir = os.path.join(options.install_dir, "lazy/generated") + if not os.path.exists(lazy_install_dir): + os.makedirs(lazy_install_dir) + + assert os.path.isfile(ts_backend_yaml), f"Unable to access ts_backend_yaml: {ts_backend_yaml}" + assert os.path.isfile(ts_native_functions), f"Unable to access {ts_native_functions}" + from tools.codegen.gen_lazy_tensor import run_gen_lazy_tensor + from tools.codegen.dest.lazy_ir import TSLazyIR + run_gen_lazy_tensor(aten_path=aten_path, + source_yaml=ts_backend_yaml, + backend_name="TorchScript", + output_dir=lazy_install_dir, + dry_run=False, + impl_path=ts_native_functions, + node_base="TsNode", + node_base_hdr=ts_node_base, + build_in_tree=True, + lazy_ir_cls=TSLazyIR, + per_operator_headers=options.per_operator_headers, + gen_forced_fallback_code=True) + if __name__ == "__main__": main() diff --git a/tools/stats/export_slow_tests.py b/tools/stats/export_slow_tests.py index b9d71cfb6cb7a2..6659438479c233 100644 --- a/tools/stats/export_slow_tests.py +++ b/tools/stats/export_slow_tests.py @@ -12,6 +12,7 @@ SLOW_TESTS_FILE = '.pytorch-slow-tests.json' SLOW_TEST_CASE_THRESHOLD_SEC = 60.0 RELATIVE_DIFFERENCE_THRESHOLD = 0.1 +IGNORED_JOBS = ["asan", "periodic"] def get_test_case_times() -> Dict[str, float]: reports: List[Report] = get_previous_reports_for_branch('origin/viable/strict', "") @@ -21,6 +22,10 @@ def get_test_case_times() -> Dict[str, float]: if report.get('format_version', 1) != 2: # type: ignore[misc] raise RuntimeError("S3 format currently handled is version 2 only") v2report = cast(Version2Report, report) + + if any(job_name in str(report['build_job']) for job_name in IGNORED_JOBS): + continue + for test_file in v2report['files'].values(): for suitename, test_suite in test_file['suites'].items(): for casename, test_case in test_suite['cases'].items(): diff --git a/tools/stats/import_test_stats.py b/tools/stats/import_test_stats.py index 375f7181b4583e..1b6c1907a98ab4 100644 --- a/tools/stats/import_test_stats.py +++ b/tools/stats/import_test_stats.py @@ -10,13 +10,14 @@ def get_disabled_issues() -> List[str]: pr_body = os.getenv('PR_BODY', '') + commit_messages = os.getenv('COMMIT_MESSAGES', '') # The below regex is meant to match all *case-insensitive* keywords that # GitHub has delineated would link PRs to issues, more details here: # https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue. # E.g., "Close #62851", "fixES #62851" and "RESOLVED #62851" would all match, but not # "closes #62851" --> extra space, "fixing #62851" --> not a keyword, nor "fix 62851" --> no # - regex = '(?i)(Close(d|s)?|Resolve(d|s)?|Fix(ed|es)?) #([0-9]+)' - issue_numbers = [x[4] for x in re.findall(regex, pr_body)] + regex = '(?i)(Close(d|s)?|Resolve(d|s)?|Fix(ed|es)?) 
(#|https://github.com/pytorch/pytorch/issues/)([0-9]+)' + issue_numbers = [x[5] for x in re.findall(regex, pr_body + commit_messages)] print("Ignoring disabled issues: ", issue_numbers) return issue_numbers diff --git a/tools/stats/print_test_stats.py b/tools/stats/print_test_stats.py index b1887c8d277c31..836ee5f81cc51a 100755 --- a/tools/stats/print_test_stats.py +++ b/tools/stats/print_test_stats.py @@ -107,8 +107,14 @@ def plural(n: int) -> str: def get_base_commit(sha1: str) -> str: + default_branch = os.environ.get('GIT_DEFAULT_BRANCH') + # capture None and "" cases + if not default_branch: + default_branch = "master" + + default_remote = f"origin/{default_branch}" return subprocess.check_output( - ["git", "merge-base", sha1, "origin/master"], + ["git", "merge-base", sha1, default_remote], encoding="ascii", ).strip() @@ -206,7 +212,7 @@ def analyze( base_reports: Dict[Commit, List[SimplerReport]], ) -> List[SuiteDiff]: nonempty_shas = [sha for sha, reports in base_reports.items() if reports] - # most recent master ancestor with at least one S3 report, + # most recent main ancestor with at least one S3 report, # or empty list if there are none (will show all tests as added) base_report = base_reports[nonempty_shas[0]] if nonempty_shas else [] @@ -525,7 +531,7 @@ def regression_info( and its test times. Since Python dicts maintain insertion order (guaranteed as part of the language spec since 3.7), the base_reports argument must list the head's several most recent - master commits, from newest to oldest (so the merge-base is + main commits, from newest to oldest (so the merge-base is list(base_reports)[0]). """ simpler_head = simplify(head_report) @@ -570,6 +576,10 @@ def __init__(self, dom: Any) -> None: self.class_name = str(dom.attributes['classname'].value) self.name = str(dom.attributes['name'].value) self.time = float(dom.attributes['time'].value) + # The following attribute is currently ONLY used in process_intentional_test_runs for validation + # reasons. The test filename that populates TestFile is calculated and passed down through the test report path. 
+ # The reason we don't just use this attribute is because it doesn't exist for cpp tests, e.g., in test_libtorch + self.file = str(dom.attributes['file'].value) if dom.hasAttribute('file') else 'N/A - probably a cpp test' error_elements = dom.getElementsByTagName('error') # DISCLAIMER: unexpected successes and expected failures are currently not reported in assemble_s3_object self.expected_failure = False @@ -595,9 +605,9 @@ def __repr__(self) -> str: return self.__str__() def __str__(self) -> str: - return f'[TestCase name: {self.name} | class_name: {self.class_name} | time: {self.time} | ' \ + return f'[TestCase name: {self.name} | class_name: {self.class_name} | file: {self.file} | time: {self.time} | ' \ f'expected_failure: {self.expected_failure} | skipped: {self.skipped} | errored: {self.errored} | ' \ - f'unexpected_success: {self.unexpected_success} | failed: {self.failed}]' + f'unexpected_success: {self.unexpected_success} | failed: {self.failed}]\n' class TestSuite: def __init__(self, name: str) -> None: @@ -638,6 +648,17 @@ def update(self, test_case: TestCase) -> None: self.test_cases[name].expected_failure |= test_case.expected_failure +# Tests that spawn duplicates (usually only twice) intentionally +MULTITESTS = [ + 'test_cpp_extensions_aot', + 'distributed/test_distributed_spawn', + 'distributed\\test_distributed_spawn', # for windows + 'distributed/test_c10d_gloo', + 'distributed\\test_c10d_gloo', # for windows + 'cpp' # The caffe2 cpp tests spawn duplicate test cases as well. +] + + DuplicatedDict = Dict[str, Dict[str, List[TestCase]]] class TestFile: @@ -647,27 +668,20 @@ def __init__(self, name: str) -> None: self.test_suites: Dict[str, TestSuite] = dict() def append(self, test_case: TestCase, test_type: str, duplicated_tests_dict: DuplicatedDict) -> None: - is_multi_test = self.name == 'test_cpp_extensions_aot' or \ - self.name == 'distributed/test_distributed_spawn' or \ - self.name == 'distributed/test_c10d_gloo' or \ - self.name == 'cpp' # The caffe2 cpp tests spawn duplicate test cases as well. 
- if is_multi_test: - suite_name = test_case.class_name + '__' + test_type - else: - suite_name = test_case.class_name + suite_name = test_case.class_name if suite_name not in self.test_suites: self.test_suites[suite_name] = TestSuite(suite_name) if test_case.name in self.test_suites[suite_name].test_cases: - if is_multi_test: + if self.name in MULTITESTS: self.test_suites[suite_name].update(test_case) self.total_time += test_case.time - else: - # Gather up duplicated test cases - if suite_name not in duplicated_tests_dict: - duplicated_tests_dict[suite_name] = dict() - if test_case.name not in duplicated_tests_dict[suite_name]: - duplicated_tests_dict[suite_name][test_case.name] = [self.test_suites[suite_name].test_cases[test_case.name]] - duplicated_tests_dict[suite_name][test_case.name].append(test_case) + + # Gather up duplicated test cases to parse for flaky reruns + if suite_name not in duplicated_tests_dict: + duplicated_tests_dict[suite_name] = dict() + if test_case.name not in duplicated_tests_dict[suite_name]: + duplicated_tests_dict[suite_name][test_case.name] = [self.test_suites[suite_name].test_cases[test_case.name]] + duplicated_tests_dict[suite_name][test_case.name].append(test_case) else: self.test_suites[suite_name].append(test_case) self.total_time += test_case.time @@ -737,17 +751,9 @@ def process_intentional_test_runs(runs: List[TestCase]) -> Tuple[int, int]: else: num_pass += 1 - REPEAT_TEST_FOR_TYPES_TESTS = [ - "test_data_parallel_module", - "test_data_parallel_module_kwargs_only", - "test_data_parallel_module_kwargs_only_empty_list", - "test_data_parallel_module_kwargs_only_empty_dict", - "test_data_parallel_module_kwargs_only_empty_tuple" - ] - - # Do not run checks for tests that use repeat_test_for_types decorator as they do not go well with our retry - # functionality. Once issue https://github.com/pytorch/pytorch/issues/69865 is fixed, we should remove the exception - if not any([x in test_run.name for x in REPEAT_TEST_FOR_TYPES_TESTS]): + # Do not run duplication checks for test files that spawn duplicate tests intentionally + # and are not necessarily flaky test reruns. + if not any(x in test_run.file for x in MULTITESTS): err_msg = f'Warning: unintentional test case duplicates found for {test_run.name} in suite {test_run.class_name}.' 
report_only = os.getenv('PYTORCH_OVERRIDE_FLAKY_SIGNAL') != '1' if report_only and num_fail + num_errored + num_unexpected_success < 1 or not report_only and num_expected_fail < 1: @@ -774,7 +780,7 @@ def assemble_flaky_test_stats(duplicated_tests_by_file: Dict[str, DuplicatedDict for suite_name, testcase_to_runs in suite_to_dict.items(): for testcase_name, list_of_runs in testcase_to_runs.items(): num_green, num_red = process_intentional_test_runs(list_of_runs) - if num_green > 0: # Otherwise, it's likely just a failing test + if num_green > 0 and num_red > 0: # Flaky tests show different results in consecutive reruns flaky_tests.append({ "name": testcase_name, "suite": suite_name, @@ -790,6 +796,7 @@ def assemble_flaky_test_stats(duplicated_tests_by_file: Dict[str, DuplicatedDict # write to S3 to go to Rockset as well import uuid for flaky_test in flaky_tests: + flaky_test["job_id"] = os.environ["GHA_WORKFLOW_JOB_ID"] flaky_test["workflow_id"] = workflow_id key = f"flaky_tests/{workflow_id}/{uuid.uuid4()}.json" obj = get_S3_object_from_bucket("ossci-raw-job-status", key) @@ -943,7 +950,7 @@ def print_regressions(head_report: Report, *, num_prev_commits: int) -> None: encoding="ascii", )) - # if current commit is already on master, we need to exclude it from + # if current commit is already on main, we need to exclude it from # this history; otherwise we include the merge-base commits = subprocess.check_output( ["git", "rev-list", f"--max-count={num_prev_commits+1}", base], diff --git a/tools/stats/upload_test_stats.py b/tools/stats/upload_test_stats.py new file mode 100644 index 00000000000000..899fc0495948c6 --- /dev/null +++ b/tools/stats/upload_test_stats.py @@ -0,0 +1,206 @@ +import argparse +import os +import requests +import shutil +import zipfile +import xml.etree.ElementTree as ET +from pathlib import Path +from typing import Dict, List, Any + +import rockset # type: ignore[import] +import boto3 # type: ignore[import] + +PYTORCH_REPO = "https://api.github.com/repos/pytorch/pytorch" +GITHUB_TOKEN = os.environ["GITHUB_TOKEN"] +REQUEST_HEADERS = { + "Accept": "application/vnd.github.v3+json", + "Authorization": "token " + GITHUB_TOKEN, +} +S3_RESOURCE = boto3.resource("s3") +TEMP_DIR = Path(os.environ["RUNNER_TEMP"]) / "tmp-test-stats" + + +def parse_xml_report( + report: Path, workflow_id: int, workflow_run_attempt: int +) -> List[Dict[str, Any]]: + """Convert a test report xml file into a JSON-serializable list of test cases.""" + # [Job id in artifacts] + # Retrieve the job id from the report path. In our GHA workflows, we append + # the job id to the end of the report name, so `report` looks like: + # unzipped-test-reports-foo_5596745227/test/test-reports/foo/TEST-foo.xml + # and we want to get `5596745227` out of it. + job_id = int(report.parts[0].rpartition("_")[2]) + + print(f"Parsing test report: {report}, job id: {job_id}") + root = ET.parse(report) + + test_cases = [] + for test_case in root.findall("testcase"): + case = process_xml_element(test_case) + case["workflow_id"] = workflow_id + case["workflow_run_attempt"] = workflow_run_attempt + case["job_id"] = job_id + test_cases.append(case) + + return test_cases + + +def process_xml_element(element: ET.Element) -> Dict[str, Any]: + """Convert a test suite element into a JSON-serializable dict.""" + ret: Dict[str, Any] = {} + + # Convert attributes directly into dict elements. + # e.g. 
+ # + # becomes: + # {"name": "test_foo", "classname": "test_bar"} + ret.update(element.attrib) + + # By default, all attributes are strings. Apply a few special conversions + # here for well-known attributes so that they are the right type in Rockset. + line = ret.get("line") + if line: + ret["line"] = int(line) + time = ret.get("time") + if time: + ret["time"] = float(time) + + # Convert inner and outer text into special dict elements. + # e.g. + # my_inner_text my_tail + # becomes: + # {"text": "my_inner_text", "tail": " my_tail"} + if element.text and element.text.strip(): + ret["text"] = element.text + if element.tail and element.tail.strip(): + ret["tail"] = element.tail + + # Convert child elements recursively, placing them at a key: + # e.g. + # + # hello + # + # becomes + # {"foo": {"text": "hello"}} + for child in element: + ret[child.tag] = process_xml_element(child) + return ret + + +def get_artifact_urls(workflow_run_id: int) -> Dict[Path, str]: + """Get all workflow artifacts with 'test-report' in the name.""" + response = requests.get( + f"{PYTORCH_REPO}/actions/runs/{workflow_run_id}/artifacts?per_page=100", + ) + artifacts = response.json()["artifacts"] + while "next" in response.links.keys(): + response = requests.get(response.links["next"]["url"], headers=REQUEST_HEADERS) + artifacts.extend(response.json()["artifacts"]) + + artifact_urls = {} + for artifact in artifacts: + if "test-report" in artifact["name"]: + artifact_urls[Path(artifact["name"])] = artifact["archive_download_url"] + return artifact_urls + + +def unzip(p: Path) -> None: + """Unzip the provided zipfile to a similarly-named directory. + + Returns None if `p` is not a zipfile. + + Looks like: /tmp/test-reports.zip -> /tmp/unzipped-test-reports/ + """ + assert p.is_file() + unzipped_dir = p.with_name("unzipped-" + p.stem) + + with zipfile.ZipFile(p, "r") as zip: + zip.extractall(unzipped_dir) + + +def download_and_extract_artifact( + artifact_name: Path, artifact_url: str, workflow_run_attempt: int +) -> None: + # [Artifact run attempt] + # All artifacts on a workflow share a single namespace. However, we can + # re-run a workflow and produce a new set of artifacts. To avoid name + # collisions, we add `-runattempt1-` somewhere in the artifact name. + # + # This code parses out the run attempt number from the artifact name. If it + # doesn't match the one specified on the command line, skip it. 
+ atoms = str(artifact_name).split("-") + for atom in atoms: + if atom.startswith("runattempt"): + found_run_attempt = int(atom[len("runattempt") :]) + if workflow_run_attempt != found_run_attempt: + print(f"Skipping {artifact_name} as it is an invalid run attempt.") + + print(f"Downloading and extracting {artifact_name}") + + response = requests.get(artifact_url, headers=REQUEST_HEADERS) + with open(artifact_name, "wb") as f: + f.write(response.content) + unzip(artifact_name) + + +def download_and_extract_s3_reports( + workflow_run_id: int, workflow_run_attempt: int +) -> None: + bucket = S3_RESOURCE.Bucket("gha-artifacts") + objs = bucket.objects.filter( + Prefix=f"pytorch/pytorch/{workflow_run_id}/{workflow_run_attempt}/artifact/test-reports" + ) + + for obj in objs: + p = Path(Path(obj.key).name) + print(f"Downloading and extracting {p}") + with open(p, "wb") as f: + f.write(obj.get()["Body"].read()) + unzip(p) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Upload test stats to Rockset") + parser.add_argument( + "--workflow-run-id", + required=True, + help="id of the workflow to get artifacts from", + ) + parser.add_argument( + "--workflow-run-attempt", + required=True, + help="which retry of the workflow this is", + ) + args = parser.parse_args() + + if TEMP_DIR.exists(): + print("rm: ", TEMP_DIR) + shutil.rmtree(TEMP_DIR) + + print("mkdir: ", TEMP_DIR) + TEMP_DIR.mkdir() + print("cd to ", TEMP_DIR) + os.chdir(TEMP_DIR) + + # Download and extract all the reports (both GHA and S3) + download_and_extract_s3_reports(args.workflow_run_id, args.workflow_run_attempt) + artifact_urls = get_artifact_urls(args.workflow_run_id) + for name, url in artifact_urls.items(): + download_and_extract_artifact(Path(name), url, args.workflow_run_attempt) + + # Parse the reports and transform them to JSON + test_cases = [] + for xml_report in Path(".").glob("**/*.xml"): + test_cases.extend( + parse_xml_report( + xml_report, int(args.workflow_run_id), int(args.workflow_run_attempt) + ) + ) + + # Write the JSON to rockset + print(f"Writing {len(test_cases)} test cases to Rockset") + client = rockset.Client( + api_server="api.rs2.usw2.rockset.com", api_key=os.environ["ROCKSET_API_KEY"] + ) + client.Collection.retrieve("test_run").add_docs(test_cases) + print("Done!") diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py index ee2ee8a0f0b9f9..9dae08c366068f 100644 --- a/tools/test/test_gen_backend_stubs.py +++ b/tools/test/test_gen_backend_stubs.py @@ -208,7 +208,7 @@ def test_unrecognized_key(self) -> None: - abs invalid_key: invalid_val''' output_error = self.get_errors_from_gen_backend_stubs(yaml_str) - self.assertExpectedInline(output_error, ''' contains unexpected keys: invalid_key. Only the following keys are supported: backend, cpp_namespace, extra_headers, supported, autograd, full_codegen''') # noqa: B950 + self.assertExpectedInline(output_error, ''' contains unexpected keys: invalid_key. 
Only the following keys are supported: backend, class_name, cpp_namespace, extra_headers, supported, autograd, full_codegen''') # noqa: B950 # if use_out_as_primary is provided, it must be a bool def test_use_out_as_primary_non_bool(self) -> None: diff --git a/tools/test/test_import_test_stats.py b/tools/test/test_import_test_stats.py new file mode 100644 index 00000000000000..5a43a7d45e8a97 --- /dev/null +++ b/tools/test/test_import_test_stats.py @@ -0,0 +1,51 @@ +import os +import unittest +from tools.stats.import_test_stats import get_disabled_issues +from typing import List +from unittest.mock import patch + +class TestGetDisabledIssues(unittest.TestCase): + + def run_assert_disabled_issues(self, pr_body: str, commit_messages: str, expected: List[str]) -> None: + with patch.dict(os.environ, {"PR_BODY": pr_body, "COMMIT_MESSAGES": commit_messages}): + disabled_issues = get_disabled_issues() + self.assertEqual(disabled_issues, expected) + + # test variations of close in PR_BODY + def test_closes_pr_body(self) -> None: + pr_body = 'closes #123 Close #143 ClOsE #345 closed #10283' + self.run_assert_disabled_issues(pr_body, '', ['123', '143', '345', '10283']) + + # test variations of fix in COMMIT_MESSAGES + def test_fixes_commit_messages(self) -> None: + commit_messages = 'fix #123 FixEd #143 fixes #345 FiXeD #10283' + self.run_assert_disabled_issues('', commit_messages, ['123', '143', '345', '10283']) + + # test variations of resolve in PR_BODY and COMMIT_MESSAGES + def test_resolves_pr_commits(self) -> None: + pr_body = 'resolve #123 resolveS #143' + commit_messages = 'REsolved #345 RESOLVES #10283' + self.run_assert_disabled_issues(pr_body, commit_messages, ['123', '143', '345', '10283']) + + # test links + def test_issue_links(self) -> None: + pr_body = 'closes https://github.com/pytorch/pytorch/issues/75198 fixes https://github.com/pytorch/pytorch/issues/75123' + self.run_assert_disabled_issues(pr_body, '', ['75198', '75123']) + + # test strange spacing + def test_spacing(self) -> None: + pr_body = 'resolve #123,resolveS #143Resolved #345\nRESOLVES #10283' + commit_messages = 'Fixed #2348fixes https://github.com/pytorch/pytorch/issues/75123resolveS #2134' + self.run_assert_disabled_issues(pr_body, commit_messages, ['123', '143', '345', '10283', '2348', '75123', '2134']) + + # test bad things + def test_not_accepted(self) -> None: + pr_body = 'fixes189 fixeshttps://github.com/pytorch/pytorch/issues/75123 ' \ + 'closedhttps://githubcom/pytorch/pytorch/issues/75123' + commit_messages = 'fix 234, fixes # 45, fixing #123, close 234, closes#45, closing #123 resolve 234, ' \ + 'resolves #45, resolving #123' + self.run_assert_disabled_issues(pr_body, commit_messages, []) + + +if __name__ == '__main__': + unittest.main() diff --git a/tools/testing/test_selections.py b/tools/testing/test_selections.py index c83b0619f03067..f09b87ac1a26bd 100644 --- a/tools/testing/test_selections.py +++ b/tools/testing/test_selections.py @@ -156,7 +156,8 @@ def _query_failure_test_module(reports: List[Tuple["Report", str]]) -> List[str] def _query_changed_test_files() -> List[str]: - cmd = ["git", "diff", "--name-only", "origin/master", "HEAD"] + default_branch = f"origin/{os.environ.get('GIT_DEFAULT_BRANCH', 'master')}" + cmd = ["git", "diff", "--name-only", default_branch, "HEAD"] proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) if proc.returncode != 0: diff --git a/torch/CMakeLists.txt b/torch/CMakeLists.txt index 00892ea09eae7d..4dddf7b33d71bf 100644 --- a/torch/CMakeLists.txt +++ 
b/torch/CMakeLists.txt @@ -44,6 +44,9 @@ set(TORCH_PYTHON_SRCS ) append_filelist("libtorch_python_core_sources" TORCH_PYTHON_SRCS) +list(APPEND TORCH_PYTHON_SRCS + ${TORCH_SRC_DIR}/csrc/init_flatbuffer_module.cpp) + # NB: This has to match the condition under which the JIT test directory # is included (at the time of writing that's in caffe2/CMakeLists.txt). if(BUILD_TEST) @@ -190,6 +193,7 @@ add_custom_target(torch_python_stubs DEPENDS "${TORCH_SRC_DIR}/_C/__init__.pyi" "${TORCH_SRC_DIR}/_C/_VariableFunctions.pyi" "${TORCH_SRC_DIR}/nn/functional.pyi" + "${TORCH_SRC_DIR}/utils/data/datapipes/datapipe.pyi" ) add_custom_command( OUTPUT @@ -210,6 +214,18 @@ add_custom_command( WORKING_DIRECTORY "${TORCH_ROOT}" ) +file(GLOB_RECURSE datapipe_files "${TORCH_SRC_DIR}/utils/data/datapipes/*.py") +add_custom_command( + OUTPUT + "${TORCH_SRC_DIR}/utils/data/datapipes/datapipe.pyi" + COMMAND + "${PYTHON_EXECUTABLE}" ${TORCH_SRC_DIR}/utils/data/datapipes/gen_pyi.py + DEPENDS + "${TORCH_SRC_DIR}/utils/data/datapipes/datapipe.pyi.in" + ${datapipe_files} + WORKING_DIRECTORY + "${TORCH_ROOT}" +) if(USE_DISTRIBUTED) if(WIN32) append_filelist("libtorch_python_distributed_core_sources" TORCH_PYTHON_SRCS) @@ -376,6 +392,9 @@ set_source_files_properties( # Disable certain warnings for GCC-9.X if(CMAKE_COMPILER_IS_GNUCXX AND (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 9.0.0)) set_source_files_properties(${TORCH_SRC_DIR}/csrc/Module.cpp PROPERTIES COMPILE_FLAGS "-Wno-cast-function-type") + set_source_files_properties( + ${TORCH_SRC_DIR}/csrc/init_flatbuffer_module.cpp + PROPERTIES COMPILE_FLAGS "-Wno-cast-function-type") set_source_files_properties(${TORCH_SRC_DIR}/csrc/autograd/python_variable.cpp PROPERTIES COMPILE_FLAGS "-Wno-cast-function-type") endif() diff --git a/torch/_C/_VariableFunctions.pyi.in b/torch/_C/_VariableFunctions.pyi.in index 1b3a760c8cbd49..75d566f131ab59 100644 --- a/torch/_C/_VariableFunctions.pyi.in +++ b/torch/_C/_VariableFunctions.pyi.in @@ -5,13 +5,11 @@ from typing import List, Tuple, Optional, Union, Any, ContextManager, Callable, from typing_extensions import Literal from torch._six import inf -from torch.types import _int, _float, _bool, Number, _dtype, _device, _qscheme, _size, _layout +from torch.types import _int, _float, _bool, Number, _dtype, _device, _qscheme, _size, _layout, SymInt +import torch import builtins -# REDUNDANT! -${namedtuple_defs} - ${function_hints} ${all_directive} diff --git a/torch/_C/__init__.pyi.in b/torch/_C/__init__.pyi.in index db093932f1c8e7..e252e9025778ed 100644 --- a/torch/_C/__init__.pyi.in +++ b/torch/_C/__init__.pyi.in @@ -12,7 +12,7 @@ from typing import ( from typing_extensions import Literal from torch._six import inf -from torch.types import _int, _float, _bool, _dtype, _device, _qscheme, _size, _layout, Device, Number, Storage +from torch.types import _int, _float, _bool, _dtype, _device, _qscheme, _size, _layout, Device, Number, Storage, SymInt from torch.storage import _TypedStorage import builtins @@ -22,6 +22,8 @@ import builtins from . import _nn as _nn from . import _onnx as _onnx from . import _VariableFunctions as _VariableFunctions +from . import _lazy as _lazy +from . import _lazy_ts_backend as _lazy_ts_backend T = TypeVar('T') @@ -214,6 +216,7 @@ def _jit_pass_propagate_shapes_on_graph(Graph) -> None: ... def _jit_erase_non_input_shape_information(Graph) -> None: ... def _jit_pass_common_expression_hoisting(Graph) -> None: ... def _jit_get_schemas_for_operator(name :str) -> List[FunctionSchema]: ... 
+def _jit_get_all_schemas() -> List[FunctionSchema]: ... def _jit_check_alias_annotation(g: Graph, args: Tuple[Any, ...], unqualified_op_name: str): ... def _jit_can_fuse_on_cpu() -> _bool: ... def _jit_can_fuse_on_gpu() -> _bool: ... @@ -233,7 +236,7 @@ def _jit_set_te_must_use_llvm_cpu(use_llvm: _bool): ... def _jit_set_nvfuser_enabled(enable: _bool) -> _bool: ... def _jit_cat_wo_conditionals(optimize_cat: _bool): ... def _jit_opt_conditionals(opt_conds: _bool): ... -def _jit_pass_canonicalize(graph: Graph): ... +def _jit_pass_canonicalize(graph: Graph, keep_unique_names: _bool = True): ... def _jit_pass_erase_shape_information(graph: Graph): ... def _jit_pass_fold_convbn(module: 'torch.jit.ScriptModule'): ... def _jit_pass_insert_observers(module: 'torch.jit.ScriptModule', @@ -260,7 +263,7 @@ ResolutionCallback = Callable[[str], Callable[..., Any]] # Defined in torch/csrc/jit/python/script_init.cpp # and torch/csrc/jit/python/init.cpp -def _create_function_from_graph(qualname: str, graph: Graph) -> Graph: ... +def _create_function_from_graph(qualname: str, graph: Graph) -> ScriptFunction: ... def _debug_set_autodiff_subgraph_inlining(disabled: _bool) -> None: ... def _ivalue_tags_match(lhs: ScriptModule, rhs: ScriptModule) -> _bool: ... def _jit_assert_is_instance(obj: Any, type: JitType): ... @@ -281,7 +284,7 @@ def _get_model_ops_and_info_from_buffer(buffer: BinaryIO): ... def _get_mobile_model_contained_types(filename: Union[str, Path]): ... def _get_mobile_model_contained_types_from_buffer(buffer: BinaryIO): ... def _logging_set_logger(logger: LoggerBase) -> LoggerBase: ... -def _get_graph_executor_optimize() -> _bool: ... +def _get_graph_executor_optimize(optimize: Optional[_bool] = None) -> _bool: ... def _set_graph_executor_optimize(optimize: _bool): ... def _export_opnames(module: ScriptModule) -> List[str]: ... def _create_function_from_trace( @@ -318,7 +321,7 @@ def _jit_pass_onnx_assign_output_shape(graph: Graph, tensors: List[Tensor], desc def _jit_pass_onnx_remove_inplace_ops_for_onnx(graph: Graph, module: Module) -> None: ... def _jit_pass_remove_inplace_ops(graph: Graph) -> None: ... def _jit_pass_canonicalize_graph_fuser_ops(graph: Graph) -> None: ... -def _jit_pass_peephole(graph: Graph, addmm_fusion_enabled: _bool) -> None: ... +def _jit_pass_peephole(graph: Graph, disable_shape_peepholes: _bool = False) -> None: ... def _jit_pass_fuse_addmm(graph: Graph) -> None: ... def _jit_pass_onnx_preprocess(graph: Graph) -> None: ... def _jit_pass_prepare_division_for_onnx(graph: Graph) -> None: ... @@ -345,6 +348,10 @@ def _jit_pass_onnx_function_substitution(graph: Graph) -> None: ... def _jit_pass_onnx_function_extraction(graph: Graph, module_names : Set[str], param_names : List[str]) -> Dict[Node, Dict[str, str]]: ... def _jit_pass_onnx_clear_scope_records() -> None: ... def _jit_pass_onnx_track_scope_attributes(graph: Graph, onnx_attrs: Dict[str, Any]) -> None: ... +def _jit_is_onnx_log_enabled() -> _bool: ... +def _jit_set_onnx_log_enabled(enabled: _bool) -> None: ... +def _jit_set_onnx_log_output_stream(stream_name: str) -> None: ... +def _jit_onnx_log(*args: Any) -> None: ... def _jit_pass_lower_graph(graph: Graph, m: Module) -> Tuple[Graph, List[IValue]]: ... def _jit_pass_inline_fork_wait(graph: Graph) -> None: ... def _jit_pass_onnx_deduplicate_initializers(graph: Graph, params_dict: Dict[str, IValue], is_train: _bool) -> Dict[str, IValue]: ... @@ -466,7 +473,8 @@ class Graph: def setInsertPoint(self, n: Union[Block, Node]) -> None: ... 
def insert_point_guard(self, n: Union[Block, Node]) -> _InsertPoint: ... def insertPoint(self) -> Node: ... - def insertGraph(sellf, callee: Graph, inputs: List[Value]) -> List[Value]: ... + def insertGraph(self, callee: Graph, inputs: List[Value]) -> List[Value]: ... + def makeMultiOutputIntoTuple(self) -> None: ... ... @@ -481,6 +489,8 @@ class Argument: class FunctionSchema: arguments: List[Argument] returns: List[Argument] + name: str + overload_name: str ... class _UpgraderEntry: @@ -817,9 +827,6 @@ class ThroughputBenchmark(object): def run_once(self, *args: Any, **kwargs: Any) -> Any: ... def benchmark(self, config: BenchmarkConfig) -> BenchmarkExecutionStats: ... -# IDK if these are actually exposed here, hope they are -${namedtuple_defs} - # Defined in torch/csrc/generic/Storage.cpp ${legacy_storage_base_hints} @@ -1134,6 +1141,9 @@ class TensorType(JitType): def getInferred(cls) -> TensorType: ... def with_sizes(self, other: Optional[List[Optional[_int]]]) -> TensorType: ... def sizes(self) -> Optional[List[_int]]: ... + def strides(self) -> Optional[List[_int]]: ... + def device(self) -> Optional[_device]: ... + def dtype(self) -> Optional[_dtype]: ... @staticmethod def create_from_tensor(t: Tensor) -> TensorType: ... diff --git a/torch/_C/_autograd.pyi b/torch/_C/_autograd.pyi index 38ac7ccaea0c8d..9cdf801dd7602b 100644 --- a/torch/_C/_autograd.pyi +++ b/torch/_C/_autograd.pyi @@ -87,6 +87,7 @@ def _prepare_profiler(config: ProfilerConfig, activities: Set[ProfilerActivity]) def _disable_profiler() -> _ProfilerResult: ... def _profiler_enabled() -> bool: ... def _add_metadata_json(key: str, value: str) -> None: ... +def _kineto_step() -> None: ... def kineto_available() -> bool: ... def _record_function_with_args_enter(name: str, args: List[Any]) -> torch.Tensor: ... def _record_function_with_args_exit(handle: torch.Tensor) -> None: ... diff --git a/torch/_C/_distributed_rpc.pyi b/torch/_C/_distributed_rpc.pyi index d89f614123e1c4..58d555297929f7 100644 --- a/torch/_C/_distributed_rpc.pyi +++ b/torch/_C/_distributed_rpc.pyi @@ -85,7 +85,7 @@ class TensorPipeAgent(RpcAgent): store: Store, name: str, worker_id: int, - world_size: int, + world_size: Optional[int], opts: _TensorPipeRpcBackendOptionsBase, reverse_device_maps: Dict[str, Dict[torch.device, torch.device]], devices: List[torch.device], diff --git a/torch/_C/_lazy.pyi b/torch/_C/_lazy.pyi new file mode 100644 index 00000000000000..5b4cf101234a44 --- /dev/null +++ b/torch/_C/_lazy.pyi @@ -0,0 +1,17 @@ +from typing import List +from torch import Tensor + +#defined in torch/csrc/lazy/python/init.cpp +def _mark_step(device: str, devices: List[str], wait: bool): ... +def _wait_device_ops(devices: List[str]): ... +def _reset_metrics(): ... +def _counter_names() -> List[str]: ... +def _counter_value(name: str) -> int: ... +def _get_graph_hash(tensors: List[Tensor]) -> str: ... +def _sync_multi(tensors: List[Tensor], devices: List[str], wait: bool = True, sync_ltc_data: bool = True): ... +def _get_tensor_id(tensor: Tensor) -> int: ... +def _get_tensors_text(tensors: List[Tensor]) -> str: ... +def _get_tensors_dot(tensors: List[Tensor]) -> str: ... +def _get_tensors_backend(tensors: List[Tensor]) -> str: ... +def _get_force_fallback() -> str: ... +def _set_force_fallback(newval: str): ... 
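For orientation, the Python-facing wrappers over these torch._C._lazy bindings are added later in this patch (torch/_lazy/__init__.py, torch/_lazy/metrics.py, torch/_lazy/ts_backend.py). A minimal sketch of how they might be exercised, assuming a build with the lazy TorchScript backend available; the workload, tensor shapes, and use of torch.randn here are illustrative only and not taken from this patch:

import torch
import torch._lazy
import torch._lazy.metrics as metrics
import torch._lazy.ts_backend as ts_backend

ts_backend.init()                        # register the lazy TorchScript backend (assumed to be required before use)

x = torch.randn(2, 2).to(device="lazy")  # moving a tensor to the "lazy" device starts IR tracing
y = (x @ x).sum()                        # ops on lazy tensors build IR instead of executing eagerly

torch._lazy.mark_step(wait=True)         # lower/compile the traced graph and execute it
print(metrics.counter_names())           # counters (e.g. aten:: fallbacks) recorded during tracing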
diff --git a/torch/_C/_lazy_ts_backend.pyi b/torch/_C/_lazy_ts_backend.pyi new file mode 100644 index 00000000000000..91575fe939bfa2 --- /dev/null +++ b/torch/_C/_lazy_ts_backend.pyi @@ -0,0 +1,8 @@ +#defined in torch/csrc/lazy/python/init.cpp + +from typing import List, Tuple, Any +from torch import Tensor + +def _init(): ... +def _get_tensors_ts_device_data_node(tensors: List[Tensor]) -> Tuple[List[int], List[Any]]: ... +def _run_cached_graph(hash_str: str, graph_inputs: List[Any]) -> List[Tensor]: ... diff --git a/torch/_C/build.bzl b/torch/_C/build.bzl new file mode 100644 index 00000000000000..230124eb69aa81 --- /dev/null +++ b/torch/_C/build.bzl @@ -0,0 +1,6 @@ +def define_targets(rules): + rules.filegroup( + name = "pyi.in", + srcs = rules.glob(["*.pyi.in"]), + visibility = ["//visibility:public"], + ) diff --git a/torch/_C/return_types.pyi.in b/torch/_C/return_types.pyi.in new file mode 100644 index 00000000000000..aa540ea328b5d9 --- /dev/null +++ b/torch/_C/return_types.pyi.in @@ -0,0 +1,10 @@ +# ${generated_comment} + +from torch import Tensor, Generator, strided, memory_format, contiguous_format, strided +from typing import List, Tuple, Optional, Union, Any, ContextManager, Callable, overload, Iterator, NamedTuple, Sequence, TypeVar +from typing_extensions import Literal +from torch._six import inf + +from torch.types import _int, _float, _bool, Number, _dtype, _device, _qscheme, _size, _layout + +${namedtuple_defs} diff --git a/torch/_C_flatbuffer/__init__.pyi b/torch/_C_flatbuffer/__init__.pyi new file mode 100644 index 00000000000000..3a2ff059b0ed9d --- /dev/null +++ b/torch/_C_flatbuffer/__init__.pyi @@ -0,0 +1,10 @@ +from torch._C import LiteScriptModule, ScriptModule + +def _load_mobile_module_from_file(filename: str): ... +def _load_mobile_module_from_bytes(bytes_: bytes): ... +def _load_jit_module_from_file(filename: str): ... +def _load_jit_module_from_bytes(bytes_: bytes): ... +def _save_mobile_module(m: LiteScriptModule, filename: str): ... +def _save_jit_module(m: ScriptModule, filename: str): ... +def _save_mobile_module_to_bytes(m: LiteScriptModule) -> bytes: ... +def _save_jit_module_to_bytes(m: ScriptModule) -> bytes: ... diff --git a/torch/__init__.py b/torch/__init__.py index 64827961c30cac..7011dc4e3b963d 100644 --- a/torch/__init__.py +++ b/torch/__init__.py @@ -39,6 +39,7 @@ 'no_grad', 'enable_grad', 'rand', 'randn', 'inference_mode', 'DoubleStorage', 'FloatStorage', 'LongStorage', 'IntStorage', 'ShortStorage', 'CharStorage', 'ByteStorage', 'BoolStorage', + '_TypedStorage', 'DoubleTensor', 'FloatTensor', 'LongTensor', 'IntTensor', 'ShortTensor', 'CharTensor', 'ByteTensor', 'BoolTensor', 'Tensor', 'lobpcg', 'use_deterministic_algorithms', @@ -594,7 +595,7 @@ def is_warn_always_enabled(): ################################################################################ from ._tensor import Tensor -from .storage import _StorageBase, _TypedStorage +from .storage import _StorageBase, _TypedStorage, _LegacyStorage # NOTE: New Storage classes should never be added. When adding a new # dtype, use torch.storage._TypedStorage directly. 
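The torch._C_flatbuffer stubs above describe a save/load surface for flatbuffer-serialized JIT and mobile modules, backed by the init_flatbuffer_module.cpp source added to torch/CMakeLists.txt earlier in this diff. A hedged sketch of the JIT round trip they imply; whether _save_jit_module expects the underlying torch._C.ScriptModule (scripted._c) rather than the Python wrapper is an assumption here, and the file path is a placeholder:

import torch
import torch._C_flatbuffer as ff  # assumes the flatbuffer extension is compiled into this build

class AddOne(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(AddOne())
# Assumption: the stub takes the underlying C++ module handle (scripted._c), not the Python wrapper.
ff._save_jit_module(scripted._c, "/tmp/add_one.ff")        # placeholder path
loaded = ff._load_jit_module_from_file("/tmp/add_one.ff")  # stub declares no return type; treated as opaque here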
@@ -602,87 +603,87 @@ def is_warn_always_enabled(): class _UntypedStorage(_C.ByteStorageBase, _StorageBase): pass -class ByteStorage(_TypedStorage): +class ByteStorage(_LegacyStorage): @classproperty def dtype(self): return torch.uint8 -class DoubleStorage(_TypedStorage): +class DoubleStorage(_LegacyStorage): @classproperty def dtype(self): return torch.double -class FloatStorage(_TypedStorage): +class FloatStorage(_LegacyStorage): @classproperty def dtype(self): return torch.float -class HalfStorage(_TypedStorage): +class HalfStorage(_LegacyStorage): @classproperty def dtype(self): return torch.half -class LongStorage(_TypedStorage): +class LongStorage(_LegacyStorage): @classproperty def dtype(self): return torch.long -class IntStorage(_TypedStorage): +class IntStorage(_LegacyStorage): @classproperty def dtype(self): return torch.int -class ShortStorage(_TypedStorage): +class ShortStorage(_LegacyStorage): @classproperty def dtype(self): return torch.short -class CharStorage(_TypedStorage): +class CharStorage(_LegacyStorage): @classproperty def dtype(self): return torch.int8 -class BoolStorage(_TypedStorage): +class BoolStorage(_LegacyStorage): @classproperty def dtype(self): return torch.bool -class BFloat16Storage(_TypedStorage): +class BFloat16Storage(_LegacyStorage): @classproperty def dtype(self): return torch.bfloat16 -class ComplexDoubleStorage(_TypedStorage): +class ComplexDoubleStorage(_LegacyStorage): @classproperty def dtype(self): return torch.cdouble -class ComplexFloatStorage(_TypedStorage): +class ComplexFloatStorage(_LegacyStorage): @classproperty def dtype(self): return torch.cfloat -class QUInt8Storage(_TypedStorage): +class QUInt8Storage(_LegacyStorage): @classproperty def dtype(self): return torch.quint8 -class QInt8Storage(_TypedStorage): +class QInt8Storage(_LegacyStorage): @classproperty def dtype(self): return torch.qint8 -class QInt32Storage(_TypedStorage): +class QInt32Storage(_LegacyStorage): @classproperty def dtype(self): return torch.qint32 -class QUInt4x2Storage(_TypedStorage): +class QUInt4x2Storage(_LegacyStorage): @classproperty def dtype(self): return torch.quint4x2 -class QUInt2x4Storage(_TypedStorage): +class QUInt2x4Storage(_LegacyStorage): @classproperty def dtype(self): return torch.quint2x4 @@ -692,6 +693,7 @@ def dtype(self): ShortStorage, CharStorage, ByteStorage, HalfStorage, BoolStorage, QUInt8Storage, QInt8Storage, QInt32Storage, BFloat16Storage, ComplexFloatStorage, ComplexDoubleStorage, QUInt4x2Storage, QUInt2x4Storage, + _TypedStorage } # The _tensor_classes set is initialized by the call to _C._initialize_tensor_type_bindings() @@ -715,7 +717,7 @@ def manager_path(): raise RuntimeError("Unable to find torch_shm_manager at " + path) return path.encode('utf-8') -from .autocast_mode import autocast +from torch.amp import autocast # Shared memory manager needs to know the exact location of manager executable _C._initExtension(manager_path()) @@ -819,9 +821,6 @@ def _assert(condition, message): from torch import __future__ as __future__ from torch import profiler as profiler -from torch.nested._nestedtensor import NestedTensor -from torch.nested._nestedtensor import nested_tensor - _C._init_names(list(torch._storage_classes)) # attach docstrings to torch and tensor functions diff --git a/torch/_jit_internal.py b/torch/_jit_internal.py index ba570b35391e4e..3c067d5c1c53a3 100644 --- a/torch/_jit_internal.py +++ b/torch/_jit_internal.py @@ -18,6 +18,7 @@ import typing import io import pickle +import threading # This is needed. 
`torch._jit_internal` is imported before `torch.distributed.__init__`. # Explicitly ask to import `torch.distributed.__init__` first. # Otherwise, "AttributeError: module 'torch' has no attribute 'distributed'" is raised. @@ -1251,6 +1252,8 @@ def persistent_id(self, obj): return "" if isinstance(obj, torch.cuda.Event): return "" + if isinstance(obj, threading.Thread): + return "" return None diff --git a/torch/_lazy/__init__.py b/torch/_lazy/__init__.py new file mode 100644 index 00000000000000..ff4e90c0edf237 --- /dev/null +++ b/torch/_lazy/__init__.py @@ -0,0 +1,33 @@ +import torch._C._lazy + + +def mark_step(device: str = "lazy:0", wait=False): + """Triggers a mark step, which amounts to + - collecting a group of 'live' lazy tensors to index into the compilation cache + (lowering/compiling their IR graphs if not cached) + - kicking off execution of the compiled function + - (optionally, wait=True) waiting for cpu-side execution to complete (does not sync the accelerator) + """ + # TODO(whc) expand this to include backend hooks and align with XLA backend needs + torch._C._lazy._mark_step(device, [], wait=wait) + +def wait_device_ops(devices=None): + """Waits for all the async operations on the given devices to complete. + Args: + devices (string..., optional): The devices whose async ops need to be waited + for. If empty, all the local devices will be waited for. + """ + if devices is None: + devices = [] + torch._C._lazy._wait_device_ops(devices=devices) + +def sync_multi(tensors, devices): + """ + Sync the list of lazy tensors so their IR gets lowered for the active backend + and the compiled computation graph gets cached. + """ + torch._C._lazy._sync_multi(tensors, devices) + +def get_tensor_id(tensor): + """Return a unique id of the lazy tensor maintained by LTC""" + return torch._C._lazy._get_tensor_id(tensor) diff --git a/torch/_lazy/computation.py b/torch/_lazy/computation.py new file mode 100644 index 00000000000000..7dd57cd7238d45 --- /dev/null +++ b/torch/_lazy/computation.py @@ -0,0 +1,23 @@ +import torch._C._lazy +import torch._C._lazy_ts_backend + +def get_tensors_ts_device_data_node(tensors): + """Return tensor ids and eager tensors for DeviceData nodes in the + IR for the passed-in lazy tensors. + + TODO: This API is currently ts backend specific. We are working on + generalizing it to all backends including XLA. + """ + return torch._C._lazy_ts_backend._get_tensors_ts_device_data_node(tensors) + +def get_graph_hash(tensors): + """Return the graph hash for the passed-in lazy tensors""" + return torch._C._lazy._get_graph_hash(tensors) + +def run_cached_graph(hash_str, graph_inputs): + """Run the cached computation graph with the given inputs. + + TODO: This API is currently ts backend specific. We are working on + generalizing it to all backends including XLA.
+ """ + return torch._C._lazy_ts_backend._run_cached_graph(hash_str, graph_inputs) diff --git a/torch/_lazy/config.py b/torch/_lazy/config.py new file mode 100644 index 00000000000000..acff69da4e5a35 --- /dev/null +++ b/torch/_lazy/config.py @@ -0,0 +1,9 @@ +import torch._C._lazy + +def get_force_fallback(): + """Get the config used to force LTC fallback""" + return torch._C._lazy._get_force_fallback() + +def set_force_fallback(configval): + """Set the config used to force LTC fallback""" + torch._C._lazy._set_force_fallback(configval) diff --git a/torch/_lazy/debug.py b/torch/_lazy/debug.py new file mode 100644 index 00000000000000..882056ca9c0f3b --- /dev/null +++ b/torch/_lazy/debug.py @@ -0,0 +1,20 @@ +import torch._C._lazy + + +def render_ir_graph(tensors): + """Return a text dump of the LTC IR graph in dot format for the tensors. + The text can be processed by tools like dot to be rendered in pdf, png, etc.""" + return torch._C._lazy._get_tensors_dot(tensors) + +def dump_ir(tensors, ir_format): + """Return a dump of the tensors in the specified format. + Valid formats are + - text: for LTC IR + - backend: for the active backend IR + """ + if ir_format == "text": + return torch._C._lazy._get_tensors_text(tensors) + elif ir_format == "backend": + return torch._C._lazy._get_tensors_backend(tensors) + else: + raise RuntimeError(f"Unrecognized IR format: {ir_format}") diff --git a/torch/_lazy/extract_compiled_graph.py b/torch/_lazy/extract_compiled_graph.py new file mode 100644 index 00000000000000..37d0e67f31f3f3 --- /dev/null +++ b/torch/_lazy/extract_compiled_graph.py @@ -0,0 +1,199 @@ +import torch._lazy.metrics as metrics +from torch._lazy.tensor_factory_functions import tensor_factory_functions +from torch._lazy import computation +from torch._lazy import debug as lazy_debug +import torch._lazy as lazy +import dataclasses +from typing import List, Dict, Any, Callable +import copy +from torch import fx +import torch +import itertools +import os + +debug = os.environ.get("debug_extract_compiled_graph") is not None + +@dataclasses.dataclass +class GraphInputMatcher: + """ + The GraphInputMatcher class sets up the graph inputs for future calls after lazy tracing. + Specifically, those graph inputs corresponding to method parameters should be replaced with the + arguments for the current call. + + tensor_id_to_arg_idx maps the tensor id to the parameter index. + graph_input_tensor_ids, graph_input_ivalues list the tensor_id and ivalue for each of the + TS/XLA graph inputs. + """ + tensor_id_to_arg_idx: Dict[int, int] + graph_input_tensor_ids: List[int] + # there are 2 categories of graph_input_tensors. + # Category 1: those whose ids are not found in tensor_id_to_arg_idx. These are + # most likely const tensors and we can get their content from graph_input_tensors + # Category 2: those whose ids are found in tensor_id_to_arg_idx.
We should get + # the tensors from the method arguments + graph_input_ivalues: List[Any] + + # get the real graph input tensors + def __call__(self, args): + real_input = [] + for tensor_id, traced_ivalue in zip(self.graph_input_tensor_ids, self.graph_input_ivalues): + arg_idx = self.tensor_id_to_arg_idx.get(tensor_id, None) + if arg_idx is None: + inp = traced_ivalue + else: + inp = args[arg_idx] + real_input.append(inp) + return real_input + +class ReturnValueHandler: + r""" + When ltc_sync_multi is called on multiple tensors, the compiled graph + will contain output only for unique tensors - if a tensor appears multiple + times in the input to _ltc_sync_multi, only the first occurrence matters. + + However, from the Python level we still expect multiple tensors returned, with duplication, + even if the TS graph dedups the output, e.g. for the method: + + def forward(self, a): + return a, a + + the TS graph captured by LTC will return a single tensor, but the Python method expects 2. + + This class dedups the lazy tensors first to get the indices that will be used + to duplicate the eager tensors later. + """ + def __init__(self, lazy_out_list): + self.index: List[List[int]] = [] + self.total_count = len(lazy_out_list) + + tensor_id_to_idx: Dict[int, int] = dict() + for dup_idx, lazy_tensor in enumerate(lazy_out_list): + uniq_idx = tensor_id_to_idx.get(id(lazy_tensor), None) + if uniq_idx is not None: + self.index[uniq_idx].append(dup_idx) + else: + uniq_idx = len(self.index) + self.index.append([dup_idx]) + tensor_id_to_idx[id(lazy_tensor)] = uniq_idx + + def duplicate_eager_tensors(self, eager_tensor_list): + duplicated_list = [None] * self.total_count + assert len(eager_tensor_list) == len(self.index) + + for uniq_idx, eager_tensor in enumerate(eager_tensor_list): + for dup_idx in self.index[uniq_idx]: + duplicated_list[dup_idx] = eager_tensor + return duplicated_list + +def force_lazy_device(model: fx.GraphModule): + """ + Factory methods in an Fx graph may create tensors on specific eager devices. + If we take no action, those eager tensors will be mixed with lazy tensors and + cause a crash. This method overwrites those eager devices with the lazy device. + """ + def tolazydevice(dev): + if isinstance(dev, torch.device): + return torch.device("lazy", index=dev.index) + return dev + + def hasDeviceArg(args, kwargs): + return any(isinstance(arg, torch.device) for arg in itertools.chain(args, kwargs.values())) + + for nd in model.graph.nodes: + nd.args = tuple(tolazydevice(arg) for arg in nd.args) + nd.kwargs = {k: tolazydevice(v) for k, v in nd.kwargs.items()} + + # For torchbench models like yolov3 and hf_Bart, dynamo generates an Fx graph that returns + # eager tensors on the default device + # (check https://gist.github.com/shunting314/eabdf6c769c59bc384469717b8f9bb7f for yolov3, + # and https://gist.github.com/shunting314/8d5e2d9348a3258959d3954186c48814 for hf_Bart). + # To force those tensors onto the lazy device, we cannot simply override + # the device argument since there is no explicit device argument. + # What we are doing here is, for the list of covered tensor factory methods, + # we add a lazy device argument explicitly. + # + # TODO: This solution is not ideal since we may miss some factory methods. In the future, + # when we support lazy mode, this method can be replaced by that. + if nd.target in tensor_factory_functions and not hasDeviceArg(nd.args, nd.kwargs): + kwargs = dict(nd.kwargs) # nd.kwargs is immutable. make a mutable copy.
+ kwargs["device"] = torch.device("lazy") + nd.kwargs = kwargs + + model.recompile() + +def get_fallback_ops(): + fallback_ops = [] + for opname in metrics.counter_names(): + if "aten::" not in opname: + continue + val = int(metrics.counter_value(opname)) + if val > 0: + fallback_ops.append(f"{opname}={val}") + + return fallback_ops + +def extract_compiled_graph(model: fx.GraphModule, example_inputs) -> Callable: + """ + Optimize an eager model with LTC and return a wrapper to execute the + compiled graph directly without retracing. It depends on other mechanisms + like TorchDynamo guards to guarantee the returned wrapper is only called + when it's safe. + """ + lazy_args = [arg.to(device="lazy") for arg in example_inputs] + args_tensor_ids = [lazy.get_tensor_id(lazy_arg) for lazy_arg in lazy_args] + tensor_id_to_arg_idx = {tensor_id: i for i, tensor_id in enumerate(args_tensor_ids)} + lazy_model = copy.deepcopy(model).to(device=torch.device("lazy")) + force_lazy_device(lazy_model) + + # This line executes lazy tracing and enables us to extract the compiled graph later + metrics.reset() + lazy_out = lazy_model(*lazy_args) + fallback_ops = get_fallback_ops() + metrics.reset() + + if len(fallback_ops) > 0: + raise RuntimeError(f"Failed to extract the compiled graph because of fallback: {','.join(fallback_ops)}") + + if not isinstance(lazy_out, (tuple, list)): + lazy_out = (lazy_out,) + + args_and_out = tuple(lazy_args) + tuple(lazy_out) + return_value_handler = ReturnValueHandler(args_and_out) + if debug: + print("Fx code:\n", model.code) + print("LTC IR:", lazy_debug.dump_ir(args_and_out, "text")) + + # TODO: this part is TS backend specific for now and will be generalized to + # support XLA + graph_input_tensor_ids, graph_input_ivalues = computation.get_tensors_ts_device_data_node(args_and_out) + assert len(graph_input_tensor_ids) == len(graph_input_ivalues) + graph_input_matcher = GraphInputMatcher(tensor_id_to_arg_idx, graph_input_tensor_ids, graph_input_ivalues) + + graph_hash = computation.get_graph_hash(args_and_out) + + if debug: + print("graph_hash", graph_hash) + print(f"args_tensor_ids {args_tensor_ids}") + print("tensor ids from device data:", graph_input_tensor_ids) + + # sync the list of output tensors so the computation graph for these + # tensors will be cached. Those computation graphs can be retrieved + # by graph hash later.
+ lazy.sync_multi(args_and_out, []) + + def optimized_mod(*args): + if len(args_and_out) == 0: + return () + graph_input = graph_input_matcher(args) + res = return_value_handler.duplicate_eager_tensors(computation.run_cached_graph(graph_hash, graph_input)) + + assert len(res) == len(args_and_out) + for i, arg in enumerate(args): + # only copy those tensors that get in-place updated + if arg is not res[i]: + arg.copy_(res[i]) + + # skip the args + return res[len(args):] + + return optimized_mod diff --git a/torch/_lazy/metrics.py b/torch/_lazy/metrics.py new file mode 100644 index 00000000000000..043db981bb71ed --- /dev/null +++ b/torch/_lazy/metrics.py @@ -0,0 +1,13 @@ +import torch._C._lazy + +def reset(): + """Resets all metric counters.""" + torch._C._lazy._reset_metrics() + +def counter_names(): + """Retrieves all the currently active counter names.""" + return torch._C._lazy._counter_names() + +def counter_value(name: str): + """Return the value of the counter with the specified name""" + return torch._C._lazy._counter_value(name) diff --git a/torch/_lazy/tensor_factory_functions.py b/torch/_lazy/tensor_factory_functions.py new file mode 100644 index 00000000000000..47aa9c500466da --- /dev/null +++ b/torch/_lazy/tensor_factory_functions.py @@ -0,0 +1,48 @@ +import torch + +""" +tensor_factory_functions defines the list of torch functions that create tensors. +The list is grabbed by searching through native_functions.yaml by the following +regular expression: + + cat native_functions.yaml | grep 'func:' | grep -v "Tensor.*->" | grep "[-]>.*Tensor" + +It's possible that new tensor factory functions are added, making this list stale. +Use at your own risk or regenerate the list. +""" +tensor_factory_functions = ( + torch._cudnn_init_dropout_state, + torch.arange, + torch.bartlett_window, + torch.blackman_window, + torch._empty_affine_quantized, + torch.empty_strided, + torch.eye, + torch.full, + torch.from_file, + torch.hann_window, + torch.hamming_window, + torch.kaiser_window, + torch.linspace, + torch.logspace, + torch.ones, + torch.scalar_tensor, + torch.rand, + torch.randint, + torch.randn, + torch.randperm, + torch.range, + torch._efficientzerotensor, + torch.zeros, + torch.tril_indices, + torch.triu_indices, + # Note: the following functions match the regular expression search above but + # they are not available in the torch module. Commented out. + # torch._sparse_coo_tensor_with_dims, + # torch.fft_fftfreq, + # torch.fft_rfftfreq, +) + ( + # torch.tensor is special since it's not in native_functions.yaml + # add it separately + torch.tensor, +) diff --git a/torch/_lazy/ts_backend.py b/torch/_lazy/ts_backend.py new file mode 100644 index 00000000000000..118de2dbefca00 --- /dev/null +++ b/torch/_lazy/ts_backend.py @@ -0,0 +1,5 @@ +import torch._C._lazy_ts_backend + +def init(): + """Initializes the lazy TorchScript backend""" + torch._C._lazy_ts_backend._init() diff --git a/torch/_lobpcg.py b/torch/_lobpcg.py index 560d9579e61f90..f6d53c5ae7c8e0 100644 --- a/torch/_lobpcg.py +++ b/torch/_lobpcg.py @@ -652,17 +652,16 @@ class LOBPCG(object): """ def __init__(self, - A, # type: Optional[Tensor] - B, # type: Optional[Tensor] - X, # type: Tensor - iK, # type: Optional[Tensor] - iparams, # type: Dict[str, int] - fparams, # type: Dict[str, float] - bparams, # type: Dict[str, bool] - method, # type: str - tracker # type: None - ): - # type: (...)
-> None + A: Optional[Tensor], + B: Optional[Tensor], + X: Tensor, + iK: Optional[Tensor], + iparams: Dict[str, int], + fparams: Dict[str, float], + bparams: Dict[str, bool], + method: str, + tracker: None + ) -> None: # constant parameters self.A = A @@ -681,10 +680,10 @@ def __init__(self, self.E = torch.zeros((n, ), dtype=X.dtype, device=X.device) self.R = torch.zeros((m, n), dtype=X.dtype, device=X.device) self.S = torch.zeros((m, 3 * n), dtype=X.dtype, device=X.device) - self.tvars = {} # type: Dict[str, Tensor] - self.ivars = {'istep': 0} # type: Dict[str, int] - self.fvars = {'_': 0.0} # type: Dict[str, float] - self.bvars = {'_': False} # type: Dict[str, bool] + self.tvars: Dict[str, Tensor] = {} + self.ivars: Dict[str, int] = {'istep': 0} + self.fvars: Dict[str, float] = {'_': 0.0} + self.bvars: Dict[str, bool] = {'_': False} def __str__(self): lines = ['LOPBCG:'] @@ -947,11 +946,10 @@ def _get_rayleigh_ritz_transform(self, S): return Rinv * d_col def _get_svqb(self, - U, # Tensor - drop, # bool - tau # float - ): - # type: (Tensor, bool, float) -> Tensor + U: Tensor, # Tensor + drop: bool, # bool + tau: float # float + ) -> Tensor: """Return B-orthonormal U. .. note:: When `drop` is `False` then `svqb` is based on the diff --git a/torch/_masked/__init__.py b/torch/_masked/__init__.py index e3ed37af4436a9..e28aeec93d54d0 100644 --- a/torch/_masked/__init__.py +++ b/torch/_masked/__init__.py @@ -163,9 +163,12 @@ def _generate_docstring(func): prod=(('dim',), ('keepdim=False', 'dtype=None', 'mask=None')), amin=(('dim',), ('keepdim=False', 'dtype=None', 'mask=None')), amax=(('dim',), ('keepdim=False', 'dtype=None', 'mask=None')), + argmin=(('dim__as_int',), ('keepdim=False', 'dtype=None', 'mask=None')), + argmax=(('dim__as_int',), ('keepdim=False', 'dtype=None', 'mask=None')), mean=(('dim',), ('keepdim=False', 'dtype=None', 'mask=None')), norm=(('ord', 'dim',), ('keepdim=False', 'dtype=None', 'mask=None')), var=(('dim', 'unbiased'), ('keepdim=False', 'dtype=None', 'mask=None')), + std=(('dim', 'unbiased'), ('keepdim=False', 'dtype=None', 'mask=None')), softmax=(('dim__as_int',), ('dtype=None', 'mask=None')), log_softmax=(('dim__as_int',), ('dtype=None', 'mask=None')), softmin=(('dim__as_int',), ('dtype=None', 'mask=None')), @@ -226,9 +229,12 @@ def _generate_docstring(func): prod='product', amax='maximum', amin='minimum', + argmax='argmax', + argmin='argmin', mean='mean', norm='norm', - var='variance') + var='variance', + std='standard_deviation') normalization_names = dict( softmax='softmax', @@ -248,7 +254,7 @@ def _generate_docstring(func): if func.__name__ in {'norm', 'normalize'}: example_args = (2.0, example_dim) example_input = example_input.to(dtype=torch.float32) - elif func.__name__ in {'var'}: + elif func.__name__ in {'var', 'std'}: example_args = (example_dim, False) else: example_args = (example_dim,) @@ -343,12 +349,12 @@ def _reduction_identity(op_name: str, input: Tensor, *args): return torch.tensor(0, dtype=dtype, device=device) elif op_name == 'prod': return torch.tensor(1, dtype=dtype, device=device) - elif op_name == 'amax': + elif op_name in {'amax', 'argmax'}: if torch.is_floating_point(input): return torch.tensor(-torch.inf, dtype=dtype, device=device) elif torch.is_signed(input) or dtype == torch.uint8: return torch.tensor(torch.iinfo(dtype).min, dtype=dtype, device=device) - elif op_name == 'amin': + elif op_name in {'amin', 'argmin'}: if torch.is_floating_point(input): return torch.tensor(torch.inf, dtype=dtype, device=device) elif torch.is_signed(input) or 
dtype == torch.uint8: @@ -366,7 +372,7 @@ def _reduction_identity(op_name: str, input: Tensor, *args): assert torch.is_floating_point(input), input.dtype return torch.tensor(torch.inf, dtype=dtype, device=device) return torch.tensor(0, dtype=dtype, device=device) - elif op_name == 'var': + elif op_name in {'var', 'std'}: return None raise NotImplementedError(f'identity of {op_name} on {dtype} input') @@ -375,6 +381,12 @@ def _canonical_dim(dim: DimOrDims, ndim: int) -> Tuple[int, ...]: """Return dim argument as a tuple of sorted dim values. """ dims: List[int] = [] + if dim == (): + # Currently, `dim=()` in reductions operations means "reduce + # over all dimensions" while in future, it will read "no + # reduce". See https://github.com/pytorch/pytorch/issues/29137 + # When gh-29137 is resolved, this if-block must be deleted. + dim = None if dim is None: return tuple(range(ndim)) ndim = max(ndim, 1) @@ -388,30 +400,252 @@ def _canonical_dim(dim: DimOrDims, ndim: int) -> Tuple[int, ...]: return tuple(sorted(dims)) +def _sparse_coo_flatten_indices(indices: Tensor, shape: tuple): + # Flatted N-D indices to 1-D indices + flat_indices = indices.new_zeros(indices.size(1)) + for d, sz in enumerate(shape): + flat_indices.mul_(sz) + flat_indices.add_(indices[d]) + return flat_indices + + +def _any(input: Tensor, dim: tuple, keepdim: bool): + # Support torch.any with tuple dim argument. + # Workaround of https://github.com/pytorch/pytorch/issues/56586 + r = input + for d in reversed(dim): + r = r.any(dim=d, keepdim=keepdim) + return r + + +def _sparse_coo_where(mask: Tensor, input: Tensor, fill_value: Tensor) -> Tensor: + """Sparse variant of torch.where. Supports sparse COO and hybrid sparse COO tensors. + + _sparse_coo_where implements the following invariant: + + _sparse_coo_where(mask, input, fill_value).to_dense(fill_value) == + torch.where(mask.to_dense(), input.to_dense(), torch.full(input.shape, fill_value)) + + where `a == b` means `assertEqual(a, b)`, mask is boolean sparse + tensor, and `to_dense(fill_value)` is like `to_dense()` except + that the unspecified elements are mapped to `fill_value` rather + than to `0`. + + Returns a sparse COO tensor with the following features: + + - all specified elements correspond to masked-in elements that + have the values of the input tensor. If there exists a masked-in + element (as specified by mask) that is not specified in the + input, in the result tensor, the corresponding element has value + 0. In the dense part of the sparse tensor, the masked-out + elements are replaced with fill_value. + + - all unspecified elements correspond to masked-out elements. + """ + + assert input.layout == torch.sparse_coo + assert mask.layout == input.layout + assert mask.shape == input.shape + assert mask.dense_dim() == input.dense_dim() # TODO: eliminate this restriction + + input = input.coalesce() + + # For set operations on sparse tensor indices, we'll convert + # multi-dimensional indices to 1-D indices for efficiency. 
+ input_flat_indices = _sparse_coo_flatten_indices(input.indices(), input.shape[:input.sparse_dim()]) + mask_flat_indices = _sparse_coo_flatten_indices(mask.indices(), mask.shape[:mask.sparse_dim()]) + + # the set of mask flat indices that define masked-in elements: + if mask.dense_dim() > 0: + mask_values = _any(mask.values(), tuple(range(1, input.sparse_dim() + 1)), False) + else: + mask_values = mask.values() + maskin_flat_indices = mask_flat_indices[mask_values.nonzero()[:, 0]] + + def intersection(i1, i2): + union, counts = torch.cat([i1, i2]).unique(return_counts=True) + return union, torch.where(counts.gt(1)) + + def minus(i1, i2): + union, counts = torch.cat([i1, i2]).unique(return_counts=True) + return intersection(union[torch.where(counts.eq(1))], i1) + + def _apply(a): + obj, w = a + return obj[w] + + # the set of input flat indices of specified and masked-in elements: + maskin_input_flat_indices = _apply(intersection(maskin_flat_indices, input_flat_indices)) + _, w = intersection(input_flat_indices, maskin_input_flat_indices) + + # the indices and values of masked-in elements + where_input_indices = input.indices()[(slice(None),) + w] + where_input_values = input.values()[w] + + if mask.dense_dim() > 0: + # apply mask to the dense part of the input values: + _, w1 = intersection(mask_flat_indices, maskin_input_flat_indices) + where_mask_values = mask.values()[w1] + where_input_values = torch.where(where_mask_values, where_input_values, + where_input_values.new_full([], fill_value.item())) + + # the set of flat indices of unspecified input and masked-in elements: + maskin_zero_flat_indices = _apply(minus(maskin_flat_indices, maskin_input_flat_indices)) + + # the indices of masked-in zero elements + _, w = intersection(mask_flat_indices, maskin_zero_flat_indices) + where_zero_indices = mask.indices()[(slice(None),) + w] + + # construct result + n = where_zero_indices.size(1) + if n == 0: + # the input is coalesced, hence input_flat_indices are ordered + # and the result is guaranteed to be coalesced: + result = torch.sparse_coo_tensor(where_input_indices, where_input_values, input.shape) + return result._coalesced_(True) + + where_indices = torch.cat([where_input_indices, where_zero_indices], dim=1) + where_values = torch.cat([where_input_values, where_input_values.new_zeros((n,) + where_input_values.shape[1:])]) + result = torch.sparse_coo_tensor(where_indices, where_values, input.shape) + + # appending zero elements leads to uncoalesced sparse tensor + return result.coalesce() + + +def _sparse_csr_where(mask: Tensor, input: Tensor, fill_value: Tensor) -> Tensor: + """Sparse variant of torch.where. Supports sparse CSR tensors. + """ + # TODO: implement sparse CSR specific where operator for efficiency + return _sparse_coo_where(mask.to_sparse_coo(), input.to_sparse_coo(), fill_value).to_sparse_csr() + + +def _where(mask: Tensor, input: Tensor, fill_value: Tensor) -> Tensor: + """torch.where with sparse inputs support. + + _where implements the following invariant: + + _where(mask, input, fill_value).to_dense(fill_value) == + torch.where(mask.to_dense(), input.to_dense(), torch.full(input.shape, fill_value)) + + where `a == b` means `assertEqual(a, b)`, mask is boolean sparse + tensor, and `to_dense(fill_value)` is like `to_dense()` except + that the unspecified elements are mapped to `fill_value` rather + than to `0`. + + Returns a sparse tensor with the following features: + + - all specified elements correspond to masked-in elements that + have the values of the input tensor. 
If there exists a masked-in + element (as specified by mask) that is not specified in the + input, in the result tensor, the corresponding element has value + 0. In the dense part of the sparse tensor, the masked-out + elements are replaced with fill_value. + + - all unspecified elements correspond to masked-out elements. + """ + if mask.layout == torch.strided: + if fill_value.dtype == torch.bool: + # Workaround internal assert failure in + # test_nvfuser_correctness__masked_mean_cuda_bool: We + # don't have an op for aten::new_full but it isn't a + # special case. Argument types: Tensor, int[], bool, int, + # int, Device, bool + fill = input.new_full([], int(fill_value.item())).to(dtype=torch.bool) + else: + fill = input.new_full([], fill_value.item()) + return torch.where(mask, input, fill) + elif mask.layout == torch.sparse_coo: + return _sparse_coo_where(mask, input, fill_value) + elif mask.layout == torch.sparse_csr: + return _sparse_csr_where(mask, input, fill_value) + else: + raise ValueError(f'_where expects strided or sparse COO or sparse CSR tensor but got {mask.layout}') + + def _input_mask(input: Tensor, *args, **kwargs) -> Tensor: """Return canonical input mask. - Canonical input mask is a boolean tensor with the same shape as - input and with (broadcasted) content of mask, if specified. + + A canonical input mask is defined as a boolean mask tensor that + shape and layout matches with the shape and the layout of the + input. + + The canonical input mask is computed from the :attr:`mask` tensor + content to meet the following criteria: + + 1. The shape of the canonical input mask is the same as the shape + of :attr:`input` tensor. If the mask tensor has a smaller shape + than the shape of the :attr:`input`, broadcasting rules will be + applied. Downcasting of mask is not supported. + + 2. The layout of the canonical input mask is the same as the + layout of the :attr:`input` tensor. If the mask has different + layout, it will be converted to the expected layout. In the + case of sparse COO layout, the canonical input mask will be + coalesced. + + 3. The dtype of the canonical input mask is torch.bool. If the + mask dtype is not bool then it will be converted to bool dtype + using `.to(dtype=bool)` method call. + + 4. The elements of the canonical input mask have boolean values + copied from the content of the :attr:`mask` tensor (after + possible broadcasting and dtype conversion transforms). In + general, the sparsity pattern of the sparse canonical input + mask need not to be the same as the sparsity pattern of the + sparse :attr:`input` tensor. 
+ """ + if input.layout not in {torch.strided, torch.sparse_coo, torch.sparse_csr}: + raise ValueError(f'_input_mask expects strided or sparse COO or sparse CSR tensor but got {input.layout}') + mask = kwargs.get('mask') + + # default mask if mask is None: - inmask = input.new_ones(input.shape, dtype=torch.bool) - elif mask.ndim < input.ndim: - inmask = torch.broadcast_to(mask.clone(), input.shape).to(dtype=torch.bool) - elif mask.ndim > input.ndim: - raise IndexError("_input_mask expected broadcastable mask (got mask dimensionality higher than of the input)") - elif mask.shape != input.shape: - inmask = torch.broadcast_to(mask.clone(), input.shape).to(dtype=torch.bool) - else: - inmask = mask.to(dtype=torch.bool) - return inmask + raise ValueError('_input_mask requires explicit mask') + + # mask shape must match with input shape + if mask.shape != input.shape: + if mask.ndim > input.ndim: + raise IndexError("_input_mask expected broadcastable mask (got mask dimensionality higher than of the input)") + if mask.layout == torch.strided: + mask = torch.broadcast_to(mask.clone(), input.shape).to(dtype=torch.bool) + elif mask.layout == torch.sparse_coo: + mask = torch._sparse_broadcast_to(mask, input.shape) + else: + assert mask.layout == torch.sparse_csr + # Broadcasting of CSR tensors is not implemented. Working + # around by using COO layout. + mask = torch._sparse_broadcast_to(mask.to_sparse(), input.shape).to_sparse_csr() + + # mask layout must match with input layout + if mask.layout != input.layout: + if input.layout == torch.strided: + mask = mask.to_dense() + elif input.layout == torch.sparse_coo: + if mask.layout == torch.strided: + mask = mask.to_sparse(input.sparse_dim()) + else: + mask = mask.to_sparse() + else: + assert input.layout == torch.sparse_csr + mask = mask.to_sparse_csr() + + # sparse mask must be coalesced + if mask.layout == torch.sparse_coo: + mask = mask.coalesce() + + # mask is a boolean tensor + mask = mask.to(dtype=torch.bool) + + return mask def _output_mask(op, input: Tensor, *args, **kwargs) -> Tensor: """Return output mask of masked operation applied to given arguments. """ if callable(op): - is_reduction = op.__name__ in {'sum', 'prod', 'amax', 'amin', 'mean', 'norm', 'var'} + is_reduction = op.__name__ in {'sum', 'prod', 'amax', 'amin', 'argmax', 'argmin', 'mean', 'norm', 'var', 'std'} is_normalization = op.__name__ in {'softmax', 'log_softmax', 'softmin', 'normalize'} if is_reduction: if op.__name__ == 'norm': @@ -421,10 +655,7 @@ def _output_mask(op, input: Tensor, *args, **kwargs) -> Tensor: outmask = _input_mask(input, *args, **kwargs) keepdim = kwargs.get('keepdim', False) dim_ = _canonical_dim(dim, input.ndim) - # Workaround https://github.com/pytorch/pytorch/issues/56586 - for d in reversed(dim_): - outmask = outmask.any(dim=d, keepdim=bool(keepdim)) - return outmask + return _any(outmask, dim_, bool(keepdim)) elif is_normalization: return _input_mask(input, *args, **kwargs) else: @@ -433,6 +664,19 @@ def _output_mask(op, input: Tensor, *args, **kwargs) -> Tensor: raise ValueError(f'_output_mask expected masked operation (got {type(op).__name__} object)') +def _combine_input_and_mask(op, input: Tensor, mask, *args) -> Tensor: + """Return input with masked-out elements eliminated for the given operations. 
+ """ + if mask is None: + return input + canonical_mask = _input_mask(input, mask=mask) + if callable(op): + fill_value = _reduction_identity(op.__name__, input, *args) + return _where(canonical_mask, input, fill_value) + else: + raise ValueError(f'_combine_input_and_mask expected masked operation (got {type(op).__name__} object)') + + @_apply_docstring_templates def sum(input: Tensor, dim: DimOrDims = None, @@ -443,15 +687,43 @@ def sum(input: Tensor, # __doc__ is generated by _apply_docstring_templates decorator if dtype is None: dtype = input.dtype - # TODO: What follows is a reference implementation of a masked sum - # operation that is to be replaced with an optimized one and - # extended to support other layouts. + dim_ = _canonical_dim(dim, input.ndim) + + mask_input = _combine_input_and_mask(sum, input, mask) if input.layout == torch.strided: - mask_input = input if mask is None else torch.where(mask, input, input.new_zeros([])) - dim_ = _canonical_dim(dim, input.ndim) return torch.sum(mask_input, dim_, bool(keepdim), dtype=dtype) + + elif input.layout == torch.sparse_coo: + if mask_input.ndim == 0: + # Workaround https://github.com/pytorch/pytorch/issues/65400 + dim_ = () + + result = torch.sparse.sum(mask_input, dim=list(dim_), dtype=dtype) + if result.dtype != dtype: + # https://github.com/pytorch/pytorch/issues/65392 + # https://github.com/pytorch/pytorch/pull/66153 + result = result.to(dtype) + + if result.ndim == 0 and result.layout == torch.strided: + result = result.to_sparse() + + if keepdim and mask_input.ndim > 0: + # torch.sparse.sum does not support keepdim argument, so, + # here we restore the squeezed dimensions + if mask_input.dense_dim() > 0: + raise NotImplementedError('torch._masked.sum on hybrid COO sparse tensor') + indices = result._indices().new_zeros((mask_input.ndim, result._nnz())) + original_dims = tuple(i for i in range(mask_input.ndim) if i not in dim_) + indices[original_dims, ] = result._indices() + shape = tuple((1 if i in dim_ else mask_input.shape[i]) for i in range(mask_input.ndim)) + result = torch.sparse_coo_tensor(indices, result._values(), shape, dtype=result.dtype, device=result.device) + + return result + + elif input.layout == torch.sparse_csr: + return torch._sparse_csr_sum(mask_input, dim=list(dim_), keepdim=bool(keepdim), dtype=dtype) else: - raise ValueError(f'masked sum expects strided tensor (got {input.layout} tensor)') + raise ValueError(f'masked sum expects strided, sparse_coo, or sparse_csr tensor (got {input.layout} tensor)') @_apply_docstring_templates @@ -462,10 +734,9 @@ def prod(input: Tensor, dtype: Optional[DType] = None, mask: Optional[Tensor] = None) -> Tensor: # __doc__ is generated by _apply_docstring_templates decorator + mask_input = _combine_input_and_mask(prod, input, mask) if input.layout == torch.strided: - mask_input = input if mask is None else torch.where(mask, input, torch.ones_like(input)) dim_ = _canonical_dim(dim, input.ndim) - # Workaround https://github.com/pytorch/pytorch/issues/56586 result = mask_input for d in reversed(dim_): @@ -496,12 +767,8 @@ def amax(input: Tensor, {reduction_example}""" if dtype is None: dtype = input.dtype + mask_input = _combine_input_and_mask(amax, input, mask) if input.layout == torch.strided: - if mask is None: - mask_input = input - else: - identity = input.new_full([], _reduction_identity('amax', input)) - mask_input = torch.where(mask, input, identity) dim_ = _canonical_dim(dim, mask_input.ndim) return torch.amax(mask_input, dim_, bool(keepdim)).to(dtype=dtype) else: @@ 
-527,18 +794,58 @@ def amin(input: Tensor, {reduction_example}""" if dtype is None: dtype = input.dtype + mask_input = _combine_input_and_mask(amin, input, mask) if input.layout == torch.strided: - if mask is None: - mask_input = input - else: - identity = input.new_full([], _reduction_identity('amin', input)) - mask_input = torch.where(mask, input, identity) dim_ = _canonical_dim(dim, mask_input.ndim) return torch.amin(mask_input, dim_, bool(keepdim)).to(dtype=dtype) else: raise ValueError(f'masked amin expects strided tensor (got {input.layout} tensor)') +@_apply_docstring_templates +def argmax(input: Tensor, + dim: int = None, + *, + keepdim: Optional[bool] = False, + dtype: Optional[DType] = None, + mask: Optional[Tensor] = None) -> Tensor: + """\ +{reduction_signature} +{reduction_descr} +{reduction_identity_dtype} +{reduction_args} +{reduction_example}""" + if dtype is None: + dtype = input.dtype + mask_input = _combine_input_and_mask(argmax, input, mask) + if input.layout == torch.strided: + return torch.argmax(mask_input, dim, bool(keepdim)).to(dtype=dtype) + else: + raise ValueError(f'masked argmax expects strided tensor (got {input.layout} tensor)') + + +@_apply_docstring_templates +def argmin(input: Tensor, + dim: int = None, + *, + keepdim: Optional[bool] = False, + dtype: Optional[DType] = None, + mask: Optional[Tensor] = None) -> Tensor: + """\ +{reduction_signature} +{reduction_descr} +{reduction_identity_dtype} +{reduction_args} +{reduction_example}""" + if dtype is None: + dtype = input.dtype + mask_input = _combine_input_and_mask(argmin, input, mask) + if input.layout == torch.strided: + return torch.argmin(mask_input, dim, bool(keepdim)).to(dtype=dtype) + else: + raise ValueError(f'masked argmin expects strided tensor (got {input.layout} tensor)') + + @_apply_docstring_templates def mean(input: Tensor, dim: DimOrDims = None, @@ -564,9 +871,14 @@ def mean(input: Tensor, if dtype is None: dtype = input.dtype if input.layout == torch.strided: - inmask = _input_mask(input, mask=mask) - count = sum(inmask.new_ones(input.shape, dtype=torch.int64), dim, keepdim=keepdim, mask=inmask) - total = sum(input, dim, keepdim=keepdim, dtype=dtype, mask=inmask) + if mask is None: + # TODO: compute count analytically + count = sum(torch.ones(input.shape, dtype=torch.int64, device=input.device), dim, keepdim=keepdim) + total = sum(input, dim, keepdim=keepdim, dtype=dtype) + else: + inmask = _input_mask(input, mask=mask) + count = sum(inmask.new_ones(input.shape, dtype=torch.int64), dim, keepdim=keepdim, mask=inmask) + total = sum(input, dim, keepdim=keepdim, dtype=dtype, mask=inmask) return total / count else: raise ValueError(f'masked sum expects strided tensor (got {input.layout} tensor)') @@ -594,35 +906,22 @@ def norm(input: Tensor, {reduction_example}""" if dtype is None: dtype = input.dtype + mask_input = _combine_input_and_mask(norm, input, mask, ord) if input.layout == torch.strided: - identity = input.new_full([], _reduction_identity('norm', input, ord)) - mask_input = input if mask is None else torch.where(mask, input, identity) dim_ = _canonical_dim(dim, input.ndim) return torch.linalg.vector_norm(mask_input, ord, dim_, bool(keepdim), dtype=dtype) else: raise ValueError(f'masked norm expects strided tensor (got {input.layout} tensor)') -@_apply_docstring_templates -def var(input: Tensor, - dim: DimOrDims = None, - unbiased: Optional[bool] = False, - *, - keepdim: Optional[bool] = False, - dtype: Optional[DType] = None, - mask: Optional[Tensor] = None) -> Tensor: - """\ 
-{reduction_signature} - -{reduction_descr} - -The identity value of sample variance operation is undefined. The -elements of output tensor with strided layout, that correspond to -fully masked-out elements, have ``nan`` values. - -{reduction_args} - -{reduction_example}""" +def std_var(input: Tensor, + dim: DimOrDims = None, + unbiased: Optional[bool] = False, + *, + keepdim: Optional[bool] = False, + dtype: Optional[DType] = None, + mask: Optional[Tensor] = None, + take_sqrt: Optional[bool] = False) -> Tensor: if dtype is None: dtype = input.dtype if not (dtype.is_floating_point or dtype.is_complex): @@ -631,23 +930,88 @@ def var(input: Tensor, if not (compute_dtype.is_floating_point or compute_dtype.is_complex): compute_dtype = torch.float32 if input.layout == torch.strided: - inmask = _input_mask(input, mask=mask) - count = sum(inmask.new_ones(input.shape, dtype=torch.int64), dim, keepdim=True, mask=inmask) - sample_total = sum(input, dim, keepdim=True, dtype=dtype, mask=inmask) + if mask is None: + # TODO: compute count analytically + count = sum(torch.ones(input.shape, dtype=torch.int64, device=input.device), dim, keepdim=True) + sample_total = sum(input, dim, keepdim=True, dtype=dtype) + else: + inmask = _input_mask(input, mask=mask) + count = sum(inmask.new_ones(input.shape, dtype=torch.int64), dim, keepdim=True, mask=inmask) + sample_total = sum(input, dim, keepdim=True, dtype=dtype, mask=inmask) # TODO: replace torch.subtract/divide/square/maximum with # masked subtract/divide/square/maximum when these will be # available. sample_mean = torch.divide(sample_total, count) x = torch.subtract(input, sample_mean) - total = sum(x * x.conj(), dim, keepdim=keepdim, dtype=compute_dtype, mask=inmask) + if mask is None: + total = sum(x * x.conj(), dim, keepdim=keepdim, dtype=compute_dtype) + else: + total = sum(x * x.conj(), dim, keepdim=keepdim, dtype=compute_dtype, mask=inmask) if not keepdim: count = count.reshape(total.shape) if unbiased: count = torch.subtract(count, 1) count = torch.maximum(count, count.new_zeros([])) - return torch.divide(total, count).to(dtype=dtype) + output = torch.divide(total, count).to(dtype=dtype) + if take_sqrt: + output = torch.sqrt(output) + return output else: - raise ValueError(f'masked var expects strided tensor (got {input.layout} tensor)') + raise ValueError(f'masked std/var expects strided tensor (got {input.layout} tensor)') + + +@_apply_docstring_templates +def var(input: Tensor, + dim: DimOrDims = None, + unbiased: Optional[bool] = False, + *, + keepdim: Optional[bool] = False, + dtype: Optional[DType] = None, + mask: Optional[Tensor] = None) -> Tensor: + """\ +{reduction_signature} +{reduction_descr} +The identity value of sample variance operation is undefined. The +elements of output tensor with strided layout, that correspond to +fully masked-out elements, have ``nan`` values. +{reduction_args} +{reduction_example}""" + return std_var( + input=input, + dim=dim, + unbiased=unbiased, + keepdim=keepdim, + dtype=dtype, + mask=mask, + take_sqrt=False, + ) + + +@_apply_docstring_templates +def std(input: Tensor, + dim: DimOrDims = None, + unbiased: Optional[bool] = False, + *, + keepdim: Optional[bool] = False, + dtype: Optional[DType] = None, + mask: Optional[Tensor] = None) -> Tensor: + """\ +{reduction_signature} +{reduction_descr} +The identity value of sample standard deviation operation is undefined. The +elements of output tensor with strided layout, that correspond to +fully masked-out elements, have ``nan`` values. 
+{reduction_args} +{reduction_example}""" + return std_var( + input=input, + dim=dim, + unbiased=unbiased, + keepdim=keepdim, + dtype=dtype, + mask=mask, + take_sqrt=True + ) @_apply_docstring_templates @@ -659,10 +1023,8 @@ def softmax(input: Tensor, if dtype is None: dtype = input.dtype dim_ = _canonical_dim(dim, input.ndim)[0] + mask_input = _combine_input_and_mask(amax, input, mask) if input.layout == torch.strided: - fill = input.new_full([], _reduction_identity('amax', input)) - inmask = _input_mask(input, mask=mask) - mask_input = torch.where(inmask, input, fill) return torch.nn.functional.softmax(mask_input, dim_, dtype=dtype) else: raise ValueError(f'masked softmax expects strided tensor (got {input.layout} tensor)') @@ -677,10 +1039,8 @@ def log_softmax(input: Tensor, if dtype is None: dtype = input.dtype dim_ = _canonical_dim(dim, input.ndim)[0] + mask_input = _combine_input_and_mask(amax, input, mask) if input.layout == torch.strided: - fill = input.new_full([], _reduction_identity('amax', input)) - inmask = _input_mask(input, mask=mask) - mask_input = torch.where(inmask, input, fill) return torch.nn.functional.log_softmax(mask_input, dim_, dtype=dtype) else: raise ValueError(f'masked log_softmax expects strided tensor (got {input.layout} tensor)') @@ -695,10 +1055,8 @@ def softmin(input: Tensor, if dtype is None: dtype = input.dtype dim_ = _canonical_dim(dim, input.ndim)[0] + mask_input = _combine_input_and_mask(amin, input, mask) if input.layout == torch.strided: - fill = input.new_full([], _reduction_identity('amin', input)) - inmask = _input_mask(input, mask=mask) - mask_input = torch.where(inmask, input, fill) return torch.nn.functional.softmin(mask_input, dim_, dtype=dtype) else: raise ValueError(f'masked softmin expects strided tensor (got {input.layout} tensor)') @@ -715,13 +1073,12 @@ def normalize(input: Tensor, if dtype is None: dtype = input.dtype dim_ = _canonical_dim(dim, input.ndim)[0] + # TODO: eliminate mask_input as unnecessary when using masked divide. + mask_input = _combine_input_and_mask(sum, input, mask) if input.layout == torch.strided: nrm_ = norm(input, ord, dim, keepdim=True, dtype=dtype, mask=mask) # TODO: replace torch.maximum with masked maximum when available. denom = torch.maximum(nrm_, nrm_.new_full([], eps)) - # TODO: eliminate mask_input as unnecessary when using masked divide. - inmask = _input_mask(input, mask=mask) - mask_input = input if mask is None else torch.where(inmask, input, input.new_zeros([])) # TODO: replace torch.divide with masked divide when available. return torch.divide(mask_input, denom) else: diff --git a/torch/_masked/_docs.py b/torch/_masked/_docs.py index b8519b5f8f7b55..40b58ed8123d06 100644 --- a/torch/_masked/_docs.py +++ b/torch/_masked/_docs.py @@ -149,6 +149,136 @@ tensor([ -3, 9223372036854775807]) """ +argmax_docstring = """argmax(input, dim, *, keepdim=False, dtype=None, mask=None) -> Tensor +Returns argmax of all the elements in the :attr:`input` +tensor along the given dimension(s) :attr:`dim` while the :attr:`input` +elements are masked out according to the boolean tensor +:attr:`mask`. +The identity value of argmax operation, which is used to start the +reduction, depends on input dtype. For instance, for float32, uint8, +and int32 dtypes, the identity values are ``-inf``, ``0``, and ``-2147483648``, respectively. +If :attr:`keepdim` is ``True``, the output tensor is of the same size +as :attr:`input` except in the dimension(s) :attr:`dim` where it is of +size 1. 
Otherwise, :attr:`dim` is squeezed (see
+:func:`torch.squeeze`), resulting in the output tensor having 1 (or
+``len(dim)``) fewer dimension(s).
+
+The boolean tensor :attr:`mask` defines the "validity" of
+:attr:`input` tensor elements: if :attr:`mask` element is True
+then the corresponding element in :attr:`input` tensor will be
+included in argmax computation, otherwise the element is
+ignored.
+
+When all elements of :attr:`input` along the given dimension
+:attr:`dim` are ignored (fully masked-out), the corresponding element
+of the output tensor will have undefined value: it may or may not
+correspond to the identity value of argmax operation; the
+choice may correspond to the value that leads to the most efficient
+storage of :attr:`output` tensor.
+
+The mask of the output tensor can be computed as
+``torch.any(torch.broadcast_to(mask, input.shape), dim, keepdim=keepdim,
+dtype=torch.bool)``.
+
+The shapes of the :attr:`mask` tensor and the :attr:`input` tensor
+don't need to match, but they must be :ref:`broadcastable
+<broadcasting-semantics>` and the dimensionality of the :attr:`mask`
+tensor must not be greater than that of the :attr:`input` tensor.
+
+Args:
+    input (Tensor): the input tensor
+    dim (int): the dimension along which argmax is computed.
+
+Keyword args:
+    keepdim (bool, optional): whether the output tensor has
+      :attr:`dim` retained or not. Default: False.
+    dtype (:class:`torch.dtype`, optional): the desired data type
+      of returned tensor. If specified, the input tensor is
+      cast to :attr:`dtype` before the operation is
+      performed. Default: None.
+    mask (:class:`torch.Tensor`, optional): the boolean tensor
+      containing the binary mask of validity of input tensor
+      elements.
+      Default: None, which is equivalent to ``torch.ones(input.shape, dtype=torch.bool)``.
+Example::
+
+    >>> input = tensor([[-3, -2, -1], [ 0, 1, 2]])
+    >>> input
+    tensor([[-3, -2, -1],
+            [ 0,  1,  2]])
+    >>> mask = tensor([[ True, False, True], [False, False, False]])
+    >>> mask
+    tensor([[ True, False,  True],
+            [False, False, False]])
+    >>> torch._masked.argmax(input, 1, mask=mask)
+    tensor([2, 0])
+"""
+
+argmin_docstring = """argmin(input, dim, *, keepdim=False, dtype=None, mask=None) -> Tensor
+Returns argmin of all the elements in the :attr:`input`
+tensor along the given dimension(s) :attr:`dim` while the :attr:`input`
+elements are masked out according to the boolean tensor
+:attr:`mask`.
+The identity value of argmin operation, which is used to start the
+reduction, depends on input dtype. For instance, for float32, uint8,
+and int32 dtypes, the identity values are ``inf``, ``255``, and ``2147483647``, respectively.
+If :attr:`keepdim` is ``True``, the output tensor is of the same size
+as :attr:`input` except in the dimension(s) :attr:`dim` where it is of
+size 1. Otherwise, :attr:`dim` is squeezed (see
+:func:`torch.squeeze`), resulting in the output tensor having 1 (or
+``len(dim)``) fewer dimension(s).
+
+The boolean tensor :attr:`mask` defines the "validity" of
+:attr:`input` tensor elements: if :attr:`mask` element is True
+then the corresponding element in :attr:`input` tensor will be
+included in argmin computation, otherwise the element is
+ignored.
+
+When all elements of :attr:`input` along the given dimension
+:attr:`dim` are ignored (fully masked-out), the corresponding element
+of the output tensor will have undefined value: it may or may not
+correspond to the identity value of argmin operation; the
+choice may correspond to the value that leads to the most efficient
+storage of :attr:`output` tensor.
+
+The mask of the output tensor can be computed as
+``torch.any(torch.broadcast_to(mask, input.shape), dim, keepdim=keepdim,
+dtype=torch.bool)``.
+
+The shapes of the :attr:`mask` tensor and the :attr:`input` tensor
+don't need to match, but they must be :ref:`broadcastable
+<broadcasting-semantics>` and the dimensionality of the :attr:`mask`
+tensor must not be greater than that of the :attr:`input` tensor.
+
+Args:
+    input (Tensor): the input tensor
+    dim (int): the dimension along which argmin is computed.
+
+Keyword args:
+    keepdim (bool, optional): whether the output tensor has
+      :attr:`dim` retained or not. Default: False.
+    dtype (:class:`torch.dtype`, optional): the desired data type
+      of returned tensor. If specified, the input tensor is
+      cast to :attr:`dtype` before the operation is
+      performed. Default: None.
+    mask (:class:`torch.Tensor`, optional): the boolean tensor
+      containing the binary mask of validity of input tensor
+      elements.
+      Default: None, which is equivalent to ``torch.ones(input.shape, dtype=torch.bool)``.
+Example::
+
+    >>> input = tensor([[-3, -2, -1], [ 0, 1, 2]])
+    >>> input
+    tensor([[-3, -2, -1],
+            [ 0,  1,  2]])
+    >>> mask = tensor([[ True, False, True], [False, False, False]])
+    >>> mask
+    tensor([[ True, False,  True],
+            [False, False, False]])
+    >>> torch._masked.argmin(input, 1, mask=mask)
+    tensor([0, 0])
+"""
+
 log_softmax_docstring = """log_softmax(input, dim, *, dtype=None, mask=None) -> Tensor

 Returns log_softmax of all the slices in the :attr:`input` tensor
@@ -593,6 +723,74 @@
         [ nan, nan, nan]])
 """

+std_docstring = """std(input, dim, unbiased, *, keepdim=False, dtype=None, mask=None) -> Tensor
+Returns standard_deviation of all the elements in the :attr:`input`
+tensor along the given dimension(s) :attr:`dim` while the :attr:`input`
+elements are masked out according to the boolean tensor
+:attr:`mask`.
+The identity value of sample standard deviation operation is undefined. The
+elements of output tensor with strided layout, that correspond to
+fully masked-out elements, have ``nan`` values.
+If :attr:`keepdim` is ``True``, the output tensor is of the same size
+as :attr:`input` except in the dimension(s) :attr:`dim` where it is of
+size 1. Otherwise, :attr:`dim` is squeezed (see
+:func:`torch.squeeze`), resulting in the output tensor having 1 (or
+``len(dim)``) fewer dimension(s).
+
+The boolean tensor :attr:`mask` defines the "validity" of
+:attr:`input` tensor elements: if :attr:`mask` element is True
+then the corresponding element in :attr:`input` tensor will be
+included in standard_deviation computation, otherwise the element is
+ignored.
+
+When all elements of :attr:`input` along the given dimension
+:attr:`dim` are ignored (fully masked-out), the corresponding element
+of the output tensor will have undefined value: it may or may not
+correspond to the identity value of standard_deviation operation; the
+choice may correspond to the value that leads to the most efficient
+storage of :attr:`output` tensor.
+
+The mask of the output tensor can be computed as
+``torch.any(torch.broadcast_to(mask, input.shape), dim, keepdim=keepdim,
+dtype=torch.bool)``.
+
+The shapes of the :attr:`mask` tensor and the :attr:`input` tensor
+don't need to match, but they must be :ref:`broadcastable
+<broadcasting-semantics>` and the dimensionality of the :attr:`mask`
+tensor must not be greater than that of the :attr:`input` tensor.
+
+Args:
+    input (Tensor): the input tensor
+    dim (int or tuple of ints, optional): the dimension or dimensions to reduce.
+      Default: None, which is equivalent to ``tuple(range(input.ndim))``.
+    unbiased (bool): when True, use Bessel’s correction, otherwise, compute
+      the uncorrected sample variance.
+
+Keyword args:
+    keepdim (bool, optional): whether the output tensor has
+      :attr:`dim` retained or not. Default: False.
+    dtype (:class:`torch.dtype`, optional): the desired data type
+      of returned tensor. If specified, the input tensor is
+      cast to :attr:`dtype` before the operation is
+      performed. Default: None.
+    mask (:class:`torch.Tensor`, optional): the boolean tensor
+      containing the binary mask of validity of input tensor
+      elements.
+      Default: None, which is equivalent to ``torch.ones(input.shape, dtype=torch.bool)``.
+Example::
+
+    >>> input = tensor([[-3, -2, -1], [ 0, 1, 2]])
+    >>> input
+    tensor([[-3, -2, -1],
+            [ 0,  1,  2]])
+    >>> mask = tensor([[ True, False, True], [False, False, False]])
+    >>> mask
+    tensor([[ True, False,  True],
+            [False, False, False]])
+    >>> torch._masked.std(input, 1, False, mask=mask)
+    tensor([1., nan])
+"""
+
 sum_docstring = """sum(input, dim, *, keepdim=False, dtype=None, mask=None) -> Tensor

 Returns sum of all the elements in the :attr:`input`
diff --git a/torch/_ops.py b/torch/_ops.py
index 13470bd8558256..645b309bfb3024 100644
--- a/torch/_ops.py
+++ b/torch/_ops.py
@@ -32,13 +32,17 @@ def __init__(self, overloadpacket, op, schema):
         self._op = op
         self._schema = schema
         self._overloadpacket = overloadpacket
+        self._overloadname = 'default' if schema.overload_name == '' else schema.overload_name
+        self.__name__ = "{}.{}".format(self._schema.name.split("::")[1], self._overloadname)
+        self.__module__ = overloadpacket.__module__
+        op.__module__ = overloadpacket.__module__

     # it's a no-op since OpOverload object is immutable and must be unique for a given op overload.
     def __deepcopy__(self, memo=None):
         return self

-    def __str__(self):
-        return "OpOverload(op='{}.{}', overload='{}')".format(*self._schema.name.split("::"), self.overload_name)
+    def __repr__(self):
+        return "<OpOverload(op='{}.{}', overload='{}')>".format(*self._schema.name.split("::"), self._overloadname)

     def __call__(self, *args, **kwargs):
         return self._op(*args, **kwargs or {})
@@ -46,17 +50,15 @@ def __call__(self, *args, **kwargs):
     def __getattr__(self, key):
         return getattr(self._op, key)

-    # `my_namespace::my_op`
-    @property
-    def name(self):
-        return "{}.{}".format(*self._schema.name.split("::"))
+    def __hash__(self):
+        return hash(self._op)

-    @property
-    def overload_name(self):
-        return self._schema.overload_name
+    # `my_namespace.my_op_name.overload_name`
+    def __str__(self):
+        return "{}.{}.{}".format(*self._schema.name.split("::"), self._overloadname)

     @property
-    def overload_packet(self):
+    def overloadpacket(self):
         return self._overloadpacket

     @property
@@ -72,23 +74,21 @@ def __init__(self, qualified_op_name, op_name, op):
         # These attributes are accessible on the object through the properties
         # defined below but are immutable
         self._qualified_op_name = qualified_op_name
-        self._op_name = op_name
+        self.__name__ = op_name
         self._op = op

     # it's a no-op since OpOverloadPacket object is immutable and must be unique for a given op.
    def __deepcopy__(self, memo=None):
        return self

-    def __str__(self):
-        return "OpOverloadPacket(op='{}.{}')".format(*self._qualified_op_name.split("::"))
+    def __repr__(self):
+        return "<OpOverloadPacket(op='{}.{}')>".format(*self._qualified_op_name.split("::"))

-    @property
-    def qualified_op_name(self):
-        return "{}.{}".format(*self._qualified_op_name.split("::"))
+    def __hash__(self):
+        return hash(self._op)

-    @property
-    def op_name(self):
-        return self._op_name
+    def __str__(self):
+        return "{}.{}".format(*self._qualified_op_name.split("::"))

     @property
     def op(self):
diff --git a/torch/_python_dispatcher.py b/torch/_python_dispatcher.py
index aa19a18efb3b56..fe0c6253fdd34a 100644
--- a/torch/_python_dispatcher.py
+++ b/torch/_python_dispatcher.py
@@ -15,9 +15,9 @@
 - CPU/AutogradCPU: represents in-tree backends which we usually have dedicated inference &
   autograd kernel in pytorch core library. E.g. CPU, CUDA
-- QuantizedCPU/AutogradOther: represents in-tree backends which we usually have backend specific
+- FPGA/AutogradOther: represents in-tree backends which we usually have backend specific
   inference kernels, but they share the same autograd kernel specified in AutogradOther.
-  E.g. QuantizedCPU, QuantizedCUDA
+  E.g. FPGA, SparseCsrCPU
 - XLA/AutogradXLA: represents out-of-tree backends which we don't have either inference or autograd
   kernel defined in pytorch core library. Backend owner is responsible for registering both
   inference & autograd kernels in their extensions(e.g. torch-xla) for the operators they support.
@@ -53,7 +53,7 @@ class PythonDispatcher:
     name = "foo"
     runtime_keys = [
         "CPU", "AutogradCPU",
-        "QuantizedCPU", "AutogradOther",
+        "FPGA", "AutogradOther",
         "XLA", "AutogradXLA",
         "Lazy", "AutogradLazy",
     ]
diff --git a/torch/_tensor.py b/torch/_tensor.py
index 6a50a029ae769e..cc853162a19d01 100644
--- a/torch/_tensor.py
+++ b/torch/_tensor.py
@@ -202,11 +202,7 @@ def storage(self):
         if self.dtype not in torch.storage._dtype_to_storage_type_map():
             raise RuntimeError(f'unsupported Storage type: {self.dtype}')

-        storage = self._storage()
-        storage_name = torch.storage._dtype_to_storage_type_map()[self.dtype]
-        storage_class = eval(type(storage).__module__ + '.' + storage_name)
-        storage = storage_class(wrap_storage=storage)
-        return storage
+        return torch._TypedStorage(wrap_storage=self._storage(), dtype=self.dtype)

     def _reduce_ex_internal(self, proto):
         check_serializing_named_tensor(self)
@@ -223,7 +219,7 @@ def _reduce_ex_internal(self, proto):
         # 2. Python list is not a good fit due to performance reason.
         #    `tolist()` converts every single element in the tensor into python objects
         #    and serialize them one by one.
-        if self.device.type in ['xla', 'ort', 'mlc']:
+        if self.device.type in ['xla', 'ort', 'mlc', 'hpu']:
             return (torch._utils._rebuild_device_tensor_from_numpy, (self.cpu().numpy(),
                                                                      self.dtype,
                                                                      str(self.device),
@@ -659,7 +655,7 @@ def __rmod__(self, other):
     def __format__(self, format_spec):
         if has_torch_function_unary(self):
             return handle_torch_function(Tensor.__format__, (self,), self, format_spec)
-        if self.dim() == 0:
+        if self.dim() == 0 and not self.is_meta:
             return self.item().__format__(format_spec)
         return object.__format__(self, format_spec)

@@ -866,10 +862,10 @@ def storage_type(self):
         Returns the type of the underlying storage.
""" - # NB: this returns old fashioned _TypedStorage, e.g., FloatStorage, as it - # would be pretty pointless otherwise (it would always return - # _UntypedStorage) - return type(self.storage()) + if has_torch_function_unary(self): + return handle_torch_function(Tensor.storage_type, (self,), self) + + return self.storage()._get_legacy_storage_class() def refine_names(self, *names): r"""Refines the dimension names of :attr:`self` according to :attr:`names`. @@ -1067,53 +1063,7 @@ def to_sparse_coo(self): 25 """ - if self.is_sparse: - return self - if self.is_sparse_csr: - crow_indices = self.crow_indices() - col_indices = self.col_indices() - indices = torch._convert_indices_from_csr_to_coo(crow_indices, col_indices, - out_int32=crow_indices.dtype == torch.int32) - return torch.sparse_coo_tensor(indices, - self.values(), - size=self.shape, - dtype=self.dtype, - device=self.device) - else: - return self.to_sparse() - - def to_sparse_csr(self): - """ Convert a tensor to compressed row storage format. Only works with 2D tensors. - - Examples:: - - >>> dense = torch.randn(5, 5) - >>> sparse = dense.to_sparse_csr() - >>> sparse._nnz() - 25 - - """ - shape = self.size() - fill_value = 0 - if len(shape) != 2: - raise RuntimeError("Only 2D tensors can be converted to the CSR format but got shape: ", shape) - - if self.is_sparse: - coalesced_self = self.coalesce() - row_indices = coalesced_self.indices()[0] - device = coalesced_self.values().device - crow_indices = torch._convert_indices_from_coo_to_csr( - row_indices, self.shape[0], out_int32=row_indices.dtype == torch.int32) - return torch.sparse_csr_tensor(crow_indices, - coalesced_self.indices()[1].contiguous(), - coalesced_self.values(), - size=coalesced_self.shape, - dtype=coalesced_self.dtype, - device=device) - elif self.is_sparse_csr: - return self - else: - return self.to_sparse().to_sparse_csr() + return self.to_sparse() def _update_names(self, names, inplace): if has_torch_function_unary(self): diff --git a/torch/_tensor_docs.py b/torch/_tensor_docs.py index 7ff5da2c2f4e41..49e43c502861e2 100644 --- a/torch/_tensor_docs.py +++ b/torch/_tensor_docs.py @@ -1060,6 +1060,24 @@ def add_docstr_all(method, docstr): {memory_format} """.format(**common_args)) +add_docstr_all('ipu', + r""" +ipu(device=None, non_blocking=False, memory_format=torch.preserve_format) -> Tensor + +Returns a copy of this object in IPU memory. + +If this object is already in IPU memory and on the correct device, +then no copy is performed and the original object is returned. + +Args: + device (:class:`torch.device`): The destination IPU device. + Defaults to the current IPU device. + non_blocking (bool): If ``True`` and the source is in pinned memory, + the copy will be asynchronous with respect to the host. + Otherwise, the argument has no effect. Default: ``False``. 
+ {memory_format} +""".format(**common_args)) + add_docstr_all('xpu', r""" xpu(device=None, non_blocking=False, memory_format=torch.preserve_format) -> Tensor @@ -3374,11 +3392,68 @@ def callable(a, b) -> number """.format(**reproducibility_notes)) -add_docstr_all('scatter_reduce', r""" -scatter_reduce(input, dim, index, reduce, *, output_size=None) -> Tensor +add_docstr_all('scatter_reduce_', r""" +scatter_reduce_(dim, index, src, reduce, *, include_self=True) -> Tensor -See :func:`torch.scatter_reduce` -""") +Reduces all values from the :attr:`src` tensor to the indices specified in +the :attr:`index` tensor in the :attr:`self` tensor using the applied reduction +defined via the :attr:`reduce` argument (:obj:`"sum"`, :obj:`"prod"`, :obj:`"mean"`, +:obj:`"amax"`, :obj:`"amin"`). For each value in :attr:`src`, it is reduced to an +index in :attr:`self` which is specified by its index in :attr:`src` for +``dimension != dim`` and by the corresponding value in :attr:`index` for +``dimension = dim``. If :obj:`include_self="True"`, the values in the :attr:`self` +tensor are included in the reduction. + +:attr:`self`, :attr:`index` and :attr:`src` should all have +the same number of dimensions. It is also required that +``index.size(d) <= src.size(d)`` for all dimensions ``d``, and that +``index.size(d) <= self.size(d)`` for all dimensions ``d != dim``. +Note that ``index`` and ``src`` do not broadcast. + +For a 3-D tensor with :obj:`reduce="sum"` and :obj:`include_self=True` the +output is given as:: + + self[index[i][j][k]][j][k] += src[i][j][k] # if dim == 0 + self[i][index[i][j][k]][k] += src[i][j][k] # if dim == 1 + self[i][j][index[i][j][k]] += src[i][j][k] # if dim == 2 + +Note: + {forward_reproducibility_note} + +.. note:: + + The backward pass is implemented only for ``src.shape == index.shape``. + +.. warning:: + + This function is in beta and may change in the near future. + +Args: + dim (int): the axis along which to index + index (LongTensor): the indices of elements to scatter and reduce. + src (Tensor): the source elements to scatter and reduce + reduce (str): the reduction operation to apply for non-unique indices + (:obj:`"sum"`, :obj:`"prod"`, :obj:`"mean"`, :obj:`"amax"`, :obj:`"amin"`) + include_self (bool): whether elements from the :attr:`self` tensor are + included in the reduction + +Example:: + + >>> src = torch.tensor([1., 2., 3., 4., 5., 6.]) + >>> index = torch.tensor([0, 1, 0, 1, 2, 1]) + >>> input = torch.tensor([1., 2., 3., 4.]) + >>> input.scatter_reduce(0, index, src, reduce="sum") + tensor([5., 14., 8., 4.]) + >>> input.scatter_reduce(0, index, src, reduce="sum", include_self=False) + tensor([4., 12., 5., 4.]) + >>> input2 = torch.tensor([5., 4., 3., 2.]) + >>> input2.scatter_reduce(0, index, src, reduce="amax") + tensor([5., 6., 5., 2.]) + >>> input2.scatter_reduce(0, index, src, reduce="amax", include_self=False) + tensor([3., 6., 5., 2.]) + + +""".format(**reproducibility_notes)) add_docstr_all('select', r""" @@ -4146,6 +4221,20 @@ def callable(a, b) -> number size=(3, 3), nnz=1, layout=torch.sparse_coo) """) +add_docstr_all('to_sparse_csr', + r""" +to_sparse_csr() -> Tensor +Convert a tensor to compressed row storage format. Only works with 2D tensors. 
+ +Example:: + + >>> dense = torch.randn(5, 5) + >>> sparse = dense.to_sparse_csr() + >>> sparse._nnz() + 25 + +""") + add_docstr_all('to_mkldnn', r""" to_mkldnn() -> Tensor @@ -4752,6 +4841,13 @@ def callable(a, b) -> number Out-of-place version of :meth:`torch.Tensor.scatter_add_` """) +add_docstr_all('scatter_reduce', + r""" +scatter_reduce(dim, index, src, reduce, *, include_self=True) -> Tensor + +Out-of-place version of :meth:`torch.Tensor.scatter_reduce_` +""") + add_docstr_all('masked_scatter', r""" masked_scatter(mask, tensor) -> Tensor @@ -4868,6 +4964,11 @@ def callable(a, b) -> number Is ``True`` if the Tensor is stored on the GPU, ``False`` otherwise. """) +add_docstr_all('is_ipu', + r""" +Is ``True`` if the Tensor is stored on the IPU, ``False`` otherwise. +""") + add_docstr_all('is_xpu', r""" Is ``True`` if the Tensor is stored on the XPU, ``False`` otherwise. diff --git a/torch/_tensor_str.py b/torch/_tensor_str.py index b0bb6e93aaeecc..1c97505b0781b8 100644 --- a/torch/_tensor_str.py +++ b/torch/_tensor_str.py @@ -298,14 +298,14 @@ def get_summarized_data(self): return torch.stack([get_summarized_data(x) for x in self]) def _str_intern(inp): - prefix = 'tensor(' + self, tangent = torch.autograd.forward_ad.unpack_dual(inp) + prefix = "nested_tensor(" if self.is_nested else 'tensor(' indent = len(prefix) suffixes = [] # This is used to extract the primal value and thus disable the forward AD # within this function. # TODO(albanD) This needs to be updated when more than one level is supported - self, tangent = torch.autograd.forward_ad.unpack_dual(inp) # Note [Print tensor device]: # A general logic here is we only print device when it doesn't match @@ -380,6 +380,11 @@ def _str_intern(inp): suffixes.append('zero_point=' + str(self.q_per_channel_zero_points())) suffixes.append('axis=' + str(self.q_per_channel_axis())) tensor_str = _tensor_str(self.dequantize(), indent) + elif self.is_nested: + def indented_str(s, indent): + return "\n".join(f" {line}" for line in s.split("\n")) + strs = ",\n".join(indented_str(str(t), indent + 1) for t in torch.ops.aten.unbind.int(self, 0)) + tensor_str = f"[\n{strs}\n]" else: if self.is_meta: suffixes.append('size=' + str(tuple(self.shape))) diff --git a/torch/_torch_docs.py b/torch/_torch_docs.py index 4ba8d92b5834fb..10626fe72aa156 100644 --- a/torch/_torch_docs.py +++ b/torch/_torch_docs.py @@ -1031,9 +1031,6 @@ def merge_dicts(*dicts): CPU device, and not share its memory. .. seealso:: - :func:`torch.as_tensor` creates a tensor that always shares memory if the input is a - tensor or a NumPy array, copying otherwise. - :func:`torch.tensor` creates a tensor that always copies the data from the input object. :func:`torch.from_numpy` creates a tensor that always shares memory from NumPy arrays. @@ -8548,57 +8545,10 @@ def merge_dicts(*dicts): """) add_docstr(torch.scatter_reduce, r""" -scatter_reduce(input, dim, index, reduce, *, output_size=None) -> Tensor - -Reduces all values from the :attr:`input` tensor to the indices specified in -the :attr:`index` tensor. For each value in :attr:`input`, its output index is -specified by its index in :attr:`input` for ``dimension != dim`` and by the -corresponding value in :attr:`index` for ``dimension = dim``. -The applied reduction for non-unique indices is defined via the :attr:`reduce` -argument (:obj:`"sum"`, :obj:`"prod"`, :obj:`"mean"`, :obj:`"amax"`, :obj:`"amin"`). 
-For non-existing indices, the output will be filled with the identity of the -applied reduction (1 for :obj:`"prod"` and 0 otherwise). - -It is also required that ``index.size(d) == input.size(d)`` for all dimensions ``d``. -Moreover, if :attr:`output_size` is defined the the values of :attr:`index` must be -between ``0`` and ``output_size - 1`` inclusive. - - -For a 3-D tensor with :obj:`reduce="sum"`, the output is given as:: - - out[index[i][j][k]][j][k] += input[i][j][k] # if dim == 0 - out[i][index[i][j][k]][k] += input[i][j][k] # if dim == 1 - out[i][j][index[i][j][k]] += input[i][j][k] # if dim == 2 +scatter_reduce(input, dim, index, src, reduce, *, include_self=True) -> Tensor -Note: - This out-of-place operation is similar to the in-place versions of - :meth:`~torch.Tensor.scatter_` and :meth:`~torch.Tensor.scatter_add_`, - in which the output tensor is automatically created according to the - maximum values in :attr:`index` and filled based on the identity of the - applied reduction. - -Note: - {forward_reproducibility_note} - -Args: - input (Tensor): the input tensor - dim (int): the axis along which to index - index (LongTensor): the indices of elements to scatter and reduce. - src (Tensor): the source elements to scatter and reduce - reduce (str): the reduction operation to apply for non-unique indices - (:obj:`"sum"`, :obj:`"prod"`, :obj:`"mean"`, :obj:`"amax"`, :obj:`"amin"`) - output_size (int, optional): the size of the output at dimension :attr:`dim`. - If set to :obj:`None`, will get automatically inferred according to - :obj:`index.max() + 1` - -Example:: - - >>> input = torch.tensor([1, 2, 3, 4, 5, 6]) - >>> index = torch.tensor([0, 1, 0, 1, 2, 1]) - >>> torch.scatter_reduce(input, 0, index, reduce="sum", output_size=3) - tensor([4, 12, 5]) - -""".format(**reproducibility_notes)) +Out-of-place version of :meth:`torch.Tensor.scatter_reduce_` +""") add_docstr(torch.select, r""" @@ -9800,10 +9750,10 @@ def merge_dicts(*dicts): r""" roll(input, shifts, dims=None) -> Tensor -Roll the tensor along the given dimension(s). Elements that are shifted beyond the -last position are re-introduced at the first position. If a dimension is not -specified, the tensor will be flattened before rolling and then restored -to the original shape. +Roll the tensor :attr:`input` along the given dimension(s). Elements that are +shifted beyond the last position are re-introduced at the first position. If +:attr:`dims` is `None`, the tensor will be flattened before rolling and then +restored to the original shape. 
Args: {input} @@ -9821,6 +9771,11 @@ def merge_dicts(*dicts): [3, 4], [5, 6], [7, 8]]) + >>> torch.roll(x, 1) + tensor([[8, 1], + [2, 3], + [4, 5], + [6, 7]]) >>> torch.roll(x, 1, 0) tensor([[7, 8], [1, 2], diff --git a/torch/amp/__init__.py b/torch/amp/__init__.py new file mode 100644 index 00000000000000..e4fe09f55632e4 --- /dev/null +++ b/torch/amp/__init__.py @@ -0,0 +1 @@ +from .autocast_mode import autocast diff --git a/torch/autocast_mode.py b/torch/amp/autocast_mode.py similarity index 93% rename from torch/autocast_mode.py rename to torch/amp/autocast_mode.py index daf2a34383fb43..e9edae02819aad 100644 --- a/torch/autocast_mode.py +++ b/torch/amp/autocast_mode.py @@ -3,7 +3,7 @@ import warnings from typing import Any, Optional -from .types import _dtype +from torch.types import _dtype def autocast_decorator(autocast_instance, func): @functools.wraps(func) @@ -47,7 +47,7 @@ class autocast(object): loss.backward() optimizer.step() - See the :ref:`Automatic Mixed Precision examples` for usage (along with gradient scaling) + See the :ref:`CUDA Automatic Mixed Precision examples` for usage (along with gradient scaling) in more complex scenarios (e.g., gradient penalty, multiple models/losses, custom autograd functions). :class:`autocast` can also be used as a decorator, e.g., on the ``forward`` method of your model:: @@ -102,6 +102,23 @@ def forward(self, input): # After exiting autocast, calls f_float16.float() to use with d_float32 g_float32 = torch.mm(d_float32, f_bfloat16.float()) + Example to use with jit trace in inference:: + + class TestModel(nn.Module): + def __init__(self, input_size, num_classes): + super(TestModel, self).__init__() + self.fc1 = nn.Linear(input_size, num_classes) + def forward(self, x): + return self.fc1(x) + + input_size = 2 + num_classes = 2 + model = TestModel(input_size, num_classes).eval() + + with torch.cpu.amp.autocast(cache_enabled=False): + model = torch.jit.trace(model, torch.randn(1, input_size)) + print(model.graph_for(torch.randn(1, input_size))) + Type mismatch errors *in* an autocast-enabled region are a bug; if this is what you observe, please file an issue. 
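The move of ``autocast_mode.py`` under ``torch/amp`` (with the new ``torch/amp/__init__.py`` re-exporting ``autocast``) makes the device-generic context manager importable as ``torch.amp.autocast``. A minimal usage sketch, assuming the relocated class keeps the existing ``device_type``-based signature; the ``nn.Linear`` model and shapes below are illustrative only::

    import torch
    from torch.amp import autocast  # re-exported by the new torch/amp/__init__.py

    model = torch.nn.Linear(4, 2)   # any float32 module works the same way
    x = torch.randn(8, 4)

    # On CPU, autocast-eligible ops such as linear/matmul run in bfloat16 inside the region.
    with autocast(device_type="cpu", dtype=torch.bfloat16):
        y = model(x)

    print(y.dtype)  # expected: torch.bfloat16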
diff --git a/torch/ao/ns/_numeric_suite.py b/torch/ao/ns/_numeric_suite.py index 2db70b87a56aa6..2a54535678b271 100644 --- a/torch/ao/ns/_numeric_suite.py +++ b/torch/ao/ns/_numeric_suite.py @@ -436,6 +436,8 @@ def get_matching_activations( quantized_dict = get_logger_dict(q_module) act_dict: Dict[str, Dict] = {} for key in quantized_dict: + if len(quantized_dict[key]["tensor_val"]) == 0: + continue match_key = _find_match(sorted(float_dict, reverse=True), key, "stats") if match_key is not None: act_dict[key] = {} diff --git a/torch/ao/ns/fx/mappings.py b/torch/ao/ns/fx/mappings.py index c31261913ad358..5c3574c108a277 100644 --- a/torch/ao/ns/fx/mappings.py +++ b/torch/ao/ns/fx/mappings.py @@ -26,8 +26,10 @@ def get_base_name_to_sets_of_related_ops() -> Dict[str, Set[NSNodeTargetType]]: nn.Conv1d, nnq.Conv1d, nnqd.Conv1d, + nnqat.Conv1d, nniqat.ConvBn1d, nniqat.ConvBnReLU1d, + nniqat.ConvReLU1d, nniq.ConvReLU1d, nni.ConvReLU1d, ]), @@ -74,6 +76,7 @@ def get_base_name_to_sets_of_related_ops() -> Dict[str, Set[NSNodeTargetType]]: nn.Linear, nnq.Linear, nni.LinearReLU, + nni.LinearBn1d, nniq.LinearReLU, nniqd.LinearReLU, nnqat.Linear, @@ -447,10 +450,10 @@ def get_node_type_to_io_type_map() -> Dict[str, Set[NSNodeTargetType]]: F.dropout, F.silu, F.mish, - # TODO(future PR): implement shadowing for binary ops and - # uncomment below - # operator.add, - # operator.mul, + operator.add, + torch.add, + operator.mul, + torch.mul, torch.sum, ]) @@ -513,6 +516,7 @@ def get_node_type_to_io_type_map() -> Dict[str, Set[NSNodeTargetType]]: torch.squeeze, torch.stack, torch.unsqueeze, + operator.add, ]) MODS_IO_TYPE_FP32: Set[NSNodeTargetType] = set([ @@ -527,6 +531,7 @@ def get_node_type_to_io_type_map() -> Dict[str, Set[NSNodeTargetType]]: nnqd.Conv1d, nnqd.Conv2d, nnqd.Conv3d, + nnqat.Conv1d, nnqat.Conv2d, nnqat.Conv3d, nnqat.Embedding, @@ -561,6 +566,7 @@ def get_node_type_to_io_type_map() -> Dict[str, Set[NSNodeTargetType]]: nni.ConvReLU2d, nni.ConvReLU3d, nni.LinearReLU, + nni.LinearBn1d, nni.ConvBn1d, nni.ConvBn2d, nni.ConvBn3d, @@ -570,6 +576,7 @@ def get_node_type_to_io_type_map() -> Dict[str, Set[NSNodeTargetType]]: nniqat.ConvBnReLU1d, nniqat.ConvBnReLU2d, nniqat.ConvBnReLU3d, + nniqat.ConvReLU1d, nniqat.ConvReLU2d, nniqat.ConvReLU3d, nniqat.LinearReLU, @@ -581,7 +588,6 @@ def get_node_type_to_io_type_map() -> Dict[str, Set[NSNodeTargetType]]: nnq.Linear, nnq.Conv1d, nnq.Conv2d, - nniq.ConvReLU2d, nnq.Conv3d, nnq.BatchNorm2d, nnq.BatchNorm3d, diff --git a/torch/ao/ns/fx/pattern_utils.py b/torch/ao/ns/fx/pattern_utils.py index b0adb5faf95d15..96569789bde4b6 100644 --- a/torch/ao/ns/fx/pattern_utils.py +++ b/torch/ao/ns/fx/pattern_utils.py @@ -8,7 +8,7 @@ from torch.ao.quantization.utils import getattr_from_fqn from .ns_types import NSNodeTargetType -from torch.ao.quantization.fx.pattern_utils import get_default_quant_patterns +from torch.ao.quantization.fx.backend_config.utils import get_native_quant_patterns from torch.ao.quantization import ( ObserverBase, FakeQuantizeBase, @@ -66,9 +66,18 @@ def get_reversed_fusions() -> List[Tuple[NSFusionType, int]]: # * multiple ops: (torch.nn.ReLU, torch.nn.Conv2d) # For fusions, we only care about patterns composed of multiple ops. # TODO(future PR): allow customizations from default patterns. 
- all_quant_patterns = get_default_quant_patterns() + all_quant_patterns = get_native_quant_patterns() + default_base_op_idx = 0 for quant_pattern, _quant_handler in all_quant_patterns.items(): + # TODO: this is a temporary hack to flatten the patterns from quantization so + # that it works with the ns matcher function, maybe we should use `is_match` + # in torch.ao.quantization.fx.match_utils to match the patterns + if isinstance(quant_pattern, tuple) and len(quant_pattern) == 2 and \ + isinstance(quant_pattern[1], tuple) and len(quant_pattern[1]) == 2: + # flatten the pattern with form (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d)) + quant_pattern = (quant_pattern[0], quant_pattern[1][0], quant_pattern[1][1]) + # Only patterns of multiple ops are fusions, ignore # patterns which contain a single ops (they get matched # without caring about fusions). diff --git a/torch/ao/ns/fx/weight_utils.py b/torch/ao/ns/fx/weight_utils.py index 36e183efe1d8ec..4dba8461957efd 100644 --- a/torch/ao/ns/fx/weight_utils.py +++ b/torch/ao/ns/fx/weight_utils.py @@ -189,6 +189,7 @@ def get_op_to_type_to_weight_extraction_fn() -> Dict[str, Dict[Callable, Callabl nnqat.Linear: mod_weight_detach, nnqd.Linear: mod_weight_bias_0, nniqat.LinearReLU: mod_weight_detach, + nniqat.LinearBn1d: mod_weight_detach, nn.modules.linear.NonDynamicallyQuantizableLinear: mod_weight_detach, # LSTM nn.LSTM: get_lstm_weight, diff --git a/torch/ao/quantization/_quantize_fx_do_not_use.py b/torch/ao/quantization/_quantize_fx_do_not_use.py deleted file mode 100644 index d39abe299393b3..00000000000000 --- a/torch/ao/quantization/_quantize_fx_do_not_use.py +++ /dev/null @@ -1,34 +0,0 @@ -import torch -from torch.fx import GraphModule -from typing import Dict, Any, Optional -from .quantize_fx import ( - _check_is_graph_module, - check_is_valid_convert_custom_config_dict -) -from .fx._convert_do_not_use import _convert_do_not_use - -def _convert_fx_do_not_use( - graph_module: GraphModule, is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None, - _remove_qconfig: bool = True, - backend_config_dict: Optional[Dict[str, Any]] = None) -> torch.nn.Module: - """ - Please do not use, this is a temporary function to migrate convert_fx - to a new implementation - """ - assert is_reference - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - - _check_is_graph_module(graph_module) - check_is_valid_convert_custom_config_dict(convert_custom_config_dict) - - quantized = _convert_do_not_use( - graph_module, is_reference, convert_custom_config_dict, - False, _remove_qconfig_flag=_remove_qconfig, - backend_config_dict=backend_config_dict) - - preserved_attributes = convert_custom_config_dict.get("preserved_attributes", []) - for attr_name in preserved_attributes: - setattr(quantized, attr_name, getattr(graph_module, attr_name)) - return quantized diff --git a/torch/ao/quantization/fake_quantize.py b/torch/ao/quantization/fake_quantize.py index 9e49a8392e3ea2..ec8b9ffd3b2084 100644 --- a/torch/ao/quantization/fake_quantize.py +++ b/torch/ao/quantization/fake_quantize.py @@ -6,11 +6,9 @@ import torch from torch.nn import Module from torch.ao.quantization.observer import ( - MinMaxObserver, MovingAverageMinMaxObserver, HistogramObserver, MovingAveragePerChannelMinMaxObserver, - PerChannelMinMaxObserver, FixedQParamsObserver, default_affine_fixed_qparams_observer, default_symmetric_fixed_qparams_observer, @@ -123,15 +121,25 @@ class FakeQuantize(FakeQuantizeBase): scale: torch.Tensor zero_point: torch.Tensor - def 
__init__(self, observer=MovingAverageMinMaxObserver, quant_min=0, quant_max=255, **observer_kwargs): + def __init__(self, observer=MovingAverageMinMaxObserver, quant_min=None, quant_max=None, **observer_kwargs): super().__init__() - assert quant_min <= quant_max, \ - 'quant_min must be less than or equal to quant_max' - self.quant_min = quant_min - self.quant_max = quant_max + # Populate quant_min/quant_max to observer_kwargs if valid + if quant_min is not None and quant_max is not None: + assert quant_min <= quant_max, \ + 'quant_min must be less than or equal to quant_max' + dtype = observer_kwargs.get("dtype", torch.quint8) + if hasattr(observer, "p"): + # In case observer is _PartialWrapper, dtype can be stored in + # observer.p.keywords["dtype"] + dtype = getattr(getattr(observer, "p", {}), "keywords", {}).get( + "dtype", dtype + ) + assert torch.iinfo(dtype).min <= quant_min, 'quant_min out of bound' + assert quant_max <= torch.iinfo(dtype).max, 'quant_max out of bound' + observer_kwargs.update({"quant_min": quant_min, "quant_max": quant_max}) self.activation_post_process = observer(**observer_kwargs) - assert torch.iinfo(self.activation_post_process.dtype).min <= quant_min, 'quant_min out of bound' - assert quant_max <= torch.iinfo(self.activation_post_process.dtype).max, 'quant_max out of bound' + self.quant_min = self.activation_post_process.quant_min + self.quant_max = self.activation_post_process.quant_max if _is_float_qparams(self.activation_post_process.qscheme): zero_point_dtype = torch.float else: @@ -335,10 +343,11 @@ def forward(self, X: torch.Tensor) -> torch.Tensor: dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, reduce_range=False) """ Default fake_quant for weights. +Observer is memoryless since averaging_constant is 1. """ -default_dynamic_fake_quant = FakeQuantize.with_args(observer=MinMaxObserver, quant_min=0, quant_max=255, - dtype=torch.quint8, memoryless=True) +default_dynamic_fake_quant = FakeQuantize.with_args(observer=MovingAverageMinMaxObserver, quant_min=0, quant_max=255, + dtype=torch.quint8, averaging_constant=1) """ Default dynamic fake_quant for activations. """ @@ -355,23 +364,25 @@ def forward(self, X: torch.Tensor) -> torch.Tensor: ch_axis=0) """ Default fake_quant for per-channel weights. +Observer is memoryless since averaging_constant is 1. """ -default_embedding_fake_quant = FakeQuantize.with_args(observer=PerChannelMinMaxObserver, +default_embedding_fake_quant = FakeQuantize.with_args(observer=MovingAveragePerChannelMinMaxObserver, qscheme=torch.per_channel_affine_float_qparams, dtype=torch.quint8, quant_min=0, quant_max=255, ch_axis=0, - memoryless=True) + averaging_constant=1) """ Default fake_quant for embeddings. +Observer is memoryless since averaging_constant is 1. """ -default_embedding_fake_quant_4bit = FakeQuantize.with_args(observer=PerChannelMinMaxObserver, +default_embedding_fake_quant_4bit = FakeQuantize.with_args(observer=MovingAveragePerChannelMinMaxObserver, qscheme=torch.per_channel_affine_float_qparams, ch_axis=0, dtype=torch.quint4x2, - memoryless=True) + averaging_constant=1) default_histogram_fake_quant = FakeQuantize.with_args(observer=HistogramObserver, quant_min=0, @@ -411,6 +422,27 @@ def forward(self, X: torch.Tensor) -> torch.Tensor: Fused version of `default_per_channel_weight_fake_quant`, with improved performance. 
""" +fused_wt_fake_quant_range_neg_127_to_127 = FusedMovingAvgObsFakeQuantize.with_args(observer=MovingAverageMinMaxObserver, + quant_min=-127, + quant_max=127, + dtype=torch.qint8, + qscheme=torch.per_tensor_symmetric, + eps=2 ** -12) +""" +Fused version of `default_weight_fake_quant`, with the 8-bit values restricted to [-127, +127], excluding -128. +""" + +fused_per_channel_wt_fake_quant_range_neg_127_to_127 = FusedMovingAvgObsFakeQuantize.with_args(observer=MovingAverageMinMaxObserver, + quant_min=-127, + quant_max=127, + dtype=torch.qint8, + qscheme=torch.per_channel_symmetric, + eps=2 ** -12) +""" +Fused version of `default_per_channel_weight_fake_quant`, with the 8-bit values restricted to [-127, +127], excluding -128. +""" + + def _is_fake_quant_script_module(mod): ''' Returns true if given mod is an instance of FakeQuantize script module. ''' diff --git a/torch/ao/quantization/fuse_modules.py b/torch/ao/quantization/fuse_modules.py index f276eea3c871ff..1f7027f5c8d574 100644 --- a/torch/ao/quantization/fuse_modules.py +++ b/torch/ao/quantization/fuse_modules.py @@ -7,6 +7,7 @@ # for backward compatiblity from torch.ao.quantization.fuser_method_mappings import fuse_conv_bn # noqa: F401 from torch.ao.quantization.fuser_method_mappings import fuse_conv_bn_relu # noqa: F401 +from torch.nn.utils.parametrize import type_before_parametrizations from typing import List, Optional @@ -41,7 +42,7 @@ def fuse_known_modules(mod_list, is_qat, additional_fuser_method_mapping=None): For these sequences, the first element in the output module list performs the fused operation. The rest of the elements are set to nn.Identity() """ - types = tuple(type(m) for m in mod_list) + types = tuple(type_before_parametrizations(m) for m in mod_list) fuser_method = get_fuser_method(types, additional_fuser_method_mapping) if fuser_method is None: raise NotImplementedError("Cannot fuse modules: {}".format(types)) diff --git a/torch/ao/quantization/fuser_method_mappings.py b/torch/ao/quantization/fuser_method_mappings.py index f152c30b616f99..a2882f1360479c 100644 --- a/torch/ao/quantization/fuser_method_mappings.py +++ b/torch/ao/quantization/fuser_method_mappings.py @@ -33,8 +33,6 @@ def fuse_conv_bn(is_qat, conv, bn): } if is_qat: - # TODO: remove the assert later - assert conv.training, "qat is only supported when conv.training is True currently" assert bn.num_features == conv.out_channels, 'Output channel of Conv2d must match num_features of BatchNorm2d' assert bn.affine, 'Only support fusing BatchNorm2d with affine set to True' assert bn.track_running_stats, 'Only support fusing BatchNorm2d with tracking_running_stats set to True' @@ -66,8 +64,6 @@ def fuse_conv_bn_relu(is_qat, conv, bn, relu): "Conv and BN both must be in the same mode (train or eval)." fused_module : Optional[Type[nn.Sequential]] = None if is_qat: - # TODO: remove the assert later - assert conv.training, "qat is only supported when conv.training is True currently" map_to_fused_module_train = { nn.Conv1d: nni.ConvBnReLU1d, nn.Conv2d: nni.ConvBnReLU2d, @@ -113,8 +109,6 @@ def fuse_linear_bn(is_qat, linear, bn): "Linear and BN both must be in the same mode (train or eval)." 
if is_qat: - # TODO: remove the assert later - assert linear.training, "qat is only supported when linear.training is True currently" assert bn.num_features == linear.out_features,\ "Output features of Linear must match num_features of BatchNorm1d" assert bn.affine, "Only support fusing BatchNorm1d with affine set to True" @@ -142,8 +136,7 @@ def fuse_convtranspose_bn(is_qat, convt, bn): "ConvTranspose and BN both must be in the same mode (train or eval)." if is_qat: - assert convt.training, "qat is only supported when convt.training is True currently" - raise Exception("Fusing ConvTranspose+BatchNorm not yet supported in training.") + raise Exception("Fusing ConvTranspose+BatchNorm not yet supported in QAT.") else: return nn.utils.fusion.fuse_conv_bn_eval(convt, bn, transpose=True) diff --git a/torch/ao/quantization/fx/_convert_do_not_use.py b/torch/ao/quantization/fx/_convert_do_not_use.py deleted file mode 100644 index 3d5aea83953cd9..00000000000000 --- a/torch/ao/quantization/fx/_convert_do_not_use.py +++ /dev/null @@ -1,332 +0,0 @@ -from typing import Any, Dict, List, Optional, Set, Callable -import torch -from torch.fx import ( - GraphModule, -) -from torch.fx.graph import ( - Graph, - Node, -) -from ..qconfig import QConfigAny -from ..utils import ( - activation_is_int8_quantized, - weight_is_statically_quantized, - get_qparam_dict, - _parent_name, -) -from .backend_config.utils import get_quantized_reference_module_mapping - -from .graph_module import ( - QuantizedGraphModule, - is_observed_standalone_module, -) -from ._equalize import update_obs_for_equalization, convert_eq_obs -from .utils import ( - get_custom_module_class_keys, - get_quantize_node_info, - create_getattr_from_value, -) - -from torch.ao.quantization.quantize import ( - _remove_qconfig, - is_activation_post_process, -) - -from .convert import restore_state - -# these are tuples so that they can work with isinstance(module, tuple_of_classes) -FUSED_MODULE_CLASSES = ( - torch.nn.intrinsic.LinearReLU, - torch.nn.intrinsic.ConvReLU1d, - torch.nn.intrinsic.ConvReLU2d, - torch.nn.intrinsic.ConvReLU3d, -) - -QAT_MODULE_CLASSES = ( - torch.nn.qat.Linear, - torch.nn.qat.Conv2d, - torch.nn.qat.Conv3d, - torch.nn.intrinsic.qat.LinearReLU, - torch.nn.intrinsic.qat.ConvBn2d, - torch.nn.intrinsic.qat.ConvBnReLU2d, - torch.nn.intrinsic.qat.ConvReLU2d, - torch.nn.intrinsic.qat.ConvBn3d, - torch.nn.intrinsic.qat.ConvBnReLU3d, - torch.nn.intrinsic.qat.ConvReLU3d -) - -def insert_dequantize_node( - node: Node, - graph: Graph): - """ Inserts dequantize node for `node` in `graph` - """ - with graph.inserting_after(node): - dequantize_node = graph.call_method("dequantize", (node,)) - for user_node in dict(node.users): - if user_node is not dequantize_node: - user_node.replace_input_with(node, dequantize_node) - - -def convert_standalone_module( - node: Node, - modules: Dict[str, torch.nn.Module], - model: torch.fx.GraphModule, - is_reference: bool, - backend_config_dict: Dict[str, Any]): - convert = torch.ao.quantization._quantize_fx_do_not_use._convert_do_not_use # type: ignore[attr-defined] - # We know that observed standalone module is a GraphModule since - # it's produced by us - observed_standalone_module : GraphModule = modules[str(node.target)] # type: ignore[assignment] - sm_input_quantized_idxs = \ - observed_standalone_module \ - ._standalone_module_input_quantized_idxs\ - .tolist() # type: ignore[operator] - # remove the dequantize nodes for inputs - args = list(node.args) - for idx in range(len(args)): - if idx in 
sm_input_quantized_idxs: - arg = args[idx] - if arg.op == "call_method" and arg.target == "dequantize": # type: ignore[union-attr] - quantize_node = arg.args[0] # type: ignore[union-attr] - node.replace_input_with(arg, quantize_node) - if len(arg.users) == 0: # type: ignore[union-attr] - model.graph.erase_node(arg) - # add dequantize node for output - sm_output_quantized_idxs = \ - observed_standalone_module \ - ._standalone_module_output_quantized_idxs \ - .tolist() # type: ignore[operator] - if len(sm_output_quantized_idxs) > 0: - assert sm_output_quantized_idxs[0] == 0, "Currently only quantized" - "output idxs = [0] is supported" - - # if it's non-empty, then it means the output is kept in quantized form - # we'll just add a dequantize node after this node - insert_dequantize_node(node, model.graph) - - # TODO: allow convert_custom_config_dict to override backend_config_dict - # for standalone module - quantized_standalone_module = convert( - observed_standalone_module, - is_reference=True, - backend_config_dict=backend_config_dict) - parent_name, name = _parent_name(node.target) - # update the modules dict - setattr(modules[parent_name], name, quantized_standalone_module) - modules[str(node.target)] = quantized_standalone_module - -def convert_weighted_module( - node: Node, - modules: Dict[str, torch.nn.Module], - observed_node_names: Set[str], - quantized_reference_module_mapping: Dict[Callable, Any]): - original_module = modules[str(node.target)] - qconfig = original_module.qconfig - - is_observed = node.name in observed_node_names - is_activation_quantized = activation_is_int8_quantized(qconfig) - is_weight_quantized = weight_is_statically_quantized(qconfig) - # TODO: rename weight_is_statically_quantized to weight_is_int8_quantized - if qconfig is None or \ - not is_observed or \ - not is_weight_quantized or \ - not is_activation_quantized: - return - - float_module = original_module - fused_module = None - if isinstance( - original_module, - QAT_MODULE_CLASSES): - # case 1. converting qat module to - # a float module, we need to attch - # weight fake_quant to the module, - # weight fake_quant is assumed to be run during - # QAT so we don't need to run it again here - float_module = original_module.to_float() # type: ignore[operator] - # change qat conv to conv - parent_name, name = _parent_name(node.target) - setattr(modules[parent_name], name, float_module) - if isinstance(float_module, torch.nn.intrinsic._FusedModule): - fused_module = float_module - float_module = fused_module[0] - weight_post_process = original_module.weight_fake_quant - else: - # case 2. 
converting a float module/fused float module - # to float module, we need to attach - # weight observer to the conv module and run it - # with conv weight - if isinstance(original_module, torch.nn.intrinsic._FusedModule): - fused_module = original_module - float_module = fused_module[0] # type: ignore[index] - assert qconfig is not None - weight_post_process = qconfig.weight() # type: ignore[union-attr, operator] - # run weight observer - weight_post_process(float_module.weight) # type: ignore[operator] - weight_qparams = get_qparam_dict(weight_post_process) - # TODO: may need to change the mapping when we support dynamic quantization - ref_qmodule_cls = quantized_reference_module_mapping.get(type(float_module), None) - assert ref_qmodule_cls is not None, f"No reference quantized module class configured for {type(float_module)}" - ref_qmodule = ref_qmodule_cls.from_float(float_module, weight_qparams) # type: ignore[attr-defined] - if fused_module is not None: - fused_module[0] = ref_qmodule - else: - parent_name, name = _parent_name(node.target) - setattr(modules[parent_name], name, ref_qmodule) - -def _convert_do_not_use( - model: GraphModule, is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None, - is_standalone_module: bool = False, - _remove_qconfig_flag: bool = True, - backend_config_dict: Optional[Dict[str, Any]] = None) -> torch.nn.Module: - """ - We will convert an observed model (a module with observer calls) to a reference - quantized model, the rule is simple: - 1. for each observer module call in the graph, we'll convert it to calls to - quantize and dequantize functions based on the observer instance - 2. for weighted operations like linear/conv, we need to convert them to reference - quantized module, this requires us to know whether the dtype configured for the - weight is supported in the backend, this is done in prepare step and the result - is stored in observed_node_names, we can decide whether we need to swap the - module based on this set - - standalone_module means it a submodule that is not inlined in - parent module, and will be quantized separately as one unit. 
- - Returns a quantized standalone module, whether input/output is quantized is - specified by prepare_custom_config_dict, with - input_quantized_idxs, output_quantized_idxs, please - see docs for prepare_fx for details - """ - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - patterns, node_name_to_scope, prepare_custom_config_dict, observed_node_names = restore_state(model) - qconfig_map: Dict[str, QConfigAny] = model._qconfig_map # type: ignore[assignment] - - assert is_reference, "_convert_do_not_use only supports reference option" - - # mapping from fully qualified module name to module instance - # for example, - # { - # '': Model(...), - # 'linear': Linear(...), - # 'linear.weight_fake_quant': PerChannelMinMaxObserver(...), - # } - # We use remove_duplicate=False here because torch.cat uses - # the same activation_post_process module instance but different names - modules = dict(model.named_modules(remove_duplicate=False)) - - custom_module_classes = get_custom_module_class_keys( - convert_custom_config_dict, - "observed_to_quantized_custom_module_class") - - if model._equalization_qconfig_map is not None: - # If we want to do equalization then do the following: - # Calculate the equalization scale, update the observers with the scaled - # inputs, and scale the weight - weight_eq_obs_dict = update_obs_for_equalization(model, modules) - convert_eq_obs(model, modules, weight_eq_obs_dict) - - graph_inputs: List[str] = [] - for node in model.graph.nodes: - if node.op == 'placeholder': - graph_inputs.append(node.name) - - def replace_observer_with_quantize_dequantize_node(graph: Graph, node: Node, modules: Dict[str, torch.nn.Module]) -> None: - """ Replace activation_post_process module call node with quantize and - dequantize node - - Before: - ... -> observer_0(x) -> ... - After: - ... -> torch.quantize_per_tensor(x, ...) -> x.dequantize() -> ... - """ - assert modules is not None - assert isinstance(node.target, str) - observer_module = modules[node.target] - root_module = modules[""] - if observer_module.dtype == torch.float32: - # remove the node for now - # TODO: support dynamic quant - with graph.inserting_before(node): - node.replace_all_uses_with(node.args[0]) - graph.erase_node(node) - elif observer_module.dtype in [torch.quint8, torch.qint8, torch.float16]: - node_type, quantize_op, qparams = get_quantize_node_info(observer_module) - # replace observer node with quant - dequant node - with graph.inserting_before(node): - input_node = node.args[0] - inputs = [input_node] - for key, value in qparams.items(): - if key in ['_scale_', '_zero_point_']: - # For scale and zero_point values we register them as buffers in the root module. - # TODO: maybe need more complex attr name here - qparam_node = create_getattr_from_value(root_module, graph, key, value) - inputs.append(qparam_node) - else: - # for qparams that are not scale/zero_point (like axis, dtype) we store them as literals in the graph. 
- inputs.append(value) - - quantized_node = graph.create_node(node_type, quantize_op, tuple(inputs), {}) - dequantized_node = graph.call_method("dequantize", args=(quantized_node,)) - node.replace_all_uses_with(dequantized_node) - graph.erase_node(node) - - - # additional state to override inputs to be quantized, if specified - # by the user - placeholder_node_seen_cnt = 0 - output_node_seen_cnt = 0 - input_quantized_idxs: List[int] = prepare_custom_config_dict.get( - "input_quantized_idxs", []) - output_quantized_idxs: List[int] = prepare_custom_config_dict.get( - "output_quantized_idxs", []) - - if backend_config_dict is None: - backend_config_dict = {} - quantized_reference_module_mapping = get_quantized_reference_module_mapping(backend_config_dict) - # convert tuples so that it can work with isinstance(module, tuple_of_classes) - weighted_module_classes = tuple(quantized_reference_module_mapping.keys()) - - for node in list(model.graph.nodes): - if node.op == 'placeholder': - cur_placeholder_node_idx = placeholder_node_seen_cnt - placeholder_node_seen_cnt += 1 - if cur_placeholder_node_idx in input_quantized_idxs: - # Inputs are assumed to be quantized if the user specifid the - # input_quantized_idxs override. - # we need to dequantize the inputs since all operators took - # floating point inputs in reference quantized models - insert_dequantize_node(node, model.graph) - elif node.op == "output": - cur_output_node_idx = output_node_seen_cnt - output_node_seen_cnt += 1 - if cur_output_node_idx in output_quantized_idxs: - # Result are kept quantized if the user specified the - # output_quantized_idxs override. - # Remove the dequantize operator in the end - maybe_dequantize_node = node.args[0] - if isinstance(maybe_dequantize_node, Node) and \ - maybe_dequantize_node.op == "call_method" and \ - maybe_dequantize_node.target == "dequantize": - quantize_node = maybe_dequantize_node.args[0] - maybe_dequantize_node.replace_all_uses_with(quantize_node) - model.graph.erase_node(maybe_dequantize_node) - elif node.op == "call_module": - if is_activation_post_process(modules[node.target]): - replace_observer_with_quantize_dequantize_node(model.graph, node, modules) - elif is_observed_standalone_module(modules[node.target]): - # TODO: move this to a separate function - convert_standalone_module(node, modules, model, is_reference, backend_config_dict) - - elif type(modules[node.target]) in set( - weighted_module_classes).union(QAT_MODULE_CLASSES).union(FUSED_MODULE_CLASSES): - convert_weighted_module(node, modules, observed_node_names, quantized_reference_module_mapping) - - # removes qconfig and activation_post_process modules - if _remove_qconfig_flag: - _remove_qconfig(model) - preserved_attributes = set(convert_custom_config_dict.get("preserved_attributes", [])) - model = QuantizedGraphModule(model, model.graph, preserved_attributes) - return model diff --git a/torch/ao/quantization/fx/_lower_to_native_backend.py b/torch/ao/quantization/fx/_lower_to_native_backend.py index 8b66370cb2a364..fdd0a5c172b75c 100644 --- a/torch/ao/quantization/fx/_lower_to_native_backend.py +++ b/torch/ao/quantization/fx/_lower_to_native_backend.py @@ -1,31 +1,28 @@ -import itertools import torch -from torch.fx import map_arg +from torch.fx import map_arg, Node from torch.fx.graph import Graph import torch.nn as nn import torch.nn.functional as F import torch.nn.intrinsic as nni import torch.nn.intrinsic.quantized as nniq +import torch.nn.intrinsic.quantized.dynamic as nniqd import torch.nn.quantized as nnq +import 
torch.nn.quantized.dynamic as nnqd import torch.nn.quantized._reference as nnqr from torch.nn.quantized.modules.utils import WeightedQuantizedModule -from . import subgraph_rewriter_FORKED_DO_NOT_USE from .graph_module import QuantizedGraphModule -from .quantized_fusion_patterns_and_replacements import get_fbgemm_patterns_and_replacements -from .match_utils import is_match, MatchAllNode -from .quantization_types import Pattern from .utils import ( collect_producer_nodes, get_linear_prepack_op_for_dtype, get_new_attr_name_with_prefix, + get_qconv_prepack_op, graph_module_from_producer_nodes, ) from ..utils import _parent_name from ..qconfig import QConfigAny from ..quantization_mappings import get_quantized_operator from .utils import create_node_from_old_node_preserve_meta -from typing import Dict, Tuple, Type, List, Callable, Any, Union -from torch.fx import Node +from typing import Dict, Tuple, Type, List, Callable, Any, Union, Set, Optional import operator QOP_TO_ARG_NAMES_TO_SKIP = { @@ -85,6 +82,10 @@ def is_default_node(node, modules): torch.nn.InstanceNorm3d, torch.nn.LayerNorm, torch.nn.Dropout, + torch.nn.BatchNorm2d, + torch.nn.BatchNorm3d, + torch.nn.intrinsic.BNReLU2d, + torch.nn.intrinsic.BNReLU3d, ] return _is_node_in_list(node, modules, func_list, method_list, module_type_list) @@ -179,9 +180,13 @@ def is_special_pattern_node(node, modules): res_module = res_module or is_call_module return res_function, res_method, res_module - def is_dequantize_node(node): - return isinstance(node, Node) and node.op == 'call_method' and node.target == 'dequantize' + return isinstance(node, Node) and node.op == "call_method" and node.target == "dequantize" + +def is_getattr_tensor_metadata_node(node): + return node.op == "call_function" and \ + node.target == getattr and \ + node.args[1] in ["shape"] def should_skip_lowering(op: torch.fx.node.Node, qconfig_map: Dict[str, QConfigAny]): """ @@ -192,16 +197,32 @@ def should_skip_lowering(op: torch.fx.node.Node, qconfig_map: Dict[str, QConfigA """ return op.name in qconfig_map and qconfig_map[op.name] is None -# Mapping from reference module class to the replacement quantized module class for lowering -LOWER_MODULE_MAP: Dict[Type[nn.Module], Type[WeightedQuantizedModule]] = { +# Mapping from reference module class to the replacement static quantized module class for lowering +STATIC_LOWER_MODULE_MAP: Dict[Type[nn.Module], Type[WeightedQuantizedModule]] = { nnqr.Linear: nnq.Linear, nnqr.Conv1d: nnq.Conv1d, nnqr.Conv2d: nnq.Conv2d, nnqr.Conv3d: nnq.Conv3d, } -# TODO: merge with LOWER_MODULE_MAP after we merge -# _lower_weighted_ref_module and special_pattern_replacement +# Mapping from reference module class to the replacement dynamic quantized module class for lowering +DYNAMIC_LOWER_MODULE_MAP: Dict[Type[nn.Module], Type[nn.Module]] = { + nnqr.Linear: nnqd.Linear, + nnqr.GRUCell: nnqd.GRUCell, + nnqr.LSTMCell: nnqd.LSTMCell, + nnqr.RNNCell: nnqd.RNNCell, + nnqr.LSTM: nnqd.LSTM, +} + +# Mapping from reference module class to the replacement weight only quantized module class for lowering +# TODO: correct the namespace for these modules +WEIGHT_ONLY_LOWER_MODULE_MAP: Dict[Type[nn.Module], Type[nn.Module]] = { + nnqr.Embedding: nnq.Embedding, + nnqr.EmbeddingBag: nnq.EmbeddingBag, +} + +# TODO: merge with STATIC_LOWER_MODULE_MAP after we merge +# _lower_static_weighted_ref_module and special_pattern_replacement SPECIAL_PATTERN_LOWER_MODULE_MAP = { nn.BatchNorm2d: nnq.BatchNorm2d, nn.BatchNorm3d: nnq.BatchNorm3d, @@ -215,26 +236,38 @@ def 
should_skip_lowering(op: torch.fx.node.Node, qconfig_map: Dict[str, QConfigA nn.InstanceNorm3d: nnq.InstanceNorm3d, nn.LayerNorm: nnq.LayerNorm, nn.Dropout: nnq.Dropout, + nni.BNReLU2d: nniq.BNReLU2d, + nni.BNReLU3d: nniq.BNReLU3d, } # Mapping from fused module class to a 2-tuple of: # 1) The inner reference module class -# 2) The replacement quantized module class for lowering -LOWER_FUSED_MODULE_MAP: Dict[Type[nn.Module], Tuple[Type[nn.Module], Type[WeightedQuantizedModule]]] = { +# 2) The replacement static quantized module class for lowering +STATIC_LOWER_FUSED_MODULE_MAP: Dict[Type[nn.Module], Tuple[Type[nn.Module], Type[WeightedQuantizedModule]]] = { nni.LinearReLU: (nnqr.Linear, nniq.LinearReLU), nni.ConvReLU1d: (nnqr.Conv1d, nniq.ConvReLU1d), nni.ConvReLU2d: (nnqr.Conv2d, nniq.ConvReLU2d), nni.ConvReLU3d: (nnqr.Conv3d, nniq.ConvReLU3d), } +# Mapping from fused module class to a 2-tuple of: +# 1) The inner reference module class +# 2) The replacement dynamic quantized module class for lowering +DYNAMIC_LOWER_FUSED_MODULE_MAP: Dict[Type[nn.Module], Tuple[Type[nn.Module], Type[nn.Module]]] = { + nni.LinearReLU: (nnqr.Linear, nniqd.LinearReLU), +} + # Mapping from a functional to lower to a 2-tuple of # 1) The quantized version of the op # 2) The quantized version of the op fused with relu, if it exists, else None -LOWER_FUNCTIONAL_MAP = { +STATIC_LOWER_FUNCTIONAL_MAP: Dict[Callable, Tuple[Callable, Callable]] = { F.linear: (torch.ops.quantized.linear, torch.ops.quantized.linear_relu), + F.conv1d: (torch.ops.quantized.conv1d, torch.ops.quantized.conv1d_relu), + F.conv2d: (torch.ops.quantized.conv2d, torch.ops.quantized.conv2d_relu), + F.conv3d: (torch.ops.quantized.conv3d, torch.ops.quantized.conv3d_relu), } -WEIGHT_PREPACK_OPS = { +WEIGHT_PREPACK_OPS: Set[Callable] = { torch._ops.ops.quantized.linear_prepack, torch._ops.ops.quantized.linear_prepack_fp16, torch._ops.ops.quantized.conv1d_prepack, @@ -242,9 +275,39 @@ def should_skip_lowering(op: torch.fx.node.Node, qconfig_map: Dict[str, QConfigA torch._ops.ops.quantized.conv3d_prepack, } +# Mapping from a functional to a dictionary, where the key is a 2-tuple of +# (activation_compute_dtype, weight_dtype) and the value is a 2-tuple of +# 1) The dynamically quantized version of the op +# 2) The dynamically quantized version of the op fused with relu, if it exists, else None +DYNAMIC_LOWER_FUNCTIONAL_MAP: Dict[Callable, Dict[Tuple[torch.dtype, torch.dtype], Tuple[Callable, Optional[Callable]]]] = { + F.linear: { + (torch.quint8, torch.qint8): (torch.ops.quantized.linear_dynamic, + torch.ops.quantized.linear_relu_dynamic), + (torch.float16, torch.float16): (torch.ops.quantized.linear_dynamic_fp16, + torch.ops.quantized.linear_relu_dynamic_fp16) + }, + # dynamic conv + relu is not available yet + F.conv1d: { + (torch.quint8, torch.qint8): (torch.ops.quantized.conv1d_dynamic, None), + }, + F.conv2d: { + (torch.quint8, torch.qint8): (torch.ops.quantized.conv2d_dynamic, None), + }, + F.conv3d: { + (torch.quint8, torch.qint8): (torch.ops.quantized.conv3d_dynamic, None), + }, +} + +CONV_FUNCTIONAL_OPS: Set[Callable] = { + F.conv1d, + F.conv2d, + F.conv3d, +} + def fold_weight( - quantized: QuantizedGraphModule, - node_name_to_scope: Dict[str, Tuple[str, type]]) -> QuantizedGraphModule: + quantized: QuantizedGraphModule, + node_name_to_scope: Dict[str, Tuple[str, type]] +) -> QuantizedGraphModule: """ Trace back from the weight node util we hit getattr, reconstruct the graph module with the traced nodes and run the graph module to pack the @@ 
-295,186 +358,404 @@ def load_arg(a): else: # copy other nodes env[node.name] = folded_graph.node_copy(node, load_arg) - quantized = QuantizedGraphModule(quantized_root, folded_graph, quantized_root.preserved_attr_names) - return quantized + return QuantizedGraphModule(quantized_root, folded_graph, quantized_root.preserved_attr_names) -def _lower_weighted_ref_module(model: QuantizedGraphModule) -> QuantizedGraphModule: +def _get_module(node: Node, modules: Dict[str, nn.Module]) -> Optional[nn.Module]: + """ + Return the `torch.nn.Module` that corresponds to the specified node's target. + If no such node exists, return None. + """ + if node.op == "call_module" and str(node.target) in modules: + return modules[str(node.target)] + else: + return None + +def _match_static_pattern( + node: Node, + modules: Dict[str, nn.Module], + qconfig_map: Dict[str, QConfigAny], + matching_modules_or_ops: List[Callable], + dequantize_node_arg_indices: List[int] +) -> Union[Tuple[Node, Node, Node], Tuple[None, None, None]]: + """ + Match the pattern (dequantize - ref node - quantize) against the node provided. + + If there is a match, return a 3-tuple of: + 1) q_node: the quantize node, + 2) relu_node: a relu node wrapping the ref_node, and + 3) ref_node: a reference module or functional node to replace with its quantized counterpart + Otherwise, if there is no match, return a 3-tuple of (None, None, None). + + Parameters: + node: The `torch.fx.Node` to match against. + modules: A mapping from node names to modules in the model graph, used for module lookup. + qconfig_map: A mapping from node names to the qconfigs associated with the nodes. + If the corresponding qconfig for the reference node is None, then return no match. + matching_modules_or_ops: Either a list of functions or a list of `torch.nn.Module`s. + If the reference node is not in this list, then return no match. + dequantize_node_arg_indices: A list of indices in the reference node args where dequantize + nodes may be present. An empty list means skipping the check for dequantize nodes. + """ + SKIP_LOWERING_VALUE = (None, None, None) + + # Match quantize node + if node.op != "call_function" or node.target != torch.quantize_per_tensor: + return SKIP_LOWERING_VALUE + q_node = node + ref_node = q_node.args[0] + assert(isinstance(ref_node, Node)) + + # Handle cases where the node is wrapped in a ReLU + if (ref_node.op == "call_function" and ref_node.target in (F.relu, torch.relu)) or\ + (ref_node.op == "call_module" and type(_get_module(ref_node, modules)) == nn.ReLU): + relu_node = ref_node + ref_node = relu_node.args[0] + assert(isinstance(ref_node, Node)) + else: + relu_node = None + if should_skip_lowering(ref_node, qconfig_map): + return SKIP_LOWERING_VALUE + + # Match reference module or functional + if isinstance(matching_modules_or_ops[0], type) and issubclass(matching_modules_or_ops[0], nn.Module): + expected_op = "call_module" + match_key = type(_get_module(ref_node, modules)) + else: + expected_op = "call_function" + match_key = ref_node.target + if ref_node.op != expected_op or match_key not in matching_modules_or_ops: + return SKIP_LOWERING_VALUE + + # Match dequantize node(s). 
Both of the following conditions must pass: + # (1) All `torch.fx.Node`s at the matching indices must be a dequantize node + # (2) There must be at least one dequantize node + matched_dequantize = False + for i in dequantize_node_arg_indices: + assert i < len(ref_node.args),\ + "Dequantize index %s exceeded reference node's arg length %s" % (i, len(ref_node.args)) + arg = ref_node.args[i] + if is_dequantize_node(arg): + matched_dequantize = True + elif isinstance(arg, Node): + return SKIP_LOWERING_VALUE + if not matched_dequantize: + return SKIP_LOWERING_VALUE + + return (q_node, relu_node, ref_node) + +def _lower_static_weighted_ref_module( + model: QuantizedGraphModule, + qconfig_map: Dict[str, QConfigAny]): """ Traverse the graph and find dequantize - ref module - quantize patterns and replace them with the quantized version of the ref module. """ - for ref_class in list(LOWER_MODULE_MAP.keys()) + list(LOWER_FUSED_MODULE_MAP.keys()): - pattern = (torch.quantize_per_tensor, - (ref_class, "dequantize"), - MatchAllNode, MatchAllNode, MatchAllNode) - modules = dict(model.named_modules(remove_duplicate=False)) - nodes = list(model.graph.nodes) - # TODO: maybe orgnize this better (e.g. break down to more functions) - # to make this function more readable - for n in model.graph.nodes: - if not is_match(modules, n, pattern): - continue - q_node = n - ref_node = q_node.args[0] - dq_node = ref_node.args[0] - # get output scale/zero_point/dtype from the quantize node - scale_node = q_node.args[1] - zero_point_node = q_node.args[2] - dtype = q_node.args[3] - - # this can be removed if we add support for "get_attr" in is_match - if scale_node.op != "get_attr" or zero_point_node.op != "get_attr": - print("Find the pattern but scale_node and zero_point node are not `get_attr`," - f"got: {scale_node.format_node} {zero_point_node.format_node()}") + modules = dict(model.named_modules(remove_duplicate=False)) + nodes = list(model.graph.nodes) + for n in model.graph.nodes: + # Step 0: Find nodes that match this pattern (dequantize - ref module - quantize) + matching_modules = list(STATIC_LOWER_MODULE_MAP.keys()) + list(STATIC_LOWER_FUSED_MODULE_MAP.keys()) + (q_node, relu_node, ref_node) = _match_static_pattern( + n, modules, qconfig_map, matching_modules, dequantize_node_arg_indices=[0]) # type: ignore[arg-type] + if q_node is None: + continue + assert(ref_node is not None) + (_, scale_node, zero_point_node, _) = q_node.args + ref_module = _get_module(ref_node, modules) + ref_class = type(ref_module) + assert(isinstance(scale_node, Node)) + assert(isinstance(zero_point_node, Node)) + assert(issubclass(ref_class, nn.Module)) + + # Step 1: Change this pattern to use the corresponding quantized module + # For fused modules, we also check whether the inner module is a reference module + # If so, we replace the entire fused module with the corresponding quantized module + if ref_class in STATIC_LOWER_FUSED_MODULE_MAP: + inner_ref_class, q_class = STATIC_LOWER_FUSED_MODULE_MAP[ref_class] + if type(ref_module[0]) != inner_ref_class: # type: ignore[index] continue + else: + q_class = STATIC_LOWER_MODULE_MAP[ref_class] + output_scale = getattr(model, scale_node.target) + output_zero_point = getattr(model, zero_point_node.target) + q_module = q_class.from_reference(ref_module, output_scale, output_zero_point) + # replace reference module with quantized module + parent_name, module_name = _parent_name(ref_node.target) + setattr(modules[parent_name], module_name, q_module) + + # Step 2: Remove dq_node, q_node and its 
args + dq_node = ref_node.args[0] + assert(isinstance(dq_node, Node)) + dq_node.replace_all_uses_with(dq_node.args[0]) + model.graph.erase_node(dq_node) + q_node.replace_all_uses_with(ref_node) + model.graph.erase_node(q_node) + model.graph.erase_node(scale_node) + model.graph.erase_node(zero_point_node) - # this can be removed if we add support for constants in is_match - if dtype != torch.quint8: - print(f"Only qint8 output for quantized op is supported, got: {dtype}") +def _lower_dynamic_weighted_ref_module(model: QuantizedGraphModule): + """ + Traverse the graph and find quantize_per_tensor_dynamic - dequantize - ref_module patterns + and replace them with the dynamically quantized version of the ref module. + """ + named_modules = dict(model.named_modules(remove_duplicate=False)) + for n in model.graph.nodes: + if n.op != "call_module" or \ + type(named_modules[str(n.target)]) not in \ + set(DYNAMIC_LOWER_MODULE_MAP.keys()).union( + set(DYNAMIC_LOWER_FUSED_MODULE_MAP.keys())): + continue + ref_node = n + dq_node = ref_node.args[0] + if dq_node.op != "call_method" or dq_node.target != "dequantize": + continue + # don't support lowering the pattern when the result of dequantize is used by + # multiple nodes + if len(dq_node.users) > 1: + continue + + input_dynamic_q_node = dq_node.args[0] + # don't support lowering the pattern when the result of quantize is used by + # multiple nodes + if len(input_dynamic_q_node.users) > 1: + continue + + if input_dynamic_q_node.op != "call_function" or \ + input_dynamic_q_node.target != torch.quantize_per_tensor_dynamic: + continue + + activation_compute_dtype = input_dynamic_q_node.args[1] + is_fp16 = activation_compute_dtype == torch.float16 + is_int8 = activation_compute_dtype in [torch.quint8, torch.qint8] + if not is_int8 and not is_fp16: + continue + + ref_module = named_modules[str(ref_node.target)] + ref_class = type(ref_module) + if ref_class in DYNAMIC_LOWER_FUSED_MODULE_MAP: + inner_ref_class, q_class = DYNAMIC_LOWER_FUSED_MODULE_MAP[ref_class] + if type(ref_module[0]) != inner_ref_class: continue + else: + q_class = DYNAMIC_LOWER_MODULE_MAP.get(ref_class) # type: ignore[assignment] + # TODO: maybe define a WeightedDynamicallyQuantizedModule + q_module = q_class.from_reference(ref_module) # type: ignore[attr-defined] - # change this pattern to use the corresponding quantized module - ref_module = modules[ref_node.target] - output_scale = getattr(model, scale_node.target) - output_zero_point = getattr(model, zero_point_node.target) - # For fused modules, we also check whether the inner module is a reference module - # If so, we replace the entire fused module with the corresponding quantized module - if ref_class in LOWER_FUSED_MODULE_MAP: - inner_ref_class, q_class = LOWER_FUSED_MODULE_MAP[ref_class] - if type(ref_module[0]) != inner_ref_class: - continue - else: - q_class = LOWER_MODULE_MAP[type(ref_module)] - assert issubclass(q_class, WeightedQuantizedModule) # suppress mypy warnings - q_module = q_class.from_reference(ref_module, output_scale, output_zero_point) - - # replace reference module with quantized module - parent_name, module_name = _parent_name(ref_node.target) - setattr(modules[parent_name], module_name, q_module) - # remove dq node: - dq_node_input = dq_node.args[0] - - dq_node.replace_all_uses_with(dq_node_input) - model.graph.erase_node(dq_node) + # replace reference moduel with dynamically quantized module + parent_name, module_name = _parent_name(ref_node.target) + setattr(named_modules[parent_name], module_name, 
q_module) - # remove q node and args: - q_node.replace_all_uses_with(ref_node) - model.graph.erase_node(q_node) - model.graph.erase_node(scale_node) - model.graph.erase_node(zero_point_node) - return model + # remove q - dq node + dq_node.replace_all_uses_with(input_dynamic_q_node) + model.graph.erase_node(dq_node) + input_dynamic_q_node.replace_all_uses_with(input_dynamic_q_node.args[0]) + model.graph.erase_node(input_dynamic_q_node) -def _lower_weighted_ref_functional( - model: QuantizedGraphModule, - qconfig_map: Dict[str, QConfigAny] -) -> QuantizedGraphModule: +def _lower_weight_only_weighted_ref_module(model: QuantizedGraphModule): + """ + Traverse the graph and find ref_module patterns + and replace them with the weight only quantized version of the ref module. + """ + named_modules = dict(model.named_modules(remove_duplicate=False)) + for n in model.graph.nodes: + if n.op != "call_module" or \ + type(named_modules[str(n.target)]) not in \ + set(WEIGHT_ONLY_LOWER_MODULE_MAP.keys()): + continue + ref_node = n + ref_module = named_modules[str(ref_node.target)] + ref_class = type(ref_module) + q_class = WEIGHT_ONLY_LOWER_MODULE_MAP.get(ref_class) + # TODO: WeightedQuantizedModule is currently assuming static quant apis + # with output_scale, output_zero_point in from_reference, we may want to + # relax that, or rename this + # TODO: maybe define a WeightedWeightOnlyQuantizedModule + q_module = q_class.from_reference(ref_module) # type: ignore[union-attr] + + # replace reference module with weight only quantized module + parent_name, module_name = _parent_name(ref_node.target) + setattr(named_modules[parent_name], module_name, q_module) + +def _lower_static_weighted_ref_functional( + model: QuantizedGraphModule, + qconfig_map: Dict[str, QConfigAny]): """ Traverse the graph and replace functional reference patterns with their quantized versions. """ - for ref_func, (q_func, q_relu_func) in LOWER_FUNCTIONAL_MAP.items(): - configurations = itertools.product( - (False, True), # is_relu: whether ref_func is wrapped in a relu op - (False, True), # has_bias: whether bias is passed as an extra argument to ref_func - ) - for is_relu, has_bias in configurations: - if is_relu and q_relu_func is None: - continue + modules = dict(model.named_modules(remove_duplicate=False)) + nodes = list(model.graph.nodes) + for n in model.graph.nodes: + # Step 0: Find nodes that match this pattern (dequantize - functional op - quantize) + matching_ops = list(STATIC_LOWER_FUNCTIONAL_MAP.keys()) + (q_node, relu_node, func_node) = _match_static_pattern( + n, modules, qconfig_map, matching_ops, dequantize_node_arg_indices=[0, 1]) + if q_node is None: + continue + assert(func_node is not None) + (_, output_scale_node, output_zp_node, _) = q_node.args + (input_dq_node, weight_dq_node, *remaining_func_args) = func_node.args + assert(isinstance(output_zp_node, Node)) + assert(isinstance(input_dq_node, Node)) + assert(isinstance(weight_dq_node, Node)) + quantized_weight = weight_dq_node.args[0] + assert(isinstance(quantized_weight, Node)) + if quantized_weight.op != "call_function" or\ + quantized_weight.target not in (torch.quantize_per_tensor, torch.quantize_per_channel): + continue - # Set up match pattern: (dequantize - [relu_op - ] func_op - quantize) - # Func args: (dequantized inputs, dequantized weights[, bias]) - # Quantize args: (func, scale, zp, dtype) - func_pattern: Tuple[Any, ...] 
= () - if has_bias: - func_pattern = (ref_func, "dequantize", "dequantize", MatchAllNode) - else: - func_pattern = (ref_func, "dequantize", "dequantize") - if is_relu: - func_pattern = (F.relu, func_pattern) - pattern = (torch.quantize_per_tensor, func_pattern, MatchAllNode, MatchAllNode, MatchAllNode) - - # Iterate through nodes in the graph to find a match - # If there is a match, replace the above pattern with the corresponding quantized op - modules = dict(model.named_modules(remove_duplicate=False)) - nodes = list(model.graph.nodes) - for n in model.graph.nodes: - if not is_match(modules, n, pattern): - continue - q_node = n - (func_node, output_scale_node, output_zp_node, dtype) = q_node.args - if is_relu: - relu_node = func_node - func_node = relu_node.args[0] - else: - relu_node = None - input_dq_node = func_node.args[0] - weight_dq_node = func_node.args[1] - - if should_skip_lowering(func_node, qconfig_map): - continue - - # Step 1: Replace quantized weights with packed weights, which will be folded later - quantized_weight = weight_dq_node.args[0] - weight_dtype = quantized_weight.args[-1] - if has_bias: - bias = func_node.args[2] - else: - bias = func_node.kwargs.get("bias", None) - prepack_args = (quantized_weight, bias) - if ref_func == F.linear: - prepack_op = get_linear_prepack_op_for_dtype(weight_dtype) - else: - raise ValueError("Lowering for functional currently only supports linear op") - insert_prepack_after = bias if has_bias else quantized_weight - with model.graph.inserting_after(insert_prepack_after): - packed_weight = model.graph.create_node("call_function", prepack_op, prepack_args, {}) - - # Step 2: Replace reference pattern with the corresponding quantized op - func_node.args = (input_dq_node.args[0], packed_weight, output_scale_node, output_zp_node) - func_node.target = q_relu_func if is_relu else q_func - q_node.replace_all_uses_with(func_node) - output_zp_node.append(func_node) - - # Clean up: Remove dequantize and quantize nodes and the old func node - for dqn in [input_dq_node, weight_dq_node]: - dqn_input = dqn.args[0] - dqn.replace_all_uses_with(dqn_input) - model.graph.erase_node(dqn) - model.graph.erase_node(q_node) - if is_relu: - model.graph.erase_node(relu_node) - return model + # Step 1: Replace quantized weights with packed weights, which will be folded later + # Use the right prepack op and prepare the corresponding args + # Linear prepack args: (quantized weights[, bias]) + # Conv prepack args: (quantized weights[, bias, stride, padding, dilation, groups]) + prepack_args = [quantized_weight] + remaining_func_args + if func_node.target == F.linear: + weight_dtype = quantized_weight.args[-1] + prepack_op = get_linear_prepack_op_for_dtype(weight_dtype) + elif func_node.target in CONV_FUNCTIONAL_OPS: + prepack_op = get_qconv_prepack_op(func_node.target) # type: ignore[arg-type] + # For conv1d, the stride, padding, and dilation args may be ints, + # in which case we need to convert them to tuples + if func_node.target == F.conv1d: + for i in [2, 3, 4]: + if len(prepack_args) > i and isinstance(prepack_args[i], int): + prepack_args[i] = (prepack_args[i],) + else: + raise ValueError("Lowering is not supported for op '%s'" % func_node.target) + with model.graph.inserting_before(output_scale_node): + packed_weight = model.graph.create_node("call_function", prepack_op, tuple(prepack_args), {}) + + # Step 2: Replace reference pattern with the corresponding quantized op + (q_func, q_relu_func) = STATIC_LOWER_FUNCTIONAL_MAP[func_node.target] # type: 
ignore[index] + func_node.target = q_relu_func if relu_node is not None else q_func + func_node.args = (input_dq_node.args[0], packed_weight, output_scale_node, output_zp_node) + q_node.replace_all_uses_with(func_node) + # Move func_node after output_zp_node in the graph + output_zp_node.append(func_node) + + # Clean up: Remove dequantize and quantize nodes, and the relu node if it exists + for dqn in [input_dq_node, weight_dq_node]: + dqn_input = dqn.args[0] + dqn.replace_all_uses_with(dqn_input) + model.graph.erase_node(dqn) + model.graph.erase_node(q_node) + if relu_node is not None: + model.graph.erase_node(relu_node) -def _lower_quantized_binary_op( - model: QuantizedGraphModule, - qconfig_map: Dict[str, QConfigAny] -) -> QuantizedGraphModule: +def _lower_dynamic_weighted_ref_functional( + model: QuantizedGraphModule, + qconfig_map: Dict[str, QConfigAny]): + """ + Traverse the graph and replace functional reference patterns with their dynamically + quantized versions. + Examples: + quantize_per_tensor_dynamic - dequantize - functional linear --> linear_dynamic + to(torch.float16) - dequantize - functional linear --> linear_dynamic_fp16 + """ modules = dict(model.named_modules(remove_duplicate=False)) + nodes = list(model.graph.nodes) + # we want to search in reversed order so that we can match the larger patterns first + # e.g. we want to match linear - relu before linear. + for n in reversed(model.graph.nodes): + + # Step 0: Find nodes that match this pattern + # (quantize_per_tensor_dynamic - dequantize - dynamically quantized op) + # We search for the pattern backwards, starting with the quantize node + # Quantize node args: (func, scale, zp, dtype) + func_node = n + # Handle cases where the functional op is wrapped in a ReLU + if func_node.op == "call_function" and func_node.target == F.relu or \ + func_node.op == "call_module" and \ + type(modules[str(func_node.target)]) == torch.nn.ReLU: + relu_node = func_node + func_node = relu_node.args[0] + else: + relu_node = None + if should_skip_lowering(func_node, qconfig_map): + continue + # Linear args: (dequantized inputs, dequantized weights[, bias]) + # Conv args: (dequantized inputs, dequantized weights[, bias, stride, padding, dilation, groups]) + if func_node.op != "call_function" or func_node.target not in DYNAMIC_LOWER_FUNCTIONAL_MAP: + continue + (input_dq_node, weight_dq_node, *remaining_func_args) = func_node.args + if input_dq_node.op != "call_method" or input_dq_node.target != "dequantize" or \ + weight_dq_node.op != "call_method" or weight_dq_node.target != "dequantize": + continue - def get_bop_patterns(bop: Any) -> List[Pattern]: - patterns: List[Pattern] = [] - bop_pattern = (bop, MatchAllNode, MatchAllNode) - for relu_op in [torch.relu, torch.nn.functional.relu, torch.nn.ReLU]: - patterns.append( - (torch.quantize_per_tensor, - (relu_op, bop_pattern), - MatchAllNode, MatchAllNode, MatchAllNode)) - patterns.append( - (torch.quantize_per_tensor, - bop_pattern, - MatchAllNode, MatchAllNode, MatchAllNode)) - return patterns - - patterns: List[Pattern] = [] - for bop in [operator.add, torch.add, operator.mul, torch.mul]: - patterns.extend(get_bop_patterns(bop)) - patterns.extend( - [ - (torch.quantize_per_tensor, - (torch.matmul, "dequantize", "dequantize"), - MatchAllNode, MatchAllNode, MatchAllNode) - ] - ) + input_dynamic_q_node = input_dq_node.args[0] + # don't support lowering the pattern when the result of quantize is used by + # multiple nodes + if len(input_dynamic_q_node.users) > 1: + continue + + if 
input_dynamic_q_node.op != "call_function" or \ + input_dynamic_q_node.target != torch.quantize_per_tensor_dynamic: + continue + + reduce_range_node = None + (pattern_input, activation_compute_dtype, reduce_range_node) = input_dynamic_q_node.args + is_fp16 = activation_compute_dtype == torch.float16 + is_int8 = activation_compute_dtype in [torch.quint8, torch.qint8] + if not is_int8 and not is_fp16: + continue + + quantized_weight = weight_dq_node.args[0] + weight_dtype = quantized_weight.args[-1] + + # Step 1: Try to select reference pattern with the corresponding quantized op + dynamic_quant_dtype_key = (activation_compute_dtype, weight_dtype) + if dynamic_quant_dtype_key not in DYNAMIC_LOWER_FUNCTIONAL_MAP[func_node.target]: + print(f"Didn't find dtype combination {dynamic_quant_dtype_key} during " + f"dynamic quantized op lowering for {func_node.target}") + continue + (q_func, q_relu_func) = DYNAMIC_LOWER_FUNCTIONAL_MAP[func_node.target][dynamic_quant_dtype_key] + + if q_func is None or q_relu_func is None: + print("Didn't find corresponding quantized function or quantized relu function " + f"for {func_node.target}, {dynamic_quant_dtype_key}") + continue + + # Step 2: Replace quantized weights with packed weights, which will be folded later + # Use the right prepack op and prepare the corresponding args + # Linear prepack args: (quantized weights[, bias]) + # Conv prepack args: (quantized weights[, bias, stride, padding, dilation, groups]) + prepack_args = [quantized_weight] + remaining_func_args + if func_node.target == F.linear: + prepack_op = get_linear_prepack_op_for_dtype(weight_dtype) + elif func_node.target in CONV_FUNCTIONAL_OPS: + prepack_op = get_qconv_prepack_op(func_node.target) + # For conv1d, the stride, padding, and dilation args may be ints, + # in which case we need to convert them to tuples + if func_node.target == F.conv1d: + for i in [2, 3, 4]: + if len(prepack_args) > i and isinstance(prepack_args[i], int): + prepack_args[i] = (prepack_args[i],) + else: + raise ValueError("Lowering is not supported for op '%s'" % func_node.target) + with model.graph.inserting_before(func_node): + packed_weight = model.graph.create_node("call_function", prepack_op, tuple(prepack_args), {}) + + # Step 3: Replace reference pattern with the corresponding quantized op + func_node.target = q_relu_func if relu_node is not None else q_func + if is_int8: + func_node.args = (pattern_input, packed_weight, reduce_range_node) + else: + func_node.args = (pattern_input, packed_weight) + if relu_node is not None: + relu_node.replace_all_uses_with(func_node) + + # Step 4: Remove dequantize and quantize nodes, and the relu node if it exists + for dqn in [input_dq_node, weight_dq_node]: + dqn_input = dqn.args[0] + dqn.replace_all_uses_with(dqn_input) + model.graph.erase_node(dqn) + model.graph.erase_node(input_dynamic_q_node) + if relu_node is not None: + model.graph.erase_node(relu_node) + +def _lower_quantized_binary_op( + model: QuantizedGraphModule, + qconfig_map: Dict[str, QConfigAny]): qbin_op_mapping: Dict[Union[Callable, str], Callable] = { operator.add: torch.ops.quantized.add, torch.add: torch.ops.quantized.add, @@ -488,94 +769,62 @@ def get_bop_patterns(bop: Any) -> List[Pattern]: operator.mul: torch.ops.quantized.mul_relu, torch.mul: torch.ops.quantized.mul_relu, } - for pattern in patterns: - for n in model.graph.nodes: - if not is_match(modules, n, pattern): - continue - q_node = n - is_quantize = q_node.target == torch.quantize_per_tensor - is_to_fp16 = q_node.op == "call_method" and 
q_node.target == "to" and q_node.args[1] == torch.float16 - if not (is_quantize or is_to_fp16): - continue - - # start tracing back from quantize node - node = q_node.args[0] - if not isinstance(node, Node): - continue - relu_node = None - if ( - node.op == 'call_function' and - node.target in (torch.nn.functional.relu, torch.relu) - ) or ( - node.op == 'call_module' and - isinstance(modules[str(node.target)], torch.nn.ReLU) - ): - relu_node = node - node = node.args[0] - - # binary operator node, e.g. torch.add(x, y) - bop_node = node - if bop_node.op != "call_function" or \ - bop_node.target not in set([torch.add, operator.add, torch.mul, operator.mul, torch.matmul]): - continue - - if should_skip_lowering(bop_node, qconfig_map): - continue + binary_ops_to_lower: List[Callable] = [operator.add, torch.add, operator.mul, torch.mul, torch.matmul] + modules = dict(model.named_modules(remove_duplicate=False)) + for n in model.graph.nodes: + # Step 0: Find nodes that match this pattern (dequantize - ref module - quantize) + (q_node, relu_node, bop_node) = _match_static_pattern( + n, modules, qconfig_map, binary_ops_to_lower, dequantize_node_arg_indices=[0, 1]) + if q_node is None: + continue + assert(bop_node is not None) + (_, scale_node, zero_point_node, _) = q_node.args - # remove dequant node - arg0 = bop_node.args[0] - arg1 = bop_node.args[1] - dq_node0, dq_node1 = None, None - if is_dequantize_node(arg0): - dq_node0 = arg0 - if is_dequantize_node(arg1): - dq_node1 = arg1 - if dq_node0 is None and dq_node1 is None: + # Step 1: Remove dequant nodes + num_dq_nodes = 0 + for arg in bop_node.args: + if not is_dequantize_node(arg): continue - for dq_node in [dq_node0, dq_node1]: - if dq_node is None: - continue - # dequantize node is only used once, this is enforced by `is_match` - dn_input = dq_node.args[0] - dq_node.replace_all_uses_with(dn_input) - model.graph.erase_node(dq_node) - - # swap binary op to quantized binary op - assert bop_node.target in qbin_op_mapping - binop_to_qbinop = qbin_op_mapping if relu_node is None else qbin_relu_op_mapping - qbin_op = binop_to_qbinop[bop_node.target] - # prepare the args for quantized bianry op - # (x, y) - qop_node_args = list(bop_node.args) - # (x, y, scale, zero_point) - # add scale and zero_point arguments for Tensor - Tensor operation - if dq_node0 is not None and dq_node1 is not None: - qop_node_args.extend([q_node.args[1], q_node.args[2]]) - - # insert a call to quantized binary op and remove the original binary op - with model.graph.inserting_after(q_node): - qop_node = create_node_from_old_node_preserve_meta( - model.graph, - ("call_function", qbin_op, tuple(qop_node_args), {}), - bop_node) - q_node.replace_all_uses_with(qop_node) - - # remove quantize node - model.graph.erase_node(q_node) - # remove relu node if any - if relu_node is not None: - model.graph.erase_node(relu_node) - # remove binary op node - model.graph.erase_node(bop_node) - - return model + dq_node = arg + assert(isinstance(dq_node, Node)) + dn_input = dq_node.args[0] + dq_node.replace_all_uses_with(dn_input) + model.graph.erase_node(dq_node) + num_dq_nodes += 1 + assert(num_dq_nodes > 0) + + # Step 2: Swap binary op to quantized binary op + assert bop_node.target in qbin_op_mapping + binop_to_qbinop = qbin_op_mapping if relu_node is None else qbin_relu_op_mapping + qbin_op = binop_to_qbinop[bop_node.target] + # prepare the args for quantized bianry op + # (x, y) + qop_node_args = list(bop_node.args) + # (x, y, scale, zero_point) + # add scale and zero_point arguments for 
Tensor - Tensor operation + if num_dq_nodes == 2: + qop_node_args.extend([scale_node, zero_point_node]) + # insert a call to quantized binary op and remove the original binary op + with model.graph.inserting_after(q_node): + qop_node = create_node_from_old_node_preserve_meta( + model.graph, + ("call_function", qbin_op, tuple(qop_node_args), {}), + bop_node) + q_node.replace_all_uses_with(qop_node) + + # Step 3: Remove quantize node, binary op node, and relu node if any + model.graph.erase_node(q_node) + if relu_node is not None: + model.graph.erase_node(relu_node) + model.graph.erase_node(bop_node) -def special_pattern_replacement(model: QuantizedGraphModule) -> QuantizedGraphModule: +def special_pattern_replacement(model: QuantizedGraphModule): modules = dict(model.named_modules(remove_duplicate=False)) for n in model.graph.nodes: q_node = n is_quantize = q_node.target == torch.quantize_per_tensor - is_to_fp16 = q_node.op == "call_method" and q_node.target == "to" and q_node.args[1] == torch.float16 + is_to_fp16 = q_node.op == "call_method" and q_node.target == "to" and \ + len(q_node.args) == 2 and q_node.args[1] == torch.float16 if not (is_quantize or is_to_fp16): continue ref_node = q_node.args[0] @@ -677,6 +926,20 @@ def special_pattern_replacement(model: QuantizedGraphModule) -> QuantizedGraphMo return model +def _lower_getattr_tensor_metadta_op(model: QuantizedGraphModule): + """ Modifies the graph of the model in place, to skip the extra dequantize op before + the general tensor shape ops when possible + """ + for n in model.graph.nodes: + if is_getattr_tensor_metadata_node(n): + maybe_dq = n.args[0] + if maybe_dq.op != "call_method" or maybe_dq.target != "dequantize": + continue + # skip the dequantize node + args = list(n.args) + args[0] = n.args[0].args[0] + n.args = tuple(args) + def _lower_to_native_backend( model: QuantizedGraphModule, qconfig_map: Dict[str, QConfigAny], @@ -686,13 +949,16 @@ def _lower_to_native_backend( to the native backend in PyTorch (fbgemm/qnnpack), both backends shares the same operator signature so they can be lowered with the same function """ - model = _lower_weighted_ref_module(model) - model = _lower_weighted_ref_functional(model, qconfig_map) - for pattern, replacement in get_fbgemm_patterns_and_replacements(): - subgraph_rewriter_FORKED_DO_NOT_USE.replace_pattern(model, pattern, replacement) + _lower_static_weighted_ref_module(model, qconfig_map) + _lower_dynamic_weighted_ref_module(model) + _lower_weight_only_weighted_ref_module(model) + _lower_static_weighted_ref_functional(model, qconfig_map) + _lower_dynamic_weighted_ref_functional(model, qconfig_map) _lower_quantized_binary_op(model, qconfig_map) + _lower_getattr_tensor_metadta_op(model) special_pattern_replacement(model) model = fold_weight(model, node_name_to_scope) + model.graph.eliminate_dead_code() model.recompile() model.graph.lint() return model diff --git a/torch/ao/quantization/fx/backend_config/__init__.py b/torch/ao/quantization/fx/backend_config/__init__.py index b595b660344e9c..3fc6762815763c 100644 --- a/torch/ao/quantization/fx/backend_config/__init__.py +++ b/torch/ao/quantization/fx/backend_config/__init__.py @@ -1,4 +1,5 @@ from .tensorrt import get_tensorrt_backend_config_dict +from .native import get_native_backend_config_dict # TODO: add more validations def validate_backend_config_dict(backend_config_dict): diff --git a/torch/ao/quantization/fx/backend_config/native.py b/torch/ao/quantization/fx/backend_config/native.py new file mode 100644 index 
00000000000000..e18465a19cf039 --- /dev/null +++ b/torch/ao/quantization/fx/backend_config/native.py @@ -0,0 +1,618 @@ +from collections import namedtuple +from typing import List, Dict, Any +import operator +import torch +from .observation_type import ObservationType +import torch.nn.functional as F +import torch.nn as nn +import torch.nn.intrinsic as nni +import torch.nn.intrinsic.qat as nniqat +import torch.nn.qat as nnqat +import torch.nn.quantized._reference as nnqr +from ...observer import ( + default_affine_fixed_qparams_observer, + default_symmetric_fixed_qparams_observer, +) +from ...fake_quantize import FixedQParamsFakeQuantize +from ...fuser_method_mappings import ( + reverse_sequential_wrapper2, + reverse2, + reverse3, + fuse_conv_bn, + fuse_conv_bn_relu, + fuse_linear_bn, + fuse_convtranspose_bn, +) + +_ConvMetadata = namedtuple( + "_ConvMetadata", + ["root", "transpose", "bn", "reference", "qat", "relu", "relu_qat", "bn_qat", + "bn_relu_qat", "func"]) +_Conv1dMetadata = _ConvMetadata( + nn.Conv1d, nn.ConvTranspose1d, nn.BatchNorm1d, nnqr.Conv1d, nnqat.Conv1d, nni.ConvReLU1d, + nniqat.ConvReLU1d, nniqat.ConvBn1d, nniqat.ConvBnReLU1d, F.conv1d) +_Conv2dMetadata = _ConvMetadata( + nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d, nnqr.Conv2d, nnqat.Conv2d, nni.ConvReLU2d, + nniqat.ConvReLU2d, nniqat.ConvBn2d, nniqat.ConvBnReLU2d, F.conv2d) +_Conv3dMetadata = _ConvMetadata( + nn.Conv3d, nn.ConvTranspose3d, nn.BatchNorm3d, nnqr.Conv3d, nnqat.Conv3d, nni.ConvReLU3d, + nniqat.ConvReLU3d, nniqat.ConvBn3d, nniqat.ConvBnReLU3d, F.conv3d) + +# =================== +# | DTYPE CONFIGS | +# =================== + +# weighted op int8 dtype config +# this is config for ops that has quantized weights, like linear, conv +weighted_op_int8_dtype_config = { + # optional, input activation dtype + "input_dtype": torch.quint8, + # optional, weight dtype + "weight_dtype": torch.qint8, + # optional, bias dtype + "bias_dtype": torch.float, + # optional, output activation dtype + "output_dtype": torch.quint8 +} + +default_op_quint8_dtype_config = { + # optional, input activation dtype + "input_dtype": torch.quint8, + # optional, output activation dtype + "output_dtype": torch.quint8, +} + +default_op_fp16_dtype_config = { + # optional, input activation dtype + "input_dtype": torch.float16, + # optional, weight dtype + "weight_dtype": torch.float16, + # optional, output activation dtype + "output_dtype": torch.float16, +} + +default_dynamic_int8_dtype_config = { + "input_dtype": torch.quint8, + "weight_dtype": torch.qint8, + "output_dtype": torch.quint8, + # currently the dtype check is not yet enabled, so we provided the dtype_configs but + # it is not really used yet, + # we will enable it a bit later after we moved everything to backend_config_dict + "is_dynamic": True, +} + +weight_only_quint8_dtype_config = { + "input_dtype": torch.float, + "weight_dtype": torch.quint8, + "output_dtype": torch.float, +} + +weight_only_quint4x2_dtype_config = { + "input_dtype": torch.float, + "weight_dtype": torch.quint4x2, + "output_dtype": torch.float, +} + +# ====================== +# | OPERATOR CONFIGS | +# ====================== + +def _get_default_op_backend_config(op, dtype_configs): + return { + "pattern": op, + "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + "dtype_configs": dtype_configs, + } + +_DEFAULT_OP_INT8_CONFIGS = [ + _get_default_op_backend_config(op, [default_op_quint8_dtype_config]) for op in [ + torch.nn.ConvTranspose1d, + torch.nn.ConvTranspose2d, + torch.nn.ELU, + 
torch.nn.LeakyReLU, + torch.nn.Hardswish, + torch.nn.InstanceNorm1d, + torch.nn.InstanceNorm2d, + torch.nn.InstanceNorm3d, + torch.nn.LayerNorm, + torch.nn.Dropout, + torch.nn.functional.elu, + torch.nn.functional.hardswish, + torch.nn.functional.instance_norm, + torch.nn.functional.leaky_relu, + torch.nn.functional.dropout, + torch.nn.functional.layer_norm, + ]] + +def _get_linear_configs(): + """ + Return all configs related to linear modules and ops. + """ + observation_type = ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + dtype_configs = [weighted_op_int8_dtype_config] + linear_configs = [] + + # (1) Single linear modules/functions + # ------------------------------------- + # linear module + linear_configs.append({ + # Please see README under this folder for pattern format + "pattern": torch.nn.Linear, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + # the root module for the pattern, used to query the reference quantized module + # e.g. for a (torch.nn.ReLU, torch.nn.Linear) pattern, the root will be torch.nn.Linear + "root_module": torch.nn.Linear, + # the corresponding reference quantized module for the root module + "reference_quantized_module_for_root": nnqr.Linear, + "qat_module": nnqat.Linear, + }) + # linear qat module + linear_configs.append({ + "pattern": nnqat.Linear, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": torch.nn.Linear, + "reference_quantized_module_for_root": nnqr.Linear, + }) + # functional linear + linear_configs.append({ + "pattern": torch.nn.functional.linear, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + }) + + # (2) Linear + relu + # ------------------- + # 2.1 linear module + relu fusion config + # linear relu, linear module + relu module + linear_configs.append({ + "pattern": (torch.nn.ReLU, torch.nn.Linear), + "dtype_configs": dtype_configs, + "fuser_method": reverse_sequential_wrapper2(nni.LinearReLU), + }) + # linear relu, linear module + functional relu + linear_configs.append({ + "pattern": (torch.nn.functional.relu, torch.nn.Linear), + "dtype_configs": dtype_configs, + "fuser_method": reverse_sequential_wrapper2(nni.LinearReLU), + }) + + # 2.2 linear module + relu, fused module configs + # linear relu, fused module + linear_configs.append({ + "pattern": nni.LinearReLU, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": torch.nn.Linear, + "reference_quantized_module_for_root": nnqr.Linear, + "qat_module": nniqat.LinearReLU, + }) + # linear relu, qat fused module + linear_configs.append({ + "pattern": nniqat.LinearReLU, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": torch.nn.Linear, + "reference_quantized_module_for_root": nnqr.Linear, + }) + # 2.3 functional linear + relu configs + # linear relu, functional linear + relu module + linear_configs.append({ + "pattern": (torch.nn.ReLU, F.linear), + "observation_type": observation_type, + "dtype_configs": dtype_configs, + }) + # linear relu, functional linear + functional relu + linear_configs.append({ + "pattern": (F.relu, F.linear), + "observation_type": observation_type, + "dtype_configs": dtype_configs, + }) + + # (3) Linear + batchnorm + # ------------------------ + # 3.1 linear bn fusion + linear_configs.append({ + "pattern": (nn.BatchNorm1d, nn.Linear), + "dtype_configs": dtype_configs, + "fuser_method": reverse2(fuse_linear_bn) + }) + + # 3.2 linear bn quantization + # linear bn, fused module + 
linear_configs.append({ + "pattern": nni.LinearBn1d, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": torch.nn.Linear, + "reference_quantized_module_for_root": nnqr.Linear, + "qat_module": nniqat.LinearBn1d, + }) + # linear bn, qat fused module + linear_configs.append({ + "pattern": nniqat.LinearBn1d, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": torch.nn.Linear, + "reference_quantized_module_for_root": nnqr.Linear, + }) + return linear_configs + +def _get_conv_configs(): + """ + Return all configs related to conv modules and ops. + """ + conv_configs = [] + observation_type = ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + dtype_configs = [weighted_op_int8_dtype_config] + for convs in [_Conv1dMetadata, _Conv2dMetadata, _Conv3dMetadata]: + + # (1) Single conv modules/functions + # ----------------------------------- + # conv module + conv_configs.append({ + "pattern": convs.root, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": convs.root, + "reference_quantized_module_for_root": convs.reference, + "qat_module": convs.qat, + }) + # conv qat module + conv_configs.append({ + "pattern": convs.qat, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": convs.root, + "reference_quantized_module_for_root": convs.reference, + }) + # functional conv + conv_configs.append({ + "pattern": convs.func, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + }) + + # (2) Conv + relu + # ----------------- + # 2.1 conv module + relu fusion configs + # conv relu fusion, conv module + relu module + conv_configs.append({ + "pattern": (torch.nn.ReLU, convs.root), + "dtype_configs": dtype_configs, + "fuser_method": reverse_sequential_wrapper2(convs.relu), + }) + # conv relu fusion, conv module + functional relu + conv_configs.append({ + "pattern": (F.relu, convs.root), + "dtype_configs": dtype_configs, + "fuser_method": reverse_sequential_wrapper2(convs.relu), + }) + # 2.2 conv module + relu fused module configs + # conv relu, fused module + conv_configs.append({ + "pattern": convs.relu, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": convs.root, + "reference_quantized_module_for_root": convs.reference, + "qat_module": convs.relu_qat, + }) + # conv relu, qat fused module + conv_configs.append({ + "pattern": convs.relu_qat, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": convs.root, + "reference_quantized_module_for_root": convs.reference, + }) + # 2.3 functional conv + relu configs + # conv relu, functional conv + relu module + conv_configs.append({ + "pattern": (torch.nn.ReLU, convs.func), + "observation_type": observation_type, + "dtype_configs": dtype_configs, + }) + # conv relu, functional conv + functional relu + conv_configs.append({ + "pattern": (F.relu, convs.func), + "observation_type": observation_type, + "dtype_configs": dtype_configs, + }) + + # (3) Conv + batchnorm (+ relu) + # ------------------------------- + # 3.1 conv bn fusion configs + # conv + bn fusion + conv_configs.append({ + "pattern": (convs.bn, convs.root), + "dtype_configs": dtype_configs, + "fuser_method": reverse2(fuse_conv_bn), + }) + # conv + bn + relu module fusion + conv_configs.append({ + "pattern": (nn.ReLU, (convs.bn, convs.root)), + "dtype_configs": dtype_configs, + "fuser_method": reverse3(fuse_conv_bn_relu), + }) + # conv + bn + relu functional 
fusion + conv_configs.append({ + "pattern": (F.relu, (convs.bn, convs.root)), + "dtype_configs": dtype_configs, + "root_module": convs.root, + "fuser_method": reverse3(fuse_conv_bn_relu), + }) + # TODO: we can add fusion for torch.relu as well + + # 3.2 conv + bn (+ relu) fused module configs + # conv bn, qat fused module + conv_configs.append({ + "pattern": convs.bn_qat, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": convs.root, + "reference_quantized_module_for_root": convs.reference, + }) + # conv bn relu, qat fused module + conv_configs.append({ + "pattern": convs.bn_relu_qat, + "observation_type": observation_type, + "dtype_configs": dtype_configs, + "root_module": convs.root, + "reference_quantized_module_for_root": convs.reference, + }) + + # (4) conv transpose fusion + conv_configs.append({ + "pattern": (convs.bn, convs.transpose), + "dtype_configs": dtype_configs, + "fuser_method": reverse2(fuse_convtranspose_bn), + }) + + return conv_configs + +def _get_binary_op_configs(): + binary_op_configs: List[Dict[str, Any]] = [] + num_tensor_args_to_observation_type_mapping = { + # TODO: this is not used right now since we have extra check in prepare + # will need to change this to NO_OBSERVER later after we implemented + # Tensor dtype inference properly + 0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + 1: ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT, + 2: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + } + dtype_configs = [ + weighted_op_int8_dtype_config, + ] + for op_with_quantized_bop_scalar_variant in [ + operator.add, torch.add, operator.mul, torch.mul]: + binary_op_configs.append({ + "pattern": (torch.nn.ReLU, op_with_quantized_bop_scalar_variant), + "num_tensor_args_to_observation_type": num_tensor_args_to_observation_type_mapping, + "dtype_configs": dtype_configs, + }) + binary_op_configs.append({ + "pattern": (torch.nn.functional.relu, op_with_quantized_bop_scalar_variant), + "num_tensor_args_to_observation_type": num_tensor_args_to_observation_type_mapping, + "dtype_configs": dtype_configs, + }) + binary_op_configs.append({ + "pattern": (torch.relu, op_with_quantized_bop_scalar_variant), + "num_tensor_args_to_observation_type": num_tensor_args_to_observation_type_mapping, + "dtype_configs": dtype_configs, + }) + binary_op_configs.append({ + "pattern": op_with_quantized_bop_scalar_variant, + "num_tensor_args_to_observation_type": num_tensor_args_to_observation_type_mapping, + "dtype_configs": dtype_configs, + }) + return binary_op_configs + + +def _get_fixed_qparams_op_configs(): + fixed_qparams_op_configs = [] + for fixed_qparam_op, output_observer in [ + (torch.nn.Hardsigmoid, default_affine_fixed_qparams_observer), + (torch.nn.functional.hardsigmoid, default_affine_fixed_qparams_observer), + ("hardsigmoid", default_affine_fixed_qparams_observer), + ("hardsigmoid_", default_affine_fixed_qparams_observer), + (torch.nn.Sigmoid, default_affine_fixed_qparams_observer), + (torch.sigmoid, default_affine_fixed_qparams_observer), + ("sigmoid", default_affine_fixed_qparams_observer), + ("sigmoid_", default_affine_fixed_qparams_observer), + (torch.nn.Tanh, default_symmetric_fixed_qparams_observer), + (torch.tanh, default_symmetric_fixed_qparams_observer), + ("tanh", default_symmetric_fixed_qparams_observer), + ("tanh_", default_symmetric_fixed_qparams_observer), + ]: + fixed_qparams_op_configs.append({ + "pattern": fixed_qparam_op, + "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + # 
TODO: The following two keys are temporary, since we don't want to put observer in the configs + # we expect that it's provided by user + # What we want to put here is the requirement on observers, in this case dtype, + # quant_min, quant_max etc., but we need to first move all configs to + # backend_config_dict to do that, we'll remove these keys after we fully migrated + # everything to use backend_config_dict + "_overwrite_output_fake_quantizer": FixedQParamsFakeQuantize.with_args(observer=output_observer), + "_overwrite_output_observer": output_observer, + "dtype_configs": [ + weighted_op_int8_dtype_config, + ], + }) + return fixed_qparams_op_configs + +_CAT_CONFIG = { + "pattern": torch.cat, + "observation_type": ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT, + "dtype_configs": [ + default_op_quint8_dtype_config, + ] +} + +def _get_bn_configs(): + """ Get configs related to batchnorm + """ + bn_configs = [] + bn_to_fused_bn = { + torch.nn.BatchNorm2d: nni.BNReLU2d, + torch.nn.BatchNorm3d: nni.BNReLU3d, + } + for bn in bn_to_fused_bn.keys(): + # bn module + relu module fusion config + bn_configs.append({ + "pattern": (torch.nn.ReLU, bn), + "dtype_configs": default_op_quint8_dtype_config, + "fuser_method": reverse_sequential_wrapper2(bn_to_fused_bn[bn]), + }) + # bn module + F.relu fusion config + bn_configs.append({ + "pattern": (torch.nn.functional.relu, bn), + "dtype_configs": default_op_quint8_dtype_config, + "fuser_method": reverse_sequential_wrapper2(bn_to_fused_bn[bn]), + }) + bn_configs.append({ + "pattern": bn, + "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + "dtype_configs": default_op_quint8_dtype_config, + }) + + # fused bn configs + for fused_bn in bn_to_fused_bn.values(): + bn_configs.append({ + "pattern": fused_bn, + "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + "dtype_configs": default_op_quint8_dtype_config, + }) + return bn_configs + +def _get_share_qparams_op_configs(): + """ Get the operator config for the operators that works for both float and quantized input + if input is quantized, the output Tensor shares the same quantization parameter + with input. 
+ Example operator: avgpool2d, reshape, transpose, maxpool2d + Example observed operator: + observer_0 - avgpool2d - observer_0 (same observer instance as input) + """ + + def _get_share_qprams_op_backend_config(op): + return { + "pattern": op, + "observation_type": ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT, + "dtype_configs": [default_op_quint8_dtype_config], + } + + share_qparams_ops = [ + torch.nn.AdaptiveAvgPool1d, + torch.nn.AdaptiveAvgPool2d, + torch.nn.AdaptiveAvgPool3d, + torch.nn.AvgPool1d, + torch.nn.AvgPool2d, + torch.nn.AvgPool3d, + torch.nn.Hardtanh, + torch.nn.Identity, + torch.nn.MaxPool1d, + torch.nn.MaxPool2d, + torch.nn.MaxPool3d, + torch.nn.ReLU, + torch.nn.ReLU6, + torch.adaptive_avg_pool1d, + torch.nn.functional.adaptive_avg_pool2d, + torch.nn.functional.adaptive_avg_pool3d, + torch.nn.functional.hardtanh, + torch.nn.functional.hardtanh_, + torch.nn.functional.interpolate, + torch.nn.functional.max_pool1d, + torch.nn.functional.max_pool2d, + torch.nn.functional.max_pool3d, + torch.nn.functional.relu, + torch.nn.functional.relu6, + torch.avg_pool1d, + torch._C._nn.avg_pool2d, + torch._C._nn.avg_pool3d, + torch.clamp, + torch.flatten, + torch.mean, + torch.repeat_interleave, + torch.transpose, + torch.squeeze, + torch.stack, + torch.unsqueeze, + operator.floordiv, + "contiguous", + "clamp", + "detach", + "detach_", + "mean", + "permute", + "repeat", + "repeat_interleave", + "reshape", + "resize_", + "relu", + "relu_", + "shape", + "size", + "squeeze", + "squeeze_", + "transpose", + "unsqueeze", + "unsqueeze_", + "view" + ] + return [_get_share_qprams_op_backend_config(op) for op in share_qparams_ops] + +def _get_rnn_op_configs(): + rnn_op_configs = [] + for rnn_op in [ + torch.nn.GRUCell, + torch.nn.LSTMCell, + torch.nn.RNNCell, + torch.nn.LSTM, + ]: + rnn_op_configs.append({ + "pattern": rnn_op, + "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + "dtype_configs": [default_dynamic_int8_dtype_config], + }) + return rnn_op_configs + +def _get_embedding_op_configs(): + embedding_op_configs = [] + for embedding_op in [ + torch.nn.Embedding, + torch.nn.EmbeddingBag, + nnqat.Embedding, + nnqat.EmbeddingBag, + ]: + embedding_op_configs.append({ + "pattern": embedding_op, + "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, + "dtype_configs": [ + weight_only_quint8_dtype_config, + weight_only_quint4x2_dtype_config + ], + # This is temporary, and will be removed soon + "_input_output_observed": False + }) + return embedding_op_configs + +def get_native_backend_config_dict(): + """ Get backend_config_dict for PyTorch Native backend (fbgemm/qnnpack). 
""" + return { + # optional + "name": "native", + "configs": [ + *_DEFAULT_OP_INT8_CONFIGS, + *_get_linear_configs(), + *_get_conv_configs(), + *_get_binary_op_configs(), + *_get_fixed_qparams_op_configs(), + _CAT_CONFIG, + *_get_bn_configs(), + *_get_share_qparams_op_configs(), + *_get_rnn_op_configs(), + *_get_embedding_op_configs(), + ], + } diff --git a/torch/ao/quantization/fx/backend_config/quantize_handler.py b/torch/ao/quantization/fx/backend_config/quantize_handler.py index fe932e31bd214a..b836cc3bf149af 100644 --- a/torch/ao/quantization/fx/backend_config/quantize_handler.py +++ b/torch/ao/quantization/fx/backend_config/quantize_handler.py @@ -1,18 +1,67 @@ import torch -from typing import Dict -from torch.fx.graph import Node +from typing import Dict, Callable, Any, Optional from .observation_type import ObservationType from ..quantization_patterns import QuantizeHandler +from ..quantization_types import Pattern, NodePattern +from ...utils import ( + activation_dtype, +) -def get_quantize_handler_cls(observation_type, dtype_configs): +def get_quantize_handler_cls( + observation_type, + dtype_configs, + num_tensor_args_to_observation_type, + overwrite_output_fake_quantizer, + overwrite_output_observer, + input_output_observed): class ConfigurableQuantizeHandler(QuantizeHandler): - def __init__(self, node: Node, modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - self.observation_type = observation_type + def __init__( + self, + node_pattern: NodePattern, + modules: Dict[str, torch.nn.Module], + root_node_getter: Callable = None): + super().__init__(node_pattern, modules, root_node_getter) + if num_tensor_args_to_observation_type: + assert self.num_tensor_args in num_tensor_args_to_observation_type, \ + f"Must provide observation_type config for tensor number {self.num_tensor_args}" \ + f" in num_tensor_args_to_observation_type for {node_pattern}" + self.observation_type = num_tensor_args_to_observation_type[self.num_tensor_args] + else: + self.observation_type = observation_type self.dtype_configs = dtype_configs + self.overwrite_output_fake_quantizer = overwrite_output_fake_quantizer + self.overwrite_output_observer = overwrite_output_observer + self.input_output_observed_ = input_output_observed def is_general_tensor_value_op(self) -> bool: - return observation_type == ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT + return self.observation_type == ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT + + # TODO: change this to output activation + def get_activation_ctr( + self, + qconfig: Any, + pattern: Pattern, + is_training: bool, + ) -> Optional[Callable]: + """ + Returns the constructor for the activation observer which should be + used for the pattern matched to this handler. Some handlers override + this to a different value than what is specified in the qconfig. 
+ """ + act_dtype = activation_dtype(qconfig) + # TODO: change to is_qat + if is_training: + if act_dtype == torch.quint8 and self.overwrite_output_fake_quantizer is not None: + return self.overwrite_output_fake_quantizer + else: + if act_dtype == torch.quint8 and self.overwrite_output_observer is not None: + return self.overwrite_output_observer + return qconfig.activation + + # This is temporary, and will be removed soon + def input_output_observed(self): + return self.input_output_observed_ + return ConfigurableQuantizeHandler diff --git a/torch/ao/quantization/fx/backend_config/utils.py b/torch/ao/quantization/fx/backend_config/utils.py index 04f080289c8d30..b45641c4012c45 100644 --- a/torch/ao/quantization/fx/backend_config/utils.py +++ b/torch/ao/quantization/fx/backend_config/utils.py @@ -1,8 +1,12 @@ +from typing import Dict, Any, List, Callable, Union + import torch +from torch.ao.quantization.utils import get_combined_dict +from torch.ao.quantization.fx.pattern_utils import get_default_quant_patterns, sorted_patterns_dict import torch.nn as nn from .quantize_handler import get_quantize_handler_cls from .fuse_handler import get_fuse_handler_cls -from typing import Dict, Any, List, Callable, Union +from .native import get_native_backend_config_dict from ..quantization_types import Pattern, QuantizerCls def get_pattern_to_quantize_handlers( @@ -16,10 +20,20 @@ def get_pattern_to_quantize_handlers( pattern_to_quantize_handlers = dict() for config in backend_config_dict.get("configs", []): pattern = config["pattern"] - observation_type = config["observation_type"] + observation_type = config.get("observation_type", None) dtype_configs = config["dtype_configs"] + num_tensor_args_to_observation_type = config.get("num_tensor_args_to_observation_type", {}) + overwrite_fake_quantizer = config.get("_overwrite_output_fake_quantizer", None) + overwrite_observer = config.get("_overwrite_output_observer", None) + input_output_observed = config.get("_input_output_observed", True) pattern_to_quantize_handlers[pattern] = \ - get_quantize_handler_cls(observation_type, dtype_configs) + get_quantize_handler_cls( + observation_type, + dtype_configs, + num_tensor_args_to_observation_type, + overwrite_fake_quantizer, + overwrite_observer, + input_output_observed) return pattern_to_quantize_handlers @@ -125,3 +139,18 @@ def extra_inputs_getter(pattern) -> List[Any]: extra_inputs_getter_mapping[pattern] = extra_inputs_getter return extra_inputs_getter_mapping + +def get_native_quant_patterns(additional_quant_patterns: Dict[Pattern, QuantizerCls] = None) -> Dict[Pattern, QuantizerCls]: + """ + Return a map from pattern to quantize handlers based on the default patterns and the native backend_config_dict. + The returned map is sorted such that longer patterns will be encountered first when iterating through it. 
+ """ + patterns = get_default_quant_patterns() + if additional_quant_patterns is not None: + patterns = get_combined_dict(patterns, additional_quant_patterns) + # TODO: currently we just extend the quantize handlers generated from + # `get_native_backend_config_dict` + # in the future we can just assign backend_config_dict when everything is defined + for pattern, quantize_handler in get_pattern_to_quantize_handlers(get_native_backend_config_dict()).items(): + patterns[pattern] = quantize_handler + return sorted_patterns_dict(patterns) diff --git a/torch/ao/quantization/fx/common_quantization_patterns.py b/torch/ao/quantization/fx/common_quantization_patterns.py index a6e687cc6e91ba..a863c18a383e14 100644 --- a/torch/ao/quantization/fx/common_quantization_patterns.py +++ b/torch/ao/quantization/fx/common_quantization_patterns.py @@ -1,73 +1,8 @@ -import torch -from torch.fx.graph import ( - Node, - Graph, -) - -from ..utils import ( - get_qconfig_dtypes, - activation_dtype, -) - -from .utils import ( - quantize_node, -) - from .quantization_patterns import ( QuantizeHandler, ) - -from ..qconfig import QConfigAny - -from typing import Any, Callable, Dict, Tuple - +# TODO: remove class CommonQuantizeHandler(QuantizeHandler): """ Common quantized op, first input and first output will be quantized """ - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - if node.op == "call_function" or node.op == "call_method": - self.op = node.target - elif node.op == "call_module": - self.op = type(modules[str(node.target)]) - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - if not self.all_node_args_are_tensors: - return NotImplemented - assert node.op in ['call_module', 'call_function'], 'Only call_module and ' + \ - 'call_function are handled in DefaultNode' - assert is_reference - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - additional_static_quant_mapping = convert_custom_config_dict.get("static", {}) - - dtypes = get_qconfig_dtypes(qconfig) - # We can produce reference for a dtypes including - # (torch.quint8, torch.qint8, torch.qint32, torch.float16) - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.float: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - else: - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - # make sure the input is quantized to act_dtype - load_arg(quantized={0: act_dtype})(node.args) - args = load_arg(quantized=torch.float)(node.args) - kwargs = load_arg(quantized=torch.float)(node.kwargs) - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantize_node( - op_out, activation_post_process, - node, modules, quantized_graph, node_name_to_scope, is_input=False) + pass diff --git a/torch/ao/quantization/fx/convert.py b/torch/ao/quantization/fx/convert.py index 717ad46529f0bb..5bb8b16910f7c8 100644 --- a/torch/ao/quantization/fx/convert.py +++ b/torch/ao/quantization/fx/convert.py @@ -1,29 +1,26 @@ -from typing import Any, Dict, Tuple, List, Callable, Optional, Union, Set -from collections import defaultdict -import copy +from typing import Any, Dict, List, Optional, Set, Callable, Tuple import 
torch +import copy +import warnings from torch.fx import ( GraphModule, - Proxy, - map_arg ) from torch.fx.graph import ( Graph, Node, + Argument, ) -from torch.fx.node import Argument -from .quantization_types import Pattern -from ..qconfig import QConfigAny, qconfig_equals -from .match_utils import ( - find_matches, -) -from .graph_module import ( - is_observed_module, - is_observed_standalone_module, - QuantizedGraphModule, +from ..utils import ( + activation_is_statically_quantized, + weight_is_quantized, + get_qparam_dict, + _parent_name, + get_swapped_custom_module_class, + get_quant_type, ) -from .quantization_patterns import ( - QuantizeHandler, +from ..qconfig import ( + QConfigAny, + qconfig_equals ) from ..qconfig_dict_utils import ( convert_dict_to_ordered_dict, @@ -34,64 +31,138 @@ compare_prepare_convert_qconfig_dict, update_qconfig_for_fusion, ) +from ..quantization_mappings import DEFAULT_REFERENCE_STATIC_QUANT_MODULE_MAPPINGS +from .backend_config.utils import get_quantized_reference_module_mapping +from .graph_module import ( + QuantizedGraphModule, + is_observed_module, + is_observed_standalone_module, +) from ._equalize import update_obs_for_equalization, convert_eq_obs from .utils import ( - is_get_tensor_info_node, - node_return_type_is_int, - quantize_node, + get_custom_module_class_keys, + get_quantize_node_info, + create_getattr_from_value, collect_producer_nodes, graph_module_from_producer_nodes, - get_custom_module_class_keys, WEIGHT_INDEX_DICT, ) +from .quantization_patterns import ( + QuantizeHandler, +) +from .quantization_types import Pattern +from ..quant_type import QuantType from torch.ao.quantization.quantize import ( _remove_qconfig, is_activation_post_process, ) -from ..utils import ( - activation_is_statically_quantized, - activation_dtype, +from .lower_to_fbgemm import lower_to_fbgemm + +# these are tuples so that they can work with isinstance(module, tuple_of_classes) +FUSED_MODULE_CLASSES = ( + torch.nn.intrinsic.LinearReLU, + torch.nn.intrinsic.LinearBn1d, + torch.nn.intrinsic.ConvReLU1d, + torch.nn.intrinsic.ConvReLU2d, + torch.nn.intrinsic.ConvReLU3d, ) -from .lower_to_fbgemm import lower_to_fbgemm -from ..quantization_mappings import ( - DEFAULT_QAT_MODULE_MAPPINGS, +FLOAT_WEIGHTED_MODULE_CLASSES = ( + torch.nn.Linear, + torch.nn.Conv1d, + torch.nn.Conv2d, + torch.nn.Conv3d, +) + +QAT_MODULE_CLASSES = ( + torch.nn.qat.Linear, + torch.nn.qat.Conv2d, + torch.nn.qat.Conv3d, + torch.nn.intrinsic.qat.LinearReLU, + torch.nn.intrinsic.qat.LinearBn1d, + torch.nn.intrinsic.qat.ConvBn1d, + torch.nn.intrinsic.qat.ConvBnReLU1d, + torch.nn.intrinsic.qat.ConvReLU1d, + torch.nn.intrinsic.qat.ConvBn2d, + torch.nn.intrinsic.qat.ConvBnReLU2d, + torch.nn.intrinsic.qat.ConvReLU2d, + torch.nn.intrinsic.qat.ConvBn3d, + torch.nn.intrinsic.qat.ConvBnReLU3d, + torch.nn.intrinsic.qat.ConvReLU3d +) + +WEIGHT_ONLY_MODULE_CLASSES = ( + torch.nn.Embedding, + torch.nn.EmbeddingBag, ) +DYNAMIC_MODULE_CLASSES = ( + torch.nn.GRUCell, + torch.nn.LSTMCell, + torch.nn.RNNCell, + torch.nn.LSTM, +) + +def restore_state( + observed: torch.nn.Module +) -> Tuple[Dict[Pattern, QuantizeHandler], + Dict[str, Tuple[str, type]], + Dict[str, Any], + Set[str]]: + assert is_observed_module(observed), \ + 'incoming model must be produced by prepare_fx' + prepare_custom_config_dict: Dict[str, Any] = \ + observed._prepare_custom_config_dict # type: ignore[assignment] + node_name_to_scope: Dict[str, Tuple[str, type]] = observed._node_name_to_scope # type: ignore[assignment] + patterns: 
Dict[Pattern, QuantizeHandler] = observed._patterns # type: ignore[assignment] + observed_node_names: Set[str] = observed._observed_node_names # type: ignore[assignment] + return patterns, node_name_to_scope, prepare_custom_config_dict, observed_node_names + +def has_none_qconfig(node: Argument, qconfig_map: Dict[str, QConfigAny]) -> bool: + """ Check if a node has a qconfig of None, i.e. user requested to not quantize + the node + """ + return isinstance(node, Node) and node.name in qconfig_map and qconfig_map[node.name] is None + def run_weight_observers(observed: GraphModule) -> None: - r''' Extract the subgraph that produces the weight for dynamic quant + """ Extract the subgraph that produces the weight for dynamic quant or weight only quant node and run the subgraph to observe the weight. Note that the observers of dynamic quant or weight only quant ops are run during the convert step. - ''' + """ for node in observed.graph.nodes: - if node.op == 'call_function' and node.target in WEIGHT_INDEX_DICT: - for i, node_arg in enumerate(node.args): - if i in WEIGHT_INDEX_DICT[node.target]: - # node_arg is weight - weight_observer_nodes = collect_producer_nodes(node_arg) - if weight_observer_nodes is not None: - weight_observer_module = \ - graph_module_from_producer_nodes( - observed, weight_observer_nodes) - # run the weight observer - weight_observer_module() - -def remove_quant_dequant_pairs(quantized: QuantizedGraphModule) -> QuantizedGraphModule: + if node.op != 'call_function' or node.target not in WEIGHT_INDEX_DICT: + continue + for i, node_arg in enumerate(node.args): + if i not in WEIGHT_INDEX_DICT[node.target]: + continue + # node_arg is weight + weight_observer_nodes = collect_producer_nodes(node_arg) + if weight_observer_nodes is None: + continue + weight_observer_module = \ + graph_module_from_producer_nodes( + observed, weight_observer_nodes) + # run the weight observer + weight_observer_module() + +# this method is temporary will be removed soon +def duplicate_quantize_dynamic_node(quantized: QuantizedGraphModule) -> QuantizedGraphModule: quantized_root = quantized for node in quantized.graph.nodes: - if node.op == "call_function" and node.target in [torch.quantize_per_tensor, torch.quantize_per_channel]: + if (node.op == "call_function" and node.target == torch.quantize_per_tensor_dynamic): users = list(node.users) - user = users[0] if users else None - if len(users) == 1 and user.op == "call_method" and user.target == "dequantize": - user.replace_all_uses_with(node.args[0]) - quantized.graph.erase_node(user) - orig_args = list(node.args) + if len(users) > 1: + for user in users: + with quantized.graph.inserting_before(node): + new_node = quantized.graph.create_node( + "call_function", + torch.quantize_per_tensor_dynamic, + node.args, + node.kwargs) + user.replace_input_with(node, new_node) quantized.graph.erase_node(node) - for arg in orig_args: - if isinstance(arg, Node) and len(list(arg.users)) == 0: - quantized.graph.erase_node(arg) quantized = QuantizedGraphModule(quantized_root, quantized.graph, quantized_root.preserved_attr_names) return quantized @@ -138,28 +209,376 @@ def remove_extra_dequantize(quantized: QuantizedGraphModule) -> QuantizedGraphMo quantized = QuantizedGraphModule(quantized_root, quantized.graph, quantized_root.preserved_attr_names) return quantized +def remove_quant_dequant_pairs(quantized: QuantizedGraphModule) -> QuantizedGraphModule: + quantized_root = quantized + for node in quantized.graph.nodes: + if node.op == "call_function" and node.target 
in [torch.quantize_per_tensor, torch.quantize_per_channel]: + users = list(node.users) + user = users[0] if users else None + if len(users) == 1 and user.op == "call_method" and user.target == "dequantize": + user.replace_all_uses_with(node.args[0]) + quantized.graph.erase_node(user) + orig_args = list(node.args) + quantized.graph.erase_node(node) + for arg in orig_args: + if isinstance(arg, Node) and len(list(arg.users)) == 0: + quantized.graph.erase_node(arg) -def restore_state( - observed: torch.nn.Module -) -> Tuple[Dict[Pattern, QuantizeHandler], - Dict[str, Tuple[str, type]], - Dict[str, Any], - Set[str]]: - assert is_observed_module(observed), \ - 'incoming model must be produced by prepare_fx' - prepare_custom_config_dict: Dict[str, Any] = \ - observed._prepare_custom_config_dict # type: ignore[assignment] - node_name_to_scope: Dict[str, Tuple[str, type]] = observed._node_name_to_scope # type: ignore[assignment] - patterns: Dict[Pattern, QuantizeHandler] = observed._patterns # type: ignore[assignment] - observed_node_names: Set[str] = observed._observed_node_names # type: ignore[assignment] - return patterns, node_name_to_scope, prepare_custom_config_dict, observed_node_names + quantized = QuantizedGraphModule(quantized_root, quantized.graph, quantized_root.preserved_attr_names) + return quantized -def convert(model: GraphModule, is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None, - is_standalone_module: bool = False, - _remove_qconfig_flag: bool = True, - convert_qconfig_dict: Dict[str, Any] = None) -> torch.nn.Module: - """ standalone_module means it a submodule that is not inlined in +def maybe_recursive_remove_dequantize(arg: Any, node: Node, graph: Graph): + """ If the arg is a dequantize Node, or a list/tuple/dict of dequantize Node, + we'll recursively remove the dequantize Node + """ + if isinstance(arg, Node) and \ + arg.op == "call_method" and \ + arg.target == "dequantize": + quantize_node = arg.args[0] + # we only replace the specific use since dequantize could be used by other nodes + # as well + node.replace_input_with(arg, quantize_node) + elif isinstance(arg, (list, tuple)): + for arg_element in arg: + maybe_recursive_remove_dequantize(arg_element, node, graph) + elif isinstance(arg, dict): + for arg_element in arg.values(): + maybe_recursive_remove_dequantize(arg_element, node, graph) + else: + warnings.warn(f"Unsupported node type in recursive remove dequantize: {type(arg)}") + +def get_module_path_and_prefix( + obs_node: Node, + node_name_to_scope: Dict[str, Tuple[str, type]], + qconfig_map: Dict[str, QConfigAny]): + """ Given and observer node, get the `Scope` or the fully qualified name for + the submodule containing the observed node, also return a prefix of "_input" + when the observed node is an input of a F.linear op, and not the output of another + quantized op. 
+ TODO: this logic is hacky, we should think about how to remove it or make it more + general + """ + observed_node = obs_node.args[0] + # an observer can be inserted for both input of the next operator or output of the previous + # operator (they can be the same) + # this flag identifies if the observer is inserted only because the observed node is + # the input of the next operator + assert isinstance(observed_node, Node), \ + f"Expecting observed node to be a Node, but got {observed_node}" + is_input_observer_only = qconfig_map[observed_node.name] is None if observed_node.name in qconfig_map else None + if is_input_observer_only: + # if the quantize function is at the input of op, then we find the first user of the observer_node + # to get the path. If a linear call_function is in the user list, we return the first instance + # of linear node to get the FQN. + users = list(obs_node.users) + first_linear_use_or_first_use = users[0] if users else None + linear_node = None + for n in users: + if n.op == "call_function" and n.target == torch.nn.functional.linear: + linear_node = n + break + if linear_node: + first_linear_use_or_first_use = linear_node + prefix = "_input" + else: + # if the quantize function is at the output of the op, we use the observer input node to get the path + first_linear_use_or_first_use = observed_node + prefix = "" + + if first_linear_use_or_first_use and first_linear_use_or_first_use.name in node_name_to_scope: + module_path, _ = node_name_to_scope[first_linear_use_or_first_use.name] + else: + # TODO: it's not used, so actually we can skip quantization + # but this requires changing return type of quantize_node + # we can fix it later if needed + module_path = "" + return module_path, prefix + +def insert_dequantize_node( + node: Node, + graph: Graph): + """ Inserts dequantize node for `node` in `graph` + """ + with graph.inserting_after(node): + dequantize_node = graph.call_method("dequantize", (node,)) + for user_node in dict(node.users): + if user_node is not dequantize_node: + user_node.replace_input_with(node, dequantize_node) + +def maybe_get_observer_for_node( + node: Node, + modules: Dict[str, torch.nn.Module] +) -> Optional[torch.nn.Module]: + """ + If the node is observed, return the observer + instance. Otherwise, return None. + """ + for maybe_obs_node, _ in node.users.items(): + if maybe_obs_node.op == 'call_module': + maybe_obs = modules[str(maybe_obs_node.target)] + if is_activation_post_process(maybe_obs): + return maybe_obs + return None + +def convert_standalone_module( + node: Node, + modules: Dict[str, torch.nn.Module], + model: torch.fx.GraphModule, + is_reference: bool, + backend_config_dict: Optional[Dict[str, Any]]): + """ Converts an observed standalone module to a quantized standalone module by calling + the fx convert api, currently using the same `is_reference` flag as parent, but we may + change this behavior in the future (e.g.
separating quantization and lowering for + standalone module as well) + + Args: + - node: The call_module node of the observed standalone module + - modules: named_module of original model + - model: original model + - is_reference: a flag from parent provided by user to decide if we want to + produce a reference model or a fbgemm/qnnpack model + - backend_config_dict: backend configuration of the target backend of quantization + """ + convert = torch.ao.quantization.quantize_fx.convert_fx # type: ignore[attr-defined] + # We know that observed standalone module is a GraphModule since + # it's produced by us + observed_standalone_module : GraphModule = modules[str(node.target)] # type: ignore[assignment] + sm_input_quantized_idxs = \ + observed_standalone_module \ + ._standalone_module_input_quantized_idxs\ + .tolist() # type: ignore[operator] + # remove the dequantize nodes for inputs + args = list(node.args) + for idx in range(len(args)): + if idx in sm_input_quantized_idxs: + arg = args[idx] + if arg.op == "call_method" and arg.target == "dequantize": # type: ignore[union-attr] + quantize_node = arg.args[0] # type: ignore[union-attr] + node.replace_input_with(arg, quantize_node) + if len(arg.users) == 0: # type: ignore[union-attr] + model.graph.erase_node(arg) + # add dequantize node for output + sm_output_quantized_idxs = \ + observed_standalone_module \ + ._standalone_module_output_quantized_idxs \ + .tolist() # type: ignore[operator] + if len(sm_output_quantized_idxs) > 0: + assert sm_output_quantized_idxs[0] == 0, "Currently only quantized" + "output idxs = [0] is supported" + + # if it's non-empty, then it means the output is kept in quantized form + # we'll just add a dequantize node after this node + insert_dequantize_node(node, model.graph) + + # TODO: allow convert_custom_config_dict to override backend_config_dict + # for standalone module + # TODO: think about how to handle `is_reference` here + quantized_standalone_module = convert( + observed_standalone_module, + is_reference=is_reference, + backend_config_dict=backend_config_dict) + parent_name, name = _parent_name(node.target) + # update the modules dict + setattr(modules[parent_name], name, quantized_standalone_module) + modules[str(node.target)] = quantized_standalone_module + +def convert_weighted_module( + node: Node, + modules: Dict[str, torch.nn.Module], + observed_node_names: Set[str], + quantized_reference_module_mapping: Dict[Callable, Any], + qconfig_map: Dict[str, QConfigAny]): + """ Convert a weighted module to reference quantized module in the model + If the QConfig of a QAT module is not set, the module will still be converted to + a float module. + + Args: + - node: The call_module node of the observed standalone module + - modules: named_module of original model + - observed_node_names: names for the set of observed fx node, we can skip + this conversion if the node is not observed + - quantized_reference_module_mapping: module mapping from floating point module class + to quantized reference module class, e.g. 
nn.Conv2d to nn.quantized._reference.Conv2d + """ + original_module = modules[str(node.target)] + float_module = original_module + weight_post_process = None + + if isinstance( + original_module, + QAT_MODULE_CLASSES): + # Converting qat module to a float module, we need to attach + # weight fake_quant to the module, weight fake_quant is assumed to be run during + # QAT so we don't need to run it again here + float_module = original_module.to_float() # type: ignore[operator] + # change qat module to float module + parent_name, name = _parent_name(node.target) + setattr(modules[parent_name], name, float_module) + weight_post_process = original_module.weight_fake_quant + + qconfig = original_module.qconfig + is_observed = node.name in observed_node_names + # If a qconfig is not defined for this node, then skip converting to a reference module + if qconfig is None or has_none_qconfig(node, qconfig_map) or not is_observed: + return + + # TODO: rename weight_is_statically_quantized to weight_is_int8_quantized + is_weight_quantized = weight_is_quantized(qconfig) + quant_type = get_quant_type(qconfig) + + # skip reference module swapping for embedding when quantization mode does not + # match + # TODO: we need a more systematic way to handle this after we migrate to use + # backend_config_dict everywhere + if isinstance(original_module, WEIGHT_ONLY_MODULE_CLASSES) and \ + quant_type != QuantType.WEIGHT_ONLY: + return + + if isinstance(original_module, DYNAMIC_MODULE_CLASSES) and \ + quant_type != QuantType.DYNAMIC: + return + + # the condition for swapping the module to reference quantized module is: + # weights need to be quantized + if not is_weight_quantized: + return + + fused_module = None + # extract the individual float_module and fused module + if isinstance(float_module, torch.nn.intrinsic._FusedModule): + fused_module = float_module + float_module = fused_module[0] # type: ignore[index] + + # TODO: expose this through backend_config_dict + # weight_qparams or weight_qparams dict + wq_or_wq_dict = {} + if isinstance(float_module, torch.nn.RNNCellBase): + weight_post_process_ih = qconfig.weight() # type: ignore[union-attr, operator] + weight_post_process_hh = qconfig.weight() # type: ignore[union-attr, operator] + weight_post_process_ih(float_module.weight_ih) + weight_post_process_hh(float_module.weight_hh) + weight_qparams_ih = get_qparam_dict(weight_post_process_ih) + weight_qparams_hh = get_qparam_dict(weight_post_process_hh) + wq_or_wq_dict = { + "weight_ih": weight_qparams_ih, + "weight_hh": weight_qparams_hh, + } + elif isinstance(float_module, torch.nn.LSTM): + # format for wq_or_wq_dict (flattened attributes): + # {"weight_ih_l0_scale": ..., "weight_ih_l0_qscheme": ..., ...} + for wn in float_module._flat_weights_names: + if hasattr(float_module, wn) and wn.startswith("weight"): + weight = getattr(float_module, wn) + weight_post_process = qconfig.weight() # type: ignore[union-attr, operator] + if weight_post_process.dtype == torch.qint8: + weight_post_process(weight) + wq_or_wq_dict[wn] = get_qparam_dict(weight_post_process) + else: + # weight_post_process is None means the original module is not a QAT module + # we need to get weight_post_process from qconfig in this case + if weight_post_process is None: + weight_post_process = qconfig.weight() # type: ignore[union-attr, operator] + # run weight observer + # TODO: This is currently a hack for QAT to get the right shapes for scale and zero point.
+ # In the future, we should require the user to calibrate the model after calling prepare + # Issue: https://github.com/pytorch/pytorch/issues/73941 + weight_post_process(float_module.weight) # type: ignore[operator] + wq_or_wq_dict = get_qparam_dict(weight_post_process) + + # We use the same reference module for all modes of quantization: static, dynamic, weight_only + ref_qmodule_cls = quantized_reference_module_mapping.get(type(float_module), None) + assert ref_qmodule_cls is not None, f"No reference quantized module class configured for {type(float_module)}" + ref_qmodule = ref_qmodule_cls.from_float(float_module, wq_or_wq_dict) # type: ignore[attr-defined] + if fused_module is not None: + fused_module[0] = ref_qmodule + else: + parent_name, name = _parent_name(node.target) + setattr(modules[parent_name], name, ref_qmodule) + +def convert_custom_module( + node: Node, + graph: Graph, + modules: Dict[str, torch.nn.Module], + custom_module_class_mapping: Dict[Callable, Callable], + statically_quantized_custom_module_nodes: Set[Node]): + """ Converts an observed custom module to a quantized custom module based on + `custom_module_class_mapping` + For static quantization, we'll also remove the previous `dequantize` node and + attach the observer node for output to the module, the observer for the node + will be converted to a dequantize node instead of quantize-dequantize pairs + later in the graph. In the end we would have a quantized custom module that + has the same interface as a default quantized module in nn.quantized namespace, + i.e. quantized input and quantized output. + + Args: + - node: The call_module node of the observed standalone module + - graph: The graph containing the node + - modules: named_module of original model + - custom_module_class_mapping: mapping from observed custom module class to + quantized custom module class, used to swap custom modules + - statically_quantized_custom_module_nodes: we'll add the custom module node + if we find it is statically quantized, this will be used later when converting + observers to quant/dequant node pairs, if the observed node is a statically + quantized custom module nodes, we'll convert the observer to a dequantize node, + this is to keep the interface the same as the default quantized module. + TODO: maybe we want to redesign this part to align with reference model design + as well, but there has been some discussions around the interface, so we can do + it later. 
+ """ + observed_custom_module = modules[str(node.target)] + maybe_obs = maybe_get_observer_for_node(node, modules) + qconfig = observed_custom_module.qconfig + if activation_is_statically_quantized(qconfig): + statically_quantized_custom_module_nodes.add(node) + # remove the previous dequant node + prev_node = node.args[0] + # expecting the input node for a custom module node to be a Node + assert isinstance(prev_node, Node), \ + f"Expecting the argument for custom module node to be a Node, but got {prev_node}" + if prev_node.op == "call_method" and prev_node.target == "dequantize": + # change the connection for custom module, we'll change the input + # of custom module node to quantize node: + # Before: quantize - dequantize - custom - module + # After: quantize - custom - module + # \ - dequantize + node.replace_input_with(prev_node, prev_node.args[0]) + + # Remove the dequantize node if it doesn't have other users + if len(prev_node.users) == 0: + graph.erase_node(prev_node) + + # absorb the following observer into the module conversion + activation_post_process = maybe_get_observer_for_node(node, modules) + assert activation_post_process is not None + observed_custom_module.activation_post_process = activation_post_process + + # swap the observed custom module to quantized custom module + quantized_custom_module_class = get_swapped_custom_module_class( + observed_custom_module, custom_module_class_mapping, qconfig) + quantized_custom_module = \ + quantized_custom_module_class.from_observed(observed_custom_module) + parent_name, name = _parent_name(node.target) + setattr(modules[parent_name], name, quantized_custom_module) + +def convert( + model: GraphModule, is_reference: bool = False, + convert_custom_config_dict: Dict[str, Any] = None, + is_standalone_module: bool = False, + _remove_qconfig_flag: bool = True, + convert_qconfig_dict: Dict[str, Any] = None, + backend_config_dict: Optional[Dict[str, Any]] = None) -> torch.nn.Module: + """ + We will convert an observed model (a module with observer calls) to a reference + quantized model, the rule is simple: + 1. for each observer module call in the graph, we'll convert it to calls to + quantize and dequantize functions based on the observer instance + 2. for weighted operations like linear/conv, we need to convert them to reference + quantized module, this requires us to know whether the dtype configured for the + weight is supported in the backend, this is done in prepare step and the result + is stored in observed_node_names, we can decide whether we need to swap the + module based on this set + + standalone_module means it a submodule that is not inlined in parent module, and will be quantized separately as one unit. Returns a quantized standalone module, whether input/output is quantized is @@ -169,7 +588,7 @@ def convert(model: GraphModule, is_reference: bool = False, """ if convert_custom_config_dict is None: convert_custom_config_dict = {} - patterns, node_name_to_scope, prepare_custom_config_dict, _ = restore_state(model) + patterns, node_name_to_scope, prepare_custom_config_dict, observed_node_names = restore_state(model) qconfig_map: Dict[str, QConfigAny] = model._qconfig_map # type: ignore[assignment] # TODO this should be removed now that gpu support for quantization is being supported. 
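# A minimal, illustrative sketch (not part of this patch) of the flow the new convert()
# docstring above describes: an observed model becomes a reference quantized model.
# The toy module, qconfig_dict, and calibration tensor are assumptions for illustration;
# prepare_fx/convert_fx and the is_reference flag are the FX graph mode quantization
# entry points that this file backs.
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.nn.functional.relu(self.linear(x))

model = ToyModel().eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)  # inserts observers for matched patterns
prepared(torch.randn(2, 4))                 # calibration run populates the observers
# Rule 1: each observer call is rewritten into quantize_per_tensor/dequantize nodes.
# Rule 2: weighted modules such as nn.Linear are swapped for reference quantized modules.
reference_model = convert_fx(prepared, is_reference=True)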
@@ -198,9 +617,7 @@ def convert(model: GraphModule, is_reference: bool = False, modules_copy = copy.deepcopy(modules) convert_dict_to_ordered_dict(convert_qconfig_dict) if model._is_qat: - additional_qat_module_mapping = prepare_custom_config_dict.get( - "additional_qat_module_mapping", {}) - convert_qconfig_dict = update_qconfig_for_qat(convert_qconfig_dict, additional_qat_module_mapping) + convert_qconfig_dict = update_qconfig_for_qat(convert_qconfig_dict, {}) convert_qconfig_dict = update_qconfig_for_fusion(model, convert_qconfig_dict) compare_prepare_convert_qconfig_dict(prepare_qconfig_dict, convert_qconfig_dict) # type: ignore[arg-type] @@ -217,10 +634,7 @@ def convert(model: GraphModule, is_reference: bool = False, custom_module_classes = get_custom_module_class_keys( convert_custom_config_dict, "observed_to_quantized_custom_module_class") - matches = find_matches( - model.graph, modules, patterns, - qconfig_map, - custom_module_classes=custom_module_classes) + custom_module_class_mapping = convert_custom_config_dict.get("observed_to_quantized_custom_module_class", {}) if model._equalization_qconfig_map is not None: # If we want to do equalization then do the following: @@ -233,353 +647,167 @@ def convert(model: GraphModule, is_reference: bool = False, # for dynamic quant ops or weight only quant ops run_weight_observers(model) - quantized_graph = Graph() - env: Dict[str, Dict[Optional[torch.dtype], Node]] = defaultdict(lambda: defaultdict(Node)) # type: ignore[arg-type] - graph_inputs: List[str] = [] for node in model.graph.nodes: if node.op == 'placeholder': graph_inputs.append(node.name) - def load_non_quantized(n: Node) -> Node: - assert n.name in env, \ - 'trying to load float node but did not find ' + \ - 'node:' + n.name + \ - ' in env: ' + \ - str(env) - dtype_to_node = env[n.name] - if torch.float in dtype_to_node: - return dtype_to_node[torch.float] - elif None in dtype_to_node: - return dtype_to_node[None] - else: - quantized_node = None - for dtype in [torch.quint8, torch.qint8, torch.float16]: - if dtype in dtype_to_node: - quantized_node = dtype_to_node[dtype] - break - assert quantized_node is not None, "Did not find a supported quantized dtype:{}".format(dtype_to_node) - env[n.name][torch.float] = Proxy(quantized_node).dequantize().node - return env[n.name][torch.float] - - def load_quantized(dtype: torch.dtype): - def load_quantized_impl(n: Node): - assert n.name in env, \ - 'trying to load quantized node but did not find node:' + \ - n.name + ' in environment:' + str(env) - dtype_to_node = env[n.name] - local_dtype : Optional[torch.dtype] = dtype - if local_dtype == torch.float and local_dtype not in dtype_to_node: - local_dtype = None - if local_dtype in [torch.float, None]: - return load_non_quantized(n) - assert local_dtype in dtype_to_node, f'Expecting {dtype} in {dtype_to_node}' - return dtype_to_node[local_dtype] - - return load_quantized_impl - - def load_x(n: Node) -> Node: - assert n.name in env, \ - 'node ' + n.name + ' does not exist in environment' - dtype_to_node = env[n.name] - dtypes = [torch.quint8, torch.qint8, torch.float16, torch.float32, None] - for dtype in dtypes: - if dtype in dtype_to_node: - return dtype_to_node[dtype] - raise Exception(f'dtype {dtype} not found in environment: {dtype_to_node} for node {n.name}') - - def load_arg( - quantized: Optional[Union[List[int], Dict[int, torch.dtype], torch.dtype, Tuple[int, ...]]] - ) -> Callable[[Node], Argument]: + # TODO: move this outside of this function + def 
replace_observer_with_quantize_dequantize_node( + model: torch.nn.Module, + graph: Graph, + node: Node, + modules: Dict[str, torch.nn.Module], + node_name_to_scope: Dict[str, Tuple[str, type]], + qconfig_map: Dict[str, QConfigAny]) -> None: + """ Replace activation_post_process module call node with quantize and + dequantize node + + Before: + ... -> observer_0(x) -> ... + After: + ... -> torch.quantize_per_tensor(x, ...) -> x.dequantize() -> ... """ - Input: quantized, which can be None, torch.dtype, list or tuple - - if quantized is None, then we'll load the node as long as it - exists - - if quantized is a dtype, then all args will be - quantized to the specific dtype - - if quantized is an empty list or tuple, then it is the same as load_arg(quantized=torch.float) - - if quantized is a list or tuple, then arg should be a list and - the args with corresponding indexes will be quantized to torch.quint8 - - - Output: fn which takes arg_or_args, and loads them from the - corresponding environment depending on the value of quantized. - """ - assert quantized is None or \ - isinstance(quantized, (tuple, list, dict, torch.dtype)), type(quantized) - if isinstance(quantized, (tuple, list, dict)) and len(quantized) == 0: - # empty tuple or list means nothing is quantized - quantized = torch.float - - def load_arg_impl(arg_or_args): - # we'll update the format of `quantized` - # to better match arg_or_args - updated_quantized: Optional[Union[List[int], torch.dtype, Dict[int, torch.dtype], Tuple[int, ...]]] = quantized - - if isinstance(quantized, (tuple, list)) and \ - len(quantized) == 1 and isinstance(arg_or_args, Node): - # when argument is one Node instead of tuple, we just need to check - # 0 is in the quantized list - if 0 in quantized: - updated_quantized = torch.quint8 - - if updated_quantized is None: - return map_arg(arg_or_args, load_x) - if isinstance(updated_quantized, torch.dtype): - return map_arg( - arg_or_args, - load_quantized(updated_quantized)) - elif isinstance(updated_quantized, (tuple, list)): - assert isinstance(arg_or_args, (tuple, list)), arg_or_args - loaded_args = [] - # for now, we only support quantizing positional arguments - for i, a in enumerate(arg_or_args): - if i in updated_quantized: - # Currently it's hardcoded to torch.quint8, we can extend this - # in the future to support all quantized - # dtypes - loaded_args.append(map_arg(a, load_quantized(torch.quint8))) - else: - loaded_args.append(map_arg(a, load_non_quantized)) - return type(arg_or_args)(loaded_args) - elif isinstance(updated_quantized, dict): - loaded_args = [] - for i, a in enumerate(arg_or_args): - if i in updated_quantized: - loaded_args.append(map_arg(a, load_quantized(updated_quantized[i]))) - else: - loaded_args.append(map_arg(a, load_non_quantized)) - return type(arg_or_args)(loaded_args) - return load_arg_impl - - def node_arg_is_quantized(node_arg: Any) -> bool: - if isinstance(node_arg, Node): - assert node_arg.name in env, \ - 'Expecting node_arg to be in the environment' - if node_arg.name in env: - dtype_to_node = env[node_arg.name] - return any([x in dtype_to_node for x in [torch.quint8, torch.qint8, torch.float16]]) - else: - return False - elif isinstance(node_arg, list): - quantized = map(node_arg_is_quantized, node_arg) - if all(quantized): - return True - elif not any(quantized): - return False - else: - raise Exception( - "partially quantized inputs in list not handled yet") - else: - return False - - def is_output_quantized( - node: Node, obj: QuantizeHandler, qconfig: 
QConfigAny, - modules: Dict[str, torch.nn.Module]) -> bool: - """ Check if output node is quantized or not """ - assert modules is not None - # for some ops the output is quantized only when `is_reference` is True - # and when `is_reference` is False, it has limited qconfig - # support, for example `add` - # ideally this check should not happen here, it should happen either in - # prepare or during lowering, we don't need this check - # after the default path is changed to produce reference patterns - quantized = obj.is_output_quantized(qconfig) - - # Need to get correct quantized/non-quantized state forn the output - # of FixedQParamsQuantizeHandler - # TODO: we may want to try to remove the special case here - # as well - if obj.should_mark_output_quantized_from_input_quantized_status(qconfig): - assert node.op in [ - 'call_module', - 'call_function', - 'call_method'], \ - 'FixedQParamsQuantizeHandler of type ' + node.op + ' is not handled' - # TODO: need to extend this to consider all relevant args instead of just arg[0] - quantized = node_arg_is_quantized(node.args[0]) - - # the output is unquantized if the node is not a CopyNode - # or the activation is not statically quantized - if not activation_is_statically_quantized(qconfig) or \ - not obj.input_output_observed(): - quantized = False - if node_return_type_is_int(node): - quantized = False - - return quantized - - def insert_quantize_node(node: Node, modules: Dict[str, torch.nn.Module]) -> None: - """ Given a activation_post_process module call node, insert a - quantize node""" assert modules is not None assert isinstance(node.target, str) + module_path, prefix = get_module_path_and_prefix(node, node_name_to_scope, qconfig_map) observer_module = modules[node.target] - prev_node = node.args[0] - if observer_module.dtype == torch.float32: - # copy the observer for fp32 dtype - env[node.name][torch.float] = quantized_graph.node_copy( - node, load_non_quantized) - elif isinstance(prev_node, Node) and prev_node.name in env: - # if previous node is already quantized, we'll just remove the - # activation_post_process - prev_dtype_to_node: Dict[Optional[torch.dtype], Node] = env[prev_node.name] - current_dtype: Optional[torch.dtype] = observer_module.dtype # type: ignore[assignment] - if current_dtype in prev_dtype_to_node: - env[node.name][current_dtype] = prev_dtype_to_node[current_dtype] - else: - root_module = modules[""] - assert isinstance(prev_node, Node) - observer_dtype: torch.dtype = observer_module.dtype # type: ignore[assignment] - env[node.name][observer_dtype] = \ - quantize_node( - load_non_quantized(prev_node), - observer_module, node, modules, quantized_graph, - node_name_to_scope, is_input=True) + maybe_quantize_node_info = get_quantize_node_info(observer_module) + # Skip replacing observers to quant/dequant nodes if the qconfigs of all + # consumers and producers of this observer are None + skip_replacement = all([ + has_none_qconfig(n, qconfig_map) for n in + list(node.args) + list(node.users.keys())]) + if skip_replacement or maybe_quantize_node_info is None: + # didn't find correponding quantize op and info for the observer_module + # so we just remove the observer + with graph.inserting_before(node): + node.replace_all_uses_with(node.args[0]) + graph.erase_node(node) else: - # replace activation post process with quantization ops - root_module = modules[""] - assert isinstance(node.args[0], Node) - dtype: torch.dtype = observer_module.dtype # type: ignore[assignment] - env[node.name][dtype] = \ - quantize_node( - 
load_non_quantized(node.args[0]), - observer_module, node, modules, - quantized_graph, - node_name_to_scope, is_input=True) + # otherwise, we can convert the observer moduel call to quantize/dequantize node + node_type, quantize_op, qparams = maybe_quantize_node_info + # replace observer node with quant - dequant node + with graph.inserting_before(node): + input_node = node.args[0] + inputs = [input_node] + for key, value in qparams.items(): + # TODO: we can add the information of whether a value needs to + # be registered as an attribute in qparams dict itself + if key in ['_scale_', '_zero_point_']: + # For scale and zero_point values we register them as buffers in the root module. + # TODO: maybe need more complex attr name here + qparam_node = create_getattr_from_value(model, graph, module_path + prefix + key, value) + inputs.append(qparam_node) + else: + # for qparams that are not scale/zero_point (like axis, dtype) we store them as literals in the graph. + inputs.append(value) + + quantized_node = graph.create_node(node_type, quantize_op, tuple(inputs), {}) + dequantized_node = graph.call_method("dequantize", args=(quantized_node,)) + node.replace_all_uses_with(dequantized_node) + graph.erase_node(node) + + # this is a temporary hack for custom module, we may want to implement + # this properly after the custom module class design is finalized + def replace_observer_with_dequantize_node(node: Node, graph: Graph): + call_custom_module_node = node.args[0] + assert isinstance(call_custom_module_node, Node), \ + f"Expecting the for call custom module node to be a Node, but got {call_custom_module_node}" + node.replace_all_uses_with(call_custom_module_node) + graph.erase_node(node) + insert_dequantize_node(call_custom_module_node, graph) # additional state to override inputs to be quantized, if specified # by the user placeholder_node_seen_cnt = 0 - output_node_seen_cnt = 0 input_quantized_idxs: List[int] = prepare_custom_config_dict.get( "input_quantized_idxs", []) output_quantized_idxs: List[int] = prepare_custom_config_dict.get( "output_quantized_idxs", []) - for node in model.graph.nodes: - if node.op == "output": - cur_output_node_idx = output_node_seen_cnt - output_node_seen_cnt += 1 - if cur_output_node_idx in output_quantized_idxs: - # Result are kept quantized if the user specified the - # output_quantized_idxs override. - graph_output = map_arg(node.args[0], load_x) - else: - graph_output = map_arg(node.args[0], load_non_quantized) - quantized_graph.output(graph_output) - continue - root_node, matched, matched_pattern, obj, qconfig = \ - matches.get(node.name, (None, None, None, None, None)) - if root_node is node: - is_observed_standalone_module_node = ( - node.op == 'call_module' and - is_observed_standalone_module( - modules[node.target]) - ) - if qconfig is None and not is_observed_standalone_module_node: - result = quantized_graph.node_copy( - node, load_non_quantized) - quantized = False - # If there are QAT swapped modules in the graph that we don't want to quantize, rever them back to FP32 ones. 
- if node.op == 'call_module' and type(modules[node.target]) in DEFAULT_QAT_MODULE_MAPPINGS.values(): - float_mod = modules[node.target].to_float() - setattr(model, node.name, float_mod) - with model.graph.inserting_before(node): - new_float_node = model.graph.create_node('call_module', node.name, node.args, node.kwargs) - else: - assert obj is not None - # We will get whether the output is quantized or not before - # convert for standalone module and after convert - # for non-standalone module, since _standalone_module_output_quantized_idxs - # is only available in observed standalone module - if is_observed_standalone_module_node: - out_quant_idxs = modules[node.target]._standalone_module_output_quantized_idxs.tolist() # noqa: B950 - assert len(out_quant_idxs) <= 1, "Currently standalone only support one output" - quantized = 0 in out_quant_idxs - - qconfig = qconfig_map[node.name] - # Note: load_arg can be overwritten in the convert method when used to - # create Node in graph - result = obj.convert( - node, qconfig, modules, quantized_graph, node_name_to_scope, load_arg, is_reference=is_reference, - convert_custom_config_dict=convert_custom_config_dict) - if not is_observed_standalone_module_node: - quantized = is_output_quantized(node, obj, qconfig, modules) - - if quantized: - env[node.name][activation_dtype(qconfig)] = result - else: - env[node.name][torch.float] = result - continue - elif root_node is not None: - if qconfig is None: - # This branch is hit if all of these conditions are met: - # 1. we are in a fusion pattern of multiple nodes (i.e. add-relu) - # 2. the current node is not the "root_node" of the pattern - # 3. quantization for this pattern is disabled - # - # In this case, we need to make sure to populate the env with - # intermediate nodes manually, because the QuantizeHandler.convert - # function will not be called. - result = quantized_graph.node_copy( - node, load_non_quantized) - env[node.name][torch.float] = result - continue + if backend_config_dict is None: + quantized_reference_module_mapping = copy.deepcopy(DEFAULT_REFERENCE_STATIC_QUANT_MODULE_MAPPINGS) + else: + quantized_reference_module_mapping = get_quantized_reference_module_mapping(backend_config_dict) + # convert tuples so that it can work with isinstance(module, tuple_of_classes) + weighted_module_classes = tuple(quantized_reference_module_mapping.keys()) + statically_quantized_custom_module_nodes: Set[Node] = set() - # handle activation post process calls - if node.op == 'call_module' and \ - is_activation_post_process(modules[node.target]): - insert_quantize_node(node, modules) - elif node.op == 'placeholder': + for node in list(model.graph.nodes): + if node.op == 'placeholder': cur_placeholder_node_idx = placeholder_node_seen_cnt placeholder_node_seen_cnt += 1 if cur_placeholder_node_idx in input_quantized_idxs: - env[node.name][torch.quint8] = quantized_graph.node_copy( - node, load_non_quantized) + # Inputs are assumed to be quantized if the user specifid the + # input_quantized_idxs override. + # we need to dequantize the inputs since all operators took + # floating point inputs in reference quantized models + insert_dequantize_node(node, model.graph) + elif node.op == "output": + # If the argument is empty we don't need to do anything + if len(output_quantized_idxs) == 0: + continue + # Result are kept quantized if the user specified the + # output_quantized_idxs override. 
+ # Remove the dequantize operator for the node in the end if any + return_node = node + output = node.args[0] + # outputs can be Node, list, tuple, dict, other cases are not supported yet + if isinstance(output, (list, tuple)): + for idx in output_quantized_idxs: + maybe_recursive_remove_dequantize(output[idx], return_node, model.graph) + elif isinstance(output, (Node, dict)): + # we treat dict as a single argument currently, but it can be extended + # to support {"key": dtype} after we change output_quantized_idxs to + # dict + if 0 in output_quantized_idxs: + maybe_recursive_remove_dequantize(output, return_node, model.graph) else: - env[node.name][torch.float] = \ - quantized_graph.node_copy(node, load_non_quantized) - else: - # copy quantized or non-quantized node - # get_tensor_info_node like shape works for both - # quantized and non-quantized input and output a non-Tensor - # (we use None for dtype currently for non-Tensors) - if is_get_tensor_info_node(node): - env[node.name][None] = \ - quantized_graph.node_copy(node, load_x) - else: - env[node.name][torch.float] = \ - quantized_graph.node_copy(node, load_non_quantized) + warnings.warn(f"Unsupported node type for output_quantized_idxs: {type(output)}") + elif node.op == "call_module": + if is_activation_post_process(modules[node.target]): + observed_node = node.args[0] + if observed_node in statically_quantized_custom_module_nodes: + replace_observer_with_dequantize_node(node, model.graph) + else: + replace_observer_with_quantize_dequantize_node( + model, model.graph, node, modules, node_name_to_scope, + qconfig_map) + elif is_observed_standalone_module(modules[node.target]): + convert_standalone_module( + node, modules, model, is_reference, backend_config_dict) + elif type(modules[node.target]) in set( + weighted_module_classes).union(QAT_MODULE_CLASSES).union(FUSED_MODULE_CLASSES): + # extra check for fused module classes to make sure they are fused module classes + # of target modules + if type(modules[node.target]) in FUSED_MODULE_CLASSES and \ + type(modules[node.target][0]) not in FLOAT_WEIGHTED_MODULE_CLASSES: + continue + convert_weighted_module( + node, modules, observed_node_names, quantized_reference_module_mapping, qconfig_map) + elif type(modules[node.target]) in custom_module_classes: + convert_custom_module( + node, model.graph, modules, custom_module_class_mapping, + statically_quantized_custom_module_nodes) - # remove activation post process - act_post_process_removed_graph = Graph() - remove_env: Dict[str, Node] = {} - - def load_arg_remove(a: Argument) -> Argument: - return map_arg(a, lambda node: remove_env[node.name]) + preserved_attributes = set(convert_custom_config_dict.get("preserved_attributes", [])) + model = QuantizedGraphModule(model, copy.deepcopy(model.graph), preserved_attributes) - for node in quantized_graph.nodes: - if node.op == 'output': - act_post_process_removed_graph.output( - map_arg(node.args[0], load_arg_remove)) - continue - if node.op == 'call_module' and \ - is_activation_post_process(modules[node.target]): - # remove activation post process node - remove_env[node.name] = remove_env[node.args[0].name] - else: - remove_env[node.name] = act_post_process_removed_graph.node_copy( - node, load_arg_remove) + # remove deadcode after converting observers to quant/dequant ops + model.graph.eliminate_dead_code() + model.recompile() - # removes qconfig and activation_post_process modules - if _remove_qconfig_flag: - _remove_qconfig(model) - preserved_attributes = 
set(convert_custom_config_dict.get("preserved_attributes", [])) - model = QuantizedGraphModule(model, act_post_process_removed_graph, preserved_attributes) + # TODO: maybe move this to quantize_fx.py if not is_reference: model = duplicate_dequantize_node(model) + model = duplicate_quantize_dynamic_node(model) model = lower_to_fbgemm(model, qconfig_map, node_name_to_scope) model = remove_quant_dequant_pairs(model) model = remove_extra_dequantize(model) + # TODO: this looks hacky, we want to check why we need this and see if we can + # remove this + # removes qconfig and activation_post_process modules + if _remove_qconfig_flag: + _remove_qconfig(model) return model diff --git a/torch/ao/quantization/fx/fuse.py b/torch/ao/quantization/fx/fuse.py index a8d48420c8d34b..c7f4444c6a0317 100644 --- a/torch/ao/quantization/fx/fuse.py +++ b/torch/ao/quantization/fx/fuse.py @@ -4,9 +4,6 @@ map_arg ) from torch.fx.graph import Graph -from ..utils import ( - get_combined_dict -) from .graph_module import ( FusedGraphModule ) @@ -15,13 +12,14 @@ MatchAllNode, ) from .pattern_utils import ( - get_default_fusion_patterns, + sorted_patterns_dict, ) from .backend_config.utils import get_fusion_pattern_to_fuse_handler_cls from .backend_config.utils import get_fuser_method_mapping from .backend_config.utils import get_fusion_pattern_to_root_node_getter from .backend_config.utils import get_fusion_pattern_to_extra_inputs_getter +from .backend_config import get_native_backend_config_dict from .fusion_patterns import * # noqa: F401,F403 @@ -42,21 +40,14 @@ def fuse( input_graph = model.graph named_modules = dict(input_root.named_modules()) - # TODO: remove this branch after we define the configurations for the - # default/native backend if backend_config_dict is None: - additional_fusion_patterns = \ - fuse_custom_config_dict.get("additional_fusion_pattern", {}) - fusion_pattern_to_fuse_handler_cls = get_combined_dict( - get_default_fusion_patterns(), additional_fusion_patterns) - fuser_method_mapping = None - fusion_pattern_to_root_node_getter = {} - fusion_pattern_to_extra_inputs_getter = {} - else: - fusion_pattern_to_fuse_handler_cls = get_fusion_pattern_to_fuse_handler_cls(backend_config_dict) - fuser_method_mapping = get_fuser_method_mapping(backend_config_dict) - fusion_pattern_to_root_node_getter = get_fusion_pattern_to_root_node_getter(backend_config_dict) - fusion_pattern_to_extra_inputs_getter = get_fusion_pattern_to_extra_inputs_getter(backend_config_dict) + backend_config_dict = get_native_backend_config_dict() + + fusion_pattern_to_fuse_handler_cls = sorted_patterns_dict(get_fusion_pattern_to_fuse_handler_cls(backend_config_dict)) + fuser_method_mapping = get_fuser_method_mapping(backend_config_dict) + fusion_pattern_to_root_node_getter = get_fusion_pattern_to_root_node_getter(backend_config_dict) + fusion_pattern_to_extra_inputs_getter = get_fusion_pattern_to_extra_inputs_getter(backend_config_dict) + # find fusion fusion_pairs = _find_matches( input_root, input_graph, fusion_pattern_to_fuse_handler_cls) @@ -111,6 +102,7 @@ def _find_matches( # a map from node to the matched subpattern node_to_subpattern: Dict[Node, Any] = {} + # TODO: dedup with quantization matching function in match_utils.py def apply_match(pattern, node, match, matched_node_pattern, node_to_subpattern): if isinstance(pattern, tuple): s, *args = pattern @@ -122,10 +114,13 @@ def apply_match(pattern, node, match, matched_node_pattern, node_to_subpattern): else: # the first pattern matches will take precedence if node.name not in 
match_map: - node_to_subpattern[node] = pattern matched_node_pattern.append(node) - root_node, pattern, handler = match - match_map[node.name] = (root_node, pattern, matched_node_pattern, handler, node_to_subpattern) + # MatchAllNode here is actually MatchAllInputNode which should not + # be added to match_map + if pattern is not MatchAllNode: + node_to_subpattern[node] = pattern + root_node, pattern, handler = match + match_map[node.name] = (root_node, pattern, matched_node_pattern, handler, node_to_subpattern) for node in reversed(graph.nodes): if node.name not in match_map: @@ -133,5 +128,6 @@ def apply_match(pattern, node, match, matched_node_pattern, node_to_subpattern): matched_node_pattern: List[Node] = [] if is_match(modules, node, pattern): apply_match(pattern, node, (node, pattern, value(node)), matched_node_pattern, node_to_subpattern) + break return match_map diff --git a/torch/ao/quantization/fx/fusion_patterns.py b/torch/ao/quantization/fx/fusion_patterns.py index aa4d39c831562b..70a2701e5ac174 100644 --- a/torch/ao/quantization/fx/fusion_patterns.py +++ b/torch/ao/quantization/fx/fusion_patterns.py @@ -1,8 +1,5 @@ import torch from torch.fx.graph import Node, Graph -from .pattern_utils import ( - register_fusion_pattern, -) from ..utils import _parent_name from .quantization_types import NodePattern, Pattern from ..fuser_method_mappings import get_fuser_method_new @@ -34,31 +31,7 @@ def fuse(self, is_qat: bool) -> Node: pass -@register_fusion_pattern((torch.nn.ReLU, torch.nn.Conv1d)) -@register_fusion_pattern((torch.nn.ReLU, torch.nn.Conv2d)) -@register_fusion_pattern((torch.nn.ReLU, torch.nn.Conv3d)) -@register_fusion_pattern((torch.nn.functional.relu, torch.nn.Conv1d)) -@register_fusion_pattern((torch.nn.functional.relu, torch.nn.Conv2d)) -@register_fusion_pattern((torch.nn.functional.relu, torch.nn.Conv3d)) -@register_fusion_pattern((torch.nn.functional.relu, torch.nn.Linear)) -@register_fusion_pattern((torch.nn.ReLU, torch.nn.Linear)) -@register_fusion_pattern((torch.nn.functional.relu, torch.nn.BatchNorm2d)) -@register_fusion_pattern((torch.nn.ReLU, torch.nn.BatchNorm2d)) -@register_fusion_pattern((torch.nn.functional.relu, torch.nn.BatchNorm3d)) -@register_fusion_pattern((torch.nn.ReLU, torch.nn.BatchNorm3d)) -@register_fusion_pattern((torch.nn.BatchNorm1d, torch.nn.Conv1d)) -@register_fusion_pattern((torch.nn.BatchNorm2d, torch.nn.Conv2d)) -@register_fusion_pattern((torch.nn.BatchNorm3d, torch.nn.Conv3d)) -@register_fusion_pattern((torch.nn.BatchNorm1d, torch.nn.Linear)) -@register_fusion_pattern((torch.nn.ReLU, (torch.nn.BatchNorm1d, torch.nn.Conv1d))) -@register_fusion_pattern((torch.nn.ReLU, (torch.nn.BatchNorm2d, torch.nn.Conv2d))) -@register_fusion_pattern((torch.nn.ReLU, (torch.nn.BatchNorm3d, torch.nn.Conv3d))) -@register_fusion_pattern((torch.nn.functional.relu, (torch.nn.BatchNorm1d, torch.nn.Conv1d))) -@register_fusion_pattern((torch.nn.functional.relu, (torch.nn.BatchNorm2d, torch.nn.Conv2d))) -@register_fusion_pattern((torch.nn.functional.relu, (torch.nn.BatchNorm3d, torch.nn.Conv3d))) -@register_fusion_pattern((torch.nn.BatchNorm1d, torch.nn.ConvTranspose1d)) -@register_fusion_pattern((torch.nn.BatchNorm2d, torch.nn.ConvTranspose2d)) -@register_fusion_pattern((torch.nn.BatchNorm3d, torch.nn.ConvTranspose3d)) +# TODO: move this to backend_config.fuse_handler class DefaultFuseHandler(FuseHandler): def __init__( self, @@ -75,11 +48,9 @@ def fuse(self, fuse_custom_config_dict: Dict[str, Any], fuser_method_mapping: Optional[Dict[Pattern, 
Union[torch.nn.Sequential, Callable]]], is_qat: bool) -> Node: - additional_fuser_method_mapping = fuse_custom_config_dict.get("additional_fuser_method_mapping", {}) assert root_node.op == "call_module", "Expecting module node to be a call_module Node" root_module = named_modules[str(root_node.target)] - assert len(additional_fuser_method_mapping) == 0, "Fusion implementation is " - "undergoing changes, additoinal_fuser_method_mapping is not supported currently." + def get_modules(pattern): """ Given a node pattern, extract the corresponding modules e.g. input: (relu_node, (bn_node, conv_node)) diff --git a/torch/ao/quantization/fx/graph_module.py b/torch/ao/quantization/fx/graph_module.py index ef43a42d030ff7..2e37e4a557e47e 100644 --- a/torch/ao/quantization/fx/graph_module.py +++ b/torch/ao/quantization/fx/graph_module.py @@ -18,7 +18,7 @@ def __init__(self, root: Union[torch.nn.Module, Dict[str, Any]], graph: Graph, p def __deepcopy__(self, memo): fake_mod = torch.nn.Module() fake_mod.__dict__ = copy.deepcopy(self.__dict__) - return FusedGraphModule(fake_mod, self.graph, self.preserved_attr_names) + return FusedGraphModule(fake_mod, copy.deepcopy(self.graph), copy.deepcopy(self.preserved_attr_names)) class ObservedGraphModule(GraphModule): @@ -45,7 +45,7 @@ def __init__(self, root: Union[torch.nn.Module, Dict[str, Any]], graph: Graph, p def __deepcopy__(self, memo): fake_mod = torch.nn.Module() fake_mod.__dict__ = copy.deepcopy(self.__dict__) - return ObservedGraphModule(fake_mod, self.graph, self.preserved_attr_names) + return ObservedGraphModule(fake_mod, copy.deepcopy(self.graph), copy.deepcopy(self.preserved_attr_names)) def is_observed_module(module: Any) -> bool: return isinstance(module, ObservedGraphModule) @@ -60,7 +60,7 @@ def __init__(self, root: Union[torch.nn.Module, Dict[str, Any]], graph: Graph, p def __deepcopy__(self, memo): fake_mod = torch.nn.Module() fake_mod.__dict__ = copy.deepcopy(self.__dict__) - return ObservedStandaloneGraphModule(fake_mod, self.graph, self.preserved_attr_names) + return ObservedStandaloneGraphModule(fake_mod, copy.deepcopy(self.graph), copy.deepcopy(self.preserved_attr_names)) def is_observed_standalone_module(module: Any) -> bool: return isinstance(module, ObservedStandaloneGraphModule) @@ -104,4 +104,4 @@ def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, def __deepcopy__(self, memo): fake_mod = torch.nn.Module() fake_mod.__dict__ = copy.deepcopy(self.__dict__) - return QuantizedGraphModule(fake_mod, self.graph, self.preserved_attr_names) + return QuantizedGraphModule(fake_mod, copy.deepcopy(self.graph), copy.deepcopy(self.preserved_attr_names)) diff --git a/torch/ao/quantization/fx/match_utils.py b/torch/ao/quantization/fx/match_utils.py index 876bc39d547132..a1217ec2f8973c 100644 --- a/torch/ao/quantization/fx/match_utils.py +++ b/torch/ao/quantization/fx/match_utils.py @@ -7,8 +7,6 @@ from .quantization_types import Pattern from .quantization_patterns import ( QuantizeHandler, - CustomModuleQuantizeHandler, - StandaloneModuleQuantizeHandler, ) from ..qconfig import ( QConfigAny, @@ -76,6 +74,7 @@ def find_matches( graph: Graph, modules: Dict[str, torch.nn.Module], patterns: Dict[Pattern, QuantizeHandler], + root_node_getter_mapping: Dict[Pattern, Callable], qconfig_map: Dict[str, QConfigAny], standalone_module_names: List[str] = None, standalone_module_classes: List[Callable] = None, @@ -114,29 +113,80 @@ def find_matches( match_map: Dict[str, MatchResult] = {} all_matched : Set[str] = set() - def 
record_match(pattern, node, matched): + def _recursive_record_node_in_match_map( + last_node, + match_map, + node_pattern, + matched_node_pattern, + pattern, + match_value, + qconfig): + if isinstance(node_pattern, Node): + match_map[node_pattern.name] = ( + last_node, matched_node_pattern, pattern, match_value, qconfig) + else: + for n in node_pattern: + _recursive_record_node_in_match_map(last_node, match_map, n, matched_node_pattern, pattern, match_value, qconfig) + + # TODO: 1. merge with fuse matcher 2. document the code + def record_match( + pattern, + node, + last_node, + matched_node_pattern, + match_map): if isinstance(pattern, tuple): s, *args = pattern - record_match(s, node, matched) + current_node_pattern: List[Node] = [] + record_match( + s, + node, + last_node, + matched_node_pattern, + match_map) if pattern[0] is not getattr: for subpattern, arg in zip(args, node.args): - record_match(subpattern, arg, matched) + record_match( + subpattern, + arg, + node, + current_node_pattern, + match_map) + if len(current_node_pattern) > 1: + matched_node_pattern.append(tuple(current_node_pattern)) + else: + matched_node_pattern.append(current_node_pattern[0]) else: - matched.append(node) + matched_node_pattern.append(node) - cache_for_no_tensor_check: Dict[Node, bool] = dict() for node in reversed(graph.nodes): if node.name not in match_map and node.name not in all_matched: - for pattern, value in patterns.items(): - if is_match(modules, node, pattern): - matched: List[Any] = [] - record_match(pattern, node, matched) - for n in matched: - match_map[n.name] = ( - node, matched, pattern, value(node, modules), # type: ignore[operator] - qconfig_map[n.name]) - all_matched.add(n.name) - # break after finding the first match + for pattern, quantize_handler_cls in patterns.items(): + root_node_getter = root_node_getter_mapping.get(pattern, None) + if is_match(modules, node, pattern) and node.name not in match_map: + matched_node_pattern: List[Node] = [] + record_match( + pattern, + node, + node, + matched_node_pattern, + match_map) + quantize_handler = quantize_handler_cls( # type: ignore[operator] + matched_node_pattern, + modules, + root_node_getter) + last_node = node + # record the match for all nodes in the pattern + _recursive_record_node_in_match_map( + last_node, + match_map, + # we need to record all nodes in the matched pattern in the match_map + matched_node_pattern, + # this is a part of the value corresponding to the node + matched_node_pattern, + pattern, + quantize_handler, + qconfig_map[node.name]) break # add custom module instances to the match result @@ -146,7 +196,7 @@ def record_match(pattern, node, matched): type(modules[node.target]) in custom_module_classes: custom_module_qconfig = qconfig_map[node.name] match_map[node.name] = ( - node, [node], None, CustomModuleQuantizeHandler(node, modules), + node, node, None, QuantizeHandler(node, modules, is_custom_module=True), custom_module_qconfig) def is_standalone_module(node_target: str, modules: Dict[str, torch.nn.Module]): @@ -162,10 +212,10 @@ def is_standalone_module(node_target: str, modules: Dict[str, torch.nn.Module]): (is_standalone_module(node.target, modules) or is_observed_standalone_module(modules[node.target])): # add node to matched nodes - custom_module_qconfig = qconfig_map[node.name] + standalone_module_qconfig = qconfig_map[node.name] match_map[node.name] = ( - node, [node], None, - StandaloneModuleQuantizeHandler(node, modules), - custom_module_qconfig) + node, node, None, + QuantizeHandler(node, modules, 
is_standalone_module=True), + standalone_module_qconfig) return match_map diff --git a/torch/ao/quantization/fx/pattern_utils.py b/torch/ao/quantization/fx/pattern_utils.py index bba17d730d6ac2..7c8c034108c4fd 100644 --- a/torch/ao/quantization/fx/pattern_utils.py +++ b/torch/ao/quantization/fx/pattern_utils.py @@ -8,7 +8,7 @@ from ..fake_quantize import FixedQParamsFakeQuantize # from .quantization_patterns import BinaryOpQuantizeHandler from ..observer import ObserverBase - +import copy # TODO(future PR): fix the typing on QuantizeHandler (currently a circular dependency) QuantizeHandler = Any @@ -25,7 +25,7 @@ def insert(fn): return insert def get_default_fusion_patterns() -> Dict[Pattern, QuantizeHandler]: - return DEFAULT_FUSION_PATTERNS + return copy.copy(DEFAULT_FUSION_PATTERNS) DEFAULT_QUANTIZATION_PATTERNS = OrderedDict() @@ -47,15 +47,15 @@ def insert(fn): # Get patterns for both static quantization and qat def get_default_quant_patterns() -> Dict[Pattern, QuantizeHandler]: - return DEFAULT_QUANTIZATION_PATTERNS + return copy.copy(DEFAULT_QUANTIZATION_PATTERNS) # a map from pattern to output activation post process constructor # e.g. torch.sigmoid -> default_affine_fixed_qparam_fake_quant def get_default_output_activation_post_process_map(is_training) -> Dict[Pattern, ObserverBase]: if is_training: - return DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP + return copy.copy(DEFAULT_OUTPUT_FAKE_QUANTIZE_MAP) else: - return DEFAULT_OUTPUT_OBSERVER_MAP + return copy.copy(DEFAULT_OUTPUT_OBSERVER_MAP) # Example use of register pattern function: # @register_fusion_pattern(torch.nn.ReLU, (torch.nn.BatchNorm2d, torch.nn.Conv2d))) @@ -63,3 +63,27 @@ def get_default_output_activation_post_process_map(is_training) -> Dict[Pattern, # def __init__(...): # ... # + +def sorted_patterns_dict(patterns_dict: Dict[Pattern, QuantizeHandler]) -> Dict[Pattern, QuantizeHandler]: + """ + Return a sorted version of the patterns dictionary such that longer patterns are matched first, + e.g. match (F.relu, F.linear) before F.relu. + This works for current use cases, but we may need to have a more clever way to sort + things to address more complex patterns + """ + + def get_len(pattern): + """ this will calculate the length of the pattern by counting all the entries + in the pattern. 
+ this will make sure (nn.ReLU, (nn.BatchNorm, nn.Conv2d)) comes before + (nn.BatchNorm, nn.Conv2d) so that we can match the former first + """ + len = 0 + if isinstance(pattern, tuple): + for item in pattern: + len += get_len(item) + else: + len += 1 + return len + + return OrderedDict(sorted(patterns_dict.items(), key=lambda kv: -get_len(kv[0]) if isinstance(kv[0], tuple) else 1)) diff --git a/torch/ao/quantization/fx/prepare.py b/torch/ao/quantization/fx/prepare.py index 3c50565d60b856..f3a490258d1451 100644 --- a/torch/ao/quantization/fx/prepare.py +++ b/torch/ao/quantization/fx/prepare.py @@ -30,11 +30,12 @@ from .quantization_patterns import ( QuantizeHandler, - CustomModuleQuantizeHandler, - StandaloneModuleQuantizeHandler, ) -from .quantization_types import Pattern +from .quantization_types import ( + Pattern, + NodePattern +) from ._equalize import ( is_equalization_observer, @@ -48,7 +49,7 @@ from .pattern_utils import ( MatchResult, - get_default_quant_patterns, + sorted_patterns_dict, ) from .match_utils import ( @@ -60,7 +61,7 @@ get_custom_module_class_keys, all_node_args_have_no_tensors, assert_and_get_unique_device, - node_bool_tensor_arg_indexes, + get_non_observable_arg_indexes_and_types, get_new_attr_name_with_prefix, NON_QUANTIZABLE_WEIGHT_OPS, WEIGHT_INDEX_DICT, @@ -77,7 +78,6 @@ ) from ..utils import ( - get_combined_dict, get_qconfig_dtypes, get_swapped_custom_module_class, activation_is_statically_quantized, @@ -89,11 +89,16 @@ get_pattern_to_dtype_configs, get_pattern_to_input_type_to_index, get_module_to_qat_module, + get_native_quant_patterns, + get_fusion_pattern_to_root_node_getter, ) from typing import Any, Callable, Dict, List, Optional, Tuple, Union, Set from collections import defaultdict +# list of dtypes to not add observers to +DO_NOT_OBS_DTYPE_LIST = [int, float, torch.bool, None] + def is_activation_post_process_node(node: Node, modules: Dict[str, torch.nn.Module]) -> bool: return isinstance(node, torch.fx.Node) and node.op == "call_module" and \ is_activation_post_process(modules[str(node.target)]) @@ -125,7 +130,7 @@ def node_arg_is_bias(node: Node, arg: Any) -> bool: def is_input_arg_dtype_supported_by_backend( arg: Argument, node: Node, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], dtype_config: Dict[str, torch.dtype], ) -> bool: """ Check if the configured qconfig for the argument @@ -152,7 +157,7 @@ def is_input_arg_dtype_supported_by_backend( def is_output_dtype_supported_by_backend( node: Node, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], dtype_config: Dict[str, torch.dtype], ) -> bool: """ Check if the configured qconfig for the output @@ -169,15 +174,15 @@ def is_observer_in_same_graph(node, modules, node_name_to_target_dtype): in a different place rather than not observed. 
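A minimal sketch of the ordering this produces, assuming a toy registry (the handler strings stand in for the real QuantizeHandler/FuseHandler classes): nested tuple patterns with more leaf entries sort ahead of their sub-patterns, so the longest fusion is tried first.

```python
from collections import OrderedDict
import torch.nn as nn

def _pattern_len(pattern):
    # count every leaf entry in a (possibly nested) pattern tuple
    if isinstance(pattern, tuple):
        return sum(_pattern_len(p) for p in pattern)
    return 1

# toy registry: values would normally be handler classes, not strings
patterns = {
    nn.Conv2d: "conv_handler",
    (nn.BatchNorm2d, nn.Conv2d): "conv_bn_handler",
    (nn.ReLU, (nn.BatchNorm2d, nn.Conv2d)): "conv_bn_relu_handler",
}

ordered = OrderedDict(
    sorted(patterns.items(),
           key=lambda kv: -_pattern_len(kv[0]) if isinstance(kv[0], tuple) else 1))

# match order: (ReLU, (BN, Conv2d)) -> (BN, Conv2d) -> Conv2d,
# so the longest fusion is tried before its sub-patterns
print(list(ordered.keys()))
```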
""" node_output_dtype = get_arg_target_dtype_as_output(node, modules, node_name_to_target_dtype) - if isinstance(node.args[0], Node): + if len(node.args) > 0 and isinstance(node.args[0], Node): if node_output_dtype == torch.quint8 and node.args[0].op == 'placeholder': return False return True def is_pattern_dtype_config_supported_by_backend( pattern: Optional[Pattern], - matched_nodes: Optional[List[Node]], - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + matched_node_pattern: Optional[NodePattern], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], backend_config_dict: Optional[Dict[str, Any]] ) -> bool: """ Check is the dtype configuration of a pattern is supported by @@ -185,14 +190,15 @@ def is_pattern_dtype_config_supported_by_backend( """ if backend_config_dict is None or pattern is None: return True - assert matched_nodes is not None and len(matched_nodes) >= 1 + assert matched_node_pattern is not None and len(matched_node_pattern) >= 1 pattern_to_dtype_configs = get_pattern_to_dtype_configs(backend_config_dict) dtype_configs: List[Dict[str, torch.dtype]] = pattern_to_dtype_configs.get(pattern, []) - # TODO: this only checks one input and one output, need to generalize to multiple + # TODO: this only works for one input and one output patterns, need to generalize to multiple # inputs/output - input_node = matched_nodes[-1] - output_node = matched_nodes[0] + root_node = _default_root_node_getter(matched_node_pattern) + input_node = root_node + output_node = matched_node_pattern[0] for dtype_config in dtype_configs: # check if arg dtype are supported supported = True @@ -243,6 +249,19 @@ def qat_swap_modules( module_to_qat_module: Dict[Callable, Callable]) -> None: convert(root, mapping=module_to_qat_module, inplace=True, remove_qconfig=False) +def add_matched_node_name_to_set(matched_node_pattern: NodePattern, s: Set[str]): + if isinstance(matched_node_pattern, Node): + s.add(matched_node_pattern.name) + elif isinstance(matched_node_pattern, (list, tuple)): + for maybe_node in matched_node_pattern: + add_matched_node_name_to_set(maybe_node, s) + +# this is temporary, will be removed soon +def _default_root_node_getter(node_pattern): + while not isinstance(node_pattern, Node): + node_pattern = node_pattern[-1] + return node_pattern + # TODO: remove observed_op, looks like it's not used def insert_observer( node: Node, @@ -283,7 +302,7 @@ def get_target_activation_dtype_for_node( qhandler: Optional[QuantizeHandler], modules: Dict[str, torch.nn.Module], cache_for_no_tensor_check: Dict[Node, bool], -) -> Dict[str, Optional[torch.dtype]]: +) -> Dict[str, Optional[Union[torch.dtype, type]]]: """ Returns the expected dtype of the input and output of this node after convert. 
If the value is not None, it represents the dtype of the @@ -329,7 +348,7 @@ def get_target_activation_dtype_for_node( # get qconfig to determine the eventual dtype of this node if qconfig is not None: - if qhandler is not None and qhandler.input_output_observed() and qhandler.is_output_quantized(qconfig): + if qhandler is not None and qhandler.input_output_observed(): act_dtype, weight_dtype, act_compute_dtype = \ get_qconfig_dtypes(qconfig) bias_dtype = torch.float16 \ @@ -337,6 +356,7 @@ def get_target_activation_dtype_for_node( else torch.float return { "input_activation_dtype": act_dtype, + "input_activation_compute_dtype": act_compute_dtype, "weight_dtype": weight_dtype, "bias_dtype": bias_dtype, "output_activation_dtype": act_dtype, @@ -372,8 +392,8 @@ def get_target_activation_dtype_for_node( def get_arg_target_dtype_as_output( arg: Node, modules: Dict[str, torch.nn.Module], - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], -) -> Optional[torch.dtype]: + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], +) -> Optional[Union[torch.dtype, type]]: """ Get the target output activation dtype for the argumnet in the original graph, skipping inserted observers We are assuming that the observers are inserted correctly, and the dtype for @@ -391,8 +411,8 @@ def get_arg_target_dtype_as_input_to_node( arg: Node, node: Node, modules: Dict[str, torch.nn.Module], - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], -) -> Optional[torch.dtype]: + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], +) -> Optional[Union[torch.dtype, type]]: """ Get the target argument dtype for the argument `arg`, as input to node `node` """ @@ -410,6 +430,24 @@ def get_arg_target_dtype_as_input_to_node( else: return node_name_to_target_dtype[node.name]["bias_dtype"] +def get_arg_target_compute_dtype_as_input_to_node( + arg: Node, + node: Node, + modules: Dict[str, torch.nn.Module], + node_name_to_target_dtype: Dict[str, Dict[str, Union[torch.dtype, type, None]]], +) -> Union[torch.dtype, type, None]: + """ Get the target argument dtype for the argument `arg`, as input + to node `node` + """ + assert isinstance(arg, Node) + is_weight = node_arg_is_weight(node, arg) + is_bias = node_arg_is_bias(node, arg) + is_activation = not is_weight and not is_bias + if is_activation and \ + "input_activation_compute_dtype" in node_name_to_target_dtype[node.name]: + return node_name_to_target_dtype[node.name]["input_activation_compute_dtype"] + else: + return None def maybe_insert_input_observer_for_arg_or_kwarg( node: Union[Node, Any], @@ -418,7 +456,7 @@ def maybe_insert_input_observer_for_arg_or_kwarg( model: torch.nn.Module, modules: Dict[str, torch.nn.Module], graph: Graph, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], qhandler: Optional[QuantizeHandler], prepare_custom_config_dict: Dict[str, Any], backend_config_dict: Optional[Dict[str, Any]], @@ -447,8 +485,7 @@ def maybe_insert_input_observer_for_arg_or_kwarg( # default (no observer) new_arg = arg - is_standalone_module = qhandler is not None and \ - isinstance(qhandler, StandaloneModuleQuantizeHandler) + is_standalone_module = qhandler is not None and qhandler.is_standalone_module() assert qconfig is not None if not is_standalone_module: # regular flow for most nodes, except standalone modules @@ -461,6 +498,9 @@ def 
maybe_insert_input_observer_for_arg_or_kwarg( arg_as_output_target_dtype = get_arg_target_dtype_as_output(arg, modules, node_name_to_target_dtype) arg_as_input_target_dtype = get_arg_target_dtype_as_input_to_node(arg, node, modules, node_name_to_target_dtype) + arg_as_input_target_compute_dtype = \ + get_arg_target_compute_dtype_as_input_to_node( + arg, node, modules, node_name_to_target_dtype) needs_obs = ( # if the dtypes are different, we need an observer (arg_as_output_target_dtype != arg_as_input_target_dtype) and @@ -469,10 +509,16 @@ def maybe_insert_input_observer_for_arg_or_kwarg( # TODO(future PR): change this so a placeholder is inserted for # future dequants, to make the logic easier to understand (arg_as_input_target_dtype != torch.float) and - # if arg is a bool tensor or not a tensor, do not insert observer - (arg_as_output_target_dtype not in (torch.bool, None)) and + # if arg output dtype is in DO_NOT_OBS_DTYPE_LIST do not insert observer + (arg_as_output_target_dtype not in DO_NOT_OBS_DTYPE_LIST) and # if qconfig is reuse_input qconfig, we won't insert extra observer for input - not is_reuse_input_qconfig_ + not is_reuse_input_qconfig_ or + # need to add input observer for dynamic quantization + # only add observer for first input for now, we may need to extend + # qconfig_dict and backend_config_dict to support more general configurations + # of dynamic quantization, e.g. dynamically quantizing second input, third + # input etc. + (arg_as_input_target_compute_dtype in [torch.quint8, torch.int8, torch.float16]) and arg is node.args[0] ) else: @@ -544,7 +590,7 @@ def maybe_insert_input_observers_for_node( model: torch.nn.Module, modules: Dict[str, torch.nn.Module], graph: Graph, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], qhandler: Optional[QuantizeHandler], prepare_custom_config_dict: Dict[str, Any], backend_config_dict: Optional[Dict[str, Any]], @@ -599,7 +645,7 @@ def maybe_insert_input_equalization_observers_for_node( model: torch.nn.Module, modules: Dict[str, torch.nn.Module], graph: Graph, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], is_branch: bool, ) -> None: """ @@ -643,7 +689,7 @@ def maybe_insert_output_observer_for_node( modules: Dict[str, torch.nn.Module], graph: Graph, matches: Dict[str, MatchResult], - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], matched_pattern: Any, qhandler: Optional[QuantizeHandler], is_qat: bool, @@ -654,7 +700,7 @@ def maybe_insert_output_observer_for_node( If `node` does not need an output observer, returns None. 
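The observer-insertion condition above is hard to read inline, so here is a simplified standalone predicate with the same intent; all names are illustrative, and the real function operates on `torch.fx.Node` objects and the per-node target-dtype map rather than bare dtypes.

```python
import torch

DO_NOT_OBS_DTYPE_LIST = [int, float, torch.bool, None]

def needs_input_observer(arg_out_dtype, arg_in_dtype, compute_dtype, is_first_arg,
                         is_reuse_input_qconfig=False):
    """Simplified sketch: the static-quant condition plus the new dynamic-quant
    branch, where dynamically quantized ops observe only their first input
    based on the compute dtype taken from the qconfig."""
    static_case = (
        arg_out_dtype != arg_in_dtype
        and arg_in_dtype != torch.float
        and arg_out_dtype not in DO_NOT_OBS_DTYPE_LIST
        and not is_reuse_input_qconfig
    )
    dynamic_case = (
        compute_dtype in (torch.quint8, torch.int8, torch.float16)
        and is_first_arg
    )
    return static_case or dynamic_case

# float input feeding a dynamically quantized op: observe the first arg only
assert needs_input_observer(torch.float, torch.float, torch.quint8, is_first_arg=True)
assert not needs_input_observer(torch.float, torch.float, torch.quint8, is_first_arg=False)
```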
""" - root_node, matched_nodes, pattern, qhandler, qconfig = matches.get( + root_node, _, pattern, qhandler, qconfig = matches.get( node.name, (None, None, None, None, None)) if qhandler is None: @@ -663,13 +709,10 @@ def maybe_insert_output_observer_for_node( assert qconfig is not None assert node.op != 'output', 'observer insertion for outputs is handled elsewhere' - is_standalone_module = qhandler is not None and \ - isinstance(qhandler, StandaloneModuleQuantizeHandler) + is_standalone_module = qhandler is not None and qhandler.is_standalone_module() dtype = node_name_to_target_dtype[node.name]["output_activation_dtype"] - should_insert_observer = \ - qhandler.should_insert_observer_for_output( - qconfig, is_qat) and dtype not in (torch.bool, None, torch.float) + should_insert_observer = dtype not in DO_NOT_OBS_DTYPE_LIST + [torch.float] # TODO(future PR): move the following logic to # should_insert_observer_for_output should_insert_observer = should_insert_observer and \ @@ -696,7 +739,7 @@ def maybe_insert_output_observer_for_node( def maybe_insert_observers_before_graph_output( graph_output_node: Node, output_quantized_idxs: List[int], - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], qconfig_map: Dict[str, QConfigAny], model: torch.nn.Module, modules: Dict[str, torch.nn.Module], @@ -725,7 +768,7 @@ def maybe_insert_observers_before_graph_output( def _recursive_maybe_replace_node_with_obs( maybe_node: Argument, target_dtype: torch.dtype, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], qconfig_map: Dict[str, QConfigAny], model: torch.nn.Module, modules: Dict[str, torch.nn.Module], @@ -796,8 +839,8 @@ def _recursive_maybe_replace_node_with_obs( def maybe_propagate_dtype_for_node( node: Node, - target_dtype: torch.dtype, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + target_dtype: Union[torch.dtype, type], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], matches: Dict[str, MatchResult], ) -> None: """ @@ -809,9 +852,9 @@ def maybe_propagate_dtype_for_node( node_name_to_target_dtype[node.name]["input_activation_dtype"] = target_dtype node_name_to_target_dtype[node.name]["output_activation_dtype"] = target_dtype # if this is a copy node, propagate to first arg - root_node, matched_nodes, pattern, qhandler, qconfig = matches.get( + root_node, _, pattern, qhandler, qconfig = matches.get( node.name, (None, None, None, None, None)) - if qhandler is not None and qhandler.is_general_tensor_shape_op(): + if qhandler is not None and qhandler.is_general_tensor_value_op(): prev_node = node.args[0] if isinstance(prev_node, Node): maybe_propagate_dtype_for_node( @@ -819,7 +862,7 @@ def maybe_propagate_dtype_for_node( def propagate_dtypes_for_known_nodes( graph: Graph, - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]], + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]], matches: Dict[str, MatchResult], ) -> None: """ @@ -833,11 +876,26 @@ def propagate_dtypes_for_known_nodes( replace this with a better way to reason about dtypes of tensors. 
""" for node in graph.nodes: - bool_arg_idxs = node_bool_tensor_arg_indexes(node) - for bool_arg_idx in bool_arg_idxs: - cur_node = node.args[bool_arg_idx] - maybe_propagate_dtype_for_node( - cur_node, torch.bool, node_name_to_target_dtype, matches) + non_observable_arg_dict = get_non_observable_arg_indexes_and_types(node) + + for arg_type in non_observable_arg_dict: + non_observable_indices = non_observable_arg_dict[arg_type](node) + + for index in non_observable_indices: + arg = node.args[index] + + # when an argument is a tuple, it does not show up as another node so we need to go through + # all elements of the tuple manually + if isinstance(arg, tuple) or isinstance(arg, list): + arg_list = list(arg) + else: + arg_list = [arg] + + for cur_arg in arg_list: + # hard coded arguments show up but aren't `Node` typed and do not need dtype propgated + if isinstance(cur_arg, torch.fx.node.Node): + maybe_propagate_dtype_for_node( + cur_arg, arg_type, node_name_to_target_dtype, matches) def maybe_make_input_output_share_observers( node: Node, @@ -1021,7 +1079,7 @@ def insert_observers_for_model( # } # # TODO: rename this to node_name_to_target_dtype_info - node_name_to_target_dtype: Dict[str, Dict[str, Optional[torch.dtype]]] = defaultdict(dict) + node_name_to_target_dtype: Dict[str, Dict[str, Optional[Union[torch.dtype, type]]]] = defaultdict(dict) cache_for_no_tensor_check: Dict[Node, bool] = dict() inputs_seen_counter = 0 @@ -1033,7 +1091,7 @@ def insert_observers_for_model( # other nodes output dtype is specified by the qconfig modules = dict(model.named_modules(remove_duplicate=False)) for node in model.graph.nodes: - root_node, matched_nodes, pattern, qhandler, qconfig = matches.get( + root_node, _, pattern, qhandler, qconfig = matches.get( node.name, (None, None, None, None, None)) node_name_to_target_dtype[node.name] = get_target_activation_dtype_for_node( node, qconfig, inputs_seen_counter, outputs_seen_counter, @@ -1074,7 +1132,7 @@ def insert_observers_for_model( elif node.op in ('call_module', 'call_method', 'call_function', 'output'): # check for matches - root_node, matched_nodes, pattern, qhandler, qconfig = matches.get( + last_node, matched_node_pattern, pattern, qhandler, qconfig = matches.get( node.name, (None, None, None, None, None)) equalization_qconfig = equalization_config_map.get(node.name, None) @@ -1093,15 +1151,14 @@ def insert_observers_for_model( ) is_supported_by_backend = is_pattern_dtype_config_supported_by_backend( - pattern, matched_nodes, node_name_to_target_dtype, backend_config_dict) + pattern, matched_node_pattern, node_name_to_target_dtype, backend_config_dict) if not skip_inserting_observers and is_supported_by_backend: modules = dict(model.named_modules(remove_duplicate=False)) if node.op != 'output': - assert matched_nodes is not None + assert matched_node_pattern is not None # add matched nodes to the observed node name set - for n in matched_nodes: - observed_node_names.add(n.name) + add_matched_node_name_to_set(matched_node_pattern, observed_node_names) # This is currently only used for equalization. 
# Checks if the current node is in a branch in which the two @@ -1128,26 +1185,28 @@ def insert_observers_for_model( if user != node and is_user_quantized: is_quantized_branch = True - # this modifies node inplace - maybe_insert_input_observers_for_node( - node, qconfig, model, modules, graph, - node_name_to_target_dtype, - qhandler, - prepare_custom_config_dict, - backend_config_dict) - - # Insert equalization input observers if needed - maybe_insert_input_equalization_observers_for_node( - node, equalization_qconfig, model, modules, graph, - node_name_to_target_dtype, is_quantized_branch) - - is_last_node_of_pattern = root_node is node + # TODO: this only works for sequential fusion right now, extend it + # it to automatically detect all input nodes based on the pattern + # need to change find_matches function to return this information + root_node = _default_root_node_getter(matched_node_pattern) + is_input_node_of_the_pattern = node is root_node + if is_input_node_of_the_pattern: + # this modifies node inplace + maybe_insert_input_observers_for_node( + node, qconfig, model, modules, graph, + node_name_to_target_dtype, + qhandler, + prepare_custom_config_dict, + backend_config_dict) + + # Insert equalization input observers if needed + maybe_insert_input_equalization_observers_for_node( + node, equalization_qconfig, model, modules, graph, + node_name_to_target_dtype, is_quantized_branch) + + is_last_node_of_pattern = node is last_node is_general_tensor_value_op = \ (qhandler is not None and qhandler.is_general_tensor_value_op()) - - is_general_tensor_shape_op = \ - (qhandler is not None and qhandler.is_general_tensor_shape_op()) - is_reuse_input_qconfig_ = is_reuse_input_qconfig(qconfig) if is_last_node_of_pattern: @@ -1183,11 +1242,11 @@ def insert_observers_for_model( # to make all inputs and outputs use the first input's # observer if (is_general_tensor_value_op and is_observer_in_same_graph_) or \ - is_general_tensor_shape_op or is_reuse_input_qconfig_: + is_reuse_input_qconfig_: if not maybe_make_input_output_share_observers(node, model, modules): remove_output_observer(node, model, modules) - if isinstance(qhandler, CustomModuleQuantizeHandler): + if qhandler is not None and qhandler.is_custom_module(): swap_custom_module_to_observed(node, qconfig, modules, prepare_custom_config_dict) else: # output @@ -1226,11 +1285,11 @@ def run_prepare_fx_on_standalone_modules( """ for ( node_name, - (root_node, matched_nodes, pattern, qhandler, qconfig), + (root_node, _, pattern, qhandler, qconfig), ) in matches.items(): if qhandler is None: continue - elif not isinstance(qhandler, StandaloneModuleQuantizeHandler): + elif not qhandler.is_standalone_module(): continue sm_qconfig_dict, sm_prepare_config_dict, sm_backend_config_dict = \ @@ -1312,8 +1371,6 @@ def prepare( if equalization_qconfig_dict is None: equalization_qconfig_dict = {} - additional_quant_patterns = \ - prepare_custom_config_dict.get("additional_quant_pattern", {}) # mapping from a tuple of nodes in reverse order to uninitialized # QuantizeHandler subclass. 
For example, # { @@ -1324,13 +1381,14 @@ def prepare( # ((, ): # ), # } + # TODO: rename to pattern_to_quantize_handler patterns: Dict[Pattern, QuantizeHandler] = {} if backend_config_dict is None: - quant_patterns = get_default_quant_patterns() - patterns = get_combined_dict( - quant_patterns, additional_quant_patterns) + patterns = get_native_quant_patterns({}) + root_node_getter_mapping = {} else: patterns = get_pattern_to_quantize_handlers(backend_config_dict) + patterns = sorted_patterns_dict(patterns) # TODO: make WEIGHT_INDEX_DICT and BIAS_INDEX_DICT an argument to the functions that needs them # TODO: refactor this part to return WEIGHT_INDEX_DICT and BIAS_INDEX_DICT @@ -1350,27 +1408,27 @@ def prepare( else: index_dict[pattern] = [index] # type: ignore[index] + root_node_getter_mapping = \ + get_fusion_pattern_to_root_node_getter(backend_config_dict) + convert_dict_to_ordered_dict(qconfig_dict) convert_dict_to_ordered_dict(equalization_qconfig_dict) qconfig_dict = update_qconfig_for_fusion(model, qconfig_dict) equalization_qconfig_dict = update_qconfig_for_fusion(model, equalization_qconfig_dict) flattened_qconfig_dict = get_flattened_qconfig_dict(qconfig_dict) # TODO: support regex as well - propagate_qconfig_(model, flattened_qconfig_dict) + propagate_qconfig_(model, flattened_qconfig_dict, prepare_custom_config_dict) if is_qat: - additional_qat_module_mapping = prepare_custom_config_dict.get( - "additional_qat_module_mapping", {}) # this path will be deprecated after we fully migrate the convert path # of fbgemm/qnnpack to use the reference path, it will stay # here for a few months if backend_config_dict is None: - module_to_qat_module = get_combined_dict( - get_default_qat_module_mappings(), additional_qat_module_mapping) + module_to_qat_module = get_default_qat_module_mappings() else: module_to_qat_module = get_module_to_qat_module(backend_config_dict) qat_swap_modules(model, module_to_qat_module) - qconfig_dict = update_qconfig_for_qat(qconfig_dict, additional_qat_module_mapping) + qconfig_dict = update_qconfig_for_qat(qconfig_dict, {}) # mapping from fully qualified module name to module instance # for example, @@ -1396,8 +1454,8 @@ def prepare( custom_module_classes = get_custom_module_class_keys( prepare_custom_config_dict, "float_to_observed_custom_module_class") matches = find_matches( - model.graph, modules, patterns, qconfig_map, standalone_module_names, - standalone_module_classes, custom_module_classes) + model.graph, modules, patterns, root_node_getter_mapping, qconfig_map, + standalone_module_names, standalone_module_classes, custom_module_classes) input_quantized_idxs: List[int] = prepare_custom_config_dict.get( "input_quantized_idxs", []) diff --git a/torch/ao/quantization/fx/qconfig_utils.py b/torch/ao/quantization/fx/qconfig_utils.py index 80afa562a10f4a..188de460dbae2f 100644 --- a/torch/ao/quantization/fx/qconfig_utils.py +++ b/torch/ao/quantization/fx/qconfig_utils.py @@ -215,7 +215,6 @@ def check_is_valid_prepare_custom_config_dict(prepare_custom_config_dict: Option "non_traceable_module_class", "additional_fuser_method_mapping", "additional_qat__module_mapping", - "additional_fusion_pattern", "additional_quant_pattern", "input_quantized_idxs", "output_quantized_idxs", diff --git a/torch/ao/quantization/fx/quantization_patterns.py b/torch/ao/quantization/fx/quantization_patterns.py index 7f9947bccb39b1..486208d98bbc40 100644 --- a/torch/ao/quantization/fx/quantization_patterns.py +++ b/torch/ao/quantization/fx/quantization_patterns.py @@ -1,56 +1,25 @@ 
import torch -from torch.fx import GraphModule from torch.fx.graph import ( Node, - Graph, -) -from ..observer import ( - default_affine_fixed_qparams_observer, - default_symmetric_fixed_qparams_observer, -) - -from ..quantization_mappings import ( - get_static_quant_module_class, - get_dynamic_quant_module_class, -) -from ..utils import ( - get_swapped_custom_module_class, - activation_is_statically_quantized, - activation_is_int8_quantized, - weight_is_statically_quantized, - get_qconfig_dtypes, - activation_dtype, - get_qparam_dict, -) - -from torch.ao.quantization.quantize import ( - is_activation_post_process, ) -from .pattern_utils import ( - register_quant_pattern, - get_default_output_activation_post_process_map, - Pattern, -) -from ..utils import _parent_name from .utils import ( all_node_args_have_no_tensors, - quantize_node, - get_per_tensor_qparams, - get_linear_prepack_op_for_dtype, - create_qparam_nodes, - get_qconv_prepack_op, - get_qconv_op, - create_node_from_old_node_preserve_meta, ) - -from ..qconfig import QConfigAny +from .quantization_types import ( + Pattern, + NodePattern, +) from abc import ABC -import operator -import warnings +from typing import Any, Callable, Dict, Optional -from typing import Any, Callable, Dict, Union, Optional, Tuple, List +def _default_root_node_getter(node_pattern): + if node_pattern is None: + return node_pattern + while not isinstance(node_pattern, Node): + node_pattern = node_pattern[-1] + return node_pattern # ------------------------- # Pattern Registrations @@ -62,33 +31,37 @@ class QuantizeHandler(ABC): """ Base handler class for the quantizer patterns """ - def __init__(self, node: Node, modules: Dict[str, torch.nn.Module]): + def __init__( + self, + node_pattern: NodePattern, + modules: Dict[str, torch.nn.Module], + root_node_getter: Callable = None, + is_custom_module=False, + is_standalone_module=False): """ Records pattern information in __init__, which will be used in convert """ - # this is an indicator of whether all the inputs are Node or not - # since some op might be quantized differently depending on whether - # all inputs are tensors or not, e.g. add/mul - self.num_tensor_args = len(node.args) - self.all_node_args_are_tensors = True - # the last node of the matched pattern - self.last_node = node - - def _maybe_get_last_node_only_observer( - self, - modules: Dict[str, torch.nn.Module] - ) -> Optional[torch.nn.Module]: - """ - If the last node of the pattern is observed, return the observer - instance. Otherwise, return None. 
- """ - for maybe_obs_node, _ in self.last_node.users.items(): - if maybe_obs_node.op == 'call_module': - maybe_obs = modules[str(maybe_obs_node.target)] - if is_activation_post_process(maybe_obs): - return maybe_obs - return None - + self.node_pattern = node_pattern + self.modules = modules + if root_node_getter is None: + root_node_getter = _default_root_node_getter + self.root_node = root_node_getter(node_pattern) + self.is_custom_module_ = is_custom_module + self.is_standalone_module_ = is_standalone_module + self.num_tensor_args = 0 + # determine how many of the first two args are Tensors (versus scalars) + # this distinguishes things like "x + y" from "x + 2" or "2 + x" + if isinstance(self.root_node, Node): + cache_for_no_tensor_check: Dict[Node, bool] = dict() + for arg_idx in range(len(self.root_node.args)): + arg = self.root_node.args[arg_idx] + if isinstance(arg, Node) and ( + not all_node_args_have_no_tensors( + arg, self.modules, cache_for_no_tensor_check)): + self.num_tensor_args += 1 + + # TODO: can remove after the is_dynamic flag is defined, so that we can + # move embedding op to backend_config_dict def input_output_observed(self) -> bool: """ Returns True if the pattern matched to this qhandler could be @@ -100,44 +73,16 @@ def is_general_tensor_value_op(self) -> bool: """ Returns True if the operator works for both floating point and quantized input, and does some computation based on the input Tensor, + or the ops that only re-arranges the Tensor values or query some metadata + about the Tensor so we need to insert observer/fake_quant for the output of the - operator since the distribution of values is different for input and output - Tensors (for HistogramObserver) - while they share the same quantization parameters - Example: avgpool2d - """ - return False - - def is_general_tensor_shape_op(self) -> bool: - """ Similar to is_general_tensor_value_op, this is a check - for ops that works for both floating point and quantized input, - that only re-arranges the Tensor values or query some metadata about the Tensor - We don't insert observer/fake_quant for the output of these operators - Example: reshape, transpose, maxpool2d - """ - return False - - def should_insert_observer_for_output( - self, - qconfig: Any, - model_is_training: bool, - ) -> bool: - """ - Returns true if an observer should be inserted for the output of - the pattern matched to this QuantizeHandler instance during the - prepare step. - """ - # TODO(future PR): potentially clean up and deduplicate these - # mappings. - return self.all_node_args_are_tensors and self.input_output_observed() - - def should_mark_output_quantized_from_input_quantized_status( - self, - qconfig: QConfigAny - ) -> bool: - """ - Returns true if after convert, the output of the matched pattern is - quantized iff the first input is also quantized. 
+ operator (same observer instance as input) + since the distribution of values is different for input and output + Tensors (for HistogramObserver) while they share the same quantization + parameters + Example operator: avgpool2d, reshape, transpose, maxpool2d + Example observed operator: + observer_0 - avgpool2d - observer_0 (same observer instance as input) """ return False @@ -154,1510 +99,62 @@ def get_activation_ctr( """ return qconfig.activation - def is_output_quantized(self, qconfig): - """ Returns true if the output node of convert is quantized - when is_reference is False, we would return float node when a certain dtype - combination is not supported (since fbgemm/qnnpack only support certain dtype - combinations), so the output may be float, but when is_reference is True, - we support all dtype combinations so the output will always be quantized. - - TODO: This is fragile, whether output is quantized should not depend on `is_reference` since - we want to make sure whether a Tensor is quantized - should be the same in prepare and convert and is_reference - is only available in convert currently - - """ - return True - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - """ Convert the given node to a quantized node and insert - it to the quantized graph - """ - return NotImplemented - - -# Binary op configs - -# Supported combinations are: -# quant_type | activation (compute_type) | weight -# static quint8 qint8 - -# tuple (activation_dtype, weight_dtype, compute_dtype) -# these are supported types for common binary ops like add/mul etc. 
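A loose sketch of the "same observer instance on input and output" idea for value-passthrough ops such as avgpool2d: reusing one observer module guarantees the two sides end up with identical quantization parameters. This is only an illustration of the intent, not the graph rewrite itself.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import MinMaxObserver

# one observer instance shared by the input and output of a passthrough op
shared_obs = MinMaxObserver(dtype=torch.quint8)

x = torch.randn(1, 3, 8, 8)
shared_obs(x)                        # observe the input
y = nn.functional.avg_pool2d(x, 2)
shared_obs(y)                        # observe the output with the *same* instance
scale, zero_point = shared_obs.calculate_qparams()
```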
-all_dtypes = [ - (torch.qint8, torch.qint8, None), - (torch.quint8, torch.qint8, None), - (torch.float16, torch.float16, None), -] -fp16_dtypes = [ - (torch.float16, torch.float16, None) -] -int8_dtypes = [ - (torch.qint8, torch.qint8, None), - (torch.quint8, torch.qint8, None), -] -binary_op_supported_dtypes : Dict[Union[Callable, str], List[Tuple[torch.dtype, torch.dtype, None]]] = { - operator.add: all_dtypes, - torch.add: all_dtypes, - operator.mul: all_dtypes, - torch.mul: all_dtypes, - torch.bmm: fp16_dtypes, - torch.sub: fp16_dtypes, - operator.sub: fp16_dtypes, - torch.div: fp16_dtypes, - operator.truediv: fp16_dtypes, - torch.matmul: int8_dtypes, -} - -default_op_supported_dtypes = { - torch.nn.ConvTranspose1d: int8_dtypes, - torch.nn.ConvTranspose2d: int8_dtypes, - torch.nn.ELU: int8_dtypes, - torch.nn.LeakyReLU: int8_dtypes, - torch.nn.Hardswish: int8_dtypes, - torch.nn.InstanceNorm1d: int8_dtypes, - torch.nn.InstanceNorm2d: int8_dtypes, - torch.nn.InstanceNorm3d: int8_dtypes, - torch.nn.LayerNorm: all_dtypes, - torch.nn.SiLU: fp16_dtypes, - torch.nn.Mish: fp16_dtypes, - torch.nn.GELU: int8_dtypes, - torch.nn.Dropout: int8_dtypes, - torch.nn.Softmax: int8_dtypes, - torch.nn.functional.elu: int8_dtypes, - torch.nn.functional.hardswish: int8_dtypes, - torch.nn.functional.instance_norm: int8_dtypes, - torch.nn.functional.layer_norm: all_dtypes, - torch.nn.functional.leaky_relu: int8_dtypes, - torch.nn.functional.silu: fp16_dtypes, - torch.nn.functional.mish: fp16_dtypes, - torch.nn.functional.gelu: int8_dtypes, - torch.nn.functional.softmax: int8_dtypes, - torch.nn.functional.dropout: int8_dtypes, - torch.sum: fp16_dtypes, -} - -QAT_CONV_MODULE_CLASSES = \ - (torch.nn.qat.Conv2d, - torch.nn.qat.Conv3d, - torch.nn.intrinsic.qat.ConvBn1d, - torch.nn.intrinsic.qat.ConvBn2d, - torch.nn.intrinsic.qat.ConvBn3d, - torch.nn.intrinsic.qat.ConvBnReLU1d, - torch.nn.intrinsic.qat.ConvBnReLU2d, - torch.nn.intrinsic.qat.ConvBnReLU3d, - torch.nn.intrinsic.qat.ConvReLU2d, - torch.nn.intrinsic.qat.ConvReLU3d) - -########################## -# Helper Functions -########################## - -def _load_weight_qparams( - self, state_dict, prefix, local_metadata, strict, - missing_keys, unexpected_keys, error_msgs): - key = prefix + "_weight_qparams" - if key in state_dict: - self._weight_qparams = state_dict[key] - state_dict.pop(key) + def is_custom_module(self): + return self.is_custom_module_ -def _save_weight_qparams(self, destination, prefix, keep_vars): - for attr_name in dir(self): - if "_weight_qparams" == attr_name and \ - isinstance(getattr(self, attr_name), dict): - weight_qparams = getattr(self, attr_name) - destination[prefix + attr_name] = weight_qparams + def is_standalone_module(self): + return self.is_standalone_module_ - -def _to_reference(float_module, weight_qparams): - """ Make a weighted float module (e.g. 
conv and linear )a reference module by - attaching _weight_qparams that records the qparams for weight - and change the name for the module so that it's recognized - when people print the model - """ - float_module._weight_qparams = weight_qparams - float_module._register_state_dict_hook(_save_weight_qparams) - float_module._register_load_state_dict_pre_hook(_load_weight_qparams, with_module=True) - - float_module_name = float_module._get_name() - - def _get_name(): - return float_module_name + "(Reference)" - - float_module._get_name = _get_name - -@register_quant_pattern(operator.add) -@register_quant_pattern(operator.sub) -@register_quant_pattern(operator.mul) -@register_quant_pattern(operator.truediv) -@register_quant_pattern(torch.add) -@register_quant_pattern(torch.sub) -@register_quant_pattern(torch.mul) -@register_quant_pattern(torch.div) -@register_quant_pattern(torch.bmm) -@register_quant_pattern((torch.nn.ReLU, operator.add)) -@register_quant_pattern((torch.nn.ReLU, operator.mul)) -@register_quant_pattern((torch.nn.ReLU, torch.add)) -@register_quant_pattern((torch.nn.ReLU, torch.mul)) -@register_quant_pattern((torch.nn.functional.relu, operator.add)) -@register_quant_pattern((torch.nn.functional.relu, operator.mul)) -@register_quant_pattern((torch.nn.functional.relu, torch.add)) -@register_quant_pattern((torch.nn.functional.relu, torch.mul)) -@register_quant_pattern((torch.relu, operator.add)) -@register_quant_pattern((torch.relu, operator.mul)) -@register_quant_pattern(torch.matmul) +# TODO: remove this class, this is still exposed in torch.quantization +# but we should be able to break bc class BinaryOpQuantizeHandler(QuantizeHandler): - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - self.relu_node = None - if ( - node.op == 'call_function' and - node.target in (torch.nn.functional.relu, torch.relu) - ) or ( - node.op == 'call_module' and - isinstance(modules[str(node.target)], torch.nn.ReLU) - ): - self.relu_node = node - node = node.args[0] # type: ignore[assignment] - self.binary_op_node = node - self.binary_op = node.target - - # determine how many of the first two args are Tensors (versus scalars) - # this distinguishes things like "x + y" from "x + 2" or "2 + x" - self.num_tensor_args = 0 - cache_for_no_tensor_check: Dict[Node, bool] = dict() - for arg_idx in range(len(self.binary_op_node.args)): - arg = self.binary_op_node.args[arg_idx] - if isinstance(arg, Node) and (not all_node_args_have_no_tensors(arg, modules, cache_for_no_tensor_check)): - self.num_tensor_args += 1 - self.all_node_args_are_tensors = \ - (self.num_tensor_args == len(self.binary_op_node.args)) - - def should_insert_observer_for_output( - self, - qconfig: Any, - model_is_training: bool, - ) -> bool: - """ - Returns true if an observer should be inserted for the output of - the pattern matched to this QuantizeHandler instance during the - prepare step. 
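The renaming trick described above can be reproduced on any module by overriding `_get_name`; the snippet below only illustrates the behavior, it is not the removed helper itself.

```python
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3)
float_module_name = conv._get_name()

def _get_name():
    return float_module_name + "(Reference)"

# instance attribute shadows the class method, changing how the module prints
conv._get_name = _get_name
print(conv)  # Conv2d(Reference)(3, 8, kernel_size=(3, 3), stride=(1, 1))
```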
- """ - dtypes = get_qconfig_dtypes(qconfig) - if not (self.binary_op in binary_op_supported_dtypes and dtypes in binary_op_supported_dtypes[self.binary_op]): - return False - if self.num_tensor_args == 1: - return True - elif self.all_node_args_are_tensors and self.input_output_observed(): - return True - else: - return False - - def is_general_tensor_value_op(self) -> bool: - return self.num_tensor_args == 1 + pass - def input_output_observed(self): - # for x + y where x and y are scalars, we do not observe anything - return self.num_tensor_args > 0 - - def is_output_quantized(self, qconfig): - dtypes = get_qconfig_dtypes(qconfig) - return self.binary_op in binary_op_supported_dtypes and \ - dtypes in binary_op_supported_dtypes[self.binary_op] - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - - if self.num_tensor_args == 0: - # example: x + y, when x and y are scalars - return quantized_graph.node_copy( - node, load_arg(quantized=None)) - - dtypes = get_qconfig_dtypes(qconfig) - - act_dtype = activation_dtype(qconfig) - dtypes = get_qconfig_dtypes(qconfig) - if act_dtype == torch.float or \ - not (self.binary_op in binary_op_supported_dtypes and dtypes in binary_op_supported_dtypes[self.binary_op]): - if self.relu_node: - op_out = quantized_graph.node_copy(self.binary_op_node, load_arg(quantized=torch.float)) - relu_args = [op_out] - relu_args.extend(load_arg(quantized=torch.float)(self.relu_node.args[1:])) - relu_kwargs = load_arg(quantized=torch.float)(self.relu_node.kwargs) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.relu, tuple(relu_args), relu_kwargs), - self.relu_node) - else: - return quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - else: - if self.num_tensor_args == 2: - # make sure both inputs are quantized to act_dtype - load_arg(quantized={0: act_dtype, 1: act_dtype})(self.binary_op_node.args) - args = load_arg(quantized=torch.float)(self.binary_op_node.args) - kwargs = load_arg(quantized=torch.float)(self.binary_op_node.kwargs) - op_out = quantized_graph.node_copy(self.binary_op_node, load_arg(quantized=torch.float)) - - def modified_load_arg(n: Node): - if n.name == self.binary_op_node.name: - return op_out - else: - return load_arg(quantized=torch.float)(n) - - if self.relu_node: - op_out = quantized_graph.node_copy(self.relu_node, modified_load_arg) - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - return quantize_node( - op_out, activation_post_process, - node, modules, quantized_graph, node_name_to_scope, is_input=False) - -@register_quant_pattern(torch.cat) class CatQuantizeHandler(QuantizeHandler): - def is_general_tensor_value_op(self) -> bool: - return True - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - if not self.all_node_args_are_tensors: - return NotImplemented - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.float: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - else: 
- activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - # make sure the first argument is quantized to act_dtype - load_arg(quantized={0: act_dtype})(node.args) - args = list(load_arg(quantized=torch.float)(node.args)) - kwargs = load_arg(quantized=torch.float)(node.kwargs) - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantize_node( - op_out, - activation_post_process, - node, - modules, - quantized_graph, - node_name_to_scope, - is_input=False) + pass -# handle conv, maybe followed by relu -# NB: matching order is reversed, that is we match from the bottom of this list to the beginning -@register_quant_pattern(torch.nn.Conv1d) -@register_quant_pattern(torch.nn.Conv2d) -@register_quant_pattern(torch.nn.Conv3d) -@register_quant_pattern(torch.nn.functional.conv1d) -@register_quant_pattern(torch.nn.functional.conv2d) -@register_quant_pattern(torch.nn.functional.conv3d) -# TODO: add qat.Conv1d -@register_quant_pattern(torch.nn.qat.Conv2d) -@register_quant_pattern(torch.nn.qat.Conv3d) -@register_quant_pattern(torch.nn.intrinsic.ConvReLU1d) -@register_quant_pattern(torch.nn.intrinsic.ConvReLU2d) -@register_quant_pattern(torch.nn.intrinsic.ConvReLU3d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvBn1d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvBn2d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvBn3d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvBnReLU1d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvBnReLU2d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvBnReLU3d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvReLU2d) -@register_quant_pattern(torch.nn.intrinsic.qat.ConvReLU3d) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.functional.conv1d)) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.functional.conv2d)) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.functional.conv3d)) -@register_quant_pattern((torch.nn.ReLU, torch.nn.functional.conv1d)) -@register_quant_pattern((torch.nn.ReLU, torch.nn.functional.conv2d)) -@register_quant_pattern((torch.nn.ReLU, torch.nn.functional.conv3d)) -# just for error checks -@register_quant_pattern((torch.nn.ReLU, torch.nn.Conv1d)) -@register_quant_pattern((torch.nn.ReLU, torch.nn.Conv2d)) -@register_quant_pattern((torch.nn.ReLU, torch.nn.Conv3d)) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.Conv2d)) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.Conv3d)) -# TODO: rename Relu -> ReLU to be more consistent with other classes +# TODO: remove this class class ConvReluQuantizeHandler(QuantizeHandler): - def __init__(self, node: Node, modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - self.relu_node = None - if (node.op == 'call_function' and node.target is torch.nn.functional.relu) or \ - (node.op == 'call_module' and isinstance(modules[str(node.target)], torch.nn.ReLU)): - self.relu_node = node - node = node.args[0] # type: ignore[assignment] - self.conv_node = node - if node.op == "call_module": - self.conv = modules[str(self.conv_node.target)] - elif node.op == "call_function": - self.conv = node.target # type: ignore[assignment] - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - # 
Supported combinations are: - # quant_type | activation (compute_type) | weight - # static quint8 qint8 - - # tuple (activation_dtype, weight_dtype, compute_dtype) - supported_dtypes = [ - (torch.quint8, torch.qint8, None), - ] - - # TODO: is_reference option for conv module - dtypes = get_qconfig_dtypes(qconfig) - # leave the op unquantized if the dtype combination is not supported - if not is_reference and dtypes not in supported_dtypes: - warnings.warn( - "dtype combination: {} is not " - "supported by Conv " - "supported dtype combinations are: {}".format(dtypes, supported_dtypes)) - if self.relu_node: - conv_out = quantized_graph.node_copy(self.conv_node, load_arg(quantized=torch.float)) - relu_args = [conv_out] - relu_args.extend(load_arg(quantized=torch.float)(self.relu_node.args[1:])) - relu_kwargs = load_arg(quantized=torch.float)(self.relu_node.kwargs) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.relu, tuple(relu_args), relu_kwargs), - self.relu_node) - else: - return quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - - activation_int8_quantized = activation_is_int8_quantized(qconfig) - - if self.conv_node.op == 'call_module': - # note that relu should already be fused into conv module in the fusion step - assert self.relu_node is None, 'conv module and relu fusion is not executed, ' \ - 'please make sure to run fusion before prepare' - output_activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert output_activation_post_process is not None - - module_types_supports_reference_pattern = [ - torch.nn.Conv1d, - torch.nn.Conv2d, - torch.nn.Conv3d, - torch.nn.intrinsic.ConvReLU1d, - torch.nn.intrinsic.ConvReLU2d, - torch.nn.intrinsic.ConvReLU3d, - ] - module_types_supports_reference_pattern.extend(list(QAT_CONV_MODULE_CLASSES)) - # We'll always produce reference pattern for torch.nn.Conv*d, - # will remove the else branch after we migrated all use cases - if is_reference or \ - type(self.conv) in module_types_supports_reference_pattern and \ - dtypes in [(torch.quint8, torch.qint8, None)]: - # produce dequant - float_op - quant pattern - dtype = torch.float - if activation_int8_quantized: - dtype = activation_dtype(qconfig) - activation = load_arg(quantized=dtype)(self.conv_node.args[0]) - args = load_arg(quantized=torch.float)(self.conv_node.args) - # Get the float conv and attach quantization scheme and quantization - # parameters of weight to the module - # and qparam is a dictionary of - # {"qscheme": ..., "scale": ..., "zero_point": ...} for per tensor quantization or - # {"qscheme": ..., "scale": ..., "zero_point": ..., "axis": ...} for per channel quantization - float_conv = self.conv - fused_conv = None - if isinstance( - float_conv, - QAT_CONV_MODULE_CLASSES): - # case 1. converting qat conv module to - # a float conv module, we need to attch - # weight fake_quant to the conv module, - # weight fake_quant is assumed to be run during - # QAT so we don't need to run it again here - float_conv = float_conv.to_float() # type: ignore[operator] - # change qat conv to conv - parent_name, name = _parent_name(self.conv_node.target) - setattr(modules[parent_name], name, float_conv) - if isinstance(float_conv, torch.nn.intrinsic._FusedModule): - fused_conv = float_conv - float_conv = fused_conv[0] - weight_post_process = self.conv.weight_fake_quant - else: - # case 2. 
converting a conv module/fused conv module - # to float conv module, we need to attach - # weight observer to the conv module and run it - # with conv weight - if isinstance(float_conv, torch.nn.intrinsic._FusedModule): - fused_conv = float_conv - float_conv = fused_conv[0] # type: ignore[index] - assert qconfig is not None - weight_post_process = qconfig.weight() - - # return early when we don't have a valid match - # this typically happens when we called the same conv multiple times in the - # same graph, and it is transformed in previous steps into a reference conv already - if type(float_conv) not in [torch.nn.Conv1d, torch.nn.Conv2d, torch.nn.Conv3d]: - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ('call_module', self.conv_node.target, args, {}), - self.conv_node) - return op_out - - qconv_cls = get_static_quant_module_class( - type(float_conv), is_reference=True) - # run weight observer - # TODO: This is currently a hack for QAT to get the right shapes for scale and zero point. - # In the future, we should require the user to calibrate the model after calling prepare - weight_post_process(float_conv.weight) # type: ignore[operator] - weight_qparams = get_qparam_dict(weight_post_process) - # hardcoded for now, TODO: expose the api to user, - # we can have a map from module to reference module - # and allow user to register new ones - ref_conv = qconv_cls.from_float(float_conv, weight_qparams) # type: ignore[attr-defined] - # if the parent is a fused conv (Sequential), we can replace the first - # item to ref conv, otherwise we can update - # the conv instance in the module tree - if fused_conv is not None: - fused_conv[0] = ref_conv - parent_name, name = _parent_name(self.conv_node.target) - setattr(modules[parent_name], name, fused_conv) - else: - parent_name, name = _parent_name(self.conv_node.target) - setattr(modules[parent_name], name, ref_conv) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ('call_module', self.conv_node.target, args, {}), - self.conv_node) - if output_activation_post_process: - op_out = quantize_node( - op_out, - output_activation_post_process, - node, - modules, - quantized_graph, - node_name_to_scope, - is_input=False) - return op_out - else: - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - additional_static_quant_mapping = convert_custom_config_dict.get("static", {}) - # 1. attach activation post process to module - self.conv.activation_post_process = output_activation_post_process - # 2. 
select quantized class - qconv_cls = get_static_quant_module_class( - type(self.conv), additional_static_quant_mapping, is_reference=is_reference) - quantized = qconv_cls.from_float(self.conv) - parent_name, name = _parent_name(self.conv_node.target) - setattr(modules[parent_name], name, quantized) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ( - 'call_module', - self.conv_node.target, - (load_arg(quantized=torch.quint8)(self.conv_node.args[0]),), - {}, - ), - self.conv_node) - else: # call_function - assert self.conv_node.op == "call_function" - if is_reference: - # make sure the input and weight are quantized to torch.quint8, torch.qint8, respectively - load_arg(quantized={0: torch.quint8, 1: torch.qint8})(self.conv_node.args) - args = load_arg(quantized=torch.float)(self.conv_node.args) - kwargs = load_arg(quantized=torch.float)(self.conv_node.kwargs) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", self.conv, args, kwargs), - self.conv_node) - if self.relu_node: - relu_args = [op_out] - relu_args.extend(load_arg(quantized=torch.float)(self.relu_node.args[1:])) - relu_kwargs = load_arg(quantized=torch.float)(self.relu_node.kwargs) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.relu, tuple(relu_args), relu_kwargs), - self.relu_node) + pass - if activation_int8_quantized: - root_module = modules[''] - act_post_process_name = self.relu_node.name if self.relu_node else self.conv_node.name - act_post_process_node = self.relu_node if self.relu_node else self.conv_node - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - return quantize_node( - op_out, - activation_post_process, - act_post_process_node, - modules, - quantized_graph, - node_name_to_scope, - is_input=False) - else: - # output for dynamically quantized conv op is not quantized - return op_out - else: - assert len(self.conv_node.args) >= 7, \ - "only conv2d calls with all arguments specified is supported right now in is_reference=False option" - # make sure the input and weight are quantized to torch.quint8, torch.qint8, respectively - args = load_arg(quantized={0: torch.quint8, 1: torch.qint8})(self.conv_node.args) - # pack weight - weight = load_arg(quantized=torch.qint8)(self.conv_node.args[1]) - other_args = load_arg(quantized=torch.float)(self.conv_node.args[2:]) - bias, stride, padding, dilation, groups = other_args - if self.conv == torch.nn.functional.conv1d: - # F.conv1d can take `int` as well as `list[int]` for stride, - # padding, dilation, but the prepack op cannot. Convert - # these to lists if needed. 
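# (Illustrative note, not in the original source: for a call such as
#  F.conv1d(x, w, b, 2, 1, 1, 1), the scalar stride/padding/dilation arguments
#  are wrapped below as [2], [1], [1] so that the quantized conv prepack op
#  always receives list-valued arguments.)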
- stride = [stride] if isinstance(stride, int) else stride - padding = [padding] if isinstance(padding, int) else padding - dilation = [dilation] if isinstance(dilation, int) else dilation - prepack_args = (weight, bias, stride, padding, dilation, groups) - prepack_op = get_qconv_prepack_op(self.conv) - packed_weight = quantized_graph.create_node( - "call_function", prepack_op, prepack_args, {}) - assert activation_int8_quantized, \ - "currently only static quantization is supported for conv" - # construct conv input - if activation_int8_quantized: - qconv_op = get_qconv_op(self.conv, self.relu_node is not None) - conv_input = load_arg(quantized=torch.quint8)(self.conv_node.args[0]) - - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - - scale, zero_point, _ = get_per_tensor_qparams(activation_post_process) - scale_node, zero_point_node = \ - create_qparam_nodes( - self.conv_node.name, scale, zero_point, modules, - quantized_graph, node_name_to_scope) - qconv_args = (conv_input, packed_weight, scale_node, zero_point_node) - kwargs = load_arg(quantized=torch.float)(self.conv_node.kwargs) - op = create_node_from_old_node_preserve_meta( - quantized_graph, - ('call_function', qconv_op, qconv_args, kwargs), - self.conv_node) - # Store the name of the fused op to get the path of node after fusion as well. - # TODO: may need to change the key to Node regenerate the map in each transformation, - # since we might not be able to rely on the name - node_name_to_scope[op.name] = node_name_to_scope[self.conv_node.name] - return op - else: - # conv2d_dyanmic branch - raise Exception("Only static quant is supported for conv") - -@register_quant_pattern(torch.nn.Linear) -@register_quant_pattern(torch.nn.functional.linear) -@register_quant_pattern(torch.nn.qat.Linear) -@register_quant_pattern(torch.nn.intrinsic.LinearReLU) -@register_quant_pattern(torch.nn.intrinsic.qat.LinearReLU) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.functional.linear)) -@register_quant_pattern((torch.nn.ReLU, torch.nn.functional.linear)) -# for error checks -@register_quant_pattern((torch.nn.ReLU, torch.nn.Linear)) -@register_quant_pattern((torch.nn.functional.relu, torch.nn.Linear)) +# TODO: remove this class class LinearReLUQuantizeHandler(QuantizeHandler): - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - self.relu_node = None - if (node.op == 'call_function' and node.target is torch.nn.functional.relu) or \ - (node.op == 'call_module' and isinstance(modules[str(node.target)], torch.nn.ReLU)): - self.relu_node = node - node = node.args[0] # type: ignore[assignment] - self.linear_node = node - if node.op == 'call_module': - self.linear = modules[str(self.linear_node.target)] - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - # Supported combinations are: - # quant_type | activation (compute_type) | weight - # static quint8 qint8 - # dynamic float32 (quint8) qint8 - # weight_only float32 float16 - # tuple (activation_dtype, weight_dtype, compute_dtype) - supported_dtypes = [ - (torch.quint8, torch.qint8, None), - (torch.float32, torch.qint8, torch.quint8), - 
(torch.float32, torch.float16, None), - # static float16 quantization - (torch.float16, torch.float16, None), - ] - dtypes = get_qconfig_dtypes(qconfig) - # leave the op unquantized if the dtype combination is not supported - if not is_reference and dtypes not in supported_dtypes: - warnings.warn( - "dtype combination: {} is not " - "supported by Linear " - "supported dtype combinations are: {}".format(dtypes, supported_dtypes)) - if self.relu_node: - op_out = quantized_graph.node_copy(self.linear_node, load_arg(quantized=torch.float)) - relu_args = [op_out] - relu_args.extend(load_arg(quantized=torch.float)(self.relu_node.args[1:])) - relu_kwargs = load_arg(quantized=torch.float)(self.relu_node.kwargs) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.relu, tuple(relu_args), relu_kwargs), - self.relu_node) - else: - return quantized_graph.node_copy(node, load_arg(quantized=None)) - - activation_int8_quantized = activation_is_int8_quantized(qconfig) - activation_statically_quantized = activation_is_statically_quantized(qconfig) - weight_dtype = dtypes[1] - if self.linear_node.op == 'call_module': - - output_activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - - # note that relu should already be fused into linear modul in the fusion step - assert self.relu_node is None, 'linear module and relu fusion is not executed, ' \ - 'please make sure to run fusion before prepare' - # we'll always produce reference pattern for the following modules - # will remove the else branch after we migrated all use cases - module_allowlist = [ - torch.nn.Linear, - torch.nn.qat.Linear, - torch.nn.intrinsic.modules.fused.LinearReLU, - torch.nn.intrinsic.qat.modules.linear_relu.LinearReLU - ] - if is_reference or type(self.linear) in module_allowlist and dtypes in [(torch.quint8, torch.qint8, None)]: - # produce dequant - float_op - quant pattern - dtype = torch.float - if activation_int8_quantized: - dtype = activation_dtype(qconfig) - activation = load_arg(quantized=dtype)(self.linear_node.args[0]) - args = load_arg(quantized=torch.float)(self.linear_node.args) + pass - # Get the float linear and attach qscheme and qparams the the module - float_linear = self.linear - fused_linear = None - if isinstance(float_linear, (torch.nn.qat.Linear, torch.nn.intrinsic.qat.LinearReLU)): - float_linear = float_linear.to_float() - # change qat linear to linear - parent_name, name = _parent_name(self.linear_node.target) - setattr(modules[parent_name], name, float_linear) - # Attach weight fake quant to the linear module - if isinstance(float_linear, torch.nn.intrinsic.LinearReLU): - fused_linear = float_linear - float_linear = float_linear[0] - weight_post_process = self.linear.weight_fake_quant - else: - if isinstance(float_linear, torch.nn.intrinsic.LinearReLU): - fused_linear = float_linear - float_linear = self.linear[0] # type: ignore[index] - # Attach the weight observer to the module - weight_post_process = qconfig.weight() # type: ignore[union-attr] - - # Run weight observer - # TODO: This is currently a hack for QAT to get the right shapes for scale and zero point. 
- # In the future, we should require the user to calibrate the model after calling prepare - weight_post_process(float_linear.weight) # type: ignore[operator] - - weight_qparams = get_qparam_dict(weight_post_process) - # TODO: include the configuration in backend_config_dict - # we can have a map from module to reference module - # and allow user to register new ones - qlinear_cls = get_static_quant_module_class( - type(float_linear), is_reference=True) - ref_linear = qlinear_cls.from_float(float_linear, weight_qparams) - - # if the parent is a fused linear (Sequential), we can replace the first - # item to ref linear, otherwise we can update - # the linear instance in the module tree - if fused_linear is not None: - fused_linear[0] = ref_linear - else: - parent_name, name = _parent_name(self.linear_node.target) - setattr(modules[parent_name], name, ref_linear) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ('call_module', self.linear_node.target, args, {}), - self.linear_node) - if output_activation_post_process: - op_out = quantize_node( - op_out, - output_activation_post_process, - node, - modules, - quantized_graph, - node_name_to_scope, - is_input=False) - return op_out - # non-reference option - else: - # 1. attach output activation post process to linear module - if output_activation_post_process: - self.linear.activation_post_process = output_activation_post_process - - # 2. select corresponding quantized linear class for the float linear class - if activation_int8_quantized: - additional_static_quant_mapping = convert_custom_config_dict.get("static", {}) - qlinear = get_static_quant_module_class( - type(self.linear), additional_static_quant_mapping) - else: - assert dtypes in [ - (torch.float32, torch.qint8, torch.quint8), - (torch.float32, torch.float16, None), - ], f"dtype {dtypes} not supported yet" - additional_dynamic_quant_mapping = convert_custom_config_dict.get("dynamic", {}) - qlinear = get_dynamic_quant_module_class(type(self.linear), additional_dynamic_quant_mapping) - - quantized = qlinear.from_float(self.linear) - parent_name, name = _parent_name(self.linear_node.target) - setattr(modules[parent_name], name, quantized) - # activation needs to be quantized for static quantization - dtype = torch.float - if activation_int8_quantized: - dtype = activation_dtype(qconfig) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ( - 'call_module', - self.linear_node.target, - (load_arg(quantized=dtype)(self.linear_node.args[0]),), {}, - ), - self.linear_node) - else: # call_function - assert self.linear_node.op == 'call_function' - if is_reference or self.linear_node.target == torch.nn.functional.linear and\ - dtypes in [(torch.quint8, torch.qint8, None)]: - quantized_input_dtypes = [torch.float, torch.float] - if activation_int8_quantized: - quantized_input_dtypes[0] = torch.quint8 - if weight_is_statically_quantized(qconfig): - quantized_input_dtypes[1] = torch.qint8 - args = load_arg(quantized=quantized_input_dtypes)(self.linear_node.args) - args = load_arg(quantized=torch.float)(self.linear_node.args) - kwargs = load_arg(quantized=torch.float)(self.linear_node.kwargs) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.linear, args, kwargs), - self.linear_node) - if self.relu_node: - relu_args = [op_out] - relu_args.extend(load_arg(quantized=torch.float)(self.relu_node.args[1:])) - relu_kwargs = load_arg(quantized=torch.float)(self.relu_node.kwargs) - op_out = 
create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.relu, tuple(relu_args), relu_kwargs), - self.relu_node) - - if activation_statically_quantized: - # quantize output for statically quantized linear op - root_module = modules[''] - act_post_process_name = self.relu_node.name if self.relu_node else self.linear_node.name - act_post_process_node = self.relu_node if self.relu_node else self.linear_node - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - return quantize_node( - op_out, - activation_post_process, - act_post_process_node, - modules, - quantized_graph, - node_name_to_scope, - is_input=False, - output_prefix="") - else: - # output for dynamically quantized linear op is not quantized - return op_out - else: # non-reference option - # prepacking weights for static int8 quant and dynamic quant - if dtypes != (torch.float16, torch.float16, None): - # linear args - # (x, weight, bias, ...) - # TODO: the name should be weight is int8 quantized - weight_quantized = weight_is_statically_quantized(qconfig) - dtype = weight_dtype if weight_quantized else torch.float - linear_weight = load_arg(quantized=dtype)(self.linear_node.args[1]) - - # get other arguments - kwargs = {**load_arg(quantized=torch.float)(self.linear_node.kwargs)} - # all args after bias, including bias - other_args = load_arg(quantized=torch.float)(self.linear_node.args[2:]) - # bias might be either positional, or a keyword argument - if len(self.linear_node.args) > 2: - bias = load_arg(quantized=torch.float)(self.linear_node.args[2]) - other_args = other_args[1:] # remove the bias argument - else: - bias = kwargs.pop('bias', None) - - prepack_args = (linear_weight, bias) - prepack_op = get_linear_prepack_op_for_dtype(weight_dtype) - packed_weight = quantized_graph.create_node( - 'call_function', prepack_op, prepack_args, {}) - # construct linear input - if activation_int8_quantized: - qlinear_op = torch.ops.quantized.linear_relu if self.relu_node else torch.ops.quantized.linear - linear_input = load_arg(quantized=torch.quint8)(self.linear_node.args[0]) - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - scale, zero_point, _ = get_per_tensor_qparams(activation_post_process) - scale_node, zero_point_node = \ - create_qparam_nodes( - self.linear_node.name, scale, zero_point, modules, - quantized_graph, node_name_to_scope) - - qlinear_args = (linear_input, packed_weight, scale_node, zero_point_node) - op = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", qlinear_op, qlinear_args, kwargs), - self.linear_node) - # Store the name of the fused op to get the path of node after fusion as well. 
- # TODO: may need to change the key to Node regenerate the map in each transformation, - # since we might not be able to rely on the name - node_name_to_scope[op.name] = node_name_to_scope[self.linear_node.name] - return op - elif dtypes in [(torch.float32, torch.qint8, torch.quint8), - (torch.float32, torch.float16, None)]: - # choose linear dynamic or linear dynamic fp16 op based on weight dtype - if weight_dtype == torch.qint8: - if self.relu_node: - qlinear_op = torch.ops.quantized.linear_relu_dynamic - else: - qlinear_op = torch.ops.quantized.linear_dynamic - else: - if self.relu_node: - qlinear_op = torch.ops.quantized.linear_relu_dynamic_fp16 - else: - qlinear_op = torch.ops.quantized.linear_dynamic_fp16 - - linear_input = load_arg(quantized=torch.float)(self.linear_node.args[0]) - qlinear_args = (linear_input, packed_weight) # type: ignore[assignment] - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", qlinear_op, qlinear_args, kwargs), - self.linear_node) - # Store the name of the dynamic op to get the path of node after replacement as well. - # TODO: may need to change the key to Node regenerate the map in each transformation, - # since we might not be able to rely on the name - node_name_to_scope[op_out.name] = node_name_to_scope[self.linear_node.name] - return op_out - else: - assert dtypes == (torch.float16, torch.float16, None) - # TODO (refactor) this is duplicated, maybe have a helper function - if self.relu_node: - op_out = quantized_graph.node_copy(self.linear_node, load_arg(quantized=torch.float)) - relu_args = [op_out] - relu_args.extend(load_arg(quantized=torch.float)(self.relu_node.args[1:])) - relu_kwargs = load_arg(quantized=torch.float)(self.relu_node.kwargs) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_function", torch.nn.functional.relu, tuple(relu_args), relu_kwargs), - self.relu_node) - else: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantized_graph.create_node( - "call_method", "to", (op_out, torch.float16), {}) - -@register_quant_pattern(torch.nn.BatchNorm2d) -@register_quant_pattern(torch.nn.BatchNorm3d) -@register_quant_pattern(torch.nn.intrinsic.BNReLU2d) -@register_quant_pattern(torch.nn.intrinsic.BNReLU3d) +# TODO: remove this class class BatchNormQuantizeHandler(QuantizeHandler): - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - assert node.op == 'call_module' - self.bn_node = node - self.bn = modules[str(self.bn_node.target)] + pass - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - additional_static_quant_mapping = convert_custom_config_dict.get("static", {}) - # 1. 
attach activation post process to module - output_activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert output_activation_post_process is not None - if is_reference: - # produce dequant - float_op - quant pattern - dtype = activation_dtype(qconfig) - activation = load_arg(quantized=dtype)(self.bn_node.args[0]) - args = load_arg(quantized=torch.float)(self.bn_node.args) - op_out = create_node_from_old_node_preserve_meta( - quantized_graph, - ("call_module", self.bn_node.target, args, {}), - self.bn_node) - if output_activation_post_process: - op_out = quantize_node( - op_out, - output_activation_post_process, - node, - modules, - quantized_graph, - node_name_to_scope, - is_input=False) - return op_out - else: - self.bn.activation_post_process = output_activation_post_process - qbn_cls = get_static_quant_module_class(type(self.bn), additional_static_quant_mapping) - quantized = qbn_cls.from_float(self.bn) - parent_name, name = _parent_name(self.bn_node.target) - setattr(modules[parent_name], name, quantized) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ( - 'call_module', - self.bn_node.target, - load_arg(quantized=[0])(self.bn_node.args), - load_arg(quantized=torch.float)(self.bn_node.kwargs), - ), - self.bn_node) - -@register_quant_pattern(torch.nn.qat.Embedding) -@register_quant_pattern(torch.nn.qat.EmbeddingBag) -@register_quant_pattern(torch.nn.Embedding) -@register_quant_pattern(torch.nn.EmbeddingBag) +# TODO: remove this class class EmbeddingQuantizeHandler(QuantizeHandler): - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - - def input_output_observed(self) -> bool: - return False + pass - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - # Supported combinations are: - # quant_type | activation | weight | activation_compute_type - # weight_only | float32 | quint8 | None - # weight_only | float32 | quint4x2 | None - # tuple (activation_dtype, weight_dtype, compute_dtype) - supported_dtypes = [ - (torch.float32, torch.quint8, None), - (torch.float32, torch.quint4x2, None), - ] - assert node.op == 'call_module' - emb_node = node - dtypes = get_qconfig_dtypes(qconfig) - # leave the op unquantized if the dtype combination is not supported - if dtypes not in supported_dtypes: - warnings.warn( - "dtype combination: {} is not " - "supported by Embedding/EmbeddingBag, " - "supported dtype combinations are: {}".format(dtypes, supported_dtypes)) - return quantized_graph.node_copy(node, load_arg(quantized=None)) - - emb = modules[str(emb_node.target)] - qemb = get_static_quant_module_class(type(emb)) - quantized = qemb.from_float(emb) - parent_name, name = _parent_name(emb_node.target) - setattr(modules[parent_name], name, quantized) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ( - 'call_module', - emb_node.target, - load_arg(quantized=torch.float)(emb_node.args), - load_arg(quantized=torch.float)(emb_node.kwargs), - ), - emb_node) - -# TODO (maybe): merge with embedding quantize handler -@register_quant_pattern(torch.nn.GRUCell) -@register_quant_pattern(torch.nn.LSTMCell) -@register_quant_pattern(torch.nn.RNNCell) -@register_quant_pattern(torch.nn.LSTM) +# TODO: remove this class class 
RNNDynamicQuantizeHandler(QuantizeHandler): - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - - def input_output_observed(self) -> bool: - return False - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - # Supported combinations are: - # quant_type | activation | weight | activation_compute_type - # dynamic | float32 | qint8 | quint8 - # dynamic | float32 | float16 | None - # tuple (activation_dtype, weight_dtype, compute_dtype) - supported_dtypes = [ - (torch.float32, torch.qint8, torch.quint8), - (torch.float32, torch.float16, None), - ] - assert node.op == 'call_module' - dtypes = get_qconfig_dtypes(qconfig) - # leave the op unquantized if the dtype combination is not supported - if dtypes not in supported_dtypes: - warnings.warn( - "dtype combination: {} is not " - "supported by Embedding/EmbeddingBag, " - "supported dtype combinations are: {}".format(dtypes, supported_dtypes)) - return quantized_graph.node_copy(node, load_arg(quantized=None)) + pass - act_dtype, weight_dtype, compute_dtype = dtypes - activation = load_arg(quantized=act_dtype)(node.args[0]) - module = modules[str(node.target)] - qmodule_cls = get_dynamic_quant_module_class(type(module)) - qmodule = qmodule_cls.from_float(module) - parent_name, name = _parent_name(node.target) - setattr(modules[parent_name], name, qmodule) - return create_node_from_old_node_preserve_meta( - quantized_graph, - ( - 'call_module', - node.target, - load_arg(quantized=torch.float)(node.args), - load_arg(quantized=torch.float)(node.kwargs), - ), - node) - -ARGS_TO_SKIP = { - torch._ops.ops.quantized.hardswish: ['inplace'], - torch._ops.ops.quantized.elu: ['inplace'], - torch._ops.ops.quantized.dropout: ['inplace'], - torch._ops.ops.quantized.instance_norm: - ['running_mean', 'running_var', 'use_input_stats', 'momentum'], -} -@register_quant_pattern(torch.nn.ConvTranspose1d) -@register_quant_pattern(torch.nn.ConvTranspose2d) -@register_quant_pattern(torch.nn.ELU) -@register_quant_pattern(torch.nn.LeakyReLU) -@register_quant_pattern(torch.nn.Hardswish) -@register_quant_pattern(torch.nn.InstanceNorm1d) -@register_quant_pattern(torch.nn.InstanceNorm2d) -@register_quant_pattern(torch.nn.InstanceNorm3d) -@register_quant_pattern(torch.nn.LayerNorm) -@register_quant_pattern(torch.nn.SiLU) -@register_quant_pattern(torch.nn.Mish) -@register_quant_pattern(torch.nn.Dropout) -# we currently only support reference patterns for these ops so they have been removed -# until they receive a proper fp16 kernel. To use the reference pattern, use a custom qconfig -# @register_quant_pattern(torch.nn.GELU) -# @register_quant_pattern(torch.nn.Softmax) -@register_quant_pattern(torch.nn.functional.elu) -@register_quant_pattern(torch.nn.functional.hardswish) -@register_quant_pattern(torch.nn.functional.instance_norm) -@register_quant_pattern(torch.nn.functional.layer_norm) -@register_quant_pattern(torch.nn.functional.leaky_relu) -@register_quant_pattern(torch.nn.functional.silu) -@register_quant_pattern(torch.nn.functional.mish) -@register_quant_pattern(torch.nn.functional.dropout) -# we currently only support reference patterns for these ops so they have been removed -# until they receive a proper fp16 kernel. 
To use the reference pattern, use a custom qconfig -# @register_quant_pattern(torch.nn.functional.gelu) -# @register_quant_pattern(torch.nn.functional.softmax) -@register_quant_pattern(torch.sum) +# TODO: remove this class class DefaultNodeQuantizeHandler(QuantizeHandler): """ Common quantized op, first input and first output will be quantized """ - def __init__( - self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - if node.op == "call_function" or node.op == "call_method": - self.op = node.target - elif node.op == "call_module": - self.op = type(modules[str(node.target)]) - - def is_output_quantized(self, qconfig): - dtypes = get_qconfig_dtypes(qconfig) - return self.op in default_op_supported_dtypes and \ - dtypes in default_op_supported_dtypes[self.op] - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - if not self.all_node_args_are_tensors: - return NotImplemented - assert node.op in ['call_module', 'call_function'], 'Only call_module and ' + \ - 'call_function are handled in DefaultNode' - if convert_custom_config_dict is None: - convert_custom_config_dict = {} - additional_static_quant_mapping = convert_custom_config_dict.get("static", {}) - - dtypes = get_qconfig_dtypes(qconfig) - if not is_reference and dtypes not in default_op_supported_dtypes[self.op]: - warnings.warn( - "dtype combination: {} is not " - "supported by {} " - "supported dtype combinations are: {}".format(dtypes, self.op, default_op_supported_dtypes[self.op])) - return quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - - # We can produce reference for a dtypes including - # (torch.quint8, torch.qint8, torch.qint32, torch.float16) - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.float: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - else: - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - # make sure the input is quantized to act_dtype - load_arg(quantized={0: act_dtype})(node.args) - args = load_arg(quantized=torch.float)(node.args) - kwargs = load_arg(quantized=torch.float)(node.kwargs) - # swap float module to reference module (ConvTranspose) - float_module = modules[str(node.target)] if node.op == "call_module" else None - if type(float_module) in [torch.nn.ConvTranspose1d, torch.nn.ConvTranspose2d]: - ref_module_cls = get_static_quant_module_class(type(float_module), is_reference=True) + pass - weight_post_process = qconfig.weight() # type: ignore[union-attr] - weight_post_process(float_module.weight) # type: ignore[union-attr] - weight_qparams = get_qparam_dict(weight_post_process) - ref_module = ref_module_cls.from_float(float_module, weight_qparams) # type: ignore[attr-defined] - parent_name, name = _parent_name(node.target) - setattr(modules[parent_name], name, ref_module) - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantize_node( - op_out, activation_post_process, - node, modules, quantized_graph, node_name_to_scope, is_input=False) - -@register_quant_pattern(torch.nn.Hardsigmoid, default_affine_fixed_qparams_observer) -@register_quant_pattern(torch.nn.functional.hardsigmoid, default_affine_fixed_qparams_observer) 
-@register_quant_pattern('hardsigmoid', default_affine_fixed_qparams_observer) -@register_quant_pattern('hardsigmoid_', default_affine_fixed_qparams_observer) -@register_quant_pattern(torch.nn.Sigmoid, default_affine_fixed_qparams_observer) -@register_quant_pattern(torch.sigmoid, default_affine_fixed_qparams_observer) -@register_quant_pattern('sigmoid', default_affine_fixed_qparams_observer) -@register_quant_pattern('sigmoid_', default_affine_fixed_qparams_observer) -@register_quant_pattern(torch.nn.Tanh, default_symmetric_fixed_qparams_observer) -@register_quant_pattern(torch.tanh, default_symmetric_fixed_qparams_observer) -@register_quant_pattern('tanh', default_symmetric_fixed_qparams_observer) -@register_quant_pattern('tanh_', default_symmetric_fixed_qparams_observer) +# TODO: remove this class class FixedQParamsOpQuantizeHandler(QuantizeHandler): - def __init__(self, - node: Node, - modules: Dict[str, torch.nn.Module]): - super().__init__(node, modules) - self.node = node - - def should_mark_output_quantized_from_input_quantized_status( - self, - qconfig: QConfigAny - ) -> bool: - # FixQParamOps are the same as CopyNode in int8 quantization - return activation_dtype(qconfig) in [torch.quint8, torch.qint8] - - # some qhandlers override the activations constructor - def get_activation_ctr(self, qconfig, pattern, is_training) -> Optional[Callable]: - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.quint8: - return get_default_output_activation_post_process_map(is_training).get( - pattern, qconfig.activation) - else: - return qconfig.activation + pass - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.float: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - else: - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - # make sure the input is quantized to act_dtype - load_arg(quantized={0: act_dtype})(node.args) - args = load_arg(quantized=torch.float)(node.args) - kwargs = load_arg(quantized=torch.float)(node.kwargs) - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantize_node( - op_out, activation_post_process, - node, modules, quantized_graph, node_name_to_scope, is_input=False) - -@register_quant_pattern(torch.nn.AdaptiveAvgPool1d) -@register_quant_pattern(torch.nn.AdaptiveAvgPool2d) -@register_quant_pattern(torch.nn.AdaptiveAvgPool3d) -@register_quant_pattern(torch.nn.AvgPool1d) -@register_quant_pattern(torch.nn.AvgPool2d) -@register_quant_pattern(torch.nn.AvgPool3d) -@register_quant_pattern(torch.nn.Hardtanh) -@register_quant_pattern(torch.nn.MaxPool1d) -@register_quant_pattern(torch.nn.MaxPool2d) -@register_quant_pattern(torch.nn.MaxPool3d) -@register_quant_pattern(torch.nn.ReLU) -@register_quant_pattern(torch.nn.ReLU6) -@register_quant_pattern(torch.adaptive_avg_pool1d) -@register_quant_pattern(torch.nn.functional.adaptive_avg_pool2d) -@register_quant_pattern(torch.nn.functional.adaptive_avg_pool3d) -@register_quant_pattern(torch.nn.functional.hardtanh) -@register_quant_pattern(torch.nn.functional.hardtanh_) -@register_quant_pattern(torch.nn.functional.interpolate) -@register_quant_pattern(torch.nn.functional.max_pool1d) 
-@register_quant_pattern(torch.nn.functional.max_pool2d) -@register_quant_pattern(torch.nn.functional.max_pool3d) -@register_quant_pattern(torch.nn.functional.relu) -@register_quant_pattern(torch.nn.functional.relu6) -@register_quant_pattern(torch.avg_pool1d) -@register_quant_pattern(torch._C._nn.avg_pool2d) -@register_quant_pattern(torch._C._nn.avg_pool3d) -@register_quant_pattern(torch.clamp) -@register_quant_pattern(torch.flatten) -@register_quant_pattern(torch.mean) -@register_quant_pattern(operator.floordiv) -@register_quant_pattern('clamp') -@register_quant_pattern('mean') -@register_quant_pattern('relu') -@register_quant_pattern('relu_') +# TODO: remove class CopyNodeQuantizeHandler(QuantizeHandler): - """ Operators that works on both float and quantized input - if input is quantized, the output Tensor shares - the same quantization parameter with input. - These ops will do computation on the input Tensor, e.g. average pool, so we will - insert extra observer/fake_quant for the output of these operators. - TODO: maybe rename this to TensorValueOpQuantizeHandler - """ - def should_mark_output_quantized_from_input_quantized_status( - self, - qconfig: QConfigAny - ) -> bool: - return True - - def is_general_tensor_value_op(self) -> bool: - return True - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: + pass - # when activation dtype is torch.float, the node does not require - # observation - # e.g. dynamic quantization or weight_only quantization - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.float: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - else: - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - if activation_post_process is not None: - # make sure the input is quantized to act_dtype - load_arg(quantized={0: act_dtype})(node.args) - args = list(load_arg(quantized=torch.float)(node.args)) - kwargs = load_arg(quantized=torch.float)(node.kwargs) - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantize_node( - op_out, - activation_post_process, - node, modules, quantized_graph, node_name_to_scope, is_input=False) - else: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - -class CustomModuleQuantizeHandler(QuantizeHandler): - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - """ Convert a float custom module to quantized custom module - """ - assert node.op == 'call_module' - assert convert_custom_config_dict is not None - custom_module_class_mapping = convert_custom_config_dict.get("observed_to_quantized_custom_module_class", None) - assert custom_module_class_mapping is not None - observed_custom_module = modules[str(node.target)] - if activation_is_statically_quantized(qconfig): - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - assert activation_post_process is not None - observed_custom_module.activation_post_process = activation_post_process - quantized_custom_module_class = get_swapped_custom_module_class( 
- observed_custom_module, custom_module_class_mapping, qconfig) - quantized_custom_module = \ - quantized_custom_module_class.from_observed(observed_custom_module) - parent_name, name = _parent_name(node.target) - setattr(modules[parent_name], name, quantized_custom_module) - # hardcoded the quntized input to be None (take whatever is in the environemnt), - # we can extend this - # if there is a need, e.g. get the indexes of quantized inputs from some - # module attribute like module._QUANTIZED_INPUT_INDEXES - return quantized_graph.node_copy(node, load_arg(quantized=None)) - -@register_quant_pattern(torch.nn.Identity) -@register_quant_pattern(torch.transpose) -@register_quant_pattern(torch.repeat_interleave) -@register_quant_pattern(torch.squeeze) -@register_quant_pattern(torch.stack) -@register_quant_pattern(torch.unsqueeze) -@register_quant_pattern('contiguous') -@register_quant_pattern('detach') -@register_quant_pattern('detach_') -@register_quant_pattern('permute') -@register_quant_pattern('repeat') -@register_quant_pattern('repeat_interleave') -@register_quant_pattern('reshape') -@register_quant_pattern('resize_') -@register_quant_pattern('shape') -@register_quant_pattern('size') -@register_quant_pattern('squeeze') -@register_quant_pattern('squeeze_') -@register_quant_pattern('transpose') -@register_quant_pattern('unsqueeze') -@register_quant_pattern('unsqueeze_') -@register_quant_pattern('view') +# TODO: remove class GeneralTensorShapeOpQuantizeHandler(QuantizeHandler): - """ Operators that works on both float and quantized input - if input is quantized, the output Tensor shares - the same quantization parameter with input. - These ops only do rearrangement of Tensor values, for - example reshape, or just query the information about Tensor - e.g. size, and we do not insert extra observer/fake_quant - for the output of the operator. - """ - def is_general_tensor_shape_op(self) -> bool: - return True + pass - def should_mark_output_quantized_from_input_quantized_status( - self, - qconfig: QConfigAny - ) -> bool: - return True - - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - # when activation dtype is torch.float, the node does not require - # observation - # e.g. 
dynamic quantization or weight_only quantization - act_dtype = activation_dtype(qconfig) - if act_dtype == torch.float: - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return op_out - else: - activation_post_process = \ - self._maybe_get_last_node_only_observer(modules) - if activation_post_process is not None: - args = list(load_arg(quantized=torch.float)(node.args)) - kwargs = load_arg(quantized=torch.float)(node.kwargs) - op_out = quantized_graph.node_copy(node, load_arg(quantized=torch.float)) - return quantize_node( - op_out, - activation_post_process, - node, modules, quantized_graph, node_name_to_scope, is_input=False) - else: - return quantized_graph.node_copy(node, load_arg(quantized=torch.float)) +# TODO: not used, can be removed after torch.quantization namespace is deprecated +class CustomModuleQuantizeHandler(QuantizeHandler): + pass +# TODO: not used, can be removed after torch.quantization namespace is deprecated class StandaloneModuleQuantizeHandler(QuantizeHandler): - """ Converts an observed standalone module to quantized standalone module - by calling convert_fx on the observed standalone module. - """ - def convert(self, - node: Node, - qconfig: QConfigAny, - modules: Dict[str, torch.nn.Module], - quantized_graph: Graph, - node_name_to_scope: Dict[str, Tuple[str, type]], - load_arg: Callable, - is_reference: bool = False, - convert_custom_config_dict: Dict[str, Any] = None) -> Node: - assert node.op == 'call_module' - convert = torch.ao.quantization.quantize_fx._convert_standalone_module_fx # type: ignore[attr-defined] - # We know that observed standalone module is a GraphModule since - # it's produced by us - observed_standalone_module : GraphModule = modules[str(node.target)] # type: ignore[assignment] - input_quantized_idxs = observed_standalone_module._standalone_module_input_quantized_idxs.tolist() # type: ignore[operator] - quantized_standalone_module = convert(observed_standalone_module, is_reference=is_reference) - parent_name, name = _parent_name(node.target) - # update the modules dict - setattr(modules[parent_name], name, quantized_standalone_module) - modules[str(node.target)] = quantized_standalone_module - return quantized_graph.node_copy(node, load_arg(quantized=input_quantized_idxs)) + pass diff --git a/torch/ao/quantization/fx/quantized_fusion_patterns_and_replacements.py b/torch/ao/quantization/fx/quantized_fusion_patterns_and_replacements.py deleted file mode 100644 index ce23f17db71d8f..00000000000000 --- a/torch/ao/quantization/fx/quantized_fusion_patterns_and_replacements.py +++ /dev/null @@ -1,152 +0,0 @@ -import torch - -def relu_inplace_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.relu(x, inplace=True) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def relu_non_inplace_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.relu(x, inplace=False) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def relu_replacement(x, scale, zero_point): - x = torch.nn.functional.relu(x) - return x - -def relu_method_pattern(x, scale, zero_point): - x = x.dequantize() - x = x.relu() - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def relu_method_replacement(x, scale, zero_point): - x = x.relu() - return x - -def relu_inplace_method_pattern(x, scale, zero_point): - x = x.dequantize() - x = x.relu_() - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - 
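As a usage sketch (illustrative only; it assumes the public torch.fx.subgraph_rewriter API and a hypothetical module M, and is not part of the deleted file), one of the (pattern, replacement) pairs defined above could be applied to a traced module like this:

import torch
from torch.fx import symbolic_trace, subgraph_rewriter

class M(torch.nn.Module):
    def forward(self, x, scale, zero_point):
        y = x.dequantize()
        y = torch.nn.functional.relu(y, inplace=True)
        return torch.quantize_per_tensor(y, scale, zero_point, torch.quint8)

traced = symbolic_trace(M())
# rewrites the dequantize -> relu -> quantize_per_tensor chain so that relu is
# applied directly to the quantized input tensor
subgraph_rewriter.replace_pattern(traced, relu_inplace_pattern, relu_replacement)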
-def relu_inplace_method_replacement(x, scale, zero_point): - x = x.relu_() - return x - -def relu6_inplace_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.relu6(x, inplace=True) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def relu6_non_inplace_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.relu6(x, inplace=False) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def relu6_replacement(x, scale, zero_point): - x = torch.nn.functional.relu6(x) - return x - - -def hardtanh_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.hardtanh(x, inplace=True) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def hardtanh_non_inplace_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.hardtanh(x, inplace=False) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def hardtanh_replacement(x, scale, zero_point): - x = torch.nn.functional.hardtanh(x) - return x - -def hardtanh_inplace_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.nn.functional.hardtanh_(x) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def hardtanh_inplace_replacement(x, scale, zero_point): - x = torch.nn.functional.hardtanh_(x) - return x - -def min_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.min(x) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def min_replacement(x, scale, zero_point): - x = torch.min(x) - return x - -def max_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.max(x) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def max_replacement(x, scale, zero_point): - x = torch.max(x) - return x - -def mean_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.mean(x) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def mean_replacement(x, scale, zero_point): - x = torch.mean(x) - return x - -def mean_method_pattern(x, scale, zero_point): - x = x.dequantize() - x = x.mean() - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def mean_method_replacement(x, scale, zero_point): - x = x.mean() - return x - -def flatten_pattern(x, scale, zero_point): - x = x.dequantize() - x = torch.flatten(x) - x = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8) - return x - -def flatten_replacement(x, scale, zero_point): - x = torch.flatten(x) - return x - -def _get_all_patterns_and_replacements(): - return [ - (relu_inplace_pattern, relu_replacement), - (relu_non_inplace_pattern, relu_replacement), - (relu_method_pattern, relu_method_replacement), - (relu_inplace_method_pattern, relu_inplace_method_replacement), - (relu6_inplace_pattern, relu6_replacement), - (relu6_non_inplace_pattern, relu6_replacement), - (hardtanh_pattern, hardtanh_replacement), - (hardtanh_non_inplace_pattern, hardtanh_replacement), - (hardtanh_inplace_pattern, hardtanh_inplace_replacement), - (mean_pattern, mean_replacement), - (mean_method_pattern, mean_method_replacement), - ] - - -def get_fbgemm_patterns_and_replacements(): - return _get_all_patterns_and_replacements() - -def get_qnnpack_patterns_and_replacements(): - return _get_all_patterns_and_replacements() diff --git a/torch/ao/quantization/fx/subgraph_rewriter_FORKED_DO_NOT_USE.py 
b/torch/ao/quantization/fx/subgraph_rewriter_FORKED_DO_NOT_USE.py deleted file mode 100644 index a64b537173a90f..00000000000000 --- a/torch/ao/quantization/fx/subgraph_rewriter_FORKED_DO_NOT_USE.py +++ /dev/null @@ -1,445 +0,0 @@ -from torch.fx.graph_module import GraphModule -from torch.fx.graph import Graph -from torch.fx.node import Node -from torch.fx._symbolic_trace import symbolic_trace -from torch.fx._compatibility import compatibility - -import copy -from typing import Callable, Dict, List, NamedTuple, Optional, Set -import torch - -@compatibility(is_backward_compatible=True) -class Match(NamedTuple): - # Node from which the match was found - anchor: Node - # Maps nodes in the pattern subgraph to nodes in the larger graph - nodes_map: Dict[Node, Node] - -class _SubgraphMatcher: - def __init__(self, pattern: Graph) -> None: - self.pattern = pattern - if len(pattern.nodes) == 0: - raise ValueError("_SubgraphMatcher cannot be initialized with an " - "empty pattern") - # `self.pattern_anchor` is the output Node in `pattern` - self.pattern_anchor = next(iter(reversed(pattern.nodes))) - # Ensure that there is only a single output value in the pattern - # since we don't support multiple outputs - assert len(self.pattern_anchor.all_input_nodes) == 1, \ - "Pattern matching on multiple outputs is not supported" - # Maps nodes in the pattern subgraph to nodes in the larger graph - self.nodes_map: Dict[Node, Node] = {} - - def matches_subgraph_from_anchor(self, anchor: Node) -> bool: - """ - Checks if the whole pattern can be matched starting from - ``anchor`` in the larger graph. - - Pattern matching is done by recursively comparing the pattern - node's use-def relationships against the graph node's. - """ - self.nodes_map = {} - return self._match_nodes(self.pattern_anchor, anchor) - - # Compare the pattern node `pn` against the graph node `gn` - def _match_nodes(self, pn: Node, gn: Node) -> bool: - - # Check if we've already matched these nodes in the current - # traversal - if pn in self.nodes_map: - return self.nodes_map[pn] == gn - - def attributes_are_equal(pn: Node, gn: Node) -> bool: - # Use placeholder and output nodes as wildcards. 
The - # only exception is that an output node can't match - # a placeholder - if (pn.op == "placeholder" - or (pn.op == "output" and gn.op != "placeholder")): - return True - return pn.op == gn.op and pn.target == gn.target - - # Terminate early if the node attributes are not equal - if not attributes_are_equal(pn, gn): - return False - - # Optimistically mark `pn` as a match for `gn` - self.nodes_map[pn] = gn - - # Traverse the use-def relationships to ensure that `pn` is a true - # match for `gn` - if pn.op == "placeholder": - return True - if (pn.op != "output" - and len(pn.all_input_nodes) != len(gn.all_input_nodes)): - return False - if pn.op == "output": - match_found = any(self._match_nodes(pn.all_input_nodes[0], gn_) - for gn_ in gn.all_input_nodes) - else: - match_found = (len(pn.all_input_nodes) == len(gn.all_input_nodes) - and all(self._match_nodes(pn_, gn_) for pn_, gn_ - in zip(pn.all_input_nodes, gn.all_input_nodes))) - if not match_found: - self.nodes_map.pop(pn) - return False - - return True - - -def _replace_submodules(gm: GraphModule, replacement: torch.nn.Module) -> None: - gm.delete_all_unused_submodules() - - if isinstance(replacement, GraphModule): - replacement.graph.lint() - - def try_get_submodule(mod: torch.nn.Module, target: str) -> Optional[torch.nn.Module]: - try: - mod_match = mod.get_submodule(target) - return mod_match - except AttributeError: - return None - - for node in gm.graph.nodes: - if node.op == "call_module" or node.op == "get_attr": - - gm_submod = try_get_submodule(gm, node.target) - - replacement_submod = try_get_submodule(replacement, node.target) - - # CASE 1: This target already exists as a submodule in our - # result GraphModule. Whether or not it exists in - # `replacement`, the existing submodule takes precedence. - if gm_submod is not None: - continue - - # CASE 2: The target exists as a submodule in `replacement` - # only, so we need to copy it over. - elif replacement_submod is not None: - new_submod = copy.deepcopy(getattr(replacement, node.target)) - gm.add_submodule(node.target, new_submod) - - # CASE 3: The target doesn't exist as a submodule in `gm` - # or `replacement` - else: - raise RuntimeError("Attempted to create a \"", node.op, - "\" node during subgraph rewriting " - f"with target {node.target}, but " - "the referenced submodule does not " - "exist in either the original " - "GraphModule `gm` or the replacement" - " GraphModule `replacement`") - - gm.graph.lint() - -@compatibility(is_backward_compatible=True) -def replace_pattern(gm: GraphModule, pattern: Callable, replacement: Callable) -> List[Match]: - """ - Matches all possible non-overlapping sets of operators and their - data dependencies (``pattern``) in the Graph of a GraphModule - (``gm``), then replaces each of these matched subgraphs with another - subgraph (``replacement``). - - Args: - ``gm``: The GraphModule that wraps the Graph to operate on - ``pattern``: The subgraph to match in ``gm`` for replacement - ``replacement``: The subgraph to replace ``pattern`` with - - Returns: - List[Match]: A list of ``Match`` objects representing the places - in the original graph that ``pattern`` was matched to. The list - is empty if there are no matches. ``Match`` is defined as: - - .. code-block:: python - - class Match(NamedTuple): - # Node from which the match was found - anchor: Node - # Maps nodes in the pattern subgraph to nodes in the larger graph - nodes_map: Dict[Node, Node] - - Examples: - - .. 
code-block:: python - - import torch - from torch.fx import symbolic_trace, subgraph_rewriter - - class M(torch.nn.Module): - def __init__(self): - super().__init__() - - def forward(self, x, w1, w2): - m1 = torch.cat([w1, w2]).sum() - m2 = torch.cat([w1, w2]).sum() - return x + torch.max(m1) + torch.max(m2) - - def pattern(w1, w2): - return torch.cat([w1, w2]).sum() - - def replacement(w1, w2): - return torch.stack([w1, w2]) - - traced_module = symbolic_trace(M()) - - subgraph_rewriter.replace_pattern(traced_module, pattern, replacement) - - The above code will first match ``pattern`` in the ``forward`` - method of ``traced_module``. Pattern-matching is done based on - use-def relationships, not node names. For example, if you had - ``p = torch.cat([a, b])`` in ``pattern``, you could match - ``m = torch.cat([a, b])`` in the original ``forward`` function, - despite the variable names being different (``p`` vs ``m``). - - The ``return`` statement in ``pattern`` is matched based on its - value only; it may or may not match to the ``return`` statement in - the larger graph. In other words, the pattern doesn't have to extend - to the end of the larger graph. - - When the pattern is matched, it will be removed from the larger - function and replaced by ``replacement``. If there are multiple - matches for ``pattern`` in the larger function, each non-overlapping - match will be replaced. In the case of a match overlap, the first - found match in the set of overlapping matches will be replaced. - ("First" here being defined as the first in a topological ordering - of the Nodes' use-def relationships. In most cases, the first Node - is the parameter that appears directly after ``self``, while the - last Node is whatever the function returns.) - - One important thing to note is that the parameters of the - ``pattern`` Callable must be used in the Callable itself, - and the parameters of the ``replacement`` Callable must match - the pattern. The first rule is why, in the above code block, the - ``forward`` function has parameters ``x, w1, w2``, but the - ``pattern`` function only has parameters ``w1, w2``. ``pattern`` - doesn't use ``x``, so it shouldn't specify ``x`` as a parameter. - As an example of the second rule, consider replacing - - .. code-block:: python - - def pattern(x, y): - return torch.neg(x) + torch.relu(y) - - with - - .. code-block:: python - - def replacement(x, y): - return torch.relu(x) - - In this case, ``replacement`` needs the same number of parameters - as ``pattern`` (both ``x`` and ``y``), even though the parameter - ``y`` isn't used in ``replacement``. - - After calling ``subgraph_rewriter.replace_pattern``, the generated - Python code looks like this: - - .. code-block:: python - - def forward(self, x, w1, w2): - stack_1 = torch.stack([w1, w2]) - sum_1 = stack_1.sum() - stack_2 = torch.stack([w1, w2]) - sum_2 = stack_2.sum() - max_1 = torch.max(sum_1) - add_1 = x + max_1 - max_2 = torch.max(sum_2) - add_2 = add_1 + max_2 - return add_2 - """ - # Get the graphs for `gm`, `pattern`, `replacement` - original_graph = gm.graph - pattern_graph = symbolic_trace(pattern).graph - replacement_graph = symbolic_trace(replacement).graph - - # Find all possible pattern matches in original_graph. Note that - # pattern matches may overlap with each other. 
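    # (Illustrative note, not in the original source: when two candidate
    #  matches share a graph node, both cannot be replaced;
    #  `overlaps_with_prev_match` further below skips any later match that
    #  reuses a node already claimed by an earlier, accepted match.)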
- matcher = _SubgraphMatcher(pattern_graph) - matches: List[Match] = [] - - # Consider each node as an "anchor" (deepest matching graph node) - for anchor in original_graph.nodes: - - if matcher.matches_subgraph_from_anchor(anchor): - - def pattern_is_contained(nodes_map: Dict[Node, Node]) -> bool: - # `lookup` represents all the nodes in `original_graph` - # that are part of `pattern` - lookup: Dict[Node, Node] = {v: k for k, v in nodes_map.items()} - for n in lookup.keys(): - - # Nodes that can "leak"... - - # Placeholders (by definition) - if n.op == "placeholder": - continue - # Pattern output (acts as a container) - if lookup[n].op == "output": - continue - # Result contained by pattern output (what we'll - # hook in to the new Graph, thus what we'll - # potentially use in other areas of the Graph as - # an input Node) - if (len(lookup[n].users) == 1 - and list(lookup[n].users.keys())[0].op == "output"): - continue - - for user in n.users: - # If this node has users that were not in - # `lookup`, then it must leak out of the - # pattern subgraph - if user not in lookup: - return False - return True - - # It's not a match if the pattern leaks out into the rest - # of the graph - if pattern_is_contained(matcher.nodes_map): - # Shallow copy nodes_map - matches.append(Match(anchor=anchor, - nodes_map=copy.copy({ - key: value - for key, value in matcher.nodes_map.items() - }))) - - # The set of all nodes in `original_graph` that we've seen thus far - # as part of a pattern match - replaced_nodes: Set[Node] = set() - # As we progressively replace nodes, we'll need to keep track of how the match results should change - match_changed_node: Dict[Node, Node] = dict() - - # Return True if one of the nodes in the current match has already - # been used as part of another match - def overlaps_with_prev_match(match: Match) -> bool: - for pn, gn in match.nodes_map.items(): - if pn.op in ["placeholder", "output"]: - continue - if gn in replaced_nodes and gn.op != "placeholder": - return True - return False - - for match in matches: - # Skip overlapping matches - if overlaps_with_prev_match(match): - continue - - # Map replacement graph nodes to their copy in `original_graph` - val_map: Dict[Node, Node] = {} - - pattern_placeholders = [n for n in pattern_graph.nodes - if n.op == "placeholder"] - assert len(pattern_placeholders) > 0 - replacement_placeholders = [n for n in replacement_graph.nodes - if n.op == "placeholder"] - assert len(pattern_placeholders) == len(replacement_placeholders) - placeholder_map = {r: p for r, p - in zip(replacement_placeholders, pattern_placeholders)} - - # node from `original_graph` that matched with the output node - # in `pattern` - subgraph_output: Node = match.anchor - - def mark_node_as_replaced(n: Node) -> None: - if n not in match.nodes_map.values(): - return - for n_ in n.all_input_nodes: - mark_node_as_replaced(n_) - replaced_nodes.add(n) - - for input_node in subgraph_output.all_input_nodes: - mark_node_as_replaced(input_node) - - # Initialize `val_map` with mappings from placeholder nodes in - # `replacement` to their corresponding node in `original_graph` - for replacement_node in replacement_placeholders: - # Get the `original_graph` placeholder node - # corresponding to the current `replacement_node` - pattern_node = placeholder_map[replacement_node] - original_graph_node = match_changed_node.get(match.nodes_map[pattern_node], match.nodes_map[pattern_node]) - - # Populate `val_map` - val_map[replacement_node] = original_graph_node - - # Copy the stack trace 
from the original graph to the replacement graph. - # Currently this is using a naive strategy: - # 1. find the first node with non-null stack trace in the original graph - # 2. if found, copy this stack trace to every node in the replacement graph - first_stack_trace = None - for pn, gn in match.nodes_map.items(): - if gn.stack_trace is not None: - first_stack_trace = gn.stack_trace - break - if first_stack_trace is not None: - for node in replacement_graph.nodes: - node.stack_trace = first_stack_trace - - # Copy the replacement graph over - with original_graph.inserting_before(subgraph_output): - copied_output = original_graph.graph_copy(replacement_graph, - val_map) - - # Clear out stack traces to prevent interference with next match - for node in replacement_graph.nodes: - node.stack_trace = None - - # Hook the output Node of the replacement subgraph in to the - # original Graph at the correct location - - # CASE 1: We need to hook the replacement subgraph in somewhere - # in the middle of the graph. We replace the Node in the - # original graph that corresponds to the end of the pattern - # subgraph - if subgraph_output.op != "output": - pattern_outputs = [n for n in pattern_graph.nodes - if n.op == "output"] - assert len(pattern_outputs) > 0 - replacement_outputs = [n for n in replacement_graph.nodes - if n.op == "output"] - assert len(replacement_outputs) == len(pattern_outputs) - outputs_map = {p: r for r, p - in zip(replacement_outputs, pattern_outputs)} - - for pn, gn in match.nodes_map.items(): - if gn.op == "placeholder": - continue - - # Search for the node corresponding to the output of the pattern - if pn.op != "output": - continue - assert subgraph_output == gn - - # Update all anchor inputs to the new nodes - rn = outputs_map[pn] - for pn_input, rn_input in zip(pn.all_input_nodes, rn.all_input_nodes): - gn_input = match.nodes_map[pn_input] - rn_input_in_original_graph = val_map[rn_input] - gn_input.replace_all_uses_with(rn_input_in_original_graph) - # We store the updated node point in case other nodes want to use it - match_changed_node[gn_input] = rn_input_in_original_graph - - assert subgraph_output.op != "output" - # CASE 2: The pattern subgraph match extends to the end of the - # original graph, so we need to change the current graph's - # output Node to reflect the insertion of the replacement graph. 
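The rewiring above leans on ``Node.replace_all_uses_with``. A standalone sketch of that primitive on a toy graph (illustrative only, not the rewriter itself):

.. code-block:: python

    import torch
    from torch.fx import symbolic_trace

    def f(x):
        return torch.relu(x) + 1

    gm = symbolic_trace(f)
    graph = gm.graph
    relu_node = next(n for n in graph.nodes if n.target is torch.relu)

    # Insert a replacement node and reroute every user of the old node to it,
    # mirroring how the rewriter splices `replacement` into `original_graph`.
    with graph.inserting_after(relu_node):
        new_node = graph.call_function(torch.sigmoid, args=relu_node.args)
    relu_node.replace_all_uses_with(new_node)
    graph.erase_node(relu_node)
    gm.recompile()   # forward is now sigmoid(x) + 1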
- # We'll keep the current output Node, but update its args and - # `_input_nodes` as necessary - else: - subgraph_output.args = ((copied_output,)) - if isinstance(copied_output, Node): - subgraph_output._input_nodes = {copied_output: None} - - assert isinstance(copied_output, Node) - # Erase the `pattern` nodes - for node in reversed(original_graph.nodes): - if len(node.users) == 0 and node.op != "output": - original_graph.erase_node(node) - - # Update the passed-in GraphModule to reflect the new state of - # `original_graph` - gm.recompile() - - # If `replacement` was an nn.Module, we'll need to make sure that - # all the submodules have been copied over correctly - if isinstance(replacement, torch.nn.Module): - _replace_submodules(gm, replacement) - - return matches diff --git a/torch/ao/quantization/fx/utils.py b/torch/ao/quantization/fx/utils.py index cbb56d405353e8..70b852395ca905 100644 --- a/torch/ao/quantization/fx/utils.py +++ b/torch/ao/quantization/fx/utils.py @@ -12,7 +12,9 @@ ) from typing import Callable, Optional, List, Dict, Any, Set, Tuple, Union, Type +from collections import namedtuple import operator +import warnings # A dictionary for querying the weight index for a given op WEIGHT_INDEX_DICT = { @@ -111,7 +113,7 @@ def get_per_tensor_qparams(activation_post_process): dtype = activation_post_process.dtype return scale, zero_point, dtype -def get_quantize_node_info(activation_post_process: Callable) -> Tuple[str, Union[Callable, str], Dict[str, Any]]: +def get_quantize_node_info(activation_post_process: Callable) -> Optional[Tuple[str, Union[Callable, str], Dict[str, Any]]]: ''' Given an activation_post_process module, return node_type(e.g. call_function), quantize op(e.g. quantize_per_tensor) and a dictionary of extracted qparams from the module @@ -137,14 +139,17 @@ def get_quantize_node_info(activation_post_process: Callable) -> Tuple[str, Unio node_type = "call_method" quantize_op = "to" qparams = {"_dtype_": dtype} - elif dtype == torch.float32 and compute_dtype in [torch.quint8, torch.qint8]: + elif dtype == torch.float32 and compute_dtype in [torch.quint8, torch.qint8, torch.float16]: + # dynamic quantization node_type = "call_function" quantize_op = torch.quantize_per_tensor_dynamic + # TODO: get reduce range from observer + # reduce_range = activation_post_process.reduce_range reduce_range = torch.backends.quantized.engine == "fbgemm" qparams = {"_dtype_": compute_dtype, "_reduce_range_": reduce_range} else: - raise Exception("Unsupported dtype in get_quantize_node_info:" + str(dtype)) - assert quantize_op is not None + warnings.warn(f"Unsupported activation_post_process in get_quantize_node_info: {activation_post_process}") + return None return node_type, quantize_op, qparams def quantize_node( @@ -193,7 +198,10 @@ def quantize_node( module_path = "" root_module = modules[''] graph = quantized_graph - node_type, quantize_op, qparams = get_quantize_node_info(obs_module) + maybe_quantize_node_info = get_quantize_node_info(obs_module) + assert maybe_quantize_node_info is not None, \ + f"Expecting quantize node info not to be None, observer: {obs_module}" + node_type, quantize_op, qparams = maybe_quantize_node_info inputs = [in_node] for key, value in qparams.items(): @@ -464,6 +472,74 @@ def all_node_args_have_no_tensors(node: Node, modules: Dict[str, torch.nn.Module cache[node] = result return result +def all_node_args_except_first(node: Node) -> List[int]: + """ + Returns all node arg indices after first + """ + return list(range(1, len(node.args))) + +def 
return_arg_list(arg_indices: List[int]) -> Callable[[Node], List[int]]: + """ + Constructs a function that takes a node as arg and returns the arg_indices + that are valid for node.args + """ + def arg_indices_func(node: Node) -> List[int]: + return [i for i in arg_indices if i < len(node.args)] + return arg_indices_func + +NodeInfo = namedtuple("NodeInfo", "op target") + +# this dict identifies which indices of a node are non tensors +# so that they can be propagated correctly since inserting observers +# for them would cause errors + +NON_OBSERVABLE_ARG_DICT: Dict[NodeInfo, Dict[Union[type, torch.dtype], Callable[[Node], List[int]]]] = { + NodeInfo("call_method", "masked_fill") : { + torch.bool: return_arg_list([1]), + float: return_arg_list([2]) + }, + NodeInfo("call_method", "permute") : { + int: all_node_args_except_first + }, + NodeInfo("call_method", "repeat") : { + int: all_node_args_except_first + }, + NodeInfo("call_method", "reshape") : { + int: all_node_args_except_first + }, + NodeInfo("call_method", "size") : { + int: return_arg_list([1]) + }, + NodeInfo("call_method", "transpose") : { + int: all_node_args_except_first + }, + NodeInfo("call_method", torch.transpose) : { + int: all_node_args_except_first + }, + NodeInfo("call_method", "unsqueeze") : { + int: return_arg_list([1]) + }, + NodeInfo("call_method", "unsqueeze_") : { + int: return_arg_list([1]) + }, + NodeInfo("call_method", torch.unsqueeze) : { + int: return_arg_list([1]) + }, + NodeInfo("call_method", "view") : { + int: all_node_args_except_first + }, +} + +EMPTY_ARG_DICT: Dict[Union[type, torch.dtype], Callable[[Node], List[int]]] = {} + +def get_non_observable_arg_indexes_and_types(node: Node) -> Dict[Union[type, torch.dtype], Callable[[Node], List[int]]]: + """ + Returns a dict with of non float tensor types as keys and values which correspond to a + function to retrieve the list (which takes the node as an argument) + """ + info = NodeInfo(node.op, node.target) + + return NON_OBSERVABLE_ARG_DICT.get(info, EMPTY_ARG_DICT) def node_return_type_is_int(node: Node) -> bool: """ @@ -472,13 +548,6 @@ def node_return_type_is_int(node: Node) -> bool: """ return node.op == 'call_method' and node.target == 'size' -def node_bool_tensor_arg_indexes(node: Node) -> List[int]: - """ - Returns indexes of boolean Tensor args - """ - if node.op == "call_method" and node.target == "masked_fill": - return [1] - return [] def is_get_tensor_info_node(node: Node) -> bool: """ Returns True if this node is a node that takes a Tensor as input and output some diff --git a/torch/ao/quantization/observer.py b/torch/ao/quantization/observer.py index 73f911a68f7b71..1bdc603213aa2c 100644 --- a/torch/ao/quantization/observer.py +++ b/torch/ao/quantization/observer.py @@ -128,6 +128,7 @@ class _ObserverBase(ObserverBase): This is sometimes required to avoid instruction overflow. quant_min: Minimum quantization value. If unspecified, it will follow the 8-bit setup. quant_max: Maximum quantization value. If unspecified, it will follow the 8-bit setup. + eps: Epsilon value for float32, Defaults to `torch.finfo(torch.float32).eps`. .. 
warning:: @@ -169,6 +170,7 @@ def __init__( quant_min=None, quant_max=None, factory_kwargs=None, + eps=torch.finfo(torch.float32).eps, ) -> None: factory_kwargs = torch.nn.factory_kwargs(factory_kwargs) super(_ObserverBase, self).__init__(dtype=dtype) @@ -180,7 +182,7 @@ def __init__( ) self.reduce_range = reduce_range self.register_buffer( - "eps", torch.tensor([torch.finfo(torch.float32).eps], **factory_kwargs) + "eps", torch.tensor([eps], **factory_kwargs) ) assert self.qscheme in ( torch.per_tensor_affine, @@ -346,8 +348,7 @@ class MinMaxObserver(_ObserverBase): reduce_range: Reduces the range of the quantized data type by 1 bit quant_min: Minimum quantization value. If unspecified, it will follow the 8-bit setup. quant_max: Maximum quantization value. If unspecified, it will follow the 8-bit setup. - memoryless: Boolean that controls whether observer removes old data when a new input is seen. - This is most useful for simulating dynamic quantization, especially during QAT. + eps: Epsilon value for float32, Defaults to `torch.finfo(torch.float32).eps`. Given running min/max as :math:`x_\text{min}` and :math:`x_\text{max}`, scale :math:`s` and zero point :math:`z` are computed as: @@ -406,7 +407,7 @@ def __init__( quant_min=None, quant_max=None, factory_kwargs=None, - memoryless=False, + eps=torch.finfo(torch.float32).eps, ) -> None: # For x86 quantized kernels, we need to ensure that the vpmaddubsw @@ -422,8 +423,8 @@ def __init__( quant_min=quant_min, quant_max=quant_max, factory_kwargs=factory_kwargs, + eps=eps, ) - self.memoryless = memoryless factory_kwargs = torch.nn.factory_kwargs(factory_kwargs) self.register_buffer("min_val", torch.tensor(float("inf"), **factory_kwargs)) self.register_buffer("max_val", torch.tensor(float("-inf"), **factory_kwargs)) @@ -441,8 +442,6 @@ def forward(self, x_orig): r"""Records the running minimum and maximum of ``x``.""" if x_orig.numel() == 0: return x_orig - elif self.memoryless: - self.reset_min_max_vals() x = x_orig.detach() # avoid keeping autograd tape x = x.to(self.min_val.dtype) min_val_cur, max_val_cur = torch.aminmax(x) @@ -483,6 +482,7 @@ class MovingAverageMinMaxObserver(MinMaxObserver): reduce_range: Reduces the range of the quantized data type by 1 bit quant_min: Minimum quantization value. If unspecified, it will follow the 8-bit setup. quant_max: Maximum quantization value. If unspecified, it will follow the 8-bit setup. + eps: Epsilon value for float32, Defaults to `torch.finfo(torch.float32).eps`. The moving average min/max is computed as follows @@ -519,6 +519,7 @@ def __init__( reduce_range=False, quant_min=None, quant_max=None, + eps=torch.finfo(torch.float32).eps, **kwargs ) -> None: self.averaging_constant = averaging_constant @@ -528,6 +529,7 @@ def __init__( reduce_range=reduce_range, quant_min=quant_min, quant_max=quant_max, + eps=eps, **kwargs ) @@ -565,8 +567,7 @@ class PerChannelMinMaxObserver(_ObserverBase): reduce_range: Reduces the range of the quantized data type by 1 bit quant_min: Minimum quantization value. If unspecified, it will follow the 8-bit setup. quant_max: Maximum quantization value. If unspecified, it will follow the 8-bit setup. - memoryless: Boolean that controls whether observer removes old data when a new input is seen. - This is most useful for simulating dynamic quantization, especially during QAT. + eps: Epsilon value for float32, Defaults to `torch.finfo(torch.float32).eps`. 
The quantization parameters are computed the same way as in :class:`~torch.ao.quantization.observer.MinMaxObserver`, with the difference @@ -588,7 +589,7 @@ def __init__( quant_min=None, quant_max=None, factory_kwargs=None, - memoryless=False, + eps=torch.finfo(torch.float32).eps, ) -> None: super(PerChannelMinMaxObserver, self).__init__( dtype=dtype, @@ -597,8 +598,8 @@ def __init__( quant_min=quant_min, quant_max=quant_max, factory_kwargs=factory_kwargs, + eps=eps, ) - self.memoryless = memoryless factory_kwargs = torch.nn.factory_kwargs(factory_kwargs) self.ch_axis = ch_axis self.register_buffer("min_val", torch.tensor([], **factory_kwargs)) @@ -631,7 +632,7 @@ def _forward(self, x_orig): # are done in place and types need to match for comparisons y = y.to(self.min_val.dtype) y = torch.flatten(y, start_dim=1) - if min_val.numel() == 0 or max_val.numel() == 0 or self.memoryless: + if min_val.numel() == 0 or max_val.numel() == 0: min_val, max_val = torch.aminmax(y, dim=1) else: min_val_cur, max_val_cur = torch.aminmax(y, dim=1) @@ -751,6 +752,7 @@ class MovingAveragePerChannelMinMaxObserver(PerChannelMinMaxObserver): reduce_range: Reduces the range of the quantized data type by 1 bit quant_min: Minimum quantization value. If unspecified, it will follow the 8-bit setup. quant_max: Maximum quantization value. If unspecified, it will follow the 8-bit setup. + eps: Epsilon value for float32, Defaults to `torch.finfo(torch.float32).eps`. The quantization parameters are computed the same way as in :class:`~torch.ao.quantization.observer.MovingAverageMinMaxObserver`, with the @@ -770,6 +772,7 @@ def __init__( reduce_range=False, quant_min=None, quant_max=None, + eps=torch.finfo(torch.float32).eps, **kwargs ) -> None: super(MovingAveragePerChannelMinMaxObserver, self).__init__( @@ -779,6 +782,7 @@ def __init__( reduce_range=reduce_range, quant_min=quant_min, quant_max=quant_max, + eps=eps, **kwargs ) self.averaging_constant = averaging_constant @@ -822,6 +826,7 @@ class HistogramObserver(_ObserverBase): dtype: Quantized data type qscheme: Quantization scheme to be used reduce_range: Reduces the range of the quantized data type by 1 bit + eps: Epsilon value for float32, Defaults to `torch.finfo(torch.float32).eps`. The scale and zero point are computed as follows: @@ -848,6 +853,7 @@ def __init__( quant_min=None, quant_max=None, factory_kwargs=None, + eps=torch.finfo(torch.float32).eps, ) -> None: # bins: The number of bins used for histogram calculation. super(HistogramObserver, self).__init__( @@ -857,6 +863,7 @@ def __init__( quant_min=quant_min, quant_max=quant_max, factory_kwargs=factory_kwargs, + eps=eps, ) factory_kwargs = torch.nn.factory_kwargs(factory_kwargs) self.bins = bins @@ -1435,6 +1442,13 @@ def load_observer_state_dict(mod, obs_dict): Default weight observer. """ +weight_observer_range_neg_127_to_127 = MinMaxObserver.with_args( + dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, + quant_min=-127, quant_max=127, eps=2 ** -12) +""" +Symmetric weight observer with the 8-bit values restricted to [-127, +127], excluding -128. +""" + default_histogram_observer = HistogramObserver.with_args(quant_min=0, quant_max=127) """ Default histogram observer, usually used for PTQ. @@ -1448,6 +1462,13 @@ def load_observer_state_dict(mod, obs_dict): weight quantization is supported, such as `fbgemm`. 
""" +per_channel_weight_observer_range_neg_127_to_127 = MinMaxObserver.with_args( + dtype=torch.qint8, qscheme=torch.per_channel_symmetric, + quant_min=-127, quant_max=127, eps=2 ** -12) +""" +Per-channel, symmetric weight observer with the 8-bit values restricted to [-127, +127], excluding -128. +""" + default_dynamic_quant_observer = PlaceholderObserver.with_args( dtype=torch.float, compute_dtype=torch.quint8 ) diff --git a/torch/ao/quantization/qconfig.py b/torch/ao/quantization/qconfig.py index c35739ab9b82ed..94e9646d84522a 100644 --- a/torch/ao/quantization/qconfig.py +++ b/torch/ao/quantization/qconfig.py @@ -16,6 +16,8 @@ default_fused_per_channel_wt_fake_quant, default_embedding_fake_quant, default_embedding_fake_quant_4bit, + fused_wt_fake_quant_range_neg_127_to_127, + fused_per_channel_wt_fake_quant_range_neg_127_to_127, ) from .observer import ( @@ -32,6 +34,8 @@ default_per_channel_weight_observer, default_placeholder_observer, default_weight_observer, + weight_observer_range_neg_127_to_127, + per_channel_weight_observer_range_neg_127_to_127, default_reuse_input_observer, ) import warnings @@ -113,7 +117,7 @@ def __new__(cls, activation=torch.nn.Identity, weight=torch.nn.Identity): Default dynamic qconfig. """ -float16_dynamic_qconfig = QConfig(activation=PlaceholderObserver.with_args(dtype=torch.float32), +float16_dynamic_qconfig = QConfig(activation=PlaceholderObserver.with_args(dtype=torch.float32, compute_dtype=torch.float16), weight=PlaceholderObserver.with_args(dtype=torch.float16)) """ Dynamic qconfig with weights quantized to `torch.float16`. @@ -184,8 +188,8 @@ def get_default_qconfig(backend='fbgemm', version=0): Returns the default PTQ qconfig for the specified backend. Args: - * `backend`: a string representing the target backend. Currently supports `fbgemm` - and `qnnpack`. + * `backend`: a string representing the target backend. Currently supports `fbgemm`, + `qnnpack` and `onednn`. Return: qconfig @@ -197,6 +201,9 @@ def get_default_qconfig(backend='fbgemm', version=0): elif backend == 'qnnpack': qconfig = QConfig(activation=HistogramObserver.with_args(reduce_range=False), weight=default_weight_observer) + elif backend == 'onednn': + qconfig = QConfig(activation=HistogramObserver.with_args(reduce_range=False), + weight=default_per_channel_weight_observer) else: qconfig = default_qconfig else: @@ -205,6 +212,42 @@ def get_default_qconfig(backend='fbgemm', version=0): return qconfig +""" +Default, symmetric PTQ qconfig for the specified backend. And a per_channel +variant of the same. + +Symmetric here applies to signed weights with zero point = 0, and additional +value restrictions. The activations are also signed 8-bit integers with this +qconfig. + + * Once this change is merged [as of 3/17/22], with backend or qengine = + 'qnnpack', some quantized operators with this symmetric qconfig may use + operators from xnnpack library. + + ** Support to use xnnpack ops with `qnnpack` backed for asymmetric + qconfig (returned by get_default_qconfig()) is not available yet. + + * This qconfig uses signed activations and weights. Weights have added + restrictions such as zero point is forced to be 0, making the weights + symmetric, hence the name. And the 8-bit quantized values are + restricting to to [-127, +127], excluding -128. + + * xnnpack has a requantization scale value restriction, 0x1p-32 <= + requantization_scale < 256.0 where, `requantization_scale = (input_scale + * kernel_scale) / (output_scale)`. 
Using this eps (w/ assumed max value + of 256) is to prevent requantization_scale to go below xnnpack lower + threshold. +""" +default_symmetric_qnnpack_qconfig = QConfig(activation=HistogramObserver.with_args(dtype=torch.qint8, + reduce_range=False, + eps=2 ** -12), + weight=weight_observer_range_neg_127_to_127) + +default_per_channel_symmetric_qnnpack_qconfig = QConfig(activation=HistogramObserver.with_args(dtype=torch.qint8, + reduce_range=False, + eps=2 ** -12), + weight=per_channel_weight_observer_range_neg_127_to_127) + default_embedding_qat_qconfig = QConfig(activation=NoopObserver.with_args(dtype=torch.float32), weight=default_embedding_fake_quant) @@ -216,8 +259,8 @@ def get_default_qat_qconfig(backend='fbgemm', version=1): Returns the default QAT qconfig for the specified backend. Args: - * `backend`: a string representing the target backend. Currently supports `fbgemm` - and `qnnpack`. + * `backend`: a string representing the target backend. Currently supports `fbgemm`, + `qnnpack` and `onednn`. * `version`: version, for backwards compatibility. Can be `None` or `1`. Return: @@ -237,6 +280,11 @@ def get_default_qat_qconfig(backend='fbgemm', version=1): quant_max=255, reduce_range=False), weight=default_weight_fake_quant) + elif backend == 'onednn': + qconfig = QConfig(activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver, + quant_min=0, + quant_max=255), + weight=default_per_channel_weight_fake_quant) else: qconfig = default_qat_qconfig # Use the fused observe + fake_quant modules for doing QAT. @@ -253,6 +301,11 @@ def get_default_qat_qconfig(backend='fbgemm', version=1): quant_max=255, reduce_range=False), weight=default_fused_wt_fake_quant) + elif backend == 'onednn': + qconfig = QConfig(activation=FusedMovingAvgObsFakeQuantize.with_args(observer=MovingAverageMinMaxObserver, + quant_min=0, + quant_max=255), + weight=default_fused_per_channel_wt_fake_quant) else: qconfig = default_qat_qconfig_v2 else: @@ -261,6 +314,27 @@ def get_default_qat_qconfig(backend='fbgemm', version=1): return qconfig +""" +Default symmetric QAT qconfig for qnnpack. And its per channel weight variant. +""" +default_symmetric_qnnpack_qat_qconfig = QConfig( + activation=FusedMovingAvgObsFakeQuantize.with_args(observer=MovingAverageMinMaxObserver, + quant_min=-128, + quant_max=127, + dtype=torch.qint8, + reduce_range=False, + eps=2 ** -12), + weight=fused_wt_fake_quant_range_neg_127_to_127) + +default_per_channel_symmetric_qnnpack_qat_qconfig = QConfig( + activation=FusedMovingAvgObsFakeQuantize.with_args(observer=MovingAverageMinMaxObserver, + quant_min=-128, + quant_max=127, + dtype=torch.qint8, + reduce_range=False, + eps=2 ** -12), + weight=fused_per_channel_wt_fake_quant_range_neg_127_to_127) + def _get_default_qconfig_dict_helper(qconfig, qconfig_transpose): return { "": qconfig, @@ -404,9 +478,10 @@ def partial_equals(p1, p2): def activation_is_memoryless(qconfig: QConfig): """ Return whether the observer for activations defined in the given QConfig is memoryless. + This means a MovingAverage observer with averaging constant equal to 1. 
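A quick numeric check of the xnnpack constraint described above, under the stated assumptions that the input and kernel scales are floored at eps = 2 ** -12 and the output scale is at most 256:

.. code-block:: python

    eps = 2 ** -12
    # Worst case under the assumptions above: smallest allowed input/kernel
    # scales divided by the largest assumed output scale.
    requantization_scale = (eps * eps) / 256.0
    assert requantization_scale == 2 ** -32      # exactly the 0x1p-32 lower bound
    assert 2 ** -32 <= requantization_scale < 256.0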
""" def _is_memoryless(observer): - return hasattr(observer, "memoryless") and observer.memoryless + return hasattr(observer, "averaging_constant") and observer.averaging_constant == 1 act = qconfig.activation() if isinstance(act, FakeQuantizeBase) and hasattr(act, "activation_post_process"): return _is_memoryless(act.activation_post_process) diff --git a/torch/ao/quantization/quantization_mappings.py b/torch/ao/quantization/quantization_mappings.py index d561f42ad44722..88016f06cda057 100644 --- a/torch/ao/quantization/quantization_mappings.py +++ b/torch/ao/quantization/quantization_mappings.py @@ -23,9 +23,12 @@ default_symmetric_fixed_qparams_fake_quant, ) from torch.ao.quantization.utils import get_combined_dict +from torch.nn.utils.parametrize import type_before_parametrizations # Default map for swapping float module to reference quantized modules DEFAULT_REFERENCE_STATIC_QUANT_MODULE_MAPPINGS : Dict[Callable, Any] = { + QuantStub: nnq.Quantize, + DeQuantStub: nnq.DeQuantize, nn.Linear: nnqr.Linear, nn.Conv1d: nnqr.Conv1d, nn.Conv2d: nnqr.Conv2d, @@ -33,6 +36,12 @@ nn.ConvTranspose1d: nnqr.ConvTranspose1d, nn.ConvTranspose2d: nnqr.ConvTranspose2d, nn.ConvTranspose3d: nnqr.ConvTranspose3d, + nn.Embedding: nnqr.Embedding, + nn.EmbeddingBag: nnqr.EmbeddingBag, + nn.GRUCell: nnqr.GRUCell, + nn.LSTMCell: nnqr.LSTMCell, + nn.RNNCell: nnqr.RNNCell, + nn.LSTM: nnqr.LSTM, } # Default map for swapping float module to quantized ones @@ -175,6 +184,11 @@ def get_default_static_quant_module_mappings() -> Dict[Callable, Any]: ''' return copy.deepcopy(DEFAULT_STATIC_QUANT_MODULE_MAPPINGS) +def get_default_static_quant_reference_module_mappings() -> Dict[Callable, Any]: + ''' Get reference module mapping for post training static quantization + ''' + return copy.deepcopy(DEFAULT_REFERENCE_STATIC_QUANT_MODULE_MAPPINGS) + def get_embedding_static_quant_module_mappings() -> Dict[Callable, Any]: ''' Get module mapping, including mapping for embedding QAT ''' @@ -293,7 +307,7 @@ def _get_special_act_post_process(module: torch.nn.Module) -> Optional[Callable] input: torch.nn.Sigmoid output: default_affine_fixed_qparam_fake_quant """ - return DEFAULT_MODULE_TO_ACT_POST_PROCESS.get(type(module), None) + return DEFAULT_MODULE_TO_ACT_POST_PROCESS.get(type_before_parametrizations(module), None) def _has_special_act_post_process(module: torch.nn.Module) -> bool: return module.training and type(module) in DEFAULT_MODULE_TO_ACT_POST_PROCESS diff --git a/torch/ao/quantization/quantize.py b/torch/ao/quantization/quantize.py index fad2b8abe6eabc..f5aa195c94dd9e 100644 --- a/torch/ao/quantization/quantize.py +++ b/torch/ao/quantization/quantize.py @@ -10,13 +10,14 @@ from torch.ao.quantization.quantization_mappings import ( get_default_dynamic_quant_module_mappings, get_default_static_quant_module_mappings, + get_default_static_quant_reference_module_mappings, get_default_qat_module_mappings, get_default_qconfig_propagation_list, no_observer_set, _has_special_act_post_process, _get_special_act_post_process, ) -from .utils import get_qparam_dict +from .utils import get_qparam_dict, has_no_children_ignoring_parametrizations from torch.ao.quantization.stubs import DeQuantStub, QuantWrapper from torch.ao.quantization.qconfig import ( add_module_to_qconfig_obs_ctr, @@ -25,6 +26,7 @@ float_qparams_weight_only_qconfig, float_qparams_weight_only_qconfig_4bit, activation_is_memoryless) +from torch.nn.utils.parametrize import type_before_parametrizations def is_activation_post_process(module): return (isinstance(module, 
torch.ao.quantization.ObserverBase) or @@ -32,7 +34,7 @@ def is_activation_post_process(module): def _propagate_qconfig_helper(module, qconfig_dict, - qconfig_parent=None, prefix=''): + qconfig_parent=None, prefix='', prepare_custom_config_dict=None): r"""This is a helper function for `propagate_qconfig_` Args: @@ -44,12 +46,14 @@ def _propagate_qconfig_helper(module, qconfig_dict, module prefix: corresponding prefix of the current module, used as key in qconfig_dict + prepare_custom_config_dict: dictionary for custom handling of modules + see docs for :func:`~torch.ao.quantization.prepare_fx` Return: None, module is modified inplace with qconfig attached """ - module_qconfig = qconfig_dict.get(type(module), qconfig_parent) + module_qconfig = qconfig_dict.get(type_before_parametrizations(module), qconfig_parent) module_qconfig = qconfig_dict.get(prefix, module_qconfig) module_qconfig = getattr(module, 'qconfig', module_qconfig) @@ -60,10 +64,16 @@ def _propagate_qconfig_helper(module, qconfig_dict, for name, child in module.named_children(): module_prefix = prefix + '.' + name if prefix else name - _propagate_qconfig_helper(child, qconfig_dict, - qconfig_with_device_check, module_prefix) + # do no not propagate qconfig to child if child is non traceable + if prepare_custom_config_dict is None or not ( + name in prepare_custom_config_dict.get("non_traceable_module_name", []) + or type(child) in prepare_custom_config_dict.get("non_traceable_module_class", []) + ): + _propagate_qconfig_helper( + child, qconfig_dict, qconfig_with_device_check, module_prefix + ) -def propagate_qconfig_(module, qconfig_dict=None): +def propagate_qconfig_(module, qconfig_dict=None, prepare_custom_config_dict=None): r"""Propagate qconfig through the module hierarchy and assign `qconfig` attribute on each leaf module @@ -73,13 +83,17 @@ def propagate_qconfig_(module, qconfig_dict=None): quantization configuration, qconfig applies to all submodules of a given module unless qconfig for the submodules are specified (when the submodule already has qconfig attribute) + prepare_custom_config_dict: dictionary for custom handling of modules + see docs for :func:`~torch.ao.quantization.prepare_fx` Return: None, module is modified inplace with qconfig attached """ if qconfig_dict is None: qconfig_dict = {} - _propagate_qconfig_helper(module, qconfig_dict) + if prepare_custom_config_dict is None: + prepare_custom_config_dict = {} + _propagate_qconfig_helper(module, qconfig_dict, prepare_custom_config_dict=prepare_custom_config_dict) def _observer_forward_hook(self, input, output): r"""Forward hook that calls observer on the output @@ -157,9 +171,9 @@ def insert_activation_post_process(m, special_act_post_process=None): for name, child in module.named_children(): # TODO remove Dropout special after codebase stable - if type(child) in [nn.Dropout]: + if type_before_parametrizations(child) in [nn.Dropout]: continue - elif type(child) in [nnq.FloatFunctional, nnq.QFunctional]: + elif type_before_parametrizations(child) in [nnq.FloatFunctional, nnq.QFunctional]: if needs_observation(child): child.activation_post_process = get_activation_post_process(child.qconfig, device) elif isinstance(child, _FusedModule): @@ -169,23 +183,23 @@ def insert_activation_post_process(m, special_act_post_process=None): elif _has_special_act_post_process(child): special_act_post_process = _get_special_act_post_process(child) insert_activation_post_process(child, special_act_post_process) - elif non_leaf_module_list is not None and type(child) in 
non_leaf_module_list: + elif non_leaf_module_list is not None and type_before_parametrizations(child) in non_leaf_module_list: if needs_observation(child): insert_activation_post_process(child) - elif needs_observation(child) and type(child) in custom_module_class_mapping: - observed_child = custom_module_class_mapping[type(child)].from_float(child) + elif needs_observation(child) and type_before_parametrizations(child) in custom_module_class_mapping: + observed_child = custom_module_class_mapping[type_before_parametrizations(child)].from_float(child) setattr(module, name, observed_child) # TODO: These are the modules that cannot be observed # Once there are more, we should move them to a separate list - if custom_module_class_mapping[type(child)] not in no_observer_set(): + if custom_module_class_mapping[type_before_parametrizations(child)] not in no_observer_set(): insert_activation_post_process(observed_child) else: add_observer_(child, qconfig_propagation_list, non_leaf_module_list, device, custom_module_class_mapping) # Insert observers only for leaf nodes, note that this observer is for # the output of the module, for input QuantStub will observe them - if len(module._modules) == 0 and not isinstance(module, torch.nn.Sequential) \ - and type(module) in qconfig_propagation_list: + if has_no_children_ignoring_parametrizations(module) and not isinstance(module, torch.nn.Sequential) \ + and type_before_parametrizations(module) in qconfig_propagation_list: insert_activation_post_process(module) def get_unique_devices_(module): @@ -207,7 +221,7 @@ def add_quant_dequant(module): wraps the input module, the latter case only happens when the input module is a leaf module and we want to quantize it. """ - if len(module._modules) == 0 and hasattr(module, 'qconfig') and module.qconfig: + if has_no_children_ignoring_parametrizations(module) and hasattr(module, 'qconfig') and module.qconfig: return QuantWrapper(module) for name, child in module.named_children(): @@ -472,7 +486,7 @@ def quantize_qat(model, run_fn, run_args, inplace=False): def convert( module, mapping=None, inplace=False, remove_qconfig=True, - convert_custom_config_dict=None): + is_reference=False, convert_custom_config_dict=None): r"""Converts submodules in input module to a different module according to `mapping` by calling `from_float` method on the target module class. And remove qconfig at the end if remove_qconfig is set to True. 
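A hedged eager-mode sketch of the new ``is_reference`` path added to ``convert`` above; the toy module and calibration data are made up for illustration:

.. code-block:: python

    import torch
    import torch.ao.quantization as tq

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = tq.QuantStub()
            self.linear = torch.nn.Linear(4, 4)
            self.dequant = tq.DeQuantStub()

        def forward(self, x):
            return self.dequant(self.linear(self.quant(x)))

    m = M().eval()
    m.qconfig = tq.get_default_qconfig("fbgemm")
    prepared = tq.prepare(m)
    prepared(torch.randn(2, 4))            # calibration pass
    # With is_reference=True, reference quantized modules (e.g. nnqr.Linear)
    # are swapped in instead of the default quantized kernels.
    reference_model = tq.convert(prepared, is_reference=True)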
@@ -503,7 +517,7 @@ def convert( if not inplace: module = copy.deepcopy(module) _convert( - module, mapping, inplace=True, + module, mapping, inplace=True, is_reference=is_reference, convert_custom_config_dict=convert_custom_config_dict) if remove_qconfig: _remove_qconfig(module) @@ -511,7 +525,7 @@ def convert( def _convert( module, mapping=None, inplace=False, - convert_custom_config_dict=None): + is_reference=False, convert_custom_config_dict=None): r"""Converts submodules in input module to a different module according to `mapping` by calling `from_float` method on the target module class @@ -522,10 +536,12 @@ def _convert( Modules inplace: carry out model transformations in-place, the original module is mutated + is_reference: a flag to enable quantized reference module """ if mapping is None: - mapping = get_default_static_quant_module_mappings() + mapping = get_default_static_quant_reference_module_mappings() if is_reference \ + else get_default_static_quant_module_mappings() if convert_custom_config_dict is None: convert_custom_config_dict = {} custom_module_class_mapping = convert_custom_config_dict.get("observed_to_quantized_custom_module_class", {}) @@ -537,9 +553,9 @@ def _convert( # both fused modules and observed custom modules are # swapped as one unit if not isinstance(mod, _FusedModule) and \ - type(mod) not in custom_module_class_mapping: + type_before_parametrizations(mod) not in custom_module_class_mapping: _convert(mod, mapping, True, # inplace - convert_custom_config_dict) + is_reference, convert_custom_config_dict) reassign[name] = swap_module(mod, mapping, custom_module_class_mapping) for key, value in reassign.items(): @@ -561,11 +577,11 @@ def swap_module(mod, mapping, custom_module_class_mapping): new_mod = mod if hasattr(mod, 'qconfig') and mod.qconfig is not None: swapped = False - if type(mod) in custom_module_class_mapping: - new_mod = custom_module_class_mapping[type(mod)].from_observed(mod) + if type_before_parametrizations(mod) in custom_module_class_mapping: + new_mod = custom_module_class_mapping[type_before_parametrizations(mod)].from_observed(mod) swapped = True - elif type(mod) in mapping: - qmod = mapping[type(mod)] + elif type_before_parametrizations(mod) in mapping: + qmod = mapping[type_before_parametrizations(mod)] if hasattr(qmod, '_IS_REFERENCE') and qmod._IS_REFERENCE: assert mod.qconfig is not None weight_post_process = mod.qconfig.weight() diff --git a/torch/ao/quantization/quantize_fx.py b/torch/ao/quantization/quantize_fx.py index 1eb71c1ca20d04..c5929304c5a1b2 100644 --- a/torch/ao/quantization/quantize_fx.py +++ b/torch/ao/quantization/quantize_fx.py @@ -6,7 +6,8 @@ from torch.fx.node import Target, Node, Argument from torch.nn.intrinsic import _FusedModule from .fx import fuse # noqa: F401 -from .fx import prepare, convert # noqa: F401 +from .fx import prepare # noqa: F401 +from .fx.convert import convert from .fx import get_tensorrt_backend_config_dict # noqa: F401 from .fx.graph_module import ObservedGraphModule from .fx.qconfig_utils import ( @@ -309,10 +310,6 @@ def fuse_fx( * `fuse_custom_config_dict`: Dictionary for custom configurations for fuse_fx, e.g.:: fuse_custom_config_dict = { - "additional_fuser_method_mapping": { - (Module1, Module2): fuse_module1_module2 - } - # Attributes that are not used in forward function will # be removed when constructing GraphModule, this is a list of attributes # to preserve as an attribute of the GraphModule even when they are @@ -328,7 +325,6 @@ def fuse_fx( """ 
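A minimal usage sketch for ``fuse_fx`` on a toy model (illustrative only):

.. code-block:: python

    import torch
    from torch.ao.quantization.quantize_fx import fuse_fx

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 3, 1),
        torch.nn.BatchNorm2d(3),
        torch.nn.ReLU(),
    ).eval()

    # Conv2d + BatchNorm2d + ReLU should appear as a single fused module
    # in the returned GraphModule.
    fused = fuse_fx(model)
    print(fused.graph)

The model is kept in eval mode so that the batch-norm statistics being folded into the convolution are frozen before fusion.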
torch._C._log_api_usage_once("quantization_api.quantize_fx.fuse_fx") - assert not model.training, "fuse_fx only works on models in eval mode" check_is_valid_fuse_custom_config_dict(fuse_custom_config_dict) graph_module = torch.fx.symbolic_trace(model) preserved_attributes: Set[str] = set() @@ -439,27 +435,6 @@ def prepare_fx( NonTraceableModule ], - # Additional fuser_method mapping - "additional_fuser_method_mapping": { - (torch.nn.Conv2d, torch.nn.BatchNorm2d): fuse_conv_bn - }, - - # Additioanl module mapping for qat - "additional_qat_module_mapping": { - torch.nn.intrinsic.ConvBn2d: torch.nn.qat.ConvBn2d - }, - - # Additional fusion patterns - "additional_fusion_pattern": { - (torch.nn.BatchNorm2d, torch.nn.Conv2d): ConvReluFusionhandler - }, - - # Additional quantization patterns - "additional_quant_pattern": { - torch.nn.Conv2d: ConvReluQuantizeHandler, - (torch.nn.ReLU, torch.nn.Conv2d): ConvReluQuantizeHandler, - } - # By default, inputs and outputs of the graph are assumed to be in # fp32. Providing `input_quantized_idxs` will set the inputs with the # corresponding indices to be quantized. Providing @@ -511,7 +486,6 @@ def calibrate(model, data_loader): """ torch._C._log_api_usage_once("quantization_api.quantize_fx.prepare_fx") - assert not model.training, "prepare_fx only works for models in " + "eval mode" return _prepare_fx( model, qconfig_dict, @@ -560,7 +534,6 @@ def train_loop(model, train_data): """ torch._C._log_api_usage_once("quantization_api.quantize_fx.prepare_qat_fx") - assert model.training, "prepare_qat_fx only works for models in " + "train mode" return _prepare_fx( model, qconfig_dict, @@ -577,6 +550,7 @@ def _convert_fx( is_standalone_module: bool = False, _remove_qconfig: bool = True, qconfig_dict: Dict[str, Any] = None, + backend_config_dict: Dict[str, Any] = None, ) -> torch.nn.Module: """ `is_standalone_module`: see docs in :func:`~torch.ao.quantization.prepare_standalone_module_fx` """ @@ -593,6 +567,7 @@ def _convert_fx( is_standalone_module, _remove_qconfig_flag=_remove_qconfig, convert_qconfig_dict=qconfig_dict, + backend_config_dict=backend_config_dict, ) preserved_attributes = convert_custom_config_dict.get("preserved_attributes", []) @@ -607,6 +582,7 @@ def convert_fx( convert_custom_config_dict: Optional[Dict[str, Any]] = None, _remove_qconfig: bool = True, qconfig_dict: Dict[str, Any] = None, + backend_config_dict: Dict[str, Any] = None, ) -> torch.nn.Module: r""" Convert a calibrated or trained model to a quantized model @@ -618,20 +594,6 @@ def convert_fx( * `convert_custom_config_dict`: dictionary for custom configurations for convert function:: convert_custom_config_dict = { - - # additional object (module/operator) mappings that will overwrite the default - # module mappinng - "additional_object_mapping": { - "static": { - FloatModule: QuantizedModule, - float_op: quantized_op - }, - "dynamic": { - FloatModule: DynamicallyQuantizedModule, - float_op: dynamically_quantized_op - }, - }, - # user will manually define the corresponding quantized # module class which has a from_observed class method that converts # observed custom module to quantized custom module @@ -677,6 +639,11 @@ def convert_fx( ], } + * `backend_config_dict`: A configuration for the backend which describes how + operators should be quantized in the backend, this includes quantization + mode support (static/dynamic/weight_only), dtype support (quint8/qint8 etc.), + observer placement for each operators and fused operators. 
Detailed + documentation can be found in torch/ao/quantization/fx/backend_config/README.md Return: A quantized model (GraphModule) @@ -694,6 +661,7 @@ def convert_fx( convert_custom_config_dict, _remove_qconfig=_remove_qconfig, qconfig_dict=qconfig_dict, + backend_config_dict=backend_config_dict, ) diff --git a/torch/ao/quantization/utils.py b/torch/ao/quantization/utils.py index 0533119703bcb1..f42b5c1ce723f0 100644 --- a/torch/ao/quantization/utils.py +++ b/torch/ao/quantization/utils.py @@ -6,6 +6,7 @@ import torch from torch.ao.quantization.quant_type import QuantType, quant_type_to_str from typing import Tuple, Any, Union, Callable +from torch.nn.utils.parametrize import is_parametrized # Type for fusion patterns, it can be more complicated than the following actually, # see pattern.md for docs @@ -184,6 +185,16 @@ def activation_is_statically_quantized(qconfig): """ return activation_dtype(qconfig) in [torch.quint8, torch.qint8, torch.float16] +def activation_is_dynamically_quantized(qconfig): + """ Given a qconfig, decide if the activation needs to be + dynamically quantized or not, this includes dynamically quantizing to + quint8, qint8 and float16 + """ + activation_dtype, _, activation_compute_dtype = \ + get_qconfig_dtypes(qconfig) + return activation_dtype == torch.float and \ + activation_compute_dtype in [torch.quint8, torch.qint8, torch.float16] + def activation_is_int8_quantized(qconfig): """ Given a qconfig, decide if the activation needs to be quantized to int8 or not, this includes quantizing to quint8, qint8 @@ -200,7 +211,7 @@ def weight_is_quantized(qconfig): """ Given a qconfig, decide if the weight needs to be quantized or not """ - return weight_dtype(qconfig) in [torch.quint8, torch.qint8, torch.float16] + return weight_dtype(qconfig) in [torch.quint8, torch.qint8, torch.float16, torch.quint4x2] def weight_is_statically_quantized(qconfig): """ Given a qconfig, decide if the weight needs to be statically @@ -235,7 +246,7 @@ def get_quant_type(qconfig): assert qconfig is not None activation = qconfig.activation() weight = qconfig.weight() - static_dtypes = [torch.quint8, torch.qint8] + static_dtypes = [torch.quint8, torch.qint8, torch.quint4x2] if weight.dtype in static_dtypes: if activation.dtype in static_dtypes: return QuantType.STATIC @@ -289,6 +300,7 @@ def calculate_qmin_qmax(quant_min: int, quant_max: int, has_customized_qrange: b r"""Calculates actual qmin and qmax based on the quantization range, observer datatype and if range is reduced. """ + # TODO(jerryzh): Figure out why custom quant_min/quant_max are still adjusted. if has_customized_qrange: # This initialization here is to be resolve TorchScript compilation issues and allow # using of refinement to decouple initial_qmin and initial_qmax from quantization range. @@ -315,10 +327,6 @@ def calculate_qmin_qmax(quant_min: int, quant_max: int, has_customized_qrange: b assert ( 0 < qrange_len <= 2**31 ), "quantization range should be positive and not exceed the maximum bit range (=4294967296)." 
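A small check of the new ``activation_is_dynamically_quantized`` helper added above; the expected output assumes the default dynamic qconfig keeps activations in fp32 with a quint8 compute dtype, as shown earlier in this diff:

.. code-block:: python

    from torch.ao.quantization import default_dynamic_qconfig
    from torch.ao.quantization.utils import activation_is_dynamically_quantized

    # fp32 activation dtype + quint8 compute dtype -> dynamically quantized
    print(activation_is_dynamically_quantized(default_dynamic_qconfig))  # expected: True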
- if dtype == torch.qint8: - quant_min, quant_max = -qrange_len // 2, qrange_len // 2 - 1 - else: - quant_min, quant_max = 0, qrange_len - 1 if reduce_range: quant_min, quant_max = quant_min // 2, quant_max // 2 else: @@ -349,3 +357,16 @@ def _parent_name(target): return '', r[0] else: return r[0], r[1] + +def has_no_children_ignoring_parametrizations(module): + """ + Checks if module._modules is empty or + if module is a parametrization, checks that module._modules only has + the 'parametrizations' module + """ + if len(module._modules) == 0: + return True + elif is_parametrized(module): + return len(module._modules) == 1 and 'parametrizations' in module._modules + else: + return False diff --git a/torch/autograd/__init__.py b/torch/autograd/__init__.py index 28eb729ffcbae0..7c1188da10b47a 100644 --- a/torch/autograd/__init__.py +++ b/torch/autograd/__init__.py @@ -309,7 +309,7 @@ def variable(*args, **kwargs): _supported_activities, _add_metadata_json, SavedTensor, _push_saved_tensors_default_hooks, _pop_saved_tensors_default_hooks) -from torch._C._autograd import (_ProfilerResult, _KinetoEvent, +from torch._C._autograd import (_ProfilerResult, _KinetoEvent, _kineto_step, _prepare_profiler, _enable_profiler, _disable_profiler) from . import profiler diff --git a/torch/autograd/functional.py b/torch/autograd/functional.py index 6fe0b5ee09f354..d94407e30833c1 100644 --- a/torch/autograd/functional.py +++ b/torch/autograd/functional.py @@ -416,11 +416,12 @@ def _construct_standard_basis_for(tensors: Tuple[torch.Tensor, ...], tensor_nume assert len(tensors) == len(tensor_numels) assert len(tensors) > 0 total_numel = sum(tensor_numels) - diag_start_indices = (0, *torch.tensor(tensor_numels).cumsum(dim=0)[:-1].neg().unbind()) chunks = tuple(tensor.new_zeros(total_numel, tensor_numel) for tensor, tensor_numel in zip(tensors, tensor_numels)) - for chunk, diag_start_idx in zip(chunks, diag_start_indices): + diag_start_idx = 0 + for chunk, numel in zip(chunks, tensor_numels): chunk.diagonal(diag_start_idx).fill_(1) + diag_start_idx -= numel return chunks diff --git a/torch/autograd/grad_mode.py b/torch/autograd/grad_mode.py index c57a16f80d76be..331327e26737a7 100644 --- a/torch/autograd/grad_mode.py +++ b/torch/autograd/grad_mode.py @@ -111,7 +111,7 @@ class no_grad(_DecoratorContextManager): Example:: - >>> x = torch.tensor([1], requires_grad=True) + >>> x = torch.tensor([1.], requires_grad=True) >>> with torch.no_grad(): ... y = x * 2 >>> y.requires_grad @@ -206,7 +206,7 @@ class set_grad_enabled(_DecoratorContextManager): Example:: - >>> x = torch.tensor([1], requires_grad=True) + >>> x = torch.tensor([1.], requires_grad=True) >>> is_train = False >>> with torch.set_grad_enabled(is_train): ... y = x * 2 diff --git a/torch/autograd/gradcheck.py b/torch/autograd/gradcheck.py index fd6e7651999362..0ec2c2d1ef9066 100644 --- a/torch/autograd/gradcheck.py +++ b/torch/autograd/gradcheck.py @@ -504,7 +504,7 @@ def _stack_and_check_tensors(list_of_list_of_tensors, inputs, If the test - manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck with `nondet_tol=` as a keyword argument. -- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test +- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test to have `gradcheck_nondet_tol=`. 
- is a Module test (e.g., in common_nn.py), then modify the corresponding module_test entry to have `gradcheck_nondet_tol=` @@ -717,7 +717,7 @@ def _check_no_differentiable_outputs_fast(func, func_out, all_inputs, inputs_ind If the test - manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck with `check_batched_grad=False` as a keyword argument. -- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test +- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test to have `check_batched_grad=False` and/or `check_batched_gradgrad=False`. If you're modifying an existing operator that supports batched grad computation, @@ -743,7 +743,7 @@ def _check_no_differentiable_outputs_fast(func, func_out, all_inputs, inputs_ind If the test - manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck with `check_batched_forward_grad=False` as a keyword argument. -- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test +- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test to have `check_batched_forward_grad=False` """ @@ -1196,7 +1196,7 @@ def _adjusted_atol(atol, u, v): If the test - manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck with `fast_mode=False` as a keyword argument. -- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test +- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test to have `gradcheck_fast_mode=False` - is a Module test (e.g., in common_nn.py), then modify the corresponding module_test entry to have `gradcheck_fast_mode=False` diff --git a/torch/autograd/profiler.py b/torch/autograd/profiler.py index 91c8d40c0cd1c1..af410570d9071c 100644 --- a/torch/autograd/profiler.py +++ b/torch/autograd/profiler.py @@ -6,7 +6,7 @@ from torch.autograd import ( DeviceType, ProfilerActivity, ProfilerConfig, ProfilerState, kineto_available, _ProfilerResult, _disable_profiler, _enable_profiler, - _prepare_profiler, _supported_activities + _prepare_profiler, _supported_activities, _kineto_step, ) import torch import torch.cuda @@ -428,17 +428,20 @@ def __init__(self, name: str, args: Optional[str] = None): self.args: Optional[str] = args # Whether or not we should run record function's end callbacks when exiting. self.run_callbacks_on_exit: bool = True - # Stores underlying RecordFunction as a tensor. TODO: move to custom - # class (https://github.com/pytorch/pytorch/issues/35026). 
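For the ``_construct_standard_basis_for`` rewrite above, a standalone sketch showing how filling successive negative diagonals reproduces an identity matrix laid out across per-tensor chunks (sizes chosen arbitrarily):

.. code-block:: python

    import torch

    tensor_numels = (2, 3)
    total_numel = sum(tensor_numels)
    chunks = tuple(torch.zeros(total_numel, numel) for numel in tensor_numels)

    diag_start_idx = 0
    for chunk, numel in zip(chunks, tensor_numels):
        chunk.diagonal(diag_start_idx).fill_(1)
        diag_start_idx -= numel

    # Concatenated along dim=1, the chunks form the 5x5 identity matrix.
    assert torch.equal(torch.cat(chunks, dim=1), torch.eye(total_numel))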
- self.handle: torch.Tensor = torch.zeros(1) + # TODO: TorchScript ignores standard type annotation here + # self.record: Optional["torch.classes.profiler._RecordFunction"] = None + self.record = torch.jit.annotate(Optional["torch.classes.profiler._RecordFunction"], None) def __enter__(self): - self.handle = torch.ops.profiler._record_function_enter(self.name, self.args) + self.record = torch.ops.profiler._record_function_enter_new(self.name, self.args) return self def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any): if self.run_callbacks_on_exit: - torch.ops.profiler._record_function_exit(self.handle) + # Local variable is needed by TorchScript to refine Optional[T] to T + record = self.record + assert record is not None + torch.ops.profiler._record_function_exit(record) def _call_end_callbacks_on_future(self, fut: Future[Any]) -> Future[Any]: """ @@ -465,7 +468,11 @@ def _call_end_callbacks_on_future(self, fut: Future[Any]) -> Future[Any]: # We are scheduling to run this RecordFunction's end callbacks when the # passed in future completes, so don't run end callbacks on exit. self.run_callbacks_on_exit = False - profiled_future = torch.ops.profiler._call_end_callbacks_on_jit_fut(self.handle, fut) + + # Local variable is needed by TorchScript to refine Optional[T] to T + record = self.record + assert record is not None + profiled_future = torch.ops.profiler._call_end_callbacks_on_jit_fut(record, fut) return profiled_future @@ -664,3 +671,10 @@ def parse_nvprof_trace(path): functions.sort(key=lambda evt: evt.time_range.start) return functions + + +def kineto_step(): + """ Notify kineto so it is aware of iteration boundaries for asynchronous + trace requests. + """ + _kineto_step() diff --git a/torch/autograd/profiler_util.py b/torch/autograd/profiler_util.py index 6062c097b25319..dc505fbc210aac 100644 --- a/torch/autograd/profiler_util.py +++ b/torch/autograd/profiler_util.py @@ -642,6 +642,7 @@ def _filter_name(name): filtered_out_names = [ MEMORY_EVENT_NAME, # used only for the top-level memory events "profiler::_record_function_enter", + "profiler::_record_function_enter_new", "profiler::_record_function_exit", "aten::is_leaf", "aten::output_nr", diff --git a/torch/backends/_coreml/preprocess.py b/torch/backends/_coreml/preprocess.py index 7f27e60e5acb44..3884058cd0ecf0 100644 --- a/torch/backends/_coreml/preprocess.py +++ b/torch/backends/_coreml/preprocess.py @@ -1,7 +1,6 @@ import hashlib import json -from dataclasses import dataclass, astuple, field -from typing import Dict, Tuple, List +from typing import Dict, Tuple import coremltools as ct # type: ignore[import] import torch @@ -35,86 +34,56 @@ class CoreMLComputeUnit: ALL = "all" -@dataclass -class _TensorSpec: - shape: List[int] = field(default_factory=List[int]) - dtype: int = ScalarType.Float - - -def TensorSpec(*args, **kwargs): - """ - TensorSpec specifies the tensor information. The default dtype is float32 - Example: - ts = TensorSpec( - shape = [1, 3, 224, 224], - dtype = ScalarType.Float - ) - """ - return astuple(_TensorSpec(*args, **kwargs)) - - -@dataclass -class _CompileSpec: - inputs: Tuple[_TensorSpec] = () # type: ignore[assignment] - outputs: Tuple[_TensorSpec] = () # type: ignore[assignment] - backend: str = CoreMLComputeUnit.CPU - allow_low_precision: bool = True - - -def CompileSpec(*args, **kwargs): - """ - CompileSpec specifies the model information. 
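The ``record_function`` changes above are internal; user-facing usage stays the same. A short, hedged profiling sketch:

.. code-block:: python

    import torch
    from torch.profiler import profile, record_function

    with profile() as prof:
        with record_function("my_matmul"):
            torch.mm(torch.randn(64, 64), torch.randn(64, 64))

    # The labelled region shows up alongside the aten ops it wraps.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))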
- Example: - cs = CompileSpec( - inputs=( - TensorSpec( - shape=[1, 3, 224, 224], - ), - ), - outputs=( - TensorSpec( - shape=[1, 1000], - ), - ), - backend=CoreMLComputeUnit.CPU, - allow_low_precision=True, - ), - """ - return astuple(_CompileSpec(*args, **kwargs)) - - -def _convert_to_mil_type(spec: _TensorSpec, name: str): - ml_type = TensorType(shape=spec.shape, dtype=torch_to_mil_types[spec.dtype]) +def TensorSpec(shape, dtype=ScalarType.Float): + return (shape, dtype) + + +def CompileSpec(inputs, outputs, backend=CoreMLComputeUnit.CPU, allow_low_precision=True): + return (inputs, outputs, backend, allow_low_precision) + + +def _check_enumerated_shape(shape): + for s in shape: + if not isinstance(s, (list, tuple)): + return False + return True + + +def _convert_to_mil_type(shape, dtype, name: str): + mil_shape = shape + if _check_enumerated_shape(shape): + mil_shape = ct.EnumeratedShapes(shape) + ml_type = TensorType(shape=mil_shape, dtype=torch_to_mil_types[dtype]) ml_type.name = name return ml_type def preprocess(script_module: torch._C.ScriptObject, compile_spec: Dict[str, Tuple]): spec = compile_spec["forward"] - forward_spec = _CompileSpec(*spec) + input_specs, output_specs, backend, allow_low_precision = spec mil_inputs = [] inputs = [] - for index, input_spec in enumerate(forward_spec.inputs): - input_spec = _TensorSpec(*input_spec) # type: ignore[misc] + for index, input in enumerate(input_specs): + shape, dtype = input name = "input_" + str(index) - inputs.append([name, str(input_spec.dtype), str(input_spec.shape)]) - ml_type = _convert_to_mil_type(input_spec, name) + inputs.append([name, str(dtype), str(shape)]) + ml_type = _convert_to_mil_type(shape, dtype, name) mil_inputs.append(ml_type) model = torch.jit.RecursiveScriptModule._construct(script_module, lambda x: None) mlmodel = ct.convert(model, inputs=mil_inputs) spec = mlmodel.get_spec() - output_specs = forward_spec.outputs assert len(spec.description.output) == len(output_specs) # type: ignore[attr-defined] outputs = [] - for index, output_spec in enumerate(output_specs): - output_spec = _TensorSpec(*output_spec) # type: ignore[misc] + for index, output in enumerate(output_specs): + shape, dtype = output name = spec.description.output[index].name # type: ignore[attr-defined] - outputs.append([name, str(output_spec.dtype), str(output_spec.shape)]) + outputs.append([name, str(dtype), str(shape)]) mlmodel = ct.models.model.MLModel(spec) + print(mlmodel) config = { "spec_ver": str(spec.specificationVersion), # type: ignore[attr-defined] - "backend": forward_spec.backend, - "allow_low_precision": str(forward_spec.allow_low_precision), + "backend": backend, + "allow_low_precision": str(allow_low_precision), } metadata = { "coremltool_ver": mlmodel.user_defined_metadata[CT_METADATA_VERSION], diff --git a/torch/backends/_nnapi/serializer.py b/torch/backends/_nnapi/serializer.py index d29b5987295c74..4bbf9b5e85308a 100644 --- a/torch/backends/_nnapi/serializer.py +++ b/torch/backends/_nnapi/serializer.py @@ -1549,11 +1549,28 @@ def add_adaptive_avg_pool2d(self, node): self.add_operation(NNAPI_OperationCode.AVERAGE_POOL_2D, inputs, outputs) def add_upsample_nearest2d(self, node): - assert node.inputsSize() == 3 + assert node.inputsSize() == 3 or node.inputsSize() == 4 assert node.outputsSize() == 1 - image, size_jit, scale_jit = node.inputs() + if node.inputsSize() == 3: + image, size_jit, scale_jit = node.inputs() + else: + image, size_jit, scale_h_jit, scale_w_jit = node.inputs() size_ctype, size_arg = 
self.get_constant_value(size_jit) - scale_ctype, scale_arg = self.get_constant_value(scale_jit) + + if node.inputsSize() == 3: + scale_ctype, scale_arg = self.get_constant_value(scale_jit) + else: + scale_h_ctype, scale_h_arg = self.get_constant_value(scale_h_jit) + scale_w_ctype, scale_w_arg = self.get_constant_value(scale_w_jit) + + # The only way for the 4-argument overload of upsample_nearest2d to + # have been added to the graph without error is if the scale_h and + # scale_w arguments are None + assert scale_h_ctype.kind() == "NoneType" + assert scale_w_ctype.kind() == "NoneType" + + scale_ctype = scale_h_ctype + scale_arg = scale_h_arg image_id, image_oper = self.get_tensor_operand_by_jitval(image) assert len(image_oper.shape) == 4 diff --git a/torch/backends/quantized/__init__.py b/torch/backends/quantized/__init__.py index a24d88bcc6e6d4..6f7d479e90c4a4 100644 --- a/torch/backends/quantized/__init__.py +++ b/torch/backends/quantized/__init__.py @@ -11,6 +11,8 @@ def _get_qengine_id(qengine: str) -> int: ret = 1 elif qengine == 'qnnpack': ret = 2 + elif qengine == 'onednn': + ret = 3 else: ret = -1 raise RuntimeError("{} is not a valid value for quantized engine".format(qengine)) @@ -18,7 +20,7 @@ def _get_qengine_id(qengine: str) -> int: # This function should correspond to the enums present in c10/core/QEngine.h def _get_qengine_str(qengine: int) -> str: - all_engines = {0 : 'none', 1 : 'fbgemm', 2 : 'qnnpack'} + all_engines = {0 : 'none', 1 : 'fbgemm', 2 : 'qnnpack', 3 : 'onednn'} return all_engines.get(qengine, '*undefined') class _QEngineProp(object): diff --git a/torch/cpu/amp/autocast_mode.py b/torch/cpu/amp/autocast_mode.py index 49ffb5c11b4257..03cbcdcda0fc61 100644 --- a/torch/cpu/amp/autocast_mode.py +++ b/torch/cpu/amp/autocast_mode.py @@ -1,7 +1,7 @@ import torch from typing import Any -class autocast(torch.autocast_mode.autocast): +class autocast(torch.amp.autocast_mode.autocast): r""" See :class:`torch.autocast`. 
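A minimal usage sketch of the CPU autocast context this class provides (the equivalence with ``torch.autocast("cpu", ...)`` is stated just below):

```python
import torch

x = torch.randn(8, 8)
w = torch.randn(8, 8)

# torch.cpu.amp.autocast is a thin subclass of torch.amp.autocast_mode.autocast,
# so the two context managers below behave identically on CPU.
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    y1 = x @ w

with torch.autocast("cpu", dtype=torch.bfloat16):
    y2 = x @ w

assert y1.dtype == y2.dtype == torch.bfloat16
```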
``torch.cpu.amp.autocast(args...)`` is equivalent to ``torch.autocast("cpu", args...)`` diff --git a/torch/csrc/api/include/torch/fft.h b/torch/csrc/api/include/torch/fft.h index 23ecbf1be0c697..71a3146c990f18 100644 --- a/torch/csrc/api/include/torch/fft.h +++ b/torch/csrc/api/include/torch/fft.h @@ -44,7 +44,7 @@ inline Tensor ifft(const Tensor& self, /// torch::fft::fft2(t); /// ``` inline Tensor fft2(const Tensor& self, - c10::optional s=c10::nullopt, + OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_fft2(self, s, dim, norm); @@ -59,7 +59,7 @@ inline Tensor fft2(const Tensor& self, /// torch::fft::ifft2(t); /// ``` inline Tensor ifft2(const Tensor& self, - c10::optional s=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_ifft2(self, s, dim, norm); @@ -74,8 +74,8 @@ inline Tensor ifft2(const Tensor& self, /// torch::fft::fftn(t); /// ``` inline Tensor fftn(const Tensor& self, - c10::optional s=c10::nullopt, - c10::optional dim=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, + at::OptionalIntArrayRef dim=c10::nullopt, c10::optional norm=c10::nullopt) { return torch::fft_fftn(self, s, dim, norm); } @@ -89,8 +89,8 @@ inline Tensor fftn(const Tensor& self, /// torch::fft::ifftn(t); /// ``` inline Tensor ifftn(const Tensor& self, - c10::optional s=c10::nullopt, - c10::optional dim=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, + at::OptionalIntArrayRef dim=c10::nullopt, c10::optional norm=c10::nullopt) { return torch::fft_ifftn(self, s, dim, norm); } @@ -138,7 +138,7 @@ inline Tensor irfft(const Tensor& self, /// torch::fft::rfft2(t); /// ``` inline Tensor rfft2(const Tensor& self, - c10::optional s=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_rfft2(self, s, dim, norm); @@ -153,7 +153,7 @@ inline Tensor rfft2(const Tensor& self, /// torch::fft::irfft2(t); /// ``` inline Tensor irfft2(const Tensor& self, - c10::optional s=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_irfft2(self, s, dim, norm); @@ -168,8 +168,8 @@ inline Tensor irfft2(const Tensor& self, /// torch::fft::rfftn(t); /// ``` inline Tensor rfftn(const Tensor& self, - c10::optional s=c10::nullopt, - c10::optional dim=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, + at::OptionalIntArrayRef dim=c10::nullopt, c10::optional norm=c10::nullopt) { return torch::fft_rfftn(self, s, dim, norm); } @@ -183,8 +183,8 @@ inline Tensor rfftn(const Tensor& self, /// torch::fft::irfftn(t); /// ``` inline Tensor irfftn(const Tensor& self, - c10::optional s=c10::nullopt, - c10::optional dim=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, + at::OptionalIntArrayRef dim=c10::nullopt, c10::optional norm=c10::nullopt) { return torch::fft_irfftn(self, s, dim, norm); } @@ -238,7 +238,7 @@ inline Tensor ihfft(const Tensor& self, /// assert(T.is_floating_point() && T.numel() == 128 * 128); /// ``` inline Tensor hfft2(const Tensor& self, - c10::optional s=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_hfft2(self, s, dim, norm); @@ -256,7 +256,7 @@ inline Tensor hfft2(const Tensor& self, /// assert(t.is_complex() && t.size(1) == 65); /// ``` inline Tensor ihfft2(const Tensor& self, - c10::optional s=c10::nullopt, + 
at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_ihfft2(self, s, dim, norm); @@ -274,7 +274,7 @@ inline Tensor ihfft2(const Tensor& self, /// assert(T.is_floating_point() && T.numel() == 128 * 128); /// ``` inline Tensor hfftn(const Tensor& self, - c10::optional s=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_hfftn(self, s, dim, norm); @@ -292,7 +292,7 @@ inline Tensor hfftn(const Tensor& self, /// assert(t.is_complex() && t.size(1) == 65); /// ``` inline Tensor ihfftn(const Tensor& self, - c10::optional s=c10::nullopt, + at::OptionalIntArrayRef s=c10::nullopt, IntArrayRef dim={-2, -1}, c10::optional norm=c10::nullopt) { return torch::fft_ihfftn(self, s, dim, norm); @@ -341,7 +341,7 @@ inline Tensor rfftfreq(int64_t n, const TensorOptions& options) { /// auto x = torch::randn({127, 4}); /// auto centred_fft = torch::fft::fftshift(torch::fft::fftn(x)); /// ``` -inline Tensor fftshift(const Tensor& x, c10::optional dim=c10::nullopt) { +inline Tensor fftshift(const Tensor& x, at::OptionalIntArrayRef dim=c10::nullopt) { return torch::fft_fftshift(x, dim); } @@ -356,7 +356,7 @@ inline Tensor fftshift(const Tensor& x, c10::optional dim=c10::null /// auto unshift = torch::fft::ifftshift(shift); /// assert(torch::allclose(x, unshift)); /// ``` -inline Tensor ifftshift(const Tensor& x, c10::optional dim=c10::nullopt) { +inline Tensor ifftshift(const Tensor& x, at::OptionalIntArrayRef dim=c10::nullopt) { return torch::fft_ifftshift(x, dim); } diff --git a/torch/csrc/api/include/torch/linalg.h b/torch/csrc/api/include/torch/linalg.h index e16c1f61e503b2..705e2e41b73d7a 100644 --- a/torch/csrc/api/include/torch/linalg.h +++ b/torch/csrc/api/include/torch/linalg.h @@ -84,27 +84,27 @@ inline Tensor matrix_exp(const Tensor& self) { return torch::linalg_matrix_exp(self); } -inline Tensor norm(const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor norm(const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return torch::linalg_norm(self, opt_ord, opt_dim, keepdim, opt_dtype); } -inline Tensor norm(const Tensor& self, c10::string_view ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor norm(const Tensor& self, c10::string_view ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return torch::linalg_norm(self, ord, opt_dim, keepdim, opt_dtype); } -inline Tensor& norm_out(Tensor& result, const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& norm_out(Tensor& result, const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return torch::linalg_norm_out(result, self, opt_ord, opt_dim, keepdim, opt_dtype); } -inline Tensor& norm_out(Tensor& result, const Tensor& self, c10::string_view ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& norm_out(Tensor& result, const Tensor& self, c10::string_view ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return torch::linalg_norm_out(result, self, ord, opt_dim, keepdim, opt_dtype); } -inline Tensor vector_norm(const Tensor& self, Scalar ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor vector_norm(const Tensor& self, Scalar ord, OptionalIntArrayRef opt_dim, bool keepdim, optional 
opt_dtype) { return torch::linalg_vector_norm(self, ord, opt_dim, keepdim, opt_dtype); } -inline Tensor& vector_norm_out(Tensor& result, const Tensor& self, Scalar ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& vector_norm_out(Tensor& result, const Tensor& self, Scalar ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return torch::linalg_vector_norm_out(result, self, ord, opt_dim, keepdim, opt_dtype); } @@ -228,11 +228,11 @@ inline Tensor& tensorinv_out(Tensor& result,const Tensor& self, int64_t ind) { return torch::linalg_tensorinv_out(result, self, ind); } -inline Tensor tensorsolve(const Tensor& self, const Tensor& other, optional dims) { +inline Tensor tensorsolve(const Tensor& self, const Tensor& other, OptionalIntArrayRef dims) { return torch::linalg_tensorsolve(self, other, dims); } -inline Tensor& tensorsolve_out(Tensor& result, const Tensor& self, const Tensor& other, optional dims) { +inline Tensor& tensorsolve_out(Tensor& result, const Tensor& self, const Tensor& other, OptionalIntArrayRef dims) { return torch::linalg_tensorsolve_out(result, self, other, dims); } @@ -354,22 +354,22 @@ inline Tensor matrix_exp(const Tensor& input) { } // C10_DEPRECATED_MESSAGE("linalg_norm is deprecated, use norm instead.") -inline Tensor linalg_norm(const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor linalg_norm(const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm(self, opt_ord, opt_dim, keepdim, opt_dtype); } // C10_DEPRECATED_MESSAGE("linalg_norm is deprecated, use norm instead.") -inline Tensor linalg_norm(const Tensor& self, c10::string_view ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor linalg_norm(const Tensor& self, c10::string_view ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm(self, ord, opt_dim, keepdim, opt_dtype); } // C10_DEPRECATED_MESSAGE("linalg_norm_out is deprecated, use norm_out instead.") -inline Tensor& linalg_norm_out(Tensor& result, const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& linalg_norm_out(Tensor& result, const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm_out(result, self, opt_ord, opt_dim, keepdim, opt_dtype); } // C10_DEPRECATED_MESSAGE("linalg_norm_out is deprecated, use norm_out instead.") -inline Tensor& linalg_norm_out(Tensor& result, const Tensor& self, c10::string_view ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& linalg_norm_out(Tensor& result, const Tensor& self, c10::string_view ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm_out(result, self, ord, opt_dim, keepdim, opt_dtype); } @@ -384,28 +384,28 @@ inline std::tuple lu_factor_out(Tensor& LU, Tensor& pivots, co return detail::lu_factor_out(LU, pivots, self, pivot); } -inline Tensor norm(const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor norm(const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm(self, opt_ord, opt_dim, keepdim, opt_dtype); } -inline Tensor norm(const Tensor& self, std::string ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor norm(const Tensor& self, std::string 
ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm(self, ord, opt_dim, keepdim, opt_dtype); } -inline Tensor& norm_out(Tensor& result, const Tensor& self, const optional& opt_ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& norm_out(Tensor& result, const Tensor& self, const optional& opt_ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm_out(result, self, opt_ord, opt_dim, keepdim, opt_dtype); } -inline Tensor& norm_out(Tensor& result, const Tensor& self, std::string ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& norm_out(Tensor& result, const Tensor& self, std::string ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::norm_out(result, self, ord, opt_dim, keepdim, opt_dtype); } /// See https://pytorch.org/docs/master/linalg.html#torch.linalg.vector_norm -inline Tensor vector_norm(const Tensor& self, Scalar ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor vector_norm(const Tensor& self, Scalar ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::vector_norm(self, ord, opt_dim, keepdim, opt_dtype); } -inline Tensor& vector_norm_out(Tensor& result, const Tensor& self, Scalar ord, optional opt_dim, bool keepdim, optional opt_dtype) { +inline Tensor& vector_norm_out(Tensor& result, const Tensor& self, Scalar ord, OptionalIntArrayRef opt_dim, bool keepdim, optional opt_dtype) { return detail::vector_norm_out(result, self, ord, opt_dim, keepdim, opt_dtype); } @@ -574,11 +574,11 @@ inline Tensor& tensorinv_out(Tensor& result, const Tensor& self, int64_t ind) { /// auto b = torch::randn(2*3, 4); /// auto x = torch::linalg::tensorsolve(a, b); /// ``` -inline Tensor tensorsolve(const Tensor& input, const Tensor& other, optional dims) { +inline Tensor tensorsolve(const Tensor& input, const Tensor& other, OptionalIntArrayRef dims) { return detail::tensorsolve(input, other, dims); } -inline Tensor& tensorsolve_out(Tensor& result, const Tensor& input, const Tensor& other, optional dims) { +inline Tensor& tensorsolve_out(Tensor& result, const Tensor& input, const Tensor& other, OptionalIntArrayRef dims) { return detail::tensorsolve_out(result, input, other, dims); } diff --git a/torch/csrc/api/include/torch/nn/functional/padding.h b/torch/csrc/api/include/torch/nn/functional/padding.h index 611f407d9b7a77..1b2f77626cdbbd 100644 --- a/torch/csrc/api/include/torch/nn/functional/padding.h +++ b/torch/csrc/api/include/torch/nn/functional/padding.h @@ -1,83 +1,36 @@ #pragma once #include +#include namespace torch { namespace nn { namespace functional { -inline Tensor _narrow_with_range(const Tensor& input, int64_t dim, int64_t start, int64_t end) { - return input.narrow(dim, start, end - start); -} - -inline Tensor _pad_circular(Tensor input, IntArrayRef padding) { - int padding_size = padding.size(); - input = torch::cat({input, _narrow_with_range(input, 2, 0, padding[-1 + padding_size])}, /*dim=*/2); - input = torch::cat({_narrow_with_range(input, 2, -(padding[-1 + padding_size] + padding[-2 + padding_size]), -padding[-1 + padding_size]), input}, /*dim=*/2); - - if (padding_size > 2) { - input = torch::cat({input, _narrow_with_range(input, 3, 0, padding[-3 + padding_size])}, /*dim=*/3); - input = torch::cat({_narrow_with_range(input, 3, -(padding[-3 + padding_size] + padding[-4 + padding_size]), -padding[-3 + padding_size]), input}, /*dim=*/3); - } - - if (padding_size > 4) { - input = 
torch::cat({input, _narrow_with_range(input, 4, 0, padding[-5 + padding_size])}, /*dim=*/4); - input = torch::cat({_narrow_with_range(input, 4, -(padding[-5 + padding_size] + padding[-6 + padding_size]), -padding[-5 + padding_size]), input}, /*dim=*/4); - } - - return input; -} - #ifndef DOXYGEN_SHOULD_SKIP_THIS namespace detail { inline Tensor pad(const Tensor& input, IntArrayRef pad, PadFuncOptions::mode_t mode, double value) { - TORCH_CHECK(pad.size() % 2 == 0, "Padding length must be divisible by 2"); - TORCH_CHECK(((int64_t)(pad.size() / 2)) <= input.dim(), "Padding length too large"); - if (c10::get_if(&mode)) { - return torch::constant_pad_nd(input, pad, value); - } else { - TORCH_CHECK( - value == 0, - "Padding mode \"", - torch::enumtype::get_enum_name(mode), - "\" doesn't take in value argument"); - if (pad.size() == 2 && (input.dim() == 2 || input.dim() == 3)) { - if (c10::get_if(&mode)) { - return torch::reflection_pad1d(input, pad); - } else if (c10::get_if(&mode)) { - return torch::replication_pad1d(input, pad); - } else if (c10::get_if(&mode)) { - return _pad_circular(input, pad); - } else { - TORCH_CHECK(false, "NotImplementedError"); - } - } else if(pad.size() == 4 && (input.dim() == 3 || input.dim() == 4)) { - if (c10::get_if(&mode)) { - return torch::reflection_pad2d(input, pad); - } else if (c10::get_if(&mode)) { - return torch::replication_pad2d(input, pad); - } else if (c10::get_if(&mode)) { - return _pad_circular(input, pad); - } else { - TORCH_CHECK(false, "NotImplementedError"); - } - } else if (pad.size() == 6 && (input.dim() == 4 || input.dim() == 5)) { - if (c10::get_if(&mode)) { - return torch::reflection_pad3d(input, pad); - } else if (c10::get_if(&mode)) { - return torch::replication_pad3d(input, pad); - } else if (c10::get_if(&mode)) { - return _pad_circular(input, pad); - } else { - TORCH_CHECK(false, "NotImplementedError"); - } - } else { - TORCH_CHECK(false, "Only 2D, 3D, 4D, 5D padding with non-constant padding are supported for now"); + const auto mode_enum = [&] { + if (c10::get_if(&mode)) { + return at::padding_mode::constant; + } else if (c10::get_if(&mode)) { + return at::padding_mode::reflect; + } else if (c10::get_if(&mode)) { + return at::padding_mode::replicate; + } else if (c10::get_if(&mode)) { + return at::padding_mode::circular; } + TORCH_CHECK(false, "Unrecognised padding mode"); + }(); + + c10::optional fill_value; + if (value != 0.0) { + fill_value = value; } + return at::_pad_enum(input, pad, static_cast(mode_enum), fill_value); } } // namespace detail #endif /* DOXYGEN_SHOULD_SKIP_THIS */ diff --git a/torch/csrc/api/include/torch/special.h b/torch/csrc/api/include/torch/special.h index 6e0ecc0fbcadac..d667e094f99353 100644 --- a/torch/csrc/api/include/torch/special.h +++ b/torch/csrc/api/include/torch/special.h @@ -215,6 +215,15 @@ inline Tensor& logsumexp_out(Tensor& result, const Tensor& self, IntArrayRef dim return torch::special_logsumexp_out(result, self, dims, keepdim); } +/// Computes the argument, x, for which the area under the Gaussian probability density +/// function (integrated from minus infinity to x) is equal to input, elementwise. 
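The same operator is exposed in Python as ``torch.special.ndtri``; a quick sketch of its relationship with ``torch.special.ndtr`` (both are in the public Python API):

```python
import torch

p = torch.tensor([0.025, 0.5, 0.975], dtype=torch.float64)
x = torch.special.ndtri(p)   # standard-normal quantile for each probability
# ndtr integrates the standard-normal density from -inf up to x, recovering p
assert torch.allclose(torch.special.ndtr(x), p)
```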
+/// See https://pytorch.org/docs/master/special.html#torch.special.ndtri +/// +/// Example: +/// ``` +/// auto t = torch::rand(128, dtype=kDouble); +/// torch::special::ndtri(t); +/// ``` inline Tensor ndtri(const Tensor& self) { return torch::special_ndtri(self); } @@ -223,6 +232,23 @@ inline Tensor& ndtri_out(Tensor& result, const Tensor& self) { return torch::special_ndtri_out(result, self); } +/// Computes the log of area under the standard Gaussian probability density function, +/// integrated from minus infinity to :attr:`input`, elementwise +/// See https://pytorch.org/docs/master/special.html#torch.special.log_ndtr +/// +/// Example: +/// ``` +/// auto t = torch::randn(128, dtype=kDouble); +/// torch::special::log_ndtr(t); +/// ``` +inline Tensor log_ndtr(const Tensor& self) { + return torch::special_log_ndtr(self); +} + +inline Tensor& log_ndtr_out(Tensor& result, const Tensor& self) { + return torch::special_log_ndtr_out(result, self); +} + /// Computes the logit of input, elementwise. /// See https://pytorch.org/docs/master/special.html#torch.special.logit. /// diff --git a/torch/csrc/autograd/FunctionsManual.cpp b/torch/csrc/autograd/FunctionsManual.cpp index c91d82d9263586..162fe0e9fe61a4 100644 --- a/torch/csrc/autograd/FunctionsManual.cpp +++ b/torch/csrc/autograd/FunctionsManual.cpp @@ -232,7 +232,7 @@ Tensor norm_backward(Tensor grad, const Tensor& self, const optional & p return self_scaled * scale_v; } -Tensor linalg_vector_norm_backward(Tensor grad, const Tensor& self, const Scalar& scalar_ord, Tensor norm, const optional& opt_dim, bool keepdim) { +Tensor linalg_vector_norm_backward(Tensor grad, const Tensor& self, const Scalar& scalar_ord, Tensor norm, const at::OptionalIntArrayRef& opt_dim, bool keepdim) { auto dim = opt_dim.value_or(IntArrayRef({})); return norm_backward(grad, self, scalar_ord, norm, dim, keepdim); } @@ -717,6 +717,22 @@ std::tuple clamp_backward_min_max( return ret; } +at::Tensor clamp_jvp( + const Tensor& self_p, const Tensor& self_t, + const Tensor& min_p, const Tensor& min_t, + const Tensor& max_p, const Tensor& max_t +) { + if (min_p.defined() && max_p.defined()) { + return where(min_p > max_p, max_t, where(self_p < min_p, min_t, where(self_p > max_p, max_t, self_t))); + } else if (min_p.defined()) { + return where(self_p > min_p, self_t, min_t); + } else if (max_p.defined()) { + return where(self_p < max_p, self_t, max_t); + } else { + return self_t; + } +} + Tensor convolution_jvp( const Tensor& input_p, const Tensor& input_t, const Tensor& weight_p, const Tensor& weight_t, @@ -764,7 +780,7 @@ Tensor convolution_backward_jvp_grad_bias( } else { TORCH_INTERNAL_ASSERT( false, - "convolution_backward_jvp_grad_bias expected dim of grad_out_t to be 3, 4, or 4, but got: ", + "convolution_backward_jvp_grad_bias expected dim of grad_out_t to be 3, 4, or 5, but got: ", grad_out_t.dim()); } } @@ -1050,7 +1066,7 @@ static Tensor var_backward(const Tensor & grad, const Tensor & self, int64_t cor return (2.0 / (self.numel() - correction)) * grad * (self - self.mean()); } -Tensor var_backward(Tensor grad, const Tensor& self, c10::optional dim_opt, +Tensor var_backward(Tensor grad, const Tensor& self, at::OptionalIntArrayRef dim_opt, c10::optional correction_opt, bool keepdim) { auto correction = correction_opt.value_or(1); if (self.dim() == 0 || !dim_opt.has_value()) { @@ -1065,7 +1081,7 @@ Tensor var_backward(Tensor grad, const Tensor& self, c10::optional return (2.0 / dof) * grad * (self - self.mean(dim, /*keepdim=*/true)); } -Tensor var_jvp(const 
Tensor& self_t, const Tensor& self_p, const Tensor& result, c10::optional dim_opt, +Tensor var_jvp(const Tensor& self_t, const Tensor& self_p, const Tensor& result, at::OptionalIntArrayRef dim_opt, c10::optional correction_opt, bool keepdim) { auto correction = correction_opt.value_or(1); if (self_p.dim() == 0 || !dim_opt.has_value()) { @@ -1078,7 +1094,7 @@ Tensor var_jvp(const Tensor& self_t, const Tensor& self_p, const Tensor& result, Tensor std_backward( const Tensor& result, const Tensor& grad, const Tensor& self, - c10::optional dim, c10::optional correction, bool keepdim) { + at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim) { auto grad_var = (grad / (result * 2)).masked_fill_(result == 0, 0); return var_backward(grad_var, self, dim, correction, keepdim); } @@ -1093,7 +1109,7 @@ Tensor mean_backward(Tensor grad, const IntArrayRef sizes, int64_t numel) { static Tensor mean_backward( const Tensor& grad, const IntArrayRef sizes, int64_t numel, - c10::optional dim, bool keepdim) { + at::OptionalIntArrayRef dim, bool keepdim) { if (dim.has_value()) { return mean_backward(grad, sizes, *dim, keepdim); } else { @@ -1103,7 +1119,7 @@ static Tensor mean_backward( Tensor var_std_mean_backward( const variable_list& grads, const Tensor& self, const Tensor& r1, - const Tensor& r2, c10::optional dim, + const Tensor& r2, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim, bool is_std) { Tensor grad; if (grads[0].defined()) { @@ -1176,19 +1192,35 @@ Tensor cholesky_inverse_backward(Tensor grad, Tensor L, bool upper, Tensor inver at::NoTF32Guard disable_tf32; Tensor grad_L; if (grad.defined()) { - Tensor common_term = grad + grad.mT(); + Tensor common_term = grad + grad.mH(); common_term = at::matmul(inverse, at::matmul(common_term, inverse)); if (upper) { grad_L = -at::matmul(L, common_term); } else { grad_L = -at::matmul(common_term, L); } - } else { - grad_L = at::zeros({1}, L.options()).expand_as(L); } + return grad_L; } +// If X = (L L^H)^{-1} with L lower-triangular with a real positive diagonal, +// then dX = K^H + K, where +// K = L^{-H} dL^{-1} [dL^{-1} = -L^{-1} dL L^{-1}] +// = -L^{-H} L^{-1} dL L^{-1} [L^{-H} L^{-1} = X] +// = -X dL L^{-1} [X = X^H = L^{-H} L^{-1} = L^{-1} L^{-H}] +// = -X dL X L^{H}. +// If X = (U^H U)^{-1} with U upper-triangular with a real positive diagonal, +// then K becomes +// K = -X dU^H X U +Tensor cholesky_inverse_jvp(const Tensor& F, const Tensor& dF, const Tensor& X, bool upper) { + at::NoTF32Guard disable_tf32; + const auto CF = upper ? F : F.mH(); + const auto dCF = upper ? dF.mH() : dF; + const auto partial_dX = -X.matmul(dCF).matmul(X).matmul(CF); + return partial_dX + partial_dX.mH(); +} + // The formula for forward AD is adapted from // // Golub, Gene H., and Victor Pereyra. 
"The Differentiation of Pseudo-Inverses and Nonlinear @@ -5200,6 +5232,25 @@ Tensor lu_factor_ex_jvp( } } +Tensor logsumexp_jvp(const Tensor& self_p, const Tensor& self_t, IntArrayRef dim, bool keepdim) { + // NB: for simplicitly, we recompute some values that can be reused from forward + auto self_p_exp = (self_p - at::amax(self_p, dim, true)).exp(); // Use the exp-normalize trick + auto sumexp_p = self_p_exp.sum(dim, keepdim); + + // NB: it's OK for logsumexp_jvp to be reused for formulas like softmax/log_softmax + // that only have one differentiable input, because that means self_t are never zerotensors + TORCH_INTERNAL_ASSERT(!self_t._is_zerotensor()) + if (areAnyTensorSubclassLike({self_p, self_t})) { + auto result = (self_p_exp * self_t).sum(dim, keepdim); + result /= sumexp_p; + return result; + } else { + self_p_exp *= self_t; + auto sumexp_t = self_p_exp.sum(dim, keepdim); + return sumexp_t /= sumexp_p; + } +} + Tensor warn_backwards(const Tensor &grad_output) { TORCH_WARN("Warn from backward"); return grad_output; @@ -5224,41 +5275,53 @@ std::tuple _cudnn_convolution_backward( return result; } -Tensor scatter_reduce_backward(const Tensor & grad, - const Tensor& input, - int dim, - const Tensor & index, - c10::string_view reduce, - const Tensor & result){ - Tensor grad_input; - +std::tuple scatter_reduce_backward( + const Tensor& grad, + const Tensor& self, + int dim, + const Tensor& index, + const Tensor& src, + c10::string_view reduce, + bool include_self, + const Tensor& result) { + Tensor grad_self, grad_src; + + // FIXME: complex gradients not handled correctly + // For now this is ok as scatter_reduce isn't added to the whitelist + // in tools/autograd/gen_variable_type.py - // TODO: gather doesn't support broadcasting of input and index - // currently this works because scatter_reduce doesn't support broadcasting yet but - // this needs to be fixed when scatter_reduce is upgraded to support broadcasting - // by broadcasting index here too. + if (!grad.defined()) { + return std::make_tuple(grad_self, grad_src); + } if (reduce == "sum") { - grad_input = grad.gather(dim, index); + grad_self = grad; + grad_src = grad.gather(dim, index); } else if (reduce == "prod") { - grad_input = (grad * result).gather(dim, index) / input; - // handle nans in above computation when input = 0, we know result = 0 (0 / 0 -> nan) - // so just replace with 0 - grad_input.masked_fill_(input == 0, 0); + grad_self = (grad * result) / self; + grad_self.masked_fill_(self == 0, 0); + grad_src = (grad * result).gather(dim, index) / src; + grad_src.masked_fill_(src == 0, 0); } else if (reduce == "mean") { - Tensor N = zeros_like(grad); - N.scatter_add_(dim, index, ones_like(input)); - Tensor N_input = N.gather(dim, index); - grad_input = grad.gather(dim, index) / N_input; - grad_input.masked_fill_(N_input == 0, 0); + Tensor N = include_self ? 
ones_like(grad) : zeros_like(grad); + N = N.scatter_add(dim, index, ones_like(src)); + N.masked_fill_(N == 0, 1); + grad_self = grad / N; + Tensor N_src = N.gather(dim, index); + grad_src = grad.gather(dim, index) / N_src; } else if (reduce == "amax" || reduce == "amin") { + grad_self = (self == result) * grad; Tensor value = result.gather(dim, index); - grad_input = (input == value) * grad.gather(dim, index); + grad_src = (src == value) * grad.gather(dim, index); } else { AT_ERROR("Expected 'reduce' to be one of 'sum', 'prod', 'mean', 'amax', 'amin' but got ", reduce, "."); } - return grad_input; + if (!include_self) { + grad_self = grad_self.scatter(dim, index, 0); + } + + return std::make_tuple(grad_self, grad_src); } diff --git a/torch/csrc/autograd/FunctionsManual.h b/torch/csrc/autograd/FunctionsManual.h index 9451f5f49d20a4..c9c245b3cd1c69 100644 --- a/torch/csrc/autograd/FunctionsManual.h +++ b/torch/csrc/autograd/FunctionsManual.h @@ -49,7 +49,7 @@ Tensor restore_reduced_dims(const Tensor &output, IntArrayRef dims, bool keepdim Tensor scale_grad_by_count(const Tensor &grad, const Tensor &mask, IntArrayRef dims); at::Tensor norm_backward(const at::Tensor & grad, const at::Tensor & self, const optional & p_, const at::Tensor & norm); at::Tensor norm_backward(at::Tensor grad, const at::Tensor & self, const optional & p_, at::Tensor norm, at::IntArrayRef dim, bool keepdim); -at::Tensor linalg_vector_norm_backward(at::Tensor grad, const at::Tensor & self, const at::Scalar & ord, at::Tensor norm, const c10::optional & opt_dim, bool keepdim); +at::Tensor linalg_vector_norm_backward(at::Tensor grad, const at::Tensor & self, const at::Scalar & ord, at::Tensor norm, const at::OptionalIntArrayRef & opt_dim, bool keepdim); at::Tensor pow_backward(at::Tensor grad, const at::Tensor & self, const at::Scalar & exponent_); at::Tensor pow_backward_self(at::Tensor grad, const at::Tensor & self, const at::Tensor & exponent); at::Tensor pow_backward_exponent(at::Tensor grad, const at::Tensor& self, const at::Tensor& exponent, at::Tensor result); @@ -77,6 +77,7 @@ at::Tensor solve_backward_self(const at::Tensor & grad, const at::Tensor & self, at::Tensor solve_backward_A(const at::Tensor & grad, const at::Tensor & self, const at::Tensor & A, const at::Tensor & solution); at::Tensor cumsum_backward(const at::Tensor & grad, int64_t dim); at::Tensor logsumexp_backward(at::Tensor grad, const at::Tensor & self, at::Tensor result, at::IntArrayRef dim, bool keepdim); +at::Tensor logsumexp_jvp(const at::Tensor& self_p, const at::Tensor& self_t, IntArrayRef dim, bool keepdim); at::Tensor logcumsumexp_backward(at::Tensor grad, const at::Tensor & self, at::Tensor result, int64_t dim); at::Tensor unbind_backward(const variable_list& grads, int64_t dim); at::Tensor unsqueeze_to(const at::Tensor & self, at::IntArrayRef sizes); @@ -85,6 +86,11 @@ std::vector cat_tensors_backward(const at::Tensor & grad, const std: at::Tensor clamp_backward(const at::Tensor & grad, const at::Tensor &self, const optional& min, const optional& max); at::Tensor clamp_backward(const at::Tensor & grad, const at::Tensor &self, const at::Tensor& min, const at::Tensor& max); std::tuple clamp_backward_min_max(const at::Tensor& grad, const at::Tensor& self, const at::Tensor& min, const at::Tensor& max, const std::array&); +at::Tensor clamp_jvp( + const Tensor& self_p, const Tensor& self_t, + const Tensor& min_p, const Tensor& min_t, + const Tensor& max_p, const Tensor& max_t +); at::IntArrayRef strides_or_error(const Tensor & input, 
c10::string_view const & input_name); at::Tensor mm_mat1_backward(const Tensor & grad, const Tensor & mat2, at::IntArrayRef mat1_sizes, at::IntArrayRef mat1_strides, const Scalar & alpha); at::Tensor mm_mat2_backward(const at::Tensor & grad, const at::Tensor & mat1, at::IntArrayRef sizes, at::IntArrayRef strides, const at::Scalar & alpha); @@ -97,16 +103,17 @@ at::Tensor infinitely_differentiable_native_dropout_backward(const at::Tensor& g at::Tensor native_dropout_double_backward(const at::Tensor& ggI, const at::Tensor& grad, const at::Tensor& mask, double scale); at::Tensor evenly_distribute_backward(at::Tensor grad, const at::Tensor & input, const at::Tensor & value); at::Tensor sgn_backward(Tensor result, Tensor grad, Tensor self); -at::Tensor var_backward(at::Tensor grad, const at::Tensor& self, c10::optional dim, c10::optional correction, bool keepdim); -at::Tensor var_jvp(const at::Tensor& self_t, const at::Tensor& self_p, const at::Tensor& result, c10::optional dim_opt, c10::optional correction_opt, bool keepdim); -at::Tensor std_backward(const at::Tensor& result, const at::Tensor& grad, const at::Tensor& self, c10::optional dim, c10::optional correction, bool keepdim); +at::Tensor var_backward(at::Tensor grad, const at::Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim); +at::Tensor var_jvp(const at::Tensor& self_t, const at::Tensor& self_p, const at::Tensor& result, at::OptionalIntArrayRef dim_opt, c10::optional correction_opt, bool keepdim); +at::Tensor std_backward(const at::Tensor& result, const at::Tensor& grad, const at::Tensor& self, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim); at::Tensor mean_backward(at::Tensor grad, const at::IntArrayRef sizes, at::IntArrayRef dim, bool keepdim); at::Tensor mean_backward(at::Tensor grad, const at::IntArrayRef sizes, int64_t numel); -at::Tensor var_std_mean_backward(const variable_list& grads, const at::Tensor& self, const at::Tensor& r1, const at::Tensor& r2, c10::optional dim, c10::optional correction, bool keepdim, bool is_std); +at::Tensor var_std_mean_backward(const variable_list& grads, const at::Tensor& self, const at::Tensor& r1, const at::Tensor& r2, at::OptionalIntArrayRef dim, c10::optional correction, bool keepdim, bool is_std); at::Tensor masked_scatter_backward(const at::Tensor & grad, const at::Tensor & mask, at::IntArrayRef sizes); at::Tensor cholesky_backward(at::Tensor grad, bool upper, at::Tensor L); at::Tensor cholesky_jvp(const at::Tensor& input_tangent, const at::Tensor& L, bool upper); at::Tensor cholesky_inverse_backward(at::Tensor grad, at::Tensor L, bool upper, at::Tensor inverse); +at::Tensor cholesky_inverse_jvp(const at::Tensor& F, const at::Tensor& dF, const at::Tensor& X, bool upper); Tensor pinv_jvp( const Tensor& A, const Tensor& pinvA, @@ -465,12 +472,14 @@ std::tuple _cudnn_convolution_backward( at::IntArrayRef output_padding, at::IntArrayRef stride, at::IntArrayRef dilation, bool transposed, int64_t groups, ::std::array output_mask); -Tensor scatter_reduce_backward( +std::tuple scatter_reduce_backward( const Tensor& grad, - const Tensor& input, + const Tensor& self, int dim, const Tensor& index, + const Tensor& src, c10::string_view reduce, + bool include_self, const Tensor& result ); diff --git a/torch/csrc/autograd/TraceTypeManual.cpp b/torch/csrc/autograd/TraceTypeManual.cpp index 031b50215d8caf..a96fa42abd172a 100644 --- a/torch/csrc/autograd/TraceTypeManual.cpp +++ b/torch/csrc/autograd/TraceTypeManual.cpp @@ -283,7 +283,9 @@ void 
general_trace_function( AT_ASSERT(iter->isObject()); tracer::addOutput(node, iter->toObject()); } else { - throw std::runtime_error("unsupported output type: " + type->str()); + throw std::runtime_error( + "unsupported output type: " + type->str() + + ", from operator: " + toString(op.operator_name())); } } } diff --git a/torch/csrc/autograd/function.h b/torch/csrc/autograd/function.h index cc5fa59e9ed6a2..e258cbf4b6588d 100644 --- a/torch/csrc/autograd/function.h +++ b/torch/csrc/autograd/function.h @@ -151,6 +151,9 @@ struct TORCH_API Node : std::enable_shared_from_this { // probably operate with names. at::NoNamesGuard no_names_guard; + // Keep track of backward pass for rocblas. + at::BackwardPassGuard in_backward; + bool pre_sampled = false; if (at::shouldRunRecordFunction(&pre_sampled)) { // Using RecordFunction to trigger observers in the backward pass diff --git a/torch/csrc/autograd/init.cpp b/torch/csrc/autograd/init.cpp index 8499fd90314978..36b7b185b596d8 100644 --- a/torch/csrc/autograd/init.cpp +++ b/torch/csrc/autograd/init.cpp @@ -9,7 +9,6 @@ #include #include #include -#include #include #include #include @@ -21,8 +20,10 @@ #include #include #include +#include #include #include +#include #include #include @@ -233,6 +234,7 @@ PyObject* THPAutograd_initExtension(PyObject* _unused, PyObject *unused) { m.def("_disable_profiler", disableProfiler); m.def("_prepare_profiler", prepareProfiler); m.def("_add_metadata_json", addMetadataJson); // Only if `USE_KINETO` is set + m.def("_kineto_step", profilerStep); // Only if `USE_KINETO` is set m.def("kineto_available", []() { return torch::profiler::kKinetoAvailable; }); // NOTICE: These record functions are not torch operators and may not show up @@ -241,7 +243,9 @@ PyObject* THPAutograd_initExtension(PyObject* _unused, PyObject *unused) { // Creates a new profiling scope using RecordFunction and invokes its starting // callbacks. m.def("_record_function_with_args_enter", [](const std::string& name, py::args args) { - auto rec = std::make_unique(at::RecordScope::USER_SCOPE); + using torch::autograd::profiler::PythonRecordFunction; + auto python_rec = c10::make_intrusive(at::RecordScope::USER_SCOPE); + auto *rec = &python_rec->record; if (rec->isActive()) { if (rec->needsInputs()) { auto iv_inputs = std::vector(); @@ -253,16 +257,19 @@ PyObject* THPAutograd_initExtension(PyObject* _unused, PyObject *unused) { rec->before(name); } } - return at::cpp_custom_type_hack::create(std::move(rec), at::TensorOptions()); + return torch::jit::toPyObject(std::move(python_rec)); }); // Ends the profiling scope created with record_function_with_param_enter. - m.def("_record_function_with_args_exit", [](const at::Tensor& handle) { - // We don't actually need to do anything with handle just need to persist the - // lifetime until now. - auto& rec = at::cpp_custom_type_hack::cast(handle); - rec.end(); - }); + m.def("_record_function_with_args_exit", + [](const py::object &obj) { + using torch::autograd::profiler::PythonRecordFunction; + auto python_record = torch::jit::toCustomClass(obj); + + // We don't actually need to do anything with handle just need to persist the + // lifetime until now. 
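These bindings are normally driven through the Python ``torch.autograd.profiler.record_function`` context manager rather than called directly; a minimal usage sketch (the scope name is arbitrary):

```python
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.autograd.profiler.record_function("my_block"):
        torch.mm(torch.randn(64, 64), torch.randn(64, 64))

# "my_block" appears as a user-defined scope wrapping the aten::mm call
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```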
+ python_record->record.end(); + }); m.def("_supported_activities", []() { std::set activities {ActivityType::CPU}; @@ -554,6 +561,31 @@ static PyObject * exit_python_mode(PyObject* _unused, PyObject* arg) { END_HANDLE_TH_ERRORS } +static PyObject * set_torch_function_mode(PyObject* _unused, PyObject* arg) { + HANDLE_TH_ERRORS + if (arg == Py_None) { + at::impl::PythonTorchFunctionTLS::set_mode(nullptr); + } else { + Py_INCREF(arg); + at::impl::PythonTorchFunctionTLS::set_mode(std::make_shared(arg, getPyInterpreter())); + } + Py_RETURN_NONE; + END_HANDLE_TH_ERRORS +} + +static PyObject * get_torch_function_mode(PyObject* _unused, PyObject* _unused2) { + HANDLE_TH_ERRORS + const auto& mode = at::impl::PythonTorchFunctionTLS::get_mode(); + if (!mode) { + Py_RETURN_NONE; + } else { + auto* r = mode->ptr(getPyInterpreter()); + Py_INCREF(r); + return r; + } + END_HANDLE_TH_ERRORS +} + // autograd methods on torch._C static PyMethodDef methods[] = { // NOLINT {"_set_grad_enabled", set_grad_enabled, METH_O, nullptr}, @@ -578,6 +610,8 @@ static PyMethodDef methods[] = { // NOLINT {"_exit_dual_level", castPyCFunctionWithKeywords(python_exit_dual_level), METH_VARARGS | METH_KEYWORDS, nullptr}, {"_enter_python_mode", enter_python_mode, METH_O, nullptr}, {"_exit_python_mode", exit_python_mode, METH_NOARGS, nullptr}, + {"_set_torch_function_mode", set_torch_function_mode, METH_O, nullptr}, + {"_get_torch_function_mode", get_torch_function_mode, METH_NOARGS, nullptr}, {nullptr, nullptr, 0, nullptr} }; diff --git a/torch/csrc/autograd/profiler_kineto.cpp b/torch/csrc/autograd/profiler_kineto.cpp index 1ce7d85887be08..58ebb4ea119686 100644 --- a/torch/csrc/autograd/profiler_kineto.cpp +++ b/torch/csrc/autograd/profiler_kineto.cpp @@ -4,11 +4,12 @@ #include #include #include +#include +#include -#include -#include -#include #include +#include +#include #include #include @@ -117,46 +118,8 @@ namespace { using torch::profiler::impl::ProfilerThreadLocalStateBase; using torch::profiler::impl::ActiveProfilerType; -// NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) -struct OpEventData { - // POD members - int64_t start_us_; - int64_t end_us_; - uint64_t correlation_id_; - uint64_t start_thread_id_; - uint64_t end_thread_id_; - int64_t sequence_number_; - uint64_t forward_thread_id_; - uint8_t record_function_scope_; - bool is_async_; - int64_t debug_handle_; - torch::profiler::impl::kineto::DeviceAndResource kineto_info_; - - std::string name_; - - // report_input_shapes - std::vector> shapes_; - std::vector dtypes_; - - // with_stack - std::vector stack_; - - // with_modules - c10::optional> module_hierarchy_; - - // with_flops - std::unordered_map extra_args_; - - // reportBackendEventToActiveKinetoProfiler - c10::optional backend_; - - // ProfilerState::KINETO_GPU_FALLBACK - torch::profiler::impl::CUDAEventStub cuda_event_start_ = nullptr; - torch::profiler::impl::CUDAEventStub cuda_event_end_ = nullptr; -}; - struct MemoryEventData { - int64_t start_time; + torch::profiler::impl::approx_time_t start_time; void* ptr; int64_t alloc_size; int64_t total_allocated; @@ -174,11 +137,6 @@ static inline uint64_t getForwardThreadKey(uint64_t tid, uint64_t seqNr) { return (((tid) << 48) | ((seqNr) & (((uint64_t)1 << 48) - 1))); } -struct KinetoObserverContext : public at::ObserverContext { - explicit KinetoObserverContext(OpEventData* data) : data_(data) {} - OpEventData* data_; -}; - struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { explicit KinetoThreadLocalState( const ProfilerConfig& 
config, @@ -186,6 +144,7 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { : ProfilerThreadLocalStateBase(config), start_time_(getTimeUs()), activities_(std::move(activities)), + record_queue_(config), cpu_trace_(start_time_, "PyTorch Profiler") {} ~KinetoThreadLocalState() override = default; @@ -204,12 +163,6 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { return config().with_stack && activities_.count(ActivityType::CPU); } - std::unique_ptr newOpEvent() { - std::lock_guard guard(state_mutex_); - op_events_.emplace_back(); - return std::make_unique(&op_events_.back()); - } - void reportMemoryUsage( void* ptr, int64_t alloc_size, @@ -217,16 +170,17 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { int64_t total_reserved, c10::Device device) override { if (config_.profile_memory && config_.state != ProfilerState::Disabled) { - memory_events_.push_back( - {getTimeUs(), - ptr, - alloc_size, - total_allocated, - total_reserved, - at::RecordFunction::currentThreadId(), - torch::profiler::impl::kineto::kineto_ids(), - device.type(), - device.index()}); + std::lock_guard guard(state_mutex_); + memory_events_.emplace_back( + torch::profiler::impl::getApproximateTime(), + ptr, + alloc_size, + total_allocated, + total_reserved, + at::RecordFunction::currentThreadId(), + torch::profiler::impl::kineto::kineto_ids(), + device.type(), + device.index()); } } @@ -264,84 +218,103 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { void materializeOpEvents() { std::lock_guard guard(state_mutex_); + auto converter = clock_converter_.makeConverter(); for (const auto& e : memory_events_) { - cpu_trace_.addMemoryUsageActivity( - kMemoryEventName, - e.kineto_info, - e.start_time, - c10::Device(e.device_type, e.device_index), - e.ptr, - e.alloc_size, - e.total_allocated, - e.total_reserved); + auto start_time_us = converter(e.start_time) / 1000; + cpu_trace_.addMemoryUsageActivity( + kMemoryEventName, + e.kineto_info, + start_time_us, + c10::Device(e.device_type, e.device_index), + e.ptr, + e.alloc_size, + e.total_allocated, + e.total_reserved); kineto_events_.emplace_back(); auto& evt = kineto_events_.back(); evt.name(kMemoryEventName) - .startUs(e.start_time) + .startUs(start_time_us) .deviceIndex(e.device_index) .deviceType(e.device_type) .nBytes(e.alloc_size) .startThreadId(e.threadID); } + memory_events_.clear(); + + for (const auto& e : record_queue_.getRecords(converter)) { + // `take_data` handles time conversion. + int64_t start_us = e.start_time_us_; + int64_t end_us = e.end_time_us_; - for (const auto& e : op_events_) { - if (e.end_us_ < e.start_us_) { + if (end_us < start_us) { // We initialize end_us_ to the smallest int64_t, so this means that // the op did not finish before we stopped profiling. 
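On the user side, the memory events and shape/dtype records handled here are only collected when the profiler is asked for them; a small sketch, assuming the public ``torch.profiler`` API:

```python
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU],
             profile_memory=True,    # enables the memory-event path above
             record_shapes=True) as prof:  # fills the shape/dtype fields
    x = torch.randn(1024, 1024)
    y = x @ x

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```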
continue; } cpu_trace_.addCPUActivity( - e.name_, + e.name(), e.kineto_info_, - e.correlation_id_, - e.start_us_, - e.end_us_); + e.correlation_id(), + start_us, + end_us); kineto_events_.emplace_back(); kineto_events_.back() - .name(e.name_) - .startUs(e.start_us_) - .durationUs(e.end_us_ - e.start_us_) - .correlationId(e.correlation_id_) + .name(e.name()) + .startUs(start_us) + .durationUs(end_us - start_us) + .correlationId(e.correlation_id()) .deviceType(c10::DeviceType::CPU) - .startThreadId(e.start_thread_id_) - .endThreadId(e.end_thread_id_) - .sequenceNr(e.sequence_number_) - .fwdThreadId(e.forward_thread_id_) - .scope(e.record_function_scope_) - .setAsync(e.is_async_) - .debugHandle(e.debug_handle_); - - if (!e.shapes_.empty()) { - kineto_events_.back().shapes(e.shapes_); + .startThreadId(e.start_tid_); + + c10::visit( + c10::overloaded( + [&](const torch::profiler::impl::OpEvent& op_event) { + kineto_events_.back() + .endThreadId(op_event.end_thread_id_) + .sequenceNr(op_event.sequence_number_) + .fwdThreadId(op_event.forward_thread_id_) + .scope(op_event.record_function_scope_) + .setAsync(op_event.is_async_) + .debugHandle(op_event.debug_handle_); + }, + [&](const torch::profiler::impl::BackendEvent& backend_event) { + kineto_events_.back() + .endThreadId(e.start_tid_) + .scope(backend_event.record_function_scope_) + .debugHandle(backend_event.debug_handle_) + .backend(backend_event.backend_); + }), + e.event_); + + if (!e.inputs_.shapes_.empty()) { + kineto_events_.back().shapes(e.inputs_.shapes_); } - if (!e.dtypes_.empty()) { - kineto_events_.back().dtypes(e.dtypes_); + if (!e.inputs_.dtypes_.empty()) { + kineto_events_.back().dtypes(e.inputs_.dtypes_); } - if (!e.stack_.empty()) { - kineto_events_.back().stack(e.stack_); + if (!e.jit_stack_.empty()) { + kineto_events_.back().stack(e.jit_stack_); } - if (e.module_hierarchy_) { - kineto_events_.back().moduleHierarchy(*e.module_hierarchy_); + if (!e.jit_modules_.empty()) { + kineto_events_.back().moduleHierarchy(e.jit_modules_); } if (!e.extra_args_.empty()) { kineto_events_.back().flops( - computeFlops(std::string(e.name_), e.extra_args_)); + computeFlops(e.name(), e.extra_args_)); } - if (e.backend_) { - kineto_events_.back().backend(*e.backend_); - } - kineto_events_.back().cuda_event_start_ = e.cuda_event_start_; - kineto_events_.back().cuda_event_end_ = e.cuda_event_end_; + kineto_events_.back().cuda_event_start_ = + e.gpu_fallback_.cuda_event_start_; + kineto_events_.back().cuda_event_end_ = + e.gpu_fallback_.cuda_event_end_; } - op_events_.clear(); } void finalizeCPUTrace(std::unique_ptr& cpu_trace) { @@ -549,12 +522,7 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { auto iter = tidSeq2activity.find(key); if (iter != tidSeq2activity.end()) { libkineto::GenericTraceActivity* fwd = iter->second; -#ifdef USE_KINETO_UPDATED fwd->flow.start = true; -#else - activity.flow.linkedActivity = fwd; // Only destination side set this, - // to distinguish with start side. 
-#endif activity.flow.id = fwd->flow.id = fwd_bwd_link_id; activity.flow.type = fwd->flow.type = libkineto::kLinkFwdBwd; ++fwd_bwd_link_id; @@ -586,6 +554,9 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { #ifdef USE_KINETO const auto& events = *(trace.get()->activities()); for (const auto& ev_ptr : events) { + if (ev_ptr == nullptr) { + continue; + } const auto& activity = *ev_ptr; // These events are already processed if (activity.type() != libkineto::ActivityType::CPU_OP && @@ -611,9 +582,10 @@ struct KinetoThreadLocalState : public ProfilerThreadLocalStateBase { } uint64_t start_time_; + torch::profiler::impl::ApproximateClockToUnixTimeConverter clock_converter_; std::set activities_; - std::deque op_events_; - std::deque memory_events_; + torch::profiler::impl::RecordQueue record_queue_; + torch::profiler::impl::AppendOnlyList memory_events_; torch::profiler::impl::kineto::TraceWrapper cpu_trace_; std::vector kineto_events_; // Optional, if event post-processing is enabled. @@ -634,51 +606,7 @@ void pushProfilingCallbacks(const std::unordered_set& scopes) { const auto& config = state_ptr->config(); auto corr_id = next_correlation_id(); torch::profiler::impl::kineto::pushCorrelationId(corr_id); - - auto ctx_ptr = state_ptr->newOpEvent(); - auto data_ptr = ctx_ptr->data_; - - data_ptr->end_us_ = std::numeric_limits::min(); - data_ptr->correlation_id_ = corr_id; - data_ptr->start_thread_id_ = fn.threadId(); - data_ptr->sequence_number_ = fn.seqNr(); - data_ptr->forward_thread_id_ = fn.forwardThreadId(); - data_ptr->record_function_scope_ = (uint8_t)fn.scope(); - data_ptr->is_async_ = fn.isAsync(); - data_ptr->debug_handle_ = fn.debugHandle(); - data_ptr->kineto_info_ = torch::profiler::impl::kineto::kineto_ids(); - data_ptr->name_ = fn.name(); - if (config.report_input_shapes) { - data_ptr->shapes_ = torch::profiler::impl::inputSizes(fn); - data_ptr->dtypes_ = torch::profiler::impl::inputTypes(fn); - } -#if !defined BUILD_LITE_INTERPRETER && !defined C10_MOBILE - // backward nodes source range corresponds to the forward node - // TODO: consider using C++ stack trace - if (config.with_stack && - fn.scope() != at::RecordScope::BACKWARD_FUNCTION) { - auto cs = torch::profiler::impl::prepareCallstack(jit::currentCallstack()); - data_ptr->stack_ = callstackStr(cs); - } - if (config.with_modules && - fn.scope() != at::RecordScope::BACKWARD_FUNCTION) { - data_ptr->module_hierarchy_ = jit::currentModuleHierarchy(); - } -#endif - if (config.with_flops) { - data_ptr->extra_args_ = torch::profiler::impl::saveExtraArgs(fn); - } - data_ptr->start_us_ = getTimeUs(); - - if (config.state == ProfilerState::KINETO_GPU_FALLBACK) { - try { - torch::profiler::impl::cudaStubs()->record( - nullptr, &data_ptr->cuda_event_start_, nullptr); - } catch (const std::exception& e) { - LOG(WARNING) << "Failed to record CUDA event. 
" << e.what(); - } - } - return ctx_ptr; + return state_ptr->record_queue_.getSubqueue()->begin_op(fn, corr_id); }, [](const at::RecordFunction& fn, at::ObserverContext* ctx_ptr) { auto state_ptr = KinetoThreadLocalState::getTLS(); @@ -687,23 +615,22 @@ void pushProfilingCallbacks(const std::unordered_set& scopes) { } const auto& config = state_ptr->config(); auto* kineto_ctx_ptr = - static_cast(ctx_ptr); + static_cast(ctx_ptr); TORCH_INTERNAL_ASSERT(kineto_ctx_ptr != nullptr); - auto data_ptr = kineto_ctx_ptr->data_; - data_ptr->end_us_ = getTimeUs(); - data_ptr->end_thread_id_ = at::RecordFunction::currentThreadId(); - + kineto_ctx_ptr->event_->end_time_ = torch::profiler::impl::getApproximateTime(); + kineto_ctx_ptr->event_->end_thread_id_ = at::RecordFunction::currentThreadId(); if (config.state == ProfilerState::KINETO_GPU_FALLBACK) { try { + auto fallback = kineto_ctx_ptr->fallback_; + TORCH_INTERNAL_ASSERT(fallback != nullptr); torch::profiler::impl::cudaStubs()->record( - nullptr, &data_ptr->cuda_event_end_, nullptr); + nullptr, &fallback->cuda_event_end_, nullptr); } catch (const std::exception& e) { LOG(WARNING) << "Failed to record CUDA event. " << e.what(); } } torch::profiler::impl::kineto::popCorrelationId(); - torch::profiler::impl::kineto::recordThreadInfo(); }) .needsInputs(registration_state_ptr->config().report_input_shapes) .scopes(scopes)); @@ -724,21 +651,14 @@ void reportBackendEventToActiveKinetoProfiler( return; } - auto ctx_ptr = state_ptr->newOpEvent(); - auto data_ptr = ctx_ptr->data_; - data_ptr->start_us_ = start_time_us; - data_ptr->end_us_ = end_time_us; - data_ptr->correlation_id_ = std::numeric_limits::max(); - data_ptr->start_thread_id_ = at::RecordFunction::currentThreadId(); - data_ptr->end_thread_id_ = data_ptr->start_thread_id_; - data_ptr->sequence_number_ = -1; - data_ptr->forward_thread_id_ = data_ptr->start_thread_id_; - data_ptr->record_function_scope_ = (uint8_t)scope; - data_ptr->is_async_ = false; - data_ptr->debug_handle_ = debug_handle; - data_ptr->kineto_info_ = torch::profiler::impl::kineto::kineto_ids(); - data_ptr->name_ = event_name; - data_ptr->backend_ = backend_name; + state_ptr->record_queue_.getSubqueue()->emplace_backend_event( + torch::profiler::impl::BackendEvent { + start_time_us, + end_time_us, + (uint8_t)scope, + debug_handle, + event_name, + backend_name}); /* no support for input shapes now? if (config.report_input_shapes) { @@ -746,8 +666,6 @@ void reportBackendEventToActiveKinetoProfiler( ctx_ptr->dtypes = inputTypes(fn); } */ - - torch::profiler::impl::kineto::recordThreadInfo(); } void prepareProfiler( diff --git a/torch/csrc/autograd/python_function.cpp b/torch/csrc/autograd/python_function.cpp index 9a6221130ed0ca..43911fe18b993f 100644 --- a/torch/csrc/autograd/python_function.cpp +++ b/torch/csrc/autograd/python_function.cpp @@ -167,10 +167,16 @@ auto PyNode::is_traceable() -> bool { } auto PyNode::release_variables() -> void { - pybind11::gil_scoped_acquire gil; - auto f = (THPFunction*) obj; - f->saved_variables.clear(); - f->has_freed_buffers = 1; + // This function is called as part of the Node destructor! + // Since this object might be kept alive by C++, it is possible + // that the python interpreter is already dead here. In that case + // we just leak the saved objects. 
+ if (Py_IsInitialized()) { + pybind11::gil_scoped_acquire gil; + auto f = (THPFunction*) obj; + f->saved_variables.clear(); + f->has_freed_buffers = 1; + } } auto PyNode::name() const -> std::string { @@ -564,6 +570,11 @@ static void _trace_post_record( } node->i_(jit::attr::inplace, is_inplace); + if (PyObject* module_name = PyDict_GetItemString(((PyTypeObject*)op_obj)->tp_dict, "__module__")) { + if (auto ptr = PyUnicode_AsUTF8(module_name)) { + node->s_(jit::attr::module, std::string(ptr)); + } + } // Isolate C variable ptrs in a vector int num_outputs = PyTuple_GET_SIZE(output_objects); @@ -671,10 +682,19 @@ PyObject* THPFunction_name(PyObject *self, PyObject* noargs) { PyObject *THPFunction_apply(PyObject *cls, PyObject *inputs) { HANDLE_TH_ERRORS + + // save a local copy of seq_id before it gets incremented + int seq_id = at::sequence_number::peek(); + auto info_pair = unpack_input(inputs); + UnpackedInput& unpacked_input = info_pair.first; + InputFlags& input_info = info_pair.second; + + // Call record function after all the inputs have been decoded, but + // before context has been allocated. RECORD_FUNCTION( ((PyTypeObject*)cls)->tp_name, - std::vector(), - at::sequence_number::peek()); + std::vector(unpacked_input.input_vars.begin(), unpacked_input.input_vars.end()), + seq_id); // Temporary hack to improve functorch UX. We'll find a better solution. const auto& functorch_tls = at::functorch::functorchTLSAccessor(); @@ -691,11 +711,6 @@ PyObject *THPFunction_apply(PyObject *cls, PyObject *inputs) auto cdata = std::shared_ptr(new PyNode(std::move(ctx_obj)), deleteNode); ctx->cdata = cdata; - // Prepare inputs and allocate context (grad fn) - auto info_pair = unpack_input(inputs); - UnpackedInput& unpacked_input = info_pair.first; - InputFlags& input_info = info_pair.second; - // Record input nodes if tracing auto* node = _trace_pre_record(cls, inputs, unpacked_input.input_vars); @@ -705,6 +720,7 @@ PyObject *THPFunction_apply(PyObject *cls, PyObject *inputs) ctx->needs_input_grad = input_info.needs_input_grad.release(); ctx->is_variable_input = std::move(input_info.is_variable_input); + // Prepend ctx to input_tuple, in preparation for static method call auto num_args = PyTuple_GET_SIZE(inputs); THPObjectPtr ctx_input_tuple(PyTuple_New(num_args + 1)); diff --git a/torch/csrc/autograd/python_mode.cpp b/torch/csrc/autograd/python_mode.cpp index cda38bdb7dff3e..7e49d29d824368 100644 --- a/torch/csrc/autograd/python_mode.cpp +++ b/torch/csrc/autograd/python_mode.cpp @@ -1,8 +1,9 @@ -#include -#include -#include #include +#include #include +#include +#include +#include namespace torch { namespace autograd { @@ -13,10 +14,10 @@ void PythonMode::enter(PyObject* type) { "python mode has already been set. We do not yet support nested python ", "mode. Please file us an issue and reset it before setting it again.") } - // TorchDispatchTypeObject steals a reference, See NOTE [What is TorchDispatchTypeObject?] + // SafePyObject steals a reference, See NOTE [What is SafePyObject?] 
Py_INCREF(type); - auto state = std::make_shared(type, getPyInterpreter()); - at::impl::PythonModeTLS::set_state(state); + at::impl::PythonModeTLS::set_state( + std::make_shared(type, getPyInterpreter())); } void PythonMode::exit() { diff --git a/torch/csrc/autograd/python_variable.cpp b/torch/csrc/autograd/python_variable.cpp index f960d8287c24e4..e3d828a699346b 100644 --- a/torch/csrc/autograd/python_variable.cpp +++ b/torch/csrc/autograd/python_variable.cpp @@ -1,36 +1,34 @@ -#include - -#include +#include +#include +#include +#include +#include +#include +#include #include #include -#include #include +#include #include #include #include +#include +#include +#include #include #include #include -#include -#include -#include -#include #include #include +#include +#include #include -#include #include #include #include -#include #include +#include #include -#include -#include -#include -#include -#include - #include #include @@ -104,7 +102,7 @@ void concrete_dispatch_fn( const c10::impl::PyInterpreter*, const c10::OperatorHandle& op, torch::jit::Stack* stack, - const std::shared_ptr& type); + const std::shared_ptr& type); class PyInterpreterHolder { public: @@ -901,6 +899,16 @@ PyObject *THPVariable_is_cuda(THPVariable *self, void *unused) END_HANDLE_TH_ERRORS } +PyObject* THPVariable_is_ipu(THPVariable* self, void* unused) { + HANDLE_TH_ERRORS + if (check_has_torch_function((PyObject*)self)) { + return handle_torch_function_getter(self, "is_ipu"); + } + auto& self_ = THPVariable_Unpack(self); + return torch::autograd::utils::wrap(self_.is_ipu()); + END_HANDLE_TH_ERRORS +} + PyObject* THPVariable_is_xpu(THPVariable* self, void* unused) { HANDLE_TH_ERRORS if (check_has_torch_function((PyObject*)self)) { @@ -1010,6 +1018,17 @@ PyObject *THPVariable_is_complex(THPVariable *self, void *unused) END_HANDLE_TH_ERRORS } +PyObject *THPVariable_is_nested(THPVariable *self, void *unused) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function((PyObject *)self)) { + return handle_torch_function_getter(self, "is_nested"); + } + auto& self_ = THPVariable_Unpack(self); + return torch::autograd::utils::wrap(self_.is_nested()); + END_HANDLE_TH_ERRORS +} + static PyObject *THPVariable_dtype(THPVariable *self, void *unused) { HANDLE_TH_ERRORS @@ -1064,28 +1083,28 @@ PyObject *THPVariable_get_imag(THPVariable* self, void *unused) END_HANDLE_TH_ERRORS } -int THPVariable_set_real(THPVariable *self, THPVariable *real, void *unused) +int THPVariable_set_real(PyObject* self, PyObject* real, void *unused) { HANDLE_TH_ERRORS auto& self_ = THPVariable_Unpack(self); - auto& real_ = THPVariable_Unpack(real); + auto self_real = at::real(self_); + auto real_ = valueToTensor(self_real.options(), real, self_real.device()); { pybind11::gil_scoped_release no_gil; - auto self_real = at::real(self_); self_real.copy_(real_); return 0; } END_HANDLE_TH_ERRORS_RET(-1) } -int THPVariable_set_imag(THPVariable* self, THPVariable *imag, void *unused) +int THPVariable_set_imag(PyObject* self, PyObject* imag, void *unused) { HANDLE_TH_ERRORS auto& self_ = THPVariable_Unpack(self); - auto& imag_ = THPVariable_Unpack(imag); + auto self_imag = at::imag(self_); + auto imag_ = valueToTensor(self_imag.options(), imag, self_imag.device()); { pybind11::gil_scoped_release no_gil; - auto self_imag = at::imag(self_); self_imag.copy_(imag_); return 0; } @@ -1119,6 +1138,7 @@ static struct PyGetSetDef THPVariable_properties[] = { {"shape", (getter)THPVariable_get_shape, nullptr, nullptr, nullptr}, {"is_cuda", (getter)THPVariable_is_cuda, nullptr, 
nullptr, nullptr}, {"is_xpu", (getter)THPVariable_is_xpu, nullptr, nullptr, nullptr}, + {"is_ipu", (getter)THPVariable_is_ipu, nullptr, nullptr, nullptr}, {"is_sparse", (getter)THPVariable_is_sparse, nullptr, nullptr, nullptr}, {"is_sparse_csr", (getter)THPVariable_is_sparse_csr, nullptr, nullptr, nullptr}, {"is_mkldnn", (getter)THPVariable_is_mkldnn, nullptr, nullptr, nullptr}, @@ -1128,6 +1148,7 @@ static struct PyGetSetDef THPVariable_properties[] = { {"is_complex", (getter)THPVariable_is_complex, nullptr, nullptr, nullptr}, {"is_quantized", (getter)THPVariable_is_quantized, nullptr, nullptr, nullptr}, {"is_meta", (getter)THPVariable_is_meta, nullptr, nullptr, nullptr}, + {"is_nested", (getter)THPVariable_is_nested, nullptr, nullptr, nullptr}, {"dtype", (getter)THPVariable_dtype, nullptr, nullptr, nullptr}, {"layout", (getter)THPVariable_layout, nullptr, nullptr, nullptr}, {"device", (getter)THPVariable_device, nullptr, nullptr, nullptr}, @@ -1267,7 +1288,7 @@ PyObject *THPVariable_pynew(PyTypeObject *type, PyObject *args, PyObject *kwargs HANDLE_TH_ERRORS TORCH_CHECK(type != &THPVariableType, "Cannot directly construct _TensorBase; subclass it and then construct that"); jit::tracer::warn("torch.Tensor", jit::tracer::WARN_CONSTRUCTOR); - auto tensor = torch::utils::legacy_tensor_ctor(torch::tensors::get_default_dispatch_key(), torch::tensors::get_default_scalar_type(), args, kwargs); + auto tensor = torch::utils::base_tensor_ctor(args, kwargs); // WARNING: tensor is NOT guaranteed to be a fresh tensor; e.g., if it was // given a raw pointer that will refcount bump return THPVariable_NewWithVar( @@ -1674,7 +1695,7 @@ void concrete_dispatch_fn( const c10::impl::PyInterpreter*, const c10::OperatorHandle& op, torch::jit::Stack* stack, - const std::shared_ptr& type) { + const std::shared_ptr& type) { const auto& schema = op.schema(); const auto num_returns = schema.returns().size(); @@ -1684,6 +1705,7 @@ void concrete_dispatch_fn( // Parse the name into namespace and name (no overload_name) // TODO: put this into the library const auto& qualified_name = op.operator_name().name; + const auto& overload_name = schema.overload_name(); auto pos = qualified_name.find("::"); TORCH_INTERNAL_ASSERT(pos != std::string::npos, qualified_name); // Make me some null terminated strings @@ -1704,6 +1726,12 @@ void concrete_dispatch_fn( // overload resolution but is more complicated (need to expose separate // functions per overload) py::handle torch_api_function = py::module::import("torch").attr("ops").attr(ns).attr(func_name); + py::handle torch_api_function_overload; + if (overload_name == "") { + torch_api_function_overload = torch_api_function.attr("default"); + } else { + torch_api_function_overload = torch_api_function.attr(overload_name.c_str()); + } std::string module_name_str = "torch.ops." 
+ ns_str; // About all the pointers: @@ -1752,7 +1780,7 @@ void concrete_dispatch_fn( py::dict kwargs; if (type) { - append_overloaded_type(&overloaded_args, type->ptr()); + append_overloaded_type(&overloaded_args, type->ptr(getPyInterpreter())); } // Find overloaded tensors @@ -1790,15 +1818,15 @@ void concrete_dispatch_fn( kwargs[py::cast(arg.name())] = torch::jit::toPyObject(std::move(arguments[idx])); } - auto out = py::reinterpret_steal(handle_torch_function_no_python_arg_parser( - overloaded_args, - args.ptr(), - kwargs.ptr(), - func_name, - torch_api_function.ptr(), - module_name_str.c_str(), - "__torch_dispatch__" - )); + auto out = py::reinterpret_steal( + handle_torch_function_no_python_arg_parser( + overloaded_args, + args.ptr(), + kwargs.ptr(), + func_name, + torch_api_function_overload.ptr(), + module_name_str.c_str(), + TorchFunctionName::TorchDispatch)); if (num_returns == 0) { // Check that we got a None return from Python. Anything else is an error. @@ -1830,15 +1858,20 @@ c10::intrusive_ptr concrete_detach_fn(const c10::impl::PyInterpreter py::dict kwargs; - auto out = py::reinterpret_steal(handle_torch_function_no_python_arg_parser( - overloaded_args, - args.ptr(), - kwargs.ptr(), - "detach", - py::module::import("torch").attr("ops").attr("aten").attr("detach").ptr(), - "torch.ops.aten", - "__torch_dispatch__" - )); + auto out = py::reinterpret_steal( + handle_torch_function_no_python_arg_parser( + overloaded_args, + args.ptr(), + kwargs.ptr(), + "detach", + py::module::import("torch") + .attr("ops") + .attr("aten") + .attr("detach") + .attr("default") + .ptr(), + "torch.ops.aten", + TorchFunctionName::TorchDispatch)); TORCH_CHECK(THPVariable_Check(out.ptr()), "detach returned invalid type ", py::detail::get_fully_qualified_tp_name(Py_TYPE(out.ptr())), ", expected Tensor"); const Tensor& res_t = THPVariable_Unpack(out.ptr()); diff --git a/torch/csrc/autograd/python_variable_indexing.cpp b/torch/csrc/autograd/python_variable_indexing.cpp index 8faa07066ead73..6b7b7b6ef29f3f 100644 --- a/torch/csrc/autograd/python_variable_indexing.cpp +++ b/torch/csrc/autograd/python_variable_indexing.cpp @@ -4,7 +4,6 @@ #include #include #include -#include #include #include #include @@ -88,7 +87,7 @@ static inline Variable sequenceToVariable(c10::TensorOptions options, PyObject* return torch::utils::indexing_tensor_from_data(options, kLong, c10::nullopt, seq); } -static inline Variable valueToTensor(c10::TensorOptions options, PyObject* value, const at::Device& device) { +inline Variable valueToTensor(c10::TensorOptions options, PyObject* value, const at::Device& device) { if (THPVariable_Check(value)) { return THPVariable_Unpack(value); } diff --git a/torch/csrc/autograd/python_variable_indexing.h b/torch/csrc/autograd/python_variable_indexing.h index 398b77293810d2..027bffb6dc8a04 100644 --- a/torch/csrc/autograd/python_variable_indexing.h +++ b/torch/csrc/autograd/python_variable_indexing.h @@ -1,6 +1,7 @@ #pragma once #include +#include namespace torch { namespace autograd { @@ -8,4 +9,6 @@ Py_ssize_t THPVariable_length(PyObject* self); PyObject* THPVariable_getitem(PyObject* self, PyObject* index); int THPVariable_setitem(PyObject* self, PyObject* index, PyObject* value); +Variable valueToTensor(c10::TensorOptions options, PyObject* value, const at::Device& device); + }} // namespace torch::autograd diff --git a/torch/csrc/autograd/record_function_ops.cpp b/torch/csrc/autograd/record_function_ops.cpp index 2cf427e04f6091..ad8bf336ee1507 100644 --- 
a/torch/csrc/autograd/record_function_ops.cpp +++ b/torch/csrc/autograd/record_function_ops.cpp @@ -1,8 +1,10 @@ +#include #include #include #include -#include +#include +#include namespace caffe2 { // Required for cpp_custom_type_hack to work @@ -16,47 +18,68 @@ namespace profiler { // Creates a new profiling scope using RecordFunction and invokes its starting // callbacks. -at::Tensor record_function_enter( +void record_function_enter( const std::string& name, - const c10::optional& args) { - auto rec = std::make_unique(at::RecordScope::USER_SCOPE); - if (rec->isActive()) { - if (rec->needsInputs() && args.has_value()) { - rec->before(name, std::vector{c10::IValue{args.value()}}); + const c10::optional& args, + at::RecordFunction &rec) { + if (rec.isActive()) { + if (rec.needsInputs() && args.has_value()) { + rec.before(name, std::vector{c10::IValue{args.value()}}); } else { - rec->before(name); + rec.before(name); } } +} + +// Legacy signature using cpp_custom_type_hack +at::Tensor record_function_enter_legacy( + const std::string& name, + const c10::optional& args) { + auto rec = std::make_unique(at::RecordScope::USER_SCOPE); + record_function_enter(name, args, *rec); return at::cpp_custom_type_hack::create(std::move(rec), at::TensorOptions()); } +// New signature using custom_class +c10::intrusive_ptr record_function_enter_new( + const std::string &name, const c10::optional &args) { + auto rec = c10::make_intrusive(at::RecordScope::USER_SCOPE); + record_function_enter(name, args, rec->record); + return rec; +} + at::RecordFunction& getRecordFunctionFromTensor(const at::Tensor& handle) { auto& rec = at::cpp_custom_type_hack::cast(handle); return rec; } // Ends the profiling scope created with record_function_enter. -void record_function_exit(const at::Tensor& handle) { +void record_function_exit(at::RecordFunction &rec) { + rec.end(); +} + +// Legacy signature using cpp_custom_type_hack +void record_function_exit_legacy(const at::Tensor &handle) { // We don't actually need to do anything with handle just need to persist the // lifetime until now. auto& rec = getRecordFunctionFromTensor(handle); - rec.end(); + record_function_exit(rec); +} + +// New signature using custom_class +void record_function_exit_new(const c10::intrusive_ptr &record) { + record_function_exit(record->record); } +template c10::intrusive_ptr _call_end_callbacks_on_fut( - const at::Tensor& handle, + Func get_record, const c10::intrusive_ptr& fut) { // Profiling callback that ends the associated record_function // and returns the value of the passed in future. std::function futureProfilingFunc = - [handle](c10::ivalue::Future& fut) { - TORCH_INTERNAL_ASSERT( - handle.defined(), - "Undefined RecordFunction handle. This can happen if the handle is " - "not correctly persisted and is destroyed before the future is " - "realized."); - - auto& rec = getRecordFunctionFromTensor(handle); + [get_record = std::move(get_record)](c10::ivalue::Future& fut) { + auto& rec = get_record(); rec.end(); // Note: this future is returned to the user to ensure that a call to wait() // ensures that profiling callbacks have ran. To ensure that this is @@ -67,36 +90,74 @@ c10::intrusive_ptr _call_end_callbacks_on_fut( }; // Define a future that completes after the profiling callbacks are run. 
auto profiledFut = fut->then(at::wrapPropagateTLSState( - futureProfilingFunc), + std::move(futureProfilingFunc)), fut->elementType() ); return profiledFut; } -// Internal only, do not use directly, use Python's record_function() -TORCH_LIBRARY_FRAGMENT(profiler, m) { - m.def("_record_function_enter(str name, str? args=None) -> Tensor", &record_function_enter); - m.def("_record_function_exit", &record_function_exit); +// Legacy signature using cpp_custom_type_hack +c10::intrusive_ptr _call_end_callbacks_on_fut_legacy( + const at::Tensor &handle, + const c10::intrusive_ptr& fut) { + return _call_end_callbacks_on_fut( + [handle] () -> at::RecordFunction& { + TORCH_INTERNAL_ASSERT( + handle.defined(), + "Undefined RecordFunction handle. This can happen if the handle is " + "not correctly persisted and is destroyed before the future is " + "realized."); + + return getRecordFunctionFromTensor(handle); + }, + fut + ); } -// Needed to register JIT operator in operator registry below -c10::AliasAnalysisKind aliasAnalysisFromSchema() { - return c10::AliasAnalysisKind::FROM_SCHEMA; +// New signature using custom_class +c10::intrusive_ptr _call_end_callbacks_on_fut_new( + const c10::intrusive_ptr &record, + const c10::intrusive_ptr& fut) { + return _call_end_callbacks_on_fut( + [record] () -> at::RecordFunction& { return record->record; }, fut); } -jit::RegisterOperators reg_fut_ops({ - jit::Operator( +// Internal only, do not use directly, use Python's record_function() +TORCH_LIBRARY_FRAGMENT(profiler, m) { + m.class_("_RecordFunction"); + + m.def("_record_function_enter(str name, str? args=None) -> Tensor", + &record_function_enter_legacy); + m.def("_record_function_enter_new(str name, str? args=None) -> " + "__torch__.torch.classes.profiler._RecordFunction", + &record_function_enter_new); + m.def("_record_function_exit", &record_function_exit_legacy); + m.def("_record_function_exit._RecordFunction", &record_function_exit_new); + + torch::jit::registerOperator(torch::jit::Operator( "profiler::_call_end_callbacks_on_jit_fut(Tensor x, Future(t) y) -> Future(t)", [](jit::Stack& stack) { // Pop inputs, which should be a future and a tensor auto fut = jit::pop(stack).toFuture(); auto tensor = jit::pop(stack).toTensor(); - auto profiledFut = _call_end_callbacks_on_fut(tensor, fut); + auto profiledFut = _call_end_callbacks_on_fut_legacy(tensor, fut); // return future that completes when profiling callbacks have run. jit::push(stack, std::move(profiledFut)); }, - aliasAnalysisFromSchema()), -}); + c10::AliasAnalysisKind::FROM_SCHEMA)); + torch::jit::registerOperator(torch::jit::Operator( + "profiler::_call_end_callbacks_on_jit_fut._RecordFunction(" + "__torch__.torch.classes.profiler._RecordFunction x, Future(t) y) -> Future(t)", + [](c10::Stack &stack) { + // Pop inputs, which should be a future and a PythonRecordFunction + auto fut = torch::jit::pop(stack).toFuture(); + auto tensor = torch::jit::pop(stack).toCustomClass(); + auto profiledFut = _call_end_callbacks_on_fut_new(tensor, fut); + // return future that completes when profiling callbacks have run. 
+ torch::jit::push(stack, std::move(profiledFut)); + }, + c10::AliasAnalysisKind::FROM_SCHEMA)); +} } // namespace profiler } // namespace autograd diff --git a/torch/csrc/autograd/record_function_ops.h b/torch/csrc/autograd/record_function_ops.h index 9042537aeabccb..81cc584381d42d 100644 --- a/torch/csrc/autograd/record_function_ops.h +++ b/torch/csrc/autograd/record_function_ops.h @@ -1,17 +1,30 @@ #pragma once #include #include +#include namespace torch { namespace autograd { namespace profiler { + +struct PythonRecordFunction: public torch::CustomClassHolder { + at::RecordFunction record; + + PythonRecordFunction( + at::RecordScope scope = at::RecordScope::FUNCTION, + bool pre_sampled = false) + : record(scope, pre_sampled) + {} +}; + // Creates a new profiling scope using RecordFunction and invokes its starting // callbacks. -TORCH_API at::Tensor record_function_enter(const std::string& name, const c10::optional& args = c10::nullopt); +TORCH_API c10::intrusive_ptr record_function_enter_new( + const std::string &name, const c10::optional &args = c10::nullopt); // Schedules RecordFunction's end callbacks to be run on completion of a future. -TORCH_API c10::intrusive_ptr _call_end_callbacks_on_fut( - const at::Tensor& handle, +TORCH_API c10::intrusive_ptr _call_end_callbacks_on_fut_new( + const c10::intrusive_ptr &record, const c10::intrusive_ptr& fut); } // namespace profiler diff --git a/torch/csrc/autograd/utils/wrap_outputs.h b/torch/csrc/autograd/utils/wrap_outputs.h index 10439553fcc571..114b53487368c7 100644 --- a/torch/csrc/autograd/utils/wrap_outputs.h +++ b/torch/csrc/autograd/utils/wrap_outputs.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -77,117 +78,6 @@ inline PyObject* wrap(at::QScheme qscheme) { return thp_qscheme; } -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(2)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::get<0>(tensors))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::get<1>(tensors))); - return r.release(); -} - -inline PyObject* wrap(PyTypeObject *type, std::tuple tensors) { - auto r = THPObjectPtr{PyStructSequence_New(type)}; - if (!r) throw python_error(); - PyStructSequence_SET_ITEM(r.get(), 0, wrap(std::get<0>(tensors))); - PyStructSequence_SET_ITEM(r.get(), 1, wrap(std::get<1>(tensors))); - return r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(3)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - return r.release(); -} - -inline PyObject* wrap(PyTypeObject *type, std::tuple tensors) { - auto r = THPObjectPtr{PyStructSequence_New(type)}; - if (!r) throw python_error(); - PyStructSequence_SET_ITEM(r.get(), 0, wrap(std::get<0>(tensors))); - PyStructSequence_SET_ITEM(r.get(), 1, wrap(std::get<1>(tensors))); - PyStructSequence_SET_ITEM(r.get(), 2, wrap(std::get<2>(tensors))); - return r.release(); -} - -inline PyObject* wrap(PyTypeObject *type, std::tuple tensors) { - auto r = THPObjectPtr{PyStructSequence_New(type)}; - if (!r) throw python_error(); - PyStructSequence_SET_ITEM(r.get(), 0, wrap(std::get<0>(tensors))); - PyStructSequence_SET_ITEM(r.get(), 1, wrap(std::get<1>(tensors))); - PyStructSequence_SET_ITEM(r.get(), 2, wrap(std::get<2>(tensors))); - PyStructSequence_SET_ITEM(r.get(), 3, wrap(std::get<3>(tensors))); - return 
r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(4)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - PyTuple_SET_ITEM(r.get(), 3, wrap(std::get<3>(tensors))); - return r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(4)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - // NOLINTNEXTLINE(performance-move-const-arg) - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - // NOLINTNEXTLINE(performance-move-const-arg) - PyTuple_SET_ITEM(r.get(), 3, wrap(std::move(std::get<3>(tensors)))); - return r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(5)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - PyTuple_SET_ITEM(r.get(), 3, wrap(std::move(std::get<3>(tensors)))); - PyTuple_SET_ITEM(r.get(), 4, wrap(std::get<4>(tensors))); - return r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(5)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - // NOLINTNEXTLINE(performance-move-const-arg) - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - PyTuple_SET_ITEM(r.get(), 3, wrap(std::move(std::get<3>(tensors)))); - // NOLINTNEXTLINE(performance-move-const-arg) - PyTuple_SET_ITEM(r.get(), 4, wrap(std::move(std::get<4>(tensors)))); - return r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(4)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - PyTuple_SET_ITEM(r.get(), 3, wrap(std::move(std::get<3>(tensors)))); - return r.release(); -} - -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(5)}; - if (!r) throw python_error(); - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); - PyTuple_SET_ITEM(r.get(), 2, wrap(std::move(std::get<2>(tensors)))); - PyTuple_SET_ITEM(r.get(), 3, wrap(std::move(std::get<3>(tensors)))); - PyTuple_SET_ITEM(r.get(), 4, wrap(std::move(std::get<4>(tensors)))); - return r.release(); -} - inline PyObject* wrap(at::TensorList tl) { auto r = THPObjectPtr{PyTuple_New(tl.size())}; if (!r) throw python_error(); @@ -206,13 +96,38 @@ inline PyObject* wrap(at::IntArrayRef list) { return r.release(); } -inline PyObject* wrap(std::tuple tensors) { - auto r = THPObjectPtr{PyTuple_New(2)}; +namespace detail { +template +void apply_with_idx_impl(const F &f, Tuple &t, std::index_sequence /*indices*/) { + (void)std::initializer_list { + (f(std::get(t), Is), 0)... 
+ }; +} + +// For tuple(a, b, c), calls f(a, 0), f(b, 1), f(c, 2) +template +void apply_with_idx(const F & f, std::tuple &t) { + apply_with_idx_impl(f, t, std::index_sequence_for{}); +} +} // namespace detail + +template +PyObject* wrap(std::tuple values) { + auto r = THPObjectPtr{PyTuple_New(sizeof...(Ts))}; + if (!r) throw python_error(); + detail::apply_with_idx([&](auto &value, size_t idx) { + PyTuple_SET_ITEM(r.get(), idx, wrap(std::move(value))); + }, values); + return r.release(); +} + +template +PyObject* wrap(PyTypeObject *type, std::tuple values) { + auto r = THPObjectPtr{PyStructSequence_New(type)}; if (!r) throw python_error(); - // NOLINTNEXTLINE(performance-move-const-arg) - PyTuple_SET_ITEM(r.get(), 0, wrap(std::move(std::get<0>(tensors)))); - // NOLINTNEXTLINE(performance-move-const-arg) - PyTuple_SET_ITEM(r.get(), 1, wrap(std::move(std::get<1>(tensors)))); + detail::apply_with_idx([&](auto &value, size_t idx) { + PyStructSequence_SET_ITEM(r.get(), idx, wrap(std::move(value))); + }, values); return r.release(); } diff --git a/torch/csrc/cuda/Event.cpp b/torch/csrc/cuda/Event.cpp index 20821636a7744b..4312b3aaf7b0c0 100644 --- a/torch/csrc/cuda/Event.cpp +++ b/torch/csrc/cuda/Event.cpp @@ -119,7 +119,7 @@ static PyObject * THCPEvent_wait(PyObject *_self, PyObject *_stream) { { auto self = (THCPEvent*)_self; auto stream = (THCPStream*)_stream; - pybind11::gil_scoped_release no_gil; + pybind11::gil_scoped_release no_gil{}; self->cuda_event.block(stream->cuda_stream); } Py_RETURN_NONE; @@ -145,7 +145,7 @@ static PyObject * THCPEvent_synchronize(PyObject *_self, PyObject *noargs) { HANDLE_TH_ERRORS { auto self = (THCPEvent*)_self; - pybind11::gil_scoped_release no_gil; + pybind11::gil_scoped_release no_gil{}; self->cuda_event.synchronize(); } Py_RETURN_NONE; diff --git a/torch/csrc/cuda/shared/cudart.cpp b/torch/csrc/cuda/shared/cudart.cpp index b93d921a16a946..b0af4c0884e91b 100644 --- a/torch/csrc/cuda/shared/cudart.cpp +++ b/torch/csrc/cuda/shared/cudart.cpp @@ -49,8 +49,8 @@ void initCudartBindings(PyObject* module) { #endif cudart.def("cuda" "MemGetInfo", [](int device) -> std::pair { c10::cuda::CUDAGuard guard(device); - size_t device_free; - size_t device_total; + size_t device_free = 0; + size_t device_total = 0; cudaMemGetInfo(&device_free, &device_total); return {device_free, device_total}; }); diff --git a/torch/csrc/deploy/CMakeLists.txt b/torch/csrc/deploy/CMakeLists.txt index f8aa997eb10922..ec1dd3fef75a9d 100644 --- a/torch/csrc/deploy/CMakeLists.txt +++ b/torch/csrc/deploy/CMakeLists.txt @@ -33,10 +33,23 @@ caffe2_interface_library(torch_deploy_internal torch_deploy) set(INTERPRETER_TEST_SOURCES ${DEPLOY_DIR}/test_deploy.cpp ) +set(INTERPRETER_TEST_SOURCES_GPU + ${DEPLOY_DIR}/test_deploy_gpu.cpp +) + add_executable(test_deploy ${INTERPRETER_TEST_SOURCES}) target_compile_definitions(test_deploy PUBLIC TEST_CUSTOM_LIBRARY) target_include_directories(test_deploy PRIVATE ${PYTORCH_ROOT}/torch) -target_link_libraries(test_deploy PUBLIC "-Wl,--no-as-needed" gtest dl torch_deploy) +target_link_libraries(test_deploy + PUBLIC "-Wl,--no-as-needed -rdynamic" gtest dl torch_deploy +) + +add_executable(test_deploy_gpu ${INTERPRETER_TEST_SOURCES_GPU}) +target_compile_definitions(test_deploy_gpu PUBLIC TEST_CUSTOM_LIBRARY) +target_include_directories(test_deploy_gpu PRIVATE ${PYTORCH_ROOT}/torch) +target_link_libraries(test_deploy_gpu + PUBLIC "-Wl,--no-as-needed -rdynamic" gtest dl torch_deploy +) add_library(test_deploy_lib SHARED test_deploy_lib.cpp) 
add_dependencies(test_deploy_lib cpython) @@ -45,14 +58,19 @@ target_link_libraries(test_deploy_lib PRIVATE pybind::pybind11) add_executable(deploy_benchmark ${DEPLOY_DIR}/example/benchmark.cpp) target_include_directories(deploy_benchmark PRIVATE ${PYTORCH_ROOT}/torch) -target_link_libraries(deploy_benchmark PUBLIC "-Wl,--no-as-needed" torch_deploy) +target_link_libraries(deploy_benchmark + PUBLIC "-Wl,--no-as-needed -rdynamic" torch_deploy +) add_executable(interactive_embedded_interpreter ${DEPLOY_DIR}/interactive_embedded_interpreter.cpp) target_include_directories(interactive_embedded_interpreter PRIVATE ${PYTORCH_ROOT}/torch) -target_link_libraries(interactive_embedded_interpreter PUBLIC "-Wl,--no-as-needed" torch_deploy) +target_link_libraries(interactive_embedded_interpreter + PUBLIC "-Wl,--no-as-needed -rdynamic" torch_deploy +) if(INSTALL_TEST) install(TARGETS test_deploy DESTINATION bin) + install(TARGETS test_deploy_gpu DESTINATION bin) endif() install(TARGETS torch_deploy DESTINATION lib) diff --git a/torch/csrc/deploy/Exception.h b/torch/csrc/deploy/Exception.h new file mode 100644 index 00000000000000..f4311debeebc45 --- /dev/null +++ b/torch/csrc/deploy/Exception.h @@ -0,0 +1,47 @@ +#ifndef MULTIPY_EXCEPTION_H +#define MULTIPY_EXCEPTION_H + +#include + +#define MULTIPY_INTERNAL_ASSERT_WITH_MESSAGE(condition, message) \ + if (!(condition)) { \ + throw std::runtime_error( \ + "Internal Assertion failed: (" + std::string(#condition) + "), " + \ + "function " + __FUNCTION__ + ", file " + __FILE__ + ", line " + \ + std::to_string(__LINE__) + ".\n" + "Please report bug to Pytorch.\n" + \ + message + "\n"); \ + } + +#define MULTIPY_INTERNAL_ASSERT_NO_MESSAGE(condition) \ + MULTIPY_INTERNAL_ASSERT_WITH_MESSAGE(#condition, "") + +#define MULTIPY_INTERNAL_ASSERT_(x, condition, message, FUNC, ...) FUNC + +#define MULTIPY_INTERNAL_ASSERT(...) \ + MULTIPY_INTERNAL_ASSERT_( \ + , \ + ##__VA_ARGS__, \ + MULTIPY_INTERNAL_ASSERT_WITH_MESSAGE(__VA_ARGS__), \ + MULTIPY_INTERNAL_ASSERT_NO_MESSAGE(__VA_ARGS__)); + +#define MULTIPY_CHECK_WITH_MESSAGE(condition, message) \ + if (!(condition)) { \ + throw std::runtime_error( \ + "Check failed: (" + std::string(#condition) + "), " + "function " + \ + __FUNCTION__ + ", file " + __FILE__ + ", line " + \ + std::to_string(__LINE__) + ".\n" + message + "\n"); \ + } + +#define MULTIPY_CHECK_NO_MESSAGE(condition) \ + MULTIPY_CHECK_WITH_MESSAGE(#condition, "") + +#define MULTIPY_CHECK_(x, condition, message, FUNC, ...) FUNC + +#define MULTIPY_CHECK(...) 
\ + MULTIPY_CHECK_( \ + , \ + ##__VA_ARGS__, \ + MULTIPY_CHECK_WITH_MESSAGE(__VA_ARGS__), \ + MULTIPY_CHECK_NO_MESSAGE(__VA_ARGS__)); + +#endif // MULTIPY_EXCEPTION_H diff --git a/torch/csrc/deploy/benchmark.cpp b/torch/csrc/deploy/benchmark.cpp new file mode 100644 index 00000000000000..82296a5e1a1da2 --- /dev/null +++ b/torch/csrc/deploy/benchmark.cpp @@ -0,0 +1,336 @@ +#include + +#include +#include +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +typedef void (*function_type)(const char*); + +bool cuda = false; + +constexpr auto latency_p = { + 25., + 50., + 95.}; //{1., 5., 25., 50., 75., 90., 95., 99., 99.25, 99.5, 99.75, 99.9}; + +// NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) +struct Report { + std::string benchmark; + std::string strategy; + size_t n_threads; + size_t items_completed; + double work_items_per_second; + std::vector latencies; + static void report_header(std::ostream& out) { + out << "benchmark, strategy, n_threads, work_items_completed, work_items_per_second"; + for (double l : latency_p) { + out << ", p" << l << "_latency"; + } + out << ", device\n"; + } + void report(std::ostream& out) { + out << benchmark << ", " << strategy << ", " << n_threads << ", " + << items_completed << ", " << work_items_per_second; + for (double l : latencies) { + out << ", " << l; + } + out << ", " << (cuda ? "cuda" : "cpu") << "\n"; + } +}; + +const int min_items_to_complete = 1; + +struct RunPython { + static torch::deploy::ReplicatedObj load_and_wrap( + torch::deploy::Package& package) { + auto I = package.acquireSession(); + auto obj = I.self.attr("load_pickle")({"model", "model.pkl"}); + if (cuda) { + obj = I.global("gpu_wrapper", "GPUWrapper")({obj}); + } + return I.createMovable(obj); + } + // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) + RunPython( + torch::deploy::Package& package, + std::vector eg, + const torch::deploy::Interpreter* interps) + : obj_(load_and_wrap(package)), eg_(std::move(eg)), interps_(interps) {} + void operator()(int i) { + auto I = obj_.acquireSession(); + if (cuda) { + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + std::vector eg2 = {i}; + eg2.insert(eg2.end(), eg_.begin(), eg_.end()); + I.self(eg2); + } else { + I.self(eg_); + } + } + torch::deploy::ReplicatedObj obj_; + std::vector eg_; + const torch::deploy::Interpreter* interps_; +}; + +// def to_device(i, d): +// if isinstance(i, torch.Tensor): +// return i.to(device=d) +// elif isinstance(i, (tuple, list)): +// return tuple(to_device(e, d) for e in i) +// else: +// raise RuntimeError('inputs are weird') + +static torch::IValue to_device(const torch::IValue& v, torch::Device to); + +static std::vector to_device_vec( + at::ArrayRef vs, + torch::Device to) { + std::vector results; + for (const torch::IValue& v : vs) { + results.push_back(to_device(v, to)); + } + return results; +} + +static torch::IValue to_device(const torch::IValue& v, torch::Device to) { + if (v.isTensor()) { + return v.toTensor().to(to); + } else if (v.isTuple()) { + auto tup = v.toTuple(); + return c10::ivalue::Tuple::create(to_device_vec(tup->elements(), to)); + } else if (v.isList()) { + auto converted = to_device_vec(v.toListRef(), to); + torch::List result(v.toList().elementType()); + for (const torch::IValue& v : converted) { + result.push_back(v); + } + return result; + } else { + MULTIPY_INTERNAL_ASSERT(false, "cannot to_device"); + } +} + +static bool exists(const std::string& fname) { + std::fstream jit_file(fname); + return 
jit_file.good(); +} + +struct RunJIT { + RunJIT(const std::string& file_to_run, std::vector eg) + : eg_(std::move(eg)) { + if (!cuda) { + models_.push_back(torch::jit::load(file_to_run + "_jit")); + } else { + for (const auto i : c10::irange(2)) { + auto d = torch::Device(torch::DeviceType::CUDA, i); + std::stringstream qualified; + qualified << file_to_run << "_jit_" << i; + auto loaded = exists(qualified.str()) + ? torch::jit::load(qualified.str(), d) + : torch::jit::load(file_to_run + "_jit", d); + loaded.to(d); + models_.push_back(loaded); + } + } + } + void operator()(int i) { + if (cuda) { + const auto device_id = i % models_.size(); + auto d = torch::Device(torch::DeviceType::CUDA, device_id); + to_device( + models_[device_id].forward(to_device_vec(eg_, d)), + torch::DeviceType::CPU); + } else { + models_[0].forward(eg_); + } + } + std::vector eg_; + std::vector models_; +}; + +struct Benchmark { + // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) + Benchmark( + torch::deploy::InterpreterManager& manager, + size_t n_threads, + std::string strategy, + // NOLINTNEXTLINE(modernize-pass-by-value) + std::string file_to_run, + size_t n_seconds = 5) + : manager_(manager), + n_threads_(n_threads), + strategy_(strategy), + file_to_run_(file_to_run), + n_seconds_(n_seconds), + should_run_(true), + items_completed_(0), + reached_min_items_completed_(0) { + // NOLINTNEXTLINE(bugprone-branch-clone) + if (strategy == "one_python") { + manager.debugLimitInterpreters(1); + } else if (strategy == "multi_python") { + manager.debugLimitInterpreters(n_threads_); + } + } + + Report run() { + pthread_barrier_init(&first_run_, nullptr, n_threads_ + 1); + + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + torch::deploy::Package package = manager_.loadPackage(file_to_run_); + + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + std::vector eg; + { + auto I = package.acquireSession(); + + eg = I.global("builtins", "tuple")( + I.self.attr("load_pickle")({"model", "example.pkl"})) + .toIValue() + .toTupleRef() + .elements(); + } + + // NOLINTNEXTLINE(bugprone-branch-clone) + if (strategy_ == "jit") { + run_one_work_item = RunJIT(file_to_run_, std::move(eg)); + } else { + run_one_work_item = + RunPython(package, std::move(eg), manager_.allInstances().data()); + } + + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + std::vector> latencies(n_threads_); + + for (const auto i : c10::irange(n_threads_)) { + threads_.emplace_back([this, &latencies, i] { + torch::NoGradGuard guard; + // do initial work + run_one_work_item(i); + + pthread_barrier_wait(&first_run_); + size_t local_items_completed = 0; + while (should_run_) { + auto begin = std::chrono::steady_clock::now(); + run_one_work_item(i); + auto end = std::chrono::steady_clock::now(); + double work_seconds = + std::chrono::duration(end - begin).count(); + latencies[i].push_back(work_seconds); + local_items_completed++; + if (local_items_completed == min_items_to_complete) { + reached_min_items_completed_++; + } + } + items_completed_ += local_items_completed; + }); + } + + pthread_barrier_wait(&first_run_); + auto begin = std::chrono::steady_clock::now(); + auto try_stop_at = begin + std::chrono::seconds(n_seconds_); + std::this_thread::sleep_until(try_stop_at); + for (int i = 0; reached_min_items_completed_ < n_threads_; ++i) { + std::this_thread::sleep_until( + begin + (i + 2) * std::chrono::seconds(n_seconds_)); + } + should_run_ = false; + for (std::thread& thread : threads_) { + thread.join(); + } + auto end = 
std::chrono::steady_clock::now(); + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + double total_seconds = std::chrono::duration(end - begin).count(); + Report report; + report.benchmark = file_to_run_; + report.strategy = strategy_; + report.n_threads = n_threads_; + report.items_completed = items_completed_; + report.work_items_per_second = items_completed_ / total_seconds; + reportLatencies(report.latencies, latencies); + run_one_work_item = nullptr; + return report; + } + + private: + void reportLatencies( + std::vector& results, + const std::vector>& latencies) { + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + std::vector flat_latencies; + for (const auto& elem : latencies) { + flat_latencies.insert(flat_latencies.end(), elem.begin(), elem.end()); + } + std::sort(flat_latencies.begin(), flat_latencies.end()); + for (double target : latency_p) { + size_t idx = size_t(flat_latencies.size() * target / 100.0); + double time = flat_latencies.size() == 0 + ? 0 + : flat_latencies.at(std::min(flat_latencies.size() - 1, idx)); + results.push_back(time); + } + } + torch::deploy::InterpreterManager& manager_; + size_t n_threads_; + std::string strategy_; + std::string file_to_run_; + size_t n_seconds_; + pthread_barrier_t first_run_; + std::atomic should_run_; + std::atomic items_completed_; + std::atomic reached_min_items_completed_; + std::vector threads_; + std::function run_one_work_item; +}; + +// NOLINTNEXTLINE(bugprone-exception-escape) +int main(int argc, char* argv[]) { + int max_thread = atoi(argv[1]); + cuda = std::string(argv[2]) == "cuda"; + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + bool jit_enable = std::string(argv[3]) == "jit"; + Report::report_header(std::cout); + torch::deploy::InterpreterManager manager(max_thread); + + // make sure gpu_wrapper.py is in the import path + for (auto& interp : manager.allInstances()) { + auto I = interp.acquireSession(); + I.global("sys", "path").attr("append")({"torch/csrc/deploy/example"}); + } + + auto n_threads = {1, 2, 4, 8, 16, 32, 40}; + for (const auto i : c10::irange(4, argc)) { + std::string model_file = argv[i]; + for (int n_thread : n_threads) { + if (n_thread > max_thread) { + continue; + } + for (std::string strategy : {"one_python", "multi_python", "jit"}) { + if (strategy == "jit") { + if (!jit_enable) { + continue; + } + if (!exists(model_file + "_jit")) { + continue; + } + } + Benchmark b(manager, n_thread, strategy, model_file); + Report r = b.run(); + r.report(std::cout); + } + } + } + return 0; +} diff --git a/torch/csrc/deploy/deploy.cpp b/torch/csrc/deploy/deploy.cpp index 647c9a4e810bd0..47a6936c72025b 100644 --- a/torch/csrc/deploy/deploy.cpp +++ b/torch/csrc/deploy/deploy.cpp @@ -1,6 +1,8 @@ -#include +#include #include #include +#include + #include #include @@ -54,12 +56,13 @@ static bool writeDeployInterpreter(FILE* dst) { std::ifstream("/proc/self/cmdline") >> exePath; ElfFile elfFile(exePath.c_str()); for (const auto& s : pythonInterpreterSection) { - at::optional
payloadSection = elfFile.findSection(s.sectionName); - if (payloadSection != at::nullopt) { + multipy::optional<Section>
payloadSection = + elfFile.findSection(s.sectionName); + if (payloadSection != multipy::nullopt) { payloadStart = payloadSection->start; customLoader = s.customLoader; size = payloadSection->len; - TORCH_CHECK(payloadSection.has_value(), "Missing the payload section"); + MULTIPY_CHECK(payloadSection.has_value(), "Missing the payload section"); break; } } @@ -74,10 +77,10 @@ static bool writeDeployInterpreter(FILE* dst) { break; } } - TORCH_CHECK( + MULTIPY_CHECK( libStart != nullptr && libEnd != nullptr, - "torch::deploy requires a build-time dependency on embedded_interpreter or embedded_interpreter_cuda, neither of which were found. torch::cuda::is_available()=", - torch::cuda::is_available()); + "torch::deploy requires a build-time dependency on embedded_interpreter or embedded_interpreter_cuda, neither of which were found. torch::cuda::is_available()=" + + std::to_string(torch::cuda::is_available())); size = libEnd - libStart; payloadStart = libStart; @@ -99,12 +102,12 @@ InterpreterManager::InterpreterManager( // can be used for balancing work across GPUs I.global("torch", "version").attr("__setattr__")({"interp", int(i)}); instances_.back().pImpl_->setFindModule( - [this](const std::string& name) -> at::optional { + [this](const std::string& name) -> multipy::optional { auto it = registeredModuleSource_.find(name); if (it != registeredModuleSource_.end()) { return it->second; } else { - return at::nullopt; + return multipy::nullopt; } }); } @@ -189,11 +192,11 @@ void ReplicatedObj::unload(const Interpreter* onThisInterpreter) { ReplicatedObj InterpreterSession::createMovable(Obj obj) { TORCH_DEPLOY_TRY - TORCH_CHECK( + MULTIPY_CHECK( manager_, "Can only create a movable object when the session was created from an interpreter that is part of a InterpreterManager"); - TORCH_CHECK( + MULTIPY_CHECK( impl_->isOwner(obj), "Cannot create movable from an object that lives in different session"); @@ -214,6 +217,11 @@ using dlopen_t = void* (*)(const char*, int); // function. static dlopen_t find_real_dlopen() { void* libc = dlopen("libdl.so.2", RTLD_NOLOAD | RTLD_LAZY | RTLD_LOCAL); + // libdl is gone on some newer systems. + if (!libc) { + // libc.so won't open with dlopen because it's a linker script. 
+ libc = dlopen("libc.so.6", RTLD_NOLOAD | RTLD_LAZY | RTLD_LOCAL); + } TORCH_INTERNAL_ASSERT(libc); auto dlopen_ = (dlopen_t)dlsym(libc, "dlopen"); TORCH_INTERNAL_ASSERT(dlopen_); diff --git a/torch/csrc/deploy/deploy.h b/torch/csrc/deploy/deploy.h index c6a4794a932d02..b986093ed020ad 100644 --- a/torch/csrc/deploy/deploy.h +++ b/torch/csrc/deploy/deploy.h @@ -1,7 +1,7 @@ #pragma once -#include #include #include +#include #include #include #include @@ -95,7 +95,7 @@ struct TORCH_API LoadBalancer { } void setResourceLimit(size_t n) { TORCH_DEPLOY_TRY - TORCH_INTERNAL_ASSERT(n <= allocated_); + MULTIPY_INTERNAL_ASSERT(n <= allocated_); n_ = n; TORCH_DEPLOY_SAFE_CATCH_RETHROW } diff --git a/torch/csrc/deploy/elf_file.cpp b/torch/csrc/deploy/elf_file.cpp index 85eaaa19cc26ee..ca1e749868e51d 100644 --- a/torch/csrc/deploy/elf_file.cpp +++ b/torch/csrc/deploy/elf_file.cpp @@ -1,5 +1,7 @@ #include +#include #include +#include namespace torch { namespace deploy { @@ -13,7 +15,7 @@ ElfFile::ElfFile(const char* filename) : memFile_(filename) { shdrList_ = (Elf64_Shdr*)(fileData + ehdr_->e_shoff); auto strtabSecNo = ehdr_->e_shstrndx; - TORCH_CHECK( + MULTIPY_CHECK( strtabSecNo >= 0 && strtabSecNo < numSections_, "e_shstrndx out of range"); @@ -25,9 +27,9 @@ ElfFile::ElfFile(const char* filename) : memFile_(filename) { } } -at::optional
ElfFile::findSection(const char* name) const { - TORCH_CHECK(name != nullptr, "Null name"); - at::optional<Section>
found = at::nullopt; +multipy::optional<Section>
ElfFile::findSection(const char* name) const { + MULTIPY_CHECK(name != nullptr, "Null name"); + multipy::optional<Section>
found = multipy::nullopt; for (const auto& section : sections_) { if (strcmp(name, section.name) == 0) { found = section; @@ -40,13 +42,13 @@ at::optional<Section>
ElfFile::findSection(const char* name) const { void ElfFile::checkFormat() const { // check the magic numbers - TORCH_CHECK( + MULTIPY_CHECK( (ehdr_->e_ident[EI_MAG0] == ELFMAG0) && (ehdr_->e_ident[EI_MAG1] == ELFMAG1) && (ehdr_->e_ident[EI_MAG2] == ELFMAG2) && (ehdr_->e_ident[EI_MAG3] == ELFMAG3), "Unexpected magic numbers"); - TORCH_CHECK( + MULTIPY_CHECK( ehdr_->e_ident[EI_CLASS] == ELFCLASS64, "Only support 64bit ELF file"); } diff --git a/torch/csrc/deploy/elf_file.h b/torch/csrc/deploy/elf_file.h index e27750c01139e0..31ea7976af88c5 100644 --- a/torch/csrc/deploy/elf_file.h +++ b/torch/csrc/deploy/elf_file.h @@ -1,7 +1,8 @@ #pragma once -#include #include +#include +#include #include #include @@ -30,7 +31,7 @@ struct Section { class ElfFile { public: explicit ElfFile(const char* filename); - at::optional
findSection(const char* name) const; + multipy::optional<Section>
findSection(const char* name) const; private: Section toSection(Elf64_Shdr* shdr) { @@ -40,7 +41,7 @@ class ElfFile { const char* name = ""; if (strtabSection_) { - TORCH_CHECK(nameOff >= 0 && nameOff < strtabSection_.len); + MULTIPY_CHECK(nameOff >= 0 && nameOff < strtabSection_.len); name = strtabSection_.start + nameOff; } const char* start = memFile_.data() + shOff; @@ -48,7 +49,7 @@ class ElfFile { } [[nodiscard]] const char* str(size_t off) const { - TORCH_CHECK(off < strtabSection_.len, "String table index out of range"); + MULTIPY_CHECK(off < strtabSection_.len, "String table index out of range"); return strtabSection_.start + off; } void checkFormat() const; diff --git a/torch/csrc/deploy/environment.h b/torch/csrc/deploy/environment.h index 4485a4e1d031a4..433ce6bcb3f660 100644 --- a/torch/csrc/deploy/environment.h +++ b/torch/csrc/deploy/environment.h @@ -1,5 +1,6 @@ #pragma once #include +#include #include #include #include @@ -27,7 +28,7 @@ class Environment { // load the zipped torch modules constexpr const char* ZIPPED_TORCH_NAME = ".torch_python_modules"; auto zippedTorchSection = elfFile.findSection(ZIPPED_TORCH_NAME); - TORCH_CHECK( + MULTIPY_CHECK( zippedTorchSection.has_value(), "Missing the zipped torch section"); const char* zippedTorchStart = zippedTorchSection->start; auto zippedTorchSize = zippedTorchSection->len; @@ -35,7 +36,7 @@ class Environment { std::string zipArchive = std::string(pythonAppDir) + "/torch_python_modules.zip"; auto zippedFile = fopen(zipArchive.c_str(), "wb"); - TORCH_CHECK( + MULTIPY_CHECK( zippedFile != nullptr, "Fail to create file: ", strerror(errno)); fwrite(zippedTorchStart, 1, zippedTorchSize, zippedFile); fclose(zippedFile); diff --git a/torch/csrc/deploy/example/examples.py b/torch/csrc/deploy/example/examples.py index 25bb54a0c606e7..73eeb2149b545f 100644 --- a/torch/csrc/deploy/example/examples.py +++ b/torch/csrc/deploy/example/examples.py @@ -146,8 +146,7 @@ class MultiReturn(torch.nn.Module): def __init__(self): super(MultiReturn, self).__init__() - def forward(self, t): - # type: (Tuple[Tensor, Tensor]) -> Tuple[Tuple[Tensor, Tensor], Tuple[Tensor, Tensor]] + def forward(self, t: Tuple[Tensor, Tensor]) -> Tuple[Tuple[Tensor, Tensor], Tuple[Tensor, Tensor]]: a, b = t result = ((a.masked_fill_(b, 0.1), b), (torch.ones_like(a), b)) return result diff --git a/torch/csrc/deploy/interpreter/Optional.hpp b/torch/csrc/deploy/interpreter/Optional.hpp new file mode 100644 index 00000000000000..92b73d7f6fbba4 --- /dev/null +++ b/torch/csrc/deploy/interpreter/Optional.hpp @@ -0,0 +1,1107 @@ +// Copyright (C) 2011 - 2012 Andrzej Krzemienski. +// +// Use, modification, and distribution is subject to the Boost Software +// License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at +// http://www.boost.org/LICENSE_1_0.txt) +// +// The idea and interface is based on Boost.Optional library +// authored by Fernando Luis Cacciola Carballal +// +// Source: https://github.com/akrzemi1/Optional + +#ifndef ___OPTIONAL_HPP___ +#define ___OPTIONAL_HPP___ + +#include +#include +#include +#include +#include +#include +#include + +#define TR2_OPTIONAL_REQUIRES(...) 
\ + typename std::enable_if<__VA_ARGS__::value, bool>::type = false + +#if defined __GNUC__ // NOTE: GNUC is also defined for Clang +#if (__GNUC__ == 4) && (__GNUC_MINOR__ >= 8) +#define TR2_OPTIONAL_GCC_4_8_AND_HIGHER___ +#elif (__GNUC__ > 4) +#define TR2_OPTIONAL_GCC_4_8_AND_HIGHER___ +#endif + +#if (__GNUC__ == 4) && (__GNUC_MINOR__ >= 7) +#define TR2_OPTIONAL_GCC_4_7_AND_HIGHER___ +#elif (__GNUC__ > 4) +#define TR2_OPTIONAL_GCC_4_7_AND_HIGHER___ +#endif + +#if (__GNUC__ == 4) && (__GNUC_MINOR__ == 8) && (__GNUC_PATCHLEVEL__ >= 1) +#define TR2_OPTIONAL_GCC_4_8_1_AND_HIGHER___ +#elif (__GNUC__ == 4) && (__GNUC_MINOR__ >= 9) +#define TR2_OPTIONAL_GCC_4_8_1_AND_HIGHER___ +#elif (__GNUC__ > 4) +#define TR2_OPTIONAL_GCC_4_8_1_AND_HIGHER___ +#endif +#endif + +#if defined __clang_major__ +#if (__clang_major__ == 3 && __clang_minor__ >= 5) +#define TR2_OPTIONAL_CLANG_3_5_AND_HIGHTER_ +#elif (__clang_major__ > 3) +#define TR2_OPTIONAL_CLANG_3_5_AND_HIGHTER_ +#endif +#if defined TR2_OPTIONAL_CLANG_3_5_AND_HIGHTER_ +#define TR2_OPTIONAL_CLANG_3_4_2_AND_HIGHER_ +#elif ( \ + __clang_major__ == 3 && __clang_minor__ == 4 && __clang_patchlevel__ >= 2) +#define TR2_OPTIONAL_CLANG_3_4_2_AND_HIGHER_ +#endif +#endif + +#if defined _MSC_VER +#if (_MSC_VER >= 1900) +#define TR2_OPTIONAL_MSVC_2015_AND_HIGHER___ +#endif +#endif + +#if defined __clang__ +#if (__clang_major__ > 2) || (__clang_major__ == 2) && (__clang_minor__ >= 9) +#define OPTIONAL_HAS_THIS_RVALUE_REFS 1 +#else +#define OPTIONAL_HAS_THIS_RVALUE_REFS 0 +#endif +#elif defined TR2_OPTIONAL_GCC_4_8_1_AND_HIGHER___ +#define OPTIONAL_HAS_THIS_RVALUE_REFS 1 +#elif defined TR2_OPTIONAL_MSVC_2015_AND_HIGHER___ +#define OPTIONAL_HAS_THIS_RVALUE_REFS 1 +#else +#define OPTIONAL_HAS_THIS_RVALUE_REFS 0 +#endif + +#if defined TR2_OPTIONAL_GCC_4_8_1_AND_HIGHER___ +#define OPTIONAL_HAS_CONSTEXPR_INIT_LIST 1 +#define OPTIONAL_CONSTEXPR_INIT_LIST constexpr +#else +#define OPTIONAL_HAS_CONSTEXPR_INIT_LIST 0 +#define OPTIONAL_CONSTEXPR_INIT_LIST +#endif + +#if defined TR2_OPTIONAL_CLANG_3_5_AND_HIGHTER_ && (defined __cplusplus) && \ + (__cplusplus != 201103L) +#define OPTIONAL_HAS_MOVE_ACCESSORS 1 +#else +#define OPTIONAL_HAS_MOVE_ACCESSORS 0 +#endif + +// In C++11 constexpr implies const, so we need to make non-const members also +// non-constexpr +#if (defined __cplusplus) && (__cplusplus == 201103L) +#define OPTIONAL_MUTABLE_CONSTEXPR +#else +#define OPTIONAL_MUTABLE_CONSTEXPR constexpr +#endif + +namespace multipy { + +// BEGIN workaround for missing std::is_trivially_destructible +#if defined TR2_OPTIONAL_GCC_4_8_AND_HIGHER___ +// leave it: it is already there +#elif defined TR2_OPTIONAL_CLANG_3_4_2_AND_HIGHER_ +// leave it: it is already there +#elif defined TR2_OPTIONAL_MSVC_2015_AND_HIGHER___ +// leave it: it is already there +#elif defined TR2_OPTIONAL_DISABLE_EMULATION_OF_TYPE_TRAITS +// leave it: the user doesn't want it +#else +template +using std::is_trivially_destructible = std::has_trivial_destructor; +#endif +// END workaround for missing std::is_trivially_destructible + +#if (defined TR2_OPTIONAL_GCC_4_7_AND_HIGHER___) +// leave it; our metafunctions are already defined. +#elif defined TR2_OPTIONAL_CLANG_3_4_2_AND_HIGHER_ +// leave it; our metafunctions are already defined. 
+#elif defined TR2_OPTIONAL_MSVC_2015_AND_HIGHER___ +// leave it: it is already there +#elif defined TR2_OPTIONAL_DISABLE_EMULATION_OF_TYPE_TRAITS +// leave it: the user doesn't want it +#else + +// workaround for missing traits in GCC and CLANG +template +struct std::is_nothrow_move_constructible { + constexpr static bool value = std::is_nothrow_constructible::value; +}; + +template +struct is_assignable { + template + constexpr static bool has_assign(...) { + return false; + } + + template < + class X, + class Y, + size_t S = sizeof((std::declval() = std::declval(), true))> + // the comma operator is necessary for the cases where operator= returns void + constexpr static bool has_assign(bool) { + return true; + } + + constexpr static bool value = has_assign(true); +}; + +template +struct std::is_nothrow_move_assignable { + template + struct has_nothrow_move_assign { + constexpr static bool value = false; + }; + + template + struct has_nothrow_move_assign { + constexpr static bool value = + noexcept(std::declval() = std::declval()); + }; + + constexpr static bool value = + has_nothrow_move_assign::value>::value; +}; +// end workaround + +#endif + +// 20.5.4, optional for object types +template +class optional; + +// 20.5.5, optional for lvalue reference types +template +class optional; + +// workaround: std utility functions aren't constexpr yet +template +inline constexpr T&& constexpr_forward( + typename std::remove_reference::type& t) noexcept { + return static_cast(t); +} + +template +inline constexpr T&& constexpr_forward( + typename std::remove_reference::type&& t) noexcept { + static_assert(!std::is_lvalue_reference::value, "!!"); + return static_cast(t); +} + +template +inline constexpr typename std::remove_reference::type&& constexpr_move( + T&& t) noexcept { + return static_cast::type&&>(t); +} + +#if defined NDEBUG +#define TR2_OPTIONAL_ASSERTED_EXPRESSION(CHECK, EXPR) (EXPR) +#else +#define TR2_OPTIONAL_ASSERTED_EXPRESSION(CHECK, EXPR) \ + ((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR))) +#endif + +namespace detail_ { + +// static_addressof: a constexpr version of addressof +template +struct has_overloaded_addressof { + template + constexpr static bool has_overload(...) 
{ + return false; + } + + template ().operator&())> + constexpr static bool has_overload(bool) { + return true; + } + + constexpr static bool value = has_overload(true); +}; + +template )> +constexpr T* static_addressof(T& ref) { + return &ref; +} + +template )> +T* static_addressof(T& ref) { + return std::addressof(ref); +} + +// the call to convert(b) has return type A and converts b to type A iff b +// decltype(b) is implicitly convertible to A +template +constexpr U convert(U v) { + return v; +} + +namespace swap_ns { +using std::swap; + +template +void adl_swap(T& t, T& u) noexcept(noexcept(swap(t, u))) { + swap(t, u); +} + +} // namespace swap_ns + +} // namespace detail_ + +constexpr struct trivial_init_t { +} trivial_init{}; + +// 20.5.6, In-place construction +constexpr struct in_place_t { +} in_place{}; + +// 20.5.7, Disengaged state indicator +struct nullopt_t { + struct init {}; + constexpr explicit nullopt_t(init) {} +}; +constexpr nullopt_t nullopt{nullopt_t::init()}; + +// 20.5.8, class bad_optional_access +class bad_optional_access : public std::logic_error { + public: + explicit bad_optional_access(const std::string& what_arg) + : std::logic_error{what_arg} {} + explicit bad_optional_access(const char* what_arg) + : std::logic_error{what_arg} {} +}; + +template +union storage_t { + unsigned char dummy_; + T value_; + + constexpr storage_t(trivial_init_t) noexcept : dummy_(){}; + + template + constexpr storage_t(Args&&... args) + : value_(constexpr_forward(args)...) {} + + ~storage_t() {} +}; + +template +union constexpr_storage_t { + unsigned char dummy_; + T value_; + + constexpr constexpr_storage_t(trivial_init_t) noexcept : dummy_(){}; + + template + constexpr constexpr_storage_t(Args&&... args) + : value_(constexpr_forward(args)...) {} + + ~constexpr_storage_t() = default; +}; + +template +struct optional_base { + bool init_; + storage_t storage_; + + constexpr optional_base() noexcept : init_(false), storage_(trivial_init){}; + + explicit constexpr optional_base(const T& v) : init_(true), storage_(v) {} + + explicit constexpr optional_base(T&& v) + : init_(true), storage_(constexpr_move(v)) {} + + template + explicit optional_base(in_place_t, Args&&... args) + : init_(true), storage_(constexpr_forward(args)...) {} + + template < + class U, + class... Args, + TR2_OPTIONAL_REQUIRES(std::is_constructible>)> + explicit optional_base( + in_place_t, + std::initializer_list il, + Args&&... args) + : init_(true), storage_(il, std::forward(args)...) {} + + ~optional_base() { + if (init_) + storage_.value_.T::~T(); + } +}; + +template +struct constexpr_optional_base { + bool init_; + constexpr_storage_t storage_; + + constexpr constexpr_optional_base() noexcept + : init_(false), storage_(trivial_init){}; + + explicit constexpr constexpr_optional_base(const T& v) + : init_(true), storage_(v) {} + + explicit constexpr constexpr_optional_base(T&& v) + : init_(true), storage_(constexpr_move(v)) {} + + template + explicit constexpr constexpr_optional_base(in_place_t, Args&&... args) + : init_(true), storage_(constexpr_forward(args)...) {} + + template < + class U, + class... Args, + TR2_OPTIONAL_REQUIRES(std::is_constructible>)> + OPTIONAL_CONSTEXPR_INIT_LIST explicit constexpr_optional_base( + in_place_t, + std::initializer_list il, + Args&&... args) + : init_(true), storage_(il, std::forward(args)...) 
{} + + ~constexpr_optional_base() = default; +}; + +template +using OptionalBase = typename std::conditional< + std::is_trivially_destructible::value, // if possible + constexpr_optional_base::type>, // use base with trivial destructor + optional_base::type>>::type; + +template +class optional : private OptionalBase { + static_assert( + !std::is_same::type, nullopt_t>::value, + "bad T"); + static_assert( + !std::is_same::type, in_place_t>::value, + "bad T"); + + constexpr bool initialized() const noexcept { + return OptionalBase::init_; + } + typename std::remove_const::type* dataptr() { + return std::addressof(OptionalBase::storage_.value_); + } + constexpr const T* dataptr() const { + return detail_::static_addressof(OptionalBase::storage_.value_); + } + +#if OPTIONAL_HAS_THIS_RVALUE_REFS == 1 + constexpr const T& contained_val() const& { + return OptionalBase::storage_.value_; + } +#if OPTIONAL_HAS_MOVE_ACCESSORS == 1 + OPTIONAL_MUTABLE_CONSTEXPR T&& contained_val() && { + return std::move(OptionalBase::storage_.value_); + } + OPTIONAL_MUTABLE_CONSTEXPR T& contained_val() & { + return OptionalBase::storage_.value_; + } +#else + T& contained_val() & { + return OptionalBase::storage_.value_; + } + T&& contained_val() && { + return std::move(OptionalBase::storage_.value_); + } +#endif +#else + constexpr const T& contained_val() const { + return OptionalBase::storage_.value_; + } + T& contained_val() { + return OptionalBase::storage_.value_; + } +#endif + + void clear() noexcept { + if (initialized()) + dataptr()->T::~T(); + OptionalBase::init_ = false; + } + + template + void initialize(Args&&... args) noexcept( + noexcept(T(std::forward(args)...))) { + assert(!OptionalBase::init_); + ::new (static_cast(dataptr())) T(std::forward(args)...); + OptionalBase::init_ = true; + } + + template + void initialize(std::initializer_list il, Args&&... args) noexcept( + noexcept(T(il, std::forward(args)...))) { + assert(!OptionalBase::init_); + ::new (static_cast(dataptr())) T(il, std::forward(args)...); + OptionalBase::init_ = true; + } + + public: + typedef T value_type; + + // 20.5.5.1, constructors + constexpr optional() noexcept : OptionalBase(){}; + constexpr optional(nullopt_t) noexcept : OptionalBase(){}; + + optional(const optional& rhs) : OptionalBase() { + if (rhs.initialized()) { + ::new (static_cast(dataptr())) T(*rhs); + OptionalBase::init_ = true; + } + } + + optional(optional&& rhs) noexcept( + std::is_nothrow_move_constructible::value) + : OptionalBase() { + if (rhs.initialized()) { + ::new (static_cast(dataptr())) T(std::move(*rhs)); + OptionalBase::init_ = true; + } + } + + constexpr optional(const T& v) : OptionalBase(v) {} + + constexpr optional(T&& v) : OptionalBase(constexpr_move(v)) {} + + template + explicit constexpr optional(in_place_t, Args&&... args) + : OptionalBase(in_place_t{}, constexpr_forward(args)...) {} + + template < + class U, + class... Args, + TR2_OPTIONAL_REQUIRES(std::is_constructible>)> + OPTIONAL_CONSTEXPR_INIT_LIST explicit optional( + in_place_t, + std::initializer_list il, + Args&&... args) + : OptionalBase(in_place_t{}, il, constexpr_forward(args)...) 
{} + + // 20.5.4.2, Destructor + ~optional() = default; + + // 20.5.4.3, assignment + optional& operator=(nullopt_t) noexcept { + clear(); + return *this; + } + + optional& operator=(const optional& rhs) { + if (initialized() == true && rhs.initialized() == false) + clear(); + else if (initialized() == false && rhs.initialized() == true) + initialize(*rhs); + else if (initialized() == true && rhs.initialized() == true) + contained_val() = *rhs; + return *this; + } + + optional& operator=(optional&& rhs) noexcept( + std::is_nothrow_move_assignable::value&& + std::is_nothrow_move_constructible::value) { + if (initialized() == true && rhs.initialized() == false) + clear(); + else if (initialized() == false && rhs.initialized() == true) + initialize(std::move(*rhs)); + else if (initialized() == true && rhs.initialized() == true) + contained_val() = std::move(*rhs); + return *this; + } + + template + auto operator=(U&& v) -> typename std::enable_if< + std::is_same::type, T>::value, + optional&>::type { + if (initialized()) { + contained_val() = std::forward(v); + } else { + initialize(std::forward(v)); + } + return *this; + } + + template + void emplace(Args&&... args) { + clear(); + initialize(std::forward(args)...); + } + + template + void emplace(std::initializer_list il, Args&&... args) { + clear(); + initialize(il, std::forward(args)...); + } + + // 20.5.4.4, Swap + void swap(optional& rhs) noexcept( + std::is_nothrow_move_constructible::value&& noexcept( + detail_::swap_ns::adl_swap(std::declval(), std::declval()))) { + if (initialized() == true && rhs.initialized() == false) { + rhs.initialize(std::move(**this)); + clear(); + } else if (initialized() == false && rhs.initialized() == true) { + initialize(std::move(*rhs)); + rhs.clear(); + } else if (initialized() == true && rhs.initialized() == true) { + using std::swap; + swap(**this, *rhs); + } + } + + // 20.5.4.5, Observers + + explicit constexpr operator bool() const noexcept { + return initialized(); + } + constexpr bool has_value() const noexcept { + return initialized(); + } + + constexpr T const* operator->() const { + return TR2_OPTIONAL_ASSERTED_EXPRESSION(initialized(), dataptr()); + } + +#if OPTIONAL_HAS_MOVE_ACCESSORS == 1 + + OPTIONAL_MUTABLE_CONSTEXPR T* operator->() { + assert(initialized()); + return dataptr(); + } + + constexpr T const& operator*() const& { + return TR2_OPTIONAL_ASSERTED_EXPRESSION(initialized(), contained_val()); + } + + OPTIONAL_MUTABLE_CONSTEXPR T& operator*() & { + assert(initialized()); + return contained_val(); + } + + OPTIONAL_MUTABLE_CONSTEXPR T&& operator*() && { + assert(initialized()); + return constexpr_move(contained_val()); + } + + constexpr T const& value() const& { + return initialized() + ? contained_val() + : (throw bad_optional_access("bad optional access"), contained_val()); + } + + OPTIONAL_MUTABLE_CONSTEXPR T& value() & { + return initialized() + ? contained_val() + : (throw bad_optional_access("bad optional access"), contained_val()); + } + + OPTIONAL_MUTABLE_CONSTEXPR T&& value() && { + if (!initialized()) + throw bad_optional_access("bad optional access"); + return std::move(contained_val()); + } + +#else + + T* operator->() { + assert(initialized()); + return dataptr(); + } + + constexpr T const& operator*() const { + return TR2_OPTIONAL_ASSERTED_EXPRESSION(initialized(), contained_val()); + } + + T& operator*() { + assert(initialized()); + return contained_val(); + } + + constexpr T const& value() const { + return initialized() + ? 
contained_val() + : (throw bad_optional_access("bad optional access"), contained_val()); + } + + T& value() { + return initialized() + ? contained_val() + : (throw bad_optional_access("bad optional access"), contained_val()); + } + +#endif + +#if OPTIONAL_HAS_THIS_RVALUE_REFS == 1 + + template + constexpr T value_or(V&& v) const& { + return *this ? **this : detail_::convert(constexpr_forward(v)); + } + +#if OPTIONAL_HAS_MOVE_ACCESSORS == 1 + + template + OPTIONAL_MUTABLE_CONSTEXPR T value_or(V&& v) && { + return *this + ? constexpr_move(const_cast&>(*this).contained_val()) + : detail_::convert(constexpr_forward(v)); + } + +#else + + template + T value_or(V&& v) && { + return *this + ? constexpr_move(const_cast&>(*this).contained_val()) + : detail_::convert(constexpr_forward(v)); + } + +#endif + +#else + + template + constexpr T value_or(V&& v) const { + return *this ? **this : detail_::convert(constexpr_forward(v)); + } + +#endif + + // 20.6.3.6, modifiers + void reset() noexcept { + clear(); + } +}; + +template +class optional { + static_assert(!std::is_same::value, "bad T"); + static_assert(!std::is_same::value, "bad T"); + T* ref; + + public: + // 20.5.5.1, construction/destruction + constexpr optional() noexcept : ref(nullptr) {} + + constexpr optional(nullopt_t) noexcept : ref(nullptr) {} + + constexpr optional(T& v) noexcept : ref(detail_::static_addressof(v)) {} + + optional(T&&) = delete; + + constexpr optional(const optional& rhs) noexcept : ref(rhs.ref) {} + + explicit constexpr optional(in_place_t, T& v) noexcept + : ref(detail_::static_addressof(v)) {} + + explicit optional(in_place_t, T&&) = delete; + + ~optional() = default; + + // 20.5.5.2, mutation + optional& operator=(nullopt_t) noexcept { + ref = nullptr; + return *this; + } + + // optional& operator=(const optional& rhs) noexcept { + // ref = rhs.ref; + // return *this; + // } + + // optional& operator=(optional&& rhs) noexcept { + // ref = rhs.ref; + // return *this; + // } + + template + auto operator=(U&& rhs) noexcept -> typename std::enable_if< + std::is_same::type, optional>::value, + optional&>::type { + ref = rhs.ref; + return *this; + } + + template + auto operator=(U&& rhs) noexcept -> typename std::enable_if< + !std::is_same::type, optional>::value, + optional&>::type = delete; + + void emplace(T& v) noexcept { + ref = detail_::static_addressof(v); + } + + void emplace(T&&) = delete; + + void swap(optional& rhs) noexcept { + std::swap(ref, rhs.ref); + } + + // 20.5.5.3, observers + constexpr T* operator->() const { + return TR2_OPTIONAL_ASSERTED_EXPRESSION(ref, ref); + } + + constexpr T& operator*() const { + return TR2_OPTIONAL_ASSERTED_EXPRESSION(ref, *ref); + } + + constexpr T& value() const { + return ref ? *ref + : (throw bad_optional_access("bad optional access"), *ref); + } + + explicit constexpr operator bool() const noexcept { + return ref != nullptr; + } + + constexpr bool has_value() const noexcept { + return ref != nullptr; + } + + template + constexpr typename std::decay::type value_or(V&& v) const { + return *this ? **this + : detail_::convert::type>( + constexpr_forward(v)); + } + + // x.x.x.x, modifiers + void reset() noexcept { + ref = nullptr; + } +}; + +template +class optional { + static_assert(sizeof(T) == 0, "optional rvalue references disallowed"); +}; + +// 20.5.8, Relational operators +template +constexpr bool operator==(const optional& x, const optional& y) { + return bool(x) != bool(y) ? false : bool(x) == false ? 
true : *x == *y; +} + +template +constexpr bool operator!=(const optional& x, const optional& y) { + return !(x == y); +} + +template +constexpr bool operator<(const optional& x, const optional& y) { + return (!y) ? false : (!x) ? true : *x < *y; +} + +template +constexpr bool operator>(const optional& x, const optional& y) { + return (y < x); +} + +template +constexpr bool operator<=(const optional& x, const optional& y) { + return !(y < x); +} + +template +constexpr bool operator>=(const optional& x, const optional& y) { + return !(x < y); +} + +// 20.5.9, Comparison with nullopt +template +constexpr bool operator==(const optional& x, nullopt_t) noexcept { + return (!x); +} + +template +constexpr bool operator==(nullopt_t, const optional& x) noexcept { + return (!x); +} + +template +constexpr bool operator!=(const optional& x, nullopt_t) noexcept { + return bool(x); +} + +template +constexpr bool operator!=(nullopt_t, const optional& x) noexcept { + return bool(x); +} + +template +constexpr bool operator<(const optional&, nullopt_t) noexcept { + return false; +} + +template +constexpr bool operator<(nullopt_t, const optional& x) noexcept { + return bool(x); +} + +template +constexpr bool operator<=(const optional& x, nullopt_t) noexcept { + return (!x); +} + +template +constexpr bool operator<=(nullopt_t, const optional&) noexcept { + return true; +} + +template +constexpr bool operator>(const optional& x, nullopt_t) noexcept { + return bool(x); +} + +template +constexpr bool operator>(nullopt_t, const optional&) noexcept { + return false; +} + +template +constexpr bool operator>=(const optional&, nullopt_t) noexcept { + return true; +} + +template +constexpr bool operator>=(nullopt_t, const optional& x) noexcept { + return (!x); +} + +// 20.5.10, Comparison with T +template +constexpr bool operator==(const optional& x, const T& v) { + return bool(x) ? *x == v : false; +} + +template +constexpr bool operator==(const T& v, const optional& x) { + return bool(x) ? v == *x : false; +} + +template +constexpr bool operator!=(const optional& x, const T& v) { + return bool(x) ? *x != v : true; +} + +template +constexpr bool operator!=(const T& v, const optional& x) { + return bool(x) ? v != *x : true; +} + +template +constexpr bool operator<(const optional& x, const T& v) { + return bool(x) ? *x < v : true; +} + +template +constexpr bool operator>(const T& v, const optional& x) { + return bool(x) ? v > *x : true; +} + +template +constexpr bool operator>(const optional& x, const T& v) { + return bool(x) ? *x > v : false; +} + +template +constexpr bool operator<(const T& v, const optional& x) { + return bool(x) ? v < *x : false; +} + +template +constexpr bool operator>=(const optional& x, const T& v) { + return bool(x) ? *x >= v : false; +} + +template +constexpr bool operator<=(const T& v, const optional& x) { + return bool(x) ? v <= *x : false; +} + +template +constexpr bool operator<=(const optional& x, const T& v) { + return bool(x) ? *x <= v : true; +} + +template +constexpr bool operator>=(const T& v, const optional& x) { + return bool(x) ? v >= *x : true; +} + +// Comparison of optional with T +template +constexpr bool operator==(const optional& x, const T& v) { + return bool(x) ? *x == v : false; +} + +template +constexpr bool operator==(const T& v, const optional& x) { + return bool(x) ? v == *x : false; +} + +template +constexpr bool operator!=(const optional& x, const T& v) { + return bool(x) ? 
*x != v : true; +} + +template +constexpr bool operator!=(const T& v, const optional& x) { + return bool(x) ? v != *x : true; +} + +template +constexpr bool operator<(const optional& x, const T& v) { + return bool(x) ? *x < v : true; +} + +template +constexpr bool operator>(const T& v, const optional& x) { + return bool(x) ? v > *x : true; +} + +template +constexpr bool operator>(const optional& x, const T& v) { + return bool(x) ? *x > v : false; +} + +template +constexpr bool operator<(const T& v, const optional& x) { + return bool(x) ? v < *x : false; +} + +template +constexpr bool operator>=(const optional& x, const T& v) { + return bool(x) ? *x >= v : false; +} + +template +constexpr bool operator<=(const T& v, const optional& x) { + return bool(x) ? v <= *x : false; +} + +template +constexpr bool operator<=(const optional& x, const T& v) { + return bool(x) ? *x <= v : true; +} + +template +constexpr bool operator>=(const T& v, const optional& x) { + return bool(x) ? v >= *x : true; +} + +// Comparison of optional with T +template +constexpr bool operator==(const optional& x, const T& v) { + return bool(x) ? *x == v : false; +} + +template +constexpr bool operator==(const T& v, const optional& x) { + return bool(x) ? v == *x : false; +} + +template +constexpr bool operator!=(const optional& x, const T& v) { + return bool(x) ? *x != v : true; +} + +template +constexpr bool operator!=(const T& v, const optional& x) { + return bool(x) ? v != *x : true; +} + +template +constexpr bool operator<(const optional& x, const T& v) { + return bool(x) ? *x < v : true; +} + +template +constexpr bool operator>(const T& v, const optional& x) { + return bool(x) ? v > *x : true; +} + +template +constexpr bool operator>(const optional& x, const T& v) { + return bool(x) ? *x > v : false; +} + +template +constexpr bool operator<(const T& v, const optional& x) { + return bool(x) ? v < *x : false; +} + +template +constexpr bool operator>=(const optional& x, const T& v) { + return bool(x) ? *x >= v : false; +} + +template +constexpr bool operator<=(const T& v, const optional& x) { + return bool(x) ? v <= *x : false; +} + +template +constexpr bool operator<=(const optional& x, const T& v) { + return bool(x) ? *x <= v : true; +} + +template +constexpr bool operator>=(const T& v, const optional& x) { + return bool(x) ? v >= *x : true; +} + +// 20.5.12, Specialized algorithms +template +void swap(optional& x, optional& y) noexcept(noexcept(x.swap(y))) { + x.swap(y); +} + +template +constexpr optional::type> make_optional(T&& v) { + return optional::type>(constexpr_forward(v)); +} + +template +constexpr optional make_optional(std::reference_wrapper v) { + return optional(v.get()); +} + +} // namespace multipy + +namespace std { +template +struct hash> { + typedef typename hash::result_type result_type; + typedef multipy::optional argument_type; + + constexpr result_type operator()(argument_type const& arg) const { + return arg ? std::hash{}(*arg) : result_type{}; + } +}; + +template +struct hash> { + typedef typename hash::result_type result_type; + typedef multipy::optional argument_type; + + constexpr result_type operator()(argument_type const& arg) const { + return arg ? 
std::hash{}(*arg) : result_type{}; + } +}; +} // namespace std + +#undef TR2_OPTIONAL_REQUIRES +#undef TR2_OPTIONAL_ASSERTED_EXPRESSION + +#endif //___OPTIONAL_HPP___ diff --git a/torch/csrc/deploy/interpreter/builtin_registry.cpp b/torch/csrc/deploy/interpreter/builtin_registry.cpp index a34768c2a009bf..6bcabd969ec521 100644 --- a/torch/csrc/deploy/interpreter/builtin_registry.cpp +++ b/torch/csrc/deploy/interpreter/builtin_registry.cpp @@ -1,6 +1,7 @@ #include #include #include +#include #include namespace torch { @@ -44,7 +45,7 @@ BuiltinRegistryItem::BuiltinRegistryItem( fprintf( stderr, - "torch::deploy builtin %s contains %d modules\n", + "torch::deploy builtin %s contains %u modules\n", name, numModules); } @@ -109,8 +110,8 @@ BuiltinRegistryItem* BuiltinRegistry::getItem(const std::string& name) { : get()->items_[itr->second].get(); } -int BuiltinRegistry::totalNumModules() { - int tot = 0; +unsigned BuiltinRegistry::totalNumModules() { + unsigned tot = 0; for (const auto& itemptr : get()->items_) { tot += itemptr->numModules; } @@ -119,7 +120,7 @@ int BuiltinRegistry::totalNumModules() { struct _frozen* BuiltinRegistry::getAllFrozenModules() { /* Allocate new memory for the combined table */ - int totNumModules = totalNumModules(); + size_t totNumModules = totalNumModules(); struct _frozen* p = nullptr; if (totNumModules > 0 && totNumModules <= SIZE_MAX / sizeof(struct _frozen) - 1) { @@ -134,7 +135,7 @@ struct _frozen* BuiltinRegistry::getAllFrozenModules() { memset(&p[0], 0, sizeof(p[0])); /* Copy the tables into the new memory */ - int off = 0; + unsigned off = 0; for (const auto& itemptr : items()) { if (itemptr->numModules > 0) { memcpy( diff --git a/torch/csrc/deploy/interpreter/builtin_registry.h b/torch/csrc/deploy/interpreter/builtin_registry.h index da7eb372de84f1..533adc2100b3d1 100644 --- a/torch/csrc/deploy/interpreter/builtin_registry.h +++ b/torch/csrc/deploy/interpreter/builtin_registry.h @@ -49,7 +49,7 @@ struct BuiltinRegistryItem { std::vector>&& _builtinModules); const char* name; const struct _frozen* frozenModules; - int numModules; + unsigned numModules; std::vector> builtinModules; }; @@ -77,7 +77,7 @@ class BuiltinRegistry { static const std::vector>& items() { return get()->items_; } - static int totalNumModules(); + static unsigned totalNumModules(); static BuiltinRegistry* get(); static BuiltinRegistryItem* getItem(const std::string& name); static std::vector> getAllBuiltinModules(); diff --git a/torch/csrc/deploy/interpreter/import_find_sharedfuncptr.cpp b/torch/csrc/deploy/interpreter/import_find_sharedfuncptr.cpp index b8af5de3db205e..2a89a96c623d71 100644 --- a/torch/csrc/deploy/interpreter/import_find_sharedfuncptr.cpp +++ b/torch/csrc/deploy/interpreter/import_find_sharedfuncptr.cpp @@ -1,4 +1,5 @@ #include +#include #include using torch::deploy::CustomLibrary; diff --git a/torch/csrc/deploy/interpreter/interpreter_impl.cpp b/torch/csrc/deploy/interpreter/interpreter_impl.cpp index 1ff30f0afbb04b..2af33582aa6dfd 100644 --- a/torch/csrc/deploy/interpreter/interpreter_impl.cpp +++ b/torch/csrc/deploy/interpreter/interpreter_impl.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -219,8 +220,8 @@ struct __attribute__((visibility("hidden"))) ConcreteInterpreterImpl } void setFindModule( - std::function(const std::string&)> find_module) - override { + std::function(const std::string&)> + find_module) override { std::function wrapped_find_module = [=](const std::string& name) -> py::object { auto r = find_module(name); diff 
--git a/torch/csrc/deploy/interpreter/interpreter_impl.h b/torch/csrc/deploy/interpreter/interpreter_impl.h index 10a1489740ec27..a2dd57e9beeba6 100644 --- a/torch/csrc/deploy/interpreter/interpreter_impl.h +++ b/torch/csrc/deploy/interpreter/interpreter_impl.h @@ -3,6 +3,7 @@ #include #include #include +#include /* Torch Deploy intentionally embeds multiple copies of c++ libraries providing python bindings necessary for torch::deploy users in the same @@ -15,8 +16,8 @@ the client application. It is safe to throw exception types that are defined once in - the context of the client application, such as c10::Error, which is defined - in libtorch, which isn't duplicated in torch::deploy interpreters. + the context of the client application, such as std::runtime_error, + which isn't duplicated in torch::deploy interpreters. ==> Use TORCH_DEPLOY_TRY, _SAFE_CATCH_RETHROW around _ALL_ torch::deploy APIs @@ -30,20 +31,17 @@ */ #define TORCH_DEPLOY_TRY try { -#define TORCH_DEPLOY_SAFE_CATCH_RETHROW \ - } \ - catch (std::exception & err) { \ - throw c10::Error( \ - std::string( \ - "Exception Caught inside torch::deploy embedded library: \n") + \ - err.what(), \ - ""); \ - } \ - catch (...) { \ - throw c10::Error( \ - std::string( \ - "Unknown Exception Caught inside torch::deploy embedded library"), \ - ""); \ +#define TORCH_DEPLOY_SAFE_CATCH_RETHROW \ + } \ + catch (std::exception & err) { \ + throw std::runtime_error( \ + std::string( \ + "Exception Caught inside torch::deploy embedded library: \n") + \ + err.what()); \ + } \ + catch (...) { \ + throw std::runtime_error(std::string( \ + "Unknown Exception Caught inside torch::deploy embedded library")); \ } namespace torch { namespace deploy { @@ -132,7 +130,7 @@ struct InterpreterSessionImpl { struct InterpreterImpl { virtual InterpreterSessionImpl* acquireSession() = 0; virtual void setFindModule( - std::function(const std::string&)> + std::function(const std::string&)> find_module) = 0; virtual ~InterpreterImpl() = default; // this will uninitialize python }; diff --git a/torch/csrc/deploy/loader.cpp b/torch/csrc/deploy/loader.cpp index f03a2d299a5510..ab4d0c7c329e5e 100644 --- a/torch/csrc/deploy/loader.cpp +++ b/torch/csrc/deploy/loader.cpp @@ -53,8 +53,8 @@ // Get PAGE_SIZE and PAGE_MASK. #include -#include #include +#include #include #include @@ -300,15 +300,15 @@ struct __attribute__((visibility("hidden"))) SystemLibraryImpl SystemLibraryImpl(void* handle, bool steal) : handle_(handle), own_handle_(steal && handle != RTLD_DEFAULT) {} - at::optional sym(const char* name) const override { + multipy::optional sym(const char* name) const override { void* r = dlsym(handle_, name); if (!r) { - return at::nullopt; + return multipy::nullopt; } return (Elf64_Addr)r; } - at::optional tls_sym(const char* name) const override; + multipy::optional tls_sym(const char* name) const override; ~SystemLibraryImpl() override { if (own_handle_) { @@ -534,11 +534,11 @@ struct ElfDynamicInfo { } } - at::optional sym( + multipy::optional sym( const char* name, GnuHash* precomputed_hash = nullptr) const { if (!gnu_bucket_) { - return at::nullopt; // no hashtable was loaded + return multipy::nullopt; // no hashtable was loaded } GnuHash hash_obj = precomputed_hash ? 
*precomputed_hash : GnuHash(name); auto hash = hash_obj.hash; @@ -551,12 +551,12 @@ struct ElfDynamicInfo { const uint32_t h2 = (hash >> gnu_shift2_) % kBloomMaskBits; if ((1 & (bloom_word >> h1) & (bloom_word >> h2)) != 1) { - return at::nullopt; + return multipy::nullopt; } uint32_t sym_idx = gnu_bucket_[hash % gnu_nbucket_]; if (sym_idx == 0) { - return at::nullopt; + return multipy::nullopt; } uint32_t chain_value = 0; @@ -574,12 +574,12 @@ struct ElfDynamicInfo { ((ELF64_ST_TYPE(sym->st_info) == STT_TLS) ? 0 : load_bias_); } // symbol isn't defined - return at::nullopt; + return multipy::nullopt; } } ++sym_idx; } while ((chain_value & 1) == 0); - return at::nullopt; + return multipy::nullopt; } }; @@ -613,7 +613,7 @@ struct AlreadyLoadedSymTable { dyninfo_.initialize_from_dynamic_section(name, dynamic, load_bias, true); } - at::optional sym(const char* name) { + multipy::optional sym(const char* name) { return dyninfo_.sym(name); } }; @@ -626,8 +626,8 @@ static int iterate_cb(struct dl_phdr_info* info, size_t size, void* data) { // with a normal dlsym call. Instead we iterate through all loaded libraries and // check their symbol tables for the symbol. The value of the symbol is the TLS // offset. When we find the library we also get the module id. -at::optional slow_find_tls_symbol_offset(const char* sym_name) { - at::optional result = at::nullopt; +multipy::optional slow_find_tls_symbol_offset(const char* sym_name) { + multipy::optional result = multipy::nullopt; std::function cb = [&](struct dl_phdr_info* info, size_t size) { // std::cout << "SEARCHING .. " << info->dlpi_name << "\n"; @@ -650,10 +650,11 @@ at::optional slow_find_tls_symbol_offset(const char* sym_name) { return result; } -at::optional SystemLibraryImpl::tls_sym(const char* name) const { +multipy::optional SystemLibraryImpl::tls_sym(const char* name) const { if (!sym(name)) { - return at::nullopt; // before we do a bunch of slow lookups to find the - // module_id, check that this even defines the symbol + return multipy::nullopt; // before we do a bunch of slow lookups to find the + // module_id, check that this even defines the + // symbol } if (handle_ == RTLD_DEFAULT) { return slow_find_tls_symbol_offset(name); @@ -675,7 +676,7 @@ at::optional SystemLibraryImpl::tls_sym(const char* name) const { "failed to query dlinfo for module_id"); return TLSIndex{module_id, *r}; } - return at::nullopt; + return multipy::nullopt; } // dlopen does not accept additional search paths as an argument. 
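Aside: the comment above explains that a TLS symbol's value cannot be obtained from a plain dlsym(RTLD_DEFAULT, ...) call, so the loader walks every already-loaded library, probes its symbol table, and records the module id of the library that defines the symbol. The sketch below is a simplified, self-contained illustration of that iterate-and-probe shape only; it is not the code in this patch (which parses the ELF dynamic section and GNU hash table directly to recover the TLS offset). It assumes Linux/glibc with _GNU_SOURCE available, and "some_tls_symbol" is a placeholder name.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <cstdio>

struct SearchState {
  const char* sym_name;   // symbol we are looking for
  const char* library;    // filled in: object that defines it
  size_t tls_module_id;   // filled in: dlpi_tls_modid of that object
  bool found;
};

static int probe_object(struct dl_phdr_info* info, size_t, void* data) {
  SearchState* st = static_cast<SearchState*>(data);
  if (info->dlpi_name == nullptr || info->dlpi_name[0] == '\0') {
    return 0; // skip entries without a path (main executable, vdso)
  }
  // RTLD_NOLOAD: only take a handle if the object is already mapped.
  void* handle = dlopen(info->dlpi_name, RTLD_LAZY | RTLD_NOLOAD);
  if (handle == nullptr) {
    return 0;
  }
  if (dlsym(handle, st->sym_name) != nullptr) {
    st->library = info->dlpi_name;
    st->tls_module_id = info->dlpi_tls_modid;
    st->found = true;
  }
  dlclose(handle);
  return st->found ? 1 : 0; // a non-zero return stops the iteration
}

int main() {
  SearchState st{"some_tls_symbol", nullptr, 0, false}; // placeholder symbol
  dl_iterate_phdr(probe_object, &st);
  if (st.found) {
    std::printf("%s defines it (TLS module id %zu)\n", st.library, st.tls_module_id);
  }
  return 0;
}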
@@ -966,7 +967,7 @@ struct __attribute__((visibility("hidden"))) CustomLibraryImpl dyninfo_.needed_); } - at::optional lookup_symbol(Elf64_Xword r_info) { + multipy::optional lookup_symbol(Elf64_Xword r_info) { const uint32_t r_type = ELF64_R_TYPE(r_info); const uint32_t r_sym = ELF64_R_SYM(r_info); @@ -999,10 +1000,10 @@ struct __attribute__((visibility("hidden"))) CustomLibraryImpl name_.c_str(), sym_name); } - return at::nullopt; + return multipy::nullopt; } - at::optional tls_lookup_symbol(Elf64_Xword r_info) { + multipy::optional tls_lookup_symbol(Elf64_Xword r_info) { const uint32_t r_sym = ELF64_R_SYM(r_info); if (r_sym == 0) { @@ -1030,7 +1031,7 @@ struct __attribute__((visibility("hidden"))) CustomLibraryImpl name_.c_str(), sym_name); } - return at::nullopt; + return multipy::nullopt; } void relocate_one(const Elf64_Rela& reloc) { @@ -1177,16 +1178,16 @@ struct __attribute__((visibility("hidden"))) CustomLibraryImpl f(argc_, argv_, environ); } - at::optional sym(const char* name) const override { + multipy::optional sym(const char* name) const override { return dyninfo_.sym(name); } - at::optional tls_sym(const char* name) const override { + multipy::optional tls_sym(const char* name) const override { auto r = dyninfo_.sym(name); if (r) { return TLSIndex{module_id(), *r}; } - return at::nullopt; + return multipy::nullopt; } void* tls_addr(size_t offset) { diff --git a/torch/csrc/deploy/loader.h b/torch/csrc/deploy/loader.h index eeff1a30174ee9..9e5a7fd4571de8 100644 --- a/torch/csrc/deploy/loader.h +++ b/torch/csrc/deploy/loader.h @@ -1,7 +1,7 @@ #pragma once -#include #include #include +#include #include namespace torch { @@ -19,8 +19,8 @@ struct TLSIndex { struct SymbolProvider { SymbolProvider() = default; - virtual at::optional sym(const char* name) const = 0; - virtual at::optional tls_sym(const char* name) const = 0; + virtual multipy::optional sym(const char* name) const = 0; + virtual multipy::optional tls_sym(const char* name) const = 0; SymbolProvider(const SymbolProvider&) = delete; SymbolProvider& operator=(const SymbolProvider&) = delete; virtual ~SymbolProvider() = default; diff --git a/torch/csrc/deploy/mem_file.h b/torch/csrc/deploy/mem_file.h index c50889f8353bb3..df4fe941ca58c0 100644 --- a/torch/csrc/deploy/mem_file.h +++ b/torch/csrc/deploy/mem_file.h @@ -1,9 +1,9 @@ #pragma once -#include #include #include #include +#include #include #include #include @@ -20,18 +20,21 @@ namespace deploy { struct MemFile { explicit MemFile(const char* filename_) : fd_(0), mem_(nullptr), n_bytes_(0) { fd_ = open(filename_, O_RDONLY); - TORCH_CHECK(fd_ != -1, "failed to open {}: {}", filename_, strerror(errno)); + MULTIPY_CHECK( + fd_ != -1, "failed to open {}: {}" + filename_ + strerror(errno)); // NOLINTNEXTLINE struct stat s; if (-1 == fstat(fd_, &s)) { close(fd_); // destructors don't run during exceptions - TORCH_CHECK(false, "failed to stat {}: {}", filename_, strerror(errno)); + MULTIPY_CHECK( + false, "failed to stat {}: {}" + filename_ + strerror(errno)); } n_bytes_ = s.st_size; mem_ = mmap(nullptr, n_bytes_, PROT_READ, MAP_SHARED, fd_, 0); if (MAP_FAILED == mem_) { close(fd_); - TORCH_CHECK(false, "failed to mmap {}: {}", filename_, strerror(errno)); + MULTIPY_CHECK( + false, "failed to mmap {}: {}" + filename_ + strerror(errno)); } } MemFile(const MemFile&) = delete; diff --git a/torch/csrc/deploy/test_deploy.cpp b/torch/csrc/deploy/test_deploy.cpp index 840720cc01f895..973fbff0fa4f26 100644 --- a/torch/csrc/deploy/test_deploy.cpp +++ 
b/torch/csrc/deploy/test_deploy.cpp @@ -182,13 +182,14 @@ TEST(TorchpyTest, ErrorsReplicatingObj) { auto obj = session1.fromMovable(replicatedObj); // should throw an error when trying to access obj from different session // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - EXPECT_THROW(session2.createMovable(obj), c10::Error); + EXPECT_THROW(session2.createMovable(obj), std::runtime_error); try { session2.createMovable(obj); - } catch (c10::Error& error) { + } catch (std::runtime_error& error) { EXPECT_TRUE( - error.msg().find( - "Cannot create movable from an object that lives in different session") != + std::string(error.what()) + .find( + "Cannot create movable from an object that lives in different session") != std::string::npos); } } @@ -197,15 +198,15 @@ TEST(TorchpyTest, ThrowsSafely) { // See explanation in deploy.h torch::deploy::InterpreterManager manager(3); // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - EXPECT_THROW(manager.loadPackage("some garbage path"), c10::Error); + EXPECT_THROW(manager.loadPackage("some garbage path"), std::runtime_error); torch::deploy::Package p = manager.loadPackage(path("SIMPLE", simple)); // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - EXPECT_THROW(p.loadPickle("some other", "garbage path"), c10::Error); + EXPECT_THROW(p.loadPickle("some other", "garbage path"), std::runtime_error); auto model = p.loadPickle("model", "model.pkl"); // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - EXPECT_THROW(model(at::IValue("unexpected input")), c10::Error); + EXPECT_THROW(model(at::IValue("unexpected input")), std::runtime_error); } TEST(TorchpyTest, AcquireMultipleSessionsInTheSamePackage) { @@ -238,7 +239,7 @@ TEST(TorchpyTest, TensorSharingNotAllowed) { auto t = obj.toIValue().toTensor(); // try to feed it to the other interpreter, should error // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - ASSERT_THROW(I1.global("torch", "sigmoid")({t}), c10::Error); + ASSERT_THROW(I1.global("torch", "sigmoid")({t}), std::runtime_error); } TEST(TorchpyTest, TaggingRace) { @@ -259,7 +260,7 @@ TEST(TorchpyTest, TaggingRace) { try { I.fromIValue(t); success++; - } catch (const c10::Error& e) { + } catch (const std::runtime_error& e) { failed++; } } @@ -279,7 +280,7 @@ TEST(TorchpyTest, DisarmHook) { torch::deploy::InterpreterManager m(1); auto I = m.acquireOne(); // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - ASSERT_THROW(I.fromIValue(t), c10::Error); // NOT a segfault + ASSERT_THROW(I.fromIValue(t), std::runtime_error); // NOT a segfault } TEST(TorchpyTest, RegisterModule) { @@ -291,6 +292,7 @@ TEST(TorchpyTest, RegisterModule) { } } +#ifdef FBCODE_CAFFE2 TEST(TorchpyTest, FxModule) { size_t nthreads = 3; torch::deploy::InterpreterManager manager(nthreads); @@ -317,6 +319,7 @@ TEST(TorchpyTest, FxModule) { ASSERT_TRUE(ref_output.equal(outputs[i])); } } +#endif // Moving a tensor between interpreters should share the underlying storage. 
TEST(TorchpyTest, TensorSerializationSharing) { @@ -479,6 +482,42 @@ TEST(TorchpyTest, TestPyYAML) { } #endif +TEST(TorchpyTest, PrintInstruction) { + const auto jit_script_with_print = R"JIT( + def forward(self, a): + print(a) + return a + a + )JIT"; + + auto input = torch::autograd::make_variable(at::randn({2, 3})); + auto expected_forward = input + input; + + auto module = std::make_shared( + "Module", std::make_shared()); + module->define(jit_script_with_print); + + std::vector inputs{at::IValue(input)}; + + // Checking that a module containing prim::Print() works fine. + auto result1 = (*module)(inputs); + EXPECT_TRUE(result1.toTensor().equal(expected_forward)); + + { + auto interpreterManager = + std::make_shared(1); + + // Checking that a module containing prim::Print() still works fine + // after Python environment was created. + auto result2 = (*module)(inputs); + EXPECT_TRUE(result2.toTensor().equal(expected_forward)); + } + + // Checking that a module containing prim::Print() still works fine + // after Python environment was created and then destroyed. + auto result3 = (*module)(inputs); + EXPECT_TRUE(result3.toTensor().equal(expected_forward)); +} + int main(int argc, char* argv[]) { ::testing::InitGoogleTest(&argc, argv); int rc = RUN_ALL_TESTS(); diff --git a/torch/csrc/deploy/test_deploy_gpu.cpp b/torch/csrc/deploy/test_deploy_gpu.cpp index 8fa154b8070953..48660c79fefa3e 100644 --- a/torch/csrc/deploy/test_deploy_gpu.cpp +++ b/torch/csrc/deploy/test_deploy_gpu.cpp @@ -67,6 +67,7 @@ TEST(TorchDeployGPUTest, UsesDistributed) { } } +#ifdef FBCODE_CAFFE2 TEST(TorchDeployGPUTest, TensorRT) { if (!torch::cuda::is_available()) { GTEST_SKIP(); @@ -85,6 +86,7 @@ TEST(TorchDeployGPUTest, TensorRT) { output.allclose(model(at::IValue{input}).toIValue().toTensor())); } } +#endif // OSS build does not have bultin numpy support yet. Use this flag to guard the // test case. diff --git a/torch/csrc/deploy/test_deploy_missing_interpreter.cpp b/torch/csrc/deploy/test_deploy_missing_interpreter.cpp index 8ac602a3f2fc5e..b47f4556ad781e 100644 --- a/torch/csrc/deploy/test_deploy_missing_interpreter.cpp +++ b/torch/csrc/deploy/test_deploy_missing_interpreter.cpp @@ -10,5 +10,5 @@ int main(int argc, char* argv[]) { TEST(TorchDeployMissingInterpreter, Throws) { // NOLINTNEXTLINE(hicpp-avoid-goto,cppcoreguidelines-avoid-goto) - EXPECT_THROW(torch::deploy::InterpreterManager(1), c10::Error); + EXPECT_THROW(torch::deploy::InterpreterManager(1), std::runtime_error); } diff --git a/torch/csrc/deploy/unity/xar_environment.cpp b/torch/csrc/deploy/unity/xar_environment.cpp index 3ff233b0c420cc..4bb764374525ec 100644 --- a/torch/csrc/deploy/unity/xar_environment.cpp +++ b/torch/csrc/deploy/unity/xar_environment.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #include #include @@ -59,7 +60,7 @@ bool _fileExists(const std::string& filePath) { } void XarEnvironment::setupPythonApp() { - TORCH_CHECK( + MULTIPY_CHECK( !alreadySetupPythonApp_, "Already setup the python application. 
It should only been done once!"); @@ -67,7 +68,8 @@ void XarEnvironment::setupPythonApp() { constexpr const char* SECTION_NAME = ".torch_deploy_payload.unity"; ElfFile elfFile(exePath_.c_str()); auto payloadSection = elfFile.findSection(SECTION_NAME); - TORCH_CHECK(payloadSection != at::nullopt, "Missing the payload section"); + MULTIPY_CHECK( + payloadSection != multipy::nullopt, "Missing the payload section"); const char* pythonAppPkgStart = payloadSection->start; auto pythonAppPkgSize = payloadSection->len; LOG(INFO) << "Embedded binary size " << pythonAppPkgSize; @@ -107,23 +109,26 @@ void XarEnvironment::setupPythonApp() { * past runs. It should be pretty safe to discard them. */ std::string rmCmd = fmt::format("rm -rf {}", pythonAppDir_); - TORCH_CHECK(system(rmCmd.c_str()) == 0, "Fail to remove the directory."); + MULTIPY_CHECK(system(rmCmd.c_str()) == 0, "Fail to remove the directory."); // recreate the directory auto r = mkdir(pythonAppDir_.c_str(), 0777); - TORCH_CHECK(r == 0, "Failed to create directory: ", strerror(errno)); + MULTIPY_CHECK(r == 0, "Failed to create directory: " + strerror(errno)); std::string pythonAppArchive = std::string(pythonAppDir_) + "/python_app.xar"; auto fp = fopen(pythonAppArchive.c_str(), "wb"); - TORCH_CHECK(fp != nullptr, "Fail to create file: ", strerror(errno)); + MULTIPY_CHECK(fp != nullptr, "Fail to create file: " + strerror(errno)); auto written = fwrite(pythonAppPkgStart, 1, pythonAppPkgSize, fp); - TORCH_CHECK(written == pythonAppPkgSize, "Expected written == size"); + MULTIPY_CHECK(written == pythonAppPkgSize, "Expected written == size"); fclose(fp); std::string extractCommand = fmt::format( "unsquashfs -o 4096 -d {} {}", pythonAppRoot_, pythonAppArchive); r = system(extractCommand.c_str()); - TORCH_CHECK(r == 0, "Fail to extract the python package"); + MULTIPY_CHECK( + r == 0, + "Fail to extract the python package" + std::to_string(r) + + extractCommand.c_str()); alreadySetupPythonApp_ = true; } @@ -143,12 +148,9 @@ void XarEnvironment::preloadSharedLibraries() { << " does not exist in the python app root, skip loading it"; continue; } - TORCH_CHECK( + MULTIPY_CHECK( dlopen(preloadList[i], RTLD_GLOBAL | RTLD_LAZY) != nullptr, - "Fail to open the shared library ", - preloadList[i], - ": ", - dlerror()); + "Fail to open the shared library " + preloadList[i] + ": " + dlerror()); } } diff --git a/torch/csrc/distributed/c10d/NCCLUtils.hpp b/torch/csrc/distributed/c10d/NCCLUtils.hpp index 9dabc0c8c3fc35..7ca54d167eadc5 100644 --- a/torch/csrc/distributed/c10d/NCCLUtils.hpp +++ b/torch/csrc/distributed/c10d/NCCLUtils.hpp @@ -25,7 +25,8 @@ const inline char* getNcclErrorDetailStr(ncclResult_t error, c10::optional #include +#include namespace c10d { @@ -180,4 +181,8 @@ ProcessGroup::ProcessGroup(int rank, int size) ProcessGroup::~ProcessGroup() {} +void ProcessGroup::init() { + C10_LOG_API_USAGE_ONCE(fmt::format("c10d.process_group_{}", getBackendName())); +} + } // namespace c10d diff --git a/torch/csrc/distributed/c10d/ProcessGroup.hpp b/torch/csrc/distributed/c10d/ProcessGroup.hpp index f66919a63b4401..f2418eb4bb9ac7 100644 --- a/torch/csrc/distributed/c10d/ProcessGroup.hpp +++ b/torch/csrc/distributed/c10d/ProcessGroup.hpp @@ -427,6 +427,10 @@ class TORCH_API ProcessGroup : public torch::CustomClassHolder { } protected: + // Implementations of this interface need to call this to setup + // appropriate logging etc. + void init(); + const int rank_; const int size_; // Optional sequence number structure for matching collectives. 
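Aside: the new protected ProcessGroup::init() hook above exists so that every backend reports itself once through the API-usage logger, and the Gloo, MPI, and NCCL constructors in the hunks below are updated to call it as their last construction step. The snippet below is a minimal, self-contained analogue of that convention, not the real c10d classes: ProcessGroupBase and ProcessGroupExample are invented names, and std::call_once stands in for C10_LOG_API_USAGE_ONCE at a single call site (at most one log per process).

#include <iostream>
#include <mutex>
#include <string>

class ProcessGroupBase {
 public:
  virtual ~ProcessGroupBase() = default;
  virtual const std::string getBackendName() const = 0;

 protected:
  // Derived backends call this once their own setup is complete.
  void init() {
    static std::once_flag logged;
    std::call_once(logged, [this] {
      std::cout << "c10d.process_group_" << getBackendName() << "\n";
    });
  }
};

class ProcessGroupExample : public ProcessGroupBase {
 public:
  ProcessGroupExample() {
    // ... backend-specific setup (threads, stores, communicators) ...
    init(); // last step of the constructor, as in the Gloo/MPI/NCCL changes below
  }
  const std::string getBackendName() const override {
    return "example";
  }
};

int main() {
  ProcessGroupExample pg;
  (void)pg;
}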
diff --git a/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp b/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp index 1297af592d98e8..0a33784ecaed80 100644 --- a/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp +++ b/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp @@ -763,6 +763,8 @@ ProcessGroupGloo::ProcessGroupGloo( for(const auto i : c10::irange(threads_.size())) { threads_[i] = std::thread(&ProcessGroupGloo::runLoop, this, i); } + + init(); } ProcessGroupGloo::~ProcessGroupGloo() { diff --git a/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp b/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp index 714f3a84deb61f..55d7d7c50441ee 100644 --- a/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp +++ b/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp @@ -310,6 +310,8 @@ ProcessGroupMPI::ProcessGroupMPI(int rank, int size, MPI_Comm pgComm) // Start the worker thread accepting MPI calls workerThread_ = std::thread(&ProcessGroupMPI::runLoop, this); + + init(); } ProcessGroupMPI::~ProcessGroupMPI() { diff --git a/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp b/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp index c397937ab79cb3..86d7897f558b14 100644 --- a/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp +++ b/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp @@ -1,4 +1,5 @@ #include +#include #include #ifdef USE_C10D_NCCL @@ -282,7 +283,8 @@ ProcessGroupNCCL::WorkNCCL::WorkNCCL( OpType opType, uint64_t seq, const char* profilingTitle, - const c10::optional>& inputs) + const c10::optional>& inputs, + bool desyncDebug) : Work(rank, opType, profilingTitle, inputs), devices_(devices), workStartTime_(std::chrono::steady_clock::now()), @@ -290,8 +292,10 @@ ProcessGroupNCCL::WorkNCCL::WorkNCCL( // Creates the CUDA event wrappers // Note: The actual events are lazily created when first recorded to with // DEFAULT_FLAGS = cudaEventDisableTiming. 
- ncclStartEvents_ = - std::make_shared>(devices.size()); + if (desyncDebug) { + ncclStartEvents_ = + std::make_shared>(devices.size()); + } ncclEndEvents_ = std::make_shared>(devices.size()); ncclComms_.resize(devices.size()); @@ -373,11 +377,20 @@ bool ProcessGroupNCCL::WorkNCCL::startedGPUExecutionInternal() const { } bool ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const { - for (const auto i : c10::irange(devices_.size())) { - // Checking the work's corresponding CUDA events' status - if (!(*ncclEndEvents_)[i].query()) { - return false; + try { + for (const auto i : c10::irange(devices_.size())) { + // Checking the work's corresponding CUDA events' status + if (!(*ncclEndEvents_)[i].query()) { + return false; + } + } + } catch (const std::exception& e) { + if (std::string(e.what()).find("driver shutting down") == std::string::npos) { + throw; } + LOG(INFO) << "[Rank " << rank_ + << "] Event query failed with exception: " + << e.what(); } return true; } @@ -537,9 +550,7 @@ ProcessGroupNCCL::ProcessGroupNCCL( "ProcessGroupNCCL is only supported with GPUs, no GPUs found!"); blockingWait_ = parseEnvVarFlag(NCCL_BLOCKING_WAIT); asyncErrorHandling_ = parseEnvVarFlag(NCCL_ASYNC_ERROR_HANDLING); - // Infer desync debug from whether TORCH_DISTRIBUTED_DEBUG >= INFO - // Provide backward support of NCCL_DESYNC_DEBUG - desyncDebug_ = dist_debug_level_ >= DebugLevel::Info || parseEnvVarFlag(NCCL_DESYNC_DEBUG); + desyncDebug_ = parseEnvVarFlag(NCCL_DESYNC_DEBUG); if (blockingWait_) { if (asyncErrorHandling_ || desyncDebug_) { @@ -578,20 +589,25 @@ ProcessGroupNCCL::ProcessGroupNCCL( workCleanupThread_ = std::thread(&ProcessGroupNCCL::workCleanupLoop, this); } - const char* ncclDebugLevel = std::getenv("NCCL_DEBUG"); - - if (!ncclDebugLevel) { - ncclDebugLevel = "UNSET"; - } - + init(); LOG(INFO) << "[Rank " << rank_ << "] ProcessGroupNCCL initialized with following options:" << "\nNCCL_ASYNC_ERROR_HANDLING: " << asyncErrorHandling_ + << "\nNCCL_DESYNC_DEBUG: " << desyncDebug_ << "\nNCCL_BLOCKING_WAIT: " << blockingWait_ << "\nTIMEOUT(ms): " << options_->timeout.count() << "\nUSE_HIGH_PRIORITY_STREAM: " - << options_->is_high_priority_stream - << "\nNCCL_DEBUG: " << ncclDebugLevel; + << options_->is_high_priority_stream; + +#ifdef USE_NCCL_WITH_UCC + static std::once_flag initialize_ucc_lib_flag; + std::call_once(initialize_ucc_lib_flag, [&]{ + uccLib_ = loadTorchUCC(); + if (uccLib_ != nullptr) { + LOG(INFO) << "[Rank " << rank_ << "] torch_ucc.so loaded"; + } + }); +#endif } void ProcessGroupNCCL::runHealthCheck() { @@ -1166,6 +1182,12 @@ std::vector>& ProcessGroupNCCL::getNCCLComm( // [Note 2 ] C10D_NCCL_CHECK(ncclGroupEnd(), c10::nullopt); + // At this point NCCL should have been initialized, hence we can accurately get + // the env value even if NCCL sets it by reading from nccl.conf file + if (getRank() == 0) { + LOG(INFO) << "NCCL_DEBUG: " << parse_env("NCCL_DEBUG"); + } + // See [Group Start/End Note] for (const auto i : c10::irange(ncclActiveGroupCounter_)) { (void)i; @@ -1338,7 +1360,8 @@ c10::intrusive_ptr ProcessGroupNCCL::initWork( opType, seq_, profilingTitle, - inputs); + inputs, + desyncDebug_); } std::vector ProcessGroupNCCL::WorkNCCL::result() { @@ -2259,6 +2282,9 @@ c10::intrusive_ptr ProcessGroupNCCL::gather( invalidArgument("requires empty output on non-root"); } outputs = {}; + // append a empty tensor to the list, we don't use it but + // collective function requires it to invoke its macros + outputs.emplace_back(); } return collective( @@ -2405,6 +2431,10 @@ 
c10::intrusive_ptr ProcessGroupNCCL::_allgather_base( "nccl:_all_gather_base"); } +#ifdef USE_NCCL_WITH_UCC +std::shared_ptr ProcessGroupNCCL::uccLib_ = nullptr; +#endif + } // namespace c10d #endif // USE_C10D_NCCL diff --git a/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp b/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp index d13b683a2a33c0..89f0b4b813d859 100644 --- a/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp +++ b/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp @@ -12,7 +12,9 @@ #include #include #include +#include +#include #include #include #include @@ -89,7 +91,8 @@ class TORCH_API ProcessGroupNCCL : public ProcessGroup { OpType opType, uint64_t seq, const char* profilingTitle = nullptr, - const c10::optional>& inputs = c10::nullopt); + const c10::optional>& inputs = c10::nullopt, + bool desyncDebug = false); // Copy constructor doing partial copy without outputs_. Cleanup thread // monitors and removes finished works. However it will deadlock when // destructs outputs_ tensors who are view tensors in autograd graph. @@ -622,6 +625,11 @@ class TORCH_API ProcessGroupNCCL : public ProcessGroup { // Counting for the sequential number of NCCL collective call. uint64_t seq_{0}; + +#ifdef USE_NCCL_WITH_UCC + // ProcessGroupUCC shared library handle + static std::shared_ptr uccLib_; +#endif }; } // namespace c10d diff --git a/torch/csrc/distributed/c10d/UCCForNCCL.hpp b/torch/csrc/distributed/c10d/UCCForNCCL.hpp new file mode 100644 index 00000000000000..ce38894faebc13 --- /dev/null +++ b/torch/csrc/distributed/c10d/UCCForNCCL.hpp @@ -0,0 +1,25 @@ +#pragma once + +#include +#include +#include +#include + +#include + +namespace c10d { + +inline std::shared_ptr loadTorchUCC() { + const char *path = std::getenv("TORCH_UCC_LIBRARY_PATH"); + if (path != nullptr) { + try { + return std::make_shared(path); + } catch (const c10::DynamicLibraryError &e) { + TORCH_WARN("TORCH_UCC_LIBRARY_PATH is set, " + "but the loading of torch_ucc.so failed with:", e.msg()); + } + } + return nullptr; +} + +} // namespace c10d diff --git a/torch/csrc/distributed/c10d/Utils.hpp b/torch/csrc/distributed/c10d/Utils.hpp index efa0a7e7ff687b..501993a728b7e1 100644 --- a/torch/csrc/distributed/c10d/Utils.hpp +++ b/torch/csrc/distributed/c10d/Utils.hpp @@ -407,7 +407,7 @@ inline void checkSplitSizes( "Tensor's dim 0 does not divide equally across group size"); } else { TORCH_CHECK( - split_sizes.size() == group_size, + split_sizes.size() == static_cast(group_size), "Number of tensor splits not equal to group size"); const auto sum = c10::sum_integers(split_sizes); TORCH_CHECK( diff --git a/torch/csrc/distributed/c10d/debug.h b/torch/csrc/distributed/c10d/debug.h index 7c326b2380eff4..ecfb4944829570 100644 --- a/torch/csrc/distributed/c10d/debug.h +++ b/torch/csrc/distributed/c10d/debug.h @@ -11,9 +11,9 @@ namespace c10d { enum class DebugLevel { - Off = 0, - Info = 1, - Detail = 2 + Off, + Info, + Detail }; TORCH_API void setDebugLevel(DebugLevel level); diff --git a/torch/csrc/distributed/c10d/reducer.cpp b/torch/csrc/distributed/c10d/reducer.cpp index 815e36cfa263eb..777abfaf2f0f0e 100644 --- a/torch/csrc/distributed/c10d/reducer.cpp +++ b/torch/csrc/distributed/c10d/reducer.cpp @@ -2052,6 +2052,47 @@ void verify_params_across_processes( const c10::intrusive_ptr& process_group, const std::vector& params, const c10::optional>& logger) { + + // First verify number of parameters to avoid inconsistent inputs into + // broadcast which can cause a crash. 
+ // See https://github.com/pytorch/pytorch/issues/73547 + + at::TensorOptions param_size_options; + param_size_options = param_size_options.dtype(at::kLong); + param_size_options = param_size_options.device(params[0].device()); + // Note: Not using tensor building API because of + // https://github.com/pytorch/pytorch/issues/74114 + at::Tensor param_size_tensor = at::tensor( + {static_cast(params.size())}, param_size_options); + + // Allgather and verify parameter size. + std::vector> param_size_output_tensors; + param_size_output_tensors.emplace_back(std::vector{}); + auto world_size = process_group->getSize(); + for (size_t i = 0 ; i < world_size ; ++i) { + param_size_output_tensors.front().emplace_back( + at::empty_like(param_size_tensor) + ); + } + + std::vector param_size_vec{param_size_tensor}; + process_group->allgather(param_size_output_tensors, param_size_vec)->wait(); + auto result_size_tensors = param_size_output_tensors.front(); + for (size_t i = 0; i < world_size ; ++i ) { + auto param_size_for_rank = result_size_tensors[i][0].item(); + TORCH_CHECK( + param_size_for_rank == params.size(), + c10::str( + "DDP expects same model across all ranks, but Rank ", + process_group->getRank(), + " has ", params.size(), " params, while rank ", i, + " has inconsistent ", param_size_for_rank, + " params." + ) + ); + } + + // Continue with parameter shape verification. size_t i = 0; for (const auto& t : params) { i += 2 * t.dim(); @@ -2085,10 +2126,9 @@ void verify_params_across_processes( i = 0; for (const auto p : c10::irange(params.size())) { const auto& t = params[p]; - // I'd like to include which process we are in the message, - // but ProcessGroup::getRank is not public! for (const auto& sz : t.sizes()) { - auto msg = c10::str("params[", p, "] in this process", + auto msg = c10::str("[", process_group->getRank(), + "]: params[", p, "] in this process", " with sizes ", t.sizes(), " appears not to match sizes of the same param in process 0."); diff --git a/torch/csrc/distributed/c10d/reducer.hpp b/torch/csrc/distributed/c10d/reducer.hpp index ecf92a1893ff2f..adc021ca381439 100644 --- a/torch/csrc/distributed/c10d/reducer.hpp +++ b/torch/csrc/distributed/c10d/reducer.hpp @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -28,77 +29,10 @@ constexpr int kDefaultFirstBucketBytes = int(1024 * 1024); constexpr int kDefaultBucketBytesCap = int(25 * 1024 * 1024); // Collect runtime stats once for every kDDPRuntimeLoggingSampleRate iterations. constexpr int kDDPRuntimeLoggingSampleRate = 100; -constexpr int kUnsetTime = -1; - -inline int64_t current_time_in_nanos() { - return torch::profiler::impl::getTime(); -} // Forward declaration class Logger; -class TORCH_API Timer { - private: - // The timestamp of forward call start time in each iteration. - int64_t forward_start_time = kUnsetTime; - // The timestamp of backward computation start and end time in each - // iteration. - int64_t backward_compute_start_time = kUnsetTime; - int64_t backward_compute_end_time = kUnsetTime; - // The timestamp of first communication call start time in each iteration. - int64_t backward_comm_start_time = kUnsetTime; - // The timestamp of last communication call end time in each iteration. - int64_t backward_comm_end_time = kUnsetTime; - public: - enum class Event { - kForwardStart, - kBackwardComputeStart, - kBackwardComputeEnd, - kBackwardCommStart, - kBackwardCommEnd, - }; - - // Record the current event, i.e., mark it as having occurred now. Default - // CPU implementation. 
- virtual void record(Event event) { - getTimeRef(event) = current_time_in_nanos(); - } - - // Return the difference between when two events occurred, in nanoseconds. - // Or nullopt if one of them hasn't been recorded. - virtual c10::optional measureDifference(Event start, Event end) = 0; - - virtual ~Timer() = default; - - // Return host-side timestamp, or nullopt if it has not yet been recorded. - c10::optional getTimestamp(Event event) { - auto time = getTimeRef(event); - if (time == kUnsetTime) { - return c10::nullopt; - } else { - return time; - } - } - - // Return host-side time member variable corresponding to the given event. - int64_t& getTimeRef(Event event) { - switch (event) { - case Event::kForwardStart: - return forward_start_time; - case Event::kBackwardComputeStart: - return backward_compute_start_time; - case Event::kBackwardComputeEnd: - return backward_compute_end_time; - case Event::kBackwardCommStart: - return backward_comm_start_time; - case Event::kBackwardCommEnd: - return backward_comm_end_time; - default: - TORCH_INTERNAL_ASSERT(false); - } - } -}; - // Local accumulator type for a single bucket. struct BucketAccumulator { std::vector indices; @@ -106,8 +40,6 @@ struct BucketAccumulator { size_t size_limit = 0; }; -C10_DECLARE_TYPED_REGISTRY(TimerRegistry, c10::DeviceType, Timer, std::unique_ptr, c10::Device); - class TORCH_API Reducer { public: // The constructor takes a list of variables (i.e. parameters) for this diff --git a/torch/csrc/distributed/c10d/reducer_cuda.cpp b/torch/csrc/distributed/c10d/reducer_cuda.cpp index b836cddd8017c9..a1c570da5d59ab 100644 --- a/torch/csrc/distributed/c10d/reducer_cuda.cpp +++ b/torch/csrc/distributed/c10d/reducer_cuda.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include diff --git a/torch/csrc/distributed/c10d/reducer_timer.hpp b/torch/csrc/distributed/c10d/reducer_timer.hpp new file mode 100644 index 00000000000000..ba696383b88e7f --- /dev/null +++ b/torch/csrc/distributed/c10d/reducer_timer.hpp @@ -0,0 +1,75 @@ +#pragma once +#include + +namespace c10d { +constexpr int kUnsetTime = -1; + +inline int64_t current_time_in_nanos() { + return torch::profiler::impl::getTime(); +} + +class TORCH_API Timer { + private: + // The timestamp of forward call start time in each iteration. + int64_t forward_start_time = kUnsetTime; + // The timestamp of backward computation start and end time in each + // iteration. + int64_t backward_compute_start_time = kUnsetTime; + int64_t backward_compute_end_time = kUnsetTime; + // The timestamp of first communication call start time in each iteration. + int64_t backward_comm_start_time = kUnsetTime; + // The timestamp of last communication call end time in each iteration. + int64_t backward_comm_end_time = kUnsetTime; + + public: + enum class Event { + kForwardStart, + kBackwardComputeStart, + kBackwardComputeEnd, + kBackwardCommStart, + kBackwardCommEnd, + }; + + // Record the current event, i.e., mark it as having occurred now. Default + // CPU implementation. + virtual void record(Event event) { + getTimeRef(event) = current_time_in_nanos(); + } + + // Return the difference between when two events occurred, in nanoseconds. + // Or nullopt if one of them hasn't been recorded. + virtual c10::optional measureDifference(Event start, Event end) = 0; + + virtual ~Timer() = default; + + // Return host-side timestamp, or nullopt if it has not yet been recorded. 
+ c10::optional getTimestamp(Event event) { + auto time = getTimeRef(event); + if (time == kUnsetTime) { + return c10::nullopt; + } else { + return time; + } + } + + // Return host-side time member variable corresponding to the given event. + int64_t& getTimeRef(Event event) { + switch (event) { + case Event::kForwardStart: + return forward_start_time; + case Event::kBackwardComputeStart: + return backward_compute_start_time; + case Event::kBackwardComputeEnd: + return backward_compute_end_time; + case Event::kBackwardCommStart: + return backward_comm_start_time; + case Event::kBackwardCommEnd: + return backward_comm_end_time; + default: + TORCH_INTERNAL_ASSERT(false); + } + } +}; + +C10_DECLARE_TYPED_REGISTRY(TimerRegistry, c10::DeviceType, Timer, std::unique_ptr, c10::Device); +} // namespace c10d diff --git a/torch/csrc/distributed/c10d/socket.cpp b/torch/csrc/distributed/c10d/socket.cpp index af09473a36ac82..acd819ab631cda 100644 --- a/torch/csrc/distributed/c10d/socket.cpp +++ b/torch/csrc/distributed/c10d/socket.cpp @@ -613,28 +613,11 @@ std::unique_ptr SocketConnectOp::run() { } bool SocketConnectOp::tryConnect(int family) { - ::addrinfo hints{}, *naked_result = nullptr; - + ::addrinfo hints{}; hints.ai_flags = AI_V4MAPPED | AI_ALL | AI_NUMERICSERV; hints.ai_family = family; hints.ai_socktype = SOCK_STREAM; - int r = ::getaddrinfo(host_, port_.c_str(), &hints, &naked_result); - if (r != 0) { - const char* gai_err = ::gai_strerror(r); - - recordError("The {}network addresses of ({}, {}) cannot be retrieved (gai error: {} - {}).", - family == AF_INET ? "IPv4 " : family == AF_INET6 ? "IPv6 " : "", - host_, - port_, - r, - gai_err); - - return false; - } - - addrinfo_ptr result{naked_result}; - deadline_ = Clock::now() + opts_->connect_timeout(); std::size_t retry_attempt = 1; @@ -645,16 +628,33 @@ bool SocketConnectOp::tryConnect(int family) { errors_.clear(); - for (::addrinfo* addr = naked_result; addr != nullptr; addr = addr->ai_next) { - C10D_TRACE("The client socket is attempting to connect to {}.", *addr); + ::addrinfo *naked_result = nullptr; + // patternlint-disable cpp-dns-deps + int r = ::getaddrinfo(host_, port_.c_str(), &hints, &naked_result); + if (r != 0) { + const char* gai_err = ::gai_strerror(r); + + recordError("The {}network addresses of ({}, {}) cannot be retrieved (gai error: {} - {}).", + family == AF_INET ? "IPv4 " : family == AF_INET6 ? 
"IPv6 " : "", + host_, + port_, + r, + gai_err); + retry = true; + } else { + addrinfo_ptr result{naked_result}; + + for (::addrinfo* addr = naked_result; addr != nullptr; addr = addr->ai_next) { + C10D_TRACE("The client socket is attempting to connect to {}.", *addr); - ConnectResult cr = tryConnect(*addr); - if (cr == ConnectResult::Success) { - return true; - } + ConnectResult cr = tryConnect(*addr); + if (cr == ConnectResult::Success) { + return true; + } - if (cr == ConnectResult::Retry) { - retry = true; + if (cr == ConnectResult::Retry) { + retry = true; + } } } diff --git a/torch/csrc/distributed/rpc/agent_utils.cpp b/torch/csrc/distributed/rpc/agent_utils.cpp index 45ffb2903bb0ca..3aa11961b0d671 100644 --- a/torch/csrc/distributed/rpc/agent_utils.cpp +++ b/torch/csrc/distributed/rpc/agent_utils.cpp @@ -41,6 +41,89 @@ std::unordered_map collectNames( return nameToId; } +std::vector splitString( + const std::string& s, + const std::string& delim) { + std::vector tokens; + size_t start = 0; + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + size_t end; + while ((end = s.find(delim, start)) != std::string::npos) { + tokens.emplace_back(s.substr(start, end - start)); + start = end + delim.length(); + } + tokens.emplace_back(s.substr(start)); + return tokens; +} + +std::unordered_map collectCurrentNames( + ::c10d::PrefixStore store, + const worker_id_t selfId, + const std::string& selfName) { + std::vector selfNameVector( + (uint8_t*)selfName.c_str(), + (uint8_t*)selfName.c_str() + selfName.length()); + + // Check that ID does not already exist and set {ID : NAME} + std::vector resultVector = store.compareSet( + c10::to_string(selfId), std::vector(), selfNameVector); + TORCH_CHECK( + resultVector == selfNameVector, + "RPC worker id ", + selfId, + " is not unique. Worker ", + resultVector, + " and already has ID and ", + selfNameVector, + " cannot be added."); + + store.set(c10::to_string(selfId), selfNameVector); + + std::unordered_map nameToId; + nameToId.emplace(selfName, selfId); + + // Check to see if there is list of worker names in the store + std::string allWorkerInfosKey("AllWorkerInfos"); + bool worker_names_available = + store.check(std::vector{allWorkerInfosKey}); + std::string allWorkerInfos; + if (worker_names_available) { + // Get the current list of workers + std::vector allWorkerInfosKeyVector = store.get(allWorkerInfosKey); + allWorkerInfos = std::string( + (char*)allWorkerInfosKeyVector.data(), allWorkerInfosKeyVector.size()); + // workerInfos are comma separated, (e.g. + // "Name1-Rank1,Name2-Rank2,Name3-Rank2") parse list of workers + for (const std::string& workerInfo : splitString(allWorkerInfos, ",")) { + auto workerInfoVec = splitString(workerInfo, "-"); + std::string workerName = workerInfoVec.at(0); + int workerId = std::stoi(workerInfoVec.at(1)); + + TORCH_CHECK( + nameToId.find(workerName) == nameToId.end(), + "RPC worker name ", + workerName, + " is not unique. 
Workers ", + nameToId.find(workerName)->second, + " and ", + workerId, + " share the same name."); + + nameToId.emplace(workerName, workerId); + } + allWorkerInfos = fmt::format("{},{}-{}", allWorkerInfos, selfName, selfId); + } else { + // Add own name to worker list + allWorkerInfos = fmt::format("{}-{}", selfName, selfId); + } + std::vector allWorkerInfosVector( + (uint8_t*)allWorkerInfos.c_str(), + (uint8_t*)allWorkerInfos.c_str() + allWorkerInfos.length()); + store.set(allWorkerInfosKey, allWorkerInfosVector); + + return nameToId; +} + const string storeKeyBarrierId = "_ID_"; const string storeKeyProcessCount = "PROCESS_COUNT"; const string storeKeyActiveCallCount = "ACTIVE_CALLS"; diff --git a/torch/csrc/distributed/rpc/agent_utils.h b/torch/csrc/distributed/rpc/agent_utils.h index befa26b8603754..d7e63dd033f74e 100644 --- a/torch/csrc/distributed/rpc/agent_utils.h +++ b/torch/csrc/distributed/rpc/agent_utils.h @@ -16,6 +16,16 @@ std::unordered_map collectNames( const std::string& selfName, const int worldSize); +// Ranks in dynamic RPC groups will initially call into this to establish the +// name-to-id mapping for the current peers in the group. The current rank will +// put its own worker info in the store and discover all the ranks that came +// before it. NOTE: This needs to be called with the Dynamic RPC group +// membership management token held. +std::unordered_map collectCurrentNames( + ::c10d::PrefixStore store, + const worker_id_t selfId, + const std::string& selfName); + // This performs a synchronization of all call counts by using store. // All RPC peers wait for others to join to exit at the same time. int syncCallCount( diff --git a/torch/csrc/distributed/rpc/init.cpp b/torch/csrc/distributed/rpc/init.cpp index 8c16d87f9ee9b2..0552c9c641148d 100644 --- a/torch/csrc/distributed/rpc/init.cpp +++ b/torch/csrc/distributed/rpc/init.cpp @@ -576,7 +576,7 @@ PyObject* rpc_init(PyObject* _unused, PyObject* noargs) { [](const c10::intrusive_ptr<::c10d::Store>& store, std::string selfName, worker_id_t selfId, - int worldSize, + optional worldSize, TensorPipeRpcBackendOptions opts, std::unordered_map reverseDeviceMaps, std::vector devices) { diff --git a/torch/csrc/distributed/rpc/tensorpipe_agent.cpp b/torch/csrc/distributed/rpc/tensorpipe_agent.cpp index aaaf3c673f7557..d2f753a9edcbe1 100644 --- a/torch/csrc/distributed/rpc/tensorpipe_agent.cpp +++ b/torch/csrc/distributed/rpc/tensorpipe_agent.cpp @@ -342,9 +342,15 @@ void TensorPipeAgent::removeFromTimeoutMap(uint64_t messageId) { } } -void TensorPipeAgent::prepareNames() { - auto nameToId = collectNames( - rankToNameStore_, workerInfo_.id_, workerInfo_.name_, worldSize_); +void TensorPipeAgent::prepareNames(bool isStaticGroup) { + std::unordered_map nameToId; + if (isStaticGroup) { + nameToId = collectNames( + rankToNameStore_, workerInfo_.id_, workerInfo_.name_, worldSize_); + } else { + nameToId = collectCurrentNames( + rankToNameStore_, workerInfo_.id_, workerInfo_.name_); + } for (const auto& entry : nameToId) { const auto& workerName = entry.first; @@ -354,11 +360,35 @@ void TensorPipeAgent::prepareNames() { } } +void TensorPipeAgent::checkAndSetStaticGroup( + const c10::intrusive_ptr<::c10d::Store>& store) { + std::string isStaticGroupKey("rpcIsStaticGroup"); + + std::string isStaticGroupStr = isStaticGroup_ ? 
"true" : "false"; + std::vector isStaticGroupVec( + (uint8_t*)isStaticGroupStr.c_str(), + (uint8_t*)isStaticGroupStr.c_str() + isStaticGroupStr.length()); + std::vector returnedVec; + returnedVec = store->compareSet( + isStaticGroupKey, std::vector(), isStaticGroupVec); + std::string returnedVal = std::string(returnedVec.begin(), returnedVec.end()); + // In both cases, the returned value should be the value of isStaticGroupStr, + // otherwise there is a discrepency with initialization among one of the + // members + TORCH_CHECK( + returnedVal == isStaticGroupStr, + fmt::format( + "RPC group mixes statically and dynamically initialized members which is not supported. ", + "Static group property is initialized as {} and is trying to be set as {} ", + isStaticGroup_, + returnedVal)); +} + TensorPipeAgent::TensorPipeAgent( const c10::intrusive_ptr<::c10d::Store>& store, std::string selfName, worker_id_t selfId, - int worldSize, + optional worldSize, TensorPipeRpcBackendOptions opts, std::unordered_map reverseDeviceMaps, std::vector devices, @@ -377,9 +407,16 @@ TensorPipeAgent::TensorPipeAgent( rankToNameStore_("names", store), nameToAddressStore_("addrs", store), shutdownStore_("shutdown", store), - worldSize_(worldSize) { + isStaticGroup_(worldSize.has_value()) { + if (isStaticGroup_) { + worldSize_ = worldSize.value(); + } + + // check the static group attribute against store + checkAndSetStaticGroup(store); + // collect worker names - prepareNames(); + prepareNames(isStaticGroup_); // Initialize the time-series metrics tracking map timeSeriesMetrics_.emplace(kGilAverageWaitTime, TimeSeriesMetricsTracker()); diff --git a/torch/csrc/distributed/rpc/tensorpipe_agent.h b/torch/csrc/distributed/rpc/tensorpipe_agent.h index b76e1a099bebd5..4d667f02961fc7 100644 --- a/torch/csrc/distributed/rpc/tensorpipe_agent.h +++ b/torch/csrc/distributed/rpc/tensorpipe_agent.h @@ -165,7 +165,7 @@ class TORCH_API TensorPipeAgent : public RpcAgent { const c10::intrusive_ptr<::c10d::Store>& store, std::string selfName, worker_id_t selfId, - int worldSize, + optional worldSize, TensorPipeRpcBackendOptions opts, std::unordered_map reverseDeviceMaps, std::vector devices, @@ -233,7 +233,10 @@ class TORCH_API TensorPipeAgent : public RpcAgent { void removeFromTimeoutMap(uint64_t messageId); // Populates workerIdToInfo_ and workerNameToInfo_ using addressStore_ - void prepareNames(); + void prepareNames(bool isStaticGroup); + + // Check the static group attribute with the value set in store + void checkAndSetStaticGroup(const c10::intrusive_ptr<::c10d::Store>& store); const std::string& findWorkerURL(const WorkerInfo& worker) const; @@ -331,7 +334,8 @@ class TORCH_API TensorPipeAgent : public RpcAgent { // Store keys that will used to count joined processes and active calls during // the shutdown process ::c10d::PrefixStore shutdownStore_; - const int worldSize_; + int worldSize_ = 0; + const bool isStaticGroup_; std::atomic nextMessageID_{0}; diff --git a/torch/csrc/distributed/rpc/torchscript_functions.cpp b/torch/csrc/distributed/rpc/torchscript_functions.cpp index 464a290de1dc96..8afbc813591442 100644 --- a/torch/csrc/distributed/rpc/torchscript_functions.cpp +++ b/torch/csrc/distributed/rpc/torchscript_functions.cpp @@ -21,10 +21,7 @@ c10::intrusive_ptr rpcTorchscript( std::vector& stack, const float rpcTimeoutSeconds, const bool isAsyncExecution) { - // This dummy tensor holds an at::RecordFunction when profiling is enabled. 
- // This is because at::RecordFunction is not yet registered as a TorchScript - // custom class (https://github.com/pytorch/pytorch/issues/35026) - at::Tensor handle = at::zeros(1); + c10::intrusive_ptr record; auto shouldProfile = torch::autograd::profiler::profilerEnabled() && !torch::distributed::rpc::RemoteProfilerManager::getInstance() .isCurrentKeySet(); @@ -35,7 +32,8 @@ c10::intrusive_ptr rpcTorchscript( .qualifiedName(), /* name of torchscript function being run */ RpcAgent::getCurrentRpcAgent()->getWorkerInfo().name_, dstWorkerName); - handle = torch::autograd::profiler::record_function_enter(rpcAsyncJitKey); + record = + torch::autograd::profiler::record_function_enter_new(rpcAsyncJitKey); auto& remoteProfilerManager = torch::distributed::rpc::RemoteProfilerManager::getInstance(); remoteProfilerManager.setCurrentKey(rpcAsyncJitKey); @@ -75,7 +73,8 @@ c10::intrusive_ptr rpcTorchscript( })); if (shouldProfile) { auto profiledFutPtr = - torch::autograd::profiler::_call_end_callbacks_on_fut(handle, futPtr); + torch::autograd::profiler::_call_end_callbacks_on_fut_new( + record, futPtr); return profiledFutPtr; } return futPtr; diff --git a/torch/csrc/generic/Storage.cpp b/torch/csrc/generic/Storage.cpp index 99499ef9a01947..4743ba1a862787 100644 --- a/torch/csrc/generic/Storage.cpp +++ b/torch/csrc/generic/Storage.cpp @@ -144,7 +144,7 @@ static PyObject * THPStorage_(get)(THPStorage *self, PyObject *index) int64_t nindex = THPUtils_unpackLong(index); if (nindex < 0) nindex += (self->cdata->nbytes() / sizeof(scalar_t)); - if (nindex < 0 || nindex >= (self->cdata->nbytes() / sizeof(scalar_t))) { + if (nindex < 0 || nindex >= static_cast(self->cdata->nbytes() / sizeof(scalar_t))) { PyErr_SetString(PyExc_IndexError, fmt::format( "index {} out of range for storage of size {}", nindex, self->cdata->nbytes() / sizeof(scalar_t))); diff --git a/torch/csrc/generic/StorageSharing.cpp b/torch/csrc/generic/StorageSharing.cpp index 01cd5c49998b1f..701df7daaa0c14 100644 --- a/torch/csrc/generic/StorageSharing.cpp +++ b/torch/csrc/generic/StorageSharing.cpp @@ -282,13 +282,9 @@ static PyObject * THPStorage_(shareCuda)(PyObject *_self, PyObject *noargs) // NOLINTNEXTLINE(cppcoreguidelines-init-variables) cudaIpcEventHandle_t ipc_event_handle; -#if !defined(USE_ROCM) if (sent_data->event_sync_required_) { C10_CUDA_CHECK(cudaIpcGetEventHandle(&ipc_event_handle, sent_data->event_)); } -#else - // ipc_event_handle unused in storage receiver, we can leave it uninitialized. 
-#endif _event_handle = PyBytes_FromStringAndSize((char *)&ipc_event_handle, CUDA_IPC_HANDLE_SIZE); _event_sync_required = PyBool_FromLong(sent_data->event_sync_required_); @@ -400,7 +396,6 @@ static PyObject * THPStorage_(newSharedCuda)(PyObject *_unused, PyObject *args) int64_t device = THPUtils_unpackLong(_device); at::cuda::CUDAGuard device_guard(device); -#if !defined(USE_ROCM) if (PyObject_IsTrue(_event_sync_required)) { // Ensure that producer prepared all tensor's data std::string s_ipc_event_handle = @@ -413,9 +408,6 @@ static PyObject * THPStorage_(newSharedCuda)(PyObject *_unused, PyObject *args) AT_CUDA_CHECK( cudaStreamWaitEvent(c10::cuda::getCurrentCUDAStream(device), event, 0)); } -#else - // Already synchronized inside producer stream -#endif std::string s_handle = THPStorage_(bytesAsHandleString)(_handle); std::shared_ptr basePtr = c10::cuda::CUDACachingAllocator::getIpcDevPtr(s_handle); diff --git a/torch/csrc/init_flatbuffer_module.cpp b/torch/csrc/init_flatbuffer_module.cpp new file mode 100644 index 00000000000000..22ec10b6f29b78 --- /dev/null +++ b/torch/csrc/init_flatbuffer_module.cpp @@ -0,0 +1,97 @@ +#include + +#include +#include + +#include +#include +#include +#include +#include +#include + +#include // NOLINT +#include +#include +#include +#include +#include + +namespace py = pybind11; + +static std::shared_ptr copyStr(const std::string& bytes) { + size_t size = (bytes.size() / FLATBUFFERS_MAX_ALIGNMENT + 1) * + FLATBUFFERS_MAX_ALIGNMENT; +#ifdef _WIN32 + std::shared_ptr bytes_copy( + static_cast(_aligned_malloc(size, FLATBUFFERS_MAX_ALIGNMENT)), + _aligned_free); +#else + std::shared_ptr bytes_copy( + static_cast(aligned_alloc(FLATBUFFERS_MAX_ALIGNMENT, size)), free); +#endif + memcpy(bytes_copy.get(), bytes.data(), bytes.size()); + return bytes_copy; +} + +extern "C" +#ifdef _WIN32 + __declspec(dllexport) +#endif + PyObject* initModuleFlatbuffer() { + using namespace torch::jit; + PyMethodDef m[] = {{nullptr, nullptr, 0, nullptr}}; // NOLINT + static struct PyModuleDef torchmodule = { + PyModuleDef_HEAD_INIT, + "torch._C_flatbuffer", + nullptr, + -1, + m, + }; // NOLINT + PyObject* module = PyModule_Create(&torchmodule); + auto pym = py::handle(module).cast(); + pym.def("_load_mobile_module_from_file", [](const std::string& filename) { + return torch::jit::load_mobile_module_from_file(filename); + }); + pym.def("_load_mobile_module_from_bytes", [](const std::string& bytes) { + auto bytes_copy = copyStr(bytes); + return torch::jit::parse_and_initialize_mobile_module( + bytes_copy, bytes.size()); + }); + pym.def("_load_jit_module_from_file", [](const std::string& filename) { + ExtraFilesMap extra_files = ExtraFilesMap(); + return torch::jit::load_jit_module_from_file(filename, extra_files); + }); + pym.def("_load_jit_module_from_bytes", [](const std::string& bytes) { + auto bytes_copy = copyStr(bytes); + ExtraFilesMap extra_files = ExtraFilesMap(); + return torch::jit::parse_and_initialize_jit_module( + bytes_copy, bytes.size(), extra_files); + }); + pym.def( + "_save_mobile_module", + [](const torch::jit::mobile::Module& module, + const std::string& filename) { + return torch::jit::save_mobile_module(module, filename); + }); + pym.def( + "_save_jit_module", + [](const torch::jit::Module& module, const std::string& filename) { + return torch::jit::save_jit_module(module, filename); + }); + pym.def( + "_save_mobile_module_to_bytes", + [](const torch::jit::mobile::Module& module) { + auto detached_buffer = torch::jit::save_mobile_module_to_bytes(module); + return 
py::bytes( + reinterpret_cast(detached_buffer.data()), + detached_buffer.size()); + }); + pym.def("_save_jit_module_to_bytes", [](const torch::jit::Module& module) { + auto detached_buffer = torch::jit::save_jit_module_to_bytes(module); + return py::bytes( + reinterpret_cast(detached_buffer.data()), + detached_buffer.size()); + }); + return module; +} diff --git a/torch/csrc/jit/api/function_impl.h b/torch/csrc/jit/api/function_impl.h index c92e46a352e363..d97f3a2c862faa 100644 --- a/torch/csrc/jit/api/function_impl.h +++ b/torch/csrc/jit/api/function_impl.h @@ -13,10 +13,14 @@ struct TORCH_API GraphFunction : public Function { GraphFunction( c10::QualifiedName name, std::shared_ptr graph, - std::function function_creator) + std::function function_creator, + c10::optional executor_execution_mode = + c10::nullopt) : name_(std::move(name)), graph_(std::move(graph)), - function_creator_(std::move(function_creator)) {} + function_creator_(std::move(function_creator)) { + executor_execution_mode_ = executor_execution_mode; + } bool isGraphFunction() const override { return true; @@ -53,6 +57,13 @@ struct TORCH_API GraphFunction : public Function { return name_; } + // private/unstable api. sets the initial execution mode + // will not affect executor if there is an existing executor + // created for this function + void _set_initial_executor_execution_mode(ExecutorExecutionMode mode) { + executor_execution_mode_ = mode; + } + // if this isn't yet defined, run its method_creator function void ensure_defined() override; @@ -92,14 +103,20 @@ struct TORCH_API GraphFunction : public Function { return *executor; } check_single_output(); - executor = GraphExecutor(optimized_graph(), name_.name()); + const std::string& name = name_.name(); + std::shared_ptr opt_graph = optimized_graph(); + if (!executor_execution_mode_) { + executor = GraphExecutor(opt_graph, name); + } else { + executor = GraphExecutor(opt_graph, name, *executor_execution_mode_); + } return *executor; } using Function::call; bool call( Stack& stack, - size_t bailOut, + c10::optional bailOut, c10::function_ref f) override { f(get_executor().getPlanFor(stack, bailOut).code); return true; @@ -128,6 +145,10 @@ struct TORCH_API GraphFunction : public Function { // The original, non-optimized graph std::shared_ptr graph_; // for debugging and for inlining + // allows users to specify Simple/Profiling Executor for function + // TODO: add more executors + mutable c10::optional executor_execution_mode_; + // Optimized graph, computed lazily. Used for inlining. 
mutable std::array< c10::optional>, diff --git a/torch/csrc/jit/api/module.h b/torch/csrc/jit/api/module.h index a040b953be1c23..a6aa49278cbec6 100644 --- a/torch/csrc/jit/api/module.h +++ b/torch/csrc/jit/api/module.h @@ -223,12 +223,14 @@ struct TORCH_API Module : public Object { void _save_for_mobile( std::ostream& out, const ExtraFilesMap& extra_files = ExtraFilesMap(), - bool save_mobile_debug_info = false) const; + bool save_mobile_debug_info = false, + bool use_flatbuffer = false) const; void _save_for_mobile( const std::string& filename, const ExtraFilesMap& extra_files = ExtraFilesMap(), - bool save_mobile_debug_info = false) const; + bool save_mobile_debug_info = false, + bool use_flatbuffer = false) const; Module copy() const; @@ -265,6 +267,10 @@ struct TORCH_API Module : public Object { return _ivalue() == y._ivalue(); } + void set_delete_memory(std::shared_ptr delete_mem) { + mem_to_delete_ = delete_mem; + } + private: Module clone_impl( std::unordered_map& type_remap, @@ -286,6 +292,9 @@ struct TORCH_API Module : public Object { const c10::optional& device, const c10::optional& dtype, bool non_blocking); + + // Extra handle for the module to delete when itself is deleted + std::shared_ptr mem_to_delete_; }; // C++ equivalent api of `torch.jit.freeze`. See documentation there for diff --git a/torch/csrc/jit/api/module_save.cpp b/torch/csrc/jit/api/module_save.cpp index c8afa5efaf3529..912c38612c354b 100644 --- a/torch/csrc/jit/api/module_save.cpp +++ b/torch/csrc/jit/api/module_save.cpp @@ -16,25 +16,29 @@ void Module::save(const std::string& filename, const ExtraFilesMap& extra_files) void Module::_save_for_mobile( std::ostream& out, const ExtraFilesMap& extra_files, - bool save_mobile_debug_info) const { + bool save_mobile_debug_info, + bool use_flatbuffer) const { ExportModule( *this, out, extra_files, true /* bytecode_format */, - save_mobile_debug_info); + save_mobile_debug_info, + use_flatbuffer); } void Module::_save_for_mobile( const std::string& filename, const ExtraFilesMap& extra_files, - bool save_mobile_debug_info) const { + bool save_mobile_debug_info, + bool use_flatbuffer) const { ExportModule( *this, filename, extra_files, true /* bytecode_format */, - save_mobile_debug_info); + save_mobile_debug_info, + use_flatbuffer); } } // namespace jit diff --git a/torch/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp b/torch/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp index 7d9dc18c12589f..ba4a2b25c23a78 100644 --- a/torch/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp +++ b/torch/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp @@ -31,7 +31,7 @@ class NnapiBackend : public PyTorchBackendInterface { c10::impl::GenericDict compile( c10::IValue processed, c10::impl::GenericDict method_compile_spec) override { - // Wrap procesed in dictionary: {"forward": processed} + // Wrap processed in dictionary: {"forward": processed} auto dict = processed.toGenericDict(); c10::Dict handles( c10::StringType::get(), c10::AnyType::get()); @@ -64,7 +64,7 @@ class NnapiBackend : public PyTorchBackendInterface { auto inp_mem_fmts = dict.at("inp_mem_fmts").toIntList(); TORCH_CHECK(tensorInp.size() == inp_mem_fmts.size()); std::vector fixed_inputs; - for (int i = 0; i < tensorInp.size(); i++) { + for (auto i = 0U; i < tensorInp.size(); i++) { int fmt = inp_mem_fmts[i]; // These constants match the values in DimOrder in serializer.py // 0: NCHW, 1: NHWC @@ -84,7 +84,7 @@ class NnapiBackend : public PyTorchBackendInterface { // Adjust output memory formats auto out_mem_fmts = 
dict.at("out_mem_fmts").toIntList(); TORCH_CHECK(outputs.size() == out_mem_fmts.size()); - for (int i = 0; i < outputs.size(); i++) { + for (auto i = 0U; i < outputs.size(); i++) { int fmt = out_mem_fmts[i]; // These constants match the values in DimOrder in serializer.py // 0: NCHW, 1: NHWC diff --git a/torch/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp b/torch/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp index be0dbe18d90d0c..a787ecc6cbfda6 100644 --- a/torch/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp +++ b/torch/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp @@ -96,7 +96,7 @@ c10::IValue preprocess( // transform Python lists to C++ c10::List c10::List weights( py::cast>(nnapi_processed[2])); - for (int i = 0; i < weights.size(); i++) { + for (auto i = 0U; i < weights.size(); i++) { weights.set(i, weights.get(i).contiguous()); } c10::List inp_mem_fmts( diff --git a/torch/csrc/jit/codegen/cuda/README.md b/torch/csrc/jit/codegen/cuda/README.md new file mode 100644 index 00000000000000..4f50c32aecdb4f --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/README.md @@ -0,0 +1,228 @@ +# NVFuser - A Fusion Code Generator for NVIDIA GPUs +_NVFuser is integrated as a backend for TorchScript's Profiling Graph Executor_ + +## Enabling NVFuser +_NVFuser is not currently the default fuser for NVIDIA GPUs._ + +**Fusions will only show up during the ~3rd iteration of execution, the exact number depends on profiling executor's optimization phases** + +### Enable by Context Manager + +``` +jit_model = torch.jit.script(model) + +with torch.jit.fuser("fuser2") : + for _ in range(5) : + outputs = jit_model(inputs) +``` + +### Enable by Specific Functions + +1. Disable cpu/gpu fusion for native/nnc fuser +``` +torch._C._jit_override_can_fuse_on_cpu(False) +torch._C._jit_override_can_fuse_on_gpu(False) +``` +2. Disable nnc fuser +``` +torch._C._jit_set_texpr_fuser_enabled(False) +``` +3. Enable nvfuser +``` +torch._C._jit_set_nvfuser_enabled(True) +``` + +## Simple knobs to change fusion behavior + +1. Allow single node fusion `torch._C._jit_set_nvfuser_single_node_mode(True)` +Fusion group is only created when two or more compatible ops are grouped together. Turn on single node fusion would allow fusion pass to create fusion group with a single node, this is very handy for testing and could be useful when single node generated kernel out-performs native cuda kernels in framework. + +2. Allow horizontal fusion `torch._C._jit_set_nvfuser_horizontal_mode(True)` +Fusion pass fuses producer to consumer, horizontal mode allows sibling nodes that shared tensor input to be fused together. This could save input memory bandwidth. + +3. Turn off guard for fusion `torch._C._jit_set_nvfuser_guard_mode(False)` +This disables the runtime check on fusion group pre-assumptions (tensor meta information / constant inputs / profiled constants), this really is only used for testing as we want to ensure generated kernels are indeed tested and you should avoid using this in training scripts. + +## Fusion Debugging + +Given the following script as an example + +``` +import torch + +def forward(x): + o = x + 1.0 + o = o.relu() + return o + +shape = (2, 32, 128, 512) +input = torch.rand(*shape).cuda() +t = torch.jit.script(forward) + +with torch.jit.fuser("fuser2"): + for k in range(4): + o = t(input) +``` + +### TorchScript Based Debugging + +#### 1. 
TorchScript IR Graph + +##### Usage + +Two easy ways to checkout fusion for graph: The first one is to print out graph in python script after a few runs (for optimization to kick in). + +`print(t.graph_for(input))` + +The second way is to turn on graph dumping in profiling executor via command line below: + +``` +PYTORCH_JIT_LOG_LEVEL="profiling_graph_executor_impl" python +``` + +##### Example Output + +Graph print out is straight forward and you should look for `prim::CudaFusionGroup_X` for fused kernels. While profiling executor dumps many things, but the most important part is `Optimized Graph`. In this example, it shows a Fusion Group, which is an indication that fusion is happening and you should be expecting fused kernel! + +``` + Optimized Graph: + graph(%x.1 : Tensor): + %12 : bool = prim::CudaFusionGuard[types=[Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)]](%x.1) + %11 : Tensor = prim::If(%12) + block0(): + %o.8 : Tensor = prim::CudaFusionGroup_0[cache_id=0](%x.1) + -> (%o.8) + block1(): + %18 : Function = prim::Constant[name="fallback_function", fallback=1]() + %19 : (Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)) = prim::CallFunction(%18, %x.1) + %20 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = prim::TupleUnpack(%19) + -> (%20) + return (%11) + with prim::CudaFusionGroup_0 = graph(%2 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)): + %4 : int = prim::Constant[value=1]() + %3 : float = prim::Constant[value=1.]() # test.py:6:12 + %o.1 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = aten::add(%2, %3, %4) # test.py:6:8 + %o.5 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = aten::relu(%o.1) # test.py:7:8 + return (%o.5) +``` + +Note that one thing that could prevents fusion when you are running training is autodiff. Fusion pass only runs within `prim::DifferentiableGraph`, so the first thing you should check is to that targetted ops are within differentiable graph subgraphs. +Graph dump could be quite confusing to look at, since it naively dumps all graphs executed by profiling executor and differentiable graphs are executed via a nested graph executor. So for each graph, you might see a few segmented `Optimized Graph` where each corresponds to a differentiable node in the original graph. + +#### 2. Cuda Fusion Graphs + +##### Usage + +Cuda fusion dump gives the input and output graph to fusion pass. This is a good place to check fusion pass logic. + +``` +PYTORCH_JIT_LOG_LEVEL="graph_fuser" python +``` + +##### Example Output + +Running the same script above, in the log, you should be looking for two graphs `Before Fusion` shows the subgraph where fusion pass runs on; `Before Compilation` shows the graph sent to codegen backend, where each `CudaFusionGroup` will trigger codegen runtime system to generate kernel(s) to execute the subgraph. 
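+You can also confirm fusion programmatically rather than by reading dumps. Below is a minimal sketch (assuming the `t` and `input` from the example script above); it only uses the public `graph_for` API and a plain text search over the optimized graph:
+
+```
+graph_str = str(t.graph_for(input))           # optimized graph, available after the profiling runs
+print("prim::CudaFusionGroup" in graph_str)   # True once at least one fused kernel has been created
+```
+
+For the example script, the log produced by the fusion pass looks like this: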
+ +``` + Before Fusion: + graph(%x.1 : Tensor): + %2 : float = prim::Constant[value=1.]() + %1 : int = prim::Constant[value=1]() + %3 : Tensor = prim::profile[profiled_type=Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)](%x.1) + %o.10 : Tensor = aten::add(%3, %2, %1) # test.py:6:8 + %5 : Tensor = prim::profile[profiled_type=Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)](%o.10) + %o.7 : Tensor = aten::relu(%5) # test.py:7:8 + %7 : Tensor = prim::profile[profiled_type=Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)](%o.7) + %8 : Tensor = prim::profile[profiled_type=Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)](%o.7) + return (%7, %8) + + Before Compilation: + graph(%x.1 : Tensor): + %13 : bool = prim::CudaFusionGuard[types=[Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)]](%x.1) + %12 : Tensor = prim::If(%13) + block0(): + %o.11 : Tensor = prim::CudaFusionGroup_0(%x.1) + -> (%o.11) + block1(): + %o.7 : Tensor = prim::FallbackGraph_1(%x.1) + -> (%o.7) + return (%12, %12) + with prim::CudaFusionGroup_0 = graph(%2 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)): + %4 : int = prim::Constant[value=1]() + %3 : float = prim::Constant[value=1.]() + %o.10 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = aten::add(%2, %3, %4) # test.py:6:8 + %o.7 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = aten::relu(%o.10) # test.py:7:8 + return (%o.7) + with prim::FallbackGraph_1 = graph(%x.1 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0)): + %1 : int = prim::Constant[value=1]() + %2 : float = prim::Constant[value=1.]() + %o.10 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = aten::add(%x.1, %2, %1) # test.py:6:8 + %o.7 : Float(2, 32, 128, 512, strides=[2097152, 65536, 512, 1], requires_grad=0, device=cuda:0) = aten::relu(%o.10) # test.py:7:8 + return (%o.7) +``` + +### General ideals of debug no-fusion + +Currently there we have a few consumers that utilizes nvfuser via lowering computations to TorchScript and executing that through a ProfilingExecutor. + +Without going into too much details about how the integration is done, a few notes on debugging no-fusion on ProfilingExecutor: + +1. Run TorchScript module multiple times (5 could be a lucky number) to enable fusion. + Because ProfilingExecutor takes the first (few) runs for profiling, later optimization (including the fusion pass the enables nvfuser) relies on profiling information to run, so your initial runs are not going to trigger fused kernels. + Note that the number of profiling runs is dependent on your model. + +2. Fused kernel should show up in TorchScript IR as `prim::CudaFusionGroup`. You can look at your TorchScript optimized graph to see if fusion is happening `jit_model.graph_for(*inputs)`. + +3. If your scripted model has inputs requiring gradient, fusion is only happening for graphs inside `prim::DifferentiableGraph`. + There are many reasons why your graph is not autodiff-able. Take a look at `/torch/csrc/jit/runtime/symbolic_scripts.cpp`, which lists all autodiff-able ops (note that this is a different list from autograd-supported ops). 
There's also a threshold below which tiny autodiff graphs are inlined/reverted; this can be disabled via `torch._C._debug_set_autodiff_subgraph_inlining(False)`.
+
+### General ideas for debugging nvfuser malfunctions
+
+Assuming the ProfilingExecutor side is working properly, that is, you see a region that is supposed to be fused but did not end up in a fused kernel, here are ways to dig deeper:
+
+1. Dump the fusion pass result:
+   `PYTORCH_JIT_LOG_LEVEL=graph_fuser python your_script.py &> log`
+
+   Look for graphs dumped with `Before Fusion` & `Before Compilation`, which show the portion of the graph the fusion pass runs on and the result of fusion (`CudaFusionGroup`).
+
+2. Check which ops are not fused and roughly why:
+   `PYTORCH_JIT_LOG_LEVEL=">partition:graph_fuser" python your_script.py &> log`
+
+   Enabling GRAPH_UPDATE from partition.cpp dumps a log entry whenever a given node is rejected by fusion.
+
+3. Disable the FALLBACK path:
+   If you see a warning that a FALLBACK path has been taken while executing your model with nvfuser enabled, it indicates that either codegen or the fusion pass has failed unexpectedly. This is likely to cause a regression in model performance, even though the result is still functionally correct. We recommend disabling the FALLBACK path, so the error is reported properly and you can open an informative issue.
+
+   `PYTORCH_NVFUSER_DISABLE_FALLBACK=1 python your_script.py &> log`
+
+4. Pinpoint the kernel/fusion pattern that is causing the error:
+   With a larger model that includes multiple fusion patterns, it can be tricky to figure out which exact fusion is causing the FALLBACK and to build a minimal python repro.
+   One quick thing to try is to run the example with a few knobs turned on:
+
+   ```
+   PYTORCH_NVFUSER_DISABLE_FALLBACK=1 \
+   PYTORCH_JIT_LOG_LEVEL=">partition:graph_fuser:>>kernel_cache" \
+   python your_script.py &> log
+   ```
+
+   This logs all TorchScript IR parsed to codegen IR as well as the kernels generated and executed by nvfuser. Since the fallback path is disabled, the last entry in the log is likely to indicate the failing fusion.
+
+   Hint: look for the last `Before Compilation:`, which indicates a parsing failure, or `running GraphCache: xxxxx`, which indicates a jit compilation/execution failure (also search for the GraphCache address, which should have dumped a TorchScript IR earlier).
+
+### Query nvfuser codegen kernels
+
+There are a few debug dumps that can be turned on via environment variables. Look for `PYTORCH_NVFUSER_DUMP` inside `[pytorch_source_path]/torch/csrc/jit/codegen/cuda/utils.cpp`. A few useful ones are:
+1. `dump_eff_bandwidth`: print out the effective bandwidth of each generated kernel. This naively measures the kernel time divided by I/O buffer size and is a good, simple performance metric for bandwidth-bound kernels
+2. `cuda_kernel`: print out generated cuda kernels
+3. `launch_param`: print out the launch config of generated kernels
+4. `print_args`: print out the input/output tensors of executed codegen kernels
+
+### FAQs
+
+1. There's a regression after turning on nvfuser.
+
+First check that the fusion kernels are running properly. Try to run your model with fallback disabled to see if you hit any errors that caused the fallback, via `export PYTORCH_NVFUSER_DISABLE_FALLBACK=1`.
+
+2. I didn't see any speedup with nvfuser.
+
+Check whether there is fusion in your scripted model. Run your script with `PYTORCH_JIT_LOG_LEVEL="graph_fuser"`; you should see some log dump of the before/after graphs from the fusion pass.
If nothing shows up in the log, that means something in TorchScript is not right and fusion pass are not executed. Check [General ideals of debug no-fusion] for more details. diff --git a/torch/csrc/jit/codegen/cuda/arith.cpp b/torch/csrc/jit/codegen/cuda/arith.cpp index d9bf46b51c7837..cbdf83d8ff3f71 100644 --- a/torch/csrc/jit/codegen/cuda/arith.cpp +++ b/torch/csrc/jit/codegen/cuda/arith.cpp @@ -33,6 +33,9 @@ Val* newScalar(ValType vtype, DataType dtype) { case DataType::Int32: case DataType::Int: return IrBuilder::create(); + case DataType::ComplexFloat: + case DataType::ComplexDouble: + return IrBuilder::create(); default: break; } @@ -187,7 +190,7 @@ Val* newValLike(Val* val, DataType dtype) { Val* castOp(DataType dtype, Val* v1) { if (v1->getDataType().value() == dtype) { - return v1; + return set(v1); } if (cast_func_str(std::make_pair(v1->getDataType().value(), dtype)) == @@ -258,12 +261,10 @@ TensorView* unaryOp( NVFUSER_DEFINE_UNARY_OP(set, Set) NVFUSER_DEFINE_UNARY_OP(randlike, RandLike) -NVFUSER_DEFINE_UNARY_OP(abs, Abs) NVFUSER_DEFINE_UNARY_OP(notOp, Not) NVFUSER_DEFINE_UNARY_OP(ceil, Ceil) NVFUSER_DEFINE_UNARY_OP(floor, Floor) NVFUSER_DEFINE_UNARY_OP(frac, Frac) -NVFUSER_DEFINE_UNARY_OP(gelu, Gelu) NVFUSER_DEFINE_UNARY_OP(neg, Neg) NVFUSER_DEFINE_UNARY_OP(relu, Relu) NVFUSER_DEFINE_UNARY_OP(round, Round) @@ -271,6 +272,25 @@ NVFUSER_DEFINE_UNARY_OP(silu, Silu) NVFUSER_DEFINE_UNARY_OP(trunc, Trunc) #undef NVFUSER_DEFINE_UNARY_OP +// The output of abs(complex_tensor) are real numbers +Val* abs(Val* v) { + if (v->getDataType() == DataType::ComplexDouble) { + Val* out = newValLike(v, DataType::Double); + IrBuilder::create(UnaryOpType::Abs, out, v); + return out; + } + if (v->getDataType() == DataType::ComplexFloat) { + Val* out = newValLike(v, DataType::Float); + IrBuilder::create(UnaryOpType::Abs, out, v); + return out; + } + return unaryOp(UnaryOpType::Abs, v); +} + +TensorView* abs(TensorView* tv) { + return abs(tv->as())->as(); +} + // UNARY FLOAT CAST OPERATIONS #define NVFUSER_DEFINE_UNARY_FLOAT_OP(op_name, op_type) \ @@ -652,8 +672,9 @@ TensorView* reductionOp( const auto init_type = init->getDataType().value(); TORCH_CHECK( (isFloatingPointType(out_type) && isFloatingPointType(init_type)) || + (isComplexType(out_type) && isComplexType(init_type)) || (isIntegralType(out_type) && isIntegralType(init_type)) || - (out_type == DataType::Bool && init_type == DataType::Bool), + (isBooleanType(out_type) && isBooleanType(init_type)), "Types should match for reduction ops but received: ", out_type, " and ", @@ -661,7 +682,7 @@ TensorView* reductionOp( IrBuilder::create(reduction_op_type, init, out, tv); if (keep_dim) { - auto tv_root = TensorDomain::noReductions(tv->getRootDomain()); + auto tv_root = TensorDomain::noReductions(tv->getMaybeRFactorDomain()); std::vector is_broadcast(tv_root.size(), false); for (auto axis : uint_axes) { is_broadcast.at(axis) = true; @@ -680,8 +701,13 @@ TensorView* sum( auto dtype = v1->getDataType().value(); if (isFloatingPointType(dtype)) { init = IrBuilder::create(0.0); + } else if (isComplexType(dtype)) { + init = IrBuilder::create(c10::complex(0.0, 0.0)); } else if (isIntegralType(dtype)) { init = FusionGuard::getCurFusion()->zeroVal(); + } else if (isBooleanType(dtype)) { + v1 = castOp(DataType::Int, v1); + init = FusionGuard::getCurFusion()->zeroVal(); } else { TORCH_CHECK( false, @@ -705,7 +731,13 @@ TensorView* max( init = IrBuilder::create(std::numeric_limits::lowest()); break; case (DataType::Int): - init = IrBuilder::create(INT_MIN); + 
init = IrBuilder::create(std::numeric_limits::lowest()); + break; + case (DataType::Int32): + init = IrBuilder::create(std::numeric_limits::lowest()); + break; + case (DataType::Bool): + init = IrBuilder::create(false); break; default: TORCH_CHECK( @@ -730,7 +762,13 @@ TensorView* min( init = IrBuilder::create(FLT_MAX); break; case (DataType::Int): - init = IrBuilder::create(INT_MAX); + init = IrBuilder::create(std::numeric_limits::max()); + break; + case (DataType::Int32): + init = IrBuilder::create(std::numeric_limits::max()); + break; + case (DataType::Bool): + init = IrBuilder::create(true); break; default: TORCH_CHECK( @@ -779,7 +817,12 @@ TensorView* broadcast( ParallelType::Serial, IterType::BroadcastWithoutStride)); } else { - out_domain.push_back(inp_domain[iinp]->clone()); + out_domain.push_back(IrBuilder::create( + inp_domain[iinp]->start(), + inp_domain[iinp]->extent(), + inp_domain[iinp]->stopOffset(), + inp_domain[iinp]->getParallelType(), + inp_domain[iinp]->getIterType())); iinp++; } ibdim++; @@ -856,7 +899,7 @@ WelfordResult Welford( // Create tensor outputs TensorView* out_avg = newForReduction(tv, uint_axes); TensorView* out_var = newForReduction(tv, uint_axes); - TensorView* out_N = newForReduction(tv, uint_axes, DataType::Int); + TensorView* out_N = newForReduction(tv, uint_axes, DataType::Index); IrBuilder::create( out_avg, @@ -889,7 +932,7 @@ WelfordResult WelfordResult::rFactor(const std::vector& axes) { TensorView* transpose( TensorView* inp, const std::unordered_map& old2new) { - auto inp_domain = TensorDomain::noReductions(inp->getRootDomain()); + auto inp_domain = TensorDomain::noReductions(inp->getMaybeRFactorDomain()); std::vector out_domain(inp_domain.size()); auto new2old = ir_utils::normalizeOld2New(old2new, inp_domain.size()); @@ -1109,7 +1152,7 @@ TensorView* clamp(TensorView* in, Val* min_val, Val* max_val) { // sum_to operator TensorView* sum_to(TensorView* in, const std::vector& sum_to_size) { - const auto& root = TensorDomain::noReductions(in->getRootDomain()); + const auto& root = TensorDomain::noReductions(in->getMaybeRFactorDomain()); TORCH_CHECK( root.size() >= sum_to_size.size(), @@ -1155,7 +1198,7 @@ TensorView* sum_to(TensorView* in, const std::vector& sum_to_size) { } TensorView* sum_to(TensorView* in, const std::vector& sum_to_size) { - const auto& root = TensorDomain::noReductions(in->getRootDomain()); + const auto& root = TensorDomain::noReductions(in->getMaybeRFactorDomain()); TORCH_CHECK( root.size() >= sum_to_size.size(), @@ -1380,7 +1423,7 @@ TensorView* gather( const std::vector>& pad_width, const std::vector& strides, bool trim_out_of_bounds) { - auto inp_dom = TensorDomain::noReductions(inp->getRootDomain()); + auto inp_dom = TensorDomain::noReductions(inp->getMaybeRFactorDomain()); const auto ndims = inp_dom.size(); TORCH_CHECK( @@ -1484,6 +1527,135 @@ TensorView* gather( return out_tv; } +namespace { + +//! 
Create new output for mma +static TensorView* newForMma( + TensorView* tv_a, + TensorView* tv_b, + const std::vector& axes, + DataType data_type = DataType::Float) { + auto orig_domain_a = + TensorDomain::noReductions(tv_a->getMaybeRFactorDomain()); + auto orig_domain_b = + TensorDomain::noReductions(tv_b->getMaybeRFactorDomain()); + + TORCH_INTERNAL_ASSERT( + orig_domain_a.size() == orig_domain_b.size(), + "MMA op: need matching dim input"); + + std::set axes_set(axes.begin(), axes.end()); + std::vector new_domain; + + TORCH_INTERNAL_ASSERT( + !axes_set.empty(), + "Asked for ouput of reduction, but no reduction axis provided."); + + TORCH_INTERNAL_ASSERT( + (*(axes_set.rbegin())) < orig_domain_a.size(), + "Error setting up reduction, reduction axis (", + *(axes_set.rbegin()), + ") is outside nDims (", + orig_domain_a.size(), + "). Keep in mind reductions are relative to root domains, not modified views."); + + auto axis_iter = axes_set.begin(); + for (const auto dim : c10::irange(orig_domain_a.size())) { + bool isReduction = false; + if (axis_iter != axes_set.end() && *axis_iter == dim) { + isReduction = true; + axis_iter++; + } + + const IterDomain* id = orig_domain_a[dim]->isBroadcast() + ? orig_domain_b[dim] + : orig_domain_a[dim]; + + TORCH_CHECK( + !(isReduction && id->isBroadcast() && !id->isImplicitBroadcast()), + "Cannot reduce an axis that is marked as broadcasted as it has an undetermined size. Tried to reduce ID = ", + id, + " of tensor ", + tv_a, + "and", + tv_b); + + new_domain.push_back(IrBuilder::create( + id->start(), + id->extent(), + id->stopOffset(), + ParallelType::Serial, + isReduction ? IterType::Reduction : id->getIterType())); + } + + TensorDomain* td = IrBuilder::create( + new_domain, std::vector(new_domain.size(), true)); + + return IrBuilder::create(td, data_type); +} + +} // namespace + +TensorView* fusedMultiplySum( + TensorView* tv_a, + TensorView* tv_b, + const std::vector& axes, + Val* init) { + if (init == nullptr) { + init = IrBuilder::create(0); + } + + // TODO: + // We will want to support initialize and rfactor with + // mma as well, for maybe fusing bias in prolog. + // TODO: check init type if given a tv, + // not supported currently though. + TORCH_CHECK( + init->isConstScalar(), + "Cannot create a reduction operation where the initial value is not a const scalar."); + + // TODO: + // Validate axis relationships between a and b + TORCH_CHECK(tv_a->nDims() > 0, "Tried to reduce a 0-dim tensor"); + + // TODO: + // Add tf32 and other mma data types + // Add fallback path for non-mma data types. + TORCH_CHECK(tv_a->getDataType().value() == DataType::Half); + TORCH_CHECK(tv_b->getDataType().value() == DataType::Half); + + TORCH_CHECK(axes.size() > 0, "No reduction axis specified"); + + // TODO: + // will lift this in a follow up when we have a + // more generic axes matching. 
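+  // Note: with a single reduction axis this instantiates a matmul-style
+  // contraction, out = sum(tv_a * tv_b) over `axes`, so the reduced axis
+  // plays the role of the mma K dimension.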
+ TORCH_CHECK( + axes.size() == 1, "Single axis reduction only for mma op instantiation.") + + std::vector uint_axes; + const int ndims = tv_a->domain()->noReductions().size(); + for (int axis : axes) { + if (axis < 0) { + axis += ndims; + } + + TORCH_CHECK( + axis >= 0 && axis < ndims, + "Reduction on invalid axis, recieved: ", + axis, + " however tensor view only has ", + ndims, + " non-reduction dims."); + + uint_axes.push_back((unsigned int)axis); + } + + TensorView* out = newForMma(tv_a, tv_b, uint_axes); + IrBuilder::create(out, tv_a, tv_b, init); + + return out; +} + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/arith.h b/torch/csrc/jit/codegen/cuda/arith.h index 1f18f65666ad09..f224468c6bed82 100644 --- a/torch/csrc/jit/codegen/cuda/arith.h +++ b/torch/csrc/jit/codegen/cuda/arith.h @@ -161,9 +161,6 @@ TORCH_CUDA_CU_API TensorView* floor(TensorView*); // frac TORCH_CUDA_CU_API Val* frac(Val*); TORCH_CUDA_CU_API TensorView* frac(TensorView*); -// gelu -TORCH_CUDA_CU_API Val* gelu(Val*); -TORCH_CUDA_CU_API TensorView* gelu(TensorView*); // silu TORCH_CUDA_CU_API Val* silu(Val*); TORCH_CUDA_CU_API TensorView* silu(TensorView*); @@ -561,6 +558,28 @@ TORCH_CUDA_CU_API TensorView* gather( const std::vector& strides = {}, bool trim_out_of_bounds = false); +//! A fused pointwise multiply and sum +//! operator that instantiates the following +//! fused pattern: +//! c = mul(tv_a, tv_b); +//! return sum(c, axes) +//! +//! \param tv_a first multiply operand +//! \param tv_b second multiply operand +//! \param axes axes to sum over +//! \param init sum initial value +//! +//! Note & TODO: +//! currently only support lowering to a mma op +//! through this interface and only support fp16 inputs. +//! will support converting back to multiply and reduce in +//! a follow up. +TORCH_CUDA_CU_API TensorView* fusedMultiplySum( + TensorView* tv_a, + TensorView* tv_b, + const std::vector& axes, + Val* init = nullptr); + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/codegen.cpp b/torch/csrc/jit/codegen/cuda/codegen.cpp index 67926e92672644..2287b2835ee603 100644 --- a/torch/csrc/jit/codegen/cuda/codegen.cpp +++ b/torch/csrc/jit/codegen/cuda/codegen.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -20,6 +21,105 @@ namespace codegen { namespace { +std::string ptrType(DataType dt) { + std::stringstream ss; + ss << dt << "*"; + return ss.str(); +} + +std::string refType(DataType dt) { + std::stringstream ss; + ss << dt << "&"; + return ss.str(); +} + +//! Utility class to build an argument list +class ArgumentBuilder { + public: + //! Build an argument list where each argument is separated with a comma + ArgumentBuilder() = default; + + //! Build an argument list where each argument has its own line + ArgumentBuilder(int indent_level, const char* tab) { + std::stringstream ss; + for (const auto i : c10::irange(indent_level)) { + (void)i; // Suppress unused variable warning + ss << tab; + } + sep_ = ",\n" + ss.str(); + } + + //! Add a new argument + template + ArgumentBuilder& arg(const T& x) { + addSeparator(); + return append(x); + } + + //! Append to the last argument + template + ArgumentBuilder& append(const T& arg) { + ss_ << arg; + return *this; + } + + //! 
Get a string of the argument list + std::string str() const { + return ss_.str(); + } + + friend std::ostream& operator<<(std::ostream& os, const ArgumentBuilder& ab) { + return os << ab.str(); + } + + private: + void addSeparator() { + if (ss_.tellp() != 0) { + ss_ << sep_; + } + } + + private: + std::string sep_ = ", "; + std::stringstream ss_; +}; + +//! Append to the last argument +template <> +ArgumentBuilder& ArgumentBuilder::append(const bool& arg) { + ss_ << (arg ? "true" : "false"); + return *this; +} + +//! Returns "template_name" +template +std::string genTemplate( + const TemplateNameT& template_name, + const TemplateArgT& template_arg) { + std::stringstream ss; + ss << template_name << "<" << template_arg << ">"; + return ss.str(); +} + +//! Returns "func_name(func_arg)" +template +std::string genCall(const FuncNameT& func_name, const FuncArgT& func_arg) { + std::stringstream ss; + ss << func_name << "(" << func_arg << ")"; + return ss.str(); +} + +//! Returns "func_name(func_arg)" +template +std::string genCall( + const FuncNameT& func_name, + const TemplateArgT& template_arg, + const FuncArgT& func_arg) { + std::stringstream ss; + ss << func_name << "<" << template_arg << ">(" << func_arg << ")"; + return ss.str(); +} + class CudaKernelGenerator : private OptOutConstDispatch { static constexpr const char* kTab = " "; @@ -46,6 +146,8 @@ class CudaKernelGenerator : private OptOutConstDispatch { code_ << "__global__ void " << kernel_name << "("; + std::unordered_set unique_args; + std::vector params; // Inputs & Outputs @@ -53,27 +155,44 @@ class CudaKernelGenerator : private OptOutConstDispatch { params.push_back(val); } for (auto val : kernel_->outputs()) { + TORCH_INTERNAL_ASSERT( + !val->isScalar(), "No scalar output is allowed: ", val->toString()); params.push_back(val); } // Generate parameter declarations - for (Val* val : params) { - if (const auto tv = dynamic_cast(val)) { + unsigned int duplicate_counter = 0; + for (auto i : c10::irange(params.size())) { + std::stringstream var_name_ss; + if (params[i]->isA()) { + var_name_ss << varName(params[i]->as()); + } else { + var_name_ss << gen(params[i]); + } + + // If value is duplicate in arguments change the name to avoid name + // conflicts in args. 
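+      // e.g. when the same tensor appears twice in the parameter list, the
+      // second occurrence gets "_duplicate_<n>" appended to its variable name
+      // so the generated __global__ signature stays valid C++.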
+ if (!unique_args.emplace(params[i]).second) { + var_name_ss << "_duplicate_" << duplicate_counter++; + } + + if (const auto tv = dynamic_cast(params[i])) { if (tv->isCpuScalar()) { - code_ << " CpuScalarTensor<" << val->dtype() << "> " << varName(tv); + code_ << " CpuScalarTensor<" << params[i]->dtype() << "> " + << var_name_ss.str(); } else { code_ - << "Tensor<" << val->dtype() << ", " + << "Tensor<" << params[i]->dtype() << ", " << TensorDomain::noReductions(tv->getMaybeRFactorDomain()).size() - << "> " << varName(tv); + << "> " << var_name_ss.str(); } } else { - TORCH_INTERNAL_ASSERT(val->isScalar()); // NOLINT (LLVM bug 48525) - TORCH_INTERNAL_ASSERT(val->definition() == nullptr); - code_ << val->dtype() << " " << gen(val); + TORCH_INTERNAL_ASSERT(params[i]->isScalar()); // NOLINT (LLVM bug 48525) + TORCH_INTERNAL_ASSERT(params[i]->definition() == nullptr); + code_ << params[i]->dtype() << " " << var_name_ss.str(); } - if (val != params.back()) { + if (i + 1 != params.size()) { code_ << ", "; } } @@ -211,10 +330,6 @@ class CudaKernelGenerator : private OptOutConstDispatch { std::string gen(const Statement* stmt) { std::stringstream tmp_code; std::swap(tmp_code, code_); - auto replacement = replacement_map_.find(stmt); - if (replacement != replacement_map_.end()) { - stmt = replacement->second; - } OptOutConstDispatch::handle(stmt); std::swap(tmp_code, code_); return tmp_code.str(); @@ -247,7 +362,8 @@ class CudaKernelGenerator : private OptOutConstDispatch { void handle(const Bool* pred) final { const auto def = pred->definition(); - if (print_inline_ && def != nullptr) { + const bool has_alloc = alloc_map_.find(pred) != alloc_map_.end(); + if (def != nullptr && !has_alloc) { code_ << "(" << gen(def) << ")"; } else if (pred->isConst()) { code_ << (*pred->value() ? "true" : "false"); @@ -258,7 +374,8 @@ class CudaKernelGenerator : private OptOutConstDispatch { void handle(const Double* d) final { const auto def = d->definition(); - if (print_inline_ && def != nullptr) { + const bool has_alloc = alloc_map_.find(d) != alloc_map_.end(); + if (def != nullptr && !has_alloc) { code_ << "(" << gen(def) << ")"; } else if (d->isConst()) { const int digits = std::numeric_limits::max_digits10; @@ -270,8 +387,9 @@ class CudaKernelGenerator : private OptOutConstDispatch { void handle(const Int* i) final { const auto def = i->definition(); - if (print_inline_ && def != nullptr) { - code_ << "(" << gen(def) << ")"; + const bool has_alloc = alloc_map_.find(i) != alloc_map_.end(); + if (def != nullptr && !has_alloc) { + code_ << "(" << genInline(def) << ")"; } else if (i->isConst()) { code_ << *i->value(); } else { @@ -279,6 +397,20 @@ class CudaKernelGenerator : private OptOutConstDispatch { } } + void handle(const ComplexDouble* c) final { + const auto def = c->definition(); + const bool has_alloc = alloc_map_.find(c) != alloc_map_.end(); + if (def != nullptr && !has_alloc) { + code_ << "(" << gen(def) << ")"; + } else if (c->isConst()) { + const int digits = std::numeric_limits::max_digits10; + code_ << "std::complex" << std::setprecision(digits) + << *c->value(); + } else { + code_ << varName(c); + } + } + void handle(const NamedScalar* ns) final { // dim3 components are unsigned int. 
Cast to signed integer to // support negative indexing @@ -291,24 +423,27 @@ class CudaKernelGenerator : private OptOutConstDispatch { } void handle(const kir::TensorIndex* ti) final { - code_ << varName(ti->view()) << "["; - bool first = true; + std::stringstream index; for (auto* ind : ti->indices()) { if (!ind->isZeroInt()) { if (!first) { - code_ << " + "; + index << " + "; } - code_ << genInline(ind); + index << genInline(ind); first = false; } } if (first) { - code_ << "0"; + index << "0"; } - - code_ << "]"; + bool is_volatile = ti->view()->getMemoryType() == MemoryType::Global && + kernel_->summary().sync_map.needsRawSync(ti->view()).hasBID(); + if (is_volatile) { + code_ << "*(volatile " << ti->getDataType().value() << "*)&"; + } + code_ << varName(ti->view()) << "[" << index.str() << "]"; } void handle(const IterDomain*) final { @@ -327,6 +462,21 @@ class CudaKernelGenerator : private OptOutConstDispatch { bool is_vector_op = false; size_t vector_word_size = 1; + if (uop->out()->isA()) { + auto out_tv = uop->out()->as()->view(); + if (std::any_of( + out_tv->domain()->domain().begin(), + out_tv->domain()->domain().end(), + [&](IterDomain* id) { return id->isMma(); })) { + auto mma = dynamic_cast( + uop->out()->as()->view()->definition()); + TORCH_INTERNAL_ASSERT( + mma != nullptr, "CodeGen: mma op not in mma loop"); + genMmaInitialization(mma, uop); + return; + } + } + if (vectorize_scope_ && uop->out()->isA()) { auto ti = uop->out()->as(); @@ -370,26 +520,77 @@ class CudaKernelGenerator : private OptOutConstDispatch { uop->out()->dtype() == uop->in()->dtype(), "Vectorized store/load requires input and output datatypes match."); } - } - if (is_vector_op) { - if (uop->in()->isScalar()) { - indent() << "reinterpret_cast<" - << "Array<" << uop->out()->dtype() << ", " << vector_word_size - << ">*>" - << "(&" << gen(uop->out()) << ")->set(" << gen(uop->in()) - << ");\n"; - } else { - indent() << "*reinterpret_cast<" - << "Array<" << uop->out()->dtype() << ", " << vector_word_size - << ">*>" - << "(&" << gen(uop->out()) << ")" - << " = *reinterpret_cast<" - << "Array<" << uop->in()->dtype() << ", " << vector_word_size - << ">*>" - << "(&" << gen(uop->in()) << ");\n"; + if (is_vector_op) { + auto out_tv = uop->out()->as()->view(); + if (uop->in()->isScalar()) { + // Note: + // Double buffered local tensors need indexed initialization, + // so will need to use `arraySet` option. + if (out_tv->getMemoryType() == MemoryType::Local && + !out_tv->isDoubleBuffered()) { + // Vectorized initialization + indent() << varName(out_tv) << ".set(" << gen(uop->in()) << ");\n"; + } else { + // Note: currently arraySet option is not vectorized, so it will + // rely on auto vectorization pass of cuda compiler. 
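+            // The emitted call has the form:
+            //   arraySet<T, vector_word_size>(&out[index], (T)scalar);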
+ indent() << "arraySet<" << out_tv->getDataType().value() << ", " + << vector_word_size << ">(&" << gen(uop->out()) << ", " + << "(" << out_tv->getDataType().value() << ")" + << gen(uop->in()) << ");\n"; + } + } else { + // Vectorized load + TORCH_INTERNAL_ASSERT( + uop->in()->isA(), + "Invalid input to unary op with tensor output, found: ", + uop->in()->toString()); + + auto in_tv = uop->in()->as()->view(); + bool localToGlobal = out_tv->getMemoryType() == MemoryType::Global && + in_tv->getMemoryType() == MemoryType::Local; + + bool globalToLocal = out_tv->getMemoryType() == MemoryType::Local && + in_tv->getMemoryType() == MemoryType::Global; + + bool globalToGlobal = out_tv->getMemoryType() == MemoryType::Global && + in_tv->getMemoryType() == MemoryType::Global; + + bool is_volatile_to = out_tv->getMemoryType() == MemoryType::Global && + kernel_->summary().sync_map.needsRawSync(out_tv).hasBID(); + + bool is_volatile_from = + in_tv->getMemoryType() == MemoryType::Global && + kernel_->summary().sync_map.needsRawSync(in_tv).hasBID(); + + if (localToGlobal) { + indent() << "loadLocalToGlobal<" << uop->out()->dtype() << ", " + << vector_word_size << ", " + << (is_volatile_to ? "true" : "false") << ">("; + code_ << " &" << gen(uop->out()) << ", &" << gen(uop->in()) + << ");\n"; + } else if (globalToLocal) { + indent() << "loadGlobalToLocal<" << uop->out()->dtype() << ", " + << vector_word_size << ", " + << (is_volatile_from ? "true" : "false") << ">(&" + << gen(uop->out()) << ", "; + code_ << " &" << gen(uop->in()) << ");\n"; + } else if (globalToGlobal) { + indent() << "loadGlobalToGlobal<" << uop->out()->dtype() << ", " + << vector_word_size << ", " + << (is_volatile_to ? "true" : "false") << ", " + << (is_volatile_from ? "true" : "false") << ">("; + code_ << " &" << gen(uop->out()) << ", "; + code_ << " &" << gen(uop->in()) << ");\n"; + } else { + indent() << "loadGeneric<" << uop->out()->dtype() << ", " + << vector_word_size << ">("; + code_ << " &" << gen(uop->out()) << ", "; + code_ << " &" << gen(uop->in()) << ");\n"; + } + } + return; } - return; } if (uop->out()->isA()) { @@ -469,6 +670,9 @@ class CudaKernelGenerator : private OptOutConstDispatch { if (integer_op_str(op_type) && isIntegralType(out->dtype())) { auto int_op = integer_op_str(op_type); expr << *int_op; + } else if (bool_op_str(op_type) && isBooleanType(out->dtype())) { + auto bool_op = bool_op_str(op_type); + expr << *bool_op; } else { expr << op_type; if (needFloatSuffix(op_type) && out->dtype() == DataType::Float) { @@ -620,6 +824,10 @@ class CudaKernelGenerator : private OptOutConstDispatch { if (integer_op_str(op_type) && isIntegralType(bop->out()->dtype())) { auto int_op = integer_op_str(op_type); code_ << " = " << *int_op << "(\n"; + } else if ( + bool_op_str(op_type) && isBooleanType(bop->out()->dtype())) { + auto bool_op = bool_op_str(op_type); + code_ << " = " << *bool_op << "(\n"; } else { std::stringstream op_str; op_str << op_type; @@ -667,6 +875,74 @@ class CudaKernelGenerator : private OptOutConstDispatch { } } + std::string genArchString(MmaOptions options) { + std::stringstream ss; + if (isVolta(options.macro)) { + ss << "Volta"; + } else if (isTuring(options.macro)) { + ss << "Turing"; + } else if (isAmpere(options.macro)) { + ss << "Ampere"; + } else { + TORCH_INTERNAL_ASSERT(false, "mma macro unknown arch"); + } + return ss.str(); + } + + std::string genMmaOp(const MmaOp* mma, bool init = false) { + std::stringstream ss; + auto options = mma->options(); + ss << genArchString(options) << "::"; + if 
(init) { + ss << "init"; + } + ss << toString(options.macro) << toString(options.operand_layout); + // TODO: additional parameter could be removed by swizzling iterdomain + auto acc_stride = mma->accStride(); + TORCH_INTERNAL_ASSERT(acc_stride > 0); + ss << "<" << acc_stride << ">"; + return ss.str(); + } + + void genMmaOperands(const MmaOp* mma) { + std::stringstream ss; + auto options = mma->options(); + auto in_a = mma->inA()->as()->view(); + auto dtype = in_a->getDataType().value(); + indent() << kTab << "reinterpret_cast*>(&" + << gen(mma->inA()) << "),\n"; + indent() << kTab << "reinterpret_cast*>(&" + << gen(mma->inB()) << ")"; + } + + void genMmaInitialization(const MmaOp* mma, const UnaryOp* uop) { + auto options = mma->options(); + + indent() << genMmaOp(mma, true) << "(reinterpret_castout()->getDataType().value() << "," + << getOutputRegisterSize(mma->options().macro) << "," + << getOutputRegisterSize(mma->options().macro) << ">*>" + << "(&" << gen(uop->out()) << "));\n"; + } + + void handle(const MmaOp* mma) final { + auto options = mma->options(); + auto in_a = mma->inA()->as(); + auto out = mma->out()->as(); + indent() << genMmaOp(mma) << "(\n"; + indent() << kTab << "reinterpret_castview()->getDataType().value() << "," + << getOutputRegisterSize(options.macro) << "," + << getOutputRegisterSize(options.macro) << ">*>(&" + << gen(mma->out()) << "),\n"; + genMmaOperands(mma); + code_ << ");\n"; + } + std::string genReductionOp(BinaryOpType op_type, Val* out) { std::stringstream lambda; DataType data_type = out->dtype(); @@ -870,7 +1146,7 @@ class CudaKernelGenerator : private OptOutConstDispatch { indent() << data_type << " " << "block_result_var_" << block_reduce_name_ << " = " << gen(wop->initVar()) << ";\n"; - indent() << DataType::Int << " " + indent() << out_N->dtype() << " " << "block_result_n_" << block_reduce_name_ << " = " << gen(wop->initN()) << ";\n"; } @@ -900,7 +1176,7 @@ class CudaKernelGenerator : private OptOutConstDispatch { << "*>(shared_mem_avg),\n"; indent() << kTab << "reinterpret_cast<" << data_type << "*>(shared_mem_var),\n"; - indent() << kTab << "reinterpret_cast<" << DataType::Int + indent() << kTab << "reinterpret_cast<" << out_N->dtype() << "*>(shared_mem_n),\n"; TORCH_INTERNAL_ASSERT(wop->predicate() != nullptr); TORCH_INTERNAL_ASSERT( @@ -921,8 +1197,11 @@ class CudaKernelGenerator : private OptOutConstDispatch { std::string generateGridReduceTemplateFlags( const REDUCTION_OP* rop, const ParallelTypeBitmap& thread_pred) { + TORCH_INTERNAL_ASSERT( + !rop->isFused(), "This is not for the fused reduction kernel\n"); + const auto par_domains = ir_utils::getParallelDomains(rop->outputs()[0]); - std::stringstream flags; + ArgumentBuilder flags; for (const ParallelType pt : kParallelTypeThreads) { const bool parallel_reduction = par_domains.find(pt) != par_domains.end() && @@ -941,10 +1220,7 @@ class CudaKernelGenerator : private OptOutConstDispatch { } else { flag = !pred && !parallel_reduction; } - if (pt != kParallelTypeThreads[0]) { - flags << ", "; - } - flags << (flag ? 
"true" : "false"); + flags.arg(flag); } return flags.str(); } @@ -967,6 +1243,11 @@ class CudaKernelGenerator : private OptOutConstDispatch { grop->reduction_buffer()->buffer()->as(); const auto sync_buffer = grop->sync_buffer()->buffer()->as(); + if (rop->isFused()) { + generateFusedGridReduction(grop); + return; + } + const std::string flags_str = generateGridReduceTemplateFlags(rop, grop->threadPredicate()); @@ -974,33 +1255,108 @@ class CudaKernelGenerator : private OptOutConstDispatch { kernel_->summary().has_cooperative_grid_reduction; // Since block-level reduction is already done, those dimensions - // with tidx/y/z being true do not participate in the grid reduction. - indent() << "reduction::gridReduce<" << flags_str << ", " - << (persistent_sync ? "true" : "false") << ">(\n"; - indent() << kTab << gen(rop->out()) << ",\n"; + // with tidx/y/z being true do not participate in the grid + // reduction. + ArgumentBuilder template_args; + template_args.arg(flags_str).arg(persistent_sync); + + ArgumentBuilder func_args(block_nest_level_ + 1, kTab); + func_args.arg(gen(rop->out())); if (domain->hasBlockReduction()) { - indent() << kTab << "block_result_" << block_reduce_name_ << ",\n"; + func_args.arg("block_result_").append(block_reduce_name_); block_reduce_name_++; } else { - indent() << kTab << gen(rop->in()) << ",\n"; + func_args.arg(gen(rop->in())); } - indent() << kTab << genReductionOp(op_type, out) << ",\n"; - indent() << kTab << "&" << varName(work_buffer) << "[0],\n"; - indent() << kTab << varName(sync_buffer) << ",\n"; - indent() << kTab << "static_cast<" << data_type << "*>(shared_mem),\n"; + func_args.arg(genReductionOp(op_type, out)); + func_args.arg("&").append(varName(work_buffer)).append("[0]"); + func_args.arg(varName(sync_buffer)); + func_args.arg(genCall("static_cast", ptrType(data_type), "shared_mem")); + // read and write predicates TORCH_INTERNAL_ASSERT( grop->predicate() != nullptr && grop->predicate()->hasValue()); - auto read_pred = genInline(grop->predicate()); - indent() << kTab << read_pred << ",\n"; + const auto read_pred = genInline(grop->predicate()); + func_args.arg(read_pred); if (grop->writePredicate() != nullptr) { TORCH_INTERNAL_ASSERT(grop->writePredicate()->hasValue()); - auto write_pred = genInline(grop->writePredicate()); - indent() << kTab << write_pred << ",\n"; + func_args.arg(genInline(grop->writePredicate())); } else { - indent() << kTab << read_pred << ",\n"; + func_args.arg(read_pred); } - indent() << kTab << data_type << "(" - << genInline(grop->reduction_op()->init()) << "));\n"; + // Init val + func_args.arg(genCall(data_type, genInline(grop->reduction_op()->init()))); + + indent() << "reduction::gridReduce<" << template_args << ">(\n"; + indent() << kTab << func_args << ");\n"; + } + + std::string genFusedReductionName(const kir::TensorIndex* reduction_out) { + return varName(reduction_out->view()) + "_reduction"; + } + + void generateFusedGridReduction(const kir::GridReduction* grop) { + const auto rop = grop->reduction_op(); + TORCH_INTERNAL_ASSERT(rop->isFused()); + + const auto out = rop->out()->as(); + const auto domain = out->view()->domain(); + + const auto data_type = rop->out()->dtype(); + const auto op_type = rop->getReductionOpType(); + + const auto work_buffer = + grop->reduction_buffer()->buffer()->as(); + const auto sync_buffer = grop->sync_buffer()->buffer()->as(); + + const auto reduction_name = genFusedReductionName(out); + + // template + // __device__ __inline__ void reduce( + // RefTuple out, + // const LocalTuple& 
inp, + // VolatilePtrTuple global_work_buffer, + // int64_t* global_sync_buffer, // Allocated as product of all + // // non-participating Grid dimension + // PtrTuple shared_buf, + // bool read_pred, // Prevent reading from out of bounds memory + // bool write_pred, // Prevent from writing out of bounds + // const LocalTuple& init_val, + // Func reduction_op); + + indent() << reduction_name << ".reduce(\n"; + + ArgumentBuilder func_args(block_nest_level_ + 1, kTab); + // out + func_args.arg(genCall("RefTuple", data_type, gen(rop->out()))); + // inp + func_args.arg(genCall("ConstRefTuple", data_type, gen(rop->in()))); + // global_work_buffer + func_args.arg(genCall( + "VolatilePtrTuple", data_type, "&" + varName(work_buffer) + "[0]")); + // global_sync_buffer + func_args.arg("&").append(varName(sync_buffer)).append("[0]"); + // shared_buf + func_args.arg(genCall( + "PtrTuple", + data_type, + genCall("static_cast", ptrType(data_type), "shared_mem"))); + // read and write predicates + TORCH_INTERNAL_ASSERT( + grop->predicate() != nullptr && grop->predicate()->hasValue()); + const auto read_pred = genInline(grop->predicate()); + auto write_pred = read_pred; + if (grop->writePredicate() != nullptr) { + TORCH_INTERNAL_ASSERT(grop->writePredicate()->hasValue()); + write_pred = genInline(grop->writePredicate()); + } + func_args.arg(read_pred).arg(write_pred); + // init_val + func_args.arg(genCall( + "LocalTuple", data_type, genInline(grop->reduction_op()->init()))); + // reduction_op + func_args.arg(genReductionOp(op_type, out)); + + indent() << kTab << func_args << ");\n"; } void handle(const kir::GridBroadcast* grop) final { @@ -1066,6 +1422,11 @@ class CudaKernelGenerator : private OptOutConstDispatch { const auto n_buffer = gwop->N_buffer()->buffer()->as(); const auto sync_buffer = gwop->sync_buffer()->buffer()->as(); + if (wop->isFused()) { + generateFusedGridWelford(gwop); + return; + } + const bool persistent_sync = kernel_->summary().has_cooperative_grid_reduction; @@ -1119,76 +1480,188 @@ class CudaKernelGenerator : private OptOutConstDispatch { indent() << kTab << data_type << "(0));\n"; } + void generateFusedGridWelford(const kir::GridWelford* gwop) { + const auto wop = gwop->welford_op(); + TORCH_INTERNAL_ASSERT(wop->isFused()); + + const auto out = wop->out()->as(); + const auto domain = out->view()->domain(); + + const auto data_type = wop->outAvg()->dtype(); + const auto index_type = wop->outN()->dtype(); + TORCH_INTERNAL_ASSERT(wop->outAvg()->dtype() == wop->outVar()->dtype()); + + ArgumentBuilder data_type_args; + data_type_args.arg(data_type).arg(data_type).arg(index_type); + + const auto sync_buffer = gwop->sync_buffer()->buffer()->as(); + + const auto reduction_name = genFusedReductionName(out); + + // template + // __device__ __inline__ void reduce( + // RefTuple out, + // const LocalTuple& inp, + // VolatilePtrTuple global_work_buffer, + // int64_t* global_sync_buffer, // Allocated as product of all + // // non-participating Grid dimension + // PtrTuple shared_buf, + // bool read_pred, // Prevent reading from out of bounds memory + // bool write_pred, // Prevent from writing out of bounds + // const LocalTuple& init_val, + // Func reduction_op); + + ArgumentBuilder out_args; + out_args.arg(gen(wop->outAvg())); + out_args.arg(gen(wop->outVar())); + out_args.arg(gen(wop->outN())); + + ArgumentBuilder in_args; + in_args.arg(gen(wop->inAvg())); + if (wop->inVar() != nullptr) { + in_args.arg(gen(wop->inVar())); + } else { + in_args.arg("(").append(data_type).append(")0"); + } + 
in_args.arg(gen(wop->inN())); + + ArgumentBuilder init_args; + init_args.arg(gen(wop->initAvg())); + init_args.arg(gen(wop->initVar())); + init_args.arg(gen(wop->initN())); + + ArgumentBuilder work_buffer_args; + work_buffer_args.arg("&") + .append(varName(gwop->avg_buffer()->buffer()->as())) + .append("[0]"); + work_buffer_args.arg("&") + .append(varName(gwop->var_buffer()->buffer()->as())) + .append("[0]"); + work_buffer_args.arg("&") + .append(varName(gwop->N_buffer()->buffer()->as())) + .append("[0]"); + + ArgumentBuilder smem_buffer_args; + smem_buffer_args.arg( + genCall("reinterpret_cast", ptrType(data_type), "shared_mem_avg")); + smem_buffer_args.arg( + genCall("reinterpret_cast", ptrType(data_type), "shared_mem_var")); + smem_buffer_args.arg( + genCall("reinterpret_cast", ptrType(index_type), "shared_mem_n")); + + ArgumentBuilder func_args(block_nest_level_ + 1, kTab); + // out + func_args.arg(genCall("RefTuple", data_type_args, out_args)); + // inp + func_args.arg(genCall("ConstRefTuple", data_type_args, in_args)); + // global_work_buffer + func_args.arg( + genCall("VolatilePtrTuple", data_type_args, work_buffer_args)); + // global_sync_buffer + func_args.arg("&").append(varName(sync_buffer)).append("[0]"); + // shared_buf + func_args.arg(genCall("PtrTuple", data_type_args, smem_buffer_args)); + // read and write predicates + TORCH_INTERNAL_ASSERT( + gwop->predicate() != nullptr && gwop->predicate()->hasValue()); + const auto read_pred = genInline(gwop->predicate()); + auto write_pred = read_pred; + if (gwop->writePredicate() != nullptr) { + TORCH_INTERNAL_ASSERT(gwop->writePredicate()->hasValue()); + write_pred = genInline(gwop->writePredicate()); + } + func_args.arg(read_pred).arg(write_pred); + // init_val + func_args.arg(genCall("LocalTuple", data_type_args, init_args)); + // reduction_op + func_args.arg(genTemplate( + "welfordCombine", ArgumentBuilder().arg(data_type).arg(index_type))); + + indent() << reduction_name << ".reduce(\n"; + indent() << kTab << func_args << ");\n"; + } + + void handle(const kir::AllocateFusedReduction* alloc_fused_reduction) final { + // See the runtime file of the fused reduction + enum class ReductionParallelTypeState { Reduce, Iter, Pred, Inactive }; + + using ReductionParallelTypeStateArray = + ParallelTypeMap; + + ReductionParallelTypeStateArray states( + ReductionParallelTypeState::Inactive); + + for (const ParallelType pt : kParallelTypeThreads) { + // It may be better to predicate grid reductions on dimensions they don't + // actively use, however since that should generally be discouraged (they + // should be part of the iter portion of the operation, or they should be + // predciated out) we're just going to assume they're part of the iter + // dimension. This would cause more communication than strictly necessary + // but should not be a common use case. + auto pt_dim = kernel_->summary().parallel_dimension_map_.get(pt); + if (pt_dim == nullptr || pt_dim->isOneInt()) { + continue; + } + // Initialize pt_dim if used to an iter dimension. It may change to a + // reduction or predicated dimension later. + states[pt] = ReductionParallelTypeState::Iter; + } + + for (auto id : alloc_fused_reduction->out()->view()->domain()->domain()) { + auto pt = id->getParallelType(); + if (isParallelTypeThread(pt)) { + auto state = id->isReduction() ? 
ReductionParallelTypeState::Reduce + : ReductionParallelTypeState::Iter; + states[pt] = state; + } + } + + for (const auto predicated_pt : alloc_fused_reduction->threadPredicate()) { + auto& state = states[predicated_pt]; + TORCH_INTERNAL_ASSERT( + state != ReductionParallelTypeState::Reduce, + "Invalid thread predication: ", + predicated_pt); + state = ReductionParallelTypeState::Pred; + } + + ArgumentBuilder flags; + for (auto pt : kParallelTypeThreads) { + flags.arg(static_cast(states[pt])); + } + + // Persistent + flags.arg(true); + + // Broadcast is fused + flags.arg(true); + + const auto reduction_name = + genFusedReductionName(alloc_fused_reduction->out()); + + indent() << genTemplate("fused_reduction::ParallelReduce", flags) << " " + << reduction_name << ";\n"; + } + void handleScope(const kir::Scope& scope) { for (auto expr : scope.exprs()) { OptOutConstDispatch::handle(expr); } } - void handle(const kir::ForLoop* loop) final { - if (loop->iter_domain()->isBroadcast()) { - handleScope(loop->body()); - return; - } else if (loop->vectorize()) { + void handleTrivialLoop(const kir::ForLoop* loop) { + if (loop->vectorize()) { vectorize_scope_ = loop->vectorize(); - handleScope(loop->body()); - vectorize_scope_ = false; - return; - } else if (loop->iter_domain()->isStride()) { - // A stride domain only executes the loop body with the loop - // index being zero. - indent() << "constexpr " - << "nvfuser_index_t" - << " " << gen(loop->index()) << " = 0;\n"; - handleScope(loop->body()); - return; } - - // By default, a parallelized loop would look like: - // - // for (int x = threadIdx.x; x < stop; x += blockDim.x) { - // do_some_comp(x); - // } - // - // When stop is guaranteed to be smaller or equal to the number of - // threads, the for-loop is not necessary. In the above case, we - // would just generate the loop body without the for clause but - // references to the loop index replaced by the loop start value. - // - // When the loop end is the same as the IterDomain extent, the - // assumption can be safely made. This is more conservative than - // necessary since the loop stop value just needs to be <= the - // IterDomain extent. However, at this point, this conservative - // analysis seems sufficient. - if (loop->stop() == loop->iter_domain()->extent() && - loop->iter_domain()->isThread()) { - // Register a replacement of references to the loop index with - // the loop start value. - replacement_map_.insert({loop->index(), loop->start()}); - handleScope(loop->body()); - replacement_map_.erase(loop->index()); - return; + handleScope(loop->body()); + if (loop->vectorize()) { + vectorize_scope_ = false; } + } - if (loop->start()->isZeroInt() && loop->stop()->isOneInt()) { - indent() << "constexpr " - << "nvfuser_index_t" - << " " << gen(loop->index()) << " = 0;\n"; - handleScope(loop->body()); - return; - } else if ( - // Special case handling for a pattern where start == end - 1. 
- loop->start()->definition() != nullptr && - loop->start()->definition()->isA() && - loop->start()->definition()->as()->getBinaryOpType() == - BinaryOpType::Sub && - loop->start()->definition()->as()->lhs() == loop->stop() && - loop->start()->definition()->as()->rhs()->isOneInt()) { - indent() << "const " - << "nvfuser_index_t" - << " " << gen(loop->index()) << " = " << genInline(loop->start()) - << ";\n"; - handleScope(loop->body()); + void handle(const kir::ForLoop* loop) final { + if (loop->isTrivial()) { + handleTrivialLoop(loop); return; } @@ -1259,6 +1732,9 @@ class CudaKernelGenerator : private OptOutConstDispatch { void handle(const kir::Allocate* alloc) final { const auto buffer_dtype = alloc->buffer()->dtype(); + TORCH_INTERNAL_ASSERT(alloc->buffer() != nullptr); + alloc_map_.emplace(alloc->buffer(), alloc); + if (!alloc->buffer()->isA()) { indent() << buffer_dtype << " " << gen(alloc->buffer()) << ";\n"; return; @@ -1273,8 +1749,9 @@ class CudaKernelGenerator : private OptOutConstDispatch { // Allocate alias another Allocate stmt const auto alias_tv = alloc->alias()->buffer()->as(); indent() << "// Alias Allocation - " << alloc->memoryType() << "\n"; - indent() << buffer_dtype << "* " << varName(tv) << " = " - << varName(alias_tv) << ";\n"; + indent() << "auto& " << varName(tv) << " = " << varName(alias_tv) + << ";\n"; + } else { // Standard Memory Allocation switch (tv->getMemoryType()) { @@ -1284,11 +1761,23 @@ class CudaKernelGenerator : private OptOutConstDispatch { case MemoryType::Shared: if (kir::ExpressionEvaluator::isConst(size)) { // Static shared memory - indent() << "__shared__ " << buffer_dtype << " " << varName(tv) - << "[" << genInline(size) << "];\n"; + // Always align to 16B for tensorview buffers + // with any vectorized access. + // TODO: + // This path will be less commonly exercised once we + // start dynamically allocate all the tensors and + // might be removed in a follow up. 
+ auto va = kernel_->summary().vectorized_accesses; + if (va.count(tv)) { + indent() << "__align__(16) "; + } else { + indent(); + } + code_ << "__shared__ " << buffer_dtype << " " << varName(tv) << "[" + << genInline(size) << "];\n"; } else { // Align Offset Position - indent() << "offset = alignBufferSize(offset," + indent() << "offset = alignBufferSize(offset, " << dataTypeSize(buffer_dtype) << ");\n"; // Shared Memory Pointer indent() << buffer_dtype << "* " << varName(tv) @@ -1299,17 +1788,23 @@ class CudaKernelGenerator : private OptOutConstDispatch { << buffer_dtype << "));\n"; } break; - case MemoryType::Local: - indent() << buffer_dtype << " " << varName(tv) << "[" - << genInline(size) << "];\n"; - break; + case MemoryType::Local: { + auto va = kernel_->summary().vectorized_accesses; + if (va.find(tv) != va.end()) { + indent() << "Array<" << buffer_dtype << ", " << genInline(size) + << ", " << va.at(tv) << "> " << varName(tv) << ";\n"; + } else { + indent() << buffer_dtype << " " << varName(tv) << "[" + << genInline(size) << "];\n"; + } + } break; default: TORCH_INTERNAL_ASSERT(false, "Unexpected memory type"); } } } - void handle(const kir::Sync*) final { + void handle(const kir::BlockSync*) final { // Use a custom synchronization method if enabled if (std::getenv("PYTORCH_NVFUSER_USE_BLOCK_SYNC_ATOMIC")) { indent() << "block_sync::sync();\n"; @@ -1318,6 +1813,31 @@ class CudaKernelGenerator : private OptOutConstDispatch { } } + void handle(const kir::GridSync* sync) final { + // Use a custom synchronization method if enabled + bool bidx = sync->syncDims().get(ParallelType::BIDx); + bool bidy = sync->syncDims().get(ParallelType::BIDy); + bool bidz = sync->syncDims().get(ParallelType::BIDz); + auto bool2str = [](bool b) { return (b ? "true" : "false"); }; + std::stringstream sync_str; + sync_str << bool2str(bidx) << ", " << bool2str(bidy) << ", " + << bool2str(bidz); + + std::stringstream sync_segment_size; + sync_segment_size << "index_utils::maskedSize<" << sync_str.str() + << ">(gridDim)"; + + std::stringstream sync_idx; + sync_idx << "index_utils::maskedOffset<" << bool2str(!bidx) << ", " + << bool2str(!bidy) << ", " << bool2str(!bidz) + << ">(gridDim, blockDim)"; + + indent() << "grid_sync::sync<" << sync_str.str() << ", true>(\n"; + indent() << " " << varName(sync->syncBuffer()) << "[" << sync_idx.str() + << "],\n"; + indent() << " " << sync_segment_size.str() << ");\n"; + } + void handle(const kir::InitMagicZero*) final { indent() << "NVFUSER_DEFINE_MAGIC_ZERO\n"; } @@ -1336,8 +1856,9 @@ class CudaKernelGenerator : private OptOutConstDispatch { // Mark when we are inside of a vectorized for-loop bool vectorize_scope_ = false; - //! Holds active replacement mappings during codegen - std::unordered_map replacement_map_; + //! Keep track of Allocate node for Val. Used to determine if Val + //! should be inlined. 
+ std::unordered_map alloc_map_; }; } // namespace diff --git a/torch/csrc/jit/codegen/cuda/compute_at.cpp b/torch/csrc/jit/codegen/cuda/compute_at.cpp index f51e0fe1bc9e98..306f631194f7b5 100644 --- a/torch/csrc/jit/codegen/cuda/compute_at.cpp +++ b/torch/csrc/jit/codegen/cuda/compute_at.cpp @@ -785,16 +785,14 @@ void ComputeAt::updateSiblings() { id->parallelize(sibling_id->getParallelType()); } } - if (tv->getComputeAtPosition() > sibling_tv->getComputeAtPosition()) { - auto sibling_domain = TransformReplay::fullSelfReplay( - sibling_tv->domain(), tv->domain()); - validateDomain(sibling_tv, sibling_domain); - sibling_tv->setDomain(sibling_domain); - sibling_tv->setComputeAt(tv->getComputeAtPosition()); - sibling_tv->setMaxProducer(tv->getMaxProducerPosition()); - auto consumer_tvs = ir_utils::consumerTvsOf(sibling_tv); - consumers_to_update.insert(consumer_tvs.begin(), consumer_tvs.end()); - } + auto sibling_domain = + TransformReplay::fullSelfReplay(sibling_tv->domain(), tv->domain()); + validateDomain(sibling_tv, sibling_domain); + sibling_tv->setDomain(sibling_domain); + sibling_tv->setComputeAt(tv->getComputeAtPosition()); + sibling_tv->setMaxProducer(tv->getMaxProducerPosition()); + auto consumer_tvs = ir_utils::consumerTvsOf(sibling_tv); + consumers_to_update.insert(consumer_tvs.begin(), consumer_tvs.end()); } } diff --git a/torch/csrc/jit/codegen/cuda/compute_at_map.cpp b/torch/csrc/jit/codegen/cuda/compute_at_map.cpp index f46a7495163024..0269c890ba0f5d 100644 --- a/torch/csrc/jit/codegen/cuda/compute_at_map.cpp +++ b/torch/csrc/jit/codegen/cuda/compute_at_map.cpp @@ -256,10 +256,15 @@ void ComputeAtMap::build(Fusion* fusion, GpuLower* gpu_lower) { if (first_output_tv == nullptr) { first_output_tv = c_tv; } else { - // Map multi outputs of an expression to eachother. c is current output, - // and f as first output. Keep consistent with the later section of - // producer and consumers. Which here producer is now "first output", - // and consumer is still consumer. + // Map multi outputs of an expression to each other. c is current + // output, and f as first output. Keep consistent with the later section + // of producer and consumers. Which here producer is now "first output", + // and consumer is still consumer. One exception is how the + // domains left of CA positions are handled in the Parallel + // map. Those domains are not mapped in producer and consumer + // mappings as they do not share loops, but are mapped in the + // case of mapping multiple outputs since they do share the + // same loops. TORCH_INTERNAL_ASSERT( c_tv->getRootDomain().size() == @@ -282,35 +287,14 @@ void ComputeAtMap::build(Fusion* fusion, GpuLower* gpu_lower) { auto c2f_map = replay_FasC.getReplay(); - // If we're creating parallel map, only map the leaf - // axes. Also, the producer axis must be left of the CA - // point. - // Otherwise, map the entire replay map. 
- if (mapping_mode_ == MappingMode::PARALLEL) { - // Mark axes left of compute at point for parallel type tracking - std::unordered_set producer_axes_to_map( - first_output_tv->domain()->domain().begin(), - first_output_tv->domain()->domain().begin() + - first_output_tv->getComputeAtPosition()); - - for (auto c_id : c_tv->domain()->domain()) { - auto it = c2f_map.find(c_id); - if (it == c2f_map.end()) { - continue; - } - auto f_id = it->second; - if (producer_axes_to_map.find(f_id) == producer_axes_to_map.end()) { - continue; - } - mapIds(f_id, c_id); - } - } else { - for (auto entry : c2f_map) { - auto c_id = entry.first; - auto f_id = entry.second; - // Map the id's together - mapIds(f_id, c_id); - } + // Map the entire replay map between the multiple + // consumers even for the Parallel map as they share the same + // loop. + for (auto entry : c2f_map) { + auto c_id = entry.first; + auto f_id = entry.second; + // Map the id's together + mapIds(f_id, c_id); } } @@ -457,16 +441,42 @@ void ComputeAtMap::build(Fusion* fusion, GpuLower* gpu_lower) { int max_concrete_count = -1; int max_broadcast_count = -1; IterDomain* concrete_id = nullptr; + + // Prefer domains appearing after rfactor domains. This matters + // when view merges domains to create a new domain, which becomes + // an rfactor domain. Suppose a broadcast follows the view + // operation and the broadcast domain is merged with the domain + // matching with the rfactor domain, that domain should be chosen + // as the concrete domain as it has the broadcast domain and the + // domain matching with the rfactor domain. The concrete domain + // does not have a history of merge/shift further up from the + // rfactor domain in pre-view tensors, but that should be fine as + // IndexCompute with those pre-view tensors should be able to + // compute indices from their leaf domains. + // See issue #1493 + + // Indicate if the previous ID was an rfactor domain + bool rf_detected = false; for (auto id : *set) { - int concrete_count = n_concrete_ids_.at(id); - if (concrete_count >= max_concrete_count) { - int broadcast_count = n_broadcast_ids_.at(id); - if (concrete_count > max_concrete_count || - broadcast_count > max_broadcast_count) { - max_concrete_count = concrete_count; - max_broadcast_count = broadcast_count; - concrete_id = id; + // If the previous ID is an rfactor, reset the concrete ID with + // this ID no matter how many IDs the previous concrete ID has. 
+ if (rf_detected) { + concrete_id = id; + max_concrete_count = n_concrete_ids_.at(id); + max_broadcast_count = n_broadcast_ids_.at(id); + rf_detected = id->isRFactorProduct(); + } else { + int concrete_count = n_concrete_ids_.at(id); + if (concrete_count >= max_concrete_count) { + int broadcast_count = n_broadcast_ids_.at(id); + if (concrete_count > max_concrete_count || + broadcast_count > max_broadcast_count) { + max_concrete_count = concrete_count; + max_broadcast_count = broadcast_count; + concrete_id = id; + } } + rf_detected = id->isRFactorProduct(); } } diff --git a/torch/csrc/jit/codegen/cuda/contiguity.cpp b/torch/csrc/jit/codegen/cuda/contiguity.cpp new file mode 100644 index 00000000000000..780e4298c6bf52 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/contiguity.cpp @@ -0,0 +1,164 @@ +#include +#include + +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +ContigIDs::ContigIDs( + const std::vector& ids, + const std::vector& root_domain, + const std::vector& root_contiguity) + : root_domain_(root_domain), root_contiguity_(root_contiguity) { + if (ids.empty()) { + return; + } + + TORCH_INTERNAL_ASSERT( + root_domain_.size() == root_contiguity_.size(), + "Arguments don't match ", + root_domain_.size(), + " != ", + root_contiguity_.size()); + + TORCH_INTERNAL_ASSERT( + GpuLower::current() != nullptr, "GpuLower is not found"); + + for (const auto i : c10::irange(root_domain_.size())) { + auto root_domain_i = root_domain_[i]->as(); + // If a root domain has halo, can't use merged domain even if + // both inputs are contiguous. HaloInfo is also initialized for + // rfactor root domains, which should just return "zero" + // RootAxisInfo. This should be safe as no rfactor tensor should + // need halo. + if (root_contiguity_[i] && + !GpuLower::current() + ->haloInfo() + .getRootAxisInfo(root_domain_i) + .hasHalo()) { + contig_ids_.emplace(root_domain_i); + is_contig_root_[root_domain_i] = true; + within_contig_ids_[root_domain_i] = std::unordered_set(); + } else { + is_contig_root_[root_domain_i] = false; + } + root_to_indexed_id_[root_domain_i] = root_domain_i; + } + + auto exprs = StmtSort::getExprs(ids[0]->fusion(), {ids.begin(), ids.end()}); + + for (auto expr : exprs) { + handle(expr); + } +} + +void ContigIDs::handle(Merge* merge) { + // If either input is non-contiguous so is output. + const auto inner = merge->inner(); + const auto outer = merge->outer(); + + if (!isContig(inner) || !isContig(outer)) { + return; + } + + // Grab inputs, make sure they're in root domain, check if they're + // contiguous. 
+
+  auto lhs_inputs =
+      ir_utils::iterDomainInputsOfOrderedAs({outer}, root_domain_);
+  auto rhs_inputs =
+      ir_utils::iterDomainInputsOfOrderedAs({inner}, root_domain_);
+
+  TORCH_INTERNAL_ASSERT(
+      inRoot(lhs_inputs) && inRoot(rhs_inputs),
+      "Found an invalid merge operation, inputs of its arguments are not in the root domain.");
+
+  std::deque<IterDomain*> ordered_inputs(lhs_inputs.begin(), lhs_inputs.end());
+  ordered_inputs.insert(
+      ordered_inputs.end(), rhs_inputs.begin(), rhs_inputs.end());
+
+  // If any root input is not contig, output is not contig
+  if (!(std::all_of(
+          ordered_inputs.begin(),
+          ordered_inputs.end(),
+          [this](IterDomain* id) {
+            return is_contig_root_.at(id) && !id->isBroadcast() &&
+                !id->isReduction();
+          }))) {
+    return;
+  }
+
+  std::deque<IterDomain*> root_copy(root_domain_.begin(), root_domain_.end());
+
+  // Forward to first matching argument
+  while (!root_copy.empty() && !ordered_inputs.empty()) {
+    if (root_copy.front() != ordered_inputs.front()) {
+      root_copy.pop_front();
+    } else {
+      break;
+    }
+  }
+
+  // Forward through all matching arguments
+  while (!root_copy.empty() && !ordered_inputs.empty()) {
+    if (root_copy.front() == ordered_inputs.front()) {
+      root_copy.pop_front();
+      ordered_inputs.pop_front();
+      // This is no longer causing an error in:
+      // ReductionSchedulerMultiDimNonFastest TODO: test reenablement to make
+      // sure it does what's expected
+      // } else if (
+      //     root_copy.front()->isReduction() ||
+      //     root_copy.front()->isBroadcast()) {
+      //   root_copy.pop_front();
+    } else {
+      break;
+    }
+  }
+
+  // If we matched all inputs, the output is contiguous. Only want to keep the
+  // top contig ID, lower ids should be placed in the "within_contig_ids" map
+  // of top id.
+  auto out = merge->out()->as<IterDomain>();
+  if (ordered_inputs.empty()) {
+    if (contig_ids_.find(inner) != contig_ids_.end()) {
+      contig_ids_.erase(inner);
+    }
+
+    if (contig_ids_.find(outer) != contig_ids_.end()) {
+      contig_ids_.erase(outer);
+    }
+
+    contig_ids_.emplace(out);
+
+    std::unordered_set<IterDomain*> within_out;
+    within_out.emplace(inner);
+    if (within_contig_ids_.find(inner) != within_contig_ids_.end()) {
+      auto in_inner = within_contig_ids_.at(inner);
+      within_out.insert(in_inner.begin(), in_inner.end());
+      within_contig_ids_.erase(inner);
+    }
+
+    within_out.emplace(outer);
+    if (within_contig_ids_.find(outer) != within_contig_ids_.end()) {
+      auto in_outer = within_contig_ids_.at(outer);
+      within_out.insert(in_outer.begin(), in_outer.end());
+      within_contig_ids_.erase(outer);
+    }
+
+    within_contig_ids_[out] = within_out;
+
+    for (auto root : lhs_inputs) {
+      root_to_indexed_id_[root] = out;
+    }
+    for (auto root : rhs_inputs) {
+      root_to_indexed_id_[root] = out;
+    }
+  }
+}
+
+} // namespace cuda
+} // namespace fuser
+} // namespace jit
+} // namespace torch
diff --git a/torch/csrc/jit/codegen/cuda/contiguity.h b/torch/csrc/jit/codegen/cuda/contiguity.h
new file mode 100644
index 00000000000000..0379f0c5ecda37
--- /dev/null
+++ b/torch/csrc/jit/codegen/cuda/contiguity.h
@@ -0,0 +1,88 @@
+#pragma once
+
+#include
+
+#include
+
+namespace torch {
+namespace jit {
+namespace fuser {
+namespace cuda {
+
+// A merge is contiguous if:
+// Inputs of outer are to the left in the root domain of the inputs of RHS.
+// All inputs are contiguous in the root domain:
+// - All marked as contiguous
+// - Only gaps between inputs are broadcast or reduction dims
+// There are no split transformations performed on outer or inner
+// All transformations on outer or inner are contiguous merges
+// If these criteria hold, then we can index the input root domains of this
+// merge with the indexing provided to the output of the merge in the backward
+// index pass
+
+class ContigIDs : public OptInDispatch {
+ public:
+  ContigIDs() = delete;
+
+  // Check through the history of ids whose inputs map to root_domain with
+  // contiguity root_contiguity. Return unordered_set of all merges that are
+  // contiguous. Ignore root order is primarily used for predicate generation.
+  // In this case we can linearize indexing of any ID that only consists of
+  // merge operations.
+  ContigIDs(
+      const std::vector<IterDomain*>& ids,
+      const std::vector<IterDomain*>& root_domain,
+      const std::vector<bool>& root_contiguity);
+
+  const std::unordered_set<IterDomain*>& contigIDs() const {
+    return contig_ids_;
+  }
+
+  const std::unordered_map<IterDomain*, std::unordered_set<IterDomain*>>&
+  withinContigIDs() const {
+    return within_contig_ids_;
+  }
+
+  const std::unordered_map<IterDomain*, IterDomain*>& rootToIndexedID() const {
+    return root_to_indexed_id_;
+  }
+
+ private:
+  using OptInDispatch::handle;
+
+  bool inRoot(const std::vector<IterDomain*>& ids) {
+    return std::all_of(ids.begin(), ids.end(), [this](IterDomain* id) {
+      return is_contig_root_.find(id) != is_contig_root_.end();
+    });
+  }
+
+  bool isContig(IterDomain* id) {
+    return contig_ids_.find(id) != contig_ids_.end();
+  }
+
+  // Split outputs are not contiguous, don't need to do anything.
+  void handle(Split*) override {}
+
+  void handle(Merge* merge) override;
+
+ private:
+  //! Root domains to analyze contiguity
+  const std::vector<IterDomain*>& root_domain_;
+  //! Contiguity of root_domain_
+  const std::vector<bool>& root_contiguity_;
+  //! Mapping of root domain to bool indicating contiguity
+  std::unordered_map<IterDomain*, bool> is_contig_root_;
+  // Mark if ids are result of contiguous merges
+  std::unordered_set<IterDomain*> contig_ids_;
+  // Given contiguous domain, return all iter domains within its history.
+  std::unordered_map<IterDomain*, std::unordered_set<IterDomain*>>
+      within_contig_ids_;
+  //! Mapping of root domain to the actual indexed domain, which can
+  //! be itself or a contig merged domain if found.
+ std::unordered_map root_to_indexed_id_; +}; + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/dispatch.cpp b/torch/csrc/jit/codegen/cuda/dispatch.cpp index 1702de93bdd47e..dc7ac6403d657c 100644 --- a/torch/csrc/jit/codegen/cuda/dispatch.cpp +++ b/torch/csrc/jit/codegen/cuda/dispatch.cpp @@ -54,6 +54,9 @@ void Val::dispatch(T handler, Val* val) { case DataType::Int: ptr(handler)->handle(val->as()); return; + case DataType::ComplexDouble: + ptr(handler)->handle(val->as()); + return; default: break; } @@ -101,6 +104,9 @@ void Expr::dispatch(T handler, Expr* expr) { case ExprType::WelfordOp: ptr(handler)->handle(expr->as()); return; + case ExprType::MmaOp: + ptr(handler)->handle(expr->as()); + return; case ExprType::BroadcastOp: ptr(handler)->handle(expr->as()); return; @@ -120,6 +126,9 @@ void Expr::dispatch(T handler, Expr* expr) { case ExprType::GatherOp: ptr(handler)->handle(expr->as()); return; + case ExprType::ViewDtypeOp: + ptr(handler)->handle(expr->as()); + return; case ExprType::ViewOp: ptr(handler)->handle(expr->as()); return; @@ -127,8 +136,11 @@ void Expr::dispatch(T handler, Expr* expr) { case ExprType::Allocate: ptr(handler)->handle(expr->as()); return; - case ExprType::Sync: - ptr(handler)->handle(expr->as()); + case ExprType::BlockSync: + ptr(handler)->handle(expr->as()); + return; + case ExprType::GridSync: + ptr(handler)->handle(expr->as()); return; case ExprType::InitMagicZero: ptr(handler)->handle(expr->as()); @@ -151,6 +163,9 @@ void Expr::dispatch(T handler, Expr* expr) { case ExprType::GridWelford: ptr(handler)->handle(expr->as()); return; + case ExprType::AllocateFusedReduction: + ptr(handler)->handle(expr->as()); + return; default: TORCH_INTERNAL_ASSERT(false, "Unknown exprtype in dispatch!"); } @@ -180,6 +195,9 @@ void Val::constDispatch(T handler, const Val* val) { case DataType::Int: ptr(handler)->handle(val->as()); return; + case DataType::ComplexDouble: + ptr(handler)->handle(val->as()); + return; default: break; } @@ -227,6 +245,9 @@ void Expr::constDispatch(T handler, const Expr* expr) { case ExprType::WelfordOp: ptr(handler)->handle(expr->as()); return; + case ExprType::MmaOp: + ptr(handler)->handle(expr->as()); + return; case ExprType::BroadcastOp: ptr(handler)->handle(expr->as()); return; @@ -246,6 +267,9 @@ void Expr::constDispatch(T handler, const Expr* expr) { case ExprType::GatherOp: ptr(handler)->handle(expr->as()); return; + case ExprType::ViewDtypeOp: + ptr(handler)->handle(expr->as()); + return; case ExprType::ViewOp: ptr(handler)->handle(expr->as()); return; @@ -253,8 +277,11 @@ void Expr::constDispatch(T handler, const Expr* expr) { case ExprType::Allocate: ptr(handler)->handle(expr->as()); return; - case ExprType::Sync: - ptr(handler)->handle(expr->as()); + case ExprType::BlockSync: + ptr(handler)->handle(expr->as()); + return; + case ExprType::GridSync: + ptr(handler)->handle(expr->as()); return; case ExprType::InitMagicZero: ptr(handler)->handle(expr->as()); @@ -277,6 +304,9 @@ void Expr::constDispatch(T handler, const Expr* expr) { case ExprType::GridWelford: ptr(handler)->handle(expr->as()); return; + case ExprType::AllocateFusedReduction: + ptr(handler)->handle(expr->as()); + return; default: TORCH_INTERNAL_ASSERT(false, "Unknown exprtype in dispatch!"); } @@ -317,6 +347,9 @@ void Val::mutatorDispatch(T mutator, Val* val) { case DataType::Int: ptr(mutator)->mutate(val->as()); return; + case DataType::ComplexDouble: + ptr(mutator)->mutate(val->as()); + return; 
default: break; } @@ -364,6 +397,9 @@ void Expr::mutatorDispatch(T mutator, Expr* expr) { case ExprType::WelfordOp: ptr(mutator)->mutate(expr->as()); return; + case ExprType::MmaOp: + ptr(mutator)->mutate(expr->as()); + return; case ExprType::BroadcastOp: ptr(mutator)->mutate(expr->as()); return; @@ -383,6 +419,9 @@ void Expr::mutatorDispatch(T mutator, Expr* expr) { case ExprType::GatherOp: ptr(mutator)->mutate(expr->as()); return; + case ExprType::ViewDtypeOp: + ptr(mutator)->mutate(expr->as()); + return; case ExprType::ViewOp: ptr(mutator)->mutate(expr->as()); return; @@ -390,8 +429,11 @@ void Expr::mutatorDispatch(T mutator, Expr* expr) { case ExprType::Allocate: ptr(mutator)->mutate(expr->as()); return; - case ExprType::Sync: - ptr(mutator)->mutate(expr->as()); + case ExprType::BlockSync: + ptr(mutator)->mutate(expr->as()); + return; + case ExprType::GridSync: + ptr(mutator)->mutate(expr->as()); return; case ExprType::InitMagicZero: ptr(mutator)->mutate(expr->as()); @@ -414,6 +456,9 @@ void Expr::mutatorDispatch(T mutator, Expr* expr) { case ExprType::GridWelford: ptr(mutator)->mutate(expr->as()); return; + case ExprType::AllocateFusedReduction: + ptr(mutator)->mutate(expr->as()); + return; default: TORCH_INTERNAL_ASSERT(false, "Unknown exprtype in dispatch!"); } @@ -530,6 +575,9 @@ void OptOutConstDispatch::handle(const Double* stmt) { void OptOutConstDispatch::handle(const Int* stmt) { unhandled(stmt); } +void OptOutConstDispatch::handle(const ComplexDouble* stmt) { + unhandled(stmt); +} void OptOutConstDispatch::handle(const NamedScalar* stmt) { unhandled(stmt); } @@ -566,6 +614,9 @@ void OptOutConstDispatch::handle(const ReductionOp* stmt) { void OptOutConstDispatch::handle(const WelfordOp* stmt) { unhandled(stmt); } +void OptOutConstDispatch::handle(const MmaOp* stmt) { + unhandled(stmt); +} void OptOutConstDispatch::handle(const BroadcastOp* stmt) { unhandled(stmt); } @@ -585,6 +636,9 @@ void OptOutConstDispatch::handle(const ShiftOp* stmt) { void OptOutConstDispatch::handle(const GatherOp* stmt) { unhandled(stmt); } +void OptOutConstDispatch::handle(const ViewDtypeOp* stmt) { + unhandled(stmt); +} void OptOutConstDispatch::handle(const ViewOp* stmt) { unhandled(stmt); } @@ -592,7 +646,10 @@ void OptOutConstDispatch::handle(const ViewOp* stmt) { void OptOutConstDispatch::handle(const kir::Allocate* stmt) { unhandled(stmt); } -void OptOutConstDispatch::handle(const kir::Sync* stmt) { +void OptOutConstDispatch::handle(const kir::BlockSync* stmt) { + unhandled(stmt); +} +void OptOutConstDispatch::handle(const kir::GridSync* stmt) { unhandled(stmt); } void OptOutConstDispatch::handle(const kir::InitMagicZero* stmt) { @@ -616,6 +673,9 @@ void OptOutConstDispatch::handle(const kir::GridBroadcast* stmt) { void OptOutConstDispatch::handle(const kir::GridWelford* stmt) { unhandled(stmt); } +void OptOutConstDispatch::handle(const kir::AllocateFusedReduction* stmt) { + unhandled(stmt); +} void OptOutDispatch::unhandled(Statement*) {} @@ -629,6 +689,9 @@ void OptOutDispatch::handle(Double* stmt) { void OptOutDispatch::handle(Int* stmt) { unhandled(stmt); } +void OptOutDispatch::handle(ComplexDouble* stmt) { + unhandled(stmt); +} void OptOutDispatch::handle(NamedScalar* stmt) { unhandled(stmt); } @@ -665,6 +728,9 @@ void OptOutDispatch::handle(ReductionOp* stmt) { void OptOutDispatch::handle(WelfordOp* stmt) { unhandled(stmt); } +void OptOutDispatch::handle(MmaOp* stmt) { + unhandled(stmt); +} void OptOutDispatch::handle(BroadcastOp* stmt) { unhandled(stmt); } @@ -684,6 +750,9 @@ void 
OptOutDispatch::handle(ShiftOp* stmt) { void OptOutDispatch::handle(GatherOp* stmt) { unhandled(stmt); } +void OptOutDispatch::handle(ViewDtypeOp* stmt) { + unhandled(stmt); +} void OptOutDispatch::handle(ViewOp* stmt) { unhandled(stmt); } @@ -691,7 +760,10 @@ void OptOutDispatch::handle(ViewOp* stmt) { void OptOutDispatch::handle(kir::Allocate* stmt) { unhandled(stmt); } -void OptOutDispatch::handle(kir::Sync* stmt) { +void OptOutDispatch::handle(kir::BlockSync* stmt) { + unhandled(stmt); +} +void OptOutDispatch::handle(kir::GridSync* stmt) { unhandled(stmt); } void OptOutDispatch::handle(kir::InitMagicZero* stmt) { @@ -715,6 +787,9 @@ void OptOutDispatch::handle(kir::GridBroadcast* stmt) { void OptOutDispatch::handle(kir::GridWelford* stmt) { unhandled(stmt); } +void OptOutDispatch::handle(kir::AllocateFusedReduction* stmt) { + unhandled(stmt); +} } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/dispatch.h b/torch/csrc/jit/codegen/cuda/dispatch.h index 6961ebd6a1584e..c38641cee580a8 100644 --- a/torch/csrc/jit/codegen/cuda/dispatch.h +++ b/torch/csrc/jit/codegen/cuda/dispatch.h @@ -64,6 +64,7 @@ class TensorView; class Bool; class Double; class Int; +class ComplexDouble; class NamedScalar; // Exprs @@ -72,31 +73,31 @@ class BinaryOp; class TernaryOp; class ReductionOp; class WelfordOp; +class MmaOp; class BroadcastOp; class TransposeOp; class ShiftOp; class GatherOp; +class ViewDtypeOp; class ViewOp; // Exprs class Split; class Merge; -class TransposeOp; -class ShiftOp; -class GatherOp; -class ViewOp; namespace kir { class Predicate; class TensorIndex; class Allocate; -class Sync; +class BlockSync; +class GridSync; class ForLoop; class IfThenElse; class GridReduction; class GridBroadcast; class GridWelford; +class AllocateFusedReduction; class InitMagicZero; class UpdateMagicZero; } // namespace kir @@ -120,6 +121,7 @@ class TORCH_CUDA_CU_API OptOutConstDispatch : public PolymorphicBase { virtual void handle(const Bool* stmt); virtual void handle(const Double* stmt); virtual void handle(const Int* stmt); + virtual void handle(const ComplexDouble* stmt); virtual void handle(const NamedScalar* stmt); virtual void handle(const kir::Predicate*); @@ -131,6 +133,7 @@ class TORCH_CUDA_CU_API OptOutConstDispatch : public PolymorphicBase { virtual void handle(const TernaryOp* stmt); virtual void handle(const ReductionOp* stmt); virtual void handle(const WelfordOp* stmt); + virtual void handle(const MmaOp* stmt); virtual void handle(const BroadcastOp* stmt); virtual void handle(const Split* stmt); @@ -138,10 +141,12 @@ class TORCH_CUDA_CU_API OptOutConstDispatch : public PolymorphicBase { virtual void handle(const TransposeOp* stmt); virtual void handle(const ShiftOp* stmt); virtual void handle(const GatherOp* stmt); + virtual void handle(const ViewDtypeOp* stmt); virtual void handle(const ViewOp* stmt); virtual void handle(const kir::Allocate*); - virtual void handle(const kir::Sync*); + virtual void handle(const kir::BlockSync*); + virtual void handle(const kir::GridSync*); virtual void handle(const kir::InitMagicZero*); virtual void handle(const kir::UpdateMagicZero*); virtual void handle(const kir::ForLoop*); @@ -149,6 +154,7 @@ class TORCH_CUDA_CU_API OptOutConstDispatch : public PolymorphicBase { virtual void handle(const kir::GridReduction*); virtual void handle(const kir::GridBroadcast*); virtual void handle(const kir::GridWelford*); + virtual void handle(const kir::AllocateFusedReduction*); }; class TORCH_CUDA_CU_API OptOutDispatch : public PolymorphicBase 
{ @@ -165,6 +171,7 @@ class TORCH_CUDA_CU_API OptOutDispatch : public PolymorphicBase { virtual void handle(Bool* stmt); virtual void handle(Double* stmt); virtual void handle(Int* stmt); + virtual void handle(ComplexDouble* stmt); virtual void handle(NamedScalar* stmt); virtual void handle(IterDomain* stmt); virtual void handle(TensorDomain* stmt); @@ -179,6 +186,7 @@ class TORCH_CUDA_CU_API OptOutDispatch : public PolymorphicBase { virtual void handle(TernaryOp* stmt); virtual void handle(ReductionOp* stmt); virtual void handle(WelfordOp* stmt); + virtual void handle(MmaOp* stmt); virtual void handle(BroadcastOp* stmt); virtual void handle(Split* stmt); @@ -186,10 +194,12 @@ class TORCH_CUDA_CU_API OptOutDispatch : public PolymorphicBase { virtual void handle(TransposeOp* stmt); virtual void handle(ShiftOp* stmt); virtual void handle(GatherOp* stmt); + virtual void handle(ViewDtypeOp* stmt); virtual void handle(ViewOp* stmt); virtual void handle(kir::Allocate* stmt); - virtual void handle(kir::Sync* stmt); + virtual void handle(kir::BlockSync* stmt); + virtual void handle(kir::GridSync* stmt); virtual void handle(kir::InitMagicZero* stmt); virtual void handle(kir::UpdateMagicZero* stmt); virtual void handle(kir::ForLoop* stmt); @@ -197,6 +207,7 @@ class TORCH_CUDA_CU_API OptOutDispatch : public PolymorphicBase { virtual void handle(kir::GridReduction* stmt); virtual void handle(kir::GridBroadcast* stmt); virtual void handle(kir::GridWelford* stmt); + virtual void handle(kir::AllocateFusedReduction* stmt); }; class TORCH_CUDA_CU_API OptInConstDispatch : public OptOutConstDispatch { @@ -254,6 +265,7 @@ class TORCH_CUDA_CU_API OptOutMutator : public PolymorphicBase { virtual void mutate(Bool*); virtual void mutate(Double*); virtual void mutate(Int*); + virtual void mutate(ComplexDouble*); virtual void mutate(NamedScalar*); virtual void mutate(IterDomain*); virtual void mutate(TensorDomain*); @@ -268,6 +280,7 @@ class TORCH_CUDA_CU_API OptOutMutator : public PolymorphicBase { virtual void mutate(TernaryOp*); virtual void mutate(ReductionOp*); virtual void mutate(WelfordOp*); + virtual void mutate(MmaOp*); virtual void mutate(BroadcastOp*); virtual void mutate(Split*); @@ -275,10 +288,12 @@ class TORCH_CUDA_CU_API OptOutMutator : public PolymorphicBase { virtual void mutate(TransposeOp*); virtual void mutate(ShiftOp*); virtual void mutate(GatherOp*); + virtual void mutate(ViewDtypeOp*); virtual void mutate(ViewOp*); virtual void mutate(kir::Allocate*); - virtual void mutate(kir::Sync*); + virtual void mutate(kir::BlockSync*); + virtual void mutate(kir::GridSync*); virtual void mutate(kir::InitMagicZero*); virtual void mutate(kir::UpdateMagicZero*); virtual void mutate(kir::ForLoop*); @@ -286,6 +301,7 @@ class TORCH_CUDA_CU_API OptOutMutator : public PolymorphicBase { virtual void mutate(kir::GridReduction*); virtual void mutate(kir::GridBroadcast*); virtual void mutate(kir::GridWelford*); + virtual void mutate(kir::AllocateFusedReduction*); protected: void removeExpr(IrContainer*, Expr*); diff --git a/torch/csrc/jit/codegen/cuda/evaluator_common.cpp b/torch/csrc/jit/codegen/cuda/evaluator_common.cpp index 0948131956982b..83107569dc54b5 100644 --- a/torch/csrc/jit/codegen/cuda/evaluator_common.cpp +++ b/torch/csrc/jit/codegen/cuda/evaluator_common.cpp @@ -388,7 +388,7 @@ void KernelPrecomputedIntegers::bindTensorMetaData( const at::Tensor& at_tensor) { std::vector> ret; const auto root_domain = - TensorDomain::noReductions(tv->domain()->getRootDomain()); + 
TensorDomain::noReductions(tv->domain()->getMaybeRFactorDomain()); TORCH_INTERNAL_ASSERT( at_tensor.ndimension() == static_cast(root_domain.size()), "Something went wrong configuring launch. Inputs do not match."); diff --git a/torch/csrc/jit/codegen/cuda/executor.cpp b/torch/csrc/jit/codegen/cuda/executor.cpp index 5e6f2d9375e019..a32dbbf73b2485 100644 --- a/torch/csrc/jit/codegen/cuda/executor.cpp +++ b/torch/csrc/jit/codegen/cuda/executor.cpp @@ -13,6 +13,7 @@ #include #include +#include #include #include #include @@ -56,6 +57,18 @@ typedef unsigned long long int uint64_t; )"; } +static const std::string& defineComplexTypes() { + static std::string result = std::string(R"ESCAPE( +#define POS_INFINITY __int_as_float(0x7f800000) +#define INFINITY POS_INFINITY +#define NEG_INFINITY __int_as_float(0xff800000) +#define NAN __int_as_float(0x7fffffff) +)ESCAPE") + + at::cuda::get_traits_string() + at::cuda::get_complex_body_string() + + at::cuda::get_cmath_string() + at::cuda::get_complex_math_string(); + return result; +} + } // namespace std::string FusionExecutor::getStructuredCode(const std::string& kernel) { @@ -70,7 +83,7 @@ std::string FusionExecutor::getStructuredCode(const std::string& kernel) { #endif code += std::string("namespace ") + FusionExecutor::kernelNamespace() + " {\n" + defineIntegerTypes() + defineIndexMode(options_.index_mode) + - executor_utils::kernelPreamble() + kernel + "}\n"; + defineComplexTypes() + executor_utils::kernelPreamble() + kernel + "}\n"; if (isDebugDumpEnabled(DebugDumpOption::CudaKernel)) { std::cout << "\n======= Codegen output for kernel: " << kernelName() @@ -169,12 +182,15 @@ void FusionExecutor::compileFusion( c10::DeviceGuard dg(options_.device); TORCH_INTERNAL_ASSERT( - options.device.is_cuda(), "Provided device to CUDA fuser is the CPU."); - auto properties = at::cuda::getDeviceProperties(options.device.index()); + options_.device.is_cuda(), "Provided device to CUDA fuser is the CPU."); + auto properties = at::cuda::getDeviceProperties(options_.device.index()); max_device_smem = properties->sharedMemPerBlock; warp_size_ = properties->warpSize; - lowered_ = std::make_unique(fusion); + lowered_ = std::make_unique( + fusion, + options_.index_mode == KernelIndexMode::INT64 ? 
DataType::Int + : DataType::Int32); const auto kernel = lowered_->kernel(); fusion_ = lowered_->kernel()->as(); @@ -464,8 +480,12 @@ LaunchParams FusionExecutor::computeLaunchParams( } maximum_value = std::max(maximum_value, *val); } - expr_eval.bind(p_type, maximum_value); - launch_params.bind(maximum_value, p_type); + // Protect for size-0 tensors, they still have a value so would prefer to + // bind nothing than 0 + if (maximum_value > 0) { + expr_eval.bind(p_type, maximum_value); + launch_params.bind(maximum_value, p_type); + } } // Re-run the integer machine with all @@ -552,23 +572,41 @@ FusionExecutor::GlobalBuffers FusionExecutor::allocGlobalVals( } std::vector FusionExecutor::allocOutputs( + const at::ArrayRef& inputs, kir::ExpressionEvaluator& expr_eval, const std::unordered_set& alias_indices) { FUSER_PERF_SCOPE("FusionExecutor::AllocOutputs"); const auto kernel = lowered_->kernel(); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) std::vector outputs; - for (const auto i : c10::irange(kernel->outputs().size())) { - TORCH_INTERNAL_ASSERT( - kernel->outputs()[i]->isA(), - "Cannot allocate outputs that are not tensors."); - auto output = kernel->outputs()[i]->as(); - if (alias_indices.count(i) == 0) { - outputs.push_back( - inferAndAllocOutput(output, expr_eval, options_, false)); + for (const auto out_i : c10::irange(kernel->outputs().size())) { + // Dummy output. + if (kernel->outputs()[out_i]->isFusionInput()) { + for (auto inp_i : c10::irange(kernel->inputs().size())) { + if (kernel->inputs()[inp_i] == kernel->outputs()[out_i]) { + TORCH_INTERNAL_ASSERT( + inp_i < inputs.size(), + "Issue with an input showing up as output, couldn't find input."); + TORCH_INTERNAL_ASSERT( + inputs[inp_i].isTensor(), + "Cannot register a scalar as an output in a fusion."); + outputs.push_back(inputs[inp_i].toTensor()); + break; + } + } } else { - // aliasing to inputs, no need to allocate real output - outputs.push_back(inferAndAlloc(output, {}, expr_eval, options_, false)); + TORCH_INTERNAL_ASSERT( + kernel->outputs()[out_i]->isA(), + "Cannot allocate outputs that are not tensors."); + auto output = kernel->outputs()[out_i]->as(); + if (alias_indices.count(out_i) == 0) { + outputs.push_back( + inferAndAllocOutput(output, expr_eval, options_, false)); + } else { + // aliasing to inputs, no need to allocate real output + outputs.push_back( + inferAndAlloc(output, {}, expr_eval, options_, false)); + } } } return outputs; @@ -753,7 +791,7 @@ std::vector FusionExecutor::runFusion( auto& output_alias_indices = output_alias_indices_entry.get(); - allocated_outputs = allocOutputs(expr_eval, output_alias_indices); + allocated_outputs = allocOutputs(inputs, expr_eval, output_alias_indices); for (const auto& entry : alias_indices) { TORCH_INTERNAL_ASSERT( @@ -826,14 +864,17 @@ std::vector FusionExecutor::runFusion( << "Inputs:" << std::endl; for (const auto& input : inputs) { if (input.isTensor()) { - std::cout << input.toTensor().scalar_type() << " " - << input.toTensor().sizes() << std::endl; + const auto& input_tensor = input.toTensor(); + std::cout << " " << input_tensor.scalar_type() << " " + << input.toTensor().sizes() + << " (strides = " << input.toTensor().strides() << ")" + << std::endl; } } std::cout << "Outputs:" << std::endl; for (const auto& output : allocated_outputs) { std::cout << " " << output.scalar_type() << " " << output.sizes() - << std::endl; + << " (strides = " << output.strides() << ")" << std::endl; } std::cout << "Reduction and semaphore buffers:" << std::endl; for (const 
auto& buffer : global_buffers.buffers) { diff --git a/torch/csrc/jit/codegen/cuda/executor.h b/torch/csrc/jit/codegen/cuda/executor.h index 40accbfb5208d0..a62507e87bfd89 100644 --- a/torch/csrc/jit/codegen/cuda/executor.h +++ b/torch/csrc/jit/codegen/cuda/executor.h @@ -165,6 +165,7 @@ class TORCH_CUDA_CU_API FusionExecutor : public NonCopyable { // skip allocating real storage for those, but still maintain its spot to // maintain the indexing from output aliases to inputs std::vector allocOutputs( + const at::ArrayRef& inputs, kir::ExpressionEvaluator& expr_eval, const std::unordered_set& alias_indices = {}); diff --git a/torch/csrc/jit/codegen/cuda/executor_kernel_arg.cpp b/torch/csrc/jit/codegen/cuda/executor_kernel_arg.cpp index 883fae207c51d2..da5667f9faccdd 100644 --- a/torch/csrc/jit/codegen/cuda/executor_kernel_arg.cpp +++ b/torch/csrc/jit/codegen/cuda/executor_kernel_arg.cpp @@ -88,6 +88,10 @@ std::unique_ptr getTensorArg( return getTensorArg(nDims); case c10::ScalarType::Int: return getTensorArg(nDims); + case c10::ScalarType::ComplexFloat: + return getTensorArg, INDEX_MODE>(nDims); + case c10::ScalarType::ComplexDouble: + return getTensorArg, INDEX_MODE>(nDims); default: TORCH_CHECK( false, @@ -193,6 +197,10 @@ void KernelArgumentHolder::push(const IValue& val) { auto scalar_val = val.toScalar(); switch (scalar_val.type()) { // NOLINTNEXTLINE(bugprone-branch-clone) + case c10::ScalarType::ComplexDouble: + arguments_.push_back( + std::make_unique(scalar_val.toComplexDouble())); + return; case c10::ScalarType::Double: arguments_.push_back(std::make_unique(scalar_val.toDouble())); return; diff --git a/torch/csrc/jit/codegen/cuda/executor_kernel_arg.h b/torch/csrc/jit/codegen/cuda/executor_kernel_arg.h index d457a69adb2505..c135328a3acc1e 100644 --- a/torch/csrc/jit/codegen/cuda/executor_kernel_arg.h +++ b/torch/csrc/jit/codegen/cuda/executor_kernel_arg.h @@ -4,6 +4,7 @@ #include #include #include +#include namespace torch { namespace jit { @@ -18,10 +19,8 @@ struct TensorArgCodegen { }; T* data; - // NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays) - nvfuser_index_t size[N]; - // NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays) - nvfuser_index_t stride[N]; + std::array size; + std::array stride; constexpr int nDims() { return N; } @@ -71,8 +70,7 @@ struct ArgAbstract { struct PhiloxCudaStateArg : public ArgAbstract { at::PhiloxCudaState val_; PhiloxCudaStateArg(at::PhiloxCudaState _val) : val_(_val){}; - // NOLINTNEXTLINE(modernize-use-override,cppcoreguidelines-explicit-virtual-functions) - void* arg() { + void* arg() override { return &val_; } }; @@ -80,8 +78,7 @@ struct PhiloxCudaStateArg : public ArgAbstract { struct LongArg : public ArgAbstract { int64_t val_; explicit LongArg(int64_t _val) : val_(_val) {} - // NOLINTNEXTLINE(modernize-use-override,cppcoreguidelines-explicit-virtual-functions) - void* arg() { + void* arg() override { return &val_; } }; @@ -89,8 +86,15 @@ struct LongArg : public ArgAbstract { struct DoubleArg : public ArgAbstract { double val_; explicit DoubleArg(double _val) : val_(_val) {} - // NOLINTNEXTLINE(modernize-use-override,cppcoreguidelines-explicit-virtual-functions) - void* arg() { + void* arg() override { + return &val_; + } +}; + +struct ComplexDoubleArg : public ArgAbstract { + c10::complex val_; + explicit ComplexDoubleArg(c10::complex _val) : val_(_val) {} + void* arg() override { return &val_; } }; @@ -98,8 +102,7 @@ struct DoubleArg : public ArgAbstract { struct BoolArg : public ArgAbstract 
{ bool val_; explicit BoolArg(bool _val) : val_(_val) {} - // NOLINTNEXTLINE(modernize-use-override,cppcoreguidelines-explicit-virtual-functions) - void* arg() { + void* arg() override { return &val_; } }; diff --git a/torch/csrc/jit/codegen/cuda/executor_utils.cpp b/torch/csrc/jit/codegen/cuda/executor_utils.cpp index 5323036e5df982..d81ce7b2c55c76 100644 --- a/torch/csrc/jit/codegen/cuda/executor_utils.cpp +++ b/torch/csrc/jit/codegen/cuda/executor_utils.cpp @@ -5,20 +5,24 @@ #include #include +#include #include #include #include #include +#include #include #include #include +#include #include #include #include #include #include #include +#include #include #include #include @@ -26,6 +30,9 @@ #include #include #include +#include +#include +#include #include #include @@ -68,9 +75,12 @@ std::string kernelPreamble() { // Base classes and helpers ss << nvfuser_resources::tensor_cu; + ss << nvfuser_resources::type_traits_cu; + ss << nvfuser_resources::array_cu; ss << nvfuser_resources::random_numbers_cu; ss << nvfuser_resources::helpers_cu; ss << nvfuser_resources::index_utils_cu; + ss << nvfuser_resources::tuple_cu; // Synchronization classes if (std::getenv("PYTORCH_NVFUSER_USE_BLOCK_SYNC_ATOMIC")) { @@ -87,6 +97,8 @@ std::string kernelPreamble() { ss << nvfuser_resources::broadcast_cu; ss << nvfuser_resources::welford_cu; ss << nvfuser_resources::warp_cu; + ss << nvfuser_resources::tensorcore_cu; + ss << nvfuser_resources::fused_reduction_cu; // Random utilities ss << nvfuser_resources::PhiloxCudaStateRaw_cu; @@ -123,9 +135,9 @@ bool validateKernelArgTensor( size_t arg_dim = arg.dim(); // Note: This requires current Fusion to be active. // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - size_t param_dim = - TensorDomain::noReductions(param->as()->getRootDomain()) - .size(); + size_t param_dim = TensorDomain::noReductions( + param->as()->getMaybeRFactorDomain()) + .size(); // see [Note - broadcast support in integration] // Because of broadcasting support handled in integration, we relax the rank // check as necessary. @@ -166,6 +178,12 @@ bool validateKernelArgTensor( case at::ScalarType::Bool: match = param_data_type == DataType::Bool; break; + case at::ScalarType::ComplexFloat: + match = param_data_type == DataType::ComplexFloat; + break; + case at::ScalarType::ComplexDouble: + match = param_data_type == DataType::ComplexDouble; + break; default: msg << "Argument element type, " << arg_data_type << ", is not supported." << "\n"; @@ -193,6 +211,10 @@ bool validateKernelArgScalar( case c10::ScalarType::Long: match = param_type == DataType::Int || param_type == DataType::Int32; break; + case c10::ScalarType::ComplexDouble: + match = param_type == DataType::ComplexDouble || + param_type == DataType::ComplexFloat; + break; case c10::ScalarType::Double: match = param_type == DataType::Double || param_type == DataType::Float || param_type == DataType::Half || param_type == DataType::BFloat16; @@ -254,6 +276,10 @@ bool checkSameStride(const std::vector& tensors) { // Return true if all the tensors are contiguous and have the same striding bool checkSameContiguity(const std::vector& tensors) { + if (tensors.size() < 2) { + return true; + } + auto reference = tensors.front(); if (!reference.isTensor()) { return false; @@ -286,6 +312,7 @@ bool checkValidMisalignedTensors( // Only check input tensors return checkSameStride(inp_tensors); } else if (!out_tv.empty() && out_tensors.empty()) { + // out_tensors is empty unless outputs are given to runFusion. 
// Assume out tensors are contiguous return checkSameContiguity(inp_tensors); } else { @@ -350,243 +377,231 @@ void validateKernelOutputs( namespace { -bool canVectorize(const IValue& aten_val, int word_size) { - if (!aten_val.isTensor()) { - return false; - } - - const auto& aten_tensor = aten_val.toTensor(); - - if (reinterpret_cast(aten_tensor.data_ptr()) % - (word_size * aten_tensor.dtype().itemsize()) != - 0) { - return false; - } - - for (size_t i = aten_tensor.ndimension(); i > 0; i--) { - if (aten_tensor.size(i - 1) != 1) { - if (aten_tensor.size(aten_tensor.ndimension() - 1) % word_size != 0 || - aten_tensor.stride(aten_tensor.ndimension() - 1) != 1) { - return false; - } - break; - } - } - - for (auto stride : aten_tensor.strides()) { - if (stride != 1 && stride % word_size != 0) { - return false; - } - } - - return true; -} - -// Returns true if a TV can be used with ParallelType::Vectorize. When -// input or output tensors are involved, the other version of -// canVectorize is used. -bool canVectorize( - TensorView* tv, - int word_size, - kir::ExpressionEvaluator& expr_eval) { - IterDomain* last_root_dim = nullptr; - for (size_t i = tv->getRootDomain().size(); i > 0; i--) { - auto r_id = tv->getRootDomain()[i - 1]; - if (r_id->isReduction() || r_id->isTrivialReduction() || - r_id->isBroadcast()) { - continue; - } - last_root_dim = r_id; - break; - } - - if (last_root_dim == nullptr) { - return false; - } - - auto last_dim_size = expr_eval.evaluate(last_root_dim->extent()); +// Finds a fusion input or output tensor to validate its stides +// for vectorization. +// Returns a pair consisting of a flag indicating it's a fusion input +// and an integer position within in the input or output tensor list. +std::vector> getVectorizedFusionInputOutput( + TensorView* producer_tv, + TensorView* consumer_tv, + Fusion* fusion) { + std::vector> vectorized_input_output; - if (!last_dim_size.has_value()) { - return false; - } + // When the producer is a fusion input, validate only the producer + // and assume the consumer is contiguous. Similarly, when the + // consumer is a fusion output, validate the consumer and assume the + // producer is contiguous. - if (last_dim_size.value() % word_size != 0) { - return false; + if (producer_tv->isFusionInput()) { + auto producer_it = std::find( + fusion->inputs().begin(), fusion->inputs().end(), producer_tv); + TORCH_INTERNAL_ASSERT( + producer_it != fusion->inputs().end(), + "Could not find ", + producer_tv, + " in fusion inputs."); + auto pos = std::distance(fusion->inputs().begin(), producer_it); + vectorized_input_output.push_back( + std::make_pair(true, static_cast(pos))); + } else { + // If not fusion input, assume it's fully contiguous, so nothing + // to check with respect to strides. + TORCH_INTERNAL_ASSERT( + std::all_of( + producer_tv->domain()->contiguity().begin(), + producer_tv->domain()->contiguity().end(), + [](bool contig) { return contig; }), + "Unsupported pattern of vectorization: ", + consumer_tv->definition()->toString()); } - return true; -} - -// Check if there's any split that is non-divisible and vectorized. If -// found, Vectorize is illegal. 
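getVectorizedFusionInputOutput above boils down to a position lookup: find the producer in the fusion's input list and the consumer in its output list, and record (is_input, position) pairs for later stride validation. A standalone sketch of that lookup, with a placeholder Tensor type instead of TensorView:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

struct Tensor {}; // placeholder for TensorView

std::vector<std::pair<bool, int>> findInputOutputPositions(
    const Tensor* producer,
    const Tensor* consumer,
    const std::vector<const Tensor*>& fusion_inputs,
    const std::vector<const Tensor*>& fusion_outputs) {
  std::vector<std::pair<bool, int>> positions;
  auto in_it = std::find(fusion_inputs.begin(), fusion_inputs.end(), producer);
  if (in_it != fusion_inputs.end()) {
    // (true, pos): the producer is fusion input number `pos`.
    positions.emplace_back(
        true, static_cast<int>(std::distance(fusion_inputs.begin(), in_it)));
  }
  auto out_it =
      std::find(fusion_outputs.begin(), fusion_outputs.end(), consumer);
  if (out_it != fusion_outputs.end()) {
    // (false, pos): the consumer is fusion output number `pos`.
    positions.emplace_back(
        false, static_cast<int>(std::distance(fusion_outputs.begin(), out_it)));
  }
  return positions;
}

int main() {
  Tensor a, b, c;
  std::vector<const Tensor*> inputs{&a, &b};
  std::vector<const Tensor*> outputs{&c};
  for (const auto& p : findInputOutputPositions(&b, &c, inputs, outputs)) {
    std::cout << (p.first ? "input " : "output ") << p.second << "\n";
  }
}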
-void validateVectorizedSplits( - kir::Kernel* kernel, - kir::ExpressionEvaluator& expr_eval) { - for (const auto& extent_factor : kernel->summary().splits_to_validate) { - auto input_extent = expr_eval.evaluate(extent_factor.first); - auto split_factor = expr_eval.evaluate(extent_factor.second); + if (consumer_tv->isFusionOutput()) { + auto consumer_it = std::find( + fusion->outputs().begin(), fusion->outputs().end(), consumer_tv); TORCH_INTERNAL_ASSERT( - input_extent.has_value(), - "Could not check if a split with vectorization is divisible because the extent, ", - extent_factor.first->toString(), - ", is not possible to evaluate."); - TORCH_INTERNAL_ASSERT( - input_extent.has_value(), - "Could not check if a split with vectorization is divisible because the split factor, ", - extent_factor.second->toString(), - ", is not possible to evaluate."); + consumer_it != fusion->outputs().end(), + "Could not find ", + consumer_tv, + " in fusion outputs."); + auto pos = std::distance(fusion->outputs().begin(), consumer_it); + vectorized_input_output.push_back( + std::make_pair(false, static_cast(pos))); + } else { + // If not fusion input, assume it's fully contiguous, so nothing + // to check with respect to strides. TORCH_INTERNAL_ASSERT( - input_extent.value() % split_factor.value() == 0, - "Non-divisible split with vectorization is detected. ", - "Extent: ", - input_extent.value(), - ". Factor: ", - split_factor.value()); + std::all_of( + consumer_tv->domain()->contiguity().begin(), + consumer_tv->domain()->contiguity().end(), + [](bool contig) { return contig; }), + "Unsupported pattern of vectorization: ", + consumer_tv->definition()->toString()); } + + return vectorized_input_output; } -//! Returns the position information of vectorized input/output tensors -//! in the given fusion. +//! Returns the information of vectorized input/output tensors +//! in the given fusion. 
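The divisibility requirement enforced by validateVectorizedSplits (removed here and re-added further below) is simple to state: every recorded (extent, factor) pair must be evaluable at launch time, and the extent must be a multiple of the split factor. A compact sketch, with std::optional standing in for the expression evaluator's possibly-unknown results:

#include <cstdint>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

bool splitsAreDivisible(
    const std::vector<
        std::pair<std::optional<int64_t>, std::optional<int64_t>>>& splits) {
  for (const auto& [extent, factor] : splits) {
    // Both the extent and the split factor must be concrete values.
    if (!extent.has_value() || !factor.has_value()) {
      return false;
    }
    // A vectorized split must divide its extent evenly.
    if (*extent % *factor != 0) {
      return false;
    }
  }
  return true;
}

int main() {
  std::cout << splitsAreDivisible({{128, 4}, {64, 8}}) << "\n"; // 1
  std::cout << splitsAreDivisible({{130, 4}}) << "\n";          // 0
}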
std::unique_ptr getVectorizedTensorValidationInfo( - Fusion* fusion) { + kir::Kernel* kernel) { auto vectorized_tensor_info_ptr = std::make_unique(); - auto& tv_to_vector_word_size = - vectorized_tensor_info_ptr->tv_to_vector_word_size; - auto& global_inp_misaligned_tv = - vectorized_tensor_info_ptr->global_inp_misaligned_tv; - auto& global_out_misaligned_tv = - vectorized_tensor_info_ptr->global_out_misaligned_tv; - kir::ExpressionEvaluator expr_eval; + for (const auto& vector_info : kernel->summary().vectorized_set_info) { + auto consumer_tv = vector_info.consumer_tv; + auto producer_tv = vector_info.producer_tv; - // Find all vectorized tensors and their word size - for (auto expr : fusion->exprs()) { - if (!expr->isA() || - expr->as()->getUnaryOpType() != UnaryOpType::Set) { - continue; - } - auto uop = expr->as(); - if (!uop->out()->isA() || !uop->in()->isA()) { - continue; - } - auto out_tv = uop->out()->as(); - auto in_tv = uop->in()->as(); - IterDomain* vector_dim = nullptr; - for (auto id : out_tv->domain()->domain()) { - if (id->getParallelType() == ParallelType::Vectorize || - id->getParallelType() == ParallelType::MisalignedVectorize) { - TORCH_INTERNAL_ASSERT( - vector_dim == nullptr, - "Found multiple vectorized dimensions on tensor ", - out_tv); - vector_dim = id; - } - } - if (vector_dim == nullptr) { - continue; - } - auto vector_word_size = expr_eval.evaluate(vector_dim->extent()); - TORCH_INTERNAL_ASSERT( - vector_word_size.has_value(), - "Non constant vector dimension found in ", - out_tv); - - // The expression here must be a UnaryOp::Set, so checking either of the - // input or output tensor should be sufficient. When the output is a - // fusion output, check the tensor as its size information is available - // without using the expression evaluator. - auto tv_to_verify = out_tv->isFusionOutput() ? out_tv : in_tv; - tv_to_vector_word_size[tv_to_verify] = vector_word_size.value(); - - if (vector_dim->getParallelType() == ParallelType::MisalignedVectorize) { + auto vector_dim = vector_info.vectorized_leaf_id; + const auto is_aligned = + vector_dim->getParallelType() == ParallelType::Vectorize; + + // Find fusion inputs and outputs that are used with misaligned + // vectorization. 
+ if (!is_aligned) { TORCH_INTERNAL_ASSERT( - in_tv->isFusionInput() || out_tv->isFusionOutput(), + producer_tv->isFusionInput() || consumer_tv->isFusionOutput(), "MisalignedVectorize is assumed to be used with either input or output tensor"); - if (out_tv->getMemoryType() == MemoryType::Global && - in_tv->getMemoryType() == MemoryType::Local) { - global_out_misaligned_tv.insert(out_tv); + if (consumer_tv->getMemoryType() == MemoryType::Global && + producer_tv->getMemoryType() == MemoryType::Local) { + vectorized_tensor_info_ptr->global_out_misaligned_tv.insert( + consumer_tv); } else if ( - in_tv->getMemoryType() == MemoryType::Global && - out_tv->getMemoryType() == MemoryType::Local) { - global_inp_misaligned_tv.insert(in_tv); + producer_tv->getMemoryType() == MemoryType::Global && + consumer_tv->getMemoryType() == MemoryType::Local) { + vectorized_tensor_info_ptr->global_inp_misaligned_tv.insert( + producer_tv); } else { TORCH_INTERNAL_ASSERT( false, "Unsupported memory configuration for misaligned vectorization."); } } - } - // Check striding information on input and outputs as well as size information - // of all - auto& inp_misaligned_tensors_pos = - vectorized_tensor_info_ptr->inp_misaligned_tensors_pos; - auto& out_misaligned_tensors_pos = - vectorized_tensor_info_ptr->out_misaligned_tensors_pos; - auto& inp_pos_to_word_size_map_to_verify = - vectorized_tensor_info_ptr->inp_pos_to_word_size_map_to_verify; - auto& out_pos_to_word_size_map_to_verify = - vectorized_tensor_info_ptr->out_pos_to_word_size_map_to_verify; - auto& intermediate_tv_to_word_size_map_to_verify = - vectorized_tensor_info_ptr->intermediate_tv_to_word_size_map_to_verify; - - for (auto entry : tv_to_vector_word_size) { - auto tv = entry.first; - auto word_size = entry.second; - if (tv->isFusionInput()) { - auto inp_it = - std::find(fusion->inputs().begin(), fusion->inputs().end(), tv); - TORCH_INTERNAL_ASSERT( - inp_it != fusion->inputs().end(), - "Could not find ", - tv, - " in fusion inputs."); - auto inp_pos = std::distance(fusion->inputs().begin(), inp_it); - - if (global_inp_misaligned_tv.find(tv) != global_inp_misaligned_tv.end()) { - inp_misaligned_tensors_pos.emplace_back(inp_pos); - } else { - // Shouldn't visit same pos twice here, assert ? - inp_pos_to_word_size_map_to_verify[inp_pos] = word_size; - } - } else if (tv->isFusionOutput()) { - auto out_it = - std::find(fusion->outputs().begin(), fusion->outputs().end(), tv); - TORCH_INTERNAL_ASSERT( - out_it != fusion->outputs().end(), - "Could not find ", - tv, - " in provided fusion outputs."); - auto out_pos = std::distance(fusion->outputs().begin(), out_it); - - if (global_out_misaligned_tv.find(tv) != global_out_misaligned_tv.end()) { - out_misaligned_tensors_pos.emplace_back(out_pos); + // Collect information on corresponding fusion input and output + // tensors to verify strides. + auto inp_or_out_info = + getVectorizedFusionInputOutput(producer_tv, consumer_tv, kernel); + + // If both producer and consumer are contig and intermediate, + // nothing to validate with respect to strides. + if (inp_or_out_info.empty()) { + continue; + } + + // Misaligned vectorize only allows from input to local or local + // to output + if (!is_aligned) { + TORCH_INTERNAL_ASSERT(inp_or_out_info.size() == 1); + } + + for (const auto& inp_or_out : inp_or_out_info) { + const bool is_input = inp_or_out.first; + const int pos = inp_or_out.second; + + if (is_aligned) { + auto& pos_list = is_input + ? 
vectorized_tensor_info_ptr->aligned_vectorized_inp_tensor_pos + : vectorized_tensor_info_ptr->aligned_vectorized_out_tensor_pos; + pos_list.push_back(pos); } else { - out_pos_to_word_size_map_to_verify[out_pos] = word_size; + auto& map = is_input + ? vectorized_tensor_info_ptr->inp_misaligned_tensors_pos + : vectorized_tensor_info_ptr->out_misaligned_tensors_pos; + map.emplace_back(pos); } - } else { - // Intermediate tensors. Note that this must be Vectorize as - // MisalignedVectorize is only supported for inputs and outputs. - intermediate_tv_to_word_size_map_to_verify[tv] = word_size; } } return vectorized_tensor_info_ptr; } -} // namespace -// Misaligned vectorization check. Currently misaligned vectorization is limited -// to global-register and register-global load/store patterns. However, this -// could be improved to include shared memory. -void validateVectorizedTensors( +// Make sure the root domain(s) comprising the vectorized leaf domain +// have the (merged) extent that is divisible by the vectorization +// word size. +void validateAlignedVectorizeExtents( + const VectorizedSetInfo& info, + kir::ExpressionEvaluator& expr_eval) { + int64_t vectorized_merged_domain_extent = 1; + for (auto id : info.contig_root_ids) { + auto extent_val = expr_eval.evaluate(id->extent()); + TORCH_INTERNAL_ASSERT( + extent_val.has_value(), + "Error vectorizing, ", + info.consumer_tv->toString(), + " as the extent of a vectorized root domain, ", + id->toString(), + ", is unknown."); + vectorized_merged_domain_extent *= extent_val.value(); + } + + TORCH_INTERNAL_ASSERT( + vectorized_merged_domain_extent % info.word_size == 0, + "Error vectorizing, ", + info.consumer_tv->toString(), + " as the extent of the indexed domain, ", + vectorized_merged_domain_extent, + ", is not divisible by vector word size ", + info.word_size); +} + +void validateAlignedVectorizedFusionInputOutput( + const IValue& aten_val, + int word_size, + TensorView* tv) { + TORCH_INTERNAL_ASSERT(aten_val.isTensor()); + + const auto& aten_tensor = aten_val.toTensor(); + + TORCH_INTERNAL_ASSERT( + reinterpret_cast(aten_tensor.data_ptr()) % + (word_size * aten_tensor.dtype().itemsize()) == + 0, + "Vectorization of ", + tv->toString(), + " not possible as the memory address is not aligned. ", + "Address: ", + aten_tensor.data_ptr(), + ", vector word size: ", + word_size, + ", data type: ", + aten_tensor.dtype()); + + // Traverse strides from the right-most domains. The rightmost + // domain must have stride 1. + int64_t cur_contig_stride = 1; + bool still_rightmost = true; + for (auto i = aten_tensor.ndimension() - 1; i >= 0; --i) { + const auto stride = aten_tensor.strides().at(i); + // If this domain is contiguous, then not necessary to check the + // stride. Otherwise, stride must be 1 if it's rightmost or + // divisible by word_size. + TORCH_INTERNAL_ASSERT( + stride == cur_contig_stride || (still_rightmost && stride == 1) || + (!still_rightmost && stride % word_size == 0), + "Vectorization of ", + tv->toString(), + " with word size ", + word_size, + " not possible due to invalid stride.", + " Domain: ", + tv->axis(i)->toString(), + ", stride: ", + stride) + // If the domain is size-1, the next domain is still considered + // rightmost. 
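A self-contained sketch of the checks that validateAlignedVectorizedFusionInputOutput above performs on a fusion input or output: the base address must be aligned to word_size * itemsize, and walking strides from the right-most dimension, the innermost non-size-1 dimension must have stride 1 while later non-contiguous dimensions must have strides divisible by the word size. Plain size/stride vectors replace the at::Tensor metadata here:

#include <cstdint>
#include <iostream>
#include <vector>

bool canVectorizeAligned(
    std::uintptr_t data_address,
    const std::vector<int64_t>& sizes,
    const std::vector<int64_t>& strides,
    int64_t word_size,
    int64_t itemsize) {
  // The pointer itself must be aligned for vectorized loads/stores.
  if (data_address % (word_size * itemsize) != 0) {
    return false;
  }
  int64_t cur_contig_stride = 1;
  bool still_rightmost = true;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    const auto stride = strides[i];
    const bool ok = stride == cur_contig_stride ||
        (still_rightmost && stride == 1) ||
        (!still_rightmost && stride % word_size == 0);
    if (!ok) {
      return false;
    }
    // A size-1 dimension does not change which dimension counts as rightmost.
    still_rightmost = still_rightmost && sizes[i] == 1;
    cur_contig_stride = stride * sizes[i];
  }
  return true;
}

int main() {
  // A contiguous [4, 8] float tensor at a 16-byte-aligned address, word size 4.
  std::cout << canVectorizeAligned(0x1000, {4, 8}, {8, 1}, 4, 4) << "\n"; // 1
  // The same shape with a transposed (non-unit innermost) stride fails.
  std::cout << canVectorizeAligned(0x1000, {4, 8}, {1, 4}, 4, 4) << "\n"; // 0
}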
+ const auto size = aten_tensor.sizes().at(i); + still_rightmost = still_rightmost && size == 1; + cur_contig_stride = stride * size; + } +} + +void validateAlignedVectorizedTensors( kir::Kernel* kernel, const at::ArrayRef& inputs, const std::vector& outputs, caching::ExecutorCompileTimeInfoCache* data_cache, kir::ExpressionEvaluator& expr_eval) { - FUSER_PERF_SCOPE("FusionExecutor::validateVectorizedTensors"); - auto tensor_vectorization_validation_entry = executor_utils::caching::ExecutorCompileTimeEntry< executor_utils::caching::VectorizedTensorValidation>( @@ -594,40 +609,51 @@ void validateVectorizedTensors( return executor_utils::getVectorizedTensorValidationInfo(kernel); }); - // Validate all the canVectorizes: - for (auto it : tensor_vectorization_validation_entry.get() - .inp_pos_to_word_size_map_to_verify) { - TORCH_INTERNAL_ASSERT( - canVectorize(inputs[it.first], it.second), - "Error vectorizing, ", - kernel->inputs()[it.first], - " as input provided does not allowed vectorization by word size, ", - it.second); - } + // Verify extents of aligned vectorized tensors + for (const auto& vec_info : kernel->summary().vectorized_set_info) { + auto in_tv = vec_info.producer_tv; + auto out_tv = vec_info.consumer_tv; - if (outputs.size() > 0) { - for (auto it : tensor_vectorization_validation_entry.get() - .out_pos_to_word_size_map_to_verify) { - TORCH_INTERNAL_ASSERT( - canVectorize(outputs[it.first], it.second), - "Error vectorizing, ", - kernel->outputs()[it.first], - " as output provided does not allowed vectorization by word size, ", - it.second); + if (vec_info.vectorized_leaf_id->getParallelType() == + ParallelType::Vectorize) { + validateAlignedVectorizeExtents(vec_info, expr_eval); } } - for (auto it : tensor_vectorization_validation_entry.get() - .intermediate_tv_to_word_size_map_to_verify) { - auto tv = it.first; - auto vec_width = it.second; - TORCH_INTERNAL_ASSERT( - canVectorize(tv, vec_width, expr_eval), - "Error vectorizing, ", - tv->toString(), - " as the extent of the vectorized axis does not allowed vectorization by word size, ", - vec_width); + // Validate input and output tensors with aligend + // vectorization. + for (auto pos : tensor_vectorization_validation_entry.get() + .aligned_vectorized_inp_tensor_pos) { + auto tv = kernel->inputs().at(pos)->as(); + auto word_size = kernel->summary().vectorized_accesses.at(tv); + validateAlignedVectorizedFusionInputOutput(inputs[pos], word_size, tv); + } + + if (!outputs.empty()) { + for (auto pos : tensor_vectorization_validation_entry.get() + .aligned_vectorized_out_tensor_pos) { + auto tv = kernel->outputs().at(pos)->as(); + auto word_size = kernel->summary().vectorized_accesses.at(tv); + validateAlignedVectorizedFusionInputOutput(outputs[pos], word_size, tv); + } } +} + +// Misaligned vectorization check. Currently misaligned vectorization is limited +// to global-register and register-global load/store patterns. However, this +// could be improved to include shared memory. 
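The tensor_vectorization_validation_entry lookup above follows a compile-time caching pattern: the analysis result is produced by a lambda on first use and reused afterwards. A minimal sketch of that pattern with hypothetical names (the real ExecutorCompileTimeEntry is keyed by an entry class rather than by string):

#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

template <typename T>
class CompileTimeCache {
 public:
  // Returns the cached value for `key`, creating it with `maker` if missing.
  const T& getOrCreate(
      const std::string& key,
      const std::function<std::unique_ptr<T>()>& maker) {
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      it = entries_.emplace(key, maker()).first;
    }
    return *it->second;
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<T>> entries_;
};

int main() {
  CompileTimeCache<int> cache;
  int calls = 0;
  auto maker = [&]() {
    ++calls;
    return std::make_unique<int>(7);
  };
  cache.getOrCreate("vectorized_tensor_validation", maker);
  cache.getOrCreate("vectorized_tensor_validation", maker);
  std::cout << "maker invoked " << calls << " time(s)\n"; // 1
}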
+void validateMisalignedVectorizedTensors( + kir::Kernel* kernel, + const at::ArrayRef& inputs, + const std::vector& outputs, + caching::ExecutorCompileTimeInfoCache* data_cache, + kir::ExpressionEvaluator& expr_eval) { + auto tensor_vectorization_validation_entry = + executor_utils::caching::ExecutorCompileTimeEntry< + executor_utils::caching::VectorizedTensorValidation>( + data_cache, [kernel]() { + return executor_utils::getVectorizedTensorValidationInfo(kernel); + }); std::vector inp_misaligned_tensors; std::vector out_misaligned_tensors; @@ -659,6 +685,51 @@ void validateVectorizedTensors( inp_misaligned_tensors, out_misaligned_tensors), "All global tensors must have the same stride for misaligned vectorization."); +} + +// Check if there's any split that is non-divisible and vectorized. If +// found, Vectorize is illegal. +void validateVectorizedSplits( + kir::Kernel* kernel, + kir::ExpressionEvaluator& expr_eval) { + for (const auto& extent_factor : kernel->summary().splits_to_validate) { + auto input_extent = expr_eval.evaluate(extent_factor.first); + auto split_factor = expr_eval.evaluate(extent_factor.second); + TORCH_INTERNAL_ASSERT( + input_extent.has_value(), + "Could not check if a split with vectorization is divisible because the extent, ", + extent_factor.first->toString(), + ", is not possible to evaluate."); + TORCH_INTERNAL_ASSERT( + input_extent.has_value(), + "Could not check if a split with vectorization is divisible because the split factor, ", + extent_factor.second->toString(), + ", is not possible to evaluate."); + TORCH_INTERNAL_ASSERT( + input_extent.value() % split_factor.value() == 0, + "Non-divisible split with vectorization is detected. ", + "Extent: ", + input_extent.value(), + ". Factor: ", + split_factor.value()); + } +} + +} // namespace + +void validateVectorizedTensors( + kir::Kernel* kernel, + const at::ArrayRef& inputs, + const std::vector& outputs, + caching::ExecutorCompileTimeInfoCache* data_cache, + kir::ExpressionEvaluator& expr_eval) { + FUSER_PERF_SCOPE("FusionExecutor::validateVectorizedTensors"); + + validateAlignedVectorizedTensors( + kernel, inputs, outputs, data_cache, expr_eval); + + validateMisalignedVectorizedTensors( + kernel, inputs, outputs, data_cache, expr_eval); validateVectorizedSplits(kernel, expr_eval); } @@ -686,8 +757,8 @@ kir::ExpressionEvaluator bindKernelInputs( i); const auto aten_tensor = aten_inputs[i].toTensor(); - const auto root_domain = - TensorDomain::noReductions(tensor_input->domain()->getRootDomain()); + const auto root_domain = TensorDomain::noReductions( + tensor_input->domain()->getMaybeRFactorDomain()); TORCH_INTERNAL_ASSERT( aten_tensor.ndimension() == static_cast(root_domain.size()), "Something went wrong configuring launch. Inputs no longer match."); @@ -695,6 +766,11 @@ kir::ExpressionEvaluator bindKernelInputs( for (const auto dim : c10::irange(root_domain.size())) { const auto extent = root_domain[dim]->extent(); const auto value = aten_tensor.sizes()[dim]; + if (value == 0 && tensor_input->uses().empty()) { + // If there's no uses, ignore there's a size-0 dimension. 
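bindKernelInputs above walks each input tensor's root domain and binds the symbolic extents to runtime sizes, skipping size-0 dimensions of unused inputs and rejecting inconsistent re-bindings. A simplified, runnable sketch with integer ids standing in for the symbolic extent Vals:

#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <unordered_map>
#include <vector>

using ExtentId = int; // stand-in for a symbolic extent Val*

void bindSizes(
    std::unordered_map<ExtentId, int64_t>& bindings,
    const std::vector<ExtentId>& extents,
    const std::vector<int64_t>& sizes,
    bool input_has_uses) {
  for (size_t dim = 0; dim < extents.size(); ++dim) {
    const int64_t value = sizes[dim];
    if (value == 0 && !input_has_uses) {
      continue; // ignore size-0 dimensions of inputs that are never used
    }
    if (value == 0) {
      throw std::runtime_error("Cannot handle size-0 dimensions");
    }
    auto it = bindings.find(extents[dim]);
    if (it != bindings.end() && it->second != value) {
      throw std::runtime_error("Inconsistent runtime sizes for the same extent");
    }
    bindings[extents[dim]] = value;
  }
}

int main() {
  std::unordered_map<ExtentId, int64_t> bindings;
  bindSizes(bindings, {0, 1}, {128, 64}, true);
  bindSizes(bindings, {1, 2}, {64, 32}, true); // extent 1 rebinds consistently
  std::cout << bindings.at(2) << "\n"; // 32
}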
+ continue; + } + TORCH_INTERNAL_ASSERT(value != 0, "Cannot handle size-0 dimensions"); bool should_bind = true; if (check_consistency) { const auto prev_value = expr_eval.evaluate(extent); @@ -717,7 +793,9 @@ kir::ExpressionEvaluator bindKernelInputs( // NOLINTNEXTLINE: https://bugs.llvm.org/show_bug.cgi?id=48525 } else if (input->isScalar() && input->dtype() == DataType::Int) { TORCH_INTERNAL_ASSERT( - aten_inputs[i].type()->kind() == c10::TypeKind::IntType); + aten_inputs[i].type()->kind() == c10::TypeKind::IntType, + "kernel expected Scalar Int inputs, but found", + aten_inputs[i].type()->str()); expr_eval.bind(input, aten_inputs[i].toInt()); } } @@ -748,14 +826,19 @@ ExpressionEvaluator bindFusionInputs( "Something went wrong configuring launch. Inputs do not match."); auto aten_tensor = aten_inputs[i].toTensor(); - auto root_dom = TensorDomain::noReductions(cg_tensor->getRootDomain()); + auto root_dom = + TensorDomain::noReductions(cg_tensor->getMaybeRFactorDomain()); TORCH_INTERNAL_ASSERT( aten_tensor.ndimension() == (int64_t)root_dom.size(), "Something went wrong configuring launch. Inputs do not match."); - for (const auto dim : c10::irange(root_dom.size())) { const auto extent = root_dom[dim]->extent(); const auto value = aten_tensor.sizes()[dim]; + if (value == 0 && cg_tensor->uses().empty()) { + // If there's no uses, ignore there's a size-0 dimension. + continue; + } + TORCH_INTERNAL_ASSERT(value != 0, "Cannot handle size-0 dimensions"); const auto prev_value = evaluator.evaluate(extent); if (prev_value.has_value()) { TORCH_CHECK( @@ -774,7 +857,9 @@ ExpressionEvaluator bindFusionInputs( inputs[i]->getValType().value() == ValType::Scalar && inputs[i]->getDataType().value() == DataType::Int) { TORCH_INTERNAL_ASSERT( - aten_inputs[i].type()->kind() == c10::TypeKind::IntType); + aten_inputs[i].type()->kind() == c10::TypeKind::IntType, + "fusion expected Scalar Int inputs, but found", + aten_inputs[i].type()->str()); evaluator.bind(inputs[i], aten_inputs[i].toInt()); } } diff --git a/torch/csrc/jit/codegen/cuda/executor_utils.h b/torch/csrc/jit/codegen/cuda/executor_utils.h index 93deec6343f1fb..eb73643ed8d895 100644 --- a/torch/csrc/jit/codegen/cuda/executor_utils.h +++ b/torch/csrc/jit/codegen/cuda/executor_utils.h @@ -147,15 +147,18 @@ class WarpPaddedParallelExtents { //! VectorizedTensorInfo: //! Auxiliary data type for entry class VectorizedTensorValidation struct VectorizedTensorInfo { + //! Aligned vectorized fusion inputs + std::vector aligned_vectorized_inp_tensor_pos; + //! Aligned vectorized fusion outputs + std::vector aligned_vectorized_out_tensor_pos; + //! Misaligned vectorized input tensors std::unordered_set global_inp_misaligned_tv; + //! Misaligned vectorized output tensors std::unordered_set global_out_misaligned_tv; - std::unordered_map tv_to_vector_word_size; + //! Positions of misaligned input tensors std::vector inp_misaligned_tensors_pos; + //! Positions of misaligned output tensors std::vector out_misaligned_tensors_pos; - std::unordered_map inp_pos_to_word_size_map_to_verify; - std::unordered_map out_pos_to_word_size_map_to_verify; - std::unordered_map - intermediate_tv_to_word_size_map_to_verify; }; //! 
Compile-time info to be cached in each FusionExecutor: diff --git a/torch/csrc/jit/codegen/cuda/fusion.cpp b/torch/csrc/jit/codegen/cuda/fusion.cpp index be686c0d9439ab..f96237ee9d6874 100644 --- a/torch/csrc/jit/codegen/cuda/fusion.cpp +++ b/torch/csrc/jit/codegen/cuda/fusion.cpp @@ -176,9 +176,17 @@ void Fusion::removeVal(Val* val) { void Fusion::addInput(Val* input) { assertInContainer(input, "Cannot register input "); + TORCH_INTERNAL_ASSERT( + input->getDataType() != DataType::Index, + "Data type Index is a local compile time data type only, it cannot be used as an input in case it was generated from another kernel."); + if (input->getValType().value() == ValType::TensorView) { auto tv = input->as(); tv->setMemoryType(MemoryType::Global); + } else if (input->getValType().value() == ValType::Scalar) { + TORCH_CHECK( + !input->isConst(), + "Immediate scalar value cannot be added as an input. It is not necessary to pass it as an input."); } inputs_.push_back(input); @@ -188,6 +196,19 @@ void Fusion::addInput(Val* input) { } void Fusion::addOutput(Val* output) { + // We currently don't support explicitly outputing aliased inputs. This is + // because they are already marked as output for in-place update. It's tricky + // to allow marking them explicitly as real output, since that requires us to + // register/identify output not only by `Val*` pointer, but also by indices; + // it also requires us to magically arrange `outputs_` entries in proper order + // ^^^ this doesn't look intuitive on `outputs_` in fusion. + // I think we can solve this by marking addOutput on io_alias_ keys after + // fusion is fully defined. Tracking this in #1488 + // Apparently we can't do this neither at the time. I think segmentation + // unfortunately would call addOutput after we marked io_alias_ map. + // TORCH_CHECK(io_alias_.count(output) == 0, + // "can't register aliased output as real output"); + assertInContainer(output, "Cannot register output "); if (output->getValType().value() == ValType::TensorView) { auto tv = output->as(); @@ -304,13 +325,13 @@ void Fusion::print() { std::cout << "}\n\n"; } -void Fusion::printKernel() { +void Fusion::printKernel(DataType index_type) { FUSER_PERF_SCOPE("Fusion::printKernel"); TORCH_INTERNAL_ASSERT( !this->isA(), "Cannot \"print kernel\" of a kernel container. ", "This would require lowering during lowering."); - std::cout << codegen::generateCudaKernel(GpuLower(this).kernel()); + std::cout << codegen::generateCudaKernel(GpuLower(this, index_type).kernel()); } void Fusion::printMath(bool from_outputs_only) { @@ -567,6 +588,33 @@ bool Fusion::isAliasCompatible(Val* left, Val* right) { } void Fusion::aliasOutputToInput(Val* output, Val* input) { + // Because we could cast output when input is casted. 
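The new checks in Fusion::addInput above reject two kinds of values: anything typed as the compile-time-only Index data type, and scalars that are already immediate constants. A small sketch with stand-in Val/DataType types, not the real Fusion API:

#include <iostream>
#include <stdexcept>
#include <vector>

enum class DataType { Int, Index, Double };

struct Val {
  DataType dtype;
  bool is_scalar = false;
  bool is_const = false;
};

void addInput(std::vector<Val*>& inputs, Val* input) {
  if (input->dtype == DataType::Index) {
    throw std::invalid_argument(
        "Data type Index is compile-time only and cannot be a fusion input");
  }
  if (input->is_scalar && input->is_const) {
    throw std::invalid_argument(
        "Immediate scalar values do not need to be passed as inputs");
  }
  inputs.push_back(input);
}

int main() {
  std::vector<Val*> inputs;
  Val runtime_scalar{DataType::Double, /*is_scalar=*/true, /*is_const=*/false};
  addInput(inputs, &runtime_scalar); // fine: value is bound at runtime
  Val immediate{DataType::Double, /*is_scalar=*/true, /*is_const=*/true};
  try {
    addInput(inputs, &immediate);
  } catch (const std::invalid_argument& e) {
    std::cout << "rejected: " << e.what() << "\n";
  }
}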
+ TORCH_INTERNAL_ASSERT( + !output->isFusionOutput(), + "Do NOT add aliased output to fusion output outside of `aliasOutputToInput"); + + if (!input->isFusionInput()) { + auto input_expr = input->definition(); + // TORCH_INTERNAL_ASSERT(input_def.etype() == ExprType::UnaryOp, "expected + // unary op for aliased input"); + TORCH_INTERNAL_ASSERT( + input_expr->isA(), "expected unary op for aliased input"); + auto input_uop = input_expr->as(); + TORCH_INTERNAL_ASSERT( + input_uop->getUnaryOpType() == UnaryOpType::Cast, + "expected aliased input to be output of cast op"); + input = input_uop->in(); + } + TORCH_INTERNAL_ASSERT( + input->getDataType().has_value() && output->getDataType().has_value(), + "requires DataType to be available for aliased output to input"); + + if (input->getDataType().value() != output->getDataType().value()) { + output = castOp(input->getDataType().value(), output); + } + // TODO: output should be marked at the end of fusion definition #1488 + addOutput(output); + TORCH_INTERNAL_ASSERT( isAliasCompatible(input, output), "The input and output values are not alias-compatible."); diff --git a/torch/csrc/jit/codegen/cuda/fusion.h b/torch/csrc/jit/codegen/cuda/fusion.h index 2e76e00896b5f3..e67b287288f908 100644 --- a/torch/csrc/jit/codegen/cuda/fusion.h +++ b/torch/csrc/jit/codegen/cuda/fusion.h @@ -135,7 +135,7 @@ class TORCH_CUDA_CU_API Fusion : public IrContainer { void printTransforms(); //! Lower the fusion and print a kernel - void printKernel(); + void printKernel(DataType index_type = DataType::Int); //! Return a list of topologically sorted expressions. This only includes //! exprs required to genereate registered outputs. diff --git a/torch/csrc/jit/codegen/cuda/fusion_segmenter.cpp b/torch/csrc/jit/codegen/cuda/fusion_segmenter.cpp index 0e74ce172f9161..bec8f6e99ea361 100644 --- a/torch/csrc/jit/codegen/cuda/fusion_segmenter.cpp +++ b/torch/csrc/jit/codegen/cuda/fusion_segmenter.cpp @@ -1170,14 +1170,24 @@ std::unique_ptr SegmentedFusion::makeFusion(SegmentedGroup* sg) { fusion_segment->removeOutput(out); } + std::vector view_tvs; for (auto inp : getAllInputs(sg)) { - fusion_segment->addInput(complete_to_segment_map.clone(inp)); + auto clone_tv = complete_to_segment_map.clone(inp); + fusion_segment->addInput(clone_tv); + if (inp->isDefinitionType(ExprType::ViewOp)) { + TORCH_INTERNAL_ASSERT(clone_tv != nullptr && clone_tv->isA()); + view_tvs.push_back(clone_tv->as()); + } } for (auto out : getAllOutputs(sg)) { fusion_segment->addOutput(complete_to_segment_map.clone(out)); } + for (auto tv : view_tvs) { + tv->convertRfactorToRootDomain(); + } + return fusion_segment; } @@ -2715,8 +2725,8 @@ void SegmentCandidateFinder::findSegments() { } } - auto reduction_ops = - ir_utils::getReductionOps(segmented_fusion_->completeFusion()); + auto reduction_ops = ir_utils::getReductionOps( + segmented_fusion_->completeFusion(), true /* ignore_trivial */); auto welford_ops = ir_utils::filterByType(reduction_ops); if (options_.run_translate_welford && @@ -2798,12 +2808,12 @@ void SegmentCandidateFinder::findSegments() { if (options_.run_final_merge) { // TODO: consider interleaving herrmman merge and bruteforce merge, as - // bruteforce merge can introduce - // opportunities for more herrmann merge + // bruteforce merge can introduce opportunities for more herrmann merge finalMerge(); } finalize(); + if (isDebugDumpEnabled(DebugDumpOption::FusionSegmentsDrawing)) { segmented_fusion_->draw(); } @@ -3012,6 +3022,7 @@ void SegmentCandidateFinder::finalize() { // Finalize each 
group, fill in the missing inputs, i.e. tensor dims. for (auto g : groups()) { + g->setHeuristic(deriveHeuristic(g)); g->finalize(); } } diff --git a/torch/csrc/jit/codegen/cuda/fusion_segmenter.h b/torch/csrc/jit/codegen/cuda/fusion_segmenter.h index 63124839fc1e1d..6e8b15cb67b851 100644 --- a/torch/csrc/jit/codegen/cuda/fusion_segmenter.h +++ b/torch/csrc/jit/codegen/cuda/fusion_segmenter.h @@ -129,7 +129,7 @@ class TORCH_CUDA_CU_API SegmentedGroup { int group_id_ = -1; //! The scheduler to use for compiling this group - ScheduleHeuristic heuristic_ = ScheduleHeuristic::PointWise; + ScheduleHeuristic heuristic_ = ScheduleHeuristic::None; //! Exprs that make up the group std::vector exprs_; diff --git a/torch/csrc/jit/codegen/cuda/graph_fuser.cpp b/torch/csrc/jit/codegen/cuda/graph_fuser.cpp index dee3fa50fb4051..92ad8ce80fbf8c 100644 --- a/torch/csrc/jit/codegen/cuda/graph_fuser.cpp +++ b/torch/csrc/jit/codegen/cuda/graph_fuser.cpp @@ -945,7 +945,11 @@ struct CudaGraphFuser { // extended shape expression support to reduction operations // TODO: `aten::sum` is too flexible, we should restrict for a better // match - if (n->kind() == aten::sum) { + // TODO: Add python tests where we check for existing ops and their + // shape expression logic. + static std::unordered_set reduction_ops( + {aten::sum, aten::mean, aten::var, aten::std}); + if (reduction_ops.find(n->kind()) != reduction_ops.end()) { // TODO: expand support to wire non-constant inputs, this is currently // blocked by profiling executor not capable of profiling scalar inputs. TORCH_INTERNAL_ASSERT( @@ -1102,7 +1106,8 @@ struct CudaGraphFuser { // TODO: failure in buildShapeExpressions should not break fusion execution, // we can add a try/catch here to bailout from removeOutputsUsedOnlyInSize. 
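The shape-expression change in graph_fuser.cpp above replaces the single aten::sum check with membership in a static set of reduction ops. A tiny sketch of that membership test, using strings in place of torch::jit Symbols:

#include <iostream>
#include <string>
#include <unordered_set>

bool isShapeExpressionReduction(const std::string& kind) {
  static const std::unordered_set<std::string> reduction_ops{
      "aten::sum", "aten::mean", "aten::var", "aten::std"};
  return reduction_ops.count(kind) != 0;
}

int main() {
  std::cout << isShapeExpressionReduction("aten::mean") << "\n"; // 1
  std::cout << isShapeExpressionReduction("aten::relu") << "\n"; // 0
}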
GRAPH_DEBUG("before build shape expression: ", *graph_); - fusion_value_to_runtime_shape_ = buildShapeExpressions(fusion_group); + auto shape_map = buildShapeExpressions(fusion_group); + fusion_value_to_runtime_shape_.insert(shape_map.begin(), shape_map.end()); GRAPH_DEBUG("after build shape expression: ", *graph_); auto outputs = fusion_group->outputs().vec(); @@ -1113,14 +1118,12 @@ struct CudaGraphFuser { for (int64_t i = static_cast(outputs.size()) - 1; i >= 0; --i) { auto output = outputs[i]; auto soutput = soutputs[i]; - if (usedOnlyInDtypeAndSize(output) && - fusion_value_to_runtime_shape_.count(soutput) > 0) { + if (usedOnlyInDtypeAndSize(output) && shape_map.count(soutput) > 0) { bool has_dtype = usedInDtype(output); auto uses = output->uses(); for (Use u : uses) { if (u.user->matches("aten::size(Tensor self) -> int[]")) { - u.user->output()->replaceAllUsesWith( - fusion_value_to_runtime_shape_.at(soutput)); + u.user->output()->replaceAllUsesWith(shape_map.at(soutput)); u.user->destroy(); } else if (u.user->matches("prim::dtype(Tensor a) -> int")) { continue; @@ -1210,7 +1213,12 @@ struct CudaGraphFuser { for (Node* node : block_->nodes()) { for (Block* sub_block : node->blocks()) { - CudaGraphFuser(sub_block, graph_).run(); + CudaGraphFuser sub_block_cfg(sub_block, graph_); + sub_block_cfg.run(); + // Accumulate runtime shapes for all sub-blocks + fusion_value_to_runtime_shape_.insert( + sub_block_cfg.fusion_value_to_runtime_shape_.begin(), + sub_block_cfg.fusion_value_to_runtime_shape_.end()); } } } @@ -1605,17 +1613,19 @@ void guardFusionGroup( // TODO: Add support for dynamic split to view guard // Path from profile-ivalue to prim::view_copy operation - // profile-ivalue -> Uses: [Constant, CudaFusionGroup] + // profile-ivalue -> Constant -> CudaFusionGroup // Get argument position in CudaFusionGroup // Get argument in subgraph for CudaFusionGroup // CudaFusionGroup argument -> Constant List -> prim::view_copy - auto cuda_fusion_group_arg = profiled_ival->uses().back().offset; - auto subgraph_arg = fusion_graph->inputs()[cuda_fusion_group_arg]; + auto subgraph_arg = fusion_graph->inputs()[offset]; auto constant = subgraph_arg->uses().front().user->output(); + + TORCH_INTERNAL_ASSERT(!constant->uses().empty()); auto view = constant->uses().front().user; TORCH_INTERNAL_ASSERT( view->kind() == prim::view_copy || view->kind() == prim::reshape_copy); + ivalue_check = guardView( fusion, fusion_value_to_runtime_size, @@ -1710,11 +1720,15 @@ void guardFusionGroups( // c. 
restore conditional constant to non-constant for fallback guardFusionGroup(fusion, fusion_value_to_runtime_size); } +} - if (GRAPH_DEBUG_ENABLED) { - GRAPH_DEBUG("Exporting all NVFuser fusions:"); - for (Node* fusion : fusions) { - GRAPH_EXPORT("", fusion->g(attr::Subgraph)); +void dumpFusionGroups(std::shared_ptr& g) { + DepthFirstGraphNodeIterator it(g); + Node* n = nullptr; + GRAPH_DEBUG("Exporting all NVFuser fusions:"); + while ((n = it.next()) != nullptr) { + if (n->kind() == prim::FallbackGraph) { + GRAPH_EXPORT("", n->g(attr::Subgraph)); } } } @@ -2009,23 +2023,6 @@ void ExtractProfileIValue(Node* profile_ivalue) { } } -void traverseProfileIValues( - Block* block, - const std::function& func) { - std::vector profile_ivalues; - for (Node* n : block->nodes()) { - for (Block* b : n->blocks()) { - traverseProfileIValues(b, func); - } - if (n->kind() == prim::profile_ivalue) { - profile_ivalues.push_back(n); - } - } - for (Node* profile_ivalue : profile_ivalues) { - func(profile_ivalue); - } -} - // break `linear` layer into `matmul` and `add_optional`. This allows us to fuse // the binary operation without supporting gemm. // Note that we are not breaking `linear` layer without bias. @@ -2086,48 +2083,58 @@ void decomposeLinearOps(Block* block) { // Replace 'operation' with 'operation_copy' to guard alias operations. // Supports View, Reshape, Squeeze, and Unsqueeze void replaceAliasOpsWithCopy(std::shared_ptr& graph, Block* block) { - static std::unordered_map op_mapping( - {{aten::view, prim::view_copy}, + static std::unordered_map alias_to_copy_mapping( + // TODO: revert disabled aten::view + {// {aten::view, prim::view_copy}, {aten::reshape, prim::reshape_copy}, {aten::squeeze, prim::squeeze_copy}, {aten::unsqueeze, prim::unsqueeze_copy}}); - std::vector maybe_alias_nodes; + std::vector maybe_safe_alias_nodes; for (Node* n : block->nodes()) { for (Block* b : n->blocks()) { replaceAliasOpsWithCopy(graph, b); } - if (op_mapping.find(n->kind()) != op_mapping.end()) { - maybe_alias_nodes.push_back(n); + if (alias_to_copy_mapping.find(n->kind()) != alias_to_copy_mapping.end()) { + maybe_safe_alias_nodes.push_back(n); } } auto alias_db = std::make_unique(graph); - for (Node* n : maybe_alias_nodes) { - if (!alias_db->safeToChangeAliasingRelationship( - n->input(0), n->output(0))) { - continue; - } + auto safeToChangeAliasToCopy = [&alias_db](Node* n) { + return !alias_db->hasWriters(n->input(0)) && + !alias_db->hasWriters(n->output(0)); + }; + + auto replaceAliasWithCopy = [&graph, &alias_db](Node* n) { WithInsertPoint guard(n); - auto op_copy = - graph->insertNode(graph->create(op_mapping[n->kind()], n->inputs(), 1)); - op_copy->output()->setType(n->output(0)->type()); + auto copy_op = graph->insertNode( + graph->create(alias_to_copy_mapping[n->kind()], n->inputs(), 1)); + copy_op->output()->setType(n->output(0)->type()); // adding newly created value into alias_db; - alias_db->createValue(op_copy->output()); + alias_db->createValue(copy_op->output()); - n->output()->replaceAllUsesWith(op_copy->output()); + n->output()->replaceAllUsesWith(copy_op->output()); n->destroy(); + }; + + for (Node* n : maybe_safe_alias_nodes) { + if (!safeToChangeAliasToCopy(n)) { + continue; + } + replaceAliasWithCopy(n); } } -// Revert all 'op_copy' with 'op' except in CudaFusionGroup +// Revert all 'operation_copy' with 'operation' except in CudaFusionGroup // e.g., Any non-fused alias operation including within the prim::FallbackGraph // Supports View, Reshape, Squeeze, and Unsqueeze void 
revertAliasCopyOps(std::shared_ptr& graph, Block* block) { - static std::unordered_map op_mapping( - {{prim::view_copy, aten::view}, + static std::unordered_map copy_to_alias_mapping( + // TODO: revert disabled aten::view + {// {prim::view_copy, aten::view}, {prim::reshape_copy, aten::reshape}, {prim::squeeze_copy, aten::squeeze}, {prim::unsqueeze_copy, aten::unsqueeze}}); @@ -2147,18 +2154,22 @@ void revertAliasCopyOps(std::shared_ptr& graph, Block* block) { revertAliasCopyOps(graph, b); } // Revert any non-fused alias copy ops - if (op_mapping.find(n->kind()) != op_mapping.end()) { + if (copy_to_alias_mapping.find(n->kind()) != copy_to_alias_mapping.end()) { alias_copy_ops.push_back(n); } } - for (Node* n : alias_copy_ops) { + auto replaceCopyWithAlias = [&graph](Node* n) { WithInsertPoint guard(n); - auto reverted_op = - graph->insertNode(graph->create(op_mapping[n->kind()], n->inputs(), 1)); - reverted_op->output()->setType(n->output(0)->type()); - n->output()->replaceAllUsesWith(reverted_op->output()); + auto alias_op = graph->insertNode( + graph->create(copy_to_alias_mapping[n->kind()], n->inputs(), 1)); + alias_op->output()->setType(n->output(0)->type()); + n->output()->replaceAllUsesWith(alias_op->output()); n->destroy(); + }; + + for (Node* n : alias_copy_ops) { + replaceCopyWithAlias(n); } } @@ -2242,6 +2253,67 @@ bool removeInplaceOperations(const std::shared_ptr& graph) { graph, [&](Node* node) { return inplace_ops.count(node->kind()) != 0; }); } +// Recursively traverse blocks, gather all nodes with given symbol, +// and then apply mutator function. +void mutateNode( + Block* block, + Symbol symbol, + const std::function& func) { + // Recursively call mutateNode on blocks + // Gather all nodes with given symbol + std::vector nodes; + for (Node* n : block->nodes()) { + for (Block* b : n->blocks()) { + mutateNode(b, symbol, func); + } + if (n->kind() == symbol) { + nodes.push_back(n); + } + } + + // Apply mutator funcion to every node + for (Node* n : nodes) { + func(n); + } +} + +// For the given CudaFusionGroup, separate nested views and remove any unused, +// intermediate views +void separateNestedViews(Node* cuda_fusion_group) { + TORCH_INTERNAL_ASSERT(cuda_fusion_group->kind() == prim::CudaFusionGroup); + + auto isView = [](Node* node) { + static std::unordered_set alias_op_set( + {prim::view_copy, prim::reshape_copy}); + return alias_op_set.find(node->kind()) != alias_op_set.end(); + }; + + // node -> input / output values + auto isNestedView = [&isView](Node* node) { + return isView(node) && isView(node->input(0)->node()); + }; + + auto subgraph = cuda_fusion_group->g(attr::Subgraph); + for (auto node : subgraph->block()->nodes()) { + if (isNestedView(node)) { + // grandparent -> (view / reshape) parent -> (view / reshape) node + auto parent_value = node->input(0); + auto parent = parent_value->node(); + + auto grandparent_value = parent->input(0); + auto grandparent = grandparent_value->node(); + + // Before: gp -> x -> n + // After: gp -> x / gp -> n + // Delete x if no more uses + node->replaceInputWith(parent_value, grandparent_value); + if (!parent->hasUses()) { + parent->destroy(); + } + } + } +} + } // anonymous namespace void CudaFuseGraph(std::shared_ptr& graph) { @@ -2252,7 +2324,7 @@ void CudaFuseGraph(std::shared_ptr& graph) { // I don't know how to store edge/node in attribute. 
so let's abuse data flow // dependency and add inputs to conditional constant generated by // aten::profile_ivalue - traverseProfileIValues(graph->block(), ExtractProfileIValue); + mutateNode(graph->block(), prim::profile_ivalue, ExtractProfileIValue); GRAPH_DEBUG("insert conditional constant from profile_ivalue: ", *graph); // TODO: we need to properly restore shape information after fusion. @@ -2292,7 +2364,7 @@ void CudaFuseGraph(std::shared_ptr& graph) { alterBatchNormImpls(graph->block()); GRAPH_DEBUG("After _batch_norm_impl_index: ", *graph); - traverseProfileIValues(graph->block(), RemoveProfileIValue); + mutateNode(graph->block(), prim::profile_ivalue, RemoveProfileIValue); GRAPH_DEBUG("Before remove missing profiling: ", *graph); removeFusionWithMissingProfilingInformation(graph->block()); @@ -2302,9 +2374,15 @@ void CudaFuseGraph(std::shared_ptr& graph) { removeOutputUsedOnlyInDtype(graph->block()); GRAPH_DEBUG("After removeOutputUsedOnlyInDtype: ", *graph); + mutateNode(graph->block(), prim::CudaFusionGroup, separateNestedViews); + GRAPH_DEBUG( + "separate nested and delete redundant views in CudaFusionGroup:", *graph); + revertAliasCopyOps(graph, graph->block()); GRAPH_DEBUG("revert alias_copy ops by nvfuser: ", *graph); + dumpFusionGroups(graph); + // After FuseGraph some common subexpressions may come back EliminateCommonSubexpression(graph); // We might have emitted a fair amount of useless shape propagating code, so diff --git a/torch/csrc/jit/codegen/cuda/index_compute.cpp b/torch/csrc/jit/codegen/cuda/index_compute.cpp index 8e151372b7558f..16cc960791c678 100644 --- a/torch/csrc/jit/codegen/cuda/index_compute.cpp +++ b/torch/csrc/jit/codegen/cuda/index_compute.cpp @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -27,203 +28,6 @@ namespace cuda { namespace { -// A merge is contiguous if: -// Inputs of outer are to the left in the root domain of the inputs of RHS. -// All inputs are contiguous in the root domain: -// - All marked as contiguous -// - Only gaps between inputs are broadcast or reductoin dims -// There are no split transformations performed on outer or inner -// All transformations on outer or inner are contiguous merges -// If this criteria holds, then we can index the input root domains of this -// merge with the indexing provided to the output of the merge in the backward -// index pass - -class ContigIDs : public OptInDispatch { - private: - using OptInDispatch::handle; - - // Mark if ids are result of contigous merges - std::unordered_set contig_ids; - // Given contiguous domain, return all iter domains within its history. - std::unordered_map> - within_contig_ids; - const std::vector& root_domain_; - const std::vector& root_contiguity_; - std::unordered_map is_contig_root; - - bool inRoot(const std::vector& ids) { - return std::all_of(ids.begin(), ids.end(), [this](IterDomain* id) { - return is_contig_root.find(id) != is_contig_root.end(); - }); - } - - bool isContig(IterDomain* id) { - return contig_ids.find(id) != contig_ids.end(); - } - - // Split outputs are not contiguous, don't need to do anything. - void handle(Split*) override {} - - void handle(Merge* merge) override { - // If either input is non-contiguous so is output. - const auto inner = merge->inner(); - const auto outer = merge->outer(); - - if (!isContig(inner) || !isContig(outer)) { - return; - } - - // Grab inputs, make sure they're in root domain, check if they're - // contiguous. 
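mutateNode, used above in place of traverseProfileIValues, first collects every matching node across nested blocks and only then applies the mutator, so mutation cannot invalidate the traversal. A hand-rolled sketch with minimal Node/Block types rather than torch::jit's:

#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Node;
struct Block {
  std::vector<Node*> nodes;
};
struct Node {
  std::string kind;
  std::vector<Block*> blocks; // nested blocks, e.g. if/loop bodies
};

void mutateNode(
    Block* block,
    const std::string& kind,
    const std::function<void(Node*)>& func) {
  std::vector<Node*> matches;
  for (Node* n : block->nodes) {
    for (Block* b : n->blocks) {
      mutateNode(b, kind, func);
    }
    if (n->kind == kind) {
      matches.push_back(n);
    }
  }
  // Apply the mutator only after the block has been fully walked.
  for (Node* n : matches) {
    func(n);
  }
}

int main() {
  Node inner{"prim::profile_ivalue", {}};
  Block nested{{&inner}};
  Node outer{"prim::If", {&nested}};
  Node top{"prim::profile_ivalue", {}};
  Block graph{{&outer, &top}};
  int count = 0;
  mutateNode(&graph, "prim::profile_ivalue", [&](Node*) { ++count; });
  std::cout << count << "\n"; // 2
}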
- - auto lhs_inputs = - ir_utils::iterDomainInputsOfOrderedAs({outer}, root_domain_); - auto rhs_inputs = - ir_utils::iterDomainInputsOfOrderedAs({inner}, root_domain_); - - TORCH_INTERNAL_ASSERT( - inRoot(lhs_inputs) && inRoot(rhs_inputs), - "Found an invalid merge operation, inputs of its arguments are not in the root domain."); - - std::deque ordered_inputs( - lhs_inputs.begin(), lhs_inputs.end()); - ordered_inputs.insert( - ordered_inputs.end(), rhs_inputs.begin(), rhs_inputs.end()); - - // If any root input is not contig, output is not contig - if (!(std::all_of( - ordered_inputs.begin(), - ordered_inputs.end(), - [this](IterDomain* id) { - return is_contig_root.at(id) && !id->isBroadcast() && - !id->isReduction(); - }))) { - return; - } - - std::deque root_copy(root_domain_.begin(), root_domain_.end()); - - // Forward to first matching argument - while (!root_copy.empty() && !ordered_inputs.empty()) { - if (root_copy.front() != ordered_inputs.front()) { - root_copy.pop_front(); - } else { - break; - } - } - - // Forward through all matching arguments - while (!root_copy.empty() && !ordered_inputs.empty()) { - if (root_copy.front() == ordered_inputs.front()) { - root_copy.pop_front(); - ordered_inputs.pop_front(); - // This is no longer causing an error in: - // ReductionSchedulerMultiDimNonFastest TODO: test reenablement to make - // sure it does what's expected - // } else if ( - // root_copy.front()->isReduction() || - // root_copy.front()->isBroadcast()) { - // root_copy.pop_front(); - } else { - break; - } - } - - // If we matched all inputs, the output is contiguous. Only want to keep the - // top contig ID, lower ids should be placed in the "within_contig_ids" map - // of top id. - auto out = merge->out()->as(); - if (ordered_inputs.empty()) { - if (contig_ids.find(inner) != contig_ids.end()) { - contig_ids.erase(inner); - } - - if (contig_ids.find(outer) != contig_ids.end()) { - contig_ids.erase(outer); - } - - contig_ids.emplace(out); - - std::unordered_set within_out; - within_out.emplace(inner); - if (within_contig_ids.find(inner) != within_contig_ids.end()) { - auto in_inner = within_contig_ids.at(inner); - within_out.insert(in_inner.begin(), in_inner.end()); - within_contig_ids.erase(inner); - } - - within_out.emplace(outer); - if (within_contig_ids.find(outer) != within_contig_ids.end()) { - auto in_outer = within_contig_ids.at(outer); - within_out.insert(in_outer.begin(), in_outer.end()); - within_contig_ids.erase(outer); - } - - within_contig_ids[out] = within_out; - } - } - - public: - ContigIDs() = delete; - - // Check through the history of ids whose inputs map to root_domain with - // contiguity root_contiguity. Return unordered_set of all merges that are - // contiguous. Ignore root order is primarily used for predicate generation. - // In this case we can linearize indexing of any ID that only consists of - // merge operations. - ContigIDs( - const std::vector& ids, - const std::vector& root_domain, - const std::vector& root_contiguity) - : root_domain_(root_domain), root_contiguity_(root_contiguity) { - if (ids.empty()) { - return; - } - - TORCH_INTERNAL_ASSERT( - root_domain_.size() == root_contiguity_.size(), - "Arguments don't match ", - root_domain_.size(), - " != ", - root_contiguity_.size()); - - for (const auto i : c10::irange(root_domain_.size())) { - // If a root domain has halo, can't use merged domain even if - // both inputs are contiguous. HaloInfo is also initialized for - // rfactor root domains, which should just return "zero" - // RootAxisInfo. 
This should be safe as no rfactor tensor should - // need halo. - if (root_contiguity_[i] && - !GpuLower::current() - ->haloInfo() - .getRootAxisInfo(root_domain_[i]) - .hasHalo()) { - auto root_domain_i = root_domain_[i]->as(); - contig_ids.emplace(root_domain_i); - within_contig_ids[root_domain_i] = std::unordered_set(); - is_contig_root[root_domain_[i]] = true; - } else { - is_contig_root[root_domain_[i]] = false; - } - } - - auto exprs = StmtSort::getExprs(ids[0]->fusion(), {ids.begin(), ids.end()}); - - for (auto expr : exprs) { - handle(expr); - } - } - - const std::unordered_set contigIDs() const { - return contig_ids; - } - - const std::unordered_map> - withinContigIDs() const { - return within_contig_ids; - } -}; - // Update the HaloInfo mappings for a reference tensor by propagating // the halo information from the consumer tensor. void updateHaloInfoForReference( @@ -395,7 +199,7 @@ Val* getProducerOffsetWithGather( // producer offset: window_index - padding auto producer_offset = SimplifyingIrBuilder::subExpr( - window_idx, IrBuilder::create(pad_width)); + window_idx, SimplifyingIrBuilder::create(pad_width)); return producer_offset; } @@ -496,14 +300,14 @@ Val* getProducerIndexWithPartialSplit( if (consumer_offset->isZeroInt()) { return producer_index; } else { - return IrBuilder::addExpr(producer_index, consumer_offset); + return SimplifyingIrBuilder::addExpr(producer_index, consumer_offset); } } // Non-global case. Difference of the split offsets must be // accounted. - auto diff = IrBuilder::subExpr(consumer_offset, producer_offset); + auto diff = SimplifyingIrBuilder::subExpr(consumer_offset, producer_offset); kir::ExpressionEvaluator ee; auto diff_eval = ee.evaluate(diff); // We currently only allow constant offsetting @@ -513,8 +317,8 @@ Val* getProducerIndexWithPartialSplit( return producer_index; } - return IrBuilder::addExpr( - producer_index, IrBuilder::create(diff_eval.value())); + return SimplifyingIrBuilder::addExpr( + producer_index, SimplifyingIrBuilder::create(diff_eval.value())); } } // namespace @@ -564,13 +368,14 @@ void IndexCompute::handle(Split* split) { index_map_[in_id] = outer_ind; extent_map_[in_id] = getExtent(outer_id); } else { - index_map_[in_id] = IrBuilder::addExpr( - IrBuilder::mulExpr(outer_ind, getExtent(inner_id)), inner_ind); + index_map_[in_id] = SimplifyingIrBuilder::addExpr( + SimplifyingIrBuilder::mulExpr(outer_ind, getExtent(inner_id)), + inner_ind); // The extent should be updated only when its allocation is // partial, i.e., zero_merged_in is true. See PR #1270. 
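The index expressions built just above through SimplifyingIrBuilder encode simple integer arithmetic: a split's input index is reconstructed as outer * inner_extent + inner, and a merge's output index decomposes back into a quotient and a remainder. A plain-integer sketch of that round trip:

#include <cassert>
#include <cstdint>
#include <utility>

int64_t splitInputIndex(int64_t outer_idx, int64_t inner_idx, int64_t inner_extent) {
  return outer_idx * inner_extent + inner_idx;
}

std::pair<int64_t, int64_t> mergeOutputIndex(int64_t out_idx, int64_t inner_extent) {
  return {out_idx / inner_extent, out_idx % inner_extent};
}

int main() {
  // Split a 12-element domain by a factor of 4: element 7 lives at (1, 3).
  assert(splitInputIndex(1, 3, 4) == 7);
  // Decomposing the flat index recovers the same (outer, inner) pair.
  auto decomposed = mergeOutputIndex(7, 4);
  assert(decomposed.first == 1 && decomposed.second == 3);
  return 0;
}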
if (zero_merged_in) { - extent_map_[in_id] = - IrBuilder::mulExpr(getExtent(outer_id), getExtent(inner_id)); + extent_map_[in_id] = SimplifyingIrBuilder::mulExpr( + getExtent(outer_id), getExtent(inner_id)); } } } @@ -679,8 +484,8 @@ void IndexCompute::handle(Merge* merge) { zero_merged_in_.emplace(inner_id); zero_merged_in_.emplace(outer_id); } else { - index_map_[outer_id] = IrBuilder::divExpr(out_ind, inner_extent); - index_map_[inner_id] = IrBuilder::modExpr(out_ind, inner_extent); + index_map_[outer_id] = SimplifyingIrBuilder::divExpr(out_ind, inner_extent); + index_map_[inner_id] = SimplifyingIrBuilder::modExpr(out_ind, inner_extent); } } @@ -724,7 +529,8 @@ IndexCompute::IndexCompute( ContigIDs contig_finder( td_->domain(), td_->getMaybeRFactorDomain(), root_contiguity); contig_ids = contig_finder.contigIDs(); - auto within_contig = contig_finder.withinContigIDs(); + root_to_contig_id_ = contig_finder.rootToIndexedID(); + const auto& within_contig = contig_finder.withinContigIDs(); for (auto contig_id : contig_ids) { if (index_map_.find(contig_id) != index_map_.end()) { TORCH_INTERNAL_ASSERT( @@ -734,6 +540,10 @@ IndexCompute::IndexCompute( } } } + } else { + for (auto root_id : td_->getMaybeRFactorDomain()) { + root_to_contig_id_[root_id] = root_id; + } } } @@ -744,7 +554,7 @@ void IndexCompute::run() { traverseFrom(td_->fusion(), domain_vals, false); } -Val* IndexCompute::getExtent(IterDomain* id) { +Val* IndexCompute::getExtent(IterDomain* id) const { // Pick from extent_map_ if available. Previously parallel // dimensions were ued (e.g., blockDim.x), however, it would result // in out-of-bounds errors when the extent of IterDomain is smaller @@ -768,7 +578,8 @@ IndexCompute IndexCompute::updateIndexCompute( const TensorDomain* new_td, const std::unordered_map& id_map, const std::vector& root_contiguity, - const std::unordered_map& reference_halo_extent_map) { + const std::unordered_map& reference_halo_extent_map) + const { FUSER_PERF_SCOPE("GpuLower::Lower::updateIndexCompute"); std::unordered_map updated_index_map; @@ -852,10 +663,13 @@ class UpdateLeafIndices : public IterVisitor { } auto factor = split->factor(); - index_map_[inner_id] = IrBuilder::modExpr(index_map_[in_id], factor); + index_map_[inner_id] = + SimplifyingIrBuilder::modExpr(index_map_[in_id], factor); extent_map_[inner_id] = factor; - index_map_[outer_id] = IrBuilder::divExpr(index_map_[in_id], factor); - extent_map_[outer_id] = IrBuilder::ceilDivExpr(getExtent(in_id), factor); + index_map_[outer_id] = + SimplifyingIrBuilder::divExpr(index_map_[in_id], factor); + extent_map_[outer_id] = + SimplifyingIrBuilder::ceilDivExpr(getExtent(in_id), factor); } void handle(Merge* merge) override { @@ -874,12 +688,13 @@ class UpdateLeafIndices : public IterVisitor { TORCH_INTERNAL_ASSERT( index_map_.find(inner_id) != index_map_.end(), "Inner ID not found"); - index_map_[out_id] = IrBuilder::mulExpr( + index_map_[out_id] = SimplifyingIrBuilder::mulExpr( index_map_[inner_id], - IrBuilder::mulExpr(index_map_[outer_id], getExtent(inner_id))); + SimplifyingIrBuilder::mulExpr( + index_map_[outer_id], getExtent(inner_id))); extent_map_[out_id] = - IrBuilder::mulExpr(getExtent(outer_id), getExtent(inner_id)); + SimplifyingIrBuilder::mulExpr(getExtent(outer_id), getExtent(inner_id)); } // return extent_map_[id] if exists, else return id->extent() @@ -906,8 +721,8 @@ Val* getHaloExtentOfRootAxis(IterDomain* id, Val* normal_extent = nullptr) { const auto& halo = GpuLower::current()->haloInfo().getRootAxisInfo(id); if 
(halo.hasHalo()) { - auto halo_extent = - IrBuilder::addExpr(normal_extent, IrBuilder::create(halo.width())); + auto halo_extent = SimplifyingIrBuilder::addExpr( + normal_extent, SimplifyingIrBuilder::create(halo.width())); return halo_extent; } else { return normal_extent; @@ -959,8 +774,8 @@ void IndexSwizzle::run() { auto idx_to_swizzle_i = indexMap().at(id_to_swizzle_i); auto idx_to_swizzle_j = indexMap().at(id_to_swizzle_j); - auto swizzled_idx = IrBuilder::modExpr( - IrBuilder::addExpr(idx_to_swizzle_i, idx_to_swizzle_j), + auto swizzled_idx = SimplifyingIrBuilder::modExpr( + SimplifyingIrBuilder::addExpr(idx_to_swizzle_i, idx_to_swizzle_j), id_to_swizzle_j->extent()); index_map_[id_to_swizzle_j] = swizzled_idx; swizzled_ids_.insert(id_to_swizzle_j); @@ -1012,6 +827,13 @@ indexMapFromTV( std::unordered_map loop_to_ind_map; + // Check if the current op has an implicit loop implemented + // within an mma instruction. + bool within_mma_loops = + std::any_of(loops.begin(), loops.end(), [](kir::ForLoop* fl) { + return fl->iter_domain()->isMma(); + }); + // When indexed as a producer, the parallel types of the the // producer domains may not be the same as those of the loops, but // that's still valid parallelization. However, in that case, using @@ -1047,8 +869,15 @@ indexMapFromTV( for (auto loop : loops) { Val* idx = nullptr; - const auto same_parallel_type = - as_consumer || find_matching_parallel_domain(loop->iter_domain()); + const auto same_parallel_type = as_consumer || + find_matching_parallel_domain(loop->iter_domain()) || + // Note && TODO: + // mma swizzled lane_id does not map naturally from producer + // to consumer but they should still be detected as same + // parallel type. In a follow up may want to extent + // find_matching_parallel_domain to cover this case. + (within_mma_loops && + loop->iter_domain()->getParallelType() == ParallelType::TIDx); // See also LoopNestGenerator::pushAlloc. // NOLINTNEXTLINE(bugprone-branch-clone) if (!within_alloc) { @@ -1076,18 +905,22 @@ indexMapFromTV( // Similarly for local memory tensors, zero replacement can be // only done when there's a matching domain with the same // parallel type - (loop->iter_domain()->isThread() && is_local && same_parallel_type) || - loop->vectorize()) { + (loop->iter_domain()->isThread() && is_local && same_parallel_type)) { idx = GpuLower::current()->kernel()->zeroVal(); - if (!loop->vectorize()) { - zero_loops.insert(loop); - } + zero_loops.insert(loop); } else { idx = loop->index(); } + // If the loop is trivial, the loop index can only be the loop + // start value. + if (idx == loop->index() && loop->isTrivial()) { + idx = loop->start(); + } + if (loop == double_buffer_loop) { - idx = IrBuilder::addExpr(idx, GpuLower::current()->kernel()->oneVal()); + idx = SimplifyingIrBuilder::addExpr( + idx, GpuLower::current()->kernel()->oneVal()); } loop_to_ind_map[loop] = idx; @@ -1192,6 +1025,130 @@ std::unordered_map indexMapReferenceTo( return index_map_ref_to_producer; } +Val* hoistConsumerIndex( + IterDomain* consumer_root_id, + const TensorView* consumer_tv, + const IndexCompute& consumer_indexing, + TensorDomain* ref_td, + const IndexCompute& ref_indexing, + const std::vector& loops, + Val* index) { + // If index has no defining expression, there's nothing to hoist + if (disableIndexHoisting() || index->definition() == nullptr) { + return index; + } + + // The old swizzle interface, which should be deprecated, is not + // supported. 
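The per-loop index selection in indexMapFromTV above can be summarized as: loops whose index may be replaced by zero use 0, trivial loops use their start value, and the double-buffer loop reads one iteration ahead. A simplified sketch with plain flags instead of kir::ForLoop queries (the zero-replacement conditions themselves are elided):

#include <cstdint>
#include <iostream>

struct LoopInfo {
  int64_t index = 0;
  int64_t start = 0;
  bool replace_with_zero = false; // e.g. unswitched or matching thread dims
  bool trivial = false;           // loops that are never materialized
  bool double_buffer = false;
};

int64_t loopIndex(const LoopInfo& loop) {
  int64_t idx = loop.replace_with_zero ? 0 : loop.index;
  if (idx == loop.index && loop.trivial) {
    idx = loop.start; // a trivial loop can only ever be at its start value
  }
  if (loop.double_buffer) {
    idx += 1; // the double-buffered stage indexes one iteration ahead
  }
  return idx;
}

int main() {
  LoopInfo trivial_loop{/*index=*/5, /*start=*/2, false, /*trivial=*/true, false};
  std::cout << loopIndex(trivial_loop) << "\n"; // 2
  LoopInfo db_loop{/*index=*/3, /*start=*/0, false, false, /*double_buffer=*/true};
  std::cout << loopIndex(db_loop) << "\n"; // 4
}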
+ if (consumer_tv->swizzleType() != SwizzleType::NoSwizzle) { + return index; + } + + // Find the true indexed domain, which can be a merged contiguous domain. + auto indexed_consumer_id_it = + consumer_indexing.rootToContigID().find(consumer_root_id); + TORCH_INTERNAL_ASSERT( + indexed_consumer_id_it != consumer_indexing.rootToContigID().end(), + "Consumer indexed ID not found: ", + consumer_root_id->toString()); + auto indexed_consumer_id = indexed_consumer_id_it->second; + + // Insert the index into the common index map. A previously inserted + // val can be returned. + auto common_index = GpuLower::current() + ->commonIndexMap() + .insert( + indexed_consumer_id, + consumer_tv->domain(), + ref_td, + ref_indexing.indexMap(), + loops, + index) + .first; + + return common_index; +} + +std::unordered_map invertOneToOneMap( + const std::unordered_map& map) { + std::unordered_map inverted; + for (const auto& kv : map) { + bool inserted = inverted.emplace(kv.second, kv.first).second; + TORCH_INTERNAL_ASSERT( + inserted, + "Multiple mappings to the same value detected: ", + kv.second->toString()); + } + return inverted; +} + +Val* hoistProducerIndex( + IterDomain* producer_root_id, + const TensorView* producer_tv, + const IndexCompute& producer_indexing, + const TensorView* consumer_tv, + const std::unordered_map& p2c_map, + TensorDomain* ref_td, + const IndexCompute& ref_indexing, + const std::vector& loops, + Val* index) { + // If index has no defining expression, there's nothing to hoist + if (disableIndexHoisting() || index->definition() == nullptr) { + return index; + } + + // The old swizzle interface, which should be deprecated, is not + // supported. + if (producer_tv->swizzleType() != SwizzleType::NoSwizzle) { + return index; + } + + auto indexed_producer_id_it = + producer_indexing.rootToContigID().find(producer_root_id); + TORCH_INTERNAL_ASSERT( + indexed_producer_id_it != producer_indexing.rootToContigID().end(), + "Producer indexed ID not found: ", + producer_root_id->toString()); + auto indexed_producer_id = indexed_producer_id_it->second; + + // Use the corresponding consumer domain to find matching + // for-loops. Note that there's no CA mapping with the producer + // domains as the producer TensorDomain is a temporary replay + // domain. + + auto indexed_consumer_id_it = p2c_map.find(indexed_producer_id); + + // There can be no corresponding consumer ID. For example, consider: + // consumer: [b1, i2, i3] + // producer: [i2, i3]. + // Suppose the consumer is transformed as: + // consumer: [(b1*i2)*i3] + // Then the producer would be transformed when indexed: + // producer: [i2*i3] + // Assuming i2 and i3 are contiguous, the producer indexing is done + // with the merged i2*i3 domain, but there's no domain in the + // consumer that maps with the producer indexed domain. + // It seems non-trivial to support patterns like this. Skip for now. + if (indexed_consumer_id_it == p2c_map.end()) { + return index; + } + + IterDomain* indexed_consumer_id = indexed_consumer_id_it->second; + + auto common_index = GpuLower::current() + ->commonIndexMap() + .insert( + indexed_consumer_id, + consumer_tv->domain(), + ref_td, + ref_indexing.indexMap(), + loops, + index) + .first; + + return common_index; +} + } // namespace std::vector Index::getGlobalProducerStridedIndices( @@ -1219,16 +1176,17 @@ std::vector Index::getGlobalProducerStridedIndices( // Map everything we can from reference to producer using compute at index // map. Use consumer as a proxy between producer and the generated reference. 
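// Illustrative summary (not part of the original comment): the index is first computed on the reference tensor, mapped onto the consumer through the compute-at index map built below, and then carried over to the producer through the best-effort replay map (c2p_map) and its inverse (p2c_map).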
std::unordered_map index_map_ref_to_producer; - { - // This replay has to be consistent with compute at index map. - BestEffortReplay replay_producer_as_consumer( - producer_tv->domain()->domain(), - consumer_tv->domain()->domain(), - pairwise_map.mapConsumerToProducer( - consumer_tv->domain(), producer_tv->domain())); - const auto& c2p_map = replay_producer_as_consumer.getReplay(); + // This replay has to be consistent with compute at index map. + BestEffortReplay replay_producer_as_consumer( + producer_tv->domain()->domain(), + consumer_tv->domain()->domain(), + pairwise_map.mapConsumerToProducer( + consumer_tv->domain(), producer_tv->domain())); + const auto& c2p_map = replay_producer_as_consumer.getReplay(); + const auto p2c_map = invertOneToOneMap(c2p_map); + { std::unordered_map index_map_ref_to_consumer = indexMapReferenceTo( consumer_tv, gpu_lower->caIndexMap(), reference_id_map); @@ -1302,7 +1260,8 @@ std::vector Index::getGlobalProducerStridedIndices( } std::stringstream ss; ss << "T" << producer_tv->name() << ".stride[" << stride_i++ << "]"; - strides[i] = IrBuilder::create(ss.str(), DataType::Int); + strides[i] = + SimplifyingIrBuilder::create(ss.str(), DataType::Int); } } @@ -1343,12 +1302,13 @@ std::vector Index::getGlobalProducerStridedIndices( // by extent of this dimension auto root_dim_extent = getHaloExtentOfRootAxis(root_dom[dim]); cur_contig_stride = - IrBuilder::mulExpr(cur_contig_stride, root_dim_extent); + SimplifyingIrBuilder::mulExpr(cur_contig_stride, root_dim_extent); } else { // If non contiguous dimension, keep local stride information, set cur // stride to local stride * local raw extent auto root_dim_extent = getHaloExtentOfRootAxis(root_dom[dim]); - cur_contig_stride = IrBuilder::mulExpr(strides[dim], root_dim_extent); + cur_contig_stride = + SimplifyingIrBuilder::mulExpr(strides[dim], root_dim_extent); } } @@ -1380,6 +1340,18 @@ std::vector Index::getGlobalProducerStridedIndices( auto root_ind = producer_indexing.indexMap().at(root_dom[i]); + // index hoist must be done before the adjustments for halo + root_ind = hoistProducerIndex( + root_dom[i], + producer_tv, + producer_indexing, + consumer_tv, + p2c_map, + reference.domain, + ref_compute, + loops, + root_ind); + root_ind = getProducerIndexWithHalo(producer_tv, i, root_ind, consumer_tv); root_ind = getProducerIndexWithGather( @@ -1396,9 +1368,10 @@ std::vector Index::getGlobalProducerStridedIndices( if (root_ind->isZeroInt()) { continue; } else { - auto strided_ind = IrBuilder::mulExpr(root_ind, strides[i]); + auto strided_ind = SimplifyingIrBuilder::mulExpr(root_ind, strides[i]); if (i == root_dom.size() - 1 && vectorize_shift != nullptr) { - strided_inds[i] = IrBuilder::addExpr(strided_ind, vectorize_shift); + strided_inds[i] = + SimplifyingIrBuilder::addExpr(strided_ind, vectorize_shift); } else { strided_inds[i] = strided_ind; } @@ -1434,25 +1407,25 @@ std::vector Index::getNonGlobalProducerStridedIndices( // the allocation position of the producer, and to figure out which producer // indices are mapped to consumer trivial reductions. std::unordered_map p2c_alloc_map; - { - // We want to play producer as consumer instead of the other way around - // since consumer may have some broadcasted axes producer doesn't have - // merged into loops producer may use. If we did consumer as producer we - // wouldn't have this information in the mapping. 
- auto replay_PasC = BestEffortReplay::replayPasC( - producer_tv, consumer_tv, -1, pairwise_map); - - auto c2p_map = replay_PasC.getReplay(); - - // Grab consumer domain entries and reverse replay map. TODO: Maybe - // TransformReplay::replayPasC could return this map - for (auto id : consumer_tv->domain()->domain()) { - auto c2p_it = c2p_map.find(id); - if (c2p_it != c2p_map.end()) { - auto c_id = c2p_it->first; - auto p_id = c2p_it->second; - p2c_alloc_map[p_id] = c_id; - } + + // We want to play producer as consumer instead of the other way around + // since consumer may have some broadcasted axes producer doesn't have + // merged into loops producer may use. If we did consumer as producer we + // wouldn't have this information in the mapping. + auto replay_PasC = + BestEffortReplay::replayPasC(producer_tv, consumer_tv, -1, pairwise_map); + + const auto& c2p_map = replay_PasC.getReplay(); + const auto p2c_map = invertOneToOneMap(c2p_map); + + // Grab consumer domain entries and reverse replay map. TODO: Maybe + // TransformReplay::replayPasC could return this map + for (auto id : consumer_tv->domain()->domain()) { + auto c2p_it = c2p_map.find(id); + if (c2p_it != c2p_map.end()) { + auto c_id = c2p_it->first; + auto p_id = c2p_it->second; + p2c_alloc_map[p_id] = c_id; } } @@ -1641,6 +1614,18 @@ std::vector Index::getNonGlobalProducerStridedIndices( auto root_ind_i = index_map.at(root_dom[i]); + // index hoist must be done before the adjustments for halo + root_ind_i = hoistProducerIndex( + root_dom[i], + producer_tv, + producer_indexing, + consumer_tv, + c2p_map, + reference.domain, + ref_compute, + loops, + root_ind_i); + root_ind_i = getProducerIndexWithHalo(producer_tv, i, root_ind_i, consumer_tv); @@ -1685,13 +1670,13 @@ std::vector Index::getNonGlobalProducerStridedIndices( if (stride == nullptr) { stride = root_ext_j; } else { - stride = IrBuilder::mulExpr(stride, root_ext_j); + stride = SimplifyingIrBuilder::mulExpr(stride, root_ext_j); } } } if (stride != nullptr) { - strided_inds[i] = IrBuilder::mulExpr(root_ind_i, stride); + strided_inds[i] = SimplifyingIrBuilder::mulExpr(root_ind_i, stride); } else { strided_inds[i] = root_ind_i; } @@ -1701,12 +1686,14 @@ std::vector Index::getNonGlobalProducerStridedIndices( auto db_loop = gpu_lower->doubleBufferInfo().getDoubleBufferLoop( producer_tv, loops, true); if (db_loop != nullptr) { - auto db_switch_index = - IrBuilder::modExpr(db_loop->index(), IrBuilder::create(2)); + auto loop_index = + db_loop->isTrivial() ? 
db_loop->start() : db_loop->index(); + auto db_switch_index = SimplifyingIrBuilder::modExpr( + loop_index, SimplifyingIrBuilder::create(2)); auto original_alloc_size = gpu_lower->doubleBufferInfo().getOriginalAllocSize(producer_tv); auto db_strided_index = - IrBuilder::mulExpr(db_switch_index, original_alloc_size); + SimplifyingIrBuilder::mulExpr(db_switch_index, original_alloc_size); strided_inds.push_back(db_strided_index); } } @@ -1845,6 +1832,16 @@ std::vector Index::getGlobalConsumerStridedIndices( auto root_ind = consumer_indexing.indexMap().at(root_dom[i]); + // index hoist must be done before the adjustments for halo + root_ind = hoistConsumerIndex( + root_dom[i], + consumer_tv, + consumer_indexing, + reference.domain, + ref_compute, + loops, + root_ind); + root_ind = SimplifyingIrBuilder::addExpr( root_ind, getGlobalConsumerOffsetWithPartialSplit(root_dom[i])); @@ -1979,11 +1976,21 @@ std::vector Index::getNonGlobalConsumerStridedIndices( " id: ", root_dom[i]->toString()); - const auto root_ind_i = index_map.at(root_dom[i]); + auto root_ind_i = index_map.at(root_dom[i]); if (root_ind_i->isZeroInt()) { continue; } + // index hoist must be done before the adjustments for halo + root_ind_i = hoistConsumerIndex( + root_dom[i], + consumer_tv, + consumer_indexing, + reference.domain, + ref_compute, + loops, + root_ind_i); + // Compute striding for this index. Val* stride = nullptr; for (const auto j : c10::irange(i + 1, root_dom.size())) { @@ -2012,13 +2019,13 @@ std::vector Index::getNonGlobalConsumerStridedIndices( if (stride == nullptr) { stride = root_ext_j; } else { - stride = IrBuilder::mulExpr(stride, root_ext_j); + stride = SimplifyingIrBuilder::mulExpr(stride, root_ext_j); } } } if (stride != nullptr) { - strided_inds[i] = IrBuilder::mulExpr(root_ind_i, stride); + strided_inds[i] = SimplifyingIrBuilder::mulExpr(root_ind_i, stride); } else { strided_inds[i] = root_ind_i; } @@ -2037,13 +2044,14 @@ std::vector Index::getNonGlobalConsumerStridedIndices( auto db_loop = gpu_lower->doubleBufferInfo().getDoubleBufferLoop( consumer_tv, loops, true); if (db_loop != nullptr) { - auto db_switch_index = IrBuilder::subExpr( + auto db_switch_index = SimplifyingIrBuilder::subExpr( gpu_lower->kernel()->oneVal(), - IrBuilder::modExpr(db_loop->index(), IrBuilder::create(2))); + SimplifyingIrBuilder::modExpr( + db_loop->index(), SimplifyingIrBuilder::create(2))); auto original_alloc_size = gpu_lower->doubleBufferInfo().getOriginalAllocSize(consumer_tv); auto db_strided_index = - IrBuilder::mulExpr(db_switch_index, original_alloc_size); + SimplifyingIrBuilder::mulExpr(db_switch_index, original_alloc_size); strided_inds.push_back(db_strided_index); } } @@ -2085,7 +2093,8 @@ kir::TensorIndex* Index::getProducerIndex( const TensorView* consumer, const std::vector& loops) { auto strided_indices = getProducerStridedIndices(producer, consumer, loops); - return IrBuilder::create(producer, strided_indices); + return SimplifyingIrBuilder::create( + producer, strided_indices); } std::vector Index::getConsumerStridedIndices( @@ -2113,7 +2122,8 @@ kir::TensorIndex* Index::getConsumerIndex( const TensorView* consumer, const std::vector& loops) { auto strided_indices = getConsumerStridedIndices(consumer, loops); - return IrBuilder::create(consumer, strided_indices); + return SimplifyingIrBuilder::create( + consumer, strided_indices); } namespace { @@ -2363,8 +2373,8 @@ std::pair getStartAndStopOffsetsForShift( } return { - IrBuilder::create(start_offset), - IrBuilder::create(stop_offset)}; + 
SimplifyingIrBuilder::create(start_offset), + SimplifyingIrBuilder::create(stop_offset)}; } std::pair getStartAndStopOffsetsForGather( @@ -2627,6 +2637,15 @@ auto getPredicateReferenceIndexing( } } + for (const auto loop : loops) { + auto& idx = loop_to_ind_map.at(loop); + // If the loop is trivial, the loop index can only be the loop + // start value. + if (idx == loop->index() && loop->isTrivial()) { + idx = loop->start(); + } + } + if (double_buffer_axis != nullptr) { auto db_loop = GpuLower::current()->doubleBufferInfo().getDoubleBufferLoop( double_buffer_axis, loops, true); @@ -2639,7 +2658,7 @@ auto getPredicateReferenceIndexing( // unswitch. In that case, it is not necessary to move ahead the // index for double buffering. if (cur_index == db_loop->index()) { - loop_to_ind_map[db_loop] = IrBuilder::addExpr( + loop_to_ind_map[db_loop] = SimplifyingIrBuilder::addExpr( cur_index, GpuLower::current()->kernel()->oneVal()); } } @@ -2813,8 +2832,7 @@ bool canOmitStopPredicate( } } - // Omit only when both the index and extent are "simple". - if (!(index_simple && contig_id->extent()->definition() == nullptr)) { + if (!index_simple) { return false; } @@ -2827,14 +2845,20 @@ bool canOmitStopPredicate( auto stop_offset_val = stop_offset->as()->value(); - auto halo_ext = gpu_lower->haloInfo().getRootAxisInfo(contig_id).width(); - // If they are not compile-time constant, can't prove the // condition. if (!stop_offset_val.has_value()) { return false; } + // Note that when a root domain is halo extended, it is the domain + // to be predicated, not its merged contig id even if it exists. So, + // if contig_id does not have root axis info, contig_id is + // guaranteed to have no halo. + auto halo_ext = gpu_lower->haloInfo().hasRootAxisInfo(contig_id) + ? gpu_lower->haloInfo().getRootAxisInfo(contig_id).width() + : 0; + if (halo_ext + stop_offset_val.value() > 0) { return false; } @@ -2858,6 +2882,61 @@ bool canOmitStopPredicate( return true; } +std::pair hoistPredicates( + Val* start_index, + Val* stop_index, + const std::vector& loops, + kir::ForLoop* unswitch_or_vec_loop, + IterDomain* predicated_consumer_id, + TensorView* predicated_consumer_tv, + TensorDomain* ref_td, + const std::unordered_map& ref_start_index_map, + const std::unordered_map& ref_stop_index_map) { + const std::pair same_indices{start_index, stop_index}; + + if (disableIndexHoisting()) { + return same_indices; + } + + const auto start_is_same_as_stop = stop_index == start_index; + + Val* hoisted_stop_index = nullptr; + + if (stop_index->definition() == nullptr) { + // If the index doens't have an expression, nothing to hoist + hoisted_stop_index = stop_index; + } else { + bool inserted = false; + std::tie(hoisted_stop_index, inserted) = + GpuLower::current()->commonIndexMap().insert( + predicated_consumer_id, + predicated_consumer_tv->domain(), + ref_td, + ref_stop_index_map, + loops, + stop_index); + } + + Val* hoisted_start_index = nullptr; + if (start_is_same_as_stop) { + hoisted_start_index = hoisted_stop_index; + } else if (start_index->definition() == nullptr) { + hoisted_start_index = start_index; + } else { + bool inserted = false; + std::tie(hoisted_start_index, inserted) = + GpuLower::current()->commonIndexMap().insert( + predicated_consumer_id, + predicated_consumer_tv->domain(), + ref_td, + ref_start_index_map, + loops, + start_index); + } + + return {hoisted_start_index, hoisted_stop_index}; +} + } // namespace // Returns predicates and the concrete (by loop map) root domains they cover @@ -2908,10 +2987,13 @@ 
std::pair, ReferenceTensor> Index:: // If not unswitch, share the same indexing map as the stop index // map + const auto& ref_start_indexing = is_unswitch + ? getPredicateReferenceIndexing( + loops, reference, unswitch_or_vec_loop, db_axis, true) + : ref_stop_indexing; + std::unordered_map consumer_start_index_map; if (is_unswitch) { - auto ref_start_indexing = getPredicateReferenceIndexing( - loops, reference, unswitch_or_vec_loop, db_axis, true); const auto consumer_start_indexing = ref_start_indexing.updateIndexCompute( consumer_tv->domain(), ref_2_consumer, @@ -2986,6 +3068,17 @@ std::pair, ReferenceTensor> Index:: auto stop_index = consumer_stop_indexing_it->second; auto start_index = consumer_start_index_map.at(contig_id); + std::tie(start_index, stop_index) = hoistPredicates( + start_index, + stop_index, + loops, + unswitch_or_vec_loop, + contig_id, + consumer_tv, + reference.domain, + ref_start_indexing.indexMap(), + ref_stop_indexing.indexMap()); + // Build predicates for start positions as: // start_index + start_offset >= 0 auto start_offset = simplifyStartOffset(info.start_offset_); diff --git a/torch/csrc/jit/codegen/cuda/index_compute.h b/torch/csrc/jit/codegen/cuda/index_compute.h index 27f1c911bde122..32aa3421ae8b28 100644 --- a/torch/csrc/jit/codegen/cuda/index_compute.h +++ b/torch/csrc/jit/codegen/cuda/index_compute.h @@ -69,7 +69,7 @@ class IndexCompute : public BackwardVisitor { void handle(Expr*) override; // return extent_map_[id] if exists, else return id->extent() - Val* getExtent(IterDomain* id); + Val* getExtent(IterDomain* id) const; //! True if a domain is not used to index bool isZero(IterDomain* id) const; @@ -105,6 +105,9 @@ class IndexCompute : public BackwardVisitor { // IDs that are a result of contiguous merges std::unordered_set contig_ids; + // Map from root to contig domains + std::unordered_map root_to_contig_id_; + // Mentions if we should propagate an index down a particular IterDomain path // if there's an option std::unordered_set preferred_paths_; @@ -130,6 +133,10 @@ class IndexCompute : public BackwardVisitor { return zero_merged_in_; } + const std::unordered_map& rootToContigID() const { + return root_to_contig_id_; + } + // Propagate back from _td using initial_index_map IndexCompute( const TensorDomain* _td, @@ -148,7 +155,7 @@ class IndexCompute : public BackwardVisitor { const std::unordered_map& id_map, const std::vector& _root_contiguity, const std::unordered_map& reference_halo_extent_map = - {}); + {}) const; virtual void run(); }; diff --git a/torch/csrc/jit/codegen/cuda/index_reference_replay.cpp b/torch/csrc/jit/codegen/cuda/index_reference_replay.cpp index 27e5b93e94e29c..a0e346f8892c61 100644 --- a/torch/csrc/jit/codegen/cuda/index_reference_replay.cpp +++ b/torch/csrc/jit/codegen/cuda/index_reference_replay.cpp @@ -40,7 +40,7 @@ IterDomain* IndexReferenceReplay::idCopy(IterDomain* id) { // reduction. All we care about are the transformations, and trying to make // sure we track correctly a replaying with consistent reduction/broadcast // domains is challenging and unnecessary. 
- auto copied_id = IrBuilder::create( + auto copied_id = SimplifyingIrBuilder::create( id->container(), id->start(), id->extent(), id->getParallelType()); replayed_ids_.emplace_back(copied_id); return copied_id; @@ -59,13 +59,13 @@ void IndexReferenceReplay::handle(Split* split) { // Don't produce the same values multiple times auto ref_outer = concreteToRefId(toConcrete(split->outer())); auto ref_inner = concreteToRefId(toConcrete(split->inner())); - if (ref_id_produced_.find(ref_outer) != ref_id_consumed_.end() || - ref_id_produced_.find(ref_inner) != ref_id_consumed_.end()) { + if (ref_id_produced_.find(ref_outer) != ref_id_produced_.end() || + ref_id_produced_.find(ref_inner) != ref_id_produced_.end()) { return; } // Replay the provided split operation and add it to the reference DAG - IrBuilder::create( + SimplifyingIrBuilder::create( split->container(), ref_outer, ref_inner, @@ -92,12 +92,13 @@ void IndexReferenceReplay::handle(Merge* merge) { // Don't produce the same values multiple times auto ref_out = concreteToRefId(toConcrete(merge->out())); - if (ref_id_produced_.find(ref_out) != ref_id_consumed_.end()) { + if (ref_id_produced_.find(ref_out) != ref_id_produced_.end()) { return; } // Replay the provided merge operation and add it to the reference DAG - IrBuilder::create(merge->container(), ref_out, ref_outer, ref_inner); + SimplifyingIrBuilder::create( + merge->container(), ref_out, ref_outer, ref_inner); // Mark producers and consumers ref_id_consumed_.emplace(ref_outer); @@ -218,7 +219,7 @@ TensorDomain* IndexReferenceReplay::computeReplay() { loops_replayed_domain.begin(), loops_replayed_domain.end(), [](IterDomain* id) { return id->definition() != nullptr; })) { - auto domain = IrBuilder::create( + auto domain = SimplifyingIrBuilder::create( // If there was no replay only return a domain with a root domain. loops_replayed_domain); return domain; @@ -253,7 +254,7 @@ TensorDomain* IndexReferenceReplay::computeReplay() { } // Create and return the reference. - auto domain = IrBuilder::create( + auto domain = SimplifyingIrBuilder::create( std::vector( root_domain_ids.begin(), root_domain_ids.end()), loops_replayed_domain); @@ -276,17 +277,23 @@ IndexCompute getReferenceIndexing( auto loop = loop_structure[loop_i]; auto ind = loop->index(); - initial_index_map[ref_axis] = ind; - if (loop->vectorize()) { - initial_index_map[ref_axis] = GpuLower::current()->kernel()->zeroVal(); - } else if (double_buffer_loop == loop) { + // If the loop is trivial, only the start value is used + if (loop->isTrivial()) { + initial_index_map[ref_axis] = loop->start(); + } else { + initial_index_map[ref_axis] = ind; + } + + if (double_buffer_loop == loop) { + TORCH_INTERNAL_ASSERT( + !loop->isTrivial(), "The double buffer loop must be materialized"); // This version of getReferenceIndexing is only used for // indexing global tensors. When indexing global producers, the // index for a double buffered loop needs to be incremented. The // parameter double_buffer_loop should be nullptr when indexing // global consumers tensors. 
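// For example (illustrative): if the double-buffer loop has index i, the reference axis is indexed with (i + 1) below, so global producer loads are issued one iteration ahead of the corresponding consumer reads.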
- initial_index_map[ref_axis] = - IrBuilder::addExpr(ind, GpuLower::current()->kernel()->oneVal()); + initial_index_map[ref_axis] = SimplifyingIrBuilder::addExpr( + initial_index_map[ref_axis], GpuLower::current()->kernel()->oneVal()); } if (Index::protectWithMagicZero(loop, ref_axis, ind)) { @@ -297,7 +304,7 @@ IndexCompute getReferenceIndexing( // Add magic zero to a fairly inner most index if (magic_zero_loop >= 0) { auto ref_id = reference_tensor->axis(magic_zero_loop); - initial_index_map[ref_id] = IrBuilder::addExpr( + initial_index_map[ref_id] = SimplifyingIrBuilder::addExpr( initial_index_map[ref_id], FusionGuard::getCurFusion()->magicZeroVal()); } diff --git a/torch/csrc/jit/codegen/cuda/interface.cpp b/torch/csrc/jit/codegen/cuda/interface.cpp index d21004ae154278..1292f4b7ed02ab 100644 --- a/torch/csrc/jit/codegen/cuda/interface.cpp +++ b/torch/csrc/jit/codegen/cuda/interface.cpp @@ -90,6 +90,11 @@ bool profileNode(const Node* node) { getFuserInterface()->fn_profile_n(node); } +bool skipNode(const std::string& symbol_str, bool flip) { + return getFuserInterface()->fn_skip_n != nullptr && + getFuserInterface()->fn_skip_n(symbol_str, flip); +} + //! [ Note -- type guard logic in CudaFusionGuard ] //! //! CudaFusionGuard is used to Guard input tensor to `CudaFusionGroup` so that //! extra attention should be paid to contiguity across size-1 //! dimensions. //! c. size check: +//! c.1 broadcast check: //! making sure that broadcast semantics are identical. So we want to //! make sure a given dimension either are both size-1 for `tensor` & //! `guard_tensor_type`, or are both non-size-1. //! This is due to the fact that we specialize size-1 dimension as //! broadcasted dimension while translating PyTorch tensor to Fusion IR. +//! c.2 size-0 check: +//! we don't specialize this on codegen, but we do specialize fusion +//! logic for size-0 on reductions, hence the check //! bool complyWith( const at::Tensor& tensor, @@ -133,13 +142,19 @@ bool complyWith( // check a. if num_dimension check fails or scalar type check fails if (*guard_tensor_type->dim() != static_cast(tensor.ndimension()) || (guard_tensor_type->scalarType().has_value() && - (guard_tensor_type->scalarType().value() != tensor.scalar_type()))) { + (guard_tensor_type->scalarType().value() != tensor.scalar_type())) || + (guard_tensor_type->device().has_value() && + (guard_tensor_type->device().value() != tensor.device())) || + (guard_tensor_type->requiresGrad().has_value() && + guard_tensor_type->requiresGrad().value() != + (tensor.requires_grad() && at::GradMode::is_enabled()))) { return false; } // TODO: should we get symbolic_size instead and check for size // consistency across tensors as well? 
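// Illustrative example of checks c.1 and c.2 below (hypothetical shapes): with guard sizes [4, 1, 8], a runtime tensor of shape [4, 1, 8] passes; [4, 3, 8] fails c.1 because the guard specialized the middle dimension as a broadcast (size-1) dimension; [4, 0, 8] fails c.2 because size-0 inputs take a specialized fusion path.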
const auto& sizes = guard_tensor_type->sizes(); + // see [ Note -- stirde_properties in tensor type ] const auto& stride_properties = guard_tensor_type->stride_properties(); const auto& t_sizes = tensor.sizes(); @@ -207,12 +222,18 @@ bool complyWith( } } - // check c, we go along semantic ordered dimensions + // check c.1, we go along semantic ordered dimensions // check broadcast / size-1: bool guard_bcast = sizes[j].has_value() && sizes[j].value() == 1; if (guard_bcast != (t_sizes[j] == 1)) { return false; } + + // check c.2, check for size-0 + bool guard_size_0 = sizes[j].has_value() && sizes[j].value() == 0; + if (guard_size_0 != (t_sizes[j] == 0)) { + return false; + } } return true; @@ -675,7 +696,7 @@ RegisterOperators reg_infer_unsqueeze_size({ // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) RegisterOperators reg_infer_squeeze_dim_size({ Operator( - "prim::infer_squeeze_size(int[] a, int dim) -> int[]", + "prim::infer_squeeze_size.dim(int[] a, int dim) -> int[]", [](const Node* node) -> Operation { return [](Stack& stack) { auto dim = pop(stack).toInt(); @@ -696,7 +717,7 @@ RegisterOperators reg_infer_squeeze_dim_size({ // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) RegisterOperators reg_infer_squeeze_size({ Operator( - "prim::infer_squeeze_size.dim(int[] a) -> int[]", + "prim::infer_squeeze_size(int[] a) -> int[]", [](const Node* node) -> Operation { return [](Stack& stack) { auto size = pop(stack).toIntVector(); diff --git a/torch/csrc/jit/codegen/cuda/interface.h b/torch/csrc/jit/codegen/cuda/interface.h index 8afa854ea5cf46..a24d3f3b043276 100644 --- a/torch/csrc/jit/codegen/cuda/interface.h +++ b/torch/csrc/jit/codegen/cuda/interface.h @@ -19,10 +19,10 @@ namespace cuda { TORCH_API std::atomic& getCudaFusionGuardMode(); -C10_EXPORT bool getSingletonFusion(); -C10_EXPORT bool setSingletonFusion(bool value); -C10_EXPORT bool getHorizontalFusion(); -C10_EXPORT bool setHorizontalFusion(bool value); +TORCH_API bool getSingletonFusion(); +TORCH_API bool setSingletonFusion(bool value); +TORCH_API bool getHorizontalFusion(); +TORCH_API bool setHorizontalFusion(bool value); // dummy struct to allow API registration struct CudaFuserInterface { @@ -32,19 +32,22 @@ struct CudaFuserInterface { bool (*fn_can_fuse_n)(const Node*) = nullptr; void (*fn_insert_profile_inodes)(ProfilingRecord* pr) = nullptr; bool (*fn_profile_n)(const Node*) = nullptr; + bool (*fn_skip_n)(const std::string&, bool flip) = nullptr; }; // Get interface, this is used by registration and user facing API internally -C10_EXPORT CudaFuserInterface* getFuserInterface(); +TORCH_API CudaFuserInterface* getFuserInterface(); -C10_EXPORT void compileFusionGroup(Node* fusion_node); -C10_EXPORT void runFusionGroup(const Node* fusion_node, Stack& stack); -C10_EXPORT void fuseGraph(std::shared_ptr&); -C10_EXPORT bool canFuseNode(const Node* node); -C10_EXPORT void InsertProfileNodesForCUDAFuser(ProfilingRecord* pr); -C10_EXPORT bool profileNode(const Node* node); +TORCH_API void compileFusionGroup(Node* fusion_node); +TORCH_API void runFusionGroup(const Node* fusion_node, Stack& stack); +TORCH_API void fuseGraph(std::shared_ptr&); +TORCH_API bool canFuseNode(const Node* node); +TORCH_API void InsertProfileNodesForCUDAFuser(ProfilingRecord* pr); +TORCH_API bool profileNode(const Node* node); -C10_EXPORT bool complyWith( +TORCH_API bool skipNode(const std::string& symbol_str, bool flip = true); + +TORCH_API bool complyWith( const at::Tensor& tensor, const c10::TensorTypePtr& 
guard_tensor_type); diff --git a/torch/csrc/jit/codegen/cuda/ir_base_nodes.cpp b/torch/csrc/jit/codegen/cuda/ir_base_nodes.cpp index 6a094c104df34d..39434ff993721b 100644 --- a/torch/csrc/jit/codegen/cuda/ir_base_nodes.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_base_nodes.cpp @@ -103,6 +103,19 @@ const std::vector& Val::uses() const { return uses_; } +void Val::resolveIndexDtype() { + TORCH_INTERNAL_ASSERT( + vtype_ == ValType::TensorView, + "Resolving index type is currently only supported on tensor view values."); + TORCH_INTERNAL_ASSERT( + dtype_ == DataType::Index, + "Can only resolve index type if a tensor has an Index DataType."); + TORCH_INTERNAL_ASSERT( + container()->isA(), + "Index type can only be resolved at compile time."); + dtype_ = container()->as()->indexType(); +} + namespace { // Traverse definition of all values involved in constructing the provided val. @@ -180,6 +193,16 @@ bool Val::isOneInt() const { return int_val.has_value() && int_val.value() == 1; } +bool Val::isDefinitionType(ExprType expression_type) const { + if (definition() != nullptr) { + auto def_expr_type = definition()->getExprType(); + if (def_expr_type.has_value() && def_expr_type.value() == expression_type) { + return true; + } + } + return false; +} + c10::optional Val::getDataType() const { TORCH_INTERNAL_ASSERT( dtype_ != DataType::Null, "Value does not have a data type."); diff --git a/torch/csrc/jit/codegen/cuda/ir_base_nodes.h b/torch/csrc/jit/codegen/cuda/ir_base_nodes.h index 1b8444fae46203..70f0b8f80fe538 100644 --- a/torch/csrc/jit/codegen/cuda/ir_base_nodes.h +++ b/torch/csrc/jit/codegen/cuda/ir_base_nodes.h @@ -266,6 +266,9 @@ class TORCH_CUDA_CU_API Val : public Statement { return definition_; } + // Determine if value definition matches given expression type + bool isDefinitionType(ExprType expression_type) const; + const std::vector& uses() const; bool isFusionInput() const { @@ -309,13 +312,13 @@ class TORCH_CUDA_CU_API Val : public Statement { definition_ = expr; } + void resolveIndexDtype(); + protected: friend Fusion; // NOLINTNEXTLINE(cppcoreguidelines-non-private-member-variables-in-classes) const ValType vtype_; - // NOLINTNEXTLINE(cppcoreguidelines-non-private-member-variables-in-classes) - const DataType dtype_; // TODO: Add fusion passkey for this void setIsFusionInput(bool is_fusion_input) { @@ -333,6 +336,11 @@ class TORCH_CUDA_CU_API Val : public Statement { } private: + // There's only one instance where dtype can change, and that's through + // resolving the index data type from nvfuser to either Int or Int32 for + // welford operations. + DataType dtype_; + // Following is managed by Fusion and can change. 
bool is_fusion_input_ = false; bool is_fusion_output_ = false; diff --git a/torch/csrc/jit/codegen/cuda/ir_builder.cpp b/torch/csrc/jit/codegen/cuda/ir_builder.cpp index 17a4e59cfb625b..c17ff0de44a49b 100644 --- a/torch/csrc/jit/codegen/cuda/ir_builder.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_builder.cpp @@ -47,6 +47,7 @@ IR_BUILDER_INSTANTIATE(TensorView) IR_BUILDER_INSTANTIATE(Bool) IR_BUILDER_INSTANTIATE(Double) IR_BUILDER_INSTANTIATE(Int) +IR_BUILDER_INSTANTIATE(ComplexDouble) IR_BUILDER_INSTANTIATE(NamedScalar) // Exprs @@ -55,12 +56,14 @@ IR_BUILDER_INSTANTIATE(Merge) IR_BUILDER_INSTANTIATE(TransposeOp) IR_BUILDER_INSTANTIATE(ShiftOp) IR_BUILDER_INSTANTIATE(GatherOp) +IR_BUILDER_INSTANTIATE(ViewDtypeOp) IR_BUILDER_INSTANTIATE(ViewOp) IR_BUILDER_INSTANTIATE(UnaryOp) IR_BUILDER_INSTANTIATE(BinaryOp) IR_BUILDER_INSTANTIATE(TernaryOp) IR_BUILDER_INSTANTIATE(ReductionOp) IR_BUILDER_INSTANTIATE(WelfordOp) +IR_BUILDER_INSTANTIATE(MmaOp) IR_BUILDER_INSTANTIATE(BroadcastOp) Val* IrBuilder::newResult(DataType dtype) { @@ -268,6 +271,61 @@ Val* SimplifyingIrBuilder::subExpr(Val* lhs, Val* rhs) { return addExpr(lhs, negExpr(rhs)); } +Val* SimplifyingIrBuilder::mulExpr(Int* lhs, Int::ScalarType rhs) { + if (rhs == 0) { + return lhs->container()->zeroVal(); + } else if (rhs == 1) { + return lhs; + } else if (lhs == nullptr) { + return IrBuilder::create(rhs); + } else if (lhs->isConst()) { + return IrBuilder::create(lhs->value().value() * rhs); + } else { + return IrBuilder::mulExpr(lhs, IrBuilder::create(rhs)); + } +} + +Val* SimplifyingIrBuilder::mulExpr(Val* lhs, Int::ScalarType rhs) { + auto lhs_int = dynamic_cast(lhs); + if (lhs_int != nullptr) { + return mulExpr(lhs_int, rhs); + } else { + return IrBuilder::mulExpr(lhs, IrBuilder::create(rhs)); + } +} + +Val* SimplifyingIrBuilder::mulExpr(Int* lhs, Int* rhs) { + if (rhs == nullptr) { + return lhs; + } else if (lhs == nullptr) { + return rhs; + } else if (lhs->isConst()) { + return mulExpr(rhs, lhs->value().value()); + } else if (rhs->isConst()) { + return mulExpr(lhs, rhs->value().value()); + } else { + return IrBuilder::mulExpr(lhs, rhs); + } +} + +Val* SimplifyingIrBuilder::mulExpr(Val* lhs, Val* rhs) { + TORCH_INTERNAL_ASSERT(lhs != nullptr || rhs != nullptr); + if (lhs == nullptr || lhs->isOneInt()) { + return rhs; + } else if (rhs == nullptr || rhs->isOneInt()) { + return lhs; + } else if (lhs->isZeroInt() || rhs->isZeroInt()) { + return lhs->container()->zeroVal(); + } + auto lhs_int = dynamic_cast(lhs); + auto rhs_int = dynamic_cast(rhs); + if (lhs_int != nullptr && rhs_int != nullptr) { + return mulExpr(lhs_int, rhs_int); + } else { + return IrBuilder::mulExpr(lhs, rhs); + } +} + Val* SimplifyingIrBuilder::andExpr(Val* lhs, Val* rhs) { TORCH_INTERNAL_ASSERT(!(lhs == nullptr && rhs == nullptr)); diff --git a/torch/csrc/jit/codegen/cuda/ir_builder.h b/torch/csrc/jit/codegen/cuda/ir_builder.h index 5087f2832a99df..f122232f8fb8eb 100644 --- a/torch/csrc/jit/codegen/cuda/ir_builder.h +++ b/torch/csrc/jit/codegen/cuda/ir_builder.h @@ -116,6 +116,10 @@ class TORCH_CUDA_CU_API SimplifyingIrBuilder : public IrBuilder { static Val* addExpr(Int* lhs, Int* rhs); static Val* addExpr(Val* lhs, Val* rhs); static Val* subExpr(Val* lhs, Val* rhs); + static Val* mulExpr(Int* lhs, Int::ScalarType rhs); + static Val* mulExpr(Val* lhs, Int::ScalarType rhs); + static Val* mulExpr(Int* lhs, Int* rhs); + static Val* mulExpr(Val* lhs, Val* rhs); static Val* andExpr(Val* lhs, Val* rhs); static Val* maxExpr(Val* lhs, Val* rhs); static Val* minExpr(Val* lhs, 
Val* rhs); diff --git a/torch/csrc/jit/codegen/cuda/ir_cloner.cpp b/torch/csrc/jit/codegen/cuda/ir_cloner.cpp index 8a1717e8d059dd..1ddc4feb90dacc 100644 --- a/torch/csrc/jit/codegen/cuda/ir_cloner.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_cloner.cpp @@ -76,6 +76,10 @@ void IrCloner::handle(const Int* i) { clone_ = IrBuilder::clone(i, this); } +void IrCloner::handle(const ComplexDouble* c) { + clone_ = IrBuilder::clone(c, this); +} + void IrCloner::handle(const NamedScalar* named_scalar) { clone_ = IrBuilder::clone(named_scalar, this); } @@ -108,6 +112,10 @@ void IrCloner::handle(const WelfordOp* op) { clone_ = IrBuilder::clone(op, this); } +void IrCloner::handle(const MmaOp* op) { + clone_ = IrBuilder::clone(op, this); +} + void IrCloner::handle(const TransposeOp* op) { clone_ = IrBuilder::clone(op, this); } @@ -120,6 +128,10 @@ void IrCloner::handle(const GatherOp* op) { clone_ = IrBuilder::clone(op, this); } +void IrCloner::handle(const ViewDtypeOp* op) { + clone_ = IrBuilder::clone(op, this); +} + void IrCloner::handle(const ViewOp* op) { clone_ = IrBuilder::clone(op, this); } diff --git a/torch/csrc/jit/codegen/cuda/ir_cloner.h b/torch/csrc/jit/codegen/cuda/ir_cloner.h index 1755b9e95632fe..3f50cd48e93bf6 100644 --- a/torch/csrc/jit/codegen/cuda/ir_cloner.h +++ b/torch/csrc/jit/codegen/cuda/ir_cloner.h @@ -65,6 +65,7 @@ class TORCH_CUDA_CU_API IrCloner : private OptInConstDispatch { void handle(const Bool*) override; void handle(const Double*) override; void handle(const Int*) override; + void handle(const ComplexDouble*) override; void handle(const NamedScalar*) override; void handle(const UnaryOp*) override; @@ -73,9 +74,11 @@ class TORCH_CUDA_CU_API IrCloner : private OptInConstDispatch { void handle(const BroadcastOp*) override; void handle(const ReductionOp*) override; void handle(const WelfordOp*) override; + void handle(const MmaOp*) override; void handle(const TransposeOp*) override; void handle(const ShiftOp*) override; void handle(const GatherOp*) override; + void handle(const ViewDtypeOp*) override; void handle(const ViewOp*) override; void handle(const Split*) override; diff --git a/torch/csrc/jit/codegen/cuda/ir_graphviz.cpp b/torch/csrc/jit/codegen/cuda/ir_graphviz.cpp index 7511fbd4d6d595..941bf22dea7633 100644 --- a/torch/csrc/jit/codegen/cuda/ir_graphviz.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_graphviz.cpp @@ -371,6 +371,10 @@ void IrGraphGenerator::handle(const Int* i) { printValue(i, IrNodeLabel::gen(i, detail_level_)); } +void IrGraphGenerator::handle(const ComplexDouble* i) { + printValue(i, IrNodeLabel::gen(i, detail_level_)); +} + void IrGraphGenerator::handle(const NamedScalar* i) { printValue(i, IrNodeLabel::gen(i, detail_level_)); } diff --git a/torch/csrc/jit/codegen/cuda/ir_graphviz.h b/torch/csrc/jit/codegen/cuda/ir_graphviz.h index f9b3adf703d14c..e5bbcac9157dc7 100644 --- a/torch/csrc/jit/codegen/cuda/ir_graphviz.h +++ b/torch/csrc/jit/codegen/cuda/ir_graphviz.h @@ -79,6 +79,7 @@ class TORCH_CUDA_CU_API IrGraphGenerator : private OptInConstDispatch { void handle(const Bool*) override; void handle(const Double*) override; void handle(const Int*) override; + void handle(const ComplexDouble*) override; void handle(const NamedScalar*) override; void handle(const UnaryOp*) override; diff --git a/torch/csrc/jit/codegen/cuda/ir_interface_nodes.h b/torch/csrc/jit/codegen/cuda/ir_interface_nodes.h index 28478c64d91efe..bfc76acdfccd37 100644 --- a/torch/csrc/jit/codegen/cuda/ir_interface_nodes.h +++ b/torch/csrc/jit/codegen/cuda/ir_interface_nodes.h @@ -53,9 +53,9 
@@ class TORCH_CUDA_CU_API Bool : public Val { const c10::optional maybe_value_; }; -//! A Float64 value. For now we don't have any other type besides -//! Float64. This value can be a symbolic value (defined after the kernel -//! is compiled) or a constant value (inlined into the kernel definition). +//! A Float64 value. This value can be a symbolic value (defined after the +//! kernel is compiled) or a constant value (inlined into the kernel +//! definition). class TORCH_CUDA_CU_API Double : public Val { public: using ScalarType = double; @@ -114,6 +114,39 @@ class TORCH_CUDA_CU_API Int : public Val { const c10::optional maybe_value_; }; +//! An c10::complex value. This value can be a symbolic value (defined +//! after the kernel is compiled) or a constant value (inlined into the kernel +//! definition). +class TORCH_CUDA_CU_API ComplexDouble : public Val { + public: + using ScalarType = c10::complex; + + ComplexDouble(IrBuilderPasskey passkey); + + explicit ComplexDouble(IrBuilderPasskey passkey, ScalarType value); + + explicit ComplexDouble( + IrBuilderPasskey passkey, + c10::optional value); + + ComplexDouble(const ComplexDouble* src, IrCloner* ir_cloner); + + bool isSymbolic() const { + return !(maybe_value_.has_value()); + } + bool isConst() const final { + return maybe_value_.has_value(); + } + c10::optional value() const { + return maybe_value_; + } + + bool sameAs(const Statement* other) const override; + + private: + const c10::optional maybe_value_; +}; + //! Mode during propagation of computeAt, standard will throw an error if //! computeAt position provided can't be satisfied, best effort will lower the //! computeAt position as needed during traversal, most inlined will increase @@ -176,6 +209,13 @@ class TORCH_CUDA_CU_API TensorView : public Val { return domain_; } + //! This is for a TensorView with an rFactor domain that is an input to a + //! fusion segment. We convert the rfactor domain into a new root domain. + //! Any dynamic-sized rfactor iterDomains are given a new symbolic extent. + //! Concrete integer extents are kept. Output TensorViews of any subsequent + //! expressions that use this TensorView are also updated. + void convertRfactorToRootDomain(); + void setContiguity(const std::vector& contig) { domain()->setContiguity(contig); } @@ -400,6 +440,24 @@ class TORCH_CUDA_CU_API TensorView : public Val { return is_double_buffered_; } + //! Fill in mma options in scheduling time. + //! Each mma op in Fusion IR must be configured once before lowering. + //! Mma options are configuration parameters used in lowering to mma + //! instrinsics, mainly the type of mma macro to use and input data layout + //! etc. + //! + //! TODO: This step will very likely be removed in a follow up PR. All of + //! the options configured here could actually be inferred from fusion IR + //! once we are feature complete. + void configureMma(MmaOptions options); + + //! Transforms the innermost iterdomains according to the given mma swizzle, + //! this should be used on the tvs that are either inputs/outputs of an + //! MmaOp, or any tv's that are involved in prolog/epilog fusions and need to + //! have a matching thread swizzle with the mma operand/result. + //! More detail on usage see [WarpMmaSwizzler] in scheduler/mma_utils.h . 
+ void applyMmaSwizzle(MmaOptions options); + friend TORCH_CUDA_CU_API TransformPropagator; friend TORCH_CUDA_CU_API TransformReplay; friend TORCH_CUDA_CU_API OptOutMutator; diff --git a/torch/csrc/jit/codegen/cuda/ir_internal_nodes.h b/torch/csrc/jit/codegen/cuda/ir_internal_nodes.h index bb494148be2135..41f16978779768 100644 --- a/torch/csrc/jit/codegen/cuda/ir_internal_nodes.h +++ b/torch/csrc/jit/codegen/cuda/ir_internal_nodes.h @@ -5,6 +5,7 @@ #include #include #include +#include #include //! Nodes in here should generally not be used by users. They should be behind @@ -150,7 +151,8 @@ class TORCH_CUDA_CU_API ReductionOp : public Expr { BinaryOpType reduction_op_type, Val* init, Val* out, - Val* in); + Val* in, + bool is_fused = false); ReductionOp(const ReductionOp* src, IrCloner* ir_cloner); @@ -168,6 +170,10 @@ class TORCH_CUDA_CU_API ReductionOp : public Expr { return reduction_op_type_; } + bool isFused() const { + return is_fused_; + } + bool sameAs(const Statement* other) const override; private: @@ -175,6 +181,8 @@ class TORCH_CUDA_CU_API ReductionOp : public Expr { Val* const init_ = nullptr; Val* const out_ = nullptr; Val* const in_ = nullptr; + //! True if using the fused reduction kernel + bool is_fused_ = false; }; //! Welford Scan operation. @@ -190,7 +198,8 @@ class TORCH_CUDA_CU_API WelfordOp : public Expr { Val* init_N, Val* in_avg, Val* in_var, - Val* in_N); + Val* in_N, + bool is_fused = false); WelfordOp(const WelfordOp* src, IrCloner* ir_cloner); @@ -250,6 +259,10 @@ class TORCH_CUDA_CU_API WelfordOp : public Expr { return !init_N_->isZeroInt(); } + bool isFused() const { + return is_fused_; + } + private: Val* const out_avg_; Val* const out_var_; @@ -260,6 +273,63 @@ class TORCH_CUDA_CU_API WelfordOp : public Expr { Val* const in_avg_; Val* const in_var_; Val* const in_N_; + //! True if using the fused reduction kernel (not implemented yet) + bool is_fused_ = false; +}; + +//! 
Fused Matmul operation +class TORCH_CUDA_CU_API MmaOp : public Expr { + public: + MmaOp(IrBuilderPasskey, Val* out, Val* in_a, Val* in_b, Val* init); + + MmaOp( + IrBuilderPasskey, + Val* out, + Val* in_a, + Val* in_b, + Val* init, + MmaOptions options); + + MmaOp(const MmaOp* src, IrCloner* ir_cloner); + + Val* out() const { + return out_; + } + + Val* inA() const { + return in_a_; + } + + Val* inB() const { + return in_b_; + } + + Val* init() const { + return init_; + } + + const auto& options() const { + TORCH_INTERNAL_ASSERT(options_.has_value(), "MmaOp not configured:", this); + return options_.value(); + } + + bool sameAs(const Statement* const other) const override; + + auto accStride() const { + TORCH_INTERNAL_ASSERT(options_.has_value(), "MmaOp not configured:", this); + return options_->accumulator_stride; + } + + void configureOptions(MmaOptions options) { + options_ = options; + } + + private: + Val* const out_ = nullptr; + Val* const in_a_ = nullptr; + Val* const in_b_ = nullptr; + Val* const init_ = nullptr; + c10::optional options_ = c10::nullopt; }; class TORCH_CUDA_CU_API TransposeOp : public Expr { @@ -429,6 +499,34 @@ class TORCH_CUDA_CU_API GatherOp : public Expr { std::vector> pad_width_; }; +class TORCH_CUDA_CU_API ViewDtypeOp : public Expr { + public: + ViewDtypeOp( + IrBuilderPasskey, + TensorView* out, + TensorView* in, + DataType dtype); + + ViewDtypeOp(const ViewDtypeOp* src, IrCloner* ir_cloner); + + TensorView* out() const { + return out_; + } + + TensorView* in() const { + return in_; + } + + DataType dtype() const { + return dtype_; + } + + private: + TensorView* const out_ = nullptr; + TensorView* const in_ = nullptr; + DataType dtype_; +}; + class TORCH_CUDA_CU_API ViewOp : public Expr { public: ViewOp(IrBuilderPasskey, TensorView* out, TensorView* in); @@ -662,6 +760,50 @@ class TORCH_CUDA_CU_API IterDomain : public Val { return definition() == nullptr; } + //! Marks that this id represents a + //! instruction loop, mma use only. + //! + //! An instruction loop can be considered a generalization of + //! vectorization. It also represents a loop that's implemented + //! by an instruction and should not be realized by codegen and + //! cannot be inlined with. + //! As an example, if a mma macro, call it mma_eg implements: + //! for m in M + //! for n in N + //! for k in K + //! C[m,n] += A[m,k]*B[k,n], + //! But the generated code should simply be: + //! mma_eg(C,A,B) + //! without the 3 level loopnest, i.e. they're instruction loops. + //! + //! In the actual mma macros, the loopnests it implements is a + //! transformed version of above to match the mma swizzle. + //! So it's different implicit loopnest for different macros. + //! WarpMmaSwizzler will label the instruction loops case-by-case. + bool isMma() const { + return parallel_type_ == ParallelType::Mma; + } + + bool isMmaSwizzled() const { + return is_mma_swizzled_; + } + + //! Used by WarpMmaSwizzler, this is an utility for WarpMmaSwizzler + //! to lock the thread swizzled iterdomains. + //! Only true for the iterdomains produced by WarpMmaSwizzler. + //! Mma ops require specific swizzle patterns + //! and this label utility is to prevent any further transform on the + //! iterdomains involved in the swizzle so that the pattern remain correct in + //! generated code. + //! + //! Note: + //! Used only through WarpMmaSwizzler only and mma validation relies on + //! this + //! flag being set on the correct iterdomains. 
+ void toMmaSwizzled() { + is_mma_swizzled_ = true; + } + protected: friend TensorDomain; friend ReplayTransformations; @@ -682,6 +824,11 @@ class TORCH_CUDA_CU_API IterDomain : public Val { // TODO: Remove only used in kernel IR because IterDomains don't maintain // definitions of split/merge. bool is_simple_ = true; + + //! Tracks if this id represents a thread swizzled loop or + //! models an implicit loop within instructions. Should not make + //! any changes once an id is warp mapped. + bool is_mma_swizzled_ = false; }; //! TensorDomain holds a vector of IterDomains. It holds an IterDomain for every diff --git a/torch/csrc/jit/codegen/cuda/ir_iostream.cpp b/torch/csrc/jit/codegen/cuda/ir_iostream.cpp index 8c0e1022308320..0ca27be650ca73 100644 --- a/torch/csrc/jit/codegen/cuda/ir_iostream.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_iostream.cpp @@ -146,33 +146,29 @@ void IrPrinter::handle(const TensorDomain* td) { } void IrPrinter::handle(const TensorView* tv) { - if (tv->nDims() == 0) { - os_ << typePrefix(tv->getDataType().value()) << varName(tv); - } else { - os_ << "T" << varName(tv); - switch (tv->getMemoryType()) { - case MemoryType::Global: - os_ << "_g"; - break; - case MemoryType::Shared: - os_ << "_s"; - break; - case MemoryType::Local: - os_ << "_l"; - break; - } - handle(tv->domain()); + os_ << "T" << varName(tv); + switch (tv->getMemoryType()) { + case MemoryType::Global: + os_ << "_g"; + break; + case MemoryType::Shared: + os_ << "_s"; + break; + case MemoryType::Local: + os_ << "_l"; + break; + } + handle(tv->domain()); - if (tv->getComputeAtPosition() > 0) { - os_ << " ca_pos( "; - os_ << tv->getComputeAtPosition(); - os_ << " )"; - } - if (tv->getMaxProducerPosition() > 0) { - os_ << " produce_pos( "; - os_ << tv->getMaxProducerPosition(); - os_ << ")"; - } + if (tv->getComputeAtPosition() > 0) { + os_ << " ca_pos( "; + os_ << tv->getComputeAtPosition(); + os_ << " )"; + } + if (tv->getMaxProducerPosition() > 0) { + os_ << " produce_pos( "; + os_ << tv->getMaxProducerPosition(); + os_ << ")"; } } @@ -225,6 +221,25 @@ void IrPrinter::handle(const Int* i) { } } +void IrPrinter::handle(const ComplexDouble* c) { + if (print_inline_) { + if (auto def = c->definition()) { + os_ << "( "; + handle(def); + os_ << " )"; + return; + } + } + + if (c->isSymbolic()) { + os_ << "c" << varName(c); + } else { + os_ << "std::complex" + << std::setprecision(std::numeric_limits::max_digits10) + << *(c->value()); + } +} + void IrPrinter::handle(const NamedScalar* ns) { os_ << ns->name(); } @@ -377,7 +392,8 @@ void IrPrinter::handle(const TernaryOp* top) { void IrPrinter::handle(const ReductionOp* rop) { indent() << rop->out() << " = reduction( " << rop->in() << ", op = " << rop->getReductionOpType() - << ", initial value = " << rop->init() << " )\n"; + << ", initial value = " << rop->init() + << ", fused = " << rop->isFused() << " )\n"; } void IrPrinter::handle(const WelfordOp* wop) { @@ -395,6 +411,7 @@ void IrPrinter::handle(const WelfordOp* wop) { os_ << "\n initial value = " << wop->initAvg() << "(Avg)\n " << wop->initVar() << "(Var)\n " << wop->initN() << "(N)"; } + os_ << "\n fused = " << wop->isFused(); os_ << " )\n"; } @@ -439,6 +456,11 @@ void IrPrinter::handle(const ShiftOp* sop) { << "}, {" << sop->padWidth() << "} )\n"; } +void IrPrinter::handle(const MmaOp* mma) { + indent() << mma->out() << " = mma(" << mma->inA() << "," << mma->inB(); + os_ << ")\n"; +} + void IrPrinter::handle(const GatherOp* op) { indent() << op->out() << " = gather( " << op->in() << ", {"; bool no_comma = 
true; @@ -461,6 +483,11 @@ void IrPrinter::handle(const GatherOp* op) { os_ << "} )\n"; } +void IrPrinter::handle(const ViewDtypeOp* top) { + indent() << top->out() << " = view.dtype( " << top->in() << ", " + << top->dtype() << " )\n"; +} + void IrPrinter::handle(const ViewOp* top) { indent() << top->out() << " = view( " << top->in() << " )\n"; } @@ -540,11 +567,17 @@ void IrPrinter::handle(const kir::Allocate* node) { } } -void IrPrinter::handle(const kir::Sync* node) { - indent() << "SYNC(war_hazard=" << boolLiteral(node->isWarHazardSync()) +void IrPrinter::handle(const kir::BlockSync* node) { + indent() << "BLOCKSYNC(war_hazard=" << boolLiteral(node->isWarHazardSync()) << ")\n"; } +void IrPrinter::handle(const kir::GridSync* node) { + indent() << "GRIDSYNC(" << node->syncDims().toString() << ", "; + handle(node->syncBuffer()); + os_ << ")\n"; +} + void IrPrinter::handle(const kir::ForLoop* node) { indent() << "FOR "; handle(node->index()); @@ -566,7 +599,19 @@ void IrPrinter::handle(const kir::IfThenElse* node) { } void IrPrinter::handle(const kir::GridBroadcast* node) { - TORCH_INTERNAL_ASSERT(false, "Not implemented yet."); + const auto* broadcast_op = node->broadcast_op(); + indent(); + handle(broadcast_op->out()); + os_ << " = " + << "GRID_BROADCAST(in="; + handle(broadcast_op->in()); + os_ << ")\n"; + indent() << kTab << ".broadcast_buffer="; + handle(node->broadcast_buffer()->buffer()); + os_ << "\n"; + indent() << kTab << ".sync_buffer="; + handle(node->sync_buffer()->buffer()); + os_ << "\n"; } void IrPrinter::handle(const kir::GridReduction* node) { @@ -579,8 +624,19 @@ void IrPrinter::handle(const kir::GridReduction* node) { handle(reduction_op->in()); os_ << ", init="; handle(reduction_op->init()); - os_ << ", pred="; - handle(reduction_op->predicate()); + os_ << ", read_pred="; + if (reduction_op->predicate() != nullptr) { + handle(reduction_op->predicate()); + } else { + os_ << "nullptr"; + } + os_ << ")\n"; + os_ << ", write_pred="; + if (reduction_op->writePredicate() != nullptr) { + handle(reduction_op->writePredicate()); + } else { + os_ << "nullptr"; + } os_ << ")\n"; indent() << kTab << ".reduction_buffer="; handle(node->reduction_buffer()->buffer()); @@ -588,8 +644,19 @@ void IrPrinter::handle(const kir::GridReduction* node) { indent() << kTab << ".sync_buffer="; handle(node->sync_buffer()->buffer()); os_ << "\n"; - indent() << kTab << ".grid_pred="; - handle(node->predicate()); + indent() << kTab << ".grid_read_pred="; + if (node->predicate() != nullptr) { + handle(node->predicate()); + } else { + os_ << "nullptr"; + } + os_ << "\n"; + indent() << kTab << ".grid_write_pred="; + if (node->writePredicate() != nullptr) { + handle(node->writePredicate()); + } else { + os_ << "nullptr"; + } os_ << "\n"; } @@ -619,8 +686,19 @@ void IrPrinter::handle(const kir::GridWelford* node) { os_ << " initN="; handle(welford_op->initN()); } - indent() << ", pred="; - handle(welford_op->predicate()); + indent() << ", read_pred="; + if (welford_op->predicate() != nullptr) { + handle(welford_op->predicate()); + } else { + os_ << "nullptr"; + } + os_ << ")\n"; + indent() << ", write_pred="; + if (welford_op->writePredicate() != nullptr) { + handle(welford_op->writePredicate()); + } else { + os_ << "nullptr"; + } os_ << ")\n"; indent() << kTab << ".var_buffer="; handle(node->var_buffer()->buffer()); @@ -632,8 +710,19 @@ void IrPrinter::handle(const kir::GridWelford* node) { indent() << kTab << ".sync_buffer="; handle(node->sync_buffer()->buffer()); os_ << "\n"; - indent() << kTab << 
".grid_pred="; - handle(node->predicate()); + indent() << kTab << ".grid_read_pred="; + if (node->predicate() != nullptr) { + handle(node->predicate()); + } else { + os_ << "nullptr"; + } + os_ << "\n"; + indent() << kTab << ".grid_write_pred="; + if (node->writePredicate() != nullptr) { + handle(node->writePredicate()); + } else { + os_ << "nullptr"; + } os_ << "\n"; } @@ -645,6 +734,12 @@ void IrPrinter::handle(const kir::UpdateMagicZero* node) { indent() << "NVFUSER_UPDATE_MAGIC_ZERO\n"; } +void IrPrinter::handle(const kir::AllocateFusedReduction* node) { + indent() << "AllocateFusedReduction(reduction buffer="; + handle(node->out()); + os_ << ")\n"; +} + void IrTransformPrinter::handle(Fusion* f) { auto all_vals = f->usedMathVals(); diff --git a/torch/csrc/jit/codegen/cuda/ir_iostream.h b/torch/csrc/jit/codegen/cuda/ir_iostream.h index f8c07886114f16..e25e6ef0f865d3 100644 --- a/torch/csrc/jit/codegen/cuda/ir_iostream.h +++ b/torch/csrc/jit/codegen/cuda/ir_iostream.h @@ -79,6 +79,7 @@ class TORCH_CUDA_CU_API IrPrinter : public OptInConstDispatch { void handle(const Bool*) final; void handle(const Double*) final; void handle(const Int*) final; + void handle(const ComplexDouble*) final; void handle(const NamedScalar*) final; void handle(const UnaryOp*) final; @@ -86,10 +87,12 @@ class TORCH_CUDA_CU_API IrPrinter : public OptInConstDispatch { void handle(const TernaryOp*) final; void handle(const ReductionOp*) final; void handle(const WelfordOp*) final; + void handle(const MmaOp*) final; void handle(const BroadcastOp*) final; void handle(const TransposeOp*) final; void handle(const ShiftOp*) final; void handle(const GatherOp*) final; + void handle(const ViewDtypeOp*) final; void handle(const ViewOp*) final; void handle(const kir::Predicate*) final; @@ -101,9 +104,11 @@ class TORCH_CUDA_CU_API IrPrinter : public OptInConstDispatch { void handle(const kir::ForLoop*) final; void handle(const kir::IfThenElse*) final; void handle(const kir::Allocate*) final; - void handle(const kir::Sync*) final; + void handle(const kir::BlockSync*) final; + void handle(const kir::GridSync*) final; void handle(const kir::InitMagicZero*) final; void handle(const kir::UpdateMagicZero*) final; + void handle(const kir::AllocateFusedReduction*) final; // IR math printer overrides these to prevent them from printing, keep // override diff --git a/torch/csrc/jit/codegen/cuda/ir_nodes.cpp b/torch/csrc/jit/codegen/cuda/ir_nodes.cpp index 884b6a6e0eca79..44f2e29df5e9a4 100644 --- a/torch/csrc/jit/codegen/cuda/ir_nodes.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_nodes.cpp @@ -152,6 +152,36 @@ bool Int::sameAs(const Statement* other) const { return false; } +ComplexDouble::ComplexDouble(IrBuilderPasskey passkey) + : Val(passkey, ValType::Scalar, DataType::ComplexDouble), + maybe_value_{c10::nullopt} {} + +ComplexDouble::ComplexDouble(IrBuilderPasskey passkey, ScalarType value) + : Val(passkey, ValType::Scalar, DataType::ComplexDouble), + maybe_value_{value} {} + +ComplexDouble::ComplexDouble( + IrBuilderPasskey passkey, + c10::optional value) + : Val(passkey, ValType::Scalar, DataType::ComplexDouble), + maybe_value_{value} {} + +ComplexDouble::ComplexDouble(const ComplexDouble* src, IrCloner* ir_cloner) + : Val(src, ir_cloner), maybe_value_(src->maybe_value_) {} + +bool ComplexDouble::sameAs(const Statement* other) const { + if (this == other) { + return true; + } + if (!other->isA()) { + return false; + } + const auto other_complex = other->as(); + if (isConst() && other_complex->isConst()) + return *value() == 
*(other_complex->value()); + return false; +} + UnaryOp::UnaryOp(IrBuilderPasskey passkey, UnaryOpType type, Val* out, Val* in) : Expr(passkey, ExprType::UnaryOp), unary_op_type_{type}, @@ -351,12 +381,14 @@ ReductionOp::ReductionOp( BinaryOpType reduction_op_type, Val* init, Val* out, - Val* in) + Val* in, + bool is_fused) : Expr(passkey, ExprType::ReductionOp), reduction_op_type_(reduction_op_type), init_(init), out_(out), - in_(in) { + in_(in), + is_fused_(is_fused) { TORCH_CHECK( out->getValType().value() == ValType::TensorView || out->getValType().value() == ValType::TensorIndex); @@ -393,7 +425,8 @@ WelfordOp::WelfordOp( Val* init_N, Val* in_avg, Val* in_var, - Val* in_N) + Val* in_N, + bool is_fused) : Expr(passkey, ExprType::WelfordOp), out_avg_(out_avg), out_var_(out_var), @@ -403,7 +436,8 @@ WelfordOp::WelfordOp( init_N_(init_N), in_avg_(in_avg), in_var_(in_var), - in_N_(in_N) { + in_N_(in_N), + is_fused_(is_fused) { // Check output type TORCH_INTERNAL_ASSERT( out_avg->getValType().value() == ValType::TensorView || @@ -472,7 +506,8 @@ WelfordOp::WelfordOp(const WelfordOp* src, IrCloner* ir_cloner) init_N_(ir_cloner->clone(src->init_N_)), in_avg_(ir_cloner->clone(src->in_avg_)), in_var_(src->in_var_ ? ir_cloner->clone(src->in_var_) : nullptr), - in_N_(ir_cloner->clone(src->in_N_)) {} + in_N_(ir_cloner->clone(src->in_N_)), + is_fused_(src->is_fused_) {} namespace { inline bool sameOptionalVal(Val* a, Val* b) { @@ -495,12 +530,75 @@ bool WelfordOp::sameAs(const Statement* other) const { return false; } +MmaOp::MmaOp( + IrBuilderPasskey passkey, + Val* out, + Val* in_a, + Val* in_b, + Val* init) + : Expr(passkey, ExprType::MmaOp), + out_(out), + in_a_(in_a), + in_b_(in_b), + init_(init) { + // Check output type + TORCH_INTERNAL_ASSERT( + out->getValType().value() == ValType::TensorView || + out->getValType().value() == ValType::TensorIndex); + + TORCH_INTERNAL_ASSERT( + in_a->getValType().value() == ValType::TensorView || + in_a->getValType().value() == ValType::TensorIndex, + in_a->getValType().value()); + + TORCH_INTERNAL_ASSERT( + in_b->getValType().value() == ValType::TensorView || + in_b->getValType().value() == ValType::TensorIndex, + in_b->getValType().value()); + + addOutput(out); + addInput(in_a); + addInput(in_b); +} + +MmaOp::MmaOp( + IrBuilderPasskey passkey, + Val* out, + Val* in_a, + Val* in_b, + Val* init, + MmaOptions options) + : MmaOp(passkey, out, in_a, in_b, init) { + options_ = options; +} + +MmaOp::MmaOp(const MmaOp* src, IrCloner* ir_cloner) + : Expr(src, ir_cloner), + out_(ir_cloner->clone(src->out_)), + in_a_(ir_cloner->clone(src->in_a_)), + in_b_(ir_cloner->clone(src->in_b_)), + init_(ir_cloner->clone(src->init_)), + options_(src->options_) {} + +bool MmaOp::sameAs(const Statement* other) const { + if (this == other) { + return true; + } + if (auto other_mma = dynamic_cast(other)) { + return out_->sameAs(other_mma->out_) && in_a_->sameAs(other_mma->in_a_) && + in_b_->sameAs(other_mma->in_b_) && init_->sameAs(other_mma->init_) && + options_ == other_mma->options_; + } + return false; +} + ReductionOp::ReductionOp(const ReductionOp* src, IrCloner* ir_cloner) : Expr(src, ir_cloner), reduction_op_type_(src->reduction_op_type_), init_(ir_cloner->clone(src->init_)), out_(ir_cloner->clone(src->out_)), - in_(ir_cloner->clone(src->in_)) {} + in_(ir_cloner->clone(src->in_)), + is_fused_(src->is_fused_) {} bool ReductionOp::sameAs(const Statement* other) const { if (this == other) { @@ -697,6 +795,22 @@ int GatherOp::gatherAxis(int axis) const { return 
int(windowShape().size()) + axis; } +ViewDtypeOp::ViewDtypeOp( + IrBuilderPasskey passkey, + TensorView* out, + TensorView* in, + DataType dtype) + : Expr(passkey, ExprType::ViewDtypeOp), out_(out), in_(in), dtype_(dtype) { + addOutput(out); + addInput(in); +} + +ViewDtypeOp::ViewDtypeOp(const ViewDtypeOp* src, IrCloner* ir_cloner) + : Expr(src, ir_cloner), + out_(ir_cloner->clone(src->out_)), + in_(ir_cloner->clone(src->in_)), + dtype_(src->dtype()) {} + ViewOp::ViewOp(IrBuilderPasskey passkey, TensorView* out, TensorView* in) : Expr(passkey, ExprType::ViewOp), out_(out), in_(in) { addOutput(out); @@ -767,7 +881,8 @@ IterDomain::IterDomain(const IterDomain* src, IrCloner* ir_cloner) iter_type_(src->iter_type_), is_rfactor_domain_(src->is_rfactor_domain_), is_padded_dimension_(src->is_padded_dimension_), - padded_to_size_(src->padded_to_size_) {} + padded_to_size_(src->padded_to_size_), + is_mma_swizzled_(src->is_mma_swizzled_) {} bool IterDomain::sameAs(const Statement* other) const { if (other == this) { @@ -978,6 +1093,12 @@ void IterDomain::parallelize(ParallelType t) { extent(), " ."); } + + if (isMmaSwizzled()) { + TORCH_CHECK( + t == ParallelType::Vectorize, + "Parallel type other than vectorize not allowed for warp mapped ids"); + } } bool IterDomain::maybePartial() const { @@ -1314,6 +1435,10 @@ void TensorDomain::split( "Partial split is only allowed with root domains"); } + TORCH_INTERNAL_ASSERT( + !id->isMmaSwizzled(), + "Further transformation on warp mapped id's not allowed."); + auto split_ids = IterDomain::split(id, factor, inner_split, trim_out_of_bounds); domain_.erase(domain_.begin() + axis_); @@ -1349,6 +1474,10 @@ void TensorDomain::merge(int axis_o, int axis_i) { IterDomain* first = axis(axis_o); IterDomain* second = axis(axis_i); + TORCH_INTERNAL_ASSERT( + !first->isMmaSwizzled() && !second->isMmaSwizzled(), + "Further transformation on warp mapped id's not allowed."); + IterDomain* merged_id = IterDomain::merge(first, second); domain_.erase(domain_.begin() + axis_i); diff --git a/torch/csrc/jit/codegen/cuda/ir_utils.cpp b/torch/csrc/jit/codegen/cuda/ir_utils.cpp index 004cfa23dff43c..6abc8cce0723b7 100644 --- a/torch/csrc/jit/codegen/cuda/ir_utils.cpp +++ b/torch/csrc/jit/codegen/cuda/ir_utils.cpp @@ -254,6 +254,21 @@ struct SubstituteInExpr : public OptInDispatch { gather_expr->padWidth()); } + void handle(ViewDtypeOp* view_expr) final { + TORCH_INTERNAL_ASSERT( + substitute_->isA(), + "All args to view must be TensorView, but received a non-TensorView for replacement: ", + substitute_); + auto in = reference_->sameAs(view_expr->in()) + ? substitute_->as() + : view_expr->in(); + auto out = reference_->sameAs(view_expr->out()) + ? substitute_->as() + : view_expr->out(); + expr_ = IrBuilder::create( + view_expr->container(), out, in, view_expr->dtype()); + } + void handle(ViewOp* view_expr) final { TORCH_INTERNAL_ASSERT( substitute_->isA(), @@ -309,7 +324,29 @@ struct SubstituteInExpr : public OptInDispatch { init_N, in_avg, in_var, - in_N); + in_N, + welford_expr->isFused()); + } + + void handle(MmaOp* mma_expr) final { + TORCH_INTERNAL_ASSERT( + substitute_->isA(), + "All args to MmaOp must be TensorView, but received a non-TensorView for replacement: ", + substitute_); + auto in_a = reference_->sameAs(mma_expr->inA()) + ? substitute_->as() + : mma_expr->inA(); + auto in_b = reference_->sameAs(mma_expr->inB()) + ? substitute_->as() + : mma_expr->inB(); + auto out = reference_->sameAs(mma_expr->out()) + ? 
substitute_->as() + : mma_expr->out(); + auto init = reference_->sameAs(mma_expr->init()) + ? substitute_->as() + : mma_expr->init(); + expr_ = IrBuilder::create( + mma_expr->container(), out, in_a, in_b, init, mma_expr->options()); } private: @@ -434,7 +471,7 @@ std::vector allTvs(Fusion* fusion) { return uniqueEntries({used_tvs.begin(), used_tvs.end()}); } -std::vector getReductionOps(Fusion* fusion) { +std::vector getReductionOps(Fusion* fusion, bool ignore_trivial) { std::vector red_ops; for (auto expr : fusion->exprs()) { const Val* out_val = nullptr; @@ -452,8 +489,9 @@ std::vector getReductionOps(Fusion* fusion) { if (std::any_of( out_tv->getRootDomain().begin(), out_tv->getRootDomain().end(), - [](IterDomain* id) { - return id->isReduction() && !id->isTrivialReduction(); + [&ignore_trivial](IterDomain* id) { + return id->isReduction() && + !(ignore_trivial && id->isTrivialReduction()); })) { red_ops.push_back(expr); } @@ -461,6 +499,73 @@ std::vector getReductionOps(Fusion* fusion) { return red_ops; } +namespace { + +class ValReplacementMutator : private OptOutMutator { + public: + ValReplacementMutator( + Fusion* fusion, + const std::unordered_map& replacement_map) + : replacement_map_(replacement_map) { + FusionGuard fg(fusion); + + // Welford makes this a little annoying since it holds a count which is + // typically not used by anything else. If we don't grab that count, then it + // would be a tensorview that doesn't get updated extents. Therefore, first + // grab all leaves towards outputs and grab stmts from there. + auto stmts = StmtSort::getStmts(fusion, allLeafOuts(fusion), true); + for (auto stmt : stmts) { + mutate(stmt); + } + } + + private: + using OptOutMutator::mutate; + void mutate(Val* val) final { + if (replacement_map_.find(val) == replacement_map_.end()) { + return OptOutMutator::mutate(val); + } + auto replaced_val = replacement_map_.at(val); + registerMutation(val, replaced_val); + } + + std::vector allLeafOuts(Fusion* fusion) { + auto exprs = StmtSort::getExprs(fusion, true); + std::unordered_set inputs; + std::unordered_set outputs; + std::vector ordered_outputs; + for (auto expr : exprs) { + inputs.insert(expr->inputs().begin(), expr->inputs().end()); + outputs.insert(expr->outputs().begin(), expr->outputs().end()); + ordered_outputs.insert( + ordered_outputs.end(), + expr->outputs().begin(), + expr->outputs().end()); + } + for (auto input : inputs) { + outputs.erase(input); + } + + std::vector ordered_leaf_outs; + for (auto out : ordered_outputs) { + if (outputs.find(out) != outputs.end()) { + ordered_leaf_outs.push_back(out); + } + } + return ordered_leaf_outs; + } + + const std::unordered_map& replacement_map_; +}; + +} // namespace + +void replaceValue( + Fusion* fusion, + const std::unordered_map& replacement_map) { + ValReplacementMutator(fusion, replacement_map); +} + } // namespace ir_utils } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/ir_utils.h b/torch/csrc/jit/codegen/cuda/ir_utils.h index 1bf3f27ec0b9bc..dd5c9dd13e83ae 100644 --- a/torch/csrc/jit/codegen/cuda/ir_utils.h +++ b/torch/csrc/jit/codegen/cuda/ir_utils.h @@ -12,6 +12,11 @@ namespace fuser { namespace cuda { namespace ir_utils { +// Replace values in fusion using ValReplacementMutator +void replaceValue( + Fusion*, + const std::unordered_map& replacement_map); + template class FilterIterator { public: @@ -178,7 +183,9 @@ TORCH_CUDA_CU_API std::vector outputTvsOf( // returns all tensor views in fusion that are used between outputs and inputs. 
TORCH_CUDA_CU_API std::vector allTvs(Fusion* fusion); -TORCH_CUDA_CU_API std::vector getReductionOps(Fusion* fusion); +TORCH_CUDA_CU_API std::vector getReductionOps( + Fusion* fusion, + bool ignore_trivial = true); } // namespace ir_utils } // namespace cuda diff --git a/torch/csrc/jit/codegen/cuda/iter_visitor.cpp b/torch/csrc/jit/codegen/cuda/iter_visitor.cpp index 894b40f79e3fa1..34345600b465d4 100644 --- a/torch/csrc/jit/codegen/cuda/iter_visitor.cpp +++ b/torch/csrc/jit/codegen/cuda/iter_visitor.cpp @@ -83,6 +83,10 @@ class RecursiveDependencies : public OptInDispatch { simpleVal(stmt); } + void handle(ComplexDouble* stmt) final { + simpleVal(stmt); + } + void handle(NamedScalar* stmt) final { simpleVal(stmt); } @@ -593,6 +597,9 @@ class DependentVals : public IterVisitor { std::unordered_set outs_; // Boundary where we want to stop searching beyond + // TODO: Based on the todo below, shouldn't we stop just at the definition of? + // If we really wanted to make this traverse left, wouldn't we first check + // which outputs are outputs dependent on of? std::unordered_set boundary_; std::vector next(Val* v) override { @@ -616,6 +623,11 @@ class DependentVals : public IterVisitor { } // optimization to limit search path + // TODO: Is this valid? Couldn't something like: + // out0 = of + val0 + // out1 = out0 + val1 + // out2 = TernaryOp(out1, val0, of) + // Hide the dep of out1 on of? void createBoundary() { for (auto v_of : of_) { for (auto v_expr : v_of->uses()) { diff --git a/torch/csrc/jit/codegen/cuda/kernel.cpp b/torch/csrc/jit/codegen/cuda/kernel.cpp index b9062f5bc458fb..54963709bd1cb5 100644 --- a/torch/csrc/jit/codegen/cuda/kernel.cpp +++ b/torch/csrc/jit/codegen/cuda/kernel.cpp @@ -49,7 +49,7 @@ class KernelIrScanner : private IrVisitor { handle(out); } } - void handle(Sync* sync) final { + void handle(BlockSync* sync) final { // TODO: Move to a dedicated validation pass // which is not on the common execution/compilation path if (sync->isWarHazardSync()) { @@ -57,6 +57,10 @@ class KernelIrScanner : private IrVisitor { } } + void handle(GridSync* sync) final { + summary_.has_cooperative_grid_reduction = true; + } + void handle(Allocate* allocate) final { switch (allocate->memoryType()) { case MemoryType::Global: @@ -276,6 +280,12 @@ void Kernel::finalize(std::vector top_level_exprs) { warp_padded_parallel_info_ = GpuLower::current()->getWarpPaddedParallelInfo(); ValidateAllocation::validate(this); analyze(); + // Make sure this is after analyze as it sets summary_ + summary_.vectorized_accesses = GpuLower::current()->vectorizedAccesses(); + summary_.vectorized_set_info = GpuLower::current()->vectorizedSetInfo(); + summary_.sync_map = GpuLower::current()->syncMap(); + summary_.parallel_dimension_map_ = + GpuLower::current()->parallelDimensionMap(); } void Kernel::analyze() { @@ -345,6 +355,10 @@ void Kernel::registerExpr(Expr* expr) { Fusion::registerExpr(expr); } +std::vector& KernelInternalProxy::topLevelExprs() { + return kernel_->top_level_exprs_; +} + } // namespace kir } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/kernel.h b/torch/csrc/jit/codegen/cuda/kernel.h index 0c8bbdef9dfdfd..4930da1a287212 100644 --- a/torch/csrc/jit/codegen/cuda/kernel.h +++ b/torch/csrc/jit/codegen/cuda/kernel.h @@ -5,8 +5,11 @@ #include #include #include +#include #include +#include #include +#include #include #include @@ -78,12 +81,31 @@ struct KernelSummary { //! Effective ParallelTypes of broadcast ops std::unordered_map broadcast_parallel_types; + + //! 
Track which tensor views are inputs or outputs of a vectorized operation + //! and their maximum vectorized access size + std::unordered_map vectorized_accesses; + + // Sync map is needed to figure out if global memory buffers need to be marked + // as volatile because they're used for communication. + SyncMap sync_map; + + // Parallel dimension map needed to set the correct properties of grid buffers + // (is a dim inactive) + ParallelDimensionMap parallel_dimension_map_; + + //! Track information on vectorized set operations for runtime validation + std::vector vectorized_set_info; }; +class KernelInternalProxy; + //! Container for a lowered Kernel IR //! // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) class TORCH_CUDA_CU_API Kernel final : public Fusion { + friend KernelInternalProxy; + public: // Kernel starts by grabbing all the nodes from the provided fusion. // Kernel is not SSA, if a definition is not set, we should update it, but @@ -91,7 +113,9 @@ class TORCH_CUDA_CU_API Kernel final : public Fusion { // we do something like generate an initialization statement for a reduction // TV, we may want to continue to do fusion like analysis on the original // expression. - Kernel(Fusion* fusion) : Fusion(*fusion) {} + // TODO: Assert index type is int or int32 + Kernel(Fusion* fusion, DataType index_type = DataType::Int) + : Fusion(*fusion), index_type_(index_type) {} Kernel() = delete; @@ -102,8 +126,7 @@ class TORCH_CUDA_CU_API Kernel final : public Fusion { //! Finalize a kernel definition //! //! At this point we have a complete kernel definition and we can - //! run analysis passes to build a KernelSummary - //! + //! run analysis passes to build a KernelSummary. void finalize(std::vector top_level_exprs); const std::vector& topLevelExprs() const { @@ -114,6 +137,10 @@ class TORCH_CUDA_CU_API Kernel final : public Fusion { return summary_; } + DataType indexType() const { + return index_type_; + } + //! Checks if parallel type is padded bool isParallelTypePadded(ParallelType ptype) const { return ptype == ParallelType::TIDx && @@ -140,16 +167,32 @@ class TORCH_CUDA_CU_API Kernel final : public Fusion { // Analyze the kernel IR and caches the summary of interesting data void analyze(); - private: // Top level statements std::vector top_level_exprs_; // Summary of interesting kernel data KernelSummary summary_; + // Is this kernel being compiled with int32 or int64 indexing. This + // information is required to resolve DataType::Index + DataType index_type_ = DataType::Int; + WarpPaddedParallelInfo warp_padded_parallel_info_; }; +//! A special debugging proxy for Kernel. +//! +//! Should not be used for other than testing and debugging. 
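+//!
+//! Minimal usage sketch (illustrative only; `kernel` is assumed to be a
+//! valid kir::Kernel* owned by the test):
+//!   KernelInternalProxy proxy(kernel);
+//!   auto& exprs = proxy.topLevelExprs();  // mutable view of the kernel's
+//!                                         // top-level expressions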
+class TORCH_CUDA_CU_API KernelInternalProxy { + public: + KernelInternalProxy(Kernel* kernel) : kernel_(kernel) {} + + std::vector& topLevelExprs(); + + private: + Kernel* kernel_ = nullptr; +}; + } // namespace kir } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/kernel_cache.cpp b/torch/csrc/jit/codegen/cuda/kernel_cache.cpp index c1c113dbbc4ac3..6a1d50462f957a 100644 --- a/torch/csrc/jit/codegen/cuda/kernel_cache.cpp +++ b/torch/csrc/jit/codegen/cuda/kernel_cache.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -118,7 +119,11 @@ InputsIdLookup::IdLookupReturn InputsIdLookup::lookupId( } FusionExecutorCache::FusionExecutorCache(std::unique_ptr fusion) - : fusion_(std::move(fusion)) {} + : fusion_(std::move(fusion)) { + for (const auto& indices : fusion_->getOutputAliasIndices()) { + aliased_output_indices_.insert(indices); + } +} // Note [ Permutation support in nvfuser ] // @@ -187,6 +192,12 @@ std::vector FusionExecutorCache::runFusionWithInputs( outputs[pair.first] = outputs[pair.first].permute(pair.second); } + int offset = 0; + for (const auto& v : aliased_output_indices_) { + outputs.erase(outputs.begin() + v - offset); + offset++; + } + return outputs; } @@ -634,6 +645,8 @@ void GraphCache::createFusion(const std::shared_ptr& graph) { fusion_executor_cache_ = std::make_unique(parseJitIR(graph)); + + num_of_outputs_ = graph->outputs().size(); } // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) @@ -642,6 +655,8 @@ GraphCache::GraphCache(const std::shared_ptr& graph) { TORCH_INTERNAL_ASSERT( IsNewExecutorEnabled(), "legacy executor is not supported by nvfuser"); + GRAPH_DEBUG("GraphCache constructor: ", this); + GRAPH_DUMP("GraphCache created for graph", graph); createFusion(graph); } @@ -649,7 +664,16 @@ std::vector GraphCache::runGraphWithInputs( const at::ArrayRef& inputs) { FUSER_PERF_SCOPE("GraphCache::runGraphWithInputs"); - return fusion_executor_cache_->runFusionWithInputs(inputs); + GRAPH_DEBUG("running GraphCache: ", this); + auto outputs = fusion_executor_cache_->runFusionWithInputs(inputs); + TORCH_INTERNAL_ASSERT( + outputs.size() == num_of_outputs_, + "FusionExecutorCache returned ", + outputs.size(), + " outputs, doesn't match computational graph, which requires ", + num_of_outputs_); + + return outputs; } } // namespace cuda diff --git a/torch/csrc/jit/codegen/cuda/kernel_cache.h b/torch/csrc/jit/codegen/cuda/kernel_cache.h index cba42f99dc4c36..71dd6c3592d00a 100644 --- a/torch/csrc/jit/codegen/cuda/kernel_cache.h +++ b/torch/csrc/jit/codegen/cuda/kernel_cache.h @@ -410,6 +410,11 @@ class TORCH_CUDA_CU_API FusionExecutorCache { //! TODO: this can be largely expanded to look at complete //! caching profiles. Currently it just makes it easier to test FusionKernelRuntime* most_recent_runtime_ = nullptr; + + //! indices of fusion outputs that are aliased to inputs. These are used only + //! to support in-place update and should have been dropped before pushing + //! outputs to stack. + std::set aliased_output_indices_; }; class GraphCache { @@ -426,15 +431,15 @@ class GraphCache { const at::ArrayRef& inputs); private: - //! Computation graph; - std::shared_ptr graph_; - //! construct FusionExecutorCache void createFusion(const std::shared_ptr& graph); private: //! FusionExecutorCache that performs schedule and kernel execution; std::unique_ptr fusion_executor_cache_; + + //! 
num of outputs + size_t num_of_outputs_ = 0; }; } // namespace cuda diff --git a/torch/csrc/jit/codegen/cuda/kernel_ir.cpp b/torch/csrc/jit/codegen/cuda/kernel_ir.cpp index 5d2eb44f8a8cb9..46fdc78aade718 100644 --- a/torch/csrc/jit/codegen/cuda/kernel_ir.cpp +++ b/torch/csrc/jit/codegen/cuda/kernel_ir.cpp @@ -78,13 +78,21 @@ TensorIndex::TensorIndex( } } -Sync::Sync(IrBuilderPasskey passkey, bool war_sync) - : Expr(passkey, ExprType::Sync), war_sync_(war_sync) { +BlockSync::BlockSync(IrBuilderPasskey passkey, bool war_sync) + : Expr(passkey, ExprType::BlockSync), war_sync_(war_sync) { TORCH_INTERNAL_ASSERT( passkey.ir_container_->isA(), "IR type only valid for Kernel container."); } +GridSync::GridSync( + IrBuilderPasskey passkey, + ParallelTypeBitmap sync_dims, + Val* sync_buffer) + : Expr(passkey, ExprType::GridSync), + sync_dims_(sync_dims), + sync_buffer_(sync_buffer) {} + InitMagicZero::InitMagicZero(IrBuilderPasskey passkey) : Expr(passkey, ExprType::InitMagicZero) { TORCH_INTERNAL_ASSERT( @@ -206,7 +214,8 @@ ForLoop::ForLoop(IrBuilderPasskey passkey, IterDomain* iter_domain) nullptr, nullptr, nullptr, - isParallelTypeVectorize(iter_domain->getParallelType()), + !iter_domain->isBroadcast() && + isParallelTypeVectorize(iter_domain->getParallelType()), nullptr, false) { TORCH_INTERNAL_ASSERT( @@ -298,6 +307,51 @@ Val* ForLoop::step() const { return step_; } +bool ForLoop::isTrivial() const { + // These loops are not materialized + if (vectorize() || iter_domain()->isBroadcast() || + iter_domain()->isStride() || iter_domain()->isMma()) { + return true; + } + + // By default, a parallelized loop would look like: + // + // for (int x = threadIdx.x; x < stop; x += blockDim.x) { + // do_some_comp(x); + // } + // + // When stop is guaranteed to be smaller or equal to the number of + // threads, the for-loop is not necessary. In the above case, we + // would just generate the loop body without the for clause but + // references to the loop index replaced by the loop start value. + // + // When the loop end is the same as the IterDomain extent, the + // assumption can be safely made. This is more conservative than + // necessary since the loop stop value just needs to be <= the + // IterDomain extent. However, at this point, this conservative + // analysis seems sufficient. 
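+  //
+  // Illustrative example (not taken from the surrounding code): an
+  // IterDomain parallelized with TIDx whose extent equals blockDim.x would
+  // otherwise produce
+  //   for (int x = threadIdx.x; x < blockDim.x; x += blockDim.x) { ... }
+  // which executes at most once per thread, so only the body is emitted
+  // with the loop index replaced by threadIdx.x.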
+ if (stop() == iter_domain()->extent() && iter_domain()->isThread()) { + return true; + } + + // Extent-1 loop: for (int i = 0; i < 1; ++i) { + if (start()->isZeroInt() && stop()->isOneInt() && step()->isOneInt()) { + return true; + } + + // Another extent-1 loop: for (int i = N - 1; i < N; ++i) { + if (start()->definition() != nullptr && + start()->definition()->isA() && + start()->definition()->as()->getBinaryOpType() == + BinaryOpType::Sub && + start()->definition()->as()->lhs() == stop() && + start()->definition()->as()->rhs()->isOneInt()) { + return true; + } + + return false; +} + IfThenElse::IfThenElse(IrBuilderPasskey passkey, Predicate* cond) : Expr(passkey, ExprType::IfThenElse), then_body_(this), else_body_(this) { setPredicate(cond); @@ -419,6 +473,50 @@ GridWelford::GridWelford( "IR type only valid for Kernel container."); } +AllocateFusedReduction::AllocateFusedReduction( + IrBuilderPasskey passkey, + GridReduction* grid_reduction) + : Expr(passkey, ExprType::AllocateFusedReduction), + grid_expr_(grid_reduction) { + TORCH_INTERNAL_ASSERT( + passkey.ir_container_->isA(), + "IR type only valid for Kernel container."); +} + +AllocateFusedReduction::AllocateFusedReduction( + IrBuilderPasskey passkey, + GridWelford* grid_welford) + : Expr(passkey, ExprType::AllocateFusedReduction), + grid_expr_(grid_welford) { + TORCH_INTERNAL_ASSERT( + passkey.ir_container_->isA(), + "IR type only valid for Kernel container."); +} + +TensorIndex* AllocateFusedReduction::out() const { + TORCH_INTERNAL_ASSERT(grid_expr_ != nullptr); + if (auto grid_reduction = dynamic_cast(grid_expr_)) { + return grid_reduction->reduction_op()->out()->as(); + } else if (auto grid_welford = dynamic_cast(grid_expr_)) { + return grid_welford->welford_op()->out()->as(); + } else { + TORCH_INTERNAL_ASSERT( + false, "Invalid grid expression: ", grid_expr_->toString()); + } +} + +const ParallelTypeBitmap& AllocateFusedReduction::threadPredicate() const { + TORCH_INTERNAL_ASSERT(grid_expr_ != nullptr); + if (auto grid_reduction = dynamic_cast(grid_expr_)) { + return grid_reduction->threadPredicate(); + } else if (auto grid_welford = dynamic_cast(grid_expr_)) { + return grid_welford->threadPredicate(); + } else { + TORCH_INTERNAL_ASSERT( + false, "Invalid grid expression: ", grid_expr_->toString()); + } +} + } // namespace kir } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/kernel_ir.h b/torch/csrc/jit/codegen/cuda/kernel_ir.h index ad6be90bf98a58..bc714e5d87e470 100644 --- a/torch/csrc/jit/codegen/cuda/kernel_ir.h +++ b/torch/csrc/jit/codegen/cuda/kernel_ir.h @@ -52,7 +52,8 @@ class TensorIndex; // Expressions class Allocate; -class Sync; +class BlockSync; +class GridSync; class InitMagicZero; class UpdateMagicZero; class ForLoop; @@ -60,6 +61,7 @@ class IfThenElse; class GridReduction; class GridBroadcast; class GridWelford; +class AllocateFusedReduction; // Expr container class Scope; @@ -143,7 +145,7 @@ class TORCH_CUDA_CU_API TensorIndex final : public Val { public: TensorIndex( IrBuilderPasskey, - const fuser::cuda::TensorView* view, + const TensorView* view, std::vector indices); std::vector::size_type nDims() const { @@ -240,9 +242,9 @@ class TORCH_CUDA_CU_API Allocate final : public Expr { // // TODO(kir): change name to SyncThreads as we could have other barriers. 
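// (BlockSync below is the block-level barrier; device-wide synchronization
// is handled by the separate GridSync expression further down, which goes
// through a global sync buffer and implies a cooperative grid launch.)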
// -class TORCH_CUDA_CU_API Sync final : public Expr { +class TORCH_CUDA_CU_API BlockSync final : public Expr { public: - explicit Sync(IrBuilderPasskey passkey, bool war_sync = false); + explicit BlockSync(IrBuilderPasskey passkey, bool war_sync = false); bool isWarHazardSync() const { return war_sync_; @@ -253,6 +255,28 @@ class TORCH_CUDA_CU_API Sync final : public Expr { bool war_sync_ = false; }; +// Synchronize all blocks in device, implies cooperative group launch is +// required. +class TORCH_CUDA_CU_API GridSync final : public Expr { + public: + explicit GridSync( + IrBuilderPasskey passkey, + ParallelTypeBitmap sync_dims, + Val* sync_buffer); + + ParallelTypeBitmap syncDims() const { + return sync_dims_; + } + + Val* syncBuffer() const { + return sync_buffer_; + } + + private: + ParallelTypeBitmap sync_dims_; + Val* sync_buffer_ = nullptr; +}; + // Simply prints "DEFINE_MAGIC_ZERO" in the code in accordance with magic_zero // in helpers.cu class TORCH_CUDA_CU_API InitMagicZero final : public Expr { @@ -408,6 +432,9 @@ class TORCH_CUDA_CU_API ForLoop final : public Expr { unroll_required_ = true; } + //! True if no actual for-loop is materialized + bool isTrivial() const; + private: //! Returns if a loop could be unrolled. bool isUnrollable() const; @@ -603,6 +630,30 @@ class TORCH_CUDA_CU_API GridWelford final : public Expr { ParallelTypeBitmap thread_predicate_; }; +// Allocate an instance of the fused reduction class. +class TORCH_CUDA_CU_API AllocateFusedReduction final : public Expr { + public: + explicit AllocateFusedReduction( + IrBuilderPasskey passkey, + GridReduction* grid_reduction); + + explicit AllocateFusedReduction( + IrBuilderPasskey passkey, + GridWelford* grid_welford); + + Expr* gridExpr() const { + return grid_expr_; + } + + TensorIndex* out() const; + + const ParallelTypeBitmap& threadPredicate() const; + + private: + //! 
GridReduction or GridWelford + Expr* grid_expr_ = nullptr; +}; + } // namespace kir } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.cpp b/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.cpp index bfc4794e299b4e..a64b07da4a0538 100644 --- a/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.cpp +++ b/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.cpp @@ -17,15 +17,18 @@ std::vector IrVisitor::handle(const std::vector& exprs) { void IrVisitor::handle(ForLoop* fl) { for_loops_.push_back(fl); scope_.push_back(&fl->body()); + scope_exprs_.push_back(fl); auto body_exprs = std::vector(fl->body().exprs()); for (auto expr : body_exprs) { handle(expr); } + scope_exprs_.pop_back(); scope_.pop_back(); for_loops_.pop_back(); } void IrVisitor::handle(IfThenElse* ite) { + scope_exprs_.push_back(ite); scope_.push_back(&ite->thenBody()); auto then_exprs = std::vector(ite->thenBody().exprs()); for (auto expr : then_exprs) { @@ -39,10 +42,11 @@ void IrVisitor::handle(IfThenElse* ite) { handle(expr); } scope_.pop_back(); + scope_exprs_.pop_back(); } std::vector ExprMutator::mutate(bool reverse_order) { - if (insertions_.empty() && replacements_.empty()) { + if (insertions_.empty() && replacements_.empty() && removal_.empty()) { return exprs_; } @@ -107,6 +111,22 @@ std::vector ExprMutator::mutate(bool reverse_order) { } } + for (auto removal_info : removal_) { + if (removal_info.scope == nullptr) { + auto pos_it = + std::find(exprs_.begin(), exprs_.end(), removal_info.reference); + TORCH_INTERNAL_ASSERT( + pos_it != exprs_.end(), "Issue finding expression to remove."); + exprs_.erase(pos_it); + } else { + TORCH_INTERNAL_ASSERT( + removal_info.scope->contains(removal_info.reference), + "Expression to remove is not found in the given scope: ", + removal_info.reference->toString()); + removal_info.scope->erase(removal_info.reference); + } + } + insertions_.clear(); replacements_.clear(); @@ -132,8 +152,12 @@ void ExprMutator::registerMutation( mutation.mode = mode; if (mode == MutationMode::BEFORE || mode == MutationMode::AFTER) { insertions_.push_back(mutation); - } else { + } else if (mode == MutationMode::REPLACE) { replacements_.push_back(mutation); + } else if (mode == MutationMode::REMOVE) { + removal_.push_back(mutation); + } else { + TORCH_INTERNAL_ASSERT(false, "Invalid mutation type"); } } @@ -158,6 +182,10 @@ void ExprMutator::registerReplace( registerMutation(reference, new_expr, scope, MutationMode::REPLACE); } +void ExprMutator::registerRemove(Expr* expr_to_remove, Scope* scope) { + registerMutation(expr_to_remove, nullptr, scope, MutationMode::REMOVE); +} + void ExprMutator::registerInsertBefore(Expr* reference, Expr* new_expr) { Scope* scope = scope_.empty() ? nullptr : scope_.back(); registerInsertBefore(reference, new_expr, scope); @@ -173,6 +201,11 @@ void ExprMutator::registerReplace(Expr* reference, Expr* new_expr) { registerReplace(reference, new_expr, scope); } +void ExprMutator::registerRemove(Expr* expr_to_remove) { + Scope* scope = scope_.empty() ? 
nullptr : scope_.back(); + registerRemove(expr_to_remove, scope); +} + } // namespace kir } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.h b/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.h index 2140498af14009..d665c4a6fdf539 100644 --- a/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.h +++ b/torch/csrc/jit/codegen/cuda/kernel_ir_dispatch.h @@ -41,14 +41,15 @@ class TORCH_CUDA_CU_API IrVisitor : public OptOutDispatch { protected: std::vector for_loops_; std::vector scope_; + std::vector scope_exprs_; std::vector exprs_; }; // Base Expr Mutator class that visits all nodes with IrVisitor, and then -// inserts new expressions or replaces expressions based on insertion/replace -// maps provided. These replacement maps are expected to accumulate during an -// initial traversal, then runs an insertion based on them after the overloaded -// traversal. +// inserts new expressions, replaces expressions based on insertion/replace +// maps provided or removes existing expressions. These replacement +// maps are expected to accumulate during an initial traversal, then +// runs an insertion based on them after the overloaded traversal. // // Order of mutations may be important, mutations are ordered according to the // following rules: @@ -61,6 +62,8 @@ class TORCH_CUDA_CU_API IrVisitor : public OptOutDispatch { // Before/After insertions are done before Expr replacements, so reference for // insertions must be on pre-replaced Exprs // +// Removal of expressions is done after replacements. +// // To place in a scope that is empty, simply provide a nullptr reference // Since insertions are done in order, it's possible to insert an expression in // an empty scope, and then use that inserted scope as a reference for @@ -79,6 +82,7 @@ class ExprMutator : public IrVisitor { void registerInsertBefore(Expr* reference, Expr* new_expr, Scope* scope); void registerInsertAfter(Expr* reference, Expr* new_expr, Scope* scope); void registerReplace(Expr* reference, Expr* new_expr, Scope* scope); + void registerRemove(Expr* expr_to_remove, Scope* scope); // Registration function which need to be called "in place" during visiting. // I.E. 
@@ -87,9 +91,10 @@ class ExprMutator : public IrVisitor { void registerInsertBefore(Expr* reference, Expr* new_expr); void registerInsertAfter(Expr* reference, Expr* new_expr); void registerReplace(Expr* reference, Expr* new_expr); + void registerRemove(Expr* expr_to_remove); private: - enum class MutationMode { BEFORE, AFTER, REPLACE }; + enum class MutationMode { BEFORE, AFTER, REPLACE, REMOVE }; void registerMutation( Expr* ref, @@ -109,6 +114,9 @@ class ExprMutator : public IrVisitor { // Track replacements as they're registered std::vector replacements_; + + // Track removal as they're registered + std::vector removal_; }; } // namespace kir diff --git a/torch/csrc/jit/codegen/cuda/lower2device.cpp b/torch/csrc/jit/codegen/cuda/lower2device.cpp index 21eb6e02fb8ef0..de54e7b50434d1 100644 --- a/torch/csrc/jit/codegen/cuda/lower2device.cpp +++ b/torch/csrc/jit/codegen/cuda/lower2device.cpp @@ -184,7 +184,7 @@ void GpuLower::collectPaddedParallelDims() { } } -void GpuLower::lower(Fusion* fusion) { +void GpuLower::lower(Fusion* fusion, DataType index_type) { FUSER_PERF_SCOPE("GpuLower::lower"); TORCH_INTERNAL_ASSERT(fusion != nullptr); TORCH_INTERNAL_ASSERT( @@ -199,58 +199,85 @@ void GpuLower::lower(Fusion* fusion) { } } lower_guard(this); // Copy fusion into a new kernel for processing - kernel_ = std::make_unique(fusion); + kernel_ = std::make_unique(fusion, index_type); // Alias the fusion kernel caries around as a view of itself. fusion_ = kernel_.get(); + // Convert tensor views of DataType::Index type to either Int or Int32 + for (auto tv : ir_utils::allTvs(fusion_)) { + if (tv->dtype() == DataType::Index) { + tv->resolveIndexDtype(); + } + } + FusionGuard fg(fusion_); // prepare for lowering validateIr(fusion_); + // Checks if any TIDx dim is marked as padded to a warp. Also checks if we can + // determine the padding is explicitly a single warp. collectPaddedParallelDims(); + // Replaces integers that are tensor sizes by named scalars as "T0.size[0]" replaceSymbolicSizes(fusion_); + // Traverse through reductions and termine if any iteration domains are + // trivial reductions. Add these iteration domains to trivial_reduction_info_ + // which simply holds a map of which axes are trivial and which are not. trivial_reduction_info_.build(fusion_); - trivialReductionReplacement(fusion_, trivialReductionInfo()); + // Replaces trivial reduction expressions (all id's being reduced are trivial) + // with set unary op + trivialReductionReplacement(fusion_, trivial_reduction_info_); // In the future we may directly use this map, but for now it will propagate - // and validate (to some extent) the parallelization strategy. - // This is the first time nodes will be lowered to kir nodes. Since for now we - // propagate the parallel strategy in some instances, we need to do it before - // lowering. + // and validate (to some extent) the parallelization strategy. Map only axes + // to the left of compute at position, forward broadcast in replay. ca_parallel_map_ = ComputeAtMap(ComputeAtMap::MappingMode::PARALLEL); ca_parallel_map_.build(fusion_, current()); - // Want to run this after parallel map is created - validateVectorize(fusion_); - - // Generate mappings to generate indices + // Generate mappings to generate indices. Maps all iteration domains but + // doesn't map any broadcast iteration domains, nor forward them in replay. 
ca_index_map_ = ComputeAtMap(ComputeAtMap::MappingMode::INDEX); ca_index_map_.build(fusion_, current()); - // Generate mappings to generate and map to loop nests + // Generate mappings to generate and map to loop nests. Maps all iteration + // domains, forwards broadcasts, ensures root domain mappings exist (aren't + // replaced in forwarding). ca_loop_map_ = ComputeAtMap(ComputeAtMap::MappingMode::LOOP); ca_loop_map_.build(fusion_, current()); + // Used in parallel dimension map + concretized_broadcast_domains_.build(fusion_); + parallelDimensionMap().build(fusion_); if (isDebugDumpEnabled(DebugDumpOption::ParallelDimensions)) { std::cout << "Parallel dimension map:" << std::endl; std::cout << parallel_dimension_map_.toString() << std::endl; } - concretized_broadcast_domains_.build(fusion_); + // Validate mma data format and compatibility if any on the fusion. + validateMma(fusion_); // Compute thread predicates. Depends on parallel_dimension_map_ thread_pred_map_.build(fusion_); - // Depends on thread_pred_map_ - validateParallelize(fusion_); + // Fuse cetain patterns of reductions, such as a grid reduction + // followed by a grid broadcast. Only depends on parallelization and + // thread predicate map. + fuseReductions(fusion_); // Scan the whole fusion and build mappings about halo extensions of // all IterDomains haloInfo().build(fusion_); + // Want to run this after parallel map and halo info map are + // created. vectorized_accesses_ and vectorized_set_info_ are filled. + validateAndCollectVectorizeInfo(fusion_); + + // Depends on thread_pred_map_, validates parallelization collects which + // tensor views need WAR or RAW syncs + sync_map_.build(fusion_); + partialSplitMap().build(fusion_); validatePartialSplit(fusion_); @@ -312,14 +339,20 @@ void GpuLower::lower(Fusion* fusion) { const auto exprs_conditional_loops = generateConditionalFromPredicate(exprs_with_fused_broadcast); + const auto exprs_common_index_allocated = + allocateCommonIndices(exprs_conditional_loops); + // Insert fake zero updates to make sure nvrtc doesn't blow out register use // on index and predicate reuse - const auto exprs_register_adjusted = insertMagicZero(exprs_conditional_loops); + const auto exprs_register_adjusted = + insertMagicZero(exprs_common_index_allocated); const auto exprs_cleaned_up_loops = KIRCleaner::cleanUp(exprs_register_adjusted); - // We now have the lowered expressions, finalize the kernel IR + // We now have the lowered expressions, finalize the kernel IR. This function + // will also copy over some relevant information for code generation from + // GpuLower. kernel_->finalize(exprs_cleaned_up_loops); } diff --git a/torch/csrc/jit/codegen/cuda/lower2device.h b/torch/csrc/jit/codegen/cuda/lower2device.h index b97c6ac18373c3..6273c0e2d6a9ba 100644 --- a/torch/csrc/jit/codegen/cuda/lower2device.h +++ b/torch/csrc/jit/codegen/cuda/lower2device.h @@ -8,8 +8,11 @@ #include #include #include +#include +#include #include #include +#include #include #include #include @@ -18,9 +21,12 @@ #include #include #include +#include #include #include +#include +#include namespace torch { namespace jit { @@ -38,9 +44,12 @@ class TORCH_CUDA_CU_API GpuLower : public NonCopyable { public: GpuLower() = delete; + // GpuLower lowers the provided fusion into a kernel which can be translated + // into cuda code. index_type allows to compile the kernel based on int32 + // indexing instead of int64 for additional performance. 
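+  //
+  // Minimal sketch of the intended call site (illustrative; `fusion` is
+  // assumed to be a valid, scheduled Fusion*):
+  //   GpuLower gpu_lower(fusion, DataType::Int);
+  //   kir::Kernel* kernel = gpu_lower.kernel();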
// NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) - explicit GpuLower(Fusion* fusion) { - lower(fusion); + explicit GpuLower(Fusion* fusion, DataType index_type = DataType::Int) { + lower(fusion, index_type); } kir::Kernel* kernel() const; @@ -57,6 +66,12 @@ class TORCH_CUDA_CU_API GpuLower : public NonCopyable { return thread_pred_map_; } + // Returns non-const reference. Necessary to reset a predicate flag + // when a broadcast expression is fused into a reduction. + ThreadPredicateMap& threadPredMap() { + return thread_pred_map_; + } + const ComputeAtMap& caLoopMap() const { return ca_loop_map_; } @@ -125,8 +140,36 @@ class TORCH_CUDA_CU_API GpuLower : public NonCopyable { return double_buffer_info_; } + CommonIndexMap& commonIndexMap() { + return common_index_map_; + } + + const auto& vectorizedAccesses() const { + return vectorized_accesses_; + } + + auto& vectorizedAccesses() { + return vectorized_accesses_; + } + + const auto& vectorizedSetInfo() const { + return vectorized_set_info_; + } + + auto& vectorizedSetInfo() { + return vectorized_set_info_; + } + + FusedReductionInfo& fusedReductionInfo() { + return fused_reduction_info_; + } + + const SyncMap& syncMap() const { + return sync_map_; + } + private: - void lower(Fusion* fusion); + void lower(Fusion* fusion, DataType index_type); // Goes through the parallelized iterdomains of the used TVs and find // the parallel dimensions that need to be padded to a multiples of @@ -152,6 +195,16 @@ class TORCH_CUDA_CU_API GpuLower : public NonCopyable { PartialSplitMap partial_split_map_; NonDivisibleSplitInfo non_divisible_split_info_; DoubleBufferInfo double_buffer_info_; + CommonIndexMap common_index_map_; + FusedReductionInfo fused_reduction_info_; + SyncMap sync_map_; + + // Track which tensor views are inputs or outputs of a vectorized operation + // and their maximum vectorized access size + // std::unordered_map vectorized_accesses_; + std::unordered_map vectorized_accesses_; + // Info on each vectorized set op + std::vector vectorized_set_info_; Fusion* fusion_ = nullptr; }; diff --git a/torch/csrc/jit/codegen/cuda/lower_alias_memory.cpp b/torch/csrc/jit/codegen/cuda/lower_alias_memory.cpp index 17a2db069d865c..32da48bf51417a 100644 --- a/torch/csrc/jit/codegen/cuda/lower_alias_memory.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_alias_memory.cpp @@ -920,6 +920,31 @@ class AllocateReuseModifier { continue; } + if (alloc_info->alloc_expr->buffer()->isA()) { + if (!alloc_info->alloc_expr->buffer()->isA()) { + continue; + } + auto this_tv = alloc_info->alloc_expr->buffer()->as(); + auto reuse_tv = alloc_info->alloc_expr->buffer()->as(); + // Check that either both tv's are vectorized acceses, or neither are. 
+ // Vectorized allocations require correct alignment so they can only + // alias with other allocations with the right alignment + const auto& va = GpuLower::current()->vectorizedAccesses(); + if ((va.find(this_tv) == va.end()) != + (va.find(reuse_tv) == va.end())) { + return false; + } + + // Shared memory is all aligned to 128 bits, local memory might not be + if (this_tv->getMemoryType() == MemoryType::Local && + va.find(this_tv) != va.end()) { + // Make sure alignment matches + if (va.at(this_tv) != va.at(reuse_tv)) { + return false; + } + } + } + // TODO: // Outer interval based sharing supports arbitrary re-indexing into // the same buffer and would require additional syncs if fully diff --git a/torch/csrc/jit/codegen/cuda/lower_allocation.cpp b/torch/csrc/jit/codegen/cuda/lower_allocation.cpp index c03848ccff86e9..bb2f8b173fdee0 100644 --- a/torch/csrc/jit/codegen/cuda/lower_allocation.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_allocation.cpp @@ -453,6 +453,8 @@ class AllocationInserter : public kir::ExprMutator { default_val == nullptr, "Reduction should not have a default initialization value for predicate elimination."); init = expr->as()->init(); + } else if (expr->isA()) { + init = expr->as()->init(); } else if (expr->isA()) { TORCH_INTERNAL_ASSERT( default_val == nullptr, diff --git a/torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp b/torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp index c8110413de7430..571ba62a545baf 100644 --- a/torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp @@ -407,7 +407,7 @@ class DoubleBufferInserter : private kir::ExprMutator { // RAW sync is not inserted for double buffered tensors. The only // exception is the prologue load. if (write_to_smem) { - auto sync = IrBuilder::create(); + auto sync = IrBuilder::create(); registerInsertBefore(double_buffer_loop, sync); } diff --git a/torch/csrc/jit/codegen/cuda/lower_expr_sort.cpp b/torch/csrc/jit/codegen/cuda/lower_expr_sort.cpp index 84c72c08185d7b..cd5a589f13ad6e 100644 --- a/torch/csrc/jit/codegen/cuda/lower_expr_sort.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_expr_sort.cpp @@ -683,9 +683,9 @@ struct LocalDomainSorter { // Return if id0 should be before id1 inline bool operator()(IterDomain* id0, IterDomain* id1) { auto concrete_id_0 = - GpuLower::current()->caLoopMap().getConcreteMappedID(id0); + GpuLower::current()->caParallelMap().getConcreteMappedID(id0); auto concrete_id_1 = - GpuLower::current()->caLoopMap().getConcreteMappedID(id1); + GpuLower::current()->caParallelMap().getConcreteMappedID(id1); if (concrete_id_dependencies_.find(concrete_id_0) != concrete_id_dependencies_.end()) { @@ -840,7 +840,7 @@ ExprGroup* ExprSegmentationSorter::makeMergedNode( if (producer_of_consumer_edge->isA()) { auto tv = producer_of_consumer_edge->as(); for (const auto tv_i : c10::irange(tv->getComputeAtPosition())) { - ca_ids.emplace(GpuLower::current()->caLoopMap().getConcreteMappedID( + ca_ids.emplace(GpuLower::current()->caParallelMap().getConcreteMappedID( tv->axis(tv_i))); } } @@ -855,7 +855,7 @@ ExprGroup* ExprSegmentationSorter::makeMergedNode( if (consumer_of_producer_edge->isA()) { auto tv = consumer_of_producer_edge->as(); for (const auto tv_i : c10::irange(tv->getMaxProducerPosition())) { - pa_ids.emplace(GpuLower::current()->caLoopMap().getConcreteMappedID( + pa_ids.emplace(GpuLower::current()->caParallelMap().getConcreteMappedID( tv->axis(tv_i))); } } @@ -866,7 +866,7 @@ ExprGroup* ExprSegmentationSorter::makeMergedNode( auto 
ordered_ids = getLocalDomainOrdering( joined_groups->exprs(), - GpuLower::current()->caLoopMap(), + GpuLower::current()->caParallelMap(), all_ca_pa_ids, concrete_id_dependencies); @@ -914,7 +914,7 @@ bool canReducePA(ExprGroup* group) { // it can't decide if it can be reduced bool has_matching_pa = false; for (const auto i : c10::irange(consumer_tv->getMaxProducerPosition())) { - if (GpuLower::current()->caLoopMap().areMapped( + if (GpuLower::current()->caParallelMap().areMapped( consumer_tv->axis(i), group_pa_last_id)) { has_matching_pa = true; break; @@ -931,7 +931,7 @@ bool canReducePA(ExprGroup* group) { static_cast(producer_tv->getComputeAtPosition()); producer_pos_i > 0; producer_pos_i--) { - if (GpuLower::current()->caLoopMap().areMapped( + if (GpuLower::current()->caParallelMap().areMapped( producer_tv->axis(producer_pos_i - 1), group_pa_last_id)) { return false; } @@ -1027,7 +1027,7 @@ void ExprSegmentationSorter::initializeForLoopDependencies() { tv_id_i--) { auto tv_id = tv->axis((int)(tv_id_i - 1)); auto concrete_id = - GpuLower::current()->caLoopMap().getConcreteMappedID(tv_id); + GpuLower::current()->caParallelMap().getConcreteMappedID(tv_id); if (concrete_id_dependencies.find(concrete_id) == concrete_id_dependencies.end()) { @@ -1039,7 +1039,7 @@ void ExprSegmentationSorter::initializeForLoopDependencies() { // Loops after tv_id are dependent on tv_id dependencies.emplace( - GpuLower::current()->caLoopMap().getConcreteMappedID(tv_id)); + GpuLower::current()->caParallelMap().getConcreteMappedID(tv_id)); } } @@ -1067,27 +1067,62 @@ void ExprSegmentationSorter::initializeForLoopDependencies() { std::back_inserter(to_visit), [](const auto& concrete_dep_entry) { return concrete_dep_entry.first; }); + size_t inf_loop_counter = to_visit.size(); + bool failed = false; + while (!to_visit.empty()) { auto id = to_visit.front(); to_visit.pop_front(); + if (inf_loop_counter-- == 0) { + failed = true; + break; + } + auto& dependencies = concrete_id_dependencies.at(id); - bool ready = std::all_of( - dependencies.begin(), dependencies.end(), [&visited](IterDomain* id) { - return visited.count(id); - }); + bool ready = dependencies.empty() || + std::all_of(dependencies.begin(), + dependencies.end(), + [&visited](IterDomain* id) { return visited.count(id); }); if (!ready) { to_visit.push_back(id); continue; } + inf_loop_counter = to_visit.size(); + for (auto dependency : dependencies) { auto dep_of_dep = concrete_id_dependencies.at(dependency); dependencies.insert(dep_of_dep.begin(), dep_of_dep.end()); } visited.emplace(id); } + if (failed) { + std::cerr + << "ERROR: Iteration domain sorting has failed, infinite loop detected." 
+ << std::endl; + std::cerr << "Failed to sort out: " << std::endl; + for (auto entry : to_visit) { + std::cerr << entry->toString(); + if (entry != to_visit.back()) { + std::cerr << ", "; + } + } + + std::cerr << "Depdencies: " << std::endl; + for (const auto& dep_entry : concrete_id_dependencies) { + std::cerr << " Deps of " << dep_entry.first->toString() << std::endl + << " "; + + for (auto dep : dep_entry.second) { + std::cerr << dep->toString() << ", "; + } + std::cerr << std::endl; + } + + TORCH_INTERNAL_ASSERT(false); + } } // Checks if the for loop associated with the concrete ID is ready to be @@ -1145,7 +1180,7 @@ bool ExprSegmentationSorter::supportedMerge(ExprGroup* sg1, ExprGroup* sg2) { return false; } - const auto& loop_map = GpuLower::current()->caLoopMap(); + const auto& parallel_map = GpuLower::current()->caParallelMap(); // If inner loop dependencies have not been resolved, cannot merge. if (!loopReady(producer_ca_domain.back()) || @@ -1182,11 +1217,11 @@ bool ExprSegmentationSorter::supportedMerge(ExprGroup* sg1, ExprGroup* sg2) { continue; } - if (!loop_map.areMapped(compute_at_dim, producer_ca_domain.back())) { + if (!parallel_map.areMapped(compute_at_dim, producer_ca_domain.back())) { continue; } - if (loop_map.areMapped(compute_at_dim, consumer_pa_domain.back())) { + if (parallel_map.areMapped(compute_at_dim, consumer_pa_domain.back())) { return true; } } diff --git a/torch/csrc/jit/codegen/cuda/lower_fused_reduction.cpp b/torch/csrc/jit/codegen/cuda/lower_fused_reduction.cpp new file mode 100644 index 00000000000000..cf6458ea0980c3 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/lower_fused_reduction.cpp @@ -0,0 +1,312 @@ +#include +#include +#include +#include + +#include + +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +namespace { + +//! An instance of reduction patterns to fuse +class FusedReductionBroadcastInfo : public PolymorphicBase { + public: + FusedReductionBroadcastInfo(ReductionOp* reduction, bool with_broadcast) + : reductions_({reduction}), with_broadcast_({with_broadcast}) {} + + FusedReductionBroadcastInfo(WelfordOp* welford, bool with_broadcast) + : reductions_({welford}), with_broadcast_({with_broadcast}) {} + + const std::vector& reductions() const { + return reductions_; + } + + const std::vector& withBroadcast() const { + return with_broadcast_; + } + + private: + // Holds ReductionOp or WelfordOp. Can be multiple in the case of + // horizontal fusion + std::vector reductions_; + // True each reduction also broadcasts + std::vector with_broadcast_; +}; + +//! Inspect a fusion to detect eligible sequences of expressions to +//! use the fused reduction kernel +class FusionInspector : private IterVisitor { + public: + static std::vector run(Fusion* fusion) { + FusionInspector inspector(fusion); + return inspector.fusion_list_; + } + + private: + FusionInspector(Fusion* fusion) { + traverse(fusion); + } + + using IterVisitor::handle; + + void handle(ReductionOp* rop) final { + /// If it's a grid reduction, keep track of tensors that depend on + /// this reduction. + // Only consider when out is on register as that is assumed in the + // fused reduction kernel. + auto out = rop->out()->as(); + if (out->getMemoryType() == MemoryType::Local && + out->domain()->hasGridReduction()) { + reduction_dep_[out].insert(rop); + } + } + + void handle(WelfordOp* wop) final { + /// If it's a grid reduction, keep track of tensors that depend on + /// this reduction. 
+ // Only consider when out is on register as that is assumed in the + // fused reduction kernel. + auto out = wop->out()->as(); + if (out->getMemoryType() == MemoryType::Local && + out->domain()->hasGridReduction()) { + reduction_dep_[out].insert(wop); + } + } + + void handle(Expr* expr) final { + IterVisitor::handle(expr); + for (auto in_tv : ir_utils::filterByType(expr->inputs())) { + for (auto reduction_op : reduction_dep_[in_tv]) { + if (fused_exprs_.find(reduction_op) != fused_exprs_.end()) { + continue; + } + for (auto out_tv : + ir_utils::filterByType(expr->outputs())) { + reduction_dep_[out_tv].insert(reduction_op); + } + } + } + } + + // In the case of welford, use the fused broadcast reduction when at + // least one of the outputs is broadcast. + void handle(BroadcastOp* bop) final { + // Detect a pattern where a reduction is followed by a broadcast + auto bop_out = bop->out()->as(); + auto bop_in = bop->in()->as(); + + for (Expr* preceding_expr : reduction_dep_[bop_in]) { + auto parallel_reduction_axes = + getReductionParallelTypeStates(preceding_expr); + + // If not matching, propagate the reduction further down to + // subsequent expressions + if (!isBroadcastFuseable(bop_out, parallel_reduction_axes)) { + continue; + } + + if (fused_exprs_.find(preceding_expr) != fused_exprs_.end()) { + // Already added to the fusion list. This can happen with + // welford as there can be multiple broadcast consumer + // expressions. + continue; + } + + if (preceding_expr->isA()) { + fusion_list_.emplace_back(preceding_expr->as(), true); + } else { + fusion_list_.emplace_back(preceding_expr->as(), true); + } + + fused_exprs_.insert(preceding_expr); + } + } + + ParallelTypeBitmap getReductionParallelTypeStates(Expr* expr) { + ParallelTypeBitmap parallel_reduction_axes; + + for (auto id : ir_utils::getTvOutput(expr)->domain()->domain()) { + auto pt = id->getParallelType(); + if (id->isReduction() && isParallelTypeThread(pt)) { + parallel_reduction_axes.set(pt); + } + } + + return parallel_reduction_axes; + } + + // Requires reduction parallel dimensions to exactly match parallel broadcast + // dimensions + bool isBroadcastFuseable( + TensorView* broadcast_out, + const ParallelTypeBitmap& parallel_reduction_axes) { + const auto broadcast_parallel_types = + GpuLower::current()->threadPredMap().getParallelBroadcastDomains( + broadcast_out); + + // If no parallel broadcast, nothing to fuse + if (broadcast_parallel_types.none()) { + return false; + } + + // Make sure the broadcast parallel types are the types reduced by + // the preceding reduction op + for (auto id : broadcast_out->domain()->domain()) { + auto pt = id->getParallelType(); + if (!isParallelTypeThread(pt)) { + continue; + } + // Parallel broadcast must be included in reduction_states + if (id->isBroadcast() && broadcast_parallel_types.get(pt)) { + if (!parallel_reduction_axes.get(pt)) { + return false; + } + } + } + + return true; + } + + private: + //! List of expression sequences to fuse + std::vector fusion_list_; + //! Keep track of fused reduction/welford exprs to avoid duplication + std::unordered_set fused_exprs_; + //! Keep track of ReductionOp/WelfordOp expressions that are + //! (indirectly) input to a tensor + std::unordered_map> reduction_dep_; +}; + +//! Transform a fusion to use the fused reduction kernel. 
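+//!
+//! For each detected pattern the original ReductionOp or WelfordOp is
+//! re-created with its fused flag set, and when a matching parallel
+//! broadcast was found, the reduced IterDomains are marked as allreduce so
+//! the broadcast effectively lowers to a plain set operation.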
+class FusionTransformer { + public: + static void run( + Fusion* fusion, + const std::vector& fusion_list) { + FusionTransformer transformer(fusion, fusion_list); + } + + private: + FusionTransformer( + Fusion* fusion, + const std::vector& fusion_list) + : fusion_(fusion), fusion_list_(fusion_list) { + transform(); + } + + void transform() { + for (const auto& info : fusion_list_) { + transform(info); + } + // If the thread predicate map is modified, rebuild the + // map. build() only updates mappings that need to be updated. + if (thread_pred_map_modified_) { + GpuLower::current()->threadPredMap().build(fusion_); + } + } + + void transform(const FusedReductionBroadcastInfo& info) { + TORCH_INTERNAL_ASSERT( + info.reductions().size() == 1, "Horizontal fusion not supported yet"); + + for (const auto i : c10::irange(info.reductions().size())) { + const auto expr = info.reductions().at(i); + const auto with_broadcast = info.withBroadcast().at(i); + Expr* fused_expr = nullptr; + + if (auto reduction = dynamic_cast(expr)) { + TORCH_INTERNAL_ASSERT(!reduction->isFused()); + + auto red_op_type = reduction->getReductionOpType(); + auto init = reduction->init(); + auto out = reduction->out(); + auto in = reduction->in(); + + fusion_->removeExpr(reduction); + + fused_expr = + IrBuilder::create(red_op_type, init, out, in, true); + } else if (auto welford = dynamic_cast(expr)) { + TORCH_INTERNAL_ASSERT(!welford->isFused()); + + auto out_avg = welford->outAvg(); + auto out_var = welford->outVar(); + auto out_n = welford->outN(); + auto init_avg = welford->initAvg(); + auto init_var = welford->initVar(); + auto init_n = welford->initN(); + auto in_avg = welford->inAvg(); + auto in_var = welford->inVar(); + auto in_n = welford->inN(); + + fusion_->removeExpr(welford); + + fused_expr = IrBuilder::create( + out_avg, + out_var, + out_n, + init_avg, + init_var, + init_n, + in_avg, + in_var, + in_n, + true); + } + + TORCH_INTERNAL_ASSERT(fused_expr != nullptr); + + // Do not just remove the broadcast but just reset the thread + // predicate of the broadcast op. Since fusion is applied only + // when all parallel broadcast domains are to be parallel + // reduction, all parallel types can be reset. + if (with_broadcast) { + // It may be just fine to remove the broadcast expr, but + // technically speaking that would violate the root domain mapping + // as broadcast domains would appear in the consumer of the + // broadcast output tensor without a broadcast expression. 
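+      //
+      // Instead, each reduction IterDomain of the fused expression is
+      // recorded as an allreduce domain and the reduction output is marked
+      // as updated, so the thread predicate map is rebuilt once at the end
+      // of transform().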
+ for (auto reduction_out : + ir_utils::filterByType(fused_expr->outputs())) { + for (auto id : reduction_out->domain()->domain()) { + if (id->isReduction()) { + GpuLower::current()->fusedReductionInfo().markAsAllreduce(id); + GpuLower::current()->threadPredMap().markAsUpdated(reduction_out); + thread_pred_map_modified_ = true; + } + } + } + } + } + } + + private: + Fusion* fusion_ = nullptr; + const std::vector& fusion_list_; + bool thread_pred_map_modified_ = false; +}; + +} // namespace + +void fuseReductions(Fusion* fusion) { + auto fusion_list = FusionInspector::run(fusion); + FusionTransformer::run(fusion, fusion_list); +} + +void FusedReductionInfo::markAsAllreduce(IterDomain* id) { + allreduce_ids_.insert(id); +} + +bool FusedReductionInfo::isAllreduce(IterDomain* id) const { + return allreduce_ids_.find(id) != allreduce_ids_.end(); +} + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/lower_fused_reduction.h b/torch/csrc/jit/codegen/cuda/lower_fused_reduction.h new file mode 100644 index 00000000000000..97cd5f66086752 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/lower_fused_reduction.h @@ -0,0 +1,34 @@ +#pragma once + +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +//! Keep track of certain patterns of reductions. +//! +//! - Allreduce IterDomain: reduced and broadcast domain. +class FusedReductionInfo { + public: + void markAsAllreduce(IterDomain* id); + + bool isAllreduce(IterDomain* id) const; + + private: + // Reduction IterDomains that are also broadcast + std::unordered_set allreduce_ids_; +}; + +//! Detect reductions and broadcasts that are eligible for the fused +//! reduction kernel. When found, the predicate flags of the broadcast +//! is unset, which effectively makes the broadcast just a unary set +//! op. +//! TODO: Consider moving the warp-based fused reduction here. +void fuseReductions(Fusion*); + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/lower_fusion_simplifier.cpp b/torch/csrc/jit/codegen/cuda/lower_fusion_simplifier.cpp index fa84d1006a16b8..dd4a06dfb3f829 100644 --- a/torch/csrc/jit/codegen/cuda/lower_fusion_simplifier.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_fusion_simplifier.cpp @@ -91,6 +91,15 @@ class UnaryOpInserter : private kir::ExprMutator { gop, IrBuilder::create(container, UnaryOpType::Set, out, in)); } + void handle(ViewDtypeOp* vop) final { + auto out = vop->out(); + auto in = vop->in(); + auto container = out->container(); + registerReplace( + vop, + IrBuilder::create(container, UnaryOpType::EraseType, out, in)); + } + void handle(ViewOp* vop) final { auto out = vop->out(); auto in = vop->in(); diff --git a/torch/csrc/jit/codegen/cuda/lower_index.cpp b/torch/csrc/jit/codegen/cuda/lower_index.cpp index b0ef14079c436d..5db1999a3a0635 100644 --- a/torch/csrc/jit/codegen/cuda/lower_index.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_index.cpp @@ -37,6 +37,11 @@ void IndexLowering::pushBack(Expr* expr) { } } +void IndexLowering::insertAtTopLevel(Expr* expr) { + TORCH_INTERNAL_ASSERT(!lowered_exprs_.empty()); + lowered_exprs_.insert(lowered_exprs_.end() - 1, expr); +} + void IndexLowering::handle(const kir::IfThenElse* ite) { const auto prev_scope = active_scope_; @@ -101,7 +106,11 @@ namespace { // Get the size of the temporary work buffer for grid communication, this can be // grid reduction, broadcast, or grid welford. 
-Val* getGridCommWorkBufferSize(const TensorDomain* td) { +// expansion_factor can be optionally passed to expand the allocation +// size. For example, FusedReduction should double the work buffer size. +Val* getGridCommWorkBufferSize( + const TensorDomain* td, + int expansion_factor = 1) { // The buffer size is the number of thread blocks multiplied by the // number of threads not used for reduction domains. // Note: Previously it was calculated based on the shape of the @@ -111,7 +120,11 @@ Val* getGridCommWorkBufferSize(const TensorDomain* td) { // size if the parallel dimensions are exact, but otherwise, just // computing the buffer size based on the tensor shape isn't // sufficient since there could be extra threads/blocks. - Val* buffer_size = GpuLower::current()->kernel()->oneVal(); + TORCH_INTERNAL_ASSERT( + expansion_factor >= 1, "Invalid expansion factor: ", expansion_factor); + Val* buffer_size = expansion_factor == 1 + ? GpuLower::current()->kernel()->oneVal() + : IrBuilder::create(expansion_factor); for (auto pt : kParallelTypeThreads) { auto pt_dim = GpuLower::current()->parallelDimensionMap().get(pt); if (pt_dim == nullptr || pt_dim->isOneInt()) { @@ -172,89 +185,122 @@ void IndexLowering::handle(const ReductionOp* rop) { const auto out_tv = rop->out()->as(); const auto out_domain = out_tv->domain(); - const bool is_block_reduce = out_domain->hasBlockReduction(); - const bool is_grid_reduce = out_domain->hasGridReduction(); - - // If we do a grid reduction we can't have a reduction axis that is not bound - // to a grid or block dim () - if (is_grid_reduce) { - TORCH_INTERNAL_ASSERT( - std::none_of( - out_domain->domain().begin(), - out_domain->domain().end(), - [](IterDomain* id) { - return !id->isThread() && id->isReduction() && - !id->extent()->isOneInt(); - }), - "Found a reduction stage that has both a non-parallelized ", - "reduction and a grid reduction. This is not supported, ", - "please use rfactor to do the serialized reduction first, ", - "then the grid reduction."); - } + const bool has_block_reduce = out_domain->hasBlockReduction(); + const bool has_grid_reduce = out_domain->hasGridReduction(); const auto out = lowerDstIndex(rop->out()); const auto in = lowerSrcIndex(rop->in(), rop->out()); - ReductionOp* block_reduction_op = nullptr; + // Serial reduction + if (!has_block_reduce && !has_grid_reduce) { + pushBack( + IrBuilder::create(rop->getReductionOpType(), out, out, in)); + return; + } - if (is_block_reduce) { - block_reduction_op = IrBuilder::create( - rop->getReductionOpType(), rop->init(), out, in); - if (rop->predicate()) { - block_reduction_op->setPredicate(rop->predicate()); - } - if (rop->writePredicate()) { - block_reduction_op->setWritePredicate(rop->writePredicate()); - } - pushBack(block_reduction_op); + ReductionOp* indexed_rop = IrBuilder::create( + rop->getReductionOpType(), rop->init(), out, in, rop->isFused()); + if (rop->predicate()) { + indexed_rop->setPredicate(rop->predicate()); + } + if (rop->writePredicate()) { + indexed_rop->setWritePredicate(rop->writePredicate()); } - if (is_grid_reduce) { - const auto reduce_buffer = allocGlobalBufferForGridComm( - getGridCommWorkBufferSize(out_domain), out->dtype(), false); - - const auto sync_buffer = allocGlobalBufferForGridComm( - getGridSyncBufferSize(out_domain), DataType::Int, true); - - const auto grid_reduction_op = (block_reduction_op == nullptr) - ? 
IrBuilder::create( - rop->getReductionOpType(), rop->init(), out, in) - : block_reduction_op; - - // The thread predicate for GridReduction needs to be set - // separately from the main predicate. Do not combine them like - // other expressions. - const auto& thread_pred = - GpuLower::current()->threadPredMap().getPredicatedParallelTypes(out_tv); - auto grid_reduction = IrBuilder::create( - grid_reduction_op, reduce_buffer, sync_buffer); - grid_reduction->setThreadPredicate(thread_pred); - - if (rop->predicate()) { - // If preceded by a blockReduce, all thread blocks should have - // valid inputs to gridReduce. In fact, using the original - // predicate does not work when the write predicate of the - // blockReduce is different from the read predicate. - if (is_block_reduce) { - grid_reduction->setPredicate(IrBuilder::create( - GpuLower::current()->kernel()->trueVal())); - } else { - grid_reduction->setPredicate(rop->predicate()); - } - } + // If not grid reduction, just append the new ReductionOp node + if (!has_grid_reduce) { + pushBack(indexed_rop); + return; + } + + handleGridReduction(indexed_rop); +} + +void IndexLowering::handleGridReduction(ReductionOp* indexed_rop) { + const auto out_tv = indexed_rop->out()->as()->view(); + const auto out_domain = out_tv->domain(); + + TORCH_INTERNAL_ASSERT(out_domain->hasGridReduction()); + + // If we do a grid reduction we can't have a reduction axis that is not bound + // to a grid or block dim. + TORCH_INTERNAL_ASSERT( + std::none_of( + out_domain->domain().begin(), + out_domain->domain().end(), + [](IterDomain* id) { + return !id->isThread() && id->isReduction() && + !id->extent()->isOneInt(); + }), + "Found a reduction stage that has both a non-parallelized ", + "reduction and a grid reduction. This is not supported, ", + "please use rfactor to do the serialized reduction first, ", + "then the grid reduction."); + + // When using the fused reduction in a loop, the global work buffer + // is double buffered to save global synchronizations. + auto is_within_a_loop = std::any_of( + out_domain->domain().begin(), + out_domain->domain().end(), + [](IterDomain* id) { return !isTrivialIterDomain(id); }); + + const auto reduce_buffer = allocGlobalBufferForGridComm( + getGridCommWorkBufferSize( + out_domain, indexed_rop->isFused() && is_within_a_loop ? 2 : 1), + indexed_rop->out()->dtype(), + false); + + const auto sync_buffer = allocGlobalBufferForGridComm( + getGridSyncBufferSize(out_domain), DataType::Int, true); - if (rop->writePredicate()) { - grid_reduction->setWritePredicate(rop->writePredicate()); + const bool block_reduce_separated = + out_domain->hasBlockReduction() && !indexed_rop->isFused(); + + // The thread predicate for GridReduction needs to be set + // separately from the main predicate. Do not combine them like + // other expressions. + const auto& thread_pred = + GpuLower::current()->threadPredMap().getPredicatedParallelTypes(out_tv); + + auto grid_reduction = IrBuilder::create( + indexed_rop, reduce_buffer, sync_buffer); + + grid_reduction->setThreadPredicate(thread_pred); + + // If preceded by a blockReduce, all thread blocks should have + // valid inputs to gridReduce. In fact, using the original + // predicate does not work when the write predicate of the + // blockReduce is different from the read predicate. 
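Here the grid-communication work buffer is sized with an expansion factor of 2 whenever the fused (allreduce) reduction sits inside a non-trivial loop, so consecutive iterations can alternate between the two halves of the buffer rather than paying an extra grid-wide synchronization to reclaim it. A rough, hedged model of that sizing (the real code walks the parallel dimension map; the extents below are placeholders):

```cpp
// Rough model: one work-buffer slot per participating block/thread along the
// non-reduced parallel dimensions, doubled when the fused reduction runs
// inside a loop so iteration i+1 can fill its half while iteration i's half
// is still being read.
#include <cstdint>
#include <vector>

int64_t toyGridCommWorkBufferSize(
    const std::vector<int64_t>& active_parallel_extents,
    bool fused_reduction_inside_loop) {
  int64_t size = fused_reduction_inside_loop ? 2 : 1; // expansion_factor
  for (int64_t extent : active_parallel_extents) {
    if (extent > 1) {
      size *= extent;
    }
  }
  return size;
}
```

`handleGridWelford` below applies the same doubling to its avg/var/N buffers.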
+ if (indexed_rop->predicate()) { + if (block_reduce_separated) { + grid_reduction->setPredicate(IrBuilder::create( + GpuLower::current()->kernel()->trueVal())); + } else { + grid_reduction->setPredicate(indexed_rop->predicate()); } + } - pushBack(reduce_buffer); - pushBack(sync_buffer); - pushBack(grid_reduction); + if (indexed_rop->writePredicate()) { + grid_reduction->setWritePredicate(indexed_rop->writePredicate()); } - if (!is_block_reduce && !is_grid_reduce) { - pushBack( - IrBuilder::create(rop->getReductionOpType(), out, out, in)); + // Push back the reduction op when block reduction is done + // separately. Otherwise, the reduction op is just referenced from + // the grid reduction op. + if (block_reduce_separated) { + pushBack(indexed_rop); + } + + pushBack(reduce_buffer); + pushBack(sync_buffer); + pushBack(grid_reduction); + + if (indexed_rop->isFused()) { + // When using the fused reduction, allocate the reduction object at + // the outer-most scope + auto fused_reduction_alloc_reduction = + IrBuilder::create(grid_reduction); + insertAtTopLevel(fused_reduction_alloc_reduction); } } @@ -264,12 +310,12 @@ void IndexLowering::handle(const WelfordOp* wop) { const auto out_tv = wop->outAvg()->as(); const auto out_domain = out_tv->domain(); - const bool is_block_reduce = out_domain->hasBlockReduction(); - const bool is_grid_reduce = out_domain->hasGridReduction(); + const bool has_block_reduce = out_domain->hasBlockReduction(); + const bool has_grid_reduce = out_domain->hasGridReduction(); // If we do a grid reduction we can't have a reduction axis that is not bound // to a grid or block dim () - if (is_grid_reduce) { + if (has_grid_reduce) { TORCH_INTERNAL_ASSERT( std::none_of( out_domain->domain().begin(), @@ -298,7 +344,7 @@ void IndexLowering::handle(const WelfordOp* wop) { auto out_var = lowerDstIndex(wop->outVar()); auto out_N = lowerDstIndex(wop->outN()); - WelfordOp* welford_op = IrBuilder::create( + WelfordOp* indexed_wop = IrBuilder::create( out_avg, out_var, out_N, @@ -307,70 +353,111 @@ void IndexLowering::handle(const WelfordOp* wop) { wop->initN(), in_avg, in_var, - in_N); + in_N, + wop->isFused()); - WelfordOp* block_welford_op = nullptr; + if (wop->predicate()) { + indexed_wop->setPredicate(wop->predicate()); + } + if (wop->writePredicate()) { + indexed_wop->setWritePredicate(wop->writePredicate()); + } - if (is_block_reduce) { - block_welford_op = welford_op; - if (wop->predicate()) { - block_welford_op->setPredicate(wop->predicate()); - } - if (wop->writePredicate()) { - block_welford_op->setWritePredicate(wop->writePredicate()); - } - pushBack(block_welford_op); + // Serial welford + if (!has_block_reduce && !has_grid_reduce) { + pushBack(indexed_wop); + return; } - if (is_grid_reduce) { - // Buffer allocation - const auto work_buffer_size = getGridCommWorkBufferSize(out_domain); - - const auto out_var_buffer = - allocGlobalBufferForGridComm(work_buffer_size, out_var->dtype(), false); - const auto out_avg_buffer = - allocGlobalBufferForGridComm(work_buffer_size, out_avg->dtype(), false); - const auto out_N_buffer = - allocGlobalBufferForGridComm(work_buffer_size, out_N->dtype(), false); - - const auto sync_buffer = allocGlobalBufferForGridComm( - getGridSyncBufferSize(out_domain), DataType::Int, true); - - // Grid Welford instantiation - const auto grid_welford_op = - (block_welford_op == nullptr) ? welford_op : block_welford_op; - - // The thread predicate for GridReduction needs to be set - // separately from the main predicate. 
Do not combine them like - // other expressions. - const auto& thread_pred = - GpuLower::current()->threadPredMap().getPredicatedParallelTypes(out_tv); - - auto grid_welford = IrBuilder::create( - grid_welford_op, - out_var_buffer, - out_avg_buffer, - out_N_buffer, - sync_buffer); - - grid_welford->setThreadPredicate(thread_pred); - - if (wop->predicate()) { - grid_welford->setPredicate(wop->predicate()); + // Block-only welford + if (!has_grid_reduce) { + pushBack(indexed_wop); + return; + } + + handleGridWelford(indexed_wop); +} + +void IndexLowering::handleGridWelford(WelfordOp* indexed_wop) { + const auto out_tv = indexed_wop->out()->as()->view(); + const auto out_domain = out_tv->domain(); + + // Buffer allocation + // When using the fused reduction in a loop, the global work buffer + // is double buffered to save global synchronizations. + auto is_within_a_loop = std::any_of( + out_domain->domain().begin(), + out_domain->domain().end(), + [](IterDomain* id) { return !isTrivialIterDomain(id); }); + + const auto work_buffer_size = getGridCommWorkBufferSize( + out_domain, indexed_wop->isFused() && is_within_a_loop ? 2 : 1); + + const auto out_var_buffer = allocGlobalBufferForGridComm( + work_buffer_size, indexed_wop->outVar()->dtype(), false); + const auto out_avg_buffer = allocGlobalBufferForGridComm( + work_buffer_size, indexed_wop->outAvg()->dtype(), false); + const auto out_N_buffer = allocGlobalBufferForGridComm( + work_buffer_size, indexed_wop->outN()->dtype(), false); + + const auto sync_buffer = allocGlobalBufferForGridComm( + getGridSyncBufferSize(out_domain), DataType::Int, true); + + // The thread predicate for GridReduction needs to be set + // separately from the main predicate. Do not combine them like + // other expressions. + const auto& thread_pred = + GpuLower::current()->threadPredMap().getPredicatedParallelTypes(out_tv); + + auto grid_welford = IrBuilder::create( + indexed_wop, out_var_buffer, out_avg_buffer, out_N_buffer, sync_buffer); + + grid_welford->setThreadPredicate(thread_pred); + + const bool block_reduce_separated = + out_domain->hasBlockReduction() && !indexed_wop->isFused(); + + if (indexed_wop->predicate()) { + if (block_reduce_separated) { + grid_welford->setPredicate(IrBuilder::create( + GpuLower::current()->kernel()->trueVal())); + } else { + grid_welford->setPredicate(indexed_wop->predicate()); } + } - pushBack(out_var_buffer); - pushBack(out_avg_buffer); - pushBack(out_N_buffer); - pushBack(sync_buffer); - pushBack(grid_welford); + if (indexed_wop->writePredicate()) { + grid_welford->setWritePredicate(indexed_wop->writePredicate()); } - if (!is_block_reduce && !is_grid_reduce) { - pushBack(welford_op); + if (block_reduce_separated) { + pushBack(indexed_wop); + } + + pushBack(out_var_buffer); + pushBack(out_avg_buffer); + pushBack(out_N_buffer); + pushBack(sync_buffer); + pushBack(grid_welford); + + if (indexed_wop->isFused()) { + // When using the fused reduction, allocate the reduction object at + // the outer-most scope + auto fused_reduction_alloc_reduction = + IrBuilder::create(grid_welford); + insertAtTopLevel(fused_reduction_alloc_reduction); } } +void IndexLowering::handle(const MmaOp* mma) { + const auto a = lowerSrcIndex(mma->inA(), mma->out()); + const auto b = lowerSrcIndex(mma->inB(), mma->out()); + const auto out = lowerDstIndex(mma->out()); + auto mma_indexed = + IrBuilder::create(out, a, b, mma->init(), mma->options()); + pushBack(mma_indexed); +} + void IndexLowering::handle(const BroadcastOp* bop) { 
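`handleGridWelford` allocates three communication buffers (avg, var, N) because a Welford partial result is a triple that has to be merged across blocks, not a single scalar. For reference, a host-side version of that merge (the standard parallel Welford combine; the `var` buffer is assumed to carry the unnormalized sum of squared deviations, with the count kept separately in `N`):

```cpp
// Host-side reference for combining two Welford partial results
// (mean, M2 = unnormalized variance, count). This is the merge that the
// per-block avg/var/N work buffers exist to communicate.
struct WelfordTriple {
  double avg = 0.0;
  double m2 = 0.0;      // sum of squared deviations from the mean
  long long n = 0;
};

WelfordTriple combineWelford(const WelfordTriple& a, const WelfordTriple& b) {
  WelfordTriple out;
  out.n = a.n + b.n;
  if (out.n == 0) {
    return out;
  }
  const double delta = b.avg - a.avg;
  const double nb_over_n = static_cast<double>(b.n) / static_cast<double>(out.n);
  out.avg = a.avg + delta * nb_over_n;
  out.m2 = a.m2 + b.m2 + delta * delta * static_cast<double>(a.n) * nb_over_n;
  return out;
}
```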
TORCH_INTERNAL_ASSERT(ir_utils::isTvOp(bop)); @@ -423,9 +510,14 @@ void IndexLowering::handle(const kir::Allocate* allocate) { pushBack(const_cast(allocate)); // NOLINT } -void IndexLowering::handle(const kir::Sync* sync) { +void IndexLowering::handle(const kir::BlockSync* sync) { + // TODO(kir): remove the need for const_cast + pushBack(const_cast(sync)); // NOLINT +} + +void IndexLowering::handle(const kir::GridSync* sync) { // TODO(kir): remove the need for const_cast - pushBack(const_cast(sync)); // NOLINT + pushBack(const_cast(sync)); // NOLINT } void IndexLowering::generate(const std::vector& exprs) { diff --git a/torch/csrc/jit/codegen/cuda/lower_index.h b/torch/csrc/jit/codegen/cuda/lower_index.h index 2f3af0061e1898..78d6bb2a02fb78 100644 --- a/torch/csrc/jit/codegen/cuda/lower_index.h +++ b/torch/csrc/jit/codegen/cuda/lower_index.h @@ -30,23 +30,32 @@ class TORCH_CUDA_CU_API IndexLowering : private OptOutConstDispatch { void pushBack(Expr*); + // Insert an expression before the current top-level expression. + void insertAtTopLevel(Expr* expr); + void handle(const UnaryOp*) final; void handle(const BinaryOp*) final; void handle(const TernaryOp*) final; void handle(const ReductionOp*) final; void handle(const WelfordOp*) final; + void handle(const MmaOp*) final; void handle(const BroadcastOp*) final; void handle(const kir::ForLoop*) final; void handle(const kir::IfThenElse*) final; void handle(const kir::Allocate*) final; - void handle(const kir::Sync*) final; + void handle(const kir::BlockSync*) final; + void handle(const kir::GridSync*) final; void generate(const std::vector& exprs); Val* lowerSrcIndex(Val* val, Val* dst) const; + Val* lowerDstIndex(Val* dst) const; + void handleGridReduction(ReductionOp* new_rop); + void handleGridWelford(WelfordOp* new_wop); + private: std::vector lowered_exprs_; diff --git a/torch/csrc/jit/codegen/cuda/lower_index_hoist.cpp b/torch/csrc/jit/codegen/cuda/lower_index_hoist.cpp new file mode 100644 index 00000000000000..699c887816f8d6 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/lower_index_hoist.cpp @@ -0,0 +1,326 @@ +#include +#include +#include +#include + +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +namespace { + +// Return leaf domains of a given domain. +std::unordered_set getUsedLeafIds( + IterDomain* id, + TensorDomain* td) { + const auto all_vals_between = DependencyCheck::getAllValsBetween( + {id}, {td->domain().begin(), td->domain().end()}); + + std::unordered_set used_leaf_ids; + + for (const auto leaf : td->domain()) { + if (std::find(all_vals_between.begin(), all_vals_between.end(), leaf) != + all_vals_between.end()) { + used_leaf_ids.insert(leaf); + } + } + + TORCH_INTERNAL_ASSERT( + !used_leaf_ids.empty(), + "No used id found: ", + id->toString(), + ", ", + td->toString()); + + return used_leaf_ids; +} + +} // namespace + +CommonIndexKey::CommonIndexKey( + IterDomain* consumer_indexed_id, + TensorDomain* consumer_td, + TensorDomain* ref_td, + const std::unordered_map& ref_index_map, + const std::vector& loops) { + auto gpu_lower = GpuLower::current(); + + concrete_indexed_id_ = + gpu_lower->caIndexMap().getConcreteMappedID(consumer_indexed_id); + + const auto consumer_leaf_ids = + getUsedLeafIds(consumer_indexed_id, consumer_td); + + // Convert to Parallel concrete IDs to find matching loops. 
+ std::unordered_set concrete_leaf_ids; + for (auto& id : consumer_leaf_ids) { + concrete_leaf_ids.insert( + gpu_lower->caParallelMap().getConcreteMappedID(id)); + } + + // Find used loops and their index vals + for (const auto i : c10::irange(loops.size())) { + auto loop = loops.at(i); + auto loop_id = + gpu_lower->caParallelMap().getConcreteMappedID(loop->iter_domain()); + auto it = concrete_leaf_ids.find(loop_id); + if (it != concrete_leaf_ids.end()) { + // This leaf reference id is used for indexing the consumer id + used_loops_.push_back(loop); + auto index_it = ref_index_map.find(ref_td->axis(i)); + TORCH_INTERNAL_ASSERT( + index_it != ref_index_map.end(), + "Index not found for leaf ID, ", + ref_td->axis(i)->toString()); + loop_index_vals_.push_back(index_it->second); + } + } + + TORCH_INTERNAL_ASSERT( + !used_loops_.empty(), + "No loop used for indexing found. ", + consumer_indexed_id->toString()); + + TORCH_INTERNAL_ASSERT( + consumer_leaf_ids.size() == used_loops_.size(), + "consumer_leaf_ids.size() = ", + consumer_leaf_ids.size(), + ", used_loops_.size() == ", + used_loops_.size(), + ", loops.size() == ", + loops.size()); +} + +bool CommonIndexKey::operator==(const CommonIndexKey& other) const { + auto gpu_lower = GpuLower::current(); + + if (concrete_indexed_id_ != other.concrete_indexed_id_) { + return false; + } + + if (used_loops_.size() != other.used_loops_.size()) { + return false; + } + + for (const auto i : c10::irange(used_loops_.size())) { + auto lhs_loop = used_loops_.at(i); + auto rhs_loop = other.used_loops_.at(i); + if (lhs_loop == rhs_loop) { + continue; + } + if (gpu_lower->caLoopMap().areMapped( + lhs_loop->iter_domain(), rhs_loop->iter_domain()) && + lhs_loop->isTrivial() && rhs_loop->isTrivial()) { + continue; + } + return false; + } + + for (const auto i : c10::irange(loop_index_vals_.size())) { + auto lhs_index = loop_index_vals_.at(i); + auto rhs_index = other.loop_index_vals_.at(i); + if (lhs_index == rhs_index) { + continue; + } + // Initial index variables can have some additions such as magic + // zero and "1" when used in producer indexing for double buffered + // tensors. Thus, the initial variables themselves may be + // different, and its components need to be examined. An easy way + // is to flatten them to strings as follows. 
+ auto lhs_str = loop_index_vals_.at(i)->toInlineString(); + auto rhs_str = other.loop_index_vals_.at(i)->toInlineString(); + if (lhs_str == rhs_str) { + continue; + } + + return false; + } + + return true; +} + +std::string CommonIndexKey::toString() const { + TORCH_INTERNAL_ASSERT(concrete_indexed_id_ != nullptr); + std::stringstream ss; + ss << "CommonIndexKey: " << concrete_indexed_id_->toString(); + ss << ", { "; + for (auto loop : used_loops_) { + ss << loop->iter_domain()->toString() << " "; + } + ss << "}"; + ss << ", { "; + for (auto val : loop_index_vals_) { + ss << val->toString() << " "; + } + ss << "}"; + return ss.str(); +} + +std::pair CommonIndexMap::insert( + IterDomain* indexed_consumer_id, + TensorDomain* consumer_td, + TensorDomain* ref_td, + const std::unordered_map& ref_index_map, + const std::vector& loops, + Val* index) { + if (index->definition() == nullptr) { + // Only expression is eligible to hoist + return {index, false}; + } + + const CommonIndexKey key( + indexed_consumer_id, consumer_td, ref_td, ref_index_map, loops); + + Val* hoisted_index = nullptr; + bool new_index_inserted = false; + + // If already mapped, return the previously mapped index + auto it = common_index_map_.find(key); + if (it != common_index_map_.end()) { + hoisted_index = it->second; + new_index_inserted = false; + ++use_counts_.at(key); + } else { + common_index_map_.emplace(key, index); + hoisted_index = index; + new_index_inserted = true; + use_counts_[key] = 1; + } + + return {hoisted_index, new_index_inserted}; +} + +namespace { + +//! Insertion point of allocation +struct CommonIndexInsertionInfo { + Expr* ref = nullptr; + kir::Scope* scope = nullptr; +}; + +// Inserts allocations of hoisted indices +class CommonIndexInserter : private kir::ExprMutator { + public: + static std::vector run( + const std::vector& exprs, + const CommonIndexMap& common_indices) { + CommonIndexInserter inserter(exprs, common_indices); + return inserter.exprs_; + } + + private: + CommonIndexInserter( + const std::vector& exprs, + const CommonIndexMap& common_index_map) + : common_index_map_(common_index_map) { + // Create a map to keys from loops where they should be inserted + for (const auto& kv : common_index_map.commonIndexMap()) { + const auto& key = kv.first; + // Only consider indices used multiple times + if (!usedMultipleTimes(key)) { + continue; + } + TORCH_INTERNAL_ASSERT(!key.usedLoops().empty()); + auto insertion_loop = key.usedLoops().back(); + innermost_used_loop_map_[insertion_loop].push_back(key); + } + + traverseAndInsert(exprs); + } + + CommonIndexInsertionInfo findInsertionPoint( + const CommonIndexKey& key, + kir::ForLoop* current_loop) const { + CommonIndexInsertionInfo info; + + // Allocation must be inside any used non-trivial loop. Since the + // loop index value is constant if a loop is trivial, allocation + // does not need to be inside trivial loops. + for (const auto loop : key.usedLoops()) { + if (!loop->isTrivial()) { + info.ref = loop->body()[0]; + info.scope = &(loop->body()); + } + } + + // If no non-trivial used loop is found, insert at the top-level + // scope just before the outer-most loop. + if (info.ref == nullptr) { + info.ref = scope_exprs_.empty() ? 
current_loop : scope_exprs_.at(0); + info.scope = nullptr; + } + + return info; + } + + using kir::ExprMutator::handle; + + void handle(kir::ForLoop* loop) final { + auto innermost_loop_map_it = innermost_used_loop_map_.find(loop); + if (innermost_loop_map_it == innermost_used_loop_map_.end()) { + kir::ExprMutator::handle(loop); + return; + } + + for (const auto& key : innermost_loop_map_it->second) { + const auto common_index = common_index_map_.commonIndexMap().at(key); + + // Insert only when the index is used multiple times and is not + // yet inserted. + if (inserted_indices_.find(common_index) != inserted_indices_.end()) { + continue; + } + + auto alloc = IrBuilder::create( + common_index, + MemoryType::Local, + GpuLower::current()->kernel()->oneVal()); + const auto common_index_def = common_index->definition(); + TORCH_INTERNAL_ASSERT( + common_index_def != nullptr, + "Hosted index must have a definition. ", + common_index->toString()); + + const auto insertion_info = findInsertionPoint(key, loop); + registerInsertBefore(insertion_info.ref, alloc, insertion_info.scope); + registerInsertBefore( + insertion_info.ref, common_index_def, insertion_info.scope); + + // Track inserted index + inserted_indices_.emplace(common_index); + } + + kir::ExprMutator::handle(loop); + } + + bool usedMultipleTimes(const CommonIndexKey& key) { + auto it = common_index_map_.useCounts().find(key); + TORCH_INTERNAL_ASSERT( + it != common_index_map_.useCounts().end(), + "Key not found in the use-count map: ", + key.toString()); + return it->second > 1; + } + + private: + const CommonIndexMap& common_index_map_; + //! Map to CommonIndexKeys from their innermost used loops + std::unordered_map> + innermost_used_loop_map_; + //! Keep track of inserted indices + std::unordered_set inserted_indices_; +}; + +} // namespace + +std::vector allocateCommonIndices(const std::vector& exprs) { + return CommonIndexInserter::run(exprs, GpuLower::current()->commonIndexMap()); +} + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/lower_index_hoist.h b/torch/csrc/jit/codegen/cuda/lower_index_hoist.h new file mode 100644 index 00000000000000..5e0256f9e84498 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/lower_index_hoist.h @@ -0,0 +1,121 @@ +#pragma once + +#include + +#include +#include +#include + +// Hoisting common index subexpressions +// +// Class CommonIndexMap is updated during the lowering as new indices +// are inserted. An index is uniquely identified with CommonIndexKey, +// which consists of the concrete ID of the indexed/predicated domain, +// the for-loops used in the index, and the index vals of the use +// for-loops. +// +// Once all indices are inserted to CommonIndexMap, allocations of the +// the hoisted indices are inserted by allocateCommonIndices. Note +// that this assumes that the CUDA code generator does not inline a +// scalar Val with allocation (PR #1434). + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +//! Class to represent unique indexed domains for index +//! hoisting. Uniquenesss is determined with the indexed domain +//! itself, the for-loops and their index values. +class CommonIndexKey { + friend struct CommonIndexKeyHash; + + public: + //! \param consumer_indexed_id Indexed consumer domain + //! \param consumer_td TensorDomain of consumer_indexed_id + //! \param ref_td Reference domain at the time of indexing + //! \param ref_index_map Index map of the reference domain + //! 
\param loops Loop structure where this id is indexed + CommonIndexKey( + IterDomain* consumer_indexed_id, + TensorDomain* consumer_td, + TensorDomain* ref_td, + const std::unordered_map& ref_index_map, + const std::vector& loops); + + const IterDomain* concreteIndexedId() const { + return concrete_indexed_id_; + } + + const std::vector& usedLoops() const { + return used_loops_; + } + + const std::vector& loopIndexVals() const { + return loop_index_vals_; + } + + bool operator==(const CommonIndexKey& other) const; + + std::string toString() const; + + private: + //! Concrete domain of indexed domain + IterDomain* concrete_indexed_id_ = nullptr; + //! Loops used for the index + std::vector used_loops_; + //! Loop index vals for the used loops + std::vector loop_index_vals_; +}; + +struct CommonIndexKeyHash { + std::size_t operator()(const CommonIndexKey& key) const { + auto h = std::hash{}(key.concrete_indexed_id_); + // NOTE: do not use other fields as the pointers can be different + // even when two keys can share the same index + return h; + } +}; + +//! Map to hold hoisted common indices +class TORCH_CUDA_CU_API CommonIndexMap { + public: + //! Register an indexd consumer domain to hoist + //! + //! Returns a corresponding hoisted index and a flag indicating if a + //! new index is inserted. + //! + //! Consumer domains are used even for producer indexing since + //! producer domains in producer indexing are temporary replay + //! domains. + std::pair insert( + IterDomain* indexed_consumer_id, + TensorDomain* consumer_td, + TensorDomain* ref_td, + const std::unordered_map& ref_index_map, + const std::vector& loops, + Val* index); + + const auto& commonIndexMap() const { + return common_index_map_; + } + + const auto& useCounts() const { + return use_counts_; + } + + private: + //! Map to hold hoisted common indices + std::unordered_map + common_index_map_; + std::unordered_map use_counts_; +}; + +//! Insert allocations of hoisted indices. Must be called after +//! collecting all common indices. +std::vector allocateCommonIndices(const std::vector& exprs); + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/lower_insert_syncs.cpp b/torch/csrc/jit/codegen/cuda/lower_insert_syncs.cpp index 77be88183eccb4..1acf33150cc401 100644 --- a/torch/csrc/jit/codegen/cuda/lower_insert_syncs.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_insert_syncs.cpp @@ -145,7 +145,21 @@ class WarSyncInserter : private kir::ExprMutator { kir::ExprMutator::handle(ite); } - void handle(kir::Sync* sync) final { + void handle(kir::BlockSync* sync) final { + // Register the sync for the active for loop + sync_hit_.back() = true; + // Run through the active allocations, if a read was hit, register there was + // a sync after the read. If there's subsequent reads on this buffer the + // sync_after_read will be cleared. 
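Stepping back to the index-hoisting files introduced just above (`lower_index_hoist.cpp`/`.h`): a common index is keyed by the indexed concrete domain plus the loops and loop index values it uses, and its allocation and definition are emitted once, at the innermost non-trivial loop that uses it (or at the top level). At the generated-code level the effect is ordinary common-subexpression elimination on indices; a simplified, self-contained illustration (not generated code, names made up):

```cpp
// Simplified illustration of what hoisting a common index buys: an index
// expression shared by several accesses in the same loop nest is computed
// into one named local and reused, instead of being re-derived at every use.
void addWithHoistedIndex(
    float* out,
    const float* a,
    const float* b,
    int n,
    int offset,
    int stride) {
  for (int i = 0; i < n; ++i) {
    const int idx = offset + i * stride; // the hoisted common index
    out[idx] = a[idx] + b[idx];          // all uses share one computation
  }
}
```

The sync-insertion changes in `lower_insert_syncs.cpp` continue below.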
+ for (auto& entry : smem_allocations_) { + auto& alloc_stack = entry.second; + if (alloc_stack.back().read_hit) { + alloc_stack.back().sync_after_read = true; + } + } + } + + void handle(kir::GridSync* sync) final { // Register the sync for the active for loop sync_hit_.back() = true; // Run through the active allocations, if a read was hit, register there was @@ -191,9 +205,11 @@ class WarSyncInserter : private kir::ExprMutator { // Mark write has been hit for all output tvs auto out_tvs = ir_utils::filterByType(expr->outputs()); for (auto out_tv : out_tvs) { - if (out_tv->getMemoryType() != MemoryType::Shared) { + if (out_tv->getMemoryType() != MemoryType::Shared || + GpuLower::current()->syncMap().needsRawSync(out_tv).none()) { continue; } + auto& entry = getMemInfo(out_tv); // If this is the first write and there's a sync in one of the loops after @@ -207,9 +223,11 @@ class WarSyncInserter : private kir::ExprMutator { // Mark read was hit, if sync_after_read was set, clear it. auto inp_tvs = ir_utils::filterByType(expr->inputs()); for (auto inp_tv : inp_tvs) { - if (inp_tv->getMemoryType() != MemoryType::Shared) { + if (inp_tv->getMemoryType() != MemoryType::Shared || + GpuLower::current()->syncMap().needsRawSync(inp_tv).none()) { continue; } + auto& entry = getMemInfo(inp_tv); entry.read_hit = true; // Clear the sync_after_read if it was set because there was another write @@ -223,10 +241,7 @@ class WarSyncInserter : private kir::ExprMutator { sync_hit_.push_back(false); // If there is no real iterating loop WAR syncs aren't necessary - within_iter_loop_ = within_iter_loop_ || - !(for_loop->iter_domain()->isThread() || - for_loop->iter_domain()->isBroadcast() || - for_loop->iter_domain()->extent()->isOneInt()); + within_iter_loop_ = within_iter_loop_ || !for_loop->isTrivial(); // Process the expressions in the for loop kir::ExprMutator::handle(for_loop); @@ -260,7 +275,7 @@ class WarSyncInserter : private kir::ExprMutator { // WAR Sync is necessary in this loop, register its insertion. if (insert_sync) { - auto sync_expr = IrBuilder::create(true); + auto sync_expr = IrBuilder::create(true); kir::ExprMutator::registerInsertAfter( for_loop->body().exprs().back(), sync_expr, &for_loop->body()); handle(sync_expr); @@ -376,15 +391,56 @@ class ValidatePlacementAfterWrites : private kir::IrVisitor { const std::unordered_set& writes_; }; +namespace { + +Val* getGridSyncBufferSize(const ParallelTypeBitmap& ptb) { + // See the comment above for getGridCommWorkBufferSize. 
+ TORCH_INTERNAL_ASSERT( + ptb.hasBID(), + "Detected needing a grid sync but no grid bits set in bitmap."); + Val* buffer_size = GpuLower::current()->kernel()->oneVal(); + for (auto pt : kParallelTypeBIDs) { + if (!ptb.get(pt)) { + continue; + } + auto pt_dim = GpuLower::current()->parallelDimensionMap().get(pt); + if (pt_dim == nullptr || pt_dim->isOneInt()) { + continue; + } + buffer_size = IrBuilder::mulExpr(buffer_size, pt_dim); + } + return buffer_size; +} + +// Copied from lower_index.cpp, may be worth either removing this function and +// doing it inline or reusing the function from lower_index.cpp +kir::Allocate* allocGlobalBufferForGridComm( + Val* buffer_size, + DataType dtype, + bool zero_init) { + const std::vector new_buffer_ids = { + IrBuilder::create( + GpuLower::current()->kernel()->zeroVal(), buffer_size)}; + const auto buffer_domain = IrBuilder::create(new_buffer_ids); + const auto buffer_tv = + IrBuilder::create(buffer_domain, dtype, MemoryType::Global); + return IrBuilder::create( + buffer_tv, buffer_tv->getMemoryType(), nullptr, zero_init); +} + +} // namespace + class ReadAfterWriteSyncs : public kir::ExprMutator { private: using kir::ExprMutator::handle; //! Traverse up the loop stack from loops_it and if a halo loop is //! found, place a given sync expr before the outer-most halo loop. + // TODO: What needs to be done here for gmem comm? bool insertBeforeHaloLoop( std::vector::iterator loops_it, - kir::Sync* sync_expr, + Expr* sync_expr, + Expr* maybe_alloc, const std::unordered_set& writes) { std::vector::iterator halo_loop_it; bool halo_loop_found = false; @@ -424,6 +480,10 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { auto place_in = *(halo_loop_it - 1); kir::ExprMutator::registerInsertBefore( halo_loop, sync_expr, &place_in->body()); + if (maybe_alloc != nullptr) { + kir::ExprMutator::registerInsertBefore( + halo_loop, maybe_alloc, &place_in->body()); + } } return true; @@ -435,7 +495,8 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { return; } - if (sync_after_.size() > 0 && sync_after_.front() == expr) { + if (sync_after_.size() > 0 && sync_after_.front().first == expr) { + auto sync_bitmap = sync_after_.front().second; sync_after_.pop_front(); auto last_writes = last_writes_.front(); last_writes_.pop_front(); @@ -450,8 +511,16 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { // TODO: This may be a common operation, could be worth making a utility // out of or saving state for tensor view ID -> for loop // TODO: Explicitly test the 3 cases below - - auto sync_expr = IrBuilder::create(); + Expr* sync_expr = nullptr; + kir::Allocate* maybe_alloc = nullptr; + if (sync_bitmap.hasBID()) { + maybe_alloc = allocGlobalBufferForGridComm( + getGridSyncBufferSize(sync_bitmap), DataType::Int, true); + sync_expr = IrBuilder::create( + sync_bitmap, maybe_alloc->buffer()); + } else { + sync_expr = IrBuilder::create(); + } if (out_tv->getComputeAtPosition() == 0) { // Sync should be placed at global scope, after its outer most loop if // it has one. 
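`ReadAfterWriteSyncs` now records, per expression, a bitmap of the parallel types that actually communicate and derives the sync from it: any blockIdx bit means a `GridSync` plus a freshly allocated global semaphore buffer, otherwise a `BlockSync` suffices. A toy version of that decision (the six-bit layout is an assumption; the real code uses `ParallelTypeBitmap::hasBID()`):

```cpp
// Toy model of the sync-kind decision made when a RAW dependency is found.
#include <bitset>

enum class ToySyncKind { None, Block, Grid };

using ToyParallelMask = std::bitset<6>; // bits 0-2: TIDx/y/z, bits 3-5: BIDx/y/z

ToySyncKind toySyncKindFor(ToyParallelMask raw_comm_dims) {
  if (raw_comm_dims.none()) {
    return ToySyncKind::None; // no cross-thread communication detected
  }
  const ToyParallelMask bid_bits(0b111000); // BID bits under the assumed layout
  if ((raw_comm_dims & bid_bits).any()) {
    return ToySyncKind::Grid; // grid sync plus a global semaphore buffer
  }
  return ToySyncKind::Block;  // __syncthreads-style block sync is enough
}
```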
@@ -465,8 +534,10 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { "Tried to place after, ", place_after->toString(), ", but could not find this expression at the global scope."); - - registerInsertAfter(*(place_after_it + 1), sync_expr, nullptr); + registerInsertAfter(*(place_after_it), sync_expr, nullptr); + if (maybe_alloc != nullptr) { + registerInsertAfter(place_after, maybe_alloc, nullptr); + } } else { // Find the last loop in computeAt of out_tv, this is the loop where we // would place an allocation for out_tv @@ -485,7 +556,8 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { TORCH_INTERNAL_ASSERT(loops_it != for_loops_.end()); // block sync must be placed before halo-extended loops - if (insertBeforeHaloLoop(loops_it, sync_expr, last_writes)) { + if (insertBeforeHaloLoop( + loops_it, sync_expr, maybe_alloc, last_writes)) { return; } @@ -503,6 +575,9 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { } registerInsertAfter(place_after, sync_expr, &place_in->body()); + if (maybe_alloc != nullptr) { + registerInsertAfter(place_after, maybe_alloc, &place_in->body()); + } } } } @@ -514,11 +589,6 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { "this pass should be run before any conditionals are placed in code."); } - // Clear the modify status for all shared memory buffers - static void cleanSharedMemory(std::unordered_map& smem) { - smem.clear(); - } - // Return a set of expressions that modify shared-memory // tensors. Expressions are excluded when syncthreads are already // placed. @@ -526,7 +596,13 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { const std::unordered_map& smem, const std::vector& tvs) const { std::unordered_set last_writes; - for (auto tv : tvs) { + for (auto tv : ir_utils::filterByType(tvs)) { + if (GpuLower::current()->syncMap().needsRawSync(tv).none()) { + continue; + } + if (tv->getMemoryType() != MemoryType::Shared) { + continue; + } auto it = smem.find(tv); if (it != smem.end()) { last_writes.insert(it->second); @@ -535,10 +611,27 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { return last_writes; } + std::unordered_set isModifiedGlobalMemory( + const std::unordered_map& gmem, + const std::vector& tvs) const { + std::unordered_set last_writes; + for (auto tv : ir_utils::filterByType(tvs)) { + if (GpuLower::current()->syncMap().needsRawSync(tv).none()) { + continue; + } + auto it = gmem.find(tv); + if (it != gmem.end()) { + last_writes.insert(it->second); + } + } + return last_writes; + } + ReadAfterWriteSyncs(const std::vector& _exprs) { // Fusion shared_memory values // Tracks if shared memory is modified std::unordered_map smem; + std::unordered_map gmem; // Flatten all the expressions auto flattened_exprs = ExprFlattener::flatten(_exprs); @@ -549,14 +642,36 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { continue; } - auto last_writes = isModifiedSharedMemory(smem, expr->inputs()); - if (!last_writes.empty()) { + auto last_gmem_writes = isModifiedGlobalMemory(gmem, expr->inputs()); + if (!last_gmem_writes.empty()) { TORCH_INTERNAL_ASSERT( prev_tv_expr != nullptr, "Can't require sync on inputs, however, detected it's needed."); - sync_after_.push_back(prev_tv_expr); - last_writes_.push_back(last_writes); - cleanSharedMemory(smem); + ParallelTypeBitmap bitmap; + for (auto entry : gmem) { + TORCH_INTERNAL_ASSERT(entry.first->isA()); + auto sync_bits = GpuLower::current()->syncMap().needsRawSync( + entry.first->as()); + bitmap |= sync_bits; + } + // Temporarily do full grid sync. 
+ sync_after_.emplace_back(std::make_pair(prev_tv_expr, bitmap)); + last_writes_.push_back(last_gmem_writes); + gmem.clear(); + } + + auto last_smem_writes = isModifiedSharedMemory(smem, expr->inputs()); + if (!last_smem_writes.empty()) { + TORCH_INTERNAL_ASSERT( + prev_tv_expr != nullptr, + "Can't require sync on inputs, however, detected it's needed."); + ParallelTypeBitmap bitmap; + bitmap.set(ParallelType::TIDx); + bitmap.set(ParallelType::TIDy); + bitmap.set(ParallelType::TIDz); + sync_after_.emplace_back(std::make_pair(prev_tv_expr, bitmap)); + last_writes_.push_back(last_smem_writes); + smem.clear(); } for (auto tv : ir_utils::filterByType(expr->outputs())) { @@ -567,6 +682,9 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { !tv->isDoubleBuffered()) { smem[tv] = expr; } + if (tv->getMemoryType() == MemoryType::Global) { + gmem[tv] = expr; + } } prev_tv_expr = expr; @@ -580,7 +698,7 @@ class ReadAfterWriteSyncs : public kir::ExprMutator { private: //! Keep track of expressions that must be followed by syncthreads - std::deque sync_after_; + std::deque> sync_after_; //! Keep track of write expressions that must be placed before //! syncthreads. diff --git a/torch/csrc/jit/codegen/cuda/lower_predicate.cpp b/torch/csrc/jit/codegen/cuda/lower_predicate.cpp index cd34c56b510e7c..166f38d6cf56f7 100644 --- a/torch/csrc/jit/codegen/cuda/lower_predicate.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_predicate.cpp @@ -126,6 +126,12 @@ class ConditionalFromPredicateModifier : public kir::IrVisitor { } }; +void assertOnWarpOps(const Expr* expr) { + TORCH_INTERNAL_ASSERT( + !expr->isA(), + "Mma op: cannot eliminate predicate for mma op, tiling not valid"); +} + } // namespace std::vector generateConditionalFromPredicate( @@ -151,6 +157,8 @@ class PredicateAnalyzer : public OptOutDispatch { // of the parallelized axis is the actual size of the axis, not // the number of threads. Since the number of threads can be // larger than the axis size, it's not safe to skip predication + + // Check that parallel dimension will not generate out of bound index if (!(producer->getMemoryType() == MemoryType::Local && consumer->getMemoryType() == MemoryType::Local)) { return true; @@ -355,6 +363,10 @@ void PredicateElimination::handle(Expr* expr) { } if (needsPredicate(expr)) { + // Warp primitives are currently limited to un-predicated usage, + // predicating these ops will require extra steps to ensure that + // the whole warp will get the same value. 
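The `assertOnWarpOps` guard added in `lower_predicate.cpp` reflects the comment above: warp-level primitives (and the mma path) assume every lane executes them, so an expression that still needs a predicate cannot be lowered to them. A host-side model of a shuffle-xor butterfly all-reduce makes the hazard concrete; it is an illustration only, not code from this change:

```cpp
// Host-side model of a warp butterfly (shuffle-xor) all-reduce. Every lane
// must participate at every step; if a predicate masked some lanes out,
// their partners would read stale partial sums and the skipped lanes would
// never receive the final value. Hence warp/mma ops stay un-predicated.
#include <array>

std::array<float, 32> butterflyAllReduce(std::array<float, 32> lanes) {
  for (int offset = 16; offset > 0; offset /= 2) {
    const std::array<float, 32> snapshot = lanes; // pre-shuffle state
    for (int lane = 0; lane < 32; ++lane) {
      // models __shfl_xor_sync: add the partner lane's current partial sum
      lanes[lane] = snapshot[lane] + snapshot[lane ^ offset];
    }
  }
  return lanes; // each lane now holds the sum over all 32 lanes
}
```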
+ assertOnWarpOps(expr); return; } @@ -392,6 +404,11 @@ void PredicateElimination::handle(Expr* expr) { continue; } + if (expr->isA()) { + setReductionInitValue(input, expr->as()->init()); + continue; + } + // If an input does not need a predicate either, then it should // have some value, so no need to set a default value if (non_predicated_exprs_.find(input_def) != non_predicated_exprs_.end()) { diff --git a/torch/csrc/jit/codegen/cuda/lower_replace_size.cpp b/torch/csrc/jit/codegen/cuda/lower_replace_size.cpp index 582b6d91d067af..beec550e537f6e 100644 --- a/torch/csrc/jit/codegen/cuda/lower_replace_size.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_replace_size.cpp @@ -147,61 +147,6 @@ std::unordered_map getSimplificationMap(Fusion* fusion) { return extent_to_min_input_id_extent; } -std::vector allLeafOuts(Fusion* fusion) { - auto exprs = StmtSort::getExprs(fusion, true); - std::unordered_set inputs; - std::unordered_set outputs; - std::vector ordered_outputs; - for (auto expr : exprs) { - inputs.insert(expr->inputs().begin(), expr->inputs().end()); - outputs.insert(expr->outputs().begin(), expr->outputs().end()); - ordered_outputs.insert( - ordered_outputs.end(), expr->outputs().begin(), expr->outputs().end()); - } - for (auto input : inputs) { - outputs.erase(input); - } - - std::vector ordered_leaf_outs; - for (auto out : ordered_outputs) { - if (outputs.find(out) != outputs.end()) { - ordered_leaf_outs.push_back(out); - } - } - return ordered_leaf_outs; -} - -class ValReplacementMutator : private OptOutMutator { - public: - ValReplacementMutator( - Fusion* fusion, - const std::unordered_map& replacement_map) - : replacement_map_(replacement_map) { - FusionGuard fg(fusion); - - // Welford makes this a little annoying since it holds a count which is - // typically not used by anything else. If we don't grab that count, then it - // would be a tensorview that doesn't get updated extents. Therefore, first - // grab all leaves towards outputs and grab stmts from there. - auto stmts = StmtSort::getStmts(fusion, allLeafOuts(fusion), true); - for (auto stmt : stmts) { - mutate(stmt); - } - } - - private: - using OptOutMutator::mutate; - void mutate(Val* val) final { - if (replacement_map_.find(val) == replacement_map_.end()) { - return OptOutMutator::mutate(val); - } - auto replaced_val = replacement_map_.at(val); - registerMutation(val, replaced_val); - } - - const std::unordered_map& replacement_map_; -}; - } // namespace void replaceSymbolicSizes(Fusion* fusion) { @@ -279,7 +224,7 @@ void replaceSymbolicSizes(Fusion* fusion) { } // Run mutation on the fusion with the tensor_dim_map - ValReplacementMutator(fusion, tensor_dim_map); + ir_utils::replaceValue(fusion, tensor_dim_map); } } // namespace cuda diff --git a/torch/csrc/jit/codegen/cuda/lower_sync_information.cpp b/torch/csrc/jit/codegen/cuda/lower_sync_information.cpp new file mode 100644 index 00000000000000..8ab11140f497a8 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/lower_sync_information.cpp @@ -0,0 +1,451 @@ + +#include +#include +#include + +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +namespace { + +// Validate parallelization of a single tensor +void validateParallelizationOfTensor(TensorView* tv) { + // Each ParallelType can be used only once. 
+ ParallelTypeBitmap pt_map; + for (size_t i = 0; i < tv->nDims(); ++i) { + auto axis = tv->axis(i); + auto ptype = axis->getParallelType(); + if (!isParallelTypeThread(ptype)) { + continue; + } + + // It doesn't matter if this axis is a non-concretized broadcast + // TODO: merging broadcast and non-broadcast + if (axis->isBroadcast() && + !GpuLower::current()->concretizedBroadcastDomains().isConcretized( + axis)) { + continue; + } + + TORCH_INTERNAL_ASSERT( + !pt_map.get(ptype), + "Multiple use of ", + ptype, + " in tensor t", + tv->name(), + ": ", + tv); + pt_map.set(ptype); + } + + // If this tensor is predicated by a paralel type, it should not be + // used to parallelize any domain of this tensor + + const auto thread_pred = + GpuLower::current()->threadPredMap().getPredicateInfo(tv); + + auto predicated_parallel_types = pt_map & thread_pred.limited_types; + + TORCH_INTERNAL_ASSERT( + predicated_parallel_types.none(), + "Invalid parallelization of tensor t", + tv->name(), + ". The tensor is parallelized with ", + predicated_parallel_types.toString(), + ", but it's invalid to use the types as the tensor is also predicated with them.", + ", thread pred: ", + thread_pred.limited_types.toString()); +} + +//! Return true if axis is derived from a root axis that is an input +//! to a CA leaf axis. +bool derivedFromRootCAAxes(TensorView* tv, IterDomain* axis) { + std::vector ca_axes( + tv->domain()->domain().begin(), + tv->domain()->domain().begin() + tv->getComputeAtPosition()); + + auto ca_root_vals = IterVisitor::getInputsTo( + std::vector(ca_axes.begin(), ca_axes.end())); + + auto root_vals = IterVisitor::getInputsTo({axis}); + + return std::any_of( + root_vals.begin(), root_vals.end(), [&ca_root_vals](auto root) { + return std::find(ca_root_vals.begin(), ca_root_vals.end(), root) != + ca_root_vals.end(); + }); +} + +} // namespace + +void SyncMap::build(Fusion* fusion) { + FUSER_PERF_SCOPE("GpuLower::Lower::validateParallelize"); + FusionGuard fg(fusion); + + const auto& par_map = GpuLower::current()->caParallelMap(); + const auto& loop_map = GpuLower::current()->caLoopMap(); + const auto& index_map = GpuLower::current()->caIndexMap(); + const auto& pred_map = GpuLower::current()->threadPredMap(); + + auto exprs = StmtSort::getExprs(fusion); + + // Run through expressions and check for communication across threads/blocks + // occuring from producer to consumer of the expression + for (auto expr : exprs) { + if (!ir_utils::isTvOp(expr)) { + continue; + } + + // Validate parallelization of each consumer by itself + for (auto consumer : ir_utils::filterByType(expr->outputs())) { + validateParallelizationOfTensor(consumer); + } + + // It's probably enough to just check all producers to one consumer as + // multi-consumers are guaranteed to be transformed/parallelized the same, + // but to be conservative for now checking every producer <-> consumer + // relationship. + for (auto producer : ir_utils::filterByType(expr->inputs())) { + // Parallelization on input tensors have no effect. 
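`validateParallelizationOfTensor` above enforces two per-tensor invariants: each thread/block parallel type may bind at most one axis (non-concretized broadcast axes are exempt), and no axis may use a parallel type the tensor is also predicated on. The duplicate-use part reduces to a small bit-set check; a toy version with parallel types encoded as small integers (an assumed encoding, not the real enum):

```cpp
// Toy version of the "each ParallelType used only once per tensor" check.
#include <bitset>
#include <vector>

bool toyHasUniqueParallelTypes(const std::vector<int>& axis_parallel_types) {
  std::bitset<6> seen; // one bit per thread/block parallel type (assumed 0..5)
  for (int pt : axis_parallel_types) {
    if (pt < 0) {
      continue; // serial / non-thread axis
    }
    if (seen.test(static_cast<size_t>(pt))) {
      return false; // same parallel type bound to two axes
    }
    seen.set(static_cast<size_t>(pt));
  }
  return true;
}
```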
+ if (producer->isFusionInput()) { + continue; + } + + ParallelTypeBitmap raw_dims; + + const auto parallel_bcast_doms = + pred_map.getParallelBroadcastDomains(producer); + + // Stash information about parallelized producer iteration domains + std::vector producer_parallel_ids( + ParallelTypeBitmap::kNumParallelTypes, nullptr); + ParallelTypeBitmap producer_parallel_bitmap; + + // Tracking for quick check later + std::unordered_set producer_within_compute_at; + + for (const auto producer_i : c10::irange(producer->nDims())) { + auto producer_axis = producer->axis(producer_i); + auto producer_ptype = + par_map.getConcreteMappedID(producer_axis)->getParallelType(); + + if (!isParallelTypeThread(producer_ptype)) { + continue; + } + + // Producer reductions shouldn't map to consumers + if (producer_axis->isReduction()) { + continue; + } + + if (producer_i < producer->getComputeAtPosition()) { + producer_within_compute_at.emplace(producer_axis); + } + + producer_parallel_bitmap.set(producer_ptype); + producer_parallel_ids[getParallelTypeBitMapOffset(producer_ptype)] = + producer_axis; + } + + for (auto consumer : + ir_utils::filterByType(expr->outputs())) { + // Stash information about parallelized consumer iteration domains + std::vector consumer_parallel_ids( + ParallelTypeBitmap::kNumParallelTypes, nullptr); + ParallelTypeBitmap consumer_parallel_bitmap; + + for (const auto consumer_i : c10::irange(consumer->nDims())) { + auto consumer_axis = consumer->axis(consumer_i); + auto consumer_ptype = + par_map.getConcreteMappedID(consumer_axis)->getParallelType(); + + if (!isParallelTypeThread(consumer_ptype)) { + continue; + } + + // When the consumer axis is a broadcast, it is not really + // parallelized unless thread-predicated and eventually concretized + if (consumer_axis->isBroadcast() && + (!parallel_bcast_doms.get(consumer_ptype) || + !GpuLower::current() + ->concretizedBroadcastDomains() + .isConcretized(consumer_axis))) { + continue; + } + + consumer_parallel_bitmap.set(consumer_ptype); + consumer_parallel_ids[getParallelTypeBitMapOffset(consumer_ptype)] = + consumer_axis; + } + + // At this point each parallel type that's present in the consumer or + // the producer will be present in their corresponding `_parallel_ids` + // map going from parallel index type (only size 6 for grid/block dims) + // to the iteration domain of that parallel type. + for (auto parallel_type : kParallelTypeThreads) { + // TIDx is reserved for lane_id in the case of mma ops. + // It is swizzled and handled separately in validateMma. 
+ if (parallel_type == ParallelType::TIDx && expr->isA()) { + continue; + } + + auto parallel_type_i = getParallelTypeBitMapOffset(parallel_type); + + auto p_id = producer_parallel_ids[parallel_type_i]; + auto c_id = consumer_parallel_ids[parallel_type_i]; + + if (p_id == nullptr && c_id == nullptr) { + continue; + } else if (p_id != nullptr && c_id != nullptr) { + if (loop_map.areMapped(p_id, c_id)) { + const auto halo_info = GpuLower::current()->haloInfo(); + + if (halo_info.hasHaloWidth(p_id) != + halo_info.hasHaloWidth(c_id) || + (halo_info.hasHaloWidth(p_id) && + halo_info.hasHaloWidth(c_id) && + halo_info.getHaloWidth(p_id) != + halo_info.getHaloWidth(c_id))) { + raw_dims.set(parallel_type); + continue; + } + } + } else { + if (p_id != nullptr) { + auto it = std::find_if( + consumer->domain()->domain().begin(), + consumer->domain()->domain().end(), + [&](IterDomain* c_id) { + return loop_map.areMapped(p_id, c_id); + }); + + // If there isn't a mapping from producer to a consumer domain, + // need to assume there's communication across this parallel + // dimension. + c_id = it == consumer->domain()->domain().end() ? nullptr : *it; + // i.e. if producer is parallelized across threadIdx.x in a + // certain split, if the consumer doesn't map to this split, + // then we need to assume it has to be in smem with proper + // syncs. + } else { + auto it = std::find_if( + producer->domain()->domain().begin(), + producer->domain()->domain().end(), + [&](IterDomain* p_id) { + return loop_map.areMapped(p_id, c_id); + }); + if (it == producer->domain()->domain().end()) { + // Can't infer anything if producer doesn't have a matching axis + // to parallel consumer dim. + continue; + } + p_id = *it; + } + } + + // Comm pattern options (when parallel types don't have matching + // axes) and required memory, Chart is producer parallel type, + // consumer parallel type Parallel types are Serial(S), + // threadIdx(T), blockIdx(B), Memory required for the producer is + // Local(L), Shared(S), Global(G), Sync is None (N/A), blockSync(B), + // grid_sync(G) + // + // P C Mem Req Sync Type + // S S L N/A + // S T L N/A + // S B L N/A + // T S S B + // T T S B + // T B S B + // B S G G + // B T G G + // B B G G + + auto producer_ptype = + par_map.getConcreteMappedID(p_id)->getParallelType(); + auto consumer_ptype = c_id == nullptr + ? ParallelType::Serial + : par_map.getConcreteMappedID(c_id)->getParallelType(); + + if (!p_id->isBroadcast() && isParallelTypeThread(producer_ptype) && + !(isParallelTypeThread(consumer_ptype) && + parallel_bcast_doms.get(consumer_ptype)) && + // Being in compute at means consumer and producer rely on the + // same loop size + !producer_within_compute_at.count(p_id) && + // For usage of derivedFromRootCAAxes check + // NVFuserTest.FusionAdvancedIndexing1_CUDA + (c_id == nullptr || !derivedFromRootCAAxes(producer, p_id))) { + // There must be a consumer axis that uses the same indexing + // with the same parallel type as the producer axis. The index + // map is used to to find such an axis. In addition, even when + // no mapped axis is found in the index map, but when an mapped + // axis exists in the loop map, the producer and consumer axes + // may still use the same indexing. That only happens when the + // producer is derived from a root axis that is an input to any + // leaf CA axes. 
In such a case, the axis in the reference + // tensor that maps to the producer axis is created based on the + // consumer, so both the producer and consumer axes should have + // the same indexing. See issue #995 as well as the + // FusionValidateParallelize6 test for a concrete example. + auto it = std::find_if( + consumer->domain()->domain().begin(), + consumer->domain()->domain().end(), + [&](IterDomain* c_id_) { + return index_map.areMapped(p_id, c_id_); + }); + if (it == consumer->domain()->domain().end()) { + if (isParallelTypeThread(producer_ptype)) { + raw_dims.set(producer_ptype); + } + if (isParallelTypeThread(consumer_ptype)) { + raw_dims.set(consumer_ptype); + } + } + } + + // In shift or gather operations, if a thread or block + // domain's root ID is shifted or gathered, it can overlap + // in shared or global memory. This doesn't + // require a RAW sync since each thread would still write every value + // it would read, but it can require a WAR sync for Shared Memory. + // Since there isn't a separate structure for WAR than RAW for now + // we'll flag it on RAW which will trigger the WAR. + // See test FusionValidateParallelizeShift_CUDA for a + // concrete example where this sync is required. + if ((expr->getExprType() == ExprType::GatherOp || + expr->getExprType() == ExprType::ShiftOp) && + producer->getMemoryType() == MemoryType::Shared && + isParallelTypeThreadDim(producer_ptype)) { + std::unordered_set shifted_rfactor_ids; + if (expr->getExprType() == ExprType::GatherOp) { + auto gather_op = expr->as(); + for (auto root_i : + c10::irange(producer->getMaybeRFactorDomain().size())) { + auto rfactor_id = producer->getMaybeRFactorDomain()[root_i]; + // If the window shape is 1, it just copies the + // producer to the consumer + if (gather_op->windowShape()[root_i] != 1) { + shifted_rfactor_ids.insert(rfactor_id); + } + } + } else if (expr->getExprType() == ExprType::ShiftOp) { + auto shift_op = expr->as(); + for (auto root_i : + c10::irange(producer->getMaybeRFactorDomain().size())) { + auto rfactor_id = producer->getMaybeRFactorDomain()[root_i]; + // If the shift offset is 0, it doesn't actually shift + if (shift_op->offsets()[root_i] != 0) { + shifted_rfactor_ids.insert(rfactor_id); + } + } + } + + // Grab all values between shifted rfactor domains and p_id so we + // can identify which rfactor domains are inputs to the p_id + auto p_id_dep_vals = + DependencyCheck::getAllValsBetween(shifted_rfactor_ids, {p_id}); + // If this shifted rfactor domain is an input to p_id, we + // must have a WAR sync. Mark raw sync so it will be generated. + if (!p_id_dep_vals.empty()) { + raw_dims.set(producer_ptype); + } + } + + // If same parallel type and mapped, no need for syncs unless + // producer is in smem, producer parallel type is a thread + // dimension, and consumer concretizes the dimension. This sync is + // due to the redundant predicate omission in lower thread + // predicate. 
+          auto redundant_preds = GpuLower::current()
+                                     ->threadPredMap()
+                                     .getPredicateInfo(producer)
+                                     .redundant_types;
+
+          if (p_id->isBroadcast() &&
+              GpuLower::current()->concretizedBroadcastDomains().isConcretized(
+                  p_id) &&
+              producer->getMemoryType() == MemoryType::Shared &&
+              redundant_preds.hasTID()) {
+            redundant_preds.clearAllBID();
+            raw_dims |= redundant_preds;
+            continue;
+          }
+
+          // When the producer axis is a broadcast, it is not really
+          // parallelized unless thread-predicated and concretized
+          if (isParallelTypeThread(producer_ptype) && p_id->isBroadcast() &&
+              (!parallel_bcast_doms.get(producer_ptype) ||
+               !GpuLower::current()
+                    ->concretizedBroadcastDomains()
+                    .isConcretized(p_id))) {
+            continue;
+          }
+
+          // If matching dims and matching parallel types, no comm is necessary.
+          if (producer_ptype == consumer_ptype &&
+              loop_map.areMapped(p_id, c_id)) {
+            continue;
+          }
+
+          // Set parallel dimensions that communication is occurring over.
+          if (isParallelTypeThread(producer_ptype)) {
+            raw_dims.set(producer_ptype);
+          }
+        } // end for ptypes
+
+        if (raw_dims.hasBID()) {
+          TORCH_INTERNAL_ASSERT(
+              producer->getMemoryType() == MemoryType::Global,
+              "Inconsistent parallelization found between TV",
+              producer->name(),
+              " (",
+              producer->toString(),
+              ") and TV",
+              consumer->name(),
+              "(",
+              consumer->toString(),
+              "). Producer is required to be in Global Memory based on parallelization strategy.");
+        } else if (raw_dims.hasTID()) {
+          TORCH_INTERNAL_ASSERT(
+              producer->getMemoryType() == MemoryType::Global ||
+                  producer->getMemoryType() == MemoryType::Shared,
+              "Inconsistent parallelization found between TV",
+              producer->name(),
+              " (",
+              producer->toString(),
+              ") and TV",
+              consumer->name(),
+              "(",
+              consumer->toString(),
+              "). Producer is required to be in Global or Shared Memory based on parallelization strategy.");
+        }
+
+      } // end for consumers
+
+      if (raw_dims.any()) {
+        needs_raw_sync_[producer] = raw_dims;
+      }
+
+    } // end producer
+  }
+}
+
+std::string SyncMap::toString() const {
+  std::stringstream ss;
+  ss << "TVs requiring RAW:" << std::endl;
+  for (auto entry : needs_raw_sync_) {
+    ss << "  " << entry.first->toString() << " :: " << entry.second.toString()
+       << std::endl;
+  }
+  return ss.str();
+}
+
+} // namespace cuda
+} // namespace fuser
+} // namespace jit
+} // namespace torch
diff --git a/torch/csrc/jit/codegen/cuda/lower_sync_information.h b/torch/csrc/jit/codegen/cuda/lower_sync_information.h
new file mode 100644
index 00000000000000..09fcf9eabd7f34
--- /dev/null
+++ b/torch/csrc/jit/codegen/cuda/lower_sync_information.h
@@ -0,0 +1,45 @@
+#pragma once
+
+#include
+#include
+
+#include
+
+namespace torch {
+namespace jit {
+namespace fuser {
+namespace cuda {
+
+class SyncMap {
+ public:
+  std::string toString() const;
+
+  //! Validates that all tensors are consistently parallelized. Basically,
+  //! when a producer axis is threaded, either with threadIdx or
+  //! blockIdx, there must be a mapped consumer axis with the
+  //! same ParallelType, with some exceptions.
+  //!
+  //! This function assumes Loop and Parallel ComputeAtMaps are already
+  //! built as they are used to validate consistency.
+  //!
+  //! Fills needs_raw_sync with output TVs that need a RAW sync when they are
+  //! in smem or gmem. The value stored for each TV is the set of parallel
+  //! dimensions being communicated across.
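+  //! A minimal usage sketch (hypothetical caller, for illustration only):
+  //!
+  //!   SyncMap sync_map;
+  //!   sync_map.build(fusion);
+  //!   auto raw = sync_map.needsRawSync(producer_tv);
+  //!   if (raw.hasTID()) {
+  //!     // a block sync is needed before consumers of producer_tv read it
+  //!   }
+  //!   if (raw.hasBID()) {
+  //!     // a grid sync is needed and producer_tv must be in global memory
+  //!   }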
+ void build(Fusion* fusion); + + ParallelTypeBitmap needsRawSync(TensorView* tv) const { + auto it = needs_raw_sync_.find(tv); + if (it != needs_raw_sync_.end()) { + return it->second; + } + return ParallelTypeBitmap(); + } + + private: + std::unordered_map needs_raw_sync_; +}; + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/lower_thread_predicate.cpp b/torch/csrc/jit/codegen/cuda/lower_thread_predicate.cpp index 8721490feb7917..7f77182bd71713 100644 --- a/torch/csrc/jit/codegen/cuda/lower_thread_predicate.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_thread_predicate.cpp @@ -146,6 +146,21 @@ ParallelTypeBitmap getReductionPredicateForUnusedParallelTypes( void ThreadPredicateMap::updateBitSet(const Expr* expr) { FUSER_PERF_SCOPE("GpuLower::Lower::ThreadPredicateMap::updateBitSet"); + // If all of the inputs are not updated and all of the outputs have + // already mappings, don't do anything + if (std::all_of( + ir_utils::filterByType(expr->inputs()).begin(), + ir_utils::filterByType(expr->inputs()).end(), + [this](TensorView* tv) { + return updated_tvs_.find(tv) == updated_tvs_.end(); + }) && + std::all_of( + ir_utils::filterByType(expr->outputs()).begin(), + ir_utils::filterByType(expr->outputs()).end(), + [this](TensorView* tv) { return find(tv) != end(); })) { + return; + } + // Which predicates were set for the inputs ParallelTypeBitmap input_preds; @@ -181,7 +196,8 @@ void ThreadPredicateMap::updateBitSet(const Expr* expr) { for (auto id : tv_inp->domain()->domain()) { if (id->isThread()) { id_ptypes.set(id->getParallelType()); - if (id->isReduction()) { + if (id->isReduction() && + !GpuLower::current()->fusedReductionInfo().isAllreduce(id)) { id_reductions.set(id->getParallelType()); } if (id->isBroadcast() && @@ -228,9 +244,8 @@ void ThreadPredicateMap::updateBitSet(const Expr* expr) { // Run through outputs and set bitset predicates for (auto* out_tv : ir_utils::filterByType(expr->outputs())) { - TORCH_INTERNAL_ASSERT(find(out_tv) == end()); auto redundant_types = avoidRedundantWrites(out_tv); - insert(out_tv, output_preds, redundant_types); + update(out_tv, output_preds, redundant_types); } } @@ -240,12 +255,13 @@ void ThreadPredicateMap::build(Fusion* fusion) { // Initialize mapping for input tensors for (auto inp : fusion->inputs()) { if (auto tv = dynamic_cast(inp)) { - insert(tv, ParallelTypeBitmap(), ParallelTypeBitmap()); + update(tv, ParallelTypeBitmap(), ParallelTypeBitmap()); } } for (auto expr : fusion->exprs()) { updateBitSet(expr); } + updated_tvs_.clear(); } ThreadPredicateMap::const_iterator ThreadPredicateMap::find( @@ -284,17 +300,31 @@ ParallelTypeBitmap ThreadPredicateMap::getPredicatedParallelTypes( return pred_info.limited_types | pred_info.redundant_types; } -void ThreadPredicateMap::insert( +bool ThreadPredicateMap::update( const TensorView* tv, - const ParallelTypeBitmap& valid_types, + const ParallelTypeBitmap& limited_types, const ParallelTypeBitmap& redundant_types) { - insert(tv, {valid_types, redundant_types}); + return update(tv, {limited_types, redundant_types}); } -void ThreadPredicateMap::insert( +bool ThreadPredicateMap::update( const TensorView* tv, const PredicateInfo& pred_info) { - thread_predicates_.insert({tv, pred_info}); + auto existing_mapping_it = thread_predicates_.find(tv); + if (existing_mapping_it != end()) { + PredicateInfo& existing_info = existing_mapping_it->second; + if (existing_info == pred_info) { + return false; + } else { + existing_info = 
pred_info; + markAsUpdated(tv); + return true; + } + } else { + thread_predicates_.insert({tv, pred_info}); + markAsUpdated(tv); + return true; + } } Bool* ThreadPredicateMap::getPredicate(const TensorView* tv) const { @@ -333,6 +363,10 @@ ParallelTypeBitmap ThreadPredicateMap::getParallelBroadcastDomains( return parallel_broadcast & at(tv).limited_types; } +void ThreadPredicateMap::markAsUpdated(const TensorView* tv) { + updated_tvs_.insert(tv); +} + void ThreadPredicateMap::print() const { std::cout << "\nThreadPredicateMap\n"; std::cout << "--------------------------------\n"; diff --git a/torch/csrc/jit/codegen/cuda/lower_thread_predicate.h b/torch/csrc/jit/codegen/cuda/lower_thread_predicate.h index 0d7a2685b32150..2fb115953c6e75 100644 --- a/torch/csrc/jit/codegen/cuda/lower_thread_predicate.h +++ b/torch/csrc/jit/codegen/cuda/lower_thread_predicate.h @@ -48,6 +48,10 @@ class TORCH_CUDA_CU_API ThreadPredicateMap { ParallelTypeBitmap limited_types; // Parallel types where only one thread/block is enough. ParallelTypeBitmap redundant_types; + bool operator==(const PredicateInfo& other) const { + return limited_types == other.limited_types && + redundant_types == other.redundant_types; + } }; using MapType = std::unordered_map; @@ -78,6 +82,10 @@ class TORCH_CUDA_CU_API ThreadPredicateMap { //! blockBroadcast unless it is predicated by limited_types_ ParallelTypeBitmap getParallelBroadcastDomains(const TensorView* tv) const; + //! Mark tv as updated so that rebuilding the map should recompute + //! its predicates and those of its dependents. + void markAsUpdated(const TensorView* tv); + void print() const; //! Generate a Bool value from PredicateInfo. @@ -94,17 +102,19 @@ class TORCH_CUDA_CU_API ThreadPredicateMap { const PredicateInfo& at(const TensorView* tv) const; PredicateInfo& at(const TensorView* tv); - //! Insert a new mapping - void insert( + //! Update a mapping + bool update( const TensorView* tv, - const ParallelTypeBitmap& valid_types, + const ParallelTypeBitmap& limited_types, const ParallelTypeBitmap& redundant_types); - //! Insert a new mapping - void insert(const TensorView* tv, const PredicateInfo& pred_and_src); + //! Update a mapping + bool update(const TensorView* tv, const PredicateInfo& pred_and_src); private: MapType thread_predicates_; + //! 
Keep track of updated tensors that need predicates to be computed + std::unordered_set updated_tvs_; }; } // namespace cuda diff --git a/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.cpp b/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.cpp index a8905b4d4047e8..9922b243e4eedd 100644 --- a/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.cpp @@ -18,6 +18,7 @@ namespace { bool analyzeIfDerivedFromTrivialReduction(TensorView* tv, IterDomain* id); +// Checks the producer of tv to see if the bool traverseToRFactorTensor(TensorView* tv, IterDomain* root_id) { TORCH_INTERNAL_ASSERT( root_id->definition() == nullptr, "Not root IterDomain: ", root_id); @@ -29,6 +30,7 @@ bool traverseToRFactorTensor(TensorView* tv, IterDomain* root_id) { const auto& inputs = tv->definition()->inputs(); + // Check the reduction expression that produces tv if (inputs.size() != 1 || !inputs[0]->isA() || (tv->definition()->getExprType() != ExprType::ReductionOp && tv->definition()->getExprType() != ExprType::WelfordOp)) { @@ -63,8 +65,10 @@ bool analyzeIfDerivedFromTrivialReduction(TensorView* tv, IterDomain* id) { continue; } // If not possible to prove the root ID is trivial, see if the ID - // is derived from a rfactor tensor and, if so, continue the - // analysis at the rfactor tensor. + // is derived from a rfactor tensor. This may mean that the iteration domain + // was merged or split in another expression through rfactor. Trace back + // through rfactor expressions to find original roots and determine there if + // trivial. if (!traverseToRFactorTensor(tv, root_id)) { return false; } diff --git a/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.h b/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.h index 9ccbc2f78285d0..655d64a0417973 100644 --- a/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.h +++ b/torch/csrc/jit/codegen/cuda/lower_trivial_reductions.h @@ -20,6 +20,8 @@ class TORCH_CUDA_CU_API TrivialReductionInfo { void build(Fusion* fusion); bool isDerived(IterDomain* id) const; + + // TODO: Not used, cleanup bool isDerivedFromRoot(IterDomain* id) const; private: diff --git a/torch/csrc/jit/codegen/cuda/lower_utils.cpp b/torch/csrc/jit/codegen/cuda/lower_utils.cpp index ba2f618efae06e..49852aff5e8320 100644 --- a/torch/csrc/jit/codegen/cuda/lower_utils.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_utils.cpp @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -92,10 +93,12 @@ bool isTvOp(const Expr* expr) { expr->getExprType().value() == ExprType::TernaryOp || expr->getExprType().value() == ExprType::ReductionOp || expr->getExprType().value() == ExprType::WelfordOp || + expr->getExprType().value() == ExprType::MmaOp || expr->getExprType().value() == ExprType::BroadcastOp || expr->getExprType().value() == ExprType::TransposeOp || expr->getExprType().value() == ExprType::ShiftOp || expr->getExprType().value() == ExprType::GatherOp || + expr->getExprType().value() == ExprType::ViewDtypeOp || expr->getExprType().value() == ExprType::ViewOp || expr->getExprType().value() == ExprType::GridReduction || expr->getExprType().value() == ExprType::GridBroadcast || @@ -334,48 +337,21 @@ BasicAllocInfo getAllocInformation( namespace { -class ReplaceExprInput : public OptOutDispatch { +class ReplaceExprInput : private kir::ExprMutator { public: - using OptOutDispatch::handle; - static Expr* replace( - Expr* expr, - const std::unordered_map& replacement_map) { - ReplaceExprInput replacer(expr, 
replacement_map); - TORCH_INTERNAL_ASSERT(expr != nullptr); - replacer.handle(expr); - TORCH_INTERNAL_ASSERT(replacer.replaced_expr_ != nullptr); - auto ret_expr = replacer.replaced_expr_; - - // Copy predicates if the original expr is predicated - if (ret_expr != expr) { - ret_expr->setPredicate(expr->predicate()); - ret_expr->setWritePredicate(expr->writePredicate()); - } - return ret_expr; - } - static std::vector replace( - const std::vector& scope, + const std::vector& exprs, const std::unordered_map& replacement_map) { - std::vector ret_expr; - ret_expr.reserve(scope.size()); - - for (auto expr : scope) { - ret_expr.push_back(replace(expr, replacement_map)); - } - - return ret_expr; + ReplaceExprInput replacer(replacement_map); + replacer.traverseAndInsert(exprs); + return replacer.exprs_; } private: - // TODO: Replace this with mutator, example of this is done in replace - // symbolic sizes - ReplaceExprInput( - Expr* expr, - const std::unordered_map& replacement_map) - : replacement_map_(replacement_map) { - replaced_expr_ = expr; - } + ReplaceExprInput(const std::unordered_map& replacement_map) + : replacement_map_(replacement_map) {} + + using kir::ExprMutator::handle; c10::optional> getMaybeInputReplacementMap( Expr* expr) { @@ -398,93 +374,77 @@ class ReplaceExprInput : public OptOutDispatch { } } - // IR visitor interface - void handle(kir::ForLoop* for_loop) final { - auto new_for_loop = IrBuilder::create(for_loop); - - auto replaced_loop_body = - replace(for_loop->body().exprs(), replacement_map_); - - for (auto new_expr : replaced_loop_body) { - new_for_loop->body().push_back(new_expr); - } - replaced_expr_ = new_for_loop; - } - - void handle(kir::IfThenElse* ite) final { - auto new_ite = IrBuilder::create(ite->predicate()); - auto replaced_then_body = - replace(ite->thenBody().exprs(), replacement_map_); - for (auto new_expr : replaced_then_body) { - new_ite->thenBody().push_back(new_expr); - } - if (ite->hasElse()) { - auto replaced_else_body = - replace(ite->elseBody().exprs(), replacement_map_); - for (auto new_expr : replaced_else_body) { - new_ite->elseBody().push_back(new_expr); - } - } - replaced_expr_ = new_ite; + // Copy predicates and register expression replacement + void registerReplaceWithPredicate(Expr* old_expr, Expr* new_expr) { + new_expr->setPredicate(old_expr->predicate()); + new_expr->setWritePredicate(old_expr->writePredicate()); + registerReplace(old_expr, new_expr); } void handle(UnaryOp* node) final { auto replaced_inputs = getMaybeInputReplacementMap(node); if (replaced_inputs.has_value()) { - replaced_expr_ = IrBuilder::create( + auto replacement = IrBuilder::create( node->getUnaryOpType(), node->out(), replaced_inputs.value().at(node->in())); + registerReplaceWithPredicate(node, replacement); } } + void handle(BinaryOp* node) final { auto replaced_inputs = getMaybeInputReplacementMap(node); if (replaced_inputs.has_value()) { - replaced_expr_ = IrBuilder::create( + auto replacement = IrBuilder::create( node->getBinaryOpType(), node->out(), replaced_inputs.value().at(node->lhs()), replaced_inputs.value().at(node->rhs())); + registerReplaceWithPredicate(node, replacement); } } void handle(TernaryOp* node) final { auto replaced_inputs = getMaybeInputReplacementMap(node); if (replaced_inputs.has_value()) { - replaced_expr_ = IrBuilder::create( + auto replacement = IrBuilder::create( node->getTernaryOpType(), node->out(), replaced_inputs.value().at(node->in1()), replaced_inputs.value().at(node->in2()), replaced_inputs.value().at(node->in3())); + 
registerReplaceWithPredicate(node, replacement); } } void handle(ReductionOp* node) final { auto replaced_inputs = getMaybeInputReplacementMap(node); if (replaced_inputs.has_value()) { - replaced_expr_ = IrBuilder::create( + auto replacement = IrBuilder::create( node->getReductionOpType(), node->init(), node->out(), - replaced_inputs.value().at(node->in())); + replaced_inputs.value().at(node->in()), + node->isFused()); + registerReplaceWithPredicate(node, replacement); } } void handle(BroadcastOp* node) final { auto replaced_inputs = getMaybeInputReplacementMap(node); if (replaced_inputs.has_value()) { - replaced_expr_ = IrBuilder::create( + auto replacement = IrBuilder::create( node->out(), replaced_inputs.value().at(node->in()), node->getBroadcastDimFlags()); + registerReplaceWithPredicate(node, replacement); } } void handle(WelfordOp* node) final { auto replaced_inputs = getMaybeInputReplacementMap(node); if (replaced_inputs.has_value()) { - replaced_expr_ = IrBuilder::create( + auto replacement = IrBuilder::create( node->outAvg(), node->outVar(), node->outN(), @@ -494,11 +454,24 @@ class ReplaceExprInput : public OptOutDispatch { replaced_inputs.value().at(node->inAvg()), replaced_inputs.value().at(node->inVar()), replaced_inputs.value().at(node->inN())); + registerReplaceWithPredicate(node, replacement); + } + } + + void handle(MmaOp* node) final { + auto replaced_inputs = getMaybeInputReplacementMap(node); + if (replaced_inputs.has_value()) { + auto replacement = IrBuilder::create( + node->out(), + replaced_inputs.value().at(node->inA()), + replaced_inputs.value().at(node->inB()), + node->init(), + node->options()); + registerReplaceWithPredicate(node, replacement); } } private: - Expr* replaced_expr_ = nullptr; const std::unordered_map& replacement_map_; }; @@ -510,6 +483,15 @@ std::vector replaceInputsInExpr( return ReplaceExprInput::replace(exprs, replacement_map); } +bool isTrivialIterDomain(IterDomain* id) { + auto pt = id->getParallelType(); + return id->isReduction() || id->isBroadcast() || id->isStride() || + (id->extent()->isOneInt() && id->start()->isZeroInt()) || + pt == ParallelType::Vectorize || + (isParallelTypeThread(pt) && + !GpuLower::current()->haloInfo().hasHaloWidth(id)); +} + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/lower_utils.h b/torch/csrc/jit/codegen/cuda/lower_utils.h index 4ed6c25e731a5b..39fec2aef103ec 100644 --- a/torch/csrc/jit/codegen/cuda/lower_utils.h +++ b/torch/csrc/jit/codegen/cuda/lower_utils.h @@ -137,6 +137,9 @@ std::vector replaceInputsInExpr( const std::vector& exprs, const std::unordered_map& replacement_map); +// True if an IterDomain does not materialize a loop +bool isTrivialIterDomain(IterDomain* id); + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/lower_validation.cpp b/torch/csrc/jit/codegen/cuda/lower_validation.cpp index 25ba76ee71b2da..5f30cb513f55a7 100644 --- a/torch/csrc/jit/codegen/cuda/lower_validation.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_validation.cpp @@ -1,5 +1,6 @@ #include +#include #include #include #include @@ -10,6 +11,7 @@ #include #include +#include #include namespace torch { @@ -260,6 +262,116 @@ class VectorizeValidator : public OptInDispatch { domains_.insert(m->inner()); } + // For the producer tensor, it's indexed first by transformed like + // the consumer. So, to find its contig merged domain, use the + // consumer TensorDomain with the producer contiguity info. 
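+  // Illustrative example (hypothetical tensors): if the consumer roots
+  // [I0, I1] map to producer roots [J0, J1], the returned vector is
+  // {contiguity(J0), contiguity(J1)}, where each flag is looked up at the
+  // position of the mapped root in the producer's rfactor domain, so the
+  // flags come back in consumer-root order even if the producer stores its
+  // roots in a different order.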
+ static std::vector mapProducerContiguity( + TensorView* producer_tv, + TensorView* consumer_tv) { + const auto c2p = PairwiseRootDomainMap(producer_tv, consumer_tv) + .mapConsumerToProducer( + consumer_tv->domain(), producer_tv->domain()); + + std::vector producer_contiguity; + + for (auto consumer_root_id : consumer_tv->getRootDomain()) { + auto producer_root_id = c2p.at(consumer_root_id); + auto producer_root_it = std::find( + producer_tv->getMaybeRFactorDomain().begin(), + producer_tv->getMaybeRFactorDomain().end(), + producer_root_id); + TORCH_INTERNAL_ASSERT( + producer_root_it != producer_tv->getMaybeRFactorDomain().end()); + auto producer_root_id_offset = std::distance( + producer_tv->getMaybeRFactorDomain().begin(), producer_root_it); + producer_contiguity.push_back( + producer_tv->domain()->contiguity().at(producer_root_id_offset)); + } + + return producer_contiguity; + } + + //! Find the contig root domains that a vectorized leaf domain + //! depends on. + static void fillVectorizedContigRootDomains( + TensorView* consumer_tv, + VectorizedSetInfo& info) { + auto producer_tv = + consumer_tv->definition()->inputs().at(0)->as(); + + // For each of the producer and consumer vectorized root domains, + // find the contig merged domain if exists. The extent of the + // domain is the size that must be divisible by the vectorization + // word size. Both of the producer and consumer domains must be + // divisible, so pick the one that has the smaller number of + // merged domains. + + ContigIDs consumer_contig_finder( + consumer_tv->domain()->domain(), + consumer_tv->getRootDomain(), + consumer_tv->domain()->contiguity()); + + // info.vectorized_root_id is validated at this point to be the + // last concrete root domain in consumer. + auto consumer_root_id = info.vectorized_root_id; + + // Find the root domains that are dependency of the merged contig domain. + auto consumer_indexed_it = + consumer_contig_finder.rootToIndexedID().find(consumer_root_id); + TORCH_INTERNAL_ASSERT( + consumer_indexed_it != consumer_contig_finder.rootToIndexedID().end(), + "Contiguity information not found for root domain: ", + consumer_root_id->toString()); + auto consumer_indexed_id = consumer_indexed_it->second; + // Actual indexed root domains for this consumer root domain. If + // contig merge is done, multiple root domains are included. + std::unordered_set consumer_indexed_root_ids; + if (consumer_indexed_id == consumer_root_id) { + // Indexed domain is equal to the root domain, meaning no contig + // merge is involved. + consumer_indexed_root_ids.insert(consumer_root_id); + } else { + auto consumer_within_contig_it = + consumer_contig_finder.withinContigIDs().find(consumer_indexed_id); + TORCH_INTERNAL_ASSERT( + consumer_within_contig_it != + consumer_contig_finder.withinContigIDs().end()); + consumer_indexed_root_ids = consumer_within_contig_it->second; + } + + // Note: we use the consumer domain with the producer + // contiguity. 
+ ContigIDs producer_contig_finder( + consumer_tv->domain()->domain(), + consumer_tv->getRootDomain(), + mapProducerContiguity(producer_tv, consumer_tv)); + + auto producer_indexed_it = + producer_contig_finder.rootToIndexedID().find(consumer_root_id); + TORCH_INTERNAL_ASSERT( + producer_indexed_it != producer_contig_finder.rootToIndexedID().end(), + "Contiguity information not found for root domain: ", + consumer_root_id->toString()); + auto producer_indexed_id = producer_indexed_it->second; + std::unordered_set producer_indexed_root_ids; + if (producer_indexed_id == consumer_root_id) { + producer_indexed_root_ids.insert(consumer_root_id); + } else { + auto producer_within_contig_it = + producer_contig_finder.withinContigIDs().find(producer_indexed_id); + TORCH_INTERNAL_ASSERT( + producer_within_contig_it != + producer_contig_finder.withinContigIDs().end()); + producer_indexed_root_ids = producer_within_contig_it->second; + } + + // Pick the smaller merged domain + info.contig_root_ids = + consumer_indexed_root_ids.size() < producer_indexed_root_ids.size() + ? consumer_indexed_root_ids + : producer_indexed_root_ids; + } + private: std::unordered_set domains_; IterDomain* vectorized_id_ = nullptr; @@ -284,8 +396,10 @@ class VectorizeValidator : public OptInDispatch { } } - // If no vectorized id's found simply return; - if (v_id == nullptr) { + // If no vectorized ids found simply return. If vectorized access is + // broadcast, it won't generate an actual vector instruction, so can safely + // be ignore + if (v_id == nullptr || v_id->isBroadcast()) { return; } @@ -318,7 +432,10 @@ class VectorizeValidator : public OptInDispatch { vector_size, " however, vector sizes only upto and including 16 bytes are supported."); - auto replay_exprs = StmtSort::getExprs(fusion, {v_id}, false); + auto replay_exprs = DependencyCheck::getAllExprsBetween( + {tv->getMaybeRFactorDomain().begin(), + tv->getMaybeRFactorDomain().end()}, + {v_id}); VectorizeValidator validator(v_id); @@ -376,12 +493,56 @@ class VectorizeValidator : public OptInDispatch { "Vectorized dim has to be from a contiguous inner most position: ", tv, "\n"); + + // Save info required to lowering and runtime validation + auto consumer_word_size_it = + GpuLower::current()->vectorizedAccesses().find(tv); + if (consumer_word_size_it != + GpuLower::current()->vectorizedAccesses().end()) { + consumer_word_size_it->second = std::max( + (int)vector_size_optional.value(), consumer_word_size_it->second); + } else { + GpuLower::current()->vectorizedAccesses().emplace( + tv, (int)vector_size_optional.value()); + } + auto producer_tv = tv->definition()->inputs().at(0)->as(); + auto producer_word_size_it = + GpuLower::current()->vectorizedAccesses().find(producer_tv); + if (producer_word_size_it != + GpuLower::current()->vectorizedAccesses().end()) { + producer_word_size_it->second = std::max( + (int)vector_size_optional.value(), producer_word_size_it->second); + } else { + GpuLower::current()->vectorizedAccesses().emplace( + producer_tv, (int)vector_size_optional.value()); + } + + VectorizedSetInfo vectorized_set_info; + vectorized_set_info.consumer_tv = tv; + vectorized_set_info.producer_tv = producer_tv; + // Note that VectorizedSetInfo is about each instance of + // vectorized set operations, so the word size is the size of this + // specific vectorized set. 
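+    // Illustrative example: a float tensor copied with a Vectorize extent of
+    // 4 records word_size = 4, i.e. a 16-byte access, which is the largest
+    // width accepted by the byte-size check earlier in this function.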
+ vectorized_set_info.word_size = (int)vector_size_optional.value(); + vectorized_set_info.vectorized_leaf_id = v_id; + vectorized_set_info.vectorized_root_id = validator.vectorized_id_; + // For aligned vectorize, the extent of a vectorized domain must + // be divisible by the vector word size. The domain is usually + // just one of the root domains, but can be a merged domain of + // contiguous domains. + if (!misaligned_vectorize) { + fillVectorizedContigRootDomains(tv, vectorized_set_info); + } + GpuLower::current()->vectorizedSetInfo().emplace_back(vectorized_set_info); } }; } // namespace -void validateVectorize(Fusion* fusion) { +// Uses ContigIDs to find root contig domains that a vectorized domain +// depends on. As ContigIDs depends on HaloInfo, this must be done +// after HaloInfo is created. +void validateAndCollectVectorizeInfo(Fusion* fusion) { FUSER_PERF_SCOPE("GpuLower::Lower::validateVectorize"); FusionGuard fg(fusion); @@ -443,6 +604,10 @@ void validateVectorize(Fusion* fusion) { "TensorView: ", tv); } + // Validate the vectorized domain maps to the innermost domain of + // tv. Note that we don't need to validate its producer tv as + // both Vectorize and MisalignedVectorize can only be used with + // UnaryOp::Set. if (has_vectorize_dim || has_misaligned_vectorize_dim) { VectorizeValidator::validate(tv); } @@ -451,176 +616,6 @@ void validateVectorize(Fusion* fusion) { namespace { -// Validate parallelization of a single tensor -void validateParallelizationOfTensor(TensorView* tv) { - // Each ParallelType can be used only once. - ParallelTypeBitmap pt_map; - for (size_t i = 0; i < tv->nDims(); ++i) { - auto axis = tv->axis(i); - auto ptype = axis->getParallelType(); - if (!isParallelTypeThread(ptype)) { - continue; - } - - // It doesn't matter if this axis is a non-concretized broadcast - // TODO: merging broadcast and non-broadcast - if (axis->isBroadcast() && - !GpuLower::current()->concretizedBroadcastDomains().isConcretized( - axis)) { - continue; - } - - TORCH_INTERNAL_ASSERT( - !pt_map.get(ptype), - "Multiple use of ", - ptype, - " in tensor t", - tv->name(), - ": ", - tv); - pt_map.set(ptype); - } - - // If this tensor is predicated by a paralel type, it should not be - // used to parallelize any domain of this tensor - - const auto thread_pred = - GpuLower::current()->threadPredMap().getPredicateInfo(tv); - - auto predicated_parallel_types = pt_map & thread_pred.limited_types; - - TORCH_INTERNAL_ASSERT( - predicated_parallel_types.none(), - "Invalid parallelization of tensor t", - tv->name(), - ". 
The tensor is parallelized with ", - predicated_parallel_types.toString(), - ", but it's invalid to use the types as the tensor is also predicated with them.", - ", thread pred: ", - thread_pred.limited_types.toString()); -} - -} // namespace - -void validateParallelize(Fusion* fusion) { - FUSER_PERF_SCOPE("GpuLower::Lower::validateParallelize"); - FusionGuard fg(fusion); - - const auto& par_map = GpuLower::current()->caParallelMap(); - const auto& loop_map = GpuLower::current()->caLoopMap(); - const auto& pred_map = GpuLower::current()->threadPredMap(); - - auto exprs = StmtSort::getExprs(fusion); - - for (auto expr : exprs) { - if (!ir_utils::isTvOp(expr)) { - continue; - } - // Validate parallelization of each consumer by itself - for (auto consumer : ir_utils::filterByType(expr->outputs())) { - validateParallelizationOfTensor(consumer); - } - // Validate parallelization between a producer and a consumer - for (auto producer : ir_utils::filterByType(expr->inputs())) { - // Parallelization on input tensors have no effect. - if (producer->isFusionInput()) { - continue; - } - const auto parallel_bcast_doms = - pred_map.getParallelBroadcastDomains(producer); - for (const auto i : c10::irange(producer->nDims())) { - // If a producer axis is threaded, either with threadIdx or - // blockIdx, there must be a mapped consumer axis with the - // same ParallelType. An exception is when the producer is - // allocated on shared memory and its parallelized with - // threadIdx. In that case, there is no parallelization - // constraint on the consumer as syncthreads will be inserted - // when necessary. - auto producer_axis = producer->axis(i); - auto producer_ptype = - par_map.getConcreteMappedID(producer_axis)->getParallelType(); - if (!isParallelTypeThread(producer_ptype)) { - continue; - } - // When the producer axis is a broadcast, it is not really - // parallelized unless thread-predicated - if (producer_axis->isBroadcast() && - !parallel_bcast_doms.get(producer_ptype)) { - continue; - } - // No constraint on the consumer tensor when the producer - // axis is parallelized with threadIdx and allocates on - // shared memory - if (isParallelTypeThreadDim(producer_ptype) && - producer->getMemoryType() == MemoryType::Shared) { - continue; - } - // There should be also nothing to validate when the producer - // axis is reduction. - if (producer_axis->isReduction()) { - continue; - } - // There must be a consumer axis that uses the same indexing - // with the same parallel type as the producer axis. The loop - // map is used to to find such an axis. Broadcast forwarding - // does not cause any inconsistent parallelization as indexing - // takes care of the forwarding. - for (auto consumer : - ir_utils::filterByType(expr->outputs())) { - auto it = std::find_if( - consumer->domain()->domain().begin(), - consumer->domain()->domain().end(), - [&](IterDomain* consumer_axis) { - return loop_map.areMapped(producer_axis, consumer_axis); - }); - TORCH_INTERNAL_ASSERT( - it != consumer->domain()->domain().end(), - "Inconsistent parallelization found between TV", - producer->name(), - " (", - producer, - ") and TV", - consumer->name(), - "(", - consumer, - "). ", - "TV", - consumer->name(), - " does not have a matching axis for parallelized producer axis, ", - producer_axis, - ". 
CA Map: ", - loop_map.toString()); - auto consumer_axis = *it; - auto consumer_ptype = - par_map.getConcreteMappedID(consumer_axis)->getParallelType(); - TORCH_INTERNAL_ASSERT( - producer_ptype == consumer_ptype, - "Inconsistent parallelization found between TV", - producer->name(), - " (", - producer, - ") and TV", - consumer->name(), - "(", - consumer, - "). " - "Producer axis, ", - producer_axis, - " is parallelized with ", - stringifyThread(producer_ptype), - ", but the parallel type of its matching consumer axis, ", - consumer_axis, - " is ", - stringifyThread(consumer_ptype), - "."); - } - } - } - } -} - -namespace { - // Backward propagation of partial ranges from outputs to // inputs. Necessary to determine required ranges to compute. // @@ -802,6 +797,95 @@ void validatePartialSplit(Fusion* fusion) { } } +namespace { + +//! Utility to make sure targeted gpu capability is +//! higher than provided major.minor. +void validateMinimumArch(int major, int minor) { + auto prop = at::cuda::getCurrentDeviceProperties(); + TORCH_INTERNAL_ASSERT(prop->major >= major); + if (prop->major == major) { + TORCH_INTERNAL_ASSERT(prop->minor >= minor); + } +} + +//! Validates that the operand and result tensors +//! of mma ops are swizzled and also validates +//! specialization of tidx as lane id. +void validateMmaTensors(MmaOp* mma) { + bool tidx_validated = false; + std::vector to_validate = { + mma->inA()->as(), + mma->inB()->as(), + mma->out()->as()}; + + for (auto tv : to_validate) { + for (auto id : tv->domain()->domain()) { + auto ptype = id->getParallelType(); + if (ptype == ParallelType::TIDx) { + TORCH_INTERNAL_ASSERT( + id->isMmaSwizzled(), + "TIDx for mma input/output must be set by WarpMmaSwizzler", + id, + tv); + if (!tidx_validated) { + // Check that TIDx is exact lane_id + const auto& paralel_dim_map = + GpuLower::current()->parallelDimensionMap(); + TORCH_INTERNAL_ASSERT( + paralel_dim_map.isExact(ptype) && + paralel_dim_map.get(ptype)->getInt().has_value() && + paralel_dim_map.get(ptype)->getInt().value() == + at::cuda::warp_size(), + "TIDx is reserved for lane id in mma kernels, and it needs to be exactly a warp"); + tidx_validated = true; + } + } + } + } + + // Note: this check will be relaxed in a follow up. + auto validate_operand_ids = [](const TensorView* tv) { + TORCH_INTERNAL_ASSERT( + std::all_of( + tv->domain()->domain().begin() + tv->getComputeAtPosition(), + tv->domain()->domain().end(), + [](IterDomain* id) { + return id->isMmaSwizzled() || + (id->isBroadcast() && + id->getParallelType() == ParallelType::Serial); + }), + "All id's on the right of CA pos needs to be mma-swizzled by WarpMmaSwizzler\n", + tv); + }; + + validate_operand_ids(mma->inA()->as()); + validate_operand_ids(mma->inB()->as()); +} + +} // namespace + +//! Validate data format and GPU arch compatibility of scheduled +//! mma operators on the fusion. 
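+//! For example, a fusion scheduled with MmaOptions::MacroType::Volta_16_16_4
+//! is only accepted when the target GPU reports compute capability 7.0 or
+//! newer; any other macro type is currently rejected.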
+void validateMma(Fusion* fusion) { + auto exprs = StmtSort::getExprs(fusion); + + for (auto expr : exprs) { + if (auto mma = dynamic_cast(expr)) { + validateMmaTensors(mma); + + switch (mma->options().macro) { + case MmaOptions::MacroType::Volta_16_16_4: + validateMinimumArch(7, 0); + break; + default: + TORCH_INTERNAL_ASSERT(false, "validate mma: unsupported macro"); + break; + } + } + } +} + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/lower_validation.h b/torch/csrc/jit/codegen/cuda/lower_validation.h index 115df13c32201e..a3009fc5dddc32 100644 --- a/torch/csrc/jit/codegen/cuda/lower_validation.h +++ b/torch/csrc/jit/codegen/cuda/lower_validation.h @@ -11,16 +11,9 @@ namespace cuda { void validateIr(Fusion* fusion); -void validateVectorize(Fusion* fusion); - -//! Validates all tensors are consistently parallelized. Basically, -//! when a producer axis is threaded, either with threadIdx or -//! blockIdx, there must be a mapped consumer axis with the -//! same ParallelType with some exceptions. -//! -//! This function assumes Loop and Parallel ComputeAtMaps are already -//! built as they are used to validate consistency. -void validateParallelize(Fusion* fusion); +//! Validate vectorization and collect information on vectorization +//! used in code generation as well as runtime validation. +void validateAndCollectVectorizeInfo(Fusion* fusion); //! Validates partial split expressions. Partial split only uses an //! inner subdomain specified by start and stop offsets, ignoring the @@ -30,6 +23,10 @@ void validateParallelize(Fusion* fusion); //! calculated that are necessary for output values. void validatePartialSplit(Fusion* fusion); +//! Validate data format and GPU arch compatibility of scheduled +//! mma operators on the fusion. +void validateMma(Fusion* fusion); + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/lower_warp_reduce.cpp b/torch/csrc/jit/codegen/cuda/lower_warp_reduce.cpp index 630d3128e783d6..1d87790c014fb8 100644 --- a/torch/csrc/jit/codegen/cuda/lower_warp_reduce.cpp +++ b/torch/csrc/jit/codegen/cuda/lower_warp_reduce.cpp @@ -13,6 +13,46 @@ namespace cuda { namespace { +//! A helper class for EliminateDeadBroadcastAndAllocate. Eliminate +//! dead Allocate and Broadcast detected by EliminateDeadBroadcastAndAllocate. +class DeadTvEliminator : private kir::ExprMutator { + public: + static std::vector run( + const std::vector& exprs, + const std::unordered_set& dead_tvs) { + return DeadTvEliminator(exprs, dead_tvs).exprs_; + } + + private: + DeadTvEliminator( + const std::vector& exprs, + const std::unordered_set& dead_tvs) + : dead_tvs_(dead_tvs) { + traverseAndInsert(exprs); + } + + using kir::ExprMutator::handle; + + void handle(kir::Allocate* allocate) final { + if (auto buffer_tv = dynamic_cast(allocate->buffer())) { + if (dead_tvs_.count(buffer_tv)) { + registerRemove(allocate); + } + } + } + + void handle(BroadcastOp* broadcast) final { + if (auto out_ti = dynamic_cast(broadcast->out())) { + if (dead_tvs_.count(out_ti->view())) { + registerRemove(broadcast); + } + } + } + + private: + const std::unordered_set& dead_tvs_; +}; + //! A simple DCE for eliminating the //! parallel broadcasts that has been fused //! 
and their corresponding allocations @@ -20,14 +60,13 @@ class EliminateDeadBroadcastAndAllocate { public: static std::vector run(const std::vector& exprs) { EliminateDeadBroadcastAndAllocate dce(exprs); - return dce.result_exprs_; + return DeadTvEliminator::run(exprs, dce.dead_tvs_); } private: EliminateDeadBroadcastAndAllocate(const std::vector& exprs) { findLiveTvs(exprs); findDeadTvs(); - eliminateDeadCode(exprs); } void findLiveTvs(const std::vector& exprs) { @@ -70,93 +109,10 @@ class EliminateDeadBroadcastAndAllocate { } } - void eliminateDeadCode(const std::vector& exprs) { - result_exprs_ = eliminateDeadCodeInScope(exprs); - } - - bool shouldEliminate(Expr* expr) { - if (auto allocate = dynamic_cast(expr)) { - if (auto buffer_tv = dynamic_cast(allocate->buffer())) { - if (dead_tvs_.count(buffer_tv)) { - return true; - } - } - } else if (auto broadcast = dynamic_cast(expr)) { - if (auto out_ti = dynamic_cast(broadcast->out())) { - if (dead_tvs_.count(out_ti->view())) { - return true; - } - } - } - return false; - } - - //! Returns a new vector of exprs with dead exprs - //! eliminated. - std::vector eliminateDeadCodeInScope(const std::vector& exprs) { - std::vector result_exprs; - - for (auto expr : exprs) { - auto result_expr = expr; - if (auto for_loop = dynamic_cast(expr)) { - result_expr = eliminateDeadCode(for_loop); - } else if (auto ite = dynamic_cast(expr)) { - result_expr = eliminateDeadCode(ite); - } else { - if (shouldEliminate(expr)) { - result_expr = nullptr; - } - } - - // Push the result expr if not eliminated - if (result_expr) { - result_exprs.push_back(result_expr); - } - } - - return result_exprs; - } - - kir::ForLoop* eliminateDeadCode(kir::ForLoop* for_loop) { - auto new_loop_body = eliminateDeadCodeInScope(for_loop->body().exprs()); - if (new_loop_body.empty()) { - return nullptr; - } - - // TODO: we will need a kernel_ir cloner to make this - // kind of logic re-usable. - auto new_loop = scope_utils::cloneForLoop(for_loop); - - for (auto expr : new_loop_body) { - new_loop->body().push_back(expr); - } - return new_loop; - } - - kir::IfThenElse* eliminateDeadCode(kir::IfThenElse* ite) { - auto new_then_body = eliminateDeadCodeInScope(ite->thenBody().exprs()); - auto new_else_body = eliminateDeadCodeInScope(ite->elseBody().exprs()); - if (new_then_body.empty() && new_else_body.empty()) { - return nullptr; - } - - auto new_ite = scope_utils::cloneIfThenElse(ite); - - for (auto expr : new_then_body) { - new_ite->thenBody().push_back(expr); - } - for (auto expr : new_else_body) { - new_ite->elseBody().push_back(expr); - } - return new_ite; - } - private: std::unordered_set live_tvs_; std::unordered_set dead_tvs_; std::unordered_set candidate_tv_set_; - - std::vector result_exprs_; }; //! A pass to eliminate redundant parallel broadcasts that are consumers @@ -220,6 +176,7 @@ class FuseBroadcastWithWarpReduce : private kir::IrVisitor { } } } + kir::IrVisitor::handle(expr); } bool openLoopNestLevel(IterDomain* id) { diff --git a/torch/csrc/jit/codegen/cuda/manager.cpp b/torch/csrc/jit/codegen/cuda/manager.cpp index 0f5967c004d103..94a57959d57d55 100644 --- a/torch/csrc/jit/codegen/cuda/manager.cpp +++ b/torch/csrc/jit/codegen/cuda/manager.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -182,7 +183,6 @@ void compileCudaFusionGroup(Node* fusion_node) { // node only insert meta information after itself). 
PropagateShapesOnGraph(graph); TypePropagate(graph); - PropagateShapesOnGraph(graph); int32_t fusion_cache_id = CudaFusionManager::getManager().registerOrGetCacheId(graph); @@ -209,7 +209,7 @@ void runCudaFusionGroup(const Node* fusion_node, Stack& stack) { FUSER_PERF_SCOPE("nvFuser::Manager::runCudaFusionGroup"); // Fallback to use if anything goes wrong - auto take_fallback = [&]() { + auto take_fallback = [&](Stack& stack) { // copying graph here since we are eliminating shape information; auto copied_graph = fusion_node->g(attr::Subgraph)->copy(); EraseShapeInformation(copied_graph); @@ -217,6 +217,24 @@ void runCudaFusionGroup(const Node* fusion_node, Stack& stack) { InterpreterState{Code(copied_graph, "fallback_cuda_fuser")}.run(stack); }; + c10::optional stack_copy; + auto compare_callback = getCudaFuserComparisonCallback(); + if (compare_callback.run_fallback) { + // make a copy of the stack + int64_t inputs_size = + static_cast(fusion_node->g(attr::Subgraph)->inputs().size()); + TORCH_INTERNAL_ASSERT(stack.size() >= inputs_size); + stack_copy = Stack(); + stack_copy->insert( + stack_copy->end(), stack.begin(), stack.end() - inputs_size); + // deepcopy the last (inputs_size) stack items + std::transform( + stack.end() - inputs_size, + stack.end(), + std::back_inserter(*stack_copy), + [](const c10::IValue& ivalue) { return ivalue.deepcopy(); }); + } + auto run_fusion = [&]() { TORCH_CHECK( fusion_node->kind() == prim::CudaFusionGroup, @@ -253,11 +271,45 @@ void runCudaFusionGroup(const Node* fusion_node, Stack& stack) { "Failed for some reason. To debug try disable codegen fallback path" "via setting the env variable" "`export PYTORCH_NVFUSER_DISABLE_FALLBACK=1`"); - take_fallback(); + take_fallback(stack); } } else { run_fusion(); } + + if (compare_callback.callback != nullptr) { + Stack fused_outputs; + Stack fallback_outputs; + int64_t output_count = + static_cast(fusion_node->g(attr::Subgraph)->outputs().size()); + TORCH_CHECK( + output_count <= stack.size(), + "Expected ", + output_count, + " outputs but found only ", + stack.size(), + " items on the stack"); + + fused_outputs.insert( + fused_outputs.begin(), stack.end() - output_count, stack.end()); + + if (stack_copy) { + take_fallback(*stack_copy); + TORCH_CHECK( + stack_copy->size() == stack.size(), + "Fused graph returns stack with ", + stack.size(), + " items, compared to ", + stack_copy->size(), + " from unfused graph"); + fallback_outputs.insert( + fallback_outputs.begin(), + stack_copy->end() - output_count, + stack_copy->end()); + } + auto graph_str = fusion_node->g(attr::Subgraph)->toString(); + compare_callback.callback(fused_outputs, fallback_outputs, graph_str); + } } } // namespace cuda diff --git a/torch/csrc/jit/codegen/cuda/mma_type.cpp b/torch/csrc/jit/codegen/cuda/mma_type.cpp new file mode 100644 index 00000000000000..3751cdea6bcf67 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/mma_type.cpp @@ -0,0 +1,139 @@ +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +MmaBuilder::MmaBuilder( + MmaOptions::MacroType macro, + MatMulTileOptions gemm_tile) { + option_.macro = macro; + // Calculate accumulator stride, will be removed once transpose swizzle ready + int outer_stride = gemm_tile.warp_tile.n / gemm_tile.instruction_tile.n; + switch (macro) { + // Numbers depend on actual output layout of mma instruction + case MmaOptions::MacroType::Volta_16_16_4: + option_.accumulator_stride = outer_stride * 4; + break; + default: + TORCH_CHECK(false, "unsupported macro"); + break; + } 
+} + +MmaBuilder& MmaBuilder::layout(MmaOptions::MmaInputLayout layout) { + option_.operand_layout = layout; + return *this; +} + +MmaBuilder& MmaBuilder::operand(MmaOptions::Operand a_or_b) { + option_.operand = a_or_b; + return *this; +} + +// TODO: validate op config +MmaOptions MmaBuilder::build() const { + return option_; +} + +bool isVolta(MmaOptions::MacroType macro) { + return macro == MmaOptions::MacroType::Volta_16_16_4; +} + +bool isTuring(MmaOptions::MacroType macro) { + return macro == MmaOptions::MacroType::Turing_16_8_16; +} + +bool isAmpere(MmaOptions::MacroType macro) { + return false; +} + +int getOutputRegisterSize(MmaOptions::MacroType macro) { + switch (macro) { + case MmaOptions::MacroType::Volta_16_16_4: + return 8; + break; + default: + TORCH_INTERNAL_ASSERT(false, "unknown macro"); + break; + } + return -1; +} + +int getInputARegisterSize(MmaOptions::MacroType macro) { + switch (macro) { + case MmaOptions::MacroType::Volta_16_16_4: + return 4; + break; + default: + TORCH_INTERNAL_ASSERT(false, "unknown macro"); + break; + } + return -1; +} + +int getInputBRegisterSize(MmaOptions::MacroType macro) { + switch (macro) { + case MmaOptions::MacroType::Volta_16_16_4: + return 4; + break; + default: + TORCH_INTERNAL_ASSERT(false, "unknown macro"); + break; + } + return -1; +} + +bool isOperandTransposed(MmaOptions options) { + switch (options.operand) { + case MmaOptions::Operand::A: + return options.operand_layout == MmaOptions::MmaInputLayout::TT || + options.operand_layout == MmaOptions::MmaInputLayout::TN; + case MmaOptions::Operand::B: + return options.operand_layout == MmaOptions::MmaInputLayout::TT || + options.operand_layout == MmaOptions::MmaInputLayout::NT; + default: + TORCH_CHECK(false, "isOperandTransposed: please specify operand"); + } + return false; +} + +std::string toString(MmaOptions::MmaInputLayout input_layout) { + std::stringstream ss; + switch (input_layout) { + case MmaOptions::MmaInputLayout::TT: + ss << "TT"; + break; + case MmaOptions::MmaInputLayout::TN: + ss << "TN"; + break; + case MmaOptions::MmaInputLayout::NT: + ss << "NT"; + break; + default: + TORCH_INTERNAL_ASSERT(false, "unsupported operand layout"); + } + return ss.str(); +} + +std::string toString(MmaOptions::MacroType mt) { + std::stringstream ss; + switch (mt) { + case MmaOptions::MacroType::NoMMA: + ss << "NoOp"; + break; + case MmaOptions::MacroType::Volta_16_16_4: + ss << "M16N16K4"; + break; + default: + TORCH_INTERNAL_ASSERT(false, "undefined mma type"); + break; + } + return ss.str(); +} + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/mma_type.h b/torch/csrc/jit/codegen/cuda/mma_type.h new file mode 100644 index 00000000000000..5f42d41ded65e1 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/mma_type.h @@ -0,0 +1,132 @@ +#pragma once +#include +#include + +namespace torch { +namespace jit { +namespace fuser { +namespace cuda { + +//! Utility data structure for recording gemm tiles +struct GemmTile { + int m, n, k; + GemmTile(int m_, int n_, int k_) : m(m_), n(n_), k(k_) {} + + bool operator==(const GemmTile& other) { + return m == other.m && n == other.n && k == other.k; + } + + GemmTile operator/(const GemmTile& other) { + return GemmTile(m / other.m, n / other.n, k / other.k); + } +}; + +//! 
Utility data structure for recording the tile sizes of a matmul at the
+//! CTA, warp, and instruction levels
+struct TORCH_CUDA_CU_API MatMulTileOptions {
+  GemmTile cta_tile = GemmTile(128, 128, 32);
+  GemmTile warp_tile = GemmTile(64, 64, 32);
+  GemmTile instruction_tile = GemmTile(16, 8, 16);
+
+  MatMulTileOptions() = default;
+  MatMulTileOptions(
+      GemmTile cta_tile_,
+      GemmTile warp_tile_,
+      GemmTile instruction_tile_)
+      : cta_tile(cta_tile_),
+        warp_tile(warp_tile_),
+        instruction_tile(instruction_tile_) {}
+
+  bool operator==(const MatMulTileOptions& other) {
+    return cta_tile == other.cta_tile && warp_tile == other.warp_tile &&
+        instruction_tile == other.instruction_tile;
+  }
+};
+
+//! Information for configuring and lowering mma ops
+struct MmaOptions {
+  //! Type of mma intrinsic macro to use
+  //!  This selects which mma intrinsic from the runtime strings
+  //!  is generated to implement the mma op. The current plan
+  //!  is to have exactly one macro for each
+  //!  (arch, datatype, operand layout) triple, though there
+  //!  exist multiple possibilities for some cases, e.g. for Turing and fp16
+  //!  one can use 16_8_8 or 16_8_16.
+  //!  Will consider adding more choices that the scheduler can pick from
+  //!  when our perf target becomes more fine-grained, which is more likely in
+  //!  latency-bound kernels.
+  enum class MacroType {
+    NoMMA = 0,
+    Volta_16_16_4,
+    Turing_16_8_16, // placeholder for turing/ampere mma
+    Ampere_16_8_8 // placeholder for tf32
+  };
+
+  //! [Operand Layout Convention]
+  //! Operand layout, T=transposed/row_major, N=normal/col_major
+  //! We don't support calling NN mma directly since it implies
+  //! a fused transpose. The user needs to swap the operands and use
+  //! TT mma to make the transpose explicit.
+  //! Ordered by position of K
+  //! NT : K,M x K,N -> K,M,N
+  //! TT : M,K x K,N -> M,K,N
+  //! TN : M,K x N,K -> M,N,K
+  enum class MmaInputLayout { NT = 0, TT, TN };
+
+  //! Utility to annotate which input of mma this option struct describes
+  enum class Operand { NotOperand = 0, A, B };
+
+  //! Utility to annotate which mma macro this config uses.
+  MacroType macro = MacroType::NoMMA;
+
+  //! Utility to annotate transposition of operands
+  MmaInputLayout operand_layout = MmaInputLayout::TT;
+
+  //! Utility to annotate which input of mma this option struct describes
+  Operand operand = Operand::A;
+
+  //! Accumulator register stride, will be removed when the swizzle op
+  //! is introduced and the output can be labeled with a transpose swizzle.
+  int accumulator_stride = 0;
+
+  bool operator==(const MmaOptions& other) const {
+    return macro == other.macro && operand_layout == other.operand_layout &&
+        operand == other.operand &&
+        accumulator_stride == other.accumulator_stride;
+  }
+};
+
+//! User interface for generating mma options for an mma op
+class TORCH_CUDA_CU_API MmaBuilder {
+ public:
+  MmaBuilder(MmaOptions::MacroType macro, MatMulTileOptions gemm_tile);
+  MmaBuilder& layout(MmaOptions::MmaInputLayout layout);
+  MmaBuilder& operand(MmaOptions::Operand a_or_b);
+  MmaOptions build() const;
+
+ private:
+  MmaOptions option_;
+};
+
+//! GPU arch check for macro type
+bool isVolta(MmaOptions::MacroType macro);
+bool isTuring(MmaOptions::MacroType macro);
+bool isAmpere(MmaOptions::MacroType macro);
+
+//! Returns true if the given option describes a transposed operand
+bool isOperandTransposed(MmaOptions options);
+
+// Unpacked constants from macro type:
+// exact numbers are defined by each individual instruction.
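+// For example, MacroType::Volta_16_16_4 unpacks to an output register size of
+// 8 and input register sizes of 4 for both the A and B operands (see the
+// definitions in mma_type.cpp above).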
+int getOutputRegisterSize(MmaOptions::MacroType macro); +int getInputARegisterSize(MmaOptions::MacroType macro); +int getInputBRegisterSize(MmaOptions::MacroType macro); + +// MMA stringify utils +std::string toString(MmaOptions::MacroType macro); +std::string toString(MmaOptions::MmaInputLayout input_layout); +std::string toString(MmaOptions::MacroType mt); + +} // namespace cuda +} // namespace fuser +} // namespace jit +} // namespace torch diff --git a/torch/csrc/jit/codegen/cuda/mutator.cpp b/torch/csrc/jit/codegen/cuda/mutator.cpp index c24e444eb566ec..5e397a5bfa116b 100644 --- a/torch/csrc/jit/codegen/cuda/mutator.cpp +++ b/torch/csrc/jit/codegen/cuda/mutator.cpp @@ -51,6 +51,8 @@ void OptOutMutator::mutate(Double* d) {} void OptOutMutator::mutate(Int* i) {} +void OptOutMutator::mutate(ComplexDouble* c) {} + void OptOutMutator::mutate(NamedScalar* ns) {} void OptOutMutator::mutate(IterDomain* id) { @@ -181,7 +183,8 @@ void OptOutMutator::mutate(ReductionOp* rop) { auto container = rop->container(); auto rop_type = rop->getReductionOpType(); container->removeExpr(rop); - IrBuilder::create(container, rop_type, init, out, in); + IrBuilder::create( + container, rop_type, init, out, in, rop->isFused()); } namespace { @@ -230,7 +233,26 @@ void OptOutMutator::mutate(WelfordOp* wop) { init_N, in_avg, in_var, - in_N); + in_N, + wop->isFused()); +} + +void OptOutMutator::mutate(MmaOp* mma) { + Val* out = maybeMutated(mma->out()); + Val* in_a = maybeMutated(mma->inA()); + Val* in_b = maybeMutated(mma->inB()); + Val* init = mma->init(); + + if (out->sameAs(mma->out()) && in_a->sameAs(mma->inA()) && + in_b->sameAs(mma->inB())) { + return; + } + + auto container = mma->container(); + auto options = mma->options(); + container->removeExpr(mma); + auto new_mma = + IrBuilder::create(container, out, in_a, in_b, init, options); } void OptOutMutator::mutate(BroadcastOp* bop) { @@ -291,6 +313,19 @@ void OptOutMutator::mutate(GatherOp* op) { IrBuilder::create(container, out, in, window_shape, pad_width); } +void OptOutMutator::mutate(ViewDtypeOp* vop) { + TensorView* out = maybeMutated(vop->out())->as(); + TensorView* in = maybeMutated(vop->in())->as(); + + if (out->sameAs(vop->out()) && in->sameAs(vop->in())) { + return; + } + + auto container = vop->container(); + container->removeExpr(vop); + IrBuilder::create(container, out, in, vop->dtype()); +} + void OptOutMutator::mutate(ViewOp* vop) { TensorView* out = maybeMutated(vop->out())->as(); TensorView* in = maybeMutated(vop->in())->as(); @@ -344,7 +379,10 @@ void OptOutMutator::mutate(Merge* m) { void OptOutMutator::mutate(kir::Allocate*) { TORCH_INTERNAL_ASSERT(false, "Not implemented yet."); } -void OptOutMutator::mutate(kir::Sync*) { +void OptOutMutator::mutate(kir::BlockSync*) { + TORCH_INTERNAL_ASSERT(false, "Not implemented yet."); +} +void OptOutMutator::mutate(kir::GridSync*) { TORCH_INTERNAL_ASSERT(false, "Not implemented yet."); } void OptOutMutator::mutate(kir::InitMagicZero*) { @@ -368,6 +406,9 @@ void OptOutMutator::mutate(kir::GridBroadcast*) { void OptOutMutator::mutate(kir::GridWelford*) { TORCH_INTERNAL_ASSERT(false, "Not implemented yet."); } +void OptOutMutator::mutate(kir::AllocateFusedReduction*) { + TORCH_INTERNAL_ASSERT(false, "Not implemented yet."); +} void OptOutMutator::removeExpr(IrContainer* container, Expr* expr) { container->removeExpr(expr); diff --git a/torch/csrc/jit/codegen/cuda/nvfuser.cmake b/torch/csrc/jit/codegen/cuda/nvfuser.cmake new file mode 100644 index 00000000000000..5dc211eb4f6cee --- /dev/null +++ 
b/torch/csrc/jit/codegen/cuda/nvfuser.cmake @@ -0,0 +1,58 @@ +if(BUILD_SPLIT_CUDA) + set(TORCHLIB_FLAVOR torch_cuda_cu) # chose torch_cuda_cu here since JIT is in torch_cuda_cpp +elseif(USE_CUDA) + set(TORCHLIB_FLAVOR torch_cuda) +elseif(USE_ROCM) + set(TORCHLIB_FLAVOR torch_hip) +endif() + +# The list of NVFUSER runtime files +list(APPEND NVFUSER_RUNTIME_FILES + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/array.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/block_reduction.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/block_sync_atomic.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/block_sync_default.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/broadcast.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/fp16_support.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/fused_reduction.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/bf16_support.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/grid_broadcast.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/grid_reduction.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/grid_sync.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/helpers.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/index_utils.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/random_numbers.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/tensor.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/tuple.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/type_traits.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/welford.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/warp.cu + ${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/runtime/tensorcore.cu + ${TORCH_ROOT}/aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh + ${TORCH_ROOT}/aten/src/ATen/cuda/detail/UnpackRaw.cuh +) + +file(MAKE_DIRECTORY "${CMAKE_BINARY_DIR}/include/nvfuser_resources") + +# "stringify" NVFUSER runtime sources +# (generate C++ header files embedding the original input as a string literal) +set(NVFUSER_STRINGIFY_TOOL "${TORCH_SRC_DIR}/csrc/jit/codegen/cuda/tools/stringify_file.py") +foreach(src ${NVFUSER_RUNTIME_FILES}) + get_filename_component(filename ${src} NAME_WE) + set(dst "${CMAKE_BINARY_DIR}/include/nvfuser_resources/${filename}.h") + add_custom_command( + COMMENT "Stringify NVFUSER runtime source file" + OUTPUT ${dst} + DEPENDS ${src} + COMMAND ${PYTHON_EXECUTABLE} ${NVFUSER_STRINGIFY_TOOL} -i ${src} -o ${dst} + ) + add_custom_target(nvfuser_rt_${filename} DEPENDS ${dst}) + add_dependencies(${TORCHLIB_FLAVOR} nvfuser_rt_${filename}) + + # also generate the resource headers during the configuration step + # (so tools like clang-tidy can run w/o requiring a real build) + execute_process(COMMAND + ${PYTHON_EXECUTABLE} ${NVFUSER_STRINGIFY_TOOL} -i ${src} -o ${dst}) +endforeach() + +target_include_directories(${TORCHLIB_FLAVOR} PRIVATE "${CMAKE_BINARY_DIR}/include") diff --git a/torch/csrc/jit/codegen/cuda/ops/alias.cpp b/torch/csrc/jit/codegen/cuda/ops/alias.cpp index 14aff510911e24..cc3220c742feb5 100644 --- a/torch/csrc/jit/codegen/cuda/ops/alias.cpp +++ b/torch/csrc/jit/codegen/cuda/ops/alias.cpp @@ -52,11 +52,39 @@ TensorView* applyViewTransforms( } // namespace +TensorView* view(TensorView* x, DataType dtype) { + if (x->getDataType() == dtype) { + return x; + } + + // TODO: support view(dtype) for dtypes of different size. 
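+  // For example, reinterpreting Float data as another 4-byte type is
+  // accepted, while viewing Float as Half (2 bytes) would change the element
+  // count and is rejected by the size check below.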
+ TORCH_INTERNAL_ASSERT( + dataTypeSize(x->getDataType().value()) == dataTypeSize(dtype), + "Currently, aten::view only supports viewing the data as a type with the same size."); + + std::vector out_domain; + auto inp_domain = TensorDomain::noReductions(x->getMaybeRFactorDomain()); + out_domain.reserve(inp_domain.size()); + for (auto d : inp_domain) { + out_domain.push_back(d->clone()); + } + auto out = IrBuilder::create( + x->container(), + IrBuilder::create( + out_domain, std::vector(out_domain.size(), true)), + dtype); + + IrBuilder::create(x->container(), out, x, dtype); + return out; +} + TensorView* view( TensorView* x, const std::vector& original_sizes, const std::vector& new_sizes) { - TORCH_INTERNAL_ASSERT(x->nDims() == original_sizes.size()); + TORCH_INTERNAL_ASSERT( + TensorDomain::noReductions(x->getMaybeRFactorDomain()).size() == + original_sizes.size()); auto analyze_view = analyzeView(x, original_sizes, new_sizes); @@ -90,8 +118,7 @@ TensorView* squeeze(TensorView* x, const std::vector& sizes, int dim) { if (dim < 0) { dim = (int)(x->nDims()) + dim; } - TORCH_INTERNAL_ASSERT(dim >= 0 && dim < x->nDims()); - if (sizes[dim] == 1) { + if (dim >= 0 && dim < x->nDims() && sizes[dim] == 1) { return sum(x, {dim}); } else { return set(x); diff --git a/torch/csrc/jit/codegen/cuda/ops/alias.h b/torch/csrc/jit/codegen/cuda/ops/alias.h index 8003e3268b3285..30f3de2f228b34 100644 --- a/torch/csrc/jit/codegen/cuda/ops/alias.h +++ b/torch/csrc/jit/codegen/cuda/ops/alias.h @@ -16,6 +16,8 @@ namespace jit { namespace fuser { namespace cuda { +TORCH_CUDA_CU_API TensorView* view(TensorView* x, DataType dtype); + TORCH_CUDA_CU_API TensorView* view( TensorView* x, const std::vector& original_sizes, diff --git a/torch/csrc/jit/codegen/cuda/ops/composite.cpp b/torch/csrc/jit/codegen/cuda/ops/composite.cpp index c01b7230625596..3c7c713e734d33 100644 --- a/torch/csrc/jit/codegen/cuda/ops/composite.cpp +++ b/torch/csrc/jit/codegen/cuda/ops/composite.cpp @@ -49,18 +49,6 @@ TensorView* dropout_backward(TensorView* dy, TensorView* mask, Val* scale) { return dx; } -Val* softplus(Val* x, Val* beta, Val* threshold) { - TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); - TORCH_INTERNAL_ASSERT(beta != nullptr, "Beta is invalid."); - TORCH_INTERNAL_ASSERT( - threshold != nullptr, "Threshold is not a valid Double."); - - auto op_beta = mul(x, beta); - auto maybe_result = div(log1p(exp(op_beta)), beta); - auto y = where(gt(op_beta, threshold), x, maybe_result); - return y; -} - LstmResult lstm( TensorView* prev_cell, TensorView* in_x, @@ -85,7 +73,53 @@ LstmResult lstm( return {cell, hidden}; } -Val* fast_gelu(Val* x) { +TensorView* softplus(TensorView* x, Val* beta, Val* threshold) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + TORCH_INTERNAL_ASSERT(beta != nullptr, "Beta is invalid."); + TORCH_INTERNAL_ASSERT( + threshold != nullptr, "Threshold is not a valid Double."); + + auto op_beta = mul(x, beta); + auto maybe_result = div(log1p(exp(op_beta)), beta); + auto y = where(gt(op_beta, threshold), x, maybe_result); + return y; +} + +TensorView* gelu(TensorView* x) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid"); + + auto kappa = IrBuilder::create(x->container(), M_SQRT1_2); + auto half = IrBuilder::create(x->container(), 0.5); + auto one = IrBuilder::create(x->container(), 1.); + + auto cdf = mul(half, add(one, erf(mul(x, kappa)))); + auto y = mul(x, cdf); + return y; +} + +TensorView* gelu_backward(TensorView* dy, TensorView* x) { + TORCH_INTERNAL_ASSERT(dy != nullptr, 
"Grad Output is invalid."); + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid"); + + constexpr double kAlpha = M_2_SQRTPI * M_SQRT1_2 * 0.5; + const double kHalf = 0.5; + + auto cdf_1 = mul(x, IrBuilder::create(x->container(), M_SQRT1_2)); + auto cdf_2 = erf(cdf_1); + auto cdf_3 = add(cdf_2, IrBuilder::create(x->container(), 1.)); + auto cdf_4 = mul(cdf_3, IrBuilder::create(x->container(), kHalf)); + + auto pdf_1 = mul(x, x); + auto pdf_2 = mul(pdf_1, IrBuilder::create(x->container(), -kHalf)); + auto pdf_3 = exp(pdf_2); + + auto out = addcmul( + cdf_4, x, pdf_3, IrBuilder::create(x->container(), kAlpha)); + auto dx = mul(out, dy); + return dx; +} + +TensorView* tanh_gelu(TensorView* x) { TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid"); constexpr double kBeta = M_SQRT2 * M_2_SQRTPI * 0.5; @@ -104,7 +138,7 @@ Val* fast_gelu(Val* x) { return y; } -Val* fast_gelu_backward(Val* dy, Val* x) { +TensorView* tanh_gelu_backward(TensorView* dy, TensorView* x) { TORCH_INTERNAL_ASSERT(dy != nullptr, "Grad Output is invalid."); TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid"); @@ -139,29 +173,7 @@ Val* fast_gelu_backward(Val* dy, Val* x) { return dx; } -Val* gelu_backward(Val* dy, Val* x) { - TORCH_INTERNAL_ASSERT(dy != nullptr, "Grad Output is invalid."); - TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid"); - - constexpr double kAlpha = M_2_SQRTPI * M_SQRT1_2 * 0.5; - const double kHalf = 0.5; - - auto cdf_1 = mul(x, IrBuilder::create(x->container(), M_SQRT1_2)); - auto cdf_2 = erf(cdf_1); - auto cdf_3 = add(cdf_2, IrBuilder::create(x->container(), 1.)); - auto cdf_4 = mul(cdf_3, IrBuilder::create(x->container(), kHalf)); - - auto pdf_1 = mul(x, x); - auto pdf_2 = mul(pdf_1, IrBuilder::create(x->container(), -kHalf)); - auto pdf_3 = exp(pdf_2); - - auto out = addcmul( - cdf_4, x, pdf_3, IrBuilder::create(x->container(), kAlpha)); - auto dx = mul(out, dy); - return dx; -} - -Val* tanh_backward(Val* dy, Val* tanh_x) { +TensorView* tanh_backward(TensorView* dy, TensorView* tanh_x) { TORCH_INTERNAL_ASSERT(dy != nullptr, "Grad Output is invalid."); TORCH_INTERNAL_ASSERT(tanh_x != nullptr, "Input is invalid"); diff --git a/torch/csrc/jit/codegen/cuda/ops/composite.h b/torch/csrc/jit/codegen/cuda/ops/composite.h index 63e17629f40b6a..99ce3c30a25208 100644 --- a/torch/csrc/jit/codegen/cuda/ops/composite.h +++ b/torch/csrc/jit/codegen/cuda/ops/composite.h @@ -31,8 +31,6 @@ TORCH_CUDA_CU_API TensorView* dropout_backward( TensorView* mask, Val* scale); -TORCH_CUDA_CU_API Val* softplus(Val* x, Val* beta, Val* threshold); - struct LstmResult { TensorView* cell = nullptr; TensorView* hidden = nullptr; @@ -45,10 +43,15 @@ TORCH_CUDA_CU_API LstmResult lstm( TensorView* cell_x, TensorView* out_x); -TORCH_CUDA_CU_API Val* fast_gelu(Val* x); -TORCH_CUDA_CU_API Val* fast_gelu_backward(Val* dy, Val* x); -TORCH_CUDA_CU_API Val* gelu_backward(Val* dy, Val* x); -TORCH_CUDA_CU_API Val* tanh_backward(Val* dy, Val* tanh_x); +TORCH_CUDA_CU_API TensorView* softplus( + TensorView* x, + Val* beta, + Val* threshold); +TORCH_CUDA_CU_API TensorView* gelu(TensorView* x); +TORCH_CUDA_CU_API TensorView* gelu_backward(TensorView* dy, TensorView* x); +TORCH_CUDA_CU_API TensorView* tanh_gelu(TensorView* x); +TORCH_CUDA_CU_API TensorView* tanh_gelu_backward(TensorView* dy, TensorView* x); +TORCH_CUDA_CU_API TensorView* tanh_backward(TensorView* dy, TensorView* tanh_x); } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/ops/normalization.cpp 
b/torch/csrc/jit/codegen/cuda/ops/normalization.cpp index 4a473f662039c8..6311e67dd8f67b 100644 --- a/torch/csrc/jit/codegen/cuda/ops/normalization.cpp +++ b/torch/csrc/jit/codegen/cuda/ops/normalization.cpp @@ -7,6 +7,64 @@ namespace jit { namespace fuser { namespace cuda { +int nonNegativeAxis(int axis, int ndims) { + return (axis >= 0) ? axis : (ndims + axis); +} + +Val* numFeatures(TensorView* x, const std::vector& dims, int ndims) { + Val* num_features = IrBuilder::create(x->container(), 1); + for (const auto dim : dims) { + const int axis = nonNegativeAxis(dim, ndims); + num_features = mul(num_features, x->domain()->domain()[axis]->extent()); + } + return num_features; +} + +TensorView* mean(TensorView* x, const std::vector& dims, bool keepdim) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + + const int kNumberOfDims = + TensorDomain::noReductions(x->getMaybeRFactorDomain()).size(); + + auto sum_x = sum(x, dims, keepdim); + auto y = div(sum_x, numFeatures(x, dims, kNumberOfDims)); + return y; +} + +TensorView* variance( + TensorView* x, + const std::vector& dims, + bool unbiased, + bool keepdim) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + + const int kNumberOfDims = + TensorDomain::noReductions(x->getMaybeRFactorDomain()).size(); + + auto bcast_mean = mean(x, dims, true /* keepdim */); + auto x_mean_sub = sub(x, bcast_mean); + auto x_mean_sub_sq = mul(x_mean_sub, x_mean_sub); + auto sum_x_mean_sub_sq = sum(x_mean_sub_sq, dims, keepdim); + + auto num_features = numFeatures(x, dims, kNumberOfDims); + if (unbiased) { + num_features = + sub(num_features, IrBuilder::create(x->container(), 1.)); + } + auto y = div(sum_x_mean_sub_sq, num_features); + + return y; +} + +TensorView* standard_deviation( + TensorView* x, + const std::vector& dims, + bool unbiased, + bool keepdim) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + return sqrt(variance(x, dims, unbiased, keepdim)); +} + TensorView* softmax(TensorView* x, int dim) { TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); @@ -50,6 +108,45 @@ TensorView* softmax_backward(TensorView* dy, TensorView* y, int dim) { return dx; } +TensorView* log_softmax(TensorView* x, int dim) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + + const int kNumberOfDims = + TensorDomain::noReductions(x->getMaybeRFactorDomain()).size(); + const int kReductionAxis = (dim < 0) ? dim + kNumberOfDims : dim; + TORCH_INTERNAL_ASSERT(kReductionAxis >= 0 && kReductionAxis < kNumberOfDims); + + std::vector broadcast_mask(kNumberOfDims, false); + broadcast_mask[kReductionAxis] = true; + + auto max_val = max(x, {kReductionAxis}); + auto bcast_max = broadcast(max_val, broadcast_mask); + auto x_max_sub = sub(x, bcast_max); + auto exp_val = exp(x_max_sub); + auto bcast_sum = sum(exp_val, {kReductionAxis}, true /* keepdim */); + auto log_sum_exp = log(bcast_sum); + auto y = sub(x_max_sub, log_sum_exp); + + return y; +} + +TensorView* log_softmax_backward(TensorView* dy, TensorView* y, int dim) { + TORCH_INTERNAL_ASSERT(dy != nullptr, "Grad Output is invalid."); + TORCH_INTERNAL_ASSERT(y != nullptr, "Output is invalid."); + + const int kNumberOfDims = + TensorDomain::noReductions(y->getMaybeRFactorDomain()).size(); + const int kReductionAxis = (dim < 0) ? 
dim + kNumberOfDims : dim; + TORCH_INTERNAL_ASSERT(kReductionAxis >= 0 && kReductionAxis < kNumberOfDims); + + auto bcast_sum_grad = sum(dy, {kReductionAxis}, true /* keepdim */); + auto softmax = exp(y); + auto softmax_sum_mul = mul(softmax, bcast_sum_grad); + auto dx = sub(dy, softmax_sum_mul); + + return dx; +} + ForwardNormResult layer_norm( TensorView* x, const std::vector& norm_shape, @@ -59,18 +156,9 @@ ForwardNormResult layer_norm( return layer_norm(x, norm_shape.size(), weight, bias, eps); } -ForwardNormResult layer_norm( - TensorView* x, - const size_t kNormShapeNumDims, - TensorView* weight, - TensorView* bias, - Val* eps) { - TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); - TORCH_INTERNAL_ASSERT( - eps != nullptr && eps->getDataType().has_value() && - eps->getDataType().value() == DataType::Double, - "Epsilon (eps) is not a valid Double."); - +auto norm_properties_from_num_dims( + const TensorView* x, + const size_t kNormShapeNumDims) { // (B, C, H, W, D) tensor // norm_shape = [H, W, D] // M = outer = product of remaining dimensions = B * C @@ -82,13 +170,14 @@ ForwardNormResult layer_norm( std::vector outer_reduction_axes(kOuterNumDims); std::vector outer_broadcast_mask(kNumberOfDims, false); + std::vector inner_reduction_axes(kNormShapeNumDims); + std::vector inner_broadcast_mask(kNumberOfDims, false); + for (const auto idx : c10::irange(kOuterNumDims)) { outer_reduction_axes[idx] = idx; outer_broadcast_mask[idx] = true; } - std::vector inner_reduction_axes(kNormShapeNumDims); - std::vector inner_broadcast_mask(kNumberOfDims, false); Val* num_features = IrBuilder::create(x->container(), 1); for (const auto idx : c10::irange(kNormShapeNumDims)) { const size_t axis = kNumberOfDims - 1 - idx; @@ -96,14 +185,42 @@ ForwardNormResult layer_norm( inner_broadcast_mask[axis] = true; num_features = mul(num_features, x->domain()->domain()[axis]->extent()); } + struct result { + std::vector outer_reduction_axes; + std::vector outer_broadcast_mask; + std::vector inner_reduction_axes; + std::vector inner_broadcast_mask; + Val* num_features = nullptr; + } r; + r.outer_reduction_axes = outer_reduction_axes; + r.outer_broadcast_mask = outer_broadcast_mask; + r.inner_reduction_axes = inner_reduction_axes; + r.inner_broadcast_mask = inner_broadcast_mask; + r.num_features = num_features; + return r; +} + +ForwardNormResult layer_norm( + TensorView* x, + const size_t kNormShapeNumDims, + TensorView* weight, + TensorView* bias, + Val* eps) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + TORCH_INTERNAL_ASSERT( + eps != nullptr && eps->getDataType().has_value() && + eps->getDataType().value() == DataType::Double, + "Epsilon (eps) is not a valid Double."); + + auto r = norm_properties_from_num_dims(x, kNormShapeNumDims); // Main algorithm - auto welford_out = Welford(x, inner_reduction_axes); - auto mean_bcast = broadcast(welford_out.avg, inner_broadcast_mask); + auto welford_out = Welford(x, r.inner_reduction_axes); + auto mean_bcast = broadcast(welford_out.avg, r.inner_broadcast_mask); auto x_sub_mean = sub(x, mean_bcast); - auto var_sum_bcast = broadcast(welford_out.var_sum, inner_broadcast_mask); - auto var = mul(var_sum_bcast, reciprocal(num_features)); + auto var_sum_bcast = broadcast(welford_out.var_sum, r.inner_broadcast_mask); + auto var = mul(var_sum_bcast, reciprocal(r.num_features)); auto var_eps = add(var, eps); auto invstd = rsqrt(var_eps); @@ -111,19 +228,58 @@ ForwardNormResult layer_norm( // Optional: norm * weight if (weight != nullptr) { - auto 
weight_bcast = broadcast(weight, outer_broadcast_mask); + auto weight_bcast = broadcast(weight, r.outer_broadcast_mask); y = mul(y, weight_bcast); } // Optional: norm * weight + bias if (bias != nullptr) { - auto bias_bcast = broadcast(bias, outer_broadcast_mask); + auto bias_bcast = broadcast(bias, r.outer_broadcast_mask); y = add(y, bias_bcast); } return {y, mean_bcast, invstd}; } +ForwardRMSNormResult rms_norm( + TensorView* x, + const std::vector& norm_shape, + TensorView* weight, + Val* eps) { + return rms_norm(x, norm_shape.size(), weight, eps); +} + +ForwardRMSNormResult rms_norm( + TensorView* x, + const size_t kNormShapeNumDims, + TensorView* weight, + Val* eps) { + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + TORCH_INTERNAL_ASSERT( + eps != nullptr && eps->getDataType().has_value() && + eps->getDataType().value() == DataType::Double, + "Epsilon (eps) is not a valid Double."); + + auto r = norm_properties_from_num_dims(x, kNormShapeNumDims); + + // Main algorithm + auto var_sum = sum(mul(x, x), r.inner_reduction_axes); + auto var_sum_bcast = broadcast(var_sum, r.inner_broadcast_mask); + auto var = mul(var_sum_bcast, reciprocal(r.num_features)); + auto var_eps = add(var, eps); + auto invstd = rsqrt(var_eps); + + auto y = mul(x, invstd); + + // Optional: norm * weight + if (weight != nullptr) { + auto weight_bcast = broadcast(weight, r.outer_broadcast_mask); + y = mul(y, weight_bcast); + } + + return {y, invstd}; +} + BackwardNormResult layer_norm_backward( TensorView* dy, TensorView* x, @@ -138,55 +294,30 @@ BackwardNormResult layer_norm_backward( TORCH_INTERNAL_ASSERT(mean != nullptr, "Mean is invalid."); TORCH_INTERNAL_ASSERT(invstd != nullptr, "Inv std is invalid."); - // (B, C, H, W, D) tensor - // norm_shape = [H, W, D] - // M = outer = product of remaining dimensions = B * C - // N = reduction = product of norm_shape = H * W * D - // weight = bias = norm_shape tensor - const size_t kNumberOfDims = - TensorDomain::noReductions(x->getMaybeRFactorDomain()).size(); - const size_t kNormShapeNumDims = norm_shape.size(); - const size_t kOuterNumDims = kNumberOfDims - kNormShapeNumDims; - - std::vector outer_reduction_axes(kOuterNumDims); - std::vector outer_broadcast_mask(kNumberOfDims, false); - for (const auto idx : c10::irange(kOuterNumDims)) { - outer_reduction_axes[idx] = idx; - outer_broadcast_mask[idx] = true; - } - - std::vector inner_reduction_axes(kNormShapeNumDims); - std::vector inner_broadcast_mask(kNumberOfDims, false); - Val* num_features = IrBuilder::create(x->container(), 1); - for (const auto idx : c10::irange(kNormShapeNumDims)) { - const size_t axis = kNumberOfDims - 1 - idx; - inner_reduction_axes[idx] = axis; - inner_broadcast_mask[axis] = true; - num_features = mul(num_features, x->domain()->domain()[axis]->extent()); - } + auto r = norm_properties_from_num_dims(x, norm_shape.size()); auto x_hat = mul(sub(x, mean), invstd); TensorView* grad_x_hat = nullptr; if (weight != nullptr) { - auto* bcast_weight = broadcast(weight, outer_broadcast_mask); + auto* bcast_weight = broadcast(weight, r.outer_broadcast_mask); grad_x_hat = mul(dy, bcast_weight); } else { grad_x_hat = dy; } - auto a = mul(num_features, grad_x_hat); + auto a = mul(r.num_features, grad_x_hat); - auto b = sum(grad_x_hat, inner_reduction_axes); - auto bcast_b = broadcast(b, inner_broadcast_mask); + auto b = sum(grad_x_hat, r.inner_reduction_axes); + auto bcast_b = broadcast(b, r.inner_broadcast_mask); auto c1 = mul(grad_x_hat, x_hat); - auto c2 = sum(c1, inner_reduction_axes); - auto 
bcast_c2 = broadcast(c2, inner_broadcast_mask); + auto c2 = sum(c1, r.inner_reduction_axes); + auto bcast_c2 = broadcast(c2, r.inner_broadcast_mask); auto c3 = mul(x_hat, bcast_c2); auto inner = sub(sub(a, bcast_b), c3); - auto reciprocal_size = reciprocal(num_features); + auto reciprocal_size = reciprocal(r.num_features); TensorView* dx = nullptr; if (output_mask[0]) { @@ -195,16 +326,65 @@ BackwardNormResult layer_norm_backward( TensorView* dw = nullptr; if (output_mask[1] && weight != nullptr) { - dw = sum(mul(dy, x_hat), outer_reduction_axes); + dw = sum(mul(dy, x_hat), r.outer_reduction_axes); } TensorView* db = nullptr; if (output_mask[2] && bias != nullptr) { - db = sum(dy, outer_reduction_axes); + db = sum(dy, r.outer_reduction_axes); } return {dx, dw, db}; } +BackwardRMSNormResult rms_norm_backward( + TensorView* dy, + TensorView* x, + const std::vector& norm_shape, + TensorView* invstd, + TensorView* weight, + const std::vector& output_mask) { + TORCH_INTERNAL_ASSERT(dy != nullptr, "Grad Output is invalid."); + TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); + TORCH_INTERNAL_ASSERT(invstd != nullptr, "Inv std is invalid."); + + auto r = norm_properties_from_num_dims(x, norm_shape.size()); + + auto x_hat = mul(x, invstd); + + TensorView* grad_x_hat = nullptr; + if (weight != nullptr) { + auto* bcast_weight = broadcast(weight, r.outer_broadcast_mask); + grad_x_hat = mul(dy, bcast_weight); + } else { + grad_x_hat = dy; + } + + auto a = mul(r.num_features, grad_x_hat); + + auto b = sum(grad_x_hat, r.inner_reduction_axes); + auto bcast_b = broadcast(b, r.inner_broadcast_mask); + + auto c1 = mul(grad_x_hat, x_hat); + auto c2 = sum(c1, r.inner_reduction_axes); + auto bcast_c2 = broadcast(c2, r.inner_broadcast_mask); + auto c3 = mul(x_hat, bcast_c2); + + auto inner = sub(sub(a, bcast_b), c3); + auto reciprocal_size = reciprocal(r.num_features); + + TensorView* dx = nullptr; + if (output_mask[0]) { + dx = mul(mul(reciprocal_size, invstd), inner); + } + + TensorView* dw = nullptr; + if (output_mask[1] && weight != nullptr) { + dw = sum(mul(dy, x_hat), r.outer_reduction_axes); + } + + return {dx, dw}; +} + ForwardNormResult batch_norm( TensorView* x, TensorView* weight, @@ -300,19 +480,16 @@ ForwardNormResult batch_norm( "Input running stats must have dtype defined"); auto casted_output = castOp(*rm_dtype, aliased_output); - fusion->addOutput(casted_output); fusion->aliasOutputToInput(casted_output, input_to_cast); }; if (running_mean->isFusionInput()) { - fusion->addOutput(new_mean_hat); fusion->aliasOutputToInput(new_mean_hat, running_mean); } else { cast_to_input_dtype(running_mean, new_mean_hat); } if (running_var->isFusionInput()) { - fusion->addOutput(new_var_hat); fusion->aliasOutputToInput(new_var_hat, running_var); } else { cast_to_input_dtype(running_var, new_var_hat); @@ -465,7 +642,8 @@ ForwardNormResult instance_norm( TensorView* running_var, const bool kUseInputStats, Val* momentum, - Val* eps) { + Val* eps, + bool channels_last) { auto fusion = FusionGuard::getCurFusion(); TORCH_INTERNAL_ASSERT(x != nullptr, "Input is invalid."); @@ -489,9 +667,9 @@ ForwardNormResult instance_norm( // N = reduction = H * W * D // weight = bias = C tensor const size_t kBatchDim = 0; - const size_t kChannelsDim = 1; const size_t kNumberOfDims = TensorDomain::noReductions(x->getMaybeRFactorDomain()).size(); + const size_t kChannelsDim = channels_last ? 
kNumberOfDims - 1 : 1; std::vector x_reduction_axes; std::vector x_broadcast_mask(kNumberOfDims, false); @@ -522,31 +700,51 @@ ForwardNormResult instance_norm( // updating running mean and running var if (running_mean != nullptr && running_var != nullptr) { + auto _running_mean = running_mean; + auto _running_var = running_var; + if (_running_mean->getDataType().value() == DataType::Half || + _running_mean->getDataType().value() == DataType::BFloat16) { + _running_mean = castOp(DataType::Float, _running_mean); + } + if (_running_var->getDataType().value() == DataType::Half || + _running_var->getDataType().value() == DataType::BFloat16) { + _running_var = castOp(DataType::Float, running_var); + } auto rev_momentum = sub(IrBuilder::create(x->container(), 1.0), momentum); auto current_mean_hat = mul(welford_out.avg, momentum); - auto mean_hat = mul(running_mean, rev_momentum); + auto mean_hat = mul(_running_mean, rev_momentum); auto new_mean_hat = add(mean_hat, current_mean_hat); // NS: static_cast to workaround VC++ error, see // https://godbolt.org/z/6Prd77xYs auto new_mean_sum = sum(new_mean_hat, {static_cast(kBatchDim)}); auto new_mean_channels_only = mul(new_mean_sum, reciprocal(B)); - fusion->addOutput(new_mean_channels_only); + if (running_mean->getDataType().value() == DataType::Half || + running_mean->getDataType().value() == DataType::BFloat16) { + new_mean_channels_only = + castOp(running_mean->getDataType().value(), new_mean_channels_only); + } + // fusion->addOutput(new_mean_channels_only); fusion->aliasOutputToInput(new_mean_channels_only, running_mean); auto num_feature_decrement = sub(N, x->container()->oneVal()); auto unbiased_var = mul(welford_out.var_sum, reciprocal(num_feature_decrement)); auto current_var_hat = mul(unbiased_var, momentum); - auto var_hat = mul(running_var, rev_momentum); + auto var_hat = mul(_running_var, rev_momentum); auto new_var_hat = add(var_hat, current_var_hat); // NS: static_cast to workaround VC++ error, see // https://godbolt.org/z/6Prd77xYs auto new_var_sum = sum(new_var_hat, {static_cast(kBatchDim)}); auto new_var_channels_only = mul(new_var_sum, reciprocal(B)); - fusion->addOutput(new_var_channels_only); + if (running_var->getDataType().value() == DataType::Half || + running_var->getDataType().value() == DataType::BFloat16) { + new_var_channels_only = + castOp(running_var->getDataType().value(), new_var_channels_only); + } + // fusion->addOutput(new_var_channels_only); fusion->aliasOutputToInput(new_var_channels_only, running_var); } @@ -590,6 +788,121 @@ ForwardNormResult instance_norm( return {y, mean, invstd}; } +BackwardNormResult instance_norm_backward( + TensorView* input, + TensorView* grad_output, + TensorView* weight, + TensorView* running_mean, + TensorView* running_var, + TensorView* save_mean, + TensorView* save_invstd, + const bool kTraining, + Val* eps, + const std::vector& output_mask, + bool channels_last) { + TORCH_INTERNAL_ASSERT(input != nullptr, "Input is invalid."); + TORCH_INTERNAL_ASSERT(grad_output != nullptr, "Grad Output is invalid."); + TORCH_INTERNAL_ASSERT( + eps != nullptr && eps->getDataType().has_value() && + eps->getDataType().value() == DataType::Double, + "Epsilon (eps) is not a valid Double."); + + // (B, C, H, W, D) tensor + // M = outer = channels + // N = reduction = B * H * W * D + // weight = bias = (C) tensor + const size_t kNumberOfDims = + TensorDomain::noReductions(input->getMaybeRFactorDomain()).size(); + // channels last format means C dimension is at axis kNumberOfDims-1 at x / + // grad_out + 
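The axis bookkeeping here reduces over every dimension except the batch and channel axes, with the channel axis moving to the innermost position for channels-last inputs. A small standalone sketch of that selection (a hypothetical helper, not the nvfuser code itself):

    #include <cstddef>
    #include <vector>

    // For instance-norm backward on an N-D tensor, reduce over all axes
    // except batch (axis 0) and channels (axis 1, or the last axis when
    // the input is in channels-last layout).
    std::vector<int> instanceNormReductionAxes(size_t ndims, bool channels_last) {
      const size_t b_axis = 0;
      const size_t c_axis = channels_last ? ndims - 1 : 1;
      std::vector<int> axes;
      for (size_t axis = 0; axis < ndims; ++axis) {
        if (axis != b_axis && axis != c_axis) {
          axes.push_back(static_cast<int>(axis));
        }
      }
      return axes;
    }
    // e.g. ndims = 4 (NCHW) -> {2, 3}; ndims = 4, channels_last (NHWC) -> {1, 2}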
const size_t b_axis = 0; // for clarity + const size_t c_axis = channels_last ? kNumberOfDims - 1 : 1; + + std::vector reduction_axes; + std::vector broadcast_mask(kNumberOfDims, false); + // weight has its own broadcast mask as it is broadcast for the batch unlike + // mean/var + std::vector weight_broadcast_mask(kNumberOfDims, false); + Val* num_features = nullptr; + for (const auto axis : c10::irange(kNumberOfDims)) { + if (axis != c_axis) { + weight_broadcast_mask[axis] = true; + if (axis != b_axis) { + reduction_axes.push_back(axis); + broadcast_mask[axis] = true; + if (num_features == nullptr) { + num_features = castOp( + DataType::Double, input->domain()->domain()[axis]->extent()); + } else { + num_features = + mul(num_features, input->domain()->domain()[axis]->extent()); + } + } + } + } + + auto mean = save_mean; + auto invstd = save_invstd; + if (kTraining) { + TORCH_INTERNAL_ASSERT( + save_mean != nullptr && save_invstd != nullptr, + "When training=True, save_mean and save_invstd are required."); + } else { + mean = running_mean; + invstd = rsqrt(add(running_var, eps)); + } + mean = broadcast(mean, broadcast_mask); + + auto norm = reciprocal(num_features); + + auto grad_output_sum = sum(grad_output, reduction_axes); + auto dot_p = sum(mul(grad_output, sub(input, mean)), reduction_axes); + + auto grad_mean = broadcast(mul(grad_output_sum, norm), broadcast_mask); + + auto proj_scale = + broadcast(mul(mul(dot_p, norm), mul(invstd, invstd)), broadcast_mask); + + TensorView* grad_scale = nullptr; + + if (weight == nullptr) { + grad_scale = + mul(broadcast(invstd, broadcast_mask), + IrBuilder::create(input->container(), 1)); + } else { + grad_scale = + mul(broadcast(invstd, broadcast_mask), + broadcast(weight, weight_broadcast_mask)); + } + + TensorView* grad_input = nullptr; + if (kTraining) { + auto proj = mul(sub(input, mean), proj_scale); + grad_input = mul(sub(sub(grad_output, proj), grad_mean), grad_scale); + } else { + grad_input = mul(grad_output, grad_scale); + } + + TensorView* grad_weight = nullptr; + TensorView* grad_weight_reduced = nullptr; + if (output_mask[1]) { + grad_weight = mul(dot_p, invstd); + // TODO: grad weight needs to be reduced across batch-dim but is this the + // most efficient place or can reduction happen earlier? 
+ grad_weight_reduced = sum(grad_weight, {0}); + } + + TensorView* grad_bias = nullptr; + TensorView* grad_bias_reduced = nullptr; + if (output_mask[2]) { + grad_bias = grad_output_sum; + // TODO: same as above for grad weight + grad_bias_reduced = sum(grad_bias, {0}); + } + + return {grad_input, grad_weight_reduced, grad_bias_reduced}; +} + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/ops/normalization.h b/torch/csrc/jit/codegen/cuda/ops/normalization.h index b28cdf6b33ca88..74d8cc4ab65099 100644 --- a/torch/csrc/jit/codegen/cuda/ops/normalization.h +++ b/torch/csrc/jit/codegen/cuda/ops/normalization.h @@ -28,6 +28,33 @@ struct BackwardNormResult { TensorView* grad_bias = nullptr; }; +struct ForwardRMSNormResult { + TensorView* output = nullptr; + TensorView* invstd = nullptr; +}; + +struct BackwardRMSNormResult { + TensorView* grad_input = nullptr; + TensorView* grad_weight = nullptr; +}; + +TORCH_CUDA_CU_API TensorView* mean( + TensorView* x, + const std::vector& dims, + bool keepdim); + +TORCH_CUDA_CU_API TensorView* variance( + TensorView* x, + const std::vector& dims, + bool unbiased, + bool keepdim); + +TORCH_CUDA_CU_API TensorView* standard_deviation( + TensorView* x, + const std::vector& dims, + bool unbiased, + bool keepdim); + TORCH_CUDA_CU_API TensorView* softmax(TensorView* x, int dim); TORCH_CUDA_CU_API TensorView* softmax_backward( @@ -35,6 +62,13 @@ TORCH_CUDA_CU_API TensorView* softmax_backward( TensorView* y, const int dim); +TORCH_CUDA_CU_API TensorView* log_softmax(TensorView* x, int dim); + +TORCH_CUDA_CU_API TensorView* log_softmax_backward( + TensorView* dy, + TensorView* y, + const int dim); + TORCH_CUDA_CU_API ForwardNormResult layer_norm( TensorView* x, const std::vector& norm_shape, @@ -49,6 +83,18 @@ TORCH_CUDA_CU_API ForwardNormResult layer_norm( TensorView* bias, Val* eps); +TORCH_CUDA_CU_API ForwardRMSNormResult rms_norm( + TensorView* x, + const std::vector& norm_shape, + TensorView* weight, + Val* eps); + +TORCH_CUDA_CU_API ForwardRMSNormResult rms_norm( + TensorView* x, + const size_t kNormShapeNumDims, + TensorView* weight, + Val* eps); + TORCH_CUDA_CU_API BackwardNormResult layer_norm_backward( TensorView* dy, TensorView* x, @@ -59,6 +105,14 @@ TORCH_CUDA_CU_API BackwardNormResult layer_norm_backward( TensorView* bias, const std::vector& output_mask); +TORCH_CUDA_CU_API BackwardRMSNormResult rms_norm_backward( + TensorView* dy, + TensorView* x, + const std::vector& norm_shape, + TensorView* rstd, + TensorView* weight, + const std::vector& output_mask); + TORCH_CUDA_CU_API ForwardNormResult batch_norm( TensorView* x, TensorView* weight, @@ -89,9 +143,23 @@ TORCH_CUDA_CU_API ForwardNormResult instance_norm( TensorView* bias, TensorView* running_mean, TensorView* running_var, - const bool kUseInputStats, + const bool kUseInputStats, // kTraining? 
Val* momentum, - Val* eps); + Val* eps, + bool channels_last = false); + +TORCH_CUDA_CU_API BackwardNormResult instance_norm_backward( + TensorView* x, + TensorView* dy, + TensorView* weight, + TensorView* running_mean, + TensorView* running_var, + TensorView* save_mean, + TensorView* save_invstd, + const bool kTraining, + Val* eps, + const std::vector& output_mask, + bool channels_last = false); } // namespace cuda } // namespace fuser diff --git a/torch/csrc/jit/codegen/cuda/parallel_dimension_map.cpp b/torch/csrc/jit/codegen/cuda/parallel_dimension_map.cpp index d966fc21a971a7..795eab0a634f5c 100644 --- a/torch/csrc/jit/codegen/cuda/parallel_dimension_map.cpp +++ b/torch/csrc/jit/codegen/cuda/parallel_dimension_map.cpp @@ -43,28 +43,22 @@ void ParallelDimensionMap::build(Fusion* fusion) { } void ParallelDimensionMap::registerConstantExtent(IterDomain* id) { - ExpressionEvaluator ee(id->fusion()); - auto extent_int = ee.evaluate(id->extent()); - if (!extent_int.has_value()) { + if (!id->extent()->isConstScalar()) { // Nothing to do if not constant return; } - auto const_extent = extent_int.value(); + ExpressionEvaluator ee(id->fusion()); + auto extent_int = ee.evaluate(id->extent()); + TORCH_INTERNAL_ASSERT( + extent_int.has_value(), + "Extent of ", + id->toString(), + " should have been constant, but could not be evaluated at compile time."); - // Ignore if this is derived from a size-1 domain as it is likely a - // size-1 broadcast domain and that does not represent the actual - // dimension even if it's constant. Being size-1 may not always mean - // it's a broadcast domain, but it'd be safe to assume it is mostly - // the case. If it is not a broadcast, ignoring this domain does not - // impact the correctness. - auto extent_inputs = InputsOf::output(id->fusion(), id->extent()); - if (std::any_of(extent_inputs.begin(), extent_inputs.end(), [](Val* input) { - return input->isOneInt(); - })) { - return; - } + auto const_extent = extent_int.value(); + // Uses index map auto concrete_id = getCAMappedConcreteDomain(id); auto existing_it = constant_extent_map_.find(id); @@ -106,14 +100,13 @@ void ParallelDimensionMap::populateDimensionMapWithSingleCASet( auto it = constant_extent_map_.find(id); if (it != constant_extent_map_.end()) { - if (it->second.size() == 1) { - dim_map_.insert({pt, IrBuilder::create(*(it->second.begin()))}); - exact_types_.insert(pt); - } else { - // Multiple constant dimensions found; Use the corresponding - // symbolic parallel dim - dim_map_.insert({pt, NamedScalar::getParallelDim(pt)}); - } + TORCH_INTERNAL_ASSERT( + it->second.size() == 1, + "Only one value found mapped to parallel type ", + stringifyThread(pt), + " yet its bound to multiple extents."); + dim_map_.insert({pt, IrBuilder::create(*(it->second.begin()))}); + exact_types_.insert(pt); } else { // Prefer to use blockDim/gridDim if not constant dim_map_.insert({pt, NamedScalar::getParallelDim(pt)}); @@ -200,7 +193,9 @@ void ParallelDimensionMap::adjustMappingsForWarpPadding() { // non-exact. 
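The warp-padding adjustment below treats TIDx as padded to a multiple of the warp size, and only considers it exact when blockDim.x itself (or a known single warp) defines the extent. A minimal sketch of the rounding involved, assuming the usual CUDA warp size of 32 purely for illustration:

    // Pad a thread-dimension extent up to a multiple of the warp size,
    // which is what warp-padded TIDx mappings rely on.
    constexpr int kWarpSize = 32;
    int padToWarp(int extent) {
      return ((extent + kWarpSize - 1) / kWarpSize) * kWarpSize;
    }
    // padToWarp(1) == 32, padToWarp(33) == 64, padToWarp(64) == 64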
auto& warp_info = gpu_lower->getWarpPaddedParallelInfo(); - if (!warp_info.is_tidx_padded) { + // TIDx isn't really padded if there isn't a warp reduction (this could + // change) + if (!(warp_info.is_tidx_padded && warp_info.has_warp_reduction)) { return; } @@ -218,11 +213,24 @@ void ParallelDimensionMap::adjustMappingsForWarpPadding() { return; } } + // If tidx is strictly defined as blockDim.x then it must be set to a + // multiple of the warp and can be considered exact + bool tidx_def_trivial = true; + for (auto entry : concrete_dom_map_.at(tidx_pt)) { + if (!entry->isA() || + !entry->as()->sameAs( + NamedScalar::getParallelDim(tidx_pt))) { + tidx_def_trivial = false; + } + } + if (tidx_def_trivial) { + return; + } } // TIDx is padded to a multiple of warp. If it's known to be a // single warp, use the constant warp size as the dimension of - // TIDx. Otherwise, jsut use blockDim.x. + // TIDx. Otherwise, just use blockDim.x. if (warp_info.is_tidx_single_warp) { dim_map_.at(ParallelType::TIDx) = IrBuilder::create(warp_size); } else { @@ -292,6 +300,13 @@ bool ParallelDimensionMap::equalDim(Val* dim1, Val* dim2) { // If both are BinaryOp or UnaryOp, check their inputs. Since these // Vals are IterDomain extents, UnaryOp should not occur, but // checking shouldn't be harmful. + // TODO: + // We might be able to replace this with dim1->toInlineString() == + // dim2->toInlineString() + // If we want this less conservative we could make an "exact map" which + // could be another mode in compute at that maps all iter domains, but not + // concretized broadcast axes and only forwards through non-concretized + // broadcast axes. if ((dim1_def->isA() && dim2_def->isA() && (dim1_def->as()->getBinaryOpType() == dim2_def->as()->getBinaryOpType())) || diff --git a/torch/csrc/jit/codegen/cuda/parallel_type_bitmap.h b/torch/csrc/jit/codegen/cuda/parallel_type_bitmap.h index 3bfb32d38bc027..642017a3c0977f 100644 --- a/torch/csrc/jit/codegen/cuda/parallel_type_bitmap.h +++ b/torch/csrc/jit/codegen/cuda/parallel_type_bitmap.h @@ -3,6 +3,7 @@ #include #include +#include #include #include #include @@ -160,6 +161,20 @@ class ParallelTypeBitmap { *this |= ParallelTypeBitmap(kBIDBits); } + //! Clear all of the TID flags + void clearAllTID() { + auto tid_bits = ParallelTypeBitmap(kTIDBits); + auto not_tid_bits = ~tid_bits; + *this &= not_tid_bits; + } + + //! Clear all of the BID flags + void clearAllBID() { + auto bid_bits = ParallelTypeBitmap(kBIDBits); + auto not_bid_bits = ~bid_bits; + *this &= not_bid_bits; + } + //! Get an iterator to traverse set types Iterator begin() const { return Iterator::begin(*this); @@ -271,6 +286,52 @@ inline ParallelTypeBitmap::Iterator ParallelTypeBitmap::Iterator::end( return Iterator(map, kOffsetEnd); } +//! 
Map from ParallelType to template type T +template +class ParallelTypeMap { + public: + ParallelTypeMap() = default; + + ParallelTypeMap(const T& init) { + std::fill(map_.begin(), map_.end(), init); + } + + T& operator[](ParallelType pt) { + return map_[getParallelTypeBitMapOffset(pt)]; + } + + const T& operator[](ParallelType pt) const { + return map_[getParallelTypeBitMapOffset(pt)]; + } + + T& at(ParallelType pt) { + return map_.at(getParallelTypeBitMapOffset(pt)); + } + + const T& at(ParallelType pt) const { + return map_.at(getParallelTypeBitMapOffset(pt)); + } + + auto begin() { + return map_.begin(); + } + + auto begin() const { + return map_.begin(); + } + + auto end() { + return map_.begin(); + } + + auto end() const { + return map_.begin(); + } + + private: + std::array map_; +}; + } // namespace cuda } // namespace fuser } // namespace jit diff --git a/torch/csrc/jit/codegen/cuda/parser.cpp b/torch/csrc/jit/codegen/cuda/parser.cpp index 94dad076db85ca..419bb028e3dc04 100644 --- a/torch/csrc/jit/codegen/cuda/parser.cpp +++ b/torch/csrc/jit/codegen/cuda/parser.cpp @@ -38,11 +38,15 @@ constexpr auto kNumBinaryOpsWithAlpha = 6; constexpr auto kNumLerpOps = 2; constexpr auto kNumLayernormFwd = 2; constexpr auto kNumBatchnormFwd = 3; +constexpr auto kNumBatchnormBwd = 2; constexpr auto kNumInstancenormFwd = 1; constexpr auto kNumSumToSize = 2; constexpr auto kNumAutocastOps = 2; constexpr auto kNumAliasDimOps = 2; constexpr auto kNumViewOps = 2; +constexpr auto kNumVarOps = 2; +constexpr auto kNumSoftmaxFwd = 2; +constexpr auto kNumSoftmaxBwd = 2; namespace { @@ -64,6 +68,21 @@ const auto& strAttr = Symbol::attr("profiled_str"); typedef Val* CgValue; typedef Expr* CgOp; +bool isReductionNonCompatibleTensor( + const std::shared_ptr& tensor_type) { + return is_zero_dim_tensor(tensor_type) || is_zero_sized_tensor(tensor_type); +} + +bool isInputNonSizeZeroTensor(const Node* node) { + for (const auto& val : node->inputs()) { + auto tensor_type = val->type()->cast(); + if (tensor_type && is_zero_sized_tensor(tensor_type)) { + return false; + } + } + return true; +} + // Note [ Permutation Bookkeeping and Propagation in Parser ] // // The goal in supporting permutation propagation in parser is to: @@ -577,6 +596,9 @@ class IrParser { // return nullptr if entry does not exist static const RegistrationEntry* lookupInRegistry(const Node* node) { + if (parser_skip_set_.count(node->kind()) != 0) { + return nullptr; + } // we need to use maybeSchema for nodes like prim::Constant, which doesn't // have a schema auto schema_ptr = node->maybeSchema(); @@ -600,6 +622,20 @@ class IrParser { return nullptr; } + static bool querySkipSymbolSet(c10::Symbol symbol, bool flip) { + // no need to init registry here (unlike `lookupInSymbolSet`, as + // `parser_skip_set_` is not initialized via initialization + bool ret = parser_skip_set_.count(symbol) != 0; + if (flip) { + if (ret) { + parser_skip_set_.erase(symbol); + } else { + parser_skip_set_.insert(symbol); + } + } + return ret; + } + static void initRegistry() { if (init_registry_) { // TODO: mutex this guy; @@ -733,7 +769,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -769,7 +805,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -825,7 +861,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + 
isInputNonSizeZeroTensor, nullptr); } @@ -874,7 +910,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -917,7 +953,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -988,7 +1024,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1009,7 +1045,7 @@ class IrParser { auto out = randlike(operand); value_map.emplace(node->output()->unique(), out); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1024,14 +1060,14 @@ class IrParser { std::tie(format, list_val) = getConsistentValues( MemoryFormat::Contiguous(), value_map[node->inputs()[0]->unique()]); - auto operand = list_val.front(); + auto operand = list_val.front()->as(); list_val.pop_front(); auto& beta = value_map[node->inputs()[1]->unique()]; auto& threshold = value_map[node->inputs()[2]->unique()]; auto out = softplus(operand, beta, threshold); value_map.emplace(node->output()->unique(), out); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1054,7 +1090,7 @@ class IrParser { auto out = threshold(operand, th, value); value_map.emplace(node->output()->unique(), out); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1086,7 +1122,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1112,7 +1148,7 @@ class IrParser { auto out = clamp(operand, low, high); value_map.emplace(node->output()->unique(), out); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1140,7 +1176,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1171,7 +1207,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } } @@ -1203,7 +1239,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1240,7 +1276,7 @@ class IrParser { ValueHolder(TensorViewBuilder().build(), format)); } }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1273,7 +1309,7 @@ class IrParser { value_map.emplace(node->output()->unique(), input); } }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1301,7 +1337,7 @@ class IrParser { grad->as(), mask->as(), scale); value_map.emplace(node->output()->unique(), output); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -1342,9 +1378,6 @@ class IrParser { static_cast(NoneType::get()))) { running_mean = value_map[node->input(3)->unique()]->as(); - TORCH_INTERNAL_ASSERT( - running_mean->isFusionInput(), - "IO_tensor `instance_norm::running_mean` can only be input tensor to fusion"); } TensorView* running_var = nullptr; @@ -1352,9 +1385,6 @@ class IrParser { static_cast(NoneType::get()))) { running_var = value_map[node->input(4)->unique()]->as(); - TORCH_INTERNAL_ASSERT( - running_var->isFusionInput(), - "IO_tensor `instance_norm::running_var` can only be input tensor to fusion"); } // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) @@ -1397,7 +1427,13 @@ class IrParser { value_map.emplace(node->output()->unique(), result.output); } }, - [](const Node* node) -> bool { return true; }, + [](const Node* node) -> bool { + if 
(isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } + return true; + }, [](const Node* node) -> OperatorType { return OperatorType::Normalization; }); @@ -1508,7 +1544,13 @@ class IrParser { ValueHolder(result.output, format)); } }, - [](const Node* node) -> bool { return true; }, + [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } + return true; + }, [](const Node* node) -> OperatorType { return OperatorType::Normalization; }); @@ -1516,156 +1558,208 @@ class IrParser { } { - auto ptr_op = getOperatorForLiteral( - "aten::_batch_norm_impl_index_backward(int impl_index, Tensor input, Tensor grad_output, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var_transform, bool train, float eps, bool[3] output_mask, Tensor reservedSpace) -> (Tensor, Tensor, Tensor)"); - REGISTER_PARSE_RULE( - ptr_op, - { - // discard impl_index and reservedSpace since we don't use them - MemoryFormat format; - std::list list_val; - std::tie(format, list_val) = getConsistentValues( - c10::nullopt, - value_map[node->inputs()[1]->unique()], - value_map[node->inputs()[2]->unique()]); - if (format.hasPermutation() && !format.isChannelsLast()) { + std::array BatchNormBwd = { + "aten::_batch_norm_impl_index_backward(int impl_index, Tensor input, Tensor grad_output, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var_transform, bool train, float eps, bool[3] output_mask, Tensor reservedSpace) -> (Tensor, Tensor, Tensor)", + "aten::native_batch_norm_backward(Tensor grad_out, Tensor input, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_invstd, bool train, float eps, bool[3] output_mask) -> (Tensor, Tensor, Tensor)"}; + for (auto signature : BatchNormBwd) { + auto ptr_op = getOperatorForLiteral(signature); + REGISTER_PARSE_RULE( + ptr_op, + { + JitValue* ts_input = nullptr; + JitValue* ts_grad_output; + JitValue* ts_weight = nullptr; + JitValue* ts_r_mean = nullptr; + JitValue* ts_r_var = nullptr; + JitValue* ts_save_mean = nullptr; + JitValue* ts_save_invstd = nullptr; + JitValue* ts_train = nullptr; + JitValue* ts_eps = nullptr; + JitValue* ts_mask = nullptr; + if (node->kind() == + c10::Symbol::fromQualString( + "aten::_batch_norm_impl_index_backward")) { + ts_input = node->input(1); + ts_grad_output = node->input(2); + ts_weight = node->input(3); + ts_r_mean = node->input(4); + ts_r_var = node->input(5); + ts_save_mean = node->input(6); + ts_save_invstd = node->input(7); + ts_train = node->input(8); + ts_eps = node->input(9); + ts_mask = node->input(10); + } else if ( + node->kind() == + c10::Symbol::fromQualString( + "aten::native_batch_norm_backward")) { + ts_grad_output = node->input(0); + ts_input = node->input(1); + ts_weight = node->input(2); + ts_r_mean = node->input(3); + ts_r_var = node->input(4); + ts_save_mean = node->input(5); + ts_save_invstd = node->input(6); + ts_train = node->input(7); + ts_eps = node->input(8); + ts_mask = node->input(9); + } else { + TORCH_INTERNAL_ASSERT( + false, + "Forgot to register the key for BN variation: ", + node->kind().toDisplayString()); + } + + // discard impl_index and reservedSpace since we don't use them + MemoryFormat format; + std::list list_val; std::tie(format, list_val) = getConsistentValues( - MemoryFormat::Contiguous(), - value_map[node->inputs()[1]->unique()], - value_map[node->inputs()[2]->unique()]); - } - auto operand0 = 
list_val.front(); - list_val.pop_front(); - auto operand1 = list_val.front(); - list_val.pop_front(); - auto input = operand0->as(); - auto grad_out = operand1->as(); + c10::nullopt, + value_map[ts_input->unique()], + value_map[ts_grad_output->unique()]); + if (format.hasPermutation() && !format.isChannelsLast()) { + std::tie(format, list_val) = getConsistentValues( + MemoryFormat::Contiguous(), + value_map[ts_input->unique()], + value_map[ts_grad_output->unique()]); + } + auto operand0 = list_val.front(); + list_val.pop_front(); + auto operand1 = list_val.front(); + list_val.pop_front(); + auto input = operand0->as(); + auto grad_out = operand1->as(); - TensorView* weight = nullptr; - if (!node->input(3)->type()->isSubtypeOf( - static_cast(NoneType::get()))) { - weight = value_map[node->input(3)->unique()]->as(); - } + TensorView* weight = nullptr; + if (!ts_weight->type()->isSubtypeOf( + static_cast(NoneType::get()))) { + weight = value_map[ts_weight->unique()]->as(); + } - TensorView* running_mean = nullptr; - if (!node->input(4)->type()->isSubtypeOf( - static_cast(NoneType::get()))) { - running_mean = - value_map[node->input(4)->unique()]->as(); - } + TensorView* running_mean = nullptr; + if (!ts_r_mean->type()->isSubtypeOf( + static_cast(NoneType::get()))) { + running_mean = value_map[ts_r_mean->unique()]->as(); + } - TensorView* running_var = nullptr; - if (!node->input(5)->type()->isSubtypeOf( - static_cast(NoneType::get()))) { - running_var = - value_map[node->input(5)->unique()]->as(); - } + TensorView* running_var = nullptr; + if (!ts_r_var->type()->isSubtypeOf( + static_cast(NoneType::get()))) { + running_var = value_map[ts_r_var->unique()]->as(); + } - TensorView* save_mean = nullptr; - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - if (!node->input(6)->type()->isSubtypeOf( - static_cast(NoneType::get()))) { + TensorView* save_mean = nullptr; // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - save_mean = value_map[node->input(6)->unique()]->as(); - } - - TensorView* save_invstd = nullptr; - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - if (!node->input(7)->type()->isSubtypeOf( - static_cast(NoneType::get()))) { - save_invstd = - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - value_map[node->input(7)->unique()]->as(); - } + if (!ts_save_mean->type()->isSubtypeOf( + static_cast(NoneType::get()))) { + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + save_mean = value_map[ts_save_mean->unique()]->as(); + } - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - auto training = constant_as(node->input(8)); - TORCH_INTERNAL_ASSERT( - training.has_value(), - "The training (bool) parameter is required."); - const bool kTraining = training.value(); + TensorView* save_invstd = nullptr; + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + if (!ts_save_invstd->type()->isSubtypeOf( + static_cast(NoneType::get()))) { + save_invstd = + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + value_map[ts_save_invstd->unique()]->as(); + } - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - Val* eps_ptr = nullptr; - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - if (auto eps = constant_as(node->input(9))) { - eps_ptr = IrBuilder::create(eps.value()); - } else { // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - eps_ptr = value_map[node->input(7)->unique()]; - } + auto training = constant_as(ts_train); + TORCH_INTERNAL_ASSERT( + training.has_value(), + "The training (bool) parameter is required."); 
+ const bool kTraining = training.value(); - // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) - auto out_mask_list = constant_as>(node->input(10)); - TORCH_INTERNAL_ASSERT( - out_mask_list.has_value(), - "output mask for batch_norm_backward"); - std::vector output_mask; - for (const auto value : out_mask_list->vec()) { - output_mask.emplace_back(static_cast(value)); - } + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + Val* eps_ptr = nullptr; + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + if (auto eps = constant_as(ts_eps)) { + eps_ptr = IrBuilder::create(eps.value()); + } else { + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + eps_ptr = value_map[ts_eps->unique()]; + } - // TODO: merge this loop below. - if (kTraining) { - TORCH_INTERNAL_ASSERT( - save_mean != nullptr && save_invstd != nullptr, - "When training=True, save_mean and save_invstd are required."); - } else { - // TODO: this is not a legit assumption? Can't we run with - // track_running_stats == false && training == false - // which should just run through the case above. + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + auto out_mask_list = constant_as>(ts_mask); TORCH_INTERNAL_ASSERT( - running_mean != nullptr && running_var != nullptr, - "When training=False, running_mean and running_invstd are required."); - } + out_mask_list.has_value(), + "output mask for batch_norm_backward"); + std::vector output_mask; + for (const auto value : out_mask_list->vec()) { + output_mask.emplace_back(static_cast(value)); + } - auto grads = batch_norm_backward( - input, - grad_out, - weight, - running_mean, - running_var, - save_mean, - save_invstd, - kTraining, - eps_ptr, - output_mask, - format.isChannelsLast()); + // TODO: merge this loop below. + if (kTraining) { + TORCH_INTERNAL_ASSERT( + save_mean != nullptr && save_invstd != nullptr, + "When training=True, save_mean and save_invstd are required."); + } else { + // TODO: this is not a legit assumption? Can't we run with + // track_running_stats == false && training == false + // which should just run through the case above. 
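The training/eval split being validated here mirrors how the statistics are sourced downstream: saved per-batch mean/invstd when training, running statistics (with invstd recomputed from running_var and eps) when not. A scalar C++ sketch of that selection, assuming plain double inputs rather than TensorViews:

    #include <cmath>

    struct NormStats { double mean; double invstd; };

    // Training: use the mean/invstd saved by the forward pass.
    // Eval: fall back to running statistics, recomputing invstd from running_var.
    NormStats pickBatchNormStats(bool training,
                                 double save_mean, double save_invstd,
                                 double running_mean, double running_var,
                                 double eps) {
      if (training) {
        return {save_mean, save_invstd};
      }
      return {running_mean, 1.0 / std::sqrt(running_var + eps)};
    }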
+ TORCH_INTERNAL_ASSERT( + running_mean != nullptr && running_var != nullptr, + "When training=False, running_mean and running_invstd are required."); + } - if (output_mask[0]) { - TORCH_INTERNAL_ASSERT(grads.grad_input != nullptr); - value_map.emplace( - node->output(0)->unique(), - ValueHolder(grads.grad_input, format)); - } else { - TORCH_INTERNAL_ASSERT(grads.grad_input == nullptr); - value_map.emplace( - node->output(0)->unique(), - ValueHolder(TensorViewBuilder().build(), format)); - } + auto grads = batch_norm_backward( + input, + grad_out, + weight, + running_mean, + running_var, + save_mean, + save_invstd, + kTraining, + eps_ptr, + output_mask, + format.isChannelsLast()); - if (output_mask[1]) { - TORCH_INTERNAL_ASSERT(grads.grad_weight != nullptr); - value_map.emplace(node->output(1)->unique(), grads.grad_weight); - } else { - TORCH_INTERNAL_ASSERT(grads.grad_weight == nullptr); - value_map.emplace( - node->output(1)->unique(), TensorViewBuilder().build()); - } + if (output_mask[0]) { + TORCH_INTERNAL_ASSERT(grads.grad_input != nullptr); + value_map.emplace( + node->output(0)->unique(), + ValueHolder(grads.grad_input, format)); + } else { + TORCH_INTERNAL_ASSERT(grads.grad_input == nullptr); + value_map.emplace( + node->output(0)->unique(), + ValueHolder(TensorViewBuilder().build(), format)); + } - if (output_mask[2]) { - TORCH_INTERNAL_ASSERT(grads.grad_bias != nullptr); - value_map.emplace(node->output(2)->unique(), grads.grad_bias); - } else { - TORCH_INTERNAL_ASSERT(grads.grad_bias == nullptr); - value_map.emplace( - node->output(2)->unique(), TensorViewBuilder().build()); - } - }, - [](const Node* node) -> bool { return true; }, - [](const Node* node) -> OperatorType { - return OperatorType::Normalization; - }); + if (output_mask[1]) { + TORCH_INTERNAL_ASSERT(grads.grad_weight != nullptr); + value_map.emplace(node->output(1)->unique(), grads.grad_weight); + } else { + TORCH_INTERNAL_ASSERT(grads.grad_weight == nullptr); + value_map.emplace( + node->output(1)->unique(), TensorViewBuilder().build()); + } + + if (output_mask[2]) { + TORCH_INTERNAL_ASSERT(grads.grad_bias != nullptr); + value_map.emplace(node->output(2)->unique(), grads.grad_bias); + } else { + TORCH_INTERNAL_ASSERT(grads.grad_bias == nullptr); + value_map.emplace( + node->output(2)->unique(), TensorViewBuilder().build()); + } + }, + [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(1)->type()->cast())) { + return false; + } + return true; + }, + [](const Node* node) -> OperatorType { + return OperatorType::Normalization; + }); + } } { @@ -1727,7 +1821,13 @@ class IrParser { } }, // TODO: #ProfileIValue List should update this - [](const Node* node) -> bool { return true; }, + [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } + return true; + }, [](const Node* node) -> OperatorType { return OperatorType::Normalization; }); @@ -1825,42 +1925,9 @@ class IrParser { } }, // TODO: #ProfileIValue List should update this - [](const Node* node) -> bool { return true; }, - [](const Node* node) -> OperatorType { - return OperatorType::Normalization; - }); - } - - { - auto ptr_op = getOperatorForLiteral( - "aten::softmax.int(Tensor self, int dim, int? 
dtype) -> Tensor"); - REGISTER_PARSE_RULE( - ptr_op, - { - MemoryFormat format; - std::list list_val; - std::tie(format, list_val) = getConsistentValues( - MemoryFormat::Contiguous(), - value_map[node->inputs()[0]->unique()]); - auto input_t = list_val.front(); - list_val.pop_front(); - auto input = input_t->as(); - - auto dim_value = constant_as(node->input(1)); - TORCH_INTERNAL_ASSERT( - dim_value.has_value(), "dim in softmax is not valid"); - - auto output = softmax(input, dim_value.value()); - value_map.emplace(node->output()->unique(), output); - }, [](const Node* node) -> bool { - if (node->inputs()[1]->node()->kind() != prim::Constant) { - return false; - } - // TODO: support dynamic input by profiling it - if (!node->inputs()[2]->type()->isSubtypeOf( - static_cast(NoneType::get())) && - node->inputs()[2]->node()->kind() != prim::Constant) { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { return false; } return true; @@ -1870,6 +1937,58 @@ class IrParser { }); } + { + std::array SoftmaxFwd = { + "aten::softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor", + "aten::log_softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor"}; + for (auto signature : SoftmaxFwd) { + auto ptr_op = getOperatorForLiteral(signature); + REGISTER_PARSE_RULE( + ptr_op, + { + MemoryFormat format; + std::list list_val; + std::tie(format, list_val) = getConsistentValues( + MemoryFormat::Contiguous(), + value_map[node->inputs()[0]->unique()]); + auto input_t = list_val.front(); + list_val.pop_front(); + auto input = input_t->as(); + + auto dim_value = constant_as(node->input(1)); + TORCH_INTERNAL_ASSERT( + dim_value.has_value(), "dim in softmax is not valid"); + + bool is_log_softmax = node->kind() == + c10::Symbol::fromQualString("aten::log_softmax"); + + auto output = (is_log_softmax) + ? 
log_softmax(input, dim_value.value()) + : softmax(input, dim_value.value()); + value_map.emplace(node->output()->unique(), output); + }, + [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } + if (node->inputs()[1]->node()->kind() != prim::Constant) { + return false; + } + // TODO: support dynamic input by profiling it + if (!node->inputs()[2]->type()->isSubtypeOf( + static_cast(NoneType::get())) && + node->inputs()[2]->node()->kind() != prim::Constant) { + return false; + } + return true; + }, + [](const Node* node) -> OperatorType { + return OperatorType::Normalization; + }); + } + } + { // LTC uses this op for softmax auto ptr_op = getOperatorForLiteral( "aten::_softmax(Tensor self, int dim, bool half_to_float) -> Tensor"); @@ -1893,6 +2012,10 @@ class IrParser { value_map.emplace(node->output()->unique(), output); }, [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } if (node->inputs()[1]->node()->kind() != prim::Constant) { return false; } @@ -1917,35 +2040,104 @@ class IrParser { } { - auto ptr_op = getOperatorForLiteral( - "aten::_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor"); - REGISTER_PARSE_RULE( - ptr_op, - { - auto grad_output = - value_map[node->input(0)->unique()]->as(); + std::array SoftmaxBwd = { + "aten::_log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor", + "aten::_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor"}; + for (auto signature : SoftmaxBwd) { + auto ptr_op = getOperatorForLiteral(signature); + REGISTER_PARSE_RULE( + ptr_op, + { + auto grad_output = + value_map[node->input(0)->unique()]->as(); - auto output = value_map[node->input(1)->unique()]->as(); + auto output = + value_map[node->input(1)->unique()]->as(); - auto dim_value = constant_as(node->input(2)); - TORCH_INTERNAL_ASSERT( - dim_value.has_value(), "dim in softmax is not valid"); + auto dim_value = constant_as(node->input(2)); + TORCH_INTERNAL_ASSERT( + dim_value.has_value(), "dim in softmax is not valid"); - // input_dtype here is ignored! type_inference handles it - auto grad_input = - softmax_backward(grad_output, output, dim_value.value()); + // input_dtype here is ignored! type_inference handles it + bool is_log_softmax = node->kind() == + c10::Symbol::fromQualString( + "aten::_log_softmax_backward_data"); + auto grad_input = (is_log_softmax) + ? 
log_softmax_backward(grad_output, output, dim_value.value()) + : softmax_backward(grad_output, output, dim_value.value()); - value_map.emplace(node->output()->unique(), grad_input); - }, - [](const Node* node) -> bool { - if (node->inputs()[2]->node()->kind() != prim::Constant) { - return false; - } - return true; - }, - [](const Node* node) -> OperatorType { - return OperatorType::Normalization; - }); + value_map.emplace(node->output()->unique(), grad_input); + }, + [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } + if (node->inputs()[2]->node()->kind() != prim::Constant) { + return false; + } + return true; + }, + [](const Node* node) -> OperatorType { + return OperatorType::Normalization; + }); + } + } + + { + std::array Variance = { + "aten::var.dim(Tensor self, int[1] dim, bool unbiased=True, bool keepdim=False) -> Tensor", + "aten::std.dim(Tensor self, int[1] dim, bool unbiased=True, bool keepdim=False) -> Tensor"}; + for (auto signature : Variance) { + auto ptr_op = getOperatorForLiteral(signature); + REGISTER_PARSE_RULE( + ptr_op, + { + MemoryFormat format; + std::list list_val; + std::tie(format, list_val) = getConsistentValues( + MemoryFormat::Contiguous(), + value_map[node->inputs()[0]->unique()]); + auto input_t = list_val.front(); + list_val.pop_front(); + auto input = input_t->as(); + + bool is_variance = + node->kind() == c10::Symbol::fromQualString("aten::var"); + + auto dims_list = constant_as>(node->input(1)); + TORCH_INTERNAL_ASSERT( + dims_list.has_value(), "Cannot fuse with dynamic axes"); + std::vector dims; + for (const auto dim : dims_list->vec()) { + dims.emplace_back(static_cast(dim)); + } + + auto unbiased = constant_as(node->input(2)); + TORCH_INTERNAL_ASSERT( + unbiased.has_value(), "Cannot fuse with dynamic unbiased"); + + auto keepdim = constant_as(node->input(3)); + TORCH_INTERNAL_ASSERT( + keepdim.has_value(), "Cannot fuse with dynamic keepdim"); + + auto output = (is_variance) + ? 
variance(input, dims, unbiased.value(), keepdim.value()) + : standard_deviation( + input, dims, unbiased.value(), keepdim.value()); + value_map.emplace(node->output()->unique(), output); + }, + [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } + return true; + }, + [](const Node* node) -> OperatorType { + return OperatorType::Normalization; + }); + } } { @@ -1967,8 +2159,13 @@ class IrParser { dims_list.has_value(), "aten::sum cannot be fused with dynamic axes"); std::vector dims; - for (const auto dim : dims_list->vec()) { - dims.emplace_back(static_cast(dim)); + if (!dims_list->empty()) { + for (const auto dim : dims_list->vec()) { + dims.emplace_back(static_cast(dim)); + } + } else { + dims.resize(self->as()->nDims()); + std::iota(dims.begin(), dims.end(), 0); } auto keepdim = constant_as(node->input(2)); TORCH_INTERNAL_ASSERT( @@ -1978,20 +2175,20 @@ class IrParser { value_map.emplace(node->output()->unique(), out); }, [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } // TODO: support cast of output types if (!node->inputs()[3]->type()->isSubtypeOf( static_cast(NoneType::get()))) { // We can only handle output as half, float, and double; if (const auto opt_ivalue = toIValue(node->input(3))) { const auto scalar_type = opt_ivalue->toScalarType(); - if (scalar_type == at::ScalarType::Double || - scalar_type == at::ScalarType::Float || - scalar_type == at::ScalarType::BFloat16 || - scalar_type == at::ScalarType::Half) { - return true; + if (!at::isFloatingType(scalar_type)) { + return false; } } - return false; } // we don't support dynamic reduction axes; if (node->inputs()[1]->node()->kind() != prim::Constant) { @@ -2027,8 +2224,13 @@ class IrParser { dims_list.has_value(), "aten::mean cannot be fused with dynamic axes"); std::vector dims; - for (const auto dim : dims_list->vec()) { - dims.emplace_back(static_cast(dim)); + if (!dims_list->empty()) { + for (const auto dim : dims_list->vec()) { + dims.emplace_back(static_cast(dim)); + } + } else { + dims.resize(self->as()->nDims()); + std::iota(dims.begin(), dims.end(), 0); } auto keepdim = constant_as(node->input(2)); TORCH_INTERNAL_ASSERT( @@ -2047,20 +2249,20 @@ class IrParser { value_map.emplace(node->output()->unique(), out); }, [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } // TODO: support cast of output types if (!node->inputs()[3]->type()->isSubtypeOf( static_cast(NoneType::get()))) { // We can only handle output as half, float, and double; if (const auto opt_ivalue = toIValue(node->input(3))) { const auto scalar_type = opt_ivalue->toScalarType(); - if (scalar_type == at::ScalarType::Double || - scalar_type == at::ScalarType::Float || - scalar_type == at::ScalarType::BFloat16 || - scalar_type == at::ScalarType::Half) { - return true; + if (!at::isFloatingType(scalar_type)) { + return false; } } - return false; } // we don't support dynamic reduction axes; if (node->inputs()[1]->node()->kind() != prim::Constant) { @@ -2105,13 +2307,15 @@ class IrParser { } }, [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } // we don't support dynamic reduction axes; if (node->inputs()[1]->node()->kind() != prim::Constant) { return false; } return true; - // auto size_to = constant_as>(node->input(1)); - // return size_to.has_value() && 
!size_to->empty(); }, [](const Node* node) -> OperatorType { auto size_to = constant_as>(node->input(1)); @@ -2146,7 +2350,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } } @@ -2184,7 +2388,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -2213,7 +2417,7 @@ class IrParser { value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -2274,7 +2478,7 @@ class IrParser { node->output()->unique(), ValueHolder(out, format)); } }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -2288,27 +2492,22 @@ class IrParser { std::list list_val; std::tie(format, list_val) = getConsistentValues( c10::nullopt, value_map[node->inputs()[0]->unique()]); - auto self = list_val.front(); + auto self = list_val.front()->as(); list_val.pop_front(); auto approximate = constant_as(node->input(1)); TORCH_INTERNAL_ASSERT( approximate.has_value(), "The approximate parameter is required."); - const auto kApproximate = approximate.value(); - - Val* out = nullptr; - if (at::native::get_gelutype_enum(kApproximate) == - at::native::GeluType::Tanh) { - out = fast_gelu(self); - } else { - out = unaryOp(UnaryOpType::Gelu, self); - } + const auto kTanhGelu = + at::native::get_gelutype_enum(approximate.value()) == + at::native::GeluType::Tanh; + auto out = (kTanhGelu) ? tanh_gelu(self) : gelu(self); value_map.emplace( node->output()->unique(), ValueHolder(out, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -2324,29 +2523,25 @@ class IrParser { c10::nullopt, value_map[node->inputs()[0]->unique()], value_map[node->inputs()[1]->unique()]); - auto grad_out = list_val.front(); + auto grad_out = list_val.front()->as(); list_val.pop_front(); - auto self = list_val.front(); + auto self = list_val.front()->as(); list_val.pop_front(); auto approximate = constant_as(node->input(2)); TORCH_INTERNAL_ASSERT( approximate.has_value(), "The approximate parameter is required."); - const auto kApproximate = approximate.value(); - - Val* grad_in = nullptr; - if (at::native::get_gelutype_enum(kApproximate) == - at::native::GeluType::Tanh) { - grad_in = fast_gelu_backward(grad_out, self); - } else { - grad_in = gelu_backward(grad_out, self); - } + const auto kTanhGelu = + at::native::get_gelutype_enum(approximate.value()) == + at::native::GeluType::Tanh; + auto grad_in = (kTanhGelu) ? 
tanh_gelu_backward(grad_out, self) + : gelu_backward(grad_out, self); value_map.emplace( node->output()->unique(), ValueHolder(grad_in, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -2362,16 +2557,16 @@ class IrParser { c10::nullopt, value_map[node->inputs()[0]->unique()], value_map[node->inputs()[1]->unique()]); - auto grad_out = list_val.front(); + auto grad_out = list_val.front()->as(); list_val.pop_front(); - auto self = list_val.front(); + auto self = list_val.front()->as(); list_val.pop_front(); auto grad_in = tanh_backward(grad_out, self); value_map.emplace( node->output()->unique(), ValueHolder(grad_in, format)); }, - nullptr, + isInputNonSizeZeroTensor, nullptr); } @@ -2393,8 +2588,13 @@ class IrParser { dims_list.has_value(), "aten::amax cannot be fused with dynamic axes"); std::vector dims; - for (const auto dim : dims_list->vec()) { - dims.emplace_back(static_cast(dim)); + if (!dims_list->empty()) { + for (const auto dim : dims_list->vec()) { + dims.emplace_back(static_cast(dim)); + } + } else { + dims.resize(self->as()->nDims()); + std::iota(dims.begin(), dims.end(), 0); } auto keepdim = constant_as(node->input(2)); TORCH_INTERNAL_ASSERT( @@ -2405,6 +2605,10 @@ class IrParser { value_map.emplace(node->output()->unique(), out); }, [](const Node* node) -> bool { + if (isReductionNonCompatibleTensor( + node->input(0)->type()->cast())) { + return false; + } // we don't support dynamic reduction axes; if (node->inputs()[1]->node()->kind() != prim::Constant) { return false; @@ -2449,10 +2653,26 @@ class IrParser { value_map.emplace(node->output()->unique(), output); }, [](const Node* node) -> bool { + auto self_value = node->inputs()[0]; + auto tensor_type = self_value->type()->cast(); + if (tensor_type == nullptr) { + return false; + } + if (!tensor_type->sizes().concrete_sizes().has_value()) { + // Shape information for input tensor is required. + return false; + } + + if (!isInputNonSizeZeroTensor(node)) { + return false; + } // Reject fusing node if view_sizes contains an inferred dimension auto view_sizes = constant_as>(node->input(1)); - TORCH_INTERNAL_ASSERT( - view_sizes.has_value(), "The size parameter is required."); + if (!view_sizes.has_value()) { + // The size parameter is required. + return false; + } + for (auto axis_size : view_sizes->vec()) { if (axis_size == -1) { return false; @@ -2485,7 +2705,18 @@ class IrParser { auto output = squeeze(self, self_sizes); value_map.emplace(node->output()->unique(), output); }, - nullptr, + [](const Node* node) -> bool { + // Shape information for input tensor is required. + auto self_value = node->inputs()[0]; + auto tensor_type = self_value->type()->cast(); + if (tensor_type == nullptr) { + return false; + } + if (!isInputNonSizeZeroTensor(node)) { + return false; + } + return tensor_type->sizes().concrete_sizes().has_value(); + }, nullptr); } @@ -2521,7 +2752,19 @@ class IrParser { } value_map.emplace(node->output()->unique(), output); }, - nullptr, + [](const Node* node) -> bool { + // Shape information for input tensor is required. 
+ auto self_value = node->inputs()[0]; + auto tensor_type = self_value->type()->cast(); + if (tensor_type == nullptr) { + return false; + } + if (!isInputNonSizeZeroTensor(node)) { + return false; + } + auto optional_sizes = tensor_type->sizes().concrete_sizes(); + return tensor_type->sizes().concrete_sizes().has_value(); + }, nullptr); } } @@ -2662,7 +2905,6 @@ class IrParser { nhwc_stride_vec[i]->stride_index_ = n_dim - i - 1; } - // auto updated_tensor_type = c10::TensorType::create( tensor_type = c10::TensorType::create( tensor_type->scalarType(), tensor_type->device(), @@ -2688,6 +2930,7 @@ class IrParser { std::unordered_map value_map_; static std::unordered_set parser_symbol_set_; + static std::unordered_set parser_skip_set_; // parsing rule registry. static std::unordered_map @@ -2701,6 +2944,7 @@ class IrParser { static bool init_registry_; }; std::unordered_set IrParser::parser_symbol_set_; // NOLINT +std::unordered_set IrParser::parser_skip_set_; // NOLINT std::unordered_map IrParser::jit_operator_registry_; // NOLINT std::unordered_map @@ -2995,6 +3239,11 @@ bool shouldProfileNode(const Node* node) { return IrParser::lookupInSymbolSet(node); } +bool skipNodeKind(const std::string& symbol_str, bool flip) { + return IrParser::querySkipSymbolSet( + c10::Symbol::fromQualString(symbol_str), flip); +} + bool insertProfileIValue(ProfilingRecord* pr, Node* node, size_t offset) { // is skip constant necessary? if (node->input(offset)->node()->kind() == prim::Constant) { @@ -3172,6 +3421,38 @@ bool insertProfileIValue(ProfilingRecord* pr, Node* node, size_t offset) { return true; } + static auto gelu_schema = + getOperatorForLiteral( + "aten::gelu(Tensor self, *, str approximate='none') -> Tensor") + ->schema(); + if (node->matches(gelu_schema)) { + switch (offset) { + // argument 1: approximate; + case 1: + profileString(pr, node, offset); + break; + default: + return false; + } + return true; + } + + static auto gelu_backward_schema = + getOperatorForLiteral( + "aten::gelu_backward(Tensor grad_output, Tensor self, *, str approximate='none') -> Tensor") + ->schema(); + if (node->matches(gelu_backward_schema)) { + switch (offset) { + // argument 2: approximate; + case 2: + profileString(pr, node, offset); + break; + default: + return false; + } + return true; + } + static auto native_layer_norm_schema = getOperatorForLiteral( "aten::native_layer_norm(Tensor input, int[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor)") @@ -3213,6 +3494,26 @@ bool insertProfileIValue(ProfilingRecord* pr, Node* node, size_t offset) { return true; } + static auto batch_norm_backward_schema = + getOperatorForLiteral( + "aten::native_batch_norm_backward(Tensor grad_out, Tensor input, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_invstd, bool train, float eps, bool[3] output_mask) -> (Tensor, Tensor, Tensor)") + ->schema(); + if (node->matches(batch_norm_backward_schema)) { + switch (offset) { + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + case 7: // argument 8: training; + profileBool(pr, node, offset); + break; + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + case 9: + profileBoolList(pr, node, offset); + break; + default: + return false; + } + return true; + } + static auto native_layer_norm_backward_schema = getOperatorForLiteral( "aten::native_layer_norm_backward(Tensor grad_out, Tensor input, int[] normalized_shape, Tensor mean, Tensor rstd, Tensor? weight, Tensor? 
bias, bool[3] output_mask) -> (Tensor, Tensor, Tensor)") @@ -3246,43 +3547,16 @@ bool insertProfileIValue(ProfilingRecord* pr, Node* node, size_t offset) { } } - static auto gelu_schema = - getOperatorForLiteral( - "aten::gelu(Tensor self, *, str approximate='none') -> Tensor") - ->schema(); - if (node->matches(gelu_schema)) { - switch (offset) { - // argument 1: approximate; - case 1: - profileString(pr, node, offset); - break; - default: - return false; - } - return true; - } - - static auto gelu_backward_schema = + static auto log_softmax_backward_data_schema = getOperatorForLiteral( - "aten::gelu_backward(Tensor grad_output, Tensor self, *, str approximate='none') -> Tensor") + "aten::_log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor") ->schema(); - if (node->matches(gelu_backward_schema)) { - switch (offset) { - // argument 2: approximate; - case 2: - profileString(pr, node, offset); - break; - default: - return false; - } - return true; - } - static auto softmax_backward_data_schema = getOperatorForLiteral( "aten::_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor") ->schema(); - if (node->matches(softmax_backward_data_schema)) { + if (node->matches(log_softmax_backward_data_schema) || + node->matches(softmax_backward_data_schema)) { switch (offset) { case 3: profileInt(pr, node, offset); diff --git a/torch/csrc/jit/codegen/cuda/parser.h b/torch/csrc/jit/codegen/cuda/parser.h index 6d52b325042577..ddfbf7762742a7 100644 --- a/torch/csrc/jit/codegen/cuda/parser.h +++ b/torch/csrc/jit/codegen/cuda/parser.h @@ -44,6 +44,8 @@ TORCH_CUDA_CU_API bool isElementWiseNode(const Node* node); TORCH_CUDA_CU_API bool isNodeParsible(const Node* node); TORCH_CUDA_CU_API bool shouldProfileNode(const Node* node); +TORCH_CUDA_CU_API bool skipNodeKind(const std::string& symbol_str, bool flip); + void InsertProfileNodes(ProfilingRecord* pr); // lowers PyTorch jit graph to `Fusion`. 
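The new `skipNodeKind` entry point declared above (and wired into the fuser through `fn_skip_n` in register_interface.cpp below) lets a caller toggle parsing for individual op kinds at runtime via `IrParser::querySkipSymbolSet`. A minimal sketch of driving it from a comma-separated list of qualified op names follows; the helper name `toggleSkipList`, the enclosing namespace, and the exact meaning of `flip` (handled by `querySkipSymbolSet`, which is not shown in this hunk) are illustrative assumptions, not part of the patch.

    #include <sstream>
    #include <string>

    #include <torch/csrc/jit/codegen/cuda/parser.h>

    // Hypothetical helper, not part of this patch: toggle the parser skip set
    // for a comma-separated list of qualified op names, e.g.
    // "aten::softmax,aten::gelu". Assumes skipNodeKind lives in
    // torch::jit::fuser::cuda as declared in parser.h and that `flip`
    // switches the skip behavior for the given symbol.
    static void toggleSkipList(const std::string& csv, bool flip) {
      std::stringstream ss(csv);
      std::string op;
      while (std::getline(ss, op, ',')) {
        if (!op.empty()) {
          torch::jit::fuser::cuda::skipNodeKind(op, flip);
        }
      }
    }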
diff --git a/torch/csrc/jit/codegen/cuda/partition.cpp b/torch/csrc/jit/codegen/cuda/partition.cpp index 91d68494fd42fe..c5a452dc366982 100644 --- a/torch/csrc/jit/codegen/cuda/partition.cpp +++ b/torch/csrc/jit/codegen/cuda/partition.cpp @@ -6,6 +6,7 @@ #include #include #include +#include namespace torch { namespace jit { @@ -51,6 +52,26 @@ static c10::optional getDevice(const Value* value) { return tensor_type.device(); } +static bool hasBfloat(const Node* node) { + auto has_bfloat = [](const Value* value) { + if (!value->type()->isSubtypeOf(*TensorType::get())) { + return false; + } + auto opt_scalar_type = value->type()->expectRef().scalarType(); + if (opt_scalar_type.has_value() && + opt_scalar_type.value() == at::ScalarType::BFloat16) { + return true; + } + return false; + }; + + if (std::any_of(node->inputs().begin(), node->inputs().end(), has_bfloat) || + std::any_of(node->outputs().begin(), node->outputs().end(), has_bfloat)) { + return true; + } + return false; +} + static c10::optional getDevice(const Node* node) { c10::optional ret = c10::nullopt; auto merge_devices = [&ret](const c10::optional& device) { @@ -87,7 +108,29 @@ static c10::optional getDevice(const Node* node) { return ret; } -static bool isFusibleDevice(const Node* node, const c10::Device device) { +static bool isDeviceCompatible(const Node* node, const c10::Device& device) { + // only fuses cuda device + if (!device.is_cuda()) { + GRAPH_UPDATE("rejecting node (non-cuda device): ", *node); + return false; + } + const auto major = at::cuda::getDeviceProperties(device.index())->major; + // disable non-elementwise fusion on pre-volta devices + if (major < 7 && hasNonElementWiseOperation(node)) { + GRAPH_UPDATE( + "rejecting node (non element-wise op not supported on SM < 7X): ", + *node); + return false; + } + // disable bfloat fusion on pre-ampere devices + if (major < 8 && hasBfloat(node)) { + GRAPH_UPDATE("rejecting node (bfloat not supported on SM < 8X): ", *node); + return false; + } + return true; +} + +static bool isFusibleDevice(const Node* node, const c10::Device& device) { TORCH_INTERNAL_ASSERT( device.index() != INVALID_INDEX, "fusible device needs to be validate"); auto opt_device = getDevice(node); @@ -95,6 +138,12 @@ static bool isFusibleDevice(const Node* node, const c10::Device device) { // node into an existing `device` if (opt_device.has_value() && (opt_device->index() == INVALID_INDEX || opt_device != device)) { + GRAPH_UPDATE( + "rejecting node from fusion (outputs device not matching fusion): ", + *node); + return false; + } + if (!isDeviceCompatible(node, device)) { return false; } return true; @@ -105,12 +154,14 @@ static bool isFusibleDevice(const Node* node) { auto device = getDevice(node); // be conservative and only fuse cuda operations, this avoids us initializing // operations that produces cpu scalar outputs - if (!device.has_value()) { + if (!device.has_value() || device->index() == INVALID_INDEX) { + return false; + } + + if (!isDeviceCompatible(node, device.value())) { return false; } - return device->index() != INVALID_INDEX && device->is_cuda() && - (at::cuda::getDeviceProperties(device->index())->major >= 7 || - !hasNonElementWiseOperation(node)); + return true; } bool compatibleType(const torch::jit::Value* val) { @@ -120,6 +171,11 @@ bool compatibleType(const torch::jit::Value* val) { DataType::Null) { return false; } + // Complex is disabled until its support is completely added + // TODO: remove this logic + if 
(isComplexType(aten_to_data_type(tensor_type->scalarType().value()))) { + return false; + } } } return true; @@ -161,268 +217,35 @@ bool checkOutputTensorTypes(const Node* node) { } inline bool isFusibleNode(const Node* node) { + // Check if already part of a fusion group if (node->kind() == prim::CudaFusionGroup) return true; // Check we have a parsing rule - bool isFusible = isNodeParsible(node); - // Check if we have a tensor type it's one we support - isFusible = isFusible && checkInputTensorTypes(node); - isFusible = isFusible && checkOutputTensorTypes(node); - // Check if already part of a fusion group - return isFusible; -} - -bool maybeBroadcast( - const TensorTypePtr& type, - const std::vector>& shape) { - if (type->dim()) { - if (type->dim().value() < shape.size()) { - // no broadcast for reduction operation; - return false; - } else if (type->dim().value() > shape.size()) { - // increased rank means there is reduction; - return true; - } else { - // same rank, we need to iterate through sizes and check if size-1 - // exists in input `shape` - for (const auto& opt_size : shape) { - // TODO: not sure if we need to check for output size != 1, since we - // are currently marking all size-1 dimension as broadcast in codegen. - if (opt_size.has_value() && opt_size.value() == 1) { - return true; - } - } + if (!isNodeParsible(node)) { + // ignoring profile nodes & constant nodes to avoid noise from debugging + if (node->kind() != prim::Constant && + node->kind() != prim::profile_ivalue && node->kind() != prim::profile && + node->kind() != prim::Param) { + GRAPH_UPDATE("rejecting node from fusion (node not parsible): ", *node); } + return false; } - return false; -} - -// utility function to check if the node implies broadcast on a given shape ( -// assumed to be shape of an input tensor) -// limitations: -// 1. we rely on shape information to judge this. so we would require output -// shape to be available; -// 2. we basically compares given shape to the shape of the only output of -// the node and return true if it implies broadcast from the former to the -// latter. -bool maybeBroadcastOnShape( - const Node* n, - const std::vector>& shape) { - // TODO: we are only checking output 0. This means that our current check for - // normalization is not complete. - // assumes that if output is not a tensor type, it's not broadcasting - if (auto out_type = n->output(0)->type()->cast()) { - return maybeBroadcast(out_type, shape); - } - return false; -}; - -// return true if node is pointwise operation and input tensors all have -// identical shape. -bool isNonBroadcastElementWise(const Node* n) { - if (hasNonElementWiseOperation(n)) { + // Check if we have a tensor type it's one we support + if (!checkInputTensorTypes(node)) { + GRAPH_UPDATE( + "rejecting node from fusion (input scalar type not supported): ", + *node); return false; } - - for (const auto output : n->outputs()) { - const auto& n_output_type = output->type()->cast(); - - // TODO: we need to stay on safer side instead of "default to return true - // when shape information is not available.", Change that when we enable - // profiling on autodiff FW execution. 
- if (n_output_type != nullptr && n_output_type->sizes().sizes()) { - const std::vector>& n_output_shape = - n_output_type->sizes().sizes().value(); - - for (auto input : n->inputs()) { - if (auto t_type = input->type()->cast()) { - if (maybeBroadcast(t_type, n_output_shape)) { - return false; - } - } - } - } + if (!checkOutputTensorTypes(node)) { + GRAPH_UPDATE( + "rejecting node from fusion (output scalar type not supported): ", + *node); + return false; } - return true; } -//! [ Note - tricky broadcasting ] -//! -//! github issue # 190 -//! -//! To extend the issue further, we consider two difficult broadcasting cases -//! that is difficult to naively schedule: -//! scenario 1: single tensor with multiple broadcasting semantics; -//! ``` -//! %t = op(...) -//! %t0_o = op0(%t, %t0) -//! %t1_o = op1(%t, %t1) -//! ``` -//! It's hard to check/validate whether `%t0` and `%t1` implies -//! identical broadcasting for `%t` so that we can simply -//! broadcast it to their common shape and use the broadcasted -//! tensor view in both `op0` and `op1`; or, if `%t0` and `%t1` -//! has different shapes, we would need differently broadcasted -//! `%t` for the two ops. Even with this condition sorted out, -//! scheduling is challenging. As we cannot inline the computation -//! of `%t` to the downstream consumer of `%t0_o` and `%t1_o` -//! easily, because `computeAt` could propagate contradicting -//! transformations on the common ancestor `%t`. See footnote*; -//! scenario 2: output tensor_view which is broadcasted later; -//! ``` -//! %t = op(...) -//! %t0_o = op0(%t, %t0) -//! return (%t, %t0_o) -//! ``` -//! Similarly, if we need to broadcast `%t` to `%t0` for `op0`, -//! and use it as output, it also complicates schedule. -//! -//! Currently we just avoid the two cases in our graph partitioning. -//! -//! We bake the implementation along with our partition, where we merge nodes -//! from producer to consumer. In the example down, we list all "type"s of edges -//! among producer/consumer and the out side world. -//! -//! %input_t0, %input_t1, %input_t2 # inputs from outside world feeding -//! # producer/consumer pair -//! %p_out_t0, %p_out_t1 = producer(%input_t0, %input_t1) -//! %c_out_t, ... = consumer(%input_t0, %input_t2, %p_out_t0) -//! -//! producer/consumer : the nodes that we are trying to merge, each node could -//! be -//! a parsible real operation or a `CudaFusionGroup`. -//! %input_t0 : inputs shared by both producer & consumer -//! %input_t1 : inputs feed only to producer, but not to consumer -//! %input_t2 : inputs feed only to consumer, but not to producer -//! %p_put_t0 : outputs of producer that is fed to consumer -//! %p_put_t1 : outputs of producer that is not fed to consumer -//! %c_put_t0 : outputs of consumer -//! -//! We can see that after merging consumer & producer, we will have: -//! %input_t0, %input_t1, %input_t2 # inputs from outside world feeding -//! # producer/consumer pair -//! %p_out_t, %c_out_t = group(%input_t0, %input_t1, %input_t2) -//! -//! Under the assumption that any existing `CudaFusionGroup` does not have -//! violating broadcasting semantics mentioned above. -//! -//! If we examine the `group`, new cases of scenario 1 (multiple broadcast) -//! could only be created by merging new edges in the new `group`, that is: -//! case 1. `%input_t0`, shared by `producer` and `consumer` -//! case 2. `%p_out_t0`, produced by `producer` and fed to `consumer` -//! -//! new cases of scenario 2 (output was broadcasted later) could only be added -//! via: -//! case 3. 
`%p_out_t0`, produced by `producer` and fed to `consumer`, which -//! could be broadcasted in the consumer subgraph. -//! -//! footnote*: -//! We are only disabling multiple broadcast right on the tensor, instead of -//! tracing all the broadcast further down. -//! I don't think we need to worry about broadcasting further down the -//! dependency chain, as those would create new IterDomain, which doesn't have -//! th problem of conflicting broadcasting. -bool createTrickyBroadcast(const Node* consumer, const Node* producer) { - auto count_broadcasting_in_node = - [](const Node* node, - const std::vector>& shape, - size_t offset) { - int num_broadcasting = 0; - if (node->kind() == prim::CudaFusionGroup) { - // be careful here as `subgraph_input`, as its name suggests, is in a - // different fraph from `node`. - const auto& subgraph_input = - node->g(attr::Subgraph)->inputs()[offset]; - for (const auto& use : subgraph_input->uses()) { - if (maybeBroadcastOnShape(use.user, shape)) { - num_broadcasting++; - } - } - } else { - if (maybeBroadcastOnShape(node, shape)) { - num_broadcasting++; - } - } - return num_broadcasting; - }; - - // case 1. We check shared inputs to `producer` & `consumer`; - for (const auto i : c10::irange(producer->inputs().size())) { - auto n_input = producer->input(i); - auto n_input_type = n_input->type()->cast(); - if (n_input_type != nullptr && n_input_type->sizes().sizes()) { - std::vector> n_input_shape = - n_input_type->sizes().sizes().value(); - int num_broadcasting = 0; - - // check broadcasting for the n_input inside `consumer`; - for (const auto& use : n_input->uses()) { - if (use.user == consumer) { - num_broadcasting += - count_broadcasting_in_node(consumer, n_input_shape, use.offset); - } - } - - // if no broadcasting happened for consumer, there's no point check - // multiple broadcasting in producer alone; - if (num_broadcasting == 0) { - continue; - } - - // check broadcasting for n_input inside `producer`; - num_broadcasting += - count_broadcasting_in_node(producer, n_input_shape, i); - - // encounted multiple broadcasting scheme for a single TV, we will not be - // able to schedule this, prevent the fusion; (case 1) - if (num_broadcasting > 1) { - return true; - } - } - } - - // case 2. We check input to `consumer` that is also the output from - // `producer` - for (const auto i : c10::irange(producer->outputs().size())) { - auto n_output = producer->output(i); - auto n_output_type = n_output->type()->cast(); - if (n_output_type != nullptr && n_output_type->sizes().sizes()) { - std::vector> n_output_shape = - n_output_type->sizes().sizes().value(); - int num_broadcasting = 0; - // If we only look at case 1 & case 2, we need to check broadcast of - // `n_output` inside `producer`, if it is a `prim::CudaFusionGroup`. - // this is actually not necessary when we consider case 3, as we avoid - // broadcasting on outputs already; - - // TODO: merge this code with case 1. - // check broadcasting for the n_output inside `consumer`; - bool use_as_output = false; - for (const auto& use : n_output->uses()) { - if (use.user == consumer) { - num_broadcasting += - count_broadcasting_in_node(consumer, n_output_shape, use.offset); - } else { - // case 3. 
output is used by other nodes not the consumer, no - // broadcasting is allowed; - use_as_output = true; - } - } - - // encounted multiple broadcasting scheme for a single TV, we will not be - // able to schedule this, prevent the fusion; (case 2) - // Alternatively, if use_as_output is true, we would not permit broadcast - // at all. (case 3) - if (num_broadcasting > (use_as_output ? 0 : 1)) { - return true; - } - } - } - - return false; -} - } // namespace bool isFusibleCudaFusionGroup(const Node* node) { diff --git a/torch/csrc/jit/codegen/cuda/register_interface.cpp b/torch/csrc/jit/codegen/cuda/register_interface.cpp index a3fba4b629751d..d47c220a17e926 100644 --- a/torch/csrc/jit/codegen/cuda/register_interface.cpp +++ b/torch/csrc/jit/codegen/cuda/register_interface.cpp @@ -25,6 +25,7 @@ class RegisterInterface { ptr->fn_can_fuse_n = &isFusibleCudaFusionGroup; ptr->fn_insert_profile_inodes = &InsertProfileNodes; ptr->fn_profile_n = &shouldProfileNode; + ptr->fn_skip_n = &skipNodeKind; } }; diff --git a/torch/csrc/jit/codegen/cuda/root_domain_map.cpp b/torch/csrc/jit/codegen/cuda/root_domain_map.cpp index b48c6b00b3a331..011cbcf8098e15 100644 --- a/torch/csrc/jit/codegen/cuda/root_domain_map.cpp +++ b/torch/csrc/jit/codegen/cuda/root_domain_map.cpp @@ -285,6 +285,12 @@ void UnmappableReductionDomains::handle(ReductionOp* op) { handleReductionOutput(out_tv); } +void UnmappableReductionDomains::handle(MmaOp* mma) { + // Builds a map from reduction domains to consumer domains. + TensorView* out_tv = mma->out()->as(); + handleReductionOutput(out_tv); +} + void UnmappableReductionDomains::handle(WelfordOp* op) { // Builds a map from reduction domains to consumer domains. handleReductionOutput(op->outAvg()->as()); diff --git a/torch/csrc/jit/codegen/cuda/root_domain_map.h b/torch/csrc/jit/codegen/cuda/root_domain_map.h index 5156dc604f15b0..57f1c0d299d019 100644 --- a/torch/csrc/jit/codegen/cuda/root_domain_map.h +++ b/torch/csrc/jit/codegen/cuda/root_domain_map.h @@ -187,6 +187,7 @@ class TORCH_CUDA_CU_API UnmappableReductionDomains : private IterVisitor { using IterVisitor::handle; void handle(ReductionOp* op) override; void handle(WelfordOp* op) override; + void handle(MmaOp* op) override; void handleReductionOutput(TensorView* out_tv); @@ -393,10 +394,18 @@ class TORCH_CUDA_CU_API ComputeAtRootDomainMapBuilder mapPointwiseOrReductionOp(wop); } + void handle(MmaOp* wop) override { + mapPointwiseOrReductionOp(wop); + } + void handle(ShiftOp* op) override { mapPointwiseOrReductionOp(op); } + void handle(ViewDtypeOp* op) override { + mapPointwiseOrReductionOp(op); + } + void handle(ViewOp* op) override { mapPointwiseOrReductionOp(op); } diff --git a/torch/csrc/jit/codegen/cuda/runtime/array.cu b/torch/csrc/jit/codegen/cuda/runtime/array.cu new file mode 100644 index 00000000000000..470482d79eaf81 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/runtime/array.cu @@ -0,0 +1,264 @@ +// aligned register array for vectorized load/store +template +struct alignas(sizeof(scalar_t) * align_size) Array { + scalar_t array[size]; + + __device__ void set(scalar_t v) { +#pragma unroll + for (int i = 0; i < size; ++i) { + array[i] = v; + } + } + + __device__ scalar_t& operator[](const unsigned int i) { + return array[i]; + } +}; + +// Used for vectorized allocations that are not in registers +template +__device__ void arraySet(scalar_t* buff, scalar_t val) { +#pragma unroll + for (int i = 0; i < vec_size; ++i) { + buff[i] = val; + } +} + +template +__device__ void loadGeneric(scalar_t* to, scalar_t* from) 
{ + // It would be really nice to use memcpy here, but one example was failing + // with: + // + // memcpy(to, from, vec_size * sizeof(scalar_t)); + // + // Yet passing with: + // + // for(int i = 0; i < vec_size; i++){ + // to[i] = from[i]; + // } + + switch (sizeof(scalar_t) * vec_size) { + case 1: + *reinterpret_cast(to) = *reinterpret_cast(from); + break; + case 2: + *reinterpret_cast(to) = *reinterpret_cast(from); + break; + case 4: + *reinterpret_cast(to) = *reinterpret_cast(from); + break; + case 8: + *reinterpret_cast(to) = *reinterpret_cast(from); + break; + case 12: + *reinterpret_cast(to) = *reinterpret_cast(from); + break; + case 16: + *reinterpret_cast(to) = *reinterpret_cast(from); + break; + } +} + +// Volatile version only works with c++ fundamnetal types +template < + typename scalar_t, + int vec_size, + bool is_volatile_to, + bool is_volatile_from> +__device__ void loadGenericVolatile( + typename MaybeVolatile::type* to, + typename MaybeVolatile::type* from) { + switch (sizeof(scalar_t) * vec_size) { + // Reinterpret cast like this with volatile types only works for C++ + // fundamental types otherwise the = operator is not defined + case 1: + *reinterpret_cast< + typename MaybeVolatile::type*>(to) = + *reinterpret_cast< + typename MaybeVolatile::type*>( + from); + break; + case 2: + *reinterpret_cast::type*>( + to) = + *reinterpret_cast< + typename MaybeVolatile::type*>(from); + break; + case 4: + *reinterpret_cast< + typename MaybeVolatile::type*>(to) = + *reinterpret_cast< + typename MaybeVolatile::type*>( + from); + break; + case 8: + *reinterpret_cast::type*>( + to) = + *reinterpret_cast< + typename MaybeVolatile::type*>(from); + break; + } +} + +template +__device__ void loadLocalToGlobal( + typename MaybeVolatile::type* to, + scalar_t* from) { + switch (sizeof(scalar_t) * vec_size) { + case 1: + case 2: + case 4: + loadGenericVolatile(to, from); + break; + case 8: { + uint2 const& data = *reinterpret_cast(from); + if (is_volatile) { + asm volatile( + "st.volatile.global.v2.s32 [%0], {%1,%2};" ::"l"( + (typename MaybeVolatile::type*)to), + "r"(data.x), + "r"(data.y)); + } else { + asm volatile( + "st.global.cs.v2.s32 [%0], {%1,%2};" ::"l"( + (typename MaybeVolatile::type*)to), + "r"(data.x), + "r"(data.y)); + } + break; + } + case 12: { + uint3 const& data = *reinterpret_cast(from); + if (is_volatile) { + asm volatile( + "st.volatile.global.v3.s32 [%0], {%1,%2,%3};" ::"l"( + (typename MaybeVolatile::type*)to), + "r"(data.x), + "r"(data.y), + "r"(data.z)); + } else { + asm volatile( + "st.global.cs.v3.s32 [%0], {%1,%2,%3};" ::"l"( + (typename MaybeVolatile::type*)to), + "r"(data.x), + "r"(data.y), + "r"(data.z)); + } + break; + } + case 16: { + uint4 const& data = *reinterpret_cast(from); + if (is_volatile) { + asm volatile( + "st.volatile.global.v4.s32 [%0], {%1,%2,%3,%4};" ::"l"( + (typename MaybeVolatile::type*)to), + "r"(data.x), + "r"(data.y), + "r"(data.z), + "r"(data.w)); + } else { + asm volatile( + "st.global.cs.v4.s32 [%0], {%1,%2,%3,%4};" ::"l"( + (typename MaybeVolatile::type*)to), + "r"(data.x), + "r"(data.y), + "r"(data.z), + "r"(data.w)); + } + break; + } + } +} + +template +__device__ void loadGlobalToLocal( + scalar_t* to, + typename MaybeVolatile::type* from) { + switch (sizeof(scalar_t) * vec_size) { + case 1: + case 2: + case 4: + loadGenericVolatile(to, from); + break; + case 8: { + if (is_volatile) { + uint2& data = *reinterpret_cast(to); + asm volatile("ld.volatile.global.v2.s32 {%0,%1}, [%2];" + : "=r"(data.x), "=r"(data.y) + : 
"l"((uint2*)from)); + break; + } else { + uint2& data = *reinterpret_cast(to); + asm volatile("ld.global.cs.v2.s32 {%0,%1}, [%2];" + : "=r"(data.x), "=r"(data.y) + : "l"((uint2*)from)); + } + break; + } + case 12: { + if (is_volatile) { + uint3& data = *reinterpret_cast(to); + asm volatile("ld.volatile.global.v3.s32 {%0,%1,%2}, [%3];" + : "=r"(data.x), "=r"(data.y), "=r"(data.z) + : "l"((uint3*)from)); + } else { + uint3& data = *reinterpret_cast(to); + asm volatile("ld.global.cs.v3.s32 {%0,%1,%2}, [%3];" + : "=r"(data.x), "=r"(data.y), "=r"(data.z) + : "l"((uint3*)from)); + } + break; + } + case 16: { + if (is_volatile) { + uint4& data = *reinterpret_cast(to); + asm volatile("ld.volatile.global.v4.s32 {%0,%1,%2,%3}, [%4];" + : "=r"(data.x), "=r"(data.y), "=r"(data.z), "=r"(data.w) + : "l"((uint4*)from)); + } else { + uint4& data = *reinterpret_cast(to); + asm volatile("ld.global.cs.v4.s32 {%0,%1,%2,%3}, [%4];" + : "=r"(data.x), "=r"(data.y), "=r"(data.z), "=r"(data.w) + : "l"((uint4*)from)); + } + break; + } + } +} + +template < + typename scalar_t, + int vec_size, + bool is_volatile_to, + bool is_volatile_from> +__device__ void loadGlobalToGlobal( + typename MaybeVolatile::type* to, + typename MaybeVolatile::type* from) { + switch (sizeof(scalar_t) * vec_size) { + // Reinterpret cast like this with volatile types only works for C++ + // fundamental types otherwise the = operator is not defined + case 1: + case 2: + case 4: + case 8: + loadGenericVolatile( + to, from); + break; + case 12: { + uint3 local_intermediate; + loadGlobalToLocal( + reinterpret_cast(&local_intermediate), from); + loadLocalToGlobal( + to, reinterpret_cast(&local_intermediate)); + break; + } + case 16: { + uint4 local_intermediate; + loadGlobalToLocal( + reinterpret_cast(&local_intermediate), from); + loadLocalToGlobal( + to, reinterpret_cast(&local_intermediate)); + break; + } + } +} diff --git a/torch/csrc/jit/codegen/cuda/runtime/fp16_support.cu b/torch/csrc/jit/codegen/cuda/runtime/fp16_support.cu index 4bd402e84c6041..410f3a7aaea12b 100644 --- a/torch/csrc/jit/codegen/cuda/runtime/fp16_support.cu +++ b/torch/csrc/jit/codegen/cuda/runtime/fp16_support.cu @@ -30,14 +30,3 @@ __device__ float __half2float(const __half h) { asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(__NVFUSER_HALF_TO_CUS(h))); return val; } - -// aligned vector generates vectorized load/store on CUDA -template -struct alignas(sizeof(scalar_t) * vec_size) Array { - scalar_t val[vec_size]; - __device__ void set(scalar_t v) { - for (int i = 0; i < vec_size; ++i) { - val[i] = v; - } - } -}; diff --git a/torch/csrc/jit/codegen/cuda/runtime/fused_reduction.cu b/torch/csrc/jit/codegen/cuda/runtime/fused_reduction.cu new file mode 100644 index 00000000000000..69a36699265334 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/runtime/fused_reduction.cu @@ -0,0 +1,529 @@ +namespace fused_reduction { + +// We have 6 dimensions, 3 in the grid, 3 in the block +// They can be 1 of 3 states, +// Reduction Domain - TEMPLATE STATE 0 +// - Participating in the reduction, has values coming in, one value coming +// out across the dimension +// Iteration Domain - TEMPLATE STATE 1 +// - Not participating in the reduction, has values across the dimension after +// the reduction +// Collapsed Domain - TEMPLATE STATE 2 +// - Previously reduced, doesn't need to be reduced on that dimension, doesn't +// have values across that dimension +constexpr __device__ bool isReduce(int STATE) { + return STATE == 0; +} + +constexpr __device__ bool isIter(int STATE) { + return STATE == 
1; +} + +constexpr __device__ bool isPred(int STATE) { + return STATE == 2; +} + +constexpr __device__ bool inactive(int STATE) { + return STATE == 3; +} + +constexpr __device__ bool activeNotIter(int STATE) { + return STATE != 3 && STATE != 1; +} + +// When generating an index into the reduction, we have to stride by iteration +// domains and reduction domains. Collapsed domains we can ignore, but we need +// to make sure they never read or write (need to be predicated to correct +// participation). + +// All inclusive reduction with option to re-broadcast. This reduction class +// does not use predication of parallelization in the read or write predicates. +// Instead there are 3 states each dimension of parallelization can have, +// described above. Predication, indexing, and reduction will be done based on +// this information. +template < + int X_BLOCK, + int Y_BLOCK, + int Z_BLOCK, + int X_THREAD, + int Y_THREAD, + int Z_THREAD, + bool PERSISTENT_REDUCTION, + bool BROADCAST> +class ParallelReduce { + static constexpr bool BLOCK_REDUCE = + isReduce(X_THREAD) || isReduce(Y_THREAD) || isReduce(Z_THREAD); + + static constexpr bool GRID_REDUCE = + isReduce(X_BLOCK) || isReduce(Y_BLOCK) || isReduce(Z_BLOCK); + + // ping-pong between global buffers to avoid a second sync + bool flip = false; + + public: + __device__ ParallelReduce() {} + + template + __device__ __inline__ void reduce( + RefTuple out, + const ConstRefTuple& inp, + VolatilePtrTuple global_work_buffer, + int64_t* global_sync_buffer, // Allocated as product of all + // non-participating Grid dimension + PtrTuple shared_buf, + bool read_pred, // Prevent reading from out of bounds memory + bool write_pred, // Prevent from writing out of bounds + const LocalTuple& init_val, + Func reduction_op) { + // If no reduction needed, just return input + if (!BLOCK_REDUCE && !GRID_REDUCE) { + if (read_pred && write_pred) { + out = inp; + } + return; + } + + // Don't read/write in temporary buffers if in a predicated dimension + bool block_reduce_participate = index_utils:: + maskedIsZero( + threadIdx); + + // Initialize block result + LocalTuple block_result = init_val; + + // Grab input data if participating in the reduction, set to block_result in + // the case there is no block reduction + if (block_reduce_participate && read_pred) { + block_result = inp; + } + + // Only threads that with id == 0 in the dimensions being reduced will + // have a valid result + bool has_block_result = index_utils::maskedIsZero< + isReduce(X_THREAD), + isReduce(Y_THREAD), + isReduce(Z_THREAD)>(threadIdx); + + if (BLOCK_REDUCE) { + // -- START BLOCK REDUCTION -- // + + // Size of the block reduction segment, can be an int since it's limited + // to number of threads + int block_reduction_size = index_utils::maskedSize< + isReduce(X_THREAD), + isReduce(Y_THREAD), + isReduce(Z_THREAD)>(blockDim); + + // Index in the reduction segment, can be an int since it's limited to + // number of threads + int tid_in_block_reduction = index_utils::maskedOffset< + isReduce(X_THREAD), + isReduce(Y_THREAD), + isReduce(Z_THREAD)>(threadIdx, blockDim); + + // ID of the block reduction this thread is participating in + // + // If any of the parallel dimensions are predicated out, that means + // they've already been reduced, so we only care about the first thread in + // that dimension. 
Therefore don't expand the reduction_idx by that + // dimension + int block_reduction_idx = index_utils:: + maskedOffset( + threadIdx, blockDim); + + // Shared memory buffer is 2D + // [iter dimension, reduction dimension] + + // Offset into smem for the current thread + int block_reduce_smem_offset = + block_reduction_idx * block_reduction_size + tid_in_block_reduction; + + // Initialize shared memory + if (block_reduce_participate) { + copyTuple(shared_buf, block_reduce_smem_offset, block_result); + } + + // Sync to make sure smem is completely initialized + block_sync::sync(); + + // Round reduction size down to nearest power of 2 + int np2 = 1 << (31 - __clz(block_reduction_size)); + + // Perform an initial reduction leaving np2 elements + if (block_reduce_participate && tid_in_block_reduction < np2 && + tid_in_block_reduction + np2 < block_reduction_size) { + reduce( + shared_buf, + block_reduce_smem_offset, + shared_buf, + block_reduce_smem_offset + np2, + reduction_op); + } + + // Always need to sync while operating on shared memory + block_sync::sync(); + + // Reduce down until 2 values, leaving 2 values allows us to manually + // perform the last reduction and avoid a syncthreads + for (int factor = np2 / 2; factor > 1; factor >>= 1) { + if (tid_in_block_reduction < factor && block_reduce_participate) { + reduce( + shared_buf, + block_reduce_smem_offset, + shared_buf, + block_reduce_smem_offset + factor, + reduction_op); + } + block_sync::sync(); + } + + // Accumulate that last valid result + if (has_block_result) { + copyTuple(block_result, shared_buf, block_reduce_smem_offset); + if (block_reduction_size > 1) { + reduce( + block_result, + 0, + shared_buf, + block_reduce_smem_offset + 1, + reduction_op); + } + } + + // ===== BLOCK REDUCTION CLEANUP ======= + if (!GRID_REDUCE) { + // If no grid reduction, we don't have to continue. Either broadcast + // back across the block or return the correct reduction + if (has_block_result && write_pred) { + reduce(block_result, 0, out, 0, reduction_op); + out = block_result; + } + if (BROADCAST) { + // No grid reduce, but need to broadcast, perform block broadcast + if (has_block_result && write_pred) { + // Put result back in shared memory, put in the first entry of the + // reduction segment's buffer + copyTuple( + shared_buf, + block_reduction_idx * block_reduction_size, + block_result); + } + + // Sync threads to make sure result is in smem + block_sync::sync(); + // If the thread is participating, and is not attempting to write out + // of bounds, return the broadcasted value. + if (block_reduce_participate && write_pred) { + copyTuple( + out, shared_buf, block_reduction_idx * block_reduction_size); + } + } + + // Forward protect shared memory, don't want threads to continue to + // another reduction/broadcast and pollute shared memory before the + // reduction is completely finished. + // + // This could be avoided in some cases if we added thread syncs from + // block reductions in the syncthread insertion pass. + block_sync::sync(); + return; + } + } + + // -- START GRID REDUCTION -- // + // Grid reductions are more challenging for two reasons, (1) the reduction + // itself is 3D instead of 2D because we now have an iter domain space in + // the grid dimension. (2) a tree reduction isn't performed, instead all + // blocks will populate GMEM and one block will finish the grid reduction. 
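+    // In outline (summarizing the steps implemented below):
+    //   1. every block participating in the grid reduction writes its
+    //      block_result into global_work_buffer, indexed as
+    //      [idx_in_grid_red][block_red_idx_offset][thread_red_idx_offset];
+    //   2. grid_sync::sync counts arriving blocks through
+    //      global_sync_buffer, one semaphore per iteration (non-reduced)
+    //      grid index;
+    //   3. the last block to arrive (every block when PERSISTENT_REDUCTION
+    //      is true) reads the buffer back, finishes the reduction with a
+    //      shared-memory tree, and writes or broadcasts the final result.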
+ + // What is the grid reduction size, block reduction already performed so + // that doesn't have to be taken into consideration + const auto grid_red_size = index_utils:: + maskedSize( + gridDim); + + // Which ID in the reduction is this block. Threads can participate in + // multiple grid reductions, but the block will have the same relative index + // in those reductions + const auto idx_in_grid_red = index_utils:: + maskedOffset( + blockIdx, gridDim); + + if (PERSISTENT_REDUCTION && flip) { + auto global_buffer_size = + index_utils:: + maskedSize( + gridDim) * + grid_red_size; + global_work_buffer += global_buffer_size; + } + flip = ~flip; + + // How many grid reductions have to be performed, in the grid dimension + const auto num_block_iters = index_utils:: + maskedSize(gridDim); + + // Which grid reduction does this block participate in, in the grid + // dimension + const auto block_red_idx_offset = index_utils:: + maskedOffset( + blockIdx, gridDim); + + // How many grid reductions have to be performed, in the block dimension + const auto num_thread_iters = index_utils:: + maskedSize( + blockDim); + + // Which grid reduction does this thread participate in, in the block + // dimension + const auto thread_red_idx_offset = index_utils:: + maskedOffset( + threadIdx, blockDim); + + // 3D buffer of reductions: + // [reduction_offset(grid), iter_offset(grid), iter_offset(block)] + // Offset into the work buffer + const auto work_buf_offset = + (idx_in_grid_red * num_block_iters + block_red_idx_offset) * + num_thread_iters + + thread_red_idx_offset; + + // Don't read/write in temporary buffers if in a predicated dimension + bool grid_reduce_participate = index_utils:: + maskedIsZero( + blockIdx); + + if (grid_reduce_participate && block_reduce_participate) { + if (has_block_result) { + copyTuple(global_work_buffer, work_buf_offset, block_result); + } + } + + // -- GLOBAL BUFFER FILLED -- // + + bool last_block = index_utils:: + maskedIsLast( + blockIdx, gridDim); + + if (grid_reduce_participate) { + // Don't need to sync up blocks that are not participating in this + // reduction + grid_sync::sync< + isReduce(X_BLOCK), + isReduce(Y_BLOCK), + isReduce(Z_BLOCK), + PERSISTENT_REDUCTION>( + global_sync_buffer[block_red_idx_offset], grid_red_size, last_block); + } + + // -- START BLOCK CLEANUP -- // + // All blocks perform the last cleanup, so every block, and every thread + // will have the final result + + // Initialize block result + LocalTuple last_block_result(init_val); + + if ((PERSISTENT_REDUCTION || last_block) && grid_reduce_participate) { + // Can use the last block to reduce all the values the blocks filled in. 
+ // Can use any thread that has been predicated, or has been reduced to do + // this reduction, cannot use any block that's associated with an + // iteration domain + + // Start with non-block reduction + + // Index in the reduction segment + int tid_in_block_reduction_2 = index_utils::maskedOffset< + activeNotIter(X_THREAD), + activeNotIter(Y_THREAD), + activeNotIter(Z_THREAD)>(threadIdx, blockDim); + + int block_reduction_size_2 = index_utils::maskedSize< + activeNotIter(X_THREAD), + activeNotIter(Y_THREAD), + activeNotIter(Z_THREAD)>(blockDim); + + // 3D buffer of reductions: + // [reduction_offset(grid), iter_offset(grid), iter_offset(block)] + // Change the offset, we want to keep the last two dimensions, but the + // first dimension is what we will reduce over + const auto work_buf_offset_2 = + block_red_idx_offset * num_thread_iters + thread_red_idx_offset; + for (auto reduction_i = tid_in_block_reduction_2; + reduction_i < grid_red_size; + reduction_i += block_reduction_size_2) { + reduce( + last_block_result, + 0, + global_work_buffer, + work_buf_offset_2 + + reduction_i * num_block_iters * + num_thread_iters, // Iterating over the outer most + // dimension, so need to stride by the + // total number of grid reductions. Could + // come back and change it so this is the + // contiguous dimension + reduction_op); + } + + // -- START LAST BLOCK - BLOCK REDUCTION -- // + + // Reduced so we have one value per thread, we need to further reduce any + // dimension that is not an iter dimension + + // Which block reduction this thread is participating in + int block_reduction_idx = index_utils:: + maskedOffset( + threadIdx, blockDim); + + // Offset in smem for this thread's result + auto smem_offset = block_reduction_idx * block_reduction_size_2 + + tid_in_block_reduction_2; + + // Similar as before, reduce down to nearest power of 2 so we can do a + // tree reduction + int np2 = 1 << (31 - __clz(min(block_reduction_size_2, grid_red_size))); + + // Threads values are initialized, so all can participate here + if (tid_in_block_reduction_2 >= np2) { + copyTuple(shared_buf, smem_offset, last_block_result); + } + + block_sync::sync(); + + if (tid_in_block_reduction_2 < np2 && + tid_in_block_reduction_2 + np2 < + min(block_reduction_size_2, grid_red_size)) { + reduce( + last_block_result, 0, shared_buf, smem_offset + np2, reduction_op); + } + + if (tid_in_block_reduction_2 < np2) { + copyTuple(shared_buf, smem_offset, last_block_result); + } + + // Always sync when communicating across smem + block_sync::sync(); + + // Reduce down to 2 values, last thread will do the final reduction and + // can save a syncthreads this way + for (int factor = np2 / 2; factor > 1; factor >>= 1) { + if (tid_in_block_reduction_2 < factor) { + reduce( + shared_buf, + smem_offset, + shared_buf, + smem_offset + factor, + reduction_op); + } + block_sync::sync(); + } + + // If this thread in each block has the final result before broadcasting + // to all other threads in block + bool has_block_result_2 = index_utils::maskedIsZero< + activeNotIter(X_THREAD), + activeNotIter(Y_THREAD), + activeNotIter(Z_THREAD)>(threadIdx); + // Do the last reduction, protected by the write predicate + copyTuple(last_block_result, shared_buf, smem_offset); + if (has_block_result && grid_reduce_participate) { + reduce(last_block_result, 0, out, 0, reduction_op); + if (min(block_reduction_size_2, grid_red_size) > 1) { + reduce( + last_block_result, 0, shared_buf, smem_offset + 1, reduction_op); + } + } + if (grid_reduce_participate && 
PERSISTENT_REDUCTION) { + // If persistent reduction, always broadcast reduced values + copyTuple(shared_buf, smem_offset, last_block_result); + block_sync::sync(); + if (write_pred && block_reduce_participate) { + copyTuple( + out, shared_buf, block_reduction_idx * block_reduction_size_2); + } + // For persistent kernels we double the global buffer allocation so we + // don't need to protect those buffers every iteration preventing the + // need of an additional grid_sync. Since we flip back and forth between + // sections of the buffer, the one grid sync protects the other part of + // the buffer. + + } else { + // Forward protect the smem used in this reduction + if (grid_reduce_participate) { + if (last_block && has_block_result && block_reduce_participate && + write_pred) { + copyTuple( + out, shared_buf, block_reduction_idx * block_reduction_size_2); + } + } + block_sync::sync(); + } + } + } + + private: + template + __inline__ __device__ static void reduce( + TupleType0& val0, + nvfuser_index_t offset0, + const TupleType1& val1, + nvfuser_index_t offset1, + Func reduction_op) { + static_assert( + TupleType0::num_vals == TupleType1::num_vals, + "Invalid number of values"); + TupleReduce::reduce( + val0, offset0, val1, offset1, reduction_op); + } + + template < + typename TupleType0, + typename TupleType1, + typename Func, + int num_vals> + struct TupleReduce {}; + + template + struct TupleReduce { + __inline__ __device__ static void reduce( + TupleType0& val0, + nvfuser_index_t offset0, + const TupleType1& val1, + nvfuser_index_t offset1, + Func reduction_op) { + static_assert( + IsSameType< + typename TupleType0::ValTypes, + typename TupleType1::ValTypes>::value, + "Invalid value types"); + reduction_op(val0.val<0>(offset0), val1.val<0>(offset1)); + } + }; + + template + struct TupleReduce { + __inline__ __device__ static void reduce( + TupleType0& val0, + nvfuser_index_t offset0, + const TupleType1& val1, + nvfuser_index_t offset1, + Func reduction_op) { + static_assert( + IsSameType< + typename TupleType0::ValTypes, + typename TupleType1::ValTypes>::value, + "Invalid value types"); + reduction_op( + val0.val<0>(offset0), + val0.val<1>(offset0), + val0.val<2>(offset0), + val1.val<0>(offset1), + val1.val<1>(offset1), + val1.val<2>(offset1)); + } + }; + + // End Parallel reduce class +}; + +} // namespace fused_reduction diff --git a/torch/csrc/jit/codegen/cuda/runtime/grid_reduction.cu b/torch/csrc/jit/codegen/cuda/runtime/grid_reduction.cu index 83382f4704c6a5..df88b76772a7f9 100644 --- a/torch/csrc/jit/codegen/cuda/runtime/grid_reduction.cu +++ b/torch/csrc/jit/codegen/cuda/runtime/grid_reduction.cu @@ -272,6 +272,3 @@ __device__ void gridReduce( } } // namespace reduction - -#undef isize -#undef ioffset diff --git a/torch/csrc/jit/codegen/cuda/runtime/grid_sync.cu b/torch/csrc/jit/codegen/cuda/runtime/grid_sync.cu index a134bd81c2da3c..4bb89e17ece43d 100644 --- a/torch/csrc/jit/codegen/cuda/runtime/grid_sync.cu +++ b/torch/csrc/jit/codegen/cuda/runtime/grid_sync.cu @@ -18,7 +18,10 @@ __device__ T globalAsVolatile(volatile T& global_val) { // [X,Y,Z]_BLOCK. The granularity of this sync are those dimensions. I.E. // Marking X and Y but not Z means there should be Z semaphores of size X*Y. 
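// For example, with X_BLOCK and Y_BLOCK marked but not Z_BLOCK, and
// gridDim = {4, 2, 3}, each z-slice synchronizes independently: there are
// gridDim.z = 3 semaphores, and each one counts gridDim.x * gridDim.y = 8
// arriving blocks.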
template -__device__ void sync(int64_t& semaphore, const uint64_t& segment_size) { +__device__ void sync( + int64_t& semaphore, + const uint64_t& segment_size, + const bool last_block) { // Finish all global memory transactions before synchronizing __threadfence(); @@ -36,8 +39,6 @@ __device__ void sync(int64_t& semaphore, const uint64_t& segment_size) { // Makes the assumption that blocks are in increasing order, this is not // guaranteed by CUDA but this is the current behavior, and unlikely to // change. - bool last_block = - index_utils::maskedIsLast(blockIdx, gridDim); if (last_block) { semaphore_increment = FIRST_UINT64_BIT - (segment_size - 1); } @@ -63,4 +64,13 @@ __device__ void sync(int64_t& semaphore, const uint64_t& segment_size) { // Sync block to make sure all other threads are waiting on the sync block_sync::sync(); } + +template +__device__ void sync(int64_t& semaphore, const uint64_t& segment_size) { + sync( + semaphore, + segment_size, + index_utils::maskedIsLast(blockIdx, gridDim)); +} + } // namespace grid_sync diff --git a/torch/csrc/jit/codegen/cuda/runtime/helpers.cu b/torch/csrc/jit/codegen/cuda/runtime/helpers.cu index 02fd8bf8777296..0d27bb50e5f6dd 100644 --- a/torch/csrc/jit/codegen/cuda/runtime/helpers.cu +++ b/torch/csrc/jit/codegen/cuda/runtime/helpers.cu @@ -28,19 +28,19 @@ __device__ constexpr int64_t ceilDiv(int a, int64_t b) { } __device__ constexpr int max(int a, int b) { - return ::max(a, b); + return a > b ? a : b; } __device__ constexpr int64_t max(int64_t a, int b) { - return ::max(a, (int64_t)b); + return a > (int64_t)b ? a : (int64_t)b; } __device__ constexpr int64_t max(int a, int64_t b) { - return ::max((int64_t)a, b); + return (int64_t)a > b ? (int64_t)a : b; } __device__ constexpr int64_t max(int64_t a, int64_t b) { - return ::max(a, b); + return a > b ? a : b; } __device__ double fmax(double a, double b) { @@ -50,7 +50,7 @@ __device__ double fmax(double a, double b) { } else if (b != b) { return b; } else { - return ::fmax(a, b); + return a > b ? a : b; } } @@ -61,24 +61,24 @@ __device__ float fmax(float a, float b) { } else if (b != b) { return b; } else { - return ::fmax(a, b); + return a > b ? a : b; } } __device__ constexpr int min(int a, int b) { - return ::min(a, b); + return a > b ? b : a; } __device__ constexpr int64_t min(int64_t a, int b) { - return ::min(a, (int64_t)b); + return (int64_t)a > b ? b : (int64_t)a; } __device__ constexpr int64_t min(int a, int64_t b) { - return ::min((int64_t)a, b); + return a > (int64_t)b ? (int64_t)b : a; } __device__ constexpr int64_t min(int64_t a, int64_t b) { - return ::min(a, b); + return a > b ? b : a; } __device__ double fmin(double a, double b) { @@ -88,7 +88,7 @@ __device__ double fmin(double a, double b) { } else if (b != b) { return b; } else { - return ::fmin(a, b); + return a > b ? b : a; } } @@ -99,7 +99,7 @@ __device__ float fmin(float a, float b) { } else if (b != b) { return b; } else { - return ::fmin(a, b); + return a > b ? b : a; } } @@ -115,20 +115,20 @@ __device__ float clamp(float x, double minv, double maxv) { return x < minv ? minv : (x > maxv ? maxv : x); } -__device__ double frac(double x) { - return x - trunc(x); +__device__ int clamp(int x, int64_t minv, int64_t maxv) { + return x < minv ? minv : (x > maxv ? maxv : x); } -__device__ float frac(float x) { - return x - trunc(x); +__device__ int64_t clamp(int64_t x, int64_t minv, int64_t maxv) { + return x < minv ? minv : (x > maxv ? 
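+// Note on the fmax/fmin overloads above: unlike ::fmax/::fmin, which return
+// the non-NaN operand when exactly one input is NaN, these hand-written
+// versions propagate NaN. For example, fmax(NAN, 1.0f) returns NaN here,
+// while ::fmax(NAN, 1.0f) returns 1.0f; presumably this is to match the
+// NaN-propagating behavior of torch.maximum / torch.minimum in eager mode.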
maxv : x); } -__device__ double gelu(double x) { - return x * normcdf(x); +__device__ double frac(double x) { + return x - trunc(x); } -__device__ float gelu(float x) { - return x * normcdf(x); +__device__ float frac(float x) { + return x - trunc(x); } __device__ double reciprocal(double x) { @@ -139,6 +139,14 @@ __device__ float reciprocal(float x) { return 1 / x; } +__device__ std::complex reciprocal(std::complex x) { + return 1.0 / x; +} + +__device__ std::complex reciprocal(std::complex x) { + return 1.0f / x; +} + __device__ double relu(double x) { return x <= 0 ? 0 : x; } @@ -170,11 +178,19 @@ __device__ float remainder(float a, float b) { } __device__ double sigmoid(double x) { - return 1 / (1 + exp(-x)); + return 1.0 / (1.0 + exp(-x)); } __device__ float sigmoid(float x) { - return 1 / (1 + exp(-x)); + return 1.0f / (1.0f + exp(-x)); +} + +__device__ std::complex sigmoid(std::complex x) { + return 1.0 / (1.0 + exp(-x)); +} + +__device__ std::complex sigmoid(std::complex x) { + return 1.0f / (1.0f + exp(-x)); } __device__ double silu(double x) { @@ -193,6 +209,28 @@ __device__ float threshold(float x, double t, double v) { return x <= t ? v : x; } +__device__ std::complex where( + bool c, + std::complex a, + std::complex b) { + return c ? a : b; +} + +__device__ std::complex where( + bool c, + std::complex a, + std::complex b) { + return c ? a : b; +} + +__device__ int threshold(int x, int64_t t, int64_t v) { + return x <= t ? v : x; +} + +__device__ int64_t threshold(int64_t x, int64_t t, int64_t v) { + return x <= t ? v : x; +} + __device__ double where(bool c, double a, double b) { return c ? a : b; } @@ -205,6 +243,18 @@ __device__ int64_t where(bool c, int64_t a, int64_t b) { return c ? a : b; } +__device__ int where(bool c, int a, int b) { + return c ? a : b; +} + +__device__ int64_t where(bool c, int64_t a, int b) { + return c ? a : b; +} + +__device__ int64_t where(bool c, int a, int64_t b) { + return c ? 
a : b; +} + __device__ double randLike(Philox& rnd) { return uniform(rnd(), rnd()); } @@ -267,31 +317,59 @@ __device__ T pow(T a, T b) { } } -template int pow(int a, int b); -template int64_t pow(int64_t a, int64_t b); +template __device__ int pow(int a, int b); +template __device__ int64_t pow(int64_t a, int64_t b); template <> -float pow(float a, float b) { +__device__ float pow(float a, float b) { return ::pow(a, b); } template <> -double pow(double a, double b) { +__device__ double pow(double a, double b) { return ::pow(a, b); } -float pow(float a, int b) { +__device__ float pow(float a, int b) { return pow(a, (float)b); } -double pow(double a, int b) { +__device__ double pow(double a, int b) { return pow(a, (double)b); } -float pow(float a, int64_t b) { +__device__ float pow(float a, int64_t b) { return pow(a, (float)b); } -double pow(double a, int64_t b) { +__device__ double pow(double a, int64_t b) { return pow(a, (double)b); } + +int64_t pow(int64_t a, int b) { + return pow(a, (int64_t)b); +} + +int64_t pow(int a, int64_t b) { + return pow((int64_t)a, b); +} + +template +struct alignas(align) TypelessData { + int8_t data[size]; + + template _ = 0> + TypelessData(T x) { + *reinterpret_cast(data) = x; + } + + template _ = 0> + operator T() { + return *reinterpret_cast(data); + } +}; + +template +TypelessData erase_type(T x) { + return x; +} diff --git a/torch/csrc/jit/codegen/cuda/runtime/tensorcore.cu b/torch/csrc/jit/codegen/cuda/runtime/tensorcore.cu new file mode 100644 index 00000000000000..f95978e84475bf --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/runtime/tensorcore.cu @@ -0,0 +1,215 @@ +// Utility macro for this file +#define DEVICE_INLINE __device__ inline + +// MMA instruction wrappers: +// The wrappers are subroutines that implement matrix of size +// A(M,K) X B(K,N) = C(M,N) +// The naming of the wrappers follow similar naming conventions +// as the mma instructions. +// All the mma macros follow the namespace and naming like +// Arch::M (M-dim) N (N-dim) K(K-dim) (Layout), eg. +// Volta::M16N16K4TT, +// with the dimensions describing the size of the sub-matrices being +// multiplied by this wrapper. +// see [Operand Layout Convention] in mma_type.h for details on the layout +// notation. +namespace Volta { + +namespace util { +// MMA instruction wrappers (sm_70+): +// The instruction wrappers below are quarter-warp macros, which currently +// nvfuser +// doesn't explicitly model. 
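+// As a point of reference for the reinterpret_casts in the wrappers below:
+// each operand fragment of four __half values occupies two 32-bit registers
+// (4 x 16 bits), and the eight float accumulators occupy eight 32-bit
+// registers, which is why the inline asm binds two "r" operands for A, two
+// for B, and eight for C (as both inputs and outputs).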
So they are currently only meant to be +// used as building blocks in warp level mma macros + +// 8x8x4 mma instruction, per quarter warp (8 threads), fp32 accumulate +// per thread register: +// A[4] x B[4] -> C[8] +DEVICE_INLINE void mmaM8n8k4tt( + Array* C, + Array<__half, 4, 4>* A, + Array<__half, 4, 4>* B) { + unsigned const* _A = reinterpret_cast(A); + unsigned const* _B = reinterpret_cast(B); + unsigned* _C = reinterpret_cast(C); + + asm("mma.sync.aligned.m8n8k4.row.row.f32.f16.f16.f32 {%0,%1,%2,%3,%4,%5,%6,%7}, {%8,%9}, {%10,%11}, {%12,%13,%14,%15,%16,%17,%18,%19};\n" + : "=r"(_C[0]), + "=r"(_C[1]), + "=r"(_C[2]), + "=r"(_C[3]), + "=r"(_C[4]), + "=r"(_C[5]), + "=r"(_C[6]), + "=r"(_C[7]) + : "r"(_A[0]), + "r"(_A[1]), + "r"(_B[0]), + "r"(_B[1]), + "r"(_C[0]), + "r"(_C[1]), + "r"(_C[2]), + "r"(_C[3]), + "r"(_C[4]), + "r"(_C[5]), + "r"(_C[6]), + "r"(_C[7])); +} + +DEVICE_INLINE void mmaM8n8k4tn( + Array* C, + Array<__half, 4, 4>* A, + Array<__half, 4, 4>* B) { + unsigned const* _A = reinterpret_cast(A); + unsigned const* _B = reinterpret_cast(B); + unsigned* _C = reinterpret_cast(C); + + asm("mma.sync.aligned.m8n8k4.row.col.f32.f16.f16.f32 {%0,%1,%2,%3,%4,%5,%6,%7}, {%8,%9}, {%10,%11}, {%12,%13,%14,%15,%16,%17,%18,%19};\n" + : "=r"(_C[0]), + "=r"(_C[1]), + "=r"(_C[2]), + "=r"(_C[3]), + "=r"(_C[4]), + "=r"(_C[5]), + "=r"(_C[6]), + "=r"(_C[7]) + : "r"(_A[0]), + "r"(_A[1]), + "r"(_B[0]), + "r"(_B[1]), + "r"(_C[0]), + "r"(_C[1]), + "r"(_C[2]), + "r"(_C[3]), + "r"(_C[4]), + "r"(_C[5]), + "r"(_C[6]), + "r"(_C[7])); +} + +DEVICE_INLINE void mmaM8n8k4nt( + Array* C, + Array<__half, 4, 4>* A, + Array<__half, 4, 4>* B) { + unsigned const* _A = reinterpret_cast(A); + unsigned const* _B = reinterpret_cast(B); + unsigned* _C = reinterpret_cast(C); + + asm("mma.sync.aligned.m8n8k4.col.row.f32.f16.f16.f32 {%0,%1,%2,%3,%4,%5,%6,%7}, {%8,%9}, {%10,%11}, {%12,%13,%14,%15,%16,%17,%18,%19};\n" + : "=r"(_C[0]), + "=r"(_C[1]), + "=r"(_C[2]), + "=r"(_C[3]), + "=r"(_C[4]), + "=r"(_C[5]), + "=r"(_C[6]), + "=r"(_C[7]) + : "r"(_A[0]), + "r"(_A[1]), + "r"(_B[0]), + "r"(_B[1]), + "r"(_C[0]), + "r"(_C[1]), + "r"(_C[2]), + "r"(_C[3]), + "r"(_C[4]), + "r"(_C[5]), + "r"(_C[6]), + "r"(_C[7])); +} + +// TODO: in a follow up, +// lift this part onto iterdomain ops, once the +// swizzle ops are ready. +template +DEVICE_INLINE Array accToMma(float* _C) { + float C_data[8] = { + _C[0], + _C[1], + _C[acc_stride], + _C[acc_stride + 1], + _C[2], + _C[3], + _C[acc_stride + 2], + _C[acc_stride + 3], + }; + + return *reinterpret_cast*>(&C_data[0]); +} + +template +DEVICE_INLINE void mmaToAcc(float* _C, Array& C) { + float* C_data = reinterpret_cast(&C); + _C[0] = C_data[0]; + _C[1] = C_data[1]; + _C[acc_stride] = C_data[2]; + _C[acc_stride + 1] = C_data[3]; + _C[2] = C_data[4]; + _C[3] = C_data[5]; + _C[acc_stride + 2] = C_data[6]; + _C[acc_stride + 3] = C_data[7]; +} + +// Should be able to lift this with transpose op as well. 
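+// Illustration of the accumulator swizzle implemented by accToMma/mmaToAcc
+// above (acc_stride = 4 is used purely as an example value): accToMma gathers
+// the per-thread accumulator registers in the order
+//   _C[0], _C[1], _C[4], _C[5], _C[2], _C[3], _C[6], _C[7]
+// i.e. pairs from the row at offset 0 interleaved with pairs from the row at
+// offset acc_stride, which is the order the mmaM8n8k4 wrappers above consume
+// for their C fragment; mmaToAcc applies the inverse mapping when writing
+// back.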
+template +DEVICE_INLINE void initM16N16K4(Array& accumulator) { + float* _C = reinterpret_cast(&accumulator); + float zeros[8] = {0, 0, 0, 0, 0, 0, 0, 0}; + mmaToAcc(_C, *reinterpret_cast*>(&zeros[0])); +} + +} // namespace util + +template +DEVICE_INLINE void M16N16K4TT( + Array* C, + Array<__half, 4, 4>* A, + Array<__half, 4, 4>* B) { + float* _C = reinterpret_cast(C); + Array C_data = util::accToMma(_C); + util::mmaM8n8k4tt(&C_data, A, B); + util::mmaToAcc(_C, C_data); +} + +template +DEVICE_INLINE void M16N16K4TN( + Array* C, + Array<__half, 4, 4>* A, + Array<__half, 4, 4>* B) { + float* _C = reinterpret_cast(C); + Array C_data = util::accToMma(_C); + util::mmaM8n8k4tn(&C_data, A, B); + util::mmaToAcc(_C, C_data); +} + +template +DEVICE_INLINE void M16N16K4NT( + Array* C, + Array<__half, 4, 4>* A, + Array<__half, 4, 4>* B) { + float* _C = reinterpret_cast(C); + Array C_data = util::accToMma(_C); + util::mmaM8n8k4nt(&C_data, A, B); + util::mmaToAcc(_C, C_data); +} + +// Same initialization for now, will be different in interleaved +// macros +template +DEVICE_INLINE void initM16N16K4TT(Array* accumulator) { + util::initM16N16K4(*accumulator); +} + +template +DEVICE_INLINE void initM16N16K4TN(Array* accumulator) { + util::initM16N16K4(*accumulator); +} + +template +DEVICE_INLINE void initM16N16K4NT(Array* accumulator) { + util::initM16N16K4(*accumulator); +} + +} // namespace Volta + +#undef DEVICE_INLINE diff --git a/torch/csrc/jit/codegen/cuda/runtime/tuple.cu b/torch/csrc/jit/codegen/cuda/runtime/tuple.cu new file mode 100644 index 00000000000000..8e67dba7da72c9 --- /dev/null +++ b/torch/csrc/jit/codegen/cuda/runtime/tuple.cu @@ -0,0 +1,322 @@ +// std::tuple-like type +template +struct Tuple; + +template +struct Tuple { + T0 val0; + + __device__ Tuple(T0 _val0) : val0(_val0) {} + + // Only valid when instantiated for pointer types + __device__ void operator+=(nvfuser_index_t offset) { + static_assert(IsPointerType::value, "Invalid for non-pointer types"); + val0 += offset; + } +}; + +template +struct Tuple { + T0 val0; + T1 val1; + + __device__ Tuple(T0 _val0, T1 _val1) : val0(_val0), val1(_val1) {} + + // Only valid when instantiated for pointer types + __device__ void operator+=(nvfuser_index_t offset) { + static_assert(IsPointerType::value, "Invalid for non-pointer types"); + static_assert(IsPointerType::value, "Invalid for non-pointer types"); + val0 += offset; + val1 += offset; + } +}; + +template +struct Tuple { + T0 val0; + T1 val1; + T2 val2; + + __device__ Tuple(T0 _val0, T1 _val1, T2 _val2) + : val0(_val0), val1(_val1), val2(_val2) {} + + // Only valid when instantiated for pointer types + __device__ void operator+=(nvfuser_index_t offset) { + static_assert(IsPointerType::value, "Invalid for non-pointer types"); + static_assert(IsPointerType::value, "Invalid for non-pointer types"); + static_assert(IsPointerType::value, "Invalid for non-pointer types"); + val0 += offset; + val1 += offset; + val2 += offset; + } +}; + +// Accessor for Tuple +template +struct get; + +template <> +struct get<0> { + template + __device__ auto& operator()(Tuple& vals) { + return vals.val0; + } + template + __device__ const auto& operator()(const Tuple& vals) { + return vals.val0; + } +}; + +template <> +struct get<1> { + template + __device__ auto& operator()(Tuple& vals) { + return vals.val1; + } + template + __device__ const auto& operator()(const Tuple& vals) { + return vals.val1; + } +}; + +template <> +struct get<2> { + template + __device__ auto& operator()(Tuple& vals) { + return 
vals.val2; + } + template + __device__ const auto& operator()(const Tuple& vals) { + return vals.val2; + } +}; + +template +__inline__ __device__ static void copyTuple( + DstType& dst, + nvfuser_index_t dst_offset, + const SrcType& src, + nvfuser_index_t src_offset = 0); + +template +__inline__ __device__ static void copyTuple( + DstType& dst, + const SrcType& src, + nvfuser_index_t src_offset = 0); + +template +class LocalTuple { + public: + static constexpr int num_vals = sizeof...(Types); + using ValTypes = TypeList; + + __device__ LocalTuple(Types... args) : vals_(args...) {} + + __device__ LocalTuple(const LocalTuple& other) : vals_(other.vals_) {} + + template
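+// Rough usage sketch for the tuple utilities above (variable names are
+// illustrative only, not taken from the generated kernels):
+//   Tuple<float, int64_t> t(1.0f, 2);
+//   get<1>()(t) += 3;             // access val1 through the get<I> functor
+//   Tuple<float*, float*> ptrs(a, b);
+//   ptrs += 16;                   // operator+= is valid only when every
+//                                 // element type is a pointer
+// copyTuple (declared above, presumably defined further down in this file)
+// copies elements between two such tuple-like objects at the given offsets,
+// and LocalTuple appears to wrap a Tuple of plain values held in registers.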