manywheel: add _GLIBCXX_USE_CXX11_ABI=1 support for linux cpu wheel #990


Merged
merged 1 commit into pytorch:main on Mar 31, 2022

Conversation

zhuhong61
Contributor

This commit adds support for a new Linux CPU pip wheel built with _GLIBCXX_USE_CXX11_ABI=1.

Currently, Linux wheels are built on a CentOS 7 system with devtoolset7, where the compiler ignores CXX11_ABI; devtoolset8 and devtoolset9 have the same issue. We therefore add a Dockerfile (Dockerfile_cxx11-abi) with Ubuntu 18.04 as the base image to support CXX11_ABI=1, following the Dockerfile for libtorch.

To build the new docker image with CXX11_ABI support, run:
GPU_ARCH_TYPE=cpu-cxx11-abi manywheel/build_docker.sh
or
manywheel/build_all_docker.sh
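
A quick way to sanity-check the resulting image is to ask its compiler which ABI libstdc++ defaults to. This is only a sketch: it assumes a stock Ubuntu 18.04 gcc toolchain is installed inside the image, which this PR does not itself guarantee.

# Print the compiler version, then the ABI macro that the libstdc++ headers
# define; on a CXX11-ABI image this should show "_GLIBCXX_USE_CXX11_ABI 1".
docker run --rm pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi bash -c \
  "gcc --version && echo '#include <string>' | gcc -x c++ -E -dM - | grep _GLIBCXX_USE_CXX11_ABI"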

To build a Linux CPU pip wheel with CXX11_ABI inside this image, run:
# the settings below are specific to this image
export DESIRED_CUDA=cpu-cxx11-abi   # changed from cpu, for the wheel name
export GPU_ARCH_TYPE=cpu-cxx11-abi  # changed from cpu, for build.sh
export DOCKER_IMAGE=pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi
export DESIRED_DEVTOOLSET=cxx11-abi

# the settings below are as usual
export BINARY_ENV_FILE=/tmp/env
export BUILDER_ROOT=/builder
export DESIRED_PYTHON=3.7   # or 3.8, 3.9, etc.
export IS_GHA=1
export PACKAGE_TYPE=manywheel
export PYTORCH_FINAL_PACKAGE_DIR=/artifacts
export PYTORCH_ROOT=/pytorch
export GITHUB_WORKSPACE=/your_path_to_workspace

# '-e DESIRED_DEVTOOLSET' below is newly added for this container;
# the others are as usual
set -x
mkdir -p artifacts/
container_name=$(docker run \
  -e BINARY_ENV_FILE \
  -e BUILDER_ROOT \
  -e DESIRED_CUDA \
  -e DESIRED_PYTHON \
  -e GPU_ARCH_TYPE \
  -e IS_GHA \
  -e PACKAGE_TYPE \
  -e PYTORCH_FINAL_PACKAGE_DIR \
  -e PYTORCH_ROOT \
  -e DOCKER_IMAGE \
  -e DESIRED_DEVTOOLSET \
  --tty \
  --detach \
  -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \
  -v "${GITHUB_WORKSPACE}/builder:/builder" \
  -v "${RUNNER_TEMP}/artifacts:/artifacts" \
  -w / \
  "${DOCKER_IMAGE}"
)

# build the pip wheel as usual; the built wheel file name looks like:
# torch-1.12.0.dev20220312+cpu.cxx11.abi-cp37-cp37m-linux_x86_64.whl
docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh"
docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash /builder/manywheel/build.sh"

# to verify the built wheel file; we should see 'True'
$ pip install torch-1.12.0.dev20220312+cpu.cxx11.abi-cp37-cp37m-linux_x86_64.whl
$ python -c 'import torch; print(torch._C._GLIBCXX_USE_CXX11_ABI)'
True
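
As an additional spot check, the mangled symbol names of the installed library can be inspected directly; this is a sketch, and the library path is derived from the torch install location rather than stated anywhere in this PR. With _GLIBCXX_USE_CXX11_ABI=1, std::string-related symbols demangle to std::__cxx11::basic_string, so the count below should be well above zero (a pre-CXX11-ABI wheel would yield 0):

# count dynamic symbols that carry the new-ABI std::__cxx11 namespace
nm -D "$(python -c 'import os, torch; print(os.path.join(os.path.dirname(torch.__file__), "lib", "libtorch_cpu.so"))')" \
  | c++filt | grep -c '__cxx11'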

Co-authored-by: Guo Yejun <[email protected]>
Co-authored-by: Zhu Hong <[email protected]>

@facebook-github-bot
Contributor

Hi @zhuhong61!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@guoyejun
Contributor

Any comment? Thanks.

@guoyejun
Contributor

@malfet any comment? Thanks.

@guoyejun
Contributor

guoyejun commented Mar 31, 2022

There is an error in "Build manywheel docker images / build-docker-cuda (11.3) (pull_request)", failing after 1m.

I checked the log, and it looks like this issue is not caused by this PR.

+ docker build -t docker.io/pytorch/manylinux-builder:cuda11.3 --build-arg BASE_CUDA_VERSION=11.3 --build-arg GPU_IMAGE=nvidia/cuda:10.2-devel-centos7 --target cuda_final -f /home/runner/work/builder/builder/manywheel/Dockerfile /home/runner/work/builder/builder
...
#11 [common  2/19] RUN yum install -y         aclocal         autoconf         automake         bison         bzip2         curl         diffutils         file         git         make         patch         perl         unzip         util-linux         wget         which         xz         yasm
#11 sha256:6c19de3706b80355181e9c4a16009985be81e232ce9826bbe69ac21eaba9150b
#11 0.447 Loaded plugins: fastestmirror, ovl
#11 0.604 Determining fastest mirrors
#11 1.180  * base: mirror.vacares.com
#11 1.180  * extras: centos.mirror.lstn.net
#11 1.181  * updates: atl.mirrors.clouvider.net
#11 1.635 https://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
#11 1.636 Trying other mirror.
#11 1.636 To address this issue please refer to the below wiki article 
#11 1.636 
#11 1.636 https://wiki.centos.org/yum-errors
#11 1.636 
#11 1.636 If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
#11 1.636 
#11 1.638 
#11 1.638 
#11 1.638  One of the configured repositories failed (nvidia-ml),
#11 1.638  and yum doesn't have enough cached data to continue. At this point the only
#11 1.638  safe thing yum can do is fail. There are a few ways to work "fix" this:
#11 1.638 
#11 1.638      1. Contact the upstream for the repository and get them to fix the problem.
#11 1.638 
#11 1.638      2. Reconfigure the baseurl/etc. for the repository, to point to a working
#11 1.638         upstream. This is most often useful if you are using a newer
#11 1.638         distribution release than is supported by the repository (and the
#11 1.638         packages for the previous distribution release still work).
#11 1.638 
#11 1.638      3. Run the command with the repository temporarily disabled
#11 1.638             yum --disablerepo=nvidia-ml ...
#11 1.638 
#11 1.638      4. Disable the repository permanently, so yum won't use it by default. Yum
#11 1.638         will then just ignore the repository until you permanently enable it
#11 1.638         again or use --enablerepo for temporary usage:
#11 1.638 
#11 1.638             yum-config-manager --disable nvidia-ml
#11 1.638         or
#11 1.638             subscription-manager repos --disable=nvidia-ml
#11 1.638 
#11 1.638      5. Configure the failing repository to be skipped, if it is unavailable.
#11 1.638         Note that yum will try to contact the repo. when it runs most commands,
#11 1.638         so will have to try and fail each time (and thus. yum will be be much
#11 1.638         slower). If it is a very temporary problem though, this is often a nice
#11 1.638         compromise:
#11 1.638 
#11 1.638             yum-config-manager --save --setopt=nvidia-ml.skip_if_unavailable=true
#11 1.638 
#11 1.638 failure: repodata/repomd.xml from nvidia-ml: [Errno 256] No more mirrors to try.
#11 1.638 https://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
#11 ERROR: executor failed running [/bin/sh -c yum install -y         aclocal         autoconf         automake         bison         bzip2         curl         diffutils         file         git         make         patch         perl         unzip         util-linux         wget         which         xz         yasm]: exit code: 1

#18 [base 2/8] RUN yum install -y wget curl perl util-linux xz bzip2 git patch which perl zlib-devel
#18 sha256:85b55d7268ad8c9b497852109d67f42ae51b7a367e627ee35635e6da9d5ff657
#18 CANCELED
------
 > [common  2/19] RUN yum install -y         aclocal         autoconf         automake         bison         bzip2         curl         diffutils         file         git         make         patch         perl         unzip         util-linux         wget         which         xz         yasm:
------
executor failed running [/bin/sh -c yum install -y         aclocal         autoconf         automake         bison         bzip2         curl         diffutils         file         git         make         patch         perl         unzip         util-linux         wget         which         xz         yasm]: exit code: 1
Error: Process completed with exit code 1.

@malfet
Contributor

malfet commented Mar 31, 2022

@guoyejun yeah, NVIDIA is having an outage today, see pytorch/pytorch#74968

@guoyejun
Contributor

Thanks @malfet.

And it looks like we can add a new check (build-docker-cpu-cxx11-abi) to the CI system once this PR is accepted. :)

@malfet malfet merged commit 53b8397 into pytorch:main Mar 31, 2022
@guoyejun
Contributor

guoyejun commented Apr 8, 2022

@malfet would it be possible to add new checks to the CI system for github/pytorch/pytorch to verify the built Linux pip wheel with cxx11-abi? Thanks.

And we also need to build the cxx11-abi wheel file nightly and at release time. Taking nightly as an example: provide the pip cxx11 wheel file at https://download.pytorch.org/whl/nightly/cpu, and also mention the install commands at https://pytorch.org/get-started/locally/ (Preview (Nightly) → Linux → Pip → Python → CPU).
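
For context, installing a CPU nightly from that index currently looks like the command below; whether a cxx11-abi variant would be selectable from this same index or need its own path is exactly what this comment proposes, so treat the index URL here as the existing cpu nightly channel, not a confirmed home for the cxx11-abi wheel:

# install the current CPU nightly wheel (pre-release channel)
pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu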

@guoyejun
Contributor

It looks like the glibc version is a bit high with Ubuntu 18.04 as the base image in this PR. We'll try to see whether CentOS 8 (with a lower glibc version) works as the base image; we'd like to know your opinion, @malfet, and we may create the change if you do not object. Thanks.

@malfet
Contributor

malfet commented Apr 20, 2022

@malfet would it be possible to add new checks to the CI system for github/pytorch/pytorch to verify the built Linux pip wheel with cxx11-abi? Thanks.

@guoyejun feel free to propose the PR that does that (the binary build matrix is defined in https://github.com/pytorch/pytorch/blob/master/.github/scripts/generate_binary_build_matrix.py)

@guoyejun
Contributor

@malfet would it be possible to add new checks to the CI system for github/pytorch/pytorch to verify the built Linux pip wheel with cxx11-abi? Thanks.

@guoyejun feel free to propose the PR that does that (the binary build matrix is defined in https://github.com/pytorch/pytorch/blob/master/.github/scripts/generate_binary_build_matrix.py)

Got it, thanks. We'll look at it after the base image is done.

@zhuhong61
Contributor Author

zhuhong61 commented Jun 18, 2022

Hi @malfet, we added new checks for build-docker-cpu-cxx11-abi in the CI system, and the workflow checks in our PR https://github.com/pytorch/pytorch/pull/79409 have regenerated .github/workflows/generated-linux-binary-manywheel-nightly.yml. How can we make sure the jobs in generated-linux-binary-manywheel-nightly.yml have actually been triggered, and that the cpu-cxx11-abi docker image has been built? Is there any other work needed? Thanks!
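
One way to spot-check the second question from outside CI (a sketch; it assumes the CI job pushes the image to Docker Hub under the tag introduced in this PR):

# if the pull succeeds, the cpu-cxx11-abi builder image has been published
docker pull pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi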

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Feb 14, 2023
…cpu-cxx11-abi (#79409)

We added the Linux pip wheel with cpu-cxx11-abi in pytorch/builder, see pytorch/builder#990 and pytorch/builder#1023.

The purpose of this PR is to add new checks to the pytorch CI system to verify the Linux pip wheel with cpu-cxx11-abi.

Co-authored-by: Zhu Hong <[email protected]>
Co-authored-by: Guo Yejun <[email protected]>

Pull Request resolved: #79409
Approved by: https://github.com/malfet