Skip to content

[BE] Add sccache to manywheel binary build #1169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 18 commits into from

Conversation

huydhn
Copy link
Contributor

@huydhn huydhn commented Oct 21, 2022

Manywheel binary build currently doesn't have any compiler cache, so it takes more than 2h+ to just build PyTorch https://github.com/pytorch/pytorch/actions/runs/3285968556/jobs/5413580528.

I'll make a similar change to libtorch binary build later in a separate PR (lower priority cause building libtorch is not that slow)

Testing

  • Build the new image locally with GPU_ARCH_TYPE=cuda GPU_ARCH_VERSION=11.6 manywheel/build_docker.sh
  • Build PyTorch binary locally with the locally built image, i.e. pytorch/manylinux-builder:cuda11.6-e7608179efd287af102e40941fc24abff8d8a5bd. Here is the exact command I run inside the container:
export PYTORCH_ROOT=/tmp/pytorch
export BUILDER_ROOT=/tmp/builder
export PACKAGE_TYPE=manywheel
export DESIRED_CUDA=cu116
export GPU_ARCH_VERSION=11.6
export GPU_ARCH_TYPE=cuda
export DESIRED_PYTHON=3.7
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
export SCCACHE_S3_KEY_PREFIX=debug
/tmp/builder/manywheel/build.sh

@huydhn huydhn requested review from atalman and a team October 21, 2022 21:41
@huydhn huydhn self-assigned this Oct 21, 2022
@huydhn huydhn marked this pull request as ready for review October 21, 2022 23:16
@huydhn
Copy link
Contributor Author

huydhn commented Oct 21, 2022

@huydhn
Copy link
Contributor Author

huydhn commented Oct 22, 2022

The ROCm failure https://github.com/pytorch/builder/actions/runs/3301097112/jobs/5446221931 is due to the recent change in #1160. cc @jataylo Could you help do a fix for that? It looks like 5.1.1 is missing from the list (-eq v.s. -ge)

@jataylo
Copy link
Contributor

jataylo commented Oct 24, 2022

@huydhn Will take a look at this, thank you.

CC @jithunnair-amd

@jithunnair-amd
Copy link
Collaborator

@huydhn Attempting a fix in PR #1170

@huydhn
Copy link
Contributor Author

huydhn commented Oct 24, 2022

Push to https://github.com/pytorch/builder/tree/add-sccache-support so that the docker image can be published for testing

@huydhn
Copy link
Contributor Author

huydhn commented Oct 24, 2022

The new docker images are published for testing cuda11.6

@malfet
Copy link
Contributor

malfet commented Oct 25, 2022

Not using sccache for binary builds is a design decision:

  • sccache in the past is known to return stale/invalid objects (as it failed to account that file needs to be recompiled)
  • This also a security risk, as one can submit a PR that populates cache with some malicious data that will be incorporated into a nightly/release builds.

@atalman
Copy link
Contributor

atalman commented Oct 25, 2022

I would also add, that for binary builds we want to have predictable dependency versions from what I saw in failure logs it failed to return some dependency - dependency was missing or incorrect version. This maybe a same use case as @malfet point 1.

@huydhn huydhn marked this pull request as draft October 25, 2022 21:18
@huydhn huydhn closed this Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants