enable Windows CPU CI on GHA #7475

Merged
merged 44 commits into from
Apr 4, 2023
Conversation

pmeier (Collaborator) commented Mar 30, 2023

Same deal as for the other migrations here: let's run the CircleCI and GHA tests in parallel for a few weeks and if nothing comes up, we can remove the ones on CircleCI. This only ports the CPU jobs for now due to pytorch/test-infra#3979.

cc @seemethere

pytorch-bot bot commented Mar 30, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7475

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit c5c129f:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

if [[ "${OS_TYPE}" == "macos" && $(uname -m) == x86_64 ]]; then
echo '::group::Uninstall system JPEG libraries on macOS'

This just moves the grouping inside the if so this doesn't show up empty on non-macos runners.
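
A minimal sketch of the pattern described above: the `::group::` and `::endgroup::` markers are emitted only inside the branch that does the work, so runners that skip the branch log nothing at all. The function name `maybe_uninstall_jpeg` and its two arguments are hypothetical stand-ins for `${OS_TYPE}` and `$(uname -m)`, and the actual uninstall command is replaced by a placeholder `echo`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: the arguments stand in for ${OS_TYPE} and $(uname -m).
maybe_uninstall_jpeg() {
  local os_type="$1" arch="$2"
  if [[ "${os_type}" == "macos" && "${arch}" == "x86_64" ]]; then
    # The group markers live inside the branch, so they only appear when needed.
    echo '::group::Uninstall system JPEG libraries on macOS'
    echo 'placeholder for the actual uninstall step'
    echo '::endgroup::'
  fi
}

maybe_uninstall_jpeg "linux" "x86_64"   # prints nothing
maybe_uninstall_jpeg "macos" "x86_64"   # prints the group and its content
```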

Comment on lines 37 to 38
# FIXME: Port this to pytorch/test-infra/.github/workflows/windows_job.yml
export PATH="/c/Jenkins/Miniconda3/Scripts:${PATH}"

I need to dig into why putting this into the Windows job template, as I did in pytorch/test-infra#3960, didn't do the job.

In any case, since this only fixes the PATH for the conda binary, it is not blocking for this PR and we can fix it later.
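
As a rough illustration of the workaround above, the sketch below prepends a directory to `PATH` only when it is not already listed, so sourcing the snippet repeatedly during an SSH debug session does not keep growing `PATH`. The helper `prepend_to_path` is hypothetical and not part of the PR; only the Miniconda path comes from the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: put a directory at the front of PATH unless it is already listed.
prepend_to_path() {
  local dir="$1"
  case ":${PATH}:" in
    *":${dir}:"*) ;;                     # already present, leave PATH unchanged
    *) export PATH="${dir}:${PATH}" ;;
  esac
}

prepend_to_path "/c/Jenkins/Miniconda3/Scripts"
prepend_to_path "/c/Jenkins/Miniconda3/Scripts"   # second call is a no-op
echo "${PATH%%:*}"                                # show the first PATH entry
```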

pmeier changed the title from "[DEBUG] enable Windows CI on GHA" to "enable Windows CI on GHA" on Mar 30, 2023
@@ -66,10 +69,24 @@ ltt install --progress-bar=off \
torch

if [[ $GPU_ARCH_TYPE == 'cuda' ]]; then
- python3 -c "import torch; exit(not torch.cuda.is_available())"
+ python -c "import torch; exit(not torch.cuda.is_available())"

python3 is not available in conda envs on Windows.
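
The PR simply switches the call to `python`. As an alternative sketch, assuming a script has to run on images where only one of `python` or `python3` is on `PATH`, the interpreter name could be looked up once and reused. `find_python` is a hypothetical helper name and is not used in the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first interpreter name that is found on PATH.
find_python() {
  local candidate
  for candidate in python python3; do
    if command -v "${candidate}" >/dev/null 2>&1; then
      echo "${candidate}"
      return 0
    fi
  done
  echo "no Python interpreter found on PATH" >&2
  return 1
}

# Only run the interpreter if one was actually found.
if PYTHON="$(find_python)"; then
  "${PYTHON}" -c "import sys; print(sys.version_info[:2])"
fi
```

Trying `python` first matches the behavior the comment describes for conda environments on Windows, while still falling back to `python3` elsewhere.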

@@ -4,15 +4,11 @@ set -euo pipefail

./.github/scripts/setup-env.sh

# Prepare conda
CONDA_PATH=$(which conda)

This shortens the conda setup a little. There is no functional difference, but the shorter form is easier to copy-paste during SSH debug sessions in CI.


echo '::group::Install testing utilities'
pip install --progress-bar=off pytest pytest-mock pytest-cov
echo '::endgroup::'

echo '::group::Run unittests'

After a command is finished, GH automatically collapses groups. Since we want to see the tests, drop the grouping here.
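
One way to apply this, sketched under the assumption that setup steps should stay collapsed while test output stays visible: wrap only the setup commands in a group and run the tests outside of any group. `run_in_group` is a hypothetical helper that is not part of the PR, and the `echo` commands stand in for the real `pip install` and `pytest` invocations.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command inside a collapsible log group and
# close the group even if the command fails, then pass on its exit status.
run_in_group() {
  local title="$1"
  shift
  local status=0
  echo "::group::${title}"
  "$@" || status=$?
  echo '::endgroup::'
  return "${status}"
}

run_in_group 'Install testing utilities' echo 'placeholder for pip install'

# The tests run outside of any group so their output is not collapsed.
echo 'placeholder for pytest'
```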

pmeier changed the title from "enable Windows CI on GHA" to "enable Windows CPU CI on GHA" on Apr 3, 2023
@pmeier pmeier requested a review from atalman April 3, 2023 12:00
@pmeier pmeier marked this pull request as ready for review April 3, 2023 12:00
@pmeier pmeier mentioned this pull request Apr 3, 2023
21 tasks
atalman (Contributor) left a comment

LGTM

pmeier commented Apr 4, 2023

@atalman Just downloading nvjpeg from conda didn't work since we only look for it in $CUDA_HOME:

vision/setup.py

Lines 313 to 317 in 781f512

nvjpeg_found = (
    extension is CUDAExtension
    and CUDA_HOME is not None
    and os.path.exists(os.path.join(CUDA_HOME, "include", "nvjpeg.h"))
)
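
Because the build only looks in `$CUDA_HOME`, a CI script could verify up front whether the header is where `setup.py` will search for it. The following is a minimal shell sketch of that same path check and is not part of the PR. `nvjpeg_header_present` is a hypothetical helper, and it mirrors only the `CUDA_HOME` and file-existence conditions, not the `CUDAExtension` one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if CUDA_HOME is set and include/nvjpeg.h exists in it.
nvjpeg_header_present() {
  [[ -n "${CUDA_HOME:-}" && -e "${CUDA_HOME}/include/nvjpeg.h" ]]
}

if nvjpeg_header_present; then
  echo "nvjpeg.h found in ${CUDA_HOME}/include"
else
  echo "nvjpeg.h not found under CUDA_HOME"
fi
```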

Until we rebuild the base images to include this by default, we could do the same as we do in our packaging scripts:

:cuda117
set CUDA_INSTALL_EXE=cuda_11.7.0_516.01_windows.exe
if not exist "%SRC_DIR%\temp_build\%CUDA_INSTALL_EXE%" (
    curl -k -L "https://ossci-windows.s3.amazonaws.com/%CUDA_INSTALL_EXE%" --output "%SRC_DIR%\temp_build\%CUDA_INSTALL_EXE%"
    if errorlevel 1 exit /b 1
    set "CUDA_SETUP_FILE=%SRC_DIR%\temp_build\%CUDA_INSTALL_EXE%"
    set "ARGS=thrust_11.7 nvcc_11.7 cuobjdump_11.7 nvprune_11.7 nvprof_11.7 cupti_11.7 cublas_11.7 cublas_dev_11.7 cudart_11.7 cufft_11.7 cufft_dev_11.7 curand_11.7 curand_dev_11.7 cusolver_11.7 cusolver_dev_11.7 cusparse_11.7 cusparse_dev_11.7 npp_11.7 npp_dev_11.7 nvjpeg_11.7 nvjpeg_dev_11.7 nvrtc_11.7 nvrtc_dev_11.7 nvml_dev_11.7"
)
set CUDNN_INSTALL_ZIP=cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive.zip
set CUDNN_FOLDER=cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive
set CUDNN_LIB_FOLDER="lib"
if not exist "%SRC_DIR%\temp_build\%CUDNN_INSTALL_ZIP%" (
    curl -k -L "http://s3.amazonaws.com/ossci-windows/%CUDNN_INSTALL_ZIP%" --output "%SRC_DIR%\temp_build\%CUDNN_INSTALL_ZIP%"
    if errorlevel 1 exit /b 1
    set "CUDNN_SETUP_FILE=%SRC_DIR%\temp_build\%CUDNN_INSTALL_ZIP%"
    rem Make sure windows path contains zlib dll
    curl -k -L "http://s3.amazonaws.com/ossci-windows/zlib123dllx64.zip" --output "%SRC_DIR%\temp_build\zlib123dllx64.zip"
    7z x "%SRC_DIR%\temp_build\zlib123dllx64.zip" -o"%SRC_DIR%\temp_build\zlib"
    xcopy /Y "%SRC_DIR%\temp_build\zlib\dll_x64\*.dll" "C:\Windows\System32"
)
goto cuda_common

However, due to pytorch/test-infra#3986 it is currently almost impossible for me to try this. Thus, to unblock this PR, I'm going to remove the GPU workflow for now.

Once nvjpeg is available by default on the Windows runners, or the runners spin up predictably enough for me to SSH in and debug, we can re-enable the GPU workflow.

@pmeier pmeier merged commit 5c5a94d into pytorch:main Apr 4, 2023
@pmeier pmeier deleted the win-ci branch April 4, 2023 14:09
github-actions bot commented Apr 4, 2023

Hey @pmeier!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Apr 24, 2023
Reviewed By: vmoens

Differential Revision: D45183662

fbshipit-source-id: 4a8562c760d3551680b5fe3ac36da3ed52e33aee