nvJPEG is missing from Windows GPU runners #3979

Closed
pmeier opened this issue Apr 3, 2023 · 3 comments

@pmeier
Contributor

pmeier commented Apr 3, 2023

According to the NVIDIA website, nvJPEG should come with the CUDA installation:

  • For the most current version of nvJPEG, download the CUDA Toolkit.
  • If you are using CUDA Toolkit 10.0 or 9.0, please download the nvJPEG installer.

However, this does not seem to be the case: https://github.com/pytorch/vision/actions/runs/4594756756/jobs/8114184733#step:9:234

The images torchvision uses for CircleCI have an explicit check for this: https://github.com/pytorch/vision/blob/781f512b01bc2324d7fdd11f0901f60571fc476f/.circleci/unittest/windows/scripts/set_cuda_envs.sh#L32-L35 Doing the same on the windows_job fails (pytorch/vision#7475).
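
A check of this kind might look like the sketch below. It is not the content of the linked script: the function name `check_nvjpeg` is hypothetical, and the header location `include/nvjpeg.h` under the CUDA directory is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the nvJPEG header is present
# under the given CUDA installation directory.
check_nvjpeg() {
    local cuda_dir="$1"
    if [ -z "$cuda_dir" ]; then
        echo "no CUDA directory given" >&2
        return 2
    fi
    # Assumption: the header lives at <cuda_dir>/include/nvjpeg.h
    if [ ! -f "$cuda_dir/include/nvjpeg.h" ]; then
        echo "nvJPEG header not found in $cuda_dir/include" >&2
        return 1
    fi
    echo "nvJPEG found in $cuda_dir"
}
```

A job could then fail early with something like `check_nvjpeg "$CUDA_PATH" || exit 1`, assuming `CUDA_PATH` points at the toolkit directory on the runner.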

This currently blocks migrating Windows GPU workflows for torchvision.

@malfet
Contributor

malfet commented Apr 4, 2023

The CUDA installer lets you choose which subcomponents to include, and I have had to fix a missing nvjpeg in torchvision many times, most recently in pytorch/vision#7186

I.e., it looks like all one has to do is add nvjpeg to

$installerArgs = "nvcc_$cudaVersion cuobjdump_$cudaVersion nvprune_$cudaVersion nvprof_$cudaVersion cupti_$cudaVersion cublas_$cudaVersion cublas_dev_$cudaVersion cudart_$cudaVersion cufft_$cudaVersion cufft_dev_$cudaVersion curand_$cudaVersion curand_dev_$cudaVersion cusolver_$cudaVersion cusolver_dev_$cudaVersion cusparse_$cudaVersion cusparse_dev_$cudaVersion npp_$cudaVersion npp_dev_$cudaVersion nvrtc_$cudaVersion nvrtc_dev_$cudaVersion nvml_dev_$cudaVersion"
and then deploy a new AMI. @atalman, can you please take care of that?
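
As a rough illustration of that change, here is a shell sketch that appends the nvJPEG entries to a space-separated component list when they are missing. The list in this thread is a PowerShell variable, so this is only the same idea in another language. The function name `add_nvjpeg` is hypothetical, and the component names `nvjpeg_<version>` and `nvjpeg_dev_<version>` are assumed to follow the pattern of the other entries; they should be checked against the installer documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a component list with nvJPEG entries appended
# unless they are already present.
# Assumption: the components are named nvjpeg_<version> and nvjpeg_dev_<version>.
add_nvjpeg() {
    local version="$1"
    local components="$2"
    local name
    for name in "nvjpeg_${version}" "nvjpeg_dev_${version}"; do
        case " $components " in
            *" $name "*) ;;  # already listed, nothing to do
            *) components="${components:+$components }$name" ;;  # append the missing entry
        esac
    done
    echo "$components"
}
```

Calling it twice on the same list leaves the list unchanged the second time, so it is safe to run in a provisioning script that may be re-applied.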

malfet added a commit that referenced this issue Apr 4, 2023
Partially addresses #3979
malfet added a commit that referenced this issue Apr 4, 2023
Similar to #3988
Partially addresses #3979
Test plan: `¯\_(ツ)_/¯`
@pmeier
Contributor Author

pmeier commented Apr 5, 2023

Why do we install only parts of the toolkit and not all of it? Is storage on the runners a concern? If yes, I saw that there are older versions, i.e. < CUDA 11.7, present on the runner that could potentially be removed.

@atalman
Contributor

atalman commented May 18, 2023

This is completed; the AMI was deployed.
