try setting MAX_JOBS=4 for oom in arm wheel #1804
@pytorchbot rebase
Looks great!
Re-reviewing since we are having libopenblas.so test issues.
@pytorchbot rebase
Please rebase so that the s390x errors no longer show up: https://hud.pytorch.org/pytorch/pytorch/pull/126174
For the CUDA test failures, we need to wait for ARM + CUDA instance availability, e.g. https://aws.amazon.com/ec2/instance-types/g5g/
A rebase is needed to fix some IBM errors.
Otherwise, looks great!
Thanks for reviewing. I think the blocker is not instance availability; rather, we need the SBSA NVIDIA driver 550.54.15 to be uploaded to AWS. I started https://github.com/pytorch/test-infra/pull/5218/files, which can be merged once we upload the SBSA NVIDIA driver runfile to https://s3.amazonaws.com/ossci-linux/nvidia_driver/.
@pytorchbot rebase
The error message was "RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx". Wouldn't we need an NVIDIA GPU first, before installing a driver? The M7G instance does not have an NVIDIA GPU.
And good catch: eventually we will need the SBSA NVIDIA driver 550.54.15 to be uploaded to AWS for the test to work.
The rebase command may not work on the pytorch/builder repo. A manual rebase is needed.
Yes, we will need an ARM+CUDA instance. Thanks for catching that.
* Disable automatic building of s390x docker image
* Update docker image and build scripts for s390x
* Switch devtoolset to 13
  There is a not yet investigated build failure caused by gcc 12, but it doesn't reproduce with gcc 13.
* Adapt binaries check for s390x
* Switch to ubuntu:24.04 for s390x
* Update libgomp.so.1 path for s390x
This reverts commit 6b90c09.
* Don't deactivate/remove conda on linux
* test
* Add manylinux_2_28 image
* Manylinux 2_28 fix cmake install
* fix
This reverts commit bebc062.
Please ignore the above commits created by the rebase; I will resolve these later.
https://github.com/pytorch/pytorch/actions/runs/8840652730/job/24276381274?pr=124112 is hitting an OOM error when building the CUDA ARM wheel.
Try changing MAX_JOBS.
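
For context on why lowering MAX_JOBS can help with an OOM: the PyTorch build reads the `MAX_JOBS` environment variable to cap the number of parallel compile jobs, and fewer concurrent compiler processes means lower peak memory. The snippet below is only a rough sketch of that reasoning, not what this PR changes. The helpers `pick_max_jobs` and `build_env` and the 4 GiB-per-job figure are hypothetical and chosen for illustration; the real per-job memory footprint of a CUDA ARM build would have to be measured.

```python
import os


def pick_max_jobs(cpu_count, mem_gib, gib_per_job=4.0, cap=None):
    """Hypothetical helper: bound parallel jobs by both CPU count and memory.

    gib_per_job is an assumed peak memory per compile job, not a measured value.
    """
    # Number of jobs that fit in memory, never less than one.
    by_memory = max(1, int(mem_gib // gib_per_job))
    jobs = max(1, min(cpu_count, by_memory))
    if cap is not None:
        # Optional hard ceiling, e.g. the MAX_JOBS=4 value from the PR title.
        jobs = min(jobs, cap)
    return jobs


def build_env(max_jobs, base_env=None):
    """Return a copy of the environment with MAX_JOBS set for the build."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_JOBS"] = str(max_jobs)
    return env


if __name__ == "__main__":
    # Example: a 64-core builder with 16 GiB of RAM (illustrative numbers).
    jobs = pick_max_jobs(cpu_count=64, mem_gib=16)
    print(build_env(jobs, base_env={})["MAX_JOBS"])
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when invoking the wheel build script. A fixed value such as MAX_JOBS=4 is the simpler choice when the builder's size is known in advance.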