Add nvFuser support for torch.native_batch_norm #85562
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85562
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 Failures
As of commit d523e29, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
nvfuser::Tensor bias,
nvfuser::Tensor running_mean,
nvfuser::Tensor running_var,
bool kTraining,
Nit: Why is the training parameter prefixed with a k? I thought the k prefix usually denoted an enum value.
Probably I copied it from
const bool kTraining,
I'll rename it.
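For reference, the convention being alluded to (the Google-style naming that PyTorch's C++ code loosely follows) reserves the k prefix for constants and enumerators. A small self-contained illustration of that distinction, not code from this PR:

```cpp
// Illustrative only: where the `k` prefix conventionally belongs.
constexpr double kDefaultMomentum = 0.1;  // compile-time constant: k prefix fits
enum class Mode { kTrain, kEval };        // enumerators: k prefix fits

// An ordinary runtime parameter reads better without the prefix:
double effective_momentum(bool training) {
  return training ? kDefaultMomentum : 0.0;
}

int main() { return effective_momentum(true) > 0.0 ? 0 : 1; }
```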
{},
std::move(_outputs),
"null_tensor",
RecordType::Tensor) {}
I would add a new Op Type since you have a unique Record, like RecordType::NullTensor.
Generally looks fine, I think a new Record type should be added for NullTensor.
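A simplified, self-contained sketch of that suggestion; RecordType and the record classes below are stand-ins mirroring the snippet above, not the actual nvFuser python-frontend definitions:

```cpp
#include <iostream>

// Stand-in for the frontend's record tag enum, with a dedicated value for
// the null-tensor record instead of reusing RecordType::Tensor.
enum class RecordType { Tensor, NullTensor };

struct RecordBase {
  explicit RecordBase(RecordType type) : type_(type) {}
  RecordType type() const { return type_; }
 private:
  RecordType type_;
};

struct NullTensorRecord : RecordBase {
  // Tagging with its own type lets caching and printing distinguish it
  // from a regular tensor record.
  NullTensorRecord() : RecordBase(RecordType::NullTensor) {}
};

int main() {
  NullTensorRecord rec;
  std::cout << (rec.type() == RecordType::NullTensor) << "\n";  // prints 1
}
```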
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here.
Merge failed
Reason: The following mandatory check(s) failed (Rule …). Dig deeper by viewing the failures on hud.
Details for Dev Infra team: Raised by workflow job …
Fixes BN inference. I'm stealing Ivan's changes from pytorch#85562. We are returning mini-batch stats during inference runs in ATen; this is not the right behavior and we should have changed that instead. But for the time being, let's change nvFuser behavior just to get CI green. Also, the extra set here to avoid trivial forwarding should be removed once #1995 is merged.
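For context on the inference-stats point: in eval mode the op should normalize with the running statistics, not statistics computed from the current mini-batch. A minimal libtorch sketch checking that expectation against at::native_batch_norm (assumes a local libtorch build; not code from this PR):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::manual_seed(0);
  auto input = torch::randn({4, 3, 8, 8});
  auto weight = torch::ones({3});
  auto bias = torch::zeros({3});
  auto running_mean = torch::zeros({3});
  auto running_var = torch::ones({3});

  // training=false: the output must be normalized with running_mean/var.
  auto result = at::native_batch_norm(
      input, weight, bias, running_mean, running_var,
      /*training=*/false, /*momentum=*/0.1, /*eps=*/1e-5);
  auto output = std::get<0>(result);

  auto expected = (input - running_mean.view({1, 3, 1, 1})) /
                  torch::sqrt(running_var.view({1, 3, 1, 1}) + 1e-5);
  std::cout << torch::allclose(output, expected) << "\n";  // expect 1
}
```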
OOM failure: 2022-09-30T15:29:35.8006856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32 FAILED [ 23%]
2022-09-30T15:29:35.8006880Z
2022-09-30T15:29:35.8007041Z =================================== FAILURES ===================================
2022-09-30T15:29:35.8007247Z ________ TestCommonCUDA.test_python_ref__refs_diag_embed_cuda_complex32 ________
2022-09-30T15:29:35.8007389Z Traceback (most recent call last):
2022-09-30T15:29:35.8007772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1073, in assert_equal
2022-09-30T15:29:35.8007888Z pair.compare()
2022-09-30T15:29:35.8008229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 620, in compare
2022-09-30T15:29:35.8008381Z self._compare_values(actual, expected)
2022-09-30T15:29:35.8008715Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 721, in _compare_values
2022-09-30T15:29:35.8008940Z compare_fn(actual, expected, rtol=self.rtol, atol=self.atol, equal_nan=self.equal_nan)
2022-09-30T15:29:35.8009316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 853, in _compare_regular_values_close
2022-09-30T15:29:35.8009536Z matches = torch.isclose(actual, expected, rtol=rtol, atol=atol, equal_nan=equal_nan)
2022-09-30T15:29:35.8010173Z torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB (GPU 0; 7.44 GiB total capacity; 402.93 MiB already allocated; 29.19 MiB free; 1.64 GiB allowed; 818.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
@pytorchbot merge -f "XLA failure is unrelated, see 86093"
@pytorchbot successfully started a merge job. Check the current status here.
Hey @IvanYashchuk.
@pytorchbot revert -m "Periodic tests failing test_nvfuser_extremal_values_native_batch_norm_cuda_float64 (main.TestCudaFuserOpInfoCUDA)" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 85562 failed
Reason: The following mandatory check(s) failed (Rule …). Dig deeper by viewing the failures on hud.
Details for Dev Infra team: Raised by workflow job …
I'm unable to revert this PR (you'll need to sign the new CLA @IvanYashchuk), disabling the failing tests for now.
/easycla As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details. This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (pytorch/pytorch#81191) and no in-place copy support (pytorch/pytorch#84545). Pull Request resolved: pytorch/pytorch#85562 Approved by: https://github.com/kevinstephano, https://github.com/ngimel
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (#81191) and no in-place copy support (#84545).
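One reason the missing in-place copy support matters here: in training mode native_batch_norm mutates running_mean and running_var in place, which a purely functional reference cannot express without copy_. A small libtorch sketch illustrating that mutation (assumes a local libtorch build; not code from this PR):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::manual_seed(0);
  auto input = torch::randn({4, 3, 8, 8});
  auto running_mean = torch::zeros({3});
  auto running_var = torch::ones({3});
  auto mean_before = running_mean.clone();

  // training=true: running statistics are updated in place using the
  // current batch statistics and the momentum.
  at::native_batch_norm(
      input, /*weight=*/{}, /*bias=*/{}, running_mean, running_var,
      /*training=*/true, /*momentum=*/0.1, /*eps=*/1e-5);

  std::cout << torch::allclose(mean_before, running_mean) << "\n";  // expect 0
}
```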