
Add nvFuser support for torch.native_batch_norm #85562


Closed
wants to merge 24 commits

Conversation

IvanYashchuk
Collaborator

This PR adds nvFuser's implementation of batch_norm, since there is no reference implementation yet (#81191) and no in-place copy support (#84545).
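For context, here is a minimal Python sketch (illustrative only, not this PR's code) of the batch_norm semantics nvFuser implements here; the in-place updates to the running stats are the kind of copies that the missing in-place support (#84545) concerns:

```python
# Illustrative sketch only (hypothetical helper, not the PR's code).
import torch

def batch_norm_sketch(x, weight, bias, running_mean, running_var,
                      training: bool, momentum: float, eps: float):
    reduce_dims = [d for d in range(x.dim()) if d != 1]  # all dims but channels
    if training:
        mean = x.mean(dim=reduce_dims)
        var = x.var(dim=reduce_dims, unbiased=False)
        if running_mean is not None:
            # In-place running-stat updates: the copies #84545 refers to.
            n = x.numel() // x.shape[1]
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var * n / (n - 1))
    else:
        mean, var = running_mean, running_var
    shape = [1, -1] + [1] * (x.dim() - 2)  # broadcast along the channel dim
    out = (x - mean.reshape(shape)) * torch.rsqrt(var.reshape(shape) + eps)
    if weight is not None:
        out = out * weight.reshape(shape)
    if bias is not None:
        out = out + bias.reshape(shape)
    return out
```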

@pytorch-bot

pytorch-bot bot commented Sep 23, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85562

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 Failures

As of commit d523e29:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Sep 23, 2022
@facebook-github-bot facebook-github-bot added cla signed oncall: jit Add this issue/PR to JIT oncall triage queue labels Sep 23, 2022
@IvanYashchuk IvanYashchuk marked this pull request as ready for review September 27, 2022 16:21
nvfuser::Tensor bias,
nvfuser::Tensor running_mean,
nvfuser::Tensor running_var,
bool kTraining,
Collaborator

Nit: Why is the training parameter prefixed with a k? I thought the k prefix usually denoted an enum value.

Collaborator Author

Probably I copied it from

I'll rename it.
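For reference, the convention the nit alludes to (e.g. in the Google C++ style guide) reserves the k prefix for constants and enumerators, not ordinary parameters. A small illustration (names are made up):

```cpp
// Illustration only: the k prefix conventionally marks constants and
// enumerators, while ordinary parameters stay unprefixed.
constexpr int kMaxDims = 8;                 // constant: k prefix
enum class Mode { kTraining, kInference };  // enumerators: k prefix
void batchNorm(bool training);              // parameter: no prefix
```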

{},
std::move(_outputs),
"null_tensor",
RecordType::Tensor) {}
Collaborator

I would add a new record type, like RecordType::NullTensor, since you have a unique Record here.
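A hypothetical sketch of that suggestion, assuming only that a NullTensor enumerator is added to the existing RecordType enum (names beyond the snippet above are illustrative):

```cpp
// Illustrative only: give the null-tensor record its own RecordType value
// instead of reusing RecordType::Tensor.
enum class RecordType {
  // ... existing values ...
  Tensor,
  NullTensor,  // new: marks records for null/undefined tensors
};

// The constructor shown above would then end with:
//     {},
//     std::move(_outputs),
//     "null_tensor",
//     RecordType::NullTensor) {}
```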

Collaborator

@kevinstephano left a comment

Generally looks fine; I think a new record type should be added for NullTensor.

@IvanYashchuk
Collaborator Author

@pytorchbot merge -g

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request Sep 30, 2022
Fixes BN inference. I'm stealing Ivan's changes from pytorch#85562

We are returning mini-batch stats during inference runs in ATen; this is not the right behavior, and we should have changed that instead. But for the time being, let's change nvFuser's behavior just to get CI green.

Also, the extra set here to avoid trivial forwarding should be removed once #1995 is merged.
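The semantics in question are easy to see in eager mode: with training=False, batch_norm normalizes with the running stats and leaves them untouched; only training=True computes mini-batch stats and updates the running ones. A minimal illustration:

```python
# Eager-mode illustration of the train/eval stats distinction.
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 16, 16)
running_mean, running_var = torch.zeros(3), torch.ones(3)

# Inference: normalizes with the running stats, which stay unchanged.
out_eval = F.batch_norm(x, running_mean, running_var, training=False)
assert torch.equal(running_mean, torch.zeros(3))

# Training: normalizes with mini-batch stats and updates the running stats.
out_train = F.batch_norm(x, running_mean, running_var,
                         training=True, momentum=0.1)
assert not torch.equal(running_mean, torch.zeros(3))
```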
@IvanYashchuk
Collaborator Author

OOM failure:

test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32 FAILED [ 23%]

=================================== FAILURES ===================================
________ TestCommonCUDA.test_python_ref__refs_diag_embed_cuda_complex32 ________
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1073, in assert_equal
    pair.compare()
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 620, in compare
    self._compare_values(actual, expected)
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 721, in _compare_values
    compare_fn(actual, expected, rtol=self.rtol, atol=self.atol, equal_nan=self.equal_nan)
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 853, in _compare_regular_values_close
    matches = torch.isclose(actual, expected, rtol=rtol, atol=atol, equal_nan=equal_nan)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB (GPU 0; 7.44 GiB total capacity; 402.93 MiB already allocated; 29.19 MiB free; 1.64 GiB allowed; 818.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
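As the allocator message itself suggests, one mitigation is to cap the allocator's split size through PYTORCH_CUDA_ALLOC_CONF; a sketch (the 128 MiB value is just an example):

```python
# Must be set before the first CUDA allocation; the caching allocator
# reads PYTORCH_CUDA_ALLOC_CONF lazily on first use.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch
x = torch.empty(1024, 1024, device="cuda")  # allocation uses the setting
```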

@IvanYashchuk
Collaborator Author

@pytorchbot merge -f "XLA failure is unrelated, see 86093"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the force (-f) flag. This means your change will be merged immediately, bypassing any CI checks (ETA: 1-5 minutes). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

github-actions bot commented Oct 3, 2022

Hey @IvanYashchuk.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@soulitzer
Contributor

@pytorchbot revert -m "Periodic tests failing test_nvfuser_extremal_values_native_batch_norm_cuda_float64 (__main__.TestCudaFuserOpInfoCUDA)" -c nosignal

@soulitzer soulitzer reopened this Oct 4, 2022
@soulitzer soulitzer added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Oct 4, 2022
@linux-foundation-easycla

CLA Not Signed

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Collaborator

Reverting PR 85562 failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

@soulitzer
Contributor

I'm unable to revert this PR (you'll need to sign the new CLA, @IvanYashchuk); disabling the failing tests for now.

@facebook-github-bot
Contributor

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (pytorch/pytorch#81191) and no in-place copy support (pytorch/pytorch#84545).

Pull Request resolved: pytorch/pytorch#85562
Approved by: https://github.com/kevinstephano, https://github.com/ngimel
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
Labels
ciflow/periodic, cla signed, Merged, module: nvfuser, module: primTorch, oncall: jit, open source, release notes: jit
8 participants