IFU-master-2022-04-11 #993


Merged
merged 344 commits into master on Apr 22, 2022

Conversation

rraminen

No conflict

ezyang and others added 30 commits April 2, 2022 02:18
If __torch_function__ was disabled, this TLS should propagate to
other threads.

Although I was thinking about pytorch#73942
when I did this, it doesn't actually help solve that problem: when I
disable __torch_function__ as part of the disabled __torch_function__
implementation, that happens before snapshotting (and snapshotting only
happens for Python tensors anyway).

I intend to add some more TLS to this struct soon, which is why it's
a struct and not just a bool.

Testing is not so easy to do because on CPU there isn't an easy way
to get Python code running in another thread.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75110

Approved by: https://github.com/albanD
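For context, a minimal sketch (not from this commit; the subclass is hypothetical) of the `__torch_function__` protocol whose enabled/disabled state the new TLS carries across threads:

```py
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Fires for every torch.* call involving this subclass, unless
        # __torch_function__ has been disabled via the TLS described above.
        print(f"dispatching {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs or {})

x = LoggingTensor([1.0, 2.0])
torch.add(x, x)  # prints "dispatching add"
```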
Summary:
Pull Request resolved: pytorch#75138

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D35331263

fbshipit-source-id: e426c4017359c9f98188c0df5226775be7b1f700
(cherry picked from commit bf1768f)
Partially fixes: pytorch#66328

This PR introduces a templated class `IList<T>`: a wrapper container for
boxed (`c10::List<T>`) and unboxed (`at::ArrayRef<T>`) containers. At this point, it was
created with `T = Tensor` in mind, but it also aims to support `T = OptionalTensorRef`.

Pull Request resolved: pytorch#67964

Approved by: https://github.com/ezyang
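A hypothetical Python analogue of the idea (the real `IList<T>` is C++, and these names are illustrative, not PyTorch API): one view type that iterates either backing container uniformly, so consumers need not care which container they were handed.

```py
from typing import Union

class ListView:
    def __init__(self, data: Union[list, tuple]):
        self._data = data  # "boxed" list or "unboxed" tuple

    def __len__(self) -> int:
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

for backing in ([1, 2, 3], (1, 2, 3)):
    view = ListView(backing)
    print(len(view), list(view))  # same behavior for both containers
```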
…74843)

Summary:
Pull Request resolved: pytorch#74843

is_output_quantized is used to check whether we should quantize the op based on the dtype configuration in qconfig and on what
the backend supports; we skip inserting an observer if the dtype configuration is not supported by the backend.
This is now handled by backend_config_dict, so we can remove this function.

We also previously supported fp16 static quantization for some ops for one of our internal use cases; that is no longer
required, so we can remove it as well.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35190541

fbshipit-source-id: 623d961810737ec01e1f8b269ec48a6a99bb284a
(cherry picked from commit a405998)
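A minimal sketch of the FX graph mode quantization flow this touches, assuming the API of the era around this PR (prepare_fx has since gained a required example_inputs argument):

```py
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

model = M().eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)  # observers inserted here,
                                            # guided by the backend config
prepared(torch.randn(1, 4))                 # calibrate
quantized = convert_fx(prepared)
```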
This PR enables jit-compiled reductions and moves `prod` to be jit-compiled.
Currently, only reductions that can use `func_wrapper` for automatic implementation of the `reduce/project/translate_idx` ops are supported; there are a few TODOs for supporting more complex reductions such as norms and max, which typically require a full-fledged ReduceOps functor. Similarly, only reductions with a single input are supported.
The number of inputs is hardcoded to 1, which is true for our current reductions but can be relaxed in the future.

Pull Request resolved: pytorch#74446
Approved by: https://github.com/mruberry
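A hypothetical sketch of the reduce/project contract named above (translate_idx, which remaps output indices to input offsets, is omitted here); `func_wrapper` derives these automatically for simple reductions such as prod:

```py
import functools

def combine(acc, value):   # the "reduce" op: fold one element in
    return acc * value

def project(acc):          # map the final accumulator to the output value
    return acc

identity = 1.0             # prod's identity element

data = [2.0, 3.0, 4.0]
print(project(functools.reduce(combine, data, identity)))  # 24.0
```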
References: pytorch#13918

Add more test cases for list of numpy array inputs
Pull Request resolved: pytorch#72249
Approved by: https://github.com/mruberry
Fixes pytorch#74122

This re-enables TestTorchFunctionOverride and fixes a bunch of test failures
that had crept in while it was disabled.

Pull Request resolved: pytorch#74202

Approved by: https://github.com/ezyang
…orch#75149)

Summary:
Pull Request resolved: pytorch#75149

https://github.com/pytorch/rfcs/blob/master/RFC-0017-PyTorch-Operator-Versioning.md
ghstack-source-id: 152906910

Test Plan: CI

Reviewed By: qihqi

Differential Revision: D35338681

fbshipit-source-id: 03cb699696af2c946d67ece95bdc019fc4a4cb11
(cherry picked from commit b72737e)
Summary:
Add BFloat16 support for smooth_l1_loss on CPU.

Pull Request resolved: pytorch#62558

Reviewed By: H-Huang

Differential Revision: D34897859

Pulled By: frank-wei

fbshipit-source-id: a52138c89852642db78f5f3083d05873f3cdec3a
(cherry picked from commit 71908ee)
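For illustration, usage that this change enables (a sketch, not taken from the PR's tests):

```py
import torch
import torch.nn.functional as F

x = torch.randn(8, dtype=torch.bfloat16)
y = torch.randn(8, dtype=torch.bfloat16)
print(F.smooth_l1_loss(x, y))  # now supported on CPU in bfloat16
```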
Summary:
Pull Request resolved: pytorch#75176

Switch to python resources to fix build on buck2

https://www.internalfb.com/intern/wiki/Buck-users/Python_Resources_in_fbcode/

Reviewed By: r-barnes

Differential Revision: D35352705

fbshipit-source-id: f85043ebbcfbb30d287c802ff7401c89155a024a
(cherry picked from commit 35e7a98)
Test Plan: revert-hammer

Differential Revision:
D35352705 (pytorch@152489a)

Original commit changeset: f85043ebbcfb

Original Phabricator Diff: D35352705 (pytorch@152489a)

fbshipit-source-id: 901e28dd17150c6300b2d263aba1a8b0651d3020
(cherry picked from commit ab91a2a)
…torch#74636)

Summary:
Pull Request resolved: pytorch#74636

This commit changes how quantization patterns for linear
and conv are set up in prepare. Previously, these were set up
through ConvReluQuantizeHandler and LinearReLUQuantizeHandler.
After this commit, they are set up through the
corresponding entries in the native backend_config_dict,
rendering the above quantize handlers unnecessary.
In future commits, we will do the same for the remaining ops.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: jerryzh168, ngimel

Differential Revision: D35225680

fbshipit-source-id: 4a79f63a11fce46701eb17aaf3619c1e827d72a4
(cherry picked from commit 475f599)
I also took the opportunity to update the documentation a little
for clarity.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75141

Approved by: https://github.com/zou3519
The pattern of a PyObject* bundled with a PyInterpreter* is pretty
useful in many contexts (e.g., TorchDispatchTypeObject), so I have turned
it into a dedicated class, SafePyObject. In the process I fixed a
bug in the old TorchDispatchTypeObject (the copy constructor/assignment
operator was not deleted), made the API safer (retrieving the PyObject*
pointer requires verifying that the PyInterpreter* matches), and
fixed some minor inefficiencies in the C++ code.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75142

Approved by: https://github.com/zou3519
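A hypothetical Python analogue of the invariant (the real SafePyObject is C++; these names are illustrative only):

```py
class SafePyObjectSketch:
    def __init__(self, obj: object, interpreter: object):
        self._obj = obj
        self._interpreter = interpreter  # owner, analogous to PyInterpreter*

    def ptr(self, interpreter: object) -> object:
        # Retrieval requires presenting the matching interpreter, mirroring
        # the "PyInterpreter* must match" check described above.
        if interpreter is not self._interpreter:
            raise RuntimeError("interpreter mismatch")
        return self._obj

interp = object()  # stand-in for a PyInterpreter*
safe = SafePyObjectSketch("payload", interp)
print(safe.ptr(interp))  # ok; safe.ptr(object()) would raise
```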
Now there is truly only one way to call __torch_function__,
and that is via handle_torch_function_no_python_arg_parser.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75159

Approved by: https://github.com/zou3519
Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#75165
Approved by: https://github.com/seemethere
This PR updates the documentation for CosineEmbeddingLoss.
The loss function uses cosine similarity, but the documentation used the term `cosine distance`. Therefore the term is changed to `cosine similarity`.

Fixes pytorch#75104

Pull Request resolved: pytorch#75188
Approved by: https://github.com/cpuhrsch
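For reference, the loss in terms of cosine similarity (a usage sketch; shapes and values are arbitrary):

```py
import torch

# loss = 1 - cos(x1, x2)              if y == 1
# loss = max(0, cos(x1, x2) - margin) if y == -1
loss_fn = torch.nn.CosineEmbeddingLoss(margin=0.0)
x1, x2 = torch.randn(3, 5), torch.randn(3, 5)
y = torch.tensor([1, -1, 1])
print(loss_fn(x1, x2, y))
```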
Make it accept the origin environment variable.
Add an explicit skip for pytorch@0a6a1b2, as it's a rare case of the same commit being landed/reverted twice.

Pull Request resolved: pytorch#75209
Approved by: https://github.com/atalman, https://github.com/bigfootjon
- Fix `_Demux` not being picklable with dill, as reported in pytorch#74958 (comment)
- Add a cache to the traverse function to prevent infinite recursion on circular references between DataPipes (Fixes pytorch/data#237)
Pull Request resolved: pytorch#75034
Approved by: https://github.com/wenleix
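A minimal sketch of the pipeline shape involved: demux's children reference shared parent state, the kind of circular DataPipe reference the traverse cache now handles without recursing forever.

```py
from torch.utils.data.datapipes.iter import IterableWrapper

source = IterableWrapper(range(10))
evens, odds = source.demux(num_instances=2, classifier_fn=lambda x: x % 2)
print(list(evens))  # [0, 2, 4, 6, 8]
print(list(odds))   # [1, 3, 5, 7, 9]
```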
Recently, @cpuhrsch noticed that going to viable/strict still didn't resolve upstream lint failures. This is because we didn't check out the head SHA for those GitHub Actions workflows (we missed it last time).

This PR does some consolidation and fixes that problem to make viable/strict more reliable.

Pull Request resolved: pytorch#75199
Approved by: https://github.com/cpuhrsch, https://github.com/seemethere, https://github.com/malfet
…ink, hardswish and softplus on CPU (pytorch#63134)

Summary:
Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish and softplus on CPU, and optimize the performance of softshrink.

Pull Request resolved: pytorch#63134

Reviewed By: yinghai

Differential Revision: D34897992

Pulled By: frank-wei

fbshipit-source-id: 4c778f5271d6fa54dd78158258941def8d9252f5
(cherry picked from commit decda0e)
Summary:
Pull Request resolved: pytorch#73219

Saw a report that this elementwise add is causing overhead. IIUC this is easy to fuse?
ghstack-source-id: 152549975

Test Plan:
CI, review

Ran benchmark_transformers.par mha --batch-size 64 --max-sequence-length 128 --avg-sequence-length 256 --large --use-real-data-distribution --use-mask
and looked at the PT time number

```
before:
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.24ms, NativePT Time: 1000000000.00ms, HF Time: 1.10ms,             PT FLOPS: 59.07TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.46TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.23ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 59.57TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.75TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.24ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 58.87TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.77TFLOP/s

after:
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.22ms, NativePT Time: 1000000000.00ms, HF Time: 1.10ms,             PT FLOPS: 60.07TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.51TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.22ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 59.80TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.69TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.21ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 60.21TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.86TFLOP/s
```

Inspected a Kineto trace and confirmed that an elementwise add was fused into baddbmm.

Additional opportunity: I see a copy_ inside baddbmm that wasn't happening with the bmm path and I'm not sure why. Perhaps something went wrong with the structured kernels port by ezyang?

Reviewed By: ezyang

Differential Revision: D34160547

fbshipit-source-id: 78d406fb035e6f3bf13af2c9443a886eada35ac4
(cherry picked from commit aaffc39)
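For illustration, the fusion in question (shapes here are arbitrary, not the benchmark's):

```py
import torch

bias = torch.randn(64, 128, 128)
q = torch.randn(64, 128, 32)
k = torch.randn(64, 32, 128)

fused = torch.baddbmm(bias, q, k)  # add folded into the batched matmul
unfused = bias + torch.bmm(q, k)   # bmm plus a separate elementwise add
print(torch.allclose(fused, unfused, atol=1e-4))
```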
pytorch#75007)

Summary:
Previously, the highest-level process group in `period_process_group_dict` could be `None`, indicating the global group. Now `period_process_group_dict` cannot contain `None` as a process group, so the function `_find_process_group` can return a process group directly instead of a tuple: when no group is found it returns `None`, which is unambiguous because a found process group can no longer be `None`.

Proposal: pytorch#71325

Pull Request resolved: pytorch#75007

Reviewed By: awgu

Differential Revision: D35357816

Pulled By: rohan-varma

fbshipit-source-id: 4522dba49797df7140227bfd822d668b7e118a66
(cherry picked from commit 77ca01b)
pytorchmergebot and others added 6 commits April 11, 2022 15:24
Fixes pytorch#74264 (comment).

The shape check works with or without the extras added in pytorch#74264.

```py
>>> a = torch.rand(2, 2).to_sparse_csr()
>>> b = torch.rand(2, 3).to_sparse_csr()
>>> torch.testing.assert_close(a, b)
AssertionError: The values for attribute 'shape' do not match: torch.Size([2, 2]) != torch.Size([2, 3]).
```

Tensor comparison is split into two parts:

1. Attribute comparison.
2. Value comparison.

https://github.com/pytorch/pytorch/blob/bcf6974c207ac0339bfb8bdfdb0b0ec348f7a22f/torch/testing/_comparison.py#L611-L616

The attribute comparison happens in

https://github.com/pytorch/pytorch/blob/bcf6974c207ac0339bfb8bdfdb0b0ec348f7a22f/torch/testing/_comparison.py#L618

The check for the matching shape

https://github.com/pytorch/pytorch/blob/bcf6974c207ac0339bfb8bdfdb0b0ec348f7a22f/torch/testing/_comparison.py#L647-L648

is one of the few checks that cannot be disabled through keyword arguments. Thus, there is no need for this check in `_compare_sparse_csr_values`, since the comparison will already have failed if the shapes mismatch.
Pull Request resolved: pytorch#75593
Approved by: https://github.com/cpuhrsch
…erge (pytorch#75542)

Summary: Pull Request resolved: pytorch#75542

Reviewed By: malfet

Differential Revision: D35513051

fbshipit-source-id: adf59359fcf2410fa8a61746533c896ec22d5ed3
(cherry picked from commit ab65394)
Fix pytorch#75482
There are several random failures when adding exclusions in Windows Defender: https://github.com/pytorch/pytorch/runs/5953410781?check_suite_focus=true

It looks like Add/Set-MpPreference (added in pytorch#75313) might be unstable.
Since it is a defensive step, the error can be ignored so that the workflow continues even if the command fails.

reference:
https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_commonparameters?view=powershell-7.2

verification:
In the test PR https://github.com/pytorch/pytorch/runs/5966277521?check_suite_focus=true#step:3:54
it tries to delete 2 non-existent processes, but the workflow continues to run.
`-ErrorAction Ignore` works in the runner.
Pull Request resolved: pytorch#75588
Approved by: https://github.com/suo
Summary: The primary issue with making sparsity work with QAT
convert (unlike normal quantization convert) is that when the
parametrized module undergoes the QAT convert, the parametrizations need
to be maintained. If the parametrizations are not
transferred during the convert, the sparsifier loses its
connection to the model. In practice this is handled using the
transfer_parametrizations_and_params function to move the weight and
bias and any associated parametrizations to the new module. This PR also adds
tests for transfer_parametrizations_and_params and type_before_parametrizations
to test_nn.py, and adds comments to the test code for
composability.

Test Plan: python test/test_ao_sparsity.py TestComposability
python test/test_nn.py TestNN


Pull Request resolved: pytorch#74848

Approved by: https://github.com/vkuzo, https://github.com/Lezcano
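A minimal sketch of the transfer (the parametrization below is a hypothetical stand-in for a sparsity mask, but the function is the one this PR tests):

```py
import torch
from torch import nn
from torch.nn.utils import parametrize

class Abs(nn.Module):  # toy parametrization standing in for a sparsifier's mask
    def forward(self, w):
        return w.abs()

src, dst = nn.Linear(4, 4), nn.Linear(4, 4)
parametrize.register_parametrization(src, "weight", Abs())

# Move weight/bias and their parametrizations onto the new module, as the
# QAT convert path does, so the sparsifier stays connected to the model.
parametrize.transfer_parametrizations_and_params(src, dst)
print(parametrize.is_parametrized(dst, "weight"))  # True
```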
@rraminen
Author

test pytorch/apex/torchvision please

@rraminen
Author

rraminen commented Apr 19, 2022

As the CI is on ROCm 5.0, the PyTorch unit tests were tested locally on the ROCm 5.1 release image.

The below 4 tests fail because of a packaging error. They also fail in the ROCm 5.1 release image with PyTorch built from the current ROCm fork master before this IFU. (https://github.com/ROCmSoftwarePlatform/frameworks-internal/issues/1530)

test_loading_pickle (test_directory_reader.DirectoryReaderTest)
test_model_save (test_model.ModelTest)
test_resnet (test_model.ModelTest)
test_script_resnet (test_model.ModelTest)

The below 2 unit tests fail with proxy-related errors. They also fail in the ROCm 5.1 release image with PyTorch built from the current ROCm fork master before this IFU. (https://github.com/ROCmSoftwarePlatform/frameworks-internal/issues/1531)

test_torchvision_models_detection_ssd300_vgg16 (__main__.TestVisionTracing)
test_torchvision_models_detection_ssdlite320_mobilenet_v3_large (__main__.TestVisionTracing)

The below 3 tests fail, but they are skipped for now (in #1001) as we are aware of this issue (https://ontrack-internal.amd.com/browse/SWDEV-332522):

test_event_handle_exporter (__main__.TestMultiprocessing)
test_event_handle_importer (__main__.TestMultiprocessing)
test_event_multiprocess (__main__.TestMultiprocessing)

Among the distributed unit tests, the below two tests fail but can be ignored, as they are disabled in the upstream CI:

test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn)
test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn)

PT_unittests_log_IFU-master-2022-04-11.log

PT_unittests_log_IFU-master-2022-04-11_distributed.log

@rraminen
Author

http://rocmhead.amd.com:8080/job/pytorch/job/pytorch-ci/22/

Apex unit tests look good
apex.test.log

Torchvision unit tests look good
torchvision.test.log

They used commit 0e487f5.

@rraminen rraminen requested a review from pruthvistony April 19, 2022 22:04
@pruthvistony
Collaborator

test pytorch/apex/torchvision please

@pruthvistony
Collaborator

@rraminen, can you push an empty commit to trigger the CI?
The previous failure is in ld while building torchvision.

@rraminen
Author

Hi @pruthvistony, I pushed an empty commit.

@pruthvistony pruthvistony merged commit 625dd01 into master Apr 22, 2022
rraminen added a commit to rraminen/pytorch that referenced this pull request Apr 22, 2022
rraminen added a commit that referenced this pull request May 2, 2022
pruthvistony pushed a commit that referenced this pull request Feb 21, 2023
pruthvistony pushed a commit that referenced this pull request May 24, 2023
pruthvistony pushed a commit that referenced this pull request Sep 11, 2023
rraminen added a commit to rraminen/pytorch that referenced this pull request Oct 11, 2023
pruthvistony pushed a commit that referenced this pull request Dec 6, 2023
pruthvistony pushed a commit that referenced this pull request Jan 21, 2024
pruthvistony pushed a commit that referenced this pull request Jan 22, 2024