IFU-master-2022-04-11 #993


Merged
merged 344 commits into master on Apr 22, 2022

Conversation

rraminen

No conflict

ezyang and others added 30 commits April 2, 2022 02:18
If __torch_function__ was disabled, this TLS should propagate to
other threads.

Although I was thinking about pytorch#73942
when I did this, it doesn't actually help solve that problem: when I
disable __torch_function__ as part of the disabled __torch_function__
implementation, that happens before snapshotting (and snapshotting only
happens for Python tensors anyway).

I intend to add some more TLS to this struct soon, which is why it's
a struct and not just a bool.

Testing is not so easy to do because on CPU there isn't an easy way
to get Python code running in another thread.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75110

Approved by: https://github.com/albanD
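For context, a minimal sketch (not from this commit; the subclass is hypothetical) of the `__torch_function__` protocol whose enabled/disabled state the new TLS carries across threads:

```py
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Fires for every torch.* call involving this subclass, unless
        # __torch_function__ has been disabled via the TLS described above.
        print(f"dispatching {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs or {})

x = LoggingTensor([1.0, 2.0])
torch.add(x, x)  # prints "dispatching add"
```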
Summary:
Pull Request resolved: pytorch#75138

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D35331263

fbshipit-source-id: e426c4017359c9f98188c0df5226775be7b1f700
(cherry picked from commit bf1768f)
Partially fixes: pytorch#66328

This PR introduces a templated class `IList<T>`: a wrapper container for
boxed (`c10::List<T>`) and unboxed (`at::ArrayRef<T>`) containers. At this point, it was
created with `T = Tensor` in mind, but it also aims to support `T = OptionalTensorRef`.

Pull Request resolved: pytorch#67964

Approved by: https://github.com/ezyang
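A hypothetical Python analogue of the idea (the real `IList<T>` is C++, and these names are illustrative, not PyTorch API): one view type that iterates either backing container uniformly, so consumers need not care which container they were handed.

```py
from typing import Union

class ListView:
    def __init__(self, data: Union[list, tuple]):
        self._data = data  # "boxed" list or "unboxed" tuple

    def __len__(self) -> int:
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

for backing in ([1, 2, 3], (1, 2, 3)):
    view = ListView(backing)
    print(len(view), list(view))  # same behavior for both containers
```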
…74843)

Summary:
Pull Request resolved: pytorch#74843

is_output_quantized is used to check whether we should quantize the op based on the dtype configuration in qconfig and on what
the backend supports; we skip inserting an observer if the dtype configuration is not supported by the backend.
This is now handled by backend_config_dict, so we can remove this function.

We also previously supported fp16 static quantization for some ops for one of our internal use cases; that is no longer
required, so we can remove it as well.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: andrewor14

Differential Revision: D35190541

fbshipit-source-id: 623d961810737ec01e1f8b269ec48a6a99bb284a
(cherry picked from commit a405998)
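A minimal sketch of the FX graph mode quantization flow this touches, assuming the API of the era around this PR (prepare_fx has since gained a required example_inputs argument):

```py
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

model = M().eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)  # observers inserted here,
                                            # guided by the backend config
prepared(torch.randn(1, 4))                 # calibrate
quantized = convert_fx(prepared)
```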
This PR enables jit-compiled reductions and moves `prod` to be jit-compiled.
Currently, only reductions that can use `func_wrapper` for automatic implementation of the `reduce/project/translate_idx` ops are supported; there are a few TODOs for supporting more complex reductions such as norms and max, which typically require a full-fledged ReduceOps functor. Similarly, only reductions with a single input are supported.
The number of inputs is hardcoded to 1, which is true for our current reductions but can be relaxed in the future.

Pull Request resolved: pytorch#74446
Approved by: https://github.com/mruberry
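A hypothetical sketch of the reduce/project contract named above (translate_idx, which remaps output indices to input offsets, is omitted here); `func_wrapper` derives these automatically for simple reductions such as prod:

```py
import functools

def combine(acc, value):   # the "reduce" op: fold one element in
    return acc * value

def project(acc):          # map the final accumulator to the output value
    return acc

identity = 1.0             # prod's identity element

data = [2.0, 3.0, 4.0]
print(project(functools.reduce(combine, data, identity)))  # 24.0
```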
References: pytorch#13918

Add more test cases for list of numpy array inputs
Pull Request resolved: pytorch#72249
Approved by: https://github.com/mruberry
Fixes pytorch#74122

This re-enables TestTorchFunctionOverride and fixes a bunch of test failures
that had crept in while it was disabled.

Pull Request resolved: pytorch#74202

Approved by: https://github.com/ezyang
…orch#75149)

Summary:
Pull Request resolved: pytorch#75149

https://github.com/pytorch/rfcs/blob/master/RFC-0017-PyTorch-Operator-Versioning.md
ghstack-source-id: 152906910

Test Plan: CI

Reviewed By: qihqi

Differential Revision: D35338681

fbshipit-source-id: 03cb699696af2c946d67ece95bdc019fc4a4cb11
(cherry picked from commit b72737e)
Summary:
Add BFloat16 support for smooth_l1_loss on CPU.

Pull Request resolved: pytorch#62558

Reviewed By: H-Huang

Differential Revision: D34897859

Pulled By: frank-wei

fbshipit-source-id: a52138c89852642db78f5f3083d05873f3cdec3a
(cherry picked from commit 71908ee)
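For illustration, usage that this change enables (a sketch, not taken from the PR's tests):

```py
import torch
import torch.nn.functional as F

x = torch.randn(8, dtype=torch.bfloat16)
y = torch.randn(8, dtype=torch.bfloat16)
print(F.smooth_l1_loss(x, y))  # now supported on CPU in bfloat16
```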
Summary:
Pull Request resolved: pytorch#75176

Switch to python resources to fix build on buck2

https://www.internalfb.com/intern/wiki/Buck-users/Python_Resources_in_fbcode/

Reviewed By: r-barnes

Differential Revision: D35352705

fbshipit-source-id: f85043ebbcfbb30d287c802ff7401c89155a024a
(cherry picked from commit 35e7a98)
Test Plan: revert-hammer

Differential Revision:
D35352705 (pytorch@152489a)

Original commit changeset: f85043ebbcfb

Original Phabricator Diff: D35352705 (pytorch@152489a)

fbshipit-source-id: 901e28dd17150c6300b2d263aba1a8b0651d3020
(cherry picked from commit ab91a2a)
…torch#74636)

Summary:
Pull Request resolved: pytorch#74636

This commit changes how quantization patterns for linear
and conv are set up in prepare. Previously, these were set up
through ConvReluQuantizeHandler and LinearReLUQuantizeHandler.
After this commit, they are set up through the
corresponding entries in the native backend_config_dict,
rendering the above quantize handlers unnecessary.
In future commits, we will do the same for the remaining ops.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: jerryzh168, ngimel

Differential Revision: D35225680

fbshipit-source-id: 4a79f63a11fce46701eb17aaf3619c1e827d72a4
(cherry picked from commit 475f599)
I also took the opportunity to update the documentation a little
for clarity.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75141

Approved by: https://github.com/zou3519
The pattern of a PyObject* bundled with a PyInterpreter* is pretty
useful in many contexts (e.g., TorchDispatchTypeObject), so I have turned
it into a dedicated class, SafePyObject. In the process I fixed a
bug in the old TorchDispatchTypeObject (the copy constructor/assignment
operator was not deleted), made the API safer (retrieving the PyObject*
pointer requires verifying that the PyInterpreter* matches), and
fixed some minor inefficiencies in the C++ code.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75142

Approved by: https://github.com/zou3519
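A hypothetical Python analogue of the invariant (the real SafePyObject is C++; these names are illustrative only):

```py
class SafePyObjectSketch:
    def __init__(self, obj: object, interpreter: object):
        self._obj = obj
        self._interpreter = interpreter  # owner, analogous to PyInterpreter*

    def ptr(self, interpreter: object) -> object:
        # Retrieval requires presenting the matching interpreter, mirroring
        # the "PyInterpreter* must match" check described above.
        if interpreter is not self._interpreter:
            raise RuntimeError("interpreter mismatch")
        return self._obj

interp = object()  # stand-in for a PyInterpreter*
safe = SafePyObjectSketch("payload", interp)
print(safe.ptr(interp))  # ok; safe.ptr(object()) would raise
```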
Now there is truly only one way to call __torch_function__,
and that is via handle_torch_function_no_python_arg_parser.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#75159

Approved by: https://github.com/zou3519
Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#75165
Approved by: https://github.com/seemethere
This PR updates the documentation for CosineEmbeddingLoss.
The loss function uses cosine similarity, but the documentation used the term `cosine distance`. Therefore the term is changed to `cosine similarity`.

Fixes pytorch#75104

Pull Request resolved: pytorch#75188
Approved by: https://github.com/cpuhrsch
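For reference, the loss in terms of cosine similarity (a usage sketch; shapes and values are arbitrary):

```py
import torch

# loss = 1 - cos(x1, x2)              if y == 1
# loss = max(0, cos(x1, x2) - margin) if y == -1
loss_fn = torch.nn.CosineEmbeddingLoss(margin=0.0)
x1, x2 = torch.randn(3, 5), torch.randn(3, 5)
y = torch.tensor([1, -1, 1])
print(loss_fn(x1, x2, y))
```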
Make it accept the origin environment variable.
Add an explicit skip for pytorch@0a6a1b2, as it's a rare case of the same commit being landed/reverted twice.

Pull Request resolved: pytorch#75209
Approved by: https://github.com/atalman, https://github.com/bigfootjon
- Fix `_Demux` not being picklable with dill, as reported in pytorch#74958 (comment)
- Add a cache to the traverse function to prevent infinite recursion on circular references between DataPipes (Fixes pytorch/data#237)
Pull Request resolved: pytorch#75034
Approved by: https://github.com/wenleix
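A minimal sketch of the pipeline shape involved: demux's children reference shared parent state, the kind of circular DataPipe reference the traverse cache now handles without recursing forever.

```py
from torch.utils.data.datapipes.iter import IterableWrapper

source = IterableWrapper(range(10))
evens, odds = source.demux(num_instances=2, classifier_fn=lambda x: x % 2)
print(list(evens))  # [0, 2, 4, 6, 8]
print(list(odds))   # [1, 3, 5, 7, 9]
```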
Recently, @cpuhrsch noticed that going to viable/strict still didn't resolve upstream lint failures. This is because we didn't check out the head SHA for those GitHub Actions workflows (we missed it last time).

This PR does some consolidation and fixes that problem to make viable/strict more reliable.

Pull Request resolved: pytorch#75199
Approved by: https://github.com/cpuhrsch, https://github.com/seemethere, https://github.com/malfet
…ink, hardswish and softplus on CPU (pytorch#63134)

Summary:
Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish and softplus on CPU, and optimize the performance of softshrink.

Pull Request resolved: pytorch#63134

Reviewed By: yinghai

Differential Revision: D34897992

Pulled By: frank-wei

fbshipit-source-id: 4c778f5271d6fa54dd78158258941def8d9252f5
(cherry picked from commit decda0e)
Summary:
Pull Request resolved: pytorch#73219

Saw a report that this elementwise add is causing overhead. IIUC this is easy to fuse?
ghstack-source-id: 152549975

Test Plan:
CI, review

Ran benchmark_transformers.par mha --batch-size 64 --max-sequence-length 128 --avg-sequence-length 256 --large --use-real-data-distribution --use-mask
and looked at the PT time number

```
before:
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.24ms, NativePT Time: 1000000000.00ms, HF Time: 1.10ms,             PT FLOPS: 59.07TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.46TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.23ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 59.57TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.75TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.24ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 58.87TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.77TFLOP/s

after:
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.22ms, NativePT Time: 1000000000.00ms, HF Time: 1.10ms,             PT FLOPS: 60.07TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.51TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.22ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 59.80TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.69TFLOP/s
B=64, T=128, Half=True, GPU=True, Seed=1234, Padded tokens=54.92%, Use Mask=True             PT Time: 1.21ms, NativePT Time: 1000000000.00ms, HF Time: 1.09ms,             PT FLOPS: 60.21TFLOP/s, NativePT FLOPS: 0.00TFLOP/s, HF FLOPS: 66.86TFLOP/s
```

Inspected a Kineto trace and confirmed that an elementwise add was fused into baddbmm.

Additional opportunity: I see a copy_ inside baddbmm that wasn't happening with the bmm path and I'm not sure why. Perhaps something went wrong with the structured kernels port by ezyang?

Reviewed By: ezyang

Differential Revision: D34160547

fbshipit-source-id: 78d406fb035e6f3bf13af2c9443a886eada35ac4
(cherry picked from commit aaffc39)
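For illustration, the fusion in question (shapes here are arbitrary, not the benchmark's):

```py
import torch

bias = torch.randn(64, 128, 128)
q = torch.randn(64, 128, 32)
k = torch.randn(64, 32, 128)

fused = torch.baddbmm(bias, q, k)  # add folded into the batched matmul
unfused = bias + torch.bmm(q, k)   # bmm plus a separate elementwise add
print(torch.allclose(fused, unfused, atol=1e-4))
```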
pytorch#75007)

Summary:
Previously, the highest-level process group in `period_process_group_dict` could be `None`, indicating the global group. Now `period_process_group_dict` cannot contain `None` as a process group, so the function `_find_process_group` can return a process group directly instead of a tuple: when no group is found it returns `None`, which is unambiguous because a found process group can no longer be `None`.

Proposal: pytorch#71325

Pull Request resolved: pytorch#75007

Reviewed By: awgu

Differential Revision: D35357816

Pulled By: rohan-varma

fbshipit-source-id: 4522dba49797df7140227bfd822d668b7e118a66
(cherry picked from commit 77ca01b)
pytorchmergebot and others added 6 commits April 11, 2022 15:24
Fixes pytorch#74264 (comment).

The shape check works with or without the extras added in pytorch#74264.

```py
>>> a = torch.rand(2, 2).to_sparse_csr()
>>> b = torch.rand(2, 3).to_sparse_csr()
>>> torch.testing.assert_close(a, b)
AssertionError: The values for attribute 'shape' do not match: torch.Size([2, 2]) != torch.Size([2, 3]).
```

Tensor comparison is split into two parts:

1. Attribute comparison.
2. Value comparison.

https://github.com/pytorch/pytorch/blob/bcf6974c207ac0339bfb8bdfdb0b0ec348f7a22f/torch/testing/_comparison.py#L611-L616

The attribute comparison happens in

https://github.com/pytorch/pytorch/blob/bcf6974c207ac0339bfb8bdfdb0b0ec348f7a22f/torch/testing/_comparison.py#L618

The check for the matching shape

https://github.com/pytorch/pytorch/blob/bcf6974c207ac0339bfb8bdfdb0b0ec348f7a22f/torch/testing/_comparison.py#L647-L648

is one of the few checks that cannot be disabled through keyword arguments. Thus, there is no need for this check in `_compare_sparse_csr_values`, since the comparison will already have failed if the shapes mismatch.
Pull Request resolved: pytorch#75593
Approved by: https://github.com/cpuhrsch
…erge (pytorch#75542)

Summary: Pull Request resolved: pytorch#75542

Reviewed By: malfet

Differential Revision: D35513051

fbshipit-source-id: adf59359fcf2410fa8a61746533c896ec22d5ed3
(cherry picked from commit ab65394)
Fix pytorch#75482
There are several random failures when adding exclusions in Windows Defender: https://github.com/pytorch/pytorch/runs/5953410781?check_suite_focus=true

It looks like Add/Set-MpPreference (added in pytorch#75313) might be unstable.
Since it is a defensive step, the error can be ignored so that the workflow continues even if the command fails.

reference:
https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_commonparameters?view=powershell-7.2

verification:
In the test PR https://github.com/pytorch/pytorch/runs/5966277521?check_suite_focus=true#step:3:54
it tries to delete 2 non-existent processes, but the workflow continues to run.
`-ErrorAction Ignore` works in the runner.
Pull Request resolved: pytorch#75588
Approved by: https://github.com/suo
Summary: The primary issue with making sparsity work with QAT
convert (unlike normal quantization convert) is that when the
parametrized module undergoes the QAT convert, the parametrizations need
to be maintained. If the parametrizations are not
transferred during the convert, the sparsifier loses its
connection to the model. In practice this is handled using the
transfer_parametrizations_and_params function to move the weight and
bias and any associated parametrizations to the new module. This PR also adds
tests for transfer_parametrizations_and_params and type_before_parametrizations
to test_nn.py, and adds comments to the test code for
composability.

Test Plan: python test/test_ao_sparsity.py TestComposability
python test/test_nn.py TestNN


Pull Request resolved: pytorch#74848

Approved by: https://github.com/vkuzo, https://github.com/Lezcano
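A minimal sketch of the transfer (the parametrization below is a hypothetical stand-in for a sparsity mask, but the function is the one this PR tests):

```py
import torch
from torch import nn
from torch.nn.utils import parametrize

class Abs(nn.Module):  # toy parametrization standing in for a sparsifier's mask
    def forward(self, w):
        return w.abs()

src, dst = nn.Linear(4, 4), nn.Linear(4, 4)
parametrize.register_parametrization(src, "weight", Abs())

# Move weight/bias and their parametrizations onto the new module, as the
# QAT convert path does, so the sparsifier stays connected to the model.
parametrize.transfer_parametrizations_and_params(src, dst)
print(parametrize.is_parametrized(dst, "weight"))  # True
```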
@rraminen
Author

test pytorch/apex/torchvision please

@rraminen
Author

rraminen commented Apr 19, 2022

As the CI is on ROCm 5.0, the PyTorch unit tests were tested locally on the ROCm 5.1 release image.

The below 4 tests fail because of a packaging error. They also fail in the ROCm 5.1 release image with PyTorch built from the current ROCm fork master before this IFU. (https://github.com/ROCmSoftwarePlatform/frameworks-internal/issues/1530)

test_loading_pickle (test_directory_reader.DirectoryReaderTest)
test_model_save (test_model.ModelTest)
test_resnet (test_model.ModelTest)
test_script_resnet (test_model.ModelTest)

The below 2 unit tests fail with proxy-related errors. They also fail in the ROCm 5.1 release image with PyTorch built from the current ROCm fork master before this IFU. (https://github.com/ROCmSoftwarePlatform/frameworks-internal/issues/1531)

test_torchvision_models_detection_ssd300_vgg16 (__main__.TestVisionTracing)
test_torchvision_models_detection_ssdlite320_mobilenet_v3_large (__main__.TestVisionTracing)

The below 3 tests fail, but they are skipped for now (in #1001) as we are aware of this issue (https://ontrack-internal.amd.com/browse/SWDEV-332522):

test_event_handle_exporter (__main__.TestMultiprocessing)
test_event_handle_importer (__main__.TestMultiprocessing)
test_event_multiprocess (__main__.TestMultiprocessing)

Among the distributed unit tests, the below two tests fail but can be ignored, as they are disabled in the upstream CI:

test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn)
test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn)

PT_unittests_log_IFU-master-2022-04-11.log

PT_unittests_log_IFU-master-2022-04-11_distributed.log

@rraminen
Author

http://rocmhead.amd.com:8080/job/pytorch/job/pytorch-ci/22/

Apex unit tests look good
apex.test.log

Torchvision unit tests look good
torchvision.test.log

They used commit 0e487f5.

@rraminen rraminen requested a review from pruthvistony April 19, 2022 22:04
@pruthvistony
Collaborator

test pytorch/apex/torchvision please

@pruthvistony
Collaborator

@rraminen, can you push an empty commit to trigger the CI?
The previous failure is in ld while building torchvision.

@rraminen
Author

Hi @pruthvistony, I pushed an empty commit.

@pruthvistony pruthvistony merged commit 625dd01 into master Apr 22, 2022
rraminen added a commit to rraminen/pytorch that referenced this pull request Apr 22, 2022
rraminen added a commit that referenced this pull request May 2, 2022
pruthvistony pushed a commit that referenced this pull request Feb 21, 2023
pruthvistony pushed a commit that referenced this pull request May 24, 2023
pruthvistony pushed a commit that referenced this pull request Sep 11, 2023
rraminen added a commit to rraminen/pytorch that referenced this pull request Oct 11, 2023
pruthvistony pushed a commit that referenced this pull request Dec 6, 2023
pruthvistony pushed a commit that referenced this pull request Jan 21, 2024
pruthvistony pushed a commit that referenced this pull request Jan 22, 2024