forked from ROCm/pytorch
Add ROCm5.2.3/AMDGPU support for PyTorch #2
Closed
Conversation
Summary: X-link: pytorch/pytorch-canary#82 This will allow us to enable co-development merges between phabricator and GitHub Pull Request resolved: pytorch#75226 Reviewed By: malfet, seemethere Differential Revision: D35375458 Pulled By: bigfootjon fbshipit-source-id: e25f35e02b404850132c3972744202d27a18d8aa (cherry picked from commit 957c313)
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: pytorch#75081 Approved by: https://github.com/atalman
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: pytorch#75080 Approved by: https://github.com/atalman
Was noticing longer-than-average queuing times for linux.2xlarge and also found out that we're hitting our max limit more often than not, so bumping this to 750 to give us more capacity to play around with. Signed-off-by: Eli Uriegas <[email protected]> Number of times we've hit this in the last week: https://fburl.com/6cst46y0 Pull Request resolved: pytorch#75234 Approved by: https://github.com/kit1980, https://github.com/osalpekar, https://github.com/malfet
Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#75229 Approved by: https://github.com/seemethere, https://github.com/bigfootjon
…positional args (pytorch#75146) Summary: Pull Request resolved: pytorch#75146 Previously we assumed `to` must be called with positional args, but this may not be the case; e.g., we can do `to(dtype=?)` or `to(memory_format=?)`. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Imported from OSS Reviewed By: ejguan Differential Revision: D35342088 fbshipit-source-id: 22bfe78ae84e74141ae6560285c5c38bc068c999 (cherry picked from commit a3593c0)
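For reference, a minimal eager-mode illustration of the call patterns involved (the fix itself lives in the FX lowering pass, not in eager `Tensor.to`):
```
import torch

x = torch.randn(2, 2)

# Positional form -- the only case the pass previously assumed:
y1 = x.to(torch.float16)

# Keyword forms -- the cases this fix adds handling for:
y2 = x.to(dtype=torch.float16)
y3 = x.to(memory_format=torch.contiguous_format)
```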
Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: pytorch#75187 Approved by: https://github.com/zou3519
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: pytorch#75083 Approved by: https://github.com/ngimel
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: pytorch#75079 Approved by: https://github.com/albanD
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: pytorch#75082 Approved by: https://github.com/ngimel
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo Pull Request resolved: pytorch#75084 Approved by: https://github.com/ngimel
Summary: Pull Request resolved: pytorch#75237 applies 'OVRSOURCE' logic to one more place missed in D35331263 (pytorch@8b7e2bf) so that lazy TS backend is not compiled in internal builds Test Plan: CI Reviewed By: malfet, shunting314 Differential Revision: D35377758 fbshipit-source-id: 5dcd3d36e50a8917470a917f2120353972dc31ba (cherry picked from commit 8b8ed7b)
Pull Request resolved: pytorch#75214 Approved by: https://github.com/albanD
Summary: Pull Request resolved: pytorch#74845 This PR adds support for the quantization flow to detect parametrized modules and match them using their original module types. This mainly involved using the new type_before_parametrizations function rather than type to check for module matching. Test Plan: python test/test_ao_sparsity.py TestComposability Imported from OSS Reviewed By: jerryzh168 Differential Revision: D35240274 fbshipit-source-id: 7294d89c9c2e069e51d8b9bafa45c15f92bed124 (cherry picked from commit ed5cdb7)
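A minimal sketch of why the original type is needed, assuming type_before_parametrizations is exposed from torch.nn.utils.parametrize as in recent PyTorch:
```
import torch.nn as nn
from torch.nn.utils import parametrize

class Double(nn.Module):
    def forward(self, w):
        return 2.0 * w

lin = nn.Linear(4, 4)
parametrize.register_parametrization(lin, "weight", Double())

# Parametrizing swaps in a generated subclass, so a plain type() check
# no longer matches nn.Linear directly...
print(type(lin).__name__)  # e.g. 'ParametrizedLinear'

# ...which is why matching falls back to the original type:
print(parametrize.type_before_parametrizations(lin))  # <class '...Linear'>
```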
Summary: Pull Request resolved: pytorch#74560 This PR add support for quantized tensors with "unknown quantizer", which means that we can use standard APIs like torch.empty to allocate quantized tensors, with the understanding that we will set the quantizer later. This makes meta functions applicable to quantized tensors (they will allocate with unknown quantizer and the kernel will set the quantizer later) and fixes a bug David Dang reported where structured kernels give a weird error message when you call them with quantized inputs. This is not a complete support for quantized structured kernels because I haven't actually tried porting any of the quantized implementations to structured; qadd is probably a good choice to try first as it does its broadcasting implementation using TensorIterator. My goal here is just to show that the error message is better. See also pytorch#52680 Signed-off-by: Edward Z. Yang <ezyangfb.com> Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D35317441 Pulled By: dzdang fbshipit-source-id: ffb85b0e06ccbcc2b01052ca6760517684048b39 (cherry picked from commit 2a54b8b)
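A hedged sketch of the allocation paths described in the summary above (the `torch.empty` path assumes a build that includes pytorch#74560):
```
import torch

# Per this change, a standard factory can allocate a quantized tensor
# with an "unknown quantizer" that the kernel attaches later
# (assumes a build that includes pytorch#74560):
q = torch.empty(2, 2, dtype=torch.qint8)
print(q.is_quantized)  # True

# The usual path, where the quantizer is known at allocation time:
qt = torch.quantize_per_tensor(
    torch.randn(2, 2), scale=0.1, zero_point=0, dtype=torch.qint8
)
```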
…ytorch#74878) Summary: Pull Request resolved: pytorch#74878 Previously we recorded the matched node as a list of nodes (`List[Node]`); this does not generalize to a graph, which is needed for future use cases. In this PR we changed the recorded node to a NodePattern instead, currently defined as
```
NodePattern = Union[Tuple[Node, Node], Tuple[Node, Tuple[Node, Node]], Any]
```
but it can be more general. This will allow us to support more general patterns with the backend_config_dict api, and is also needed for the BinaryOpQuantizeHandler refactor. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Imported from OSS Reviewed By: vkuzo Differential Revision: D35203616 fbshipit-source-id: f4bf5b056cfc0955455eea9c2bf1ac9f6dde3974 (cherry picked from commit b290c04)
…on)" Summary: Original commit changeset: 426a07808035 Original Phabricator Diff: D34943147 (pytorch@8d7242a) Since D34943147 (pytorch@8d7242a) landed, Adfinder push candidates show consistently push blocking red counters for getAds C CPU main thread and getAds NC CPU main thread. AF auto prod canary for D34943147 (pytorch@8d7242a), c1-c2 does shows 1.19% regression for counter 'getAds C CPU main thread' and ~1% regression for counter 'getAds C CPU main thread': https://www.internalfb.com/intern/experiment_store/experiment/27487791896054/#commit1-commit2 To help unblock adfinder push, reverting D34943147 (pytorch@8d7242a) Test Plan: Canary: https://our.intern.facebook.com/intern/ads/canary/442677925633895915 Canary completed: https://www.internalfb.com/intern/experiment_store/experiment/25288768753864/#commit1-commit2 Counter 'getAds C CPU main thread' moves in the opposite direction by -0.75. Differential Revision: D35370901 fbshipit-source-id: b2e89f5976eb3fa2c2b22f120c0e32e380f5bc52 (cherry picked from commit 1eb14fe)
As pointed out by pytorch#71205, `torch.hub.load` assumes that the user trusts the repo from where the code is gathered and executed. We propose a solution to make sure that the user is aware of the security threat that this can represent. **Solution**: Adds a `trust_repo` parameter to the `load`, `list` and `help` functions in torch.hub. For now, the default `trust_repo=None` warns that, in the future, the user will need to authorize explicitly every repo before downloading it. Once the repo has been trusted (via `trust_repo=True` or via a command prompt input) it will be added to the list of trusted repositories. Pull Request resolved: pytorch#72060 Approved by: https://github.com/NicolasHug
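A short usage sketch of the new parameter (repo and entrypoint names are just illustrative):
```
import torch

# Trust the repo explicitly up front; once trusted (here or via the
# interactive prompt), it is remembered in the trusted-repo list:
model = torch.hub.load("pytorch/vision", "resnet18", trust_repo=True)

# With the default trust_repo=None, list/help/load warn that explicit
# authorization will be required in the future:
entrypoints = torch.hub.list("pytorch/vision", trust_repo=None)
```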
Summary: Pull Request resolved: pytorch#75244 Original commit changeset: d653a5af662a Original Phabricator Diff: D35060736 (pytorch@d9d3492) Test Plan: Model loading test, verified that D35060736 (pytorch@d9d3492) will cause the torch::save => torch::load failure. Reviewed By: yinghai, jianyuh Differential Revision: D35387009 fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298 (cherry picked from commit 6c8cc0d)
When start_val == 0, the comparison `start_val > self[dim]` can be folded easily (0 is never strictly greater than the result of `self[dim]`), but `start_val >= self[dim]` can't. Since we assign `start_val = self[dim]` in the body anyway, both of these are equivalent. Pull Request resolved: pytorch#74980 Approved by: https://github.com/eellison
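A schematic restatement of the folding argument in plain Python (hypothetical helper, not the actual JIT code):
```
def clamp_start(start_val, dim_size):
    # With start_val == 0, the strict guard `0 > dim_size` is a constant
    # False for every valid size (sizes are >= 0), so the branch folds
    # away at compile time, while `0 >= dim_size` still depends on
    # whether the dimension is empty.
    if start_val > dim_size:   # previously: start_val >= dim_size
        start_val = dim_size   # this assignment makes both guards equivalent
    return start_val
```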
It caused a number of internal only compilation failures, for example see: pytorch#74425 (comment) and pytorch#74542 (comment) Pull Request resolved: pytorch#75085 Approved by: https://github.com/ngimel, https://github.com/albanD
Summary: Pull Request resolved: pytorch#74946 Warn instead of hard-failing when we fail to clone the state_dict, as this param might not be managed by FSDP and thus we do not expect to clone it. ghstack-source-id: 152978204 Test Plan: CI Reviewed By: mrshenli Differential Revision: D35242306 fbshipit-source-id: d9eb58a2993341040e4a9f36fa388f423bd2ddc5 (cherry picked from commit 6b0d080)
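A hypothetical sketch of the behavior change (not the actual FSDP code): warn and return the tensor as-is instead of hard-failing when the clone does not succeed.
```
import warnings
import torch

def _clone_for_state_dict(name: str, tensor: torch.Tensor) -> torch.Tensor:
    # The param may simply not be managed by FSDP, so a failed clone
    # should not abort the whole state_dict pass.
    try:
        return tensor.clone()
    except RuntimeError as err:
        warnings.warn(f"Failed to clone() {name}: {err}")
        return tensor
```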
Hoping to fix regression from https://hud.pytorch.org/minihud#1bcae0d10e1c4eddf07f9e60ced9b4f3c2c04b1f Adding quantized::softmax to list until 4/15/22. Pull Request resolved: pytorch#75254 Approved by: https://github.com/albanD
Summary: Pull Request resolved: pytorch#75243 Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D35384883 Pulled By: awgu fbshipit-source-id: 8dfc12035b79861df093d5921ed7b36050c9f3a0 (cherry picked from commit 6991467)
Pull Request resolved: pytorch#74607 Approved by: https://github.com/cpuhrsch
Fixes pytorch#68621 Pull Request resolved: pytorch#73686 Approved by: https://github.com/IvanYashchuk, https://github.com/malfet
Reference: pytorch#71108 Pull Request resolved: pytorch#75013 Approved by: https://github.com/anjali411
Pull Request resolved: pytorch#75233 Approved by: https://github.com/ezyang, https://github.com/larryliu0820
Updates our s3 actions to upload and download artifacts to versions that include runAttempt in the prefix for the artifact. This change is mostly to make it so that subsequent re-runs of a workflow do not attempt to grab artifacts from previous runs Coincides with: * seemethere/upload-artifact-s3#4 * seemethere/download-artifact-s3#1 Signed-off-by: Eli Uriegas <[email protected]> Pull Request resolved: pytorch#74576 Approved by: https://github.com/malfet, https://github.com/janeyx99
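A hypothetical sketch of the keying scheme described above (names and layout are illustrative, not the action's actual code): including the run attempt in the S3 prefix isolates each re-run's artifacts from earlier attempts.
```
def artifact_prefix(repo: str, run_id: str, run_attempt: str, name: str) -> str:
    # Each re-run gets its own attempt number, so re-runs never read
    # artifacts written by a previous attempt of the same workflow run.
    return f"{repo}/{run_id}/{run_attempt}/artifact/{name}"

print(artifact_prefix("pytorch/pytorch", "12345", "2", "test-reports"))
# pytorch/pytorch/12345/2/artifact/test-reports
```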
Pull Request resolved: pytorch#75212 Approved by: https://github.com/cpuhrsch
…-04-11 IFU-master-2022-04-11
Skipped the failing tests on ROCm during IFU-master-2022-04-11
As per pytorch#74995, the tests need to be skipped for odd WORLD_SIZE Signed-off-by: Jagadish Krishnamoorthy <[email protected]> Fixes pytorch#74995 Pull Request resolved: pytorch#76136 Approved by: https://github.com/kumpera, https://github.com/wayi1
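A minimal sketch of the kind of guard described above (hypothetical test name, not the actual distributed test):
```
import os
import unittest

WORLD_SIZE = int(os.environ.get("WORLD_SIZE", "2"))

@unittest.skipIf(WORLD_SIZE % 2 != 0,
                 "skipped for odd WORLD_SIZE (see pytorch#74995)")
class ShardingTest(unittest.TestCase):
    def test_even_world_size_only(self):
        self.assertEqual(WORLD_SIZE % 2, 0)
```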
SortImpl.cu needs to include <thrust/execution_policy.h> for thrust::host. Depending on the nvidia/thrust or rocThrust version, transitive inclusion of this header is not guaranteed.
Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
Change the rtol level Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
…el_test [ROCm] Disable TestDataParallelDeviceType tests
To protect CI from sudden version updates that are not compatible with other packages Fixes pytorch#78362 Pull Request resolved: pytorch#78369 Approved by: https://github.com/suo, https://github.com/atalman
Co-authored-by: Wang, Yanyao <[email protected]>
Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: pytorch#78804 Approved by: https://github.com/janeyx99
Co-authored-by: Wang, Yanyao <[email protected]>
Increase system memory requirement for TestShapeOpsCUDA.test_flip_large_tensor_cuda Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
…ROCm#1032) Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
* Fix baseurl link in CentOS for ROCm5.2 * Add ROCm5.2.1/AMDGPU support Co-authored-by: Wang, Yanyao <[email protected]>
…m5.2_internal_testing
no use it.
WBobby pushed a commit that referenced this pull request on Aug 18, 2022
…78136) (pytorch#78204)

This prevents `import torch` from accidentally crashing on machines with no metal devices. Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:
```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq $0x8, %rsp
    0x10fd9f538 <+456>: popq %rbx
(lldb) disassemble
...
    0x10fd9f514 <+420>: movq 0xf19ad15(%rip), %rsi ; "maxBufferLength"
    0x10fd9f51b <+427>: movq %r14, %rdi
    0x10fd9f51e <+430>: callq *0xeaa326c(%rip) ; (void *)0x00007fff7202be40: objc_msgSend
```
which corresponds to the `[m_device maxBufferLength]` call, where `m_device` is not initialized in https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171

Pull Request resolved: pytorch#78136
Approved by: https://github.com/seemethere
Co-authored-by: Nikita Shulga <[email protected]>
WBobby pushed a commit that referenced this pull request on Jan 3, 2023
This makes the rocm jobs run on master-only. We've been battling queue times for a few months now (pytorch#73039). So far we have tried or investigated:
1. Moving distributed builds to master
2. Moving distributed builds to periodic
3. Only running rocm on a specific set of paths
4. Running multiple jobs on a single rocm host

Unfortunately, we haven't been able to reduce queuing times to good levels. As a result, ROCm jobs are the "weightiest" job in PR CI, with an average TTS of 3.3h (see https://hud.pytorch.org/metrics, panel name "Job time-to-signal, all branches"). There are two things we haven't tried so far:
1. Running "smoke tests" only on PR
2. Switching rocm builds to master

Since #2 is easiest, let's give it a try. For now, the policy would be the same as what we do for other capacity-constrained configurations (Win and Mac): run on master only, but revert if there is a breakage introduced.

[skip ci]

Pull Request resolved: pytorch#77989
Approved by: https://github.com/malfet, https://github.com/janeyx99
WBobby pushed a commit that referenced this pull request on Jan 3, 2023
…78136)

This prevents `import torch` from accidentally crashing on machines with no metal devices. Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:
```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq $0x8, %rsp
    0x10fd9f538 <+456>: popq %rbx
(lldb) disassemble
...
    0x10fd9f514 <+420>: movq 0xf19ad15(%rip), %rsi ; "maxBufferLength"
    0x10fd9f51b <+427>: movq %r14, %rdi
    0x10fd9f51e <+430>: callq *0xeaa326c(%rip) ; (void *)0x00007fff7202be40: objc_msgSend
```
which corresponds to the `[m_device maxBufferLength]` call, where `m_device` is not initialized in https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171

Pull Request resolved: pytorch#78136
Approved by: https://github.com/seemethere
WBobby pushed a commit that referenced this pull request on Jan 3, 2023
… of libtorch_python (pytorch#78028)

Summary: This moves torch::class_<WorkerInfo> into `rpc_agent.cpp` so it gets registered in libtorch instead of libtorch_python. This is intermediate work toward getting torch::deploy to load an unmodified copy of libtorch. Current RPC is incompatible due to duplicate registrations.
```
unknown file: Failure
C++ exception with description "Exception Caught inside torch::deploy embedded library:
Custom class with name __torch__.torch.classes.dist_rpc.WorkerInfo is already registered. Ensure that registration with torch::class_ is only called once.
Exception raised from registerCustomClass at ../aten/src/ATen/core/custom_class.cpp:61 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f3bd9adb92e in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f3bd9ab7068 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch::registerCustomClass(std::shared_ptr<c10::ClassType>) + 0x110 (0x7f3bc2258980 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&) + 0x3b9 (0x7f3bc225a419 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: [0x7f3ba45cfea1]
frame #5: <unknown function> + 0x1b5334 (0x5652bdab9334 in ./test_deploy)
frame #6: <unknown function> + 0x1b4f3e (0x5652bdab8f3e in ./test_deploy)
frame #7: <unknown function> + 0x1b519b (0x5652bdab919b in ./test_deploy)
frame #8: loadSearchFile(char const*) + 0x23e (0x7f3ba62f37f8 in /tmp/torch_deploy9ATEFg)
frame #9: deploy_set_self + 0x51 (0x7f3ba62f38f9 in /tmp/torch_deploy9ATEFg)
frame #10: torch::deploy::Interpreter::Interpreter(torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>) + 0x274 (0x5652bdaaa790 in ./test_deploy)
frame #11: void __gnu_cxx::new_allocator<torch::deploy::Interpreter>::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x81 (0x5652bdaaf58b in ./test_deploy)
frame #12: void std::allocator_traits<std::allocator<torch::deploy::Interpreter> >::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(std::allocator<torch::deploy::Interpreter>&, torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x4a (0x5652bdaae320 in ./test_deploy)
frame #13: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::_M_realloc_insert<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(__gnu_cxx::__normal_iterator<torch::deploy::Interpreter*, std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> > >, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xee (0x5652bdaae4a0 in ./test_deploy)
frame #14: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::emplace_back<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xb6 (0x5652bdaad258 in ./test_deploy)
frame #15: torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>) + 0x123 (0x5652bdaa83b1 in ./test_deploy)
frame #16: TorchpyTest_InitTwice_Test::TestBody() + 0x65 (0x5652bda075a9 in ./test_deploy)
frame #17: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x65 (0x5652bda944b7 in ./test_deploy)
frame #18: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x5a (0x5652bda8cfe7 in ./test_deploy)
frame #19: testing::Test::Run() + 0x100 (0x5652bda68622 in ./test_deploy)
frame #20: testing::TestInfo::Run() + 0x10f (0x5652bda68fb3 in ./test_deploy)
frame #21: testing::TestSuite::Run() + 0x121 (0x5652bda6980d in ./test_deploy)
frame #22: testing::internal::UnitTestImpl::RunAllTests() + 0x38e (0x5652bda756e6 in ./test_deploy)
frame #23: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x65 (0x5652bda9586b in ./test_deploy)
frame #24: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x5a (0x5652bda8e0f7 in ./test_deploy)
frame #25: testing::UnitTest::Run() + 0xc9 (0x5652bda73fd1 in ./test_deploy)
frame #26: RUN_ALL_TESTS() + 0x11 (0x5652bda169fa in ./test_deploy)
frame #27: main + 0x27 (0x5652bda10ce2 in ./test_deploy)
frame #28: <unknown function> + 0x2d310 (0x7f3bc0431310 in /usr/lib/libc.so.6)
frame #29: __libc_start_main + 0x81 (0x7f3bc04313c1 in /usr/lib/libc.so.6)
frame #30: _start + 0x25 (0x5652bda063b5 in ./test_deploy)
```
Test Plan: CI
Differential Revision: D36564258
Pull Request resolved: pytorch#78028
Approved by: https://github.com/rohan-varma
WBobby pushed a commit that referenced this pull request on Jan 3, 2023
… to conform with non-quantized counterpart filenames

Summary: Names of analogous files in the quantized directory (previously snake case) were inconsistent with their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes all files in the quantized dir (and sub-directories) to have pascal case. `aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet because (for reasons currently unknown) after making the name change, `import torch` produces the below error (`qlinear_unpack.cpp` renaming also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR.
```
terminate called after throwing an instance of 'c10::Error'
  what(): Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```
Test Plan:
```
python test/test_quantization.py
```
Pull Request resolved: pytorch#77037
Approved by: https://github.com/jerryzh168
* [FSDP] Add grad accumulation without `no_sync()` (pytorch/pytorch#73535)
* arc lint --take CLANGFORMAT
* [Static Runtime] Add native op support for `aten::len` (pytorch/pytorch#73899)
* Fix `lkj_cholesky` device error (pytorch/pytorch#73980)
* arc lint --take CLANGFORMAT
* arc lint --take CLANGFORMAT
* arc lint --take CLANGFORMAT
* [quant] Fix implementation for `output_quantized_idxs` in convert (pytorch/pytorch#74140)
* `output_quantized_idxs` in convert
* `torch.nn` importable on Python-3.7.0
* Fix `test_reduce_add_coalesced` failure (pytorch/pytorch#74027)
* [Static Runtime] Fix a bug that `aten::full` reuses a tensor that does not match requested one (pytorch/pytorch#73990)
* Improve numerical stability of `torch.distributions.wishart.Wishart` (pytorch/pytorch#72993)
* Replace `get_all_` type macros with the ATen dispatch macros. (pytorch/pytorch#71561)
* `get_all_` type macros with the ATen dispatch macros.
* [reland][quant] Fix implementation for `output_quantized_idxs` in convert (#74140) (pytorch/pytorch#74229)
* [fix] `torch.amax` and `torch.amin` for empty tensors if dim arg not provided. (pytorch/pytorch#73914)
* [torch::deploy] Remove `c10::errors` from torch::deploy (pytorch/pytorch#74283)
* arc lint --take BLACK
* [FSDP] Override `named_parameters()` for clean names in `summon_full_params()` (pytorch/pytorch#74333)
* [structured kernels] Port `amin` to structured kernels. (pytorch/pytorch#73581)
* `GITHUB_DIR` in path generate_ci_workflow.py
* arc lint --take CLANGFORMAT
* `GitHubPR.get_last_comment`
* only `pickle` and `pickle + flatbuffer` for migration (Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle, pytorch/pytorch#74209)
* `is_train` flag for onnx pass deduplicate initializers
* only `pickle` and `pickle + flatbuffer` for migration
* Virtualize `<type>Storage` classes (pytorch/pytorch#66970)
* `OpMathType` tensor for intermediate results
* `--force` option
* [SR] Eliminate extra permute ops before `aten::sum` (pytorch/pytorch#74481)
* `Lint` to the list of mandatory checks
* arc lint --take CLANGFORMAT
* only `pickle` and `pickle + flatbuffer` for migration" (Extend _save_for_mobile and _load_for_mobile to work with flatbuffer format, pytorch/pytorch#74594)
* arc lint --take CLANGFORMAT
* `int[]?` arguments to new OptionalIntArrayRef class
* `asarray` docs + add test case.
* `torch.ravel`
* [Profiler] Limit calls to `recordThreadInfo` (pytorch/pytorch#74888)
* arc lint --take CLANGFORMAT
* [Deploy] Change `numModules` type to `unsigned` (pytorch/pytorch#74978)
* Make all `.pyi.in` files exportable from torch/_C/ folder (pytorch/pytorch#74962)
* arc lint --take GOOGLEJAVAFORMAT
* arc lint --take CLANGFORMAT
* Use the same checks in all `grid_sampler` functions (pytorch/pytorch#74635)
* `grid_sampler` functions
* `cholesky_inverse`: complex autograd, forward AD and correct tests.
* `Tensor[]` for structured kernel codegen.
* `grid_sampler` functions
* `c10d/Utils.hpp`
* [quant][fx] Fix lowering pass for cases when `to` is not called with positional args (pytorch/pytorch#75146)
* `-Wsign-compare` to list of clang flags
* [Static Runtime] Fix a bug that `aten::full_like` reuses a tensor that does not match arguments (pytorch/pytorch#74255)
* `__torch_function__` as instance method in C++
* arc lint --take CLANGFORMAT
* `MultiMarginLoss` on CUDA
* arc lint --take CLANGFORMAT
* `log_target` example in kl divergence
* `isIntegral`
* `rank0_only` to `full_optim_state_dict()`
* arc lint --take CLANGFORMAT
* Replace `internal::GRAIN_SIZE` by `grain_size` (parameter). (pytorch/pytorch#53177)
* "[CI] Make `install_user.sh` compatible with Focal (pytorch/pytorch#77622)" commit 6aea0b1
* Fixes #ISSUE_NUMBER